Scheduling algorithm for xsl:for-each/@saxon:threads=N

Sunday, 24 March 2019

This story started half year ago when Michael Kay, author of Saxon XSLT processor, was dealing with performance in multithreaded environment. See Bug #3958.

The problem is like this.

Given XSLT:

<xsl:stylesheet exclude-result-prefixes="#all" 
  version="3.0" 
  xmlns:saxon="http://saxon.sf.net/"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />

  <xsl:template name="main">
    <xsl:for-each saxon:threads="4" select="1 to 10">
      <xsl:choose>
        <xsl:when test=". eq 1">
          <!-- Will take 10 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/10')?url"/>
        </xsl:when>
        <xsl:when test=". eq 5">
          <!-- Will take 9 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/9')?url"/>
        </xsl:when>
        <xsl:when test=". eq 10">
          <!-- Will take 8 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/8')?url"/>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each>
    <xsl:text>
</xsl:text>
  </xsl:template>
</xsl:stylesheet>

Implement engine to achieve best performance of parallel for-each.

Naive implementation that will distribute iterations per threads will run into unfair load on threads, so some load-balancing is required. That was the case Saxon EE.

Michael Kay has been trying to find most elegant way for the implementation and has written the comment:

I can't help feeling that the answer to this must lie in using the Streams machinery, and Spliterators in particular. I've spent another hour or so reading all about Spliterators, and I have to confess I really don't understand the paradigm. If someone can enlighten me, please go ahead...

We have decided to take the challange and to model the expected behavior using Streams. Here is our go:

import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.function.Consumer;
import java.util.function.Function;

public class Streams
{
  public static class Item<T>
  {
    public Item(int index, T data)
    {
      this.index = index;
      this.data = data;
    }
    
    int index;
    T data;
  }

  public static void main(String[] args)
  {
    run(
      "Sequential",
      input(),
      Streams::action,
      Streams::output,
      true);
    
    run(
      "Parallel ordered", 
      input().parallel(),
      Streams::action,
      Streams::output,
      true);
    
    run(
      "Parallel unordered", 
      input().parallel(),
      Streams::action,
      Streams::output,
      false);    
  }
  
  private static void run(
    String description,
    Stream<Item<String>> input,
    Function<Item<String>, String[]> action,
    Consumer<String[]> output,
    boolean ordered)
  {
    System.out.println(description);
    
    long start = System.currentTimeMillis();
   
    if (ordered)
    {
      input.map(action).forEachOrdered(output);
    }
    else
    {
      input.map(action).forEach(output);
    }
    
    long end = System.currentTimeMillis();
    
    System.out.println("Execution time: " + (end - start) + "ms.");
    System.out.println();
  }
  
  private static Stream<Item<String>> input()
  {
    return IntStream.range(0, 10).
      mapToObj(i -> new Item<String>(i + 1, "Data " + (i + 1)));
  }
  
  private static String[] action(Item<String> item)
  {
    switch(item.index)
    {
      case 1:
      {
        sleep(10);
        
        break;
      }
      case 5:
      {
        sleep(9);
        
        break;
      }
      case 10:
      {
        sleep(8);
        
        break;
      }
      default:
      {
        sleep(1);
        
        break;
      }
    }
    
    String[] result = { "data:", item.data, "index:", item.index + "" };
    
    return result;
  }
  
  private synchronized static void output(String[] value)
  {
    boolean first = true;
    
    for(String item: value)
    {
      if (first)
      {
        first = false;
      }
      else
      {
        System.out.print(' ');
      }
    
      System.out.print(item);
    }

    System.out.println();
  }
  
  private static void sleep(int seconds)
  {
    try
    {
      Thread.sleep(seconds * 1000);
    }
    catch(InterruptedException e)
    {
      throw new IllegalStateException(e);
    }
  }
}

We model three cases:

"Sequential"

slowest, single threaded execution with output:

data: Data 1 index: 1
data: Data 2 index: 2
data: Data 3 index: 3
data: Data 4 index: 4
data: Data 5 index: 5
data: Data 6 index: 6
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 9 index: 9
data: Data 10 index: 10
Execution time: 34009ms.

"Parallel ordered"

fast, multithread execution preserving order, with output:

data: Data 1 index: 1
data: Data 2 index: 2
data: Data 3 index: 3
data: Data 4 index: 4
data: Data 5 index: 5
data: Data 6 index: 6
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 9 index: 9
data: Data 10 index: 10
Execution time: 10019ms.

"Parallel unordered"

fastest, multithread execution not preserving order, with output:

data: Data 6 index: 6
data: Data 2 index: 2
data: Data 4 index: 4
data: Data 3 index: 3
data: Data 9 index: 9
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 5 index: 5
data: Data 10 index: 10
data: Data 1 index: 1
Execution time: 10001ms.

What we can add in conclusion is that xslt engine could try automatically decide what approach to use, as many SQL engines are doing, and not to force developer to go into low level engine details.

Sunday, 24 March 2019 07:52:02 UTC

Comments [0] -
Java | Thinking aloud | xslt

Comments are closed.