ZetCode

Java Spliterator in Parallel

Last modified: May 8, 2025

The java.util.Spliterator is a powerful iterator-like interface introduced in Java 8, designed for efficiently traversing and partitioning elements of a data source. Its most crucial capability is parallel processing, enabling large datasets to be split into smaller segments that can be processed concurrently by multiple threads. Unlike a traditional Iterator, which only supports sequential iteration, a Spliterator can divide data dynamically, making it a foundational component of Java's parallel Streams API.

A Spliterator provides metadata about the underlying data source through characteristics, a set of bit flags returned by the characteristics method. These flags help the runtime decide how to split and traverse the data. The interface defines eight of them: ORDERED, DISTINCT, SORTED, SIZED, NONNULL, IMMUTABLE, CONCURRENT, and SUBSIZED.

These characteristics are essential for parallel computation, allowing the Java runtime to make intelligent decisions about task partitioning and resource allocation. By leveraging trySplit, a Spliterator enables workload division across multiple threads, ensuring that computational efficiency scales with available processing power.
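The following is a minimal sketch of how the characteristics can be inspected with hasCharacteristics. The class name CharacteristicsDemo and the helper describe are illustrative names, not part of the JDK; the three collections are chosen only because they report different flag sets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

class CharacteristicsDemo {

    private static final String[] NAMES = {
        "ORDERED", "DISTINCT", "SORTED", "SIZED",
        "NONNULL", "IMMUTABLE", "CONCURRENT", "SUBSIZED"
    };

    private static final int[] FLAGS = {
        Spliterator.ORDERED, Spliterator.DISTINCT, Spliterator.SORTED, Spliterator.SIZED,
        Spliterator.NONNULL, Spliterator.IMMUTABLE, Spliterator.CONCURRENT, Spliterator.SUBSIZED
    };

    // Hypothetical helper: returns the names of the flags reported by the spliterator.
    static List<String> describe(Spliterator<?> spliterator) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < FLAGS.length; i++) {
            if (spliterator.hasCharacteristics(FLAGS[i])) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(3, 1, 2);

        System.out.println("ArrayList: " + describe(new ArrayList<>(data).spliterator()));
        System.out.println("HashSet:   " + describe(new HashSet<>(data).spliterator()));
        System.out.println("TreeSet:   " + describe(new TreeSet<>(data).spliterator()));
    }
}
```

A source that reports SIZED and SUBSIZED, such as an ArrayList, can be divided into parts of known size, which is what makes balanced splitting possible.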

Parallel streams rely on Spliterator to partition their source and on the Fork/Join framework (by default the common pool) to run the resulting tasks. Developers can either use parallel streams for automatic concurrency or explicit multi-threading for fine-grained control over execution. Understanding how Spliterator facilitates parallel processing is crucial for building scalable, high-performance applications.

Basic Spliterator Traversal (Sequential)

This example demonstrates basic sequential traversal using a Spliterator. We obtain a Spliterator from a List of strings. Then, we use the tryAdvance method in a loop to process each element. tryAdvance takes a Consumer that specifies the action to perform on the element. It returns false when no more elements are available.

This example showcases the fundamental iteration capability of a Spliterator. It prints characteristics and estimated size before iterating. The iteration itself is sequential, processing one element at a time in the main thread.

Main.java
package com.zetcode;

import java.util.List;
import java.util.Spliterator;

public class Main {

    public static void main(String[] args) {

        List<String> names = List.of("John", "Jane", "Mike", "Sarah", "Tom");

        Spliterator<String> spliterator = names.spliterator();

        System.out.println("Characteristics: " + spliterator.characteristics());
        System.out.println("Estimated size: " + spliterator.estimateSize());
        System.out.println("Elements (sequentially):");

        boolean hasNextElement;
        do {
            hasNextElement = spliterator.tryAdvance(name -> {
                System.out.println("Processing: " + name + " by " + Thread.currentThread().getName());
                // Simulate some work
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        } while (hasNextElement); // Continue until no more elements remain

        System.out.println("Finished processing.");
    }
}

The output shows the characteristics as an integer bit mask, followed by the initial estimated size. Each element is processed by the main thread. This forms the basis for understanding how Spliterators work before diving into parallel processing.

Understanding trySplit

The trySplit method is fundamental to a Spliterator's role in parallel processing. It attempts to partition the source elements into two. If successful, it returns a new Spliterator covering a leading portion of the elements, while the original Spliterator covers the remainder. If the source cannot be split any further, it returns null. This example demonstrates splitting, but still processes both parts sequentially.

It's crucial to understand that calling trySplit by itself does not initiate parallel execution. It merely prepares the data by dividing it. The resulting Spliterators must be processed by separate threads to achieve parallelism, as shown in later examples.

Main.java
package com.zetcode;

import java.util.List;
import java.util.Spliterator;

public class Main {

    public static void main(String[] args) {

        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Spliterator<Integer> spliterator1 = numbers.spliterator();

        System.out.println("Original spliterator size: " + spliterator1.estimateSize());

        // Attempt to split the spliterator
        Spliterator<Integer> spliterator2 = spliterator1.trySplit();

        if (spliterator2 != null) {
            // spliterator2 covers the leading elements; spliterator1 keeps the rest
            System.out.println("Split-off part (spliterator2) size: " + spliterator2.estimateSize());
            System.out.println("Remaining part (spliterator1) size: " + spliterator1.estimateSize());

            System.out.println("\nProcessing split-off part (sequentially):");
            spliterator2.forEachRemaining(num -> System.out.println(num + " by " + Thread.currentThread().getName()));

            System.out.println("\nProcessing remaining part (sequentially):");
            spliterator1.forEachRemaining(num -> System.out.println(num + " by " + Thread.currentThread().getName()));
        } else {
            System.out.println("Could not split the spliterator. Processing all elements:");
            spliterator1.forEachRemaining(System.out::println);
        }
    }
}

The output demonstrates that the original list is divided. However, both parts are processed sequentially by the main thread. This example clarifies that trySplit is a mechanism for partitioning, not direct parallel execution.
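A single call to trySplit yields two parts, but the call can be repeated on each part to obtain smaller chunks. The sketch below illustrates that idea under the assumption that the source reports a usable estimateSize (as an ArrayList does). The class RepeatedSplitDemo and the methods partition and drain are hypothetical names introduced for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RepeatedSplitDemo {

    // Splits the source until every part holds at most maxSize elements
    // (or refuses to split) and returns the parts in encounter order.
    static <T> List<Spliterator<T>> partition(Spliterator<T> source, long maxSize) {
        List<Spliterator<T>> parts = new ArrayList<>();
        Deque<Spliterator<T>> pending = new ArrayDeque<>();
        pending.push(source);

        while (!pending.isEmpty()) {
            Spliterator<T> current = pending.pop();
            Spliterator<T> prefix = current.estimateSize() > maxSize ? current.trySplit() : null;

            if (prefix == null) {
                parts.add(current);      // small enough, or not splittable
            } else {
                pending.push(current);   // remainder, visited after the prefix
                pending.push(prefix);    // leading elements, visited next
            }
        }
        return parts;
    }

    // Collects the remaining elements of one part into a list.
    static <T> List<T> drain(Spliterator<T> part) {
        List<T> elements = new ArrayList<>();
        part.forEachRemaining(elements::add);
        return elements;
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 16)
                .boxed()
                .collect(Collectors.toList());

        for (Spliterator<Integer> part : partition(numbers.spliterator(), 4)) {
            System.out.println(drain(part));
        }
    }
}
```

Each resulting part could then be handed to its own task, which is the step the next section performs with two parts.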

Parallel Execution with ExecutorService and Spliterator

This example shows how to achieve true parallel processing using Spliterator splits with an ExecutorService. After splitting a Spliterator, we submit tasks to an ExecutorService. Each task processes one of the splits. This allows different parts of the data to be handled concurrently by different threads.

We create a fixed-size thread pool. Each Spliterator (the original and the one returned by trySplit) is processed by a separate task submitted to the pool. The Thread.currentThread().getName() call helps identify which thread processes each element. The try-with-resources statement works here because ExecutorService implements AutoCloseable since Java 19.

Main.java
package com.zetcode;

import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {

    public static void main(String[] args) throws InterruptedException {

        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
        Spliterator<Integer> spliterator1 = numbers.spliterator();
        Spliterator<Integer> spliterator2 = spliterator1.trySplit(); // s1 is now roughly the second half

        try (ExecutorService executor = Executors.newFixedThreadPool(2)) {

            System.out.println("Submitting tasks for parallel processing...");

            // Task for the split-off part (spliterator2 holds the first half)
            if (spliterator2 != null) {
                executor.submit(() -> {
                    System.out.println("Processing first half (split-off part) by " + Thread.currentThread().getName());
                    spliterator2.forEachRemaining(num -> {
                        System.out.println("S2: " + num + " by " + Thread.currentThread().getName());
                        // Restore the interrupt flag instead of swallowing the exception
                        try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    });
                });
            }

            // Task for the remaining part of the original spliterator (spliterator1 holds the second half)
            executor.submit(() -> {
                System.out.println("Processing second half (original part) by " + Thread.currentThread().getName());
                spliterator1.forEachRemaining(num -> {
                    System.out.println("S1: " + num + " by " + Thread.currentThread().getName());
                    try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                });
            });

            executor.shutdown();
            if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
            System.out.println("All tasks completed.");
        }
    }
}

The output will show elements being processed by different threads from the pool (e.g., "pool-1-thread-1", "pool-1-thread-2"). This confirms that the processing of the two data partitions occurs in parallel. This manual approach gives fine-grained control over parallel execution.

Leveraging Parallel Streams with Spliterator

Java Streams provide a high-level API for processing sequences of elements. Parallel streams use Spliterators internally to divide work among multiple threads. This example shows two ways to get parallel streams: collection.parallelStream and StreamSupport.stream(spliterator, true).

The parallelStream method on a collection directly returns a parallel stream. Alternatively, StreamSupport.stream can create a stream from an existing Spliterator. Setting its second argument to true makes the resulting stream parallel. The Java Fork/Join framework manages the parallelism under the hood.

Main.java
package com.zetcode;

import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class Main {

    public static void main(String[] args) {

        List<String> words = List.of("apple", "banana", "cherry", "date", 
                                     "elderberry", "fig", "grape", "honeydew");

        System.out.println("Processing with collection.parallelStream():");
        words.parallelStream().forEach(word -> 
            System.out.println(word + " processed by " + Thread.currentThread().getName())
        );

        System.out.println("\nProcessing with StreamSupport.stream(spliterator, true):");
        Spliterator<String> spliterator = words.spliterator();
        Stream<String> parallelStreamFromSpliterator = StreamSupport.stream(spliterator, true);
        
        parallelStreamFromSpliterator.forEach(word -> 
            System.out.println(word + " processed by " + Thread.currentThread().getName())
        );

        System.out.println("Finished processing with parallel streams.");
    }
}

The output shows elements being handled by several threads, typically the main thread and workers of the common Fork/Join pool (with names such as ForkJoinPool.commonPool-worker-1). The Streams API takes care of splitting and scheduling, so no manual thread management is required. Because forEach on a parallel stream does not respect encounter order, the words may be printed in a different order on each run. With only eight elements the example is purely illustrative; the benefit of parallel streams shows up with larger datasets or more expensive per-element work.
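To give a feel for the divide-and-conquer idea behind parallel streams, here is a simplified sketch that combines trySplit with a RecursiveTask. It is not how the Streams library is actually implemented; the class ForkJoinSumDemo, the nested SumTask, and the THRESHOLD value are assumptions made for this illustration.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ForkJoinSumDemo {

    // Hypothetical cut-off: parts of this size or smaller are summed directly.
    static final long THRESHOLD = 1_000;

    static class SumTask extends RecursiveTask<Long> {

        private final Spliterator<Integer> spliterator;

        SumTask(Spliterator<Integer> spliterator) {
            this.spliterator = spliterator;
        }

        @Override
        protected Long compute() {
            // Split only while the part is still larger than the threshold
            Spliterator<Integer> prefix =
                    spliterator.estimateSize() > THRESHOLD ? spliterator.trySplit() : null;

            if (prefix == null) {
                long[] sum = {0};
                spliterator.forEachRemaining(n -> sum[0] += n);
                return sum[0];
            }

            SumTask left = new SumTask(prefix);
            left.fork();                                      // prefix runs asynchronously
            long right = new SumTask(spliterator).compute();  // remainder runs in this thread
            return left.join() + right;
        }
    }

    static long parallelSum(List<Integer> numbers) {
        return ForkJoinPool.commonPool().invoke(new SumTask(numbers.spliterator()));
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 10_000)
                .boxed()
                .collect(Collectors.toList());

        System.out.println("Sum: " + parallelSum(numbers));
    }
}
```

The threshold keeps the tasks from becoming too small; below it, the cost of forking would likely outweigh the work done per task.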

Parallel Processing of Primitive Data Types

Java provides specialized Spliterators for primitive types like int, long, and double (e.g., Spliterator.OfInt). These avoid the overhead of boxing/unboxing primitives to their wrapper classes. This example uses Spliterator.OfInt obtained from an int array.

We first demonstrate creating a parallel IntStream using StreamSupport.intStream(spliterator, true) to calculate a sum. Then, we show manual parallel processing by splitting the Spliterator.OfInt and using an ExecutorService. Each task uses forEachRemaining(IntConsumer) for efficient primitive processing.

Main.java
package com.zetcode;

import java.util.Arrays;
import java.util.Spliterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

public class Main {

    public static void main(String[] args) throws InterruptedException {

        int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20};

        // Method 1: Using parallel IntStream from Spliterator.OfInt
        Spliterator.OfInt spliteratorForStream = Arrays.spliterator(numbers);
        IntStream parallelIntStream = StreamSupport.intStream(spliteratorForStream, true);
        long sum1 = parallelIntStream.sum();
        System.out.println("Sum using parallel IntStream: " + sum1);

        // Method 2: Manual parallel processing with ExecutorService
        System.out.println("\nManual parallel processing of primitive array:");
        Spliterator.OfInt s1 = Arrays.spliterator(numbers);
        Spliterator.OfInt s2 = s1.trySplit();

        LongAdder partialSum1 = new LongAdder();
        LongAdder partialSum2 = new LongAdder();

        try (ExecutorService executor = Executors.newFixedThreadPool(2)) {
            if (s2 != null) {
                executor.submit(() -> {
                    s2.forEachRemaining((int val) -> {
                        partialSum2.add(val);
                    });
                });
            }
            executor.submit(() -> {
                s1.forEachRemaining((int val) -> {
                    partialSum1.add(val);
                });
            });

            executor.shutdown();
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        }

        long totalSum = partialSum1.sum() + partialSum2.sum();
        System.out.println("Sum using manual parallel processing: " + totalSum);
    }
}

This example showcases two approaches for parallel processing of primitive arrays. Using parallel streams is often more concise. Manual control with ExecutorService offers flexibility, especially when integrating with existing threading models or when fine-grained task management is required. Both methods leverage the splitting capabilities of Spliterator.OfInt.
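Since each task in the manual approach works on its own part, a plain per-part accumulator is also an option. The sketch below sums both parts of a split Spliterator.OfInt with the primitive IntConsumer overload of forEachRemaining; it runs in a single thread to keep the focus on the primitive API. The class PrimitiveSplitDemo and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Spliterator;

class PrimitiveSplitDemo {

    // Sums one part through the IntConsumer overload, so no Integer objects are created.
    static long sum(Spliterator.OfInt part) {
        long[] total = {0};
        part.forEachRemaining((int value) -> total[0] += value);
        return total[0];
    }

    // Returns {sum of the split-off part, sum of the remaining part}.
    static long[] splitSums(int[] numbers) {
        Spliterator.OfInt rest = Arrays.spliterator(numbers);
        Spliterator.OfInt prefix = rest.trySplit(); // trySplit keeps the primitive type

        long prefixSum = prefix != null ? sum(prefix) : 0;
        long restSum = sum(rest);
        return new long[] {prefixSum, restSum};
    }

    public static void main(String[] args) {
        long[] sums = splitSums(new int[] {1, 2, 3, 4, 5, 6, 7, 8});

        System.out.println("Split-off part: " + sums[0]);
        System.out.println("Remaining part: " + sums[1]);
        System.out.println("Total: " + (sums[0] + sums[1]));
    }
}
```

The explicit (int value) parameter type selects the IntConsumer overload rather than the boxed Consumer variant.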

Parallel Summation of Numbers using Spliterator

This example demonstrates summing a list of numbers in parallel. We obtain a Spliterator from a List<Integer>, split it, and then use an ExecutorService to calculate partial sums concurrently. Each task is a Callable that returns its partial sum. These partial sums are then collected using Future objects and combined to get the total sum.

This pattern is common for "divide and conquer" parallel algorithms. The work of summation is divided between threads, and results are aggregated. It highlights how Spliterator facilitates breaking down a computation for parallel execution.

Main.java
package com.zetcode;

import java.util.List;
import java.util.Spliterator;

import java.util.concurrent.*;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class Main {

    public static void main(String[] args) throws Exception {
        List<Integer> numbers = IntStream.rangeClosed(1, 10000)
                .boxed()
                .collect(Collectors.toList());

        Spliterator<Integer> spliterator1 = numbers.spliterator();
        Spliterator<Integer> spliterator2 = spliterator1.trySplit(); // s1 is now the second half

        try (ExecutorService executor = Executors.newFixedThreadPool(2)) {

            // Task for the first part (which is spliterator2)
            Callable<Long> sumTask1 = () -> {
                long sum = 0;
                if (spliterator2 != null) {
                    SummingConsumer consumer = new SummingConsumer();
                    spliterator2.forEachRemaining(consumer);
                    sum = consumer.getTotal();
                }
                System.out.println("Sum from task 1 (thread " + Thread.currentThread().getName() + "): " + sum);
                return sum;
            };

            // Task for the second part (remaining of spliterator1)
            Callable<Long> sumTask2 = () -> {
                SummingConsumer consumer = new SummingConsumer();
                spliterator1.forEachRemaining(consumer);
                long sum = consumer.getTotal();
                System.out.println("Sum from task 2 (thread " + Thread.currentThread().getName() + "): " + sum);
                return sum;
            };

            Future<Long> future1 = executor.submit(sumTask1);
            Future<Long> future2 = executor.submit(sumTask2);

            long totalSum = future1.get() + future2.get();
            System.out.println("Total sum calculated in parallel: " + totalSum);

            long expectedSum = numbers.stream().mapToLong(Integer::longValue).sum();
            System.out.println("Expected sum (sequential stream): " + expectedSum);

            executor.shutdown();
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        }
    }

    // Helper class for summing, as lambda variable for sum must be effectively final
    static class SummingConsumer implements Consumer<Integer> {
        private long total = 0;
        @Override
        public void accept(Integer value) {
            total += value;
        }
        public long getTotal() {
            return total;
        }
    }
}

The split is performed by trySplit, which divides the original Spliterator into two parts. For this list, spliterator2 covers the first half of the numbers and spliterator1 retains the second half, so each part can be processed independently.

To perform the parallel summation, an ExecutorService with a fixed pool of two threads is used, one thread per split. The tasks are submitted as Callable<Long> instances, and each returned Future<Long> delivers a partial sum once its task finishes.

A helper class, SummingConsumer, accumulates the sum while forEachRemaining traverses the elements. A helper is needed because local variables captured by a lambda expression must be effectively final, so a lambda cannot update a local long variable directly.

Once both tasks complete, their results are added to obtain the total. The result is then checked against a sequential sum computed with a standard stream, which confirms that the parallel computation is correct.
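A dedicated consumer class is one way around the effectively final rule; two other common options are sketched below. The class AccumulatorDemo and its method names are illustrative. The first method uses a one-element array as a mutable holder, the second wraps the Spliterator in a sequential stream.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.stream.StreamSupport;

class AccumulatorDemo {

    // The array reference is effectively final; only its single element changes.
    static long sumWithHolder(Spliterator<Integer> part) {
        long[] total = {0};
        part.forEachRemaining(n -> total[0] += n);
        return total[0];
    }

    // Lets a sequential stream over the same part do the accumulation.
    static long sumWithStream(Spliterator<Integer> part) {
        return StreamSupport.stream(part, false)
                .mapToLong(Integer::longValue)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        System.out.println("Holder: " + sumWithHolder(numbers.spliterator()));
        System.out.println("Stream: " + sumWithStream(numbers.spliterator()));
    }
}
```

Both variants are safe only because each part is traversed by a single thread; neither holder is shared between tasks.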

Parallel Data Transformation using Spliterator

This example demonstrates parallel processing using Spliterator and ExecutorService. The program starts with a predefined list of words and splits the workload into two separate tasks using trySplit. Each task transforms its assigned words to uppercase and appends thread identification to illustrate concurrency in action.

This is useful for CPU-bound transformation tasks on large datasets. Splitting the data allows multiple cores to work on the transformation simultaneously, potentially speeding up the overall process significantly.

Main.java
package com.zetcode;

import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class Main {

    public static void main(String[] args) throws Exception {
        List<String> words = List.of("alpha", "bravo", "charlie", "delta", "echo",
                "foxtrot", "golf", "hotel", "india", "juliett");

        Spliterator<String> s1 = words.spliterator();
        Spliterator<String> s2 = s1.trySplit(); // s1 is now the second half

        try (ExecutorService executor = Executors.newFixedThreadPool(2)) {

            Callable<List<String>> transformTask1 = () -> {
                List<String> result = new ArrayList<>();
                if (s2 != null) { // s2 is the first half
                    s2.forEachRemaining(word -> {
                        result.add(word.toUpperCase() + " (processed by " + Thread.currentThread().getName() + ")");
                    });
                }
                return result;
            };

            Callable<List<String>> transformTask2 = () -> {
                List<String> result = new ArrayList<>();
                s1.forEachRemaining(word -> { // s1 is the second half
                    result.add(word.toUpperCase() + " (processed by " + Thread.currentThread().getName() + ")");
                });
                return result;
            };

            Future<List<String>> future1 = executor.submit(transformTask1);
            Future<List<String>> future2 = executor.submit(transformTask2);

            List<String> combinedResult = new ArrayList<>();
            combinedResult.addAll(future1.get());
            combinedResult.addAll(future2.get());

            System.out.println("Transformed words in parallel:");
            combinedResult.forEach(System.out::println);

            executor.shutdown();
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        }

        System.out.println("\nSequentially transformed for order reference:");
        words.stream().map(String::toUpperCase).forEach(System.out::println);
    }
}

The split divides the original Spliterator into two segments. The first half (s2) and the remaining portion (s1) are then processed independently by separate threads, so both halves can be transformed at the same time.

To facilitate true parallel execution, a fixed thread pool (ExecutorService) is employed, ensuring that each split is processed concurrently. The tasks are defined as Callable<List<String>>, allowing asynchronous execution while returning transformed results upon completion.

Each task uses forEachRemaining to process elements within its split, ensuring efficient traversal of words without explicit iteration. Once both tasks complete execution, their results are merged into a combined list for final output.

Note that the merged result keeps the original order in this example: each task fills its own list in encounter order, and the main thread appends the result of the first half (future1) before that of the second half (future2). The order would no longer be guaranteed if both tasks wrote into one shared list while running. For reference, a sequential transformation using a standard stream is printed after the parallel output.

Finally, the ExecutorService is shut down. If the tasks have not finished within one minute, shutdownNow attempts to cancel them.
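One way to keep the order under control is to collect the Future objects in the order of the parts and read them back in that same order. The sketch below uses invokeAll, which returns its futures in the order of the submitted task list. The class OrderedMergeDemo and its methods are hypothetical names; the try/finally form is used so that the sketch also compiles on Java versions before 19.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedMergeDemo {

    static List<String> upperCaseInOrder(List<String> words) {
        Spliterator<String> rest = words.spliterator();
        Spliterator<String> prefix = rest.trySplit(); // leading elements, or null

        // Tasks are listed in encounter order: prefix first, remainder second
        List<Callable<List<String>>> tasks = new ArrayList<>();
        if (prefix != null) {
            tasks.add(() -> transform(prefix));
        }
        tasks.add(() -> transform(rest));

        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<String> merged = new ArrayList<>();
            // invokeAll returns the futures in the same order as the task list
            for (Future<List<String>> future : executor.invokeAll(tasks)) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A transformation task failed", e);
        } finally {
            executor.shutdown();
        }
    }

    // Each task writes only to its own list, so no synchronization is needed.
    private static List<String> transform(Spliterator<String> part) {
        List<String> result = new ArrayList<>();
        part.forEachRemaining(word -> result.add(word.toUpperCase()));
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("alpha", "bravo", "charlie", "delta", "echo");
        System.out.println(upperCaseInOrder(words));
    }
}
```

The tasks may finish in any order; the merge order depends only on the order in which the futures are read.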

Parallel Data Aggregation (e.g., Counting) using Spliterator

This example demonstrates parallel data aggregation, specifically counting elements that match a certain criterion. We use a list of strings (simulating lines from a file). The goal is to count how many strings contain a specific keyword. The Spliterator is split, and each part is processed by a task in an ExecutorService. Each task counts matching strings in its segment, and the main thread sums these partial counts.

This approach is effective for operations like filtering and counting on large datasets. By dividing the dataset and processing parts concurrently, we can often achieve better performance than a purely sequential approach.

Main.java
package com.zetcode;

import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Main {

    public static void main(String[] args) throws Exception {
        List<String> lines = List.of(
                "The quick brown fox",
                "jumps over the lazy dog",
                "A B C D E F G",
                "Another line with fox here",
                "Spliterators are useful for parallel fox processing",
                "The lazy fox is quick"
        );
        String keyword = "fox";

        Spliterator<String> s1 = lines.spliterator();
        Spliterator<String> s2 = s1.trySplit(); // s1 is now the second half

        try (ExecutorService executor = Executors.newFixedThreadPool(2)) {

            Callable<Integer> countTask1 = () -> {
                AtomicInteger count = new AtomicInteger(0);
                if (s2 != null) { // s2 is the first half
                    s2.forEachRemaining(line -> {
                        if (line.contains(keyword)) {
                            count.incrementAndGet();
                        }
                    });
                }
                System.out.println("Count from task 1 (thread " + Thread.currentThread().getName() + "): " + count.get());
                return count.get();
            };

            Callable<Integer> countTask2 = () -> {
                AtomicInteger count = new AtomicInteger(0);
                s1.forEachRemaining(line -> { // s1 is the second half
                    if (line.contains(keyword)) {
                        count.incrementAndGet();
                    }
                });
                System.out.println("Count from task 2 (thread " + Thread.currentThread().getName() + "): " + count.get());
                return count.get();
            };

            Future<Integer> future1 = executor.submit(countTask1);
            Future<Integer> future2 = executor.submit(countTask2);

            int totalCount = future1.get() + future2.get();
            System.out.println("Total lines containing '" + keyword + "': " + totalCount);

            long expectedCount = lines.stream().filter(line -> line.contains(keyword)).count();
            System.out.println("Expected count (sequential stream): " + expectedCount);

            executor.shutdown();
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        }
    }
}

In this example, a list of strings is processed in parallel to count the lines that contain the word "fox". Each task creates its own AtomicInteger and uses it as a mutable counter, because a lambda expression cannot modify a captured local variable. The counter is never shared between threads, so its atomicity is not actually required here; any mutable holder, such as a one-element int array, would work as well.

Once both tasks complete, their partial results are retrieved via Future objects and summed to obtain the total number of matching lines. The result is then compared against a sequential stream-based count to confirm that it is correct.
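For comparison, the same count can be expressed with a parallel stream built from the Spliterator, which leaves splitting and combining of the partial counts to the Streams API. This is a minimal sketch; the class ParallelCountDemo and the method countMatches are hypothetical names.

```java
import java.util.List;
import java.util.stream.StreamSupport;

class ParallelCountDemo {

    // Counts the lines containing the keyword using a parallel stream over the spliterator.
    static long countMatches(List<String> lines, String keyword) {
        return StreamSupport.stream(lines.spliterator(), true)
                .filter(line -> line.contains(keyword))
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "The quick brown fox",
                "jumps over the lazy dog",
                "Another line with fox here"
        );

        System.out.println("Lines containing 'fox': " + countMatches(lines, "fox"));
    }
}
```

The manual version remains useful when the tasks must run on a specific executor or when the number of parts needs to be controlled explicitly.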

Source

Java Spliterator Documentation

This tutorial covered the Java Spliterator interface with a focus on its use in parallel processing. We explored basic traversal, splitting, characteristics, and integration with parallel streams and ExecutorService. Understanding Spliterators is vital for writing efficient, scalable Java code that can leverage multi-core processors for data-intensive tasks.

Author

My name is Jan Bodnar, and I am a dedicated programmer with many years of experience in the field. I began writing programming articles in 2007 and have since authored over 1,400 articles and eight e-books. With more than eight years of teaching experience, I am committed to sharing my knowledge and helping others master programming concepts.
