Java Spliterator in Parallel
Last modified: May 8, 2025
The java.util.Spliterator is an iterator-like interface introduced in Java 8 for traversing and partitioning the elements of a data source. Its key capability is support for parallel processing. A large dataset can be split into smaller segments that multiple threads then process concurrently. Unlike a traditional Iterator, which only supports sequential iteration, a Spliterator can divide its data on demand. This makes it a foundational component of Java's parallel Streams API.
A Spliterator also reports metadata about its data source as a set of characteristics. The characteristics() method returns them as an int bit mask, and hasCharacteristics tests for an individual flag. Clients such as the Streams framework use these flags to decide how to split and process the data. For instance, SIZED and SUBSIZED tell the framework that every split reports an exact element count. The defined characteristics are:
ORDERED: Elements have a defined encounter order.
DISTINCT: No two encountered elements are equal according to equals.
SORTED: The encounter order follows a defined sort order; getComparator returns the Comparator, or null for natural ordering.
SIZED: Before traversal or splitting, estimateSize returns the exact number of elements, assuming the source is not structurally modified.
NONNULL: Encountered elements are guaranteed not to be null.
IMMUTABLE: The element source cannot be structurally modified.
CONCURRENT: The element source can be safely modified concurrently by multiple threads without external synchronization.
SUBSIZED: Every Spliterator produced by trySplit, directly or indirectly, is both SIZED and SUBSIZED.
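Individual flags can be checked with hasCharacteristics. The following minimal sketch prints the flags reported by a few standard JDK sources. The CharacteristicsDemo class and its describe helper are illustrative names, not JDK API. The flags for ArrayList, HashSet, TreeSet, and Arrays.spliterator are listed in the Javadoc of each spliterator() method. Other sources, such as the lists returned by List.of, may report different sets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Spliterator;
import java.util.TreeSet;

class CharacteristicsDemo {

    // Maps each characteristic flag to a readable name (insertion order kept for printing)
    private static final Map<Integer, String> NAMES = new LinkedHashMap<>();

    static {
        NAMES.put(Spliterator.ORDERED, "ORDERED");
        NAMES.put(Spliterator.DISTINCT, "DISTINCT");
        NAMES.put(Spliterator.SORTED, "SORTED");
        NAMES.put(Spliterator.SIZED, "SIZED");
        NAMES.put(Spliterator.NONNULL, "NONNULL");
        NAMES.put(Spliterator.IMMUTABLE, "IMMUTABLE");
        NAMES.put(Spliterator.CONCURRENT, "CONCURRENT");
        NAMES.put(Spliterator.SUBSIZED, "SUBSIZED");
    }

    // Illustrative helper: returns the names of all flags set on the given spliterator
    static List<String> describe(Spliterator<?> spliterator) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : NAMES.entrySet()) {
            if (spliterator.hasCharacteristics(entry.getKey())) {
                result.add(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(3, 1, 2);
        System.out.println("ArrayList: " + describe(new ArrayList<>(data).spliterator()));
        System.out.println("HashSet:   " + describe(new HashSet<>(data).spliterator()));
        System.out.println("TreeSet:   " + describe(new TreeSet<>(data).spliterator()));
        System.out.println("int[]:     " + describe(Arrays.spliterator(new int[] {3, 1, 2})));
    }
}
```

The NAMES map only turns the bit mask into readable text. Testing the raw value of characteristics() with a bitwise AND works just as well.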
The trySplit method hands part of the remaining elements to a new Spliterator. Together with that method, these characteristics let the Java runtime make informed decisions about partitioning work. Splitting is what allows a workload to be divided across multiple threads. How well performance then scales depends on the data size, the cost of processing each element, and the number of available cores.
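Repeated calls to trySplit turn one Spliterator into a set of disjoint pieces. The sequential sketch below uses a hypothetical partition helper and an arbitrary size threshold. It keeps splitting until each piece holds at most threshold elements or refuses to split, and it collects the pieces in encounter order. No threads are involved yet.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PartitionDemo {

    // Hypothetical helper: recursively split until every part has at most `threshold`
    // elements (or cannot be split further); returns the leaf spliterators in encounter order.
    static <T> List<Spliterator<T>> partition(Spliterator<T> root, long threshold) {
        List<Spliterator<T>> leaves = new ArrayList<>();
        Deque<Spliterator<T>> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Spliterator<T> current = pending.pop();
            Spliterator<T> prefix = current.estimateSize() > threshold ? current.trySplit() : null;
            if (prefix == null) {
                leaves.add(current);    // small enough, or not splittable
            } else {
                pending.push(current);  // remainder, examined after the prefix
                pending.push(prefix);   // prefix, examined first
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());
        for (Spliterator<Integer> part : partition(numbers.spliterator(), 16)) {
            System.out.println("part of size " + part.estimateSize());
        }
    }
}
```

Examining each prefix before its remainder is what keeps the pieces in encounter order. A parallel framework would instead hand some of the pieces to other threads.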
Parallel streams use Spliterator internally. They split the source with trySplit and run the resulting subtasks on the Fork/Join framework, by default in the common ForkJoinPool. Developers can use parallel streams for automatic concurrency. Alternatively, they can split Spliterators manually and hand the parts to their own threads (for example, through an ExecutorService) for fine-grained control over execution. Understanding how Spliterator supports parallel processing is crucial for building scalable, high-performance applications.
Basic Spliterator Traversal (Sequential)
This example demonstrates basic sequential traversal using a Spliterator.
We obtain a Spliterator from a List of strings. Then, we use the tryAdvance method in a loop to process each element. tryAdvance takes a Consumer that specifies the action to perform on the element. It returns true if an element was processed and false when no more elements are available.
This example showcases the fundamental iteration capability of a Spliterator. It prints the characteristics (as an int bit mask) and the estimated size before iterating. The iteration itself is sequential, processing one element at a time in the main thread.
package com.zetcode;

import java.util.List;
import java.util.Spliterator;

public class Main {

    public static void main(String[] args) {

        List<String> names = List.of("John", "Jane", "Mike", "Sarah", "Tom");
        Spliterator<String> spliterator = names.spliterator();

        System.out.println("Characteristics: " + spliterator.characteristics());
        System.out.println("Estimated size: " + spliterator.estimateSize());

        System.out.println("Elements (sequentially):");
        boolean hasNextElement;
        do {
            hasNextElement = spliterator.tryAdvance(name -> {
                System.out.println("Processing: " + name + " by "
                        + Thread.currentThread().getName());
                // Simulate some work
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        } while (hasNextElement); // Continue until no more elements remain

        System.out.println("Finished processing.");
    }
}
The output shows the characteristics and initial estimated size. Each element is processed by the main thread. This forms the basis for understanding how Spliterators work before diving into parallel processing.
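The tryAdvance contract also shows that a Spliterator is single-use. Once tryAdvance returns false, the Spliterator is exhausted, and a second traversal finds nothing. Here is a small sketch using an illustrative countRemaining helper:

```java
import java.util.List;
import java.util.Spliterator;

class SingleUseDemo {

    // Illustrative helper: counts the elements left in a spliterator using only tryAdvance
    static <T> int countRemaining(Spliterator<T> spliterator) {
        int count = 0;
        // tryAdvance returns true after passing one element to the action, false once none remain
        while (spliterator.tryAdvance(element -> { })) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Spliterator<String> spliterator =
                List.of("John", "Jane", "Mike", "Sarah", "Tom").spliterator();
        System.out.println("First pass:  " + countRemaining(spliterator)); // 5
        System.out.println("Second pass: " + countRemaining(spliterator)); // 0, already exhausted
    }
}
```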
Understanding trySplit
The trySplit method is fundamental to a Spliterator's role in parallel processing. It attempts to partition the source elements into two. If successful, it returns a new Spliterator covering a portion of the elements, while the original Spliterator covers the remainder. For an ORDERED source, the returned Spliterator covers a strict prefix of the elements. If the elements cannot be split, for example because too few remain, trySplit returns null. This example demonstrates splitting but still processes both parts sequentially.
It is crucial to understand that calling trySplit by itself does not initiate parallel execution. It merely divides the data. The resulting Spliterators must be processed by separate threads to achieve parallelism, as shown in later examples.
package com.zetcode;

import java.util.List;
import java.util.Spliterator;

public class Main {

    public static void main(String[] args) {

        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Spliterator<Integer> spliterator1 = numbers.spliterator();

        System.out.println("Original spliterator size: " + spliterator1.estimateSize());

        // Attempt to split the spliterator: spliterator2 takes the leading part,
        // spliterator1 keeps the remainder
        Spliterator<Integer> spliterator2 = spliterator1.trySplit();

        if (spliterator2 != null) {
            System.out.println("Split-off prefix size (spliterator2): " + spliterator2.estimateSize());
            System.out.println("Remainder size (spliterator1): " + spliterator1.estimateSize());

            System.out.println("\nProcessing prefix (sequentially):");
            spliterator2.forEachRemaining(num ->
                    System.out.println(num + " by " + Thread.currentThread().getName()));

            System.out.println("\nProcessing remainder (sequentially):");
            spliterator1.forEachRemaining(num ->
                    System.out.println(num + " by " + Thread.currentThread().getName()));
        } else {
            System.out.println("Could not split the spliterator. Processing all elements:");
            spliterator1.forEachRemaining(System.out::println);
        }
    }
}
The output demonstrates that the original list is divided. However, both parts are processed sequentially by the main thread. This example clarifies that trySplit is a mechanism for partitioning, not for parallel execution.
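For an ORDERED source, the piece returned by trySplit is a strict prefix, and trySplit returns null once a piece is too small to divide. The following sketch checks both on an ArrayList. splitOnce is a hypothetical helper. The exact split point (the midpoint, for ArrayList) is an implementation detail rather than a guarantee of the Spliterator contract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

class SplitOrderDemo {

    // Hypothetical helper: splits once and returns [prefix elements, remaining elements]
    static <T> List<List<T>> splitOnce(List<T> source) {
        Spliterator<T> remainder = source.spliterator();
        Spliterator<T> prefix = remainder.trySplit(); // may be null
        List<T> first = new ArrayList<>();
        List<T> second = new ArrayList<>();
        if (prefix != null) {
            prefix.forEachRemaining(first::add);
        }
        remainder.forEachRemaining(second::add);
        return List.of(first, second);
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
        List<List<Integer>> parts = splitOnce(numbers);
        System.out.println("prefix:    " + parts.get(0));
        System.out.println("remainder: " + parts.get(1));

        // A single-element spliterator cannot be split any further
        Spliterator<Integer> single = new ArrayList<>(List.of(42)).spliterator();
        System.out.println("split of one element: " + single.trySplit()); // null
    }
}
```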
Parallel Execution with ExecutorService and Spliterator
This example shows how to achieve true parallel processing by combining Spliterator splits with an ExecutorService. After splitting a Spliterator, we submit tasks to an ExecutorService. Each task processes one of the splits. This allows different parts of the data to be handled concurrently by different threads.
We create a fixed-size thread pool. Each Spliterator (the original and the one returned by trySplit) is processed by a separate task submitted to the pool. The Thread.currentThread().getName() call helps identify which thread processes each element.
package com.zetcode;

import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {

    public static void main(String[] args) throws InterruptedException {

        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                11, 12, 13, 14, 15, 16);

        Spliterator<Integer> spliterator1 = numbers.spliterator();
        // spliterator2 takes roughly the first half; spliterator1 keeps the second half
        Spliterator<Integer> spliterator2 = spliterator1.trySplit();

        // ExecutorService is AutoCloseable since Java 19
        try (ExecutorService executor = Executors.newFixedThreadPool(2)) {

            System.out.println("Submitting tasks for parallel processing...");

            // Task for the split-off prefix (spliterator2)
            if (spliterator2 != null) {
                executor.submit(() -> {
                    System.out.println("Processing first half (split part) by "
                            + Thread.currentThread().getName());
                    spliterator2.forEachRemaining(num -> {
                        System.out.println("S2: " + num + " by "
                                + Thread.currentThread().getName());
                        try {
                            Thread.sleep(100);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt(); // preserve interrupt status
                        }
                    });
                });
            }

            // Task for the remaining part of the original spliterator (spliterator1)
            executor.submit(() -> {
                System.out.println("Processing second half (original part) by "
                        + Thread.currentThread().getName());
                spliterator1.forEachRemaining(num -> {
                    System.out.println("S1: " + num + " by "
                            + Thread.currentThread().getName());
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                    }
                });
            });

            executor.shutdown();
            if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
            System.out.println("All tasks completed.");
        }
    }
}
The output will show elements being processed by different threads from the pool (e.g., "pool-1-thread-1", "pool-1-thread-2"). This confirms that the processing of the two data partitions occurs in parallel. This manual approach gives fine-grained control over parallel execution.
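Because the splits are disjoint, each element should be processed by exactly one task. The sketch below checks that property with four parts instead of two. It splits each half once more and records, in a ConcurrentHashMap, which pool thread handled each element. The processAll helper, the four-way split, and the pool size are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Spliterator;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class DisjointSplitsDemo {

    // Hypothetical helper: splits `data` into up to four parts, runs one task per part,
    // and returns element -> name of the thread that processed it.
    // Any element seen twice increments `duplicates`.
    static Map<Integer, String> processAll(List<Integer> data, AtomicInteger duplicates) {
        Spliterator<Integer> second = data.spliterator();
        Spliterator<Integer> first = second.trySplit(); // may be null for tiny inputs

        List<Spliterator<Integer>> parts = new ArrayList<>();
        for (Spliterator<Integer> half : Arrays.asList(first, second)) {
            if (half == null) {
                continue;
            }
            Spliterator<Integer> quarter = half.trySplit();
            if (quarter != null) {
                parts.add(quarter);
            }
            parts.add(half);
        }

        Map<Integer, String> seenBy = new ConcurrentHashMap<>();
        ExecutorService executor = Executors.newFixedThreadPool(parts.size());
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Spliterator<Integer> part : parts) {
                futures.add(executor.submit(() -> part.forEachRemaining(n -> {
                    // putIfAbsent returns null only the first time an element is recorded
                    if (seenBy.putIfAbsent(n, Thread.currentThread().getName()) != null) {
                        duplicates.incrementAndGet();
                    }
                })));
            }
            for (Future<?> future : futures) {
                future.get(); // wait, and surface any exception thrown by the task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
        return seenBy;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            numbers.add(i);
        }
        AtomicInteger duplicates = new AtomicInteger();
        Map<Integer, String> seenBy = processAll(numbers, duplicates);
        System.out.println("Elements processed: " + seenBy.size() + ", duplicates: " + duplicates.get());
        System.out.println("Threads used: " + new TreeSet<>(seenBy.values()));
    }
}
```

The thread names may differ between runs. The element count (16) and the duplicate count (0) should not.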
Leveraging Parallel Streams with Spliterator
Java Streams provide a high-level API for processing sequences of elements. Parallel streams use Spliterators internally to divide work among multiple threads. This example shows two ways to get parallel streams: collection.parallelStream() and StreamSupport.stream(spliterator, true).
The parallelStream method on a collection directly returns a parallel stream. Alternatively, StreamSupport.stream can create a stream from an existing Spliterator. Setting its second argument to true makes the resulting stream parallel. The Java Fork/Join framework manages the parallelism under the hood.
package com.zetcode;

import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class Main {

    public static void main(String[] args) {

        List<String> words = List.of("apple", "banana", "cherry", "date",
                "elderberry", "fig", "grape", "honeydew");

        System.out.println("Processing with collection.parallelStream():");
        words.parallelStream().forEach(word ->
                System.out.println(word + " processed by " + Thread.currentThread().getName())
        );

        System.out.println("\nProcessing with StreamSupport.stream(spliterator, true):");
        Spliterator<String> spliterator = words.spliterator();
        Stream<String> parallelStreamFromSpliterator = StreamSupport.stream(spliterator, true);
        parallelStreamFromSpliterator.forEach(word ->
                System.out.println(word + " processed by " + Thread.currentThread().getName())
        );

        System.out.println("Finished processing with parallel streams.");
    }
}
The output shows elements handled by several threads, typically the main thread together with ForkJoinPool.commonPool worker threads. The elements appear in no fixed order, since forEach on a parallel stream does not follow encounter order (forEachOrdered does). The Streams API performs the splitting and scheduling through the Fork/Join framework, so no manual thread management is required. This example illustrates how Spliterator's data partitioning lets parallel streams process elements independently. For an input as small as this one, the coordination overhead is likely to outweigh any speedup. The benefit appears only with larger or more expensive workloads.
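StreamSupport also accepts Spliterators you write yourself. The minimal sketch below uses a hypothetical RangeSpliterator over the half-open int range [from, to). Its trySplit hands the lower half to a new Spliterator and keeps the upper half. It reports SIZED and SUBSIZED because both halves know their exact sizes. That is enough for a parallel IntStream to divide the work.

```java
import java.util.Spliterator;
import java.util.function.IntConsumer;
import java.util.stream.StreamSupport;

// Hypothetical example class: a Spliterator.OfInt over the half-open range [from, to)
class RangeSpliterator implements Spliterator.OfInt {
    private int current;   // next value to emit
    private final int to;  // exclusive upper bound

    RangeSpliterator(int from, int to) {
        this.current = from;
        this.to = to;
    }

    @Override
    public boolean tryAdvance(IntConsumer action) {
        if (current < to) {
            action.accept(current++);
            return true;
        }
        return false;
    }

    @Override
    public Spliterator.OfInt trySplit() {
        long remaining = (long) to - current;
        if (remaining < 2) {
            return null; // too small to be worth splitting
        }
        int mid = (int) (current + remaining / 2);
        Spliterator.OfInt prefix = new RangeSpliterator(current, mid); // lower half
        current = mid;                                                 // keep upper half
        return prefix;
    }

    @Override
    public long estimateSize() {
        return Math.max(0L, (long) to - current);
    }

    @Override
    public int characteristics() {
        // Exact sizes before and after splitting; values are distinct and never null
        return ORDERED | SIZED | SUBSIZED | IMMUTABLE | NONNULL | DISTINCT;
    }
}

class CustomSpliteratorDemo {

    static long parallelSum(int from, int to) {
        // true => parallel stream; the framework calls trySplit as it sees fit
        return StreamSupport.intStream(new RangeSpliterator(from, to), true)
                .asLongStream()
                .sum();
    }

    public static void main(String[] args) {
        System.out.println("Sum of 1..100: " + parallelSum(1, 101));
    }
}
```

Declaring a characteristic is a promise. Reporting SIZED while returning an inexact estimateSize, for example, may lead a stream to produce wrong results.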
Parallel Processing of Primitive Data Types
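Java provides specialized Spliterators for the primitive types int, long, and double: Spliterator.OfInt, Spliterator.OfLong, and Spliterator.OfDouble. These avoid the overhead of boxing primitives into their wrapper classes and unboxing them again. This example uses a Spliterator.OfInt obtained from an int array.
We first demonstrate creating a parallel IntStream using StreamSupport.intStream(spliterator, true) to calculate a sum. Then, we show manual parallel processing by splitting the Spliterator.OfInt and using an ExecutorService. Each task uses forEachRemaining(IntConsumer) for efficient primitive processing. The explicit int parameter type in the lambda, (int val) -> ..., selects the IntConsumer overload. With an implicitly typed lambda, the call would be ambiguous between forEachRemaining(IntConsumer) and forEachRemaining(Consumer<? super Integer>).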
package com.zetcode;

import java.util.Arrays;
import java.util.Spliterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

public class Main {

    public static void main(String[] args) throws InterruptedException {

        int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                11, 12, 13, 14, 15, 16, 17, 18, 19, 20};

        // Method 1: Using parallel IntStream from Spliterator.OfInt
        Spliterator.OfInt spliteratorForStream = Arrays.spliterator(numbers);
        IntStream parallelIntStream = StreamSupport.intStream(spliteratorForStream, true);
        long sum1 = parallelIntStream.sum();
        System.out.println("Sum using parallel IntStream: " + sum1);

        // Method 2: Manual parallel processing with ExecutorService
        System.out.println("\nManual parallel processing of primitive array:");
        Spliterator.OfInt s1 = Arrays.spliterator(numbers);
        Spliterator.OfInt s2 = s1.trySplit();

        LongAdder partialSum1 = new LongAdder();
        LongAdder partialSum2 = new LongAdder();

        try (ExecutorService executor = Executors.newFixedThreadPool(2)) {
            if (s2 != null) {
                executor.submit(() -> {
                    s2.forEachRemaining((int val) -> {
                        partialSum2.add(val);
                    });
                });
            }
            executor.submit(() -> {
                s1.forEachRemaining((int val) -> {
                    partialSum1.add(val);
                });
            });

            executor.shutdown();
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        }

        long totalSum = partialSum1.sum() + partialSum2.sum();
        System.out.println("Sum using manual parallel processing: " + totalSum);
    }
}
This example showcases two approaches for parallel processing of primitive arrays. Using parallel streams is often more concise. Manual control with ExecutorService offers flexibility, especially when integrating with existing threading models or when fine-grained task management is required. Both methods leverage the splitting capabilities of Spliterator.OfInt.
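Parallel streams drive this kind of splitting through the Fork/Join framework. A hand-written equivalent is sketched below under stated assumptions. SpliteratorSumTask is an illustrative RecursiveTask that splits a Spliterator.OfInt until a piece has at most an arbitrary 1,000 elements. It then sums that piece into a long with an IntConsumer, so no boxing occurs.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.IntStream;

// Illustrative task: sums the ints covered by a Spliterator.OfInt using fork/join
class SpliteratorSumTask extends RecursiveTask<Long> {
    private static final long THRESHOLD = 1_000; // arbitrary cut-off for sequential work
    private final Spliterator.OfInt spliterator;

    SpliteratorSumTask(Spliterator.OfInt spliterator) {
        this.spliterator = spliterator;
    }

    @Override
    protected Long compute() {
        if (spliterator.estimateSize() > THRESHOLD) {
            Spliterator.OfInt prefix = spliterator.trySplit();
            if (prefix != null) {
                SpliteratorSumTask left = new SpliteratorSumTask(prefix);
                left.fork();                                                // prefix runs asynchronously
                long right = new SpliteratorSumTask(spliterator).compute(); // this thread takes the rest
                return left.join() + right;
            }
        }
        long[] sum = {0}; // mutable holder, since the lambda cannot reassign a local
        spliterator.forEachRemaining((int value) -> sum[0] += value);
        return sum[0];
    }

    static long parallelSum(int[] values) {
        return ForkJoinPool.commonPool().invoke(new SpliteratorSumTask(Arrays.spliterator(values)));
    }

    public static void main(String[] args) {
        int[] numbers = IntStream.rangeClosed(1, 100_000).toArray();
        System.out.println("Fork/Join sum: " + parallelSum(numbers));
        System.out.println("Stream sum:    " + Arrays.stream(numbers).asLongStream().sum());
    }
}
```

Forking the prefix and computing the remainder directly in the current thread is the usual Fork/Join idiom. It keeps the current worker busy with useful work rather than forking both halves.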
Parallel Summation of Numbers using Spliterator
This example demonstrates summing a list of numbers in parallel. We obtain a Spliterator from a List<Integer>, split it, and then use an ExecutorService to calculate partial sums concurrently.
Each task is a Callable that returns its partial sum. These partial sums are then collected using Future objects and combined to get the total sum.
This pattern is common for "divide and conquer" parallel algorithms. The work of summation is divided between threads, and results are aggregated. It highlights how Spliterator facilitates breaking down a computation for parallel execution.
package com.zetcode;

import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.*;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class Main {

    public static void main(String[] args) throws Exception {

        List<Integer> numbers = IntStream.rangeClosed(1, 10000)
                .boxed()
                .collect(Collectors.toList());

        Spliterator<Integer> spliterator1 = numbers.spliterator();
        Spliterator<Integer> spliterator2 = spliterator1.trySplit(); // spliterator1 is now the second half

        try (ExecutorService executor = Executors.newFixedThreadPool(2)) {

            // Task for the first part (which is spliterator2)
            Callable<Long> sumTask1 = () -> {
                long sum = 0;
                if (spliterator2 != null) {
                    SummingConsumer consumer = new SummingConsumer();
                    spliterator2.forEachRemaining(consumer);
                    sum = consumer.getTotal();
                }
                System.out.println("Sum from task 1 (thread "
                        + Thread.currentThread().getName() + "): " + sum);
                return sum;
            };

            // Task for the second part (remaining of spliterator1)
            Callable<Long> sumTask2 = () -> {
                SummingConsumer consumer = new SummingConsumer();
                spliterator1.forEachRemaining(consumer);
                long sum = consumer.getTotal();
                System.out.println("Sum from task 2 (thread "
                        + Thread.currentThread().getName() + "): " + sum);
                return sum;
            };

            Future<Long> future1 = executor.submit(sumTask1);
            Future<Long> future2 = executor.submit(sumTask2);

            long totalSum = future1.get() + future2.get();
            System.out.println("Total sum calculated in parallel: " + totalSum);

            long expectedSum = numbers.stream().mapToLong(Integer::longValue).sum();
            System.out.println("Expected sum (sequential stream): " + expectedSum);

            executor.shutdown();
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        }
    }

    // Helper class for summing, as lambda variable for sum must be effectively final
    static class SummingConsumer implements Consumer<Integer> {

        private long total = 0;

        @Override
        public void accept(Integer value) {
            total += value;
        }

        public long getTotal() {
            return total;
        }
    }
}
The splitting is done with trySplit, which divides the original Spliterator into two parts. The returned Spliterator (spliterator2) covers roughly the first half of the list, while the original one (spliterator1) keeps the second half. Each subset can then be processed independently.
To perform the parallel summation, an ExecutorService with a fixed pool of two threads is used, so the two splits can be processed on separate threads at the same time. The tasks are submitted as Callable<Long> objects. Their results are retrieved through the returned Future objects once the computation completes.
A helper class, SummingConsumer, accumulates the sum while forEachRemaining traverses the elements. It is needed because a lambda cannot reassign a captured local variable (captured locals must be effectively final). As a result, a running total cannot be kept in a plain long inside the lambda.
Once both tasks complete, their results are combined into the total sum. This total is then checked against a sequential sum computed with a standard stream operation. The comparison verifies correctness but does not measure performance. Judging any speedup would require timing both versions on a considerably larger workload.
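The SummingConsumer class is one way around the effectively-final rule. Another option, sketched below, wraps each split in a sequential stream with StreamSupport.stream(split, false) and lets mapToLong(...).sum() do the accumulation. The StreamPerSplitDemo class and its sumSplit and parallelSum helpers are illustrative names.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

class StreamPerSplitDemo {

    // Sums one split sequentially; no mutable accumulator is needed
    static long sumSplit(Spliterator<Integer> split) {
        return StreamSupport.stream(split, false).mapToLong(Integer::longValue).sum();
    }

    // Illustrative helper: splits once and sums both parts on a two-thread pool
    static long parallelSum(List<Integer> numbers) {
        Spliterator<Integer> remainder = numbers.spliterator();
        Spliterator<Integer> prefix = remainder.trySplit(); // may be null
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            Future<Long> first = executor.submit(() -> prefix == null ? 0L : sumSplit(prefix));
            Future<Long> second = executor.submit(() -> sumSplit(remainder));
            return first.get() + second.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 10_000).boxed().collect(Collectors.toList());
        System.out.println("Parallel sum: " + parallelSum(numbers));
    }
}
```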
Parallel Data Transformation using Spliterator
This example demonstrates parallel processing using Spliterator and ExecutorService. The program starts with a predefined list of words and splits the workload into two separate tasks using trySplit.
Each task transforms its assigned words to uppercase and appends thread identification to illustrate concurrency in action.
This is useful for CPU-bound transformation tasks on large datasets. Splitting the data allows multiple cores to work on the transformation simultaneously, potentially speeding up the overall process significantly.
package com.zetcode;

import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class Main {

    public static void main(String[] args) throws Exception {

        List<String> words = List.of("alpha", "bravo", "charlie", "delta", "echo",
                "foxtrot", "golf", "hotel", "india", "juliett");

        Spliterator<String> s1 = words.spliterator();
        Spliterator<String> s2 = s1.trySplit(); // s1 is now the second half

        try (ExecutorService executor = Executors.newFixedThreadPool(2)) {

            Callable<List<String>> transformTask1 = () -> {
                List<String> result = new ArrayList<>();
                if (s2 != null) { // s2 is the first half
                    s2.forEachRemaining(word -> {
                        result.add(word.toUpperCase() + " (processed by "
                                + Thread.currentThread().getName() + ")");
                    });
                }
                return result;
            };

            Callable<List<String>> transformTask2 = () -> {
                List<String> result = new ArrayList<>();
                s1.forEachRemaining(word -> { // s1 is the second half
                    result.add(word.toUpperCase() + " (processed by "
                            + Thread.currentThread().getName() + ")");
                });
                return result;
            };

            Future<List<String>> future1 = executor.submit(transformTask1);
            Future<List<String>> future2 = executor.submit(transformTask2);

            List<String> combinedResult = new ArrayList<>();
            combinedResult.addAll(future1.get());
            combinedResult.addAll(future2.get());

            System.out.println("Transformed words in parallel:");
            combinedResult.forEach(System.out::println);

            executor.shutdown();
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        }

        System.out.println("\nSequentially transformed for order reference:");
        words.stream().map(String::toUpperCase).forEach(System.out::println);
    }
}
The split divides the original Spliterator into two segments. The first half (s2) and the remaining portion (s1) are then processed independently by separate threads, spreading the computational work across multiple cores.
To enable true parallel execution, a fixed thread pool (ExecutorService) with two threads is employed, so both splits can be processed concurrently. The tasks are defined as Callable<List<String>>. They run asynchronously and return their transformed results upon completion.
Each task uses forEachRemaining to process the elements of its split without an explicit loop. Once both tasks complete, their results are merged into a combined list for the final output.
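Ordering needs some care in multithreaded code. Here the two results are merged prefix first (future1 holds the words from s2, the first half). Each task also preserves the order within its own split, so the combined list keeps the original sequence, and only the thread names vary from run to run. The order would be lost if the results were gathered as tasks finish, for example by having both tasks add to one shared concurrent collection. For reference, a sequential transformation using a standard stream is printed after the parallel output.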
Finally, the ExecutorService is shut down with shutdown, which lets the submitted tasks finish while rejecting new ones. If they do not complete within one minute, shutdownNow attempts to stop them, typically by interrupting the worker threads.
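Each split is a contiguous block of the source. The original order can therefore be kept with any number of splits by combining the per-split results in split order, whatever order the tasks finish in. The sketch below uses a hypothetical parallelUpperCase helper that splits the list into up to four parts. It relies on ExecutorService.invokeAll, which returns its futures in the same order as the submitted tasks. It uses toUpperCase(Locale.ROOT) so the result does not depend on the default locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Spliterator;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedTransformDemo {

    // Splits recursively up to `depth` times; parts are appended in encounter order
    // because each prefix is handled before the remainder it was split from.
    static <T> void splitInOrder(Spliterator<T> spliterator, int depth, List<Spliterator<T>> out) {
        Spliterator<T> prefix = depth > 0 ? spliterator.trySplit() : null;
        if (prefix == null) {
            out.add(spliterator);
            return;
        }
        splitInOrder(prefix, depth - 1, out);
        splitInOrder(spliterator, depth - 1, out);
    }

    // Hypothetical helper: upper-cases the words on a thread pool and keeps the original order
    static List<String> parallelUpperCase(List<String> words) {
        List<Spliterator<String>> parts = new ArrayList<>();
        splitInOrder(words.spliterator(), 2, parts); // at most four parts

        List<Callable<List<String>>> tasks = new ArrayList<>();
        for (Spliterator<String> part : parts) {
            tasks.add(() -> {
                List<String> result = new ArrayList<>();
                part.forEachRemaining(word -> result.add(word.toUpperCase(Locale.ROOT)));
                return result;
            });
        }

        ExecutorService executor = Executors.newFixedThreadPool(parts.size());
        try {
            List<String> combined = new ArrayList<>();
            // invokeAll waits for all tasks and returns their futures in task order
            for (Future<List<String>> future : executor.invokeAll(tasks)) {
                combined.addAll(future.get());
            }
            return combined;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> words = List.of("alpha", "bravo", "charlie", "delta", "echo",
                "foxtrot", "golf", "hotel", "india", "juliett");
        System.out.println(parallelUpperCase(words));
    }
}
```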
Parallel Data Aggregation (e.g., Counting) using Spliterator
This example demonstrates parallel data aggregation, specifically counting elements that match a certain criterion. We use a list of strings (simulating lines from a file). The goal is to count how many strings contain a specific keyword.
The Spliterator is split, and each part is processed by a task in an ExecutorService. Each task counts matching strings in its segment, and the main thread sums these partial counts.
This approach is effective for operations like filtering and counting on large datasets. By dividing the dataset and processing parts concurrently, we can often achieve better performance than a purely sequential approach.
package com.zetcode;

import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Main {

    public static void main(String[] args) throws Exception {

        List<String> lines = List.of(
                "The quick brown fox",
                "jumps over the lazy dog",
                "A B C D E F G",
                "Another line with fox here",
                "Spliterators are useful for parallel fox processing",
                "The lazy fox is quick"
        );
        String keyword = "fox";

        Spliterator<String> s1 = lines.spliterator();
        Spliterator<String> s2 = s1.trySplit(); // s1 is now the second half

        try (ExecutorService executor = Executors.newFixedThreadPool(2)) {

            Callable<Integer> countTask1 = () -> {
                AtomicInteger count = new AtomicInteger(0);
                if (s2 != null) { // s2 is the first half
                    s2.forEachRemaining(line -> {
                        if (line.contains(keyword)) {
                            count.incrementAndGet();
                        }
                    });
                }
                System.out.println("Count from task 1 (thread "
                        + Thread.currentThread().getName() + "): " + count.get());
                return count.get();
            };

            Callable<Integer> countTask2 = () -> {
                AtomicInteger count = new AtomicInteger(0);
                s1.forEachRemaining(line -> { // s1 is the second half
                    if (line.contains(keyword)) {
                        count.incrementAndGet();
                    }
                });
                System.out.println("Count from task 2 (thread "
                        + Thread.currentThread().getName() + "): " + count.get());
                return count.get();
            };

            Future<Integer> future1 = executor.submit(countTask1);
            Future<Integer> future2 = executor.submit(countTask2);

            int totalCount = future1.get() + future2.get();
            System.out.println("Total lines containing '" + keyword + "': " + totalCount);

            long expectedCount = lines.stream().filter(line -> line.contains(keyword)).count();
            System.out.println("Expected count (sequential stream): " + expectedCount);

            executor.shutdown();
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        }
    }
}
In this example, a list of strings is processed in parallel to count the lines that contain the word "fox". Each task keeps its own AtomicInteger counter. The counter is confined to a single task, so its atomicity is not strictly needed here. It serves as a mutable holder, because a lambda cannot reassign a captured local variable (captured locals must be effectively final). A plain int local would therefore not compile. A one-element int array or a stream-based count would work as well. An AtomicInteger (or LongAdder) becomes necessary only when several threads update the same counter.
Once both tasks complete, their partial counts are retrieved via Future objects and summed to give the total number of matching lines. The result is then validated against a sequential stream-based count, which confirms that the parallel computation is correct.
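For comparison, the same count can be written as a single parallel stream over the list's Spliterator, leaving the splitting and thread management to the Fork/Join framework. The ParallelCountDemo class and its countContaining helper are illustrative names:

```java
import java.util.List;
import java.util.stream.StreamSupport;

class ParallelCountDemo {

    static long countContaining(List<String> lines, String keyword) {
        // true => parallel stream; the framework calls trySplit on the spliterator as needed
        return StreamSupport.stream(lines.spliterator(), true)
                .filter(line -> line.contains(keyword))
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "The quick brown fox",
                "jumps over the lazy dog",
                "A B C D E F G",
                "Another line with fox here",
                "Spliterators are useful for parallel fox processing",
                "The lazy fox is quick");
        System.out.println("Lines containing 'fox': " + countContaining(lines, "fox"));
    }
}
```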
Source
Java Spliterator Documentation
This tutorial covered the Java Spliterator interface with a focus on its use in parallel processing. We explored basic traversal, splitting, characteristics, primitive Spliterators, and integration with parallel streams and ExecutorService. Understanding Spliterators is vital for writing efficient, scalable Java code that can leverage multi-core processors for data-intensive tasks.