Java Stream distinct
last modified May 8, 2025
This article demonstrates how to use the Java Stream distinct method to remove duplicate elements from streams.
The distinct method is an intermediate operation in Java Streams that
filters out duplicate elements, ensuring only unique values remain in the
stream. It determines uniqueness based on the equals
method of
the elements.
For ordered streams, distinct
maintains the original encounter
order, preserving the sequence of elements as they appear. In contrast, for
unordered streams, removing duplicates may improve performance by reducing
unnecessary order tracking overhead.
Basic distinct Syntax
The distinct
method provides a simple way to remove duplicate
elements from a stream, ensuring that only unique values are retained.
Stream<T> distinct()
This operation relies on the equals
method to compare elements and
identify duplicates. To ensure correct behavior, stream elements should
implement both equals
and hashCode
methods
appropriately. Improper implementations can lead to unexpected results when
filtering unique elements.
How distinct Works Internally
The distinct
method in Java Streams is not based on hashing or
key-value storage like a HashMap
. Instead, it performs stateful
filtering to ensure that only unique elements, as determined by their
equals
method, are retained in the stream.
Feature | Stream | HashMap |
---|---|---|
Purpose | Process and transform data dynamically | Store key-value pairs efficiently |
Data storage | Does not store elements | Stores elements in a hashed structure |
Uniqueness logic | Uses equals in distinct | Uses hashing for fast lookups |
Performance | Can be slower for large datasets | Optimized for O(1) key lookup |
Internally, distinct
maintains a stateful filter by keeping track
of previously seen elements using a LinkedHashSet
. As the stream is
processed, each element is checked for equality against those already seen. If
an element is unique (according to equals
), it is passed
downstream; otherwise, it is filtered out. This approach preserves the encounter
order but may be less efficient than hashing for very large datasets.
Unlike a HashMap
, which provides fast O(1) lookups using hashing,
distinct
in streams does not index elements for fast access.
Instead, it compares each element sequentially, which can impact performance for
large streams.
Removing duplicates from primitive values
The distinct
method can be used with streams of primitive values to
eliminate duplicates and keep only unique elements.
void main() { Stream.of(2, 5, 3, 2, 5, 7, 3, 8) .distinct() .forEach(System.out::println); }
This example removes duplicate integers from the stream. The
distinct
operation preserves the first occurrence of each unique
number.
$ java Main.java 2 5 3 7 8
Removing duplicate strings
The distinct
method can also be applied to streams of strings to
filter out repeated values and retain only unique strings.
void main() { Stream.of("apple", "orange", "apple", "banana", "orange") .distinct() .forEach(System.out::println); }
This example removes duplicate strings from the stream. String comparison is case-sensitive, so "Apple" and "apple" would be considered distinct.
$ java Main.java apple orange banana
Custom objects with equals/hashCode
When using distinct
with custom objects, it is important that the
objects properly implement equals
and hashCode
to
ensure correct identification of duplicates.
record Person(String name, int age) { } void main() { Stream.of( new Person("Alice", 30), new Person("Bob", 25), new Person("Alice", 30), new Person("Charlie", 35), new Person("Bob", 25) ) .distinct() .forEach(p -> System.out.println(p.name() + " - " + p.age())); }
This example removes duplicate Person objects. Records automatically implement
proper hashCode
methods based on their
components.
$ java Main.java Alice - 30 Bob - 25 Charlie - 35
Custom objects without proper equals/hashCode
If custom objects do not implement equals
and hashCode
correctly, the distinct
method may not recognize duplicates as
expected.
class Product { String name; double price; Product(String name, double price) { this.name = name; this.price = price; } // No equals/hashCode implementation } void main() { Stream.of( new Product("Laptop", 999.99), new Product("Phone", 699.99), new Product("Laptop", 999.99) ) .distinct() .forEach(p -> System.out.println(p.name + " - " + p.price)); }
This example shows that without proper equals
and
hashCode
methods, distinct
won't work as expected,
treating objects with same values as different.
$ java Main.java Laptop - 999.99 Phone - 699.99 Laptop - 999.99
Combining with other operations
The distinct
method can be combined with other stream operations,
such as filtering and mapping, to create more complex data processing pipelines.
void main() { Stream.of("apple", "banana", "apple", "orange", "banana", "kiwi") .filter(s -> s.length() > 4) .distinct() .map(String::toUpperCase) .forEach(System.out::println); }
This example filters for long fruits, removes duplicates, and converts to uppercase, showing how distinct can be combined with other operations.
$ java Main.java BANANA ORANGE
Distinct with nested collections
The distinct
method is useful for removing duplicates after
flattening nested collections into a single stream.
void main() { List<List<String>> nestedLists = List.of( List.of("a", "b", "c"), List.of("b", "c", "d"), List.of("c", "d", "e") ); nestedLists.stream() .flatMap(List::stream) .distinct() .forEach(System.out::println); }
This example flattens nested lists and then removes duplicate elements, demonstrating a common use case for distinct.
$ java Main.java a b c d e
Distinct words in a text file
The distinct
method can be used to extract all unique words from a
text file, ignoring case and punctuation, which is helpful for tasks such as
building a vocabulary list or analyzing unique words in documents.
The Battle of Thermopylae was fought between an alliance of Greek city-states, led by King Leonidas of Sparta, and the Persian Empire of Xerxes I over the course of three days, during the second Persian invasion of Greece.
This file contains a brief description of the Battle of Thermopylae. We can
use the distinct
method to extract all unique words from this text
file, ignoring case and punctuation.
void main() throws IOException { Path path = Paths.get("thermopylae.txt"); Files.lines(path) .flatMap(line -> Arrays.stream(line.split("\\W+"))) .map(String::toLowerCase) .filter(s -> !s.isEmpty()) .distinct() .forEach(System.out::println); }
This example reads lines from a file, splits them into words, normalizes them to
lower case, removes empty strings, and prints all unique words. The
split("\\W+")
regular expression splits on any non-word character,
effectively removing punctuation.
Source
Java Stream distinct documentation
In this article we have explored the Java Stream distinct method. It provides
an efficient way to remove duplicate elements from streams, but requires proper
implementation of equals
and code for custom objects. Understanding
distinct is essential for working with data that may contain duplicates.
Author
List all Java tutorials.