ZetCode

Java Stream distinct

last modified May 8, 2025

This article demonstrates how to use the Java Stream distinct method to remove duplicate elements from streams.

The distinct method is an intermediate operation in Java Streams that filters out duplicate elements, ensuring only unique values remain in the stream. It determines uniqueness based on the equals method of the elements.

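For ordered streams, distinct is stable: the first occurrence of each element is kept and the original encounter order is preserved. For unordered streams, no stability guarantees are made. In parallel pipelines, preserving stability is relatively expensive, so removing the ordering constraint with unordered() may make distinct considerably faster when the order of the results does not matter.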
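The following is a minimal sketch of the difference between the ordered and unordered cases. The class name DistinctOrderDemo and its helper methods are hypothetical names chosen for illustration; it uses a traditional class with explicit imports so that it compiles on older Java versions as well.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DistinctOrderDemo {

    // Ordered source: the first occurrence of each value is kept,
    // in encounter order.
    static List<Integer> orderedDistinct(List<Integer> values) {
        return values.stream()
                .distinct()
                .collect(Collectors.toList());
    }

    // Parallel stream with the ordering constraint removed: the same unique
    // values are produced, but their order is not guaranteed, so the
    // result is collected into a Set.
    static Set<Integer> unorderedDistinct(List<Integer> values) {
        return values.parallelStream()
                .unordered()
                .distinct()
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(3, 1, 3, 2, 1, 2);
        System.out.println(orderedDistinct(values));   // [3, 1, 2]
        System.out.println(unorderedDistinct(values)); // same values, order unspecified
    }
}
```

Whether the unordered variant is actually faster depends on the data size and the machine, so it is worth measuring before relying on it.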

Basic distinct Syntax

The distinct method provides a simple way to remove duplicate elements from a stream, ensuring that only unique values are retained.

Stream<T> distinct()

This operation relies on the equals method to compare elements and identify duplicates. To ensure correct behavior, stream elements should implement both equals and hashCode methods appropriately. Improper implementations can lead to unexpected results when filtering unique elements.

How distinct Works Internally

The distinct method is a stateful intermediate operation: to decide whether an element is a duplicate, it must remember the elements it has already seen. The documentation specifies uniqueness in terms of equals, but the OpenJDK implementation tracks seen elements in hash-based sets, which is why hashCode must be consistent with equals. A stream itself is not a data structure like a HashMap; the following table contrasts the two.

Feature | Stream | HashMap
Purpose | Process and transform data in a pipeline | Store key-value pairs efficiently
Data storage | Does not store elements; distinct buffers seen elements only while the pipeline runs | Stores entries in a hashed structure
Uniqueness logic | distinct compares elements with equals (and, in practice, hashCode) | Compares keys with hashCode and equals
Performance | distinct needs memory proportional to the number of unique elements | Optimized for average O(1) key lookup

Internally, distinct keeps track of previously seen elements in a set. In OpenJDK, a sequential stream uses a HashSet, while an ordered parallel stream collects into a LinkedHashSet to preserve the encounter order; these are implementation details and not part of the specification. As the stream is processed, each element is looked up in the set. If no equal element has been seen, it is passed downstream; otherwise, it is filtered out.

Like a HashMap, the set provides average O(1) lookups, provided that hashCode distributes the elements well. The cost is memory: the set grows with the number of unique elements, which can matter for very large streams. The set is not exposed to the caller and exists only while the pipeline runs.
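The stateful filtering described above can be pictured with a hand-written filter that remembers the elements it has seen. This is only a sketch for sequential streams, not the library's actual code; the class name SeenSetSketch, the method distinctViaSet, and the choice of a HashSet are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SeenSetSketch {

    // Hand-rolled approximation of distinct() for a sequential stream.
    // Set.add returns false when an equal element is already present,
    // so the filter lets only first occurrences through.
    static <T> List<T> distinctViaSet(List<T> input) {
        Set<T> seen = new HashSet<>();
        return input.stream()
                .filter(seen::add)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(2, 5, 3, 2, 5, 7, 3, 8);
        System.out.println(distinctViaSet(values)); // [2, 5, 3, 7, 8]
    }
}
```

The filter mutates shared state, so this sketch is not safe for parallel streams; the built-in distinct handles that case itself.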

Removing duplicates from primitive values

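The distinct method can be used with streams of numeric values to eliminate duplicates and keep only unique elements. Note that Stream.of(2, 5, ...) creates a Stream<Integer> of boxed values; the primitive streams IntStream, LongStream, and DoubleStream provide their own distinct methods.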

Main.java
void main() {

    Stream.of(2, 5, 3, 2, 5, 7, 3, 8)
          .distinct()
          .forEach(System.out::println);
}

This example removes duplicate integers from the stream. The distinct operation preserves the first occurrence of each unique number.

$ java Main.java
2
5
3
7
8
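For a stream of true primitives, the same operation is available on IntStream. The sketch below uses a hypothetical class IntStreamDistinctDemo with a helper uniqueValues; both names are illustrative only.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class IntStreamDistinctDemo {

    // Removes duplicate int values while keeping first occurrences in order.
    static int[] uniqueValues(int... values) {
        return IntStream.of(values)
                .distinct()
                .toArray();
    }

    public static void main(String[] args) {
        int[] unique = uniqueValues(2, 5, 3, 2, 5, 7, 3, 8);
        System.out.println(Arrays.toString(unique)); // [2, 5, 3, 7, 8]
    }
}
```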

Removing duplicate strings

The distinct method can also be applied to streams of strings to filter out repeated values and retain only unique strings.

Main.java
void main() {

    Stream.of("apple", "orange", "apple", "banana", "orange")
          .distinct()
          .forEach(System.out::println);
}

This example removes duplicate strings from the stream. String comparison is case-sensitive, so "Apple" and "apple" would be considered distinct.

$ java Main.java
apple
orange
banana
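If case should be ignored, one option is to normalize the strings before calling distinct. The following sketch assumes that lower-casing with Locale.ROOT is an acceptable normalization; the class name CaseInsensitiveDistinctDemo and the method distinctIgnoringCase are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class CaseInsensitiveDistinctDemo {

    // Lower-cases every string first, so "Apple" and "apple"
    // become equal before distinct() compares them.
    static List<String> distinctIgnoringCase(List<String> words) {
        return words.stream()
                .map(w -> w.toLowerCase(Locale.ROOT))
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("Apple", "apple", "ORANGE", "orange", "banana");
        System.out.println(distinctIgnoringCase(words)); // [apple, orange, banana]
    }
}
```

The result contains the normalized strings, not the original spellings.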

Custom objects with equals/hashCode

When using distinct with custom objects, it is important that the objects properly implement equals and hashCode to ensure correct identification of duplicates.

Main.java
record Person(String name, int age) {
}

void main() {

    Stream.of(
            new Person("Alice", 30),
            new Person("Bob", 25),
            new Person("Alice", 30),
            new Person("Charlie", 35),
            new Person("Bob", 25)
        )
        .distinct()
        .forEach(p -> System.out.println(p.name() + " - " + p.age()));
}

This example removes duplicate Person objects. Records automatically implement proper equals and hashCode methods based on their components.

$ java Main.java
Alice - 30
Bob - 25
Charlie - 35
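Sometimes objects should be considered duplicates based on a single property rather than on full equality. The distinct method has no parameter for this, so the sketch below uses Collectors.toMap with a LinkedHashMap instead. The class DistinctByKeyDemo and the helper distinctByKey are hypothetical names, not part of the Stream API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DistinctByKeyDemo {

    record Person(String name, int age) {
    }

    // Keeps the first element seen for each key, in encounter order.
    static <T, K> List<T> distinctByKey(List<T> items, Function<T, K> keyExtractor) {
        return new ArrayList<>(items.stream()
                .collect(Collectors.toMap(
                        keyExtractor,
                        Function.identity(),
                        (first, second) -> first, // on a key clash, keep the earlier element
                        LinkedHashMap::new))      // preserves insertion order
                .values());
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Alice", 30),
                new Person("Alice", 31),
                new Person("Bob", 25));

        // Two Alices differ by age, but share the same name key.
        distinctByKey(people, Person::name)
                .forEach(p -> System.out.println(p.name() + " - " + p.age()));
    }
}
```

With this input, a plain distinct call would keep all three persons, because the two Alice records differ in age.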

Custom objects without proper equals/hashCode

If custom objects do not implement equals and hashCode correctly, the distinct method may not recognize duplicates as expected.

Main.java
class Product {

    String name;
    double price;
    
    Product(String name, double price) {
        this.name = name;
        this.price = price;
    }
    
    // No equals/hashCode implementation
}

void main() {

    Stream.of(
            new Product("Laptop", 999.99),
            new Product("Phone", 699.99),
            new Product("Laptop", 999.99)
        )
        .distinct()
        .forEach(p -> System.out.println(p.name + " - " + p.price));
}

This example shows that without equals and hashCode overrides, distinct falls back to the identity-based equals inherited from Object, so two objects with the same field values are treated as different.

$ java Main.java
Laptop - 999.99
Phone - 699.99
Laptop - 999.99
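One way to make the Product example behave as expected is to override equals and hashCode so that both use the same fields. The following is a sketch under that assumption; the wrapper class ProductEqualsDemo and the helper distinctProductNames are hypothetical names.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ProductEqualsDemo {

    static class Product {

        final String name;
        final double price;

        Product(String name, double price) {
            this.name = name;
            this.price = price;
        }

        // Two products are equal when both name and price match.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Product)) return false;
            Product other = (Product) o;
            return Double.compare(price, other.price) == 0
                    && Objects.equals(name, other.name);
        }

        // Must use the same fields as equals.
        @Override
        public int hashCode() {
            return Objects.hash(name, price);
        }
    }

    static List<String> distinctProductNames() {
        return Stream.of(
                        new Product("Laptop", 999.99),
                        new Product("Phone", 699.99),
                        new Product("Laptop", 999.99))
                .distinct()
                .map(p -> p.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctProductNames()); // [Laptop, Phone]
    }
}
```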

Combining with other operations

The distinct method can be combined with other stream operations, such as filtering and mapping, to create more complex data processing pipelines.

Main.java
void main() {

    Stream.of("apple", "banana", "apple", "orange", "banana", "kiwi")
          .filter(s -> s.length() > 4)
          .distinct()
          .map(String::toUpperCase)
          .forEach(System.out::println);
}

This example keeps fruit names longer than four characters, removes duplicates, and converts the remaining names to uppercase, showing how distinct can be combined with other operations.

$ java Main.java
APPLE
BANANA
ORANGE
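The position of distinct in a pipeline matters, because it compares the elements as they are at that point. The sketch below contrasts calling distinct before and after a mapping step; the class name DistinctPlacementDemo and its methods are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class DistinctPlacementDemo {

    // distinct() runs before the mapping: "apple" and "Apple" are still
    // different at that point, so both survive and only then become equal.
    static List<String> distinctThenUpper(List<String> words) {
        return words.stream()
                .distinct()
                .map(w -> w.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // distinct() runs after the mapping: the upper-cased duplicates are removed.
    static List<String> upperThenDistinct(List<String> words) {
        return words.stream()
                .map(w -> w.toUpperCase(Locale.ROOT))
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "Apple", "banana");
        System.out.println(distinctThenUpper(words)); // [APPLE, APPLE, BANANA]
        System.out.println(upperThenDistinct(words)); // [APPLE, BANANA]
    }
}
```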

Distinct with nested collections

The distinct method is useful for removing duplicates after flattening nested collections into a single stream.

Main.java
void main() {

    List<List<String>> nestedLists = List.of(
        List.of("a", "b", "c"),
        List.of("b", "c", "d"),
        List.of("c", "d", "e")
    );
    
    nestedLists.stream()
              .flatMap(List::stream)
              .distinct()
              .forEach(System.out::println);
}

This example flattens nested lists and then removes duplicate elements, demonstrating a common use case for distinct.

$ java Main.java
a
b
c
d
e

Distinct words in a text file

The distinct method can be used to extract all unique words from a text file, ignoring case and punctuation, which is helpful for tasks such as building a vocabulary list or analyzing unique words in documents.

thermopylae.txt
The Battle of Thermopylae was fought between an alliance of Greek city-states,
led by King Leonidas of Sparta, and the Persian Empire of Xerxes I over the
course of three days, during the second Persian invasion of Greece.

This file contains a brief description of the Battle of Thermopylae.

Main.java
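void main() throws IOException {

    Path path = Paths.get("thermopylae.txt");

    // Files.lines keeps the file open until the stream is closed,
    // so the stream is managed with try-with-resources
    try (Stream<String> lines = Files.lines(path)) {

        lines.flatMap(line -> Arrays.stream(line.split("\\W+")))
             .map(String::toLowerCase)
             .filter(s -> !s.isEmpty())
             .distinct()
             .forEach(System.out::println);
    }
}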

This example reads lines from a file, splits them into words, normalizes them to lower case, removes empty strings, and prints all unique words. The split("\\W+") regular expression splits on any run of non-word characters, effectively removing punctuation. As a side effect, a hyphenated word such as city-states is split into city and states.
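The same pipeline can be tried without a file by applying it to a string. The sketch below assumes Java 11 or later for String.lines; the class name UniqueWordsDemo, the method uniqueWords, and the sample text are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class UniqueWordsDemo {

    // Splits the text into lower-cased words and keeps each word once,
    // in the order of first appearance.
    static List<String> uniqueWords(String text) {
        return text.lines()
                .flatMap(line -> Arrays.stream(line.split("\\W+")))
                .map(w -> w.toLowerCase(Locale.ROOT))
                .filter(w -> !w.isEmpty())
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "The battle, the Battle!\nof city-states";
        System.out.println(uniqueWords(text)); // [the, battle, of, city, states]
    }
}
```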

Source

Java Stream distinct documentation

In this article we have explored the Java Stream distinct method. It provides a convenient way to remove duplicate elements from streams, but requires proper implementation of equals and hashCode for custom objects. Understanding distinct is essential for working with data that may contain duplicates.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.
