Find duplicates in a List in Java

TheServerSide.com

https://www.theserverside.com/blog/Coffee-Talk-Java-News-Stories-and-Opinions/Find-duplicates-in-a-List-in-Java


By Walker Aldridge

It’s easy to remove duplicates from a list in Java. There are a variety of functions in Java that simplify that process.
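
For comparison, here is a minimal sketch of the removal case. The class and method names are made up for illustration, and the approach shown (copying into a LinkedHashSet) is only one of several possible options:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class RemoveDuplicatesSketch {

    // Hypothetical helper: copies the list into a LinkedHashSet, which drops
    // repeats while keeping the first-seen order of the elements.
    static <T> List<T> withoutDuplicates(List<T> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
        System.out.println(withoutDuplicates(myList)); // [0, 1, 2, 3, 5, 6]
        // Stream.distinct() gives the same result; toList() needs Java 16+.
        System.out.println(myList.stream().distinct().toList());
    }
}
```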

Finding duplicates in a Java list? That’s a bit more complicated, but finding the dupes in a List is by no means an impossible task.

How to find duplicates in a Java List

The most common approaches to finding duplicates in a List in Java include the following:

A brute force comparison using nested loops.
The use of a HashSet to find the unique duplicates.
A combined use of multiple Lists and HashSets.
The use of the Java Streams API to find duplicates.
The use of the frequency method in Java’s Collections class.

Brute-force Java duplicate finder

A brute-force approach to solve this problem involves going through the list one element at a time and looking for a match. If a match is found, the matching item is put in a second list.

import java.util.ArrayList;
import java.util.List;

List<Object> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
List<Object> duplicates = new ArrayList<>();
// Compare each element with every element that comes after it.
for (int x = 0; x < myList.size(); x++) {
  for (int y = x + 1; y < myList.size(); y++) {
    if (myList.get(x).equals(myList.get(y))) {
      duplicates.add(myList.get(x));
      break; // one later match is enough; move on to the next x
    }
  }
}
System.out.println(duplicates);

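When executed, this code prints one entry for each element that has a match later in the list, so a value that occurs three times, such as 0 or 1, is reported twice: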

[0, 1, 1, 5, 0]

How to find the set of duplicates

If you only need a unique list of the duplicate items, you could use a HashSet instead of an ArrayList to hold duplicates. Here’s the code to do that:

import java.util.HashSet;
import java.util.List;

List<Object> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
HashSet<Object> duplicates = new HashSet<>();

// Same nested loops as before; the HashSet silently ignores repeat additions.
for (int x = 0; x < myList.size(); x++) {
  for (int y = x + 1; y < myList.size(); y++) {
    if (myList.get(x).equals(myList.get(y))) {
      duplicates.add(myList.get(x));
      break;
    }
  }
}
System.out.println(duplicates);

When this code runs, it prints out the unique set of duplicates in the list, which is:

[0, 1, 5]

Optimized use of HashSet to find duplicates

When items are added to a HashSet, the add method returns true if the item is new, and false if the item is a duplicate.
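
As a small illustration of that return value, the sketch below calls add twice with the same element. The class name and the isRepeat helper are hypothetical and exist only for this example:

```java
import java.util.HashSet;
import java.util.Set;

class HashSetAddSketch {

    // Hypothetical helper: reports whether the item had been seen before.
    // Set.add returns false when the element is already present.
    static <T> boolean isRepeat(Set<T> seen, T item) {
        return !seen.add(item);
    }

    public static void main(String[] args) {
        Set<Integer> seen = new HashSet<>();
        System.out.println(seen.add(5));       // true: 5 was not in the set
        System.out.println(seen.add(5));       // false: 5 is already there
        System.out.println(isRepeat(seen, 7)); // false: first sighting of 7
        System.out.println(isRepeat(seen, 7)); // true: 7 was added a line earlier
    }
}
```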

We can use this behavior to replace the nested loops with a single pass over the list, which improves both the speed and the readability of our algorithm.

In the improved duplicate finder, create a second List to hold the duplicates. First try to add items to the HashSet, and if the HashSet indicates the item is already in the set, add that duplicate to the List:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

List<Object> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
HashSet<Object> uniqueItems = new HashSet<>();
List<Object> duplicates = new ArrayList<>();

for (Object item : myList) {
  // add returns false when the item is already in the set
  if (!uniqueItems.add(item)) {
    duplicates.add(item);
  }
}
System.out.println(duplicates);

When this code runs, it prints each repeat occurrence in the order it is encountered:

[1, 0, 0, 1, 5]
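
If each duplicate should be reported only once, the same single pass can feed a second set instead of a list. The following is a sketch under that assumption; the class and method names are invented for illustration, and a LinkedHashSet is chosen here only so the output order is predictable:

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UniqueDuplicatesSketch {

    // Hypothetical helper: returns each duplicated value once, in the order
    // in which its first repeat is encountered.
    static <T> Set<T> findUniqueDuplicates(List<T> items) {
        Set<T> seen = new HashSet<>();
        Set<T> duplicates = new LinkedHashSet<>();
        for (T item : items) {
            if (!seen.add(item)) {    // already seen, so this is a repeat
                duplicates.add(item); // the set ignores further repeats
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
        System.out.println(findUniqueDuplicates(myList)); // [1, 0, 5]
    }
}
```

The order differs from the brute-force set because values are recorded when their second occurrence is reached, not when their first occurrence is.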

How to find duplicates with a Java Stream

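We can combine the HashSet technique above with a Java Stream to create a very succinct mechanism. The code below uses that combination to find the duplicates in the Java List. Note that the filter relies on a side effect on the HashSet, so it is only safe on a sequential stream, and that Stream.toList() requires Java 16 or later: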

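import java.util.HashSet;
import java.util.List;

List<Object> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
HashSet<Object> uniqueItems = new HashSet<>();
// Keep only the items that the set rejects as already present.
List<Object> duplicates = myList.stream()
                                .filter(n -> !uniqueItems.add(n))
                                .toList();
System.out.println(duplicates);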

When this code runs, it prints out:

[1, 0, 0, 1, 5]
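
A stream-only alternative that avoids mutating an outside set is to count occurrences with a grouping collector and keep the values seen more than once. This is a sketch of a different technique, not the approach shown above; the class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamDuplicatesSketch {

    // Hypothetical helper: groups equal elements, counts each group, and
    // returns the values whose count is greater than one. The LinkedHashMap
    // keeps the values in first-seen order. Null elements are not supported,
    // because groupingBy rejects null keys.
    static <T> List<T> findDuplicates(List<T> items) {
        Map<T, Long> counts = items.stream()
            .collect(Collectors.groupingBy(
                Function.identity(), LinkedHashMap::new, Collectors.counting()));
        return counts.entrySet().stream()
            .filter(entry -> entry.getValue() > 1)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
        System.out.println(findDuplicates(myList)); // [0, 1, 5]
    }
}
```

Unlike the filter-based version, this one reports each duplicated value once, and it does not depend on the stream being sequential.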

How to find the frequency of duplicates in a List

Another approach to find duplicates in a Java list is to use the frequency method of the Collections class.

This example prints out the number of times each unique element in the List occurs, which is a bit of a twist on the original requirement.

The following code accomplishes three things:

Creates a HashSet of unique values based on the original list.
Loops through the unique elements in the HashSet.
Prints out the occurrence of each unique element in the List.

import java.util.Collections;
import java.util.HashSet;
import java.util.List;

List<Object> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
// The set holds every distinct value, not only the duplicated ones.
HashSet<Object> uniqueItems = new HashSet<>(myList);

for (Object item : uniqueItems) {
  System.out.print(item);
  System.out.print(" Occurrences: ");
  System.out.println(Collections.frequency(myList, item));
}

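When this code runs, it prints out the following. A HashSet does not guarantee any iteration order, so the lines may appear in a different order with other data: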

0 Occurrences: 3
1 Occurrences: 3
2 Occurrences: 1
3 Occurrences: 1
5 Occurrences: 2
6 Occurrences: 1
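
To get back to the original requirement, the frequency check can be used as a filter so that only values occurring more than once are kept. The sketch below assumes that goal; the class and method names are hypothetical:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FrequencyDuplicatesSketch {

    // Hypothetical helper: keeps each distinct value whose frequency in the
    // list is greater than one, in first-seen order.
    static <T> Set<T> findDuplicates(List<T> items) {
        Set<T> duplicates = new LinkedHashSet<>();
        for (T candidate : new LinkedHashSet<>(items)) {
            if (Collections.frequency(items, candidate) > 1) {
                duplicates.add(candidate);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
        System.out.println(findDuplicates(myList)); // [0, 1, 5]
    }
}
```

Each call to Collections.frequency scans the whole list, so this version does one full scan per distinct value. That is fine for short lists but slower than a single-pass HashSet approach on long ones.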

Duplicates in Java Lists

There are many approaches to finding duplicate elements in a list, from a brute-force tackling of the problem to the more efficient use of HashSets and the Java Streams API.

Assess your personal use case, and decide which approach works best for you.

09 Jan 2024