https://www.theserverside.com/blog/Coffee-Talk-Java-News-Stories-and-Opinions/Find-duplicates-in-a-List-in-Java
It’s easy to remove duplicates from a list in Java. There are a variety of functions in Java that simplify that process.
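As a quick illustration of that claim, here is a minimal sketch of two common ways to strip duplicates: copying the list into a LinkedHashSet, and calling distinct() on a stream. The class and method names are made up for this example and are not from any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, named only for this sketch.
class RemoveDuplicatesSketch {

    // A LinkedHashSet drops repeated elements and keeps first-seen order.
    static <T> List<T> withLinkedHashSet(List<T> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    // Stream.distinct() keeps only the first occurrence of each equal element.
    static <T> List<T> withDistinct(List<T> items) {
        return items.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
        System.out.println(withLinkedHashSet(myList)); // [0, 1, 2, 3, 5, 6]
        System.out.println(withDistinct(myList));      // [0, 1, 2, 3, 5, 6]
    }
}
```

Both versions keep one copy of every element, so neither tells you which elements were repeated.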
Finding duplicates in a Java list? That’s actually a bit more complicated, but finding the dupes in a List is by no means an impossible task.
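The most common approaches to finding duplicates in a List in Java include a brute-force nested loop, a HashSet, a Java Stream combined with a HashSet, and the frequency method of the Collections class.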
A brute-force approach to solve this problem involves going through the list one element at a time and looking for a match. If a match is found, the matching item is put in a second list.
import java.util.ArrayList;
import java.util.List;
List<Object> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
List<Object> duplicates = new ArrayList<>();
// Compare each element with every element that comes after it.
for (int x = 0; x < myList.size(); x++) {
    for (int y = x + 1; y < myList.size(); y++) {
        if (myList.get(x).equals(myList.get(y))) {
            duplicates.add(myList.get(x));
            break; // one later match is enough for this element
        }
    }
}
System.out.println(duplicates);
This code, when executed, prints each item that is repeated later in the list:
[0, 1, 1, 5, 0]
If you only need a unique list of the duplicate items, you could use a HashSet instead of an ArrayList to hold duplicates. Here’s the code to do that:
import java.util.HashSet;
import java.util.List;
List<Object> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
HashSet<Object> duplicates = new HashSet<>();
for (int x = 0; x < myList.size(); x++) {
    for (int y = x + 1; y < myList.size(); y++) {
        if (myList.get(x).equals(myList.get(y))) {
            duplicates.add(myList.get(x)); // the set ignores repeated adds
            break;
        }
    }
}
System.out.println(duplicates);
When this code runs, it prints out the unique set of duplicates in the list, which is:
[0, 1, 5]
When items are added to a HashSet, the add method returns true if the item is new, and false if the item is a duplicate.
We can use this behavior to find duplicates in a single pass over the list, which improves both the speed and readability of our algorithm.
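To see that return value in isolation, here is a minimal sketch. The class name and the isRepeat helper are hypothetical and exist only for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, named only for this sketch.
class HashSetAddSketch {

    // Returns true when the value was already in the set, because add() then returns false.
    static boolean isRepeat(Set<Integer> seen, int value) {
        return !seen.add(value);
    }

    public static void main(String[] args) {
        Set<Integer> seen = new HashSet<>();
        System.out.println(seen.add(5));       // true: 5 is new to the set
        System.out.println(seen.add(5));       // false: 5 is already in the set
        System.out.println(isRepeat(seen, 7)); // false: 7 is new
        System.out.println(isRepeat(seen, 7)); // true: 7 was added by the previous call
    }
}
```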
In the improved duplicate finder, create a second List to hold the duplicates. First try to add items to the HashSet, and if the HashSet indicates the item is already in the set, add that duplicate to the List:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
List<Object> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
HashSet<Object> uniqueItems = new HashSet<>();
List<Object> duplicates = new ArrayList<>();
for (Object item : myList) {
    // add returns false when the item is already in the set
    if (!uniqueItems.add(item)) {
        duplicates.add(item);
    }
}
System.out.println(duplicates);
When this code runs it prints out the following result:
[1, 0, 0, 1, 5]
We can combine the HashSet technique above with a Java Stream to create a very succinct mechanism. The code below uses that combination to find the duplicates in the Java List:
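import java.util.HashSet;
import java.util.List;
List<Object> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
HashSet<Object> uniqueItems = new HashSet<>();
List<Object> duplicates = myList.stream()
    .filter(n -> !uniqueItems.add(n)) // keep only items the set has already seen
    .toList(); // Stream.toList() requires Java 16 or later
System.out.println(duplicates);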
When this code runs, it prints out:
[1, 0, 0, 1, 5]
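If you want each duplicate listed only once, one option is to collect the filtered stream into a set instead of a list. The sketch below is a hedged variation on the stream approach, not a definitive implementation: the class and method names are hypothetical, and it assumes a sequential stream, because the filter changes the seen set as a side effect and a plain HashSet is not safe to share across the threads of a parallel stream.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class, named only for this sketch.
class UniqueDuplicatesSketch {

    // Returns each repeated element once, in the order its first repeat was seen.
    static <T> Set<T> uniqueDuplicates(List<T> items) {
        Set<T> seen = new HashSet<>();
        return items.stream() // assumed sequential: the filter below mutates 'seen'
                .filter(item -> !seen.add(item))
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
        System.out.println(uniqueDuplicates(myList)); // [1, 0, 5]
    }
}
```

The LinkedHashSet keeps the order in which the repeats were found. A plain HashSet would also work if order does not matter.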
Another approach to find duplicates in a Java list is to use the frequency method of the Collections class.
This example prints out the number of times each unique element in the List occurs, which is a bit of a twist on the original requirement.
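The following code copies the list into a HashSet to get its unique elements, loops through that set, and prints how many times each element occurs in the original list: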
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
List<Object> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
// The set holds every distinct element, not only the duplicated ones.
HashSet<Object> uniqueItems = new HashSet<>(myList);
for (Object item : uniqueItems) {
    System.out.print(item);
    System.out.print(" Occurrences: ");
    System.out.println(Collections.frequency(myList, item));
}
When this code runs, it prints out:
0 Occurrences: 3
1 Occurrences: 3
2 Occurrences: 1
3 Occurrences: 1
5 Occurrences: 2
6 Occurrences: 1
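To get back to the original requirement and report only the elements that occur more than once, the counts can be filtered. The sketch below is one possible way to do that, not the only one. It swaps Collections.frequency for a single counting pass with Map.merge, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, named only for this sketch.
class DuplicateCountsSketch {

    // Maps each element that occurs at least twice to its number of occurrences.
    static <T> Map<T, Integer> duplicateCounts(List<T> items) {
        Map<T, Integer> counts = new LinkedHashMap<>();
        for (T item : items) {
            counts.merge(item, 1, Integer::sum); // start at 1, then add 1 per repeat
        }
        counts.values().removeIf(count -> count < 2); // drop elements seen only once
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(0, 1, 1, 2, 3, 5, 6, 0, 0, 1, 5);
        System.out.println(duplicateCounts(myList)); // {0=3, 1=3, 5=2}
    }
}
```

Counting in a map walks the list once, whereas Collections.frequency scans the whole list again for every element it is asked about.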
There are many approaches to find and identify duplicate elements in a list, from a brute-force tackling of the problem, to the efficient use of a HashSet and the Java Streams API.
Assess your personal use case, and decide which approach works best for you.
09 Jan 2024