Comparison in Java

We sort and compare items as humans because it helps us find things faster.

Consider the following objects:

List<Book> books = new ArrayList<>();
books.add(new Book("Pride and Prejudice", "Austen, Jane", 1813));
books.add(new Book("Emma", "Austen, Jane", 1815));
books.add(new Book("Frankenstein", "Shelley, Mary", 1818));

They are of class Book, which looks roughly like this:

public class Book {
    String title;
    String author;
    int publicationYear;
    public Book(String title, String author, int publicationYear) {
        this.title = title;
        this.author = author;
        this.publicationYear = publicationYear;
    }

    // ... methods ...
}

To move along our example, we'll need to be able to print an object of the Book class, so we'll override Java's toString() method.

@Override
public String toString() {
    return "**" + this.title + "**" +
           " by " + this.author + 
           " (" + this.publicationYear + ")";
}

Now if we wrap up our main method with a System.out.println(books);, we'll get the following output:

[**Pride and Prejudice** by Austen, Jane (1813), **Emma** by Austen, Jane (1815), **Frankenstein** by Shelley, Mary (1818)]

Sort these books

If I ask you to sort these books, perhaps you do so by author, or by title, or by publicationYear. All the fields of this class are sortable. Maybe you ask me a follow-up question to determine how I want them sorted.

In fact, that's exactly what Java does. If we add a sort statement before printing, it tells us it doesn't know how we want to compare book objects.

Collections.sort(books);

The output is not particularly friendly.

Comparison.java:30: error: no suitable method found for sort(List<Book>)
        Collections.sort(books);
                   ^
    method Collections.<T#1>sort(List<T#1>) is not applicable
      (inference variable T#1 has incompatible bounds
        equality constraints: Book
        lower bounds: Comparable<? super T#1>)
    method Collections.<T#2>sort(List<T#2>,Comparator<? super T#2>) is not applicable
      (cannot infer type-variable(s) T#2
        (actual and formal argument lists differ in length))
  where T#1,T#2 are type-variables:
    T#1 extends Comparable<? super T#1> declared in method <T#1>sort(List<T#1>)
    T#2 extends Object declared in method <T#2>sort(List<T#2>,Comparator<? super T#2>)
1 error

Basically, this boils down to needing a Comparator<T> or a Comparable<T>. But what are those?

Java classes for Sorting

What is a Comparable<T>?

Comparable is a Java interface which can be implemented on a class to define the default order of a class. Maybe you have a User class in your application, and you always want to sort them by their String username.

public class User implements Comparable<User> {
    String username;
    // ... et. cetera ...

    /** Compare this User to another. */
    int compare(User other) {
        return this.username.compareTo(other.username);
    }
}

Often Comparable<T> makes sense for your objects. Here, the T is not a class inside this class (as it is in List<T>) but actually it is the name of the class you are making Comparable.

Note that there are some advanced situations where you might make a class Comparable to another class besides itself, but it would be a super-class of the current class.

With this definition in place, we can call Java's sorting methods:

List<User> users = //...
Collections.sort(listOfUsers);

What is a Comparator<T>?

Sometimes there is not an obvious default order for a class. Let's go back to our book example.

public class Book {
    String title;
    String author;
    int publicationYear;
    // ... snip ...
}

We may want to sort by title at one moment, then later by author, and then later by publicationYear. With only Comparable, we would need to have 3 different Book classes and we would need to shuffle data in and out.

Instead, we can make three Comparator classes that go with the Book class.

public class Book {
    String title;
    String author;
    int publicationYear;
    // ... snip ...

    /** This class lets us sort Books by title. */
    static class TitleCmp implements Comparator<Book> {
        public int compare(Book left, Book right) {
            return left.title.compareTo(right.title);
        }
    }
}

With a Comparator defined, we can then sort a list of books with that specific Comparator object.

List<Book> books = //...
Collections.sort(books, new Book.TitleCmp());
System.out.println(books);

We now get books alphabetically by title:

[**Emma** by Austen, Jane (1815), **Frankenstein** by Shelley, Mary (1818), **Pride and Prejudice** by Austen, Jane (1813)]

Java 8 Lambdas (easier Comparators!).

In Java 8, a simpler way to create objects from classes that only define 1 method was introduced: lambda syntax.

For instance, we can print out all three orders concisely as:

Collections.sort(books, 
  (left, right) -> left.author.compareTo(right.author));
System.out.println(books);
Collections.sort(books, 
  (left, right) -> left.title.compareTo(right.title));
System.out.println(books);
Collections.sort(books, 
  (left, right) -> Integer.compare(left.publicationYear, right.publicationYear));
System.out.println(books);

But we have to know that we're defining a Comparator and it's method takes two arguments. Nicely enough, Java will figure out the rest.

Comparable vs. Comparator

TL;DR: Here's a table comparing them:

Method Left Right Comments
int compare(T left, T right) left right Comparators take in two arguments.
int compare(T other) this other Comparables take in one argument and compare to themselves.

Why do compare methods return an int?

Compare methods are a replacement for <, >, and == for your new class, and therefore have three possible outputs.

Situation Output
a < b a negative number
a == b 0
a > b a positive numbers

If you think about integer subtraction, int compare(int a, int b) { return a - b; } performs appropriately, returning numbers that are interpretable as all three comparisons!

Sorting in the other direction:

Because it's an integer, we can also sort in the opposite direction, descending (a.ka., Z to A, biggest to smallest, etc.) by writing a Comparator or Comparable that takes the negative of an existing one. We also have access to Collections.reverse(list).

Why is comparing Strings fairly slow?

When you have other types, you tend to be fancier. For instance, here's a sketch of how to compare two strings:

class StrCmp implements Comparator<String> {
    int compare(String left, String right) {
        int N = left.length();
        // Can't be equal if they have different sizes.
        if (right.length() != N) {
            // Shorter strings first:
            return Integer.compare(N, right.length());
        }

        // If they have the same size...
        for (int i = 0; i < N; i++) {
            char lc = left.charAt(i);
            char rc = right.charAt(i);
            // If any character is different, exit immediately with which direction!
            if (lc != rc) {
                return Character.compare(lc, rc);
            }
        }
        // If we got to the end, they must be the same.
        return 0;
    }
}

Never use this code; trust that java.lang.String is a Comparable already and just call left.compareTo(right); as we've been doing earlier.

Sorting IRL: How do I compare...?

Class/Type Method
byte Byte.compare(x, y)
short Short.compare(x, y)
int Integer.compare(x, y)
long Long.compare(x, y)
char Character.compare(x, y)
String x.compareTo(y)
Comparable<T> x.compareTo(y)
Your class here. Define it yourself.