How Should I Compare String Objects?

You can compare String objects in a variety of ways, and the results are often different. The correctness of your result depends largely on what type of comparison you need. Common comparison techniques include the following:

  • Compare with the == operator.
  • Compare with a String object’s equals method.
  • Compare with a String object’s compareTo method.
  • Compare with a Collator object.

Comparing with the == Operator

The == operator compares String object references. If two String variables point to the same object in memory, the comparison returns true. Otherwise, it returns false, regardless of whether the two strings contain the same char values. The == operator does not compare the actual char data. Without this clarification, you might be surprised that the following code snippet prints The strings are unequal.

String name1 = "Michèle";
String name2 = new String("Michèle");
if (name1 == name2) {
    System.out.println("The strings are equal.");
} else {
    System.out.println("The strings are unequal.");
}

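The Java platform maintains an internal pool for string literals and compile-time constant strings. Each distinct sequence of char values among these literals and constants exists exactly once in the pool, so every literal or constant with that content refers to the same String object. As a result, comparing two string literals or constants that have the same char values with == always returns true. A string created with new String, like name2 above, is a separate object outside the pool, which is why the previous comparison fails.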
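Here is a minimal sketch of that pooling behavior. The class name StringPoolDemo is hypothetical. The example also uses String.intern(), which the tip doesn't mention: it returns the pooled instance for a given char sequence. A constant expression such as "Mich" + "èle" is evaluated at compile time and pooled as well.

```java
public class StringPoolDemo {
    // Two identical literals refer to the same pooled String object.
    static final String LITERAL_1 = "Michèle";
    static final String LITERAL_2 = "Michèle";
    // new String(...) always allocates a distinct object, even for pooled text.
    static final String CONSTRUCTED = new String("Michèle");
    // intern() returns the pooled instance that has the same char values.
    static final String INTERNED = CONSTRUCTED.intern();
    // A compile-time constant expression is also pooled.
    static final String CONCATENATED = "Mich" + "èle";

    public static void main(String[] args) {
        System.out.println(LITERAL_1 == LITERAL_2);        // true
        System.out.println(LITERAL_1 == CONSTRUCTED);      // false
        System.out.println(LITERAL_1 == INTERNED);         // true
        System.out.println(LITERAL_1 == CONCATENATED);     // true
        System.out.println(LITERAL_1.equals(CONSTRUCTED)); // true
    }
}
```

Relying on == still isn't a good habit, even with pooling. Strings read from files, user input, or databases are generally not pooled, so equals remains the safe choice for content comparison.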

Comparing with the equals Method

The equals method compares the actual char content of two strings. This method returns true when two String objects hold char data with the same values. This code sample prints The strings are equal.

String name1 = "Michèle";
String name2 = new String("Michèle");
if (name1.equals(name2)) {
    System.out.println("The strings are equal.");
} else {
    System.out.println("The strings are unequal.");
}

Comparing with the compareTo Method

The compareTo method compares char values in the same way as the equals method. In addition, it returns a negative integer if its own String object precedes the argument string, zero if the strings are equal, and a positive integer if the object follows the argument string. The compareTo method says that cat precedes hat. The most important thing to understand about this comparison is that the method compares the char values literally: it determines that ‘c’ in cat has a smaller numeric value than ‘h’ in hat.

String w1 = "cat";
String w2 = "hat";
int comparison = w1.compareTo(w2);
if (comparison < 0) {
    System.out.printf("%s < %s%n", w1, w2);
} else {
    System.out.printf("%s < %s%n", w2, w1);
}

The code sample above demonstrates the behavior of the compareTo method and prints cat < hat. That is the result we expect, so where’s the problem?

Producing Errors

A problem appears when you want to compare text as natural language, the way you do when you use a dictionary. The String class can’t compare text from a natural language perspective. Its equals and compareTo methods compare the individual char values in the strings. If both strings have the same length, and the char value at each index n in name1 equals the char value at index n in name2, the equals method returns true.

Ask the same compareTo method to compare cat and Hat, and it produces a result that would confuse most students. Any second grader knows that cat still precedes Hat regardless of capitalization. However, the compareTo method tells you that Hat < cat, because uppercase letters precede lowercase letters in the Unicode character table: ‘H’ is U+0048 (72) and ‘c’ is U+0063 (99). ASCII uses the same ordering. Clearly, this ordering is not always what you want when you present sorted text to your application users.
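To make the literal char comparison visible, the sketch below prints the actual values that String.compareTo returns. The class name CompareToDemo is hypothetical. String’s documentation specifies the exact value: for the first index k where the strings differ, the result is this.charAt(k) - other.charAt(k); if one string is a prefix of the other, the result is the length difference. Code that uses compareTo should normally test only the sign.

```java
public class CompareToDemo {
    public static void main(String[] args) {
        // The numeric char values involved.
        System.out.println((int) 'c' + " " + (int) 'h' + " " + (int) 'H'); // 99 104 72
        // First differing chars decide the result: 'c' - 'h' = 99 - 104.
        System.out.println("cat".compareTo("hat"));  // -5
        // 'H' - 'c' = 72 - 99, so String ordering puts Hat before cat.
        System.out.println("Hat".compareTo("cat"));  // -27
        System.out.println("cat".compareTo("Hat"));  // 27
        // "cat" is a prefix of "cats", so the result is 3 - 4.
        System.out.println("cat".compareTo("cats")); // -1
    }
}
```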

Another potential problem appears when you test strings for equality, because the same text can have multiple internal representations. For example, the name “Michèle” can be stored as the Unicode character sequence M i c h è l e, where è is the single precomposed character U+00E8. However, you can also use the sequence M i c h e ` l e, where ` stands for U+0300 COMBINING GRAVE ACCENT. The second version of the name uses a “combining sequence” (‘e’ + ‘`’) to represent ‘è’. Graphical systems that understand Unicode display these two representations identically, even though their internal character sequences are different. A String object’s simplistic equals method says that these two strings have different text. They are not lexicographically equal, but they are definitely equal linguistically.

The following code snippet prints The strings are unequal. Neither the equals method nor the compareTo method understands the linguistic equivalence of these strings.

String name1 = "Michèle";
String name2 = "Miche\u0300le"; // U+0300 is the COMBINING GRAVE ACCENT
if (name1.equals(name2)) {
    System.out.println("The strings are equal.");
} else {
    System.out.println("The strings are unequal.");
}
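To see why equals fails, it helps to print the code points of both spellings. The sketch below does that. The class name CodePointDemo and the helper describe are hypothetical. Unicode escapes are used so the precomposed and combining forms are unambiguous in the source.

```java
public class CodePointDemo {
    // Formats each code point as U+XXXX so invisible differences become visible.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String precomposed = "Mich\u00e8le";  // U+00E8 LATIN SMALL LETTER E WITH GRAVE
        String combining = "Miche\u0300le";   // 'e' followed by U+0300 COMBINING GRAVE ACCENT
        // 7: U+004D U+0069 U+0063 U+0068 U+00E8 U+006C U+0065
        System.out.println(precomposed.length() + ": " + describe(precomposed));
        // 8: U+004D U+0069 U+0063 U+0068 U+0065 U+0300 U+006C U+0065
        System.out.println(combining.length() + ": " + describe(combining));
    }
}
```

The strings differ even in length (7 versus 8 chars), so any char-by-char comparison treats them as different text.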

If you’re trying to sort a list of names, the results of String’s compareTo method are almost certainly wrong. If you want to search for a name, again the equals method will subtly trip you up if your user enters combining sequences…or if your database normalizes data differently from how the user enters them. The point is that String’s simplistic comparisons are wrong whenever you are working with natural language sorting or searching. For these operations, you need something more powerful than simple char value comparisons.
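For the equality and search case specifically, one option that this tip doesn’t otherwise cover is to bring both strings into the same Unicode normalization form before calling equals. Below is a minimal sketch using java.text.Normalizer. The helper name canonicallyEqual is hypothetical. Normalization solves only the combining-sequence problem. It does not make comparisons case- or accent-insensitive, and it doesn’t fix sort order, which is where a Collator comes in.

```java
import java.text.Normalizer;

public class NormalizedEquals {
    // Hypothetical helper: converts both strings to Normalization Form C (composed)
    // before comparing, so precomposed and combining spellings of the same text match.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "Mich\u00e8le";
        String combining = "Miche\u0300le";
        System.out.println(precomposed.equals(combining));           // false
        System.out.println(canonicallyEqual(precomposed, combining)); // true
        // Case is still significant after normalization.
        System.out.println(canonicallyEqual("Mich\u00e8le", "mich\u00e8le")); // false
    }
}
```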

Using a Collator

The java.text.Collator class provides natural language comparisons. Natural language comparisons depend upon locale-specific rules that determine the equality and ordering of characters in a particular writing system.

A Collator object understands that people expect “cat” to come before “Hat” in a dictionary. Using a collator comparison, the following code prints cat < Hat.

import java.text.Collator;
import java.util.Locale;

Collator collator = Collator.getInstance(Locale.US);
int comparison = collator.compare("cat", "Hat");
if (comparison < 0) {
    System.out.printf("%s < %s%n", "cat", "Hat");
} else {
    System.out.printf("%s < %s%n", "Hat", "cat");
}
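Sorting is where this difference matters most. Collator implements Comparator&lt;Object&gt;, so you can pass a Collator instance directly to List.sort. The sketch below contrasts String’s natural ordering with a US English collator. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {
    // Sorts a copy using String's natural ordering (String.compareTo).
    static List<String> sortWithString(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(null); // null comparator means natural ordering
        return copy;
    }

    // Sorts a copy using the locale's collation rules.
    static List<String> sortWithCollator(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cat", "Hat", "apple", "Banana");
        System.out.println(sortWithString(words));              // [Banana, Hat, apple, cat]
        System.out.println(sortWithCollator(words, Locale.US)); // [apple, Banana, cat, Hat]
    }
}
```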

A collator can recognize that the character sequences M i c h è l e and M i c h e ` l e represent the same text. This matters whenever you compare text as natural language rather than as raw char values.

The following comparison uses a Collator object. It recognizes the combining sequence and evaluates the two strings as equal. It prints this: The strings are equal.

Collator collator = Collator.getInstance(Locale.US);
String name1 = "Michèle";
String name2 = "Miche\u0300le"; // U+0300 is the COMBINING GRAVE ACCENT
int comparison = collator.compare(name1, name2);
if (comparison == 0) {
    System.out.println("The strings are equal.");
} else {
    System.out.println("The strings are unequal.");
}
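A Collator also has a decomposition mode. If you’d rather not depend on how a particular collator’s tables handle precomposed characters, a conservative option is to request canonical decomposition explicitly. With Collator.CANONICAL_DECOMPOSITION set, the collator decomposes input (Unicode Normalization Form D) before comparing. Here is a minimal sketch; the class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

public class DecompositionDemo {
    // Returns a US English collator that canonically decomposes its input
    // before comparing, so precomposed and combining forms compare equal.
    static Collator canonicalCollator() {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator;
    }

    public static void main(String[] args) {
        Collator collator = canonicalCollator();
        String name1 = "Mich\u00e8le";  // precomposed e with grave accent
        String name2 = "Miche\u0300le"; // e followed by combining grave accent
        if (collator.compare(name1, name2) == 0) {
            System.out.println("The strings are equal.");
        } else {
            System.out.println("The strings are unequal.");
        }
    }
}
```

Decomposition costs some speed, which is why it is a setting you opt into rather than something every comparison pays for.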

A Collator object can even distinguish several “levels” of character differences. For example, e and d are different letters, so their difference is a “primary” difference. The letters e and è are different too, but their difference is a “secondary” one. A difference in case, such as e and E, is typically a “tertiary” difference. You control which levels a Collator instance considers by setting its strength. With the strength set to Collator.PRIMARY, the collator ignores secondary and tertiary differences, so it considers the words “Michèle” and “Michele” to be equal. The following code prints The strings are equal.

Collator collator = Collator.getInstance(Locale.US);
collator.setStrength(Collator.PRIMARY);
int comparison = collator.compare("Michèle", "Michele");
if (comparison == 0) {
    System.out.println("The strings are equal.");
} else {
    System.out.println("The strings are unequal.");
}
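The sketch below runs the same three pairs through a US English collator at each strength, so you can see which differences each level keeps. The class name StrengthDemo and the helper compareAt are hypothetical. Canonical decomposition is requested so that è is treated as e plus a combining accent. The exact assignment of differences to levels is locale-dependent, so treat this as an illustration for Locale.US.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {
    // Compares a and b with a US English collator at the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"e", "d"},      // different letters: primary difference
            {"e", "\u00e8"}, // accent: secondary difference
            {"e", "E"}       // case: tertiary difference
        };
        int[] strengths = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
        String[] names = {"PRIMARY", "SECONDARY", "TERTIARY"};
        for (int i = 0; i < strengths.length; i++) {
            for (String[] p : pairs) {
                boolean equal = compareAt(strengths[i], p[0], p[1]) == 0;
                System.out.printf("%-9s %s vs %s: %s%n",
                        names[i], p[0], p[1], equal ? "equal" : "different");
            }
        }
    }
}
```

At PRIMARY only the e/d pair should differ. SECONDARY adds the accent difference, and TERTIARY also distinguishes case.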

Summary

Consider when the equals method is more appropriate than the == operator. Also, when you need to order text, consider whether a Collator object’s natural language comparison is needed. After you consider the subtle differences among the various comparisons, you might discover that you’ve been using the wrong API in some places. Knowing the differences helps you make the right choices for your applications and customers.

