Five ways to count vowels

Posted on 15 September 2015

Hi. I’m Niels and I’m a sucker for benchmarks!

We had a recent chat on how trivial bits of code and the pro’s and con’s of optimizing for speed versus optimizing for readability. We came up with a very simple problem; counting vowels. The vowels are obviously just a list of characters; a, e, i, o and u. Counting the amount of vowels in a sentence is simply a matter of iterating through the characters of the text and compare them to each vowel in the string.

So simple that it’s in fact trivial to figure out five different approaches to the problem. But which one is the fastest? Time to break out one of the most awesome tools in our toolbox: the benchmark. I’ve been using JMH a lot and it’s very helpful to quickly answer these essential questions. The 5 approaches are a search through an array of vowels, a binary search through the same array, a hash-set lookup, a String.contains() and a simply if-block with 5 comparison expressions. The code (boilerplate omitted):

``````    public int withIf() {
int vowels = 0;
for(int i = 0;i < TEST_STRING.length();i++) {
char c = TEST_STRING.charAt(i);
if(c == 'a' || 'c' == 'e' || c == 'i' || c == 'o' || c == 'u') {
vowels++;
}
}

return vowels;
}

public int withArray() {
int vowels = 0;
for(int i = 0;i < TEST_STRING.length();i++) {
char c = TEST_STRING.charAt(i);
for(int j = 0;j < VOWELS.length;j++) {
if(VOWELS[j] == c) {
vowels++;
}
}
}

return vowels;
}

public int withHashSet() {
int vowels = 0;
for(int i = 0;i < TEST_STRING.length();i++) {
char c = TEST_STRING.charAt(i);
if(VOWEL_SET.contains(c)) {
vowels++;
}
}

return vowels;
}

public int withArrayBinSearch() {
int vowels = 0;
for(int i = 0;i < TEST_STRING.length();i++) {
char c = TEST_STRING.charAt(i);
if(Arrays.binarySearch(VOWELS, c) >= 0) {
vowels++;
}
}

return vowels;
}

public int withString() {
int vowels = 0;
for(int i = 0;i < TEST_STRING.length();i++) {
if(VOWEL_STRING.contains(TEST_STRING.substring(i, i + 1))) {
vowels++;
}
}

return vowels;
}``````

Can you guess which one is the fastest? The results aren’t very surprising:

``````Benchmark                          Mode   Samples        Score  Score error    Units
m.IfVsArray.withArray             thrpt         5    17137,384     2295,428   ops/ms
m.IfVsArray.withArrayBinSearch    thrpt         5     4037,655      344,203   ops/ms
m.IfVsArray.withHashSet           thrpt         5     7014,505      591,107   ops/ms
m.IfVsArray.withIf                thrpt         5    22911,273     2570,412   ops/ms
m.IfVsArray.withString            thrpt         5     1246,593       56,895   ops/ms``````

The if block is fastest with 22911 ops/second on average. The array search (O(n)) is second at 17137. The binary search (O(log n)) and hash set (O(1)-ish) obviously suffer a lot from the method calling overhead and, in the case of the set, autoboxing. The String.contains method is by far the worst; suffering from object creation, method call overhead and an O(n) complexity.

So, is this relevant? In most cases it isn’t. Obviously the benchmark is very synthetic. But if you’re working on some low level code that’s running very hot just figuring out a few approaches and throwing them into a quick benchmark might be worth it! And heck, it’s fun!