Binary search explained

As I mentioned in my welcome post, I’m going to write a couple of blogs related to git bisect. The crucial part of this command (which is even hidden in its name) is binary search. I believe that the majority of those who read this text are pretty well aware of what it is and how it works. If you’re among them or prefer to read Wiki instead feel free to close this page and return to my blog when I have something more interesting to tell. The rest of you I kindly invite to continue reading.

The primary motivation behind this post is to make sure that I have the same understanding of basic concepts as a reader. The secondary goal is to practice a bit. I have never blogged in English before, and I was a bit worried about it. But then I decided that if I can manage to explain such a simple thing as binary search, then it shouldn’t be hard to do the rest. So targets are set, and I can finally start.

Definition Link to heading

According to Wikipedia, binary search is a search algorithm that finds the position of a target value within a sorted array. Simply speaking, if you a) have a list of something b) sorted according to some rules and c) want to find something else in this list, you most probably want to use binary search. What are alternatives you might ask? The naïve approach would be to examine elements of the list one by one until the target element is found or the end of the list is reached. The fancy name of this algorithm is linear search.

Linear search Link to heading

Imagine we have the following sequence of integers.

1 3 4 6 7 8 10 13 14 18 19 21 24 37 40 45 71

What is the index of 7? I don’t know what you’re doing now, but I just clicked on each number in the list until reached the sought-for. I did five clicks what gives us 4 as the desired index. We’re lucky, and our number is located somewhere close to the beginning. But what if it’s the last or even not present at all?

Let n be the number of elements in the list.

n	clicks
8	8
128	128
1024	1024

Doing 1k clicks may take a while. That’s where binary search comes to the rescue.

Binary search Link to heading

The binary search algorithm exploits the property b) from the above namely the fact that the list is sorted. If we compare the target element to an arbitrary element from the list, we’ll be able to make the following conclusions.

the target element is to the left if it’s lesser
the target element is to the right if it’s greater
the target element is found if it’s equal

This is all thanks to the ordered nature of the list. Now we only need.

Take the middle element.
Compare it to the target one.
Take the appropriate half.
Repeat until found or nothing left.

This is it. Let’s apply it to our sequence.

1 3 4 6 7 8 10 13 14 18 19 21 24 37 40 45 71

range	middle	value	comparison	action
[0,16]	17 / 2 ~ 9	9th element is 14	7 is lesser	go left
[0,7]	8 / 2 = 4	4th element is 6	7 is greater	go right
[4,7]	4 / 2 = 2	(4+2)th element is 8	7 is lesser	go left
[4,4]	1 / 2 ~ 1	(4+1)th element is 7	7 is equal	found

It’s important to remember that the index of the 9th element is 8. The rest should be trivial. Below the same actions are present in a picture.

AlwaysAngry [CC BY-SA 4.0], from Wikimedia Commons

We “clicked” four times, it’s a bit better than five. But how binary search performs on a different number of elements?

n	clicks
8	3
128	7
1024	10

We’ve got ten comparisons instead of a thousand. Sold! Or? Though the algorithm is straightforward and efficient, there are still some pitfalls, so even Java didn’t make it at the first attempt. Fortunately, we don’t have to write yet another implementation and can use an existing one from git. But this is the topic of the next post, stay tuned.

Started a series of blog posts on git bisect. The first one gives a short explanation of what binary search is. https://t.co/KnJFta9HpC
— Artem Khvastunov (@art_spb) January 5, 2019

Update. Roman Elizarov recently wrote an article on how to properly implement binary search. Check it out, it has a couple of interesting plot twists.

Binary search is a simple algorithm, but can you code a straightforward and correct implementation of it? It seems that it is not quite easy to figure out unless you were taught to do it. https://t.co/WlYi2cQsim
— Roman Elizarov (@relizarov) June 16, 2019