Search algorithm and the golden ratio

From time to time, I need to insert an item to the correct index in an already sorted collection. One good example is a list of contacts, where the list is already sorted by the last name, and displayed somewhere in the UI. If the user adds a new contact, the new one needs to appear at the correct position.

So I wrote 2 little helper methods to do just that:

private static int ConditionalIndex<T>(this IList<T> list, T item, Func<T, T, int> comparer)
{
    var length = list.Count;
    for (var i = 0; i < length; i++)
        if (comparer(list[i], item) > 0)
            return i;
    return length;
}

public static void ConditionalInsert<T>(this IList<T> list, T item, Func<T, T, int> comparer)
{
    var index = list.ConditionalIndex(item, comparer);
    if (index >= list.Count) list.Add(item);
    else list.Insert(index, item);
}

And I use them like this:

var newContact = new Contact
{
    FirstName = "David",
    LastName = "Anderson"
};

MyContacts.ConditionalInsert(newContact, (c1,c2) => string.Compare(c1.LastName, c2.LastName));

This has been working fine until one day the list had grown big enough to take quite a few seconds to just add a new contact, and the user experience isn’t that good anymore.

The problem is that the ConditionalIndex method is just using a simple loop to search the index. So if the list has 10,000 items and it happens to be so unlucky that the new item needs to be inserted to the end of the list, it would take the method to go through all 10,000 items and execute that comparer delegate 10,000 times.

Therefore, I decided to optimize it to use a smarter algorithm.

After some digging around, I’ve chosen the Golden Section Search. It is a method that effectively reduces the range to be searched to find the target much faster.

To illustrate this idea in a simple way, let’s say we have a list of numbers from 0 to 9 but missing the number 7, and I want to insert 7 to this list. Golden section search will find the correct index by doing the following:

  1. Determine a split index, using the length of the search range divided by the Golden Ratio. In this case, it would be 10 ÷ 1.618 rounded up to integer, which is 6.
  2. Compare the item at index 6 with the item to be inserted, which is comparing number 6 and 7.
  3. If the item in the list is smaller, take the list before the split index as the new search range and repeat the above 2 steps. If the item is larger, take the list after the split index. If the item is the same, then the correct index is found.
  4. In our example, it would mean to take 8 and 9. The new search range length is 2 so the new split index is 2 ÷ 1.618 ≈ 1.
  5. Compare 9 to 7 means taking 8 as the new search range.
  6. Since there’s only 1 item left in the search range, simply compare 8 to 7 and we get the correct index to insert is the index before 8, which is 7.

My actual implementation of this algorithm is as follow:

public static int ConditionalIndex<T>(this IList<T> list, T item, Func<T, T, int> comparer)
{
    var length = list.Count;
    if (length == 0)
        return 0;

    const double goldenRatio = 1.618;
    var startIndex = 0;
    var endIndex = length - 1;
    while (true)
    {
        // If the start index is larger then the end index,
        // that means the search has already passed all necessary
        // search ranges, just return the start index.
        if (startIndex > endIndex)
            return startIndex;

        // If the start index is the same as the end index,
        // that means there is only 1 item left to compare.
        // Compare this item and return the result.
        if (startIndex == endIndex)
            // If the item in the list is larger than the given item,
            // retrun the index of the item in the list, otherwise return the next index.
            return comparer(list[startIndex], item) > 0 ? startIndex : startIndex + 1;

        // If there are still multiple items in the search range,
        // find the split index of the search range.
        var splitIndex = (int)Math.Round(endIndex - (endIndex - startIndex) / goldenRatio);

        // Compare the item sitting at the spliting point.
        var result = comparer(list[splitIndex], item);
        if (result > 0)
            // If this item is larger than the given item,
            // keep searching the range before the splitting point.
            endIndex = splitIndex - 1;
        else if (result < 0)
            // If this item is smaller than the given item,
            // keep searching the range after it.
            startIndex = splitIndex + 1;
        else
            // Otherwise, return the split index.
            return splitIndex;
    }
}

This has resulted in a huge boost in my stress test on a list that has 500,000 items, almost 100 times faster.

Although the use of the golden ratio is suppose to give you the best split index in general, you need to use Math.Round to round it up to an int for the most accurate result. I found that this is actually a trade off compare to loosing a bit of the precision and just ignore rounding the result casting it directly to and int:

var splitIndex = (int)(endIndex - (endIndex - startIndex) / goldenRatio);

This has given me another 1 to 2% speed boost in my stress test.

These extension methods are part of the EnumerationExtensions class in my open sourced Codify library on GitHub.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s