Category: Optimization

Concurrency: Aggregating concurrent requests to a heavy function

I have been dealing with some interesting optimization problems these days, and one of them was a heavy method being called multiple times with the same input in a very short period of time.

The method DownloadCustomersInternal is used to download customer data from a server asynchronously and returns a Task. It resides in a class that is shared across multiple components, and it often gets called by these components almost at the same time. When there’s a lot of data to download, this method can take a while to finish, it makes sense for the callers of this method to wait on the same task if there’s already one. So a wrapper method DownloadCustomers is written:

private Task _downloadTask;
private readonly object _downloadLock = new object();

private Task DownloadCustomers()
{
    lock (_downloadLock)
    {
        // If there is already a download task, just return this one.
        if (_downloadTask != null)
            return _downloadTask;

        // Perform the real download and assign it's task to the download task reference.
        _downloadTask = DownloadCustomersInternal();

        // When the download process is done, clear the download task reference.
        return _downloadTask = _downloadTask.ContinueWith(task =>
        {
            lock (_downloadLock)
                _downloadTask = null;
        });
    }
}

The wrapper method simply checks if there is already a _downloadTask and returns it if there is. This ensures that all calls to this method get the same Task instance before _downloadTask is set to null, which only happens after DownloadCustomersInternal is finished asynchronously. Because the code sets _downloadTask to null uses the same lock, it guarantees thread safety of the variable.

This will resolve the problem by returning the same Task to callers who requests the data while the data is being downloaded, but what if the method is called right after a previous download is finished? Downloading the same data in very close succession can also be considered redundant, so it would be nice to introduce some sort of “cooling period” before the next download can be initiated. Therefore, an updated version of the above method is written:

private Task _downloadTask;
private readonly object _downloadLock = new object();
private DateTime _lastDownloadTime = DateTime.MinValue;
private static readonly TimeSpan MinimumDownloadInterval = TimeSpan.FromSeconds(5);

private Task DownloadCustomers()
{
    lock (_downloadLock)
    {
        // If there is already a download task, just return this one.
        if (_downloadTask != null)
            return _downloadTask;

        // Check if the time between this download request and the previous one is too close.
        var intervalFromLastDownload = DateTime.Now - _lastDownloadTime;
        if (intervalFromLastDownload < MinimumDownloadInterval)
        {
            // If this download request is too close to the previous one,
            // make this request wait until the minimum download interval is passed,
            // then perform the download.
            var waitTaskSource = new TaskCompletionSource<bool>();
            _downloadTask = waitTaskSource.Task.ContinueWith(_=>DownloadCustomersInternal()).Unwrap();

            // Force the wait in a background thread to ensure a non-blocking UI experience.
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Thread.Sleep(MinimumDownloadInterval - intervalFromLastDownload);
                waitTaskSource.SetResult(true);
            });
        }
        else
            // If this download request is far enough from the last one,
            // just perform the normal download.
            _downloadTask = DownloadCustomersInternal();

        // When the download process is done, always clear the download task reference
        // and update the download timestamp.
        return _downloadTask = _downloadTask.ContinueWith(task =>
        {
            lock (_downloadLock)
            {
                _downloadTask = null;
                _lastDownloadTime = DateTime.Now;
            }
        });
    }
}

The key in this update is determining if a call is too close to a previous call when there is no task running. If the call is too close, simply wait in the background until MinimumDownloadInterval elapses. The waiting code in the background is also thread safe because waitTaskSource.Task.ContinueWith (line 22) is assigned to _downloadTask in the same scope of the first lock, and waitTaskSource is only set to finish when the background wait is elapsed.

Advertisements

Search algorithm and the golden ratio

From time to time, I need to insert an item to the correct index in an already sorted collection. One good example is a list of contacts, where the list is already sorted by the last name, and displayed somewhere in the UI. If the user adds a new contact, the new one needs to appear at the correct position.

So I wrote 2 little helper methods to do just that:

private static int ConditionalIndex<T>(this IList<T> list, T item, Func<T, T, int> comparer)
{
    var length = list.Count;
    for (var i = 0; i < length; i++)
        if (comparer(list[i], item) > 0)
            return i;
    return length;
}

public static void ConditionalInsert<T>(this IList<T> list, T item, Func<T, T, int> comparer)
{
    var index = list.ConditionalIndex(item, comparer);
    if (index >= list.Count) list.Add(item);
    else list.Insert(index, item);
}

And I use them like this:

var newContact = new Contact
{
    FirstName = "David",
    LastName = "Anderson"
};

MyContacts.ConditionalInsert(newContact, (c1,c2) => string.Compare(c1.LastName, c2.LastName));

This has been working fine until one day the list had grown big enough to take quite a few seconds to just add a new contact, and the user experience isn’t that good anymore.

The problem is that the ConditionalIndex method is just using a simple loop to search the index. So if the list has 10,000 items and it happens to be so unlucky that the new item needs to be inserted to the end of the list, it would take the method to go through all 10,000 items and execute that comparer delegate 10,000 times.

Therefore, I decided to optimize it to use a smarter algorithm.

After some digging around, I’ve chosen the Golden Section Search. It is a method that effectively reduces the range to be searched to find the target much faster.

To illustrate this idea in a simple way, let’s say we have a list of numbers from 0 to 9 but missing the number 7, and I want to insert 7 to this list. Golden section search will find the correct index by doing the following:

  1. Determine a split index, using the length of the search range divided by the Golden Ratio. In this case, it would be 10 ÷ 1.618 rounded up to integer, which is 6.
  2. Compare the item at index 6 with the item to be inserted, which is comparing number 6 and 7.
  3. If the item in the list is smaller, take the list before the split index as the new search range and repeat the above 2 steps. If the item is larger, take the list after the split index. If the item is the same, then the correct index is found.
  4. In our example, it would mean to take 8 and 9. The new search range length is 2 so the new split index is 2 ÷ 1.618 ≈ 1.
  5. Compare 9 to 7 means taking 8 as the new search range.
  6. Since there’s only 1 item left in the search range, simply compare 8 to 7 and we get the correct index to insert is the index before 8, which is 7.

My actual implementation of this algorithm is as follow:

public static int ConditionalIndex<T>(this IList<T> list, T item, Func<T, T, int> comparer)
{
    var length = list.Count;
    if (length == 0)
        return 0;

    const double goldenRatio = 1.618;
    var startIndex = 0;
    var endIndex = length - 1;
    while (true)
    {
        // If the start index is larger then the end index,
        // that means the search has already passed all necessary
        // search ranges, just return the start index.
        if (startIndex > endIndex)
            return startIndex;

        // If the start index is the same as the end index,
        // that means there is only 1 item left to compare.
        // Compare this item and return the result.
        if (startIndex == endIndex)
            // If the item in the list is larger than the given item,
            // retrun the index of the item in the list, otherwise return the next index.
            return comparer(list[startIndex], item) > 0 ? startIndex : startIndex + 1;

        // If there are still multiple items in the search range,
        // find the split index of the search range.
        var splitIndex = (int)Math.Round(endIndex - (endIndex - startIndex) / goldenRatio);

        // Compare the item sitting at the spliting point.
        var result = comparer(list[splitIndex], item);
        if (result > 0)
            // If this item is larger than the given item,
            // keep searching the range before the splitting point.
            endIndex = splitIndex - 1;
        else if (result < 0)
            // If this item is smaller than the given item,
            // keep searching the range after it.
            startIndex = splitIndex + 1;
        else
            // Otherwise, return the split index.
            return splitIndex;
    }
}

This has resulted in a huge boost in my stress test on a list that has 500,000 items, almost 100 times faster.

Although the use of the golden ratio is suppose to give you the best split index in general, you need to use Math.Round to round it up to an int for the most accurate result. I found that this is actually a trade off compare to loosing a bit of the precision and just ignore rounding the result casting it directly to and int:

var splitIndex = (int)(endIndex - (endIndex - startIndex) / goldenRatio);

This has given me another 1 to 2% speed boost in my stress test.

These extension methods are part of the EnumerationExtensions class in my open sourced Codify library on GitHub.