# Wednesday, 25 June 2014

Though parallel and async algorithms solve different tasks, they converge in some cases, and it is not always immediately clear which one is the better fit.

Consider the following task: get the total word count of the content behind a given set of urls.

At first we solved it as a parallel task: indeed, it fits the MapReduce pattern, where you fetch the contents of the urls and count the words in parallel (Map), and then sum the per-url word counts to get the final result (Reduce). But then we decided that the very same MapReduce algorithm can be implemented with async.

This is a parallel word count:

// Requires: System.Collections.Generic, System.Net, System.Threading, System.Threading.Tasks.
public static int ParallelWordCount(IEnumerable<string> urls)
{
  var result = 0;

  Parallel.ForEach(
    urls,
    url =>
    {
      string content;

      using(var client = new WebClient())
      {
        content = client.DownloadString(url);
      }

      var count = WordCount(content);

      Interlocked.Add(ref result, count);
    });

  return result;
}

Here is the async word count:

public static async Task<int> WordCountAsync(IEnumerable<string> urls)
{
  return (await Task.WhenAll(urls.Select(url => WordCountAsync(url)))).Sum();
}

public static async Task<int> WordCountAsync(string url)
{
  string content;

  using(var client = new WebClient())
  {
    content = await client.DownloadStringTaskAsync(url);
  }

  return WordCount(content);
}

And this is an implementation of word count for a text (it's less important for this discussion):

public static int WordCount(string text)
{
  var count = 0;
  var space = true;

  for(var i = 0; i < text.Length; ++i)
  {
    if (space != char.IsWhiteSpace(text[i]))
    {
      space = !space;

      if (!space)
      {
        ++count;
      }
    }
  }

  return count;
}
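
As a quick sanity check of the counting logic, here is a minimal sketch that compares it with a count based on string.Split. The class name WordCountCheck, the SplitCount helper and the sample strings are our own additions for illustration, not part of the original code.

```csharp
using System;

public static class WordCountCheck
{
  // Same logic as WordCount in the post: count transitions from white space to a word.
  public static int WordCount(string text)
  {
    var count = 0;
    var space = true;

    foreach(var c in text)
    {
      if (space != char.IsWhiteSpace(c))
      {
        space = !space;

        if (!space)
        {
          ++count;
        }
      }
    }

    return count;
  }

  // Reference count (our addition): a null separator makes Split break on white space.
  public static int SplitCount(string text)
  {
    return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
  }

  // Prints both counts for a few sample strings so they can be compared.
  public static void Run()
  {
    foreach(var sample in new[] { "", "one", "  hello   world ", "a\tb\nc" })
    {
      Console.WriteLine("{0} / {1}", WordCount(sample), SplitCount(sample));
    }
  }
}
```

Calling WordCountCheck.Run() prints the two counts side by side; the loop version avoids allocating the array of substrings that Split creates.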

Our impressions are:

  1. The parallel version is contained in one method, while the async one is implemented with two methods.

    This is how we happened to split the code rather than a limitation of the language: C# 5.0 and later accept async lambda functions, so the per-url method could be inlined as an async lambda passed to Select. That is as it should be, since features should be composable. If one can implement a method as a lambda function, and one can implement a method as async, then one should be able to implement a method as an async lambda function.

  2. Both the parallel and the async versions use the thread pool to run their logic.

  3. While both implementations follow the MapReduce pattern, we can see that the async version is much more scalable. That is because the threads of the parallel version stay blocked while waiting for an HTTP response. Async tasks, on the other hand, are not bound to any thread and simply do not run while waiting for I/O.
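
For comparison, here is a sketch of a single-method async version. C# 5.0 and later accept async lambdas, so the per-url logic can be passed straight to Select. The class name AsyncLambdaSketch and the download parameter are our assumptions: the delegate stands in for WebClient.DownloadStringTaskAsync so that the sketch can run without a network, and the Split-based WordCount is a stand-in for the one in the post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncLambdaSketch
{
  // Map: an async lambda downloads and counts one url.
  // Reduce: Sum adds up the per-url counts once all tasks complete.
  // `download` is a hypothetical parameter standing in for the HTTP call.
  public static async Task<int> WordCountAsync(
    IEnumerable<string> urls,
    Func<string, Task<string>> download)
  {
    var counts = await Task.WhenAll(
      urls.Select(async url => WordCount(await download(url))));

    return counts.Sum();
  }

  // Stand-in for the post's WordCount: split on white space, drop empty entries.
  public static int WordCount(string text)
  {
    return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
  }
}
```

To count real pages, one would pass a delegate that wraps the WebClient download from the async version above; passing a delegate that returns canned strings makes the method easy to try out offline.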

This sample helped us to answer the question as to when to use parallel and when async. The simple answer goes like this:

  • if your logic is only CPU-bound then use the parallel API;
  • otherwise use the async API (this accounts for I/O waits).
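
The difference between blocked threads and pending async tasks can be tried out with simulated waits. The following is a rough sketch under our own assumptions: the class name WaitSketch, the request counts and the delays are made up for illustration, Thread.Sleep stands in for a blocking HTTP call, and Task.Delay stands in for an awaited one.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class WaitSketch
{
  // Blocking wait: every simulated request blocks the thread that runs it.
  public static int BlockingWaits(int requests, int delayMs)
  {
    var completed = 0;

    Parallel.For(
      0,
      requests,
      i =>
      {
        Thread.Sleep(delayMs);
        Interlocked.Increment(ref completed);
      });

    return completed;
  }

  // Async wait: Task.Delay is timer based, so no thread is held while a request is pending.
  public static async Task<int> AsyncWaits(int requests, int delayMs)
  {
    var results = await Task.WhenAll(
      Enumerable.Range(0, requests).Select(
        async i =>
        {
          await Task.Delay(delayMs);

          return 1;
        }));

    return results.Sum();
  }

  // Runs both variants and prints how long each one took.
  public static void Run()
  {
    var watch = Stopwatch.StartNew();
    var blocking = BlockingWaits(200, 100);

    Console.WriteLine("blocking: {0} waits in {1} ms", blocking, watch.ElapsedMilliseconds);

    watch.Restart();

    var pending = AsyncWaits(200, 100).GetAwaiter().GetResult();

    Console.WriteLine("async: {0} waits in {1} ms", pending, watch.ElapsedMilliseconds);
  }
}
```

We would expect the blocking variant to take noticeably longer, since its waits are spread over a limited number of threads, while the async waits overlap; the actual numbers depend on the machine and on thread pool settings.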

Wednesday, 25 June 2014 12:51:28 UTC
.NET | Thinking aloud
Sunday, 13 March 2016 05:05:34 UTC
Nice post. Are you saying we can use Parallel.ForEach and async/await together?
vikas
Sunday, 13 March 2016 09:24:35 UTC
We cannot say anything about your specific case.
In general, parallel and async algorithms have their niches.
We can guess that there are problems where Parallel
works together with async to solve a single task.
Such a problem should probably deal with both I/O and CPU intensive work.
Vladimir Nesterovsky
Sunday, 13 March 2016 19:06:41 UTC
Thanks, Nesterovsky, for the response. I have posted a question on Stack Overflow: http://stackoverflow.com/questions/35932203/how-to-use-multi-threading-to-call-stored-procedure-for-each-of-item-in-collecti

The problem is this: if I use async/await to process each item of the collection and call a stored procedure that performs an insert/update in one table for each item, we might get deadlocks or locking, because each SP takes up to 5 seconds to process one item. So this might not help me.

The other option is to use Parallel.ForEach, but I found out that Parallel.ForEach works well for the first item and then gives a connection-related error, as it uses the same connection for all items.

Now I don't know how to solve my problem.

Thanks

vikas
Monday, 14 March 2016 06:30:25 UTC
1. I don't see why you should get a deadlock or a lock just because
each iteration takes up to 5 seconds.

2. Be aware that SQL Server has its policy against DDoS attacks.
If you create too many parallel tasks, SQL Server may decide
to deny some of the requests or to put some of them into a queue.

3. From your task definition on StackOverflow it seems that it can be implemented
completely within SQL Server, which would greatly reduce server roundtrips.
Vladimir Nesterovsky
Monday, 14 March 2016 06:36:42 UTC
Also, you can create all async tasks (e.g. List<Task>), and then call Task.WhenAll() to wait for all of them to complete.
This will probably give better results than sequential runs.
Vladimir Nesterovsky
Disclaimer
The opinions expressed herein are our own personal opinions and do not represent our employer's view in anyway.

© 2024, Nesterovsky bros