Introduction to Asynchronous Programming in C# (Part 2)

Hi everyone, Jeremy Kruer here. Today I am going to take a little break from the F# programming and talk about asynchronous programming in C#. A while back I created a video about Asynchronous Programming in C# and I promised that I would show an example of how to convert a synchronous Web API into an asynchronous Web API. A viewer posted a comment asking for me to create part 2, so today I am going to create part 2 of that video.

Quick Asynchronous Overview

Asynchronous programming is a way to help scale a web application. It allows the server to process more simultaneous requests than synchronous APIs, which tie up threads while waiting for slow I/O tasks to complete. The most common areas that will benefit from asynchronous programming are:

  1. Network Requests
  2. Database Requests
  3. Disk/IO Requests

Asynchronous programming allows the thread to serve other incoming requests while it is waiting for these slow tasks to complete.

Project Overview

I created a synchronous Web API that makes multiple network requests, and I am going to convert that synchronous code into asynchronous code. This will let you see what the conversion requires and compare the two code bases side by side. In the end you will see that the sync and async code look very similar. This is one of the things that I like a lot about the C# implementation of async code: it keeps the code very readable and easy to understand.

I posted a copy of the code on GitHub so you can clone it and work alongside me. https://github.com/jkruer01/SyncVsAsync

The code example isn't a very realistic real-world example, but it serves to show how async code would be used.

Controller

The controller is called SearchBingSyncController and it only has a single Get method on it. I'm assuming that you have worked with Web API before, so I am not going to go into the details of how Web API works here. The Get method makes a call to a SyncService which is where all of the actual work is done. Here is the code for the SearchBingSyncController.

using System.Collections.Generic;  
using Microsoft.AspNetCore.Mvc;  
using SyncVsAsync.Services;

namespace SyncVsAsync.Controllers  
{
    [Route("api/[controller]")]
    public class SearchBingSyncController : Controller
    {
        private readonly SyncService _syncService;

        public SearchBingSyncController()
        {
            _syncService = new SyncService();
        }

        // GET api/SearchBingSync
        [HttpGet]
        public List<string> Get()
        {
            return _syncService.Search();
        }
    }
}

Service

The service does a Bing search for "C#" and then it downloads the first 5 search results. The service returns the Bing search page URL as well as the URLs for the 5 search results that it downloaded. Like I said, it isn't very useful or practical in the real world but I wanted to create something that would represent a very IO intensive and long running task.

In the real world, a slow task that could benefit from Asynchronous programming could be a database call that takes 200 milliseconds to complete, but I wanted to create an exaggerated example. Here is the code for the Sync Service.

using System;  
using System.Collections.Generic;  
using System.Linq;  
using System.Net;  
using HtmlAgilityPack;

namespace SyncVsAsync.Services  
{
    public class SyncService
    {
        public List<string> Search()
        {
            var bingUrls = getBingUrls();
            return bingUrls.SelectMany(CrawlBingSearchPage).ToList();
        }

        private List<string> CrawlBingSearchPage(string url)
        {
            var result = new List<string>
            {
                url
            };

            var doc = LoadDocument(url);
            var searchResultsGrid = GetSearchResultsGrid(doc);
            if (searchResultsGrid == null) return result;

            var searchResultsLinks = GetSearchResultsLinks(searchResultsGrid);

            foreach (var linkUrl in searchResultsLinks)
            {
                result.Add(linkUrl);
                LoadDocument(linkUrl);
            }

            return result;
        }

        private static List<string> GetSearchResultsLinks(HtmlNode searchResultsGrid)
        {
            return searchResultsGrid.SelectNodes("//li/h2/a[@href]")
                .Select(link => link.GetAttributeValue("href", null))
                .Where(linkUrl => linkUrl != null
                                  && linkUrl.ToLower().StartsWith("http"))
                .Take(5)
                .ToList();
        }

        private static HtmlNode GetSearchResultsGrid(HtmlDocument doc)
        {
            return doc.DocumentNode
                .Descendants("ol")
                .FirstOrDefault(d => d.Attributes.Contains("id")
                                     && d.Attributes["id"].Value.Contains("b_results"));
        }

        private HtmlDocument LoadDocument(string url)
        {
            try
            {
                using (var client = new WebClient())
                {
                    var content = client.DownloadString(url);
                    var doc = new HtmlDocument();
                    doc.LoadHtml(content);
                    return doc;
                }
            }
            catch (Exception)
            {
                //I don't care if it fails for this demo.
                return new HtmlDocument();
            }
        }

        private IEnumerable<string> getBingUrls()
        {
            return new List<string>
            {
                "http://www.bing.com/search?q=C%23&first=0"
            };
        }
    }
}

I'm not going to go into specifics on what the code is doing, because it isn't really important. What is important is line 63, within the LoadDocument method. This is where the actual network call is made to download a web page. It is used both for the Bing search page and for each of the first 5 search results:

var content = client.DownloadString(url);  

This line uses a WebClient to download a web page. This code is synchronous and, from a programming standpoint, very slow. While the computer is making this web request, the thread that is handling the request is blocked: it just sits idle while it waits for the download to finish. Even if it only takes half a second to download a page, that is half a second that the thread could be serving other incoming requests. Instead it is just sitting there doing nothing.

Run the code and go to /api/SearchBingSync. The API call takes about 20 seconds to complete.

Async Naming Conventions

The Microsoft recommended naming convention for async methods is to append the "Async" suffix to the name of every async method. This is an easy way to differentiate between methods that are synchronous and methods that are asynchronous. If you are using a class that you didn't write, you don't want to have to open every single method to figure out whether it is synchronous or asynchronous. Following this naming convention makes things easier for everyone involved.

Async return types

Every async method should return either Task or Task<T>. Although it is technically possible to return void, it should be avoided at ALL costs! The reason is that if an async method returns void, the caller of that method has no way of knowing when the work has completed or whether it threw an exception. Do everyone a favor and pretend that returning void from an async method is not allowed. (Event handlers, which are required to return void, are the one common exception.) Any synchronous method that previously returned void should return Task once converted to async. Any method that previously returned a value should return Task<T> once converted to async.
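
Here is a minimal sketch of the two return types. The class and method names (ReturnTypeExamples, SaveAsync, CountAsync, RunAsync) are made up for illustration and are not part of the sample project, and Task.Delay stands in for a slow I/O call. Because both methods return a task, the caller can wait for them to finish and can catch their exceptions:

```csharp
using System;
using System.Threading.Tasks;

public static class ReturnTypeExamples
{
    // Was: void Save(string value). Returns Task so the caller can await completion.
    public static async Task SaveAsync(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        await Task.Delay(10); // stand-in for a slow I/O call
    }

    // Was: int Count(string value). Returns Task<int> so the caller can await the value.
    public static async Task<int> CountAsync(string value)
    {
        await Task.Delay(10); // stand-in for a slow I/O call
        return value.Length;
    }

    // The caller observes both completion and failure through the returned task.
    public static async Task<string> RunAsync()
    {
        await SaveAsync("abc");
        var count = await CountAsync("abc");
        try
        {
            await SaveAsync(null); // the exception surfaces at the await
            return "no error";
        }
        catch (ArgumentNullException)
        {
            return $"count={count}, error observed";
        }
    }
}
```

If SaveAsync returned void instead, RunAsync could neither await it nor catch the ArgumentNullException.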

Conversion to Asynchronous

Make a copy of SyncService and name it AsyncService. Rename the class to AsyncService. The first thing we want to change is line 63, which was mentioned before. Based on the Microsoft recommended naming conventions, you would assume that to convert DownloadString from sync to async you would use the DownloadStringAsync method on WebClient. Unfortunately this is one of the gotchas that exist. The DownloadStringAsync method existed prior to the introduction of the Task Parallel Library and the async/await keywords; it reports completion through the DownloadStringCompleted event rather than through a task. For backwards compatibility reasons, Microsoft had to leave that existing method alone.

One way you can tell is that the return type is void. As mentioned earlier, although it is possible for an async method to return void, it should be avoided and is very rarely used. Instead, you want to use the DownloadStringTaskAsync method.

Change line 63 to this:

var content = client.DownloadStringTaskAsync(url);  

await Keyword

At this point, content is of type Task<string>. However, we don't want content to be a task; we want content to be the actual string result. Also, we don't want to move forward until the task is completed. To accomplish this, we add the await keyword immediately before the method call:

var content = await client.DownloadStringTaskAsync(url);  

The await keyword pauses the method until the task completes, without blocking the thread, and then assigns the task's result to the content variable.
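
To make the difference between the task and its result concrete, here is a small sketch. GetContentAsync is a hypothetical stand-in for client.DownloadStringTaskAsync(url) so that the example runs without a network connection:

```csharp
using System.Threading.Tasks;

public static class AwaitExample
{
    // Hypothetical stand-in for client.DownloadStringTaskAsync(url).
    public static async Task<string> GetContentAsync()
    {
        await Task.Delay(10); // simulated network latency
        return "<html></html>";
    }

    public static async Task<int> GetLengthAsync()
    {
        Task<string> pending = GetContentAsync(); // a task, not a string
        string content = await pending;           // the string, once the task completes
        return content.Length;
    }
}
```

Without the await, content.Length would not compile, because Task<string> has no Length property.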

async Keyword

At this point, we will get a compiler error complaining that we can only use the await keyword within async methods. In C#, await can only be used inside a method marked with the async modifier, which is why I often refer to the pair as async / await.

Add the async modifier to the LoadDocument method:

private async HtmlDocument LoadDocument(string url)  

At this point, we will get a new compiler error that says

The return type of an async method must be void, Task, or Task<T>

Since we want to return a value we need to return a Task<T>. Change the method signature to:

private async Task<HtmlDocument> LoadDocument(string url)  

At this point, the compiler errors go away for this method, but we aren't done yet. Do you remember the naming conventions that we talked about earlier? We need to refactor the method to rename it and append the "Async" suffix (make sure you update all references that call this method):

private async Task<HtmlDocument> LoadDocumentAsync(string url)  

Async / Await Virus

When I first started learning about async/await I thought that I could just make a few small changes to my code and I would be done. Unfortunately this isn't the case. Once you start introducing asynchronous code into your code base, it will quickly spread everywhere, just like a virus. Let me explain why.

As mentioned earlier, the most common areas for using asynchronous code are:
1. Network Requests
2. Database Requests
3. Disk/IO Requests

When working with web apps, almost every call ends in one of these 3 requests, and it usually happens at the bottom of the stack. Here is the catch: any method that awaits an async method must itself be converted to async. If the lowest level of your web call is a call to the database, and that database call is converted to async, then every method up the chain, all the way to the controller, must be converted to async as well.

Converting Other Methods

Now we need to start working our way up the chain converting each method one at a time. Each method is going to go through the same process that we used for the LoadDocument method.

  1. Change to call the Asynchronous method instead of the Synchronous method.
  2. Add the await keyword.
  3. Add the async keyword.
  4. Change the return type to either Task or Task<T>.
  5. Add the "Async" suffix to the end of the method name.
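
The five steps can be sketched on a method that is unrelated to the sample project. FileReader, CountLines, and CountLinesAsync are names I made up for this illustration, and it assumes a runtime where File.ReadAllTextAsync is available (.NET Core 2.0 or later):

```csharp
using System.IO;
using System.Threading.Tasks;

public static class FileReader
{
    // Before: synchronous, blocks the calling thread during the disk read.
    public static int CountLines(string path)
    {
        var text = File.ReadAllText(path);
        return text.Split('\n').Length;
    }

    // After: the same method with steps 1-5 applied.
    public static async Task<int> CountLinesAsync(string path) // steps 3, 4, 5
    {
        var text = await File.ReadAllTextAsync(path);           // steps 1, 2
        return text.Split('\n').Length;
    }
}
```

Apart from the signature and the await, the body of the method is unchanged.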

CrawlBingSearchPage Method

        private async Task<List<string>> CrawlBingSearchPageAsync(string url)
        {
            var result = new List<string>
            {
                url
            };

            var doc = await LoadDocumentAsync(url);
            var searchResultsGrid = GetSearchResultsGrid(doc);
            if (searchResultsGrid == null) return result;

            var searchResultsLinks = GetSearchResultsLinks(searchResultsGrid);

            foreach (var linkUrl in searchResultsLinks)
            {
                result.Add(linkUrl);
                await LoadDocumentAsync(linkUrl);
            }

            return result;
        }

Search Method

This method requires an additional modification. In the synchronous version it uses LINQ's SelectMany method. Unfortunately, we cannot pass an async method directly to SelectMany (as far as I am aware), so we need to refactor the code and use a foreach loop instead.

        public async Task<List<string>> SearchAsync()
        {
            var bingUrls = getBingUrls();
            var result = new List<string>();
            foreach (var bingUrl in bingUrls)
            {
                result.AddRange(await CrawlBingSearchPageAsync(bingUrl));
            }
            return result;
        }
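
One possible alternative to the foreach loop, which is not what the sample project does, is to start all of the crawls with Select, wait for them with Task.WhenAll, and only then flatten the results with SelectMany. In this sketch CrawlAsync is a hypothetical stand-in for CrawlBingSearchPageAsync. Note that the behavior is different: the crawls run concurrently instead of one after another.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentSearch
{
    // Hypothetical stand-in for CrawlBingSearchPageAsync.
    private static async Task<List<string>> CrawlAsync(string url)
    {
        await Task.Delay(10); // simulated network latency
        return new List<string> { url, url + "/result" };
    }

    public static async Task<List<string>> SearchAsync(IEnumerable<string> urls)
    {
        // Select starts one task per URL. WhenAll awaits them all and
        // returns the results in the same order as the input tasks.
        List<string>[] pages = await Task.WhenAll(urls.Select(url => CrawlAsync(url)));

        // The results are plain lists now, so SelectMany works as usual.
        return pages.SelectMany(page => page).ToList();
    }
}
```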

Controller

In the controller we need to replace the SyncService with the AsyncService and we also need to convert the Get method to async using the same steps we used before:
1. Change to call the Asynchronous method instead of the Synchronous method.
2. Add the await keyword.
3. Add the async keyword.
4. Change the return type to either Task or Task<T>.
5. Add the "Async" suffix to the end of the method name.

using System.Collections.Generic;  
using System.Threading.Tasks;  
using Microsoft.AspNetCore.Mvc;  
using SyncVsAsync.Services;

namespace SyncVsAsync.Controllers  
{
    [Route("api/[controller]")]
    public class SearchBingAsyncController : Controller
    {
        private readonly AsyncService _asyncService;

        public SearchBingAsyncController()
        {
            _asyncService = new AsyncService();
        }

        // GET api/SearchBingAsync
        [HttpGet]
        public async Task<List<string>> GetAsync()
        {
            return await _asyncService.SearchAsync();
        }
    }
}

Running the Code

At this point, the application should compile and you should be able to run it. If you go to /api/SearchBingSync you should hit the synchronous code and if you go to /api/SearchBingAsync you should hit the asynchronous code. Both controllers should return the same value and both should take approximately the same amount of time to complete.

Why Isn't the Async Code Faster?

"Both should take approximately the same amount of time to complete."

Some of you may be confused about this statement. Why bother converting code from synchronous to asynchronous if we aren't going to get a performance improvement?

Think about a real life example of an asynchronous process. When you go to McDonald's to order food, the cashier takes your order, sends it to the kitchen, and then helps the next person in line. The cashier doesn't stand there and do nothing while your food is being prepared. This is how asynchronous code works.

When the cashier starts helping the next person in line while your food is being prepared, it doesn't make your food get prepared any faster. However, it allows McDonald's to serve more customers at the same time. This is our goal with a web server: to serve more customers at the same time. That is what asynchronous code helps us to do.
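
The cashier analogy can be sketched in code. This is only a simulation under my own assumptions: CashierDemo, PrepareOrderAsync, and ServeAsync are made-up names, and Task.Delay stands in for the kitchen (the slow I/O). Each individual order still takes the full delay, but the total time for many orders should stay close to the time of a single order instead of the sum of all of them:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class CashierDemo
{
    // Hypothetical "kitchen": every order takes the same time to prepare.
    private static async Task<TimeSpan> PrepareOrderAsync(int delayMs)
    {
        var watch = Stopwatch.StartNew();
        await Task.Delay(delayMs); // no thread is blocked while waiting
        return watch.Elapsed;
    }

    // Takes all orders up front, then waits for every one of them.
    // Returns the slowest single order and the total time for all orders.
    public static async Task<(TimeSpan Slowest, TimeSpan Total)> ServeAsync(int orders, int delayMs)
    {
        var total = Stopwatch.StartNew();
        var times = await Task.WhenAll(
            Enumerable.Range(0, orders).Select(_ => PrepareOrderAsync(delayMs)));
        return (times.Max(), total.Elapsed);
    }
}
```

Calling CashierDemo.ServeAsync(20, 200) and comparing Slowest with Total shows the point of the analogy: no single order gets faster, but many are in progress at once.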

Load Testing

I wanted to demonstrate this so in the same solution on GitHub I created a console application that would simulate load to our Web API. It will generate 50 simultaneous requests and record whether the call was successful or not and how long the call took.
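
The general shape of such a load generator can be sketched as follows. This is not the console application from the GitHub repository; LoadGenerator, CallResult, TimeCallAsync, and RunAsync are names I made up for illustration. The request itself is passed in as a delegate, so the sketch makes no assumption about which HTTP client or URL is used:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class LoadGenerator
{
    public sealed class CallResult
    {
        public bool Succeeded { get; set; }
        public TimeSpan Elapsed { get; set; }
    }

    // Times a single call and records whether it threw.
    private static async Task<CallResult> TimeCallAsync(Func<Task> call)
    {
        var watch = Stopwatch.StartNew();
        try
        {
            await call();
            return new CallResult { Succeeded = true, Elapsed = watch.Elapsed };
        }
        catch (Exception)
        {
            return new CallResult { Succeeded = false, Elapsed = watch.Elapsed };
        }
    }

    // Starts `count` calls at once and waits for all of them to finish.
    public static async Task<IReadOnlyList<CallResult>> RunAsync(Func<Task> call, int count)
    {
        var calls = Enumerable.Range(0, count).Select(_ => TimeCallAsync(call));
        return await Task.WhenAll(calls);
    }
}
```

To load test one of the controllers, the delegate would issue a GET request against its URL and RunAsync would be called with a count of 50.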

My expectation was that the synchronous controller would start timing out but the asynchronous controller would be able to handle all of the requests. Unfortunately that is not what happened. Actually, I saw the exact opposite. When I compared the two, the synchronous controller was typically able to successfully complete more requests than the asynchronous controller.

Why is this happening? I'm not sure yet but I am going to find out. If you know why, please leave a comment below and let me know.

If you want to find out why this is happening, subscribe and once I find out I will create a new post to let you know.
