Introduction to Parallelism and Threading
With this post I want to start a series devoted to the new parallel programming features in .NET Framework 4 and introduce you to the Task Parallel Library (TPL).
Update: here is the list of posts in this series:
- Getting Started (this post)
- Task Schedulers and Synchronization Context
- Task Cancellation
- Blocking Collection and the Producer-Consumer Problem
I have a simple goal this time. I want to parallelize a long-running console application and add a responsive WPF UI. By the way, I’m not going to concentrate too much on measuring performance. I’ll try to show the most common caveats, but in most cases just seeing that the application runs faster is good enough for me.
Now, let the journey begin. Here’s my small program that I want to parallelize. The SumRootN method returns the sum of the nth root of all integers from one to 10 million, where n is a parameter. In the Main method, I call this method for roots from 2 through 19. I’m using the Stopwatch class to check how many milliseconds the program takes to run.
using System.Threading.Tasks;
using System.Threading;
using System.Diagnostics;
using System;
class Program
{
    static void Main(string[] args)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 2; i < 20; i++)
        {
            var result = SumRootN(i);
            Console.WriteLine("root {0} : {1} ", i, result);
        }
        Console.WriteLine(watch.ElapsedMilliseconds);
        Console.ReadLine();
    }
    public static double SumRootN(int root)
    {
        double result = 0;
        for (int i = 1; i < 10000000; i++)
        {
            result += Math.Exp(Math.Log(i) / root);
        }
        return result;
    }
}
Since I’m using a for loop, the Parallel.For method is the easiest way to add parallelism. All I need to do is replace
for (int i = 2; i < 20; i++)
{
    var result = SumRootN(i);
    Console.WriteLine("root {0} : {1} ", i, result);
}
with
Parallel.For(2, 20, (i) =>
{
    var result = SumRootN(i);
    Console.WriteLine("root {0} : {1} ", i, result);
});
When you use the Parallel.For method, the .NET Framework automatically manages the threads that service the loop, so you don’t need to do this yourself. But remember that running code in parallel on two processors does not guarantee that the code will run exactly twice as fast. Nothing comes for free; although you don’t need to manage threads yourself, the .NET Framework still uses them behind the scenes. And of course this leads to some overhead. In fact, if your operation is simple and fast and you run a lot of short parallel cycles, you may get much less benefit from parallelization than you might expect.
Another thing you probably noticed when you run the code is that now you don’t see the results in the proper order: Instead of seeing increasing roots, you see quite a different picture. But let’s pretend that we just need results, without any specific order. In this blog post, I’m going to leave this problem unresolved.
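If you do need the results in order, one possible approach (a minimal sketch, not something this post relies on) is to let each iteration write into its own slot of an array and print the array after Parallel.For returns. The class name OrderedRoots, the method ComputeAll, and the extra limit parameter on SumRootN are assumptions made for this illustration only.

```csharp
using System;
using System.Threading.Tasks;

static class OrderedRoots
{
    // Same computation as SumRootN in the post. The upper bound is a
    // parameter here (an assumption for this sketch) so small runs are quick.
    public static double SumRootN(int root, int limit)
    {
        double result = 0;
        for (int i = 1; i < limit; i++)
        {
            result += Math.Exp(Math.Log(i) / root);
        }
        return result;
    }

    // Computes roots 2 through 19 in parallel. Each iteration writes only to
    // its own array slot, so the iterations do not need a lock.
    public static double[] ComputeAll(int limit)
    {
        var results = new double[18];
        Parallel.For(2, 20, i =>
        {
            results[i - 2] = SumRootN(i, limit);
        });
        return results;
    }

    static void Main()
    {
        double[] results = ComputeAll(10000000);
        // Parallel.For has returned, so every slot is filled: print in order.
        for (int i = 0; i < results.Length; i++)
        {
            Console.WriteLine("root {0} : {1}", i + 2, results[i]);
        }
    }
}
```

The trade-off is that nothing is printed until all of the roots are done, which may or may not be acceptable for your program.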
Now it’s time to take things one step further. I don’t want to write a console application; I want some UI. So I’m switching to Windows Presentation Foundation (WPF). I have created a small window that has only one Start button, one text block to display results, and one label to show elapsed time.
The event handler for the sequential execution looks pretty simple:
private void start_Click(object sender, RoutedEventArgs e)
{
    textBlock1.Text = "";
    label1.Content = "Milliseconds: ";
    var watch = Stopwatch.StartNew();
    for (int i = 2; i < 20; i++)
    {
        var result = SumRootN(i);
        textBlock1.Text += "root " + i.ToString() + " " +
            result.ToString() + Environment.NewLine;
    }
    var time = watch.ElapsedMilliseconds;
    label1.Content += time.ToString();
}
Compile and run the application to make sure that everything works fine. As you might notice, the UI is frozen and the text block does not update until all of the computations are done. This is a good demonstration of why the WPF guidance is never to execute long-running operations on the UI thread.
Let’s change the for loop to the parallel one:
Parallel.For(2, 20, (i) =>
{
    var result = SumRootN(i);
    textBlock1.Text += "root " + i.ToString() + " " +
        result.ToString() + Environment.NewLine;
});
Run the application and click the Start button: this time it fails with an exception. What happened? Well, as I mentioned earlier, the Task Parallel Library still uses threads. When you call the Parallel.For method, the .NET Framework automatically runs the loop body on worker threads. I didn’t have problems with the console application because the Console class is thread safe. But in WPF, UI components can be safely accessed only by a dedicated UI thread. Since Parallel.For uses worker threads besides the UI thread, it’s unsafe to manipulate the text block directly in the parallel loop body. If you use, let’s say, Windows Forms, you might have different problems, but problems nonetheless (another exception or even an application crash).
Luckily, WPF provides an API that solves this problem. Most controls have a special Dispatcher object that enables other threads to interact with the UI thread by sending asynchronous messages to it. So our parallel loop should actually look like this:
Parallel.For(2, 20, (i) =>
{
    var result = SumRootN(i);
    this.Dispatcher.BeginInvoke(new Action(() =>
        textBlock1.Text += "root " + i.ToString() + " " +
            result.ToString() + Environment.NewLine)
        , null);
});
Now I have our parallel WPF application running on my computer almost twice as fast. But what about this freezing UI? Don’t all modern applications have a responsive UI? And if Parallel.For starts new threads, why is the UI thread still blocked?
The reason is that Parallel.For tries to exactly imitate the behavior of the normal for loop, so it blocks the further code execution until it finishes all its work.
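The blocking behavior is easy to observe outside of WPF. The following console sketch (the class BlockingDemo and its method are hypothetical names used only here) counts finished iterations and reads the counter immediately after Parallel.For returns. Because the call does not return until every iteration is done, the counter always equals the iteration count.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class BlockingDemo
{
    // Returns how many iterations had finished at the moment Parallel.For
    // returned control to the caller.
    public static int CompletedWhenForReturns(int iterations)
    {
        int completed = 0;
        Parallel.For(0, iterations, i =>
        {
            Thread.Sleep(10);                      // stand-in for real work
            Interlocked.Increment(ref completed);  // thread-safe counter update
        });
        // Execution reaches this line only after all iterations are done.
        return completed;
    }

    static void Main()
    {
        Console.WriteLine(CompletedWhenForReturns(20)); // prints 20
    }
}
```

In the WPF handler the calling thread is the UI thread, which is why the window cannot repaint or react to input while the loop is running.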
Let’s take a short pause here. If you already have an application that works and satisfies all your requirements, and you want to simply speed it up by using parallel processing, it might be enough just to replace some of the loops with Parallel.For or Parallel.ForEach. But in many cases you need more advanced tools.
To make the UI responsive, I am going to use tasks, a new concept introduced by the Task Parallel Library. A task represents an asynchronous operation that is often run on a separate thread. The .NET Framework optimizes load balancing and also provides a nice API for managing tasks and making asynchronous calls between them. To start an asynchronous operation, I’ll use the Task.Factory.StartNew method.
So I’ll delete the Parallel.For and replace it with the following code, once again trying to change as little as possible.
for (int i = 2; i < 20; i++)
{
    var t = Task.Factory.StartNew(() =>
    {
        var result = SumRootN(i);
        this.Dispatcher.BeginInvoke(new Action(() =>
            textBlock1.Text += "root " + i.ToString() + " " +
                result.ToString() + Environment.NewLine)
            , null);
    });
}
Compile, run… Well, the UI is responsive. I can move and resize the window while the program calculates the results. But I have two problems now:
1. My program tells me that it took 0 milliseconds to execute.
2. The program calculates the method only for root 20 and shows me a list of identical results.
Let’s start with the last one. C# experts can shout it out: closure! Yes, the lambda captures the loop variable i itself, not its value at the moment the task was created, so by the time a thread starts working, i’s value has already changed. Since i is equal to 20 when the loop is exited, this is the value that the newly created tasks end up seeing.
Problems with closure like this one are common when you deal with lots of delegates in the form of lambda expressions (which is almost inevitable with asynchronous programming), so watch out for it. The solution is really easy. Just copy the value of the loop variable into a variable declared within the loop. Then use this local variable instead of the loop variable.
for (int i = 2; i < 20; i++)
{
    int j = i;
    var t = Task.Factory.StartNew(() =>
    {
        var result = SumRootN(j);
        this.Dispatcher.BeginInvoke(new Action(() =>
            textBlock1.Text += "root " + j.ToString() + " " +
                result.ToString() + Environment.NewLine)
            , null);
    });
}
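The closure effect can also be reproduced without tasks or WPF. In this small console sketch (ClosureDemo and its two methods are hypothetical names for illustration), the lambdas are created inside the loop but invoked only after it has finished, which makes the outcome deterministic.

```csharp
using System;
using System.Collections.Generic;

static class ClosureDemo
{
    // Every lambda captures the single loop variable i. They all run after
    // the loop has ended, so each one reads the final value, 20.
    public static List<int> CaptureLoopVariable()
    {
        var readers = new List<Func<int>>();
        for (int i = 2; i < 20; i++)
        {
            readers.Add(() => i);
        }
        var seen = new List<int>();
        foreach (var read in readers) seen.Add(read());
        return seen;
    }

    // Each iteration declares a fresh variable j, so every lambda captures
    // its own copy and the values 2 through 19 are preserved.
    public static List<int> CaptureLocalCopy()
    {
        var readers = new List<Func<int>>();
        for (int i = 2; i < 20; i++)
        {
            int j = i;
            readers.Add(() => j);
        }
        var seen = new List<int>();
        foreach (var read in readers) seen.Add(read());
        return seen;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" ", CaptureLoopVariable())); // 20 repeated 18 times
        Console.WriteLine(string.Join(" ", CaptureLocalCopy()));    // 2 3 4 ... 19
    }
}
```

With real tasks the timing is less predictable: a task that happens to start early may still observe an intermediate value of i, which is one more reason to always copy the loop variable.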
That fixes the second problem. The first one, the 0 milliseconds, appears because the handler now only starts the tasks and reads the timer right away, long before they finish. Sometimes it’s OK to move on without waiting for the threads to finish their jobs. But sometimes you need to get a signal that the work is done, because it affects your workflow. A timer is a good example of the second scenario.
To get my time measurement, I have to wrap the code that reads the timer value in yet another method from the Task Parallel Library: TaskFactory.ContinueWhenAll. It does exactly what I need: it does not block the caller, but schedules a delegate to run once all the tasks in an array have finished. This method works on arrays only, so I need to store all the tasks somewhere to be able to pass them to it.
Here’s what my final code looks like:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using System.Windows;
public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();
    }
    public static double SumRootN(int root)
    {
        double result = 0;
        for (int i = 1; i < 10000000; i++)
        {
            result += Math.Exp(Math.Log(i) / root);
        }
        return result;
    }
    private void start_Click(object sender, RoutedEventArgs e)
    {
        textBlock1.Text = "";
        label1.Content = "Milliseconds: ";
        var watch = Stopwatch.StartNew();
        List<Task> tasks = new List<Task>();
        for (int i = 2; i < 20; i++)
        {
            // Copy the loop variable so that each task captures its own value.
            int j = i;
            var t = Task.Factory.StartNew(() =>
            {
                var result = SumRootN(j);
                // Hand the UI update over to the UI thread.
                this.Dispatcher.BeginInvoke(new Action(() =>
                    textBlock1.Text += "root " + j.ToString() + " " +
                        result.ToString() +
                        Environment.NewLine)
                    , null);
            });
            tasks.Add(t);
        }
        // Runs once all the tasks have finished; the handler itself returns immediately.
        Task.Factory.ContinueWhenAll(tasks.ToArray(),
            completedTasks =>
            {
                var time = watch.ElapsedMilliseconds;
                this.Dispatcher.BeginInvoke(new Action(() =>
                    label1.Content += time.ToString()));
            });
    }
}
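To try ContinueWhenAll without a WPF window, here is a hedged console sketch of the same pattern. The class ContinuationDemo, the method SumOfSquaresAsync, and the toy workload (squaring small integers) are assumptions made for this illustration; the TPL calls are the same ones used in the handler above, in their generic form so that the continuation can return a value.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

static class ContinuationDemo
{
    // Starts one task per number and returns a task that completes with the
    // sum of all results once every individual task has finished.
    public static Task<int> SumOfSquaresAsync(int count)
    {
        var tasks = new List<Task<int>>();
        for (int i = 1; i <= count; i++)
        {
            int j = i; // local copy to avoid the closure problem
            tasks.Add(Task.Factory.StartNew(() => j * j));
        }
        // The continuation receives the array of completed tasks.
        return Task.Factory.ContinueWhenAll<int, int>(tasks.ToArray(), completed =>
        {
            int total = 0;
            foreach (Task<int> t in completed) total += t.Result;
            return total;
        });
    }

    static void Main()
    {
        var watch = Stopwatch.StartNew();
        Task<int> sum = SumOfSquaresAsync(10);
        // Main is free to do other work here; reading Result waits for the
        // continuation, which is acceptable in a console program.
        Console.WriteLine("Sum: " + sum.Result); // Sum: 385
        Console.WriteLine("Milliseconds: " + watch.ElapsedMilliseconds);
    }
}
```

In a UI event handler you would not read Result on the UI thread, because that would block it again; that is why the WPF version updates the label from inside the continuation.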
Multi-core machines are the norm these days; it is hard to find a new machine that is not multi-core by default. Do we have to change the way we write programs? Wouldn’t running my program on a multi-core machine automatically improve performance? After all, aren’t there more threads?
The answer is no! Running a program that was not written for parallel execution on a multi-core machine brings almost no performance improvement. With the exception of I/O operations, your program will still use one thread on a single core at a time. Thread switching still occurs (which, by the way, incurs overhead), but at any given moment only one thread will be active.
What does multithreading have to do with multiple cores? Can’t we write multi-threaded code if we have a single core? Yes, we can. But as long as we have a single core, only one thread at a time can actually run, since all threads share that same core. When your multi-threaded program runs on a multi-core machine, threads scheduled on different cores can run at the same time, and hence your program takes advantage of the hardware.
Multithreading vs. Parallelism
It is useful to point out the difference between traditional multithreading and parallel programming. In the past, most computers had a single CPU and multithreading was used to take advantage of idle time, such as when a program blocks for user input. Using this approach, one thread can be executed while another is waiting. On a single-CPU system, multithreading is used to allow two or more tasks to share the CPU. Although this type of multithreading will remain useful, it was not designed for situations in which two or more CPUs are available.
When multiple CPUs are present, a second type of multithreading capability is needed because it is possible to execute portions of a program simultaneously, with each part executing on its own CPU. This can be used to significantly speed up the execution of some types of operations, such as sorting, transforming, or searching a large array.
Challenges in Development
Microsoft .NET 4 comes with a host of new features that make enterprise application development more productive and manageable. .NET has supported multithreaded programming since version 1.0, in the form now referred to as classic threading, but it was hard to use effectively: you had to think so much about managing the threads that it distracted you from what the program actually needed to do.
Enterprises that develop and run custom applications on the Microsoft .NET platform face many challenges. The list below is not exhaustive, but some of them are:
- Faster and more efficient processing of complex algorithms and large data sets.
- Hardware manufacturers are currently not able to increase the speed of individual CPUs; instead, they add cores to provide more processing power. Due to this trend, multi-core processors are becoming the order of the day. There is hardly an enterprise where applications are not running on multi-core servers, yet more cores do not translate directly into better performance. Enabling parallel activities while avoiding deadlocks is a challenging task for developers.
- Within the Microsoft Technology Center, the same program was benchmarked on a single-core system and then on a multi-core system. It was found that the performance of the application did not increase significantly just by adding cores unless the underlying program constructs were adapted to the multi-core architecture. They are in the process of publishing this research, and a measurement of the improvement from using the parallel programming APIs is in the pipeline.
Addressing Challenges
- .NET Framework 4 introduces a parallel library with a new programming model that considerably simplifies the development and debugging of applications that take advantage of modern multi-core hardware. The parallel library handles much of the complexity of multi-core programming, such as thread management and task division, and under the hood it automatically distributes work across cores depending on their availability, leaving developers to focus on the business processes.
- Importantly, programs written for multiple cores can also run on single-core machines without any syntax or configuration changes. However, programs that were not written using the parallel API need to be adapted to use it if they are to get a performance benefit from a multi-core architecture, though the change to the program is often very small. Enterprises can expect more effective usage of multi-core machines, and better application performance, for application logic developed using the .NET Framework 4 parallel APIs.
Advantages of Parallel Programming for developers
With this article, I intended to bring to your notice that each one of us needs to understand this topic and start applying it in our applications. Going forward, all servers will be multi-core, and we will not use that power unless we take it into account during development.
There are some cautions that we should be aware of:
- Considering Overheads
- Coordinating Data
- Scaling Applications
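As an illustration of the "Coordinating Data" caution, the sketch below sums a range of integers in parallel. The class ParallelSum and the method SumTo are hypothetical names; the technique shown is the Parallel.For overload with thread-local state, in which each worker accumulates into its own subtotal and the shared total is touched only once per worker, through Interlocked.Add. Treat it as one possible approach under these assumptions, not the only way to coordinate shared data.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelSum
{
    // Adds the integers 1..n in parallel without a data race on the total.
    public static long SumTo(int n)
    {
        long total = 0;
        Parallel.For<long>(1, n + 1,
            () => 0L,                               // per-worker subtotal starts at 0
            (i, state, subtotal) => subtotal + i,   // no shared data touched here
            subtotal => Interlocked.Add(ref total, subtotal)); // one atomic merge per worker
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumTo(1000)); // prints 500500
    }
}
```

Writing total += i directly inside the loop body would compile, but several threads would then update the same variable at once and the result could be wrong. Merging once per worker also keeps the synchronization overhead low, which ties in with the "Considering Overheads" caution.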