Why lock your data?

In one of my previous posts I described the motivation for using parallel processing and how it can be misused. In this post I want to simplify the concept of locking mechanisms as one of the methods for using parallel processing correctly.

So let's examine a very simple example in our code:

class SafeClass
{
    int x;
    void Increment()
    {
        x++;
    }
}

How many instructions do you think the computer actually executes in the “Increment” function?

You might be surprised, but assuming we have a standard computer and we are compiling this in .NET, this single statement will be compiled into multiple instructions. The computer does not know what “increment” means. It only knows very basic instructions, so the compiler constructs the appropriate instruction sequence for the increment at compile time. If we compile this simple program and inspect the compiled code with “JustDecompile”, we will see these instructions:

[Image: IL listing of the compiled “Increment” method]

Notice the 3rd and the 10th instructions:
– The 3rd instruction is loading a copy of the value from field x.
– The 10th instruction is storing the new incremented value back to field x.
See the instructions in between? Those are the ones that actually perform the increment. Remember, our processor is discrete: it executes only one instruction at a time, so the load, the increment and the store are separate steps.
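
To make those steps concrete, here is a minimal sketch that spells out in C# what “x++” amounts to. The class name “ExpandedIncrement”, the “Value” property and the local “temp” are made up for illustration; the compiler emits IL rather than this code, but the load, increment and store steps are the same idea.

```csharp
// Hypothetical illustration: not what the compiler emits,
// just the same three steps written out by hand.
class ExpandedIncrement
{
    int x;

    // Read-only view of the field, added only so the result can be inspected.
    public int Value { get { return x; } }

    public void Increment()
    {
        int temp = x;     // 1. load a copy of the field
        temp = temp + 1;  // 2. increment the copy
        x = temp;         // 3. store the copy back into the field
    }
}
```

On a single thread this behaves exactly like “x++”: calling Increment() twice on a new instance leaves Value at 2.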

Right about now you may be asking yourself why you should even bother understanding this. Well, as I mentioned in my previous post, parallel programming is a must-know in today’s multicore era for every software engineer. What does it have to do with incrementing the value of a field? I’ll explain.

Imagine your program now has to run on a multi-core machine and your application is now using multi-threading techniques to achieve parallelism. Let's assume your program has 2 threads at this point and they both can operate on our previously mentioned method at the same time. What happens if each of them needs to increment the value of that field? Do we know who is first? Who is second? Maybe they perform the increments at the same time?

You see the danger? Not yet? Bear with me.

Now, imagine that both threads perform the “Increment” function. The first thread performs the 3rd instruction (pushing a copy of the field value onto the stack), and before it gets to the 10th instruction (which is supposed to store the incremented value back to the field), the 2nd thread performs the read instruction as well. What value did the 2nd thread read? That’s right, the initial value, not the incremented value. So both threads incremented the same value and stored the same result in the field.

At this point, your program ‘thinks’ that the value was incremented by a total of 2 (because 2 increments were performed), but the actual increment was only by 1, and the data is now corrupt! Imagine if this happens with a bank account.
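
The interleaving described above can be replayed by hand. The sketch below is only an illustration: it runs on a single thread and writes out the steps of two imaginary threads, A and B, in the unlucky order, so the result is the same on every run. The names “LostUpdateReplay”, “copyA” and “copyB” are hypothetical.

```csharp
// Hypothetical single-threaded replay of the unlucky interleaving.
// No real threads are involved; the order of the steps is fixed by hand.
class LostUpdateReplay
{
    public static int Run()
    {
        int x = 0;           // the shared field, starting at 0

        int copyA = x;       // thread A loads x (reads 0)
        int copyB = x;       // thread B loads x before A has stored (also reads 0)

        copyA = copyA + 1;   // A increments its copy
        copyB = copyB + 1;   // B increments its copy

        x = copyA;           // A stores 1
        x = copyB;           // B stores 1, overwriting A's result

        return x;            // returns 1, although two increments were performed
    }
}
```

With real threads the order is not fixed, which is what makes this kind of bug hard to reproduce: most runs may give the right answer, and only some lose an update.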

Remedy:

So how do we protect ourselves and our code from such disasters in a parallel environment? That is where locks come into play. The basic idea is that a lock is like a guard that can enforce atomicity, meaning the code guarded by the lock will be performed from beginning to end before another thread can enter and read (or write) the value.

Until the first thread completes all of its instructions, the lock holds back every other thread that tries to acquire it, so none of them can access the field. The simple lock is basically a policeman making sure there is always only 1 thread using the guarded data.
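
Here is a minimal sketch of how this could look with the C# “lock” statement. The class “LockedCounter”, the “gate” object and the helper “RunTwoThreads” are hypothetical names chosen for this example, not part of the original code.

```csharp
using System.Threading;

// Hypothetical lock-protected version of the counter.
class LockedCounter
{
    private readonly object gate = new object(); // private object used only for locking
    private int x;

    public void Increment()
    {
        lock (gate)   // only one thread at a time can be inside this block
        {
            x++;      // load, add and store all finish before another thread enters
        }
    }

    public int Value
    {
        get { lock (gate) { return x; } } // reads take the same lock
    }

    // Starts two threads that each call Increment 'perThread' times,
    // waits for both, and returns the final value.
    public static int RunTwoThreads(int perThread)
    {
        var counter = new LockedCounter();
        ThreadStart work = () =>
        {
            for (int i = 0; i < perThread; i++) counter.Increment();
        };
        var a = new Thread(work);
        var b = new Thread(work);
        a.Start();
        b.Start();
        a.Join();
        b.Join();
        return counter.Value;
    }
}
```

Calling LockedCounter.RunTwoThreads(100000) should return 200000 every time, because no increment can start while another one is half done. Note that the lock only protects code that actually takes it: any method that touched the field without “lock (gate)” would bring the race back.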

You can read more in this awesome post from Joseph Albahari.

I hope this post was insightful as to the simplest dangers we face in a parallel processing environment.

If you are angry, mad or just happy about this post and want to share it with me – leave a comment.

Code on,
Shonn Lyga.

About Shonn Lyga

Obsessed with anything and everything in Software Engineering, Technology and Science
This entry was posted in .NET, Multithreading, OOP.

2 Responses to Why lock your data?

  1. Tal Jerome says:

    Just wanted to point out that updating a shared variable is better done using interlocked functions (or intrinsics) instead of locks: it’s much faster, can’t deadlock and, imho, is more readable.
    To the best of my knowledge, C# supports interlocked operations.
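
For reference, C# does provide these through the System.Threading.Interlocked class. A minimal sketch of the counter written that way might look like the following; the class name “InterlockedCounter” is made up for this example.

```csharp
using System.Threading;

// Hypothetical lock-free version of the counter using Interlocked.
class InterlockedCounter
{
    private int x;

    public void Increment()
    {
        // Performs the load, add and store as one atomic operation.
        Interlocked.Increment(ref x);
    }

    public int Value
    {
        // Volatile.Read makes sure we see the latest stored value.
        get { return Volatile.Read(ref x); }
    }
}
```

This fits a single counter well. When several fields have to change together, a lock is still the simpler tool.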
