====== Lecture 8 Scribe Notes ======
//by Donald Soon, Charlie Xu, Stephen Kimura//
==== Race Conditions ====
Now it is time to introduce you to the wonderful (and at times confusing) world of race conditions. But before we even define what they are, let’s take a look at a hypothetical situation, and see why they’re even worth bringing up.
=== ATM ===
Suppose two people (like Professor Kohler and his Mother) have a shared bank account that can be accessed by ATM from anywhere in the world.
{{basicatm.jpg?400x300 }}
{{2007spring:notes:basicatm_mom.jpg?400x300}}
{{2007spring:notes:shopperb1.jpg }}
^ATM1 ^ ---------------------------------------|-------------------------------------------- ^Bank ^ ---------------------|-------------------------------------- ^ATM2 ^-------
Moreover, the bank's owners hired a bunch of UCLA students (who haven't taken CS111) to write the code that handles ATM operations such as depositing, withdrawing, and checking balances. The bank's machine that runs this code is multi-threaded, with the normal scheduling algorithms and context switches that any normal machine would have. The students decided to define depositing and withdrawing as functions that take two parameters: a bank account and an amount to deposit or withdraw. What would happen if Professor Kohler and his Mother accessed the same bank account from two different ATMs at the same instant, and the bank used the code below (written by the students) to process these transactions?
  #include <stdint.h>
  typedef struct bank_account {   // our bank account struct holding a balance
      uint32_t balance;
  } bank_account_t;
  // deposit and withdraw are called when an ATM transaction is processed
  void deposit(bank_account_t *b, uint32_t amt) {
      b->balance += amt;
  }
  void withdraw(bank_account_t *b, uint32_t amt) {
      b->balance -= amt;
  }
Hold it. What if we try to withdraw more money than we have in our balance (i.e., balance < amt)? Since we are using an unsigned int, the subtraction would wrap around and ''b->balance'' would suddenly be a VERY large number (we're rich!). Fortunately, this bug is easily fixed like this:
  void withdraw(bank_account_t *b, uint32_t amt) {
      if (b->balance >= amt)   // makes sure we have enough money before withdrawing!
          b->balance -= amt;
  }
Now that's more like it.
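To make the wraparound concrete, here is a minimal sketch that applies both versions of the withdrawal to plain values instead of an account struct. The helper names ''unchecked_withdraw'' and ''checked_withdraw'' are made up for this illustration and are not part of the bank's code.

```c
#include <stdint.h>

/* Hypothetical helper: what the first version of withdraw computes.
   Unsigned arithmetic wraps modulo 2^32 instead of going negative. */
uint32_t unchecked_withdraw(uint32_t balance, uint32_t amt) {
    return balance - amt;
}

/* Hypothetical helper: the fixed version, which leaves the balance
   alone when there is not enough money. */
uint32_t checked_withdraw(uint32_t balance, uint32_t amt) {
    if (balance >= amt)
        balance -= amt;
    return balance;
}
```

With a balance of 5 and a withdrawal of 10, ''unchecked_withdraw'' returns 4294967291 (that is, 2^32 - 5), while ''checked_withdraw'' leaves the balance at 5.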
Shall we test this code? Yes, let's. In the test, Professor Kohler's Mother deposits his allowance of $5 while he withdraws $5 of lunch money **at the same time**.
|time| __**Mom**__ | __**Balance**__ | **__Professor__** ^ __**Comments**__ ^
| 0 | | $5 | ^The bank account starts off with $5 in it ^
| 1 | dep(b,5) | $5 | ^Mom tries a deposit of $5 ^
| 2 | loads balance of $5 | $5 | ^loads the $5 balance from bank's memory ^
| 3 | control transfer ---|-------$5 -------|--> ^Control switches to Professor ^
| 4 | | $5 | with(b,5) ^Professor calls withdraw of $5 ^
| 5 | | $5 | loads balance of $5 ^loads the $5 balance from memory (note: mom's $5 deposit hasn't fully finished yet) ^
| 6 | | $0 <-----|---stores $0 ^Professor's withdraw call stores $0 to memory and returns ^
| 7 | | $0 | ^Bank Balance is now $0 in the bank's memory ^
| 8 | <--------------|-------$0-------|--control transfer ^Control goes back to Mom ^
| 9 | stores $10 ------ |----->$10 | ^Mom deposits $5 to "her" balance of $5, and stores this $10 to memory ^
| 10 | | $10 | ^Balance is now $10 in memory! ^
What went wrong here? The ''deposit'' and ''withdraw'' of $''5'' should have cancelled each other out, leaving ''b->balance'' equal to ''5'', yet it ends up as ''10''. The subtle problem is that both the Professor and his Mom load the same value of "balance" and then go on to complete their respective transactions. Mom's store at time 9 is computed from the stale $5 she loaded at time 2, so it overwrites the Professor's withdrawal as if it had never happened. If either of them had been patient and waited for the other to finish first, there wouldn't be a problem, because the balance would be updated AFTER each complete transaction, not in the middle of one!
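The schedule in the table can be replayed in a single thread by giving each person an explicit "register" for the value they loaded. This is only a sketch of that one interleaving, not real concurrent code; the function name ''replay_lost_update'' and the variables ''mom_reg'' and ''prof_reg'' are invented for the illustration.

```c
#include <stdint.h>

typedef struct bank_account {
    uint32_t balance;
} bank_account_t;

/* Replays the table's schedule step by step and returns the final balance. */
uint32_t replay_lost_update(void) {
    bank_account_t b = { 5 };          /* time 0: account starts with $5 */
    uint32_t mom_reg, prof_reg;

    mom_reg = b.balance;               /* time 2: Mom loads $5 */
    /* time 3: control transfers to the Professor */
    prof_reg = b.balance;              /* time 5: Professor loads $5 */
    if (prof_reg >= 5)
        b.balance = prof_reg - 5;      /* time 6: Professor stores $0 */
    /* time 8: control goes back to Mom */
    b.balance = mom_reg + 5;           /* time 9: Mom stores $5 + $5 = $10 */

    return b.balance;
}
```

The function returns 10: Mom's store is based on the value she loaded before the Professor's withdrawal, so the withdrawal is lost.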
We can even try another one. This time let's assume both the Professor and his mother withdraw $10 from their account. Here is a possible execution order of those instructions:
|time| __**Mom**__ | __**Balance**__ | **__Professor__** ^ __**Comments**__ ^
| 0 | | $10 | ^The bank account starts off with $10 in it ^
| 1 | with(b,10) | $10 | ^Mom tries a withdrawal of $10 ^
| 2 | loads balance of $10 | $10 | ^loads the $10 balance from bank's memory and the if statement evaluates to TRUE ^
| 3 | control transfer ---|-------$10-------|--> ^Control switches to Professor ^
| 4 | | $10 | with(b,10) ^Professor calls withdraw of $10 ^
| 5 | | $10 | loads balance of $10 ^loads the $10 balance from memory and the if statement is TRUE ^
| 6 | | $0 <-----|---stores $0 ^Professor's balance is decremented to $0 and stored to memory ^
| 7 | | $0 | ^Bank Balance is now $0 in the bank's memory ^
| 8 | <--------------|-------$0-------|--control transfer ^Control goes back to Mom ^
| 9 | loads $0 | $0 | ^ Mom's withdraw RELOADS the current balance to decrement, which is now $0 ^
| 10 | stores $4,294,967,286 | $0 | ^$0 - $10 wraps around in an unsigned 32-bit integer (4,294,967,296 - 10), and we're rich! ^
| 11 | | $4,294,967,286 | ^Good for us, bad for the bank...UCLA students are now fired ^
The problem shown above is a lot harder to spot than you would think. In real life, it is so hard to debug that it has its own special name: the "Heisenbug", named after the Heisenberg Uncertainty Principle. In the most general terms, this principle states that when you try to observe or measure a variable in a system, you inherently disturb other variables in that system by taking the measurement! In terms of our program, the ignorant programmer might add "printf()" statements here and there to see where the problem is occurring, yet adding statements to "observe" the current state of the program will inherently throw off the timing and execution order of the program!
Thus, we hope that these examples have shown how easily the output is affected by slight changes in the timing and order in which the instructions of a program execute. We now define a few terms in order to formalize the ideas behind race conditions, so that we can better predict when they will occur and prevent them from happening!
===Sequential Consistency===
Two or more processes are sequentially consistent if, for any execution, the same observable results could be obtained by interleaving the operations of those processes in some sequential order. In other words, if we have multiple threads/processes running different pieces of code to accomplish a task, the final outcome should be the result of these threads executing in some sequential order, and we should be able to name an order of execution that produces that result. For example, if two processes try to write to the same memory, they cannot write at the same time and leave that memory jumbled, holding an undefined value. The value in that memory should be as if process 1 executed and then process 2, or vice versa. Phew...hopefully this concept was repeated in enough different ways to finally become clear!
Take our earlier example: the order in which the instructions were executed changed the result from a correct one to an incorrect one (it made us rich), and no order of running the two transactions one after the other could have produced that result, so the threads were not sequentially consistent. Note that sequential consistency does not guarantee that the result will be correct. It only says that the result will be predictable, perhaps predictably wrong, but that's a start.
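One way to apply the definition to the ATM tests is to compute the balance that each serial order of the two transactions would produce, and then ask whether the observed balance matches either one. The sketch below does exactly that for two operations; the ''op_t'' type and the function names are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one ATM transaction. */
typedef struct {
    bool is_deposit;   /* true = deposit, false = withdraw */
    uint32_t amt;
} op_t;

/* Applies one whole transaction, with nothing interleaved. */
static uint32_t apply_op(uint32_t balance, op_t op) {
    if (op.is_deposit)
        return balance + op.amt;
    return (balance >= op.amt) ? balance - op.amt : balance;
}

/* True if `observed` is the result of running a then b, or b then a. */
bool matches_a_serial_order(uint32_t initial, op_t a, op_t b, uint32_t observed) {
    uint32_t a_then_b = apply_op(apply_op(initial, a), b);
    uint32_t b_then_a = apply_op(apply_op(initial, b), a);
    return observed == a_then_b || observed == b_then_a;
}
```

For the first test (deposit $5 and withdraw $5 from $5), both serial orders end at $5, so the observed $10 matches neither. For two withdrawals of $10 from $10, both serial orders end at $0, so the huge wrapped-around balance matches neither as well.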
To fully understand whether or not we can write programs that are sequentially consistent, we need to see what kind of hardware consistency our EE friends designing hardware can provide.
What kind of consistency model does the processor (hardware) provide?
A processor provides sequential consistency for most assembly INSTRUCTIONS. That is, if two processes execute a write memory assembly instruction to the same memory, the result in memory will be from either the value written by process A or process B, never a weird undefined value somewhere in between. (There are some instructions that do not provide sequential consistency; see Lecture 9 for more; but reads and writes usually do.)
However, we still need to ensure that we have sequential consistency for higher level operations, and we can use this "simple" guarantee from the processor to build up higher level sequential consistency!
Let's take a look at the assembly code for withdraw and deposit.
deposit(a, amt)
let “a” be stored in register %eax and “amt” in register %ebx
0(%eax) -> account_number
4(%eax) -> balance
1) movl 4(%eax), %edx
2) addl %ebx, %edx
3) movl %edx, 4(%eax)
withdraw(a, amt)
let “a” be stored in register %eax and “amt” in register %ebx
1’) movl 4(%eax), %edx
2’) cmpl %ebx, %edx
3’) jae 6’
4’) movl $-1, %eax
5’) jmp out
6’) movl 4(%eax), %edx
6.5’) subl %ebx, %edx
7’) movl %edx, 4(%eax)
8’) movl $0, %eax
The processor gives us sequential consistency for individual instructions, but let's see if we can still screw up.
Here both A and B will deposit $5 with initial balance $10. We expect $20 to be the balance at the end.
| **A** | **A's %edx** | **B** | **B's %edx** | **balance** |
| A1 | 10 | | | 10 |
| A2 | 15 | | | 10 |
| A3 | 15 | | | 15 |
| | | B1 | 15 | 15 |
| | | B2 | 20 | 15 |
| | | B3 | 20 | 20 |
Here A# means that A runs line # of the deposit function.
OK, great: we got $20, which is the correct result.
What if the instructions were executed in a different order?
| **A** | **A's %edx** | **B** | **B's %edx** | **balance** |
| A1 | 10 | | | 10 |
| | | B1 | 10 | 10 |
| A2 | 15 | | | 10 |
| | | B2 | 15 | 10 |
| A3 | 15 | | | 15 |
| | | B3 | 15 | 15 |
This time we get only $15. Both A and B got $10 for the initial balance, both added $5 to it, and both stored the resulting $15 back into balance. This is not what we would want to see when we use an ATM. We can see that the result is caused by a race condition.
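Both tables can be reproduced with a small single-threaded simulator that executes the three deposit instructions one at a time, in whatever order a schedule string dictates. This is a sketch for experimenting with interleavings, not real concurrent code; the function ''run_deposits'' and the schedule format (one 'A' or 'B' per instruction) are assumptions of this example.

```c
#include <stdint.h>

/* Runs two deposits of `amt`. Each character of `schedule` ('A' or 'B')
   lets that process execute its next instruction. Returns the final balance. */
uint32_t run_deposits(const char *schedule, uint32_t initial, uint32_t amt) {
    uint32_t balance = initial;     /* the shared 4(%eax) */
    uint32_t edx[2] = { 0, 0 };     /* each process has its own %edx */
    int pc[2] = { 1, 1 };           /* next deposit line for A and B */

    for (const char *s = schedule; *s; s++) {
        int p = (*s == 'A') ? 0 : 1;
        switch (pc[p]) {
        case 1: edx[p] = balance; break;   /* 1) movl 4(%eax), %edx */
        case 2: edx[p] += amt;    break;   /* 2) addl %ebx, %edx   */
        case 3: balance = edx[p]; break;   /* 3) movl %edx, 4(%eax) */
        default: break;                    /* this process is finished */
        }
        if (pc[p] <= 3)
            pc[p]++;
    }
    return balance;
}
```

With an initial balance of 10 and deposits of 5, the schedule ''"AAABBB"'' gives 20 and the schedule ''"ABABAB"'' gives 15, matching the two tables.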
What if A withdraws $10 and B also withdraws $10?
| **A** | **A's %edx** | **B** | **B's %edx** | **balance** |
| A1' | 10 | | | 10 |
| A2' | 10 | | | 10 |
| A3' | 10 | | | 10 |
| | | B1' | 10 | 10 |
| | | B2' | 10 | 10 |
| | | B3' | 10 | 10 |
| | | B6' | 10 | 10 |
| | | B6.5'| 0 | 10 |
| | | B7' | 0 | 0 |
| | | B8' | | |
| A6' | 0 | | | 0 |
| A6.5'|4294967286| | | 0 |
| A7' |4294967286| | | 4294967286 |
| A8' | | | | |
Here, both A and B check for sufficient funds before either one has subtracted anything. Both find that the balance of 10 covers a withdrawal of 10, and both proceed with the withdrawal. B goes first and the balance becomes 0. Then A reloads that 0 and subtracts 10, and since balance is unsigned, we end up with a huge positive integer.
This is something we would probably like to see when we withdraw from our own bank accounts, but it is not something the bank wants to see. At any rate, it is another example of a race condition causing problems for us (or for the bank, anyway).
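The same kind of single-threaded simulation works for withdraw. The sketch below steps two withdrawals through the assembly listing one instruction at a time, following a schedule string with one 'A' or 'B' per instruction. The function ''run_withdrawals'', the schedule format, and the internal step numbering are assumptions of this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Runs two withdrawals of `amt` in the order given by `schedule`.
   Returns the final balance. Step 9 means "finished". */
uint32_t run_withdrawals(const char *schedule, uint32_t initial, uint32_t amt) {
    uint32_t balance = initial;
    uint32_t edx[2] = { 0, 0 };
    bool enough[2] = { false, false };   /* result of the cmpl */
    int pc[2] = { 0, 0 };

    for (const char *s = schedule; *s; s++) {
        int p = (*s == 'A') ? 0 : 1;
        switch (pc[p]) {
        case 0: edx[p] = balance;          pc[p] = 1; break; /* 1'   load     */
        case 1: enough[p] = edx[p] >= amt; pc[p] = 2; break; /* 2'   cmpl     */
        case 2: pc[p] = enough[p] ? 5 : 3;            break; /* 3'   jae 6'   */
        case 3: pc[p] = 4;                            break; /* 4'   ret = -1 */
        case 4: pc[p] = 9;                            break; /* 5'   jmp out  */
        case 5: edx[p] = balance;          pc[p] = 6; break; /* 6'   reload   */
        case 6: edx[p] -= amt;             pc[p] = 7; break; /* 6.5' subl     */
        case 7: balance = edx[p];          pc[p] = 8; break; /* 7'   store    */
        case 8: pc[p] = 9;                            break; /* 8'   ret = 0  */
        default: break;                                      /* finished      */
        }
    }
    return balance;
}
```

Starting from 10 with two withdrawals of 10, the table's schedule is ''"AAABBBBBBBAAAA"'' and produces 4294967286 (2^32 - 10). A serial schedule such as ''"AAAAAAABBBBB"'' produces 0, because B's check fails and it jumps out without touching the balance.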
===Isolation Atomicity===
In lecture we defined two processes (sequences of instructions) as **isolated atomically** if the two sequences are never executed interleaved.
What this means is that once a sequence that is isolated atomically starts executing, no other process may interleave its own instructions with it until it finishes. The reason we had a problem at the ATM was that depositing and withdrawing were not isolated atomically. The deposit function, for instance, could be called from different ATMs, so a second deposit could start executing before the first one had finished.
Usually an entire process does not need to be isolated atomically, but below we define critical sections, which are the specific sections of code that need to be executed atomically so that we can avoid race conditions.
====Critical Sections====
A critical section is a set of instructions where, to preserve sequential consistency at a high level, at most one process's instruction pointer may be in the set at any moment in time. Stated another way, if one process is executing an instruction in a critical section, no other process will be executing anything in that entire set of critical instructions. But how do we know what code is part of a critical section?
===Finding Critical Sections===
In the ATM example, the unwanted results were due to the “balance” variable being accessed and changed by two different processes at the same time. Therefore, we know that any code that performs __writes to, or dependent reads of, shared state__ belongs in a critical section.
Let's take a look at the code for the ATM again.
  void deposit (struct account *a, unsigned amt) {
      a->balance += amt;
  }
  int withdraw (struct account *a, unsigned amt) {
      if (a->balance >= amt) {
          a->balance -= amt;
          return 0;
      }
      else
          return -1;
  }
Here "balance" is shared by any ATM accessing Professor Kohler’s bank account. Changing the value of "balance" is writing to a shared state, so any code that changes the value of balance is in the critical section.
//What about the if statement?//
The if statement simply compares “balance” to “amt” so it does not change the value of balance. However, whether or not the if statement’s body executes depends on the value of balance. Therefore, this is an example of a dependent read and should be included in the critical section.
Let's take a look at the assembly version of these two functions again.
deposit(a, amt)
let “a” be stored in register %eax and “amt” in register %ebx
0(%eax) -> account_number
4(%eax) -> balance
1) movl 4(%eax), %edx
2) addl %ebx, %edx
3) movl %edx, 4(%eax)
For the deposit function, all 3 instructions form a critical section. Lines 1 and 3 both access the balance, which is stored in 4(%eax) (and is also held in %edx after line 1). Line 3 also has a data dependency on line 2, since the value of register %edx is changed in line 2 and then used in line 3.
withdraw(a, amt)
let “a” be stored in register %eax and “amt” in register %ebx
1’) movl 4(%eax), %edx
2’) cmpl %ebx, %edx
3’) jae 6’
4’) movl $-1, %eax
5’) jmp out
6’) movl 4(%eax), %edx
6.5’) subl %ebx, %edx
7’) movl %edx, 4(%eax)
8’) movl $0, %eax
Lines 1’ through 3’ implement the branch that determines whether or not there are sufficient funds for a withdrawal. These 3 instructions are part of the critical section since the result is dependent on the value of balance.
Lines 4’ and 5’ return -1 if there are insufficient funds. They do not depend on any shared state and are therefore not part of the critical section.
Lines 6’ through 7’ perform the subtraction from balance and therefore must be part of the critical section.
Line 8’ returns 0 to indicate success. It does not depend on any shared state and is therefore not part of the critical section.
==Minimum Sufficient Critical Sections==
The easiest way to enforce a critical section is to allow only one thread. If there is only one thread, then we do not have any synchronization issues, and every instruction is essentially part of one big critical section. However, this defeats the purpose of having threads in the first place, and performance is sacrificed.
Instead of restricting ourselves to only one thread, it is better to find the minimum sufficient critical sections. By minimum, we mean the bare minimum instructions, and by sufficient we mean the critical section is “good enough” to maintain consistency.
===Enforcing Critical Sections with Locks===
To enforce a critical section, we need to make sure that at most one process is inside the critical section at any time. To do this we simply "lock" up the critical section when a process enters. This will prevent any other process from entering the section and force them to wait until the current process exits the critical section. When a process leaves a critical section, it should "unlock" the critical section to allow another process to enter.
At most one process may hold the lock for a given critical section at any time. If more than one process could obtain the lock, then the critical section could have more than one process inside, which is not what we want. Also, a process should always unlock when leaving a critical section, or the critical section will stay locked forever.
==Mutual Exclusion Lock (mutex)==
A mutex is a locking mechanism with two operations:\\
**Acquire**: wait for the mutex to become unlocked if it is currently locked, then atomically set it to locked\\
**Release**: set the mutex to the unlocked state; the mutex must currently be locked\\
When initialized, a mutex is in the unlocked state.
==Implementation of a mutex==
Before we jump into when and where to use a mutex, it may be useful to see how a mutex is actually implemented.
Here, the idea of an "atomic instruction" is important. In order to take a lock, we first need to test whether the lock is already held. If it is, we wait; if not, we can lock!
In C code it may look something like this:
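  unsigned test_and_set (unsigned *addr, unsigned new_value) {
      unsigned old_value = *addr;   // remember what was there
      *addr = new_value;            // overwrite it with the new value
      return old_value;             // report the old contents to the caller
  }
{{tas.jpg|}}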
But, we just mentioned that we needed this to be an atomic instruction! It turns out that there is actually an x86 instruction that executes the three lines of code in test_and_set atomically. For reference, the instruction is called "xchgl".
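In portable C, one way to get the same atomic swap without writing assembly is C11's ''atomic_exchange'' from ''<stdatomic.h>''. The sketch below assumes a C11 compiler and changes the parameter type to ''atomic_uint''; it is a stand-in for the lecture's version, not the code shown in class.

```c
#include <stdatomic.h>

/* Atomically stores new_value into *addr and returns the previous contents.
   The load and the store happen as one indivisible step. */
unsigned test_and_set(atomic_uint *addr, unsigned new_value) {
    return atomic_exchange(addr, new_value);
}
```

On x86, compilers typically implement this exchange with an ''xchg'' instruction, though that is up to the compiler and target.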
We use test and set because we can wait for the old value to change and then immediately set it afterwards to represent locking. Below we can visualize how this works.
Figure 1 is an example of a successful test_and_set. X represents an unlocked lock (in the 0 state). The %eax register holds the value to swap into X as it is tested. Here, X = 0 and %eax = 1, so we set X to 1 and return 0 through %eax. This essentially sets the state to "locked".
Figure 2 is an example of a failed test_and_set. Here we see X has a value of 1 and %eax also has a value of 1, so if we swap the values then both will still have a value of 1. If 1 is returned then acquiring the lock was not successful and the process will have to try again. It can be seen why test_and_set is a simple, but elegant approach here, because upon finding a locked lock, we simply keep it that way (we replace 1 with 1).
So how do we unlock? When a process successfully calls test_and_set, it currently owns the lock. To unlock, that process will have to change X to a different value other than 1. If X != 1, then test_and_set will return a value that is not 1 for another process and now that other process will be able to obtain the lock.
Here's an example:
1) Process 1 calls test_and_set while X = 0, and passes in 1 through %eax. This sets X to 1.
2) test_and_set returns 0, so process 1 can proceed
3) Process 2 calls test_and_set and 1 is returned since X = 1, Process 2 cannot proceed since the return value is 1
4) Process 1 sets X to 0
5) Process 2 calls test_and_set again and now it will return 0, so now Process 2 can set X to 1 and proceed
{{tas2.jpg|}}
Here is how we use test_and_set as a lock:
  // Acquire the lock by looping until the lock is not locked
  void acquire (unsigned *x) {
      while (test_and_set (x, 1) == 1) {   // x is already a pointer to the lock word
          /* do nothing */
      }
  }
  // Release by simply setting the lock object to zero
  void release (unsigned *x) {
      *x = 0;
  }
Notice that the while loop prevents a process from proceeding until a value other than 1 is returned. There's nothing magical about the value 1. We could use another value if we wanted, as long as we are consistent.
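The five-step example above can be replayed in a single thread if we add a non-blocking variant that makes one attempt and reports whether it got the lock. This is a sketch assuming C11's ''atomic_exchange'' as the atomic swap; ''try_acquire'' is a helper invented for this illustration and was not part of the lecture.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One attempt at the lock: true if we got it, false if it was already held. */
bool try_acquire(atomic_uint *x) {
    return atomic_exchange(x, 1) == 0;
}

/* Spin until an attempt succeeds. */
void acquire(atomic_uint *x) {
    while (!try_acquire(x)) {
        /* do nothing */
    }
}

/* Release by setting the lock word back to zero. */
void release(atomic_uint *x) {
    atomic_store(x, 0);
}
```

Starting from an unlocked word, the first ''try_acquire'' succeeds (steps 1 and 2), a second one fails while the lock is held (step 3), and after ''release'' (step 4) a new attempt succeeds again (step 5).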
==Using a Mutex==
A mutex should be used whenever there is a critical section. Before entering the critical section, acquire should be called to lock the mutex, and after leaving the critical section, release should be called to unlock the mutex.
Let's revisit our ATM example one last time to see how a mutex can be used to prevent unwanted results.
  typedef unsigned mutex_t;   // assumption: the lock word used by acquire/release above
  struct account {
      unsigned account_number;
      unsigned balance;
      mutex_t account_lock;
  };
  void deposit (struct account *a, unsigned amt) {
      acquire(&a->account_lock);   //start of critical section, lock mutex
      a->balance += amt;
      release(&a->account_lock);   //end of critical section, unlock mutex
  }
  int withdraw (struct account *a, unsigned amt) {
      acquire(&a->account_lock);   //start of critical section, lock mutex
      if (a->balance >= amt) {
          a->balance -= amt;
          release(&a->account_lock);   //critical section ends here for sufficient funds, unlock mutex
          return 0;
      } else {
          release(&a->account_lock);   //critical section ends here for insufficient funds, unlock mutex
          return -1;
      }
  }
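Because forgetting a ''release'' on one exit path leaves the account locked forever, one possible variation is to give ''withdraw'' a single exit with a single ''release''. The sketch below is self-contained, so it defines its own spinning ''acquire'' and ''release'' using C11 atomics; the ''mutex_t'' typedef is an assumption of this example.

```c
#include <stdatomic.h>

typedef atomic_uint mutex_t;   /* assumption: one lock word, 0 = unlocked */

static void acquire(mutex_t *m) {
    while (atomic_exchange(m, 1) == 1) {
        /* spin until the old value was 0 */
    }
}

static void release(mutex_t *m) {
    atomic_store(m, 0);
}

struct account {
    unsigned account_number;
    unsigned balance;
    mutex_t account_lock;
};

/* Returns 0 on success, -1 on insufficient funds. */
int withdraw(struct account *a, unsigned amt) {
    int result = -1;                 /* assume failure until proven otherwise */
    acquire(&a->account_lock);       /* start of critical section */
    if (a->balance >= amt) {
        a->balance -= amt;
        result = 0;
    }
    release(&a->account_lock);       /* the only unlock, reached on every path */
    return result;
}
```

The tradeoff is that setting ''result'' now happens inside the critical section, which makes it an instruction or two longer than the minimum, in exchange for only one place where the unlock can be forgotten.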
==Utilization==
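We can employ either a spinning or a blocking version of a mutex. The ''acquire'' above is a spinning mutex: a waiting process keeps executing the ''while'' loop, using CPU time without making progress. A blocking mutex instead puts the waiting process to sleep until the lock is released, so the CPU can run other processes in the meantime. In terms of utilization, a blocking mutex is therefore better, as one would expect.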
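A real blocking mutex needs help from the operating system to put the waiter to sleep and wake it up later, which is beyond a short example. As a rough middle ground, the sketch below contrasts a pure spin loop with one that calls POSIX ''sched_yield()'' on each failed attempt, offering the CPU to other runnable processes. It assumes a POSIX system and C11 atomics, and the function names are made up for the illustration.

```c
#include <sched.h>
#include <stdatomic.h>

/* Spinning: burns CPU for as long as the lock is held by someone else. */
void acquire_spinning(atomic_uint *lock) {
    while (atomic_exchange(lock, 1) == 1) {
        /* do nothing */
    }
}

/* Yielding: after each failed attempt, offer the CPU to another process.
   This is still not true blocking, since the waiter stays runnable. */
void acquire_yielding(atomic_uint *lock) {
    while (atomic_exchange(lock, 1) == 1) {
        sched_yield();
    }
}

void release(atomic_uint *lock) {
    atomic_store(lock, 0);
}
```

Both versions take an unlocked lock immediately; they differ only in what the waiter does while the lock is held.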
==Fairness==
If multiple processes want to acquire a lock and get blocked in the process, which process should get the lock when it becomes available again? If any of the processes could obtain the lock, then one process could potentially starve. For example, if 5 processes want to acquire a lock, 4 of the processes could keep passing the lock amongst themselves and the 5th process will never obtain the lock. One way of fixing this is to use a queue for blocked processes. Processes obtain locks in order of their requests for the lock. This way, we can avoid starvation for obtaining locks.
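One well-known way to get this first-come-first-served order is a ticket lock, which works like the number dispenser at a deli counter. Note that this is a spinning lock with an implicit queue, not the blocking queue described above, and it was not covered in lecture. The sketch assumes C11 atomics, and the type and function names are invented here.

```c
#include <stdatomic.h>

typedef struct {
    atomic_uint next_ticket;   /* next number to hand out */
    atomic_uint now_serving;   /* number currently allowed in */
} ticket_lock_t;

/* Takes a ticket and waits until it is called. Returns the ticket number. */
unsigned ticket_acquire(ticket_lock_t *l) {
    unsigned my_ticket = atomic_fetch_add(&l->next_ticket, 1);
    while (atomic_load(&l->now_serving) != my_ticket) {
        /* spin until it is our turn */
    }
    return my_ticket;
}

/* Lets the holder of the next ticket in. */
void ticket_release(ticket_lock_t *l) {
    atomic_fetch_add(&l->now_serving, 1);
}
```

Since tickets are handed out in arrival order and served in numeric order, a fifth process cannot be skipped indefinitely while four others pass the lock among themselves.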
=== Coarse Grained Locks ===
Coarse grained locks use a small number of locks that protect a large amount of shared state. In our ATM example, an example of a coarse grained lock would be a single lock that protects all accounts.
**''+''** Relatively easy to implement\\
**''+''** Small space cost\\
**''-''** Lock contention: Many threads or processes waiting on one lock\\
=== Fine Grained Locks ===
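Fine grained locks use many locks that each protect a small amount of shared state. In our ATM example, a fine grained scheme gives each account its own lock, like the ''account_lock'' field in the account struct.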
**''+''** Less lock contention\\
**''-''** Takes up more space\\
**''-''** Harder to implement\\
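The two approaches can be compared side by side for the bank. The sketch below assumes a tiny bank of ''NUM_ACCOUNTS'' accounts and a simple spinlock built on C11 atomics; all type and function names are hypothetical.

```c
#include <stdatomic.h>

#define NUM_ACCOUNTS 4   /* hypothetical bank size */

static void acquire(atomic_uint *l) {
    while (atomic_exchange(l, 1) == 1) {
        /* spin */
    }
}

static void release(atomic_uint *l) {
    atomic_store(l, 0);
}

/* Coarse grained: one lock protects every balance in the bank. */
struct coarse_bank {
    atomic_uint bank_lock;
    unsigned balance[NUM_ACCOUNTS];
};

void coarse_deposit(struct coarse_bank *b, int i, unsigned amt) {
    acquire(&b->bank_lock);      /* blocks deposits to ALL accounts */
    b->balance[i] += amt;
    release(&b->bank_lock);
}

/* Fine grained: each account carries its own lock. */
struct fine_account {
    atomic_uint lock;
    unsigned balance;
};

struct fine_bank {
    struct fine_account account[NUM_ACCOUNTS];
};

void fine_deposit(struct fine_bank *b, int i, unsigned amt) {
    struct fine_account *a = &b->account[i];
    acquire(&a->lock);           /* blocks deposits to this account only */
    a->balance += amt;
    release(&a->lock);
}
```

The coarse version stores one lock word in total, while the fine version stores one per account, which is the space cost listed above. In return, deposits to different accounts no longer wait on each other.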
==== Race Conditions in REAL CS111 LIFE ====
Imagine you are in CS111 and it is your turn to be the scribe. You sit through a life-changing lecture on "race conditions", and "race" to the nearest internet providing machine (after class has fully finished, of course) so that you can document every last detail of what you just learned in the wiki. You log in... You find the lecture to edit... You click EDIT... and to your horror, you get a window looking something like this!
{{race_lock_1_2.jpg|}}
You have just experienced a real-life race condition (good thing you now know what they are!). This is exactly what you would see if you and another scribe in your group tried to modify the same page (or "critical section") of the wiki at the exact same time! Thankfully, the CS department has taken its own advice and created locks so that only one person can edit a given page in the wiki at a time.
But let's take away locks, and see what would happen. What problems would this cause?
Suppose you and your partner log on to the wiki and each receive "revision 0" from the server (the last saved revision). Then you both edit different parts of this revision, eventually get hungry and go to lunch, and save your respective revisions, say "revision 1" and "revision 2", in some order back onto the server. You can see that whoever's revision was saved last would be the "permanent" one, destroying the other partner's revision forever! At this point, you should hope the other scribe members in your group don't have a past history of scribe-targeted violence!
To extend this idea further, suppose now that Professor Kohler has put you in charge of the mutex locking system for the CS111 wiki. What's the best way to implement this? Fine, or coarse? If we have one big lock for the entire scribe database, then only one scribe can edit at a time, even if the scribes are working on different lectures! This is coarse-grained locking, and you can see that for this situation it is inefficient. What we should really do is implement a fine-grained locking system that has a different lock for each lecture of the quarter. This way, a scribe is blocked only when trying to edit a page that is already being edited by another scribe!