You are expected to understand this. CS 111 Operating Systems Principles, Spring 2006
 
 
 

Lecture 10 notes

CS111 Scribe Notes 5/4/2006



Today’s lecture will cover:
Lock Contention
Deadlock
I/O Interactions

Lock Contention

-Many threads waiting on a lock
-This is a software problem that should be avoided by better scheduling
-How to improve?

-Think about WeensyOS2:

for (i = 0; i < 320; i++)
	*cursorpos++ = *PRINTCHAR;  //critical section

-One solution (Fine grained locking)

for (i = 0; i < 320; i++) {
	lock(&m);
	*cursorpos++ = *PRINTCHAR;  //critical section
	unlock(&m);
}
  • Advantage: Less lock contention than coarse grained locking.
  • Disadvantage: has to grab and release the lock 320 times.
  • Disadvantage: Correctness is less obvious than coarse grained locking.

-Second solution (Coarse grained locking):

lock(&m);
for (i = 0; i < 320; i++) {
	*cursorpos++ = *PRINTCHAR;  //critical section
}
unlock(&m);
  • Advantage: Doesn’t have to lock as many times. If locking takes a very long time, this is clearly the better method.
  • Advantage: It is very easy to see that this code is correct.
  • Disadvantage: Lock Contention.
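
The trade-off between the two schemes can be made concrete by counting lock acquisitions. What follows is a minimal user-space sketch using POSIX mutexes, not WeensyOS code; `screen`, `pos`, `lock_ops`, `print_fine`, and `print_coarse` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NCHARS 320

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static char screen[2 * NCHARS];  /* stand-in for the console buffer (assumption) */
static size_t pos;               /* stand-in for cursorpos */
static long lock_ops;            /* lock acquisitions, counted for illustration */

static void counted_lock(void) {
    int rc = pthread_mutex_lock(&m);
    assert(rc == 0);
    lock_ops++;                  /* safe: only updated while m is held */
}

static void put_char(char c) {
    screen[pos] = c;
    pos = (pos + 1) % sizeof screen;  /* wrap so the sketch never overruns */
}

/* Fine grained: other threads may interleave between any two characters. */
void print_fine(char c) {
    for (int i = 0; i < NCHARS; i++) {
        counted_lock();
        put_char(c);             /* critical section */
        pthread_mutex_unlock(&m);
    }
}

/* Coarse grained: all 320 characters form one critical section. */
void print_coarse(char c) {
    counted_lock();
    for (int i = 0; i < NCHARS; i++)
        put_char(c);             /* critical section */
    pthread_mutex_unlock(&m);
}
```

One call to `print_fine` takes the lock 320 times and one call to `print_coarse` takes it once, but a second thread calling `print_coarse` must wait for the whole loop to finish.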

Realistic example where lock contention takes place: Creating a new process

  1. Add process to run queue
  2. Need to protect run queue to avoid race conditions

Pseudocode to add processes to the run queue:

lock(&kernel_mutex);
add process to run queue;
unlock(&kernel_mutex);

where kernel_mutex is a single mutex used by the kernel for all processes

What is wrong with this solution?

  • Locks too many shared objects (We are locking the entire kernel when all we need to lock is the run queue)
  • Causes lock contention
  • Too coarse

Solution:

  • Add mutexes that protect finer grained objects

New pseudocode:

lock(&runq_mutex);
add process to run queue;
unlock(&runq_mutex);

What if there was still lock contention here?

  • We need a more creative solution (with even finer grained locks), such as:
    • Locks for head and tail
    • Locks for even and odd numbered processes
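
The "locks for head and tail" idea can be sketched as a queue with two mutexes, so that adding and removing do not contend with each other. This is a user-space sketch with POSIX mutexes and C11 atomics, not the kernel's actual run queue; `struct runq`, `runq_init`, `runq_add`, and `runq_remove` are hypothetical names, and a process is represented only by an integer `pid`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct node {
    int pid;                       /* hypothetical process identifier */
    _Atomic(struct node *) next;   /* atomic: both ends may touch the dummy's next */
};

struct runq {
    struct node *head;             /* dummy node; head->next is the first entry */
    struct node *tail;
    pthread_mutex_t head_lock;     /* held only while removing */
    pthread_mutex_t tail_lock;     /* held only while adding */
};

int runq_init(struct runq *q) {
    struct node *dummy = malloc(sizeof *dummy);
    if (dummy == NULL)
        return -1;
    atomic_init(&dummy->next, (struct node *)NULL);
    q->head = q->tail = dummy;
    int rc = pthread_mutex_init(&q->head_lock, NULL);
    assert(rc == 0);
    rc = pthread_mutex_init(&q->tail_lock, NULL);
    assert(rc == 0);
    return 0;
}

/* Add at the tail; only tail_lock is taken. */
int runq_add(struct runq *q, int pid) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->pid = pid;
    atomic_init(&n->next, (struct node *)NULL);
    pthread_mutex_lock(&q->tail_lock);
    atomic_store(&q->tail->next, n);
    q->tail = n;
    pthread_mutex_unlock(&q->tail_lock);
    return 0;
}

/* Remove from the head; only head_lock is taken. Returns -1 if empty. */
int runq_remove(struct runq *q, int *pid) {
    pthread_mutex_lock(&q->head_lock);
    struct node *old = q->head;
    struct node *first = atomic_load(&old->next);
    if (first == NULL) {
        pthread_mutex_unlock(&q->head_lock);
        return -1;
    }
    *pid = first->pid;
    q->head = first;               /* first becomes the new dummy node */
    pthread_mutex_unlock(&q->head_lock);
    free(old);
    return 0;
}
```

The dummy node is what keeps the two ends apart: an adder only ever touches `tail`, and a remover only ever touches `head`, so neither needs the other's lock.
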

Deadlock
  • System stops making progress because of synchronization

Deadlock Example #1:

  • Suppose a disk interrupt arrives for a blocked process (Process 2) while Process 1 is running and holds runq_mutex. Process 2 starts running and tries to acquire the same mutex.

Process 1

		
   lock(&runq_mutex)
   add p to runq	     <--interrupt here
   unlock(&runq_mutex)	

Process 2

   lock(&runq_mutex)   <--begin running here
   add p to runq
   unlock(&runq_mutex)

Circular wait between the tasks:

[Figure: resource graph (resource-graph.jpg)]

The CPU is running Process 2, which wants runq_mutex. But Process 1 holds runq_mutex and is waiting for the CPU.

Solution #1:

  • Recursive mutex
  • Problem with this solution:
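    • The lock loses its value: if code can re-enter while the lock is held, the mutex no longer guarantees that the run queue is in a consistent state inside the critical section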

Solution #2:

  • Turn interrupts off when acquiring the lock, and turn them back on when the lock is released
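
Kernel interrupt control cannot be run from an ordinary program, so the sketch below uses a user-space analogy: POSIX signals stand in for interrupts, and blocking signals stands in for turning interrupts off. The names `runq_lock`, `runq_unlock`, and `signal_blocked` are hypothetical. Note the order: "interrupts" go off before the lock is taken and come back on only after it is released.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>

static pthread_mutex_t runq_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Block all signals (the stand-in for "interrupts off"), then take the lock.
   The previous mask is saved so that runq_unlock can restore it exactly. */
void runq_lock(sigset_t *saved) {
    sigset_t all;
    sigfillset(&all);
    int rc = pthread_sigmask(SIG_BLOCK, &all, saved);
    assert(rc == 0);
    pthread_mutex_lock(&runq_mutex);
}

/* Release the lock first, then restore the saved signal mask. */
void runq_unlock(const sigset_t *saved) {
    pthread_mutex_unlock(&runq_mutex);
    int rc = pthread_sigmask(SIG_SETMASK, saved, NULL);
    assert(rc == 0);
}

/* Returns 1 if sig is currently blocked in the calling thread, else 0. */
int signal_blocked(int sig) {
    sigset_t cur;
    pthread_sigmask(SIG_BLOCK, NULL, &cur);  /* NULL set: query only */
    return sigismember(&cur, sig);
}
```

While the mutex is held, no signal handler can run on this thread, so a handler that also needs `runq_mutex` cannot start and then wait on the code it interrupted.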

Deadlock Example #2:

int v1, v2, x;
mutex_t m1, m2;
void F1(void) {
	lock(&m1);	// F1 takes m1 first, then m2
	lock(&m2);
	x = v1 + v2;
	unlock(&m2);
	unlock(&m1);
}
void F2(void) {
	lock(&m2);	// F2 takes m2 first, then m1 (opposite order)
	lock(&m1);
	x = v1 - v2;
	unlock(&m1);
	unlock(&m2);
}

Problem:

  • Both functions could start executing at the same time. F1 could grab m1 and F2 could grab m2, so each waits forever for the mutex that the other holds.

4 Deadlock conditions

  • mutual exclusion
  • hold & wait (blocking)
    • if a lock is unavailable -> block (while still holding other locks)
  • no preemption
    • no way to forcibly take a lock from a process that holds it
  • circular wait

We can prevent deadlock by eliminating just one of the 4 deadlock conditions.

One solution:

  • Lock Ordering (This solution eliminates circular wait)
    • Apply total order to all locks
    • i.e., only obtain a lock l if order(l) > order(l') for all locks l' held by the current thread
    • rewritten F2 with this new ordering technique:
void F2(void) {
	lock(&m1);	// same order as F1: m1, then m2
	lock(&m2);
	x = v1 - v2;
	unlock(&m2);
	unlock(&m1);
}
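
One common way to get a total order without numbering locks by hand is to order them by address. This is a sketch of that technique with POSIX mutexes, not something the lecture prescribes; `lock_pair` and `unlock_pair` are hypothetical helpers, and the values of `v1` and `v2` are made up.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Acquire two distinct mutexes in a fixed global order: lower address first. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Release order does not matter for deadlock freedom. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;
static int v1 = 5, v2 = 3;       /* example values (assumption) */

int F1(void) {
    lock_pair(&m1, &m2);
    int x = v1 + v2;
    unlock_pair(&m1, &m2);
    return x;
}

int F2(void) {
    lock_pair(&m2, &m1);         /* argument order no longer matters */
    int x = v1 - v2;
    unlock_pair(&m2, &m1);
    return x;
}
```

Because both functions end up locking in the same order regardless of how the arguments are written, the circular wait from Deadlock Example #2 cannot form.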

Return to Deadlock Example #1

Solution 2: Fix the problem using ordering.

  • order(interrupt flag) < order(runq_mutex): treat the interrupt flag as a lock, and always disable interrupts before acquiring runq_mutex

Deadlock Avoidance

Deadlock Avoidance Example #1:

  • OS running 3 processes, P1, P2, P3
  • P1 needs more memory than the machine can offer
    • OS decides to save some of P2’s state to disk (swapping)
    • What if writing to disk requires more memory (for example, a new page for the kernel stack)?

This would result in a deadlock.

Solution: Keep the system in a safe state

  • The kernel allocates a kernel stack before it is needed

Problems with Solution: There is less memory for the system to use

  • The kernel must make a conservative approximation of future usage, so there are fewer resources to go around.

Deadlock Detection and Recovery

  • The kernel can kill a process when the system is in deadlock.
  • The kernel can violate one of the four conditions (such as forcing a process to return if it cannot gain access to a lock) and recover.
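
Returning instead of blocking when a lock is unavailable breaks the hold & wait condition. The sketch below shows the idea with `pthread_mutex_trylock`; `F2_try` is a hypothetical variant of F2 from Deadlock Example #2, and the values of `v1` and `v2` are made up.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;
static int v1 = 5, v2 = 3;       /* example values (assumption) */

/* Takes m2, then tries m1 without blocking.
   On success stores v1 - v2 in *x and returns 0.
   If m1 is busy, releases m2 and returns -1 so the caller can retry later. */
int F2_try(int *x) {
    pthread_mutex_lock(&m2);
    int rc = pthread_mutex_trylock(&m1);
    if (rc != 0) {
        assert(rc == EBUSY);
        pthread_mutex_unlock(&m2);   /* give up what we hold: no hold & wait */
        return -1;
    }
    *x = v1 - v2;
    pthread_mutex_unlock(&m1);
    pthread_mutex_unlock(&m2);
    return 0;
}
```

A caller would typically retry after a short delay. Retrying immediately in a tight loop avoids deadlock but can waste CPU time if the other thread holds m1 for long.
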

I/O Interactions

[Figure: I/O interactions (io-interactions.jpg)]

I/O Interactions Example #1: We want to read 40 B from disk, given:

Disk latency: 50 microseconds
Cycle: 1 ns = 1E-9 s  (1 GHz processor)
Programmed I/O instruction (PIO) = 1000 cycles = 1 microsecond
Clock interrupt period = 1 ms
Interrupt overhead: 5 PIO = 5 microseconds

Solution #1 (busy waiting):
Actual instructions we might use:

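outb(0x1F2, 1);     // sector count: read 1 sector
outb(0x1F3, 0);     // sector number 0, spread over the
outb(0x1F4, 0);     //   four registers 0x1F3-0x1F6:
outb(0x1F5, 0);     //   0x00000000
outb(0x1F6, 0);
outb(0x1F7, 0x20);  // command: read sector
while ((inb(0x1F7) & 0xC0) != 0x40)
	/* do nothing */;   // spin until the disk is ready and not busy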
  1. Write request to disk
    • 5 PIO instructions
  2. Wait for response
    • 1 PIO instruction per poll (the loop actually spins for 50 microseconds, because the disk latency is 50 microseconds)
  3. Read response
    • 40 PIO instructions

Overhead: how much time the CPU is busy for this disk request

  • 95 microseconds (5 + 50 + 40)

Turnaround time: time from making the request until the 40 B have been read

  • 95 microseconds

Throughput: # requests per second (1/Overhead)

  • about 10,500 requests/second

Solution #2:
Use Clock interrupts with device buffering instead of polling

  1. Make request
    • 5 PIOs
    • Block the requesting process and do something else
  2. Every clock interrupt
    • Check whether device is ready (1 PIO)
    • Read data off device (40 PIOs)

Overhead: 46 microseconds (5 + 1 + 40)
Turnaround Time: 546 microseconds (46 microseconds overhead + 500 microseconds, because on average we wait ½ of a clock period)
Throughput: 1/(46 microseconds) = about 21,700 requests/s

Device controller has an internal request buffer

  • Send a new request before the previous request completes
  • This causes the throughput to be 1/Overhead instead of 1/Turnaround Time
  • Advantages: Throughput has increased because we are now allowing other processes to run
  • Disadvantages: Turnaround time is very bad

Solution #3:
Device interrupts

(picture here)

  1. Make request
    • 5 PIO + 50 microseconds disk latency
  2. Interrupt
    • 5 PIO
  3. Check data (check if succeeds)
    • 1 PIO
  4. Read data
    • 40 PIO

Overhead: 51 microseconds
Turnaround Time: 101 microseconds (51 microseconds overhead + 50 microseconds latency)
Throughput: 1/(51 microseconds) = about 19,600 requests/s

Solution #4:
Direct Memory Access

  1. Make request
    • 5 PIO + 50 microseconds disk latency
  2. Interrupt
    • 5 PIO
  3. Check data
    • 1 PIO

Overhead: 11 microseconds
Turnaround Time: 61 microseconds (11 microseconds overhead + 50 microseconds latency)
Throughput: about 90,900 requests/second

Solution #5:
Direct Memory Access with Polling

  1. Check buffer for new command
    • 1 PIO

Overhead: 1 microsecond
Turnaround Time: 51 microseconds
Throughput: 1,000,000 requests/second

Polling is a good thing!
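
The figures for the five solutions can be rechecked with a few lines of arithmetic. The sketch below takes each scheme's CPU overhead (in PIO-sized units of 1 microsecond) and its non-CPU wait from the notes, then applies turnaround = overhead + wait and throughput = 1/overhead. `struct scheme` and the function names are hypothetical, introduced only for this check.

```c
#include <assert.h>
#include <stdio.h>

struct scheme {
    const char *name;
    double overhead_us;  /* time the CPU is busy per request */
    double wait_us;      /* extra time the request waits while the CPU is free */
};

static const struct scheme schemes[] = {
    { "busy waiting",      5 + 50 + 40,    0 },  /* spin covers the disk latency */
    { "clock interrupts",  5 + 1 + 40,   500 },  /* half a 1 ms clock period */
    { "device interrupts", 5 + 5 + 1 + 40, 50 }, /* disk latency */
    { "DMA + interrupts",  5 + 5 + 1,      50 },
    { "DMA + polling",     1,              50 },
};

double turnaround_us(const struct scheme *s) {
    return s->overhead_us + s->wait_us;
}

/* Requests per second, assuming requests are pipelined (1/overhead). */
double throughput_per_s(const struct scheme *s) {
    assert(s->overhead_us > 0);
    return 1e6 / s->overhead_us;
}

void print_table(void) {
    for (size_t i = 0; i < sizeof schemes / sizeof schemes[0]; i++)
        printf("%-18s overhead %5.0f us  turnaround %5.0f us  %9.0f req/s\n",
               schemes[i].name, schemes[i].overhead_us,
               turnaround_us(&schemes[i]), throughput_per_s(&schemes[i]));
}
```

Calling `print_table()` lists the five schemes side by side. Overhead falls from 95 microseconds to 1 microsecond, while turnaround is dominated by the disk latency or by the wait for the next clock interrupt.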


 
2006spring/notes/lec10.txt · Last modified: 2006/09/26 11:42 (external edit)
 