You are expected to understand this. CS 111 Operating Systems Principles, Fall 2007
 
 
 

Lecture 16 Scribe Notes

Notes by: Armand Babian, Carlos Benito, Nguyen Hoang

Looking Back to the Previous Lecture

During our previous lecture we talked about various ways virtual memory supports process isolation and lets the operating system allocate memory in fixed-size blocks in order to avoid external fragmentation. We also saw other ways virtual memory can improve the computer's utilization; in particular, we covered swapping. We ended the lecture with a brief presentation of shared libraries.

Shared Libraries

Many aspects of shared libraries are supported by the linker, but one important part of shared libraries is implemented by virtual memory, namely the part that ensures that every mapping of the shared library in the system corresponds to one piece of physical memory.

The idea behind shared libraries is as follows: Every program on the system (more or less) uses the C library libc.

  • On top of that, many programs use further libraries. On Windows, for example, there is the library that provides access to all of the Windows foundation classes. On Linux there are sets of libraries called KDE, GDK, X, and many others. Thus many programs link to many libraries.
  • Why duplicate all this memory? These libraries mostly consist of code and in today's programs code is rarely modified. We just run code as it is from the disk and rarely do we modify that program's instructions as that program runs. This tells us that code can possibly be mapped read only. Once code is mapped read only, we can safely use a single piece of physical memory to represent that code wherever it is linked in whatever programs are running. This does not violate process isolation because those programs can't tell the difference. Since the code is read only, it doesn't matter how many other processes have that piece of code mapped in from the same piece of physical memory because no process can change the memory.


For example, say we have two processes A and B - both having code mapped at 0x10000, which first consists of the C library libc. We can thus load in libc only once, as follows:

This improves utilization and performance because when B loads it doesn’t have to load in libc itself. It can use the copy that is already mapped in. This is an example of safe sharing without violating process isolation. Processes A and B can share the same piece of physical memory without violating isolation because neither of them can change it due to the library being read only.

What parts of the C library cannot be safely shared?

  • Shared state. If there is any writable global variable in the C library, the variable has to be on its own page for every process.

What other case, particularly in Unix, is going to lead to having multiple processes with roughly the same code?

Forking

Forking is a little bit more complex than shared libraries. Shared libraries consist almost entirely of code, so we can map the shared library when the process is created. When a process forks though, the child process and the parent process have to be isolated. Both of the processes must be able to write their own address space.

How can we optimize when fork copies an address space? Let's think about it. What other system calls will usually follow a fork? exec().

  • In your lab 1 shell, you forked off a child process and forced it to run another program. What happened when exec ran? exec completely trashed the process's address space and loaded it with a brand new one from disk. So fork and exec usually happen close together in time. What does that say about the child process's address space? At the time the child process's address space is destroyed, it's almost exactly the same as the parent process's address space because the child hasn't had time to make very many changes.

Usually after a fork, the child's address space is mostly the same as the parent's up to the exec. It really doesn't make much sense to copy all that address space given that all that copying is going to be useless. So what do we do?

Eager Copying

One way to do a fork is Eager Copying, where we copy the address space immediately.

For example, suppose we have a process A which has its own code A1, data A2, and a stack A3 mapped anywhere to physical memory as shown below.

Now suppose process A calls fork and creates a new virtual address space for process B as A's child. In order to create B's address space we go ahead and copy its code, data and stack. So B1 is a copy of A1, B2 is a copy of A2, and B3 is a copy of A3, as shown below.

For eager copying, as soon as we know that we need a copy, we make that copy. Eager copying is simple because we know that the two address spaces are isolated.

Lazy Copying

In lazy copying, we copy on demand, meaning we delay the copy until it is required.

How can we execute lazy copying? For the example above, what if we started with a completely empty address space for B and on a page fault for some virtual address va we copy va from the parent? Will this work?

  • This is not a great idea because while the child process runs, the parent process is also allowed to run. The child process's address space is supposed to be a copy of the parent's at the instant the parent calls fork, so any changes the parent makes after fork must be invisible to the child. But if we use this implementation to do lazy copying, it might be that the parent runs first and modifies some of its memory and only then does the child run. Now we're copying later changes from the parent. So this does not work.

What if we temporarily make the entire address space read only?

  • When the entire address space is read only, the parent and child processes can share the associated physical memory because neither of them can tell the difference. Both processes think they have an isolated copy of memory. However, we need to be able to detect writes and make a copy lazily when a write happens. Luckily, page fault hardware allows us to do that because we can mark a page as read only and at that point we can take a page fault on writes and make a copy only on writes.


Let's take a look back to the basics of our page fault handler to refresh our memory:

  • page_fault_handler(uintptr va, int cpl, bool write)
    • va => the virtual address
    • cpl => privilege level of process that caused the fault
    • write => whether or not the fault was due to a write


Now let's do our original forking example, but this time with lazy copying.

We do the fork and B's address space is copied from A's, but this time B's page table marks each of these pages as mapping onto the same pages as in A. At the same time we mark all of A's pages and all of B's pages as read only.

Copy-on-write Fork

With copy-on-write fork, we make the child's address space a copy of the parent's, then we mark the child's and the parent's address spaces as read only, and then we continue running.

What page might likely first receive a page fault?

  • The stack. Maybe one of the processes calls a function in which case the stack will get a fault because that process pushes some arguments onto the stack and writes into the stack.


What happens when the page fault handler gets called?

  • Just as before, the page fault handler looks up the information on the faulting va.


Now let's flesh out our page_fault_handler a little to help us out:

page_fault_handler(uintptr va, int cpl, bool write){
     vp = vpage_info(current, va);
     if(vp is copy_on_write){
	     copy the page contents into a new page
	     mark va as pointing to the new page with pgdir_set(current pgdir, va, new page pte)
     }
}


Now, let's look back to our previous example to see how we can use copy-on-write fork.

If we assume B runs first and that B's stack page is the first to receive a write, here's what will happen:

  1. B's write action will cause a page fault
  2. the kernel runs and notices that this is a copy-on-write page
  3. the kernel then allocates a new page and copies the data from the old page into the new page and makes a new read-write mapping on the new page
  4. we’re off to the races

The process can’t tell the difference between eager copying and lazy copying except for its speed. Now fork runs much faster.
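The claim that a process cannot tell eager from lazy copying can be checked from user space. The following is a minimal sketch, assuming a POSIX system; the function name fork_isolation_demo and the global variable are made up for illustration. The child overwrites its copy of a variable and the parent confirms that its own copy is unchanged, which is what we must observe whether the kernel copied the page at fork time or only on the child's first write.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_looking_value = 1;    /* lives on a data page both processes map */

/* Forks a child that overwrites its copy of the variable.
 * Stores the child's view in *child_view and returns the parent's view,
 * or -1 on error. */
int fork_isolation_demo(int *child_view)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* On a copy-on-write kernel, this write is what triggers the copy. */
        shared_looking_value = 42;
        _exit(shared_looking_value);    /* report the child's view via exit status */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *child_view = WEXITSTATUS(status);
    return shared_looking_value;        /* the parent's copy was never touched */
}
```

Calling fork_isolation_demo(&v) should return 1 and leave 42 in v: each process sees only its own writes.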

Now what happens when the A process calls a new function?

  • It gets a page fault because A3 is still marked read only. If we strictly followed our previous instructions, we would allocate a new page, copy the contents of A3 into the new page and map the new page as read-write in place of A3, as shown below.

This seems a little wasteful. How can we know that we don't need to do all that?

  • With a reference count. We keep a reference count on each physical page which is the number of times that page is referenced from a page table and we only need to make a copy if the physical page's reference count is greater than 1. If the physical page's reference count is equal to 1, then we've got the only copy and we can re-map that copy as read-write without violating isolation, because no one else is looking at that copy.

To implement this, we would use something like:

struct page_info{
      int pi_refcount;      // number of page table entries that reference this physical page
};

When you make the child's address space a copy of the parent's, you would increment the pi_refcount field for every mapped physical page owned by A.

How high can pi_refcount get?

  • It can get arbitrarily high if a process forks many times. So it’s important that we keep a reference count here as opposed to some shared bit.


Now, we can completely flesh out our page fault handler and have it look something like this.

page_fault_handler(uintptr va, int cpl, bool write){
      vp = vpage_info(current, va);
      if(vp->copy_on_write && write){
            pte = pgdir_get(current->pg_dir, va);        // look up the page table entry (physical address) for this virtual address
            pn = PAGENUM(pte);                           // the PAGENUM macro shifts right by 12 bits to get the page number
            if(page_info[pn].pi_refcount > 1){
                  page_info[pn].pi_refcount--;           // we are giving up our reference to the shared page
                  pn = page_alloc();                     // allocate a new physical page
                  memcpy_physical(pn * PAGE_SIZE, PTE_ADDR(pte), PAGE_SIZE);  // page number times PAGE_SIZE is the physical address
                  pte = pn * PAGE_SIZE + PTE_P + PTE_U;  // change page table entry to refer to new physical page
            }

            pgdir_set(current->pg_dir, va, pte + PTE_W); // re-map as writable
            run(current);
      }
}

So we improved the performance of fork() with copy-on-write.
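The reference-counting logic can be tried out without a kernel. The following is a toy user-space model, not the course kernel's code: the names toy_proc, toy_pte, toy_fork, toy_write and the tiny page size are all invented for illustration, and the refcount array plays the role of pi_refcount.

```c
#include <string.h>

#define NPHYS  8      /* physical pages in the toy machine */
#define NVIRT  3      /* virtual pages per process: code, data, stack */
#define PAGESZ 16     /* bytes per toy page */

static char phys[NPHYS][PAGESZ];
static int  refcount[NPHYS];               /* plays the role of pi_refcount */

struct toy_pte  { int pn; int writable; int cow; };
struct toy_proc { struct toy_pte pt[NVIRT]; };

static int page_alloc(void)
{
    for (int pn = 0; pn < NPHYS; pn++)
        if (refcount[pn] == 0) { refcount[pn] = 1; return pn; }
    return -1;                             /* out of physical memory */
}

/* Give a fresh process private, writable, zeroed pages. */
int toy_init(struct toy_proc *p)
{
    for (int v = 0; v < NVIRT; v++) {
        int pn = page_alloc();
        if (pn < 0) return -1;
        memset(phys[pn], 0, PAGESZ);
        p->pt[v] = (struct toy_pte){ pn, 1, 0 };
    }
    return 0;
}

/* Copy-on-write fork: share every page and mark both sides read only. */
void toy_fork(struct toy_proc *parent, struct toy_proc *child)
{
    for (int v = 0; v < NVIRT; v++) {
        parent->pt[v].writable = 0;
        parent->pt[v].cow = 1;
        child->pt[v] = parent->pt[v];
        refcount[parent->pt[v].pn]++;
    }
}

/* Write one byte; a write to a read-only COW page runs the "fault handler". */
int toy_write(struct toy_proc *p, int v, int off, char c)
{
    struct toy_pte *pte = &p->pt[v];
    if (!pte->writable) {
        if (!pte->cow) return -1;          /* a genuine protection fault */
        if (refcount[pte->pn] > 1) {       /* still shared: make a private copy */
            int pn = page_alloc();
            if (pn < 0) return -1;
            memcpy(phys[pn], phys[pte->pn], PAGESZ);
            refcount[pte->pn]--;
            pte->pn = pn;
        }                                  /* otherwise sole owner: reuse the page */
        pte->writable = 1;
        pte->cow = 0;
    }
    phys[pte->pn][off] = c;
    return 0;
}

char toy_read(const struct toy_proc *p, int v, int off)
{
    return phys[p->pt[v].pn][off];
}
```

After toy_fork, the first write by the child allocates a new page, while the parent's later write to the same virtual page finds a reference count of 1 and simply re-maps its page as writable.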

What else can you do with Virtual Memory?

You can do Garbage Collection with Virtual Memory.

Stop and Copy Garbage Collector

In a Stop and Copy Garbage Collector you realize that you're out of memory, you stop the program, and you scan over all live (reachable) objects that are on the heap, copying them into a smaller area. Then you get rid of everything that was left over.

Virtual Memory lets you do a Stop and Copy Garbage Collector lazily:

Stop, mark the entire memory space as inaccessible, and then run the program again. As the program accesses addresses, it can only be accessing addresses that were reachable, so you can copy objects a page at a time into the new space.

Be aware that Virtual Memory has been used again and again and again in user programs.

System Calls For Virtual Memory

mmap(int fd, ... );

  • Maps part of the buffer cache into process memory space
  • mmap is like a read without reading. You map an entire file at once into your virtual memory space from the buffer cache.
  • mmap is like demand paging, but for file contents rather than code.
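As a small illustration of "a read without reading", the sketch below writes a string to a temporary file, maps the file read only, and compares the mapped bytes with the original. It assumes a POSIX system with a writable /tmp; the function name mmap_roundtrip and the file name template are made up for this example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the bytes seen through the mapping equal `text`,
 * 0 if they differ, and -1 on error. */
int mmap_roundtrip(const char *text)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";
    size_t len = strlen(text);
    char *p;
    int fd, same;

    if (len == 0)
        return -1;                      /* mmap rejects a zero-length mapping */
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* the file disappears once it is closed */
    if (write(fd, text, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* No read() call: the file contents simply appear in our address space. */
    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping remains valid after close */
    if (p == MAP_FAILED)
        return -1;

    same = (memcmp(p, text, len) == 0);
    munmap(p, len);
    return same;
}
```

For instance, mmap_roundtrip("hello, mmap") should return 1.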

mprotect(...);

  • With mprotect a process can make parts of its own memory space inaccessible, read-only etc.
  • So you can say "I mark this page inaccessible/read-only."

This isn't very useful by itself, but the OS has a way of passing page faults up to a process. If the OS receives a page fault that it does not know how to handle, it passes the page fault to the program.

It does this by sending a Segmentation Violation Signal. This signal is how a page fault looks to the process. Therefore, you can write a signal handler for that signal which attempts to compensate for the fact that your program just dereferenced a NULL pointer.

SIGSEGV;

  • A handler for this signal is a page fault handler (on Unix)

This page fault handler can use mmap or mprotect to make memory appear underneath the program as that program runs. A combination of mmap, mprotect, and SIGSEGV and some other system calls can let you write user space garbage collectors that use the virtual memory system to improve performance and utilization.

Distributed Systems

Why do we have Distributed Systems (DS)? What is a DS?

A DS is simply a collection of computers connected over a network that are trying to accomplish some common goal.

We don’t call my computer and your computer a DS just because they are connected to the same internet but if my computer and your computer are working together to accomplish some function they together become a part of a DS.

What’s cool about DS?

The reason is Bob Metcalfe. Bob invented Ethernet, which is the networking technology that most of your computers use when they're not on wireless.


Bob described the value of networks in terms of something called the Network Effect.

Network Effect

The Network Effect is supposed to hold for any system that connects a bunch of components together, and it states that:

The value of a network system is more than proportional to the number of systems on the network.

An example of more than proportional is: If there are N computers on the network, then the value of the network might be proportional to N squared.
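One common way to make "proportional to N squared" concrete, which is an assumption of this sketch rather than something stated in lecture, is to count the distinct pairs of nodes that can talk to each other: N(N-1)/2. The function name network_pairs is made up for illustration.

```c
/* Number of distinct pairs among n nodes: n(n-1)/2, which grows like n squared. */
unsigned long network_pairs(unsigned long n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}
```

With 1 node there are 0 pairs, with 2 nodes 1 pair, with 3 nodes 3 pairs, and with 1000 nodes 499500 pairs.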

What is the value of a Network?

Let’s say that I have a brand new network called: ButtBook; and let’s also say that I’m the only person on ButtBook:

single_face_of_buttbook.jpg

ButtBook has very little value. I log in and all I find out about is my rear end. If someone else logs into my network and we’re connected:

double_face_of_buttbook.jpg

The value of the network has grown by a factor of two because there is a lot more information that each of us can get. But when we add a third person to the network the total value of the network goes up by two:

triple_face_of_buttbook.jpg

Both I and my friend X get new information from this person Y and as you keep adding nodes to this network each of them can learn about everybody else in the network:

more_faces_of_buttbook.jpg

Eventually it gets to be like the real Facebook. There are so many other people on Facebook that the value of Facebook as a whole is proportional to the square of the number of people on Facebook.

This is sort of the technical justification for the NE that each new person added to the network gets value from everybody already on it. So that’s sort of a high level justification for DS. Let’s go a little bit lower.

What makes a DS so powerful in an OS sense? When you add a network what does that improve?

  • Adding networking to an OS improves versatility because a networked system can do things that a single system cannot.

But there is more. Say you're storing all of the books in your house on a single computer, and then you move to a model where you store some of that data elsewhere. What do you get? What has changed?

  • Robustness! Because the failure of your computer no longer loses all of your data.

DS gives us a particularly hard type of modularity. Soft modularity is basically enforced by contract: the called function agrees not to mess with the caller's stack. This is just a convention. Hard modularity is something that's enforced, e.g. the kernel enforces process isolation. But nothing the kernel can do can enforce power isolation among the different processes on a single computer.

For example, if PG&E doesn't like your house because you stopped paying your bills, all of the processes on your computer are affected at the same time. That is not very good modularity. If you instead place different computers on different continents with different power providers, you have hard modularity.

  • One intuition about DS is that we can improve robustness by improving our modularity: moving functions onto different computers that have different failure modes.
  • Another intuition is that the network gives any single computer access to the pooled power of hundreds of thousands of other computers.

When you go to the Google homepage and enter a search, the results seem to come back instantaneously, but those results were created by the joint work of thousands of computers that touched that data. The network allows you to access Google without having thousands of servers in your house.

What abstractions allow you to build a DS? How do applications use the network?

It turns out that one very fruitful abstraction that has made a lot of sense for DS over the years is Procedure Call.

Procedure Call

Why Procedure Call (PC)?

Most of the forms of modularity that we’ve seen in this class have resembled PCs:

  1. PC within a process.
  2. System calls, which are PCs that cross the boundary between kernel and user.

The basic pattern is that one party makes a request and another party responds.

Andrew Birrell and Bruce Nelson decided that this was the right abstraction for DS and came up with Remote Procedure Call (RPC).

Remote Procedure Call

RPC is simply a PC where the caller and the callee are on different computers separated by a network. RPC is an abstraction that changes a function call into a message exchange.

Here’s how it works. As an example we’ll use web-based e-mail:

If all of your mail were stored on a local computer, then you can imagine a function in your mail program that loads a particular message: given some message ID, this function would return the message to you. On a local computer we would write this something like the following:

msg_t *load_message(int mid){
      look up message ID on disk;
      allocate memory;
      read message;
      return message;
}

What has happened here is that this load message code runs:

load_message_diagram.jpg

The mail reader allocates memory for the message, then reads the message into memory with a read system call. But of course, the caller of the load message function has no idea what happens inside it; because of soft modularity, the load message function is supposed to be a black box. So what remote procedure call does is take the contents of this function and move them onto another computer.

Here is the RPC transformation:

Instead of doing all the work shown above, the load message function now does the following:

msg_t *load_message(int mid){
      create a packet containing "Load Message 'mid'";
      send the packet to another computer, which we call the server;
      receive the response;
      read the message from the response;
      return the message;
}

So we have the same function call with the exact same signature. We can’t tell the first load message function from the second unless we look inside.

But now instead of accessing the disk this creates a message with some “send call” and sends that request “load message” over to another computer. That other computer is where your actual message data is stored. This message goes to another program. This program is the server rather than the reader:

server(){
      read the packet;
      process the packet; // in this case it's a load message packet
      construct the response packet;
      send the response;
}

This is what is going on:

rpc_load_message_diagram.jpg

The mail server program provides the exact same functionality (the load message functionality that was in the old version of the code) and sends the response back.

This is the basic idea behind RPC.
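The transformation can be sketched end to end on one machine. In the following example a socketpair stands in for the network and a forked child stands in for the mail server; these are assumptions made so the code runs anywhere POSIX does. The struct layouts, the extra parameters on load_message (a socket and an output buffer instead of a returned msg_t pointer), and the names serve_one and rpc_demo are all hypothetical. A real RPC system would also marshal data into a portable wire format and loop on short reads.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_MAX 64

struct request  { int32_t mid; };                      /* "Load Message mid" */
struct response { int32_t mid; char body[MSG_MAX]; };

/* Server side: read one packet, process it, construct and send the response. */
static void serve_one(int fd)
{
    struct request req;
    struct response resp;

    if (read(fd, &req, sizeof req) != (ssize_t)sizeof req)
        return;
    memset(&resp, 0, sizeof resp);
    resp.mid = req.mid;
    snprintf(resp.body, sizeof resp.body, "body of message %d", (int)req.mid);
    if (write(fd, &resp, sizeof resp) != (ssize_t)sizeof resp)
        return;
}

/* Client stub: looks like a local call, but sends a packet and waits. */
int load_message(int server_fd, int mid, char *out, size_t outlen)
{
    struct request req = { mid };
    struct response resp;

    if (write(server_fd, &req, sizeof req) != (ssize_t)sizeof req)
        return -1;
    if (read(server_fd, &resp, sizeof resp) != (ssize_t)sizeof resp)
        return -1;                          /* synchronous: we block in read() */
    if (resp.mid != mid)
        return -1;
    resp.body[MSG_MAX - 1] = '\0';          /* never trust the peer's terminator */
    snprintf(out, outlen, "%s", resp.body);
    return 0;
}

/* Runs client and server as two processes joined by a socketpair. */
int rpc_demo(int mid, char *out, size_t outlen)
{
    int sv[2], r;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                         /* child plays the mail server */
        close(sv[0]);
        serve_one(sv[1]);
        _exit(0);
    }
    close(sv[1]);
    r = load_message(sv[0], mid, out, outlen);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return r;
}
```

Calling rpc_demo(7, buf, sizeof buf) should fill buf with "body of message 7"; the caller of load_message never sees that another process produced it.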

Another definition for Distributed System

"A distributed system is one in which the failure of a computer you did not even know existed can render your own computer unusable" (by Leslie Lamport). With this definition, Leslie Lamport implied that:

  1. A distributed system does not really make computers more robust!
  2. Failure becomes common in a large system. The chance that at least one computer in a large system has failed grows quickly with the number of computers: if each of N computers fails independently with probability p, that chance is 1 - (1 - p)^N.

For example, in the Remote Procedure Call above, when a mail reader (client) sends a request packet to a server, the client may never get the response from the server. The client is then blocked forever. This is a problem with Synchronous RPC.

SYNCHRONOUS RPC

  Program blocks until the RPC completes.
  How to make Synchronous RPC more robust:
     -  TIME OUT.
     -  Another way, which is more comprehensive, is Asynchronous RPC (do not block)
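A time out can be added to a blocking receive with poll. The following is a minimal sketch assuming a POSIX system; the name recv_with_timeout and the RPC_TIMEOUT return code are made up for this example.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define RPC_TIMEOUT (-2)

/* Waits up to timeout_ms milliseconds for data on fd.
 * Returns the number of bytes read, 0 on end of file, -1 on error,
 * or RPC_TIMEOUT if nothing arrived in time. */
long recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready < 0)
        return -1;
    if (ready == 0)
        return RPC_TIMEOUT;             /* the server never answered */
    return (long)read(fd, buf, len);    /* data is waiting, so this will not block */
}
```

On RPC_TIMEOUT the client can retry, try another server, or report an error instead of blocking forever.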

ASYNCHRONOUS RPC

  1. Send a message, then continue processing
  2. Process the response separately

What are the advantages of Asynchronous RPC compared to Synchronous RPC?

  1. Improved utilization
  2. More robust

Here is an example of Synchronous RPC vs. Asynchronous RPC:

synchronousrpc.jpg

In Synchronous RPC, the client can only send one request at a time since it must wait for the server's response to each request. So the client is blocked (doing nothing) until it gets the response. This model has low utilization.

pc3.jpg

In Asynchronous RPC, instead of sending one request at a time, the client can send 4 or more requests in parallel and get the responses in any order. So Asynchronous RPC is a type of prefetching strategy. Besides, the mail reader gets control back after sending its requests instead of being blocked waiting for the responses.
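The sketch below illustrates the pattern in a single process: the client sends four requests up front, a stand-in server answers them in reverse order, and the client matches each response to its request by an id field. The socketpair "network", the struct layouts, the names toy_server and async_rpc_demo, and the made-up computation (result is mid times 10) are assumptions for illustration only.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define NREQ 4

struct areq  { int32_t id; int32_t mid; };
struct aresp { int32_t id; int32_t result; };

/* Stand-in server: drains NREQ requests, then answers in reverse order. */
static int toy_server(int fd)
{
    struct areq reqs[NREQ];

    for (int i = 0; i < NREQ; i++)
        if (read(fd, &reqs[i], sizeof reqs[i]) != (ssize_t)sizeof reqs[i])
            return -1;
    for (int i = NREQ - 1; i >= 0; i--) {
        struct aresp r = { reqs[i].id, reqs[i].mid * 10 };
        if (write(fd, &r, sizeof r) != (ssize_t)sizeof r)
            return -1;
    }
    return 0;
}

/* Fills results[id] for every request id; returns 0 on success, -1 on error. */
int async_rpc_demo(int32_t results[NREQ])
{
    int sv[2];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* 1. Send every request without waiting for any response. */
    for (int i = 0; i < NREQ; i++) {
        struct areq q = { i, 100 + i };
        if (write(sv[0], &q, sizeof q) != (ssize_t)sizeof q)
            goto out;
    }

    if (toy_server(sv[1]) < 0)          /* in real life this runs elsewhere */
        goto out;

    /* 2. Process responses separately, in whatever order they arrive. */
    for (int i = 0; i < NREQ; i++) {
        struct aresp r;
        if (read(sv[0], &r, sizeof r) != (ssize_t)sizeof r)
            goto out;
        if (r.id < 0 || r.id >= NREQ)
            goto out;
        results[r.id] = r.result;       /* the id says which request this answers */
    }
    rc = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Because every response carries its request id, the arrival order does not matter: results[0] ends up as 1000 and results[3] as 1030 even though the server answered request 3 first.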

HOW TO BUILD AN RPC SERVER

There are two roles:

  1. Client's role: The client generates an RPC message and waits for the response. The client is said to be ACTIVE.
  2. Server's role: The server waits for RPC requests and then generates responses. The server is said to be PASSIVE.

Here are some of the popular ways to use these roles:

  • Client/Server computing: Exclusive clients and servers

The web is a good example of this type. The web browser is the exclusive client, and the web server is the exclusive server.

  • Proxying: Server for one protocol is client in another.

Example: A proxy acts as both server and client.

pic1.jpg

  • Peer to peer: All clients are also servers.

SIMPLE SERVER OUTLINE

int accept(int fd, struct sockaddr *addr, socklen_t *len) /* fd must be a server file descriptor; the return value is a new fd for a new connection. */


int listen(int fd, int backlog) /* Takes fd and turns it into a server file descriptor waiting for connections. */

 int main(){
     int fd = ...
     listen(fd, 5);
     while(1){
           int cfd = accept(fd, ...); /* accept a file descriptor for a new connection */
           process(cfd); /* process the new file descriptor */
           close(cfd); /* close connection */
     }
 }
 
 process(int cfd){
     char buf[1024];
     int pos = 0;
     int r;
     while(1){
          r = read(cfd, buf + pos, 1024); /* read message from cfd into buffer */
          if (r == 0)
              break;
          pos += r;
     }
     create response;
     write the response to cfd;
 }
What is wrong with this server?
  1. Starvation: This happens when a client connects to the server without sending any data. The server is then blocked forever in read(), and every other client starves.
  2. Buffer overflow: the read() calls inside process() may read more than 1024 chars into buf. Once there are 1024 chars in the buffer, the server keeps writing past the end of the buffer and begins smashing the information on the stack. This can cause a segmentation fault, and an evil client can take over the operating system on the server.
How can an evil client do that?

The evil client may write data into buf that looks like code and arrange for the function to return into the contents of the buffer. Then the attacker can take over the server.
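The overflow itself has a direct fix that is independent of how connections are scheduled: never ask read() for more bytes than the buffer has left. The following is a minimal sketch; the name read_request and its interface are made up for illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads from cfd until end of file or until buf is full.
 * Returns the number of bytes stored (never more than cap), or -1 on error. */
long read_request(int cfd, char *buf, size_t cap)
{
    size_t pos = 0;

    while (pos < cap) {
        /* Ask only for the space that remains, not a fixed 1024. */
        ssize_t r = read(cfd, buf + pos, cap - pos);
        if (r < 0)
            return -1;
        if (r == 0)
            break;                      /* the client closed the connection */
        pos += (size_t)r;
    }
    return (long)pos;
}
```

A request longer than the buffer is truncated here; a real server would more likely reject it or grow the buffer.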

How to solve these problems?
  1. Handle each connection in its own process.
  2. fork before each call to process().

More detail will be continued in the next lecture.

 
notes/lec16.txt · Last modified: 2007/12/05 22:25 by cbx
 