Notes by: Armand Babian, Carlos Benito, Nguyen Hoang
In our previous lecture we talked about how virtual memory supports process isolation and lets the operating system allocate memory in fixed-size blocks (pages) in order to avoid external fragmentation. We also saw other ways virtual memory can improve the computer's utilization, in particular swapping. We ended the lecture with a brief presentation of shared libraries.
Many aspects of shared libraries are supported by the linker, but one important part of shared libraries is implemented by virtual memory, namely the part that ensures that every mapping of the shared library in the system corresponds to one piece of physical memory.
The idea behind shared libraries is as follows: Every program on the system (more or less) uses the C library libc.
For example, say we have two processes A and B, both with code mapped at 0x10000, and that this code begins with the C library libc. We can thus load in libc only once, as follows:
This improves utilization and performance because when B loads it doesn’t have to load in libc itself. It can use the copy that is already mapped in. This is an example of safe sharing without violating process isolation. Processes A and B can share the same piece of physical memory without violating isolation because neither of them can change it due to the library being read only.
What parts of the C library cannot be safely shared?
What other case, particularly in Unix, is going to lead to having multiple processes with roughly the same code?
Forking is a little bit more complex than shared libraries. Shared libraries consist almost entirely of code, so we can map the shared library when the process is created. When a process forks, though, the child process and the parent process have to be isolated. Both processes must be able to write to their own address space.
How can we optimize when fork copies an address space? Let's think about it. What other system calls will usually follow a fork? exec().
Usually after a fork, the child's address space is mostly the same as the parent's up to the exec. It really doesn't make much sense to copy all that address space given that all that copying is going to be useless. So what do we do?
One way to do a fork is Eager Copying, where we copy the address space immediately.
For example, suppose we have a process A which has its own code A1, data A2, and a stack A3 mapped anywhere to physical memory as shown below.
Now suppose process A calls fork and creates a new virtual address space for process B as A's child. In order to create B's address space we go ahead and copy its code, data and stack. So B1 is a copy of A1, B2 is a copy of A2, and B3 is a copy of A3, as shown below.
For eager copying, as soon as we know that we need a copy, we make that copy. Eager copying is simple because we know that the two address spaces are isolated.
In lazy copying, we copy on demand, meaning we delay the copy until it is required.
How can we execute lazy copying? For the example above, what if we started with a completely empty address space for B and on a page fault for some virtual address va we copy va from the parent? Will this work?
What if we temporarily make the entire address space read only?
Let's take a look back to the basics of our page fault handler to refresh our memory:
Now let's do our original forking example, but this time with lazy copying.
We do the fork and B's address space is copied from A's, but this time B's page table marks each of these pages as mapping onto the same pages as in A. At the same time we mark all of A's pages and all of B's pages as read only.
With copy-on-write fork, we make the child's address space a copy of the parent's, then we mark the child's and the parent's address spaces as read only, and then we continue running.
What page might likely first receive a page fault?
What happens when the page fault handler gets called?
Now let's flesh out our page_fault_handler a little to help us out:
page_fault_handler(uintptr va, int cpl, bool write){
    vp = vpage_info(current, va);
    if(vp is copy_on_write){
        copy the page contents into a new page
        mark va as pointing to the new page with pgdir_set(current pgdir, va, new page pte)
    }
}
Now, let's look back to our previous example to see how we can use copy-on-write fork.
If we assume B runs first and that B's stack page is the first to receive a write, here's what will happen:
The process can’t tell the difference between eager copying and lazy copying except for its speed. Now fork runs much faster.
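The isolation that fork promises can be observed from user space, whichever copying strategy the kernel uses. The following is a minimal sketch, assuming a POSIX system; the function name is made up for this illustration. The child overwrites a variable, and the parent checks that its own copy is untouched.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's view of `value` after a forked child has
 * overwritten its own copy. Returns -1 if fork or waitpid fails.
 * (Hypothetical helper, written only for this illustration.) */
int parent_value_after_child_write(int initial, int child_writes)
{
    volatile int value = initial;   /* lives on the stack, which fork duplicates */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = child_writes;       /* touches only the child's copy */
        _exit(value == child_writes ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    return value;                   /* still `initial` in the parent */
}
```

Under copy-on-write, the child's store is what triggers the copy of the stack page; the parent keeps the original page and therefore still reads the initial value.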
Now what happens when the A process calls a new function?
This seems a little wasteful. How can we know that we don't need to do all that?
To implement this, we would use something like:
struct page_info{
    int pi_refcount; // number of address spaces that map this physical page
};
When you make the child's address space a copy of the parent's, you would increment the pi_refcount field for every mapped physical page owned by A.
How high can pi_refcount get?
Now, we can completely flesh out our page fault handler and have it look something like this.
page_fault_handler(uintptr va, int cpl, bool write){
    vp = vpage_info(current, va);
    if(vp->copy_on_write && write){
        pte = pgdir_get(current->pg_dir, va); // look up the page table entry for this virtual address
        pn = PAGENUM(pte); // physical page number: the PAGENUM macro shifts right by 12 bits
        if(page_info[pn].pi_refcount > 1){ // the page is still shared, so we need a private copy
            page_info[pn].pi_refcount--; // this process gives up its reference to the shared page
            pn = page_alloc(); // allocate a new physical page
            memcpy_physical(pn * PAGE_SIZE, PTE_ADDR(pte), PAGE_SIZE); // page number * PAGE_SIZE = physical address
            pte = (pn * PAGE_SIZE) | PTE_P | PTE_U; // page table entry now refers to the new physical page
        }
        // if the refcount was 1, we are the only user left and can simply make the page writable
        pgdir_set(current->pg_dir, va, pte | PTE_W); // re-map with write permission
        run(current); // restart the faulting instruction
    }
}
So we improved the performance of fork() by using copy-on-write.
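The reference-counting logic can be tried out in user space without a kernel. The sketch below is a simulation under stated assumptions: "physical memory" is a small array, a page table is an array of page numbers, and a write to a read-only page calls the fault path directly. All names other than the idea of a per-page reference count are invented for this example.

```c
#include <assert.h>
#include <string.h>

#define NPAGES  8    /* simulated physical pages */
#define NVPAGES 2    /* virtual pages per address space */
#define PAGESZ  16   /* tiny pages keep the example small */

char phys[NPAGES][PAGESZ];
int  refcount[NPAGES];          /* plays the role of pi_refcount */

struct aspace {
    int pn[NVPAGES];            /* virtual page -> physical page number */
    int writable[NVPAGES];      /* 0 means a write must "fault" first */
};

int page_alloc(void)
{
    for (int pn = 0; pn < NPAGES; pn++)
        if (refcount[pn] == 0) {
            refcount[pn] = 1;
            return pn;
        }
    return -1;                  /* out of simulated memory */
}

void as_init(struct aspace *as)
{
    for (int vp = 0; vp < NVPAGES; vp++) {
        as->pn[vp] = page_alloc();
        assert(as->pn[vp] >= 0);
        as->writable[vp] = 1;
    }
}

/* Copy-on-write fork: share every page and mark both sides read-only. */
void cow_fork(struct aspace *parent, struct aspace *child)
{
    for (int vp = 0; vp < NVPAGES; vp++) {
        child->pn[vp] = parent->pn[vp];
        refcount[parent->pn[vp]]++;
        parent->writable[vp] = child->writable[vp] = 0;
    }
}

/* Store one byte; the first store to a read-only page takes the fault path. */
void cow_write(struct aspace *as, int vp, int off, char c)
{
    if (!as->writable[vp]) {
        int pn = as->pn[vp];
        if (refcount[pn] > 1) {             /* still shared: make a private copy */
            refcount[pn]--;
            int fresh = page_alloc();
            assert(fresh >= 0);
            memcpy(phys[fresh], phys[pn], PAGESZ);
            as->pn[vp] = fresh;
        }                                   /* sole owner: no copy needed */
        as->writable[vp] = 1;
    }
    phys[as->pn[vp]][off] = c;
}
```

After `cow_fork`, the first write by either side copies only the page that was written; pages that are never written stay shared, and the last remaining owner of a page gets write access back without any copy.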
What else can you do with Virtual Memory?
You can do Garbage Collection with Virtual Memory.
In a Stop and Copy Garbage Collector, when you realize that you're out of memory, you stop the program and scan over all reachable objects on the heap, copying them into a new, smaller area. Then you get rid of everything that was left over.
Virtual Memory lets you do a Stop and Copy Garbage Collector lazily:
Stop, mark the entire memory space as inaccessible, and then run the program again. As the program accesses addresses, it can only be accessing addresses that were reachable, so you can copy objects a page at a time into the new space.
Be aware that Virtual Memory has been used again and again and again in user programs.
mmap(int fd, ... );
mprotect(...);
This isn't very useful by itself, but the OS has a way of passing page faults up to a process. If the OS receives a page fault that it does not know how to handle, it passes the page fault to the program.
It does this by sending a Segmentation Violation signal. This signal is how the page fault is reported to the process. Therefore, you can write a signal handler for that signal which attempts to compensate for the fact that your program just dereferenced, say, a NULL pointer.
SIGSEGV;
This page fault handler can use mmap or mprotect to make memory appear underneath the program as that program runs. A combination of mmap, mprotect, and SIGSEGV and some other system calls can let you write user space garbage collectors that use the virtual memory system to improve performance and utilization.
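Here is a minimal sketch of that combination, assuming Linux (other systems may report the fault as SIGBUS, and calling mprotect from a signal handler is common practice rather than something POSIX guarantees). The names `lazy_page_demo` and `on_segv` are made up for this example. A page is mapped with no access rights, the first touch faults, and the handler makes the page usable so the faulting instruction can be retried.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static char *lazy_page;                     /* page that starts out inaccessible */
static size_t lazy_size;
static volatile sig_atomic_t fault_count;

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr;     /* the address that faulted */
    if (addr < lazy_page || addr >= lazy_page + lazy_size)
        _exit(1);                           /* a genuine bug: give up */
    fault_count++;
    /* Make the page usable; the faulting instruction is retried on return. */
    if (mprotect(lazy_page, lazy_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
}

/* Returns the number of faults taken while touching the page, or -1 on error. */
int lazy_page_demo(void)
{
    lazy_size = (size_t)sysconf(_SC_PAGESIZE);
    assert(lazy_size > 0);
    lazy_page = mmap(NULL, lazy_size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (lazy_page == MAP_FAILED)
        return -1;

    struct sigaction sa = {0};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    fault_count = 0;
    volatile char *p = lazy_page;
    p[0] = 'x';                 /* first touch faults; the handler fixes it up */
    p[1] = 'y';                 /* no fault: the page is now read/write */
    int ok = (p[0] == 'x' && p[1] == 'y');

    signal(SIGSEGV, SIG_DFL);
    munmap(lazy_page, lazy_size);
    return ok ? (int)fault_count : -1;
}
```

A user-space garbage collector would do more work in the handler (for example, copying the objects on that page) before unprotecting it, but the control flow is the same.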
Why do we have Distributed Systems (DS)? What is a DS?
A DS is simply a collection of computers connected over a network that are trying to accomplish some common goal.
We don’t call my computer and your computer a DS just because they are connected to the same internet. But if my computer and your computer are working together to accomplish some function, they become part of a DS.
What’s cool about DS?
The reason is Bob Metcalfe. Bob invented Ethernet, which is the networking technology that most of your computers use when they’re not on wireless.
Bob described the value of networks in terms of something called the Network Effect.
The Network Effect is supposed to hold for any system that connects a bunch of components together, and it states that:
The value of a network system is more than proportional to the number of systems on the network.
An example of more than proportional is: If there are N computers on the network, then the value of the network might be proportional to N squared.
What is the value of a Network?
Let’s say that I have a brand new network called: ButtBook; and let’s also say that I’m the only person on ButtBook:
ButtBook has very little value. I log in and all I find out about is my rear end. If someone else logs into my network and we’re connected:
The value of the network has grown by a factor of two because there is a lot more information that each of us can get. But when we add a third person to the network the total value of the network goes up by two:
Both I and my friend X get new information from this person Y and as you keep adding nodes to this network each of them can learn about everybody else in the network:
Eventually it gets to be like the real Facebook. There are so many other people on Facebook that the value of Facebook as a whole is proportional to the square of the number of people on it.
This is sort of the technical justification for the Network Effect (NE): each new person added to the network gets value from everybody already on it. So that’s a high-level justification for DS. Let’s go a little bit lower.
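One common way to make the "square" concrete, assuming that value is counted as the number of distinct pairs of members who can exchange information, is the following. The symbol C(N) is introduced here only for illustration.

```latex
\[
  C(N) = \binom{N}{2} = \frac{N(N-1)}{2} \approx \frac{N^2}{2}
  \quad\text{for large } N,
  \qquad
  C(N) - C(N-1) = N - 1 .
\]
```

So C(1) = 0, C(2) = 1, and C(3) = 3: the third member adds two new connections, and in general the N-th member adds N - 1 of them, which is why the total grows quadratically rather than linearly.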
What makes a DS so powerful in an OS sense? When you add a network what does that improve?
But even if… Say you’re storing all of the books in your house on a single computer, and then you move to a model where you store some of that data elsewhere. What do you get? What has changed?
DS gives us a particularly Hard Type of Modularity. Soft Modularity is basically enforced by contract: for example, the called function agrees not to mess with the caller’s stack. This is just a Convention. Hard Modularity is something that’s enforced, e.g. the kernel enforces process isolation. But nothing the kernel can do will enforce power isolation among the different processes on a single computer.
For example, if PG&E doesn’t like your house because you stopped paying your bills, all of the processes on your computer are affected at the same time. That is not very good modularity. If you instead put different computers on different continents with different power providers, you have Hard Modularity.
When you go to the Google homepage and enter a search, the results seem to come back instantaneously, but those results were produced by the joint work of thousands of computers that touched that data. The network allows you to access Google without having thousands of servers in your house.
What abstractions allow you to build a DS? How do applications use the network?
It turns out that one very fruitful abstraction that has made a lot of sense for DS over the years is Procedure Call.
Why Procedure Call (PC)?
Most of the forms of modularity that we’ve seen in this class have resembled PCs.
The basic pattern is that one party makes a request and another party responds.
Andrew Birrell and Bruce Nelson decided that this was the right abstraction for DS and came up with Remote Procedure Call (RPC).
RPC is simply a PC where the caller and the callee are on different computers separated by a network. RPC is an Abstraction that changes a function call into a message exchange.
Here’s how it works. As an example we’ll use web-based e-mail:
If all of your mail were stored on a local computer, then you can imagine a function in your mail program that loads a particular message: given some message ID, it returns the message to you. On a local computer we would write this as something like the following:
msg_t *load_message(int mid){
    Look up message ID on disk;
    Allocate memory;
    Read message;
    Return message;
}
What has happened here is that this load message code runs:
The mail reader allocates memory for the message, then reads the message into memory with a read system call. But of course, the caller of the load message function has no idea what happens inside it. Because of soft modularity, the load message function is supposed to be a black box. So what remote procedure call does is take the contents of this function and move them onto another computer.
Here is the RPC transformation:
The load message function instead of doing all the stuff it does above it does the following:
msg_t *load_message(int mid){
    Create a packet containing “Load Message ‘mid’”;
    Send the packet to another computer, which we call the server;
    Receive the response;
    Read the message from the response;
    Return the message;
}
So we have the same function call with the exact same signature. A caller can’t tell the first load message function from the second without looking inside.
But now instead of accessing the disk this creates a message with some “send call” and sends that request “load message” over to another computer. That other computer is where your actual message data is stored. This message goes to another program. This program is the server rather than the reader:
server( ){
    Read the packet;
    Process the packet; // In this case it’s a load message packet
    Construct the response packet;
    Send the response;
}
This is what is going on:
The mail server program performs exactly the same work (the load message functionality that was in the old version of the code) and sends the response back.
This is the basic idea behind RPC.
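The transformation can be sketched end to end on one machine. The example below is an illustration under stated assumptions, not the lecture's code: a Unix socket pair stands in for the network, a forked child plays the server, and the packet layouts (`struct request`, `struct response`), the opcode, and the extra `fd`/`out` parameters on `load_message` are all invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define OP_LOAD_MESSAGE 1
#define MSG_MAX 64

struct request  { uint32_t op; uint32_t mid; };
struct response { uint32_t ok; char body[MSG_MAX]; };

/* Server side: the work that used to live inside load_message. */
static void serve_one(int fd)
{
    struct request req;
    struct response resp;
    memset(&resp, 0, sizeof resp);
    if (recv(fd, &req, sizeof req, MSG_WAITALL) != (ssize_t)sizeof req)
        return;
    if (req.op == OP_LOAD_MESSAGE) {        /* pretend to read it from disk */
        resp.ok = 1;
        snprintf(resp.body, sizeof resp.body, "message #%u", (unsigned)req.mid);
    }
    send(fd, &resp, sizeof resp, 0);
}

/* Client stub: looks like a local call, but exchanges packets instead. */
int load_message(int fd, int mid, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    struct request req = { OP_LOAD_MESSAGE, (uint32_t)mid };
    struct response resp;
    if (send(fd, &req, sizeof req, 0) != (ssize_t)sizeof req)
        return -1;
    if (recv(fd, &resp, sizeof resp, MSG_WAITALL) != (ssize_t)sizeof resp || !resp.ok)
        return -1;
    resp.body[MSG_MAX - 1] = '\0';          /* never trust the peer's terminator */
    snprintf(out, outlen, "%s", resp.body);
    return 0;
}

/* Forks a one-shot "server" and performs one RPC against it. */
int rpc_demo(int mid, char *out, size_t outlen)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                         /* child: the server */
        close(sv[0]);
        serve_one(sv[1]);
        _exit(0);
    }
    close(sv[1]);                           /* parent: the mail reader */
    int r = load_message(sv[0], mid, out, outlen);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return r;
}
```

A real RPC system would also fix the byte order and layout of the packets so that client and server can run on different architectures; this sketch skips that because both ends are the same program.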
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport). With this definition, Leslie Lamport implied that:
For example, in Remote Procedure Call, when a mail reader (the client) sends a request packet to a server, the client may never get a response from the server. The client is then blocked forever. This is the weakness of Synchronous RPC:
The program blocks until the RPC completes.
How to make Synchronous RPC more robust:
- Time out.
- Another, more comprehensive way is Asynchronous RPC (do not block).
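The timeout option can be sketched with the POSIX `poll` call: wait for the reply for a bounded time and give up if nothing arrives. The helper name and its return convention are assumptions made for this example.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Waits up to timeout_ms for a reply on fd.
 * Returns the number of bytes read (> 0), 0 on timeout,
 * or -1 on error or if the peer closed the connection.
 * (Hypothetical helper, written only for this illustration.) */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL && len > 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;              /* poll itself failed */
    if (ready == 0)
        return 0;               /* nothing arrived in time */

    ssize_t n = recv(fd, buf, len, 0);
    return n > 0 ? n : -1;      /* treat EOF like an error */
}
```

On a timeout the caller still has to decide what to do (retry, report an error, try another server), and it cannot tell whether the server never saw the request or only the reply was lost.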
What are the advantages of Asynchronous RPC compared to Synchronous RPC?
Here is an example of Synchronous RPC vs. Asynchronous RPC:
In Synchronous RPC, the client can only send one request at a time, since it must wait for the server's response to each request. The client is blocked (doing nothing) until it gets the response, so this model has low utilization.
In Asynchronous RPC, instead of sending one request at a time, the client can send 4 or more requests in parallel and get the responses in any order. So Asynchronous RPC is a type of prefetching strategy. Besides, the mail reader gets control back after sending its requests instead of being blocked waiting for the responses.
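The difference shows up in the order of sends and receives. In the sketch below, which uses assumed names and a toy "double this number" service over a Unix socket pair, the client issues all of its requests before reading any reply, and tags each request with an id so that replies could be matched even if they came back out of order.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct packet { uint32_t id; uint32_t value; };   /* invented wire format */

/* Toy server: answers each request with value * 2, tagged with the same id. */
static void serve(int fd)
{
    struct packet p;
    while (recv(fd, &p, sizeof p, MSG_WAITALL) == (ssize_t)sizeof p) {
        p.value *= 2;
        if (send(fd, &p, sizeof p, 0) != (ssize_t)sizeof p)
            break;
    }
}

/* Sends n requests back to back, then collects n replies.
 * Replies are stored by id, so arrival order does not matter.
 * Intended for small n: a large burst could fill the socket buffers. */
int async_double(const uint32_t *in, uint32_t *out, uint32_t n)
{
    assert(in != NULL && out != NULL);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(sv[0]);
        serve(sv[1]);
        _exit(0);
    }
    close(sv[1]);

    int rc = 0;
    for (uint32_t i = 0; i < n; i++) {              /* issue everything first */
        struct packet p = { i, in[i] };
        if (send(sv[0], &p, sizeof p, 0) != (ssize_t)sizeof p)
            rc = -1;
    }
    for (uint32_t i = 0; i < n && rc == 0; i++) {   /* then gather the replies */
        struct packet p;
        if (recv(sv[0], &p, sizeof p, MSG_WAITALL) != (ssize_t)sizeof p || p.id >= n)
            rc = -1;
        else
            out[p.id] = p.value;
    }
    close(sv[0]);                                   /* EOF ends the server loop */
    waitpid(pid, NULL, 0);
    return rc;
}
```

With a real network between the two sides, the requests overlap in flight, so the total wait is roughly one round trip plus the server's work rather than one round trip per request.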
There are two roles:
Here are some of the popular ways to use these roles :
The web is a good example of this type. The web browser is the exclusive client, and the web server is the exclusive server.
Example: A proxy acts as server and client.
int accept(int fd, struct sockaddr *addr, socklen_t *len); /* fd must be a server file descriptor; the return value is a new fd for a new connection */
int listen(int fd, int backlog); /* takes fd and turns it into a server file descriptor waiting for connections */
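Before listen can be called, fd has to be a socket bound to an address. The following is a minimal sketch of that setup using the standard BSD socket calls, assuming TCP on the loopback interface; the helper name `make_listener` is made up for this example.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP socket bound to 127.0.0.1:port and puts it in the
 * listening state. Port 0 asks the kernel to pick a free port.
 * Returns the server file descriptor, or -1 on failure.
 * (Hypothetical helper, written only for this illustration.) */
int make_listener(uint16_t port, int backlog)
{
    assert(backlog > 0);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;    /* allow quick restarts on the same port */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, backlog) != 0) {
        close(fd);
        return -1;
    }
    return fd;      /* ready to be passed to accept */
}
```

The backlog argument bounds how many not-yet-accepted connections the kernel will queue for this socket.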
int main(){
    int fd = ...
    listen(fd, 5);
    while(1){
        int cfd = accept(fd, ...); /* accept a file descriptor for a new connection */
        process(cfd); /* process the new file descriptor */
        close(cfd); /* close connection */
    }
}
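process(int cfd){
    char buf[1024];
    int pos = 0; /* number of bytes read so far */
    int r;
    while(1){
        r = read(cfd, buf + pos, 1024); /* read message from cfd into buffer; nothing checks that pos + 1024 still fits in buf */
        if (r <= 0) /* end of input or error */
            break;
        pos += r;
    }
    create response;
    write the response to cfd;
}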
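An evil client may send more data than buf can hold. The extra bytes overflow buf and can overwrite the return address on the stack, and the data can be made to look like code, so that process returns into the contents of the buffer. The attacker can then take over the server.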
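One way to close this particular hole, shown here as a minimal sketch with an invented helper name, is to never ask read for more bytes than the space that remains in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads from fd until EOF or until buf is full, never writing past
 * buf[cap - 1]. Returns the number of bytes stored, or -1 on error.
 * (Hypothetical helper, written only for this illustration.) */
ssize_t read_request(int fd, char *buf, size_t cap)
{
    assert(buf != NULL);
    size_t pos = 0;
    while (pos < cap) {
        ssize_t r = read(fd, buf + pos, cap - pos);  /* only the space left */
        if (r < 0)
            return -1;
        if (r == 0)
            break;              /* peer closed the connection */
        pos += (size_t)r;
    }
    return (ssize_t)pos;
}
```

A request longer than the buffer is truncated rather than allowed to spill onto the stack; a real server would probably reject such a request outright.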
More detail will be continued in the next lecture.