====== CS111 Lecture 17: Distributed Systems II ======
//Anthony Urso, Tingyu Thomas Lin//
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."
-- Leslie Lamport
In distributed systems, we would like to get the **Network Effect**, where the value of the system increases with the number of computers that are in the system. But if Lamport's view of distributed systems is true, the chance of your computer breaking increases as more computers join the network. Needless to say, this is highly undesirable.
So how do we avoid Lamport's view and have the Network Effect instead? The answer is **modularity**. Lamport's view is a result of **bad modularity**. Good modularity will, among other things, isolate the failure of one module from other modules. In the case of a network, a module can be thought of as a computer, and the failure of any one computer should not cause other computers in the network to fail.
Before we move on to more about networked systems, we'll first talk a bit about system calls needed for networking.
===== System Calls =====
So what system calls/OS interface should one use to build client/server systems? That is, how do the client and server send and receive data?
**Using a file descriptor abstraction**
We abstract sending and receiving on the network as reading and writing to a network file descriptor. This is a stream abstraction, as we can't lseek back in time to re-receive items that have already come across the network.
**Network file descriptors**
Several system calls are necessary to run a networked program, and they differ depending on whether the program is a server or a client. They both use the following system calls:
* **socket()** - creates a network file descriptor
* **bind()** - takes the network file descriptor and maps it to a local address and port (optional on the client, where the OS picks a port if bind() is skipped)
* **close()** - closes the connection
A client has the additional system call:
* **connect()** - connects to a server
and a server has the additional system calls:
* **listen()** - marks the bound socket as one that listens for incoming **connect()** requests from clients
* **accept()** - sets up the connection to a client
==== Client/Server shared System Calls ====
**int socket(int domain, int type, int protocol);**
//Creates a network file descriptor//.
**int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);**
//Attach the socket, created by the // **socket()**// call, to the address specified by// my_addr. //In other words, it's what tells the OS that we want the sockfd to be, say,// 168.192.0.107:1337.
==== Client Side ====
**int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);**
//Connects the socket// sockfd //to the network address specified in// serv_addr. //So if the server is at 168.192.0.225:80, this is what we use to connect to that server address//.
==== Server Side ====
**int listen(int sockfd, int backlog);**
//Listen for connections on the socket specified by// sockfd; backlog //limits how many pending connections the OS will queue//.
**int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);**
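//Accept a connection on the listening socket// sockfd, //which was previously passed to listen(...). Returns a new file descriptor for the accepted connection; the listening socket stays open for further calls//.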
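To make the calls above concrete, here is a minimal sketch of both sides, assuming IPv4 over the loopback interface. The helper names (loopback_addr, make_listener, bound_port, connect_loopback) and the backlog of 16 are made up for illustration; only the system calls themselves come from the notes. The client skips bind() and lets the OS choose its local port, which is the common case.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in an IPv4 address for 127.0.0.1:port (port in host byte order). */
static struct sockaddr_in loopback_addr(unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    return addr;
}

/* Server side: socket() + bind() + listen(). Returns the listening
 * descriptor, or -1 on error. Port 0 asks the OS to pick a free port. */
int make_listener(unsigned short port)
{
    struct sockaddr_in addr = loopback_addr(port);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Report which port a bound socket ended up on, or -1 on error. */
int bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}

/* Client side: socket() + connect(). Returns the connected descriptor,
 * or -1 on error. */
int connect_loopback(unsigned short port)
{
    struct sockaddr_in addr = loopback_addr(port);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would call make_listener() once and then accept() on the returned descriptor; each accept() hands back a separate descriptor for one client, which is then used with read(), write(), and close().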
===== Designing a Server =====
==== Serial Programming ====
  // serial server
  // - it can only service one client at a time
  int main()
  {
      // setup the listening socket
      int listener = socket(...);
      bind(listener, ...);
      listen(listener, ...);
      // we sit in this loop forever to service client requests
      while (1) {
          // accept the new connection from a client.
          // If there isn't a connection yet, accept() blocks until one comes.
          int connfd = accept(listener, ...);
          // read the request
          char buf[...];
          read(connfd, buf, ...);
          // process request code here
          // send reply to client
          write(connfd, ...);
          close(connfd);
      }
  }
=== Weaknesses ===
This code has several robustness problems. One is that a client may connect to the server and then never send a request, in which case the server blocks forever on the read call. Similarly, the client may never read the reply the server sends, which can cause the server to block forever on the write call. In both cases, the server can no longer service new requests. This leaves the server vulnerable to a **Denial of Service attack**.
A Denial of Service attack is where a server stops providing service to legitimate users because it is being monopolized by an attacker. In this case, the attack could consist of opening a connection to the server and refusing to send data. The connection would stall in the read and prevent other clients from utilizing the server.
=== Robustness Strategy ===
A simple solution to this problem is to use timeouts for connections. This can be accomplished with the **alarm()** system call.
**unsigned int alarm(unsigned int nsecs);**
//Set an alarm clock to signal with SIGALRM after// nsecs //seconds//.
A signal handler is also needed to handle the alarm signal. There are various ways to handle the signal, but one viable way is to simply close the connection. Delivery of the signal interrupts the blocked read or write call, which returns an error (errno is EINTR when the handler is installed without SA_RESTART), and with the connection closed the server can continue on its merry way to service the next client. This solution not only solves the DoS attack problem but also handles non-malicious problems like lost connections and network trouble.
==== Improving Server's Utilization ====
Beyond robustness, the server may have problems with performance. The limitation of the server is that it can only service one client at a time. This may not be a problem if the time it takes to process a request and the time it takes to receive and send messages across the network are small. However, the Internet has large latencies that can be around 100ms or greater. At a clock rate of 1 GHz, 100ms is 100 million CPU cycles, all of them spent waiting for the request to arrive from the client or for the reply to be fully received by the client. As a result, this server may suffer from low utilization.
The solution to this problem is to instead service several requests in parallel. The time spent waiting for one client can be spent servicing another client. There are several solutions, and the professional community still debates which is best. The solutions are: multiple processes, multiple threads, and event-driven programming.
=== Multiple Processes ===
Instead of having one server process servicing one client at a time, we have many processes servicing many clients at a time. There are two approaches to this: pre-forking and process per connection.
==Pre-forking==
The basic idea is to essentially have several serial servers running in parallel. The number of simultaneous clients that can be serviced will be the number of server processes running in parallel. Apache does this by default.
  // Pre-forked server
  // - This can handle two clients at a time.
  int main()
  {
      int listener = socket(...);
      bind(listener, ...);
      listen(listener, ...);
      // spawn a second server. More processes can be created
      // to handle more simultaneous connections.
      fork();
      while (1) {
          int connfd = accept(listener, ...);
          char buf[...];
          read(connfd, buf, ...);
          // process request code here
          write(connfd, ...);
          close(connfd);
      }
  }
==Process Per Connection==
In process per connection, requests are serviced in child processes, while the parent process sits in a loop, creating a child process for each request. This scheme can service a large number of simultaneous clients.
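  // Process per connection server
  // - This can handle many clients in parallel.
  int main()
  {
      int listener = socket(...);
      bind(listener, ...);
      listen(listener, ...);
      while (1) {
          int connfd = accept(listener, ...);
          pid_t pid = fork();
          if (pid == 0) {
              // child: service this one request, then exit
              char buf[...];
              read(connfd, buf, ...);
              // process the request
              write(connfd, ...);
              close(connfd);
              exit(0);
          } else {
              // parent (or fork() failed and returned -1): drop our copy of
              // the descriptor and go back to accept() the next client.
              // Exited children must also be reaped, e.g. with waitpid(),
              // or they linger as zombies.
              close(connfd);
          }
      }
  }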
== Weaknesses and Robustness Strategies ==
** Denial of Service **
The process per connection server is open to a denial of service attack. An attacker can flood the server with connection requests. This will cause the server to spawn a large number of processes, which will fill up the server's memory.
Threads can be used instead of processes to lighten the memory use.
** Synchronization of Shared State **
The downside of using multiple processes is that, as with any multi-process program, synchronization of shared state like log files, databases, caches, etc., becomes a concern. Synchronization mechanisms need to be built into the server. There are several ways of achieving this, including the use of pipes and signals.
=== Multiple Threads ===
Using multiple threads is similar to the process per connection server. Instead of spawning processes, the server creates new threads to service requests.
**int pthread_create(pthread_t *restrict thread, const pthread_attr_t *restrict attr, void *(*start_routine)(void*), void *restrict arg);**
//Create a new thread//.
**void pthread_exit(void *value_ptr);**
//Causes a thread to exit//.
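  // thread per connection, or threading server
  void *serve_connection(void *arg)
  {
      // arg points to a heap-allocated int holding this thread's descriptor
      int connfd = *(int *)arg;
      free(arg);
      char buf[...];
      read(connfd, buf, ...);
      // process the request
      write(connfd, ...);
      close(connfd);
      pthread_exit(...);
  }
  int main()
  {
      int listener = socket(...);
      bind(listener, ...);
      listen(listener, ...);
      while (1) {
          pthread_t thread;
          // give each thread its own copy of the descriptor; passing the
          // address of a loop variable would race with the next accept()
          int *connfd = malloc(sizeof *connfd);
          *connfd = accept(listener, ...);
          pthread_create(&thread, NULL, &serve_connection, connfd);
          // detach so the thread's resources are freed when it exits
          pthread_detach(thread);
      }
  }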
== Weaknesses and Robustness Strategies ==
** Denial of Service **
This server is prone to the same denial of service attack as the process per connection server; an attacker merely needs to open a very large number of connections to the server to exhaust its memory or file descriptors. Threads do, however, take less memory than processes, so a threaded server can handle more connections before breaking under the load.
** Synchronization **
As with the multi-process server, synchronization of shared state between threads becomes a concern; access to shared resources needs to be serialized. This can be achieved with synchronization mechanisms like a pthread mutex.
**int pthread_mutex_init(pthread_mutex_t *restrict mutex, const pthread_mutexattr_t *restrict attr);**
//Initialize a pthread mutex//.
**int pthread_mutex_lock(pthread_mutex_t *mutex);**
**int pthread_mutex_unlock(pthread_mutex_t *mutex);**
//Lock or unlock a pthread mutex//.
These calls can be used to acquire a mutex prior to entering a critical section of code or accessing a shared resource.
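As a rough illustration of a critical section, the sketch below has several threads bump a shared counter under a mutex. The counter, the worker function, and the 64-thread cap are invented for this example, and the mutex is set up with the static initializer PTHREAD_MUTEX_INITIALIZER rather than pthread_mutex_init(); both give a mutex with default attributes.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: a request counter updated by every worker. */
static long requests_served = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker pretends to serve *(int *)arg requests. */
static void *worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);
        requests_served++;                 /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run nthreads workers (at most 64) and return the final count,
 * or -1 if the arguments are out of range or a thread fails to start. */
long run_workers(int nthreads, int per_thread)
{
    pthread_t tids[64];
    if (nthreads < 0 || nthreads > 64)
        return -1;

    requests_served = 0;
    int started = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &per_thread) != 0)
            break;

    /* per_thread stays valid because every started thread is joined here. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == nthreads ? requests_served : -1;
}
```

Without the lock and unlock calls, two threads could read the same old value of requests_served and both write back the same new value, losing an update.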
=== Event-driven Programming ===
In event-driven programming, the program will never block (with the exception of select()). A single thread of control handles every connection, which eliminates the synchronization issues involved with multiple processes and threads. System calls like read() do not block; they return an error if they would have blocked. The only blocking call of event-driven programming is select():
**int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);**
//Block until an element in the file descriptor sets// readfds, writefds, //or// exceptfds //is ready, or until the interval given by// timeout //has elapsed//.
  // non-blocking or event-based server
  int main()
  {
      int listener = socket(...);
      bind(listener, ...);
      listen(listener, ...);
      client_t *clients;
      while (1) {
          // listener is non-blocking, so accept() returns an error
          // instead of waiting when no client is pending
          int fd = accept(...);
          if (fd >= 0) {
              // add client to clients
          }
          for (client c in clients) {
              // try to read request
              // try to write reply
              // close connection if complete
          }
          // use select to block until any of the following conditions hold:
          // listener has new connection
          // any fd has data to read
          // any fd has room to write
      }
  }
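The two building blocks of the loop above, non-blocking descriptors and select(), can be sketched as follows. The helper names set_nonblocking and wait_readable are made up here, and a real server would pass select() all of its descriptors at once rather than a single one.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Block in select() until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The first argument is the highest descriptor number plus one. */
    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    return (ready > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

On a non-blocking descriptor with nothing to read, read() returns -1 with errno set to EAGAIN (or EWOULDBLOCK), which is the "error if they would have blocked" behavior described above.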
== Weaknesses ==
The event-driven paradigm is difficult to code for.
-----
===== Network File Systems =====
We would like to allow computers to access data stored elsewhere using the file system interface. The goal is to make distributed systems look like disks, which we already understand. So how do we implement this?
==== NFS ====
{{1b.jpg|:notes:1b.jpg}}
//An illustration of the NFS client and server and their relationship to the VFS layer and kernel//.
//Illustrations by Subhash Arja, Matthew Ho, Grant Jenks, Samuel Kwok//
This is a diagram of how the Network File System (NFS) works. On the client side, the NFS module plugs into the Virtual File System (VFS) layer, like any other file system. The job of NFS is to convert system calls like read() and write() into **Remote Procedure Calls** (RPCs) and then to send the RPCs to the server that is hosting the files. On the server side, NFS sits in user space. The job of NFS at the server is to take the RPCs that clients send and convert them into system calls that read and write files on the server's storage disk.
For example, say that the client wants to read a file. The client uses the system call read(). The read gets passed to the kernel VFS layer. VFS hands it off to the NFS stub, which converts the read to an RPC. The RPC is sent to the NFS server. The server reads the file off its storage disk, packages the data, and sends it across the network back to the client. The NFS stub at the client receives the server's reply and passes it back up through VFS to the program running in user space.
==== Remote Procedure Calls ====
There are two variants of RPC: synchronous and asynchronous.
=== Synchronous RPC ===
Synchronous RPCs work like this: the client sends a single message to the server. The server responds to the client. When the client receives the response, the client sends the next message and then waits again for the server to respond.
Let's say the RPC is a read RPC, and has two parameters:
* File ID: the ID of the file to read
* Offset: the offset into the file of the byte to read
The reply from the server is either the byte read or an error.
So to read a file, the program will request the first byte. After the server replies with the byte, the program requests the second byte, waits for the server to send the second byte, and so on and so forth.
{{2b.jpg|:notes:2b.jpg}}
//A time plot of the "request one byte and wait for a reply" synchronous network filesystem//.
This has very low utilization. Most of the time is spent waiting for the RPC to reach the server and for the server's reply to come back. It would be nice to be able to send and receive messages while waiting. This is exactly what asynchronous RPC does.
=== Asynchronous RPC ===
Instead of sending a single RPC and waiting for the response, the client sends several RPCs back-to-back.
{{5.jpg|:notes:5.jpg}}
//A time plot of the asynchronous network file system. The blue represents the asynchronous RPC, the red synchronous//.
This greatly improves utilization, as time spent waiting is now used to send more requests.
There are however a few issues with asynchronous RPCs. Though RPCs are sent out in order, there is no guarantee that the responses are received in order. Therefore, there must be some way to tell which response belongs to which RPC.
Also, some of the back-to-back RPCs may get lost along the way, and how to handle loss needs to be specified. If 100 RPCs are sent and the 3rd one is lost, should the 3rd and all 97 subsequent RPCs be resent? Or should only the lost RPC be resent? This needs to be decided.
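One common way to handle both issues is to tag every request with an ID that the server copies into its reply. The sketch below assumes that scheme; the struct layouts, the name xid, and the fixed table of 8 slots are all invented for illustration and are not NFS's actual wire format. Requests that still sit in the table after a timeout are the ones a selective-resend policy would send again.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING 8

/* Hypothetical wire format for the one-byte read RPC in this example. */
struct read_request {
    uint32_t xid;       /* request ID chosen by the client */
    uint32_t file_id;
    uint32_t offset;
};

struct read_reply {
    uint32_t xid;       /* copied from the request by the server */
    int      value;     /* the byte read, or -1 on error */
};

/* Client-side table of requests that have no reply yet.
 * A zero-initialized table is empty. */
struct pending_table {
    struct read_request slot[MAX_PENDING];
    bool                used[MAX_PENDING];
    uint32_t            next_xid;
};

/* Record a new outstanding request. Returns its xid, or 0 if the table
 * is full (xid wraparound is ignored in this sketch). */
uint32_t rpc_issue(struct pending_table *t, uint32_t file_id, uint32_t offset)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!t->used[i]) {
            t->used[i] = true;
            t->slot[i].xid = ++t->next_xid;
            t->slot[i].file_id = file_id;
            t->slot[i].offset = offset;
            return t->slot[i].xid;
        }
    }
    return 0;
}

/* Match a reply to its request. On success, stores the offset the reply
 * answers in *offset, frees the slot, and returns true. Unknown or
 * duplicate replies return false. */
bool rpc_complete(struct pending_table *t, const struct read_reply *r,
                  uint32_t *offset)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (t->used[i] && t->slot[i].xid == r->xid) {
            *offset = t->slot[i].offset;
            t->used[i] = false;
            return true;
        }
    }
    return false;
}

/* Count requests still waiting for a reply. */
int rpc_outstanding(const struct pending_table *t)
{
    int n = 0;
    for (int i = 0; i < MAX_PENDING; i++)
        n += t->used[i] ? 1 : 0;
    return n;
}
```

Because replies are looked up by ID rather than by arrival order, the client can place each returned byte at the right offset even when the third reply arrives before the first.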
== Other Utilization Considerations ==
To further improve utilization, other techniques can be employed, like batching and pre-fetching. For example, in this example's read RPC, an additional parameter can be added that specifies how many bytes to read, reducing the RPC overhead per byte.
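The effect of batching can be estimated with a deliberately simple model, assuming a synchronous client whose time is dominated by one network round trip per RPC (transmission time and server processing time are ignored). The function names below are made up for this sketch.

```c
/* Number of RPCs needed to read file_bytes bytes, bytes_per_rpc at a time. */
long rpcs_needed(long file_bytes, long bytes_per_rpc)
{
    return (file_bytes + bytes_per_rpc - 1) / bytes_per_rpc;   /* ceiling */
}

/* Total time for a synchronous client: one full round trip per RPC. */
long sync_read_ms(long file_bytes, long bytes_per_rpc, long rtt_ms)
{
    return rpcs_needed(file_bytes, bytes_per_rpc) * rtt_ms;
}
```

Under this model, reading a 1000-byte file one byte per RPC over a 100ms round trip takes 1000 round trips, or 100 seconds. Asking for 512 bytes per RPC cuts that to 2 round trips, or 200ms.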