====== Lecture 4 Scribe Notes ======

//By Tom Carpel, Neil Huang and Sean MacIntyre//

====== Processes: Abstraction and Implementation ======

===== Process =====

A process can be viewed as either of the following:

  * A virtual computer.
  * A program in execution in an isolated domain.

Two concepts are essential to understanding processes: __Abstraction__ and __Implementation__.

  * __Abstraction__: How does an application work in an isolated domain?
  * __Implementation__: How does a kernel implement an isolated domain?

==== Von Neumann Computer Architecture ====

The Von Neumann computer architecture closely represents the computers that we use. It has four components:

  - ALU/CU [Arithmetic Logic Unit / Control Unit]
  - Registers (Example: Instruction Pointer)
  - Primary memory
  - I/O Devices

==== ALU/CU (CPU: Central Processing Unit) and Registers ====

=== Virtualization ===

  * Emulation: A type of virtualization that provides exactly the same interface as the real thing. It is one possible form of virtualization, but, as the disadvantage below shows, it is not quite the right one.

We can imagine an arcade machine emulator implemented like this:

  struct arcade_regs {
      int instruction_ptr;
      . . .
  };
  
  void cpu(struct arcade_regs *r) {
      while (1) {
          int i = get_next_instruction(r);   /* fetch and decode */
          if (i == JMP)
              r->instruction_ptr = r->accum; /* jump: load a new instruction pointer */
          else if (i == ADD)
              r->reg0 = r->reg0 + r->reg1;
          . . .
      }
  }

Advantage:

  * Robust: The emulator can check for all errors and avoid crashing the real system.

Disadvantage:

  * Performance: Every emulated instruction has to be carried out by many real instructions. For example, when emulating an arcade game: 1 emulated CPU instruction = 10 real CPU instructions

//To improve on this, we want to cut down the ratio (# real instructions executed)/(# emulated instructions executed). We want it to be 1, as when running on the actual hardware.//

=== Abstraction ===

We use Hardware!

=== Implementation ===

Maintain process isolation. How? By using the kernel and the architecture together.

**Main Motivation**: Increases performance.
But we can't necessarily use the real architecture directly!

__MIPS X instruction:__

  0xCFC00000   hsc - Halt and Spontaneously Combust
               Reg(31) <- PC

This is an example of a **Dangerous Instruction**! It is an actual instruction from the MIPS X instruction manual (although it's probably just a joke). [[www.bsz.be/karistep/csl-tr-86-289.pdf]]

__Solution__: Change the architecture to prevent access to dangerous instructions!

Architecture (in x86 terms):

  * CPL register: Current Privilege Level. A level of access ranging from:
    * CPL 0 -> all privileges (kernel)
    * CPL 3 -> application/user privileges
  * Dangerous instructions can __only__ be executed when CPL = 0.
  * The instruction for changing the CPL is itself “dangerous.”

Kernel: Runs with its own CPL set to 0 and, when running an application, sets the CPL to 3.

An application must not be allowed to change its own CPL, since that would violate the process isolation principle. This raises a question: how can applications now execute kernel code such as system calls? The solution is Protected Control Transfer.

=== Protected Control Transfer ===

Allow applications to jump into selected kernel entry points. It needs:

  - Application’s instructions for PCT
  - Architecture’s implementation
  - Kernel’s implementation

Example for 1):

  int $48

  * int <- “interrupt”
  * $48 <- system call number

Example for 2): The Interrupt Descriptor Table.

{{idt.png|}}

The processor:

  - Looks up the IDT entry.
  - Saves the old registers onto a new stack.
  - Sets the new CPL.
  - Jumps to the routine.

Example for 3): The kernel:

  - Sets up the interrupt descriptor table.
  - Loads the IDT register (lidt, the instruction that loads the IDT register, is itself a dangerous instruction).
  - Implements the system call.

The kernel must also save __all__ of the process's registers so it can restore them later. It saves them in a __Process Descriptor__.
(There is one descriptor for each process.)

{{processdescriptor.png|}}

==== Primary Memory ====

=== Abstraction ===

  * Process sees Hardware

=== Implementation ===

  * Isolated Address Space: the process's view of memory

**Main Motivation**: Increases performance

The processor prevents a process (consider a single application) from reading or modifying some parts of memory; hence, we have an __isolated__ address space.

__Applications can **Read/Write**__:

  * Their own memory **ONLY**.

__Applications cannot **Read/Write**__:

  * Kernel memory
  * Other processes' memory

If a process were able to freely modify all parts of memory, and kernel memory in particular, it could change its own CPL (current privilege level) from application privileges to kernel (all) privileges, which could be exploited maliciously. The process could also modify the interrupt descriptor table and cause the kernel to jump to the wrong memory location.

__Process Address Space__: A typical process address space, a 4GB block of byte-addressed memory, is shown below.

{{processaddressspace.png|}}

The stack and heap both expand as the process's program executes. If they collide, we have an overflow and the process dies. Because each process has its own address space, the damage is limited to the single process in which the overflow occurred.

In earlier versions of Windows, the infamous Blue Screen of Death was often due to a lack of process isolation: one process overwrites another's memory and the system crashes. We see fewer of these cases in versions of Windows built on the NT kernel because it includes process isolation, which the earlier kernels lacked.

==== Multiple Processes ====

For multiple processes, we have multiple isolated address spaces, each of which contains the sections described in the memory layout above.
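The read/write rule above can be pictured as a range check applied to every memory access. The sketch below is purely illustrative: struct address_space, can_access, and the base/size layout are hypothetical names chosen for this example, not part of the lecture or of any real kernel, and real hardware performs such checks itself rather than calling a C function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the one contiguous region a process owns. */
struct address_space {
    uintptr_t base;   /* first address the process may touch */
    size_t    size;   /* number of bytes in the region */
};

/* Returns true only if [addr, addr + len) lies entirely inside the
 * process's own region. Kernel memory and other processes' memory
 * fall outside the region, so accesses to them are refused. */
bool can_access(const struct address_space *as, uintptr_t addr, size_t len)
{
    assert(as != NULL);
    if (addr < as->base)
        return false;
    uintptr_t offset = addr - as->base;
    /* Compared this way to avoid overflow in addr + len. */
    return offset <= as->size && len <= as->size - offset;
}
```

With a region of base 0x1000 and size 0x1000, a 16-byte access at 0x1ff0 is allowed, while the same access at 0x1ff1 is refused because it runs past the end of the region.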
//Old School Implementation:// Segmentation

  * Each process descriptor is statically allocated
  * Each process can only modify its own memory

{{segmentation.png|}}

//New School Implementation:// Virtual Memory

  * Arbitrary mapping of actual memory to the virtual memory of a process descriptor

{{virtualmem.png|}}

=== Process Descriptor Overview ===

__Threads__: A thread is a virtual processor. That is, it is a virtual execution context that contains:

  - CPU
  - Registers
  - Local Variables/Return Addresses

We would like to be able to support a process with multiple threads. That is, we have multiple register sets and multiple stacks (since the threads are executing different functions) for a single process. This enables better utilization, but it can introduce complicated race conditions, which will be discussed later.

==== I/O ====

=== Abstraction ===

  * File Descriptors

=== Implementation ===

  * System Calls

**Main Motivation**: Generality & Simplicity

I/O devices cover a large range, from data storage, such as hard disks, USB drives, CDs, and DVDs, to peripherals like keyboards, mice, digital cameras, LCD and CRT monitors, and printers. To handle such a wide variety, I/O devices must be abstracted in a very broad and general manner. The abstraction must also be general enough to accommodate additional devices as new products are developed. This kind of generalization is the motivation behind Unix's BIG PICTURE:

**EVERYTHING is a file**

A file descriptor (fd) is just a number, assigned by the operating system, that points (or refers) to a specific file structure:

  int fd = open(const char * name, int mode);

Access to I/O devices is implemented using system calls, such as the open() function above, which are built into the operating system.
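The open() call above can be tried out directly. The following is a minimal sketch, assuming a POSIX system; fd_roundtrip is a hypothetical helper written for these notes, not something from the lecture. It writes a message through one file descriptor and reads it back through another. Note that the real POSIX open() takes flags (such as O_RDONLY) as its second argument and, when creating a file, a third argument giving the permissions.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg to the file at path, reads it back through a second
 * descriptor, and returns the number of bytes read (or -1 on error). */
ssize_t fd_roundtrip(const char *path, const char *msg, char *buf, size_t buflen)
{
    assert(path != NULL && msg != NULL && buf != NULL);

    /* O_CREAT needs a third argument giving the new file's permissions. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    size_t len = strlen(msg);
    ssize_t written = write(fd, msg, len);
    close(fd);
    if (written != (ssize_t)len)
        return -1;

    /* The caller only ever sees small integers, never the device itself. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, buflen);
    close(fd);
    return n;
}
```

The same read(), write(), and close() calls work whether the descriptor refers to a disk file, a terminal, or another device, which is the point of the "everything is a file" abstraction.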
The implementation takes place in software rather than in hardware because these I/O devices are not very performance sensitive, and system calls provide a simple interface through which applications can access them. For example, a programmer who wants to write to a hard disk need not understand anything about the hardware itself, such as the sector size; the only necessary information is the name of the device (const char * name) and the file access mode (int mode), e.g. read only or write only.

The file descriptors used by a particular process are kept in the process descriptor structure, as seen below. Each of these file descriptors points to a file structure, which includes information about that file such as its type, size, and offset.

{{filedesciptors.png|}}

==== Starting a New Process ====

=== Implementation: ===

  - Allocate process descriptor
  - Initialize process descriptor
  - Set process ID to an unused process ID
  - Set CPL to 3
  - Set new address space

However, problems such as an __infinite loop__ can occur. A process that loops forever without making a system call never returns control to the kernel:

  L1: jmp L1

== Interrupt ==

  * A hardware device causes the kernel to start running

== Timer Interrupt ==

  - The timer effectively executes “int” every 1ms
  - The kernel gets control
  - The kernel runs a different process, chosen by a scheduler

==== Process Isolation ====

For process isolation, we want:

  * Processes don't affect one another.
  * Most system calls affect only the current process.

So we need to set up the process environment before running a process.

**fork()** - We use fork() to create a COPY of the current process.

**execvp()** - We use execvp() to run a new program in place of the current one. The process stays the same, and it keeps its I/O devices (file descriptors).

  pid_t p = fork();
  if (p == 0) {
      /* new process (the copy) */
  } else {
      /* old process (fork() returns -1 here if it failed) */
  }
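The fork() and execvp() pattern above can be combined into one helper. This is a minimal sketch, assuming a POSIX system; run_program is a hypothetical name, and the use of waitpid() to collect the child's exit status is an addition that the lecture does not cover.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (looked up on PATH) in a child process and returns its
 * exit status, or -1 if fork/wait fails or the child dies from a signal. */
int run_program(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t p = fork();
    if (p < 0)
        return -1;                 /* fork failed: no child was created */

    if (p == 0) {
        /* New process: replace this copy with the requested program.
         * Open file descriptors are inherited across execvp(). */
        execvp(argv[0], argv);
        _exit(127);                /* reached only if execvp() failed */
    }

    /* Old process: wait for the child to finish. */
    int status;
    if (waitpid(p, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling run_program with the argument vector { "sh", "-c", "exit 7", NULL } returns 7. Because the child is a separate process with its own address space, nothing the new program does to its memory is visible in the parent.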