107 NTUST OS - HW 1

Student ID: a128891 (auditing student)

Part 1

Values in registers and on the stack

ax holds 0xaa55, or $(43605)_{10}$
cx holds 0x0, or $(0)_{10}$
bx holds 0x0, or $(0)_{10}$

Part 2:

Chapter 0: introduces what an OS does, the concepts of processes and system calls, and the design of the file system

OS tasks:

shares a computer among multiple programs so that they all run (or appear to run) at the same time, and provides a more useful set of services than the hardware alone supports.
manages and abstracts the low-level hardware, so that, for example, a word processor need not concern itself with which type of disk hardware is being used.
provides controlled ways for programs to interact, so that they can share data or work together.

Xv6

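The book uses xv6 to explain OS concepts.
xv6 provides the basic interfaces introduced by Ken Thompson and Dennis Ritchie’s Unix OS.
The xv6 shell is a simple implementation of the essence of the Unix Bourne shell.

xv6 takes the traditional form of a kernel, a special program that provides services to running programs. Each running program, called a process, has its own instructions, data, and stack.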

Process and System call:

Process:
- a running program.
- contains three parts:
  1. Instructions: implement the program’s computation.
  2. Data: the variables on which the computation acts.
  3. Stack: organizes the program’s procedure calls.
  - In xv6, these three parts make up the process's user-space memory. Each process also has per-process state that is private to the kernel, and the kernel associates a process identifier (pid) with every process. When a process is not running, xv6 saves its CPU registers and restores them the next time the process runs.
System call:
- a procedure call in the OS interface that a process invokes when it needs a kernel service.
- execution alternates between user space and kernel space (two steps):
  1. the system call enters the kernel from user space;
  2. then, the kernel performs the service and returns.

This part also introduces the shell:

- a user program, not part of the kernel.
- easy to replace, because it is only an ordinary program with two basic functions:
  1. read commands from the user;
  2. then, execute them.

Two important system calls are introduced here: fork() and exec():

fork()
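- lets a process (the parent process) create a new process, called the child process, with exactly the same memory contents as the calling process.
- After the copy, parent and child run concurrently and do not affect each other, because each has its own memory and registers.
- in the parent, fork returns the child’s pid; in the child, it returns 0.
- When parent and child both produce output, whichever is scheduled first prints first, so the order may differ from run to run.
- often used together with exit() and wait():
  - exit(): makes the calling process stop executing and release its resources; in the example that follows, the child calls it at the end of its branch.
  - wait(): makes the parent wait until a child has exited, and returns that child's pid.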
```
    int pid = fork();    // create a process
    if(pid > 0){
        printf("parent: child=%d\n", pid);
        pid = wait();    // returns the pid of an exited child; blocks until one exits
        printf("child %d is done\n", pid);
    } else if(pid == 0){
        printf("child: exiting\n");
        exit();    // stop executing and release child's resources
    } else {
        printf("fork error\n");
    }
```
exec()
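- exec replaces the calling process's memory with a new memory image loaded from a file. On success it does not return to the calling program; on failure it returns an error value and the original process continues with the code that follows.
- In Unix, exec is the standard way to run an executable file stored on disk.
- xv6 provides a single exec(path, argv). On Linux the C library offers six variants (execl, execlp, execle, execv, execvp, execve) that differ mainly in the arguments they take.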
```
    char *argv[3];
    argv[0] = "echo";
    argv[1] = "hello";
    argv[2] = 0;
    exec("/bin/echo", argv);    // replaces the calling process's memory with a new memory image loaded from a file stored in the file system.
    printf("exec error\n");
```
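fork, exec, and wait are usually combined, which is essentially how a shell runs a command. The sketch below assumes a POSIX system rather than xv6 (it uses execvp and waitpid from the C library), and `run_command` is a hypothetical helper name, not something from the book.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0] in a child process and return its
 * exit status, or -1 if the child could not be created or did not exit
 * normally. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */

    if (pid == 0) {
        execvp(argv[0], argv);  /* returns only if exec failed */
        _exit(127);             /* conventional "command not found" status */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_command` with `{"sh", "-c", "exit 3", NULL}` should return 3, because the parent only continues after the child has exited and picks up its status through waitpid.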

I/O and File descriptors

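A file descriptor is a small integer that represents a kernel-managed object that a process may read from or write to. A process may obtain a file descriptor by opening a file, directory, or device, by creating a pipe, or by duplicating an existing descriptor.
The object a file descriptor refers to is usually just called a file.
By convention, a process uses three standard streams:
- reads from file descriptor 0 (standard input)
- writes output to file descriptor 1 (standard output)
- writes error messages to file descriptor 2 (standard error).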

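Two main system calls: read() and write()

read(fd, buf, n):
- reads at most n bytes from fd, copies them into buf, and returns the number of bytes read.
- each read advances the file offset by the number of bytes read, so the next read starts from the new position; when there is nothing left to read, it returns zero.
write(fd, buf, n):
- writes n bytes from buf to fd and returns the number of bytes written.
- like read, each write advances the file offset by the number of bytes written, so the next write starts where the previous one left off.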

// This fragment copies data from its standard input to its standard output.
// It does not know where the data comes from (file, console, or pipe), nor where it is written to.
char buf[512];
int n;

for(;;){
    n = read(0, buf, sizeof buf);
    if(n == 0)
        break;
    if(n < 0){
        fprintf(2, "read error\n");
        exit();
    }
    if(write(1, buf, n) != n){
        fprintf(2, "write error\n");
        exit();
    }
}

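Used together with fork, file descriptors make I/O redirection easy to implement.

fork copies the parent’s file descriptor table along with its memory, so the child starts with the same open files. In the child, close(0) frees descriptor 0 and the following open reuses it, because open returns the lowest unused descriptor.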

char *argv[2];
argv[0] = "cat";
argv[1] = 0;
if(fork() == 0) {
    close(0);
    open("input.txt", O_RDONLY);
    exec("cat", argv);
}

Other system calls:
- close: releases a file descriptor so that a future open, pipe, or dup can reuse it.
- dup: duplicates an existing file descriptor and returns a new one that refers to the same underlying I/O object, so both share one file offset.
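The shared offset after dup can be checked with a small experiment. This is a sketch assuming a POSIX system; `dup_shares_offset` and the scratch file path passed to it are hypothetical, chosen only for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write "hello " through one descriptor and "world\n" through its
 * duplicate, then read the file back into out (capacity cap).
 * Returns the number of bytes read, or -1 on error. */
int dup_shares_offset(const char *path, char *out, int cap)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    int fd2 = dup(fd);
    if (fd2 < 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    int n = -1;
    if (write(fd, "hello ", 6) == 6 &&
        write(fd2, "world\n", 6) == 6 &&   /* continues where fd stopped */
        lseek(fd, 0, SEEK_SET) == 0) {
        n = (int)read(fd2, out, cap - 1);  /* fd2 sees the rewound offset too */
        if (n >= 0)
            out[n] = '\0';
    }
    close(fd);
    close(fd2);
    unlink(path);
    return n;
}
```

Because both descriptors refer to the same open file, the second write lands after the first instead of overwriting it, and the file ends up containing "hello world\n".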

Pipes

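A pipe is a small kernel buffer exposed to processes as a pair of file descriptors, one for reading and one for writing. It gives processes a way to communicate: data written to one end can be read from the other.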
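A minimal sketch of two processes communicating through a pipe, assuming a POSIX system. `pipe_roundtrip` is a hypothetical helper: the child writes a message into the write end and the parent reads it from the read end.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send msg (len bytes) from a child process to the parent through a pipe.
 * The parent stores what it reads in out (capacity cap) and returns the
 * number of bytes read, or -1 if pipe or fork failed. */
int pipe_roundtrip(const char *msg, int len, char *out, int cap)
{
    int p[2];                   /* p[0]: read end, p[1]: write end */
    if (pipe(p) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {             /* child: writer */
        close(p[0]);
        ssize_t w = write(p[1], msg, len);
        close(p[1]);
        _exit(w == len ? 0 : 1);
    }

    close(p[1]);                /* parent closes its write end so read can see EOF */
    int total = 0;
    ssize_t n;
    while (total < cap && (n = read(p[0], out + total, cap - total)) > 0)
        total += (int)n;
    close(p[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

Closing the unused write end in the parent matters: read on a pipe returns 0 only once every descriptor for the write end has been closed.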

References:
- linux c语言 fork() 和 exec 函数的简介和用法 - https://blog.csdn.net/nvd11/article/details/8856278
- 程序员必备知识——fork和exec函数详解 - https://blog.csdn.net/bad_good_man/article/details/49364947

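Chapter 1: examines the main properties of an OS

This chapter provides an overview of how operating systems are organized to achieve three requirements: multiplexing, isolation, and interaction.

Why not implement the system calls as a library that applications link against? Each application could then tailor the library to its needs and even interact with hardware resources directly to get better performance.

Because this only works if every application is well-behaved. The example given is that each process must periodically give up the processor so that others can run. If a buggy application never does so, nothing forces it to, and the other applications cannot run.
In practice applications cannot trust each other, so an OS is usually designed to enforce strong isolation rather than rely on a cooperative scheme.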

User mode, kernel mode, and system calls

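To achieve isolation, the OS usually prevents applications from accessing sensitive hardware resources directly. For example:

- applications use the file system only through system calls such as open, read, write, and close, instead of reading and writing raw disk sectors.
- applications use exec() to build their memory image, instead of interacting with physical memory directly.
- file descriptors abstract away the details of files, pipes, and devices.

To support strong isolation in hardware, most processors provide two modes in which instructions execute: kernel mode and user mode.

kernel mode: privileged instructions are allowed.
user mode: privileged instructions are not allowed. If an application attempts one, the processor does not execute it and instead switches to kernel mode so that kernel software can clean up the application.
Ex: x86 provides the int (interrupt) instruction, whose purpose is to switch the processor from user mode to kernel mode.
- Overall flow:
  1. the process makes a system call.
  2. the processor switches to the kernel stack, raises the hardware privilege level, and starts executing the kernel instructions that implement the system call.
  3. when the system call is finished, the kernel returns to user space: the hardware lowers its privilege level and switches back to the user stack.
  4. the process resumes executing user instructions.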

Kernel organization

Depending on which parts of the OS run in kernel mode, there are two common kernel designs:
1. Monolithic kernel:
  - entire OS resides in the kernel.
  - Pros:
    - the entire system runs with full HW privilege, so the designer does not have to decide which parts need it.
    - easy for different parts of the operating system to cooperate.
  - Cons:
    - interfaces between different parts of the OS are often complex, so it is easy for an OS developer to make a mistake.
    - a mistake in kernel mode is often fatal, because it can make the whole kernel fail.
2. Microkernel:
  - minimize the amount of OS code that runs in kernel mode, and execute most of the OS in user mode.
  - Pros:
    - kernel structure is simple and small.
  - Cons:
    - interprocess communication (IPC) is slower.

Process overview

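To strengthen isolation, xv6 uses the process abstraction:
- a process gives each program the illusion of having its own machine, with a private memory system (address space) that other processes cannot read or write.
- xv6 uses page tables to give each process its own address space; a page table maps virtual addresses to physical addresses.
- each process has its own page table, which defines an address space containing the program's instructions, global variables, stack, and heap.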
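As a rough illustration of the virtual-to-physical mapping, the sketch below splits a 32-bit x86 virtual address the way two-level paging does (10-bit page directory index, 10-bit page table index, 12-bit offset). The helper names are hypothetical, and the lookup of the physical page itself is left out; only the address arithmetic is shown.

```c
#include <stdint.h>

/* Two-level x86 paging splits a 32-bit virtual address into
 * 10 bits of page-directory index, 10 bits of page-table index,
 * and a 12-bit offset within the 4096-byte page. */
uint32_t pd_index(uint32_t va)    { return (va >> 22) & 0x3FF; }
uint32_t pt_index(uint32_t va)    { return (va >> 12) & 0x3FF; }
uint32_t page_offset(uint32_t va) { return va & 0xFFF; }

/* Combine the physical page base found via the page table with the
 * offset taken from the virtual address. */
uint32_t phys_addr(uint32_t page_base, uint32_t va)
{
    return (page_base & ~0xFFFu) | page_offset(va);
}
```

For the virtual address 0x80100ABC the directory index is 512, the table index is 256, and the offset is 0xABC; if the page table maps that page to physical page 0x00100000, the physical address is 0x00100ABC.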
xv6 keeps the state of each process in a struct proc (allocated by allocproc):

struct proc {
    uint sz;                    // Size of process memory (bytes)
    pde_t* pgdir;               // Page table
    char *kstack;               // Bottom of kernel stack for this process 
    enum procstate state;       // Process state
    int pid;                    // Process ID
    struct proc *parent;        // Parent process
    struct trapframe *tf;       // Trap frame for current syscall
    struct context *context;    // swtch() here to run process
    void *chan;                 // If non-zero, sleeping on chan
    int killed;                 // If non-zero, have been killed
    struct file *ofile[NOFILE]; // Open files
    struct inode *cwd;          // Current directory
    char name[16];              // Process name (debugging)
};

Part 3

Chapter 3: Traps, interrupts, and drivers

Introduction

While a program runs, the CPU normally follows a fixed loop:

read the instruction,
advance the program counter,
execute the instruction.

Sometimes control must instead transfer from a user program to the kernel. This chapter covers the three cases in which that happens and how they are handled.

The cases are:
1. system call: a user program asks for an OS service.
2. exception: a program performs an illegal action.
3. interrupt: a device generates a signal when it needs attention from the OS.
To handle them, the kernel needs the following:
1. the processor must be able to switch between user mode and kernel mode.
2. the kernel and devices must coordinate their parallel activities.
3. the kernel must understand each device's interface.
In addition, the following requirements must be met:
1. The system must save the processor’s registers for future transparent resume.
2. The system must be set up for execution in the kernel.
3. The system must choose a place for the kernel to start executing.
4. The kernel must be able to retrieve information about the event, e.g., system call arguments.

On x86, a single hardware mechanism covers all three cases: a program invokes a system call by generating an interrupt with the int instruction, and exceptions generate an interrupt too.

Interrupt and trap

An interrupt is handled roughly as follows:

stops the normal processor loop
the processor saves its registers
executes a new sequence called an interrupt handler
finishes the handler
the processor restores the saved registers and resumes the interrupted code

The text also distinguishes traps from interrupts; the two differ mainly in their source:

interrupts:
- generated by devices, and may be unrelated to the currently running process
- happen concurrently with other activities
traps: caused by the currently running process (e.g., the process makes a system call and as a result generates a trap)

X86 protection

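x86 defines four protection levels, numbered 0 (most privileged) to 3 (least privileged). Most operating systems use only two of them, 0 and 3, which correspond to kernel mode and user mode.
When an interrupt handler is invoked, x86 may change the processor's privilege level. Executing int n performs the following steps: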
- Terms:
  - IDT: Interrupt descriptor table
  - CPL: Current Privilege level
  - DPL: Descriptor Privilege level
1. Fetch the n’th descriptor from the IDT, where n is the argument of int.
2. Check that CPL in %cs is <= DPL, where DPL is the privilege level in the descriptor. $\to$ this keeps user code from invoking arbitrary int numbers.
3. Save %esp and %ss in CPU-internal registers, but only if the target segment selector’s PL < CPL.
4. Load %ss and %esp from a task segment descriptor.
5. Push %ss.
6. Push %esp.
7. Push %eflags.
8. Push %cs.
9. Push %eip.
10. Clear the IF bit in %eflags, but only on an interrupt.
11. Set %cs and %eip to the values in the descriptor.
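Kernel stack after an int instruction
- After int completes with a privilege-level change (the privilege level in the descriptor is lower than CPL), the kernel stack holds the saved %ss, %esp, %eflags, %cs, and %eip.
- If int does not require a privilege-level change, x86 does not save %ss and %esp.
- To return from an int instruction, the OS uses iret, which pops the values saved by int and resumes execution at the saved %eip.
x86 allows for 256 different interrupts.
- xv6 calls tvinit to set up the 256 entries of the table idt; interrupt i is handled by the code at the address in vectors[i].
- tvinit handles T_SYSCALL specially: it passes 1 as the second argument to mark the gate as a “trap” gate. A trap gate does not clear the IF flag, so other interrupts are allowed during the system call handler. It also sets the gate's privilege level to DPL_USER, which lets a user program generate the trap with an explicit int instruction.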
```
void
tvinit(void)
{
    int i;

    for(i = 0; i < 256; i++)
        SETGATE(idt[i], 0, SEG_KCODE<<3, vectors[i], 0);
    SETGATE(idt[T_SYSCALL], 1, SEG_KCODE<<3, vectors[T_SYSCALL], DPL_USER);
    initlock(&tickslock, "time");
}
```
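The effect of the gate setup in tvinit, combined with the CPL <= DPL check in step 2, can be modeled in a few lines. This is a simplified simulation, not xv6 code: `gate_dpl`, `init_gate_dpl`, `int_allowed`, and `DPL_KERNEL` are hypothetical names, while T_SYSCALL = 64 and DPL_USER = 3 follow the xv6 headers. The check applies to an explicit int instruction, not to hardware interrupts.

```c
enum { T_SYSCALL = 64, DPL_KERNEL = 0, DPL_USER = 3, NGATES = 256 };

static unsigned char gate_dpl[NGATES];

/* Mirror tvinit: every gate gets DPL 0 except the system call gate. */
void init_gate_dpl(void)
{
    for (int i = 0; i < NGATES; i++)
        gate_dpl[i] = DPL_KERNEL;
    gate_dpl[T_SYSCALL] = DPL_USER;
}

/* An explicit "int n" is permitted only when CPL <= DPL of gate n. */
int int_allowed(int cpl, int n)
{
    return n >= 0 && n < NGATES && cpl <= gate_dpl[n];
}
```

With this table, user code (CPL 3) may execute int 64 to make a system call, but an attempt such as int 14 fails the check, which on real hardware results in a general protection fault instead of entering the handler.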
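When the protection level changes from user to kernel mode, the trap must switch stacks. The kernel cannot use the user process's stack, because it may be invalid or malicious. xv6 therefore uses switchuvm to store the address of the top of the process's kernel stack in the task segment descriptor.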

void
switchuvm(struct proc *p)
{
    if(p == 0)
        panic("switchuvm: no process");
    if(p->kstack == 0)
        panic("switchuvm: no kstack");
    if(p->pgdir == 0)
        panic("switchuvm: no pgdir");

    pushcli();
    mycpu()->gdt[SEG_TSS] = SEG16(STS_T32A, &mycpu()->ts, sizeof(mycpu()->ts)-1, 0);
    mycpu()->gdt[SEG_TSS].s = 0;
    mycpu()->ts.ss0 = SEG_KDATA << 3;
    mycpu()->ts.esp0 = (uint)p->kstack + KSTACKSIZE;    // top of the kernel stack
    // setting IOPL=0 in eflags *and* iomb beyond the tss segment limit
    // forbids I/O instructions (e.g., inb and outb) from user space
    mycpu()->ts.iomb = (ushort) 0xFFFF;
    ltr(SEG_TSS << 3);
    lcr3(V2P(p->pgdir)); // switch to process's address space
    popcli();
}

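When a trap occurs in user mode, the processor loads %esp and %ss from the task segment descriptor and pushes the old user %ss and %esp onto the new stack. If the processor was already in kernel mode, none of this happens. The processor then pushes %eflags, %cs, and %eip. Finally, alltraps pushes %ds, %es, %fs, %gs, and the general-purpose registers (source lines 3304-3310).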

alltraps:
    # Build trap frame.
    pushl %ds
    pushl %es
    pushl %fs
    pushl %gs
    pushal

    # Set up data segments.
    movw $(SEG_KDATA<<3), %ax
    movw %ax, %ds
    movw %ax, %es
    
    # Call trap(tf), where tf=%esp
    pushl %esp
    call trap
    addl $4, %esp

The result is a struct trapframe on the kernel stack:

struct trapframe{
    // registers as pushed by pusha
    uint edi;
    uint esi;
    uint ebp;
    uint oesp;  //useless & ignored
    uint ebx;
    uint edx;
    uint ecx;
    uint eax;    // contains the system call number for the kernel to inspect later.

    // rest of trap frame
    ushort gs;
    ushort padding1;
    ushort fs;
    ushort padding2;
    ushort es;
    ushort padding3;
    ushort ds;
    ushort padding4;
    uint trapno;

    // below here defined by x86 hardware
    uint err;
    uint eip;
    ushort cs;    // user code segment selector
    ushort padding5;
    uint eflags;    // content of the %eflags register

    // below here only when crossing rings, such as from user to kernel
    uint esp;
    ushort ss;
    ushort padding6;
};

C trap handler

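This section walks through the source of trap (lines 3401-3480):

- trap looks at the hardware trap number tf->trapno to decide why it was called and what to do. For example, if the trap is T_SYSCALL, trap calls the system call handler syscall.
- After checking for a system call, trap looks for hardware interrupts.
- If the trap is neither a system call nor a hardware device looking for attention, trap assumes it was caused by incorrect behavior. If the offending code is a user program, xv6 prints details and sets proc->killed so that the process is cleaned up later. If the kernel was running, trap prints details and calls panic.
C trap handler - flowchart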
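The decision structure of trap can be sketched as a small classifier. This is a simplified model rather than the xv6 function: `classify_trap` and the `trap_action` values are hypothetical, only two device interrupts are listed, and the constants (T_IRQ0 = 32, IRQ_TIMER = 0, IRQ_IDE = 14, T_SYSCALL = 64) follow the xv6 headers.

```c
enum { T_IRQ0 = 32, IRQ_TIMER = 0, IRQ_IDE = 14, T_SYSCALL = 64 };

enum trap_action { DO_SYSCALL, DO_DEVICE_INTERRUPT, KILL_PROCESS, PANIC_KERNEL };

/* Decide what to do for a trap, given the trap number and the saved %cs. */
enum trap_action classify_trap(unsigned trapno, unsigned short cs)
{
    if (trapno == T_SYSCALL)
        return DO_SYSCALL;
    if (trapno == T_IRQ0 + IRQ_TIMER || trapno == T_IRQ0 + IRQ_IDE)
        return DO_DEVICE_INTERRUPT;
    /* Unexpected trap: the low two bits of the saved %cs give the
     * privilege level the interrupted code was running at. */
    return (cs & 3) == 0 ? PANIC_KERNEL : KILL_PROCESS;
}
```

For example, a page fault (trap 14) with a user code segment selector such as 0x1B is classified as KILL_PROCESS, while the same trap with a kernel selector such as 0x08 is classified as PANIC_KERNEL.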

System calls

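When a system call occurs, trap calls syscall. syscall reads the system call number from the trap frame (the saved %eax) and uses it to index into the system call table. If the number is invalid, syscall prints an error and returns -1.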

void
syscall(void)
{
    int num;
    struct proc *curproc = myproc();

    // the system call number arrives in the saved %eax
    num = curproc->tf->eax;
    if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {
        // the return value goes back to user space in %eax
        curproc->tf->eax = syscalls[num]();
    } else {
        cprintf("%d %s: unknown sys call %d\n", curproc->pid, curproc->name, num);
        curproc->tf->eax = -1;
    }
}

The next question: how are the system call's arguments found?
$\to$ with the helper functions argint, argptr, argstr, and argfd
$\to$ they retrieve the n’th system call argument, as either an integer, a pointer, a string, or a file descriptor.
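The table lookup that syscall performs can be tried outside the kernel with a function-pointer array. The sketch below is a stand-alone model: the two stub handlers, their return values, and `dispatch` are hypothetical and exist only to show the indexing and the -1 result for unknown numbers.

```c
#include <stddef.h>

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical handlers standing in for real system calls. */
static int sys_getpid_stub(void) { return 42; }
static int sys_uptime_stub(void) { return 1000; }

/* Index = system call number; slot 0 is left empty, so 0 is invalid. */
static int (*syscalls[])(void) = {
    [1] = sys_getpid_stub,
    [2] = sys_uptime_stub,
};

/* Return the handler's result, or -1 for an unknown number. */
int dispatch(int num)
{
    if (num > 0 && (size_t)num < NELEM(syscalls) && syscalls[num])
        return syscalls[num]();
    return -1;
}
```

The check has three parts for a reason: the number must be positive, inside the table, and the slot must actually hold a handler, since designated initializers leave unused slots as null pointers.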

Interrupts

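Devices can generate interrupts, and xv6 must set up the hardware to handle them.
Interrupts are similar to system calls, except that devices generate them at any time.

Early boards used a simple programmable interrupt controller (PIC). With the advent of multiprocessor PC boards a new approach was needed, because each CPU needs its own interrupt controller. It consists of two parts:

the part in the I/O system (IOAPIC, ioapic.c)
the part attached to each processor (LAPIC, lapic.c)

During initialization, specific devices enable their particular interrupts and say which processor each interrupt should be routed to. The timer chip is inside the LAPIC, so each processor can receive timer interrupts independently. xv6 sets this up in lapicinit, and the key line is: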

lapicw(TIMER, PERIODIC | (T_IRQ0 + IRQ_TIMER));

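This tells the LAPIC to periodically generate an interrupt at IRQ_TIMER.

A processor can control whether it receives interrupts through the IF flag in the %eflags register. The cli instruction clears the flag and disables interrupts; sti sets it and enables them.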
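The switchuvm listing earlier brackets its work with pushcli and popcli, which are nestable wrappers around cli and sti. The sketch below simulates that behavior on an ordinary struct instead of the real %eflags register; `struct sim_cpu` and the `sim_` functions are hypothetical, and FL_IF = 0x200 is the IF bit (bit 9) of %eflags.

```c
enum { FL_IF = 0x200 };   /* interrupt-enable bit (bit 9) of %eflags */

struct sim_cpu {
    unsigned eflags;  /* simulated %eflags */
    int ncli;         /* depth of pushcli nesting */
    int intena;       /* were interrupts enabled before the first pushcli? */
};

void sim_cli(struct sim_cpu *c) { c->eflags &= ~(unsigned)FL_IF; }
void sim_sti(struct sim_cpu *c) { c->eflags |= FL_IF; }
int  sim_interrupts_on(const struct sim_cpu *c) { return (c->eflags & FL_IF) != 0; }

/* Disable interrupts and remember the state seen by the outermost call. */
void sim_pushcli(struct sim_cpu *c)
{
    int was_on = sim_interrupts_on(c);
    sim_cli(c);
    if (c->ncli == 0)
        c->intena = was_on;
    c->ncli++;
}

/* Undo one pushcli; re-enable interrupts only when the outermost one is undone. */
void sim_popcli(struct sim_cpu *c)
{
    if (c->ncli > 0 && --c->ncli == 0 && c->intena)
        sim_sti(c);
}
```

Two nested pushcli calls therefore need two popcli calls before interrupts come back on, which is what lets code like switchuvm be called safely from a context that has already disabled interrupts.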

Drivers

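A driver is the code in an OS that manages a particular device. It:
- tells the device hardware what operations to perform
- provides the device's interrupt handlers and deals with the resulting events
Driver code is hard to get right for several reasons: the driver runs concurrently with the device it manages, it must understand the device's interface, and that interface can be complex and poorly documented.

Take the disk driver as an example. Disk hardware traditionally presents data as a numbered sequence of 512-byte blocks (also called sectors), but the block size an OS uses for its file system may differ from the sector size. In addition, the in-memory copy of a block and the copy on disk are often out of sync: the data may not have been read in from disk yet, or it may have been updated but not yet written out.
The disk driver must make sure the rest of the system is not confused when the two copies are out of sync.

xv6 represents a block with a struct buf: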

struct buf {
    int flags;    // track the relationship between memory and disk
    uint dev;    // numbering
    uint blockno;
    struct sleeplock lock; 
    uint refcnt;
    struct buf *prev; // LRU cache list 
    struct buf *next;
    struct buf *qnext; // disk queue 
    uchar data[BSIZE];    // BSIZE: identical to the IDE's sector size
};

#define B_VALID 0x2 // buffer has been read from disk
#define B_DIRTY 0x4 // buffer needs to be written to disk
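How the two flags drive the disk driver can be summarized in a small sketch. `needed_op` and `flags_after_completion` are hypothetical helpers that model the decisions described for the driver; they are not functions from the xv6 source.

```c
#define B_VALID 0x2  /* buffer has been read from disk */
#define B_DIRTY 0x4  /* buffer needs to be written to disk */

enum disk_op { DISK_NONE, DISK_READ, DISK_WRITE };

/* Decide which disk operation a buffer with these flags needs. */
enum disk_op needed_op(int flags)
{
    if (flags & B_DIRTY)
        return DISK_WRITE;   /* memory copy is newer than the disk copy */
    if (!(flags & B_VALID))
        return DISK_READ;    /* memory copy has not been loaded yet */
    return DISK_NONE;        /* memory and disk agree */
}

/* Flags once the disk reports completion: the copy is valid and clean. */
int flags_after_completion(int flags)
{
    return (flags | B_VALID) & ~B_DIRTY;
}
```

A freshly allocated buffer (flags 0) needs a read, a modified one (B_VALID | B_DIRTY) needs a write, and after either operation completes the buffer ends up with only B_VALID set.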

Disk driver

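IDE is gradually being replaced by SCSI and SATA, but because its interface is simple it is still often used in test setups.
Flow:
- at boot time the kernel calls ideinit to initialize the disk driver.
- ideinit calls ioapicenable to enable the IDE_IRQ interrupt, but only on the last CPU (ncpu-1).
- ideinit calls idewait to wait until the disk can accept commands; idewait polls the status bits until the busy bit (IDE_BSY) is clear and the ready bit (IDE_DRDY) is set.
- once the disk controller is ready, ideinit checks how many disks are present.
- the buffer cache calls iderw to bring a buffer in sync with the disk: a buffer marked B_DIRTY is written to disk, and a buffer without B_VALID is read from disk.
- iderw appends the buffer to the disk queue and, if it is first in the queue, calls idestart to send the request to the disk hardware.
- the calling process then sleeps while the disk works, and continues only after the request has completed.
- when the disk finishes, it raises an interrupt; trap calls ideintr, which checks the buffer at the front of the queue, marks it done, wakes the sleeping process, and passes the next waiting buffer to the disk.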