106 NTUST OS

B10532016, four-year EECS program, year 3, 楊泓彬

Item 1

eax is 0xaa55, edx is 0x80, esp is 0x6f2c, eip is 0x7c00, eflags is 0x202, and all other registers are 0.

Item 2

Chapter 0

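The main jobs of an operating system:

Let different programs share one computer, interact with each other, and share files
Abstract the hardware (for example, Word does not need to care what kind of disk it saves a file to)
Let different programs run (or appear to run) at the same time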

kernel

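The kernel is a special program with hardware privileges. It provides services to processes and acts as the interface between ordinary programs and the operating system.

Because of the kernel's protection mechanisms, a process can access only its own memory. When it needs a kernel service, it issues a system call through the operating system interface; the hardware raises the privilege level, runs a function defined by the kernel, and returns the result.

Processes and memory

An xv6 process consists of two parts: user-space memory (instructions, data, and stack) and per-process state visible only to the kernel. xv6 time-shares processes: it switches the available CPUs among the processes waiting to run. When a process is not running, xv6 saves its CPU registers and restores them the next time it runs. The kernel associates each process with a pid (process identifier).

A process can create a child process with fork. The child has the same memory contents as the process that created it, the parent. In the parent, fork returns the child's pid; in the child, it returns 0. For example: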

  int pid = fork();
  if(pid > 0){
      printf("parent: child=%d\n", pid);
      pid = wait();
      printf("child %d is done\n", pid);
  } else if(pid == 0){
      printf("child: exiting\n");
      exit();
  } else {
      printf("fork error\n");
  }

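exit stops the calling process and releases its resources (such as memory and open files). wait returns the pid of a child of the current process that has exited; if no child has exited yet, wait blocks until one does. In this example, the two output lines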

parent: child=1234
child: exiting

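can appear in either order, depending on whether the parent or the child reaches its printf first. After the child exits and the parent's wait returns, the parent prints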

child 1234 is done

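Although the child's memory starts out identical to the parent's, the two run with separate memory and separate registers, so changing a variable in one does not affect the other. For example, storing the return value of wait into pid in the parent does not change the variable pid in the child, which is still 0.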
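To make the point about separate memory concrete, here is a minimal sketch for a POSIX system rather than xv6, so it uses waitpid and _exit instead of xv6's argument-less wait and exit. The helper name fork_memory_is_private is made up for illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's copy of `value` after the
 * child has modified its own copy. Returns -1 if fork fails. */
int fork_memory_is_private(void)
{
    int value = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 99;          /* changes only the child's copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);   /* wait for the child to exit */
    return value;            /* the parent's copy is untouched */
}
```

Calling fork_memory_is_private() in the parent yields 1, because the child's assignment happened in a different address space.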

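exec reads a file holding a memory image from the file system and replaces the calling process's memory with it. The file must have a specific format (for example ELF, which specifies which part holds instructions, which part holds data, which instruction to start at, and so on). When exec succeeds it does not return to the calling program; execution starts at the entry point declared in the ELF header.
exec takes two arguments:

The name of the executable file
An array of string arguments

For example: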

  char *argv[3];

  argv[0] = "echo";
  argv[1] = "hello";
  argv[2] = 0;
  exec("/bin/echo", argv);
  printf("exec error\n");

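This code replaces the calling program with /bin/echo and runs it with the argument list echo hello. Most programs ignore the first argument, which by convention is the program name.

The xv6 shell uses the calls above to run programs for the user. Its structure is simple: the main loop reads a line of input with getcmd, then calls fork to create a copy of the shell process. The parent calls wait while the child runs the command. For example, if the user types "echo hello", the child calls runcmd with echo hello as its argument, and runcmd (8086) runs the actual command. For echo hello, runcmd calls exec. If exec succeeds, the child starts executing the instructions of echo instead of continuing in runcmd. At some point echo calls exit, which makes the parent return from wait in main.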
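The fork, exec, wait pattern that the shell relies on can be sketched in a few lines. This is a POSIX approximation, not the xv6 shell: it uses execv and waitpid, and the helper name run_and_wait is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at path argv[0] in a child
 * process and return its exit status, or -1 on failure. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);   /* returns only if exec failed */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, passing the argument vector {"/bin/sh", "-c", "exit 3", 0} makes the child become the shell, and the parent gets 3 back once the child exits.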

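xv6 allocates most user-space memory implicitly: fork allocates the memory needed for the child's copy of the parent's memory, and exec allocates enough memory to hold the executable file. A process that needs more memory at run time (perhaps for malloc) can call sbrk(n) to grow its memory by n bytes; sbrk returns the location of the new memory.

xv6 has no notion of users, and therefore no protection of one user from another; in Unix terms, all xv6 processes run as root.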

I/O and File descriptors

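A file descriptor is a small integer that represents a kernel-managed object that a process may read from or write to. The file descriptor interface abstracts away the differences between files, pipes, and devices, making them all look like streams of bytes.

A process accesses a file descriptor with read(fd, buf, n) and write(fd, buf, n). The code below is the core of the cat program: it copies data from standard input to standard output, and writes a message to standard error if something goes wrong: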

  char buf[512];
  int n;

  for(;;){
      /* read up to sizeof(buf) bytes from file descriptor 0 (standard input) into buf */
      /* n = number of bytes actually read */
      n = read(0, buf, sizeof buf);
      if (n == 0)
          break;
      if (n < 0) {
          /* write "read error" to file descriptor 2 (standard error) */
          fprintf(2, "read error\n");
          exit();
      }
      /* write the n bytes in buf to file descriptor 1 (standard output) */
      /* it is an error if the number of bytes actually written != n */
      if (write(1, buf, n) != n) {
          /* write "write error" to file descriptor 2 (standard error) */
          fprintf(2, "write error\n");
          exit();
      }
  }

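File descriptors combined with fork make I/O redirection possible:

char *argv[2];

argv[0] = "cat";
argv[1] = 0;

if (fork() == 0) {
    /* close standard input */
    close(0);
    /* open input.txt; it gets descriptor 0, the lowest unused one */
    open("input.txt", O_RDONLY);
    exec("cat", argv);
}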

A parent and child share the byte offset of a file descriptor:

if(fork() == 0) {
    write(1, "hello ", 6);
    exit(); 
} else {
    wait();
    write(1, "world\n", 6);
}

When this finishes, the output on file descriptor 1 is:

hello world
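The shared offset can be checked directly by writing to a regular file instead of the terminal. The sketch below assumes a POSIX system with a writable /tmp; the helper name shared_offset_demo and the temporary file template are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: child and parent write through the same
 * descriptor. Returns 1 if the file ends up as "hello world\n",
 * 0 if not, and -1 on setup failure. */
int shared_offset_demo(void)
{
    char path[] = "/tmp/offsetXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                    /* the file lives until fd is closed */

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0)
        _exit(write(fd, "hello ", 6) != 6);
    waitpid(pid, NULL, 0);
    if (write(fd, "world\n", 6) != 6) {   /* continues at offset 6, not 0 */
        close(fd);
        return -1;
    }

    char buf[16] = {0};
    lseek(fd, 0, SEEK_SET);
    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);
    return n == 12 && strcmp(buf, "hello world\n") == 0;
}
```

If the offset were not shared, the parent's write would start at offset 0 and overwrite the child's "hello ".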

Pipe

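A pipe is a small kernel buffer exposed to processes as a pair of file descriptors, one for reading and one for writing. Data written to one end can be read from the other, so pipes let processes communicate.

Pipes are no more powerful than temporary files (one can stand in for the other):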

echo hello world | wc

echo hello world >/tmp/xyz; wc </tmp/xyz

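Pipes do have several advantages, though:

They clean themselves up automatically
They can pass arbitrarily long streams of data, while a temporary file needs enough free disk space to hold all of it
They allow parallel execution, while with a temporary file the first program must finish before the second can start
For inter-process communication, a pipe's blocking reads and writes are more efficient than the non-blocking semantics of files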
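As a rough illustration of two processes talking through a pipe, the sketch below has a child write a message and the parent read it back. It targets POSIX rather than xv6, and the helper name pipe_roundtrip is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes msg into a pipe and the parent
 * reads it into out, which must have room for cap >= 1 bytes.
 * Returns the number of bytes read, or -1 on error. */
int pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(p[0]);                       /* child only writes */
        ssize_t w = write(p[1], msg, strlen(msg));
        _exit(w < 0);
    }
    close(p[1]);   /* parent only reads; needed so read can see EOF */
    size_t total = 0;
    ssize_t n;
    while (total + 1 < cap &&
           (n = read(p[0], out + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(p[0]);
    waitpid(pid, NULL, 0);
    return (int)total;
}
```

The parent's read blocks until the child has written something or closed its end, which is the blocking behavior the last advantage refers to.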

File system

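A file system contains files and directories. A file is an uninterpreted byte array; a directory contains files and other directories. The file system is a tree that starts at a special directory called root. A path looks like /a/b/c, meaning the file c inside directory b inside directory a in the root directory.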

Chapter 1

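Operating system organization

The key requirement for an operating system is to let several programs run at once while also letting them communicate and share resources; if one program fails, it must not affect the programs that are working correctly. An operating system must therefore satisfy three requirements: multiplexing, isolation, and interaction.

Abstracting physical resources

System calls could be implemented as a library that applications link against, which can give better performance (some embedded systems do this). But to meet the three requirements above and keep the system stable, it is safer to abstract sensitive resources and let the operating system manage them centrally. Unix's long history suggests that this approach works well.

User mode, kernel mode, and system calls

If an application fails, the operating system must not go down with it; the operating system should also be able to clean up the failed application and keep the others running. This requires effective isolation between applications and the operating system.

The CPU provides hardware support for this isolation. An x86 processor, for example, can execute instructions in two modes, user mode and kernel mode. Privileged instructions (such as those that read and write the disk or other I/O devices) can be executed only in kernel mode. An ordinary program that wants to read or write the disk must first execute a special instruction that switches to kernel mode at an entry point specified by the kernel. Once in kernel mode, the kernel validates the system call's arguments and decides whether to carry out the request.

The entry into kernel mode and the validation of arguments are critical: if an application could enter kernel mode freely, the system would be open to attack by malicious programs.

Process overview

The process is the unit of isolation in Unix. It gives each program its own private address space that other processes cannot touch, and it uses a page table to map virtual addresses to physical addresses.

As the figure shows, an address space contains user memory starting at virtual address 0: first the program's instructions and global variables, then the stack, and then the heap (used by malloc).

xv6 uses struct proc to keep track of a process's state, including its page table, its kernel stack, and its run state.

Each process has a thread that executes its instructions. To switch processes, the kernel suspends the running thread, saves its state on the thread's stack, and resumes another one.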

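The first address space

When the computer powers on, it initializes itself, loads the boot loader from disk into memory, and runs it. The boot loader then loads the xv6 kernel from disk at physical address 0x100000 (because 0xa0000 through 0x100000 is used by I/O devices) and starts executing at entry.

The entry page table is defined in main.c. Its entry 0 maps virtual addresses 0:0x400000 to physical addresses 0:0x400000; this mapping is needed as long as entry is executing at low addresses, but it is eventually removed.

Entry 512 maps virtual addresses KERNBASE:KERNBASE+0x400000 to physical addresses 0:0x400000. The kernel uses this entry after entry has finished; the mapping restricts the kernel's instructions and data to 4 MB.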
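The number 512 follows from simple arithmetic: each entry of the page directory covers 4 MB, so the entry index is the virtual address shifted right by 22 bits. The sketch below uses the KERNBASE and PDXSHIFT values from xv6's headers, but the lowercase helper functions pdx, v2p, and p2v are stand-ins written for this note, not the xv6 macros themselves.

```c
#include <stdint.h>

#define KERNBASE 0x80000000u   /* first kernel virtual address in xv6 */
#define PDXSHIFT 22            /* each page directory entry covers 4 MB */

/* Index of the page directory entry that covers virtual address va. */
static inline uint32_t pdx(uint32_t va) { return (va >> PDXSHIFT) & 0x3FF; }

/* Kernel virtual <-> physical conversion under the KERNBASE mapping. */
static inline uint32_t v2p(uint32_t va) { return va - KERNBASE; }
static inline uint32_t p2v(uint32_t pa) { return pa + KERNBASE; }
```

With these definitions pdx(KERNBASE) is 512, and the kernel's load address 0x100000 appears at virtual address p2v(0x100000) = 0x80100000.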

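Creating the first process

After main has initialized the devices and subsystems, it calls userinit to create the first process; userinit calls allocproc. allocproc allocates a slot (a struct proc) in the process table and initializes the process's state. It scans the proc table for a slot marked UNUSED, sets the slot's state to EMBRYO to mark it as in use, and gives the process a unique pid.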
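The slot search in allocproc can be modeled in user space as follows. This is a simplified sketch, not the xv6 code: it omits the process table lock and the kernel stack setup, and the names proc_model, ptable_model, and allocproc_model are invented for the illustration.

```c
enum procstate { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

struct proc_model {          /* simplified stand-in for struct proc */
    enum procstate state;
    int pid;
};

#define NPROC 64
static struct proc_model ptable_model[NPROC];   /* zeroed, so all UNUSED */
static int nextpid = 1;

/* Find an UNUSED slot, mark it EMBRYO, and give it a fresh pid.
 * Returns 0 when the table is full. */
struct proc_model *allocproc_model(void)
{
    for (int i = 0; i < NPROC; i++) {
        struct proc_model *p = &ptable_model[i];
        if (p->state == UNUSED) {
            p->state = EMBRYO;
            p->pid = nextpid++;
            return p;
        }
    }
    return 0;
}
```

The first call returns slot 0 with pid 1; a second call skips that slot because its state is no longer UNUSED.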

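After main calls userinit, mpmain calls scheduler to start running processes. scheduler looks for a process whose state is RUNNABLE, and at this point there is only one: initproc. It sets proc to that process, and switchuvm tells the hardware to start using the target process's page table.

Item 3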

Chapter 3

Traps, interrupts, drivers

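While a program runs, the processor repeats a loop: fetch an instruction, advance the program counter, execute the instruction, fetch again. Sometimes, though, control must enter the kernel instead of moving on to the next instruction. This happens in the following cases:

interrupt: a device raises a signal, for example
- the clock chip may generate an interrupt every 100 ms to support time sharing
- the disk raises an interrupt after it has read a block, telling the operating system that the block is ready
exception: an illegal operation, for example
- accessing a memory address that has no page table entry
- dividing by zero
system call: a program asks the kernel for a service

When an interrupt occurs, the operating system breaks out of the loop, saves the process's registers, and runs the instructions of the interrupt handler.

x86 protection

The x86 has four protection levels, from 0 (most privileged) to 3 (least privileged); most operating systems use only 0 and 3, for kernel mode and user mode respectively. The privilege level of the currently executing code is stored in the CPL field of the %cs register.

The interrupt descriptor table (IDT) defines the interrupt handlers. It has 256 entries, each giving the %cs and %eip to use when handling the corresponding interrupt.

To make a system call, a program executes the instruction int n, where n is an index into the IDT. The instruction performs the following steps: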

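Fetch the n'th descriptor from the IDT
Check that the CPL in %cs is ≤ DPL (DPL is the privilege level in the descriptor)
If the target segment's PL < CPL, save %esp and %ss in CPU-internal registers
Load %ss and %esp from the task segment descriptor
push %ss
push %esp
push %eflags
push %cs
push %eip
If this is an interrupt, clear the IF bit in %eflags
Set %cs and %eip to the values in the descriptor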

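The figure above shows the stack after an int instruction completes. The operating system can use the iret instruction to return from an int instruction; it pops the values saved by int off the stack and restores %eip so the original program resumes.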
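The privilege check and the push order in the steps listed for int n can be modeled in plain C. This is only a toy model under stated assumptions: the target code segment is assumed to run at level 0, the task segment lookup is left out, and regs_model and int_model are hypothetical names.

```c
#include <stdint.h>

struct regs_model { uint32_t ss, esp, eflags, cs, eip; };

#define FL_IF 0x200u  /* interrupt-enable bit in %eflags */

/* Model of `int n` through a gate with privilege level dpl whose target
 * code segment runs at level 0. Writes the pushed words to frame[] in
 * push order and returns how many were pushed, or -1 if CPL > DPL. */
int int_model(struct regs_model *r, uint32_t dpl, int is_interrupt_gate,
              uint32_t frame[5])
{
    uint32_t cpl = r->cs & 3;      /* CPL is the low 2 bits of %cs */
    int n = 0;
    if (cpl > dpl)
        return -1;                 /* real hardware raises a fault here */
    if (cpl > 0) {                 /* privilege change: old stack is saved */
        frame[n++] = r->ss;
        frame[n++] = r->esp;
    }
    frame[n++] = r->eflags;
    frame[n++] = r->cs;
    frame[n++] = r->eip;
    if (is_interrupt_gate)
        r->eflags &= ~FL_IF;       /* mask further interrupts */
    return n;
}
```

A trap from user mode pushes five words, while a trap taken in kernel mode pushes only %eflags, %cs, and %eip, which is why the last fields of a trap frame are valid only when crossing rings.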

Code: Assembly trap handlers

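The x86 allows 256 different interrupts. Interrupts 0-31 are defined for software exceptions, xv6 maps the hardware interrupts to 32-63, and 64 is the system call interrupt.

tvinit (3367) sets up the 256 entries of the IDT; interrupt i is handled by the code at the address in vectors[i].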

3366 void
3367 tvinit(void)
3368 {
3369   int i;
3370  
3371   for(i = 0; i < 256; i++)
3372     SETGATE(idt[i], 0, SEG_KCODE<<3, vectors[i], 0);
3373   SETGATE(idt[T_SYSCALL], 1, SEG_KCODE<<3, vectors[T_SYSCALL], DPL_USER);
3374
3375   initlock(&tickslock, "time");
3376 }

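When the privilege level changes from user mode to kernel mode, the kernel must not run on the user stack, so the hardware switches stacks using values from the task segment descriptor. switchuvm (1860) stores the address of the top of the user process's kernel stack into that descriptor.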

1860 switchuvm(struct proc *p)
1861 {
1862   if(p == 0)
1863     panic("switchuvm: no process");
1864   if(p->kstack == 0)
1865     panic("switchuvm: no kstack");
1866   if(p->pgdir == 0)
1867     panic("switchuvm: no pgdir");
1868
1869   pushcli();
1870   mycpu()->gdt[SEG_TSS] = SEG16(STS_T32A, &mycpu()->ts,
1871                                 sizeof(mycpu()->ts)-1, 0);
1872   mycpu()->gdt[SEG_TSS].s = 0;
1873   mycpu()->ts.ss0 = SEG_KDATA << 3;
1874   mycpu()->ts.esp0 = (uint)p->kstack + KSTACKSIZE;
1875   // setting IOPL=0 in eflags *and* iomb beyond the tss segment limit
1876   // forbids I/O instructions (e.g., inb and outb) from user space
1877   mycpu()->ts.iomb = (ushort) 0xFFFF;
1878   ltr(SEG_TSS << 3);
1879   lcr3(V2P(p->pgdir)); // switch to process's address space
1881 }

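When an interrupt occurs while the processor is in user mode, the processor loads %esp and %ss from the task segment descriptor and pushes the old %ss and %esp onto the new stack; if it is already in kernel mode, none of this happens.
The processor then pushes %eflags, %cs, and %eip; for some traps it also pushes an error word. Finally it loads %eip and %cs from the corresponding IDT entry.

xv6 uses a Perl script (3250) to generate the entry points that the IDT entries point to; each entry point pushes an error code if the processor did not, pushes the interrupt number, and then jumps to alltraps.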

3250 #!/usr/bin/perl -w
3251
3252 # Generate vectors.S, the trap/interrupt entry points.
3253 # There has to be one entry point per interrupt number
3254 # since otherwise there's no way for trap() to discover
3255 # the interrupt number.
3256
3257 print "# generated by vectors.pl - do not edit\n";
3258 print "# handlers\n";
3259 print ".globl alltraps\n";
3260 for(my $i = 0; $i < 256; $i++){
3261   print ".globl vector$i\n";
3262   print "vector$i:\n";
3263   if(!($i == 8 || ($i >= 10 && $i <= 14) || $i == 17)){
3264     print " pushl \$0\n";
3265   }
3266   print " pushl \$$i\n";
3267   print " jmp alltraps\n";
3268 }
3269
3270 print "\n# vector table\n";
3271 print ".data\n";
3272 print ".globl vectors\n";
3273 print "vectors:\n";
3274 for(my $i = 0; $i < 256; $i++){
3275   print " .long vector$i\n";
3276 }

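alltraps (3304) pushes %ds, %es, %fs, %gs, and the general-purpose registers. At that point the kernel stack holds a complete struct trapframe (0602), which contains everything needed to return from the kernel to the interrupted program.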

3304 alltraps:
3305 # Build trap frame.
3306 pushl %ds
3307 pushl %es
3308 pushl %fs
3309 pushl %gs
3310 pushal
3311
3312 # Set up data segments.
3313 movw $(SEG_KDATA<<3), %ax
3314 movw %ax, %ds
3315 movw %ax, %es
3316
3317 # Call trap(tf), where tf=%esp
3318 pushl %esp
3319 call trap
3320 addl $4, %esp
3321
3322 # Return falls through to trapret...
3323 .globl trapret
3324 trapret:
3325 popal
3326 popl %gs
3327 popl %fs
3328 popl %es
3329 popl %ds
3330 addl $0x8, %esp # trapno and errcode
3331 iret

The trapframe data structure:

0602 struct trapframe {
0603 // registers as pushed by pusha
0604   uint edi;
0605   uint esi;
0606   uint ebp;
0607   uint oesp; // useless & ignored
0608   uint ebx;
0609   uint edx;
0610   uint ecx;
0611   uint eax;
0612  
0613   // rest of trap frame
0614   ushort gs;
0615   ushort padding1;
0616   ushort fs;
0617   ushort padding2;
0618   ushort es;
0619   ushort padding3;
0620   ushort ds;
0621   ushort padding4;
0622   uint trapno;
0623  
0624   // below here defined by x86 hardware
0625   uint err;
0626   uint eip;
0627   ushort cs;
0628   ushort padding5;
0629   uint eflags;
0630  
0631   // below here only when crossing rings, such as from user to kernel
0632   uint esp;
0633   ushort ss;
0634   ushort padding6;
0635 };
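Because the stack grows downward, the struct lists the fields in the reverse of the order in which they were pushed: the registers saved last by pushal come first. The sketch below restates the layout with fixed-width types so the offsets can be checked on any host; trapframe_model and the two helper functions are names made up for this note.

```c
#include <stddef.h>
#include <stdint.h>

struct trapframe_model {
    uint32_t edi, esi, ebp, oesp, ebx, edx, ecx, eax;  /* pushal */
    uint16_t gs, padding1, fs, padding2;               /* pushl %gs, %fs */
    uint16_t es, padding3, ds, padding4;               /* pushl %es, %ds */
    uint32_t trapno;                                   /* pushed by vectorN */
    uint32_t err, eip;                                 /* error word, saved %eip */
    uint16_t cs, padding5;
    uint32_t eflags;
    uint32_t esp;                                      /* only on ring change */
    uint16_t ss, padding6;
};

/* Byte offset of the trap number from the start of the frame. */
size_t trapno_offset(void) { return offsetof(struct trapframe_model, trapno); }

/* Total size of a frame that includes the ring-change fields. */
size_t trapframe_size(void) { return sizeof(struct trapframe_model); }
```

The eight general registers take 32 bytes and the four segment registers 16 more, so trapno sits at offset 48 and the full frame is 76 bytes.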

Once this setup is done, alltraps calls the C trap handler trap (3401), which takes the appropriate action for each case.

3401 trap(struct trapframe *tf)
3402 {
3403   if(tf->trapno == T_SYSCALL){
3404     if(myproc()->killed)
3405       exit();
3406     myproc()->tf = tf;
3407     syscall();
3408     if(myproc()->killed)
3409       exit();
3410     return;
3411   }
3412
3413   switch(tf−>trapno){
3414   case T_IRQ0 + IRQ_TIMER:
3415     if(cpuid() == 0){
3416       acquire(&tickslock);
3417       ticks++;
3418       wakeup(&ticks);
3419       release(&tickslock);
3420     }
3421     lapiceoi();
3422     break;
3423   case T_IRQ0 + IRQ_IDE:
3424     ideintr();
3425     lapiceoi();
3426     break;
3427   case T_IRQ0 + IRQ_IDE+1:
3428     // Bochs generates spurious IDE1 interrupts.
3429     break;
3430   case T_IRQ0 + IRQ_KBD:
3431     kbdintr();
3432     lapiceoi();
3433     break;
3434   case T_IRQ0 + IRQ_COM1:
3435     uartintr();
3436     lapiceoi();
3437     break;
3438   case T_IRQ0 + 7:
3439   case T_IRQ0 + IRQ_SPURIOUS:
3440     cprintf("cpu%d: spurious interrupt at %x:%x\n",
3441             cpuid(), tf->cs, tf->eip);
3442     lapiceoi();
3443     break;
3444 
...
3450   default:
3451     if(myproc() == 0 || (tf->cs&3) == 0){
3452       // In kernel, it must be our mistake.
3453       cprintf("unexpected trap %d from cpu %d eip %x (cr2=0x%x)\n",
3454               tf->trapno, cpuid(), tf->eip, rcr2());
3455       panic("trap");
3456     }
3457     // In user space, assume process misbehaved.
3458     cprintf("pid %d %s: trap %d err %d on cpu %d "
3459             "eip 0x%x addr 0x%x--kill proc\n",
3460             myproc()->pid, myproc()->name, tf->trapno,
3461             tf->err, cpuid(), tf->eip, rcr2());
3462     myproc()->killed = 1;
3463   }
3464
3465   // Force process exit if it has been killed and is in user space.
3466   // (If it is still executing in the kernel, let it keep running
3467   // until it gets to the regular system call return.)
3468   if(myproc() && myproc()->killed && (tf->cs&3) == DPL_USER)
3469     exit();
3470
3471   // Force process to give up CPU on clock tick.
3472   // If interrupts were on while locks held, would need to check nlock.
3473   if(myproc() && myproc()->state == RUNNING &&
3474      tf->trapno == T_IRQ0+IRQ_TIMER)
3475     yield();
3476
3477   // Check if the process has been killed since we yielded
3478   if(myproc() && myproc()->killed && (tf->cs&3) == DPL_USER)
3479     exit();
3480 }

Code: system call

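For a system call, trap calls syscall (3701).
syscall loads the system call number from the saved %eax in the trap frame and stores the handler's return value back into %eax; if the number is invalid, it prints an error and returns -1.

The helper functions argint, argptr, argstr, and argfd retrieve the n'th system call argument as an integer, a pointer, a string, or a file descriptor respectively.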

3700 void
3701 syscall(void)
3702 {
3703   int num;
3704   struct proc *curproc = myproc();
3705
3706   num = curproc->tf->eax;
3707   if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {
3708     curproc->tf->eax = syscalls[num]();
3709   } else {
3710     cprintf("%d %s: unknown sys call %d\n",
3711             curproc->pid, curproc->name, num);
3712     curproc->tf->eax = -1;
3713   }
3714 }
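The dispatch in syscall is a bounds-checked lookup in a table of function pointers. A user-space sketch of the same idea follows; the handlers sys_one and sys_two and the names syscalls_model and dispatch_model are hypothetical, and the real kernel writes the result into the trap frame instead of returning it.

```c
#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

static int sys_one(void) { return 1; }   /* hypothetical handlers */
static int sys_two(void) { return 2; }

/* Slot 0 is left empty, mirroring the num > 0 check in syscall(). */
static int (*syscalls_model[])(void) = { 0, sys_one, sys_two };

/* Returns the handler's result, or -1 for an unknown number. */
int dispatch_model(int num)
{
    if (num > 0 && (unsigned)num < NELEM(syscalls_model) && syscalls_model[num])
        return syscalls_model[num]();
    return -1;
}
```

Numbers outside the table, and 0, fall through to the -1 branch, just as an unknown system call number does in xv6.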

Code: Interrupts

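Devices on the motherboard can generate interrupts, and the operating system must set up the hardware to handle them. Interrupts are similar to system calls, except that devices can generate them at any time.

Early boards had a simple programmable interrupt controller (PIC) to deliver interrupts. With multiprocessor boards, interrupt delivery is split into two parts: one in the I/O system (the IO APIC, ioapic.c) and one attached to each processor (the local APIC, lapic.c).

The IO APIC has a table whose entries the processor can program through memory-mapped I/O. Different devices map to different interrupts, and each entry also says which processor should handle the interrupt. For example, xv6 routes keyboard interrupts to processor 0 (8274) and disk interrupts to the highest-numbered processor.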

8273 void
8274 consoleinit(void)
8275 {
8276   initlock(&cons.lock, "console");
8277
8278   devsw[CONSOLE].write = consolewrite;
8279   devsw[CONSOLE].read = consoleread;
8280   cons.locking = 1;
8281  
8282   ioapicenable(IRQ_KBD, 0);
8283 }

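The timer chip is inside the LAPIC, so each processor can receive timer interrupts independently. xv6 sets it up in lapicinit (7408); the key line is the one that programs the timer (7421), which tells the LAPIC to periodically generate an interrupt at IRQ_TIMER, which is IRQ 0.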

7407 void
7408 lapicinit(void)
7409 {
7410   if(!lapic)
7411     return;
7412  
7413   // Enable local APIC; set spurious interrupt vector.
7414   lapicw(SVR, ENABLE | (T_IRQ0 + IRQ_SPURIOUS));
7415  
7416   // The timer repeatedly counts down at bus frequency
7417   // from lapic[TICR] and then issues an interrupt.
7418   // If xv6 cared more about precise timekeeping,
7419   // TICR would be calibrated using an external time source.
7420   lapicw(TDCR, X1);
7421   lapicw(TIMER, PERIODIC | (T_IRQ0 + IRQ_TIMER));
7422   lapicw(TICR, 10000000);
7423  
7424   // Disable logical interrupt lines.
7425   lapicw(LINT0, MASKED);
7426   lapicw(LINT1, MASKED);
7427  
7428   // Disable performance counter overflow interrupts
7429   // on machines that provide that interrupt entry.
7430   if(((lapic[VER]>>16) & 0xFF) >= 4)
7431     lapicw(PCINT, MASKED);
7432  
7433   // Map error interrupt to IRQ_ERROR.
7434   lapicw(ERROR, T_IRQ0 + IRQ_ERROR);
7435  
7436   // Clear error status register (requires back-to-back writes).
7437   lapicw(ESR, 0);
7438   lapicw(ESR, 0);
7439
7440   // Ack any outstanding interrupts.
7441   lapicw(EOI, 0);
7442
7443   // Send an Init Level De-Assert to synchronise arbitration ID's.
7444   lapicw(ICRHI, 0);
7445   lapicw(ICRLO, BCAST | INIT | LEVEL);
7446   while(lapic[ICRLO] & DELIVS)
7447     ;
7448
7449
7450   // Enable interrupts on the APIC (but not on the processor).
7451   lapicw(TPR, 0);
7452 }

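A processor can control whether it receives interrupts through the IF flag in %eflags: the cli instruction disables interrupts by clearing IF, and sti enables them again.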
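IF is bit 9 of %eflags, so the effect of cli and sti can be expressed as bit operations. The sketch below only manipulates a saved copy of the register value; the three helper names are invented for illustration, and real code would use the cli and sti instructions themselves.

```c
#include <stdint.h>

#define FL_IF 0x00000200u   /* interrupt-enable flag, bit 9 of %eflags */

/* 1 if the given %eflags value has interrupts enabled. */
int interrupts_enabled(uint32_t eflags) { return (eflags & FL_IF) != 0; }

/* The %eflags value that cli / sti would leave behind. */
uint32_t eflags_after_cli(uint32_t eflags) { return eflags & ~FL_IF; }
uint32_t eflags_after_sti(uint32_t eflags) { return eflags | FL_IF; }
```

The eflags value 0x202 reported in Item 1 therefore has IF set; clearing it leaves 0x2.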

Driver

A driver is the code in an operating system that manages a particular device.

Code: Disk driver

xv6 represents a disk block with struct buf (3850):

3850 struct buf {
3851   int flags;
3852   uint dev;
3853   uint blockno;
3854   struct sleeplock lock;
3855   uint refcnt;
3856   struct buf *prev; // LRU cache list
3857   struct buf *next;
3858   struct buf *qnext; // disk queue
3859   uchar data[BSIZE];
3860 };

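At boot, main calls ideinit (4251) to initialize the disk driver. ideinit calls ioapicenable to enable the IRQ_IDE interrupt (4256).

Next ideinit probes the disk hardware. It calls idewait (4257) to wait until the disk can accept commands. The motherboard presents the disk's status bits on I/O port 0x1f7; idewait polls until the busy bit (IDE_BSY) is clear and the ready bit (IDE_DRDY) is set.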

4251 ideinit(void)
4252 {
4253   int i;
4254  
4255   initlock(&idelock, "ide");
4256   ioapicenable(IRQ_IDE, ncpu - 1);
4257   idewait(0);
4258  
4259   // Check if disk 1 is present
4260   outb(0x1f6, 0xe0 | (1<<4));
4261   for(i=0; i<1000; i++){
4262     if(inb(0x1f7) != 0){
4263       havedisk1 = 1;
4264       break;
4265     }
4266   }
4267  
4268   // Switch back to disk 0.
4269   outb(0x1f6, 0xe0 | (0<<4));
4270 }

At this point the disk controller is ready.
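The condition idewait polls for can be written as a single mask test on the status byte. The sketch below uses the bit values xv6 assigns to IDE_BSY and IDE_DRDY; the helper ide_status_ready is hypothetical and takes the status as an argument instead of reading port 0x1f7, so it can run in user space.

```c
#include <stdint.h>

#define IDE_BSY  0x80   /* disk is busy */
#define IDE_DRDY 0x40   /* disk is ready to accept commands */

/* Hypothetical helper: 1 if status means "not busy and ready". */
int ide_status_ready(uint8_t status)
{
    return (status & (IDE_BSY | IDE_DRDY)) == IDE_DRDY;
}
```

A status of 0x40 passes, while 0xC0 (busy and ready both set) and 0x00 (neither set) do not, so the driver keeps polling in those cases.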

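Disk accesses are very slow from the processor's point of view. To make good use of the CPU, the kernel runs other processes while a request is pending and takes an interrupt when the disk operation completes.

iderw (4354) maintains a queue of pending requests. It appends buffer b to the end of the queue (4367~4371); if b is at the head of the queue, it calls idestart to send the request to the disk.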

4350 // Sync buf with disk.
4351 // If B_DIRTY is set, write buf to disk, clear B_DIRTY, set B_VALID.
4352 // Else if B_VALID is not set, read buf from disk, set B_VALID.
4353 void
4354 iderw(struct buf *b)
4355 {
4356   struct buf **pp;
4357  
4358   if(!holdingsleep(&b->lock))
4359     panic("iderw: buf not locked");
4360   if((b->flags & (B_VALID|B_DIRTY)) == B_VALID)
4361     panic("iderw: nothing to do");
4362   if(b->dev != 0 && !havedisk1)
4363     panic("iderw: ide disk 1 not present");
4364
4365   acquire(&idelock);
4366
4367   // Append b to idequeue.
4368   b->qnext = 0;
4369   for(pp=&idequeue; *pp; pp=&(*pp)->qnext)
4370     ;
4371   *pp = b;
4372  
4373   // Start disk if necessary.
4374   if(idequeue == b)
4375     idestart(b);
4376  
4377   // Wait for request to finish.
4378   while((b->flags & (B_VALID|B_DIRTY)) != B_VALID){
4379     sleep(b, &idelock);
4380   }
4381  
4382  
4383   release(&idelock);
4384 }

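idestart issues either a read or a write depending on the buffer's flags (a write if B_DIRTY is set, a read if B_VALID is not set). When the operation completes, the disk raises an interrupt, and trap calls ideintr (4304) to handle it. ideintr looks at the first buffer in the queue to find out which operation was in progress. If it was a read, ideintr calls insl to copy the data from the disk into the buffer, and the buffer is then ready. It then sets B_VALID, clears B_DIRTY, and wakes up any process sleeping on the buffer; finally it passes the next waiting buffer to the disk.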

4304 ideintr(void)
4305 {
4306   struct buf *b;
4307  
4308   // First queued buffer is the active request.
4309   acquire(&idelock);
4310  
4311   if((b = idequeue) == 0){
4312     release(&idelock);
4313     return;
4314   }
4315   idequeue = b->qnext;
4316
4317   // Read data if needed.
4318   if(!(b->flags & B_DIRTY) && idewait(1) >= 0)
4319     insl(0x1f0, b->data, BSIZE/4);
4320
4321   // Wake process waiting for this buf.
4322   b->flags |= B_VALID;
4323   b->flags &= ~B_DIRTY;
4324   wakeup(b);
4325  
4326   // Start disk on next buf in queue.
4327   if(idequeue != 0)
4328     idestart(idequeue);
4329  
4330   release(&idelock);
4331 }