106 OS homework 2-2 成果報告書

組長：B10431031 林言達
組員：
組員：
github: https://github.com/segnoda/xv6-public

🙂 使用情境說明(包含流程圖)

When we wants to utilize all the threads of CPU in some programs such as image processing, rendering of animation or web server, we can use multi-threaded program to do so.

To demonstrate how to use multi-threaded program, we are going to write a multi-threaded counter by the following cases.

First Case

First, we create a global variable g_counter as a counter for all threads to use. g_counter should be first initialized as 0.

Then, we can start to create threads to increase the count of g_counter. For each thread, we need 4KB of memory as a stack and argument for the function. We use (void *)malloc(4096) to allocate from heap and (int *)malloc(4) for 4 byte single argument.

We also need a function to perform computing, which is void counter(void *args). In counter(), we use a for loop to increase g_counter for 5000 times. We decided to utilize 8 threads to execute counter(). Therefore, at the end of the program, we should have total number of 8 * 5000 = 40000 in g_counter. We can define a variable final_target to check the result of g_counter.

Finally, we can use clone() to create and start all the threads as well as join() to wait for all the threads.

The structure of the multi-threaded counter should be look like this:

#define NUM_THREADS 8
#define TARGET_COUNT_PER_THREAD 5000

int g_counter;

void counter(void *arg)
{
  for(int i = 0; i < TARGET_COUNT_PER_THREAD; i++){
    g_counter = g_counter + 1;
  }
  exit();
}

int main(int argc, char **argv)
{
  int final_target = NUM_THREADS * TARGET_COUNT_PER_THREAD;
  g_counter = 0;

  void *stacks[NUM_THREADS];
  int *args[NUM_THREADS];
  for(int i = 0; i < NUM_THREADS; i++){
    stacks[i] = (void*) malloc(4096);
    args[i] = (int*) malloc(4);
    *args[i] = i;
  }

  for(int i = 0; i < NUM_THREADS; i++){
    clone(counter, args[i], stacks[i]);
  }

  for(int i = 0; i < NUM_THREADS; i++){
    void *joinstack;
    join(&joinstack);
  } 

  printf(1, "g_counter is %d\n", g_counter);
  printf(1, "final_target is %d\n", final_target);
  if(g_counter == final_target)
    printf(1, "TEST PASSED!\n");
  else
    printf(1, "TEST FAILED!\n");

  exit();
}

As we run this program, we should get:

g_counter is 40000
final_target is 40000
TEST PASSED!

Second Case

Always consider the worst case!

If we not just run g_counter = g_counter + 1 inside for loop. Instead, we have the following code:

void counter(void *arg)
{
  for(int i = 0; i < TARGET_COUNT_PER_THREAD; i++){
    int temp = g_counter;
    temp = temp + 1;
    sleep(0);
    g_counter = temp;
  }
  exit();
}

Which do the same thing but in a more complex way, we get the result:

g_counter is 9730
final_target is 40000
TEST FAILED!

This is the so-called race condition.

Race Condition

We found that the reason of not having race condition in the first case is that the compiler is smart enougth to simplify those codes into one line of assembly. The idea of simplifying is:

for(int i = 0; i < TARGET_COUNT_PER_THREAD; i++){
    g_counter = g_counter + 1;
}

$\downarrow$

for(int i = 0; i < 5000; i++){
    g_counter = g_counter + 1;
}

$\downarrow$

g_counter = g_counter + 5000;

$\downarrow$

addl   $0x1388,0xb00    ;0x1388 == 5000, &g_counter == 0xb00

The one-line assembly is atomic. Therefore, there is no race condition in first case.

However, compiler can not simplify the second case, which cause the race condition. In order to avoid race condition, we need to use semaphore to create a critical section.

Semaphore

To achieve all the functions of semaphore, we implemented sem_init(), sem_destroy(), sem_wait() and sem_signal() four system calls.

sem_init() initialize the id and size of semaphore.
sem_destroy() delete the target semaphore.
sem_wait() wait until a unit of the resource becomes available, and decrease semaphore value by given number.
sem_signal() increase the value of semaphore variable by given number.

Finally, we can use semaphore inside our multi-threaded counter.

Final Case

First, we use sem_init() to initialize the size of the semaphore. Because only one thread can access g_counter at one time, the size of sem_size should be 1. We also set the id of semaphore as SEMAPHORE_NUM.

Inside the for loop in counter(), we put the actually code between sem_wait() and sem_signal() to create critical section.

After finishing all the computing, we use sem_destroy() to delete the semaphore.

The structure of the final case should be look like this:

#define NUM_THREADS 8
#define TARGET_COUNT_PER_THREAD 5000
#define SEMAPHORE_NUM 0

int g_counter;

void counter(void *arg)
{
  for(int i = 0; i < TARGET_COUNT_PER_THREAD; i++){
    sem_wait(SEMAPHORE_NUM, 1);
    int temp = g_counter;
    temp = temp + 1;
    sleep(0);
    g_counter = temp;
    sem_signal(SEMAPHORE_NUM, 1);
  }
  exit();
}

int main(int argc, char **argv)
{
  int sem_size = 1;
  int final_target = NUM_THREADS * TARGET_COUNT_PER_THREAD;
  g_counter = 0;
  
  sem_init(SEMAPHORE_NUM, sem_size);

  void *stacks[NUM_THREADS];
  int *args[NUM_THREADS];
  for(int i = 0; i < NUM_THREADS; i++){
    stacks[i] = (void*) malloc(4096);
    args[i] = (int*) malloc(4);
    *args[i] = i;
  }

  for(int i = 0; i < NUM_THREADS; i++){
    clone(counter, args[i], stacks[i]);
  }

  for(int i = 0; i < NUM_THREADS; i++){
    void *joinstack;
    join(&joinstack);
  } 

  printf(1, "g_counter is %d\n", g_counter);
  printf(1, "final_target is %d\n", final_target);
  if(g_counter == final_target)
    printf(1, "TEST PASSED!\n");
  else
    printf(1, "TEST FAILED!\n");
    
  sem_destroy(SEMAPHORE_NUM);

  exit();
}

Flow

We can conclude the flow of multi-threaded counter with two simple flow charts.

The first one is main function, which do five things:

Initialize counter, semaphore, stack and arguments.
Create all threads.
Join all threads.
Verify correctness of counter.
Destroy semaphore

The second one is counter() function, which create critical section inside for loop.

😇 成功畫面

In the final case, we add some messages while initializing semaphore and threads.

🏃 實作過程(修改哪些檔案[含圖片])

During the development, we separated the development process into two parts. One is semaphore, another is thread. In order to add system calls, we need to modify several files.

Semaphore

In defs.h, we add system call prototype for semaphore.c.

@@ -9,6 +9,7 @@ struct spinlock;
 struct sleeplock;
 struct stat;
 struct superblock;
+struct semaphore;
 
 // bio.c
 void            binit(void);
@@ -186,5 +187,12 @@ void            switchkvm(void);
 int             copyout(pde_t*, uint, void*, uint);
 void            clearpteu(pde_t *pgdir, char *uva);
 
+//semaphore.c
+void            seminit(void);
+int             sem_init(int, int);
+int             sem_destroy(int);
+int             sem_wait(int, int);
+int             sem_signal(int, int);
+
 // number of elements in fixed-size array
 #define NELEM(x) (sizeof(x)/sizeof((x)[0]))

In semaphore.c, we implement all semaphore system calls, including sem_init(), sem_destroy(), sem_wait() and sem_signal().

@@ -0,0 +1,81 @@
+#include "types.h"
+#include "x86.h"
+#include "defs.h"
+#include "param.h"
+#include "spinlock.h"
+
+#define SEMNUM 32
+
+struct semaphore {
+  int value;
+  int active;
+  struct spinlock lock;
+} sem[SEMNUM];
+
+void
+seminit(void)
+{
+  for(int i = 0; i < SEMNUM; ++i) {
+    initlock(&sem[i].lock, "semaphore");
+    sem[i].active = sem[i].value = 0;
+  }
+}
+
+int
+sem_init(int num, int max)
+{
+  if(num < 0 || num >= SEMNUM) return -1;
+
+  acquire(&sem[num].lock);
+  if(sem[num].active != 0) return -1;
+  sem[num].value = max;
+  sem[num].active = 1;
+  release(&sem[num].lock);
+
+  return 0;
+}
+
+int
+sem_destroy(int num)
+{
+  if(num < 0 || num >= SEMNUM) return -1;
+
+  acquire(&sem[num].lock);
+  if(sem[num].active != 1) return -1;
+  sem[num].active = 0;
+  release(&sem[num].lock);
+
+  return 0;
+}
+
+int
+sem_wait(int num, int count)
+{
+  if(num < 0 || num >= SEMNUM) return -1;
+
+  acquire(&sem[num].lock);
+  if(!sem[num].active) return -1;
+  while(sem[num].value <= count - 1) {
+    sleep(&sem[num], &sem[num].lock);
+  }
+  sem[num].value -= count;
+  release(&sem[num].lock);
+
+  return 0;
+}
+
+int
+sem_signal(int num, int count)
+{
+  if(num < 0 || num >= SEMNUM) return -1;
+
+  acquire(&sem[num].lock);
+  if(!sem[num].active) return -1;
+  sem[num].value += count;
+  if(sem[num].value > 0){
+    wakeup(&sem[num]);
+  }
+  release(&sem[num].lock);
+
+  return 0;
+}

In syscall.h, we add some system call numbers.

@@ -20,3 +20,7 @@
 #define SYS_link   19
 #define SYS_mkdir  20
 #define SYS_close  21
+#define SYS_sem_init    22
+#define SYS_sem_destroy 23
+#define SYS_sem_wait    24
+#define SYS_sem_signal  25

In syscall.c, we add system call wrapper prototype for semaphore.

@@ -103,6 +103,10 @@ extern int sys_unlink(void);
 extern int sys_wait(void);
 extern int sys_write(void);
 extern int sys_uptime(void);
+extern int sys_sem_init(void);
+extern int sys_sem_destroy(void);
+extern int sys_sem_wait(void);
+extern int sys_sem_signal(void);
 
 static int (*syscalls[])(void) = {
 [SYS_fork]    sys_fork,
@@ -126,6 +130,10 @@ static int (*syscalls[])(void) = {
 [SYS_link]    sys_link,
 [SYS_mkdir]   sys_mkdir,
 [SYS_close]   sys_close,
+[SYS_sem_init]     sys_sem_init,
+[SYS_sem_destroy]  sys_sem_destroy,
+[SYS_sem_wait]     sys_sem_wait,
+[SYS_sem_signal]   sys_sem_signal
 };

In sysproc.c, we implement system call wrapper for semaphore.

@@ -89,3 +89,42 @@ sys_uptime(void)
   release(&tickslock);
   return xticks;
 }
+
+int
+sys_sem_init(void)
+{
+  int num, max;
+  if(argint(0, &num) < 0) return -1;
+  if(argint(1, &max) < 0) return -1;
+
+  return sem_init(num, max);
+}
+
+int
+sys_sem_destroy(void)
+{
+  int num;
+  if(argint(0, &num) < 0) return -1;
+
+  return sem_destroy(num);
+}
+
+int
+sys_sem_wait(void)
+{
+  int num, count;
+  if(argint(0, &num) < 0) return -1;
+  if(argint(1, &count) < 0) return -1;
+
+  return sem_wait(num, count);
+}
+
+int
+sys_sem_signal(void)
+{
+  int num, count;
+  if(argint(0, &num) < 0) return -1;
+  if(argint(1, &count) < 0) return -1;
+
+  return sem_signal(num, count);
+}

In user.h, we add system call prototype for user program.

@@ -23,6 +23,10 @@ int getpid(void);
 char* sbrk(int);
 int sleep(int);
 int uptime(void);
+int sem_init(int, int);
+int sem_destroy(int);
+int sem_wait(int, int);
+int sem_signal(int, int);

In usys.S, we add system calls to system call table.

@@ -29,3 +29,7 @@ SYSCALL(getpid)
 SYSCALL(sbrk)
 SYSCALL(sleep)
 SYSCALL(uptime)
+SYSCALL(sem_init)
+SYSCALL(sem_destroy)
+SYSCALL(sem_wait)
+SYSCALL(sem_signal)

In main.c, we add seminit() to initialize semaphore table.

@@ -27,6 +27,7 @@ main(void)
   consoleinit();   // console hardware
   uartinit();      // serial port
   pinit();         // process table
+  seminit();       // semaphore table

Thread

In defs.h, we add system call prototype for proc.c.

@@ -121,6 +121,8 @@ void            userinit(void);
 int             wait(void);
 void            wakeup(void*);
 void            yield(void);
+int             clone(void*, void*, void*);
+int             join(void**);

In proc.h, we add some variables in struct proc.

@@ -49,6 +49,8 @@ struct proc {
   struct file *ofile[NOFILE];  // Open files
   struct inode *cwd;           // Current directory
   char name[16];               // Process name (debugging)
+  int isthread;                // Is thread or not
+  int stack;                   // thread's stack address
 };

In proc.c, we implement all thread system calls, including clone() and join(). We also tweak the exit() system call.

@@ -256,6 +256,12 @@ exit(void)
   for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
     if(p->parent == curproc){
       p->parent = initproc;
+      // clean up the kernel stack of child threads
+      if(p->isthread == 1){
+        p->state = UNUSED;
+        kfree(p->kstack);
+        p->kstack = 0;
+      }
       if(p->state == ZOMBIE)
         wakeup1(initproc);
     }
@@ -532,3 +538,114 @@ procdump(void)
     cprintf("\n");
   }
 }
+
+int
+clone(void* function, void *arg, void *stack)
+{
+  int i, pid;
+  struct proc *np;
+  struct proc *curproc = myproc();
+
+  // Allocate process.
+  if((np = allocproc()) == 0){
+    return -1;
+  }
+
+  // Copy process state from proc.
+  //if((np->pgdir = copyuvm(curproc->pgdir, curproc->sz)) == 0){
+  //  kfree(np->kstack);
+  //  np->kstack = 0;
+  //  np->state = UNUSED;
+  //  return -1;
+  //}
+
+  np->sz = curproc->sz;
+  //np->parent = curproc;
+  // if the process calling clone is itself a thread,
+  // copy its parent to the new thread
+  if(curproc->isthread == 0){
+    np->parent = curproc;
+  }
+  else{
+    np->parent = curproc->parent;
+  }
+  *np->tf = *curproc->tf;
+
+  // Clear %eax so that fork returns 0 in the child.
+  np->tf->eax = 0;
+
+  // reallocate old process's page table to new process
+  np->pgdir = curproc->pgdir;
+
+  // modified the return ip to thread function
+  np->tf->eip = (int)function;
+
+  // modified the return ip to thread function
+  np->isthread = 1;
+
+  // modified the stack
+  np->stack = (int)stack;
+  np->tf->esp = (int)stack + 4092;
+  *((int *)(np->tf->esp)) = (int)arg; // push the argument
+  *((int *)(np->tf->esp - 4)) = 0xFFFFFFFF; // push the return address
+  np->tf->esp -= 4;
+
+  for(i = 0; i < NOFILE; i++)
+    if(curproc->ofile[i])
+      np->ofile[i] = filedup(curproc->ofile[i]);
+  np->cwd = idup(curproc->cwd);
+
+  safestrcpy(np->name, curproc->name, sizeof(curproc->name));
+
+  pid = np->pid;
+
+  acquire(&ptable.lock);
+
+  np->state = RUNNABLE;
+
+  release(&ptable.lock);
+
+  return pid;
+}
+
+int
+join(void **stack)
+{
+  struct proc *p;
+  int havekids, pid;
+  struct proc *curproc = myproc();
+  
+  acquire(&ptable.lock);
+  for(;;){
+    // Scan through table looking for exited children.
+    havekids = 0;
+    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
+      if(p->parent != curproc || p->isthread != 1)
+        continue;
+      havekids = 1;
+      if(p->state == ZOMBIE){
+        // Found one.
+        pid = p->pid;
+        kfree(p->kstack);
+        p->kstack = 0;
+        p->pid = 0;
+        p->parent = 0;
+        p->name[0] = 0;
+        p->killed = 0;
+        p->state = UNUSED;
+        release(&ptable.lock);
+        return pid;
+      }
+    }
+
+    // No point waiting if we don't have any children.
+    if(!havekids || curproc->killed){
+      release(&ptable.lock);
+      return -1;
+    }
+
+    // Wait for children to exit.  (See wakeup1 call in proc_exit.)
+    sleep(curproc, &ptable.lock);  //DOC: wait-sleep
+    *(int*)stack = curproc->stack;
+  }
+}

In syscall.h, we add some system call numbers.

@@ -24,3 +24,5 @@
 #define SYS_sem_destroy 23
 #define SYS_sem_wait    24
 #define SYS_sem_signal  25
+#define SYS_clone  26
+#define SYS_join   27

In syscall.c, we add system call wrapper prototype for thread.

@@ -107,6 +107,8 @@ extern int sys_sem_init(void);
 extern int sys_sem_destroy(void);
 extern int sys_sem_wait(void);
 extern int sys_sem_signal(void);
+extern int sys_clone(void);
+extern int sys_join(void);
 
 static int (*syscalls[])(void) = {
 [SYS_fork]    sys_fork,
@@ -133,7 +135,9 @@ static int (*syscalls[])(void) = {
 [SYS_sem_init]     sys_sem_init,
 [SYS_sem_destroy]  sys_sem_destroy,
 [SYS_sem_wait]     sys_sem_wait,
-[SYS_sem_signal]   sys_sem_signal
+[SYS_sem_signal]   sys_sem_signal,
+[SYS_clone]   sys_clone,
+[SYS_join]   sys_join
 };

In sysproc.c, we implement system call wrapper for thread.

@@ -128,3 +128,23 @@ sys_sem_signal(void)
 
   return sem_signal(num, count);
 }
+
+int
+sys_clone(void)
+{
+  int function, arg, stack;
+  if(argint(0, &function) < 0) return -1;
+  if(argint(1, &arg) < 0) return -1;
+  if(argint(2, &stack) < 0) return -1;
+
+  return clone((void*)function, (void*)arg, (void*)stack);
+}
+
+int
+sys_join(void)
+{
+  int stack;
+  if(argint(0, &stack) < 0) return -1;
+
+  return join((void**)stack);
+}

In user.h, we add system call prototype for user program.

@@ -27,6 +27,8 @@ int sem_init(int, int);
 int sem_destroy(int);
 int sem_wait(int, int);
 int sem_signal(int, int);
+int clone(void*, void*, void*);
+int join(void**);

In usys.S, we add system calls to system call table.

@@ -33,3 +33,5 @@ SYSCALL(sem_init)
 SYSCALL(sem_destroy)
 SYSCALL(sem_wait)
 SYSCALL(sem_signal)
+SYSCALL(clone)
+SYSCALL(join)

😎 結論

In this project, I think the implementation of the kernel thread is quite straight forward. All we need to do is modifying old system calls. However, dealing with race condition is not that easy. We have to think about when to acquire spinlock and when to release. We also need to well considered all the possible scenarios during the development. In sum, I learnt a lot from this project.

📅 組員分工

Due to lack of team members, I have to implement all the functions including user test programs.