組長:B10431031 林言達
組員:
組員:
github: https://github.com/segnoda/xv6-public
When we wants to utilize all the threads of CPU in some programs such as image processing, rendering of animation or web server, we can use multi-threaded program to do so.
To demonstrate how to use multi-threaded program, we are going to write a multi-threaded counter by the following cases.
First, we create a global variable g_counter as a counter for all threads to use. g_counter should be first initialized as 0.
Then, we can start to create threads to increase the count of g_counter. For each thread, we need 4KB of memory as a stack and argument for the function. We use (void *)malloc(4096) to allocate from heap and (int *)malloc(4) for 4 byte single argument.
We also need a function to perform computing, which is void counter(void *args). In counter(), we use a for loop to increase g_counter for 5000 times. We decided to utilize 8 threads to execute counter(). Therefore, at the end of the program, we should have total number of 8 * 5000 = 40000 in g_counter. We can define a variable final_target to check the result of g_counter.
Finally, we can use clone() to create and start all the threads as well as join() to wait for all the threads.
The structure of the multi-threaded counter should be look like this:
#define NUM_THREADS 8
#define TARGET_COUNT_PER_THREAD 5000
int g_counter;
void counter(void *arg)
{
for(int i = 0; i < TARGET_COUNT_PER_THREAD; i++){
g_counter = g_counter + 1;
}
exit();
}
int main(int argc, char **argv)
{
int final_target = NUM_THREADS * TARGET_COUNT_PER_THREAD;
g_counter = 0;
void *stacks[NUM_THREADS];
int *args[NUM_THREADS];
for(int i = 0; i < NUM_THREADS; i++){
stacks[i] = (void*) malloc(4096);
args[i] = (int*) malloc(4);
*args[i] = i;
}
for(int i = 0; i < NUM_THREADS; i++){
clone(counter, args[i], stacks[i]);
}
for(int i = 0; i < NUM_THREADS; i++){
void *joinstack;
join(&joinstack);
}
printf(1, "g_counter is %d\n", g_counter);
printf(1, "final_target is %d\n", final_target);
if(g_counter == final_target)
printf(1, "TEST PASSED!\n");
else
printf(1, "TEST FAILED!\n");
exit();
}
As we run this program, we should get:
g_counter is 40000
final_target is 40000
TEST PASSED!
Always consider the worst case!
If we not just run g_counter = g_counter + 1 inside for loop. Instead, we have the following code:
void counter(void *arg)
{
for(int i = 0; i < TARGET_COUNT_PER_THREAD; i++){
int temp = g_counter;
temp = temp + 1;
sleep(0);
g_counter = temp;
}
exit();
}
Which do the same thing but in a more complex way, we get the result:
g_counter is 9730
final_target is 40000
TEST FAILED!
This is the so-called race condition.
We found that the reason of not having race condition in the first case is that the compiler is smart enougth to simplify those codes into one line of assembly. The idea of simplifying is:
for(int i = 0; i < TARGET_COUNT_PER_THREAD; i++){
g_counter = g_counter + 1;
}
for(int i = 0; i < 5000; i++){
g_counter = g_counter + 1;
}
g_counter = g_counter + 5000;
addl $0x1388,0xb00 ;0x1388 == 5000, &g_counter == 0xb00
The one-line assembly is atomic. Therefore, there is no race condition in first case.
However, compiler can not simplify the second case, which cause the race condition. In order to avoid race condition, we need to use semaphore to create a critical section.
To achieve all the functions of semaphore, we implemented sem_init(), sem_destroy(), sem_wait() and sem_signal() four system calls.
sem_init() initialize the id and size of semaphore.sem_destroy() delete the target semaphore.sem_wait() wait until a unit of the resource becomes available, and decrease semaphore value by given number.sem_signal() increase the value of semaphore variable by given number.Finally, we can use semaphore inside our multi-threaded counter.
First, we use sem_init() to initialize the size of the semaphore. Because only one thread can access g_counter at one time, the size of sem_size should be 1. We also set the id of semaphore as SEMAPHORE_NUM.
Inside the for loop in counter(), we put the actually code between sem_wait() and sem_signal() to create critical section.
After finishing all the computing, we use sem_destroy() to delete the semaphore.
The structure of the final case should be look like this:
#define NUM_THREADS 8
#define TARGET_COUNT_PER_THREAD 5000
#define SEMAPHORE_NUM 0
int g_counter;
void counter(void *arg)
{
for(int i = 0; i < TARGET_COUNT_PER_THREAD; i++){
sem_wait(SEMAPHORE_NUM, 1);
int temp = g_counter;
temp = temp + 1;
sleep(0);
g_counter = temp;
sem_signal(SEMAPHORE_NUM, 1);
}
exit();
}
int main(int argc, char **argv)
{
int sem_size = 1;
int final_target = NUM_THREADS * TARGET_COUNT_PER_THREAD;
g_counter = 0;
sem_init(SEMAPHORE_NUM, sem_size);
void *stacks[NUM_THREADS];
int *args[NUM_THREADS];
for(int i = 0; i < NUM_THREADS; i++){
stacks[i] = (void*) malloc(4096);
args[i] = (int*) malloc(4);
*args[i] = i;
}
for(int i = 0; i < NUM_THREADS; i++){
clone(counter, args[i], stacks[i]);
}
for(int i = 0; i < NUM_THREADS; i++){
void *joinstack;
join(&joinstack);
}
printf(1, "g_counter is %d\n", g_counter);
printf(1, "final_target is %d\n", final_target);
if(g_counter == final_target)
printf(1, "TEST PASSED!\n");
else
printf(1, "TEST FAILED!\n");
sem_destroy(SEMAPHORE_NUM);
exit();
}
We can conclude the flow of multi-threaded counter with two simple flow charts.
The first one is main function, which do five things:

The second one is counter() function, which create critical section inside for loop.

In the final case, we add some messages while initializing semaphore and threads.

During the development, we separated the development process into two parts. One is semaphore, another is thread. In order to add system calls, we need to modify several files.
In defs.h, we add system call prototype for semaphore.c.
@@ -9,6 +9,7 @@ struct spinlock;
struct sleeplock;
struct stat;
struct superblock;
+struct semaphore;
// bio.c
void binit(void);
@@ -186,5 +187,12 @@ void switchkvm(void);
int copyout(pde_t*, uint, void*, uint);
void clearpteu(pde_t *pgdir, char *uva);
+//semaphore.c
+void seminit(void);
+int sem_init(int, int);
+int sem_destroy(int);
+int sem_wait(int, int);
+int sem_signal(int, int);
+
// number of elements in fixed-size array
#define NELEM(x) (sizeof(x)/sizeof((x)[0]))
In semaphore.c, we implement all semaphore system calls, including sem_init(), sem_destroy(), sem_wait() and sem_signal().
@@ -0,0 +1,81 @@
+#include "types.h"
+#include "x86.h"
+#include "defs.h"
+#include "param.h"
+#include "spinlock.h"
+
+#define SEMNUM 32
+
+struct semaphore {
+ int value;
+ int active;
+ struct spinlock lock;
+} sem[SEMNUM];
+
+void
+seminit(void)
+{
+ for(int i = 0; i < SEMNUM; ++i) {
+ initlock(&sem[i].lock, "semaphore");
+ sem[i].active = sem[i].value = 0;
+ }
+}
+
+int
+sem_init(int num, int max)
+{
+ if(num < 0 || num >= SEMNUM) return -1;
+
+ acquire(&sem[num].lock);
+ if(sem[num].active != 0) return -1;
+ sem[num].value = max;
+ sem[num].active = 1;
+ release(&sem[num].lock);
+
+ return 0;
+}
+
+int
+sem_destroy(int num)
+{
+ if(num < 0 || num >= SEMNUM) return -1;
+
+ acquire(&sem[num].lock);
+ if(sem[num].active != 1) return -1;
+ sem[num].active = 0;
+ release(&sem[num].lock);
+
+ return 0;
+}
+
+int
+sem_wait(int num, int count)
+{
+ if(num < 0 || num >= SEMNUM) return -1;
+
+ acquire(&sem[num].lock);
+ if(!sem[num].active) return -1;
+ while(sem[num].value <= count - 1) {
+ sleep(&sem[num], &sem[num].lock);
+ }
+ sem[num].value -= count;
+ release(&sem[num].lock);
+
+ return 0;
+}
+
+int
+sem_signal(int num, int count)
+{
+ if(num < 0 || num >= SEMNUM) return -1;
+
+ acquire(&sem[num].lock);
+ if(!sem[num].active) return -1;
+ sem[num].value += count;
+ if(sem[num].value > 0){
+ wakeup(&sem[num]);
+ }
+ release(&sem[num].lock);
+
+ return 0;
+}
In syscall.h, we add some system call numbers.
@@ -20,3 +20,7 @@
#define SYS_link 19
#define SYS_mkdir 20
#define SYS_close 21
+#define SYS_sem_init 22
+#define SYS_sem_destroy 23
+#define SYS_sem_wait 24
+#define SYS_sem_signal 25
In syscall.c, we add system call wrapper prototype for semaphore.
@@ -103,6 +103,10 @@ extern int sys_unlink(void);
extern int sys_wait(void);
extern int sys_write(void);
extern int sys_uptime(void);
+extern int sys_sem_init(void);
+extern int sys_sem_destroy(void);
+extern int sys_sem_wait(void);
+extern int sys_sem_signal(void);
static int (*syscalls[])(void) = {
[SYS_fork] sys_fork,
@@ -126,6 +130,10 @@ static int (*syscalls[])(void) = {
[SYS_link] sys_link,
[SYS_mkdir] sys_mkdir,
[SYS_close] sys_close,
+[SYS_sem_init] sys_sem_init,
+[SYS_sem_destroy] sys_sem_destroy,
+[SYS_sem_wait] sys_sem_wait,
+[SYS_sem_signal] sys_sem_signal
};
In sysproc.c, we implement system call wrapper for semaphore.
@@ -89,3 +89,42 @@ sys_uptime(void)
release(&tickslock);
return xticks;
}
+
+int
+sys_sem_init(void)
+{
+ int num, max;
+ if(argint(0, &num) < 0) return -1;
+ if(argint(1, &max) < 0) return -1;
+
+ return sem_init(num, max);
+}
+
+int
+sys_sem_destroy(void)
+{
+ int num;
+ if(argint(0, &num) < 0) return -1;
+
+ return sem_destroy(num);
+}
+
+int
+sys_sem_wait(void)
+{
+ int num, count;
+ if(argint(0, &num) < 0) return -1;
+ if(argint(1, &count) < 0) return -1;
+
+ return sem_wait(num, count);
+}
+
+int
+sys_sem_signal(void)
+{
+ int num, count;
+ if(argint(0, &num) < 0) return -1;
+ if(argint(1, &count) < 0) return -1;
+
+ return sem_signal(num, count);
+}
In user.h, we add system call prototype for user program.
@@ -23,6 +23,10 @@ int getpid(void);
char* sbrk(int);
int sleep(int);
int uptime(void);
+int sem_init(int, int);
+int sem_destroy(int);
+int sem_wait(int, int);
+int sem_signal(int, int);
In usys.S, we add system calls to system call table.
@@ -29,3 +29,7 @@ SYSCALL(getpid)
SYSCALL(sbrk)
SYSCALL(sleep)
SYSCALL(uptime)
+SYSCALL(sem_init)
+SYSCALL(sem_destroy)
+SYSCALL(sem_wait)
+SYSCALL(sem_signal)
In main.c, we add seminit() to initialize semaphore table.
@@ -27,6 +27,7 @@ main(void)
consoleinit(); // console hardware
uartinit(); // serial port
pinit(); // process table
+ seminit(); // semaphore table
In defs.h, we add system call prototype for proc.c.
@@ -121,6 +121,8 @@ void userinit(void);
int wait(void);
void wakeup(void*);
void yield(void);
+int clone(void*, void*, void*);
+int join(void**);
In proc.h, we add some variables in struct proc.
@@ -49,6 +49,8 @@ struct proc {
struct file *ofile[NOFILE]; // Open files
struct inode *cwd; // Current directory
char name[16]; // Process name (debugging)
+ int isthread; // Is thread or not
+ int stack; // thread's stack address
};
In proc.c, we implement all thread system calls, including clone() and join(). We also tweak the exit() system call.
@@ -256,6 +256,12 @@ exit(void)
for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
if(p->parent == curproc){
p->parent = initproc;
+ // clean up the kernel stack of child threads
+ if(p->isthread == 1){
+ p->state = UNUSED;
+ kfree(p->kstack);
+ p->kstack = 0;
+ }
if(p->state == ZOMBIE)
wakeup1(initproc);
}
@@ -532,3 +538,114 @@ procdump(void)
cprintf("\n");
}
}
+
+int
+clone(void* function, void *arg, void *stack)
+{
+ int i, pid;
+ struct proc *np;
+ struct proc *curproc = myproc();
+
+ // Allocate process.
+ if((np = allocproc()) == 0){
+ return -1;
+ }
+
+ // Copy process state from proc.
+ //if((np->pgdir = copyuvm(curproc->pgdir, curproc->sz)) == 0){
+ // kfree(np->kstack);
+ // np->kstack = 0;
+ // np->state = UNUSED;
+ // return -1;
+ //}
+
+ np->sz = curproc->sz;
+ //np->parent = curproc;
+ // if the process calling clone is itself a thread,
+ // copy its parent to the new thread
+ if(curproc->isthread == 0){
+ np->parent = curproc;
+ }
+ else{
+ np->parent = curproc->parent;
+ }
+ *np->tf = *curproc->tf;
+
+ // Clear %eax so that fork returns 0 in the child.
+ np->tf->eax = 0;
+
+ // reallocate old process's page table to new process
+ np->pgdir = curproc->pgdir;
+
+ // modified the return ip to thread function
+ np->tf->eip = (int)function;
+
+ // modified the return ip to thread function
+ np->isthread = 1;
+
+ // modified the stack
+ np->stack = (int)stack;
+ np->tf->esp = (int)stack + 4092;
+ *((int *)(np->tf->esp)) = (int)arg; // push the argument
+ *((int *)(np->tf->esp - 4)) = 0xFFFFFFFF; // push the return address
+ np->tf->esp -= 4;
+
+ for(i = 0; i < NOFILE; i++)
+ if(curproc->ofile[i])
+ np->ofile[i] = filedup(curproc->ofile[i]);
+ np->cwd = idup(curproc->cwd);
+
+ safestrcpy(np->name, curproc->name, sizeof(curproc->name));
+
+ pid = np->pid;
+
+ acquire(&ptable.lock);
+
+ np->state = RUNNABLE;
+
+ release(&ptable.lock);
+
+ return pid;
+}
+
+int
+join(void **stack)
+{
+ struct proc *p;
+ int havekids, pid;
+ struct proc *curproc = myproc();
+
+ acquire(&ptable.lock);
+ for(;;){
+ // Scan through table looking for exited children.
+ havekids = 0;
+ for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
+ if(p->parent != curproc || p->isthread != 1)
+ continue;
+ havekids = 1;
+ if(p->state == ZOMBIE){
+ // Found one.
+ pid = p->pid;
+ kfree(p->kstack);
+ p->kstack = 0;
+ p->pid = 0;
+ p->parent = 0;
+ p->name[0] = 0;
+ p->killed = 0;
+ p->state = UNUSED;
+ release(&ptable.lock);
+ return pid;
+ }
+ }
+
+ // No point waiting if we don't have any children.
+ if(!havekids || curproc->killed){
+ release(&ptable.lock);
+ return -1;
+ }
+
+ // Wait for children to exit. (See wakeup1 call in proc_exit.)
+ sleep(curproc, &ptable.lock); //DOC: wait-sleep
+ *(int*)stack = curproc->stack;
+ }
+}
In syscall.h, we add some system call numbers.
@@ -24,3 +24,5 @@
#define SYS_sem_destroy 23
#define SYS_sem_wait 24
#define SYS_sem_signal 25
+#define SYS_clone 26
+#define SYS_join 27
In syscall.c, we add system call wrapper prototype for thread.
@@ -107,6 +107,8 @@ extern int sys_sem_init(void);
extern int sys_sem_destroy(void);
extern int sys_sem_wait(void);
extern int sys_sem_signal(void);
+extern int sys_clone(void);
+extern int sys_join(void);
static int (*syscalls[])(void) = {
[SYS_fork] sys_fork,
@@ -133,7 +135,9 @@ static int (*syscalls[])(void) = {
[SYS_sem_init] sys_sem_init,
[SYS_sem_destroy] sys_sem_destroy,
[SYS_sem_wait] sys_sem_wait,
-[SYS_sem_signal] sys_sem_signal
+[SYS_sem_signal] sys_sem_signal,
+[SYS_clone] sys_clone,
+[SYS_join] sys_join
};
In sysproc.c, we implement system call wrapper for thread.
@@ -128,3 +128,23 @@ sys_sem_signal(void)
return sem_signal(num, count);
}
+
+int
+sys_clone(void)
+{
+ int function, arg, stack;
+ if(argint(0, &function) < 0) return -1;
+ if(argint(1, &arg) < 0) return -1;
+ if(argint(2, &stack) < 0) return -1;
+
+ return clone((void*)function, (void*)arg, (void*)stack);
+}
+
+int
+sys_join(void)
+{
+ int stack;
+ if(argint(0, &stack) < 0) return -1;
+
+ return join((void**)stack);
+}
In user.h, we add system call prototype for user program.
@@ -27,6 +27,8 @@ int sem_init(int, int);
int sem_destroy(int);
int sem_wait(int, int);
int sem_signal(int, int);
+int clone(void*, void*, void*);
+int join(void**);
In usys.S, we add system calls to system call table.
@@ -33,3 +33,5 @@ SYSCALL(sem_init)
SYSCALL(sem_destroy)
SYSCALL(sem_wait)
SYSCALL(sem_signal)
+SYSCALL(clone)
+SYSCALL(join)
In this project, I think the implementation of the kernel thread is quite straight forward. All we need to do is modifying old system calls. However, dealing with race condition is not that easy. We have to think about when to acquire spinlock and when to release. We also need to well considered all the possible scenarios during the development. In sum, I learnt a lot from this project.
Due to lack of team members, I have to implement all the functions including user test programs.