writing a unix shell from scratch in c

what a shell actually does

a shell is a program that reads commands from the user, parses them, and executes them. that's it. the mystique around shells disappears once you build one.

under the hood, a shell does four things in a loop:

  1. print a prompt and read a line
  2. parse the line into tokens
  3. fork a child process
  4. exec the command in the child while the parent waits for it to finish

the repl

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

#define MAX_ARGS 64
#define MAX_INPUT 1024

void run_command(char **args) {
    pid_t pid = fork();

    if (pid == 0) {
        // child process: execvp only returns if it failed
        execvp(args[0], args);
        perror("exec failed");
        _exit(127); // _exit avoids flushing stdio buffers copied from the parent
    } else if (pid > 0) {
        // parent waits for child
        int status;
        waitpid(pid, &status, 0);
    } else {
        perror("fork failed");
    }
}

int main(void) {
    char input[MAX_INPUT];
    char *args[MAX_ARGS];

    while (1) {
        printf("mysh> ");
        fflush(stdout);

        if (!fgets(input, MAX_INPUT, stdin)) break;

        // strip newline
        input[strcspn(input, "\n")] = 0;

        // tokenize on spaces and tabs
        int argc = 0;
        char *token = strtok(input, " \t");
        while (token && argc < MAX_ARGS - 1) {
            args[argc++] = token;
            token = strtok(NULL, " \t");
        }
        args[argc] = NULL;

        if (argc == 0) continue;

        // built-in: exit
        if (strcmp(args[0], "exit") == 0) break;

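        // built-in: cd (has to run in the shell itself; a child's chdir would not affect us)
        if (strcmp(args[0], "cd") == 0) {
            const char *dir = args[1] ? args[1] : getenv("HOME");
            if (dir && chdir(dir) != 0) perror("cd");
            continue;
        }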

        run_command(args);
    }
    return 0;
}

why fork + exec

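fork() creates a near-exact copy of the current process: the child gets the same memory contents, open file descriptors, and environment, but its own PID. exec() replaces the current process image with a new program while keeping the PID and the open file descriptors. together they let each command run in its own process, so nothing the command does can damage the shell's own state.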

the reason for separating fork and exec is power: between the two calls, the child can redirect file descriptors, set environment variables, and set process groups, all before the new program starts running.
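
to make that gap between fork and exec concrete, here is a minimal sketch of output redirection. run_redirected and its parameters are hypothetical names, not part of the shell above, and the exit codes 126 and 127 are only the usual shell convention:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// hypothetical helper: run args[0] with its stdout sent to `path`.
// returns the command's exit status, or -1 if fork/wait failed
// or the command was killed by a signal.
int run_redirected(char **args, const char *path) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        // child: we are between fork and exec, so anything changed
        // here affects only the command, never the shell itself
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            _exit(126);
        }
        if (dup2(fd, STDOUT_FILENO) < 0) {
            perror("dup2");
            _exit(126);
        }
        close(fd); // stdout now refers to the file; the extra fd is not needed
        execvp(args[0], args);
        perror("exec");
        _exit(127); // only reached when exec failed
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

the shell's own stdout is untouched because the dup2() happens only in the child. input redirection would follow the same pattern with O_RDONLY and STDIN_FILENO.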

implementing pipes

pipes connect the stdout of one process to the stdin of the next. the pipe() syscall gives you a pair of file descriptors:

void run_pipeline(char **left_args, char **right_args) {
    int pipefd[2];
    if (pipe(pipefd) == -1) { // pipefd[0] = read end, pipefd[1] = write end
        perror("pipe failed");
        return;
    }

    pid_t left = fork();
    if (left == 0) {
        // left command writes to pipe
        dup2(pipefd[1], STDOUT_FILENO);
        close(pipefd[0]);
        close(pipefd[1]);
        execvp(left_args[0], left_args);
        perror("exec failed");
        _exit(127);
    }

    pid_t right = fork();
    if (right == 0) {
        // right command reads from pipe
        dup2(pipefd[0], STDIN_FILENO);
        close(pipefd[0]);
        close(pipefd[1]);
        execvp(right_args[0], right_args);
        perror("exec failed");
        _exit(127);
    }

    // parent closes both ends (otherwise the right command never sees EOF) and waits
    close(pipefd[0]);
    close(pipefd[1]);
    // skip the wait if fork failed: waitpid(-1, ...) would wait for any child
    if (left > 0) waitpid(left, NULL, 0);
    if (right > 0) waitpid(right, NULL, 0);
}
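
run_pipeline takes two argument vectors, so the tokenized line has to be split at the "|" first. one possible sketch follows. split_at_pipe is a hypothetical helper, and it assumes the "|" is surrounded by spaces so the tokenizer above produces it as its own token:

```c
#include <stddef.h>
#include <string.h>

// hypothetical helper: find the first "|" token in a NULL-terminated
// argv, overwrite it with NULL, and return a pointer to the arguments
// after it. returns NULL when there is no "|" or when either side of
// it is empty (in which case the array is left unchanged).
char **split_at_pipe(char **args) {
    for (size_t i = 0; args[i] != NULL; i++) {
        if (strcmp(args[i], "|") == 0) {
            if (i == 0 || args[i + 1] == NULL) return NULL; // empty side
            args[i] = NULL;      // terminates the left-hand argv
            return &args[i + 1]; // right-hand argv starts here
        }
    }
    return NULL;
}
```

in the repl this could replace the plain run_command(args) call: if split_at_pipe(args) returns a non-NULL pointer, pass args and that pointer to run_pipeline, otherwise fall back to run_command. a real shell would also report a malformed line such as "| wc" instead of treating it like a plain command.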

signal handling

without signal handling, pressing Ctrl+C kills your shell along with the running command, because the terminal sends SIGINT to every process in the foreground process group. fix it:

#include <signal.h>

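void sigint_handler(int sig) {
    (void)sig;
    // the shell survives. the child still gets its own SIGINT from the terminal,
    // and exec resets a caught signal to its default action, so the child dies
    write(STDOUT_FILENO, "\n", 1);
}

// in main, before the repl loop:
signal(SIGINT, sigint_handler);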

what to build next

  • I/O redirection: cmd > file and cmd < file using open() + dup2()
  • background jobs: for cmd &, skip the blocking waitpid(), track PIDs separately, and reap finished children later so they don't linger as zombies
  • job control: fg, bg, jobs using process groups and SIGTSTP
  • history: readline integration or a simple circular buffer
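
as a starting point for the background jobs item, here is a rough sketch of launching a command without waiting and collecting finished children later. spawn_background and reap_finished are hypothetical names, and the "[done]" message format is made up for illustration:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// hypothetical helper: start args[0] without waiting for it.
// returns the child's pid, or -1 if fork failed.
pid_t spawn_background(char **args) {
    pid_t pid = fork();
    if (pid == 0) {
        execvp(args[0], args);
        perror("exec");
        _exit(127); // only reached when exec failed
    }
    return pid;
}

// hypothetical helper: collect every child that has already exited,
// without blocking. returns how many children were reaped.
int reap_finished(void) {
    int reaped = 0;
    int status;
    pid_t pid;
    // WNOHANG makes waitpid return 0 right away when children exist
    // but none has exited yet; -1 means there are no children at all
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        printf("[done] %d\n", (int)pid);
        reaped++;
    }
    return reaped;
}
```

one place to call reap_finished() is at the top of the repl loop, just before printing the prompt, which keeps exited background children from accumulating as zombies.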

building a shell teaches you a lot about unix process management, arguably more than most projects of its size. shell behavior that once seemed mysterious, like why cd has to be a built-in, starts to make sense.
