Sunday, April 7, 2013

Unix in a Nutshell

!!! Attention: this article (and in fact my whole blog) has moved to:
http://thecsbox.com/2013/05/04/unix-in-a-nutshell/ !!!


I recently read the original paper on Unix, “The Unix Time-Sharing System” (Ritchie, Thompson, 1974), and I’ve long been a Unix and Linux fan, so I thought I’d take this occasion to write an article about what I consider to be Unix’s main contributions and what makes it so interesting.

A little bit of history first...

Unix was written by Dennis Ritchie and Ken Thompson at the famous AT&T Bell Labs, starting in 1969 and continuing through the 1970s. The original hardware they used was a PDP-7, and the first version was written in assembly. This may sound incredible today, but at that time not many high-level languages existed, and it was natural to write something as complex and performance-sensitive as an operating system in assembly. Unix is also credited with being the first widespread general-purpose operating system. While operating systems existed before Unix, they were usually produced by the hardware manufacturer and were thus very heterogeneous. The idea of having a separate OS provider, and of one OS supporting different hardware, did not really exist before Unix.

Contributions of Unix

I consider the following points the most important contributions of Unix:
  • file system and file I/O
  • the widespread use of C, which Unix greatly assisted
  • focus on simplicity and elegance in its design

I only learnt about the contribution to file systems by reading the paper and during the lecture, where we were very lucky to have Prof. Kernighan tell us about the original atmosphere at Bell Labs, where Unix was born. Before Unix, file systems as we know them were far from the norm, and file I/O was anything but easy and homogeneous. The paper itself says: “The most important job of UNIX is to provide a file system.” OS classes usually cover file systems close to the end of the semester, and we have gotten so used to them that we take them for granted. Yet I suppose that without Unix, the way we access our hard disks and the directory tree structure that Windows, Mac and Linux all use today might well have ended up as something completely different.

The file I/O calls read() and write(), which exist both as system calls and as C functions, and the idea of a file descriptor are also things we still use today. Even with much higher-level languages like Java, or when programming for Android or iOS, where everything looks quite different on the surface, the low-level system calls underneath are still the read() and write() described in the original 1974 paper. I find it amazing that some core functionality has remained essentially unchanged for forty years in computer science.
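
To make the file descriptor idea concrete, here is a minimal sketch in Python, whose os module exposes thin wrappers around these system calls. The function name roundtrip and the file name demo.txt are my own choices for illustration, not anything from the paper.

```python
import os
import tempfile


def roundtrip(path, payload):
    """Write payload to path, then read it back, using raw file descriptors."""
    # os.open returns a small integer: the file descriptor.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        data = payload
        while data:
            # write() may accept fewer bytes than offered, so loop until done.
            written = os.write(fd, data)
            data = data[written:]
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:  # an empty read means end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)


with tempfile.TemporaryDirectory() as tmp:
    print(roundtrip(os.path.join(tmp, "demo.txt"), b"hello, unix\n"))
```

Higher-level file objects in most languages end up making these same calls, usually with a buffer in front of them.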

Another design principle they introduced related to files is the idea that “everything is a file”. We can still see this in Linux today, where the /dev directory gives access to hardware devices. For the programmer, writing to a device is conceptually the same as writing to a file, and this greatly unifies the interface.
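
A small sketch of what this looks like in practice, assuming a Linux or similar system where the device files /dev/zero and /dev/null exist. The helper names read_bytes and write_bytes are made up for this example. The point is that exactly the same open, read and write calls work on a device as on a regular file.

```python
import os


def read_bytes(path, count):
    """Read up to count bytes from path with the plain read() call."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, count)
    finally:
        os.close(fd)


def write_bytes(path, data):
    """Write data to path with the plain write() call; return bytes written."""
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


# /dev/zero is a device that produces an endless stream of zero bytes.
print(read_bytes("/dev/zero", 4))
# /dev/null is a device that accepts and discards whatever is written to it.
print(write_bytes("/dev/null", b"discarded"))
```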

The importance of C for the development of Unix, and conversely the way Unix helped push forward the widespread use of C, is another major contribution. C was also created at Bell Labs by Dennis Ritchie, and they soon decided to rewrite Unix in C. It may seem obvious now to choose C for an operating system, but in the paper they actually spent a whole paragraph arguing for the benefits of C over assembly, since the notion at that time was that “something as complex as an operating system [...] had to be written exclusively in assembly”.

The fact that many descendants of Unix are still in use today, and that Unix gained built-in TCP/IP networking support (added in a later version), are among the main reasons why C is still so relevant today. It is hard to imagine operating systems or networking code being written in any language other than C. While so many new high-level languages have emerged since the invention of C, many widespread languages (like C++, Java and PHP) have their roots in C. I don’t know what the programming landscape would look like today without C.

Unix is also singular in its focus on simplicity and elegance in its design. When one looks at Unix and the programs that come with it, it’s essentially a set of small, neat, self-contained tools with which one can do powerful things. In this way it is very similar to C, where the language itself is small by today’s standards but still very powerful. I feel this focus on elegance and simplicity in design has become somewhat forgotten nowadays, when programs and code bases grow to huge dimensions. In that respect, it can be very inspiring to read through and study the design and code of Unix.

Equally important, but more technical, contributions are:
  • process management
  • multi-user, access control
  • pipes, redirection, background processes

The way process management is done in the original version is surprisingly similar to what we see in today’s Unix-influenced operating systems. The system calls fork() and exec() for process creation, as well as the ability to multitask, were already present back then. While different scheduling algorithms have evolved over time, the underlying idea and the basic operations of process management have changed remarkably little.
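
Here is a rough sketch of the fork-then-exec pattern, again through Python’s os module, which wraps the same system calls. The function name run and the exit code 127 for a failed exec are conventions I picked for this example (127 is what shells commonly report for “command not found”).

```python
import os
import sys


def run(argv):
    """Run argv as a child process and return its exit status."""
    pid = os.fork()  # from here on, two processes execute this code
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; never fall back into parent code.
            os._exit(127)
    # Parent: wait for the child to finish and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal


print(run([sys.executable, "-c", "raise SystemExit(3)"]))
```

A shell does essentially this for every command you type: fork a child, exec the program in the child, and wait for it in the parent.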

Unix is also known for being a multi-user operating system with effective access control, in other words for being an OS with well-designed security features. Each user has a unique user ID (UID) and a group ID (GID) for the group they belong to. Access to a file is controlled by its ownership and by permissions that specify who is allowed to read, modify and execute it. If you are familiar with Linux, for example, you will know that these mechanisms are still in use today. Android also makes heavy use of them: each app runs under its own user ID, which controls its access to files and other resources. It’s quite impressive to still see such widespread use of these mechanisms today, and it certainly was worth the time Thompson and Ritchie spent designing the security features.
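
As a small illustration, the owner, group and permission bits of a file can be inspected with the stat system call. This is only a sketch: the helper name describe and the file name secret.txt are invented for the example.

```python
import os
import stat
import tempfile


def describe(path):
    """Return the owner UID, group GID and permission string of a file."""
    st = os.stat(path)
    return {
        "uid": st.st_uid,
        "gid": st.st_gid,
        # filemode renders the mode bits the way "ls -l" does.
        "mode": stat.filemode(st.st_mode),
    }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "secret.txt")
    with open(path, "w") as f:
        f.write("for my eyes only\n")
    # Owner may read and write, group may read, everyone else gets nothing.
    os.chmod(path, 0o640)
    print(describe(path))
```

The three groups of “rwx” letters in the mode string are the read, write and execute permissions for the owner, the group and all other users.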

Pipes, redirection and the ability to create background processes were already present in the original paper and are still widely used today. If you are familiar with the Linux shell, you will know that the “|” operator creates a pipe connecting the output of one program to the input of another. With the redirection operators (“<” and “>”) you can redirect input and output from and to files. With “&” after a command you can start the program as a background process, so that it doesn’t disturb your present interaction with the shell. All these mechanisms again reinforce the Unix design philosophy that combining small programs lets you do powerful and complex things.
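
To show what the shell roughly does behind the scenes for a line like “left | right > file”, here is a sketch built from pipe, fork, dup2 and exec. The names pipeline_to_file and _spawn are hypothetical helpers of mine, and a real shell handles many more cases (errors, signals, job control).

```python
import os
import sys
import tempfile


def _spawn(argv, stdin_fd=None, stdout_fd=None):
    """Fork a child, rewire its stdin/stdout, then exec argv in it."""
    pid = os.fork()
    if pid == 0:
        try:
            if stdin_fd is not None:
                os.dup2(stdin_fd, 0)   # fd 0 is standard input
            if stdout_fd is not None:
                os.dup2(stdout_fd, 1)  # fd 1 is standard output
            # Python marks the original descriptors close-on-exec, so only
            # the duplicated fds 0 and 1 survive into the new program.
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if setup or exec failed
    return pid


def pipeline_to_file(left, right, out_path):
    """Rough equivalent of the shell line: left | right > out_path"""
    read_end, write_end = os.pipe()
    out_fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        left_pid = _spawn(left, stdout_fd=write_end)
        right_pid = _spawn(right, stdin_fd=read_end, stdout_fd=out_fd)
    finally:
        # The parent must close its copies, or "right" never sees end of file.
        os.close(read_end)
        os.close(write_end)
        os.close(out_fd)
    os.waitpid(left_pid, 0)
    _, status = os.waitpid(right_pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -os.WTERMSIG(status)


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "out.txt")
    pipeline_to_file(["echo", "hello unix"], ["tr", "a-z", "A-Z"], out)
    with open(out) as f:
        print(f.read(), end="")
```

Starting a program in the background with “&” amounts to the same fork and exec, except that the shell does not wait for the child before showing the next prompt.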

Other noteworthy things
  • mount command still in use today
  • superuser still in use today
  • programming at that time very different

A few other things I’d like to point out: the mount command for including removable volumes in the file system is still used today, for example to mount a USB drive. The powerful superuser (aka root) is also something they designed into the original version.

And lastly, something I find striking is that programming back in the 70s was so entirely different from what it is today. There was no Internet, there were few high-level languages, no advanced debuggers and far fewer programmers. Hacking back then was much closer to the hardware, with only very thin intermediate layers and not so many levels of indirection. Something like a Java call stack with dozens of functions to trace back was not quite the case back then ;-) Since there was no Internet, programming actually involved a lot of social interaction, especially when you had that many talented people together like at Bell Labs. If you were writing a program for Unix there and stumbled across a problem, you would basically just walk up to Ritchie or Thompson and ask them for help, instead of googling the bug like we do today. Pretty cool, isn’t it?

A few quotes to end with

To wrap up this article, which became way longer than I expected, here are a few quotes I like.

In the introduction of the paper, talking about hardware requirements: “Unix can run on hardware costing as little as $40,000.” Oh well, I guess standards were different back then...

From the paper: “The success of Unix lies not so much in new inventions but rather in the full exploitation of a carefully selected set of fertile ideas, and especially in showing that they can be keys to the implementation of a small yet powerful operating system.”

The National Academy of Engineering about why they elected Thompson as a member: “for designing Unix, an operating system whose efficiency, breadth, power, and style have guided a generation’s exploitation of minicomputers”.

And by the way, Thompson and Ritchie also received the Turing Award for their work on Unix and operating system theory :-)

Additional Resources

Paper: The UNIX Time-Sharing System (Ritchie, Thompson), published in the Communications of the ACM in 1974. (You need access to the ACM Digital Library, which you usually have from within a university network.)

-------------------------------
Let me know what you think of this article and where you see room for improvement. I only just started this blog, so feedback of any kind is appreciated!