In this course we always and only use the gcc compiler distributed by the GNU Project at http://gcc.gnu.org/. The GNU gcc compiler is open source and free. It is known in industry and academia as a very "tried and true" reference quality, highly supported implementation of the ANSI standard for the C language.
C allows programmers to make mistakes which create an unsafe and unpredictable state while making no promise to detect, warn or recover from the bad things it allows us to do. If you index beyond the end of an array or dereference a pointer to memory that you don't own, the C compiler does not promise to detect such infractions and throw an exception to force you to safely handle the situation. In some cases your program will crash at the point of the violation. This is the best case scenario. Unfortunately, what often happens is that it continues to execute in a corrupted state and may run to completion but produce incorrect results. Sometimes your program will execute for a while only to crash at a seemingly unrelated point later. Other times it runs to completion and produces no detectable error or mistake. This last scenario is analogous to playing Russian Roulette and walking away unscathed.
The bad news is that once you violate the rules of the C compiler, the behavior of your program is no longer guaranteed by the C compiler. In such a case your program's behavior may be determined in part by the hardware and software platform it is running on. As a result your program may act quite differently from execution to execution. More bad news is that if your program crashes, it may crash at a point that has no apparent relation to the place where you did the bad thing that set up the disaster. The best thing that can happen when you violate your program is that it crashes right away. The worst case scenario is that it never crashes and you get different results running the same code on a different platform or even get varying results on the same machine from run to run. This notion of a program going into an undefined state and behaving differently each time you run it is something students have a hard time understanding. Novices in C continually demonstrate this by saying things like "But it ran fine on my Linux machine! How could it crash on the Mac?". Such statements show that the notion of undefined state is completely foreign to those who have cut their teeth on a language like Java that watches every line of code that is executing to detect violations of integrity such as indexing past the end of an array or parsing a number that is too big for a variable (value overflow). The C compiler trades away that security for performance. Make no mistake, the performance hit borne by a language like Java to do this kind of sandboxing is quite significant. C is leaner and meaner but leaves you vulnerable to a bizarre new universe of undefined state and unpredictable behavior if you violate the rules.
The only cure for such vulnerability is prevention. The best heuristic for prevention is a thorough understanding of memory, addressing, pointers and platform dependence. Only when these concepts are clearly understood can a set of good and safe practices be derived. In this first chapter we introduce the interactive Unix command shell which is the environment you will live in. We then begin to lay the foundation for understanding memory organization and how C uses it to store variables. Chapter one will also serve as a preview to the entire course - getting you to where you can write a simple C program and use simple Unix tools to verify it. This chapter's goal is to let you see the possibilities for combining C and Unix to bring clarity and simplicity to your development cycle. This breadth first overview will open your eyes and pique your interest for what's to come. Before we write "Hello World" in C, we need to familiarize ourselves with Unix so we can log in, navigate files and directories, edit and compile. We start with command shells.
The Unix operating system has evolved into many variants of Linux which support Windows-like GUIs. This course is not interested in any GUI interfaces to Unix. Instead we focus on the command line interface. When we connect to Unix we connect to a command shell which is not a GUI and does not support drag and drop or other Windows like operations. Most connection programs will however let you do some basic copy/paste to/from the session window. Unix command shells present a window that displays a prompt (such as the % sign) at the start of the line and waits for a command to be typed in. You execute that command by hitting return then wait for it to complete. You know that command has finished when the prompt comes back to you. There are several popular shells available on every Unix system. When you log into your Unix account you are assigned some shell chosen by you or whoever set up your Unix account. It is not uncommon for Unix users to have strong preferences about which shell is best to use for what. Most agree however that some shells are better suited for interactive command line work, while other shells are better for scripting. Scripting is the art of writing programs which consist of shell commands. Such scripts are then executed as a program just like you might execute a Java or C program. All the shells will let you execute their commands at the command line or batched together in a file. The difference is that when you execute shell commands inside the script file you have access to more than just the one-liner commands. Inside the script file you have access to that shell's programming language syntax with conditionals, loops, function calls and more. When you use a shell for scripting, the differences between shells begin to show - especially as you do more complex and interesting things.
In this course we will be connecting to our Pitt Linux machines to do our C language work. Most students are assigned a tcsh shell connection by default at login time. If you are an experienced shell user you can of course change your shell to one of the other variants available on Pitt's Linux network.
We just made a distinction between using a shell for interactive commands and using a shell for scripting. It's time to illustrate and explain those concepts with some concrete examples. We assume that you have been shown how to log onto your school or institution's Unix machine and authenticate a connection to your user account. We will not teach shell scripting in this course. We just want to show you an example of command line interaction with a shell vs. writing a script containing shell commands, so that you understand the distinction between the two.
The screen shot below illustrates an interactive shell session. This session consists of the login and then proceeds to issue some shell commands. You may often hear the terms shell commands and Unix commands used interchangeably. Every command is just a program. Someone had to write it. The commands that come with Unix are programs. Soon we will write C programs, shell scripts and PERL scripts that are executable and can be used just like the built in Unix commands. If you write a C program and compile it into an executable binary file named a.out, then a.out can be executed by typing its name on the command line. It becomes a verb or command - even though it was written by you and is not a built in command. Whatever name they are called by - programs, commands, scripts, verbs or executables - they were written by someone using some programming language and they are stored in a file somewhere. They can be executed by invoking their executable file's name on the command line. A simple Unix command named where will even tell you where (the full directory path) on your Unix machine a command's executable file is located. The built in Unix commands that you invoke from the command line are generally called commands but some are often referred to as utilities. Commands and utilities are very small programs that are specific and narrow in their function. Whatever shell you are running on Unix will have many built in commands and utilities.
We will demonstrate a few commonly used commands that all the shells have in common - commands that allow you to list and read files, navigate directories and get information about your Unix system and environment.
| echo  $SHELL | The echo command is printing the value of an environment variable named SHELL. The $ sign in front of SHELL tells echo to print the value of SHELL not the literal text SHELL. The value of the SHELL variable is the full path to whatever shell program we are running under and interacting with right now. My shell is /bin/csh - also known as the C shell. Notice we are getting the full path of the program that is our shell. The actual program name is csh and it lives in the /bin directory right off the root of our Unix machine's file hierarchy. My C shell offers a percent sign as the command prompt. Your C shell may be set up to use $ sign or some other string as the prompt. | 
| set  prompt="$ " | Allows us to change the % prompt to something other than the percent sign. I like the $ sign followed by a space. | 
| pwd | Print working directory tells us what directory we're in. | 
| cd C | Steps us into the C subdirectory. | 
| ls *.c | Shows only files in this dir that end with .c | 
| ls -l *.c | Lists each file on its own line with properties of each file such as access privileges, owner, size, date etc. | 
The above session was an example of command line interaction with a shell.
A shell script can be as simple as a text file containing a few one-liner commands. All you need to do to write a shell script is use the very first line of your script file to specify which shell you want to interpret the commands that follow in the file. Remember, there are at least a half dozen popular Unix shells and each of them will let you use it interactively on the command line or to execute commands batched together in a file. Every time you write a script file you must specify which shell program you want to interpret (execute) the commands in the file. Below is a sample shell script file that uses the Bourne shell located at /bin/sh to interpret the commands in the file. Shell script files are usually named with a .sh extension to indicate to readers what they are.
[script listing: shellscript.sh]
We just said this but it's worth repeating to make sure the distinction between a command shell and a shell script is clear. The above shell script is just a text file containing a few one-liner shell commands that could have just as easily been typed at the terminal interactively. When you write a shell script, however, you gain access to that shell's entire programming language of loops, conditionals and function calls. Of course you can always write a shell script that does not take advantage of the programming syntax - like the shell script above: no loops or conditionals, just a sequence of simple one-liner shell commands to be executed sequentially.
The #! must be the first two characters in the first line of the file. It is called a SHA BANG and is followed by the path to the shell program that we want to be interpreting/executing the commands. At the very end of the file our last command is an echo which is fed two arguments. The first is a literal string and the second is the value of an environment variable named SHELL.
No executable program, script or command file can be executed unless that file's execution flag is turned on.
-rw-r--r-- 1 tlh staff 203 Jun 4 14:24 shellscript.sh
Looking above at the first 10 characters of the directory info for shellscript.sh we see -rw-r--r--. The leading dash means this is an ordinary file (a d would mean a directory). The remaining nine characters are three rwx triplets giving the read, write and execute flags for the file's owner, its group, and everyone else respectively. None of the x flags are set, so this script is not yet executable.
Unix provides built in help in the form of manual (man) pages. These help pages are probably not as user friendly as most novices would like, but they are helpful. Use the man command to get help on any command: just type man followed by a command name and the manual pages for that command are displayed. Below we view the man pages for the chmod command, the command used to change a file's permission flags:
[man pages for chmod]
Here are some commands that will come in handy for the work we do in Chapter 1.
Many of these commands report their output to stdout, which is one of several standard streams in Unix. In Unix a stream is a source or destination for data, and some streams are associated with an I/O device: stdout represents the screen (terminal, monitor, console) and stdin represents the keyboard. When we say that a command prints this or that, we mean it displays text to stdout. Commands that take input often read it from stdin. We will talk about the other standard streams later.
Connections and users
| whoami | prints the ID of the user currently connected/logged-in to this shell/command window. It may seem redundant to ask the system who you are, but this command becomes useful if you encounter an abandoned terminal that someone has walked away from without logging out. By typing whoami you are shown the ID of that user. Should this ever happen, please be considerate and log that user out via the logout command | 
| logout | terminates your connection to the command shell and closes the connection window. Also kills any processes you may have started while logged in. | 
| who | prints a list of other users who are logged onto this Unix machine. | 
| finger userID | prints information about a particular user (user does not have to be logged in). | 
| whois domainName | is completely different than who, but you might accidentally use whois when you meant who. Try whois yahoo.com and see what you get. | 
The following commands deal with files. The term filePath is used to mean either the full file path such as /usr/local/bin/ls or a relative file path such as ../documents/resume.txt or simply a file name such as resume.txt.
| pwd | prints the full path of the directory you are in right now (current working directory). | 
| cd dirPath | changes your current working directory to be dirPath. If you leave dirPath blank then you are taken to your home directory | 
| ls | prints a list of the files & directories in your current working directory. Hidden files (those starting with a dot ".") are not listed. | 
| ls dirPath | prints a list of the files & directories in directory dirPath. Hidden files (those starting with a dot ".") are not listed. | 
| ls -a | prints a list of files & directories in your directory. Hidden files/directories are also listed. Any file or directory that starts with a '.' (DOT) will not show up in a listing unless the -a switch is used. | 
| ls -l | prints a list of files & directories in your directory. One line per file. Meta info about each file is also displayed on the line. | 
| ls -al | combines the -a (all) switch with the -l (long listing) switch. All files & directories including hidden ones are displayed one per line with meta info about each file on the line. | 
| mkdir dirPath | creates a new directory. | 
| cat filePath | prints contents of specified file. | 
| more filePath | prints contents of specified file one screen at a time. Space bar prints the next screenful. | 
| less filePath | similar to more. | 
| head filePath | prints first few lines at top of specified file. | 
| tail filePath | prints last few lines at bottom of specified file. | 
| file filePath | examines specified file and makes a good guess as to its type. | 
| cp fromFilePath toFilePath | copies specified file. | 
| mv fromFilePath toFilePath | moves (renames) the specified file. The effect is like cp followed by deleting the source file. | 
| rm filePath | deletes specified file(s). Warning! this command can be a WMD | 
| rmdir dirPath | delete specified directory (or directories). Warning! this command can be a WMD | 
| find dirPath -name '*.txt' | finds and lists all files under the specified directory (and its subdirectories) whose name ends with .txt | 
| grep hello *.txt | searches all .txt files in current directory hierarchy for the text "hello". For each match a filename and the matching line of text from the file is displayed. | 
| wc FilePath | for specified file(s), displays number of lines, words, chars and bytes in that file. | 
| diff FilePath1 FilePath2 | displays the line by line differences between two text files. | 
| comm FilePath1 FilePath2 | like diff, but compares two sorted files and displays the lines they have in common as well as the lines unique to each. | 
| cmp FilePath1 FilePath2 | like diff, but for binary files. | 
| date | displays current date and time | 
| od FilePath | displays specified binary file in octal or hex. | 
You have seen enough Unix to begin learning some C. We had to show you the basics of files and directories so you don't get lost when compiling and executing source files and executables. Now let's talk about bits, bytes and memory so that we can take our first look at C and get hands on with the gcc compiler.
In C, memory is an array of bytes. A char is always one byte wide. Every data type in C is represented as a chunk of bytes. Some data types require more bytes to store a value than others. C provides a sizeof operator which when applied to any data type's name (int, char, float etc.) or to any value producing expression (x, arr[i], &i etc.) returns an integer telling how many bytes of data that object or type uses. On any ANSI compliant C environment sizeof char is always one. A character always occupies exactly one byte of memory. All objects of all types in C are stored as a contiguous sequence of bytes. A byte is the smallest unit of memory that can be read/written from/to memory or a file. This notion of byte based representation is held in strong consensus as being a fundamental precept for the C language. All the statements in the above paragraph are true on all platforms for an ANSI compliant C environment. The one thing that is not guaranteed to be the same on all compliant platforms is the number of bits in a byte. The most common C environments use 8 bits per byte and use 32 bits for words/addresses. However, according to the documentation of the C89 standard:
"... on a machine with 36-bit words, a byte can be defined to consist of 9, 12, 18, or 36 bits, these numbers being all the exact divisors of 36 which are not less than 8. These strictures codify the widespread presumption that any object can be treated as an array of characters, the size of which is given by the sizeof operator with that object’s type as its operand. These definitions do not preclude holes in struct objects. Such holes are in fact often mandated by alignment and packing requirements. The holes simply do not participate in representing the composite value of an object."
As we will see, the sizeof operator is a very useful tool to find out platform specific information about your machine. For the rest of this course, our discussions and illustrations will use an 8 bit byte and 32 bit addressing platform since 8-bit / 32-bit still represents the majority of systems running C code at this writing.
Programs run in main memory (RAM). Think of memory as an array of bytes. Each byte is a chunk of bits (typically 8). Each bit is a single value that is either a 1 or a 0. Now think of memory as a long row of houses on one long street like "memory lane". Each house has its own mailing address. When you write a letter you address it to a house at some address. Likewise in RAM, the smallest chunk of memory that can be addressed is a byte. Every byte of memory gets its own address, starting at 0 and ending at 4,294,967,295 (2 to the 32nd power minus 1), giving 4 gigabytes of addressable memory.
This example is an illustration of a 32 bit addressing system. A system where addresses are stored in 32 bits and thus have a range of 0 through 4,294,967,295
| addr: 0 | addr: 1 | addr: 2 | addr: 3 | addr: 4 | addr: 5 | addr: 6 | addr: 7 | addr: 8 | addr: 9 | addr:10 | addr:11 | addr:12 | . . . | . . . | . . . | addr: 4,294,967,295 | 
| 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | . . . | . . . | . . . | 1 byte | 
Let's go back to our postal delivery analogy. Every house (byte of RAM) is located on memory lane and the first house (byte) is numbered 0 (zero). Each house thereafter is numbered 1, 2, 3, 4 etc. Any letter addressed to a house is simply labeled with the house number. No names (yet). Suppose the envelope being sent only has space for 6 decimal digits in the house number field on the face of the envelope. How many houses could possibly get letters on memory lane? That's simple: 1,000,000. Six decimal digits means houses whose address is between 0 and 999,999 can be addressed. Houses with higher numbers simply cannot get mail because of the addressing limitation.
With this in mind, what does it mean to be on a 32 bit address platform? It means that addresses are expressed as a 32 bit value. As a result no more than 4 gigs (2 to the 32 bytes) of memory can be addressed. How many bytes of memory can be addressed on an architecture with a 64 bit address space? The answer is 2 to the 64th - the square of 4G, which is a huge number. As such, 64 bit platforms (and there are plenty of 64 bit machines around) don't actually use all 64 bits to express an address. It's way overkill. No one could afford to buy that much RAM, nor could that much RAM be manufactured small enough (yet) to fit in a PC.
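If you are curious about the address width on the machine you are using, here is a minimal sketch (our own, not a course file) that prints the size of a pointer:

#include <stdio.h>

int main( void )
{
    /* 4 bytes on a 32 bit addressing platform, 8 bytes on a 64 bit platform */
    printf( "a pointer occupies %lu bytes\n", (unsigned long) sizeof(void *) );
    return 0;
}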
Let's look at how C uses memory to store variables on a typical 32 bit addressing platform.
C compilers on a 32 bit platform typically define an int variable to be the same size as an address value. 4 bytes. The GNU C compiler running on the AFS Unix machines at CMU is no exception. Let's look at some examples of how our C compiler stores variables that you might declare in your program. Assume a declaration such as int x. The compiler finds 4 consecutive bytes and names that memory x. What we mean by naming that memory x is that the name x is a synonym for the value of the variable x.
| addr: 0 | addr: 1 | addr: 2 | addr: 3 | addr: 4 | addr: 5 | 1'st byte of x | 2'nd byte of x | 3'rd byte of x | 4'th byte of x | addr:10 | addr:11 | addr:12 | . . . | . . . | . . . | addr: 4,294,967,295 | 
| 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | . . . | . . . | . . . | 1 byte | 
In this case the compiler picked a chunk of memory from byte #6 through byte #9. Four bytes to store a 32 bit integer.
Here is a char variable (char c;) being declared, with its storage illustrated below.
| addr: 0 | addr: 1 | addr: 2 | addr: 3 | addr: 4 | addr: 5 | c | addr: 7 | addr: 8 | addr: 9 | addr:10 | addr:11 | addr:12 | . . . | . . . | . . . | addr: 4,294,967,295 | 
| 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | . . . | . . . | . . . | 1 byte | 
In this case the compiler picked byte #6. One byte to store a char. A short int on our platform is a 16 bit int and thus would use 2 bytes of storage. A float variable would be 4 bytes and a double variable would be 8 bytes of storage. The compiler always chooses the memory location to store a variable. The programmer has no input into this decision. The only exception to this rule is that once we learn how to allocate dynamic memory we will have the option to store variables in a region of memory known as the heap. In either case however, we cannot request any specific memory address for the storage of a variable.
Let's go back to our statement that the name of a variable is a synonym for the value of the memory.
int x;   /*  x is the name of a 4 byte chunk of memory in RAM */
x = 7;   /*   to change the value in x  we use its name on the left side of an assignment statement  */
int y = x + 3;   /*  to lookup the value of x we use its name in an expression */
Once a variable is declared, the variable's name is a synonym for the contents (value) of the variable. There are other properties associated with variables, such as the type of the variable (int, char, float, double, etc). A variable's type implies what range of values can be stored in it, and how much storage that type requires. If you come from a Java background you already understand range of values, but the storage requirement property is not emphasized as much. Another property of a variable which is explicitly de-emphasized in (at least older versions of) Java is the address of a variable. The C language gives us operators to ask a variable how much storage it uses for its type and at what address that variable is stored.
Assume we have an int variable named x.
The sizeof operator is a very intelligent and useful operator: it accepts not only explicit type names like int and variable names like x, but also certain kinds of expressions, and it returns the number of bytes required by the operand's data type.
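Here is a minimal sketch (our own, not one of the course files) showing both questions being asked about x:

#include <stdio.h>

int main( void )
{
    int x = 7;

    /* sizeof yields the number of bytes the object occupies;
       cast to unsigned long so the value prints portably */
    printf( "x uses %lu bytes\n", (unsigned long) sizeof x );
    printf( "an int uses %lu bytes\n", (unsigned long) sizeof(int) );

    /* &x yields the address where the compiler chose to store x; %p expects a void pointer */
    printf( "x lives at address %p\n", (void *) &x );

    return 0;
}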
Here is a screenshot of output produced by a simple C program that we have written for you. It lists the primitive data types, their associated storage requirement and range of values. On different platforms, different results may be displayed. This output comes from execution on a 64 bit machine that is part of Carnegie Mellon's AFS network as of Fall 2009.
The output of sizeof.c on one of the 64 bit Linux machines running Carnegie Mellon's AFS (Andrew File System).
Once we have written a few C programs and covered the basics, you can come back and read the source code of sizeof.c. It lists the built in primitive data types and shows which format specifiers are used to print those data types, along with the range of values that can be stored in each type. This little program is a handy reference.
So far we have looked at how memory is laid out and how the C compiler sets aside chunks of bytes for variables. We have seen how to ask the compiler where a variable is stored (&x) and how many bytes that variable is occupying (sizeof x). Let's now look at how the values of integers are encoded. We want to see for instance, what is the exact bit pattern used to represent 7, -13 or 157 in an integer variable.
Signed vs. unsigned numbers. Integers in C are either signed or unsigned. The unsigned ones are the simplest to understand: they are just plain base 2 (binary) numbers. We start with the smallest integer variable - the char type. A char is a one byte object. This fact is true across all platforms; the sizeof operator will always return 1 for a char. A char variable is most often used to store an ASCII value which represents a letter of the alphabet or a punctuation character, but it is just a small integer. Sometimes a char is referred to as a byte, but there is no byte type per se in C.
unsigned char c =157; /* the bit pattern representing decimal 157 is 10011101 */
Confirming 10011101 as the binary representation of 157 is done as follows:
| 2^7 | 2^6 | 2^5 | 2^4 | 2^3 | 2^2 | 2^1 | 2^0 | 
| 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | 
| 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 
| 128 | +0 | +0 | +16 | +8 | +4 | +0 | +1 | 
The decimal equivalent of 10011101 is 128+16+8+4+1==157.  Thus the statement unsigned char c = 157 would put the bit pattern 10011101 into the 8 bits of variable c.
How is char c = -7; stored? Negative integers in C are stored in two's complement. In two's complement the high order bit is the sign bit: a 1 in the highest order bit means the number is negative, a 0 means the number is non negative. The word complement means that the negation of a number complements its absolute value. Just as complementary angles add up to 90 degrees, complementary binary numbers add up to 0. Thus the complement of a positive number is simply whatever bit pattern you have to add to that number to get a sum of zero. There is a simple procedure to form the two's complement of a positive number: just flip the bits and add 1. Let's do this with a simple value.
char c = 7;   /* bit pattern is: 00000111 */
c = -7;       /* now flip the bits to:   11111000 and add 1 which gives us: 11111001 */
  00000111     (  7 )
+ 11111001     ( -7 )
  --------
  00000000     (the carry out of the high bit does not fit in 8 bits and is discarded, so the 8 bit sum is zero - exactly what a complement should produce)
Two's complement is used for all the signed integer types, such as short int (16 bits), int (32 bits) and long int (again 32 bits on our platform). Signed numbers sacrifice their highest order bit for the sign, thus cutting the absolute magnitude in half. Unsigned numbers are encoded as simple binary and all the bits are used for magnitude.
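Here is a minimal sketch (our own) that prints the bit pattern of a char, assuming the 8 bit byte we use throughout this course, so you can watch two's complement at work:

#include <stdio.h>

/* print the 8 bits of a byte, high order bit first */
void print_bits( unsigned char byte )
{
    int i;
    for ( i = 7; i >= 0; i-- )
        printf( "%d", (byte >> i) & 1 );
    printf( "\n" );
}

int main( void )
{
    char c = 7;
    print_bits( (unsigned char) c );   /* prints 00000111 */
    c = -7;
    print_bits( (unsigned char) c );   /* prints 11111001 : the bits of 7 flipped, plus 1 */
    return 0;
}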
[source listing: hello-1.c]
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
		#include is a pre-compilation directive that instructs the compiler to replace the #include line with the text from the indicated file and compile it. Notice the name of the included file is in angle brackets. The file being pulled in for compilation lives in some predetermined directory where the C compiler was installed on your computer.  The find command can tell you where included files (or any other) are located on your system.  Soon you will write your own .h files and put them with your  main program file in its directory. In that case your include would look like #include "myfile.h". The quoted filename would be expected to be in the current directory. You could add path information to the file name such as #include "project1/myfile.h" or #include "../myfile.h" which would mean  "myfile.h" is located in a project directory or the directory above this one.
	
The int main( int argc, char *argv[] ) function is similar to Java in that a C program, like a Java program, must have exactly one main function (now that we are in C, we refer to methods as functions). The main function can be prototyped to return other types, but void and int are the most common. For this course we will adhere to the most common practice, which is to return an int. The argc parameter indicates how many strings were entered on the command line that invoked this program. The argv parameter is the array of strings passed to the program on the command line when the program is invoked. Once we cover strings, pointers and arrays you will fully understand the syntax and semantics of argc and argv.
	
int x,y;
printf("Hello World: Enter two small positive integers for x and y: ");
fflush(stdout); /* needed because output streams are buffered */
scanf("%d %d", &x, &y);
printf("You entered %d for x and %d for y\n", x,y);

C defines several streams. A stream is just a source or destination for data. The most commonly used are stdin (the keyboard) and stdout (the terminal or screen/monitor). These streams are always open and ready to be used. The printf function writes to stdout, while the scanf function reads from stdin. In the first printf we supply a string literal. Notice that we flush our stdout stream because I/O in C is buffered. Every time you write to stdout (or any other stream) you are really writing to a buffer in memory. Only when that stream's buffer is flushed does the data actually show up on the stream/device. The ANSI standard says that unless you flush the stream you are not guaranteed that the data reaches the stream. In this case it means you might never see the text show up on your screen. Novices are often puzzled when debugging code with print statements: they write printf after printf such as "in function foo" and "returning from function foo", and they are sure the program is going into foo and back out again, but the output never shows up on the screen. The reason is they never flushed the buffer. In virtually every ANSI C compiler I've used, the compiler does a flush for you every time a newline is written to the stream. Many programmers take this for granted and treat newline '\n' as a flush of a text stream. However, it is not guaranteed behavior. In ANSI C only a flush of the stream guarantees the buffered output actually gets written to the stream - but, as just mentioned, virtually every compiler treats newline as a flush, and nearly every programmer takes advantage of this. One more thing about streams: flushing only applies to output streams like stdout and stderr or a disk file you are writing to. You can't/don't flush input streams.
The scanf statement causes your program to stop and wait for the user to type some keystrokes followed by a RETURN. As soon as the user hits RETURN the entire sequence of keystrokes (string) is stored in the system buffer. Now the %d and %d embedded in the format string come into play. Those %ds are format specifiers for conversion to integer. We will soon survey the other conversion types like %s for string, %c for char, %f for float, and so on. Since there are two %ds the inputted string will be tokenized to extract the first two tokens. The first token will be converted to an integer, and the second will be converted to integer also. You can refer back to our sizeof.c program for an example of the primitive data types and the format specifiers used to print them.
The first converted value will be stored at memory address &x and the second successful converted value at memory address &y. Recall that the & (address-of) operator produces the address of the variable to its right. We are telling scanf to read a line of text from the keyboard, convert the first two tokens in the string to integers and store those integers inside x and y respectively.
Our scanf is a value returning function. It returns an int representing the number of successful conversions. Since our format string only had two % something or others in it, then our scanf can only return a value of 2, 1, or 0 depending on whether it found 2, 1, or 0 tokens that could be converted to int. In the case of an unsuccessful conversion, that respective variable does not get anything assigned into it. Unlike Java, scanf will not crash, throw an exception or give any indication if you enter your first name where a number was expected. It will simply fail to convert that string to a number and fail to store any new value at the address specified. It is the programmer's responsibility to check the value returned by scanf to verify that scanf returned the number two, if two conversions were requested.
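A hedged sketch of the kind of check we mean (the prompt and error message are our own):

#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    int x, y;

    printf( "Enter two integers: " );
    fflush( stdout );

    /* scanf returns the number of successful conversions */
    if ( scanf( "%d %d", &x, &y ) != 2 )
    {
        fprintf( stderr, "expected two integers\n" );
        exit( EXIT_FAILURE );
    }

    printf( "You entered %d and %d\n", x, y );
    return 0;
}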
The printf and scanf functions have another form, fprintf and fscanf, which lets us pass in the stream we want to read or write. This form is used to read/write text files but will accept stdin/stdout as the stream argument just as well.
fprintf( stdout, "Hello World" ) is identical to printf( "Hello World" )
fscanf( stdin, "%d %d", &x, &y ) is identical to scanf( "%d %d", &x, &y );
Download hello-1.c to your Unix machine and cd into the directory where you put the file.
[terminal session: pwd, ls, gcc -W -Wall -Wextra -O2 hello-1.c, ls]
		Notice I issued a shell command  pwd which printed the current working directory.  I then printed a listing of the files in the directory with an ls command. The only file in the directory was our hello-1.c which  I compiled with gcc  -W -Wall -Wextra -O2 hello-1.c  then listed the directory again.
	
Let's break this command down into its components and explain each one.
gcc is the program that compiles our code. Our GNU C compiler is itself, just another program.
-W -Wall -Wextra -O2 these are switches that customize the behavior of our compilation. Remember we promised to show you how to get all the help the compiler can give you. Using these switches tells the compiler to apply more scrutiny to your code so that those things which can be detected at compile time will be reported to you as warnings and errors. The -O2 ("oh two" not "zero two") switch calls for code optimization at level 2. This course does not address code optimization, but the -O2 switch also enables detection of un-initialized variable use. There are many other switches you can use in your compilation command that we will not cover in this course. The history of how these switches came about and what things they detect is rather random: as the language evolved, switches were added or changed in an ad-hoc manner. For example -Wall means "warnings all". So you might think that means it warns on all infractions. Well, not quite. If you want to detect failure to use argv or argc then you must add -W which is just "warnings". Go figure. Better yet, use them all and never ignore warnings. In this course you are never allowed to hand in code with warnings.
hello-1.c the list of files to be compiled. In this case it's a list of just one file. That will change soon.
		
Because a file named a.out showed up in our directory, we know the compilation succeeded. We did have some warnings, but no errors. There are some warnings that are always benign and some that are benign depending on what you are doing or not doing in the code. Many warnings are a foreboding of disaster. We will identify and qualify warnings as we encounter them. For now we will never ignore warnings because the dangerous ones must be fixed, and the benign ones are easy to fix. The warnings we just encountered are a result of the fact that we never used the argc and argv parameters. This warning is benign since we do not intend to pass anything in on the command line to this program. By ignoring argc and argv we're not missing anything. The dangerous scenario is when you use argv but never check argc to see if anything actually got sent in. As a result you try to read argv[1] but there is no argv[1] and you are now out of bounds in an array. We can ignore these warnings or we can change the prototype of our main such that there is no argc or argv. It's a simple fix - just make main look like int main(). Go ahead and make the change right now and re-compile. If you want your executable to be named something other than a.out, then add the -o switch (dash little oh) to your command and put the name you want after the -o. For instance you could recompile with
		
			gcc -W -Wall -Wextra -O2  hello-1.c   -o hello-1.exe
		
Now that we have a clean compilation we can execute the program. Since I am the owner of this file, I should already have execute privileges on a.out. Let's do an ls -l command to see who owns a.out and what the privileges on that file are.
[terminal session: ls -l showing a.out and its permission flags]
The -l switch on the ls command lists each file on its own line and gives several properties for the file. We are trying to verify we have privileges to execute this file. I can see that tlh is the owner and I have execution privileges. Typically if you compile a C program, you are granted execution rights to the executable file that is created. We have already seen the chmod command and can look in the man pages for help. Suppose we did not have execution privileges to this file. This happens when for instance we copy an executable from someone else's directory and they are the owner, not us, or if we download an executable to our machine. The fix is simple. The following discussion illustrates some alternative syntax for the chmod command.
[terminal session: chmod +x a.out followed by ls -l]
The chmod command allowed us to turn on execution (+x) for a.out. Of course this example didn't really prove anything since I already had execute privileges. Let's turn off execution, look at the flags and try to execute; the shell will refuse. Then we can turn execution back on, examine the flags and execute. To turn off execution we will use another form of the chmod command: chmod 666 a.out. Think of 666 as three octal digits - one each for the owner, group and world permissions respectively. Each octal digit encodes three bits: r, w and x. The first 6 is binary 110, which sets the owner's flags to rw- (no execution). The other two 6's do the same thing to the group and world flags. When we examine the flags after chmod 666 a.out they look like -rw-rw-rw-. We then use the same form of the command but this time with 755, which gives the owner full privileges and gives everyone else read and execute - a typical security setting.
	
[terminal session: chmod 666 a.out, a refused ./a.out, chmod 755 a.out, a successful ./a.out]
We now know how to turn read, write and execution privileges on/off. Before we continue we should explain the syntax ./a.out. Note the ./ before the a.out. The dot character is a synonym for the current working directory. You can verify this by issuing an ls command with . as the arg. The / (slash character) is the directory path separator. So the ./ is the syntax that prepends the current working directory path to the file we want to execute. This is needed because whenever you type a verb (our program is a verb) on the command line, the shell assumes that verb is the name of an installed program in some public directory of installed programs. We don't want the shell looking in /usr/local/bin or some other default directory for our a.out file. We want the shell to execute the a.out that's right here in our working directory.
Modify our hello-1.c program such that it reports the values of x and y only if both numbers have good values in them.
Here is a sample solution: solution-1.c
In this solution we introduce one more piece of the language: function prototypes. A function's prototype is its signature line with a semicolon after it. In our solution-1.c we prototyped the fatal function (which we wrote ourselves) above main as follows:
void fatal( char * msg );

Like variables, functions must be declared before they are referenced. A prototype above main satisfies the compiler. We can then put the actual function body below main, or even in another file. Function prototypes are more than a convenience. They solve an intractable problem. Suppose you have two functions named functA and functB respectively. Suppose functB calls functA and functA calls functB. This is mutual recursion. Without prototypes it would be impossible to get such a program to compile. If you placed functA above functB the compiler would complain that you called functB before defining it. If you reverse the order you get the same complaint about functA being called before it has been defined. The only solution is to use prototypes, which may be placed in any order as long as a function's prototype appears before the function is used. Prototypes are often placed inside a .h file, which we will see soon. In our solution-1.c we just put our fatal function's prototype above the main function in the same source file where the function is called.
[compiler output: warnings and errors from compiling mutual.c without prototypes]
All these problems are a result of the fact that functB was referenced before it was defined.
| mutual.c: In function `functA': mutual.c:12: warning: implicit declaration of function `functB' | If a function is referenced before declared (or prototyped) the compiler will assume the function returns an int and attempt to find the declaration later. This is what implicit declaration means. | 
| mutual.c: At top level: mutual.c:15: error: conflicting types for 'functB' | The compiler has finally seen the actual declaration and it conflicts with the implicit one made earlier. The actual declaration does not return int. As a result no actual declaration is found to match the implicit one and compilation fails. | 
| mutual.c:12: error: previous implicit declaration of 'functB' was here | The compiler points out where the implicit declaration was generated (which does not match the actual declaration found later). | 
As an exercise, follow the example of prototyping shown in solution-1.c. Fix the mutual.c program by putting prototypes for functA and functB above main, then move the actual function definitions below main. Warning! Once you get it to compile cleanly, beware that if you execute it, it will run until you hit ^C to make it stop. Those recursive functions have no base case and will just keep calling each other until the program runs out of memory and crashes. Eventually we will understand recursion and use it correctly. This example was intended only to illustrate the canonical case for prototypes, which is mutual recursion.
Going back to our solution-1.c file, our fatal function takes a string parameter. When main calls fatal it passes a literal string constant. The incoming parameter is prototyped as a char * type, which means pointer to character. Since a string is just a sequence of characters, a pointer to the first character of that sequence represents a pointer to a string. C however has no string type per se, and thus there is no such thing as a pointer to a string - only a pointer to a character. We will cover strings in explicit detail in our next chapter. For now we show you how to pass strings to a function and how to prototype the incoming string parameter as char *. The exit function causes the program to exit immediately without returning back through the call chain. EXIT_FAILURE is a value #define'd in stdlib.h.
	
It is dangerous to ignore implicit declaration warnings because the compiler will attempt to find some function that matches the name of the function you called. It is possible that the compiler will find a function by the same name as the one you called, but that function is not the intended match. This happens when you are writing a new function and that function's name is the same as some obscure function that is part of your compiler. Further suppose that you call your function but you forget to actually write the function definition. In this case the compiler looks for your newly written version of the function but instead discovers an existing function by that same name and matches (links) them to make the compilation succeed. So, your program compiles but when you run it and that function is called, the code that gets executed is not what you intended and you get a crash or worse - unpredictable behavior.
Function prototype vs. function signature
A function's signature is how the compiler identifies it. In C the signature consists of the name and the args by number, type and order. The return type is not part of the signature, and you cannot have two functions that differ only in return type - in fact, unlike Java, C has no overloading at all, so two functions may never share a name.
Before we go on to some more Unix stuff let's revisit argc and argv with a program that will make it clear how to use them. This program contains a simple loop that prints each token passed into the program from the command line.
[source listing: cmdargs.c]
output of cmdargs.c
In C (prior to the C99 standard), the for loop does not allow us to declare the loop counter variable inside the for statement as Java does. The counter must be declared outside, prior to the loop statement.
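As a rough sketch of what cmdargs.c does (our own reconstruction, not the actual course file):

#include <stdio.h>

int main( int argc, char *argv[] )
{
    int i;   /* the counter is declared before the for statement (C89 style) */

    for ( i = 0; i < argc; i++ )
        printf( "argv[%d] = %s\n", i, argv[i] );   /* argv[0] is the program name itself */

    return 0;
}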
We now turn our attention to reading and writing files. Error checking will also be covered.
		Thus far we have seen how to read and write the two predefined text streams stdin and stdout using fscanf(stdin,"%d",&x) and fprintf(stdout,"x=%d",x). Those two streams are associated with text devices. We now show you how to use fprintf and fscanf to read/write text files.
	
Here is a program that demonstrates fscanf(), fprintf(), and a few other pieces of the language.
[source listing: fileio-1.c]
In the program above we are expecting the user to enter two values on the command line after the name of the executable. If the executable is named a.out then we are expecting the user to enter something like this on the command line.
./a.out 10 output.txt

Notice that in fileio-1.c we write if (NULL==infile) rather than if (infile==NULL). Can you guess why? We do it to avoid the consequences of this common typo involving the assignment operator: if (infile=NULL), which is a legal assignment and compiles without complaint. If we make our typo in reverse the compiler catches it for us, because if (NULL=infile) will not compile.
	
	We store a number such as 7 into variable i.
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 00000000 | 00000000 | 00000000 | 00000111 | 
We write that number out to our text file using: fprintf( outfile, "%d\n", i );
What gets written to the file is the following two bytes:
| 1st byte | 2nd byte | 
| '7' | '\n' | 
| 00110111 | 00001010 | 
The first byte is the ASCII character code for the digit 7. There is a difference between the number 7 and the character '7'. fprintf() means file print formatted. Our format string is "%d" which tells fprintf to convert the number to a sequence of digit characters which represent the number. The last byte is the ASCII newline ('\n') character, which is decimal 10 in the ASCII chart. Unix writes out the newline as a single byte of ASCII value 10 (or 00001010 in binary). Windows and DOS on the other hand write out two bytes for '\n': ASCII 13 (carriage return) followed by ASCII 10.
Suppose the value in our integer variable i was 64295. Then the sequence of bytes written to the output file would be:
| - | 1st byte | 2nd byte | 3rd byte | 4th byte | 5th byte | 6th byte | 
| character | '6' | '4' | '2' | '9' | '5' | '\n' | 
| decimal value from ASCII table | 54 | 52 | 50 | 57 | 53 | 10 | 
| binary | 00110110 | 00110100 | 00110010 | 00111001 | 00110101 | 00001010 | 
		And lastly, if i contained a negative integer like -7 then fprintf(outfile,"%d\n",i) would produce:
	
| 1st byte: '-' | 2nd byte: '7' | 3rd byte: '\n' | 
| 00101101 | 00110111 | 00001010 | 
You may be wondering where I got the decimal ASCII values for characters such as '6' or '\n' (newline). Most modern compilers represent characters using the ASCII table scheme. Although this is not required by the ANSI standard the exceptions are few. If you suspect your ANSI compiler does not use the ASCII table, you should write a simple program to echo the letters of the alphabet followed by their decimal value.
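A minimal sketch of such a check (our own) might look like this:

#include <stdio.h>

int main( void )
{
    char c;

    /* if the letters come out as 65..90 and 97..122, the system is using ASCII */
    for ( c = 'A'; c <= 'Z'; c++ )
        printf( "%c = %d\n", c, c );
    for ( c = 'a'; c <= 'z'; c++ )
        printf( "%c = %d\n", c, c );

    return 0;
}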
 
[ASCII table: each standard ASCII character listed with its decimal, HEX, octal and HTML value]
Look at the ASCII table and notice it is a list of 7 bit values from 0 through 127. Thus a text file is a file that contains bytes with values no greater than 127 - in other words it only contains ASCII characters. As it turns out, nearly all systems map some graphical character to each of the values 128 through 255. This is sometimes called the extended ASCII character set. These extended value mappings are not standard and vary greatly from system to system. Our table above lists each character in the standard ASCII set by its decimal, HEX, octal and even HTML value. For instance, take the graphical character 'A'.
'A' is decimal 65, HEX 41 and octal 101 (and binary 01000001 which is not shown in our table).
		Let's go back and look at our example above where we showed what would be written to the text file if we executed fprintf( outfile, "%d\n", i ); where i is 64295. This time we will display that sequence in HEX. We also split up the 8 bit byte in the bottom row into 2 groups of 4 bits. The ease of translation from a HEX digit to a group of 4 bits becomes apparent. In fact HEX is very often used instead of binary to illustrate the exact bit pattern in a given chunk of memory. Hex is more compact and readable.
	
| '6' | '4' | '2' | '9' | '5' | '\n' | 
| HEX 36 | HEX 34 | HEX 32 | HEX 39 | HEX 35 | HEX 0A | 
| 0011 0110 | 0011 0100 | 0011 0010 | 0011 1001 | 0011 0101 | 0000 1010 | 
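As a sketch, you can reproduce that hex view yourself with the %X conversion (the string literal here is our own):

#include <stdio.h>
#include <string.h>

int main( void )
{
    const char *text = "64295\n";
    size_t i;

    /* print each byte of the string as two hex digits */
    for ( i = 0; i < strlen( text ); i++ )
        printf( "%02X ", (unsigned char) text[i] );
    printf( "\n" );                /* prints: 36 34 32 39 35 0A */

    return 0;
}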
Download the following program skeleton and finish it. Your program will prompt the user to repeatedly enter an integer from stdin. However, it will be entered in a loop, one digit at a time followed by a return. As the user enters each digit you will update your calculation of the entire number being entered and echo its current value to stdout.
Use scanf("%c", &charVar) or charVar = getc(stdin) to read each keystroke from stdin. See the man pages for documentation on getc().
		| Declare a FILE variable as in: | FILE * txtFile; | 
| Use fopen as in: | txtFile = fopen( argv[1], "rt" );  /* ("rt"  means open for read as text. Use "wt" to write, "at" to append )*/  | 
| always verify the file variable is not NULL | if (NULL==txtFile) exit(EXIT_FAILURE); | 
| Read using fscanf() as in: | int result = fscanf( txtFile, "%d", &x); | 
| Verify the fscanf succeeded: | if (result !=1) { fprintf(stderr,"fscanf failed\n"); } | 
| Write using fprintf as in: | int result = fprintf( txtFile, "%d\n", x ); | 
| Verify the write succeeded: | if (result < 0) { fprintf(stderr,"fprintf failed\n"); } | 
| Close file when done as in: | fclose(txtFile);  | 
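Putting those steps together, here is a minimal sketch (our own, not one of the course files) that opens a text file named on the command line and reads a single number from it:

#include <stdio.h>
#include <stdlib.h>

int main( int argc, char *argv[] )
{
    FILE *txtFile;
    int   x;

    if (argc < 2)
    {
        fprintf( stderr, "usage: ./a.out inputFileName\n" );
        exit( EXIT_FAILURE );
    }

    txtFile = fopen( argv[1], "rt" );          /* open for reading as text */
    if (NULL == txtFile)
        exit( EXIT_FAILURE );

    if (fscanf( txtFile, "%d", &x ) != 1)      /* always check the conversion count */
        fprintf( stderr, "fscanf failed\n" );
    else
        printf( "first number in %s is %d\n", argv[1], x );

    fclose( txtFile );
    return 0;
}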
Suppose you have a text file that consists of a sequence of numbers with one number per line like below and that input file has a blank line at the bottom.
10
20
30
40
50
The above input file has a blank line at the bottom. You want to write a loop that repeatedly reads a number from the input file and then writes that number to stdout. You want to stop when you exhaust the input file of numbers. The following program illustrates a common error committed by novices as they try to use the feof( infile ) function to indicate when the input file has been exhausted. The feof function takes an open input file parameter and returns true or false according to whether end of file has been reached. The problem with our read loop is that the newline at the bottom of the file causes the feof() function to NOT report EOF. As a result of this newline we enter the loop one more time. The fscanf eats the newline and of course can't convert it to a number. No new value is stored in x, which still contains the last successful conversion. fscanf returns the EOF value (which is never positive), but since we are not looking at the value returned by fscanf we go ahead and write out the old value of x again.
The correct approach is to break as soon as your read does not return the expected value rather than relying on feof. Once the loop is exited you are free to look at feof().
[source listing: feof-caveat.c]
Download and try this incorrect code for yourself. The correct way to read the file is:
while ( fscanf(txtFile, "%hi", &x) == 1 )
{
    /* write x out to some output file */
}
Note that in the correct form above we do not enter the loop again unless fscanf returned exactly the number of conversions requested. We know that fscanf will only return 1 when it successfully finds another number in the file. Any other value from fscanf() and our loop stops. This is the correct idiom for reading all of the numbers in the file. As an exercise, edit the feof-caveat.c file and replace the bad loop with the correct loop above. Test it again. What happens now?
We now illustrate reading and writing binary files using fread and fwrite. Recall that binary files are not intended to be readable by people. Such files can contain bytes whose value is greater than 127. Unix does not distinguish between text and binary files; to Unix, all files are just sequences of bytes. C however does make a distinction, and that distinction is really that binary files are read and written using functions that do no formatting or conversion. fread() and fwrite() simply copy a chunk of bytes between RAM and disk.
It is important to open a binary file using the "b" mode in the fopen() because on some platforms (Windows) you may get incorrect output if you fwrite to a binary file that was not opened using the "b" mode.
Our next program demonstrates fread() and fwrite() and arrays of ints.
[source listing: fileio-2.c]
First we have int x=7;
| 32 bits in variable x | |||
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 00000000 | 00000000 | 00000000 | 00000111 | 
Then fwrite( &x, sizeof(x), 1, outfile ); and what gets written to the file is:
| 32 bits written to the file | |||
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 00000000 | 00000000 | 00000000 | 00000111 | 
Note that fwrite() simply copied a chunk of memory from RAM to disk without formatting or converting it. Now let's look at the meaning of each parameter of fwrite().
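Here is the call we just made, annotated parameter by parameter (the prototype shown is the standard one from stdio.h):

/* the standard prototype:
   size_t fwrite( const void *ptr, size_t size, size_t nmemb, FILE *stream ); */

fwrite( &x,        /* ptr:    address of the first byte of the chunk to copy out */
        sizeof(x), /* size:   number of bytes in one item (4 for our int)        */
        1,         /* nmemb:  how many items of that size to write               */
        outfile ); /* stream: the open file that receives the bytes              */

/* the return value is the number of complete items written: 1 if this call succeeds */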
The same thing applies for the short int and the char variables. Let's skip to the array being written to disk.
 
	Again, the chunk is simply copied to the file with no formatting or conversion. Let's look at the parameters again:
Exercise: how would you rewrite fwrite(arr1, sizeof(arr1),1, outfile ); in a platform independent manner, such that you replace the literal '1' with the actual number of ints in the array, and pass as the size the number of bytes in the data type the array holds? Don't hardcode the number 20 anywhere, or the data type int in the value you give the sizeof operator. One way to do it is shown below.
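One common way to write it (a sketch of the idiom, assuming arr1 is declared as a true array in this scope rather than received as a pointer parameter):

/* sizeof arr1 is the whole array in bytes, sizeof arr1[0] is one element,
   so the quotient is the element count no matter which platform compiles this */
fwrite( arr1, sizeof arr1[0], sizeof arr1 / sizeof arr1[0], outfile );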
		Assume we just did: fwrite( &x, sizeof(x), 1, outfile ); and what got written to the file was:
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 00000000 | 00000000 | 00000000 | 00000111 | 
Now read it back in with: fread( &x, sizeof(x), 1, infile ); and this is what memory looks like:
| 32 bits in variable x | |||
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 00000000 | 00000000 | 00000000 | 00000111 | 
This is pretty simple. We just did the inverse. We copied a chunk byte for byte from disk to RAM. No formatting or conversion. Notice that the parameters are exactly the same. The only difference is that the last parameter is now the source and the first parameter is now the destination. There are a few caveats or "gotcha" cases that must be understood and avoided. Suppose we declare a short int like this: short int shint = 7; and we write it out to disk with fwrite() like this:
short int shint = 7;
fwrite( &shint, sizeof(shint), 1, outfile );
	What got written to disk looks like this:
| 16 bits of shint | next byte junk | next byte junk | |
| 00000000 | 00000111 | 10010010 | 11110111 | 
So we see that we have only written out 2 bytes to disk - that's all the memory our variable shint occupies, so that's all our fwrite() put out. Now suppose we read those bytes back from the file into a full 32 bit int:
fread( &x, sizeof(x), 1, infile ); /* x is full 32 bit int -- not a 16 bit short int */
Well.. memory would now look like this:
| 32 bits in variable x | |||
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 16 bits from shint | 16 bits junk AFTER shint | ||
| 00000000 00000111 | 10010010 11110111 | ||
This is all WRONG! If we print x, out comes a number much bigger than 7! We screwed up when we read a 4 byte chunk from a spot where only a 2 byte object was written, and got 2 bytes of data and 2 bytes of garbage. We have no idea what those extra two bytes are! Let's now look at the inverse error.
Suppose we just wrote our 32 bit x to the disk and then we execute: fread( &shint, sizeof(shint), 1, infile );. What do we get when we print our short int? We see a big fat 0 come out because we only grabbed the first two bytes of x. Those bytes are all zeros; the second two bytes, which contain the value 7, are never read into our short variable.
Moral of the story: read exactly what you wrote. If you don't, you will lose data, pick up garbage, or overwrite memory that shouldn't be written to. Unlike Java, C does not throw an exception for you or necessarily crash when you do something dangerous like touch memory you don't own. The result of any such memory error is always the same - unpredictable/undefined behavior. This is the worst thing that can happen. There is no sure way to recognize and recover from such errors. You must avoid them with strict adherence to the rules.
	
fread and fwrite can fail for reasons other than memory errors. These other failure modes are easily detected using ferror() and feof(). It is the programmer's responsibility to check for an error after every I/O operation. Here is a program that writes and reads a binary file and does error checking after each I/O attempt.
fileio-3.c
/* fileio-3.c
   demonstrates the following:
   - FILE, fopen, fwrite, fread, fclose
   - writing and reading binary files of primitives
   - formatted console I/O of values read from binary files
   - #define
   - ferror() - returns non-zero if an I/O error has occurred on the stream; the error indicator stays set until it is cleared (for example with clearerr())
   - feof() - returns non-zero if end-of-file has been encountered on the stream just read
   Expects a command line arg: name of binary output file to be created
   Writes ten ints followed by ten doubles to that file then reopens the file for reading, and echoes values to console
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NUM_INTS      10
#define NUM_DOUBLES   10
int main( int argc, char *argv[] )
{
	FILE *binaryFile;
	int     i;       /* loop counter as values are writen/read to/from file */
	int     iVal;    /* a sequence of these will be written to outfile */
	double  dVal;    /* followed by a sequence of these  */
	if (argc < 2)
	{
		printf("usage: ./a.out  | 
| Declare a FILE variable as in: | FILE * binaryFile; | 
| Use fopen as in: |  binaryFile = fopen( argv[1], "rb" );   /* ("rb"  means open for read as binary. Use "wb" to write, "ab" to append) */ | 
| always verify the file variable is not NULL | if (NULL==binaryFile) exit(EXIT_FAILURE); | 
| Read using fread() as in: | int result = fread( &iVal, sizeof(iVal), 1, binaryFile ); | 
| Verify the fread succeeded: | if (result != 1) fprintf(stderr,"fread failed\n"); | 
| Write using fwrite() as in: | int result = fwrite( &i, sizeof(i), 1, binaryFile ); | 
| Verify the fwrite succeeded: | if (result != 1) fprintf(stderr,"fwrite failed\n"); | 
| Close file when done as in: | fclose(binaryFile);  | 
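Since the listing above is cut off after the usage check, here is a sketch of how the rest of fileio-3.c might go, using the variable names and constants from the fragment above and checking every I/O attempt. Treat it as an illustration of the pattern, not the original code.
	binaryFile = fopen( argv[1], "wb" );
	if (NULL == binaryFile)
	{
		fprintf(stderr, "fopen for write failed\n");
		exit(EXIT_FAILURE);
	}
	for (i = 0; i < NUM_INTS; i++)
	{
		iVal = i;
		if (fwrite( &iVal, sizeof(iVal), 1, binaryFile ) != 1)
		{
			fprintf(stderr, "fwrite of int failed\n");
			exit(EXIT_FAILURE);
		}
	}
	for (i = 0; i < NUM_DOUBLES; i++)
	{
		dVal = i + 0.5;
		if (fwrite( &dVal, sizeof(dVal), 1, binaryFile ) != 1)
		{
			fprintf(stderr, "fwrite of double failed\n");
			exit(EXIT_FAILURE);
		}
	}
	fclose(binaryFile);
	/* reopen the same file for reading and echo the values to the console */
	binaryFile = fopen( argv[1], "rb" );
	if (NULL == binaryFile)
	{
		fprintf(stderr, "fopen for read failed\n");
		exit(EXIT_FAILURE);
	}
	for (i = 0; i < NUM_INTS; i++)
	{
		if (fread( &iVal, sizeof(iVal), 1, binaryFile ) != 1)
		{
			if (feof(binaryFile))   fprintf(stderr, "unexpected end of file\n");
			if (ferror(binaryFile)) fprintf(stderr, "fread of int failed\n");
			exit(EXIT_FAILURE);
		}
		printf("int    %d: %d\n", i, iVal);
	}
	for (i = 0; i < NUM_DOUBLES; i++)
	{
		if (fread( &dVal, sizeof(dVal), 1, binaryFile ) != 1)
		{
			if (feof(binaryFile))   fprintf(stderr, "unexpected end of file\n");
			if (ferror(binaryFile)) fprintf(stderr, "fread of double failed\n");
			exit(EXIT_FAILURE);
		}
		printf("double %d: %f\n", i, dVal);
	}
	fclose(binaryFile);
	return 0;
}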
Now that we can write, compile and execute a trivial C program, we can apply some Unix tools, starting with stream redirection (then pipes and eventually shell scripts). As we do, keep in mind that one of our primary uses of Unix tools is program verification. Once a piece of code is written, how can we streamline the testing and verification of that code? Suppose you know what the correct output of your program should look like and you want to compare that reference output repeatedly against your program's output until they match. The low tech way would be to have a hard copy of that reference output in hand to compare against what your program just printed to the screen. Alternatively, you might have a second window displaying that reference output to visually compare to your program's output. This method is awkward and prone to error. A better solution is to redirect the screen output produced by your program and save it into a file, then use the diff utility to look for differences between your program's output and the reference output. To do this we must explain the standard streams and show you how to redirect them.
A stream is either a source or a destination for data. The three standard text streams you will use most often are stdin, stdout and stderr. Another destination you will often redirect output to is /dev/null, which silently discards whatever is written to it.
		When your program executes a scanf(..) statement, you are reading text from stdin, which by default is the keyboard.
		When your program executes a printf(..) statement, you are sending text to stdout, which by default is the terminal.
		When your program executes fprintf(stderr, "Fatal Error! program aborting\n"), you are sending text to the stderr stream, which will also show up on your terminal. We will differentiate stderr from stdout shortly; a small test program that writes to both streams follows this list.
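Here is a tiny test program (a new sketch, not one of the course's numbered examples) that writes one line to each stream. Keep it handy, because later in this chapter we redirect stdout and stderr separately and it is useful to see which line ends up where.
#include <stdio.h>
int main( void )
{
	printf("this line goes to stdout\n");             /* normal program output   */
	fprintf(stderr, "this line goes to stderr\n");    /* error/diagnostic output */
	return 0;
}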
		The majority of Unix commands that produce text output write that output to stdout. Since we want to demonstrate redirecting stdout to a text file, we don't have to write a program to produce output. We can just execute any number of Unix commands that write to stdout and then redirect that output to a file. Once this simple process is understood we can go back and redirect the stdout of any program we have written. In the screen shot below we execute the ls command and see its output to stdout (the screen). We then execute the same command but redirect stdout to a text file using the > operator.
(screen shot: ls, then ls > listing.txt, then ls again, then cat listing.txt)
In the above screen shot, our second ls command used the redirection operator > to redirect stdout to a file instead of the screen. As a result our third ls command shows a new file appearing in the directory. A cat of that new file proves that the output from the previous ls command got stored in the file "listing.txt". Note that the format of the text in the file is slightly different from the same text displayed originally on the screen. The redirected output has a newline after each filename. This formatting difference only occurs with certain Unix commands; if we were to redirect the output of one of our C programs, Unix would not modify the format. In the case of the ls command, the assumption is that when a directory listing is redirected, the user of that redirected file will want the file names each on its own line to facilitate reading and processing them one at a time from the file. When we start doing scripting we will appreciate this convenient formatting that the output of some commands receives when redirected.
The redirection append operator >> behaves just like redirection but appends to the specified file instead of recreating that file. In the following example we use the echo command to send some text to stdout, but we redirect that stdout to a text file. We then get a listing of the directory and display the contents of the file using the more command.
(screen shot: echo redirected into a text file with >>, followed by ls and more of that file)
Remember that any program which writes to stdout can be redirected. What we will demonstrate next is that any program that reads from stdin can have its input redirected to come from somewhere else. The redirection operator can cause a program to get its stdin from a file rather than the default stdin device (the keyboard). Suppose your program reads from the keyboard. You can store those keyboard inputs in a text file and then redirect that text file into your program, so that when your program runs it does not stop and wait for you to type. Instead, every time your program executes a read from stdin, it will take text from the text file rather than from the actual keyboard device. Let's look at an example using the solution executable to our next exercise and the input file we just created that contains a line of text containing the number 73.
5' 10"which is the number of whole feet followed by a single quote, then a space, then the number of remainder inches followed by a double quote. The output must be exactly in this format to be correct. You must escape the double quote " character imbedded in your format string by putting a backslash \ before it. Otherwise the compiler will think it is the end of the format string.
Here are the steps to test your program against the reference solution executable
./a.out < input-1.txt > my-output-1.txt
./solution-2.exe < input-1.txt > solution-output-1.txt
diff my-output-1.txt solution-output-1.txt
The following form tells diff to ignore differences in case, blank lines and whitespace.
diff -b -B -i my-output-1.txt solution-output-1.txt
Now is a good time to show you another stream, stderr, and demonstrate the difference between stdout and stderr. A short while ago we executed our hello-1.c program and redirected its stdout to a file. As a result we did not see the prompts for the values to enter, nor did we see the prints of the values read in. The stderr stream is traditionally used to write error messages. Text printed to the stderr stream is not redirected when the > operator is used. Let's go back and modify your exercise 2 solution so that its error message is written to stderr instead of stdout. We will then execute our new program and redirect its stdout to a file.
Exercise 3: Modify your solution to exercise-2 such that the error message is written to stderr using: fprintf(stderr,"scanf failed on conversion to integer");
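Using the illustrative sketch above, the modified check might look like this:
	if (scanf("%d", &inches) != 1)
	{
		/* error text now goes to stderr, so it is not captured by > redirection */
		fprintf(stderr, "scanf failed on conversion to integer");
		exit(EXIT_FAILURE);
	}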
When you have it working, execute the following command
./a.out > output.txt
Notice that the error message (if scanf fails) still shows up on your terminal: only stdout was redirected into output.txt, while stderr still went to the screen.
stdin is stream 0, stdout is stream 1, stderr is stream 2.
Here are some rules for redirecting the standard streams:
Beware, the redirection operators are notorious for behaving differently under different shells. The following examples apply to the csh shell and may not work as illustrated if you are using a different shell.
| ls -l > dirlisting.txt | Creates a new file "dirlisting.txt" and dumps the output of the ls command into it (ls writes to stdout). | 
| echo 5 10 > input.txt | Creates a new file "input.txt" and writes 5 10 into it. Here 5 and 10 are two separate args to echo, so only one space is put between them in the file even if you typed several spaces on the command line. | 
| echo "5 10" > input.txt | Creates a new file "input.txt" and writes 5 10 into it. Since the 5 and 10 are quoted there is only one argument to the echo command and that entire quoted string with all embedded spaces is written to the file. | 
| ./a.out > output.txt | Creates a new file "output.txt" and redirects into it any text that a.out wrote to stdout. Anything that a.out wrote to stderr is not put into the file. | 
| ./a.out < input.txt | Redirection operators can work from right to left. Executes a.out and causes it to get its input from input.txt rather than stdin. The assumption here is that a.out reads from stdin and we are replacing stdin with our input file so that every time a.out tries to read from stdin it will instead be reading from input.txt. | 
| ./a.out >> output.txt | Same as above but appends to output.txt rather than re-writing it. Stream numbers may be added just like in the above examples. | 
| ./a.out >& output.txt | Redirect both stdout and stderr to a text file. Any text written to either stream ends up in output.txt | 
Redirection operations can be parenthesized. This is what you must do to redirect stdout and stderr separately when using the csh (C) shell.
| (./a.out > out.txt) >& err.txt | Redirects stdout from a.out into out.txt, and redirect stderr from a.out into err.txt. | 
Write a small program that reads a number from stdin using scanf and echoes that number to stdout with a newline after the number. Compile it and name the executable a.out. Test it, then apply these commands to that executable. Do you understand why they don't work?
input.txt > ./a.out
cat input.txt > ./a.out
In the first command you violated the first rule: the first arg on the command line must be an output producing command (not just a file name). The second command causes your executable to be overwritten by the contents of the input file. Not what you wanted either. We now show you how to accomplish what you were trying to do with the second command.
Consider again this ill-fated command from the section above: cat input.txt > ./a.out. It is intuitive to see that the intention was to make the a.out executable take its input from the input file. Unfortunately we overwrote our executable instead. The right way to accomplish the intent of the command is to use the pipe operator |, which is the vertical bar character, typically found above the ENTER key.
cat input.txt | ./a.out
The above command accomplishes exactly what we had been trying to do. Our a.out program gets its input from the file (via cat and the pipe) instead of the keyboard. To illustrate a more powerful use, we combine multiple Unix commands with the pipe operator.
| wc filePath | reports how many newlines, words and bytes a file contains. | 
$ cat foo.c
#include <stdlib.h>
#include <stdio.h>
int main( )
{
  int x=0;
  printf("Enter a small positive number: ");
  fflush(stdout);
  scanf("%d", &x );
  printf("x= %d\n",x);
  return 0;
}
$ wc foo.c
 14  25 187 foo.c
The behavior of the wc command can be customized by using switches such as:
-c print the byte counts
-m print the character counts
-l print the newline counts
-L print the length of the longest line
-w print the word counts
We can now pipe the results of commands such as ls, find or grep into the wc command to count the number of items those commands found. In the examples below we pipe those results into wc with the -l switch so that the line count tells us how many items were found.
Here we use the ls command to find the .c files in our current directory.
$ ls *.c
fixed.c  fscanf.c   lab2.c           listOps.c  m2.c  m4.c      pt.c      sol2.c
foo.c    getline.c  lab3-template.c  m1.c       m3.c  mutual.c  sizeof.c  solution-2.c
$
$ ls *.c | wc -l
16
$
Here we use the find command to find all files ending in .c in our current directory's entire hierarchy. We then pipe the results into the wc command with the -l switch to produce a count of those files.
$ find -name '*.c'
./OldFiles/listOps.c
./OldFiles/lab3-template.c
./OldFiles/lab2.c
./OldFiles/sizeof.c
./OldFiles/getline.c
./OldFiles/mutual.c
./OldFiles/C/hello-1.c
./OldFiles/C/solution-2.c
./OldFiles/fixed.c
./OldFiles/fscanf.c
./OldFiles/solution-2.c
./OldFiles/sol2.c
./OldFiles/m1.c
./OldFiles/m2.c
./OldFiles/m3.c
./OldFiles/m4.c
./OldFiles/pt.c
./OldFiles/foo.c
./listOps.c
./lab3-template.c
./lab2.c
./sizeof.c
./getline.c
./mutual.c
./C/hello-1.c
./C/solution-2.c
./fixed.c
./fscanf.c
./solution-2.c
./sol2.c
./m1.c
./m2.c
./m3.c
./m4.c
./pt.c
./foo.c
$
$ find -name '*.c' | wc -l
36
$
Below we grep for all occurrences of "exit" in all the .c files in our current directory and then count the number of matches by piping the grep output into the wc command.
$ grep exit *.c
lab3-template.c:                printf("Initial malloc of dictionary failed. Program exiting\n");
lab3-template.c:      if ( !outfile ) exit(EXIT_FAILURE );
lab3-template.c:      exit( EXIT_SUCCESS );
lab3-template.c:  if ( !infile ) exit( EXIT_FAILURE );
listOps.c:freeing all memory correctly before exit worth 5% X/C
listOps.c:                      exit( 0 );
listOps.c:      exit(0);
m2.c:  exit( EXIT_FAILURE ); /* a NON_ZERO value */
m3.c:      exit(EXIT_FAILURE);
m3.c:  exit( EXIT_FAILURE ); /* a NON_ZERO value */
m4.c:     exit( EXIT_FAILURE ); /* a NON_ZERO value */
m4.c:  exit( EXIT_FAILURE ); /* a NON_ZERO value */
pt.c:  exit( EXIT_FAILURE);
sol2.c:- exit() function
sol2.c:  exit( EXIT_FAILURE);
solution-2.c:- exit() function
solution-2.c:  exit( EXIT_FAILURE);
$
$ grep exit *.c | wc -l
17
$
Another command that interacts powerfully with the pipe operator is xargs. The xargs command lets you execute some command on each match that another command has produced. For example, suppose you want to delete all files in a directory hierarchy that satisfy some criteria. You would use the find command to produce the matching filenames, pipe those matches into the xargs command, and put the rm command after xargs to indicate that you want xargs to execute rm on each of the filenames found by find.
Let's delete all .swp files in the entire handin directory of this 15-123 course on Carnegie Mellon's AFS file space. These .swp files are the traces of students attempting to create or edit files from within the handin directory. Since students only have the privilege to copy files into the handin directory, but not to edit or create files from within it, a .swp file is the by-product of a rejected attempt to do so. First we use the find command to see a list of all the .swp files. Then we re-issue the command and pipe the list of files into the rm command via the xargs command.
$ cd HANDIN
$
$ pwd
/afs/andrew.cmu.edu/course/15/123-tlh/handin
$
$ find -name '*.swp'
./mjelin/quiz-00/1/.swp
./irm/quiz-00/1/.swp
$
$ find -name '*.swp' | xargs rm
$
$ find -name '*.swp'
$
Our initial listing shows only two .swp files in the entire handin hierarchy. Not bad. Students are catching on to the rules. Next we re-issue the find command, but this time we pipe its output into the xargs command. The xargs command applies whatever command we specify (in this case rm) to the lines of output being piped in from find. It is important to understand that simply piping the output of find into rm would not work: rm does not read filenames from stdin, it expects them as command line arguments. This is where xargs mediates between the results of find and the rm command, handing the filenames that find produced to rm as arguments. Once the command completes we do another find just to verify we got them all.
Unix has a utility to combine and compress multiple files and directories into a single file. The tar utility got its name from "tape archive". Many Unix systems still use some form of tape backup, although more modern and denser media are becoming increasingly popular. Let's demonstrate the tar utility and then explain the switches. I have a directory named "classical" containing some songs. I want to archive that directory into a single tar file. This is analogous to zipping up a directory on Windows.
$ pwd
/afs/andrew.cmu.edu/usr20/tlh
$
$ ls classical/
ClairDeLune.wma  FantasieImpromptu.wma  Liebestraum.wma  MoonlightSonata.wma  RustlesofSpring.wma  UnSuspiro.wma
EtudeinE.wma     FurElise.wma           Malaguena.wma    PreludeinEminor.wma  TheMaiden.wma
$
$ tar zcfv classical.tgz classical
classical/
classical/Malaguena.wma
classical/MoonlightSonata.wma
classical/ClairDeLune.wma
classical/RustlesofSpring.wma
classical/TheMaiden.wma
classical/UnSuspiro.wma
classical/FantasieImpromptu.wma
classical/EtudeinE.wma
classical/PreludeinEminor.wma
classical/Liebestraum.wma
classical/FurElise.wma
$ ls classical.*
classical.tgz
$
Notice that the switches zcfv, which appear before the name of the archive file to be created, do not have a dash in front of them. The last argument is a list of files and/or directories to be archived. In the above example one lone directory was specified, but we could have specified a list of files separated by spaces, like this:
$ cd classical/
$
$ ls
ClairDeLune.wma  FantasieImpromptu.wma  Liebestraum.wma  MoonlightSonata.wma  RustlesofSpring.wma  UnSuspiro.wma
EtudeinE.wma     FurElise.wma           Malaguena.wma    PreludeinEminor.wma  TheMaiden.wma
$
$ tar zcfv mytar.tgz FurElise.wma Liebestraum.wma MoonlightSonata.wma RustlesofSpring.wma
FurElise.wma
Liebestraum.wma
MoonlightSonata.wma
RustlesofSpring.wma
$
$ ls
ClairDeLune.wma  FantasieImpromptu.wma  Liebestraum.wma  MoonlightSonata.wma  PreludeinEminor.wma  TheMaiden.wma
EtudeinE.wma     FurElise.wma           Malaguena.wma    mytar.tgz            RustlesofSpring.wma  UnSuspiro.wma
$
In our above illustrations we specified the z, c, f, and v options: z means compress the archive with gzip, c means create a new archive, f means the next argument is the name of the archive file, and v means verbose, so tar lists each file as it processes it.
To uncompress an archive use the tar command with the option x. First we delete the classical directory. Then we untar the classical.tgz file to recreate the original classical directory.
$
$ rm -rf classical
$
$ tar xzf classical.tgz
$ ls classical
ClairDeLune.wma  FantasieImpromptu.wma  Liebestraum.wma  MoonlightSonata.wma  PreludeinEminor.wma  TheMaiden.wma
EtudeinE.wma     FurElise.wma           Malaguena.wma    mytar.tgz            RustlesofSpring.wma  UnSuspiro.wma
$
The x option means extract (untar) the archive. The z option means the archive was compressed using the z (gzip) option and should be uncompressed using gzip. The f option means the name of the file to be un-tarred is specified on the command line, in this case classical.tgz. A directory named classical is created in the same directory where the command is issued.
Warning: The options used above, such as zcfv or xzf, are written in the old style: bunched together with no spaces or dashes between them. If you want to use the dash '-' before them you must put a dash before every switch individually, such as -x -v -f. If you mix the two styles you will get incorrect results.