In this course we always and only use the gcc compiler distributed by the GNU Project at http://gcc.gnu.org/. The GNU gcc compiler is open source and free. It is known in industry and academia as a very "tried and true" reference quality, highly supported implementation of the ANSI standard for the C language.
C allows programmers to make mistakes which create an unsafe and unpredictable state while making no promise to detect, warn or recover from the bad things it allows us to do. If you index beyond the end of an array or dereference a pointer to memory that you don't own, the C compiler does not promise to detect such infractions and throw an exception to force you to safely handle the situation. In some cases your program will crash at the point of the violation. This is the best case scenario. Unfortunately, what often happens is that it continues to execute in a corrupted state and may run to completion but produce incorrect results. Sometimes your program will execute for a while only to crash at a seemingly unrelated point later. Other times it runs to completion and produces no detectable error or mistake. This last scenario is analogous to playing Russian Roulette and walking away unscathed.
The bad news is that once you violate the rules of the C compiler, the behavior of your program is no longer guaranteed by the C compiler. In such a case your program's behavior may be determined in part by the hardware and software platform it is running on. As a result your program may act quite differently from execution to execution. More bad news is that if your program crashes, it may crash at a point that has no apparent relation to the place where you did the bad thing that set up the disaster. The best thing that can happen when you violate your program is that it crashes right away. The worst case scenario is that it never crashes and you get different results running the same code on a different platform or even get varying results on the same machine from run to run. This notion of a program going into an undefined state and behaving differently each time you run it is something students have a hard time understanding. Novices in C continually demonstrate this by saying things like "But it ran fine on my Linux machine! How could it crash on the Mac?". Such statements show that the notion of undefined state is completely foreign to those who have cut their teeth on a language like Java that watches every line of code that is executing to detect violations of integrity such as indexing past the end of an array or parsing a number that is too big for a variable (value overflow). The C compiler trades away that security for performance. Make no mistake, the performance hit borne by a language like Java to do this kind of sandboxing is quite significant. C is leaner and meaner but leaves you vulnerable to a bizarre new universe of undefined state and unpredictable behavior if you violate the rules.
The only cure for such vulnerability is prevention. The best heuristic for prevention is a thorough understanding of memory, addressing, pointers and platform dependence. Only when these concepts are clearly understood can a set of good and safe practices be derived. In this first chapter we introduce the interactive Unix command shell which is the environment you will live in. We then begin to lay the foundation for understanding memory organization and how C uses it to store variables. Chapter one will also serve as a preview to the entire course - getting you to where you can write a simple C program and use simple Unix tools to verify it. This chapter's goal is to let you see the possibilities for combining C and Unix to bring clarity and simplicity to your development cycle. This breadth first overview will open your eyes and pique your interest for what's to come. Before we write "Hello World" in C, we need to familiarize ourselves with Unix so we can log in, navigate files and directories, edit and compile. We start with command shells.
The Unix operating system has evolved into many variants of Linux which support Windows-like GUIs. This course is not interested in any GUI interfaces to Unix. Instead we focus on the command line interface. When we connect to Unix we connect to a command shell which is not a GUI and does not support drag and drop or other Windows like operations. Most connection programs will however let you do some basic copy/paste to/from the session window. Unix command shells present a window that displays a prompt (such as the % sign) at the start of the line and waits for a command to be typed in. You execute that command by hitting return then wait for it to complete. You know that command has finished when the prompt comes back to you. There are several popular shells available on every Unix system. When you log into your Unix account you are assigned some shell chosen by you or whoever set up your Unix account. It is not uncommon for Unix users to have strong preferences about which shell is best to use for what. Most agree however that some shells are better suited for interactive command line work, while other shells are better for scripting. Scripting is the art of writing programs which consist of shell commands. Such scripts are then executed as a program just like you might execute a Java or C program. All the shells will let you execute their commands at the command line or batched together in a file. The difference is that when you execute shell commands inside the script file you have access to more than just the one-liner commands. Inside the script file you have access to that shell's programming language syntax with conditionals, loops, function calls and more. When you use a shell for scripting, the differences between shells begin to show - especially as you do more complex and interesting things.
In this course we will be connecting to our Pitt Linux machines to do our C language work. Most students are assigned a tcsh shell connection by default at login time. If you are an experienced shell user you can of course change your shell to one of the other variants available on Pitt's Linux network.
We just made a distinction between using a shell for interactive commands and using a shell for scripting. It's time to illustrate and explain those concepts with some concrete examples. We assume that you have been shown how to log onto your school or institution's Unix machine and authenticate a connection to your user account. We will not teach shell scripting in this course. We just want to show you an example of command line interaction with a shell vs. writing a script containing shell commands, so that you understand the distinction between the two.
The screen shot below illustrates an interactive shell session. This session consists of the login and then proceeds to issue some shell commands. You may often hear the terms shell commands and Unix commands used interchangeably. Every command is just a program. Someone had to write it. The commands that come with Unix are programs. Soon we will write C programs, shell scripts and PERL scripts that are executable and can be used just like the built in Unix commands. If you write a C program and compile it into an executable binary file named a.out, then a.out can be executed by typing its name on the command line. It becomes a verb or command - even though it was written by you and is not a built in command. Whatever name they are called by - programs, commands, scripts, verbs or executables - they were written by someone using some programming language and they are stored in a file somewhere. They can be executed by invoking their executable file's name on the command line. A simple Unix command named where will even tell you where (the full directory path) on your Unix machine a command's executable file is located. The built in Unix commands that you invoke from the command line are generally called commands but some are often referred to as utilities. Commands and utilities are very small programs that are specific and narrow in their function. Whatever shell you are running on Unix will have many built in commands and utilities.
We will demonstrate a few commonly used commands that all the shells have in common - commands that allow you to list and read files, navigate directories and get information about your Unix system and environment.
| echo  $SHELL | The echo command is printing the value of an environment variable named SHELL. The $ sign in front of SHELL tells echo to print the value of SHELL not the literal text SHELL. The value of the SHELL variable is the full path to whatever shell program we are running under and interacting with right now. My shell is /bin/csh - also known as the C shell. Notice we are getting the full path of the program that is our shell. The actual program name is csh and it lives in the /bin directory right off the root of our Unix machine's file hierarchy. My C shell offers a percent sign as the command prompt. Your C shell may be set up to use $ sign or some other string as the prompt. | 
| set  prompt="$ " | Allows us to change the % prompt to something other than the percent sign. I like the $ sign followed by a space. | 
| pwd | Print working directory tells us what directory we're in. | 
| cd C | Steps us into the C subdirectory. | 
| ls *.c | Shows only files in this dir that end with .c | 
| ls -l *.c | Lists each file on its own line with properties of each file such as access privileges, owner, size, date etc. | 
The above session was an example of command line interaction with a shell.
A shell script can be as simple as a text file containing a few one-liner commands. All you need to do to write a shell script is use the very first line of your script file to specify which shell you want to interpret the commands that follow in the file. Remember, there are at least a half dozen popular Unix shells and each of them will let you use it interactively on the command line or to execute commands batched together in a file. Every time you write a script file you must specify which shell program you want to interpret (execute) the commands in the file. Below is a sample shell script file that uses the Bourne shell located at /bin/sh to interpret the commands in the file. Shell script files are usually named with a .sh extension to indicate to readers what they are.
[script listing: shellscript.sh]
We just said this but it's worth repeating to make sure the distinction between a command shell and a shell script is clear. The above shell script is just a text file containing a few one-liner shell commands that could have just as easily been typed at the terminal interactively. When you write a shell script, however, you gain access to that shell's entire programming language of loops, conditionals and function calls. Of course you can always write a shell script that does not take advantage of the programming syntax - like the shell script above: no loops or conditionals, just a sequence of simple one-liner shell commands to be executed sequentially.
The #! must be the first two characters in the first line of the file. It is called a SHA BANG and is followed by the path to the shell program that we want to be interpreting/executing the commands. At the very end of the file our last command is an echo which is fed two arguments. The first is a literal string and the second is the value of an environment variable named SHELL.
No executable program, script or command file can be executed unless that file's execution flag is turned on.
-rw-r--r-- 1 tlh staff 203 Jun 4 14:24 shellscript.sh
Looking above at the first 10 characters of the directory info for shellscript.sh we see -rw-r--r--. The leading dash means this is an ordinary file (a d would mean a directory). The remaining nine characters are three rwx triplets giving the read, write and execute flags for the file's owner, its group, and everyone else respectively. None of the x flags are set, so this script is not yet executable.
Unix provides built in help in the form of manual (man) pages. These help pages are probably not as user friendly as most novices would like, but they are helpful. Use the man command to get help on any command: just type man followed by a command name and the manual pages for that command are displayed. Below we view the man pages for the chmod command, the command used to change a file's permission flags:
[man pages for chmod]
Here are some commands that will come in handy for the work we do in Chapter 1.
Many of these commands report their output to stdout, which is one of several standard streams in Unix. In Unix a stream is a source or destination for data, and some streams are associated with an I/O device: stdout represents the screen (terminal, monitor, console) and stdin represents the keyboard. When we say that a command prints this or that, we mean it displays text to stdout. Commands that take input often read it from stdin. We will talk about the other standard streams later.
Connections and users
| whoami | prints the ID of the user currently connected/logged-in to this shell/command window. It may seem redundant to ask the system who you are, but this command becomes useful if you encounter an abandoned terminal that someone has walked away from without logging out. By typing whoami you are shown the ID of that user. Should this ever happen, please be considerate and log that user out via the logout command | 
| logout | terminates your connection to the command shell and closes the connection window. Also kills any processes you may have started while logged in. | 
| who | prints a list of other users who are logged onto this Unix machine. | 
| finger userID | prints information about a particular user (user does not have to be logged in). | 
| whois domainName | is completely different than who, but you might accidentally use whois when you meant who. Try whois yahoo.com and see what you get. | 
The following commands deal with files. The term filePath is used to mean either the full file path such as /usr/local/bin/ls or a relative file path such as ../documents/resume.txt or simply a file name such as resume.txt.
| pwd | prints the full path of the directory you are in right now (current working directory). | 
| cd dirPath | changes your current working directory to be dirPath. If you leave dirPath blank then you are taken to your home directory | 
| ls | prints a list of the files & directories in your current working directory. Hidden files (those starting with a dot ".") are not listed. | 
| ls dirPath | prints a list of the files & directories in directory dirPath. Hidden files (those starting with a dot ".") are not listed. | 
| ls -a | prints a list of files & directories in your directory. Hidden files/directories are also listed. Any file or directory that starts with a '.' (DOT) will not show up in a listing unless the -a switch is used. | 
| ls -l | prints a list of files & directories in your directory. One line per file. Meta info about each file is also displayed on the line. | 
| ls -al | combines the -a (all) switch with the -l (long listing) switch. All files & directories including hidden ones are displayed one per line with meta info about each file on the line. | 
| mkdir dirPath | creates a new directory. | 
| cat filePath | prints contents of specified file. | 
| more filePath | prints contents of specified file one screen at a time. Space bar prints the next screenful. | 
| less filePath | similar to more. | 
| head filePath | prints first few lines at top of specified file. | 
| tail filePath | prints last few lines at bottom of specified file. | 
| file filePath | examines specified file and makes a good guess as to its type. | 
| cp fromFilePath toFilePath | copies specified file. | 
| mv fromFilePath toFilePath | moves (renames) the specified file. The effect is like cp followed by deleting the source file. | 
| rm filePath | deletes specified file(s). Warning! this command can be a WMD | 
| rmdir dirPath | delete specified directory (or directories). Warning! this command can be a WMD | 
| find dirPath -name '*.txt' | finds and lists all files under the specified directory (and its subdirectories) whose name ends with .txt | 
| grep hello *.txt | searches all .txt files in current directory hierarchy for the text "hello". For each match a filename and the matching line of text from the file is displayed. | 
| wc FilePath | for specified file(s), displays number of lines, words, chars and bytes in that file. | 
| diff FilePath1 FilePath2 | displays the line by line differences between two text files. | 
| comm FilePath1 FilePath2 | like diff, but compares two sorted files and displays the lines they have in common as well as the lines unique to each. | 
| cmp FilePath1 FilePath2 | like diff, but for binary files. | 
| date | displays current date and time | 
| od FilePath | displays specified binary file in octal or hex. | 
You have seen enough Unix to begin learning some C. We had to show you the basics of files and directories so you don't get lost when compiling and executing source files and executables. Now let's talk about bits, bytes and memory so that we can take our first look at C and get hands on with the gcc compiler.
In C, memory is an array of bytes. A char is always one byte wide. Every data type in C is represented as a chunk of bytes. Some data types require more bytes to store a value than others. C provides a sizeof operator which when applied to any data type's name (int, char, float etc.) or to any value producing expression (x, arr[i], &i etc.) returns an integer telling how many bytes of data that object or type uses. On any ANSI compliant C environment sizeof char is always one. A character always occupies exactly one byte of memory. All objects of all types in C are stored as a contiguous sequence of bytes. A byte is the smallest unit of memory that can be read/written from/to memory or a file. This notion of byte based representation is held in strong consensus as being a fundamental precept for the C language. All the statements in the above paragraph are true on all platforms for an ANSI compliant C environment. The one thing that is not guaranteed to be the same on all compliant platforms is the number of bits in a byte. The most common C environments use 8 bits per byte and use 32 bits for words/addresses. However, according to the documentation of the C89 standard:
"... on a machine with 36-bit words, a byte can be defined to consist of 9, 12, 18, or 36 bits, these numbers being all the exact divisors of 36 which are not less than 8. These strictures codify the widespread presumption that any object can be treated as an array of characters, the size of which is given by the sizeof operator with that object’s type as its operand. These definitions do not preclude holes in struct objects. Such holes are in fact often mandated by alignment and packing requirements. The holes simply do not participate in representing the composite value of an object."
As we will see, the sizeof operator is a very useful tool to find out platform specific information about your machine. For the rest of this course, our discussions and illustrations will use an 8 bit byte and 32 bit addressing platform since 8-bit / 32-bit still represents the majority of systems running C code at this writing.
Programs run in main memory (RAM). Think of memory as an array of bytes. Each byte is a chunk of bits (typically 8). Each bit is a single value that is either a 1 or a 0. Now think of memory as a long row of houses on one long street like "memory lane". Each house has its own mailing address. When you write a letter you address it to a house at some address. Likewise in RAM, the smallest chunk of memory that can be addressed is a byte. Every byte of memory gets its own address, starting at 0 and ending at 4,294,967,295 (2 to the 32nd power minus 1), giving 4 gigabytes of addressable memory.
This example is an illustration of a 32 bit addressing system. A system where addresses are stored in 32 bits and thus have a range of 0 through 4,294,967,295
| addr: 0 | addr: 1 | addr: 2 | addr: 3 | addr: 4 | addr: 5 | addr: 6 | addr: 7 | addr: 8 | addr: 9 | addr:10 | addr:11 | addr:12 | . . . | . . . | . . . | addr: 4,294,967,295 | 
| 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | . . . | . . . | . . . | 1 byte | 
Let's go back to our postal delivery analogy. Every house (byte of RAM) is located on memory lane and the first house (byte) is numbered 0 (zero). Each house thereafter is numbered 1, 2, 3, 4 etc. Any letter addressed to a house is simply labeled with the house number. No names (yet). Suppose the envelope being sent only has space for 6 decimal digits in the house number field on the face of the envelope. How many houses could possibly get letters on memory lane? That's simple: 1,000,000. Six decimal digits means houses whose address is between 0 and 999,999 can be addressed. Houses with higher numbers simply cannot get mail because of the addressing limitation.
With this in mind, what does it mean to be on a 32 bit address platform? It means that addresses are expressed as a 32 bit value. As a result no more than 4 gigs (2 to the 32 bytes) of memory can be addressed. How many bytes of memory can be addressed on an architecture with a 64 bit address space? The answer is 2 to the 64th - the square of 4G, which is a huge number. As such, 64 bit platforms (and there are plenty of 64 bit machines around) don't actually use all 64 bits to express an address. It's way overkill. No one could afford to buy that much RAM, nor could that much RAM be manufactured small enough (yet) to fit in a PC.
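If you are curious about the address width on the machine you are using, here is a minimal sketch (our own, not a course file) that prints the size of a pointer:

#include <stdio.h>

int main( void )
{
    /* 4 bytes on a 32 bit addressing platform, 8 bytes on a 64 bit platform */
    printf( "a pointer occupies %lu bytes\n", (unsigned long) sizeof(void *) );
    return 0;
}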
Let's look at how C uses memory to store variables on a typical 32 bit addressing platform.
C compilers on a 32 bit platform typically define an int variable to be the same size as an address value. 4 bytes. The GNU C compiler running on the AFS Unix machines at CMU is no exception. Let's look at some examples of how our C compiler stores variables that you might declare in your program. Assume a declaration such as int x. The compiler finds 4 consecutive bytes and names that memory x. What we mean by naming that memory x is that the name x is a synonym for the value of the variable x.
| addr: 0 | addr: 1 | addr: 2 | addr: 3 | addr: 4 | addr: 5 | 1'st byte of x | 2'nd byte of x | 3'rd byte of x | 4'th byte of x | addr:10 | addr:11 | addr:12 | . . . | . . . | . . . | addr: 4,294,967,295 | 
| 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | . . . | . . . | . . . | 1 byte | 
In this case the compiler picked a chunk of memory from byte #6 through byte #9. Four bytes to store a 32 bit integer.
Here is a char variable (char c;) being declared, with its storage illustrated below.
| addr: 0 | addr: 1 | addr: 2 | addr: 3 | addr: 4 | addr: 5 | c | addr: 7 | addr: 8 | addr: 9 | addr:10 | addr:11 | addr:12 | . . . | . . . | . . . | addr: 4,294,967,295 | 
| 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | 1 byte | . . . | . . . | . . . | 1 byte | 
In this case the compiler picked byte #6. One byte to store a char. A short int on our platform is a 16 bit int and thus would use 2 bytes of storage. A float variable would be 4 bytes and a double variable would be 8 bytes of storage. The compiler always chooses the memory location to store a variable. The programmer has no input into this decision. The only exception to this rule is that once we learn how to allocate dynamic memory we will have the option to store variables in a region of memory known as the heap. In either case however, we cannot request any specific memory address for the storage of a variable.
Let's go back to our statement that the name of a variable is a synonym for the value of the memory.
int x;   /*  x is the name of a 4 byte chunk of memory in RAM */
x = 7;   /*   to change the value in x  we use its name on the left side of an assignment statement  */
int y = x + 3;   /*  to lookup the value of x we use its name in an expression */
Once a variable is declared, the variable's name is a synonym for the contents (value) of the variable. There are other properties associated with variables, such as the type of the variable (int, char, float, double, etc). A variable's type implies what range of values can be stored in it, and how much storage that type requires. If you come from a Java background you already understand range of values, but the storage requirement property is not emphasized as much. Another property of a variable which is explicitly de-emphasized in (at least older versions of) Java is the address of a variable. The C language gives us operators to ask a variable how much storage it uses for its type and at what address that variable is stored.
Assume we have an int variable named x.
The sizeof operator is a very intelligent and useful operator: it accepts not only explicit type names like int and variable names like x, but also certain kinds of expressions, and it returns the number of bytes required by the operand's data type.
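Here is a minimal sketch (our own, not one of the course files) showing both questions being asked about x:

#include <stdio.h>

int main( void )
{
    int x = 7;

    /* sizeof yields the number of bytes the object occupies;
       cast to unsigned long so the value prints portably */
    printf( "x uses %lu bytes\n", (unsigned long) sizeof x );
    printf( "an int uses %lu bytes\n", (unsigned long) sizeof(int) );

    /* &x yields the address where the compiler chose to store x; %p expects a void pointer */
    printf( "x lives at address %p\n", (void *) &x );

    return 0;
}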
Here is a screenshot of output produced by a simple C program that we have written for you. It lists the primitive data types, their associated storage requirement and range of values. On different platforms, different results may be displayed. This output comes from execution on a 64 bit machine that is part of Carnegie Mellon's AFS network as of Fall 2009.
The output of sizeof.c on one of the 64 bit Linux machines running Carnegie Mellon's AFS (Andrew File System).
Once we have written a few C programs and covered the basics, you can come back and read the source code of sizeof.c. It lists the built in primitive data types and shows which format specifiers are used to print those data types, along with the range of values that can be stored in each type. This little program is a handy reference.
So far we have looked at how memory is laid out and how the C compiler sets aside chunks of bytes for variables. We have seen how to ask the compiler where a variable is stored (&x) and how many bytes that variable is occupying (sizeof x). Let's now look at how the values of integers are encoded. We want to see for instance, what is the exact bit pattern used to represent 7, -13 or 157 in an integer variable.
Signed vs. unsigned numbers. Integers in C are either signed or unsigned. The unsigned ones are the simplest to understand: they are just plain base 2 (binary) numbers. We start with the smallest integer variable - the char type. A char is a one byte object. This fact is true across all platforms; the sizeof operator will always return 1 for a char. A char variable is most often used to store an ASCII value which represents a letter of the alphabet or a punctuation character, but it is just a small integer. Sometimes a char is referred to as a byte, but there is no byte type per se in C.
unsigned char c =157; /* the bit pattern representing decimal 157 is 10011101 */
Confirming 10011101 as the binary representation of 157 is done as follows:
| 2^7 | 2^6 | 2^5 | 2^4 | 2^3 | 2^2 | 2^1 | 2^0 | 
| 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | 
| 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 
| 128 | +0 | +0 | +16 | +8 | +4 | +0 | +1 | 
The decimal equivalent of 10011101 is 128+16+8+4+1==157.  Thus the statement unsigned char c = 157 would put the bit pattern 10011101 into the 8 bits of variable c.
How is char c = -7; stored? Negative integers in C are stored in two's complement. In two's complement the high order bit is the sign bit: a 1 in the highest order bit means the number is negative, a 0 means the number is non negative. The word complement means that the negation of a number complements its absolute value. Just as complementary angles add up to 90 degrees, complementary binary numbers add up to 0. Thus the complement of a positive number is simply whatever bit pattern you have to add to that number to get a sum of zero. There is a simple procedure to form the two's complement of a positive number: just flip the bits and add 1. Let's do this with a simple value.
char c = 7;   /* bit pattern is: 00000111 */
c = -7;       /* now flip the bits to:   11111000 and add 1 which gives us: 11111001 */
  00000111     (  7 )
+ 11111001     ( -7 )
  --------
  00000000     (the carry out of the high bit does not fit in 8 bits and is discarded, so the 8 bit sum is zero - exactly what a complement should produce)
Two's complement is used for all the signed integer types, such as short int (16 bits), int (32 bits) and long int (again 32 bits on our platform). Signed numbers sacrifice their highest order bit for the sign, thus cutting the absolute magnitude in half. Unsigned numbers are encoded as simple binary and all the bits are used for magnitude.
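Here is a minimal sketch (our own) that prints the bit pattern of a char, assuming the 8 bit byte we use throughout this course, so you can watch two's complement at work:

#include <stdio.h>

/* print the 8 bits of a byte, high order bit first */
void print_bits( unsigned char byte )
{
    int i;
    for ( i = 7; i >= 0; i-- )
        printf( "%d", (byte >> i) & 1 );
    printf( "\n" );
}

int main( void )
{
    char c = 7;
    print_bits( (unsigned char) c );   /* prints 00000111 */
    c = -7;
    print_bits( (unsigned char) c );   /* prints 11111001 : the bits of 7 flipped, plus 1 */
    return 0;
}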
[source listing: hello-1.c]
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
		#include is a pre-compilation directive that instructs the compiler to replace the #include line with the text from the indicated file and compile it. Notice the name of the included file is in angle brackets. The file being pulled in for compilation lives in some predetermined directory where the C compiler was installed on your computer.  The find command can tell you where included files (or any other) are located on your system.  Soon you will write your own .h files and put them with your  main program file in its directory. In that case your include would look like #include "myfile.h". The quoted filename would be expected to be in the current directory. You could add path information to the file name such as #include "project1/myfile.h" or #include "../myfile.h" which would mean  "myfile.h" is located in a project directory or the directory above this one.
	
The int main( int argc, char *argv[] ) function is similar to Java in that a C program, like a Java program, must have exactly one main function (now that we are in C, we refer to methods as functions). The main function can be prototyped to return other types, but void and int are the most common. For this course we will adhere to the most common practice, which is to return an int. The argc parameter indicates how many strings were entered on the command line that invoked this program. The argv parameter is the array of strings passed to the program on the command line when the program is invoked. Once we cover strings, pointers and arrays you will fully understand the syntax and semantics of argc and argv.
	
int x,y;
printf("Hello World: Enter two small positive integers for x and y: ");
fflush(stdout); /* needed because output streams are buffered */
scanf("%d %d", &x, &y);
printf("You entered %d for x and %d for y\n", x,y);

C defines several streams. A stream is just a source or destination for data. The most commonly used are stdin (the keyboard) and stdout (the terminal or screen/monitor). These streams are always open and ready to be used. The printf function writes to stdout, while the scanf function reads from stdin. In the first printf we supply a string literal. Notice that we flush our stdout stream because I/O in C is buffered. Every time you write to stdout (or any other stream) you are really writing to a buffer in memory. Only when that stream's buffer is flushed does the data actually show up on the stream/device. The ANSI standard says that unless you flush the stream you are not guaranteed that the data reaches the stream. In this case it means you might never see the text show up on your screen. Novices are often puzzled when debugging code with print statements: they write printf after printf such as "in function foo" and "returning from function foo", and they are sure the program is going into foo and back out again, but the output never shows up on the screen. The reason is they never flushed the buffer. In virtually every ANSI C compiler I've used, the compiler does a flush for you every time a newline is written to the stream. Many programmers take this for granted and treat newline '\n' as a flush of a text stream. However, it is not guaranteed behavior. In ANSI C only a flush of the stream guarantees the buffered output actually gets written to the stream - but, as just mentioned, virtually every compiler treats newline as a flush, and nearly every programmer takes advantage of this. One more thing about streams: flushing only applies to output streams like stdout and stderr or a disk file you are writing to. You can't/don't flush input streams.
The scanf statement causes your program to stop and wait for the user to type some keystrokes followed by a RETURN. As soon as the user hits RETURN the entire sequence of keystrokes (string) is stored in the system buffer. Now the %d and %d embedded in the format string come into play. Those %ds are format specifiers for conversion to integer. We will soon survey the other conversion types like %s for string, %c for char, %f for float, and so on. Since there are two %ds the inputted string will be tokenized to extract the first two tokens. The first token will be converted to an integer, and the second will be converted to integer also. You can refer back to our sizeof.c program for an example of the primitive data types and the format specifiers used to print them.
The first converted value will be stored at memory address &x and the second successful converted value at memory address &y. Recall that the & (address-of) operator produces the address of the variable to its right. We are telling scanf to read a line of text from the keyboard, convert the first two tokens in the string to integers and store those integers inside x and y respectively.
Our scanf is a value returning function. It returns an int representing the number of successful conversions. Since our format string only had two % something or others in it, then our scanf can only return a value of 2, 1, or 0 depending on whether it found 2, 1, or 0 tokens that could be converted to int. In the case of an unsuccessful conversion, that respective variable does not get anything assigned into it. Unlike Java, scanf will not crash, throw an exception or give any indication if you enter your first name where a number was expected. It will simply fail to convert that string to a number and fail to store any new value at the address specified. It is the programmer's responsibility to check the value returned by scanf to verify that scanf returned the number two, if two conversions were requested.
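A hedged sketch of the kind of check we mean (the prompt and error message are our own):

#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    int x, y;

    printf( "Enter two integers: " );
    fflush( stdout );

    /* scanf returns the number of successful conversions */
    if ( scanf( "%d %d", &x, &y ) != 2 )
    {
        fprintf( stderr, "expected two integers\n" );
        exit( EXIT_FAILURE );
    }

    printf( "You entered %d and %d\n", x, y );
    return 0;
}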
The printf and scanf functions have another form, fprintf and fscanf, which lets us pass in the stream we want to read or write. This form is used to read/write text files but will accept stdin/stdout as the stream argument just as well.
fprintf( stdout, "Hello World" ) is identical to printf( "Hello World" )
fscanf( stdin, "%d %d", &x, &y ) is identical to scanf( "%d %d", &x, &y );
Download hello-1.c to your Unix machine and cd into the directory where you put the file.
[terminal session: pwd, ls, gcc -W -Wall -Wextra -O2 hello-1.c, ls]
		Notice I issued a shell command  pwd which printed the current working directory.  I then printed a listing of the files in the directory with an ls command. The only file in the directory was our hello-1.c which  I compiled with gcc  -W -Wall -Wextra -O2 hello-1.c  then listed the directory again.
	
Let's break this command down into its components and explain each one.
gcc is the program that compiles our code. Our GNU C compiler is itself, just another program.
-W -Wall -Wextra -O2 these are switches that customize the behavior of our compilation. Remember we promised to show you how to get all the help the compiler can give you. Using these switches tells the compiler to apply more scrutiny to your code so that those things which can be detected at compile time will be reported to you as warnings and errors. The -O2 ("oh two" not "zero two") switch calls for code optimization at level 2. This course does not address code optimization, but the -O2 switch also enables detection of un-initialized variable use. There are many other switches you can use in your compilation command that we will not cover in this course. The history of how these switches came about and what things they detect is rather random: as the language evolved, switches were added or changed in an ad-hoc manner. For example -Wall means "warnings all". So you might think that means it warns on all infractions. Well, not quite. If you want to detect failure to use argv or argc then you must add -W which is just "warnings". Go figure. Better yet, use them all and never ignore warnings. In this course you are never allowed to hand in code with warnings.
hello-1.c the list of files to be compiled. In this case it's a list of just one file. That will change soon.
		
Because a file named a.out showed up in our directory, we know the compilation succeeded. We did have some warnings, but no errors. There are some warnings that are always benign and some that are benign depending on what you are doing or not doing in the code. Many warnings are a foreboding of disaster. We will identify and qualify warnings as we encounter them. For now we will never ignore warnings because the dangerous ones must be fixed, and the benign ones are easy to fix. The warnings we just encountered are a result of the fact that we never used the argc and argv parameters. This warning is benign since we do not intend to pass anything in on the command line to this program. By ignoring argc and argv we're not missing anything. The dangerous scenario is when you use argv but never check argc to see if anything actually got sent in. As a result you try to read argv[1] but there is no argv[1] and you are now out of bounds in an array. We can ignore these warnings or we can change the prototype of our main such that there is no argc or argv. It's a simple fix - just make main look like int main(). Go ahead and make the change right now and re-compile. If you want your executable to be named something other than a.out, then add the -o switch (dash little oh) to your command and put the name you want after the -o. For instance you could recompile with
		
			gcc -W -Wall -Wextra -O2  hello-1.c   -o hello-1.exe
		
Now that we have a clean compilation we can execute the program. Since I am the owner of this file, I should already have execute privileges on a.out. Let's do an ls -l command to see who owns a.out and what the privileges on that file are.
[terminal session: ls -l showing a.out and its permission flags]
The -l switch on the ls command lists each file on its own line and gives several properties for the file. We are trying to verify we have privileges to execute this file. I can see that tlh is the owner and I have execution privileges. Typically if you compile a C program, you are granted execution rights to the executable file that is created. We have already seen the chmod command and can look in the man pages for help. Suppose we did not have execution privileges to this file. This happens when for instance we copy an executable from someone else's directory and they are the owner, not us, or if we download an executable to our machine. The fix is simple. The following discussion illustrates some alternative syntax for the chmod command.
[terminal session: chmod +x a.out followed by ls -l]
The chmod command allowed us to turn on execution (+x) for a.out. Of course this example didn't really prove anything since I already had execute privileges. Let's turn off execution, look at the flags and try to execute; the shell will refuse. Then we can turn execution back on, examine the flags and execute. To turn off execution we will use another form of the chmod command: chmod 666 a.out. Think of 666 as three octal digits - one each for the owner, group and world permissions respectively. Each octal digit encodes three bits: r, w and x. The first 6 is binary 110, which sets the owner's flags to rw- (no execution). The other two 6's do the same thing to the group and world flags. When we examine the flags after chmod 666 a.out they look like -rw-rw-rw-. We then use the same form of the command but this time with 755, which gives the owner full privileges and gives everyone else read and execute - a typical security setting.
	
[terminal session: chmod 666 a.out, a refused ./a.out, chmod 755 a.out, a successful ./a.out]
We now know how to turn read, write and execution privileges on/off. Before we continue we should explain the syntax ./a.out. Note the ./ before the a.out. The dot character is a synonym for the current working directory. You can verify this by issuing an ls command with . as the arg. The / (slash character) is the directory path separator. So the ./ is the syntax that prepends the current working directory path to the file we want to execute. This is needed because whenever you type a verb (our program is a verb) on the command line, the shell assumes that verb is the name of an installed program in some public directory of installed programs. We don't want the shell looking in /usr/local/bin or some other default directory for our a.out file. We want the shell to execute the a.out that's right here in our working directory.
Modify our hello-1.c program such that it reports the values of x and y only if both numbers have good values in them.
Here is a sample solution: solution-1.c
In this solution we introduce one more piece of the language: function prototypes. A function's prototype is its signature line with a semicolon after it. In our solution-1.c we prototyped the fatal function (which we wrote ourselves) above main as follows:
void fatal( char * msg );

Like variables, functions must be declared before they are referenced. A prototype above main satisfies the compiler. We can then put the actual function body below main, or even in another file. Function prototypes are more than a convenience. They solve an intractable problem. Suppose you have two functions named functA and functB respectively. Suppose functB calls functA and functA calls functB. This is mutual recursion. Without prototypes it would be impossible to get such a program to compile. If you placed functA above functB the compiler would complain that you called functB before defining it. If you reverse the order you get the same complaint about functA being called before it has been defined. The only solution is to use prototypes, which may be placed in any order as long as a function's prototype appears before the function is used. Prototypes are often placed inside a .h file, which we will see soon. In our solution-1.c we just put our fatal function's prototype above the main function in the same source file where the function is called.
[compiler output: warnings and errors from compiling mutual.c without prototypes]
All these problems are a result of the fact that functB was referenced before it was defined.
| mutual.c: In function `functA': mutual.c:12: warning: implicit declaration of function `functB' | If a function is referenced before declared (or prototyped) the compiler will assume the function returns an int and attempt to find the declaration later. This is what implicit declaration means. | 
| mutual.c: At top level: mutual.c:15: error: conflicting types for 'functB' | The compiler has finally seen the actual declaration and it conflicts with the implicit one made earlier. The actual declaration does not return int. As a result no actual declaration is found to match the implicit one and compilation fails. | 
| mutual.c:12: error: previous implicit declaration of 'functB' was here | The compiler points out where the implicit declaration was generated (which does not match the actual declaration found later). | 
As an exercise, follow the example of prototyping shown in solution-1.c. Fix the mutual.c program by putting prototypes for functA and functB above main, then move the actual function definitions below main. Warning! Once you get it to compile cleanly, beware that if you execute it, it will run until you hit ^C to make it stop. Those recursive functions have no base case and will just keep calling each other until the program runs out of memory and crashes. Eventually we will understand recursion and use it correctly. This example was intended only to illustrate the canonical case for prototypes, which is mutual recursion.
Going back to our solution-1.c file, our fatal function takes a string parameter. When main calls fatal it passes a literal string constant. The incoming parameter is prototyped as a char * type, which means pointer to character. Since a string is just a sequence of characters, a pointer to the first character of that sequence represents a pointer to a string. C however has no string type per se, and thus there is no such thing as a pointer to a string - only a pointer to a character. We will cover strings in explicit detail in our next chapter. For now we show you how to pass strings to a function and how to prototype the incoming string parameter as char *. The exit function causes the program to exit immediately without returning back through the call chain. EXIT_FAILURE is a value #define'd in stdlib.h.
	
It is dangerous to ignore implicit declaration warnings because the compiler will attempt to find some function that matches the name of the function you called. It is possible that the compiler will find a function by the same name as the one you called, but that function is not the intended match. This happens when you are writing a new function and that function's name is the same as some obscure function that is part of your compiler. Further suppose that you call your function but you forget to actually write the function definition. In this case the compiler looks for your newly written version of the function but instead discovers an existing function by that same name and matches (links) them to make the compilation succeed. So, your program compiles but when you run it and that function is called, the code that gets executed is not what you intended and you get a crash or worse - unpredictable behavior.
Function prototype vs. function signature
A function's signature is how the compiler identifies it. In C the signature consists of the name and the args by number, type and order. The return type is not part of the signature, and you cannot have two functions that differ only in return type - in fact, unlike Java, C has no overloading at all, so two functions may never share a name.
Before we go on to some more Unix stuff let's revisit argc and argv with a program that will make it clear how to use them. This program contains a simple loop that prints each token passed into the program from the command line.
[source listing: cmdargs.c]
output of cmdargs.c
In C (prior to the C99 standard), the for loop does not allow us to declare the loop counter variable inside the for statement as Java does. The counter must be declared outside, prior to the loop statement.
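As a rough sketch of what cmdargs.c does (our own reconstruction, not the actual course file):

#include <stdio.h>

int main( int argc, char *argv[] )
{
    int i;   /* the counter is declared before the for statement (C89 style) */

    for ( i = 0; i < argc; i++ )
        printf( "argv[%d] = %s\n", i, argv[i] );   /* argv[0] is the program name itself */

    return 0;
}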
We now turn our attention to reading and writing files. Error checking will also be covered.
		Thus far we have seen how to read and write the two predefined text streams stdin and stdout using fscanf(stdin,"%d",&x) and fprintf(stdout,"x=%d",x). Those two streams are associated with text devices. We now show you how to use fprintf and fscanf to read/write text files.
	
Here is a program that demonstrates fscanf(), fprintf(), and a few other pieces of the language.
[source listing: fileio-1.c]
In the program above we are expecting the user to enter two values on the command line after the name of the executable. If the executable is named a.out then we are expecting the user to enter something like this on the command line.
./a.out 10 output.txt

Notice that in fileio-1.c we write if (NULL==infile) rather than if (infile==NULL). Can you guess why? We do it to avoid the consequences of this common typo involving the assignment operator: if (infile=NULL), which is a legal assignment and compiles without complaint. If we make our typo in reverse the compiler catches it for us, because if (NULL=infile) will not compile.
	
	We store a number such as 7 into variable i.
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 00000000 | 00000000 | 00000000 | 00000111 | 
We write that number out to our text file using: fprintf( outfile, "%d\n", i );
What gets written to the file is the following two bytes:
| 1st byte | 2nd byte | 
| '7' | '\n' | 
| 00110111 | 00001010 | 
The first byte is the ASCII character code for the digit 7. There is a difference between the number 7 and the character '7'. fprintf() means file print formatted. Our format string is "%d" which tells fprintf to convert the number to a sequence of digit characters which represent the number. The last byte is the ASCII newline ('\n') character, which is decimal 10 in the ASCII chart. Unix writes out the newline as a single byte of ASCII value 10 (or 00001010 in binary). Windows and DOS on the other hand write out two bytes for '\n': ASCII 13 (carriage return) followed by ASCII 10.
Suppose the value in our integer variable i was 64295. Then the sequence of bytes written to the output file would be:
| - | 1st byte | 2nd byte | 3rd byte | 4th byte | 5th byte | 6th byte | 
| character | '6' | '4' | '2' | '9' | '5' | '\n' | 
| decimal value from ASCII table | 54 | 52 | 50 | 57 | 53 | 10 | 
| binary | 00110110 | 00110100 | 00110010 | 00111001 | 00110101 | 00001010 | 
		And lastly, if i contained a negative integer like -7 then fprintf(outfile,"%d\n",i) would produce:
	
| 1st byte: '-' | 2nd byte: '7' | 3rd byte: '\n' | 
| 00101101 | 00110111 | 00001010 | 
You may be wondering where I got the decimal ASCII values for characters such as '6' or '\n' (newline). Most modern compilers represent characters using the ASCII table scheme. Although this is not required by the ANSI standard the exceptions are few. If you suspect your ANSI compiler does not use the ASCII table, you should write a simple program to echo the letters of the alphabet followed by their decimal value.
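A minimal sketch of such a check (our own) might look like this:

#include <stdio.h>

int main( void )
{
    char c;

    /* if the letters come out as 65..90 and 97..122, the system is using ASCII */
    for ( c = 'A'; c <= 'Z'; c++ )
        printf( "%c = %d\n", c, c );
    for ( c = 'a'; c <= 'z'; c++ )
        printf( "%c = %d\n", c, c );

    return 0;
}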
 
[ASCII table: each standard ASCII character listed with its decimal, HEX, octal and HTML value]
Look at the ASCII table and notice it is a list of 7 bit values from 0 through 127. Thus a text file is a file that contains bytes with values no greater than 127 - in other words it only contains ASCII characters. As it turns out, nearly all systems map some graphical character to each of the values 128 through 255. This is sometimes called the extended ASCII character set. These extended value mappings are not standard and vary greatly from system to system. Our table above lists each character in the standard ASCII set by its decimal, HEX, octal and even HTML value. For instance, take the graphical character 'A'.
'A' is decimal 65, HEX 41 and octal 101 (and binary 01000001 which is not shown in our table).
		Let's go back and look at our example above where we showed what would be written to the text file if we executed fprintf( outfile, "%d\n", i ); where i is 64295. This time we will display that sequence in HEX. We also split up the 8 bit byte in the bottom row into 2 groups of 4 bits. The ease of translation from a HEX digit to a group of 4 bits becomes apparent. In fact HEX is very often used instead of binary to illustrate the exact bit pattern in a given chunk of memory. Hex is more compact and readable.
	
| '6' | '4' | '2' | '9' | '5' | '\n' | 
| HEX 36 | HEX 34 | HEX 32 | HEX 39 | HEX 35 | HEX 0A | 
| 0011 0110 | 0011 0100 | 0011 0010 | 0011 1001 | 0011 0101 | 0000 1010 | 
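As a sketch, you can reproduce that hex view yourself with the %X conversion (the string literal here is our own):

#include <stdio.h>
#include <string.h>

int main( void )
{
    const char *text = "64295\n";
    size_t i;

    /* print each byte of the string as two hex digits */
    for ( i = 0; i < strlen( text ); i++ )
        printf( "%02X ", (unsigned char) text[i] );
    printf( "\n" );                /* prints: 36 34 32 39 35 0A */

    return 0;
}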
Download the following program skeleton and finish it. Your program will prompt the user to repeatedly enter an integer from stdin. However, it will be entered in a loop, one digit at a time followed by a return. As the user enters each digit you will update your calculation of the entire number being entered and echo its current value to stdout.
Use scanf("%c", &charVar) or charVar = getc(stdin) to read each keystroke from stdin. See the man pages for documentation on getc().
		| Declare a FILE variable as in: | FILE * txtFile; | 
| Use fopen as in: | txtFile = fopen( argv[1], "rt" );  /* ("rt"  means open for read as text. Use "wt" to write, "at" to append )*/  | 
| always verify the file variable is not NULL | if (NULL==txtFile) exit(EXIT_FAILURE); | 
| Read using fscanf() as in: | int result = fscanf( txtFile, "%d", &x); | 
| Verify the fscanf succeeded: | if (result !=1) { fprintf(stderr,"fscanf failed\n"); } | 
| Write using fprintf as in: | int result = fprintf( txtFile, "%d\n", x ); | 
| Verify the write succeeded: | if (result < 0) { fprintf(stderr,"fprintf failed\n"); } | 
| Close file when done as in: | fclose(txtFile);  | 
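Putting those steps together, here is a minimal sketch (our own, not one of the course files) that opens a text file named on the command line and reads a single number from it:

#include <stdio.h>
#include <stdlib.h>

int main( int argc, char *argv[] )
{
    FILE *txtFile;
    int   x;

    if (argc < 2)
    {
        fprintf( stderr, "usage: ./a.out inputFileName\n" );
        exit( EXIT_FAILURE );
    }

    txtFile = fopen( argv[1], "rt" );          /* open for reading as text */
    if (NULL == txtFile)
        exit( EXIT_FAILURE );

    if (fscanf( txtFile, "%d", &x ) != 1)      /* always check the conversion count */
        fprintf( stderr, "fscanf failed\n" );
    else
        printf( "first number in %s is %d\n", argv[1], x );

    fclose( txtFile );
    return 0;
}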
Suppose you have a text file that consists of a sequence of numbers with one number per line like below and that input file has a blank line at the bottom.
10
20
30
40
50
The above input file has a blank line at the bottom. You want to write a loop that repeatedly reads a number from the input file and then writes that number to stdout. You want to stop when you exhaust the input file of numbers. The following program illustrates a common error committed by novices as they try to use the feof( infile ) function to indicate when the input file has been exhausted. The feof function takes an open input file parameter and returns true or false according to whether end of file has been reached. The problem with our read loop is that the newline at the bottom of the file causes the feof() function to NOT report EOF. As a result of this newline we enter the loop one more time. The fscanf eats the newline and of course can't convert it to a number. No new value is stored in x, which still contains the last successful conversion. fscanf returns the EOF value (which is never positive), but since we are not looking at the value returned by fscanf we go ahead and write out the old value of x again.
The correct approach is to break as soon as your read does not return the expected value rather than relying on feof. Once the loop is exited you are free to look at feof().
[source listing: feof-caveat.c]
Download and try this incorrect code for yourself. The correct way to read the file is:
while ( fscanf(txtFile, "%hi", &x) == 1 )
{
    /* write x out to some output file */
}
Note that in the correct form above we do not enter the loop again unless fscanf returned exactly the number of conversions requested. We know that fscanf will only return 1 when it successfully finds another number in the file. Any other value from fscanf() and our loop stops. This is the correct idiom for reading all of the numbers in the file. As an exercise, edit the feof-caveat.c file and replace the bad loop with the correct loop above. Test it again. What happens now?
We now illustrate reading and writing binary files using fread and fwrite. Recall that binary files are not intended to be readable by people. Such files can contain bytes whose value is greater than 127. Unix does not distinguish between text and binary files; to Unix, all files are just sequences of bytes. C however does make a distinction, and that distinction is really that binary files are read and written using functions that do no formatting or conversion. fread() and fwrite() simply copy a chunk of bytes between RAM and disk.
It is important to open a binary file using the "b" mode in the fopen() because on some platforms (Windows) you may get incorrect output if you fwrite to a binary file that was not opened using the "b" mode.
Our next program demonstrates fread() and fwrite() and arrays of ints.
[source listing: fileio-2.c]
First we have int x=7;
| 32 bits in variable x | |||
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 00000000 | 00000000 | 00000000 | 00000111 | 
Then fwrite( &x, sizeof(x), 1, outfile ); and what gets written to the file is:
| 32 bits written to the file | |||
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 00000000 | 00000000 | 00000000 | 00000111 | 
Note that fwrite() simply copied a chunk of memory from RAM to disk without formatting or converting it. Now let's look at the meaning of each parameter of fwrite().
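Here is the call we just made, annotated parameter by parameter (the prototype shown is the standard one from stdio.h):

/* the standard prototype:
   size_t fwrite( const void *ptr, size_t size, size_t nmemb, FILE *stream ); */

fwrite( &x,        /* ptr:    address of the first byte of the chunk to copy out */
        sizeof(x), /* size:   number of bytes in one item (4 for our int)        */
        1,         /* nmemb:  how many items of that size to write               */
        outfile ); /* stream: the open file that receives the bytes              */

/* the return value is the number of complete items written: 1 if this call succeeds */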
The same thing applies for the short int and the char variables. Let's skip to the array being written to disk.
 
	Again, the chunk is simply copied to the file with no formatting or conversion. Let's look at the parameters again:
Exercise: how would you rewrite fwrite(arr1, sizeof(arr1),1, outfile ); in a platform independent manner, such that you replace the literal '1' with the actual number of ints in the array, and pass as the size the number of bytes in the data type the array holds? Don't hardcode the number 20 anywhere, or the data type int in the value you give the sizeof operator. One way to do it is shown below.
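One common way to write it (a sketch of the idiom, assuming arr1 is declared as a true array in this scope rather than received as a pointer parameter):

/* sizeof arr1 is the whole array in bytes, sizeof arr1[0] is one element,
   so the quotient is the element count no matter which platform compiles this */
fwrite( arr1, sizeof arr1[0], sizeof arr1 / sizeof arr1[0], outfile );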
		Assume we just did: fwrite( &x, sizeof(x), 1, outfile ); and what got written to the file was:
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 00000000 | 00000000 | 00000000 | 00000111 | 
Now read it back in with: fread( &x, sizeof(x), 1, infile ); and this is what memory looks like:
| 32 bits in variable x | |||
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 00000000 | 00000000 | 00000000 | 00000111 | 
This is pretty simple. We just did the inverse. We copied a chunk byte for byte from disk to RAM. No formatting or conversion. Notice that the parameters are exactly the same. The only difference is that the last parameter is now the source and the first parameter is now the destination. There are a few caveats or "gotcha" cases that must be understood and avoided. Suppose we declare a short int like this: short int shint = 7; and we write it out to disk with fwrite() like this:
short int shint = 7;
fwrite( &shint, sizeof(shint), 1, outfile );
	What got written to disk looks like this:
| 16 bits of shint | next byte junk | next byte junk | |
| 00000000 | 00000111 | 10010010 | 11110111 | 
So we see that we have only written out 2 bytes to disk - that's all the memory our variable shint occupies, so that's all our fwrite() put out. Now suppose we read those bytes back from the file into a full 32 bit int:
fread( &x, sizeof(x), 1, infile ); /* x is full 32 bit int -- not a 16 bit short int */
Well.. memory would now look like this:
| 32 bits in variable x | |||
| 1st byte | 2nd byte | 3rd byte | 4th byte | 
| 16 bits from shint | 16 bits junk AFTER shint | ||
| 00000000 00000111 | 10010010 11110111 | ||
This is all WRONG! If we print x, out comes a number much bigger than 7! We screwed up when we read a 4 byte chunk from a spot where only a 2 byte object was written, and got 2 bytes of data and 2 bytes of garbage. We have no idea what those extra two bytes are! Let's now look at the inverse error.
Suppose we just wrote our 32 bit x to the disk and then we execute: fread( &shint, sizeof(shint), 1, infile );. What do we get when we print our short int? We see a big fat 0 come out because we only grabbed the first two bytes of x. Those bytes are all zeros; the second two bytes, which contain the value 7, are never read into our short variable.
Moral of the story: read exactly what you wrote. If you don't, you will lose data, pick up garbage, or overwrite memory that shouldn't be written to. Unlike Java, C does not throw an exception for you or necessarily crash when you do something dangerous like touch memory you don't own. The result of any such memory error is always the same - unpredictable/undefined behavior. This is the worst thing that can happen. There is no sure way to recognize and recover from such errors. You must avoid them with strict adherence to the rules.
	
fread and fwrite can fail for reasons other than memory errors. These other failure modes are easily detected using ferror() and feof(). It is the programmer's responsibility to check for an error after every I/O operation. Here is a program that writes and reads a binary file and does error checking after each I/O attempt.
fileio-3.c
/* fileio-3.c
   demonstrates the following:
   - FILE, fopen, fwrite, fread, fclose
   - writing and reading binary files of primitives
   - formatted console I/O of values read from binary files
   - #define
   - ferror() - returns non-zero if an I/O error has occurred on the stream; the error indicator stays set until it is cleared (for example with clearerr())
   - feof() - returns non-zero if end-of-file has been encountered on the stream just read
   Expects a command line arg: name of binary output file to be created
   Writes ten ints followed by ten doubles to that file then reopens the file for reading, and echoes values to console
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NUM_INTS      10
#define NUM_DOUBLES   10
int main( int argc, char *argv[] )
{
	FILE *binaryFile;
	int     i;       /* loop counter as values are writen/read to/from file */
	int     iVal;    /* a sequence of these will be written to outfile */
	double  dVal;    /* followed by a sequence of these  */
	if (argc < 2)
	{
		printf("usage: ./a.out  | 
| Declare a FILE variable as in: | FILE * binaryFile; | 
| Use fopen as in: |  binaryFile = fopen( argv[1], "rb" );   /* ("rb"  means open for read as binary. Use "wb" to write, "ab" to append) */ | 
| always verify the file variable is not NULL | if (NULL==binaryFile) exit(EXIT_FAILURE); | 
| Read using fread() as in: | int result = fread( &iVal, sizeof(iVal), 1, binaryFile ); | 
| Verify the fread succeeded: | if (result != 1) fprintf(stderr,"fread failed\n"); | 
| Write using fwrite() as in: | int result = fwrite( &i, sizeof(i), 1, binaryFile ); | 
| Verify the fwrite succeeded: | if (result != 1) fprintf(stderr,"fwrite failed\n"); | 
| Close file when done as in: | fclose(binaryFile);  | 
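Since the listing above is cut off after the usage check, here is a sketch of how the rest of fileio-3.c might go, using the variable names and constants from the fragment above and checking every I/O attempt. Treat it as an illustration of the pattern, not the original code.
	binaryFile = fopen( argv[1], "wb" );
	if (NULL == binaryFile)
	{
		fprintf(stderr, "fopen for write failed\n");
		exit(EXIT_FAILURE);
	}
	for (i = 0; i < NUM_INTS; i++)
	{
		iVal = i;
		if (fwrite( &iVal, sizeof(iVal), 1, binaryFile ) != 1)
		{
			fprintf(stderr, "fwrite of int failed\n");
			exit(EXIT_FAILURE);
		}
	}
	for (i = 0; i < NUM_DOUBLES; i++)
	{
		dVal = i + 0.5;
		if (fwrite( &dVal, sizeof(dVal), 1, binaryFile ) != 1)
		{
			fprintf(stderr, "fwrite of double failed\n");
			exit(EXIT_FAILURE);
		}
	}
	fclose(binaryFile);
	/* reopen the same file for reading and echo the values to the console */
	binaryFile = fopen( argv[1], "rb" );
	if (NULL == binaryFile)
	{
		fprintf(stderr, "fopen for read failed\n");
		exit(EXIT_FAILURE);
	}
	for (i = 0; i < NUM_INTS; i++)
	{
		if (fread( &iVal, sizeof(iVal), 1, binaryFile ) != 1)
		{
			if (feof(binaryFile))   fprintf(stderr, "unexpected end of file\n");
			if (ferror(binaryFile)) fprintf(stderr, "fread of int failed\n");
			exit(EXIT_FAILURE);
		}
		printf("int    %d: %d\n", i, iVal);
	}
	for (i = 0; i < NUM_DOUBLES; i++)
	{
		if (fread( &dVal, sizeof(dVal), 1, binaryFile ) != 1)
		{
			if (feof(binaryFile))   fprintf(stderr, "unexpected end of file\n");
			if (ferror(binaryFile)) fprintf(stderr, "fread of double failed\n");
			exit(EXIT_FAILURE);
		}
		printf("double %d: %f\n", i, dVal);
	}
	fclose(binaryFile);
	return 0;
}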
Now that we can write, compile and execute a trivial C program, we can apply some Unix tools, starting with stream redirection (then pipes and eventually shell scripts). As we do, keep in mind that one of our primary uses of Unix tools is program verification. Once a piece of code is written, how can we streamline the testing and verification of that code? Suppose you know what the correct output of your program should look like and you want to compare that reference output repeatedly against your program's output until they match. The low tech way would be to have a hard copy of that reference output in hand to compare against what your program just printed to the screen. Alternatively, you might have a second window displaying that reference output to visually compare to your program's output. This method is awkward and prone to error. A better solution is to redirect the screen output produced by your program and save it into a file, then use the diff utility to look for differences between your program's output and the reference output. To do this we must explain the standard streams and show you how to redirect them.
A stream is either a source or a destination for data. The three standard text streams you will use most often are stdin, stdout and stderr. Another destination you will often redirect output to is /dev/null, which silently discards whatever is written to it.
		When your program executes a scanf(..) statement, you are reading text from stdin, which by default is the keyboard.
		When your program executes a printf(..) statement, you are sending text to stdout, which by default is the terminal.
		When your program executes fprintf(stderr, "Fatal Error! program aborting\n"), you are sending text to the stderr stream, which will also show up on your terminal. We will differentiate stderr from stdout shortly; a small test program that writes to both streams follows this list.
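Here is a tiny test program (a new sketch, not one of the course's numbered examples) that writes one line to each stream. Keep it handy, because later in this chapter we redirect stdout and stderr separately and it is useful to see which line ends up where.
#include <stdio.h>
int main( void )
{
	printf("this line goes to stdout\n");             /* normal program output   */
	fprintf(stderr, "this line goes to stderr\n");    /* error/diagnostic output */
	return 0;
}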
		The majority of Unix commands that produce text output write that output to stdout. Since we want to demonstrate redirecting stdout to a text file, we don't have to write a program to produce output. We can just execute any number of Unix commands that write to stdout and then redirect that output to a file. Once this simple process is understood we can go back and redirect the stdout of any program we have written. In the screen shot below we execute the ls command and see its output to stdout (the screen). We then execute the same command but redirect stdout to a text file using the > operator.
(screen shot: ls, then ls > listing.txt, then ls again, then cat listing.txt)
In the above screen shot, our second ls command used the redirection operator > to redirect stdout to a file instead of the screen. As a result our third ls command shows a new file appearing in the directory. A cat of that new file proves that the output from the previous ls command got stored in the file "listing.txt". Note that the format of the text in the file is slightly different from the same text displayed originally on the screen. The redirected output has a newline after each filename. This formatting difference only occurs with certain Unix commands; if we were to redirect the output of one of our C programs, Unix would not modify the format. In the case of the ls command, the assumption is that when a directory listing is redirected, the user of that redirected file will want the file names each on its own line to facilitate reading and processing them one at a time from the file. When we start doing scripting we will appreciate this convenient formatting that the output of some commands receives when redirected.
The redirection append operator >> behaves just like redirection but appends to the specified file instead of recreating that file. In the following example we use the echo command to send some text to stdout, but we redirect that stdout to a text file. We then get a listing of the directory and display the contents of the file using the more command.
(screen shot: echo redirected into a text file with >>, followed by ls and more of that file)
Remember that any program which writes to stdout can be redirected. What we will demonstrate next is that any program that reads from stdin can have its input redirected to come from somewhere else. The redirection operator can cause a program to get its stdin from a file rather than the default stdin device (the keyboard). Suppose your program reads from the keyboard. You can store those keyboard inputs in a text file and then redirect that text file into your program, so that when your program runs it does not stop and wait for you to type. Instead, every time your program executes a read from stdin, it will take text from the text file rather than from the actual keyboard device. Let's look at an example using the solution executable to our next exercise and the input file we just created that contains a line of text containing the number 73.
5' 10"which is the number of whole feet followed by a single quote, then a space, then the number of remainder inches followed by a double quote. The output must be exactly in this format to be correct. You must escape the double quote " character imbedded in your format string by putting a backslash \ before it. Otherwise the compiler will think it is the end of the format string.
Here are the steps to test your program against the reference solution executable
./a.out < input-1.txt > my-output-1.txt
./solution-2.exe < input-1.txt > solution-output-1.txt
diff my-output-1.txt solution-output-1.txt
The following form tells diff to ignore differences in case, blank lines and whitespace.
diff -b -B -i my-output-1.txt solution-output-1.txt
Now is a good time to show you another stream, stderr, and demonstrate the difference between stdout and stderr. A short while ago we executed our hello-1.c program and redirected its stdout to a file. As a result we did not see the prompts for the values to enter, nor did we see the prints of the values read in. The stderr stream is traditionally used to write error messages. Text printed to the stderr stream is not redirected when the > operator is used. Let's go back and modify your exercise 2 solution so that its error message is written to stderr instead of stdout. We will then execute our new program and redirect its stdout to a file.
Exercise 3: Modify your solution to exercise-2 such that the error message is written to stderr using: fprintf(stderr,"scanf failed on conversion to integer");
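Using the illustrative sketch above, the modified check might look like this:
	if (scanf("%d", &inches) != 1)
	{
		/* error text now goes to stderr, so it is not captured by > redirection */
		fprintf(stderr, "scanf failed on conversion to integer");
		exit(EXIT_FAILURE);
	}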
When you have it working, execute the following command
./a.out > output.txt
Notice that the error message (if scanf fails) still shows up on your terminal: only stdout was redirected into output.txt, while stderr still went to the screen.
stdin is stream 0, stdout is stream 1, stderr is stream 2.
Here are some rules for redirecting the standard streams:
Beware, the redirection operators are notorious for behaving differently under different shells. The following examples apply to the csh shell and may not work as illustrated if you are using a different shell.
| ls -l > dirlisting.txt | Creates a new file "dirlisting.txt" and dumps the output of the ls command into it (ls writes to stdout). | 
| echo 5 10 > input.txt | Creates a new file "input.txt" and writes 5 10 into it. Here 5 and 10 are two separate args to echo, so only one space is put between them in the file even if you typed several spaces on the command line. | 
| echo "5 10" > input.txt | Creates a new file "input.txt" and writes 5 10 into it. Since the 5 and 10 are quoted there is only one argument to the echo command and that entire quoted string with all embedded spaces is written to the file. | 
| ./a.out > output.txt | Creates a new file "output.txt" and redirects into it any text that a.out wrote to stdout. Anything that a.out wrote to stderr is not put into the file. | 
| ./a.out < input.txt | Redirection operators can work from right to left. Executes a.out and causes it to get its input from input.txt rather than stdin. The assumption here is that a.out reads from stdin and we are replacing stdin with our input file so that every time a.out tries to read from stdin it will instead be reading from input.txt. | 
| ./a.out >> output.txt | Same as above but appends to output.txt rather than re-writing it. Stream numbers may be added just like in the above examples. | 
| ./a.out >& output.txt | Redirect both stdout and stderr to a text file. Any text written to either stream ends up in output.txt | 
Redirection operations can be parenthesized. This is what you must do to redirect stdout and stderr separately when using the csh (C) shell.
| (./a.out > out.txt) >& err.txt | Redirects stdout from a.out into out.txt, and redirect stderr from a.out into err.txt. | 
Write a small program that reads a number from stdin using scanf and echoes that number to stdout with a newline after the number. Compile it and name the executable a.out. Test it, then apply these commands to that executable. Do you understand why they don't work?
input.txt > ./a.out
cat input.txt > ./a.out
In the first command you violated the first rule: the first arg on the command line must be an output producing command (not just a file name). The second command causes your executable to be overwritten by the contents of the input file. Not what you wanted either. We now show you how to accomplish what you were trying to do with the second command.
Consider again this ill-fated command from the section above: cat input.txt > ./a.out. It is intuitive to see that the intention was to make the a.out executable take its input from the input file. Unfortunately we overwrote our executable instead. The right way to accomplish the intent of the command is to use the pipe operator |, which is the vertical bar character, typically found above the ENTER key.
cat input.txt | ./a.out
The above command accomplishes exactly what we had been trying to do. Our a.out program gets its input from the file (via cat and the pipe) instead of the keyboard. To illustrate a more powerful use, we combine multiple Unix commands with the pipe operator.
| wc filePath | reports how many newlines, words and bytes a file contains. | 
$ cat foo.c
#include <stdlib.h>
#include <stdio.h>
int main( )
{
  int x=0;
  printf("Enter a small positive number: ");
  fflush(stdout);
  scanf("%d", &x );
  printf("x= %d\n",x);
  return 0;
}
$ wc foo.c
 14  25 187 foo.c
The behavior of the wc command can be customized by using switches such as:
-c print the byte counts
-m print the character counts
-l print the newline counts
-L print the length of the longest line
-w print the word counts
We can now pipe the results of commands such as ls, find or grep into the wc command to count the number of items those commands found. In the examples below we pipe those results into wc with the -l switch so that the line count tells us how many items were found.
Here we use the ls command to find the .c files in our current directory.
$ ls *.c
fixed.c  fscanf.c   lab2.c           listOps.c  m2.c  m4.c      pt.c      sol2.c
foo.c    getline.c  lab3-template.c  m1.c       m3.c  mutual.c  sizeof.c  solution-2.c
$
$ ls *.c | wc -l
16
$
Here we use the find command to find all files ending in .c in our current directory's entire hierarchy. We then pipe the results into the wc command with the -l switch to produce a count of those files.
$ find -name '*.c'
./OldFiles/listOps.c
./OldFiles/lab3-template.c
./OldFiles/lab2.c
./OldFiles/sizeof.c
./OldFiles/getline.c
./OldFiles/mutual.c
./OldFiles/C/hello-1.c
./OldFiles/C/solution-2.c
./OldFiles/fixed.c
./OldFiles/fscanf.c
./OldFiles/solution-2.c
./OldFiles/sol2.c
./OldFiles/m1.c
./OldFiles/m2.c
./OldFiles/m3.c
./OldFiles/m4.c
./OldFiles/pt.c
./OldFiles/foo.c
./listOps.c
./lab3-template.c
./lab2.c
./sizeof.c
./getline.c
./mutual.c
./C/hello-1.c
./C/solution-2.c
./fixed.c
./fscanf.c
./solution-2.c
./sol2.c
./m1.c
./m2.c
./m3.c
./m4.c
./pt.c
./foo.c
$
$ find -name '*.c' | wc -l
36
$
Below we grep for all occurrences of "exit" in all the .c files in our current directory and then count the number of matches by piping the grep output into the wc command.
$ grep exit *.c
lab3-template.c:                printf("Initial malloc of dictionary failed. Program exiting\n");
lab3-template.c:      if ( !outfile ) exit(EXIT_FAILURE );
lab3-template.c:      exit( EXIT_SUCCESS );
lab3-template.c:  if ( !infile ) exit( EXIT_FAILURE );
listOps.c:freeing all memory correctly before exit worth 5% X/C
listOps.c:                      exit( 0 );
listOps.c:      exit(0);
m2.c:  exit( EXIT_FAILURE ); /* a NON_ZERO value */
m3.c:      exit(EXIT_FAILURE);
m3.c:  exit( EXIT_FAILURE ); /* a NON_ZERO value */
m4.c:     exit( EXIT_FAILURE ); /* a NON_ZERO value */
m4.c:  exit( EXIT_FAILURE ); /* a NON_ZERO value */
pt.c:  exit( EXIT_FAILURE);
sol2.c:- exit() function
sol2.c:  exit( EXIT_FAILURE);
solution-2.c:- exit() function
solution-2.c:  exit( EXIT_FAILURE);
$
$ grep exit *.c | wc -l
17
$
Another command that interacts powerfully with the pipe operator is xargs. The xargs command lets you execute some command on each match that another command has produced. For example, suppose you want to delete all files in a directory hierarchy that satisfy some criteria. You would use the find command to produce the matching filenames, pipe those matches into the xargs command, and put the rm command after xargs to indicate that you want xargs to execute rm on each of the filenames found by find.
Let's delete all .swp files in the entire handin directory of this 15-123 course on Carnegie Mellon's AFS file space. These .swp files are the traces of students attempting to create or edit files from within the handin directory. Since students only have the privilege to copy files into the handin directory, but not to edit or create files from within it, a .swp file is the by-product of a rejected attempt to do so. First we use the find command to see a list of all the .swp files. Then we re-issue the command and pipe the list of files into the rm command via the xargs command.
$ cd HANDIN
$
$ pwd
/afs/andrew.cmu.edu/course/15/123-tlh/handin
$
$ find -name '*.swp'
./mjelin/quiz-00/1/.swp
./irm/quiz-00/1/.swp
$
$ find -name '*.swp' | xargs rm
$
$ find -name '*.swp'
$
Our initial listing shows only two .swp files in the entire handin hierarchy. Not bad. Students are catching on to the rules. Next we re-issue the find command, but this time we pipe its output into the xargs command. The xargs command applies whatever command we specify (in this case rm) to the lines of output being piped in from find. It is important to understand that simply piping the output of find into rm would not work: rm does not read filenames from stdin, it expects them as command line arguments. This is where xargs mediates between the results of find and the rm command, handing the filenames that find produced to rm as arguments. Once the command completes we do another find just to verify we got them all.
Unix has a utility to combine and compress multiple files and directories into a single file. The tar utility got its name from "tape archive". Many Unix systems still use some form of tape backup, although more modern and denser media are becoming increasingly popular. Let's demonstrate the tar utility and then explain the switches. I have a directory named "classical" containing some songs. I want to archive that directory into a single tar file. This is analogous to zipping up a directory on Windows.
$ pwd
/afs/andrew.cmu.edu/usr20/tlh
$
$ ls classical/
ClairDeLune.wma  FantasieImpromptu.wma  Liebestraum.wma  MoonlightSonata.wma  RustlesofSpring.wma  UnSuspiro.wma
EtudeinE.wma     FurElise.wma           Malaguena.wma    PreludeinEminor.wma  TheMaiden.wma
$
$ tar zcfv classical.tgz classical
classical/
classical/Malaguena.wma
classical/MoonlightSonata.wma
classical/ClairDeLune.wma
classical/RustlesofSpring.wma
classical/TheMaiden.wma
classical/UnSuspiro.wma
classical/FantasieImpromptu.wma
classical/EtudeinE.wma
classical/PreludeinEminor.wma
classical/Liebestraum.wma
classical/FurElise.wma
$ ls classical.*
classical.tgz
$
Notice that the switches zcfv, which appear before the name of the archive file to be created, do not have a dash in front of them. The last argument is a list of files and/or directories to be archived. In the above example one lone directory was specified, but we could have specified a list of files separated by spaces, like this:
$ cd classical/
$
$ ls
ClairDeLune.wma  FantasieImpromptu.wma  Liebestraum.wma  MoonlightSonata.wma  RustlesofSpring.wma  UnSuspiro.wma
EtudeinE.wma     FurElise.wma           Malaguena.wma    PreludeinEminor.wma  TheMaiden.wma
$
$ tar zcfv mytar.tgz FurElise.wma Liebestraum.wma MoonlightSonata.wma RustlesofSpring.wma
FurElise.wma
Liebestraum.wma
MoonlightSonata.wma
RustlesofSpring.wma
$
$ ls
ClairDeLune.wma  FantasieImpromptu.wma  Liebestraum.wma  MoonlightSonata.wma  PreludeinEminor.wma  TheMaiden.wma
EtudeinE.wma     FurElise.wma           Malaguena.wma    mytar.tgz            RustlesofSpring.wma  UnSuspiro.wma
$
In our above illustrations we specified the z, c, f, and v options: z means compress the archive with gzip, c means create a new archive, f means the next argument is the name of the archive file, and v means verbose, so tar lists each file as it processes it.
To uncompress an archive use the tar command with the option x. First we delete the classical directory. Then we untar the classical.tgz file to recreate the original classical directory.
$
$ rm -rf classical
$
$ tar xzf classical.tgz
$ ls classical
ClairDeLune.wma  FantasieImpromptu.wma  Liebestraum.wma  MoonlightSonata.wma  PreludeinEminor.wma  TheMaiden.wma
EtudeinE.wma     FurElise.wma           Malaguena.wma    mytar.tgz            RustlesofSpring.wma  UnSuspiro.wma
$
The x option means extract (untar) the archive. The z option means the archive was compressed using the z (gzip) option and should be uncompressed using gzip. The f option means the name of the file to be un-tarred is specified on the command line, in this case classical.tgz. A directory named classical is created in the same directory where the command is issued.
Warning: The options used above, such as zcfv or xzf, are written in the old style: bunched together with no spaces or dashes between them. If you want to use the dash '-' before them you must put a dash before every switch individually, such as -x -v -f. If you mix the two styles you will get incorrect results.