Secure Programming for Linux and Unix HOWTO

David A. Wheeler

v3.010 Edition

Copyright © 1999, 2000, 2001, 2002, 2003 David A. Wheeler

v3.010, 3 March 2003

This book provides a set of design and implementation guidelines for writing
secure programs for Linux and Unix systems. Such programs include application
programs used as viewers of remote data, web applications (including CGI
scripts), network servers, and setuid/setgid programs. Specific guidelines
for C, C++, Java, Perl, PHP, Python, Tcl, and Ada95 are included. For a
current version of the book, see [http://www.dwheeler.com/secure-programs]
http://www.dwheeler.com/secure-programs

This book is Copyright (C) 1999-2003 David A. Wheeler. Permission is granted
to copy, distribute and/or modify this book under the terms of the GNU Free
Documentation License (GFDL), Version 1.1 or any later version published by
the Free Software Foundation; with the invariant sections being ``About the
Author'', with no Front-Cover Texts, and no Back-Cover texts. A copy of the
license is included in the section entitled ``GNU Free Documentation
License''.

This book is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE.

-----------------------------------------------------------------------------
Table of Contents
1. Introduction
2. Background
2.1. History of Unix, Linux, and Open Source / Free Software
2.2. Security Principles
2.3. Why do Programmers Write Insecure Code?
2.4. Is Open Source Good for Security?
2.5. Types of Secure Programs
2.6. Paranoia is a Virtue
2.7. Why Did I Write This Document?
2.8. Sources of Design and Implementation Guidelines
2.9. Other Sources of Security Information
2.10. Document Conventions
3. Summary of Linux and Unix Security Features
3.1. Processes
3.2. Files
3.3. System V IPC
3.4. Sockets and Network Connections
3.5. Signals
3.6. Quotas and Limits
3.7. Dynamically Linked Libraries
3.8. Audit
3.9. PAM
3.10. Specialized Security Extensions for Unix-like Systems
4. Security Requirements
4.1. Common Criteria Introduction
4.2. Security Environment and Objectives
4.3. Security Functionality Requirements
4.4. Security Assurance Measure Requirements
5. Validate All Input
5.1. Command line
5.2. Environment Variables
5.3. File Descriptors
5.4. File Names
5.5. File Contents
5.6. Web-Based Application Inputs (Especially CGI Scripts)
5.7. Other Inputs
5.8. Human Language (Locale) Selection
5.9. Character Encoding
5.10. Prevent Cross-site Malicious Content on Input
5.11. Filter HTML/URIs That May Be Re-presented
5.12. Forbid HTTP GET To Perform Non-Queries
5.13. Counter SPAM
5.14. Limit Valid Input Time and Load Level
6. Avoid Buffer Overflow
6.1. Dangers in C/C++
6.2. Library Solutions in C/C++
6.3. Compilation Solutions in C/C++
6.4. Other Languages
7. Structure Program Internals and Approach
7.1. Follow Good Software Engineering Principles for Secure Programs
7.2. Secure the Interface
7.3. Separate Data and Control
7.4. Minimize Privileges
7.5. Minimize the Functionality of a Component
7.6. Avoid Creating Setuid/Setgid Scripts
7.7. Configure Safely and Use Safe Defaults
7.8. Load Initialization Values Safely
7.9. Fail Safe
7.10. Avoid Race Conditions
7.11. Trust Only Trustworthy Channels
7.12. Set up a Trusted Path
7.13. Use Internal Consistency-Checking Code
7.14. Self-limit Resources
7.15. Prevent Cross-Site (XSS) Malicious Content
7.16. Foil Semantic Attacks
7.17. Be Careful with Data Types
8. Carefully Call Out to Other Resources
8.1. Call Only Safe Library Routines
8.2. Limit Call-outs to Valid Values
8.3. Handle Metacharacters
8.4. Call Only Interfaces Intended for Programmers
8.5. Check All System Call Returns
8.6. Avoid Using vfork(2)
8.7. Counter Web Bugs When Retrieving Embedded Content
8.8. Hide Sensitive Information
9. Send Information Back Judiciously
9.1. Minimize Feedback
9.2. Don't Include Comments
9.3. Handle Full/Unresponsive Output
9.4. Control Data Formatting (Format Strings/Formatting)
9.5. Control Character Encoding in Output
9.6. Prevent Include/Configuration File Access
10. Language-Specific Issues
10.1. C/C++
10.2. Perl
10.3. Python
10.4. Shell Scripting Languages (sh and csh Derivatives)
10.5. Ada
10.6. Java
10.7. Tcl
10.8. PHP
11. Special Topics
11.1. Passwords
11.2. Authenticating on the Web
11.3. Random Numbers
11.4. Specially Protect Secrets (Passwords and Keys) in User Memory
11.5. Cryptographic Algorithms and Protocols
11.6. Using PAM
11.7. Tools
11.8. Windows CE
11.9. Write Audit Records
11.10. Physical Emissions
11.11. Miscellaneous
12. Conclusion
13. Bibliography
A. History
B. Acknowledgements
C. About the Documentation License
D. GNU Free Documentation License
E. Endorsements
F. About the Author

List of Tables
5-1. Legal UTF-8 Sequences

List of Figures
1-1. Abstract View of a Program

-----------------------------------------------------------------------------
Chapter 1. Introduction

  A wise man attacks the city of the mighty and pulls down the stronghold
  in which they trust.

  Proverbs 21:22 (NIV)

This book describes a set of guidelines for writing secure programs on Linux
and Unix systems. For purposes of this book, a ``secure program'' is a
program that sits on a security boundary, taking input from a source that
does not have the same access rights as the program. Such programs include
application programs used as viewers of remote data, web applications
(including CGI scripts), network servers, and setuid/setgid programs. This
book does not address modifying the operating system kernel itself, although
many of the principles discussed here do apply. These guidelines were
developed as a survey of ``lessons learned'' from various sources on how to
create such programs (along with additional observations by the author),
reorganized into a set of larger principles. This book includes specific
guidance for a number of languages, including C, C++, Java, Perl, PHP,
Python, Tcl, and Ada95.

You can find the master copy of this book at
[http://www.dwheeler.com/secure-programs]
http://www.dwheeler.com/secure-programs.
This book is also part of the Linux Documentation Project (LDP) at
[http://www.tldp.org] http://www.tldp.org. It's also mirrored in several
other places. Please note that these mirrors, including the LDP copy and/or
the copy in your distribution, may be older than the master copy. I'd like to
hear comments on this book, but please do not send comments until you've
checked to make sure that your comment is valid for the latest version.

This book does not cover assurance measures, software engineering processes,
and quality assurance approaches, which are important but widely discussed
elsewhere. Such measures include testing, peer review, configuration
management, and formal methods. Documents specifically identifying sets of
development assurance measures for security issues include the Common
Criteria (CC, [CC 1999]) and the Systems Security Engineering Capability
Maturity Model [SSE-CMM 1999]. Inspections and other peer review techniques
are discussed in [Wheeler 1996]. This book does briefly discuss ideas from
the CC, but only as an organizational aid to discuss security requirements.
More general sets of software engineering processes are defined in documents
such as the Software Engineering Institute's Capability Maturity Model for
Software; international standards for quality systems are defined in ISO 9000
and ISO 9001 [ISO 9000, 9001].

This book does not discuss how to configure a system (or network) to be
secure in a given environment. This is clearly necessary for secure use of a
given program, but a great many other documents discuss secure
configurations. An excellent general book on configuring Unix-like systems to
be secure is Garfinkel [1996]. Other books for securing Unix-like systems are
also available, and you can find information on configuring Unix-like systems
at web sites such as [http://www.unixtools.com/security.html]
http://www.unixtools.com/security.html.
Information on configuring a Linux system to be secure is available in a wide
variety of documents including Fenzi [1999], Seifried [1999], Wreski [1998],
Swan [2001], and Anonymous [1999]. Geodsoft [2001] describes how to harden
OpenBSD, and many of its suggestions are useful for any Unix-like system. For
Linux systems (and eventually other Unix-like systems), you may want to
examine the Bastille Hardening System, which attempts to ``harden'' or
``tighten'' the Linux operating system. You can learn more about Bastille at
[http://www.bastille-linux.org] http://www.bastille-linux.org; it is
available for free under the General Public License (GPL). Other hardening
approaches are also available; for example, you might want to look at Cox
[2000]. The U.S. National Security Agency (NSA) maintains a set of security
recommendation guides at [http://nsa1.www.conxion.com]
http://nsa1.www.conxion.com, including the ``60 Minute Network Security
Guide.'' If you're trying to establish a public key infrastructure (PKI)
using open source tools, you might want to look at the
[http://ospkibook.sourceforge.net] Open Source PKI Book. More about firewalls
and Internet security is found in [Cheswick 1994].

Configuring a computer is only part of Security Management, a larger area
that also covers how to deal with viruses, what kind of organizational
security policy is needed, business continuity plans, and so on. There are
international standards and guidance for security management. ISO 13335 is a
five-part technical report giving guidance on security management [ISO
13335]. ISO/IEC 17799:2000 defines a code of practice [ISO 17799]; its stated
purpose is to give recommendations for those who are responsible for
initiating, implementing or maintaining security in their organization. The
document specifically identifies itself as "a starting point for developing
organization specific guidance." It also states that not all of the guidance
and controls it contains may be applicable, and that additional controls not
contained may be required. Even more importantly, its contents are intended
to be broad guidelines covering a number of areas, and not intended to give
definitive details or "how-tos". It's worth noting that the original signing
of ISO/IEC 17799:2000 was controversial; Belgium, Canada, France, Germany,
Italy, Japan and the US voted against its adoption. However, it appears that
these votes were primarily a protest on parliamentary procedure, not on the
document's content, and many organizations use it because they find it
helpful. More information about ISO 17799 can be found in NIST's
[http://csrc.nist.gov/publications/secpubs/otherpubs/reviso-faq.pdf] ISO/IEC
17799:2000 FAQ. ISO 17799 is highly related to BS 7799 part 1 and 2; more
information about BS 7799 can be found at [http://www.xisec.com/faq.htm]
http://www.xisec.com/faq.htm. It is important to note that none of these
standards (ISO 13335, ISO 17799, or BS 7799 parts 1 and 2) are intended to be
a detailed set of technical guidelines for software developers; they are all
intended to provide broad guidelines in a number of areas. This is important,
because software developers who simply follow (for example) ISO 17799 will
generally not produce secure software - developers need much, much, much more
detail than ISO 17799 provides.

The Commonly Accepted Security Practices & Recommendations (CASPR) project at
[http://www.caspr.org] http://www.caspr.org is trying to distill information
security knowledge into a series of papers available to all (under the GNU
FDL license, so that future document derivatives will continue to be
available to all). Clearly, security management needs to include keeping up
with patches as vulnerabilities are found and fixed. Beattie [2002] provides
an interesting analysis on how to determine when to apply patches,
contrasting the risk of a bad patch to the risk of intrusion (e.g., under
certain conditions, patches are optimally applied 10 or 30 days after they
are released).
If you're interested in the current state of vulnerabilities, there are other
resources available to use. The CVE at http://cve.mitre.org gives a standard
identifier for each (widespread) vulnerability. The paper
[http://securitytracker.com/learn/securitytracker-stats-2002.pdf]
SecurityTracker Statistics analyzes vulnerabilities to determine what were
the most common vulnerabilities. The Internet Storm Center at
http://isc.incidents.org/ shows the prominence of various Internet attacks
around the world.

This book assumes that the reader understands computer security issues in
general, the general security model of Unix-like systems, networking (in
particular TCP/IP based networks), and the C programming language. This book
does include some information about the Linux and Unix programming model for
security. If you need more information on how TCP/IP based networks and
protocols work, including their security protocols, consult general works on
TCP/IP such as [Murhammer 1998].

When I first began writing this document, there were many short articles but
no books on writing secure programs. There are now two other books on writing
secure programs. One is ``Building Secure Software'' by John Viega and Gary
McGraw [Viega 2002]; this is a very good book that discusses a number of
important security issues, but it omits a large number of important security
problems that are instead covered here. Basically, this book selects several
important topics and covers them well, but at the cost of omitting many other
important topics. The Viega book has a little more information for Unix-like
systems than for Windows systems, but much of it is independent of the kind
of system. The other book is ``Writing Secure Code'' by Michael Howard and
David LeBlanc [Howard 2002]. The title of this other book is misleading; the
book is solely about writing secure programs for Windows, and is basically
worthless if you are writing programs for any other system.
This shouldn't be surprising; it's published by Microsoft press, and its
copyright is owned by Microsoft. If you are trying to write secure programs
for Microsoft's Windows systems, it's a good book. Another useful source of
secure programming guidance is the Open Web Application Security Project
(OWASP) Guide to Building Secure Web Applications and Web Services; it has
more on process, and less specifics than this book, but it has useful
material in it.

This book covers all Unix-like systems, including Linux and the various
strains of Unix, and it particularly stresses Linux and provides details
about Linux specifically. There's some material specifically on Windows CE,
and in fact much of this material is not limited to a particular operating
system. If you know relevant information not already included here, please
let me know.

This book is copyright (C) 1999-2002 David A. Wheeler and is covered by the
GNU Free Documentation License (GFDL); see Appendix C and Appendix D for more
information.

Chapter 2 discusses the background of Unix, Linux, and security. Chapter 3
describes the general Unix and Linux security model, giving an overview of
the security attributes and operations of processes, filesystem objects, and
so on. This is followed by the meat of this book, a set of design and
implementation guidelines organized in a way that I believe emphasizes the
programmer's viewpoint. Programs accept inputs, process data, call out to
other resources, and produce output, as shown in Figure 1-1; notionally all
security guidelines fit into one of these categories. I've also separated out
the issues of avoiding buffer overflows (which in some cases can also be
considered an input issue), language-specific information, and special
topics. The chapters are ordered to make the material easier to follow. Thus,
the book chapters cover security requirements (Chapter 4), validating all
input (Chapter 5), avoiding buffer overflows (Chapter 6), structuring program
internals and approach (Chapter 7), carefully calling out to other resources
(Chapter 8), judiciously sending information back (Chapter 9),
language-specific information (Chapter 10), and finally information on
special topics such as how to acquire random numbers (Chapter 11). The book
ends with conclusions in Chapter 12, followed by a lengthy bibliography and
appendixes.

Figure 1-1. Abstract View of a Program

[program]

-----------------------------------------------------------------------------
Chapter 2. Background

  I issued an order and a search was made, and it was found that this city
  has a long history of revolt against kings and has been a place of
  rebellion and sedition.

  Ezra 4:19 (NIV)

-----------------------------------------------------------------------------
2.1. History of Unix, Linux, and Open Source / Free Software

-----------------------------------------------------------------------------
2.1.1. Unix

In 1969-1970, Kenneth Thompson, Dennis Ritchie, and others at AT&T Bell Labs
began developing a small operating system on a little-used PDP-7. The
operating system was soon christened Unix, a pun on an earlier operating
system project called MULTICS. In 1972-1973 the system was rewritten in the
programming language C, an unusual step that was visionary: due to this
decision, Unix was the first widely-used operating system that could switch
from and outlive its original hardware. Other innovations were added to Unix
as well, in part due to synergies between Bell Labs and the academic
community. In 1979, the ``seventh edition'' (V7) version of Unix was
released, the grandfather of all extant Unix systems.

After this point, the history of Unix becomes somewhat convoluted. The
academic community, led by Berkeley, developed a variant called the Berkeley
Software Distribution (BSD), while AT&T continued developing Unix under the
names ``System III'' and later ``System V''.
In the late 1980's through early 1990's the ``wars'' between these two major
strains raged. After many years each variant adopted many of the key features
of the other. Commercially, System V won the ``standards wars'' (getting most
of its interfaces into the formal standards), and most hardware vendors
switched to AT&T's System V. However, System V ended up incorporating many
BSD innovations, so the resulting system was more a merger of the two
branches. The BSD branch did not die, but instead became widely used for
research, for PC hardware, and for single-purpose servers (e.g., many web
sites use a BSD derivative).

The result was many different versions of Unix, all based on the original
seventh edition. Most versions of Unix were proprietary and maintained by
their respective hardware vendor, for example, Sun Solaris is a variant of
System V. Three versions of the BSD branch of Unix ended up as open source:
FreeBSD (concentrating on ease-of-installation for PC-type hardware), NetBSD
(concentrating on many different CPU architectures), and a variant of NetBSD,
OpenBSD (concentrating on security). More general information about Unix
history can be found at
[http://www.datametrics.com/tech/unix/uxhistry/brf-hist.htm]
http://www.datametrics.com/tech/unix/uxhistry/brf-hist.htm and
[http://perso.wanadoo.fr/levenez/unix] http://perso.wanadoo.fr/levenez/unix.
Much more information about the BSD history can be found in [McKusick 1999]
and [ftp://ftp.freebsd.org/pub/FreeBSD/FreeBSD-current/src/share/misc/bsd-family-tree]
ftp://ftp.freebsd.org/pub/FreeBSD/FreeBSD-current/src/share/misc/bsd-family-tree.
One paper arguing for using Unix-like systems (instead of Microsoft's
products) is John Kirch's paper ``Microsoft Windows NT Server 4.0 versus
UNIX''.

-----------------------------------------------------------------------------
2.1.2. Free Software Foundation

In 1984 Richard Stallman's Free Software Foundation (FSF) began the GNU
project, a project to create a free version of the Unix operating system. By
free, Stallman meant software that could be freely used, read, modified, and
redistributed. The FSF successfully built a vast number of useful components,
including a C compiler (gcc), an impressive text editor (emacs), and a host
of fundamental tools. However, in the 1990's the FSF was having trouble
developing the operating system kernel [FSF 1998]; without a kernel their
dream of a completely free operating system would not be realized.

-----------------------------------------------------------------------------
2.1.3. Linux

In 1991 Linus Torvalds began developing an operating system kernel, which he
named ``Linux'' [Torvalds 1999]. This kernel could be combined with the FSF
material and other components (in particular some of the BSD components and
MIT's X-windows software) to produce a usable operating system. This book
will term the kernel itself the ``Linux kernel'' and an entire combination as
``Linux''. Note that many use the term ``GNU/Linux'' instead for this
combination.

In the Linux community, different organizations have combined the available
components differently. Each combination is called a ``distribution'', and
the organizations that develop distributions are called ``distributors''.
Common distributions include Red Hat, Mandrake, SuSE, Caldera, Corel, and
Debian. There are differences between the various distributions, but all
distributions are based on the same foundation: the Linux kernel and the GNU
glibc libraries. Since both are covered by ``copyleft'' style licenses,
changes to these foundations generally must be made available to all, a
unifying force between the Linux distributions at their foundation that does
not exist between the BSD and AT&T-derived Unix systems. This book is not
specific to any Linux distribution; when it discusses Linux it presumes Linux
kernel version 2.2 or greater and the C library glibc 2.1 or greater, valid
assumptions for essentially all current major Linux distributions.
-----------------------------------------------------------------------------
2.1.4. Open Source / Free Software

Increased interest in software that is freely shared has made it increasingly
necessary to define and explain it. A widely used term is ``open source
software''. Eric Raymond wrote several seminal articles examining its various
development processes. Another widely-used term is ``free software'', where
the ``free'' refers to freedom, not price. Neither term is perfect: ``free
software'' is sometimes confused with software that is merely free of charge,
while ``open source'' is sometime (ab)used to mean software whose source code
is visible, but for which there are limitations on use, modification, or
redistribution. This book uses the term ``open source'' for its usual
meaning, that is, software which has its source code freely available for
use, viewing, modification, and redistribution; a more detailed definition is
contained in the [http://www.opensource.org/osd.html] Open Source Definition.
In some cases, a difference in motive is suggested; those preferring the term
``free software'' wish to strongly emphasize the need for freedom, while
those using the term ``open source'' may have other motives (e.g., higher
reliability) or simply wish to appear less strident. More information on this
definition of free software, and the motivations behind it, can be found at
[http://www.fsf.org] http://www.fsf.org.

Those interested in reading advocacy pieces for open source software and free
software should see [http://www.opensource.org] http://www.opensource.org and
[http://www.fsf.org] http://www.fsf.org. There are other documents which
examine such software; for example, Miller [1995] found that open source
software was noticeably more reliable than proprietary software (using their
measurement technique, which measured resistance to crashing due to random
input).

-----------------------------------------------------------------------------
2.1.5. Comparing Linux and Unix

This book uses the term ``Unix-like'' to describe systems intentionally like
Unix. In particular, the term ``Unix-like'' includes all major Unix variants
and Linux distributions. Note that many people simply use the term ``Unix''
to describe these systems instead. Originally, the term ``Unix'' meant a
particular product developed by AT&T. Today, the Open Group owns the Unix
trademark, and it defines Unix as ``the worldwide Single UNIX
Specification''.

Linux is not derived from Unix source code, but its interfaces are
intentionally like Unix. Therefore, Unix lessons learned generally apply to
both, including information on security. Most of the information in this book
applies to any Unix-like system. Linux-specific information has been
intentionally added to enable those using Linux to take advantage of Linux's
capabilities.

Unix-like systems share a number of security mechanisms, though there are
subtle differences and not all systems have all mechanisms available. All
include user and group ids (uids and gids) for each process and a filesystem
with read, write, and execute permissions (for user, group, and other). See
general works on Unix systems for more information, including their basic
security mechanisms. Chapter 3 summarizes key security features of Unix and
Linux.

-----------------------------------------------------------------------------
2.2. Security Principles

There are many general security principles which you should be familiar with;
one good place for general information on information security is the
Information Assurance Technical Framework (IATF) [NSA 2000]. NIST has
identified high-level ``generally accepted principles and practices''
[Swanson 1996]. You could also look at a general textbook on computer
security, such as [Pfleeger 1997].
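Section 2.1.5 above notes that all Unix-like systems give each process user
and group ids (uids and gids) and give each filesystem object read/write/
execute bits for user, group, and other. As a minimal illustrative sketch
(not part of the original survey; /etc/passwd is used only because it is a
conveniently world-readable file on virtually all Unix-like systems), a
program can inspect both mechanisms directly:

```python
import os
import stat

# Each process carries real and effective user/group ids; setuid/setgid
# programs are exactly those where real and effective ids can differ.
print("real uid:", os.getuid(), "effective uid:", os.geteuid())
print("real gid:", os.getgid(), "effective gid:", os.getegid())

# Each filesystem object carries read/write/execute permission bits
# for its owning user, its group, and everyone else ("other").
mode = os.stat("/etc/passwd").st_mode
print("owner can read: ", bool(mode & stat.S_IRUSR))
print("group can write:", bool(mode & stat.S_IWGRP))
print("other can read: ", bool(mode & stat.S_IROTH))
```

Chapter 3 covers these mechanisms, and their subtleties, in detail.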
NIST Special Publication 800-27 describes a number of good engineering
principles (although, since they're abstract, they're insufficient for
actually building secure programs - hence this book); you can get a copy at
[http://csrc.nist.gov/publications/nistpubs/800-27/sp800-27.pdf]
http://csrc.nist.gov/publications/nistpubs/800-27/sp800-27.pdf.

Often computer security objectives (or goals) are described in terms of three
overall objectives:

  * Confidentiality (also known as secrecy), meaning that the computing
    system's assets can be read only by authorized parties.

  * Integrity, meaning that the assets can only be modified or deleted by
    authorized parties in authorized ways.

  * Availability, meaning that the assets are accessible to the authorized
    parties in a timely manner (as determined by the systems requirements).
    The failure to meet this goal is called a denial of service.

Some people define additional major security objectives, while others lump
those additional goals as special cases of these three. For example, some
separately identify non-repudiation as an objective; this is the ability to
``prove'' that a sender sent or receiver received a message (or both), even
if the sender or receiver wishes to deny it later. Privacy is sometimes
addressed separately from confidentiality; some define this as protecting the
confidentiality of a user (e.g., their identity) instead of the data. Most
objectives require identification and authentication, which is sometimes
listed as a separate objective. Often auditing (also called accountability)
is identified as a desirable security objective. Sometimes ``access control''
and ``authenticity'' are listed separately as well. For example, the U.S.
Department of Defense (DoD), in DoD directive 3600.1, defines ``information
assurance'' as ``information operations (IO) that protect and defend
information and information systems by ensuring their availability,
integrity, authentication, confidentiality, and nonrepudiation. This includes
providing for restoration of information systems by incorporating protection,
detection, and reaction capabilities.''

In any case, it is important to identify your program's overall security
objectives, no matter how you group them together, so that you'll know when
you've met them.

Sometimes these objectives are a response to a known set of threats, and
sometimes some of these objectives are required by law. For example, for U.S.
banks and other financial institutions, there's a new privacy law called the
``Gramm-Leach-Bliley'' (GLB) Act. This law mandates disclosure of personal
information shared and means of securing that data, requires disclosure of
personal information that will be shared with third parties, and directs
institutions to give customers a chance to opt out of data sharing [Jones
2000].

There is sometimes conflict between security and some other general system/
software engineering principles. Security can sometimes interfere with ``ease
of use''; for example, installing a secure configuration may take more effort
than a ``trivial'' installation that works but is insecure. Often this
apparent conflict can be resolved; for example, by re-thinking a problem it's
often possible to make a secure system also easy to use. There's also
sometimes a conflict between security and abstraction (information hiding);
for example, some high-level library routines may be implemented securely or
not, but their specifications won't tell you. In the end, if your application
must be secure, you must do things yourself if you can't be sure otherwise -
yes, the library should be fixed, but it's your users who will be hurt by
your poor choice of library routines.
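A concrete instance of this abstraction conflict, offered here as an
illustrative aside: Python's standard tempfile module provides both
tempfile.mktemp(), whose own documentation warns that the returned name can
be claimed by another process before you use it, and
tempfile.NamedTemporaryFile(), which creates and opens the file atomically
with owner-only permissions. Only by reading the specifications do you learn
which routine is safe:

```python
import os
import tempfile

# mktemp() merely generates a name; between this call and any later
# open(), another process could create that path first (a race
# condition), which is why its documentation deprecates it.
risky_name = tempfile.mktemp()
print("an unsafe pattern would later open:", risky_name)

# NamedTemporaryFile() creates and opens the file in one step, with
# permissions restricted to the owner, so there is no window to race.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"example data")
    safe_name = f.name
print("safely created:", safe_name)
os.remove(safe_name)
```

Race conditions of this kind, including temporary file handling, are covered
in depth in Section 7.10.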
A good general security principle is ``defense in depth''; you should have
numerous defense mechanisms (``layers'') in place, designed so that an
attacker has to defeat multiple mechanisms to perform a successful attack.

-----------------------------------------------------------------------------
2.3. Why do Programmers Write Insecure Code?

Many programmers don't intend to write insecure code - but do anyway. Here
are a number of purported reasons for this. Most of these were collected and
summarized by Aleph One on Bugtraq (in a posting on December 17, 1998):

  * There is no curriculum that addresses computer security in most schools.
    Even when there is a computer security curriculum, it often doesn't
    discuss how to write secure programs as a whole. Many such curricula only
    examine certain areas, which are important, but they often fail to
    discuss common real-world issues such as buffer overflows, string
    formatting, and input checking. I believe this is a key problem: even
    programmers who go through colleges and universities are very unlikely to
    learn how to write secure programs, yet we depend on those very people to
    write secure programs.

  * Programming books/classes do not teach secure/safe programming
    techniques. Indeed, until recently there were no books on how to write
    secure programs at all (this book is one of those few).

  * No one uses formal verification methods.

  * C is an unsafe language, and the standard C library string functions are
    unsafe; the ``simple'' ways of using C permit dangerous exploits.

  * Programmers do not think ``multi-user.''

  * Programmers are human, and humans are lazy. Thus, programmers will often
    use the ``easy'' approach instead of a secure approach - and once it
    works, they often fail to fix it later.

  * Most programmers are simply not good programmers.

  * Most programmers are not security people; they simply don't often think
    like an attacker does.

  * Most security people are not programmers. This was a statement made by
    some Bugtraq contributors, but it's not clear that this claim is really
    true.

  * Most computer security models are terrible.

  * There is a lot of ``broken'' legacy software. Fixing this software (to
    remove security faults or to make it work with more restrictive security
    policies) is difficult.

  * Consumers don't care about security. (Personally, I have hope that
    consumers are beginning to care about security; a computer system that is
    constantly exploited is neither useful nor user-friendly. Also, many
    consumers are unaware that there's even a problem, or assume that it
    can't happen to them.)

  * Security costs extra development time.

  * Security costs in terms of additional testing (red teams, etc.).

-----------------------------------------------------------------------------
2.4. Is Open Source Good for Security?

There's been a lot of debate by security practitioners about the impact of
open source approaches on security. One of the key issues is that open source
exposes the source code to examination by everyone, both the attackers and
defenders, and reasonable people disagree about the ultimate impact of this
situation. (Note - you can get the latest version of this essay by going to
the main website for this book, [http://www.dwheeler.com/secure-programs]
http://www.dwheeler.com/secure-programs.)

Bruce Schneier is a well-known expert on computer security and cryptography.
He argues that smart engineers should ``demand open source code for anything
related to security'' [Schneier 1999], and he also discusses some of the
preconditions which must be met to make open source software secure.
Vincent Rijmen, a developer of the winning Advanced Encryption Standard (AES) encryption algorithm, believes that the open source nature of Linux provides a superior vehicle for making security vulnerabilities easier to spot and fix: ``Not only because more people can look at it, but, more importantly, because the model forces people to write more clear code, and to adhere to standards. This in turn facilitates security review'' [Rijmen 2000].

Elias Levy (Aleph1) is the former moderator of one of the most popular security discussion groups - Bugtraq. He discusses some of the problems in making open source software secure in his article "Is Open Source Really More Secure than Closed?". His summary is:

    So does all this mean Open Source Software is no better than closed source software when it comes to security vulnerabilities? No. Open Source Software certainly does have the potential to be more secure than its closed source counterpart. But make no mistake, simply being open source is no guarantee of security.

Whitfield Diffie is the co-inventor of public-key cryptography (the basis of all Internet security) and chief security officer and senior staff engineer at Sun Microsystems. In his 2003 article [http://zdnet.com.com/2100-1107-980938.html] Risky business: Keeping security a secret, he argues that proprietary vendors' claims that their software is more secure because it's secret is nonsense. He identifies and then counters two main claims made by proprietary vendors: (1) that release of code benefits attackers more than defenders, and (2) that a few expert eyes are better than several random ones. He first notes that while giving programmers access to a piece of software doesn't guarantee they will study it carefully, there is a group of programmers who can be expected to care deeply: those who either use the software personally or work for an enterprise that depends on it. "In fact, auditing the programs on which an enterprise depends for its own security is a natural function of the enterprise's own information-security organization."
He then counters the second argument, noting that "As for the notion that open source's usefulness to opponents outweighs the advantages to users, that argument flies in the face of one of the most important principles in security: A secret that cannot be readily changed should be regarded as a vulnerability." He closes noting that

    "It's simply unrealistic to depend on secrecy for security in computer software. You may be able to keep the exact workings of the program out of general circulation, but can you prevent the code from being reverse-engineered by serious opponents? Probably not."

John Viega's article [http://dev-opensourceit.earthweb.com/news/000526_security.html] "The Myth of Open Source Security" also discusses these issues, and summarizes things this way:

    Open source software projects can be more secure than closed source projects. However, the very things that can make open source programs secure -- the availability of the source code, and the fact that large numbers of users are available to look for and fix security holes -- can also lull people into a false sense of security.

[http://www.linuxworld.com/linuxworld/lw-1998-11/lw-11-ramparts.html] Michael H. Warfield's "Musings on open source security" is very positive about the impact of open source software on security. In contrast, Fred Schneider doesn't believe that open source helps security, saying ``there is no reason to believe that the many eyes inspecting (open) source code would be successful in identifying bugs that allow system security to be compromised'' and claiming that ``bugs in the code are not the dominant means of attack'' [Schneider 2000]. He also claims that open source rules out control of the construction process, though in practice there is such control - all major open source programs have one or a few official versions with ``owners'' with reputations at stake. Peter G.
Neumann discusses ``open-box'' software (in which source code is available, possibly only under certain conditions), saying ``Will open-box software really improve system security? My answer is not by itself, although the potential is considerable'' [Neumann 2000]. TruSecure Corporation, under sponsorship by Red Hat (an open source company), has developed a paper on why they believe open source is more effective for security [TruSecure 2001]. [http://www-106.ibm.com/developerworks/linux/library/l-oss.html?open&I=252,t=gr,p=SeclmpOS] Natalie Walker Whitlock's IBM DeveloperWorks article discusses the pros and cons as well. Brian Witten, Carl Landwehr, and Michael Caloyannides [Witten 2001] published in IEEE Software an article tentatively concluding that having source code available should work in the favor of system security; they note:

    ``We can draw four additional conclusions from this discussion. First, access to source code lets users improve system security -- if they have the capability and resources to do so. Second, limited tests indicate that open source life cycles produce systems that are less vulnerable to nonmalicious faults. Third, a survey of three operating systems indicates that one open source operating system experienced less exposure in the form of known but unpatched vulnerabilities than either of its two proprietary counterparts. Last, closed and proprietary system development models face disincentives toward fielding and supporting more secure systems as long as less secure systems are more profitable. Notwithstanding these conclusions, arguments in this important matter are in their formative stages and in dire need of metrics that can reflect security delivered to the customer.''

Scott A. Hissam and Daniel Plakosh's [http://www.ics.uci.edu/~wscacchi/Papers/New/IEE_hissam.pdf] ``Trust and Vulnerability in Open Source Software'' discusses the pluses and minuses of open source software.
As with other papers, they note that just because the software is open to review, it should not automatically follow that such a review has actually been performed. Indeed, they note that this is a general problem for all software, open or closed - it is often questionable whether many people examine any given piece of software. One interesting point is that they demonstrate that attackers can learn about a vulnerability in a closed source program (Windows) from patches made to an OSS/FS program (Linux). In this example, Linux developers fixed a vulnerability before attackers tried to attack it, and attackers correctly guessed that a similar problem might exist in the closed source program. Unless OSS/FS programs are forbidden entirely, this kind of learning is difficult to prevent. Therefore, the existence of an OSS/FS program can reveal the vulnerabilities of both the OSS/FS and proprietary program performing the same function - but at least in this example, the OSS/FS program was fixed first.

-----------------------------------------------------------------------------

2.4.2. Why Closing the Source Doesn't Halt Attacks

It's been argued that a system without source code is more secure because, since there's less information available for an attacker, it should be harder for an attacker to find the vulnerabilities. This argument has a number of weaknesses, however, because although source code is extremely important when trying to add new capabilities to a program, attackers generally don't need source code to find a vulnerability.

First, it's important to distinguish between ``destructive'' acts and ``constructive'' acts. In the real world, it is much easier to destroy a car than to build one. In the software world, it is much easier to find and exploit a vulnerability than to add significant new functionality to that software. Attackers have many advantages against defenders because of this difference. Software developers must try to have no security-relevant mistakes anywhere in their code, while attackers only need to find one. Developers are primarily paid to get their programs to work...
attackers don't need to make the program work, they only need to find a single weakness. And as I'll describe in a moment, it takes less information to attack a program than to modify one.

Generally attackers (against both open and closed programs) start by knowing about the general kinds of security problems programs have. There's no point in hiding this information; it's already out, and in any case, defenders need that kind of information to defend themselves. Attackers then use techniques to try to find those problems; I'll group the techniques into ``dynamic'' techniques (where you run the program) and ``static'' techniques (where you examine the program's code - be it source code or machine code).

In ``dynamic'' approaches, an attacker runs the program, sending it data (often problematic data), and sees if the program's response indicates a common vulnerability. Open and closed programs have no difference here, since the attacker isn't looking at code.

Attackers may also look at the code, the ``static'' approach. For open source software, they'll probably look at the source code and search it for patterns. For closed source software, they might search the machine code (usually presented in assembly language format to simplify the task) for essentially the same patterns. They might also use tools called ``decompilers'' that turn the machine code back into source code and then search the source code for the vulnerable patterns (the same way they would search for vulnerabilities in open source software). See Flake [2001] for one discussion of how closed code can still be examined for vulnerabilities. This point is important: even if an attacker wanted to use source code to find a vulnerability, a closed source program has no advantage, because the attacker can use a disassembler to re-create the source code of the product.

Non-developers might ask ``if decompilers can create source code from machine code, then why do developers say they need source code instead of just machine code?'' The answer is that although developers don't need source code to find security problems, developers do need source code to make substantial improvements to the program. Although decompilers can turn machine code back into a ``source code'' of sorts, the resulting source code is extremely hard to modify. Typically most understandable names are lost, so instead of variables like ``grand_total'' you get ``x123123'', instead of methods like ``display_warning'' you get ``f123124'', and the code itself may have spatterings of assembly in it. Also, _ALL_ comments and design information are lost. This isn't a serious problem for finding security problems, because generally you're searching for patterns indicating vulnerabilities, not for internal variable or method names. Thus, decompilers can be useful for finding ways to attack programs, but aren't helpful for updating programs.

Thus, developers will say ``source code is vital'' (when they intend to add functionality), but the fact that the source code for closed source programs is hidden doesn't protect the program very much.

-----------------------------------------------------------------------------

2.4.3. Why Keeping Vulnerabilities Secret Doesn't Make Them Go Away

Sometimes it's noted that a vulnerability that exists but is unknown can't be exploited, so the system is ``practically secure.'' In theory this is true, but the problem is that once someone finds the vulnerability, the finder may just exploit the vulnerability instead of helping to fix it. Having unknown vulnerabilities doesn't really make the vulnerabilities go away; it simply means that the vulnerabilities are a time bomb, with no way to know when they'll be exploited. Fundamentally, the problem of someone exploiting a vulnerability they discover is a problem for both open and closed source systems.
One related claim sometimes made (though not as directly related to OSS/FS) is that people should not post warnings about vulnerabilities and discuss them. This sounds good in theory, but the problem is that attackers already distribute information about vulnerabilities through a large number of channels. In short, such approaches would leave defenders vulnerable, while doing nothing to inhibit attackers. In the past, companies actively tried to prevent disclosure of vulnerabilities, but experience showed that, in general, vulnerabilities weren't fixed until they were widely known to their users (who could then insist that the vulnerabilities be fixed). This is all part of the argument for ``full disclosure.'' Gartner Group has a blunt commentary in a CNET.com article titled ``Commentary: Hype is the real issue - Tech News.'' They stated:

    The comments of Microsoft's Scott Culp, manager of the company's security response center, ... Discussions of morality regarding the distribution of information go way back. For example, the church tried to squelch Copernicus' and Galileo's theory of the sun being at the center of the solar system... Culp's attempt to blame "information security professionals" for the recent spate of vulnerabilities in Microsoft products is at best disingenuous. ... a continuous process of improvement. The more widely vulnerabilities become known, ...

-----------------------------------------------------------------------------

2.4.4. How OSS/FS Counters Trojan Horses

It's sometimes argued that open source programs, because there's no enforced control by a single company, permit people to insert Trojan Horses and other malicious code. Trojan horses can be inserted into open source code, true, but they can also be inserted into proprietary code. A disgruntled or bribed employee can insert malicious code, and in many organizations it's much less likely to be found than in an open source program.
After all, no one outside the organization can review the source code, and few companies review their code internally (or, even if they do, few can be assured that the reviewed code is actually what is used). And the notion that a closed-source company can be sued later has little evidence; nearly all licenses disclaim all warranties, and courts have generally not held software development companies liable.

Borland's InterBase server is an interesting case in point. Some time between 1992 and 1994, Borland inserted an intentional ``back door'' into their database server, ``InterBase''. This back door allowed any local or remote user to manipulate any database object and install arbitrary programs, and in some cases could lead to controlling the machine as ``root''. This vulnerability stayed in the product for at least 6 years - no one else could review the product, and Borland had no incentive to remove the vulnerability. Then Borland released its source code in July 2000. The "Firebird" project began working with the source code, and uncovered this serious security problem with InterBase in December 2000. By January 2001 the CERT announced the existence of this back door as CERT advisory CA-2001-01. What's discouraging is that the backdoor can be easily found simply by looking at an ASCII dump of the program (a common cracker trick). Once this problem was found, it was rapidly fixed. You could argue that, by keeping the password unknown, the program stayed safe, and that opening the source made the program less secure. I think this is nonsense, since ASCII dumps are trivial to do and well-known as a standard attack technique, and not all attackers have sudden urges to announce vulnerabilities - in fact, there's no way to be certain that this vulnerability has not been exploited many times. It's clear that after the source was opened, the source code was reviewed over time, and the vulnerabilities found and fixed.
One way to characterize this is to say that the original code was vulnerable, its vulnerabilities became easier to exploit when it was first made open source, and then finally these vulnerabilities were fixed.

-----------------------------------------------------------------------------

2.4.5. Other Advantages

The advantages of having source code open extend not just to software that is being attacked, but also to vulnerability assessment scanners. Vulnerability assessment scanners intentionally look for vulnerabilities in configured systems. A recent Network Computing evaluation found that the best scanner (which, among other things, found the most legitimate vulnerabilities) was Nessus, an open source scanner [Forristal 2001].

-----------------------------------------------------------------------------

2.4.6. Bottom Line

So, what's the bottom line? I personally believe that when a program began as closed source and is then first made open source, it often starts less secure for any users (through exposure of vulnerabilities), and over time (say a few years) it has the potential to be much more secure than its closed source counterpart. If the program began as open source software, the public scrutiny is more likely to improve its security before it's ready for use by significant numbers of users, but there are several caveats to this statement (it's not an ironclad rule). Just making a program open source doesn't suddenly make a program secure, and just because a program is open source does not guarantee security:

  * First, people have to actually review the code. This is one of the key points of debate - will people really review code in an open source project? All sorts of factors can reduce the amount of review: being a niche or rarely-used product (where there are few potential reviewers), having few developers, and use of a rarely-used computer language. Clearly, a program that has a single developer and no other contributors of any kind doesn't have this kind of review.
On the other hand, a program that has a primary author and many other people who occasionally examine the code and contribute suggests that there are others reviewing the code (at least to create contributions). In general, if there are more reviewers, there's a higher likelihood that someone will identify the flaw. In addition, note that, for example, the OpenBSD project continuously examines programs for security flaws, so the components in its innermost parts have certainly undergone a lengthy review. Since OSS/FS discussions are often held publicly, this level of review is something that potential users can judge for themselves.

    One factor that can particularly reduce review likelihood is not actually being open source. Some vendors like to posture their ``disclosed source'' (also called ``source available'') programs as being open source, but since the program owner has extensive exclusive rights, others will have far less incentive to work ``for free'' for the owner on the code. Even open source licenses which have unusually asymmetric rights (such as the MPL) have this problem. After all, people are less likely to voluntarily participate if someone else will have rights to their results that they don't have (as Bruce Perens says, ``who wants to be someone else's unpaid employee?''). Since the people with the most incentive tend to be people trying to modify the program, this disincentive to participate reduces the number of ``eyeballs''. Elias Levy made this mistake in his article about open source security; his examples of software that had been broken into (e.g., TIS's Gauntlet) were not, at the time, open source.

  * Second, at least some of the people developing and reviewing the code must know how to write secure programs. Clearly, it doesn't matter if there are ``many eyeballs'' if none of the eyeballs know what to look for. Note that it's not necessary for everyone to know how to write secure programs, as long as those who do know how are examining the code changes.

  * Third, once found, these problems need to be fixed quickly and their fixes distributed.
Open source systems tend to fix the problems quickly, but the distribution is not always smooth. For example, the OpenBSD developers do an excellent job of reviewing code for security flaws, but they don't always report the identified problems back to the original developer. Thus, it's quite possible for there to be a fixed version in one system, but for the flaw to remain in another. I believe this problem is lessening over time, since no one ``downstream'' likes to repeatedly fix the same problem. Of course, ensuring that security patches are actually installed on end-user systems is a problem for both open source and closed source software.

Another advantage of open source is that, if you find a problem, you can fix it immediately. This really doesn't have any counterpart in closed source.

In short, the effect on security of open source software is still a major debate in the security community, though a large number of prominent experts believe that it has great potential to be more secure.

-----------------------------------------------------------------------------

2.5. Types of Secure Programs

Many different types of programs may need to be secure programs (as the term is defined in this book). Some common types are:

  * Application programs used as viewers of remote data. Programs used as viewers (such as word processors or file format viewers) are often asked to view data sent remotely by an untrusted user (this request may be automatically invoked by a web browser). Clearly, the untrusted user's input should not be allowed to cause the application to run arbitrary programs. It's usually unwise to support initialization macros (run when the data is displayed); if you must, then you must create a secure sandbox (a complex and error-prone task that almost never succeeds, which is why you shouldn't support macros in the first place). Be careful of issues such as buffer overflow, discussed in Chapter 6, which might allow an untrusted user to force the viewer to run an arbitrary program.
  * Application programs used by the administrator (root). Such programs shouldn't trust information that can be controlled by non-administrators.

  * Local servers (also called daemons).

  * Network-accessible servers (sometimes called network daemons).

  * Web-based applications (including CGI scripts). These are a special case of network-accessible servers, but they're so common they deserve their own category. Such programs are invoked indirectly via a web server, which filters out some attacks but nevertheless leaves many attacks that must be withstood.

  * Applets (i.e., programs downloaded to the client for automatic execution). This is something Java is especially famous for, though other languages (such as Python) support mobile code as well. There are several security viewpoints here; the implementer of the applet infrastructure on the client side has to make sure that the only operations allowed are ``safe'' ones, and the writer of an applet has to deal with the problem of hostile hosts (in other words, you can't normally trust the client). There is some research attempting to deal with running applets on hostile hosts, but frankly I'm skeptical of the value of these approaches and this subject is exotic enough that I don't cover it further here.

  * setuid/setgid programs. These programs are invoked by a local user and, when executed, are immediately granted the privileges of the program's owner and/or owner's group. In many ways these are the hardest programs to secure, because so many of their inputs are under the control of the untrusted user and some of those inputs are not obvious.

This book merges the issues of these different types of program into a single set. The disadvantage of this approach is that some of the issues identified here don't apply to all types of programs.
In particular, setuid/setgid programs have many surprising inputs and several of the guidelines here only apply to them. However, things are not so clear-cut, because a particular program may cut across these boundaries (e.g., a CGI script may be setuid or setgid, or be configured in a way that has the same effect), and some programs are divided into several parts, each of which can be considered a different ``type'' of program. The advantage of considering all of these program types together is that we can consider all issues without trying to apply an inappropriate category to a program. As will be seen, many of the principles apply to all programs that need to be secured.

There is a slight bias in this book toward programs written in C, with some notes on other languages such as C++, Perl, PHP, Python, Ada95, and Java. This is because C is the most common language for implementing secure programs on Unix-like systems (other than CGI scripts, which tend to use languages such as Perl, PHP, or Python). Also, most other languages' implementations call the C library. This is not to imply that C is somehow the ``best'' language for this purpose, and most of the principles described here apply regardless of the programming language used.

-----------------------------------------------------------------------------

2.6. Paranoia is a Virtue

The primary difficulty in writing secure programs is that writing them requires a different mind-set, in short, a paranoid mind-set. The reason is that the impact of errors (also called defects or bugs) can be profoundly different. Normal non-secure programs have many errors. While these errors are undesirable, these errors usually involve rare or unlikely situations, and if a user should stumble upon one they will try to avoid using the tool that way in the future.

In secure programs, the situation is reversed. Certain users will intentionally search out and cause rare or unlikely situations, in the hope that such attacks will give them unwarranted privileges. As a result, when writing secure programs, paranoia is a virtue.

-----------------------------------------------------------------------------

2.7. Why Did I Write This Document?

One question I've been asked is ``why did you write this book''? Here's my answer: Over the last several years I've noticed that many developers for Linux and Unix seem to keep falling into the same security pitfalls, again and again. Auditors were slowly catching problems, but it would have been better if the problems weren't put into the code in the first place. I believe that part of the problem was the lack of a single, obvious place where developers could go and get information on how to avoid known pitfalls. The information was publicly available, but it was often hard to find, out-of-date, incomplete, or had other problems. Most such information didn't particularly discuss Linux at all, even though it was becoming widely used! That leads up to the answer: I developed this book in the hope that future software developers won't repeat past mistakes, resulting in more secure systems. You can see a larger discussion of this at [http://www.linuxsecurity.com/feature_stories/feature_story-6.html] http://www.linuxsecurity.com/feature_stories/feature_story-6.html.

A related question that could be asked is ``why did you write your own book instead of just referring to other documents''? There are several answers:

  * Much of this information was scattered about; placing the critical information in one organized document makes it easier to use.

  * Some of this information is not written for the programmer, but is written for an administrator or user.

  * Much of the available information emphasizes portable constructs (constructs that work on all Unix-like systems), and failed to discuss Linux at all. It's often best to avoid Linux-unique abilities for portability's sake, but sometimes the Linux-unique abilities can really aid security. Even if non-Linux portability is desired, you may want to support the Linux-unique abilities when running on Linux.
And, by emphasizing Linux, I can include references to information that is helpful to someone targeting Linux that is not necessarily true for others.

-----------------------------------------------------------------------------

2.8. Sources of Design and Implementation Guidelines

Several documents help describe how to write secure programs (or, alternatively, how to find security problems in existing programs), and were the basis for the guidelines highlighted in the rest of this book.

For general-purpose servers and setuid/setgid programs, there are a number of valuable documents (though some are difficult to find without having a reference to them). Matt Bishop [1996, 1997] has developed several extremely valuable papers and presentations on the topic, and in fact he has a web page dedicated to the topic at [http://olympus.cs.ucdavis.edu/~bishop/secprog.html] http://olympus.cs.ucdavis.edu/~bishop/secprog.html. AUSCERT has released a programming checklist [ftp://ftp.auscert.org.au/pub/auscert/papers/secure_programming_checklist] [AUSCERT 1996], based in part on chapter 23 of Garfinkel and Spafford's book discussing how to write secure SUID and network programs [http://www.oreilly.com/catalog/puis] [Garfinkel 1996]. [http://www.sunworld.com/swol-04-1998/swol-04-security.html] Galvin [1998a] described a simple process and checklist for developing secure programs; he later updated the checklist in [http://www.sunworld.com/sunworldonline/swol-08-1998/swol-08-security.html] Galvin [1998b]. [http://www.pobox.com/~kragen/security-holes.html] Sitaker [1999] presents a list of issues for the ``Linux security audit'' team to search for. [http://www.homeport.org/~adam/review.html] Shostack [1999] defines another checklist for reviewing security-sensitive code. The NCSA [http://…/security/programming] [NCSA] provides a set of terse but useful secure programming guidelines.
Other useful information sources include the Secure Unix Programming FAQ [http://www.whitefang.com/sup/] [Al-Herbish 1999], the Security-Audit's Frequently Asked Questions [http://lsap.org/faq.txt] [Graham 1999], and [http://www.clark.net/pub/mjr/pubs/pdf/] Ranum [1998]. Some recommendations must be taken with caution; for example, the BSD setuid(7) man page [http://www.homeport.org/~adam/setuid.7.html] [Unknown] recommends the use of access(3) without noting the dangerous race conditions that usually accompany it. Wood [1985] has some useful but dated advice in its ``Security for Programmers'' chapter. [http://www.research.att.com/~smb/talks] Bellovin [1994] includes useful guidelines and some specific examples, such as how to restructure an ftpd implementation to be simpler and more secure. FreeBSD provides some guidelines [http://www.freebsd.org/security/security.html] FreeBSD [1999]. [http://developer.gnome.org/doc/guides/programming-guidelines/book1.html] [Quintero 1999] is primarily concerned with GNOME programming guidelines, but it includes a section on security considerations. [http://www.fish.com/security/murphy.html] [Venema 1996] provides a detailed discussion (with examples) of some common errors when programming secure programs (widely-known or predictable passwords, burning yourself with malicious data, secrets in user-accessible data, and depending on other programs). [http://www.fish.com/security/maldata.html] [Sibert 1996] describes threats arising from malicious data.
Michael Bacarella's article [http://m.bacarella.com/papers/secsoft/html] The Peon's Guide To Secure System Development provides a short set of guidelines.

There are many documents giving security guidelines for programs using the Common Gateway Interface (CGI) to interface with the web. These include [http://www.csclub.uwaterloo.ca/u/mlvanbie/cgisec] Van Biesbrouck [1996], [http://language.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html] Gundavaram [unknown], [http://webreview.com/wr/pub/97/08/08/bookshelf] [Garfinkle 1997], [http://www.eekim.com/pubs/cgibook] Kim [1996], [http://www.go2net.com/people/…], [http://www.w3.org/Security/Faq/www-security-faq.html] Stein [1999], [http://members.home.net/razvan.peteanu] [Peteanu 2000], and [http://advosys.ca/tips/web-security.html] [Advosys 2000].

There are many documents specific to a language, which are further discussed in the language-specific sections of this book. For example, the Perl distribution includes [http://www.perl.com/pub/doc/manual/html/pod/perlsec.html] perlsec(1), which describes how to use Perl more securely. The Secure Internet Programming site at [http://www.cs.princeton.edu/sip] http://www.cs.princeton.edu/sip is interested in computer security issues in general, but focuses on mobile code systems such as Java, ActiveX, and JavaScript; Ed Felten (one of its principals) co-wrote a book on the security of Java ([http://www.securingjava.com] [McGraw 1999]) which is discussed in Section 10.6. Sun's security code guidelines provide some guidelines primarily for Java and C; they are at [http://java.sun.com/security/seccodeguide.html] http://java.sun.com/security/seccodeguide.html.

Yoder [1998] contains a collection of patterns to be used when dealing with application security. It's not really a specific set of guidelines, but a set of commonly-used patterns for programming that you may find useful. The Schmoo group maintains a web page linking to information on how to write secure code at [http://www.shmoo.com/securecode] http://www.shmoo.com/securecode.
There are many documents describing the issue from the other direction (i.e., ``how to crack a system''). One example is McClure [1999], and there's countless amounts of material from that vantage point on the Internet. There are also more general documents on computer architectures and on how attacks must be developed to exploit them, e.g., [LSD 2001]. The Honeynet Project has been collecting information (including statistics) on how attackers actually perform their attacks; see their website at [http://project.honeynet.org] http://project.honeynet.org for more information. There's also a large body of information on vulnerabilities already identified in existing programs. This can be a useful set of examples of ``what not to do,'' though it takes effort to extract more general guidelines from the large body of specific examples. There are mailing lists that discuss security issues; one of the most well-known is [http://www.SecurityFocus.com/forums/bugtraq/faq.html] Bugtraq, which among other things develops a list of vulnerabilities. The CERT Coordination Center (CERT/CC) is a major reporting center for Internet security problems which reports on vulnerabilities. The CERT/CC occasionally produces advisories that provide a description of a serious security problem and its impact, along with instructions on how to obtain a patch or details of a workaround; for more information see [http://www.cert.org] http://www.cert.org. Note that officially ``CERT'' doesn't stand for anything now. The Department of Energy's Computer Incident Advisory Capability (CIAC) also reports on vulnerabilities. These different groups may identify the same vulnerabilities but use different names.
To resolve this problem, MITRE supports the Common Vulnerabilities and Exposures (CVE) list, which creates a single unique identifier (``name'') for all publicly known vulnerabilities and security exposures identified by others; see [http://www.cve.mitre.org] http://www.cve.mitre.org. NIST's ICAT is a searchable catalog of computer vulnerabilities, categorizing each CVE vulnerability so that they can be searched and compared later; see [http://csrc.nist.gov/icat] http://csrc.nist.gov/icat.

This book is a summary of what I believe are the most useful and important guidelines. My goal is a book that a good programmer can just read and then be fairly well prepared to implement a secure program. No single document can really meet this goal, but I believe the attempt is worthwhile. My objective is to strike a balance somewhere between a ``complete list of all possible guidelines'' (that would be unending and unreadable) and the various ``short'' lists available on-line that are nice and short but omit a large number of critical issues. When in doubt, I include the guidance; I believe in that case it's better to make the information available to everyone in this ``one stop shop'' document. The organization presented here is my own (every list has its own, different structure), and some of the guidelines (especially the Linux-unique ones, such as those on capabilities and the FSUID value) are also my own. Reading all of the referenced documents listed above as well is highly recommended, though I realize that for many it's impractical.

-----------------------------------------------------------------------------

2.9. Other Sources of Security Information

There are a vast number of web sites and mailing lists dedicated to security issues.
Here are some other sources of security information:

  * [http://www.securityfocus.com] Securityfocus.com has a wealth of general security-related news and information, and hosts a number of security-related mailing lists; see their website for information on how to subscribe and view their archives. The most relevant mailing lists on SecurityFocus are:

      + The ``Bugtraq'' mailing list is, as noted above, a ``full disclosure moderated mailing list for the detailed discussion and announcement of computer security vulnerabilities: what they are, how to exploit them, and how to fix them.''

      + The ``secprog'' mailing list is a moderated mailing list for the discussion of secure software development methodologies and techniques. I specifically monitor this list, and I coordinate with its moderator to ensure that resolutions reached in SECPROG (if I agree with them) are incorporated into this document.

      + The ``vuln-dev'' mailing list discusses potential or undeveloped holes.

  * IBM's ``developerWorks: Security'' has a library of interesting articles. You can learn more from [http://www.ibm.com/developer/security] http://www.ibm.com/developer/security.

  * For Linux-specific security information, a good source is [http://www.linuxsecurity.com] LinuxSecurity.com. If you're interested in auditing Linux code, places to see include the Linux Security-Audit Project FAQ and the [http://www.lkap.org] Linux Kernel Auditing Project, which are dedicated to auditing Linux code for security issues.

Of course, if you're securing specific systems, you should sign up to their security mailing lists (e.g., Microsoft's, Red Hat's, etc.) so you can be warned of any security updates.

-----------------------------------------------------------------------------

2.10. Document Conventions

System manual pages are referenced in the format name(number), where number is the section number of the manual.
The pointer value that means ``does not point anywhere'' is called NULL; C compilers will convert the integer 0 to the value NULL in most circumstances where a pointer is needed, but note that nothing in the C standard requires that NULL actually be implemented by a series of all-zero bits. C and C++ treat the character '\0' (ASCII 0) specially, and this value is referred to as NIL in this book (this is usually called ``NUL'', but ``NUL'' and ``NULL'' sound identical). Function and method names always use the correct case, even if that means that some sentences must begin with a lower case letter. I use the term ``Unix-like'' to mean Unix, Linux, or other systems whose underlying models are very similar to those discussed in this book; this does not include systems such as Windows 2000 that implement portions of POSIX yet have vastly different security models.

An attacker is called an ``attacker'', ``cracker'', or ``adversary'', and not a ``hacker''. Some journalists mistakenly use the word ``hacker'' instead of ``attacker''; this book avoids this misuse, because many Linux and Unix developers refer to themselves as ``hackers'' in the traditional non-evil sense of the term. That is, to many Linux and Unix developers, the term ``hacker'' continues to mean simply an expert or enthusiast, particularly regarding computers. It is true that some hackers commit malicious or intrusive actions, but many other hackers do not, and it's unfair to claim that all hackers perform malicious activities. Many other glossaries and books note that not all hackers are attackers. For example, the Industry Advisory Council's Information Assurance (IA) Special Interest Group (SIG)'s [http://www.iaconline.org/sig_infoassure.html] Information Assurance Glossary defines hacker as ``A person who delights in having an intimate understanding of the internal workings of computers and computer networks.
The term is misused in a negative context where `cracker' should be used.'' The [http://www.catb.org/~esr/jargon/html/entry/hacker.html] Jargon File has a long and complicated definition for hacker, starting with ``A person who enjoys exploring the details of programmable systems and how to stretch their capabilities, as opposed to most users, who prefer to learn only the minimum necessary.''; it notes that although some people use the term to mean ``A malicious meddler who tries to discover sensitive information by poking around'', it also states that this definition is deprecated and that the correct term for this sense is ``cracker''.

This book uses the ``new'' or ``logical'' quoting system, instead of the traditional American quoting system: quoted information does not include any trailing punctuation if the punctuation is not part of the material being quoted. While this may cause a minor loss of typographical beauty, the traditional American system causes extraneous characters to be placed inside the quotes. These extraneous characters have no effect on prose but can be disastrous in code or computer commands. I use standard American (not British) spelling; I've yet to meet an English speaker on any continent who has trouble with this.

-----------------------------------------------------------------------------

Chapter 3. Summary of Linux and Unix Security Features

  Discretion will protect you, and understanding will guard you.
  Proverbs 2:11 (NIV)

Before discussing guidelines on how to use Linux or Unix security features, it's important to know what those features are. This section briefly describes those features that are widely available on nearly all Unix-like systems. However, note that there is considerable variation between different versions of Unix-like systems, and not all systems have the abilities described here.
This chapter also notes some extensions or features specific to Linux; Linux distributions tend to be fairly similar to each other from the point-of-view of programming for security, because they all use essentially the same kernel and C library (and the GPL-based licenses encourage rapid dissemination of any innovations). It also notes some of the security-relevant differences between different Unix systems. This chapter doesn't discuss issues such as implementations of mandatory access control (MAC) which many Unix-like systems do not implement. If you already know what those features are, please feel free to skip this section.

Many programming guides skim briefly over the security-relevant portions of Linux or Unix and skip important information. In particular, they often discuss ``how to use'' something in general terms but gloss over the security attributes that affect their use. Conversely, there's a great deal of detailed information in the manual pages about individual functions, but the manual pages sometimes obscure key security issues with detailed discussions on how to use each individual function. This section tries to bridge that gap; it gives an overview of the security mechanisms in Linux that are likely to be used by a programmer, concentrating specifically on the security ramifications. This section has more depth than the typical programming guides, focusing specifically on security-related matters, and points to references where you can get more details.

First, the basics. Linux and Unix are fundamentally divided into two parts: the kernel and ``user space''. Most programs execute in user space (on top of the kernel). Linux supports the concept of ``kernel modules'', which is simply the ability to dynamically load code into the kernel, but note that it still has this fundamental division. Some other systems (such as the HURD) are ``microkernel'' based systems; they have a small kernel with more limited functionality, and a set of ``user'' programs that implement the lower-level functions traditionally implemented by the kernel.
Some Unix-like systems have been extensively modified to support strong security, in particular to support U.S. Department of Defense requirements for Mandatory Access Control (level B1 or higher). This version of this book doesn't cover these systems or issues; I hope to expand to that in a future version. More detailed information on some of them is available elsewhere, for example, details on SGI's ``Trusted IRIX/B'' are available in NSA's Final Evaluation Reports (FERs).

When users log in, their usernames are mapped to integers marking their ``UID'' (for ``user id'') and the ``GID''s (for ``group id'') that they are a member of. UID 0 is a special privileged user (role) traditionally called ``root''; on most Unix-like systems (including Unix) root can overrule most security checks and is used to administrate the system. On some Unix systems, GID 0 is also special and permits unrestricted access to resources at the group level; this isn't true on other systems (such as Linux), but even in those systems group 0 is essentially all-powerful because so many special system files are owned by group 0. Processes are the only ``subjects'' in terms of security (that is, only processes are active objects). Processes can access various data objects, in particular filesystem objects (FSOs), System V Interprocess Communication (IPC) objects, and network ports. Processes can also send signals. Other security-relevant topics include quotas and limits, libraries, auditing, and PAM. The next few subsections detail this.

-----------------------------------------------------------------------------

3.1. Processes

In Unix-like systems, user-level activities are implemented by running processes. Most Unix systems support a ``thread'' as a separate concept; threads share memory inside a process, and the system scheduler actually schedules threads. Linux does this differently (and in my opinion uses a better approach): there is no essential difference between a thread and a process.
Instead, in Linux, when a process creates another process it can choose what resources are shared (e.g., memory can be shared). The Linux kernel then performs optimizations to get thread-level speeds; see clone(2) for more information. It's worth noting that the Linux kernel developers tend to use the word ``task'', not ``thread'' or ``process'', but the external documentation tends to use the word process (so I'll use the term ``process'' here). When programming a multi-threaded application, it's usually better to use one of the standard thread libraries that hide these differences. Not only does this make threading more portable, but some libraries provide an additional level of indirection, by implementing more than one application-level thread as a single operating system thread; this can provide some improved performance on some systems for some applications.

-----------------------------------------------------------------------------

3.1.1. Process Attributes

Here are typical attributes associated with each process in a Unix-like system:

  * RUID, RGID - real UID and GID of the user on whose behalf the process is running

  * EUID, EGID - effective UID and GID used for privilege checks (except for the filesystem)

  * SUID, SGID - saved UID and GID; used to support switching permissions ``on and off'' as discussed below. Not all Unix-like systems support this, but the vast majority do (including Linux and Solaris); if you want to check if a given system implements this option in the POSIX standard, you can use sysconf(2) to determine if _POSIX_SAVED_IDS is in effect.

  * supplemental groups - a list of groups (GIDs) in which this user has membership. In the original version 7 Unix, this didn't exist - processes were only a member of one group at a time, and a special command had to be executed to change that group. BSD added support for a list of groups in each process, which is more flexible, and this addition is now widely implemented (including by Linux and Solaris).

  * umask - a set of bits determining the default access control settings when a new filesystem object is created; see umask(2).

  * scheduling parameters - each process has a scheduling policy, and those with the default policy SCHED_OTHER have the additional parameters nice, priority, and counter.
See sched_setscheduler(2) for more information.

  * limits - per-process resource limits (see below).

  * filesystem root - the process' idea of where the root filesystem (``/'') begins; see chroot(2).

Here are less-common attributes associated with processes:

  * FSUID, FSGID - UID and GID used for filesystem access checks; this is usually equal to the EUID and EGID respectively. This is a Linux-unique attribute.

  * capabilities - POSIX capability information; there are actually three sets of capabilities on a process: the effective, inheritable, and permitted capabilities. See below for more information on POSIX capabilities. Linux kernel version 2.2 and greater support this; some other Unix-like systems do too, but it's not as widespread.

In Linux, if you really need to know exactly what attributes are associated with each process, the most definitive source is the Linux source code, in particular /usr/include/linux/sched.h's definition of task_struct.

-----------------------------------------------------------------------------

3.1.2. POSIX Capabilities

POSIX capabilities are sets of bits that permit splitting of the privileges typically held by root into a larger set of more specific privileges. POSIX capabilities are supported by Linux but they're not universally supported by other Unix-like systems. Linux kernel version 2.2 added support for POSIX capabilities to processes. When Linux documentation (including this one) says ``requires root privilege'', in nearly all cases it really means ``requires a capability'' as documented in the capability documentation. If you need to know the specific capability required, look it up in the capability documentation.

In Linux, the eventual intent is to permit capabilities to be attached to files in the filesystem; as of this writing, however, this is not yet supported. There is support for transferring capabilities, but this is disabled by default. Linux version 2.2.11 added a feature that makes capabilities more directly useful, called the ``capability bounding set''. The capability bounding set is a list of capabilities that are allowed to be held by any process on the system (otherwise, only the special init process can hold it). If a capability does not appear in the bounding set, it may not be exercised by any process, no matter how privileged. This feature can be used to, for example, disable kernel module loading. A sample tool that takes advantage of this is LCAP at [http://pweb.netcom.com/~spoon/lcap/] http://pweb.netcom.com/~spoon/lcap/.

More information about POSIX capabilities is available at [ftp://linux.kernel.org/pub/linux/libs/security/linux-privs] ftp://linux.kernel.org/pub/linux/libs/security/linux-privs.

-----------------------------------------------------------------------------

3.1.3. Process Creation and Manipulation

Processes may be created using fork(2), the non-recommended vfork(2), or the Linux-unique clone(2); all of these system calls duplicate the existing process, creating two processes out of it. The bottom line with vfork(2) is simple: don't use it if you can avoid it; see Section 8.6 for more information. Linux supports the Linux-unique clone(2) call. This call works like fork(2), but allows specification of which resources should be shared (e.g., memory, file descriptors, etc.). Various BSD systems implement an rfork() system call (originally developed in Plan9); it has different semantics but the same general idea (it also creates a process with tighter control over what is shared). Portable programs shouldn't use these calls directly, if possible; as noted earlier, they should instead rely on threading libraries that use such calls to implement threads. This book is not a full tutorial on writing programs, so I will skip widely-available information on handling processes. You can see the documentation for wait(2), exit(2), and so on for more information.
A process can execute a different program by calling execve(2), or various front-ends to it (for example, see exec(3), system(3), and popen(3)). When a program is executed, and its file has its setuid or setgid bit set, the process' EUID or EGID (respectively) is usually set to the file's value. This functionality was the source of an old Unix security weakness when used to support setuid or setgid scripts, due to a race condition. Between the time the kernel opens the file to see which interpreter to run, and when the (now-set-id) interpreter turns around and reopens the file to interpret it, an attacker might change the file (directly or via symbolic links). Different Unix-like systems handle the security issue for setuid scripts in different ways. Some systems, such as Linux, completely ignore the setuid and setgid bits when executing scripts, which is clearly a safe approach. Most modern releases of SysVr4 and BSD 4.4 use a different approach to avoid the kernel race condition. On these systems, when the kernel passes the name of the set-id script to open to the interpreter, rather than using a pathname (which would permit the race condition) it instead passes the filename /dev/fd/3. This is a special file already opened on the script, so that there can be no race condition for attackers to exploit. Even on these systems I recommend against using setuid/setgid shell scripts for secure programs.

In some cases a process can affect the various UID and GID values; see setuid(2), seteuid(2), setreuid(2), and the Linux-unique setfsuid(2). In particular the saved user id (SUID) attribute is there to permit trusted programs to temporarily switch UIDs. Unix-like systems supporting the SUID use the following rules: If the RUID is changed, or the EUID is set to a value not equal to the RUID, the SUID is set to the new EUID. Unprivileged users can set their EUID from their SUID, the RUID to the EUID, and the EUID to the RUID.

The Linux-unique FSUID process attribute is intended to permit programs like the NFS server to limit themselves to only the filesystem rights of some given UID without giving that UID permission to send signals to the process. Whenever the EUID is changed, the FSUID is changed to the new EUID value; the FSUID value can be set separately using setfsuid(2), a Linux-unique call. Note that non-root callers can only set FSUID to the current RUID, EUID, SEUID, or current FSUID values.

-----------------------------------------------------------------------------

3.2. Files

On all Unix-like systems, the primary repository of information is the file tree, rooted at ``/''. The file tree is a hierarchical set of directories, each of which may contain filesystem objects (FSOs). In Linux, filesystem objects (FSOs) may be ordinary files, directories, symbolic links, named pipes (also called first-in first-outs or FIFOs), sockets (see below), character special (device) files, or block special (device) files (in Linux, this list is given in the find(1) command). Other Unix-like systems have an identical or similar list of FSO types. Filesystem objects are collected on filesystems, which can be mounted and unmounted on directories in the file tree. A filesystem type (e.g., ext2 or FAT) is a specific set of conventions for arranging data on the disk to optimize speed, reliability, and so on; many people use the term ``filesystem'' as a synonym for the filesystem type.

-----------------------------------------------------------------------------

3.2.1. Filesystem Object Attributes

Different Unix-like systems support different filesystem types. Filesystems may have slightly different sets of access control attributes and access controls can be affected by options selected at mount time. On Linux, the ext2 filesystem is currently the most popular filesystem, but Linux supports a vast number of other filesystems too.
Most filesystems on Unix-like systems store at least the following:

  * owning UID and GID - identifies the ``owner'' of the filesystem object. Only the owner or root can change the access control attributes unless otherwise noted.

  * permission bits - read, write, execute bits for each of user (owner), group, and other. For ordinary files, read, write, and execute have their typical meanings. In directories, the ``read'' permission is necessary to display a directory's contents, while the ``execute'' permission is sometimes called ``search'' permission and is necessary to actually enter the directory to use its contents. In a directory, ``write'' permission permits adding, removing, and renaming files in that directory. Note that the permission values of symbolic links are never used; it's only the values of their containing directories and the linked-to file that matter.

  * ``sticky'' bit - when set on a directory, unlinks (removes) and renames of files in that directory are limited to the file owner, the directory owner, or root privileges. This is a very common Unix extension and is specified in the Open Group's Single Unix Specification version 2. Old versions of Unix called this the ``save program text'' bit and used this to indicate executable files that should stay in memory. Systems that did this ensured that only root could set this bit (otherwise users could have crashed systems by forcing ``everything'' into memory). In Linux, this bit has no effect on ordinary files and ordinary users can modify this bit on the files they own: Linux's only use of the sticky bit is on directories.

  * setuid, setgid - when set on an executable file, executing the file will set the process' effective UID or effective GID to the value of the file's owning UID or GID (respectively). All Unix-like systems support this.
In Linux and System V systems, when setgid is set on a file that does not have any execute privileges, this indicates a file that is subject to mandatory locking during access (if the filesystem is mounted to support mandatory locking); this overload of meaning surprises many and is not universal across Unix-like systems. In fact, the Open Group's Single Unix Specification version 2 for chmod(3) permits systems to ignore requests to turn on setgid for files that aren't executable if such a setting has no meaning. In Linux and Solaris, when setgid is set on a directory, files created in the directory will have their GID automatically reset to that of the directory's GID. The purpose of this approach is to support ``project directories'': users can save files into such specially-set directories and the group owner automatically changes. However, setting the setgid bit on directories is not specified by standards such as the Single Unix Specification [Open Group 1997].

  * timestamps - access and modification times are stored for each filesystem object. However, the owner is allowed to set these values arbitrarily (see touch(1)), so be careful about trusting this information. All Unix-like systems support this.

The following attributes are Linux-unique extensions on the ext2 filesystem, though many other filesystems have similar functionality:

  * immutable bit - no changes to the filesystem object are allowed; only root can set or clear this bit. This is only supported by ext2 and is not portable across all Unix systems (or even all Linux filesystems).

  * append-only bit - only appending to the filesystem object is allowed; only root can set or clear this bit. This is only supported by ext2 and is not portable across all Unix systems (or even all Linux filesystems).
Many of these values can be influenced at mount time, so that, for example, certain bits can be treated as though they had a certain value (regardless of their values on the media). See mount(1) for more information about this. These bits are useful, but be aware that some of these are intended to simplify ease-of-use and aren't really sufficient to prevent certain actions. For example, on Linux, mounting with ``noexec'' will disable execution of programs on that file system; as noted in the manual, it's intended for mounting filesystems containing binaries for incompatible systems. On Linux, this option won't completely prevent someone from running the files; they can copy the files somewhere else to run them, or even use the command ``/lib/ld-linux.so.2'' to run the file directly.

Some filesystems don't support some of these access control values; again, see mount(1) for how these filesystems are handled. In particular, many Unix-like systems support MS-DOS disks, which by default support very few of these attributes (and there's no standard way to define these attributes). In that case, Unix-like systems emulate the standard attributes (possibly implementing them through special on-disk files), and these attributes are generally influenced by the mount(1) command.

It's important to note that, for adding and removing files, only the permission bits and owner of the file's directory really matter unless the Unix-like system supports more complex schemes (such as POSIX ACLs). Unless the system has other extensions, and stock Linux 2.2 doesn't, a file that has no permissions in its permission bits can still be removed if its containing directory permits it. Also, if an ancestor directory permits its children to be changed by some user or group, then any of that directory's descendants can be replaced by that user or group.

The draft IEEE POSIX standard on security defines a technique for true ACLs that support a list of users and groups with their permissions. Unfortunately, this is not widely supported nor supported in the same way across Unix-like systems.
Stock Linux 2.2, for example, has neither ACLs nor POSIX capability values in the filesystem. It's worth noting that, in Linux, the ext2 filesystem by default reserves a small amount of space for the root user. This is a partial defense against denial-of-service attacks; even if a user fills a disk that is shared with the root user, the root user has a little space left over (e.g., for critical functions). The default is 5% of the filesystem space; see mke2fs(8), in particular its ``-m'' option.

-----------------------------------------------------------------------------

3.2.2. Creation Time Initial Values

At creation time, the following rules apply. On most Unix systems, when a new filesystem object is created via creat(2) or open(2), the FSO UID is set to the process' EUID and the FSO's GID is set to the process' EGID. Linux works slightly differently due to its FSUID extensions; the FSO's UID is set to the process' FSUID, and the FSO GID is set to the process' FSGID; if the containing directory's setgid bit is set or the filesystem's ``GRPID'' flag is set, the FSO GID is actually set to the GID of the containing directory. Many systems, including Sun Solaris and Linux, also support the setgid directory extensions. As noted earlier, this special case supports ``project'' directories: to make a ``project'' directory, create a special group for the project, create a directory for the project owned by that group, then make the directory setgid: files placed there are automatically owned by the project. Similarly, if a new subdirectory is created inside a directory with the setgid bit set (and the filesystem GRPID isn't set), the new subdirectory will also have its setgid bit set (so that project subdirectories will ``do the right thing''); in all other cases the setgid is clear for a new file. This is the rationale for the ``user-private group'' scheme (used by Red Hat Linux and Debian). In this scheme, every user is a member of a ``private'' group with just themselves as members, so their defaults can permit the group to read and write any file (since they're the only member of the group). Thus, when a file's group membership is transferred this way, read and write privileges are transferred too. FSO basic access control values (read, write, execute) are computed from (requested values & ~ umask of process).
New files always start with a clear sticky bit and clear setuid bit.

-----------------------------------------------------------------------------

3.2.3. Changing Access Control Attributes

You can set most of these values with chmod(2), fchmod(2), or chmod(1), but see also chown(1) and chgrp(1). In Linux, some of the Linux-unique attributes are manipulated using chattr(1). Note that in Linux, only root can change the owner of a given file. Some Unix-like systems allow ordinary users to transfer ownership of their files to another, but this causes complications and is forbidden by Linux. For example, if you're trying to limit disk usage, allowing such operations would allow users to claim that large files actually belonged to some other ``victim''.

-----------------------------------------------------------------------------

3.2.4. Using Access Control Attributes

Under Linux and most Unix-like systems, reading and writing attribute values are only checked when the file is opened; they are not re-checked on every read or write. Still, a large number of calls do check these attributes, since the filesystem is so central to Unix-like systems. Calls that check these attributes include open(2), creat(2), link(2), unlink(2), rename(2), mknod(2), symlink(2), and socket(2).

-----------------------------------------------------------------------------

3.2.5. Filesystem Hierarchy

Over the years conventions have been built on ``what files to place where''. Where possible, please follow conventional use when placing information in the hierarchy. For example, place global configuration information in /etc. The Filesystem Hierarchy Standard (FHS) tries to define these conventions in a logical manner, and is widely used by Linux systems. The FHS is an update to the previous Linux Filesystem Structure standard (FSSTND), incorporating lessons learned and approaches from Linux, BSD, and System V systems. See [http://www.pathname.com/fhs] http://www.pathname.com/fhs for more information about the FHS. A summary of these conventions is in hier(5) for Linux and hier(7) for Solaris. Sometimes different conventions disagree; where possible, make these situations configurable at compile or installation time.

I should note that the FHS has been adopted by the [http://www.linuxbase.org] Linux Standard Base, which is developing and promoting a set of standards to increase compatibility among Linux distributions and to enable software applications to run on any compliant Linux system.

-----------------------------------------------------------------------------

3.3. System V IPC

Many Unix-like systems, including Linux and System V systems, support System V interprocess communication (IPC) objects. Indeed System V IPC is required by the Open Group's Single UNIX Specification, Version 2 [Open Group 1997]. System V IPC objects can be one of three kinds: System V message queues, semaphore sets, and shared memory segments. Each such object has the following attributes:

  * read and write permissions for each of creator, creator group, and others.

  * creator UID and GID - UID and GID of the creator of the object.

  * owning UID and GID - UID and GID of the owner of the object (initially equal to the creator UID).

When accessing such objects, the rules are as follows:

  * if the process has root privileges, the access is granted.

  * if the process' EUID is the owner or creator UID of the object, then the appropriate creator permission bit is checked to see if access is granted.

  * if the process' EGID is the owner or creator GID of the object, or one of the process' groups is the owning or creating GID of the object, then the appropriate creator group permission bit is checked for access.

  * otherwise, the appropriate ``other'' permission bit is checked for access.

Note that root, or a process with the EUID of either the owner or creator, can set the owning UID and owning GID and/or remove the object.
More information is available in ipc(5).

-----------------------------------------------------------------------------

3.4. Sockets and Network Connections

Sockets are used for communication, particularly over a network. Sockets were originally developed by the BSD branch of Unix systems, but they are generally portable to other Unix-like systems: Linux and System V variants support sockets as well, and socket support is required by the Open Group's Single Unix Specification [Open Group 1997]. System V systems traditionally used a different (incompatible) network communication interface, but it's worth noting that systems like Solaris include support for sockets. Socket(2) creates an endpoint for communication and returns a descriptor, in a manner similar to open(2) for files. The parameters for socket specify the protocol family and type, such as the Internet domain (TCP/IP version 4), Novell's IPX, or the ``Unix domain''. A server then typically calls bind(2), listen(2), and accept(2) or select(2). A client typically calls bind(2) (though that may be omitted) and connect(2). See these routines' respective man pages for more information. It can be difficult to understand how to use sockets from their man pages; you might want to consult other papers such as Hall "Beej" [1999] to learn how these calls are used together.

The ``Unix domain sockets'' don't actually represent a network protocol; they can only connect to sockets on the same machine (at the time of this writing for the standard Linux kernel). When used as a stream, they are fairly similar to named pipes, but with significant advantages. In particular, a Unix domain socket is connection-oriented; each new connection to the socket results in a new communication channel, a very different situation than with named pipes.
Because of this property, Unix domain sockets are often used instead of named pipes to implement IPC for many important services. Just like you can have unnamed pipes, you can have unnamed Unix domain sockets using socketpair(2); unnamed Unix domain sockets are useful for IPC in a way similar to unnamed pipes.

There are several interesting security implications of Unix domain sockets. First, although Unix domain sockets can appear in the filesystem and can have stat(2) applied to them, you can't use open(2) to open them (you have to use the socket(2) and friends interface). Second, Unix domain sockets can be used to pass file descriptors between processes (not just the file's contents). This odd capability, not available in any other IPC mechanism, has been used to hack all sorts of schemes (the descriptors can basically be used as a limited version of the ``capability'' in the computer science sense of the term). File descriptors are sent using sendmsg(2), where the msg (message)'s msg_control field points to an array of control message headers (msg_controllen must specify the number of bytes contained in the array). Each control message is a struct cmsghdr followed by data, and for this purpose you want the cmsg_type set to SCM_RIGHTS. A file descriptor is retrieved through recvmsg(2) and then tracked down in the analogous way. Frankly, this feature is quite baroque, but it's worth knowing about.

Linux 2.2 and later supports an additional feature in Unix domain sockets: you can acquire the peer's ``credentials'' (the pid, uid, and gid).
Here's some sample code to retrieve the peer's credentials:

 /* fd = file descriptor of Unix domain socket connected
    to the client you wish to identify */

 struct ucred cr;
 int cl = sizeof(cr);

 if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cr, &cl) == 0) {
   printf("Peer's pid=%d, uid=%d, gid=%d\n",
           cr.pid, cr.uid, cr.gid);
 }

Standard Unix convention is that binding to TCP and UDP local port numbers less than 1024 requires root privilege, while any process can bind to an unbound port number of 1024 or greater. Linux follows this convention; more specifically, Linux requires a process to have the capability CAP_NET_BIND_SERVICE to bind to a port number less than 1024; this capability is normally only held by processes with an EUID of 0. The adventurous can check this by examining the Linux source; in Linux 2.2.12, it's in file /usr/src/linux/net/ipv4/af_inet.c, function inet_bind().

-----------------------------------------------------------------------------

3.5. Signals

Signals are a simple form of ``interruption'' in the Unix-like OS world, and are an ancient part of Unix. A process can set a ``signal'' on another process (say using kill(1) or kill(2)), and that other process would receive and handle the signal asynchronously. For a process to have permission to send an arbitrary signal to some other process, the sending process must either have root privileges, or the real or effective user ID of the sending process must equal the real or saved set-user-ID of the receiving process. However, some signals can be sent in other ways. In particular, SIGURG can be delivered over a network through the TCP/IP out-of-band (OOB) message.

Although signals are an ancient part of Unix, they've had different semantics in different implementations. Basically, they involve questions such as ``what happens when a signal occurs while handling another signal''? The older Linux libc 5 used a different set of semantics for some signal operations than the newer GNU libc libraries.
Calling C library functions is often unsafe within a signal handler, and even some system calls aren't safe; you need to examine the documentation for each call you make to see if it promises to be safe to call inside a signal. For more information, see the glibc FAQ (on some systems a local copy is available at /usr/doc/glibc-*/FAQ).

For new programs, just use the POSIX signal system (which in turn was based on BSD work); this set is widely supported and doesn't have some of the problems that some of the older signal systems did. The POSIX signal system is based on using the sigset_t datatype, which can be manipulated through a set of operations: sigemptyset(), sigfillset(), sigaddset(), sigdelset(), and sigismember(). You can read about these in sigsetops(3). Then use sigaction(2), sigprocmask(2), sigpending(2), and sigsuspend(2) to set up and manipulate signal handling (see their man pages for more information). Also, examine your signal handlers carefully for race conditions. Signals, since they are by nature asynchronous, can easily cause race conditions.

A common convention exists for servers: if you receive SIGHUP, you should close any log files, reopen and reread configuration files, and then re-open the log files. This supports reconfiguration without halting the server and supports log rotation without data loss. If you are writing a server where this convention makes sense, please support it.

Michal Zalewski [2001] has written an excellent tutorial on how signal handlers are exploited, and has recommendations for how to eliminate signal race problems. I encourage looking at his summary for more information; here are my recommendations, which are similar to Michal's work:

  * Where possible, have your signal handlers unconditionally set a specific flag and do nothing else.
  * If you must have more complex signal handlers, use only calls specifically designated as being safe for use in signal handlers. In particular, don't use malloc() or free() in C (which on most systems aren't protected against signals), nor the many functions that depend on them (such as the printf() family and syslog()). You could try to ``wrap'' calls to insecure library calls with a check to a global flag (to avoid re-entry), but I wouldn't recommend it.

  * Block signal delivery during all non-atomic operations in the program, and block signal delivery inside signal handlers.

-----------------------------------------------------------------------------

3.6. Quotas and Limits

Many Unix-like systems have mechanisms to support filesystem quotas and process resource limits. This certainly includes Linux. These mechanisms are particularly useful for preventing denial of service attacks; by limiting the resources available to each user, you can make it hard for a single user to use up all the system resources. Be careful with terminology here, because both filesystem quotas and process resource limits have ``hard'' and ``soft'' limits, but the terms mean slightly different things.

You can define storage (filesystem) quota limits on each mountpoint for the number of blocks of storage and/or the number of unique files (inodes) that can be used, and you can set such limits for a given user or a given group. A ``hard'' quota limit is a never-to-exceed limit, while a ``soft'' quota can be temporarily exceeded. See quota(1), quotactl(2), and quotaon(8).

The rlimit mechanism supports a large number of process quotas, such as file size, number of child processes, number of open files, and so on. There is a ``soft'' limit (also called the current limit) and a ``hard limit'' (also called the upper limit). The soft limit cannot be exceeded at any time, but through calls it can be raised up to the value of the hard limit. See getrlimit(2), setrlimit(2), and getrusage(2), as well as sysconf(3) and ulimit(1). Note that there are several ways to set these limits, including the PAM module pam_limits.
-----------------------------------------------------------------------------

3.7. Dynamically Linked Libraries

Practically all programs depend on libraries to execute. In most modern Unix-like systems, including Linux, programs are by default compiled to use dynamically linked libraries (DLLs). That way, you can update a library and all the programs using that library will use the new (hopefully improved) version if they can.

Dynamically linked libraries are typically placed in one of a few special directories. The usual directories include /lib, /usr/lib, /lib/security for PAM modules, /usr/X11R6/lib for X-windows, and /usr/local/lib. You should use these standard conventions in your programs; in particular, except during debugging you shouldn't use the current directory as a source for dynamically linked libraries (an attacker may be able to add their own choice of ``library'' values).

There are special conventions for naming libraries and having symbolic links for them, with the result that you can update libraries and still support programs that want to use old, non-backward-compatible versions of those libraries. Doing so also makes it possible to override specific functions in a library when executing a particular program. This is a real advantage of Unix-like systems over Windows-like systems; I believe Unix-like systems have a much better system for handling library updates, one reason that Unix and Linux systems are reputed to be more stable than Windows-based systems.

On GNU glibc-based systems, including all Linux systems, the list of directories automatically searched during program start-up is stored in the file /etc/ld.so.conf. Many Red Hat-derived distributions don't normally include /usr/local/lib in the file /etc/ld.so.conf. I consider this a bug, and adding /usr/local/lib to /etc/ld.so.conf is a common ``fix'' required to run many programs on Red Hat-derived systems.
If you want to just override a few functions in a library, but keep the rest of the library, you can enter the names of overriding libraries (.o files) in /etc/ld.so.preload; these ``preloading'' libraries will take precedence over the standard set. This preloading file is typically used for emergency patches; a distribution usually won't include such a file when delivered.

Searching all of these directories at program start-up would be too time-consuming, so a caching arrangement is actually used. The program ldconfig(8) by default reads in the file /etc/ld.so.conf, sets up the appropriate symbolic links in the dynamic link directories (so they'll follow the standard conventions), and then writes a cache to /etc/ld.so.cache that's then used by other programs. So, ldconfig has to be run whenever a DLL is added, when a DLL is removed, or when the set of DLL directories changes; running ldconfig is often one of the steps performed by package managers when installing a library. On start-up, then, a program uses the dynamic loader to read the file /etc/ld.so.cache and then load the libraries it needs.

Various environment variables can control this process, and in fact there are environment variables that permit you to override this process (so, for example, you can temporarily substitute a different library for this particular execution). In Linux, the environment variable LD_LIBRARY_PATH is a colon-separated set of directories where libraries are searched for first, before the standard set of directories; this is useful when debugging a new library or using a nonstandard library for special purposes, but be sure you trust those who can control those directories. The variable LD_PRELOAD lists object files with functions that override the standard set, just as /etc/ld.so.preload does. If the environment variable LD_DEBUG is set to ``all'', voluminous information about the dynamic linking process is displayed while it's occurring.
Permitting user control over dynamically linked libraries would be disastrous for setuid/setgid programs if special measures weren't taken. Therefore, in the GNU glibc implementation, if the program is setuid or setgid these variables (and other similar variables) are ignored or greatly limited in what they can do. The GNU glibc library determines if a program is setuid or setgid by checking the program's credentials; if the UID and EUID differ, or the GID and the EGID differ, the library presumes the program is setuid/setgid (or descended from one) and therefore greatly limits its abilities to control linking. If you look at the GNU glibc library source code, you can see this; see especially the files elf/rtld.c and sysdeps/generic/dl-sysdep.c. This means that if you cause the UID and GID to equal the EUID and EGID, and then call a program, these variables will have full effect. Other Unix-like systems handle the situation differently but for the same reason: a setuid/setgid program should not be unduly affected by the environment variables set. Note that graphical user interface toolkits generally do permit user control over dynamically linked libraries, because executables that directly invoke graphical user interface toolkits should never, ever, be setuid (or have other special privileges) at all. For more about how to develop secure GUI applications, see Section 7.4.4.

For Linux systems, you can get more information from my document, the Program Library HOWTO.

-----------------------------------------------------------------------------

3.8. Audit

Different Unix-like systems handle auditing differently. In Linux, the most common ``audit'' mechanism is syslogd(8), usually working in conjunction with klogd(8). You might also want to look at wtmp(5), utmp(5), lastlog(8), and acct(2). Some server programs (such as the Apache web server) also have their own audit trail mechanisms. According to the FHS, audit logs should be stored in /var/log or its subdirectories.
-----------------------------------------------------------------------------

3.9. PAM

Sun Solaris and nearly all Linux systems use the Pluggable Authentication Modules (PAM) system, which permits run-time configuration of authentication methods (e.g., use of passwords, smart cards, etc.). See Section 11.6 for more information on using PAM.

-----------------------------------------------------------------------------

3.10. Specialized Security Extensions for Unix-like Systems

A vast amount of research and development has gone into extending Unix-like systems to support security needs of various communities. For example, several Unix-like systems have been extended to support the U.S. military's desire for multilevel security. If you're developing software, you should try to design your software so that it can work within these extensions.

FreeBSD has a new system call, [http://docs.freebsd.org/44doc/papers/jail/jail.html] jail(2). The jail system call supports sub-partitioning an environment into many virtual machines (in a sense, a ``super-chroot''); its most popular use has been to provide virtual machine services for Internet Service Provider environments. Inside a jail, all processes (even those owned by root) have the scope of their requests limited to the jail. When a FreeBSD system is booted up after a fresh install, no processes will be in jail. When a process is placed in a jail, it, and any descendants of that process created afterwards, will be in that jail. Once in a jail, access to the file name-space is restricted in the style of chroot(2) (with typical chroot escape routes blocked), the ability to bind network resources is limited to a specific IP address, the ability to manipulate system resources and perform privileged operations is sharply curtailed, and the ability to interact with other processes is limited to only processes inside the same jail. Note that each jail is bound to a single IP address; processes within the jail may not make use of any other IP address for outgoing or incoming connections.
Some extensions available in Linux, such as POSIX capabilities and special mount-time options, have already been discussed. Here are a few of the efforts for Linux systems for creating restricted execution environments; there are many different approaches. The U.S. National Security Agency (NSA) has developed [http://www.nsa.gov/selinux] Security-Enhanced Linux (Flask), which supports defining a security policy in a specialized language and then enforces that policy. The [http://medusa.fornax.sk] Medusa DS9 extends Linux by supporting, at the kernel level, a user-space authorization server. [http://www.lids.org] LIDS protects files and processes, allowing administrators to ``lock down'' their system. The ``Rule Set Based Access Control'' system, [http://www.rsbac.de] RSBAC, is based on the Generalized Framework for Access Control (GFAC) by Abrams and LaPadula and provides a flexible system of access control based on several kernel modules. Subterfugue is a framework for ``observing and playing with the reality of software''; it can intercept system calls and change their parameters and/or change their return values to implement sandboxes, tracers, and so on; it runs under Linux 2.4 with no changes (it doesn't require any kernel modifications). [http://www.cs.berkeley.edu/~daw/janus] Janus is a security tool for sandboxing untrusted applications within a restricted execution environment. Some have even used [http://user-mode-linux.sourceforge.net] User-mode Linux, which implements ``Linux on Linux'', as a sandbox implementation. Because there are so many different approaches to implementing more sophisticated security models, Linus Torvalds has requested that a generic approach be developed so different security policies can be inserted; for more information about this, see [http://mail.wirex.com/mailman/listinfo/linux-security-module] http://mail.wirex.com/mailman/listinfo/linux-security-module.

There are many other extensions for security on various Unix-like systems, but these are really outside the scope of this document.
-----------------------------------------------------------------------------

Chapter 4. Security Requirements

  You will know that your tent is secure; you will take stock of your property and find nothing missing.

  Job 5:24 (NIV)

Before you can determine if a program is secure, you need to determine exactly what its security requirements are. Thankfully, there's an international standard for identifying and defining security requirements that is useful for many such circumstances: the Common Criteria [CC 1999], standardized as ISO/IEC 15408:1999. The CC is the culmination of decades of work to identify information technology security requirements. There are other schemes for defining security requirements and evaluating products to see if products meet the requirements, such as NIST FIPS-140 for cryptographic equipment, but these other schemes are generally focused on a specialized area and won't be considered further here.

This chapter briefly describes the Common Criteria (CC) and how to use its concepts to help you informally identify security requirements and talk with others about security requirements using standard terminology. The language of the CC is more precise, but it's also more formal and harder to understand.

Note that, in some circumstances, software cannot be used unless it has undergone a CC evaluation by an accredited laboratory. This includes certain kinds of uses in the U.S. Department of Defense (as specified by NSTISSP Number 11, which requires that before some products can be used they must be evaluated or enter evaluation), and this is likely to include some kinds of uses for software in the U.S. federal government. This section doesn't provide enough information if you plan to actually go through a CC evaluation by an accredited laboratory. If you plan to go through a formal evaluation, you need to read the real CC, examine various websites to really understand the basics of the CC, and eventually contract a lab accredited to do a CC evaluation.
-----------------------------------------------------------------------------

4.1. Common Criteria Introduction

First, some general information about the CC will help in understanding how to use it. The CC's official name is ``The Common Criteria for Information Technology Security Evaluation'', though it's normally just called the Common Criteria. The CC document has three parts: the introduction (that describes the CC overall), security functional requirements (that lists various kinds of security functions that products might want to include), and security assurance requirements (that lists various methods of assuring that a product is secure). There is also a related document, the ``Common Evaluation Methodology'' (CEM), that guides evaluators how to apply the CC when doing formal evaluations (in particular, it amplifies what the CC means in certain cases).

Although the CC is International Standard ISO/IEC 15408:1999, it is outrageously expensive to order the CC from ISO. Hopefully someday ISO will follow the lead of other standards organizations such as the IETF and the W3C, which freely redistribute standards. Not surprisingly, IETF and W3C standards are followed more often than many ISO standards, in part because ISO's fees for standards simply make them inaccessible to most developers. (I don't mind authors being paid for their work, but ISO doesn't fund most of the standards development work - indeed, many of the developers of ISO documents are volunteers - so ISO's indefensible fees only line their own pockets and don't actually aid the authors or users at all.) Thankfully, the CC developers anticipated this problem and have made sure that the CC's technical content is freely available to all; you can download the CC's technical content from [http://csrc.nist.gov/cc/ccv20/ccv2list.htm] http://csrc.nist.gov/cc/ccv20/ccv2list.htm Even those doing formal evaluation processes usually use these editions of the CC, and not the ISO versions; there's simply no good reason to pay ISO for them.
Although it can be used in other ways, the CC is typically used to create two kinds of documents, a ``Protection Profile'' (PP) or a ``Security Target'' (ST). A ``protection profile'' (PP) is a document created by a group of users (for example, a consumer group or large organization) that identifies the desired security properties of a product. Basically, a PP is a list of user security requirements, described in a very specific way defined by the CC. If you're building a product similar to existing ones, it's quite possible that there are one or more PPs that define what some users believe are necessary for that kind of product (e.g., an operating system or firewall). A ``security target'' (ST) is a document that identifies what a product actually does, or a subset of it, that is security-relevant. An ST doesn't need to meet the requirements of any particular PP, but an ST could meet the requirements of one or more PPs.

Both PPs and STs can go through a formal evaluation. An evaluation of a PP simply ensures that the PP meets various documentation rules and sanity checks. An ST evaluation involves not just examining the ST document, but more importantly it involves evaluating an actual system (called the ``target of evaluation'', or TOE). The purpose of an ST evaluation is to ensure that, to the level of the assurance requirements specified by the ST, the actual product (the TOE) meets the ST's security functional requirements. Customers can then compare evaluated STs to PPs describing what they want. Through this comparison, consumers can determine if the products meet their requirements - and if not, where the limitations are.

To create a PP or ST, you go through a process of identifying the security environment, namely, your assumptions, threats, and relevant organizational security policies. You then identify the security objectives for the product or product type. Finally, the security requirements are selected so that they meet the objectives.
There are two kinds of security requirements: functional requirements (what a product has to be able to do), and assurance requirements (measures to inspire confidence that the requirements are actually met). In practice, the process of creating these documents isn't a simple straight line as outlined here, but the final result needs to show a clear relationship so that no critical point is easily overlooked. Even if you don't plan to write an ST or PP, the ideas in the CC can still be helpful; the process of identifying the security environment, objectives, and requirements is still helpful in identifying what's really important.

Most of the CC's text is devoted to defining standardized functional requirements and assurance requirements. In essence, the majority of the CC is a ``chinese menu'' of possible security requirements that someone might want. PP authors pick from the various options to describe what they want, and ST authors pick from the options to describe what their product provides. Few people pick and choose individual assurance requirements, so pre-created sets of assurance requirements called ``evaluation assurance levels'' (EALs) have been defined, ranging from 1 to 7. EAL 2 is simply a standard shorthand for the set of assurance requirements defined for EAL 2. Products can add additional assurance measures; for example, they might choose EAL 2 plus some additional assurance measures (if the combination isn't enough to achieve a higher EAL level, the combination would be called "EAL 2 plus"). There are mutual recognition agreements signed between many of the world's nations that will accept an evaluation done by an accredited laboratory in the other countries as long as all of the assurance measures taken were at the EAL 4 level or less.

If you want to actually create a PP or ST, there's an open source software program that can help you, called the ``CC Toolbox''. It can make sure that dependencies between requirements are met and help you quickly develop a document, but it obviously can't do your thinking for you. The specification of exactly what information must be in a PP or ST are in CC part 1, annexes B and C respectively.

If you do decide to have your product (or PP) evaluated by an accredited laboratory, be prepared to spend money, spend time, and work throughout the process.
In particular, evaluations require paying an accredited lab to do the evaluation, and higher levels of assurance become rapidly more expensive. Evaluators will require evidence to justify any claims made. Thus, evaluations require documentation, and usually the documentation has to be improved or developed to meet CC requirements (especially at the higher assurance levels). Every claim has to be justified to some level of confidence, so the more claims made, the stronger the claims, and the more complicated the design, the more expensive an evaluation is. Obviously, when flaws are found, they will usually need to be fixed. Note that a laboratory is paid to evaluate a product and determine the truth. If the product doesn't meet its claims, then you basically have two choices: fix the product, or change (reduce) the claims.

It's important to discuss with customers what's desired before beginning a formal ST evaluation; an ST that includes functional or assurance requirements not truly needed by customers will be unnecessarily expensive to evaluate, and an ST that omits necessary requirements may not be acceptable to the customers. Evaluated PPs identify such requirements, but make sure that the PP accurately reflects the customer's real requirements (perhaps the customer only wants a part of the PP's requirements, has a different environment in mind, or wants something else instead for the situations where your product will be used). Note that an ST need not include every security feature in a product; an ST only states what will be (or has been) evaluated. A product that has a higher EAL rating is not necessarily more secure than a similar product with a lower rating or no rating; the environment might be different, the evaluation may have saved money and time by not evaluating the other product at a higher level, or perhaps the evaluation missed something important. Evaluations are not proofs; they simply impose a defined minimum bar to gain confidence in the requirements or product.
-----------------------------------------------------------------------------

4.2. Security Environment and Objectives

The first step in defining a PP or ST is to identify the ``security environment''. This means that you have to consider the physical environment (can attackers access the computer hardware?), the assets requiring protection, and the purpose of the TOE (what kind of product is it? what is the intended use?).

In developing a PP or ST, you'd end up with a statement of assumptions (who is trusted? is the network or platform benign?), threats (that the system or its environment must counter), and organizational security policies (that the system or its environment must meet). A threat is characterized in terms of a threat agent (who might perform the attack?), a presumed attack method, any vulnerabilities that are the basis for the attack, and what asset is under attack.

You'd then define a set of security objectives for the system and environment, and show that those objectives counter the threats and satisfy the policies. Even if you aren't creating a PP or ST, thinking about your assumptions, threats, and possible policies can help you avoid foolish decisions. For example, if the computer network you're using can be sniffed (e.g., the Internet), then unencrypted passwords are a foolish idea in most circumstances.

For the CC, you'd then identify the functional and assurance requirements that would be met by the TOE, and which ones would be met by the environment, to meet those security objectives. These requirements would be selected from the ``chinese menu'' of the CC's possible requirements, and the next sections will briefly describe the major classes of requirements. In the CC, requirements are grouped into classes, which are subdivided into families, which are further subdivided into components; the details of all this are in the CC itself if you need to know about this. A good diagram showing how this works is in the CC part 1, figure 4.5, which I cannot reproduce here.
Again, if you're not intending for your product to undergo a CC evaluation, it's still good to briefly determine this kind of information and informally write it down, including that information in your documentation (e.g., the man page or whatever your documentation is).

-----------------------------------------------------------------------------

4.3. Security Functionality Requirements

This section briefly describes the CC security functionality requirements (by CC class), primarily to give you an idea of the kinds of security requirements you might want in your software. If you want more detail about the CC's requirements, see CC part 2. Here are the major classes of CC security requirements, along with the 3-letter CC abbreviation for that class:

  * Security Audit (FAU). Perhaps you'll need to recognize, record, store, and analyze security-relevant activities. You'll need to identify what you want to make auditable, since often you can't leave all possible auditing capabilities enabled. Also, consider what to do when there's no room left for auditing - if you stop the system, an attacker may intentionally do things to be logged and thus stop the system.

  * Communication/Non-repudiation (FCO). This class is poorly named in the CC; officially it's called communication, but the real meaning is non-repudiation. Is it important that an originator cannot deny having sent a message, or that a recipient cannot deny having received it? There are limits to how well technology itself can support non-repudiation (e.g., a user might be able to give their private key away ahead of time if they wanted to be able to repudiate something later), but nevertheless for some applications supporting non-repudiation capabilities is very useful.

  * Cryptographic Support (FCS). If you're using cryptography, what operations use cryptography, what algorithms and key sizes are you using, and how are you managing their keys (including distribution and destruction)?

  * User Data Protection (FDP).
This class specifies requirements for protecting user data, and is a big class in the CC with many families inside it. The basic idea is that you should specify a policy for data (access control or information flow rules), develop various means to implement the policy, possibly support off-line storage, import, and export, and provide integrity when transferring user data between TOEs. One often-forgotten issue is residual information protection - is it acceptable if an attacker can later recover ``deleted'' data?

  * Identification and authentication (FIA). Generally you don't just want a user to report who they are (identification) - you need to verify their identity, a process called authentication. Passwords are the most common mechanism for authentication. It's often useful to limit the number of authentication attempts (if you can) and limit the feedback during authentication (e.g., displaying asterisks instead of the actual password). Certainly, limit what a user can do before authenticating; in many cases, don't let the user do anything without authenticating. There may be many issues controlling when a session can start, but in the CC world this is handled by the "TOE access" (FTA) class described below instead.

  * Security Management (FMT). Many systems will require some sort of management operations, generally performed by those given a more trusted role (e.g., administrator). Be sure you think through what those special operations are, and ensure that only those with the trusted roles can invoke them. You want to limit trust; ideally, even trusted roles should only be able to perform the operations they actually need.

  * Privacy (FPR). Do you need to support anonymity, pseudonymity, unlinkability, or unobservability? If so, are there conditions where you want or don't want these (e.g., should an administrator be able to determine the real identity of someone hiding behind a pseudonym?). Note that these can seriously conflict with non-repudiation, if you want those too. If you're worried about sophisticated threats, these functions can be hard to provide.

  * Protection of the TOE Security Functions/Self-protection (FPT).
Clearly, if the TOE can be subverted, any security functions it provides aren't worthwhile, and in many cases a TOE has to provide at least some self-protection. Perhaps you should "test the underlying abstract machine" - i.e., test that the underlying components meet your assumptions, or have the product run self-tests (say during start-up, periodically, or on request). You should probably "fail secure", at least under certain conditions; determine what those conditions are. Consider physical protection of the TOE. You may want some sort of secure recovery function after a failure. It's often useful to have replay detection (detect when an attacker is trying to replay older actions) and counter it. Usually a TOE must make sure that any access checks are always invoked and cannot be bypassed.

  * Resource Utilization (FRU). Perhaps you need to provide fault tolerance, a priority of service scheme, or support resource allocation (such as a quota system).

  * TOE Access (FTA). There may be many issues controlling sessions. Perhaps there should be a limit on the number of concurrent sessions (if you're running a web service, would it make sense for the same user to be logged in simultaneously from two different machines?). Perhaps you should lock or terminate an idle session, or let users initiate a session lock. You might want to include a standard warning banner. One surprisingly useful piece of information is displaying, on login, information about the last session (e.g., the date/time and location of the last login) and the date/time of the last unsuccessful attempt - this gives users information that can help them detect interlopers. Perhaps sessions can only be established based on other criteria (e.g., perhaps you can only use the program during business hours).

  * Trusted path/channels (FTP). A common trick used by attackers is to make the screen appear to be something it isn't, e.g., run an ordinary program that looks like a login screen or a forged web site. Thus, perhaps there needs to be a "trusted path" - a way that users can ensure that they are talking to the "real" program.
-----------------------------------------------------------------------------

4.4. Security Assurance Measure Requirements

As noted above, the CC has a set of possible assurance requirements that can be selected, and several predefined sets of assurance requirements (EAL levels 1 through 7). Again, if you're actually going to go through a CC evaluation, you should examine the CC documents; I'll skip describing the measures involving reviewing official CC documents (evaluating PPs and STs). Here are some assurance measures that can increase the confidence others have in your software:

  * Configuration management (ACM). At least, have unique version identifiers for each release of your software. You gain more assurance if you have good automated tools to control your software, and have separate version identifiers for each piece (typical CM tools like CVS can do this, although CVS doesn't record changes as atomic changes, which is a weakness of it). The more that's under configuration management, the better; also control documentation, track all problem reports (especially security-related ones), and all development tools.

  * Delivery and operation (ADO). Your delivery mechanism should ideally let users detect unauthorized modifications to prevent someone else masquerading as the developer, and even better, prevent modification in the first place. You should provide documentation on how to securely install, generate, and start-up the TOE, possibly generating a log describing how the TOE was generated.

  * Development (ADV). These CC requirements deal with documentation describing the TOE implementation, and the need for consistency between its different representations (e.g., the functional specification, high-level design, low-level design, and code).

  * Guidance documents (AGD). Users and administrators of your product will probably need some sort of guidance to help them use it correctly and securely. It doesn't need to be on paper; on-line help and "wizards" can help too.
The guidance should include warnings about actions that may be a problem in a secure environment.

  * Life-cycle support (ALC). This includes development security (protecting the systems and tools used to develop your software), a flaw remediation process (to track and correct all security flaws), and selecting development tools wisely.

  * Tests (ATE). Simply testing can help, but remember that you need to test the security functions and not just general functions. You should check that if something is set to permit, it's permitted, and if it's forbidden, it is no longer permitted. Of course, there may be clever ways to subvert this, which is what vulnerability assessment is all about (described next).

  * Vulnerability Assessment (AVA). Doing a vulnerability analysis is useful, where someone pretends to be an attacker and tries to find vulnerabilities in the product using the available information, including documentation (look for "don't do X" statements and see if an attacker could exploit them) and publicly known past vulnerabilities of this or similar products. This book describes various ways of countering known vulnerabilities of previous products, including problems such as replay attacks (where known-good information is stored and retransmitted), buffer overflow attacks, race conditions, and other issues that the rest of this book describes. The user and administrator guidance documents should be examined to ensure that misleading, unreasonable, or conflicting guidance has been removed, and that security procedures for all modes of operation have been addressed. Specialized systems may need to worry about covert channels; read the CC if you wish to learn more about covert channels.

  * Maintenance of assurance (AMA). If you're not going through a CC evaluation, you don't need a formal AMA process, but all software undergoes change. What is your process to give all your users strong confidence that future changes to your software will not create new vulnerabilities? For example, you could establish a process where multiple people review any proposed changes.
-----------------------------------------------------------------------------

Chapter 5. Validate All Input

  Wisdom will save you from the ways of wicked men, from men whose words are perverse...

  Proverbs 2:12 (NIV)

Some inputs are from untrustable users, so those inputs must be validated (filtered) before being used. You should determine what is legal and reject anything that does not match that definition. Do not do the reverse (identify what is illegal and write code to reject those cases), because you are likely to forget to handle an important case of illegal input.

There is a good reason for identifying ``illegal'' values, though, and that's as a set of tests (usually just executed in your head) to be sure that your validation code is thorough. When I set up an input filter, I mentally attack the filter to see if there are illegal values that could get through. Depending on the input, here are a few examples of common ``illegal'' values that your input filters may need to prevent: the empty string, ".", "..", "../", anything starting with "/" or ".", anything with "/" or "&" inside it, any control characters (especially NIL and newline), and/or any characters with the ``high bit'' set (especially values decimal 254 and 255, and character 133, the Unicode Next Line (NEL) character used by OS/390). Again, your code should not be checking for ``bad'' values; you should do this check only to be sure that your pattern of legal values is sufficiently narrow. If your pattern isn't sufficiently narrow, you need to carefully re-examine the pattern to see if there are other problems.

Limit the maximum character length (and minimum length if appropriate), and be sure to not lose control when such lengths are exceeded (see Chapter 6 for more about buffer overflows).

Here are a few common data types, and things you should validate before using them from an untrusted user:

  * For strings, identify the legal characters or legal patterns (e.g., as a regular expression) and reject anything not matching that form.
There are special problems when strings contain control characters (especially linefeed or NIL) or metacharacters (especially shell metacharacters); it is often best to ``escape'' such metacharacters immediately when the input is received so that such characters are not accidentally sent. CERT goes further and recommends escaping all characters that aren't in a list of characters not needing escaping [CERT 1998, CMU 1998]. See Section 8.3 for more information on metacharacters. Note that [http://www.w3.org/TR/2001/NOTE-newline-20010314] line ending encodings vary on different computers: Unix-based systems use character 0x0a (linefeed), CP/M and DOS based systems (including Windows) use 0x0d 0x0a (carriage-return linefeed, and some programs incorrectly reverse the order), the Apple MacOS uses 0x0d (carriage return), and IBM OS/390 uses 0x85 (next line, sometimes called newline).

  * Limit all numbers to the minimum (often zero) and maximum allowed values.

  * A full email address checker is actually quite complicated, because there are legacy formats that greatly complicate validation if you need to support all of them; see mailaddr(7) and IETF RFC 822 [RFC 822] for more information if such checking is necessary. Friedl [1997] developed a regular expression to check if an email address is valid (according to the specification); his ``short'' regular expression is 4,724 characters, and his ``optimized'' expression (in appendix B) is 6,598 characters long. And even that regular expression isn't perfect; it can't recognize comments nested more than one level deep (as the specification permits). Often you can simplify and only permit the ``common'' Internet address formats.

  * Filenames should be checked; see Section 5.4 for more information on filenames.

  * URIs (including URLs) should be checked for validity.
If you are directly acting on a URI (i.e., you're implementing a web server or web-server-like program and the URL is a request for your data), make sure the URI is valid, and be especially careful of URIs that try to ``escape'' the document root (the area of the filesystem that the server is responding to). The most common ways to escape the document root are via ``..'' or a symbolic link, so most servers check any ``..'' directories themselves and ignore symbolic links unless specially directed. Also watch out for unusual character encodings (such as UTF-8 encoding), or an encoded ``..'' could slip through. URIs aren't supposed to even include UTF-8 encoding, so the safest thing is to reject any URIs that include characters with high bits set. If you are implementing a system that uses the URI/URL as data, you're not home-free at all; you need to ensure that malicious users can't insert URIs that will harm other users. See Section 5.11.4 for more information.

  * When accepting cookie values, make sure to check that the domain value for any cookie you're using is the expected one. Otherwise, a (possibly cracked) related site might be able to insert spoofed cookies. Here's an example from IETF RFC 2965 of how failing to do this check could cause a problem:

  + User agent makes request to victim.cracker.edu, gets back cookie session_id="1234" and sets the default domain victim.cracker.edu.

  + User agent makes request to spoof.cracker.edu, gets back cookie session-id="1111", with Domain=".cracker.edu".

  + User agent makes request to victim.cracker.edu again, and passes: Cookie: $Version="1"; session_id="1234", $Version="1"; session_id="1111"; $Domain=".cracker.edu"

  The server at victim.cracker.edu should detect that the second cookie was not one it originated by noticing that the Domain attribute is not for itself and ignore it.
Unless you account for them, the legal character patterns must not include characters or character sequences that have special meaning to either the program internals or the eventual output:

  * A character sequence may have special meaning to the program's internal storage format. For example, if you store data (internally or externally) in delimited strings, make sure that the delimiters are not permitted data values. A number of programs store data in comma (,) or colon (:) delimited text files; inserting the delimiters in the input can be a problem unless the program accounts for it (i.e., by preventing it or encoding it in some way). Other characters often causing these problems include the quote characters (used to surround strings) and the less-than sign "<" (used in SGML, XML, and HTML to indicate a tag's beginning; this is important if you store data in these formats). Most data formats have an escape sequence to handle these cases; use it, or filter such data on input.

  * A character sequence may have special meaning if sent back out to a user. A common example of this is permitting HTML tags in data input that will later be posted to other readers (e.g., in a guestbook or ``reader comment'' area). However, the problem is much more general. See Section 7.15 for a general discussion on the topic, and see Section 5.11 for a specific discussion about filtering HTML.

These tests should usually be centralized in one place so that the validity tests can be easily examined for correctness later.

Make sure that your validity test is actually correct; this is particularly a problem when checking input that will be used by another program (such as a filename, email address, or URL). Often these tests have subtle errors (e.g., the test makes different assumptions than the program that actually uses the data). If there's a relevant standard, look at it, but also search to see if the program has extensions that you need to know about.
While parsing user input, it's a good idea to temporarily drop all privileges, or even create separate processes (with the parser having permanently dropped privileges, and the other process performing security checks against the parser requests). This is especially true if the parsing task is complex (e.g., if you use a lex-like or yacc-like tool), or if the programming language doesn't protect against buffer overflows (e.g., C and C++). See Section 7.4 for more information on minimizing privileges.

When using data for security decisions (e.g., ``let this user in''), be sure the data is trustworthy. For example, do not use the machine IP address or port number as the sole way to authenticate users, because in most environments this information can be set by the (potentially malicious) user. See Section 7.11 for more information.

The following subsections discuss different kinds of inputs to a program; note that input includes process state such as environment variables, umask values, and so on. Not all inputs are under the control of an untrusted user, so you need only worry about those inputs that are.

-----------------------------------------------------------------------------

5.1. Command line

Many programs take input from the command line. A setuid/setgid program's command line data is provided by an untrusted user, so a setuid/setgid program must defend itself from potentially hostile command line values. Attackers can send just about any kind of data through a command line (through calls such as the execve(3) call). Therefore, setuid/setgid programs must completely validate the command line inputs and must not trust the name of the program reported by command line argument zero (an attacker can set it to any value including NULL).

-----------------------------------------------------------------------------

5.2. Environment Variables

By default, environment variables are inherited from a process' parent.
However, when a program executes another program, the calling program can set the environment variables to arbitrary values. This is dangerous to setuid/setgid programs, because their invoker can completely control the environment variables they're given. Since they are usually inherited, this also applies transitively; a secure program might call some other program and, without special measures, would pass potentially dangerous environment variable values on to the program it calls. The following subsections discuss environment variables and what to do with them.

-----------------------------------------------------------------------------

5.2.1. Some Environment Variables are Dangerous

Some environment variables are dangerous because many libraries and programs are controlled by environment variables in ways that are obscure, subtle, or undocumented. For example, the IFS variable is used by the sh and bash shell to determine which characters separate command line arguments. Since the shell is invoked by several low-level calls (such as system(3) and popen(3) in C, or the back-tick operator in Perl), setting IFS to unusual values can subvert apparently-safe calls. This behavior is documented in bash and sh, but it's obscure; many long-time users only know about IFS because of its use in breaking security, not because it's actually used very often for its intended purpose. What is worse is that not all environment variables are documented, and even if they are, those other programs may change and add dangerous environment variables. Thus, the only real solution (described below) is to select the ones you need and throw away the rest.

-----------------------------------------------------------------------------

5.2.2. Environment Variable Storage Format is Dangerous

Normally, programs should use the standard access routines to access environment variables.
For example, in C, you should get values using getenv(3), set them using the POSIX standard routine putenv(3) or the BSD extension setenv(3), and eliminate environment variables using unsetenv(3). I should note here that setenv(3) is implemented in Linux, too.

However, crackers need not be so nice; crackers can directly control the environment variable data area passed to a program using execve(2). This permits some nasty attacks, which can only be understood by understanding how environment variables really work. In Linux, you can see environ(5) for a summary of how environment variables really work. In short, environment variables are internally stored as a pointer to an array of pointers to characters; this array is stored in order and terminated by a NULL pointer. The pointers to characters, in turn, each point to a NIL-terminated string value of the form ``NAME=value''. This has several implications, for example, environment variable names can't include the equal sign, and neither the name nor value can have embedded NIL characters. However, a more dangerous implication of this format is that it allows multiple entries with the same variable name, but with different values (e.g., more than one value for SHELL). While typical command shells prohibit doing this, a locally-executing cracker can create such a situation using execve(2).

The problem with this storage format (and the way it's set) is that a program might check one of these values, but actually use a different one. In Linux, the GNU glibc libraries try to shield programs from this; glibc 2.1's implementation of getenv will always get the first matching entry, setenv and putenv will always set the first matching entry, and unsetenv will actually unset all of the matching entries (congratulations to the glibc developers for getting this right). However, some programs go directly to the environ variable and iterate across all environment variables; in this case, they might use the last matching entry instead of the first one.
As a result, if checks were made against the first matching entry, but the actual value used is the last matching entry, a cracker can use this fact to circumvent the protection routines.

-----------------------------------------------------------------------------

5.2.3. The Solution - Extract and Erase

For a secure program, the environment variables needed as input (if any) should be carefully extracted. Then the entire environment should be erased, followed by resetting a small set of necessary environment variables to safe values. There really isn't a better way if you make any calls to subordinate programs; there's no practical method of listing ``all the dangerous values''. Even if you reviewed the source code of every program you call directly or indirectly, someone may add new undocumented environment variables later, and one of them may be exploitable.

The simple way to erase the environment in C/C++ is by setting the global variable environ to NULL. The global variable environ is defined in <unistd.h>; C/C++ users will want to #include this header file. You will need to manipulate this value before spawning threads, but that's rarely a problem, since you want to do these manipulations very early in the program's execution (usually before threads are spawned). It's not clear that the standards officially sanction setting environ to this value, but I'm unaware of any Unix-like system that has trouble with doing this. I normally just modify ``environ'' directly; manipulating such low-level components is possibly non-portable, but it assures you that you get a clean (and safe) environment. In the rare case where you need later access to the entire set of variables, you could save the ``environ'' variable's value somewhere first, but this is rarely necessary; nearly all programs need only a few values, and the rest can be safely discarded.

Another way to clear the environment is to use the undocumented clearenv() function.
The function clearenv() has an odd history; it was supposed to be part of POSIX.1, but never made it into that standard. However, clearenv() is defined in POSIX.9 (the Fortran 77 bindings to POSIX), so there is a quasi-official status for it. In Linux, clearenv() is defined in <stdlib.h>, but before using #include to include it you must make sure that __USE_MISC is #defined. A somewhat more ``official'' way to cause __USE_MISC to be defined is to first #define either _SVID_SOURCE or _BSD_SOURCE, and then #include <features.h> - these are the official feature test macros.

One environment value you'll almost certainly re-add is PATH, the list of directories to search for programs; PATH should not include the current directory and usually be something simple like ``/bin:/usr/bin''. Typically you'll also set IFS (to its default of `` \t\n'', where space is the first character) and TZ (timezone). Linux won't die if you don't supply either IFS or TZ, but some System V based systems have problems if you don't supply a TZ value, and it's rumored that some shells need the IFS value set. In Linux, see environ(5) for a list of common environment variables that you might want to set.

If you really need user-supplied values, check the values first (to ensure that the values match a pattern for legal values and that they are within some reasonable maximum length). Ideally there would be some standard trusted file in /etc with the information for ``standard safe environment variable values'', but at this time there's no standard file defined for this purpose. For something similar, you might want to examine the PAM module pam_env on those systems which have that module.
If you allow users to set an arbitrary environment variable, then you'll let them subvert restricted shells (more on this below).

One convenient way to both erase and reset environment variables is the ``/usr/bin/env'' program with the ``-'' option (which erases all environment variables of the program being run). Basically, you call /usr/bin/env, give it the ``-'' option, follow that with the set of variables and their values you wish to set (as name=value), and then follow that with the name of the program to run and its arguments. You usually want to call the program using the full pathname (/usr/bin/env) and not just as ``env'', in case a user has created a dangerous PATH value. Note that GNU's env also accepts the options "-i" and "--ignore-environment" as synonyms (they also erase the environment of the program being started), but these aren't portable to other versions of env.

If you're programming a setuid/setgid program in a language that doesn't allow you to reset the environment directly, one approach is to create a ``wrapper'' program. The wrapper sets the environment of the program to safe values, and then calls the other program. Beware: make sure the wrapper will actually invoke the intended program; if it's an interpreted program, make sure there's no race condition possible that would allow the interpreter to load a different program than the one that was granted the special setuid/setgid privileges.

-----------------------------------------------------------------------------

5.2.4. Don't Let Users Set Their Own Environment Variables

If you allow users to set their own environment variables, then users will be able to escape out of restricted accounts (these are accounts that are intended to let users do only a limited set of things, not use the machine as a general-purpose machine).
This includes letting users write or modify certain files in their home directory (e.g., like .login), supporting conventions that load in environment variables from files under the user's control (e.g., openssh's .ssh/environment file), or supporting protocols that transfer environment variables (e.g., the Telnet Environment Option; see CERT Advisory CA-1995-14 for more). Restricted accounts should never be allowed to modify or add any file directly contained in their home directory, and instead should be given only a specific subdirectory that they are allowed to modify. ari posted a detailed discussion of this problem on Bugtraq on June 24, 2002:

  Given the similarities with certain other security issues, i'm surprised this hasn't been discussed earlier. If it has, people simply haven't paid it much attention.

  This problem is not necessarily ssh-specific, though most telnet daemons that support environment passing should already be configured to remove dangerous variables due to a similar (and more serious) issue back in '95 (ref: [1]). I will give ssh-based examples here.

  Scenario one: Let's say admin bob has a host that he wants to give people ftp access to, but he doesn't want anyone to be able to actually _log into_ his system. So instead of giving users normal shells, or even no shells, bob gives them all (say) /usr/sbin/nologin, a program he wrote ... effectively ending the user's session. As far as most people are concerned, ... an encrypted tunnel.
  The user can now gain a shell on the system (with his own privileges, of course), and bob, if he were aware of what just occurred, would be extremely unhappy.

  Granted, there are all kinds of interesting ways to (more or less) do away with this problem. Bob could just grit his teeth and give the ftp users a nonexistent shell, or he could statically compile nologin, assuming his operating system comes with static libraries. Bob could also, humorously, make his nologin program setuid and let the standard C library take care of the situation. Then, of course, there are also the ssh-specific access controls such as AllowGroup and AllowUsers. These may appease the situation in this scenario, but it does not correct the problem.

  ... Now, what happens if bob, instead of using /usr/sbin/nologin, wants to use (for example) some BBS-type interface that he wrote up or downloaded? It can be a script written in perl or tcl or python, or it could be a compiled program; doesn't matter. Additionally, bob need not be running an ftp server on this host; instead, perhaps bob uses nfs or veritas to mount user home directories from a fileserver on his network; this exact setup is (unfortunately) employed by many bastion hosts, password management hosts and mail servers---to name a few. Perhaps bob runs an ISP, and replaces the user's shell when he doesn't pay. With all of these possible (and common) scenarios, bob's going to have a somewhat more difficult time getting around the problem.

  ... Exploitation of the problem is simple. The circumvention code would be compiled into a dynamic library and LD_PRELOAD=/path/to/evil.so should be placed into ~user/.ssh/environment (a similar environment option may be appended to public keys in the authorized_keys file). If no dynamically loadable programs are executed, this will have no effect.
  ISPs and universities (along with similarly affected organizations) should compile their rejection (or otherwise restricted) binaries statically (assuming your operating system comes with static libraries)...

  Ideally, sshd (and all remote access programs that allow user-definable environments) should strip any environment settings that libc ignores for setuid programs.

-----------------------------------------------------------------------------

5.3. File Descriptors

A program is passed a set of ``open file descriptors'', that is, pre-opened files. A setuid/setgid program must deal with the fact that the user gets to select what files are open and to what (within their permission limits). A setuid/setgid program must not assume that opening a new file will always open into a fixed file descriptor id, or that the open will succeed at all. It must also not assume that standard input (stdin), standard output (stdout), and standard error (stderr) refer to a terminal or are even open.

The rationale behind this is easy; since an attacker can open or close a file descriptor before starting the program, the attacker could create an unexpected situation. If the attacker closes the standard output, when the program opens the next file it will be opened as though it were standard output, and any data sent to standard output will then go to that file as well. Some C libraries will automatically open stdin, stdout, and stderr if they aren't already open (to /dev/null), but this isn't true on all Unix-like systems. Also, these libraries can't be completely depended on; for example, on some systems it's possible to create a race condition that causes this automatic opening to fail (and still run the program).

-----------------------------------------------------------------------------

5.4. File Names

The names of files can, in certain circumstances, cause serious problems.
This is especially a problem for secure programs that run on computers with local untrusted users, but this isn't limited to that circumstance. Remote users may be able to trick a program into creating undesirable filenames (programs should prevent this, but not all do), or remote users may have partially penetrated a system and try using this trick to penetrate the rest of the system.

Usually you will want to not include ``..'' (higher directory) as a legal value from an untrusted user, though that depends on the circumstances. You might also want to list only the characters you will permit, and forbid any filenames that don't match the list. It's best to prohibit any change in directory, e.g., by not including ``/'' in the set of legal characters, if you're taking data from an external user and transforming it into a filename.

Often you shouldn't support ``globbing'', that is, expanding filenames using ``*'', ``?'', ``['' (matching ``]''), and possibly ``{'' (matching ``}''). For example, the command ``ls *.png'' does a glob on ``*.png'' to list all PNG files. The C fopen(3) command (for example) doesn't do globbing, but the command shells perform globbing by default, and in C you can request globbing using (for example) glob(3). If you don't need globbing, just use the calls that don't do it where possible (e.g., fopen(3)) and/or disable them (e.g., escape the globbing characters if you must build a shell command), and don't hand untrusted filenames to anything that might expand them unless you specifically intend to permit globbing. Globbing can be useful, but complex globs can take a great deal of computing time. For example, on some ftp servers, performing a few of these requests can easily cause a denial-of-service of the entire machine:

  ftp> ls */../*/../*/../*/../*/../*/../*/../*/../*/../*/../*/../*/../*

Trying to allow globbing, yet limit globbing patterns, is probably futile. Instead, make sure that any such programs run as a separate process and use process limits to limit the amount of CPU and other resources they can consume.
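As a sketch of that approach, the work can be pushed into a child process that is capped with setrlimit(2) (here via Python's resource module; the helper name and the default five-second CPU cap are illustrative assumptions, not part of any standard interface):

```python
import os
import resource

def run_limited(func, cpu_seconds=5, mem_bytes=None):
    # Run func in a separate (forked) process under resource limits, so a
    # pathological glob (or other runaway work) cannot take down the parent.
    pid = os.fork()
    if pid == 0:
        # Limits apply only to the child; the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        if mem_bytes is not None:
            resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        func()
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    # Report whether the child finished normally (it is killed with SIGXCPU
    # if it exceeds its CPU allowance).
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

A C server would do the same thing with fork(2) and setrlimit(2) directly; the point is that the limits are set in the child, after the fork, so only the untrusted work is constrained.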
See Section 7.4.8 for more information on this approach, and see Section 3.6 for more information on how to set these limits.

Unix-like systems generally forbid including the NIL character in a filename (since this marks the end of the name) and the '/' character (since this is the directory separator). However, they often permit anything else, which opens the door to problems with cleverly-created filenames. Filenames that can especially cause problems include:

  * Filenames with leading dashes (-). If passed to other programs, this may cause the other programs to misinterpret the name as option settings. Ideally, Unix-like systems shouldn't allow these filenames; they aren't needed and create many unnecessary security problems. Unfortunately, currently developers have to deal with them. Thus, whenever calling another program with a filename, insert ``--'' before the filename parameters (to stop option processing, if the program supports this convention) or modify the filename (e.g., prepend ``./'' to the filename to keep the dash from being the lead character).

  * Filenames with control characters. This especially includes newlines and carriage returns (which are often confused as argument separators inside shell scripts, or can split log entries into multiple entries) and the ESCAPE character (which can interfere with terminal emulators, causing them to perform undesired actions outside the user's control). Ideally, Unix-like systems shouldn't allow these filenames either; they aren't needed and create many unnecessary security problems.

  * Filenames with spaces; these can sometimes confuse a shell into treating the name as multiple arguments, with the other arguments causing problems. Since other operating systems allow spaces in filenames (including Windows and MacOS), for interoperability's sake these will probably always be permitted. Please be careful in dealing with them, e.g., in the shell use double-quotes around all filename parameters whenever calling another program.
You might want to forbid leading and trailing spaces at least; these aren't as visible as when they occur in other places, and can confuse human users.

  * Invalid character encoding. For example, a program may believe that the filename is UTF-8 encoded, but it may have an invalidly long UTF-8 encoding. See Section 5.9.2 for more information. I'd like to see agreement on the character encoding used for filenames (e.g., UTF-8), and then have the operating system enforce the encoding (so that only legal encodings are allowed), but that hasn't happened at this time.

  * Any other characters special to internal data formats, such as ``<'', ``;'', quote characters, backslash, and so on.

-----------------------------------------------------------------------------

5.5. File Contents

If a program takes directions from a file, it must not trust that file specially unless only a trusted user can control its contents. Usually this means that an untrusted user must not be able to modify the file, its directory, or any of its ancestor directories. Otherwise, the file must be treated as suspect.

If the directions in the file are supposed to be from an untrusted user, then make sure that the inputs from the file are protected as described throughout this book. In particular, check that values match the set of legal values, and that buffers are not overflowed.

-----------------------------------------------------------------------------

5.6. Web-Based Application Inputs (Especially CGI Scripts)

Web-based applications (such as CGI scripts) run on some trusted server and must get their input data somehow through the web. Since the input data generally come from untrusted users, this input data must be validated. Indeed, this information may have actually come from an untrusted third party; see Section 7.15 for more information. For example, CGI scripts are passed this information through a standard set of environment variables and through standard input.
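As a sketch of this input path, here is how a CGI-style program might read and validate one field (Python; the field name ``user'' and the allowlist pattern are illustrative assumptions, not a general rule):

```python
import re
from urllib.parse import parse_qs

def get_user_field(environ):
    # CGI passes GET form data in the QUERY_STRING environment variable;
    # parse_qs URL-decodes each value exactly once (never decode again).
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    value = fields.get("user", [""])[0]
    # Validate the decoded value against an allowlist before any use.
    if not re.fullmatch(r"[A-Za-z0-9_]{1,32}", value):
        raise ValueError("invalid 'user' field")
    return value
```

The key design point is the order of operations: decode once, then validate the decoded bytes, so encodings like %00 or %2F cannot slip past the check.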
The rest of this text will specifically discuss CGI, because it's the most common technique for implementing dynamic web content, but the general issues are the same for most other dynamic web content techniques.

One additional complication is that many CGI inputs are provided in so-called ``URL-encoded'' format, that is, some values are written in the format %HH where HH is the hexadecimal code for that byte. You or your CGI library must handle these inputs correctly by URL-decoding the input and then checking if the resulting byte value is acceptable. You must correctly handle all values, including problematic values such as %00 (NIL) and %0A (newline). Don't decode inputs more than once, or input such as ``%2500'' will be mishandled (the %25 would be translated to ``%'', and the resulting ``%00'' would be erroneously translated to the NIL character).

CGI scripts are commonly attacked by including special characters in their inputs; see the comments above.

Another form of data available to web-based applications are ``cookies.'' Again, users can provide arbitrary cookie values, so they cannot be trusted unless special precautions are taken. Also, cookies can be used to track users, potentially invading user privacy. As a result, many users disable cookies, so if possible your design should not require the use of cookies (but see my later discussion for when you must authenticate individual users). I encourage you to avoid or limit the use of persistent cookies (cookies that last beyond a current session), because they are easily abused. Indeed, U.S. agencies are currently forbidden to use persistent cookies except in special circumstances, because of the concern about invading user privacy; see the OMB guidance in memorandum M-00-13 (June 22, 2000). Note that to use cookies, some browsers may insist that you have a privacy profile (named p3p.xml on the root directory of the server).

Some HTML forms include client-side input checking to prevent some illegal values; these are typically implemented using Javascript/ECMAscript or Java.
This checking can be helpful for the user, since it can happen ``immediately'' without requiring any network access. However, this kind of input checking is useless for security, because attackers can send such ``illegal'' values directly to the web server without going through the checks. It's not even hard to subvert this; you don't have to write a program to send arbitrary data to a web application. In general, servers must perform all their own input checking (of form data, cookies, and so on) because they cannot trust clients to do this securely. In short, clients are generally not ``trustworthy channels''. See Section 7.11 for more information on trustworthy channels.

A brief discussion on input validation for those using Microsoft's Active Server Pages (ASP) is available from Jerry Connolly at [http://heap.nologin.net/aspsec.html] http://heap.nologin.net/aspsec.html

-----------------------------------------------------------------------------

5.7. Other Inputs

Programs must ensure that all inputs are controlled; this is particularly difficult for setuid/setgid programs because they have so many such inputs. Other inputs programs must consider include the current directory, signals, memory maps (mmaps), System V IPC, pending timers, resource limits, the scheduling priority, and the umask (which determines the default permissions of newly-created files). Consider explicitly changing directories (using chdir(2)) to an appropriately fully named directory at program startup.

-----------------------------------------------------------------------------

5.8. Human Language (Locale) Selection

As more people have computers and the Internet available to them, there has been increasing pressure for programs to support multiple human languages and cultures. This combination of language and other cultural factors is usually called a ``locale''.
The process of modifying a program so it can support multiple locales is called ``internationalization'' (i18n), and the process of providing the information for a particular locale to a program is called ``localization'' (l10n).

Overall, internationalization is a good thing, but this process provides another opportunity for a security exploit. Since a potentially untrusted user provides information on the desired locale, locale selection becomes another input that, if not properly protected, can be exploited.

-----------------------------------------------------------------------------

5.8.1. How Locales are Selected

In locally-run programs (including setuid/setgid programs), locale information is provided by an environment variable. Thus, like all other environment variables, these values must be extracted and checked against valid patterns before use.

For web applications, the browser can report the user's desired locale to the web server (via the Accept-Language request header). However, since not all web browsers properly pass this information (and not all users configure their browsers properly), this is used less often than you might think. Often, the language requested in a web browser is simply passed in as a form value. Again, these values must be checked for validity before use, as with any other form value.

In either case, locale selection is simply another kind of input as discussed in the previous sections. However, because this input is so rarely considered, I'm discussing it separately. In particular, when combined with format strings (discussed later), user-controlled strings can permit attackers to force other programs to run arbitrary instructions, corrupt data, and do other unfortunate actions.

-----------------------------------------------------------------------------

5.8.2. Locale Support Mechanisms

There are two major library interfaces for supporting locale-selected messages on Unix-like systems, one called ``catgets'' and the other called ``gettext''. In the catgets approach, every string is assigned a unique number, which is used as an index into a table of translated messages; in the gettext approach, a string (usually in English) is used to look up a table that translates the original string. catgets(3) is an accepted standard (via the X/Open Portability Guide, Volume 3 and Single Unix Specification), so it's possible your program uses it. The ``gettext'' interface is not an official standard (though it was originally a UniForum proposal), but I believe it's the more widely used interface (it's used by Sun and essentially all GNU programs). In theory, catgets should be slightly faster, but this is at best marginal on today's machines, and the bother of managing the message numbers in catgets() makes the gettext() interface much easier to use. I'd suggest using gettext(), just because it's easier to use. However, don't take my word for it; see GNU's documentation on gettext (info:gettext#catgets) for a longer and more descriptive comparison.

The catgets(3) call (and its associated catopen(3) call) in particular is vulnerable to security problems, because the environment variable NLSPATH can be used to control the filenames used to acquire internationalized messages. The GNU C library ignores NLSPATH for setuid/setgid programs, which helps, but that doesn't protect programs on other implementations, nor other programs (like CGI scripts) which don't ``appear'' to require such protection.

The widely-used ``gettext'' interface is at least not vulnerable to a malicious NLSPATH setting to my knowledge. However, it appears likely to me that malicious settings of LC_ALL or LC_MESSAGES could cause problems. Also, if you use gettext's bindtextdomain() routine in its file cat-compat.c, that does depend on NLSPATH.

-----------------------------------------------------------------------------

5.8.3. Legal Values

For the moment, if you must permit untrusted users to set information on their desired locales, make sure the provided internationalization information meets a narrow filter that only permits legitimate locale names. For user programs (especially setuid/setgid programs), these values will come in via NLSPATH, LANGUAGE, LANG, the old LINGUAS, LC_ALL, and the other LC_* values (especially LC_MESSAGES, but also including LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC, and LC_TIME). For web applications, this user-requested set of language information would come in via the Accept-Language request header or a form value (the application should indicate the actual language setting of the data being returned via the Content-Language heading). You can check this value as part of your environment variable filtering if your users can set your environment variables (i.e., setuid/setgid programs) or as part of your input filtering (e.g., for CGI scripts). The GNU C library "glibc" doesn't accept some values of LANG for setuid/setgid programs (in particular anything with "/"), but errors have been found in that filtering (e.g., Red Hat released an update to fix this error in glibc on September 1, 2000). This kind of filtering isn't something you can count on, so it's safest to do it yourself. I have not found any guidance on filtering language settings, so here are my suggestions based on my own research into the issue.

First, a few words about the legal values of these settings. Language settings are generally set using the standard tags defined in IETF RFC 1766 (which uses two-letter language codes as its basic tag, followed by an optional subtag separated by a dash; I've found that environment variable settings use the underscore instead). However, some find this insufficiently flexible, so three-letter country codes may soon be used as well. Also, there are two major not-quite compatible extended formats, the X/Open Format and the CEN Format (European Community Standard); you'd like to permit both.
For example, a legal value is ``FR_fr'' (French using the territory of France's conventions). Also, so many people use nonstandard names that programs have had to develop ``alias'' systems to cope with nonstandard names (for GNU gettext, see /usr/share/locale/locale.alias, and for X11, see /usr/lib/X11/locale/locale.alias); you may need to accept these aliases as well. Libraries like gettext() have to accept all these variants and find an appropriate value, where possible. One source of further information is FSF [1999]; another source is the li18nux.org web site.

A filter should not permit characters that aren't needed, in particular ``/'' (which might permit escaping out of the trusted directories) and ``..'' (which might permit going up one directory). Other dangerous characters in NLSPATH include ``%'' (which indicates substitution) and ``:'' (which is the directory separator); the documentation I have for other machines suggests that some implementations may use them for other values, so it's safest to prohibit them.

-----------------------------------------------------------------------------

5.8.4. Bottom Line

In short, I suggest simply erasing or re-setting the NLSPATH, unless you have a trusted user supplying the value. For the Accept-Language heading in HTTP (if you use it), for form values specifying the locale, and for the environment variables LANGUAGE, LANG, the old LINGUAS, LC_ALL, and the other LC_* values listed above, filter the locales from untrusted users to permit null (empty) values or to only permit values that match in total this regular expression (note that I've recently added "="):

  [A-Za-z][A-Za-z0-9_,+@\-\.=]*

I haven't found any legitimate locale which doesn't match this pattern, but this pattern does appear to protect against locale attacks.
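A sketch of applying this filter (Python; fullmatch() supplies the ``match in total'' anchoring, and the null value is permitted explicitly):

```python
import re

# The locale filter suggested above: a letter followed by letters, digits,
# and the punctuation that legitimate locale names use.
_LOCALE_OK = re.compile(r"[A-Za-z][A-Za-z0-9_,+@\-\.=]*")

def locale_ok(value):
    # Permit the null (empty) value, or a value matching the whole pattern.
    return value == "" or _LOCALE_OK.fullmatch(value) is not None
```

Note that the pattern's first character class cannot match ``/'' or ``.'', so path tricks such as ``../..'' fail immediately.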
Of course, there's no guarantee that there are messages available in the requested locale, but in such a case these routines will fall back to the default messages (usually in English), which at least is not a security problem.

If you wish to be really picky, and only accept patterns that match li18nux's locale pattern, you can use this pattern instead:

  ^[A-Za-z]+(_[A-Za-z]+)?
  (\.[A-Z]+(\-[A-Z0-9]+)*)?
  (\@[A-Za-z0-9]+(\=[A-Za-z0-9\-]+)
  (,[A-Za-z0-9]+(\=[A-Za-z0-9\-]+))*)?$

In both cases, these patterns use POSIX's extended (``modern'') regular expression notation (see regex(3) and regex(7) on Unix-like systems).

Of course, languages cannot be supported without a standard way to represent their written symbols, which brings us to the issue of character encoding.

-----------------------------------------------------------------------------

5.9. Character Encoding

5.9.1. Introduction to Character Encoding

For many years Americans have exchanged text using the ASCII character set; since essentially all U.S. systems support ASCII, this permits easy exchange of English text. Unfortunately, ASCII is completely inadequate in handling the characters of nearly all other languages. For many years different countries have adopted different techniques for exchanging text in different languages, making it difficult to exchange data in an increasingly interconnected world.

More recently, ISO has developed ISO 10646, the ``Universal Multiple-Octet Coded Character Set'' (UCS). UCS is a coded character set which defines a single 31-bit value for each of all of the world's characters. The first 65536 characters of the UCS (which thus fit into 16 bits) are termed the ``Basic Multilingual Plane'' (BMP), and the BMP basically covers almost all of today's spoken languages. The Unicode forum develops the Unicode standard, which concentrates on the UCS and adds some additional conventions to aid interoperability.
Historically, Unicode and ISO 10646 were developed by competing groups, but thankfully they realized that they needed to work together and they now coordinate with each other.

If you're writing new software that handles internationalized characters, you should be using ISO 10646/Unicode as your basis for handling international characters. However, you may need to process older documents in various older (language-specific) character sets, in which case, you need to ensure that an untrusted user cannot control the setting of another document's character set (since this would significantly affect the document's interpretation).

-----------------------------------------------------------------------------

5.9.2. Introduction to UTF-8

Most software is not designed to handle 16 bit or 32 bit characters, yet to create a universal character set more than 8 bits was required. Therefore, a special format called ``UTF-8'' was developed to encode these potentially international characters in a format more easily handled by existing programs and libraries. UTF-8 is defined, among other places, in IETF RFC 2279, so it's a well-defined standard that can be freely read and used. UTF-8 is a variable-width encoding; characters numbered 0 to 0x7f (127) encode to themselves as a single byte, while characters with larger values are encoded into 2 to 6 bytes of information (depending on their value). The encoding has been specially designed to have the following nice properties (this information is from the RFC and Linux utf-8 man page):

  * The classical US ASCII characters (0 to 0x7f) encode as themselves, so files and strings which contain only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8. This is fabulous for backward compatibility with the many existing U.S. programs and data files.

  * All UCS characters beyond 0x7f are encoded as a multibyte sequence consisting only of bytes in the range 0x80 to 0xfd.
This means that no ASCII byte can appear as part of another character. Many other encodings permit characters such as an embedded NIL, causing programs to fail.

  * It's easy to convert between UTF-8 and 2-byte or 4-byte fixed-width representations of characters (these are called UCS-2 and UCS-4 respectively).

  * The lexicographic sorting order of UCS-4 strings is preserved, and the Boyer-Moore fast search algorithm can be used directly with UTF-8 data.

  * All possible 2^31 UCS codes can be encoded using UTF-8.

  * The first byte of a multibyte sequence which represents a single non-ASCII UCS character indicates how long this multibyte sequence is. All further bytes in a multibyte sequence are in the range 0x80 to 0xbf. This allows easy resynchronization; if a byte is missing, it's easy to skip forward to the ``next'' character, and it's always easy to skip forward and back to the ``next'' or ``preceding'' character.

In short, the UTF-8 transformation format is becoming a dominant method for exchanging international text information because it can support all of the world's languages, yet it is backward compatible with U.S. ASCII files as well as having other nice properties. For many purposes I recommend its use, particularly when storing data in a ``text'' file.

-----------------------------------------------------------------------------

5.9.3. UTF-8 Security Issues

The reason to mention UTF-8 is that some byte sequences are not legal UTF-8, and this might be an exploitable security hole. UTF-8 encoders are supposed to use the ``shortest possible'' encoding, but naive decoders may accept encodings that are longer than necessary. Indeed, earlier standards permitted decoders to accept ``non-shortest form'' encodings. The problem here is that this means that potentially dangerous input could be represented multiple ways, and thus might defeat the security routines checking for dangerous inputs.
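The ``non-shortest form'' problem can be demonstrated in a few lines (Python, whose built-in decoder happens to be strict; the byte string is the RFC's own disguised ``/../'' example):

```python
# 2F C0 AE 2E 2F: "/../" where the first '.' (0x2E) is disguised as the
# illegal two-byte (overlong) sequence C0 AE.
disguised = b"\x2f\xc0\xae\x2e\x2f"

def decodes_cleanly(data):
    # A strict UTF-8 decoder must reject non-shortest-form sequences.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A naive decoder that accepted C0 AE as ``.'' would let this string slip past a check for the literal bytes ``/../'', while a strict decoder rejects it outright.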
The RFC describes the problem this way:

  Implementers of UTF-8 need to consider the security aspects of how they handle illegal UTF-8 sequences. It is conceivable that in some circumstances an attacker would be able to exploit an incautious UTF-8 parser by sending it an octet sequence that is not permitted by the UTF-8 syntax.

  A particularly subtle form of this attack could be carried out against a parser which performs security-critical validity checks against the UTF-8 encoded form of its input, but interprets certain illegal octet sequences as characters. For example, a parser might prohibit the NUL character when encoded as the single-octet sequence 00, but allow the illegal two-octet sequence C0 80 and interpret it as a NUL character (00). Another example might be a parser which prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the illegal octet sequence 2F C0 AE 2E 2F.

A longer discussion about this is available at Markus Kuhn's UTF-8 and Unicode FAQ for Unix/Linux at [http://www.cl.cam.ac.uk/~mgk25/unicode.html] http://www.cl.cam.ac.uk/~mgk25/unicode.html.

-----------------------------------------------------------------------------

5.9.4. UTF-8 Legal Values

Thus, when accepting UTF-8 input, you need to check if the input is valid UTF-8. Here is a list of all legal UTF-8 sequences; any character sequence not matching this table should be rejected. In the following table, the first column shows the various character values being encoded into UTF-8. The second column shows how those characters are encoded as binary values; an ``x'' indicates where the data is placed, though some combinations of x values should not be allowed because they're not the shortest possible encoding. The last column shows the valid values each byte can have (in hexadecimal). Thus, a program should check that every character meets one of the patterns in the right-hand column. A ``-'' indicates a range of legal values (inclusive); an asterisk marks a byte whose range has been restricted, to exclude overlong (non-shortest) forms or values beyond 10FFFF. Of course, just because a sequence is a legal UTF-8 sequence doesn't mean that you should accept it (you still need to do all your other checking), but generally you should check any UTF-8 data for UTF-8 legality before performing other checks.
+------------------+--------------------+------------------------+
|UCS Code (Hex)    |Binary UTF-8 Format |Legal UTF-8 Values (Hex)|
+------------------+--------------------+------------------------+
|00-7F             |0xxxxxxx            |00-7F                   |
+------------------+--------------------+------------------------+
|80-7FF            |110xxxxx 10xxxxxx   |C2-DF 80-BF             |
+------------------+--------------------+------------------------+
|800-FFF           |1110xxxx 10xxxxxx   |E0 A0*-BF 80-BF         |
|                  |10xxxxxx            |                        |
+------------------+--------------------+------------------------+
|1000-FFFF         |1110xxxx 10xxxxxx   |E1-EF 80-BF 80-BF       |
|                  |10xxxxxx            |                        |
+------------------+--------------------+------------------------+
|10000-3FFFF       |11110xxx 10xxxxxx   |F0 90*-BF 80-BF 80-BF   |
|                  |10xxxxxx 10xxxxxx   |                        |
+------------------+--------------------+------------------------+
|40000-FFFFFF      |11110xxx 10xxxxxx   |F1-F3 80-BF 80-BF 80-BF |
|                  |10xxxxxx 10xxxxxx   |                        |
+------------------+--------------------+------------------------+
|100000-10FFFFF    |11110xxx 10xxxxxx   |F4 80-8F* 80-BF 80-BF   |
|                  |10xxxxxx 10xxxxxx   |                        |
+------------------+--------------------+------------------------+
|200000-3FFFFFF    |111110xx 10xxxxxx   |too large; see below    |
|                  |10xxxxxx 10xxxxxx   |                        |
|                  |10xxxxxx            |                        |
+------------------+--------------------+------------------------+
|04000000-7FFFFFFF |1111110x 10xxxxxx   |too large; see below    |
|                  |10xxxxxx 10xxxxxx   |                        |
|                  |10xxxxxx 10xxxxxx   |                        |
+------------------+--------------------+------------------------+

As I noted earlier, there are two standards for character sets, ISO 10646 and Unicode, which have agreed to synchronize their character assignments.
The definition of UTF-8 in ISO/IEC 10646-1:2000 and the IETF RFC also currently support five and six byte sequences to encode characters outside the range supported by Uniforum's Unicode, but such values can't be used to support Unicode characters and it's expected that a future version of ISO 10646 will have the same limits. Thus, for most purposes the five and six byte UTF-8 encodings aren't legal, and you should normally reject them (unless you have a special purpose for them).

This set of valid values is tricky to determine, and in fact earlier versions of this document got some entries wrong (in some cases they permitted overlong characters). Language developers should include a function in their libraries to check for valid UTF-8 values, just because it's so hard to get right.

I should note that in some cases, you might want to cut slack for (or use internally) the hexadecimal sequence C0 80. This is an overlong sequence that, if permitted, can represent ASCII NUL (NIL). Since C and C++ have trouble including a NIL character in an ordinary string, some people have taken to using the C0 80 sequence for an embedded NIL in a data stream; Java even enshrines the practice. Feel free to use C0 80 internally while processing data, but technically you really should translate this back to 00 before saving the data. Depending on your needs, you might decide to be ``sloppy'' and accept C0 80 as input in a UTF-8 data stream. If you do, be aware that you're accepting one illegal sequence as a trade-off, since accepting it aids interoperability.

Handling this can be tricky. You might want to examine the C routines developed by Unicode to handle conversions, available at [ftp://ftp.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c] ftp://ftp.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c. It's unclear to me if these routines are open source software (the licenses don't clearly say whether or not they can be modified), so beware of that.

-----------------------------------------------------------------------------

5.9.5. UTF-8 Related Issues

This section has discussed UTF-8, because it's the most popular multibyte encoding of UCS, simplifying a lot of international text handling issues. However, it's certainly not the only encoding; there are other encodings, such as UTF-16 and UTF-7, which have the same kinds of issues and must be validated for the same reasons.

Another issue is that some phrases can be expressed in more than one way in ISO 10646/Unicode. For example, some accented characters can be represented as a single character (with the accent) and also as a set of characters (e.g., the base character plus a separate composing accent). These two forms may appear identical. There's also a zero-width space that could be inserted, with the result that apparently-similar items are considered different. Beware of situations where such hidden text could interfere with the program. This is an issue that in general is hard to solve; most programs don't have such tight control over the clients that they know completely how a particular sequence will be displayed (since this depends on the client's font, display characteristics, locale, and so on).

-----------------------------------------------------------------------------

5.10. Prevent Cross-site Malicious Content on Input

Some programs accept data from one untrusted user and pass that data on to a second user; the second user's application may then process that data in a way harmful to the second user. This is a particularly common problem for web applications; we'll call this problem ``cross-site malicious content.'' In short, you cannot accept input (including any form data) without checking, filtering, or encoding it. For more information, see Section 7.15.

Fundamentally, this means that all web application input must be filtered (so characters that can cause this problem are removed), encoded (so characters that can cause this problem are encoded in a way to prevent the problem), or validated (to ensure that only ``safe'' data gets through). Filtering and validation should often be done at the input, but encoding can be done either at input or output time. If you're just passing the data through without analyzing it, it's probably best to encode the data on input (so it won't be forgotten), but if you're processing the data, there are arguments for encoding on output instead.

-----------------------------------------------------------------------------

5.11. Filter HTML/URIs That May Be Re-presented

One special case where cross-site malicious content must be prevented are web applications which are designed to accept HTML or XHTML from one user, and then send it on to other users (see Section 7.15 for more information on cross-site malicious content). The following subsections discuss filtering this specific kind of input, since handling it is such a common requirement.

-----------------------------------------------------------------------------

5.11.1. Remove or Forbid Some HTML Data

It's safest simply to remove all possible (X)HTML tags so they cannot affect anything, and this is relatively easy to do. As noted above, you should already be identifying the list of legal characters, and rejecting or removing those characters that aren't in the list. In this filter, simply don't include the following characters in the list of legal characters: ``<'', ``>'', and ``&'' (and if they're used in attributes, the double-quote character ``"''). If browsers only operated according to the HTML specifications, the ``>'' wouldn't need to be removed, but in practice it must be removed. This is because some browsers assume that the author of the page really meant to put in an opening "<" and ``helpfully'' insert one - attackers can exploit this behavior and use the ">" to create an undesired "<".
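A sketch of such a removal filter (Python; the allowlist itself is an illustrative assumption and would normally be tailored to the application):

```python
import re

# Keep only explicitly legal characters; "<", ">", "&" and '"' are
# deliberately absent from the allowlist, so all tags are destroyed.
_ILLEGAL = re.compile(r"[^A-Za-z0-9 .,;:!?'()_\-]")

def strip_to_legal(text):
    # Delete every character that is not on the allowlist.
    return _ILLEGAL.sub("", text)
```

Because the filter is an allowlist rather than a blocklist of known-bad tags, a new or misspelled tag cannot sneak through: anything containing ``<'' or ``>'' simply loses those characters.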
Usually the character set for transmitting HTML is ISO-8859-1 (even when
sending international text), so the filter should also omit most control
characters (linefeed and tab are usually okay) and characters with their
high-order bit set.

One problem with this approach is that it can really surprise users,
especially those entering international text, if all international text is
quietly removed. If the invalid characters are quietly removed without
warning, that data will be irrevocably lost and cannot be reconstructed
later. One alternative is forbidding such characters and sending error
messages back to users who attempt to use them. This at least warns users,
but doesn't give them the functionality they were looking for. Other
alternatives are encoding this data or validating this data, which are
discussed next.

-----------------------------------------------------------------------------

5.11.2. Encoding HTML Data

An alternative which is nearly as safe is to transform the critical characters
so they won't have their usual meaning in HTML. This can be done by
translating all "<" into "&lt;", ">" into "&gt;", and "&" into "&amp;".
Arbitrary international characters can be encoded in Latin-1 using the format
"&#value;". Encoding the international characters means you must know what
the input encoding was, of course.

One possible danger here is that if these encodings are accidentally
interpreted twice, they will become a vulnerability. However, this approach
at least permits later users to see the "intent" of the input.

-----------------------------------------------------------------------------

5.11.3. Validating HTML Data

Some applications, to work at all, must accept HTML from third parties and
send them on to their users. Beware - you are treading dangerous ground at
this point; be sure that you really want to do this. Even the idea of
accepting HTML from arbitrary places is controversial among some security
practitioners, because it is extremely difficult to get it right.
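The encoding approach of Section 5.11.2 can be sketched in Python as follows
(hand-rolled here rather than using a library call, to show the "&#value;"
form for international characters; it also demonstrates the
double-interpretation danger just mentioned):

```python
def encode_html(s):
    out = []
    for ch in s:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif ord(ch) > 126:
            # "&#value;" form for international (non-ASCII) characters.
            out.append("&#%d;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

print(encode_html("<caf\u00e9>"))   # &lt;caf&#233;&gt;

# Danger: if the encoding is applied (or interpreted) twice, "&lt;"
# becomes "&amp;lt;" - and a stage that later decodes twice would turn
# an attacker's "&lt;script&gt;" back into live HTML.
print(encode_html(encode_html("<b>")))   # &amp;lt;b&amp;gt;
```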
However, if your application must accept HTML, and you believe that it's
worth the risk, at least identify a list of ``safe'' HTML commands and only
permit those commands.

Here is a minimal set of safe HTML tags that might be useful for applications
(such as guestbooks) that support short comments: <p> (paragraph), <b>
(bold), <i> (italics), <em> (emphasis), <strong> (strong emphasis), <pre>
(preformatted text), and <br> (forced line break - note it doesn't require a
closing tag), as well as all their ending tags.

Not only do you need to ensure that only a small set of ``safe'' HTML
commands are accepted, you also need to ensure that they are properly nested
and closed (i.e., that the HTML commands are ``balanced''). In XML, this is
termed ``well-formed'' data. A few exceptions could be made if you're
accepting standard HTML (e.g., supporting an implied </p> where not provided
before a <p> would be fine), but trying to accept HTML in its full generality
(which can infer balancing closing tags in many cases) is not needed for most
applications. Indeed, if you're trying to stick to XHTML (instead of HTML),
then well-formedness is a requirement. Also, HTML tags are case-insensitive;
tags can be upper case, lower case, or a mixture. However, if you intend to
handle XHTML, note that it is case-sensitive (XHTML uses XML and requires the
tags to be in lower case).

Here are a few random tips about doing this. Usually you should design
whatever surrounds the HTML text and the set of permitted tags so that the
contributed text cannot be misinterpreted as text from the ``main'' site (to
prevent forgeries). Don't accept any attributes unless you've checked the
attribute type and its value(s); there are many attributes that support things
such as Javascript that can cause trouble for your users. You'll notice that
the list above doesn't include any attributes at all, which is certainly
the safest course. You should probably give a warning message if an unsafe
tag is used, but if that's not practical, encoding the critical characters
(e.g., "<" becomes "&lt;") prevents data loss while simultaneously keeping
the users safe.

Be careful when expanding this set, and in general be restrictive of what you
accept. If your patterns are too generous, the browser may interpret the
sequences differently than you expect, resulting in a potential exploit. For
example, FozZy posted on Bugtraq (1 April 2002) some sequences that permitted
exploitation in various web-based mail systems, which may give you an idea of
the kinds of problems you need to defend against. Here's some exploit text
that, at one time, could subvert user accounts in Microsoft Hotmail:

  &{[code]};     [N4]
  [N4]
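The whitelist validation described in Section 5.11.3 can be sketched in
Python using the standard library's HTMLParser (the permitted tag set mirrors
the minimal list above; all attributes are rejected, and tags must balance -
this is a sketch, not a complete validator):

```python
from html.parser import HTMLParser

SAFE_TAGS = {"p", "b", "i", "em", "strong", "pre", "br"}

class SafeHTMLValidator(HTMLParser):
    """Accept only whitelisted, attribute-free, properly nested tags."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []   # currently open tags
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in SAFE_TAGS or attrs:   # unknown tag, or any attribute
            self.ok = False
        elif tag != "br":                   # <br> needs no closing tag
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if not self.stack or self.stack.pop() != tag:
            self.ok = False                 # unbalanced or badly nested

def is_safe_html(text):
    v = SafeHTMLValidator()
    v.feed(text)
    v.close()
    return v.ok and not v.stack             # everything opened was closed

print(is_safe_html("<p><b>hi</b></p>"))             # True
print(is_safe_html('<p onclick="evil()">hi</p>'))   # False (attribute)
print(is_safe_html("<script>evil()</script>"))      # False (tag not allowed)
print(is_safe_html("<b><i>bad nesting</b></i>"))    # False (not balanced)
```

Rejecting unsafe input outright, as here, avoids the data-loss question
entirely; alternatively, the encoding approach of Section 5.11.2 could be
applied to rejected fragments instead.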