CONFIDENTIAL DRAFT
Linux and C Programming
Saverio Perugini
Department of Computer Science
University of Dayton
February 9, 2017
Copyright © 2017 by Saverio Perugini. ALL RIGHTS RESERVED.
ad majorem Dei gloriam
Saint Francis de Sales, Patron of Writers, Pray for Us.
Preface
This is a book on Linux and C. This is not a passive book.
Why Study This Stuff Anyway?
• An improved understanding and appreciation of the internals of your system and of systems software will make you a better application programmer.
• UNIX and C are an enabling environment and language for a wide variety of science and engineering disciplines (e.g., bioinformatics).
• UNIX and C are ubiquitous in our field, so knowing them is part of being a well-rounded computer scientist.
• Communication and concurrency are everything in today’s software.
• The ability to write reliable and secure code is indispensable (e.g., in counter-terrorism).
• These topics are a gateway to studies in distributed computing and networking.
Use of this Book
UNIX Compliance
Prerequisites
Book Objectives
• Develop a proficiency in Linux and C as a systems programming language/environment.
• Establish an understanding of the Linux style of programming and problem solving.
• Survey various system-oriented software tools, including debuggers and compilation and configuration managers.
• Establish an understanding of the design and development of systems software, such as command interpreters and compilers, through the study of pattern matching and filters, interprocess communication, system libraries, signals, and automatic program generation.
• Explore Linux internals and establish an understanding of Linux system calls.
• Introduce the client/server model of computation.
Graphic View of Outline
[Figure: the book's outline, arranging Parts I through IV by module against level of abstraction. Part I: Linux and C Fundamentals. Part II: Processes. Part III: Scripting (Pattern Matching, Filters, and Shell Programming). Part IV: Automatic Program Generation. Other labels in the figure: Programming; machine code; assembly code; Linux system calls; C; C++; Go; Qt; ksh, sed, awk; regular expressions; lex & yacc.]
We aim for breadth rather than depth here.
The following figure illustrates the dependencies between the chapters of this book.
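[Figure: chapter dependency graph. Part I: Fundamentals; Part II: Processes; Part III: Scripting; Part IV: Automatic Program Generation. The nodes are Chapters 1 through 10; the edges between them are not recoverable from this extraction.]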
Book Conventions
• For ease of exposition, we use decimal rather than hexadecimal notation to denote pointer values in C.
Exercises and Programming Projects
Support on the World Wide Web
The author maintains supplemental material for this textbook online at http://academic.udayton.edu/SaverioPerugini/SPUC/.
Contents
Preface
List of Figures
List of Tables
1 Introduction to Linux
  1.1 Chapter Objectives
  1.2 Introduction
    1.2.1 What is Linux Programming?
    1.2.2 What is Systems Software?
    1.2.3 Examples of Systems Software
    1.2.4 One Dichotomy of Programming
    1.2.5 Another Viewpoint (Course Themes)
    1.2.6 Review of Operating System Nomenclature
    1.2.7 Why Study This Stuff Anyway?
    1.2.8 Conceptual Exercises for Section 1.2
    1.2.9 Programming Exercises for Section 1.2
  1.3 Introduction to Linux
    1.3.1 What is Linux?
    1.3.2 Hallmarks of Linux
    1.3.3 Historical Perspective
    1.3.4 The UNIX Philosophy
    1.3.5 History of UNIX and C
    1.3.6 Conceptual UNIX Architecture
    1.3.7 Accessing a UNIX Account
    1.3.8 General Syntax of UNIX Commands
    1.3.9 Getting Help on the UNIX System
    1.3.10 UNIX Manual
    1.3.11 Introduction to the vi Editor
    1.3.12 Conceptual Exercises for Section 1.3
  1.4 Thematic Take-Aways
  1.5 Chapter Summary
  1.6 Key Terms
  1.7 Bibliographic Notes
2 Files and Directories I: Manipulation and Management
  2.1 Chapter Objectives
  2.2 Basic UNIX File Nomenclature
  2.3 ls and cal
  2.4 Explanation of ls -l Output
  2.5 UNIX Filesystem
  2.6 Absolute vs. Relative Path
  2.7 Two Special Files in Every Directory
  2.8 Navigating through Directories
  2.9 File Manipulation and Management
  2.10 Conceptual Exercises for Chapter 2
  2.11 Programming Exercises for Chapter 2
  2.12 Thematic Take-Aways
  2.13 Chapter Summary
  2.14 Key Terms
  2.15 Bibliographic Notes
3 The Linux Shell
  3.1 Chapter Objectives
  3.2 Introduction
  3.3 Shell Commands vs. UNIX Commands
  3.4 More on Redirecting Standard Error
  3.5 Kernel metacharacters
  3.6 stty Command
  3.7 Korn Shell metacharacters
    3.7.1 Metacharacters at Different Levels of Interpretation
  3.8 Command Substitution
  3.9 Shell metacharacter interpretation
  3.10 Shell Scripts
  3.11 Conceptual Exercises for Chapter 3
  3.12 Programming Exercises for Chapter 3
  3.13 Programming Project for Chapter 3
  3.14 Thematic Take-Aways
  3.15 Chapter Summary
  3.16 Key Terms
  3.17 Bibliographic Notes
4 Introduction to C Programming: System Libraries and I/O
  4.1 Chapter Objectives
  4.2 Header Files vs. Libraries
  4.3 Standard C Library
  4.4 Standard I/O vs. File I/O
  4.5 Standard I/O Redirection
  4.6 Demo of cat
  4.7 Redirecting Standard I/O
  4.8 File Descriptors
  4.9 Demo of wc
  4.10 I/O in C
  4.11 Effect of a Successful Open on a File
  4.12 Analogs from C++ to C
  4.13 Review of Standard I/O Functions
  4.14 Developing cat in C
  4.15 Portability (Safety)
  4.16 String Functions
  4.17 ‘s’ Family of printf/scanf Functions
  4.18 Using a Pointer to Traverse an Array
  4.19 Simple Macro vs. Constant
  4.20 String Copy Code
  4.21 Command-line Arguments
  4.22 The argv Array for the Call a.out -wlc myfile
  4.23 Compiling a C Program in UNIX
  4.24 Compiling
  4.25 C Compilation Steps Using gcc
  4.26 The key options to gcc graphically
  4.27 C Compilation Steps Graphically
  4.28 file Command
  4.29 Memory Management: Memory Allocation and Deallocation
  4.30 Conceptual Exercises for Chapter 4
  4.31 Programming Exercises for Chapter 4
  4.32 Programming Project for Chapter 4
  4.33 Thematic Take-Aways
  4.34 Chapter Summary
  4.35 Key Terms
  4.36 Bibliographic Notes
5 Compiling C in Linux
  5.1 Chapter Objectives
  5.2 Compiling C
    5.2.1 Overview
    5.2.2 Static vs. Dynamic Linking
    5.2.3 More on Compiling with gcc
    5.2.4 Process
    5.2.5 Process Termination
    5.2.6 NULL Pointer
    5.2.7 extern Modifier in C
    5.2.8 Conditional Compilation
    5.2.9 Error Handling
    5.2.10 Debugging
    5.2.11 Conceptual Exercises for Section 5.2
    5.2.12 Programming Exercises for Section 5.2
  5.3 Building a Library in C
    5.3.1 Conceptual Exercises for Section 5.3
    5.3.2 Programming Exercises for Section 5.3
  5.4 More topics in C: Storage Classes, Thread-safe Functions, and Macros
    5.4.1 Declarations and Definitions
    5.4.2 Storage and Linkage Classes
    5.4.3 static Modifier in C
    5.4.4 Summary of static Reserved Word
    5.4.5 C Libraries
    5.4.6 Synchronization
    5.4.7 Thread Safe Functions
    5.4.8 makeargv
    5.4.9 Self-study
    5.4.10 Macros: The #define Preprocessor Directive
    5.4.11 Macros vs. Functions
    5.4.12 Conceptual Exercises for Section 5.4
    5.4.13 Programming Exercises for Section 5.4
  5.5 Compilation and Configuration Management
    5.5.1 Compilation Management: make
    5.5.2 Configuration Management (RCS)
    5.5.3 Distributed Configuration Management (GIT)
    5.5.4 Conceptual Exercises for Section 5.5
    5.5.5 Programming Exercises for Section 5.5
  5.6 Packaging and Compression Utilities
    5.6.1 ar
    5.6.2 tar
    5.6.3 gzip/gunzip
    5.6.4 compress/uncompress
    5.6.5 Conceptual Exercises for Section 5.6
  5.7 Thematic Take-Aways
  5.8 Chapter Summary
  5.9 Key Terms
  5.10 Bibliographic Notes
6 Files and Directories II: Inodes, Hard and Symbolic Links
  6.1 Chapter Objectives
  6.2 Low-Level I/O
    6.2.1 Review of Linux I/O Data Structures
    6.2.2 Review of Buffered Output
    6.2.3 Library vs. System Calls
    6.2.4 I/O Recap
    6.2.5 select and poll
  6.3 Disk Statistics
  6.4 File Access (3 Types)
  6.5 File Permissions, Owners, and Groups
  6.6 Files
  6.7 Relevant Accessor/Modifier Functions, and structs
  6.8 Inodes
  6.9 File Links: Hard vs. Soft
  6.10 Hard Links
  6.11 Symbolic (Soft) Links
  6.12 Editor Examples
  6.13 od (Octal Dump) Command
  6.14 File ‘Types’ and ‘Names’
  6.15 Question to investigate
  6.16 Set-uid Program
  6.17 Login Process
  6.18 Things to Do
  6.19 find Command
  6.20 Accounts
  6.21 Character and Block Special Files in Linux
  6.22 Conceptual Exercises for Chapter 6
  6.23 Programming Exercises for Chapter 6
  6.24 Programming Project for Chapter 6
  6.25 Thematic Take-Aways
  6.26 Chapter Summary
  6.27 Key Terms
  6.28 Bibliographic Notes
7 Processes: Creation, Environment, Manipulation, and Communication
  7.1 Chapter Objectives
  7.2 Introduction
    7.2.1 Process Identification
  7.3 Process Creation: fork
    7.3.1 Background Processes
    7.3.2 fork Exercises
    7.3.3 Conceptual Exercises for Section 7.3
    7.3.4 Programming Exercises for Section 7.3
  7.4 Process Environment
    7.4.1 Variables
    7.4.2 Accessing the Environment
    7.4.3 New Account Environment
    7.4.4 Command-line Tips
    7.4.5 PATH Variable
    7.4.6 Korn Shell Configuration and Customization
    7.4.7 .profile vs. (value of) ENV
    7.4.8 .plan and .project
    7.4.9 Configuring vi
    7.4.10 Conceptual Exercises for Section 7.4
    7.4.11 Programming Exercise for Section 7.4
  7.5 Process Manipulation: wait and exec
    7.5.1 wait
    7.5.2 fork and wait Exercises
    7.5.3 exec
    7.5.4 Investigating Questions
    7.5.5 Process Review
    7.5.6 Other Things to Know
    7.5.7 Conceptual Exercises for Section 7.5
    7.5.8 Programming Exercises for Section 7.5
  7.6 Putting It All Together: Basic Shell Setup
  7.7 Interprocess Communication
    7.7.1 I/O Redirection
    7.7.2 Implementing I/O Redirection
    7.7.3 Helpful Functions
    7.7.4 Unnamed and Named Pipes (FIFOs)
    7.7.5 C Model vs. Go Model
    7.7.6 Signals and Job Control
    7.7.7 Conceptual Exercises for Section 7.7
    7.7.8 Programming Exercises for Section 7.7
  7.8 Client-server Programming
    7.8.1 Observations on Client-server Programs
    7.8.2 Experimental Runs of Client-server Programs
    7.8.3 Conceptual Exercises for Section 7.8
    7.8.4 Programming Exercises for Section 7.8
  7.9 Client-server Programming in Qt
    7.9.1 Programming Exercises for Section 7.9
  7.10 Programming Project for Chapter 7
  7.11 Thematic Take-Aways
  7.12 Chapter Summary
  7.13 Key Terms
  7.14 Bibliographic Notes
8 Regular Expressions, Pattern Matching, and Filters
  8.1 Chapter Objectives
  8.2 Regular Expressions
    8.2.1 What /uses/ [Rr]eg.lar [Ee]xpre[s*]ions\?
    8.2.2 Special or Metacharacters
    8.2.3 Regular Expression Examples
    8.2.4 Regular Expression Rule
    8.2.5 Using grep
    8.2.6 Full Regular Expressions
    8.2.7 Subtle Point about Tools that use Regular Expressions
    8.2.8 Conceptual Exercises for Section 8.2
    8.2.9 Programming Exercises for Section 8.2
  8.3 sed
    8.3.1 ex (Line Editor)
    8.3.2 Essential sed
    8.3.3 Some Representative Examples
    8.3.4 A Simple Faculty Database Example
    8.3.5 d for Delete
    8.3.6 p for Print
    8.3.7 More sed Jargon
    8.3.8 A Tale of Two Buffers
    8.3.9 newer Script
    8.3.10 Conceptual Exercises for Section 8.3
    8.3.11 Programming Exercises for Section 8.3
    8.3.12 Programming Project for Section 8.3
  8.4 Filters
    8.4.1 tr(anslate)
    8.4.2 sort
    8.4.3 uniq
    8.4.4 Spellers
    8.4.5 Pipeline of Filters
    8.4.6 Toward Database Operations: cut and paste, and join
    8.4.7 File Comparison Utilities
    8.4.8 Printing and Other Related Filter Utilities
    8.4.9 Conceptual Exercises for Section 8.4
    8.4.10 Programming Exercises for Section 8.4
  8.5 The awk Programming Language
    8.5.1 Introduction
    8.5.2 Execution Model
    8.5.3 Simple awking
    8.5.4 Fine Tuning awk
    8.5.5 Some Example awk Command Lines
    8.5.6 Gradebook Example
    8.5.7 Implementing uniq in awk
    8.5.8 Conceptual Exercises for Section 8.5
    8.5.9 Programming Exercises for Section 8.5
    8.5.10 Programming Project for Section 8.5
  8.6 Programming Projects for Chapter 8
  8.7 Linux Filter Style of Programming
  8.8 Thematic Take-Aways
  8.9 Chapter Summary
  8.10 Key Terms
  8.11 Bibliographic Notes
9 Shell Programming
  9.1 Chapter Objectives
  9.2 Introduction
    9.2.1 return vs. exit
    9.2.2 Command-line Arguments
  9.3 Command and Control
    9.3.1 for Loops
    9.3.2 String Operators
    9.3.3 if Statement
    9.3.4 Additional Condition Tests
    9.3.5 while Statement
    9.3.6 Putting It All Together: ourwhich Script
    9.3.7 case Selection
    9.3.8 Example: Factoring Command-line Arguments
    9.3.9 Conceptual Exercises for Section 9.3
    9.3.10 Programming Exercises for Section 9.3
  9.4 Numbers and Arrays
    9.4.1 Numeric Variables
    9.4.2 Example: Renaming Multiple .c Files to .cpp
    9.4.3 Array Variables
    9.4.4 Restricted Shells
    9.4.5 Conceptual Exercises for Section 9.4
    9.4.6 Programming Exercises for Section 9.4
  9.5 Shell Programming vs. Linux Filter Style of Programming
  9.6 Conceptual Exercises for Chapter 9
  9.7 Programming Exercises for Chapter 9
  9.8 Programming Project for Chapter 9
  9.9 Thematic Take-Aways
  9.10 Chapter Summary
  9.11 Key Terms
  9.12 Bibliographic Notes
10 Automatic Program Generation
  10.1 Chapter Objectives
  10.2 Scanner Generation: flex
    10.2.1 Outline
    10.2.2 Linux Tools for Automatically Generating Scanners and Parsers
    10.2.3 Structure of a flex Specification:
    10.2.4 Our First flex Program: cat (version 0)
    10.2.5 noop
    10.2.6 cat (version 1)
    10.2.7 Running flex to Automatically Generate a Scanner
    10.2.8 cat (version 2)
    10.2.9 cat (version 3)
    10.2.10 cat -n (version 4)
    10.2.11 cat -n (version 5)
    10.2.12 Word Count
    10.2.13 Pattern Overlap
    10.2.14 Identifying Identifiers
10.2.15 Matching Quoted Strings . . . 254
10.2.16 States . . . 254
10.2.17 Matching C Strings . . . 262
10.2.18 Conceptual Exercises for Section 10.2 . . . 262
10.2.19 Programming Exercises for Section 10.2 . . . 266
10.2.20 Programming Projects for Section 10.2 . . . 267
10.3 Parser Generation: bison . . . 269
10.3.1 Scanning and Parsing . . . 269
10.3.2 Evaluating Arithmetic Expressions in Linux . . . 269
10.3.3 Calculator (version 1) . . . 271
10.3.4 Marriage of flex and bison . . . 274
10.3.5 Running bison to Generate a Parser . . . 274
10.3.6 Calculator (version 2) . . . 275
10.4 Putting It All Together: Towards Interpreters . . . 281
10.4.1 Calculator (version 3) . . . 281
10.4.2 Helpful C Constructs and Capabilities . . . 288
10.4.3 Structures for Parse Tree Nodes . . . 289
10.4.4 Precedence and Associativity in Calculator (version 3) . . . 289
10.4.5 Interpreters: Program Evaluators . . . 291
10.4.6 Conceptual Exercises for Section 10.4 . . . 292
10.4.7 Programming Exercises for Section 10.4 . . . 295
10.5 Programming Project for Chapter 10 . . . 307
10.6 Thematic Take-Aways . . . 312
10.7 Chapter Summary . . . 312
10.8 Key Terms . . . 312
10.9 Bibliographic Notes . . . 312
Bibliography 313
Appendices 314
A Programming Style Guide 315
B Quick vi Reference 329
C vi Reference 331
About the Author 335
List of Figures
1.1 Object-oriented model vis-à-vis the UNIX model of programming. . . . 10
1.2 Dichotomy in the genealogy of the development of UNIX. . . . 12
1.3 Conceptual architecture of UNIX systems. . . . 12
2.1 File system tree. . . . 22
2.2 Absolute path. . . . 22
2.3 Relative path. . . . 23
2.4 Relative path. . . . 23
2.5 Relative path. . . . 24
3.1 Graphical depiction of the relationship between common Linux shells. . . . 28
3.2 Progressive layers of metacharacter interpretation. . . . 30
4.1 Standard input (stdin) and standard output (stdout). . . . 38
4.2 I/O redirection. . . . 38
4.3 Pipe. . . . 38
4.4 An argument vector (char** argv). . . . 45
4.5 The key options to gcc graphically. . . . 45
4.6 C compilation steps. . . . 46
5.1 Logical layout of program image. . . . 70
5.2 Activation record. . . . 71
5.3 strtok before. . . . 76
5.4 strtok after. . . . 77
5.5 Popup dependency graph. . . . 88
5.6 Logger dependency graph. . . . 89
6.1 File permissions. . . . 104
6.2 File pointer. . . . 105
6.3 File tables. . . . 105
6.4 Inode. . . . 106
6.5 Directory entry. . . . 106
6.6 Hard link. . . . 107
6.7 Hard link. . . . 107
6.8 Soft link. . . . 108
7.1 Process life cycle. . . . 120
7.2 Logical layout of process in main memory. . . . 121
7.3 Graphic depiction of fork. . . . 122
7.4 Graphical depiction of wait. . . . 144
7.5 Graphical depiction of exec. . . . 145
7.6 Graphical depiction of suite of exec system calls. . . . 146
7.7 Process creation system calls. . . . 154
7.8 . . . 154
7.9 Before redirection. . . . 155
7.10 After redirection. . . . 155
7.11 Redirection steps. . . . 155
7.12 After fork. . . . 156
7.13 After dup2. . . . 157
7.14 After close. . . . 158
7.15 ... . . . 158
7.16 Ring of processes vis-à-vis a ring of threads. . . . 159
7.17 Shell job control. . . . 160
7.18 X server. . . . 161
8.1 A finite-state automaton for a legal identifier and positive integer in C. . . . 178
8.2 Progressive layers of metacharacter interpretation. . . . 181
8.3 Graphical depiction of the foundational nature of ed/ex for vi and sed. . . . 190
8.4 The sed execution model. . . . 192
8.5 The -e option to sed. . . . 192
8.6 Graphical depiction of the Linux filter style of programming. . . . 220
10.1 Makefile dependency graph for C strings. . . . 262
10.2 Simplified view of scanning and parsing: the front end. . . . 269
10.3 Simplified view of scanning & parsing: the front end with flex & bison. . . . 269
10.4 More detailed view of scanning and parsing. . . . 270
10.5 More detailed view of scanning and parsing with flex and bison. . . . 270
10.6 Parse stack and value stacks in bison. . . . 274
10.7 Marriage of flex and bison. . . . 274
10.8 Marriage of flex and bison in calculator. . . . 275
10.9 Interpreting while parsing. . . . 280
10.10 Interpreting while parsing in calculator (version 1 and 2). . . . 280
10.11 Interpretation. . . . 280
10.12 Alternate view of execution by interpretation. . . . 281
10.13 Compilation. . . . 281
10.14 Low-level view of execution by compilation. . . . 282
10.15 Calculator expression interpretation. . . . 282
10.16 Calculator expression interpretation. . . . 283
10.17 Calculator expression compilation. . . . 283
10.18 Calculator expression compilation. . . . 283
10.19 . . . 283
10.20 . . . 284
10.21 Structures for parse tree nodes in calculator (version 3). . . . 289
10.22 Node type used for literals and variables in calculator (version 3). . . . 290
10.23 Node type used for operators (i.e., internal nodes) in calculator (version 3). . . . 290
10.24 Makefile dependency graph for calculator (version 3). . . . 292
List of Tables
1.1 vi commands. . . . 15
1.2 vi command codes. . . . 16
3.1 Linux shells. . . . 27
3.2 Korn shell metacharacters. . . . 29
4.1 Effect of a successful open on a file. . . . 39
4.2 C++ vs. C I/O. . . . 40
4.3 Review of standard I/O functions. . . . 40
5.1 Storage class summary. . . . 74
5.2 static modifier summary. . . . 74
5.3 static modifier summary. . . . 74
8.1 Differences in metacharacter semantics across similar tools. . . . 184
8.2 Some sample ex addresses. . . . 190
8.3 Some sample ex commands. . . . 191
8.4 Some sample sed <condition>s and <action>s. . . . 192
8.5 Some sample sed command lines. . . . 195
8.6 The faculty.details file. . . . 196
8.7 The guestlist file. . . . 214
9.1 String operators. . . . 229
9.2 Additional conditional tests. . . . 231
9.3 Linux filter style of programming (left) vs. shell programming (right). . . . 246
10.1 Pattern matching primitives. . . . 262
10.2 Pattern matching examples. . . . 265
10.3 flex predefined variables. . . . 266
Part I: Linux Fundamentals
Chapter 1
Introduction to Linux
Author: Saverio Perugini Copyright © 2017 by Saverio Perugini ALL RIGHTS RESERVED
UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity. – Dennis Ritchie
1.1 Chapter Objectives
•
This is a book on Linux and C.
1.2 Introduction
1.2.1 What is Linux Programming?
Generally, Linux programming means developing programs that support the development of other programs; that is, it is the process of developing systems software.
1.2.2 What is Systems Software?
Software that supports software development or a computer system in general; that is, software that allocates and manages computer resources (e.g., CPU, memory, devices).
1.2.3 Examples of Systems Software
• assemblers
• compilers (e.g., gcc)
• linkers
• loaders
• command interpreters (i.e., shells, e.g., bash)
• system libraries (e.g., libc)
• device drivers
• debuggers (e.g., gdb)
• system utilities (e.g., env)
• configuration managers (e.g., git)
• compilation managers (e.g., make)
1.2.4 One Dichotomy of Programming
• application programming: targeted toward developing systems to support the end-user.
• systems programming: targeted toward developing systems to support the programmer.
Recently, this boundary has become fuzzy. Building a web browser, such as Google Chrome, might once have been considered application programming. However, nowadays developing such applications requires attention to system details such as resources and efficiency (e.g., Google Chrome is multi-processed).
Historically, systems programming meant programming the system (i.e., building compilers, shells, loaders, and so on). However, nowadays, systems programming has come to mean programming with the system (i.e., making system calls, managing threads, and so on).
We could also say that computer science students study programming software while computer engineering students study programming the interface between hardware and software (historically, they studied programming hardware).
1.2.5 Another Viewpoint (Course Themes)
Systems programming requires a greater awareness of issues of hardware and efficiency than application programming. What does the following C code do?
while (*p++ = *q++) ;
Since systems programs typically run for a long time and, therefore, must be robust and fault tolerant, systems programmers must be diligent to release resources and check for errors (e.g., a NULL pointer as a return value) in more than just the typical places in a program.
Why is the following code unportable or unsafe?
char c;

while ((c = getchar()) != EOF)
    ...
Systems programming is characterized by the use of a language at a lower level than those used in application programming: one that gives the programmer direct access to, and control of, system resources. This leads us to Linux and C.
1.2.6 Review of Operating System Nomenclature
• program vs. (heavyweight) process
• thread (lightweight process): an ADT within a process; each thread has its own (1) stack, (2) program counter value, (3) register set, and (4) state; threads share the resources of their process (e.g., open files).
• (heavyweight) process vs. (lightweight) thread
• process control block
• bootstrapping
• batch process
• resident monitor
• multiprogramming
• timesharing (or preemptive multi-tasking)
• non-preemptive multi-tasking = multiprogramming without timesharing
• job scheduling
• ready queue
• process scheduling (or CPU scheduling)
• context switch
• context switch time
• quantum (or time slice)
• system call
• interrupt (hardware)
• interrupt service routine
• (asynchronous or synchronous) signal (software)
• asynchronous event
• synchronous event
• device driver
• paging
• segmentation
• paged segmentation
Linux is a multiprogramming, timeshared OS.
1.2.7 Why Study This Stuff Anyway?
• an improved understanding/appreciation of the internals of your system and systems software will make you a better application programmer
• UNIX and C are an enabling environment/language for a wide variety of science and engineering disciplines (e.g., bioinformatics)
• UNIX and C are ubiquitous in our field, so knowing them is part of being a well-rounded computer scientist
• communication and concurrency are everything in today’s software
• ability to write reliable and secure code is indispensable (counter-terrorism)
• gateway to studies in distributed computing and networking
1.2.8 Conceptual Exercises for Section 1.2
Exercise 1.2.1: What is systems programming?
Exercise 1.2.2: Give two examples of systems software.
Exercise 1.2.3: Explain the difference between systems programming and application programming.
Exercise 1.2.4: What is an operating system?
Exercise 1.2.5: What are the primary goals of an operating system?
Exercise 1.2.6: What is a process?
Exercise 1.2.7: Explain the difference between a program and a process.
Exercise 1.2.8: What is multiprogramming?
Exercise 1.2.9: What is a context switch?
Exercise 1.2.10: What is timesharing?
Exercise 1.2.11: What is the biggest bottleneck in any computer system?
Exercise 1.2.12: Explain clearly why adding more physical main memory to a computer system makes programs run faster.
Exercise 1.2.13: Give one approach to increase the degree of multiprogramming in a computer system without increasing the amount of main memory in the system.
Exercise 1.2.14: What does timesharing enable in a computer system that is not possible in a system that is non-timeshared?
Exercise 1.2.15: Which of the following, if any, is possible in a time-shared computer system (with only one processor with one core) that is not possible if the system is not time-shared:
(i) interactive programs
(ii) multiple processes running on the processor at once
(iii) non-interactive programs
(iv) (i), (ii) & (iii)
(v) none of the above
Exercise 1.2.16: Which of the following, if any, is contained in a C header (i.e., .h) file:
(i) function definitions
(ii) function declarations
(iii) (i) & (ii)
(iv) none of the above
Exercise 1.2.17: Which of the following, if any, is contained in a statically linked C library (i.e., .a) file:
(i) function definitions
(ii) function declarations
(iii) (i) & (ii)
(iv) none of the above
Exercise 1.2.18: What is a thread, and how does it differ from a process? What does a thread share with its process, and what does it not share with its process?
Exercise 1.2.19: (fill in the blank with the appropriate adjective) A thread is sometimes called a ______ process.
Exercise 1.2.20: Suppose we develop two concurrent solutions to the same problem: one using one process with multiple threads of control and one using multiple processes, each with a single thread of control. If turnaround time is the only evaluation criterion, in general, which solution is preferred? Explain why clearly.
Exercise 1.2.21: (fill in the blank) Adding more main memory to a computer system increases the degree of ______.
Exercise 1.2.22: UNIX is both a time-shared and multiuser operating system. Is it possible to have an OS be one and not the other (i.e., time-shared and not multiuser, or multiuser and not time-shared) or do these two properties always come together?
1.2.9 Programming Exercises for Section 1.2
Exercise 1.2.23: Write a single statement or set of statements to accomplish each of the following:
a) Define a structure called part containing an int variable partNumber, and a char array partName whose values may be as long as 25 characters.
b) Define Part to be a synonym for the type struct part.
c) Use Part to declare variable a to be of type struct part, array b[10] to be of type struct part, and variable ptr to be of type pointer to struct part.
d) Read a part number and a part name from the keyboard into the individual members of variable a.
e) Assign the member values of variable a to element 3 of array b.
f) Assign the address of array b to the pointer variable ptr.
g) Print the member values of element 3 of array b to the display using the variable ptr and the structure pointer operator to refer to the members.
Exercise 1.2.24: Assume the following variables have been declared as shown.
double number1 = 7.3, number2;
char *ptr = NULL;
char s1[100], s2[100];
a) Declare the variable dPtr to be a pointer to a variable of type double.
b) Assign the address of variable number1 to pointer variable dPtr.
c) Print the value of the variable pointed to by dPtr to the display.
d) Assign the value of the variable pointed to by dPtr to variable number2.
e) Print the value of number2 to the display.
f) Print the address of number1 to the display.
g) Print the address stored in dPtr to the display.
h) Is the value printed equal to the address of number1?
i) Copy the string stored in character array s1 into character array s2.
j) Compare the string stored in character array s1 with the string in character array s2, and print the result to the display.
k) Append the string in character array s2 to the string in character array s1. Will this cause a run-time error?
l) Determine the length of the string stored in character array s1, and print the result to the display.
1.3 Introduction to Linux
1.3.1 What is Linux?
Linux is an operating system. An operating system is a collection of software programs that manage computer resources (e.g., CPU, main and secondary memory, and devices) and provide an interface to the computer for the user. The goal of an operating system is to manage computer resources efficiently and make the user interface convenient to use.
1.3.2 Hallmarks of Linux
• multiuser,
• preemptive multitasking (time-shared),
• interactive,
• portable (written in C),
• accessible (nohup, dump process table),
• text-based,
• terse,
• efficient,
• silent, and
• free!
1.3.3 Historical Perspective
Originally, systems programs were written in assembly language. Research in the 1960s led to BCPL and then C. UNIX was developed in the late 1960s (Ken Thompson, 1969, Bell Labs) as a successor to MIT’s Multics. UNIX was rewritten in C in the early 1970s. C is a ‘low’ high-level programming language: WYSIWYG (What You See Is What You Get). The marriage of UNIX and C provided an ideal environment for systems programming. The majority of systems programming today is still done in UNIX and C.
1.3.4 The UNIX Philosophy
• Communication:
model: compose a solution to a problem by combining several small, atomic programs in creative ways through interprocess communication and interoperability mechanisms, such as pipes.
Atomic programs are the building blocks; communication mechanisms are the glue. Such programs are easier to develop, debug, and maintain than large, all-encompassing, monolithic systems.
If you give me the right kind of Tinker Toys, I can imagine the building. I can sit there and see primitives and recognize their power to build structures a half mile high, if only I had just one more to make it functionally complete. – Ken Thompson, creator of UNIX and the 1983 ACM A.M. Turing Award Recipient, quoted in IEEE Computer 32(5), 1999.
Figure 1.1: Conceptual differences between the object-oriented model of programming/problem solving (depicted left) and the UNIX model of programming/problem solving (depicted right). Key: ○ = object, □ = process, → = message or data, and ∼ = pipe. (left) sequential vs. (right) concurrent. (left) re-compile vs. (right) re-configure.
• Concurrency:
Processes can clone themselves (through fork). Why would you want to do this? Think of programs you use every day. This turns out to be an incredibly powerful and useful primitive.
• Uniform style of I/O:
We see these themes recur throughout this book.
1.3.5 History of UNIX and C
• 1967: Martin Richards develops BCPL as a language for writing operating systems and compilers. Ken Thompson develops B, which evolved from BCPL, at AT&T Bell Laboratories in Murray Hill, NJ. Both B and BCPL were typeless languages (i.e., every data item occupied one word in memory).
• 1969: Ken Thompson used B to develop an early version of the UNIX operating system on a DEC PDP-7 computer at Bell Labs in Murray Hill, NJ. UNIX evolved from Multics, also at Bell Labs. B became widely known as the development language of the UNIX OS.
• 1972: Dennis Ritchie wrote a C compiler at Bell Labs. C evolved from B and was originally implemented on a DEC PDP-11 computer. C was considered a hybrid between a low-level language and a high-level language; it gives the programmer facilities to allocate and manipulate memory. It was excellent for writing systems programs (e.g., compilers), but for other programs C is not the best choice. It does not babysit the programmer with several automatic checks: no training wheels (no undelete).
• 1973: Dennis Ritchie helped Thompson port UNIX to a DEC PDP-11; they rewrote the UNIX kernel in C.
• 1974: UNIX was licensed to colleges and universities for educational purposes, which played a major role in the development of UNIX and C (i.e., the ‘four-year effect’). Later, UNIX became available for commercial use. The Computer Systems Research Group at the University of California at Berkeley (UCB) made significant additions and changes, and UNIX developers split into two camps. The UCB camp (west coast) produced BSD (Berkeley Software Distribution), 4.xBSD Berkeley UNIX, Ultrix (DEC’s UNIX, based on BSD 4.2), SunOS, FreeBSD (based on 4.4BSD-Lite), and the vi editor. The AT&T Bell Labs and UNIX Systems Laboratories (USL) camp (east coast) produced SVR3.
• 1983: Ken Thompson and Dennis Ritchie are given the ACM A.M. Turing Award for contributions to OS theory and the implementation of UNIX.
• 1987: AT&T Bell Labs and Sun Microsystems set out to merge BSD and System V, which resulted in SVR4 (developed jointly by USL and Sun). Sun developed Solaris 2.0. Efforts toward a more standard version of UNIX continue today through ongoing work on POSIX. C evolved into C++ (the ++ creates a pun). Today, virtually all new major OSs are written in C/C++.
UNIX is not an acronym, but a weak pun on Multics, the OS Thompson and Ritchie worked on before UNIX.
Figure 1.2: Dichotomy in the genealogy of the development of UNIX. [Diagram not reproduced; labels: UNIX; east coast: AT&T Bell Laboratories, NJ, and System V; west coast: bsd4.* (Solaris) (Berkeley Software Distribution) and vi.]
Figure 1.3: Conceptual architecture of UNIX systems. [Diagram not reproduced; labels: Hardware; Kernel (core of OS); kernel interface (system call); system libraries (libc.a, libc.so) and include files (stdio.h), the interface to core OS services; shells (sh, bash, csh, ksh; metacharacters, variables); assemblers, compilers, linkers (as, gcc, g++, ld; "creates virtual C computer"); application programs (grep, wc, date, cal, vi, who); other application programs (a.out); X-Windows; filesystem and a suite of commands, libraries, and system calls.]
1.3.6 Conceptual UNIX Architecture
• hardware
• kernel
• shells (e.g., bash)
• compilers
– gcc: provides a virtual C computer
– g++: provides a virtual C++ computer
• programs and applications (e.g., cat, wc, sed, awk)
• X-windows system
1.3.7 Accessing a UNIX Account
Login/Logout
Login name echoed; password not echoed. If you enter an invalid string for either, the system will not indicate which was invalid.
Concept of the shell: your interface to the system. First commands to try: ls, clear, and banner.
Some system status commands: date, hostname, whoami (or logname), who, w, uptime (when was the system last rebooted), uname and uname -a, ulimit and ulimit -a (ulimit is a shell builtin), ps and ps -a, and top and htop
1.3.8 General Syntax of UNIX Commands
1.3.9 Getting Help on the UNIX System
For help on a particular command, use man <command>. The man command retrieves the manpage (manual page) for any command, C library function, or system call. For instance, man wc, man -s 3C printf, man fgetc, man fork, or man man (a self-referential command). A manpage can be searched with /<keyword/topic>.
For all commands on a general topic, use apropos <keyword/topic> (e.g., apropos copy). The apropos command is the same as man -k.
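Similarly, the whatis command is the same as man -f <title>. When a name appears in more than one section of the manual (e.g., man printf: which section?), use man -a printf to see all matching pages, or name the section explicitly (e.g., man -s 2 fork or man -s 3 intro).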
1.3.10 UNIX Manual
Chapter 1: Commands
Chapter 2: System Calls
Chapter 3: Libraries (portable, meet a standard C specification)
Chapter 4: File Formats
Chapter 5: Misc Facilities, macros
Chapter 6: Games
Chapter 7: Devices and Networking
Chapter 8: System Maintenance
Chapter 9: Device Drivers
UNIX Standards
POSIX (Portable Operating System Interface) is an IEEE standard for UNIX libraries, intended to promote the development of reliable software. Linux, Mac OS X, and many other flavors of UNIX are moving toward POSIX standards (e.g., POSIX threads).
1.3.11 Introduction to the vi Editor
The vi Philosophy
Editors such as vi and emacs are editors for programmers and power users; they were designed for people who want to be extremely efficient and productive in their work. We study vi since it is the only editor guaranteed to exist on all UNIX systems. There is a steep learning curve, but the increase in productivity is worth the investment. For instance, the h, j, k, and l keys, rather than the arrow keys, move the cursor left, down, up, and right, respectively. Why? Because it is quicker for the typist to reach the h, j, k, and l keys than the arrow keys on the keyboard.
Table 1.1: vi commands.
Description (insert text)    Command
before cursor                i
at beginning of line         I
after cursor                 a
at end of line               A
after current line           o
before current line          O
The vi editor is a moded editor. There are two main modes: insert mode and command mode. Editing text is done in insert mode. There are multiple ways to enter insert mode; which to use depends on what you want to do once in insert mode. Type i to enter insert mode. This will allow you to enter text at the current cursor position. Hitting the o key will also put you in insert mode, but will first open a new line. Commands are entered in command mode. There is only one way to enter command mode: by hitting the <escape> key. When vi is started, you are by default in command mode.
The u key undoes the previous operation. To save the current file, enter :w (file write) in command mode. To quit the editor without saving (i.e., writing), enter :q (quit, no write) in command mode; if the file has unsaved changes, vi refuses to quit unless you enter :q! instead. To save and quit, enter :wq in command mode. ZZ (hold <shift> and press z twice) also writes the file and quits.
See Appendices B and C.
The command mode in vi is built on top of ex, and ex is built on top of ed (the original UNIX line editor); hitting : while in command mode permits the user to enter ex commands.
The general syntax for vi and ex commands:
vi: [n]<operator> [m] <object>
ex: :[address] <command> [<options>]
Table 1.2: vi command codes.
Description                                       Command code
move one space to the right                       space, l, or right arrow
move one space to the left                        h, or left arrow
move down one line                                j, or down arrow
move up one line                                  k, or up arrow
move one word to the right                        w, or W
move one word to the left                         b, or B
move to beginning of line                         0
move to end of line                               $
move to top of screen                             H
move to middle of screen                          M
move to bottom of screen                          L
save contents to file                             :w
quit file                                         :q
quit vi, saving file only if changes were made    :x
save file and quit vi                             :wq
save contents to file and quit vi                 ZZ
toggle between uppercase and lowercase            ~
delete back one character                         X
delete character under cursor                     x
delete line                                       dd
delete word                                       dw
vi Editor
Text Editing: vi
1.3.12 Conceptual Exercises for Section 1.3
Exercise 1.3.1: List three properties of the UNIX operating system, one of which must not also be a property of Microsoft Windows.
Exercise 1.3.2: Give three hallmarks of the UNIX operating system.
Exercise 1.3.3: (true / false) Linux is a preemptive multitasking (time-shared) operating system.
Exercise 1.3.4: To log off of the Korn shell, you should:
a) enter the EOF character
b) enter stop
c) enter logoff
d) enter logout
e) enter bye
Exercise 1.3.5: List and describe succinctly one item from each of the first three sections of the UNIX Reference manual. Each of these items must be accessible using the man command on our system. Do not copy whole pages from the manual. Instead, phrase the explanations in your own words.
Exercise 1.3.6: Do not give the definitions, but for each of the following, state in which section (1, 2, or 3) of the UNIX Manual you would find it described, with brief reasons.
a) strlen
b) bash
c) read
Exercise 1.3.7: Why should we study vi?
Exercise 1.3.8: In vi, to delete three words forward from the cursor, enter
a) d3w
b) 3dd
c) 3x
d) d3f
Exercise 1.3.9: (true or false) vi and emacs are qualitatively different in that vi has modes and emacs is modeless.
Exercise 1.3.10: To read a file trig.c into vi at the cursor position, enter (assume trig.c resides in the directory from which vi was started and that you are in command mode):
a) r trig.c
b) :r trig.c
c) <esc>r trig.c
Exercise 1.3.11: How do you save and exit the vi editor when in insert mode? Give the sequence of keystrokes.
Exercise 1.3.12: How do you save the current file and exit the vi editor when in insert mode? Give the complete sequence of keystrokes.
Exercise 1.3.13: In vi, how do you delete the character at the current cursor position assuming you are in command mode?
Exercise 1.3.14: In vi, to delete three lines forward from the cursor assuming you are in insert mode, enter
a) d3w
b) 3dd
c) i3dd
d) i3ll
e) <esc>3ll
f) 3x
g) i3x
h) <esc>3x
i) <esc>3dd
j) d3f
1.4 Thematic Take-Aways
1.5 Chapter Summary
1.6 Key Terms
systems programming, systems software.
1.7 Bibliographic Notes
Chapter 2
Files and Directories I: Manipulation and Management
Author: Saverio Perugini Copyright © 2017 by Saverio Perugini ALL RIGHTS RESERVED
2.1 Chapter Objectives
•
/ (root)
/bin (executable commands)
/dev (device drivers)
/sbin (system executable commands)
/tmp (system scratch files)
/etc (system administration)
/home (links to users' home directories)
/src (source code)
/lib (object, source libraries)
/usr (user utilities)
Figure 2.1: File system tree.
[Figure: directory tree rooted at / with nodes bin, etc, home, dev, cps444-n1.01, cps444-n1.02, homeworks, hw1, hw3, wc.c, and logapp.c.]
Figure 2.2: Absolute path.
[Figure: directory tree rooted at / with nodes bin, etc, home, dev, cps444-n1.01, cps444-n1.02, homeworks, hw1, hw3, wc.c, and logapp.c.]
Figure 2.3: Relative path.
[Figure: directory tree rooted at / with nodes bin, etc, home, dev, cps444-n1.01, cps444-n1.02, homeworks, hw1, hw3, wc.c, and logapp.c.]
Figure 2.4: Relative path.
[Figure: directory tree rooted at / with nodes bin, etc, home, dev, cps444-n1.01, cps444-n1.02, homeworks, hw1, hw3, wc.c, and logapp.c.]
Figure 2.5: Relative path.
2.2 Basic UNIX File Nomenclature
2.3 ls and cal
2.4 Explanation of ls -l Output
2.5 UNIX Filesystem
2.6 Absolute vs. Relative Path
2.7 Two Special Files in Every Directory
2.8 Navigating through Directories
2.9 File Manipulation and Management
2.10 Conceptual Exercises for Chapter 2
Exercise 2.10.1: Give examples of two top-level subdirectories other than /dev and a brief description of the role of each.
Exercise 2.10.2: To list your . files, enter
a) dot
b) .
c) ls -l
d) ls -F
e) ls -a
Exercise 2.10.3: Which file in the UNIX system is designated as the system trash and why might you need to use it?
Exercise 2.10.4: Write a complete command line to remove (only) all plain files (not directories or links) ending in .core residing in or below your login directory.
Exercise 2.10.5: Write a complete command line to remove all files ending in .core residing in or below your login directory. Your solution must work from any directory.
Exercise 2.10.6: Write a single complete command line to remove all files ending in .core residing in or below your login directory. Your solution must work from any directory.
Exercise 2.10.7: Write a complete command line to remove (only) all plain files ending in .core (only) residing in your current working directory.
Exercise 2.10.8: Give a complete command line to remove a file named -r.
Exercise 2.10.9: Give a single complete command line to delete a file named -r.
Exercise 2.10.10: Give a directory owned by root in which you have write permissions.
Exercise 2.10.11: Give a directory owned by root in which you do not have write permissions.
Exercise 2.10.12: Explain the difference between a relative and absolute path.
2.11 Programming Exercises for Chapter 2
2.12 Thematic Take-Aways
2.13 Chapter Summary
2.14 Key Terms
2.15 Bibliographic Notes
Chapter 3
The Linux Shell
Author: Saverio Perugini Copyright © 2017 by Saverio Perugini ALL RIGHTS RESERVED
3.1 Chapter Objectives
•
3.2 Introduction
In Chapter ?? we said that an operating system is a manager and an interface. In Linux, the interface, or the shell, is a programming language, and a conduit to the computer hardware, as we will see in detail in Chapter 9.
Table 3.1 Fig. 3.1
Table 3.1: Linux shells.
Name                  Command   Default Prompt   Notes
Bourne                sh        $                original UNIX shell
Korn                  ksh       $                superset of Bourne
C                     csh       %                has C-like syntax
Bourne-Again Shell    bash      bash$            superset of Bourne
Tenex Shell           tcsh      >                superset of C shell
Z Shell               zsh       :~>              combination of ksh, bash, and csh
. . .
develop your own!
sh (Bourne shell)
bash (Bourne Again shell) ksh (Korn shell)
zsh (Z shell)
csh (C shell)
tcsh (Tenex shell)
Figure 3.1: Graphical depiction of the subset/superset relationship between common Linux shells. The Korn (ksh) and Bourne Again (bash) shells are supersets of the original UNIX Bourne shell (sh), while the Tenex shell (tcsh) is a superset of the C shell (csh).
3.3 Shell Commands vs. UNIX Commands
3.4 More on Redirecting Standard Error
3.5 Kernel metacharacters
3.6 stty Command
3.7 Korn Shell metacharacters
3.7.1 Metacharacters at Different Levels of Interpretation
3.8 Command Substitution
3.9 Shell metacharacter interpretation
3.10 Shell Scripts
3.11 Conceptual Exercises for Chapter 3
Exercise 3.11.1: Assuming a correct program a.out which prints its command-line arguments to standard output, one per line (see Programming Exercise 4.31.23), give the output generated by the shell command line: $ ./a.out one two\ three four.
Exercise 3.11.2: Find an example where ksh and csh differ in their behavior. For example, . . . .
Table 3.2: Korn shell metacharacters.
Meta-character    Meaning
#                 start of a comment to eol
;                 command separator
~                 home directory
*                 match any characters; alone expands to all files in current directory
?                 match any single character
|                 pipe or logical "or" between patterns
<                 redirect standard input
>                 redirect standard output
$                 get value of variable following
`<command>`       command substitution; called grave quotes
$(<command>)      command substitution
\                 escapes next shell metacharacter; allows long command-lines to be split across multiple lines
' ... '           ... protected from shell interpretation
" ... "           ... protected from shell interpretation, except for $, \, " ", or $( ) (or ` `)
[                 begin a character group
]                 end a character group
-                 denotes a character range
!                 negate a character group
?(<pattern>)      match zero or one instance of <pattern>
*(<pattern>)      match zero or more instances of <pattern>
+(<pattern>)      match one or more instances of <pattern>
@(<pattern>)      match exactly one instance of <pattern>
!(<pattern>)      match any strings which do not contain <pattern>
[Figure: keystrokes (e.g., $la^?s *.c\n) pass to the kernel, which consumes kernel metacharacters (^?, ^U, ^V, ^D) and yields a command line terminated by a \n (e.g., $ls *.c\n or $grep \\\\ wc.c\n), perhaps containing shell metacharacters (*, ?, #, \). The shell (e.g., sh, ksh, bash) consumes shell metacharacters and yields an interpreted command line (e.g., $ls cat.c wc.c or $grep \\ wc.c). The application (e.g., grep, sed, awk) consumes application metacharacters (\, $) and produces output.]
Figure 3.2: Progressive layers of metacharacter interpretation.
Exercise 3.11.3: [KP84, exercises 1-1 & 1-2, p. 7] Start with the following environment:
    $ stty kill '@'
    $ stty erase '#'
    $ stty lnext '\'
    $ sh
Explain the results of each of the commands in the following transcript:
    $ date\@
    date@: not found
    $ date
    Fri Sep 2 09:10:45 EDT 2005
    $ # date
    Fri Sep 2 09:10:45 EDT 2005
    $ \# date
Exercise 3.11.4: [KP84, exercise 1-4, p. 29] Consider the file junk.
Take one sentence to explain the output of each of the following command lines (there are 10):
    $ ls junk        $ echo junk
    $ ls /           $ echo /
    $ ls             $ echo
    $ ls *           $ echo *
    $ ls '*'         $ echo '*'
For each of the rows above, compare the command line in the first column to that in the second column.
Exercise 3.11.5: Which of the following are not shell metacharacters (give all that apply)?
$ . ; | / \ &
Exercise 3.11.6: Give and explain the output of the following Korn shell commands:
    echo 'Go $HOME'
    echo "$5.00 is too much!"
    echo $(who | wc -l) users is not very many
Exercise 3.11.7: (true / false) In the Korn shell, single quotes protect double quotes.
Exercise 3.11.8: To list all the files ending in .c or .h, enter
(a) ls *[ch] (b) ls *.[c|h] (c) ls *.[ch]
Exercise 3.11.9: To list all the files ending in .c or .cpp, enter
(a) ls *[c,cpp] (b) ls *.[c|cpp] (c) ls *.{c,cpp}
Exercise 3.11.10: [Rob99, p.4] Give the output of the following command lines (assume there are 9 files in the current working directory, /home/linda, and x=10):
a) $ echo 'Send output of "command" to file descriptor 2'
b) $ echo "Well, isn't that \"special\"?"
c) $ echo "You have $(ls | wc -l) files in $(pwd)"
d) $ print "You have \$(ls | wc -l) files in \$(pwd)"
e) $ echo 'You have $(ls | wc -l) files in $(pwd)'
f) $ echo "The value of \$x is $x"
g) $ print "The value of $x is \$x"
h) $ echo 'Go $HOME'
i) $ echo "$5.00 is too much!"
j) $ echo $(who | wc -l) users is not very many
Exercise 3.11.11: Give the output of the following command lines (assum- ing that each command line is run by a user without write permissions on /):
a) $ touch /
b) $ touch \/
c) $ touch '/'
Exercise 3.11.12: Suppose a command mystery writes its output to stderr. Give a single command line which would pipe this output to wc -l.
Exercise 3.11.13: Which of the following are not shell metacharacters?
(a) $ (b) . (c) & (d) | (e) / (f) \
Exercise 3.11.14: Is export a shell built-in or a UNIX command? Show how to determine the answer.
3.12 Programming Exercises for Chapter 3
3.13 Programming Project for Chapter 3
3.14 Thematic Take-Aways
3.15 Chapter Summary
3.16 Key Terms
3.17 Bibliographic Notes
Part I: C Fundamentals
Chapter 4
Introduction to C Programming: System Libraries and I/O
Author: Saverio Perugini Copyright © 2017 by Saverio Perugini ALL RIGHTS RESERVED
4.1 Chapter Objectives
•
[Figure: a program (command) reads from stdin and writes to stdout.]
Figure 4.1: Standard input (stdin) and standard output (stdout).
program command ><
Figure 4.2: I/O redirection.
[Figure: the output of ls -l is piped into less.]
Figure 4.3: Pipe.
Table 4.1: Effect of a successful open on a file.
                       "r" (read)    "w" (write)               "a" (append)
File Exists            -             Old contents discarded    -
File Does Not Exist    Error         File created              File created
4.2 Header Files vs. Libraries
4.3 Standard C Library
4.4 Standard I/O vs. File I/O
4.5 Standard I/O Redirection
4.6 Demo of cat
4.7 Redirecting Standard I/O
4.8 File Descriptors
4.9 Demo of wc
4.10 I/O in C
4.11 Effect of a Successful Open on a File
TODO: Fix alignment
4.12 Analogs from C++ to C
4.13 Review of Standard I/O Functions
[C][7–7]
Table 4.2: C++ vs. C I/O.
C++         C
iostream    stdio.h
cin         stdin
cout        stdout
>>          fscanf
<<          fprintf
Table 4.3: Review of standard I/O functions.
             stdin and stdout      file I/O
character    getchar, putchar      getc, putc, fgetc, fputc, ungetc
line         gets, puts            fgets, fputs
formatted    scanf, printf         fscanf, fprintf
record       -, -                  fread, fwrite
Never use gets. It will continue to store characters past the end of the passed buffer. Thus, it is dangerous to use. See man gets. Use fgets instead.
4.14 Developing cat in C
    #include <stdio.h>

    /* cat: version 1 */
    void filecopy(FILE* ifp, FILE* ofp) {

       /* getc returns an int; a char cannot portably hold EOF
          (see Section 4.15) */
       char c;

       while ((c = getc(ifp)) != EOF)
          putc(c, ofp);
    }

    int main(int argc, char** argv) {

       FILE* fp = NULL;

       if (argc == 1)
          filecopy(stdin, stdout);
       else
          while (--argc > 0)
             if ((fp = fopen(*(++argv), "r")) == NULL) {
                printf("cat: can't open %s\n", *argv);
                return 1;
             } else {
                filecopy(fp, stdout);
                fclose(fp);
             }

       return 0;
    }
[KR88][p. 162]
    /* ref. [CPL] Chapter 7, 7.6, p. 163 with minor modifications by Perugini */
    #include <stdio.h>
    #include <stdlib.h>

    /* cat: version 2 */
    int main(int argc, char** argv) {

       void filecopy(FILE* ifs, FILE* ofs);

       int exit_status = 0;

       char* pgm = *argv;

       FILE* fp = NULL;

       if (argc == 1)
          filecopy(stdin, stdout);
       else
          while (--argc > 0)
             if ((fp = fopen(*(++argv), "r")) == NULL) {
                fprintf(stderr, "%s: can't open %s\n", pgm, *argv);
                //perror("can't open file.");
                //exit(1);
                /* or use following line to continue processing */
                exit_status = 1;
             } else {
                filecopy(fp, stdout);
                fclose(fp);
             }

       if (ferror(stdout)) {
          fprintf(stderr, "%s: error writing stdout\n", pgm);
          //perror("error writing stdout.");
          exit_status = 2;
       }

       exit(exit_status);
    }

    void filecopy(FILE* ifp, FILE* ofp) {

       int c;

       while ((c = getc(ifp)) != EOF)
          putc(c, ofp);
    }
[KR88][p. 163]
4.15 Portability (Safety)
    char c;
    while ((c = getchar()) != EOF) { ... }
4.16 String Functions
strdup = malloc + strcpy
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
       char* str1 = strdup("Linux");
       printf(":%s:\n", str1);

       char* str2 = malloc(sizeof(*str2) * 6);
       strcpy(str2, "Linux");
       printf(":%s:\n", str2);

       /* both strings live on the heap and must be freed */
       free(str1);
       free(str2);
       return 0;
    }
4.17 ‘s’ Family of printf/scanf Functions
4.18 Using a Pointer to Traverse an Array
    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #ifndef MAX_CANON
    /* #define LINELEN 256 */
    #define MAX_CANON 8192
    #endif

    /* traverse.c */
    int main() {

       /* char line[LINELEN+1]; */
       char line[MAX_CANON+1];
       char* p = NULL;

       /* same as p = &line[0], right? */
       p = line;

       /* notice the parentheses */
       while ((*p++ = getchar()) != '\n');

       *p = '\0';

       /* why can't we just print p? */
       printf("%20s\n", line);

       exit(EXIT_SUCCESS);
    }
4.19 Simple Macro vs. Constant
4.20 String Copy Code
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {

       char* q = "copy this";
       char* p = malloc(sizeof(*p) * 10);
       char* r = p;

       printf("%s\n", q);
       while (*p++ = *q++);
       /* *p = '\0'; is not necessary: the loop has already copied the
          terminating '\0', and p now points one past the end of the buffer */
       printf("%s\n", r);

       free(r);
       return 0;
    }
4.21 Command-line Arguments
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char* argv[]) {
       int i;

       printf("argc is %d\n", argc);

       for (i = 0; i < argc; i++)
          printf("argv[%1d] is %s\n", i, argv[i]);

       exit(0);
    }
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char** argv) {

       printf("argc is %d\n", argc);

       for (; *argv; argv++)
          printf("Next argument is %s\n", *argv);

       exit(0);
    }
4.22 The argv Array for the Call a.out -wlc myfile
[RR03][p. 32]
    argv (char* argv[] = char** argv) holds 1000

    at address 1000:
       [0]  1100  ->  'a' '.' 'o' 'u' 't' '\0'
       [1]  1200  ->  '-' 'w' 'l' 'c' '\0'
       [2]  1300  ->  'm' 'y' 'f' 'i' 'l' 'e' '\0'
       [3]  NULL
Figure 4.4: An argument vector (char** argv).
    option            action          generates
    gcc -E (cpp)      preprocesses    .i     expanded source code (comments purged, macros expanded, declarations included)
    gcc -S            compiles        .s     assembly code
    gcc -c            assembles       .o     object code
    gcc               links           a.out  executable
Figure 4.5: The key options to gcc graphically.
[Figure: the C source files f1.c, f2.c, and main.c, which contain #include <stdio.h> and #include <string.h> (found in /usr/include/), are input to the preprocessor (cpp f1.c f2.c main.c or gcc -E f1.c f2.c main.c), which outputs expanded C source files; e.g., given #define TEN 10, the line myfunction(TEN); /* comment */ becomes myfunction(10);. The compiler (gcc -S f1.c f2.c main.c) outputs the assembly code f1.s, f2.s, and main.s (e.g., call printf, call strlen, call myfunction). The assembler (gcc -c f1.c f2.c main.c) outputs the object code f1.o, f2.o, and main.o. The linker (gcc f1.o f2.o main.o) combines the object code with the definitions of printf and strlen from the C library and outputs the executable a.out.]
Figure 4.6: C compilation steps.
4.23 Compiling a C Program in UNIX
4.24 Compiling
4.25 C Compilation Steps Using gcc
4.26 The key options to gcc graphically
4.27 C Compilation Steps Graphically
4.28 file Command
4.29 Memory Management: Memory Allocation and Deallocation
4.30 Conceptual Exercises for Chapter 4
Exercise 4.30.1: (2 points) (circle one) A C header (i.e., .h) file contains
(i) function definitions (ii) function declarations (iii) (i) & (ii) (iv) none of the above
Exercise 4.30.2: (2 points) (circle one) A C library contains
(i) function definitions (ii) function declarations (iii) (i) & (ii) (iv) none of the above
Exercise 4.30.3: Consider the following line of C code:
FILE* fptr = fopen ("input.txt", "r");
Draw the data structure to which fptr points and describe each field of it.
Exercise 4.30.4: To append output from an executable file pgm to a file data, enter:
a) pgm | data
b) pgm > data
c) pgm >> data
Exercise 4.30.5: [KP84, exercise 1-5, p. 31] Explain why the command line ls >ls.out causes ls.out to be included in the list of files.
Exercise 4.30.6: [KP84, exercise 1-5, p. 31] Explain the output of the command line wc temp > temp. If you misspell a command name, as in the command line woh >temp, what happens?
Exercise 4.30.7: [KP84, exercise 1-7, p. 32–33] Explain the difference between the command line who | sort and the command line who > sort.
Exercise 4.30.8: What does the following C code do? while (*p++ = *q++);
Exercise 4.30.9: List, in order, the first four stages of compilation presented.
Exercise 4.30.10: (true / false) Code containing system calls will always execute faster than the same code where the system calls are replaced with analogous library calls.
Exercise 4.30.11: (true / false) A program containing system calls will always execute faster than the same program where the system calls are replaced with analogous library calls.
Exercise 4.30.12: (true / false) A dynamically linked executable will always be larger than its statically linked analog.
Exercise 4.30.13: (true / false) A library function, such as printf, is part of the C language.
Exercise 4.30.14: Draw a diagram illustrating the logical layout of a program image in main memory. Be precise and complete. Clearly label all sections and aspects. Indicate in which direction each section of the memory grows.
Exercise 4.30.15: Give the value of argc in a.out in the following command line ./a.out < infile > outfile.
Exercise 4.30.16: What problem may occur with the following code?
1 char c ; 2
3 while ( (c = getchar ( ) != EOF ) { 4 . . . 5 }
Exercise 4.30.17: Is one of the following assignments incorrect in ANSI C? Explain.
    struct node *p, *q;

    p = malloc(3 * sizeof(*p));
    q = (struct node *) malloc(3 * sizeof(struct node));
Exercise 4.30.18: A program once contained the following:
    #include <math.h>
    ...
    y = cos(x);
    ...
and yet the definition of the cosine function was not found. What hap- pened?
Exercise 4.30.19: What is wrong with the following recovery?
    printf("Enter your age:\n");
    while (scanf("%d", &age) < 1)
       printf("Error. Try again:\n");
Exercise 4.30.20: What output is generated by the following C program?
    #include <stdio.h>
    #include <string.h>

    main() {
       char* s = strdup("ping");
       char* p = strdup(s);
       char* r = p;
       strcpy(s, "pong");
       while (*p++ = *s++);
       printf("%s\n", r);
    }
4.31 Programming Exercises for Chapter 4
Exercise 4.31.21: Write a complete C (not C++) program to read a stream of text from standard input until EOF and write to standard output only the total number of words read and the average number of words per line, in that order, where a word is defined as any string of characters except whitespace, and a line is defined as any string of non-whitespace characters ending in a newline. For instance,
    $ ./a.out
    Count the number
    of words
    and
    the average
    number of words
    per line in this stream of
    text.
    ^D
    $
    18 2.57
    $
    $ ./a.out < /etc/mime.types
    1957 2.27
Do not store more than one character (byte) at a time in your program, and keep your program to approximately 10 lines of code.
Exercise 4.31.22: Write a complete C program which reads two integers from stdin, a base and an exponent, in that order, computes the value of the base raised to the exponent, and prints the resulting product to stdout. Do not give more than twenty lines of code and do not use a library function to implement raising the base to the exponent (i.e., code
it from scratch). See the stdio(3), stdin(3), stdout(3), scanf(3), and printf(3) manpages for help.
Exercise 4.31.23: Write a complete C program which writes its command-line arguments (including the command name) to stdout, one per line. Do not use the [ or ] characters anywhere in your program. Hint: only five lines of code are necessary.
Exercise 4.31.24: Write a complete C program which accepts two integers as command line arguments, a base and an exponent, in that order, computes the value of the base raised to the exponent, and prints the resulting product to stdout. Do not give more than twenty lines of code and do not use a library function to implement raising the base to the exponent (i.e., code it from scratch).
Exercise 4.31.25: Write a complete C program which accepts only two files as command line arguments. The first file given at the command line contains only two positive integers: a base and an exponent, in that order, separated by whitespace. The program computes the value of the base raised to the exponent, and prints the resulting product to the file given by the second command-line argument. This program does file I/O rather than standard I/O. You may not assume the files will exist and contain data as described above. The program must contain code to check for all possible errors, including the absence of one or more of the command-line arguments and the absence of any of the files, and must print all error messages to stderr. Do not give more than thirty lines of code and do not use a library function to implement raising the base to the exponent (i.e., code it from scratch).
Exercise 4.31.26: Write a complete C program which accepts only three files as command line arguments. The first file given at the command line contains only a positive integer, the base, while the second file contains only a positive integer, the exponent. The program computes the value of the base raised to the exponent, and prints the resulting product to the file given by the third command-line argument. This program does file I/O rather than standard I/O. You may not assume the files will exist and contain data as described above. The program must contain code to check for all possible errors, including the absence of one or more of the command-line arguments and the absence of any of the files, and must print all error messages to stderr. Do not give more than thirty lines of code and do not use a library function to implement raising the base to the exponent (i.e., code it from scratch).
Exercise 4.31.27: (diff1.c) Implement a primitive version of the Linux file comparison program diff in C.
Requirements:
1. Your program must be written in C (not C++) and compile without errors or warnings using gcc.
2. Do not prompt for input.
3. The two input files are given on the command line using file I/O. For instance,
    $ ./a.out file1 file2
4. Two files are identical if they match exactly character by character.
5. If the two input files are identical, do not print anything to standard output, but exit with a 0 exit status.
6. If the two input files are different, print the line numbers (the first line of each file is line 1) on which they differ, one per line. For instance,
    $ ./a.out file1 file2
    3
    4
    5
    101
    500
    501
    502
    503
    504
    505
7. A file name of - stands for text read from the standard input.
8. As a special case, diff - - compares a copy of standard input to itself. Do not copy stdin to a file and then diff on that file.
9. Normal program output must only be written to standard output.
10. Abnormal program output (e.g., error messages) must only be written to standard error.
11. Support the following command-line options:
• -l: ignore leading whitespace in the comparison, where whitespace is any contiguous series of tabs or spaces.
• -t: ignore trailing whitespace in the comparison.
• -m: ignore intermediary whitespace in the comparison (i.e., whitespace at neither the beginning nor the end of each line).
• -a: ignore all whitespace in the comparison.
12. All options must precede both input filenames.
13. Options can be given individually and in any order (e.g., -l -t), or in one stroke (e.g., -tl).
14. If no options are given, the comparison is exact.
15. If an invalid option or filename is given, your program must print the same error message diff would print to standard error in that particular situation and halt with the same non-zero exit status.
16. If any other option, valid or otherwise, is given with the -a option, your program must print the following error message to standard er- ror and halt with exit status 9:
    $ ./a.out -t -a file1 file2
    Option -a cannot be combined with any other options.
Hints: If designed properly, the program required to solve this homework problem should occupy no more than 200 lines of code. Furthermore, the interested reader is encouraged to investigate the getopt function (see man -s 3 getopt) to simplify parsing command-line options, and to factor command-line arguments from file arguments. The use of getopt is not required. If you are still getting acclimated to Linux and C, you should avoid the use of getopt, and parse the command-line options manually.
Exercise 4.31.28: (diff1.go) Complete Programming Exercise 4.31.27 in Go (http://golang.com). You may find the webpage at http://thenewstack.io/cli-command-line-programming-with-go/ on command-line processing in Go helpful. Also have a look at the following Go packages for helpful functions to use: bufio (http://golang.org/pkg/bufio/), fmt (https://golang.org/pkg/fmt/), strings (http://golang.org/pkg/strings), flag (https://golang.org/pkg/flag/), os (https://golang.org/pkg/os/), log (http://golang.org/pkg/log/), and io (http://golang.org/pkg/io/).
Exercise 4.31.29: In this exercise, you will manipulate C character strings, which are simply arrays of characters that are terminated by the ASCII NULL character (0x00, ’\0’).
(countsubsstdin.c) This program reads two mandatory and one optional inputs from standard input, each on a separate line, until EOF. Each of the two required inputs is a string. The second string will be searched for occurrences of the first string as a substring. The number of occurrences found will then be displayed to standard output. The presence of the optional third input -nooverlap informs the program that the substrings identified may not overlap.
If an incorrect number of inputs is provided, or if the optional third input provided is anything other than -nooverlap, an appropriate usage message must be printed to standard error and the program must halt with exit status 1. If the first argument is the empty string (i.e., a string having length 0), print an error message to standard error and the program must halt with exit status 1.
Store these input strings on the heap (not the stack) so they can be of an arbitrary size.
The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.
    $ ./a.out
    hehe
    xxxheheheyyy
    ^D
    2
    $ ./a.out
    hehe
    xxxheheheyyy
    -nooverlap
    ^D
    1
    $ ./a.out
    xexe
    thexexexethe
    ^D
    2
    $ ./a.out
    xexe
    thexexexethe
    -nooverlap
    ^D
    1
    $ ./a.out
    xe
    thexexexethe
    -nooverlap
    ^D
    3
    $ ./a.out
    he
    thexexexethe
    -nooverlap
    ^D
    2
    $ ./a.out
    he
    thexexexethe
    ^D
    2
    $ ./a.out
    the
    thexexexethe
    ^D
    2
    $ ./a.out
    the
    thexexexethe
    -noproblem
    ^D
    Usage: string1 string2 [-nooverlap]
    $ echo $?
    1
    $ ./a.out

    thexexexethe
    -noproblem
    ^D
    Usage: string1 string2 [-nooverlap]
    $ echo $?
    1
    $ ./a.out

    thexexexethe
    -nooverlap
    ^D
    Search string cannot be empty!
    $ echo $?
    1
    $ ./a.out
    thexexexethe
    thexexexethe
    ^D
    1
    $ ./a.out
    thexexexethe

    ^D
    Usage: string1 string2 [-nooverlap]
    $ ./a.out
    thexexexethe_extra
    thexexexethe
    ^D
    0
    $ ./a.out
    xex
    -nooverlap
    ^D
    0
    $ ./a.out
    0
    -nooverlap
    ^D
    0
Keep your program to approximately 75 lines of code.
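The difference between counting with and without overlap comes down to where the search resumes after each match. The following is a minimal sketch of that idea, not a complete solution: the function name count_subs and its parameters are hypothetical, and input handling, usage messages, and heap storage are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the exercise statement.
   Counts occurrences of needle in haystack; needle must be non-empty.
   When nooverlap is nonzero, the search resumes after the end of each
   match; otherwise it resumes one character past the start of the match. */
long count_subs(const char *needle, const char *haystack, int nooverlap) {
    assert(needle != NULL && haystack != NULL);
    size_t len = strlen(needle);
    assert(len > 0);              /* the caller rejects the empty string */
    long count = 0;
    const char *p = haystack;
    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += nooverlap ? len : 1;
    }
    return count;
}
```

With the first sample above, count_subs("hehe", "xxxheheheyyy", 0) counts the matches starting at offsets 3 and 5, while passing a nonzero third argument skips past the first match and finds no second one.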
Exercise 4.31.30: (countsubsargs.c) This programming exercise is the same as Programming Exercise 4.31.29, except that in this exercise the inputs are command-line arguments. Specifically, this program expects two mandatory and one optional command-line arguments. Each of the two required arguments is a string. The second string must be searched for occurrences of the first string as a substring. The number of occurrences found must be written to standard output. The presence of the optional third argument -nooverlap informs the program that the substrings identified may not overlap.
If an incorrect number of command-line arguments is provided, or if the optional third argument provided is anything other than -nooverlap, an appropriate usage message must be printed to standard error and the program must halt with exit status 1. If the first argument is the empty string (i.e., a string having length 0), an error message must be printed to standard error and the program must halt with exit status 1.
The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.
$ ./a.out hehe xxxheheheyyy
2
$ ./a.out hehe xxxheheheyyy -nooverlap
1
$ ./a.out xexe thexexexethe
2
$ ./a.out xexe thexexexethe -nooverlap
1
$ ./a.out xe thexexexethe -nooverlap
3
$ ./a.out he thexexexethe -nooverlap
2
$ ./a.out he thexexexethe
2
$ ./a.out the thexexexethe
2
$ ./a.out the thexexexethe -noproblem
Usage: string1 string2 [-nooverlap]
$ echo $?
1
$ ./a.out "" thexexexethe -noproblem
Usage: string1 string2 [-nooverlap]
$ echo $?
1
$ ./a.out "" thexexexethe -nooverlap
Search string cannot be empty!
$ echo $?
1
$ ./a.out thexexexethe thexexexethe
1
$ ./a.out thexexexethe
Usage: string1 string2 [-nooverlap]
$ ./a.out thexexexethe_extra thexexexethe
0
$ ./a.out xex -nooverlap
0
$ ./a.out 0 -nooverlap
0
Exercise 4.31.31: (removesubsstdin.c) This programming exercise is a modification of Programming Exercise 4.31.29. The inputs read are the same, and the same errors should be handled in the same manner. The difference is that this program removes all occurrences of the first string in the second string, and the resulting string and the number of occurrences that were found/removed must be written to standard output.
Store these input strings on the heap (not the stack) so they can be of an arbitrary size.
The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.
$ ./a.out
hehe
xxxheheheyyy
-nooverlap
^D
1
xxxheyyy
$ ./a.out
hehe
xxxheheheyyy
^D
2
xxxyyy
$ ./a.out
xx
xxxheheheyyy
^D
2
heheheyyy
$ ./a.out
yy
xxxheheheyyy
-nooverlap
^D
1
xxxhehehey
$ ./a.out
qq
xxxheheheyyy
-nooverlap
^D
0
xxxheheheyyy
$ ./a.out
qq
^D
Usage: string1 string2 [-nooverlap]
$ echo $?
1
$ ./a.out


^D
Search string cannot be empty!
$ echo $?
1
$ ./a.out
hello

^D
0

$
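The sample output suggests that, when overlap is allowed, every character covered by any occurrence is removed (e.g., removing xx from xxxheheheyyy drops all three x characters). One way to read that behavior is sketched below under that assumption; the function remove_subs and its signature are hypothetical, and the sketch omits all input handling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Returns a newly allocated copy of haystack with every
   character covered by a counted occurrence of needle removed, and stores
   the number of occurrences in *count. Returns NULL if allocation fails.
   The caller frees the result. needle must be non-empty. */
char *remove_subs(const char *needle, const char *haystack,
                  int nooverlap, long *count) {
    assert(needle != NULL && haystack != NULL && count != NULL);
    size_t len = strlen(needle), n = strlen(haystack);
    assert(len > 0);
    char *covered = calloc(n + 1, 1);   /* covered[i] != 0: drop haystack[i] */
    char *result = malloc(n + 1);
    if (covered == NULL || result == NULL) {
        free(covered);
        free(result);
        return NULL;
    }
    *count = 0;
    const char *p = haystack;
    while ((p = strstr(p, needle)) != NULL) {
        memset(covered + (p - haystack), 1, len);   /* mark this match */
        (*count)++;
        p += nooverlap ? len : 1;
    }
    size_t out = 0;
    for (size_t i = 0; i < n; i++)      /* copy the unmarked characters */
        if (!covered[i])
            result[out++] = haystack[i];
    result[out] = '\0';
    free(covered);
    return result;
}
```

Marking first and copying afterwards keeps the overlapping and non-overlapping cases on the same code path; only the step size of the search differs.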
Exercise 4.31.32: (removesubsargs.c) This programming exercise is a modification of Programming Exercise 4.31.30. The command-line arguments expected are the same, and the same errors should be handled in the same manner. The difference is that this program must remove all occurrences of the first string in the second string, and the resulting string and the number of occurrences that were found/removed must be written to standard output.
The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.
$ ./a.out hehe xxxheheheyyy -nooverlap
1
xxxheyyy
$ ./a.out hehe xxxheheheyyy
2
xxxyyy
$ ./a.out xx xxxheheheyyy
2
heheheyyy
$ ./a.out yy xxxheheheyyy -nooverlap
1
xxxhehehey
$ ./a.out qq xxxheheheyyy -nooverlap
0
xxxheheheyyy
$ ./a.out qq
Usage: string1 string2 [-nooverlap]
$ echo $?
1
$ ./a.out "" ""
Search string cannot be empty!
$ echo $?
1
$ ./a.out hello ""
0

$
Keep your program to approximately 50 lines of code.
Exercise 4.31.33: (allsubsstdin.c) This programming exercise is a modification of Programming Exercise 4.31.31. This program reads a string and an integer n, in that order, one per line, from standard input. The program must determine and list all distinct substrings of length n that exist in the given string. In addition to listing the strings, your program must also list the number of occurrences of each substring, both with and without overlap. You might consider defining a function based on the solution to Programming Exercise 4.31.29 that can be called (twice) when outputting each string to provide the requisite information. If the number of inputs is incorrect or if the second input represents an integer ≤ 0, an appropriate usage message must be printed to standard error and the program must halt with exit status 1.
Store these input strings on the heap (not the stack) so they can be of an arbitrary size.
The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.
$ ./a.out
aaabccbcbcebcebfff
^D
Usage: string n (where n must be > 0)
$ echo $?
1
$ ./a.out
aaabccbcbcebcebfff
3
^D

Unique substrings of length 3:

aaa / 1 / 1
aab / 1 / 1
abc / 1 / 1
bcc / 1 / 1
ccb / 1 / 1
cbc / 2 / 1
bcb / 1 / 1
bce / 2 / 2
ceb / 2 / 2
ebc / 1 / 1
ebf / 1 / 1
bff / 1 / 1
fff / 1 / 1
$ ./a.out
aaabccbcbcebcebfff
2
^D

Unique substrings of length 2:

aa / 2 / 1
ab / 1 / 1
bc / 4 / 4
cc / 1 / 1
cb / 2 / 2
ce / 2 / 2
eb / 2 / 2
bf / 1 / 1
ff / 2 / 1
$ ./a.out
aaabccbcbcebcebfff
4
^D

Unique substrings of length 4:

aaab / 1 / 1
aabc / 1 / 1
abcc / 1 / 1
bccb / 1 / 1
ccbc / 1 / 1
cbcb / 1 / 1
bcbc / 1 / 1
cbce / 1 / 1
bceb / 2 / 1
cebc / 1 / 1
ebce / 1 / 1
cebf / 1 / 1
ebff / 1 / 1
bfff / 1 / 1
$ ./a.out
aaabccbcbcebcebfff
10
^D

Unique substrings of length 10:

aaabccbcbc / 1 / 1
aabccbcbce / 1 / 1
abccbcbceb / 1 / 1
bccbcbcebc / 1 / 1
ccbcbcebce / 1 / 1
cbcbcebceb / 1 / 1
bcbcebcebf / 1 / 1
cbcebcebff / 1 / 1
bcebcebfff / 1 / 1
$ ./a.out
aaabccbcbcebcebfff
25
^D

Unique substrings of length 25:
$
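In the sample output the substrings appear in order of their first occurrence, which suggests sliding a window of length n across the string and reporting a window only if it has not been seen at an earlier position. The sketch below illustrates just that test; the names is_first_occurrence and count_distinct are hypothetical, and the quadratic scan is chosen for brevity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if the length-n window starting at
   index i of s has not appeared at any earlier index, and 0 otherwise. */
int is_first_occurrence(const char *s, size_t i, size_t n) {
    assert(s != NULL);
    for (size_t j = 0; j < i; j++)
        if (strncmp(s + j, s + i, n) == 0)
            return 0;
    return 1;
}

/* Counts the distinct substrings of length n in s (n must be > 0). */
size_t count_distinct(const char *s, size_t n) {
    assert(s != NULL && n > 0);
    size_t len = strlen(s), distinct = 0;
    if (n > len)
        return 0;                 /* no window of that length fits */
    for (size_t i = 0; i + n <= len; i++)
        if (is_first_occurrence(s, i, n))
            distinct++;
    return distinct;
}
```

A full solution would print each first occurrence, together with its two counts, at the point where this sketch increments distinct.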
Exercise 4.31.34: (allsubsargs.c) This programming exercise is a modification of Programming Exercise 4.31.32. This program expects two command-line arguments: the first is a string, the second a number n. The program must determine and list all distinct substrings of length n that exist in the given string. In addition to listing the strings, your program must also list the number of occurrences of each substring, both with and without overlap. You might consider defining a function based on the solution to ... countsubs.c that can be called (twice) when outputting each string to provide the requisite information. If the number of command-line arguments is incorrect or if the second command-line argument represents an integer ≤ 0, an appropriate usage message must be printed to standard error and the program must halt with exit status 1.
The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.
$ ./a.out aaabccbcbcebcebfff
Usage: string n (where n must be > 0)
$ echo $?
1
$ ./a.out aaabccbcbcebcebfff 3

Unique substrings of length 3:

aaa / 1 / 1
aab / 1 / 1
abc / 1 / 1
bcc / 1 / 1
ccb / 1 / 1
cbc / 2 / 1
bcb / 1 / 1
bce / 2 / 2
ceb / 2 / 2
ebc / 1 / 1
ebf / 1 / 1
bff / 1 / 1
fff / 1 / 1
$ ./a.out aaabccbcbcebcebfff 2

Unique substrings of length 2:

aa / 2 / 1
ab / 1 / 1
bc / 4 / 4
cc / 1 / 1
cb / 2 / 2
ce / 2 / 2
eb / 2 / 2
bf / 1 / 1
ff / 2 / 1
$ ./a.out aaabccbcbcebcebfff 4

Unique substrings of length 4:

aaab / 1 / 1
aabc / 1 / 1
abcc / 1 / 1
bccb / 1 / 1
ccbc / 1 / 1
cbcb / 1 / 1
bcbc / 1 / 1
cbce / 1 / 1
bceb / 2 / 1
cebc / 1 / 1
ebce / 1 / 1
cebf / 1 / 1
ebff / 1 / 1
bfff / 1 / 1
$ ./a.out aaabccbcbcebcebfff 10

Unique substrings of length 10:

aaabccbcbc / 1 / 1
aabccbcbce / 1 / 1
abccbcbceb / 1 / 1
bccbcbcebc / 1 / 1
ccbcbcebce / 1 / 1
cbcbcebceb / 1 / 1
bcbcebcebf / 1 / 1
cbcebcebff / 1 / 1
bcebcebfff / 1 / 1
$ ./a.out aaabccbcbcebcebfff 25

Unique substrings of length 25:
$
Exercise 4.31.35: Write a complete C program that allocates memory for the structure depicted in the following figure, loads it with the strings shown, prints it (one string per line), and deallocates it, without any memory leaks.
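[Figure: a NULL-terminated array of character pointers, char* stringsarr[] (equivalently char** stringsarr), with indices 0 through 3. Its non-NULL entries point to NUL-terminated strings built from the characters 'a' through 'f'. The address labels that survive from the figure are 1000, 1100, 1200, and 1300.]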
Exercise 4.31.36: (parsestring.c) Write a C program that does the following until EOF: i) reads a line from standard input, including an empty line, with getline, ii) tokenizes the line based on spaces and tabs, iii) builds an array of character pointers to store the tokens of the line, iv) writes each token to standard output from that array of character pointers, and v) frees the array of character pointers. For instance, if the input line is one two three four, the structure built is
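[Figure: char* line holds address 1000, where the buffer contains 'o' 'n' 'e' '\0' 't' 'w' 'o' '\0' 't' 'h' 'r' 'e' 'e' '\0' 'f' 'o' 'u' 'r' '\0'. The array char* parsedstring[] (equivalently char** parsedstring), located at address 1100, has entries 0 through 4 holding 1000, 1004, 1008, 1014, and NULL, i.e., pointers to the start of each token inside line.]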
and the output is:
:one:
:two:
:three:
:four:
The following are some sample, non-exhaustive test cases. Your program is expected to produce identical output. Do not prompt for input.
$ ./a.out
one two three four
:one:
:two:
:three:
:four:
apple orange pear lemon lime
:apple:
:orange:
:pear:
:lemon:
:lime:

-a -b -c -d -e -f -h
:-a:
:-b:
:-c:
:-d:
:-e:
:-f:
:-h:
^D
$
$ cat input.txt
one two three four
apple orange pear lemon lime

-a -b -c -d -e -f -h
$
$ ./a.out < input.txt
:one:
:two:
:three:
:four:
:apple:
:orange:
:pear:
:lemon:
:lime:

:-a:
:-b:
:-c:
:-d:
:-e:
:-f:
:-h:
Note that you must build an array of character pointers to the tokens; it is not enough simply to produce the correct output. For extra credit, make no more than one pass through the input string. Keep your program to approximately 50 lines of code.
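The structure in the figure, where the pointer array refers to tokens stored inside the line buffer itself, can be sketched as follows. This is only an illustration of the data structure, assuming strtok for the splitting; the function tokenize and its growth policy are hypothetical, and reading lines with getline and printing the tokens are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Splits line in place on spaces, tabs, and newlines
   and returns a newly allocated, NULL-terminated array of pointers into
   line. Stores the token count in *ntokens. Returns NULL if allocation
   fails. The caller frees only the array; the tokens live inside line. */
char **tokenize(char *line, size_t *ntokens) {
    assert(line != NULL && ntokens != NULL);
    size_t used = 0, cap = 4;
    char **tokens = malloc(cap * sizeof(*tokens));
    if (tokens == NULL)
        return NULL;
    for (char *tok = strtok(line, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (used + 1 == cap) {              /* keep one slot for NULL */
            char **bigger = realloc(tokens, 2 * cap * sizeof(*tokens));
            if (bigger == NULL) {
                free(tokens);
                return NULL;
            }
            tokens = bigger;
            cap *= 2;
        }
        tokens[used++] = tok;
    }
    tokens[used] = NULL;
    *ntokens = used;
    return tokens;
}
```

For the line one two three four, the returned pointers sit at offsets 0, 4, 8, and 14 from the start of the buffer, matching the addresses 1000, 1004, 1008, and 1014 in the figure. Doubling the capacity trades a little extra memory for fewer calls to realloc; a solution with stricter memory requirements could grow the array differently.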
4.32 Programming Project for Chapter 4
Implement the Linux wc command in C.
Requirements:
a) The program must be written in C (not C++) and compile without errors or warnings using gcc on a Linux system.
b) Your version of wc must behave exactly like the wc command installed on our system in all aspects with the following exception. You must only implement the -l, -w, and -m options. It is your responsibility to mine the behavior of wc on a Linux system and replicate it in your program (see the wc manpage and experiment with the command thoroughly). However, the following is some guidance to get you started in thinking about the behavior of wc:
i) All options must precede all input filenames.
ii) If no input files are given as command-line arguments, wc defaults to standard input.
iii) wc always writes to standard output.
iv) Options can be given individually and in any order (e.g., -m -l or -l -m) or in one stroke (e.g., -lm or -ml).
v) The order in which the options are supplied has no effect on the order in which the counters are displayed. The number of lines is always printed first, followed by the number of words and characters.
vi) If no options are given, wc prints the number of lines, words, and characters.
vii) If an invalid option or filename is given, your program must print the same error message wc would print to standard error in that particular situation and halt with the same non-zero exit status.
viii) Use field-width and precision in your formatted output.
Hints: If designed properly, the program required to solve this project should occupy no more than 150 lines of code. Furthermore, the interested student is encouraged to investigate the getopt function (see man -s 3 getopt) to simplify parsing command-line options and to separate options from file arguments. The use of getopt is not required.
Sample test data: There is a transcript of a Linux session on the companion website which illustrates the execution of the wc command on several test cases. The input files used in the examples actually live on a Linux system and you are encouraged to test your program with them for purposes of comparison. These test cases are not exhaustive.
4.33 Thematic Take-Aways
4.34 Chapter Summary
4.35 Key Terms
4.36 Bibliographic Notes
Chapter 5
Compiling C in Linux
Author: Saverio Perugini Copyright © 2017 by Saverio Perugini ALL RIGHTS RESERVED
5.1 Chapter Objectives
• Establish an understanding of compilation management and make.
• Establish an understanding of configuration management and RCS (rcs).
5.2 Compiling C
5.2.1 Overview
A compiler was originally a program that ‘compiled’ subroutines [a link-loader]. When in 1954 the combination ‘algebraic compiler’ came into use, or rather misuse, the meaning of the term had already shifted to the present one [BE75].
[Figure: the five main sections of a (C) program image, listed from high address to low address:
- command-line arguments and environment variables (argc, argv);
- stack (local variables): activation records for function calls (return address, parameters, saved registers, automatic variables);
- heap (dynamically-allocated memory): allocations from the malloc family, e.g., int* arrayofints; arrayofints = malloc(sizeof(*arrayofints)*10); (40 bytes, elements 0 through 9), and deallocations using free, e.g., free(arrayofints);
- global section (global variables): uninitialized static data and initialized static data, e.g., float rate = 3.1;
- program text.
The stack, the heap, and the global section are the three main data regions.]
Figure 5.1: Logical layout of program image.
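The regions in Figure 5.1 can be observed by printing the address of one object from each. The following is a small sketch under a few assumptions: rate and arrayofints follow the figure, while counter, local, and the function print_layout are hypothetical names introduced here. The program text region is omitted because printing a function's address with %p is not portable ISO C, and the actual addresses (and their ordering) depend on the platform.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

float rate = 3.1f;     /* initialized static data */
int counter;           /* uninitialized static data (hypothetical name) */

/* Hypothetical demo function. Prints one address from each data region of
   the program image to out. Returns 0 on success, -1 if malloc fails. */
int print_layout(FILE *out) {
    assert(out != NULL);
    int local = 0;                                         /* stack */
    int *arrayofints = malloc(sizeof(*arrayofints) * 10);  /* heap */
    if (arrayofints == NULL)
        return -1;
    fprintf(out, "initialized data:   %p\n", (void *)&rate);
    fprintf(out, "uninitialized data: %p\n", (void *)&counter);
    fprintf(out, "heap:               %p\n", (void *)arrayofints);
    fprintf(out, "stack:              %p\n", (void *)&local);
    free(arrayofints);
    return 0;
}
```

Calling print_layout(stdout) from main on a typical Linux system is expected to show the stack address well above the heap address, but this is an observation about common implementations, not a guarantee of the C language.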
5.2.2 Static vs. Dynamic Linking
5.2.3 More on Compiling with gcc
5.2.4 Process
[RR03][p. 24]
[RR03][p. 16]
5.2.5 Process Termination
5.2.6 NULL Pointer
5.2.7 extern Modifier in C
/* x.c */

int x = 10;

/* main.c */

#include <stdio.h>

extern int x;

int main(void) {
   printf("%d\n", x);
   return 0;
}

[Figure: an activation record occupying addresses 1000 (top of stack) through 1024 (base), with boundaries labeled 1020, 1016, 1012, and 1009. Its fields are the return address, the saved frame pointer, the variable x, the 12-byte variable a, and an unused region.]
Figure 5.2: Activation record.
5.2.8 Conditional Compilation
#include "local.h"

/* we would normally indent the body of conditional,
   but not permitted here */
#if vax || u3b || u3b5 || u3b2
#define MAGIC 330
#else
#define MAGIC 500
#endif

#ifdef LIMIT
#undef LIMIT
#endif
#define LIMIT 1000

/* when return type omitted, int assumed */
f() {
   /* allowed to indent here */
   ...
   /* to use debugging statements, #define DEBUG
      anywhere before #ifdef finds it;
      or use gcc -DDEBUG pgm.c */
#ifdef DEBUG
   printf("x is %d\n", x);
   printf("y is %d\n", y);
#endif
   /* allowed to indent here */
   ...
}
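A common refinement of the #ifdef DEBUG idiom above is to wrap the debugging statement in a macro so that call sites need no #ifdef of their own. The sketch below assumes this approach; TRACE and sum_to are hypothetical names, not part of the listing above.

```c
#include <assert.h>
#include <stdio.h>

/* TRACE is a hypothetical macro: it prints only when DEBUG is defined,
   e.g., via gcc -DDEBUG pgm.c, and expands to a no-op otherwise. */
#ifdef DEBUG
#define TRACE(name, value) fprintf(stderr, "%s is %d\n", (name), (value))
#else
#define TRACE(name, value) ((void)0)
#endif

/* Sums the integers 1..n, tracing each partial sum in a debug build. */
int sum_to(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
        TRACE("total", total);
    }
    return total;
}
```

Compiled without -DDEBUG, the TRACE calls disappear during preprocessing, so the release build pays nothing for them; compiled with -DDEBUG, each partial sum is written to standard error.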
[C][4-27]
5.2.9 Error Handling
5.2.10 Debugging
5.2.11 Conceptual Exercises for Section 5.2
Exercise 5.2.1: Explain what it means to link a program in the context of C programming. Specifically, what is linked to what? Be complete.
Exercise 5.2.2: Give one word that provides a better description than the word linking of what happens when a program is linked.
Exercise 5.2.3: Give one word that provides a better description than the word compilation of what happens when a program is compiled.
Exercise 5.2.4: (circle one) (true / false) A dynamically linked executable will always be larger than its statically linked analog.
5.2.12 Programming Exercises for Section 5.2
Exercise 5.2.1:
5.3 Building a Library in C
5.3.1 Conceptual Exercises for Section 5.3
Exercise 5.3.1: Libraries in C. Describe in detail the process involved in making a library in C. Specifically,
a) What is a library? What does it contain? Be specific.
b) In creating a library, one must create at least two source files. What are those files called? What are their file extensions?
c) Of those two source files, one is given to the user as is. Which one? What do you do with the other file and how is it supplied to the user?
d) Assume the library being built includes an embedded data structure whose implementation details are to be hidden from the user, but whose functionality is to be exposed. What is such a data structure called?
e) How do you use the facilities of C to implement the two requirements of the data structure given in the prior question? Be specific, and be technical.
f) Give the series of command lines that must be invoked to make the program a statically-linked library (which can be used by others) once the library is coded, but not yet compiled and packaged. Be complete. Do not skip steps.
g) What is the program called that the user of the library writes?
h) Name the two shell environment variables automatically examined (if set) by gcc to locate libraries and header files. Indicate which variable is used for which.
i) Assume that these two variables are not set and the library and header files are not in the current directory, but available in ~/lib and ~/include, respectively. Give a single command line to compile and statically link a source program example.c to a library named stack.
5.3.2 Programming Exercises for Section 5.3
Exercise 5.3.2: Complete Programming Exercise 4.31.22, but this time invoke the pow function in the math library to perform the computation. Include a comment at the top of your program giving the command line you used to compile the program, illustrating how the math library was explicitly linked to your program. See the pow(3) manpage for help.
Table 5.1: Storage class summary.
Class                Scope         Life          Storage       Init. arr/str  Default value
automatic            block         block active  stack         yes            undefined
register (1)         block         block active  machine reg.  no             undefined (2)
external (3)         decl. to eof  permanent     data area     yes            0
static external (4)  decl. to eof  permanent     data area     yes            0
static internal      block         permanent     data area     yes            0

Table 5.2: static modifier summary.
Where declared        static modifies  static applied?  Storage class  Linkage class
inside a function     storage class    yes              static         none
inside a function     storage class    no               automatic      none
outside any function  linkage class    yes              static         internal
outside any function  linkage class    no               static         external
5.4 More topics in C: Storage Classes, Thread-safe Functions, and Macros
5.4.1 Declarations and Definitions
5.4.2 Storage and Linkage Classes
[C]
5.4.3 static Modifier in C
[RR03][p. 814]
[RR03]
5.4.4 Summary of static Reserved Word
• static keyword used in a variable declaration:
– outside of any function:
Table 5.3: static modifier summary.
static modifies  static applied?  Linkage class
linkage class    yes              internal
linkage class    no               external
/* x is static data and allocated in the static region of the memory image,
   and it has external linkage */

/* linkage class: ?
   storage class: ? */
int x;

/* x is STILL static data, but now has internal linkage and thus cannot be
   referenced by another module (.o file) */

/* akin to "private" in C++ or Java */

/* linkage class: ?
   storage class: ? */
static int x;
– inside of any function:
void f() {

   /* x is allocated on the stack (i.e., it is not static data) and
      this particular x can only be referenced within the body of
      this function */

   /* linkage class: ?
      storage class: ? */
   int x;

   /* x is now static data and allocated in the static region of the memory
      image */

   /* linkage class: ?
      storage class: ? */
   static int x;
}
• static keyword used in a function definition/declaration:
/* f() has external linkage and thus can be referenced by another module
   (i.e., .o file) */

/* linkage class: ?
   storage class: ? */
void f();

/* f() has internal linkage and thus cannot be referenced by another module
   (i.e., .o file) */

/* linkage class: ?
   storage class: ? */
static void f();

[Figure: t holds address 1000, the start of the buffer 'a' '.' 'o' 'u' 't' ' ' '-' 'w' 'l' 'c' ' ' 'm' 'y' 'f' 'i' 'l' 'e' '\0'.]
Figure 5.3: strtok before.
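The four combinations of storage and linkage summarized in Section 5.4.4 can be seen side by side in one small module. The following is an illustrative sketch only; the identifiers calls, limit, bump, and next_id are hypothetical, and the comments state the classification according to Table 5.2.

```c
#include <assert.h>

static int calls = 0;   /* static storage, internal linkage: private to this file */
int limit = 3;          /* static storage, external linkage: visible to other modules */

/* Internal linkage: only code in this file can call bump(). */
static int bump(void) {
    static int total = 0;   /* static storage, no linkage: survives between calls */
    int step = 1;           /* automatic storage, no linkage: recreated on each call */
    total += step;
    calls++;
    return total;
}

/* External linkage: the one entry point other modules may call. */
int next_id(void) {
    int id = bump();
    assert(id == calls);    /* both counters advance together */
    return id;
}
```

Successive calls to next_id() return 1, 2, 3, and so on, because total keeps its value between calls even though it is only nameable inside bump.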
5.4.5 C Libraries
• interface (.h header file) contains function declarations and is implementation-neutral.
• implementation (compiled .o object file or archived .a or .so library file) contains function definitions.
• application or client (.c source file usually containing main()) contains invocations of functions in the implementation and is implementation-neutral.
The underlying implementation can change without disrupting the client code as long as the contractual signature of each function declaration in the interface remains unchanged.
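The separation between interface and implementation can be sketched with an incomplete structure type. The example below is a hypothetical integer stack shown in a single listing for brevity; in a real library the first part would live in a header file and the second in a separately compiled source file. All names (IntStack, intstack_new, and so on) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- interface (would live in a hypothetical intstack.h) ---- */
typedef struct intstack IntStack;          /* incomplete type: layout hidden */
IntStack *intstack_new(void);
int intstack_push(IntStack *s, int value); /* 0 on success, -1 on failure */
int intstack_pop(IntStack *s, int *value); /* 0 on success, -1 if empty */
void intstack_free(IntStack *s);

/* ---- implementation (would live in a hypothetical intstack.c) ---- */
struct intstack_node { int value; struct intstack_node *next; };
struct intstack { struct intstack_node *top; };

IntStack *intstack_new(void) {
    IntStack *s = malloc(sizeof(*s));
    if (s != NULL)
        s->top = NULL;
    return s;
}

int intstack_push(IntStack *s, int value) {
    assert(s != NULL);
    struct intstack_node *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = s->top;
    s->top = n;
    return 0;
}

int intstack_pop(IntStack *s, int *value) {
    assert(s != NULL && value != NULL);
    struct intstack_node *n = s->top;
    if (n == NULL)
        return -1;
    *value = n->value;
    s->top = n->next;
    free(n);
    return 0;
}

void intstack_free(IntStack *s) {
    int ignored;
    if (s == NULL)
        return;
    while (intstack_pop(s, &ignored) == 0)
        ;                                  /* release every remaining node */
    free(s);
}
```

A client that sees only the interface part can hold an IntStack pointer and call the four functions, but cannot touch top or the nodes. The linked list could therefore be replaced by an array without recompiling changes into the client's source.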
5.4.6 Synchronization
5.4.7 Thread Safe Functions
[RR03][p. 36]
[Figure: the same buffer after tokenizing with strtok. t still holds address 1000; the spaces at addresses 1005 and 1010 have been overwritten with '\0', so the tokens a.out, -wlc, and myfile begin at addresses 1000, 1006, and 1011.]
Figure 5.4: strtok after.
[RR03][p.36]
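The before and after pictures in Figures 5.3 and 5.4 apply equally to strtok_r, the reentrant POSIX variant, which keeps its scanning position in a caller-supplied pointer rather than in hidden static data. The sketch below assumes a POSIX system; split_words is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L   /* request the POSIX declaration of strtok_r */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Splits s in place on spaces, storing up to max token
   pointers in out, and returns the number stored. The position of the scan
   lives in the local variable saveptr rather than in hidden static data,
   so concurrent or nested calls do not interfere with one another. */
size_t split_words(char *s, char **out, size_t max) {
    assert(s != NULL && out != NULL);
    size_t n = 0;
    char *saveptr = NULL;
    for (char *tok = strtok_r(s, " ", &saveptr);
         tok != NULL && n < max;
         tok = strtok_r(NULL, " ", &saveptr))
        out[n++] = tok;
    return n;
}
```

Applied to a writable copy of a.out -wlc myfile, the function stores pointers at offsets 0, 6, and 11 and leaves '\0' at offsets 5 and 10, which is the state drawn in Figure 5.4.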
5.4.8 makeargv
5.4.9 Self-study
5.4.10 Macros: The #define Preprocessor Directive
#include <stdio.h>

#define SQUARE(X) ((X) * (X))

#define PRINT(A, B) printf(#A ": %d, " #B ": %d\n", A, B)

int main(void) {
   int x = SQUARE(3);
   int y = SQUARE(x+1);
   PRINT(x, y);
   return 0;
}

After preprocessing, main becomes:

int main(void) {
   int x = ((3) * (3));
   int y = ((x+1) * (x+1));
   printf("x" ": %d, " "y" ": %d\n", x, y);
   return 0;
}
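Two properties of textual substitution are worth seeing concretely: the parentheses in SQUARE are what make SQUARE(x+1) correct, and even a fully parenthesized macro evaluates its argument once per appearance in the body. The sketch below illustrates both; SQUARE_UNSAFE, tick, and the three functions are hypothetical names added for the demonstration.

```c
#include <assert.h>

#define SQUARE_UNSAFE(X) X * X        /* hypothetical: no parentheses */
#define SQUARE(X) ((X) * (X))         /* as defined in the text */

/* Expands to x + 1 * x + 1, i.e., 2x + 1, because * binds tighter than +. */
int unsafe_square_of_next(int x) {
    return SQUARE_UNSAFE(x + 1);
}

/* Expands to ((x + 1) * (x + 1)), the intended value. */
int square_of_next(int x) {
    return SQUARE(x + 1);
}

static int ticks = 0;

/* Returns 3 and records that it was called. */
static int tick(void) {
    ticks++;
    return 3;
}

/* SQUARE(tick()) expands to ((tick()) * (tick())), so tick runs twice;
   a real function square(tick()) would run it once. Returns the number
   of times the argument expression was evaluated. */
int evaluations(void) {
    ticks = 0;
    int result = SQUARE(tick());
    assert(result == 9);
    return ticks;
}
```

For x equal to 3, unsafe_square_of_next yields 7 while square_of_next yields 16, and evaluations() reports 2. The double evaluation is one of the main trade-offs to weigh when choosing between a macro and a function.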
5.4.11 Macros vs. Functions
5.4.12 Conceptual Exercises for Section 5.4
Exercise 5.4.1: Recall that strtok_r is the thread-safe version of strtok. What does thread-safe mean?
Exercise 5.4.2: What is the lifetime of an internal static variable?
Exercise 5.4.3: Consider the following C module [RR03][pp. 41–42]:
 1 /* a function which sorts an array of integers and
 2    counts the number of interchanges made in the process */
 3 static int count = 0;
 4
 5 int x = 10;
 6
 7 /* return true if interchanges are made */
 8 static int onepass(int a[], int n) {
 9    int i;
10    int interchanges = 0;
11    int temp;
12
13    for (i = 0; i < n-1; i++)
14       if (a[i] > a[i+1]) {
15          temp = a[i];
16          a[i] = a[i+1];
17          a[i+1] = temp;
18          interchanges = 1;
19          count++;
20       }
21    return interchanges;
22 }
23
24 void clearcount() {
25    count = 0;
26 }
27
28 int getcount() {
29    return count;
30 }
31
32 /* sort a in ascending order */
33 void bubblesort(int a[], int n) {
34    int i;
35    for (i = 0; i < n-1; i++)
36       if (!onepass(a, n-i))
37          break;
38 }
a) Give the storage class of the count variable (line 3).
b) Give the linkage class of the count variable (line 3).
c) Give the storage class of the onepass function (line 8).
d) Give the linkage class of the onepass function (line 8).
e) Give the storage class of the temp variable (line 11).
f) Give the linkage class of the temp variable (line 11).
g) Give the storage class of the x variable (line 5).
h) Give the linkage class of the x variable (line 5).
i) Give the storage class of the getcount function (line 28).
j) Give the linkage class of the getcount function (line 28).
Exercise 5.4.4: Consider the following Go package.
 1 package bubblesort
 2
 3 /* a package which sorts an array of integers and
 4    counts the number of interchanges made in the process */
 5
 6 var count = 0
 7
 8 /* return true if interchanges are made */
 9 func onepass(a []int, n int) int {
10    interchanges := 0
11    var temp int
12
13    for i := 0; i < n-1; i++ {
14       if a[i] > a[i+1] {
15          temp = a[i]
16          a[i] = a[i+1]
17          a[i+1] = temp
18          interchanges = 1
19          count = count + 1
20       }
21    }
22    return interchanges
23 }
24
25 func Clearcount() {
26    count = 0
27 }
28 func Getcount() int {
29    return count
30 }
31
32 /* sort a in ascending order */
33 func Bubblesort(a []int, n int) {
34    for i := 0; i < n-1; i++ {
35       if onepass(a, n-i) == 0 {
36          break
37       }
38    }
39 }
a) Give the storage class of the count variable (line 6).
b) Give the linkage class of the count variable (line 6).
c) Give the storage class of the onepass function (line 9).
d) Give the linkage class of the onepass function (line 9).
e) Give the storage class of the temp variable (line 11).
f) Give the linkage class of the temp variable (line 11).
g) Give the storage class of the Getcount function (line 28).
h) Give the linkage class of the Getcount function (line 28).
Exercise 5.4.5: Unlike C, Go does not have a static keyword: a function name or variable whose identifier starts with a lowercase letter has internal linkage, while one starting with an uppercase letter has external linkage. How, then, can we achieve a variable local to a function with static (i.e., global) storage?
Exercise 5.4.6: The following program will compile, but will not link. Cor- rect it so that it compiles and links successfully.
5.4.13 Programming Exercises for Section 5.4
Exercise 5.4.7: [RR03, pp. 55–56] Implement a logging library which is similar to the list object developed in this chapter. The logging utility allows the caller to save a message at the end of a list. The logger also records the time at which the message was logged.
You can use the logging facility to save the messages which were printed by some of your programs, or for program debugging and testing.
Requirements:
a) Use the following header file loggerlib.h for your logging facility.
#include <time.h>

typedef struct data_struct {
   time_t time;
   char *string;
} data_t;

int addmsg(data_t data);
void clearlog(void);
char *getlog(void);
int savelog(char *filename);
b) The data_t structure and the addmsg function have the roles described in class. Recall that addmsg copies the node and inserts it at the end of the list.
c) The savelog function saves the logged messages to a disk file.
d) The clearlog function releases all the storage which has been allocated for the logged messages and empties the list of logged messages.
e) The getlog function allocates enough space for a string containing the entire log, copies the log into this string, and returns a pointer to the string. It is the responsibility of the calling program to free this memory when necessary.
f) If successful, addmsg and savelog return 0. If unsuccessful, addmsg and savelog return -1.
g) A successful getlog call returns a pointer to the log string. An unsuccessful getlog call returns NULL.
h) The functions addmsg, savelog, and getlog set errno on failure. You must explicitly set errno for all errors. In other words, do not rely on the fact that the function which fails may set errno automatically for you. Common errors include exceeding available memory or file I/O open/close, read/write errors. See the GNU webpage for libc for a list of error codes which are #defined in errno.h (e.g., use ENOMEM for the former and EIO for the latter errors above).
i) Use the following format for the getlog and savelog output, where [ ] represents one single space character:
Time:[ ]MM/DD/YY[ ]HH/MM/SS\n
Message:[ ]This is message 1\n
\n
Time:[ ]MM/DD/YY[ ]HH/MM/SS\n
Message:[ ]This is message 2\n
\n
...
...
j) The following program demonstrates how to format the time in MM/DD/YY[ ]HH/MM/SS format:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
   time_t t;
   /* in the default "C" locale, "%x %X" yields 17 characters
      (MM/DD/YY HH:MM:SS); one more byte holds the terminating NUL */
   char *s = malloc(sizeof(*s) * 18);

   if (s == NULL)
      return EXIT_FAILURE;

   if (time(&t) == -1) {
      free(s);
      return EXIT_FAILURE;
   }

   struct tm *loct = localtime(&t);
   strftime(s, 18, "%x %X", loct);
   printf("%s\n", s);
   free(s);
   return 0;
}
k) If an application tries to invoke savelog on an empty list object, do not write anything to the data file (do not even open and create it).
l) If an application tries to invoke getlog on an empty list object, simply return NULL. It is then the caller’s responsibility to perform error checking, and check the value of the char* returned (e.g., before printing it) to make sure it points to valid memory. It might be a good idea to define a static isempty() function.
m) Never allocate more memory than necessary for anything.
n) All implementation details must be hidden from any application which uses the logging library.
o) Your program must be written in C (not C++) and compile without errors or warnings using gcc on our system.
p) Use the following skeleton for loggerlib.c.
#include <stdlib.h>
#include <string.h>
#include "loggerlib.h"

typedef struct list_struct {
   data_t item;
   struct list_struct* next;
} log_t;

static log_t* headptr = NULL;
static log_t* tailptr = NULL;

int addmsg(data_t data) {
   return 0;
}

void clearlog(void) {
}

char* getlog(void) {
   return NULL;
}

int savelog(char* filename) {
   return 0;
}
If designed properly, the program required to solve this exercise should occupy no more than 200 lines of code.
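Requirement (h) above asks that errno be set explicitly on every failure path. The following is a minimal sketch of that convention only, not a piece of the solution: copy_string is a hypothetical helper that is not part of the required interface, and the choice of EINVAL for a NULL argument is an assumption made for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the logging interface): returns a
 * heap-allocated copy of src, or NULL with errno set explicitly. */
char *copy_string(const char *src) {
    if (src == NULL) {
        errno = EINVAL;              /* assumed code for a bad argument */
        return NULL;
    }
    size_t len = strlen(src) + 1;    /* exactly the text plus its '\0' */
    char *dst = malloc(len);
    if (dst == NULL) {
        errno = ENOMEM;              /* set explicitly; do not rely on malloc */
        return NULL;
    }
    memcpy(dst, src, len);
    return dst;                      /* the caller must free this memory */
}
```

A caller would test the return value first and consult errno only when NULL is returned.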
Sample Application
The source code files logapp.c and logapplib.c constitute a sample application for the logging library developed in this assignment and can be used for purposes of testing. Remember, your library must work in any application which conforms to the prototypes of the services which the logging library provides.
Exercise 5.4.8: Complete Programming Exercise 5.4.7 in Go subject only to the following modifications.
Requirements:
a) Your logging facility must support the following interface (loggerlib.go):
type Data_t struct {
	Logged_time time.Time
	Str string
}

type Log_t struct {
	item Data_t
	next *Log_t
}

// Public functions
func Addmsg(data Data_t) (int, error)
func Clearlog()
func Getlog() (string, error)
func Savelog(filename string) error
b) The Data_t structure and the Addmsg function have the roles described in Programming Exercise 5.4.7. Recall that Addmsg copies the node and inserts it at the end of the list.
c) The Savelog function writes the logged messages to a disk file.
d) If successful, Savelog returns nil. If unsuccessful, Savelog returns err.
e) If an application tries to invoke Savelog on an empty list object, do not write any data to the disk file; do not even open and create it.
f) The Clearlog function releases all the storage which has been allocated for the logged messages and empties the list of logged messages.
g) The Getlog function copies the entire log into a string, and returns a (string, error) pair.
h) If successful, Getlog returns the log string and a nil error. If unsuccessful, Getlog returns "",errors.New("filled in with appropriate error message").
i) If an application tries to invoke Getlog on an empty list object, simply return "",errors.New("filled in with appropriate error message") (the empty string). It is then the caller’s responsibility to perform error checking, and check the error value returned (e.g., before printing the string). You may want to define an isempty() function.
j) If successful, Addmsg returns 0,nil. If unsuccessful, Addmsg
returns -1,errors.New("filled in with appropriate error message").
k) Use the following format for the output of Getlog and Savelog, where [ ] represents one single space character:
Time:[ ]MM/DD/YYYY[ ]HH:MM:SS\n
Message:[ ]This is message 1\n
\n
Time:[ ]MM/DD/YYYY[ ]HH:MM:SS\n
Message:[ ]This is message 2\n
\n
...
...
l) Do not exit from functions. Instead, return an error value to allow the calling program flexibility in handling the error.
m) Your program must be written in Go and compile without errors or warnings using go build on a Linux system.
n) Use the following skeleton for loggerlib.go, also available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/loggerlib.go.
package loggerlib

import (
	"time"
	"errors"
)

type Data_t struct {
	Logged_time time.Time
	Str string
}

type Log_t struct {
	item Data_t
	next *Log_t
}

// global, private variables
var headptr *Log_t
var tailptr *Log_t

func Addmsg(data Data_t) (int, error) {
	...
}

func Clearlog() {
	...
}

func Getlog() (string, error) {
	...
}

func Savelog(filename string) error {
	...
}
o) Use the directory structure depicted in the following diagram for this library:
$GOPATH
src/ pkg/
loggerlib/ linux_amd64/
loggerlib/logapp/ loggerlib/
logapp* logapp.go logapp_helperfuns/ loggerlib.go logapp/ loggerlib.a
logapp_helperfuns.a logapp_helperfuns.go
If designed properly a priori, the program required to solve this exercise should occupy no more than 150 lines of code.
The following program demonstrates one way to format the time in Go in MM/DD/YYYY[ ]HH:MM:SS format:
package main

import (
	"fmt"
	"time"
	"strings"
)

func main() {
	/* formats current system time as "MM/DD/YYYY[ ]HH:MM:SS" */

	var timestr string
	var months map[string]string

	// maps a month abbreviation to its two-digit number
	months = make(map[string]string)

	months["Jan"] = "01"; months["Feb"] = "02"; months["Mar"] = "03"
	months["Apr"] = "04"; months["May"] = "05"; months["Jun"] = "06"
	months["Jul"] = "07"; months["Aug"] = "08"; months["Sep"] = "09"
	months["Oct"] = "10"; months["Nov"] = "11"; months["Dec"] = "12"

	current_time := time.Now().Local()
	const layout = "Jan 2 2006 15:04:05"
	// timeslice holds {month abbreviation, day, year, HH:MM:SS}
	timeslice := strings.Split(current_time.Format(layout), " ")
	timestr = months[timeslice[0]] + "/"

	// pad a single-digit day with a leading zero
	if len(timeslice[1]) == 1 {
		timestr += "0"
	}
	timestr += timeslice[1] + "/" + timeslice[2] + " " + timeslice[3]
	fmt.Println(timestr)
	// Note: the single layout "01/02/2006 15:04:05" yields the same string directly.
}
Sample application
The source code files logapp.go and logapp_helperfuns.go, available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/logapp.go.txt and http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/logapp_helperfuns.go.txt, respectively, constitute a sample application for the logging library developed in this exercise and can be used for purposes of testing. These files must not be modified at all. Remember, your library must work in any application which conforms to the prototypes of the services which the logging library provides.
Figure 5.5: Popup dependency graph (nodes: popup; button.o, window.o; button.c, window.c, window.h).
5.5 Compilation and Configuration Management
5.5.1 Compilation Management: make
$ touch foo.c
Directives
target: source1 source2 ...
	command1
	command2
What Will make Do?
Simple Example
gcc -c button.c                   # produces button.o
gcc -c window.c                   # produces window.o
gcc -o popup button.o window.o    # produces popup
all: popup

popup: button.o window.o
	gcc -o popup button.o window.o

button.o: button.c
	gcc -c button.c

window.o: window.c window.h
	gcc -c window.c

Figure 5.6: Logger dependency graph (nodes: a.out; logapp.o, loggerapplib.o, loggerlib.o; logapp.c, loggerapplib.c, loggerlib.h, loggerlib.c).
List Object Example
[RR03][pp. 55–56]
List Object Makefile
Variables
CC = gcc

LIST_OF_FILES = file1.c file2.c \
                file3.c file4.c

program1: $(LIST_OF_FILES)
	$(CC) $(LIST_OF_FILES) -o program1
[RR03][pp. 55–56]
Environment Variables
$ export LIST_OF_FILES="file1.c file2.c file3.c file4.c file5.c"
$ make -e program1
Variables on the Command Line
$ make LIST_OF_FILES="file1.c file2.c file3.c file4.c file5.c" program1
Default Suffix Rules
.c.o:
	$(CC) $(CFLAGS) $< -o $@

.c.a:
	$(CC) -c $(CFLAGS) $<
	ar rv $@ $*.o
	rm -f $*.o

prog: lib(sub1) lib(sub2) lib((module1)) prog.o
	$(CC) -o $@ prog.o lib
System Default Make Definitions
mkdep
mkdep [cc-options] file1.c file2.c ...
5.5.2 Configuration Management (RCS)
Sample RCS Session
127 Cayuga> mkdir RCS
128 Cayuga> rcs -i blitz    # initialize file in RCS system
RCS file: RCS/blitz,v
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> Shell script for blitzing directories, named after the
>> Wehrmacht Blitzkrieg tactic.
>> .
done
129 Cayuga> rcs -alat,egm,ribbens,mcquain blitz    # authorize users
RCS file: RCS/blitz,v
done
130 Cayuga> rcs -elat blitz    # deauthorize user
RCS file: RCS/blitz,v
done
131 Cayuga> ci blitz    # check in file, version number assigned
RCS/blitz,v  <--  blitz
initial revision: 1.1
done
131 Cayuga> ls -l RCS
132 Cayuga> co -l blitz    # check out file with exclusive right to modify
RCS/blitz,v  -->  blitz
revision 1.1 (locked)
done
133 Cayuga> ex blitz    # edit file
"blitz" 14 lines, 879 characters
:$a
Junk line at end.
.
:wq
"blitz" 15 lines, 897 characters
134 Cayuga> ci blitz    # check modified file back in
RCS/blitz,v  <--  blitz
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> Added junk line at end using ex.
>> .
done
135 Cayuga> rcs -o1.1 blitz    # delete old version
RCS file: RCS/blitz,v
deleting revision 1.1
done
136 Cayuga> rlog blitz    # gives modification history
5.5.3 Distributed Configuration Management (GIT)
5.5.4 Conceptual Exercises for Section 5.5
Exercise 5.5.1: (true / false) In a command-line in a Makefile, leading tabs are significant.
Exercise 5.5.2: (true / false): In a Makefile, leading tabs are insignificant.
Exercise 5.5.3: Consider the following:
$ ls -l
total 89
-rw------- 1 lucia users   196 Jun 25 09:41 Makefile
-rw------- 1 lucia users 90001 Jun 25 09:42 fig1.eps
-rw------- 1 lucia users     8 Jun 25 09:43 final.aux
-rw------- 1 lucia users 11056 Jun 25 09:43 final.dvi
-rw------- 1 lucia users  3664 Jun 25 09:43 final.log
-rw------- 1 lucia users 64411 Jun 25 09:44 final.ps
-rw------- 1 lucia users  8319 Jun 25 09:42 final.tex
$
$ cat Makefile
SRC = final.tex

all: final

final: final.ps

final.ps: final.dvi
	dvips -o final.ps final

final.dvi: ${SRC} fig1.eps
	latex final

clean:
	touch *.tex
	rm final.log final.aux final.dvi final.ps
$
Which commands, if any, do the following command lines force to execute? The following command lines are independent of each other (i.e., the second is not run after the first, and the first is not run after the second).
a) $ make final
b) $ make
Exercise 5.5.4: Consider the following:
$ ls -l
total 89
-rw------- 1 lucia users   196 Jun 25 09:41 Makefile
-rw------- 1 lucia users 90001 Jun 25 09:42 fig1.eps
-rw------- 1 lucia users     8 Jun 25 09:43 final.aux
-rw------- 1 lucia users 11056 Jun 25 09:43 final.dvi
-rw------- 1 lucia users  3664 Jun 25 09:43 final.log
-rw------- 1 lucia users 64411 Jun 25 09:44 final.ps
-rw------- 1 lucia users  8319 Jun 25 09:42 final.tex
$
$ cat Makefile
SRC = final

all: $(SRC)

$(SRC): $(SRC).ps

$(SRC).ps: $(SRC).dvi
	dvips -o $(SRC).ps $(SRC)

$(SRC).dvi: $(SRC).tex fig1.eps
	latex $(SRC)

clean:
	touch *.tex
	-rm $(SRC).log $(SRC).aux $(SRC).dvi $(SRC).ps
$
Which commands, if any, do the following command lines force to execute? The following command lines are independent of each other (i.e., the second is not run after the first, and the first is not run after the second).
a) $ make final
b) $ make
Exercise 5.5.5: Consider the following:
$ ls -l
total 292
-rw------- 1 lucia staff      8 Oct 31 19:58 444f10e2.aux
-rw------- 1 lucia staff  15580 Oct 31 19:58 444f10e2.dvi
-rw------- 1 lucia staff  13010 Oct 31 19:58 444f10e2.log
-rw------- 1 lucia staff  58440 Oct 31 19:59 444f10e2.pdf
-rw------- 1 lucia staff 180865 Oct 31 19:58 444f10e2.ps
-rw------- 1 lucia staff   9583 Oct 31 19:58 444f10e2.tex
-rw------- 1 lucia staff    317 Oct 31 19:57 Makefile
$
$ cat Makefile
SRC = 444f10e2

spell:
	detex $(SRC) | aspell list | sort -u

all: $(SRC)
$(SRC): $(SRC).pdf

$(SRC).pdf: $(SRC).ps
	ps2pdf $(SRC).ps

$(SRC).ps: $(SRC).dvi
	dvips -t letter $(SRC).dvi -o $(SRC).ps

$(SRC).dvi: $(SRC).tex
	latex $(SRC)

clean:
	- touch *.tex
	- rm $(SRC).aux $(SRC).log $(SRC).dvi $(SRC).ps $(SRC).pdf
$
Which commands, if any, do the following command lines force to execute? The following command lines are completely independent of each other (i.e., the second is not run after the first, and the first is not run after the second).
a) $ make
b) $ make all
Exercise 5.5.6: Consider the following:
$ ls -l
total 89
-rw------- 1 lucia users   196 Jun 25 09:41 Makefile
-rw------- 1 lucia users 90001 Jun 25 09:42 popd.eee
-rw------- 1 lucia users     8 Jun 25 09:43 pushd.aaa
-rw------- 1 lucia users 11056 Jun 25 09:43 pushd.ddd
-rw------- 1 lucia users  3664 Jun 25 09:43 pushd.lll
-rw------- 1 lucia users 64411 Jun 25 09:44 pushd.ppp
-rw------- 1 lucia users  8319 Jun 25 09:42 pushd.fff
$
$ cat Makefile
SRC = pushd

all: $(SRC)

$(SRC): $(SRC).ppp

$(SRC).ppp: $(SRC).ddd
	src2dev -o $(SRC).ppp $(SRC)

$(SRC).ddd: $(SRC).fff popd.eee
	hexroff $(SRC)

clean:
	touch *.fff
	-rm $(SRC).lll $(SRC).aaa $(SRC).ddd $(SRC).ppp
$
Which commands, if any, do the following command lines force to execute? The following command lines are completely independent of each other (i.e., the second is not run after the first, and the first is not run after the second).
a) $ make pushd
b) $ make
Exercise 5.5.7: Generally, we want our Makefile to execute commands only when necessary. This is the point of make.
The following Makefile will perform unnecessary work under certain circumstances. Identify the problem in it, explain why it is a problem, and correct it in place.
SRC = flip
CC = gcc
CFLAGS = -DBSD -DNDEBUG -O -c

all: $(SRC) man

man: $(SRC).1
	nroff -man $(SRC).1 > $(SRC).man

$(SRC): $(SRC).o getopt.o
	$(CC) -s -o $(SRC) $(SRC).o getopt.o

$(SRC).o: $(SRC).c $(SRC).h
	$(CC) $(CFLAGS) $(SRC).c

getopt.o: getopt.c $(SRC).h
	$(CC) $(CFLAGS) getopt.c

clean:
	@-rm *.o $(SRC) $(SRC).man
Exercise 5.5.8: What does the acronym RCS expand to?
Exercise 5.5.9: (true or false) RCS is a collection of UNIX tools/commands for software project management.
Exercise 5.5.10: Git is what type of software version control system (only one word necessary)?
Exercise 5.5.11: Give the Git command to download a remote repository to a local host.
Exercise 5.5.12: Give the Git command to cd to a different branch. Give an example of a complete Git command to do this.
Exercise 5.5.13: List by name three common branches or directories in a Git repository.
Exercise 5.5.14: Consider pushing a bug fix directly to production/release in Git. What is this called?
Exercise 5.5.15: Explain the difference between the Git add and the commit commands.
Exercise 5.5.16: Explain the difference between the Git push and the merge commands.
Exercise 5.5.17: Explain the difference between the Git fetch and the pull commands.
Exercise 5.5.18: List three differences between Git and Subversion.
Exercise 5.5.19: Suppose you issue a Git pull request from your feature branch to the develop branch, but the request indicates that there are merge conflicts. List the steps to resolve this issue.
5.5.5 Programming Exercises for Section 5.5
Exercise 5.5.20: In this exercise you will both create a dependency graph for the codebase of a C project and write the Makefile.
a) Draw a dependency graph, like those shown in this section, for the Makefile you will create for the second part (b) of this exercise. Read part (b) first, but complete the dependency graph before writing the Makefile, which is trivial once the graph is constructed.
b) Write a Makefile for a C program called flip which converts the line-ending characters in plain text files from MS-DOS conventions (CR-LF pairs) to UNIX conventions (LF only) and vice versa.
The source files required for building flip are flip.1, flip.c, flip.h, and getopt.c, and are available in a tar archive at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/flip.tar.
Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during compilation). Make sure that each directive also lists all files that the derived file depends on in its dependency list.
The steps in compiling flip are:
gcc -DBSD -DNDEBUG -O -c flip.c
gcc -DBSD -DNDEBUG -O -c getopt.c
gcc -s -o flip flip.o getopt.o
Your Makefile must be written so that make flip carries out these commands, only if necessary. Each command above generates a separate derived file, and so must be placed in a separate directive. In addition, your Makefile must be written so that make man carries out the following command, again, only if necessary:
nroff -man flip.1 > flip.man
The flip.1 file is the source file for the command’s manpage. nroff is a program that formats the text of the manpage. The command shown above formats the manpage into a human-readable form and places the output in the file flip.man.
Your Makefile must be written so that when make is invoked with no target specified on the command line, it carries out both sets of commands listed above, only if necessary, bringing everything (both the program and its formatted manpage) up-to-date. Finally, your Makefile must have both an all directive and a clean directive, the latter to remove all generated files. Use variables where appropriate in your Makefile to improve its readability, and use descriptive comments to clarify your intentions wherever necessary. You may find it helpful to use the touch command and the -n option to make to help debug your Makefile.
Both flip.c and getopt.c include only flip.h.
Exercise 5.5.21: In this exercise you will progressively refine a Makefile for an application that utilizes two libraries for interacting with a linked-list data structure. The source files required for building the system are listlib.c (the library implementation), listlib.h (the library header file or interface), keeplog_helper.c (a library implementation used by the application program), and keeplog.c (the application), and are available in a tar archive at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/listlib.tar.
a) Start by drawing the dependency graph for this project (as shown in Figs. 5.5 and 5.6).
b) Write a simple Makefile for this project. By simple we mean do not create the two libraries at this point. Getting the project to build an executable through a Makefile is sufficient for this part. Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during compilation). Make sure that each directive also lists all files that the derived file depends on in its dependency list. Your Makefile must be written so that when make is invoked with no target specified on the command line, it recompiles or relinks, only if necessary, to bring everything up-to-date. Your Makefile must have both an all directive and a clean directive, the latter to remove all generated files. Use variables where appropriate in your Makefile to improve its readability, and use descriptive comments to clarify your intentions wherever necessary.
c) Rewrite/shorten your Makefile so that it uses the default suffix rules illustrated in this section.
d) Factor keeplog_helper.c into two files, keeplog_helper1.c and keeplog_helper2.c, each containing one function. Rewrite your Makefile so that none of the file names in the directory, save for the executable keeplog, are hardcoded into the Makefile. In other words, write your Makefile in such a way that it is general enough to compile, link, and build any C project. Your Makefile should be approximately 15 lines of code.
e) Make statically-linked libraries out of listlib.c and keeplog_helper.c, name them liblist.a and libkeeplog.a, respectively, and install them and the header files listlib.h and keeplog_helper.h in your /lib and /include directories. Set your LIBRARY_PATH and C_INCLUDE_PATH variables. Now, rewrite your Makefile from the previous part so that it works in concert with these newly created libraries. Your Makefile should be approximately 15 lines of code.
Exercise 5.5.22: Provide a Makefile which builds the program for the prior problem (#5). Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code generation and compilation). Make sure that each directive also lists all files on which the derived file depends in its dependency list. Also, your Makefile must be written to carry out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring the final executable up-to-date every time make is invoked. In addition, it must have an all directive, and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability. Your Makefile must bring everything up-to-date, using only f?lex and gcc, without any warnings or errors, when make is invoked.
Exercise 5.5.23: Assume there are several .c and .h files in a directory constituting a project. Write a Makefile to compile and link this project whose executable is to be named example.
Your Makefile must be written so that it carries out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring the final executable example up-to-date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability. Only when defining variables, if you do not know the specific GNU make syntax to accomplish a task, it is okay to write in English what you are trying to do. However, in the rules and command-line section of the Makefile, you must use proper, correct syntax. Your Makefile must bring everything up-to-date, using only gcc, without any warnings or errors, when make is invoked.
5.6 Packaging and Compression Utilities
5.6.1 ar
$ ar t /usr/lib/libc.a | grep '^printf.o'
printf.o
$ ar qv project.ar *.c
$ ar t project.ar
$ ar rvb foob.c project.ar fooa.c
$ ar xv project.ar fooa.c foob.c
5.6.2 tar
$ # creates the p1.tar archive
$ tar cvf p1.tar myshell.c helper.c other.c Makefile
$ # lists the contents of p1.tar
$ tar tf p1.tar
$ # extracts the p1.tar archive
$ tar xvf p1.tar

$ # preserves file permissions
$ tar cvpf p1.tar myshell.c helper.c other.c Makefile
$ # compresses the data
$ tar cvzf p1.tgz myshell.c helper.c other.c Makefile
$ # preserves and compresses
$ tar cvpzf p1.tgz myshell.c helper.c other.c Makefile
$ # creates archive rooted at directory p1
$ tar cvpzf p1.tgz p1
$ # extracts the compressed archive p1.tgz
$ tar xvzf p1.tgz p1
5.6.3 gzip/gunzip
$ gzip foo.tar
$ gunzip foo.tar.gz
$ zcat foo.tar.gz | tar tvf -
5.6.4 compress/uncompress
$ compress foo.tar
$ uncompress foo.tar.Z
5.6.5 Conceptual Exercises for Section 5.6
Exercise 5.6.1: Give one advantage and one disadvantage of ar.
Exercise 5.6.2: Give one advantage and one disadvantage of tar.
Exercise 5.6.3: Write a command to tar and gzip (only) all the files (plain files, directories, or links) ending in .c in or below /C into C.tgz in one stroke.
5.7 Thematic Take-Aways
5.8 Chapter Summary
5.9 Key Terms
5.10 Bibliographic Notes
Chapter 6
Files and Directories II: Inodes, Hard and Symbolic Links
Author: Saverio Perugini Copyright © 2017 by Saverio Perugini ALL RIGHTS RESERVED
6.1 Chapter Objectives
• Establish an understanding of I/O system calls (open/close and read/write).
• Establish an understanding of the Linux file permission model.
• Establish an understanding of the Linux file system.
• Establish an understanding of hard and symbolic links.
Figure 6.1: File permissions (three rwx triples, one each for user, group, and others).
6.2 Low-Level I/O
6.2.1 Review of Linux I/O Data Structures
6.2.2 Review of Buffered Output
6.2.3 Library vs. System Calls
6.2.4 I/O Recap
6.2.5 select and poll
6.3 Disk Statistics
6.4 File Access (3 Types)
6.5 File Permissions, Owners, and Groups
[RR03][p. 105]
6.6 Files
[RR03][Fig 4.3]
[RR03][p. 120]
6.7 Relevant Accessor/Modifier Functions, and structs
6.8 Inodes
[RR03][p. 160] [RR03][p. 163]
Figure 6.2: File pointer (labels: user program area with myfp holding address 1000 of the file structure for /home/cps346-01.15/testfile.txt, which contains "This is a test." and descriptor 3; kernel area with the file descriptor table, entries 0 through 6, leading to the system file table).
Figure 6.3: File tables (labels: user program area with myfd holding 3; kernel area with the file descriptor table, entries 0 through 6, the system file table, and the in-memory inode table; entry for /home/cps346-01.15/testfile.txt).
Figure 6.4: Inode (file information: permissions, size in bytes, owner UID and GID, three relevant times, link and block counts; direct pointers to the beginning file blocks; single, double, and triple indirect pointers to the next file blocks).
Figure 6.5: Directory entry. The directory entry in /home/cps346-01.15 pairs the name testfile.txt with inode 12345; that inode has link count 1 and points to block 21452, which holds "This is some text."
Figure 6.6: Hard link (labels: /home/lucy; dir1, dir2; prog1, proga).
Figure 6.7: Hard link. The directory entry testfile.txt in /home/cps346-01.15 and the directory entry testfile2.txt in /home/cps346-01.15/tmp both name inode 12345; that inode has link count 2 and points to block 21452, which holds "This is some text."
6.9 File Links: Hard vs. Soft
6.10 Hard Links
[RR03][p. 165]
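The situation drawn in Figure 6.7, where two directory entries name one inode, can be reproduced with the link system call. The following is a minimal sketch under the assumption that both names reside on the same file system. The function name add_hard_link is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <sys/stat.h>
#include <unistd.h>

/* Creates newname as a second directory entry for the file named existing
 * and returns the resulting link count, or -1 on failure (link or stat
 * sets errno). */
long add_hard_link(const char *existing, const char *newname) {
    struct stat sb;
    if (link(existing, newname) == -1)
        return -1;
    if (stat(newname, &sb) == -1)
        return -1;
    return (long) sb.st_nlink;   /* e.g., 2 after the first extra link */
}
```

Calling unlink on either name afterwards only decrements the link count. The data blocks are released when the count reaches zero and no process holds the file open.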
6.11 Symbolic (Soft) Links
Helpful for creating shorter URLs to files served by a web server.
[RR03][p. 170]
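Unlike a hard link, a symbolic link is a separate file whose contents are a path name. As a minimal sketch, the function below creates a symbolic link and reads the stored path back. The name make_and_read_symlink is hypothetical. Note that lstat examines the link itself, whereas stat would follow it to the target.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates linkname as a symbolic link containing the path target, then
 * reads that path back into buf as a '\0'-terminated string.
 * Returns the length of the stored path, or -1 on failure. */
ssize_t make_and_read_symlink(const char *target, const char *linkname,
                              char *buf, size_t bufsize) {
    struct stat sb;
    if (bufsize == 0 || symlink(target, linkname) == -1)
        return -1;
    if (lstat(linkname, &sb) == -1 || !S_ISLNK(sb.st_mode))
        return -1;                        /* the new entry must be a link */
    ssize_t n = readlink(linkname, buf, bufsize - 1);
    if (n == -1)
        return -1;
    buf[n] = '\0';                        /* readlink does not terminate */
    return n;
}
```

The target need not exist when the link is created, which is one way a symbolic link differs from a hard link.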
Figure 6.8: Soft link. The directory entry testfile.txt in /home/cps346-01.15 names inode 12345 (link count 1), which points to block 21452 holding "This is some text."; the directory entry testfile2.txt in /home/cps346-01.15/tmp names a separate inode 24198 (link count 1), which points to block 31722 holding the path "/home/cps346-01.15/testfile.txt".
6.12 Editor Examples
6.13 od (Octal Dump) Command
6.14 File ‘Types’ and ‘Names’
6.15 Question to investigate
6.16 Set-uid Program
6.17 Login Process
6.18 Things to Do
6.19 find Command
Traverse a file hierarchy to find files and directories, and optionally execute a command line on all files found.
$ find . -name wc.c -print
$ find . -name "*.c" -print
$ find ~ -name "*.c" -print
$ find . -name "sf[1-9].cpp" -print
$ find . -type d -print
$ find ~ -type d -print
$ find $HOME -type f -print
$ find . -name "*.c" -exec chmod 660 {} \;
$ find . -name "*" -type f -exec chmod 400 {} \;
$ find . -name "*" -type d -exec chmod 500 {} \;
$ find . -name .DS_Store -exec rm {} \;
$ find . -name .DS_Store -delete
$ find . -name ".*rc" -print
$ find . \( -name "*~" -o -name "*.bak" \) -exec rm {} \;
6.20 Accounts
6.21 Character and Block Special Files in Linux
$ ls -l /devices/pci@1e,600000/ide@d/dad@0,0:a,raw
crw-r----- 1 root sys 136, 8 Feb 19 17:47 /devices/pci@1e,600000/ide@d/dad@0,0:a,raw
$ ls -l /devices/pci@1e,600000/ide@d/dad@0,0:a
brw-r----- 1 root sys 136, 8 Feb 19 02:33 /devices/pci@1e,600000/ide@d/dad@0,0:a
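The leading c and b in the listings above come from the file-type bits of st_mode. As a minimal sketch, the function below maps a mode to the first character that ls -l prints. The name file_type_char is hypothetical, and only the types discussed in this chapter are distinguished.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <sys/stat.h>

/* Returns the type character ls -l prints first for the given st_mode,
 * or '?' for a type this sketch does not distinguish. */
char file_type_char(mode_t mode) {
    if (S_ISREG(mode)) return '-';   /* plain file */
    if (S_ISDIR(mode)) return 'd';   /* directory */
    if (S_ISLNK(mode)) return 'l';   /* symbolic link */
    if (S_ISCHR(mode)) return 'c';   /* character special file */
    if (S_ISBLK(mode)) return 'b';   /* block special file */
    return '?';
}
```

To see l for a symbolic link, the mode must come from lstat, since stat follows the link and reports the type of its target.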
6.22 Conceptual Exercises for Chapter 6
Exercise 6.22.1: Can we solve the file renaming problem with the find command? For instance, find home -name route.c -exec mv {} route.cpp
;. Explain.
Exercise 6.22.2: Write a single, complete command line to make (only) each plain file (not directories or links) ending in .txt in or below your login directory readable by you and writable by you and others, without giving any extraneous permissions.
Exercise 6.22.3: Write a single, complete command line to make (only) each directory (not plain files or links) named abc or acc residing in or below your current working directory readable by you and your group; writable by you; and searchable by you, your group, and everyone, without giving any extraneous permissions.
Exercise 6.22.4: Write a single, complete command line to remove all files ending in .core residing in or below your $HOME/bin, $HOME/C, and /Dropbox/homeworks directories. Your solution must work from any directory.
Exercise 6.22.5: Write a single, complete command line to find (only) all the plain files in your account (not directories or links) ending in .tex that contain the string Linux and C Programming, in any case. Your solution must work from any directory.
Exercise 6.22.6: Write a single, complete command line to find (only) all the plain files in your account (not directories or links) ending in .a (i.e., all archives) and run the command to list the contents of the archive on each one. Your solution must work from any directory.
Exercise 6.22.7: What does the following C code do?
1 while ( ( n=read (fd , buf , bufsize ) ) > 0) ;
Exercise 6.22.8: We illustrated in class that, however ironic, a homemade version of Linux cat using standard library functions executes faster than one using system calls read and write with a buffer of size one. What happens to the run-time speed of the latter as we increase the size of the buffer? At what point does the buffer size have no effect on the speed of the program?
Exercise 6.22.9: Create a file named -r. Describe how you did this. Now remove the file with the rm command. Describe how you did this.
Exercise 6.22.10: For cd to work properly, must it be a Linux command or a shell builtin? Explain with reasons.
Exercise 6.22.11: [KP84, Exercise 2-8, p.63] cp doesn’t copy subdirectories, it just copies files at the first level of a hierarchy. What does it do if one of the argument files is a directory? Is this kind or even sensible? Discuss the relative merits of three possibilities: 1. an option to cp to descend directories, 2. a separate command rcp (recursive copy) to do the job, or 3. just having cp copy a directory recursively when it finds one. What other programs would benefit from the ability to traverse the directory tree?
Exercise 6.22.12: Choose any file in /dev on our system. The fourth section of the Unix Reference Manual on our system has descriptions of special files. Use it to give a brief description (paraphrase the manual) of the file you’ve selected. You may need to abbreviate the name of your file when you invoke man.
If the file you have selected is a ‘symbolic link,’ which it probably is, follow it to an ‘original.’ Give the result of the ls -l command on that file, and explain the fields. Does its access list begin with a -, d, or l? Explain. Click here for an example file (do not use any contents from this in your solution). Your solution must take the form of this sample and provide a commensurate level of detail.
Exercise 6.22.13: [KP84, Exercise 3-14, p.94] Compare the here-document version of 411 with the original. Which is easier to maintain? Which is a better basis for a general service?
Exercise 6.22.14: [KP84, Exercises 2-6, p.62]
What is the difference between the command line $ mv junk junk1 and the command lines $ cp junk junk1 and $ rm junk invoked in sequence? Hint: make a link to junk, then try it.
Exercise 6.22.15: [KP84, Exercises 2-6, p.62] Why does ls -l report 4 links to recipes? Hint: try the command line ls -ld /usr/you. Why is this useful information?
Exercise 6.22.16: [KP84, Exercises 2-1, p.45]
What happens when you type <ctrl-d> to ed? Compare this to the command line ed < file.
Exercise 6.22.17: [KP84, Exercises 2-4, p.52]
du was written to monitor disk usage. Using it to find files in a directory hierarchy is at best a strange idiom, and perhaps inappropriate. As an alternative, look at the manual page for the find command, and compare the two commands. In particular, compare the command du -a | grep ... with the corresponding invocation of find. Which runs faster and how do you know? Is it better to build a new tool or use a side effect of an existing tool?
Exercise 6.22.18: Find the entry of the passwd file which corresponds to your account. Because of the shared file systems on our system, you will have to do some exploring. See passwd(1), passwd(5), and getent(1) for help. Write up your findings in a file called mypasswd in your text subdirectory. Include the following: the absolute path to the passwd file, how you found your entry, what your userid (uid) is, and what your groupid (gid) is. Click here for an example mypasswd file.
Exercise 6.22.19: Take the shell facilities described in the first chapter of [KP84].
Exercise 6.22.20: Explain the output of the following transcript.
    $ cat des
    process patterns building
    large scale software systems
    using object technology
    $ ln des ~/b
    $ rm des
    $ cat ~/b
Exercise 6.22.21: Explain how a hard link can be distinguished from the file to which it is linked.
Exercise 6.22.22: (true / false) The link count in an inode refers only to hard links.
Exercise 6.22.23: (true / false) The file representing a symbolic link does not have its own inode number.
Exercise 6.22.24: A symbolic link points to a (inode number or filename).
Exercise 6.22.25: Does the cp command follow symbolic links? If so, explain with examples.
Exercise 6.22.26: Does the find command follow symbolic links? If so, explain with examples.
Exercise 6.22.27: Does the tar command follow symbolic links? If so, explain with examples.
Exercise 6.22.28: While hard links cannot be made across filesystems, can you mv directories across filesystems? If so, how? Explain.
Exercise 6.22.29: What is the minimum number of links to the directory d in the following figure, if circles represent directories and rectangles non-directory files? Explain.
[Figure: tree rooted at d, with nodes labeled e, f, g, h, and i.]
Exercise 6.22.30: Give three types of file information which are in a file’s inode.
Exercise 6.22.31: Give one example of file information which is not in a file’s inode.
Exercise 6.22.32: When is an entry in the system file table freed?
Exercise 6.22.33: When is an entry in the in-memory inode table freed?
Exercise 6.22.34: [KP84, p.55] Consider the following session with some Linux system.
    $ ls -l /etc/passwd
    -rw-r--r-- 1 root wheel 1861 Mar 22 2005 /etc/passwd

    $ ls -l /bin/passwd
    -rwsrwxrwx 1 root wheel 35092 Mar 20 2005 /usr/bin/passwd
Is this setup advisable? Why or why not? Explain and be specific.
Exercise 6.22.35: Give a complete command line which will set the permission on a file named permfile to -r-x-w-rwx. Use octal notation.
Exercise 6.22.36: Write a complete command line to make (only) each plain file (not directories or links) ending in .txt in or below your login directory readable by you and writable by you and others, without giving any extraneous permissions.
Exercise 6.22.37: Write a complete command line to make (only) each plain file (not directories or links) ending in .txt in or below your login directory readable by you and writable by you and others, without giving any extraneous permissions (use octal codes).
Exercise 6.22.38: Write a complete command line to make (only) each directory (not plain files or links) named abc or acc residing in or below your current working directory readable by you and your group; writable by you; and searchable by you, your group, and everyone, without giving any extraneous permissions.
Exercise 6.22.39: Suppose you have a file $HOME/a/bfile. Give the command to set the permissions of bfile so that it would be readable by you, writable by you and your group, and executable by others, without giving any extraneous permissions.
Exercise 6.22.40: Suppose you have a file $HOME/a/bfile. How would you arrange it, without giving any extraneous permissions, so that bfile would be readable by you and your group, writable by you, and executable by others?
Exercise 6.22.41: Suppose you had a file $HOME/C/a.out. How would you arrange it, without giving extraneous permissions, so that a.out would be readable by you and your group, writable by you, and executable by everyone? Make no assumptions on the existing permissions of the other files and directories in the account.
Exercise 6.22.42: Suppose you have a file $HOME/tmp/logutil. Give a complete command line to set the permissions of logutil so that it would be readable by you, writable by you and your group, and executable by others, without giving any extraneous permissions.
Exercise 6.22.43: Give one example of file information which is in its parent directory.
Exercise 6.22.44: What would you do to set up your environment such that files are created readable by you, your group, and others, but writable by only you and your group, without giving any extraneous permissions, in such a way that this setting would be in effect each time you logged in?
Exercise 6.22.45: What are the actual contents of a directory file and how can you find this information?
Exercise 6.22.46: Consider a text editor which performs the following sequence of operations when editing the file /dirA/name1.
• Open the file /dirA/name1.
• Read the entire file into memory.
• Close /dirA/name1.
• Modify the memory image of the file.
• Unlink /dirA/name1.
• Open the file /dirA/name1 (create and write flags).
• Write the contents of memory to the file.
• Close /dirA/name1.
Now, suppose that /dirA/name1 is an ordinary file and /dirB/name2 is a symbolic link to /dirA/name1. How are the files /dirB/name2 and /dirA/name1 related after the sequence of operations given above? For full credit, draw a figure depicting the inode pointers and structures before /dirA/name1 is opened in the editor and after it is closed in the editor.
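For experimenting with Exercise 6.22.46, the editor's sequence of operations can be replayed in C. The following is a minimal sketch under stated assumptions: edit_by_unlink, transfer, and upcase_first are hypothetical names introduced here, the edit is a stand-in that capitalizes the first byte, and the whole file is assumed to fit in memory. It only performs the listed steps; running ls -li before and after the call is left to the reader.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Stand-in for the user's edit (hypothetical): capitalize the first byte. */
void upcase_first(char *buf, size_t len) {
    if (len > 0)
        buf[0] = (char)toupper((unsigned char)buf[0]);
}

/* Transfer exactly len bytes with write (writing != 0) or read. */
static int transfer(int fd, char *buf, size_t len, int writing) {
    size_t done = 0;
    while (done < len) {
        ssize_t n = writing ? write(fd, buf + done, len - done)
                            : read(fd, buf + done, len - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/* Replays the editor's steps on path. Returns 0 on success, -1 on error. */
int edit_by_unlink(const char *path, void (*edit)(char *, size_t)) {
    assert(path != NULL && edit != NULL);
    struct stat sb;
    int fd = open(path, O_RDONLY);                 /* open the file */
    if (fd == -1)
        return -1;
    if (fstat(fd, &sb) == -1) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)sb.st_size;
    char *buf = malloc(len + 1);
    int ok = buf != NULL && transfer(fd, buf, len, 0) == 0;  /* read it all */
    close(fd);                                     /* close the file */
    if (ok) {
        edit(buf, len);                            /* modify the memory image */
        ok = unlink(path) == 0;                    /* unlink the name */
    }
    if (ok) {                                      /* open again, creating it */
        fd = open(path, O_WRONLY | O_CREAT | O_EXCL, sb.st_mode & 0777);
        ok = fd != -1;
    }
    if (ok) {
        ok = transfer(fd, buf, len, 1) == 0;       /* write memory to the file */
        if (close(fd) == -1)                       /* close the file */
            ok = 0;
    }
    free(buf);
    return ok ? 0 : -1;
}
```

A call such as edit_by_unlink("/dirA/name1", upcase_first) performs the eight steps in order; the permission bits of the re-created file are still subject to the process umask.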
6.23 Programming Exercises for Chapter 6
Exercise 6.23.47: Complete the definition of the isdirectory function below.
    #include <stdio.h>
    #include <time.h>
    #include <sys/stat.h>

    int isdirectory(char *path) {
        if (...)
            return 0;
        else
            return S_ISDIR(statbuf.st_mode);
    }
Exercise 6.23.48: Write a complete C program which takes a single filename argument and writes to stdout the number of links to that file. For full credit, your program must include all necessary error checking.
6.24 Programming Project for Chapter 6
Implement the Linux cp command.
6.25 Thematic Take-Aways
•
6.26 Chapter Summary
6.27 Key Terms
hard link, inode, soft link, symbolic link,
6.28 Bibliographic Notes
Part II: Communication and Concurrency
Chapter 7
Processes: Creation, Environment, Manipulation, and Communication
Author: Saverio Perugini Copyright © 2017 by Saverio Perugini ALL RIGHTS RESERVED
7.1 Chapter Objectives
• Establish an understanding of processes in Linux.
• Establish an understanding of process creation and manipulation in Linux.
• Establish an understanding of the interaction between a process and the environment in which it executes.
• Establish an understanding of interprocess communication through (unnamed and named) pipes (FIFOs).
• Introduce the client-server model of programming.
• Introduce Qt programming.
• Establish an understanding of the design and implementation of a command shell or command-line interface (CLI).
[Figure: state diagram. States: new, ready, running, blocked, done. Transition labels: created process, selected to run, quantum expired, I/O request, I/O complete, normal or abnormal termination. Regions: secondary memory, main memory, CPU.]
Figure 7.1: Process life cycle.
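The life cycle in Figure 7.1 can also be read as a small state machine. The sketch below is an illustration only: the enum names and the next_state function are hypothetical, and the arrow endpoints follow the usual reading of such a diagram (for example, that an I/O request moves a running process to blocked), which should be checked against the figure itself.

```c
#include <assert.h>

/* States and events named after the labels in Figure 7.1 (names are ours). */
enum pstate { P_NEW, P_READY, P_RUNNING, P_BLOCKED, P_DONE };
enum pevent {
    E_CREATED,          /* created process */
    E_SELECTED,         /* selected to run */
    E_QUANTUM_EXPIRED,  /* quantum expired */
    E_IO_REQUEST,       /* I/O request */
    E_IO_COMPLETE,      /* I/O complete */
    E_TERMINATED        /* normal or abnormal termination */
};

/* Returns the state reached from s on event e. If the diagram has no such
 * arrow, the state is returned unchanged. */
enum pstate next_state(enum pstate s, enum pevent e) {
    assert(s <= P_DONE);
    switch (s) {
    case P_NEW:
        return e == E_CREATED ? P_READY : s;
    case P_READY:
        return e == E_SELECTED ? P_RUNNING : s;
    case P_RUNNING:
        if (e == E_QUANTUM_EXPIRED) return P_READY;
        if (e == E_IO_REQUEST)      return P_BLOCKED;
        if (e == E_TERMINATED)      return P_DONE;
        return s;
    case P_BLOCKED:
        return e == E_IO_COMPLETE ? P_READY : s;
    case P_DONE:
        return s;       /* a finished process never changes state again */
    }
    return s;
}
```

Note that, under this reading, only a running process can block or terminate, and a blocked process must pass through ready before it runs again.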
7.2 Introduction
[RR03, p. 62]
[RR03, p. 24]
7.2.1 Process Identification
7.3 Process Creation: fork
[ATT][6-11]
7.3.1 Background Processes
7.3.2 fork Exercises
7.3.3 Conceptual Exercises for Section 7.3
Exercise 7.3.1: What is a process?
Exercise 7.3.2: Of what concept is timesharing an extension?
Exercise 7.3.3: What does timesharing enable in a computer system that is not possible in a system that is non-timeshared?
[Figure: the five main sections of a (C) program in main memory, from high address to low address: command-line arguments (argc, argv) and environment variables; stack (activation records for function calls: return address, parameters, saved registers, automatic variables, return value); heap (dynamic memory allocations from the malloc family, e.g., int* arrayofints; arrayofints = malloc(sizeof(*arrayofints)*10); giving 40 bytes for elements 0 through 9, resized with realloc and deallocated with free(arrayofints);); global section (uninitialized static data and initialized static data, e.g., float rate = 3.1;); program text.]
Figure 7.2: Logical layout of process in main memory.
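The heap operations named in Figure 7.2 (malloc, realloc, free) can be exercised with a short sketch. The helper names make_ints and resize_ints are hypothetical and exist only for this illustration; the 40 bytes in the figure correspond to ten elements under the assumption that sizeof(int) is 4 on the machine at hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate n ints on the heap, numbered 0..n-1. Returns NULL on failure.
 * The caller owns the block and must free() it. */
int *make_ints(size_t n) {
    assert(n > 0);
    int *arrayofints = malloc(sizeof(*arrayofints) * n);
    if (arrayofints == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        arrayofints[i] = (int)i;
    return arrayofints;
}

/* Resize the block to new_n ints and continue the numbering in any added
 * elements. On failure returns NULL and leaves the old block valid. */
int *resize_ints(int *arrayofints, size_t old_n, size_t new_n) {
    assert(new_n > 0);
    int *resized = realloc(arrayofints, sizeof(*resized) * new_n);
    if (resized == NULL)
        return NULL;
    for (size_t i = old_n; i < new_n; i++)
        resized[i] = (int)i;
    return resized;
}
```

The pointer variable that holds the result lives on the stack (or in the global section), while the integers it points to live on the heap until free is called on them.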
Exercise 7.3.4: (circle one) Which of the following is possible in a time-shared computer system (with only one processor with one core) that is not possible if the system is not time-shared:
(i) interactive programs
(ii) multiple processes running on the processor at once
(iii) non-interactive programs
(iv) (i), (ii) & (iii)
(v) none of the above
Exercise 7.3.5: [RR03, Exercise 4.19, p. 119] What is a system call and how does it differ from a library call? What does a system call generally cause to happen?
Exercise 7.3.6: What is an orphan process?
Exercise 7.3.7: (true / false) All zombie processes become orphans.
Exercise 7.3.8: [RR03, Exercise 4.30, p. 125] How does fork affect the system file table?
Exercise 7.3.9: [RR03, Exercise 4.33, p. 128] Give the output generated by the following C program.
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        printf("Linux and C");
        fork();
        return 0;
    }

[Figure: before the call to fork there is a single process (pid 12791) with its user area, stack, data, and text; after the call there are two processes, the parent (pid 12791) and the child (pid 12793), each with a user area, stack, data, and text, and each continuing in main at the test of the value returned by fork, which selects the parent branch or the child branch.]
Figure 7.3: Graphic depiction of fork.
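As a companion to Figure 7.3, the three possible return values of fork can be seen in a few lines of C. This is a minimal sketch, not code from the text: fork_and_collect is a hypothetical helper, and it uses waitpid only so that the parent can report what the child did.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks one child that exits immediately with `code`. The parent waits for
 * it and returns the child's exit status, or -1 if anything fails. */
int fork_and_collect(int code) {
    assert(code >= 0 && code <= 255);
    pid_t childpid = fork();
    if (childpid == -1)
        return -1;          /* fork failed: no child was created */
    if (childpid == 0)
        _exit(code);        /* child: fork returned 0 */

    /* parent: fork returned the process ID of the child */
    int status;
    if (waitpid(childpid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note the parentheses that a combined test needs: in if ((childpid = fork()) > 0) the assignment happens first, whereas without the inner parentheses childpid would receive the result of the comparison rather than a process ID.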
Exercise 7.3.10: Consider the following C code:
    c2 = 0;
    c1 = fork();        /* fork number 1 */
    if (c1 == 0)
        c2 = fork();    /* fork number 2 */
    fork();             /* fork number 3 */
    if (c2 > 0)
        fork();         /* fork number 4 */
Trace this program segment and determine how many processes are created. Assume that no errors occur. Draw a graph that shows how the processes are related. In this graph each process will be represented by a small circle containing a number that represents which fork created the process. The original process will contain 0 and the process created by the first fork will contain 1. There will be arrows from each parent to all of its children. Each arrow should point in a downward direction.
Exercise 7.3.11: Consider the following C program:
    #include <unistd.h>

    int main(void) {
        int c2 = 0;
        int c1 = fork();    /* fork number 1 */
        if (c1 == 0)
            c2 = fork();    /* fork number 2 */
        fork();             /* fork number 3 */
        if (c2 > 0)
            fork();         /* fork number 4 */
    }
Trace this program to determine how many processes are created. Assume that no errors occur. Draw a graph that shows how the processes created are related. In this graph each process will be represented by a small circle containing a number that represents which fork created the process. The original process will contain 0 and the process created by the first fork will contain 1. There will be arrows from each parent to all of its children. Each arrow should point in a downward direction. Be careful.
Exercise 7.3.12: Consider the following C program.
    #include <stdio.h>
    #include <unistd.h>

    int main() {
        pid_t childpid;
        int i;

        childpid = fork();

        for (i = 0; i < 10 && childpid == 0; i++) {

            if (childpid == -1) {
                perror("Failed to fork.");
                return 1;
            }

            fprintf(stderr, "A");

            childpid = fork();

            if (childpid == 0) {
                fprintf(stderr, "B");
                childpid = fork();
            }
        }

        return 0;
    }
a) How many processes does this program spawn (include the original process in your count)? Give a brief explanation of how you arrived at your answer.
b) What is the output of this program?
Exercise 7.3.13: Consider the following C program.
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {

        pid_t childpid = 0;
        int i = 2;

        fprintf(stderr, "PPID: %ld, PID: %ld, ping\n",
                (long) getppid(), (long) getpid());

        while (i <= 20 && childpid == 0)

            if ((childpid = fork()) == 0)
                sleep(1);

        fprintf(stderr, "PPID: %ld, PID: %ld, p%cng\n",
                (long) getppid(),
                (long) getpid(), i++ % 2 ? 'i' : 'o');
        return 0;
    }
Recall, the sleep library call blocks the calling process until n seconds have elapsed, where n is the argument to sleep.
a) How many processes does this program spawn (include the original process in your count)? Give a brief explanation of how you arrived at your answer.
i) 0 processes
ii) 20 processes
iii) an infinite number of processes
iv) none of the above
b) What is the output of this program?
The following questions refer to the program below.
[RR03, p. 126]
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/stat.h>

    int main(void) {
        char c = '!';
        int myfd;

        if ((myfd = open("input.txt", O_RDONLY)) == -1) {
            perror("Failed to open file");
            return 1;
        }

        if (fork() == -1) {
            perror("Failed to fork");
            return 1;
        }

        read(myfd, &c, 1);
        printf("Process %ld got %c\n", (long) getpid(), c);
        return 0;
    }
a) [RR03, Fig. 4.4, p. 126] Draw a diagram depicting the parent’s file descriptor table, the child’s file descriptor table, and the system file table at the line containing the call to read.
b) [RR03, Fig. 4.5, p. 127] Consider moving the if statement that opens input.txt so that it immediately follows the if statement that calls fork. How would this change affect the parent’s file descriptor table, the child’s file descriptor table, and the system file table at the line containing the call to read? Draw a diagram depicting the parent’s file descriptor table, the child’s file descriptor table, and the system file table at the line containing the call to read.
Exercise 7.3.14: Consider the following code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        pid_t childpid = 0;
        int i, n;

        if (argc != 2) {
            fprintf(stderr, "Usage: %s processes\n", argv[0]);
            return 1;
        }
        n = atoi(argv[1]);
        for (i = 1; i < n; i++)
            if ((childpid = fork()) == -1)
                break;

        fprintf(stderr, "i:%d process ID:%ld parent ID:%ld child ID:%ld\n",
                i, (long) getpid(), (long) getppid(), (long) childpid);
        return 0;
    }
Trace the execution of this program with a command-line argument of 4. Assume that no errors occur. Draw a graph which shows how the processes are related. In this graph each process will be represented by a small circle containing a number which represents the value of i at the time the process was created. The circle for the original process will contain 0. Use lowercase letters to distinguish processes which were created with the same value of i. There will be arrows from each parent to all of its children. Each arrow should point in a downward direction.
Exercise 7.3.15: Consider the following C program.
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        fork();    /* fork number 1 */
        fork();    /* fork number 2 */
        fork();    /* fork number 3 */
        printf("pid: %ld, ppid: %ld\n", (long) getpid(), (long) getppid());
        return 0;
    }
Trace this program to determine how many processes are created. Assume that no errors occur. Draw a graph which shows how the processes created are related. In this graph each process will be represented by a small circle containing a number which represents the fork which created the process. The original process will contain 0 and the process created by the first fork will contain 1. There will be arrows from each parent to all of its children. Each arrow should point in a downward direction. Be careful.
Exercise 7.3.16: Consider the following C program:
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main() {
        pid_t childpid = 0;
        int i;

        for (i = 1; i < 4; i++)
            if ((childpid = fork()) == -1)
                break;

        return 0;
    }
a) Trace this program to determine how many processes are created. Assume that no errors occur. Draw a directed graph which shows how the processes created are related. In this graph each process will be represented by a small circle containing a pid. You may assume the pid of the original process is 0 and that pids are assigned in increasing order of process creation, i.e., 1, 2, 3, . . . . There will be arrows from each parent to all of its children. Each arrow should point in a downward direction.
b) Modify this program in place above so that each parent process waits for all of its children to terminate before it terminates. You must only add code to the above program; you must not remove any code.
7.3.4 Programming Exercises for Section 7.3
Exercise 7.3.17: Write a C (not C++) program which spawns and synchronizes 20 processes to print the following to stderr (of course, with different process and parent process ids):
PPID: 310, PID: 497, ping
PPID: 497, PID: 498, pong
PPID: 498, PID: 499, ping
PPID: 499, PID: 500, pong
PPID: 500, PID: 501, ping
PPID: 501, PID: 502, pong
PPID: 502, PID: 503, ping
PPID: 503, PID: 504, pong
PPID: 504, PID: 505, ping
PPID: 505, PID: 506, pong
PPID: 506, PID: 507, ping
PPID: 507, PID: 508, pong
PPID: 508, PID: 509, ping
PPID: 509, PID: 510, pong
PPID: 510, PID: 511, ping
PPID: 511, PID: 512, pong
PPID: 512, PID: 513, ping
PPID: 513, PID: 514, pong
PPID: 514, PID: 515, ping
PPID: 515, PID: 516, pong
The first process must print ping to stderr and its child must print pong to stderr, then the child of that child must print ping to stderr and its child must print pong to stderr, and so on.
No sophisticated C library functions or Linux system calls, beyond what has been covered in this section, are necessary for this program. Do not use any C constructs not presented in this section.
Requirements:
a) Your program must be written in C (not C++) and compile cleanly with gcc.
b) The first process must print ping to stderr and its child must print pong to stderr, then the child of that child must print ping to stderr and its child must print pong to stderr, and so on.
c) Do not use the system call wait in your program because it is unnecessary and, of course, sleep cannot be used to synchronize the processes because it does not guarantee the order in which the processes will run.
d) The processes forked need not terminate in reverse order of creation; it is okay if the parent terminates before the child it forked and the command prompt displays between some of the lines of output.
e) Keep your program to approximately 25 lines of code.
Exercise 7.3.18: [RR03, Program 3.2, p. 68] Write a complete main() function which creates a fan of n processes, where n is a command-line argument.
Exercise 7.3.19: Write a complete C program which creates a chain of n (given as a command-line argument) processes which terminate in reverse order of creation. For full credit, your program must check for errors.
7.4 Process Environment
7.4.1 Variables
7.4.2 Accessing the Environment
    /* outputs the contents of its environment list */
    #include <stdio.h>

    extern char **environ;

    int main(void) {
        int i;

        printf("The environment list follows:\n");
        for (i = 0; environ[i] != NULL; i++)
            printf("environ[%d]: %s\n", i, environ[i]);
        return 0;
    }
[RR03, p. 49]
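Because each entry of environ is a string of the form name=value, looking up a single variable amounts to a linear scan of that list. The sketch below shows the idea; lookup_env and agrees_with_getenv are hypothetical names used only here, and real programs should simply call the standard getenv function, which the second helper uses as a cross-check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Returns a pointer to the value of `name` in the environment list, or NULL
 * if there is no entry of the form name=value. */
const char *lookup_env(const char *name) {
    assert(name != NULL);
    size_t len = strlen(name);
    if (environ == NULL || len == 0)
        return NULL;
    for (char **ep = environ; *ep != NULL; ep++) {
        /* match the whole name, then require the '=' separator */
        if (strncmp(*ep, name, len) == 0 && (*ep)[len] == '=')
            return *ep + len + 1;
    }
    return NULL;
}

/* Returns 1 when the scan above and the library's getenv agree on `name`. */
int agrees_with_getenv(const char *name) {
    const char *mine = lookup_env(name);
    const char *libs = getenv(name);
    if (mine == NULL || libs == NULL)
        return mine == libs;
    return strcmp(mine, libs) == 0;
}
```

The check for '=' matters: without it, a lookup of PATH would also match an entry such as PATHEXT=... that merely begins with the same characters.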
    #include <stdio.h>
    #include <stdlib.h>
    #define MAILDEFAULT "/var/mail"

    /* POSIX standard specifies that shell should use MAIL if MAILPATH not set */
    int main(void) {
        char *mailp = NULL;

        if ((mailp = getenv("MAILPATH")) == NULL)
            if ((mailp = getenv("MAIL")) == NULL)
                mailp = MAILDEFAULT;
        return 0;
    }
[RR03, p. 50]
7.4.3 New Account Environment
7.4.4 Command-line Tips
7.4.5 PATH Variable
    #include <stdio.h>
    #include <stdlib.h>
    #define PATH_DELIMITERS ":"

    int tokenizepath(const char *s, const char *delimiters, char ***argvp);

    int main(void) {

        char **tokenized_path = NULL;
        char *path = getenv("PATH");

        if (tokenizepath(path, PATH_DELIMITERS, &tokenized_path) != -1)
            while (*tokenized_path != NULL)
                printf("%s\n", *tokenized_path++);
        return 0;
    }
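The listing above declares tokenizepath but does not define it. One possible definition, matching the prototype and the -1 error convention used in main, is sketched here. It is an assumption about how such a function could be written, not the author's implementation: it relies on strtok, so empty fields in PATH (which conventionally denote the current directory) are skipped, and the memory it allocates is never released.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Splits s at any character in delimiters. On success stores a
 * NULL-terminated array of tokens in *argvp and returns the number of
 * tokens. On failure returns -1 with errno set. The tokens point into a
 * private copy of s, so s itself is left unmodified. */
int tokenizepath(const char *s, const char *delimiters, char ***argvp) {
    if (s == NULL || delimiters == NULL || argvp == NULL) {
        errno = EINVAL;
        return -1;
    }
    *argvp = NULL;

    /* first pass over a scratch copy to count tokens (strtok writes NULs) */
    char *scratch = strdup(s);
    if (scratch == NULL)
        return -1;
    int count = 0;
    for (char *t = strtok(scratch, delimiters); t != NULL;
         t = strtok(NULL, delimiters))
        count++;
    free(scratch);

    /* second pass over the copy the tokens will live in */
    char *copy = strdup(s);
    char **tokens = malloc(((size_t)count + 1) * sizeof(*tokens));
    if (copy == NULL || tokens == NULL) {
        free(copy);
        free(tokens);
        return -1;
    }
    int i = 0;
    for (char *t = strtok(copy, delimiters); t != NULL;
         t = strtok(NULL, delimiters))
        tokens[i++] = t;
    tokens[i] = NULL;
    assert(i == count);
    if (count == 0)
        free(copy);         /* no token refers to the copy */

    *argvp = tokens;
    return count;
}
```

Compiled together with the main function above, this prints one PATH directory per line. If PATH is unset, getenv returns NULL, tokenizepath returns -1, and nothing is printed.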
7.4.6 Korn Shell Configuration and Customization
7.4.7 .profile vs. (value of) ENV
7.4.8 .plan and .project
7.4.9 Configuring vi
7.4.10 Conceptual Exercises for Section 7.4
Exercise 7.4.1: Can a descendant shell pass variables up to an ancestor shell or is it just a one-way street from ancestor to descendants? Explain.
Exercise 7.4.2: Which command indicates which computer you are logged into?
Exercise 7.4.3: Which command indicates the name and version of the OS running on the computer into which you are logged?
Exercise 7.4.4: Explain why it might be a good idea to single quote the character you are setting a kernel metacharacter to using stty (e.g., $ stty kill 'a').
Exercise 7.4.5: Assume the pound (#) and backslash (\) characters serve as the erase and escape (kernel) metacharacters, respectively, and that, for each file-name pattern-matching expression, there is at least one file which matches that expression. Explain the output of each of the following Korn shell command lines (assume each is entered at the keyboard; something else might appear on the screen).
a) $ kill 5678
b) $ grep where are we going?
c) $ \# pwd
d) $ sort myfile | mail vijay
e) $ ls myfile > thisfile &
f) $ du -a bc[de]f
g) $ ps
h) $ ls -ld .
Exercise 7.4.6: What is a daemon process? Give an example.
Exercise 7.4.7: What is the difference between the .kshrc file and the .profile file? When is each sourced?
Exercise 7.4.8: Consider the following:
The env utility examines the environment and modifies it to execute another command. When called without arguments, the env command writes the current environment to standard output. The optional utility argument specifies the command to be executed under the modified environment. The optional -i argument means that env should ignore the environment inherited from the shell when executing utility. Without the -i option, env uses the [name=value] arguments to modify rather than replace the current environment to execute utility. The env utility does not modify the environment of the shell that executes it [RR03, p. 54].
Consider the following session with Linux:
    $ env
    HOST=wonderland
    TERM=xterm-color
    SHELL=/bin/bash
    HISTORY=32
    USER=alice
    PAGER=less
    HOME=/characters/alice
    CAT=pat
    $ env CAT=tom
    HOST=wonderland
    TERM=xterm-color
    SHELL=/bin/bash
    HISTORY=32
    USER=alice
    PAGER=less
    HOME=/characters/alice
    CAT=tom
    $ env -i CAT=tom
    CAT=tom
    $ env -i CAT=jerry echo $CAT
    pat
    $
Explain why the last invocation of the env command does not print jerry. According to the env specification above, it seems as if it should. Hint: The answer has nothing to do with the fact that env does not modify the environment of the shell that executes it.
Exercise 7.4.9: What does the PATH variable do?
Exercise 7.4.10: What would you do to change the value of the PATH variable by extending it with $HOME/bin at the beginning of its current value, in such a way that this new value would be in effect each time you logged in, and its value would also affect all descendant processes of your login shell? Give a complete command line and explanation.
Exercise 7.4.11: Assume you have an executable file $HOME/bin/ls and your PATH is modified as indicated above. Explain what the which ls command line would output and why.
Exercise 7.4.12: Consider the following series of command lines and outputs:
    1 $ cat .profile
      ENV=".kshrc"
      PS1="$ "
      EXINIT="showmode showmatch ruler"
    2 $ cat .kshrc
      ksh $HOME/.profile
    3 $ ksh
(true / false) The variables ENV, PS1, and EXINIT will all be visible to the child shell created on line 3.
Exercise 7.4.13: Consider the following series of command lines and outputs:
    $ cat .profile
    ENV=".myenv"; export ENV
    ADDENV=".kshrc"
    export PS1="Go ahead $ "
    EXINIT="showmode showmatch ruler"
    . $ADDENV
    $ cat .kshrc
    . $HOME/.profile
Identify the most critical problem above.
Exercise 7.4.14: Consider the following series of command lines and outputs:
    $ cat .profile
    ENV=".myenv"; export ENV
    ADDENV=".kshrc"
    export PS1="$ "
    EXINIT="showmode showmatch ruler"
    . $ADDENV
    $ cat .kshrc
    . $HOME/.profile
What will this configuration cause to happen when the user logs on to the system?
Exercise 7.4.15: Consider the following series of command lines and outputs:
    1 $ cat .profile
      ENV=".myenv"; export ENV
      ADDENV=".kshrc"
      export PAGER=less
      EXINIT="showmode showmatch ruler"
      . $HOME/.kshrc
    2 $ cat .kshrc
      alias ll=ls -l
    3 $ cat .myenv
      . $HOME/.kshrc
      $ ksh
    4 $ echo $PAGER
    5
    6 $ echo $EXINIT
    7
a) What is printed on line 5?
b) What is printed on line 7?
Exercise 7.4.16: Consider the following series of command lines and outputs (executed in this order):
     1 $ export PAGER=less
     2 $ EXINIT="showmode showmatch ruler"
     3 $ ksh
     4 $ echo $PAGER
     5
     6 $ echo $EXINIT
     7
     8 $ export A=10
     9 $ exit
    10 $ echo $A
    11
    12 $
a) What is printed on line 5?
b) What is printed on line 7?
c) What is printed on line 11?
Exercise 7.4.17: Consider the following series of command lines and outputs (executed in this order):
     1 $ PAGER=less
     2 $ export EXINIT="showmode showmatch ruler"
     3 $ bash
     4 $ echo $PAGER
     5
     6 $ echo $EXINIT
     7
     8 $ export A=10
     9 $ exit
    10 $ echo $A
    11
    12 $
a) What is printed on line 5?
b) What is printed on line 7?
c) What is printed on line 11?
Exercise 7.4.18: Consider the following series of command lines and outputs:
     1 $ cat .profile
       ENV=".myenv"; export ENV
       ADDENV=".kshrc"
       EXINIT="showmode showmatch ruler"
       . $HOME/.kshrc
     2 $ cat .kshrc
       alias dir=ls
     3 export EXINIT
     4 $ cat .myenv
       . $HOME/.kshrc
     5 $ ksh
     6 $ export PAGER=less
     7 $ ^D
     8 $ echo $PAGER
     9
    10 $ echo $EXINIT
    11
a) What is printed on line 9?
b) What is printed on line 11?
Exercise 7.4.19: Consider the following series of command lines and outputs:
     1 $ cat .profile
     2 ENV=".myenv"; export ENV
     3 ADDENV=".kshrc"
     4 EXINIT="showmode showmatch ruler"
     5 . $HOME/.kshrc
     6 $ cat .kshrc
     7 alias dir=ls
     8 export EXINIT
     9 $ cat .myenv
    10 . $HOME/.kshrc
    11 $ ksh
    12 export PAGER=less
    13 $ ^D
    14 $ echo $PAGER
    15
    16 $ echo $EXINIT
    17
    18 $
a) What is printed on line 15?
b) What is printed on line 17?
Exercise 7.4.20: Consider the following series of command lines and outputs:
    1 $ cat .profile
      ENV=".myenv"; export ENV
      ADDENV=".kshrc"
      export PAGER=less
      EXINIT="showmode showmatch ruler"
      . $HOME/.kshrc
    2 $ cat .kshrc
      alias ll="ls -l"
    3 $ cat .myenv
      . $HOME/.kshrc
    4 $ ksh
    5 $ echo $PAGER
    6
    7 $ echo $EXINIT
    8
a) What is printed on line 6?
b) What is printed on line 8?
Exercise 7.4.21: Consider the following session with the Korn shell:
$ pwd
/home/cps444-n1.19
$ ls
C/ bin/ text/ wc.c
$ cd C
$ pwd
/home/cps444-n1.19/C
$ ls
cat.c myshell.c mine.c
$ cd text
$ pwd
/home/cps444-n1.19/text
Give and explain two directories in the user’s CDPATH.
Exercise 7.4.22: Consider the following series of command lines and outputs:
$ whoami
linda
$ pwd
/home/linda
$ ls -F
C/ bin/ text/
$ cd C
$ pwd
/home/linda/C
$ ls
cat.c wc.c env.c
$ cd text
$ pwd
/home/linda/text
Give two directories in linda’s CDPATH.
Exercise 7.4.23: Customizing Your Shell
The ksh Manpage and Your Environment Configuration:
Examine the manpage for the Korn shell (i.e., ksh). Remember, the ksh manpage is the sole authoritative reference for ksh on any system. The manpage explains all of the features supported by the shell and also documents the various shell variables that tailor the behavior of the shell.
Next, examine your .profile and .kshrc files in your home directory. See which shell variables are set or altered by these files as well as which Linux commands are started from them. Take care to ensure that you understand the behavior these settings affect before you change any. Being
careless can result in your account or files becoming inaccessible, even to you! Take note of any command aliases that have already been created for you in the default setup of your account.
Customization:
Choose eight different aspects of the Korn shell’s behavior that you can alter by setting or modifying the values of shell variables (i.e., not by simply adding commands). Modify your startup files so that your customizations will take effect at a desired time (e.g., at login or when you start a new shell or both). In addition, add or modify two Linux commands or shell built-ins that are called from these startup files. Lastly, and in addition to setting shell variables and invoking commands, create or modify five command aliases to use as shorthand abbreviations for commands. Choose aliases that you believe will be personally useful.
Your solution file must be a plain ASCII text file in the format defined below, describing each shell variable, command, and key binding you added or modified. Do not insert any extra notes or explanations (other than what is asked for here). Specifically,
a) describe the change you made
b) describe the purpose of the modification and how the behavior of the shell differs as a result
c) name the startup file in which you placed the change
d) explain why you made the change in that file
Similarly, do the same for each command and command alias you added or modified. Write no more than one sentence for each of components (a), (b), and (d) of each answer (there are fifteen). Simply provide a filename for component (c) of each answer. Your answer should be concise, but also complete and correct.
At the end of your file, include a copy of each startup file that you modified. Use :r .profile and :r .kshrc in vim. If you include more than one file, be sure to clearly mark where each file begins and ends.
There is a template for your ASCII file available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/envcust.txt to be used as a starting point. Do not use any of the exact modifications listed in this template in your submission. When your ASCII file is complete, convert it to PDF using the following commands:
$ enscript -o envcust.ps envcust.txt # converts ASCII to PostScript
$ ps2pdf envcust.ps # converts PostScript to PDF
Exercise 7.4.24: Describe (in your own words) the difference between the file named by the value of the ENV variable and the .profile file. When is each sourced? Describe why the designers distributed the placement of account configuration information across two separate files versus one central file. Be specific.
7.4.11 Programming Exercise for Section 7.4
Exercise 7.4.25: [RR03, pp. 54–55] Implement the Linux env utility in C.
The env utility examines the environment and modifies it to execute another command. When called without arguments, the env command writes the current environment to standard output. The optional utility argument specifies the command to be executed under the modified environment. The optional -i argument means that env should ignore the environment inherited from the shell when executing utility. Without the -i option, env uses the [name=value] arguments to modify rather than replace the current environment to execute utility. The env command does not modify the environment of the shell which executes it. [See the env manpage for more information.]
SYNOPSIS
env [-i] [name=value] ... [utility [argument ...]]
POSIX: Shell and Utilities
Requirements:
Write a program which behaves in the same way as the env utility when executing another program.
a) [This exercise is asking you to implement env from scratch, not just call the system’s installed version of env from a C program.]
b) When called with no arguments, the env utility calls the getenv function and outputs the current environment to standard output.
c) When env is called with the optional -i argument, the entire environment is replaced by the name=value pairs. Otherwise, the pairs modify or add to the current environment.
d) If the utility argument is given, use system to execute utility after the environment has been appropriately changed. Otherwise, print the changed environment to standard output, one entry per line. Check the return value of system to handle any errors.
e) One way to change the current environment in a program is to overwrite the value of the environ external variable. If you are completely replacing the old environment (-i option), count the number of name=value pairs, allocate enough space for the argument array (do not forget the extra NULL entry), copy the pointers from argv into the array, and set environ.
f) If you are modifying the current environment by overwriting environ, allocate enough space to hold the old entries and any new entries to be added. Copy the pointers from the old environ into the new one. For each name=value pair, determine whether the name is already in the old environment. If the name appears, just replace the pointer. Otherwise, add the new entry to the array.
g) Note that it is not safe to just append new entries to the old environ, since you cannot expand the old environ array with realloc. If all the name=value pairs correspond to entries already in the environment, just replace the corresponding pointers in environ.
h) [Return a different integer as an exit status for an invalid option than that returned for an invalid utility. Mimic the behavior of env on a Linux system.]
i) [Your program must be written in C (not C++) and compile without errors or warnings using gcc on a Linux system.]
[If designed properly, the program required to solve this homework should occupy no more than 200 lines of code.] [RR03, pp. 54–55]
Use the env command on the system as a reference executable for this exercise:
$ env -i env
$ env -i A=1 B=2 env
A=1
B=2
Exercise 7.4.26: Complete Programming Exercise 7.4.25 in Go subject only to the following modifications. If the utility argument is given, use exec.Command to execute utility after the environment has been appropriately changed. Otherwise, print the changed environment to standard output, one entry per line. Check the error returned when the command created by exec.Command is run to handle any errors. Your program must be written in Go and compile without errors or warnings using go build on a Linux system. If designed properly, the program required to solve this homework should occupy no more than 100 lines of code.
7.5 Process Manipulation: wait and exec
7.5.1 wait
[ATT][6–41]
7.5.2 fork and wait Exercises
7.5.3 exec
[ATT][6–21]
Figure 7.4: Graphical depiction of wait. [Figure labels: exit(2), wait(2), status, 0, signal #.]
[Figure: a process (pid: 12791) whose main() calls execl("new pgm", ...), shown BEFORE and AFTER the call, with its TEXT, DATA, STACK, and USER AREA regions.]
Figure 7.6: Graphical depiction of suite of exec system calls. [Figure labels: execl, execlp, and execle (l: literal argument list, compile-time); execv, execvp, and execvpe (v: argument vector, run-time); p: path; e: env.]
7.5.4 Investigating Questions
7.5.5 Process Review
7.5.6 Other Things to Know
7.5.7 Conceptual Exercises for Section 7.5
Exercise 7.5.1: [SGG07, Exercise 3.4, pp. 125–126] Consider the following C program:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int value = 5;

int main() {
    pid_t pid = 0;

    pid = fork();

    if (pid == 0)
        value += 15;
    else if (pid > 0) {
        wait(NULL);
        fprintf(stderr, "%d\n", value);
    }
    exit(0);
}
Give and explain the output of this program.
Exercise 7.5.2: Consider the following:
A process fully terminates when:
a) its parent has executed wait(&status), and
b) it exits or is killed by a signal (e.g., <ctrl-C>).
Answer the following questions.
a) In what order do the steps above occur during normal process termination?
b) What happens if they occur in the reverse of normal order?
c) What happens if (b) occurs, but (a) never occurs?
Exercise 7.5.3: What does the following program guarantee?
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    int pid;
    int status;
    printf("Hello World!\n");
    pid = fork();

    if (pid == -1) {
        perror("bad fork");
        exit(1);
    }

    if (pid == 0)
        ...
    else {
        wait(&status);
        ...
    }
}
7.5.8 Programming Exercises for Section 7.5
Exercise 7.5.4: [SGG07, Exercise 3.6, p. 126] The Fibonacci sequence is the series of numbers 0, 1, 1, 2, 3, 5, 8, . . . Formally, it is expressed as
$fib_0 = 0$
$fib_1 = 1$
$fib_n = fib_{n-1} + fib_{n-2}$.
Write a complete C program that spawns n processes which cooperate to compute and print the first n Fibonacci numbers, where n is given as a command-line argument, such that each process computes and prints only
one number in the sequence. The processes must terminate in reverse order of creation. Be careful to synchronize the processes so that the numbers are printed in the correct order. For instance,
$ ./a.out 2
0 1
$ ./a.out 3
0 1 1
$ ./a.out 4
0 1 1 2
$ ./a.out 5
0 1 1 2 3
$ ./a.out 6
0 1 1 2 3 5
$ ./a.out 7
0 1 1 2 3 5 8
$ ./a.out 12
0 1 1 2 3 5 8 13 21 34 55 89
$
Exercise 7.5.5: Write a complete C program that takes one or more command-line arguments that represent a (valid or invalid) Linux command and invokes that command as efficiently as possible. You may assume that the command line will never contain any quotes or other special characters. Do not use the library call system in your program. Your program must check for errors. Keep your program to approximately 15 lines of code.
Examples:
$ ./a.out ps
  PID TTY          TIME CMD
 1707 pts/2    00:00:00 ps
16649 pts/2    00:00:00 bash
$ ./a.out cal 9 1990
   September 1990
Su Mo Tu We Th Fr Sa
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30
$ ./a.out echo hello world
hello world
$
Exercise 7.5.6: This exercise is an extension of Programming Exercise 4.31.36. Specifically, extend your solution to Programming Exercise 4.31.36 so that the lines read represent Linux commands to be executed. After the command line is tokenized and stored in the array of arguments, rather than printing it, fork a child and then have the parent wait for the child and have the child execvp the Linux command using the argument vector. This time print a prompt for input to standard error, as shown below. For instance,
$ gcc ourshell.c -o ourshell
$ ./ourshell
ourshell> date
Wed Feb 10 10:59:27 EST 2016
ourshell> hostname
cpssuse07
ourshell> uname
Linux
ourshell> uname -a
Linux cpssuse07 3.11.10-29-desktop #1 SMP PREEMPT Thu Mar 5 16:24:00 UTC 2015 (338c513) x86_64 x86_64 x86_64 GNU/Linux
ourshell> wc -l
hello world
good
bye
^D
3
ourshell> ls -l -a ~/.profile
-rw------- 1 lucia wheel 2087 Aug 11 2015 .profile
ourshell>
ourshell> cal 9 1752
   September 1752
Su Mo Tu We Th Fr Sa
       1  2 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
ourshell> gcc parsestring.c -o parsestring
ourshell> ./parsestring
one two three four
:one:
:two:
:three:
:four:
ls -a -l myfile
:ls:
:-a:
:-l:
:myfile:
^D
ourshell> lsss
./ourshell: lsss: No such file or directory
ourshell> ./parsestring1
./ourshell: ./parsestring1: No such file or directory
ourshell> ^D
$
Exercise 7.5.7: Consider the following scenario: You are programming a Raspberry Pi computer running a Linux kernel. You have a compiled program (i.e., an executable, e.g., called utility) that performs some task once (e.g., flashes an LED), and may take some command-line arguments (e.g., the number of times you want the LED to flash). What you want to do is write another program, named repeat.c, that accepts that other program (e.g., utility) as a command-line argument and continually executes it, from start to completion, as its own process (i.e., not as part of the repeat process). The repeat process never terminates (i.e., it runs forever, like a daemon). Write the complete repeat.c C program. Do not use system in your program, and keep your program to approximately 15 lines of code. Hint: the repeat process never has more than one child at a time and never terminates before it.
Examples:
$ gcc echohello.c -o utility
$ ./utility 1
hello
$ ./utility 1 -nonewline
hello$ ./utility 2
hellohello
$ ./utility 3 -nonewline
hellohellohello$ ./utility 3
hellohellohello
$
$ gcc repeat.c -o repeat
$ ./repeat ./utility 2
hellohello
hellohello
hellohello
hellohello
hellohello
...
continues forever
...
$ ./repeat ./utility 1 -nonewline
hellohellohellohellohellohello ...... continues forever ......
Exercise 7.5.8: [RR03, pp. 88–89] Expand the process fan structure presented in this chapter through the development of a simple batch processing facility, called runsim, which is the start of a license manager for an application program.
Requirements:
Suggested library and system calls appear in parentheses.
a) Your source code must be written in C (not C++) and compile without error(s) or warning(s) using gcc on a Linux system.
b) Write a program called runsim which takes exactly one command-line argument specifying the maximum number of simultaneous executions.
c) Check for the appropriate command-line argument and output a usage message if the command line is incorrect.
d) Initialize pr_limit from the command line. The pr_limit variable specifies the maximum number of children allowed to execute at a time.
e) Initialize the pr_count variable to 0. The pr_count variable holds the number of active children.
f) Execute the following main loop until EOF is reached on standard input.
i) If pr_count is equal to pr_limit, wait for a child to finish and decrement pr_count.
ii) Read a line from standard input (fgets) of up to MAX_CANON characters and execute a program corresponding to that command line by forking a child (fork, makeargv, and execvp).
iii) Increment pr_count to track the number of active children.
iv) Check if any of the children have finished (waitpid with the WNOHANG option). Decrement pr_count for each completed child.
g) After encountering an end-of-file on standard input, wait for all the remaining children to finish and then exit.
h) Write a test program called testsim to test runsim. The program testsim must accept exactly two command-line arguments: the sleep time and the repeat factor. The repeat factor is the number of times testsim iterates a loop. In the loop, testsim sleeps for the specified sleep time and then outputs a message with its process ID to standard error. Use runsim to run multiple copies of the testsim program.
i) Create a test file called testing.data which contains command lines to run, e.g.,
testsim 5 10
testsim 8 10
testsim 4 10
testsim 13 6
testsim 1 12
j) Run the program by entering a command line such as the following:
runsim 2 < testing.data
k) Create a README file and log a list of your observations in it.
l) Develop a Makefile which builds your programs. Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code generation and compilation). Make sure that each directive also lists all files on which the derived file depends in its dependency list. Also, your Makefile must be written so that it carries out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring the final executables (runsim and testsim) up-to-date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability.
Figure 7.7: Process creation system calls. [Figure labels: $ date, date, fork (1), exec (2), wait (2), exit (3).]
7.6 Putting It All Together: Basic Shell Setup
[ATT][6–7]
7.7 Interprocess Communication
7.7.1 I/O Redirection
7.7.2 Implementing I/O Redirection
[RR03, p. 130]
[RR03, p. 131]
7.7.3 Helpful Functions
7.7.4 Unnamed and Named Pipes (FIFOs)
Simple (Unnamed) Pipes
Setting Up Pipelines in C
Figure 7.8: . [Figure labels: shell (e.g., bash), 1 fork(), 2 exec(), 2 wait(), program (e.g., a.out), 3 exit().]
Figure 7.9: Before redirection. [File descriptor table of a.out: 0 standard input, 1 standard output, 2 standard error.]
Figure 7.10: After redirection. [File descriptor table of a.out: 0 standard input, 1 write to testfile.txt, 2 standard error.]
Figure 7.11: Redirection steps. [Three file descriptor tables: after open (0 standard input, 1 standard output, 2 standard error, 3 write to testfile.txt), after dup2 (1 and 3 both write to testfile.txt), and after close (0 standard input, 1 write to testfile.txt, 2 standard error).]
Implementing ls -l | sort -n +4 [RR03, p. 191] [RR03, p. 191] [RR03, p. 192] [RR03]
Named Pipes (FIFOs)
Note about Pipes
7.7.5 C Model vs. Go Model
7.7.6 Signals and Job Control
Shell Job Control
$ ls &
[1] 1329
$ xclock -update 1&
[2] 1331
$ firefox &
[3] 1334

[1] + Stopped spell termpaper
[2] - Running find /usr -name main.exe -print &

PID TT STAT TIME COMMAND
12360 p0 S 0:01 -ksh (ksh)
12372 p0 I 0:00 main
Figure 7.12: After fork. [Parent and child share the pipe; each file descriptor table holds 0 standard input, 1 standard output, 2 standard error, 3 pipe read, 4 pipe write.]
Figure 7.13: After dup2. [One file descriptor table holds 0 pipe read, 1 standard output, 2 standard error, 3 pipe read, 4 pipe write; the other holds 0 standard input, 1 pipe write, 2 standard error, 3 pipe read, 4 pipe write.]
Figure 7.14: After close. [One file descriptor table holds 0 pipe read, 1 standard output, 2 standard error; the other holds 0 standard input, 1 pipe write, 2 standard error.]
Figure 7.15: ... [Figure labels: parent and child connected by pipe 1 and pipe 2; in the parent and child file descriptor tables, descriptors 0 and 1 refer to pipe ends (pipe 1 read, pipe 1 write, pipe 2 read, pipe 2 write) and descriptor 2 is standard error.]
Figure 7.16: Ring of processes communicating through (unnamed) pipes vs. ring of threads communicating through channels; key: □ = process, { or } = thread, and ∼ = pipe or channel.
12425 p0 R 0:00 ps -x
X-Windows
X Server
More Job Control
$ at 5:00pm
echo "Time to go home!"

$ at now + 2 minutes
echo "Move on to next topic."

$ crontab -e
$ crontab -l
Conceptual Exercises for Section 7.7.6
Exercise 7.7.1: What is a signal? What generates signals?
Exercise 7.7.2: What is job scheduling?
Figure 7.17: Shell job control. [Figure labels: states foreground (running), background (running), and background (stopped); transitions ^Z, kill -STOP %job_id ..., bg [%job_id ...], and fg [%job_id ...].]
Figure 7.18: X server. [Figure labels: ssh -X cpssuse06.cps.udayton.edu (client), sshd on cpssuse06.cps.udayton.edu (server; forwards X connection), $ firefox & (client), X server (Xming) (server), MH.]
Exercise 7.7.3: What is process scheduling?
Exercise 7.7.4: Processes in a UNIX pipeline (e.g., ls | more) execute (sequentially or concurrently).
Exercise 7.7.5: [SG, pp. 621–622] (true / false) Signals can be lost, i.e., if another signal of the same kind is sent before a previous signal has been received by the process to which it was directed, then the first signal is overwritten and only the last signal will be seen by the process.
Exercise 7.7.6: [SG, p. 622] (true / false) There is no relative priority among signals. For example, if a process is blocking SIGUSR1 and SIGUSR2 signals, and SIGUSR2 is sent to it before SIGUSR1, there is no guarantee that SIGUSR2 will be received first when the process unblocks both.
Exercise 7.7.7: [KP84, pp. 226–227] Some programs which want to detect signals simply cannot be stopped at an arbitrary point (e.g., in the middle of updating a complex data structure). How can we solve this problem? Write a complete C program which finishes the current iteration in its main processing loop (and only then exits) if it receives SIGINT in the loop.
[RR03, Program 8.11, p. 283]
Exercise 7.7.8: Can we write a C program to count the number of SIGUSR1 signals received without using a signal handler or without calling sigwait? If so, give the code. If not, write a program to count the number of SIGUSR1 signals received by calling sigwait, but without using a signal handler.
Exercise 7.7.9: (true / false) Writing to a Linux pipe is not an atomic operation.
Exercise 7.7.10: (true / false) Reading from a Linux pipe is not an atomic operation.
Exercise 7.7.11: (true / false) A controlling terminal can be redirected from the command line like standard input and standard output.
Exercise 7.7.12: Give a signal which cannot be ignored or caught by a handler.
Exercise 7.7.13: Signals occur asynchronously. Explain what this means.
Exercise 7.7.14: How do interrupts/signals add concurrency to a program?
Exercise 7.7.15: Assume POSIX guarantees that the function mystery is async-signal safe. This means that mystery can be safely called from within a signal handler. What else does this imply about mystery?
Exercise 7.7.16: (true / false) Since POSIX guarantees read to be async-signal safe, we need not restart read if it is interrupted by a signal.
Exercise 7.7.17: Explain the role played by signals in non-blocking I/O (also called asynchronous I/O).
Exercise 7.7.18: (true / false) Non-blocking I/O is not possible without the use of interrupts/signals.
Programming Exercises for Section 7.7.6
Exercise 7.7.19: Write a complete C program which waits for SIGUSR1 to arrive. The program should not do busy waiting and it should handle other signals while waiting for SIGUSR1.
Exercise 7.7.20: [KP84, pp. 225–226] The C signal handling facility is often used to enable a program to clean up unfinished business before terminating. Complete the following C program so that it ignores SIGINT only if it is already ignored; otherwise, it deletes its temporary file if SIGINT is received during processing.
#include <signal.h>
#include <stdlib.h>

/* an array, not a string literal, because mkstemp modifies its template */
char tempfile[] = "tmp.XXXXXX";

int main(void) {

    /* creates a temporary file */
    mkstemp(tempfile);

    /* processing */

    exit(0);
}
Exercise 7.7.21: [KP84, pp. 225–226] Sometimes we want to interpret a signal as a request to stop the current computation and return to a command-processing loop. Think of a text editor: interrupting a long printout should not cause it to exit and lose the work already done. Complete the following C program so that it ignores SIGINT only if it is already ignored; otherwise, it should return to the state just prior to the main processing loop if SIGINT is received in the loop.
#include <signal.h>
#include <stdlib.h>

int main(void) {

    for (;;) {
        /* main processing loop */
    }

    ...

    exit(0);
}
7.7.7 Conceptual Exercises for Section 7.7
Exercise 7.7.1: List the two primary interprocess communication mechanisms used in Linux and C programming presented here.
Exercise 7.7.2: Give the value of argc in a.out in the command line: $ ./a.out < infile > outfile.
Exercise 7.7.3: The questions below refer to the following program.
[RR03, Program 6.3, p. 190]
 1 #include <errno.h>
 2 #include <stdio.h>
 3 #include <unistd.h>
 4 #include <sys/types.h>
 5
 6 int main(void) {
 7    pid_t childpid;
 8    int fd[2];
 9
10    if ((pipe(fd) == -1) || ((childpid = fork()) == -1)) {
11       perror("Failed to setup pipeline");
12       return 1;
13    }
14
15    if (childpid == 0) {
16       if (dup2(fd[1], STDOUT_FILENO) == -1)
17          perror("Failed to redirect stdout of ....");
18       else if ((close(fd[0]) == -1) || (close(fd[1]) == -1))
19          perror("Failed to close extra pipe descriptors on ....");
20       else
21          ...
22       return 1;
23    }
24    if (dup2(fd[0], STDIN_FILENO) == -1)
25       perror("Failed to redirect stdin of ....");
26    else if ((close(fd[0]) == -1) || (close(fd[1]) == -1))
27       perror("Failed to close extra pipe file descriptors on ....");
28    else
29       ...
30    return 1;
31 }
Draw a diagram depicting the input/output infrastructure of the two processes, and give the file descriptor table for each process,
a) [RR03, Fig. 6.2, p. 191] after the call to fork executes, but before any call to dup2 executes
b) [RR03, Fig. 6.3, p. 191] after both calls to dup2 execute, but before any call to close executes
c) [RR03, Fig. 6.4, p. 192] after all calls to close execute
d) [RR03, Exercise 6.7, p. 190] Describe the effect of removing lines 18, 19, 26, and 27 from the program. What output would be generated? Why?
Exercise 7.7.4: (true / false) An open for reading on a Linux pipe blocks until at least one process has the pipe open for writing.
Exercise 7.7.5: (true / false) An open for writing on a Linux pipe does not block until at least one process has the pipe open for reading.
Exercise 7.7.6: (true / false) Writing to a Linux pipe is not an atomic operation.
Exercise 7.7.7: (true / false) A read on a Linux pipe blocks until something is written to the pipe.
7.7.8 Programming Exercises for Section 7.7
Exercise 7.7.8: Write a complete C program which implements ls -l >> ls.out as efficiently as possible. Do not re-implement ls. Rather, re-use the system’s ls command. Do not use the library function system in your program. Do not use more than five lines of code.
Exercise 7.7.9: Write a complete C program which implements ls -l | wc -l.
Exercise 7.7.10: Write a complete C program to construct a token ring of two processes as depicted in the image below.
[Figure: parent and child connected in a ring by pipe 1 and pipe 2; in the parent and child file descriptor tables, descriptors 0 and 1 refer to pipe ends (pipe 1 read, pipe 1 write, pipe 2 read, pipe 2 write) and descriptor 2 is standard error.] [RR03]
7.8 Client-server Programming
7.8.1 Observations on Client-server Programs
7.8.2 Experimental Runs of Client-server Programs
7.8.3 Conceptual Exercises for Section 7.8
7.8.4 Programming Exercises for Section 7.8
Exercise 7.8.1: The Fibonacci sequence is the series of numbers 0, 1, 1, 2, 3, 5, 8, . . . . Formally, it is expressed as
$fib_0 = 0$
$fib_1 = 1$
$fib_n = fib_{n-1} + fib_{n-2}$
Develop a system in C that prints the first n Fibonacci numbers using the client-server model with n clients using Linux named pipes as the interprocess communication mechanism, where n is given as a command-line argument, such that each client prints only one number in the sequence. Be careful to synchronize the processes so that the numbers are printed in the correct order.
Specifically, develop a program printer that can only print integers and a process adder that can only add integers. The printer process communicates two integers to the adder process. The adder process adds these two integers and communicates the result to the printer process to be printed to stderr. This cycle of events continues until all of the desired numbers in the sequence are computed and printed. Write two complete C programs below: one for the printer client processes and one for the adder server process. Assume no errors occur (i.e., for simplicity of exposition, you need not handle errors). The printer process must take n as a command-line argument. For instance,
$ ./adder &
[1] 12750
$ ./printer 7
0 1 1 2 3 5 8
$ ./printer 20
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181
$
Exercise 7.8.2: Develop a complete system that implements simple password authentication using the client-server model and using a Linux named pipe as the interprocess communication mechanism.
Specifically, develop a program server that accepts a password of exactly eight characters from a client and compares that password to the stored password and writes a message to stderr as shown below.
Write two complete C programs below: one for the server process and one for the client process. Assume no errors occur (i.e., for simplicity of exposition, you need not handle errors). The client process must take the password as a command-line argument. For instance (assume the password is passpass),
$ ./server &
[1] 12750
$ ./client passpass
Access granted.
$ ./client passport
Access denied.
$
As shown above, the server must persist across connections from multiple clients, and the client must not cause the server to terminate.
7.9 Client-server Programming in Qt
7.9.1 Programming Exercises for Section 7.9
Exercise 7.9.3: Integrating Qt and C: Build a graphical user interface in Qt, akin to that shown below, for a C program that raises a base to an exponent and returns the results (see below).
Header file power.h:

    #ifndef POWER_H
    #define POWER_H

    /* Allow the C function to be called from C++ (e.g., Qt) code. */
    #ifdef __cplusplus
    extern "C" {
    #endif

    int power(int x, int n);

    #ifdef __cplusplus
    }
    #endif

    #endif

Implementation of power:

    #include "power.h"

    /* Returns x raised to the power n (n >= 0) by repeated multiplication. */
    int power(int x, int n) {
       int result = 1;

       for (; n > 0; n--)
          result *= x;

       return result;
    }
7.10 Programming Project for Chapter 7
(This project is an extension of Programming Exercise 4.31.6 that involved building a simple shell.)
Implement a simple command shell (or command-line interpreter) in C. A shell is a fundamental user interface to any operating system and an example of systems software.
Requirements
The shell will loop continuously to accept user commands; it will terminate when quit is entered.
a) The command-line prompt need not contain the pathname of the current directory (item #6 on p. 158); use a simple $ for the prompt instead.
b) Your shell must not use its parent shell to provide any functionality since the idea of your shell is to potentially replace your login shell. In other words, assume that your shell is not running as a child on top of your login shell. This also means that you must not use the library function system anywhere in your program.
c) Your shell does not have to support background execution of programs (item #5 on p. 158).
d) You need not write a manual (i.e., manpage) for your shell (Project Requirements #2 on p. 158). Thus, a readme file should not be part of your submission.
e) Your shell does not have to support the clr, dir, and help internal commands (items #1ii, #1iii, #1vi, and #1ix on p. 157).
Internal commands
The shell must support the following internal commands, which should be handled by the shell itself and should not be handled by using exec to call an external program.
cd [[<pathname>]<directory>] Change the current default directory to <directory>. If <directory> is not present, report the current directory. If <directory> does not exist, an appropriate error message should be reported. The command should also change the PWD environment variable.
environ List all the environment strings.
echo <string> Display <string> on the display, followed by a newline.
pause Pause operation of the shell until the enter key is pressed.
set <varname> = <value> Sets a shell variable <varname> to the value <value>. Both <varname> and <value> may consist of a string of case-sensitive alphanumeric characters [a-zA-Z0-9], and each may be up to 32 characters long. Your shell should allow the creation of at least 255 distinct shell variables. Shell variables should be usable as part of any <string>, <pathname>, <directory>, or <value>. When a shell variable is used as part of a <string>, <pathname>, <directory>, or <value>, the effect should be that the variable is replaced with its corresponding value.
quit Quit the shell.
Program invocations
All the other command-line input is interpreted as program invocation, which should be done by the shell forking and executing the program as its own child process. The programs should be executed with an environment which contains the entry: parent=<pathname>/myshell. Upon finding the executable, the shell will echo the full path from the system root to the directory where the executable was found. If the executable is not found, the shell will issue an informative error message.
Path specifications
When appropriate, the user may include path specifications in commands, as indicated by <pathname> in the internal command specifications above, and elsewhere. The shell will accept path specifications which start with /, ./, and ../.
However, the user should not be required to include path specifications. In a program invocation, when no explicit path is given to an executable, the shell will search for the executable according to the values in the environment variable PATH. This value must be retrieved using the Linux library function getenv.
The shell environment should contain shell=<pathname>/myshell where <pathname>/myshell is the full path for the shell executable (not a hardwired path back to your directory, but the path from which it was executed).
Other considerations
The shell must take into account the attributes of relevant files. For example, if the command /usr/home/me/foo is entered and the specified file exists in the specified location, but is not executable, the shell will issue an informative error message.
The shell must be able to take its command line input from a file; i.e., if the shell is invoked with a command-line argument:
myshell < batchfile
then batchfile is assumed to contain a set of command lines for the shell to process. When EOF is reached, the shell should exit. If the shell is invoked without a command-line argument it solicits input from the user via a prompt on the display.
The shell must support I/O redirection on either or both stdin and stdout. That is, the command line
programname arg1 arg2 < inputfile > outputfile
will execute the program programname with arguments arg1 and arg2, the stdin file stream replaced by inputfile and the stdout replaced by outputfile.
stdout redirection should also be possible for the internal commands dir, environ, and echo.
With output redirection, if the redirection token is >, then the output
file is created if it does not exist, and truncated if it does and its write permissions are set. If the redirection token is >>, then the output file is created if it does not exist, and appended if it does. When an output file is created using redirection, its access permission must at least include read permission for the owner. If redirection targets an existing file whose write permissions are not set, the shell will issue an informative error message.
Changes to shell environment variables should be registered using setenv or putenv so those values will be visible when external program invocations are made. When your shell exits, the environment should be restored to the same state as before the shell was started.
Design and implementation
There are some explicit requirements, in addition to those on the Programming Style page of the course website:
a) You must decompose your implementation into separate source and header files, in some sensible manner which reflects the logical purpose of the various components of your design.
b) You must document your implementation according to our programming style guide.
c) You must properly allocate and de-allocate memory, as needed.
d) If your shell does not implement a specified feature, it should write an appropriate disclaimer when the user attempts to use that feature, something distinguishable from a normal error message resulting from a logically invalid command. Any such omissions should also be documented in the User Manual.
In general, you are expected to apply the design and implementation guidelines and skills covered in your previous computer science courses.
Recommendations and assumptions
There are some explicit assumptions you may make:
a) No command line will be longer than 100 characters, and no command will be given more than 10 arguments, not counting redirections.
b) Each command argument and redirection symbol will be preceded by at least one blank space.
c) You may find it helpful to consult the Linux manpages on fork, exec, getenv, access, waitpid, opendir, freopen, and those of the related Linux features cited in those manpages.
Additional Requirements:
a) Your implementation must be distributed across more than one source code file, in some sensible manner which reflects the logical purpose of the various components of your design, to encourage problem decomposition and modular design.
Makefile
a) Develop a Makefile which builds your shell.
b) Name your makefile Makefile (i.e., with an uppercase M). Details on writing a Makefile will be given in class; do not follow the Joe Citizen example given in Project Requirements #6 on p. 158.
c) Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code compilation). Make sure that each directive also lists all files on which the derived file depends in its dependency list. Also, your Makefile must be written so that it carries out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring the final executable myshell for your shell up-to-date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability. Your Makefile must bring everything up-to-date, using only gcc, without any warnings or errors, when make is invoked on our system.
d) Include a directive to produce the tarball necessary for submission (see below).
Hints
If designed properly, the program required for this project should occupy no more than 500 lines of code.
You are encouraged to develop your shell iteratively/progressively. Specifically,
a) start by implementing the execution of non-shell builtin Linux commands (e.g., ls);
b) then implement I/O redirection for non-shell builtin commands (e.g., ls > outfile and cat < infile);
c) then implement the shell builtin commands (e.g., environ or quit); and
d) finally, implement I/O redirection for the shell builtin commands (e.g., environ >> outfile).
Sample test data
There is a transcript of a Linux session here which illustrates the execution of our solution on several representative test cases. The input files used in the examples actually live on our Linux system (see the particular computer on which they were run at the top of the file) and you are encouraged to test your program with them for purposes of comparison. These test cases are not exhaustive.
7.11 Thematic Take-Aways
• A process cannot modify its parent.
7.12 Chapter Summary
7.13 Key Terms
fork process wait
7.14 Bibliographic Notes
Part III: Scripting
Chapter 8
Regular Expressions, Pattern Matching, and Filters
Author: Saverio Perugini Copyright © 2017 by Saverio Perugini ALL RIGHTS RESERVED
UNIX is not so much an operating system as a way of thinking. – Unknown.
The UNIX legacy is a set of simple and timeless tools that can take years to master but which can perform seeming miracles in seconds in the hands of experienced users. – a Bellevue Linux Users Group member, 2005.
8.1 Chapter Objectives
• Establish an understanding of basic and full regular expressions.
• Establish an understanding of grep and egrep.
• Establish an understanding of sed and awk.
• Establish an understanding of filter scripts.
• Establish an understanding of the Linux filter style of programming.
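[Figure 8.1 placeholder: states s1, s2, and s3. Transition s1 to s2 on [_a-zA-Z]; s2 loops on [_a-zA-Z0-9]; transition s1 to s3 on [1-9]; s3 loops on [0-9].]
Figure 8.1: A finite-state automaton for a legal identifier and positive integer in C defined by the regular grammar [_a-zA-Z][_a-zA-Z0-9]* + [1-9][0-9]*.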
8.2 Regular Expressions
A regular expression (RE) defines one or more strings of characters; a regular expression is said to match any string it defines. Regular expressions are typically written enclosed in special characters, called delimiters, which mark the start and end of a regular expression but are not part of the regular expression itself; we use forward slashes (/) here. For instance, /abc/ is a regular expression which matches the string abc. The strings matched by a regular expression can be recognized with a finite-state automaton (FSA), which has limited recognition capabilities (e.g., no memory) and, therefore, cannot match balanced parentheses. Fig. 8.1 presents a finite-state automaton [footnote 1] which recognizes sentences defined by the regular grammar [_a-zA-Z][_a-zA-Z0-9]* + [1-9][0-9]*, which describes positive integers and legal identifiers in C. Regular expressions are built using a combination of literal characters and metacharacters. A character is any character except a newline (e.g., a-z A-Z 0-9 ( ) = ; : ,). A metacharacter (or special character) is a character which represents something other than itself: . * [ ] ^ - $ / + ? | ( ) { }.
8.2.1 What /uses/ [Rr]eg.lar [Ee]xpre[s*]ions\?
Regular expressions are used by many Linux utilities, including editors and filters:
• the shell
• ex (Linux line editor; interactive)
• vi (Linux visual editor; interactive)
• emacs (general-purpose editor)
• tr (character translation tool)
• grep (global regular expression print; file searching tool/utility; returns the entire matched line, not just the matched string)
• sed (Linux stream editor; non-interactive)
• awk (pattern scanning and processing language)
• perl (Practical Extraction and Report Language; based on the Linux shell and sed and awk)
• py (Python scripting language)
Footnote 1: The FSA in Fig. 8.1 is not a pure FSA because it, like the grammar which defines the language it recognizes, uses syntactic sugar. While this FSA only has three transitions, it should have one for each individual input character which moves the automaton from one state to another. For instance, there should be nine transitions between states one and three, one for each positive digit.
8.2.2 Special or Metacharacters
• period . matches any single character.
/a.c/ matches abc adc aec a=c a:c
/x..x/ matches xaax xavx x=kx
• asterisk * matches zero or more occurrences of the previous regular expression; notice that this is different from the shell wildcard meaning.
/ab*c/ matches ac abc abbc abbbbbbbbbbbbbbbbc
/a*/ matches "" a aa aaaaaaaaaa
/a*b*c*/ matches?
/.*/ matches?
• square brackets, the character class symbol [] indicates a set of characters, any one of which can match; metacharacters (e.g., * and $) lose their special meaning within square brackets, with the following exceptions: the ^ character at the start means NOT, and the - character between characters refers to a range.
/[Mm]ark/ matches mark Mark
/t[aeiou]x/ matches tax tex tix tox tux
/[abc].*/ matches anything beginning with a or b or c
/[a-z][a-z]/ matches any two-letter lower-case string
/[a-zA-Z]*/ matches any word made of letters
/[^abc].*/ matches anything starting with something besides a or b or c
/[a-zA-Z0-9_]*/ matches?
To match a literal ^ in a character class, put it somewhere other than in the first position (e.g., [a-z^]).
To match a literal - in a character class, put it somewhere other than in between two characters (e.g., [-a-z]).
All other metacharacters are literal in a character class. Therefore, context matters.
• caret ^ outside a character class means ‘beginning of line.’
/^T/ matches all lines starting with T
/^[0-9]/ matches?
• dollar sign $ outside of a character class means ‘end of line.’
/T$/ matches all lines ending with T
/^$/ matches?
/\^\$/ matches?
• backslash \ is used to escape special characters.
/\./ matches .
/a\*b/ matches a*b
8.2.3 Regular Expression Examples
• social security numbers (SSNs): [0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9] (yes, it is rather long-winded, but we will shorten it below)
• legal C identifier: [a-zA-Z_][a-zA-Z0-9_]*
[Figure 8.2 placeholder: keystrokes (e.g., $la^?s *.c\n, possibly containing kernel metacharacters such as ^D, ^U, ^V) pass through the kernel, which yields a command line terminated by a \n (perhaps containing shell metacharacters: *, ?, #, \), e.g., $ls *.c\n or $grep \\\\ wc.c\n. The shell (e.g., sh, ksh, bash) consumes shell metacharacters and passes the interpreted command line (e.g., $ls cat.c wc.c or $grep \\ wc.c) to the application (e.g., grep, sed, awk), which consumes application metacharacters (e.g., \, $) and produces output.]
Figure 8.2: Progressive layers of metacharacter interpretation.
8.2.4 Regular Expression Rule
A regular expression always matches the longest string possible, starting as close to the beginning of the line as possible. For instance, consider the string:
This (rug) is not what it once was (a long time ago), is it?
/Th.*is/ matches?
/(.*)/ matches?
8.2.5 Using grep
The grep filter prints to standard output the lines matching a regular expression or pattern.
• grep <search pattern> <filename(s)>: print to standard output all the lines in the given file(s) that contain a match of the search pattern (e.g., grep "abc" text.txt prints out all lines in the file text.txt containing the string abc somewhere in them).
• grep -i <search pattern> <filename(s)>: same as above, but ignores the case of the searched string (e.g., grep -i path .login .tcshrc).
• grep -v <search pattern> <filename(s)>: print to standard output all the lines in the given file(s) which do not contain a match of the search pattern (e.g., grep -v "abc" text.txt prints out all the lines in text.txt which do not contain the string abc anywhere in them).
• grep -f <search strings filename> <filename(s)>: causes grep to look for search strings in the file following the -f (e.g., grep -f searchstrings.txt .login .tcshrc).
Quotes are optional around regular expressions which do not contain spaces or other shell metacharacters (discussed in Chapter 3). See Fig. 8.2.
8.2.6 Full Regular Expressions
Full regular expressions contain metacharacters beyond those found in basic regular expressions to simplify the construction of a regular expression, resulting in a terser expression. Since any full regular expression can be rewritten as a semantically equivalent basic regular expression, full regular expressions add syntactic sugar to basic regular expressions. The grep utility uses basic regular expressions while egrep (extended grep, which is the same as grep -E) uses full regular expressions.
• plus + is similar to *, but matches one or more occurrences of the preceding regular expression.
/ab+c/ matches abc abbc abbbc but not ac
/..*/ is equivalent to /.+/
• question mark ? matches zero or one occurrence of the previous regular expression.
/ab?c/ matches ac abc
• logical or | matches either the regular expression before or the regular expression after the vertical bar.
/abc|def/ matches abc def
• parentheses ( ) can be used to group regular expressions for use with *, ?, +, |, and so on.
/ab(c|d)ef/ matches abcef abdef
/((abcef)|(abdef))/ matches abcef abdef
/ab(cd|de)fg/ matches abcdfg abdefg
Depending on the program (see below), you may need to use \( and \) for grouping instead.
• set braces \{ and \} are used to specify repetitions of a regular expression.
/[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}/ matches SSNs
/a\{4,\}/ matches four or more as (n or more)
/[a-z]\{3,5\}/ matches three to five lower-case letters (in general, the range n thru m, with n ≤ m)
Again, depending on the program (see below), the braces may need to be written unescaped, as { and }, rather than escaped, as \{ and \}.
Table 8.1: Differences in metacharacter semantics across similar tools.
special or          semantics
metacharacters      grep/ex/vi    egrep
( )                 literal       grouping
\( \)               grouping      literal
{ }                 special       ?
\{ \}               ?             repetition
• fgrep: self-study
8.2.7 Subtle Point about Tools that use Regular Expressions
Different tools and utilities implement different sets of metacharacters, some with the same meanings and others with different meanings. Consult the manpage for the particular tool for the definitive meaning of a special character for that tool. However, we highlight one important difference here.
In grep and ex/vi, ( and ) characters used alone match themselves, while \( and \) are used for grouping. The egrep utility uses the opposite conventions; { and } are special in grep and ex/vi. See [?][Chapter 6 (pp. 295–301)] and, especially, [?][Tables 6-1 and 6-2 (pp. 296–297)].
8.2.8 Conceptual Exercises for Section 8.2
Exercise 8.2.1: To match the strings abc and abbbc but not ac, use the extended regular expression:
a) /ab*c/
b) /ab+c/
c) /ab?c/
Exercise 8.2.2: To match social security numbers, use the regular expression:
a) /[0-9]*/
b) /[0-9]+/
c) /[0-9]{9}/
Exercise 8.2.3: (true or false) The shell metacharacter . and the grep metacharacter . have different semantics.
Exercise 8.2.4: What theoretical model of computation is used to match regular expressions to strings?
Exercise 8.2.5: What does the command line grep '\^[^x]' y match?
Exercise 8.2.6: Consider the following (the -n to cat prefaces each line with its line number).
$ cat -n textfile
     1  a
     2  aa
     3  aaa
     4  ab
     5  aba
     6  abb
     7  abc
     8  abd
     9  abe
    10  ac
    11  aca
    12  ad
    13  ada
    14  ae
    15  aea
    16  b
    17  c
    18  d
    19  e
    20  bba
    21  aaabbbb
$
For each of the following command lines, indicate all lines from textfile, using line numbers (1 through 21 from top to bottom), that are returned.
a) cat textfile | grep 'abc'
b) cat textfile | grep 'a..'
c) cat textfile | grep 'a.*'
d) cat textfile | grep 'a[ab].?'
e) cat textfile | egrep 'a[ab].?'
f) cat textfile | grep '[^a]'
g) cat textfile | grep '^[^a]$'
h) grep '\$' < grepfile
i) grep \\\\ grepfile
j) grep '$' grepfile
Exercise 8.2.7: Consider the following (from [KP84, Exercise 3-3, p. 79]).
$ cat -n grepfile
     1  grep \$
     2  grep \\$
     3  grep \\\$
     4  grep '\$'
     5  grep '\'$'
     6  grep \\
     7  grep \\\\
     8  grep "\$"
     9  grep '"$'
    10  grep "$"
$
For each of the following command lines, indicate all lines from grepfile, using line numbers (1 through 10 from top to bottom), that are returned.
a) grep "\$" grepfile
b) grep '\$' grepfile
c) grep -v '$' grepfile
d) grep -v '\$' grepfile
e) grep '[^$]' grepfile
f) grep "[^$]" grepfile
g) grep \\\\ grepfile
h) grep '$' grepfile
Exercise 8.2.8: For each of the following items, write a basic regular expression that matches the specified text (including but not limited to all of the underlined phrases in each example) and no other text in the given line, assuming that your expression is intended to be used with grep. For these items, you may not simply list an underlined phrase itself; you must use at least one special character in each answer. Note: the string following each item is just provided for illustrative purposes. Therefore, do not write a regular expression that matches the underlined strings only in the sample sentence.
a) Matching Hello, hi, or howdy:
Hello, there. Or is "hi" or "howdy" more to your liking?
b) Matching the, regardless of case:
The quick brown fox jumps over the lazy dog.
c) The last word in a sentence:
How many sentences are here? There are two. No, three!
d) A social security number:
Match the number 045-35-2344 but not 045-3-52344.
e) A word with five or more letters:
This sentence does not have many long words.
f) Any proper noun:
Jean-Luc, Worf, and Q, but not wormhole jump.
g) An entire sentence that ends in a period:
Does this sentence end in a period?
This one, indeed, does.
h) Any sequence beginning with "artificial" and ending with "intelligence":
Politicians can act artificial, but do they have intelligence?
i) Any of computer, computers, or computing:
computer science is the study of computing, and how computers work.
j) Matching any phrase of exactly three words separated by white space:
This is a short sentence.
Exercise 8.2.9: Complete Conceptual Exercise 8.2.8 using full regular expressions.
Exercise 8.2.10: Consider the following text, taken from the manpage for a hypothetical Linux command called flip:
Flip is a file interchange program that converts text file
formats between **IX and MS-DOS. It converts lines ending
with carriage-return (CR) and linefeed (LF) to lines ending
with just linefeed, or vice versa.
For each of the following regular expressions, circle all strings (without crossing lines, of course) in the text provided for each expression that match the regular expression.
Also, in the regular expressions below ( and ) characters used alone match themselves, while \( and \) are used for grouping (these are the rules that grep and ex/vi use, while egrep uses the opposite conventions). Also, \{ and \} are special in grep and ex/vi. Remember that regular expressions match the longest possible string. For example, the regular expression /(.*)/ matches the following string: (CR) and linefeed (LF). And as usual, all of these regular expressions are also case-sensitive.
a) /in/
b) /[R-Z]/
c) /^[Ff]/
d) /.$/
e) /ee*/
f) /\*/
g) /lines\{0,\}/
h) /[Cc].*[Ff]/
i) /(.\{2\})/
j) /[Ii][acX][^a-f]/
Exercise 8.2.11: Using the same text from the previous problem, for each of the following full regular expressions, circle all strings (without crossing lines, of course) in the text provided for each expression that match the full regular expression. Again, remember that full regular expressions match the longest possible string. For example, the full regular expression /\(.*\)/ matches the following string: (CR) and linefeed (LF).
a) /F[^ ]+/ (the character following the ^ is a single space)
b) /line(s|[^s ]+)/ (the character following the second s is a single space)
c) /v.*e/
d) /[a-z]*[e.]$/
e) /\*+/
8.2.9 Programming Exercises for Section 8.2
Exercise 8.2.12: Write a complete grep command line that prints to standard output only the lines of its input that contain more than one word, where a word is any string of characters except whitespace.
Exercise 8.2.13: Write a complete grep command line that prints to standard output only the lines of its input which contain a single quote (') character.
8.3 sed
The sed utility is a non-interactive stream editor and is a Turing-complete language; it is helpful for processing rows of text. The sed utility (like the vi editor) is based upon ex and, thus, we begin our discussion there (see Fig. 8.3).
[Figure 8.3 placeholder: nodes vi (interactive), sed (non-interactive), ex, and ed, connected by dependency arrows.]
Figure 8.3: Graphical depiction of the foundational nature of ed and ex for vi and sed. The semantics of an arrow between a source and target are ‘source is dependent on target.’
Table 8.2: Some sample ex addresses.
address    semantics
10,20      lines 10 thru 20
.,100      current line thru line 100
.,$        current line (.) thru last line of file ($)
1,$        line 1 thru last line of file ($), or the entire file
%          the entire file
8.3.1 ex (Line Editor)
The vi editor is a masterpiece in user-interface software design and is close to a full programming language because of its use of ex (the Linux line editor). The most effective approach to studying vi involves learning the general syntax rather than memorizing commands. The general syntax of ex commands is:
:[<address>]<command>[<options>]
Some example ex addresses are given in Table 8.2. Some example ex commands are given in Table 8.3. When experimenting with these commands in vi, it is helpful to start by entering :set list in ex mode. This will make tabs and ends of lines (EOLs) visible as ^Is and $s, respectively (:set nolist undoes this operation).
General format of search and replace commands:
:<address>s/<regexp>/<replacement text>/
Table 8.3: Some sample ex commands. The symbols ␣ and → represent a single space character and a single tab character, respectively.
command                        description/notes
:g/^$/d                        delete all blank lines (same as grep -v '^$')
:%s/Alice/Lucia/g              replace all occurrences, not just the first, on each line (the g option makes the substitution global)
:%s/hello/& world/g            & represents the matched text
:%s/→/␣␣␣/g                    replaces each tab with three consecutive spaces, on each line
:%s/[␣→][␣→]*$//g              purges trailing whitespace from every line
:%s/fprintf/FPRINTF/g          replaces all occurrences of fprintf with FPRINTF
:.,$s/fprintf/FPRINTF/g        replaces all occurrences of fprintf from the current line (.) to the last line of the file ($) with FPRINTF
:10,20s/fprintf/FPRINTF/g      replaces all occurrences of fprintf from line 10 to 20 with FPRINTF
:%s/^\([A-Z][a-z-]*\),␣\([A-Z][a-z-]*\)$/\2␣\1/
                               converts names from <last>,␣<first> format to <first>␣<last> format
:%s/^\([[:alpha:]]*\)␣\([[:alpha:]]*\)$/\2,␣\1/
                               undoes the previous substitution
:100,200m.                     moves lines 100 thru 200 to the current line (.)
:10,20w newfile.txt            extracts lines 10 thru 20 and writes them to newfile.txt
8.3.2 Essential sed
[Figure 8.4 placeholder: an outer loop over the lines of the input files (file 1 thru file 5); each line is passed through the edit commands (command 1 thru command 6).]
Figure 8.4: The sed execution model.
Table 8.4: Some sample sed <condition>s and <action>s which can be combined to form instances of the general format of sed syntax.
<condition>s                   <action>s
/<regexp>/                     d
/m,n/                          p
$                              q
<condition>!                   :<address>s/<regexp>/<replacm. text>/
<condition>, <condition>       w <filename>
                               i
                               a
[Figure 8.5 placeholder: sed -e '{ ... editing commands ... }' file(s). The address space applies to all of the editing commands inside { }; more than one command and/or expression may be given, separated with newlines; without { }, put an individual, and possibly distinct, address for each expression.]
Figure 8.5: The -e option to sed.
The execution model of sed for each line in the input stream, illustrated in Fig. 8.4, is:
1. Read input line from standard or file input into pattern space.
2. Apply commands to pattern space.
3. Send pattern space to standard output.
Thus, sed reads in one line at a time, applies all the commands sequentially, then picks up the next line, and so on. Note that this is in contrast to reading all lines at once, applying the first command, then reading all again, applying the second command, and so on. This way we need only make one pass through the input (see Fig. 8.4).
The syntax of sed commands is similar to that of ex:
general syntax: <condition><action>
detailed syntax: [<address>[,<address>]][!]<cmd>[<args>]
Sample sed <condition>s and <action>s, which can be combined to form instances of the general format of sed syntax, are given in Table 8.4.
The sed utility can be invoked in the following ways:
sed '<editing commands>' <file(s)>
cat <file(s)> | sed '<editing commands>'
sed -f <editing commands file> <file(s)>
In the last invocation style above, if sed editing commands exist in a file commands.sed, then invoke sed as sed -f commands.sed <file(s)>.
Some options to sed require particular attention. The -n option suppresses the default output (i.e., step three of the sed execution model), both in the presence and in the absence of the p or d action. Note that in the absence of the -n option, the pattern space is printed at the end of every cycle, as if the p action were always assumed (i.e., step three).
For instance, the following two distinct sed command lines, one with the -n option and one without it, produce the same output:
sed -n '/one/p' <file> ≡ sed '/one/!d' <file>
Also, notice that
sed -n '/<regexp>/p' <file> ≡ grep '<regexp>' <file>
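These equivalences can be checked on a small input. The following is a minimal sketch, assuming a POSIX shell with sed and grep; the sample lines and the variable names are made up for illustration.

```shell
# Sample input (hypothetical): three lines, two of which contain "one".
input='one fish
two fish
someone else'

# Select the matching lines in three ways.
with_n=$(printf '%s\n' "$input" | sed -n '/one/p')   # -n plus an explicit p
with_d=$(printf '%s\n' "$input" | sed '/one/!d')     # delete the non-matching lines
with_grep=$(printf '%s\n' "$input" | grep 'one')     # the grep equivalent

printf '%s\n' "$with_n"
```

All three variables should hold the same two lines (one fish and someone else).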
There are multiple ways to apply multiple sed commands to a stream of input. The following four command lines always produce the same output.
$ sed '/^$/d' spaces | sed 's/^[␣→][␣→]*//' | sed 's/[␣→][␣→]*$//'
$ sed '/^$/d
> s/^[␣→][␣→]*//
> s/[␣→][␣→]*$//' spaces
$ sed -e '/^$/d' -e 's/^[␣→][␣→]*//' -e 's/[␣→][␣→]*$//' spaces
$ cat sedscript
/^$/d
s/^[␣→][␣→]*//
s/[␣→][␣→]*$//
$ sed -f sedscript spaces
See Fig. 8.5 for more details on using the -e option to sed.
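The invocation styles can also be compared side by side. The sketch below assumes a POSIX shell plus the mktemp utility; the sample input and all variable names are hypothetical, and the tab character is built with printf so that the bracket expression does not depend on any sed extension.

```shell
# Hypothetical sample: leading/trailing spaces and tabs, plus one blank line.
tab=$(printf '\t')
ws="[ $tab]"                      # bracket expression: one space or one tab
input=$(printf '  a  \n\n\tb\t\n')

# 1. A pipeline of three sed processes.
piped=$(printf '%s\n' "$input" | sed '/^$/d' | sed "s/^$ws$ws*//" | sed "s/$ws$ws*\$//")

# 2. One sed process with three -e expressions.
multi_e=$(printf '%s\n' "$input" | sed -e '/^$/d' -e "s/^$ws$ws*//" -e "s/$ws$ws*\$//")

# 3. The same commands stored in a script file and run with -f.
script=$(mktemp)
printf '%s\n' '/^$/d' "s/^$ws$ws*//" "s/$ws$ws*\$//" > "$script"
from_file=$(printf '%s\n' "$input" | sed -f "$script")
rm -f "$script"

printf '%s\n' "$piped"
```

Each variant should print the two lines a and b with no surrounding whitespace.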
8.3.3 Some Representative Examples
Some illuminating sed example command lines are given in Table 8.5.
8.3.4 A Simple Faculty Database Example
Consider the stream of data, available in faculty.details, in Table 8.6. Consider the following transcript of sed command lines over this data stream (from which output is absent for purposes of brevity).
$ # same as grep CPS faculty.details
$ sed -n '/CPS/p' faculty.details
$
$ # same as above
Table 8.5: Some sample sed command lines. The symbols ␣ and → represent a single space character and a single tab character, respectively.
sed 's/[→]/␣␣␣/g' main.c    replaces each tab with three consecutive spaces, on each line (will changes take effect in the file main.c?)
sed 's/[␣→][␣→]*$//' main.c    purges trailing whitespace from each line
sed 's/index1/index2/g' main.c    replaces every occurrence of the string index1 with the string index2 on each line (with no address given, the command applies to every line)
sed -n '20,30p' file    prints lines 20 thru 30 from file
sed '1,10d' file    deletes lines 1–10 from file
sed '$d' file    deletes the last line of file
du -a | sed 's/.*→//'    purges the first column from the du -a output [KP84, p. 109]
sed 's/^\([A-Z][a-z-]*\),␣\([A-Z][a-z-]*\)$/\2␣\1/' file    replaces string1,␣string2 with string2␣string1
sed '10,20w newfile' file    writes lines 10 through 20 of file to newfile
sed '1,/^$/d' file    deletes lines 1 thru the first blank line
sed -n '/^$/,/^end/p' file    prints only the lines from the first blank line thru the first line that contains the string end at the beginning of the line
sed 's/^/→/' file    prepends a tab to each line [KP84, p. 109]
sed '/./s/^/→/' file    same as previous, except the substitution only applies to lines which have at least one character (.) [KP84, p. 110]
sed '/^$/!s/^/→/' file    same as previous (! inverts the condition) [KP84, p. 110]
Table 8.6: The faculty.details file.
Name: Mehdi Zargham Office: 139 Anderson Hall Course: CPS 149
Name: Raghava Gowda Office: 142 Anderson Hall Course: CPS 310
Name: James P. Buckley Office: 146 Anderson Hall Course: CPS 430/542
Name: Dale Courte Office: 144 Anderson Hall Course: CPS 132
Name: Saverio Perugini Office: 145 Anderson Hall Course: CPS 444/544
Name: Zhongmei Yao Office: 150 Anderson Hall Course: CPS 470
Name: Phu Phung Office: 149 Anderson Hall Course: CPS 341
Name: Ju Shen Office: 151 Anderson Hall Course: CPS 465/592
Name: Atif Abueida Office: 105-B Science Center Course: MTH 218
Name: Benjamin Kunz Office: 305 St. Joe’s Course: PSY 495/506
Name: Mark Masthay Office: 178 Science Center Course: CHM 105
$ sed '/CPS/!d' faculty.details
$
$ # prints lines with a cross-listed course;
$ # same as sed -n '/\//p' or grep '\/' faculty.details
$ sed -n '/[/]/p' faculty.details
$
$ # print lines containing a non-cross-listed course;
$ # same as grep -v '\/' faculty.details
$ sed '/\//d' faculty.details
$
$ # removes "Name: " from file faculty.details
$ sed 's/^Name:[ ]//' faculty.details
$
$ # removes "Name: " & "Office: " from faculty.details
$ sed 's/^Name:[ ]//' faculty.details | sed 's/Office:[ ]//'
$
$ # how can we purge all attribute labels
$ # (i.e., "Name: ", "Office: ", "Course: ")?
$ # multiple ways:
$ sed 's/[A-Za-z][A-Za-z]*: //g' faculty.details
$
$ # will not work, since sed uses basic regular expressions and
$ # not full regular expressions
$ sed 's/[A-Za-z]+: //g' faculty.details
$
$ sed 's/[A-Za-z]\{1,\}: //g' faculty.details
$
$ # purges all attribute labels,
$ # notice escape of newline metacharacter
$ sed 's/^Name:[ ]//' faculty.details | sed 's/Office:[ ]//' | \
> sed 's/Course:[ ]//'
$
$ sed -e 's/^Name:[ ]//
> s/Office:[ ]//
> s/Course:[ ]//' faculty.details
$
$ cat sedfile
s/^Name:[ ]//
s/Office:[ ]//
s/Course:[ ]//
$
$ sed -f sedfile faculty.details
$
$ sed 's/^Name:[ ]\(.*\)Office:[ ]\(.*\)Course:[ ]\(.*\)$/\1\2\3/' faculty.details
$
$ sed 's/[A-Za-z][A-Za-z]*://g' faculty.details
8.3.5 d for Delete
The d action deletes lines from the output stream, not from the original file.
Examples:
• sed 'd' faculty.details reads in one line at a time into a buffer (work space), deletes it, and prints the contents of the buffer (in this case, empty)
• sed '1d' faculty.details reads in one line at a time into the buffer, deletes it if it is line 1, and prints the buffer contents onto output (in this case, all lines except 1 would be output)
• sed '$d' faculty.details does the same, but for the last line
• sed '2,4d' faculty.details deletes lines from 2 up to and including line 4
• sed '/Yao/,/ran/d' faculty.details deletes lines starting from one which matches Yao up to and including one which matches ran
• sed '/Yao/,/ran/!d' faculty.details negates the address (i.e., do not delete these lines, and delete others)
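The addressing forms above can be tried on a tiny input before applying them to faculty.details. This is a sketch under the assumption of a POSIX shell and sed; the five-line input and the variable names are invented for the example.

```shell
# Hypothetical five-line input; the names below are illustrative only.
input=$(printf 'l1\nl2\nl3\nl4\nl5')

first_gone=$(printf '%s\n' "$input" | sed '1d')       # all lines except line 1
last_gone=$(printf '%s\n' "$input" | sed '$d')        # all lines except the last
middle_gone=$(printf '%s\n' "$input" | sed '2,4d')    # lines 1 and 5 remain
middle_kept=$(printf '%s\n' "$input" | sed '2,4!d')   # only lines 2 through 4 remain

printf '%s\n' "$middle_gone"
```

Note that the input itself is never modified; only the output stream differs.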
8.3.6 p for Print
The p action prints lines from the buffer. Examples:
• sed 'p' faculty.details reads in one line at a time into the buffer and prints each. Notice that by default sed prints what is in the buffer. Therefore, you will get two copies of each line.
• sed -n 'p' faculty.details: the -n suppresses the default print action of sed. Therefore, this is equivalent to cat.
We can use the same addressing commands as before (e.g., sed -n '4,6p' faculty.details prints lines 4 through 6).
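A short sketch of the interaction between p and -n, assuming a POSIX shell and sed; the three-line input and the variable names are hypothetical.

```shell
# Hypothetical three-line input.
input=$(printf 'l1\nl2\nl3')

doubled=$(printf '%s\n' "$input" | sed 'p')          # explicit p plus the default print
like_cat=$(printf '%s\n' "$input" | sed -n 'p')      # -n leaves only the explicit p
range=$(printf '%s\n' "$input" | sed -n '2,3p')      # print lines 2 through 3 only

printf '%s\n' "$doubled"
```

The first variant prints every line twice, the second reproduces the input, and the third prints only l2 and l3.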
8.3.7 More sed Jargon
• = prints (just) the line number
• a appends text to the output after the current line; use it as a\ followed by what you want to append
• b branches to the end of the script (or to a label, if one is given), so that no further commands are attempted on the current line
8.3.8 A Tale of Two Buffers
Normally, sed reads one line at a time into its main buffer, called the pattern buffer. There is another buffer, called the hold buffer, available for use. Some commands to work with this buffer include:
• h copies the contents of the main buffer into the hold buffer, thus overwriting whatever it was that was already in the hold buffer
• g copies the contents of the hold buffer into the main buffer, overwriting it
• H does the same as h, except it appends the contents of the main buffer after the last line in the hold buffer
• G does the same as g, again in the ‘append’ sense
• x exchanges the contents of the two buffers; what was in the hold buffer is now in the pattern buffer, and vice versa
• N reads in an additional line and appends it to the contents of the pattern buffer; in between the original line and the newly added line, N will insert a newline (\n) character; useful for reading in multiple lines at a time (see flip example below)
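As a brief illustration of the two buffers, the well-known one-liner below prints its input in reverse line order using only G, h, and p. It is a sketch on made-up input, and the function name reverse_lines is hypothetical.

```shell
# reverse_lines: print standard input with its lines in reverse order.
#   1!G  on every line but the first, append the hold buffer to the pattern buffer
#   h    copy the (growing) pattern buffer back into the hold buffer
#   $p   on the last line, print the accumulated pattern buffer
reverse_lines() {
    sed -n '1!G;h;$p'
}

printf 'a\nb\nc\n' | reverse_lines
```

On the three-line input above, the output is c, b, a, one per line. Because the whole input accumulates in the hold buffer, this approach is only suitable for inputs that fit comfortably in memory.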
8.3.9 newer Script
Linux utilities and languages such as sed can be used creatively to craft clever system utilities. For instance, consider the following newer script, which prints to standard output all the files in the current directory newer in modification time than the first filename command-line argument.²
#!/usr/bin/env ksh

/bin/ls -t | sed -e '/^'$1'$/q' | sed '$d'

exit 0
Notice that the first command-line argument to the script is stored in variable $1, which is unquoted so as to subject it to shell interpretation. The interpretive nature of the Linux shell and sed enables this organic style of programming (i.e., scripting), which in C would require access to the inodes of the files so as to check modification times, a laborious process.
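To see how the two sed stages cooperate, the pipeline can be run on a simulated listing instead of real ls -t output. This is a sketch, assuming a POSIX shell; the function name newer_than and the file names are invented for the example, and the quoting is kept exactly as in the script.

```shell
# newer_than NAME: read a newest-first listing (as produced by ls -t) on
# standard input and print the names that come before NAME.
newer_than() {
    # q prints the matching line and quits; the second sed drops that last line.
    sed -e '/^'$1'$/q' | sed '$d'
}

# Simulated "ls -t" output, newest first (hypothetical file names).
printf 'notes\nmain\nMakefile\nREADME\n' | newer_than Makefile
```

This prints notes and main. Since the argument is spliced into a regular expression, a name containing characters such as . or * is treated as a pattern rather than as literal text, and a name that is not in the listing at all causes every line but the last to be printed.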
8.3.10 Conceptual Exercises for Section 8.3
Exercise 8.3.1: Explain why the % symbol representing the entire file in ex is not required when we desire sed substitutions of the form s/<regexp>/<replacement text>/ to take place over the entire input stream.
² The -t option to ls lists the files in order from newest to oldest.
8.3.11 Programming Exercises for Section 8.3
Exercise 8.3.2: Write a complete sed command line that prints to standard output the lines of its single file argument that consist only of 5-letter (upper and lower case) palindromes. A palindrome is a word which reads the same backwards and forwards (e.g., CbXbC or abcba).
Exercise 8.3.3: What does the following command line sed 's/\*f/' h output?
Exercise 8.3.4: Write a command line which would print to standard output only
a) lines 50-100 of a file testdriver.cpp.
b) all lines in a file words.txt that have only five characters in them and read the same backwards as forwards (i.e., five-character palindromes).
c) all the lines in the files f1 and f2 which end with the literal string $HOME.
Exercise 8.3.5: Write a complete sed command line that prints to standard output the contents of its file arguments with all leading and trailing whitespace purged from every line. For instance,
$ cat abc
a $
b $
$
c$
$
$
d $
$ cat def
d $
$
$
a $
$ cat abc def | sed ...
a$
b$
$
c$
$
$
d$
d$
$
$
a$
where $ indicates end-of-line.
Exercise 8.3.6: Complete Programming Exercise 8.3.5, but this time also purge all blank lines. For instance,
$ cat abc
a $
b $
$
c$
$
$
d $
$ cat def
d $
$
$
a $
$ cat abc def | sed ...
$ ./sanatize abc def
a$
b$
c$
d$
d$
a$
where $ indicates end-of-line.
Exercise 8.3.7: Suppose we have the following file ids in our current directory, which contains only valid social security numbers, one per line, with no leading or trailing whitespace.
$ cat ids
111224555
254342341
314344311
314570001
701091008
...
112816522
$
Write a command line to convert each id in the form xxxyyzzzz to xxx-yy-zzzz and print the results numerically sorted to standard output.
Exercise 8.3.8: Consider the following:
$ cat ids
111-22-4555
254-34-2341
314-34-4311
314-57-0001
701-09-1008
...
ids is in the current directory and contains only valid social security numbers, one per line, with no leading or trailing whitespace.
Write a command line to convert each line in the form xxx-yy-zzzz to xxxyyzzzz and print the results numerically sorted to standard output.
Exercise 8.3.9: Suppose the output of ls -l appears as follows: [KP84, p.13]
$ ls -l
total 12
drwx--x--- 3 cps444-n1.21 cps444 512 Oct 17 14:46 C/
-rw------- 1 cps444-n1.21 cps444 273 Oct 17 15:56 Makefile
drwx------ 2 cps444-n1.21 cps444 1024 Oct 26 15:03 backups/
drwx------ 2 cps444-n1.21 cps444 512 Oct 17 14:41 bin/
drwx------ 2 cps444-n1.21 cps444 512 Oct 3 16:22 tmp/
Write a complete command line that prints to standard output the list of files in the current directory (one per line), together with their date of last modification (use <month>␣<dd>␣<filename> format).
Exercise 8.3.10: Suppose we have the following file guestlist in our current directory, which contains one name per line in the format
<last>,␣<first>, with no leading or trailing whitespace or blank lines, where ␣ represents a single space character.
$ cat guestlist
Pike, Rob
Ritchie, Dennis
...
Kernighan, Brian
Thompson, Ken
$
Write a single command line to convert each line in the form <last>,␣<first> to <first>␣<last> and print the results alphabetically sorted by first name to standard output.
Exercise 8.3.11: Suppose we have the following file guestlist in our current directory, which contains one name per line in the format <last>,<first>, including possible leading or trailing whitespace or possible whitespace after the comma, where $ indicates end-of-line.
$ cat guestlist
Pike , Rob $
$
$
Ritchie , Dennis $
...
Kernighan , Brian $
$
Thompson , Ken$
$
Give a single command line to convert each line of standard input in the form <last>,<first> to <first>␣<last> and print the results, with any leading and trailing whitespace and all blank lines purged, to standard output, where ␣ represents a single space character.
Exercise 8.3.12: Complete Programming Exercise 8.3.7, but this time print the results alphabetically sorted by first name to standard output.
Exercise 8.3.13: Rewrite the newer script in § 8.3.9 at least two different ways by altering the syntax in line 3 so that it still generates the same output as the unaltered version. Experiment with the use of shell (single and double) quotes.
Exercise 8.3.14: Write a complete Korn shell script, invoking sed, that takes only a single filename command-line argument and prints to standard output the filenames (one per line) in the current working directory that are older (in modification time) than the file passed at the command line, which must reside in the current working directory. The first command-line argument to the script can be referenced in the script as $1.
Exercise 8.3.15: Complete Programming Exercise 8.2.12 using a complete sed command line.
Exercise 8.3.16: Complete Programming Exercise 8.2.13 using a complete sed command line.
Programming Exercises 8.3.17–8.3.29 below are related to the faculty database example used in § 8.3.4.
Exercise 8.3.17: Write a sed command line to delete all blank lines in the file faculty.details.
Exercise 8.3.18: Write a sed command line to print the lines pertaining to faculty who have offices in Anderson Hall.
Exercise 8.3.19: Write a sed command line to find the line numbers describing faculty who teach non-cross-listed undergraduate courses.
Exercise 8.3.20: Assume that Perugini is an assistant professor and all other professors are associate professors. Write a sed command line to print each professor’s rank on a separate line, after the given line, in the form Rank: <rank>. Do not include any addresses in your editing commands. Put the editing commands to solve this exercise in a file rank.f and invoke it as: sed -n -f rank.f faculty.details.
Exercise 8.3.21: Write a sed command line to print the lines in the format <name>:<office>:<course> (i.e., strip the labels Name:, Office:, and Course:).
Exercise 8.3.22: Write a sed command line to print the lines in the format <course>:<office>:<name>
Exercise 8.3.23: Write a sed command line to output each entry (line of input) as three lines.
Exercise 8.3.24: Suppose faculty offices are moving. Move faculty in Anderson Hall to the Science Center and move those in the Science Center to Miriam Hall. However, faculty office numbers will remain the same. Write a sed command line to make this change.
Exercise 8.3.25: Write a sed command line to pretty print the file so that each line is preceded by a line describing what it is about (e.g., “The next line is about Dr. Zhongmei Yao”).
Exercise 8.3.26: Write a sed command line to completely capitalize the names of faculty (see the Linux transliterate command below).
Exercise 8.3.27: Write a sed command line to flip alternate lines.
Exercise 8.3.28: Write a sed command line to delete all the blank lines.
Exercise 8.3.29: Write a sed command line to consolidate multiple blank lines, wherever they occur, into just one blank line (i.e., replace multiple blank lines with just one blank line) (hint: investigate the D action) (see the Linux uniq command below).
8.3.12 Programming Project for Section 8.3
8.4 Filters
8.4.1 tr (anslate)
tr only reads from standard input. Syntax: tr <string1> <string2>
tr converts characters in <string1> to those, respectively, in <string2>. For instance, tr A-Z a-z < myfile. Options:
• tr -d (delete character(s) in <string1>)
• tr -c (act on complement of <string1>)
• tr -s (squeeze strings of repeated characters)
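The options above can be sketched as small shell functions. The function names and the sample string are hypothetical; the tr invocations themselves use only the -d, -c, and -s options just listed.

```shell
# Each function reads standard input and writes standard output.
to_lower()          { tr 'A-Z' 'a-z'; }          # translate upper case to lower case
strip_digits()      { tr -d '0-9'; }             # delete every digit
squeeze_spaces()    { tr -s ' '; }               # collapse runs of spaces into one
one_word_per_line() { tr -cs 'A-Za-z' '\n'; }    # turn each run of non-letters into a newline

printf 'Hello,   World 42\n' | to_lower            # hello,   world 42
printf 'Hello,   World 42\n' | squeeze_spaces      # Hello, World 42
printf 'Hello,   World 42\n' | one_word_per_line   # Hello and World on separate lines
```

The last function combines -c and -s: the complement of the letters is translated to newlines, and the resulting runs of newlines are squeezed to one.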
8.4.2 sort
The sort utility can be fine-tuned to sort columns in a variety of ways:
• sort -n (numeric-sort: compare according to string numerical value)
• sort -g (general-numeric-sort: compare according to general numerical value)
• sort -r (reverse sort: reverse the result of comparisons)
• sort -rn (reverse numeric-sort)
• sort -d (dictionary order: consider only blanks and alphanumeric characters)
• sort -b (ignore leading blanks)
• sort -f (ignore-case: fold lower case to upper case characters)
• sort -k 2 (sort on a key that starts at column 2; use -k 2,2 to restrict the key to column 2 alone)
• sort -t":" -k 2 (sort on column 2 using colon-delimited columns)
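The difference between a textual and a numeric key is easiest to see on a small colon-delimited input. The sketch below uses made-up data and variable names, and sets LC_ALL=C so that the textual comparison is byte-wise regardless of the reader's locale.

```shell
# Hypothetical input: a name and a number separated by a colon.
input=$(printf 'b:10\na:9\nc:100')

# Key is column 2 only (-k 2,2), compared as text.
by_text=$(printf '%s\n' "$input" | LC_ALL=C sort -t":" -k 2,2)

# Same key, compared numerically (n), then numerically in reverse (nr).
by_number=$(printf '%s\n' "$input" | LC_ALL=C sort -t":" -k 2,2n)
by_number_desc=$(printf '%s\n' "$input" | LC_ALL=C sort -t":" -k 2,2nr)

printf '%s\n' "$by_number"
```

As text, 10 sorts before 100, which sorts before 9; numerically the order is 9, 10, 100.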
8.4.3 uniq
The uniq filter purges duplicate consecutive lines (i.e., they must be adjacent) fast (in O(n) linear time).
Options:
• uniq -d (only prints the lines which are repeated)
• uniq -u (only prints the lines which are not repeated)
• uniq -c (prefixes each line with a count of its consecutive occurrences)
To purge all duplicates, first sort and then apply uniq. For instance, sort names | uniq, which is semantically equivalent to sort -u names.
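A common use of sort together with uniq -c is a frequency count. The following is a minimal sketch; the function name word_counts and the sample input are hypothetical, and the final sed only strips the leading padding that uniq -c places before each count.

```shell
# word_counts: read one word per line, print "<count> <word>", most frequent first.
word_counts() {
    sort | uniq -c | sort -rn | sed 's/^ *//'
}

printf 'b\na\nb\nc\nb\na\n' | word_counts
```

For this input the result is 3 b, then 2 a, then 1 c. Without the initial sort, uniq -c would count only adjacent repetitions.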
8.4.4 Spellers
There are multiple spellers available in Linux:
• spell
• ispell (interactive spell)
• aspell
Add the following line to your .vimrc to invoke aspell on the current file in vim using the keystroke <ctrl-t>:
map ^T <CR>:!aspell --dont-backup check %<CR>:e! %<CR>
8.4.5 Pipeline of Filters
Recall the Linux model of computation and communication mechanism setup for free by the shell:
$ detex uist2015.tex | aspell list | sort | uniq
$ detex uist2015.tex | aspell list | sort | uniq | wc -l
$ detex uist2015.tex | aspell list | sort -u
$ detex uist2015.tex | aspell list | sort -u | wc -l
$ detex 20150115.tex | nroff
8.4.6 Toward Database Operations: cut and paste, and join
The paste utility is the vertical analog of cat (e.g., paste a b). To concatenate multiple lines of one file into a single line, use paste -s a. Different delimiters can also be used (e.g., paste -s -d ":;|" a). The cut utility extracts selected fields (-f) from each line, using the delimiter given with -d.
A pipeline of these filters can be used to extract or merge fields or columns from lines.
$ who | cut -d" " -f1 | paste - -
The join utility is a relational database operator and can be used to join two files based on a common, sorted column, called the join key. For instance,
$ cat idfname
1 Larry
2 Linus
3 Lucia
4 Leisel
$
$ cat lnameid
Smith 1
Jones 2
Murphy 3
Patrick 4
$
$ join -1 1 -2 2 idfname lnameid
1 Larry Smith
2 Linus Jones
3 Lucia Murphy
4 Leisel Patrick
$
$ sed 's/\(.*\)[ ]\([1-9][0-9]*\)$/\2 \1/' lnameid | \
> join -1 1 -2 1 - idfname
1 Smith Larry
2 Jones Linus
3 Murphy Lucia
4 Patrick Leisel
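The cut and paste utilities named in the section title can be sketched in the same spirit. The example assumes a POSIX shell plus the mktemp utility; the directory, the file names first and last, and the variable names are all hypothetical.

```shell
# Two hypothetical single-column files in a scratch directory.
tmp=$(mktemp -d)
printf 'Larry\nLinus\nLucia\n' > "$tmp/first"
printf 'Smith\nJones\nMurphy\n' > "$tmp/last"

# paste merges the files line by line, here with a colon between the columns.
pasted=$(paste -d: "$tmp/first" "$tmp/last")

# cut extracts a column again: field 2 of the colon-delimited lines.
lasts=$(printf '%s\n' "$pasted" | cut -d: -f2)

# paste -s joins all lines of one file into a single line.
oneline=$(paste -s -d, "$tmp/first")

rm -r "$tmp"
printf '%s\n' "$pasted"
```

Here paste acts as the inverse of cut: the first builds a two-column stream and the second pulls one column back out.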
8.4.7 File Comparison Utilities
• comm
– syntax: comm <file1> <file2>
– only meaningful if <file1> and <file2> are sorted.
– Merges the two files and prints to standard output each line in one of three columns:
1. line(s) only in <file1>
2. line(s) only in <file2>
3. line(s) in both <file1> and <file2>
– sample output:
an apple
cat both ideas
dog
elephants
– use options to indicate which columns to suppress from output
• cmp
• diff
– finds and prints to standard output differences between two files or two directories
– syntax:
diff <file1> <file2>
diff -r <directory1> <directory2> (-r indicates recursive diff)
• sdiff: self-study
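The column-suppression options of comm mentioned above are -1, -2, and -3, each of which hides the corresponding column. A minimal sketch on two small sorted files follows; the scratch directory, the file names f1 and f2, and the variable names are hypothetical, and LC_ALL=C keeps the expected sort order independent of the locale.

```shell
# Two hypothetical sorted files in a scratch directory.
tmp=$(mktemp -d)
printf 'apple\ncat\ndog\n'    > "$tmp/f1"
printf 'cat\ndog\nelephant\n' > "$tmp/f2"

only_in_f1=$(LC_ALL=C comm -23 "$tmp/f1" "$tmp/f2")   # hide columns 2 and 3
only_in_f2=$(LC_ALL=C comm -13 "$tmp/f1" "$tmp/f2")   # hide columns 1 and 3
in_both=$(LC_ALL=C comm -12 "$tmp/f1" "$tmp/f2")      # hide columns 1 and 2

rm -r "$tmp"
printf '%s\n' "$in_both"
```

With these files, only apple is unique to the first file, only elephant is unique to the second, and cat and dog are common to both.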
8.4.8 Printing and Other Related Filter Utilities
• lpr, lpd, lpq (a suite of utilities to print files),
• indent (a source code pretty printer),
$ cat .indent.pro # resource file for indent
-br -nce -cdw -npcs -ncs -bs -brs -brf -i3
• script (maintains a transcript of a terminal session, e.g., script diary),
• expand, unexpand (convert tabs to spaces and vice versa),
• dos2unix, unix2dos (convert plain text files between DOS and UNIX newline conventions),
• iconv (converts the character encoding of given files from one encoding to another),
• a2ps (ASCII to Postscript), enscript, nenscript (utilities for converting plain ASCII text files to Postscript),
• groff, troff, nroff (a plain ASCII text formatting system),
• latex, pdflatex, dvips, xdvi, bibtex, detex (suite of tools for the LaTeX document typesetting system),
• ghostview, gv, ggv (Postscript suite of tools and viewers),
• xpdf, acroread (PDF viewers),
• ps2pdf, pdf2ps (conversion utilities between Postscript and PDF), and
• xfig (a WYSIWYG drawing tool)
8.4.9 Conceptual Exercises for Section 8.4
Exercise 8.4.1: Consider the following input stream.
hello
hi
hi
hello
Give the output of the following command lines on the above input stream:
a) uniq
b) uniq -u
c) uniq -d
d) uniq -c
Exercise 8.4.2: Suppose we have a file ∼/alongfile containing many misspelt words, including duplicates. Write a single command line which would print to standard output a count of the misspelt words excluding duplicates.
8.4.10 Programming Exercises for Section 8.4
8.5 The awk Programming Language
8.5.1 Introduction
The programming language awk is a more powerful sed. It is named after those who developed it: Aho, Weinberger, and Kernighan. It follows a sed style, but uses C syntax to specify commands. While sed is most appropriate for processing the rows (or lines) of a plain text file, awk is most appropriate for processing the columns of a text file (i.e., extracting, manipulating, or printing columns from input streams using specified delimiters). It is useful and powerful for table manipulation and data summarization tasks, and it can be used to perform simple (relational) database queries. The awk programming language, like sed, is Turing complete.
8.5.2 Execution Model
BEGIN {commands executed once before any input is read}
      {main input loop executed for each line of input}
END   {commands executed once after all input is read}
8.5.3 Simple awking
Consider the following input stream (student.grades):
Lucy 45 55 60 90
Linus 70 75 88 100
Larry 75 80 85 100
Lucia 80 70 70 95
The following awk script cats a file; run it as you would run sed: awk -f <awk scriptname> <file(s)>:
{ print }
Note that the curly braces contain commands, just as in sed. Since there is nothing before {, these commands are applied to all lines. The only difference is that instead of p in sed, we have print.
To make the awk script a self-contained program, use #!/usr/bin/awk -f as the first line of the script file.
awk has two special patterns, BEGIN and END, where you can put commands which are executed before any line is read, and after all lines are read, respectively. For example:
BEGIN {
   print "I am going to start reading a file. Woo hoo!"
}
{ print }
END {
   print "I have finished reading the file already. Sigh."
}
When awk reads a line, it automatically parses the line and puts tokens of the line into built-in variables such as $1 (first field), $2 (second field), and so on. By default, fields are separated by runs of whitespace (spaces and/or tabs). Therefore, the awk script
{ print $1 }
will just print the names. The built-in variable $0 stores the entire line. We can also use and manipulate variables, much as we would in a C program, although awk variables need not be declared. The following demonstrates how to calculate the average value of the scores in the first column of numbers (which is actually the second column of the file).
BEGIN {
   total = 0
   lc = 0
}
{
   total = total + $2
   ++lc
}
END {
   avg = total/lc
   print total, avg
}
awk also has system variables to modify the output format (e.g., OFS stands for output field separator) which we can set in the BEGIN preamble code segment:
BEGIN {
   total = 0
   lc = 0
   OFS = "---"
}
This will affect all subsequent output written using the print command: between two values listed in comma-separated format, awk inserts the output field separator. Similarly, there is an FS variable, the input field separator, which can be used to set the input field separator to a character other than the default whitespace.
It is good practice to put one awk command on each line. If you use multiple commands, you will need to use a semicolon ; to separate them.
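The averaging script and the effect of OFS can be tried on the student.grades data shown earlier. The sketch below assumes a POSIX shell plus the mktemp utility; the temporary file and the variable names plain and dashed are hypothetical, and the awk programs are condensed versions of the ones in the text.

```shell
# Recreate the student.grades input in a scratch file.
grades=$(mktemp)
cat > "$grades" <<'EOF'
Lucy 45 55 60 90
Linus 70 75 88 100
Larry 75 80 85 100
Lucia 80 70 70 95
EOF

# Sum and average of the second column, with the default output separator.
plain=$(awk '{ total += $2; ++lc } END { print total, total/lc }' "$grades")

# The same computation with OFS set in the BEGIN block.
dashed=$(awk 'BEGIN { OFS = "---" } { total += $2; ++lc } END { print total, total/lc }' "$grades")

rm -f "$grades"
printf '%s\n%s\n' "$plain" "$dashed"
```

The second column sums to 270 over four lines, so the two lines printed are 270 67.5 and 270---67.5. Only the comma in the print statement is affected by OFS.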
8.5.4 Fine Tuning awk
The character following a -F on an awk command line specifies the field delimiter, which is whitespace by default.
awk -F : '{ print $0 }' faculty.details
awk -F : '{ print $1 " " $2 }' faculty.details
• FS variable: the field separator, can be assigned a value
• OFS variable: the output field separator, can be assigned a value
• NF variable: stores number of fields in record
• NR variable: the total number of input records seen so far
We can also use C-style statements for formatted output (e.g., printf("%d\n", $1);).
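A short sketch of NR and NF on colon-delimited input follows; the two-line input and the variable names are made up for illustration.

```shell
# Hypothetical input: the first record has three fields, the second has two.
input=$(printf 'a:b:c\nd:e')

# Record number and field count for each line.
per_line=$(printf '%s\n' "$input" | awk -F : '{ print NR, NF }')

# In the END block, NR holds the total number of records read.
total=$(printf '%s\n' "$input" | awk 'END { print NR }')

# $NF refers to the last field of the current record.
last_fields=$(printf '%s\n' "$input" | awk -F : '{ print $NF }')

printf '%s\n' "$per_line"
```

This prints 1 3 and 2 2; the total is 2 and the last fields are c and e.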
Table 8.7: The guestlist file.
Hemingway,Ernest
Faulkner,William
Steinbeck,John
O’Connor,Flannery
Orwell,George
Huxley,Aldous
Plath,Sylvia
Miller,Arthur
O’Neill,Eugene
Wilson,August
Williams,Tennessee
8.5.5 Some Example awk Command Lines
Consider the stream of data, available in guestlist, in Table 8.7. The following awk command lines work with the guestlist data from Table 8.7 as well as the faculty.details data from Table 8.6.
# to see who is logged in
who | awk '{print $1}'

# to see from where users are logged in
who | awk '{print $5}'

print "$(hostname) has been up for
$(uptime | awk '{print $3}') days."

# works like cat
awk '{print}' faculty.details

awk -F , '{print $2 " " $1}' guestlist

# why three spaces between fields in output?
awk -F , '{print $2, " ", $1}' guestlist

# sorts by first name
awk -F , '{print $2 " " $1}' guestlist | sort

awk 'BEGIN {FS=":"} {print NF}' faculty.details

awk 'BEGIN {FS=","; OFS=":"} {print $2, $1}' guestlist
Notice that awk is more suitable than sed for tasks involving the manipulation of entire columns (rather than rows) of data, such as culling out a column or columns of data or transforming a stream of data from <last>,␣<first> to <first>␣<last> format: the long, drawn-out regular expressions required for those tasks in ex and sed (see Tables 8.3 and 8.5) are unnecessary in awk.
8.5.6 Gradebook Example
awk 'BEGIN {
   ns = 0
   total = 0
}
{
   sum = $2 + $3 + $4
   avg = sum / 3
   ns++
   total += avg
   printf("%d %s: %.2f\n", ns, $1, avg)
}
END { printf("%d students: %.2f\n", ns, total/ns) }' scores
Given the following scores file:
Peter 85 90 95
Paul 25 25 50
Mary 100 80 60
the command line above prints:
1 Peter: 90.00
2 Paul: 33.33
3 Mary: 80.00
3 students: 67.78
8.5.7 Implementing uniq in awk
$ cat ouruniq
BEGIN {
   prevline = ""
} {
   if (NR == 1 || $0 != prevline) {
      print $0
      prevline = $0
   }
}

$ cat uniq1line
BEGIN {
   prevline = ""
} {
   if (NR == 1 || $0 != prevline) {
      printf("%s ", $0);
      prevline = $0
   }
} END {
   printf("\n");
}

$ sort names | awk -f ouruniq
$ sort names | awk -f uniq1line
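One way to gain confidence in ouruniq is to compare it with the real uniq on the same input. The sketch below gives the same awk logic inline rather than in a file; the shell function name, the sample names, and the variables ours and theirs are hypothetical.

```shell
# The ouruniq logic, given inline instead of via awk -f.
ouruniq() {
    awk 'BEGIN { prevline = "" }
         { if (NR == 1 || $0 != prevline) { print $0; prevline = $0 } }'
}

# Hypothetical input with adjacent duplicates, a non-adjacent repeat, and blank lines.
input=$(printf 'Ann\nAnn\nBob\nAnn\n\n\nCy')

ours=$(printf '%s\n' "$input" | ouruniq)
theirs=$(printf '%s\n' "$input" | uniq)

printf '%s\n' "$ours"
```

Both should print Ann, Bob, Ann, one blank line, and Cy. The NR == 1 test matters when the very first input line is blank, since it would otherwise compare equal to the initial empty prevline and be dropped.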
8.5.8 Conceptual Exercises for Section 8.5
8.5.9 Programming Exercises for Section 8.5
Exercise 8.5.1: Complete Programming Exercise 8.2.12 using a complete awk command line.
Exercise 8.5.2: Complete Programming Exercise 8.2.13 using a complete awk command line.
Exercise 8.5.3: Complete Programming Exercise 8.3.5 using a complete awk command line.
Exercise 8.5.4: Complete Programming Exercise 8.3.6 using a complete awk command line.
Exercise 8.5.5: Complete Programming Exercise 8.3., but this time invoke awk.
Exercise 8.5.6: Complete Programming Exercise 8.3., but this time invoke awk.
Exercise 8.5.7: Complete Programming Exercise 8.3., but this time invoke awk.
Programming Exercises 8.5.17–8.5.29 are the same as Programming Exercises 8.3.17–8.3.29, but this time use awk.
8.5.10 Programming Project for Section 8.5
8.6 Programming Projects for Chapter 8
One important and recurring theme of Linux programming is to construct software systems, such as specialized tools and utilities, by dynamically and creatively combining and composing multiple simple, atomic existing tools as the building blocks, using pipes as glue holding them together or, more formally, the interprocess communication mechanism. Pipes and filters are important and powerful tool construction mechanisms, whose use is illustrated in the following two projects.
The following requirements apply to both of the following programming projects:
i) Your script must be written in the Korn shell programming language.
ii) The first line of your script must be: #!/usr/bin/env ksh.
iii) Your script must have execute permission (e.g., -rwxr-x--- permissions).
iv) Your script must end with a proper exit or return statement (0 for success and non-zero for failure).
v) Do not use any specific aspect of your environment within your script. In other words, use native Linux command names as opposed to your personal aliases for those commands and do not rely on any specific aspect of your environment (e.g., values of particular shell variables).
vi) Your script must only write to standard output.
vii) Your script must not write or produce any intermediate files.
viii) Your script must execute using the Korn shell (ksh) interpreter on a Linux system.
ix) Your script may not contain invocations to C, C++, Perl, Python, Ruby, or any other similar scripting languages to solve the problem.
x) The exact same output as that given must be produced (i.e., zero differences as defined by diff, sdiff, and cmp).
You are advised to invest thought into the necessary transformations, and how to structure those transformations, to map the input (e.g., webpage) into the final output. If designed properly, the script required to solve each of these projects should occupy no more than 75 lines of code (and it can be done in fewer than 20 lines of code). Aim for correctness and clarity, not brevity.
Exercise 8.6.1: Webpage Scraping: Transforming the text of a webpage into a format amenable for entry into a database system is a common task ideally suited for a filter script. In this project, you will creatively compose and combine (through pipes) the tools and utilities covered in this chapter to write a shell filter script to convert the semi-structured data on a webpage to a colon-separated values (CSV) file, a format easily imported into a database system, written to standard output without writing or producing any intermediate files. Start by finding a webpage with some semi-structured, tabular data such as the status of the United States Congress at http://votesmart.org/officials/NA/C/national-congressional#.ViFfghNViko. Then define an output format such as <state>:<branch>:<party>:<district/seat>:<name>:<URL>, an instance of which is AK:House:Republican:At-Large:Don Young:http://votesmart.org/candidate/26717/don-young. Then write your filter script to convert one into the other. You may rely on the presence of the file pvsurls.txt, available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/pvsurls.txt, in the current working directory. Correct standard output is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/pvsstdoutstream.txt.
To avoid parsing HTML code, use the following command line in your script, which uses the lynx text-based web browser to write the human-readable contents of a webpage to standard output: lynx -dump -width=200 <url>. The lynx browser can be used to browse the web from non-graphical interfaces such as an ssh terminal. While not necessary, you may want to explore the iconv utility to deal with accents in names.
Exercise 8.6.2: Cross-referencing #included Files
In large programming projects, keeping track of which source files use which #include files can become a tedious chore. Consider the following files, which contain the listed #includes:
A.cpp            B.cpp              C.cpp            c.h
--------------   ----------------   --------------   --------------
#include <a.h>   #include <a.h>     #include "b.h"   #include <d.h>
#include<b.h>    # include <c.h>    #include"d.h"
#include "d.h"
The goal is to collect all the files included by each source file. Thus, the following list is desired, sorted first by source filename, and then for each source filename, sorted by include filename.
A.cpp: a.h b.h d.h
B.cpp: a.h c.h
C.cpp: b.h d.h
c.h: d.h
Such a listing is helpful for creating a Makefile. Remember, another theme of Linux programming is to write programs that write programs!
You are to write a shell filter script crossref which takes as arguments any number of C/C++ .c .cpp source files and #include .h files, and produces a sorted list as described above. Your script must run at the command line as crossref <file(s)>. For instance, the command line crossref [ABC].cpp c.h could produce the output given above. You may assume that your script will always be given valid file(s) that exist. Each line of your output must separate the source filename from the files it #includes with a single colon (:) followed by a single space. Delimit each #include file with a single space. Each line should contain no leading or trailing whitespace or extraneous text, as shown in the output above.
8.7 Linux Filter Style of Programming:
Monolithic Programs vs. Atomic Programs + Glue
This chapter presents a pattern for programming based on a simple, yet powerful idea: instead of writing one large, compiled, monolithic, unmalleable C++ or Java program for a particular specialized task, use pipes as glue to creatively combine and compose, on-the-fly at run-time, a set of small, atomic, Lego-like programs, whose use in isolation is limited, from the catalog of Linux tools and utilities to build a solution to that specialized task (see Fig. 8.6). The resulting system is a collection of concurrently running processes, communicating with each other in a synchronized manner through input and output. The power of these atomic tools is unleashed when they are used as building blocks in a large system. In other words, the power and utility of the final composition is greater than the sum of its parts. Moreover, the resulting program can be decomposed and recomposed as easily as it was originally composed to meet ever-evolving software requirements. This approach makes programming more of an art than a science.

input -> P1 | P2 | P3 | ... | Pn -> output
Figure 8.6: Graphical depiction of the Linux filter style of programming: solving a problem as a chain of concurrent processes communicating with I/O through pipes.
It is the way I think. I am a very bottom-up thinker. If you give me the right kind of Tinker Toys, I can imagine the building. I can sit there and see primitives and recognize their power to build structures a half mile high, if only I had just one more to make it functionally complete. I can see those kinds of things. – Ken Thompson, creator of UNIX, 1999 (from Computer Magazine interview)
The synergy of many atomic tools and utilities and a programmable shell, explored further in the next chapter, with interprocess communication mechanisms enables and fosters this style of programming.
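As a small illustration of this style, the following sketch composes standard utilities into a word-frequency counter. The function name wordfreq and the sample input are hypothetical, chosen only for illustration, and the block uses only constructs shared by ksh and bash. Each stage is an ordinary filter running as its own process.

```shell
# wordfreq (hypothetical name): print the n most frequent words on stdin.
# Each stage is a separate process; the pipes are the glue.
wordfreq() {
    tr -cs 'A-Za-z' '\n' |     # P1: one word per line
    tr 'A-Z' 'a-z' |           # P2: fold to lower case
    sort |                     # P3: bring identical words together
    uniq -c |                  # P4: count each distinct word
    sort -k1,1nr -k2 |         # P5: highest count first, ties alphabetical
    head -n "${1:-10}" |       # P6: keep the top n (default 10)
    awk '{print $2 ":" $1}'    # P7: reformat as word:count
}

printf 'the cat and the hat and the bat\n' | wordfreq 2
```

With the sample input, the two lines printed are the:3 and and:2. Any stage can be removed or replaced (e.g., dropping P2 to make the count case-sensitive) without touching the others, which is the decomposition and recomposition property described above.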
This idea is not completely revolutionary. For instance, programmers have been constructing programs as compositions of invocations to a collection of off-the-shelf routines called libraries for almost a half century. Moreover, the object-oriented paradigm of programming involves composing a program as a collection of objects, from object collections, which communicate with each other by passing messages. However, the Linux style of filter programming lifts that pattern of software development to the process level, where each of the mini-computation units is a heavyweight process, with composition mechanisms (e.g., pipes) which make decomposition and recomposition more convenient than endless cycles of debug-modify-recompile-rerun.
You can think of languages like C++ or Java as a Swiss Army knife; they are multi-purpose languages that can perform a wide range of tasks reasonably well, but are not ideally suited to one particular task or application domain. Linux tools and utilities, on the other hand, particularly sed and awk, are ideally suited for particular, targeted tasks, but are not general enough to be practical multi-purpose languages³. For instance, sed is well-suited for manipulating row-oriented text data and awk is helpful for working with column-oriented data, but neither has support for concurrent programming. Java, on the other hand, has support for both text-processing and concurrent programming. The moral of the story is, if you do not know what tasks you are going to face out in the field and cannot bring multiple tools with you, take a multi-purpose language such as C++ or Java. If, on the other hand, you know what particular task you are going to face in the field, take a domain-specific language such as sed or awk.
Linux Tools & Utilities (atoms)
+ Programmable Shell + Interprocess Communication Mechanisms (glue)
= Powerful Toolkit for Developing Reconfigurable Programs On-the-Fly
In the following chapter we study shell programming and contrast it with the Linux filter style of programming.
8.8 Thematic Take-Aways
• A regular expression always matches the longest string possible starting from the beginning of the line.
• Metacharacters common to the shell and the utility the shell is invoking in a command need to be protected from shell interpretation, and protected from utility interpretation if intended to be literal in the utility (e.g., grep '$' or grep '\$') (see Fig. 8.2).
• A regular expression is not a regular grammar.
³ This is notwithstanding the fact that both are Turing complete.
8.9 Chapter Summary
8.10 Key Terms
awk, egrep, fgrep, finite state automaton, grep, regular grammar, regular expression, metacharacter, pattern, sed, special character
8.11 Bibliographic Notes
Chapter 9
Shell Programming
Author: Saverio Perugini Copyright © 2017 by Saverio Perugini ALL RIGHTS RESERVED
9.1 Chapter Objectives
• Establish an understanding of Korn shell programming.
• Contrast the Linux filter style of programming with shell programming.
9.2 Introduction
A shell script (or shell program) is a series of Linux commands placed in an ASCII text file. Each shell (e.g., ksh, bash, or csh) provides mechanisms for control (e.g., if, while, and for statements).
9.2.1 return vs. exit
The difference between return and exit is the same as in C (i.e., same semantics in main; different semantics in functions). return allows you to return a value (an exit status) from a function to its caller; exit exits the current shell entirely.
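The distinction can be sketched as follows. The function names check_positive and bail_out are hypothetical, and the block uses only constructs shared by ksh and bash. Because exit terminates whichever shell executes it, bail_out is run inside a subshell here so that the demonstration itself keeps running.

```shell
# check_positive (hypothetical): return passes an exit status (0-255)
# back to the caller; the script keeps running.
check_positive() {
    if [ "$1" -gt 0 ]; then
        return 0
    fi
    return 1
}

# bail_out (hypothetical): exit terminates the shell that executes it,
# not just the function.
bail_out() {
    echo "about to exit"
    exit 3
    echo "never reached"
}

check_positive 5  && echo "5 is positive"
check_positive -2 || echo "-2 is not positive"

# Run bail_out in a subshell ( ... ) so that only the subshell terminates.
( bail_out ) || echo "subshell exited with status $?"
```

The last line reports status 3: the exit inside the function ended the entire subshell, whereas each return merely handed a status back to the calling line.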
9.2.2 Command-line Arguments
Arguments given to a shell script on the command line when it is invoked are available through the variables $* (a space-separated list) and "$@" (a list with each argument double quoted separately). Individual arguments to the shell script are referenced as $1, $2, $3, . . . , $9, and $0 is the name of the shell script. The built-in command shift can be used to access command-line arguments beyond a count of nine, as shown below. The variable $# stores the number of command-line arguments (i.e., the shell analog to argc in C, save for the command name).
Examples:
$ # prints all the command-line arguments
$ echo $*
$ # the number of command-line arguments
$ # (does not include the command name)
$ echo $#
$ # prints the command name
$ echo $0
$ # prints the first command-line argument
$ echo $1
$ # shifts the arguments left by n
$ # (e.g., if n = 1, arg1 = arg2, arg2 = arg3, and so on)
$ shift n
$* vs. $@
When unquoted, $* and $@ have the same semantics: all arguments on the command line, except the command name. When quoted, "$*" represents all arguments on the command line as one string (i.e., "$1 $2 ..."), and "$@" means all arguments on the command line, individually quoted (i.e., "$1" "$2" ...).
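A quick way to see the difference is to count how many arguments each form produces. In the sketch below, count_args is a hypothetical helper, set -- is used to simulate the arguments of a script, and the default value of IFS is assumed; the block runs under both ksh and bash.

```shell
# count_args (hypothetical helper): report how many arguments it received.
count_args() {
    echo $#
}

# Simulate a script invoked with three arguments: "a b", c, and d.
set -- "a b" c d

count_args $*      # unquoted: word-split again, so 4 arguments
count_args $@      # unquoted: same as $*, so 4 arguments
count_args "$*"    # quoted: one string "a b c d", so 1 argument
count_args "$@"    # quoted: original boundaries kept, so 3 arguments
```

Only "$@" preserves an argument that itself contains whitespace, which is why it is usually the form to use when passing arguments along to another command.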
9.3 Command and Control
9.3.1 for Loops
A for loop is used to iterate over all items in a list or array.
Syntax:
for <variable> [in <list>]
do
<statements>
done
Example:
for name in Lucy Linus Lucia Larry Leisel
do
    print "Next person is $name."
done
exit 0
If in list is omitted in a for loop in a script, the list is assumed to be "$@" (i.e., all of the command-line arguments to the script). The keywords do and done must be on lines by themselves, or be preceded by the ; statement separator (e.g., for directories in $PATH; do).
Example:
#!/usr/bin/env ksh
# print all arguments to a shell script
for arg in $*; do
    print $arg
done
exit 0
Illustrative Script
#!/usr/bin/env ksh

echo '$* is ' $*
echo '$@ is ' $@
print '$# is ' $#
print "The number of arguments to $0 was $#."

print $0
print $1
print $2
print $3
print $#

print

# for file
# for file in "$*"
for file in "$@"
do
    echo $file
done

exit 0
Sample invocations:
$ ./prog "a b" c d
$* is a b c d
$@ is a b c d
$# is 3
The number of arguments to ./prog was 3.
./prog
a b
c
d
3

a b
c
d
9.3.2 String Operators
Hostname Examples
HOST=$(hostname | cut -d. -f1)
HOST=$(hostname | awk -F. '{print $1}')
HOST=${HOSTNAME%%.*}
Table 9.1: String operators.
Syntax                      Semantics
${<varname>:-<word>}        if <varname> exists and is not null, return its value; otherwise return <word>
${<varname>:=<word>}        if <varname> exists and is not null, return its value; otherwise set it to <word> and then return its value
${<varname>:?<message>}     if <varname> exists and is not null, return its value; otherwise print <varname>: followed by <message>, and abort the current command or script
${<varname>:+<word>}        if <varname> exists and is not null, return <word>; otherwise return null
${<varname>#<pattern>}      if <pattern> matches the beginning of the variable's value, delete the shortest part which matches and return the rest
${<varname>##<pattern>}     if <pattern> matches the beginning of the variable's value, delete the longest part which matches and return the rest
${<varname>%<pattern>}      if <pattern> matches the end of the variable's value, delete the shortest part which matches and return the rest
${<varname>%%<pattern>}     if <pattern> matches the end of the variable's value, delete the longest part which matches and return the rest
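The operators in Table 9.1 can be tried out as in the following sketch. The variable names path and editor and their values are made up for illustration; the block runs under both ksh and bash.

```shell
# The path below is a made-up sample value.
path=/home/linus/src/prog.tar.gz

echo "${path#*/}"        # shortest match of */ from the front: home/linus/src/prog.tar.gz
echo "${path##*/}"       # longest match of */ from the front:  prog.tar.gz
echo "${path%.*}"        # shortest match of .* from the end:   /home/linus/src/prog.tar
echo "${path%%.*}"       # longest match of .* from the end:    /home/linus/src/prog

unset editor
echo "${editor:-vi}"     # editor is unset, so the default vi is returned
echo "${editor:+set}"    # editor is unset, so null (an empty line) is returned
echo "${editor:=emacs}"  # editor is unset, so it is set to emacs, which is returned
echo "${editor:+set}"    # editor now has a value, so set is returned
```

Note that ${path##*/} and ${path%/*} behave much like the basename and dirname utilities, but without spawning a process.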
String Variable Comparisons
Use string variable comparisons within [[ <expression> ]]. The [[ and ]] strings are each a token and, thus, must only appear with whitespace on each side. Within <expression> you can use parentheses for grouping and the relational operators <, >, <=, >=, ==, !=, &&, and ||.
Examples:
$ person=lucia
$ [[ $person = lucia ]]
$ echo $?
0
$ [[ $person = linus ]]
$ echo $?
1
$ [[ $person != linus ]]
$ echo $?
0
$ [[ ($person != linus) && ($person != lucia) ]]
$ echo $?
1
The = operator is an overloaded operator meaning assignment or comparison depending on the context. No space on either side implies assignment, while a space on each side implies comparison. String variables containing only digits can be compared as integers using the arithmetic relational operators -lt, -le, -eq, -ge, -gt, and -ne, with the implied semantics.
9.3.3 if Statement
Syntax:
if <condition>
then
<statements>
[elif <condition>
then
<statements>]
[else
<statements>]
fi
The keywords then, else, elif, and fi are the shell analogs of the curly braces (i.e., { }) of C; curly braces themselves have a different, special meaning in the shell. The elif and else clauses are optional.
Example:
if [[ $person = linus ]]
then
    print $person is on the sixth floor.
elif [[ $person = lucia ]]
then
    print $person is on the fifth floor.
elif [[ $person = linda ]]
then
    print $person is on the fifth floor.
else
    print "Who are you talking about?"
fi
A <condition> can be anything that returns an exit status. For instance:
options="-f -d -L"
if print - $options | grep -q -e -d; then
    print "option '-d' present in list."
fi
9.3.4 Additional Condition Tests
Table 9.2 provides additional conditional tests.
Table 9.2: Additional conditional tests.
Syntax                 Semantics
-n <string>            string not null?
-z <string>            string null?
-a <filename>          exists?
-f <filename>          is plain file?
-d <filename>          is directory?
-L <filename>          is symbolic link?
-s <filename>          exists and not empty?
-r <filename>          read permission?
-w <filename>          write permission?
-x <filename>          execute permission?
-O <filename>          your file?
-G <filename>          your group?
<file1> -nt <file2>    <file1> newer than <file2>?
<file1> -ot <file2>    <file1> older than <file2>?
Example:
if [[ ! -f output.file ]]; then
    print "output.file does not exist."
fi
9.3.5 while Statement
Syntax:
while <condition>
do
<statements>
done
Here <condition> has the same syntax as the if statement. We can use break or continue, or return or exit, inside a loop with the same meaning as in C.
Example:
#!/usr/bin/env ksh
# report type of executable file anywhere in search path

# the trailing colon guarantees that ${path#*:} eventually yields the
# empty string, so the loop terminates after the last directory
path=$PATH:
dir=${path%%:*}
while [[ -n $path ]]; do
    if [[ -x $dir/$1 && ! -d $dir/$1 ]]; then
        file $dir/$1
        exit 0
    fi
    path=${path#*:}
    dir=${path%%:*}
done
print "File not found."
exit 1
9.3.6 Putting It All Together: ourwhich Script
Recall the which program.
$ which ls flex bison
/bin/ls
/usr/bin/flex
/usr/bin/bison
Here we are going to implement the which command as a Korn shell script. When given no argument(s), which prints a usage message and returns with exit status 255.
$ which
Usage: which [options] [--] COMMAND [...]
$ echo $?
255
If it encounters an argument that is not found in any directory in the PATH, it prints a diagnostic message in place of a path for that argument, continues processing the rest of the arguments as usual, and returns with exit status 1.
$ which ls notfound cat
/bin/ls
which: no notfound in (/bin:/usr/bin:/usr/local/bin:/usr/sbin)
/bin/cat
$ echo $?
1
If all arguments have a path, which exits with status 0.
$ which which X
/usr/bin/which
/usr/bin/X
$ echo $?
0
We cannot assume that each directory in the PATH is valid or that a file with a path is executable.
#!/usr/bin/env ksh

# insert code here to catch aliases

exit_status=0;

if [[ $# -ne 0 ]]; then
    if [[ -n $PATH ]]; then
        path=$(echo $PATH | sed 's/:/ /g')
        for cmd; do
            found=0
            for dir in $path; do

                # is it a directory
                # print $dir/$cmd
                # following if is superfluous
                # if [[ -d $dir ]]; then
                if [[ -f $dir/$cmd ]]; then
                    if [[ (-x $dir/$cmd) && (! -d $dir/$cmd) ]]; then
                        print "$dir/$cmd"
                        found=1
                        break
                    fi
                fi
            done
            if [[ $found -eq 0 ]]; then
                print "$0: no $cmd in ($PATH)"
                exit_status=1;
            fi
        done
    fi
else
    print "Usage: ./ourwhich [filename...]" 1>&2
    exit 255
fi

exit $exit_status
9.3.7 case Selection
Syntax:
case <expression> in
<pattern> )
<statements> ;;
<pattern> )
<statements> ;;
.
.
.
esac
Double semicolon (;;) is required to terminate <statements>. The <statements> corresponding to the first pattern matching the <expression> are executed, after which the case statement terminates. The <expression> is usually some variable's value. The <patterns> can be plain strings, or they can be Korn shell patterns using metacharacters *, ?, !, [], and so on, including file-matching patterns. A <pattern> can consist of several patterns separated by | (logical or). A case statement is an attractive construct for determining which options to a script have been passed on the command line (see below).
Example:
case $person in
    linus )
        print "Oh..He's on the tenth floor." ;;
    lucy | linda )
        print "They're out to lunch." ;;
    * )
        print "Hmm. Not sure." ;;
esac
Note that inside a case statement, | does not act as a pipe (i.e., it is not used for interprocess communication).
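The following sketch shows shell patterns and the first-match rule at work in a case statement. The function name classify and the sample words are hypothetical; the block uses only constructs shared by ksh and bash.

```shell
# classify (hypothetical): label a single command-line word.
# Patterns are tried top to bottom; only the first match is executed.
classify() {
    case $1 in
        -h | --help )   echo "help option" ;;
        -[a-zA-Z] )     echo "short option" ;;
        --* )           echo "long option" ;;
        *.c | *.cpp )   echo "source file" ;;
        * )             echo "other" ;;
    esac
}

for word in -h -x --verbose main.cpp notes.txt; do
    echo "$word: $(classify "$word")"
done
```

Although -h also matches the -[a-zA-Z] pattern, it is labeled a help option because that pattern appears first; likewise --help never reaches the --* pattern.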
9.3.8 Example: Factoring Command-line Arguments into Options and Filenames
# example usage: ./factoring -d -f f1 f2 f3

args=" "$*

echo args:$args:

# investigate the use of getopt and getopts
options=${args%% ([a-zA-Z0-9]|/)*}

options=$(echo $options | sed 's/^[ ]//')

files=${args# $options}

print - options:$options:
print files:$files:

# grep
# -q: quiet; just return exit status
# -e: following is a pattern, not an option;
#     protects patterns with a leading -
# -e is same as --

# if print - $options | grep -q -e -d
# if print - $options | grep -q -- -d
# then
#     print - "-d is present"
#     echo "-d is present"
# else
#     print - "-d is absent"
# fi

for option in $options
do
    case $option in
        -d )
            print "found a -d." ;;
        -f | -q )
            print - "-f or -q" ;;
        * )
            print "some other option(s)" ;;
    esac
done

exit 0
9.3.9 Conceptual Exercises for Section 9.3
Exercise 9.3.1: To execute a loop five times in the Korn shell, use integer i=1 and then use:
a) do i=1,5
b) while (( i <= 5 ))
c) for i <= 5 ; do
Exercise 9.3.2: The syntax to test a <condition> in the Korn shell is
a) if <condition>
b) if [[<condition> ]]
c) if (<condition> )
Exercise 9.3.3: Consider the following Korn shell script printargs.
#!/usr/bin/env ksh
for arg in "$@"; do
    print $arg
done
What do each of the following command lines print?
a) ./printargs a "b c" d
b) ./printargs 'a "b c" d'
Exercise 9.3.4: Will the ourwhich script given in § 9.3.6 have problems with directory names containing a whitespace character (e.g., C files)? Explain.
Exercise 9.3.5: [KP84, p.98] Consider the following Korn shell script:
 1 $ cat mystery
 2
 3 #!/usr/bin/env ksh
 4 echo '# To unmystery, ksh this file'
 5 for i
 6 do
 7     echo "echo $i 1>&2"
 8     echo "cat >$i <<'End of $i'"
 9     cat $i
10     echo "End of $i"
11 done
12 $
Suppose we have the following two files, ab and xyz.
$ cat ab
hello
good
bye
$ cat xyz
Abc
Xyz
a) What would be the standard output of the command line: ./mystery ab xyz ?
b) Explain in your own words what mystery does. When we say ‘in your own words,’ we mean do not just explain, in order, what each line of the script does. Rather, provide a high-level description of the function of the script (e.g., alphabetically sorts the contents of a file).
c) What does the output of mystery do (follow same guidelines as previous question)?
d) In the mystery script, why is the occurrence of End of $i on line 8 single-quoted?
Exercise 9.3.6: Give a command line that tests if f1 is a directory.
9.3.10 Programming Exercises for Section 9.3
Exercise 9.3.7: Write a complete Korn shell script that prints to standard output only all lines of its single file argument that contain more than one word, where a word is any string of characters except whitespace.
Exercise 9.3.8: Write a complete Korn shell script that prints to standard output only all lines of its file arguments that contain more than one word, where a word is any string of characters except whitespace.
Exercise 9.3.9: Write a complete Korn shell script that prints to standard output the lines of its single file argument that consist only of five-letter (upper and lower case) palindromes. A palindrome is a word that reads the same backwards and forwards (e.g., CbXbC or abcba).
Exercise 9.3.10: Extend the ourwhich script given in § 9.3.6 to catch aliases akin to the which command on a Linux system.
9.4 Numbers and Arrays
9.4.1 Numeric Variables
Korn shell variables are strings by default or integers, depending on how they are defined. The statement A=100 assigns the string 100 to variable A. The statement integer A=100 assigns the integer 100 to the variable A. The keyword integer is an alias for typeset -i. To manipulate numeric variables using C-style expressions, use either $(( <expression> )) to return the value of expression or (( <expression> )) to return only an exit status.
Examples:
$ integer x=1
$ (( y=x*10 ))
$ echo $y
10
$ (( x+=1 ))
$ echo $x
2
$ print $x $y
2 10
$ integer a=10
$ integer b=21
$ (( a == 10 ))
$ echo $?
0
$ integer X=$(( a+10 ))
$ echo $X
20
$ X=$(( a == 10 ))
$ echo $X
1
$ (( a == 10 ))
$ echo $?
0
$ (( b < 20 ))
$ echo $?
1
$ (( (a < 10) || (a > 100) ))
$ echo $?
1
Within <expression> we can use parentheses for grouping, the arithmetic operators +, -, *, /, %, <<, >>, &, |, ~, and ^, and the relational operators <, >, <=, >=, ==, !=, &&, and ||. Furthermore, within the $(( <expression> )) and (( <expression> )) syntax, variables need not be preceded by a dollar sign, and special characters need not be quoted or escaped. The let syntax is the same as (( <expression> )), except that the <expression> in the latter need not be quoted (e.g., compare and contrast each of lines 10–12 below with line 13 below). The following is another example of printing all arguments to a shell script, demonstrating these constructs:
 1 #!/usr/bin/env ksh
 2
 3 integer i=0
 4
 5 for arg in $*; do
 6     # any of following five lines works
 7     print "Argument $i is '$arg'."
 8     # inside (( ... )) or after a let statement the $ may be omitted
 9     print "Argument $(( i++ )) is $arg"
10     (( ++i ))
11     (( i++ ))
12     (( i+= 1 ))
13     let i='i+1'
14     print "Arg $i is $arg"
15 done
16
17 exit 0
Note again that spaces are significant in the shell. The [[, ]], ((, and )) strings are tokens and, thus, must be delimited by whitespace. Use == for arithmetic comparisons; use = for string comparisons. How could one do both in a single expression? Nest them, or use [[ ... ]] && (( ... )).
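The [[ ... ]] && (( ... )) combination might look like the following sketch. The function name in_range, the name linus, and the floor bounds are hypothetical. The block relies on [[ ]] and (( )), so it runs under ksh or bash but not under a plain POSIX sh.

```shell
# in_range (hypothetical): succeed when the first argument is the string
# linus (a string test) and the second argument is an integer between
# 1 and 10 (an arithmetic test).
in_range() {
    [[ $1 = linus ]] && (( $2 >= 1 && $2 <= 10 ))
}

in_range linus 6  && echo "linus, floor 6: yes"
in_range linus 12 || echo "linus, floor 12: no"
in_range lucia 6  || echo "lucia, floor 6: no"
```

Because && short-circuits, the arithmetic test is evaluated only when the string test succeeds.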
9.4.2 Example: Renaming Multiple .c Files to .cpp
The command line mv *.c *.cpp will not work. Why? Nor will the find command work. Why? Script to generate some empty input files:
#!/usr/bin/env ksh

# $1 = directory
# $2 = filename prefix
# $3 = filename suffix (extension)
# $4 = number of files desired

dir=$1
prefix=$2
suffix=$3
integer i=1
integer n=$4

while (( i <= n )); do
    touch $dir/${prefix}${i}.$suffix
    (( i += 1 ))
    # print ${prefix}${i}.$suffix
done

exit 0
Rename (multiple move) script:
#!/usr/bin/env ksh
# rename (multiple move) script

from=$1
to=$2

# for file in $(ls *.$from); do
for file in *.$from; do
    mv $file ${file%.$from}.$to
    # print ${file%.$from}.$to
done

exit 0
9.4.3 Array Variables
An array variable provides a way to index a list of values. Arrays in the shell are quite different from arrays in C or Perl. In the shell, we can define x[10] without first having defined elements 1 . . . 9. The ${arrayname[*]} syntax represents all elements of the array arrayname. Items in an array can be accessed by position; the first item is at index 0. The $<arrayname> syntax refers to ${<arrayname>[0]} (i.e., the first element of array <arrayname>). The number of defined elements in an array variable is given by ${#<arrayname>[*]}. The ${<arrayname>[$(( ${#<arrayname>[*]} - 1 ))]} syntax accesses the last element of array <arrayname>.
Examples:
$ set -A people Lucy and Linus
$ set -A others ${people[*]} and Larry and Lucia
$ others[7]=and; others[8]=Leisel
$ # prints first element of array people (i.e., ${people[0]})
$ print $people
Lucy
$ # same as above
$ print ${people[0]}
Lucy
$ # prints second element of array people
$ print ${people[1]}
and
$ # prints length of array others
$ print "The length of array others is ${#others[*]}."
The length of array others is 9.
$ # prints last element of array others
$ print ${others[$(( ${#others[*]} - 1 ))]}
Leisel
$ set -A files $(ls)
${#arrayname[i]} represents the number of characters in element i of array arrayname. For instance:
$ # print the number of characters in
$ # first element of array others (i.e., ${others[0]})
$ print ${#others}
4
$ # same as above
$ print ${#others[0]}
4
$ # print the number of characters in second element of
$ # array others
$ print ${#others[1]}
3
Another example:
$ set -A today $(date)
$ print ${today[*]}
Thu Oct 12 16:03:44 EDT 2015
$ print ${#today[*]}
6
$ print "${today[1]} ${today[2]}, ${today[5]}"
Oct 12, 2015
$ date | awk '{print $2 " " $3 ", " $6}'
Oct 12, 2015
$ date | awk 'BEGIN {OFS=" "} {print $2, $3 ",", $6}'
Oct 12, 2015
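The earlier remark that x[10] can be defined without first defining the lower-numbered elements can be checked with a sketch such as the following. The array name x and its values are made up; the block uses direct element assignment rather than set -A so that it runs under both ksh and bash.

```shell
# Assign element 10 without defining elements 0 through 9.
x[10]=ten
x[2]=two

echo "${#x[*]}"   # number of defined elements: 2
echo "${x[10]}"   # ten
echo "${x[*]}"    # defined elements in index order: two ten
```

Since only two elements are defined, ${#x[*]} is 2 rather than 11, so for a sparse array like this one the ${#<arrayname>[*]} - 1 idiom no longer indexes the last element.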
9.4.4 Restricted Shells
Use #!/usr/bin/env ksh -r as the first line of a script to run the script in a restricted Korn shell, where certain operations are forbidden, including cd. Enter ksh -r or rksh at the command prompt to start an interactive restricted Korn shell.
9.4.5 Conceptual Exercises for Section 9.4
Exercise 9.4.1: Consider the following Korn shell statements (assume that they are executed in the order that they are given and that the current directory is /home/linus):
10 $ foo=null
11 $ print $foo
12 $ foo="$foo set"
13 $ print $foo
14 $ set -A x $foo
15 $ print ${x[1]}
16 $ unset foo
17 $ print ${foo:-unset}
18 $ integer pwd=3
19 $ if [[ ${pwd} = $(pwd) ]]; then
20 >     print 3
21 > else
22 >     print $(( pwd*2 )); fi
23 $ A=quoted
24 $ print "A '$(print $A)' \$ and escaped \."
a) What is printed by the statement on line 13?
b) What is printed by the statement on line 15?
c) What is printed by the statement on line 17?
d) What do the statements on lines 18–22 print (a syntax error is a valid answer)?
e) What is printed by the statement on line 24?
Exercise 9.4.2: Contrast the command line mv *.c ~/home/linus with the multimv script given in § 9.4.2.
Exercise 9.4.3: What is the motivation for a restricted shell?
Exercise 9.4.4: What operations does a restricted Korn shell restrict?
9.4.6 Programming Exercises for Section 9.4
Exercise 9.4.5: Suppose we have a directory with many .c (C source) files (e.g., one hundred c files). Write a complete Korn shell script which when invoked replaces the .c extension, on every file in the current directory which contains it, with .cpp.
Exercise 9.4.6: Suppose we have a directory with many .cpp (C++ source) files (e.g., one hundred c++ files). Write a complete Korn shell script which when invoked replaces the .cpp extension, on every file in the current directory which contains it, with .c.
Exercise 9.4.7: Give a Korn shell script containing a function pow, which raises a base to a non-negative exponent and returns the result.
9.5 Shell Programming vs. Linux Filter Style of Programming
Table 9.3 graphically contrasts the Linux filter style of programming (left) with shell programming (right). The filter model involves solving a programming problem as a chain of concurrent processes communicating with each other with I/O through pipes. Shell programming, on the other hand, typically involves writing a script which executes as a single process. A shell script may also spawn other processes, some of which are even chains of processes which communicate with each other through pipes, as shown on the right side of Table 9.3. However, unlike a filter script, a shell script invokes a fan of other processes in that those spawned processes run sequentially, not concurrently, and, thus, are not communicating with each other.
9.6 Conceptual Exercises for Chapter 9
9.7 Programming Exercises for Chapter 9
9.8 Programming Project for Chapter 9
Write a Korn shell script filecount which counts the number of ordinary files (defined as everything except the following), number of executable files, number of links, and number of directories in one or more directories which are provided as command-line arguments. A sample test session with filecount is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/filecounttestsession.txt.
Requirements
• The above counts include dot files, except that . and .. are not included in the directory count (investigate the -A option to ls).
• Files in sub-directories are not included in the counts.
• The distinction between file types is the same as that of ls -F.
Table 9.3: Graphical depiction of the Linux filter style of programming (left) versus shell programming (right). Key: each P1 . . . Pn enclosed in a circle represents a process while each S1 . . . S13 within a process represents a statement of the script.
Filter script model: input -> P1 | P2 | P3 | ... | Pn -> output
Filter script:
# P1
cat | \
# P2
sed | \
# P3
awk | \
...
# Pn
sort
Shell script model: [diagram: a single process P1 (the shell script) reads input, executes statements S1 ... S13 in sequence, and writes output; some statements spawn other processes (P2 ... P10), and some of those spawned processes are chained together by pipes]
Shell script:
# S1
# S2: P2
sed
# S3
# S4: P3
awk
# S5
# S6: P4 | P5
ls | wc -l
• If the script is invoked with no directory name provided, it must work on the current directory. Otherwise, it must produce a single line of output for each directory it processes, as in the following sample (on fictitious locations):
$ ./filecount
.:  10 ordinary  9 executable  3 links  5 directories
$
$ ./filecount courses tmp
courses:  2 ordinary  8 executable  7 links  42 directories
tmp:  8 ordinary  17 executable  5 links  51 directories
• The script must support the following command-line options:
-f: include the count of ordinary files in the output
-x: include the count of executable files in the output
-l: include the count of links in the output
-d: include the count of directories in the output
If any of these options are specified when the script is called, then only the requested totals must be printed for each directory.
• If an invalid option <option> is given, the script must print ./filecount: Illegal option: <option> and a usage message to stderr and halt with an exit status of 1, as shown below.
$ ./filecount -t
./filecount: Illegal option -t
Usage: filecount [-dflx] [directory ...]
$ echo $?
1
• If an invalid directory <directory> is given, the script must print ./filecount: Invalid directory: <directory> and a usage message to stderr and halt with an exit status of 2.
$ ./filecount somedir
./filecount: Invalid directory: somedir
Usage: filecount [-dflx] [directory ...]
$ echo $?
2
• The script must execute using the Korn shell interpreter (ksh). You may not use C, C++, Perl, Python, Ruby, or any similar language.
• The script must run at the command line as: ./filecount [-dflx] [directory ...].
• The script must have -rwxr-x--- permission.
• The script must terminate with a proper exit statement.
• Do not use any specific aspects of your environment within the script. In other words, use native Linux command names as opposed to your environment’s aliases for those commands and do not rely on any specific aspect of your environment (e.g., values of particular shell variables).
• Each line of output must separate the directory from the counts with a single colon (:) followed by exactly two spaces. Delimit each count from its label with a single space and delimit each count-label pair from the next with exactly two spaces (exactly as shown above). Always print the ordinary count first, followed by the executable count, then the link count, and finally the directory count, if requested, regardless of the order in which the options are given on the command line.
• Each line of output must not contain any leading or trailing whitespace or any extraneous text.
• All options must precede all directories on a command line.
• Use -- to indicate the end of options.
• Options can be given as singletons (e.g., -x) or in any combinations (e.g., -fx, -xf, -fxld).
• The file counts are mutually exclusive. One file must never be counted twice. Anything that is not a directory, symbolic link, or executable is an ordinary file.
• Executable files are to be counted as executable only, not executable and ordinary.
• The script must not create any new files or remove any existing files.
• The script must not create any new directories or remove any existing directories.
You are encouraged to make creative use of the given tools (grep, sed, awk, and others) and string operators (i.e., do not reinvent the wheel). Remember, grep, sed, and awk can be used on shell variables (e.g., $(echo $PATH | sed 's/:/ /g')). Also, explore getopts (though not necessary), ls -A, print -n, and print - -n. If designed properly, the script required for this homework should occupy no more than 100 lines of code.
9.9 Thematic Take-Aways
9.10 Chapter Summary
9.11 Key Terms
9.12 Bibliographic Notes
See [?, Chapter 4] and [?, Chapter 12] for more information on Korn shell programming.
Part IV: Compilation Concepts and Techniques, and Automatic Program Generation
Chapter 10
Automatic Program Generation
Author: Saverio Perugini Copyright © 2017 by Saverio Perugini ALL RIGHTS RESERVED
10.1 Chapter Objectives
• Establish an understanding of flex and bison.
• Differentiate between . . . .
• Introduce . . . .
10.2 Scanner Generation: flex
10.2.1 Outline
10.2.2 Linux Tools for Automatically Generating Scanners and Parsers
flex and bison are the GNU versions of lex and yacc (yet another compiler compiler), respectively.
10.2.3 Structure of a flex Specification:
Listing 10.2: Our first flex program: cat (version 0).

%%

%%

/* called by flex when EOF reached */
int yywrap(void) {
    /* convention is to return 1 */
    return 1;
}

int main(void) {
    /* main entry point for flex */
    yylex();
    return 0;
}

Listing 10.1: Structure of a flex specification.
/* definitions */

%%

/* a set of pattern-action rules */

%%

/* subroutines */
10.2.4 Our First flex Program: cat (version 0)
10.2.5 noop
10.2.6 cat (version 1)
10.2.7 Running flex to Automatically Generate a Scanner
$ flex cat.l      # produces lex.yy.c
$ gcc lex.yy.c    # produces a.out, the executable for the scanner
$ ./a.out         # runs the scanner
Listing 10.3: Noop: noop.l.
/* noop.l */

%%

.     { }
\n    { }

%%

int yywrap() {
    return 1;
}

int main() {
    yylex();
    return 0;
}

Listing 10.4: cat version 1.
/* cat1.l */

%%

.     /* match any character except newline */ printf("%s", yytext);

\n    /* match newline */ printf("\n");

%%

int yywrap(void) {
    return 1;
}

int main(void) {
    yylex();
    return 0;
}
Listing 10.5: cat version 2.
/* cat2.l */

%%

.     ECHO;

\n    ECHO;

%%

int yywrap(void) {
    return 1;
}

int main(int argc, char **argv) {
    /* require a filename so that argv[1] is never NULL */
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    printf(":%s:\n", argv[1]);
    if ((yyin = fopen(argv[1], "r")) == NULL) {
        printf("broken\n");
        return 1;    /* do not scan or fclose a NULL stream */
    }
    if (yyin == stdin)
        printf("here\n");
    else
        printf("there\n");

    yylex();
    fclose(yyin);
    return 0;
}

. . . or use a Makefile (more on this later).
10.2.8 cat (version 2)
10.2.9 cat (version 3)
10.2.10 cat -n (version 4)
10.2.11 cat -n (version 5)
10.2.12 Word Count
10.2.13 Pattern Overlap
10.2.14 Identifying Identifiers
10.2.15 Matching Quoted Strings
10.2.16 States
• %s ONE creates the (regular) start state ONE
Listing 10.6: cat version 3.
/* cat3.l */

%{
int cc = 0;
%}

%%

.     { cc++; ECHO; }

\n    { cc++; ECHO; }

%%

int yywrap(void) {
    return 1;
}

int main(int argc, char **argv) {
    /* guard against a missing or unreadable input file */
    if (argc < 2 || (yyin = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    yylex();
    fclose(yyin);
    printf("%d characters\n", cc);
    return 0;
}
Listing 10.7: cat -n version 4.
/* cat4.l (cat -n) */

%{
#include <string.h>    /* strlen */
int cc = 0;
int lineno = 0;
%}

%%

^.*\n    { cc += strlen(yytext);
           printf("%d %s", ++lineno, yytext); }
%%

int yywrap() {
    return 1;
}

int main(int argc, char **argv) {
    /* guard against a missing or unreadable input file */
    if (argc < 2 || (yyin = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    yylex();
    printf("%d characters.\n", cc);
    fclose(yyin);
    return 0;
}
Listing 10.8: cat -n version 5.
/* cat5.l (cat -n) */

%option yylineno

%{
#include <string.h>    /* strlen */
int cc = 0;
%}

%%
^.*\n    { cc += strlen(yytext);
           printf("%4d\t%s", yylineno-1, yytext); }

%%

int yywrap(void) {
    return 1;
}

int main(int argc, char **argv) {
    /* guard against a missing or unreadable input file */
    if (argc < 2 || (yyin = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    yylex();
    printf("%d characters.\n", cc);
    fclose(yyin);
    return 0;
}
Listing 10.9: Word count (wc).
%{
int cc = 0;
int wc = 0;
int lc = 0;
%}

%%

\n           { lc++; cc++; }

[ \t]        { cc++; }

[^ \t\n]+    { wc++; cc += yyleng; /* count anything but whitespace */ }

%%

int yywrap() {
    return 1;
}

int main(int argc, char **argv) {
    /* guard against a missing or unreadable input file */
    if (argc < 2 || (yyin = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    yylex();
    printf("%8d%8d%8d\n", lc, wc, cc);
    fclose(yyin);
    return 0;
}
Listing 10.10: Pattern overlap in word count (wc2.l).
%{
int cc = 0;
int wc = 0;
int lc = 0;
%}

%%

[ ]          { printf("Found a space.\n"); }

[ \t]        { cc++; }

\n           { lc++; cc++; }

[^ \t\n]+    { wc++; cc += yyleng; /* count anything but whitespace */ }

%%

int yywrap() {
    return 1;
}

int main(int argc, char **argv) {
    /* guard against a missing or unreadable input file */
    if (argc < 2 || (yyin = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    yylex();
    printf("%8d%8d%8d\n", lc, wc, cc);
    fclose(yyin);
    return 0;
}
Listing 10.11: Identifying identifiers (idcount.l).
%{
int idcount = 0;
%}

alpha           [_a-zA-Z]
alphanumeric    [_a-zA-Z0-9]
digit           [0-9]

%%

{alpha}{alphanumeric}*        { idcount++; printf("%s\n", yytext); }
{alpha}({alpha}|{digit})*     { idcount++; ECHO; printf("\n"); }

.     {}
\n    {}

%%

int yywrap(void) {
    return 1;
}

int main(int argc, char **argv) {
    /* guard against a missing or unreadable input file */
    if (argc < 2 || (yyin = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    yylex();
    fclose(yyin);
    printf("This program contains %d identifiers.\n", idcount);
    return 0;
}
Listing 10.12: Matching quoted strings (quotedStrings.l).
%{
#include <string.h>
extern int yy_flex_debug;
char *yylval = NULL;
int warning(char *s);    /* declared here because the rules call it */
%}

%%

["][^"\n]*["]     { printf(":%s:\n", yytext);
                    yylval = strdup(yytext+1);
                    /* yylval[strlen(yylval)-1] = '\0'; */
                    yylval[yyleng-2] = '\0';
                    printf(":%s:\n", yylval); }

["][^"\n]*[\n]    { fprintf(stderr, ":%s:\n", yytext);
                    warning("Invalid string:");
                    printf(":%s:\n", yytext+1); }

\n    { }
.     { }

%%

int yywrap() {
    return 1;
}

int warning(char *s) {
    fprintf(stderr, "%s\n", s);
    return 2;
}

int main(int argc, char **argv) {
    /* flex -d to enable debugging statements */
    yy_flex_debug = 1;
    yylex();
    return 0;
}
[Figure 10.1 diagram: all -> a.out -> lex.yy.c -> Cstrings.l, each target depending on the next]
Figure 10.1: Makefile dependency graph for C strings.
Table 10.1: Pattern matching primitives.
Metacharacter    Matches
.                any character except newline
\n               newline
*                zero or more copies of the preceding expression
+                one or more copies of the preceding expression
?                zero or one copy of the preceding expression
^                beginning of line
$                end of line
a|b              a or b
(ab)+            one or more copies of ab (grouping)
"a+b"            literal "a+b" (C escapes still work)
[]               character class
• ‘rules that do not have start states can apply in any state’ [?, p. 172]
• %x TWO creates the exclusive start state TWO
• ‘a rule with no start state is not matched when an exclusive state is active’ [?, p. 172]
10.2.17 Matching C Strings
10.2.18 Conceptual Exercises for Section 10.2
Exercise 10.2.1: Define a regular expression to match a string containing balanced parentheses (e.g., ((())()) is balanced, (()() is unbalanced) or state why it is not possible.
Listing 10.13: States (states.l).
%{
%}

%x ONE
%x TWO

%%

a          { BEGIN ONE; printf("in ZERO; read a; goto ONE\n"); }

b          { BEGIN TWO; printf("in ZERO; read b; goto TWO\n"); }

<TWO>a     { printf("in TWO; read a; goto 0\n"); BEGIN 0; }
<TWO>b     { printf("in TWO; read b; goto 0\n"); BEGIN 0; }
<ONE>a     { printf("in ONE; read a; goto TWO\n"); BEGIN TWO; }
<ONE>b     { printf("in ONE; read b; goto TWO\n"); BEGIN TWO; }

.          {}
\n         {}
<ONE>.     {}
<ONE>\n    {}
<TWO>.     {}
<TWO>\n    {}

%%

int yywrap() {
    return 1;
}

int main() {
    yylex();
    return 0;
}
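The transitions encoded by the start states in Listing 10.13 can also be written out by hand. The following is a minimal sketch in plain C, not what flex generates (flex emits a table-driven scanner); the names enum state, step, and run are hypothetical and introduced only for this illustration. ZERO stands for the initial start state (0, or INITIAL).

```c
#include <assert.h>
#include <stddef.h>

/* ZERO is the initial start state; ONE and TWO mirror %x ONE and %x TWO. */
enum state { ZERO, ONE, TWO };

/* One transition: only 'a' and 'b' change state; every other character,
   including newline, is consumed by the catch-all rules and ignored. */
enum state step(enum state s, char c) {
    if (c != 'a' && c != 'b')
        return s;
    switch (s) {
    case ZERO: return c == 'a' ? ONE : TWO;  /* a: goto ONE, b: goto TWO */
    case ONE:  return TWO;                   /* a or b: goto TWO */
    case TWO:  return ZERO;                  /* a or b: goto 0 */
    }
    return s;
}

/* Runs the machine over a whole string and returns the final state. */
enum state run(const char *input) {
    assert(input != NULL);
    enum state s = ZERO;
    for (; *input != '\0'; input++)
        s = step(s, *input);
    return s;
}
```

For example, run("aab") passes through ONE and TWO and ends in ZERO, which corresponds to the three messages the flex scanner would print for that input.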
Listing 10.14: Matching C strings (Cstrings.l).
%{
extern int yy_flex_debug;
char buf[100];
char *s = NULL;

%}

%x INQUOTE

%%

\"               { BEGIN INQUOTE; s = buf; }

<INQUOTE>\\\"    { *s++ = '\"'; fprintf(stderr, "found escaped quote\n"); }
<INQUOTE>\\\n    { fprintf(stderr, "found escaped newline\n"); }
<INQUOTE>\\n     { *s++ = '\n'; fprintf(stderr, "found newline\n"); }
<INQUOTE>\\t     { *s++ = '\t'; fprintf(stderr, "found tab\n"); }

<INQUOTE>["]     { *s = '\0';
                   BEGIN 0;
                   printf("\nFound :%s:\n", buf); }

<INQUOTE>\n      { BEGIN 0; fprintf(stderr, "Invalid string.\n"); /* exit(1); */ }


<INQUOTE>.       { *s++ = *yytext; }

\n               { }
.                { }

%%

int yywrap() {
    return 1;
}

int main() {
    yy_flex_debug = 0;
    yylex();
    return 0;
}
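Listing 10.14 copies characters into a fixed 100-byte buffer without checking its capacity. The following is a hedged sketch, in plain C, of the same INQUOTE logic written by hand with a bounds check added. The function name scan_c_string and its return convention are assumptions made for this illustration; it handles only the first quoted string in its input, whereas the flex scanner keeps scanning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: finds the first double-quoted string in src and
   copies its contents, with the escapes \" \n \t decoded, into dst
   (capacity cap bytes). Returns the decoded length, or -1 if the string
   is unterminated, contains an unescaped newline, or does not fit. */
int scan_c_string(const char *src, char *dst, size_t cap) {
    size_t n = 0;
    assert(src != NULL && dst != NULL && cap > 0);

    while (*src != '\0' && *src != '"')   /* initial state: skip to the quote */
        src++;
    if (*src != '"')
        return -1;
    src++;                                /* enter the INQUOTE state */

    while (*src != '\0') {
        char c = *src++;
        if (c == '"') {                   /* closing quote: done */
            dst[n] = '\0';
            return (int)n;
        }
        if (c == '\n')                    /* unescaped newline: invalid */
            return -1;
        if (c == '\\') {
            char e = *src;
            if (e == '\n') { src++; continue; }   /* escaped newline: drop */
            if (e == 'n')       { c = '\n'; src++; }
            else if (e == 't')  { c = '\t'; src++; }
            else if (e == '"')  { c = '"';  src++; }
            /* otherwise keep the backslash, as the catch-all rule does */
        }
        if (n + 1 >= cap)                 /* leave room for the terminator */
            return -1;
        dst[n++] = c;
    }
    return -1;                            /* input ended inside the string */
}
```

The capacity check before each store is the only behavioral addition; the same check could be placed in the actions of Listing 10.14 before each *s++ assignment.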
Listing 10.15: Makefile for C strings (Makefile).
SRC = Cstrings.l
CC = gcc
LEX = flex
LEX_FLAGS = -d
OBJ = lexer

all: $(OBJ)

$(OBJ): lex.yy.c
	$(CC) -o $(OBJ) lex.yy.c

lex.yy.c: $(SRC)
	$(LEX) $(LEX_FLAGS) $(SRC)

clean:
	@-rm lex.yy.c $(OBJ)
Table 10.2: Pattern matching examples.
Expression      Matches
abc             abc
abc*            ab, abc, abcc, abccc, . . .
abc+            abc, abcc, abccc, abcccc, . . .
a(bc)+          abc, abcbc, abcbcbc, . . .
a(bc)?          a, abc
[abc]           one of: a, b, c
[a-z]           any letter, a through z
[a\-z]          one of: a, -, z
[-az]           one of: -, a, z
[A-Za-z0-9]+    one or more alphanumeric characters
[ \t\n]+        whitespace
[^ab]           anything except: a, b
[a^b]           a, ^, b
[a|b]           a, |, b
a|b             a, b
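Most rows of Table 10.2 can be checked outside flex with the POSIX regex(3) interface, whose extended regular expressions agree with flex on these operators. The following sketch assumes a POSIX system; the helper name full_match is hypothetical. Note that POSIX bracket expressions do not treat a backslash as an escape, so the [a\-z] row and the quoted-literal form "a+b" are flex-specific and are not covered by this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the whole of text matches the extended
   regular expression pattern, 0 if it does not, and -1 if the pattern is
   too long for the local buffer or fails to compile. */
int full_match(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    assert(pattern != NULL && text != NULL);

    /* "^(" + pattern + ")$" + terminator needs strlen(pattern) + 5 bytes */
    if (strlen(pattern) + 5 > sizeof anchored)
        return -1;
    snprintf(anchored, sizeof anchored, "^(%s)$", pattern);

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, full_match("abc*", "ab") succeeds while full_match("abc+", "ab") does not, which is the difference between the second and third rows of the table.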
Table 10.3: flex predefined variables.
Name                Function
int yylex(void)     call to invoke lexer, returns token
char* yytext        pointer to matched string
yyleng              length of matched string
yylval              value associated with token
int yywrap(void)    wrapup, return 1 if done, 0 if not done
FILE* yyout         output file
FILE* yyin          input file
INITIAL             initial start condition
BEGIN condition     switch start condition
ECHO                write matched string
10.2.19 Programming Exercises for Section 10.2
Exercise 10.2.2: Define a flex specification for a program that writes to stdout each line of its standard input with all leading and trailing whitespace purged from every line.
Exercise 10.2.3: Define a flex specification for a program that writes to stdout each line of its standard input with all leading and trailing whitespace purged from every line, and all blank lines purged.
Exercise 10.2.4: Define a flex specification for the Linux wc command. You need not handle file I/O or command-line options (assume -l, -w, and -c are always present).
Exercise 10.2.5: Define a flex specification for the Linux wc command. The scanner generated must support both standard input and file input. You need not handle command-line options (assume -l, -w, and -c are always present).
Exercise 10.2.6: Consider the input stream given in Exercise 8.3.. Define a flex specification for a program to convert each line of standard input in the form (<last>,<first>) to (<first>␣<first>) and print the results to stdout, where ␣ represents a single space character.
Exercise 10.2.7: Consider the input stream given in Programming Exercise 8.3.. Define a flex specification for a program to convert each line of standard input in the form (<last>,<first>) to (<first>␣<first>) and print the results, with any leading and trailing whitespace, and all blank lines, purged, to standard output, where ␣ represents a single space character.
Exercise 10.2.8: Rewrite the flex specification for matching quoted strings in Listing ?? by combining the two pattern-action rules into one pattern-action rule.
10.2.20 Programming Projects for Section 10.2
Exercise 10.2.1: Automatically generate a lexical analyzer which outputs the uncommented and commented included header filenames from a stream of C/C++ source code.
Requirements:
a) Your program must read from standard input and file input, but always write to standard output.
b) Your program must support only two command-line options (-u and -c) and combinations of them (e.g., -uc and -cu).
c) When run with no command-line options, your program must print both uncommented and commented included header filenames (and nothing else) using the format used in the sample output given below.
d) When run with the -u command-line option, your program must print only the uncommented included header filenames (and nothing else) us- ing the format used in the sample output given below.
e) When run with the -c command-line option, your program must print only the commented included header filenames (and nothing else) using the format used in the sample output given below.
f) When run with the -u and -c command-line options or the -uc or -cu command-line options, your program must print both the uncommented and commented included header filenames (and nothing else) using the format used in the sample output given below.
g) If an invalid option <option> is given, the program must print ./showheaders: Illegal option: <option> and a usage message to stderr and halt with an exit status of 1, as shown below.
$ ./showheaders -t
./showheaders: Illegal option -t
Usage: showheaders [-cu] [file(s) ...]
$ echo $?
1
h) If an invalid file <file> is given, the program must print ./showheaders: Invalid file: <file> and a usage message to stderr, continue processing any remaining input files, and then exit with status 2.
$ ./showheaders somefile
./showheaders: Invalid file: somefile
Usage: showheaders [-cu] [file(s) ...]
$ echo $?
2
i) You may assume that the input stream will never contain more than fifty (uncommented or commented) included header filenames.
j) Your solution must contain only a flex specification file and a Makefile (i.e., no other source files).
k) Use macros and substitutions (e.g., digit [0-9]), where possible and appropriate, to simplify the pattern-matching rules in your flex specification file.
l) Develop a Makefile which builds your lexical analyzer. Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code generation and compilation). Make sure that each directive also lists all files on which the derived file depends in its dependency list. Also, your Makefile must be written so that it carries out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring showheaders (the final executable for your lexical analyzer) up-to-date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate in your Makefile to improve its readability. Your Makefile must bring everything up-to-date, using only lex and gcc, without any warnings or errors, when make is invoked.
[Figure 10.2 diagram: source program (string or list of lexemes) -> scanner (regular grammar) -> list of tokens -> parser (context-free grammar) -> yes / no]
Figure 10.2: Simplified view of scanning and parsing: the front end.
[Figure 10.3 diagram: the same pipeline annotated with the tools: lex turns the regular grammar (.l) into the scanner (lex.yy.c), and yacc turns the context-free grammar (.y) into the parser (.tab.c, .tab.h)]
Figure 10.3: Simplified view of scanning and parsing: the front end with flex and bison.
Sample test data is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/showheadersdata.tar, and a sample test session with showheaders on that data is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/showheaderstestsession.txt.
Exercise 10.2.2: Complete Programming Project 10.2.1 in Go, subject to all the requirements given in that specification. Use the Nex (nex) lexical analyzer generator for Go available at: https://crypto.stanford.edu/~blynn/nex/.
10.3 Parser Generation: bison
10.3.1 Scanning and Parsing
10.3.2 Evaluating Arithmetic Expressions in Linux
$ expr 2 + 3
5
$ expr 2 + 3 \* 4
14
$ expr 2 \* 3 + 4
10
$ expr "2 + 3 * 4"
2 + 3 * 4
[Figure 10.4 diagram: source program (string) "n = x * y + z" -> scanner -> list of tokens id1 = id2 * id3 + id4 -> parser -> yes / no]
Figure 10.4: More detailed view of scanning and parsing.
[Figure 10.5 diagram: the same pipeline annotated with the tools: lex builds the scanner (lex.yy.c) from the regular grammar (.l), and yacc builds the parser (.tab.c, .tab.h) from the context-free grammar (.y)]
Figure 10.5: More detailed view of scanning and parsing with flex and bison.
$ bc -l
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
23+47
70
2 + 3
5
2 + 3 * 4
14
2 * 3 + 4
10
2 ^ 3
8
^D
10.3.3 Calculator (version 1)
The following is a context-free grammar in EBNF defining a language of calculator expressions which we use as a running example in this chapter.
<program> ::= <program> <expr> \n | <expr> \n
<expr> ::= (<list>) | a
<list> ::= <expr> | <expr> <list>
<expr> ::= <integer>
<expr> ::= - <expr>
<expr> ::= <expr> + <expr>
<expr> ::= <expr> * <expr>
<integer> ::= 1 | 2 | 3 | . . . | ∞
Hack to deal with an ambiguous grammar. bison Conflicts
%left '+' '-'
%left '*' '/'
%token INTEGER
/* produces "#define INTEGER 258" in calc.tab.c
   because values 0-255 are reserved for character values, and
   lex reserves several values for end-of-file and error processing
   and, therefore, token values typically start around 258 */

Listing 10.16: calc1.l.
%{
#include <stdlib.h>        /* atoi */
#include "calc1.tab.h"
int yyerror(char *s);      /* defined in calc1.y */
/*
#define YYSTYPE int
extern YYSTYPE yylval;
*/
%}

%%

0              {
                   /* get integer value of INTEGER token */
                   yylval = atoi(yytext);
                   return INTEGER;
               }

[1-9][0-9]*    {
                   /* get integer value of INTEGER token */
                   yylval = atoi(yytext);
                   return INTEGER;
               }

[-+\n]         { return *yytext; }

[ \t]          ; /* skip whitespace */

.              yyerror("invalid character");

%%

int yywrap(void) {
    return 1;
}

Listing 10.17: calc1.y.
/* token values typically start around 258
   because values 0-255 are reserved for character values and
   lex reserves several values for end-of-file and error processing
*/

/* produces "#define INTEGER 258" in y.tab.c on our system */
%token INTEGER

%{
#include <stdio.h>
#define YYDEBUG 0
int yylex(void);           /* generated by flex from calc1.l */
int yyerror(char *s);
%}

%left '+' '-'

%%

program : program expr '\n'    { printf("%d\n", $2); }
        | expr '\n'            { printf("%d\n", $1); }
        ;

expr : INTEGER                 { $$ = $1; /* default action: pop, push */ }

     | expr '+' expr           {
                                   /* addition */
                                   $$ = $1 + $3;
                               }

     | expr '-' expr           {
                                   /* subtraction */
                                   $$ = $1 - $3;
                               }
     ;

%%

int yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
    return 0;
}

int main(void) {
#if YYDEBUG
    yydebug = 0;
    // yy_flex_debug = 1;
#endif
    yyparse();
    return 0;
}

[Figure 10.6 diagram: two parallel stacks. The parse stack holds tokens (terminals and non-terminals) and represents the current parsing state; the value stack is an array of YYSTYPE elements holding their yylvals. For expr '+' expr, $1 = 23 (<expr>), $2 = '+', $3 = 31 (<expr>); after the reduction, <expr> is on top of the stack with $$ = 54]
Figure 10.6: Parse stack and value stacks in bison.
[Figure 10.7 diagram: tokens.l (regular expression specification of tokens) -> flex -> lex.yy.c (contains yylex()); gram.y (EBNF specification of grammar) -> bison -> gram.tab.c (contains yyparse()) and gram.tab.h (defines YYSTYPE), which lex.yy.c #includes; lex.yy.c and gram.tab.c -> gcc -> a.out; input (e.g., source code) -> a.out -> output (e.g., parse tree)]
Figure 10.7: Marriage of flex and bison.
10.3.4 Marriage of flex and bison
10.3.5 Running bison (in conjunction with flex) to Generate a Parser
[Nie][p. 5]
Fig. 10.7 illustrates how flex and bison collaborate to generate a parser.
$ flex tokens.l                        # produces lex.yy.c
$ bison -d gram.y                      # produces gram.tab.c and gram.tab.h
$ gcc -c gram.tab.c                    # produces gram.tab.o
$ gcc -c lex.yy.c                      # produces lex.yy.o
$ gcc -o parser gram.tab.o lex.yy.o    # produces parser
$ ./parser < . . .
[Figure 10.8 diagram: the same arrangement as Figure 10.7 with calc1.l in place of tokens.l and calc1.y in place of gram.y, producing lex.yy.c, calc1.tab.c, and calc1.tab.h, which gcc compiles into a.out]
Figure 10.8: Marriage of flex and bison in calculator.
$ flex calc1.l                         # produces lex.yy.c
$ bison -d calc1.y                     # produces calc1.tab.c and calc1.tab.h
$ gcc -c calc1.tab.c                   # produces calc1.tab.o
$ gcc -c lex.yy.c                      # produces lex.yy.o
$ gcc -o calc1 calc1.tab.o lex.yy.o    # produces calc1
$ ./calc1 < . . .
10.3.6 Calculator (version 2)
We extend the calculator of the previous section to incorporate the following new features:
• multiplication (*) and division (/) arithmetic operators,
• a unary minus operator (-),
• an exponentiation operator (^) for non-negative exponents,
• parentheses to override operator precedence,
• single-character variables, and
• a print statement.
Listing 10.18: Makefile for calculator (version 1).
SRC = calc1
CC = gcc
LEX = flex
#LEX_FLAGS = -d
LEX_FLAGS =
YACC = bison
YACC_FLAGS = -d -t

all: $(SRC)

$(SRC): lex.yy.o $(SRC).tab.o
	$(CC) lex.yy.o $(SRC).tab.o -o $(SRC)

lex.yy.o: lex.yy.c $(SRC).tab.h
	$(CC) -c lex.yy.c

lex.yy.c: $(SRC).l
	$(LEX) $(LEX_FLAGS) $(SRC).l

$(SRC).tab.o: $(SRC).tab.c
	$(CC) -c $(SRC).tab.c

$(SRC).tab.c: $(SRC).y
	$(YACC) $(YACC_FLAGS) $(SRC).y

$(SRC).tab.h: $(SRC).y
	$(YACC) $(YACC_FLAGS) $(SRC).y

clean:
	-rm *.[cho] $(SRC)
Listing 10.19: calc2.l.
%{
#include <stdlib.h>        /* atoi */
#include "calc2.tab.h"
int yyerror(char *s);      /* defined in calc2.y */
%}

%%

[a-z]           { /* the position of the character in the alphabet 0..25 */
                  yylval = *yytext - 'a';
                  return VARIABLE; }

0               { yylval = atoi(yytext);
                  return INTEGER; }

[1-9][0-9]*     { yylval = atoi(yytext);
                  return INTEGER; }

[-+()=*/^;\n]   { /* operators */ return *yytext; }

print           { /* operator */ return PRINT; }

[ \t]           { /* skip whitespace */ }

.               { /* anything else is an error */ yyerror("invalid character"); }

%%

int yywrap(void) {
    return 1;
}
Listing 10.20: calc2.y.
%token INTEGER VARIABLE PRINT
%right '='
%left '+' '-'
%left '*' '/'
%right '^'
%nonassoc UMINUS    /* unary minus binds tighter than every other operator */

%{
#include <stdio.h>
#include <math.h>
#define SIZE 26
#define YYDEBUG 0
int symtab[SIZE];
int yylex(void);           /* generated by flex from calc2.l */
int yyerror(char *s);
%}

%%

program : program statement ';' '\n'
        | statement ';' '\n'
        ;

statement :
          expr
        | PRINT expr              { printf("%d\n", $2); }
        | VARIABLE '=' expr       { symtab[$1] = $3; }
        ;

expr :
          INTEGER
        | VARIABLE                { $$ = symtab[$1]; }
        | '-' expr %prec UMINUS   { $$ = $2 * -1; }
        | expr '*' expr           { $$ = $1 * $3; }
        | expr '/' expr           { $$ = $1 / $3; }
        | expr '+' expr           { $$ = $1 + $3; }
        | expr '-' expr           { $$ = $1 - $3; }
        | expr '^' expr           { $$ = pow($1, $3); }
        | '(' expr ')'            { $$ = $2; }
        ;

%%

int yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
    return 0;
}

int main(void) {
#if YYDEBUG
    yydebug = 1;
#endif
    int i;
    for (i = 0; i < SIZE; i++)
        symtab[i] = 0;
    yyparse();
    return 0;
}
The following is sample input and output for the extended calculator (> is simply the prompt for input and will be the empty string in your system).
> 2 * (5 - 6);
> print 2 * (5 -6);
-2
> x = 6 / (7- 4);
> x;
> print x ;
2
> y= 3;
> y + -3 * x;
> print y + - 3 * x;
-3
> print y ^ x;
9
The syntactic aspects of these enhancements are expressed in the following context-free grammar in EBNF for calculator sentences:
<program> ::= <program> <stmt> ; \n | <stmt> ; \n
<stmt> ::= <expr> | print <expr>
<stmt> ::= <variable> = <expr>
<expr> ::= <integer> | <variable>
<expr> ::= - <expr>
<expr> ::= <expr> + <expr>
<expr> ::= <expr> - <expr>
<expr> ::= <expr> * <expr>
<expr> ::= <expr> / <expr>
<expr> ::= <expr> ^ <expr>
<expr> ::= (<expr>)
<integer> ::= 1 | 2 | 3 | . . . | ∞
<variable> ::= a | b | c | . . . | z
The unary minus operator (-) has precedence over all other operators. The exponentiation operator (^) is right-associative and has the second highest precedence. Identifiers for single-character variables are limited to the 26 lowercase alphabetic characters.
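The precedence and associativity rules just stated can be checked with a small hand-written evaluator. The following is a sketch of a recursive-descent parser in C, with one function per precedence level. It is not what bison generates (bison emits a table-driven LALR(1) parser), it omits variables and the print statement, and every name in it (struct parser, parse_expr, eval_expr, and so on) is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical parser state: a cursor into the input plus an error flag. */
struct parser { const char *p; int error; };

static long parse_expr(struct parser *ps);

static void skip_blanks(struct parser *ps) {
    while (*ps->p == ' ' || *ps->p == '\t')
        ps->p++;
}

/* primary ::= integer | '(' expr ')' */
static long parse_primary(struct parser *ps) {
    skip_blanks(ps);
    if (*ps->p == '(') {
        ps->p++;
        long v = parse_expr(ps);
        skip_blanks(ps);
        if (*ps->p == ')') ps->p++; else ps->error = 1;
        return v;
    }
    if (!isdigit((unsigned char)*ps->p)) { ps->error = 1; return 0; }
    long v = 0;
    while (isdigit((unsigned char)*ps->p))
        v = v * 10 + (*ps->p++ - '0');
    return v;
}

/* unary ::= '-' unary | primary   (highest precedence) */
static long parse_unary(struct parser *ps) {
    skip_blanks(ps);
    if (*ps->p == '-') { ps->p++; return -parse_unary(ps); }
    return parse_primary(ps);
}

/* power ::= unary [ '^' power ]   (right-associative) */
static long parse_power(struct parser *ps) {
    long base = parse_unary(ps);
    skip_blanks(ps);
    if (*ps->p != '^') return base;
    ps->p++;
    long exp = parse_power(ps);
    if (exp < 0) { ps->error = 1; return 0; }   /* non-negative exponents only */
    long result = 1;
    while (exp-- > 0) result *= base;           /* overflow is not checked */
    return result;
}

/* term ::= power { ('*' | '/') power }   (left-associative) */
static long parse_term(struct parser *ps) {
    long v = parse_power(ps);
    for (;;) {
        skip_blanks(ps);
        char op = *ps->p;
        if (op != '*' && op != '/') return v;
        ps->p++;
        long rhs = parse_power(ps);
        if (op == '*') v *= rhs;
        else if (rhs != 0) v /= rhs;
        else ps->error = 1;                     /* division by zero */
    }
}

/* expr ::= term { ('+' | '-') term }   (lowest precedence, left-associative) */
static long parse_expr(struct parser *ps) {
    long v = parse_term(ps);
    for (;;) {
        skip_blanks(ps);
        char op = *ps->p;
        if (op != '+' && op != '-') return v;
        ps->p++;
        long rhs = parse_term(ps);
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Evaluates text; returns 1 and stores the value in *result on success,
   or returns 0 on a syntax error, leaving *result untouched. */
int eval_expr(const char *text, long *result) {
    assert(text != NULL && result != NULL);
    struct parser ps = { text, 0 };
    long v = parse_expr(&ps);
    skip_blanks(&ps);
    if (ps.error || *ps.p != '\0') return 0;
    *result = v;
    return 1;
}
```

Under these rules 2 ^ 3 ^ 2 groups as 2 ^ (3 ^ 2), and -2 ^ 2 groups as (-2) ^ 2 because unary minus binds first. In a bison grammar the same choices are expressed declaratively through the order of the %left, %right, and %nonassoc lines and through %prec.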
/* yields an integer in the range 0-25 */
/* ascii code for character 'a' is 97 */
/* ascii code for character 't' is 116 */
yylval = *yytext - 'a';

Figure 10.9: Interpreting while parsing. [Diagram: the source program (string or list of lexemes) enters the scanner (regular grammar), which passes a list of tokens to the parser (context-free grammar); scanner and parser form the front end, and the interpreter runs while parsing to produce the program output.]

Figure 10.10: Interpreting while parsing in calculator (version 1 and 2). [Diagram: the same pipeline annotated with file names: the scanner is specified in calc2.l (regular grammar) and generated as lex.yy.c; the parser is specified in calc2.y (context-free grammar) and generated as calc2.tab.c, with token definitions in calc2.tab.h.]
The lexical analyzer must now return VARIABLE tokens in addition to INTEGER tokens.
The same Makefile from version 1 can be used for version 2 of the calculator.
Figure 10.11: Interpretation. [Diagram: the source program (string or list of lexemes) passes through the scanner (regular grammar) and the parser (context-free grammar), which form the front end and produce a parse tree; the interpreter takes the parse tree and the program input and produces the program output.]

Figure 10.12: Alternate view of execution by interpretation. [Diagram: the same front end produces a parse tree; the interpreter, itself compiled to machine code and run on, e.g., a processor, receives both the parse tree and the program input as its inputs and produces the program output.]

Figure 10.13: Compilation. [Diagram: the front end (scanner and parser) produces a parse tree; within the compiler, the semantic analyzer and the code generator/translator turn it into a translated program (e.g., object code), which an interpreter (e.g., a processor) runs on the program input to produce the program output.]
10.4 Putting It All Together: Towards Interpreters
In this section, we extend the language for calculator sentences and its parser. Specifically, we
1. incorporate more features into the calculator,
2. construct a syntax tree during parsing, and
3. traverse the tree to evaluate a calculator program and produce output.
10.4.1 Calculator (version 3)
The additional features are
• the <, <=, >, >=, ==, and != binary comparison operators,
• selection through if and if–else statements,
• repetition through a while statement, and
• statement blocks beginning and ending with { and }, respectively.
Figure 10.14: Low-level view of execution by compilation. [Diagram: the commented source program n = x * y + z /* mathematical expression */ passes through the preprocessor, the scanner (list of lexemes to the list of tokens id1 = id2 * id3 + id4), and the parser (parse tree rooted at =, with + and * below it), which form the front end; the code generator emits the assembly code load id2, mul id3, add id4, store id1; the assembler produces object code; and the processor runs it on the program input to produce the program output.]

Figure 10.15: Calculator expression interpretation. [Diagram: source program (string or list of lexemes), scanner (regular grammar), list of tokens, parser (context-free grammar), parse tree, interpreter, program output.]

Figure 10.16: Calculator expression interpretation. [Diagram: the pipeline of Figure 10.15 annotated with file names: scanner from calc3.l generated as lex.yy.c; parser from calc3.y generated as calc3.tab.c and calc3.tab.h; interpreter in interpreter.c.]

Figure 10.17: Calculator expression compilation. [Diagram: source program (string or list of lexemes), scanner (regular grammar), list of tokens, parser (context-free grammar), parse tree, code generator/translator, translated program in assembly code.]

Figure 10.18: Calculator expression compilation. [Diagram: the pipeline of Figure 10.17 annotated with file names: scanner from calc3.l generated as lex.yy.c; parser from calc3.y generated as calc3.tab.c and calc3.tab.h; code generator/translator in compiler.c.]

Figure 10.19. [Diagram: lex builds the scanner from a regular grammar (.l file) and yacc builds the parser from a context-free grammar (.y file); the source program n = x * y + z (mathematical expression) becomes the tokens id1 = id2 * id3 + id4, then a parse tree, and the code generator emits the assembly code load id2, mul id3, add id4, store id1.]

Figure 10.20. [Diagram: the pipeline of Figure 10.19 annotated with file names: lex turns calc3.l (regular grammar) into the scanner lex.yy.c; yacc turns calc3.y (context-free grammar) into the parser calc3.tab.c and calc3.tab.h; the code generator is in compiler.c.]
With these new features, the language understood by the calculator begins to resemble an imperative programming language. These features, which have the same semantics as in C, are expressed in the following context-free grammar in EBNF for calculator programs:
<program> ::= <code>
<code> ::= <code> <stmt> | <stmt>
<stmt> ::= ; | print <expr> ;
<stmt> ::= <variable> = <expr> ;
<stmt> ::= while ( <expr> ) <stmt>
<stmt> ::= if ( <expr> ) <stmt> [ else <stmt> ]
<stmt> ::= { <stmt list> }
<stmt list> ::= <stmt> | <stmt list> <stmt>
<expr> ::= <integer> | <variable> | - <expr>
<expr> ::= <expr> + <expr> | <expr> - <expr>
<expr> ::= <expr> * <expr> | <expr> / <expr>
<expr> ::= <expr> < <expr> | <expr> > <expr>
<expr> ::= <expr> <= <expr> | <expr> >= <expr>
<expr> ::= <expr> == <expr> | <expr> != <expr>
<expr> ::= <expr> ^ <expr> | ( <expr> )
<integer> ::= 1 | 2 | 3 | . . . | ∞
<variable> ::= a | b | c | . . . | z
The following is a calculator program,
x = 10;
while (x >= 1) {
   print x;
   x = x - 1;
}
and its output.
10
9
8
7
6
5
4
3
2
1
Construction of a parse tree requires some preliminary discussion of some constructs and capabilities in C that help facilitate the process.
Listing 10.21: calc3.l.
%{
#include "calc3.h"
#include "calc3.tab.h"
%}

%option yylineno

%%

[a-z]            { /* variables */
                   yylval.environI = *yytext - 'a';
                   return VARIABLE; }

0                { yylval.literal = 0;
                   return INTEGER; }

[1-9][0-9]*      { /* integers */
                   yylval.literal = atoi(yytext);
                   return INTEGER; }

[-^()<>=+*/;{}]  { /* single-character operators returned as themselves */
                   return *yytext; }

">="             { /* other operators returned as tokens */
                   return GE; }

"<="             return LE;
"=="             return EQ;
"!="             return NE;
"while"          return WHILE;
"if"             return IF;
"else"           return ELSE;
"print"          return PRINT;

[ \t\n]          { /* ignore whitespace */ ; }

.                yyerror("Unknown character");

%%

int yywrap(void) {
   return 1;
}
Listing 10.22: calc3.y.
%{
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h> /* provides access to the variable argument macros */
#include "calc3.h"
#define SIZE 26

PTnode* newOperatorNode(int oper, int nops, ...);
PTnode* newLiteralOrVariableNode(int literalOrVariable, PTnodeFlag flag);
void freePTnode(PTnode* nodePtr);
int dfs(PTnode* nodePtr);

void yyerror(char *s);

int environment[SIZE]; /* environment */

extern int yylineno;

%}

/* value stack will be an array of these YYSTYPE's */
%union {
   int literal;      /* literal value */
   char environI;    /* environment index */
   PTnode* nodePtr;  /* node pointer */
};
/* generates the following:

typedef union {
   int literal;
   char environI;
   PTnode *nodePtr;
} YYSTYPE;
extern YYSTYPE yylval;
*/
/* in other words, constants, variables, and nodes can
   be represented by yylval in the parser's value stack */

/* binds INTEGER to literal in the YYSTYPE union */
/* associates token names with correct component of the YYSTYPE union */
/* to generate following code */
/* yylval.nodePtr = newLiteralOrVariableNode(yyvsp[0].literal); */

%token <literal> INTEGER
%token <environI> VARIABLE
%token WHILE IF PRINT
/* binds expr to nodePtr in the YYSTYPE union */
%type <nodePtr> stmt expr stmtlist

%nonassoc IFX
%nonassoc ELSE
%left GE LE EQ NE '>' '<'
%left '+' '-'
%left '*' '/'
%right '^'
%nonassoc UMINUS

%%

program : code { exit(0); }
        ;

code : code stmt { dfs($2); freePTnode($2); }
     | /* NULL */
     ;

stmt : ';'                            { $$ = newOperatorNode(';', 2, NULL, NULL); }
     | expr ';'                       { $$ = $1; }
     | PRINT expr ';'                 { $$ = newOperatorNode(PRINT, 1, $2); }
     | VARIABLE '=' expr ';'          { $$ = newOperatorNode('=', 2,
                                             newLiteralOrVariableNode($1, variableFlag), $3); }
     | WHILE '(' expr ')' stmt        { $$ = newOperatorNode(WHILE, 2, $3, $5); }
     | IF '(' expr ')' stmt %prec IFX { $$ = newOperatorNode(IF, 2, $3, $5); }
     | IF '(' expr ')' stmt ELSE stmt { $$ = newOperatorNode(IF, 3, $3, $5, $7); }
     | '{' stmtlist '}'               { $$ = $2; }
Storing the environment index in a char takes advantage of the fact that ints and chars are both represented internally as integers.
10.4.2 Helpful C Constructs and Capabilities
We construct the parse tree in a bottom-up fashion. This means that we allocate leaf nodes when variables and integers are reduced, and we allocate an internal node when an operator is reduced. An internal node contains the operator, the number of arguments, and pointers to previously allocated nodes which represent its operands. Two issues arise.
1. We have different types of nodes: internal nodes and leaf nodes, each with different storage requirements.
2. We have multiple types of internal, operator nodes: those for unary, binary, and ternary operators.
We use unions in C, together with the control C affords the programmer in laying out memory structures, to help address the heterogeneity of the different types of nodes (i.e., the first issue). We use functions of variable arguments to help allocate and load internal nodes, which have a different number of child pointers depending on the arity of the operator each represents (i.e., the second issue).
Unions
union {
   int i;
   float f;
   char s[16];
} u; /* all three members share the same storage */
Variable Argument Lists
void f(int nargs, ...) {
   /* the declaration ...
      can only appear at the end of an argument list */

   int i, tmp;

   va_list ap; /* argument pointer */

   va_start(ap, nargs); /* initializes ap to point to the
                           first unnamed argument;
                           va_start must be called once
                           before ap can be used */

   for (i=0; i < nargs; i++)
      tmp = va_arg(ap, int); /* returns one argument and
                                steps ap to the next argument */
                             /* the second argument to va_arg
                                must be a type name so that
                                va_arg knows how big a step
                                to take */

   va_end(ap); /* clean-up; must be called before
                  function returns */
}

Figure 10.21: Structures for parse tree nodes in calculator (version 3). [Diagram: struct PTnode holds a PTnodeFlag flag and a union that can be either of two members: int literalOrVariable or OperatorNode operator1. An OperatorNode holds int oper, int nops, and PTnode** operands, a pointer to an array of pointers of type PTnode*.]
10.4.3 Structures for Parse Tree Nodes
Header File
We place the datatype definitions for our parse tree in a header file named calc3.h, which both the scanner and parser specifications include.
10.4.4 Precedence and Associativity in Calculator (version 3)
Figure 10.22: Node type used for literals and variables (i.e., leaf nodes) in calculator (version 3). [Diagram: PTnode* newLiteralOrVariableNode(int literalOrVariable, PTnodeFlag flag) is called when we see a literal or variable; it creates a leaf node in the parse tree, copies the data into the PTnodeFlag flag and int literalOrVariable fields, and returns the node's address (100 in the figure) in PTnode* nodePtr.]

Figure 10.23: Node type used for operators (i.e., internal nodes) in calculator (version 3). [Diagram: PTnode* newOperatorNode(int operatorLiteral, int numOfOperands, ...) is called when we see an operator; it creates an internal node in the parse tree, uses va_list ap to copy the data into the PTnodeFlag flag field and the OperatorNode operator1 field (PTnode** operands), and returns the node's address (100 in the figure) in PTnode* nodePtr.]
/* value stack will be an array of these YYSTYPE's;
   has nothing to do with the union in calc3.h */
%union {
   int literal;      /* integer value */
   char environI;    /* environment index */
   PTnode* nodePtr;  /* node pointer */
};
/* generates the following:

   typedef union {
      int literal;
      char environI;
      PTnode *nodePtr;
   } YYSTYPE;
   extern YYSTYPE yylval;

   in other words, constants, variables, and nodes can
   be represented by yylval in the parser's value stack

   binds INTEGER to literal in the YYSTYPE union
   associates token names with correct component of the
   YYSTYPE union to generate following code
   yylval.nodePtr = newLiteralOrVariableNode(yyvsp[0].literal); */

%token <literal> INTEGER
%token <environI> VARIABLE
%token WHILE IF PRINT
%nonassoc IFX
%nonassoc ELSE

%left GE LE EQ NE '>' '<'
%left '+' '-'
%left '*' '/'
%right '^'
%nonassoc UMINUS

/* binds expr to nodePtr in the YYSTYPE union */
%type <nodePtr> stmt expr stmtlist
10.4.5 Interpreters: Program Evaluators
When the syntax tree is completely built, pass only a pointer to the root node to a function eval which interprets the program and prints any output. The eval function returns an int and conducts a depth-first traversal of the tree. Since the tree is constructed in a bottom-up fashion, the depth-first walk visits nodes in the order in which they were allocated. This approach has the attractive property of applying the operators in the order that they were encountered during parsing or, in other words, according to the rules of precedence.

Figure 10.24: Makefile dependency graph for calculator (version 3). [Diagram: the target all depends on the executables interpreter, compiler, and parsetree, which are built from interpreter.o, compiler.o, parsetree.o, lex.yy.o, and calc.tab.o; these in turn derive from interpreter.c, compiler.c, parsetree.c, lex.yy.c (from calc.l), calc.tab.c and calc.tab.h (from calc.y), and calc.h.]
When eval returns, pass only a pointer to the root node of the syntax tree to a function freeTree which frees each node of the tree.
The Makefile dependency graph for calculator (version 3) is given in Fig. 10.24.
10.4.6 Conceptual Exercises for Section 10.4
Exercise 10.4.1: What is the underlying cause of a shift-reduce conflict?
Exercise 10.4.2: What is the underlying cause of a reduce-reduce conflict?
Exercise 10.4.3: What does bison do when it encounters a shift-reduce conflict?
Exercise 10.4.4: What action does bison take when it encounters a shift-reduce conflict?
Exercise 10.4.5: What does bison do when it encounters a reduce-reduce conflict?
Listing 10.23: Makefile for calculator (version 3).
SRC = calc3
CC = gcc -g
LEX = flex
LEX_FLAGS =
YACC = bison
YACC_FLAGS = -d -t

all: interpreter compiler parsetree
# all: interpreter compiler

interpreter: lex.yy.o $(SRC).tab.o interpreter.o
	$(CC) lex.yy.o $(SRC).tab.o interpreter.o -lm -o interpreter

compiler: lex.yy.o $(SRC).tab.o compiler.o
	$(CC) lex.yy.o $(SRC).tab.o compiler.o -o compiler

parsetree: lex.yy.o $(SRC).tab.o parsetree.o
	$(CC) lex.yy.o $(SRC).tab.o parsetree.o -o parsetree

lex.yy.o: lex.yy.c $(SRC).tab.h $(SRC).h
	$(CC) -c lex.yy.c

lex.yy.c: $(SRC).l
	$(LEX) $(LEX_FLAGS) $(SRC).l

$(SRC).tab.o: $(SRC).tab.c $(SRC).h
	$(CC) -c $(SRC).tab.c

$(SRC).tab.c: $(SRC).y
	$(YACC) $(YACC_FLAGS) $(SRC).y

$(SRC).tab.h: $(SRC).y
	$(YACC) $(YACC_FLAGS) $(SRC).y

interpreter.o: interpreter.c $(SRC).h $(SRC).tab.h
	$(CC) -c interpreter.c

compiler.o: compiler.c $(SRC).h $(SRC).tab.h
	$(CC) -c compiler.c

parsetree.o: parsetree.c $(SRC).h $(SRC).tab.h
	$(CC) -c parsetree.c

clean:
	-rm *.o $(SRC).tab.h $(SRC).tab.c lex.yy.c interpreter compiler parsetree

Exercise 10.4.6: Give a specific example of a shift-reduce conflict. Show the complete grammar, input string, parse stack, and value stack to clearly illustrate the conflict and to convince us that you know what you are talking about.
Exercise 10.4.7: Give a specific example of a shift-reduce conflict. Show a complete BNF grammar, input string, and parse stack to clearly illustrate the conflict. Use . (dot) to denote the top of the stack.
Exercise 10.4.8: Give a specific example of a reduce-reduce conflict. Show the complete grammar, input string, parse stack, and value stack to clearly illustrate the conflict and to convince us that you know what you are talking about.
Exercise 10.4.9: Give a specific example of a reduce-reduce conflict. Show a complete BNF grammar, input string, parse stack, and value stack to clearly illustrate the conflict. Use . (dot) to denote the top of the stack.
Exercise 10.4.10: Consider the following context-free grammar in EBNF. Would this grammar pose a problem for bison, even without directives to disambiguate the grammar? Explain why or why not. Be specific.
<stmt> ::= if <expr> <stmt>
<stmt> ::= if <expr> <stmt> else <stmt>
<stmt> ::= s
<expr> ::= c
Exercise 10.4.11: State whether it is preferable or not to use a left-recursive or right-recursive grammar with bison and why. Explain. Be specific.
Exercise 10.4.12: Consider the following ambiguous context-free grammar in EBNF for the dangling else problem. Does this grammar as is, and without directives to disambiguate the grammar, pose a problem for bison? Explain why or why not. Be specific.
<stmt> ::= if <expr> <stmt> | <matched stmt>
<matched stmt> ::= if <expr> <matched stmt> else <stmt>
<matched stmt> ::= <other>
where the non-terminal <other> generates some non-if statement such as a print statement.
Exercise 10.4.13: In version 1 of the calculator, why is the string print -4 - 5 parsed as a sentence, if the unary minus operator has the highest precedence?
Exercise 10.4.14: In version 2 of the calculator, will the rule '-' expr %prec '^' { $$ = $2*-1; } interfere with parsing the string print 2 ^ -3;?
Exercise 10.4.15: In version 2 of the calculator, what is the difference between '-' expr %prec '^' { $$ = $2*-1; } and '-' expr %prec UMINUS { $$ = $2*-1; }?
10.4.7 Programming Exercises for Section 10.4
Exercise 10.4.16: Consider the following context-free grammar defined in EBNF (from [Lou02]):
<expr> ::= ( <list> ) | a
<list> ::= <expr> [<list>]
where <expr> and <list> are non-terminals and a, (, and ) are terminals.
Automatically generate a shift-reduce, bottom-up parser by defining a flex and a bison specification of a parser for the language defined by this grammar. The parser must accept strings from standard input (one per line) until EOF and determine whether or not each string is in the language defined by this grammar. Thus, it might help to think of defining this language using the following context-free grammar in EBNF:
<sentence> ::= <sentence> <expr> \n | <expr> \n
<expr> ::= (<list>) | a
<list> ::= <expr> | <expr> <list>
where <sentence>, <expr>, and <list> are non-terminals and a, (, ), and \n are terminals.
Factor your program into a scanner (lexical analyzer) and shift-reduce parser (syntactic analyzer) as shown in Figs. 10.3 and 10.5.
You may not assume that each lexeme will be valid and separated by exactly one space, or that each line will contain no leading or trailing whitespace. There are two distinct error conditions that your program must recognize. First, if a given string does not consist of valid lexemes, then respond with this message: "..." contains invalid lexemes and, thus, is not a sentence. Second, if a given string consists of valid lexemes but it is not a sentence according to the grammar, then respond with the message: "..." is not a sentence. Note that the "invalid lexemes" message takes priority over the "not a sentence" message (i.e., the "not a sentence" message can only be issued if the input string consists entirely of valid lexemes).
You may assume that whitespace is ignored, that no line of input will exceed 4,096 characters, that each line of input will end with a newline, and that no string will contain more than 200 lexemes.
Print only one line of output to standard output per line of input, and do not prompt for input. The following is a sample interactive session with the parser (> is simply the prompt for input and will be the empty string in your system):
> ( a)
"( a )" is a sentence.
> a
"a" is a sentence.
> ( ( ( a a ) ) )
"( ( ( a a ) ) )" is a sentence.
> ( a ) )
"( a ) )" is not a sentence.
> ,(a)
",(a)" contains invalid lexemes and, thus, is not a sentence.
> (( (a a ) ))
"( ( ( a a ) ) )" is a sentence.
> ( a ( a ) ) )
"( a ( a ) ) )" is not a sentence.
> (( a ) 1 )
"(( a ) 1 )" contains invalid lexemes and, thus, is not a sentence.
> (a(a))
"( a ( a ) )" is a sentence.
> ( ( a ) )
"( ( a ) )" is a sentence.
> ( )
"( )" is not a sentence.
> (
"(" is not a sentence.
You may assume the following code in your bison specification, though you must replace each ... with one line of code:
sentence : sentence expr '\n' { printf("\"%s\" is a sentence.\n",
                                       temp);
                                ... }
         | error '\n'         { printf("\"%s\" is not a sentence.\n",
                                       temp);
                                ...
                                yyclearin; /* discard lookahead */
                                yyerrok; }
         |
         ;
/* bison specification file parser.y */
Also write a Makefile which builds your parser. Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code generation and compilation). Make sure that each directive also lists all files on which the derived file depends in its dependency list. Also, your Makefile must be written to carry out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring the final executable for your parser up-to-date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate to improve the readability of your Makefile. Your Makefile must bring everything up-to-date, using only flex, bison, and gcc, without any warnings or errors, when make is invoked.
Exercise 10.4.17: Consider the following context-free grammar defined in EBNF:
<P> ::= () | (<P>) | ()(<P>) | (<P>)<P>
where <P> is a non-terminal and ( and ) are terminals.
Complete Programming Exercise 10.4.16 using this grammar subject to all of the requirements given in that exercise.
The following is a sample interactive session with the parser:
> ()
"()" is a sentence.
> ()()
"()()" is a sentence.
> (())
"(())" is a sentence.
> (()())()
"(()())()" is a sentence.
> ((()())())
"((()())())" is a sentence.
> (a)
"(a)" contains invalid lexemes and, thus, is not a sentence.
> )(
")(" is not a sentence.
> )()
")()" is not a sentence.
> )()(
")()(" is not a sentence.
> (()()
"(()()" is not a sentence.
> ())((
"())((" is not a sentence.
> ((()())
"((()())" is not a sentence.
Exercise 10.4.18: Consider the following context-free grammar defined in EBNF from § 10.3.3:
<program> ::= <program> <expr> \n | <expr> \n
<expr> ::= <expr> + <expr>
<expr> ::= <expr> * <expr>
<expr> ::= − <expr>
<expr> ::= <integer>
<integer> ::= 1 | 2 | 3 | . . . | ∞
where <expr> and <integer> are non-terminals and +, *, −, and 1, 2, 3, . . . are terminals.
Use flex and bison to build a C program which reads sentences in the language defined by this grammar from standard input (one per line) until EOF and writes each expression evaluated and decorated with parentheses to indicate the order of operator application to standard output (using the format below, one per line). Normal precedence rules hold: − has the highest, * has the second highest, and + has the lowest. Assume left-to-right associativity. The following is sample input and output for the expression evaluator (> is simply the prompt for input and will be the empty string in your system):
> 2+3*4
(2+(3*4)) = 14
> 2+3*-4
(2+(3*(-4))) = -10
> -2*3+4
(((-2)*3)+4) = -2
Do not build a parse tree to solve this problem.
Hint: Use an array implementation of a stack which contains elements of type char*. Also, use the sprintf function to convert an integer to a string. For example,
char *string_representation_of_an_integer =
   malloc(10 * sizeof(*string_representation_of_an_integer));

/* prints the integer 789 to
   the string variable string_representation_of_an_integer */
sprintf(string_representation_of_an_integer, "%d", 789);

/* next line prints the integer 789 to stdout */
printf("%s", string_representation_of_an_integer);
You must explicitly deallocate any memory you explicitly allocate (i.e., your program must not have any memory leaks).
Write a Makefile which builds your expression evaluator. Your Makefile must include target directives for every derived file produced during the compilation process (i.e., each program, each object file, and any other intermediate files produced during code generation and compilation). Make sure that each directive also lists all files on which the derived file depends in its dependency list. Also, your Makefile must be written to carry out only the commands necessary to bring any produced file up-to-date. Your Makefile must do just enough, but no extra, work to bring the final executable for your evaluator up-to-date every time make is invoked. In addition, it must have an all directive and a clean directive to remove all generated files. Use variables where appropriate to improve the readability of your Makefile. Your Makefile must bring everything up-to-date, using only flex, bison, and gcc, without any warnings or errors, when make is invoked.
Exercise 10.4.19: Build a parser to determine the order in which operators of a logical expression are evaluated. Expressions are defined by the following context-free grammar in BNF (not EBNF):
<expr> ::= <expr> & <expr>
<expr> ::= <expr> | <expr>
<expr> ::= ∼ <expr>
<expr> ::= <literal>
<literal> ::= t
<literal> ::= f
where t, f, |, &, and ∼ are terminals which represent true, false, or, and, and not, respectively. The following is sample input and output for the expression evaluator (> is simply the prompt for input and will be the empty string in your system).
> f | t & f | ~t
((f | (t & f)) | (~t)) is false.
> ~t | t | ~f & ~f & t & ~t | f
((((~t) | t) | ((((~f) & (~f)) & t) & (~t))) | f) is true.
Notice that you must decorate the parsed expression with parentheses to indicate the order of operator-execution as well as evaluate it. Normal precedence rules hold: ∼ has the highest, & has the second highest, and | has the lowest. Assume left-to-right associativity.
Requirements:
a) Your program must read from standard input and write to standard output. Specifically, your program must read a set of expressions from standard input (one per line) and write the corresponding parenthesized expressions (also one per line, in the format used above) to standard output.
b) Write a Makefile as indicated in Programming Exercise 10.4.18.
Exercise 10.4.20: Add a do {...} while (...); loop to the calculator (version 3).
Exercise 10.4.21: Re-instrument version 3 of the calculator so that the integer representing a literal or variable in the PTnode type is wrapped in a struct called LiteralOrVariableNode. Call this approach version 4.
[Diagram: memory layout for version 4. struct PTnode holds a PTnodeFlag flag and a union that can be either of two members: LiteralOrVariableNode literalOrVariable (a struct wrapping int literalOrVariable) or OperatorNode operator1 (int oper, int nops, and PTnode** operands, a pointer to an array of pointers of type PTnode*).]

[Diagram: PTnode* newLiteralOrVariableNode(int literalOrVariable, PTnodeFlag flag) is called when we see a literal or variable; it creates a leaf node in the parse tree, copies the data into the PTnodeFlag flag and LiteralOrVariableNode literalOrVariable fields, and returns the node's address (100 in the figure) in PTnode* nodePtr.]
Exercise 10.4.22: Re-instrument version 4 of the calculator created in Programming Exercise 10.4.21 to factor the LiteralOrVariableNode struct into a LiteralNode struct and a VariableNode struct. Similarly, factor the newLiteralOrVariableNode function into newLiteralNode and newVariableNode functions. Call this approach version 5.
[Diagram: memory layout for version 5. struct PTnode holds a PTnodeFlag flag and a union (called a "variant record") that can be any one of three members: LiteralNode literal (int literal), VariableNode variable (int variable), or OperatorNode operator1 (int oper, int nops, and PTnode** operands, a pointer to an array of pointers of type PTnode*).]

[Diagram: PTnode* newLiteralNode(int literal, PTnodeFlag flag) is called when we see a literal; it creates a leaf node in the parse tree, copies the data into the PTnodeFlag flag and LiteralNode literal fields, and returns the node's address (100 in the figure) in PTnode* nodePtr.]

[Diagram: PTnode* newVariableNode(int variable, PTnodeFlag flag) is called when we see a variable; it creates a leaf node in the parse tree, copies the data into the PTnodeFlag flag and VariableNode variable fields, and returns the node's address (100 in the figure) in PTnode* nodePtr.]
Exercise 10.4.23: Re-instrument version 3 of the calculator to use a different design for the OperatorNode struct. Specifically, instead of a pointer to an array of type PTnode*, make the operands field of the OperatorNode struct be an array of size one of pointers of type PTnode* (as shown below) and dynamically expand it as needed in the newOperatorNode function. Call this approach version 6.
[Diagram: memory layout for version 6. struct PTnode holds a PTnodeFlag flag and a union that can be either of two members: int literalOrVariable or OperatorNode operator1. An OperatorNode holds int oper, int nops, and the expandable array PTnode* operands[1].]

[Diagram: PTnode* newOperatorNode(int operatorLiteral, int numOfOperands, ...) is called when we see an operator; it creates an internal node in the parse tree, uses va_list ap to fill the PTnodeFlag flag field and the OperatorNode operator1 field, including the expandable array PTnode* operands[1], and returns the node's address (100 in the figure) in PTnode* nodePtr.]
Would this approach work if the union was the first field of the PTnode struct rather than the PTnodeFlag enum? Explain.
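One way to picture the expandable operands array is the following sketch. It assumes hypothetical enumerator names, a union member named u, and a helper newLeafNode that is not part of the exercise. Note that storing past operands[1] relies on the long-standing over-allocation idiom (sometimes called the "struct hack"), which common compilers accept but which is not strictly conforming C.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { LEAF_NODE, OPERATOR_NODE } PTnodeFlag;   /* names assumed */

typedef struct PTnode PTnode;

typedef struct {
   int oper;
   int nops;
   PTnode* operands[1];   /* declared with one slot; nops slots are used */
} OperatorNode;

struct PTnode {
   PTnodeFlag flag;
   union {
      int literalOrVariable;
      OperatorNode operator1;
   } u;                   /* the member name u is an assumption */
};

/* Hypothetical helper so the sketch has leaves to attach. */
PTnode* newLeafNode(int literalOrVariable) {
   PTnode* nodePtr = malloc(sizeof(*nodePtr));

   if (nodePtr != NULL) {
      nodePtr->flag = LEAF_NODE;
      nodePtr->u.literalOrVariable = literalOrVariable;
   }
   return nodePtr;
}

/* Called when we see an operator; creates an internal node. The variable
   arguments must be numOfOperands values of type PTnode*. */
PTnode* newOperatorNode(int operatorLiteral, int numOfOperands, ...) {
   va_list ap;
   PTnode* nodePtr = NULL;
   PTnode** slots = NULL;
   size_t extra = 0;
   int i = 0;

   assert(numOfOperands >= 1);
   /* sizeof(PTnode) already includes one slot; add room for the rest */
   extra = (size_t) (numOfOperands - 1) * sizeof(PTnode*);
   if ((nodePtr = malloc(sizeof(*nodePtr) + extra)) == NULL)
      return NULL;

   nodePtr->flag = OPERATOR_NODE;
   nodePtr->u.operator1.oper = operatorLiteral;
   nodePtr->u.operator1.nops = numOfOperands;

   slots = nodePtr->u.operator1.operands;
   va_start(ap, numOfOperands);
   for (i = 0; i < numOfOperands; i++)
      slots[i] = va_arg(ap, PTnode*);
   va_end(ap);
   return nodePtr;
}
```

Because the node and its operand slots come from a single call to malloc, a single call to free releases both.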
Exercise 10.4.24: Re-instrument version 4 of the calculator (i.e., Programming Exercise 10.4.21) to use the memory design of version 6 (i.e., Programming Exercise 10.4.23). Call this approach version 7.
Exercise 10.4.25: Re-instrument version 5 of the calculator (i.e., Programming Exercise 10.4.22) to use the memory design of version 6 (i.e., Programming Exercise 10.4.23). Call this approach version 8.
Exercise 10.4.26: Re-instrument version 7 of the calculator (i.e., Programming Exercise 10.4.24) to use the memory design depicted below, where the PTnode type is a union of structs rather than a struct containing a union. Call this approach version 9 (a memory overlay approach).
[Figure: memory layout for version 9. PTnode is a union of structs that can be any one of three members: PTnodeFlag flag, LiteralOrVariableNode literalOrVariable, or OperatorNode operator1. A LiteralOrVariableNode holds nodeFlag flag and int literalOrVariable; an OperatorNode holds nodeFlag flag, int oper, int nops, and the expandable array PTnode* operands[1].]
Would this approach work if the nodeFlag enum type was not a member of both the LiteralOrVariableNode and OperatorNode struct types, in addition to being a member of the PTnode union type? Explain. Would this approach work if the PTnodeFlag enum was the last member of the PTnode union? Explain.
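The overlay design can be sketched as follows, as an aid to reading the figure rather than as a solution. The enumerator names and the helpers newLeafNode and nodeKind are hypothetical, and the sketch uses the single type name PTnodeFlag for every flag field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { LEAF_NODE, OPERATOR_NODE } PTnodeFlag;   /* names assumed */

typedef union PTnode PTnode;

typedef struct {
   PTnodeFlag flag;          /* first member of every struct in the union */
   int literalOrVariable;
} LiteralOrVariableNode;

typedef struct {
   PTnodeFlag flag;
   int oper;
   int nops;
   PTnode* operands[1];      /* expandable, as in version 6 */
} OperatorNode;

/* Every member of a union starts at the same address, so the three flag
   fields below overlay one another. */
union PTnode {
   PTnodeFlag flag;
   LiteralOrVariableNode literalOrVariable;
   OperatorNode operator1;
};

/* Hypothetical helper: creates a leaf through the struct member. */
PTnode* newLeafNode(int literalOrVariable) {
   PTnode* nodePtr = malloc(sizeof(*nodePtr));

   if (nodePtr != NULL) {
      nodePtr->literalOrVariable.flag = LEAF_NODE;
      nodePtr->literalOrVariable.literalOrVariable = literalOrVariable;
   }
   return nodePtr;
}

/* Reports the kind of node before knowing which struct is stored in it. */
PTnodeFlag nodeKind(const PTnode* nodePtr) {
   assert(nodePtr != NULL);
   return nodePtr->flag;
}
```

Code that walks the tree can call nodeKind first and only then decide which struct member to read.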
Exercise 10.4.27: Re-instrument version 8 of the calculator (i.e., Programming Exercise 10.4.25) to use the memory design depicted in version 9 (i.e., Programming Exercise 10.4.26). Call this approach version 10.
[Figure: memory layout for version 10. PTnode is a union of structs that can be any one of four members: PTnodeFlag flag, OperatorNode operator1, VariableNode variable, or LiteralNode literal. An OperatorNode holds nodeFlag flag, int oper, int nops, and the expandable array PTnode* operands[1]; a LiteralNode holds PTnodeFlag flag and int literal; a VariableNode holds PTnodeFlag flag and int variable.]
Exercise 10.4.28:
Exercise 10.4.29: Build a graphical user interface in Qt, akin to that shown below, for the interpreter/compiler developed in Programming Project 10.5. See http://hipersayanx.blogspot.com/2013/03/using-flex-and-bison-with-qt.html for help on using flex and bison with Qt.
10.5 Programming Project for Chapter 10
Putting It All Together
Build an interpreter and a compiler to C++ for the language BOOLexp. BOOLexp programs are defined by the following context-free grammar in BNF (not EBNF):
<sentence>     ::= ( <declarations> , <expr> )
<declarations> ::= []
<declarations> ::= [ <varlist> ]
<varlist>      ::= <var>
<varlist>      ::= <var> , <varlist>
<expr>         ::= <expr> & <expr>
<expr>         ::= <expr> | <expr>
<expr>         ::= ~ <expr>
<expr>         ::= <literal>
<expr>         ::= <var>
<literal>      ::= t
<literal>      ::= f
<var>          ::= a ... e
<var>          ::= g ... s
<var>          ::= u ... z
where t, f, |, &, and ~ are terminals that represent true, false, or, and, and not, respectively, and all lowercase letters except f and t are terminals, each representing a variable. Each variable in the variable list is bound to true in the expression. Any variable used in the expression that is not contained in the variable list is assumed to be false.
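The binding rule can be pictured with a small lookup table indexed by letter. The sketch below is one possible representation, not a required one: the names NUM_LETTERS, bindVariables, and lookupVariable are hypothetical, and the code assumes a character set such as ASCII in which the lowercase letters are contiguous.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_LETTERS 26

/* Clears the table, then binds to true each variable named in declared
   (e.g., "pq" for the declaration list [p,q]). The literals t and f are
   not variables and are skipped. */
void bindVariables(bool env[NUM_LETTERS], const char* declared) {
   const char* p = NULL;

   assert(env != NULL && declared != NULL);
   memset(env, 0, NUM_LETTERS * sizeof(env[0]));
   for (p = declared; *p != '\0'; p++)
      if (*p >= 'a' && *p <= 'z' && *p != 't' && *p != 'f')
         env[*p - 'a'] = true;
}

/* Any variable that was not declared reads as false. */
bool lookupVariable(const bool env[NUM_LETTERS], char name) {
   assert(name >= 'a' && name <= 'z');
   return env[name - 'a'];
}
```

With the declaration list [p,q], looking up p or q yields true, while looking up e or r yields false.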
Factor your system into the following three components:
• Front End (i.e., a shift-reduce parser, automatically generated with flex and bison, which produces a parse tree)
• Interpreter (i.e., expression evaluator)
• Compiler (i.e., translator)
The general approach to this problem is to build a parse tree for each sentence and then implement two traversals of the tree: one traversal evaluates the expression as it walks the tree (the interpreter component) and the other generates C++ code as it walks the tree (the compiler component).
The following is sample input and output for the interpreter (i.e., expression evaluator) (> is simply the prompt for input and will be the empty string in your system).
> ([], f | t & f | ~t)
((f | (t & f)) | (~t)) is false.
> ([p,q], ~t | p | ~e & ~f & t & ~q | r)
((((~t) | p) | ((((~e) & (~f)) & t) & (~q))) | r) is true.
Notice that when interpreting a BOOLexp program you must not only evaluate the logical expression (the second element of the program pair) but also determine the order in which its operators are evaluated and illustrate that order in the diagrammed output. Normal precedence rules hold: ~ has the highest precedence, & has the second highest, and | has the lowest. Assume left-to-right associativity. When compiling a BOOLexp program to C++ you must generate a C++ program with the same semantics as the BOOLexp program.
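To make the two outputs concrete, the sketch below evaluates a parse tree and produces its fully parenthesized diagram. It is an illustration under simplifying assumptions: the Node type is a deliberately reduced stand-in for the PTnode designs of Section 10.4, the names evaluate and diagram are hypothetical, and the tree is assumed to have been built already by the parser, with precedence and associativity resolved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A simplified node (an assumption): op is '&', '|', or '~' for operators
   and 0 for leaves. */
typedef struct Node {
   char op;
   char leaf;                  /* 't', 'f', or a variable name */
   const struct Node* left;    /* the only operand of '~' */
   const struct Node* right;   /* NULL for '~' and for leaves */
} Node;

/* Interpreter traversal: env[i] is true if letter 'a' + i is bound. */
bool evaluate(const Node* n, const bool env[26]) {
   assert(n != NULL);
   switch (n->op) {
      case '&':
         return evaluate(n->left, env) && evaluate(n->right, env);
      case '|':
         return evaluate(n->left, env) || evaluate(n->right, env);
      case '~':
         return !evaluate(n->left, env);
      default:
         if (n->leaf == 't')
            return true;
         if (n->leaf == 'f')
            return false;
         return env[n->leaf - 'a'];
   }
}

/* Appends the parenthesized form of n to the string out (capacity size). */
void diagram(const Node* n, char* out, size_t size) {
   size_t used = strlen(out);

   assert(n != NULL);
   if (n->op == 0) {
      snprintf(out + used, size - used, "%c", n->leaf);
   } else if (n->op == '~') {
      snprintf(out + used, size - used, "(~");
      diagram(n->left, out, size);
      used = strlen(out);
      snprintf(out + used, size - used, ")");
   } else {
      snprintf(out + used, size - used, "(");
      diagram(n->left, out, size);
      used = strlen(out);
      snprintf(out + used, size - used, " %c ", n->op);
      diagram(n->right, out, size);
      used = strlen(out);
      snprintf(out + used, size - used, ")");
   }
}
```

In bison, precedence and associativity of this kind are usually declared with %left and %right directives, so that the tree handed to these traversals already reflects them.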
Requirements:
a) Use flex and bison to develop the front end of your system (i.e., scanner and parser, respectively).
b) Implement a -i option indicating to only interpret and a -c option indicating to only compile. If no command line options are given, then interpret and compile. Alternatively, generate two separate executables: one for the interpreter and one for the compiler. Only the first approach is demonstrated below.
c) Your program must read from standard input and write to standard output. Specifically, your program must read a set of expressions from standard input (one per line) and write the corresponding parenthesized expressions (also one per line, in the format used above) to standard output. When compiling, the compiled programs are written to files, rather than standard output.
d) Free all memory that you explicitly allocated from the heap. Specifically, free the entire parse tree, which means you must free each node, and for internal (operator) nodes you must free the buffer which stores the pointers to its children, if used.
e) The C++ programs you compile to must compile with g++ without errors or warnings.
f) Write a Makefile that builds your system (interpreter and compiler) as indicated in Programming Exercise 10.5.18.
Sample Test Data
Sample standard input is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/boolexpstdin.txt and sample standard output is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/boolexpstdout.txt. A sample test session with boolexp on that data is available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/boolexptestsession.txt. These test cases are not exhaustive. There is also a reference boolexp executable solution for this system available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/boolexp. This sample test data with the reference executable is bundled and available at http://perugini.cps.udayton.edu/teaching/books/SPUC/www/files/boolexpdata.tar.
The following is sample input and output for the interpreter (only) (> is simply the prompt for input and will be the empty string in your system).
$ ./boolexp -i
> ([] , f | t & f | ~ t)
((f | (t & f)) | (~t)) is false.
> ([p], f | t & f | ~p)
((f | (t & f)) | (~p)) is false.
> ([] , f | t | f & t | f | t & t & t | ~ t)
(((((f | t) | (f & t)) | f) | ((t & t) & t)) | (~t)) is true.
> ([p, q], ~t | p | ~e & ~f & t & ~q | r)
((((~t) | p) | ((((~e) & (~f)) & t) & (~q))) | r) is true.
> ([] , t & f & t | ~ t & ~ f & ~ f | f & t & ~ t)
((((t & f) & t) | (((~t) & (~f)) & (~f))) | ((f & t) & (~t))) is false.
> ([] , t & f | t & f | t & f | f & ~ t | f)
(((((t & f) | (t & f)) | (t & f)) | (f & (~t))) | f) is false.
> ([], t & t & ~ f | f & ~ t | ~ t & f)
((((t & t) & (~f)) | (f & (~t))) | ((~t) & f)) is true.
> ([ ], t & t | ~ f & ~ f | t & f | ~ t)
((((t & t) | ((~f) & (~f))) | (t & f)) | (~t)) is true.
> ([a,b,c], a & ~ f & ~ f & b | ~ t | c)
(((((a & (~f)) & (~f)) & b) | (~t)) | c) is true.
> ([], t & ~ f & ~ t | ~ f & ~ t & t)
(((t & (~f)) & (~t)) | (((~f) & (~t)) & t)) is false.
> ([], t & ~ f | t & ~ f)
((t & (~f)) | (t & (~f))) is true.
> ([], t | f | t & f | t | ~ t & t | f)
(((((t | f) | (t & f)) | t) | ((~t) & t)) | f) is true.
> ([], ~ f & t & ~ t | ~ f | t & ~ f)
(((((~f) & t) & (~t)) | (~f)) | (t & (~f))) is true.
> ([],~ t | ~ f | ~ t & ~ f & f & ~ t)
(((~t) | (~f)) | ((((~t) & (~f)) & f) & (~t))) is true.
> ([x,y], ~x | t | ~z & ~f & y & ~y | f)
((((~x) | t) | ((((~z) & (~f)) & y) & (~y))) | f) is true.
> ([],~t|~f&~t|~t&~f|~t&~t)
((((~t) | ((~f) & (~t))) | ((~t) & (~f))) | ((~t) & (~t))) is false.
> ^D
$
The following is a sample interactive test session for the system (interpreter and compiler):
$ ./boolexp
> ([p, q], ~t | p | ~e & ~f & t & ~q | r)
((((~t) | p) | ((((~e) & (~f)) & t) & (~q))) | r) is true.
> ([] , t & f & t | ~ t & ~ f & ~ f | f & t & ~ t)
((((t & f) & t) | (((~t) & (~f)) & (~f))) | ((f & t) & (~t))) is false.
> ([] , t & f | t & f | t & f | f & ~ t | f)
(((((t & f) | (t & f)) | (t & f)) | (f & (~t))) | f) is false.
> ([], t & t & ~ f | f & ~ t | ~ t & f)
((((t & t) & (~f)) | (f & (~t))) | ((~t) & f)) is true.
^D
$
$ cat 1.cpp
#include<iostream>
using namespace std;
int main() {
   bool p = true;
   bool q = true;
   bool e = false;
   bool r = false;
   bool result = !true || p || !e && !false && true && !q || r;
   cout << "The result is ";
   if (result)
      cout << "true";
   else
      cout << "false";
   cout << "." << endl;
}
$
$ cat 4.cpp
#include<iostream>
using namespace std;
int main() {
   bool result = true && true && !false || false && !true || !true && false;
   cout << "The result is ";
   if (result)
      cout << "true";
   else
      cout << "false";
   cout << "." << endl;
}
$
$ ./boolexp
> ([ ], t & t | ~ f & ~ f | t & f | ~ t)
((((t & t) | ((~f) & (~f))) | (t & f)) | (~t)) is true.
> ([a,b,c], a & ~ f & ~ f & b | ~ t | c)
(((((a & (~f)) & (~f)) & b) | (~t)) | c) is true.
> ([], t & ~ f & ~ t | ~ f & ~ t & t)
(((t & (~f)) & (~t)) | (((~f) & (~t)) & t)) is false.
> ([], t & ~ f | t & ~ f)
((t & (~f)) | (t & (~f))) is true.
> ([], t | f | t & f | t | ~ t & t | f)
(((((t | f) | (t & f)) | t) | ((~t) & t)) | f) is true.
> ([], ~ f & t & ~ t | ~ f | t & ~ f)
(((((~f) & t) & (~t)) | (~f)) | (t & (~f))) is true.
> ([],~ t | ~ f | ~ t & ~ f & f & ~ t)
(((~t) | (~f)) | ((((~t) & (~f)) & f) & (~t))) is true.
^D
$
$ ./boolexp -c
> ([x,y], ~x | t | ~z & ~f & y & ~y | f)
$
$ ./boolexp -ci
> ([],~t|~f&~t|~t&~f|~t&~t)
((((~t) | ((~f) & (~t))) | ((~t) & (~f))) | ((~t) & (~t))) is false.
^D
$
$ cat 1.cpp
#include<iostream>
using namespace std;
int main() {
   bool result = !true || !false && !true || !true && !false || !true && !true;
   cout << "The result is ";
   if (result)
      cout << "true";
   else
      cout << "false";
   cout << "." << endl;
}
10.6 Thematic Take-Aways
10.7 Chapter Summary
10.8 Key Terms
10.9 Bibliographic Notes
Bibliography
[AS96] H. Abelson and G.J. Sussman. Structure and Interpretation of Computer Programs. MIT Press, second edition, 1996.
[ATT] UNIX System Calls and Libraries.
[BE75] F.L. Bauer and J. Eickel. Compiler Construction: An Advanced Course. Springer-Verlag, New York, NY, 1975.
[C] C Language for Experienced Programmers.
[KP84] B.W. Kernighan and R. Pike. The UNIX Programming Environment. Prentice Hall, second edition, 1984.
[KR88] B.W. Kernighan and D.M. Ritchie. The C Programming Language. Prentice Hall, second edition, 1988.
[Lou02] K.C. Louden. Programming Languages: Principles and Practice. Brooks/Cole, Pacific Grove, CA, second edition, 2002.
[Nie] T. Niemann. Lex and Yacc Tutorial. ePaperPress. http://epaperpress.com/lexandyacc/.
[Rob99] A. Robbins. UNIX in a Nutshell. O’Reilly, Beijing, third edition, 1999.
[RR03] K.A. Robbins and S. Robbins. UNIX Systems Programming: Communication, Concurrency, and Threads. Prentice Hall, second edition, 2003.
[SG] Silberschatz and Galvin. Operating Systems Concepts. Addison-Wesley, fourth edition.
[SGG07] A. Silberschatz, P.B. Galvin, and G. Gagne. Operating Systems Concepts with Java. John Wiley and Sons, Inc., seventh edition, 2007.
Appendix A
Programming Style Guide
It has been said that
Programs must be written for people to read, and only incidentally for machines to execute [AS96].
Therefore, as discussed in class, it is important to follow some basic guidelines for writing source code. Follow the guidelines below for all programming assignments. Note: we may evolve this set of guidelines as the course progresses.
Remember, assignments provide you with an opportunity to show us that you care enough to submit professionally prepared work. Practice good programming habits early and you will be rewarded with effective and efficient programs. Following this guide will improve the readability, writability, and maintainability of your programs and therefore reduce the likelihood of costly errors, which will save you time in debugging. A portion of your grade for all work will be based on style.
• Source code files must be readable by vi and contain only UNIX newlines (line feeds only). In other words, source code files must not contain non-UNIX newlines (carriage return and line feed pairs, which show up in vi as ^M characters).
• Assignments must be prepared exclusively using UNIX systems.
• Begin each source file with the following header, filled in appropriately.
/*******************************************************************************
/
/      filename:  env.c
/
/   description:  Implements the UNIX env utility.
/
/        author:  Last, First
/      login id:  cps444-n1.xx
/
/         class:  CPS 444
/    instructor:  Perugini
/    assignment:  Homework #1
/
/      assigned:  January 18, 2006
/           due:  January 25, 2006
/
/******************************************************************************/
• Begin each shell script file with the following header, filled in appropriately.
################################################################################
#
#      filename:  filter
#
#   description:  Implements a filter script.
#
#        author:  Last, First
#      login id:  cps444-n1.xx
#
#         class:  CPS 444
#    instructor:  Perugini
#    assignment:  Homework #1
#
#      assigned:  January 18, 2006
#           due:  January 25, 2006
#
################################################################################
• Do not allow any line of code to exceed 80 characters in length. Most text editors have an option to show the column position. Find an appropriate place to break long program statements and continue them on the following line. Break long character strings using string concatenation.
• Indent all code within a block.
• Do not use tabs anywhere in your code. For each level of indentation, use three spaces. Tabs cause different amounts of horizontal spacing on different systems. By using spaces (and a fixed-width font), you guarantee your code will be properly indented on every system, editor, and printout.
• Align corresponding opening and closing braces, begins and ends, or any other program unit delimiters. My preference for curly braces { } (or similar delimiters) is to always place the opening brace on the same line as the block it opens. This makes it easy to see where blocks of code, such as loops, begin and end, and does not waste a line of code. An alternate style is to place each brace on a line by itself. You may use either of these styles, but do not mix them. Always be consistent. Investigate the use of the UNIX utility indent. You can define your own indent profile, named .indent.pro, and place it in your home directory. Running indent on your source code files using this profile is an easy way to ensure consistency in your coding conventions.
• Use descriptive (variable, constant, procedure, function) identifiers and use appropriate naming conventions for variables (total_sold) and constants (OUNCES_PER_TON). Remember, syntax should imply semantics.
Cryptic: int x, y, z
Descriptive: int dollars, average, weight
• Initialize variables (to a value of the appropriate type) before you use them to avoid garbage. This can be done when you declare the variable or with an assignment statement before the variable is used.
Incorrect: double radius = 3;
Correct: double radius = 3.0;
• Avoid type mismatches. Following this guideline will make your programs more portable.
Consider the following mismatch (getchar() returns an int, so storing its result in a char can make the comparison with EOF misbehave):
char c;
while ((c = getchar()) != EOF) {
   ...
}
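A version of the loop without the mismatch declares the variable as an int. The sketch below uses a hypothetical function countChars and reads from an arbitrary stream with getc so that it can be exercised on a file as well as on standard input.

```c
#include <assert.h>
#include <stdio.h>

/* Counts the characters remaining on stream. The variable c is an int so
   that every character value and EOF remain distinguishable. */
long countChars(FILE* stream) {
   int c = 0;
   long count = 0;

   assert(stream != NULL);
   while ((c = getc(stream)) != EOF)
      count++;
   return count;
}
```

Had c been declared as a char, a byte with the value 0xFF could compare equal to EOF on systems where char is signed, ending the loop early; on systems where char is unsigned, the loop would never end.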
• Do not assign a variable or literal of one type to a variable of another, even if our compilers/interpreters permit it. Following this guideline will make your programs more portable.
Consider: double avg_score = 76.7; int exam1 = 86;
Incorrect: avg_score = exam1;
Correct: avg_score = static_cast<double>(exam1);
Consider: double average = 0.0; int total = 967, num_students = 10;
Incorrect: average = total/num_students;
Correct: average = static_cast<double>(total)/num_students;
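The examples above use the C++ static_cast operator; in C the same conversion is written with a cast expression such as (double) total. The following sketch, with hypothetical function names, contrasts the two results for the values used above.

```c
#include <assert.h>

/* Converts total before dividing, so the division is done in double. */
double average(int total, int count) {
   assert(count != 0);
   return (double) total / count;
}

/* Divides the ints first; the fraction is lost before the conversion. */
double truncatedAverage(int total, int count) {
   assert(count != 0);
   return (double) (total / count);
}
```

With total = 967 and count = 10, average yields approximately 96.7, while truncatedAverage yields 96.0.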
• Avoid the use of goto unless necessary.
• Avoid the use of global variables.
• Use comments to explain critical subsections or any ambiguous parts of your programs (e.g., a cryptic or obfuscated expression).
• Use named constants rather than magic numbers, and use the #define preprocessor directive to create named constants. This gives you a single point of modification, which will save you time and reduce bugs. In C, unlike C++, a const-qualified variable is not a constant expression and therefore cannot be used as a case label; a constant created with #define can.
Original:
const int SIZE = 76;
const int NUM_OF_RECORDS = 101;
const double RATE = 3.188;

const char JOB_ARRIVAL = 'A';
const char IO = 'I';
const char JOB_TERMINATION = 'T';

switch (event) {
   case JOB_ARRIVAL:
   case IO:
   case JOB_TERMINATION:
}
Recommended:
#define SIZE 76
#define NUMBER_OF_RECORDS 101
#define RATE 3.188

#define JOB_ARRIVAL 'A'
#define IO 'I'
#define JOB_TERMINATION 'T'

switch (event) {
   case JOB_ARRIVAL:
   case IO:
   case JOB_TERMINATION:
}
• Always use enumerated types where they make your code more readable.
example:
typedef enum {
   JAN = 1, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC
} months;

int main() {
   months my_months = JAN;

   switch (my_months) {
      case JAN:
         ...
         break;
      case FEB:
         ...
         break;
      ...
      case NOV:
         ...
         break;
      case DEC:
         ...
         break;
   }
}
• Enforce the principle of least privilege.
• Avoid using local variables with the same name in different scopes (they are different variables).
• Always exit from main with a 0 exit status to indicate success and a non-zero status to indicate failure. Use exit rather than return to make your program more uniform. Use an int as a return type for main.
example:
int main() {
   FILE* fp = NULL;
   char* filename = "input.txt";

   if ((fp = fopen(filename, "r")) == NULL) {
      fprintf(stderr, "cannot open %s\n", filename);
      exit(1);
   } else {
      ...
      exit(0);
   }
}
• Always initialize pointer variables.
examples:
Node* node_ptr = NULL;
char* filename = "input.txt";
FILE* myinstream = fopen(filename, "r");
• Avoid allocating more memory than necessary for anything.
• When allocating memory by calling sizeof in a call to malloc, always pass a variable to sizeof rather than a datatype.
Original:
int* array = NULL;

array = (int*) malloc(sizeof(int) * 10);
Recommended:
int* array = NULL;

array = malloc(sizeof(*array) * 10);
This approach is recommended because if you decide later to change the type of array, then you only have to change the type in the declaration (i.e., the line containing the call to malloc need not change at all). This style is an aid to program modification because the call(s) to malloc may be a few hundred lines of code below the declaration. Using the original approach, if the type changes, it must be changed in three places: the declaration, the type cast, and the argument to sizeof.
• When allocating memory, always verify that the memory was allocated successfully.
example:
if ((node_ptr = malloc(sizeof(*node_ptr))) == NULL) {
   fprintf(stderr, "out of memory!");
   exit(1);
} else {
   ...
   exit(0);
}
• Once finished, always free memory that you explicitly allocated.
example:
if ((node_ptr = malloc(sizeof(*node_ptr))) == NULL) {
   fprintf(stderr, "out of memory!");
   exit(1);
} else {
   ...
   free(node_ptr);
   exit(0);
}
• When opening a file, always verify that the file was opened successfully.
example:
if ((fp = fopen(filename, "r")) == NULL) {
   fprintf(stderr, "cannot open %s\n", filename);
   exit(1);
} else {
   ...
   exit(0);
}
• Always close files that you explicitly opened.
example:
if ((fp = fopen(filename, "r")) == NULL) {
   fprintf(stderr, "cannot open %s\n", filename);
   exit(1);
} else {
   ...
   fclose(fp);
   exit(0);
}
• Always print error and debugging messages to stderr. By default stderr is unbuffered, so messages appear immediately, whereas output written to stdout is buffered (line buffered when stdout is a terminal and fully buffered otherwise).
example:
if ((fp = fopen(filename, "r")) == NULL) {
   fprintf(stderr, "cannot open %s\n", filename);
   exit(1);
} else {
   ...
}
• Avoid buffer overflows.
Consider:
char password[17];
printf("Please enter your password: ");
Incorrect: scanf("%s", password);
Correct: scanf("%16s", password);
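An alternative that bounds the read and also discards any excess input is to use fgets. The sketch below is one way to do so; the names PASSWORD_MAX and readPassword are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PASSWORD_MAX 16

/* Reads one line from stream into password, which must have room for
   PASSWORD_MAX + 1 chars. Returns 0 on success and -1 on end of file or
   error. Input beyond PASSWORD_MAX characters on the line is discarded. */
int readPassword(FILE* stream, char password[PASSWORD_MAX + 1]) {
   size_t length = 0;
   int c = 0;

   assert(stream != NULL && password != NULL);
   if (fgets(password, PASSWORD_MAX + 1, stream) == NULL)
      return -1;
   length = strlen(password);
   if (length > 0 && password[length - 1] == '\n')
      password[length - 1] = '\0';       /* the whole line fit */
   else
      while ((c = getc(stream)) != '\n' && c != EOF)
         ;                               /* discard the rest of the line */
   return 0;
}
```

Unlike the %16s conversion, which stops at the first white space, this version accepts spaces within the line.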
• Use perror (stdio.h) and/or strerror (string.h), together with errno (errno.h), to display error messages where appropriate.
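As an illustration of this guideline, the following sketch reports why a file could not be opened. The function name openForReading is hypothetical; the library calls fopen, strerror, and fprintf are standard.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Opens filename for reading. On failure, reports the reason on stderr and
   returns NULL with errno still describing the failure. */
FILE* openForReading(const char* filename) {
   FILE* fp = NULL;
   int saved = 0;

   assert(filename != NULL);
   if ((fp = fopen(filename, "r")) == NULL) {
      saved = errno;   /* fprintf may itself change errno */
      fprintf(stderr, "cannot open %s: %s\n", filename, strerror(saved));
      errno = saved;
   }
   return fp;
}
```

Calling perror(filename) immediately after the failed fopen would print a similar message in a fixed format.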
• Make appropriate use of qualifiers such as const, restrict, volatile, and register on function parameters and elsewhere (in the case of const, volatile, and register).
• Functions:
– Always use a procedure/function prototype.
– Use parameter names in procedure/function prototypes.
– Use different identifiers for formal parameters and actual param- eters to reinforce that they are different variables.
– Precede every procedure/function with the following header explaining its purpose, the meaning of each parameter, precondition, postcondition, and the general strategy of its implementation, if applicable.
/******************************************************************************
/
/ purpose:  To compute the factorial of a non-negative integer.
/
/*****************************************************************************/
int factorial(int n) {
   if (n == 0)
      return 1;
   else
      return n * factorial(n - 1);
}
– No routine/subprogram, block, procedure, function, or method (or message) should exceed 50 lines of code.
• The following guidelines are from [RR03, pp.29–30]:
Error handling is a key issue in writing reliable systems programs. When you are writing a function, think in terms of that function being called millions of times by the same application. How do you want the function to behave? In general, functions should never exit on their own, but rather should always indicate an error to the calling program. This strategy gives the caller an opportunity to recover or shut down gracefully.
Functions should also not make unexpected changes to the process state that persist beyond the return from the function. For example, if a function blocks signals, it should restore the signal mask to its previous value before returning.
Finally, the function should release all the hidden resources that it uses during its execution. Suppose a function allocates a temporary buffer by calling malloc and does not free it before returning. One call to this function may not cause a problem, but hundreds or thousands of successive calls may cause the process memory usage to exceed its limits. Usually, a function that allocates memory should either free the memory or make a pointer available to the calling program. Otherwise, a long-running program may have a memory leak; that is, memory ‘leaks’ out of the system and is not available until the process terminates.
You should also be aware that the failure of a library function usually does not cause your program to stop executing. Instead, the program continues, possibly using inconsistent or invalid data. You must examine the return value of every library function that can return an error that affects the running of your program, even if you think the chance of such an error occurring is remote.
Your own functions should also engage in careful error handling and communication. Standard approaches to handling errors in UNIX programs include the following.
– Print out an error message and exit the program (only in main).
– Return -1 or NULL and set an error indicator such as errno.
– Return an error code.
In general, functions should never exit on their own but should always report an error to the calling program. Error messages within a function may be useful during the debugging phase but generally should not appear in the final version. A good way to handle debugging is to enclose debugging print statements in a conditional compilation block so you can reactivate them if necessary [RR03, pp.29–30].
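The advice above can be sketched as follows. The macro DEBUG_PRINT and the function safeDivide are hypothetical names chosen for this illustration; the function reports failure by returning -1 and setting errno rather than by exiting or printing.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Compile with -DDEBUG to reactivate the debugging output. */
#ifdef DEBUG
#define DEBUG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void) 0)
#endif

/* Stores dividend / divisor in *quotient. Returns 0 on success; returns -1
   and sets errno to EDOM if divisor is 0. The caller decides what to do. */
int safeDivide(int dividend, int divisor, int* quotient) {
   assert(quotient != NULL);
   DEBUG_PRINT("safeDivide(%d, %d)\n", dividend, divisor);
   if (divisor == 0) {
      errno = EDOM;
      return -1;
   }
   *quotient = dividend / divisor;
   return 0;
}
```

Only main would then print a message (for example with perror) and call exit when safeDivide returns -1.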
• The following guidelines are from [RR03, pp.30–31]:
Most library functions provide good models for implementing functions. Here are some guidelines to follow.
1. Make use of return values to communicate information and to make error trapping easy for the calling program.
2. Do not exit from functions. Instead, return an error value to allow the calling program flexibility in handling the error [Explicitly set errno for all errors, and do not rely on the fact that a function which fails may set errno automatically for you. Common errors include exceeding available memory or file I/O open/close, read/write errors. See the GNU webpage for libc for a list of error codes, which are #defined in errno.h (e.g., use ENOMEM for the former and EIO for the latter errors above)].
3. Make functions general but usable. (Sometimes there are conflicting goals.)
4. Do not make unnecessary assumptions about sizes of buffers. (This is often hard to implement.)
5. When it is necessary to use limits, use standard system-defined limits, [e.g., MAX_CANON, #defined in limits.h] rather than arbitrary constants.
6. Do not reinvent the wheel – use standard library functions when possible.
7. Do not modify input parameter values unless it makes sense to do so.
8. Do not use static variables or dynamic memory allocation if automatic allocation will do just as well.
9. Analyze all the calls to the malloc family to make sure the program frees the memory that was allocated.
10. Consider whether a function is ever called recursively or from a signal handler or from a thread. Functions with variables of static storage class may not behave in the desired way. (The error number can cause a big problem here.)
11. Analyze the consequences of interruptions by signals.
12. Carefully consider how the entire program terminates [RR03, pp.30–31].
• Do not use a system call (e.g., open) where a library call (e.g., fopen) will suffice.
• Shell scripts:
– Always terminate with a proper exit statement (0 for success and non-zero for failure).
– Always start with a proper interpreter directive.
#!/bin/sh
#!/bin/ksh
#!/bin/bash
#!/bin/csh
or
#!/usr/bin/env ksh
#!/usr/bin/env bash
• Be consistent in your application of the above guidelines.
• Overall, write your programs such that they are self-documenting. In other words, structure your code such that the program itself provides its own documentation. Self-documentation means using descriptive identifiers and a consistent, aligned format.
Appendix B
Quick vi Reference
1. Invoking and exiting vi
   $ vi file                   invoke vi
   $ view file                 opens file in read-only mode
   :wq                         write and quit
   :w                          write
   :w file                     write to file
   :w! file                    overwrite existing file
   :q                          quit
   :q!                         unconditional quit

2. Display Text
   <CTRL/d>                    scroll down
   <CTRL/u>                    scroll up
   <CTRL/f>                    page forward
   <CTRL/b>                    page backward

3. Cursor Movement
   → <SP> l                    next char
   ← <BS> h                    previous char
   ↓ j                         char below
   ↑ k                         char above
   <RET>                       beginning of next line
   -                           beginning of previous line
   G                           GOTO last line
   :n nG                       GOTO line n
   $                           end of line
   ^                           beginning of line
   w nw W nW                   forward beginning of word
   e ne E nE                   end of word
   b nb B nB                   back beginning of word

4. Text Creation
   atext<ESC>                  append after cursor
   itext<ESC>                  insert before cursor
   otext<ESC>                  open line below
   Otext<ESC>                  open line above

5. Delete Text
   x                           delete char
   nx                          delete n chars
   rc                          replace character
   dd                          delete current line
   ndd                         delete n lines
   dw                          delete word
   d$                          delete to end of line
   u                           undo last editing command

6. Searching
   /string                     find next string
   ?string                     reverse search
   n                           repeat last / or ?
   N                           repeat last / or ? backwards

7. Change Text
   rc                          replace char with c
   nsstring<ESC>               substitute n chars with string
   cctext<ESC>                 change line with text
   ncctext<ESC>                change n lines
   cwtext<ESC>                 change word
   c$text<ESC>                 change to end of line
   nJ                          join n lines
   r<RET>                      split line
   :1,$s/string/newstring/g    substitution
   :%s/string/newstring/g      substitution

8. Copying Text
   yy                          yank entire line
   nyy                         yank n lines from current line
   yw                          yank word
   y$                          yank to end of line
   p                           put after char (line)
   P                           put before char (line)

9. Move Text
   use delete instead of yank

10. Miscellaneous
   xp                          transpose 2 characters
   :r file                     read file into buffer
   :!spell %                   run shell command on current file
   <CTRL/l>                    redraw screen
   $ vi -r file                recovery

11. Setting Options
   :set number                 number lines
   :set nonumber               turn off numbers
   :set list                   display tabs and end of lines
   :set nolist                 turn off list
   :set showmode               indicate input mode
   :set noshowmode             turn off showmode
   :set wm=10                  define automatic right margin
   :set wm=0                   turn off wm
Appendix C
vi Reference
Summary of vi Commands and Functions
Entering/Leaving vi, File Control
Commands given from UNIX:
%vi filename       edit filename, display beginning of file
%vi + filename     edit filename, display end of file
%vi +n filename    edit filename, begin display at line n
%vi list           edit first file in list, use :n from within vi to edit next
%view filename     view file in read-only mode; cannot make changes

%vi -r             list files saved when system crashed (recovery files)
%vi -r filename    recover filename
Commands given from vi:
:w                 write changes to current file
:w filename        write changes to new file filename
:w! filename       write changes to filename, overwriting existing file

:q                 quit vi; will not quit if there are changes to file
:q!                quit vi, discard changes
:wq                write changes to file, then quit vi
ZZ                 same as :wq

:e filename        edit filename
:e + filename      edit filename, display end of file
:e +n filename     edit filename, begin display at line n
:e!                re-edit current file, discarding changes
:n                 edit next file in the argument list given to the vi command
:n list            specify new list of files to edit

:f                 display filename and current line
CTRL-G             same as :f

:sh                run a UNIX shell; use exit or CTRL-D to return
:! command         run the specified UNIX command then return
Cursor Movement/Screen Display
→ or l           move cursor one character to the right
← or h           move cursor one character to the left
↓ or j           move cursor to the next line
↑ or k           move cursor to previous line
<SPACE>          same as →
<BACKSPACE>      same as ←
+ or <RET>       move to first character of next line
-                move to first character of previous line
0                move to beginning of current line
$                move cursor to end of current line
J                join current line and following line

CTRL-F           move forward a screenful
CTRL-B           move backward a screenful
CTRL-D           move forward half a screenful
CTRL-U           move backward half a screenful
H                move to beginning of top line of screen (home)
nH               move to beginning of nth line from top of screen
M                move to beginning of middle line of screen
L                move to beginning of last line of screen
nL               move to beginning of nth line from bottom of screen

w                move to the beginning of the next word
nw               move to the beginning of the nth word forward
W                move to the beginning of the next word, ignoring punctuation
e                move to the end of the word
ne               move to the end of the nth word forward
b                move backward a word
B                move backward a word, ignoring punctuation

)                move cursor to the end of the sentence
(                move cursor to the beginning of the sentence
}                move cursor to the end of the paragraph
{                move cursor to the beginning of the paragraph
]]               move cursor to the end of the section
[[               move cursor to the beginning of the section
nG             move cursor to the beginning of line number n in the file
1G             move cursor to first line in file
G              move cursor to last line in file

fx             find the next occurrence of x (find forward)
Fx             find the previous occurrence of x (find backward)
;              repeat f or F command; find next or previous occurrence of same character

/text <RET>    search forward for next occurrence of text
?text <RET>    search backward for next occurrence of text
n              after / or ?, search in same direction for same text
N              after / or ?, search in reverse direction for same text

CTRL-L         redraw the screen
About the Author
Saverio Perugini is an Associate Professor in the Department of Computer Science at the University of Dayton.
Colophon
This book is typeset with LaTeX and BibTeX using a 12pt Palatino font.
Index
Placeholder Index Entry, xi
- Preface
- List of Figures
- List of Tables
- Introduction to Linux
- Chapter Objectives
- Introduction
- What is Linux Programming?
- What is Systems Software?
- Examples of Systems Software
- One Dichotomy of Programming
- Another Viewpoint (Course Themes)
- Review of Operating System Nomenclature
- Why Study This Stuff Anyway?
- Conceptual Exercises for Section 1.2
- Programming Exercises for Section 1.2
- Introduction to Linux
- What is Linux?
- Hallmarks of Linux
- Historical Perspective
- The UNIX Philosophy
- History of UNIX and C
- Conceptual UNIX Architecture
- Accessing a UNIX Account
- General Syntax of UNIX Commands
- Getting Help on the UNIX System
- Unix Manual
- Introduction to the vi Editor
- Conceptual Exercises for Section 1.3
- Thematic Take-Aways
- Chapter Summary
- Key Terms
- Bibliographic Notes
- Files and Directories I: Manipulation and Management
- Chapter Objectives
- Basic UNIX File Nomenclature
- ls and cal
- Explanation of ls -l Output
- Unix Filesystem
- Absolute vs. Relative Path
- Two Special Files in Every Directory
- Navigating through Directories
- File Manipulation and Management
- Conceptual Exercises for Chapter 2
- Programming Exercises for Chapter 2
- Thematic Take-Aways
- Chapter Summary
- Key Terms
- Bibliographic Notes
- The Linux Shell
- Chapter Objectives
- Introduction
- Shell Commands vs. Unix Commands
- More on Redirecting Standard Error
- Kernel metacharacters
- stty Command
- Korn Shell metacharacters
- Metacharacters at Different Levels of Interpretation
- Command Substitution
- Shell metacharacter interpretation
- Shell Scripts
- Conceptual Exercises for Chapter 3
- Programming Exercises for Chapter 3
- Programming Project for Chapter 3
- Thematic Take-Aways
- Chapter Summary
- Key Terms
- Bibliographic Notes
- Introduction to C Programming: System Libraries and I/O
- Chapter Objectives
- Header Files vs. Libraries
- Standard C Library
- Standard I/O vs. File I/O
- Standard I/O Redirection
- Demo of cat
- Redirecting Standard I/O
- File Descriptors
- Demo of wc
- I/O in C
- Effect of a Successful Open on a File
- Analogs from C++ to C
- Review of Standard I/O Functions
- Developing cat in C
- Portability (Safety)
- String Functions
- `s' Family of printf/scanf Functions
- Using a Pointer to Traverse an Array
- Simple Macro vs. Constant
- String Copy Code
- Command-line Arguments
- The argv Array for the Call a.out -wlc myfile
- Compiling a C Program in UNIX
- Compiling
- C Compilation Steps Using gcc
- The key options to gcc graphically
- C Compilation Steps Graphically
- file Command
- Memory Management: Memory Allocation and Deallocation
- Conceptual Exercises for Chapter 4
- Programming Exercises for Chapter 4
- Programming Project for Chapter 4
- Thematic Take-Aways
- Chapter Summary
- Key Terms
- Bibliographic Notes
- Compiling C in Linux
- Chapter Objectives
- Compiling C
- Overview
- Static vs. Dynamic Linking
- More on Compiling with gcc
- Process
- Process Termination
- NULL Pointer
- extern Modifier in C
- Conditional Compilation
- Error Handling
- Debugging
- Conceptual Exercises for Section 5.2
- Programming Exercises for Section 5.2
- Building a Library in C
- Conceptual Exercises for Section 5.3
- Programming Exercises for Section 5.3
- More topics in C: Storage Classes, Thread-safe Functions, and Macros
- Declarations and Definitions
- Storage and Linkage Classes
- static Modifier in C
- Summary of static Reserved Word
- C Libraries
- Synchronization
- Thread Safe Functions
- makeargv
- Self-study
- Macros: The #define Preprocessor Directive
- Macros vs. Functions
- Conceptual Exercises for Section 5.4
- Programming Exercises for Section 5.4
- Compilation and Configuration Management
- Compilation Management: make
- Configuration Management (rcs)
- Distributed Configuration Management (git)
- Conceptual Exercises for Section 5.5
- Programming Exercises for Section 5.5
- Packaging and Compression Utilities
- ar
- tar
- gzip/gunzip
- compress/uncompress
- Conceptual Exercises for Section 5.6
- Thematic Take-Aways
- Chapter Summary
- Key Terms
- Bibliographic Notes
- Files and Directories II: Inodes, Hard and Symbolic Links
- Chapter Objectives
- Low-Level I/O
- Review of Linux I/O Data Structures
- Review of Buffered Output
- Library vs. System Calls
- I/O Recap
- select and poll
- Disk Statistics
- File Access (3 Types)
- File Permissions, Owners, and Groups
- Files
- Relevant Accessor/Modifier Functions, and structs
- Inodes
- File Links: Hard vs. Soft
- Hard Links
- Symbolic (Soft) Links
- Editor Examples
- od (Octal Dump) Command
- File `Types' and `Names'
- Question to investigate
- Set-uid Program
- Login Process
- Things to Do
- find Command
- Accounts
- Character and Block Special Files in Linux
- Conceptual Exercises for Chapter 6
- Programming Exercises for Chapter 6
- Programming Project for Chapter 6
- Thematic Take-Aways
- Chapter Summary
- Key Terms
- Bibliographic Notes
- Processes: Creation, Environment, Manipulation, and Communication
- Chapter Objectives
- Introduction
- Process Identification
- Process Creation: fork
- Background Processes
- fork Exercises
- Conceptual Exercises for Section 7.3
- Programming Exercises for Section 7.3
- Process Environment
- Variables
- Accessing the Environment
- New Account Environment
- Command-line Tips
- PATH Variable
- Korn Shell Configuration and Customization
- .profile vs. (value of) ENV
- .plan and .project
- Configuring vi
- Conceptual Exercises for Section 7.4
- Programming Exercise for Section 7.4
- Process Manipulation: wait and exec
- wait
- fork and wait Exercises
- exec
- Investigating Questions
- Process Review
- Other Things to Know
- Conceptual Exercises for Section 7.5
- Programming Exercises for Section 7.5
- Putting It All Together: Basic Shell Setup
- Interprocess Communication
- I/O Redirection
- Implementing I/O Redirection
- Helpful Functions
- Unnamed and Named Pipes (Fifos)
- C Model vs. Go Model
- Signals and Job Control
- Conceptual Exercises for Section 7.7
- Programming Exercises for Section 7.7
- Client-server Programming
- Observations on Client-server Programs
- Experimental Runs of Client-server Programs
- Conceptual Exercises for Section 7.8
- Programming Exercises for Section 7.8
- Client-server Programming in Qt
- Programming Exercises for Section 7.9
- Programming Project for Chapter 7
- Thematic Take-Aways
- Chapter Summary
- Key Terms
- Bibliographic Notes
- Regular Expressions, Pattern Matching, and Filters
- Chapter Objectives
- Regular Expressions
- What /uses/ [Rr]eg.lar [Ee]xpre[s*]ions\?
- Special or Metacharacters
- Regular Expression Examples
- Regular Expression Rule
- Using grep
- Full Regular Expressions
- Subtle Point about Tools that use Regular Expressions
- Conceptual Exercises for Section 8.2
- Programming Exercises for Section 8.2
- sed
- ex (Line Editor)
- Essential sed
- Some Representative Examples
- A Simple Faculty Database Example
- d for Delete
- p for Print
- More sed Jargon
- A Tale of Two Buffers
- newer Script
- Conceptual Exercises for Section 8.3
- Programming Exercises for Section 8.3
- Programming Project for Section 8.3
- Filters
- tr (anslate)
- sort
- uniq
- Spellers
- Pipeline of Filters
- Toward Database Operations: cut and paste, and join
- File Comparison Utilities
- Printing and Other Related Filter Utilities
- Conceptual Exercises for Section 8.4
- Programming Exercises for Section 8.4
- The awk Programming Language
- Introduction
- Execution Model
- Simple awking
- Fine Tuning awk
- Some Example awk Command Lines
- Gradebook Example
- Implementing uniq in awk
- Conceptual Exercises for Section 8.5
- Programming Exercises for Section 8.5
- Programming Project for Section 8.5
- Programming Projects for Chapter 8
- Linux Filter Style of Programming
- Thematic Take-Aways
- Chapter Summary
- Key Terms
- Bibliographic Notes
- Shell Programming
- Chapter Objectives
- Introduction
- return vs. exit
- Command-line Arguments
- Command and Control
- for Loops
- String Operators
- if Statement
- Additional Condition Tests
- while Statement
- Putting It All Together: ourwhich Script
- case Selection
- Example: Factoring Command-line Arguments
- Conceptual Exercises for Section 9.3
- Programming Exercises for Section 9.3
- Numbers and Arrays
- Numeric Variables
- Example: Renaming Multiple .c Files to .cpp
- Array Variables
- Restricted Shells
- Conceptual Exercises for Section 9.4
- Programming Exercises for Section 9.4
- Shell Programming vs. Linux Filter Style of Programming
- Conceptual Exercises for Chapter 9
- Programming Exercises for Chapter 9
- Programming Project for Chapter 9
- Thematic Take-Aways
- Chapter Summary
- Key Terms
- Bibliographic Notes
- Automatic Program Generation
- Chapter Objectives
- Scanner Generation: flex
- Outline
- Linux Tools for Automatically Generating Scanners and Parsers
- Structure of a flex Specification:
- Our First flex Program: cat (version 0)
- noop
- cat (version 1)
- Running flex to Automatically Generate a Scanner
- cat (version 2)
- cat (version 3)
- cat -n (version 4)
- cat -n (version 5)
- Word Count
- Pattern Overlap
- Identifying Identifiers
- Matching Quoted Strings
- States
- Matching C Strings
- Conceptual Exercises for Section 10.2
- Programming Exercises for Section 10.2
- Programming Projects for Section 10.2
- Parser Generation: bison
- Scanning and Parsing
- Evaluating Arithmetic Expressions in Linux
- Calculator (version 1)
- Marriage of flex and bison
- Running bison to Generate a Parser
- Calculator (version 2)
- Putting It All Together: Towards Interpreters
- Calculator (version 3)
- Helpful C Constructs and Capabilities
- Structures for Parse Tree Nodes
- Precedence and Associativity in Calculator (version 3)
- Interpreters: Program Evaluators
- Conceptual Exercises for Section 10.4
- Programming Exercises for Section 10.4
- Programming Project for Chapter 10
- Thematic Take-Aways
- Chapter Summary
- Key Terms
- Bibliographic Notes
- Bibliography
- Appendices
- Programming Style Guide
- Quick vi Reference
- vi Reference
- About the Author