Sed & awk (2nd Edition)

Arnold Robbins, Dale Dougherty

Language: English

Pages: 581


Format: PDF / Kindle (mobi) / ePub

sed & awk describes two text processing programs that are mainstays of the UNIX programmer's toolbox.

sed is a "stream editor" for editing streams of text that might be too large to edit as a single file, or that might be generated on the fly as part of a larger data processing step. The most common operation done with sed is substitution, replacing one block of text with another.
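As a minimal sketch of the substitution described above (the sample text and variable names are made up for illustration), sed reads a stream from a pipe and rewrites it line by line:

```shell
# s/old/new/ replaces the first match on each line;
# adding the g flag replaces every match on the line.
first=$(printf 'colour and colour\n' | sed 's/colour/color/')
all=$(printf 'colour and colour\n' | sed 's/colour/color/g')
printf '%s\n%s\n' "$first" "$all"
```

This prints "color and colour" followed by "color and color". The input never exists as a file; it is generated on the fly by printf.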

awk is a complete programming language. Unlike many conventional languages, awk is "data driven" -- you specify what kind of data you are interested in and the operations to be performed when that data is found. awk does many things for you, including automatically opening and closing data files, reading records, breaking the records up into fields, and counting the records. While awk provides the features of most conventional programming languages, it also includes some unconventional features, such as extended regular expression matching and associative arrays. sed & awk describes both programs in detail and includes a chapter of example sed and awk scripts.
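The "data driven" style can be sketched roughly as follows, assuming a made-up two-column input of item names and quantities. A pattern selects records, the action runs on each match, and an associative array is indexed directly by the first field:

```shell
# awk reads the records, splits them into fields ($1, $2), and counts them (NR).
report=$(printf 'apple 3\npear 2\napple 4\n' | awk '
    $2 > 0 { total[$1] += $2 }    # pattern selects records; action accumulates per item
    END    { printf "apple=%d pear=%d records=%d\n", total["apple"], total["pear"], NR }
')
printf '%s\n' "$report"
```

This prints "apple=7 pear=2 records=3". No file is opened and no read loop is written explicitly; awk supplies both.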

This edition covers features of sed and awk that are mandated by the POSIX standard. This most notably affects awk, where POSIX standardized a new variable, CONVFMT, and new functions, toupper() and tolower(). The CONVFMT variable specifies the conversion format to use when converting numbers to strings (awk used to use OFMT for this purpose). The toupper() and tolower() functions each take a (presumably mixed case) string argument and return a new version of the string with all letters translated to the corresponding case.
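A small sketch of the POSIX additions mentioned above, using invented sample values. Concatenating a number with the empty string forces a number-to-string conversion, which is where CONVFMT applies to non-integer values:

```shell
# CONVFMT controls number-to-string conversion (for example, in concatenation).
conv=$(awk 'BEGIN { CONVFMT = "%.2f"; x = 3.14159; s = x ""; print s }')

# toupper() and tolower() return a converted copy of their string argument.
cased=$(awk 'BEGIN { print toupper("Sed & Awk"), tolower("Sed & Awk") }')

printf '%s\n%s\n' "$conv" "$cased"
```

This prints "3.14" followed by "SED & AWK sed & awk".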

In addition, this edition covers GNU sed, newly available since the first edition. It also updates the first edition's coverage of Bell Labs nawk and GNU awk (gawk), covers mawk, an additional freely available implementation of awk, and briefly discusses three commercial versions of awk: MKS awk, Thompson Automation awk (tawk), and Videosoft awk (VSAwk).

multiplied by n, we can invoke the program, as follows:

    $ myscript n=4 myfile

This spares us from having to pass "$1" as a shell variable and assigning it to n as an awk parameter inside the shell script. The masterindex, described in Chapter 12, uses the "#!" syntax to invoke awk. If your system does not support this syntax, you can change the script by removing the "#!", placing single quotes around the entire script, and ending the script with "$*", which expands to all shell command-line
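The command-line assignment in this excerpt can be sketched without the book's myscript, which is not reproduced here. The one-line program and the temporary input file below are hypothetical stand-ins; the point is only that awk processes an operand of the form n=4 as a variable assignment before it reads the file that follows:

```shell
# Hypothetical stand-in for the script: print the first field multiplied by n.
tmp=$(mktemp)
printf '1\n2\n3\n' > "$tmp"

# n=4 is an assignment operand; awk sets n before reading "$tmp".
scaled=$(awk '{ print $1 * n }' n=4 "$tmp")
rm -f "$tmp"

printf '%s\n' "$scaled"
```

This prints 4, 8, and 12 on separate lines.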

mentioned in the first chapter, an awk program can be used more like a query language, extracting useful information from a file. We might say that the pattern placed a condition on the selection of records to be included in a report, namely that they must contain the string "MA". Now we can also specify what portion of a record to include in the report. The next example uses a print statement to limit the output to the first field of each record.

    $ awk '/MA/ { print $1 }' list
    John
    Eric
    Sal

It

the substitute command looks for any of the following metacharacters: "]", "[", "\", "*" or ".". This regular expression is rather interesting: 1) if the close bracket is the first character in a character class, it is interpreted literally, not as the closing delimiter of the class; and 2) of the metacharacters specified, only the backslash has a special meaning in a character class and must be escaped. Also, there is no need to escape the metacharacters "^" and "$" because they only have
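A rough sketch of a substitution built around the character class this excerpt describes. The replacement part, which prefixes each matched character with a backslash, and the sample input are assumptions for illustration, since the excerpt does not show the full command:

```shell
# The class [][\\*.] matches "]", "[", "\", "*" or ".".
# "]" comes first so it is taken literally.
# In the replacement, \\ is a literal backslash and & is the matched character.
escaped=$(printf '%s\n' 'a.b*c[0]' | sed 's/[][\\*.]/\\&/g')
printf '%s\n' "$escaped"
```

This prints `a\.b\*c\[0\]`, a version of the input that is safe to use as a literal pattern.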

fields for a file. Awk supplies the number of fields for a record in the system variable NF. Therefore, rules 2 and 3 test that NF is equal to 9. This helps us avoid matching odd blank lines or the line stating the block total. Because we want to handle directories and files differently, we use another pattern to match the first character of the line. In rule 2 we test for "-" in the first position on the line, which indicates a file. The associated action increments the file counter and adds the
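The rules this excerpt describes might look roughly like the sketch below. The sample listing is invented, and because the excerpt is cut off, summing the size in field 5 is an assumption about what the file rule adds up:

```shell
# Invented "ls -l"-style input: a block total line plus three 9-field entries.
summary=$(printf '%s\n' \
    'total 12' \
    '-rw-r--r-- 1 dale staff 100 Jan 1 10:00 notes' \
    'drwxr-xr-x 2 dale staff 512 Jan 1 10:00 src' \
    '-rw-r--r-- 1 dale staff 250 Jan 2 11:00 todo' |
awk '
    NF == 9 && /^-/ { files++; bytes += $5 }   # regular file: count it, add its size
    NF == 9 && /^d/ { dirs++ }                 # directory: count only
    END { printf "%d files, %d bytes, %d directories\n", files, bytes, dirs }
')
printf '%s\n' "$summary"
```

This prints "2 files, 350 bytes, 1 directories". The "total 12" line has only two fields, so the NF test skips it.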

random number, we use an inner loop that generates selections and tests to see if they are in the pick array. (Using the in operator is much faster than looping through the array comparing subscripts.) While (select in pick), the corresponding element has been found already, so the selection is a duplicate and we reject the selection. If it is not true that select in pick, then we assign select to an element of the pick array. This will make future in tests return true, causing the do loop to
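The duplicate-rejecting loop in this excerpt can be sketched as follows. The names select and pick come from the excerpt; the counts (6 picks out of 30), the fixed seed, and the surrounding for loop are assumptions made for illustration:

```shell
picks=$(awk 'BEGIN {
    srand(1)                      # fixed seed, assumed here for repeatable runs
    count = 6; top = 30           # hypothetical: pick 6 distinct numbers from 1..30
    for (i = 1; i <= count; i++) {
        do {
            select = 1 + int(rand() * top)
        } while (select in pick)  # already chosen: reject and draw again
        pick[select] = select     # record it so later "in" tests return true
    }
    for (j in pick) print pick[j]
}' | sort -n)
printf '%s\n' "$picks"
```

The in test checks whether a subscript exists without scanning the array, so each draw needs only a single membership test. The exact numbers printed depend on the awk implementation's random number generator.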
