Posted on September 15, 2023 by nexonhost
How To Use Gawk Command.
The basic gawk syntax looks like this:
gawk [options] [actions/filters] input_file
The command cannot be run without any arguments. The options are not mandatory, but for gawk to produce output, at least one action should be assigned. Actions and filters are different subcommands and selection criteria that enable gawk to manipulate data from the input file.
gawk Options
The gawk command is a versatile tool thanks to its numerous arguments. With gawk being the GNU implementation of awk, long, GNU-style options are available. Each long option has a corresponding short one.
Common options are presented below:
Option | Description |
---|---|
-f program-file, --file program-file | Reads the program from a file, which serves as a script, instead of from the first argument in the terminal. |
-F fs, --field-separator fs | Uses fs as the input field separator (the value of the predefined variable FS). |
-v var=val, --assign var=val | Assigns a value to the variable before executing a script. |
-b, --characters-as-bytes | Treats all data as single-byte characters. |
-c, --traditional | Executes gawk in compatibility mode, with the GNU extensions turned off. |
-C, --copyright | Displays the GNU Copyright message. |
-d[file], --dump-variables[=file] | Writes a list of global variables, their types, and values to a file. |
-e program-text, --source program-text | Takes program text from the argument, which allows the mixing of library functions and source code. |
-E file, --exec file | Works like -f, but turns off terminal variable assignments. |
-L[value], --lint[=value] | Prints warning messages about code that is dubious or not portable to other AWK implementations. |
-S, --sandbox | Runs gawk in sandbox mode. |
gawk Built-in Variables
The gawk command offers several built-in variables that store information about the input and control how gawk processes it. Some are set by gawk itself as it reads the input, while others, such as FS or OFS, change the program's behavior only when a user assigns a value to them. Some important gawk built-in variables are:
Variable | Description |
---|---|
ARGC | Holds the number of terminal arguments. |
ARGIND | Holds the index in ARGV of the file currently being processed. |
ARGV | Holds an array of terminal arguments. |
ERRNO | Contains a string describing a system error. |
FIELDWIDTHS | Holds a whitespace-separated list of field widths, used to split fixed-width input. |
FILENAME | Holds the name of the current input file. |
FNR | Holds the record number within the current input file. |
FS | Represents the input field separator. |
IGNORECASE | Turns case-insensitive matching on or off. |
NF | Holds the number of fields in the current input record. |
NR | Holds the total number of input records read so far, across all input files. |
OFS | Represents the output field separator. |
ORS | Represents the output record separator. |
RS | Represents the input record separator. |
RSTART | Represents the index of the first matched character. |
RLENGTH | Represents the matched string length. |
gawk Examples
The uses of gawk's pattern-matching and language-processing functions are extensive. This article aims to provide practical examples through which users learn to use the gawk utility.
Print Files
By default, gawk with a print action displays every line from the specified file. For instance, running the cat command on the my.music text file prints the whole file, and the following gawk command displays the same result:
gawk '{print}' my.music
Print a Column
In text files, spaces are usually used as delimiters for columns. The people file consists of four columns:
Ordinal numbers.
First names.
Last names.
Year of birth.
Use gawk to show only a specific column in the terminal. For instance:
gawk '{print $2}' people
The command prints only the second column. To print multiple columns, like column one (ordinal numbers) and column two (first names), run:
gawk '{print $1, $2}' people
The gawk command also works without the comma between $1 and $2. However, there are no spaces between columns in the output:
gawk '{print $1 $2}' people
Filter Columns
The gawk command offers additional filtering options. For instance, print lines containing the capital letter O with:
gawk '/O/ {print}' people
To show only lines containing the letters O or A, use the alternation operator (|) inside the pattern:
gawk '/O|A/ {print}' people
The command prints any line that includes a word with capital O or A. On the other hand, use logical AND (&&) to show lines including both O and the year 1995:
gawk '/O/ && /1995/' people
The filters work with numbers as well. For example, show only people born in the 1990s with:
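gawk '$4 ~ /^199/ {print}' people
The output shows only lines in which the fourth column starts with 199. A pattern such as /199*/ would not work here, because the asterisk makes the final 9 optional and the pattern would match any line containing 19.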
Customize the output even more by combining previously mentioned options. For example, print only the first and last names of people born in 1995 or 2003 with:
gawk '/1995|2003/ {print $2, $3}' people
The command prints columns two and three as stated in the {print $2, $3} part. The output only shows lines containing the number 1995 or 2003, even though the column containing those numbers is hidden.
The gawk command also lets users print everything except for the lines containing the specified string with the logical NOT (!). For instance, omit lines containing the string 19 in the output:
gawk '!/19/' people
Add Line Numbers
The people file includes line numbers in the first column. In case users are working on a file without line numbers, gawk presents options to add them.
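To add line numbers, execute gawk with FNR. The next statement, which moves on to the following input line, is optional here because the program has only one rule: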
gawk '{ print FNR, $0; next}' humans
The command adds a line number before each line. When gawk reads a single file, the same result is achieved with the NR variable:
gawk '{print NR, $0}' mobile.txt
Find Line Count
To count the total number of lines in the file, use the END statement and the NR variable with gawk:
gawk 'END { print NR }' people
The command reads each line. Once gawk reaches END, it prints the value of NR, which by then contains the total number of lines. Running the same command without the END statement prints the value of NR for every line, which produces a running line number instead of a single total.
Filter Lines Based on Length
Use the following command to print only lines longer than 20 characters:
gawk 'length>20' people
It also works with multiple conditions. For instance, show lines longer than 17 but shorter than 20 characters:
gawk 'length<20 && length>17' people
To display lines that are exactly 20 characters long, run:
gawk 'length==20' people
Print Info Based on Conditions
The gawk command allows for the use of if-else statements. For instance, one way to filter only people born after 1999 is with a simple if statement:
gawk '{ if ($4>1999) print }' people
The if statement sets the condition that entries in column four have to be larger than 1999. The output shows only entries that satisfy the condition. Expand the command into an if-else statement to print lines not satisfying the original condition.
gawk '{ if ($4 > 1999) print $0, "==>00s"; else print $0, "==>90s" }' people
The command includes:
If statement. If the condition is satisfied, gawk adds the string "==>00s" to the output line.
Else statement. In case the line doesn't satisfy the condition, gawk still prints that line in the output, adding the "==>90s" string to the output.
Add a Header
In the same way in which the END statement allows users to modify the output at the end of the file, the BEGIN statement formats the data at the beginning.
When used with gawk, the BEGIN sections are always executed first, before any input is read. After that, gawk applies the remaining rules to each input line. One way to use the BEGIN statement is to add a header to the output.
Execute the following command to add a section above the gawk output:
gawk 'BEGIN {print "No/First&Last Name/Year of Birth"} {print $0}' people
Find the Longest Line Length
Combine the length function with the if and END statements to find the length of the longest line in the people file:
gawk '{ if (length($0) > max) max = length($0) } END { print max }' people
Find the Number of Fields
The gawk command also allows users to display the number of fields with the NF variable. The simplest way to display the number of fields prints a difficult-to-read output:
gawk '{print NF}' people
The command outputs the number of fields per line without any additional info. To customize the output and make it more human-readable, adjust the initial command:
gawk '{print NR, "-->", NF}' people
The command now includes:
The NR variable that adds line numbers to each output line.
The --> string that separates line numbers from the field counts.
Another way to show line and field numbers in the people file is to print columns with NF. Note that the people file includes ordinal numbers in column one. Therefore the NR variable is omitted:
gawk '{print $0, "-->", NF}' people
Finally, to print the total number of fields, execute:
gawk '{num_fields = num_fields + NF} END {print num_fields}' people
The people file has ten lines with four fields each, so the command prints 40.
Conclusion
After going through this tutorial, you know how to use gawk for advanced text processing and data manipulation.