How to Use the gawk Command in Linux

How to Use the gawk Command

The basic gawk syntax looks like this:

gawk [options] [actions/filters] input_file

The command cannot be run without any arguments. Options are not mandatory, but gawk needs at least one filter or action to produce output. Filters (patterns) select which input lines gawk processes, and actions (statements enclosed in curly braces) define what gawk does with those lines. A filter without an action prints every matching line, and an action without a filter runs on every line.

gawk Options

The gawk command is a versatile tool thanks to its numerous options. Because gawk is the GNU implementation of awk, it also accepts long, GNU-style options. Each long option has a corresponding short one.

Common options are presented below:

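  • -f program-file, --file program-file: Reads the AWK program from program-file instead of taking it from the first command-line argument.

  • -F fs, --field-separator fs: Uses fs as the input field separator (sets the FS variable).

  • -v var=val, --assign var=val: Assigns the value val to the variable var before the program starts, so the value is available even in a BEGIN rule.

  • -b, --characters-as-bytes: Treats all input data as single-byte characters, ignoring multibyte characters in the current locale.

  • -c, --traditional: Runs gawk in compatibility mode, with the GNU extensions disabled.

  • -C, --copyright: Prints the short version of the GNU copyright message and exits.

  • -d[file], --dump-variables[=file]: Writes a sorted list of global variables, their types, and their final values to file (awkvars.out if no file is given).

  • -e program-text, --source program-text: Uses program-text as AWK source code, which allows mixing library functions loaded with -f and code typed on the command line.

  • -E file, --exec file: Like -f, but it stops option processing and disables command-line variable assignments; it is intended for #! scripts.

  • -L [value], --lint[=value]: Prints warnings about constructs that are dubious or not portable to other AWK implementations.

  • -S, --sandbox: Runs gawk in sandbox mode, which disables the system() function, input redirection with getline, output redirection with print and printf, and loading of dynamic extensions.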

gawk Built-in Variables

gawk provides built-in variables that hold information about the input and control how gawk processes it. gawk updates some of them automatically (for example, NR and NF), while others (for example, FS and OFS) change gawk's behavior when a user assigns them a value, either in the program or with the -v option. Some important gawk built-in variables are:

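  • ARGC: The number of command-line arguments.

  • ARGIND: The index in ARGV of the file currently being processed.

  • ARGV: An array of the command-line arguments.

  • ERRNO: A string describing the error when a system operation fails, for example a getline redirection or a close() call.

  • FIELDWIDTHS: A whitespace-separated list of field widths. Setting it makes gawk split input into fixed-width fields instead of using FS.

  • FILENAME: The name of the current input file.

  • FNR: The record (line) number in the current input file.

  • FS: The input field separator (a single space by default, which splits fields on runs of whitespace).

  • IGNORECASE: Controls case sensitivity. A nonzero value makes regular expression matching and string comparisons case-insensitive.

  • NF: The number of fields in the current input record.

  • NR: The total number of input records read so far, across all input files.

  • OFS: The output field separator (a single space by default).

  • ORS: The output record separator (a newline by default).

  • RS: The input record separator (a newline by default).

  • RSTART: The index of the first character matched by the match() function (0 if there is no match).

  • RLENGTH: The length of the string matched by the match() function (-1 if there is no match).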

gawk Examples

gawk's pattern-matching and language-processing features are extensive. The practical examples below show how to use the gawk utility.

Print Files

By default, the print action with no arguments prints each whole input line. For instance, the following command displays every line of the my.music text file, the same output that the cat my.music command produces:

gawk '{print}' my.music

Print a Column

By default, gawk splits each line into fields (columns) separated by whitespace. The people file consists of four columns:

  1. Ordinal numbers.

  2. First names.

  3. Last names.

  4. Year of birth.

Use gawk to show only a specific column in the terminal. For instance:

gawk '{print $2}' people

The command prints only the second column. To print multiple columns, like column one (ordinal numbers) and column two (first names), run:

gawk '{print $1, $2}' people

The gawk command also works without the comma between $1 and $2. However, gawk then concatenates the two fields, so the output has no space between the columns:

gawk '{print $1 $2}' people

Filter Lines

The gawk command offers additional filtering options. For instance, print lines containing the capital letter O with:

gawk '/O/ {print}' people

To show only lines containing the letter O or A, separate the two letters with the regular expression alternation operator (|):

gawk '/O|A/ {print}' people

The command prints every line that contains a capital O or A. To show only lines that contain both O and the year 1995, combine two patterns with the logical AND operator (&&). Since this command has no action, gawk prints the matching lines:

gawk '/O/ && /1995/' people

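The filters work with numbers as well. For example, show only people born in the 1990s with:

gawk '$4 ~ /^199/ {print}' people

The ~ operator matches the fourth column against the regular expression ^199, so the output shows only lines in which the year of birth starts with 199. A plain /199*/ pattern would not work here: 9* means "zero or more 9s", so it matches any line containing 19, including years such as 1985.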

Customize the output even more by combining previously mentioned options. For example, print only the first and last names of people born in 1995 or 2003 with:

gawk '/1995|2003/ {print $2, $3}' people

The command prints columns two and three, as specified in the {print $2, $3} action. The output shows only lines that contain 1995 or 2003, even though the column holding those years is not printed.

The gawk command also lets users print everything except the lines containing a specified string by using the logical NOT operator (!). For instance, omit lines containing the string 19 from the output:

gawk '!/19/' people

Add Line Numbers

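The people file includes line numbers in the first column. If you are working with a file that has no line numbers, gawk can add them.

To add line numbers, print the FNR variable (the record number in the current file) followed by the whole line, $0:

gawk '{ print FNR, $0 }' humans

The command adds a line number before each line. With a single input file, the NR variable achieves the same result. The two variables differ only when gawk reads several files: FNR restarts at 1 for each file, while NR keeps counting across all of them: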

gawk '{print NR, $0}' mobile.txt
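FNR and NR give identical numbers for one file but diverge once gawk reads several. A minimal sketch, using two throwaway files in a temporary directory (the names a.txt and b.txt are hypothetical); FILENAME is printed too, to show which file each line comes from:

```shell
# Create two small hypothetical input files in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' > "$dir/a.txt"
printf '%s\n' 'gamma' > "$dir/b.txt"

# FNR restarts at 1 for each file; NR keeps counting across both files.
(cd "$dir" && gawk '{ print FILENAME, FNR, NR, $0 }' a.txt b.txt)
# prints:
# a.txt 1 1 alpha
# a.txt 2 2 beta
# b.txt 1 3 gamma

rm -r "$dir"
```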

Find Line Count 

To count the total number of lines in the file, use the END statement and the NR variable with gawk:

gawk 'END { print NR }' people

gawk reads every line, and the END rule runs only after the last line has been read, so it prints the final value of NR, which is the total number of lines. Without the END keyword, the { print NR } action runs once for every line, so the command prints the running count (1, 2, 3, and so on) instead of only the total.

Filter Lines Based on Length

Use the length function to print only lines longer than 20 characters. Without an argument, length returns the length of the whole current line:

gawk 'length>20' people

Combine several length conditions with the logical AND operator (&&). For instance, show lines longer than 17 but shorter than 20 characters:

gawk 'length<20 && length>17' people

To display lines that are exactly 20 characters long, run:

gawk 'length==20' people

Print Info Based on Conditions

The gawk command allows for the use of the if-else statements. For instance, another way to filter only people born after 1999 is with a simple if statement:

gawk '{ if ($4>1999) print }' people

The if statement sets the condition that entries in column four have to be larger than 1999. The output shows only entries that satisfy the condition. Expand the command into an if-else statement to print lines not satisfying the original condition.

gawk '{if ($4>1999) print $0, "==>00s"; else print $0, "==>90s"}' people

The command includes:

  • If statement. If the condition is satisfied, gawk adds the string “==>00s” to the output line.

  • Else statement. If the line doesn’t satisfy the condition, gawk still prints it, adding the “==>90s” string instead.

Add a Header

Just as the END rule runs after gawk has read all of the input, the BEGIN rule runs before gawk reads the first line.

gawk always executes BEGIN rules first and only then processes the input lines with the remaining rules. One way to use the BEGIN statement is to add a header to the output.

Execute the following command to add a header line above the gawk output:

gawk 'BEGIN {print "No/First&Last Name/Year of Birth"} {print $0}' people

Find the Longest Line Length

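Combine the length function with the if and END statements to find the length of the longest line in the people file. The max variable starts out uninitialized (treated as 0), grows whenever a longer line appears, and is printed once all lines have been read: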

gawk '{ if (length($0) > max) max = length($0) } END { print max }' people

Find the Number of Fields

The gawk command also allows users to display the number of fields with the NF variable. The simplest way to display the number of fields produces output that is hard to read:

gawk '{print NF}' people

The command outputs the number of fields per line without any additional info. To customize the output and make it more human-readable, adjust the initial command:

gawk '{print NR, "-->", NF}' people

The command now includes:

  • The NR variable that adds line numbers to each output line.

  • The --> string that separates the line numbers from the field counts.

Another way to show line and field numbers in the people file is to print each whole line ($0) followed by NF. Because the people file already includes ordinal numbers in column one, the NR variable is omitted:

gawk '{print $0, "-->", NF}' people

Finally, to print the total number of fields, execute:

gawk '{num_fields = num_fields + NF} END {print num_fields}' people

The people file has ten lines with four columns each, so the command prints 40 (10 × 4), which confirms that the output is correct.

Conclusion

After going through this tutorial, you know how to use gawk for advanced text processing and data manipulation.