Posted on September 13, 2023 by nexonhost
How To Use the comm Command
The following guide will explain how to use the Linux comm command with examples.
About comm command:
The comm command is a simple Linux utility for comparing files, with a focus on the lines they share. It compares two sorted files line by line and displays the results in three columns: lines unique to the first file, lines unique to the second file, and lines common to both.
comm Command Syntax
The comm command takes two arguments naming the files to compare. Options placed before the file names customize the output.
The basic comm syntax is:
comm [file_name_1] [file_name_2]
comm [options] [file_name_1] [file_name_2]
comm Command Options
Using comm is simple, and adding options provides further customization.
The table below includes all comm options:
Option | Description |
---|---|
-1 | Hides the first column (lines unique to the first file). |
-2 | Hides the second column (lines unique to the second file). |
-3 | Hides the third column (lines common to both files). |
--check-order | Checks that both files are sorted and stops with an error if they are not. |
--nocheck-order | Prints the result without checking whether the files are sorted. |
--output-delimiter=[string] | Replaces the default tab between columns with another string. |
--total | Prints a summary line with the number of lines in each column. |
-z | Uses the NUL character instead of a newline as the line delimiter. |
--help | Shows help information. |
--version | Displays version information. |
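As a minimal sketch of the -z option, assuming GNU coreutils, the example below builds two NUL-delimited lists, compares them, and converts the result back to newlines for display. The file names list_1.nul and list_2.nul and the sample words are hypothetical.

```shell
# Assumption: GNU comm, which supports -z.
export LC_ALL=C                      # same byte-order collation for every tool
workdir=$(mktemp -d)                 # throwaway directory
cd "$workdir" || exit 1

# Build NUL-delimited input by swapping each newline for a NUL byte.
printf '%s\n' art bee dog | tr '\n' '\0' > list_1.nul
printf '%s\n' art cat dog | tr '\n' '\0' > list_2.nul

# -z treats NUL as the line delimiter; -12 keeps only the common lines.
# tr turns the NULs back into newlines so the result is readable.
common=$(comm -z -12 list_1.nul list_2.nul | tr '\0' '\n')
printf '%s\n' "$common"
```

NUL-delimited lists are mainly useful when the items themselves may contain newlines, as file names can.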
Linux comm Examples
The comm command works with two sorted files. To understand how comm works, set up a test environment:
Create two test files.
Name the files (for example, File_1 and File_2).
Add different words or numbers to each file.
Make sure some content overlaps.
Note: Linux offers numerous ways to create files from the terminal. The easiest way is to use the touch command.
Use the cat command to display the File_1 and File_2 content. The output shows that files overlap in three words (art, dog, and way).
The following sections use File_1 and File_2 to explain how comm works.
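The setup above can be sketched as follows. Only the shared words art, dog, and way come from this guide; the other words (bee, cat, sun, zoo) and the temporary directory are assumptions chosen for illustration.

```shell
# Work in a throwaway directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One word per line, already in sorted order.
# art, dog, and way are shared; the remaining words are made up.
printf '%s\n' art bee dog sun way > File_1
printf '%s\n' art cat dog way zoo > File_2

# Display the content of both files.
cat File_1
cat File_2
```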
Compare Two Files
Compare two sorted files line by line with:
comm File_1 File_2
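To make the three-column layout concrete, here is a small sketch that rebuilds two sample files and captures the comparison. Apart from art, dog, and way, the file contents are hypothetical, and LC_ALL=C is set only so that the result is reproducible.

```shell
export LC_ALL=C                      # reproducible, byte-order collation
workdir=$(mktemp -d)                 # throwaway directory
cd "$workdir" || exit 1

printf '%s\n' art bee dog sun way > File_1
printf '%s\n' art cat dog way zoo > File_2

# Capture and print the full three-column comparison.
result=$(comm File_1 File_2)
printf '%s\n' "$result"

# How to read the output (columns are indented with tab characters):
#   no tab   -> line only in File_1   (bee, sun)
#   one tab  -> line only in File_2   (cat, zoo)
#   two tabs -> line in both files    (art, dog, way)
```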
Hide Columns
Use the options -1, -2, and -3, alone or combined, to hide particular columns. For instance, print only the lines common to both files with:
comm -12 File_1 File_2
Using -12 with comm hides the first and second columns, leaving only the one containing lines shared by both files.
On the other hand, -3 hides column three and displays lines unique to each file:
comm -3 File_1 File_2
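The option combinations can be sketched as a small script that stores each column separately. The sample words other than art, dog, and way, and the variable names, are hypothetical.

```shell
export LC_ALL=C                      # reproducible, byte-order collation
workdir=$(mktemp -d)                 # throwaway directory
cd "$workdir" || exit 1

printf '%s\n' art bee dog sun way > File_1
printf '%s\n' art cat dog way zoo > File_2

only_first=$(comm -23 File_1 File_2)    # hide columns 2 and 3: unique to File_1
only_second=$(comm -13 File_1 File_2)   # hide columns 1 and 3: unique to File_2
in_both=$(comm -12 File_1 File_2)       # hide columns 1 and 2: common lines

# Print each set on one line for easier reading.
printf 'Only in File_1: %s\n' "$(printf '%s' "$only_first" | tr '\n' ' ')"
printf 'Only in File_2: %s\n' "$(printf '%s' "$only_second" | tr '\n' ' ')"
printf 'In both files:  %s\n' "$(printf '%s' "$in_both" | tr '\n' ' ')"
```

Hiding two columns at once leaves a single unindented column, which is convenient to pipe into other commands.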
Ignore Case
Comparison with comm is case sensitive. All words in File_1 and File_2 are lowercase, so comm recognizes art, dog, and way as common for both files.
If the word is written as Art in one file and art in the other, comm registers it as unique to each file. While comm does not accept -i as an option to ignore case, the tr command provides a workaround.
Use tr on the two files to convert case and then redirect the output to temporary files (Temp_1, Temp_2):
tr A-Z a-z <File_1 > Temp_1
tr A-Z a-z <File_2 > Temp_2
The tr command converts the content of both files to lowercase and saves the output in new files (Temp_1, Temp_2). The original files remain unchanged.
Run comm on Temp_1 and Temp_2 to compare files while “ignoring” case:
comm Temp_1 Temp_2
The command outputs lowercase art as the common word for both files.
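One caveat worth sketching: converting case can change the sort order, so it is safer to sort the converted text again before comparing. The example below assumes the C locale, where uppercase letters sort before lowercase ones. The word Zoo and the file contents are hypothetical.

```shell
export LC_ALL=C                      # uppercase sorts before lowercase here
workdir=$(mktemp -d)                 # throwaway directory
cd "$workdir" || exit 1

# Both files are sorted in the C locale, but they differ in case.
printf '%s\n' Art Zoo dog > File_1
printf '%s\n' art dog way > File_2

# Case-sensitive comparison: only dog is reported as common.
common_exact=$(comm -12 File_1 File_2)

# Lowercase each file, then sort again, because "zoo" now belongs after "dog".
tr 'A-Z' 'a-z' < File_1 | sort > Temp_1
tr 'A-Z' 'a-z' < File_2 | sort > Temp_2

# Comparison that "ignores" case: art and dog are reported as common.
common_folded=$(comm -12 Temp_1 Temp_2)
printf '%s\n' "$common_folded"
```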
Compare Unsorted Files
The comm command only produces useful output when both input files are sorted. With unsorted files, comm still pairs some lines, but the output is incomplete and unreliable, and comm prints an error message stating that the input is not in sorted order.
To verify that the files are not sorted, use comm --check-order:
comm --check-order Not_Sorted_File_1 Not_Sorted_File_2
The --check-order option prints the error message and stops comm from comparing files at the first unsorted item.
To force comm to print an output and hide the error message, use --nocheck-order:
comm --nocheck-order Not_Sorted_File_1 Not_Sorted_File_2
The output is not reliable. For instance, the word art is present in columns one and two, even though it’s common for both files.
The reliable way to use comm with unsorted files is to sort them first. In Bash, process substitution sorts both files on the fly:
comm <(sort Not_Sorted_File_1) <(sort Not_Sorted_File_2)
The output shows that the words art, dog, and way are common for both files.
Note that sort without the -o option only writes to standard output and does not change the source files. Sorting on the fly also works together with other comm options.
For example, to print only words common for both unsorted files, use:
comm -12 <(sort Not_Sorted_File_1) <(sort Not_Sorted_File_2)
To sort source files and then execute comm, use sort -o. The -o argument saves the sorted output to a specific file.
For example, sort Not_Sorted_File_1 and save the output to that same file with:
sort -o Not_Sorted_File_1 Not_Sorted_File_1
Running cat after sorting the file shows that the file is sorted now. Repeat the same process for the second file:
sort -o Not_Sorted_File_2 Not_Sorted_File_2
Run comm to compare files:
comm Not_Sorted_File_1 Not_Sorted_File_2
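The in-place approach can be sketched as a short script that sorts a file only when needed. It relies on sort -c, which exits with a non-zero status when a file is out of order. The file contents here are hypothetical.

```shell
export LC_ALL=C                      # sort and comm must agree on the ordering
workdir=$(mktemp -d)                 # throwaway directory
cd "$workdir" || exit 1

# Two deliberately unsorted sample files.
printf '%s\n' way art dog bee > Not_Sorted_File_1
printf '%s\n' dog zoo art way > Not_Sorted_File_2

for f in Not_Sorted_File_1 Not_Sorted_File_2; do
    # sort -c only checks the order; its diagnostic message is discarded.
    if ! sort -c "$f" 2>/dev/null; then
        sort -o "$f" "$f"            # overwrite the file with its sorted content
    fi
done

# Both files are sorted now, so comm pairs the lines correctly.
common=$(comm -12 Not_Sorted_File_1 Not_Sorted_File_2)
printf '%s\n' "$common"
```

Because sort -o overwrites the original file, keep a copy first if the original order matters.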
Compare Directories
Use comm with ls to compare file names in two directories. For example, compare Directory1 and Directory2:
comm <(ls Directory1) <(ls Directory2)
The first column represents file names unique to Directory1, the second those unique to Directory2, and the third one represents file names common for both folders.
When running comm with ls, the command only looks at file names, not the content. Files listed as common to both directories can still differ in content even though they share a name.
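As a rough sketch of that caveat, the script below compares the names in two directories and then uses cmp to check whether the files with matching names also match in content. The directory contents and the variable names are hypothetical, and the loop assumes file names without whitespace.

```shell
export LC_ALL=C                      # ls and comm must agree on the ordering
workdir=$(mktemp -d)                 # throwaway directory
cd "$workdir" || exit 1

mkdir Directory1 Directory2
printf 'only here\n' > Directory1/a.txt
printf 'same\n'      > Directory1/b.txt
printf 'same\n'      > Directory2/b.txt
printf 'old\n'       > Directory1/c.txt
printf 'new\n'       > Directory2/c.txt
printf 'only here\n' > Directory2/d.txt

# Save the sorted name lists, then compare them.
ls Directory1 > names_1
ls Directory2 > names_2
only_in_1=$(comm -23 names_1 names_2)
only_in_2=$(comm -13 names_1 names_2)
in_both=$(comm -12 names_1 names_2)

# Same name does not mean same content: cmp -s compares silently, byte by byte.
differing=""
for name in $in_both; do
    if ! cmp -s "Directory1/$name" "Directory2/$name"; then
        differing="$differing$name "
    fi
done
printf 'Same name, different content: %s\n' "$differing"
```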
Use comm with STDIN
To compare a file with standard input, use a hyphen in place of one of the file arguments. When typing the input interactively, press Ctrl+D to finish.
For example, compare File_1 with the standard input using:
comm File_1 -
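The hyphen is most useful at the end of a pipeline. A minimal sketch, assuming a hypothetical unsorted word list that is sorted on the way in:

```shell
export LC_ALL=C                      # sort and comm must agree on the ordering
workdir=$(mktemp -d)                 # throwaway directory
cd "$workdir" || exit 1

printf '%s\n' art bee dog sun way > File_1

# The sorted pipeline output takes the place of the second file.
common=$(printf '%s\n' way dog art | sort | comm -12 File_1 -)
printf '%s\n' "$common"
```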
Change the Default Separator
The comm output separates columns with tabs by default. To change the separator, use the --output-delimiter option.
For example, to use * instead of tabs, run (the quotes keep the shell from expanding the asterisk):
comm --output-delimiter='*' File_1 File_2
The output shows that words in File_1 have no asterisk, those in File_2 have one asterisk, and items common for both files have two asterisks.
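The following sketch shows the resulting output for two small sample files, assuming GNU comm. Only art, dog, and way come from this guide; the other words are made up. The asterisk is quoted so that the shell passes it through literally.

```shell
export LC_ALL=C                      # reproducible, byte-order collation
workdir=$(mktemp -d)                 # throwaway directory
cd "$workdir" || exit 1

printf '%s\n' art bee dog sun way > File_1
printf '%s\n' art cat dog way zoo > File_2

result=$(comm --output-delimiter='*' File_1 File_2)
printf '%s\n' "$result"

# Output:
#   **art
#   bee
#   *cat
#   **dog
#   sun
#   **way
#   *zoo
```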
Show Line Counts
Show the total number of lines in each column with the --total option:
comm --total File_1 File_2
The output ends with a summary line that lists the number of lines in each column, followed by the word total.
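When only the counts matter, --total can be combined with -123 to hide all three columns and keep just the summary line. This sketch assumes GNU comm with --total support; the sample words other than art, dog, and way, and the variable names, are hypothetical.

```shell
export LC_ALL=C                      # also keeps the word "total" untranslated
workdir=$(mktemp -d)                 # throwaway directory
cd "$workdir" || exit 1

printf '%s\n' art bee dog sun way > File_1
printf '%s\n' art cat dog way zoo > File_2

# -123 hides every column, so only the tab-separated summary line remains.
summary=$(comm --total -123 File_1 File_2)
printf '%s\n' "$summary"

# Split the summary on whitespace into the positional parameters.
set -- $summary
only_first=$1                        # lines unique to File_1
only_second=$2                       # lines unique to File_2
in_both=$3                           # lines common to both files
printf 'unique to File_1: %s, unique to File_2: %s, common: %s\n' \
    "$only_first" "$only_second" "$in_both"
```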
Conclusion
After following the steps from this tutorial, you know how to compare files line by line with the comm command.