As software developers, system administrators, and data scientists, we work with files and manipulate them as a frequent and important part of our day-to-day jobs. Knowing how to change text files quickly and efficiently can save us a considerable amount of time.
In this article I’m going to introduce you to the Awk command, a very powerful text-processing tool that can handle complex text-processing tasks in one or a few lines of code. You may be inclined to use your favorite programming language, such as Python, Java, or C, for these kinds of tasks, but after reading this tutorial you will see that many of them are simpler and quicker to do with Awk.
I’ll try to demonstrate Awk’s usage by providing basic examples of how to solve common text processing tasks with it.
Installation
Awk is available by default on most Unix-like systems. If you don’t have it, you can install GNU Awk (gawk) with the following commands.
For Debian based distributions:
$ sudo apt-get install gawk
For RPM based distributions:
# yum install gawk
If you are using Microsoft Windows, check the GNU manual for installation instructions.
Workflow
Awk’s workflow is simple. It reads a line from the input stream, executes the specified commands on it, and repeats this procedure until the end of the input.
There are also BEGIN and END blocks, which you can use to execute commands before the first line is read and after the last line is processed.
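As a rough sketch of this workflow, the script below feeds three sample lines to a program with all three parts. The "start", "line: " and "end" labels are made up for illustration; only the BEGIN, main and END structure matters.

```shell
#!/bin/sh
# Sketch of the BEGIN / per-line / END workflow on a three-line sample.
result=$(printf '%s\n' 'John 23 Italy' 'David 18 Spain' 'Sarah 21 Germany' |
  awk '
    BEGIN { print "start" }        # runs once, before any input is read
          { print "line: " $0 }    # runs once for every input line
    END   { print "end" }          # runs once, after the last line
  ')
printf '%s\n' "$result"
```

The BEGIN and END actions each run exactly once, while the middle action runs once per input line.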

Let’s jump into it
The basic structure of an Awk command is:
awk [options] 'program' file ...
Let’s show some examples.
Example 1: Just print each line as it is
Consider a text file input.txt with the following content:
John 23 Italy
David 18 Spain
Sarah 21 Germany
Dan 42 Germany
Brian 50 England
Lewis 37 France
Ethan 12 France
Run the command:
$ awk '{print}' input.txt
Awk runs {print} for each line, which simply prints that line, so the output is the same as the input.
Example 2: Print the first two columns
Here we want to print only the name (first word) and age (second word) of each person, separated by a tab.
Command:
$ awk '{print $1 "\t" $2}' input.txt
Output:
John 23
David 18
Sarah 21
Dan 42
Brian 50
Lewis 37
Ethan 12
In the above example, $1 and $2 represent the first and the second fields from each input line. $0 represents the whole line.
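Another way to get tab-separated fields, sketched here on a two-line sample, is to set the built-in output field separator OFS and list the fields with commas. OFS is not used elsewhere in this article; the `result` shell variable is just a name chosen for this sketch.

```shell
#!/bin/sh
# Print the first two fields separated by a tab, using OFS instead of
# concatenating "\t" by hand.
result=$(printf '%s\n' 'John 23 Italy' 'David 18 Spain' |
  awk 'BEGIN { OFS = "\t" } { print $1, $2 }')
printf '%s\n' "$result"
```

A comma between print arguments inserts the value of OFS, so changing the separator later only requires touching the BEGIN block.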
Example 3: Add line numbers at the start of each line
Here we define a variable named count, increment it as each line is read, and print it at the start of the line.
Command:
$ awk -v count=0 '{print ++count " " $0}' input.txt
Note that we can also drop the -v count=0 part. An uninitialized variable is implicitly treated as 0 when used as a number:
$ awk '{print ++count " " $0}' input.txt
Output:
1 John 23 Italy
2 David 18 Spain
3 Sarah 21 Germany
4 Dan 42 Germany
5 Brian 50 England
6 Lewis 37 France
7 Ethan 12 France
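For this particular task a hand-written counter is not strictly needed, since Awk already keeps the current record number in the built-in variable NR. A minimal sketch on a three-line sample:

```shell
#!/bin/sh
# Number the lines with the built-in record counter NR
# instead of a user-defined variable.
result=$(printf '%s\n' 'John 23 Italy' 'David 18 Spain' 'Sarah 21 Germany' |
  awk '{ print NR " " $0 }')
printf '%s\n' "$result"
```

A custom counter is still useful when only some lines should be counted, for example lines that match a condition.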
Example 4: Only print people who are older than 30
Awk supports conditional statements too.
Command:
$ awk '{if ($2 > 30) print $0}' input.txt
Output:
Dan 42 Germany
Brian 50 England
Lewis 37 France
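The same filter can also be written in Awk's pattern form: a program is a list of pattern { action } pairs, and when the action is left out, Awk prints the matching line. The sketch below assumes the same seven-line sample data.

```shell
#!/bin/sh
# Filter with a bare pattern: lines whose second field is greater than 30
# are printed by the default action.
result=$(printf '%s\n' \
    'John 23 Italy' 'David 18 Spain' 'Sarah 21 Germany' 'Dan 42 Germany' \
    'Brian 50 England' 'Lewis 37 France' 'Ethan 12 France' |
  awk '$2 > 30')
printf '%s\n' "$result"
```

An explicit if statement remains the better fit when the condition is only one step inside a longer action.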
Example 5: Generate a report of how many people are from each country
We can achieve this by using associative arrays (Awk’s equivalent of dictionaries) and loops.
Command:
$ awk '{my_dict[$3] += 1} END {for (key in my_dict) {print key, my_dict[key]}}' input.txt
Here we have an associative array named my_dict. For each line, the key is the third word (the country name), and we increase its value by 1. The commands after the END keyword form the END block, which we explained in the Workflow section. In the END block we loop over the array and print its (key, value) pairs.
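Output (the order in which a for-in loop visits the keys is unspecified, so the lines may appear in a different order):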
Spain 1
France 2
Germany 2
Italy 1
England 1
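Because the loop order over an array is not guaranteed, one simple way to get a stable report is to pipe the result through sort. This is a sketch of one option, not the only one; the array name `count` and the use of `LC_ALL=C` are choices made for this example.

```shell
#!/bin/sh
# Count people per country, then sort the report so the output order
# does not depend on how awk iterates over the array.
result=$(printf '%s\n' \
    'John 23 Italy' 'David 18 Spain' 'Sarah 21 Germany' 'Dan 42 Germany' \
    'Brian 50 England' 'Lewis 37 France' 'Ethan 12 France' |
  awk '{ count[$3]++ } END { for (c in count) print c, count[c] }' |
  LC_ALL=C sort)
printf '%s\n' "$result"
```

Sorting outside Awk keeps the program portable across Awk implementations.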
Example 6: Calculate the average age
Command:
$ awk '{age_sum += $2} END {print age_sum/NR}' input.txt
NR is a built-in variable that holds the number of records read so far. In the END block it therefore equals the total number of lines.
Output:
29
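The one-liner above divides by NR, which is 0 when the input is empty. A slightly more defensive sketch checks for that case and uses printf to fix the number of decimals. The "no input" message and the `avg_prog` shell variable are made up for this example.

```shell
#!/bin/sh
# Average of the second column, guarded against empty input
# and formatted with two decimals.
avg_prog='{ sum += $2 }
END {
    if (NR > 0) { printf "%.2f\n", sum / NR }
    else        { print "no input" }
}'

result=$(printf '%s\n' \
    'John 23 Italy' 'David 18 Spain' 'Sarah 21 Germany' 'Dan 42 Germany' \
    'Brian 50 England' 'Lewis 37 France' 'Ethan 12 France' |
  awk "$avg_prog")
empty=$(printf '' | awk "$avg_prog")

printf '%s\n' "$result" "$empty"
```

With the sample data the sum of the ages is 203 over 7 lines, so the first result is 29.00.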
Awk has other built-in variables too, such as NF (the number of fields in the current record); The GNU Awk User’s Guide lists them all.
Example 7: Only print people whose names contain the letter ‘s’
We can use regular expressions with Awk.
Command:
$ awk '$1 ~ /[sS]/ {print $1}' input.txt
Here we require the first word, $1, to match the regular expression [sS].
Output:
Sarah
Lewis
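Listing both cases in a bracket expression gets unwieldy for longer patterns. One portable alternative, sketched below, is to lowercase the field with the built-in tolower() function before matching.

```shell
#!/bin/sh
# Case-insensitive match: lowercase the name first, then test for "s".
result=$(printf '%s\n' \
    'John 23 Italy' 'David 18 Spain' 'Sarah 21 Germany' 'Dan 42 Germany' \
    'Brian 50 England' 'Lewis 37 France' 'Ethan 12 France' |
  awk 'tolower($1) ~ /s/ { print $1 }')
printf '%s\n' "$result"
```

Only the value used in the comparison is lowercased; the printed name keeps its original capitalization.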
Example 8: What if the input file is in another format, like CSV?
You can set the field separator (a single character or a regular expression) using the -F option.
Consider the following file, input.csv:
John,23,Italy
David,18,Spain
Sarah,21,Germany
Dan,42,Germany
Brian,50,England
Lewis,37,France
Ethan,12,France
Command:
$ awk -F "," '{print $1 " " $3}' input.csv
Output:
John Italy
David Spain
Sarah Germany
Dan Germany
Brian England
Lewis France
Ethan France
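Since the separator may be a regular expression, a single -F value can cover more than one delimiter. The sketch below assumes a hypothetical input that mixes commas and semicolons. Note that splitting on a plain separator like this does not understand quoted CSV fields that themselves contain the separator.

```shell
#!/bin/sh
# Split on either a comma or a semicolon by passing a bracket
# expression to -F.
result=$(printf '%s\n' 'John,23,Italy' 'David;18;Spain' |
  awk -F '[,;]' '{ print $1 " " $3 }')
printf '%s\n' "$result"
```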
Example 9: Use a function to add a new column showing whether the person is younger or older than 20
In this example we show how to define and use functions. We also put our code in a file named prog.awk instead of passing it as a command-line argument, and write the output to a file named output.txt.
prog.awk:
# Returns "younger" if age is below 20, otherwise "older"
function age_func(age) {
    if (age < 20) {
        return "younger"
    }
    return "older"
}

# Append the label to every input line
{ print $0 " " age_func($2) }
Command:
$ awk -f prog.awk input.txt > output.txt
output.txt:
John 23 Italy older
David 18 Spain younger
Sarah 21 Germany older
Dan 42 Germany older
Brian 50 England older
Lewis 37 France older
Ethan 12 France younger
Awk also has some built-in functions, such as length(), substr(), and toupper(), which are documented in The GNU Awk User’s Guide.
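To give a feel for the built-in functions, here is a small sketch that uses three string functions on a two-line sample: toupper(), length(), and substr(). The choice of columns is arbitrary.

```shell
#!/bin/sh
# Uppercase the name, print its length, and abbreviate the country
# to its first three characters.
result=$(printf '%s\n' 'John 23 Italy' 'David 18 Spain' |
  awk '{ print toupper($1), length($1), substr($3, 1, 3) }')
printf '%s\n' "$result"
```

Note that substr() counts characters starting from 1, not 0.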
Summary
In this article we covered the Awk command’s workflow and, through several examples, saw that Awk is a powerful and flexible text-processing tool that can be used in many scenarios. For more detailed instructions, read The GNU Awk User’s Guide.
Thank you and happy coding!