Hurry! You've gotten your hands on a large file of leaked records and want to determine if any of your friends are affected:
first_name last_name phone email zip random
Lars Hurley (784)764-9965 [email protected] 56662 1
Solomon Guerra (431)799-5443 [email protected] 89120 4
Stone Grant (322)526-5155 [email protected] 53121 6
Isabelle Massey (731)671-8236 [email protected] 77743 1
Geraldine Cooke (865)364-9487 [email protected] 58123-1293 6
Neve Nicholson (430)324-7527 [email protected] 53124 1
Fredericka Myers (247)982-0158 [email protected] 68376-7256 9
Leo Castaneda (771)652-6444 [email protected] 20158 8
Kane Guerrero (299)545-4314 [email protected] 63028 2
Quinn Forbes (463)614-4569 [email protected] 73599 4
Have no fear, for awk is here! The text-processing utility is great for filtering and extracting data from tabular formats, so here's the cheat sheet:
Extract fields
One field:
awk '{ print $2 }' table.txt
last_name
Hurley
Guerra
Grant
Massey
Cooke
Nicholson
Myers
Castaneda
Guerrero
Forbes
Multiple fields:
awk '{ print $2, $3 }' table.txt
last_name phone
Hurley (784)764-9965
Guerra (431)799-5443
Grant (322)526-5155
Massey (731)671-8236
Cooke (865)364-9487
Nicholson (430)324-7527
Myers (247)982-0158
Castaneda (771)652-6444
Guerrero (299)545-4314
Forbes (463)614-4569
Pretty-printed fields (by piping the awk output to column):
awk '{ print $2, $3 }' table.txt | column -t
last_name  phone
Hurley     (784)764-9965
Guerra     (431)799-5443
Grant      (322)526-5155
Massey     (731)671-8236
Cooke      (865)364-9487
Nicholson  (430)324-7527
Myers      (247)982-0158
Castaneda  (771)652-6444
Guerrero   (299)545-4314
Forbes     (463)614-4569
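The outputs above all include the header row, because awk treats it like any other record. A minimal sketch of skipping it with the built-in record counter NR follows. The sample data is inlined in a shell variable so the snippet runs on its own, and the example.com addresses are placeholders, since the emails in the listing above are redacted.

```shell
# Inline sample in the same layout as table.txt (emails are placeholders).
data='first_name last_name phone email zip random
Lars Hurley (784)764-9965 lars@example.com 56662 1
Solomon Guerra (431)799-5443 solomon@example.com 89120 4
Stone Grant (322)526-5155 stone@example.com 53121 6'

# NR is the number of the current record; NR > 1 skips the header line.
no_header=$(printf '%s\n' "$data" | awk 'NR > 1 { print $2, $3 }')
printf '%s\n' "$no_header"
# Hurley (784)764-9965
# Guerra (431)799-5443
# Grant (322)526-5155
```

Against the real file, the equivalent would be awk 'NR > 1 { print $2, $3 }' table.txt.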
Filter records
The content between the forward slashes is a regex, and awk applies it as a filter to the entire record:
awk '/89120/ { print $2, $5 }' table.txt
Guerra 89120
Note that the regex applies to all fields in each record even when those fields aren't extracted:
awk '/89120/ { print $2 }' table.txt
Guerra
Filter records by one field
As demonstrated in the previous section, a bare regex is tested against the whole record, so this one finds every record that has a 4 anywhere in it:
awk '/4/ { print $3, $5, $6 }' table.txt | column -t
(784)764-9965  56662       1
(431)799-5443  89120       4
(731)671-8236  77743       1
(865)364-9487  58123-1293  6
(430)324-7527  53124       1
(247)982-0158  68376-7256  9
(771)652-6444  20158       8
(299)545-4314  63028       2
(463)614-4569  73599       4
On the other hand, this filter finds only the records whose 6th field equals 4:
awk '$6==4 { print $3, $5, $6 }' table.txt | column -t
(431)799-5443  89120  4
(463)614-4569  73599  4
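A single field can also be tested against a regex rather than an exact value, using awk's ~ match operator. As a small sketch, the filter below keeps records whose zip (field 5) starts with 531. The rows come from the table above, with placeholder example.com emails standing in for the redacted ones.

```shell
# Inline sample in the same layout as table.txt (emails are placeholders).
data='first_name last_name phone email zip random
Lars Hurley (784)764-9965 lars@example.com 56662 1
Stone Grant (322)526-5155 stone@example.com 53121 6
Geraldine Cooke (865)364-9487 geraldine@example.com 58123-1293 6
Neve Nicholson (430)324-7527 neve@example.com 53124 1'

# $5 ~ /^531/ applies the regex to field 5 only; ^ anchors it to the field start.
nearby=$(printf '%s\n' "$data" | awk '$5 ~ /^531/ { print $1, $2, $5 }')
printf '%s\n' "$nearby"
# Stone Grant 53121
# Neve Nicholson 53124
```

The negated form, $5 !~ /^531/, would keep every other record instead, including the header.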
Such a simple tool, and yet so useful. I'm feeling inspired and might put together similar cheat sheets for ls, sed, grep, and other common utilities!