Linux

awk

Pattern-directed scanning and processing language for text manipulation.

#linux #text-processing #scripting

Basic Usage

Print entire file.

awk '{ print }' [file]

Or:

awk '{ print $0 }' [file]

Field Separators

Default (whitespace)

awk '{ print $1, $2 }' [file]

Tab Delimiter

awk -F'\t' '{ print $1, $2 }' [file]

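Comma Delimiter (CSV)

Splits on every comma, so this is only correct for simple CSV without quoted fields that contain commas:

awk -F',' '{ print $1, $2 }' [file]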

Custom Delimiter

awk -F':' '{ print $1, $3 }' /etc/passwd
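As a quick check of the separator handling, the sketch below feeds one made-up passwd-style line (not read from a real system) through the same command. Passing -F':' on the command line should behave the same as setting FS in a BEGIN block.

```shell
# Hypothetical record shaped like an /etc/passwd entry.
line='alice:x:1001:1001:Alice:/home/alice:/bin/sh'

# Separator given with -F on the command line.
with_flag=$(printf '%s\n' "$line" | awk -F':' '{ print $1, $3 }')

# Same separator set through the FS variable before any input is read.
with_fs=$(printf '%s\n' "$line" | awk 'BEGIN { FS=":" } { print $1, $3 }')

echo "$with_flag"   # alice 1001
echo "$with_fs"     # alice 1001
```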

Print Columns

Specific Columns

First column:

awk '{ print $1 }' [file]

First and third columns:

awk '{ print $1, $3 }' [file]

Last Column

awk '{ print $NF }' [file]

Second to Last

awk '{ print $(NF-1) }' [file]

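All But First Column

Blank the first field. Assigning to a field rebuilds the line with the output separator, so the result starts with a leading space and the original spacing between fields is not preserved:

awk '{ $1=""; print $0 }' [file]

Or loop over the remaining fields (leaves a trailing space):

awk '{ for(i=2; i<=NF; i++) printf "%s ", $i; print "" }' [file]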
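A minimal sketch on a made-up line shows the leading space left by blanking $1, and one way to strip it with sub(). The brackets in the output are only there to make the whitespace visible.

```shell
input='a b c'

# Blanking $1 rebuilds the record as "" OFS "b" OFS "c", so a leading space remains.
blanked=$(printf '%s\n' "$input" | awk '{ $1=""; print $0 }')

# Removing that single leading separator gives a clean result.
trimmed=$(printf '%s\n' "$input" | awk '{ $1=""; sub(/^ /, ""); print }')

printf '[%s]\n' "$blanked"   # [ b c]
printf '[%s]\n' "$trimmed"   # [b c]
```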

Pattern Matching

Match Specific Value

Lines where first column equals "value":

awk '$1 == "value"' [file]

Lines where first column does not equal "value":

awk '$1 != "value"' [file]

Match Pattern

Lines containing "pattern":

awk '/pattern/' [file]

Lines NOT containing "pattern":

awk '!/pattern/' [file]

Regular Expression Match

awk '$1 ~ /^[0-9]+$/' [file]

Not matching:

awk '$1 !~ /^[0-9]+$/' [file]
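To illustrate the two operators, this sketch runs both patterns over three made-up lines. The anchors ^ and $ make the pattern match only when the whole first field consists of digits.

```shell
# Made-up input: only the first and third lines have an all-digit first field.
input='123 a
x12 b
45 c'

numeric=$(printf '%s\n' "$input" | awk '$1 ~ /^[0-9]+$/')
other=$(printf '%s\n' "$input" | awk '$1 !~ /^[0-9]+$/')

echo "$numeric"   # 123 a, then 45 c
echo "$other"     # x12 b
```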

Conditional Operations

Greater Than / Less Than

awk '$3 > 100' [file]
awk '$2 <= 50' [file]

Multiple Conditions (AND)

awk '$1 == "value" && $2 > 100' [file]

Multiple Conditions (OR)

awk '$1 == "value" || $2 > 100' [file]
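One caveat worth checking: in POSIX awk a field that does not look like a number is compared as a string, so a header row can slip through a numeric filter. The sketch below uses made-up data and assumes that adding 0 to the field is an acceptable way to force a numeric comparison.

```shell
# Made-up table with a header row.
input='name qty
bolt 250
nut 99'

# The header field "qty" is not numeric, so it is compared as a string
# and may pass the filter along with the real match.
plain=$(printf '%s\n' "$input" | awk '$2 > 100')

# Adding 0 converts the field to a number ("qty" becomes 0).
forced=$(printf '%s\n' "$input" | awk '$2 + 0 > 100')

echo "$plain"
echo "$forced"   # bolt 250
```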

Built-in Variables

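Variable    Description
NR          Current record (line) number, counted across all input files
NF          Number of fields in current record
FS          Field separator (default: a single space, which splits on runs of spaces and tabs)
OFS         Output field separator (default: space)
RS          Record separator (default: newline)
ORS         Output record separator (default: newline)
FILENAME    Current input file name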

Examples

Print line numbers:

awk '{ print NR, $0 }' [file]

Print number of fields per line:

awk '{ print NF }' [file]
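The variables can be combined. A small sketch on made-up colon-separated input prints the record number, field count, and first field, joined by a custom OFS:

```shell
result=$(printf 'a:b:c\nd:e\n' |
  awk 'BEGIN { FS=":"; OFS="-" } { print NR, NF, $1 }')

echo "$result"
# 1-3-a
# 2-2-d
```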

BEGIN and END

BEGIN Block

Executed before processing any lines:

awk 'BEGIN { print "Header" } { print }' [file]

END Block

Executed after processing all lines:

awk '{ sum += $1 } END { print sum }' [file]
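Both blocks can appear in one program around a main rule. A minimal sketch on three made-up numbers, showing when each part runs:

```shell
result=$(printf '3\n4\n5\n' | awk '
  BEGIN { print "values:" }          # runs once, before the first line
  { print "  " $1; sum += $1 }       # runs for every input line
  END { print "total:", sum }        # runs once, after the last line
')

echo "$result"
# values:
#   3
#   4
#   5
# total: 12
```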

Arithmetic Operations

Sum Column

awk '{ sum += $2 } END { print sum }' [file]

Average

Guarded so that empty input does not cause a division by zero:

awk '{ sum += $1; count++ } END { if (count > 0) print sum / count }' [file]

Count Lines

awk 'END { print NR }' [file]

Output Formatting

Custom Output Field Separator

awk 'BEGIN { OFS="|" } { print $1, $2, $3 }' [file]

Printf Formatting

awk '{ printf "%-10s %5d\n", $1, $2 }' [file]
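To make the padding visible, this sketch uses the same two conversions on made-up data, with a | between them (the | is an addition for illustration only). %-10s left-aligns the string in 10 columns and %5d right-aligns the integer in 5.

```shell
result=$(printf 'apple 7\nkiwi 42\n' | awk '{ printf "%-10s|%5d\n", $1, $2 }')

echo "$result"
# apple     |    7
# kiwi      |   42
```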

Advanced Examples

Remove Duplicate Lines

awk '!seen[$0]++' [file]
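The one-liner works because seen[$0] is 0 (false) the first time a line appears, so the negation is true and the line is printed; the post-increment then marks the line as seen. A sketch on made-up input, next to a longer form that is assumed to be equivalent:

```shell
input='a
b
a
c
b'

# Compact form: the pattern alone decides whether the line is printed.
short=$(printf '%s\n' "$input" | awk '!seen[$0]++')

# Spelled-out form: print on first sight, then count the occurrence.
long=$(printf '%s\n' "$input" | awk '{ if (seen[$0] == 0) print; seen[$0]++ }')

echo "$short"   # a, b, c on separate lines
```

First occurrences are kept in their original order, and the input does not need to be sorted.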

Print Lines Longer Than 80 Characters

awk 'length > 80' [file]

Print Specific Line Range

Lines 10 to 20:

awk 'NR>=10 && NR<=20' [file]

Calculate Column Sum by Group

Sum of column 2 for each distinct value of column 1 (output order is unspecified):

awk '{ sum[$1] += $2 } END { for (key in sum) print key, sum[key] }' [file]
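Because a for-in loop visits array keys in no guaranteed order, piping through sort is one way to get stable output. A sketch on made-up data:

```shell
result=$(printf 'x 1\ny 2\nx 3\n' |
  awk '{ sum[$1] += $2 } END { for (key in sum) print key, sum[key] }' |
  LC_ALL=C sort)

echo "$result"
# x 4
# y 2
```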

Print Every Nth Line

Every 5th line:

awk 'NR % 5 == 0' [file]

Working with Multiple Files

Process Multiple Files

awk '{ print FILENAME, $0 }' file1.txt file2.txt

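Join Files by Column

file1.txt is loaded into an array keyed on column 1 (NR==FNR is true only while the first file is being read); each line of file2.txt is then printed with the matching value appended:

awk 'NR==FNR { a[$1]=$2; next } { print $0, a[$1] }' file1.txt file2.txt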
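A hedged sketch of the join on two small temporary files (names.txt and scores.txt are made-up names and data). A key with no match in the first file gets an empty value appended, so the last output line ends with a trailing space.

```shell
# Build two throwaway input files in a temporary directory.
tmpdir=$(mktemp -d)
printf '1 alice\n2 bob\n' > "$tmpdir/names.txt"
printf '1 90\n2 75\n3 60\n' > "$tmpdir/scores.txt"

# First file fills the lookup array; second file is printed with the lookup result.
result=$(awk 'NR==FNR { a[$1]=$2; next } { print $0, a[$1] }' \
  "$tmpdir/names.txt" "$tmpdir/scores.txt")

rm -rf "$tmpdir"

echo "$result"
# 1 90 alice
# 2 75 bob
# 3 60
```

To print only lines whose key exists in the first file, the second rule can be guarded with ($1 in a).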

Common Use Cases

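Extract Email Addresses

Print the first email-like string on each line (a loose pattern, not a full address validator):

awk 'match($0, /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+/) { print substr($0, RSTART, RLENGTH) }' [file]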

Count Word Frequency

awk '{ for(i=1;i<=NF;i++) freq[$i]++ } END { for(word in freq) print word, freq[word] }' [file]

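Convert CSV to TSV

Works for simple CSV without quoted fields; $1=$1 forces the line to be rebuilt with the new separator:

awk -F',' 'BEGIN { OFS="\t" } { $1=$1; print }' input.csv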
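To see why the assignment matters, this sketch runs the conversion with and without it on one made-up line. Without a field assignment, print emits the original line unchanged and OFS is never applied.

```shell
# With $1=$1 the record is rebuilt using OFS (a tab).
with=$(printf 'a,b,c\n' | awk -F',' 'BEGIN { OFS="\t" } { $1=$1; print }')

# Without it, $0 is printed exactly as it was read.
without=$(printf 'a,b,c\n' | awk -F',' 'BEGIN { OFS="\t" } { print }')

printf '%s\n' "$with"      # a, b, c separated by tabs
printf '%s\n' "$without"   # a,b,c
```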