Tricks concerning awk, sed, csplit and other tools that work with unstructured text.
Awk
A data-driven programming language 💓. Reference: Gawk: Effective AWK Programming (the gawk manual).
Misc
Some “aha, I forgot!” awk snippets.
# Print without a trailing newline: use printf, e.g.:
{sum+=$4}; END {if (NR) printf "%f",sum/NR} # the NR check avoids dividing by zero on empty input
# Split $NF by ":" into array a (arrays indexed from 1):
{split($NF,a,":"); print a[1]}
# Skip to the next input line while processing
/pattern/{
# Do something
next # skips all remaining blocks, including the final "catch all" block
}
/next_pattern/{next}
{# catch all block
}
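To see how `next` changes control flow, here is a minimal sketch run against a made-up three-line input. The `classify` function name and the sample lines are illustrative assumptions, not part of the snippets above.

```shell
# classify: label each input line; "next" stops processing of the current
# line, so later blocks (including the catch-all) never see it.
classify() {
  awk '
    /^#/    { next }                     # comment line: drop it silently
    /error/ { print "ERR " $0; next }    # handled here, catch-all is skipped
            { print "OK " $0 }           # catch-all block
  '
}

printf '%s\n' '# header' 'disk error' 'all fine' | classify
# ERR disk error
# OK all fine
```

Without the `next` in the `/error/` block, the matching line would be printed twice: once as `ERR` and once more by the catch-all.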
Uniq via awk
AWK can be used as a quicker alternative to | sort | uniq. It doesn't have to
sort anything because it uses a hash table, and it keeps the first occurrence of
each line in its original order. It has to store every distinct line in memory though.
If you really need speed, my best choice is quniq.c
awk '!visited[$0]++' your_file > deduplicated_file
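A short sketch of why the one-liner works and how its output differs from `sort | uniq`. The `dedup` wrapper name and the sample input are assumptions made for illustration.

```shell
# dedup: print each distinct line once, keeping first-seen order.
# seen[$0]++ evaluates to 0 (false) the first time a line appears and
# to a positive number afterwards, so !seen[$0]++ is true only once per line.
dedup() {
  awk '!seen[$0]++' "$@"
}

printf '%s\n' b a b c a | dedup          # prints b, a, c (input order kept)
printf '%s\n' b a b c a | sort | uniq    # prints a, b, c (sorted order)
```

If the downstream step expects sorted output, the awk version is not a drop-in replacement; it only removes the repeats.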
Sed, tail, head
# sed - remove 1st line
sed '1d' xx01
# Tail - omit first line
# "start passing through on the second line of output".
cat file | tail -n +2
# Head - omit last line (negative counts are not in POSIX; GNU head supports them)
cat /etc/passwd | head -n -1
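Since `head -n -1` is not available in every `head` implementation, a sed-only sketch may be handy. The function names `drop_first` and `drop_last` are made up here; the sed commands themselves are standard.

```shell
# drop_first: delete line 1 (same effect as tail -n +2).
drop_first() { sed '1d' "$@"; }

# drop_last: delete the last line; "$" is the sed address of the last line
# (same effect as GNU head -n -1).
drop_last() { sed '$d' "$@"; }

printf '%s\n' one two three | drop_first   # prints two, three
printf '%s\n' one two three | drop_last    # prints one, two
```

Both functions accept file names as arguments or read standard input, so the `cat file |` prefix is optional.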
csplit
Split a file into multiple files. Every resulting file after the first will have the split string as its first line. Filenames are generated automatically (xx00, xx01, ...).
# splitting on '</doc>'
csplit <filename> '/</doc>/' '{*}'
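A worked sketch of the same split on a tiny made-up file, assuming GNU csplit (the `{*}` repeat count is a GNU feature). The temporary directory, the `sample.xml` name and its contents are illustrative assumptions.

```shell
# Work in a throwaway directory so the generated pieces do not clutter anything.
workdir=$(mktemp -d)

# Hypothetical sample: two documents, each closed by </doc>.
printf '%s\n' '<doc>' 'first' '</doc>' '<doc>' 'second' '</doc>' > "$workdir/sample.xml"

# -s silences the per-piece byte counts; -f sets the output prefix
# (here the default "xx", just placed inside the temporary directory).
csplit -s -f "$workdir/xx" "$workdir/sample.xml" '/</doc>/' '{*}'

ls "$workdir"                 # sample.xml xx00 xx01 xx02
head -n 1 "$workdir/xx01"     # </doc>
```

The first piece (`xx00`) holds whatever precedes the first match, so it is the only one that does not start with `</doc>`.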
base64
Generate random ASCII text.
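base64 /dev/urandom # endless stream of base64 lines
# alphanumeric only: "+" and "/" are replaced with "a" (any line breaks base64 emits remain)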
base64 /dev/urandom | sed 's/[+/]/a/g' | head -c 1024
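A variant sketch that deletes the non-alphanumeric characters instead of mapping them to `a`, and strips the line breaks as well, so the output is exactly N alphanumeric characters. The `rand_alnum` name is an assumption; like the snippet above, it assumes a `base64` that accepts a file argument and a `head` that supports `-c`.

```shell
# rand_alnum N: print N random characters from A-Z, a-z, 0-9.
# tr -d removes "+", "/" and newlines from the base64 stream;
# head -c stops the otherwise endless pipeline after N bytes.
rand_alnum() {
  base64 /dev/urandom | tr -d '+/\n' | head -c "$1"
}

rand_alnum 32; echo    # 32 random alphanumeric characters, then a newline
```

Deleting rather than substituting avoids over-representing the letter `a` in the output.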