Processing a list of values in the terminal
Assume we have a file, data, containing a list of values, one per line:
cat data
1
2
3
...
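To follow along, you can create a small sample file as shown below. The values are only an example. The outputs later in this post (min 1, mean 2, max 3) are consistent with a file containing exactly 1, 2 and 3.

```shell
# Write one value per line into a file named "data" (example values only)
printf '%s\n' 1 2 3 > data

# Show the file contents
cat data
```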
A useful tool for this is GNU datamash, which computes summary statistics on its input.
For example, assuming we wish to compute the mean of the list above, we could run:
datamash --header-out mean 1 < data
mean(field-1)
2
In this example, "mean 1" asks datamash for the mean of the first field (column), and --header-out adds the header line, mean(field-1), to the output.
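As a sanity check, the mean is simply the sum of the values divided by their count. The sketch below computes the same thing with awk instead of datamash; the function name mean_col1 is hypothetical and only for illustration.

```shell
# Hypothetical helper (not part of datamash): mean of the first field, using awk.
# Reads values on stdin, one per line, and prints nothing for empty input.
mean_col1() {
  awk '{ sum += $1; n++ } END { if (n > 0) print sum / n }'
}

# Same example values as the data file; prints 2
printf '%s\n' 1 2 3 | mean_col1
```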
Several operations can be combined in a single invocation. For example:
cat data | datamash --header-out min 1 mean 1 perc:95 1 max 1
min(field-1) mean(field-1) perc:95(field-1) max(field-1)
1 2 2.9 3
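The 2.9 reported for perc:95 can look odd for integer data. It is consistent with linear interpolation between the two nearest ranks of the sorted values: with n = 3 values, the 95th percentile sits at rank 1 + (3 - 1) x 0.95 = 2.9, i.e. 90% of the way from 2 to 3. The awk function below is a hypothetical sketch of that method for cross-checking; I'm assuming, not confirming, that it matches datamash's percentile definition for every input.

```shell
# Hypothetical helper: P-th percentile (0-100) of the values on stdin,
# using linear interpolation between the two nearest ranks.
# Assumed (not verified) to agree with datamash's perc:P.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      h = (NR - 1) * p / 100 + 1   # 1-based fractional rank
      lo = int(h)
      frac = h - lo
      if (lo >= NR) { print v[NR]; exit }
      print v[lo] + frac * (v[lo + 1] - v[lo])
    }'
}

# Example: 95th percentile of 1, 2, 3
printf '%s\n' 1 2 3 | percentile 95
```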
One limitation is that datamash separates the output columns with tabs, so the headers and values don't line up. This can be resolved by piping the output through column -t. I also typically pipe it through a sed filter to strip the (field-1) identifiers (given we're only using a single field here):
cat data | datamash -s --header-out min 1 mean 1 perc:95 1 max 1 | sed -e 's/(field-1)//g' | column -t
min  mean  perc:95  max
1    2     2.9      3
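If you run this often, the pipeline can be wrapped in a small shell function. The name stats is hypothetical. I've left out -s, since it only matters when grouping with -g.

```shell
# Hypothetical wrapper around the pipeline above.
# Reads values (one per line) on stdin; requires datamash, sed and column.
stats() {
  datamash --header-out min 1 mean 1 perc:95 1 max 1 \
    | sed -e 's/(field-1)//g' \
    | column -t
}

# Usage:
#   stats < data
```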