Linux Bash Print largest number in column from monthly rotated log file - shell

I have monthly rotated log files which look like the output below. The files are named transc-2301.log (transc-YYMM). There is a file for each month of the year. I need a simple bash command to find the file for the current month and display the largest number (max) in column 3. In the example below, the output should be 87.
01/02/23 10:45 19 26
01/02/23 11:45 19 45
01/02/23 12:45 19 36
01/02/23 13:45 22 64
01/02/23 14:45 19 72
01/02/23 15:45 19 54
01/02/23 16:45 19 80
01/02/23 17:45 17 36
01/03/23 10:45 18 24
01/03/23 11:45 19 26
01/03/23 12:45 19 48
01/03/23 13:45 20 87
01/03/23 14:45 20 29
01/03/23 15:45 18 26

Since your filenames sort chronologically, you can easily pick the file of the current month as the last one in the glob-sorted sequence. Then a quick awk returns the result.
for file in transc-*.log; do :; done   # $file ends up holding the last (newest) match
awk '($3>m){m=$3}END{print m}' "$file"
alternatively you can let awk do the heavy lifting on the filename
awk 'BEGIN{ARGV[1]=ARGV[ARGC-1];ARGC=2}($3>m){m=$3}END{print m}' transc-*.log
or if you don't like the glob-expansion trick:
awk '($3>m){m=$3}END{print m}' "transc-$(date "+%y%m").log"

I would harness GNU AWK for this task in the following way. Let transc-2301.log content be
01/02/23 10:45 19 26
01/02/23 11:45 19 45
01/02/23 12:45 19 36
01/02/23 13:45 22 64
01/02/23 14:45 19 72
01/02/23 15:45 19 54
01/02/23 16:45 19 80
01/02/23 17:45 17 36
01/03/23 10:45 18 24
01/03/23 11:45 19 26
01/03/23 12:45 19 48
01/03/23 13:45 20 87
01/03/23 14:45 20 29
01/03/23 15:45 18 26
then
awk 'BEGIN{m=-1;FS="[[:space:]]{2,}";logname=strftime("transc-%y%m.log")}FILENAME==logname{m=$3>m?$3:m}END{print m}' transc*.log
gives output (as of 18 Jan 2023)
87
Warning: I assume your files use two-or-more whitespace characters as the field separator; if this does not hold, adjust FS accordingly.
Warning: set m to a value lower than the lowest value that might appear in the column of interest.
Explanation: I use the strftime function to determine which file should be processed and pass all transc*.log files to awk, but action is taken only for the selected file. The action is: set m to $3 if it is higher than the current m, otherwise keep m's value. After processing the files, in END, I print the value of m.
(tested in GNU Awk 5.0.1)
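If you would rather not rely on strftime (a GNU extension), here is a sketch of the same idea that computes the target filename in the shell and hands it to awk via -v (same assumption about FS as above):
awk -v logname="transc-$(date +%y%m).log" '
    BEGIN { m = -1; FS = "[[:space:]]{2,}" }
    FILENAME == logname { m = ($3 > m ? $3 : m) }   # only count lines from the current month's file
    END { print m }
' transc*.log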

mawk '_<(__ = +$NF) { _=__ } END { print +_ }'
gawk 'END { print +_ } (_=_<(__=+$NF) ?__:_)<_'
87
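For completeness, a sketch of how these one-liners might be pointed at the current month's file (assuming the transc-YYMM.log naming from the question, and that the value of interest is in the last field, $NF):
mawk '_ < (__ = +$NF) { _ = __ } END { print +_ }' "transc-$(date +%y%m).log"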

Related

Sort according to two columns and extract top two based on last column

I have a file with three columns. I would like to extract rows with top two values in column 3 for each unique value in column 2.
cat file.list
run1/xx2/x2c1.txt 21 -190
run1/xx2/x2c2.txt 19 -180
run1/xx2/x2c3.txt 18 -179
run1/xx2/x2c4.txt 19 -162
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162
run2/xx2/x2c2.txt 18 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run2/xx2/x2c5.txt 21 -179
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c2.txt 19 -192
run3/xx2/x2c3.txt 21 -191
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c5.txt 19 -179
expected output
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
I feel like some combination of sort, uniq and awk might accomplish this, but I can't properly execute it. I can sort by columns:
sort -nk2 -nk3 file.list
which gives me an output sorted by -k2 and -k3 as follows,
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run1/xx2/x2c3.txt 18 -179
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run1/xx2/x2c2.txt 19 -180
run3/xx2/x2c5.txt 19 -179
run1/xx2/x2c4.txt 19 -162
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
run2/xx2/x2c5.txt 21 -179
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162
but then I get stuck on how to extract only the rows with the best two scores in the last column for 18, 19 and 21.
I would really appreciate any bash solutions.
Piping the current sort results to awk:
$ sort -nk2 -nk3 file.list | awk 'a[$2]++ < 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
Where:
field #2 ($2) is used as the index for array a[]
if the value stored in the array is less than 2 then print the current input line
then increment the counter (++)
1st time we see a[18] the count is 0, we print the line, and increment the count by 1
2nd time we see a[18] the count is 1, we print the line, and increment the count by 1
3rd (to nth) time we see a[18] the count is greater than or equal to 2, we do not print the line, and increment the count
An alternative where we increment the count first:
$ sort -nk2 -nk3 file.list | awk '++a[$2] <= 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
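The same pattern generalizes to top-N per group; keeping the best three rows per value of column 2, for example, only means changing the limit (a minimal sketch):
sort -nk2 -nk3 file.list | awk 'a[$2]++ < 3'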

Mixing data from two csv files with date field [duplicate]

This question already has answers here:
match values in first column of two files and join the matching lines in a new file
(2 answers)
I have two csv files.
File1.csv
F1 F2
14:01 22
14:05 23
14:07 34
14:58 98
15:01 22
15:10 24
File2.csv
F1 F2
14:01 22
14:06 21
14:07 34
14:59 08
15:01 22
15:19 20
Is it possible to have something like the output below?
F1 F2 F3
14:01 22 22
14:05 23
14:06 21
14:07 34 34
14:58 98
14:59 08
15:01 22 22
15:10 24
15:19 20
Thank you.
Here is a pure bash solution. Not the most efficient, as pointed out by @Inian, but still pure bash:
#!/bin/bash
# Read both files into arrays, one line per element
f1=()
f2=()
while read -r f1l; do
    f1[${#f1[@]}]="$f1l"
done < File1.csv
while read -r f2l; do
    f2[${#f2[@]}]="$f2l"
done < File2.csv

output=$'F1\tF2\n'
# Skip the header row (index 0) and walk both arrays in parallel
for (( i=1; i<${#f1[@]}; ++i ))
do
    f1c1=${f1[i]%% *}    # first column of the File1 line
    f1c2=${f1[i]##* }    # second column of the File1 line
    f2c1=${f2[i]%% *}
    f2c2=${f2[i]##* }
    if [[ $f1c1 = $f2c1 ]]; then
        # same time in both files: output the time and the sum of the two values
        output+="$f1c1"$'\t'$(($f1c2+$f2c2))$'\n'
    else
        # times differ: output each line separately
        output+="$f1c1"$'\t'"$f1c2"$'\n'
        output+="$f2c1"$'\t'"$f2c2"$'\n'
    fi
done
echo "${output:0:-1}" > File3.csv

Count occurrences in a text line

Is there any way to count how often a value occurs in a line? My input is a tab-delimited .txt file. It looks something like this (but with thousands of lines):
#N/A 14 13 #N/A 15 13 #N/A 14 13 13 15 14 13 15 14 14 15
24 26 #N/A 24 22 #N/A 24 26 #N/A 24 26 24 22 24 22 24 26
45 43 45 43 #N/A #N/A #N/A 43 45 45 43 #N/A 47 45 45 43
I would like an output like this or similar.
#N/A(3) 14 13(3) 15 13(1) 13 15(1) 15 14(1) 14 15 (1)
24 26(4) #N/A(3) 24 22(3)
45 45(4) #N/A(4) 43 45(1) 47 45(1)
Perl solution:
perl -laF'/\t/' -ne '
    chomp; my %h;
    $h{$_}++ for @F;
    print join "\t", map "$_ ($h{$_})", keys %h
' < input
-a splits each line on -F (\t means tab) into the @F array
-l adds newlines to prints
-n reads the input line by line
chomp removes the final newline
%h is a hash table, the keys are the members of @F, the values are the counts
awk to the rescue!
$ awk -F'\t' -v OFS=' ' '{for(i=1;i<=NF;i++) if($i!="") a[$i]++;
for(k in a) printf "%s", k"("a[k]")" OFS; delete a; print ""}' file
#N/A(3) 14 13(3) 13 15(1) 15 13(1) 14 15(1) 15 14(1)
#N/A(3) 24 22(3) 24 26(4)
#N/A(4) 43 45(1) 45 43(4) 47 45(1)
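If the order of first appearance matters (as in the desired output above), here is a sketch that preserves it, assuming GNU awk and tab-separated input:
gawk -F'\t' '{
    n = 0
    delete cnt                              # reset the counts for each line
    for (i = 1; i <= NF; i++) {
        if ($i == "") continue
        if (!($i in cnt)) order[++n] = $i   # remember first-appearance order
        cnt[$i]++
    }
    line = ""
    for (j = 1; j <= n; j++)
        line = line order[j] "(" cnt[order[j]] ")" (j < n ? "\t" : "")
    print line
}' file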

Get the average of the selected cells line by line in a file?

I have a single file with multiple columns. I want to select a few of them, compute a value from the selected cells in each line, and output the results as additional columns.
For example:
Month Low.temp Max.temp Pressure Wind Rain
JAN 17 36 120 5 0
FEB 10 34 110 15 3
MAR 13 30 115 25 5
APR 14 33 105 10 4
.......
How do I get the average temperature (Avg.temp) and Humidity (Hum) as columns?
Avg.temp = (Low.temp+Max.temp)/2
Hum = Wind * Rain
The expected output with the Avg.temp and Hum columns added:
Month Low.temp Max.temp Pressure Wind Rain Avg.temp Hum
JAN 17 36 120 5 0 26.5 0
FEB 10 34 110 15 3 22 45
MAR 13 30 115 25 5 21.5 125
APR 14 33 105 10 4 23.5 40
.......
I don't want to do it in Excel. Is there any simple shell command to do this?
I would use awk like this:
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {$(NF+1)=($2+$3)/2; $(NF+1)=$5*$6}1' file
or:
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {print $0, ($2+$3)/2, $5*$6}' file
This simply does the calculations and appends the results to the original line.
Let's see it in action, piping to column -t for a nice output:
$ awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {$(NF+1)=($2+$3)/2; $(NF+1)=$5*$6}1' file | column -t
Month Low.temp Max.temp Pressure Wind Rain Avg.temp Hum
JAN 17 36 120 5 0 26.5 0
FEB 10 34 110 15 3 22 45
MAR 13 30 115 25 5 21.5 125
APR 14 33 105 10 4 23.5 40
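If you want a fixed number of decimal places for the new columns, here is a printf-based variant of the same idea (a sketch; it assumes the same whitespace-separated layout):
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next}
     {printf "%s %.1f %d\n", $0, ($2+$3)/2, $5*$6}' file | column -t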

Range with leading zero in bash

How do I add a leading zero to a bash range?
For example, I need to cycle through 01,02,03,...,29,30.
How can I implement this using bash?
In recent versions of bash (4.0 or later) you can do:
echo {01..30}
Output:
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Or if it should be comma separated:
echo {01..30} | tr ' ' ','
Which can also be accomplished with parameter expansion:
a=$(echo {01..30})
echo ${a// /,}
Output:
01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
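The padded range can also be used directly in a for loop, which is what the question's "cycle" suggests (a minimal sketch):
for i in {01..30}; do
    echo "processing $i"   # $i keeps its leading zero: 01, 02, ... 30
done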
another seq trick will work:
seq -w 30
if you check the man page, you will see the -w option is exactly for your requirement:
-w, --equal-width
equalize width by padding with leading zeroes
You can use seq's format option:
seq -f "%02g" 30
A "pure bash" way would be something like this:
echo {0..2}{0..9}
This will give you the following:
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
Removing the first 00 and adding the last 30 is not too hard!
This works:
printf " %02d" $(seq 1 30)
