I have an AV log file showing a number of values for each process scanned: Name, Path, Total files scanned, Scan time. The file contains hundreds of these process entries (example below), and for Total files scanned and Scan time I'd like to sort and print the highest (or longest) values so I can determine which processes are impacting the system. I've tried various ways with grep, but I only seem to get a list running in numerical order, when what I really want is to say: Process id: 86, Scan time (ns): 12761174 is the highest, then Process id: 25, and so on. I hope my explanation is clear enough.
Process id: 25
Name: wwww
Path: "/usr/libexec/wwww"
Total files scanned: 42
Scan time (ns): "62416"
Status: Active

Process id: 7
Name: xxxx
Path: "/usr/libexec/xxxx"
Total files scanned: 0
Scan time (ns): "0"
Status: Active

Process id: 86
Name: yyyy
Path: "/usr/libexec/yyyy"
Total files scanned: 2
Scan time (ns): "12761174"
Status: Active
I have tried:
grep 'Scan time (ns)' file | sort
Which results in:
file:Scan time (ns): "9391986"
file:Scan time (ns): "9532119"
file:Scan time (ns): "9730650"
file:Scan time (ns): "9743828"
file:Scan time (ns): "9793469"
file:Scan time (ns): "9911768"
What I am wanting to achieve is something such as:
Process id 9, Scan time (ns): "34561"
Process id 86, Scan time (ns): "45630"
Process id 25, Scan time (ns): "1256822"
Process id 51, Scan time (ns): "52351290"
Process id 30, Scan time (ns): "90257651"
Process id 19, Scan time (ns): "178764794932"
Here is another approach. It uses sed and sort:
sed '/^Process id:/h; /^Scan time (ns):/!d; s/"//g; H; x; s/\n/, /' file | sort -k7,7n
Note: I've removed double quotes around the scan time values (double quotes around integer values make little sense to me).
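For the sample records above, the sed stage alone should produce one line per process, with the scan time as the 7th whitespace-separated field, which is what sort -k7,7n keys on:
Process id: 25, Scan time (ns): 62416
Process id: 7, Scan time (ns): 0
Process id: 86, Scan time (ns): 12761174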
With your shown samples, please try the following awk code, written and tested in GNU awk.
awk '
/^Process id: /{
val=$NF
next
}
/^Scan time \(ns\): "/{
arr[val]=$NF
}
END{
PROCINFO["sorted_in"]="@ind_num_asc"
for(i in arr){
print "Process id " i ", Scan time (ns): " arr[i]""
}
}
' Input_file
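With the three sample records, this should print the entries sorted numerically (ascending) by process id:
Process id 7, Scan time (ns): "0"
Process id 25, Scan time (ns): "62416"
Process id 86, Scan time (ns): "12761174"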
Using perl to read the records one at a time (using "paragraph mode", which treats a blank line as the record separator), extract the time, and sort in reverse order by it:
$ perl -00 -lne 'm/Scan time \(ns\):\s+"(\d+)"/ && push @procs, [ $_, $1 ];
END { print $_->[0] for sort { $b->[1] <=> $a->[1] } @procs }' input.txt
Process id: 86
Name: yyyy
Path: "/usr/libexec/yyyy"
Total files scanned: 2
Scan time (ns): "12761174"
Status: Active
Process id: 25
Name: wwww
Path: "/usr/libexec/wwww"
Total files scanned: 42
Scan time (ns): "62416"
Status: Active
Process id: 7
Name: xxxx
Path: "/usr/libexec/xxxx"
Total files scanned: 0
Scan time (ns): "0"
Status: Active
A combination of awk's RS (input record separator) and FS (input field separator) is useful in this case:
< inpt awk 'BEGIN { RS = ""; FS = "\n" } { print $1 ", " $5 }' | sort -t \" -k2n
Before starting to process anything, i.e. in the BEGIN block, we set
RS to "", meaning that records are separated by empty lines, and
FS to "\n", meaning that (in each record) the fields are separated by a line break;
then we proceed by printing the 1st and 5th fields, with a comma in between.
Finally, interpreting each line of the stream as a "-separated list of fields, we sort numerically according to the second field (-k2n).
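For the sample records, the awk stage alone should emit these lines, which the sort then orders by the quoted number:
Process id: 25, Scan time (ns): "62416"
Process id: 7, Scan time (ns): "0"
Process id: 86, Scan time (ns): "12761174"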
With just paste:
$ cat file | paste - - - - - - -
Process id: 25 Name: wwww Path: "/usr/libexec/wwww" Total files scanned: 42 Scan time (ns): "62416" Status: Active
Process id: 7 Name: xxxx Path: "/usr/libexec/xxxx" Total files scanned: 0 Scan time (ns): "0" Status: Active
Process id: 86 Name: yyyy Path: "/usr/libexec/yyyy" Total files scanned: 2 Scan time (ns): "12761174" Status: Active
And if we add some formatting:
$ cat file \
| paste - - - - - - - \
| awk '{
printf("process id: %s, scan time (ns): %s\n", $3, $15);
}'
process id: 25, scan time (ns): "62416"
process id: 7, scan time (ns): "0"
process id: 86, scan time (ns): "12761174"
Those are 7 dashes (-), because each one of your records is 7 lines (including the blank line).
Explanation of the hack:
paste will concatenate the 1st line of all input files into a single line, then concatenate the 2nd line and so on.
So for each input file, in order, it reads a line and adds it to its current output line.
We've given stdin as input, 7 times. But stdin is a single stream.
So paste will do:
read line 1 from input 1 (line 1 of stdin)
read line 1 from input 2 (line 2 of stdin)
...
read line 1 from input 7 (line 7 of stdin)
concatenate these lines (lines 1-7 of stdin) as line 1 of stdout
read line 2 from input 1 (line 8 of stdin)
...
read line 2 from input 7 (line 14 of stdin)
concatenate these lines (lines 8-14 of stdin) as line 2 of stdout
...
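A minimal illustration of this round-robin reading, with three dashes and six lines of input:
$ seq 6 | paste - - -
1	2	3
4	5	6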
I built a script using the tcopy command to compare (QC) my files on tape and on disk, but it doesn't work the way I want it to. I would be glad if someone could improve this script to make it work.
#!/bin/bash
drv=/dev/st0
echo -n "no.y files on tape ... : "
total_files=`mt -f $drv eod; mt -f $drv status | grep "File number=" | awk -F"=" '{print $2}' | awk -F"," '{print $1}'`
echo "$total_files"
mt -f $drv rewind
for i in `seq 0 $total_files`;do
printf -v i '%04d' $i
echo -n --- "file${i}" -
tcopy $drv
stat=$?
echo "status-$stat : `date`"
done
The output:
scan no. files on tape ... : 41
--- file0000 -file 0: block size 65536: 1 records
file 0: eof after 1 records: 65536 bytes
file 1: block size 65536: 771 records
file 1: eof after 771 records: 50528256 bytes
file 2: block size 65536: 762 records
file 2: eof after 762 records: 49938432 bytes
file 3: block size 65536: 1852 records
file 3: eof after 1852 records: 121372672 bytes
...
file 35: block size 65536: 1761 records
file 35: eof after 1761 records: 115408896 bytes
file 36: block size 65536: 1 records
file 36: eof after 1 records: 65536 bytes
file 37: block size 65536: 1 records
file 37: eof after 1 records: 65536 bytes
eot
total length: 2946433024 bytes
status-0 : Sat Apr 11 14:22:26 SGT 2020
The outcome I expect should be something like this:
scan no. files on tape ... : 41
--- file0000 - file 0: block size 65536: 1 records ... eof after 1 records: 65536 bytes
--- file0001 - file 1: block size 65536: 771 records ... eof after 771 records: 50528256 bytes
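One observation that may help: given only a source drive, tcopy scans the whole tape in a single run, which is why every file's lines appear under file0000. Instead of calling tcopy once per file, you could run it once and merge its two lines per file with awk. A rough sketch, untested on a real drive and assuming the output format shown above:
tcopy "$drv" 2>&1 | awk '
    /block size/ { buf = $0; next }             # remember the "block size" line for this file
    /eof after/  { id = $2; sub(/:$/, "", id)   # file number taken from "file N: eof after ..."
                   tail = $0; sub(/^file [0-9]+: /, "", tail)
                   printf "--- file%04d - %s ... %s\n", id, buf, tail; next }
    { print }                                   # let "eot" and the total length through unchanged
'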
Using bash scripting to extract data to various files
awk '{for(i=1; i<=10; i++){if($1== 2**($i)){getline; print}}}' test.csv>> test/test_$i.csv
Description: I want to extract data to multiple files where column 1 of the input file holds sizes that are powers of 2. I want to extract rows having the same size into a separate file for each size.
input file:
4 10.06 9.64 10.36 1000
8 10.16 9.79 10.48 1000
16 10.49 10.02 10.86 1000
32 10.54 10.13 10.91 1000
4 10.76 9.64 10.36 1000
8 10.90 9.79 10.48 1000
awk 'log($1)/log(2) == int(log($1)/log(2)) { out="pow-" $1; print >out }' file.in
This will, for the given data, create the files pow-N for N equal to 4, 8, 16 and 32.
It will skip lines that do not have a power of 2 in their first column.
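If the floating-point results of log() are a concern (rounding could misclassify large values), an integer-only test that repeatedly halves the number avoids the issue; a minimal sketch:
awk '{ n = $1; while (n > 1 && n % 2 == 0) n /= 2; if (n == 1) { out = "pow-" $1; print > out } }' file.in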
Thanks for your help.
I figured out a possible solution:
for i in `seq 0 $numline`
do
if [ -e "$InDir/$file" ]
then
awk -v itr="$i" '{if ($2 == 2**itr) print $0}' "$InDir/$file" >> "$OutDir/$(awk "BEGIN{print 2**$i}")mb_$file"
fi
done
I've got the following logfile and I'd like to extract the number of dropped packets (in the following example the number is 0):
ITGDec version 2.8.1 (r1023)
Compile-time options: bursty multiport
----------------------------------------------------------
Flow number: 1
From 192.168.1.2:0
To 192.168.1.2:8999
----------------------------------------------------------
Total time = 2.990811 s
Total packets = 590
Minimum delay = 0.000033 s
Maximum delay = 0.000169 s
Average delay = 0.000083 s
Average jitter = 0.000010 s
Delay standard deviation = 0.000016 s
Bytes received = 241900
Average bitrate = 647.048576 Kbit/s
Average packet rate = 197.270907 pkt/s
Packets dropped = 0 (0.00 %)
Average loss-burst size = 0.000000 pkt
----------------------------------------------------------
__________________________________________________________
**************** TOTAL RESULTS ******************
__________________________________________________________
Number of flows = 1
Total time = 2.990811 s
Total packets = 590
Minimum delay = 0.000033 s
Maximum delay = 0.000169 s
Average delay = 0.000083 s
Average jitter = 0.000010 s
Delay standard deviation = 0.000016 s
Bytes received = 241900
Average bitrate = 647.048576 Kbit/s
Average packet rate = 197.270907 pkt/s
Packets dropped = 0 (0.00 %)
Average loss-burst size = 0 pkt
Error lines = 0
----------------------------------------------------------
I'm trying with the following command:
cat logfile | grep -m 1 dropped | sed -n 's/.*=\([0-9]*\) (.*/\1/p'
but nothing gets printed.
Thank you
EDIT: I just wanted to tell you that the "Packets dropped" line gets printed in the following way in the code of the program:
printf("Packets dropped = %13lu (%3.2lf %%)\n", (long unsigned int) 0, (double) 0);
It will be easier to use awk here:
awk '/Packets dropped/{print $4}' logfile
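Note that "Packets dropped" occurs twice in the log (per-flow and under TOTAL RESULTS), so this prints two values; to stop after the first, exit once it is printed:
awk '/Packets dropped/{print $4; exit}' logfile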
Aside from the problem in your sed expression (it doesn't allow spaces after the =, and the %13lu format puts several spaces before the digits), you don't really need a pipeline here.
grep would suffice:
grep -m 1 -oP 'dropped\s*=\s*\K\d+' logfile
(-P enables PCRE in GNU grep, and \K discards everything matched before it, so only the digits are printed.)
You could have fixed your sed expression by permitting space after the =:
sed -n 's/.*= *\([0-9]*\) (.*/\1/p'
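With that fix, the original pipeline (minus the unnecessary cat) should print the first value:
$ grep -m 1 dropped logfile | sed -n 's/.*= *\([0-9]*\) (.*/\1/p'
0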
Avoiding your use of cat and grep, in plain sed:
sed -n 's/^Packets dropped[=[:space:]]\+\([0-9]\+\).*/\1/p' logfile
Matches:
- any line starting with "Packets dropped",
- one or more whitespace or "=" characters,
- one or more digits (which are captured).
The rest (.*) is discarded.
With the -r option as well, you can lose a few backslashes:
sed -nr 's/^Packets dropped[=[:space:]]+([0-9]+).*/\1/p' logfile
sed -n '/Packets dropped/ s/.*[[:space:]]\([0-9]\{1,\}\)[[:space:]].*/\1/p' YourFile
but this prints both matching lines (the per-flow detail and the total summary) wherever the information is written.
I am trying to add a formatted column between columns of multiple text files. By using
awk 'NR==FNR{c[NR]=$3;l=NR;next}{$2=($3+c[1])*c[l]" "$2}7' file file
I can convert a file that has a form
1 2 3 4
10 20 30 40
100 200 300 400
to
1 1800 3 4
10 9900 30 40
100 90900 300 400
How can I apply the above operation to multiple .dat files?
tmp="/usr/tmp/tmp$$"
for file in *.dat
do
awk '...' "$file" "$file" > "$tmp" && mv "$tmp" "$file"
done
Regarding your script, though:
awk 'NR==FNR{c[NR]=$3;l=NR;next}{$2=($3+c[1])*c[l]" "$2}7' file file
Never use the letter l (el) as a variable name as it looks far too much like the number 1 (one). I'd actually write it as:
awk 'NR==FNR{c[++n]=$3;next}{$2=($3+c[1])*c[n]" "$2}7' file file
or, if memory is a concern for a large file:
awk 'NR==FNR{c[NR==1]=$3;next}{$2=($3+c[1])*c[0]" "$2}7' file file
Here NR==1 evaluates to 1 on the first line and 0 on every later line, so c[1] keeps the first line's third field while c[0] is overwritten on each subsequent line and ends up holding the last line's.