Converting a seconds pattern to milliseconds in awk - bash

I have a file whose values contain an 's' (seconds) pattern; I need to convert them to 'ms' by multiplying by 1000. I am unable to do it. Please help me.
file.txt
First launch 1
App: +1s170ms
First launch 2
App: +186ms
First launch 3
App: +1s171ms
First launch 4
App: +1s484ms
First launch 5
App: +1s227ms
First launch 6
App: +204ms
First launch 7
App: +1s180ms
First launch 8
App: +1s177ms
First launch 9
App: +1s183ms
First launch 10
App: +1s155ms
My code:
awk 'BEGIN { FS = "[: ]+" }
/:/ && $2 ~ /ms$/ { vals[$1] = vals[$1] OFS $2+0; next }
END {
    for (key in vals)
        print key vals[key]
}' file.txt
Expected output:
App 1170 186 1171 1484 1227 204 1180 1177 1183 1155
Actual output:
App 1 186 1 1 1 204 1 1 1 1
How can I convert the 's' part to 'ms' in the pattern above when a seconds component is present?

What I will try to do here is explain it a bit generically and then apply it to your case.
Question: I have a string of the form 123a456b7c8d where the numbers are numeric integral values of any length and the letters are corresponding units. I also have conversion factors to convert from unit a,b,c,d to unit f. How can I convert this to a single quantity of unit f?
Example: from 1s183ms to 1183ms
Strategy:
per string, create a set of key-value pairs 'a' => 123, 'b' => 456, 'c' => 7 and 'd' => 8
multiply each value by the correct conversion factor
add the numbers together
Assume we use awk and the key-value pairs are stored in array a with the key as an index.
Extract key-value pairs from str:
function extract(str, a,    t, k, v) {
    delete a; t = str
    while (t != "") {
        v = t+0; match(t, /[a-zA-Z]+/); k = substr(t, RSTART, RLENGTH)
        t = substr(t, RSTART+RLENGTH)
        a[k] = v
    }
    return
}
Convert and sum: here we assume we have an array f containing the conversion factors:
function convert(a, f,    t, k) {
    t = 0; for (k in a) t += a[k] * f[k]
    return t
}
The full code (for the OP's example):
# set conversion factors (awk string constants take double quotes)
BEGIN { f["s"] = 1000; f["ms"] = 1 }
# print first word
BEGIN { printf "App:" }
# extract string and print
/^App/ { extract($2, a); printf OFS "%dms", convert(a, f) }
END { printf ORS }
which outputs:
App: 1170ms 186ms 1171ms 1484ms 1227ms 204ms 1180ms 1177ms 1183ms 1155ms
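For completeness, the two functions and the rules above can be pasted into one file and run directly; a sketch (the file name conv.awk is just an example):
# conv.awk -- the two functions plus the main rules from above
function extract(str, a,    t, k, v) {
    delete a; t = str
    while (t != "") {
        v = t+0; match(t, /[a-zA-Z]+/); k = substr(t, RSTART, RLENGTH)
        t = substr(t, RSTART+RLENGTH)
        a[k] = v
    }
}
function convert(a, f,    t, k) {
    t = 0; for (k in a) t += a[k] * f[k]
    return t
}
BEGIN { f["s"] = 1000; f["ms"] = 1; printf "App:" }
/^App/ { extract($2, a); printf OFS "%dms", convert(a, f) }
END { printf ORS }
Run it as: awk -f conv.awk file.txt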

perl -n -e '$s=0; ($s)=/(\d+)s/; ($ms)=/(\d+)ms/;
s/^(\w+):/push @{$vals{$1}}, $ms+$s*1000/e;
eof && print "$_: @{$vals{$_}}\n" for keys %vals;' file
perl -n doesn't print anything as it loops through the input.
$s and $ms are captured from the line; $s is reset to zero first so a seconds value from a previous line cannot carry over.
s///e is stuffing the %vals hash with a list of numbers in ms for each key, App, in this case.
eof && executes the subsequent code after the end of the file.
print "$_: #{$vals{$_}}\n" for keys %vals is printing the %vals hash as the OP wants.
App: 1170 186 1171 1484 1227 204 1180 1177 1183 1155

Related

shell script subtract fields from pairs of lines

Suppose I have the following file:
stub-foo-start: 10
stub-foo-stop: 15
stub-bar-start: 3
stub-bar-stop: 7
stub-car-start: 21
stub-car-stop: 51
# ...
# EOF at the end
with the goal of writing a script which would append to it like so:
stub-foo-start: 10
stub-foo-stop: 15
stub-bar-start: 3
stub-bar-stop: 7
stub-car-start: 21
stub-car-stop: 51
# ...
# appended:
stub-foo: 5 # 5 = stop(15) - start(10)
stub-bar: 4 # and so on...
stub-car: 30
# ...
# new EOF
The format is exactly this sequential pairing of start and stop tags (stop being the closing one), with no nesting in between.
What is the recommended approach to writing such a script using awk and/or sed? Mostly, what I've tried is grepping lines and storing them in variables, but that seemed to overcomplicate things and trail off.
Any advice or helpful links welcome. (Most tutorials I found on shell scripting were illustrative at best)
A naive implementation in plain bash
#!/bin/bash
while read -r start && read -r stop; do
    printf '%s: %d\n' "${start%-*}" $(( ${stop##*:} - ${start##*:} ))
done < file
This assumes pairs are contiguous and there are no interlaced or nested pairs.
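If GNU awk is not available, the same pairwise idea can also be expressed in portable awk (a sketch, assuming the same contiguous layout and the ": " separator):
awk -F': ' '
/-start:/ { name = $1; sub(/-start$/, "", name); start = $2; next }
/-stop:/  { print name ": " ($2 - start) }' file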
Using GNU awk:
awk -F '[ -]' '{ map[$2][$3]=$4;print } END { for (i in map) { print i": "(map[i]["stop:"]-map[i]["start:"])" // ("map[i]["stop:"]"-"-map[i]["start:"]")" } }' file
Explanation:
awk -F '[ -]' '{        # Set the field delimiter to space or "-"
    map[$2][$3] = $4    # Build a two-dimensional array keyed by the second and third fields, holding the fourth field as the value
    print               # Print the line
}
END {
    for (i in map) {    # Loop through the array and print the data in the required format
        print i": "(map[i]["stop:"]-map[i]["start:"])" // ("map[i]["stop:"]"-"-map[i]["start:"]")"
    }
}' file
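For the three start/stop pairs in the sample, the END summary lines take this shape (the traversal order of for (i in map) is not guaranteed, and the sample's # comment lines would add stray entries):
foo: 5 // (15-10)
bar: 4 // (7-3)
car: 30 // (51-21)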

Check to see if numbers in a column are sequential via command line

In a text file, I have a sequence of numbers in a column preceded by a short string. It is the 5th column in the example file here under "NAME":
SESSION NAME: session
SAMPLE RATE: 48000.000000
BIT DEPTH: 16-bit
SESSION START TIMECODE: 00:00:00:00.00
TIMECODE FORMAT: 24 Frame
# OF AUDIO TRACKS: 2
# OF AUDIO CLIPS: 2
# OF AUDIO FILES: 2
M A R K E R S L I S T I N G
# LOCATION TIME REFERENCE UNITS NAME COMMENTS
2 0:00.500 24000 Samples xxxx0001
3 0:03.541 170000 Samples xxxx0002
4 0:05.863 281458 Samples xxxx0003
5 0:08.925 428430 Samples xxxx0004
6 0:10.604 509025 Samples xxxx0005
7 0:13.973 670742 Samples xxxx0006
8 0:15.592 748453 Samples xxxx0008
9 0:19.243 923666 Samples xxxx0008
In the example above, 0007 is missing, and 0008 is duplicated.
Therefore, I would like to be able to check whether the numbers are:
sequential, given the range that presently exists in the column
free of any duplicates
I would also like to output these results:
SKIPPED:
xxxx0007
DUPLICATES:
xxxx0008
The furthest I have been able to get is to use awk to get the column I need:
cat <file.txt> | awk '{ print $5 }'
which gets me to this:
NAME
xxxx0001
xxxx0002
xxxx0003
xxxx0004
xxxx0005
xxxx0006
xxxx0008
xxxx0008
But I do not know where to go from here.
Do I need to loop through the list items and parse so I get the number only, then start doing some comparisons to the next line?
Any help would be tremendously appreciated
Thank you!
As a starting point, please try the following:
awk '
NR>1 { gsub("[^0-9]", "", $5); count[$5+0]++ }
END {
    print "Skipped:"
    for (i = 1; i < NR; i++)
        if (count[i] == 0) printf "xxxx%04d\n", i
    print "Duplicates:"
    for (i = 1; i < NR; i++)
        if (count[i] > 1) printf "xxxx%04d\n", i
}' file.txt
Output:
Skipped:
xxxx0007
Duplicates:
xxxx0008
The condition NR>1 is used to skip the top header line.
gsub("[^0-9]", "", $5) removes non-number characters from $5.
As a result, $5 is set to a number extracted from the 5th column.
The array count[] counts the occurrences of each number. If the value
is 0 (or undefined), the number was skipped. If the value is larger
than 1, the number is duplicated.
The END { ... } block is executed after all the input lines are processed
and it is useful to report the final results.
However, the "Skipped/Duplicates" approach cannot well detect such cases as:
# LOCATION TIME REFERENCE UNITS NAME COMMENTS
1 0:00.500 24000 Samples xxxx0001
2 0:02.888 138652 Samples xxxx0003
3 0:04.759 228446 Samples xxxx0004
4 0:07.050 338446 Samples xxxx0005
5 0:09.034 433672 Samples xxxx0006
6 0:12.061 578958 Samples xxxx0007
7 0:14.111 677333 Samples xxxx0008
8 0:17.253 828181 Samples xxxx0009
or
# LOCATION TIME REFERENCE UNITS NAME COMMENTS
1 0:00.500 24000 Samples xxxx0001
2 0:02.888 138652 Samples xxxx0003
3 0:04.759 228446 Samples xxxx0002
4 0:07.050 338446 Samples xxxx0004
5 0:09.034 433672 Samples xxxx0005
6 0:12.061 578958 Samples xxxx0006
7 0:14.111 677333 Samples xxxx0007
8 0:17.253 828181 Samples xxxx0008
It is better to perform a line-by-line comparison between the expected and the actual value. Then how about:
awk '
NR>1 {
    gsub("[^0-9]", "", $5)
    if ($5+0 != NR-1) printf "Line: %d Expected: xxxx%04d Actual: xxxx%04d\n", NR, NR-1, $5
}' file.txt
output for the original example:
Line: 8 Expected: xxxx0007 Actual: xxxx0008
[EDIT]
For the revised input file, which includes extra header lines, how about:
awk '
f {
    gsub("[^0-9]", "", $5)
    if ($5+0 != NR-skip) printf "Line: %d Expected: xxxx%04d Actual: xxxx%04d\n", NR, NR-skip, $5
}
/^#[[:blank:]]+LOCATION[[:blank:]]+TIME REFERENCE/ {
    skip = NR
    f = 1
}
' file.txt
Output:
Line: 19 Expected: xxxx0007 Actual: xxxx0008
The script above skips the lines until the specific pattern # LOCATION TIME REFERENCE is found.
The f { ... } block is executed only when f is true, so it is skipped
until f is set to a nonzero value.
The /^# .../ { ... } block is executed when the input line matches the
pattern. Once it matches, skip is set to the number of header lines and
f (a flag) is set to 1, so the upper block runs from the next
iteration on.
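This flag technique is a general awk idiom for acting only on the lines after a marker; stripped down to its minimal form (a generic sketch, not specific to this file):
awk '/MARKER/ { f = 1; next }  f' file    # prints only the lines after the first MARKER line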
Hope this helps.

Merging sums of numbers from different files and deleting select duplicate lines

I've checked other threads here on merging, but they seem to be mostly about merging text, and not quite what I needed, or at least I couldn't figure out a way to connect their solutions to my own problem.
Problem
I have 10+ input files, each consisting of two columns of numbers (think of them as x,y data points for a graph). Goals:
Merge these files into 1 file for plotting
For any duplicate x values in the merge, add their respective y-values together, then print one line with x in field 1 and the added y-values in field 2.
Consider this example for 3 files:
y1.dat
25 16
27 18
y2.dat
24 10
27 9
y3.dat
24 2
29 3
According to my goals above, I should be able to merge them into one file with output:
final.dat
24 12
25 16
27 27
29 3
Attempt
So far, I have the following:
#!/bin/bash
loops=3
for i in `seq $loops`; do
    if [ $i == 1 ]; then
        cp -f y$i.dat final.dat
    else
        awk 'NR==FNR { arr[NR] = $1; p[NR] = $2; next } {
            for (n in arr) {
                if ($1 == arr[n]) {
                    print $1, p[n] + $2
                    n++
                }
            }
            print $1, $2
        }' final.dat y$i.dat >> final.dat
    fi
done
Output:
25 16
27 18
24 10
27 27
27 9
24 12
24 2
29 3
On closer inspection, it's clear I have duplicates of the original x-values.
The problem is my script needs to print all the x-values first, and then I can add them together for my output. However, I don't know how to go back and remove the lines with the old x-values that I needed to make the addition.
If I blindly use uniq, I don't know whether the old x-values or the new x-value is deleted. With awk '!duplicate[$1]++' the order of lines deleted was reversed over the loop, so it deletes on the first loop correctly but the wrong ones after that.
Been at this for a long time, would appreciate any help. Thank you!
I am assuming you already merged all the files into a single one before making the calculation. Once that's done, the script is as simple as:
awk '{ if ( $1 != "" ) { coord[$1]+=$2 } } END { for ( k in coord ) { print k " " coord[k] } }' input.txt
Hope it helps!
Edit: How does this work?
if ( $1 != "" ) { coord[$1]+=$2 }
This line gets executed for each line of your input. It first checks whether there is a value for X, otherwise it simply ignores the line; this helps to skip empty lines should your file have any. The block that gets executed, coord[$1]+=$2, is the heart of the script: it builds a dictionary with X as the key of each entry while summing every Y value found for that X.
END { for ( k in coord ) { print k " " coord[k] } }
This block will execute after awk has iterated over all the lines in your file. It will simply grab each key from the dictionary and print it, then a space and finally the sum of all the values which were found, or in other words, the value for that specific key.
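If the files have not been merged beforehand, awk can read them all in one pass. A sketch using the OP's y*.dat naming, with sort -n added only to order the x values in the output:
awk '$1 != "" { coord[$1] += $2 } END { for (k in coord) print k, coord[k] }' y*.dat | sort -n > final.dat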
Using a Perl one-liner:
> cat y1.dat
25 16
27 18
> cat y2.dat
24 10
27 9
> cat y3.dat
24 2
29 3
> perl -lane ' $kv{$F[0]}+=$F[1]; END { print "$_ $kv{$_}" for(sort keys %kv) }' y*dat
24 12
25 16
27 27
29 3
>

Bash script - How to loop through rows in a CSV file

I am working with a huge CSV file (filename.csv) that contains a single column. From column 1, I want to read the current row and compare it with the value of the previous row. If it is greater than or equal, continue comparing; if the value of the current cell is smaller than the previous row's, divide the value of the current cell by the value of the previous cell, print the result of the division, and exit. In the following example, I want my bash script to divide 327 by 340, print 0.961765 to the console, and exit.
338
338
339
340
327
301
299
284
284
283
283
283
282
282
282
283
I tried it with the following awk and it works perfectly fine.
awk '$1 < val {print $1/val; exit} {val=$1}' filename.csv
However, since I want to include around 7 conditional statements (if-elses), I wanted to do it with a somewhat cleaner bash script, and here is my approach. To be honest I am not that used to awk, which is why I prefer bash.
#!/bin/bash
FileName="filename.csv"
# Test when to stop looping
STOP=1
# to find the number of columns
NumCol=`sed 's/[^,]//g' $FileName | wc -c`; let "NumCol+=1"
# Loop until the current cell is less than the count+1
while [ "$STOP" -lt "$NumCol" ]; do
    cat $FileName | cut -d, -f$STOP
    let "STOP+=1"
done
How can we loop through the values and add conditional statements?
PS: the criteria for my if-else statements are: if the value ($1/val) is >=0.85 and <=0.9, print A; else if the value is >=0.7 and <=0.8, print B; if the value is >=0.5 and <=0.6, print C; otherwise print D. (See the sketch right after this for the ranges written out literally.)
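For reference, the stated ranges translate directly into plain comparisons on top of the OP's working one-liner (a sketch of exactly the stated criteria; ratios falling between the ranges, e.g. 0.65, print D):
awk '$1 < val {
    r = $1 / val
    if      (r >= 0.85 && r <= 0.9) print "A"
    else if (r >= 0.7  && r <= 0.8) print "B"
    else if (r >= 0.5  && r <= 0.6) print "C"
    else                            print "D"
    exit
} { val = $1 }' filename.csv
With the sample data (327/340 = 0.961765) this prints D.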
Here's one in GNU awk using switch, because I haven't used it in a while:
awk '
$1<p {
    s = sprintf("%.1f", $1/p)
    switch (s) {
    case "0.9":    # if comparing to values ranged [0.9-1.0[ use /0.9/
        print "A"  # ... in which case (no pun) you don't need sprintf
        break
    case "0.8":
        print "B"
        break
    case "0.7":
        print "C"
        break
    default:
        print "D"
    }
    exit
}
{ p=$1 }' file
D
Other awks using if:
awk '
$1<p {
    # s = sprintf("%.1f", $1/p)   # s is not rounded anymore
    s = $1/p
    # if (s==0.9)                 # if you want rounding,
    #     print "A"               # uncomment and edit all ifs to resemble
    if (s~/0.9/)
        print "A"
    else if (s~/0.8/)
        print "B"
    else if (s~/0.7/)
        print "C"
    else
        print "D"
    exit
}
{ p=$1 }' file
A
This is an alternative approach, based on an earlier revision of the question that compared $1/val against the fixed numbers 0.9, 0.7 and 0.6.
This solution will not work with ranges like ($1/val) >= 0.85 and <= 0.9, as clarified later.
awk 'BEGIN{crit[0.9]="A"; crit[0.7]="B"; crit[0.6]="C"} \
$1 < val { ss = substr($1/val,1,3); if (ss in crit) { print crit[ss] } else { print "D" }; exit } { val=$1 }' file
A
This technique is based on checking whether the truncated value of $1/val belongs to a predefined array loaded with the corresponding messages.
Let me expand the code for better understanding:
awk 'BEGIN{crit[0.9]="A"; crit[0.7]="B"; crit[0.6]="C"}  # Define the criteria array: your criteria values are the keys, the messages you want to print are the values.
$1 < val {
    ss = substr($1/val,1,3)   # gets the first three chars of the result $1/val
    if (ss in crit) {         # checks if those three chars are a key of the array crit declared in BEGIN
        print crit[ss]        # if so, print its value
    }
    else {
        print "D"             # if not, print D
    }
    exit
}
{ val=$1 }' file
Using substr we get the first three chars of the result $1/val:
for $1/val = 0.961765 using substr($1/val,1,3) returns 0.9
If you want to make comparisons based on two decimals like 0.96 then change substr like substr($1/val,1,4).
In this case you need to provide the corresponding comparison entries in the crit array, i.e. crit[0.96]="A".
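A quick one-off check of what the truncation returns for the sample ratio:
awk 'BEGIN { print substr(327/340, 1, 3), substr(327/340, 1, 4) }'   # prints: 0.9 0.96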

awk reading in values

Hello, the following code is what I use to split a file:
BEGIN{body=0}
!body && /^\/\/$/ {body=1}
body && /^\[/ {print > "first_"FILENAME}
body && /^pos/{$1="";print > "second_"FILENAME}
body && /^[01]+/ {print > "third_"FILENAME}
body && /^\[[0-9]+\]/ {
    print > "first_"FILENAME
    print substr($0, 2, index($0,"]")-2) > "fourth_"FILENAME
}
The file looks like this:
header
//
SeqT: {"POS-s":174.683, "time":0.0130084}
SeqT: {"POS-s":431.49, "time":0.0221447}
[2.04545e+2]:0.00843832,469:0.0109533):0.00657864,((((872:0.00120503,((980:0.0001);
[29]:((962:0.000580339,930:0.000580339):0.00543993);
absolute:
gthcont: 5 4 2 1 3 4 543 5 67 657 78 67 8 5645 6
01010010101010101010101010101011111100011
1111010010010101010101010111101000100000
00000000000000011001100101010010101011111
The problem is that for the fourth file (print substr($0, 2, index($0,"]")-2) > "fourth_"FILENAME) a number in scientific notation with an 'e' does not get through; it only works when the number is written without it. How can I change the awk to also capture numbers like 2.7e+7?
The problem is you're trying to match E notation when your regex is looking for integers only.
Instead of:
/^\[[0-9]+\]/
use something like:
/^\[[0-9]+(\.[0-9]+(e[+-]?[0-9]+)?)?\]/
This will match positive integers, floats, and E notation wrapped in square brackets at the start of the line.
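As a quick sanity check, the two bracketed lines from the sample file (shortened here) both pass the new pattern:
printf '%s\n' '[2.04545e+2]:0.00843832' '[29]:0.000580339' |
awk '/^\[[0-9]+(\.[0-9]+(e[+-]?[0-9]+)?)?\]/ { print "matched:", $0 }'
# matched: [2.04545e+2]:0.00843832
# matched: [29]:0.000580339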
