Bash script for parsing and processing specific lines of a text file - bash
I have the following text file
40 timesteps took 58.320842 seconds
greetings 0
80 timesteps took 58.048400 seconds
greetings 0
120 timesteps took 59.459687 seconds
greetings 0
What I would like to do is parse only the lines containing the seconds, add them together and print out the final result.
How can I do that?
Thank you in advance.
awk is well suited for this type of processing.
To deal with floating point precision, you can use printf with a format-string for each variable involved.
There is also another way, which sets the format string for every number that print outputs. The controlling built-in variable is OFMT; awk applies it (internally via sprintf) whenever print has to convert a non-integer number to a string. See Built-in Variables That Control awk.
#!/bin/bash
file="$1" # $1 is the 1st command line parameter
awk -vOFMT="%.6f" '/ took /{ secs+=$4 } END{ print secs }' "$file"
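To see what the output format changes, here is a minimal sketch that feeds the sample lines from the question to awk through a here-document. The function names sample, sum_default and sum_fixed are made up for this illustration.

```shell
#!/bin/bash
# Sample input from the question, inlined so the sketch is self-contained.
sample() {
    cat <<'EOF'
40 timesteps took 58.320842 seconds
greetings 0
80 timesteps took 58.048400 seconds
greetings 0
120 timesteps took 59.459687 seconds
greetings 0
EOF
}

# print with awk's default OFMT ("%.6g"): six significant digits.
sum_default() { awk '/ took /{ secs += $4 } END { print secs }'; }

# printf with an explicit format: six digits after the decimal point.
sum_fixed() { awk '/ took /{ secs += $4 } END { printf "%.6f\n", secs }'; }

sample | sum_default   # 175.829
sample | sum_fixed     # 175.828929
```

The default output silently drops digits, which is why either OFMT or an explicit printf format is worth setting when the precision matters.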
Using sed is more involved, because it cannot do any calculations and even bash itself cannot do floating point arithmetic, so you need to use something like awk or bc in any case.
If you really want to use sed:
#!/bin/bash
file="$1" # $1 is the 1st command line parameter
{ sed -nr 's/.* took ([0-9.]+).*/\1+/p' "$file" |tr -d '\n'; echo 0; } |bc
You can use a simple shell command:
grep timesteps <file-name> | awk '{x += $4} END{printf("%.5f\n", x)}'
Change the number in the printf statement to your preferred output precision.
The awk solutions are good answers. For fun, here is a Ruby answer...
ruby -e 'puts readlines.inject(0) { |m, v| m += v.split[3].to_f }' < file
...or perhaps...
ruby -e 'puts readlines.map { |x| x.split[3].to_f }.reduce(&:+)' < file
...to pass the file as a parameter to a script...
#!/usr/bin/ruby
puts $<.map { |x| x.split[3].to_f }.reduce(&:+)
Related
Replace a string with a random number for every line, in every file, in a directory in Bash
#!/bin/bash
for file in ~/tdg/*.TXT
do
    while read p; do
        randvalue=`shuf -i 1-99999 -n 1`
        sed -i -e "s/55555/${randvalue}/" $file
    done < $file
done
This is my script. I'm attempting to replace 55555 with a different random number every time I find it. This currently works, but it replaces every instance of 55555 with the same random number. I have attempted to replace $file at the end of the sed command with $p but that just blows up.
Really though, even if I get to the point where each instance on the same line gets the same random number, but a new random number is used for each line, then I'll be happy.
EDIT: I should have specified this. I would like to actually save the results of the replace in the file, rather than just printing the results to the console.
EDIT: The final working version of my script after JNevill's fantastic help:
#!/bin/bash
for file in ~/tdg/*.TXT
do
    while read p; do
        gawk '{$0=gensub(/55555/, int(rand()*99999), "g", $0)}1' $file > ${file}.new
    done < $file
    mv -f ${file}.new $file
done
Since doing this in sed gets pretty awful pretty quickly, you may want to switch over to awk to perform this:
awk '{$0=gensub(/55555/, int(rand()*99999), "g", $0)}1' $file
Using this, you can remove the inner loop, as awk already runs across the entire file line by line. You could also swap out the entire script and feed the wildcard filename to awk directly:
awk '{$0=gensub(/55555/, int(rand()*99999), "g", $0)}1' ~/tdg/*.TXT
This is how to REALLY do what you're trying to do with GNU awk:
awk -i inplace '{ while(sub(/55555/,int(rand()*99999)+1)); print }' ~/tdg/*.TXT
No shell loops or temp files required and it WILL replace every 55555 with a different random number within and across all files.
With other awks it'd be:
seed="$RANDOM"
for file in ~/tdg/*.TXT; do
    seed=$(awk -v seed="$seed" '
        BEGIN { srand(seed) }
        { while(sub(/55555/,int(rand()*99999)+1)); print > "tmp" }
        END { print int(rand()*99999)+1 }
    ' "$file") && mv tmp "$file"
done
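For anyone who wants to try the per-occurrence idea without touching real files, here is a minimal sketch that works on standard input with any POSIX awk. It is a variation, not the answer's exact loop: it uses match() and builds the output left to right, so text that has already been replaced is never rescanned. The function name replace_each and the seed argument are assumptions made for the example.

```shell
#!/bin/bash
# Replace every 55555 on stdin with its own random number in 1..99999.
# $1 is an optional seed (hypothetical parameter, defaults to 1).
replace_each() {
    awk -v seed="${1:-1}" '
        BEGIN { srand(seed) }
        {
            out = ""
            rest = $0
            # Consume the line one match at a time.
            while (match(rest, /55555/)) {
                n = int(rand() * 99999) + 1
                out = out substr(rest, 1, RSTART - 1) n
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out rest
        }'
}

printf '%s\n' 'xyz-55555-55555-__+' 'no match here' | replace_each "$RANDOM"
```

Lines without 55555 pass through unchanged, and two occurrences on the same line draw two separate numbers.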
A variation on JNevill's solution that generates a different set of random numbers every time you run the script ... A sample data file: $ cat grand.dat abc def 55555 xyz-55555-55555-__+ 123-55555-55555-456 987-55555-55555-.2. .+.-55555-55555-==* And the script: $ cat grand.awk { $0=gensub(/55555/,int(rand()*seed),"g",$0); print } gensub(...) : works same as Nevill's answer, while we'll mix up the rand() multiplier by using our seed value [you can throw any numbers in here you wish to help determine size of the resulting value] ** keep in mind that this will replace all occurrences of 55555 on a single line with the same random value Script in action: $ awk -f grand.awk seed=${RANDOM} grand.dat abc def 6939 xyz-8494-8494-__+ 123-24685-24685-456 987-4442-4442-.2. .+.-17088-17088-==* $ awk -f grand.awk seed=${RANDOM} grand.dat abc def 4134 xyz-5060-5060-__+ 123-14706-14706-456 987-2646-2646-.2. .+.-10180-10180-==* $ awk -f grand.awk seed=${RANDOM} grand.dat abc def 4287 xyz-5248-5248-__+ 123-15251-15251-456 987-2744-2744-.2. .+.-10558-10558-==* seed=$RANDOM : have the OS generate a random int for us and pass into the awk script as the seed variable
Need To Generate New Random Numbers For Each Sed Substitution Used In "While Loop"
I'm using sed to substitute a random 10 digit string of numbers for a certain field in a file, which I can successfully do. However, the same random 10 digit string of numbers is used for each substitution sed performs, which is unacceptable in this case. I need a new random 10 digit string of numbers for every substitution sed performs. Here's what I have so far:
#!/bin/bash
#
#
random_number()
{
    for i in {1}; do tr -c -d 0-9 < /dev/urandom | head -c 10; done
}
while read line
do
    sed -E "s/[<]FITID[>][[:digit:]]+/<FITID>$(random_number)/g"
done<~/Desktop/FITIDTEST.QFX
Here's a sample of what the original FITIDTEST.QFX file looks like:
<FITID>1266821191
<FITID>1267832241
<FITID>1268070393
<FITID>1268565193
<FITID>1268882385
<FITID>1268882384
And here is the output after executing the script:
<FITID>4270240286
<FITID>4270240286
<FITID>4270240286
<FITID>4270240286
<FITID>4270240286
<FITID>4270240286
I need those 10 digit numbers to be different for each field. I thought the "while loop" would force sed to call the random_number() function each time, but apparently it's called once and the value is stored and used repeatedly. Is there any way to avoid that? Any help is greatly appreciated!
Your sed has no input of its own, so it reads everything left on the loop's standard input in one go and replaces all the matching lines, not just one. $(random_number) is expanded only once for that single sed invocation, which is why you see the same number in every replacement. Feed sed one line at a time instead:
while read line; do
    sed -E "/<FITID>/s/<FITID>[[:digit:]]+/<FITID>$(random_number)/" <<< "$line"
done < ~/Desktop/FITIDTEST.QFX > _tmp_
Output:
cat _tmp_
<FITID>9974823224
<FITID>1524680591
<FITID>7433495381
<FITID>6642730759
<FITID>9653629434
<FITID>1325816974
Just use awk:
$ cat tst.awk
BEGIN { srand() }
{
    sub(/[0-9]+/,sprintf("%010d",rand()*10000000000))
    print
}
$ awk -f tst.awk file
<FITID>3730584119
<FITID>1473036092
<FITID>8390375691
<FITID>6700634479
<FITID>8379256766
<FITID>6583696062
$ awk -f tst.awk file
<FITID>7844627153
<FITID>0141034890
<FITID>9714288799
<FITID>0911892354
<FITID>8916456168
<FITID>4187598430
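A cautious variation on the same idea, in case the awk at hand does not format a value as large as 10000000000 reliably with %d (an assumption about older or minimal implementations, not something verified here): build the ten digits from two five-digit halves. The function name randomize_fitid and its seed argument are invented for this sketch.

```shell
#!/bin/bash
# Replace the first run of digits on each stdin line with a random
# 10-digit string. $1 is an optional seed (hypothetical parameter).
randomize_fitid() {
    awk -v seed="${1:-$RANDOM}" '
        BEGIN { srand(seed) }
        {
            # Two 5-digit halves keep every %d argument below 100000.
            id = sprintf("%05d%05d", int(rand() * 100000), int(rand() * 100000))
            sub(/[0-9]+/, id)
            print
        }'
}

printf '<FITID>1266821191\n<FITID>1267832241\n' | randomize_fitid
```

As with the original, leading zeros are kept, so every replacement is exactly ten characters long.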
read line from file and save them in a comma separated string to a variable
I want to read lines from a text file and save them in a variable.
cat ${1} | while read name; do
    namelist=${name_list},${name}
done
The file looks like this:
David
Kevin
Steve
etc.
and I want to get this output instead:
David, Kevin, Steve etc.
and save it to the variable ${name_list}
The command:
$ tr -s '\n ' ',' < sourcefile.txt   # Replace newlines and spaces with [,]
This will likely return a , as the last character (and potentially the first). To shave off the comma(s) and return a satisfying result:
$ tmp=$(tr -s '\n ' ',' < sourcefile.txt)   # store the previous result
$ tmp=${tmp%,}                              # shave off the last comma
$ name_list=${tmp#,}                        # shave off any first comma
EDIT: This solution runs 44% faster and yields consistent and valid results across all Unix platforms.
# This solution
python -mtimeit -s 'import subprocess' "subprocess.call('tmp=$(tr -s "\n " "," < input.txt);echo ${tmp%,} >/dev/null',shell = True)"
100 loops, best of 3: 3.71 msec per loop
# Highest voted:
python -mtimeit -s 'import subprocess' "subprocess.call('column input.txt | sed "s/\t/,/g" >/dev/null',shell = True)"
100 loops, best of 3: 6.69 msec per loop
name_list=""
for name in `cat file.txt`
do
    name_list="$name_list,$name"
done
EDIT: this script leaves a "," at the beginning of name_list. There are a number of ways to fix this. For example, in bash this should work:
name_list=""
for name in `cat file.txt`; do
    if [[ -z $name_list ]]; then
        name_list="$name"
    else
        name_list="$name_list,$name"
    fi
done
RE-EDIT: so, thanks to the legitimate complaints of Fredrik:
name_list=""
while read name
do
    if [[ -z $name_list ]]; then
        name_list="$name"
    else
        name_list="$name_list,$name"
    fi
done < file.txt
Using column, and sed: namelist=$(column input | sed 's/\t/,/g')
variable=`perl -lne 'next if(/^\s*$/);if($a){$a.=",$_"}else{$a=$_};END{print $a}' your_file`
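One reason the loop in the question never fills the variable is that `cat file | while ...` runs the loop in a subshell, so the assignments are lost when the pipeline ends. A minimal sketch of two ways around that follows; join_lines is a hypothetical helper name, and the sample names are piped in so nothing depends on a real file.

```shell
#!/bin/bash
# Hypothetical helper: join the non-empty lines of stdin with ", ".
# The result is printed, so the caller captures it with $(...).
join_lines() {
    local name list=""
    while IFS= read -r name; do
        [ -z "$name" ] && continue
        # Prepend "list, " only when list already has content.
        list=${list:+$list, }$name
    done
    printf '%s\n' "$list"
}

name_list=$(printf 'David\nKevin\nSteve\n' | join_lines)
echo "$name_list"    # David, Kevin, Steve

# paste can do the joining when a bare comma is enough.
compact=$(printf 'David\nKevin\nSteve\n' | paste -sd, -)
echo "$compact"      # David,Kevin,Steve
```

Capturing the function's output with command substitution keeps the value in the calling shell, which the piped loop in the question cannot do.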
How to convert HHMMSS to HH:MM:SS Unix?
I tried to convert HHMMSS to HH:MM:SS and I am able to convert it successfully, but my script takes 2 hours to complete because of the file size. Is there any better (faster) way to complete this task?
Data File data.txt
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,071600,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,072600,072600,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073200,073200,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073500,073500,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,073700,073700,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,073900,073900,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,074400,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,090200,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,090900,090900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,091500,091500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,091900,091900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092500,092500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092900,092900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,093200,093200,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,093500,093500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,094500,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,170100,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,170400,170400,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,170700,170700,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171000,171000,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171500,171500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,171900,171900,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172500,172500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172900,172900,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,173500,173500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,174100,,
My code: script.sh
#!/bin/bash
awk -F"," '{print $5}' Data.txt > tmp.txt # print the 5th comma-separated field of every line, i.e. all Numbers will be placed into tmp.txt
sort tmp.txt | uniq -d > Uniqe_number.txt # unique values are stored in Uniqe_number.txt
rm tmp.txt # removes tmp file
while read line; do
    echo $line
    cat Data.txt | grep ",$line," > Numbers/All/$line.txt # grep Number and create files individually
    awk -F"," '{print $5","$4","$7","$8","$9","$10","$11}' Numbers/All/$line.txt > Numbers/All/tmp_$line.txt
    mv Numbers/All/tmp_$line.txt Numbers/Final/Final_$line.txt
done < Uniqe_number.txt
ls Numbers/Final > files.txt
dos2unix files.txt
bash time_replace.sh
When you execute the above script it will call the time_replace.sh script.
My code for time_replace.sh
#!/bin/bash
for i in `cat files.txt`
do
    while read aline
    do
        TimeDep=`echo $aline | awk -F"," '{print $6}'`
        #echo $TimeDep
        finalTimeDep=`echo $TimeDep | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'`
        #echo $finalTimeDep
        ##########
        TimeAri=`echo $aline | awk -F"," '{print $7}'`
        #echo $TimeAri
        finalTimeAri=`echo $TimeAri | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'`
        #echo $finalTimeAri
        sed -i 's/',$TimeDep'/',$finalTimeDep'/g' Numbers/Final/$i
        sed -i 's/',$TimeAri'/',$finalTimeAri'/g' Numbers/Final/$i
        ############################
    done < Numbers/Final/$i
done
Any better solution? Appreciate any help.
Thanks
Sri
If there's a large quantity of files, then the pipelines are probably what will hurt performance more than anything else. Processes can be cheap, but when you're doing a huge amount of processing, cutting down the number of times you pass data through a pipeline can pay dividends. So you're probably going to be better off writing the entire script in awk (or perl). For example, awk can send output to an arbitrary file, so the while loop in your first file could be replaced with an awk script that does this. You also don't need to use a temporary file.
I assume the sorting is just for tracking progress easily as you know how many numbers there are. But if you don't care for the sorting, you can simply do this:
#!/bin/sh
awk -F ',' '
{
    # one output file per value of field 5; the file name must be a string expression
    print $5","$4","$7","$8","$9","$10","$11 > ("Numbers/Final/Final_" $5 ".txt")
}' datafile.txt
ls Numbers/Final > files.txt
Alternatively, if you need to sort you can do sort -t, -k5,5 -k4,4 -k10,10 (or whichever fields your sort keys actually need to be).
As for formatting the datetime, awk also does functions, so you could actually have an awk script that looks like this. This would replace both of your scripts above whilst retaining the same functionality (at least, as far as I can make out with a quick analysis). Note: untested, so it may contain vague syntax errors.
#!/usr/bin/awk -f
BEGIN { FS="," }
function formattime (t) {
    # leave empty fields alone instead of turning them into "::"
    return (t == "") ? t : substr(t,1,2)":"substr(t,3,2)":"substr(t,5,2)
}
{
    print $5","$4","$7","$8","$9","formattime($10)","formattime($11) > ("Numbers/Final/Final_" $5 ".txt")
}
which you can save, chmod 700, and call directly as:
./dostuff.awk filename
Other awk options include changing fields in situ, so if you want to maintain the entire original file but with formatted datetimes, you can do a modification of the above. Set OFS="," alongside FS in the BEGIN block (so the rebuilt record keeps its commas) and change the print block to:
{
    $10=formattime($10)
    $11=formattime($11)
    print $0
}
If this doesn't do everything you need it to, hopefully it gives some ideas that will help the code.
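To make the in-place variant concrete, here is a minimal single-pass sketch, assuming the times sit in the 10th and 11th comma-separated fields as in the sample data. The wrapper name format_times and the helper fmt are made up for this example; fields that are not exactly six digits are left untouched.

```shell
#!/bin/bash
# Rewrite HHMMSS as HH:MM:SS in fields 10 and 11 of comma-separated input.
# Reads the files given as arguments, or stdin when there are none.
format_times() {
    awk '
        BEGIN { FS = OFS = "," }
        function fmt(t) {
            # Only touch values that are exactly six digits.
            if (t ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/)
                return substr(t, 1, 2) ":" substr(t, 3, 2) ":" substr(t, 5, 2)
            return t
        }
        { $10 = fmt($10); $11 = fmt($11); print }
    ' "$@"
}

printf '%s\n' \
    '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,071600,' \
    '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200,' | format_times
```

Because OFS is set to a comma as well, assigning to $10 and $11 rebuilds each record with its commas (including the trailing one) intact.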
It's not clear what all your sorting and uniq-ing is for. I'm assuming your data file has only one entry per line, and you need to change the 10th and 11th comma-separated fields from HHMMSS to HH:MM:SS.
while IFS=, read -r -a line ; do
    # fields 1-9 are copied unchanged (array indices 0-8)
    echo -n "${line[0]},${line[1]},${line[2]},${line[3]},"
    echo -n "${line[4]},${line[5]},${line[6]},${line[7]},"
    echo -n "${line[8]},"
    # 10th field (index 9)
    if [ -n "${line[9]}" ]; then
        echo -n "${line[9]:0:2}:${line[9]:2:2}:${line[9]:4:2}"
    fi
    echo -n ,
    # 11th field (index 10)
    if [ -n "${line[10]}" ]; then
        echo -n "${line[10]:0:2}:${line[10]:2:2}:${line[10]:4:2}"
    fi
    echo ,
done < data.txt
The operative part is the ${variable:offset:length} construct that lets you extract substrings out of a variable.
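The substring expansion on its own can be tried in isolation. A small sketch, with to_colon_time as a hypothetical helper name; values that are not exactly six digits are passed through unchanged.

```shell
#!/bin/bash
# Turn HHMMSS into HH:MM:SS using ${variable:offset:length}.
to_colon_time() {
    local t=$1
    if [[ $t =~ ^[0-9]{6}$ ]]; then
        # offset 0, 2 and 4, two characters each
        printf '%s\n' "${t:0:2}:${t:2:2}:${t:4:2}"
    else
        printf '%s\n' "$t"
    fi
}

to_colon_time 072200   # 07:22:00
```

Offsets are zero-based, so ${t:2:2} is the third and fourth character of the value.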
In Perl, that's close to child's play:
#!/usr/bin/env perl
use strict;
use warnings;
use English( -no_match_vars );
local($OFS) = ",";
while (<>)
{
    my(@F) = split /,/;
    $F[9] =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[9];
    $F[10] =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[10];
    print @F;
}
If you don't want to use English, you can write local($,) = ","; instead; it controls the output field separator, choosing to use comma. The code reads each line in the file, splits it up on the commas, takes the last two fields, counting from zero, and (if they're not empty) inserts colons in between the pairs of digits. I'm sure a 'Code Golf' solution would be made a lot shorter, but this is semi-legible if you know any Perl.
This will be quicker by far than the script, not least because it doesn't have to sort anything, but also because all the processing is done in a single process in a single pass through the file. Running multiple processes per line of input, as in your code, is a performance disaster when the files are big.
The output on the sample data you gave is:
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,07:16:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:22:00,07:22:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,07:26:00,07:26:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:32:00,07:32:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:35:00,07:35:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,07:37:00,07:37:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,07:39:00,07:39:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:44:00,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,09:02:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:09:00,09:09:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:15:00,09:15:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,09:19:00,09:19:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:25:00,09:25:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:29:00,09:29:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,09:32:00,09:32:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,09:35:00,09:35:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:45:00,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,17:01:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,17:04:00,17:04:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,17:07:00,17:07:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:10:00,17:10:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:15:00,17:15:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,17:19:00,17:19:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:25:00,17:25:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:29:00,17:29:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:35:00,17:35:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:41:00,,
Setting a BASH environment variable directly in AWK (in an AWK one-liner)
I have a file that has two columns of floating point values. I also have a C program that takes a floating point value as input and returns another floating point value as output. What I'd like to do is the following: for each row in the original, execute the C program with the value in the first column as input, and then print out the first column (unchanged) followed by the second column minus the result of the C program.
As an example, suppose c_program returns the square of the input and behaves like this:
$ c_program 4
16
$
and suppose data_file looks like this:
1 10
2 11
3 12
4 13
What I'd like to return as output, in this case, is
1 9
2 7
3 3
4 -3
To write this in really sketchy pseudocode, I want to do something like this:
awk '{print $1, $2 - `c_program $1`}' data_file
But of course, I can't just pass $1, the awk variable, into a call to c_program. What's the right way to do this, and preferably, how could I do it while still maintaining the "awk one-liner"? (I don't want to pull out a sledgehammer and write a full-fledged C program to do this.)
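You can just do everything in awk. Build the command string, read its output with getline, and close the command so awk does not keep one pipe open per row:
awk '{cmd="c_program "$1; cmd|getline l; close(cmd); print $1,$2-l}' file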
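For anyone who wants to try the getline approach without the real c_program, here is a minimal sketch. It assumes a stand-in command that squares an integer, written to a temporary script; the function name subtract_program_output and the file square.sh are hypothetical.

```shell
#!/bin/bash
# For each "x y" row on stdin, run "<prog> x" and print x and y minus its output.
# $1 is the command to run (a stand-in for c_program in this sketch).
subtract_program_output() {
    awk -v prog="$1" '{
        cmd = prog " " $1
        # getline returns 1 when a line was read from the command.
        if ((cmd | getline result) > 0)
            print $1, $2 - result
        close(cmd)    # release the pipe before the next row
    }'
}

# Hypothetical stand-in for c_program: prints the square of its argument.
tmp=$(mktemp -d)
printf '%s\n' 'echo $(( $1 * $1 ))' > "$tmp/square.sh"

printf '1 10\n2 11\n3 12\n4 13\n' | subtract_program_output "sh $tmp/square.sh"
rm -rf "$tmp"
```

With the squaring stand-in this prints the rows the question asks for (1 9, 2 7, 3 3, 4 -3). Checking the return value of getline avoids printing a bogus row when the command produces no output.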
This shows how to execute a command in awk:
ls | awk '/^a/ {system("ls -ld " $1)}'
You could use a bash script instead:
while read line
do
    FIRST=`echo $line | cut -d' ' -f1`
    SECOND=`echo $line | cut -d' ' -f2`
    OUT=`expr $SECOND \* 4`
    echo $FIRST $OUT `expr $OUT - $SECOND`
done
The shell is a better tool for this, using a little-used feature. The shell variable IFS (the Internal Field Separator) is what sh uses to split words when parsing; it defaults to <Space><Tab><Newline>, which is why ls foo is interpreted as two words. When set is given arguments not beginning with -, it sets the positional parameters of the shell to the contents of the arguments as split via IFS, thus:
#!/bin/sh
while read line ; do
    set $line
    subtrahend=`c_program $1`
    echo $1 `expr $2 - $subtrahend`
done < data_file
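The splitting step can be tried on its own. A minimal sketch follows, with split_fields as a hypothetical helper; it uses set -- so a line that starts with - is not mistaken for an option, and turns globbing off while splitting so a * in the data is not expanded.

```shell
#!/bin/bash
# Split $1 on the delimiter characters in $2 and report the first two fields.
split_fields() {
    local IFS=$2      # delimiter applies inside this function only
    set -f            # no filename expansion while splitting
    set -- $1         # unquoted on purpose: word splitting does the work
    set +f
    echo "$# fields: first=$1 second=$2"
}

split_fields "4 13" " "      # 2 fields: first=4 second=13
split_fields "a,b,c" ","     # 3 fields: first=a second=b
```

Inside a function, set -- replaces the function's own positional parameters, so the caller's $1 and $2 are left alone.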
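Pure Bash, without using any external executables other than your program (note that Bash arithmetic is integer-only):
#!/bin/bash
while read -r num1 num2
do
    # second column minus the program's output for the first column
    (( result = num2 - $(c_program "$num1") ))
    echo "$num1 $result"
done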
As others have pointed out, awk is not well equipped for this job. Here is a suggestion in bash:
#!/bin/bash
data_file=$1
while read -r column_1 column_2 the_rest
do
    # second column minus the program's output for the first column
    ((result=column_2-$(c_program "$column_1")))
    echo $column_1 $result "$the_rest"
done < "$data_file"
Save this to a file, say myscript.sh, then invoke it as:
bash myscript.sh data_file
The read command reads each line from the data file (which was redirected to the standard input) and assigns the first 2 columns to the $column_1 and $column_2 variables. The rest of the line, if there is any, is stored in $the_rest. Next, I calculate the result based on your requirements and print out the line. Note that I surround $the_rest with quotes to preserve spacing. Failure to do so will result in multiple spaces in the input file being squeezed into one.
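Since the question mentions floating point values and the shell's (( )) arithmetic only handles integers, here is a minimal sketch of the same read loop with the subtraction delegated to awk. The c_program function below is a hypothetical stand-in that squares its argument, and subtract_rows is an invented name.

```shell
#!/bin/bash
# Hypothetical stand-in for the real c_program: prints the square of $1.
c_program() { awk -v x="$1" 'BEGIN { print x * x }'; }

# Read "x y ..." rows from stdin; print x followed by y - c_program(x).
subtract_rows() {
    local first second rest result
    while read -r first second rest; do
        result=$(c_program "$first")
        # awk performs the floating-point subtraction that (( )) cannot.
        awk -v f="$first" -v a="$second" -v b="$result" 'BEGIN { print f, a - b }'
    done
}

printf '1 10\n2 11\n3 12\n4 13\n' | subtract_rows
printf '1.5 10\n' | subtract_rows    # 1.5 7.75
```

This starts an extra awk per row, so for large inputs a single awk process that runs the program through getline is likely to be faster.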