Parsing out numbers from a string - BASH script - bash

I have a text file with the following contents:
QAM Mode : QAM-16
QAM Annex : Annex A
Frequency : 0 Hz
IF Frequency : 0 Hz
Fast Acquisition : 0
Receiver Mode : cable
QAM Lock : 1
FEC Lock : 1
Output PLL Lock : 0
Spectrum Inverted : 0
Symbol Rate : -1
Symbol Rate Error : 0
IF AGC Level (in units of 1/10 percent) : 260
Tuner AGC Level (in units of 1/10 percent) : 1000
Internal AGC Level (in units of 1/10 percent) : 0
SNR Estimate (in 1/100 dB) : 2260
**FEC Corrected Block Count (Since last tune or reset) : 36472114
FEC Uncorrected Block Count (Since last tune or reset) : 0
FEC Clean Block Count (Since last tune or reset) : 0**
Cumulative Reacquisition Count : 0
Uncorrected Error Bits Output From Viterbi (Since last tune or reset) : 0
Total Number Of Bits Output from Viterbi (Since last tune or reset) : 0
viterbi bit error rate (in 1/2147483648 th units) : 0
Carrier Frequency Offset (in 1/1000 Hz) : -2668000
Carrier Phase Offset (in 1/1000 Hz) : 0
**Good Block Count (Reset on read) : -91366870**
**BER Raw Count (Reset on read) : 0**
DS Channel Power (in 10's of dBmV units ) : -760
Channel Main Tap Coefficient : 11846
Channel Equalizer Gain Value in dBm : 9
**Post Rs BER : 2147483648
Post Rs BER Elapsed Time (in Seconds) : 0**
Interleave Depth : 1
I need to parse the numbers from the bolded lines using a bash script but I haven't been able to do this with the command set I have available. This is my first time every using BASH scripts and the searches I've found that could help used some grep, sed, and cut options that weren't available. The options I have are listed below:
grep
Usage: grep [-ihHnqvs] PATTERN [FILEs...]
Search for PATTERN in each FILE or standard input.
Options:
-H prefix output lines with filename where match was found
-h suppress the prefixing filename on output
-i ignore case distinctions
-l list names of files that match
-n print line number with output lines
-q be quiet. Returns 0 if result was found, 1 otherwise
-v select non-matching lines
-s suppress file open/read error messages
sed
BusyBox v1.00-rc3 (00:00) multi-call binary
Usage: sed [-efinr] pattern [files...]
Options:
-e script add the script to the commands to be executed
-f scriptfile add script-file contents to the
commands to be executed
-i edit files in-place
-n suppress automatic printing of pattern space
-r use extended regular expression syntax
If no -e or -f is given, the first non-option argument is taken as the sed
script to interpret. All remaining arguments are names of input files; if no
input files are specified, then the standard input is read. Source files
will not be modified unless -i option is given.
awk
BusyBox v1.00-rc3 (00:00) multi-call binary
Usage: awk [OPTION]... [program-text] [FILE ...]
Options:
-v var=val assign value 'val' to variable 'var'
-F sep use 'sep' as field separator
-f progname read program source from file 'progname'
Can someone please help me with this? Thanks!

AWK can do that for you:
awk '/^(FEC.*Block|Good Block|BER|Post)/{print $NF}' textfile

grep -e "^FEC " -e "^Good Block" -e "BER" file.txt | awk '{print $NF}'
grep: Match lines that: start with FEC or start with Good Block or contains BER
awk: Print the last space-separated field in each line

If you have the right grep, you can do this with grep alone, using a regex look-ahead:
$ /bin/grep -Po "(?<=Post Rs BER : )(.+)" data.txt
2147483648
$
I got the inspiration for this here

In addition, you can do this with a pure bash one-liner, no awk, sed, grep, or other helpers:
$ { while read line; do if [[ $line =~ "Post Rs BER : (.*)$" ]]; then echo ${BASH_REMATCH[1]}; fi; done; } < data.txt
2147483648
$
or
$ cat data.txt | { while read line; do if [[ $line =~ "Post Rs BER : (.*)$" ]]; then echo ${BASH_REMATCH[1]}; fi; done; }
2147483648
$

Related

Processing of the data from a big number of input files

My AWK script processes each log file from the folder "${results}, from which it looks for a pattern (a number occurred on the first line of ranking table) and then print it in one line together with the filename of the log:
awk '$1=="1"{sub(/.*\//,"",FILENAME); sub(/\.log/,"",FILENAME); printf("%s: %s\n", FILENAME, $2)}' "${results}"/*_rep"${i}".log
Here is the format of each log file, from which the number
-9.14
should be taken
AutoDock Vina v1.2.3
#################################################################
# If you used AutoDock Vina in your work, please cite: #
# #
# J. Eberhardt, D. Santos-Martins, A. F. Tillack, and S. Forli #
# AutoDock Vina 1.2.0: New Docking Methods, Expanded Force #
# Field, and Python Bindings, J. Chem. Inf. Model. (2021) #
# DOI 10.1021/acs.jcim.1c00203 #
# #
# O. Trott, A. J. Olson, #
# AutoDock Vina: improving the speed and accuracy of docking #
# with a new scoring function, efficient optimization and #
# multithreading, J. Comp. Chem. (2010) #
# DOI 10.1002/jcc.21334 #
# #
# Please see https://github.com/ccsb-scripps/AutoDock-Vina for #
# more information. #
#################################################################
Scoring function : vina
Rigid receptor: /home/gleb/Desktop/dolce_vita/temp/nsp5holoHIE.pdbqt
Ligand: /home/gleb/Desktop/dolce_vita/temp/active2322.pdbqt
Grid center: X 11.106 Y 0.659 Z 18.363
Grid size : X 18 Y 18 Z 18
Grid space : 0.375
Exhaustiveness: 48
CPU: 48
Verbosity: 1
Computing Vina grid ... done.
Performing docking (random seed: -1717804037) ...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
mode | affinity | dist from best mode
| (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1 -9.14 0 0
2 -9.109 2.002 2.79
3 -9.006 1.772 2.315
4 -8.925 2 2.744
5 -8.882 3.592 8.189
6 -8.803 1.564 2.092
7 -8.507 4.014 7.308
8 -8.36 2.489 8.193
9 -8.356 2.529 8.104
10 -8.33 1.408 3.841
It works OK for a moderate number of input log files (tested for up to 50k logs), but does not work for the case of big number of the input logs (e.g. with 130k logs), producing the following error:
./dolche_finito.sh: line 124: /usr/bin/awk: Argument list too long
How could I adapt the AWK script to be able processing any number of input logs?
If you get a /usr/bin/awk: Argument list too long then you'll have to control the number of "files" that you supply to awk; the standard way to do that efficiently is:
results=. # ???
i=00001 # ???
output= # ???
find "$results" -type f -name "*_rep$i.log" -exec awk '
FNR == 1 {
filename = FILENAME
sub(/.*\//,"",filename)
sub(/\.[^.]*$/,"",filename)
}
$1 == 1 { printf "%s: %s\n", filename, $2 }
' {} + |
LC_ALL=C sort -t':' -k2,2g > "$results"/ranking_"$output"_rep"$i".csv
edit: appended the rest of the chain as asked in comment
note: you might need to specify other predicates to the find command if you don't want it to search the sub-folders of $results recursively
Note that your error message:
./dolche_finito.sh: line 124: /usr/bin/awk: Argument list too long
is from your shell interpreting line 124 in your shell script, not from awk - you just happen to be calling awk at that line but it could be any other tool and you'd get the same error. Google ARG_MAX for more information on it.
Assuming printf is a builtin on your system:
printf '%s\0' "${results}"/*_rep"${i}".log |
xargs -0 awk '...'
or if you need awk to process all input files in one call for some reason and your file names don't contain newlines:
printf '%s' "${results}"/*_rep"${i}".log |
awk '
NR==FNR {
ARGV[ARGC++] = $0
next
}
...
'
If you're using GNU awk or some other awk that can process NUL characters as the RS and your input file names might contain newlines then you could do:
printf '%s\0' "${results}"/*_rep"${i}".log |
awk '
NR==FNR {
ARGV[ARGC++] = $0
next
}
...
' RS='\0' - RS='\n'
When using GNU AWK you might alter ARGC and ARGV to command GNU AWK to read additional files, consider following simple example, let filelist.txt content be
file1.txt
file2.txt
file3.txt
and content of these files to be respectively uno, dos, tres then
awk 'FNR==NR{ARGV[NR+1]=$0;ARGC+=1;next}{print FILENAME,$0}' filelist.txt
gives output
file1.txt uno
file2.txt dos
file3.txt tres
Explanation: when reading first file i.e. where number of row in file (FNR) is equal number of row globally (NR) I add to ARGV line as value under key being number of row plus one, as ARGV[1] is already filelist.txt and I increase ARGC by 1, I instruct GNU AWK to then go to next line so no other action is undertaken. For other files I print filename followed by whole line.
(tested in GNU Awk 5.0.1)

BASH: Performing decimal division on a column in file and printing result in another file

I have a file (in.txt) with the following columns:
# DM Sigma Time (s) Sample Downfact
78.20 7.36 134.200512 2096883 70
78.20 7.21 144.099904 2251561 70
78.20 9.99 148.872384 2326131 150
78.20 10.77 283.249664 4425776 45
I want to write a bash script to divide all values in column 'Time' by 0.5867, get a precision up to 2 decimal points and print out the resulting values in another file out.txt
I tried using bc/awk but it gives this error.
awk: cmd. line:1: fatal: division by zero attempted
awk: fatal: cannot open file `file' for reading (No such file or directory)
Could someone help me with this? Thanks.
This is the bash script that I attempted:
cat in.txt | while read DM Sigma Time Sample Downfact; do
echo "$DM $Sigma $Time $Sample $Downfact"
pperiod = 0.5867
awk -v n=$Time 'BEGIN {printf "%.2f\n", (n/$pperiod)}'
#echo "scale=2 ; $Time / $pperiod" | bc
#echo "$subint" > out.txt
done
I expected the script to divide column 'Time' with pperiod and get the result with a precision of 2 decimal places. This result should be printed to a file named out.txt
Lots of issues with current awk code:
need to pass in the value of the $pperiod variable
need to reference the Time column by is position ($3 in this case)
BEGIN{} block is applied before any input lines are processed and has nothing to do with processing of actual input lines
there is no code to perform processing on actual input lines
need to decide what to do in the case of a divide by zero scenario (in this case we'll default answer to 0.00)
NOTE: current code generates divide by zero error because $pperiod is an undefined (awk) variable which in turn defaults to 0
additionally, pperiod = 0.5867 is invalid bash syntax
One idea for fixing current issues:
pperiod=0.5867
awk -v pp="${pperiod}" 'NR>1 {printf "%.2f\n", (pp==0 ? 0 : ($3/pp))}' in.txt > out.txt
Where:
-v pp="${pperiod}" - assign awk variable pp the value of the bash variable "${pperiod}"
NR>1 - skip header line
NR>1 {printf "%.2f\n" ...}- for each input line, other than the header line, print the result of dividing theTimecolumn (aka$3) by the value of the awkvariablepp(which holds the value of thebashvariable"${pperiod}"`)
(pp==0 ? 0 : ($3/pp)) - if pp is equal 0 we print 0 else print result of $3/pp) (this keeps us from generating a divide by zero error)
NOTE: this also eliminates the need for the cat|while loop
This generates:
$ cat out.txt
228.74
245.61
253.75
482.78

Continuously-updated (running-count) output from a program reading from a pipeline

How can I get continuously-updated output from a program that's reading from a pipeline? For example, let's say that this program were a version of wc:
$ ls | running_wc
So I'd like this to output instantly, e.g.
0 0 0
and then every time a new output line is received, it'd update again, e.g.
1 2 12
2 4 24
etc.
Of course my command isn't really ls, it's a process that slowly outputs data... I'd actually love to dynamically have it count matches and non matches, and sum this info up on a single line, e.g,
$ my_process | count_matches error
This would constantly update a single line of output with the matching and non matching counts, e.g.
$ my_process | count_matches error
0 5
then later on it might look like so, since it's found 2 matches and 10 non matching lines.
$ my_process | count_matches error
2 10
dd will print out statistics if it receives a SIGUSR1 signal, but neither wc nor grep does that. You'll need to re-implement them, more or less.
count_matches() {
local pattern=$1
local matches=0 nonmatches=0
local line
while IFS= read -r line; do
if [[ $line == *$pattern* ]]; then ((++matches)); else ((++nonmatches)); fi
printf '\r%s %s' "$matches" "$nonmatches"
done
printf '\n'
}
Printing a carriage return \r each time causes the printouts to overwrite each other.
Most programs will switch from line buffering to full buffering when used in a pipeline. Your slow-running program should flush its output after each line to ensure the results are available immediately. Or if you can't modify it, you can often use stdbuf -oL to force programs that use C stdio to line buffer stdout.
stdbuf -oL my_process | count_matches error
Using awk. First we create the "my_process":
$ for i in {1..10} ; do echo $i ; sleep 1 ; done # slowly prints lines
The match counter:
$ awk 'BEGIN {
print "match","miss" # print header
m=0 # reset match count
}
{
if($1~/(3|6)/) # match is a 3 or 6 (for this output)
m++ # increment match count
print m,NR-m # for each record output match / miss counts
}'
Running it:
$ for i in {1..10} ; do echo $i ; sleep 1 ; done | awk 'BEGIN{print "match","miss";m=0}{if($1~/(3|6)/)m++;print m,NR-m}'
match miss
0 1
0 2
1 2
1 3
1 4
2 4
2 5
2 6
2 7
2 8

Greping multiple lines from one multiline output Bash

I have a command dumpsys power with this output:
POWER MANAGER (dumpsys power)
Power Manager State: mDirty=0x0
mWakefulness=Awake #
mWakefulnessChanging=false
mIsPowered=false
mPlugType=0
mBatteryLevel=67 #
mBatteryLevelWhenDreamStarted=0
mDockState=0
mStayOn=false #
mProximityPositive=false
mBootCompleted=true #
mSystemReady=true #
mHalAutoSuspendModeEnabled=false
mHalInteractiveModeEnabled=true
mWakeLockSummary=0x0
mUserActivitySummary=0x1
mRequestWaitForNegativeProximity=false
mSandmanScheduled=false
mSandmanSummoned=false
mLowPowerModeEnabled=false #
mBatteryLevelLow=false #
mLastWakeTime=134887327 (59454 ms ago) #
mLastSleepTime=134881809 (64972 ms ago) #
mLastUserActivityTime=134946670 (111 ms ago)
mLastUserActivityTimeNoChangeLights=134794061 (152720 ms ago)
mLastInteractivePowerHintTime=134946670 (111 ms ago)
mLastScreenBrightnessBoostTime=0 (134946781 ms ago)
mScreenBrightnessBoostInProgress=false
mDisplayReady=true #
mHoldingWakeLockSuspendBlocker=false
mHoldingDisplaySuspendBlocker=true
Settings and Configuration:
mDecoupleHalAutoSuspendModeFromDisplayConfig=false
mDecoupleHalInteractiveModeFromDisplayConfig=true
mWakeUpWhenPluggedOrUnpluggedConfig=true
mWakeUpWhenPluggedOrUnpluggedInTheaterModeConfig=false
mTheaterModeEnabled=false
mSuspendWhenScreenOffDueToProximityConfig=false
mDreamsSupportedConfig=true
mDreamsEnabledByDefaultConfig=true
mDreamsActivatedOnSleepByDefaultConfig=false
mDreamsActivatedOnDockByDefaultConfig=true
mDreamsEnabledOnBatteryConfig=false
mDreamsBatteryLevelMinimumWhenPoweredConfig=-1
mDreamsBatteryLevelMinimumWhenNotPoweredConfig=15
mDreamsBatteryLevelDrainCutoffConfig=5
mDreamsEnabledSetting=false
mDreamsActivateOnSleepSetting=false
mDreamsActivateOnDockSetting=true
mDozeAfterScreenOffConfig=true
mLowPowerModeSetting=false
mAutoLowPowerModeConfigured=false
mAutoLowPowerModeSnoozing=false
mMinimumScreenOffTimeoutConfig=10000
mMaximumScreenDimDurationConfig=7000
mMaximumScreenDimRatioConfig=0.20000005
mScreenOffTimeoutSetting=60000 #
mSleepTimeoutSetting=-1
mMaximumScreenOffTimeoutFromDeviceAdmin=2147483647 (enforced=false)
mStayOnWhilePluggedInSetting=0
mScreenBrightnessSetting=102
mScreenAutoBrightnessAdjustmentSetting=-1.0
mScreenBrightnessModeSetting=1
mScreenBrightnessOverrideFromWindowManager=-1
mUserActivityTimeoutOverrideFromWindowManager=-1
mTemporaryScreenBrightnessSettingOverride=-1
mTemporaryScreenAutoBrightnessAdjustmentSettingOverride=NaN
mDozeScreenStateOverrideFromDreamManager=0
mDozeScreenBrightnessOverrideFromDreamManager=-1
mScreenBrightnessSettingMinimum=10
mScreenBrightnessSettingMaximum=255
mScreenBrightnessSettingDefault=102
Sleep timeout: -1 ms
Screen off timeout: 60000 ms
Screen dim duration: 7000 ms
Wake Locks: size=0 Suspend Blockers: size=4
PowerManagerService.WakeLocks: ref count=0
PowerManagerService.Display: ref count=1
PowerManagerService.Broadcasts: ref count=0
PowerManagerService.WirelessChargerDetector: ref count=0
Display Power: state=ON #
I want to get the lines marked with # in a format of:
mScreenOffTimeoutSetting=60000
mDisplayReady=true
***
ScreenOfftimeoutSetting = 60000
DisplayReady = true
The commands output can vary from device to device and some of the lines might not be there or are in a different place. Thus if the searched line isn't there no errors should be generated.
It's not clear what you want. Aou can use sed to extract variables form the file and do whatever you want with them. Here's an example:
sed -n -e 's/^mSomeName=\(.*\)/newVariable=\1/p' -e 's/^mOtherName=.*+\(.*\)/newVariable2=\1/p' myFile
Explanation:
-n don't output anything per default
-e an expression follows. It's required since we have multiple expressions in place
s/^mSomeName=\(.*\)/newVariable=\1/p if file starts (^) with mSomeName= capture what follows (\(.*\)), replace the line with newVariable=\1, where \1 is what got captured, and print it out (p)
's/^mOtherName=.+(.)/newVariable2=\1/p' similar to the previous expression but will capture whatere comes after a + sign and print it behind newVariable2
This does something like:
$ sed -n -e 's/^mSomeName=\(.*\)/newVariable=\1/p' -e 's/^mOtherName=.*+\(.*\)/newVariable2=\1/p' <<<$'mSomeName=SomeValue\nmOtherName=OtherValue+Somethingelse'
newVariable=SomeValue
newVariable2=Somethingelse
<<<$'...' is a way of passing a string with linebreaks \n directly to the command in bash. You can replace it with a file. This command just outputs a string, nothing will get changed.
If you need them in bash variables use eval:
$ eval $(sed -n -e 's/^mSomeName=\(.*\)/newVariable=\1/p' -e 's/^mOtherName=.*+\(.*\)/newVariable2=\1/p' <<<$'mSomeName=SomeValue\nmOtherName=OtherValue+Somethingelse')
$ echo newVariable=$newVariable - newVariable2=$newVariable2
newVariable=SomeValue - newVariable2=Somethingelse
eval will execute the string which in this case set the variable values:
$ eval a=1
$ echo $a
1
If you want to just use Grep command, you can use -A (After) and -B (Before) options and pipes.
This is a exemple with 2 lines.
File test.txt :
test
aieauieaui
test
caieaieaipe
mSomeName=SomeValue
mOtherName=OtherValue+Somethingelse
nothing
blabla
mSomeName=SomeValue2
mOtherName=OtherValue+Somethingelse2
The command to use :
grep -A 1 'mSomeName' test.txt |grep -B 1 'mOtherName'
The output :
mSomeName=SomeValue
mOtherName=OtherValue+Somethingelse
--
mSomeName=SomeValue2
mOtherName=OtherValue+Somethingelse2

Bash error: Integer expression expected

In the sections below, you'll see the shell script I am trying to run on a UNIX machine, along with a transcript.
When I run this program, it gives the expected output but it also gives an error shown in the transcript. What could be the problem and how can I fix it?
First, the script:
#!/usr/bin/bash
while read A B C D E F
do
E=`echo $E | cut -f 1 -d "%"`
if test $# -eq 2
then
I=`echo $2`
else
I=90
fi
if test $E -ge $I
then
echo $F
fi
done
And the transcript of running it:
$ df -k | ./filter.sh -c 50
./filter.sh: line 12: test: capacity: integer expression expected
/etc/svc/volatile
/var/run
/home/ug
/home/pg
/home/staff/t
/packages/turnin
$ _
Before the line that says:
if test $E -ge $I
temporarily place the line:
echo "[$E]"
and you'll find something very much non-numeric, and that's because the output of df -k looks like this:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb1 954316620 212723892 693109608 24% /
udev 10240 0 10240 0% /dev
: :
The offending line there is the first, which will have its fifth field Use% turned into Use, which is definitely not an integer.
A quick fix may be to change your usage to something like:
df -k | sed -n '2,$p' | ./filter -c 50
or:
df -k | tail -n+2 | ./filter -c 50
Either of those extra filters (sed or tail) will print only from line 2 onwards.
If you're open to not needing a special script at all, you could probably just get away with something like:
df -k | awk -vlimit=40 '$5+0>=limit&&NR>1{print $5" "$6}'
The way it works is to only operate on lines where both:
the fifth field, converted to a number, is at least equal to the limit passed in with -v; and
the record number (line) is two or greater.
Then it simply outputs the relevant information for those matching lines.
This particular example outputs the file system and usage (as a percentage like 42%) but, if you just want the file system as per your script, just change the print to output $6 on its own: {print $6}.
Alternatively, if you do the percentage but without the %, you can use the same method I used in the conditional: {print $5+0" "$6}.

Resources