Bash scripting: Find minimum value in a script - bash

I am writing a script that finds the minimum value in a string. The string comes from a cat <file>, and I then parse each number inside it. The string only contains a set of numbers separated by spaces.
This is the code:
echo $FREQUENCIES
for freq in $FREQUENCIES
do
echo "Freq: $freq"
if [ -z "$MINFREQ" ]
then
MINFREQ=$freq
echo "Assigning MINFREQ for the first time with $freq"
elif [ $MINFREQ -gt $freq ]
then
MINFREQ=$freq
echo "Replacing MINFREQ with $freq"
fi
done
Here is the output I get:
800000 700000 600000 550000 500000 250000 125000
Freq: 800000
Assigning MINFREQ for the first time with 800000
Freq: 700000
Replacing MINFREQ with 700000
Freq: 600000
Replacing MINFREQ with 600000
Freq: 550000
Replacing MINFREQ with 550000
Freq: 500000
Replacing MINFREQ with 500000
Freq: 250000
Replacing MINFREQ with 250000
Freq: 125000
Replacing MINFREQ with 125000
Freq:
: integer expression expected
The problem is that the last value, for some reason, is empty or contains whitespace (I am not sure why). I tried testing whether the variable was set with if [ -n "$freq" ], but that test doesn't seem to work here; the loop still enters the if statement for the last value.
Could someone please help me figure out why the last time the loop executes, $freq is set to empty or whitespace and how to avoid this please?
EDIT:
using od -c fed with echo "<<$freq>>":
0000000 < < 8 0 0 0 0 0 > > \n
0000013
0000000 < < 7 0 0 0 0 0 > > \n
0000013
0000000 < < 6 0 0 0 0 0 > > \n
0000013
0000000 < < 5 5 0 0 0 0 > > \n
0000013
0000000 < < 5 0 0 0 0 0 > > \n
0000013
0000000 < < 2 5 0 0 0 0 > > \n
0000013
0000000 < < 1 2 5 0 0 0 > > \n
0000013
0000000 < < \r > > \n
0000006
There seems to be an extra \r (from the file).
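Given that stray carriage return, a minimal fix is to clean the string before the loop (a sketch; frequencies.txt stands in for whatever file you cat):

```shell
# Strip Windows-style carriage returns (CRLF -> LF) so the final
# word-split token is a real number instead of a bare "\r".
FREQUENCIES=$(tr -d '\r' < frequencies.txt)
```

With the \r gone, the trailing token disappears entirely, and the [ $MINFREQ -gt $freq ] test never sees a non-numeric value.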
Thank you very much!

If you're only working with integer values, you can validate your string using regex:
elif [[ $freq =~ ^[0-9]+$ && $MINFREQ -gt $freq ]]
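To see why this guards the comparison, here is a quick sketch of the =~ test against a clean value, a value carrying a stray carriage return, and an empty string:

```shell
# ^[0-9]+$ only matches strings made entirely of digits, so a value
# polluted by \r (or an empty one) fails the test and is skipped.
for v in 800000 $'250000\r' ''; do
  if [[ $v =~ ^[0-9]+$ ]]; then
    echo "numeric: $v"
  else
    echo "skipped"
  fi
done
```

Only the first value passes; the other two print "skipped".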

As for the error: you might have some extra whitespace in $FREQUENCIES?
Another solution with awk
echo $FREQUENCIES | awk '{min=$1; for (i=2;i<=NF;i++) { if ( $i<min ) { min=$i } } ; print min }'
If it's a really long variable, you can go with:
echo $FREQUENCIES | awk -v RS=" " 'NR==1 {min=$0} {if ( $0<min ) { min=$0 } } END {print min }'
(It sets the record separator to space, then on the very first record sets min to that value, then for every record checks whether it is smaller than min, and finally prints it.)
HTH

If you are using bash, you have arithmetic expressions and the "if unset: use value and assign" parameter substitution:
#!/bin/bash
for freq in "$@"; do
(( minfreq = freq < ${minfreq:=freq} ? freq : minfreq ))
done
echo $minfreq
use:
./script 800000 700000 600000 550000 500000 250000 125000

Data :
10,
10.2,
-3,
3.8,
3.4,
12
Minimum :
echo -e "10\n10.2\n-3\n3.8\n3.4\n12" | sort -n | head -1
Output: -3
Maximum :
echo -e "10\n10.2\n-3\n3.8\n3.4\n12" | sort -nr | head -1
Output: 12
How? 1. Print the values line by line. 2. Sort numerically (reverse to get the maximum). 3. Print the first line alone. Simple!
This may not be a good method, but it is easy for learners.

echo $FREQUENCIES | awk '{for (;NF-1;NF--) if ($1>$NF) $1=$NF} 1'
compare first and last field
set first field to the smaller of the two
remove last field
once one field remains, print
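For example, with the frequencies from the question:

```shell
# Each pass compares the first and last fields, keeps the smaller in $1,
# and drops the last field, until only the minimum remains.
echo '800000 700000 600000 550000 500000 250000 125000' |
  awk '{for (;NF-1;NF--) if ($1>$NF) $1=$NF} 1'
```

which prints 125000.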

Related

Head & tail string in one line - possible?

I want to retrieve the first X and the last Y characters from a string (standard ascii, so no worries about unicode).
I understand that I can do this as separate actions, i.e.:
FIRST=$(echo foobar | head -c 3)
LAST=$(echo foobar | tail -c 3)
COMBINED="${FIRST}${LAST}"
But is there a cleaner way to do this?
I would prefer to use common standard utils (i.e. bash built-ins, sed, awk etc.). At a push, a Perl one-liner is OK, but no Python or anything else.
head + tail: two answers, regarding the -c switch
head + tail character based (with -c, reducing strings)
Under bash, you could
string=foobarbaz
echo ${string::3}${string: -3}
foobaz
But to avoid repetition in case of shorter strings:
if ((${#string}>6));then
echo ${string::3}${string: -3}
else
echo $string
fi
Full bash function
shrinkStr(){
    local sep='..' opt OPTIND OPTARG string varname='' paddstr paddchr=' '
    local -i maxlen=40 lhlen=15 rhlen padd=0
    while getopts 'P:l:m:s:v:p' opt; do
        case $opt in
            l) lhlen=$OPTARG ;;
            m) maxlen=$OPTARG ;;
            p) padd=1 ;;
            P) paddchr=$OPTARG ;;
            s) sep=$OPTARG ;;
            v) varname=$OPTARG ;;
            *) echo Wrong arg.; return 1 ;;
        esac
    done
    rhlen="maxlen-lhlen-${#sep}"
    ((rhlen<1)) && { echo bad lengths; return 1;}
    shift $((OPTIND-1))
    string="$*"
    if ((${#string}>maxlen)) ;then
        string="${string::lhlen}$sep${string: -rhlen}"
    elif ((${#string}<maxlen)) && ((padd));then
        printf -v paddstr '%*s' $((maxlen-${#string})) ''
        string+=${paddstr// /$paddchr}
    fi
    if [[ $varname ]] ;then
        printf -v "$varname" '%s' "$string"
    else
        echo "$string"
    fi
}
Then
shrinkStr -l 4 -m 10 Hello world!
Hell..rld!
shrinkStr -l 2 -m 10 Hello world!
He..world!
shrinkStr -l 3 -m 10 -s '+++' Hello world!
Hel+++rld!
This works even with UTF-8 characters:
cnt=1;for str in Généralités Language Théorème Février 'Hello world!';do
shrinkStr -l5 -m11 -vOutstr -pP_ "$str"
printf ' %11d: |%s|\n' $((cnt++)) "$Outstr"
done
1: |Généralités|
2: |Language___|
3: |Théorème___|
4: |Février____|
5: |Hello..rld!|
cnt=1;for str in Généralités Language Théorème Février 'Hello world!';do
shrinkStr -l5 -m10 -vOutstr -pP_ "$str"
printf ' %11d: |%s|\n' $((cnt++)) "$Outstr"
done
1: |Génér..tés|
2: |Language__|
3: |Théorème__|
4: |Février___|
5: |Hello..ld!|
head + tail lines based (without -c, reducing files)
By using only one fork to sed.
Here is a little function I wrote for this:
headTail() {
    local hln=${1:-10} tln=${2:-10} str
    printf -v str '%*s' $((tln-1)) ''
    sed -ne "1,${hln}{p;\$q};$((hln+1)){${str// /\$!N;}};:a;\$!{N;D;ba};p"
}
Usage:
headTail <head lines> <tail lines>
Both arguments default to 10.
In practice:
headTail 3 4 < <(seq 1 1000)
1
2
3
997
998
999
1000
Seems correct. Testing a border case (where the number of lines is smaller than requested):
headTail 1 9 < <(seq 1 3)
1
2
3
headTail 9 1 < <(seq 1 3)
1
2
3
Taking more lines (here I take the first 100 and last 100 lines, but print only 2 top lines, 4 middle lines and 2 bottom lines of headTail's output):
headTail 100 100 < <(seq 1 2000)|sed -ne '1,2s/^/T /p;99,102s/^/M /p;199,$s/^/B /p'
T 1
T 2
M 99
M 100
M 1901
M 1902
B 1999
B 2000
BUG (limit): Don't use this with 0 as argument!
headTail 0 3 < <(seq 1 2000)
1
1998
1999
2000
headTail 3 0 < <(seq 1 2000)
1
2
3
1999
2000
BUG (limit): because of the maximum line length:
headTail 4 32762 <<<Foo\ bar
bash: /bin/sed: Argument list too long
For both of these to be supported, the function becomes:
head + tail lines, using one fork to sed
headTail() {
    local hln=${1:-10} tln=${2:-10} str sedcmd=''
    ((hln>0)) && sedcmd+="1,${hln}{p;\$q};"
    if ((tln>0)) ;then
        printf -v str '%*s' $((tln-1)) ''
        sedcmd+="$((hln+1)){${str// /\$!N;}};:a;\$!{N;D;ba};p;"
    fi
    sed -nf <(echo "$sedcmd")
}
Then
headTail 3 4 < <(seq 1 1000) |xargs
1 2 3 997 998 999 1000
headTail 3 0 < <(seq 1 1000) |xargs
1 2 3
headTail 0 4 < <(seq 1 1000) |xargs
997 998 999 1000
for i in {6..9};do printf " %3d: " $i;headTail 3 4 < <(seq 1 $i) |xargs; done
6: 1 2 3 4 5 6
7: 1 2 3 4 5 6 7
8: 1 2 3 5 6 7 8
9: 1 2 3 6 7 8 9
A stronger test, with bigger numbers: reading the first 500'000 and last 500'000 lines from an input of 3'000'000 lines:
headTail 500000 500000 < <(seq 1 3000000) | sed -ne '499999,500002p'
499999
500000
2500001
2500002
headTail 5000000 5000000 < <(seq 1 30000000) | sed -ne '4999999,5000002p'
4999999
5000000
25000001
25000002
$ perl -E '($s, $x, $y) = @ARGV; substr $s, $x, -$y, ""; say $s' abcdefgh 2 3
abfgh
The four argument variant of substr replaces the given portion of the string with the last argument. Here, we replace from position $x to position -$y (negative numbers count from the end of the string), and use an empty string as replacement, i.e. we remove the middle part.

Optimally finding the index of the maximum element in BASH array

I am using bash in order to process software responses on-the-fly and I am looking for a way to find the
index of the maximum element in the array.
The data that gets fed to the bash script is like this:
25 9
72 0
3 3
0 4
0 7
And so I create two arrays. There is
arr1 = [ 25 72 3 0 0 ]
arr2 = [ 9 0 3 4 7 ]
And what I need is to find the index of the maximum number in arr1 in order to use it also for arr2.
But I would like to see if there is a quick, optimal way to do this.
Would it maybe be better to use a dictionary structure [key][value] with the data I have? Would this make the process easier?
I have also found [1] (from user jhnc) but I don't quite think it is what I want.
My brute-force approach is the following:
function MAX {
arr1=( 25 72 3 0 0 )
arr2=( 9 0 3 4 7 )
local indx=0
local max=${arr1[0]}
local flag
for ((i=1; i<${#arr1[@]}; i++)); do
#To avoid invalid arithmetic operators when items are floats/doubles
flag=$( python <<< "print(${arr1[${i}]} > ${max})")
if [ $flag == "True" ]; then
indx=${i}
max=${arr1[${i}]}
fi
done
echo "MAX:INDEX = ${max}:${indx}"
echo "${arr1[${indx}]}"
echo "${arr2[${indx}]}"
}
This approach obviously will work, BUT, is it the optimal one? Is there a faster way to perform the task?
arr1 = [ 99.97 0.01 0.01 0.01 0 ]
arr2 = [ 0 6 4 3 2 ]
In this example, if an array contains floats then I would get a
syntax error: invalid arithmetic operator (error token is ".97)
So, I am using
flag=$( python <<< "print(${arr1[${i}]} > ${max})")
In order to overcome this issue.
Finding a maximum is inherently an O(n) operation. But there's no need to spawn a Python process on each iteration to perform the comparison. Write a single awk script instead.
awk 'BEGIN {
split(ARGV[1], a1);
split(ARGV[2], a2);
max=a1[1];
indx=1;
for (i in a1) {
if (a1[i] > max) {
indx = i;
max = a1[i];
}
}
print "MAX:INDEX = " max ":" (indx - 1)
print a1[indx]
print a2[indx]
}' "${arr1[*]}" "${arr2[*]}"
The two shell arrays are passed as space-separated strings to awk, which splits them back into awk arrays.
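The joining step itself is just bash's "${arr[*]}" expansion, which can be checked in isolation:

```shell
arr1=( 25 72 3 0 0 )
# "${arr1[*]}" joins the elements with the first character of IFS
# (a space by default), producing a single argv word for awk.
printf '%s\n' "${arr1[*]}"
```

This prints the one line 25 72 3 0 0, which awk's split() then turns back into a 1-based array.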
It's difficult to do this efficiently if you really do need to compare floats. Bash can't do floats, which means invoking an external program for every number comparison. However, comparing every number in bash is not necessarily needed.
Here is a fast, pure bash, integer only solution, using comparison:
#!/bin/bash
arr1=( 25 72 3 0 0)
arr2=( 9 0 3 4 7)
# Get the maximum, and also save its index(es)
for i in "${!arr1[@]}"; do
if ((arr1[i]>arr1_max)); then
arr1_max=${arr1[i]}
max_indexes=($i)
elif [[ "${arr1[i]}" == "$arr1_max" ]]; then
max_indexes+=($i)
fi
done
# Print the results
printf '%s\n' \
"Array1 max is $arr1_max" \
"The index(s) of the maximum are:" \
"${max_indexes[@]}" \
"The corresponding values from array 2 are:"
for i in "${max_indexes[@]}"; do
echo "${arr2[i]}"
done
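Condensed to just the max-and-index tracking, the loop above behaves like this sketch (using an arithmetic comparison for the tie test, which is equivalent for integers):

```shell
arr1=( 25 72 3 0 0 )
arr1_max=0; max_indexes=()
for i in "${!arr1[@]}"; do
  if ((arr1[i] > arr1_max)); then
    arr1_max=${arr1[i]}   # new maximum: reset the index list
    max_indexes=($i)
  elif ((arr1[i] == arr1_max)); then
    max_indexes+=($i)     # tie: remember this index too
  fi
done
echo "$arr1_max at ${max_indexes[*]}"
```

For the sample data this prints 72 at 1, and arr2[1] can then be looked up directly.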
Here is another method that can handle floats. Comparison in bash is avoided altogether; instead the much faster sort(1) is used, and only once, rather than starting a new python instance for every number.
#!/bin/bash
arr1=( 25 72 3 0 0)
arr2=( 9 0 3 4 7)
arr1_max=$(printf '%s\n' "${arr1[@]}" | sort -n | tail -1)
for i in "${!arr1[@]}"; do
[[ "${arr1[i]}" == "$arr1_max" ]] &&
max_indexes+=($i)
done
# Print the results
printf '%s\n' \
"Array 1 max is $arr1_max" \
"The index(s) of the maximum are:" \
"${max_indexes[@]}" \
"The corresponding values from array 2 are:"
for i in "${max_indexes[@]}"; do
echo "${arr2[i]}"
done
Example output:
Array 1 max is 72
The index(s) of the maximum are:
1
The corresponding values from array 2 are:
0
Unless you need those arrays, you can also feed your input script's output directly into something like this:
#!/bin/bash
input-script |
sort -nr |
awk '
(NR==1) {print "Max: "$1"\nCorresponding numbers:"; max = $1}
{if (max == $1) print $2; else exit}'
Example (with some extra numbers):
$ echo \
'25 9
72 0
72 11
72 4
3 3
3 14
0 4
0 1
0 7' |
sort -nr |
awk '(NR==1) {max = $1; print "Max: "$1"\nCorresponding numbers:"}
{if (max == $1) print $2; else exit}'
Max: 72
Corresponding numbers:
4
11
0
You can also do it 100% in awk, including sorting:
$ echo \
'25 9
72 0
72 11
72 4
3 3
3 14
0 4
0 1
0 7' |
awk '
{
col1[a++] = $1
line[a-1] = $0
}
END {
asort(col1)
col1_max = col1[a-1]
print "Max is "col1_max"\nCorresponding numbers are:"
for (i in line) {
if (line[i] ~ col1_max"\\s") {
split(line[i], max_line)
print max_line[2]
}
}
}'
Max is 72
Corresponding numbers are:
0
11
4
Or, just to get the maximum of column 1, and any single number from column 2, that corresponds with it. As simply as possible:
$ echo \
'25 9
72 0
3 3
0 4
0 7' |
sort -nr |
head -1
72 0

How to subtract values of a specific row value from all the other row values?

My current working file is like this
ID Time A_in Time B_in Time C_in
Ax 0.1 10 0.1 15 0.1 45
By 0.2 12 0.2 35 0.2 30
Cz 0.3 20 0.3 20 0.3 15
Fr 0.4 35 0.4 15 0.4 05
Exp 0.5 10 0.5 25 0.5 10
My columns of interest are those with an "_in" header. In those columns, I want to subtract the value of the row that starts with ID "Exp" from all the other row values.
Let's consider the A_in column, where the "Exp" row value is 10. So I want to subtract 10 from all the other elements of that A_in column.
My amateur code is like this (I know it is silly)
#This part is grabbing all the values in ```Exp``` row
Exp=$( awk 'BEGIN{OFS="\t";
PROCINFO["sorted_in"] = "@val_num_asc"}
FNR==1 { for (n=2;n<=NF;n++) { if ($n ~ /_GasOut$/) cols[$n]=n; }}
/Exp/ {
for (c in cols){
shift = $cols[c]
printf shift" "
}
}
' File.txt |paste -sd " ")
Exp_array=($Exp)
z=1
for i in "${Exp_array[@]}"
do
z=$(echo 2+$z | bc -l)
Exp_point=$i
awk -vd="$Exp_point" -vloop="$z" -v '
BEGIN{OFS="\t";
PROCINFO["sorted_in"] = "@val_num_asc"}
function abs(x) {return x<0?-x:x}
FNR==1 { for (n=2;n<=NF;n++) { if ($n ~ /_GasOut$/) cols[$n]=n; }}
NR>2{
$loop=abs($loop-d); print
}
' File.txt
done
My First desired outcome is this
ID Time A_in Time B_in Time C_in
Ax 0.1 0.0 0.1 10 0.1 35
By 0.2 02 0.2 10 0.2 20
Cz 0.3 10 0.3 05 0.3 05
Fr 0.4 25 0.4 10 0.4 05
Exp 0.5 0.0 0.5 0.0 0.5 0.0
Now from each "_in" columns I want to find the corresponding ID of 2 smallest values. So
My second desired outcome is
A_in B_in C_in
Ax Cz Cz
By Exp Fr
Exp Exp
Perl to the rescue!
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
@ARGV = (@ARGV[0, 0]); # Read the input file twice.
my @header = split ' ', <>;
my @in = grep $header[$_] =~ /_in$/, 0 .. $#header;
$_ = <> until eof;
my @exp = split;
my @min;
<>;
while (<>) {
    my @F = split;
    for my $i (@in) {
        $F[$i] = abs($F[$i] - $exp[$i]);
        @{ $min[$i] }[0, 1]
            = sort { $a->[0] <=> $b->[0] }
              [$F[$i], $F[0]], grep defined, @{ $min[$i] // [] }
            unless eof;
    }
    say join "\t", @F;
}
print "\n";
say join "\t", @header[@in];
for my $index (0, 1) {
    for my $i (@in) {
        next unless $header[$i] =~ /_in$/;
        print $min[$i][$index][1], "\t";
    }
    print "\n";
}
It reads the file twice. In the first read, it just remembers the first line as the @header array and the last line as the @exp array.
In the second read, it subtracts the corresponding exp value from each _in column. It also stores the two least numbers in the @min array at the position corresponding to the column position.
Formatting the numbers (i.e. 0.0 instead of 0 and 02 instead of 2) left as an exercise to the reader. Same with redirecting the output to several different files.
After some fun and an hour or two I wrote this abomination:
cat <<EOF >file
ID Time A_in Time B_in Time C_in
Ax 0.1 10 0.1 15 0.1 45
By 0.2 12 0.2 35 0.2 30
Cz 0.3 20 0.3 20 0.3 15
Fr 0.4 35 0.4 15 0.4 05
Exp 0.5 10 0.5 25 0.5 10
EOF
# fix stackoverflow formatting
# input file should be separated with tabs
<file tr -s ' ' | tr ' ' '\t' > file2
mv file2 inputfile
# read headers to an array
IFS=$'\t' read -r -a hdrs < <(head -n1 inputfile)
# exp line read into an array
IFS=$'\t' read -r -a exps < <(grep -m1 $'^Exp\t' inputfile)
# column count
colcnt="${#hdrs[@]}"
if [ "$colcnt" -eq 0 ]; then
echo >&2 "ERROR - must be at least one column"
exit 1
fi
# numbers of those columns which headers have _in suffix
incolnums=$(
paste <(
printf "%s\n" "${hdrs[@]}"
) <(
# puff, the numbers will start from zero cause bash indexes arrays from zero
# but `cut` indexes fields from 1, so.. just keep in mind it's from 0
seq 0 $((colcnt - 1))
) |
grep $'_in\t' |
cut -f2
)
# read the input file
{
# preserve header line
IFS= read -r hdrline
( IFS=$'\t'; printf "%s\n" "$hdrline" )
# ok. read the file field by field
# I think we could awk here
while IFS=$'\t' read -a vals; do
# for each column number with _in suffix
while IFS= read -r incolnum; do
# update the column value
# I use bc for float calculations
vals[$incolnum]=$(bc <<-EOF
define abs(i) {
if (i < 0) return (-i)
return (i)
}
scale=2
abs(${vals[$incolnum]} - ${exps[$incolnum]})
EOF
)
done <<<"$incolnums"
# output the line
( IFS=$'\t'; printf "%s\n" "${vals[*]}" )
done
} < inputfile > MyFirstDesiredOutcomeIsThis.txt
# ok so, first part done
{
# output headers names with _in suffix
printf "%s\n" "${hdrs[@]}" |
grep '_in$' |
tr '\n' '\t' |
# omg, fix tr, so stupid
sed 's/\t$/\n/'
# puff
# output the corresponding ID of 2 smallest values of the specified column number
# #arg: $1 column number
tmpf() {
# remove header line
<MyFirstDesiredOutcomeIsThis.txt tail -n+2 |
# extract only this column
cut -f$(($1 + 1)) |
# unique numeric sort and extract two smallest values
sort -n -u | head -n2 |
# now, well, extract the id's that match the numbers
# append numbers with tab (to match the separator)
# suffix numbers with dollar (to match end of line)
sed 's/^/\t/; s/$/$/;' |
# how good is grep at buffering(!)
grep -f /dev/stdin <(
<MyFirstDesiredOutcomeIsThis.txt tail -n+2 |
cut -f1,$(($1 + 1))
) |
# extract numbers only
cut -f1
}
# the following is something like foldr $'\t' $(tmpf ...) for each $incolnums
# we need to buffer here, we are joining the output column-wise
output=""
while IFS= read -r incolnum; do
output=$(<<<$output paste - <(tmpf "$incolnum"))
done <<<"$incolnums"
# because we start with an empty $output, paste inserts leading tabs
# remove them ... and finally output $output
<<<"$output" cut -f2-
} > MySecondDesiredOutcomeIs.txt
# fix formatting to post it on stackoverflow
# files have tabs, and column will output them with space
# which is just enough
echo '==> MyFirstDesiredOutcomeIsThis.txt <=='
column -t -s$'\t' MyFirstDesiredOutcomeIsThis.txt
echo
echo '==> MySecondDesiredOutcomeIs.txt <=='
column -t -s$'\t' MySecondDesiredOutcomeIs.txt
The script will output:
==> MyFirstDesiredOutcomeIsThis.txt <==
ID Time A_in Time B_in Time C_in
Ax 0.1 0 0.1 10 0.1 35
By 0.2 2 0.2 10 0.2 20
Cz 0.3 10 0.3 5 0.3 5
Fr 0.4 25 0.4 10 0.4 5
Exp 0.5 0 0.5 0 0.5 0
==> MySecondDesiredOutcomeIs.txt <==
A_in B_in C_in
Ax Cz Cz
By Exp Fr
Exp Exp
Written and tested at tutorialspoint.
I use bash and core-/more-utils to manipulate the file. First I identify the numbers of the columns ending with the _in suffix. Then I buffer the values stored in the Exp line.
Then I just read the file line by line, field by field, and for each field whose column header ends with the _in suffix, I subtract from the field value the corresponding value from the Exp line. I think this part should be the slowest (I use a plain while IFS=$'\t' read -r -a vals loop), but some smart awk scripting could speed it up. This generates your "first desired output", as you called it.
Then I need to output only the header names ending with the _in suffix. For each such column number, I identify the 2 smallest values in the column with a plain sort -n -u | head -n2. Then it gets a little tricky: I need to extract the IDs that have one of those 2 smallest values in that column. This is a job for grep -f. I prepare proper regexes in the input using sed and let grep -f /dev/stdin do the filtering.
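The grep -f step in isolation looks like this sketch (hypothetical sample rows; each pattern is a literal tab plus a value anchored at end of line, arriving on stdin one per line):

```shell
# Filter ID<TAB>value rows, keeping only rows whose value is 5 or 10,
# then keep just the ID column.
printf '\t5$\n\t10$\n' |
  grep -f /dev/stdin <(printf 'Ax\t10\nBy\t2\nCz\t5\n') |
  cut -f1
```

This prints Ax and Cz, the IDs whose values matched one of the patterns.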
Please just ask 1 question at a time. Here's how to do the first thing you asked about:
$ cat tst.awk
BEGIN { OFS="\t" }
NR==FNR { if ($1=="Exp") split($0,exps); next }
FNR==1 { $1=$1; print; next }
{
for (i=1; i<=NF; i++) {
val = ( (i-1) % 2 ? $i : exps[i] - $i )
printf "%s%s", (val < 0 ? -val : val), (i<NF ? OFS : ORS)
}
}
$ awk -f tst.awk file file
ID Time A_in Time B_in Time C_in
0 0.1 0 0.1 10 0.1 35
0 0.2 2 0.2 10 0.2 20
0 0.3 10 0.3 5 0.3 5
0 0.4 25 0.4 10 0.4 5
0 0.5 0 0.5 0 0.5 0
The above will work efficiently and robustly using any awk in any shell on every UNIX box.
If after reading this, re-reading the previous awk answers you've received, and looking up the awk man page you still need help with the 2nd thing you asked about then ask a new standalone question just about that.

Iterating over a text file in bash and rounding each number

My file looks like this
0 0 1 0.2 1 1
1 1 0.8 0.1 1
0.2 0.4 1 0 1
And I need to create a new output file
0 0 1 0 1 1
1 1 1 0 1
0 0 1 0 1
i.e. if the number is greater than 0.5, it is rounded up to 1, and if it less than 0.5, it is rounded down to 0 and put into a new file.
The file is quite large, with ~ 1400000000 values. I would quite like to write a bash script to do this.
I am guessing the best way to do this would be to iterate over each value in a for loop, with an if statement inside which tests whether the number is greater or less than 0.5 and then prints 0 or 1 dependent.
The pseudocode would look like this, but my bash isn't great, so before you tell me it isn't syntactically correct, I already know:
#!/bin/bash
#reads in each line
while read p; do
#loops through each number in each line
for i in p; do
#tests if each number is greater than or equal to 0.5 and prints accordingly
if [i => 0.5]
then
print 1
else
print 0
fi
done < test.txt >
I'm not really sure how to do this. Can anyone help? Thanks.
awk '{
for( i=1; i<=NF; i++ )
$i = $i<0.5 ? 0 : 1
}1' input_file > output_file
$i = $i<0.5 ? 0 : 1 changes each field to 0 or 1 and {...}1 will print the line with the changed values afterwards.
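For instance, fed the first line of the sample input:

```shell
# Every field below 0.5 becomes 0; everything else becomes 1.
echo '0 0 1 0.2 1 1' | awk '{for (i=1; i<=NF; i++) $i = $i<0.5 ? 0 : 1} 1'
```

which prints 0 0 1 0 1 1.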
another awk without loops...
$ awk -v RS='[ \n]' '{printf ($1>=0.5) RT}' file
0 0 1 0 1 1
1 1 1 0 1
0 0 1 0 1
if the values are not between 0 and 1, you may want to change to
$ awk -v RS='[ \n]' '{printf "%.0f%s", $1, RT}' file
note that the default rounding is to even (i.e. 0.5 -> 0, but 1.5 -> 2). If you always want to round halves up:
$ awk -v RS='[ \n]' '{i=int($1); printf "%d%s", i+(($1-i)>=0.5), RT}' file
should take care of non-negative numbers. For negatives, there are again two alternatives: round towards zero or towards negative infinity.
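The round-half-to-even behaviour of %.0f is easy to check (this relies on the C library's default IEEE rounding mode):

```shell
# 0.5 and 2.5 round to the nearest even integer; 1.5 rounds up to 2.
awk 'BEGIN { printf "%.0f %.0f %.0f\n", 0.5, 1.5, 2.5 }'
```

On a typical glibc system this prints 0 2 2.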
Here's one in Perl using regex and look-ahead:
$ perl -p -e 's/0(?=\.[6789])/1/g;s/\.[0-9]+//g' file
0 0 1 0 1 1
1 1 1 0 1
0 0 1 0 1
I went with the "if it is less than 0.5, it is rounded down to 0" part.

How do I iterate through each line of a command's output in bash?

I have a script that reads from /proc/stat and calculates CPU usage. There are three relevant lines in /proc/stat:
cpu 1312092 24 395204 12582958 77712 456 3890 0 0 0
cpu0 617029 12 204802 8341965 62291 443 2718 0 0 0
cpu1 695063 12 190402 4240992 15420 12 1172 0 0 0
Currently, my script only reads the first line and calculates usage from that:
cpu=($( cat /proc/stat | grep '^cpu[^0-9] ' ))
unset cpu[0]
idle=${cpu[4]}
total=0
for value in "${cpu[@]}"; do
let total=$(( total+value ))
done
let usage=$(( (1000*(total-idle)/total+5)/10 ))
echo "$usage%"
This works as expected, because the script only parses this line:
cpu 1312092 24 395204 12582958 77712 456 3890 0 0 0
It's easy enough to get only the lines starting with cpu0 and cpu1
cpu=$( cat /proc/stat | grep '^cpu[0-9] ' )
but I don't know how to iterate over each line and apply this same process. I've tried resetting the internal field separator inside a subshell, like this:
cpus=$( cat /proc/stat | grep '^cpu[0-9] ' )
(
IFS=$'\n'
for cpu in $cpus; do
cpu=($cpu)
unset cpu[0]
idle=${cpu[4]}
total=0
for value in "${cpu[@]}"; do
let total=$(( total+value ))
done
let usage=$(( (1000*(total-idle)/total+5)/10 ))
echo -n "$usage%"
done
)
but this gets me a syntax error
line 18: (1000*(total-idle)/total+5)/10 : division by 0 (error token is "+5)/10 ")
If I echo the cpu variable in the loop, it looks like it's separating the lines properly. I looked at this thread and I think I'm assigning the cpu variable to an array properly, but is there another error I'm not seeing?
I put my script into "what's wrong with my script" and it doesn't show me any errors apart from a warning about using cat within $(), so I'm stumped.
Change this line in the middle of your loop:
IFS=' ' cpu=($cpu)
You need this because outside of your loop you're setting IFS=$'\n', but with that setting cpu=($cpu) won't do what you expect.
Btw, I would write your script like this:
#!/bin/bash -e
grep ^cpu /proc/stat | while IFS=$'\n' read cpu; do
cpu=($cpu)
name=${cpu[0]}
unset cpu[0]
idle=${cpu[4]}
total=0
for value in "${cpu[@]}"; do
((total+=value))
done
((usage=(1000 * (total - idle) / total + 5) / 10))
echo "$name $usage%"
done
The equivalent using awk:
awk '/^cpu/ { total=0; idle=$5; for (i=2; i<=NF; ++i) { total += $i }; print $1, int((1000 * (total - idle) / total + 5) / 10) }' < /proc/stat
Because the OP asked, an awk program.
awk '
/cpu[0-9] .*/ {
total = 0
idle = $5
for(i = 2; i <= NF; i++) { total += $i; }
printf("%s: %f%%\n", $1, 100*(total-idle)/total);
}
' /proc/stat
The /cpu[0-9] .*/ means "execute for every line matching this expression".
The variables like $1 do what you'd expect, but the 1st field has index 1, not 0: $0 means the whole line in awk.
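A one-line illustration of that indexing:

```shell
# $0 is the entire record; $1 is the first field; NF counts fields.
echo 'cpu0 617029 12' | awk '{print $0; print $1; print NF}'
```

This prints the whole line, then cpu0, then 3.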
