awk command stacks up in top - bash

total_pool=1
2
3
4
...
Above is my variable total_pool; it holds thousands of values in a single column, and its contents change every time I fire this script.
I want to process each single entry of it in a loop.
The problem is that this script runs from a crontab every 5 minutes,
and sometimes the command gets stuck in the output of top,
showing /bin/awk -vRS= -vFS="\n" "{print $1}" for a very long time.
How do I stop this behavior? Is there a better approach?
NOTE: I cannot use arrays, as my bash version is too old and has no array support.
So is there a better way to pull the data out of a column variable one entry at a time?
#!/bin/sh
row=1
for POOL in ${total_pool} ; do
    # pull out the $row-th value of the multi-line variable
    poolid=$(/bin/echo "$total_pool" | /bin/awk -vRS= -vFS="\n" "{print \$$row}")
    # `NOW` is presumably a local helper that prints the current timestamp;
    # rrdtool update also accepts the literal N for "now"
    /usr/local/rrd/bin/rrdtool update /var/graphs/p${poolid}.rrd `NOW`:$upload
    row=`expr $row + 1`
done

Sounds like echo's standard output is being buffered. If stdbuf from coreutils is an option, you may want to use it to disable echo's output buffering.
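For example, something along these lines (just a sketch; stdbuf comes from GNU coreutils, and -o0 turns off stdout buffering for the command it wraps):
poolid=$(stdbuf -o0 /bin/echo "$total_pool" | /bin/awk -vRS= -vFS="\n" "{print \$$row}")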

Related

Sed through files without using for loop?

I have a small script which basically generates a menu of all the scripts in my ~/scripts folder and, next to each of them, displays a sentence describing it, that sentence being the third line of the script, which is a comment. I then plan to pipe this into fzf or dmenu to select a script and start editing it or whatever.
1 #!/bin/bash
2
3 # a script to do
So it would look something like this
foo.sh a script to do X
bar.sh a script to do Y
Currently I have it run a for loop over all the files in the scripts folder and then run sed -n 3p on all of them.
for i in $(ls -1 ~/scripts); do
    echo -n "$i"
    sed -n 3p ~/scripts/"$i"
    echo
done | column -t -s '#' | ...
I was wondering if there is a more efficient way of doing this that does not involve a for loop and only uses sed. Any help will be appreciated. Thanks!
Instead of a loop that parses ls output plus a sed call per file, you may try this awk command:
awk 'FNR == 3 {
    f = FILENAME; sub(/^.*\//, "", f); print f, $0; nextfile
}' ~/scripts/* | column -t -s '#' | ...
Yes, there is a more efficient way, but no, it doesn't only use sed. This is probably a silly optimization for your use case, but it may be worthwhile nonetheless.
The inefficiency is that you're using ls to read the directory and then parsing its output. For large directories, that causes a lot of overhead for keeping that list in memory even though you only traverse it once. Also, it's not done correctly: consider filenames with special characters that the shell interprets.
The more efficient way is to use find in combination with its -exec option, which runs another program on each found file in turn.
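A sketch of that approach, assuming GNU find (-printf prints the basename plus a space, then one sed per file prints line 3 and quits):
find ~/scripts -maxdepth 1 -type f -printf '%f ' -exec sed -n '3p;3q' {} \; | column -t -s '#' | ...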
BTW: If you didn't rely on line numbers but maybe a tag to mark the description, you could also use grep -r, which avoids an additional process per file altogether.
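For instance, with a hypothetical #DESC: tag on some line of each script, a single recursive grep covers every file at once (GNU grep assumed for --include and -m; adjust or drop --include if your scripts are not named *.sh):
grep -r -m1 --include='*.sh' '^#DESC:' ~/scripts
Each output line then looks like path:#DESC: description and can be tidied with sed or column as before.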
This might work for you (GNU sed):
sed -sn '1h;3{H;g;s/\n/ /p}' ~/scripts/*
Use the -s option to reset the line number addresses for each file.
Copy line 1 to the hold space.
Append line 3 to the hold space.
Copy the hold space back to the pattern space.
Replace the newline with a space and print the result.
All files in the directory ~/scripts will be processed.
N.B. You may wish to replace the space delimiter with a tab, or pipe the results to the column command.

Is there a way to take an input that behaves like a file in bash?

I have a task where I'm given an input of the format:
4
A CS 22 M
B ECE 23 M
C CS 23 F
D CS 22 F
as the user input from the command line. From this, we have to perform tasks like determining the number of male and female students, determining which department has the most students, etc. I have done this using awk with the input as a file. Is there any way to do this with user input instead of a file?
Example of a command I used for a file (where the content in the file is in the same format):
numberofmales=$(awk -F ' ' '{print $4}' file.txt | grep M | wc -l) #list number of males
Not Reproducible
It works fine for me, so your problem can't be reproduced with either GNU or BSD awk under Bash 5.0.18(1). With your posted code and file sample:
$ numberofmales=$(awk -F ' ' '{print $4}' file.txt | grep M | wc -l)
$ echo $numberofmales
2
Check to make sure you don't have problems in your input file, or elsewhere in your code.
Also, note that if you call awk without a file argument or input from a pipe, it tries to collect data from standard input. It may not actually be hanging; it's probably just waiting on end-of-file, which you can trigger with CTRL+D.
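For instance, a quick interactive sketch (the three data lines are typed by hand, CTRL+D on its own line ends the input, and the final 2 is the count printed by wc):
$ awk -F ' ' '{print $4}' | grep M | wc -l
A CS 22 M
B ECE 23 M
C CS 23 F
2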
Recommended Improvements
Even if your code works, it can be improved. Consider the following, which skips the unnecessary field-separator definition and performs all the actions of your pipeline within awk.
males=$(
    awk 'tolower($4)=="m" {count++}; END {print count}' file.txt
)
echo "$males"
Fewer moving parts are often easier to debug, and can often be more performant on large datasets. However, your mileage may vary.
User Input
If you want to use user input rather than a file, you can use standard input to collect your data, and then pass it as a quoted argument to a function. For example:
count_males () {
    awk 'tolower($4)=="m" {count++}; END {print count}' <<< "$*"
}
echo "Enter data (CTRL-D when done):"
data=$(cat -)
# If at a command prompt, wait until EOF above before
# pasting this line. Won't matter in scripts.
males=$(count_males "$data")
The result is now stored in males, and you can echo "$males" or make use of the variable in whatever other way you like.
Bash indeed does not care whether a file handle is connected to standard input or to a file, and neither does Awk.
However, if you want to pass the same input to multiple Awk instances, it really does make sense to store it in a temporary file.
A better overall solution is to write a better Awk script so you only need to read the input once.
awk 'NF > 1 { ++a[$4] } END { for (g in a) print g, a[g] }'
Demo: https://ideone.com/0ML7Xk
The NF > 1 condition is to skip the silly first line. It's probably better not to put that count there in the first place; let Awk figure out how many lines there are, since it's better at counting than you are anyway.
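For example, feeding it the sample data on standard input (a here-document stands in for whatever the user types; the order of the output lines is not guaranteed):
awk 'NF > 1 { ++a[$4] } END { for (g in a) print g, a[g] }' <<'EOF'
4
A CS 22 M
B ECE 23 M
C CS 23 F
D CS 22 F
EOF
This prints M 2 and F 2, one group per line.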

More efficient way to loop through lines in shell

I've come to learn that looping through lines in bash by
while read line; do stuff; done <file
Is not the most efficient way to do it. https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice
What is a more time/resource efficient method?
Here's a timed example using Bash and awk. I have 1 million records in a file:
$ wc -l 1M
1000000 1M
Counting its records with bash, using while read:
$ time while read -r line ; do ((i++)) ; done < 1M ; echo $i
real 0m12.440s
user 0m11.548s
sys 0m0.884s
1000000
Using let "i++" took 15.627 secs (real), and NOPing with do : ; took 10.466 secs. Using awk:
$ time awk '{i++}END{print i}' 1M
1000000
real 0m0.128s
user 0m0.128s
sys 0m0.000s
As others have said, it depends on what you're doing.
The reason it's inefficient is that everything runs in its own process. Depending on what you are doing, that may or may not be a big deal.
If what you want to do in the loop is run another shell process, you won't get any gain from eliminating the loop. If you can do what you need without the need for a loop, you could get a gain.
awk? Perl? C(++)? Of course it depends on whether you're interested in CPU time or programmer time, and the latter depends on what the programmer is used to using.
The top answer to the question you linked to pretty much explains that the biggest problem is spawning external processes for simple text processing tasks. E.g. running an instance of awk or a pipeline of sed and cut for each single line just to get a part of the string is silly.
If you want to stay in the shell, use the string-processing parameter expansions (${var#word}, ${var:n:m}, ${var/search/replace} etc.) and other shell features as much as you can. If you see yourself running a set of commands for each input line, it's time to rethink the structure of the script. Most text processing commands can process a whole file in one execution, so use that.
A trivial/silly example:
while read -r line; do
    x=$(echo "$line" | awk '{print $2}')
    somecmd "$x"
done < file
would be better as
awk < file '{print $2}' | while read -r x ; do somecmd "$x" ; done
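If the per-line work really is just picking out a field, the shell's own read can do the splitting itself, and the parameter expansions mentioned above handle any further trimming, so no extra process is started per line. A sketch of the same loop with built-ins only:
# read splits each line on whitespace, so field 2 lands in x with no awk at all
while read -r _ x _; do
    somecmd "$x"
done < file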
Choose between awk and Perl; both are efficient.

How do I delete all rows with a blank space in the third column within a file?

So, I have a file which contains the results of some calculations I've run in the past weeks. I've collected the results in a file which I intend to plot. It is basically a bunch of rows with the format "x" "y" "f(x,y)", like this:
1.7 4.7 -460.5338556921
1.7 4.9 -460.5368762353
1.7 5.5
However, some lines, exemplified by the last one, contain a blank in the third column, resulting from failed calculations. I'd still like to plot the viable points, but, as there are thousands of points (and therefore rows), that task can't easily be accomplished by hand. I'd like to know how to make a script or program (I'd prefer a shell script, but I'll gladly go along with whatever works) that identifies those lines and deletes them. Does anyone know a way to do it?
awk '$3' <filename>
or better:
awk 'NF > 2' <filename>   # safer, in case an entry in column 3 happens to be zero
This will do the job.
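A usage sketch (the file names here are made up): keep only the complete rows and plot from the cleaned copy:
awk 'NF > 2' results.dat > results_clean.dat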
A plain grep command that should be understood by any grep implementation these days:
grep -v '^[^[:space:]]*[[:space:]]*[^[:space:]]*[[:space:]]*$' <filename>
With grep:
grep ' .* [^ ]' file
or using ERE:
grep -E '\s\S+\s\S' file
I would use:
perl -lanE 'print if @F==3 && /^[\d\s\.+-]+$/' file
which prints only lines that:
contain exactly 3 fields,
and contain only numbers, spaces, and the characters . + -
I do not know how you are going to plot. You could take a grep or awk solution and pipe all valid lines straight into your plotting application.
When you need to call a program for each set of values, you can skip the invalid lines when you are reading the values:
while read -r x y fxy; do
    if [ -n "${fxy}" ]; then
        myplotter "$x" "$y" "${fxy}"
    fi
done < file

Different output for pipe in script vs. command line

I have a directory with files that I want to process one by one and for which each output looks like this:
==== S=721 I=47 D=654 N=2964 WER=47.976% (1422)
Then I want to calculate the average percentage (column 6) by piping the output to AWK. I would prefer to do this all in one script and wrote the following code:
for f in $dir; do
    echo -ne "$f "
    process $f
done | awk '{print $7}' | awk -F "=" '{sum+=$2}END{print sum/NR}'
When I run this several times, I often get different results although in my view nothing really changes. The result is almost always incorrect though.
However, if I only put the for loop in the script and pipe to AWK on the command line, the result is always the same and correct.
What is the difference and how can I change my script to achieve the correct result?
I'm guessing a little about what you're trying to do, and without more details it's hard to say exactly what is going wrong.
for f in $dir; do
    unset TEMPVAR
    echo -ne "$f "
    TEMPVAR=$(process $f | awk '{print $7}')
    ARRAY+=($TEMPVAR)
done
I would append all your values to an array inside your for loop, as above. Now all your percentages are in the ARRAY array, and it should be easy to calculate the average value using whatever tool you like.
This will also help you troubleshoot: if you end up with too few elements in the array (check ${#ARRAY[@]}), you will know where your loop is terminating early.
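For example, one way to average the collected values (a sketch; it assumes the entries in ARRAY are plain numbers, with any WER= prefix or % sign already stripped):
printf '%s\n' "${ARRAY[@]}" | awk '{ sum += $1 } END { if (NR) printf "%.3f\n", sum / NR }'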
# Extract the WER percentage from every file
Percs=$(sed -r 's/.*WER=([[:digit:].]*).*/\1/' *)
# The divisor
Lines=$(wc -l <<< "$Percs")
# Turn the newlines into spaces
P=$(echo $Percs)
# Run it once without the bc part first; the expression is easier to understand that way
echo "scale=3; (${P// /+})/$Lines" | bc
