shell pipe process repeat - bash

I want to do something like this:
~ cat dump.sh
command 1 | command 2 | command 1 | command 2 | ...(ten times) | command 1 | command 2
~ ./dump.sh < demo.log
How can I modify dump.sh so that I can specify exactly n repetitions of the command 1 | command 2 pair to process demo.log?

You can write a simple recursive helper function, something like this:
loop () {
    case $1 in
        0) cat ;;
        *) command 1 | command 2 | loop $(($1 - 1)) ;;
    esac
}
Invoke it like
loop 3 <demo.log
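For a concrete run, here is a minimal sketch where a doubling awk one-liner stands in for the hypothetical command 1 | command 2 pair; each level of the recursion adds one pass over the stream:

double_loop () {
    case $1 in
        0) cat ;;
        *) awk '{ print $1 * 2 }' | double_loop $(($1 - 1)) ;;
    esac
}

echo 1 | double_loop 3    # three doubling passes: prints 8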

Related

Sorting tab delimited numbers by column with pure bash script.

I'm stuck on some homework. The requirements of the assignment are to accept an input file and perform some statistics on the values. The user may specify whether to calculate the statistics by row or by value. It must be pure bash script, so I can't use awk, sed, perl, python, etc.
sample input:
1 1 1 1 1 1 1
39 43 4 3225 5 2 2
6 57 8 9 7 3 4
3 36 8 9 14 4 3
3 4 2 1 4 5 5
6 4 4814 7 7 6 6
I can't figure out how to sort and process the data by column. My code for processing the rows works fine.
# CODE FOR ROWS
while read -r line; do
    echo $(printf "%d\n" $line | sort -n) | tr ' ' '\t' > sorted.txt
    ....
    # I perform the stats calculations
    # for each row by working with the temp file sorted.txt
done
How could I process this data by column? I've never worked with shell script so I've been staring at this for hours.
If you want to analyze by columns, you'll need the number of columns (cols) first. head -n 1 gives you the first row, and awk's NF counts the number of fields, giving us the number of columns.
cols=$(head -n 1 input.txt | awk '{print NF}');
Then you can use cut with the '\t' delimiter to grab every column from input.txt, and run it through sort -n, as you did in your original post.
$ for i in `seq 1 $cols`; do cut -f$i -d$'\t' input.txt; done | sort -n > output.txt
For rows, you can use the shell built-in printf with the format modifier %d for integers. The sort command works on lines of input, so we replace spaces ' ' with newlines \n using the tr command:
$ cat input.txt | while read line; do echo $(printf "%d\n" $line); done | tr ' ' '\n' | sort -n > output.txt
Now take the output file to gather our statistics:
Min: cat output.txt | head -n 1
Max: cat output.txt | tail -n 1
Sum (courtesy of Dimitre Radoulov): cat output.txt | paste -sd+ - | bc
Mean (courtesy of porges): cat output.txt | awk '{ total += $1 } END { print total/NR }'
Median (courtesy of maxschlepzig): cat output.txt | awk '{ a[i++] = $1 } END { print a[int(i/2)] }'
Histogram: cat output.txt | uniq -c
8 1
3 2
4 3
6 4
3 5
4 6
3 7
2 8
2 9
1 14
1 36
1 39
1 43
1 57
1 3225
1 4814
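Since the assignment requires pure bash for the parsing, here is a hedged sketch of extracting one column without awk, sed, or cut (it assumes whitespace-separated input in input.txt and still leans on the external sort, just as the original row code does):

col=3                                # which column to extract (1-based)
values=()
while read -r -a fields; do          # split each row into an array
    values+=( "${fields[col-1]}" )   # collect the requested column
done < input.txt
printf '%s\n' "${values[@]}" | sort -n > sorted.txt

Run it once per column (or wrap it in a loop over 1..cols) to get per-column statistics.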

Grep variable in for loop

I want to grep a specific line on each pass of a for loop. I've already looked on the internet for an answer; I tried what I found, but it doesn't work for me, and I can't see what I'm doing wrong.
Here is the code :
for n in 2 4 6 8 10 12 14 ; do
  for U in 1 10 100 ; do
    for L in 2 4 6 8 ; do
      i=0
      cat results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat
      for k in $(seq 1 1 $L) ; do
        ${'var'.$k}=`grep " $k " results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat | tail -n 1`
      done
which gives me:
%
%
% site density double occupancy
1 0.49791021 0.03866179
2 0.49891438 0.06077808
3 0.50426102 0.05718336
4 0.49891438 0.06077808
./run_deviation_functionL.sh: line 109: ${'var'.$k}=`grep " $k " results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat | tail -n 1`: bad substitution
Then, I would like to take only the density number, with something like:
${'density'.$k}=`echo "${'var'.$k:10:10}" | bc -l`
Does anyone know why it fails?
Use declare to create variable names from variables:
declare density$k="`...`"
Use variable indirection to retrieve them:
var=var$k
echo ${!var:10:10}
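Put together inside the question's inner loop, a minimal sketch (the path and loop bounds come from the question; taking the 2nd field as the density follows the "site density double occupancy" header shown above):

for k in $(seq 1 1 "$L"); do
    declare "var$k=$(grep " $k " results/output_iteration/occ_"$L"_"$n"_"$U"_it"$i".dat | tail -n 1)"
    ref="var$k"
    declare "density$k=$(echo "${!ref}" | awk '{print $2}')"   # 2nd field = density
done

An indexed array (density[k]=...) would avoid the indirection entirely, but declare matches the one-variable-per-site naming the question asked for.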

Counting letters in a file in shell script

I need a shell script or PowerShell script that counts the occurrences of each letter in a file.
Input:
this is the sample of this script.
This script counts similar letters.
Output:
t 9
h 4
i 8
s 10
e 4
a 2
...
In PowerShell, you can do it with the Group-Object cmdlet:
function Count-Letter {
    param(
        [String]$Path,
        [Switch]$IncludeWhitespace,
        [Switch]$CaseSensitive
    )

    # Read the file, convert to char array, and pipe to Group-Object
    # Convert the input to lowercase if CaseSensitive is not specified
    $CharacterGroups = if($CaseSensitive){
        (Get-Content $Path -Raw).ToCharArray() | Group-Object -NoElement
    } else {
        (Get-Content $Path -Raw).ToLower().ToCharArray() | Group-Object -NoElement
    }

    # Remove any whitespace character group if the IncludeWhitespace parameter is not bound
    if(-not $IncludeWhitespace){
        $CharacterGroups = $CharacterGroups | Where-Object { "$($_.Name)" -match "\S" }
    }

    # Return the groups, letter first and count second, in a default format-table
    $CharacterGroups | Select-Object @{Name="Letter";Expression={$_.Name}},Count
}
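Hypothetical usage, assuming the sample text above is saved as sample.txt:

Count-Letter -Path .\sample.txt
Count-Letter -Path .\sample.txt -CaseSensitive -IncludeWhitespace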
With your sample input (plus a line break), the output on my machine is a two-column table of letters and their counts.
This one-liner should do it:
awk 'BEGIN{FS=""}{for(i=1;i<=NF;i++)if(tolower($i)~/[a-z]/)a[tolower($i)]++}
END{for(x in a)print x, a[x]}' file
output for your example:
u 1
h 4
i 8
l 3
m 2
n 1
a 2
o 2
c 3
p 3
r 4
e 4
f 1
s 10
t 9
A PowerShell one-liner:
"this is the sample of this script".ToCharArray() | group -NoElement | sort Count -Descending | where Name -NE ' '
echo "this is the sample of this script. \
This script counts similar letters." | \
grep -o '.' | sort | uniq -c | sort -rg
Output, sorted, most frequent letters first:
10 s
10
8 t
8 i
4 r
4 h
4 e
3 p
3 l
3 c
2 o
2 m
2 a
2 .
1 u
1 T
1 n
1 f
Notes: no sed or awk needed; a simple grep -o '.' does all the heavy lifting. To avoid counting spaces and punctuation, replace '.' with '[[:alpha:]]':
echo "this is the sample of this script. \
This script counts similar letters." | \
grep -o '[[:alpha:]]' | sort | uniq -c | sort -rg
To count capital and lower-case letters as one, use the ignore-case options of sort (-f) and uniq (-i):
echo "this is the sample of this script. \
This script counts similar letters." | \
grep -o '[[:alpha:]]' | sort -f | uniq -ic | sort -rg
Output:
10 s
9 t
8 i
4 r
4 h
4 e
3 p
3 l
3 c
2 o
2 m
2 a
1 u
1 n
1 f
echo "this is the sample of this script" | \
sed -e 's/ //g' -e 's/\([A-Za-z]\)/\1|/g' | tr '|' '\n' | \
sort | grep -v "^$" | uniq -c | \
awk '{printf "%s %s\n",$2,$1}'
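For completeness, a pure-bash sketch with an associative array (requires bash 4+; input.txt is a hypothetical file holding the text). Reading one character at a time is slow, but it uses no external tools at all; the output is unsorted, so pipe it through sort if needed:

declare -A count
while IFS= read -r -n1 ch; do
    [[ $ch == [[:alpha:]] ]] && (( count[${ch,,}]++ ))   # ${ch,,} folds to lowercase
done < input.txt
for ch in "${!count[@]}"; do
    printf '%s %d\n' "$ch" "${count[$ch]}"
done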

the logic behind bash power set function

The function (output the power set of a given input)
p() { [ $# -eq 0 ] && echo || (shift; p "$@") |
while read r ; do echo -e "$1 $r\n$r"; done }
Test Input
p $(echo -e "1 2 3")
Test Output
1 2 3
2 3
1 3
3
1 2
2
1
I have difficulty grasping the recursion in this code. I tried to understand it by placing some variables inside the code to record the level of recursion and the order of execution, but I am still puzzled.
Here are the things I can tell so far:
The subshell's output will not be shown on the final output, as it gets redirected to the read command via pipe
The echo command appends a newline to all of its output
The order of execution I see is:
p (1 2 3) -> 1 followed by all combination of output below\n
all combination of output below
p (2 3) -> 2 3\n3\n
p (3) -> 3
p () ->
So I think I should have p(2) instead of p(3) on execution #3, but how could that happen, since shift only goes in one direction?
If I were to use "p(1 2 3 4)" as the input, it is the part that shows "1 2 3" in the output that confuses me.
The use of -e in the echo command seems to me pure obfuscation, since it could have been written:
p() { [ $# -eq 0 ] && echo || (shift; p "$@") |
    while read r ; do
        echo $1 $r
        echo $r
    done
}
In other words, "for every set in the power set of all but the first argument (shift; p "$@"), output both that set with and without the first argument."
The bash function works by setting up a chain of subshells, each one reading from the next one, something like this, where each box is a subshell and below it, I've shown its output as it reads each line of input: (I used "" to make "nothing" visible. => means "call"; <- means "read".)
+---------+ +-------+ +-------+ +-------+
| p 1 2 3 | ==> | p 2 3 | ==> | p 3 | ==> | p |
+---------+ +-------+ +-------+ +-------+
1 2 3 "" <--+-- 2 3 "" <---+-- 3 "" <-----+-- ""
2 3 "" <-/ / /
1 3 "" <--+-- 3 "" <-/ /
3 "" <-/ /
1 2 "" <--+-- 2 "" <---+-- "" <-/
2 "" <-/ /
1 "" <--+-- "" <-/
"" <-/

Nested getline in AWK script

Please let me know if we can use nested getline within AWK scripts like:
while ( ("tail -f log" |& getline var0) > 0) {
while ( ("ls" | getline ) > 0) {
}
close("ls")
while ( ("date" | getline ) > 0) {
}
close("date")
}
close("tail -f log")
To what depth can getline calls be nested, and can output data be lost at any level of the nesting? What should we watch out for when implementing this style?
UPDATE:
Requirement: provide real-time statistical data and errors by probing the QA box and its webserver/services logs and system status. The report is generated in the following format:
Local Date And Time | Category | Component | Condition
Assumption: an AWK script would execute faster than a shell script, with the added advantage of AWK's built-in parsing and other functionality.
Implementation: the main command loop is command0="tail -f -n 0 -s 5 ...........". This command starts an infinite loop that extracts lines appended to the webserver/services logs on the QA box. Note the -f, -s and -n options, which make it dump all data appended to the logs, sleep for 5 seconds after each iteration, and start without printing any existing log content.
After each iteration, capture and verify the system time and execute the various OS resource commands at 10-second intervals (5 seconds of sleep between iterations plus 4 seconds after processing the tail output; assuming that processing the tail output takes roughly 1 second, that makes 10 seconds in all).
The commands I have used for extracting OS resources are:
I. command1="vmstat | nl | tr -s '\\t '"
II. command2="sar -W 0"
III. command3="top -b -n 1 | nl | tr -s '\\t '"
IV. command4="ls -1 /tmp | grep EXIT"
Search for the respective command in the script and follow its while loop to see how that command's output is processed. Note that I have used the nl command for ease of development/coding.
Ultimately, the presence of a /tmp/EXIT file on the box makes the script exit, after removing that file from the box.
Below is my script; I have added comments to make it as self-explanatory as possible:
#Usage: awk -f script.awk
BEGIN {
command0="tail -f -n 0 -s 5 /x/web/webserver/*/logs/error_log /x/web/webserver/service/*/logs/log"
command1="vmstat | nl | tr -s '\\t '"
command2="sar -W 0"
command3="top -b -n 1 | nl | tr -s '\\t '"
command4="ls -1 /tmp | grep EXIT"
format = "%a %b %e %H:%M:%S %Z %Y"
split("", details)
split("", fields)
split("", data)
split("", values)
start_time=0
printf "\n>%s:\n\n", command0 #dummy print for debugging the command being executed
while ( (command0 |& getline var0) > 0) { #get the command output
if (start_time == 0) #if block to reset the start_time variable
{
start_time = systime() + 4
}
if (var0 ~ /==>.*<==/) { #if block to extract the file name from the tail output - outputted in '==>FileName<==' format
gsub(/[=><]/, "", var0)
len = split(var0, name, "/")
if(len == 7) {file = name[5]} else {file = name[6]}
}
if (len == 7 && var0 ~ /[Ee]rror|[Ee]xception|ORA|[Ff]atal/) { #extract the logs error statements
print strftime(format,systime()) " | Error Log | " file " | Error :" var0
}
if(systime() >= start_time) #check if current system time is greater than start_time as computed above
{
start_time = 0 #reset the start_time variable and now execute the system resource command
printf "\n>%s:\n\n", command1
while ( (command1 |& getline) > 0) { #process output of first command
if($1 <= 1)
continue #not needed for processing skip this one
if($1 == 2) #capture the field names and skip to the next line
{
for (i = 1; i <= NF; i++){fields[$i] = i;}
continue
}
if ($1 == 3) #store the command data output in data array
split($0, data);
print strftime(format,systime()) " | System Resource | System | Time spent running non-kernel code :" data[fields["us"]]
print strftime(format,systime()) " | System Resource | System | Time spent running kernel code :" data[fields["sy"]]
print strftime(format,systime()) " | System Resource | System | Amount of memory swapped in from disk :" data[fields["si"]]
print strftime(format,systime()) " | System Resource | System | Amount of memory swapped to disk :" data[fields["so"]]
}
close(command1)
printf "\n>%s:\n\n", command2 #start processing second command
while ( (command2 |& getline) > 0) {
if ($4 ~ /[0-9]+[\.][0-9]+/) #check for 4th positional value if its format is of "int.intint" format
{
if( $4 > 0.0) #dummy check now to print if page swapping
print strftime(format,systime()) " | System Resource | Disk | Page rate is > 0.0 reads/second: " $4
}
}
close(command2)
printf "\n>%s:\n\n", command3 # start processing command number 3
while ( (command3 |& getline ) > 0) {
if($1 == 1 && $0 ~ /load average:/) #get the load average from the output if this is the first line
{
split($0, arr, ",")
print strftime(format,systime())" | System Resource | System |" arr[4]
}
if($1 > 7 && $1 <= 12) # print first top 5 process that are consuming most of the CPUs time
{
f=split($0, arr, " ")
if(f == 13)
print strftime(format,systime())" | System Resource | System | CPU% "arr[10]" Process No: "arr[1] - 7" Name: "arr[13]
}
}
close(command3)
printf "\n>%s:\n\n", command4 #process command number 4 to check presence of file
while ( (command4 |& getline var4) > 0) {
system("rm -rf /tmp/EXIT")
exit 0 #if file is there then remove the file and exit this script execution
}
close(command4)
}
}
close(command0)
}
Output:
>tail -f -n 0 -s 5 /x/web/webserver/*/logs/error_log /x/web/webserver/service/*/logs/log:
>vmstat | nl | tr -s '\t ':
Sun Dec 16 23:05:12 PST 2012 | System Resource | System | Time spent running non-kernel code :9
Sun Dec 16 23:05:12 PST 2012 | System Resource | System | Time spent running kernel code :9
Sun Dec 16 23:05:12 PST 2012 | System Resource | System | Amount of memory swapped in from disk :0
Sun Dec 16 23:05:12 PST 2012 | System Resource | System | Amount of memory swapped to disk :2
>sar -W 0:
Sun Dec 16 23:05:12 PST 2012 | System Resource | Disk | Page rate is > 0.0 reads/second: 3.89
>top -b -n 1 | nl | tr -s '\t ':
Sun Dec 16 23:05:13 PST 2012 | System Resource | System | load average: 3.63
Sun Dec 16 23:05:13 PST 2012 | System Resource | System | CPU% 12.0 Process No: 1 Name: occworker
Sun Dec 16 23:05:13 PST 2012 | System Resource | System | CPU% 10.3 Process No: 2 Name: occworker
Sun Dec 16 23:05:13 PST 2012 | System Resource | System | CPU% 6.9 Process No: 3 Name: caldaemon
Sun Dec 16 23:05:13 PST 2012 | System Resource | System | CPU% 6.9 Process No: 4 Name: occmux
Sun Dec 16 23:05:13 PST 2012 | System Resource | System | CPU% 6.9 Process No: 5 Name: top
>ls -1 /tmp | grep EXIT:
This is your second post that I can recall about using getline this way. I mentioned last time that it was the wrong approach but it looks like you didn't believe me so let me try one more time.
Your question of "how do I use awk to execute commands with getline to read their output?" is like asking "how do I use a drill to cut glass?". You could get an answer telling you to tape over the part of the glass where you'll be drilling to avoid fracturing it and that WOULD answer your question but the more useful answer would probably be - don't do that, use a glass cutter.
Using awk as a shell from which to call commands is 100% the wrong approach. Simply use the right tool for the right job. If you need to parse a text file, use awk. If you need to manipulate files or processes or invoke commands, use shell (or your OS equivalent).
Finally, please read http://awk.freeshell.org/AllAboutGetline and don't even think about using getline until you fully understand all the caveats.
EDIT: here's a shell script to do what your posted awk script does:
tail -f log |
while IFS= read -r var0; do
ls
date
done
Look simpler? Not saying it makes sense to do that, but if you did want to do it, THAT's the way to implement it, not in awk.
EDIT: here's how to write the first part of your awk script in shell (bash in this case), I ran out of enthusiasm for translating the rest of it for you and I think this shows you how to do the rest yourself:
format="%a %b %e %H:%M:%S %Z %Y"
start_time=0

tail -f -n 0 -s 5 /x/web/webserver/*/logs/error_log /x/web/webserver/service/*/logs/log |
while IFS= read -r var0; do
    systime=$(date +"%s")
    #block to reset the start_time variable
    if ((start_time == 0)); then
        start_time=$(( systime + 4 ))
    fi
    #block to extract the file name from the tail output - outputted in '==>FileName<==' format
    case $var0 in
        "==>"*"<==" )
            path="${var0%% <==}"
            path="${path##==> }"
            name=( ${path//\// } )      # split the path on "/"
            len="${#name[@]}"
            if ((len == 7)); then
                file="${name[4]}"
            else
                file="${name[5]}"
            fi
            ;;
    esac
    if ((len == 7)); then
        case $var0 in
            *[Ee]rror*|*[Ee]xception*|*ORA*|*[Ff]atal* ) #extract the logs error statements
                printf "%s | Error Log | %s | Error :%s\n" "$(date +"$format")" "$file" "$var0"
                ;;
        esac
    fi
    #check if current system time is greater than start_time as computed above
    if (( systime >= start_time )); then
        start_time=0 #reset the start_time variable and now execute the system resource command
        ....
Note that this would execute slightly faster than your awk script, but that absolutely does not matter since your tail takes 5-second breaks between iterations.
Also note that all I'm doing above is translating your awk script into shell; that doesn't necessarily mean it's the best way to write this tool from scratch.
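As a sketch of the split recommended above, applied to just the vmstat step of the posted script: the shell invokes the command and awk only parses its output (the header-to-column mapping mirrors the fields[] logic in the awk script):

vmstat | awk '
    NR == 2 { for (i = 1; i <= NF; i++) f[$i] = i }   # header row: map field names to columns
    NR == 3 {                                         # first data row
        print "Time spent running non-kernel code :" $f["us"]
        print "Time spent running kernel code :" $f["sy"]
    }'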
