Bash - Extract Matching String from GZIP Files Is Running Very Slow - bash

Complete novice in Bash. I'm trying to iterate through 1000 gzip files and the script below is very slow. Maybe GNU parallel is the solution?
#!/bin/bash
ctr=0
echo "file_name,symbol,record_count" > $1
dir="/data/myfolder"
for f in "$dir"/*.gz; do
gunzip -c $f | while read line;
do
str=`echo $line | cut -d"|" -f1`
if [ "$str" == "H" ]; then
if [ $ctr -gt 0 ]; then
echo "$f,$sym,$ctr" >> $1
fi
ctr=0
sym=`echo $line | cut -d"|" -f3`
echo $sym
else
ctr=$((ctr+1))
fi
done
done
Any help to speed the process will be greatly appreciated !!!

#!/bin/bash
ctr=0
export ctr
echo "file_name,symbol,record_count" > $1
dir="/data/myfolder"
export dir
doit() {
f="$1"
gunzip -c $f | while read line;
do
str=`echo $line | cut -d"|" -f1`
if [ "$str" == "H" ]; then
if [ $ctr -gt 0 ]; then
echo "$f,$sym,$ctr"
fi
ctr=0
sym=`echo $line | cut -d"|" -f3`
echo $sym >&2
else
ctr=$((ctr+1))
fi
done
}
export -f doit
parallel doit ::: "$dir"/*.gz 2>&1 >> "$1"

The Bash while read loop is probably your main bottleneck here. Calling multiple external processes for simple field splitting will exacerbate the problem. Briefly,
while IFS="|" read -r first second third rest; do ...
leverages the shell's built-in field splitting functionality, but you probably want to convert the whole thing to a simple Awk script anyway.
echo "file_name,symbol,record_count" > "$1"
for f in "/data/myfolder"/*.gz; do
gunzip -c "$f" |
awk -F '|' -v f="$f" -v OFS="," '
$1 == "H" { if(ctr) print f, sym, ctr
ctr=0; sym=$3;
print sym >"/dev/stderr"
next }
{ ++ctr }'
done >>"$1"
This vaguely assumes that printing the lone sym is just for diagnostics. It should hopefully not be hard to see how this can be refactored if this is an incorrect assumption.
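If you would rather stay in the shell, here is a minimal sketch of the built-in field splitting mentioned above. The function name summarize_stream is made up for illustration, and flushing the last group at end of input is my assumption about what is wanted; the scripts above never print it.

```bash
# Hypothetical helper (not from the thread): reads pipe-delimited records on
# stdin and prints "label,symbol,count" for each group that starts with an
# "H" header line. $1 is the label to print, e.g. the .gz file name.
summarize_stream() {
    local label="$1" sym="" ctr=0 first second third rest
    # IFS='|' applies to this read only, so the shell splits the line
    # itself and no echo/cut subprocess is spawned per line.
    while IFS='|' read -r first second third rest; do
        if [ "$first" = "H" ]; then
            if [ "$ctr" -gt 0 ]; then
                printf '%s,%s,%d\n' "$label" "$sym" "$ctr"
            fi
            sym=$third
            ctr=0
        else
            ctr=$((ctr + 1))
        fi
    done
    # Assumption: the final group should be reported as well.
    if [ "$ctr" -gt 0 ]; then
        printf '%s,%s,%d\n' "$label" "$sym" "$ctr"
    fi
}

# Usage: gunzip -c "$f" | summarize_stream "$f" >> "$1"
```

This avoids the per-line subprocesses, but a shell read loop is still likely to be slower than Awk on large files.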

Related

Bash - disk utilization notification

This script should output a warning notification if utilization of the main disk is over 50%, but it produces no output. My disk is currently at 60%, so in theory it should work.
I have added an else branch to check whether the loop body runs at all, but the else branch isn't triggered either.
I get no error message, so it's hard to tell where exactly I have gone wrong.
#!/bin/bash
df -H | grep /dev/sda2 | awk '{ printf "%d", $5}' > diskOutput.txt
input="diskOutput.txt"
while IFS= read -r line
do
if [ $line -gt 50 ]
then
up="`uptime | cut -b 1-9`"
output="WARNING UTILISATION $line - $up"
echo "$output"
else
echo "no-in"
fi
done < $input
#rm diskOutput.txt
echo "finished"
Try this.
#!/bin/bash
df -H | grep /dev/sda2 | awk '{ printf "%d", $5}' > diskOutput.txt
echo "" >>diskOutput.txt
input="diskOutput.txt"
while IFS= read -r line
do
if [ $line -gt 50 ]
then
up="`uptime | cut -b 1-9`"
output="WARNING UTILISATION $line - $up"
echo "$output"
else
echo "no-in"
fi
done < $input
#rm diskOutput.txt
echo "finished"
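The problem is how this loop reads the file:
while IFS= read -r line
The awk printf "%d" writes the number without a trailing newline, so read reaches end-of-file before it sees a line terminator. It still fills line, but it returns a non-zero status, so the while loop ends before the body runs even once, which is why neither branch prints anything. (IFS= only stops read from trimming leading and trailing whitespace.) The extra echo "" appends the missing newline.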
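Another common way around this is to let the loop accept a final line that has no newline. A minimal sketch, assuming the function name warn_if_over and a threshold passed as the first argument (both are mine, not from the question):

```bash
# Hypothetical helper: reads one number per line from stdin and compares it
# with the limit given as $1.
warn_if_over() {
    local limit="$1" line
    # read returns non-zero on a last line without a trailing newline, but it
    # still stores the text in "line"; the -n test keeps that line.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$line" -gt "$limit" ]; then
            echo "WARNING UTILISATION $line"
        else
            echo "no-in"
        fi
    done
}

# Usage: warn_if_over 50 < diskOutput.txt
```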

Bash ping output in csv format

My aim is to transform the output (the last 2 lines) of the ping command in a CSV style.
Here are some examples:
In case there is a packet loss lower than 100% <
URL, PacketLoss, Min, Average, Max, Deviation
In case there is packet loss equal to 100%
URL, 100, -1, -1, -1, -1
My script is below, but when the packet loss is 100% the output is:
URL, 100,
So the problem is at the if statement, as it does not enter in elif, I use the same syntax as checking if the address is full or not (with "www." or not).
Can you please have a look because I tried multiple things and it did not work.
My script:
#!/bin/bash
declare site=''
declare result='';
if [[ "$1" == "www."* ]]; then
site="$1";
else
site="www.$1";
fi
result="$site";
pingOutput=$(ping $site -c10 -i0.2 -q| tail -n2);
fl=true;
while IFS= read -r line
do
# !!! The problem is here, the if statement is not working properly and I do not know why !!!
if [ "$fl" == "true" ]; then
result="$result $(echo "$line" | cut -d',' -f3 | cut -d" " -f2 | sed -r 's/%//g')";
fl=false;
elif [[ "$line" == "ms"* ]]; then
result="$result $(echo "$line" | cut -d' ' -f4 | sed -r 's/\// /g')";
else
result="$result -1 -1 -1 -1";
fi
done <<< "$pingOutput"
echo "$result";
This is a pretty old question but I've just stumbled upon it today. Below I paste a slightly modified version of the above script that fixes the if issue and works on Mac OS.
P.S. You can uncomment the # prctg=100.0% line to see the if working.
#!/bin/bash
declare site=''
declare result=''
declare prctg=''
[[ "$1" == "www."* ]] && site="$1" || site="www.$1"
result="$site"
pingOutput=$(ping $site -c10 -i0.2 -q | tail -n2)
fl=true
while IFS= read -r line
do
#echo $line
if [ "$fl" == "true" ]
then
prctg=$(echo "$line" | grep -Eo "[.[:digit:]]{1,10}%")
result="$result,$prctg"
fl=false
# prctg=100.0%
else
if [ "$prctg" == "100.0%" ]
then
result="$result,-1,-1,-1,-1"
else
result="$result,$(echo "$line" | cut -d' ' -f4 | sed -E 's/\//,/g')"
fi
fi
done <<< "$pingOutput"
echo "$result"
I hope it helps someone from the future! :)
Since the second line of the pingOutput was never processed (the loop ended before) the action of adding the -1 to the output was never performed.
Due to this problem I decided to capture the percentage of failure and act when no packets were returned (100%), I also simplified some expressions you used initially.
I investigated the script and came up with the following solution:
#!/bin/bash
declare site=''
declare result=''
declare prctg=''
[[ "$1" == "www."* ]] && site="$1" || site="www.$1"
result="$site"
pingOutput=$(ping $site -c10 -i0.2 -q| tail -n2)
fl=true
while IFS= read -r line
do
# !!! The problem is here, the if statement is not working properly and I do not know why !!!
echo $line
if [ "$fl" == "true" ]
then
prctg=$(echo "$line" | grep -Po "[0-9]{0,3}(?=%)")
result="$result $prctg"
fl=false
# With 100% loss there is no rtt line to read, so add the placeholders now
if [ "$prctg" == "100" ]
then
result="$result -1 -1 -1 -1"
fi
else
result="$result $(echo "$line" | cut -d' ' -f4 | sed -r 's/\// /g')"
fi
done <<< "$pingOutput"
echo "$result"
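For comparison, here is a sketch that does not depend on which line comes first. It assumes the summary contains a line with "N% packet loss" and, when replies arrived, a line like "rtt min/avg/max/mdev = a/b/c/d ms"; that wording is an assumption about your ping, so check it against real output. The function name ping_summary_to_csv is made up.

```bash
# Hypothetical helper: $1 is the site name, stdin is the ping summary text.
ping_summary_to_csv() {
    local site="$1" loss="" stats="-1,-1,-1,-1" line
    while IFS= read -r line; do
        case $line in
            *"% packet loss"*)
                loss=${line%%"% packet loss"*}   # text before the percent sign
                loss=${loss##* }                 # keep only the last word
                ;;
            *" = "*/*/*/*)
                stats=${line#*= }                # "a/b/c/d ms"
                stats=${stats%% *}               # drop the trailing unit
                stats=$(printf '%s' "$stats" | tr / ,)
                ;;
        esac
    done
    # stats keeps its -1 defaults when no timing line was seen (100% loss).
    printf '%s,%s,%s\n' "$site" "$loss" "$stats"
}

# Usage: ping "$site" -c10 -i0.2 -q | ping_summary_to_csv "$site"
```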

Remove one directory component from path (string manipulation)

I'm looking for the easiest and most readable way to remove a field from a path. So for example, I have /this/is/my/complicated/path/here, and I would like to remove the 5th field ("/complicated") from the string, using bash commands, so that it becomes /this/is/my/path/here.
I could do this with
echo "/this/is/my/complicated/path/here" | cut -d/ -f-4
echo "/"
echo "/this/is/my/complicated/path/here" | cut -d/ -f6-
but I would like this done in just one easy command, something that would look like
echo "/this/is/my/complicated/path" | tee >(cut -d/ -f-4) >(cut -d/ -f6-)
except that this doesn't work.
With cut, you can specify a comma separated list of fields to print:
$ echo "/this/is/my/complicated/path/here" | cut -d/ -f-4,6-
/this/is/my/path/here
So, it's not really necessary to use two commands.
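If the field number varies, the same cut field list can be built from a variable. A rough sketch; remove_field is a made-up name, fields are counted the way cut counts them (field 1 is the empty string before the leading slash), and it assumes n is at least 2:

```bash
# Hypothetical helper: $1 is the path, $2 is the 1-based field to drop.
remove_field() {
    local path="$1" n="$2"
    # For n=5 this runs: cut -d/ -f-4,6-
    printf '%s\n' "$path" | cut -d/ -f"-$((n - 1)),$((n + 1))-"
}

# Usage: remove_field "/this/is/my/complicated/path/here" 5
```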
How about using sed?
$ echo "/this/is/my/complicated/path/here" | sed -e "s%complicated/%%"
/this/is/my/path/here
This removes the 5th path element
echo "/this/is/my/complicated/path/here" |
perl -F/ -lane 'splice @F,4,1; print join("/", @F)'
just bash
IFS=/ read -a dirs <<< "/this/is/my/complicated/path/here"
newpath=$(IFS=/; echo "${dirs[*]:0:4}/${dirs[*]:5}")
Anything wrong with a bash script?
#!/bin/bash
if [ -z "$1" ]; then
us=$(echo $0 | sed "s/^\.\///") # Get rid of a starting ./
echo " "Usage: $us StringToParse [delimiterChar] [start] [end]
echo StringToParse: string to remove something from. Required
echo delimiterChar: Character to mark the columns "(default '/')"
echo " "start: starting column to cut "(default 5)"
echo " "end: last column to cut "(default 5)"
exit
fi
# Parse the parameters
theString=$1
if [ -z "$2" ]; then
delim=/
start=4
end=6
else
delim=$2
if [ -z "$3" ]; then
start=4
end=6
else
start=`expr $3 - 1`
if [ -z "$4" ]; then
end=6
else
end=`expr $4 + 1`
fi
fi
fi
result=`echo $theString | cut -d$delim -f-$start`
result=$result$delim
final=`echo $theString | cut -d$delim -f$end-`
result=$result$final
echo $result

Move certain lines to the preceding line

I have a list that looks like this:
sharename:shareX
comment:commentX
sharename:shareY
comment:commentY
sharename:shareZ
comment:commentZ
and so on...
And this is how I would like the list to look like:
shareX;commentX
shareY;commentY
shareZ;commentZ
How can I accomplish that in bash?
Pure Bash:
IFS=':'
while read a b; read c d; do # read 2 lines
echo "$b;$d"
done < "$infile"
one liner:
odd=0; for i in `cat list | cut -d":" -f2`; do if [ $odd -eq 0 ]; then echo -ne $i";"; odd=1; else echo $i; odd=0; fi; done
formatted:
odd=0;
for i in `cat list | cut -d":" -f2`;
do
if [ $odd -eq 0 ];
then
echo -ne $i";";
odd=1;
else
echo $i;
odd=0;
fi;
done
untested, the sed part may be wrong
paste -d ';' - - < filename | sed -r 's/(^|;)[^:]*:/\1/g'
This might work for you:
sed '$!N;s/[^:]*:\([^\n]*\)\n[^:]*:/\1;/' file
shareX;commentX
shareY;commentY
shareZ;commentZ
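A shell-only variant that leaves the global IFS untouched, as a sketch. join_pairs is a made-up name, and it assumes the input always comes in label:value pairs; an odd trailing line is silently dropped.

```bash
# Hypothetical helper: reads "label:value" lines from stdin two at a time and
# prints "value1;value2" for each pair.
join_pairs() {
    local label1 value1 label2 value2
    # IFS=: applies only to each read, so nothing else in the script changes.
    # The second variable of each read receives the rest of the line, so a
    # value that itself contains ":" is kept whole.
    while IFS=: read -r label1 value1 && IFS=: read -r label2 value2; do
        printf '%s;%s\n' "$value1" "$value2"
    done
}

# Usage: join_pairs < "$infile"
```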

Simplest Bash code to find what files from a defined list don't exist in a directory?

This is what I came up with. It works perfectly -- I'm just curious if there's a smaller/crunchier way to do it. (wondering if possible without a loop)
files='file1|file2|file3|file4|file5'
path='/my/path'
found=$(find "$path" -regextype posix-extended -type f -regex ".*\/($files)")
for file in $(echo "$files" | tr '|' ' ')
do
if [[ ! "$found" =~ "$file" ]]
then
echo "$file"
fi
done
You can do this without invoking any external tools:
IFS="|"
for file in $files
do
[ -f "$path/$file" ] || printf "%s\n" "$file"
done
Your code will break if you have file names with whitespace. This is how I would do it, which is a bit more concise.
echo "$files" | tr '|' '\n' | while read file; do
[ -e "$path/$file" ] || echo "$file"
done
You can probably play around with xargs if you want to get rid of the loop altogether.
$ eval "ls $path/{${files//|/,}} 2>&1 1>/dev/null | awk '{print \$4}' | tr -d :"
Or use awk
$ echo -n $files | awk -v path=$path -v RS='|' '{printf("! [[ -e %s ]] && echo %s\n", path"/"$0, path"/"$0) | "bash"}'
without whitespace in filenames:
files=(mbox todo watt zoff xorf)
for f in ${files[@]}; do test -f $f || echo $f ; done
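Putting the pieces together, here is a sketch that keeps the '|'-separated list from the question, checks inside the given directory, and copes with spaces in names. missing_files is a made-up name; it still uses a loop, but no external tools.

```bash
# Hypothetical helper: $1 is the directory, $2 is a '|'-separated list of
# file names. Prints every name that is not a regular file in the directory.
missing_files() {
    local dir="$1" list="$2"
    (
        # The subshell keeps the IFS and globbing changes from leaking out.
        IFS='|'
        set -f    # no pathname expansion while the list is being split
        for name in $list; do
            [ -f "$dir/$name" ] || printf '%s\n' "$name"
        done
    )
}

# Usage: missing_files "$path" "$files"
```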
