Use awk to remove unwanted spaces - shell

I have the following piece of code:
awk_cmd='{ if ($4 == '"$1"') { printf $0 } }'
printf "$(date +%s)$EPM_DB_SEP" >> "$EPM_RUN_DIR/$2.pid"
ps -e -o user,group,comm,pid,ppid,pgid,etime,nice,rgroup,ruser,time,tty,vsz,stat,rss,args |\
awk "$awk_cmd" | sed 's/ */ /g' >> "$EPM_RUN_DIR/$2.pid"
Can I modify $awk_cmd to avoid using sed later to remove the unwanted spaces?
The awk version involved is the one that ships with BusyBox v1.26.2.

This is probably what you want:
function awk_cmd { awk -v pid="$1" -v ORS= '$4 == pid{$1=$1; print}'; }
printf "$(date +%s)$EPM_DB_SEP" >> "$EPM_RUN_DIR/$2.pid"
ps -e -o user,group,comm,pid,ppid,pgid,etime,nice,rgroup,ruser,time,tty,vsz,stat,rss,args |
awk_cmd "$1" >> "$EPM_RUN_DIR/$2.pid"
but without sample input/output it's an untested guess.
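As an illustration with made-up input (since the question gives no sample data): the $1=$1 assignment makes awk rebuild the record with single spaces between fields, and the empty ORS reproduces the original printf $0 behaviour of not appending a newline.
$ echo 'root     root     sshd      1234   1' | awk -v pid=1234 -v ORS= '$4 == pid{$1=$1; print}'
root root sshd 1234 1
(There is no trailing newline in that output, matching the original.)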

I’m adding another answer as I can’t edit the first one (I don’t understand why and have contacted the site about it…). I’m giving more context and have taken Ed Morton’s comment into account. I also put pid in the first column.
filter_pid() { awk -v pid="$1" -v ORS= '$1 == pid{$1=$1; print}'; }
ps_store_current_info() {
printf "$(date +%s)$EPM_DB_SEP" >> "$EPM_RUN_DIR/$2.pid"
ps -e -o pid,ppid,user,group,comm,pgid,etime,nice,rgroup,ruser,time,tty,vsz,stat,rss,args |\
filter_pid $1 >> "$EPM_RUN_DIR/$2.pid"
printf "\n" >> "$EPM_RUN_DIR/$2.pid"
}
$EPM_DB_SEP equals | and $EPM_RUN_DIR points to a directory.
I understand that doing this is not very clever, because I won’t be able to use the space as my (sub)field separator later, but that really is another problem…

Very simple…
awk_cmd='{ if ($4 == '"$1"') { gsub("  *"," ",$0); printf $0 } }'
printf "$(date +%s)$EPM_DB_SEP" >> "$EPM_RUN_DIR/$2.pid"
ps -e -o user,group,comm,pid,ppid,pgid,etime,nice,rgroup,ruser,time,tty,vsz,stat,rss,args |\
awk "$awk_cmd" >> "$EPM_RUN_DIR/$2.pid"

Related

"'df' unexpected" error checking for disk space inside a function using a while loop - bash script

I am getting an issue where, if I call the function below, I get the error: line 89: syntax error at line 117: 'df' unexpected.
If I take the code out of the function it works fine.
Is there any reason for the error above?
This is a bash script on RHEL.
function testr{
df -H | grep -vE '^Filesystem|tmpfs|cdrom' | awk '{ print $5 " " $1 }' | while read output;
do
usep=$(echo $output | awk '{ print $1}' | cut -d'%' -f1)
partition=$(echo $output | awk '{ print $2 }')
(.. Sends alert via mail after)
done
}
Maybe a little easier to read this way?
testr_zsh () {
# This (only) works with zsh.
for usep partition in $( df -H | awk 'NR>1 && !/tmpfs|cdrom/{print $5,$1}' | sed -n '/%/s/%//p' )
do
echo "\$usep: $usep, \$partition: $partition"
done
}
testr () {
for fs in $( df -H | awk 'NR>1 && !/tmpfs|cdrom/{print $5"|"$1}' | sed -n '/%/s/%//p' )
do
usep="$(echo "${fs}" | sed 's/|.*//' )"
partition="$(echo "${fs}" | sed 's/.*|//' )"
echo "\$usep: $usep, \$partition: $partition"
done
}
On my computer not all lines that pass through the awk filter have % in them, hence the added sed filter. zsh allows two variables in the for loop, which is pretty slick.
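For plain bash, a rough equivalent of the zsh two-variable loop (a sketch only, with my own function name) is to let read split each line into the two fields:
testr_bash () {
  df -H | awk 'NR>1 && !/tmpfs|cdrom/{print $5,$1}' | sed -n '/%/s/%//p' |
  while read -r usep partition
  do
    echo "\$usep: $usep, \$partition: $partition"
  done
}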

Shell awk - Print a position from variable

Here is my string that needs to be parsed.
line='aaa vvv ccc'
I need to print the values one by one.
no_of_users=$(echo $line| wc -w)
If the no_of_users is greater than 1 then I need to print the values one by one.
aaa
vvv
ccc
I used this script.
if [ $no_of_users -gt 1 ]
then
for ((n=1;n<=$no_of_users;n++))
do
## here is my issue
echo 'user:'$n $line|awk -F ' ' -vno="${n}" 'BEGIN { print no }'
done
fi
In { print no } I need to print the value at that position.
You may use this awk:
awk 'NF>1 {OFS="\n"; $1=$1} 1' <<< "$line"
aaa
vvv
ccc
What it does:
NF>1: if the number of fields is greater than 1
OFS="\n": set the output field separator to \n
$1=$1: force awk to rebuild the record so the new OFS is applied (see the comparison below)
1: print the record
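To see why the $1=$1 step matters, here is a quick side-by-side (my own test with the same sample string):
$ line='aaa vvv ccc'
$ awk 'NF>1 {OFS="\n"} 1' <<< "$line"        # OFS set, but the record is never rebuilt
aaa vvv ccc
$ awk 'NF>1 {OFS="\n"; $1=$1} 1' <<< "$line" # the assignment forces the rebuild
aaa
vvv
ccc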
1st solution: Could you please try the following, using a single awk. Here var is an awk variable which holds the value of the shell variable line.
awk -v var="$line" '
BEGIN{
num=split(var,arr," ")
if(num>1){
for(i=1;i<=num;i++){ print arr[i] }
}
}'
Explanation: a detailed explanation of the above.
awk -v var="$line" ' ##Starting awk program and creating var variable which has line shell variable value in it.
BEGIN{ ##Starting BEGIN section of program from here.
num=split(var,arr," ") ##Splitting var into array arr here. Saving its total length into variable num to check it later.
if(num>1){ ##Checking condition if num is greater than 1 then do following.
for(i=1;i<=num;i++){ print arr[i] } ##Running for loop from i=1 to till value of num here and printing arr value with index i here.
}
}'
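Running the 1st solution against the sample string gives, in my own quick test:
$ line='aaa vvv ccc'
$ awk -v var="$line" 'BEGIN{num=split(var,arr," "); if(num>1) for(i=1;i<=num;i++) print arr[i]}'
aaa
vvv
ccc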
2nd solution: Adding one more solution, tested with and written for GNU awk.
echo "$line" | awk -v RS= -v OFS="\n" 'NF>1{$1=$1;print}'
Another option:
if [ $no_of_users -gt 1 ]
then
for ((n=1;n<=$no_of_users;n++))
do
echo 'user:'$n $(echo $line|awk -F ' ' -v x=$n '{printf $x }')
done
fi
You can use grep
echo $line | grep -o '[a-z][a-z]*'
Also with awk:
awk '{print $1, $2, $3}' OFS='\n' <<< "$line"
aaa
vvv
ccc
the key is setting OFS='\n'
Or a real toughie:
printf "%s\n" $line
(note: $line is unquoted)
printf will consume all words in line with word-splitting applied, so each word is taken as a separate argument.
Example Use/Output
$ line='aaa vvv ccc'; printf "%s\n" $line
aaa
vvv
ccc
Using bash:
$ line='aaa vvv ccc'
$ [[ $line =~ \ ]] && echo -e ${line// /\\n}
aaa
vvv
ccc
$ line=aaa
$ [[ $line =~ \ ]] && echo -e ${line// /\\n}
$
If you are on another shell:
$ line="foo bar baz" bash -c '[[ $line =~ \ ]] && echo -e ${line// /\\n}'
grep -Eq '[[:space:]]' <<< "$line" && xargs printf "%s\n" <<< $line
Do a silent grep for a space in the variable; if it matches, print the names on separate lines.
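For example (my own test values, mirroring the ones above):
$ line='aaa vvv ccc'
$ grep -Eq '[[:space:]]' <<< "$line" && xargs printf "%s\n" <<< $line
aaa
vvv
ccc
$ line='aaa'
$ grep -Eq '[[:space:]]' <<< "$line" && xargs printf "%s\n" <<< $line
$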
awk -v OFS='\n' 'NF>1{$1=$1; print}'
e.g.
$ line='aaa vvv ccc'
$ echo "$line" | awk -v OFS='\n' 'NF>1{$1=$1; print}'
aaa
vvv
ccc
$ line='aaa'
$ echo "$line" | awk -v OFS='\n' 'NF>1{$1=$1; print}'
$
Another golfed awk variation:
$ awk 'gsub(FS,RS)'
It only prints if there is a substitution.
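For example, since gsub returns the number of replacements made, the default print only fires when at least one space was turned into a newline:
$ echo 'aaa vvv ccc' | awk 'gsub(FS,RS)'
aaa
vvv
ccc
$ echo 'aaa' | awk 'gsub(FS,RS)'
$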

Splitting out a large file

I would like to process a 200 GB file with lines like the following:
...
{"captureTime": "1534303617.738","ua": "..."}
...
The objective is to split this file into multiple files grouped by hours.
Here is my basic script:
#!/bin/sh
echo "Splitting files"
echo "Total lines"
sed -n '$=' $1
echo "First Date"
head -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
echo "Last Date"
tail -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
while read p; do
date=$(echo "$p" | sed 's/{"captureTime": "//' | sed 's/","ua":.*//' | xargs -i date -d '#{}' '+%Y%m%d%H')
echo $p >> split.$date
done <$1
Some facts:
80 000 000 lines to process
jq doesn't work well since some JSON lines are invalid.
Could you help me to optimize this bash script?
Thank you
This awk solution might come to your rescue:
awk -F'"' '{file=strftime("%Y%m%d%H",$4); print >> file; close(file) }' $1
It essentially replaces your while-loop.
Furthermore, you can replace the complete script with:
# Start AWK file
BEGIN{ FS="\"" }
(NR==1){tmin=tmax=$4}
($4 > tmax) { tmax = $4 }
($4 < tmin) { tmin = $4 }
{ file="split."strftime("%Y%m%d%H",$4); print >> file; close(file) }
END {
print "Total lines processed: ", NR
print "First date: "strftime("%Y%m%d%H",tmin)
print "Last date: "strftime("%Y%m%d%H",tmax)
}
Which you then can run as:
awk -f <awk_file.awk> <jq-file>
Note: the usage of strftime indicates that you need to use GNU awk.
You can start optimizing by changing this
sed 's/{"captureTime": "//' | sed 's/","ua":.*//'
with this
sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
-n suppress automatic printing of pattern space
-E use extended regular expressions in the script
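As a quick check against the sample line from the question (my own test), the extraction looks like this:
$ echo '{"captureTime": "1534303617.738","ua": "..."}' | sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
1534303617.738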

bash command substitution force to foreground

I have this:
echo -e "\n\n"
find /home/*/var/*/logs/ \
-name transfer.log \
-exec awk -v SUM=0 '$0 {SUM+=1} END {print "{} " SUM}' {} \; \
> >( sed '/\b0\b/d' \
| awk ' BEGIN {printf "\t\t\tTRANSFER LOG\t\t\t\t\t#OF HITS\n"}
{printf "%-72s %-s\n", $1, $2}
' \
| (read -r; printf "%s\n" "$REPLY"; sort -nr -k2)
)
echo -e "\n\n"
When run on a machine with bash 4.1.2 it always returns correctly, except that I get all 4 of my newlines at the top.
When run on a machine with bash 3.00.15 it gives all 4 of my newlines at the top, returns the prompt in the middle of the output, and never completes; it just hangs.
I would really like to fix this for both versions as we have a lot of machines running both.
Why make life so difficult and unintelligible? Why not simplify?
TXFRLOG=$(find /home..... transfer.log)
awk .... ${TXFRLOG}
The answer I found was to use a while read:
echo -e "\n\n"; \
printf "\t\t\tTRANSFER LOG\t\t\t\t\t#OF HITS\n"; \
while read -r line; \
do echo "$line" |sed '/\b0\b/d' | awk '{printf "%-72s %-s\n", $1, $2}'; \
done < <(find /home/*/var/*/logs/ -name transfer.log -exec awk -v SUM=0 '$0 {SUM+=1} END{print "{} " SUM}' {} \;;) \
|sort -nr -k2; \
echo -e "\n\n"

Bash: "xargs cat", adding newlines after each file

I'm using a few commands to cat a few files, like this:
cat somefile | grep example | awk -F '"' '{ print $2 }' | xargs cat
It nearly works, but my issue is that I'd like to add a newline after each file.
Can this be done in a one liner?
(surely I can create a new script or a function that does cat and then echo -n but I was wondering if this could be solved in another way)
cat somefile | grep example | awk -F '"' '{ print $2 }' | while read file; do cat $file; echo ""; done
Using GNU Parallel http://www.gnu.org/software/parallel/ it may be even faster (depending on your system):
cat somefile | grep example | awk -F '"' '{ print $2 }' | parallel "cat {}; echo"
awk -F '"' '/example/{ system("cat " $2 };printf "\n"}' somefile

Resources