Bash: "xargs cat", adding newlines after each file - bash

I'm using a few commands to cat a few files, like this:
cat somefile | grep example | awk -F '"' '{ print $2 }' | xargs cat
It nearly works, but my issue is that I'd like to add a newline after each file.
Can this be done in a one liner?
(surely I can create a new script or a function that does cat and then echo, but I was wondering if this could be solved in another way)

cat somefile | grep example | awk -F '"' '{ print $2 }' | while read -r file; do cat "$file"; echo ""; done
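If you'd rather keep xargs itself, one possible one-liner (a sketch, assuming your xargs supports -I, as GNU and BSD xargs do) is to run a small shell per file so the echo lands after each cat:
cat somefile | grep example | awk -F '"' '{ print $2 }' | xargs -I{} sh -c 'cat "$1"; echo' _ {}
Here sh receives each filename as $1, and the echo adds the newline after every file.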

Using GNU Parallel http://www.gnu.org/software/parallel/ it may be even faster (depending on your system):
cat somefile | grep example | awk -F '"' '{ print $2 }' | parallel "cat {}; echo"

awk -F '"' '/example/{ system("cat " $2 };printf "\n"}' somefile


Print output of two commands in one line

I've got this working:
while sleep 5s
do
lscpu | grep 'CPU MHz:' | cut -d ':' -f 2 | awk '{$1=$1};1' && grep 'cpu ' /proc/stat | awk '{usage=($2+$4)*100/($2+$4+$5)} END {print usage "%"}'
done
And it gives me the following output:
1601.058
3.4811%
1452.514
3.48059%
1993.800
3.48006%
2085.585
3.47955%
2757.776
3.47902%
1370.237
3.47851%
1497.903
3.47798%
But I'd really like to get the two values onto a single line. Every time I try to do this I run into a double/single quote variable issue. Granted, I pulled some of this awk stuff from online, so I'm not really up to speed on it. I just want to print the CPU clock and load on one line every 5 seconds.
Can you help me find a better way to do that?
You can use process substitution to run lscpu and read /proc/stat directly, feeding both to a single awk command. No pipes needed.
while sleep 5; do
awk '/CPU MHz:/{printf "%s ", $NF} /cpu /{print ($2+$4)*100/($2+$4+$5)"%"}' <(lscpu) /proc/stat
done
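With the sample numbers from the question, each iteration would then print both values on one line, e.g.:
1601.058 3.4811%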
If there is only one input command:
date | awk '{print $1}'
Wed
OR
awk '{print $NF}' <(date)
2019
If there is more than one input command: for example, get the year from two date commands on the same line (not a very useful example, just for the sake of a demo):
awk '{printf "%s ", $NF} END{print ""}' <(date) <(date)
2019 2019
Pipe the output of the two commands into paste:
while sleep 5; do
lscpu | awk -F':[[:blank:]]+' '$1 == "CPU MHz" {print $2}'
awk '$1 == "cpu" {printf "%.4f%%\n", ($2+$4)*100/($2+$4+$5)}' /proc/stat
done | paste - -
The 2 columns will be separated by a tab.
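If you'd rather separate them with something else, paste accepts a delimiter, e.g. paste -d: - - for a colon or paste -d' ' - - for a space.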
Writing this for readability rather than efficiency, you might consider something like:
while sleep 5; do
cpu_pct=$(lscpu | awk -F': +' '/CPU MHz:/ { print $2 }')
usage=$(awk '/cpu / {usage=($2+$4)*100/($2+$4+$5)} END {print usage "%"}' /proc/stat)
printf '%s\n' "$cpu_pct $usage"
done
Command substitutions implicitly trim trailing newlines, so if lscpu | awk has output that ends in a newline, var=$(lscpu | awk) removes it; thereafter, you can use "$var" without that newline showing up.
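A quick illustration of that trimming:
v=$(printf 'foo\n\n\n')
printf '[%s]\n' "$v"    # prints [foo]; the trailing newlines are gone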
All you need to do is change the newline on the first line to a different separator. Something like:
lscpu | ... | tr \\n : && grep ...
You can also use echo -n $(command_with_stdout). The -n switch tells echo to omit the trailing newline (\n).
while sleep 5s; do
echo -n $( lscpu | grep 'CPU MHz:' | cut -d ':' -f 2 | awk '{$1=$1};1' )
echo -n ' **** '
echo $( grep 'cpu ' /proc/stat | awk '{usage=($2+$4)*100/($2+$4+$5)} END {print usage "%"}' )
done
Or the same representation in one line:
while sleep 5s; do echo -n $( lscpu | grep 'CPU MHz:' | cut -d ':' -f 2 | awk '{$1=$1};1' ); echo -n ' **** '; echo $( grep 'cpu ' /proc/stat | awk '{usage=($2+$4)*100/($2+$4+$5)} END {print usage "%"}' ); done
EDIT: (removed the -n switch from echo, per Charles Duffy's comment)
while sleep 5s; do echo "$( lscpu | grep 'CPU MHz:' | cut -d ':' -f 2 | awk '{$1=$1};1' ) **** $( grep 'cpu ' /proc/stat | awk '{usage=($2+$4)*100/($2+$4+$5)} END {print usage "%"}' )"; done

Splitting out a large file

I would like to process a 200 GB file with lines like the following:
...
{"captureTime": "1534303617.738","ua": "..."}
...
The objective is to split this file into multiple files grouped by hours.
Here is my basic script:
#!/bin/sh
echo "Splitting files"
echo "Total lines"
sed -n '$=' $1
echo "First Date"
head -n1 $1 | jq '.captureTime' | xargs -i date -d '#{}' '+%Y%m%d%H'
echo "Last Date"
tail -n1 $1 | jq '.captureTime' | xargs -i date -d '#{}' '+%Y%m%d%H'
while read p; do
date=$(echo "$p" | sed 's/{"captureTime": "//' | sed 's/","ua":.*//' | xargs -i date -d '#{}' '+%Y%m%d%H')
echo $p >> split.$date
done <$1
Some facts:
80 000 000 lines to process
jq doesn't work well since some JSON lines are invalid.
Could you help me to optimize this bash script?
Thank you
This awk solution might come to your rescue:
awk -F'"' '{file=strftime("%Y%m%d%H",$4); print >> file; close(file) }' $1
It essentially replaces your while-loop.
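Before letting it loose on the full 200 GB file, it may be worth a dry run on a small slice (capture.log below is just a stand-in for your actual input file):
head -n 10000 capture.log | awk -F'"' '{file=strftime("%Y%m%d%H",$4); print >> file; close(file)}'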
Furthermore, you can replace the complete script with:
# Start AWK file
BEGIN{ FS='"' }
(NR==1){tmin=tmax=$4}
($4 > tmax) { tmax = $4 }
($4 < tmin) { tmin = $4 }
{ file="split."strftime("%Y%m%d%H",$4); print >> file; close(file) }
END {
print "Total lines processed: ", NR
print "First date: "strftime("%Y%m%d%H",tmin)
print "Last date: "strftime("%Y%m%d%H",tmax)
}
Which you then can run as:
awk -f <awk_file.awk> <jq-file>
Note: the usage of strftime indicates that you need to use GNU awk.
You can start optimizing by replacing this:
sed 's/{"captureTime": "//' | sed 's/","ua":.*//'
with this
sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
-n suppress automatic printing of pattern space
-E use extended regular expressions in the script
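For example, against the sample line from the question, this extracts just the timestamp:
echo '{"captureTime": "1534303617.738","ua": "..."}' | sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
1534303617.738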

using date variable inside sed command

I am storing the date in a variable and using it in sed as below:
DateTime=`date "+%m/%d/%Y"`
Plc_hldr1=`head -$i place_holder.txt | tail -1 | awk -F ' ' '{ print $1 }'`
Plc_hldr2=`head -$i place_holder.txt | tail -1 | awk -F ' ' '{ print $2 }'`
sed "s/$Plc_hldr1/$DateTime/;s/$Plc_hldr2/$Total/" html_format.htm >> /u/raskar/test/html_final.htm
While running the sed command I am getting the below error.
sed: 0602-404 Function s/%%DDMS1RT%%/01/02/2014/;s/%%DDMS1C%%/1235/ cannot be parsed.
I suppose this is happening because the date expands to the following, which contains slashes ('/'):
01/02/2014
I tried with different quotes around the date. How do I make it run?
Change the separator to something else that won't appear in your patterns, for example:
sed "s?$Plc_hldr1?$DateTime?;s?$Plc_hldr2?$Total?"
Not a direct answer to the question, but you can replace
Plc_hldr1=`head -$i place_holder.txt | tail -1 | awk -F ' ' '{ print $1 }'`
Plc_hldr2=`head -$i place_holder.txt | tail -1 | awk -F ' ' '{ print $2 }'`
by
Plc_hldr1=`sed -n "$i {s/ .*//p;q}" place_holder.txt`
Plc_hldr2=`sed -n "$i {s/[^ ]\{1,\} \{1,\}\([^ ]\{1,\}\) .*/\1/p;q}" place_holder.txt`
and, with AIX ksh (where the last command of a pipeline runs in the current shell, so read keeps its values):
sed -n "$i {s/\([^ ]\{1,\} \{1,\}[^ ]\{1,\}\) .*/\1/p;q}" | read Plc_hldr1 Plc_hldr2

Print out onto same line with ":" separating variables

I have the following piece of code and would like to display HOST and RESULT side by side with a : separating them.
HOST=`grep pers results.txt | cut -d':' -f2 | awk '{print $1}'`
RESULT=`grep cleanup results.txt | cut -d':' -f2 | awk '{print $1}' | sed -e 's/K/000/' -e 's/M/000000/'`
echo ${HOST}${RESULT}
Can anyone help with the final command to display these? I am just getting all of the hosts and then all of the results.
You probably want this:
HOST=( `grep pers results.txt | cut -d':' -f2 | awk '{ print $1 }'` )  # keep the output of the command in an array
RESULT=( `grep cleanup results.txt | cut -d':' -f2 | awk '{ print $1 }' | sed -e 's/K/000/' -e 's/M/000000/'` )
for i in "${!HOST[#]}"; do
echo "${HOST[$i]}:${RESULT[$i]}"
done
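If the two pipelines are guaranteed to produce the same number of lines, another option (a sketch along the same lines, untested against your data) is to let paste do the pairing with a colon delimiter:
paste -d: <(grep pers results.txt | cut -d':' -f2 | awk '{print $1}') \
          <(grep cleanup results.txt | cut -d':' -f2 | awk '{print $1}' | sed -e 's/K/000/' -e 's/M/000000/')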
A version that works without arrays, using an extra file handle to read from two sources at a time.
while read host; read result <&3; do
echo "$host:$result"
done < <( grep pers results.txt | cut -d: -f2 | awk '{print $1}' ) \
3< <( grep cleanup results.txt | cut -d':' -f2 | awk '{print $1}' | sed -e 's/K/000/' -e 's/M/000000/' )
It's still not quite POSIX, as it requires process substitution. You could instead use explicit FIFOs. (Also, an attempt to shorten the pipelines that produce the hosts and results. It's probably possible to combine this into a single awk command, since you can either do the substitution in awk, or pipe to sed from within awk. But this is all off-topic, so I leave it as an exercise to the reader.)
mkfifo hostsrc
mkfifo resultsrc
awk -F: '/pers/ {split($2, a, " "); print a[1]}' results.txt > hostsrc &
awk -F: '/cleanup/ {split($2, a, " "); print a[1]}' results.txt | sed -e 's/K/000/' -e 's/M/000000/' > resultsrc &
while read host; read result <&3; do
echo "$host:$result"
done < hostsrc 3< resultsrc
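When you are done with them, clean up the named pipes with rm hostsrc resultsrc.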

No output when using awk inside bash script

My bash script is:
output=$(curl -s http://www.espncricinfo.com/england-v-south-africa-2012/engine/current/match/534225.html | sed -nr 's/.*<title>(.*?)<\/title>.*/\1/p')
score=echo"$output" | awk '{print $1}'
echo $score
The above script prints just a newline in my console whereas my required output is
$ curl -s http://www.espncricinfo.com/england-v-south-africa-2012/engine/current/match/534225.html | sed -nr 's/.*<title>(.*?)<\/title>.*/\1/p' | awk '{print $1}'
SA
So why am I not getting the output from my bash script when it works fine in the terminal? Am I using echo "$output" the wrong way?
#!/bin/bash
output=$(curl -s http://www.espncricinfo.com/england-v-south-africa-2012/engine/current/match/534225.html | sed -nr 's/.*<title>(.*?)<\/title>.*/\1/p')
score=$( echo "$output" | awk '{ print $1 }' )
echo "$score"
The score variable was empty because score=echo"$output" is not a command substitution; it is parsed as a plain variable assignment sitting on the left side of a pipeline, so echo never runs.
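Since that assignment happens in a pipeline subshell and writes nothing to awk, score never gets set in your shell either. A minimal illustration of the behavior (variable names chosen just for the demo):
foo=hello | cat          # assignment-only command in a pipeline: runs in a subshell, produces no output
echo "${foo:-unset}"     # prints: unset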
