How do I write output from a bash script to the second line of a CSV file that contains headers? - bash

I have a bash script that writes its output to the beginning of a CSV file. I need it to maintain the headers on the first line. I tried to use awk and sed but didn't succeed.
I have a main script that makes an SSH connection to each server:
for n in $(cat list.txt)
do
ssh -t root@$n /etc/m_chkdsk_app.sh
done
list.txt contains server names:
server1
server2
server3
server4
and then runs the following script on the remote computers:
if [ -f /lnxfiler/diskstatus/m_chkdsk.csv ]
then
printf "$(cat /proc/sys/kernel/hostname)" >> /lnxfiler/diskstatus/New_m_chkdsk.csv && printf "," >> /lnxfiler/diskstatus/New_m_chkdsk.csv && printf "$(date +%d-%m-%Y)" >> /lnxfiler/diskstatus/New_m_chkdsk.csv && df -h | grep /dev/mapper/rootvg-var | awk '{printf "," $2 "," $3 "," $5 "," $6 "\n"}' >> /lnxfiler/diskstatus/New_m_chkdsk.csv
printf "$(cat /proc/sys/kernel/hostname)" >> /lnxfiler/diskstatus/New_m_chkdsk.csv && printf "," >> /lnxfiler/diskstatus/New_m_chkdsk.csv && printf "$(date +%d-%m-%Y)" >> /lnxfiler/diskstatus/New_m_chkdsk.csv && df -h | grep "/dev/mapper/rootvg-sap " | awk '{printf "," $2 "," $3 "," $5 "," $6 "\n"}' >> /lnxfiler/diskstatus/New_m_chkdsk.csv
cat /lnxfiler/diskstatus/m_chkdsk.csv >> /lnxfiler/diskstatus/New_m_chkdsk.csv
mv /lnxfiler/diskstatus/New_m_chkdsk.csv /lnxfiler/diskstatus/m_chkdsk.csv
else
printf "$(cat /proc/sys/kernel/hostname)" >> /lnxfiler/diskstatus/m_chkdsk.csv && printf "," >> /lnxfiler/diskstatus/m_chkdsk.csv && printf "$(date +%d-%m-%Y)" >> /lnxfiler/diskstatus/m_chkdsk.csv && df -h | grep /dev/mapper/rootvg-var | awk '{printf "," $2 "," $3 "," $5 "," $6 "\n"}' >> /lnxfiler/diskstatus/m_chkdsk.csv
printf "$(cat /proc/sys/kernel/hostname)" >> /lnxfiler/diskstatus/m_chkdsk.csv && printf "," >> /lnxfiler/diskstatus/m_chkdsk.csv && printf "$(date +%d-%m-%Y)" >> /lnxfiler/diskstatus/m_chkdsk.csv && df -h | grep "/dev/mapper/rootvg-sap " | awk '{printf "," $2 "," $3 "," $5 "," $6 "\n"}' >> /lnxfiler/diskstatus/m_chkdsk.csv
fi
exit
When I run the main script, I need all the output of the script to be added after the header.
Server Name,Date,Disk Size,Used,Use%,Mounted on
server1,08-09-2020,2.0G,363M,20%,/var
server1,08-09-2020,15G,41M,1%,/usr/sap
server1,08-09-2020,200G,237M,1%,/suse_manager
server2,08-09-2020,2.0G,138M,8%,/var
server2,08-09-2020,20G,6.6G,36%,/srv
server2,08-09-2020,80G,6.7G,9%,/srv/NFS
server3,08-09-2020,2.0G,363M,20%,/var
server3,08-09-2020,15G,41M,1%,/usr/sap
server4,08-09-2020,2.0G,138M,8%,/var
server4,08-09-2020,20G,6.6G,36%,/srv
server4,08-09-2020,80G,6.7G,9%,/srv/NFS

Here's a quick refactoring.
Driver script; don't read lines with for:
head -n 1 result.csv >newresult.csv
while IFS= read -r host; do
# redirect ssh's stdin so it doesn't swallow the rest of list.txt
ssh -t "root@$host" /etc/m_chkdsk_app.sh </dev/null
done < list.txt >>newresult.csv
mv newresult.csv result.csv
Remote script:
df -h /dev/mapper/rootvg-var /dev/mapper/rootvg-sap |
awk -v date="$(date +%d-%m-%Y)" 'BEGIN { OFS="," }
# first input file is /proc/sys/kernel/hostname, a single line
NR==FNR { host=$0; next }
# second input ("-") is the df output arriving on the pipe
/\/dev/ { print host, date, $2, $3, $5, $6 }
' /proc/sys/kernel/hostname - |
tee /lnxfiler/diskstatus/m_chkdsk.csv
The original script had tremendous amounts of repetition but it's of course possible that I have overlooked some crucial difference between almost identical code snippets. That's actually one of the reasons to avoid repeating yourself.
Selectively overwriting the old results on the remote server in slightly different fashion depending on whether the file already exists seemed entirely superfluous, so I took that out.
This assumes that you have old results in result.csv and that the first header line is so incredibly hard to get right that you have to copy it from the old file. It would probably be easier to just hard-code the script to write a new first line.
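For example, a minimal sketch of that hard-coded variant (header text copied from your sample output above):
printf 'Server Name,Date,Disk Size,Used,Use%%,Mounted on\n' > newresult.csv
while IFS= read -r host; do
ssh -t "root@$host" /etc/m_chkdsk_app.sh </dev/null
done < list.txt >> newresult.csv
mv newresult.csv result.csv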
Depending on how robust you need this to be, maybe actually add set -e to the start of both scripts. If you don't absolutely have to store the results on the remote disk as well, that would cut out one of the main failure scenarios, and simplify the script still more.

If you have a file1.csv with data already in it, you can write all your new data to a tempfile.csv and then splice it in with a simple sed:
sed -i '1rtempfile.csv' file1.csv
This reads the content of tempfile.csv and inserts it immediately after line 1.
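A quick demonstration with throwaway files:
printf 'Header\nold1\nold2\n' > file1.csv
printf 'new1\nnew2\n' > tempfile.csv
sed -i '1rtempfile.csv' file1.csv
After this, file1.csv contains Header, new1, new2, old1, old2, in that order.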

Related

Splitting out a large file

I would like to process a 200 GB file with lines like the following:
...
{"captureTime": "1534303617.738","ua": "..."}
...
The objective is to split this file into multiple files grouped by hours.
Here is my basic script:
#!/bin/sh
echo "Splitting files"
echo "Total lines"
sed -n '$=' $1
echo "First Date"
head -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
echo "Last Date"
tail -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
while read p; do
date=$(echo "$p" | sed 's/{"captureTime": "//' | sed 's/","ua":.*//' | xargs -i date -d '@{}' '+%Y%m%d%H')
echo $p >> split.$date
done <$1
Some facts:
80 000 000 lines to process
jq doesn't work well since some JSON lines are invalid.
Could you help me to optimize this bash script?
Thank you
This awk solution might come to your rescue:
awk -F'"' '{ file="split." strftime("%Y%m%d%H",$4); print >> file; close(file) }' "$1"
It essentially replaces your while-loop. The close(file) after every print keeps the number of simultaneously open files small, and since print >> appends, reopening the same file on a later line does not lose data.
Furthermore, you can replace the complete script with:
# Start AWK file
BEGIN{ FS="\"" }
(NR==1){tmin=tmax=$4}
($4 > tmax) { tmax = $4 }
($4 < tmin) { tmin = $4 }
{ file="split."strftime("%Y%m%d%H",$4); print >> file; close(file) }
END {
print "Total lines processed: ", NR
print "First date: "strftime("%Y%m%d%H",tmin)
print "Last date: "strftime("%Y%m%d%H",tmax)
}
Which you can then run as:
awk -f <awk_file.awk> <jq-file>
Note: the usage of strftime indicates that you need to use GNU awk.
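As a quick sanity check of strftime against one of the sample lines (the exact output depends on your local timezone):
echo '{"captureTime": "1534303617.738","ua": "..."}' | gawk -F'"' '{ print strftime("%Y%m%d%H", $4) }'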
You can start optimizing by replacing this
sed 's/{"captureTime": "//' | sed 's/","ua":.*//'
with this
sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
-n suppress automatic printing of pattern space
-E use extended regular expressions in the script
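For example, run against one of the sample lines:
echo '{"captureTime": "1534303617.738","ua": "..."}' | sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
This prints just 1534303617.738.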

Argument not recognised/accessed by egrep - Shell

Egrep and Awk to output columns of a line, with a specific value for the first column
I am tasked to write a shell program which, when run as
./tool.sh -f file -id id OR ./tool.sh -id id -f file
must output the name, surname and birthdate (3 columns of the file) for that specific id.
So far my code is structured as such:
elif [ "$#" -eq 4 ];
then
while [ "$1" != "" ];
do
case $1 in
-f)
cat < "$2" | egrep '"$4"' | awk ' {print $3 "\t" $2 "\t" $5}'
shift 4
;;
-id)
cat < "$4" | egrep '"$2"' | awk ' {print $3 "\t" $2 "\t" $5}'
shift 4
esac
done
(Ignoring the opening elif because there are more subtasks for later.)
My output is nothing. The program just runs.
I've tested the cat < people.dat | egrep '125' | awk ' {print $3 "\t" $2 "\t" $5}'
and it runs just fine.
I also had an instance where I got output from the program when it was run like so:
cat < "$2" | egrep '["$4"]' | awk ' {print $3 "\t" $2 "\t" $5}'
but it wasn't only that specific ID.
`egrep "$4"` was correct instead of `egrep '["$4"]'` in
`cat < "$2" | egrep '["$4"]' | awk ' {print $3 "\t" $2 "\t" $5}'`
Double quotes allow variable expansion; single quotes don't. No command needs a particular kind of quotes: quoting is purely a shell feature, and the quotes are not passed to the command. (As mentioned by @that other guy.) Note also that inside single quotes, ["$4"] is a bracket expression matching any one of the characters ", $ and 4, which is why it matched more than the specific ID.
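A quick illustration of the difference:
id=125
echo '$id' "$id"
This prints $id 125: the single-quoted argument reaches echo literally, while the double-quoted one is expanded by the shell first.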

Converting date to unix epoch using awk in log files

I have a file containing multiple lines in the format "[dd.mm.yyyy hh:mm:ss.mmm] text value". I need to convert this to "Unix epoch|text value". I tried to use awk to do this, but I can't seem to find the correct command.
For example, if the file is:
[30.08.2013 13:54:49.126] Foo
[30.08.2013 13:56:49.126] Bar
[30.08.2013 13:59:49.126] Foo bar
I use the following (probably too complex) awk command:
cat sample.txt | cut -c 2- |awk -F'[. :]' ' { $cmd="date --date " "\""$3$2$1" "$4":"$5":"$6"\""" +%s" ; $cmd |& getline epoch; close($cmd); printf epoch"|"; print $0 ;}';
The problem is that I get the time in epoch correctly but I can't access the rest of the line. The $0 (and other $ variables) contain the date command. So the output is
1377863689|date --date "20130830 13:54:49" +%s
1377863809|date --date "20130830 13:56:49" +%s
1377863989|date --date "20130830 13:59:49" +%s
What I wish to get is
1377863689|Foo
1377863809|Bar
1377863989|Foo bar
Is there a (preferably simple) way of accomplishing this? Should I use some other tool?
The immediate bug in your version is that $cmd with an unset cmd means $0 (cmd evaluates to 0), so the assignment overwrites the whole record. Assuming you have gawk (a fair assumption since you are using GNU date) you can do this all internally to gawk:
$ awk 'match($0, /\[(.*)\] (.*)/, a) &&
match(a[1], /([0-9]{2})\.([0-9]{2})\.([0-9]{4}) ([0-9:]+)(\.[0-9]+)/,b) {
gsub(/:/," ",b[4])
s=b[3] " " b[2] " " b[1] " " b[4]
print mktime(s) "|" a[2]
}' file
1377896089|Foo
1377896209|Bar
1377896389|Foo bar
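Note that mktime interprets the broken-down time in your local timezone, which is why these epoch values differ from the ones in your expected output. A quick check with the timezone pinned:
TZ=UTC gawk 'BEGIN { print mktime("2013 08 30 13 54 49") }'
This prints 1377870889; running in the same timezone that produced the log reproduces your expected values.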
Or, a Bash solution:
while IFS= read -r line; do
if [[ "$line" =~ \[([[:digit:]]{2})\.([[:digit:]]{2})\.([[:digit:]]{4})\ +([[:digit:]:]+)\.([[:digit:]]+)\]\ +(.*) ]]
then
printf "%s|%s\n" $(gdate +"%s" --date="${BASH_REMATCH[3]}${BASH_REMATCH[2]}${BASH_REMATCH[1]} ${BASH_REMATCH[4]}") "${BASH_REMATCH[6]}"
fi
done <file
I propose to simplify it to
IFS=' |.|[';
while read -r _ day month year hour _ name; do
date=$(date --date "$year$month$day $hour" +%s);
echo "$date|$name";
done < sample.txt
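Here IFS is used as a set of single-character delimiters (space, |, . and [), so read splits a line such as [30.08.2013 13:54:49.126] Foo into an empty field, 30, 08, 2013, 13:54:49, 126] and the remaining text. A standalone check of the splitting:
IFS=' |.[' read -r _ day month year hour _ name <<< '[30.08.2013 13:54:49.126] Foo bar'
echo "$day.$month.$year $hour -> $name"
This prints 30.08.2013 13:54:49 -> Foo bar.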
Or, if you prefer to continue with awk (note that the |& coprocess syntax requires GNU awk):
awk -F'[\\[\\]. ]' '{
split($0,a,"] ")
("date --date \"" $4$3$2" "$5"\" +%s") |& getline date
printf "%s|%s\n",date,a[2]
}' sample.txt

Arguments to change variable values in Bash script

I have this script in bash:
#!/bin/bash
dir="/home/dortiz/Prueba"
for i in $dir/*
do
cat $i | awk '{print $1" " $2" " $3" " $4"\n " $5}' | \
awk '/gi/{print ">" $0; getline; print}' | \
awk '$3>20.00 {print $0; getline; print;}' \
> "${i}.outsel"
done
cd /home/dortiz/Prueba
mv *.outsel /home/dortiz/Prueba2
and I would like to set an argument to change the value after awk '$3>' in an easy way from my main program that will call this script.
I have read something about getopts but I don't understand it at all.
Thanks a lot in advance
The simplest way is to just pass an argument to your script:
yourscript.sh 20.0
Then in your script
#!/bin/bash
value=$1 # store the value passed in as the first parameter.
dir="/home/dortiz/Prueba"
for i in $dir/*; do
awk '{print $1" " $2" " $3" " $4"\n " $5}' "$i" |
awk '/gi/{print ">" $0; getline; print}' |
awk -v val="$value" '$3>val {print $0; getline; print;}' > "${i}.outsel"
# ^^^^^^^^^^^^^^^
done
...
and the cat|awk|awk|awk pipeline can probably be written like this:
awk -v val="$value" '
$3 > val {
prefix = /gi/ ? ">" : ""
print prefix $1 " " $2" " $3" " $4"\n " $5
}
' "$i" > "$i.outsel"
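Since you mention getopts: here is a minimal sketch of option-based parsing, in case you later prefer a named flag over a positional argument (the -t option name is my own invention):
#!/bin/bash
threshold=20.0                 # default when no -t is given
while getopts 't:' opt; do
case $opt in
t) threshold=$OPTARG ;;
*) echo "usage: $0 [-t threshold]" >&2; exit 1 ;;
esac
done
shift $((OPTIND - 1))
# hand the value to awk exactly as above:
# awk -v val="$threshold" '$3>val ...'
It would then be invoked as yourscript.sh -t 30.5.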

remove delimiter if condition not satisfied and substitute a string on condition

Consider the below file.
HEAD~XXXX
XXX~XXX~XXX~XXX~XXX~XXX~~WIN~SCRIPT~~~
XXX~XXX~XXX~XXX~XXX~XXX~~WIN~TPSCRI~~~
XXX~XXX~XXX~XXX~XXX~XXX~~WIN~RSCPIT~~~
TAIL~20
I wish the output to be like below for the above:
HEAD~XXXX
XXX~XXX~XXX~XXX~XXX~XXX~~WIN~SCRIPT~~~
XXX~XXX~XXX~XXX~XXX~XXX~~~~~~
XXX~XXX~XXX~XXX~XXX~XXX~~~~~~
TAIL~20
If the 9th field is not SCRIPT, I want both the 8th & 9th fields to be emptied like the 10th, and lines containing the words HEAD/TAIL have to be excluded from the above condition (i.e., NF!=13); I need the header & footer in the output exactly as in the input.
I have tried the below, but there should be a smarter way.
awk -F'~' -v OFS='~' '($9 != "Working line takeover with change of CP" {$9 = ""}) && ($9 != "Working line takeover with change of CP" {$8 = ""}) {NF=13; print}' file
The above doesn't work, so I also tried:
head -1 file > head
tail -1 file > tail
sed -i '/HDR/d' file
sed -i '/TLR/d' file
sed -i '/^\s*$/d' file
awk -F'~' -v OFS='~' '$9 != "Working line takeover with change of CP" {$9,$8 = ""} {NF=13; print}' file >> file.tmp //syntax error
cat file.tmp >> head
cat tail >> head
echo "" >> head
mv head file1
I'm trying to write a UNIX shell script with the below requirements.
Consider a file like this:
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~SCRIPT~~~
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~OTHERS~~~~
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~OTHERS~~~
Each line should have 12 fields (~ as delimiter); if not, a ~ has to be removed.
If anything OTHER than the string SCRIPT is present in the 10th field, the field has to be emptied.
I tried the below in /bin/bash; I know I'm not doing it so well. I'm feeding each line to sed & awk commands.
while read readline
echo "entered while"
do
fieldcount=`echo $readline | awk -F '~' '{print NF}'`
echo "Field count printed"
if [ $fieldcount -eq 13 ] && [ $fieldcount -ne 12 ]
then
echo "entering IF & before deletion"
#remove delimiter at the end of line
#echo "$readline~" >> $S_DIR/$1.tmp
#sed -i '/^\s*$/d' $readline
sed -i s'/.$//' $readline
echo "after deletion"
if [ awk '/SCRIPT/' $readline -ne "SCRIPT"]
then
#sed -i 's/SCRIPT//' $readline
replace_what="OTHERS"
#awk -F '~' -v OFS=~ '{$'$replace_what'=''; print }'
sed -i 's/[^,]*//' $replace_what
echo "$readline" >> $S_DIR/$1.tmp
fi
else
echo "$readline" >> $S_DIR/$1.tmp
fi
done < $S_DIR/$1
awk -F'~' -v OFS='~' '$10 != "SCRIPT" {$10 = ""} {NF=12; print}' file
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~SCRIPT~~
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~~~
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~~~
In bash, I would write:
(
# execute in a subshell, so the IFS setting is localized
IFS='~'
while read -ra fields; do
[[ ${fields[9]} != "SCRIPT" ]] && fields[9]=''
echo "${fields[*]:0:12}"
done < file
)
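A standalone check of how "${fields[*]:0:12}" joins array elements with the first character of IFS:
( IFS='~'; a=(one two three four); echo "${a[*]:0:2}" )
This prints one~two.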
Your followup question:
awk -F'~' -v OFS='~' '
$1 == "HEAD" || $1 == "TAIL" {print; next}
$9 != "SCRIPT" {$8 = $9 = ""}
{NF=13; print}
' file
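For what it's worth, assigning to NF makes awk rebuild the record with exactly that many fields, padding with empty fields or truncating as needed (reliable in GNU awk; older awks may differ when NF is decreased). A standalone check:
echo 'a~b~c~d~e' | awk -F'~' -v OFS='~' '{ NF=3; print }'
This prints a~b~c.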
If you have further questions, please create a new question instead of editing this one.
