Kill a database query after command outputs a pattern - bash

I have the following command that downloads a file from a database, then checks its integrity and exits:
cmd_db -get file_1
output:
Getting file_1
Transferring data (10% done)
Transferring data (20% done)
...
Transferring data (90% done)
Transferring data (100% done)
Getting MD5
ERROR: stream broke
Exiting...
The file downloads properly but is erased after the error occurs.
I managed to kill the command manually when the output reached "100% done".
After checking the MD5 manually, the file is actually fine.
There is something wrong with my connection and I have already spent days trying to figure it out. So far, killing the command manually is the only solution that works.
So I am trying to automate the killing, but none of the following commands work. The program actually stops producing output when it reaches "100% done", but continues until the end and erases the file.
cmd_db -get file_1 | sed '/100% done/q'
cmd_db -get file_1 | while read line; do test "$line" = "100% done" && killall; done
until cmd_db -get file_1 | grep -m 1 "100% done"; do : ; done
Is there any other way to do that?

Alright, since I don't know what your command is, I'll have to work around it. First, redirect the output of the command to grep, which will look for '100% done' and put the matching line in a file. Put the whole thing in the background:
cmd_db -get file_1 | grep '100% done' > tmp$$ &
You are going to need that process ID, so that we can kill it later:
child=$!
Now, that file is going to be empty until we get that string. Put yourself into an infinite loop, and test the file size until it's greater than zero, then kill the child process and break out of the loop:
while true; do
    [ $( wc -c tmp$$ | cut -d' ' -f1 ) -gt 0 ] && { kill -9 $child; break; }
done
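One caveat: in bash, $! after a backgrounded pipeline is the PID of the last command in the pipeline (grep here), and since your program stops writing once it hits 100%, it may never get a SIGPIPE and could still run to the end. A variant that monitors the output and kills cmd_db itself, offered only as a sketch and assuming cmd_db does not buffer its progress lines when redirected to a file:
cmd_db -get file_1 > tmp$$ 2>&1 &    # run the command alone in the background
child=$!                             # now $! is cmd_db itself
until grep -q '100% done' tmp$$; do  # poll its output for the pattern
    sleep 1
done
kill -9 $child                       # stop cmd_db before it can erase the file
rm -f tmp$$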
Good luck!

Related

"WRITE" command works manually but not via script

My co-workers and I use the screen program on our Linux JUMP server to utilize as much screen space as possible. With that, we have multiple screens set up so that messages can go to one while we do work in another.
With that, I have a script that is used to verify network device connectivity and will send messages to my co-workers regardless of whether anything is down or not.
The script initially references a file with their usernames in it, grabs the highest PTS number (which denotes the last screen session they activated), and then puts it in the proper format in an external file, like so:
cat ./netops_techs | while read -r line; do
    temp=$(echo $line)
    temp2=$(who | grep $temp | sed 's/[^0-9]*//g' | sort -n -r | head -n1)
    if who | grep $temp; then
        echo "$temp pts/$temp2" >> ./tech_send
    fi
done
Once it is done, it will then scan our network every 5 minutes and send updates to the folks in the file "./tech_send", like so:
Techs=$(cat ./tech_send)
if [ ! -f ./Failed.log ]; then
    echo -e "\nNo network devices down at this time."
    for d in $Techs
    do
        cat ./no-down | write $d
    done
else
    # Writes downed buildings locally to my terminal
    echo -e "\nThe following devices are currently down:"
    echo ""
    echo "IP Hostname Model Building Room Rack Users Affected" > temp_down.log
    grep -f <(sed 's/.*/\^&\\>/' Failed.log) Asset-Location >> temp_down.log
    cat temp_down.log | column -t > Down.log
    cat Down.log
    # This will send the downed buildings to the rest of NetOps
    for d in $Techs
    do
        cat Down.log | write $d
    done
fi
The issue is that, when they are working in their main sectioned screen, the messages pop up in that active screen instead of the inactive screen. If I send them a message manually, such as:
write jsmith pts/25
Test Test
and then CTRL+D, it works fine even if they are in a different session. Via the script, though, it gives an error stating:
write: jsmith is logged in more than once; writing to pts/23
write: jsmith/pts/25 is not logged in
I have verified the "tech_send" file and it has the correct format for them:
jsmith pts/25
Would appreciate any insight on why this is happening.
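One thing worth checking (an assumption based only on the snippets shown, not a verified diagnosis): Techs=$(cat ./tech_send) followed by for d in $Techs word-splits each line, so write is first called with only jsmith (letting write pick a pts on its own, hence "writing to pts/23") and then with pts/25 alone. A sketch that keeps the username and tty together by reading tech_send line by line:
# Sketch: read user and tty as two fields so write always gets both arguments
while read -r user tty; do
    cat Down.log | write "$user" "$tty"
done < ./tech_send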

How to compare the last words of the lines in a file

I'm using a Raspberry Pi as a backup server. I use cron to run each backup job nightly and log the output to a file specific to each job. So each morning I have a bunch of log files (job1.log .. jobN.log). The log files are overwritten each time the job runs. I have another cron job (that runs after all the backup jobs) that sends me an email showing the last line of each log file. This all works as expected.
I'd like to be able to get a status in the subject of the email based on the last lines of the log files. When a backup job is successfully completed, the last line of the log file has some info followed by the word "completed" (which isn't included if the job fails). In my script that sends the email, I use "tail -1 >> summary.txt" for each log file, so summary.txt is a collection containing the last line of each logfile (and is included in the body of the email sent to me).
What I'd like to do is to check the last word of each line in summary.txt to see if all jobs completed successfully, and set the subject of the email appropriately (a simple "backup succeeded" or "backup failed" would be sufficient).
What would be the best way to do this? I know one possibility would be to use awk '{print $NF}' to get the last word of each line, but I'm not sure how to use that.
EDIT: As requested, here is the simplified code I'm currently using to send the "status" email to myself:
#!/bin/sh
tail -1 job1.log > summary.txt
tail -1 job2.log >> summary.txt
tail -1 job3.log >> summary.txt
mail -s "PI Backup Report" myemail@myhost < summary.txt
I know I could create an additional file with just the last lines by adding
awk '{print $NF}' summary.txt > results.txt
to the above script before the "mail" line, but then I still need to parse the results.txt file. How would I determine the status based on that file? Thanks again!
Measure total vs success lines in summary.txt.
xargs echo to trim excess whitespace from the result
grep with regex specifying the line should end in "completed"
wc -l for line count
Set the title using an if statement
TOTAL=$(wc -l < summary.txt | xargs echo)
SUCCESS=$(grep -e 'completed$' summary.txt | wc -l)
title=$(if [ $TOTAL = $SUCCESS ]; then echo 'All Succeeded'; else echo "$SUCCESS/$TOTAL succeeded"; fi)
echo $title # or pass into mail command as subject
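Plugged into the script from the question, the whole thing could look like this (a sketch; the subject strings are just illustrative):
#!/bin/sh
tail -1 job1.log > summary.txt
tail -1 job2.log >> summary.txt
tail -1 job3.log >> summary.txt
TOTAL=$(wc -l < summary.txt | xargs echo)
SUCCESS=$(grep -e 'completed$' summary.txt | wc -l | xargs echo)
if [ "$TOTAL" -eq "$SUCCESS" ]; then
    subject="PI Backup Report - backup succeeded"
else
    subject="PI Backup Report - backup failed ($SUCCESS/$TOTAL completed)"
fi
mail -s "$subject" myemail@myhost < summary.txt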

How to add a time stamp on each line of grep output?

I have a long-running process and I want to monitor its RAM usage. I can do this by watching top. However, I would like to be able to log out and have a record written, say every minute, to shared disk space instead.
My solution which works is:
nohup top -b -d 60 -p 10036|grep 10036 >> ramlog.txt &
But I would also like to know when each line was output. How can I modify the one-liner to add this information to each line?
I know about screen and tmux but I would like to get this simple one-liner working.
You could add a loop that reads each line from grep and prepends a date. Make sure to use grep --line-buffered to ensure each line is printed without delay.
nohup top -b -d 60 -p 10036 |
grep --line-buffered 10036 |
while read line; do echo "$(date): $line"; done >> ramlog.txt &
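If GNU awk is available, the same thing can be done without the extra grep and loop; this is offered only as an alternative sketch (the date format string is arbitrary):
nohup top -b -d 60 -p 10036 |
gawk '/10036/ { print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush() }' >> ramlog.txt &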

splitting a CSV and keeping the header without intermediate files

I am trying to split a dozen 100MB+ csv files into manageable smaller files for a curl post.
I have managed to do it, but with a lot of temporary files and I/O. It's taking an eternity.
I am hoping someone can show me a way to do this much more effectively, preferably with little to no disk I/O.
#!/bin/sh
for csv in $(ls *.csv); do
tail -n +2 $csv | split -a 5 -l - $RANDOM.split.
done
# choose a file randomly to fetch the header from
header=$(ls *.csv |sort -R |tail -1 | cut -d',' -f1)
mkdir split
for x in $(/usr/bin/find . -maxdepth 1 -type f -name '*.split.*'); do
echo Processing $x
cat header $x >> split/$x
rm -f $x
done
The above script may not entirely work. I basically got it working through a combination of these commands.
I decided to make the curl POST a separate step entirely, in case of upload failure; I didn't want to lose the data if it weren't all posted. But if, say, on an error from curl the data could be put into a redo folder, then that could work.
#!/bin/sh
# working on a progress indicator as a percentage. Never finished.
count=$(ls -1 | wc -l 2> /dev/null | cut -d' ' -f1)
for file in $(/usr/bin/find . -maxdepth 1 -type f); do
echo Processing $file
curl -XPOST --data-binary @$file -H "Content-Type: text/cms+csv" $1
done
Edit 1 -- why the RANDOM? Because split will produce the exact same file names when it splits the next file as it did for the first, so aa, ab, ac, ... would be produced for every file. I need to ensure every file produced by split is unique for the entire run.
Not quite sure what you want to accomplish, but it seems to me that you are processing line by line. Thus, if you serialize all your csv files and lines, you can do it without disk I/O. From your description I can't tell whether this script runs as many instances or just one (multiple processes or one process), so I can only try my best to mimic your script and reach results as similar as possible while removing the disk I/O problem. The code is provided below; please correct any script errors, as I have no way to run/debug/verify it:
for csv in $(ls *.csv | sort -R); do
    (
        # the first read skips the header line, matching your tail -n +2 command
        read line
        count=0
        while read line; do
            Processing $line                 # placeholder for your per-line processing command
            count=$(($count + 1))
            echo $csv.$count >> split/$count
        done
    ) < $csv
done
Your 'Processing' code should now work on a single line rather than a file. Perhaps piping the line in and having your Processing read from STDIN will do the trick:
echo $line | Processing
Your curl can work in a similar way, reading from STDIN by replacing @$file with @-; you can print what you want curl to send and then pipe it to curl, similar to this:
ProcessingAndPrint | curl -XPOST --data-binary @- -H "Content-Type: text/cms+csv" $1
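Putting those pieces together, a minimal sketch of the whole loop (assuming the endpoint URL is still passed as $1, as in your script, and that posting one line per request is acceptable; batching lines would need extra buffering):
#!/bin/sh
for csv in *.csv; do
    (
        read -r header                  # keep the header so it can be resent with each line
        while read -r line; do
            printf '%s\n%s\n' "$header" "$line" |
                curl -XPOST --data-binary @- -H "Content-Type: text/cms+csv" "$1"
        done
    ) < "$csv"
done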

Fastest way to print a single line in a file

I have to fetch one specific line out of a big file (1500000 lines), multiple times in a loop over multiple files. I was asking myself what would be the best option (in terms of performance).
There are many ways to do this; I mainly use these two:
cat ${file} | head -1
or
cat ${file} | sed -n '1p'
I could not find an answer to this: do they both fetch only the first line, or does one of the two (or both) open the whole file first and then fetch row 1?
Drop the useless use of cat and do:
$ sed -n '1{p;q}' file
This will quit the sed script after the line has been printed.
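For comparison only (not benchmarked below), awk can do the same early exit; this is just an equivalent sketch:
awk 'NR==1 { print; exit }' file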
Benchmarking script:
#!/bin/bash
TIMEFORMAT='%3R'
n=25
heading=('head -1 file' 'sed -n 1p file' "sed -n '1{p;q}' file" 'read line < file && echo $line')
# files up to a hundred million lines (if you're on a slow machine, decrease!!)
for (( j=1; j<=100000000; j=j*10 ))
do
    echo "Lines in file: $j"
    # create file containing j lines
    seq 1 $j > file
    # initial read of file
    cat file > /dev/null
    for comm in {0..3}
    do
        avg=0
        echo
        echo ${heading[$comm]}
        for (( i=1; i<=$n; i++ ))
        do
            case $comm in
                0)
                    t=$( { time head -1 file > /dev/null; } 2>&1);;
                1)
                    t=$( { time sed -n 1p file > /dev/null; } 2>&1);;
                2)
                    t=$( { time sed -n '1{p;q}' file > /dev/null; } 2>&1);;
                3)
                    t=$( { time read line < file && echo $line > /dev/null; } 2>&1);;
            esac
            avg=$avg+$t
        done
        echo "scale=3;($avg)/$n" | bc
    done
done
Just save as benchmark.sh and run bash benchmark.sh.
Results:
head -1 file
.001
sed -n 1p file
.048
sed -n '1{p;q}' file
.002
read line < file && echo $line
0
Results from a file with 1,000,000 lines.
So the times for sed -n 1p will grow linearly with the length of the file, but the timings for the other variations will be constant (and negligible), as they all quit after reading the first line.
Note: timings are different from original post due to being on a faster Linux box.
If you are really just getting the very first line and reading hundreds of files, then consider shell builtins instead of external commands; use read, which is a shell builtin in bash and ksh. This eliminates the overhead of process creation with awk, sed, head, etc.
The other issue is doing timed performance analysis on I/O. The first time you open and then read a file, the file data is probably not cached in memory. However, if you try a second command on the same file again, the data as well as the inode have been cached, so the timed results may be faster, pretty much regardless of the command you use. Plus, inodes can stay cached practically forever (they do on Solaris, for example, or at any rate for several days).
For example, linux caches everything and the kitchen sink, which is a good performance attribute. But it makes benchmarking problematic if you are not aware of the issue.
All of this caching effect "interference" is both OS and hardware dependent.
So - pick one file, read it with a command. Now it is cached. Run the same test command several dozen times, this is sampling the effect of the command and child process creation, not your I/O hardware.
This is sed vs. read for 10 iterations of getting the first line of the same file, after reading the file once:
sed: sed '1{p;q}' uopgenl20121216.lis
real 0m0.917s
user 0m0.258s
sys 0m0.492s
read: read foo < uopgenl20121216.lis ; export foo; echo "$foo"
real 0m0.017s
user 0m0.000s
sys 0m0.015s
This is clearly contrived, but it does show the difference between builtin performance and using an external command.
If you want to print only 1 line (say the 20th one) from a large file you could also do:
head -20 filename | tail -1
I did a "basic" test with bash and it seems to perform better than the sed -n '1{p;q} solution above.
Test takes a large file and prints a line from somewhere in the middle (at line 10000000), repeats 100 times, each time selecting the next line. So it selects line 10000000,10000001,10000002, ... and so on till 10000099
$wc -l english
36374448 english
$time for i in {0..99}; do j=$((i+10000000)); sed -n $j'{p;q}' english >/dev/null; done;
real 1m27.207s
user 1m20.712s
sys 0m6.284s
vs.
$time for i in {0..99}; do j=$((i+10000000)); head -$j english | tail -1 >/dev/null; done;
real 1m3.796s
user 0m59.356s
sys 0m32.376s
For printing a line out of multiple files
$wc -l english*
36374448 english
17797377 english.1024MB
3461885 english.200MB
57633710 total
$time for i in english*; do sed -n '10000000{p;q}' $i >/dev/null; done;
real 0m2.059s
user 0m1.904s
sys 0m0.144s
$time for i in english*; do head -10000000 $i | tail -1 >/dev/null; done;
real 0m1.535s
user 0m1.420s
sys 0m0.788s
How about avoiding pipes?
Both sed and head support the filename as an argument. In this way you avoid passing through cat. I didn't measure it, but head should be faster on larger files, as it stops the computation after N lines (whereas sed goes through all of them, even if it doesn't print them, unless you specify the quit option as suggested above).
Examples:
sed -n '1{p;q}' /path/to/file
head -n 1 /path/to/file
Again, I didn't test the efficiency.
I have done extensive testing, and found that, if you want every line of a file:
while IFS=$'\n' read LINE; do
echo "$LINE"
done < your_input.txt
is much, much faster than any other (Bash-based) method out there. All other methods (like sed) read the file each time, at least up to the matching line. If the file is 4 lines long, you will get: 1 -> 1,2 -> 1,2,3 -> 1,2,3,4 = 10 reads, whereas the while loop just maintains a position cursor (based on IFS) so it only does 4 reads in total.
On a file with ~15k lines, the difference is phenomenal: ~25-28 seconds (sed-based, extracting a specific line each time) versus ~0-1 seconds (while...read-based, reading through the file once).
The above example also shows how to better set IFS to a newline (with thanks to Peter from the comments below), and this will hopefully fix some of the other issues seen when using while ... read ... in Bash at times.
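To fetch just one specific line (which is what the original question asks for) with the same builtin approach, a counter plus an early break keeps it to a single partial pass; a sketch, with line 20 as an arbitrary example:
n=20                                # target line number (arbitrary example)
i=0
while IFS=$'\n' read -r LINE; do
    i=$((i + 1))
    if [ "$i" -eq "$n" ]; then
        echo "$LINE"
        break
    fi
done < your_input.txt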
For the sake of completeness, you can also use the basic Linux command cut:
cut -d $'\n' -f <linenumber> <filename>
