Unix script to show the last created file, when files appear at unknown times - shell

I prepared a script that checks for and displays the last created file.
file_to_search=$(find /var/lib/occas/domains/domain1/servers/traffic-1/logs/ -name "traffic-1.log*" 2>/dev/null | sort -n | tail -1 | cut -f2 -d" ")
grep "Event: Invoke :" $file_to_search | awk 'BEGIN { FS = ">" } ; { print $1 }' | sort | uniq -ic >> /home/appuser/scripts/Traffic/EventInvoke_pl-1_Istanbul.txt.backup.$(date +"%Y-%m-%d")
I have the following log files in this path: /var/lib/occas/domains/domain1/servers/traffic-1/logs/ but these files are created at irregular intervals. So if I put this script in crontab to run, for example, every 5 minutes, it sometimes reports the same file again, and that is not what I want. I need a script that shows the last created file, but only when a new file appears. Help me, please?
10:54 traffic-1.log00023
11:01 traffic-1.log00024
11:05 traffic-1.log00025
11:06 traffic-1.log00026
11:09 traffic-1.log00027
11:18 traffic-1.log00028
11:23 traffic-1.log00029
11:34 traffic-1.log00030
11:39 traffic-1.log00031
11:40 traffic-1.log00032

How much delay between the generation of the log entry and its display would you be willing to accept? In theory, you could start the cron job every minute, but I wouldn't do this.
Much easier would be a script which runs unattended and, in a loop, repeatedly checks the last line of the log file, and if it changes, does whatever needs to be done.
There are however two issues to observe:
The first is easy: You should at least sleep for 1 or 2 seconds after each "polling", otherwise your script will eat up a lot of system resources.
The second is a bit tricky: it could be that your script terminates for whatever reason, and if this happens, you need to have it restarted automatically. One way would be to set up a "watchdog": your script, in addition to checking the log file, touches a certain file every time it does the check (no matter whether a new event needs to be reported or not). The watchdog, which could be a cron job running every, say, 10 minutes, would verify that your script is still alive (i.e. that the touched file has been updated during the past couple of seconds), and if not, would start a new copy of your script. This means that you could lose a 10 minute time window, but since it is likely a very rare event that your script crashes, this will hopefully not be an issue.
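A minimal sketch of such a polling script, reusing the paths and the grep/awk pipeline from the question; the heartbeat file name and the 2-second sleep are placeholders of mine, not part of the original answer:
#!/usr/bin/bash
logdir=/var/lib/occas/domains/domain1/servers/traffic-1/logs
heartbeat=/home/appuser/scripts/Traffic/poller.heartbeat   # touched on every poll, checked by the watchdog cron job
last_seen=""
while true
do
    # the numeric suffixes (log00023, log00024, ...) sort correctly as plain strings
    newest=$(ls "$logdir"/traffic-1.log* 2>/dev/null | sort | tail -1)
    if [ -n "$newest" ] && [ "$newest" != "$last_seen" ]; then
        outfile=/home/appuser/scripts/Traffic/EventInvoke_pl-1_Istanbul.txt.backup.$(date +"%Y-%m-%d")
        grep "Event: Invoke :" "$newest" | awk 'BEGIN { FS = ">" } ; { print $1 }' | sort | uniq -ic >> "$outfile"
        last_seen=$newest
    fi
    touch "$heartbeat"   # lets the watchdog see that this script is still alive
    sleep 2              # polling pause so the loop does not hog the CPU
done
The watchdog itself can then be a cron job that compares the age of the heartbeat file against a threshold and restarts the script if it is too old.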

Related

How do I speed up my for loop that starts a JVM for each entry in the list and closes it?

I have code that reads a few entries at run time and writes them to a file. In a for loop, I read the contents of the file and, for each line, I run my code; this code starts a JVM and closes it after the process is done. I have scheduled the job every 5 minutes, but the job takes more than 5 minutes due to the slow JVM calls. I would like to know if there is any way I can execute this in parallel.
for i in `cat test.txt`
do
echo "test"
kafka-run-class kafka.admin.ConsumerGroupCommand ### This is my java process which takes time.
done
My test.txt contains 100 entries.
The obvious way is with GNU Parallel. So, if your file test.txt looks like this:
line 1
line 2
line 3
line 4
You can do:
parallel -k echo < test.txt
line 1
line 2
line 3
line 4
The -k just keeps the output in order.
You can also use --dry-run to see what it would do without actually doing anything.
You can also use say -j 12 if you want to run 12 jobs at a time, since, as I have written it, it will just run one job per CPU core in parallel.
TL;DR
I am suggesting you try something like:
parallel kafka-run-class kafka.admin.ConsumerGroupCommand < test.txt
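For example, to preview the commands first and then run at most 12 at a time, combining the flags mentioned above (the input file name is the one from the question):
parallel --dry-run kafka-run-class kafka.admin.ConsumerGroupCommand < test.txt   # only print what would be run
parallel -j 12 kafka-run-class kafka.admin.ConsumerGroupCommand < test.txt       # actually run, 12 jobs at a time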

Solaris logadm log rotation

I would like to understand how logadm works.
So, by looking at the online materials, I wrote a small script which redirects a date into a log file and sleeps for 1 second. This runs in an infinite loop.
#!/usr/bin/bash
while true
do
echo `date` >>/var/tmp/temp.log
sleep 1
done
After this I had executed below commands:
logadm -w /var/tmp/temp.log -s 100b
logadm -V
My intention with the above commands is that the log (/var/tmp/temp.log) should be rotated every 100 bytes.
But after setting this up, when I run the script in the background, I see that the log file is not rotated.
# ls -lrth /var/tmp/temp.log*
-rw-r--r-- 1 root root 7.2K Jun 15 08:56 /var/tmp/temp.log
#
As I understand it, you have to call it to do the work, e.g. from crontab or manually like logadm -c /var/tmp/temp.log (usually placed in crontab).
Side note: you could simply write date >> /var/tmp/temp.log without the echo.
This is not how I would normally do this, plus I think you may have misunderstood the -w option.
The -w option updates /etc/logadm.conf with the parameters on the command line, and logadm is then run at 10 minutes past 3am (on the machine I checked).
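On a stock Solaris system the root crontab typically carries an entry along these lines (the exact time varies by machine, as noted above):
10 3 * * * /usr/sbin/logadm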
I took your script and ran it, then ran:
"logadm -s 100b /var/tmp/temp.log"
and it worked fine. Give it a try! :-)

log of parallel computations, how do I prevent interleaved write? lockfile or flock?

I have seen it discussed several times how to keep scripts from running concurrently, but I have not seen the topic of concurrent writes.
I am doing some parallel computation with xargs launching the commands for the actual computations. At the end of each computation I want that process to access a file and put its results in there. I am running into trouble because the writes to the log file happen in such a way that several processes can access the log file at the same time, resulting in interleaved entries: one line from one run, another line from another run that finished at about the same time (which is likely to happen due to the parallel nature of the run with xargs).
So in practice, let's say that using xargs I run in parallel several instances of a script that reads:
#!/bin/bash
#### do something that takes some time
#### define content of the log
folder="<folder>"$PWD"</folder>\n"
datetag="<enddate>"`date`"</enddate>\n"
#### store log in XML ####
echo -e "<myrun>\n""$folder""$datetag""</myrun>" >> $outputfie
At present I get an output file with the logs of different runs interleaved, like this:
<myrun>
<myrun>
<folder>./generations/test/run1</folder>
<folder>./generations/test/run2</folder>
<enddate>Sun Jul 6 11:17:58 CEST 2014</enddate>
</myrun>
<enddate>Sun Jul 6 11:17:58 CEST 2014</enddate>
</myrun>
Is there a way to give "exclusive access" to one instance of the script at a time, so that each script is writing its log without interference with the others?
I have seen flock and lockfile, but I am not sure what fits best to my case and I am seeking for advise/suggestion.
Thanks,
Roberto
I will use traceroute as an example as it prints output slowly, but any other command would also work. Compare:
(echo 8.8.8.8;echo 8.8.4.4) | xargs -P6 -n1 traceroute > traceroute.xarg
to:
(echo 8.8.8.8;echo 8.8.4.4) | parallel traceroute > traceroute.para
Make sure you install GNU Parallel and not another parallel, and that /etc/parallel/config is empty. Comparing traceroute.xarg with traceroute.para should show the difference: GNU Parallel buffers the output of each job and only prints it when that job finishes, so the output of different jobs does not get interleaved.
I think this in the end does the job: lockfile keeps retrying until this instance of the script can lock the log file for itself; then the script writes its block and removes the lock.
The other instances of the script that are running in parallel and might be trying to write will wait on the lock until they can take it for themselves.
lockfile -1 log.lock                      # retries every second until log.lock can be created, i.e. until we hold the lock
echo -e "accessing file at "`date`
echo -e "$logblock" >> log
rm -f log.lock                            # release the lock so the next instance can write
Does anybody see any drawbacks in this type of solution?
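For comparison, here is a minimal sketch of the same idea using flock from util-linux instead of lockfile; the lock file name and the file descriptor number 200 are arbitrary choices of mine:
(
    flock -x 200                                        # block until this subshell holds an exclusive lock on fd 200
    echo -e "<myrun>\n""$folder""$datetag""</myrun>" >> log
) 200>log.lock                                          # fd 200 is backed by log.lock; the lock is released when the subshell exits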

Use shell output for error handling for condor

I need to submit multiple simulations to condor (multi-client execution grid) using the shell, and since this may take a while, I decided to write a shell script to do it for me. I am very new to shell scripting and this is the result of one day's work:
for H in {0..50}
do
for S in {0..10}
do
./p32 -data ../data.txt -out ../result -position $S -group $H
echo "> Ready to submit"
condor_submit profile.sub
echo "> Waiting 15 minutes for group $H Pos $S"
for W in {1..15}
do
echo "Staring minute $W"
sleep 60
done
done
echo "Deleting data_3 to free up space"
mkdir /tmp/data_3
if [ "$H" -lt 10 ]
then
tar cfvz /tmp/data_3/group_000$H.tar.gz ../result/data_3/group_000$H
rm -r ../result/data_3/group_000$H
else
tar cfvz /tmp/data_3/group_00$H.tar.gz ../result/data_3/group_00$H
rm -r ../result/data_3/group_00$H
fi
done
This script runs through 0..50 simulations and submits 0..10 different parameters to a program that generates a condor submission profile. Then I submit this profile and let it execute for 15 minutes (with a call being made every minute to ensure the SSH pipe doesn't break). Once the 15 minutes are up I compress the output to a volume with more space and erase the original files.
The reason I implemented this is that our condor system can only handle up to 10,000 submissions at once, and one submission (condor_submit profile.sub) executes 7000+ simulations.
Now my problem is with this line. When I checked this morning I (luckily) spotted that calling condor_submit profile.sub may cause an error if the network is too busy. The error is:
ERROR: Failed to connect to local queue manager
CEDAR:6001:Failed to connect to <IP_NUMBER:PORT_NUMBER>
This means that from time to time a whole iteration gets lost! How can I work around this? The only way I see is to use the shell to read in the last line(s) of terminal output and check whether they match the expected response, i.e.:
7392 job(s) submitted to cluster CLUSTER_NUMBER.
But how would I read in the last line and go about checking for errors?
Any help is very much needed and appreciated.
Does condor_submit give a non-zero exit code when it fails? If so, you can try calling it like this:
while ! condor_submit profile.sub; do
sleep 5
done
which will cause the current profile to be submitted every 5 seconds until it succeeds.
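If you would rather give up after a fixed number of attempts instead of retrying forever, a variant along these lines could work (the limit of 10 attempts and the 5-second delay are arbitrary choices, not from the original answer):
attempts=0
until condor_submit profile.sub
do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 10 ]; then
        echo "condor_submit still failing after $attempts attempts, giving up" >&2
        break
    fi
    sleep 5
done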

How to get the time of a process created remotely via ssh?

I am currently writing a script whose purpose is to kill a process whose running time exceeds a threshold. The "killer" is run on a bunch of hosts and the jobs are sent by a server to each host. My idea was to use 'ps' to get the running time of the job, but all it prints is
17433 17433 ? 00:00:00 python
no matter how long I wait.
I tried to find a simplified example to avoid posting all the code I wrote. Let's call S the server and H the host.
If I do the following steps:
1) ssh login#H from the server
2) python myscript.py (now logged on the host)
3) ps -U login (still on the host)
I get the same result as the one above, 00:00:00, as far as the time is concerned.
How can I get the real execution time? When I do everything locally on my machine, it works fine.
I thank you very much for your help.
V.
Alternatively, you can look at the creation of the pid file in /var/run, assuming your process created one and use the find command to see if it exceeds a certain threshold:
find /var/run/ -name "myProcess.pid" -mtime +1d
This will return the filename if it meets the criteria (last modified at least 1 day ago). You probably also want to check that the process is actually running, as it may have crashed and left the pid file behind.
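A rough sketch of how this could be turned into the actual kill, assuming the pid file simply contains the process id (file name and age threshold as above):
stale=$(find /var/run/ -name "myProcess.pid" -mtime +1d)   # on GNU find use -mtime +1 instead of +1d
if [ -n "$stale" ]; then
    pid=$(cat "$stale")
    kill -0 "$pid" 2>/dev/null && kill "$pid"              # only send the signal if the process still exists
fi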
If you want how long the process has been alive, you could try
stat -c %Y /proc/`pgrep python`
..which will give it back to you in epoch time. If alternately you want the kill in one go, I suggest using the find mentioned above (but perhaps point it at /proc)
Try this out:
ps kstart_time -ef | grep myProc.py | awk '{print $5}'
This will show the start date/time of the process myProc.py:
[ 19:27 root#host ~ ]# ps kstart_time -ef | grep "httpd\|STIME" | awk '{print $5}'
STIME
19:25
Another option is etime.
etime is the elapsed time since the process was started, in the form dd-hh:mm:ss. dd is the number of days; hh, the number of hours; mm, the number of minutes; ss, the number of seconds.
[ 19:47 root#host ~ ]# ps -eo cmd,etime
CMD ELAPSED
/bin/sh 2-16:04:45
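To get just the elapsed time of a single process, something along these lines should work (the pgrep pattern refers to the script name from the question and is an assumption):
ps -o etime= -p "$(pgrep -f myscript.py)"
The trailing = after etime suppresses the column header.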
And yet another way to do this:
Get the process pid and read off the timestamp in the corresponding subdirectory in /proc.
First, get the process pid using the ps command (ps -ef or ps aux)
Then, use the ls command to display the creation timestamp of the directory.
[ 19:57 root#host ~ ]# ls -ld /proc/1218
dr-xr-xr-x 5 jon jon 0 Sep 20 16:14 /proc/1218
You can tell from the timestamp that the process 1218 began executing on Sept 20, 16:14.
