I have log files that are broken down into between 1 and 4 "Tasks". In each "Task" there are sections for "WU Name" and "estimated CPU time remaining". Ultimately, I want the bash script output to look like this 3-Task example:
Task 1 Mini_Protein_binds_COVID-19_boinc_ 0d:7h:44m:28s
Task 2 shapeshift_pair6_msd4X_4_f_e0_161_ 0d:4h:14m:22s
Task 3 rep730_0078_symC_reordered_0002_pr 1d:1h:38m:41s
So far: I can count the Tasks in the log. I can isolate the x characters I want from the "WU Name". I can convert the "estimated CPU time remaining" from seconds to days:hours:minutes:seconds. And I can output all of that into 'pretty' columns. The problem is that I can only process one Task, using:
# Initialize counter
counter=1
# Count how many iterations
cnt_wu=`grep -c "WU name:" /mnt/work/sec-conv/bnc-sample3.txt`
# Iterate the loop for cnt-wu times
while [ $counter -le ${cnt_wu} ]
do
core_cnt=$counter
wu=`grep -Po 'WU name: \K.*' /mnt/work/sec-conv/bnc-sample3.txt | cut -c1-34`
sec=`grep -Po 'estimated CPU time remaining: \K.*' /mnt/work/sec-conv/bnc-sample3.txt | cut -f1 -d"."`
dhms=`printf '%dd:%dh:%dm:%ds\n' $(($sec/86400)) $(($sec%86400/3600)) $(($sec%3600/60)) $(($sec%60))`
echo "Task ${core_cnt}" $'\t' $wu $'\t' $dhms | column -ts $'\t'
counter=$((counter + 1))
done
Note: /mnt/work/sec-conv/bnc-sample3.txt is a static one-Task sample only used for this script's development.
What I can't figure out is the next step: processing any number of Tasks. I can't figure out how to leverage the while/counter combination properly, or how to increment through the occurrences of Tasks.
Adding bnc-sample.txt (contains 3 Tasks):
1) -----------
name: Rosetta@home
master URL: https://boinc.bakerlab.org/rosetta/
user_name: XXXXXXX
team_name:
resource share: 100.000000
user_total_credit: 10266.993660
user_expavg_credit: 512.420495
host_total_credit: 10266.993660
host_expavg_credit: 512.603669
nrpc_failures: 0
master_fetch_failures: 0
master fetch pending: no
scheduler RPC pending: no
trickle upload pending: no
attached via Account Manager: no
ended: no
suspended via GUI: no
don't request more work: no
disk usage: 0.000000
last RPC: Wed Jun 10 15:55:29 2020
project files downloaded: 0.000000
GUI URL:
name: Message boards
description: Correspond with other users on the Rosetta@home message boards
URL: https://boinc.bakerlab.org/rosetta/forum_index.php
GUI URL:
name: Your account
description: View your account information
URL: https://boinc.bakerlab.org/rosetta/home.php
GUI URL:
name: Your tasks
description: View the last week or so of computational work
URL: https://boinc.bakerlab.org/rosetta/results.php?userid=XXXXXXX
jobs succeeded: 117
jobs failed: 0
elapsed time: 2892439.609931
cross-project ID: 3538b98e5f16a958a6bdd2XXXXXXXXX
======== Tasks ========
1) -----------
name: shapeshift_pair6_msd4X_4_f_e0_161_X_0001_0001_fragments_abinitio_SAVE_ALL_OUT_946179_730_0
WU name: shapeshift_pair6_msd4X_4_f_e0_161_X_0001_0001_fragments_abinitio_SAVE_ALL_OUT_946179_730
project URL: https://boinc.bakerlab.org/rosetta/
received: Mon Jun 8 09:58:08 2020
report deadline: Thu Jun 11 09:58:08 2020
ready to report: no
state: downloaded
scheduler state: scheduled
active_task_state: EXECUTING
app version num: 420
resources: 1 CPU
estimated CPU time remaining: 26882.771040
slot: 1
PID: 28434
CPU time at last checkpoint: 3925.896000
current CPU time: 4314.761000
fraction done: 0.066570
swap size: 431 MB
working set size: 310 MB
2) -----------
name: rep730_0078_symC_reordered_0002_propagated_0001_0001_0001_A_v9_fold_SAVE_ALL_OUT_946618_54_0
WU name: rep730_0078_symC_reordered_0002_propagated_0001_0001_0001_A_v9_fold_SAVE_ALL_OUT_946618_54
project URL: https://boinc.bakerlab.org/rosetta/
received: Mon Jun 8 09:58:08 2020
report deadline: Thu Jun 11 09:58:08 2020
ready to report: no
state: downloaded
scheduler state: scheduled
active_task_state: EXECUTING
app version num: 420
resources: 1 CPU
estimated CPU time remaining: 26412.937920
slot: 2
PID: 28804
CPU time at last checkpoint: 3829.626000
current CPU time: 3879.975000
fraction done: 0.082884
swap size: 628 MB
working set size: 513 MB
3) -----------
name: Mini_Protein_binds_COVID-19_boinc_site3_2_SAVE_ALL_OUT_IGNORE_THE_REST_0aw6cb3u_944116_2_0
WU name: Mini_Protein_binds_COVID-19_boinc_site3_2_SAVE_ALL_OUT_IGNORE_THE_REST_0aw6cb3u_944116_2
project URL: https://boinc.bakerlab.org/rosetta/
received: Mon Jun 8 09:58:47 2020
report deadline: Thu Jun 11 09:58:46 2020
ready to report: no
state: downloaded
scheduler state: scheduled
active_task_state: EXECUTING
app version num: 420
resources: 1 CPU
estimated CPU time remaining: 27868.559616
slot: 0
PID: 30988
CPU time at last checkpoint: 1265.356000
current CPU time: 1327.603000
fraction done: 0.032342
swap size: 792 MB
working set size: 668 MB
Again, I appreciate any guidance!
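One possible approach (a minimal, untested sketch, assuming bash 4+ for mapfile and GNU grep for -P): read both fields into parallel arrays, then loop over the indices, so the Task counter falls out of the array index.
#!/bin/bash
file=/mnt/work/sec-conv/bnc-sample.txt
# One element per Task, in the order the Tasks appear in the log.
mapfile -t wus  < <(grep -Po 'WU name: \K.*' "$file" | cut -c1-34)
mapfile -t secs < <(grep -Po 'estimated CPU time remaining: \K[0-9]+' "$file")
for i in "${!wus[@]}"; do
    sec=${secs[i]}
    dhms=$(printf '%dd:%dh:%dm:%ds' $((sec/86400)) $((sec%86400/3600)) \
        $((sec%3600/60)) $((sec%60)))
    printf 'Task %d\t%s\t%s\n' $((i+1)) "${wus[i]}" "$dhms"
done | column -ts $'\t'
Piping the whole loop into column (rather than one column call per line) also keeps the columns aligned across all rows.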
When I start a Linux server with cloud-init, I have a few scripts in /etc/cloud/cloud.cfg.d/, and they run in reverse alphabetical order:
# ll /etc/cloud/cloud.cfg.d/
total 28
-rw-r--r-- 1 root root 173 Dec 10 12:38 00-cloudinit-lifecycle-hook.cfg
-rw-r--r-- 1 root root 2120 Jun 1 2021 05_logging.cfg
-rw-r--r-- 1 root root 590 Oct 26 17:55 10_aws_yumvars.cfg
-rw-r--r-- 1 root root 29 Dec 1 18:22 20_amazonlinux_repo_https.cfg
-rw-r--r-- 1 root root 586 Dec 10 12:38 50-cloudinit-tomcat.cfg
-rw-r--r-- 1 root root 585 Dec 10 12:40 60-cloudinit-newrelic.cfg
The last to execute is 00-cloudinit-lifecycle-hook.cfg, in which I complete the lifecycle for the Auto Scaling Group with a CONTINUE signal. The ASG fails if it doesn't receive this signal within a given timeout.
The issue is that even if there's an error in 50-cloudinit-tomcat.cfg, cloud-init still runs 00-cloudinit-lifecycle-hook.cfg instead of stopping.
How can I ensure cloud-init stops and never reaches the last script? I would like the ASG to never receive the CONTINUE signal if there's any error.
Here are the files:
EC2 instance user-data:
#cloud-config
bootcmd:
- [cloud-init-per, once, "app-volume", mkfs, -t, "ext4", "/dev/nvme1n1"]
mounts:
- ["/dev/nvme1n1", "/app-volume", "ext4", "defaults,nofail", "0", "0"]
merge_how:
- name: list
settings: [append]
- name: dict
settings: [no_replace, recurse_list]
50-cloudinit-tomcat.cfg
#cloud-config
merge_how:
- name: list
settings: [append]
- name: dict
settings: [no_replace, recurse_list]
runcmd:
- "#!/bin/bash -e"
- set +x
- echo ' '
- echo '# ===================================='
- echo '# Tomcat Cloud Init '
- echo '# /etc/cloud/cloud.cfg.d/'
- echo '# ===================================='
- echo ' '
- echo '#===================================='
- echo '# Run Ansible'
- echo '#===================================='
- echo ' '
- set -x
- ansible-playbook /opt/init-config/tomcat/tomcat-config.yaml
When I run ansible-playbook /opt/init-config/tomcat/tomcat-config.yaml directly on the instance I get an error, and I know it returns 2:
ansible-playbook /opt/init-config/tomcat/tomcat-config.yaml #shows errors
echo $? # shows "2"
00-cloudinit-lifecycle-hook.cfg
#cloud-config
merge_how:
- name: list
settings: [append]
- name: dict
settings: [no_replace, recurse_list]
runcmd:
- "/opt/lifecycles/lifecycle-hook-continue.sh"
An alternative I can think of is to send an ABANDON signal instead of CONTINUE as soon as there's an error in one of the cloud-init configs. But I can't find anything in the documentation on how to detect whether an error occurred.
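For illustration only (not from the original post, and HOOK_NAME/ASG_NAME are placeholders): one way to get ABANDON-on-error semantics is to replace the two runcmd steps with a single wrapper script, letting the playbook's exit status decide which lifecycle result the ASG receives.
#!/bin/bash
# Hypothetical wrapper -- HOOK_NAME and ASG_NAME are placeholders.
set -u
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
if ! ansible-playbook /opt/init-config/tomcat/tomcat-config.yaml; then
    # Playbook failed: tell the ASG to abandon the launch instead of continuing.
    aws autoscaling complete-lifecycle-action \
        --lifecycle-hook-name "HOOK_NAME" \
        --auto-scaling-group-name "ASG_NAME" \
        --lifecycle-action-result ABANDON \
        --instance-id "$INSTANCE_ID"
    exit 2
fi
# Playbook succeeded: send CONTINUE as before.
/opt/lifecycles/lifecycle-hook-continue.sh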
I have a log file which is generated from a backup restore process. I want to know whether there is any slowness compared to previous runs, so I need the time difference for each individual restore module. Here is my log file:
2019-10-26 08:06:56 6503 6501 Begin restore logical log 2290281 (Storage Manager copy ID: 1 1571839465).
2019-10-26 08:14:05 6503 6501 Completed restore logical log 2290281.
2019-10-26 08:14:09 6503 6501 Begin restore logical log 2290282 (Storage Manager copy ID: 1 1571839691).
2019-10-26 08:21:09 6503 6501 Completed restore logical log 2290282.
2019-10-26 08:21:13 6503 6501 Begin restore logical log 2290283 (Storage Manager copy ID: 1 1571839892).
My expectation is simply to show
the difference between (2019-10-26 08:14:05) and (2019-10-26 08:06:56)
the difference between (2019-10-26 08:21:09) and (2019-10-26 08:14:09)
so my expected output is:
7 minutes and 09 seconds
7 minutes and 00 seconds
You can use bash and date for the simple date math:
PREV=
DELTA=0
while read dt tm data ; do
    CURR=$(date '+%s' -d "$dt $tm")
    [ "$PREV" ] && DELTA=$((CURR-PREV))
    PREV=$CURR
    echo "$DELTA sec $dt $tm $data"
done < logfile
Additional formatting can be added to the echo:
echo "$((DELTA/60)) min and $((DELTA%60)) sec $dt $tm $data" ;
Output:
0 min and 0 sec 2019-10-26 08:06:56 6503 6501 Begin restore logical log 2290281 (Storage Manager copy ID: 1 1571839465).
7 min and 9 sec 2019-10-26 08:14:05 6503 6501 Completed restore logical log 2290281.
0 min and 4 sec 2019-10-26 08:14:09 6503 6501 Begin restore logical log 2290282 (Storage Manager copy ID: 1 1571839691).
7 min and 0 sec 2019-10-26 08:21:09 6503 6501 Completed restore logical log 2290282.
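If only the Begin-to-Completed gap per restore is wanted, in the exact wording the question asks for, a possible refinement (an untested sketch, assuming bash 4+ and GNU date) is to key the start times on the log number:
#!/bin/bash
# Pair each "Begin restore logical log N" with its "Completed" line.
declare -A START
while read -r dt tm _ _ verb _ _ _ lognum _ ; do
    ts=$(date '+%s' -d "$dt $tm")
    num=${lognum%.}                      # "Completed" lines end with a period
    case $verb in
        Begin)     START[$num]=$ts ;;
        Completed) d=$((ts - START[$num]))
                   printf '%d minutes and %02d seconds (log %s)\n' \
                       $((d/60)) $((d%60)) "$num" ;;
    esac
done < logfile
For the sample above this should print "7 minutes and 09 seconds (log 2290281)" and "7 minutes and 00 seconds (log 2290282)".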
I am using a simple salt state to send (file.managed) and execute (cmd.run) a shell script on a minion/target. No matter what exit or return value the shell script sends, the salt master is interpreting the result as successful.
I tried using cmd.script, but keep getting a permission denied error on the temp version of the file under /tmp. The filesystem is not mounted with noexec, so we can't figure out why it won't work.
For cmd.run, stdout in the job output shows the failed return code and message but Salt still says Success. Running the script locally on the minion reports the return/exit code as expected.
I tried adding stateful: True into the cmd.run block and formatted the key value pairs at the end of the shell script as demonstrated in the docs.
Running against a 2-minion target (1 fails, 1 succeeds), both report Result as True but correctly populate Comment with my key/value pairs.
I've tried YES/NO, TRUE/FALSE, 0/1 - nothing works.
The end of my shell script, formatted as shown in the docs:
echo Return_Code=${STATUS}
# exit ${STATUS}
if [[ ${STATUS} -ne 0 ]]; then
    echo ""
    echo "changed=False comment='Failed'"
else
    echo ""
    echo "changed=True comment='Success'"
fi
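One observation from outside the post (an assumption, not a confirmed fix): cmd.run normally derives Result from the command's return code, and the exit ${STATUS} above is commented out, so the script always exits 0 whatever ${STATUS} holds. A sketch of the same ending with the exit restored:
echo Return_Code=${STATUS}
if [[ ${STATUS} -ne 0 ]]; then
    echo ""
    echo "changed=False comment='Failed'"
else
    echo ""
    echo "changed=True comment='Success'"
fi
exit ${STATUS}    # a non-zero code here should surface as Result: False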
The SLS block:
stop_oracle:
  cmd.run:
    - name: {{scriptDir}}/{{scriptName}}{{scriptArg}}
    - stateful: True
    - failhard: True
SLS output from the successful minion:
----------
ID: stop_oracle
Function: cmd.run
Name: /u01/orastage/oraclepsu/scripts/oracle_ss_wrapper.ksh stop
Result: True
Comment: Success
Started: 14:37:44.519131
Duration: 18930.344 ms
Changes:
----------
changed:
True
pid:
26195
retcode:
0
stderr:
stty: standard input: Inappropriate ioctl for device
stdout:
Script running under ROOT
Mon Jul 1 14:38:03 EDT 2019 : Successful
Return_Code=0
SLS output from the failed minion:
----------
ID: stop_oracle
Function: cmd.run
Name: /u01/orastage/oraclepsu/scripts/oracle_ss_wrapper.ksh stop
Result: True
Comment: Failed
Started: 14:07:14.153940
Duration: 38116.134 ms
Changes:
Output from the shell script run locally on the failing target:
[oracle@a9tvdb102]:/home/oracle:>>
/u01/orastage/oraclepsu/scripts/oracle_ss_wrapper.ksh stop
Mon Jul 1 15:29:18 EDT 2019 : There are errors in the process
Return_Code=1
changed=False comment='Failed'
Output from the shell script run locally on the successful target:
[ /home/oracle ]
oracle@r9tvdo1004.giolab.local >
/u01/orastage/oraclepsu/scripts/oracle_ss_wrapper.ksh stop
Mon Jul 1 16:03:18 EDT 2019 : Successful
Return_Code=0
changed=True comment='Success'
When I launch these batch commands to create and merge deltas:
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf idx_product_delta --rotate
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf --merge idx_product_main idx_product_delta --rotate
In searchd.log I found this error, and the deltas are not merged into the main index:
[Fri Sep 25 15:34:42.549 2015] [ 2312] WARNING: rotating index 'idx_product_main': cur to old rename failed: rename D:\Sphinx\project\data\product.spa to D:\Sphinx\project\data\product.old.spa failed: Broken pipe
Console output is:
using config file 'D:\Sphinx\project\product.conf'...
merging index 'idx_product_delta' into index 'idx_product_main'...
read 7.2 of 7.2 MB, 100.0% done
merged 11.5 Kwords
merged in 0.127 sec
ERROR: index 'idx_product_main': failed to delete 'D:\Sphinx\project\data\product.new.spa': Permission denied
total 671 reads, 0.006 sec, 15.3 kb/call avg, 0.0 msec/call avg
total 36 writes, 0.004 sec, 277.8 kb/call avg, 0.1 msec/call avg
My product.conf is:
source src_product_main
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =
sql_db = database
sql_port = 3306 # optional, default is 3306
sql_query_pre = REPLACE INTO sphinx_index_meta(index_name, last_update) \
VALUES('idx_prodotti_main', current_timestamp())
sql_query_range = SELECT MIN(id),MAX(id) \
FROM product \
WHERE deleted = 0 AND visible= 1
sql_range_step = 1000
sql_query = SELECT id, text, last_update \
FROM product \
WHERE id>=$start AND id<=$end AND deleted = 0 AND visible = 1
sql_attr_timestamp = last_update
}
index idx_product_main
{
source = src_product_main
path = D:\Sphinx\project\data\product
ondisk_attrs = 1
stopwords = D:\Sphinx\project\stopwords.txt
min_word_len = 2
min_prefix_len = 0
min_infix_len = 3
ngram_len = 1
}
source src_product_delta : src_product_main
{
sql_query_range = SELECT MIN(id),MAX(id) \
FROM product \
WHERE deleted = 0 AND visible= 1
sql_range_step = 1000
sql_query = SELECT id, text, last_update \
FROM product \
WHERE id>=$start AND id<=$end AND deleted = 0 AND visible = 1
}
index idx_product_delta : idx_product_main
{
source = src_product_delta
path = D:\Sphinx\project\delta\product
ondisk_attrs = 1
stopwords = D:\Sphinx\project\stopwords.txt
min_word_len = 2
min_prefix_len = 0
min_infix_len = 3
ngram_len = 1
}
indexer
{
mem_limit = 128M
max_iosize = 1M
}
searchd
{
listen = 9312
listen = 9306:mysql41
log = D:\Sphinx\project\log\searchd.log
query_log = D:\Sphinx\project\log\query.log
read_timeout = 5
client_timeout = 300
max_children = 30
pid_file = D:\Sphinx\project\log\searchd.pid
seamless_rotate = 1
preopen_indexes = 0
unlink_old = 1
workers = threads # for RT to work
binlog_path = D:\Sphinx\project\data
}
I have also tried on Windows 7 and Windows 8, with both stable 2.2.10 and beta 2.3.1-id64-beta (r4926), with the same error.
indexer running with a cron (windows scheduler) as SYSTEM user
searchd service running as SYSTEM user
D:\Sphinx\project\data\ folder permission has full control for SYSTEM
How can I solve this issue?
UPDATE for Eugene Soldatov's answer
I have also tried (the first command without --rotate):
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf idx_product_delta
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf --merge idx_product_main idx_product_delta --rotate
but in the console output I found this error:
Sphinx 2.2.10-id64-release (2c212e0)
Copyright (c) 2001-2015, Andrew Aksyonoff
Copyright (c) 2008-2015, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file 'D:\Sphinx\project\product.conf'...
indexing index 'idx_prodotti_delta'...
FATAL: failed to lock D:\Sphinx\project\delta\prodotti.spl: No error, will not index. Try --rotate option.
Sphinx 2.2.10-id64-release (2c212e0)
Copyright (c) 2001-2015, Andrew Aksyonoff
Copyright (c) 2008-2015, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file 'D:\Sphinx\project\product.conf'...
merging index 'idx_prodotti_delta' into index 'idx_prodotti_main'...
read 7.2 of 7.2 MB, 100.0% done
merged 11.5 Kwords
merged in 0.214 sec
ERROR: index 'idx_prodotti_main': failed to delete 'D:\Sphinx\project\data\prodotti.new.spa': Permission denied
total 20136 reads, 0.071 sec, 30.9 kb/call avg, 0.0 msec/call avg
total 36 writes, 0.012 sec, 283.3 kb/call avg, 0.3 msec/call avg
In searchd.log I found this error:
[Wed Sep 30 09:09:29.371 2015] [ 4244] rotating index 'idx_prodotti_main': started
[Wed Sep 30 09:09:29.381 2015] [ 4244] WARNING: rotating index 'idx_prodotti_main': cur to old rename failed: rename D:\Sphinx\project\data\prodotti.spa to D:\Sphinx\project\data\prodotti.old.spa failed: Broken pipe
[Wed Sep 30 09:09:29.381 2015] [ 4244] rotating index: all indexes done
UPDATE 2
I also tried inserting a sleep between the two commands:
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf idx_product_delta --rotate
timeout /t 60
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf --merge idx_product_main idx_product_delta --rotate
Console output:
ERROR: index 'idx_prodotti_main': failed to delete 'D:\Sphinx\project\data\prodotti.new.spa': Permission denied
total 20137 reads, 0.072 sec, 30.9 kb/c
UPDATE 3: Issue solved
Issue solved by the Sphinx guys here:
http://sphinxsearch.com/bugs/view.php?id=2335
The reason for such behavior is that the --rotate command is asynchronous, so when you run the second command:
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf --merge idx_product_main idx_product_delta --rotate
the first may still be working with the index idx_product_delta:
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf idx_product_delta --rotate
and so that index is still locked.
If possible, remove the --rotate option from the first command.
UPDATE:
It seems that you do need the --rotate option in the first command, so you could measure the average time it takes to finish and insert a sleep between the two commands. For example, 30 seconds:
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf idx_product_delta --rotate
timeout /t 30
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf --merge idx_product_main idx_product_delta --rotate
I set up a cron job to call myscript.sh every 5 minutes, which then calls a PHP file after a random delay of between 30 seconds and 3 minutes, and I don't get why the average interval is 05:09.
I want to call cron2_.php every 4-8 minutes, but I haven't managed to achieve that.
Thank you.
Cron Job: */5 * * * * myscript.sh
Shell script:
#!/bin/sh
# Grab a random value between 30-180 ( between 30 sec and 3 minutes )
value=$RANDOM
while [ $value -gt 180 ] || [ $value -lt 30 ] ;
do
value=$RANDOM
done
# Sleep for that time.
sleep $value
# Execute Cron.
echo "Executed on:$(date)" >> public_html/log_file.txt
exec php -f public_html/cron2_.php
Here is the execution time for 2 hours:
Average Interval -> 05:09
Execution Time Interval Min:Sec
13:02:52 00:00
13:07:06 04:14
13:11:35 04:29
13:17:34 05:59
13:21:55 04:21
13:26:54 04:59
13:32:00 05:06
13:35:50 03:50
13:42:44 06:54
13:47:03 04:19
13:51:26 04:23
13:56:48 05:22
14:01:53 05:05
14:07:42 05:49
14:12:15 04:33
14:16:22 04:07
14:23:01 06:39
14:27:17 04:16
14:32:21 05:04
14:35:57 03:36
14:42:14 06:17
14:45:44 03:30
14:52:52 07:08
14:56:50 03:58
15:02:57 06:07
15:06:43 03:46
15:12:26 05:43
15:16:29 04:03
15:22:00 05:31
15:25:35 03:35
15:31:51 06:16
15:37:51 06:00
15:42:56 05:05
15:47:32 04:36
15:50:36 03:04
15:55:45 05:09
16:02:15 06:30
16:06:10 03:55
16:11:11 05:01
16:15:56 04:45
16:21:58 06:02
16:25:56 03:58
16:31:09 05:13
16:37:06 05:57
16:42:30 05:24
16:45:36 03:06
You want your script to run every 4 to 8 minutes. Let's say then that we want, on average, one execution every 6 minutes. In that case, set the crontab line to:
*/6 * * * * myscript.sh
Next, in your script, put a random delay of zero to two minutes:
sleep $(($RANDOM % 120))
Consider two extreme cases. First, suppose that one job waits the maximum 2 minutes and the next waits the minimum of 0 minutes. The time between their executions is 4 minutes. For the second case, consider the opposite: the first job waits the minimum of 0 minutes and the second waits the maximum of 2 minutes. In this case, the time between their executions is 8 minutes. Thus, this approach achieves a wait of 4 to 8 minutes with an average wait of 6 minutes.
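Putting it together, a minimal sketch of the revised script (assuming bash rather than plain sh, since $RANDOM is a bash feature, and keeping the original paths):
#!/bin/bash
# Crontab entry: */6 * * * * myscript.sh
# A random delay of 0-119 seconds spreads consecutive runs 4 to 8 minutes apart.
sleep $((RANDOM % 120))
echo "Executed on: $(date)" >> public_html/log_file.txt
exec php -f public_html/cron2_.php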