Sphinx: Permission denied/Broken pipe on deltas merge

Sphinx: Permission denied/Broken pipe on deltas merge - windows

When i launch this batch command for create and merge deltas:
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf idx_product_delta --rotate
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf --merge idx_product_main idx_product_delta --rotate
In searchd.log found this error and deltas are not merged into main
[Fri Sep 25 15:34:42.549 2015] [ 2312] WARNING: rotating index 'idx_product_main': cur to old rename failed: rename D:\Sphinx\project\data\product.spa to D:\Sphinx\project\data\product.old.spa failed: Broken pipe
Console output is:
using config file 'D:\Sphinx\project\product.conf'...
merging index 'idx_product_delta' into index 'idx_product_main'...
read 7.2 of 7.2 MB, 100.0% done
merged 11.5 Kwords
merged in 0.127 sec
ERROR: index 'idx_product_main': failed to delete 'D:\Sphinx\project\data\product.new.spa': Permission deniedtotal 671 reads, 0.006 sec, 15.3 kb/call avg, 0.0 msec/call avg total 36 writes, 0.004 sec, 277.8 kb/call avg, 0.1 msec/call avg
My product.conf is:
source src_product_main
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =
sql_db = database
sql_port = 3306 # optional, default is 3306
sql_query_pre = REPLACE INTO sphinx_index_meta(index_name, last_update) \
VALUES('idx_prodotti_main', current_timestamp())
sql_query_range = SELECT MIN(id),MAX(id) \
FROM product \
WHERE deleted = 0 AND visible= 1
sql_range_step = 1000
sql_query = SELECT id, text, last_update \
FROM product \
WHERE id>=$start AND id<=$end AND deleted = 0 AND visible = 1
sql_attr_timestamp = last_update
}
index idx_product_main
{
source = src_product_main
path = D:\Sphinx\project\data\product
ondisk_attrs = 1
stopwords = D:\Sphinx\project\stopwords.txt
min_word_len = 2
min_prefix_len = 0
min_infix_len = 3
ngram_len = 1
}
source src_product_delta : src_product_main
{
sql_query_range = SELECT MIN(id),MAX(id) \
FROM product \
WHERE deleted = 0 AND visible= 1
sql_range_step = 1000
sql_query = SELECT id, text, last_update \
FROM product \
WHERE id>=$start AND id<=$end AND deleted = 0 AND visible = 1
}
index idx_product_delta : idx_product_main
{
source = src_product_delta
path = D:\Sphinx\project\delta\product
ondisk_attrs = 1
stopwords = D:\Sphinx\project\stopwords.txt
min_word_len = 2
min_prefix_len = 0
min_infix_len = 3
ngram_len = 1
}
indexer
{
mem_limit = 128M
max_iosize = 1M
}
searchd
{
listen = 9312
listen = 9306:mysql41
log = D:\Sphinx\project\log\searchd.log
query_log = D:\Sphinx\project\log\query.log
read_timeout = 5
client_timeout = 300
max_children = 30
pid_file = D:\Sphinx\project\log\searchd.pid
seamless_rotate = 1
preopen_indexes = 0
unlink_old = 1
workers = threads # for RT to work
binlog_path = D:\Sphinx\project\data
}
I have also tried on Windows 7 and Windows 8, with both stable 2.2.10 and beta
2.3.1-id64-beta (r4926) with same error.
indexer running with a cron (windows scheduler) as SYSTEM user
searchd service running as SYSTEM user
D:\Sphinx\project\data\ folder permission has full control for SYSTEM
How can I solve this issue
UPDATE for Eugene Soldatov answer
I have also tried (first command less --rotate)
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf idx_product_delta
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf --merge idx_product_main idx_product_delta --rotate
but in console output found this error
Sphinx 2.2.10-id64-release (2c212e0)
Copyright (c) 2001-2015, Andrew Aksyonoff
Copyright (c) 2008-2015, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file 'D:\Sphinx\project\product.conf'...
indexing index 'idx_prodotti_delta'...
FATAL: failed to lock D:\Sphinx\project\delta\prodotti.spl: No error, will not index. Try --rotate option.
Sphinx 2.2.10-id64-release (2c212e0)
Copyright (c) 2001-2015, Andrew Aksyonoff
Copyright (c) 2008-2015, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file 'D:\Sphinx\project\product.conf'...
merging index 'idx_prodotti_delta' into index 'idx_prodotti_main'...
read 7.2 of 7.2 MB, 100.0% done
merged 11.5 Kwords
merged in 0.214 sec
ERROR: index 'idx_prodotti_main': failed to delete 'D:\Sphinx\project\data\prodotti.new.spa': Permission deniedtotal 20136 reads, 0.071 sec, 30.9 kb/call avg, 0.0 msec/call avg
total 36 writes, 0.012 sec, 283.3 kb/call avg, 0.3 msec/call avg
In searchd.log found this error
[Wed Sep 30 09:09:29.371 2015] [ 4244] rotating index 'idx_prodotti_main': started
[Wed Sep 30 09:09:29.381 2015] [ 4244] WARNING: rotating index 'idx_prodotti_main': cur to old rename failed: rename D:\Sphinx\project\data\prodotti.spa to D:\Sphinx\project\data\prodotti.old.spa failed: Broken pipe
[Wed Sep 30 09:09:29.381 2015] [ 4244] rotating index: all indexes done
UPDATE 2
Also try to insert sleep between two commands
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf idx_product_delta --rotate
timeout /t 60
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf --merge idx_product_main idx_product_delta --rotate
Console output:
ERROR: index 'idx_prodotti_main': failed to delete 'D:\Sphinx\project\data\prodotti.new.spa': Permission deniedtotal 20137 reads, 0.072 sec, 30.9 kb/c
UPDATE 3: Issue solved
Issue solved by sphinx guys here
http://sphinxsearch.com/bugs/view.php?id=2335

The reason of such behavior is that --rotate command is asynchronous, so when you run second command:
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf --merge idx_product_main idx_product_delta --rotate
first may continue to work with index idx_product_delta:
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf idx_product_delta --rotate
, so it's locked.
If it's possible, remove --rotate option on first command.
UPDATE:
Seems that you need --rotate option in first command. So you could measure average time that need to make it done and insert sleep between two commands. For example, for 30 seconds:
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf idx_product_delta --rotate
timeout /t 30
D:\Sphinx\bin\indexer.exe --config D:\Sphinx\project\product.conf --merge idx_product_main idx_product_delta --rotate

Related

Gearman worker in shell hangs as a zombie

I have a Gearman worker in a shell script started with perp in the following way:
runuid -s gds \
/usr/bin/gearman -h 127.0.0.1 -t 1000 -w -f gds-rel \
-- xargs /home/gds/gds-rel-worker.sh < /dev/null 2>/dev/null
The worker only does some input validation and calls another shell script run.sh that invokes bash, curl, Terragrunt, Terraform, Ansible and gcloud to provision and update resources in GCP like this:
./run.sh --release 1.2.3 2>&1 >> /var/log/gds-release
The script is intended to run unattended. The problem I have is that after the job finishes successfully (that's both shell scripts run.sh and gds-rel-worker.sh) the Gearman job remains executing, because the child process becomes zombie (see last line below).
root 144748 1 0 Apr29 ? 00:00:00 perpboot -d /etc/perp
root 144749 144748 0 Apr29 ? 00:00:00 \_ tinylog -k 8 -s 100000 -t -z /var/log/perp/perpd-root
root 144750 144748 0 Apr29 ? 00:00:00 \_ perpd /etc/perp
root 2492482 144750 0 May14 ? 00:00:00 \_ tinylog (gearmand) -k 10 -s 100000000 -t -z /var/log/perp/gearmand
gearmand 2492483 144750 0 May14 ? 00:00:08 \_ /usr/sbin/gearmand -L 127.0.0.1 -p 4730 --verbose INFO --log-file stderr --keepalive --keepalive-idle 120 --keepalive-interval 120 --keepalive-count 3 --round-robin --threads 36 --worker-wakeup 3 --job-retries 1
root 2531800 144750 0 May14 ? 00:00:00 \_ tinylog (gds-rel-worker) -k 10 -s 100000000 -t -z /var/log/perp/gds-rel-worker
gds 2531801 144750 0 May14 ? 00:00:00 \_ /usr/bin/gearman -h 127.0.0.1 -t 1000 -w -f gds-rel -- xargs /home/gds/gds-rel-worker.sh
gds 2531880 2531801 0 May14 ? 00:00:00 \_ [xargs] <defunct>
So far I have traced the problem to run.sh, because if I replace its call with something simpler (e.g. echo "Hello"; sleep 5) the worker does not hang. Unfortunately, I have no clue what is causing the problem. The script run.sh is rather long and complex, but has been working without a problem so far. Tracing the worker process I see this:
getpid() = 2531801
write(2, "gearman: ", 9) = 9
write(2, "gearman_worker_work", 19) = 19
write(2, " : ", 3) = 3
write(2, "gearman_wait(GEARMAN_TIMEOUT) ti"..., 151) = 151
write(2, "\n", 1) = 1
sendto(5, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
recvfrom(5, "\0RES\0\0\0\n\0\0\0\0", 8192, MSG_NOSIGNAL, NULL, NULL) = 12
sendto(5, "\0REQ\0\0\0\4\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
poll([{fd=5, events=POLLIN}, {fd=3, events=POLLIN}], 2, 1000) = 1 ([{fd=5, revents=POLLIN}])
sendto(5, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
recvfrom(5, "\0RES\0\0\0\6\0\0\0\0\0RES\0\0\0(\0\0\0QH:terra-"..., 8192, MSG_NOSIGNAL, NULL, NULL) = 105
pipe([6, 7]) = 0
pipe([8, 9]) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fea38480a50) = 2531880
close(6) = 0
close(9) = 0
write(7, "1.2.3\n", 18) = 6
close(7) = 0
read(8, "which: no terraform-0.14 in (/us"..., 1024) = 80
read(8, "Identity added: /home/gds/.ssh/i"..., 1024) = 54
read(8, 0x7fff6251f5b0, 1024) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2531880, si_uid=1006, si_status=0, si_utime=0, si_stime=0} ---
read(8,
So the worker continues reading standard output even though the child has finished successfully and presumably closed it. Any ideas how to catch what causes this problem?

I was able to solve it. The script run.sh was starting ssh-agent, which opens a socket and since Gearman redirects all outputs the worker continued reading the open file descriptor even after the script successfully completed.
I found it by examining the open file descriptors for the Gearman worker process after it hang:
# ls -l /proc/2531801/fd/*
lr-x------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/0 -> /dev/null
l-wx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/1 -> 'pipe:[9356665]'
l-wx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/2 -> 'pipe:[9356665]'
lr-x------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/3 -> 'pipe:[9357481]'
l-wx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/4 -> 'pipe:[9357481]'
lrwx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/5 -> 'socket:[9357482]'
lr-x------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/8 -> 'pipe:[9369888]'
Then identified the processes using file node for the pipe in file descriptor 8 that German worker continued reading:
# lsof | grep 9369888
gearman 2531801 gds 8r FIFO 0,13 0t0 9369888 pipe
ssh-agent 2531899 gds 9w FIFO 0,13 0t0 9369888 pipe
And finally listed files opened by ssh-agent and found what stands behind file descriptor 3:
# ls -l /proc/2531899/fd/*
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/0 -> /dev/null
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/1 -> /dev/null
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/2 -> /dev/null
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/3 -> 'socket:[9346577]'
# lsof | grep 9346577
ssh-agent 2531899 gds 3u unix 0xffff89016fd34000 0t0 9346577 /tmp/ssh-0b14coFWhy40/agent.2531898 type=STREAM
As a solution I added kill of the ssh-agent before exit from run.sh script and now there are no jobs hanging due to zombie process.

bash - count, process, and increment thru multiple "Tasks" in log file

I have log files that are broken down into between 1 and 4 "Tasks". In each "Task" there are sections for "WU Name" and "estimated CPU time remaining". Ultimately, I want to the bash script output to look like this 3 Task example;
Task 1 Mini_Protein_binds_COVID-19_boinc_ 0d:7h:44m:28s
Task 2 shapeshift_pair6_msd4X_4_f_e0_161_ 0d:4h:14m:22s
Task 3 rep730_0078_symC_reordered_0002_pr 1d:1h:38m:41s
So far; I can count the Tasks in the log. I can isolate x number of characters I want from the "WU Name". I can convert the "estimated CPU time remaining" in seconds to days:hours:minutes:seconds. And I can output all of that into 'pretty' columns. Problem is that I can only process 1 Task using;
# Initialize counter
counter=1
# Count how many iterations
cnt_wu=`grep -c "WU name:" /mnt/work/sec-conv/bnc-sample3.txt`
# Iterate the loop for cnt-wu times
while [ $counter -le ${cnt_wu} ]
do
core_cnt=$counter
wu=`cat /mnt/work/sec-conv/bnc-sample3.txt | grep -Po 'WU name: \K.*' | cut -c1-34`
sec=`cat /mnt/work/sec-conv/bnc-sample3.txt | grep -Po 'estimated CPU time remaining: \K.*' | cut -f1 -d"."`
dhms=`printf '%dd:%dh:%dm:%ds\n' $(($sec/86400)) $(($sec%86400/3600)) $(($sec%3600/60)) \ $(($sec%60))`
echo "Task ${core_cnt}" $'\t' $wu $'\t' $dhms | column -ts $'\t'
counter=$((counter + 1))
done
Note: /mnt/work/sec-conv/bnc-sample3.txt is a static one Task sample only used for this scripts dev.
What I can't figure out is the next step which is to be able to process x number of multiple Tasks. I can't figure out how to leverage the while/counter combination properly, and can't figure out how to increment through the occurrences of Tasks.
Adding bnc-sample.txt (contains 3 Tasks)
1) -----------
name: Rosetta#home
master URL: https://boinc.bakerlab.org/rosetta/
user_name: XXXXXXX
team_name:
resource share: 100.000000
user_total_credit: 10266.993660
user_expavg_credit: 512.420495
host_total_credit: 10266.993660
host_expavg_credit: 512.603669
nrpc_failures: 0
master_fetch_failures: 0
master fetch pending: no
scheduler RPC pending: no
trickle upload pending: no
attached via Account Manager: no
ended: no
suspended via GUI: no
don't request more work: no
disk usage: 0.000000
last RPC: Wed Jun 10 15:55:29 2020
project files downloaded: 0.000000
GUI URL:
name: Message boards
description: Correspond with other users on the Rosetta#home message boards
URL: https://boinc.bakerlab.org/rosetta/forum_index.php
GUI URL:
name: Your account
description: View your account information
URL: https://boinc.bakerlab.org/rosetta/home.php
GUI URL:
name: Your tasks
description: View the last week or so of computational work
URL: https://boinc.bakerlab.org/rosetta/results.php?userid=XXXXXXX
jobs succeeded: 117
jobs failed: 0
elapsed time: 2892439.609931
cross-project ID: 3538b98e5f16a958a6bdd2XXXXXXXXX
======== Tasks ========
1) -----------
name: shapeshift_pair6_msd4X_4_f_e0_161_X_0001_0001_fragments_abinitio_SAVE_ALL_OUT_946179_730_0
WU name: shapeshift_pair6_msd4X_4_f_e0_161_X_0001_0001_fragments_abinitio_SAVE_ALL_OUT_946179_730
project URL: https://boinc.bakerlab.org/rosetta/
received: Mon Jun 8 09:58:08 2020
report deadline: Thu Jun 11 09:58:08 2020
ready to report: no
state: downloaded
scheduler state: scheduled
active_task_state: EXECUTING
app version num: 420
resources: 1 CPU
estimated CPU time remaining: 26882.771040
slot: 1
PID: 28434
CPU time at last checkpoint: 3925.896000
current CPU time: 4314.761000
fraction done: 0.066570
swap size: 431 MB
working set size: 310 MB
2) -----------
name: rep730_0078_symC_reordered_0002_propagated_0001_0001_0001_A_v9_fold_SAVE_ALL_OUT_946618_54_0
WU name: rep730_0078_symC_reordered_0002_propagated_0001_0001_0001_A_v9_fold_SAVE_ALL_OUT_946618_54
project URL: https://boinc.bakerlab.org/rosetta/
received: Mon Jun 8 09:58:08 2020
report deadline: Thu Jun 11 09:58:08 2020
ready to report: no
state: downloaded
scheduler state: scheduled
active_task_state: EXECUTING
app version num: 420
resources: 1 CPU
estimated CPU time remaining: 26412.937920
slot: 2
PID: 28804
CPU time at last checkpoint: 3829.626000
current CPU time: 3879.975000
fraction done: 0.082884
swap size: 628 MB
working set size: 513 MB
3) -----------
name: Mini_Protein_binds_COVID-19_boinc_site3_2_SAVE_ALL_OUT_IGNORE_THE_REST_0aw6cb3u_944116_2_0
WU name: Mini_Protein_binds_COVID-19_boinc_site3_2_SAVE_ALL_OUT_IGNORE_THE_REST_0aw6cb3u_944116_2
project URL: https://boinc.bakerlab.org/rosetta/
received: Mon Jun 8 09:58:47 2020
report deadline: Thu Jun 11 09:58:46 2020
ready to report: no
state: downloaded
scheduler state: scheduled
active_task_state: EXECUTING
app version num: 420
resources: 1 CPU
estimated CPU time remaining: 27868.559616
slot: 0
PID: 30988
CPU time at last checkpoint: 1265.356000
current CPU time: 1327.603000
fraction done: 0.032342
swap size: 792 MB
working set size: 668 MB
Again, I appreciate any guidance!

icinga2 disk space check or with three arguments

I am trying to configure icinga2 to monitor my linux server disk space using check_nrpe. my configuraiton is given below
nrpe.cfg:
command[check_root]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
icinga configuration
object CheckCommand "nrpe-check-2arg" {
import "plugin-check-command"
command = [PluginDir + "/check_nrpe" ]
arguments = {
"-H" = "$host_name$"
"-c" = "$check$"
"-a" = "$loads$"
}
}
object Service "testing-haproxy-master: / disk space" {
import "generic-service"
host_name = "tmahaprx01.verizon.com"
check_command = "nrpe-check-2arg"
vars.address = "192.168.1.104"
vars.check = "check_root"
vars.loads = "80%!90%!/"
}
Now the out put i am getting is
root#icinga:/etc/icinga2/hosts# /usr/lib/nagios/plugins/check_nrpe -H 192.168.1.104 -c check_root -a '80%C!90%!/'
DISK OK - free space: /sys/fs/cgroup 0 MB (100% inode=99%); /dev 1457 MB (99%
inode=99%); /run 293 MB (99% inode=99%); /run/lock 5 MB (100% inode=99%);
/run/shm 1468 MB (100% inode=99%); /run/user 100 MB (100% inode=99%);|
/sys/fs/cgroup=0MB;0;0;0;0 /dev=0MB;291;145;0;1457 /run=0MB;58;29;0;293
/run/lock=0MB;0;0;0;5 /run/shm=0MB;293;146;0;1468 /run/user=0MB;19;9;0;100
The expecting output when I execute from my remote Linux machine is
root#tmahaprx01:~# /usr/lib/nagios/plugins/check_disk -w 80% -c 90% -p /
DISK OK - free space: / 43144 MB (96% inode=97%);| /=1743MB;9462;4731;0;47314
Could you please guide me how i can pass the third argument (/) ?

The problem with NRPE is that you're writing a command that executes another command. Assuming that the nrpe.cfg includes something like this:
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
you know that the path must be the 3rd argument:
object CheckCommand "nrpe-disk" {
import "nrpe"
vars.nrpe_arguments = [ "$disk_wfree$%", "$disk_cfree$%", "$disk_partition$" ]
vars.nrpe_command = "check_disk"
//variables should be propagated from host/group definition
vars.disk_wfree = 20
vars.disk_cfree = 10
vars.disk_partition = "/"
}
variable names might be dependent on Icinga version, check the original nrpe command definition on your system, it might be located in:
/usr/share/icinga2/include/command-plugins.conf

Crond execute shell many times

I have the following timey.cpp code in RedHat 2.6.32 x86_64:
using namespace std ;
int main()
{
while( 1 ){
char x[64]={0} ;
strcpy( x,"1234567890") ;
std::string s = x ;
std::cout << "(" << x << ")" << std::endl ;
struct timeval localtimex ;
long secs,usecs ;
gettimeofday(&localtimex,0x00) ;
secs = localtimex.tv_sec ;
usecs = localtimex.tv_usec ;
//long mills = (time.tv_sec * 1000) + (time.tv_usec / 1000 ) ;
printf("secs=(%d),usecs=(%d)\n",secs,usecs) ;
sleep( 1 ) ;
} //while
}
in /home/informix/test, compiled by g++ --std=c++0x timey.cpp -o timey.exe,
and the shell timey.sh:
source /etc/bashrc
nohup /home/informix/test/timey.exe &
Then I run timey.sh by:
/home/informix/test/timey.sh
and take a look if timey.exe runs by:
ps -ef | grep timey
It seems timey.exe runs as expected:
informix 41340 1 0 10:32 pts/10 00:00:00 /home/informix/test/timey.exe
What confuses me is that I add this shell to crontab:
38 10 * * 1-5 informix /home/informix/test/timey.sh
and restart crond:
/etc/init.d/crond restart
What surprises me is I see 4 copies of timey.exe running:
ps -ef | grep timey
informix 41498 1 0 10:38 ? 00:00:00 /home/informix/test/timey.exe
informix 41499 1 0 10:38 ? 00:00:00 /home/informix/test/timey.exe
informix 41529 1 0 10:38 ? 00:00:00 /home/informix/test/timey.exe
informix 41561 1 0 10:38 ? 00:00:00 /home/informix/test/timey.exe
What did I do wrong so that 4 copies of timey.exe are running?
I see the following in /var/log/cron:
Jul 2 10:38:01 localhost CROND[41440]: (informix) CMD (/home/informix/test/timey.sh )
Jul 2 10:38:01 localhost CROND[41439]: (informix) CMD (/home/informix/test/timey.sh )
Jul 2 10:38:01 localhost CROND[41491]: (informix) CMD (/home/informix/test/timey.sh )
Jul 2 10:38:01 localhost CROND[41533]: (informix) CMD (/home/informix/test/timey.sh )
It seems that crond really ran timey.sh 4 times, but why?
Also:
In RedHat 2.6.32-279.el6.x86_64, it works!
In RedHat 2.6.32-358.el6.x86_64, it does not work!

Mac OS X: GNU parallel can't find the number of cores on a remote server

I used homebrew to install GNU parallel on my mac so I can run some tests remotely on my University's servers. I was quickly running through the tutorials, but when I ran
parallel -S <username>#$SERVER1 echo running on ::: <username>#$SERVER1
I got the message
parallel: Warning: Could not figure out number of cpus on <username#server> (). Using 1.
Possibly related, I never added parallel to my path and got the warning that "parallel" wasn't a recognized command, but parallel ran anyways and still echo'd correctly. This particular server has 16 cores, how can I get parallel to recognize them?

GNU Parallel is less tested on OS X as I do not have access to an OS X installation, so you have likely found a bug.
GNU Parallel has since 20120322 used these to find the number of CPUs:
sysctl -n hw.physicalcpu
sysctl -a hw 2>/dev/null | grep [^a-z]physicalcpu[^a-z] | awk '{ print \$2 }'
And the number of cores:
sysctl -n hw.logicalcpu
sysctl -a hw 2>/dev/null | grep [^a-z]logicalcpu[^a-z] | awk '{ print \$2 }'
Can you test what output you get from those?
Which version of GNU Parallel are you using?
As a work around you can force GNU Parallel to detect 16 cores:
parallel -S 16/<username>#$SERVER1 echo running on ::: <username>#$SERVER1
Since version 20140422 you have been able to export your path to the remote server:
parallel --env PATH -S 16/<username>#$SERVER1 echo running on ::: <username>#$SERVER1
That way you just need to add the dir where parallel lives on the server to your path on local machine. E.g. parallel on the remote server is in /home/u/user/bin/parallel:
PATH=$PATH:/home/u/user/bin parallel --env PATH -S <username>#$SERVER1 echo running on ::: <username>#$SERVER1
Information for Ole
My iMac (OSX MAvericks on Intel core i7) gives the following, which all looks correct:
sysctl -n hw.physicalcpu
4
sysctl -a hw
hw.ncpu: 8
hw.byteorder: 1234
hw.memsize: 17179869184
hw.activecpu: 8
hw.physicalcpu: 4
hw.physicalcpu_max: 4
hw.logicalcpu: 8
hw.logicalcpu_max: 8
hw.cputype: 7
hw.cpusubtype: 4
hw.cpu64bit_capable: 1
hw.cpufamily: 1418770316
hw.cacheconfig: 8 2 2 8 0 0 0 0 0 0
hw.cachesize: 17179869184 32768 262144 8388608 0 0 0 0 0 0
hw.pagesize: 4096
hw.busfrequency: 100000000
hw.busfrequency_min: 100000000
hw.busfrequency_max: 100000000
hw.cpufrequency: 3400000000
hw.cpufrequency_min: 3400000000
hw.cpufrequency_max: 3400000000
hw.cachelinesize: 64
hw.l1icachesize: 32768
hw.l1dcachesize: 32768
hw.l2cachesize: 262144
hw.l3cachesize: 8388608
hw.tbfrequency: 1000000000
hw.packages: 1
hw.optional.floatingpoint: 1
hw.optional.mmx: 1
hw.optional.sse: 1
hw.optional.sse2: 1
hw.optional.sse3: 1
hw.optional.supplementalsse3: 1
hw.optional.sse4_1: 1
hw.optional.sse4_2: 1
hw.optional.x86_64: 1
hw.optional.aes: 1
hw.optional.avx1_0: 1
hw.optional.rdrand: 0
hw.optional.f16c: 0
hw.optional.enfstrg: 0
hw.optional.fma: 0
hw.optional.avx2_0: 0
hw.optional.bmi1: 0
hw.optional.bmi2: 0
hw.optional.rtm: 0
hw.optional.hle: 0
hw.cputhreadtype: 1
hw.machine = x86_64
hw.model = iMac12,2
hw.ncpu = 8
hw.byteorder = 1234
hw.physmem = 2147483648
hw.usermem = 521064448
hw.pagesize = 4096
hw.epoch = 0
hw.vectorunit = 1
hw.busfrequency = 100000000
hw.cpufrequency = 3400000000
hw.cachelinesize = 64
hw.l1icachesize = 32768
hw.l1dcachesize = 32768
hw.l2settings = 1
hw.l2cachesize = 262144
hw.l3settings = 1
hw.l3cachesize = 8388608
hw.tbfrequency = 1000000000
hw.memsize = 17179869184
hw.availcpu = 8
sysctl -n hw.logicalcpu
8

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Sphinx: Permission denied/Broken pipe on deltas merge - windows

Related

Gearman worker in shell hangs as a zombie

bash - count, process, and increment thru multiple "Tasks" in log file

icinga2 disk space check or with three arguments

Crond execute shell many times

Mac OS X: GNU parallel can't find the number of cores on a remote server

Categories

Resources