How to make groups for pdsh? - cluster-computing

my pdsh can only read /etc/genders for grouping, but I dont know how to generate a genders file for it. I prefer to use dsh style group files(/etc/dsh/group/nodes) for it, but the module dshgroup can not be activated.
I am in Debian 7:
$ uname -r
Linux version 3.2.0-4-amd64 (debian-kernel#lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.63-2
$ pdsh -L
4 modules loaded:
Module: rcmd/exec
Author: Mark Grondona <mgrondona#llnl.gov>
Descr: arbitrary command rcmd connect method
Active: yes
Module: misc/genders
Author: Jim Garlick <garlick#llnl.gov>
Descr: target nodes using libgenders and genders attributes
Active: yes
Options:
-g query,... target nodes using genders query
-X query,... exclude nodes using genders query
-F file use alternate genders file `file'
-i request alternate or canonical hostnames if applicable
-a target all nodes except those with "pdsh_all_skip" attribute
-A target all nodes listed in genders database
Module: rcmd/rsh
Author: Jim Garlick <garlick#llnl.gov>
Descr: BSD rcmd connect method
Active: yes
Module: rcmd/ssh
Author: Jim Garlick <garlick#llnl.gov>
Descr: ssh based rcmd connect method
Active: yes
$ cat /etc/pdsh/machines
10.0.0.1
10.0.0.101
$ cat /etc/dsh/group/nodes
10.0.0.101
$ pdsh -a uptime
no remote hosts specified
$ pdsh -g nodes uptime
no remote hosts specified
$ ls -l /etc/genders
-rw-r--r-- 1 root root 0 Nov 3 19:49 /etc/genders
/etc/genders is empty, because I don't know how to generate.
if I delete /etc/genders:
$ sudo rm /etc/genders
$ pdsh -a uptime
/etc/genders: error opening genders file
how to make pdsh to read dsh style group files?

Try these links they have sample genders files & parsing checks..
https://web.archive.org/web/20150915154542/https://computing.llnl.gov/linux/genders.html (archive.org copy; original link is dead)
https://web.archive.org/web/20150201063434/http://www.phillippadgett.com/blog (archive.org copy, original site is dead)
To get dshgroup module installed, you'll have to compile pdsh yourself. I think the apt package maintainer didn't enable ALL modules by default. May be you can file a bug with him.

pdsh is still a powerful tool.
As most refences seem to have vanished,
I thought I'd do a sample file here for all of us.
node001 gpu,compute,randomgroup
node002 compute,randomgroup
You can now pdsh across nodes using the groups gpu,compute or randomgroup.

Related

weird output when i run sh file.

I've made a sh file
#!/bin/bash
myvariable=Hello
anothervar=Fred
echo $myvariable $anothervar
echo
sampledir=/etc
ls $sampledir
when i run it with
$sh simplevariables.sh
I get this output:
afpovertcp.cfg networks
afpovertcp.cfg~orig networks~orig
aliases newsyslog.conf
aliases.db newsyslog.d
apache2 nfs.conf
asl nfs.conf~orig
asl.conf notify.conf
auto_home ntp-restrict.conf
auto_master ntp.conf
auto_master~orig ntp_opendirectory.conf
autofs.conf openldap
bashrc pam.d
bashrc_Apple_Terminal passwd
bashrc~previous passwd~orig
com.apple.IPConfiguration.plist paths
com.apple.screensharing.agent.launchd paths~orig
csh.cshrc periodic
csh.cshrc~orig pf.anchors
csh.login pf.conf
csh.login~orig pf.os
csh.logout php-fpm.conf.default
csh.logout~orig php.ini.default
cups php.ini.default-previous
defaults postfix
dnsextd.conf ppp
efax.rc~previous profile
emond.d profile~orig
find.codes protocols
find.codes~orig protocols~previous
fstab.hd racoon
fstab.hd~previous rc.common
ftpd.conf rc.common~previous
ftpd.conf.default rc.netboot
ftpusers resolv.conf
ftpusers~orig rmtab
gettytab rpc
gettytab~orig rpc~previous
group rtadvd.conf
group~previous rtadvd.conf~previous
hosts security
hosts-original services
hosts.equiv services~previous
hosts~orig shells
irbrc shells~orig
kern_loader.conf snmp
kern_loader.conf~previous ssh
krb5.keytab ssl
localtime sudo_lecture
locate.rc sudoers
mach_init.d sudoers.d
mach_init_per_login_session.d sudoers~orig
mach_init_per_user.d syslog.conf
mail.rc syslog.conf~previous
mail.rc~orig ttys
man.conf ttys~previous
manpaths xtab
master.passwd zprofile
master.passwd~orig zshrc
nanorc
Any suggestions how to just get an output of
Hello Fred
What should I do to get rid of all the unnecessary Garbo and just the output?
This happens to all the other scripting files that I run as well. Any suggestions?
You should remove this
sampledir=/etc
ls $sampledir
it is printing out the contents of /etc, and you are not seeing your wanted output

SGE submitted job state doesn't change from "qw"

I'm using Sun Grid Engine on ubuntu 14.04 to queue my jobs to be run on a multicore CPU.
I've installed and set up SGE on my system. I created a "hello_world" dir which contains two shell scripts namely "hello_world.sh" & "hello_world_qsub.sh", first one including a simple command and second one including qsub command to submit the first script file as a job to be run.
Here's what "hello_world.sh" includes:
#!/bin/bash
echo "Hello world" > /home/theodore/tmp/hello_world/hello_world_output.txt
And here's what "hello_world_qsub.sh" includes:
#!/bin/bash
qsub \
-e /home/hello_world/hello_world_qsub.error \
-o /home/hello_world/hello_world_qsub.log \
./hello_world.sh
after giving permission to the second sh file and running it with "./hello_world_qsub.sh" command from the specified dir, the output is reasonable:
Your job 1 ("hello_world.sh") has been submitted
But the output of "qstat" command is frustrating:
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1 0.50000 hello_worl mhr qw 05/16/2016 20:26:23 1
And the "state" column always remains on "qw" and never changes to "r".
Here's the output of "qstat -j 1" command:
==============================================================
job_number: 1
exec_file: job_scripts/1
submission_time: Mon May 16 20:26:23 2016
owner: mhr
uid: 1000
group: mhr
gid: 1000
sge_o_home: /home/mhr
sge_o_log_name: mhr
sge_o_path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
sge_o_shell: /bin/bash
sge_o_workdir: /home/mhr/hello_world
sge_o_host: localhost
account: sge
stderr_path_list: NONE:NONE:/home/hello_world/hello_world_qsub.error
mail_list: mhr#localhost
notify: FALSE
job_name: hello_world.sh
stdout_path_list: NONE:NONE:/home/hello_world/hello_world_qsub.log
jobshare: 0
env_list:
script_file: ./hello_world.sh
scheduling info: queue instance "mainqueue#localhost" dropped because it is temporarily not available
All queues dropped because of overload or full
And here's the output of "qhost" command:
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
localhost - - - - - - -
What should I do to make my jobs run and finish their task?
From your qhost output, it looks like your machine "localhost" is properly configured in SGE. However, on "localhost" sge_execd is either not running or not configured properly. If it were, qhost would report statistics for "localhost".

Convert SNMP traps from v1 to v3

I'm trying to convert snmp v1 traps to v3. I've followed this discussion but it's vague.
I've also looked here but without success.
To be more clear: I have a Centos 6 station, with net-snmp 5.5 on it. I need to generate v1 traps, receive them, convert them to v3, then forward them.
Regarding the first guide, this is what I managed so far:
Master:
snmpd -Lo --master=agentx --agentXSocket=tcp:192.168.58.64:42000 udp:1161
Listen:
snmpwalk -v3 -u snmpv3user -A snmpv3pass -a MD5 -l authnoPriv 192.168.58.64:1161
Later edit:
I have made some progress, I was able to run snmpd as master, connect snmptrapd as agent to it, then have v1 traps mechanism functional.
I did the following:
In order to get snmptrapd connected as a subagent to snmpd you need to do the following:
###1 EDIT /etc/hosts.allow and add
snmpd: $(your_ip)
smptrapd: $(your_ip)
this is important because snmptrapd fails silently if rejected
by tcp wrap.
###2 EDIT /etc/snmp/snmpd.conf and add at the bottom of the other
com2sec directives.
com2sec infwnet $(your_ip) YOUR-COMMUNITY
add these lines
group MyROGroup v1 infwnet
group MyROGroup v2c infwnet
group MyROGroup usm infwnet
under
"# Second, map the security names into group names:"
add this view at the bottom of the other views
view all included .1 80
add this group acces at the bottom of other group access directives
access MyROGroup "" any noauth exact all none none
add this line as well:
master agentx
###3 TEST it with this:
snmpwalk -v1 -c YOUR_COMMUNITY $(your_ip) .
###4 CREATE THE FOLLOWING TRAP TEST EXAMPLE:
touch /usr/share/snmp/mibs/UCD-TRAP-TEST-MIB.txt
###5 COPY PASTE THE TEXT BELOW INTO IT:
UCD-TRAP-TEST-MIB DEFINITIONS ::= BEGIN
IMPORTS ucdExperimental FROM UCD-SNMP-MIB;
demotraps OBJECT IDENTIFIER ::= { ucdExperimental 990 }
demoTrap TRAP-TYPE
ENTERPRISE demotraps
VARIABLES { sysLocation }
DESCRIPTION "An example of an SMIv1 trap"
::= 17
END
###6 EDIT /etc/sysconfig/snmptrapd (not /etc/default/snmptrapd !!)
replace OPTIONS with this:
OPTIONS="-Lsd -m ALL -M /usr/share/snmp/mibs -p /var/run/snmptrapd.pid"
###7 TEST IT WITH
snmptrap -v 1 -c public $(your_ip) UCD-TRAP-TEST-MIB::demotraps "" 6 17 "" SNMPv2-MIB::sysLocation.0 s "Just here"
Now I just need to find a way to convert them to v3 and read/receive them from a remote snmpd

set additional folder for snmp MIBs

I am rebuilding an Icinga server that has been left behind by a previous employee. I have everything up and running, except for a bunch of MIB files for 3com switches that I cannot get to work.
The server is a CentOS 6 OpenVZ container.
In the original server there is a bunch of mib files in the default location at /usr/share/snmp/mibs/ and the 3com ones at /usr/share/snmp/mibs/3Com_4500/MIBs. The 3Com mibs work fine:
/usr/lib/nagios/plugins/check_snmp -H 10.10.111.11 -P 2c -C public -o hwDevMFanStatus.65536 -s "active(1)" -m A3COM-HUAWEI-LswDEVM-MIBSNMP OK - active(1) |
In the new server, the MIBs in the 3com folder do not get acknowledged and I get errors like the following:
/usr/lib/nagios/plugins/check_snmp -H 10.10.111.11 -P2c -C someuser -o hwDevMFanStatus.65536 -s "active(1)" -m A3COM-HUAWEI-LswDEVM-MIB
External command error: No log handling enabled - turning on stderr logging
Cannot find module (A3COM-HUAWEI-LswDEVM-MIB): At line 0 in (none)
hwDevMFanStatus.65536: Unknown Object Identifier (Sub-id not found: (top) -> hwDevMFanStatus)
/etc/snmp/snmpd.conf is identical for both servers and so is /etc/sysconfig/snmp.
set does not show any ENV variable related to snmp or mib.
Thanks
You are confusing snmpd.conf and snmp.conf the former being the configuration file for the SNMP daemon whereas Net-SNMP applications use snmp.conf.
The mibs/mibdirs directives you are interested in would be specified in snmp.conf (see also man snmp.conf.

system(): why do I not have the same permissions when using R in EMACS as I do in the bash terminal?

update: the error only occurs when logged into R from within emacs
what works:
When I ssh into a remote server and run
$ ./foo.rb
from the bash shell, it works. Furthermore, if I launch R and execute
$ R
system('./foo.rb')
I am in a group with permission to read/write/execute the file. File permissions are -rwxrwx---
what doesn't work:
Launch emacs and start an R session:
M-x R
ssh-myserver:.
system('./foo.rb')
I get the following error:
ruby: Permission denied -- foo.rb (LoadError)
why is this? Is there a way to work around this?
I can not find any information from ?system or ?system2
Here is the output from sessionInfo()
> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] PECAn_0.1.1 xtable_1.5-6 gridExtra_0.7 RMySQL_0.7-5
[5] DBI_0.2-5 ggplot2_0.8.9 proto_0.3-8 reshape_0.8.3
[9] plyr_1.6 rjags_2.2.0-2 coda_0.13-5 lattice_0.19-17
[13] randtoolbox_1.09 rngWELL_0.9 MASS_7.3-11 XML_3.2-0
loaded via a namespace (and not attached):
[1] digest_0.4.2
Warning message:
'DESCRIPTION' file has 'Encoding' field and re-encoding is not possible
output of 'id' and 'env' from ssh and emacs, per comment by #sarnold (changed user names, group names, and ip addresses)
1. server
1.1 'id'
uid=1668(dleb) gid=1668(dleb) groups=117(ebusers),159(lab_admin),166(lab),1340(pal_web),1668(dleb)
1.2 'env'
LC_PAPER=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
SHELL=/usr/local/bin/system-specific
KDE_NO_IPV6=1
SSH_CLIENT=888.888.888.88 51857 22
NCARG_FONTCAPS=/usr/lib64/ncarg/fontcaps
LC_NUMERIC=en_US.UTF-8
USER=dleb
LS_COLORS=
LC_TELEPHONE=en_US.UTF-8
KDEDIR=/usr
NCARG_GRAPHCAPS=/usr/lib64/ncarg/graphcaps
MAIL=/var/mail/dleb
PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/opt/dell/srvadmin/bin
LC_IDENTIFICATION=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
R_LIBS=/home/a-m/dleb/lib/R
PWD=/home/dleb
NCARG_ROOT=/usr
KDE_IS_PRELINKED=1
LANG=en_US.UTF-8
NCARG_DATABASE=/usr/lib64/ncarg/database
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
LOADEDMODULES=
LC_MEASUREMENT=en_US.UTF-8
NCARG_LIB=/usr/lib64/ncarg
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
NCARG_NCARG=/usr/share/ncarg
SHLVL=1
HOME=/home/a-m/dleb
LOGNAME=dleb
CVS_RSH=ssh
SSH_CONNECTION=888.888.888.88 51857 999.999.999.99 22
LC_CTYPE=en_US.UTF-8
MODULESHOME=/usr/share/Modules
LESSOPEN=|/usr/bin/lesspipe.sh %s
DISPLAY=localhost:15.0
LC_TIME=en_US.UTF-8
G_BROKEN_FILENAMES=1
LC_NAME=en_US.UTF-8
_=/bin/env
emacs/ess R session
2.1 system('id')
uid=1668(dleb) gid=1668(dleb) groups=117(ebusers),159(lab_admin),166(lab),1340(pal_web),1668(dleb)
2.2 system('env')
LN_S=ln -s
R_TEXI2DVICMD=/usr/bin/texi2dvi
LC_PAPER=en_US.UTF-8
SED=/bin/sed
LC_ADDRESS=en_US.UTF-8
R_PDFVIEWER=/usr/bin/xdg-open
LC_MONETARY=en_US.UTF-8
HOSTNAME=ebi-forecast
R_INCLUDE_DIR=/usr/include/R
R_PRINTCMD=lpr
SHELL=/usr/local/bin/system-specific
TERM=dumb
AWK=gawk
HISTSIZE=1
R_RD4DVI=ae
SSH_CLIENT=888.888.888.88 51159 22
KDE_NO_IPV6=1
R_RD4PDF=times,hyper
R_PAPERSIZE=a4
NCARG_FONTCAPS=/usr/lib64/ncarg/fontcaps
PERL=/usr/bin/perl
LC_NUMERIC=en_US.UTF-8
SSH_TTY=/dev/pts/14
LC_ALL=C
EMACS=t
USER=dleb
LC_TELEPHONE=en_US.UTF-8
LS_COLORS=
LD_LIBRARY_PATH=/usr/lib64/R/lib:/usr/local/lib64:/usr/lib/jvm/jre/lib/amd64/server:/usr/lib/jvm/jre/lib/amd64:/usr/lib/jvm/java/lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
TAR=/bin/gtar
ENV=
R_ZIPCMD=/usr/bin/zip
KDEDIR=/usr
PAGER=/usr/bin/less
NCARG_GRAPHCAPS=/usr/lib64/ncarg/graphcaps
R_GZIPCMD=/usr/bin/gzip
PATH=/bin:/usr/bin:/usr/sbin:/usr/local/bin
LC_COLLATE=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
EGREP=/bin/grep -E
PWD=/home/a-m/dleb/pecan
INPUTRC=/etc/inputrc
R_LIBS=/home/a-m/dleb/lib/R
NCARG_ROOT=/usr
R_SHARE_DIR=/usr/share/R
WHICH=/usr/bin/which
EDITOR=vi
LANG=en_US.UTF-8
KDE_IS_PRELINKED=1
R_LIBS_SITE=/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib64/R/library:/usr/share/R/library
M ODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
NCARG_DATABASE=/usr/lib64/ncarg/database
LC_MEASUREMENT=en_US.UTF-8
LOADEDMODULES=
PS3=
R_BROWSER=/usr/bin/xdg-open
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
NCARG_LIB=/usr/lib64/ncarg
HOME=/home/a-m/dleb
SHLVL=1
NCARG_NCARG=/usr/share/ncarg
R_ARCH=
TR=/usr/bin/tr
MAKE=make
R_UNZIPCMD=/usr/bin/unzip
LOGNAME=dleb
CVS_RSH=ssh
LC_CTYPE=en_US.UTF-8
SSH_CONNECTION=888.888.888.88 51159 999.999.999.99 22
R_BZIPCMD=/usr/bin/bzip2
MODULESHOME=/usr/share/Modules
LESSOPEN=|/usr/bin/lesspipe.sh %s
PROMPT_COMMAND=
R_HOME=/usr/lib64/R
DISPLAY=localhost:22.0
R_PLATFORM=x86_64-redhat-linux-gnu
INSIDE_EMACS=23.2.1,tramp:2.1.18-23.2
R_LIBS_USER=~/R/x86_64-redhat-linux-gnu-library/2.12
LC_TIME=en_US.UTF-8
R_DOC_DIR=/usr/share/doc/R-2.12.2
R_SESSION_TMPDIR=/tmp/RtmpqA6bpJ
HISTFILE=/home/a-m/dleb/.tramp_history
G_BROKEN_FILENAMES=1
LC_NAME=en_US.UTF-8
_=/bin/env
Assuming you started up R as the same user, you do. You error is not coming from a permissions problem for foo.rb, however, or else your shell would be giving the error. (i.e. sh: ./test.rb: Permission denied; see example below). Here, ruby itself is giving the error. Without knowing exactly what is in your foo.rb, I would suggest digging in there to see what it is trying to load/source, and checking the permissions on those.
#!/usr/bin/env ruby
puts 'Hello world'
Now in R....
> system('ls -l test.rb')
-rw-r--r-- 1 jcolby staff 40 Oct 21 08:23 test.rb
> system('./test.rb')
sh: ./test.rb: Permission denied
> system('chmod a+x test.rb')
> system('./test.rb')
Hello world
I presume the M ODULEPATH in the Emacs-derived output is simply a copy and paste typo.
The differences between the two env outputs is much greater than I expected; I've selected the ones that look slightly suspicious to me:
$ diff -u works fails
--- works 2011-10-24 15:04:02.000000000 -0700
+++ fails 2011-10-24 15:12:36.000000000 -0700
...
+LD_LIBRARY_PATH=/usr/lib64/R/lib:/usr/local/lib64:/usr/lib/jvm/jre/lib/amd64/server:/usr/lib/jvm/jre/lib/amd64:/usr/lib/jvm/java/lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
...
-PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/opt/dell/srvadmin/bin
-PWD=/home/dleb
...
+PATH=/bin:/usr/bin:/usr/sbin:/usr/local/bin
...
+PWD=/home/a-m/dleb/pecan
...
In the emacs-derived session, your LD_LIBRARY_PATH environment variable may be changing specifics of which dynamically linked libraries are being used when executing ruby. If you ssh in to your server and execute your foo.rb with the changed LD_LIBRARY_PATH, does it work or fail?
LD_LIBRARY_PATH=/usr/lib64/R/lib:/usr/local/lib64:/usr/lib/jvm/jre/lib/amd64/server:/usr/lib/jvm/jre/lib/amd64:/usr/lib/jvm/java/lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib ./foo.rb
The PATH environment variable between the two sessions is different; perhaps you have permission to execute /usr/local/bin/ruby (or the libraries in /usr/local/lib/ruby/) but not /usr/bin/ruby (or the libraries in /usr/lib/ruby/). Does your script use #!env ruby or does it use #!/usr/bin/ruby (or some other fixed path)?
Your pwd in one instance is /home/dleb, the other /home/a-m/dleb/pecan -- but HOME is set to /home/a-m/dleb on both systems. Is /home/dleb a symbolic link or does it actually exist separate from /home/a-m/dleb? (This really is grasping at straws -- I don't think this is it, but this problem is baffling.)
One last thing to consider: is your server confined with a tool such as AppArmor, SELinux, TOMOYO, or SMACK? Any of these mandatory access control tools can prevent an application from writing in specific locations, perhaps they aren't yet configured for your site. Check dmesg(1) output to see if there are any rejection messages, most or all these tools log to dmesg(1) if auditd(8) isn't running.

Resources