My goal is to do the following:
1) Check how much memory is being used by each GPU on a specific server. I accomplish this with (nvidia-smi --query-gpu=memory.free --format=csv).
2) Find the GPU with the maximum free memory. I accomplish this with my_cmd(). It works for the remote server I am currently logged into.
3) If the maximum free memory on the remote server I'm logged into is less than 1000 MiB, SSH into each of the other GPU servers in the cluster to find the maximum free memory available. These servers are listed in tocheck.
My current issue:
The code below works when scriptuse is given the cd command, etc.
The code below fails when scriptuse is given my_cmd. It gives me the error:
bash: my_cmd: command not found.
Now, I think there's more than one problem here. First, I think I'm not providing my_cmd properly to the ssh command. Second, when I use my_cmd, I don't think I'm successfully sshing into the other servers.
Can anyone point out what is wrong and how to fix it?
The complete bash script is below.
#!/bin/bash
#https://stackoverflow.com/questions/45313313/nvidia-smi-command-in-bash-vs-in-terminal-for-maximum-of-an-array/45313404#45313404
my_cmd()
{
max_idx=0
max_mem=0
idx=0
{
read _; # discard first line (header)
while read -r mem _; do # for each subsequent line, read first word into mem
if (( mem > max_mem )); then # compare against maximum mem value seen
max_mem=$mem # ...if greater, then update both that max value
max_idx=$idx # ...and our stored index value.
fi
((++idx))
done
} < <(nvidia-smi --query-gpu=memory.free --format=csv)
echo "Maximum memory seen is $max_mem, at processor $max_idx"
}
tocheck=('4' '5' '6' '7' '8') #The GPUs to check
it1=1
#scriptuse="my_cmd"
scriptuse= "cd ~/spatial; pwd; echo $gpuval"
while [ $it1 -lt ${#tocheck[@]} ] ; do #While we still don't have enough free memory
echo $it1
gpuval=${tocheck[$it1]}
ssh gpu${gpuval} "${scriptuse}"
it1=$[it1+1]
done
EDIT
Thank you very much for the help, but my problem is not yet solved. I have done this:
1) Remove my_cmd from my bash script. It now looks like this:
#!/bin/bash
#https://stackoverflow.com/questions/45313313/nvidia-smi-command-in-bash-vs-in-terminal-for-maximum-of-an-array/45313404#45313404
tocheck=('4' '5' '6' '7' '8') #The GPUs to check
it1=1
scriptuse= "cd ~/spatial; echo $gpuval"
while [ $it1 -lt ${#tocheck[@]} ] ; do #While we still don't have enough free memory
echo $it1
gpuval=${tocheck[$it1]}
ssh gpu${gpuval} "${scriptuse}" /my_script.sh
it1=$[it1+1]
done
2) Create a separate bash script called my_script.sh that contains my_cmd:
#!/bin/bash
#https://stackoverflow.com/questions/45313313/nvidia-smi-command-in-bash-vs-in-terminal-for-maximum-of-an-array/45313404#45313404
max_idx=0
max_mem=0
idx=0
{
read _; # discard first line (header)
while read -r mem _; do # for each subsequent line, read first word into mem
if (( mem > max_mem )); then # compare against maximum mem value seen
max_mem=$mem # ...if greater, then update both that max value
max_idx=$idx # ...and our stored index value.
fi
((++idx))
done
} < <(nvidia-smi --query-gpu=memory.free --format=csv)
echo "Maximum memory seen is $max_mem, at processor $max_idx"
3) Ran chmod to ensure both files can be run.
4) Ensured both files exist on all GPUs in the cluster (they have a common storage).
5) Ran ./test_run, which is the bash script from step 1.
I get the error:
./test_run.sh: line 8: cd ~/spatial; echo : No such file or directory
1
bash: /my_script.sh: No such file or directory
2
bash: /my_script.sh: No such file or directory
3
bash: /my_script.sh: No such file or directory
4
bash: /my_script.sh: No such file or directory
EDIT: The final solution
Thanks to the accepted answer below and the discussion in the comments, here's what ended up working:
1) Leave my_script as it is in the previous edit.
2) The file test_run should look like this:
#!/bin/bash
tocheck=('4' '5' '6' '7' '8') #The GPUs to check
it1=1
while [ "$it1" -lt "${#tocheck[@]}" ] ; do #While we still don't have enough free memory
    echo "$it1"
    gpuval=${tocheck[$it1]}
    ssh "gpu${gpuval}" ~/spatial/my_script.sh
    it1=$((it1+1))
done
I think this works because all of the GPU servers in the cluster share common storage, so they all have access to ~/spatial.
The environment your script is running in (your shell) is totally unrelated to the environment the remote host is running in (the remote shell). If you define a function my_cmd in your shell it will not be transmitted across the wire to the remote host's shell.
Try a simpler example:
$ foo() { echo foo; }
$ foo
foo
$ ssh remote-host foo
bash: foo: command not found
This simply isn't how SSH, Bash, and Linux/POSIX are designed. Now, ssh does update some parts of the remote environment (as detailed in man ssh), but this is limited to certain environment variables, not functions.
Notably, the remote shell might not even be the same type of shell as yours (e.g. yours might be Bash, but the remote shell might be Zsh), so it is not generally possible to transmit shell functions across ssh.
A much simpler and more reliable option is to create a shell script (rather than a function) that you intend to be run on the remote shell, and ensure that script exists on the remote machine. For example:
# Copy the script to the remote host's /tmp directory
scp my_cmd.sh remote-host:/tmp
# Invoke the script on the remote host
ssh remote-host /tmp/my_cmd.sh
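If the remote login shell is known to be Bash, another workaround (separate from the script approach above) is to send the function's source text along with the command, using declare -f. Here is a minimal sketch. It uses a local bash -c as a stand-in for the remote shell so it can be tried without a network, and the function name greet is made up for illustration:

```shell
#!/bin/bash
# Hypothetical function, defined only in the current shell.
greet() { echo "hello from $1"; }

# A fresh bash process, like a remote shell, does not know the function.
run_plain() { bash -c 'greet child' 2>/dev/null; }

# declare -f prints the function's source, so the new shell can define it first.
run_with_definition() { bash -c "$(declare -f greet); greet child"; }

run_plain || echo "plain call failed: greet is unknown to the new shell"
run_with_definition
```

Over SSH the second form would be ssh remote-host "$(declare -f greet); greet child". That only works when the remote shell is Bash and every command the function calls is installed on the remote host.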
Edit:
./test_run.sh: line 8: cd ~/spatial; echo : No such file or directory
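This message comes from your local shell, not the remote host: the space after scriptuse= makes Bash treat the quoted string as a command name and try to run it with scriptuse set to an empty value. Remove the space, and also check that ~/spatial exists on the remote host.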
bash: /my_script.sh: No such file or directory
Are you sure /my_script.sh exists on the remote host?
Again, your remote host is a wholly different environment. Just because a file or directory exists on your local machine doesn't mean it exists on the remote host unless you put it there.
Try ssh [remote-host] 'ls ~' and ssh [remote-host] 'ls /' - I bet you'll see the directory and file don't exist.
Related
I am trying to log in to a remote server (Box1) and read a file there.
That file contains the details of another server (Box2). Based on those details, I have to come back to the local server and ssh to Box2 for some data crunching, and so on.
ssh box1.com << EOF
if [[ ! -f /home/rakesh/tomar.log ]]
then
echo "LOG file not found"
else
echo " LOG file present"
export server_node1= `cat /home/rakesh/tomar.log`
fi
EOF
ssh box2.com << EOF
if [[ ! -f /home/rakesh/tomar.log ]]
then
echo "LOG file not found"
else
echo " LOG file present"
export server_node2= `cat /home/rakesh/tomar.log`
fi
EOF
But I am not getting the values of server_node1 and server_node2 on the local machine.
Any help would be appreciated.
Just like bash -c 'export foo=bar' cannot declare a variable in the calling shell where you typed this, an ssh command cannot declare a variable in the calling shell. You will have to refactor so that the calling shell receives the information and knows what to do with it.
I agree with the comment that storing a log file in a variable is probably not a sane, or at least elegant, thing to do, but the easy way to do what you are attempting is to put the ssh inside the assignment.
server_node1=$(ssh box1.com cat tomar.log)
server_node2=$(ssh box2.com cat tomar.log)
A few notes and amplifications:
The remote shell will run in your home directory, so I took it out (on the assumption that /home/rt9419 is your home directory, obviously).
In case of an error in the cat command, the exit code of ssh will be the error code from cat, and the error message on standard error will be visible on your standard error, so the echo seemed quite superfluous. (If you want a custom message, variable=$(ssh whatever) || echo "Custom message" >&2 would do that. Note the redirection to standard error; it doesn't seem to matter here, but it's good form.)
If you really wanted to, you could run an arbitrarily complex command in the ssh; as outlined above, it didn't seem necessary here, but you could do assignment=$(ssh remote 'if [[ things ]]; then for variable in $(complex commands to drive a loop); do : etc etc; done; fi; more </dev/null; exit "$variable"') or whatever.
As further comments on your original attempt,
The backticks in the here document in your attempt would be evaluated by your local shell before the ssh command even ran. There are separate questions about how to fix that; see e.g. "How have both local and remote variable inside an SSH command". In short, unless you absolutely require the local shell to be able to modify the commands you send, put them in single quotes, like I did in the silly complex ssh example above.
The function of export is to make variables visible to child processes. There is no way to affect the environment of a parent process (short of having it cooperate and/or coordinate the change, as in the code above). As an example to illustrate the difference, if you set PERL5LIB to a directory with Perl libraries, but fail to export it, the Perl process you start will not see the variable; it is only visible to the current shell. When you export it, any Perl process you start as a child of this shell will also see this variable and the value you assigned. In other words, you export variables which are not private to the current shell (and don't export private ones; aside from making sure they are private, this saves the amount of memory which needs to be copied between processes), but that still only makes them visible to children, by the design of the U*x process architecture.
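The one-way nature of export can be checked locally, with bash -c standing in for any child process (an ssh session behaves the same way in this respect). A minimal sketch; the variable names are made up for illustration:

```shell
#!/bin/bash
unset private_var shared_var from_child

private_var=local-only          # not exported: child processes cannot see it
export shared_var=visible       # exported: copied into each child's environment

child_view=$(bash -c 'echo "${private_var-unset}/${shared_var-unset}"')
echo "child sees: $child_view"

# An export made inside the child is lost when the child exits.
bash -c 'export from_child=42'
echo "parent sees from_child: ${from_child-unset}"

# The direction that does work: the child prints, the parent captures.
from_child=$(bash -c 'echo 42')
echo "captured: $from_child"
```

The last two lines are the same pattern as putting the ssh command inside the assignment.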
You should fetch the file from box1 and box2 with scp:
scp box1.com:/home/rt9419/tomar.log ~/tomar1.log
#then you can cat!
export server_node1=`cat ~/tomar1.log`
Likewise for box2:
scp box2.com:/home/rt9419/tomar.log ~/tomar2.log
#then you can cat!
export server_node2=`cat ~/tomar2.log`
There are several possibilities. In your case, you could on the remote system create a file (in bash syntax), containing the assignments of these variables, for example
echo "export server_node2='$(</home/rt9419/tomar.log)'" >>export_settings
(which makes me wonder why you want the whole content of your logfile be stored into a variable, but this is another question), then transfer this file to your host (for example with scp) and source it from within your bash script.
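A sketch of that round trip, run entirely on one machine so it can be tried without a remote host. It assumes Bash on both ends and swaps the hand-written single quotes for printf %q, so that values containing quotes or spaces survive. The helper names write_settings and load_settings are hypothetical:

```shell
#!/bin/bash
# Would run on the remote system: append an assignment that Bash can re-read.
write_settings() {   # usage: write_settings NAME VALUE FILE
    printf 'export %s=%q\n' "$1" "$2" >>"$3"
}

# Would run locally after copying the file back (for example with scp).
load_settings() {    # usage: load_settings FILE
    source "$1"
}

settings=$(mktemp)
write_settings server_node2 "node-b's log line" "$settings"
load_settings "$settings"
echo "$server_node2"
rm -f "$settings"
```

Sourcing a file executes whatever it contains, so this is only appropriate when you trust the remote host that produced it.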
I have a shell script that usually takes nearly 10 minutes per run. I need to know what happens if another request to run the script arrives while an instance is already running: does the new request wait for the existing instance to complete, or is a new instance started?
I need a new instance to be started whenever a request arrives for the same script.
How can I do that?
The shell script is a polling script that looks for a file in a directory and executes it. Executing the file takes nearly 10 minutes or more, but if a new file arrives during execution, it also has to be executed simultaneously.
The shell script is below. How can I modify it to execute multiple requests?
#!/bin/bash
while [ 1 ]; do
newfiles=`find /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/ -newer /afs/rch/usr$
touch /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/.my_marker
if [ -n "$newfiles" ]; then
echo "found files $newfiles"
name2=`ls /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/ -Art |tail -n 2 |head $
echo " $name2 "
mkdir -p -m 0755 /afs/rch/usr8/fsptools/WWW/dumpspace/$name2
name1="/afs/rch/usr8/fsptools/WWW/dumpspace/fipsdumputils/fipsdumputil -e -$
$name1
touch /afs/rch/usr8/fsptools/WWW/dumpspace/tempfiles/$name2
fi
sleep 5
done
When writing scripts like the one you describe, I take one of two approaches.
First, you can use a pid file to indicate that a second copy should not run. For example:
#!/bin/sh
pidfile=/var/run/${0##*/}.pid
# Write the pid as a symlink
if ! ln -s "pid=$$" "$pidfile"; then
    echo "Already running. Exiting." >&2
    exit 0
fi
# We own the pid file now; remove it if we exit normally or are terminated.
# (Setting the trap earlier would delete the running instance's pid file.)
trap 'rm -f "$pidfile"' 0 1 3 15
# Do your stuff
I like using symlinks to store pid because writing a symlink is an atomic operation; two processes can't conflict with each other. You don't even need to check for the existence of the pid symlink, because a failure of ln clearly indicates that a pid cannot be set. That's either a permission or path problem, or it's due to the symlink already being there.
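The "second ln fails" behaviour is easy to try in a scratch directory. A minimal sketch; the helper names acquire_lock and release_lock are hypothetical, and the lock lives under a mktemp directory rather than /var/run so that no special permissions are needed:

```shell
#!/bin/bash
# Succeeds for exactly one caller, because creating a symlink
# fails if the name already exists.
acquire_lock() {     # usage: acquire_lock LOCKPATH
    ln -s "pid=$$" "$1" 2>/dev/null
}

release_lock() {     # usage: release_lock LOCKPATH
    rm -f "$1"
}

lockdir=$(mktemp -d)             # throwaway directory for the demo
lock="$lockdir/demo.pid"

acquire_lock "$lock" && echo "first attempt: got the lock"
acquire_lock "$lock" || echo "second attempt: already locked"
echo "lock holder: $(readlink "$lock")"
release_lock "$lock"
acquire_lock "$lock" && echo "after release: got the lock again"
release_lock "$lock"
rmdir "$lockdir"
```

Because the symlink target records the PID, readlink tells you which process holds the lock without opening any file.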
Second option is to make it possible .. nay, preferable .. not to block additional instances, and instead configure whatever it is that this script does to permit multiple servers to run at the same time on different queue entries. "Single-queue-single-server" is never as good as "single-queue-multi-server". Since you haven't included code in your question, I have no way to know whether this approach would be useful for you, but here's some explanatory meta bash:
#!/usr/bin/env bash
workdir=/var/tmp # Set a better $workdir than this.
a=( $(get_list_of_queue_ids) ) # A command? A function? Up to you.
for qid in "${a[@]}"; do
# Set a "lock" for this item .. or don't, and move on.
if ! ln -s "pid=$$" $workdir/$qid.working; then
continue
fi
# Do your stuff with just this $qid.
...
# And finally, clean up after ourselves
remove_qid_from_queue $qid
rm $workdir/$qid.working
done
The effect of this is to transfer the idea of "one at a time" from the handler to the data. If you have a multi-CPU system, you probably have enough capacity to handle multiple queue entries at the same time.
ghoti's answer shows some helpful techniques, if modifying the script is an option.
Generally speaking, for an existing script:
Unless you know with certainty that:
the script has no side effects other than to output to the terminal or to write to files with shell-instance specific names (such as incorporating $$, the current shell's PID, into filenames) or some other instance-specific location,
OR that the script was explicitly designed for parallel execution,
I would assume that you cannot safely run multiple copies of the script simultaneously.
It is not reasonable to expect the average shell script to be designed for concurrent use.
From the viewpoint of the operating system, several processes may of course execute the same program in parallel. No need to worry about this.
However, it is conceivable, that a (careless) programmer wrote the program in such a way that it produces incorrect results, when two copies are executed in parallel.
I am writing a shell script on Solaris to check whether the files on the remote host have finished being written before transferring them to the local host. I have a skeleton, but I am not sure how to do certain parts. From a little reading, the command to check a file's size is stat -c %s LogFiles.txt, but I am not sure how to run that check on the remote host.
# Get File Size on Remote Host
INITIALSIZE =
sleep 5
# Get File Size on Remote Host Again
LATESTSIZE =
#Loop 5 times
for i in {1..5}
do
if [ "$INITIALSIZE" -ne "$LATESTSIZE" ]
then
sleep 5
# Get File Size on Remote Host
LATESTSIZE=
else
scp -P 22 $id@$ip:$srcpath/\*.txt $destpath
break
fi
done
Assuming that your measurement for 'done' is "file size constant for 5 sec", you can simply use ssh as follows:
ssh user@remote.machine "command to execute"
This can be piped or stored in a variable on the local machine, e.g. in your case:
latestsize=$( ssh user@remote.machine "<sizedeterminer> <file>" )
Passwordless login of course would skip the askpass problem. See point 3.3 in this manual or an example here.
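Putting the pieces together, the polling loop can be wrapped in a function that takes the size-reporting command as its arguments. This is only a sketch under the "size unchanged between two readings" assumption. The names wait_until_stable and local_size are hypothetical, and wc -c is used as the size determiner because it is available on POSIX systems:

```shell
#!/bin/bash
# Poll a size-reporting command until two consecutive readings agree.
# usage: wait_until_stable INTERVAL_SECONDS MAX_TRIES COMMAND [ARGS...]
# returns 0 when stable, 1 if still changing after MAX_TRIES, 2 if COMMAND fails
wait_until_stable() {
    local interval=$1 tries=$2 prev curr i
    shift 2
    prev=$("$@") || return 2
    for ((i = 0; i < tries; i++)); do
        sleep "$interval"
        curr=$("$@") || return 2
        # Arithmetic comparison ignores the padding some wc versions print.
        if (( curr == prev )); then
            return 0
        fi
        prev=$curr
    done
    return 1
}

# Local stand-in for the remote size query, useful for trying the loop out.
local_size() { wc -c <"$1"; }
```

For the remote case the call would look something like wait_until_stable 5 5 ssh "$id@$ip" "wc -c <'$srcpath/LogFiles.txt'", followed by the scp only if it returns 0.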
Question,
I want to have a bash script that will have a global variable that can be incremented from other bash scripts.
Example:
I have a script like the following:
#! /bin/bash
export Counter=0
for SCRIPT in /Users/<user>/Desktop/*sh
do
$SCRIPT
done
echo $Counter
That script will call all the other bash scripts in a folder and those scripts will have something like the following:
if [ "$Output" = "$Check" ]
then
echo "OK"
((Counter++))
I want it to then increment the $Counter variable if it does equal "OK" and then pass that value back to the initial batch script so I can keep that counter number and have a total at the end.
Any idea on how to go about doing that?
Environment variables propagate in one direction only -- from parent to child. Thus, a child process cannot change the value of an environment variable set in their parent.
What you can do is use the filesystem:
export counter_file=$(mktemp "$HOME/.counter.XXXXXX")
for script in ~user/Desktop/*sh; do "$script"; done
...and, in the individual script:
counter_curr=$(< "$counter_file" )
(( ++counter_curr ))
printf '%s\n' "$counter_curr" >"$counter_file"
This isn't currently concurrency-safe, but your parent script as currently written will never call more than one child at a time.
An even easier approach, assuming that the value you're tracking remains relatively small, is to use the file's size as a proxy for the counter's value. To do this, incrementing the counter is as simple as this:
printf '\n' >>"$counter_file"
...and checking its value in O(1) time -- without needing to open the file and read its content -- is as simple as checking the file's size. With BSD stat (as on macOS):
counter=$(stat -f %z "$counter_file")
With GNU stat, the equivalent is stat -c %s "$counter_file".
Note that locking may be required for this to be concurrency-safe if using a filesystem such as NFS which does not correctly implement O_APPEND; see Norman Gray's answer (to which this owes inspiration) for a working implementation.
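A small end-to-end sketch of the append-based counter. The function names are hypothetical, and it reads the size with wc -c instead of stat: that does read the file, so it gives up the O(1) property, but the same command works on both Linux and macOS:

```shell
#!/bin/bash
# Each increment appends one byte; the file size is the count.
counter_increment() {   # usage: counter_increment FILE
    printf '\n' >>"$1"
}

counter_value() {       # usage: counter_value FILE
    local n
    n=$(wc -c <"$1")
    echo $(( n ))       # arithmetic strips any padding printed by wc
}

counter_file=$(mktemp)
for _ in 1 2 3; do
    counter_increment "$counter_file"
done
echo "count: $(counter_value "$counter_file")"
rm -f "$counter_file"
```

In the parent/child setup from the question, the parent would export counter_file and each child script would call the increment step when its check passes.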
You could source the other scripts, which means they're not running in a sub-process but "inline" in the calling script like this:
#! /bin/bash
export counter=0
for script in /Users/<user>/Desktop/*sh
do
source "$script"
done
echo $counter
But as pointed out in the comments, I'd only advise using this approach if you control the called scripts yourself. If they, for example, call exit or have variables that clash with each other, bad things could happen.
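The difference between running a script and sourcing it can be seen with a throwaway script. A minimal sketch; bump.sh is a hypothetical stand-in for one of the scripts in the folder:

```shell
#!/bin/bash
workdir=$(mktemp -d)
# Hypothetical child script that bumps the counter.
printf '%s\n' 'counter=$(( counter + 1 ))' >"$workdir/bump.sh"

counter=0
bash "$workdir/bump.sh"          # separate process: it changes its own copy only
after_child=$counter

source "$workdir/bump.sh"        # same process: the change is visible here
after_source=$counter

echo "after child run: $after_child, after source: $after_source"
rm -rf "$workdir"
```

Only the sourced run changes the caller's counter, which is also why a stray exit in a sourced script would end the calling script.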
As described, you can't do this, since there isn't anything which corresponds to a ‘global variable’ for shell scripts.
As the comment suggests, you'll have to use the filesystem to communicate between scripts.
One simple/crude way of doing what you describe would be to simply have each cooperating script append a line to a file, and the ‘global count’ is the size of this file:
#! /bin/sh -
echo ping >>/tmp/scriptcountfile
then wc -l /tmp/scriptcountfile is the number of times that's happened. Of course, there's a potential race condition there, so something like the following would sequence those accesses:
#! /bin/sh -
(
flock 9 # block until we hold the lock on fd 9 (with -n it would fail instead of waiting)
echo 'do stuff...'
echo ping >>/tmp/scriptcountfile
) 9>/tmp/lockfile
(the flock command is available on Linux, but isn't portable).
Of course, then you can start to do fancier things by having scripts send stuff through pipes and sockets, but that's going somewhat over the top.
I've only been writing actual .sh scripts since sometime this morning, and I'm a bit stuck. I'm trying to write a script to check to see if a process is running, and to start it if it isn't. (I plan to run this script once every 10 to 15 minutes with cron.)
Here's what I have so far:
#!/bin/bash
APPCHK=$(ps aux | grep -c "/usr/bin/rsync -rvz -e ssh /home/e-smith/files/ibays/drive-i/files/Warehouse\ Pics/organized_pics imgserv@192.168.0.140:~/webapps/pavlick_container/public/images
")
RUNSYNC=$(rsync -rvz -e ssh /home/e-smith/files/ibays/drive-i/files/Warehouse\ Pics/organized_pics imgserv@192.168.0.140:~/webapps/pavlick_container/public/images)
if [ $APPCHK < '2' ];
then
$RUNSYNC
fi
exit
Here's the error that I'm getting:
$ ./image_sync.sh
rsync: mkdir "/home/i/webapps/pavlick_container/public/images" failed: No such file or directory (2)
rsync error: error in file IO (code 11) at main.c(595) [Receiver=3.0.7]
rsync: connection unexpectedly closed (9 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(601) [sender=3.0.7]
./image_sync.sh: line 8: 2: No such file or directory
The odd thing is that
rsync -rvz -e ssh /home/e-smith/files/ibays/drive-i/files/Warehouse\ Pics/organized_pics imgserv@192.168.0.140:~/webapps/pavlick_container/public/images
runs just fine from a terminal window.
What am I doing wrong?
Your grep call is wrong on two counts. The pattern shouldn't include a newline. To look for an exact string, use grep -F 'substring' or grep -xF 'exact whole line'.
Finding if a process is running with ps | grep is highly brittle. On most unices (at least Solaris, Linux and *BSD), use pgrep: pgrep -f 'PATTERN' returns true if there's a running process whose command line matches PATTERN.
Every program returns a status code, either 0 to indicate success or a number between 1 and 255 to indicate failure. In the shell, any command is a valid boolean expression; the status code 0 is treated as true and anything else as false.
$(…) means run the command inside the parentheses and capture its output. So rsync is executed as soon as the shell hits the definition of the RUNSYNC variable. To store a block of shell code, use a function (example below, although you don't actually need a function here, you could just write the code directly).
Your test [ $APPCHK < 2 ] should be [ $APPCHK -lt 2 ]: < means input redirection. (In bash, you can also write [[ foo < bar ]], but that's string comparison, not numeric comparison.)
~/ at the beginning of the remote rsync path is optional. Also, -e ssh is the default unless your version of rsync is really old.
exit at the end of the script is useless, the script will exit anyway.
Here's a script taking the above into account:
#!/bin/bash
run_rsync () {
    rsync -rvz '/home/e-smith/files/ibays/drive-i/files/Warehouse Pics/organized_pics' \
        imgserv@192.168.0.140:webapps/pavlick_container/public/images
}
# Extended regular expression, matched against the whole command line
process_pattern='(/usr/bin/)?rsync -rvz /home/e-smith/files/ibays/drive-i/files/Warehouse Pics/organized_pics imgserv@192\.168\.0\.140:webapps/pavlick_container/public/images'
# Start rsync only if no matching process is already running
if ! pgrep -xf "$process_pattern" >/dev/null; then
    run_rsync
fi
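To illustrate the point about [ $APPCHK < 2 ]: the numeric and string operators can disagree on the same values. A minimal sketch with a made-up count of 10:

```shell
#!/bin/bash
count=10

# -lt compares integers: 10 is not less than 9.
if [ "$count" -lt 9 ]; then numeric=yes; else numeric=no; fi

# Inside [[ ]], < compares strings, and "10" sorts before "9".
if [[ "$count" < 9 ]]; then lexical=yes; else lexical=no; fi

echo "10 -lt 9: $numeric; 10 < 9 as strings: $lexical"
```

With a single [ ], an unquoted < is not a comparison at all: the shell tries to open a file named 2 for input, which is where the "2: No such file or directory" message in the question comes from.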
Looks like with your rsync command that some directory along this path is wrong: ~/webapps/pavlick_container/public/images
Have you checked on the server 192.168.0.140 in imgserv's home directory to see if "pavlick_container/public" exists? That's my guess.
You have a number of problems. First you are running the commands instead of putting the commands in variables. There is also a much easier way.
# An array keeps the path containing a space as a single argument
RUNSYNC=(rsync -rvz -e ssh "/home/e-smith/files/ibays/drive-i/files/Warehouse Pics/organized_pics" 'imgserv@192.168.0.140:~/webapps/pavlick_container/public/images')
if ! pgrep -f "rsync.*organized_pics"; then "${RUNSYNC[@]}"; fi
First of all, the way of checking if the program is running is mostly wrong. This may or may not work. You should rely on some special file you create when your script starts, that it is deleted when your script ends. This will tell you if the script is running, just checking if this file exists.
Then, try to either put a \ before the ~ or to remove the ~/ completely. If cron is run as other user, the tilde will be substituted in the client for the user directory. It works for the command line because maybe the home directory of your user in both machines match, but not in the user the cron is running. A guess at this point, but again, try to remove the ~/ and see if it works.
If your real code is missing a closing double quote on the grep target, you're going to get weird results from the get-go.
Also, ps aux will not list a complete command line like the one you show (at least on all the ps implementations I have used).
You need to make it ps auxwww. Often you will see people add | grep -v grep | (you'll see why at some point). This can be reduced to changing your static search target slightly like "/usr/bin/rsync" to "/usr/bin/[r]sync ".
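The bracket trick can be tried without ps by feeding grep a couple of made-up lines that imitate ps output. A minimal sketch; the sample lines are invented for illustration:

```shell
#!/bin/bash
# The real process, as it might appear in ps output.
real='user 101 /usr/bin/rsync -rvz src dest'

# With a plain pattern, the grep process's own command line matches too.
plain=$(printf '%s\n' "$real" 'user 102 grep /usr/bin/rsync' |
        grep -c '/usr/bin/rsync')

# With [r], grep's command line contains "[r]sync", which the pattern
# (meaning "r followed by sync") does not match.
bracketed=$(printf '%s\n' "$real" 'user 102 grep /usr/bin/[r]sync' |
            grep -c '/usr/bin/[r]sync')

echo "plain pattern counts $plain, bracketed pattern counts $bracketed"
```

This is why a count threshold of 2 is needed with the plain pattern, while the bracketed pattern counts only the real process.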
Other users are also helping with their comments. Using a flag file, as @DiegoSevilla mentions, is somewhat out of favor. Use mkdir /tmp/MyWatcher_flagDir for your flag instead. Directory creation is an atomic operation (whereas file creation is not), and this will eliminate any errors you might encounter from having two copies of your monitor try to make a flag file at the same time. Only one process will succeed in making or removing a flag dir.
I hope this helps.