I'm computing in an AWS EC2-type environment. I have a script that runs on the headnode and executes a shell command on a VM, detaching that command from the headnode's parent shell.
headnode.sh
#!/bin/bash
...
nohup ssh -i vm-key ubuntu@vm-ip './vm_script.sh' &
exit 0
I do it this way because vm_script.sh takes a very long time to finish, so I want to detach it from the headnode shell and have it run in the background on the VM.
vm_script.sh
#!/bin/bash
...
ssh -i head-node-key user@head-node-ip './delete-vm.sh'
exit 0
Once the VM completes the task it sends a command back to the headnode to delete the VM.
Is there a more reliable way to execute the final script on the headnode following completion of the script on the VM?
E.g., after I launch vm_script.sh, could I launch another script on the headnode that waits for the process on the VM to complete?
Assuming you trust the network link, the simplest approach is also the most reliable one. I would rewrite headnode.sh as follows:
#!/bin/bash
...
ssh -i vm-key ubuntu@vm-ip './vm_script.sh' && ./delete-vm.sh
The script above waits for vm_script.sh to terminate and then invokes ./delete-vm.sh only if vm_script.sh exits with status 0.
If you don't want your shell to block until headnode.sh terminates, just launch it with nohup ./headnode.sh &
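A minimal sketch of the same idea with explicit failure handling, assuming you want the VM left alive for inspection when the remote job fails. The function name run_then_cleanup is hypothetical, and both arguments are plain command strings so the pattern can be tried locally before the real ssh call is wired in:

```shell
#!/bin/bash
# Hypothetical helper: run a long job, then run a cleanup command only
# if the job succeeded.
#   $1  command string for the long-running job (e.g. the ssh call)
#   $2  command string for the cleanup step (e.g. ./delete-vm.sh)
run_then_cleanup() {
    job_cmd="$1"
    cleanup_cmd="$2"
    status=0
    # ssh exits with the status of the remote command, so a failing
    # vm_script.sh shows up here as a non-zero status.
    sh -c "$job_cmd" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "job failed with status $status; skipping cleanup" >&2
        return "$status"
    fi
    sh -c "$cleanup_cmd"
}

# Usage on the headnode (commented out so the sketch has no side effects):
# run_then_cleanup "ssh -i vm-key ubuntu@vm-ip './vm_script.sh'" "./delete-vm.sh"
```

Compared with a bare &&, this reports why the cleanup was skipped and passes the job's exit status on to whatever called headnode.sh.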
I'm using a Dockerfile that ends with a CMD ["/start.sh"]:
#!/bin/bash
service ssh start
/usr/bin/node /myApp/app.js
If for some reason I need to kill the node process, the SSH server is shut down as well (which forces me to restart the container to reconnect).
Is there a simple way to avoid this behavior?
Thank you.
A container exits as soon as its main process exits. In your case, the main process is the start.sh shell script, which starts the ssh service and then runs the nodejs process as a child process in the foreground. Once the nodejs process dies, the shell script reaches its end and exits, and so the container exits. What you can do is put the nodejs process in the background:
#!/bin/bash
service ssh start
/usr/bin/node /myApp/app.js &
# The infinite loop below is needed so that the shell script (the container's main process) never exits
while true; do
    sleep 2
done
I do NOT recommend this approach, though. You should run only a single process per container. Read the answers to the following question to understand why:
Running multiple applications in one docker container
If you still want to run multiple processes inside a container, there are better ways to do it, such as using supervisord: https://docs.docker.com/config/containers/multi-service_container/
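For illustration only, here is roughly what a process supervisor does for you, as a minimal shell sketch. It assumes you want the application restarted whenever it exits; the supervise function and its restart limit are hypothetical, and a real tool such as supervisord adds logging, backoff, and signal handling that this omits:

```shell
#!/bin/bash
# Hypothetical helper: run a command, and start it again each time it
# exits, up to MAX_RESTARTS restarts.
#   $1      maximum number of restarts
#   $2...   command and its arguments
supervise() {
    max_restarts="$1"
    shift
    restarts=0
    while :; do
        status=0
        "$@" || status=$?
        echo "process exited with status $status" >&2
        if [ "$restarts" -ge "$max_restarts" ]; then
            # Give up and hand the last exit status to the caller.
            return "$status"
        fi
        restarts=$((restarts + 1))
        sleep 1   # short pause so a crashing process does not spin
    done
}

# In start.sh this could replace the bare node call (commented out here):
# service ssh start
# supervise 1000 /usr/bin/node /myApp/app.js
```

With this in place, killing the node process makes the loop start it again instead of letting start.sh end, so the ssh service and the container stay up.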
I am trying to run several roslaunch files, one after the other, from a bash script. However, when the nodes complete execution, they hang with the message:
[grem_node-1] process has finished cleanly
log file: /home/user/.ros/log/956b5e54-75f5-11e9-94f8-a08cfdc04927/grem_node-1*.log
Then I need to press Ctrl-C to trigger "killing on exit" for all of the nodes launched from the launch file. Is there some way to make the nodes shut themselves down automatically when they finish? At the moment I have to press Ctrl-C every time a node terminates.
My bash script looks like this, by the way:
python /home/user/git/segmentation_plots/scripts/generate_grem_launch.py /home/user/Data2/Coco 0 /home/user/git/Async_CNN/config.txt
source ~/setupgremsim.sh
roslaunch grem_ros grem.launch config:=/home/user/git/Async_CNN/config.txt
source /home/user/catkin_ws/devel/setup.bash
roslaunch rpg_async_cnn_generator conf_coco.launch
The script setupgremsim.sh sources another catkin workspace.
Many thanks!
Thanks all for your advice. What I ended up doing was this: I launched my ROS nodes from separate Python scripts, which I then called from the bash script. With the roslaunch Python API you can terminate the launched child processes by calling shutdown(). Here is an example for anyone else with this issue:
bash script:
#!/bin/bash
for i in {0..100}
do
echo "========================================================"
echo "This is the $i th run"
echo "========================================================"
source /home/timo/catkin_ws/devel/setup.bash
python planar_launch_generator.py
done
and then inside planar_launch_generator.py:
import roslaunch
import rospy

process_generate_running = True


class ProcessListener(roslaunch.pmon.ProcessListener):
    global process_generate_running

    def process_died(self, name, exit_code):
        # Called by roslaunch when a launched process exits.
        global process_generate_running
        process_generate_running = False
        rospy.logwarn("%s died with code %s", name, exit_code)


def init_launch(launchfile, process_listener):
    uuid = roslaunch.rlutil.get_or_generate_uuid(None, False)
    roslaunch.configure_logging(uuid)
    launch = roslaunch.parent.ROSLaunchParent(
        uuid,
        [launchfile],
        process_listeners=[process_listener],
    )
    return launch


rospy.init_node("async_cnn_generator")
launch_file = "/home/user/catkin_ws/src/async_cnn_generator/launch/conf_coco.launch"
launch = init_launch(launch_file, ProcessListener())
launch.start()

# Poll until the listener reports that a process has died, then shut everything down.
while process_generate_running:
    rospy.sleep(0.05)

launch.shutdown()
Using this method you could source any number of different catkin workspaces and launch any number of launchfiles.
Try this:
(1) Put each launch in a separate shell script, so that you have N scripts.
In each script, call the launch file in xterm: xterm -e "roslaunch yourfancylauncher"
(2) Prepare a master script that calls all N child scripts in the sequence you want, with whatever delays you need.
Once a launch is done, its xterm should close itself.
Edit: You can also kill one manually if you know it is going to hang. For example:
#!/bin/bash
source /opt/ros/kinetic/setup.bash
source ~/catkin_ws/devel/setup.bash
# Start roscore using systemd or rc.local, in lxterminal or another terminal, so that it is not killed by accident. Then run the part that you think is going to hang or cause a problem. Echo->action if necessary.
xterm -geometry 80x36+0+0 -e "echo 'uav' | sudo -S dnsmasq -C /dev/null -kd -F 10.5.5.50,10.5.5.100 -i enp59s0 --bind-dynamic" & sleep 15
# The Ouster lidar can't auto-configure like the Velodyne and will hang here, so the code after it can't run.
killall xterm & sleep 1
# Let's just kill it and continue running the other launches.
xterm -e "roslaunch '/home/uav/catkin_ws/src/ouster_driver_1.12.0/ouster_ros/os1.launch' os1_hostname:=os1-991907000715.local os1_udp_dest:=10.5.5.1"
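A minimal sketch of the "kill it if it hangs" step, assuming you would rather stop only the one process you started than every xterm on the machine (which is what killall xterm does). The helper name run_with_deadline is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: run a command, and send it SIGTERM if it is
# still running after DEADLINE seconds.
#   $1      deadline in seconds
#   $2...   command and its arguments
run_with_deadline() {
    deadline="$1"
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: kills only the command started above. Its output is
    # discarded so that it never holds the caller's stdout open.
    ( sleep "$deadline"; kill "$cmd_pid" ) >/dev/null 2>&1 &
    watchdog_pid=$!
    status=0
    wait "$cmd_pid" || status=$?
    # The command has finished or was killed; stop the watchdog.
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true
    return "$status"
}

# Usage in the master script (commented out so the sketch has no side effects):
# run_with_deadline 15 xterm -e "roslaunch yourfancylauncher"
```

A non-zero return value tells the master script that the launch failed or had to be killed, so it can decide whether to carry on with the next launch.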
I am using guestcontrol with VirtualBox, with a Windows host and a Linux (RHEL7) guest. I want to configure the guest from the host by running a shell script on the guest (from a .bat on the host). This works and the script runs; however, it hangs when the script calls reboot (I believe because nothing is returned). So when the following .sh is called:
#!/bin/bash
echo "here"
exit
The .bat file shows "here" and then exits (or if I use pause gives the correct message). However, when I add the reboot, the .bat never processes anything past where it calls the script. I think this would be because the guest never tells the host that the script is complete.
I have tried things like:
#!/bin/bash
{ sleep 1; reboot; } >/dev/null &
exit
or even:
#!/bin/bash
do_reboot(){
sleep 1
reboot
}
do_reboot &
exit
but the .bat never gets past the line where it runs the .sh
How can I tell the host that the .sh script (on the guest) is complete so it can continue with the .bat script?
We need to make sure that no subprocess stays attached to the calling session, so we detach the reboot using the nohup ("no hangup") command. The script simply becomes this:
#!/bin/bash
nohup reboot &> /tmp/nohup.out </dev/null &
exit
The stdin and stdout streams were causing the issue, so this redirects them (stdout and stderr to a file, stdin from /dev/null) so that the script is not left waiting on input from, or output to, any other process.
If you have any issues with this script, you could do something like:
#!/bin/bash
nohup /path/to/reboot_delay.sh &> /tmp/nohup.out </dev/null &
exit
And then in /path/to/reboot_delay.sh you would have:
#!/bin/bash
sleep 10 # or however many seconds you need to wait for something to happen
reboot
This way you can even allow some time for something else to finish, yet the host machine (or ssh, or wherever you are calling this from) still sees that the script has finished and can do what it needs to do.
I hope this can help people in future.
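The redirections used in the answer above can be wrapped in a small reusable function. This is a minimal sketch under the assumption that the caller only needs the command started, not its result; the name run_detached is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: start a command so that it holds none of the
# caller's standard streams, then return immediately.
#   $1      log file that receives the command's stdout and stderr
#   $2...   command and its arguments
run_detached() {
    log_file="$1"
    shift
    # stdin from /dev/null, stdout and stderr to the log file: with no
    # inherited stream left open, the caller is free to finish.
    nohup "$@" >"$log_file" 2>&1 </dev/null &
}

# Usage in the guest script (commented out, since it reboots the machine):
# run_detached /tmp/nohup.out sh -c 'sleep 10; reboot'
```

The same function works when the caller is an ssh session rather than guestcontrol, because in both cases the caller waits for the inherited streams to close.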
I'm having a bit of difficulty developing bash-based deployment scripts for a pipeline I want to run on an OpenStack VM. There are 4 scripts in total:
head_node.sh - launches the vm and attaches appropriate disk storage to the VM. Once that's completed, it runs the scripts (2 and 3) sequentially by passing a command through ssh to the VM.
install.sh - VM-side, installs all of the appropriate software needed by the pipeline.
run.sh - VM-side, mounts storage on the VM and downloads raw data from object storage. It then runs the final script, but does so by detaching the process from the shell created by ssh using nohup ./pipeline.sh &. The reason I want to detach from the shell is that the next portion is largely just compute and may take days to finish. Therefore, the user shouldn't have to keep the shell open that long and it should just run in the background.
pipeline.sh - VM-side, essentially a for loop that iterates through a list of files and sequentially runs commands on those files and on the intermediate files. The results are analysed and then staged back to the object storage. The VM then essentially tells the head node to kill it.
Now I'm running into a problem with nohup. If I launch the pipeline.sh script normally (i.e. without nohup) and keep it attached to that shell, everything runs smoothly. However, if I detach the script, it errors out after the first command in the first iteration of the for loop. Am I thinking about this the wrong way? What's the correct way to do this?
So this is how it looks:
$ ./head_node.sh
head_node.sh
#!/bin/bash
... launched VM etc
ssh $vm_ip './install.sh'
ssh $vm_ip './run.sh'
exit 0
install.sh - omitted - not important for the problem
run.sh
#!/bin/bash
... mounts storage downloads appropriate files
nohup ./pipeline.sh > log &
exit 0
pipeline.sh
#!/bin/bash
for f in $(find . -name '*ext')
do
process1 $f
process2 $f
...
done
... stage files to object storage, unmount disks, additional cleanups
ssh $head_node 'nova delete $vm_hash'
exit 0
Since I'm invoking the run.sh script from an ssh session, subprocesses launched from the script (namely pipeline.sh) do not properly detach from the shell and error out when the ssh session that invoked run.sh terminates. The pipeline.sh script can be detached properly by calling it from the head node instead, e.g., nohup ssh $vm_ip './pipeline.sh' &, which keeps the session alive until the pipeline finishes.
My script is:
for i in $(seq $nb_lignes) # a list of machines
do
ssh root@$machine -x "java ....."
sleep 10
done
--> I execute this script from machine C.
I have two machines, A and B ($nb_lignes=2):
ssh root@$machineA -x "java ....." : creates a node with the Pastry overlay
wait 10 seconds
ssh root@$machineB -x "java ....." : creates another node that joins the first (that's why I use sleep 10)
I run the script from machine C.
I would like it to display "node 1 is created", wait 10 seconds, and then display "node 2 is created".
My problem: it displays only "node 1 is created".
When I press Ctrl+C, it displays "node 2 is created".
PS: the two java processes are still running on machines A and B.
Thank you
From the way I'm reading this, armani is correct; since your java program does not exit, the second iteration of the loop doesn't get run until you "break" the first one. I would guess that the Java program is ignoring a break signal sent to it by ssh.
Rather than backgrounding each SSH with a &, you're probably better off using the tools provided to you by ssh itself. From the ssh man page:
-f Requests ssh to go to background just before command execution.
This is useful if ssh is going to ask for passwords or
passphrases, but the user wants it in the background. This
implies -n. The recommended way to start X11 programs at a
remote site is with something like ssh -f host xterm.
So ... your script would look something like this:
for host in machineA machineB; do
ssh -x -f root@${host} "java ....."
sleep 10
done
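The loop above can be extended to print the progress messages the question asks for. This is a sketch under the assumption that "node N is created" should mean "ssh managed to start the command on that host"; the function names launch_node and start_nodes are hypothetical, and the remote command stays elided as in the question:

```shell
#!/bin/bash
# Hypothetical wrapper around the ssh call, kept separate so that it
# can be swapped out. The remote command is elided, as in the question.
launch_node() {
    # -f: ssh puts itself in the background just before running the
    # command, so this returns once the connection is set up. A zero
    # status means ssh started the command, not that the Java program
    # itself succeeded.
    ssh -x -f "root@$1" "java ....."
}

# Start one node per host, DELAY seconds apart, reporting progress.
#   $1      delay in seconds between launches
#   $2...   host names
start_nodes() {
    delay="$1"
    shift
    index=1
    for host in "$@"; do
        if ! launch_node "$host"; then
            echo "node $index could not be started on $host" >&2
            return 1
        fi
        echo "node $index is created"
        index=$((index + 1))
        sleep "$delay"
    done
}

# Usage from machine C (commented out so the sketch has no side effects):
# start_nodes 10 machineA machineB
```

Stopping at the first failure is a design choice: the second node joins the first, so there is little point in starting it if the first could not be launched.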
Try adding the "&" character after the "ssh" command. That runs the process separately, in the background, and lets the script continue.
Otherwise, your script is stuck running ssh.
EDIT: For clarity, this would be your script:
for i in $(seq $nb_lignes) # a list of machines
do
ssh root@$machine -x "java ....." &
sleep 10
done