Is multi-threading possible on the command line? - shell

I'm using the following command to check the whois information from a list of domains in a text file and then output any lines that contain an email to a new file:
for i in $(cat testdomains.txt); do whois $i | egrep '[a-zA-Z0-9]@[a-zA-Z0-9]\.[a-zA-Z0-9]'; done >> results.txt
Is there any way to speed this up by checking more than one domain at a time? For example, right now it is going from one domain to the next checking the information. Is there anything I could change in the command to make it check 50 domains at a time?

With &, you can run any command in the background (and therefore in parallel):
for i in $(< testdomains.txt); do
    whois "$i" | egrep '[a-zA-Z0-9]@[a-zA-Z0-9]\.[a-zA-Z0-9]' &
done >> results.txt
Note
If you put the control operator & at the end of a command, e.g. command args &, the shell executes the command in the background in a subshell. The shell does not wait for the command to finish, and the return status is 0. The PID of the last backgrounded command is available via the special variable $!.
Every & does a fork(2), i.e. each backgrounded command runs as a separate process.
see How do I wait for several spawned processes?
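If you want to cap the number of simultaneous lookups at 50, as asked, one way (a sketch assuming GNU xargs; the file names mirror the original command) is to let xargs manage the pool of workers:
# Run at most 50 whois lookups at a time; -n1 passes one domain per invocation
xargs -a testdomains.txt -n1 -P50 sh -c '
    whois "$1" | egrep "[a-zA-Z0-9]@[a-zA-Z0-9]\.[a-zA-Z0-9]"
' _ >> results.txt
xargs only returns once every lookup has finished, so no explicit wait is needed; note that output lines from different lookups may interleave in results.txt.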

Related

Determining all the processes started with a given executable in Linux

I need to collect/log all the command lines that were used to start a process on my machine during the execution of a Perl script, which happens to be a test automation script. This Perl script starts the executable in question (MySQL) multiple times with various command lines, and I would like to inspect all of the command lines of those invocations. What would be the right way to do this? One possibility I see is to run something like "ps -aux | grep mysqld | grep -v grep" in a loop in a shell script and capture the results in a file, but then I would have to do some post-processing to remove duplicates etc., and I could possibly miss some process command lines because of timing issues. Is there a better way to achieve this?
Processing the ps output can always miss some processes. It will only capture the ones currently existing. The best way would be to modify the Perl script to log each command before or after it executes it.
If that's not an option, you can get the child pids of the perl script by running:
pgrep -P $pid -a
-a prints the full command line. $pid is the PID of the Perl script. Then process just those children.
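If modifying the Perl script really is not an option, a rough polling sketch built on that same pgrep call (it can still miss very short-lived children, as noted above; $pid and the file names are placeholders) could be:
# Poll the Perl script's children while it is still running,
# then de-duplicate the collected command lines
while kill -0 "$pid" 2>/dev/null; do
    pgrep -P "$pid" -a >> children.log
    sleep 0.2
done
sort -u -o children.uniq children.log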
You could use strace to log calls to execve.
$ strace -f -o strace.out -e execve perl -e 'system("echo hello")'
hello
$ egrep ' = 0$' strace.out
11232 execve("/usr/bin/perl", ["perl", "-e", "system(\"echo hello\")"], 0x7ffc6d8e3478 /* 55 vars */) = 0
11233 execve("/bin/echo", ["echo", "hello"], 0x55f388200cf0 /* 55 vars */) = 0
Note that strace.out will also show the failed execs (where execve returned -1), hence the egrep command to find the successful ones. A successful execve call does not return, but strace records it as if it returned 0.
Be aware that this is a relatively expensive solution because it is necessary to include the -f option (follow forks), as perl will be doing the exec call from forked subprocesses. This is applied recursively, so it means that your MySQL executable will itself run through strace. But for a one-off diagnostic test it might be acceptable.
Because the tracing is applied recursively, any exec calls made from your MySQL executable will also appear in strace.out, and you will have to filter those out. But the PID is shown for all calls, and if you were to also log any fork or clone calls (i.e. strace -e execve,fork,clone), you would see both the parent and child PIDs, in the form <parent_pid> clone(......) = <child_pid>, so you should have enough information to reconstruct the process tree and decide which processes you are interested in.
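For example, assuming the binary shows up as mysqld in the trace, you could narrow strace.out down to the successful MySQL launches with something like:
# Keep only successful execve calls whose program path ends in "mysqld"
egrep 'execve\("[^"]*mysqld"' strace.out | egrep ' = 0$'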

How can I start a subscript within a perpetually running bash script after a specific string has been printed in the terminal output?

Specifics:
I'm trying to build a bash script which needs to do a couple of things.
Firstly, it needs to run a third-party script that I cannot modify. This script will build a project and then start a node server which outputs data to the terminal continually. This process needs to continue indefinitely, so there is no exit code I could wait for.
Secondly, I need to wait for a specific line of output from the first script, namely 'Started your app.'.
Once that line has been output to the terminal, I need to launch a separate set of commands, either from another subscript or from an if or while block, which will change a few lines of code in the project that was built by the first script to resolve some dependencies for a later step.
So, how can I capture the output of the first subscript and use it to run another set of commands once a particular line has been printed, all while letting the first script keep running and printing to the terminal, without using timers, and without dumping subscript1's output into a huge file (since it runs indefinitely)?
Pseudo-code:
#!/usr/bin/env bash
# This script needs to stay running & will output to the terminal (at some point)
# a string that we need to wait/watch for to launch subscript2
sh subscript1
# This can't run until subscript1 has output a particular string to the terminal
# This could be another script, or an if or while block
sh subscript2
I have been beating my head against my desk for hours trying to get this to work. Any help would be appreciated!
I think this is a bad idea — much better to have subscript1 changed to be automation-friendly — but in theory you can write:
sh subscript1 \
| {
    # Echo each line of subscript1's output so it still appears on the terminal,
    # and watch for the trigger line.
    while IFS= read -r line ; do
        printf '%s\n' "$line"
        if [[ "$line" = 'Started your app.' ]] ; then
            sh subscript2 &
            break
        fi
    done
    # After the trigger, keep streaming the rest of subscript1's output.
    cat
}

Capturing ssh output in bash script while backgrounding connection

I have a loop that will connect to a server via ssh to execute a command. I want to save the output of that command.
o=$(ssh $s "$@")
This works fine. I can then do what I need with the output. However I have a lot of servers to run this against and I'm trying to speed up the process by backgrounding the ssh connection, basically to do all of the requests at once. If I wasn't saving the output I could do something like
ssh $s "$@" &
and this works fine
I haven't been able to get the correct combination to do both.
o=$(ssh $s "$@") &
This doesn't give me any output. Other combinations I've tried appear to try to execute the output. Suggestions?
Thanks!
A command run in the background executes in a subshell with its own copies of the file descriptors, so the captured stdout (o=..) is not available in the calling shell. However, you can redirect the stdout to a file and read the file afterwards.
ssh $s "$@" >outfile &
wait
o=$(cat outfile)
If you don't like files, you could also use named pipes. This way the 'wait' is done by the 'cat' command. The pipe can be reused and consumes no space on the disk.
mkfifo testpipe
ssh $s "$@" >testpipe &
o=$(cat testpipe)
I would just use a temporary file. You can't set a variable in a background process and access it from the shell that started it.
ssh "$s" "$#" > output.txt & ssh_pid=$!
...
wait "$ssh_pid"
o=$(<output.txt)
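To run this against many servers at once, the same idea scales to one background ssh and one output file per server (the servers array and the out.* naming are assumptions for this sketch):
# Start one background ssh per server, each writing to its own file
for s in "${servers[@]}"; do
    ssh "$s" "$@" > "out.$s" &
done
wait   # wait for all background connections to finish

# Read each result back into a variable and process it
for s in "${servers[@]}"; do
    o=$(<"out.$s")
    printf '%s: %s\n' "$s" "$o"
done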

How to run a time-limited background command and read its output (without timeout command)

I'm looking at https://stackoverflow.com/a/10225050/1737158
In the same question there is an answer that uses the timeout command, but it is not available on all OSes, so I want to avoid it.
What I try to do is:
demo="$(top)" &
TASK_PID=$!
sleep 3
echo "TASK_PID: $TASK_PID"
echo "demo: $demo"
I expect to have nothing in the $demo variable, since the top command never ends.
Right now I get an empty result, which is "acceptable", but when I reuse the same approach with a command that should return a value, I still get an empty result, which is not OK. E.g.:
demo="$(uptime)" &
TASK_PID=$!
sleep 3
echo "TASK_PID: $TASK_PID"
echo "demo: $demo"
This should return the uptime result, but it doesn't. I also tried to kill the process via TASK_PID, but without success. If a command fails, I expect to have its stderr captured somehow. It can be in a different variable, but it has to be captured and not leaked out.
What happens when you execute var=$(cmd) &
Let's start by noting that the simple command in bash has the form:
[variable assignments] [command] [redirections]
for example
$ demo=$(echo 313) declare -p demo
declare -x demo="313"
According to the manual:
[..] the text after the = in each variable assignment undergoes tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal before being assigned to the variable.
Also, after the [command] above is expanded, the first word is taken to be the name of the command, but:
If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.
So, as expected, when demo=$(cmd) is run, the result of $(..) command substitution is assigned to the demo variable in the current shell.
Another point to note is related to the background operator &. It operates on so-called lists, which are sequences of one or more pipelines. Also:
If a command is terminated by the control operator &, the shell executes the command asynchronously in a subshell. This is known as executing the command in the background.
Finally, when you say:
$ demo=$(top) &
# ^^^^^^^^^^^ simple command, consisting ONLY of variable assignment
that simple command is executed in a subshell (call it s1), inside which $(top) is executed in another subshell (call it s2), and the result of this command substitution is assigned to the variable demo inside the shell s1. Since no command name is given, s1 terminates after the variable assignment, but the parent shell never sees the variable set in the child (s1).
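A quick way to convince yourself of this (uptime is just a stand-in command):
demo=outer
demo=$(uptime) &   # the assignment happens in a subshell (s1)
wait               # wait for the background job to finish
echo "$demo"       # still prints "outer": the parent never sees the new value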
Communicating with a background process
If you're looking for a reliable way to communicate with a process run asynchronously, you might consider coprocesses in bash, or named pipes (FIFOs) in other POSIX environments.
Coprocess setup is simpler, since coproc will set up the pipes for you, but note that you might not be able to read them reliably if the process is terminated before writing any output.
#!/bin/bash
coproc top -b -n3
cat <&${COPROC[0]}
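If you would rather put the time limit on the reads themselves than rely on top's -n3, one hedged variant is to duplicate the coprocess's read end onto a plain file descriptor and read line by line with the read builtin's -t timeout (the 5-second value is arbitrary):
#!/bin/bash
coproc top -b -n3
exec 3<&"${COPROC[0]}"   # keep our own copy of the read end
# Read line by line; each read gives up after 5 seconds
while IFS= read -r -t 5 -u 3 line; do
    printf '%s\n' "$line"
done
exec 3<&-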
FIFO setup would look something like this:
#!/bin/bash
# fifo setup/clean-up
tmp=$(mktemp -td)
mkfifo "$tmp/out"
trap 'rm -rf "$tmp"' EXIT
# bg job, terminates after 3s
top -b -n3 >"$tmp/out" &
# read the output
cat "$tmp/out"
but note that, with a FIFO opened in the default blocking mode, the writer blocks until some process opens the other end for reading.
Killing after timeout
How you'll kill the background process depends on what setup you've used, but for a simple coproc case above:
#!/bin/bash
coproc top -b
sleep 3
kill -INT "$COPROC_PID"
cat <&${COPROC[0]}
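For the FIFO variant, a sketch that gets back to the original goal of a variable, using a small watchdog subshell instead of the timeout command (output that top has buffered but not yet flushed when it is killed may be lost):
#!/bin/bash
tmp=$(mktemp -td)
mkfifo "$tmp/out"
trap 'rm -rf "$tmp"' EXIT

top -b >"$tmp/out" &
bg_pid=$!
( sleep 3; kill "$bg_pid" 2>/dev/null ) &   # watchdog: stop top after ~3s

demo=$(cat "$tmp/out")   # returns once the writer side is closed
echo "demo: $demo"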

How to get a value from a child process that may hang, in bash

The requirements are:
The child process returns a value (an IP address), which it fetches with wget.
But the child process may hang.
The parent process cannot wait for the child indefinitely; it needs the return value after a few seconds.
The possible script is
parent.sh:
./child.sh &
sleep 60
echo "$child_return_value"
child.sh:
child_return_value=$(wget ipaddress)
Just to add another possible approach, you can capture the output of a background process without (manually) using files by using process substitution, if your shell supports it. You can use the read builtin to get the output, which allows setting a timeout value:
exec 3< <(wget -O- ipaddress);
read -r -u3 -t60;
return_value="$REPLY";
exec 3<&-;
echo "$return_value";
The shell will actually create a FIFO or /dev/fd/xx special file on your behalf under this solution.
I would use the -T|--timeout option of wget to have the request time out after a specified number of seconds. If you do this, you can avoid messing with background processes and IPC entirely:
return_value=$(wget -T60 -O- ipaddress); ## 60 sec timeout
echo "$return_value";
You could have the child process write the result to a file that the parent process can read.
child_out="$(mktemp)"
./child.sh > "$child_out" &
sleep 60
if [ -s "$child_out" ]
then
child_return_value=$(cat "$child_out")
else
# Child did not produce a result yet.
fi
Don't forget to remove the temporary file in the parent script. Preferably using a trap so it will be removed under all (well, most) circumstances.
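A minimal sketch of that cleanup, reusing the names from the snippet above:
child_out="$(mktemp)"
# Remove the temp file when the parent script exits (not on SIGKILL, of course)
trap 'rm -f "$child_out"' EXIT
./child.sh > "$child_out" &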
