How do I print the full pipeline that my script is part of? - bash

Using echo "blah" | myscript as an example pipeline, I am looking for a way to run, from within "myscript":
echo "$SOME_BUILTIN"
whose output would be the original pipeline:
echo "blah" | myscript
Any ideas?

Here you go:
#!/usr/bin/env bash
# The other stages of the pipeline are siblings: children of our parent, minus ourselves
siblings=($(pgrep -P "$(ps -p $$ o ppid=)" | grep -Fvx $$))
pids=$$
sibling=$$
while true
do
    pid=$sibling
    for sibling in "${siblings[@]}"
    do
        # A sibling whose stdout is the same pipe as our stdin is the previous stage
        if [ "$(readlink "/proc/${sibling}/fd/1")" = "$(readlink "/proc/${pid}/fd/0")" ]
        then
            # Found previous pipe process
            pids="${sibling},${pids}"
            continue 2
        fi
    done
    break # No previous pipe process
done
# Join the command lines of the collected PIDs with " | "
while IFS= read -r line
do
    pipeline="${pipeline+${pipeline} | }$line"
done < <(ps --sort=pid -p "$pids" o args=)
printf '%s\n' "$pipeline"
Example:
$ ./test.sh
bash ./test.sh
$ cat /dev/zero | sleep 1 | ./test.sh
cat /dev/zero | sleep 1 | bash ./test.sh
Working on a small project to make this a proper tool.

It's not easy to do this directly, but you could use the following information to recreate the pipeline:
Your script's PID is in the $$ variable.
Using ps -fe and looking at columns 2 (PID) and 3 (PPID), you can find your script's parent process. In bash, the commands of a pipeline are sibling processes (children of the same shell), so the other members of the pipeline are the processes that share your script's parent.
Knowing each process's PID, you can get its command line from /proc/PID/cmdline (replace the '\0' separators with spaces using tr for a human-readable format).
If you know that the script was called from a pipeline, you're essentially done. If you need to find out whether it was called from a pipeline or interactively, check what ls -lh /proc/self/fd/0 prints. That's file descriptor 0, meaning stdin. If it points to /dev/something (like /dev/pts/XX), you're probably interactive. If the target of the link starts with pipe:[, you are most likely receiving data from a pipeline. If it starts with socket:[, stdin is a socket (for example a network connection), which you can hopefully rule out as unlikely in your setup.
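A minimal sketch of the two checks described above, assuming Linux (/proc) and bash. show_cmdline and stdin_kind are hypothetical helper names; stdin_kind swaps the manual readlink inspection for bash's own -t and -p file tests, which answer the same question.

```shell
#!/bin/bash
# Linux-only sketch; the helper names are hypothetical.

# Print a process's command line, turning the NUL separators in
# /proc/PID/cmdline into spaces (the result ends with a trailing space).
show_cmdline() {
  tr '\0' ' ' < "/proc/$1/cmdline"
  echo
}

# Report what stdin is connected to, using bash's file tests instead of
# reading the /proc/self/fd/0 symlink by hand.
stdin_kind() {
  if [ -t 0 ]; then
    echo "terminal"   # e.g. /dev/pts/N: probably interactive
  elif [ -p /dev/stdin ]; then
    echo "pipe"       # a pipe or FIFO: probably part of a pipeline
  else
    echo "other"      # regular file, socket, /dev/null, ...
  fi
}

show_cmdline "$PPID"   # usually the shell that started the pipeline
stdin_kind
```

Running echo blah | ./myscript should report pipe, while running ./myscript directly from a terminal should report terminal.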

Related

How to find the number of instances of current script running in bash?

I have the code below to find the number of instances of the current script that are running with the same arg1. But it looks like the script creates a subshell to execute this command, and that subshell also shows up in the output. What would be a better approach to find the number of running instances of the script?
$ cat test.sh
#!/bin/bash
num_inst=`ps -ef | grep $0 | grep $1 | wc -l`
echo $num_inst
$ ps aux | grep test.sh | grep arg1 | grep -v grep | wc -l
0
$ ./test.sh arg1 arg2
3
$
I am looking for a solution that matches all running instances of ./test.sh arg1 arg2, but not ./test.sh arg10 arg20.
The reason this creates a subshell is that there's a pipeline inside the command substitution. If you run ps -ef alone in a command substitution, and then separately process the output from that, you can avoid this problem:
#!/bin/bash
all_processes=$(ps -ef)
num_inst=$(echo "$all_processes" | grep "$0" | grep -c "$1")
echo "$num_inst"
I also did a bit of cleanup on the script: double-quoted all variable references to avoid unexpected parsing, used $() instead of backticks, and replaced grep ... | wc -l with grep -c.
You might also replace the echo "$all_processes" | ... with ... <<<"$all_processes" and maybe the two greps with a single grep -c "$0 $1":
...
num_inst=$(grep -c "$0 $1" <<<"$all_processes")
...
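A sketch building on the single-ps approach above, aimed at the arg1 vs. arg10 requirement: a line only counts when "$0 $1" is followed by a space or the end of the command line. count_instances is a hypothetical helper, and listing processes with ps -eo args= (POSIX: every process, command line only, no header) is an assumption about how you want to collect them.

```shell
#!/bin/bash
# Hypothetical helper: count lines of a process listing that contain
# "<script> <arg>" followed by a space or the end of the line, so that
# "./test.sh arg1" does not also match "./test.sh arg10".
# grep -E treats the pattern as an extended regex, so a "." in the script
# name matches any character (harmless here).
count_instances() {
  # $1: script name, $2: first argument, $3: process listing, one per line
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the status 0
  grep -cE -- "$1 $2( |\$)" <<<"$3" || true
}

all_processes=$(ps -eo args=)   # one ps call, captured before any filtering
num_inst=$(count_instances "$0" "$1" "$all_processes")
echo "$num_inst"
```

Since the listing is captured once and filtered afterwards, the subshells used for the filtering never appear in it.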
Modify your script like this:
#!/bin/bash
ps -ef | grep "$0" | wc -l
No need to store the value in a variable; the result is printed to standard output anyway.
Now, why do you get 3?
When you run a command within backticks (FYI, you should use the syntax num_inst=$( COMMAND ) rather than backticks), bash creates a new subshell to run COMMAND, then assigns its stdout text to the variable. That subshell is a copy of your script, so it shows up in ps as well. If you drop the command substitution, you will get your expected value of 2 (your script plus the grep "$0" process, whose own command line contains $0).
To convince yourself of that, remove the | wc -l: you will see that the listing contains 3 processes, not 2. The third one exists only for the execution of COMMAND.

Access $? Variable with a piped statement?

I have some code, and I would like to check its $? (exit status):
VARIABLE=`grep "searched_string" test.log | sed 's/searched/found/'`
Is there any way to test if this entire line (rather than just the sed command) was completed successfully? If I try the following code right after it:
if [ "$?" -ne 0 ]
then
echo 1
exit
fi
it doesn't run even if the grep part of the statement fails.
Could someone show how to resolve this issue?
Use the PIPESTATUS array:
echo "${PIPESTATUS[@]}"
prints the exit statuses of all commands in the most recent pipeline.
$ ls | grep . | wc -l
28
$ echo ${PIPESTATUS[@]}
0 0 0
but
$ ls | grep nonexistentfilename | wc -l
0
$ echo ${PIPESTATUS[@]}
0 1 0 #the grep returns 1 - pattern not found
or
$ ls nonexistentfilename | grep somegibberish | wc -l
ls: nonexistentfilename: No such file or directory
0
$ echo ${PIPESTATUS[@]}
1 1 0 #ls and grep fail
For the exact status of a single command:
echo ${PIPESTATUS[1]} #for the grep
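Because PIPESTATUS is overwritten by the very next command (including a [ test or an echo), a common pattern is to copy it into an ordinary array right after the pipeline. A small sketch reusing the failing pipeline from above; the exact non-zero status of ls differs between systems, so it is not hard-coded.

```shell
#!/bin/bash
# Copy PIPESTATUS immediately: the next command replaces it.
ls nonexistentfilename 2>/dev/null | grep somegibberish | wc -l
statuses=("${PIPESTATUS[@]}")
echo "ls=${statuses[0]} grep=${statuses[1]} wc=${statuses[2]}"
```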
There is also the pipefail option:
set -o pipefail
From the bash documentation:
pipefail
If set, the return value of a pipeline is the value of the
last (rightmost) command to exit with a non-zero status, or zero if
all commands in the pipeline exit successfully. This option is
disabled by default.
$ ls nonexistentfile | wc -c
ls: nonexistentfile: No such file or directory
0
$ echo $?
0
$ set -o pipefail
$ ls nonexistentfile | wc -c
ls: nonexistentfile: No such file or directory
0
$ echo $?
1
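Applied to the question's own pipeline, a sketch assuming bash and the question's test.log: with pipefail set, the status of var=$(...) is the status of the whole pipeline, so $? right after the assignment catches a failing grep even though sed succeeded. search_log is a hypothetical helper name.

```shell
#!/bin/bash
# With pipefail, the pipeline fails if any stage fails, and the status of
# var=$(pipeline) is that pipeline's status.
set -o pipefail

# Hypothetical helper: stores the pipeline's output in "variable" and returns
# the status of the assignment, i.e. of the whole pipeline.
search_log() {
  variable=$(grep "searched_string" "$1" | sed 's/searched/found/')
}

if ! search_log test.log; then
  echo "grep or sed failed" >&2
fi
```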
EDIT based on the comment
You probably tried the following:
VARIABLE=$(grep "searched_string" test.log | sed 's/searched/found/')
echo "${PIPESTATUS[@]}"
Of course, this can't work, because the whole $(...) part runs in a subshell (another process), and therefore any variable created there is lost when the subshell exits (at the closing )).
You should put the whole PIPESTATUS mechanism inside the $(...), like this:
variable=$(
grep "searched_string" test.log | sed 's/searched/found/'
# do something with PIPESTATUS
# you should not echo anything to stdout (it would be captured into $variable)
# you can echo to stderr, e.g.
echo "=${PIPESTATUS[@]}=" >&2
)
The second line of the comment is also a solution, e.g.:
var_with_status=$(command | command2 ; echo ":DELIMITER:${PIPESTATUS[@]}")
Now $var_with_status contains not only the output of command | command2 but also the PIPESTATUS, separated by a unique delimiter, so you can extract it...
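A sketch of how the delimited value could be split again with parameter expansion. run_pipeline is a hypothetical wrapper around the question's pipeline, and ":DELIMITER:" is assumed never to appear in the real output.

```shell
#!/bin/bash
# Hypothetical wrapper: print the pipeline's output, then a marker followed
# by the pipeline's PIPESTATUS.
run_pipeline() {
  grep "searched_string" "$1" | sed 's/searched/found/'
  echo ":DELIMITER:${PIPESTATUS[*]}"   # PIPESTATUS still refers to grep | sed here
}

var_with_status=$(run_pipeline test.log)
output=${var_with_status%:DELIMITER:*}      # text before the marker...
output=${output%$'\n'}                      # ...minus the newline that preceded it
statuses=${var_with_status##*:DELIMITER:}   # e.g. "1 0" when grep matched nothing
read -r grep_status sed_status <<<"$statuses"
echo "grep=$grep_status sed=$sed_status output=$output"
```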
set -o pipefail also indicates the result, if you don't need to know exactly where the failure happened.
You can also write PIPESTATUS to a temporary file (in the subshell); the parent can then read it and delete the file...
It is also possible to print PIPESTATUS to a different file descriptor in the subshell and read that descriptor in the parent shell, but...
... beware of falling into the XY problem, where you write an extremely complicated script only because you don't want to change the logic of the processing.
E.g. you can always break your script into safe parts, like:
var1=$(grep 'str' test.log)
#check the `$var1` and do something with the error indicated with `$?`
var2=$(sed '....' <<<"$var1")
#check the `$var2` and do something with the error indicated with `$?`
#and so on
Simple enough?
So ask yourself: do you really need to mangle things just to get the PIPESTATUS from a subshell?
PS: Don't use uppercase variable names; they could clash with environment variables and cause hard-to-debug problems.

Get the name of the caller script in bash script

Let's assume I have 3 shell scripts:
script_1.sh
#!/bin/bash
./script_3.sh
script_2.sh
#!/bin/bash
./script_3.sh
The problem is that in script_3.sh I want to know the name of the caller script, so that I can respond differently to each caller I support.
Please don't assume I'm asking about $0, because $0 will echo script_3 every time, no matter who the caller is.
Here is an example input with the expected output:
./script_1.sh should echo script_1
./script_2.sh should echo script_2
./script_3.sh should echo user_name, root, or anything else that distinguishes it from the other 2 cases
Is that possible? If so, how can it be done?
This is going to be added to a modified rm script, so that when I call rm it does something extra, while git or any other CLI tool that uses rm is not affected by the modification.
Based on @user3100381's answer, here's a much simpler command to get the same thing, which I believe should be fairly portable:
PARENT_COMMAND=$(ps -o comm= -p "$PPID")
Replace comm= with args= to get the full command line (command + arguments). The = alone is used to suppress the headers.
See: http://pubs.opengroup.org/onlinepubs/009604499/utilities/ps.html
If you are sourcing the script instead of calling/executing it, no new process is forked, so the solutions based on ps won't work reliably.
Use the bash builtin caller in that case.
$ cat h.sh
#! /bin/bash
function warn_me() {
echo "$@"
caller
}
$
$ cat g.sh
#!/bin/bash
source h.sh
warn_me "Error: You did not do something"
$
$ . g.sh
Error: You did not do something
3 g.sh
$
Source
The $PPID variable holds the parent process ID. So you could parse the output from ps to get the command.
#!/bin/bash
PARENT_COMMAND=$(ps $PPID | tail -n 1 | awk "{print \$5}")
Based on @J.L.'s answer, with a more in-depth explanation; this works on Linux:
cat /proc/$PPID/comm
gives you the name of the command of the parent pid
If you prefer the command with all options, then :
cat /proc/$PPID/cmdline
Explanation:
$PPID is defined by the shell; it's the PID of the parent process.
In /proc/ (on Linux) there is a directory for each process, named after its PID, so cat /proc/$PPID/comm prints the command name of that process.
Check man proc
A couple of useful files kept in /proc/$PPID:
/proc/*some_process_id*/exe A symlink to the executable that *some_process_id* is running
/proc/*some_process_id*/cmdline A file containing the full command line of *some_process_id*, with the arguments separated by null bytes
So a slight simplification, which turns the null bytes into spaces:
sed 's/\x0/ /g' "/proc/$PPID/cmdline"
If you have /proc:
$(cat /proc/$PPID/comm)
Declare this:
PARENT_NAME=`ps -ocomm --no-header $PPID`
Thus you'll get a nice variable $PARENT_NAME that holds the parent's name.
You can simply use the command below to avoid calling cut/awk/sed:
ps --no-headers -o command $PPID
If you only want the parent's command, without its arguments, you can use:
ps --no-headers -o command $PPID | cut -d' ' -f1
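Putting the ps approach together for the question's script_3.sh, a sketch assuming a ps that supports the POSIX -o args= -p PID form. It matches on the parent's full command line rather than the command name, since the name is just bash when a caller is started as bash script_1.sh, and Linux truncates it to 15 characters. caller_name is a hypothetical helper, and the patterns assume the callers are named script_1.sh and script_2.sh as in the question.

```shell
#!/bin/bash
# script_3.sh (sketch): choose a behaviour from the parent's command line.
caller_name() {
  local parent_args
  # Optional PID argument; defaults to our own parent
  parent_args=$(ps -o args= -p "${1:-$PPID}")
  case $parent_args in
    *script_1.sh*) echo "script_1" ;;
    *script_2.sh*) echo "script_2" ;;
    *)             id -un ;;   # e.g. run directly from a login shell
  esac
}

caller_name
```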
You could pass an argument to script_3.sh to determine how to respond...
script_1.sh
#!/bin/bash
./script_3.sh script1
script_2.sh
#!/bin/bash
./script_3.sh script2
script_3.sh
#!/bin/bash
if [ "$1" = 'script1' ]; then
echo "we were called from script1!"
elif [ "$1" = 'script2' ]; then
echo "we were called from script2!"
fi

How do I execute a bash script line by line?

If I use the bash -x option, it shows every line, but the script still executes normally.
How can I execute it line by line, so that I can see whether each line does the correct thing, or abort and fix the bug? The effect I want is the same as putting a read after every line.
You don't need to put a read on every line; just add a trap like the following to your bash script. It has the effect you want, e.g.:
#!/usr/bin/env bash
set -x
trap read debug
< YOUR CODE HERE >
Works, just tested it with bash v4.2.8 and v3.2.25.
IMPROVED VERSION
If your script is reading content from files, the version above will not work. A workaround could look like the following example.
#!/usr/bin/env bash
echo "Press CTRL+C to proceed."
trap "pkill -f 'sleep 1h'" INT
trap "set +x ; sleep 1h ; set -x" DEBUG
< YOUR CODE HERE >
To stop the script you would have to kill it from another shell in this case.
ALTERNATIVE1
If you simply want to wait a few seconds before proceeding to the next command in your script the following example could work for you.
#!/usr/bin/env bash
trap "set +x; sleep 5; set -x" DEBUG
< YOUR CODE HERE >
I'm adding set +x and set -x within the trap command to make the output more readable.
The BASH Debugger Project is "a source-code debugger for bash that follows the gdb command syntax."
If your bash script is really a bunch of one-off commands that you want to run one by one, you could do something like this, which runs the single command selected by the variable LN, the line number you want to run. This makes it easy to run the last command again, and then you just increment the variable to go to the next command.
Assuming your commands are in a file "it.sh", run the following, one by one.
$ cat it.sh
echo "hi there"
date
ls -la /etc/passwd
$ $(LN=1 && cat it.sh | head -n$LN | tail -n1)
"hi there"
$ $(LN=2 && cat it.sh | head -n$LN | tail -n1)
Wed Feb 28 10:58:52 AST 2018
$ $(LN=3 && cat it.sh | head -n$LN | tail -n1)
-rw-r--r-- 1 root wheel 6774 Oct 2 21:29 /etc/passwd
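The $(...) trick above word-splits the line instead of parsing it as shell code, which is why "hi there" came out with its quotes. A sketch of a variant that hands the line to eval instead, so quoting is honored; run_line is a hypothetical helper, and sed -n "Np" prints only line N. As with any eval, only use it on files you trust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run line N of FILE as a shell command.
run_line() {
  local file=$1 n=$2
  eval "$(sed -n "${n}p" "$file")"
}
```

Usage: run_line it.sh 1 prints hi there without the quotes; increment the number to move on to the next line.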
Have a look at bash-stepping-xtrace.
It allows stepping xtrace.
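xargs can also hand the lines to bash one at a time (GNU xargs):
xargs -d '\n' -n 1 bash -c < .bashrc
-d '\n' Split the input on newlines only (quotes and backslashes are not special)
-n 1 Pass one line per command
bash -c Run each line as a separate bash command, so variables or cd from one line do not carry over to the next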

Using xargs to assign stdin to a variable

All that I really want to do is make sure everything in a pipeline succeeded and assign the last stdin to a variable. Consider the following dumbed down scenario:
x=`exit 1|cat`
When I run declare -a, I see this:
declare -a PIPESTATUS='([0]="0")'
I need some way to notice the exit 1, so I converted it to this:
exit 1|cat|xargs -I {} x={}
And declare -a gave me:
declare -a PIPESTATUS='([0]="1" [1]="0" [2]="0")'
That is what I wanted, so I tried to see what would happen if the exit 1 didn't happen:
echo 1|cat|xargs -I {} x={}
But it fails with:
xargs: x={}: No such file or directory
Is there any way to have xargs assign {} to x? What about other methods of having PIPESTATUS work and assigning the stdin to a variable?
Note: these examples are dumbed down. I'm not really doing an exit 1, echo 1 or a cat, but used these commands to simplify so we can focus on my particular issue.
When you use backticks (or the preferred $()) you're running those commands in a subshell. The PIPESTATUS you're getting is for the assignment rather than the piped commands in the subshell.
When you use xargs, it knows nothing about the shell so it can't make variable assignments.
Try set -o pipefail then you can get the status from $?.
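Applied to the question's dumbed-down example, a sketch assuming bash: once pipefail is set, the assignment itself carries the pipeline's status, so no xargs is needed to notice the exit 1.

```shell
#!/usr/bin/env bash
# With pipefail, $? after x=$(pipeline) is non-zero if any stage failed,
# while x still receives the output of the last stage.
set -o pipefail

if x=$(exit 1 | cat); then
  echo "succeeded, x='$x'"
else
  echo "failed with status $?, x='$x'"   # this branch runs
fi

if x=$(echo 1 | cat); then
  echo "succeeded, x='$x'"               # this branch runs, with x='1'
fi
```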
xargs is run in a child process, as are all the commands you call, so they can't affect the environment of your shell.
You might be able to do something with named pipes (mkfifo), or possibly bash's read builtin?
EDIT:
Maybe just redirect the output to a file, then you can use PIPESTATUS:
command1 | command2 | command3 >/tmp/tmpfile
## Examine PIPESTATUS
X=$(cat /tmp/tmpfile)
How about ...
read x <<<"$(echo 1)"
read x < <(echo 1)
echo "$x"
Why not just populate a new array?
IFS=$'\n' read -r -d '' -a result < <(echo a | cat | cat; echo "PIPESTATUS='${PIPESTATUS[*]}'" )
IFS=$'\n' read -r -d '' -a result < <(echo a | exit 1 | cat; echo "PIPESTATUS='${PIPESTATUS[*]}'" )
echo "${#result[@]}"
echo "${result[@]}"
echo "${result[0]}"
echo "${result[1]}"
There are already a few helpful solutions. It turns out that I actually had an example that matches the question as framed above; close-enough anyway.
Consider this:
XX=$(ls -l *.cpp | wc -l | xargs -I{} echo {})
echo $XX
3
That means I had 3 .cpp files in my working directory. Now $XX is 3 and I can use that result in my script. It is contrived, because I don't actually need the xargs in this example, but it works.
In the example from the question ...
x=`exit 1|cat`
I don't think that will give you what was specified. Each part of a pipeline runs in its own subshell, so exit 1 only ends the first part; cat still runs, and the status of the whole substitution is cat's status, 0.
I might start with something like
declare -a PIPESTATUS='([0]="0")'
x=$?
x now has the status from the last command.
Assign each line of input to an array, e.g. all python files in a directory
declare -a pyFiles=($(ls -l *.py | awk '{print $9}'))
where $9 is the ninth field of ls -l output, the file name
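Parsing ls -l breaks on file names that contain spaces. A sketch of two alternatives, assuming bash 4 or later for mapfile: a glob for file names, and mapfile for the lines of arbitrary command output (the printf here just stands in for any command).

```shell
#!/usr/bin/env bash
# A glob keeps names with spaces intact; nullglob makes the array empty
# (instead of holding the literal "*.py") when nothing matches.
shopt -s nullglob
pyFiles=(*.py)
echo "${#pyFiles[@]} python files"

# mapfile -t stores one line of input per array element, without newlines.
mapfile -t lines < <(printf '%s\n' "first line" "second line")
echo "${lines[1]}"   # second line
```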
