Idiomatic Analog to Ruby's `Object#tap` for Unix Command Pipelines?

Is there an idiomatic analog to Ruby's Object#tap for Unix command pipelines?
Use case: within a pipeline I want to execute a command for its side effects but return the input implicitly so as to not break the chaining of the pipeline. For example:
echo { 1, 2, 3 } |
tr ' ' '\n' |
sort |
tap 'xargs echo' | # arbitrary code, but implicitly return the input
uniq
I'm coming from Ruby, where I would do this:
[ 1, 2, 3 ].
sort.
tap { |x| puts x }.
uniq

The tee command is similar; it writes its input to standard output as well as one or more files. If that file is a process substitution, you get the same effect, I believe.
echo 1 2 3 | tr ' ' '\n' | sort | tee >( **code** ) | uniq
The code in the process substitution would read from its standard input, which should be the same thing that the call to uniq ends up seeing.
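If you want something that reads more like the Ruby version, the same tee/process-substitution trick can be wrapped in a small function. A rough sketch (requires bash; the name tap is my own choice, not a standard utility, and the side-effect command's output is diverted to stderr so it doesn't mix into the pipeline):
# Copy stdin to the given command for its side effects (its output goes to
# stderr), while passing the original stdin through to the next stage.
tap() {
    tee >("$@" >&2)
}
echo 1 2 3 | tr ' ' '\n' | sort | tap xargs echo | uniq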

Related

Parsing CSV records when a value is multiline

Source file looks like this:
"google.com", "vuln_example1
vuln_example2
vuln_example3"
"facebook.com", "vuln_example2"
"reddit.com", "stupidly_long_vuln_name1"
"stackoverflow.com", ""
I've been trying to get the output to be something like this, but the line breaks seem to cause me no end of problems. I'm using a "while read line" loop to do this because I do some processing on the columns (e.g. vulnerability count and URL in this example). This is output into a Jenkins job (yuk).
The basic summary of the problem is getting the line breaks in the CSV to be output into the third column while retaining the table structure. I've got a sort of weird example of the desired output below.
||hostname ||Vulnerability count|| Vulnerability list || URL ||
|google.com |3 |vuln_example1 |http://cve.com/vuln_example1|
| | |vuln_example2 |http://cve.com/vuln_example2|
| | |vuln_example3 |http://cve.com/vuln_example3|
|facebook.com |1 |vuln_example2 |http://cve.com/vuln_example2|
|reddit.com |1 |stupidly_long_vuln_name1 |http://cve.com/stupidly_long_vuln_name1|
|stackoverflow.com |0 | ||
Looking at this... I've got a feeling it might be easier by showing some code and example output.
Parsing your input with the command line below makes the problem easier (I'm assuming the inputs are correct):
perl -0777 -pe 's/([^"])\s*\n/\1 /g ; s/[",]//g' < sample.txt
This line invokes Perl to perform two regex substitutions:
s/([^"])\s*\n/\1 /g: removes a line ending when the line does not end with a quote " (i.e. when a host entry, with all its vulnerabilities, is not yet complete).
s/[",]//g: removes all remaining quotes and commas.
For each host entry like this one:
"google.com", "vuln_example1
vuln_example2
vuln_example3"
You'll get:
google.com vuln_example1 vuln_example2 vuln_example3
Then, for each line, you have a host followed by its set of vulnerabilities.
The example below stores each line's fields in an array and loops through it, formatting and printing each row:
# Replace this with your custom function
# to get a URL for a given vulnerability
function get_vuln_url () {
    # This just builds a dummy URL for a non-empty arg
    [[ -z "$1" ]] || echo "http://host/$1.htm"
}
# Format your line (see printf help)
function print_row () {
    printf "%-20s|%5s|%-30s|%s\n" "$@"
}
# The perl line reformats the input as described above
perl -0777 -pe 's/([^"])\s*\n/\1 /g ; s/[",]//g' < sample.txt |
while read -r line ; do
    arr=(${line})
    # First row: host, vulnerability count, first vulnerability and its URL
    print_row "${arr[0]}" "$((${#arr[@]} - 1))" "${arr[1]}" "$(get_vuln_url "${arr[1]}")"
    # Remaining vulnerabilities, one per row
    for v in "${arr[@]:2}" ; do
        print_row " " " " "$v" "$(get_vuln_url "$v")"
    done
done
Output:
google.com | 3|vuln_example1 |http://host/vuln_example1.htm
| |vuln_example2 |http://host/vuln_example2.htm
| |vuln_example3 |http://host/vuln_example3.htm
facebook.com | 1|vuln_example2 |http://host/vuln_example2.htm
reddit.com | 1|stupidly_long_vuln_name1 |http://host/stupidly_long_vuln_name1.htm
stackoverflow.com | 0| |
Update.
If you don't have Perl, and if your file doesn't contain tabs, you can use this command as a workaround instead:
tr '\n' '\t' < sample.txt | sed -r -e 's/([^"])\s*\t/\1 /g' -e 's/[",]//g' -e 's/\t/\n/g'
tr '\n' '\t' replaces all line endings with tabs
The sed part acts like the Perl one-liner, except that it works on tabs instead of line endings and finally restores the tabs back to newlines.

How to append lots of variables to one variable with a simple command

I want to stick all the variables into one variable
A=('blah')
AA=('blah2')
AAA=('blah3')
AAB=('blah4')
AAC=('blah5')
#^^lets pretend theres 100 more of these ^^
#Variable composition
#after AAA, is AAB then AAC then AAD etc etc, does that 100 times
I want them all placed into this MASTER variable
#MASTER=${A}${AA}${AAA} (<-- insert AAB, AAC and 100 more variables here)
I obviously don't want to type 100 variables in this expression because there's probably an easier way to do this. Plus I'm gonna be doing more of these therefore I need it automated.
I'm relatively new to sed and awk; is there a way to append those 100 variables into the master variable?
For this specific purpose I DO NOT want an array.
You can use a simple one-liner, quite straightforward, though more expensive:
master=$(set | grep -E '^(A|AA|A[A-D][A-D])=' | sort | cut -f2- -d= | tr -d '\n')
set lists all the variables in name=value format
grep filters out the variables we need
sort puts them in the right order (probably optional since set gives a sorted output)
cut extracts the values, removing the variable names
tr removes the newlines
Let's test it.
A=1
AA=2
AAA=3
AAB=4
AAC=5
AAD=6
AAAA=99 # just to make sure we don't pick this one up
master=$(set | grep -E '^(A|AA|A[A-D][A-D])=' | sort | cut -f2- -d= | tr -d '\n')
echo "$master"
Output:
123456
With my best guess, how about:
#!/bin/bash
A=('blah')
AA=('blah2')
AAA=('blah3')
AAB=('blah4')
AAC=('blah5')
# to be continued ..
for varname in A AA A{A..D}{A..Z}; do
    value=${!varname}
    if [ -n "$value" ]; then
        MASTER+=$value
    fi
done
echo "$MASTER"
which yields:
blahblah2blah3blah4blah5...
Although I'm not sure whether this is what the OP wants.
echo {a..z}{a..z}{a..z} | tr ' ' '\n' | head -n 100 | tail -n 3
adt
adu
adv
tells us that it would go from AAA to ADV to reach 100, or to ADY for 103.
echo A{A..D}{A..Z} | sed 's/ /}${/g'
AAA}${AAB}${AAC}${AAD}${AAE}${AAF}${AAG}${AAH}${AAI}${AAJ}${AAK}${AAL}${AAM}${AAN}${AAO}${AAP}${AAQ}${AAR}${AAS}${AAT}${AAU}${AAV}${AAW}${AAX}${AAY}${AAZ}${ABA}${ABB}${ABC}${ABD}${ABE}${ABF}${ABG}${ABH}${ABI}${ABJ}${ABK}${ABL}${ABM}${ABN}${ABO}${ABP}${ABQ}${ABR}${ABS}${ABT}${ABU}${ABV}${ABW}${ABX}${ABY}${ABZ}${ACA}${ACB}${ACC}${ACD}${ACE}${ACF}${ACG}${ACH}${ACI}${ACJ}${ACK}${ACL}${ACM}${ACN}${ACO}${ACP}${ACQ}${ACR}${ACS}${ACT}${ACU}${ACV}${ACW}${ACX}${ACY}${ACZ}${ADA}${ADB}${ADC}${ADD}${ADE}${ADF}${ADG}${ADH}${ADI}${ADJ}${ADK}${ADL}${ADM}${ADN}${ADO}${ADP}${ADQ}${ADR}${ADS}${ADT}${ADU}${ADV}${ADW}${ADX}${ADY}${ADZ
The final cosmetic touches are easily made by hand.
One-liner using a for loop:
for n in A AA A{A..D}{A..Z}; do str+="${!n}"; done; echo ${str}
Output:
blahblah2blah3blah4blah5
Say you have the input file inputfile.txt with arbitrary variable names and values:
name="Joe"
last="Doe"
A="blah"
AA="blah2"
then do:
master=$(eval echo $(grep -o "^[^=]\+" inputfile.txt | sed 's/^/\$/;:a;N;$!ba;s/\n/$/g'))
This will concatenate the values of all the variables named in inputfile.txt into the master variable (the variables must be set in the current shell, e.g. by sourcing inputfile.txt first). So you will have:
>echo $master
JoeDoeblahblah2

How to assign command output to a variable

I want to assign the output of the following command to a variable in shell:
${arr2[0]} | rev | cut -c 9- | rev
For example:
mod=${arr2[0]} | rev | cut -c 9- | rev
echo $mod
The above method is not working: the output is blank.
I also tried:
mod=( "${arr2[0]}" | rev | cut -c 9- | rev )
But I get the error:
34: syntax error near unexpected token `|'
line 34: ` mod=( "${arr2[0]}" | rev | cut -c 9- | rev ) '
To add an explanation to your correct answer:
You had to combine your variable assignment with a command substitution (var=$(...)) to capture the (stdout) output of your command in a variable.
By contrast, your original command used just var=(...) - no $ before the ( - which is used to create arrays[1], with each token inside ( ... ) becoming its own array element - which was clearly not your intent.
As for why your original command broke:
The tokens inside (...) are subject to the usual shell expansions and therefore the usual quoting requirements.
Thus, in order to use $ and the so-called shell metacharacters (| & ; ( ) < > space tab) as literals in your array elements, you must quote them, e.g., by prepending \.
All these characters - except $, space, and tab - cause a syntax error when left unquoted, which is what happened in your case (you had unquoted | chars.)
[1] In bash, and also in ksh and zsh. The POSIX shell spec. doesn't support arrays at all, so this syntax will always break in POSIX-features-only shells.
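For illustration, here is a minimal contrast of the two syntaxes (the sample values are hypothetical, not from the question):
# Command substitution: mod captures the pipeline's stdout as one string ("DLROW OLLEH")
mod=$(echo "hello world" | rev | tr a-z A-Z)
# Array assignment: each token becomes an element; no command is executed
arr=(echo "hello world")   # arr[0]=echo, arr[1]="hello world"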
mod=$(echo "${arr2[0]}" | rev | cut -c 9- | rev )
echo "****:"$mod
or
mod=`echo "${arr2[0]}" | rev | cut -c 9- | rev`
echo "****:"$mod

If xargs is map, what is filter?

I think of xargs as the map function of the UNIX shell. What is the filter function?
EDIT: it looks like I'll have to be a bit more explicit.
Let's say I have to hand a program which accepts a single string as a parameter and returns with an exit code of 0 or 1. This program will act as a predicate over the strings that it accepts.
For example, I might decide to interpret the string parameter as a filepath, and define the predicate to be "does this file exist". In this case, the program could be test -f, which, given a string, exits with 0 if the file exists, and 1 otherwise.
I also have to hand a stream of strings. For example, I might have a file ~/paths containing
/etc/apache2/apache2.conf
/foo/bar/baz
/etc/hosts
Now, I want to create a new file, ~/existing_paths, containing only those paths that exist on my filesystem. In my case, that would be
/etc/apache2/apache2.conf
/etc/hosts
I want to do this by reading in the ~/paths file, filtering those lines by the predicate test -f, and writing the output to ~/existing_paths. By analogy with xargs, this would look like:
cat ~/paths | xfilter test -f > ~/existing_paths
It is the hypothesized program xfilter that I am looking for:
xfilter COMMAND [ARG]...
Which, for each line L of its standard input, will call COMMAND [ARG]... L, and if the exit code is 0, it prints L, else it prints nothing.
To be clear, I am not looking for:
a way to filter a list of filepaths by existence. That was a specific example.
how to write such a program. I can do that.
I am looking for either:
a pre-existing implementation, like xargs, or
a clear explanation of why this doesn't exist
If map is xargs, filter is... still xargs.
Example: list files in the current directory and filter out non-executable files:
ls | xargs -I{} sh -c "test -x '{}' && echo '{}'"
This could be made handy through a (non-production-ready) function:
xfilter() {
    xargs -I{} sh -c "$* '{}' && echo '{}'"
}
ls | xfilter test -x
Alternatively, you could use a parallel filter implementation via GNU Parallel:
ls | parallel "test -x '{}' && echo '{}'"
So, you're looking for the:
reduce( compare( filter( map(.. list()) ) ) )
which can be rewritten as
list | map | filter | compare | reduce
The main power of bash is pipelining, so there is no need for a special filter and/or reduce command. In fact, nearly all Unix commands can act in one (or more) of these roles:
list
map
filter
reduce
Imagine:
find mydir -type f -print | xargs grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head -1
^------list+filter------^ ^--------map-----------^ ^--filter--^ ^compare^ ^reduce^
Creating a test case:
mkdir ./testcase
cd ./testcase || exit 1
for i in {1..10}
do
    strings -1 < /dev/random | head -1000 > file.$i.txt
done
mkdir emptydir
You will get a directory named testcase, and in this directory 10 files and one subdirectory:
emptydir file.1.txt file.10.txt file.2.txt file.3.txt file.4.txt file.5.txt file.6.txt file.7.txt file.8.txt file.9.txt
Each file contains 1000 lines of random strings; some lines contain only numbers.
now run the command
find testcase -type f -print | xargs grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head -1
and you will get the largest number-only line across all the files, e.g.: 42. (Of course, this could be done more efficiently; this is only a demo.)
decomposed:
The find testcase -type f -print will print every plain file, so it is the LIST (already restricted to files). Output:
testcase/file.1.txt
testcase/file.10.txt
testcase/file.2.txt
testcase/file.3.txt
testcase/file.4.txt
testcase/file.5.txt
testcase/file.6.txt
testcase/file.7.txt
testcase/file.8.txt
testcase/file.9.txt
The xargs grep -H '^[0-9]*$' as MAP runs a grep command for each file from the list. grep is usually used as a filter, e.g. command | grep, but here (with xargs) it transforms the input (filenames) into output lines containing only digits. Output, many lines like:
testcase/file.1.txt:1
testcase/file.1.txt:8
....
testcase/file.9.txt:4
testcase/file.9.txt:5
The structure of each line is filename:number. We want only the numbers, so we call a pure filter that strips the filename from each line: cut -d: -f 2. It outputs many lines like:
1
8
...
4
5
Now the reduce (getting the largest number): sort -nr sorts all the numbers numerically in reverse (descending) order, so its output looks like:
42
18
9
9
...
0
0
and head -1 prints the first line (the largest number).
Of course, you can write your own list/filter/map/reduce functions directly with bash programming constructs (loops, conditionals and such), or you can employ any full-blown scripting language like Perl, or special-purpose languages like awk, sed, or dc (RPN).
Having a special filter command such as:
list | filter_command cut -d: -f 2
is simply not needed, because you can directly use
list | cut
You can have awk do the filter and reduce functions.
Filter:
awk 'NR % 2 == 0'   # keep only the even-numbered input lines
Reduce:
awk '{ p = p + $0 } END { print p }'
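For instance, on a small stream of numbers (seq here is just an arbitrary source):
seq 1 6 | awk 'NR % 2 == 0'                        # filter: prints 2, 4, 6
seq 1 6 | awk '{ p = p + $0 } END { print p }'     # reduce: prints 21, the sum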
I totally understand your question here as a long-time functional programmer, and here is the answer: Bash/Unix command pipelining isn't as clean as you'd hoped.
In the example above:
find mydir -type f -print | xargs grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head -1
^------list+filter------^ ^--------map-----------^ ^--filter--^ ^compare^ ^reduce^
a more pure form would look like:
find mydir | xargs -L 1 bash -c 'test -f "$1" && echo "$1"' _ | xargs grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head -1
^---list--^^-------filter---------------------------------^^------map----------^^--map-------^ ^reduce^
But, for example, grep also has a filtering capability: grep -q mypattern simply returns 0 if its input matches the pattern.
To get something more like what you want, you would simply have to define a filter bash function and make sure to export it so that it is usable from xargs.
But then you get into some problems. For example, test has binary and unary operators. How will your filter function handle this? And, what would you decide to output on true for these cases? Not insurmountable, but weird. Assuming only unary operations:
filter(){
    while read -r LINE || [[ -n "${LINE}" ]]; do
        eval "[[ ${LINE} $1 ]]" 2> /dev/null && echo "$LINE"
    done
}
so you could do something like
seq 1 10 | filter "> 4"
5
6
7
8
9
As I wrote this I kinda liked it

Pipe output to two different commands not interlaced

Using techniques mentioned here (Pipe output to two different commands), we can split stdout across multiple processes.
expensive_command | tee >(proc_1) >(proc_2) | proc_3
My problem is that this interlaces the output.
Is there a way to copy the stdout but force proc_2 to block until proc_1 finishes?
I'm thinking something like
expensive_command | tee >(proc_1) | wait for EOF | tee >(proc_2) ...
You can use a fifo as a cheap lock. Have proc1 write to it after it completes, and wait until a read from the fifo succeeds before running proc2.
mkfifo cheap_lock
expensive_command | tee >(proc1; echo foo > cheap_lock) \
>(read < cheap_lock; proc2 ) | proc3
(Of course, it's your responsibility to ensure that no other processes try to read from or write to cheap_lock.)
You can create a buffer that holds the output and releases it once the input reaches EOF, like:
expensive_command | awk '{ a[i++] = $0 }END{for (i = 0; i in a; ++i) { print a[i] | "tee temp.txt" } }'
The only catch is that awk does not support process substitution.
In bash you can do:
readarray -t lines < <(expensive_command | tee >(proc_1))
printf '%s\n' "${lines[@]}" | tee >(proc_2)
Depending on the peak size of the output from expensive_command and the version of your Bash, the command may require adjustments. You can also consider using another language.
Addendum: You can also use stdbuf, which runs a command with modified buffering for its standard streams.
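A minimal illustration of the stdbuf invocation (GNU coreutils; -oL makes the producer's stdout line-buffered). Note that this only changes when output is flushed; it does not by itself serialize proc_1 and proc_2:
stdbuf -oL expensive_command | tee >(proc_1) >(proc_2) | proc_3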

Resources