XARGS with for loop pr - bash

Hi I am working in bash shell with a file of file names that contains multiple files for the same sample on different lines
file.txt
Filename1_1 SampleName1
Filename1_2 SampleName1
Filename2_1 SampleName2
Filename2_2 SampleName2
I am trying to use xargs with a for loop to pass the filenames into one argument (i.e print Filename1_1 FileName1_2).
Which would be the effect of :
cat file.txt | xargs bash -c 'echo ${0} ${2}'
Since it is quite a long file i cannot use this repeatedly and thought using a for loop will help. But isn't producing the output i expected
Here is what i thought would be simple to do.
for (( i = 0,j=2; i<=63; i= i+4,j=j+4 ))
do
cat file.txt | xargs bash -c 'echo ${i} ${j}'
done
However running this loops through and prints a bunch of blank lines.
Anyone have an idea of getting this to work like i want?
I am looking for an output that looks like below to pass each line to another function
Filename1_1 Filename1_2
Filename2_1 Filename2_2
Filename3_1 Filename3_2
Filename4_1 Filename4_2

Just use -n2 and specify maximum number of arguments.
<file.txt xargs -n2 bash -c 'echo $1 $2' _

Related

How to properly pass filenames with spaces with $* from xargs to sed via sh?

Disclaimer: this happens on macOS (Big Sur); more info about the context below.
I have to write (almost did) a script which will replace images URLs in big text (xml) files by their Base64-encoded value.
The script should run the same way with single filenames or patterns, or both, e.g.:
./replace-encode single.xml
./replace-encode pattern*.xml
./replace-encode single.xml pattern*.xml
./replace-encode folder/*.xml
Note: it should properly handle files\ with\ spaces.xml
So I ended up with this script:
#!/bin/bash
#needed for `ls` command
IFS=$'\n'
ls -1 $* | xargs -I % sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' % | xargs -tI % sh -c 'sed -i "" "s#%#`curl -s % | base64`#" $0' "$*"
What it does: ls all files, pipe the list to xargs then search all URLs surrounded by anchors (hence the > and < in the search expr. - also had to use sed because grep is limited on macOS), then pipe again to a sh script which runs the sed search & replace, where the remplacement is the big Base64 string.
This works perfectly fine... but only for fileswithoutspaces.xml
I tried to play with $0 vs $1, $* vs $#, w/ or w/o " but to no avail.
I don't understand exactly how does the variable substitution (is it how it's called? - not a native English speaker, and above all, not a script-writer at all!!! just a Java dev. all day long...) work between xargs, sh or even bash with arguments like filenames.
The xargs -t is here to let me check out how the substitution works, and that's how I noticed that using a pattern worked but I have to let the " around the last $*, otherwise only the 1st file is searched & replaced; output is like:
user#host % ./replace-encode pattern*.xml
sh -c sed -i "" "s#https://www.some.com/public/123456.jpg#`curl -s https://www.some.com/public/123456.jpg | base64`#" $0 pattern_123.xml
pattern_456.xml
Both pattern_123.xml and pattern_456.xml are handled here; w/ $* instead of "$*" in the end of the command, only pattern_123.xml is handled.
So is there a simple way to "fix" this?
Thank you.
Note: macOS commands have some limitations (I know) but as this script is intended to non-technical users, I can't ask them to install (or have the IT team installed on their behalf) some alternate GNU-versions installed e.g. pcregrep or 'ggrep' like I've read many times...
Also: I don't intend to change from xargs to for loops or so because, 1/ don't have the time, 2/ might want to optimize the 2nd step where some URLs might be duplicate or so.
There's no reason for your software to use ls or xargs, and certainly not $*.
./replace-encode single.xml
./replace-encode pattern*.xml
./replace-encode single.xml pattern*.xml
./replace-encode folder/*.xml
...will all work fine with:
#!/usr/bin/env bash
while IFS= read -r line; do
replacement=$(curl -s "$line" | base64)
in="$line" out="$replacement" perl -pi -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' "$#"
done < <(sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' "$#" | sort | uniq)
Finally ended up with this single-line script:
sed -nr 's/.*>(https?:\/\/[^<]+)<.*/\1/p' "$#" | xargs -I% sh -c 'sed -i "" "s#%#`curl -s % | base64`#" "$#"' _ "$#"
which does properly support filenames with or without spaces.

grep from 7 GB text file OR many smaller ones

I have about two thousand text files in folder.
I want to loop each one and search for specific word in line.
for file in "./*.txt";
do
cat $file | grep "banana"
done
I was wondering if join all text files into one file would be faster.
The whole directory has about 7 GB.
You're not actually looping, you're calling cat just once on the string ./*.txt, i.e., your script is equivalent to
cat ./*.txt | grep 'banana'
This is not equivalent to
grep 'banana' ./*.txt
though, as the output for the latter would prefix the filename for each match; you could use
grep -h 'banana' ./*.txt
to suppress filenames.
The problem you could run into is that ./*.txt expands to something that is longer than the maximum command line length allowed; to prevent that, you could do something like
printf '%s\0' ./*.txt | xargs -0 grep -h 'banana'
which is save for both files containing blanks and shell metacharacters and calls grep as few times as possible1.
This can even be parallelized; to run 4 grep processes in parallel, each handling 5 files at a time:
printf '%s\0' ./*.txt | xargs -0 -L 5 -P 4 grep -h 'banana'
What I think you intended to run is this:
for file in ./*.txt; do
cat "$file" | grep "banana"
done
which would call cat/grep once per file.
1At first I thought that printf would run into trouble with command line length limitations as well, but it seems that as a shell built-in, it's exempt:
$ touch '%s\0' {1000000..10000000} > /dev/null
-bash: /usr/bin/touch: Argument list too long
$ printf '%s\0' {1000000..10000000} > /dev/null
$

Add a prefix to logs with AWK

I am facing a problem with a script I need to use for log analysis; let me explain the question:
I have a gzipped file like:
5555_prova.log.gz
Inside the file there are mali lines of log like this one:
2018-06-12 03:34:31 95.245.15.135 GET /hls.playready.vod.mediasetpremium/farmunica/2018/06/218742_163f10da04c7d2/hlsrc/w12/21.ts
I need a script read the gzipped log file which is capable to output on the stdout a modified log line like this one:
5555 2018-06-12 03:34:31 95.245.15.135 GET /hls.playready.vod.mediasetpremium/farmunica/2018/06/218742_163f10da04c7d2/hlsrc/w12/21.ts
As you can see the line of log now start with the number read from the gzip file name.
I need this new line to feed a logstash data crunching chain.
I have tried with a script like this:
echo "./5555_prova.log.gz" | xargs -ISTR -t -r sh -c "gunzip -c STR | awk '{$0="5555 "$0}' "
this is not exactly what I need (the prefix is static and not captured with a regular expression from the file name) but even with this simplified version I receive an error:
sh -c gunzip -c ./5555_prova.log.gz | awk '{-bash=5555 -bash}'
-bash}' : -c: line 0: unexpected EOF while looking for matching `''
-bash}' : -c: line 1: syntax error: unexpected end of file
As you can see from the above output the $0 is no more the whole line passed via pipe to awk but is a strange -bash.
I need to use xargs because the list of gzipped file is fed the the command line from an another tool (i.e. an instantiated inotifywait listening to a directory where the files are written via ftp).
What I am missing? do you have some suggestions to point me in the right direction?
Regards,
S.
Trying to following the #Charles Duffy suggestion I have written this code:
#/bin/bash
#
# Usage: sendToLogstash.sh [pattern]
#
# Executes a command whenever files matching the pattern are closed in write
# mode or moved to. "{}" in the command is replaced with the matching filename (via xargs).
# Requires inotifywait from inotify-tools.
#
# For example,
#
# whenever.sh '/usr/local/myfiles/'
#
#
DIR="$1"
PATTERN="\.gz$"
script=$(cat <<'EOF'
awk -v filename="$file" 'BEGIN{split(filename,array,"_")}{$0=array[1] OFS $0} 1' < $(gunzip -dc "$DIR/$file")
EOF
)
inotifywait -q --format '%f' -m -r -e close_write -e moved_to "$DIR" \
| grep --line-buffered $PATTERN | xargs -I{} -r sh -c "file={}; $script"
But I got the error:
[root#ms-felogstash ~]# ./test.sh ./poppo
gzip: /1111_test.log.gz: No such file or directory
gzip: /1111_test.log.gz: No such file or directory
sh: $(gunzip -dc "$DIR/$file"): ambiguous redirect
Thanks for your help, I feel very lost writing bash scripts.
Regards,
S.
EDIT: Also in case you are dealing with multiple .gz files and want to print their content along with their file names(first column _ delimited) then following may help you.
for file in *.gz; do
awk -v filename="$file" 'BEGIN{split(filename,array,"_")}{$0=array[1] OFS $0} 1' <(gzip -dc "$file")
done
I haven't tested your code(couldn't completely understand also), so trying to give here a way like in case your code could pass file name to awk then it will be pretty simple to append the file's first digits like as follows(just an example).
awk 'FNR==1{split(FILENAME,array,"_")} {$0=array[1] OFS $0} 1' 5555_prova.log_file
So here I am taking FILENAME out of the box variable for awk(only in first line of file) and then by splitting it into array named array and then adding it in each line of the file.
Also wrap "gunzip -c STR this with ending " which seems to be missing before you pass its output to awk too.
NEVER, EVER use xargs -I with a string substituted into sh -c (or bash -c or any other context where that string is interpreted as code). This allows malicious filenames to run arbitrary commands -- think about what happens if someone runs touch $'$(rm -rf ~)\'$(rm -rf ~)\'.gz', and gets that file into your log.
Instead, let xargs append arguments after your script text, and write your script to iterate over / read those arguments as data, rather than having them substituted into code.
To show how to use xargs safely (well, safely if we assume that you've filtered out filenames with literal newlines):
# This way you don't need to escape the quotes in your script by hand
script=$(cat <<'EOF'
for arg; do gunzip -c <"$arg" | awk '{$0="5555 "$0}'; done
EOF
)
# if you **did** want to escape them by hand, it would look like this:
# script='for arg; do gunzip -c <"$arg" | awk '"'"'{$0="5555 "$0}'"'"'; done'
echo "./5555_prova.log.gz" | xargs -d $'\n' sh -c "$script" _
To be safer with all possible filenames, you'd instead use:
printf '%s\0' "./5555_prova.log.gz" | xargs -0 sh -c "$script" _
Note the use of NUL-delimited input (created with printf '%s\0') and xargs -0 to consume it.

pipe tail output into another script

I am trying to pipe the output of a tail command into another bash script to process:
tail -n +1 -f your_log_file | myscript.sh
However, when I run it, the $1 parameter (inside the myscript.sh) never gets reached. What am I missing? How do I pipe the output to be the input parameter of the script?
PS - I want tail to run forever and continue piping each individual line into the script.
Edit
For now the entire contents of myscripts.sh are:
echo $1;
Generally, here is one way to handle standard input to a script:
#!/bin/bash
while read line; do
echo $line
done
That is a very rough bash equivalent to cat. It does demonstrate a key fact: each command inside the script inherits its standard input from the shell, so you don't really need to do anything special to get access to the data coming in. read takes its input from the shell, which (in your case) is getting its input from the tail process connected to it via the pipe.
As another example, consider this script; we'll call it 'mygrep.sh'.
#!/bin/bash
grep "$1"
Now the pipeline
some-text-producing-command | ./mygrep.sh bob
behaves identically to
some-text-producing-command | grep bob
$1 is set if you call your script like this:
./myscript.sh foo
Then $1 has the value "foo".
The positional parameters and standard input are separate; you could do this
tail -n +1 -f your_log_file | myscript.sh foo
Now standard input is still coming from the tail process, and $1 is still set to 'foo'.
Perhaps your were confused with awk?
tail -n +1 -f your_log_file | awk '{
print $1
}'
would print the first column from the output of the tail command.
In the shell, a similar effect can be achieved with:
tail -n +1 -f your_log_file | while read first junk; do
echo "$first"
done
Alternatively, you could put the whole while ... done loop inside myscript.sh
Piping connects the output (stdout) of one process to the input (stdin) of another process. stdin is not the same thing as the arguments sent to a process when it starts.
What you want to do is convert the lines in the output of your first process into arguments for the the second process. This is exactly what the xargs command is for.
All you need to do is pipe an xargs in between the initial command and it will work:
tail -n +1 -f your_log_file | xargs | myscript.sh

How to apply shell command to each line of a command output?

Suppose I have some output from a command (such as ls -1):
a
b
c
d
e
...
I want to apply a command (say echo) to each one, in turn. E.g.
echo a
echo b
echo c
echo d
echo e
...
What's the easiest way to do that in bash?
It's probably easiest to use xargs. In your case:
ls -1 | xargs -L1 echo
The -L flag ensures the input is read properly. From the man page of xargs:
-L number
Call utility for every number non-empty lines read.
A line ending with a space continues to the next non-empty line. [...]
You can use a basic prepend operation on each line:
ls -1 | while read line ; do echo $line ; done
Or you can pipe the output to sed for more complex operations:
ls -1 | sed 's/^\(.*\)$/echo \1/'
for s in `cmd`; do echo $s; done
If cmd has a large output:
cmd | xargs -L1 echo
You can use a for loop:
for file in * ; do
echo "$file"
done
Note that if the command in question accepts multiple arguments, then using xargs is almost always more efficient as it only has to spawn the utility in question once instead of multiple times.
You actually can use sed to do it, provided it is GNU sed.
... | sed 's/match/command \0/e'
How it works:
Substitute match with command match
On substitution execute command
Replace substituted line with command output.
A solution that works with filenames that have spaces in them, is:
ls -1 | xargs -I %s echo %s
The following is equivalent, but has a clearer divide between the precursor and what you actually want to do:
ls -1 | xargs -I %s -- echo %s
Where echo is whatever it is you want to run, and the subsequent %s is the filename.
Thanks to Chris Jester-Young's answer on a duplicate question.
xargs fails with with backslashes, quotes. It needs to be something like
ls -1 |tr \\n \\0 |xargs -0 -iTHIS echo "THIS is a file."
xargs -0 option:
-0, --null
Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are
not special (every character is taken literally). Disables the end of file string, which is treated like
any other argument. Useful when input items might contain white space, quote marks, or backslashes. The
GNU find -print0 option produces input suitable for this mode.
ls -1 terminates the items with newline characters, so tr translates them into null characters.
This approach is about 50 times slower than iterating manually with for ... (see Michael Aaron Safyans answer) (3.55s vs. 0.066s). But for other input commands like locate, find, reading from a file (tr \\n \\0 <file) or similar, you have to work with xargs like this.
i like to use gawk for running multiple commands on a list, for instance
ls -l | gawk '{system("/path/to/cmd.sh "$1)}'
however the escaping of the escapable characters can get a little hairy.
Better result for me:
ls -1 | xargs -L1 -d "\n" CMD

Resources