grep -l does not behave as expected on piping to xargs [duplicate] - bash

So I have a command like: grep "\"tool\":\"SEETEST\"" * -l Which works great standalone - it prints out a list of JSON files generated for the selected tool in the current directory.
But then, if I were to pipe it to xargs ls like that:
grep "\"tool\":\"SEETEST\"" * -l | xargs ls -lSh
It prints all the files in the current directory!
How do I make it print just the matched filenames and pipe them to ls sorted by size?

If grep finds no matches, xargs receives empty input but (with GNU xargs) still runs ls once with no arguments, so ls lists all files in the current directory:
#----------- current files in the directory
mortiz@florida:~/Documents/projects/bash/test$ ls -ltr
total 8
-rw-r--r-- 1 mortiz mortiz 585 Jun 18 12:13 json.example2
-rw-r--r-- 1 mortiz mortiz 574 Jun 18 12:14 json.example
#----------- using your command
mortiz@florida:~/Documents/projects/bash/test$ grep "\"title\": \"example\"" * -l
json.example
#-----------adding xargs to the previous command
mortiz@florida:~/Documents/projects/bash/test$ grep "\"title\": \"example\"" * -l | xargs ls -lSh
-rw-r--r-- 1 mortiz mortiz 574 Jun 18 12:14 json.example
#-----------adding purposely an error on "title"
mortiz@florida:~/Documents/projects/bash/test$ grep "\"titleo\": \"example\"" * -l | xargs ls -lSh
total 8.0K
-rw-r--r-- 1 mortiz mortiz 585 Jun 18 12:13 json.example2
-rw-r--r-- 1 mortiz mortiz 574 Jun 18 12:14 json.example
If you want to use xargs and grep may not return any match, add -r (long form --no-run-if-empty, a GNU xargs option). It prevents xargs from running ls at all when the input is empty, so nothing is listed:
grep "\"titleo\": \"example\"" * -l | xargs -r ls -lSh
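A minimal sketch of the difference, assuming GNU grep and GNU xargs (-Z, -0 and -r are GNU extensions) and a hypothetical scratch directory with made-up file names. It also passes the names NUL-terminated, so a file name containing spaces survives the pipe:

```shell
# Hypothetical scratch directory; file names and contents are made up.
demo=$(mktemp -d)
printf '{"tool":"SEETEST"}\n' > "$demo/match one.json"
printf '{"tool":"OTHER"}\n'   > "$demo/other.json"

# No match: with -r, xargs never invokes ls, so nothing is printed.
none=$(cd "$demo" && grep -lZ -- '"tool":"NOPE"' * | xargs -0 -r ls -1S)

# One match: -Z and -0 pass the name NUL-terminated, so the space is kept.
hits=$(cd "$demo" && grep -lZ -- '"tool":"SEETEST"' * | xargs -0 -r ls -1S)

printf 'no match: [%s]\n' "$none"
printf 'match:    [%s]\n' "$hits"
rm -rf "$demo"
```

Swap ls -1S for ls -lSh to get the long listing sorted by size, as in the question.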

Related

For loop with if statements isn't working as expected in bash

It only prints the "else" statement for everything but I know for a fact the files exist that it's looking for. I've tried adapting some of the other answers but I thought this should definitely work.
Does anyone know what's wrong with my syntax?
# Contents of script
for ID_SAMPLE in $(cut -f1 metadata.tsv | tail -n +2);
do if [ -f ./output/${ID_SAMPLE} ]; then
echo Skipping ${ID_SAMPLE};
else
echo Processing ${ID_SAMPLE};
fi
done
Additional information
# Output directory
(base) -bash-4.1$ ls -lhS output/
total 170K
drwxr-xr-x 8 jespinoz tigr 185 Jan 3 16:16 ERR1701760
drwxr-xr-x 8 jespinoz tigr 185 Jan 17 18:03 ERR315863
drwxr-xr-x 8 jespinoz tigr 185 Jan 16 23:23 ERR599042
drwxr-xr-x 8 jespinoz tigr 185 Jan 17 00:10 ERR599072
drwxr-xr-x 8 jespinoz tigr 185 Jan 16 13:00 ERR599078
# Example of inputs
(base) -bash-4.1$ cut -f1 metadata.tsv | tail -n +2 | head -n 10
ERR1701760
ERR599078
ERR599079
ERR599070
ERR599071
ERR599072
ERR599073
ERR599074
ERR599075
ERR599076
# Output of script
(base) -bash-4.1$ bash test.sh | head -n 10
Processing ERR1701760
Processing ERR599078
Processing ERR599079
Processing ERR599070
Processing ERR599071
Processing ERR599072
Processing ERR599073
Processing ERR599074
Processing ERR599075
Processing ERR599076
# Checking a directory
(base) -bash-4.1$ ls -l ./output/ERR1701760
total 294
drwxr-xr-x 2 jespinoz tigr 386 Jan 15 21:00 checkpoints
drwxr-xr-x 2 jespinoz tigr 0 Jan 10 01:36 tmp
-f checks whether the name is a regular file, but all your names are directories. Use -d to test for a directory:
if [ -d "./output/$ID_SAMPLE" ]
then
If you want to check whether the name exists with any type, use -e.
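To see the three tests side by side, here is a small sketch. The classify function and the scratch paths are hypothetical names used only for illustration:

```shell
# Hypothetical scratch layout: one directory, one regular file, one missing name.
tmp=$(mktemp -d)
mkdir -p "$tmp/output/ERR1701760"
: > "$tmp/output/plain_file"

# classify PATH: report which of the file tests succeeds for PATH.
classify() {
  if [ -d "$1" ]; then
    echo directory
  elif [ -f "$1" ]; then
    echo file
  elif [ -e "$1" ]; then
    echo other          # exists, but is neither a directory nor a regular file
  else
    echo missing
  fi
}

kind_dir=$(classify "$tmp/output/ERR1701760")
kind_file=$(classify "$tmp/output/plain_file")
kind_none=$(classify "$tmp/output/ERR599079")
printf '%s %s %s\n' "$kind_dir" "$kind_file" "$kind_none"
rm -rf "$tmp"
```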

Csh - Fetching fields via awk inside xargs

I'm struggling to understand this behavior:
Script behavior: read a file (containing dates); print a list of files in a multi-level directory tree and get their size, print the file size only, (future step: sum the overall file size).
Starting script:
cat dates | xargs -I {} sh -c "echo '{}: '; du -d 2 "/folder/" | grep {} | head"
2000-03:
1000 /folder/2000-03balbasldas
2000-04:
12300 /folder/2000-04asdwqdas
[and so on]
But when I try to filter via awk on the first field, I still get the whole line
cat dates | xargs -I {} sh -c "echo '{}: '; du -d 2 "/folder/" | grep {} | awk '{print $1}'"
2000-03:
1000 /folder/2000-03balbasldas
2000-04:
12300 /folder/2000-04asdwqdas
I've already approached it via divide-et-impera, and the following command works just fine:
du -d 2 "/folder/" | grep '2000-03' | awk '{print $1}'
1000
I'm afraid that I'm missing something very trivial, but I haven't found anything so far.
Any idea? Thanks!
Input: directory containing folders named YYYY-MM-random_data and a file containing strings:
ls -l
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-03-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-04-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-05-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-06-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-06-blablablb
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-06-blablablc
[...]
cat dates
2000-03
2000-04
2000-05
[...]
Expected output: sum of the disk space occupied by all the files contained in the folder whose name include the string in the file dates
2000-03: 1000
2000-04: 2123
2000-05: 1222112
[...]
======
But in particular, I'm interested in why awk is not able to fetch the column $1 I asked it to.
Ok it seems I found the answer myself after a lot of research :D
I'll post it here, hoping that it will help somebody else out.
https://unix.stackexchange.com/questions/282503/right-syntax-for-awk-usage-in-combination-with-other-command-inside-xargs-sh-c
The trick was to escape the $ sign.
cat dates | xargs -I {} sh -c "echo '{}: '; du -d 2 "/folder/" | grep {} | awk '{print \$1}'"
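Why the backslash matters, as a small sketch in POSIX sh (csh quoting rules differ, so treat this as an illustration rather than a csh recipe). Inside double quotes the outer shell expands $1 itself before sh -c ever runs. The outer shell has no positional parameters here, so awk receives '{print }' and prints the whole line. The sample line is taken from the output above:

```shell
line='1000 /folder/2000-03balbasldas'
set --   # make sure the outer shell has no positional parameters

# Unescaped: the outer shell turns $1 into an empty string,
# so awk runs '{print }' and prints the whole line.
unescaped=$(printf '%s\n' "$line" | sh -c "awk '{print $1}'")

# Escaped: \$1 reaches awk as a literal $1, so only field 1 is printed.
escaped=$(printf '%s\n' "$line" | sh -c "awk '{print \$1}'")

printf 'unescaped: %s\n' "$unescaped"
printf 'escaped:   %s\n' "$escaped"
```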
Using GNU Parallel it looks like this:
parallel --tag "eval du -s folder/{}* | perl -ne '"'$s+=$_ ; END {print "$s\n"}'"'" :::: dates
--tag prepends the line with the date.
{} is replaced with the date.
eval du -s folder/{}* finds all the dirs starting with the date and gives the total du from those dirs.
perl -ne '$s+=$_ ; END {print "$s\n"}' sums up the output from du
Finally, there is a bit of quoting trickery to get it quoted correctly.

grabbing the newest file from a subset of the contents of a folder in bash

I have a folder with such contents
nass@starmaze:~/audio_setup/scripts$ ls -l ../jmess/
total 32
-rw-rw-r-- 1 nass users 1573 Νοέ 16 2014 jmess_fxio56-78feedsHDA-play12.jmess
-rw-rw-r-- 1 nass nass 1573 Δεκ 13 2014 jmess_pb-2.jmess
-rw-rw-r-- 1 nass nass 1573 Δεκ 20 2014 jmess_pb-3.jmess
-rw-rw-r-- 1 nass nass 1939 Ιούν 12 13:05 jmess_starmazeOnMaster.jmess
-rw-rw-r-- 1 nass nass 2163 Δεκ 15 2014 jmess_starmazeOnMaster.jmess.bak1-art
-rw-rw-r-- 1 nass nass 2161 Δεκ 15 2014 jmess_starmazeOnMaster.jmess.bak2-bcr
-rw-rw-r-- 1 nass nass 2389 Δεκ 22 2014 jmess_starmazeOnMaster.jmess.bak3-hoo
-rw-rw-r-- 1 nass nass 2163 Δεκ 15 2014 jmess_starmazeOnMaster.jmess.bak4-dsp
I want to be able to pick up the newest file, but only from the subset of files that do not contain the word "Master" in them. And I want to put that in a bash script.
So this
ls -t1 "${JCMESS_FOLDER}" | head -n1
provides the newest file in the folder, while this
ls -t1 "${JCMESS_FOLDER}"/!(*Master*) | head -n1
provides the newest file among the subset that I am interested in.
However, when I place the latter in a bash script as
$NEWEST_JCMESS_FILE=$( ls -t1 "${JCMESS_FOLDER}"/!(*Master*) | head -n1 )
it does not work:
./06.load_jcmess: command substitution: line 8: syntax error near unexpected token `('
./06.load_jcmess: command substitution: line 8: ` ls -t1 "${JCMESS_FOLDER}"/!(*Master*) | head -n1 )'
I am not sure what is wrong in this case, and I have not been able to find an answer for this.
thank you in advance for your help
This is BashFAQ #3:
newest() {
local candidate result=$1; shift # start with first argument as candidate
[[ -e $result ]] || return # handle case where nothing matched
for candidate; do # for loop default behavior is to loop over "$@"
[[ $candidate -nt $result ]] && result=$candidate
done
printf '%s\n' "$result"
}
shopt -s extglob # enable extglobs, ie. !(...)
newest_file=$(newest "$JCMESS_FOLDER"/!(*Master*))
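If enabling extglob is not an option, the same -nt comparison can be combined with a case filter inside the loop. This is a sketch of an alternative, not the BashFAQ code: newest_excluding is a hypothetical helper, and the file names and timestamps are made up to mirror the listing in the question.

```shell
# newest_excluding PATTERN FILE...
# Print the newest FILE whose base name does not match the glob PATTERN.
newest_excluding() {
  local pattern=$1 candidate result=
  shift
  for candidate in "$@"; do
    case ${candidate##*/} in
      $pattern) continue ;;            # skip excluded names
    esac
    [ -e "$candidate" ] || continue    # skip an unmatched glob
    if [ -z "$result" ] || [ "$candidate" -nt "$result" ]; then
      result=$candidate
    fi
  done
  [ -n "$result" ] && printf '%s\n' "$result"
}

# Hypothetical test files with explicit modification times.
dir=$(mktemp -d)
touch -t 201412130000 "$dir/jmess_pb-2.jmess"
touch -t 201412200000 "$dir/jmess_pb-3.jmess"
touch -t 201506121305 "$dir/jmess_starmazeOnMaster.jmess"

newest_file=$(newest_excluding '*Master*' "$dir"/*)
newest_name=${newest_file##*/}
echo "$newest_name"
rm -rf "$dir"
```

The Master file is the newest overall, but it is filtered out, so the helper reports the newest of the remaining files.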

Script to generate a list to run a command

Sorry for the semi-vague title, I wasn't exactly sure how to word it. I'm looking to generate a list, excluding devices without a matching major/minor number, and run
lkdev -l hdiskn -a -c DATAn
where the hdisk and the DATA device have corresponding major/minor numbers.
In /dev, I have -
root# testbox /dev
#ls -l | grep -E "DATA|hdisk" | grep -v rhd
crw-r--r-- 1 root system 18, 3 Oct 03 10:50 DATA01
crw-r--r-- 1 root system 18, 2 Oct 03 10:50 DATA02
brw------- 1 root system 18, 1 Apr 12 2013 hdisk0
brw------- 1 root system 18, 0 Apr 12 2013 hdisk1
brw------- 1 root system 18, 3 Jan 14 2014 hdisk2
brw------- 1 root system 18, 2 Jan 14 2014 hdisk3
brw------- 1 root system 18, 4 Jan 14 2014 hdisk4
So essentially, I'm trying to create something where hdisk0,1,4 are all excluded, and hdisk2-3 are locked with DATA01 and DATA02, respectively.
I originally was trying to use sort and/or uniq to isolate/remove fields, but haven't been able to generate the desired list to even begin looking at running the command on each.
(As a note, I have several servers with hundreds of these. If it were just these few, I'd find a "simpler" way.)
(I can't test it right now, so please correct syntax errors if any)
You could play with sort and uniq, as shown below:
ls -l | grep -E "DATA|hdisk" | sed -e 's/.* \([0-9]*, *[0-9]*\).*/\1/' | sort |
uniq -c | grep -v " 1" | cut -c8- | while read majorminor; do
ls -l | grep " ${majorminor}" | sed 's/.* //'
done
However, you should start with selecting the right lines without counting:
for data in $(find /dev -type c -name "DATA*" | cut -d/ -f3); do
majorminor="$(ls -l $data | sed -e 's/.* \([0-9]*, *[0-9]*\).*/\1/')"
echo "$data <==> $(ls -l hdisk* | grep " ${majorminor}" | sed 's/.* //')"
done
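Another way to pair the devices is a single awk pass that indexes both device types by their major/minor pair. This is a sketch under assumptions: it reads the sample listing from the question rather than a live /dev, it expects ls to print the pair as two fields ("18," and "3"), and it only prints the lkdev commands instead of running them.

```shell
# Sample listing copied from the question; on a real system this would
# come from: ls -l /dev | grep -E "DATA|hdisk" | grep -v rhd
listing='crw-r--r-- 1 root system 18, 3 Oct 03 10:50 DATA01
crw-r--r-- 1 root system 18, 2 Oct 03 10:50 DATA02
brw------- 1 root system 18, 1 Apr 12 2013 hdisk0
brw------- 1 root system 18, 0 Apr 12 2013 hdisk1
brw------- 1 root system 18, 3 Jan 14 2014 hdisk2
brw------- 1 root system 18, 2 Jan 14 2014 hdisk3
brw------- 1 root system 18, 4 Jan 14 2014 hdisk4'

cmds=$(printf '%s\n' "$listing" | awk '
  { key = $5 $6; name = $NF }            # key is e.g. "18,3"
  name ~ /^DATA/  { data[key] = name }
  name ~ /^hdisk/ { disk[key] = name }
  END {
    # Emit a command only when both device types share a major/minor pair.
    for (k in data)
      if (k in disk)
        printf "lkdev -l %s -a -c %s\n", disk[k], data[k]
  }' | sort)

printf '%s\n' "$cmds"
```

Review the printed commands first; piping them to sh would then run them.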

Can sed be used with find to print Fields and Folder Name

This small script :
touch ilFldsN9LS.txt
ls -l | grep "^d" > /home/userB/PLAY/LibTESTxOutputFiles/ilFldsN9LS_testTEST.txt
produces file content of this format:
drwxr-xr-x 2 userB userB 4096 Mar 23 22:40 BASH_Collection_FolderNESTY
drwxr-xr-x 2 userB userB 4096 Mar 24 17:33 BASH_Collection_Functionality
What I wish to achieve is to get output very much like the above, but using find.
Turning to the use of find, this script: (which unlike the one previous, is recursive)
find . -type d \( ! -iname ".*" \) -exec ls -ld {} \; | grep "^d" \
| tee -a /home/UserB/PLAY/LibTESTxOutputFiles/ilFldsR9FB.txt
produces file content of this format:
drwxr-xr-x 2 UserB UserB 4096 Mar 24 17:33 ./BASH_Collection_Functionality
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 ./LibTESTxOutputFiles/AdditionalTESTresults
adding some AWK to the script, like so:
find . -type d \( ! -iname ".*" \) -exec ls -ld {} \; | grep "^d" | awk '{ sub(/\.\//, " ");print}'\
| tee -a /home/innocentxlii/PLAY/LibTESTxOutputFiles/ilFldsR9FB.txt
produces output with the ./ stripped from the front of the path
and pads the gap with an extra space to ease reading:
drwxr-xr-x 2 UserB UserB 4096 Mar 24 17:33 BASH_Collection_Functionality
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 LibTESTxOutputFiles/AdditionalTESTresults
Where I have gotten stuck, is I have been trying to use sed to keep the Fields, but to
have only the last Folder in the path, listed. For example the last item above would be:
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 AdditionalTESTresults
Ideas? I tried literally dozens of sed variants but have realized something must be
wrong with this approach.
How about this: sed -r 's/(.* )(.*)\/(.*)$/\1\3/g'
Check my test runs below:
$ echo "drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir1/dir2/dir3/dir4" | sed -r 's/(.* )(.*)\/(.*)$/\1\3/g'
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir4
$ echo "drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir1/dir2/dir3" | sed -r 's/(.* )(.*)\/(.*)$/\1\3/g'
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir3
$ echo "drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir1/dir2" | sed -r 's/(.* )(.*)\/(.*)$/\1\3/g'
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir2
$ echo "drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir1" | sed -r 's/(.* )(.*)\/(.*)$/\1\3/g'
drwxr-xr-x 2 UserB UserB 4096 Mar 25 16:04 dir1
NOTE: there is a space after the first asterisk in the sed command.
You don't need sed, you can do it all with awk
find . -type d \( ! -iname ".*" \) -exec ls -ld {} \; |
awk '{n=split($NF,a,"/"); sub($NF, " "a[n])}1'
grep "^d" is redundant since you are already finding directories with find . -type d
awk Explanation:
n=split($NF,a,"/") splits the last field (denoted by $NF) by / and assigns it to array a
n gives the length of the array
a[n] will therefore return the string following the last / (i.e. the inner most directory)
sub($NF, " "a[n]) replaces the last field (denoted by $NF) with a space for padding (as per example) + inner most directory (denoted by a[n])
awk '{...}1': the trailing 1 is an always-true pattern with no action, so awk applies its default action, which is print
EDIT: RE: for cases where directory contain spaces in name
find . -type d \( ! -iname ".*" \) -exec ls -ld {} \; |
awk -F '[0-9]+:[0-9]+ ' '{n=split($NF,a,"/"); sub($NF, " "a[n])}1'
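Specifying the input field separator with -F '[0-9]+:[0-9]+ ' (the modification time) ensures that the last field ($NF) is the whole file name, even if the directory name contains spaces. This assumes ls prints a time such as 16:04; for older entries ls prints a year instead, and the separator will not match.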
What about:
sed 's|\./||'
That removes the first ./ on each line, so your command becomes:
find . -type d \( ! -iname ".*" \) -exec ls -ld {} \; | sed 's|\./||'
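If GNU find is available, its -printf action can print the fields and the last path component directly, with no ls, sed or awk. This is a sketch assuming GNU find (-printf is not in POSIX); the scratch directory tree is hypothetical and borrows names from the question:

```shell
# Hypothetical directory tree mirroring the names in the question.
root=$(mktemp -d)
mkdir -p "$root/BASH_Collection_Functionality" \
         "$root/LibTESTxOutputFiles/AdditionalTESTresults" \
         "$root/.hidden"

# %M = permission string, %n = link count, %u/%g = owner and group,
# %s = size in bytes, %f = last path component only.
listing=$(find "$root" -mindepth 1 -type d ! -name '.*' \
            -printf '%M %n %u %g %s %f\n')

# Names only, sorted, to make the result easy to check.
names=$(find "$root" -mindepth 1 -type d ! -name '.*' -printf '%f\n' | sort)

printf '%s\n' "$listing"
rm -rf "$root"
```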
