More universal alternative to this sed command? - shell

I have a variable called $dirs storing directories in a dir tree:
root/animals/rats/mice
root/animals/cats
And I have another variable, $remove for example, that holds the names of the directories I want to remove from the dirs variable:
rats
crabs
I am using a for loop to do that:
for d in $remove; do
dirs=$(echo "$dirs" | sed "/\b$d\b/d")
done
After that loop is done, what I should be left with is:
root/animals/cats
because the loop found rats.
I have tested this approach on 3 systems but it only works as expected on 2.
Is there a more universal approach that would work on all shells?

You are looking for something like
echo "${dirs}" | grep -Ev "rats|crabs"
When you can't store the exclusion list in the format with |, try to change it on the fly:
echo "${dirs}" | grep -Ev "$(echo "${remove}" | tr -s "\n" "|" | sed 's/|$//')"
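A sketch of the same on-the-fly conversion, using POSIX `paste -s` to join the lines of `$remove` with `|` in one step. The sample values are the ones from the question. Like the grep -E answer above, this matches substrings, so a name like "ratsnest" would also be excluded. It also assumes $remove has no blank lines, because a blank line would add an empty alternative that matches every path.

```shell
#!/bin/sh
# Sample data from the question.
dirs='root/animals/rats/mice
root/animals/cats'
remove='rats
crabs'

# Join the exclusion list into one ERE alternation: "rats|crabs".
pattern=$(printf '%s\n' "$remove" | paste -s -d '|' -)

# -v keeps only the lines that match none of the alternatives.
result=$(printf '%s\n' "$dirs" | grep -Ev "$pattern")
printf '%s\n' "$result"
```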
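You can use grep's -f option (read the patterns from a file) without a temp file by using process substitution, which works in bash, ksh93 and zsh but not in POSIX sh:
echo "${dirs}" | grep -vf <(echo "${remove}")
I am not sure which of these solutions will be best supported.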
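The original sed command most likely fails on some systems because `\b` is a GNU regex extension, not part of POSIX. A minimal sketch that avoids regex entirely, assuming plain POSIX sh and directory names without spaces or glob characters, uses a `case` pattern to drop a path only when an excluded name is a whole path component:

```shell
#!/bin/sh
# Sample data from the question.
dirs='root/animals/rats/mice
root/animals/cats'
remove='rats
crabs'

# Keep a path only if none of the names in $remove is a whole component of it.
result=$(printf '%s\n' "$dirs" | while IFS= read -r path; do
  keep=yes
  for d in $remove; do              # word splitting: assumes names without spaces
    case "/$path/" in
      */"$d"/*) keep=no; break ;;   # the quoted "$d" is matched literally
    esac
  done
  if [ "$keep" = yes ]; then
    printf '%s\n' "$path"
  fi
done)
printf '%s\n' "$result"
```

Wrapping the path in slashes lets the same pattern match a name at the start, middle or end of the path, and unlike the grep -E version it will not drop a directory such as "ratsnest".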

Related

using cut on a line having multiple instances of the same delimiter - unix

I am trying to write a generic script which can have different file name inputs.
This is just a small part of my bash script.
For example, let's say folder 444-55 has 2 files:
qq.filter.vcf
ee.filter.vcf
I want my output to be:
qq
ee
I tried this and it worked:
ls /data2/delivery/Stack_overflow/1111_2222_3333_23/secondary/444-55/*.filter.vcf | sort | cut -f1 -d "." | xargs -n 1 basename
But let's say I have a folder like this:
/data2/delivery/Stack_overflow/de.1111_2222_3333_23/secondary/444-55/*.filter.vcf
My script's output would then be
de
de
How can I make it generic?
Thank you so much for your help.
Something like this in a script will "cut" it:
for i in /data2/delivery/Stack_overflow/1111_2222_3333_23/secondary/444-55/*.filter.vcf
do
basename "$i" | cut -f1 -d.
done | sort
Advantages:
it does not parse the output of ls, which is frowned upon
it applies basename before cut, so cut never sees the full path
it sorts last, so the output is guaranteed to be sorted by prefix
Just move the basename call earlier in the pipeline:
printf "%s\n" /data2/delivery/Stack_overflow/1111_2222_3333_23/secondary/444-55/*.filter.vcf |
xargs -n 1 basename |
sort |
cut -f1 -d.
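Both answers can also be done without starting basename or cut for every file by using shell parameter expansion. A sketch, assuming POSIX sh and `mktemp -d` (available on Linux and the BSDs) to recreate a hypothetical copy of the problematic "de." layout:

```shell
#!/bin/sh
# Hypothetical layout recreated in a temporary directory; the "de." prefix is
# the case that broke the original cut-on-the-full-path pipeline.
base=$(mktemp -d)
target="$base/de.1111_2222_3333_23/secondary/444-55"
mkdir -p "$target"
touch "$target/qq.filter.vcf" "$target/ee.filter.vcf"

out=$(for f in "$target"/*.filter.vcf; do
  name=${f##*/}                  # like basename: drop everything up to the last /
  printf '%s\n' "${name%%.*}"    # drop everything from the first dot onward
done | sort)
printf '%s\n' "$out"

rm -rf "$base"
```

Because the directory part is removed before looking for the first dot, dots anywhere in the path no longer matter.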

get the file name that has specific extension in shell script

I have three files in a directory, with names structured like this:
file.exe.trace, file.exe.trace.functions and file.exe.trace.netlog
How can I get file.exe as the file name?
In other words, I need to get the name of the file that has the .trace extension. Note that, as you can see, all the files contain the .trace part.
If $FILENAME holds the name, the root part can be obtained with ${FILENAME%%.trace*}:
for FILENAME in *.trace; do
echo "${FILENAME%%.trace*}"
done
You can also use basename:
for f in *.trace; do
basename "$f" ".trace"
done
Update: the previous loops won't process files that have extra extensions after .trace, such as .trace.functions, but the following sed will:
sed -r 's_(.*)\.trace.*_\1_' <(ls -c1)
You can also use it in a for loop instead:
for f in *.trace*; do
sed -r 's_(.*)\.trace.*_\1_' <<< "$f"
done
Try:
for each in *exe*trace* ; do echo "$each" | awk -F. '{print $1"."$2}' ; done | sort -u
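The `${FILENAME%%.trace*}` expansion from the first answer also covers the files with extra extensions if the glob is widened to `*.trace*` and duplicates are removed afterwards. A sketch, assuming POSIX sh and `mktemp -d` to recreate the three files from the question:

```shell
#!/bin/sh
# Recreate the three files from the question in a temporary directory.
dir=$(mktemp -d)
touch "$dir/file.exe.trace" "$dir/file.exe.trace.functions" "$dir/file.exe.trace.netlog"

roots=$(for f in "$dir"/*.trace*; do
  name=${f##*/}                    # strip the directory part
  printf '%s\n' "${name%%.trace*}" # strip the longest suffix matching ".trace*"
done | sort -u)                    # collapse duplicates: one line per root name
printf '%s\n' "$roots"

rm -rf "$dir"
```

This avoids parsing ls and does not depend on GNU sed's -r option.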

bash: how to display the name of the first directory that contains a certain file

I have this:
ls */file
dir1/file dir2/file dir3/file
But I need just the first directory name, like this: dir1
I did this:
IFS="/" read foo bar <<< "$(ls */file 2>/dev/null)"
echo $foo
dir1
And it works, but now I have a problem with subshell expansion over ssh. Is there a more elegant way (without subshells or sed) to do this?
If not, I'll then post a question regarding a completely different issue - expanding subshells over ssh.
for F in */file; do
D=${F%%/*}
break
done
Another:
F=(*/file); D=${F%%/*}
Try
ls */file | cut -d"/" -f1 | head -n 1
Use / as the separator; head -n 1 keeps only the first directory.
You can use the tricky double quotes, like so:
LIST=`ls */file`
echo "$LIST" | cut -d/ -f1
or
echo "$LIST" | awk -F/ '{print $1}'
You can use the read builtin with the -d option:
read -d '/' a < <(echo */file)
echo "$a"
dir1
If you just need the name of the folder, you can use:
ls -1 | awk -v n=1 'NR==n'
where n=1 is the first directory; change the value of n to get the nth directory.
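The question asks for a way without subshells or sed. A sketch of one more option, assuming POSIX sh: `set --` loads the glob matches into the positional parameters, and parameter expansion trims the first one. Only the sample-tree setup (hypothetical directories created with `mktemp -d`) uses a command substitution.

```shell
#!/bin/sh
# Hypothetical sample tree in a temporary directory (setup only).
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2" "$work/dir3"
touch "$work/dir1/file" "$work/dir2/file" "$work/dir3/file"
cd "$work" || exit 1

# No subshell, no sed: the glob fills the positional parameters in sorted order.
set -- */file
if [ -e "$1" ]; then          # guard: an unmatched glob stays as the literal "*/file"
  first=${1%%/*}              # keep only the text before the first slash
  printf '%s\n' "$first"
fi

cd / && rm -rf "$work"
```

Note that `set --` overwrites the script's own arguments; inside a function it only affects that function's parameters.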

How to loop over files in natural order in Bash?

I am looping over all the files in a directory with the following command:
for i in *.fas; do some_code; done;
However, I get them in this order
vvchr1.fas
vvchr10.fas
vvchr11.fas
vvchr2.fas
...
instead of
vvchr1.fas
vvchr2.fas
vvchr3.fas
...
which is natural order.
I have tried the sort command, but to no avail.
readarray -d '' entries < <(printf '%s\0' *.fas | sort -zV)
for entry in "${entries[@]}"; do
printf '%s\n' "$entry" # do something with $entry
done
where printf '%s\0' *.fas yields a NUL separated list of directory entries with the extension .fas, and sort -zV sorts them in natural order.
Note that you need GNU sort installed for this to work, and bash 4.4 or later for readarray -d.
With option sort -g it compares according to general numerical value
for FILE in `ls ./raw/ | sort -g`; do echo "$FILE"; done
0.log
1.log
2.log
...
10.log
11.log
This only works if the file names are numeric. If they are strings, you will get them in alphabetical order, e.g.:
for FILE in `ls ./raw/* | sort -g`; do echo "$FILE"; done
raw/0.log
raw/10.log
raw/11.log
...
raw/2.log
You will get the files in ASCII order. This means that vvchr10* comes before vvchr2*. I realise that you cannot rename your files (my bioinformatician brain tells me they contain chromosome data, and we simply don't call chromosome 1 "chr01"), so here's another solution (not using sort -V, which I can't find on any operating system I'm using):
ls *.fas | sed 's/^\([^0-9]*\)\([0-9]*\)/\1 \2/' | sort -k2,2n | tr -d ' ' |
while IFS= read -r filename; do
printf '%s\n' "$filename" # do work with $filename
done
This is a bit convoluted and will not work with filenames containing spaces.
Another solution: Suppose we'd like to iterate over the files in size-order instead, which might be more appropriate for some bioinformatics tasks:
du *.fas | sort -k1,1n |
while read -r filesize filename; do
printf '%s\n' "$filename" # do work with $filename
done
To reverse the sorting, just add r after -k1,1n (to get -k1,1nr).
You mean that files with the number 10 come before files with the number 3 in your list? That's because ls (like the shell's glob) sorts names as plain strings, character by character, so something-10.whatever is smaller than something-3.whatever.
One solution is to rename all files so they have the same number of digits (the files with a single digit in them get a leading 0).
while IFS= read -r file ; do
ls -l "$file" # or whatever
done < <(find . -name '*.fas' 2>/dev/null | sed -r -e 's/([0-9]+)/ \1/' | sort -k 2 -n | sed -e 's/ //;')
Solves the problem, presuming the file naming stays consistent, doesn't rely on very-recent versions of GNU sort, does not rely on reading the output of ls and doesn't fall victim to the pipe-to-while problems.
Like @Kusalananda's solution (perhaps easier to remember?), but catering for all files(?):
mapfile -t array < <(ls | sed 's/[^0-9]*\([0-9]*\)\..*/\1 &/' | sort -n | sed 's/^[^ ]* //')
for x in "${array[@]}"; do echo "$x"; done
In essence add a sort key, sort, remove sort key.
EDIT: moved comment to appropriate solution
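The same add-a-key, sort, remove-the-key idea can be written in plain POSIX sh without sort -V and without parsing ls. A minimal sketch, assuming every name is the fixed prefix vvchr followed by a number and .fas, and using hypothetical sample files in a `mktemp -d` directory:

```shell
#!/bin/sh
# Hypothetical sample files in a temporary directory.
dir=$(mktemp -d)
for n in 1 2 3 10 11; do touch "$dir/vvchr$n.fas"; done

# Decorate each name with its number, sort numerically on that key, then undecorate.
sorted=$(for f in "$dir"/*.fas; do
  name=${f##*/}
  num=${name#vvchr}             # assumes the fixed prefix "vvchr" from the question
  num=${num%.fas}
  printf '%s %s\n' "$num" "$name"
done | sort -k1,1n | cut -d' ' -f2-)
printf '%s\n' "$sorted"

rm -rf "$dir"
```

The sort key is extracted by the shell rather than by sed, so the numeric part never has to be separated with an inserted space that could clash with spaces in the names.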
Use sort -rh and a while loop:
du -sh * | sort -rh | grep 'avi$' | awk '{print $2}' | while IFS= read -r f; do fp="$PWD/$f"; echo "$fp"; done

How can I get the output of a command into a bash variable?

I can't remember how to capture the result of an execution into a variable in a bash script.
Basically I have a folder full of backup files of the following format:
backup--my.hostname.com--1309565.tar.gz
I want to loop over a list of all files and pull the numeric part out of the filename and do something with it, so I'm doing this so far:
HOSTNAME=`hostname`
DIR="/backups/"
SUFFIX=".tar.gz"
PREFIX="backup--$HOSTNAME--"
TESTNUMBER=9999999999
#move into the backup dir
cd $DIR
#get a list of all backup files in there
FILES=$PREFIX*$SUFFIX
#Loop over the list
for F in $FILES
do
#rip the number from the filename
NUMBER=$F | sed s/$PREFIX//g | sed s/$SUFFIX//g
#compare the number with another number
if [ "$NUMBER" -lt "$TESTNUMBER" ]; then
: # do something
fi
done
I know the "$F | sed s/$PREFIX//g | sed s/$SUFFIX//g" part rips the number correctly (though I appreciate there might be a better way of doing this), but I just can't remember how to get that result into NUMBER so I can reuse it in the if statement below.
Use the $(...) syntax (or backticks):
NUMBER=$(echo "$F" | sed "s/$PREFIX//g" | sed "s/$SUFFIX//g")
or
NUMBER=`echo "$F" | sed "s/$PREFIX//g" | sed "s/$SUFFIX//g"`
(I prefer the first one, since it is easier to see when multiple ones nest.)
Backticks if you want to be portable to older shells (the original Bourne sh):
NUMBER=`echo "$F" | sed "s/$PREFIX//g" | sed "s/$SUFFIX//g"`
Otherwise, use NUMBER=$(echo "$F" | sed "s/$PREFIX//g" | sed "s/$SUFFIX//g"). It's better and supports nesting more readily.
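As an alternative to the sed pipeline, the number can be cut out with parameter expansion, which needs no command substitution at all. A sketch, assuming POSIX sh and the file name format from the question, with a fixed hypothetical hostname in place of `hostname`:

```shell
#!/bin/sh
# Values follow the question; the hostname is replaced by a fixed, hypothetical one.
PREFIX="backup--my.hostname.com--"
SUFFIX=".tar.gz"
TESTNUMBER=9999999999
F="backup--my.hostname.com--1309565.tar.gz"

# Quoting the patterns makes them literal, so the dots in the hostname and
# suffix are not treated as wildcards (unlike in the sed version).
NUMBER=${F#"$PREFIX"}
NUMBER=${NUMBER%"$SUFFIX"}

if [ "$NUMBER" -lt "$TESTNUMBER" ]; then
  printf '%s is smaller than %s\n' "$NUMBER" "$TESTNUMBER"
fi
```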
