bash: get directory from certain part of path onwards

I have an arbitrary path which contains the directory mydir:
/some/path/to/mydir/further/path/file.ext
I want to get the part after mydir, in this example:
/further/path/file.ext
Please note that the levels of subdirectories are also arbitrary, so a path like
/yet/another/long/path/to/mydir/file.ext
is also possible (where the result would be "file.ext").
The first occurrence of mydir should be used, so the path
/path/mydir/some/other/path/mydir/path/file.ext
should result in
/some/other/path/mydir/path/file.ext
How can one do this with bash?
Note: it is assumed that mydir always appears enclosed between slashes.

after=${path#*/mydir/}
if [ "$path" = "$after" ]; then
echo "Path does not contain /mydir/" >&2; exit 1
fi
after="/$after"
In line 1, # removes the shortest prefix that matches the pattern, and * is the usual wildcard; because the shortest match wins, only the first occurrence of /mydir/ is stripped. To be safe against directories like .../mydirectaccess/..., I included the slashes at both ends of mydir. Line 5 just prepends the slash that line 1 removed.
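The same steps can be wrapped in a reusable function. This is a minimal sketch, assuming bash; the function name path_after_dir and its second parameter (the directory name) are hypothetical, and it returns a non-zero status instead of exiting the whole script:

```shell
#!/usr/bin/env bash
# Sketch: print the part of a path after the FIRST "/<dir>/", with a leading slash.
# path_after_dir is a hypothetical helper name.
path_after_dir() {
    local path=$1 dir=$2
    # "$dir" is quoted inside the pattern so it is matched literally.
    # Single # = shortest matching prefix, so the first occurrence is used.
    local after=${path#*/"$dir"/}
    if [ "$after" = "$path" ]; then
        echo "Path does not contain /$dir/" >&2
        return 1
    fi
    printf '/%s\n' "$after"
}

path_after_dir /some/path/to/mydir/further/path/file.ext mydir
path_after_dir /path/mydir/some/other/path/mydir/path/file.ext mydir
path_after_dir /some/mydirectaccess/file.ext mydir || echo "no match"
```

The first two calls print /further/path/file.ext and /some/other/path/mydir/path/file.ext; the last path has no /mydir/ component, so it reports an error instead of matching inside mydirectaccess.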

Using Shell Parameter Expansion:
$ mydir="/some/path/to/mydir/further/path/file.ext"
$ echo "${mydir#*mydir}"
/further/path/file.ext
$ mydir="/path/mydir/some/other/path/mydir/path/file.ext"
$ echo "${mydir#*mydir}"
/some/other/path/mydir/path/file.ext
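Why the first occurrence is used: a single # removes the shortest matching prefix, while ## removes the longest one and would stop at the last mydir instead. A small comparison (the variable name p is arbitrary):

```shell
#!/usr/bin/env bash
# Sketch: shortest (#) versus longest (##) prefix removal on the question's
# multi-mydir example. As in the answer above, the pattern has no slashes,
# so it would also match inside a name such as "notmydir".
p=/path/mydir/some/other/path/mydir/path/file.ext

first=${p#*mydir}    # stops at the first "mydir"
last=${p##*mydir}    # stops at the last "mydir"

printf '%s\n' "$first" "$last"
# /some/other/path/mydir/path/file.ext
# /path/file.ext
```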

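Go through sed. Example:
echo /some/path/to/mydir/further/path/file.ext | sed 's|.*/mydir/|/|'
Because .* is greedy, this removes everything up to the last /mydir/, so it only gives the requested result when mydir appears once in the path.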
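Since .* in sed always runs to the last mydir, a text-processing alternative that respects the first occurrence is awk's index(), which returns the position of the first match. This is a sketch using awk rather than sed; the function name after_first_dir is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: print the part of each input line after the FIRST "/<dir>/",
# keeping the leading slash. Lines without "/<dir>/" print nothing.
# after_first_dir is a hypothetical helper name.
after_first_dir() {
    awk -v d="/$1/" '{
        i = index($0, d)                             # 1-based position of first match, 0 if none
        if (i) print substr($0, i + length(d) - 1)   # start at the trailing slash of d
    }'
}

echo /path/mydir/some/other/path/mydir/path/file.ext | after_first_dir mydir
# /some/other/path/mydir/path/file.ext
```

index(), substr() and length() are all standard POSIX awk, so this should not depend on a particular awk implementation.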

Using bash, you can do something like this:
V=/yet/another/long/path/to/mydir/file.ext
R=${V#*mydir/}
echo "$R"
file.ext

Related

Add file by (*) star character to variable in for loop

I have a folder structure where two files are in a folder. The files have long names, but are distinguished by R1 and R2. Note that I am running this over many folders using the for loop, but I am keeping it simple for this example. I am wondering how to correctly refer to the files in the loop using a (*) star wildcard, without having to type in the full file names. My attempt is below:
#!/bin/bash
for item in Folder_Directory:
do
forward=$item/*R1*
reverse=$item/*R2*
bbmap.sh ref=reference.fna in1=$forward in2=$reverse outu=Unmapped.fasta
done
The output I am getting is an error because the variable is not identifying the desired file:
Error:
align2.BBMap build=1 overwrite=true fastareadlen=500 ref=reference.fna
in1=Folder_Dictory/*R1* in2=Folder_Dictory/*R2* outu=Folder_Dictory/Unmapped.fastq
In this example I could autocomplete the file names; however, when I expand this loop to include multiple folders, that is no longer practical. Using (*) wildcards was my first approach; any other suggestions or fixes are greatly appreciated.
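The problem is that pathname expansion is not performed in a variable assignment, so $forward holds the literal pattern Folder_Directory/*R1*. When the unquoted in1=$forward is later expanded, the shell treats the whole word in1=Folder_Directory/*R1* as a glob, finds no file that matches it with the literal in1= prefix, and so leaves the wildcard unexpanded.
You probably want to expand the wildcard before passing it to the command, for instance: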
for item in Folder_Directory
do
forward=$item/*R1*
reverse=$item/*R2*
# The unquoted $forward inside $(echo ...) is glob-expanded to the matching file name(s)
bbmap.sh ref=reference.fna in1="$(echo $forward)" in2="$(echo $reverse)" outu=Unmapped.fasta
done
This will of course still misbehave if the wildcard expands to more than one file, or to none (the pattern is then passed through literally).
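A slightly more defensive sketch stores each glob's expansion in an array, so you can check that exactly one file matched before calling bbmap.sh. The helper name single_match is hypothetical, and nullglob is a bash option this sketch adds so that an unmatched pattern expands to nothing instead of staying literal:

```shell
#!/usr/bin/env bash
# Sketch: resolve DIR/PATTERN to exactly one file, or fail.
# single_match is a hypothetical helper name, not part of bbmap or bash.
shopt -s nullglob   # unmatched globs expand to nothing, not to the pattern itself

single_match() {
    local dir=$1 pattern=$2
    local -a matches
    matches=("$dir"/$pattern)   # $pattern is left unquoted so it is glob-expanded
    if [ "${#matches[@]}" -ne 1 ]; then
        echo "expected one match for $dir/$pattern, found ${#matches[@]}" >&2
        return 1
    fi
    printf '%s\n' "${matches[0]}"
}
```

It could then be used in the loop from the question, for example:
for item in Folder_Directory
do
forward=$(single_match "$item" '*R1*') || continue
reverse=$(single_match "$item" '*R2*') || continue
bbmap.sh ref=reference.fna in1="$forward" in2="$reverse" outu=Unmapped.fasta
done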
If you only want two files from your folder structure, I believe it would be better to use find to search for the files and assign them to separate variables; I don't see the need for a for loop here.
forward=$(find Folder_Directory -type f -name "*R1*")
reverse=$(find Folder_Directory -type f -name "*R2*")
bbmap.sh ref=reference.fna in1="$forward" in2="$reverse" outu=Unmapped.fasta
Here is how it behaves in a directory where the only name matching f* is file:
$ test=f*
$ echo $test
file
But
$ echo "$test"
f*
And
$ test2=$test
$ echo "$test" $test2
f* file
$ echo "$test" "$test2"
f* f*
To make it work, you have to do something like this:
$ test3="$(echo $test)"
$ echo "$test" "$test2" "$test3"
f* f* file

Assign a variable in a bash script dynamically that parses an OS path

I can't seem to get this right.
When I run this code, I need the file name in a variable for later use. How should I do this?
#!/bin/bash
foo="../../../data/audio/serval-data/wav-16bit-16khz/ytdl/balanced_train/vidzyGjrJfE_rg.wav"
echo $foo
echo "${foo%.*}" | cut -d "/" -f10;
# fid=vidzyGjrJfE_rg
I want a new variable fid with the value "vidzyGjrJfE_rg".
You can do this with a couple of parameter expansions (see melpomene's answer), but FYI that is exactly what basename exists to do:
$ foo="../../../data/audio/serval-data/wav-16bit-16khz/ytdl/balanced_train/vidzyGjrJfE_rg.wav"
$ fid=$(basename "$foo" '.wav')
$ echo "$fid"
vidzyGjrJfE_rg
You can do it like this:
fid="${foo##*/}"
fid="${fid%.*}"
Just as % removes a matching suffix from a variable, # removes a prefix (and ## removes the longest matching prefix). See https://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion.
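The shortest versus longest distinction matters once a name has more than one dot. A small sketch on a made-up path (/data/archive.tar.gz is purely illustrative):

```shell
#!/usr/bin/env bash
# Sketch: the four trimming operators on a made-up path with two dots.
p=/data/archive.tar.gz

name=${p##*/}             # longest prefix matching "*/" removed:  archive.tar.gz
short_prefix=${p#*/}      # shortest prefix matching "*/" removed: data/archive.tar.gz
short_suffix=${name%.*}   # shortest suffix matching ".*" removed: archive.tar
long_suffix=${name%%.*}   # longest suffix matching ".*" removed:  archive

printf '%s\n' "$name" "$short_prefix" "$short_suffix" "$long_suffix"
```

For vidzyGjrJfE_rg.wav either suffix form gives the same result, but with several dots ${fid%.*} keeps everything before the last dot, while ${fid%%.*} keeps only what precedes the first one.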

Remove suffix as well as prefix from path in bash

I have filepaths of the form:
../healthy_data/F35_HC_532d.dat
I want to extract F35_HC_532d from this. I can remove the prefix and the suffix separately in bash:
for i in ../healthy_data/*; do echo ${i#../healthy_data/}; done # REMOVES PREFIX
for i in ../healthy_data/*; do echo ${i%.dat}; done # REMOVES SUFFIX
How can I combine these so that in a single command I would be able to remove both and extract only the part that I want?
You can use a Bash regex for this and print captured group 1:
for file in ../healthy_data/*; do
[[ $file =~ .*/([_[:alnum:]]+)\.dat$ ]] && echo "${BASH_REMATCH[1]}"
done
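The [_[:alnum:]]+ class above skips names containing characters such as - or spaces. A sketch of the same idea that captures any run of non-slash characters before .dat instead (the function name dat_stem is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: capture everything between the last "/" and a trailing ".dat".
# dat_stem is a hypothetical helper name.
dat_stem() {
    local re='([^/]+)\.dat$'   # kept in a variable so [[ =~ ]] treats it as a regex
    [[ $1 =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

for file in ../healthy_data/*.dat; do
    dat_stem "$file"
done
```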
If you can use Awk, it is pretty simple:
for i in ../healthy_data/*
do
stringNeeded=$(awk -F/ '{split($NF,temp,"."); print temp[1]}' <<<"$i")
printf "%s\n" "$stringNeeded"
done
-F/ splits the input string on the / character, so $NF (the last field) is F35_HC_532d.dat. The split() function is then called with . as the delimiter, and temp[1] holds the part before the dot.
The options/functions in the above Awk are POSIX compatible.
Also, bash does not support nested parameter expansions, so you need to do it in two steps, something like this:
tempString="${i#*/*/}"
echo "${tempString%.dat}"
As a single loop:
for i in ../healthy_data/*; do tempString="${i#*/*/}"; echo "${tempString%.dat}" ; done
Here "${i#*/*/}" stores F35_HC_532d.dat in the variable tempString, and "${tempString%.dat}" then removes the .dat suffix from it.
If all files end with .dat (as you confirmed), you can use the basename command:
basename -s .dat /path/to/files/*
If there are many(!) of those files, use find to avoid an "Argument list too long" error:
find /path/to/files -maxdepth 1 -name '*.dat' -exec basename -s .dat {} +
For a shell script that needs to deal with any number of .dat files, use the second command!
Do you count this as one step?
for i in ../healthy_data/*; do
sed 's#\.[^.]*##'<<< "${i##*/}"
done
You can't strip both a prefix and suffix in a single parameter expansion.
However, this can be accomplished in a single loop using parameter expansion operations only. Just save the prefix-stripped expansion to a variable and use expansion again to remove its suffix:
for file in ../healthy_data/*; do
prefix_stripped="${file##*/healthy_data/}"
echo "${prefix_stripped%.dat}"
done
If you are on zsh, one way to achieve this without defining another variable is:
for i in ../healthy_data/*; do echo "${${i#../healthy_data/}%.dat}"; done
This removes prefix and suffix in one step.
In your specific example, the prefix stems from the fact that the files are located in a different directory. In this case you can get rid of the prefix by cd-ing into that directory first.
(cd ../healthy_data && for i in *; do echo "${i%.dat}"; done)
The (parens) run the commands in a subshell, so your current shell stays where it is. If you don't want a subshell, you can cd back easily:
cd ../healthy_data ; for i in *; do echo "${i%.dat}"; done; cd -

Extract number from filename

I'm writing a bash script that runs simulations for me automatically. To start a simulation, another script needs an input, which should be determined by the name of the folder.
So if I have a folder named No200, I want to extract the number 200. So far, what I have is:
PedsDirs=`find . -type d -maxdepth 1`
for dir in $PedsDirs
do
if [ $dir != "." ]; then
NoOfPeds = "Number appearing in the name dir"
fi
done
$ dir="No200"
$ echo "${dir#No}"
200
In general, to remove a prefix use ${variable_name#prefix}; to remove a suffix, use ${variable_name%suffix}.
Bonus tip: avoid looping over the output of find. The unquoted $PedsDirs is split on whitespace, which breaks as soon as a file or directory name contains a space. Use bash's built-in glob features instead:
for dir in No*/ # Loops over all directories starting with 'No'.
do
dir="${dir%/}" # Removes the trailing slash from the directory name.
NoOfPeds="${dir#No}" # Removes the 'No' prefix.
done
Also, try to always use quotes around variable names to avoid accidental word splitting and glob expansion (i.e. use "$dir" instead of just $dir).
Be careful: in bash there must be no spaces around the = in an assignment. To get just the number, you can do something like:
NoOfPeds=$(echo "$dir" | tr -dc '0-9')
(that is, delete every character that is not a digit). All the digits in the name then end up, concatenated, in NoOfPeds.
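If a folder name could contain more than one number, deleting every non-digit concatenates them. A sketch that keeps only the first run of digits, using bash's =~ operator (the function name first_number is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: print the first run of digits in a name, or fail if there is none.
# first_number is a hypothetical helper name.
first_number() {
    [[ $1 =~ [0-9]+ ]] || return 1
    printf '%s\n' "${BASH_REMATCH[0]}"   # whole match = first digit run
}

first_number No200       # 200
first_number No200run3   # 200, where tr -dc '0-9' would give 2003
```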

Running command on substring of every file

Let's say I've some files like:
samplea.txt
sampleb.txt
samplec.txt
And I want to run some command with this form:
./cmd -foo a.xml -bar samplea.txt
First I've tried to
for file in "./*.txt"
do
echo -e $file
done
But this prints every file on a single line. Trying:
echo -e $file\n
does not produce the expected output (one line per file) either.
I couldn't even get past the first part of the problem, which is running a command on each file (that could be achieved with find (...) -exec), but what I really want is to extract a substring of each name.
Doing:
echo ${file:1}
won't work, since I would first have to split the file names to get the "a", "b", "c" from each one.
I'm sorry if it sounds confusing, but it's my first bash script.
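Do not quote the wildcard expression: with "./*.txt" in quotes, the loop runs only once with the literal pattern, and the unquoted $file in echo then expands it to all the names on one line. You can use parameter expansion to remove parts of a string: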
for file in sample*.txt ; do
part=${file#sample} # Remove "sample" at the beginning.
part=${part%.txt} # Remove ".txt" at the end.
./cmd -foo "$part".xml -bar "$file"
done
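Before running the real command, a dry run can confirm the extracted parts. This sketch prints each ./cmd invocation instead of executing it; the function name dry_run is hypothetical, and nullglob is an addition so that a directory with no matching files produces no iterations instead of one pass over the literal pattern:

```shell
#!/usr/bin/env bash
# Sketch: print the ./cmd line for each sampleX.txt instead of running it.
# dry_run is a hypothetical helper name.
shopt -s nullglob   # no match -> the loop body never runs

dry_run() {
    local file part
    for file in sample*.txt; do
        part=${file#sample}   # samplea.txt -> a.txt
        part=${part%.txt}     # a.txt -> a
        printf '%s\n' "./cmd -foo $part.xml -bar $file"
    done
}
```

In a directory holding samplea.txt and sampleb.txt, dry_run prints one line per file; once the output looks right, the loop above runs the actual commands.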
