Remove suffix as well as prefix from path in bash - bash

I have filepaths of the form:
../healthy_data/F35_HC_532d.dat
I want to extract F35_HC_532d from this. I can remove prefix and suffix from this filename in bash as:
for i in ../healthy_data/*; do echo ${i#../healthy_data/}; done # REMOVES PREFIX
for i in ../healthy_data/*; do echo ${i%.dat}; done # REMOVES SUFFIX
How can I combine these so that in a single command I would be able to remove both and extract only the part that I want?

You can use BASH regex for this like this and print captured group #1:
for file in ../healthy_data/*; do
[[ $file =~ .*/([_[:alnum:]]+)\.dat$ ]] && echo "${BASH_REMATCH[1]}"
done

If you can use Awk, it is pretty simple,
for i in ../healthy_data/*
do
stringNeeded=$(awk -F/ '{split($NF,temp,"."); print temp[1]}' <<<"$i")
printf "%s\n" "$stringNeeded"
done
The -F/ splits the input string on / character, and $NF represents the last field in the string in that case, F35_HC_532d.dat, now the split() function is called with the de-limiter . to extract the part before the dot.
The options/functions in the above Awk are POSIX compatible.
Also bash does not support nested parameter expansions, you need to modify in two fold steps something like below:-
tempString="${i#*/*/}"
echo "${tempString%.dat}"
In a single-loop,
for i in ../healthy_data/*; do tempString="${i#*/*/}"; echo "${tempString%.dat}" ; done
The two fold syntax here, "${i#*/*/}" part just stores the F35_HC_532d.dat into the variable tempString and in that variable we are removing the .dat part as "${tempString%.dat}"

If all files end with .dat (as you confirmed) you can use the basename command:
basename -s .dat /path/to/files/*
If there are many(!) of those files, use find to avoid an argument list too long error:
find /path/to/files -maxdepth 1 -name '*.dat' -exec basename -s .dat {} +
For a shell script which needs to deal if any number of .dat files use the second command!

Do you count this as one step?
for i in ../healthy_data/*; do
sed 's#\.[^.]*##'<<< "${i##*/}"
done

You can't strip both a prefix and suffix in a single parameter expansion.
However, this can be accomplished in a single loop using parameter expansion operations only. Just save the prefix stripped expansion to a variable and use expansion again to remove its suffix:
for file in ../healthy_data/*; do
prefix_stripped="${file##*\/healthy_data\/}"
echo "${prefix_stripped%.dat}"
done

If you are on zsh, one way to achieve this without the need for defining another variable is
for i in ../healthy_data/*; do echo "${${i#../healthy_data/}%.dat}"; done
This removes prefix and suffix in one step.

In your specific example the prefix stems from the fact that the files are located in a different directory. You can get rid of the prefix by cding in this case.
(cd ../healthy_data ; for i in *; do echo ${i%.dat}; done)
The (parens) invoke a sub shell process and your current shell stays where it is. If you don't want a sub shell you can cd back easily:
cd ../healthy_data ; for i in *; do echo ${i%.dat}; done; cd -

Related

Rename all files with the name pattern *.[a-z0-9].bundle.*, to replace the [a-z0-9] with a given string

On building apps with the Angular 2 CLI, I get outputs which are named, for instance:
inline.d41d8cd.bundle.js
main.6d2e2e89.bundle.js
etc.
What I'm looking to do is create a bash script to rename the files, replacing the digits between the first two . with some given generic string. Tried a few things, including sed, but I couldn't get them to work. Can anyone suggest a bash script to get this working?
In pure bash regEx using the =~ variable (supported from bash 3.0 onwards)
#!/bin/bash
string_to_replace_with="sample"
for file in *.js
do
[[ $file =~ \.([[:alnum:]]+).*$ ]] && string="${BASH_REMATCH[1]}"
mv -v "$file" "${file/$string/$string_to_replace_with}"
done
For your given input files, running the script
$ bash script.sh
inline.d41d8cd.bundle.js -> inline.sample.bundle.js
main.6d2e2e89.bundle.js -> main.sample.bundle.js
Short, powerfull and efficient:
Use this (perl) tool. And use Perl Regular Expression:
rename 's/\.\X{4,8}\./.myString./' *.js
or
rename 's/\.\X+\./.myString./' *.js
A pure-bash option:
shopt -s extglob # so *(...) will work
generic_string="foo" # or whatever else you want between the dots
for f in *.bundle.js ; do
mv -vi "$f" "${f/.*([^.])./.${generic_string}.}"
done
The key is the replacement ${f/.*([^.]./.${generic_string}.}. The pattern /.*([^.])./ matches the first occurrence of .<some text>., where <some text> does not include a dot ([^.]) (see the man page). The replacement .${generic_string}. replaces that with whatever generic string you want. Other than that, double-quote in case you have spaces, and there you are!
Edit Thanks to F. Hauri - added -vi to mv. -v = show what is being renamed; -i = prompt before overwrite (man page).

Extract number from filename

I'm doing a bash script, which automatically can run simulations for me. In order to start the simulation, this other script need an input, which should be dictated by the name of the folder.
So if I have a folder names No200, then I want to extract the number 200. So far, what I have is
PedsDirs=`find . -type d -maxdepth 1`
for dir in $PedsDirs
do
if [ $dir != "." ]; then
NoOfPeds = "Number appearing in the name dir"
fi
done
$ dir="No200"
$ echo "${dir#No}"
200
In general, to remove a prefix use ${variable-name#prefix}; to remove a suffix: ${variable-name%suffix}.
Bonus tip: avoid using find. It introduces many problems, especially when your files/directories contain whitespace. Use bash builtins glob features instead:
for dir in No*/ # Loops over all directories starting with 'No'.
do
dir="${dir%/}" # Removes the trailing slash from the directory name.
NoOfPeds="${dir#No}" # Removes the 'No' prefix.
done
Also, try to always use quotes around variable names to avoid accidental expansion (i.e. use "$dir" instead of just $dir).
Be careful, as you have to join the = to the variable name in bash. To get just the number, you can do something like:
NoOfPeds=`echo $dir | tr -d -c 0-9`
(that is, delete whatever char that it is not a number). All numbers will be then in NoOfPeds.

Storing the first 4 characters of a filename into a variable

This follwing code is part of a script I am writing. Now, for the purposes of this script, I am assuming there is only 1 file in ./src, so this loop should only execute once. Now, in the loop somewhere, I want to take the first 4 characters of $f (the filename) and store it in another variable. I know there is the cut command but I'm not sure if or how that would be used here because I thought cut was used for contents of files, not the files themselves.
for f in `ls ./src`
do
echo $f
cd tmp
f="../src/$f"
sh "$f"
done
From http://tldp.org/LDP/abs/html/string-manipulation.html
Substring Extraction
${string:position}
Extracts substring from $string at $position.
If the $string parameter is "*" or "#", then this extracts the positional parameters, [1] starting at $position.
${string:position:length}
Extracts $length characters of substring from $string at $position.
Example
shortName=${f:0:4}
Have fun!
You can use pure bash way:
${parameter:offset:length}
i.e. to get first chars of $HOME variable:
echo ${HOME:0:4}
btw your script is also faulty (never parse ls output). It should be like this:
for f in ./src/*
do
echo $f
cd tmp
f="../src/$f"
first4=${f:0:4}
sh "$f"
done

Basename puts single quotes around variable

I am writing a simple shell script to make automated backups, and I am trying to use basename to create a list of directories and them parse this list to get the first and the last directory from the list.
The problem is: when I use basename in the terminal, all goes fine and it gives me the list exactly as I want it. For example:
basename -a /var/*/
gives me a list of all the directories inside /var without the / in the end of the name, one per line.
BUT, when I use it inside a script and pass a variable to basename, it puts single quotes around the variable:
while read line; do
dir_name=$(echo $line)
basename -a $dir_name/*/ > dir_list.tmp
done < file_with_list.txt
When running with +x:
+ basename -a '/Volumes/OUTROS/backup/test/*/'
and, therefore, the result is not what I need.
Now, I know there must be a thousand ways to go around the basename problem, but then I'd learn nothing, right? ;)
How to get rid of the single quotes?
And if my directory name has spaces in it?
If your directory name could include spaces, you need to quote the value of dir_name (which is a good idea for any variable expansion, whether you expect spaces or not).
while read line; do
dir_name=$line
basename -a "$dir_name"/*/ > dir_list.tmp
done < file_with_list.txt
(As jordanm points out, you don't need to quote the RHS of a variable assignment.)
Assuming your goal is to populate dir_list.tmp with a list of directories found under each directory listed in file_with_list.txt, this might do.
#!/bin/bash
inputfile=file_with_list.txt
outputfile=dir_list.tmp
rm -f "$outputfile" # the -f makes rm fail silently if file does not exist
while read line; do
# basic syntax checking
if [[ ! ${line} =~ ^/[a-z][a-z0-9/-]*$ ]]; then
continue
fi
# collect targets using globbing
for target in "$line"/*; do
if [[ -d "$target" ]]; then
printf "%s\n" "$target" >> $outputfile
fi
done
done < $inputfile
As you develop whatever tool will process your dir_list.tmp file, be careful of special characters (including spaces) in that file.
Note that I'm using printf instead of echo so that targets whose first character is a hyphen won't cause errors.
This might work
while read; do
find "$REPLY" >> dir_list.tmp
done < file_with_list.txt

Renaming multiples files with a bash loop

I need to rename 45 files, and I don't want to do it one by one. These are the file names:
chr10.fasta chr13_random.fasta chr17.fasta chr1.fasta chr22_random.fasta chr4_random.fasta chr7_random.fasta chrX.fasta
chr10_random.fasta chr14.fasta chr17_random.fasta chr1_random.fasta chr2.fasta chr5.fasta chr8.fasta chrX_random.fasta
chr11.fasta chr15.fasta chr18.fasta chr20.fasta chr2_random.fasta chr5_random.fasta chr8_random.fasta chrY.fasta
chr11_random.fasta chr15_random.fasta chr18_random.fasta chr21.fasta chr3.fasta chr6.fasta chr9.fasta
chr12.fasta chr16.fasta chr19.fasta chr21_random.fasta chr3_random.fasta chr6_random.fasta chr9_random.fasta
chr13.fasta chr16_random.fasta chr19_random.fasta chr22.fasta chr4.fasta chr7.fasta chrM.fasta
I need to change the extension ".fasta" to ".fa". I'm trying to write a bash script to do it:
for i in $(ls chr*)
do
NEWNAME = `echo $i | sed 's/sta//g'`
mv $i $NEWNAME
done
But it doesn't work. Can you tell me why, or give another quick solution?
Thanks!
Several mistakes here:
NEWNAME = should be without space. Here bash is looking for a command named NEWNAME and that fails.
you parse the output of ls. this is bad if you had files with spaces. Bash can build itself a list of files with the glob operator *.
You don't escape "$i" and "$NEWNAME". If any of them contains a space it makes two arguments for mv.
If a file name begins with a dash mv will believe it is a switch. Use -- to stop argument processing.
Try:
for i in chr*
do
mv -- "$i" "${i/%.fasta/.fa}"
done
or
for i in chr*
do
NEWNAME="${i/%.fasta/.fa}"
mv -- "$i" "$NEWNAME"
done
The "%{var/%pat/replacement}" looks for pat only at the end of the variable and replaces it with replacement.
for f in chr*.fasta; do mv "$f" "${f/%.fasta/.fa}"; done
If you have the rename command, you can do:
rename .fasta .fa chr*.fasta

Resources