Extract number from filename - bash

I'm writing a bash script that automatically runs simulations for me. To start a simulation, another script needs an input, which should be dictated by the name of the folder.
So if I have a folder named No200, then I want to extract the number 200. So far, what I have is
PedsDirs=`find . -type d -maxdepth 1`
for dir in $PedsDirs
do
if [ $dir != "." ]; then
NoOfPeds = "Number appearing in the name dir"
fi
done

$ dir="No200"
$ echo "${dir#No}"
200
In general, to remove a prefix use ${variable-name#prefix}; to remove a suffix: ${variable-name%suffix}.
Bonus tip: avoid using find. It introduces many problems, especially when your files/directories contain whitespace. Use bash's built-in glob features instead:
for dir in No*/ # Loops over all directories starting with 'No'.
do
dir="${dir%/}" # Removes the trailing slash from the directory name.
NoOfPeds="${dir#No}" # Removes the 'No' prefix.
done
Also, try to always put quotes around variable expansions to avoid accidental word splitting and globbing (i.e. use "$dir" instead of just $dir).
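As a minimal sketch of the prefix and suffix removal described above, the steps can be wrapped in a small function. The name peds_from_dir and the sample paths are made up for illustration, and the variable d is global because plain POSIX functions have no local variables.

```shell
# Hypothetical helper: print the number embedded in a "No<number>" directory name.
peds_from_dir() {
    d=${1%/}        # drop a trailing slash, as left by the No*/ glob
    d=${d##*/}      # drop any leading path components
    printf '%s\n' "${d#No}"   # drop the "No" prefix
}

peds_from_dir "No200/"         # prints 200
peds_from_dir "./runs/No35/"   # prints 35
```

Inside the loop this would be used as NoOfPeds=$(peds_from_dir "$dir").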

Be careful: in bash there must be no spaces around the = in an assignment. To get just the number, you can do something like:
NoOfPeds=$(echo "$dir" | tr -d -c 0-9)
(that is, delete every character that is not a digit). All digits will then be in NoOfPeds.
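One thing to keep in mind with the tr approach is that it keeps every digit in the name, not only those after the prefix. The sketch below shows this; digits_only is a hypothetical helper name, and the second directory name is an invented example.

```shell
# Hypothetical helper: keep only the digits of its argument.
digits_only() {
    printf '%s' "$1" | tr -cd '0-9'
    echo    # tr removed everything else, so add the newline back
}

digits_only "No200"      # prints 200
digits_only "No200_v2"   # prints 2002: digits from both places are joined
```

If names like the second one can occur, stripping the known prefix with "${dir#No}" is the safer choice.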

Related

In bash, how can I remove multiple versions of the same file?

This may be a very specific case, but I know very little about bash and I need to remove "duplicate" files. I've been downloading totally legal videogame roms these past few days, and I noticed that a lot of packs have many different versions of the same game, like this:
Awesome Golf (1991).lnx
Awesome Golf (1991) [b1].lnx
Baseball Heroes (1991).lnx
Baseball Heroes (1991) [b1].lnx
Basketbrawl (1992).lnx
Basketbrawl (1992) [a1].lnx
Basketbrawl (1992) [b1].lnx
Batman Returns (1992).lnx
Batman Returns (1992) [b1].lnx
How can I make a bash script that removes the duplicates? A duplicate would be any file that has the same name, and the name would be the string before the first parenthesis. The script should parse all the files and grab their names, see which names match to detect duplicates, and remove all files except the first one (first being the first that comes up in alphabetical order).
Would you please try the following:
#!/bin/bash
dir="dir" # the directory where the rom files are located
declare -A seen # associative array to detect the duplicates
while IFS= read -r -d "" f; do # loop over filenames by assigning "f" to it
name=${f%%(*} # extract the "name" by removing the first left paren and everything after it
name=${name%.*} # remove the extension, in case the filename doesn't have parens
name=${name%[*} # remove the left square bracket and following characters, for the same case
name=${name% } # remove one trailing space, if any
if (( seen[$name]++ )); then # if the name duplicates...
# remove "echo" if the output looks good
echo rm -- "$f" # then remove the file
fi
done < <(find "$dir" -type f -name "*.lnx" -print0 | sort -z -t "." -k1,1)
# sort the list of filenames in alphabetical order
Please change the dir= line at the top to the path of the directory that holds the rom files.
As a dry run, the echo command only prints the rm commands that would be executed. If the output looks good, remove echo and run the script for real.
[Explanation]
An associative array seen maps each extracted "name" to a count of how many times it has appeared. If the count is already non-zero, the file is a duplicate and can be removed (as long as the files are properly sorted).
The -print0 option to find, the -z option to sort, and the -d "" option to read make the null character the delimiter between filenames, so that filenames containing special characters such as spaces, tabs, or newlines are handled correctly.
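The name-extraction step can be checked in isolation before any file is deleted. The following is a sketch only: rom_key is a hypothetical helper, it assumes the .lnx extension from the question, and it strips all trailing spaces with a loop rather than a single one.

```shell
# Hypothetical helper: derive the duplicate-detection key from a ROM path.
rom_key() {
    key=${1##*/}       # drop any leading directory part
    key=${key%%(*}     # drop the first "(" and everything after it
    key=${key%%\[*}    # drop the first "[" and everything after it
    key=${key%.lnx}    # drop the extension if no paren or bracket preceded it
    # strip trailing spaces one at a time until none are left
    while [ "${key% }" != "$key" ]; do
        key=${key% }
    done
    printf '%s\n' "$key"
}

rom_key "dir/Awesome Golf (1991) [b1].lnx"   # prints Awesome Golf
rom_key "dir/Basketbrawl (1992).lnx"         # prints Basketbrawl
```

Two files are treated as versions of the same game exactly when this key is equal for both.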

How can I create a rename script using multiple rules?

I constantly get a bunch of files named "Unknown.png" in a folder, and oftentimes they get renamed "unknown (1).png", "unknown (2).png", etc. This is a bit of a problem, as sometimes when cleaning up files and moving them somewhere else I get asked whether I want to replace or rename them.
So I decided to make a crontab task that renames the files to CB_RANDOM; this way I don't have to worry about overwriting files with the same name.
I've figured out this much so far: I find the files, replace the name Unknown with CB_, and add a random number.
The problem is the (x) at the end of the filename. I also managed to figure out how to solve that: I just strip away any parentheses and numbers.
The problem is I can't figure out how to make the rename command follow both rules.
for u in (find -name unknown*); do
rCode = random
rename -v 's/unknown/CB_$rCode' $u
rename -v 's/[ ()0123456789]//g' $u
Ideally I'd like to apply both rules in the same line of code, especially since once the first line has run, $u no longer points to the file for the second step.
No need for a loop:
find -name 'unknown*' -exec rename 's/unknown \([0-9]+\)\.(.*)$/"CB_".sprintf("%04s",int(rand(10000))).".".$1/e' {} \;
find all the files, starting in the current directory, recursively, with names similar to "unknown (1).png"
rename them with a resulting filename similar to "CB_0135.png"
This produces an error message if a filename already exists.
Your code should first be changed into
# find is a command; use $() to capture its output
# when passing a wildcard to find, quote it
for u in $(find -name "unknown*"); do
# Is random a command? Use $()
rCode=$(random)
# Debug with echo, will show other problem
echo "File $u"
# $rCode will not be replaced by its value in single quotes
# Write a filename in double quotes, so it will not be split by a space
rename -v "s/unknown/CB_$rCode/" "$u"
rename -v 's/[ ()0123456789]//g' "$u"
done
The new line with echo shows that the loop is breaking up the filenames at the spaces. You can change this to
while IFS= read -r u; do
# Use unique timestamp, not random value
rCode=$(date '+%Y%m%d_%H%M')
echo "File $u"
rename -v "s/unknown/CB_$rCode/" "$u"
rename -v 's/[ ()0123456789]//g' "$u"
done < <(find -name "unknown*")
I never use rename and would use
while IFS= read -r u; do
# Use unique timestamp, not random value
rCode=$(date '+%Y%m%d_%H%M')
# construct new filename.
# Restriction: Path to file is without newlines, spaces or parentheses
newfile=$(sed 's/[ ()]//g; s/.*unknown/&_'"${rCode}"'_/' <<< "$u")
echo "Moving file $u to ${newfile}"
mv "$u" "${newfile}"
done < <(find -name "unknown*")
EDIT:
I removed a sed command for renaming files with (something) in it:
# Removed command
newfile=$(sed 's/\(.*\)(\(.*\))/\1'"${rCode}"'_\2/' <<< "$u")

MacOS shell script to move files based on tag

I am trying to write a shell script so that I can move school files from one destination to another based on the input. I download these files from a source like canvas and want to move them from my downloads based on the tag I assign, to the path for my course folder which is nested pretty deep thanks to how I stay organized. Unfortunately, since I store these files in my OneDrive school account, I am unable to eliminate some spacing issues but I believe I have accounted for these. Right now the script is the following:
if [ "$1" = "311" ];
then
course="'/path/to/311/folder/$2'"
elif [ "$1" = "411" ];
then
course="'/path/to/411/folder/$2'"
elif [ "$1" = "516" ];
then
course="'/path/to/516/folder/$2'"
elif [ "$1" = "530" ];
then
course="'/path/to/530/folder/$2'"
elif [ "$1" = "599" ];
then
course="'/path/to/599/folder/$2'"
fi
files=$(mdfind 'kMDItemUserTags='$1'' -onlyin /Users/user/Downloads)
#declare -a files=$(mdfind 'kMDItemUserTags='$1'' -onlyin /Users/user/Downloads)
#mv $files $course
#echo "mv $files $course"
#echo $course
for file in $files
#for file in "${files[#]}"
do
#echo $file
#echo $course
mv $file $course
done
Where $1 is the tag ID and first part of path selection, and $2 is what week number folder I want to move it to. The single quotation marks are there to take care of the spacing in the filepath. I could very easily do this in python but I'm trying to expand my capabilities some. Every time I run this script I get the following message:
usage: mv [-f | -i | -n] [-v] source target
mv [-f | -i | -n] [-v] source ... directory
I initially tried to just move them all at once (per the first mv command that's commented out) and got this error, then tried the for loop, and array but get the same error each time. However, when I uncomment the echo statements in the for loop and manually try to move each one by copying and pasting the paths to the command line, it works perfectly. My best guess is something to do with the formatting of the variable "files", since
echo "mv $files $course"
indicates the presence of a newline character or separator between each file it saves.
I'm sure it's something super simple that I'm missing since I just started trying to pick up shell scripting last week, but nothing I have been able to find online has helped me resolve this. Any help would be greatly appreciated. Thanks
You can replace the files variable assignment and the for loop with one command, making this the script:
if [ "$1" = "311" ];
then
course="'/path/to/311/folder/$2'"
elif [ "$1" = "411" ];
then
course="'/path/to/411/folder/$2'"
elif [ "$1" = "516" ];
then
course="'/path/to/516/folder/$2'"
elif [ "$1" = "530" ];
then
course="'/path/to/530/folder/$2'"
elif [ "$1" = "599" ];
then
course="'/path/to/599/folder/$2'"
fi
mv -t $course $(mdfind 'kMDItemUserTags='$1'' -onlyin /Users/user/Downloads | sed ':a;N;$!ba;s/\n/ /g')
The sed ':a;N;$!ba;s/\n/ /g' command simply replaces the newline characters with spaces, and the -t option makes mv take the destination as the first argument. Note that -t is a GNU coreutils option; the mv that ships with macOS does not support it.
You're getting rather confused about how quoting works in the shell. First rule: quotes go around data, not in data. For example, you use:
course="'/path/to/311/folder/$2'"
...
mv $file $course
When you set course this way, the double-quotes are treated as shell syntax (i.e. they change how what's between them is parsed), but the single-quotes are stored as part of the variable's value, and will thereafter be treated as data. When you use this variable in the mv command, it's actually looking for a directory literally named single-quote, and under that a directory named "path", etc. Instead, just put the appropriate quotes for how you want it parsed at that point, and then double-quotes around the variable when you use it (to prevent probably-unwanted word splitting and wildcard expansion). Like this:
course="/path/to/311/folder/$2"
...
mv "$file" "$course" # This needs more work -- see below
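The effect of storing quotes inside the data can be made visible by counting how many arguments a command actually receives. This is a small sketch: count_args is a hypothetical helper, the path is invented, and the default IFS is assumed.

```shell
# Hypothetical helper: report how many arguments it was called with.
count_args() { echo "$#"; }

course="'/tmp/my folder/week 1'"   # single quotes stored inside the value
count_args $course                 # prints 3: split at the spaces, quotes kept as literal characters

course="/tmp/my folder/week 1"     # quotes around the data only
count_args "$course"               # prints 1
```

In the first call mv would receive three broken path fragments, which is why it prints its usage message.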
Also, where you have:
mdfind 'kMDItemUserTags='$1'' -onlyin /Users/user/Downloads
that doesn't really make any sense. You've got a single-quoted section, 'kMDItemUserTags=', where the quotes have no effect at all (single-quotes suppress all special meanings that characters have, like $ introducing variable substitution, but there aren't any characters there with special meanings, so there's no reason for the quotes), followed by $1 without double-quotes around it, meaning that some special characters (whitespace and wildcards) in its value will get special parsing (which you probably don't want), followed by a zero-length single-quoted string, '', which parses out to exactly nothing. You want the $1 part in double-quotes; some people also include the rest of the string in the double-quoted section, which makes no difference. In fact, other than the $1 part (and the spaces between parameters), you can quote or not however you want. Thus, any of these would work equivalently:
mdfind kMDItemUserTags="$1" -onlyin /Users/user/Downloads
mdfind "kMDItemUserTags=$1" -onlyin /Users/user/Downloads
mdfind "kMDItemUserTags=$1" '-onlyin' '/Users/user/Downloads'
mdfind 'kMDItemUserTags'="$1" '-'"only"'in' /'Users'/'user'/'Down'loads
...etc
Ok, next problem: parsing the output from mdfind from a series of characters into separate filepaths. This is actually tricky. If you put double-quotes around the resulting string, it'll get treated as one long filepath that happens to contain some newlines in it (which is totally legal, but not what you want). If you don't double-quote it, it'll be split into separate filepaths based on whitespace (not just newlines, but also spaces and tabs -- and spaces are common within macOS filenames), and anything that looks like a wildcard will get expanded to a list of matching filenames. This tends to cause chaos.
The solution: there's one character that cannot occur in a filepath, the ASCII NULL (character code 0), and mdfind -0 will output its list delimited with null characters. You can't put the result in a shell variable (they can't hold nulls either), but you can pass it through a pipe to, say, xargs -0, which will (thanks to the -0 option) parse the nulls as delimiters, and build commands out of the results. There is one slightly tricky thing: you want xargs to put the filepaths it gets in the middle of the argument list to mv, not at the end like it usually does. The -J option lets you tell it where to add arguments. I'll also suggest two safety measures: the -p option to xargs makes it ask before actually executing the command (use this at least until you're sure it's doing the right thing), and the -n option to mv, which tells it not to overwrite existing files if there's a naming conflict. The result is something like this:
mdfind -0 kMDItemUserTags="$1" -onlyin /Users/user/Downloads | xargs -0 -p -J% mv -n % "$course"
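For systems whose xargs lacks -J, a possible variant passes the destination to a small inner shell instead. This is a sketch under assumptions: move_all is a hypothetical helper, find stands in for mdfind -0 so the demo runs anywhere, the scratch paths are invented, and at least one path is expected on stdin.

```shell
# Hypothetical helper: move NUL-delimited paths read from stdin into "$1".
move_all() {
    # "$1" becomes $0 of the inner shell; xargs appends the paths as "$@".
    xargs -0 sh -c 'mv -n -- "$@" "$0"' "$1"
}

# Demo in a scratch directory.
tmp=$(mktemp -d)
mkdir "$tmp/downloads" "$tmp/week 1"
touch "$tmp/downloads/notes one.pdf" "$tmp/downloads/slides.pdf"

find "$tmp/downloads" -type f -print0 | move_all "$tmp/week 1"
ls "$tmp/week 1"
```

Both the file name and the destination contain spaces here, and neither is split, because the paths never pass through unquoted shell expansion.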
Filenames containing whitespace are indeed worth thinking about.
However, the real problem is that you are not quoting the filename in the mv command. Please take a look at the simple example below:
filename="with space.txt"
=> assign a filename containing a space to a variable
touch "$filename"
=> create a file "with space.txt"
str="'$filename'"
=> wrap with single quotes (as you do)
echo $str
=> yields 'with space.txt' and may look good, which is a pitfall
mv $str "newname.txt"
=> causes an error
The mv command above causes an error because the command is invoked with
three arguments as: mv 'with space.txt' newname.txt. Unfortunately
the pre-quoting with single quotes is meaningless.
Instead, please try something like:
if [ "$1" = "311" ]; then
course="/path/to/311/folder/$2"
elif [ "$1" = "411" ]; then
course="/path/to/411/folder/$2"
elif [ "$1" = "516" ]; then
course="/path/to/516/folder/$2"
elif [ "$1" = "530" ]; then
course="/path/to/530/folder/$2"
elif [ "$1" = "599" ]; then
course="/path/to/599/folder/$2"
else
# illegal value in $1. do some error handling
echo "unknown course tag: $1" >&2
exit 1
fi
# the lines above may be simplified if /path/to/*folder/ have some regularity
mdfind "kMDItemUserTags=$1" -onlyin /Users/user/Downloads | while IFS= read -r file; do
mv "$file" "$course"
done
# the syntax above works as long as the filenames do not contain newline characters
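Since the five branches differ only in the course number, the chain could be collapsed into a case statement. The sketch below keeps the /path/to/.../folder placeholders from the script above; course_dir is a hypothetical helper name.

```shell
# Hypothetical helper: print the course folder for tag $1 and week folder $2.
course_dir() {
    case $1 in
        311|411|516|530|599)
            printf '%s\n' "/path/to/$1/folder/$2" ;;
        *)
            echo "unknown course tag: $1" >&2
            return 1 ;;
    esac
}

course_dir 311 "week 3"   # prints /path/to/311/folder/week 3
```

In the script it would be called as course=$(course_dir "$1" "$2") || exit 1, so an unknown tag stops the run before any file is moved.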

Remove suffix as well as prefix from path in bash

I have filepaths of the form:
../healthy_data/F35_HC_532d.dat
I want to extract F35_HC_532d from this. I can remove prefix and suffix from this filename in bash as:
for i in ../healthy_data/*; do echo ${i#../healthy_data/}; done # REMOVES PREFIX
for i in ../healthy_data/*; do echo ${i%.dat}; done # REMOVES SUFFIX
How can I combine these so that in a single command I would be able to remove both and extract only the part that I want?
You can use a bash regex for this and print captured group #1:
for file in ../healthy_data/*; do
[[ $file =~ .*/([_[:alnum:]]+)\.dat$ ]] && echo "${BASH_REMATCH[1]}"
done
If you can use Awk, it is pretty simple,
for i in ../healthy_data/*
do
stringNeeded=$(awk -F/ '{split($NF,temp,"."); print temp[1]}' <<<"$i")
printf "%s\n" "$stringNeeded"
done
The -F/ option splits the input string on the / character, and $NF represents the last field, which in this case is F35_HC_532d.dat. The split() function is then called with the delimiter . to extract the part before the dot.
The options/functions in the above Awk are POSIX compatible.
Also, bash does not support nested parameter expansions, so you need to do it in two steps, something like this:
tempString="${i#*/*/}"
echo "${tempString%.dat}"
In a single-loop,
for i in ../healthy_data/*; do tempString="${i#*/*/}"; echo "${tempString%.dat}" ; done
In this two-step form, the "${i#*/*/}" part stores F35_HC_532d.dat in the variable tempString, and "${tempString%.dat}" then removes the .dat part from that variable.
If all files end with .dat (as you confirmed) you can use the basename command:
basename -s .dat /path/to/files/*
If there are many(!) of those files, use find to avoid an argument list too long error:
find /path/to/files -maxdepth 1 -name '*.dat' -exec basename -s .dat {} +
For a shell script that needs to deal with any number of .dat files, use the second command!
Do you count this as one step?
for i in ../healthy_data/*; do
sed 's#\.[^.]*##'<<< "${i##*/}"
done
You can't strip both a prefix and suffix in a single parameter expansion.
However, this can be accomplished in a single loop using parameter expansion operations only. Just save the prefix stripped expansion to a variable and use expansion again to remove its suffix:
for file in ../healthy_data/*; do
prefix_stripped="${file##*\/healthy_data\/}"
echo "${prefix_stripped%.dat}"
done
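The two expansions can also be hidden behind a small function so that the caller sees a single step. This is a sketch; stem is a hypothetical helper name, and the variable s is global because plain POSIX functions have no local variables.

```shell
# Hypothetical helper: print a path without its directory and .dat suffix.
stem() {
    s=${1##*/}                   # strip everything up to the last slash
    printf '%s\n' "${s%.dat}"    # then strip the .dat suffix
}

stem ../healthy_data/F35_HC_532d.dat   # prints F35_HC_532d
```

The loop then becomes: for file in ../healthy_data/*; do stem "$file"; done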
If you are on zsh, one way to achieve this without the need for defining another variable is
for i in ../healthy_data/*; do echo "${${i#../healthy_data/}%.dat}"; done
This removes prefix and suffix in one step.
In your specific example the prefix stems from the fact that the files are located in a different directory. You can get rid of the prefix by cding in this case.
(cd ../healthy_data ; for i in *; do echo ${i%.dat}; done)
The (parens) invoke a sub shell process and your current shell stays where it is. If you don't want a sub shell you can cd back easily:
cd ../healthy_data ; for i in *; do echo ${i%.dat}; done; cd -

convert a file path into string

I'm getting an error when trying to replace a string in a directory path with another string:
sed: Error trying to read from {directory_path}: It's a directory
The shell script
#!/bin/sh
R2K_SOURCE="source/"
R2K_PROCESSED="processed/"
R2K_TEMP_DIR=""
echo " Processing files from $R2K_SOURCE "
for file in $(find $R2K_SOURCE )
do
if [ -d $file ]
then
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
echo "directory $R2K_TEMP_DIR"
else
# some code executes
:
fi
done
# find $R2K_PROCCESED -type f -size -200c -delete
I understand that the error is in this line
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
but I don't know how to tell sh to treat the $file variable as a string and not as a directory.
If you want to replace part of a path name, you can echo the path name and pipe it to sed.
You must also enable variable expansion by putting the sed command in double quotes instead of single quotes, and change the separator of the 's' command, like this:
R2K_TEMP_DIR=$(echo "$file" | sed "s:$R2K_SOURCE:$R2K_PROCESSED:g")
Then you will be able to use slashes inside the 's' command.
Update:
Even better is to remove the useless echo and use a "here string" instead (a bash feature, so change the shebang to #!/bin/bash):
R2K_TEMP_DIR=$(sed "s:$R2K_SOURCE:$R2K_PROCESSED:g" <<< "$file")
First, don't use:
for item in $(find ...)
because you might overload the command line. Besides, the for loop cannot start until the process in $(...) finishes. Instead:
find ... | while read item
You also need to watch out for funky file names. The for loop will choke on all files with spaces in them. The find | while version copes with spaces inside names, but plain read still strips leading and trailing whitespace, interprets backslashes, and breaks on newlines. Better:
find ... -print0 | while IFS= read -d '' -r item
This will put nulls between file names, and read will break on those nulls. This way, files with spaces, tabs, new lines, or anything else that could cause problems can be read without problems.
Your sed line is:
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
What this is attempting to do is edit your $file, which is a directory. What you want to do is munge the directory name itself. Therefore, you have to echo the name into sed through a pipe (with the sed expression in double quotes so that the variables expand, and : as the separator because the values contain slashes):
R2K_TEMP_DIR=$(echo "$file" | sed "s:$R2K_SOURCE:$R2K_PROCESSED:g")
However, you might be better off using the shell's parameter expansion to rewrite the variable.
Basically, you have a directory called source/ and all of the files you're looking for are under that directory. You simply want to change:
source/foo/bar
to
processed/foo/bar
You could do something like this ${file#source/}. The # says this is a left side filter and it will remove the least amount to match the glob expression after the #. Check the manpage for bash and look under Parameter Expansion.
Thus, you could do something like this:
#!/bin/bash
R2K_SOURCE="source/"
R2K_PROCESSED="processed/"
R2K_TEMP_DIR=""
echo " Processing files from $R2K_SOURCE "
find "$R2K_SOURCE" -print0 | while IFS= read -d '' -r file
do
if [ -d "$file" ]
then
R2K_TEMP_DIR="processed/${file#source/}"
echo "directory $R2K_TEMP_DIR"
else
# some code executes
:
fi
done
R2K_TEMP_DIR="processed/${file#source/}" removes the source/ from the start of $file and you merely prepend processed/ in its place.
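One difference from a sed substitution with the g flag is that the # expansion only ever touches the start of the string. The sketch below illustrates this; swap_root and the sample paths are hypothetical.

```shell
# Hypothetical helper: replace leading prefix $2 of path $1 with $3.
# If $1 does not start with $2, nothing is stripped and $3 is simply prepended.
swap_root() {
    printf '%s\n' "$3${1#"$2"}"
}

swap_root "source/foo/bar" "source/" "processed/"      # prints processed/foo/bar
swap_root "source/a/source/b" "source/" "processed/"   # prints processed/a/source/b
```

The second call leaves the inner source/ directory alone, which is usually what is wanted when mirroring a directory tree.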
Even better, it's far more efficient. In your original script, the $(...) spawns a subshell, which runs echo and pipes its output to yet another process running sed (assuming you use loentar's solution). With parameter expansion there are no subprocesses at all: the whole modification of the directory name happens inside the shell.
By the way, this should also work (no extra slash is needed, because $R2K_PROCESSED already ends with one):
R2K_TEMP_DIR="$R2K_PROCESSED${file#"$R2K_SOURCE"}"
I just didn't test that.
