Remove numbers at beginning of filenames in directory in bash - bash

In an attempt to rename the files in one directory with numbers at the front I made an error in my script so that this happened in the wrong directory. Therefore I now need to remove these numbers from the beginning of all of my filenames in a directory. These range from 1 to 3 digits. Examples of the filnames I am working with are:
706terrain_Slope1000m_Minimum_all_25PCs_bolt_all_25PCs_qq_bolt.png
680met_sfcWind_all_25PCs_bolt_number.txt
460greenness_NDVI_500m_min_all_25PCs_bolt_number.txt
I was thinking of using mv but I'm not really sure how to do it with varying numbers of digits at the beginning, so any advice would be appreciated!

A simple way in bash is making use of a regular expression test:
for file in *; do
[[ -f "${file}" ]] && [[ "${file}" =~ (^[0-9]+) ]] && mv ${file} ${file/${BASH_REMATCH[1]}}
done
This does the following:
[[ -f "${file}" ]]: test if file is a file, if so
[[ "${file}" =~ (^[0-9]+) ]]: check if file starts with a number
${file/${BASH_REMATCH[1]}}: remove the number from the string file by using BASH_REMATCH, a variable that matches the groupings from the regex match.

If you've got perl's rename installed, the following should work :
rename 's/^[0-9]{1,3}//' /path/to/files
/path/to/files can be a list of specific files, or probably in your case a glob (e.g. *.{png,txt}). You don't need to select only files starting with digits as rename won't modify those that do not.

Using bash parameter expansion:
shopt -s extglob
for i in +([0-9])*.{txt,png}; do
mv -- "$i" "${i##+([0-9])}"
done
This will remove starting digits (any number) in filenames having png and txt extension.
The ## is removing the longest matching prefix pattern.
The +(...) is path name expansion syntax for repeated characters.
And [0-9] is pattern matching digits.

Alternate method using GNU find:
#!/usr/bin/env bash
find ./ \
-maxdepth 1\
-type f\
-name '[[:digit:]]*'\
-exec bash -c 'shopt -s extglob; f="${1##*/}"; d="${1%%/*}"; mv -- "$1" "${d}/${f##+([[:digit:]])}"' _ {} \;
Find all actual files in current directory whose name start with a digit.
For each found file, execute the Bash script below:
shopt -s extglob # need for extended pattern syntax
f="${1##*/}" # Get file name without directory path
d="${1%%/*}" # Get directory path without file name
mv -- "$1" "${d}/${f##+([[:digit:]])}" # Rename without the leading digits

Using basic features of a POSIX-compliant shell:
#!/bin/sh
for f in [[:digit:]]*; do
if [ -f "$f" ]; then
pf="${f%${f#???}}" pf="${pf##*[[:digit:]]}"
mv "$f" "$pf${f#???}"
fi
done

Related

How to search for keywords in metadata across all files in a folder recursively?

I need to search all subdirectories and files recursively from a location and print out any files that contains metadata matching any of my specified keywords.
e.g. If John Smith was listed as the author of hello.js in the metadata and one of my keywords was 'john' I would want the script to print hello.js.
I think the solution could be a combination of mdls and grep but I have not used bash much before so am a bit stuck.
I have tried the following command but this only prints the line the keyword is on if 'john' is found.
mdls hello.js | grep john
Thanks in advance.
(For reference I am using macOS.)
Piping the output of mdls into grep as you show in your question doesn't carry forward the filename. The following script iterates recursively over the files in the selected directory and checks to see if one of the attributes matches the desired pattern (using regex). If it does, the filename is output.
#!/bin/bash
shopt -s globstar # expand ** recursively
shopt -s nocasematch # ignore case
pattern="john"
attrib=Author
for file in /Users/me/myfiles/**/*.js
do
attrib_value=$(mdls -name "$attrib" "$file")
if [[ $attrib_value =~ $pattern ]]
then
printf 'Pattern: %s found in file $file\n' "$pattern" "$file"
fi
done
You can use a literal test instead of a regular expression:
if [[ $attrib_value == *$pattern* ]]
In order to use globstar you will need to use a later version of Bash than the one installed by default in MacOS. If that's not possible then you can use find, but there are challenges in dealing with filenames that contain newlines. This script takes care of that.
#!/bin/bash
shopt -s nocasematch # ignore case
dir=/Users/me/myfiles/
check_file () {
local attrib=$1
local pattern=$2
local file=$3
local attrib_value=$(mdls -name "$attrib" "$file")
if [[ $attrib_value =~ $pattern ]]
then
printf 'Pattern: %s found in file $file\n' "$pattern" "$file"
fi
}
export -f check_file
pattern="john"
attrib=Author
find "$dir" -name '*.js' -print0 | xargs -0 -I {} bash -c 'check_file "$attrib" "$pattern" "{}"'

Check if a filename has a string in it

I'm having problems creating an if statement to check the files in my directory for a certain string in their names.
For example, I have the following files in a certain directory:
file_1_ok.txt
file_2_ok.txt
file_3_ok.txt
file_4_ok.txt
other_file_1_ok.py
other_file_2_ok.py
other_file_3_ok.py
other_file_4_ok.py
another_file_1_not_ok.sh
another_file_2_not_ok.sh
another_file_3_not_ok.sh
another_file_4_not_ok.sh
I want to copy all files that contain 1_ok to another directory:
#!/bin/bash
directory1=/FILES/user/directory1/
directory2=/FILES/user/directory2/
string="1_ok"
cd $directory
for every file in $directory1
do
if [$string = $file]; then
cp $file $directory2
fi
done
UPDATE:
The simpler answer was made by Faibbus, but refer to Inian if you want to remove or simply move files that don't have the specific string you want.
The other answers are valid as well.
cp directory1/*1_ok* directory2/
Use find for that:
find directory1 -maxdepth 1 -name '*1_ok*' -exec cp -v {} directory2 \;
The advantage of using find over the glob solution posted by Faibbus is that it can deal with an unlimited number of files which contain 1_ok were the glob solution will lead to an argument list too long error when calling cp with too many arguments.
Conclusion: For interactive use with a limited number of input files the glob will be fine, for a shell script, which has to be stable, I would use find.
With your script I suggest:
#!/bin/bash
source="/FILES/user/directory1"
target="/FILES/user/directory2"
regex="1_ok"
for file in "$source"/*; do
if [[ $file =~ $regex ]]; then
cp -v "$file" "$target"
fi
done
From help [[:
When the =~ operator is used, the string to the right of the operator
is matched as a regular expression.
Please take a look: http://www.shellcheck.net/
Using extglob matching in bash with the below pattern,
+(pattern-list)
Matches one or more occurrences of the given patterns.
First enable extglob by
shopt -s extglob
cp -v directory1/+(*not_ok*) directory2/
An example,
$ ls *.sh
another_file_1_not_ok.sh another_file_3_not_ok.sh
another_file_2_not_ok.sh another_file_4_nnoot_ok.sh
$ shopt -s extglob
$ cp -v +(*not_ok*) somedir/
another_file_1_not_ok.sh -> somelib/another_file_1_not_ok.sh
another_file_2_not_ok.sh -> somelib/another_file_2_not_ok.sh
another_file_3_not_ok.sh -> somelib/another_file_3_not_ok.sh
To remove the files except the one containing this pattern, do
$ rm -v !(*not_ok*) 2>/dev/null

Trying to rename certain file types within recursive directories

I have a bunch of files within a directory structure as such:
Dir
SubDir
File
File
Subdir
SubDir
File
File
File
Sorry for the messy formatting, but as you can see there are files at all different directory levels. All of these file names have a string of 7 numbers appended to them as such: 1234567_filename.ext. I am trying to remove the number and underscore at the start of the filename.
Right now I am using bash and using this oneliner to rename the files using mv and cut:
for i in *; do mv "$i" "$(echo $i | cut -d_ -f2-10)"; done
This is being run while I am CD'd into the directory. I would love to find a way to do this recursively, so that it only renamed files, not folders. I have also used a foreach loop in the shell, outside of bash for directories that have a bunch of folders with files in them and no other subdirectories as such:
foreach$ set p=`echo $f | cut -d/ -f1`
foreach$ set n=`echo $f | cut -d/ -f2 | cut -d_ -f2-10`
foreach$ mv $f $p/$n
foreach$ end
But that only works when there are no other subdirectories within the folders.
Is there a loop or oneliner I can use to rename all files within the directories? I even tried using find but couldn't figure out how to incorporate cut into the code.
Any help is much appreciated.
With Perl‘s rename (standalone command):
shopt -s globstar
rename -n 's|/[0-9]{7}_([^/]+$)|/$1|' **/*
If everything looks fine remove -n.
globstar: If set, the pattern ** used in a pathname expansion context will
match all files and zero or more directories and subdirectories. If
the pattern is followed by a /, only directories and subdirectories
match.
bash does provide functions, and these can be recursive, but you don't need a recursive function for this job. You just need to enumerate all the files in the tree. The find command can do that, but turning on bash's globstar option and using a shell glob to do it is safer:
#!/bin/bash
shopt -s globstar
# enumerate all the files in the tree rooted at the current working directory
for f in **; do
# ignore directories
test -d "$f" && continue
# separate the base file name from the path
name=$(basename "$f")
dir=$(dirname "$f")
# perform the rename, using a pattern substitution on the name part
mv "$f" "${dir}/${name/#???????_/}"
done
Note that that does not verify that file names actually match the pattern you specified before performing the rename; I'm taking you at your word that they do. If such a check were wanted then it could certainly be added.
How about this small tweak to what you have already:
for i in `find . -type f`; do mv "$i" "$(echo $i | cut -d_ -f2-10)"; done
Basically just swapping the * with `find . -type f`
Should be possible to do this using find...
find -E . -type f \
-regex '.*/[0-9]{7}_.*\.txt' \
-exec sh -c 'f="${0#*/}"; mv -v "$0" "${0%/*}/${f#*_}"' {} \;
Your find options may be different -- I'm doing this in FreeBSD. The idea here is:
-E instructs find to use extended regular expressions.
-type f causes only normal files (not directories or symlinks) to be found.
-regex ... matches the files you're looking for. You can make this more specific if you need to.
exec ... \; runs a command, using {} (the file we've found) as an argument.
The command we're running uses parameter expansion first to grab the target directory and second to strip the filename. Note the temporary variable $f, which is used to address the possibility of extra underscores being part of the filename.
Note that this is NOT a bash command, though you can of course run it from the bash shell. If you want a bash solution that does not require use of an external tool like find, you may be able to do the following:
$ shopt -s extglob # use extended glob format
$ shopt -s globstar # recurse using "**"
$ for f in **/+([0-9])_*.txt; do f="./$f"; echo mv "$f" "${f%/*}/${f##*_}"; done
This uses the same logic as the find solution, but uses bash v4 extglob to provide better filename matching and globstar to recurse through subdirectories.
Hope these help.

Bash copy files to sudirectories based on matching variable

I have a directory with the following content:
/B7_001
/B7_002
B7_001_name.mat
B7_002_name.mat
Each subdirectory in this directory has the following structure:
/B7_001/results_activity/sham/task/
/B7_002/results_activity/sham/task/
I want to copy or move each file down the subdirectory tree of the subdirectory matching the first part of it's name.
For example copy file
"B7_001_name.mat" in to /B7_001/results_activity/sham/task/.
I haven’t had much success with the following code, so any tips would be appreciated. Thanks!
for i in B7_*
do
cp ${i}_name.mat /${i}/results_activity/sham/task/
done
Try to understand the following:
shopt -s nullglob # play it safe when using globs
for i in B7_*_name.mat; do
echo "i=$i"
d=${i%_name.mat}
echo "d=$d"
echo mv "$i" "/$d/results_activity/sham/task"
done
Run as is, it's harmless and will not perform any actions; only print some stuff to your terminal.
for i in B7_*_name.mat: that's a for loop where the variable i will successively take the filenames matching the glob B7_*_name.mat.
with this variable, we remove the _name.mat part using the shell parameter expansion:
d=${i%_name.mat}
which means that d is the expansion $i where the trailing (because of %) _name.mat is removed.
How about using find - while combination like below :
find . -type f -print0 | while read -r -d '' line
do
if [[ "$line" =~ ^\./B7_001 ]]
then
cp "$line" B7_001/results_activity/sham/task/
elif [[ "$line" =~ ^\./B7_002 ]]
then
cp "$line" B7_002/results_activity/sham/task/
fi
done

Filenames with wildcards in variables

#!/bin/bash
outbound=/home/user/outbound/
putfile=DATA_FILE_PUT_*.CSV
cd $outbound
filecnt=0
for file in $putfile; do let filecnt=filecnt+1; done
echo "Filecount: " $filecnt
So this code works well when there are files located in the outbound directory. I can place files into the outbound path and as long as they match the putfile mask then the files are incremented as expected.
Where the problem comes in is if I run this while there are no files located in $outbound.
If there are zero files there $filecnt still returns a 1 but I'm looking to have it return a 0 if there are no files there.
Am I missing something simple?
Put set -x just below the #! line to watch what your script is doing.
If there is no matching file, then the wildcard is left unexpanded, and the loop runs once, with file having the value DATA_FILE_PUT_*.CSV.
To change that, set the nullglob option. Note that this only works in bash, not in sh.
shopt -s nullglob
putfile=DATA_FILE_PUT_*.CSV
for file in $putfile; do let filecnt=filecnt+1; done
Note that the putfile variable contains the wildcard pattern, not the list of file names. It might make more sense to put the list of matches in a variable instead. This needs to be an array variable, and you need to change the current directory first. The number of matching files is then the length of the array.
#!/bin/bash
shopt -s nullglob
outbound=/home/user/outbound/
cd "$outbound"
putfiles=(DATA_FILE_PUT_*.CSV)
echo "Filecount: " ${#putfiles}
If you need to iterate over the files, take care to protect the expansion of the array with double quotes, otherwise if a file name contains whitespace then it will be split over several words (and if a filename contains wildcard characters, they will be expanded).
#!/bin/bash
shopt -s nullglob
outbound=/home/user/outbound/
cd "$outbound"
putfiles=(DATA_FILE_PUT_*.CSV)
for file in "${putfiles[#]}"; do
echo "Processing $file"
done
You could test if file exists first
for file in $putfile; do
if [ -f "$file" ] ; then
let filecnt=filecnt+1
fi
done
Or look for your files with find
for file in $(find . -type f -name="$putfile"); do
let filecnt=filecnt+1
done
or simply (fixed)
filecnt=$(find . -type f -name "$putfile" | wc -l); echo $filecnt
This is because when no matches are found, bash by default expands the wildcard DATA_FILE_PUT_*.CSV to the word DATA_FILE_PUT_*.CSV and therefore you end up with a count of 1.
To disable this behavior, use shopt -s nullglob
Not sure why you need a piece of code here. Following one liner should do your job.
ls ${outbound}/${putfile} | wc -l
Or
find ${outbound} -maxdepth 1 -type f -name "${putfile}" | wc -l

Resources