Check if a filename has a string in it - bash

I'm having problems creating an if statement to check the files in my directory for a certain string in their names.
For example, I have the following files in a certain directory:
file_1_ok.txt
file_2_ok.txt
file_3_ok.txt
file_4_ok.txt
other_file_1_ok.py
other_file_2_ok.py
other_file_3_ok.py
other_file_4_ok.py
another_file_1_not_ok.sh
another_file_2_not_ok.sh
another_file_3_not_ok.sh
another_file_4_not_ok.sh
I want to copy all files that contain 1_ok to another directory:
#!/bin/bash
directory1=/FILES/user/directory1/
directory2=/FILES/user/directory2/
string="1_ok"
cd $directory
for every file in $directory1
do
if [$string = $file]; then
cp $file $directory2
fi
done
UPDATE:
The simpler answer was made by Faibbus, but refer to Inian if you want to remove or simply move files that don't have the specific string you want.
The other answers are valid as well.

cp directory1/*1_ok* directory2/

Use find for that:
find directory1 -maxdepth 1 -name '*1_ok*' -exec cp -v {} directory2 \;
The advantage of using find over the glob solution posted by Faibbus is that it can deal with an unlimited number of files which contain 1_ok, whereas the glob solution will lead to an "argument list too long" error when cp is called with too many arguments.
Conclusion: for interactive use with a limited number of input files the glob will be fine; for a shell script, which has to be stable, I would use find.
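If starting one cp per file is a concern, find can also batch the file names itself with -exec ... {} +. The following is a minimal sketch, not part of the original answer; the directories are throwaway ones created with mktemp, and the names src and dst are illustrative only.

```bash
#!/bin/bash
# Throwaway directories standing in for directory1 and directory2 (hypothetical).
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/file_1_ok.txt" "$src/other_file_1_ok.py" "$src/another_file_1_not_ok.sh"

# "{} +" hands many names to a single sh invocation; find starts further
# invocations on its own if the list would exceed the argument-length limit.
# Inside sh, $0 is the destination and "$@" holds the matched files.
find "$src" -maxdepth 1 -type f -name '*1_ok*' \
    -exec sh -c 'cp -- "$@" "$0"' "$dst" {} +

copied=$(ls "$dst")
printf '%s\n' "$copied"
rm -rf "$src" "$dst"
```

Only the two names containing 1_ok are copied; another_file_1_not_ok.sh contains 1_not_ok and is left alone.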

With your script I suggest:
#!/bin/bash
source="/FILES/user/directory1"
target="/FILES/user/directory2"
regex="1_ok"
for file in "$source"/*; do
if [[ $file =~ $regex ]]; then
cp -v "$file" "$target"
fi
done
From help [[:
When the =~ operator is used, the string to the right of the operator
is matched as a regular expression.
Please take a look: http://www.shellcheck.net/
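Note that the loop above tests the whole path, so a directory name containing 1_ok would also match. A sketch that tests only the file name and uses a literal substring comparison instead of a regular expression could look like this; the temporary directories are placeholders for the ones in the question, and the variable names are illustrative.

```bash
#!/bin/bash
source_dir=$(mktemp -d)   # stands in for /FILES/user/directory1
target_dir=$(mktemp -d)   # stands in for /FILES/user/directory2
string="1_ok"
touch "$source_dir/file_1_ok.txt" "$source_dir/file_2_ok.txt" \
      "$source_dir/another_file_1_not_ok.sh"

for file in "$source_dir"/*; do
    name=${file##*/}                      # strip the directory part
    # Quoting "$string" makes it literal; the unquoted * are wildcards.
    if [[ -f $file && $name == *"$string"* ]]; then
        cp -- "$file" "$target_dir"
    fi
done

result=$(ls "$target_dir")
echo "$result"
rm -rf "$source_dir" "$target_dir"
```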

Using extglob matching in bash with the below pattern,
+(pattern-list)
Matches one or more occurrences of the given patterns.
First enable extglob by
shopt -s extglob
cp -v directory1/+(*not_ok*) directory2/
An example,
$ ls *.sh
another_file_1_not_ok.sh another_file_3_not_ok.sh
another_file_2_not_ok.sh another_file_4_nnoot_ok.sh
$ shopt -s extglob
$ cp -v +(*not_ok*) somedir/
another_file_1_not_ok.sh -> somedir/another_file_1_not_ok.sh
another_file_2_not_ok.sh -> somedir/another_file_2_not_ok.sh
another_file_3_not_ok.sh -> somedir/another_file_3_not_ok.sh
To remove all files except those matching this pattern, do
$ rm -v !(*not_ok*) 2>/dev/null

Related

How to search for keywords in metadata across all files in a folder recursively?

I need to search all subdirectories and files recursively from a location and print out any files that contains metadata matching any of my specified keywords.
e.g. If John Smith was listed as the author of hello.js in the metadata and one of my keywords was 'john' I would want the script to print hello.js.
I think the solution could be a combination of mdls and grep but I have not used bash much before so am a bit stuck.
I have tried the following command but this only prints the line the keyword is on if 'john' is found.
mdls hello.js | grep john
Thanks in advance.
(For reference I am using macOS.)
Piping the output of mdls into grep as you show in your question doesn't carry forward the filename. The following script iterates recursively over the files in the selected directory and checks to see if one of the attributes matches the desired pattern (using regex). If it does, the filename is output.
#!/bin/bash
shopt -s globstar # expand ** recursively
shopt -s nocasematch # ignore case
pattern="john"
attrib=Author
for file in /Users/me/myfiles/**/*.js
do
attrib_value=$(mdls -name "$attrib" "$file")
if [[ $attrib_value =~ $pattern ]]
then
printf 'Pattern: %s found in file %s\n' "$pattern" "$file"
fi
done
You can use a literal test instead of a regular expression:
if [[ $attrib_value == *$pattern* ]]
In order to use globstar you will need a later version of Bash than the one installed by default on macOS (globstar requires Bash 4 or later). If that's not possible then you can use find, but there are challenges in dealing with filenames that contain newlines. This script takes care of that.
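#!/bin/bash
dir=/Users/me/myfiles/
check_file () {
shopt -s nocasematch # ignore case (set here: the child bash does not inherit shopt settings)
local attrib=$1
local pattern=$2
local file=$3
local attrib_value=$(mdls -name "$attrib" "$file")
if [[ $attrib_value =~ $pattern ]]
then
printf 'Pattern: %s found in file %s\n' "$pattern" "$file"
fi
}
export -f check_file
# exported so that the bash started by xargs can see them
export pattern="john"
export attrib=Author
# pass the filename as a positional argument instead of splicing {} into the command string
find "$dir" -name '*.js' -print0 | xargs -0 -I {} bash -c 'check_file "$attrib" "$pattern" "$1"' _ {}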
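The question asks for a match against any of several keywords, while the scripts above test a single pattern. A minimal sketch of that check follows; the function name matches_any_keyword and the sample string are made up for illustration (the sample is not real mdls output). In a real script the first argument would be the output of mdls for one file.

```bash
#!/bin/bash
# matches_any_keyword TEXT KEYWORD...
# Succeeds if TEXT contains at least one keyword, ignoring case.
# The body is a subshell, so nocasematch does not leak into the caller.
matches_any_keyword() (
    shopt -s nocasematch
    text=$1
    shift
    for kw in "$@"; do
        [[ $text == *"$kw"* ]] && exit 0   # literal, case-insensitive substring test
    done
    exit 1
)

sample='Author = "John Smith"'   # hypothetical metadata text
keywords=(alice john)
if matches_any_keyword "$sample" "${keywords[@]}"; then
    echo "hello.js"
fi
```

nocasematch is used rather than the ${var,,} lowercase expansion because the latter needs Bash 4, which the default macOS shell does not provide.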

Remove numbers at beginning of filenames in directory in bash

In an attempt to rename the files in one directory with numbers at the front I made an error in my script so that this happened in the wrong directory. Therefore I now need to remove these numbers from the beginning of all of my filenames in a directory. These range from 1 to 3 digits. Examples of the filenames I am working with are:
706terrain_Slope1000m_Minimum_all_25PCs_bolt_all_25PCs_qq_bolt.png
680met_sfcWind_all_25PCs_bolt_number.txt
460greenness_NDVI_500m_min_all_25PCs_bolt_number.txt
I was thinking of using mv but I'm not really sure how to do it with varying numbers of digits at the beginning, so any advice would be appreciated!
A simple way in bash is making use of a regular expression test:
for file in *; do
[[ -f "${file}" ]] && [[ "${file}" =~ (^[0-9]+) ]] && mv -- "${file}" "${file/${BASH_REMATCH[1]}}"
done
This does the following:
[[ -f "${file}" ]]: test if file is a file, if so
[[ "${file}" =~ (^[0-9]+) ]]: check if file starts with a number
${file/${BASH_REMATCH[1]}}: remove the number from the string file by using BASH_REMATCH, an array variable that holds the text matched by the regex and by each of its capture groups.
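To see what BASH_REMATCH holds without touching any files, the match can be wrapped in a small function. This is only a sketch; the function name strip_leading_digits is illustrative and the regex uses a second capture group for the remainder of the name.

```bash
#!/bin/bash
# Print NAME with any leading digits removed.
strip_leading_digits() {
    local name=$1
    # Group 1 captures the leading digits, group 2 the rest of the name.
    if [[ $name =~ ^([0-9]+)(.*)$ ]]; then
        printf '%s\n' "${BASH_REMATCH[2]}"
    else
        printf '%s\n' "$name"      # no leading digits: unchanged
    fi
}

strip_leading_digits "680met_sfcWind_all_25PCs_bolt_number.txt"
# prints: met_sfcWind_all_25PCs_bolt_number.txt
```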
If you've got perl's rename installed, the following should work :
rename 's/^[0-9]{1,3}//' /path/to/files
/path/to/files can be a list of specific files, or probably in your case a glob (e.g. *.{png,txt}). You don't need to select only files starting with digits as rename won't modify those that do not.
Using bash parameter expansion:
shopt -s extglob
for i in +([0-9])*.{txt,png}; do
mv -- "$i" "${i##+([0-9])}"
done
This will remove starting digits (any number) in filenames having png and txt extension.
The ## is removing the longest matching prefix pattern.
The +(...) is extended glob (extglob) syntax matching one or more occurrences of the enclosed pattern.
And [0-9] is pattern matching digits.
Alternate method using GNU find:
#!/usr/bin/env bash
find ./ \
-maxdepth 1 \
-type f \
-name '[[:digit:]]*' \
-exec bash -c 'shopt -s extglob; f="${1##*/}"; d="${1%%/*}"; mv -- "$1" "${d}/${f##+([[:digit:]])}"' _ {} \;
Find all actual files in current directory whose name start with a digit.
For each found file, execute the Bash script below:
shopt -s extglob # need for extended pattern syntax
f="${1##*/}" # Get file name without directory path
d="${1%%/*}" # Get directory path without file name
mv -- "$1" "${d}/${f##+([[:digit:]])}" # Rename without the leading digits
Using basic features of a POSIX-compliant shell:
#!/bin/sh
for f in [[:digit:]]*; do
if [ -f "$f" ]; then
pf="${f%${f#???}}" pf="${pf##*[[:digit:]]}"
mv "$f" "$pf${f#???}"
fi
done
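The two expansions on the pf= line are dense: the first keeps the first three characters of the name, and the second removes everything up to the last digit within them. A more explicit sketch of the same idea, which strips one digit at a time and is not limited to three digits, might look like this (the function name is illustrative, and s is a global variable because POSIX sh has no local):

```sh
#!/bin/sh
# Print the argument with all leading digits removed, using only POSIX features.
strip_leading_digits() {
    s=$1
    while :; do
        case $s in
            [0-9]*) s=${s#?} ;;   # drop one leading digit and test again
            *) break ;;
        esac
    done
    printf '%s\n' "$s"
}

strip_leading_digits "460greenness_NDVI_500m_min_all_25PCs_bolt_number.txt"
```

The result could then be used as the target of mv inside the loop shown above.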

Trying to rename certain file types within recursive directories

I have a bunch of files within a directory structure as such:
Dir
SubDir
File
File
Subdir
SubDir
File
File
File
Sorry for the messy formatting, but as you can see there are files at all different directory levels. All of these file names have a string of 7 numbers appended to them as such: 1234567_filename.ext. I am trying to remove the number and underscore at the start of the filename.
Right now I am using bash and using this oneliner to rename the files using mv and cut:
for i in *; do mv "$i" "$(echo $i | cut -d_ -f2-10)"; done
This is being run while I am CD'd into the directory. I would love to find a way to do this recursively, so that it only renamed files, not folders. I have also used a foreach loop in the shell, outside of bash for directories that have a bunch of folders with files in them and no other subdirectories as such:
foreach$ set p=`echo $f | cut -d/ -f1`
foreach$ set n=`echo $f | cut -d/ -f2 | cut -d_ -f2-10`
foreach$ mv $f $p/$n
foreach$ end
But that only works when there are no other subdirectories within the folders.
Is there a loop or oneliner I can use to rename all files within the directories? I even tried using find but couldn't figure out how to incorporate cut into the code.
Any help is much appreciated.
With Perl's rename (standalone command):
shopt -s globstar
rename -n 's|/[0-9]{7}_([^/]+$)|/$1|' **/*
If everything looks fine remove -n.
globstar: If set, the pattern ** used in a pathname expansion context will
match all files and zero or more directories and subdirectories. If
the pattern is followed by a /, only directories and subdirectories
match.
bash does provide functions, and these can be recursive, but you don't need a recursive function for this job. You just need to enumerate all the files in the tree. The find command can do that, but turning on bash's globstar option and using a shell glob to do it is safer:
#!/bin/bash
shopt -s globstar
# enumerate all the files in the tree rooted at the current working directory
for f in **; do
# ignore directories
test -d "$f" && continue
# separate the base file name from the path
name=$(basename "$f")
dir=$(dirname "$f")
# perform the rename, using a pattern substitution on the name part
mv "$f" "${dir}/${name/#???????_/}"
done
Note that that does not verify that file names actually match the pattern you specified before performing the rename; I'm taking you at your word that they do. If such a check were wanted then it could certainly be added.
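A sketch of how such a check might be added, assuming the names to rename start with seven digits followed by an underscore as described in the question. The directory tree is a throwaway one built with mktemp, and all file names in it are invented for the demonstration.

```bash
#!/bin/bash
root=$(mktemp -d)          # throwaway tree for the demonstration
mkdir -p "$root/sub/deeper"
touch "$root/1234567_a.ext" "$root/sub/7654321_b_c.ext" "$root/sub/deeper/keep_me.ext"

(
    shopt -s globstar
    cd "$root" || exit 1
    for f in **; do
        [[ -f $f ]] || continue                 # only regular files
        name=${f##*/}
        # Rename only names that start with seven digits and an underscore.
        if [[ $name == [0-9][0-9][0-9][0-9][0-9][0-9][0-9]_* ]]; then
            dir=$(dirname -- "$f")
            mv -- "$f" "$dir/${name#*_}"        # drop everything up to the first "_"
        fi
    done
)

result=$(cd "$root" && find . -type f | sort)
echo "$result"
rm -rf "$root"
```

Files that do not match the pattern, such as keep_me.ext here, are left untouched.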
How about this small tweak to what you have already:
for i in `find . -type f`; do mv "$i" "$(echo $i | cut -d_ -f2-10)"; done
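Basically just swapping the * with `find . -type f`. Be aware that the unquoted command substitution splits on whitespace, so this breaks on filenames containing spaces, and that cut now sees the whole path, so an underscore in a directory name will be cut at as well.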
Should be possible to do this using find...
find -E . -type f \
-regex '.*/[0-9]{7}_.*\.txt' \
-exec sh -c 'f="${0##*/}"; mv -v "$0" "${0%/*}/${f#*_}"' {} \;
Your find options may be different -- I'm doing this in FreeBSD. The idea here is:
-E instructs find to use extended regular expressions.
-type f causes only normal files (not directories or symlinks) to be found.
-regex ... matches the files you're looking for. You can make this more specific if you need to.
-exec ... \; runs a command, using {} (the file we've found) as an argument.
The command we're running uses parameter expansion first to grab the target directory and second to strip the filename. Note the temporary variable $f, which is used to address the possibility of extra underscores being part of the filename.
Note that this is NOT a bash command, though you can of course run it from the bash shell. If you want a bash solution that does not require use of an external tool like find, you may be able to do the following:
$ shopt -s extglob # use extended glob format
$ shopt -s globstar # recurse using "**"
$ for f in **/+([0-9])_*.txt; do f="./$f"; n="${f##*/}"; echo mv "$f" "${f%/*}/${n#*_}"; done
This uses the same logic as the find solution, but uses bash v4 extglob to provide better filename matching and globstar to recurse through subdirectories.
Hope these help.
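The parameter expansions that this answer relies on can be traced step by step on a single path. The path below is invented for illustration and nothing is renamed:

```bash
#!/bin/bash
path="./Dir/SubDir/1234567_my_file.ext"   # example path, made up for illustration
dir=${path%/*}        # shortest "/..." suffix removed -> ./Dir/SubDir
name=${path##*/}      # longest ".../" prefix removed  -> 1234567_my_file.ext
stripped=${name#*_}   # shortest "..._" prefix removed -> my_file.ext
echo "$dir/$stripped" # prints ./Dir/SubDir/my_file.ext
```

Using the shortest-match form for the underscore keeps any later underscores in the file name intact.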

bash rename files issue?

I know nothing about Linux commands or bash scripts, so please help me.
I have a lot of files in different directories, and I want to rename all of them from "name" to "name.xml" using a bash script. Is it possible to do that? I have only found useless code on the internet, like this:
shopt -s globstar # enable ** globstar/recursivity
for i in **/*.txt; do
echo "$i" "${i/%.txt}.xml";
done
it does not even work.
The prename utility comes in handy for this purpose. It is installed by default on many Linux distributions, usually as part of the Perl package. You can use it like this:
find . -iname '*.txt' -exec prename 's/\.txt$/.xml/' {} \;
or this much faster alternative:
find . -iname '*.txt' -print0 | xargs -0 prename 's/\.txt$/.xml/'
Explanation
Move/rename all files –whatever the extension is– in current directory and below from name to name.xml. You should test using echo before running the real script.
shopt -s globstar # enable ** globstar/recursivity
for i in **; do # **/*.txt will look only for .txt files
[[ -d "$i" ]] && continue # skip directories
echo "$i" "$i.xml"; # replace 'echo' by 'mv' when validated
#echo "$i" "${i/%.txt}.xml"; # replace .txt by .xml
done
Output showing **/*.txt **/*.xml effectively means that no files match the given pattern, because by default bash leaves a glob verbatim when nothing matches it.
To prevent this issue you'd have to additionally set shopt -s nullglob to have bash just return nothing when there is no match at all.
After verifying the echoed lines look somewhat reasonable you'll have to replace
echo "$i" "${i/%.txt}.xml"
with
mv "$i" "${i/%.txt}.xml"
to rename the files.
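The effect of nullglob described above can be checked directly by counting the words a non-matching glob expands to. A small sketch, using an empty temporary directory as a stand-in:

```bash
#!/bin/bash
empty=$(mktemp -d)                 # a directory with no .txt files

shopt -u nullglob
without=("$empty"/*.txt)           # one element: the literal pattern
shopt -s nullglob
with=("$empty"/*.txt)              # zero elements
shopt -u nullglob

echo "without nullglob: ${#without[@]} word(s); with nullglob: ${#with[@]}"
rmdir "$empty"
```

With nullglob set, a for loop over the pattern simply runs zero times instead of once with the pattern itself.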
You can use this bash script.
#!/bin/bash
DIRECTORY=/your/base/dir/here
# NUL-delimited names survive spaces and newlines
find "$DIRECTORY" -type f -name '*.txt' -print0 |
while IFS= read -r -d '' i
do mv -- "$i" "$i.xml"
done

How can I use inverse or negative wildcards when pattern matching in a unix/linux shell?

Say I want to copy the contents of a directory excluding files and folders whose names contain the word 'Music'.
cp [exclude-matches] *Music* /target_directory
What should go in place of [exclude-matches] to accomplish this?
In Bash you can do it by enabling the extglob option, like this (replace ls with cp and add the target directory, of course)
~/foobar> shopt extglob
extglob off
~/foobar> ls
abar afoo bbar bfoo
~/foobar> ls !(b*)
-bash: !: event not found
~/foobar> shopt -s extglob # Enables extglob
~/foobar> ls !(b*)
abar afoo
~/foobar> ls !(a*)
bbar bfoo
~/foobar> ls !(*foo)
abar bbar
You can later disable extglob with
shopt -u extglob
The extglob shell option gives you more powerful pattern matching in the command line.
You turn it on with shopt -s extglob, and turn it off with shopt -u extglob.
In your example, you would initially do:
$ shopt -s extglob
$ cp !(*Music*) /target_directory
The full available extended globbing operators are (excerpt from man bash):
If the extglob shell option is enabled using the shopt builtin, several extended
pattern matching operators are recognized. A pattern-list is a list of one or more patterns separated by a |. Composite patterns may be formed using one or more of the following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
So, for example, if you wanted to list all the files in the current directory that are not .c or .h files, you would do:
$ ls -d !(*@(.c|.h))
Of course, normal shell globbing works, so the last example could also be written as:
$ ls -d !(*.[ch])
Not in bash (that I know of), but:
cp `ls | grep -v Music` /target_directory
I know this is not exactly what you were looking for, but it will solve your example.
If you want to avoid the cost of starting a new process for each file with -exec ... \;, I believe you can do better with xargs. I think the second command below is a more efficient alternative to the first:
find foo -type f ! -name '*Music*' -exec cp {} bar \; # new proc for each exec
find . -maxdepth 1 -name '*Music*' -prune -o -print0 | xargs -0 -I {} cp {} dest/
A trick I haven't seen on here yet that doesn't use extglob, find, or grep is to treat two file lists as sets and "diff" them using comm:
comm -23 <(ls) <(ls *Music*)
comm is preferable over diff because it doesn't have extra cruft.
This returns all elements of set 1, ls, that are not also in set 2, ls *Music*. This requires both sets to be in sorted order to work properly. No problem for ls and glob expansion, but if you're using something like find, be sure to invoke sort.
comm -23 <(find . | sort) <(find . | grep -i '\.jpg' | sort)
Potentially useful.
You can also use a pretty simple for loop:
for f in `find . -not -name "*Music*"`
do
cp $f /target/dir
done
In bash, an alternative to shopt -s extglob is the GLOBIGNORE variable. It's not really better, but I find it easier to remember.
An example that may be what the original poster wanted:
GLOBIGNORE="*techno*"; cp *Music* /only_good_music/
When done, unset GLOBIGNORE to be able to rm *techno* in the source directory.
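A self-contained sketch of the GLOBIGNORE approach follows. The file names are invented, and the glob runs in a subshell so that neither GLOBIGNORE nor the directory change leaks into the calling shell. One side effect worth knowing: setting GLOBIGNORE to a non-empty value also makes globs match dotfiles, as if dotglob were enabled.

```bash
#!/bin/bash
dir=$(mktemp -d)
touch "$dir/rock_Music.mp3" "$dir/techno_Music.mp3" "$dir/notes.txt"

# Run in a subshell so GLOBIGNORE and the cd do not leak into the caller.
kept=$(
    cd "$dir" || exit 1
    GLOBIGNORE='*techno*'
    printf '%s\n' *Music*
)
echo "$kept"      # prints rock_Music.mp3
rm -rf "$dir"
```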
My personal preference is to use grep and the while command. This allows one to write powerful yet readable scripts ensuring that you end up doing exactly what you want. Plus by using an echo command you can perform a dry run before carrying out the actual operation. For example:
ls | grep -v "Music" | while read filename
do
echo $filename
done
will print out the files that you will end up copying. If the list is correct the next step is to simply replace the echo command with the copy command as follows:
ls | grep -v "Music" | while read filename
do
cp "$filename" /target_directory
done
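Parsing the output of ls breaks on names containing newlines, and read without -r mangles backslashes. The same dry-run-then-copy idea can be sketched with a plain glob instead. The directories are temporary placeholders and the dry_run variable is an illustrative addition, not part of the answer above.

```bash
#!/bin/bash
src=$(mktemp -d)      # placeholder for the source directory
dst=$(mktemp -d)      # placeholder for /target_directory
touch "$src/My Music.mp3" "$src/report final.txt" "$src/todo.txt"
dry_run=false         # set to true to only print what would be copied

for path in "$src"/*; do
    name=${path##*/}
    [[ -f $path ]] || continue            # skip directories
    [[ $name == *Music* ]] && continue    # skip names containing "Music"
    if [[ $dry_run == true ]]; then
        echo "would copy: $name"
    else
        cp -- "$path" "$dst"
    fi
done

copied=$(ls "$dst")
echo "$copied"
rm -rf "$src" "$dst"
```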
One solution for this can be found with find.
$ mkdir foo bar
$ touch foo/a.txt foo/Music.txt
$ find foo -type f ! -name '*Music*' -exec cp {} bar \;
$ ls bar
a.txt
Find has quite a few options, you can get pretty specific on what you include and exclude.
Edit: Adam in the comments noted that this is recursive. find options mindepth and maxdepth can be useful in controlling this.
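As a sketch of the maxdepth point, the example above can be extended with a nested directory to show that -maxdepth 1 stops find from descending. The nested directory and b.txt are additions made for the demonstration.

```bash
#!/bin/bash
top=$(mktemp -d)
mkdir -p "$top/foo/nested" "$top/bar"
touch "$top/foo/a.txt" "$top/foo/Music.txt" "$top/foo/nested/b.txt"

# -maxdepth 1 keeps find from descending into foo/nested.
find "$top/foo" -maxdepth 1 -type f ! -name '*Music*' -exec cp -- {} "$top/bar" \;

result=$(ls "$top/bar")
echo "$result"     # prints a.txt
rm -rf "$top"
```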
The following lists all *.txt files in the current dir, except those that begin with a number.
This works in bash, dash, zsh and all other POSIX compatible shells.
for FILE in /some/dir/*.txt; do # for each *.txt file
case "${FILE##*/}" in # if file basename...
[0-9]*) continue ;; # starts with digit: skip
esac
## otherwise, do stuff with $FILE here
done
In line one the pattern /some/dir/*.txt will cause the for loop to iterate over all files in /some/dir whose name end with .txt.
In line two a case statement is used to weed out undesired files. The ${FILE##*/} expression strips off any leading dir name component from the filename (here /some/dir/) so that patterns can match against only the basename of the file. (If you're only weeding out filenames based on suffixes, you can shorten this to $FILE instead.)
In line three, all files matching the case pattern [0-9]* will be skipped (the continue statement jumps to the next iteration of the for loop). If you want to, you can do something more interesting here, such as skipping all files which do not start with a letter (a–z) using [!a-z]*, or you could use multiple patterns to skip several kinds of filenames, e.g. [0-9]*|*.bak to skip both .bak files and files which start with a number.
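The multiple-pattern variant mentioned above can be sketched as a small predicate. The function name should_skip and the sample names are illustrative only.

```sh
#!/bin/sh
# should_skip NAME: succeeds if NAME starts with a digit or ends in .bak
should_skip() {
    case $1 in
        [0-9]*|*.bak) return 0 ;;
        *)            return 1 ;;
    esac
}

for name in 1notes.txt notes.txt notes.bak; do
    if should_skip "$name"; then
        echo "skip $name"
    else
        echo "keep $name"
    fi
done
```

Only notes.txt is kept; the other two names match one of the two alternatives in the case pattern.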
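This would do it, excluding exactly 'Music' (the ^ negation is zsh extended-glob syntax, so it needs zsh with setopt extendedglob rather than bash):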
cp -a ^'Music' /target
this and that for excluding things like Music?* or *?Music
cp -a ^\*?'complete' /target
cp -a ^'complete'?\* /target

Resources