I want to search a directory (let's call it "testDir") for files whose names start with the letter "a", have the letter "z" in the fourth position, and have the file extension .html.
Is there any way to use grep for this? How can I search for a character at a fixed index?
You can use native Bash pattern matching: a??z*.html. This pattern means exactly what you're asking for:
Start with the letter "a"
Followed by any two characters
Followed by the letter "z" (4th position)
Followed by 0 or more characters
Ending with ".html"
You can get the matching filenames with any shell tool that prints filenames when passed as arguments.
Some examples:
ls testDir/a??z*.html or echo testDir/a??z*.html. Note that these will print with the testDir/ prefix.
(cd testDir && echo a??z*.html) will print just the filenames without the testDir/ prefix.
Note that the ls command will produce an error when there are no matching files, while the echo command will print the pattern (a??z*.html).
For more details on pattern matching, see the Pattern Matching section in man bash.
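As a minimal sketch of using the same pattern in a loop (the temporary directory and the sample file names are made up for illustration), an existence test skips the literal pattern that the shell leaves in place when nothing matches:

```shell
# Build a throwaway directory with a few hypothetical sample files.
tmp=$(mktemp -d)
mkdir "$tmp/testDir"
touch "$tmp/testDir/abcz.html" "$tmp/testDir/a12zfoo.html" \
      "$tmp/testDir/abz.html" "$tmp/testDir/abcz.txt"

matches=""
for f in "$tmp"/testDir/a??z*.html; do
    # With no matches the loop runs once on the unexpanded pattern; skip it.
    [ -e "$f" ] || continue
    matches="$matches${f##*/} "    # ${f##*/} drops the directory part
done
echo "$matches"
rm -rf "$tmp"
```

Only abcz.html and a12zfoo.html pass: abz.html has "." in the fourth position and abcz.txt has the wrong extension.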
If you are looking for an alternative that produces no output when there are no matches, grep will be easier to use. Note that grep uses a different pattern syntax: regular expressions.
The same pattern written in regular expressions is ^a..z.*\.html$.
This breaks down to:
^ means start of line, so ^a means to start with "a"
. is any character, precisely one
.* is 0 or more of any character
\. is a "."
$ means end of line, so html$ means to end with "html"
Here's one way to apply it to your example:
(cd testDir && ls | grep '^a..z.*\.html$')
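The regular expression can also be tried out without touching the file system by feeding it a list of names. A small sketch, where the names are hypothetical samples:

```shell
# Hypothetical sample names, one per line.
names='abcz.html
a12zfoo.html
abz.html
abcz.txt
xabcz.html'

# grep prints only the lines that match the whole-line pattern.
found=$(printf '%s\n' "$names" | grep '^a..z.*\.html$')
echo "$found"
```

The anchors matter here: without ^ the name xabcz.html would match as well, and without the backslash the "." before html would match any character.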
How about this:
ls -d testDir/a??z* | grep -e '\.html$'
I'm trying to convert 3,000 or so .svg files from CapitalCase to camelCase.
Current:
-Folder
--FileName1
--FileName2
--FileName3
Goal:
-Folder
--fileName1
--fileName2
--fileName3
How can I use the terminal to change the first character to lowercase?
Currently I've been trying something along these lines: for f in *.svg; do mv -v "$f" "${f:1}"; done
All files in the folder start with a letter or number.
This can be done very succinctly in zsh with zmv:
autoload zmv
zmv -nvQ '(**/)(?)(*.svg)(.)' '$1${(L)2}$3'
This will recurse through any number of directory levels, and can handle name collisions and other edge cases.
Some of the pieces:
-n: no execution. With this option, zmv will only report what changes it would make. It's a dry run that can be used to test out the patterns. Remove it when you're ready to actually change the names.
-v: verbose.
-Q: qualifiers. Used to indicate that the source pattern includes a glob qualifier (in our case (.)).
'(**/)(?)(*.svg)(.)': source pattern. This is simply a regular zsh glob pattern, divided into groups with parentheses. The underlying pattern is **/?*.svg(.). The pieces:
(**/): directories and subdirectories. This will match any number of directory levels (to only affect the current directory, see below).
(?): matches a single character at the start of the file name. We'll convert this to lowercase later.
(*.svg): matches the rest of the file name.
(.): regular files only. This is a zsh glob qualifier; zmv recognizes it as a qualifier instead of a grouping because of the -Q option. The . qualifier limits the matching to regular files so that we don't try to rename directories.
'$1${(L)2}$3': destination pattern. Each of the groupings in the source pattern is referenced in order with $1, $2, etc.
$1: the directory. This could contain multiple levels.
${(L)2}: The first letter in the file name, converted to lowercase. This uses the L parameter expansion flag to change the case.
The l expansion modifier will also work: $2:l.
The conversion can handle non-ASCII characters, e.g. Éxito would become éxito.
$3: the rest of the file name, including the extension.
Variations
This will only change files in the current directory:
zmv -nv '(?)(*.svg)' '$1:l$2'
The source pattern in the following version will only match files that start with an uppercase letter. Since the zmv utility won't rename files if the source and destination match, this isn't strictly necessary, but it will be slightly more efficient:
zmv -nvQ '(**/)([[:upper:]])(*.svg)(.)' '$1${(L)2}$3'
More information
zmv documentation:
https://zsh.sourceforge.io/Doc/Release/User-Contributions.html#index-zmv
zsh parameter expansion flags:
https://zsh.sourceforge.io/Doc/Release/Expansion.html#Parameter-Expansion-Flags
Page with some zsh notes, including a bunch of zmv examples:
https://grml.org/zsh/zsh-lovers.html
A solution in bash, tested and working. Be careful with the files you run it on, though.
The script renames files in the folder given as the first argument (use . for the folder the script is run from). It lowercases the first letter of each name if it is uppercase and does nothing if the name starts with a number. Both arguments must be provided:
# 1st argument - folder name
# 2nd argument - file extension (.txt, .svg, etc.)
# A glob is used instead of parsing ls, so names with spaces stay intact.
for path in "$1"/*"$2"
do
    [ -e "$path" ] || continue    # no file with this extension
    filename=${path##*/}
    firstChar=${filename:0:1}
    restChars=${filename:1}
    if [[ "$firstChar" =~ [A-Z] ]] && ! [[ "$firstChar" =~ [a-z] ]]; then
        toLowerFirstChar=$(echo "$firstChar" | awk '{print tolower($0)}')
        modifiedFilename="$toLowerFirstChar$restChars"
        mv "$1/$filename" "$1/$modifiedFilename"
    else
        echo "Non-alphabetic or already lowercase"
        # here you can handle files whose names start with a number
    fi
done
Use: bash script.sh Folder .txt
ATTENTION: after the script runs, some renamed files may end up with the same name as an existing file, and that causes a conflict. I may fix this and update the script later.
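One possible way to handle that conflict is to check whether the target name already exists and skip the rename if it does. The sketch below is not the script above: lower_first is a hypothetical helper, it uses only POSIX shell features, and it assumes a case-sensitive file system and names whose first character is ASCII.

```shell
# lower_first DIR EXT (hypothetical helper): lowercase the first character
# of every DIR/*EXT name, skipping renames whose target already exists.
lower_first() {
    dir=$1 ext=$2
    for path in "$dir"/*"$ext"; do
        [ -f "$path" ] || continue
        name=${path##*/}
        first=$(printf '%s' "$name" | cut -c1 | tr '[:upper:]' '[:lower:]')
        new=$first${name#?}                  # ${name#?} drops the first character
        [ "$name" = "$new" ] && continue     # already lowercase or a digit
        if [ -e "$dir/$new" ]; then
            echo "skip $name: $new already exists" >&2
            continue
        fi
        mv -- "$path" "$dir/$new"
    done
}

# Demo on made-up files in a temporary directory.
tmp=$(mktemp -d)
touch "$tmp/FileName1.svg" "$tmp/1Icon.svg" "$tmp/Clash.svg" "$tmp/clash.svg"
lower_first "$tmp" .svg
result=$(LC_ALL=C ls "$tmp")
echo "$result"
rm -rf "$tmp"
```

In the demo, FileName1.svg becomes fileName1.svg, 1Icon.svg is left alone, and Clash.svg is skipped with a message on stderr because clash.svg is already there.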
I am new to shell scripting. I want to iterate over the files in a directory that match the specific patterns below.
Ad_sf_03041500000.dat
SF_AD_0304150.DEL
SF_AD_0404141.EXP
The number of digits should match these patterns exactly.
I am using a KSH shell script. Could you please help me iterate over only those files in a for loop?
The patterns you are looking for are
Ad_sf_{11}([[:digit:]]).dat
SF_AD_{7}([[:digit:]]).DEL
SF_AD_{7}([[:digit:]]).EXP
Note that the {n}(...) pattern, to match exactly n occurrences of the following pattern, is an extension unique to ksh (as far as I know, not even zsh provides an equivalent).
To iterate over matching files, you can use
for f in Ad_sf_{11}(\d).dat SF_AD_{7}(\d).@(DEL|EXP); do
where I've used the "pick one" operator @(...) to combine the two shorter patterns into a single pattern, and I've used \d, which ksh supports as a shorter version of [[:digit:]] when inside parentheses.
Automatic wildcard generation method. Print the filenames with leading text and line numbers...
POSIX shell:
2> /dev/null find \
$(echo Ad_sf_03041500000.dat SF_AD_0304150.DEL SF_AD_0404141.EXP |
sed 's/[0-9]/[0-9]/g' ) |
while read f ; do
echo "Here's $f";
done | nl
ksh (with a spot borrowed from Chepner):
set - Ad_sf_03041500000.dat SF_AD_0304150.DEL SF_AD_0404141.EXP
for f in ${*//[0-9]/[0-9]} ; do [ -f "$f" ] || continue
echo "Here's $f";
done | nl
Output of either method:
1 Here's Ad_sf_03041500000.dat
2 Here's SF_AD_0304150.DEL
3 Here's SF_AD_0404141.EXP
If the line numbers aren't wanted, omit the | nl. echo can be replaced with whatever command needs to be run on the files.
How the POSIX code works. The OP spec is simple enough to churn out the correct wildcard with a little tweaking. Example:
echo Ad_sf_03041500000.dat SF_AD_0304150.DEL SF_AD_0404141.EXP |
sed 's/[0-9]/[0-9]/g'
Which outputs exactly the patterns needed (line feeds added for clarity):
Ad_sf_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].dat
SF_AD_[0-9][0-9][0-9][0-9][0-9][0-9][0-9].DEL
SF_AD_[0-9][0-9][0-9][0-9][0-9][0-9][0-9].EXP
The patterns above go to find, which prints only the matching filenames, (not the pattern itself when there are no files), then the filenames go to a while loop.
(The ksh variant is the same method but uses pattern substitution, set, and test -f in place of sed and find.)
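If spelling out [0-9] once per digit feels unwieldy, another option is to test each name against an extended regular expression with explicit repetition counts. This is a different technique from the two methods above, sketched here with made-up sample files in a temporary directory:

```shell
# One ERE covering all three name shapes, with exact digit counts.
pattern='^(Ad_sf_[0-9]{11}\.dat|SF_AD_[0-9]{7}\.(DEL|EXP))$'

tmp=$(mktemp -d)
touch "$tmp/Ad_sf_03041500000.dat" "$tmp/SF_AD_0304150.DEL" \
      "$tmp/SF_AD_0404141.EXP" \
      "$tmp/SF_AD_04041.EXP" "$tmp/Ad_sf_0304150000.dat"   # wrong digit counts

kept=""
for f in "$tmp"/*; do
    name=${f##*/}
    # grep -q prints nothing; only its exit status is used.
    if printf '%s\n' "$name" | grep -Eq "$pattern"; then
        kept="$kept$name "
    fi
done
echo "$kept"
rm -rf "$tmp"
```

The two files with too few digits are filtered out, so only the three names from the question remain.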
I found a good answer that explains how to remove a specified pattern from a string variable. In this case, to remove 'foo' we use the following:
string="fooSTUFF"
string="${string#foo}"
However, I would like to add the "OR" functionality that would be able to remove 'foo' OR 'boo' in the cases when my string starts with any of them, and leave the string as is, if it does not start with 'foo' or 'boo'. So, the modified script should look something like that:
string="fooSTUFF"
string="${string#(foo OR boo)}"
How could this be properly implemented?
If you have set the extglob (extended glob) shell option with
shopt -s extglob
Then you can write:
string="${string#@(foo|boo)}"
The extended patterns are documented in the bash manual; they take the form:
?(pattern-list): Matches zero or one occurrence of the given patterns.
*(pattern-list): Matches zero or more occurrences of the given patterns.
+(pattern-list): Matches one or more occurrences of the given patterns.
@(pattern-list): Matches one of the given patterns.
!(pattern-list): Matches anything except one of the given patterns.
In all cases, pattern-list is a list of patterns separated by |
You need an extended glob pattern for that (enabled with shopt -s extglob):
$ str1=fooSTUFF
$ str2=booSTUFF
$ str3=barSTUFF
$ echo "${str1#@(foo|boo)}"
STUFF
$ echo "${str2#@(foo|boo)}"
STUFF
$ echo "${str3#@(foo|boo)}"
barSTUFF
The @(pat1|pat2) matches one of the patterns separated by |.
@(pat1|pat2) is the general solution for your question (multiple patterns); in some simple cases, you can get away without extended globs:
echo "${str#[fb]oo}"
would work for your specific example, too.
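The extended-glob form can be checked in a short bash script. A sketch, assuming bash with extglob available; the variable names r1 to r4 and the foobooSTUFF case are additions for illustration:

```shell
shopt -s extglob   # bash only; must be enabled when the expansion runs

r1=fooSTUFF;    r1=${r1#@(foo|boo)}     # prefix foo removed
r2=booSTUFF;    r2=${r2#@(foo|boo)}     # prefix boo removed
r3=barSTUFF;    r3=${r3#@(foo|boo)}     # no match, left unchanged
r4=foobooSTUFF; r4=${r4##+(foo|boo)}    # ## with +(...) strips repeated prefixes

printf '%s\n' "$r1" "$r2" "$r3" "$r4"
```

The last line shows a related variant: +(pattern-list) combined with ## removes the longest run of matching prefixes rather than a single one.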
You can use sed to strip either prefix:
string=$(printf '%s\n' "$string" | sed -E 's/^(foo|boo)//')
I want to search the text file below for words that end in _letter and extract each one, starting right after the "::". There are no spaces anywhere in the file.
blahblah:/blahblah::abc_letter:/blahblah/blahblah
blahblah:/blahblah::cd_123_letter:/blahblah/blahblah
blahblah:::/blahblah::24_cde_letter:/blahblah/blahblah
blahblah::/blahblah::45a6_letter:/blahblah/blahblah
blahblah:/blahblah::fgh_letter:/blahblah/blahblah
blahblah:/blahblah::789_letter:/blahblah/blahblah
I tried
egrep -o '*_letter'
and
egrep -o "*_letter"
But it only returns the word _letter
Then I want to feed the result into a shell script for loop, so the script will look like the following:
for i in [grep command]
mkdir $i
end
It will create the following directories
abc_letter/
cd_123_letter/
24_cde_letter/
45a6_letter/
fgh_letter/
789_letter/
PS: The result between :: and _letter doesn't contain any special characters, only alphanumeric characters.
Also, my system doesn't have perl.
Assuming no spaces or new-lines:
for i in $(sed 's/^.*:\([^/]*_letter\):.*$/\1/g' infile); do
mkdir $i
done
To extract the strings that run from after a : up to _letter from file.txt and use them in your for loop, you can use the following egrep command and revise your script.sh like this:
#!/bin/bash
for i in $(egrep -o "[^:]+_letter" file.txt); do
mkdir -p $i
done
Then run ./script.sh. If you check afterwards with ls, you will see:
$ ls -1
24_cde_letter
45a6_letter
789_letter
abc_letter
cd_123_letter
fgh_letter
file.txt
script.sh
Explanation
Your original egrep -o '*_letter' probably confused bash filename expansion with regular expressions.
In bash, *something uses the globbing character * to match anything, followed by something.
In a regular expression, however, * means the preceding character zero or more times. Since * is at the beginning of what you wrote, there is nothing before it, so it does not match anything there.
The only thing egrep can match is _letter, and since the -o option displays only the matched text, one match per line, you originally saw nothing but lines of _letter.
Our new changes:
The egrep pattern starts with [^ ... ], a negation, which matches any character other than the ones you put inside. We put : inside.
The + says to match the preceding item one or more times.
So combined, it says: look for anything but :, and do this one or more times.
Thus it starts matching after a : and keeps matching until the next part of the pattern.
The next part of the pattern is just _letter.
egrep -o shows only the matched text, one match per line.
So in this way, from lines such as:
blahblah:/blahblah::abc_letter:/blahblah/blahblah
It successfully extracts:
abc_letter
Then, the changes to your bash script:
Bash command substitution $() sends the results of the egrep command to the for loop.
The loop uses the for i in value...; do ... done syntax.
mkdir -p is just a convenience in case you are re-testing: it will not error if the directory already exists.
So altogether it helps to extract the pattern you wanted and generate directories with those names.
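A variation on the same idea reads the extracted names line by line instead of relying on word splitting in $( ). This is a sketch under stated assumptions: it uses grep -E (the non-deprecated spelling of egrep), and the temporary directory and the out/ subdirectory are hypothetical, added so the demo does not clutter the current directory.

```shell
tmp=$(mktemp -d)
cat > "$tmp/file.txt" <<'EOF'
blahblah:/blahblah::abc_letter:/blahblah/blahblah
blahblah:/blahblah::cd_123_letter:/blahblah/blahblah
blahblah:::/blahblah::24_cde_letter:/blahblah/blahblah
EOF

# One extracted name per line; quoting "$name" keeps each name as one word.
grep -Eo '[^:]+_letter' "$tmp/file.txt" |
while IFS= read -r name; do
    mkdir -p "$tmp/out/$name"
done

created=$(LC_ALL=C ls "$tmp/out")
echo "$created"
rm -rf "$tmp"
```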
I've modified this script from the Arch wiki: https://wiki.archlinux.org/index.php/Convert_Flac_to_Mp3#With_FFmpeg
I'm trying to find specific file types in a directory structure, convert them to another music file type, and place them in a "converted" directory that maintains the same directory structure.
I'm stuck at stripping the string $b of its file name.
$b holds the string ./converted/alt-j/2012\ an\ awesome\ wave/01\ Intro.flac
Is there a way I can remove the file name from the string? I don't think ffmpeg can create/force parent directories of output files.
#!/bin/bash
# file convert script
find -type f -name "*.flac" -print0 | while read -d $'\0' a; do
    b=${a/.\//.\/converted/}
    < /dev/null ffmpeg -i "$a" "${b[@]/%flac/ogg}"
    #echo "${b[@]/%flac/ogg}"
done
I'm stuck at stripping the string $b of its file name.
Let us start with b:
$ b=./converted/alt-j/2012\ an\ awesome\ wave/01\ Intro.flac
To remove the file name, leaving the path:
$ c=${b%/*}
To verify the result:
$ echo "$c"
./converted/alt-j/2012 an awesome wave
To make sure that directory c exists, do:
$ mkdir -p "$c"
Or, all in one step:
$ mkdir -p "${b%/*}"
How it works
We are using the shell's suffix removal feature. In the form ${parameter%word}, the shell finds the shortest match of word against the end of parameter and removes it. (Note that word is a shell glob, not a regex.) In our case, word is /*, which matches a slash followed by any characters. Because this removes the shortest such match, it removes only the filename part from the parameter.
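The difference between the shortest and longest match, and the matching prefix forms, can be seen side by side in a small sketch. The variable names dir, file, top and ogg are chosen for illustration only:

```shell
b='./converted/alt-j/2012 an awesome wave/01 Intro.flac'

dir=${b%/*}          # shortest suffix match of /*: drops the file name
file=${b##*/}        # longest prefix match of */: keeps only the file name
top=${b%%/*}         # longest suffix match of /*: everything from the first slash goes
ogg=${b%.flac}.ogg   # swap the extension by removing the suffix and appending a new one

printf '%s\n' "$dir" "$file" "$top" "$ogg"
```

Here dir is ./converted/alt-j/2012 an awesome wave, file is 01 Intro.flac, top is just ".", and ogg is the original path ending in 01 Intro.ogg.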
Suffix Removal Detailed Documentation
From man bash:
${parameter%word} ${parameter%%word}
Remove matching suffix pattern. The word is expanded to produce a pattern just as in pathname expansion. If the pattern
matches a trailing portion of the expanded value of parameter, then the result of the expansion is the expanded value of
parameter with the shortest matching pattern (the ``%'' case) or the longest matching pattern (the ``%%'' case) deleted.
If parameter is @ or *, the pattern removal operation is applied to each positional parameter in turn, and the expansion
is the resultant list. If parameter is an array variable subscripted with @ or *, the pattern removal operation is
applied to each member of the array in turn, and the expansion is the resultant list.