How can I get a long listing of text files containing "foo" followed by two digits? - shell

Using metacharacters, I need to perform a long listing of all files whose name contains the string foo followed by two digits, then followed by .txt. foo**.txt will not work, obviously. I can't figure out how to do it.

Use Valid Shell Globbing with Character Class
To find your substring anywhere in a filename like bar-foo12-baz.txt, you need a wildcard before and after the match. You can also use a character class in your pattern to match a limited range of characters. For example, in Bash:
# Explicit character classes.
ls -l *foo[0-9][0-9]*.txt
# POSIX character classes.
ls -l *foo[[:digit:]][[:digit:]]*.txt
See Also
Filename Expansion
Pattern Matching
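To see which names the pattern above accepts and rejects, here is a minimal sketch that builds a scratch directory first. The file names are hypothetical samples, not taken from the question, and the matches are collected in an array only so that they can be inspected before being handed to ls.

```shell
#!/usr/bin/env bash
# Hypothetical sample files in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch foo12.txt bar-foo34-baz.txt foo1.txt fooab.txt foo56.log

# Collect the matches in an array so they can be inspected or counted.
matches=( *foo[0-9][0-9]*.txt )
printf '%s\n' "${matches[@]}"

# Long listing of the matching files, as asked in the question.
ls -l -- "${matches[@]}"
```

Only bar-foo34-baz.txt and foo12.txt should be listed: foo1.txt has a single digit, fooab.txt has no digits, and foo56.log has the wrong extension.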

Something like ls foo[0-9][0-9]*.txt, or whatever exactly fits your pattern.

Related

Bash glob, how to OR over strings of non unit length?

I have in a directory a bunch of files. Each file's basename ends with a two digit number and a letter, such as file_01A.txt, file_03B.txt, file_13A.txt.
In a terminal using bash (I assume; I am working on Mac OS X), running
ls *01*[AB]*.txt
returns all files such as 01A and 01B. This makes sense to me.
ls *02*[AB]*.txt
returns similarly all files such as 02A and 02B.
Now I want to return all files 01A, 01B, 02A, 02B. Hence I want something like:
ls *(01 or 02)*[AB]*.txt
Attempt 1: I tried with | but that throws an error.
Attempt 2: ls *[01,02]*[AB]*.txt but that gives the 03 files too, since I assume it is interpreting the 01 and 02 as individual characters to match.
Attempt 3: ls *["01","02"]*[AB]*.txt is the same again.
It's not hard to articulate a single wildcard which matches your requirement.
ls *0[12]*[AB]*.txt
In the general case, use multiple wildcards if you can't articulate a single one. Notice that the shell expands them in the order you write them, and if they both match some files, there will be duplicates in the expansion.
ls *01*[AB]*.txt *02*[AB]*.txt
You seem to be confused about what the metacharacters mean. * matches any string, ? matches any single character, and [abc] matches any one character which is listed between the square brackets. [!abc] matches a single character which is not a, b, or c. Bash also supports an extension called brace expansion, where foo{bar,quux} is basically an abbreviation of foobar fooquux. Your attempt could thus be rearticulated as
ls *{01,02}*[AB]*.txt
though the repeated prefix 0 is obviously redundant, and would better be left outside the braces, and then you might as well switch back to straight square brackets.
There is also a separate extended globbing syntax which allows for more elaborate wildcards. See the reference manual for details.
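As an illustration of that extended syntax, the following sketch uses @(01|02), which matches exactly one of the listed alternatives. It assumes bash with the extglob option available. The file names are modeled on the ones in the question, and the scratch directory is only there to make the example reproducible.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable extended patterns such as @(a|b)

tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch file_01A.txt file_01B.txt file_02A.txt file_02B.txt file_03A.txt file_13A.txt

# @(01|02) matches exactly one of the alternatives 01 or 02.
matches=( *_@(01|02)[AB].txt )
printf '%s\n' "${matches[@]}"
```

Unlike brace expansion, this is a single pattern, so a file can never appear twice in the result.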

How to get match of a pattern even if it is splitted by characters using a bash command (similar to grep)?

I'm trying to output all the lines of a file which contain a specific word/pattern even if it contains other characters between its letters.
Let's say we have a bunch of domain names and we want to filter out all those that contain "paypal" inside, I would like to have this kind of output :
pay-pal-secure.com
payppal.net
etc...
I was wondering if this is possible with grep or does it exist something else that might do it.
Many thanks !
Replace paypal with regexp p.*a.*y.*p.*a.*l to allow all characters between the letters.
Update:
Use the extended regular expression p.{0,2}a.{0,2}y.{0,2}p.{0,2}a.{0,2}l to limit the number of characters between consecutive letters to at most two.
Example: grep -E 'p.{0,2}a.{0,2}y.{0,2}p.{0,2}a.{0,2}l' file
See: The Stack Overflow Regular Expressions FAQ
Alternatively you could use agrep (approximate grep):
$ agrep -By paypal file
agrep: 2 words match within 1 error
pay-pal-secure.com
payppal.net
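For comparison, here is a small sketch of the bounded grep -E variant run against an inline list instead of a file. Only the first two domains come from the question; the others are made-up test cases chosen to show one additional match and two non-matches.

```shell
#!/usr/bin/env bash
# Hypothetical domain list; only the first two entries appear in the question.
domains=$(printf '%s\n' pay-pal-secure.com payppal.net example.org p-a-y-p-a-l.io papaya.com)

# Allow at most two arbitrary characters between consecutive letters of "paypal".
pattern='p.{0,2}a.{0,2}y.{0,2}p.{0,2}a.{0,2}l'
hits=$(printf '%s\n' "$domains" | grep -E "$pattern")
printf '%s\n' "$hits"
```

This should print pay-pal-secure.com, payppal.net and p-a-y-p-a-l.io. The bound of two is a tuning choice: a larger bound catches more obfuscated names but also produces more false positives.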

Extended glob can't retrace behaviour

I looked at a bash guide where I found this example:
http://guide.bash.academy/expansions/
$ ls !(my*).txt # All the .txt files that do not begin with my.
hello.txt
$ ls !(my)*.txt # Can you guess why this one matches myscript.txt?
myscript.txt
hello.txt
I'm familiar with basic concepts of regular expressions maybe this is confusing me because I'm trying to apply those concepts to extended globs in bash.
I do not understand why !(my)*.txt is expanding myscript.txt in bash. The explanation in the guide does not help me at all.
My reasoning:
!(my*).txt does not match myscript.txt because it does start with my then matches the rest of the characters script and at the end it matches .txt
!(my)*.txt does not (wrong!!!) match myscript.txt because it is starting with my followed by any characters and at the end it matches .txt
Where am I wrong in my argumentation?
This is a common gotcha with wildcards. The question to ask yourself is, is there any way to split up myscript.txt such that the first piece matches !(my) and the second matches *.txt?
The answer is, counter-intuitively, yes: If you "split" "myscript.txt" into "" (the empty string) and "myscript.txt" then the empty string matches !(my) and "myscript.txt" matches *.txt. The empty string is a valid match!
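The splitting argument can be checked without touching the filesystem by testing strings against patterns with [[ ... == ... ]]. This is a sketch under the assumption that extglob is enabled. matches_glob is a hypothetical helper written for this example, not a bash builtin.

```shell
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical helper: print "yes" if string $1 matches glob pattern $2.
matches_glob() {
  # $2 is deliberately unquoted so that it is treated as a pattern.
  if [[ $1 == $2 ]]; then echo yes; else echo no; fi
}

empty_result=$(matches_glob ''             '!(my)')        # empty string is not "my"
my_result=$(matches_glob 'my'              '!(my)')        # exactly "my" is rejected
star_outside=$(matches_glob 'myscript.txt' '!(my)*.txt')   # "" + "myscript.txt"
star_inside=$(matches_glob 'myscript.txt'  '!(my*).txt')   # "myscript" matches my*

printf '%s\n' "$empty_result" "$my_result" "$star_outside" "$star_inside"
```

The four lines printed should be yes, no, yes, no, which mirrors the behaviour described in the guide: moving the * inside the parentheses is what excludes names that begin with my.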

Removing an optional / (directory separator) in Bash

I have a Bash script that takes in a directory as a parameter, and after some processing will do some output based on the files in that directory.
The command would be like the following, where dir is a directory with the following structure inside
dir/foo
dir/bob
dir/haha
dir/bar
dir/sub-dir
dir/sub-dir/joe
> myscript ~/files/stuff/dir
After some processing, I'd like the output to be something like this
foo
bar
sub-dir/joe
The code I have to remove the path passed in is the following:
shopt -s extglob
for file in $files ; do
filename=${file#${1}?(/)}
This gets me to the following, but for some reason the optional / is not being taken care of. Thus, my output looks like this:
/foo
/bar
/sub-dir/joe
The reason I'm making it optional is because if the user runs the command
> myscript ~/files/stuff/dir/
I want it to still work. And, as it stands, if I run that command with the trailing slash, it outputs as desired.
So, why does my ?(/) not work? Based on everything I've read, that should be the right syntax, and I've tried a few other variations as well, all to no avail.
Thanks.
that other guy's helpful answer solves your immediate problem, but there are two things worth noting:
enumerating filenames with an unquoted string variable (for file in $files) is ill-advised, as sjsam's helpful answer points out: it will break with filenames with embedded spaces and filenames that look like globs; as stated, storing filenames in an array is the robust choice.
there is no strict need to change global shell option shopt -s extglob: parameter expansions can be nested, so the following would work without changing shell options:
# Sample values:
file='dir/sub-dir/joe'
set -- 'dir/' # set $1; value 'dir' would have the same effect.
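filename=${file#${1%/}/} # -> 'sub-dir/joe'
The inner parameter expansion, ${1%/}, removes a trailing (%) / from $1, if any; the literal / that follows it then removes the separator itself from the start of the remaining path.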
I suggested you change files to an array which is a possible workaround for non-standard filenames that may contain spaces.
files=("dir/A/B" "dir/B" "dir/C")
for filename in "${files[@]}"
do
echo "${filename##dir/}" #replace dir/ with your param.
done
Output
A/B
B
C
Here's the documentation from man bash under "Parameter Expansion":
${parameter#word}
${parameter##word}
Remove matching prefix pattern. The word is
expanded to produce a pattern just as in pathname
expansion. If the pattern matches the beginning of
the value of parameter, then the result of the
expansion is the expanded value of parameter with
the shortest matching pattern (the ``#'' case) or
the longest matching pattern (the ``##'' case)
deleted.
Since # tries to delete the shortest match, it will never include any trailing optional parts.
You can just use ## instead:
filename=${file##${1}?(/)}
Depending on what your script does and how it works, you can also just rewrite it to cd to the directory to always work with paths relative to .
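The difference between # and ## with an optional trailing pattern can be seen side by side in the sketch below. The variable names (dir, shortest, longest, with_slash) are chosen for illustration only, and "$dir" is quoted inside the expansion so that any glob characters in the directory name are taken literally.

```shell
#!/usr/bin/env bash
shopt -s extglob

file='dir/sub-dir/joe'

# Directory given without a trailing slash.
dir='dir'
shortest=${file#"$dir"?(/)}     # shortest match is "dir", the slash stays
longest=${file##"$dir"?(/)}     # longest match is "dir/", the slash is removed

# Directory given with a trailing slash.
dir='dir/'
with_slash=${file##"$dir"?(/)}  # "dir/" matches; no second slash follows

printf '%s\n' "$shortest" "$longest" "$with_slash"
```

This should print /sub-dir/joe followed by sub-dir/joe twice, so the ## form gives the same result whether or not the user typed the trailing slash.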

bash copy file where some of the filename is not known

In a bash script I want to copy a file, but the file name will change over time.
The start and end of the file name will however stay the same.
Is there a way to get the file like so:
cp start~end.jar
where ~ can be anything?
The cp command would be run in a bash script on an Ubuntu machine, if this makes any difference.
A glob (start*end) will give you all matching files.
Check out the Expansion > Pathname Expansion > Pattern Matching section of the bash manual for more specific control
* Matches any string, including the null string.
? Matches any single character.
[...] Matches any one of the enclosed characters. A pair of characters separated by a hyphen denotes a range expression; any character that sorts between those two characters, inclusive, using the current locale's collating sequence and character set, is matched. If the first character following the [ is a ! or a ^ then any character not enclosed is matched. The sorting order of characters in range expressions is determined by the current locale and the value of the LC_COLLATE shell variable, if set. A - may be matched by including it as the first or last character in the set. A ] may be matched by including it as the first character in the set.
and if you enable extglob:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
Use a glob to capture the variable text:
cp start*end.jar
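Because a glob can match zero files or several, a script may want to check the number of matches before copying. The following is a minimal sketch of that idea. The scratch directories, the file names, and the status variable are all hypothetical, and a real script would substitute its own source path and destination.

```shell
#!/usr/bin/env bash
# Scratch directories and file names below are hypothetical.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/start-1.2.3-end.jar" "$src/other.jar"

shopt -s nullglob                      # an unmatched glob expands to nothing
candidates=( "$src"/start*end.jar )    # every file matching the pattern

if [ "${#candidates[@]}" -eq 1 ]; then
  cp -- "${candidates[0]}" "$dest/"
  status=copied
else
  status="expected 1 match, found ${#candidates[@]}"
fi
echo "$status"
```

Without nullglob, an unmatched pattern is passed to cp literally, which produces a "No such file" error for the pattern text itself. With several matches, a plain cp start*end.jar target would treat the last argument as the destination, so the explicit count avoids both surprises.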
