Bash glob, how to OR over strings of non unit length? - bash

I have in a directory a bunch of files. Each file's basename ends with a two digit number and a letter, such as file_01A.txt, file_03B.txt, file_13A.txt.
In a terminal using bash (I assume; I'm working on macOS), the command
ls *01*[AB]*.txt
returns all files such as 01A and 01B. This makes sense to me.
ls *02*[AB]*.txt
returns similarly all files such as 02A and 02B.
Now I want to return all files 01A, 01B, 02A, 02B. Hence I want something like:
ls *(01 or 02)*[AB]*.txt
Attempt 1: I tried with | but that throws an error.
Attempt 2: ls *[01,02]*[AB]*.txt but that gives the 03 files too, since I assume it is interpreting the 01 and 02 as individual character matches.
Attempt 3: ls *["01","02"]*[AB]*.txt gives the same result again.

It's not hard to articulate a single wildcard which matches your requirement.
ls *0[12]*[AB]*.txt
In the general case, use multiple wildcards if you can't articulate a single one. Notice that the shell expands them in the order you write them, and if they both match some files, there will be duplicates in the expansion.
ls *01*[AB]*.txt *02*[AB]*.txt
You seem to be confused about what the metacharacters mean. * matches any string, ? matches any single character, and [abc] matches any one character which is listed between the square brackets. [!abc] matches a single character which is not a, b, or c. So [01,02] matches one character that is 0, 1, 2, or a comma, which is why the 03 files showed up as well. Bash also supports an extension called brace expansion, where foo{bar,quux} is basically an abbreviation of foobar fooquux. Your attempt could thus be rearticulated as
ls *{01,02}*[AB]*.txt
though the repeated prefix 0 is obviously redundant, and would better be left outside the braces, and then you might as well switch back to straight square brackets.
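To make the duplicate issue concrete, here is a minimal sketch run in a scratch directory. The file names, including file_0102A.txt, are invented for the demonstration. Brace expansion is purely textual and happens before wildcard matching, so a name that matches both generated patterns is listed twice.

```bash
#!/usr/bin/env bash
# Scratch directory with hypothetical file names (not taken from the question).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch file_01A.txt file_02B.txt file_0102A.txt file_03A.txt

# Brace expansion first rewrites this word into two patterns,
# *01*[AB].txt and *02*[AB].txt; each one is then globbed separately.
brace=( *{01,02}*[AB].txt )

# A single pattern is matched once, so every name appears at most once.
single=( *0[12]*[AB].txt )

echo "brace:  ${#brace[@]} names"   # file_0102A.txt is counted twice
echo "single: ${#single[@]} names"

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$tmp"
```

If duplicates matter, the single bracket pattern is the safer choice here.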
There is also a separate extended globbing syntax which allows for more elaborate wildcards. See the reference manual for details.
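As a rough illustration of that extended syntax, the sketch below uses @(01|02), which matches exactly one of the listed alternatives and is probably the closest thing to the "01 or 02" the question asks for. It assumes bash with the extglob option switched on before the pattern is parsed; the scratch directory and file names are made up to mirror the question.

```bash
#!/usr/bin/env bash
# extglob must be enabled on an earlier line than the pattern that uses it.
shopt -s extglob

# Scratch directory with file names modelled on the question.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch file_01A.txt file_01B.txt file_02A.txt file_03B.txt file_13A.txt

# @(01|02) matches exactly one of the alternatives 01 or 02.
matches=( *@(01|02)[AB].txt )
printf '%s\n' "${matches[@]}"
# file_01A.txt
# file_01B.txt
# file_02A.txt

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$tmp"
```

Unlike brace expansion, this is a single pattern, so a file can never be listed twice.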

Related

Folder listing with gsutil with condition

I have this: gsutil ls -d gs://mystorage/*123*,
which lists everything whose name contains "123".
I wonder if I could do this with a condition like >123 and <127, to grab all files whose names contain 124, 125, or 126.
Other than *, gsutil supports special wildcard names.
You can use these special wildcards to match the names of your files, but keep in mind that you are working with strings and characters rather than numbers, so the solution is not very straightforward. Here is a guide using regexp that better explains how to work with digits in a general way.
For your specific question, you would end up with something like:
gsutil ls -d gs://mystorage/*12[456]*
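The sketch below does not call gsutil at all. It uses bash's own pattern matching on a few hypothetical object names, on the assumption that the [456] character set in the gsutil pattern behaves like the shell's, to show which names the pattern would select.

```bash
#!/usr/bin/env bash
# Hypothetical object names; nothing here talks to Cloud Storage.
names=(report123.csv report124.csv report125.csv report126.csv report127.csv)

selected=()
for n in "${names[@]}"; do
  # 12 followed by one character from the set 4, 5, 6, anywhere in the name.
  if [[ $n == *12[456]* ]]; then
    selected+=("$n")
  fi
done

printf '%s\n' "${selected[@]}"
# report124.csv
# report125.csv
# report126.csv
```

The comparison is per character, not numeric, so a range such as 95 to 105 would need several patterns.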

Weird issue when running grep with the --include option

Here is the code at the bash shell. How is the file mask supposed to be specified, if not this way? I expected both commands to find the search expression, but it's not happening. In this example, I know in advance that I prefer to restrict the search to python source code files only, because unqualified searches are silly time wasters.
So, this works as expected:
grep -rni '/home/ga/projects' -e 'def Pr(x,u,v)'
/home/ga/projects/anom/anom.py:27:def Pr(x,u,v): blah, blah, ...
but this won't work:
grep --include=\*.{py} -rni '/home/ga/projects' -e 'def Pr(x,u,v)'
I'm using GNU grep version 2.16.
--include=\*.{py} looks like a broken attempt to use brace expansion (an unquoted {...} expression).
However, for brace expansion to occur in bash (and ksh and zsh), you must either have:
a list of at least 2 items, separated with ,; e.g. {py,txt}, which expands to 2 arguments, py and txt.
or, a range of items formed from two end points, separated with ..; e.g., {1..3}, which expands to 3 arguments, 1, 2, and 3.
Thus, with a single item, simply do not use brace expansion:
--include=\*.py
If you did have multiple extensions to consider, e.g., *.py as well as *.pyc files, here's a robust form that illustrates the underlying shell features:
'--include=*.'{py,pyc}
Here:
Brace expansion is applied, because {...} contains a 2-item list.
Since the {...} directly follows the literal (single-quoted) string --include=*., the results of the brace expansion include the literal part.
Therefore, 2 arguments are ultimately passed to grep, with the following literal content:
--include=*.py
--include=*.pyc
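One way to check this without involving grep is to capture the words in a bash array and print them one per line. This is only a sketch; the array names are arbitrary.

```bash
#!/usr/bin/env bash
# A single item in braces is not expanded: the braces stay in the argument.
args_single=( --include=\*.{py} )

# A two-item list is expanded, and the quoted prefix is repeated for each item.
args_multi=( '--include=*.'{py,pyc} )

printf 'single: %s\n' "${args_single[@]}"
# single: --include=*.{py}

printf 'multi:  %s\n' "${args_multi[@]}"
# multi:  --include=*.py
# multi:  --include=*.pyc
```

The first array holds one word with the braces intact, which is what grep received in the failing command.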
Your command fails because of the braces '{}'. With a single item the shell leaves them in place, so grep looks for file names that literally end in .{py}. You can create a file such as 'myscript.{py}' to convince yourself: it will appear in the results.
The correct option parameter would be '*.py' or the equivalent \*.py. Either way will protect it from being (mis)interpreted by the shell.
On the other hand, I would recommend using the find command for such jobs:
find /home/ga/projects -regex '.*\.py$' -exec grep -e "def Pr(x,u,v)" {} +
That will protect you from hard-to-understand shell behaviour.
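A self-contained variant of the find approach, run against a scratch tree, might look like the sketch below. The directory layout and file contents are invented. It swaps -regex for the simpler -name test, adds -F so the search string is taken literally, and passes /dev/null as an extra file so grep prints the file name even when find hands it a single match; those are choices of this sketch rather than part of the answer above.

```bash
#!/usr/bin/env bash
# Scratch tree with one Python file and one text file (hypothetical content).
tmp=$(mktemp -d)
mkdir -p "$tmp/anom"
printf 'def Pr(x,u,v):\n    pass\n' > "$tmp/anom/anom.py"
printf 'def Pr(x,u,v) mentioned in notes\n' > "$tmp/notes.txt"

# Only *.py files are handed to grep; -n prints line numbers, -i ignores case.
hits=$(find "$tmp" -type f -name '*.py' \
         -exec grep -niF -e 'def Pr(x,u,v)' /dev/null {} +)
printf '%s\n' "$hits"

# Clean up the scratch tree.
rm -rf "$tmp"
```

Only the line from anom.py is reported, even though notes.txt contains the same text.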
Try like this (using quotes to be safe; also better readability than backslash escaping IMHO):
grep --include='*.py' ...
Your \*.{py} brace expansion usage isn't supported at all by grep. Please see the comments below for the full investigation regarding this. For the record, blame this answer for the resulting brace wars ;)
By the way, brace expansion generally works fine in Bash. See mklement0's answer for more details.
Ack. As an alternative, you might consider switching to ack instead from now on. It's a tool just like grep, but fully optimized for programmers.
It's a great fit for what you are doing. A nice quote about it:
Every once in a while something comes along that improves an idea so much, you can't ignore it. Such a thing is ack, the grep replacement.

How can I get a long listing of text files containing "foo" followed by two digits?

Using metacharacters, I need to perform a long listing of all files whose name contains the string foo followed by two digits, then followed by .txt. foo**.txt will not work, obviously. I can't figure out how to do it.
Use Valid Shell Globbing with Character Class
To find your substring anywhere in a filename like bar-foo12-baz.txt, you need a wildcard before and after the match. You can also use a character class in your pattern to match a limited range of characters. For example, in Bash:
# Explicit character classes.
ls -l *foo[0-9][0-9]*.txt
# POSIX character classes.
ls -l *foo[[:digit:]][[:digit:]]*.txt
See Also
Filename Expansion
Pattern Matching
Something like ls foo[0-9][0-9]*.txt, or whatever exactly fits your pattern.
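The patterns above differ in how strict they are. The sketch below, using invented file names in a scratch directory, compares a strict pattern (foo, exactly two digits, then .txt) with the looser one that allows extra text around the match.

```bash
#!/usr/bin/env bash
# Scratch directory with hypothetical file names.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch foo12.txt foo123.txt bar-foo12-baz.txt foo1.txt fooab.txt

# Strict: the name is exactly foo, two digits, .txt.
strict=( foo[0-9][0-9].txt )

# Loose: foo plus two digits anywhere in the name, ending in .txt.
loose=( *foo[0-9][0-9]*.txt )

printf 'strict: %s\n' "${strict[@]}"
# strict: foo12.txt
printf 'loose:  %s\n' "${loose[@]}"   # three names: all but foo1.txt and fooab.txt

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$tmp"
```

For the long listing asked about, the chosen pattern would be passed to ls -l instead of being stored in an array.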

Linux shell file listing: what's the difference between tmp/**/* and tmp/*

I ran into a problem with file listing in the shell.
What's the difference between tmp/**/* and tmp/*?
I ran an experiment on my system,
with this directory dir2:
dir2
-->dir1
-->xx2
-->ff.txt
and I run ls dir2/*:
dir2/ff.txt
dir2/dir1:
xx2
then I run ls dir2/**/*:
dir2/dir1/xx2
So does this mean that ** ignores a directory level (for example, skips dir1)?
Can someone help me?
I think there's a formatting issue in the question text, but I'll answer based on the question title and examples.
There shouldn't be any difference between a single and double asterisk at any single level of the path, unless bash's globstar option is enabled (it is off by default), in which case ** matches across directory levels. Either expression matches any name, except for hidden ones which start with a dot (this can be changed by shell options). So:
tmp/**/* (equivalent to tmp/*/*) is expanded to all names which are nested two levels deep in tmp. The first asterisk expands only to directories and not files at the first level because it's followed by a slash.
tmp/* expands to anything nested one level deep inside tmp.
On top of this, ls lists the contents of a directory when a directory is given on its command line. This can be overridden by adding the -d option to ls.
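The sketch below rebuilds the dir2 tree from the question in a scratch directory and captures each expansion in an array, so that ls's habit of listing directory contents does not obscure what the shell produced. The final step assumes bash 4 or later, where the globstar option exists; it is skipped on older versions.

```bash
#!/usr/bin/env bash
# Recreate the tree from the question in a scratch directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p dir2/dir1
touch dir2/ff.txt dir2/dir1/xx2

# globstar is off by default; the option does not exist before bash 4.
shopt -u globstar 2>/dev/null || true

one_level=( dir2/* )        # dir2/dir1 dir2/ff.txt
two_levels=( dir2/*/* )     # dir2/dir1/xx2
double_star=( dir2/**/* )   # same as dir2/*/* while globstar is off

printf 'dir2/*    -> %s\n' "${one_level[*]}"
printf 'dir2/*/*  -> %s\n' "${two_levels[*]}"
printf 'dir2/**/* -> %s\n' "${double_star[*]}"

# With globstar enabled, ** also descends through any number of directories.
if shopt -s globstar 2>/dev/null; then
  recursive=( dir2/**/* )
  printf 'globstar  -> %s\n' "${recursive[*]}"
  shopt -u globstar
fi

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$tmp"
```

Running ls -d on the same patterns prints these names as they are, without descending into dir1.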

11*(...) as a bash parameter without quotation marks

I'm trying to write a small piece of code that passes a small formula to another program. However, I've found that something strange happens when the formula starts with 11*(:
$ echo 11*15
Neatly prints '11*15'
$ echo 21*(15)
Neatly prints '21*(15)', while
echo 11*(15)
Only gives '11'. As far as I've found this only happens with '11*('. I know that this can be solved by using proper quotation marks, but I'm still curious as to why this happens.
Does anyone know?
How is your program coded? If it's coded to take in parameters, then pass your formula like
./myprogram "11*15"
or
echo '11*15' | myprogram
If you run echo unquoted like that on the command line, the shell treats the formula as a wildcard and may replace it with matching file names (for 11*15, names that start with 11 and end with 15).
11*(15) uses a Bash-specific extended glob syntax. You've stumbled across it accidentally, emphasizing why quotation marks are a good idea. (I also learned a lot tracking down why it was working differently for me; thanks for that.)
The behavior of
echo 11*(15)
in bash is going to vary depending on whether extglob is enabled. If it's enabled, *(PATTERN-LIST) matches zero or more occurrences of the patterns. If it's disabled, it doesn't, and the resulting ( is likely to cause a syntax error.
For example:
$ ls
11 115 1155 11555 115555
$ shopt -u extglob
$ echo 11*(55)
bash: syntax error near unexpected token `('
$ shopt -s extglob
$ echo 11*(55)
11 1155 115555
$
(This explains the odd behavior I discussed in comments.)
Quoting from the bash 4.2.8 documentation (info bash):
If the `extglob' shell option is enabled using the `shopt' builtin,
several extended pattern matching operators are recognized. In the
following description, a PATTERN-LIST is a list of one or more
patterns separated by a `|'. Composite patterns may be formed using
one or more of the following sub-patterns:
`?(PATTERN-LIST)'
Matches zero or one occurrence of the given patterns.
`*(PATTERN-LIST)'
Matches zero or more occurrences of the given patterns.
`+(PATTERN-LIST)'
Matches one or more occurrences of the given patterns.
`@(PATTERN-LIST)'
Matches one of the given patterns.
`!(PATTERN-LIST)'
Matches anything except one of the given patterns.
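To see the operators side by side, the sketch below reuses the five file names from the example above in a scratch directory and stores each expansion in an array. It assumes bash with extglob enabled before the patterns are parsed; the array names are arbitrary.

```bash
#!/usr/bin/env bash
# extglob must be switched on before the lines that use the operators.
shopt -s extglob

# Scratch directory with the file names from the example above.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch 11 115 1155 11555 115555

zero_or_more=( 11*(55) )    # 11 1155 115555
one_or_more=( 11+(55) )     # 1155 115555
zero_or_one=( 11?(55) )     # 11 1155
exactly_one=( 11@(5|55) )   # 115 1155
anything_but=( 11!(55) )    # 11 115 11555 115555

# Quoting switches pattern matching off entirely: this is just a string.
quoted='11*(15)'

printf '%s\n' "${zero_or_more[*]}" "${one_or_more[*]}" "${zero_or_one[*]}" \
              "${exactly_one[*]}" "${anything_but[*]}" "$quoted"

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$tmp"
```

The last line is why quoting the formula is the reliable fix: the quoted text reaches the program unchanged regardless of shell options or directory contents.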
