bash filename globbing - operate on files starting with capital

Let's say I have a folder with the following JPEG files:
adfjhu.jpg Afgjo.jpg
Bdfji.jpg bkdfjhru.jpg
Cdfgj.jpg cfgir.jpg
Ddfgjr.jpg dfgjrr.jpg
How do I remove or list the files that start with a capital letter?
This can be solved with a combination of find, grep and xargs.
But is it possible with normal file globbing/pattern matching in bash?
The command below doesn't work, as far as I can tell because LANG is set to en_US
and because of that locale's collation order.
$ ls [A-Z]*.jpg
Afgjo.jpg Bdfji.jpg bkdfjhru.jpg Cdfgj.jpg cfgir.jpg Ddfgjr.jpg dfgjrr.jpg
This sort of works
$ ls +(A|B|C|D)*.jpg
Afgjo.jpg Bdfji.jpg Cdfgj.jpg Ddfgjr.jpg
But I don't wanna do this for all characters A-Z for a general solution!
So is this possible?
cheers
//Fredrik

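You should set your locale to the C (or POSIX) locale. Note that bash expands the glob before a LC_ALL=C prefix on the same command line takes effect, so set the variable in the shell itself first (here inside a subshell, so the change does not persist):
$ (LC_ALL=C; ls [A-Z]*.jpg)
or use a character class, which does not depend on the collation order:
$ ls [[:upper:]]*.jpg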
read here for more information: http://www.opengroup.org/onlinepubs/007908799/xbd/locale.html

Use a bracket expression with a character class:
ls -l [[:upper:]]*
See man 7 regex for a list of character classes and other information.
From that page:
Within a bracket expression, the name of a character class enclosed in '[:' and ':]' stands for the list of all characters belonging to that class. Standard character class names are:
alnum digit punct
alpha graph space
blank lower upper
cntrl print xdigit
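As a minimal sketch of the character-class approach, the glob can also drive a loop instead of ls, which avoids parsing ls output. The helper name list_capitalized is hypothetical, not something from the answers above:

```bash
# list_capitalized DIR: print the names in DIR that start with an uppercase letter.
# (list_capitalized is a hypothetical helper name used only for illustration.)
list_capitalized() {
    local dir=$1 f
    for f in "$dir"/[[:upper:]]*; do
        [ -e "$f" ] || continue   # the pattern stays literal when nothing matches
        printf '%s\n' "${f##*/}"  # strip the directory part
    done
}
```

Calling list_capitalized . lists the current directory; replacing the printf line with rm -- "$f" would remove the files instead, so it is worth running the listing form first.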

Use grep:
ls | grep -e '^[A-Z]'
If you want to do more with each file, use a for loop (this splits names on whitespace, so it only suits simple file names):
for i in $(ls | grep -e '^[A-Z]'); do echo "$i"; done

Related

Using regex in grep filename

I want to search a certain string in a number of archival log folders which reflect different servers. I use 2 different commands as of now
-bash-4.1$ zcat /mnt/bkp/logs/cmmt-54-22[8-9]/my_app.2021-12-28-* | grep 'abc'
and
-bash-4.1$ zcat /mnt/bkp/logs/cmmt-54-23[0-3]/my_app.2021-12-28-* | grep 'abc'
I basically want to search on server folders cmmt-54-228, cmmt-54-229 .... cmmt-54-233.
I tried combining the two commands into one, but it doesn't seem to work. I am probably making some mistake in how I use the regex:
-bash-4.1$ zcat /mnt/bkp/logs/cmmt-54-22[8-9]|3[0-3]/my_app.2021-12-28-* | grep 'abc'
Please help.
Regex is not glob. See man 7 glob vs man 7 regex.
grep works with regexes: it filters lines that match a regular expression.
The shell expands the words that you write. A word that contains one of the "filename expansion triggers" * ? [ is replaced with the list of matching filenames.
You can use extended pattern matching (see man bash), which sounds like the most natural here:
shopt -s extglob
echo /mnt/bkp/logs/cmmt-54-2@(2[8-9]|3[0-3])/my_app.2021-12-28-*
In interactive shell I would just write it twice:
zcat /mnt/bkp/logs/cmmt-54-22[8-9]/my_app.2021-12-28-* /mnt/bkp/logs/cmmt-54-23[0-3]/my_app.2021-12-28-*
Or with brace expansion (see man bash):
zcat /mnt/bkp/logs/cmmt-54-2{2[8-9],3[0-3]}/my_app.2021-12-28-*
Brace expansion first replaces the word with two words, then filename expansion replaces each of them with the actual filenames.
You can also find files with a -regex. For that, see man find. (Or output a list of filenames and pipe it to grep and then use xargs or similar to pass it to a command)
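The two-stage expansion described above can be tried safely on a throwaway tree. This is only a sketch: the temporary directory and the file name my_app.2021-12-28-01 are stand-ins for the real /mnt/bkp/logs layout, and it needs bash because it uses brace expansion and arrays:

```bash
#!/usr/bin/env bash
# Hypothetical demo tree standing in for /mnt/bkp/logs.
demo=$(mktemp -d)
for n in 227 228 229 230 233 234; do
    mkdir "$demo/cmmt-54-$n"
    : > "$demo/cmmt-54-$n/my_app.2021-12-28-01"
done

# Brace expansion first yields two words, ...cmmt-54-22[8-9]/... and
# ...cmmt-54-23[0-3]/...; filename expansion then turns each word into
# the paths that actually exist.
matches=( "$demo"/cmmt-54-2{2[8-9],3[0-3]}/my_app.2021-12-28-* )

# Print the matches relative to the common prefix.
printf '%s\n' "${matches[@]##*/cmmt-54-}"
```

Directories 227 and 234 are created on purpose to show that they fall outside both bracket ranges.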

Extracting all but a certain sequence of characters in Bash

In bash I need to extract a certain sequence of letters and numbers from a filename. In the example below I need to extract just the S??E?? section of the filenames. This must work with both upper/lowercase.
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
Expected output would be:
s01e02
s03e12
S05E11
I've been trying to do this with SED but can't get it to work. This is what I have tried, without success:
sed 's/.*s[0-9][0-9]e[0-9][0-9].*//'
Many thanks for any help.
With sed we can match the desired string in a capture group, and use the I suffix for case-insensitive matching, to accomplish the desired result.
For the sake of this answer I'm assuming the filenames are in a file:
$ cat fnames
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
One sed solution:
$ sed -E 's/.*\.(s[0-9][0-9]e[0-9][0-9])\..*/\1/I' fnames
s01e02
s03e12
S05E11
Where:
-E - enable extended regex support
\.(s[0-9][0-9]e[0-9][0-9])\. - match s??e?? with a pair of literal periods as bookends; the s??e?? (wrapped in parens) will be stored in capture group #1
\1 - print out capture group #1
/I - use case-insensitive matching
I think your pattern is OK. With grep -o you get only the matched part of a line instead of the whole matching line. So
grep -ioE 'S[0-9]{2}E[0-9]{2}'
solves your problem (-E is needed for the {2} repetition syntax). Unlike the S??E?? wildcard form, this matches only digits in those positions. Maybe you can put it in an if, so lines without a match show that something is wrong with the filename.
Suppose you have those file names:
$ ls -1
great.s03e12.h264.Dolby.mkv
my.show.s01e02.h264.aac.subs.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
You can extract the substring this way:
$ printf "%s\n" * | sed -E 's/^.*([sS][0-9][0-9][eE][0-9][0-9]).*/\1/'
Or with grep:
$ printf "%s\n" *.m* | grep -o '[sS][0-9][0-9][eE][0-9][0-9]'
Either prints:
s03e12
s01e02
S05E11
You could use that same sed or grep on a file (with filenames in it) as well.
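For use inside a script, the grep variant can be wrapped in a small function that takes one filename and prints the first episode tag it finds. This is a sketch only, and extract_episode is a hypothetical name:

```bash
# extract_episode NAME: print the first SxxEyy tag in NAME, keeping its case.
# (extract_episode is a hypothetical helper name.)
extract_episode() {
    # -o: print only the match, -i: ignore case, -E: extended regex (for {2})
    printf '%s\n' "$1" | grep -oiE 's[0-9]{2}e[0-9]{2}' | head -n 1
}
```

For example, extract_episode my.show.s01e02.h264.aac.subs.mkv prints s01e02, and a name without a tag prints nothing, which a caller can test for.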

With a shell script, how to find number lines in text

I'm trying to find lines which match the pattern x.y or x.y.z, where x, y and z are numbers.
For example, given the lines:
1.0/
2.2.5rc1/
2.3.0/
2.3.1/
abc-1.0.0/
the result should be:
1.0
2.3.0
2.3.1
How can I do this?
Things to know:
Call grep in extended mode using -E.
Start the pattern with a ^ to signify you want the match to start at the first character.
To search for a digit, use [0-9] (see the note below about \d).
To search for a literal dot, use \. (an unescaped . matches any character).
To search for thing1 OR thing2, use thing1|thing2.
Note: As Jonathan Leffler pointed out below, \d is a notation that might not work across all versions of grep. Use [0-9] or [[:digit:]] to be compliant with POSIX-standard implementations of grep.
Knowing that, we put it together like so:
grep -E '^([0-9]+\.[0-9]+|[0-9]+\.[0-9]+\.[0-9]+)/' yourfile
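grep prints the whole matching line, trailing slash included, whereas the expected output in the question has the slash removed. One way to get that, sketched here with sed rather than grep (the function name filter_versions is hypothetical), is to capture the version and print only the capture:

```bash
# filter_versions: read lines on stdin and print x.y or x.y.z for every line
# that consists of exactly that version followed by a slash.
# (filter_versions is a hypothetical helper name.)
filter_versions() {
    # -n suppresses default output; the p flag prints only substituted lines
    sed -n -E 's|^([0-9]+\.[0-9]+(\.[0-9]+)?)/$|\1|p'
}
```

Used as filter_versions < yourfile, this skips 2.2.5rc1/ and abc-1.0.0/ because the pattern is anchored at both ends.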
You can do
grep -C 2 yourSearch yourFile
To send it in a file, do
grep -C 2 yourSearch yourFile > result.txt
Hope it helps!

Remove everything but one file tcsh

I basically want to do the following bash command but in tcsh:
rm !(file1)
Thanks
You can use ls -1 (that's the number one, not the lowercase letter L) to list one file per line, and then use grep -vx <pattern> to exclude (-v) lines that exactly (-x) match <pattern>, and then xargs it to your command, rm. For example,
ls -1 | grep -vx file1 | xargs rm
In case your version of grep doesn't support the -x option, you can drop it and use anchors instead:
ls -1 | grep -v '^file1$' | xargs rm
To use this with commands other than rm that may not take an arbitrary number of arguments, remember to add the -n 1 option to xargs so that arguments are handled one by one:
ls -1 | grep -vx '^file1$' | xargs -n 1 rm
I believe you can also achieve this using find's -name option to specify a parameter by negation, i.e. the find utility itself may support expressions like !(file1), though you'll still have to pipe the results to xargs.
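A rough sketch of that find idea, written as sh/bash functions for illustration (the names list_all_but and remove_all_but are hypothetical). It assumes a find that supports -maxdepth, as the GNU and BSD versions do, and uses -exec in place of the xargs pipe:

```bash
# list_all_but DIR NAME: print regular files directly inside DIR except NAME.
# NAME is treated by find as a glob pattern, so quote special characters.
list_all_but() {
    find "$1" -maxdepth 1 -type f ! -name "$2" -print
}

# remove_all_but DIR NAME: delete the same set of files.
remove_all_but() {
    find "$1" -maxdepth 1 -type f ! -name "$2" -exec rm -- {} +
}
```

Running list_all_but . file1 first shows exactly what remove_all_but . file1 would delete. Because find passes the names to rm directly, file names containing spaces are handled correctly.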
tcsh has a special ^ syntax for glob patterns (not supported in csh, sh, or bash). Prefixing a glob pattern with ^ negates it, causing it to match all file names that don't match the pattern.
Quoting the tcsh manual:
An entire glob-pattern can also be negated with `^':
> echo *
bang crash crunch ouch
> echo ^cr*
bang ouch
A single file name is not a glob pattern, and so the ^ prefix doesn't apply to it, but it can be turned into one by, for example, surrounding the first character with square brackets.
So this:
rm ^[f]ile1
should remove all files in the current directory other than file1.
I strongly recommend testing this before using it, either by using an echo command first:
echo ^[f]ile1
or by using Ctrl-X * to expand the pattern to a list of files before hitting Enter.
UPDATE: I've since learned that bash supports similar functionality but with a different syntax. In bash, !(PATTERN) matches anything not matched by the pattern. This is not recognized unless the extglob shell option is enabled. Unlike tcsh's ^ syntax, the pattern can be a single file name. This isn't relevant to what you're asking, but it could be useful if you ever decide to switch to bash.
zsh probably has something similar.

Grep characters before and after match?

Using this:
grep -A1 -B1 "test_pattern" file
will produce one line before and after the matched pattern in the file. Is there a way to display not lines but a specified number of characters?
The lines in my file are pretty big so I am not interested in printing the entire line but rather only observe the match in context. Any suggestions on how to do this?
3 characters before and 4 characters after
$> echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}'
23_string_and
grep -E -o ".{0,5}test_pattern.{0,5}" test.txt
This will match up to 5 characters before and after your pattern. The -o switch tells grep to only show the match and -E to use an extended regular expression. Make sure to put the quotes around your expression, else it might be interpreted by the shell.
You could use
awk '/test_pattern/ {
match($0, /test_pattern/); print substr($0, RSTART - 10, RLENGTH + 20);
}' file
You mean, like this:
grep -o '.\{0,20\}test_pattern.\{0,20\}' file
?
That will print up to twenty characters on either side of test_pattern. The \{0,20\} notation is like *, but specifies zero to twenty repetitions instead of zero or more. The -o says to show only the match itself, rather than the entire line.
I'll never easily remember these cryptic command modifiers so I took the top answer and turned it into a function in my ~/.bashrc file:
cgrep() {
# For files whose lines are tens of thousands of characters long.
# Use cgrep to print 30 characters before and after the search pattern.
if [ $# -eq 2 ] ; then
# Format was 'cgrep "search string" /path/to/filename'
grep -o -P ".{0,30}$1.{0,30}" "$2"
else
# Format was 'cat /path/to/filename | cgrep "search string"'
grep -o -P ".{0,30}$1.{0,30}"
fi
} # cgrep()
Here's what it looks like in action:
$ ll /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
-rw-r--r-- 1 rick rick 25780 Jul 3 19:05 /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
$ cat /tmp/rick/scp.Mf7UdS/Mf7UdS.Source | cgrep "Link to iconic"
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
$ cgrep "Link to iconic" /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
The file in question is one continuous 25K line and it is hopeless to find what you are looking for using regular grep.
Notice the two different ways you can call cgrep, which parallel the ways grep itself can be called.
There is a "niftier" way of creating the function where "$2" is only passed when set which would save 4 lines of code. I don't have it handy though. Something like ${parm2} $parm2. If I find it I'll revise the function and this answer.
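One possible shape for such a shorter function, offered only as a sketch, is to shift the pattern off and pass whatever arguments remain with "$@". The name cgrep2 is hypothetical, and -E is used here instead of -P purely so the sketch also runs where grep lacks PCRE support:

```bash
# cgrep2 PATTERN [FILE...]: show up to 30 characters either side of each match.
# (cgrep2 is a hypothetical name; -E replaces -P for portability.)
cgrep2() {
    local pattern=$1
    shift
    # "$@" expands to the remaining file arguments, or to nothing at all,
    # in which case grep reads standard input.
    grep -o -E ".{0,30}${pattern}.{0,30}" "$@"
}
```

Both call styles then work without an if: cgrep2 "search string" /path/to/filename and cat /path/to/filename | cgrep2 "search string".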
With gawk, you can use the match function:
x="hey there how are you"
echo "$x" |awk --re-interval '{match($0,/(.{4})how(.{4})/,a);print a[1],a[2]}'
ere are
If you are OK with perl, there is a more flexible solution. The following will print three characters before the pattern, followed by the actual pattern, and then five characters after the pattern.
echo hey there how are you |perl -lne 'print "$1$2$3" if /(.{3})(there)(.{5})/'
ey there how
This can also be applied to words instead of just characters. The following will print one word before the actual matching string.
echo hey there how are you |perl -lne 'print $1 if /(\w+) there/'
hey
The following will print one word after the pattern:
echo hey there how are you |perl -lne 'print $2 if /(\w+) there (\w+)/'
how
The following will print one word before the pattern, then the actual word, and then one word after the pattern:
echo hey there how are you |perl -lne 'print "$1$2$3" if /(\w+)( there )(\w+)/'
hey there how
If using ripgrep, this is how you would do it:
rg -o ".{0,5}test_pattern.{0,5}" test.txt
You can use one grep with a regexp to find the match, plus a second grep to highlight it:
echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}' | grep string
23_string_and
With ugrep you can specify -ABC context with option -o (--only-matching) to show the match with extra characters of context before and/or after the match, fitting the match plus the context within the specified -ABC width. For example:
ugrep -o -C30 pattern testfile.txt
gives:
1: ... long line with an example pattern to match. The line could...
2: ...nother example line with a pattern.
On a terminal, the same output is shown with color highlighting.
Multiple matches on a line are either shown with [+nnn more], or with option -k (--column-number) to show each individually with context and the column number.
The context width is the number of Unicode characters displayed (UTF-8/16/32), not just ASCII.
I personally do something similar to the posted answers. I often don't need a lot of context (if I needed more I might use whole lines, as with grep -C, but like you I often don't want the lines before and after), so I find it much quicker when entering the command to type one dot per character of context: tap the dot key a few times for a few characters, or hold it down for more.
e.g. echo zzzabczzzz | grep -o '.abc..'
Will have the abc pattern with one dot before and two after. ( in regex language, Dot matches any character). Others used dot too but with curly braces to specify repetition.
If I wanted to be strict about a range (between 0 and x characters) or about exactly y characters, then I'd use the curly braces and -P, as others have done.
There is a setting re whether dot matches new line but you can look into that if it's a concern/interest.
