bash list files of a particular naming convention - bash

Operating System - Linux (Ubuntu 20.04)
I have a directory with thousands of files in it. The file names range anything from a.daily.csv to a.b.daily.csv to a.b.c.daily.csv to a.b.c.d.daily.csv to a.b.c.d.e.daily.csv
The challenge I'm having is in listing just a.daily.csv or a.b.daily.csv and so on. That is to say with "daily.csv" as the fixed part, I would like to be able to wildcard what is in front of it with "." being the delimiter between the fields
I tried a few wildcards such as ?, [a-zA-Z0-9] and so on, but was unable to achieve this. Could I please get some guidance?
Please note a,b,c etc are placeholders I'm using to post the question. In real world, a,b,c are alphanumeric words
Example -
PAHKY.daily.csv
TYUI.GHJ.WE.daily.csv
WGGH.FGH.daily.csv
98KJL-GHR.YUI.daily.csv
67HJE.HJQ.ATD.HJ.daily.csv
If I want to list all those files that are like PAHKY.daily.csv, where there is only one field (dot being the delimiter) in front of daily.csv, how could I do this?

If you enable the extglob option:
$ shopt -s extglob
you can use extended pattern matching operators like *(pattern) for zero or more of pattern. Knowing that [^.] matches any character but a dot, this leads to:
$ ls *([^.]).daily.csv
PAHKY.daily.csv
to obtain all a.daily.csv files. For the next group:
$ ls *([^.]).*([^.]).daily.csv
98KJL-GHR.YUI.daily.csv WGGH.FGH.daily.csv
and so on. Replace *(pattern) by +(pattern) if you want to match one or more of pattern instead of zero or more.
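The same idea can be generalized to any number of fields. The following is only a sketch: list_by_fields is a hypothetical helper name, and it assumes bash with extglob available. It uses +(pattern) so that empty fields are not accepted.

```shell
#!/usr/bin/env bash
# Sketch only: list_by_fields is a hypothetical helper, not part of the answer.
shopt -s extglob   # enables +(pattern); must be on when the glob is expanded

# list_by_fields N [DIR]
# Print the names in DIR that have exactly N dot-separated fields
# in front of "daily.csv".
list_by_fields() {
    local n=$1 dir=${2:-.} pattern='' i
    for ((i = 0; i < n; i++)); do
        pattern+='+([^.]).'          # one non-empty field plus its dot
    done
    pattern+='daily.csv'
    (
        cd "$dir" || exit 1
        shopt -s nullglob            # no match expands to nothing
        local files=( $pattern )     # unquoted on purpose: glob expansion
        if (( ${#files[@]} )); then
            printf '%s\n' "${files[@]}"
        fi
    )
}
```

With the sample files from the question, list_by_fields 1 prints PAHKY.daily.csv and list_by_fields 3 prints TYUI.GHJ.WE.daily.csv.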

You can use grep with ls, since grep works well with regular expressions.
Try something like this:
^a\.b\.c\.daily\.csv$
ls | grep 'Your Expression'
In fact, you can even use find without piping to grep.
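As a rough sketch of both suggestions (the function names fields_filter and one_field_find are made up for illustration, and the find variant assumes a find that supports -maxdepth, as GNU find on Ubuntu does):

```shell
# fields_filter N: read names on stdin, keep those with exactly N
# dot-separated fields in front of "daily.csv". Hypothetical helper.
fields_filter() {
    # ([^.]+\.){N} = N repetitions of "non-dot characters followed by a dot"
    grep -E "^([^.]+\\.){$1}daily\\.csv\$"
}

# one_field_find DIR: the same for exactly one field, using find only.
# Keep *.daily.csv but drop anything with a second dot-separated field.
one_field_find() {
    find "$1" -maxdepth 1 -type f -name '*.daily.csv' \
        ! -name '*.*.daily.csv' -exec basename {} \;
}
```

Usage would be something like ls | fields_filter 2 for the a.b.daily.csv group, or one_field_find . for the a.daily.csv group.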

This should work:
ls | grep -Po '([A-Za-z0-9\-\.]?)+\.daily\.csv'
Explanation:
-P, --perl-regexp
-o, --only-matching
[A-Za-z0-9\-\.] --match the group of characters : (A-Z,a-z,0-9,-,.)
() -- to capture a group
? -- matches zero or one of the previous RE.
+ -- matches one or more of the previous RE
Output:
67HJE.HJQ.ATD.HJ.daily.csv
98KJL-GHR.YUI.daily.csv
PAHKY.daily.csv
TYUI.GHJ.WE.daily.csv
WGGH.FGH.daily.csv

Related

Why does the second grep command not work?

I have a folder named "components" and within that folder a file name "apple"
If I cd to "components" folder and execute the following command:
ls | grep -G a*e
It works and returns apple correctly.
However, if I do not cd to components folder and execute the following command:
ls components | grep -G a*e
It does not work and returns blank. What could be the reason?
A third grep command below works fine.
ls components | grep ap
The actual filename I am grepping is complex. So I need the grep -G tag to work.
Unquoted, a*e is a shell glob pattern that is expanded by the shell before grep runs.
When you are in the directory, this:
ls | grep -G a*e
becomes
ls | grep -G apple
As you have a file named 'apple' this matches.
When you are not in the folder, and you run:
ls components | grep -G a*e
the shell again attempts to expand the glob pattern.
If there is any file in your current directory that matches (for example, "abalone"), then the glob will expand to that. It may expand to multiple strings if there is more than one such filename (for example, "abalone", "algae"). The command becomes something like:
ls components | grep -G abalone
ls components | grep -G abalone algae
In the first case, you will get blank output unless components directory also contains that filename.
In the second case, grep will ignore the directory entirely and attempt to find the string "abalone" inside the file "algae".
There is a third possibility: the glob fails to find anything. In this case, grep will receive the regexp a*e. The -G option to grep tells it to use BRE-style regexps. With these, a*e means "zero or more a followed by e". This is equivalent to saying "contains e".
In that case, you should see apple in your results regardless of whether you are in components or not. In a comment, you say that ls components | grep "a*e" returned nothing. As quoting should force precisely the same result as this third case, this is surprising.
Note that if you are intending to use globs you don't need grep at all:
cd components
ls a*e
ls components/a*e
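To see what the shell actually hands to a command in each case, a small experiment like the following can help. It is only a sketch: show_args is a hypothetical helper, the file names are the ones used as examples in this answer, and default shell options are assumed (no nullglob or failglob).

```shell
# show_args: print each argument exactly as the command receives it.
show_args() { printf '<%s>\n' "$@"; }

demo=$(mktemp -d)
mkdir "$demo/components"
touch "$demo/components/apple" "$demo/abalone" "$demo/algae"

# Inside components: the glob expands to the one matching name.
in_components=$(cd "$demo/components" && show_args a*e)
# In the parent: it expands to whatever matches there instead.
in_parent=$(cd "$demo" && show_args a*e)
# Quoted: no expansion, the command receives the pattern itself.
quoted=$(cd "$demo" && show_args 'a*e')
rm -rf "$demo"

printf '%s\n' "$in_components"   # <apple>
printf '%s\n' "$in_parent"       # <abalone> then <algae>
printf '%s\n' "$quoted"          # <a*e>
```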
a*e is a glob, not a regex. It's important to understand the difference.
The shell expands globs in unquoted arguments by matching the argument with available files. The * in a*e means "any sequence of characters not containing a directory separator", so it will match the filename apple (or accolade.node) as long as that file is present in the current directory. Glob matches are complete, not substring matches.
So when you execute grep a*e in a directory which contains the file apple, the shell will replace a*e with the word apple before invoking grep, making the command grep apple. If the directory also contained the file accolade.node, the shell would have put that into the command line as well; grep accolade.node apple. That's very rarely what you want to happen to grep arguments (other than filename arguments), so it's highly recommended to get into the habit of quoting arguments.
Unlike the shell, grep is based on regular expression matching. In a regular expression, * means "any number of repetitions of the previous element", so the regular expression a*e will match e, ae, aae, aaae, and so on. Since grep does substring matching (by default), those strings could be anywhere in the line being matched. That will match the e in apple, for example, but it will also match any other line which contains an e, such as electronics. (That makes it a bit surprising that ls components | grep "a*e" did not match components/apple. Perhaps there was some typing problem.)
In order to match a followed by a sequence of arbitrary characters followed by an e, you could use the regular expression a.*e (i.e. grep "a.*e" -- note the use of quotes to avoid having the shell try to expand that argument as a glob). But that will probably match too much, if you're expecting it to do the same thing as the glob a*e. You might want to add some restrictions. For example, grep -w forces the match to be complete words. And (with gnu grep, at least) you can use grep -w "a\S*e" to match a complete word which starts with a and ends with e, using the \S shortcut (any character other than whitespace).
You very rarely want to use -G, by the way, particularly since it's the default (unfortunately). Most of the time, you'll want to use grep -E in order to not have to insert backslashes throughout your pattern. Please read man 7 regex for a quick overview of regex syntax and the difference between basic and extended Posix regexes. man grep is also useful, of course.
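A quick way to compare these regexes is to feed grep a fixed list of names. This is just an illustrative sketch; the names other than apple and electronics are made up.

```shell
names='apple
electronics
banana
grape'

# a*e as a regex: zero or more "a" followed by "e", anywhere in the line,
# so every line that contains an "e" matches.
loose=$(printf '%s\n' "$names" | grep 'a*e')
# a.*e: an "a", then any characters, then an "e", anywhere in the line.
strict=$(printf '%s\n' "$names" | grep 'a.*e')
# -x requires the regex to match the whole line, which is the closest
# equivalent of the glob a*e.
whole=$(printf '%s\n' "$names" | grep -x 'a.*e')

printf 'a*e:      %s\n' "$(printf '%s' "$loose" | tr '\n' ' ')"
printf 'a.*e:     %s\n' "$(printf '%s' "$strict" | tr '\n' ' ')"
printf '-x a.*e:  %s\n' "$whole"
```

Here a*e selects apple, electronics and grape, a.*e selects apple and grape, and only the -x form narrows it down to apple.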

Displaying file name using special characters

I've one file ABC_123.csv in my app directory and I want to display its full name. I found two ways to do it (see below code snippet): one using ??? and the other using asterisk at the end of required text ABC_.
But, both ways are also displaying the path along with the name. Both below commands are producing results in this format: path + name. I only need the name. Is there any special character (like ? or *) to display the name file only?
[input]$ ls /usr/opt/app/ABC_???.csv
[output] /usr/opt/app/ABC_123.csv
[input]$ ls /usr/opt/app/ABC_*.csv
[output] /usr/opt/app/ABC_123.csv
I cannot do this:
[input]$ cd /usr/opt/app
[input]$ ls ABC_???.csv
[output] ABC_123.csv
Required output:
[input]$ ls /usr/opt/app/ABC_(some-special-character).csv
[output] ABC_123.csv
[Edited] basename is working, but, I want to achieve this using ls and some special character (as highlighted above in Required output). Is there any way to this?
[input]$ basename /usr/opt/app/ABC_???.csv
[output] ABC_123.csv
You can use
find /usr/opt/app/ -type f -name "ABC_*.csv" -exec basename '{}' \;
basename will isolate the file name. find will search the specified directory for files that match the provided pattern.
If you were limited to ls, then this might help.
file="$(echo /usr/opt/app/ABC_???.csv)"; echo "${file##*/}"
Pipe every csv file to basename:
ls /usr/opt/app/ABC_*.csv | xargs basename -a
Without a need for basename:
(cd /usr/opt/app/ && ls ABC_*.csv)
"I cannot do this" was similar but didn't explain why, so maybe one-liner is doable. Doing in sub-shell prevents current dir from changing.
And no - there is no special character that could be used there. It's globbing: https://en.wikipedia.org/wiki/Glob_(programming)
Pure bash:
files=( /usr/opt/app/ABC_???.csv ) # Expands to array of matching filename(s)
printf "%s\n" "${files[@]##*/}"
The ## part of the expansion removes the longest leading string that matches the following pattern from each element of the array (So it works if this pattern matches only a single file like your question says, or if it matches more than one). */ will thus cut off everything up to and including the last slash in the string.
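Wrapped up as a function, that expansion might look like the sketch below. names_only is a hypothetical name, and bash is assumed because of the array.

```shell
#!/usr/bin/env bash
# names_only PATH...: print each path with its directory part removed.
# Hypothetical helper; relies on bash arrays.
names_only() {
    (( $# )) || return 0             # nothing to print without arguments
    local paths=( "$@" )
    # ##*/ removes the longest prefix ending in "/" from every element
    printf '%s\n' "${paths[@]##*/}"
}
```

It would be called as names_only /usr/opt/app/ABC_???.csv, letting the shell expand the glob before the function strips the directories. If nothing matches, bash passes the pattern through unchanged by default, so the output would then be the literal ABC_???.csv.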

grep ignore if the word searched begin/end with a specific character

Here is my problem: I use grep to find a string in multiple files.
Let's say I am looking for the word "balloon". grep is returning lines containing balloon like "Here is a balloon", "loginxxballoonx", "balloon123" etc. This is not a problem except for one case: I want to ignore the line if it finds "/balloon/".
How can I look for every "balloon" string in multiple files, but ignore those with / before and after (ignore "/balloon/")?
EDIT: Let me clarify my problem a bit more: my strings to search for are stored in a file. I use grep -f mytokenfile to search for every string stored in my "mytokenfile" file. For example, my file "mytokenfile" looks like this:
balloon
avion
car
bus
I would like to get all the lines containing these strings, with or without prefixes/suffixes, except if the prefix and suffix are "/".
This should work, using a negated bracket expression [^/]:
grep '[^/]balloon[^/]' balloonfile
Edit:
But this doesn't work if there is a 'balloon' that is not prefixed or suffixed by any other character (for example at the start or end of a line).
Use the following approach (considering that there could be a line with multiple occurrences of the search keyword, such as loginxxballoonx, some text /balloon/ text):
cat testfile | grep '[^/]balloon[^/]' | grep -v '/balloon/'
-v (--invert-match)
Invert the sense of matching, to select non-matching lines. (-v is
specified by POSIX.)
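For the token-file case from the question, one possible sketch is to build a single extended regex that accepts a token when it is not preceded by "/" or not followed by "/", so that only the fully enclosed form /token/ is rejected. build_pattern is a hypothetical helper, and it assumes the tokens are plain words without regex metacharacters.

```shell
# build_pattern: read one token per line on stdin and print one ERE that
# matches any token that is not enclosed by "/" on both sides.
# Assumption: tokens contain no regex metacharacters.
build_pattern() {
    pattern=''
    while IFS= read -r token; do
        [ -n "$token" ] || continue
        # token not preceded by "/"  OR  token not followed by "/"
        alt="(^|[^/])$token|$token([^/]|\$)"
        pattern="${pattern:+$pattern|}$alt"
    done
    printf '%s\n' "$pattern"
}
```

Usage would be along the lines of grep -E -e "$(build_pattern < mytokenfile)" file1 file2. A line that contains both /balloon/ and a plain balloon is still printed, because the plain occurrence matches.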

grep exact pattern from a file in bash

I have the following IP addresses in a file
3.3.3.1
3.3.3.11
3.3.3.111
I am using this file as input file to another program. In that program it will grep each IP address. But when I grep the contents I am getting some wrong outputs.
like
cat testfile | grep -o 3.3.3.1
but I am getting output like
3.3.3.1
3.3.3.1
3.3.3.1
I just want to get the exact output. How can I do that with grep?
Use the following command:
grep -owF "3.3.3.1" testfile
-o returns the match only and not the whole line. -w greps for whole words, meaning the match must be enclosed in non-word characters such as <space>, <tab>, a comma, a semicolon, or the start or end of the line. It prevents grep from matching 3.3.3.1 out of 3.3.3.111.
-F greps for fixed strings instead of patterns. This prevents the . in the IP address from being interpreted as "any character", meaning grep will not match 3a3b3c1 (or something like this).
To match whole words only, use grep -ow 3.3.3.1 testfile
UPDATE: Use the solution provided by hek2mgl as it is more robust.
You may use anchors.
grep '^3\.3\.3\.1$' file
Since grep uses regular expressions by default, you need to escape the dots in order to make grep match a literal dot character.
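When each address sits on its own line, as in the question, anchoring and fixed-string matching can also be combined with -x and -F, which avoids escaping the dots by hand. A minimal sketch, with exact_line as a hypothetical helper name:

```shell
# exact_line STRING [FILE...]: print only the lines that are exactly STRING.
# -x matches whole lines, -F treats STRING as a fixed string, not a regex.
exact_line() {
    needle=$1
    shift
    grep -xF -e "$needle" "$@"
}
```

For example, exact_line 3.3.3.1 testfile prints 3.3.3.1 once and skips 3.3.3.11 and 3.3.3.111.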

How to find lines with numbers in text using a shell script

I'm trying to find lines which match the pattern x.y or x.y.z, where x, y and z are numbers.
For example, given the lines:
1.0/
2.2.5rc1/
2.3.0/
2.3.1/
abc-1.0.0/
the result should be:
1.0
2.3.0
2.3.1
How can I do this?
Things to know:
Call grep in extended mode using -E.
Start the pattern with a ^ to signify you want the search to start at the first character.
To search for a digit, use [0-9] (or [[:digit:]]).
To search for a literal dot, use \.
To search for thing1 OR thing2, use thing1|thing2.
Note: As Jonathan Leffler pointed out below, \d is a notation that might not work across all versions of grep. [0-9] and [[:digit:]] are compliant with POSIX-standard implementations of grep.
Knowing that, we put it together like so:
grep -E '^([0-9]+\.[0-9]+|[0-9]+\.[0-9]+\.[0-9]+)/' yourfile
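If the trailing slash should also be dropped, as in the expected result from the question, something like the following sketch could be used. version_lines is a made-up name, and it assumes every wanted line has the form x.y/ or x.y.z/ with nothing after the slash.

```shell
# version_lines: read lines on stdin and print x.y or x.y.z (digits only)
# for lines of the form "x.y/" or "x.y.z/", without the trailing slash.
version_lines() {
    # {1,2} = one or two ".digits" groups after the first number
    grep -E '^[0-9]+(\.[0-9]+){1,2}/$' | sed 's|/$||'
}
```

Run as version_lines < yourfile. On the sample input it prints 1.0, 2.3.0 and 2.3.1, skipping 2.2.5rc1/ and abc-1.0.0/.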
You can do
grep -C 2 yourSearch yourFile
To send it in a file, do
grep -C 2 yourSearch yourFile > result.txt
Hope it helps!
