Input redirection to grep - bash

I have a directory with contents like this -
vishal.yadav#droid36:~/Shell$ ls
lazy_dog.txt ls-error.txt ls-output.txt ShellCommands.txt TheTimeMachineHGWells.txt words.txt words.txt.bak
First Command
If I try using ls | grep *.txt I get the following output -
ShellCommands.txt: $ cat > lazy_dog.txt
ShellCommands.txt: $ cat lazy_dog.txt
ShellCommands.txt: $ cat < lazy_dog.txt
ShellCommands.txt:input from the keyboard to the file lazy_dog.txt. We see that the result is the
Second Command
And if I use ls | grep .*.txt I get this as output -
lazy_dog.txt
ls-error.txt
ls-output.txt
ShellCommands.txt
TheTimeMachineHGWells.txt
words.txt
words.txt.bak
Isn't .*.txt and *.txt one and the same?
In the First Command, is the output of ls the regex for grep or is it the list of files?
Similarly, for the Second Command, is the output of ls the regex or list of files?

In the first command (ls | grep *.txt), the output from ls is completely ignored by grep because it sees:
grep lazy_dog.txt ls-error.txt ls-output.txt ShellCommands.txt TheTimeMachineHGWells.txt words.txt
It has one pattern, lazy_dog.txt, and five files, so it reads each file in turn to find the pattern, and prefixes the matching output lines with the name of the file that held the pattern. If there were only one file name, it would not list the file name before the matched lines.
It appears that the only file of the five that grep searches (ls-error.txt, ls-output.txt, ShellCommands.txt, TheTimeMachineHGWells.txt, words.txt) that contains the text lazy_dog.txt is ShellCommands.txt, so that's what you see in the output. Note that a line containing lazy_dogstxt would also match the regex (but not the shell glob).
In the second command (ls | grep .*.txt), no files match the shell glob .*.txt, so that argument is passed to grep unexpanded. grep therefore has only a pattern and no file names, so it reads its standard input, which this time is the output from ls. All the file names match the regex .*.txt (even though none of them match the shell glob .*.txt), so they're all listed. Note that it would also pick up many other lines, even one containing just "etxt", because the . is a grep metacharacter (the .*.txt regex matches any string of zero or more characters followed by one arbitrary character and then txt).
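The difference between the glob and the regex can be reproduced in a scratch directory. This is only a minimal sketch, assuming a POSIX shell with mktemp available; the directory and the file names in it are made up for illustration. Quoting the pattern keeps the shell from expanding it, so grep always filters the output of ls.

```shell
# Hypothetical scratch directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch lazy_dog.txt words.txt words.txt.bak notes.md

# Quoted regex: the shell passes it to grep unchanged, and grep reads ls's output.
# \. is a literal dot and $ anchors the match to the end of the name.
txt_files=$(ls | grep '\.txt$')
printf '%s\n' "$txt_files"

# The same selection done by the shell alone, using a glob.
glob_files=$(printf '%s\n' *.txt)
printf '%s\n' "$glob_files"
```

Both variables end up holding lazy_dog.txt and words.txt; words.txt.bak is left out because neither the anchored regex nor the glob matches it.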

Run ls -al:
you will find that the current directory is listed as . and the parent directory as ..
So when you say ls | grep .*.txt, the shell first treats .*.txt as a glob for names in the current directory that start with a dot and end with .txt. No such file exists here, so the pattern is handed to grep unchanged.

grep sees the pattern .*.txt as a regex, not a glob.
So you can use ls *.txt or ls | grep '.*txt' (quoted, so the shell cannot expand it first).

Related

Printing results common to many grep

How can I use two grep statements and print only those files which satisfy both grep searches?
OR
How can I look for two different strings in a file and print the contents of the file if it contains both strings?
Something like this should get you started:
pattern1=your-pattern
pattern2=your-pattern
basedir=/path/to/dir
grep -Zlr "$pattern1" "$basedir" | xargs -0 grep -l "$pattern2"
The key elements:
The -l flag is to print the names of files that matched.
The -Z flag is to output a null-byte after matched filenames
xargs -0 will expect null terminated items in its input
The first grep will find files matching pattern1, the second grep will find files matching pattern2 -> the end result is a list of files matching both patterns.
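To see the pipeline at work, here is a small sketch that builds a throwaway directory and runs the same two-stage search over it. It assumes GNU grep (for -Z) and an xargs that supports -0; the file names and the patterns alpha and beta are invented for the example.

```shell
# Hypothetical test data: one file has both words, the others only one.
basedir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$basedir/both.txt"
printf 'alpha\n' > "$basedir/only alpha.txt"
printf 'beta\n' > "$basedir/only-beta.txt"

pattern1=alpha
pattern2=beta

# Stage 1 lists files containing pattern1, NUL-terminated so the space in
# "only alpha.txt" is harmless; stage 2 keeps those that also contain pattern2.
matches=$(grep -Zlr -- "$pattern1" "$basedir" | xargs -0 grep -l -- "$pattern2")
printf '%s\n' "$matches"
```

Only both.txt survives both stages. To print the contents rather than the names, the second stage's output could be fed to cat in the same NUL-separated way.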

Piping the contents of a file to ls

I have a file called "input.txt" that contains one line:
/bin
I would like to make the contents of the file be the input of the command ls
I tried doing
cat input.txt | ls
but it doesn't output the list of files in the /bin directory
I also tried
ls < input.txt
to no avail.
You are looking for the xargs (transpose arguments) command.
xargs ls < input.txt
You say you want /bin to be the "input" to ls, but that's not correct; ls doesn't do anything with its input. Instead, you want /bin to be passed as a command-line argument to ls, as if you had typed ls /bin.
Input and arguments are completely different things; feeding text to a command as input is not the same as supplying that text as an argument. The difference can be blurred by the fact that many commands, such as cat, will operate on either their input or their arguments (or both) – but even there, we find an important distinction: what they actually operate on is the content of files whose names are passed as arguments.
The xargs command was specifically designed to transform between those two things: it interprets its input as a whitespace-separated list of command-line arguments to pass to some other command. That other command is supplied to xargs as its command-line argument(s), in this case ls.
Thanks to the input redirection provided by the shell via <, the arguments xargs supplies to ls here come from the input.txt file.
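The distinction between input and arguments can be tried out without touching /bin. The sketch below is an illustration only: it assumes mktemp is available and that the temporary path contains no whitespace, and the names target_dir and input_file are made up.

```shell
# Hypothetical directory with two files, standing in for /bin.
target_dir=$(mktemp -d)
touch "$target_dir/a" "$target_dir/b"

# A file whose single line is the directory name, like input.txt.
input_file=$(mktemp)
printf '%s\n' "$target_dir" > "$input_file"

# "ls < file" would ignore the file and list the current directory.
# xargs reads the file on standard input and runs: ls <that directory>
listing=$(xargs ls < "$input_file")
printf '%s\n' "$listing"
```

The listing contains a and b, the contents of the directory named in the file, because xargs turned the line of input into a command-line argument.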
There are other ways of accomplishing the same thing; for instance, as long as input.txt does not have so many files in it that they won't fit in a single command line, you can just do this:
ls $(< input.txt)
Both the above command and the xargs version will treat any spaces in the input.txt file as separating filenames, so if you have filenames containing space characters, you'll have to do more work to interpret the file properly. Also, note that if any of the filenames contain wildcard/"glob" characters like ? or * or [...], the $(<...) version will expand them as wildcard patterns, while xargs will not.
ls takes the filenames from its command line, not its standard input, which | ls and ls < file would use.
If you have only one file listed in input.txt and the filename doesn't contain trailing newlines, it's enough to use (note quotes):
ls "$(cat input.txt)"
Or, in most shells other than a plain POSIX sh (bash, ksh and zsh, for example):
ls "$(< input.txt)"
If there are many filenames in the file, you'd want to use xargs, but to deal with whitespace in the names, use -d "\n" (with GNU xargs) to take each line as a filename.
xargs -d "\n" ls < input.txt
Or, if you need to handle filenames with newlines, you can separate them using NUL bytes in the input, and use
xargs -0 ls < input.txt
(This also works even if there's only one filename.)
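The two delimiter options can be compared side by side. This is a sketch under the assumption that GNU xargs is installed (-d is a GNU extension); the file names and the input.nul file are hypothetical.

```shell
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1
touch 'hello world' plain

# One filename per line: -d '\n' makes each whole line a single argument.
printf '%s\n' 'hello world' plain > input.txt
by_line=$(xargs -d '\n' ls < input.txt)
printf '%s\n' "$by_line"

# NUL-separated names: -0 also copes with names that contain newlines.
printf '%s\0' 'hello world' plain > input.nul
by_nul=$(xargs -0 ls < input.nul)
printf '%s\n' "$by_nul"
```

In both cases ls receives hello world as one argument instead of two.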
Try xargs
cat file | xargs ls

combine grep and grep -v search together

I am trying to combine grep and grep -v search together.
The output should display all lines ending with .xml, but exclude lines starting with $.
Here are the commands I have tried; none worked:
grep *.xml file1.txt | grep -v '$' file1.txt > output
grep *.xml | grep -v '$' file1.txt > output
grep *.xml grep -v '$' file1.txt > output
grep *.xml '$' file1.txt > output
To match a $ at the start of a line, anchor it to the start of the line with ^. Also, $ by itself matches the end of the line (it's a special character, just like ^), and * will not do what you think it does (it works differently in regular expressions compared to in shell globbing patterns). So,
grep -v '^\$'
will filter out all lines starting with a $.
You can do either
grep '\.xml$' file1.txt | grep -v '^\$'
or
grep '^[^$].*\.xml$' file1.txt
to find all lines in the file file1.txt that do not start with $ but that end with .xml.
Notice that I also escape the dot in .xml, as it otherwise matches any character, and that the second command combines both criteria by using a negated bracket expression ([^$]) that matches any single character except $ (the .* matches any number of any characters).
The single quotes are necessary so that the shell won't interpret the regular expression as a shell globbing pattern.
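Both forms can be checked against a few sample lines. The following sketch uses a temporary file with invented contents in place of file1.txt; it is meant as an illustration, not as the only way to write the filter.

```shell
# Hypothetical stand-in for file1.txt.
sample=$(mktemp)
printf '%s\n' 'config.xml' '$hidden.xml' 'notes.txt' 'data_xml' 'dir/pom.xml' > "$sample"

# Two greps: keep lines ending in .xml, then drop those starting with $.
piped=$(grep '\.xml$' "$sample" | grep -v '^\$')
printf '%s\n' "$piped"

# One grep: first character is anything but $, and the line ends in .xml.
single=$(grep '^[^$].*\.xml$' "$sample")
printf '%s\n' "$single"
```

Each command prints config.xml and dir/pom.xml. The line data_xml is rejected because the escaped dot only matches a literal dot.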
You can use the cat command to feed the file to grep and redirect the output to a file.
A regular expression that matches all lines starting with the $ symbol is '^[$]'.
So you can use the command cat file1.txt | grep '\.xml$' | grep -v '^[$]' > output.

How to grep strings that contain a query string with any 2 characters in the beginning "xxTHISISMYSTRING" from a file?

I have a multi-lined file in the format:
hhhhhhhhhhhhhhhhhhhhhaaaahhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhhhoaaaaahhhhhhhhhhhhhh
hhhhhhhhhhhhhbaaaahhhhhhhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhhhhhfbaaaahhhhhhhhhhhh
I want to find all strings that contain the "aaaa" motif as well as the two letters preceding it.
How would I grep out the strings: hhaaaa, oaaaaa, hbaaaa, fbaaaa? With "aaaa" as my input.
To match any character in a regex, use .:
$ grep -o ..aaaa file
hhaaaa
hoaaaa
hbaaaa
fbaaaa
The -o option tells grep to print only the matches, not the context for the matches.
To restrict the match to alphabetic characters, use the alphabetic class:
$ grep -Eo '[[:alpha:]]{2}aaaa' file
hhaaaa
hoaaaa
hbaaaa
fbaaaa
[[:alpha:]] matches any alphabetic character. Unlike A-Z, this is unicode-safe. The {2} indicates two such characters. To avoid backslashes, we have added the -E flag to turn on extended regex.
grep -oh "..aaaa" file.txt
will do.
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
grep -o '..aaaa' file
should do it. Had the objective been to count the total matches, then do:
grep -o '..aaaa' file | wc -l
The grep man page says:
-o, --only-matching Print only the matched (non-empty) parts of
a matching line, with each such part on a separate output line.
The wc man page says:
-l, --lines print the newline counts
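Since the question asks for "aaaa" to be an input, the motif can be kept in a variable and spliced into the pattern. This is a minimal sketch using the four sample lines from the question; the variable names are hypothetical, and it assumes a grep that supports -o.

```shell
# The sample data from the question, written to a temporary file.
motif_file=$(mktemp)
printf '%s\n' \
  'hhhhhhhhhhhhhhhhhhhhhaaaahhhhhhhhhhhhhh' \
  'hhhhhhhhhhhhhhhhhhhoaaaaahhhhhhhhhhhhhh' \
  'hhhhhhhhhhhhhbaaaahhhhhhhhhhhhhhhhhhhhh' \
  'hhhhhhhhhhhhhhhhhhhhhfbaaaahhhhhhhhhhhh' > "$motif_file"

motif=aaaa

# Double quotes let the shell substitute $motif; grep sees [[:alpha:]]{2}aaaa.
hits=$(grep -Eo "[[:alpha:]]{2}${motif}" "$motif_file")
printf '%s\n' "$hits"

# Count the matches, one per output line.
count=$(printf '%s\n' "$hits" | wc -l)
printf '%s\n' "$count"
```

This works as long as the motif contains no regex metacharacters; a motif with characters such as . or * would need escaping first.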

Listing files in date order with spaces in filenames

I am starting with a file containing a list of hundreds of files (full paths) in a random order. I would like to list the details of the ten latest files in that list. This is my naive attempt:
$ ls -las -t `cat list-of-files.txt` | head -10
That works, so long as none of the files have spaces in, but fails if they do as those files are split up at the spaces and treated as separate files. File "hello world" gives me:
ls: hello: No such file or directory
ls: world: No such file or directory
I have tried quoting the files in the original list-of-files file, but the command substitution still splits the files up at the spaces in the filenames, treating the quotes as part of the filenames:
$ ls -las -t `awk '{print "\"" $0 "\""}' list-of-files.txt` | head -10
ls: "hello: No such file or directory
ls: world": No such file or directory
The only way I can think of doing this, is to ls each file individually (using xargs perhaps) and create an intermediate file with the file listings and the date in a sortable order as the first field in each line, then sort that intermediate file. However, that feels a bit cumbersome and inefficient (hundreds of ls commands rather than one or two). But that may be the only way to do it?
Is there any way to pass "ls" a list of files to process, where those files could contain spaces - it seems like it should be simple, but I'm stumped.
Instead of "one or more blank characters", you can force bash to use another field separator:
OIFS=$IFS
IFS=$'\n'
ls -las -t $(cat list-of-files.txt) | head -10
IFS=$OIFS
However, I don't think this code would be more efficient than doing a loop; in addition, that won't work if the number of files in list-of-files.txt exceeds the max number of arguments.
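A variant of the same idea confines the IFS change to a subshell, so nothing needs to be saved and restored. The sketch below is an assumption-laden illustration: the file names are invented, the newline is written literally so it also works in a plain POSIX sh, and set -f is added so that names containing * or ? are not expanded as globs.

```shell
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1
touch 'hello world' plain
printf '%s\n' 'hello world' plain > list-of-files.txt

# The command substitution runs in a subshell, so the IFS and set -f
# changes disappear when it finishes.
listing=$(
  IFS='
'
  set -f
  ls $(cat list-of-files.txt)
)
printf '%s\n' "$listing"
```

The unquoted $(cat ...) is deliberate here: it is split on newlines only, so hello world reaches ls as a single argument.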
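Try this (with GNU xargs; -d '\n' makes each line a single argument, so spaces in the names survive):
xargs -a list-of-files.txt -d '\n' ls -last | head -n 10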
I'm not sure whether this will work, but did you try escaping spaces with \? Using sed or something. sed "s/ /\\\\ /g" list-of-files.txt, for example.
This worked for me:
xargs -d\\n ls -last < list-of-files.txt | head -10
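The date ordering can be checked with files whose modification times are set explicitly. This is a sketch only, assuming GNU xargs for -d; the file names and timestamps are made up, and touch -t is used to give each file a known age.

```shell
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

# Hypothetical files with fixed modification times (format: CCYYMMDDhhmm).
touch -t 202001010000 'old file'
touch -t 202101010000 'hello world'
touch -t 202201010000 'newest'
printf '%s\n' 'old file' 'hello world' 'newest' > list-of-files.txt

# Each line becomes one argument; ls -t sorts newest first; keep the top two.
latest_two=$(xargs -d '\n' ls -t < list-of-files.txt | head -n 2)
printf '%s\n' "$latest_two"
```

One caveat: if the list is long enough that xargs has to split it across several ls invocations, the files are sorted only within each batch, not across the whole list.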

Resources