My input is a large list of files. They could have any characters in the name (including periods, because there are some package names as well). Here's some small sample input:
com.test.impl.servlets.Test.xml
TestClass.class
TestClass1.class
Test2.java
Test3.java
I want to know all of the different file extensions in my list, so essentially, I want egrep -o everything after the last period. Something like this:
input | xargs <unknown command but probably egrep> | sort -u
Would return:
.xml
.class
.java
You can try grep -o '\.[^.]*$':
$ echo 'com.test.impl.servlets.Test.xml
TestClass.class
TestClass1.class
Test2.java
Test3.java' | grep -o '\.[^.]*$' | sort -u
.class
.java
.xml
or sed 's/.*\././':
$ echo 'com.test.impl.servlets.Test.xml
TestClass.class
TestClass1.class
Test2.java
Test3.java' | sed 's/.*\././' | sort -u
.class
.java
.xml
If your grep build has PCRE support compiled in (this prints each extension without the leading period):
$ grep -P -o '.*\.\K.*'
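The commands above read bare file names. If the names come from find instead, each line is a path, and a period in a directory name (say pkg.v2/) could be picked up as an extension. A minimal sketch, assuming a grep with -o (GNU or BSD); the function name path_extensions is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the distinct extensions (with the leading period)
# of all regular files under DIR, one per line, sorted.
path_extensions() {
  # [^./] stops the match at a "/", so only the part after the last period
  # of the final path component counts; names without a period print nothing.
  find "$1" -type f | grep -o '\.[^./]*$' | sort -u
}
```

If a directory held the five sample files from the question, path_extensions on it would print .class, .java and .xml, one per line. A dotfile such as .bashrc is reported as its own extension.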
This gets me the count, but how do I delete the files whose count is less than 2?
$ cat ./a1esso.doc | grep -o -E '\w+' | sort -u -f | wc --words
1
$ cat ./a1brit.doc | grep -o -E '\w+' | sort -u -f | wc --words
4
How can I grab the names of the files with a count below 2 so that I can delete them? I will be scanning millions of files. A find command can list all the files, but it seems the filename has to be propagated through the pipeline so that rm can be used at the right-hand end.
Update:
The correct answer will use an input pipeline to feed in filenames. This is not negotiable. The program is not meant for the single input file shown in the example; its input comes from a dynamic list of many files.
A filter that identifies the names of the files meeting the criterion will also be present in the accepted answer. This is not negotiable either.
You could do this …
test $(grep -o -E '\w+' ./a1esso.doc | sort -u -f | wc --words) -lt 2 && rm ./a1esso.doc
Update: removed useless cat as per David's comment.
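A sketch of a full pipeline that meets both requirements in the update: filenames are fed in through a pipeline, and a filter passes on only the names that meet the criterion, so rm sits at the right-hand end. It assumes bash (for read -d '') and GNU grep/xargs (for \w and xargs -r); the function name sparse_files is hypothetical. Filenames travel NUL-separated, so spaces or newlines in names are safe.

```bash
#!/usr/bin/env bash
# Hypothetical filter: print, NUL-terminated, every regular file under DIR
# whose number of distinct words (compared case-insensitively) is below MIN.
sparse_files() {
  local dir=$1 min=$2 f n
  find "$dir" -type f -print0 |
  while IFS= read -r -d '' f; do
    # One word per line from grep -o, so counting lines counts words.
    n=$(grep -oE -- '\w+' "$f" | sort -u -f | wc -l)
    if (( n < min )); then
      printf '%s\0' "$f"
    fi
  done
}
```

Usage: sparse_files /path/to/docs 2 | xargs -0 -r rm -- . Running the filter alone first (piped through tr '\0' '\n') shows what would be deleted. It starts one grep/sort/wc pipeline per file, which is simple but not fast for millions of files.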
I frequently want to generate a list of files that satisfy a stated condition.
Suppose I want to find all files that contain copyright and main but contain neither fcntl nor namespace.
Here is a clumsy approach:
fgrep -i -r -l copyright *|xargs fgrep -i -l main|xargs fgrep -i -l -v fcntl|xargs fgrep -i -l namespace
Does anyone know how to achieve the same result with a more sophisticated approach using standard utilities?
For fun, I have begun writing my own C++17 program to get a fast result, but I would love to find my own work unnecessary. Here is my GitHub repository with that code:
https://github.com/jlettvin/Greased-Grep
With (GNU) grep, I would do this as follows:
grep -Flir 'main' . \
| xargs grep -Fli 'copyright' \
| xargs grep -FLi -e 'fcntl' -e 'namespace'
This is quite similar to what you had. To get files not containing a pattern, I use the -L option (you tried -lv – that returns the files that contain at least one line that doesn't match, i.e., typically all files).
For the last step, keeping only the files that match neither fcntl nor namespace, I can use a single grep invocation with multiple patterns specified with -e.
To make this more robust and allow for any characters in filenames, you can require that grep separates filenames with a NUL byte (-Z) and xargs expecting that (-0):
grep -FlirZ 'main' . \
| xargs -0 grep -FliZ 'copyright' \
| xargs -0 grep -FLi -e 'fcntl' -e 'namespace'
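A different technique that needs no xargs at all: find can use -exec as a test, because -exec is true exactly when the command exits with status 0, and ! negates it. A sketch assuming POSIX find and grep; find_candidates is a hypothetical name. It starts a separate grep per file and test, so it will be slower than the xargs pipeline on a large tree, but it copes with any character in a file name, and the later greps only run on files that passed the earlier ones.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list regular files under DIR that contain both "main"
# and "copyright" but neither "fcntl" nor "namespace" (fixed strings,
# case-insensitive). Each -exec acts as a test on the current file.
find_candidates() {
  find "$1" -type f \
    -exec grep -qiF main {} \; \
    -exec grep -qiF copyright {} \; \
    ! -exec grep -qiF -e fcntl -e namespace {} \; \
    -print
}
```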
How can I replace some strings in some files recursively, taking some exclusions into account? For example, I don't want to apply the replacement to binary files or files in .svn directories.
This is the solution I'm currently using. Is there a better way?
grep -irl foobar | grep -v .svn | grep -v Binary | xargs sed -i 's/foobar/baz/g'
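With grep -l, grep prints only file names, never "Binary file ... matches", so grep -v Binary filters nothing, and the unescaped dot in .svn matches any character. A sketch that lets grep do both exclusions itself, assuming GNU grep (-I to skip binary files, -Z, --exclude-dir) and GNU sed/xargs (sed -i, xargs -r); replace_foobar is a hypothetical name. It drops the original -i because sed's s/foobar/baz/ is case-sensitive, so only files sed will actually change are touched.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace every "foobar" with "baz" in the text files
# under DIR, skipping binary files and anything inside .svn directories.
replace_foobar() {
  # -r: recurse, -l: print file names only, -I: skip binary files,
  # -Z: end each name with a NUL so any character in a name is safe
  grep -rlIZ --exclude-dir=.svn -- 'foobar' "$1" |
    xargs -0 -r sed -i 's/foobar/baz/g'
}
```

Usage: replace_foobar . from the top of the working copy.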
I have the following working script that searches a directory of many files for specific strings previously saved in a file.
I use the file extension to select the files, because their names are random. Note that every string in my strings file should be searched for in all the files.
I also cut the grep output, because it returns 2 or 3 lines per matching file and I only want the part that shows the file name.
I might be doing something redundant. How could this be made faster?
#!/bin/bash
#working but slow
cd /var/FILES_DIRECTORY
while IFS= read -r line
do
LC_ALL=C fgrep "$line" *.cps | cut -c1-27 >> /var/tmp/test_OUT.txt
done < "/var/tmp/test_STRINGS.txt"
grep -F -f /var/tmp/test_STRINGS.txt *.cps | cut -c1-27
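Isn't this what you're looking for?
This should speed up your script, because a single grep reads all the strings with -f and scans each file once instead of once per string: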
#!/bin/bash
#working fast
cd /var/FILES_DIRECTORY
export LC_ALL=C
grep -F -f /var/tmp/test_STRINGS.txt *.cps | cut -c1-27 > /var/tmp/test_OUT.txt
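Since cut -c1-27 is only there to keep the file name, grep can print the names directly with -l, once per matching file, and stop reading each file at its first match. A sketch assuming grep's -F, -f and -l (all POSIX); list_matching_files is a hypothetical name. Beware that an empty line in the strings file matches every line, so every file would be listed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: $1 is a file of fixed strings, one per line; the
# remaining arguments are the files to search. Prints each file that
# contains at least one of the strings.
list_matching_files() {
  local strings=$1
  shift
  # LC_ALL=C: byte-wise matching, usually faster than a multibyte locale.
  LC_ALL=C grep -l -F -f "$strings" -- "$@"
}
```

Usage: list_matching_files /var/tmp/test_STRINGS.txt /var/FILES_DIRECTORY/*.cps > /var/tmp/test_OUT.txt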
When I search log files for an error message with grep error *log, grep reports each log file as a binary file instead of printing the matching lines:
$ grep error *log
Binary file out0080-2011.01.07-12.38.log matches
Binary file out0081-2011.01.07-12.38.log matches
Binary file out0082-2011.01.07-12.38.log matches
Binary file out0083-2011.01.07-12.38.log matches
However, these are text, not binary files.
I am not sure why these are considered binary; the first few lines contain the following non-error messages:
out0134
-catch_rsh /opt/gridengine/default/spool/compute-0-17/active_jobs/327708.1/pe_hostfile
compute-0-17
I would like to grep the contents of the returned files for an error message and return the names of the files with the message.
How can I grep the contents of the returned files, rather than this list of returned files, as happens with grep error *log | grep foo?
Here's the answer you might be looking for:
grep -l foo $(grep -l error *.log)
-l tells grep to print filenames only; that does the first grep, then substitutes the result into the next grep's command. Alternatively, if you like xargs:
grep -l error *.log | xargs grep -l foo
which does the same thing, using xargs to call the second grep with the first grep's results as arguments.
Use the -a option. From the grep man page:
-a, --text
Process a binary file as if it were text; this is equivalent to the --binary-files=text option.
grep -a "some error message" *.log
By the way, here is how grep decides whether a file is binary (from the description of --binary-files=TYPE in the man page):
If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is binary...
Update
If you want just a list of the file names that contain the word foo on a line that also contains error, you can use:
grep -la "error.*foo" *.log    # assumes foo comes after error
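To accept foo and error in either order on the same line, one option is an alternation of the two orders. A sketch assuming grep -E and -a (GNU or BSD); files_with_both is a hypothetical name, and its two arguments are treated as extended regular expressions, not fixed strings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the FILES that contain at least one line matching
# both ERE A and ERE B, in either order. -a reads "binary" logs as text.
files_with_both() {
  local a=$1 b=$2
  shift 2
  # Parentheses keep an alternation inside A or B from splitting the pattern.
  grep -laE -- "($a).*($b)|($b).*($a)" "$@"
}
```

Usage: files_with_both error foo *.log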
I do this:
$ find . -type f -name '*.log' | fgrep -v [anything unwanted] | xargs grep -i [search inside files]
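The same idea without a separate filter stage, and safe for file names containing spaces: find can exclude paths itself with ! -path and hand the files to grep with -exec ... {} +. A sketch; search_logs and its three arguments are hypothetical, the exclusion is a shell-style glob matched against the whole path, and -H (supported by GNU and BSD grep) keeps the file name prefix even when a batch holds a single file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: case-insensitively search every *.log file under DIR
# whose path does not contain EXCLUDE, printing "path:matching line".
search_logs() {
  local dir=$1 exclude=$2 pattern=$3
  # -exec ... {} + passes many files per grep call, like xargs would.
  find "$dir" -type f -name '*.log' ! -path "*$exclude*" \
    -exec grep -i -H -- "$pattern" {} +
}
```

Usage: search_logs . old error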
A comment asked how to grep for foo only in the files that match error. You can do this:
for i in *log; do
  # -q: only test for a match; -a: treat the "binary" logs as text
  if grep -qa -- error "$i" 2>/dev/null; then
    printf 'File %s:\n\t\n' "$i"
    grep -a -- foo "$i"
  fi
done