I am trying to write a simple bash script. It should search for files whose names match a supplied pattern (the pattern is supplied as an argument) and list the first few lines of each file. All the files will be in one directory.
I know I should use head -n 3 to list the first few lines of a file, but I have no idea how to search for the supplied pattern or how to put it all together.
Thank you very much for all the answers.
No need really, the shell will do patterns for you:
head -3 *.c
==> it.c <==
#include<stdio.h>
int main()
{
==> sem.c <==
#include <stdio.h> /* printf() */
#include <stdlib.h> /* exit(), malloc(), free() */
#include <sys/types.h> /* key_t, sem_t, pid_t */
==> usbtest.c <==
Another example:
head -3 file[0-9]
==> file1 <==
file1 line 1
file1 line 2
file1 line 3
==> file2 <==
file2 line 1
file2 line 2
file2 line 3
==> file9 <==
file9 line 1
file9 line 2
file9 line 3
Bash has a globstar option that, when set, enables you to use ** to match files in subdirectories as well:
head -3 **/mypattern*.txt
To set globstar you can add the following to your .bashrc:
shopt -s globstar
find . -type f -name 'mypattern*.txt' -exec head -n 3 {} \;
Add -maxdepth 1 before the -exec if you do not want to descend into subdirectories.
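Putting it together as the question asked, here is a minimal sketch of a script that takes the pattern as its argument (the script name firstlines.sh is just illustrative; quote the pattern when calling it so the shell does not expand it first):
#!/bin/bash
# Usage: ./firstlines.sh 'mypattern*.txt'
# Print the first 3 lines of every matching file in the current directory.
pattern=${1:?usage: $0 pattern}
find . -maxdepth 1 -type f -name "$pattern" -exec head -n 3 {} +
With -exec ... {} +, head receives several files at once and prints the ==> file <== headers shown above.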
I executed a command on Linux to list all the files (including those in subdirectories) with specific extensions in a folder.
This command is:
ls -R | grep -e "\.txt$" -e "\.py$"
On the other hand, I have some filenames stored in a .txt file (one per line).
I want to show the result of my previous command, filtered using the file called filters.txt:
If a result is in the file, I keep it.
Else, I do not keep it.
How can I do it, in bash, in only one line?
I suppose this is something like:
ls -R | grep -e "\.txt$" -e "\.py$" | grep filters.txt
An example of the files:
# filters.txt
README.txt
__init__.py
EDIT 1
I am trying to use a file instead of a list of arguments because I get the error:
'/bin/grep: Argument list too long'
EDIT 2
# The result of the command ls -R
-rw-r--r-- 1 XXX 1 Oct 28 23:36 README.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 __init__.py
-rw-r--r-- 1 XXX 1 Oct 28 23:36 iamaninja.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 donttakeme.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 donttakeme2.txt
What I want as a result:
-rw-r--r-- 1 XXX 1 Oct 28 23:36 README.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 __init__.py
You can use comm (note that comm expects both inputs to be sorted):
comm -12 <(ls -R | grep -e "\.txt$" -e "\.py$" | sort) <(sort filters.txt)
This will give you the intersection of the two lists.
EDIT
It seems that ls is not great for this; maybe find would be safer:
find . -type f | xargs grep "$(sed ':a;N;$!ba;s/\n/\\|/g' filters.txt)"
That is, for each of your files, take your filters.txt and replace all newlines with \| using sed and then grep for all the entries.
In basic regular expressions, grep uses \| as the alternation operator between items, so the sed command transforms filters.txt into a single pattern that matches any of the entries.
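With the filters.txt shown above, the sed command collapses the file into one alternation pattern:
$ sed ':a;N;$!ba;s/\n/\\|/g' filters.txt
README.txt\|__init__.py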
You can also use grep -f, which reads its patterns from a file, as the last stage of your pipeline:
ls -R | grep -e "\.txt$" -e "\.py$" | grep -F -f filters.txt
Here -F treats each line of filters.txt as a fixed string, so the dot in README.txt is not interpreted as a regex metacharacter.
You can run this script in the target directory, giving the list file as a single argument.
#!/bin/bash -e
# exit early if awk fails (i.e. it can't read the list)
shopt -s lastpipe
find . -mindepth 1 -type f \( -name '*.txt' -o -name '*.py' \) -print0 |
awk -v exclude_list_file="${1:?no list file provided}" \
'BEGIN {
while ((getline line < exclude_list_file) > 0) {
exclude_list[c++] = line
}
close(exclude_list_file)
if (c==0) {
exit 1
}
FS = "/"
RS = "\000"
}
{
for (i in exclude_list) {
if (exclude_list[i] == $NF) {
next
}
}
print
}'
It prints all paths, recursively, excluding any filename which exactly matches a line in the list file (so lines not ending .py or .txt wouldn’t do anything).
Only the filename is considered, the preceding path is ignored.
It fails immediately if no argument is given or it can't read a line from the list file.
The question is tagged bash, but if you change the shebang to sh, and remove shopt, then everything in the script except -print0 is POSIX. -print0 is common, it’s available on GNU (Linux), BSDs (including OpenBSD), and busybox.
The purpose of lastpipe is to exit immediately if the list file can't be read. Without it, find keeps running until completion (but nothing gets printed).
If you specifically want the ls -l output format, you could change awk to use a null output record separator (add ORS = "\000" to the end of BEGIN, directly below RS = "\000"), and pipe awk into xargs -0 ls -ld.
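A sketch of that variant, with the two changes applied (still run as a script taking the list file as its only argument):
find . -mindepth 1 -type f \( -name '*.txt' -o -name '*.py' \) -print0 |
awk -v exclude_list_file="${1:?no list file provided}" \
'BEGIN {
    while ((getline line < exclude_list_file) > 0) {
        exclude_list[c++] = line
    }
    close(exclude_list_file)
    if (c == 0) {
        exit 1
    }
    FS = "/"
    RS = "\000"
    ORS = "\000"  # null-terminate output records for xargs -0
}
{
    for (i in exclude_list) {
        if (exclude_list[i] == $NF) {
            next
        }
    }
    print
}' |
xargs -0 ls -ld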
I'm using an Awk script to split a big text document into independent files. I did it and now I'm working with 14k text files. The problem here is there are a lot of files with just three lines of text and it's not useful for me to keep them.
I know I can delete lines with awk 'NF>=3' file, but I don't want to delete lines inside files; rather, I want to delete the files whose content is just two or three lines of text.
Thanks in advance.
Could you please try the following find command (tested with GNU awk).
find /your/path/ -type f -exec awk -v lines=3 'NR>lines{f=1; exit} END{if (!f) print FILENAME}' {} \;
The above will print on the console the names of the files having 3 lines or fewer. Once you are happy with the results, try the following to delete them. I suggest running it in a test directory first, and I have put an echo in front of rm to be on the safer side :) Remove the echo once you are fully satisfied with the output.
find /your/path/ -type f -exec awk -v lines=3 'NR>lines{f=1; exit} END{exit !f}' {} \; -exec echo rm -f {} \;
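Once verified, the final command is the same with the echo dropped:
find /your/path/ -type f -exec awk -v lines=3 'NR>lines{f=1; exit} END{exit !f}' {} \; -exec rm -f {} \;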
If the files in the current directory are all text files, this should be efficient and portable:
for f in *; do
[ $(head -4 "$f" | wc -l) -lt 4 ] && echo "$f"
done # | xargs rm
Inspect the list, and if it looks OK, then remove the # on the last line to actually delete the unwanted files.
Why use head -4? Because wc doesn't know when to quit. Suppose half of the text files were each more than a terabyte long; if that were the case wc -l alone would be quite slow.
You may use wc to count lines and then decide whether to delete each file. For that you would write a short shell script rather than a single awk command.
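A minimal sketch of that approach, assuming the same 3-line threshold as above (note that wc -l counts newline characters, so a last line without a trailing newline is not counted; the echo is a dry-run safeguard, as in the other answers):
for f in /your/path/*; do
    [ -f "$f" ] || continue
    if [ "$(wc -l < "$f")" -le 3 ]; then
        echo rm -f "$f"
    fi
done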
You can try Perl. The solution below is efficient because the file handle ARGV is closed once the line count exceeds 3, so at most 4 lines of each file are read. Note that it prints the files having more than 3 lines, i.e. the ones to keep.
perl -nle ' close(ARGV) if ($.>3) ; $kv{$ARGV}++; END { for(sort keys %kv) { print if $kv{$_}>3 } } ' *
If you want to pipe the output of some other command (say find), you can use it like:
$ find . -name "*" -type f -exec perl -nle ' close(ARGV) if ($.>3) ; $kv{$ARGV}++; END { for(sort keys %kv) { print if $kv{$_}>3 } } ' {} \;
./bing.fasta
./chris_smith.txt
./dawn.txt
./drcatfish.txt
./foo.yaml
./ip.txt
./join_tab.pl
./manoj1.txt
./manoj2.txt
./moose.txt
./query_ip.txt
./scottc.txt
./seats.ksh
./tane.txt
./test_input_so.txt
./ya801.txt
$
The output of wc -l * in the same directory:
$ wc -l *
12 bing.fasta
16 chris_smith.txt
8 dawn.txt
9 drcatfish.txt
3 fileA
3 fileB
13 foo.yaml
3 hubbs.txt
8 ip.txt
19 join_tab.pl
6 manoj1.txt
6 manoj2.txt
5 moose.txt
17 query_ip.txt
3 rororo.txt
5 scottc.txt
22 seats.ksh
1 steveman.txt
4 tane.txt
13 test_input_so.txt
24 ya801.txt
200 total
$
I need some help with sed. I am trying to delete the 3 lines after a pattern, for every occurrence of the pattern in a file. I use:
sed '/pattern/,+3d' file
This deletes the pattern and the 3 lines after it for the first occurrence, but for the second occurrence it deletes only the pattern and not the lines after it, which is really confusing. Can anyone please help with what I am doing wrong?
The range /pattern/,+3 does not restart while it is active, so a second match that falls inside the 3 lines being deleted is consumed by the current range instead of starting a new one. I think awk is better suited for the task. For example,
$ cat file
1
2
4
a0
1
a1
1
2
3
4
5
Run
awk '
flag { i ++ }
i == 4 { flag = 0 }
!flag
/a/ { flag = 1; i = 0 }
' file
Output
1
2
4
a0
4
5
This might work for you (GNU sed):
sed -n '/regexp/{p;:a;n;//ba;n;//ba;n;//ba;d};p' file
When the regexp is encountered, print the current line and then delete the following 3 lines. If the regexp occurs again at any time while reading those 3 lines, the count is reset.
If the regexp is also to be deleted, use:
sed -n '/regexp/{:a;n;//ba;n;//ba;n;//ba;d};p' file
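For example, with the sample file from the awk answer above and regexp a, the first command gives:
$ sed -n '/a/{p;:a;n;//ba;n;//ba;n;//ba;d};p' file
1
2
4
a0
4
5
a0 is kept and the 3 lines after it are deleted; a1 falls inside that window, so it is itself deleted but still restarts the count, suppressing the 3 lines that follow it.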
I am trying to grep a series of patterns that are, one per line, in a text file (list.txt), to find how many matches there are of each pattern in the target file (file1). The target file looks like this:
$ cat file1
2346 TGCA
2346 TGCA
7721 GTAC
7721 GTAC
7721 CTAC
And I need counts of each numerical pattern (2346 and 7721).
I have this script that works if you provide a list of the patterns in quotes:
$ for p in '7721' '2346'; do printf '%s = ' "$p"; grep -c "$p" file1; done
7721 = 3
2346 = 2
What I would like to do is search for all patterns in list.txt:
$ cat list.txt
7721
2346
6555
25425
22
125
....
19222
How can I convert my script above to look in the list.txt and search for each pattern, and return the same output as above (pattern = count) e.g.:
2346 = 2
7721 = 3
....
19222 = 6
Try this awk one-liner:
awk 'NR==FNR{p[$0];next}$1 in p{p[$1]++}END{for(x in p)print x" = "p[x]+0}' list.txt file1
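If you prefer to keep the printf formatting from your own script, a direct adaptation simply reads the patterns from list.txt line by line (a sketch; grep -cw would avoid counting substring matches, e.g. the pattern 22 also matching inside a longer number):
while IFS= read -r p; do printf '%s = ' "$p"; grep -c "$p" file1; done < list.txt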
I want to output all lines of a file to the terminal, skipping the last 4.
As per the UNIX man page, the following could be a solution.
head -n -4 main.m
Man page:
-n, --lines=[-]N
    print the first N lines instead of the first 10; with the leading '-', print all but the last N lines of each file
I read the man page here: http://unixhelp.ed.ac.uk/CGI/man-cgi?head
But on macOS I get the following error.
head: illegal line count -- -4
What else can be done to achieve this goal?
The GNU version of head supports negative line counts:
brew install coreutils
ghead -n -4 main.m
Use awk, for example:
$ cat file
line 1
line 2
line 3
line 4
line 5
line 6
$ awk 'n>=4 { print a[n%4] } { a[n%4]=$0; n=n+1 }' file
line 1
line 2
$
Here the array a acts as a ring buffer holding the last 4 lines, so each line is printed only once 4 newer lines have been read, and the final 4 are never printed. It can be simplified to awk 'n>=4 { print a[n%4] } { a[n++%4]=$0 }' but I'm not sure if all awk implementations support it.
A Python one-liner:
$ cat foo
line 1
line 2
line 3
line 4
line 5
line 6
$ python -c "import sys; sys.stdout.writelines(sys.stdin.readlines()[:-4])" < foo
line 1
line 2