Recursively find and open files - bash

I want to search through all subdirectories and files to find files with a specific extension. When I find a file with the extension, I need to open it, find a specific string within the file, and store that string in a txt file.
This is what I have so far for finding all of the correct files:
find . -name ".ext" ! -path './ExcludeThis*'
This is what I have for opening the file and getting the part of the file I want and storing it:
LINE=$(head .ext | grep search_string)
SUBSTR=$(echo $LINE | cut -f2 -d '"')
echo $SUBSTR >> results.txt
I am struggling with how to combine the two. I have looked at 'for f in **/*' with an if statement inside to check whether each name matches .ext, removing the need for find altogether, but **/* seems to match directories only and not files.
A break down of any solutions would be very much appreciated too, I am new to shell scripting. Thanks.

find -name "*.ext" \! -path './ExcludeThis*' -exec head -q '{}' \+ |
grep search_string | cut -f2 -d'"' >> results.txt
find explanation
find -name "*.ext" \! -path './ExcludeThis*' -exec head -q '{}' \+
For each matched file name, find executes head (with \+, the command line is built by appending each selected file name at the end, so the total number of invocations of the command is far smaller than the number of matched files).
Notice I replaced .ext with *.ext (the first pattern matches only a file named exactly .ext), and ! with \! (protecting it from interpretation by the shell).
The head option -q is necessary because that command prints headers when given multiple files (which happens here because of \+).
In addition, if no path is given, find defaults to the current directory (.), i.e. find . -name is the same as find -name.
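To see what \+ buys you, compare it with \; — a minimal sketch where echo stands in for head just to make the batching visible:
find . -name '*.ext' -exec echo {} \;   # one echo invocation per file: each name on its own line
find . -name '*.ext' -exec echo {} +    # one echo invocation per batch: many names on one line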
pipeline explanation
<find ... -exec head> | grep search_string | cut -f2 -d'"' >> results.txt
While head writes the lines (10 by default) of every file into the pipe, grep reads them.
If grep matches search_string in some of them, it writes those lines into the next pipe.
At the same time, cut takes the second field (delimited by ") of every matching line and appends it to results.txt.
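If you'd rather keep the per-file logic from the question, a loop over find's output works too; a sketch assuming the same exclusion and search_string (the NUL-separated read keeps odd file names intact):
find . -name '*.ext' ! -path './ExcludeThis*' -print0 |
while IFS= read -r -d '' f; do
    # same steps as the question, per file, with quoting fixed
    head "$f" | grep search_string | cut -f2 -d'"' >> results.txt
done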

Related

How to sort files based on filename length with subdirectories?

I am trying to look at a directory named Forever, which has sub-directories Pure and Mineral that are filled with .csv files. I can see all the .csv files in the directory, but I am having a hard time sorting them according to the length of the filename.
My current directory is Forever, so I am looking at both sub-directories Pure and Mineral.
What I did was:
find -name ".*csv" | tr ' ' '_' | sort -n -r
This just sorts the files reverse-alphabetically, which doesn't consider the length. (I had to substitute underscores for the spaces, as some of the file names had spaces in them.)
I think this answer is more helpful than the marked duplicate because it also accounts for sub-dirs (which the dupe didn't):
find . -name '*.csv' -exec bash -c 'echo -e $(wc -m <<< $(basename {}))\\t{}' \; | sort -nr | cut -f2
FWIW using fd -e csv -x ... was quite a bit faster for me (0.153s vs find's 2.084s)
basename here only strips the directory part (it would remove the extension only if given the suffix as a second argument), but that doesn't affect the ordering since find ensures all the names share the same .csv suffix
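For comparison, a sketch that avoids spawning a shell per file by letting GNU find print the basename itself (assumes no tabs or newlines in the file names):
find . -name '*.csv' -printf '%f\t%p\n' |    # basename, tab, full path
awk -F'\t' '{ print length($1) "\t" $2 }' |  # prepend the name length
sort -nr | cut -f2-                          # longest first, drop the length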

How to grep files in date order

I can list the Python files in a directory from most recently updated to least recently updated with
ls -lt *.py
But how can I grep those files in that order?
I understand one should never try to parse the output of ls as that is a very dangerous thing to do.
You may use this pipeline to achieve this with GNU utilities:
find . -maxdepth 1 -name '*.py' -printf '%T@:%p\0' |
sort -z -t : -rnk1 |
cut -z -d : -f2- |
xargs -0 grep 'pattern'
This will handle filenames with special characters such as space, newline, glob etc.
find finds all *.py files in the current directory and prints modification time (epoch value) + : + filename + a NUL byte
sort performs a reverse numeric sort on the first column, the timestamp
cut removes the first column (the timestamp) from the output
xargs -0 grep searches for the pattern in each file
There is a very simple way if you want the list of files that hold the pattern in chronological order:
grep -sil <searchpattern> <files-to-grep> | xargs ls -ltr
i.e. you grep e.g. "hello world" in *.txt; with -sil you make the grep case-insensitive (-i), suppress error messages (-s) and just list the matching files (-l); this you then pass on to ls (| xargs), in long format showing the date (-l), sorted by date (-t), reversed so the oldest comes first (-r).
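If the matching file names can contain spaces, a hedged GNU variant of the same idea pairs grep's -Z (print a NUL instead of a newline after each -l name) with xargs -0:
grep -sliZ 'searchpattern' *.txt | xargs -0 ls -ltr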

Grep part of a file name and output to a .txt

I'm trying to grep a pattern (the first 8 characters) out of all the file names in a directory and output it to a .txt using this, but it's not working.
Why is that?
find . -type f -print | grep "^........" > test.txt
it still outputs the whole file name to the .txt
No need to use grep at all; you can use the cut command to take the first N characters without any pattern matching:
find . -type f -print | cut -c1-8 > test.txt
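Note that find prints each name as ./name, so the ./ prefix counts toward the 8 characters; with GNU find you can print the bare file name instead (a sketch):
find . -type f -printf '%f\n' | cut -c1-8 > test.txt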
You're passing the output of find to grep as text to search, so grep prints every whole line (file name) containing a match, rather than passing the names as a list of files for grep to search inside. If you meant to search the file contents, you can fix it with xargs like this:
find . -type f -print | xargs grep "^........" > test.txt

Grep to Print all file content [duplicate]

This question already has answers here:
Colorized grep -- viewing the entire file with highlighted matches
How can I make grep print the full file when any line matches the pattern, instead of printing just the matching line?
I tried using (say) grep -C2 to print two lines above and two below, but this doesn't always work, as the number of lines is not fixed.
I am not just searching a single file; I am searching an entire directory where some files may contain the given pattern, and I want those files printed completely.
I am also running a second grep on the result of the first, without the first grep's output being printed.
Simple grep + cat combination:
grep 'pattern' file && cat file
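A related trick from the linked duplicate: also match the empty string at the end of every line, so every line is printed and the real matches are highlighted:
grep --color -E 'pattern|$' file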
Use grep's -l option to list the paths of files with matching contents, then print the contents of these files using cat.
grep -lR 'regex' 'directory' | xargs -d '\n' cat
The command from above cannot handle filenames with newlines in them.
To overcome the filename with newlines issue and also allow more sophisticated checks you can use the find command.
The following command prints the content of all regular files in the directory.
find 'directory' -type f -exec cat {} +
To print only the content of files whose content matches the regexes regex1 and regex2, use
find 'directory' -type f \
-exec grep -q 'regex1' {} \; -and \
-exec grep -q 'regex2' {} \; \
-exec cat {} +
The line breaks are only for readability; without the \ you can write everything on one line.
Note the -q for grep. That option suppresses grep's output; its exit status alone tells find whether the file matched.
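A hypothetical extension of the same chain, using GNU find's -printf to label each matching file before its content:
find 'directory' -type f \
    -exec grep -q 'regex1' {} \; \
    -exec grep -q 'regex2' {} \; \
    -printf '==> %p <==\n' -exec cat {} \;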

Shell: printing just the part after the . (dot)

I need to find just the extensions of all the files in a directory (if two files share an extension, it should appear only once). I already have it, but the output of my script looks like:
test.txt
test2.txt
hello.iso
bay.fds
hellllu.pdf
I'm using grep -e '.' and it just highlights the dots.
I need just these extensions, in one variable, like txt,iso,fds,pdf.
Is there anyone who could help? I already had it working once, but with an array. Today I found out it has to work on dash too.
You can use find with awk to get all unique extensions:
find . -type f -name '?*.?*' -print0 |
awk -F. -v RS='\0' '!seen[$NF]++{print $NF}'
It can be done with find as well, but I think this is easier:
for f in *.*; do echo "${f##*.}"; done | sort -u
If you want to assign a comma-separated list of the unique extensions to a variable, you can do this:
ext=$(for f in *.*; do echo "${f##*.}"; done | sort -u | paste -sd,)
echo $ext
csv,pdf,txt
Alternatively, with ls:
ls -1 *.* | rev | cut -d. -f1 | rev | sort -u | paste -sd,
rev/rev is required if you have more than one dot in the filename, assuming the extension is after the last dot. For any other directory simply change the part *.* to dirpath/*.* in all scripts.
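The rev/rev pair can also be replaced by taking awk's last dot-separated field, a sketch of the same idea:
ls -1 *.* | awk -F. '{ print $NF }' | sort -u | paste -sd,   # $NF is the text after the last dot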
I'm not sure I understand your comment. If you don't assign the output to a variable, by default it is printed to standard output. If you want to pass the directory name to a script, put the code into a script file and replace dirpath with $1, assuming that will be the first argument to the script:
#!/bin/bash
# print unique extension in the directory passed as an argument, i.e.
ls -1 "$1"/*.* ...
If you have sub-directories, the scripts above include their matching names as well; to limit the result to regular files only, replace the ls with
find . -maxdepth 1 -type f -name "*.*" | ...
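Since the question says it has to run under dash, here is a POSIX sh sketch of the same idea (the -f test is an assumption added here, to skip directories whose names contain a dot):
#!/bin/sh
# collect the unique extensions in the directory given as $1, comma-separated
ext=$(for f in "$1"/*.*; do
    [ -f "$f" ] || continue        # skip anything that is not a regular file
    printf '%s\n' "${f##*.}"       # everything after the last dot
done | sort -u | paste -sd,)
echo "$ext"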
