How to sort files based on filename length with subdirectories? - bash

I am looking at a directory named Forever, which has the sub-directories Pure and Mineral, both filled with .csv files. I was able to list all the .csv files in the directory, but I am having a hard time sorting them by the length of the filename.
My current directory is Forever, so I am looking at both sub-directories, Pure and Mineral.
What I did was:
find -name "*.csv" | tr ' ' '_' | sort -n -r
This just sorts the files in reverse alphabetical order, which doesn't consider the length. (I had to substitute underscores for the spaces in some filenames, as the spaces were causing problems.)

I think this answer is more helpful than the marked duplicate because it also accounts for sub-dirs (which the dupe didn't):
find . -name '*.csv' -exec bash -c 'echo -e $(wc -m <<< $(basename {}))\\t{}' \; | sort -nr | cut -f2
FWIW using fd -e csv -x ... was quite a bit faster for me (0.153s vs find's 2.084s)
even though basename removes the file ext, it doesn't matter since find ensures that all of them have it
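For comparison, here is a sketch of the same idea (prefix each path with the length of its basename, sort numerically, strip the prefix again) that avoids spawning a shell per file; it assumes GNU find and filenames free of tabs and newlines:
find . -name '*.csv' -printf '%f\t%p\n' |
awk -F'\t' '{ printf "%d\t%s\n", length($1), $2 }' |
sort -rn |
cut -f2-
%f is the basename and %p the full path, so length($1) measures only the filename and the directory part never influences the order.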

Related

How to grep files in date order

I can list the Python files in a directory from most recently updated to least recently updated with
ls -lt *.py
But how can I grep those files in that order?
I understand one should never try to parse the output of ls as that is a very dangerous thing to do.
You may use this pipeline, built on GNU utilities, to achieve this:
find . -maxdepth 1 -name '*.py' -printf '%T@:%p\0' |
sort -z -t : -rnk1 |
cut -z -d : -f2- |
xargs -0 grep 'pattern'
This will handle filenames with special characters such as space, newline, glob etc.
find finds all *.py files in the current directory and prints the modification time (epoch value) + : + filename + a NUL byte
sort performs a reverse numeric sort on the first field, which is the timestamp
cut removes the first field (the timestamp) from the output
xargs -0 grep searches for the pattern in each file
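For example, a hypothetical concrete run (the pattern def main is purely illustrative, and -H merely forces grep to print the filename even if only one file ends up being passed):
find . -maxdepth 1 -name '*.py' -printf '%T@:%p\0' |
sort -z -t : -rnk1 |
cut -z -d : -f2- |
xargs -0 grep -H 'def main'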
There is a very simple way if you want to get the list of files that hold the pattern in chronological order:
grep -sil <searchpattern> <files-to-grep> | xargs ls -ltr
i.e. you grep e.g. "hello world" in *.txt; with -sil you make the grep case insensitive (-i), suppress error messages (-s) and just list the matching files (-l); this list you then pass on to ls (via | xargs), which prints details (-l) and sorts by modification time (-t) in reverse order (-r), oldest match first.
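With the example mentioned in the answer that becomes:
grep -sil "hello world" *.txt | xargs ls -ltr
Unlike the NUL-terminated pipeline above, this relies on xargs splitting its input on whitespace, so it will misbehave on filenames containing spaces or newlines.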

Bash how pipe sort, find and grep

I'm trying to write a shell script that takes the line of a file containing a specific number. The problem is that I need the files sorted, because I need the line from the last file with a certain name.
I have written this code, but it doesn't seem to work:
sort -n | find -name '*foo*' -exec grep -r -F '11111' {} \;
It is really important that the files are sorted, because I need to search in the last file. The file names are of the form "res_yyyy_mm_dd_foo", and they have to be ordered by yyyy and, if those are equal, by mm, and so on.
Sounds like the following would do it:
cd home/input_output.1/inp.1/old_res23403/
ls -1 | sort -r | xargs cat -- | grep '11111' | head -n1
ls -1 produces a list of filenames in the current directory, one per line.
sort -r sorts them in reverse alphabetical order, which (given that your names use a big-endian date format) puts the latest files first.
xargs cat -- concatenates the contents of all those files.
grep '11111' finds all lines containing 11111.
head -n1 limits results to the first such line.
In effect this gives you the first matching line of the files in reverse order, i.e. the last such line.
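If you want to keep find (and its *foo* filter) in the pipeline, a rough sketch of the same idea, assuming GNU find/sort/xargs and that all the res_* files sit in a single directory, could be:
find . -maxdepth 1 -name '*foo*' -print0 |
sort -zr |
xargs -0 cat -- |
grep -F '11111' |
head -n1
-print0, sort -z and xargs -0 keep filenames with spaces intact; drop -maxdepth 1 to descend into sub-directories, bearing in mind that the reverse sort then applies to the whole path rather than just the file name.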

Recursively find and open files

I want to search through all subdirectories and files to find files with a specific extension. When I find a file with the extension, I need to open it, find a specific string from within the file and store it within a txt file.
This is what I have so far for finding all of the correct files:
find . -name ".ext" ! -path './ExcludeThis*'
This is what I have for opening the file and getting the part of the file I want and storing it:
LINE=$(head .ext | grep search_string)
SUBSTR=$(echo $LINE | cut -f2 -d '"')
echo $SUBSTR >> results.txt
I am struggling with how to combine the two. I have looked at 'for f in **/*' and then running an if statement inside the loop to see if the name matches .ext, removing the need for find altogether, but **/* seems to work on directories only and not files.
A break down of any solutions would be very much appreciated too, I am new to shell scripting. Thanks.
find -name "*.ext" \! -path './ExcludeThis*' -exec head -q '{}' \+ |
grep search_string | cut -f2 -d'"' >> results.txt
find explanation
find -name "*.ext" \! -path './ExcludeThis*' -exec head -q '{}' \+
For each file name matched, executes head (with \+, the command line is built by appending each selected file name at the end, so the total number of invocations of the command will be much less than the number of matched files).
Notice I replaced .ext with *.ext (the former just matches a file named exactly .ext), and ! with \! (to protect it from interpretation by the shell).
The head option -q is necessary because that command prints headers when used with multiple files (due to \+ in this case).
In addition, if no path is given, the default (.) is used, i.e. find . -name is equivalent to find -name.
pipeline explanation
<find ... -exec head> | grep search_string | cut -f2 -d'"' >> results.txt
While head writes the first lines (10 by default) of every file into the pipe, grep reads them.
If grep matches search_string in some of them, it writes those lines into the next pipe.
At the same time, cut takes the second field (delimited by ") of every such line and appends it to results.txt.

Getting last element of a path (different from #10124314 as basename falls over)

I need to process a couple of thousand PDF files, sorted alphabetically on their filename, ideally from bash. So from my simple perspective I need to walk a tree of files, stripping off the path as I go, and then do various grepping, sorting, etc.
Having seen an answer to a similar question I've tried doing a
tim@MERLIN:~/Documents/Scanned$ basename `find ./ -print`
but that gets messed up by some directory names which have spaces in them - e.g. there is one called General Letters which acts like a chicken-bone in the works and results in
basename: extra operand ‘Letters’
Try 'basename --help' for more information.
I can't see a way to get find to strip out the pathname and I would prefer to use find given its plethora of options to filter on age, size etc. Nor can I see any way to get basename to cope gracefully with spaces in this context.
I considered using cut, but I can't work out how to get cut to give me the last field by doing something like cut -d/ <whatever>. I'm sure there must be an easy way to do it: some sort of in-line sed or awk script?
I don't particularly want the buggeration of writing a perl/Python script to do it for me as I know I should be able to do it from the command line.
So any simple tips or suggestions?
Updated/Solved
Many thanks to Cyrus; the solution is:
tim@MERLIN:~/Documents/Scanned$ find . -name '*.pdf' -printf '%f\n' | sort
Try this:
find ./ -printf '%f\n'
%f: File's name with any leading directories removed (only the last element).
Here is a working solution using awk:
find ./ | awk -F'/' '{ print $NF }';
It simply uses / as delimiter and prints the last value of the line.
Or with grep:
find ./ | grep -oE "[^/]+$"
Through sed,
find ./ | sed 's/.*\/\(.*\)$/\1/g'
If you want to get a list of pathnames (recursively) but want to sort them by filename (not by pathname), you can use:
find . -printf '%f|%p\n' | sort -k 1 -t'|' | cut -d'|' -f2-
You need GNU find for this (standard on Linux, but not the default on OS X).
Without the GNU find, you can do the above with:
find . -print | sed 's:\(.*\)/\(.*\)$:\2\|\1/\2:' | sort -k 1 -t'|' | cut -d'|' -f2-
(Assuming there is no \n in the filenames)
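Applied to the original PDF problem, the same trick gives the full paths sorted purely by filename (assuming neither | nor newlines occur in the names):
find . -name '*.pdf' -printf '%f|%p\n' | sort -t'|' -k1,1 | cut -d'|' -f2-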

Force sort command to ignore folder names

I ran the following from a base folder ./
find . -name *.xvi.txt | sort
Which returns the following sort order:
./LT/filename.2004167.xvi.txt
./LT/filename.2004247.xvi.txt
./pred/2004186/filename.2004186.xvi.txt
./pred/2004202/filename.2004202.xvi.txt
./pred/2004222/filename.2004222.xvi.txt
As you can see, the filenames follow a regular structure, but the files themselves might be located in different parts of the directory structure. Is there a way of ignoring the folder names and/or directory structure so that the sort returns a list of folders/filenames based ONLY on the file names themselves? Like so:
./LT/filename.2004167.xvi.txt
./pred/2004186/filename.2004186.xvi.txt
./pred/2004202/filename.2004202.xvi.txt
./pred/2004222/filename.2004222.xvi.txt
./LT/filename.2004247.xvi.txt
I've tried a few different switches under the find and sort commands, but no luck. I could always copy everything out to a single folder and sort from there, but there are several hundred files, and I'm hoping that a more elegant option exists.
Thanks! Your help is appreciated.
If your find has -printf you can print both the base filename and the full filename. Sort by the first field, then strip it off.
find . -name '*.xvi.txt' -printf '%f %p\n' | sort -k1,1 | cut -f 2- -d ' '
I have chosen a space as a delimiter. If your filenames include spaces, you should choose another delimiter which is a character that's not in your filenames. If any filenames include newlines, you'll have to modify this because it won't work.
Note that the glob in the find command should be quoted.
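If newlines in filenames are a genuine concern, a NUL-terminated sketch of the same approach (assuming GNU find, sort and cut, and that | never appears in the names) would be:
find . -name '*.xvi.txt' -printf '%f|%p\0' |
sort -z -t'|' -k1,1 |
cut -z -d'|' -f2- |
tr '\0' '\n'
The trailing tr is only there to make the output readable; to hand the list to another command, drop it and use xargs -0 instead.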
If your find doesn't have printf, you could use awk to accomplish the same thing:
find . -name '*.xvi.txt' | awk -F / '{ print $NF, $0 }' | sort | sed 's/.* //'
The same caveats about spaces that Dennis Williamson mentioned apply here. And for variety, I'm using sed to strip off the sort field, instead of cut.
find . -name '*.xvi.txt' | sort -t'.' -k3 -n
will sort it as you want. The only problem is if a filename or directory name includes additional dots.
To avoid that you can use:
find . -name '*.xvi.txt' | sed 's/[0-9]\+\.xvi\.txt$/\\&/' | sort -t'\' -k2 | sed 's/\\//'
