Terminal find, directories last instead of first - bash

I have a makefile that concatenates JavaScript files together and then runs the file through uglify-js to create a .min.js version.
I'm currently using this command to find and concat my files
find src/js -type f -name "*.js" -exec cat {} >> ${jsbuild}$# \;
But it lists files in subdirectories first. That makes heaps of sense, but I'd like it to list the .js files directly in src/js before the ones in the subdirectories, to avoid getting my undefined JS error.
Is there any way to do this? I've had a google around and seen the sort command and the -s flag for find, but it's a bit above my understanding at this point!
[EDIT]
The final solution is slightly different from the accepted answer, but it is marked as accepted because it brought me to the answer. Here is the command I used:
cat `find src/js -type f -name "*.js" -print0 | xargs -0 stat -f "%z %N" | sort -n | sed -e "s|[0-9]*\ \ ||"` > public/js/myCleverScript.js
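For reference, stat -f "%z %N" is the BSD/macOS form (it prints the size, then the name). On Linux, a rough GNU coreutils equivalent of the same pipeline (a sketch, with the same caveats about whitespace in names) would be:
cat $(find src/js -type f -name "*.js" -print0 | xargs -0 stat -c "%s %n" | sort -n | sed "s|^[0-9]* ||") > public/js/myCleverScript.js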

Possible solution:
use find to get the filenames and directory depth, i.e. find ... -printf "%d\t%p\n"
sort the list by directory depth with sort -n
remove the directory depth from the output to use the filenames only
test:
without sorting:
$ find folder1/ -depth -type f -printf "%d\t%p\n"
2 folder1/f2/f3
1 folder1/file0
with sorting:
$ find folder1/ -type f -printf "%d\t%p\n" | sort -n | sed -e "s|[0-9]*\t||"
folder1/file0
folder1/f2/f3
the command you need looks like
cat $(find src/js -type f -name "*.js" -printf "%d\t%p\n" | sort -n | sed -e "s|[0-9]*\t||") > min.js
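If the file names may contain spaces, a NUL-separated variant of the same pipeline should be safer (a sketch assuming GNU find plus reasonably new GNU sort/cut for their -z options):
find src/js -type f -name "*.js" -printf "%d\t%p\0" | sort -z -n | cut -z -f2- | xargs -0 cat > min.js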

Mmmmm...
find src/js -type f
shouldn't find ANY directories at all, and doubly so as your directory names will probably not end in ".js". The quotes around your "-name" parameter are superfluous too; try removing them:
find src/js -type f -name "*.js" -exec cat {} >> ${jsbuild}$# \;

find could get the first directory level already expanded on the command line, which enforces the order of directory-tree traversal. This solves the problem just for the top directory (unlike the already accepted solution by Sergey Fedorov), but it should answer your question too, and more options are always welcome.
Using GNU coreutils ls, you can sort directories before regular files with the --group-directories-first option. From reading the Mac OS X ls manpage, it seems that directories are always grouped on OS X, so there you would just drop the option.
ls -A --group-directories-first -r | tac | xargs -I'%' find '%' -type f -name '*.js' -exec cat '{}' + > ${jsbuild}$#
If you do not have the tac command, you can easily implement it using sed; it reverses the order of lines. See info sed tac in the GNU sed documentation.
tac(){
    sed -n '1!G;$p;h'
}
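For example:
$ printf '%s\n' one two three | tac
three
two
one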

You could do something like this...
First create a variable holding the name of our output file:
OUT="$(pwd)/theLot.js"
Then, get all "*.js" in top directory into that file:
cat *.js > "$OUT"
Then have "find" grab all other "*.js" files below current directory:
find . -type d ! -name . -exec sh -c "cd {} ; cat *.js >> $OUT" \;
Just to explain the "find" command, it says:
find
. = starting at current directory
-type d = all directories, not files
! -name . = except the current one
-exec sh -c = and for each one you find execute the following
"..." = go to that directory and concatenate all "*.js" files there onto end of $OUT
\; = and that's all for today, thank you!
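A slightly hardened sketch of the same idea: it exports OUT so the inner shell can read it, passes each directory as an argument instead of embedding {} in the command string, and uses -mindepth 1 in place of ! -name . (directories with no *.js files still produce a harmless cat error, as in the original):
OUT="$(pwd)/theLot.js"; export OUT
cat ./*.js > "$OUT"
find . -mindepth 1 -type d -exec sh -c 'cd "$1" && cat ./*.js >> "$OUT"' sh {} \;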

I'd get the list of all the files:
$ find src/js -type f -name "*.js" > list.txt
Sort them by depth, i.e. by the number of '/' in them, using the following ruby script:
sort.rb:
files=[]; while gets; files<<$_; end
files.sort! {|a,b| a.count('/') <=> b.count('/')}
files.each {|f| puts f}
Like so:
$ ruby sort.rb < list.txt > sorted.txt
Concatenate them:
$ cat sorted.txt | while IFS= read -r FILE; do cat "$FILE" >> output.txt; done
(All this assumes that your file names don't contain newline characters.)
EDIT:
I was aiming for clarity. If you want conciseness, you can absolutely condense it to something like:
find src/js -name '*.js'| ruby -ne 'BEGIN{f=[];}; f<<$_; END{f.sort!{|a,b| a.count("/") <=> b.count("/")}; f.each{|e| puts e}}' | xargs cat >> concatenated
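If ruby isn't at hand, awk can do the same depth sort, with the same whitespace caveat (gsub("/","/") just counts the slashes in each path):
find src/js -type f -name '*.js' | awk '{ print gsub("/","/") "\t" $0 }' | sort -n | cut -f2- | xargs cat >> concatenated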

Related

Show directory path with only files present in them

This is the folder structure that I have. Running find . -type d in the root folder gives me the following result
Result
./folder1
./folder1/folder2
./folder1/folder2/folder3
However, I want the result to be only ./folder1/folder2/folder3, i.e. only print a directory if there's a .txt file present inside it.
Can someone help with this scenario? Hope it makes sense.
find . -type f -name '*.txt' |
sed 's=/[^/]*\.txt$==' |
sort -u
Find all .txt files, remove file names with sed to get the parent directories only, then sort -u to remove duplicates.
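Given the structure above, where only folder3 contains a .txt file, this prints just:
./folder1/folder2/folder3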
This won't work on file names/paths that contain a newline.
You may use this find command, which finds all the *.txt files and then prints their unique parent directory names:
find . -type f -name '*.txt' -exec bash -c '
  for f; do
    f="${f#.}"
    printf "%s\0" "$PWD${f%/*}"
  done
' _ {} + | awk -v RS='\0' '!seen[$0]++'
We are using printf "%s\0" to cope with directory names containing newlines, spaces and glob characters.
gnu-awk (needed for RS='\0') is then used to print only the unique directory names.
Using an associative array and process substitution:
#!/usr/bin/env bash

declare -A uniq_path
while IFS= read -rd '' files; do
  path_name=${files%/*}
  if ((!uniq_path["$path_name"]++)); then
    printf '%s\n' "$path_name"
  fi
done < <(find . -type f -name '*.txt' -print0)
Check the value of uniq_path with:
declare -p uniq_path
Maybe this POSIX one?
find root -type f -name '*.txt' -exec dirname {} \; | awk '!seen[$0]++'
* adds a trailing \n after each directory path
* breaks when a directory in a path has a \n in its name
Or this BSD/GNU one?
find root -type f -name '*.txt' -exec dirname {} \; -exec printf '\0' \; | sort -z -u
* adds a trailing \n\0 after each directory path

Bash script to return all elements given an extension, without using print flags

I want to create a shell script that searches inside all folders of the current directory and returns all files that satisfy some condition, but without using any print flags.
(Here the condition is ending with .py.)
What I have done:
find . -name '*.py'| sed -n 's/\.py$//p'
The output:
./123
./test
./abc/dfe/test3
./testing
./test2
What I would like to achieve:
123
test
test3
testing
test2
Use -exec:
find . -name '*.py' -exec sh -c 'for f; do f=${f%.py}; echo "${f##*/}"; done' sh {} +
If GNU basename is an option, you can simplify this to
find . -name '*.py' -exec basename -s .py {} +
POSIX basename is a little more expensive, as you'll have to call it on every file individually:
find . -name '*.py' -exec basename {} .py \;
Using GNU grep instead of sed:
find . -name '*.py' | grep -oP '[^/]+(?=\.py$)'
If portability is not a concern, this is a very readable option:
find . -name '*.py' | xargs basename -a
This is also differentiated from chepner's answer in that it retains the .py file ending in the output.
I'm not familiar with the -exec flag, and I'm sure his one-liners can be customized to do the same, but I couldn't do so off the top of my head.
Chepner's version achieves the same with a small modification:
find . -name '*.py' -exec basename {} \;
if you want the literal output from find and didn't intend to drop the file endings when you used dummy names (123, test, etc.) in your question.
find shows entries relative to where you ask it to search, so you can simply replace the . with a *:
find * -name '*.py'| sed -n 's/\.py$//p'
(Be aware that this skips top level hidden directories)
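With GNU find you can instead keep the . starting point and strip it via %P, which also includes hidden directories:
find . -name '*.py' -printf '%P\n' | sed -n 's/\.py$//p'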
This might work for you (GNU parallel; {/.} is its replacement string for the basename without extension):
find . -name '*.py' 2>/dev/null | parallel echo "{/.}"

Sort names of zipped files and write list to file

I tried to list the zipped files in sorted order and write the list to a new file, but it does not work properly in my shell script. Why is my script not working?
ls |grep gz|sort -t '.' -k 2,2n >filename;
I did not find any problem with your commands, but they do not seem like the right way to do this, at least to me. The two approaches below seem better. Try them out.
With only names:
find . -type f -name '*.html' 2>/dev/null -exec basename {} \; | sort > filename.txt
With full paths:
find . -type f -name '*.html' 2>/dev/null | sort > filename.txt
You can also add the -maxdepth 1 flag to search only in the current directory where you are running this, and not recursively within nested dirs (with GNU find it should come before the other tests to avoid a warning):
find . -maxdepth 1 -type f -name '*.html' 2>/dev/null | sort > filename.txt
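And for your original .gz files specifically, a sketch combining this with your numeric sort on the second dot-separated field (GNU find assumed, for -printf):
find . -maxdepth 1 -type f -name '*.gz' -printf '%f\n' | sort -t '.' -k 2,2n > filename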
Hope this helps you :)

How can I list all unique file names without their extensions in bash?

I have a task where I need to move a bunch of files from one directory to another. I need to move all files with the same base name (i.e. blah.pdf, blah.txt, blah.html, etc...) at the same time, and I can move a set of these every four minutes. I had a short bash script to just move a single file at a time at these intervals, but the new name requirement is throwing me off.
My old script is:
find ./ -maxdepth 1 -type f | while read line; do mv "$line" ~/target_dir/; echo "$line"; sleep 240; done
For the new script, I basically just need to replace find ./ -maxdepth 1 -type f
with a list of unique file names without their extensions. I can then just replace do mv "$line" ~/target_dir/; with do mv "$line"* ~/target_dir/;.
So, with all of that said, what's a good way to get a list of unique file names without their extensions in a bash script? I was thinking about using a regex to grab file names and then throwing them in a hash to get uniqueness, but I'm hoping there's an easier/better/quicker way. Ideas?
A one-liner tolerant of weirdly-named files could be:
find . -maxdepth 1 -type f -and -iname 'blah*' -print0 | xargs -0 -I {} mv {} ~/target/dir
If the files can start with multiple prefixes, you can use logic operators in find. For example, to move blah.* and foo.*, use:
find . -maxdepth 1 -type f -and \( -iname 'blah.*' -or -iname 'foo.*' \) -print0 | xargs -0 -I {} mv {} ~/target/dir
EDIT
Updated after comment.
Here's how I'd do it:
find ./ -type f -printf '%f\n' | sed 's/\..*//' | sort | uniq | ( while read filename ; do find . -type f -iname "$filename"'*' -exec mv {} /dest/dir \; ; sleep 240; done )
Perhaps it needs some explanation:
find ./ -type f -printf '%f\n': find all files and print just their name, followed by a newline. If you don't want to look in subdirectories, this can be substituted by a simple ls;
sed 's/\..*//': strip the file extension by removing everything after the first dot. Both foo.tar and foo.tar.gz are transformed into foo;
sort | uniq: sort the filenames just found and remove duplicates;
(: open a subshell:
while read filename: read a line and put it into the $filename variable;
find . -type f -iname "$filename"'*' -exec mv {} /dest/dir \;: find in the current directory (find .) all the files (-type f) whose name starts with the value in filename (-iname "$filename"'*', this works also for files containing whitespaces in their name) and execute the mv command on each one (-exec mv {} /dest/dir \;)
sleep 240: sleep
): end of subshell.
Add -maxdepth 1 as argument to find as you see fit for your requirements.
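A sketch of the same idea restricted to the top directory, closer to the asker's four-minute loop (assumes GNU find for -printf, that every file has an extension, and no newlines in names):
find . -maxdepth 1 -type f -printf '%f\n' | sed 's/\..*//' | sort -u |
while IFS= read -r name; do
    mv ./"$name".* ~/target_dir/
    sleep 240
done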
Nevermind, I'm dumb. There's a uniq command. Duh. New working script is: find ./ -maxdepth 1 -type f | sed -e 's/\.[a-zA-Z]*$//' | sort -u | while read line; do mv "$line"* ~/target_dir/; echo "$line"; sleep 240; done

How do I write a bash alias/function to grep all files in all subdirectories for a string?

I've been using the following command to grep for a string in all the python source files in and below my current directory:
find . -name '*.py' -exec grep -nHr <string> {} \;
I'd like to simplify things so that I can just type something like
findpy <string>
And get the exact same result. Aliases don't seem sufficient since they only do a string expansion, and the argument I need to specify is not the last argument. It sounds like functions are suitable for the task, so I have several questions:
How do I write it?
Where do I put it?
If you don't want to create an entire script for this, you can do it with just a shell function:
findpy() { find . -name '*.py' -exec grep -nHr "$1" {} \; ; }
...but then you may have to define it in both ~/.bashrc and ~/.bash_profile, so it gets defined for both login and interactive shells (see the INVOCATION section of bash's man page).
All the "find ... -exec" solutions above are OK in the sense that they work, but they are horribly inefficient and will be extremely slow for large trees. The reason is that they launch a new process for every single *.py file. Instead, use xargs(1), and run grep only on files (not directories):
#! /bin/sh
find . -name \*.py -type f | xargs grep -nHr "$1"
For example:
$ time sh -c 'find . -name \*.cpp -type f -exec grep foo {} \; >/dev/null'
real 0m3.747s
$ time sh -c 'find . -name \*.cpp -type f | xargs grep foo >/dev/null'
real 0m0.278s
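A middle ground, if your find supports the + terminator: -exec ... {} + batches arguments just like xargs does, and it copes with whitespace in file names:
find . -name '*.py' -type f -exec grep -nH "$1" {} +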
On a side note, you should take a look at ack for what you are doing. It is designed as a replacement for grep, written in Perl, and it can filter files based on the target language or ignore .svn directories and the like.
Example (snippet from Trac source):
$ ack --python foo ./mysource
ticket/tests/wikisyntax.py
139:milestone:foo
144:<a class="missing milestone" href="/milestone/foo" rel="nofollow">milestone:foo</a>
ticket/tests/conversion.py
34: ticket['foo'] = 'This is a custom field'
ticket/query.py
239: count_sql = 'SELECT COUNT(*) FROM (' + sql + ') AS foo'
I wanted something similar, and the answer by Idelic reminded me of one of the nice features of xargs: that it puts the command at the end. You see, my problem was that I wanted to write a shell alias that would "accept parameters" (really, one that would expand in such a way as to let me pass options to grep).
Here's what I added to my .bash_aliases:
alias findpy="find . -type f -name '*.py' | xargs grep"
This way, I could write findpy WORD or findpy -e REGEX or findpy -il WORD - the point being that I could use any grep command-line option.
Put the following three lines in a file named findpy
#!/bin/bash
find . -name '*.py' -exec grep -nHr "$1" {} \;
Then say
chmod u+x findpy
I normally have a directory called bin in my home directory where I put little shell scripts like this. Make sure to add the directory to your PATH.
The script:
#!/bin/bash
find . -name '*.py' -exec grep -nHr "$1" {} ';'
is how I'd do it.
You write it with an editor like vim and put it somewhere on your path. My normal approach is to have a ~/bin directory and make sure my .profile file (or equivalent) contains:
PATH=$PATH:~/bin
Many versions of grep have options to do recursion, specify filename patterns, etc.
grep --perl-regexp --recursive --include='*.py' --regexp="$1" .
This recurses starting from the current directory (.), looks only at files ending in 'py', uses Perl-style regular expressions.
If your version of grep doesn't support --recursive and --include, then you can still use find and xargs, but be sure to allow for pathnames with embedded spaces by using the -print0 argument to find and the --null option to xargs to handle that.
find . -type f -name '*.py' -print0 | xargs --null grep "$1"
should work.
Add the following line to your ~/.bashrc or ~/.bash_profile or ~/.profile
alias findpy='find . -type f -name "*.py" -print0 | xargs -0 grep'
then you can use it like this
findpy def
or with grep options
findpy -i class
the following alias will ignore the version-control meta-directories of git and svn
alias findpy='find . -type f -not -path "*/.git/*" -a -not -path "*/.svn/*" -name "*.py" -print0 | xargs -0 grep'
#######################################################################################
#
# Function to search all files (including sub-directories) that match a given file
# extension ($2) looking for an indicated string ($1) - in a case insensitive manner.
#
# For Example:
#
# -> findfile AllowNegativePayments cpp
#
#
#######################################################################################
findfile ()
{
    find . -iname "*.$2*" -type f -print0 | xargs -0 grep -i "$1" 2> /dev/null
}
alias _ff='findfile'
