Output of 'cat' to find files with partial filenames - bash

Say I have file1.txt with
ptext1
ptext2
ptext3
ptext4
These are partial file names (library names) which I'm trying to find in a directory. Something like
cat file1.txt | xargs find . -name "*$0*"
or say,
cat file.txt | awk '{system("find . -name " *$0*)}'
Neither of them is working.
Please suggest.

I'm sure there is a more elegant way, but you could always loop over and run find on each:
Update to reflect suggestions in comments
while read -r filename; do
find . -type f -name "*$filename*"
done < file1.txt

One way with xargs
xargs -I{} find . -name "*{}*" < file1.txt
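For illustration, here is a minimal end-to-end run of the loop version. The ./libs directory and library file names below are made up purely for the example:
# hypothetical test layout
mkdir -p libs && touch libs/libptext1.so libs/libptext3.a
# run the loop from the answer above
while read -r filename; do
    find . -type f -name "*$filename*"
done < file1.txt
# prints (ptext2 and ptext4 match nothing, so they produce no output):
#   ./libs/libptext1.so
#   ./libs/libptext3.a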

Related

Sort names of zipped files and write list to file

I tried to list the zipped files in sorted order and write the list to a new file, but it does not work properly in my shell script. Why is my script not working?
ls |grep gz|sort -t '.' -k 2,2n >filename;
I did not find any problem with your commands, but they do not seem like the right way to do this, at least to me. The two approaches I'm pasting below are better, I think. Try them out.
With only names:
find . -type f -name '*.gz' 2>/dev/null -exec basename {} \; | sort > filename.txt
With full paths:
find . -type f -name '*.gz' 2>/dev/null | sort > filename.txt
You can also add the "-maxdepth 1" flag to search only in the current directory where you are running this, rather than recursing into nested dirs:
find . -maxdepth 1 -type f -name '*.gz' 2>/dev/null | sort > filename.txt
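As a side note, if you have GNU find, the names-only variant can skip spawning basename once per file by using -printf (a sketch, assuming GNU find):
find . -type f -name '*.gz' -printf '%f\n' 2>/dev/null | sort > filename.txt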
Hope this helps you :)

How do I write a bash shell script to go through a series of files and pull a column of data out? [duplicate]

This question already has answers here:
How to extract one column of a csv file
(18 answers)
Closed 8 years ago.
I have a folder of about 10 thousand files and I need to write a bash shell script that will pull a COLUMN of data out and put it in a file. Help??? Please and thank you!
EDIT To Include:
#!/bin/bash
cd /Users/Larry/Desktop/TestFolder
find . -maxdepth 1 -mindepth 1 -type d
sed '4q;d'
A separate attempt
for dir in /Users/Larry/Desktop/TestFolder
do
dir=${dir%*/}
sed -n '4q;d' > Success.txt
done
The files are comma-separated value files that open in a spreadsheet program like Numbers or Excel. I want to extract a single column from each file, but there are at least 10 thousand files in each folder, so I get an "Argument list too long" error.
Another attempt
find /Users/Larry/Desktop/modified -type f -maxdepth 1 -name '*.csv' -print0 | xargs -0 awk -F '"*,"*' '{print $2}'
find /Users/Larry/Desktop/modified -type f -maxdepth 1 -name '*.csv' -print0 | xargs -0 awk -F '"*,"*' '{print $2}' > DidItWorkThisTime.csv
The solution in the linked previous question does not work for large sets of files.
If the directory has so many files that you exceed the argument limit, you should use find and xargs.
find /Users/Larry/Desktop/modified -type f -maxdepth 1 -name '*.csv' -print0 |
xargs -0 awk -F '"*,"*' '{print $2}' > Success.txt
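To see what the '"*,"*' field separator is doing, here is a quick illustration on a single hypothetical quoted row:
echo '"alpha","beta","gamma"' | awk -F '"*,"*' '{print $2}'
# prints: beta   (the separator consumes the quotes around the comma as well)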
Try:
find /Users/Larry/Desktop/TestFolder -type f -maxdepth 1 -name '*.csv' -exec awk -F, '{ print $2 }' '{}' \; > Success.txt
It should execute awk on each csv file found, using a comma to separate fields (-F,), print the second field ($2), and redirect the output to Success.txt.
Also, you might swap > Success.txt for | tee Success.txt if you want to see the output AND have it saved to the file, at least while you're testing the command and don't want to wait for all those files to be processed to see if it worked.
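For example, a sketch of that tee variant, reusing the same hypothetical paths:
find /Users/Larry/Desktop/TestFolder -type f -maxdepth 1 -name '*.csv' -exec awk -F, '{ print $2 }' '{}' \; | tee Success.txt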
A simple and straightforward adaptation of the code you already have.
find /Users/Larry/Desktop/TestFolder -maxdepth 1 -mindepth 1 -type f -name '*.csv' |
xargs cut -d, -f2
If you want files, -type d is wrong. I changed that to -type f and added the -name option to select only *.csv files.
for dir in /Users/Larry/Desktop/TestFolder/*
do
cut -f2 "$dir"/*.csv
done
This is assuming TestFolder contains a number of directories, and each of them contains one or more *.csv files. This can be further simplified to
cut -d, -f2 /Users/Larry/Desktop/TestFolder/*/*.csv
but this could give you the "Argument list too long" error you were trying to avoid.
All of these will print to standard out; add >Success.txt at the end to redirect to a file.
cut -d',' -f1,2,3 *.csv > result.csv
Assuming the field delimiter in your files is , (a csv file, after all) and that you need columns 1, 2 and 3 in the result.
The above command will have problems if the needed columns contain the delimiter inside the column itself: "...,...",
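If the needed columns can themselves contain quoted commas, one workaround (a sketch, assuming GNU awk and its FPAT feature) is to describe what a field looks like instead of splitting on every comma:
# GNU awk: a field is either a comma-free run or a double-quoted string
awk -v FPAT='([^,]*)|("[^"]*")' -v OFS=',' '{ print $1, $2, $3 }' *.csv > result.csv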

Terminal find, directories last instead of first

I have a makefile that concatenates JavaScript files together and then runs the file through uglify-js to create a .min.js version.
I'm currently using this command to find and concat my files
find src/js -type f -name "*.js" -exec cat {} >> ${jsbuild}$# \;
But it lists files in subdirectories first. That makes heaps of sense, but I'd like it to list the .js files directly under src/js above those in the subdirectories, to avoid getting my undefined JS error.
Is there any way to do this? I've had a google around and seen the sort command and the -s flag for find, but it's a bit above my understanding at this point!
[EDIT]
The final solution is slightly different from the accepted answer, but it is marked as accepted because it brought me to the answer. Here is the command I used:
cat `find src/js -type f -name "*.js" -print0 | xargs -0 stat -f "%z %N" | sort -n | sed -e "s|[0-9]*\ \ ||"` > public/js/myCleverScript.js
Possible solution:
use find to get filenames and directory depth, i.e. find ... -printf "%d\t%p\n"
sort the list by directory depth with sort -n
strip the directory depth from the output so only the filenames remain
test:
without sorting:
$ find folder1/ -depth -type f -printf "%d\t%p\n"
2 folder1/f2/f3
1 folder1/file0
with sorting:
$ find folder1/ -type f -printf "%d\t%p\n" | sort -n | sed -e "s|[0-9]*\t||"
folder1/file0
folder1/f2/f3
The command you need looks like:
cat $(find src/js -type f -name "*.js" -printf "%d\t%p\n" | sort -n | sed -e "s|[0-9]*\t||") > min.js
Mmmmm...
find src/js -type f
shouldn't find ANY directories at all, and doubly so as your directory names will probably not end in ".js". The brackets around your "-name" parameter are superfluous too; try removing them:
find src/js -type f -name "*.js" -exec cat {} >> ${jsbuild}$# \;
find can be given the first directory level already expanded on the command line, which enforces the order of the directory-tree traversal. This solves the problem only for the top directory (unlike the already accepted solution by Sergey Fedorov), but it should answer your question too, and more options are always welcome.
Using GNU coreutils ls, you can sort directories before regular files with the --group-directories-first option. From reading the Mac OS X ls manpage it seems that directories are always grouped on OS X, so there you would just drop the option.
ls -A --group-directories-first -r | tac | xargs -I'%' find '%' -type f -name '*.js' -exec cat '{}' + > ${jsbuild}$#
If you do not have the tac command, you can easily implement it using sed; it reverses the order of lines. See 'info sed tac' in the GNU sed documentation.
tac(){
sed -n '1!G;$p;h'
}
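Quick check that the sed-based replacement behaves like tac:
printf '1\n2\n3\n' | tac
# prints:
# 3
# 2
# 1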
You could do something like this...
First create a variable holding the name of our output file:
OUT="$(pwd)/theLot.js"
Then get all "*.js" files in the top directory into that file:
cat *.js > $OUT
Then have "find" grab all other "*.js" files below the current directory:
find . -type d ! -name . -exec sh -c "cd {} ; cat *.js >> $OUT" \;
Just to explain the "find" command, it says:
find
. = starting at current directory
-type d = all directories, not files
! -name . = except the current one
-exec sh -c = and for each one you find execute the following
"..." = go to that directory and concatenate all "*.js" files there onto end of $OUT
\; = and that's all for today, thank you!
I'd get the list of all the files:
$ find src/js -type f -name "*.js" > list.txt
Sort them by depth, i.e. by the number of '/' in them, using the following ruby script:
sort.rb:
files=[]; while gets; files<<$_; end
files.sort! {|a,b| a.count('/') <=> b.count('/')}
files.each {|f| puts f}
Like so:
$ ruby sort.rb < list.txt > sorted.txt
Concatenate them:
$ cat sorted.txt | while read FILE; do cat "$FILE" >> output.txt; done
(All this assumes that your file names don't contain newline characters.)
EDIT:
I was aiming for clarity. If you want conciseness, you can absolutely condense it to something like:
find src/js -name '*.js'| ruby -ne 'BEGIN{f=[];}; f<<$_; END{f.sort!{|a,b| a.count("/") <=> b.count("/")}; f.each{|e| puts e}}' | xargs cat >> concatenated
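One caveat: plain xargs splits its input on any whitespace, so file names containing spaces would break the condensed version. With GNU xargs you can make it newline-delimited instead (a sketch):
find src/js -name '*.js' | ruby -ne 'BEGIN{f=[];}; f<<$_; END{f.sort!{|a,b| a.count("/") <=> b.count("/")}; f.each{|e| puts e}}' | xargs -d '\n' cat >> concatenated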

find and cat to merge csv files

I have thousands of files in sub-directories of ~/data. I wish to merge all the csv files with a certain extension, say .x, and save the merged file to ~/data/merged.x.
I know I need to use find, cat and >> with the -iname option, but I'm finding it hard to do.
Thanks in advance
find ~/data -name "*.x" | while read -r file
do
cat "$file" >> ~/data/merged.x
done
find ~/data -type f ! -name 'merged.x' -a -name '*.x' -exec cat {} \+ >> ~/data/merged.x
find ~/data -type f -name "*.x" | xargs cat > ~/data/merged.x
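If any of the file names contain spaces or other special characters, a null-delimited variant is safer (a sketch, assuming GNU find and xargs); excluding the output file also keeps it from being re-read on a later run:
find ~/data -type f -name '*.x' ! -name 'merged.x' -print0 | xargs -0 cat > ~/data/merged.x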

Use find, wc, and sed to count lines

I was trying to use sed to count all the lines based on a particular extension.
find -name '*.m' -exec wc -l {} \; | sed ...
I was trying to do the following; how would I include sed in this particular line to get the totals?
You may also get the nice formatting from wc with:
wc `find -name '*.m'`
Most of the answers here won't work well for a large number of files. Some will break if the list of file names is too long for a single command-line call; others are inefficient because -exec starts a new process for every file. I believe a robust and efficient solution would be:
find . -type f -name "*.m" -print0 | xargs -0 cat | wc -l
Using cat in this way is fine, as its output is piped straight into wc so only a small amount of the files' content is kept in memory at once. If there are too many files for a single invocation of cat, cat will be called multiple times, but all the output will still be piped into a single wc process.
You can cat all files through a single wc instance to get the total number of lines:
find . -name '*.m' -exec cat {} \; | wc -l
On modern GNU platforms wc and find take -print0 and --files0-from parameters that can be combined into a command that counts lines in files, with a total at the end. Example:
find . -name '*.c' -type f -print0 | wc -l --files0-from=-
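The output shows a per-file count followed by a grand total, for example (file names here are hypothetical):
  42 ./src/foo.c
  17 ./src/bar.c
  59 total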
You could also use sed for counting lines in place of wc:
find . -name '*.m' -exec sed -n '$=' {} \;
Here $ addresses the last line and = prints its line number, so what you get is each file's line count.
EDIT
You could also try something like sloccount.
Hm, the solution with cat may be problematic if you have many files, especially big ones.
The second solution doesn't give a total, just lines per file, as I tested.
I'd prefer something like this:
find . -name '*.m' | xargs wc -l | tail -1
This will do the job fast, no matter how many files you have or how big they are.
sed is not the proper tool for counting. Use awk instead:
find . -name '*.m' -exec awk 'END { print NR }' {} +
Using + instead of \; makes find pass a batch of files to each awk invocation (like xargs does), rather than starting a new process for every file.
For big directories we should use:
find . -type f -name '*.m' -exec sed -n '$=' '{}' + 2>/dev/null | awk '{ total+=$1 }END{print total}'
# alternative using awk twice
find . -type f -name '*.m' -exec awk 'END {print NR}' '{}' + 2>/dev/null | awk '{ total+=$1 }END{print total}'
