Searching for .extension files recursively and printing the number of lines in the files found - bash

I ran into a problem I am trying to solve, but I can't think of a way to do it without redoing the whole thing from the beginning. My script gets an extension and searches recursively for every .extension file, then outputs "filename:row #:word #". I would also like to print the total number of rows found across those files. Is there any way to do it using the existing code?
for i in `find . -name "*.$1" | awk -F/ '{print $NF}'`
do
    echo "$i:`wc -l <$i|bc`:`wc -w <$i|bc`" >>temp.txt
done
sort -r -t : -k3 temp.txt
cat temp.txt

I think you're almost there, unless I am missing something in your requirements:
#!/bin/bash
total=0
for f in `find . -name "*.$1"` ; do
    lines=`wc -l < $f`
    words=`wc -w < $f`
    total=`echo "$lines+$total" | bc`
    echo "* $f:$lines:$words"
done
echo "# Total: $total"
Edit:
Per the recommendation of @MarkSetchell in the comments, this is a more refined version of the script above:
#!/bin/bash
total=0
for f in `find . -name "*.$1"` ; do
    # wc always prints the line count before the word count, regardless of option order
    read lines words _ < <(wc -wl "$f")
    total=$(($lines+$total))
    echo "* $f:$lines:$words"
done
echo "# Total: $total"
Cheers

This is a one-liner printing the lines found per file, the path of the file and at the end the sum of all lines found in all the files:
find . -name "*.go" -exec wc -l {} \; | awk '{s+=$1} {print $1, $2} END {print s}'
In this example it will find all files ending in .go, then execute wc -l on each to get the number of lines and print the output to stdout; awk is then used to sum the first column into the variable s, which is printed only at the end: END {print s}
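For illustration, in a tree with two hypothetical .go files (counts made up), the output is one count/path pair per file followed by the bare sum:
120 ./main.go
48 ./util/helpers.go
168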
In case you would also like to get the words and the total sum at the end you could use:
find . -name "*.go" -exec wc {} \; | \
awk '{s+=$1; w+=$2} {print $1, $2, $4} END {print "Total:", s, w}'
Hope this gives you an idea of how to format, sum, etc. your data based on the input.

Related

In loop cat file - echo name of file - count

I am trying to make a one-line command for the following operation:
the folder "data" has 570 files; each file has some lines of text; the files are named 1.txt through 570.txt.
I want to cat each file, grep for a word, and count how many times that word occurs.
For the moment I am trying to get this using for:
for FILES in $(find /home/my/data/ -type f -print -exec cat {} \;) ; do echo $FILES; cat $FILES |grep word ; done |wc -l
but while that counts correctly, it does not display which file each count came from.
I would like it to look like:
----> 1.txt <----
210
----> 2.txt <----
15
etc, etc, etc..
How do I get that?
grep -o word * | uniq -c
is practically all you need.
grep -o word * gives a line for each hit, but only prints the match, in this case "word". Each line is prefixed with the filename it was found in.
uniq -c gives only one line per file, so to speak, and prefixes it with the count.
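For illustration, with two hypothetical files the two stages produce something like:
$ grep -o word 1.txt 2.txt
1.txt:word
1.txt:word
1.txt:word
2.txt:word
$ grep -o word 1.txt 2.txt | uniq -c
      3 1.txt:word
      1 2.txt:word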
You can further format it to your needs with awk or whatever, though, for example like this:
grep -o word * | uniq -c | cut -f1 -d':' | awk '{print "File: " $2 " Count: " $1}'
You can try this:
for file in /path/to/folder/data/* ; do echo "----> $file <----" ; grep -c "word_to_count" "$file" ; done
The for loop will iterate over the files inside the folder "data".
For each of these files, it prints the name and searches for the number of occurrences of "word_to_count" (grep -c directly outputs a count of matching lines).
Be careful: if your search word appears more than once on a line, this solution will still count that line only once.
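If you want every occurrence counted rather than every matching line, one small variant (reusing the $file variable from the loop above) is to have grep print each match on its own line and count those:
# grep -o emits one line per match, so wc -l counts occurrences, not lines
grep -o "word_to_count" "$file" | wc -l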
Bit of awk should do it?
awk '{s+=$1} END {print s}' mydatafile
Note: some versions of awk have some odd behaviours if you are going to be adding anything exceeding 2^31 (2147483647). See comments for more background. One suggestion is to use printf rather than print:
awk '{s+=$1} END {printf "%.0f", s}' mydatafile
$ python -c "import sys; print(sum(int(l) for l in sys.stdin))"
If you only want the total number of lines, you could use
find /home/my/data/ -type f -exec cat {} + | wc -l

Append wc lines to filename

Title says it all. I've managed to get just the lines with this:
lines=$(wc file.txt | awk {'print $1'});
But I could use an assist appending this to the filename. Bonus points for showing me how to loop this over all the .txt files in the current directory.
find -name '*.txt' -execdir bash -c \
'mv -v "$0" "${0%.txt}_$(wc -l < "$0").txt"' {} \;
where
- the bash command is executed for each (\;) matched file;
- {} is replaced by the currently processed filename and passed as the first argument ($0) to the script;
- ${0%.txt} deletes the shortest match of .txt from the back of the string (see the official Bash-scripting guide);
- wc -l < "$0" prints only the number of lines in the file (see answers to this question, for example)
Sample output:
'./file-a.txt' -> 'file-a_5.txt'
'./file with spaces.txt' -> 'file with spaces_8.txt'
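A quick interactive check of the parameter expansion used above (filename and count are hypothetical):
$ f=./notes.txt
$ wc -l < "$f"
12
$ echo "${f%.txt}_$(wc -l < "$f").txt"
./notes_12.txt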
You could use the rename command, which is actually a Perl script, as follows:
rename --dry-run 'my $fn=$_; open my $fh,"<$_"; while(<$fh>){}; $_=$fn; s/.txt$/-$..txt/' *txt
Sample Output
'tight_layout1.txt' would be renamed to 'tight_layout1-519.txt'
'tight_layout2.txt' would be renamed to 'tight_layout2-1122.txt'
'tight_layout3.txt' would be renamed to 'tight_layout3-921.txt'
'tight_layout4.txt' would be renamed to 'tight_layout4-1122.txt'
If you like what it says, remove the --dry-run and run again.
The script counts the lines in the file without using any external processes and then renames the file as you ask, also without using any external processes, so it is quite efficient.
Or, if you are happy to invoke an external process to count the lines, and avoid the Perl method above:
rename --dry-run 's/\.txt$/-`grep -ch "^" "$_"` . ".txt"/e' *txt
Use the rename command:
for file in *.txt; do
    lines=$(wc ${file} | awk {'print $1'});
    rename s/$/${lines}/ ${file}
done
#!/bin/bash
files=$(find . -maxdepth 1 -type f -name '*.txt' -printf '%f\n')
for file in $files; do
    lines=$(wc $file | awk {'print $1'});
    extension="${file##*.}"
    filename="${file%.*}"
    mv "$file" "${filename}${lines}.${extension}"
done
You can adjust maxdepth accordingly.
You can do it like this as well:
for file in "path_to_file"/'your_filename_pattern'
do
    lines=$(wc $file | awk {'print $1'})
    mv $file $file'_'$lines
done
example:
for file in /oradata/SCRIPTS_EL/text*
do
    lines=$(wc $file | awk {'print $1'})
    mv $file $file'_'$lines
done
This would work, but there are definitely more elegant ways.
for i in *.txt; do
    mv "$i" ${i/.txt/}_$(wc $i | awk {'print $1'})_.txt;
done
The result puts the line number nicely before the .txt.
Like:
file1_1_.txt
file2_25_.txt
You could use grep -c '^' to get the number of lines, instead of wc and awk:
for file in *.txt; do
    [[ ! -f $file ]] && continue # skip over entries that are not regular files
    #
    # move file.txt to file.txt.N where N is the number of lines in file
    #
    # this naming convention has the advantage that if we run the loop again,
    # we will not reprocess the files which were processed earlier
    mv "$file" "$file".$(grep -c '^' "$file")
done
{ linecount[FILENAME] = FNR }
END {
    linecount[FILENAME] = FNR
    for (file in linecount) {
        newname = gensub(/\.[^\.]*$/, "-"linecount[file]"&", 1, file)
        q = "'"; qq = "'\"'\"'"; gsub(q, qq, newname)
        print "mv -i -v '" gensub(q, qq, "g", file) "' '" newname "'"
    }
}
Save the above awk script in a file, say wcmv.awk, then run it like:
awk -f wcmv.awk *.txt
It will list the commands that need to be run to rename the files in the required way (except that it will ignore empty files). To actually execute them, pipe the output to a shell as follows.
awk -f wcmv.awk *.txt | sh
Like it goes with all irreversible batch operations, be careful and execute commands only if they look okay.
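For illustration, the commands the script prints might look like this (file names and line counts are made up):
mv -i -v 'notes.txt' 'notes-12.txt'
mv -i -v 'report.txt' 'report-48.txt'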
awk '
BEGIN { for (i=1; i<ARGC; i++) Files[ARGV[i]]=0 }
{ Files[FILENAME]++ }
END {
    for (file in Files) {
        # if ( file !~ "_" Files[file] ".txt$") {
        fileF=file; gsub( /\047/, "\047\"\047\"\047", fileF)
        fileT=fileF; sub( /.txt$/, "_" Files[file] ".txt", fileT)
        system( sprintf( "mv \047%s\047 \047%s\047", fileF, fileT))
        # }
    }
}' *.txt
Another way with awk that makes a second run easier to manage by allowing more control over the name (like skipping files that already have the count in them from a previous cycle).
Per the good remarks of @gniourf_gniourf:
- file names with spaces inside are handled;
- the once-tiny code is now heavy for such a small task.

Bash: grabbing the second line and last line of output (ls -lrS) only

I am looking to get the second line and the last line of what the ls -lrS command outputs. I've been using ls -lrS | (head -2 | tail -1) && (tail -n1), but it seems to get only the first line, and I have to press Ctrl-C to stop it.
Another problem I am having is with the awk command: I wanted to grab just the file size and file name. If I were to get the correct lines (second and last), my attempt was
files=$(ls -lrS | (head -2 | tail -1) && (tail -n1) awk '{ print "%s", $5; "%s", $8; }' )
I was hoping it would print:
1234 file.abc
12345 file2.abc
Using the format-stable GNU stat command:
stat --format='%s %n' * | sort -n | sed -n '1p;$p'
If you're using BSD stat, adjust accordingly.
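From memory (treat it as an assumption to verify), the BSD/macOS counterpart uses -f with %z for size and %N for name:
stat -f '%z %N' * | sort -n | sed -n '1p;$p'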
If you want a lot more control over what files go into this calculation, and arguably better portability, use find. In this example, I'm getting all non-dot files in the current directory:
find -maxdepth 1 -not -path '*/\.*' -printf '%s %p\n' | sort -n | sed -n '1p;$p'
And take care if your directory contains two or fewer entries, or if any of your entries have a newline in their name.
Using awk:
ls -lrS | awk 'NR==2 { print; } END { print; }'
It prints when the line number NR is 2 and again on the final line.
Note: As pointed out in the comments, $0 may or may not be available in an END block depending on your awk version.
whatever | awk 'NR==2{x=$0;next} {y=$0} END{if (x!="") print x; if (y!="") print y}'
You need that complexity (and more, to be REALLY robust) to handle input that's fewer than 3 lines.
ls is not a reliable tool for this job: It can't represent all possible filenames (spaces are possible, but also newlines and other special characters -- all but NUL). One robust solution on a system with GNU tools is to use find:
{
# read the first size and name
IFS= read -r -d' ' first_size; IFS= read -r -d '' first_name;
# handle case where only one file exists
last_size=$first_size; last_name=$first_name
# continue reading "last" size and name, until one really is last
while IFS= read -r -d' ' curr_size && IFS= read -r -d '' curr_name; do
last_size=$curr_size; last_name=$curr_name
done
} < <(find . -mindepth 1 -maxdepth 1 -type f -printf '%s %P\0' | sort -n -z)
The above puts results into variables $first_size, $first_name, $last_size and $last_name, usable thusly:
printf 'Smallest file is %d bytes, named %q\n' "$first_size" "$first_name"
printf 'Largest file is %d bytes, named %q\n' "$last_size" "$last_name"
In terms of how it works:
find ... -printf '%s %P\0'
...emits a stream of the following form from find:
<size> <name><NUL>
Running that stream through sort -n -z does a numeric sort on its contents. IFS= read -r -d' ' first_size reads everything up to the first space; IFS= read -r -d '' first_name reads everything up to the first NUL; and then the loop continues to read and store additional size/name pairs until the last one is reached.

Sum of file sizes with awk on a list of files

I have a list of files and want to sum over their file sizes.
So I created a (global) variable as a counter and am trying to loop over that list, get each file size with ls, and cut & add it with
export COUNTER=1
for x in $(cat ./myfiles.lst); do ls -all $x | awk '{COUNTER+=$5}'; done
However, my counter remains unchanged?
> echo $COUNTER
> 1
Does someone have an idea of what I am missing here?
Cheers and thanks,
Thomas
OK, I found a way, piping the result of the awk pipe into a variable
(which is probably not elegant, but works ;) )
for x in $(cat ./myfiles.lst); do a=$(ls -all $x |awk '{print $5}'); COUNTER=$(($COUNTER+$a)) ; done
> echo $COUNTER
> 4793061514
awk is called once per file, so its COUNTER starts from zero each time; and since it is an awk variable, it never reaches the shell's COUNTER at all.
A better solution is to let a single awk process do all the summing (ls does not read file names from standard input, so xargs feeds them in):
xargs ls -l < myfiles.lst | awk '{COUNTER+=$5} END {print COUNTER}'
But you are reinventing the wheel here. You can do something like
xargs du -sc < myfiles.lst
(If you have du installed. Note: du reports disk usage rather than exact byte sizes; see also the caveats raised in the comments below my answer. I had tested this with cygwin and with that it worked like a charm.)
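If you would rather avoid parsing ls output entirely, here is a minimal sketch using stat (assuming GNU stat and one filename per line in myfiles.lst):
total=0
while IFS= read -r f; do
    size=$(stat -c %s "$f")   # GNU stat prints the byte size; BSD stat would use: stat -f %z
    total=$((total + size))
done < ./myfiles.lst
echo "$total"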
Shorter version of the last:
ls -l | awk '{sum += $5} END {print sum}'
Now, say you want to filter by certain types of files, age, etc... Just throw the ls -l into a find, and you can filter using find's extensive filter parameters:
find . -type f -exec ls -l {} \; | awk '{sum += $5} END {print sum}'
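For example, to limit the sum to .log files modified within the last 7 days (the pattern and time window are only an illustration):
find . -type f -name '*.log' -mtime -7 -exec ls -l {} \; | awk '{sum += $5} END {print sum}'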
ls -ltS | awk -F " " {'print $5'} | awk '{s+=$1} END {print s}'

Bash Shell awk/xargs magic

I'm trying to learn a little awk foo. I have a CSV where each line is of the format partial_file_name,file_path. My goal is to find the files (based on partial name) and move them to their respective new paths. I wanted to combine the forces of find, awk and mv to achieve this, but I'm stuck on the implementation. I wanted to use awk to separate the terms from the csv file so that I could do something like
find . -name '*$1*' -print | xargs mv {} $2{}
where $1 and $2 are the split terms from the csv file. Anyone have any ideas? -peace
I suggest doing this:
$ cat korv
foo.txt,/hello/
bar.jpg,/mullo/
$ awk -F, '{print $1 " " $2}' korv
foo.txt /hello/
bar.jpg /mullo/
-F sets the delimiter, so the above will split using ",". Next, add * to the filenames:
$ awk -F, '{print "*"$1"*" " " $2}' korv
*foo.txt* /hello/
*bar.jpg* /mullo/
**
This shows I have an empty line. We don't want this match, so we add a rule:
$ awk -F, '/[a-z]/{print "*"$1"*" " " $2}' korv
*foo.txt* /hello/
*bar.jpg* /mullo/
Looks good, so hand all this to mv using command substitution:
$ mv $(awk -F, '/[a-z]/{print "*"$1"*" " " $2}' korv)
$
Done.
You don't really need awk for this. There isn't really anything here which awk does better than the shell.
#!/bin/sh
IFS=,
while read file target; do
    find . -name "$file" -print0 | xargs -0 -r -I{} mv {} "$target"
done <path_to_csv_file
If you have special characters in the file names, you may need to tweak the read.
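One such tweak, sketched here, is to scope IFS to the read itself and add -r so backslashes in the CSV are not mangled:
while IFS=, read -r file target; do
    find . -name "$file" -print0 | xargs -0 -r -I{} mv {} "$target"
done <path_to_csv_file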
What about using awk's system() command:
awk '{ system("find . -name " $1 " -print | xargs -I {} mv {} " $2 "{}"); }'
example arguments in the csv file: test.txt ./subdirectory/
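With that example row, the command awk assembles and hands to system() comes out roughly as:
find . -name test.txt -print | xargs -I {} mv {} ./subdirectory/{}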
find . -name "*err" -size "+10c" | awk -F.err '{print $1".job"}' | xargs -I {} qsub {}
