Count lines of code recursively, including compressed (zip) files - bash

I use the following Bash script to count lines of code in one of my projects:
echo "--- CLIENT"
cd "/mypath/client"
# Count classes:
a=`find . -name \*.java -print | wc -l`
echo ""
echo "Number of Java classes: $a"
# Total count:
b=`find . -name \*.java -exec cat {} \; | wc -l`
echo ""
echo "Java lines: $b"
c=`find . -name \*.css -exec cat {} \; | wc -l`
echo ""
echo "CSS lines: $c"
d=`find . -name \*.json -exec cat {} \; | wc -l`
echo ""
echo "JSON lines: $d"
f=$((`find . -name \*.h -exec cat {} \; | wc -l` + `find . -name \*.m -exec cat {} \; | wc -l`))
echo ""
echo "iOS Objective-C lines: $f"
echo ""
echo "--- SERVER"
cd "/mypath/server"
# Count classes:
h=`find . -name \*.java -print | wc -l`
echo ""
echo "Number of Java classes: $h"
# Total count:
i=`find . -name \*.java -exec cat {} \; | wc -l`
echo ""
echo "Java lines: $i"
echo ""
echo "Total lines of code: $((b + c + d + e + f + i))"
cd ~
This script worked fine as long as all the source code was searchable this way. Now I have a different use case: some of the source code is still reachable with this script, and some of it is inside compressed zip files (located in various subfolders of "/mypath/client"). These zip files can contain the sources in the root or in various subfolders within them.
I suppose it's possible to adapt my script to take into account the zipped files in the count, but I don't know how to do it.

Counting Files
When you search for .xyz files, also search for .zip files and search their file list.
You can list all filenames in a zip archive, one per line, using zipinfo -1 archive.zip. zipinfo also supports wildcards to print only matching filenames. For instance, zipinfo -1 archive.zip '*.java' prints only filenames ending with .java.
find . -name \*.java -print \
    -o -name \*.zip -exec zipinfo -1 {} '*.java' \; |
    wc -l
This command assumes that filenames do not contain linebreaks.
Counting Lines
You can print zipped files without explicitly extracting them using unzip -p archive.zip file1 file2 .... This command also accepts wildcards.
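For example, to count the lines of all .java sources inside a single archive without extracting it (archive.zip being a placeholder name):
unzip -p archive.zip '*.java' | wc -l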
By the way: you can drastically simplify your script by using a function, since find . -name \*.xyz -exec cat {} \; | wc -l is always the same except for the xyz. Also, -exec cat {} + is way faster than -exec cat {} \;, since it passes many files to a single cat process instead of starting one cat per file.
#! /bin/bash
countLines() {
    local ext=$1
    find . -name "*.$ext" -exec cat {} + \
        -o -name \*.zip -exec unzip -p {} "*.$ext" \; |
        wc -l
}
for ext in java css json; do
    echo "$ext lines: $(countLines "$ext")"
done
unzip -p archive.zip '*.java' may print the warning caution: filename not matched: *.java if there are no .java files. You can suppress this by adding 2> /dev/null after the find command.
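In countLines above, that would look like this (sketch):
find . -name "*.$ext" -exec cat {} + \
    -o -name \*.zip -exec unzip -p {} "*.$ext" \; 2> /dev/null |
    wc -l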
Keep in mind that this approach is very inefficient. find has to run for each file extension. And the zip files are read multiple times too. It would be faster to filter out all files that you want to inspect first, then run wc -l on all of them, and then sum up their line counts.
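A rough sketch of that idea, assuming GNU find/xargs and filenames without newlines; it handles only the plain files (the zip contents would still need one unzip -p pass per archive), and the extension list is illustrative:
# Collect matching files once, count them in one wc pass,
# then sum the per-file counts (robust even if xargs runs wc in batches).
find . \( -name '*.java' -o -name '*.css' -o -name '*.json' \) -print0 |
    xargs -0 wc -l |
    awk '$2 != "total" { sum += $1 } END { print sum + 0 }'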

Related

Bash Cutting a Filename as a String in a Find Loop?

I'm trying to use the cut command to parse filenames, but am encountering difficulty while doing so in a find loop, with the intention of converting my music library from ARTIST - TITLE.EXT to TITLE.EXT.
So if I had the file X - Y.EXT, it should yield Y.EXT as an output.
The current function is something like this:
find . -iname "*.mp3" -exec cut -d "-" -f 2 <<< "`echo {}`" \;
It should be noted that the above syntax looks a bit strange: why not just use <<< {} \; instead of the echo {}? Because cut seems to parse the file itself instead of the filename if it's not given a string.
Another attempt I had looked something like:
find . -iname "*.mp3" -exec TRACKTITLE=`echo {} | cut -d '-' -f2` \; -exec echo "$TRACKTITLE" \;
But this fails with find: ‘TRACKTITLE=./DAN TERMINUS - Underwater Cities.mp3’: No such file or directory.
This (cut -d "-" -f 2 <<< FILENAME) command works wonderfully for a single instance (although keeps the space after the "-" character frustratingly).
How can I perform this operation in a find loop?
First, try to extract what you want from the file name with parameter expansion.
file="ARTIST - TITLE.EXT"
echo "${file#* - }"
Output
TITLE.EXT
Using find and invoking a shell with a for loop.
find . -type f -iname "*.mp3" -exec sh -c 'for music; do echo mv -v "$music" "${music#* - }"; done' sh {} +
If there are .mp3 files in subdirectories, just replace -exec with -execdir, if it is available/supported by your find.
If for whatever reason -execdir is not available:
find . -type f -iname "*.mp3" -exec sh -c '
for music; do
pathname="${music%/*}"
filename="${music##*/}"
new_music="${filename#* - }"
echo mv -v "$music" "$pathname/$new_music"
done' sh {} +
Remove the echo if you're satisfied with the output.
See Understanding -exec option to Find
The command below will say what it would do; remove echo to actually run mv:
find . -iname "*.mp3" -exec sh -c 'echo mv "$1" "$(echo "$1" | cut -d - -f2)"' sh {} \;
Example output:
$ find . -iname "*.mp3" -exec sh -c 'echo mv "$1" "$(echo "$1" | cut -d - -f2)"' sh {} \;
mv ./X - Y.mp3 Y.mp3
mv ./ARTIST - TITLE.mp3 TITLE.mp3
Also notice that your cut command will leave a whitespace at the
beginning of the new filename:
$ echo ARTIST\ -\ TITLE.mp3 | cut -d - -f2-
 TITLE.mp3
You don't need the find nor the cut for this task.
for f in *' - '*.mp3; do mv -i "$f" "${f##* - }"; done
will do the job for the current directory.
If you want to descend through directories, then:
shopt -s globstar
for f in ./**/*' - '*.mp3; do
    mv -i "$f" "${f%/*}/${f##* - }"
done

Count filenumber in directory with blank in its name

If you want a breakdown of how many files are in each dir under your current dir:
for i in $(find . -maxdepth 1 -type d) ; do
    echo -n $i": " ;
    (find $i -type f | wc -l) ;
done
It does not work when the directory name has a blank in it. Can anyone here tell me how I must edit this shell script so that such directory names are also accepted when counting their file contents?
Thanks
Your code suffers from a common issue described in http://mywiki.wooledge.org/BashPitfalls#for_i_in_.24.28ls_.2A.mp3.29.
In your case you could do this instead:
for i in */; do
    echo -n "${i%/}: "
    find "$i" -type f | wc -l
done
This will work with all types of file names:
find . -maxdepth 1 -type d -exec sh -c 'printf "%s: %i\n" "$1" "$(find "$1" -type f | wc -l)"' Counter {} \;
How it works
find . -maxdepth 1 -type d
This finds the directories just as you were doing
-exec sh -c 'printf "%s: %i\n" "$1" "$(find "$1" -type f | wc -l)"' Counter {} \;
This feeds each directory name to a shell script which counts the files, similarly to what you were doing.
There are some tricks here: Counter {} are passed as arguments to the shell script. Counter becomes $0 (which is only used if the shell script generates an error). find replaces {} with the name of a directory it found, and this will be available to the shell script as $1. This is done in a way that is safe for all types of file names.
Note that, wherever $1 is used in the script, it is inside double-quotes. This protects it from word splitting and other unwanted shell expansions.
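A hypothetical session, with a directory name containing a blank:
$ mkdir -p "my dir" && touch "my dir/a.txt" "my dir/b.txt"
$ find . -maxdepth 1 -type d -exec sh -c 'printf "%s: %i\n" "$1" "$(find "$1" -type f | wc -l)"' Counter {} \;
.: 2
./my dir: 2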
I found the solution; this is what I had to consider:
#!/bin/bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for i in $(find . -maxdepth 1 -type d); do
    echo -n " $i: ";
    (find $i -type f | wc -l) ;
done
IFS=$SAVEIFS

Find commands don't work in script only

I have a script that searches specific locations for .txt files and outputs the results to stdout, and to a file using the tee command. At least it's supposed to. I'm having some strange issues with it however. Here's the code:
echo -e "${HIGHLIGHT}Sensitive files:${OFF}"
echo "## Sensitive files:" >> $ofile
for file in $(cat $1); do ls -lh $file 2>/dev/null; done | tee -a $ofile
echo " " | tee -a $ofile
echo -e "${HIGHLIGHT}Suids:${OFF}"
echo "## Suids:" >> $ofile
find / -type f \( -perm -04000 -o -perm -02000 \) -exec ls -Alh {} \; 2>/dev/null | tee -a $ofile
echo " " | tee -a $ofile
echo -e "${HIGHLIGHT}Owned by root only:${OFF}"
echo "## Owned by root only:" >> $ofile
find / -type f -user root \( -perm -04000 -o -perm -02000 \) -exec ls -lg {} \; 2>/dev/null | tee -a $ofile
echo " " | tee -a $ofile
# Text files
echo -e "${HIGHLIGHT}Text files:${OFF}"
echo "## Text files:" >> $ofile
find /etc -type f -name *.txt -exec ls -lh {} \; 2>/dev/null | tee -a $ofile
find /home -type f -name *.txt -exec ls -lh {} \; 2>/dev/null | tee -a $ofile
find /root -type f -name *.txt -exec ls -lh {} \; 2>/dev/null | tee -a $ofile
The strange thing is that all of the commands work just fine, except for the find searches for .txt files at the bottom. None of those commands work in the script, yet if I copy and paste them into the terminal and run them exactly as they appear in the script, they work just fine. How is this even possible?
You need to quote or escape the * in your -name patterns; otherwise the shell tries to expand it before find ever runs and uses the expanded form in its place on the command line. That is most likely why the same commands worked when pasted into the terminal: when no file in the shell's current directory matches *.txt, bash passes the pattern through unchanged, while in the script's working directory the glob evidently matched something.
find /etc -type f -name '*.txt' -exec ls -lh {} \; 2>/dev/null | tee -a $ofile
and the others, being similar, will work the same way.
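A quick way to see the difference (hypothetical session, run in a directory containing a matching file):
$ touch demo.txt
$ echo -name *.txt        # unquoted: the shell expands the glob first
-name demo.txt
$ echo -name '*.txt'      # quoted: find receives the pattern itself
-name *.txt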

Is there a way to pipe from a variable?

I'm trying to find all files in a file structure above a certain file size, list them, then delete them. What I currently have looks like this:
filesToDelete=$(find $1 -type f -size +$2k -ls)
if [ -n "$filesToDelete" ]; then
    echo "Deleting files..."
    echo $filesToDelete
    $filesToDelete | xargs rm
else
    echo "no files to delete"
fi
Everything works, except the $filesToDelete | xargs rm, obviously. Is there a way to use pipe on a variable? Or is there another way I could do this? My google-fu didn't really find anything, so any help would be appreciated.
Edit: Thanks for the information everyone. I will post the working code here now for anyone else stumbling upon this question later:
if [ $(find $1 -type f -size +$2k | wc -l) -ge 1 ]; then
    find $1 -type f -size +$2k -exec sh -c 'f={}; echo "deleting file $f"; rm $f' {} \;
else
    echo "no files above" $2 "kb found"
fi
As already pointed out, you don't need to pipe a variable in this case. But just in case you needed it in some other situation, you can use
xargs rm <<< $filesToDelete
or, more portably
echo $filesToDelete | xargs rm
Beware of spaces in file names.
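If file names may contain spaces, a safer variant avoids storing them in a whitespace-split variable entirely; a minimal sketch assuming GNU find/xargs:
# -print0 / -0 use NUL separators, so spaces and newlines in names are safe;
# -- guards against names that start with a dash.
find "$1" -type f -size +"$2"k -print0 | xargs -0 rm --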
To also output the value together with piping it, use tee with process substitution:
echo "$x" | tee >( xargs rm )
You can directly use -exec to perform an action on the files that were found in find:
find $1 -type f -size +$2k -exec rm {} \;
The -exec trick makes find execute the command given for each one of the matches found. To refer to the match itself we use {}, and \; terminates the command.
If you want to perform more than one action, -exec sh -c "..." does it. For example, here you can both print the names of the files that are about to be removed and remove them. Note the f={} part, which stores the name of the file so that it can be used later on in echo and rm:
find $1 -type f -size +$2k -exec sh -c 'f={}; echo "removing $f"; rm $f' {} \;
In case you want to print a message if no matches were found, you can use wc -l to count the number of matches (if any) and do an if / else condition with it:
if [ $(find $1 -type f -size +$2k | wc -l) -ge 1 ]; then
    find $1 -type f -size +$2k -exec rm {} \;
else
    echo "no matches found"
fi
wc is a command that does word count (see man wc for more info). Doing wc -l counts the number of lines. So command | wc -l counts the number of lines returned by command.
Then we use the if [ $(command | wc -l) -ge 1 ] check, which does an integer comparison: if the value is greater or equal to 1, then do what follows; otherwise, do what is in else.
Buuuut the previous approach was using find twice, which is a bit inefficient. And since -exec sh -c opens a sub-shell, we cannot rely on a variable to keep track of the number of files deleted. Why? Because a sub-shell cannot assign values to its parent shell.
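A one-line illustration of that limitation (hypothetical session):
$ count=0; find . -maxdepth 0 -exec sh -c 'count=99' \; ; echo "$count"
0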
Instead, let's store the files that were deleted into a file, and then count it:
find . -name "*.txt" -exec sh -c 'f={}; echo "$f" >> /tmp/findtest; rm $f' {} \;
if [ -s /tmp/findtest ]; then #check if the file is empty
echo "file has $(wc -l < /tmp/findtest) lines"
# you can also `cat /tmp/findtest` here to show the deleted files
else
echo "no matches"
fi
Note that you can cat /tmp/findtest to see the deleted files, or use echo "$f" alone (without redirection) to print each file as it is removed. rm /tmp/findtest is also an option, to be done once the process is finished.
You don't need to do all this. You can directly use the find command to get the files over a particular size limit and delete them using xargs.
This should work:
#!/bin/bash
if [ $(find $1 -type f -size +$2k | wc -l) -eq 0 ]; then
    echo "No Files to delete"
else
    echo "Deleting the following files"
    find $1 -type f -size +$2k -exec ls {} \+
    find $1 -type f -size +$2k -exec ls {} \+ | xargs rm -f
    echo "Done"
fi

find -exec with multiple commands

I am trying to use find -exec with multiple commands without any success. Does anybody know if commands such as the following are possible?
find *.txt -exec echo "$(tail -1 '{}'),$(ls '{}')" \;
Basically, I am trying to print the last line of each txt file in the current directory and print at the end of the line, a comma followed by the filename.
find accepts multiple -exec portions to the command. For example:
find . -name "*.txt" -exec echo {} \; -exec grep banana {} \;
Note that in this case the second command will only run if the first one returns successfully, as mentioned by @Caleb. If you want both commands to run regardless of their success or failure, you could use this construct:
find . -name "*.txt" \( -exec echo {} \; -o -exec true \; \) -exec grep banana {} \;
find . -type d -exec sh -c "echo -n {}; echo -n ' x '; echo {}" \;
One of the following:
find *.txt -exec awk 'END {print $0 "," FILENAME}' {} \;
find *.txt -exec sh -c 'echo "$(tail -n 1 "$1"),$1"' _ {} \;
find *.txt -exec sh -c 'echo "$(sed -n "\$p" "$1"),$1"' _ {} \;
Another way is like this:
multiple_cmd() {
    tail -n1 $1;
    ls $1
};
export -f multiple_cmd;
find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;
in one line
multiple_cmd() { tail -1 $1; ls $1; }; export -f multiple_cmd; find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;
"multiple_cmd()" - is a function
"export -f multiple_cmd" - will export it so any other subshell can see it
"find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;" - find that will execute the function on your example
In this way multiple_cmd can be as long and as complex as you need.
Hope this helps.
There's an easier way:
find ... | while read -r file; do
    echo "look at my $file, my $file is amazing";
done
Alternatively:
while read -r file; do
    echo "look at my $file, my $file is amazing";
done <<< "$(find ...)"
Extending @Tinker's answer,
In my case, I needed to make a command | command | command inside the -exec to print both the filename and the found text in files containing a certain text.
I was able to do it with:
find . -name config -type f \( -exec grep "bitbucket" {} \; -a -exec echo {} \; \)
the result is:
url = git@bitbucket.org:a/a.git
./a/.git/config
url = git@bitbucket.org:b/b.git
./b/.git/config
url = git@bitbucket.org:c/c.git
./c/.git/config
I don't know if you can do this with find, but an alternate solution would be to create a shell script and to run this with find.
lastline.sh:
echo $(tail -1 $1),$1
Make the script executable
chmod +x lastline.sh
Use find:
find . -name "*.txt" -exec ./lastline.sh {} \;
Thanks to Camilo Martin, I was able to answer a related question:
What I wanted to do was
find ... -exec zcat {} | wc -l \;
which didn't work. However,
find ... | while read -r file; do echo "$file: `zcat $file | wc -l`"; done
does work, so thank you!
Denis's first answer resolves the trouble, but in fact it is no longer a find with several commands in only one exec, as the title suggests. To get one exec with several commands, we have to look for something else. Here is an example:
Keep the last 10000 lines of .log files which have been modified in the last 7 days, using one exec command and several {} references.
1) See what the command will do on which files:
find / -name "*.log" -a -type f -a -mtime -7 -exec sh -c "echo tail -10000 {} \> fictmp; echo cat fictmp \> {} " \;
2) Do it (note: no more "\>" but only ">"; this is intended):
find / -name "*.log" -a -type f -a -mtime -7 -exec sh -c "tail -10000 {} > fictmp; cat fictmp > {} ; rm fictmp" \;
I usually embed the find in a small one-liner for loop, where find is executed in a subcommand with $().
Your command would look like this then:
for f in $(find *.txt); do echo "$(tail -1 $f), $(ls $f)"; done
The good thing is that instead of {} you just use $f and instead of the -exec … you write all your commands between do and ; done.
Not sure what you actually want to do, but maybe something like this?
for f in $(find *.txt); do echo $f; tail -1 $f; ls -l $f; echo; done
should use xargs :)
find *.txt -type f -exec tail -1 {} \; | xargs -ICONSTANT echo $(pwd),CONSTANT
another one (working on OS X):
find *.txt -type f -exec echo ,$(pwd) {} + -exec tail -1 {} + | tr ' ' '/'
A find+xargs answer.
The example below finds all .html files and creates a copy with the .BAK extension appended (e.g. 1.html > 1.html.BAK).
Single command with multiple placeholders
find . -iname "*.html" -print0 | xargs -0 -I {} cp -- "{}" "{}.BAK"
Multiple commands with multiple placeholders
find . -iname "*.html" -print0 | xargs -0 -I {} echo "cp -- {} {}.BAK ; echo {} >> /tmp/log.txt" | sh
# if you need to do anything bash-specific then pipe to bash instead of sh
This command will also work with files that start with a hyphen or contain spaces, such as -my file.html, thanks to parameter quoting and the -- after cp, which signals to cp the end of parameters and the beginning of the actual file names.
-print0 pipes the results with null-byte terminators.
for xargs the -I {} parameter defines {} as the placeholder; you can use whichever placeholder you like; -0 indicates that input items are null-separated.
I found this solution (maybe it is already said in a comment, but I could not find any answer with it): you can execute MULTIPLE COMMANDS in a row using bash -c:
find . <SOMETHING> -exec bash -c "EXECUTE 1 && EXECUTE 2 ; EXECUTE 3" \;
in your case
find . -name "*.txt" -exec bash -c "tail -1 '{}' && ls '{}'" \;
I tested it with a test file:
[gek@tuffoserver tmp]$ ls *.txt
casualfile.txt
[gek@tuffoserver tmp]$ find . -name "*.txt" -exec bash -c "tail -1 '{}' && ls '{}'" \;
testonline1=some TEXT
./casualfile.txt
Here is my bash script that you can use to find multiple files and then process them all using a command.
Example of usage. This command applies the Linux file command to each found file:
./finder.sh file fb2 txt
Finder script:
# Find files and process them using an external command.
# Usage:
# ./finder.sh ./processing_script.sh txt fb2 fb2.zip doc docx
counter=0
find_results=()
for ext in "${@:2}"
do
    # see https://stackoverflow.com/a/54561526/10452175
    readarray -d '' ext_results < <(find . -type f -name "*.${ext}" -print0)
    for file in "${ext_results[@]}"
    do
        counter=$((counter+1))
        find_results+=("${file}")
        echo ${counter}") ${file}"
    done
done
countOfResults=$((counter))
echo -e "Found ${countOfResults} files.\n"
echo "Processing..."
counter=0
for file in "${find_results[@]}"
do
    counter=$((counter+1))
    echo -n ${counter}"/${countOfResults}) "
    eval "$1 '${file}'"
done
echo "All files have been processed."
