Linux "for" construct in ksh - ksh

I normally use the for construct in ksh to quickly iterate over a list of files to perform some action on it. It doesn't seem to work in this scenario:
The file info looks like:
$ ls -l tmp.*
-rw------- 1 op general 375 Jul 25 04:09 tmp.zzyhsg4
...
so on. Basically a lot of tmp.* files.
Now when I try
$ ls -lS | grep 'Jul 25' | grep 'tmp.*' | cut -d' ' -f9 | more
tmp.zzyhsg4
..
it will print only the file names as expected.
However when I try the below
$ for i in `ls -lS | grep 'Jul 25' | grep 'tmp.*' | cut -d' ' -f9`
>do
>echo $i
>done
This does not print the names of all the tmp.* files created on Jul 25 sorted by size; it prints the size column instead. Interestingly, if I use f6 for the cut it correctly prints the month column. It starts to break after f9.
Any ideas?

I can't reproduce exactly what you describe, but I have some suggestions to write more reliable commands.
cut -d' ' separates fields by spaces. If you have two spaces in a row, there's an empty field between them. So if you try with Aug 1 instead of Jul 25, the file name column is shifted by 1. And if you try with files that are more than 6 months old, the (5-character) time is replaced by a space followed by the 4-digit year. Also, depending on your version of ls there may be more than one space between some columns. Yet another issue is that some versions of ls don't display the group column. And then some file names contain spaces. And some file names contain special characters that ls may display as ?. In summary, you can't parse the output of ls -l by counting spaces, and you can't even parse the output of ls -l by counting whitespace-delimited fields. Just don't parse the output of ls.
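A quick way to see the empty-field effect for yourself (a toy illustration, not from the original question):
$ printf 'Aug  1\n' | cut -d' ' -f2    # prints an empty line: field 2 is the empty field between the two spaces
$ printf 'Aug  1\n' | cut -d' ' -f3    # prints 1: the day has shifted to field 3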
The standard command for generating lists of names of existing files is find. Since you mention Linux, I'll mention options that work with GNU find (the version you get on Linux) but not on other unixes.
Let's start simple: list the files called tmp.* in the current directory.
find . -name 'tmp.*'
We want only the files created on July 25, that's 7 days ago.
find . -name 'tmp.*' -daystart -mtime 7
This is fragile since it won't work tomorrow. The usual way to specify a precise date is to create files dated at the earliest and latest allowable times and tell find to only return files dated between these two.
touch -t 201007250000 .earliest
touch -t 201007260000 .latest
find . -name 'tmp.*' -newer .earliest \! -newer .latest
rm .earliest .latest
The find command explores subdirectories recursively. If you don't want this:
find . -name 'tmp.*' -daystart -mindepth 1 -maxdepth 1 -mtime 7
If you want the files sorted by size:
find . -name 'tmp.*' -daystart -mtime 7 -printf '%s\t%p\n' | sort -n -k 1 | cut -f 2-
Finally, if you want to operate on the files, never use find in backticks the way you used ls: the shell splits the output of `command` at whitespace and then does globbing on the resulting words, so this fails if the file names contain whitespace or certain special characters. Instead, use the -exec option of find; the ; version executes mycommand once per file with {} replaced by the file name, whereas the + version usually invokes mycommand only once, with {} replaced by the list of file names.
find . -name 'tmp.*' -daystart -mtime 7 -exec mycommand -myoption {} \;
find . -name 'tmp.*' -daystart -mtime 7 -exec mycommand -myoption {} +

To get all files named tmp.*, use
$ ls -lS tmp.*
With cut -d' ' you tell cut to use a single space as the delimiter. That will not work reliably: the number of spaces between fields varies, so you get a varying number of fields (between every two consecutive spaces there is an empty field).
Better use find, which can shape your file list to any form you like:
$ find tmp.* -printf "%s %CD %f\\n" | grep "07/25/10" | sort -n
(man find to also get the date filtering within the find command)
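If you have GNU find, here is a hedged sketch of doing the date filtering inside find itself with the -newermt test (adjust the pattern and dates to taste):
$ find . -maxdepth 1 -name 'tmp.*' -newermt '2010-07-25' ! -newermt '2010-07-26' -printf '%s %f\n' | sort -n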

Related

find the last created subdirectory in a directory of 500k subdirs

I have a folder with some 500k subfolders - and I would like to find the last directory which was added to this folder. I am having to do this due to a power failure issue :(
I don't exactly know when the power failed, so I am using this:
find . -type d -mmin -360 -print
which I believe covers the last 360 minutes? However, it gives me results which I am not exactly sure of.
In short, I would like to get the last directory which was created within this folder.
Any pointers would be great.
Suggesting:
find . -type d -printf "%C@ %p\n" | sort -n | tail -n1 | awk '{print $2}'
Explanation:
find . -type d -printf "%C@ %p\n"
find . start searching from the current directory, recursively
-type d match only directory files
-printf "%C@ %p\n" for each directory, print its last status change time in seconds since the Unix epoch (including the fractional part), followed by the file name with its path.
For example: 1648051886.4404644000 /tmp/mc-dudi
| sort -n | tail -n1
Sort the result from find as numbers, and print the last row.
awk '{print $2}'
From the last row, print only the second field.
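One caveat: awk '{print $2}' drops everything after the first space, so a path containing spaces gets truncated. A variant that keeps the full path (still assuming GNU find) is to cut away only the timestamp field:
find . -type d -printf '%C@ %p\n' | sort -n | tail -n1 | cut -d' ' -f2-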
You might try this: it shows your last modification date/time in a sortable manner, and by sorting it, the last entry should be the most recent one:
find ./ -exec ls -dils --time-style=long-iso {} \; | sort -k8,9
Edit: and specific for directories:
find ./ -type d -exec ls -dils --time-style=long-iso {} \; | sort -k8,9
Assuming you're using a file system that tracks file creation ('birth' is the usual terminology) times, and GNU versions of the programs used below:
find . -type d -exec stat --printf '%W\t%n\0' \{\} + | sort -z -k1,1nr | head -1 -z | cut -f 2-
This will find all subdirectories of the current working directory, and for each one, print its birth time (The %W format field for stat(1)) and name (The %n format). Those entries are then sorted based on the timestamp, newest first, and the first line is returned minus the timestamp.
Unfortunately, GNU find's -printf doesn't support birth times, so it calls out to stat(1) to get those, using the multi-argument version of -exec to minimize the number of instances of the program that need to be run. The rest is straightforward sorting of a column, using 0-byte terminators instead of newlines to robustly handle filenames with newlines or other funky characters in them.
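One caveat worth checking first: stat reports %W as 0 when the filesystem or kernel does not expose birth times, in which case every directory sorts equally. A quick sanity check on the current directory:
stat --printf '%W\n' .    # 0 means no birth time is recorded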
Maintaining a symbolic link to the last known subdirectory could avoid listing all of them to find the latest one.
ls -dl $(readlink ~/tmp/last_dir)
drwxr-xr-x 2 lmc users 4096 Jan 13 13:20 /home/lmc/Documents/some_dir
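The link itself would be refreshed by whatever creates a new directory, for example (using the path from the listing above):
ln -sfn /home/lmc/Documents/some_dir ~/tmp/last_dir    # -n replaces the existing link instead of descending into it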
Find newer ones
ls -ldt $(find -L . -newer ~/tmp/last_dir -type d ! -path .)
drwxr-xr-x 2 lmc users 6 Mar 1 00:00 ./dir2
drwxr-xr-x 2 lmc users 6 Feb 1 00:00 ./dir1
Or
ls -ldt $(find -L . -newer ~/tmp/last_dir -type d ! -path .) | head -n 1
drwxr-xr-x 2 lmc users 6 Mar 1 00:00 ./dir2
Don't use the chosen answer if you really want to find the last created sub-directory
According to the question:
Directories should be sorted by creation time instead of modification time.
find -mindepth 1 is necessary because we want to search only sub-directories.
Here are 2 solutions that both fulfill the 2 requirements:
GNU
find . -mindepth 1 -type d -exec stat -c '%W %n' '{}' '+' |
sort -nr | head -n1
BSD
find . -mindepth 1 -type d -exec stat -f '%B %N' '{}' '+' |
sort -nr | head -n1

How to view file date of result of find command in bash

I use a find command to find some kinds of files in bash. Everything goes fine except that the result shown to me contains only the file name, and not the (last modification) date of the file. I tried to pipe it into ls or ls -ltr but it just does not show the file date column in the result. I also tried this:
ls -ltr | find . -ctime 1
but it didn't actually work.
Can you please guide me how can I view the filedate of files returned by a find command?
You need either xargs or -exec for this:
find . -ctime 1 -exec ls -l {} \;
find . -ctime 1 | xargs ls -l
(The first executes ls on every found file individually; the second bunches them up into one or more big ls invocations, so that they may be formatted slightly better.)
If all you want is to display an ls like output you can use the -ls option of find:
$ find . -name resolv.conf -ls
1048592 8 -rw-r--r-- 1 root root 126 Dec 9 10:12 ./resolv.conf
If you want only the timestamp you'll need to look at the -printf option
$ find . -name resolv.conf -printf "%a\n"
Mon May 21 09:15:24 2012
find . -ctime 1 -printf '%t\t%p\n'
prints the datetime and file path, separated by a tab character.

unix command to find most recent directory created

I want to copy the files from the most recent directory created. How would I do so in unix?
For example, if I have the directories names as date stamp as such:
/20110311
/20110318
/20110325
This is the answer to the question I think you are asking.
When I deal with many directories that have date/time stamps in the name, I always take the approach you have, which is YYYYMMDD: the great thing about that is that date order is then also alphabetical order. In most shells (certainly in bash, and I am 90% sure of the others), the '*' expansion is done alphabetically, and by default 'ls' returns alphabetical order. Hence
ls | head -1
ls | tail -1
Give you the earliest and the latest dates in the directory.
This can be extended to only keep the last 5 entries etc.
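For example, relying on the same alphabetical ordering (head -n -5 is a GNU extension):
ls | tail -n 5     # the five most recent date-stamped directories
ls | head -n -5    # everything except the newest five, e.g. candidates for cleanup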
lastdir=`ls -tr <parentdir> | tail -1`
After some experimenting, I came up with the following:
The unix stat command is useful here. The '-t' option causes stat to print its output in terse mode (all in one line), and the 13th element of that terse output is the unix timestamp (seconds since epoch) for the last-modified time. This command will list all directories (and sub-directories) in order from newest-modified to oldest-modified:
find -type d -exec stat -t {} \; | sort -r -n -k 13,13
Hopefully the "terse" mode of stat will remain consistent in future releases of stat !
Here's some explanation of the command-line options used:
find -type d # only find directories
find -exec [command] {} \; # execute given command against each *found* file.
sort -r # reverse the sort
sort -n # numeric sort (100 should not appear before 2!)
sort -k M,N # only sort the line using elements M through N.
Returning to your original request, to copy files, maybe try the following. To output just a single directory (the most recent), append this to the command (notice the initial pipe), and feed it all into your 'cp' command with backticks.
| head --lines=1 | sed 's/\ .*$//'
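Putting the pieces together into one command (a sketch only; /path/to/target is a placeholder, -mindepth 1 -maxdepth 1 keeps '.' itself and nested subdirectories out of the race, and the directory names must not contain spaces):
cp `find . -mindepth 1 -maxdepth 1 -type d -exec stat -t {} \; | sort -r -n -k 13,13 | head --lines=1 | sed 's/\ .*$//'`/* /path/to/target/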
The trouble with the ls based solutions is that they are not filtering just for directories. I think this:
cp `find . -mindepth 1 -maxdepth 1 -type d -exec stat -c "%Y %n" {} \; |sort -n -r |head -1 |awk '{print $2}'`/* /target-directory/.
might do the trick, though note that that will only copy files in the immediate directory. If you want a more general answer for copying anything below your newest directory over to a new directory I think you would be better off using rsync like:
rsync -av `find . -mindepth 1 -maxdepth 1 -type d -exec stat -c "%Y %n" {} \; |sort -n -r |head -1 |awk '{print $2}'`/ /target-directory/
but it depends a bit which behaviour you want. The explanation of the stuff in the backticks is:
. - the current directory (you may want to specify an absolute path here)
-mindepth/-maxdepth - restrict the find command only to the immediate children of the current directory
-type d - only directories
-exec stat .. - outputs the modified time and the name of the directory from find
sort -n -r |head -1 | awk '{print $2}' - date orders the directory and outputs the name of the most recently modified
If your directories are named YYYYMMDD like your question suggests, take advantage of the alphabetic globbing.
Put all directories in an array, and then pick the first one:
dirs=(*/); first_dir="$dirs";
(This is actually a shortcut for first_dir="${dirs[0]}";.)
Similarly, for the last one:
dirs=(*/); last_dir="${dirs[$((${#dirs[@]} - 1))]}";
Ugly syntax, but this is what it breaks down to:
# Create an array of all directories inside the working directory.
dirs=(*/);
# Get the number of entries in the array.
num_dirs=${#dirs[@]};
# Calculate the index of the last entry.
last_index=$(($num_dirs - 1));
# Get the value at the last index.
last_dir="${dirs[$last_index]}";
I know this is an old question with an accepted answer, but I think this method is preferable as it does everything in Bash. No reason to spawn extra processes, let alone parse the output of ls. (Which, admittedly, should be fine in this particular case of YYYYMMDD names.)
Please try the following command:
ls -1tr | tail -1
find ~ -type d | xargs ls -dltr
This is a simple and useful one I learned recently (ls needs the directory names as arguments, hence the xargs; it breaks on names containing spaces).
It shows the results sorted by modification time, with the newest at the bottom.
I wrote a command that can be used to identify which folder or file was most recently created in a folder. It seems neat :)
#!/bin/sh
path=/var/folder_name
newest=`find $path -maxdepth 1 -exec stat -t {} \; | sed 1d | sort -r -k 14 | head -1 | awk '{print $1}' | sed 's/\.\///g'`
find $path -maxdepth 1 | sed 1d | grep -v $newest

remove old backup files

# find /home/shantanu -name 'my_stops*' | xargs ls -lt | head -2
The command mentioned above will list the latest 2 files having my_stops in their names. I want to keep these 2 files. But I want to delete all other files starting with "my_stops" from the current directory.
If you create backups on a regular basis, it may be useful to use the -atime option of find so only files older than your last two backups can be selected for deletion.
For daily backups you might use
$ find /home/shantanu -atime +2 -name 'my_stops*' -exec rm {} \;
but a different expression (other than -atime) may suit you better.
In the example I used +2 to mean more than 2 days.
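For instance, -mtime (modification time) is often a better fit than -atime, since merely reading a backup updates its access time:
$ find /home/shantanu -mtime +2 -name 'my_stops*' -exec rm {} \;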
Here is a non-recursive solution:
ls -t my_stops* | awk 'NR>2 {system("rm \"" $0 "\"")}'
Explanation:
The ls command lists files with the latest 2 on top
The awk command states that for those lines (NR = number of records, i.e. lines) greater than 2, delete them
The quote characters are needed just in case the file names have embedded spaces
See here
(ls -t|head -n 2;ls)|sort|uniq -u|xargs rm
find /home/shantanu -name 'my_stops*' | xargs ls -lt | tail -n +2
That will show you the listing from the second line onward ;)
Just keep in mind that find is recursive ;)
Without recursive approach:
find /home/folder/ -maxdepth 1 -name "*.jpg" -mtime +2

How can I count all the lines of code in a directory recursively?

We've got a PHP application and want to count all the lines of code under a specific directory and its subdirectories.
We don't need to ignore comments, as we're just trying to get a rough idea.
wc -l *.php
That command works great for a given directory, but it ignores subdirectories. I was thinking the following command might work, but it is returning 74, which is definitely not the case...
find . -name '*.php' | wc -l
What's the correct syntax to feed in all the files from a directory recursively?
Try:
find . -name '*.php' | xargs wc -l
or (when file names include special characters such as spaces)
find . -name '*.php' | sed 's/.*/"&"/' | xargs wc -l
The SLOCCount tool may help as well.
It will give an accurate source lines of code count for whatever
hierarchy you point it at, as well as some additional stats.
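A typical invocation, if the sloccount package is installed, is simply to point it at the directory:
sloccount .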
Sorted output:
find . -name '*.php' | xargs wc -l | sort -nr
For another one-liner:
( find ./ -name '*.php' -print0 | xargs -0 cat ) | wc -l
It works on names with spaces and only outputs one number.
You can use the cloc utility, which is built for this exact purpose. It reports the number of lines in each language, together with how many of them are comments, etc. cloc is available on Linux, Mac and Windows.
Usage and output example:
$ cloc --exclude-lang=DTD,Lua,make,Python .
2570 text files.
2200 unique files.
8654 files ignored.
http://cloc.sourceforge.net v 1.53 T=8.0 s (202.4 files/s, 99198.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JavaScript                    1506          77848         212000         366495
CSS                             56           9671          20147          87695
HTML                            51           1409            151           7480
XML                              6           3088           1383           6222
-------------------------------------------------------------------------------
SUM:                          1619          92016         233681         467892
-------------------------------------------------------------------------------
If using a decently recent version of Bash (or ZSH), it's much simpler:
wc -l **/*.php
In the Bash shell this requires the globstar option to be set, otherwise the ** glob-operator is not recursive. To enable this setting, issue
shopt -s globstar
To make this permanent, add it to one of the initialization files (~/.bashrc, ~/.bash_profile etc.).
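For example (assuming your interactive shells read ~/.bashrc):
echo 'shopt -s globstar' >> ~/.bashrc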
On Unix-like systems, there is a tool called cloc which provides code statistics.
I ran it on a random directory in our code base; it says:
59 text files.
56 unique files.
5 files ignored.
http://cloc.sourceforge.net v 1.53 T=0.5 s (108.0 files/s, 50180.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               36           3060           1431          16359
C/C++ Header                    16            689            393           3032
make                             1             17              9             54
Teamcenter def                   1             10              0             36
-------------------------------------------------------------------------------
SUM:                            54           3776           1833          19481
-------------------------------------------------------------------------------
You didn't specify how many files there are or what the desired output should be.
This may be what you are looking for:
find . -name '*.php' | xargs wc -l
Yet another variation :)
$ find . -name '*.php' | xargs cat | wc -l
This will give the total sum, instead of file-by-file.
(The explicit . after find is needed with versions of find that require a starting path.)
Use find's -exec and awk. Here we go:
find . -type f -exec wc -l {} \; | awk '{ SUM += $0} END { print SUM }'
This snippet finds for all files (-type f). To find by file extension, use -name:
find . -name '*.py' -exec wc -l '{}' \; | awk '{ SUM += $0; } END { print SUM; }'
A more general form, if you need to count files with several different extensions (say, native sources as well):
wc $(find . -type f | egrep "\.(h|c|cpp|php|cc)$")
The tool Tokei displays statistics about code in a directory. Tokei will show the number of files, total lines within those files and code, comments, and blanks grouped by language. Tokei is also available on Mac, Linux, and Windows.
An example of the output of Tokei is as follows:
$ tokei
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 CSS                      2           12           12            0            0
 JavaScript               1          435          404            0           31
 JSON                     3          178          178            0            0
 Markdown                 1            9            9            0            0
 Rust                    10          408          259           84           65
 TOML                     3           69           41           17           11
 YAML                     1           30           25            0            5
-------------------------------------------------------------------------------
 Total                   21         1141          928          101          112
-------------------------------------------------------------------------------
Tokei can be installed by following the instructions on the README file in the repository.
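As a sketch, assuming you have Rust's cargo available (the README lists several other installation methods):
cargo install tokei    # install the binary
tokei .                # report on the current directory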
POSIX
Unlike most other answers here, these work on any POSIX system, for any number of files, and with any file names (except where noted).
Lines in each file:
find . -name '*.php' -type f -exec wc -l {} \;
# faster, but includes total at end if there are multiple files
find . -name '*.php' -type f -exec wc -l {} +
Lines in each file, sorted by file path
find . -name '*.php' -type f | sort | xargs -L1 wc -l
# for files with spaces or newlines, use the non-standard sort -z
find . -name '*.php' -type f -print0 | sort -z | xargs -0 -L1 wc -l
Lines in each file, sorted by number of lines, descending
find . -name '*.php' -type f -exec wc -l {} \; | sort -nr
# faster, but includes total at end if there are multiple files
find . -name '*.php' -type f -exec wc -l {} + | sort -nr
Total lines in all files
find . -name '*.php' -type f -exec cat {} + | wc -l
There is a little tool called sloccount to count the lines of code in a directory.
It should be noted that it does more than you want as it ignores empty lines/comments, groups the results per programming language and calculates some statistics.
You want a simple for loop:
total_count=0
for file in $(find . -name '*.php' -print)
do
    count=$(wc -l < "$file")
    let total_count+=count
done
echo "$total_count"
For sources only:
wc `find`
To filter, just use grep:
wc `find | grep .php$`
A straightforward command that is fast, uses all the search/filtering power of find, does not fail when there are too many files (argument-count overflow), works fine with file names containing funny symbols, avoids xargs, and does not launch a uselessly high number of external commands (thanks to + in find's -exec). Here you go:
find . -name '*.php' -type f -exec cat -- {} + | wc -l
None of the answers so far gets at the problem of filenames with spaces.
Additionally, all that use xargs are subject to fail if the total length of paths in the tree exceeds the shell environment size limit (defaults to a few megabytes in Linux).
Here is one that fixes these problems in a pretty direct manner. The subshell takes care of files with spaces. The awk totals the stream of individual file wc outputs, so it ought never to run out of space. It also restricts the exec to files only (skipping directories):
find . -type f -name '*.php' -exec bash -c 'wc -l "$0"' {} \; | awk '{s+=$1} END {print s}'
I know the question is tagged as bash, but it seems that the problem you're trying to solve is also PHP related.
Sebastian Bergmann wrote a tool called PHPLOC that does what you want and on top of that provides you with an overview of a project's complexity. This is an example of its report:
Size
Lines of Code (LOC) 29047
Comment Lines of Code (CLOC) 14022 (48.27%)
Non-Comment Lines of Code (NCLOC) 15025 (51.73%)
Logical Lines of Code (LLOC) 3484 (11.99%)
Classes 3314 (95.12%)
Average Class Length 29
Average Method Length 4
Functions 153 (4.39%)
Average Function Length 1
Not in classes or functions 17 (0.49%)
Complexity
Cyclomatic Complexity / LLOC 0.51
Cyclomatic Complexity / Number of Methods 3.37
As you can see, the information provided is a lot more useful from the perspective of a developer, because it can roughly tell you how complex a project is before you start working with it.
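A typical invocation, assuming you have downloaded the phploc.phar release (the source path here is a placeholder):
php phploc.phar /path/to/project/src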
If you want to keep it simple, cut out the middleman and just call wc with all the filenames:
wc -l `find . -name "*.php"`
Or in the modern syntax:
wc -l $(find . -name "*.php")
This works as long as there are no spaces in any of the directory names or filenames. And as long as you don't have tens of thousands of files (modern shells support really long command lines). Your project has 74 files, so you've got plenty of room to grow.
wc -l? Better use grep -c ^
wc -l? Wrong!
The wc command counts newline characters, not lines! When the last line in the file does not end with a newline, it will not be counted!
If you still want to count lines, use grep -c ^. Full example:
# This example prints the line count for every file found
total=0
while read -r FILE; do
    # use 'grep -c ^' instead of 'wc -l' so the last line is counted even without a trailing newline
    count=$(grep -c ^ < "$FILE")
    echo "$FILE has $count lines"
    let total=total+count   # bash arithmetic; convert this for another shell if needed
done < <(find /path -type f -name "*.php")   # feeding the loop this way keeps $total in the current shell
echo "TOTAL LINES COUNTED: $total"
Finally, watch out for the wc -l trap (it counts newlines, not lines!!!).
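A quick demonstration of the difference, using a throwaway file:
printf 'one\ntwo' > /tmp/demo    # two lines of text, but no trailing newline
wc -l < /tmp/demo                # prints 1
grep -c ^ /tmp/demo              # prints 2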
Giving out the longest files first (i.e. maybe these long files need some refactoring love?), and excluding some vendor directories:
find . -name '*.php' | xargs wc -l | sort -nr | egrep -v "libs|tmp|tests|vendor" | less
For Windows, an easy-and-quick tool is LocMetrics.
You can use a utility called codel (link). It's a simple Python module to count lines with colorful formatting.
Installation
pip install codel
Usage
To count lines of C++ files (with .cpp and .h extensions), use:
codel count -e .cpp .h
You can also ignore some files/folder with the .gitignore format:
codel count -e .py -i tests/**
It will ignore all the files in the tests/ folder.
You can also shorten the output with the -s flag; it hides the per-file information and shows only a summary for each extension.
If you want your results sorted by number of lines, you can just add | sort or | sort -r (-r for descending order) to the first answer, like so:
find . -name '*.php' | xargs wc -l | sort -r
If the files are too many, better to just look for the total line count.
find . -name '*.php' | xargs wc -l | grep -i ' total' | awk '{print $1}'
Very simply:
find /path -type f -name "*.php" | while read -r FILE
do
    count=$(wc -l < "$FILE")
    echo "$FILE has $count lines"
done
Something different:
wc -l `tree -if --noreport | grep -e'\.php$'`
This works out fine, but you need to have at least one *.php file in the current folder or one of its subfolders, or else wc stalls.
It’s very easy with Z shell (zsh) globs:
wc -l ./**/*.php
If you are using Bash, you just need to upgrade. There is absolutely no reason to use Bash.
On OS X at least, the find+xargs+wc commands listed in some of the other answers print "total" several times on large listings, and no complete total is given. I was able to get a single total for .c files using the following command:
find . -name '*.c' -print0 |xargs -0 wc -l|grep -v total|awk '{ sum += $1; } END { print "SUM: " sum; }'
If you need just the total number of lines in, let's say, your PHP files, you can use very simple one line command even under Windows if you have GnuWin32 installed. Like this:
cat `/gnuwin32/bin/find.exe . -name *.php` | wc -l
You need to specify exactly where find.exe is, otherwise the Windows-provided FIND.EXE (from the old DOS-like commands) will be executed, since it probably comes before GnuWin32 in the PATH environment variable and has different parameters and results.
Please note that in the command above you should use back-quotes, not single quotes.
While I like the scripts, I prefer this one as it also shows a per-file summary as well as a total:
wc -l `find . -name "*.php"`
