Is there a bash command which counts files? - bash

Is there a bash command which counts the number of files that match a pattern?
For example, I want to get the count of all files in a directory which match this pattern: log*

This simple one-liner should work in any shell, not just bash:
ls -1q log* | wc -l
ls -1q will give you one line per file, even if they contain whitespace or special characters such as newlines.
The output is piped to wc -l, which counts the number of lines.
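For instance, in a directory containing three matching files (hypothetical names log1, log2 and "log 3"), you would expect:
$ ls -1q log* | wc -l
3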

Lots of answers here, but some don't take into account
file names with spaces, newlines, or control characters in them
file names that start with hyphens (imagine a file called -l)
hidden files that start with a dot (if the glob were *.log instead of log*)
directories that match the glob (e.g. a directory called logs that matches log*)
empty directories (i.e. the result is 0)
extremely large directories (listing them all could exhaust memory)
Here's a solution that handles all of them:
ls 2>/dev/null -Ubad1 -- log* | wc -l
Explanation:
-U causes ls to not sort the entries, meaning it doesn't need to load the entire directory listing in memory
-b prints C-style escapes for nongraphic characters, crucially causing newlines to be printed as \n.
-a prints out all files, even hidden files (not strictly needed when the glob log* implies no hidden files)
-d prints out directories without attempting to list the contents of the directory, which is what ls normally would do
-1 makes sure that it's on one column (ls does this automatically when writing to a pipe, so it's not strictly necessary)
2>/dev/null redirects stderr, so that if there are 0 log files the error message from ls is suppressed. (Note that shopt -s nullglob would cause ls to list the entire working directory instead.)
wc -l consumes the directory listing as it's being generated, so the output of ls is never in memory at any point in time.
-- separates the file names from the options, so that a name beginning with a hyphen (like the -l example above) is not mistaken for an argument to ls
The shell will expand log* to the full list of files, which may exhaust memory if there are a lot of files, so it can be better to run it through grep instead:
ls -Uba1 | grep ^log | wc -l
This last one handles extremely large directories of files without using a lot of memory (albeit it does use a subshell). The -d is no longer necessary, because it's only listing the contents of the current directory.

You can do this safely (i.e. won't be bugged by files with spaces or \n in their name) with bash:
$ shopt -s nullglob
$ logfiles=(*.log)
$ echo ${#logfiles[@]}
You need to enable nullglob so that you don't get the literal *.log in the $logfiles array if no files match. (See How to "undo" a 'set -x'? for examples of how to safely reset it.)
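If you'd rather not leave nullglob enabled, a minimal sketch of saving and restoring it around the count (the variable name is just illustrative):
saved_nullglob=$(shopt -p nullglob)   # records the current state as a shopt command
shopt -s nullglob
logfiles=(*.log)
echo "${#logfiles[@]}"
eval "$saved_nullglob"                # restores whatever the setting was before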

For a recursive search:
find . -type f -name '*.log' -printf x | wc -c
wc -c will count the number of characters in the output of find, while -printf x tells find to print a single x for each result. This avoids any problems with files with odd names which contain newlines etc.
For a non-recursive search, do this:
find . -maxdepth 1 -type f -name '*.log' -printf x | wc -c
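If your find lacks GNU -printf, a rough equivalent (assuming it still supports -maxdepth and -exec ... + batching) is to let printf emit one character per argument; note that some find implementations may run the command once even with no matches, reporting 1 instead of 0:
find . -maxdepth 1 -type f -name '*.log' -exec printf 'x%.0s' {} + | wc -c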

The accepted answer for this question is wrong, but I have low rep so can't add a comment to it.
The correct answer to this question is given by Mat:
shopt -s nullglob
logfiles=(*.log)
echo ${#logfiles[@]}
The problem with the accepted answer is that wc -l counts the number of newline characters, and counts them even if they print to the terminal as '?' in the output of 'ls -l'. This means that the accepted answer FAILS when a filename contains a newline character. I have tested the suggested command:
ls -l log* | wc -l
and it erroneously reports a value of 2 even if there is only 1 file matching the pattern whose name happens to contain a newline character. For example:
touch log$'\n'def
ls log* -l | wc -l
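With such a file present, the two approaches disagree; roughly, you would expect something like this (the array counts files, wc -l counts lines):
$ ls -l log* | wc -l
2
$ shopt -s nullglob; logfiles=(log*); echo "${#logfiles[@]}"
1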

An important comment
(not enough reputation to comment)
This is BUGGY:
ls -1q some_pattern | wc -l
If shopt -s nullglob happens to be set, it prints the number of ALL regular files, not just the ones matching the pattern (tested on CentOS-8 and Cygwin). Who knows what other meaningless bugs ls has?
This is CORRECT and much faster:
shopt -s nullglob; files=(some_pattern); echo ${#files[@]};
It does the expected job.
The running times differ, too.
The 1st: 0.006 s on CentOS, and 0.083 s on Cygwin (when used with care).
The 2nd: 0.000 s on CentOS, and 0.003 s on Cygwin.

If you have a lot of files and you don't want to use the elegant shopt -s nullglob and bash array solution, you can use find and so on as long as you don't print out the file name (which might contain newlines).
find -maxdepth 1 -name "log*" -not -name ".*" -printf '%i\n' | wc -l
This will find all files that match log* and don't start with a dot. The -not -name ".*" is redundant here, but it's worth noting that ls hides dot-files by default, whereas find includes them.
This is a correct answer, and handles any type of file name you can throw at it, because the file name is never passed around between commands.
But, the shopt nullglob answer is the best answer!

Here is my one-liner for this.
file_count=$( shopt -s nullglob ; set -- $directory_to_search_inside/* ; echo $#)
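Note that quoting the variable part of the path (while leaving the /* unquoted so it still globs) also protects directories whose names contain spaces; for example, with a made-up directory:
directory_to_search_inside='/var/log/my app'
file_count=$( shopt -s nullglob ; set -- "$directory_to_search_inside"/* ; echo $# )
echo "$file_count"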

You can define such a command easily, using a shell function. This method does not require any external program and does not spawn any child process. It does not attempt hazardous ls parsing and handles “special” characters (whitespaces, newlines, backslashes and so on) just fine. It only relies on the file name expansion mechanism provided by the shell. It is compatible with at least sh, bash and zsh.
The line below defines a function called count which prints the number of arguments with which it has been called.
count() { echo $#; }
Simply call it with the desired pattern:
count log*
For the result to be correct when the globbing pattern has no match, the shell option nullglob (or failglob — which is the default behavior on zsh) must be set at the time expansion happens. It can be set like this:
shopt -s nullglob # for sh / bash
setopt nullglob # for zsh
Depending on what you want to count, you might also be interested in the shell option dotglob.
Unfortunately, with bash at least, it is not easy to set these options locally. If you don’t want to set them globally, the most straightforward solution is to use the function in this more convoluted manner:
( shopt -s nullglob ; shopt -u failglob ; count log* )
If you want to recover the lightweight syntax count log*, or if you really want to avoid spawning a subshell, you may hack something along the lines of:
# sh / bash:
# the alias is expanded before the globbing pattern, so we
# can set required options before the globbing gets expanded,
# and restore them afterwards.
count() {
eval "$_count_saved_shopts"
unset _count_saved_shopts
echo $#
}
alias count='
_count_saved_shopts="$(shopt -p nullglob failglob)"
shopt -s nullglob
shopt -u failglob
count'
As a bonus, this function is of a more general use. For instance:
count a* b* # count files which match either a* or b*
count $(jobs -ps) # count stopped jobs (sh / bash)
By turning the function into a script file (or an equivalent C program), callable from the PATH, it can also be composed with programs such as find and xargs:
find "$FIND_OPTIONS" -exec count {} \+ # count results of a search

You can use the -R option to list files recursively, including those inside subdirectories:
ls -R | wc -l   # to count all the files
ls -R | grep log | wc -l   # to count the files whose names contain "log"
You can use patterns with grep.

I've given this answer a lot of thought, especially given the don't-parse-ls stuff. At first, I tried
<WARNING! DID NOT WORK>
du --inodes --files0-from=<(find . -maxdepth 1 -type f -print0) | awk '{sum+=int($1)}END{print sum}'
</WARNING! DID NOT WORK>
which worked if there was only a filename like
touch $'w\nlf.aa'
but failed if I made a filename like this
touch $'firstline\n3 and some other\n1\n2\texciting\n86stuff.jpg'
I finally came up with what I'm putting below. Note I was trying to get a count of all files in the directory (not including any subdirectories). I think it, along with the answers by @Mat and @Dan_Yard, is correct, and it meets at least most of the requirements set out by @mogsie (I'm not sure about memory). I think the answer by @mogsie is correct, but I always try to stay away from parsing ls unless it's an extremely specific situation.
awk -F"\0" '{print NF-1}' < <(find . -maxdepth 1 -type f -print0) | awk '{sum+=$1}END{print sum}'
More readably:
awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
This is doing a find specifically for files, delimiting the output with a null character (to avoid problems with spaces and linefeeds), then counting the number of null characters. The number of files will be one less than the number of null characters, since there will be a null character at the end.
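If you prefer to skip the double awk, counting the NUL bytes directly gives the same number (tr -cd deletes every byte except NUL, and wc -c counts what is left):
find . -maxdepth 1 -type f -print0 | tr -cd '\0' | wc -c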
To answer the OP's question, there are two cases to consider
1) Non-recursive search:
awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -name "log*" -print0) | \
awk '{sum+=$1}END{print sum}'
2) Recursive search. Note that what's inside the -name parameter might need to be changed for slightly different behavior (hidden files, etc.).
awk -F"\0" '{print NF-1}' < \
<(find . -type f -name "log*" -print0) | \
awk '{sum+=$1}END{print sum}'
If anyone would like to comment on how these answers compare to those I've mentioned in this answer, please do.
Note, I got to this thought process while getting this answer.

This can be done with standard POSIX shell grammar.
Here is a simple count_entries function:
#!/usr/bin/env sh
count_entries()
{
# Emulating Bash nullglob
# If argument 1 is not an existing entry
if [ ! -e "$1" ]
# argument is a returned pattern
# then shift it out
then shift
fi
echo $#
}
for a compact definition:
count_entries(){ [ ! -e "$1" ]&&shift;echo $#;}
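Hypothetical usage, once the function has been defined or sourced:
count_entries log*    # entries matching log* in the current directory
count_entries *       # all non-hidden entries (prints 0 in an empty directory)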
Featured POSIX compatible file counter by type:
#!/usr/bin/env sh
count_files()
# Count the file arguments matching the file operator
# Synopsis:
# count_files operator FILE [...]
# Arguments:
# $1: The file operator
# Allowed values:
# -a FILE True if file exists.
# -b FILE True if file is block special.
# -c FILE True if file is character special.
# -d FILE True if file is a directory.
# -e FILE True if file exists.
# -f FILE True if file exists and is a regular file.
# -g FILE True if file is set-group-id.
# -h FILE True if file is a symbolic link.
# -L FILE True if file is a symbolic link.
# -k FILE True if file has its `sticky' bit set.
# -p FILE True if file is a named pipe.
# -r FILE True if file is readable by you.
# -s FILE True if file exists and is not empty.
# -S FILE True if file is a socket.
# -t FD True if FD is opened on a terminal.
# -u FILE True if the file is set-user-id.
# -w FILE True if the file is writable by you.
# -x FILE True if the file is executable by you.
# -O FILE True if the file is effectively owned by you.
# -G FILE True if the file is effectively owned by your group.
# -N FILE True if the file has been modified since it was last read.
# $#: The files arguments
# Output:
# The number of matching files
# Return:
# 1: Unknown file operator
{
operator=$1
shift
case $operator in
-[abcdefghLkprsStuwxOGN])
for arg; do
# If file is not of required type
if ! test "$operator" "$arg"; then
# Shift it out
shift
fi
done
echo $#
;;
*)
printf 'Invalid file operator: %s\n' "$operator" >&2
return 1
;;
esac
}
count_files "$@"
Example usages:
count_files -f log*.txt
count_files -d datadir*
Alternative: count non-directory entries without a loop:
#!/bin/sh
# Creates strings of as many dots as expanded arguments
# dotted string for entries matching star pattern
star=$(printf '%.0s.' ./*)
# dotted string for entries matching star slash pattern (directories)
star_dir=$(printf '%.0s.' ./*/)
# dotted string for entries matching dot star pattern
dot_star=$(printf '%.0s.' ./.*)
# dotted string for entries matching dot star slash pattern (directories)
dot_star_dir=$(printf '%.0s.' ./.*/)
# Print pattern matches count excluding directories matches
printf 'Files count: %d\n' $((
${#star} - ${#star_dir} +
${#dot_star} - ${#dot_star_dir}
))
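To illustrate with a made-up directory containing a.txt, b.txt, a subdirectory d and a hidden file .cfg:
# ./*    -> ./a.txt ./b.txt ./d   => star has 3 dots
# ./*/   -> ./d/                  => star_dir has 1 dot
# ./.*   -> ./. ./.. ./.cfg       => dot_star has 3 dots
# ./.*/  -> ././ ./../            => dot_star_dir has 2 dots
# Files count: (3 - 1) + (3 - 2) = 3   (a.txt, b.txt and .cfg)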

Here is a generic Bash function you can use in your scripts.
# #see https://stackoverflow.com/a/11307382/430062
function countFiles {
shopt -s nullglob
logfiles=($1)
echo ${#logfiles[@]}
}
FILES_COUNT=$(countFiles "$file-*")
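Note the quotes around the argument: the pattern must reach the function unexpanded, because the unquoted $1 inside the function is what performs the globbing. A hypothetical call with a literal path:
LOG_COUNT=$(countFiles "/var/log/nginx/access.log*")
echo "$LOG_COUNT"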

ls -1 log* | wc -l
This lists one file per line and pipes the output to the word-count command, whose -l switch counts lines.

Here's what I always do:
ls log* | awk 'END{print NR}'

To count everything, just pipe ls to wc -l:
ls | wc -l
To count with pattern, pipe to grep first:
ls | grep log | wc -l

Related

How do I find duplicate files by comparing them by size (ie: not hashing) in bash

How do I find duplicate files by comparing them by size (ie: not hashing) in bash.
Testbed files:
-rw-r--r-- 1 usern users 68239 May 3 12:29 The W.pdf
-rw-r--r-- 1 usern users 68239 May 3 12:29 W.pdf
-rw-r--r-- 1 usern users 8 May 3 13:43 X.pdf
Yes, files can have spaces (Boo!).
I want to check files in the same directory, move the ones which match something else into 'these are probably duplicates' folder.
My probable use-case is going to have humans randomly mis-naming a smaller set of files (ie: not generating files of arbitrary length). It is fairly unlikely that two files will be the same size and yet be different files. Sure, as a backup I could hash and check two files of identical size. But mostly, it will be people taking a file and misnaming it / re-adding it to a pile, of which it is already there.
So, preferably a solution with widely installed tools (posix?). And I'm not supposed to parse the output of ls, so I need another way to get actual size (and not a du approximate).
"Vote to close!"
Hold up cowboy.
I bet you're going to suggest this (cool, you can google search):
https://unix.stackexchange.com/questions/71176/find-duplicate-files
No fdupes (nor jdupes, nor...), nor finddup, nor rmlint, nor fslint - I can't guarantee those on other systems (much less mine), and I don't want to be stuck as customer support dealing with installing them on random systems from now to eternity, nor even in getting emails about that sh...stuff and having to tell them to RTFM and figure it out. Plus, in reality, I should write my script to test functionality of what is installed, but, that's beyond the scope.
https://unix.stackexchange.com/questions/192701/how-to-remove-duplicate-files-using-bash
All these solutions want to start by hashing. Some cool ideas in some of these: hash just a chunk of both files, starting somewhere past the header, then only do full compare if those turn up matching. Good idea for double checking work, but would prefer to only do that on the very, very few that actually are duplicate. As, looking over the first several thousand of these by hand, not one duplicate has been even close to a different file.
https://unix.stackexchange.com/questions/277697/whats-the-quickest-way-to-find-duplicated-files
Proposed:
$find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
Breaks for me:
find: unknown option -- n
usage: find [-dHhLXx] [-f path] path ... [expression]
uniq: unknown option -- w
usage: uniq [-ci] [-d | -u] [-f fields] [-s chars] [input_file [output_file]]
find: unknown option -- t
usage: find [-dHhLXx] [-f path] path ... [expression]
xargs: md5sum: No such file or directory
https://unix.stackexchange.com/questions/170693/compare-directory-trees-regarding-file-name-and-size-and-date
Haven't been able to figure out how rsync -nrvc --delete might work in the same directory, but there might be solution in there.
Well how about cmp? Yeah, that looks pretty good, actually!
cmp -z file1 file2
Bummer, my version of cmp does not include the -z size option.
However, I tried implementing it just for grins - and when it failed, looking at it I realized that I also need help constructing my loop logic. Removing things from my loops in the midst of processing them is probably a recipe for breakage, duh.
if [ ! -d ../Dupes/ ]; then
mkdir ../Dupes/ || exit 1 # Cuz no set -e, and trap not working
fi
for i in ./*
do
for j in ./*
do
if [[ "$i" != "$j" ]]; then # Yes, it will be identical to itself
if [[ $(cmp -s "$i" "$j") ]]; then
echo "null" # Cuz I can't use negative of the comparison?
else
mv -i "$i" ../Dupes/
fi
fi
done
done
https://unix.stackexchange.com/questions/367749/how-to-find-and-delete-duplicate-files-within-the-same-directory
Might have something I could use, but I'm not following what's going on in there.
https://superuser.com/questions/259148/bash-find-duplicate-files-mac-linux-compatible
If it were something that returns size, instead of md5, maybe one of the answers in here?
https://unix.stackexchange.com/questions/570305/what-is-the-most-efficient-way-to-find-duplicate-files
Didn't really get answered.
TIL: Sending errors from . scriptname will close my terminal instantly. Thanks, Google!
TIL: Sending errors from scripts executed via $PATH will close the terminal if shopt -s extdebug + trap checkcommand DEBUG are set in profile to try and catch rm -r * - but at least will respect my alias for exit
TIL: Backticks deprecated, use $(things) - Ugh, so much re-writing to do :P
TIL: How to catch non-ascii characters in filenames, without using basename
TIL: "${file##*/}"
TIL: file - yes, X.pdf is not a PDF.
On the matter of POSIX
I'm afraid you cannot get the actual file size (not the number of blocks allocated by the file) in a plain posix shell without using ls. All the solutions like du --apparent-size, find -printf %s, and stat are not posix.
However, as long as your filenames don't contain linebreaks (spaces are ok) you could create safe solutions relying on ls. Correctly handling filenames with linebreaks would require very non-posix tools (like GNU sort -z) anyway.
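A rough sketch of reading a size that way, assuming the name contains no linebreaks (-d keeps directories from being expanded, -n avoids user/group name lookups):
size=$(ls -dln -- "$f" | awk '{print $5}')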
Bash+POSIX Approach Actually Comparing The Files
I would drop the approach to compare only the file sizes and use cmp instead. For huge directories the posix script will be slow no matter what you do. Also, I expect cmp to do some fail fast checks (like comparing the file sizes) before actually comparing the file contents. For common scenarios with only a few files speed shouldn't matter anyway as even the worst script will run fast enough.
The following script places each group of actual duplicates (at least two, but can be more) into its own subdirectory of dups/. The script should work with all filenames; spaces, special symbols, and even linebreaks are ok. Note that we are still using bash (which is not posix). We just assume that all tools (like mv, find, ...) are posix.
#! /usr/bin/env bash
files=()
for f in *; do [ -f "$f" ] && files+=("$f"); done
max=${#files[@]}
for (( i = 0; i < max; i++ )); do
sameAsFileI=()
for (( j = i + 1; j < max; j++ )); do
cmp -s "${files[i]}" "${files[j]}" &&
sameAsFileI+=("${files[j]}") &&
unset 'files[j]'
done
(( ${#sameAsFileI[@]} == 0 )) && continue
mkdir -p "dups/$i/"
mv "${files[i]}" "${sameAsFileI[#]}" "dups/$i/"
# no need to unset files[i] because loops won't visit this entry again
files=("${files[#]}") # un-sparsify array
max=${#files[#]}
done
Fairly Portable Non-POSIX Approach Using File Sizes Only
If you need a faster approach that only compares the file sizes I suggest to not use a nested loop. Loops in bash are slow already, but if you nest them you have quadratic time complexity. It is faster and easier to ...
print only the file sizes without file names
apply sort | uniq -d to retrieve duplicates in time O(n log n)
Move all files having one of the duplicated sizes to a directory
This solution is not strictly POSIX conformant. However, I tried to verify that the tools and options it uses are supported by most implementations. Your find has to support the non-POSIX options -maxdepth and -printf with %s for the actual file size and %f for the file basename (%p for the full path would be acceptable too).
The following script places all files of the same size into the directory potential-dups/. If there are two files of size n and two files of size m, all four files end up in this single directory. The script should work with all file names except those with linebreaks (that is, \n; \r should be fine though).
#! /usr/bin/env sh
all=$(find . -maxdepth 1 -type f -printf '%s %f\n' | sort)
dupRegex=$(printf %s\\n "$all" | cut -d' ' -f1 | uniq -d |
sed -e 's/[][\.|$(){}?+*^]/\\&/g' -e 's/^/^/' | tr '\n' '|' | sed 's/|$//')
[ -z "$dupRegex" ] && exit
mkdir -p potential-dups
printf %s\\n "$all" | grep -E "$dupRegex" | cut -d' ' -f2- |
sed 's/./\\&/' | xargs -I_ mv _ potential-dups
In case you wonder about some of the sed commands: They quote the file names such that spaces and special symbols are processed correctly by subsequent tools. sed 's/[][\.|$(){}?+*^]/\\&/g' is for turning raw strings into equivalent extended regular expressions (ERE) and sed 's/./\\&/' is for literal processing by xargs. See the posix documentation of xargs:
-I replstr [...] Any <blank>s at the beginning of each line shall be ignored.
[...]
Note that the quoting rules used by xargs are not the same as in the shell. [...] An easy rule that can be used to transform any string into a quoted form that xargs interprets correctly is to precede each character in the string with a backslash.

Shorten filename to n characters while preserving file extension

I'm trying to shorten a filename while preserving the extension.
I think cut may be the best tool to use, but I'm not sure how to preserve the file extension.
For example, I'm trying to rename abcdefghijklmnop.txt to abcde.txt
I'd like to simply lop off the end of the filename so that the total character length doesn't exceed [in this example] 5 characters.
I'm not concerned with filename clashes because my dataset likely won't contain any, and anyway I'll do a find, analyze the files, and test before I rename anything.
The background for this is ultimately that I want to mass truncate filenames that exceed 135 characters so that I can rsync the files to an encrypted share on a Synology NAS.
I found a good way to search for all filenames that exceed 135 characters:
find . -type f | awk -F'/' 'length($NF)>135{print $0}'
And I'd like to pipe that to a simple cut command to trim the filename down to size. Perhaps there is a better way than this. I found a method to shorten filenames while preserving extensions, but I need to recurse through all sub-directories.
Any help would be appreciated, thank you!
Update for clarification:
I'd like to use a one-liner with a syntax like this:
find . -type f | awk -F'/' 'length($NF)>135{print $0}' | some_code_here_to_shorten_the_filename_while_preserving_the_extension
With GNU find and bash:
export n=10 # change according to your needs
find . -type f \
! -name '.*' \
-regextype egrep \
! -regex '.*\.[^/.]{'"$n"',}' \
-regex '.*[^/]{'$((n+1))',}' \
-execdir bash -c '
echo "PWD=$PWD"
for f in "${##./}"; do
ext=${f#"${f%.*}"}
echo mv -- "$f" "${f:0:n-${#ext}}${ext}"
done' bash {} +
This performs a dry run, i.e. it shows each folder followed by the commands that would be executed within it. Once you're happy with the output, you can drop the echo before mv (and the echo "PWD=$PWD" line too, if you want) and it will actually rename all files whose names exceed n characters to names of exactly n characters, including the extension.
Note that this excludes hidden files, and files whose extensions are equal to or longer than n in length (e.g. .hidden, app.properties where n=10).
Use bash string manipulations.
Details: https://www.linuxtopia.org/online_books/advanced_bash_scripting_guide/string-manipulation.html
Scroll to "Substring Extraction".
The example below cuts the filename to 10 characters while preserving the extension:
~ % cat test
rawFileName=$(basename "$1")
filename="${rawFileName%.*}"
ext="${rawFileName##*.}"
if [[ ${#filename} -gt 10 ]]; then
echo ${filename:0:10}.${ext}
else
echo $1
fi
And tests:
~ % ./test 12345678901234567890.txt
1234567890.txt
~ % ./test 1234567.txt
1234567.txt
Update
Since your files are distributed in a tree of directories, you can use my original approach, but pass the script to a sh command via the -exec option of find:
n=5 find . -type f -exec sh -c 'f={}; d=${f%/*}; b=${f##*/}; e=${b##*.}; b=${b%.*}; mv -- "$f" "$d/${b:0:n}.$e"' \;
Original answer
If the filename is in a variable x, then ${x:0:5}.${x##*.} should do the job.
So you might do something like
n=5 # or 135, or whatever you like
for f in *; do
mv -- "$f" "${f:0:n}.${f##*.}"
done
Clearly this assumes that there are no clashes between the shortened names. If there are clashes, then only one would survive! So be careful.

Shell script to delete whose files names are not in a text file

I have a txt file which contains a list of file names.
Example:
10.jpg
11.jpg
12.jpeg
...
In a folder, these files should be protected from deletion, and all other files should be deleted.
So I want the opposite logic of this question: Shell command/script to delete files whose names are in a text file
How to do that?
Use extglob and Bash extended pattern matching !(pattern-list):
!(pattern-list)
Matches anything except one of the given patterns
where a pattern-list is a list of one or more patterns separated by a |.
extglob
If set, the extended pattern matching features described above are enabled.
So for example:
$ ls
10.jpg 11.jpg 12.jpeg 13.jpg 14.jpg 15.jpg 16.jpg a.txt
$ shopt -s extglob
$ shopt | grep extglob
extglob on
$ cat a.txt
10.jpg
11.jpg
12.jpeg
$ tr '\n' '|' < a.txt
10.jpg|11.jpg|12.jpeg|
$ ls !(`tr '\n' '|' < a.txt`)
13.jpg 14.jpg 15.jpg 16.jpg a.txt
The files that would be deleted are 13.jpg 14.jpg 15.jpg 16.jpg a.txt, according to the example.
So with extglob and !(pattern-list), we can obtain the files which are excluded based on the file content.
Additionally, if you want to exclude the entries starting with ., then you could switch on the dotglob option with shopt -s dotglob.
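So, to actually delete rather than just list, something along these lines should work (the trailing | produced by tr adds a harmless empty alternative, as in the ls example above; -i is there so you can confirm each removal):
shopt -s extglob
rm -i -- !($(tr '\n' '|' < a.txt))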
This is one way that will work with bash GLOBIGNORE:
$ cat file2
10.jpg
11.jpg
12.jpg
$ ls *.jpg
10.jpg 11.jpg 12.jpg 13.jpg
$ echo $GLOBIGNORE
$ GLOBIGNORE=$(tr '\n' ':' <file2 )
$ echo $GLOBIGNORE
10.jpg:11.jpg:12.jpg:
$ ls *.jpg
13.jpg
As is obvious, globbing ignores whatever (file, pattern, etc.) is included in the GLOBIGNORE bash variable.
This is why the last ls reports only file 13.jpg since files 10,11 and 12.jpg are ignored.
As a result using rm *.jpg will remove only 13.jpg in my system:
$ rm -iv *.jpg
rm: remove regular empty file '13.jpg'? y
removed '13.jpg'
When you are done, you can just set GLOBIGNORE to null:
$ GLOBIGNORE=
It's worth mentioning that in GLOBIGNORE you can also apply glob patterns instead of single filenames, like *.jpg or my*.mp3, etc.
Alternative:
We can use programming techniques (grep, awk, etc.) to compare the file names present in the ignore file with the files under the current directory:
$ awk 'NR==FNR{f[$0];next}(!($0 in f))' file2 <(find . -type f -name '*.jpg' -printf '%f\n')
13.jpg
$ rm -iv "$(awk 'NR==FNR{f[$0];next}(!($0 in f))' file2 <(find . -type f -name '*.jpg' -printf '%f\n'))"
rm: remove regular empty file '13.jpg'? y
removed '13.jpg'
Note: This also makes use of bash process substitution, and will break if filenames include new lines.
Another alternative to George Vasiliou's answer would be to read the file with the names of the files to keep using the Bash builtin mapfile and then check for each of the files to be deleted whether it is in that list.
#! /bin/bash -eu
mapfile -t keepthose <keepme.txt
declare -a deletethose
for f in "$#"
do
keep=0
for not in "${keepthose[#]}"
do
[ "${not}" = "${f}" ] && keep=1 || :
done
[ ${keep} -gt 0 ] || deletethose+=("${f}")
done
# Remove the 'echo' if you really want to delete files.
echo rm -f "${deletethose[@]}"
The -t option causes mapfile to trim the trailing newline character from the lines it reads from the file. No other white-space will be trimmed, though. This might be what you want if your file names actually contain white-space but it could also cause subtle surprises if somebody accidentally puts a space before or after the name of an important file they want to keep.
Note that I'm first building a list of the files that should be deleted and then delete them all at once rather than deleting each file individually. This saves some sub-process invocations.
The lookup in the list, as coded above, has linear complexity which gives the overall script quadratic complexity (precisely, N × M where N is the number of command-line arguments and M the number of entries in the keepme.txt file). If you only have a few dozen files, this should be fine. Unfortunately, I don't know of a better way to check for set membership in Bash. (We cannot use the file names as keys in an associative array because they might not be proper identifiers.) If you are concerned with performance for many files, using a more powerful language like Python might be worth consideration.
I would also like to mention that the above example simply compares strings. It will not realize that important.txt and ./important.txt are the same file and hence delete the file. It would be more robust to convert the file name to a canonical path using readlink -f before comparing it.
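For instance, the comparison inside the loop could be made on canonical paths instead (readlink -f is in GNU coreutils, not POSIX):
[ "$(readlink -f -- "$not")" = "$(readlink -f -- "$f")" ] && keep=1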
Furthermore, your users might want to be able to put globbing patterns (like important.*) into the list of files to keep. If you want to handle those, extra logic would be required.
Overall, specifying which files not to delete seems a little dangerous, as any mistake errs on the destructive side.
Provided there's no spaces or special escaped chars in the file names, either of these (or variations of these) would work:
rm -v $(stat -c %n * | sort - excluded_file_list | uniq -u)
stat -c %n * | grep -vf excluded_file_list | xargs rm -v

Iterate through list of filenames in order they were created in bash

Parsing the output of ls to iterate through a list of files is bad. So how should I go about iterating through a list of files in the order in which they were first created? I browsed several questions here on SO and they all seem to parse ls.
The embedded link suggests:
Things get more difficult if you wanted some specific sorting that
only ls can do, such as ordering by mtime. If you want the oldest or
newest file in a directory, don't use ls -t | head -1 -- read Bash FAQ
99 instead. If you truly need a list of all the files in a directory
in order by mtime so that you can process them in sequence, switch to
perl, and have your perl program do its own directory opening and
sorting. Then do the processing in the perl program, or -- worst case
scenario -- have the perl program spit out the filenames with NUL
delimiters.
Even better, put the modification time in the filename, in YYYYMMDD
format, so that glob order is also mtime order. Then you don't need ls
or perl or anything. (The vast majority of cases where people want the
oldest or newest file in a directory can be solved just by doing
this.)
Does that mean there is no native way of doing it in bash? I don't have the liberty to modify the filename to include the time in them. I need to schedule a script in cron that would run every 5 minutes, generate an array containing all the files in a particular directory ordered by their creation time and perform some actions on the filenames and move them to another location.
The following worked but only because I don't have funny filenames. The files are created by a server so it will never have special characters, spaces, newlines etc.
files=( $(ls -1tr) )
I can write a perl script that would do what I need but I would appreciate if someone can suggest the right way to do it in bash. Portable option would be great but solution using latest GNU utilities will not be a problem either.
sorthelper=();
for file in *; do
# We need something that can easily be sorted.
# Here, we use "<date><filename>".
# Note that this works with any special characters in filenames
sorthelper+=("$(stat -n -f "%Sm%N" -t "%Y%m%d%H%M%S" -- "$file")"); # Mac OS X only
# or
sorthelper+=("$(stat --printf "%Y %n" -- "$file")"); # Linux only
done;
sorted=();
while read -d $'\0' elem; do
# this strips away the first 14 characters (<date>)
sorted+=("${elem:14}");
done < <(printf '%s\0' "${sorthelper[@]}" | sort -z)
for file in "${sorted[#]}"; do
# do your stuff...
echo "$file";
done;
Other than sort and stat, all commands are actual native Bash commands (builtins)*. If you really want, you can implement your own sort using Bash builtins only, but I see no way of getting rid of stat.
The important parts are read -d $'\0', printf '%s\0' and sort -z. All these commands are used with their null-delimiter options, which means that any filename can be processed safely. Also, the use of double quotes in "$file" and the array expansions is essential.
*Many people feel that the GNU tools are somehow part of Bash, but technically they're not. So, stat and sort are just as non-native as perl.
With all of the cautions and warnings against using ls to parse a directory notwithstanding, we have all found ourselves in this situation. If you do find yourself needing sorted directory input, then about the cleanest use of ls to feed your loop is ls -opts | while read -r name; do ... This will handle spaces in filenames, etc., without requiring a reset of IFS, due to the nature of read itself. Example:
ls -1rt | while read -r fname; do # where '1' is ONE not little 'L'
So do look for cleaner solutions avoiding ls, but if push comes to shove, ls -opts can be used sparingly without the sky falling or dragons plucking your eyes out.
let me add the disclaimer to keep everyone happy. If you like newlines inside your filenames -- then do not use ls to populate a loop. If you do not have newlines inside your filenames, there are no other adverse side-effects.
Contra: TLDP Bash Howto Intro:
#!/bin/bash
for i in $( ls ); do
echo item: $i
done
It appears that SO users do not know what the use of contra means -- please look it up before downvoting.
You can try using the stat command piped to sort:
stat -c '%Y %n' * | sort -t ' ' -nk1 | cut -d ' ' -f2-
Update: To deal with filenames containing newlines, we can use the %N format in stat, and instead of cut we can use awk like this:
LANG=C stat -c '%Y^A%N' *| sort -t '^A' -nk1| awk -F '^A' '{print substr($2,2,length($2)-2)}'
Use of LANG=C is needed to make sure stat uses single quotes only in quoting file names.
^A is the control-A character, typed by pressing Ctrl-V then Ctrl-A.
How about a solution with GNU find + sed + sort?
As long as there are no newlines in the file name, this should work:
find . -type f -printf '%T@ %p\n' | sort -k 1nr | sed 's/^[^ ]* //'
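If newlines in names are a concern, the same idea can stay NUL-delimited end to end with GNU sort -z and cut -z (a sketch; replace the body of the loop with your own processing):
find . -type f -printf '%T@ %p\0' | sort -zn | cut -z -d' ' -f2- |
while IFS= read -r -d '' f; do
    printf 'processing %s\n' "$f"
done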
It may be a little more work to ensure it is installed (it may already be, though), but using zsh instead of bash for this script makes a lot of sense. The filename globbing capabilities are much richer, while still using a sh-like language.
files=( *(oc) )
will create an array whose entries are all the file names in the current directory, but sorted by change time. (Use a capital O instead to reverse the sort order). This will include directories, but you can limit the match to regular files (similar to the -type f predicate to find):
files=( *(.oc) )
find is needed far less often in zsh scripts, because most of its uses are covered by the various glob flags and qualifiers available.
I've just found a way to do it with bash and ls (GNU).
Suppose you want to iterate through the filenames sorted by modification time (-t):
while read -r fname; do
fname=${fname:1:((${#fname}-2))} # remove the leading and trailing "
fname=${fname//\\\"/\"} # remove the \ before any embedded "
fname=$(echo -e "$fname") # interpret the escaped characters
file "$fname" # replace (YOU) `file` with anything
done < <(ls -At --quoting-style=c)
Explanation
Given some filenames with special characters, this is the ls output:
$ ls -A
filename with spaces .hidden_filename filename?with_a_tab filename?with_a_newline filename_"with_double_quotes"
$ ls -At --quoting-style=c
".hidden_filename" " filename with spaces " "filename_\"with_double_quotes\"" "filename\nwith_a_newline" "filename\twith_a_tab"
So you have to process a little each filename to get the actual one. Recalling:
${fname:1:((${#fname}-2))} # remove the leading and trailing "
# ".hidden_filename" -> .hidden_filename
${fname//\\\"/\"} # removed the \ before any embedded "
# filename_\"with_double_quotes\" -> filename_"with_double_quotes"
$(echo -e "$fname") # interpret the escaped characters
# filename\twith_a_tab -> filename with_a_tab
Example
$ ./script.sh
.hidden_filename: empty
filename with spaces : empty
filename_"with_double_quotes": empty
filename
with_a_newline: empty
filename with_a_tab: empty
As seen, file (or the command you want) interprets well each filename.
Each file has three timestamps:
Access time: the file was opened and read. Also known as atime.
Modification time: the file was written to. Also known as mtime.
Inode modification time: the file's status was changed, such as the file had a new hard link created, or an existing one removed; or if the file's permissions were chmod-ed, or a few other things. Also known as ctime.
Neither one represents the time the file was created, that information is not saved anywhere. At file creation time, all three timestamps are initialized, and then each one gets updated appropriately, when the file is read, or written to, or when a file's permissions are chmoded, or a hard link created or destroyed.
So, you can't really list the files according to their file creation time, because the file creation time isn't saved anywhere. The closest match would be the inode modification time.
See the descriptions of the -t, -u, -c, and -r options in the ls(1) man page for more information on how to list files in atime, mtime, or ctime order.
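For example, oldest first in each case (-r reverses the default newest-first order):
ls -tr     # sort by mtime
ls -trc    # sort by ctime (status change time)
ls -tru    # sort by atime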
Here's a way using stat with an associative array.
n=0
declare -A arr
for file in *; do
# modified=$(stat -f "%m" "$file") # For use with BSD/OS X
modified=$(stat -c "%Y" "$file") # For use with GNU/Linux
# Ensure stat timestamp is unique
if [[ $modified == *"${!arr[@]}"* ]]; then
modified=${modified}.$n
((n++))
fi
arr[$modified]="$file"
done
files=()
for index in $(IFS=$'\n'; echo "${!arr[*]}" | sort -n); do
files+=("${arr[$index]}")
done
Since sort sorts lines, $(IFS=$'\n'; echo "${!arr[*]}" | sort -n) ensures the indices of the associative array get sorted by setting the field separator in the subshell to a newline.
The quoting at arr[$modified]="${file}" and files+=("${arr[$index]}") ensures that file names with caveats like a newline are preserved.

Using ls, how to list files without printing the extension (the part after the dot)?

Suppose I have a directory with some files:
$ ls
a.c b.c e.c k.cpp s.java
How can I display the result without the file extension (the part following the dot, including that dot)? Like this:
$ <some command>
a
b
e
k
s
using sed?
ls -1 | sed -e 's/\..*$//'
ls | while read -r fname
do
echo "${fname%%.*}"
done
Try that.
ls -a | cut -d "." -f 1
man (1) cut
Very handy, the -d switch defines the delimiter and the -f which field you want.
EDIT: Covering riverfall's scenario is also a piece of cake, as cut can also count from the end, though the logic is somewhat different. Here is an example with a bunch of files with random names, some with two dots, some with a single dot and some without an extension:
runlevel0@ubuntu:~/test$ ls
test.001.rpx test.003.rpx test.005.rpx test.007.rpx test.009.rpx testxxx
test.002.rpx test.004.rpx test.006.rpx test.008.rpx test_nonum test_xxx.rtv
runlevel0@ubuntu:~/test$ ls | cut -d "." -f -2
test.001
test.002
test.003
test.004
test.005
test.006
test.007
test.008
test.009
test_nonum
testxxx
test_xxx.rtv
Putting the minus before the field number selects everything up to and including that field (fields 1 and 2 in this case), while putting it after the number selects from that field through to the end of the line.
The same notation can be used for byte and character ranges as well as fields (see the man page).
If you already know the extension of the file, you can use basename, from the man page:
basename - strip directory and suffix from filenames
Unfortunately, it's mostly useful if you're trying to filter a single extension, in your case the command is:
basename -s .c -a $(ls *.c) && basename -s .cpp -a $(ls *.cpp) && basename -s .java -a $(ls *.java)
output:
a
b
e
k
s
for f in *; do printf "%s\n" "${f%%.*}"; done
Why does it work?
${string%%substring} Deletes longest match of $substring from back of $string.
This would handle mypackage.pkg.tar.xz --> mypackage for instance.
In contrast:
${string%substring} Deletes shortest match of $substring from back of $string.
That is ${string%substring} would only delete the final extension, i.e.
mypackage.pkg.tar.xz --> mypackage.pkg.tar
On a side note, use printf preferentially to echo. The syntax is a little more complex, but it will work on a wider variety of systems.
If you only want to see files, not directories:
for f in *; do if [[ -f ${f} ]]; then printf "%s\n" "${f%%.*}"; fi; done
