How to get the number of files in a folder as a variable? - bash

Using bash, how can one get the number of files in a folder, excluding directories from a shell script without the interpreter complaining?
With the help of a friend, I've tried
$files=$(find ../ -maxdepth 1 -type f | sort -n)
$num=$("ls -l" | "grep ^-" | "wc -l")
which returns from the command line:
../1-prefix_blended_fused.jpg: No such file or directory
ls -l : command not found
grep ^-: command not found
wc -l: command not found
respectively. These commands work on the command line, but NOT with a bash script.
Given a folder filled with image files named like 1-pano.jpg, I want to grab all the images in the directory to find the largest-numbered file, so I can tack the next number onto the next image being processed.
Why the discrepancy?

The quotes are causing the error messages.
To get a count of files in the directory:
shopt -s nullglob
numfiles=(*)
numfiles=${#numfiles[@]}
which creates an array and then replaces it with the count of its elements. This will include files and directories, but not dotfiles or . or .. or other dotted directories.
Use nullglob so an empty directory gives a count of 0 instead of 1.
You can instead use find -type f or you can count the directories and subtract:
# continuing from above
numdirs=(*/)
numdirs=${#numdirs[@]}
(( numfiles -= numdirs ))
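For example, the find -type f alternative mentioned above could look like this (a minimal sketch; note that a plain find | wc -l miscounts filenames containing newlines):
numfiles=$(find . -maxdepth 1 -type f | wc -l)   # regular files only, no recursion
echo "$numfiles"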
Also see "How can I find the latest (newest, earliest, oldest) file in a directory?"
You can have as many spaces as you want inside an execution block. They often aid readability. The only downside is that they make the file a little larger and may slow initial parsing (only) slightly. There are a few places that must have spaces (e.g. around [, [[, ], ]] and = in comparisons) and a few that must not (e.g. around = in an assignment).
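For instance, these two lines show the rules side by side (illustrative):
count=0                      # no spaces allowed around = in an assignment
if [ "$count" = "0" ]; then  # spaces required around [, ], and = here
    echo "empty"
fi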

ls -l | grep -v ^d | wc -l
One line. (Note that ls -l also emits a "total" line, which grep -v ^d lets through; grep -c '^-' sidesteps that.)

How about:
count=$(find .. -maxdepth 1 -type f | wc -l)
echo $count
let count=count+1 # Increase by one, for the next file number
echo $count
Note that this solution is not the most efficient: it spawns subshells for the find and wc commands, but it should work.

file_num=$(ls -1 --file-type | grep -v '/$' | wc -l)
This is a bit more lightweight than a find command, and it counts all files in the current directory (--file-type appends / to directories, which the grep -v '/$' then filters out).

The most straightforward, reliable way I can think of is using the find command to create a reliably countable output.
Counting the characters output by find with wc:
find . -maxdepth 1 -type f -printf '.' | wc --chars
or string length of the find output:
a=$(find . -maxdepth 1 -type f -printf '.')
echo ${#a}
or using find output to populate an arithmetic expression:
echo $(($(find . -maxdepth 1 -type f -printf '+1')))

Simple efficient method:
#!/bin/bash
RES=$(find "${SOURCE}" -type f | wc -l)   # ${SOURCE} is the directory to count

Get rid of the quotes. The shell is treating each quoted string as a single command name, so it's looking for a command literally called "ls -l".
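A minimal correction of the two attempts from the question (quotes gone, and the stray $ dropped from the left side of the assignments):
files=$(find ../ -maxdepth 1 -type f | sort -n)
num=$(ls -l | grep '^-' | wc -l)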

Remove the quotes and you will be fine.

Expanding on the accepted answer (by Dennis W): when I tried this approach I got incorrect counts for dirs without subdirs in Bash 4.4.5.
The issue is that by default nullglob is not set in Bash, and numdirs=(*/) sets a 1-element array containing the glob pattern */. Likewise, I suspect numfiles=(*) would have 1 element for an empty folder.
Setting shopt -s nullglob (which enables null globbing) resolves the issue for me. For an excellent discussion on why nullglob is not set by default in Bash, see the answer here: Why is nullglob not default?
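A quick demonstration of the difference, in an empty directory:
$ cd "$(mktemp -d)"       # any empty directory will do
$ a=(*); echo ${#a[@]}    # nullglob unset: the literal * remains
1
$ shopt -s nullglob
$ a=(*); echo ${#a[@]}    # nullglob set: the glob expands to nothing
0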
Note: I would have commented on the answer directly but lack the reputation points.

Here's one way you could do it as a function. Note: you can pass this function "files" (for a file count), "dirs" (for a directory count), or anything else, e.g. "all", for a count of everything in a directory. It does not traverse the tree, as we aren't looking to do that.
function get_counts_dir() {
    # -- handle inputs (e.g. get_counts_dir "files" /path/to/folder)
    [[ -z "$1" ]] && type="files" || type="${1,,}"
    [[ -z "$2" ]] && dir="$(pwd)" || dir="$2"
    shopt -s nullglob
    oldpwd=$(pwd)
    cd "${dir}" || return 1
    numfiles=(*)
    numfiles=${#numfiles[@]}
    numdirs=(*/)
    numdirs=${#numdirs[@]}
    # -- handle input types files/dirs/or both
    result=0
    case "${type,,}" in
        "files")
            result=$(( numfiles - numdirs ))
            ;;
        "dirs")
            result=${numdirs}
            ;;
        *) # -- returns all files/dirs
            result=${numfiles}
            ;;
    esac
    cd "${oldpwd}" || return 1
    shopt -u nullglob
    # -- return result --
    [[ -z ${result} ]] && echo 0 || echo ${result}
}
Examples of using the function:
folder="/home"
get_counts_dir "files" "${folder}"
get_counts_dir "dirs" "${folder}"
get_counts_dir "both" "${folder}"
Will print something like:
2
4
6

Short and sweet method which also ignores symlinked directories.
count=$(ls -l | grep ^- | wc -l)
or if you have a target:
count=$(ls -l /path/to/target | grep ^- | wc -l)

Related

linux show head of the first file from ls command

I have a folder, e.g. named 'folder'. There are 50000 txt files under it, e.g. '00001.txt, 00002.txt, etc'.
Now I want to use one command line to show the head 10 lines in '00001.txt'. I have tried:
ls folder | head -1
which will show the filename of the first:
00001.txt
But I want to show the contents of folder/00001.txt
So, how do I do something like os.path.join(folder, xx) and show its head -10?
The better way to do this is not to use ls at all; see Why you shouldn't parse the output of ls, and the corresponding UNIX & Linux question Why not parse ls (and what to do instead?).
On a shell with arrays, you can glob into an array, and refer to items it contains by index.
#!/usr/bin/env bash
# ^^^^- bash, NOT sh; sh does not support arrays
# make array files contain entries like folder/0001.txt, folder/0002.txt, etc
files=( folder/* ) # note: if no files found, it will be files=( "folder/*" )
# make sure the first item in that array exists; if it doesn't, that means
# the glob failed to expand because no files matching the string exist.
if [[ -e ${files[0]} || -L ${files[0]} ]]; then
# file exists; pass the name to head
head -n 10 <"${files[0]}"
else
# file does not exist; spit out an error
echo "No files found in folder/" >&2
fi
If you wanted more control, I'd probably use find. For example, to skip directories, the -type f predicate can be used (with -maxdepth 1 to turn off recursion):
IFS= read -r -d '' file < <(find folder -maxdepth 1 -type f -print0 | sort -z)
head -10 -- "$file"
Although it's hard to understand what you are asking, I think something like this will work:
head -10 "folder/$(ls folder | head -1)"
Basically, you get the filename from $(ls folder | head -1) and then print its content.
If you invoke the ls command as ls -d "$PWD"/folder/*, it will include the absolute path of each file in the output.

Find the biggest index in extension of file in a bash script

So I have a folder with bunch of files.
File, File.0, File.1, File.2
I'm trying to find the biggest index among the extensions of these files. So here it should be 2.
I wrote this command, which counts all files with a numeric extension.
But it's not working properly when the index is greater than 10, and it's not what I want anyway, because I just want to find the biggest index, not the number of files with a numeric index.
($1 is the file name, in this case File)
y=$(echo $(ls -d $1.[0-inf] | wc -l))
How can I do this ?
First tip: do not parse the output of ls, especially in your case.
You could use the following script in pure bash to address your issue :
#!/bin/bash
# needed for correct glob expansion
shopt -s nullglob
# we check every file following the format $1.extension
max_index=0
for f in "$1".*
do
    # we retrieve the last extension
    ext=${f##*.}
    re="^[0-9]+$"
    # if ext is a number and greater than our max, we store it
    if [[ $ext =~ $re && $ext -gt $max_index ]]
    then
        max_index=$ext
    fi
done
echo $max_index
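Assuming the script is saved as, say, max_index.sh, running it against the files from the question gives:
$ ./max_index.sh File
2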
You can try this:
for i in file\.*; do echo ${i##*.}; done | sort -g | tail -n1
${i##*.} is removing everything before the last . in the filename.
sort -g is sorting as numeric value.
tail -n1 prints the last index.
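For example:
$ i=File.12
$ echo ${i##*.}
12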
A more robust way is to use the find command, as it will cope with files not matching the pattern, filenames with spaces, and so on.
find -type f -name "file\.*" -exec bash -c 'echo ${1/*\.}' _ "{}" \; 2>/dev/null | sort -n | tail -n1
bash -c 'echo ${1/*\.}' _ "{}" is the command that will strip the characters before the ..
You may want to add -maxdepth 1 at the beginning of the command to avoid looking recursively inside directories.

Get first file of given extension from a folder

I need to get the first file in a folder which has the .tar.gz extension. I came up with:
FILE=/path/to/folder/$(ls /path/to/folder | grep ".tar.gz$" | head -1)
but I feel it can be done simpler and more elegant. Is there a better solution?
You could get all the files in an array, and then get the desired one:
files=( /path/to/folder/*.tar.gz )
Getting the first file:
echo "${files[0]}"
Getting the last file:
echo "${files[${#files[#]}-1]}"
You might want to set the shell option nullglob to handle cases when there are no matching files:
shopt -s nullglob
Here is a shorter version of your own idea:
FILE=$(ls /path/to/folder/*.tar.gz | head -1)
You can use set as shown below. The shell will expand the wildcard and set will assign the files as positional parameters which can be accessed using $1, $2 etc.
# set nullglob so that if no matching files are found, the wildcard expands to a null string
shopt -s nullglob
set -- /path/to/folder/*.tar.gz
# print the name of the first file
echo "$1"
It is not good practice to parse ls as you are doing, because it will not handle filenames containing newline characters. Also, the grep is unnecessary because you could simply do ls /path/to/folder/*.tar.gz | head -1.
Here's a way to accomplish it:
for FILE in *.tar.gz; do break; done
You tell bash to break the loop in the first iteration, just when the first filename is assigned to FILE.
Another way to do the same:
first() { FILE=$1; } && first *.tar.gz
Here you are using the positional parameters of the function first, which is better than setting the positional parameters of your entire bash process (as with set --).
Here's a find based solution:
$ find . -maxdepth 1 -type f -iname "*.tar.gz" | head -1
where:
. is the current directory
-maxdepth 1 means only check the current directory
-type f means only look at files
-iname "*.tar.gz" means do a case-insensitive search for any file with the .tar.gz extension
| head -1 takes the results of find and only returns the first line
You could get rid of the | head -1 by doing something like:
$ find . -maxdepth 1 -type f -iname "*.tar.gz" -print -quit
But I'm actually not sure how portable -print -quit is across environments (it works on macOS and Ubuntu though).

Calling linux utilities with options from within a Bash script

This is my first Bash script so forgive me if this question is trivial. I need to count the number of files within a specified directory $HOME/.junk. I thought this would be simple and assumed the following would work:
numfiles= find $HOME/.junk -type f | wc -l
echo "There are $numfiles files in the .junk directory."
Typing find $HOME/.junk -type f | wc -l at the command line works exactly how I expected it to, simply returning the number of files. Why is this not working when it is entered within my script? Am I missing some special notation when it comes to passing options to the utilities?
Thank you very much for your time and help.
You just need to surround it with backticks:
numfiles=`find $HOME/.junk -type f | wc -l`
The term for this is command substitution.
if you are using bash you can also use $() for command substitution, like so:
numfiles=$(find $HOME/.junk -type f | wc -l)
I find this to be slightly more readable than backquotes, as well as having the ability to nest several commands inside one another.
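For example, nesting is straightforward with $() (with backticks, the inner pair would need escaping); a small illustration using the question's own path:
# inner substitution runs first: dirname strips .junk, basename keeps the last component
echo $(basename "$(dirname "$HOME/.junk")")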
with bash 4 (if you want recursive)
#!/bin/bash
shopt -s globstar
i=0
for file in **
do
    ((i++))
done
echo "total files: $i"
if not
#!/bin/bash
shopt -s dotglob
shopt -s nullglob
i=0
for file in *
do
    ((i++))
done
echo "total files: $i"

bash testing a group of directories for existence

Have documents stored in a file system which includes "daily" directories, e.g. 20050610. In a bash script I want to list the files in a month's worth of these directories. So I'm running a find command find <path>/200506* -type f >> jun2005.lst. I would like to check that this set of directories is not a null set before executing the find command. However, if I use if [ -d 200506* ] I get a "too many arguments" error. How can I get around this?
Your "too many arguments" error does not come from there being a huge number of files and exceeding the command line argument limit. It comes from having more than one or two directories that match the glob. Your glob "200506*" expands to something like "20050601 20050602 20050603..." and the -d test only expects one argument.
$ mkdir test
$ cd test
$ mkdir a1
$ [ -d a* ] # no error
$ mkdir a2
$ [ -d a* ]
-bash: [: a1: binary operator expected
$ mkdir a3
$ [ -d a* ]
-bash: [: too many arguments
The answer by zed_0xff is on the right track, but I'd use a different approach:
shopt -s nullglob
path='/path/to/dirs'
glob='200506*/'
outfile='jun2005.lst'
dirs=("$path"/$glob) # dirs is an array available to be iterated over if needed
if (( ${#dirs[@]} > 0 ))
then
echo "directories found"
# append may not be necessary here
find "$path"/$glob -type f >> "$outfile"
fi
The position of the quotes in "$path"/$glob versus "$path/$glob" is essential to this working.
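To see why, compare (an illustrative session, assuming two matching directories exist):
$ path=/tmp/demo; glob='200506*/'
$ echo "$path"/$glob     # glob part unquoted: it expands
/tmp/demo/20050601/ /tmp/demo/20050602/
$ echo "$path/$glob"     # glob quoted: it stays literal
/tmp/demo/200506*/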
Edit:
Corrections made to exclude files that match the glob (so only directories are included) and to handle the very unusual case of a directory named literally like the glob ("200506*").
prefix="/tmp/path"
glob="200611*"
n_dirs=$(find $prefix -maxdepth 1 -type d -wholename "$prefix/$glob" |wc -l)
if [[ $n_dirs -gt 0 ]];then
find $prefix -maxdepth 2 -type f -wholename "$prefix/$glob"
fi
S=$(echo 200506*)
if [ "$S" != "200506*" ]; then
    echo haz filez!
else
    echo no filez
fi
Not a very elegant one, but without any external tools/commands (if you don't think of "[" as an external one).
The clue is that if some files matched, the "S" variable will contain their names delimited with spaces; otherwise it will contain the "200506*" string itself, which the string comparison detects.
You could use ls like this:
if [ -n "$(ls -d 200506* 2>/dev/null)" ]; then
    # There are directories with this pattern
fi
Because there is a limit on command-line length in most shells, anything like "$(ls -d 200506*)" or /path/200506* runs the risk of overflowing the limit. I'm not sure if substitutions and glob expansions count towards it in Bash, but I assume so. You would have to test it and check the bash docs and source to be sure.
The answer is in simplifying your question.
find <path>/200506* -type f -exec somescript '{}' \;
Where somescript is a shell script that does the test. Something like this perhaps:
#!/bin/sh
[ -d "$#" ] && echo "$#" >> june2005.lst
Passing the june2005.lst name to the script (advice: use an environment variable), and dealing with the possibility that 200506* may expand to too huge a file path, are left as an exercise for the OP ;)
Integrating the whole thing into a pipe line or adapting a more general scripting language would yield performance boosts, by minimizing the number of shells spawned. Now that would be fun. Here is a hint for that, use -exec and another program (awk, perl, etc) to do the directory test as part of a one line filter, and keep the >>june2005.lst on the find command.
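A rough sketch of that hint (the <path> placeholder is as in the question; the redirection stays on the single outer find invocation):
find <path> -maxdepth 1 -type d -name '200506*' -exec find {} -type f \; >> jun2005.lst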
