Get first file of given extension from a folder - bash

I need to get the first file in a folder which has the .tar.gz extension. I came up with:
FILE=/path/to/folder/$(ls /path/to/folder | grep ".tar.gz$" | head -1)
but I feel it can be done in a simpler, more elegant way. Is there a better solution?

You could get all the files in an array, and then get the desired one:
files=( /path/to/folder/*.tar.gz )
Getting the first file:
echo "${files[0]}"
Getting the last file:
echo "${files[${#files[#]}-1]}"
You might want to set the shell option nullglob to handle cases when there are no matching files:
shopt -s nullglob
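Putting the pieces together, a minimal sketch (using the question's /path/to/folder placeholder):
#!/bin/bash
shopt -s nullglob                 # unmatched globs expand to nothing
files=( /path/to/folder/*.tar.gz )
if (( ${#files[@]} > 0 )); then
    echo "first: ${files[0]}"
    echo "last:  ${files[${#files[@]}-1]}"
else
    echo "no .tar.gz files found"
fi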

Here is a shorter version based on your own idea:
FILE=$(ls /path/to/folder/*.tar.gz | head -1)

You can use set as shown below. The shell will expand the wildcard and set will assign the files as positional parameters which can be accessed using $1, $2 etc.
# set nullglob so that if no matching files are found, the wildcard expands to a null string
shopt -s nullglob
set -- /path/to/folder/*.tar.gz
# print the name of the first file
echo "$1"
It is not good practice to parse ls as you are doing, because it will not handle filenames containing newline characters. Also, the grep is unnecessary because you could simply do ls /path/to/folder/*.tar.gz | head -1.
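To see why parsing ls is fragile, here is a quick illustration you can run in an empty scratch directory (the filename is contrived on purpose):
touch $'bad\nname.tar.gz'
ls | head -1                 # prints only "bad" -- half a filename
files=( *.tar.gz )           # globbing keeps the whole name intact
printf '%q\n' "${files[0]}"  # prints $'bad\nname.tar.gz'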

Here's a way to accomplish it:
for FILE in *.tar.gz; do break; done
You tell bash to break out of the loop during the first iteration, right after the first filename is assigned to FILE.
Another way to do the same:
first() { FILE=$1; } && first *.tar.gz
Here you are using the positional parameters of the function first, which is better than setting the positional parameters of your entire bash process (as with set --).
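Note that both variants benefit from nullglob just like the array answer above; without it, FILE would end up holding the literal pattern when no archive exists. A guarded sketch:
shopt -s nullglob
FILE=
for FILE in /path/to/folder/*.tar.gz; do break; done
[[ -n $FILE ]] && echo "first: $FILE"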

Here's a find based solution:
$ find . -maxdepth 1 -type f -iname "*.tar.gz" | head -1
where:
. is the current directory
-maxdepth 1 means only check the current directory
-type f means only look at files
-iname "*.tar.gz" means do a case-insensitive search for any file with the .tar.gz extension
| head -1 takes the results of find and only returns the first line
You could get rid of the | head -1 by doing something like:
$ find . -maxdepth 1 -type f -iname "*.tar.gz" -print -quit
But I'm actually not sure how portable -print -quit is across environments (it works on macOS and Ubuntu, though).
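For completeness, here is how you might capture the result in a variable (as far as I know, both GNU and BSD find support -print -quit):
FILE=$(find /path/to/folder -maxdepth 1 -type f -iname '*.tar.gz' -print -quit)
[[ -n $FILE ]] && echo "found: $FILE"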

Related

For loop, wildcard and conditional statement

I don't really know what I am supposed to do with it.
For each file in the /etc directory whose name starts with o or l and whose second letter is t or r, display its name, size and type ('file'/'directory'/'link'). Use: a wildcard, a for loop and a conditional statement for the type.
#!/bin/bash
etc_dir=$(ls -a /etc/ | grep '^o|^l|^.t|^.r')
for file in $etc_dir
do
stat -c '%s-%n' "$file"
done
I was thinking about something like that, but I have to use an if statement.
You may reach the goal by using the find command.
This one will search through all subdirectories:
#!/bin/bash
_dir='/etc'
find "${_dir}" -name "[ol][tr]*" -exec stat -c '%s-%n' {} \; 2>/dev/null
To control whether subdirectories are searched, you may use the -maxdepth flag; in the example below it will only examine the file and directory names directly in the /etc dir and will not go through the subdirectories.
#!/bin/bash
_dir='/etc'
find "${_dir}" -maxdepth 1 -name "[ol][tr]*" -exec stat -c '%s-%n' {} \; 2>/dev/null
You may also use the -type f or -type d parameters to restrict the search to only files or only directories, respectively (if needed).
#!/bin/bash
_dir='/etc'
find "${_dir}" -name "[ol][tr]*" -type f -exec stat -c '%s-%n' {} \; 2>/dev/null
Update #1
Due to your request in the comments, here is a longer way that uses a for loop and an if statement.
Note: I'd strongly recommend reviewing and practicing the commands used in this script instead of just copy-pasting them to get the score ;)
#!/bin/bash
# Set the main directory path.
_mainDir='/etc'
# This will find all files in the $_mainDir (ignoring errors if any) and assign the file's path to the $_files variable.
_files=$(find "${_mainDir}" 2>/dev/null)
# In this for loop we will
# loop over all files
# extract the bare filename from the whole file path
# and IF the bare filename matches the pattern, run & output the `stat` command on that file.
for _file in ${_files} ;do
_fileName=$(basename "${_file}")
if [[ "${_fileName}" =~ ^[ol][tr].* ]] ;then
stat -c 'Name: %n , Size: %s , Type: %F' "${_file}"
fi
done
exit 0
You should break down your problems into multiple pieces and tackle them one by one.
First, try and build an expression that finds the right files. If you were to execute your regex expression in a shell:
ls -a /etc/ | grep '^o|^l|^.t|^.r'
You would immediately see that you don't get the right output. So the first step would be to understand how grep works and fix the expression to:
ls -a /etc/ | grep '^[ol][tr]'
Then, you have the file name, and you need the size and a textual file type. The size is easy to obtain using a stat call.
But you soon realize that stat cannot be asked for the exact 'file'/'directory'/'link' wording the exercise wants, so you probably have to use an if clause to present that.
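Putting those pieces together, here is one possible sketch of the whole exercise (my own rough take, assuming GNU stat as in the answers above):
#!/bin/bash
shopt -s nullglob
# wildcard: names in /etc starting with o or l whose second letter is t or r
for f in /etc/[ol][tr]*; do
    # test -L before -d: a symlink to a directory would also pass -d
    if [[ -L $f ]]; then
        ftype="link"
    elif [[ -d $f ]]; then
        ftype="directory"
    else
        ftype="file"
    fi
    printf '%s %s %s\n' "$f" "$(stat -c %s "$f")" "$ftype"
done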
How about this:
shopt -s extglob
ls -dp /etc/@(o|l)@(t|r)* | grep -v '/$'
Explanation:
shopt extglob - enable extended globbing (https://www.google.com/search?q=bash+extglob)
ls -d - list directories names, not their content
ls -dp - and add / at the end of each directory name
@(o|l)@(t|r) - o or l exactly once (@), and then t or r exactly once
grep -v '/$' - remove all lines containing / at the end
Of course, Vab's find solution is better than this ls one:
find /etc -maxdepth 1 -name "[ol][tr]*" -type f -exec stat {} \;

How to remove files from a directory if their names are not in a text file? Bash script

I am writing a bash script and want it to check whether the names of the files in a directory appear in a text file and, if not, remove them.
Something like this:
counter = 1
numFiles = ls -1 TestDir/ | wc -l
while [$counter -lt $numFiles]
do
if [file in TestDir/ not in fileNames.txt]
then
rm file
fi
((counter++))
done
So what I need help with is the if statement, which is still pseudo-code.
You can simplify your script logic a lot :
#!/bin/bash
# for loop to iterate over all files in the testdir
for file in TestDir/*
do
# if grep cannot find the file's name in the text document (exit code 1), delete the file
grep -qxF "$(basename "$file")" fileNames.txt || rm "$file"
done
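A cautious variant (my suggestion, not part of the answer above) prints instead of deleting, so you can review the list before committing:
for file in TestDir/*
do
    grep -qxF "$(basename "$file")" fileNames.txt || echo "would remove: $file"
done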
It looks like you've got a solution that works, but I thought I'd offer this one as well, as it might still be of help to you or someone else.
find /Path/To/TestDir -type f ! -name '.*' -exec basename {} + | grep -xvF -f /Path/To/filenames.txt
Breakdown
find: This gets file paths in the specified directory (which would be TestDir) that match the given criteria. In this case, I've specified it return only regular files (-type f) whose names don't start with a period (-name '.*'). It then uses its own builtin utility to execute the next command:
basename: Given a file path (which is what find spits out), it will return the base filename only, or, more specifically, everything after the last /.
|: This is a command pipe, that takes the output of the previous command to use as input in the next command.
grep: This is a regular-expression matching utility that, in this case, is given two lists of files: one fed in through the pipe from find—the files of your TestDir directory; and the files listed in filenames.txt. Ordinarily, the filenames in the text file would be used to match against filenames returned by find, and those that match would be given as the output. However, the -v flag inverts the matching process, so that grep returns those filenames that do not match.
What results is a list of files that exist in the directory TestDir, but do not appear in the filenames.txt file. These are the files you wish to delete, so you can simply use this line of code inside a command substitution $(...) to supply rm with the files it's able to delete.
The full command chain—after you cd into TestDir—looks like this:
rm $(find . -type f ! -name '.*' -exec basename {} + | grep -xvF -f filenames.txt)
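If any of the filenames contain spaces, the unquoted command substitution will split them before rm sees them. A sketch of a more careful variant (my assumption), reading the list line by line instead:
cd /Path/To/TestDir || exit 1
# -maxdepth 1 is my addition, to keep the scope to TestDir itself
find . -maxdepth 1 -type f ! -name '.*' -exec basename {} + |
    grep -xvF -f /Path/To/filenames.txt |
    while IFS= read -r name; do
        rm -- "$name"
    done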

Bash command to get list of files in directory

What do I have to type to get a list of files in the current directory satisfying the following conditions?
Hidden files (starting with ".") should NOT be included
Folder names should not be included
The filenames should include their extension
Filenames with spaces should not be broken up into multiple list items
(I intend to loop over the results in a for loop in a bash script.)
Using just bash:
files=()
for f in *; do [[ -d $f ]] || files+=("$f"); done
printf "%s\n" "${files[#]}"
How about just using * and then skipping over the directories as the first step in your loop?
for F in * ; do
if test -d "$F" ; then continue ; fi
echo "$F"
done
find . -maxdepth 1 ! -name '.*' -type f
should work fine for your needs
maxdepth 1 -> Searches only in current dir
! -name '.*' -> Searches files NOT matching name pattern '.*'
type f -> Searches only files, not dirs
You can try this:
find $folderPath -type f -printf "%f\n" | grep -v '^[.].*'
you only have to change the $folderPath.
Explanation
find $folderPath -type f -printf "%f\n"
With the find command you can search the folder with several options.
'-type f' searches only for files; if you want to filter for folders instead, change the 'f' to a 'd'.
'-printf "%f\n"' formats the output of the command: '%f' extracts the filename and '\n' adds a newline.
'|' is a pipe. With a pipe you can use the output of one command as the input of another.
grep -v '^[.].*'
grep is a command for finding a regex expression in input text.
With the option -v '^[.].*', you exclude the values that match the regex; in this case, names that start with a . (dot).
To close the explanation: by adding another '|' (pipe) you can feed every result of this command into yet another one. This performs well because the output is streamed, so little memory is needed to process the result.
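Following that advice, a small sketch that chains one more pipe stage into a read loop (I've added -maxdepth 1, since the question asks for the current directory only):
find "$folderPath" -maxdepth 1 -type f -printf "%f\n" | grep -v '^[.].*' |
    while IFS= read -r name; do
        echo "got: $name"
    done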

Globbing for only files in Bash

I'm having a bit of trouble with globs in Bash. For example:
echo *
This prints out all of the files and folders in the current directory.
e.g. (file1 file2 folder1 folder2)
echo */
This prints out all of the folders with a / after the name.
e.g. (folder1/ folder2/)
How can I glob for just the files?
e.g. (file1 file2)
I know it could be done by parsing ls but I also know that is a bad idea. I tried using extended globbing but couldn't get that to work either.
Without using any external utility, you can try a for loop with glob support:
for i in *; do [ -f "$i" ] && echo "$i"; done
I don't know if you can solve this with globbing, but you can certainly solve it with find:
find . -maxdepth 1 -type f
You can do what you want in bash like this:
shopt -s extglob
echo !(*/)
But note that what this actually does is match "not directory-likes."
It will still match dangling symlinks, symlinks pointing to not-directories, device nodes, fifos, etc.
It won't match symlinks pointing to directories, though.
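If those edge cases matter, you can keep the glob and filter with a test; a small sketch combining extglob with nullglob:
shopt -s extglob nullglob
for f in !(*/); do
    [[ -f $f ]] && echo "regular file: $f"   # -f also accepts symlinks to regular files
done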
If you want to iterate over normal files and nothing more, use find -maxdepth 1 -type f.
The safe and robust way to use it goes like this:
find -maxdepth 1 -type f -print0 | while IFS= read -r -d $'\0' file; do
printf "%s\n" "$file"
done
My go-to in this scenario is to use the find command. I just had to use it to find/replace dozens of instances in a given directory. I'm sure there are many other ways of skinning this cat, but the pure-Bash for example above isn't recursive.
for file in $( find path/to/dir -type f -name '*.js' );
do sed -i -e 's#FIND#REPLACEMENT#g' "$file";
done

How to get the number of files in a folder as a variable?

Using bash, how can one get the number of files in a folder, excluding directories from a shell script without the interpreter complaining?
With the help of a friend, I've tried
$files=$(find ../ -maxdepth 1 -type f | sort -n)
$num=$("ls -l" | "grep ^-" | "wc -l")
which returns from the command line:
../1-prefix_blended_fused.jpg: No such file or directory
ls -l : command not found
grep ^-: command not found
wc -l: command not found
respectively. These commands work on the command line, but NOT with a bash script.
Given a file filled with image files formatted like 1-pano.jpg, I want to grab all the images in the directory to get the largest numbered file to tack onto the next image being processed.
Why the discrepancy?
The quotes are causing the error messages.
To get a count of files in the directory:
shopt -s nullglob
numfiles=(*)
numfiles=${#numfiles[@]}
which creates an array and then replaces it with the count of its elements. This will include files and directories, but not dotfiles or . or .. or other dotted directories.
Use nullglob so an empty directory gives a count of 0 instead of 1.
You can instead use find -type f or you can count the directories and subtract:
# continuing from above
numdirs=(*/)
numdirs=${#numdirs[@]}
(( numfiles -= numdirs ))
Also see "How can I find the latest (newest, earliest, oldest) file in a directory?"
You can have as many spaces as you want inside an execution block. They often aid readability. The only downside is that they make the file a little larger and may slow initial parsing (only) slightly. There are a few places that must have spaces (e.g. around [, [[, ], ]] and = in comparisons) and a few that must not (e.g. around = in an assignment).
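A quick illustration of both rules:
count=3                       # no spaces around = in an assignment
if [ "$count" -gt 2 ]; then   # spaces required around [ and ]
    echo "big"
fi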
ls -l | grep -v -e '^d' -e '^total' | wc -l
One line. (The extra -e '^total' drops the summary line that ls -l prints first, which would otherwise inflate the count.)
How about:
count=$(find .. -maxdepth 1 -type f|wc -l)
echo $count
let count=count+1 # Increase by one, for the next file number
echo $count
Note that this solution is not efficient: it spawns sub shells for the find and wc commands, but it should work.
file_num=$(ls -1 --file-type | grep -v '/$' | wc -l)
This is a bit more lightweight than a find command, and counts all files of the current directory.
The most straightforward, reliable way I can think of is using the find command to create a reliably countable output.
Counting characters output of find with wc:
find . -maxdepth 1 -type f -printf '.' | wc --chars
or string length of the find output:
a=$(find . -maxdepth 1 -type f -printf '.')
echo ${#a}
or using find output to populate an arithmetic expression:
echo $(($(find . -maxdepth 1 -type f -printf '+1')))
Simple efficient method:
#!/bin/bash
RES=$(find "${SOURCE}" -type f | wc -l)
Get rid of the quotes. The shell is treating them like one file, so it's looking for "ls -l".
Remove the quotes and you will be fine.
Expanding on the accepted answer (by Dennis W): when I tried this approach I got incorrect counts for dirs without subdirs in Bash 4.4.5.
The issue is that by default nullglob is not set in Bash, so numdirs=(*/) sets a 1-element array containing the glob pattern */ itself. Likewise I suspect numfiles=(*) would have 1 element for an empty folder.
Setting shopt -s nullglob, so that unmatched globs expand to nothing, resolves the issue for me. For an excellent discussion of why nullglob is not set by default in Bash, see the answer here: Why is nullglob not default?
Note: I would have commented on the answer directly but lack the reputation points.
Here's one way you could do it as a function. Note: you can pass this function dirs (for a directory count), files (for a file count), or "all" for a count of everything in a directory. It does not traverse the tree, as we aren't looking to do that.
function get_counts_dir() {
# -- handle inputs (e.g. get_counts_dir "files" /path/to/folder)
[[ -z "${1,,}" ]] && type="files" || type="${1,,}"
[[ -z "${2,,}" ]] && dir="$(pwd)" || dir="${2,,}"
shopt -s nullglob
PWD=$(pwd)
cd ${dir}
numfiles=(*)
numfiles=${#numfiles[#]}
numdirs=(*/)
numdirs=${#numdirs[#]}
# -- handle input types files/dirs/or both
result=0
case "${type,,}" in
"files")
result=$((( numfiles -= numdirs )))
;;
"dirs")
result=${numdirs}
;;
*) # -- returns all files/dirs
result=${numfiles}
;;
esac
cd "${_oldpwd}"
shopt -u nullglob
# -- return result --
[[ -z ${result} ]] && echo 0 || echo ${result}
}
Examples of using the function :
folder="/home"
get_counts_dir "files" "${folder}"
get_counts_dir "dirs" "${folder}"
get_counts_dir "both" "${folder}"
Will print something like :
2
4
6
Short and sweet method which also ignores symlinked directories.
count=$(ls -l | grep ^- | wc -l)
or if you have a target:
count=$(ls -l /path/to/target | grep ^- | wc -l)
