How to find a file without knowing its extension? [duplicate] - bash

This question already has answers here:
Test whether a glob has any matches in Bash
(22 answers)
Closed 1 year ago.
Currently I'm writing a script and I need to check if a certain file exists in a directory without knowing the extension of the file. I tried this:
if [[ -f "$2"* ]]
but that didn't work. Does anyone know how I could do this?

-f expects a single argument, and AFIK, you don't get filename expansion in this context anyway.
Although cumbersome, the best I can think of, is to produce an array of all matching filenames, i.e.
shopt -s nullglob
files=( "$2".* )
and test the size of the array. If it is larger than 1, you have more than one candidate., i.e.
if (( ${#files[*]} > 1 ))
then
....
fi
If the size is 1, ${files[0]} gives you the desired one. If the size is 0 (which can only happen if you turn on nullglob), no files are matching.
Don't forget to reset nullglob afterwards, if you don't need it anymore.

In shell, you have to iterate each file globbing pattern's match and test each one individually.
Here is how you could do it with standard POSIX shell syntax:
#!/usr/bin/env sh
# Boolean flag to check if a file match was found
found_flag=0
# Iterate all matches
for match in "$2."*; do
# In shell, when no match is found, the pattern itself is returned.
# To exclude the pattern, check a file of this name actually exists
# and is an actual file
if [ -f "$match" ]; then
found_flag=1
printf 'Found file: %s\n' "$match"
fi
done
if [ $found_flag -eq 0 ]; then
printf 'No file matching: %s.*\n' "$2" >&2
exit 1
fi

You can use find:
find ./ -name "<filename>.*" -exec <do_something> {} \;
<filename> is the filename without extension, do_something the command you want to launch, {} is the placeholder of the filename.

Related

Identifying folder with name as largest number in the directory

there is a directory which contains folders named with numbers, i've to find the folder with largest number in that directory.
This is the script i've written to find that folder:
files='ls path/'
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
But it's not working. It gives below error after invoking the script using command sudo ./restore2.sh.
ls
path/
./restore2.sh: line 6: path/: syntax error: operand expected (error token is "/")
0
Try this:
#!/bin/bash
files=`ls path/`
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
there's a backtick here: ls path/ instead of single or double-quotes.
I've only corrected this statement and it worked. and notice to add #!/bin/bash at the top of the script. This will tell your system to run the script in a bash shell.
You're using single quotes instead of backticks files='ls path/'. It's trying to use it as a literal string instead of evaluating it.
Also, for that specific task, you can just do:
ls test | awk '{if($1 > largest){largest = $1}} END{print largest}'
To have it a bit simpler.
Use find instead:
find . -maxdepth 1 -type d -regextype "posix-extended" -regex "^.*[[:digit:]]+.*$" | sort -n | tail -1
Set the maxdepth to 1 to check for directories within this directory only and no deeper. Set the regular expression type to posix-extended and search for all directories that have one or more digits. Print the result and order through sort before taking the largest one with tail -1.
Does path/ have any files in it? It looks like it's empty.
You should be getting a completely different complaint...
You don't want the path info in the filename. Rather than strip it with ${file##*/}, just go there and use non-path'd names.
An adaptation using your own logic as its base -
cd /whatever/path/ # go where the files are
var=-1 # initialize comparator
for file in [0-9]* # each entry that starts with a digit
do [[ "$file" =~ [^0-9] ]] && continue # skip any file with nondigit contents
[[ -f "$file" ]] || continue # only process plain files
(( file > var )) && var=$file # remember largest seen
done
echo $var # report largest
If you are sure there will be no negative numbered filenames, this should do it.
If there can be valid negatives, then your initialization needs to be appropriately lower, and the exclusion of nondigits should include the minus sign, as well as the list of files to select.
Note that this doesn't parse ls and doesn't require piping through a sort or spawning any other processes -- it's all handled in the bash interpreter and should be pretty efficient.
If you are sure of your data, and know there aren't any negatives or files named just 0 or non-plain-file entries in the directory that match the [0-9]* pattern, you can simplify it to just
cd /whatever/path/ # go where the files are
for file in [0-9]*; do (( file > var )) && var=$file; done
echo $var # report largest
As an aside, if you wanted to preserve the "make a list first" logic, you should still NOT use ls. Use an array.
cd /wherever/your/files/are/
files=( [0-9]* )
for file in "${files[#]}"
do : ...

Renaming files with numbered 0-padding names while leaving pre-renamed ones alone

I have several .jpg files in serveral folders about 20K actually. The filename are different like 123.jpg, abc.jpg, ab12.jpg. What is need is to rename all those files using bash script with leading 0 pattern.
I had used one code down below and while I do this the everytime I add new files the previous files get renamed again. Could anyone help me out from this situation and this would be really help full. I have searched entire web for this and not find one :(
#!/bin/bash
num=0
for i in *.jpg
do
a=`printf "%05d" $num`
mv "$i" "filename_$a.jpg"
let "num = $(($num + 1))"
done
To provide a concrete example of the problem, consider:
touch foo.jpg bar.jpg baz.jpg
The first time this script is run, bar.jpg is renamed to filename_00000.jpg; baz.jpg is renamed to filename_00001.jpg; foo.jpg is renamed to filename_00002.jpg. This behavior is acceptable.
If someone then runs:
touch a.jpg
...and runs the script again, then it renames a.jpg to filename_00000.jpg, renames filename_00000.jpg (now a.jpg, as the old version got overwritten!) to filename_00001.jpg, renames filename_00001.jpg to filename_00002.jpg, etc.
How can I make the program leave the files already matching filename_#####.jpg alone, and rename new files to have numbers after the last one that already exists?
#!/bin/bash
shopt -s extglob # enable extended globbing -- regex-like syntax
prefix="filename_"
# Find the largest-numbered file previously renamed
num=0 # initialize counter to 0
for f in "$prefix"+([[:digit:]]).jpg; do # Iterate only over names w/ prefix/suffix
f=${f#"$prefix"} # strip the prefix
f=${f%.jpg} # strip the suffix
if (( 10#$f > num )); then # force base-10 evaluation
num=$(( 10#$f ))
fi
done
# Second pass: Iterate over *all* names, and rename the ones that don't match the pattern
for i in *.jpg; do
[[ $i = "$prefix"+([[:digit:]]).jpg ]] && continue # Skip files already matching pattern
printf -v a '%05d' "$num" # More efficient than subshell use
until mv -n -- "$i" "$prefix$a.jpg"; do # "--" forces parse of "$i" as name
[[ -e "$i" ]] || break # abort if source file disappeared
num=$((num + 1)) # if we couldn't rename, increment num
printf -v a '%05d' "$num" # ...and try again with the next name
done
num=$((num + 1)) # modern POSIX math syntax
done
Note the use of mv -n to prevent overwrites -- that way two copies of this script running at the same time won't overwrite each others' files.

Iterate through several files in bash [duplicate]

This question already has answers here:
How to zero pad a sequence of integers in bash so that all have the same width?
(15 answers)
Closed 6 years ago.
I have a folder with several files that are named like this:
file.001.txt.gz, file.002.txt.gz, ... , file.150.txt.gz
What I want to do is use a loop to run a program with each file. I was thinking in something like this (just a sketch):
for i in {1:150}
gunzip file.$i.txt.gz
./my_program file.$i.txt output.$1.txt
gzip file.$1.txt
First of all, I don't know if something like this is gonna work, and second, I can't figure out how to keep the three digits numeration the file have ('001' instead of just '1').
Thanks a lot
The syntax for ranges in bash is
{1..150}
not {1:150}.
Moreover, if your bash is recent enough, you can add the leading zeroes:
{001..150}
The correct syntax of the for loop needs do and done.
for i in {001..150} ; do
# ...
done
It's unclear what $1 contains in your script.
To iterate over files I believe the simpler way is:
(assuming there are no files named 'file.*.txt' already in the directory and that your output file can have a different name)
for i in file.*.txt.gz; do
gunzip $i
./my_program $i $i-output.txt
gzip file.*.txt
done
Using find command:
# Path to the source directory
dir="./"
while read file
do
output="$(basename "$file")"
output="$(dirname "$file")/"${output/#file/output}
echo "$file ==> $output"
done < <(find "$dir" \
-regextype 'posix-egrep' \
-regex '.*file\.[0-9]{3}\.txt\.gz$')
The same via pipe:
find "$dir" \
-regextype 'posix-egrep' \
-regex '.*file\.[0-9]{3}\.txt\.gz$' | \
while read file
do
output="$(basename "$file")"
output="$(dirname "$file")/"${output/#file/output}
echo "$file ==> $output"
done
Sample output
/home/ruslan/tmp/file.001.txt.gz ==> /home/ruslan/tmp/output.001.txt.gz
/home/ruslan/tmp/file.002.txt.gz ==> /home/ruslan/tmp/output.002.txt.gz
(for $dir=/home/ruslan/tmp/).
Description
The scripts iterate the files in $dir directory. The $file variable is filled with the next line read from the find command.
The find command returns a list of paths corresponding to the regular expression '.*file\.[0-9]{3}\.txt\.gz$'.
The $output variable is built from two parts: basename (path without directories) and dirname (path to file's directory).
${output/#file/output} expression replaces file with output at the front end of $output variable (see Manipulating Strings)
Try-
for i in $(seq -w 1 150) #-w adds the leading zeroes
do
gunzip file."$i".txt.gz
./my_program file."$i".txt output."$1".txt
gzip file."$1".txt
done
The syntax for ranges is as choroba said, but when iterating over files you usually want to use a glob. If you know all the files have three digits in their names you can match on digits:
shopt -s nullglob
for i in file.0[0-9][0-9].txt.gz file.1[0-4][0-9] file.15[0].txt.gz; do
gunzip file.$i.txt.gz
./my_program file.$i.txt output.$i.txt
gzip file.$i.txt
done
This will only iterate through files that exist. If you use the range expression, you have to take extra care not to try to operate on files that don't exist.
for i in file.{000..150}.txt.gz; do
[[ -e "$i" ]] || continue
...otherstuff
done

List all the files with prefixes from a for loop using Bash

Here is a small[but complete] part of my bash script that finds and outputs all files in mydir if the have the prefix from a stored array. Strange thing I notice is that this script works perfectly if I take out the "-maxdepth 1 -name" from the script else it only gives me the files with the prefix of the first element in the array.
It would be of great help if someone explained this to me. Sorry in advance if there is some thing obviously silly that I'm doing. I'm relatively new to scripting.
#!/bin/sh
DIS_ARRAY=(A B C D)
echo "Array is : "
echo ${DIS_ARRAY[*]}
for dis in $DIS_ARRAY
do
IN_FILES=`find /mydir -maxdepth 1 -name "$dis*.xml"`
for file in $IN_FILES
do
echo $file
done
done
Output:
/mydir/Abc.xml
/mydir/Ab.xml
/mydir/Ac.xml
Expected Output:
/mydir/Abc.xml
/mydir/Ab.xml
/mydir/Ac.xml
/mydir/Bc.xml
/mydir/Cb.xml
/mydir/Dc.xml
The loop is broken either way. The reason why
IN_FILES=`find mydir -maxdepth 1 -name "$dis*.xml"`
works, whereas
IN_FILES=`find mydir "$dis*.xml"`
doesn't is because in the first one, you have specified -name. In the second one, find is listing all the files in mydir. If you change the second one to
IN_FILES=`find mydir -name "$dis*.xml"`
you will see that the loop isn't working.
As mentioned in the comments, the syntax that you are currently using $DIS_ARRAY will only give you the first element of the array.
Try changing your loop to this:
for dis in "${DIS_ARRAY[#]}"
The double quotes around the expansion aren't strictly necessary in your specific case, but required if the elements in your array contained spaces, as demonstrated in the following test:
#!/bin/bash
arr=("a a" "b b")
echo using '$arr'
for i in $arr; do echo $i; done
echo using '${arr[#]}'
for i in ${arr[#]}; do echo $i; done
echo using '"${arr[#]}"'
for i in "${arr[#]}"; do echo $i; done
output:
using $arr
a
a
using ${arr[#]}
a
a
b
b
using "${arr[#]}"
a a
b b
See this related question for further details.
#TomFenech's answer solves your problem, but let me suggest other improvements:
#!/usr/bin/env bash
DIS_ARRAY=(A B C D)
echo "Array is : "
echo ${DIS_ARRAY[*]}
for dis in "${DIS_ARRAY[#]}"
do
for file in "/mydir/$dis"*.xml
do
if [ -f "$file" ]; then
echo "$file"
fi
done
done
Your shebang line references sh, but your question is tagged bash - unless you need POSIX compliance, use a bash shebang line to take advantage of all that bash has to offer
To match files located directly in a given directory (i.e., if you don't need to traverse an entire subtree), use a glob (filename pattern) and rely on pathname expansion as in my code above - no need for find and command substitution.
Note that the wildcard char. * is UNquoted to ensure pathname expansion.
Caveat: if no matching files are found, the glob is left untouched (assuming the nullglob shell option is OFF, which it is by default), so the loop is entered once, with an invalid filename (the unexpanded glob) - hence the [ -f "$file" ] conditional to ensure that an actual match was found (as an aside: using bashisms, you could use [[ -f $file ]] instead).

How can I grep contents of files with bash only without using find or grep -r?

I have an assignment to write a bash program which if I type in the following:
-bash-4.1$ ./sample.sh path regex keyword
that will result something like that:
path/sample.txt:12
path/sample.txt:34
path/dir/sample1.txt:56
path/dir/sample2.txt:78
The numbers are the line number of the search results. I have absolutely no idea how can I achieve this in bash, without using find or grep -r. I am allowed to use grep, sed, awk, …
Break the problem into parts.
First, you need to obtain the file names to search in. How can you list the files in a directory and its subdirectories? (Hint: there's a glob pattern for that.)
You need to iterate over the files. What form of loop should this be?
For each file, you need to read each line from the file in turn. There's a builtin for that.
For each line, you need to test whether the line matches the specified regexp. There's a construct for that.
You need to maintain a counter of the number of lines read in a file to be able to print the line number.
Search for globstar in the bash manual.
See https://unix.stackexchange.com/questions/18886/why-is-while-ifs-read-used-so-often-instead-of-ifs-while-read/18936#18936 regarding while read loops.
shopt -s globstar # to enable **/
GLOBIGNORE=.:.. # to match dot files
dir=$1; regex=$2
for file in "$dir"/**/*; do
[[ -f $file ]] || continue
n=1
while IFS= read -r line; do
if [[ $line =~ $regex ]]; then
echo "$file:$n"
fi
((++n))
done <"$file"
done
It's possible that your teacher didn't intend you to use the globstar feature, which is a relatively recent addition to bash (appeared in version 4.0). If so, you'll need to write a recursive function to recurse into subdirectories.
traverse_directory () {
for x in "$1"/*; do
if [ -d "$x" ]; then
traverse_directory "$x"
elif [ -f "$x" ]; then
grep "$regexp" "$x"
fi
done
}
Putting this into practice:
#!/bin/sh
regexp="$2"
traverse_directory "$1"
Follow-up exercise: the glob pattern * omits files whose name begins with a . (dot files). You can easily match dot files as well by adding looping over .* as well, i.e. for x in .* *; do …. However, this throws the function into an infinite loop as it recurses forever into . (and also ..). How can you change the function to work with dot files as well?
while read
do
[[ $REPLY =~ foo ]] && echo $REPLY
done < file.txt

Resources