Find the biggest index in extension of file in a bash script - bash

So I have a folder with a bunch of files:
File, File.0, File.1, File.2
I'm trying to find the biggest index in the extensions of these files. Here it should be 2.
I wrote this command, which counts all files with a numeric extension.
But it doesn't work properly when the index is greater than 10. In fact it doesn't work at all, because I just want the biggest index, not the number of files with a numeric extension.
$1 is the file name (File in this case):
y=$(echo $(ls -d $1.[0-inf] | wc -l))
How can I do this ?

First tip: do not parse the output of ls, especially in your case.
You could use the following pure-bash script to address your issue:
#!/bin/bash
# with nullglob, an unmatched glob expands to nothing instead of itself
shopt -s nullglob
# we check every file following the format $1.extension
max_index=0
re='^[0-9]+$'
for f in "$1".*
do
    # we retrieve the last extension
    ext=${f##*.}
    # if ext is a number and greater than our max, we store it
    # (10# forces base 10, so an extension like 08 is not read as octal)
    if [[ $ext =~ $re ]] && (( 10#$ext > max_index ))
    then
        max_index=$((10#$ext))
    fi
done
echo "$max_index"

You can try this:
for i in file\.*; do echo ${i##*.}; done | sort -g | tail -n1
${i##*.} removes everything up to and including the last . in the filename.
sort -g sorts by numeric value.
tail -n1 prints the last line, which is the largest index.
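One caveat with the pipeline above is that it also hands non-numeric extensions (or whole names that contain no dot) to sort. A minimal sketch that filters before sorting follows; largest_index is a hypothetical helper name, and it assumes the indices are plain decimal numbers:

```bash
#!/usr/bin/env bash
# largest_index: hypothetical helper (not from the answer above).
# Prints the largest purely numeric extension among the names passed
# as arguments; prints nothing if no name has one.
largest_index() {
    local name ext
    for name in "$@"; do
        ext=${name##*.}                     # text after the last dot
        # keep only extensions made of digits
        [[ $ext =~ ^[0-9]+$ ]] && printf '%s\n' "$ext"
    done | sort -n | tail -n 1
}

largest_index File File.0 File.2 File.10 File.txt
```

With the sample names this prints 10. Against real files it could be called as largest_index File.* instead.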
A less error-prone way is to use the find command, as it copes with files that do not match the pattern, file names with spaces, and so on:
find -type f -name "file\.*" -exec bash -c 'echo ${1/*\.}' _ "{}" \; 2>/dev/null | sort -n | tail -n1
bash -c 'echo ${1/*\.}' _ "{}" is the command that strips everything up to and including the last dot.
You may want to add -maxdepth 1 right after find (before -type f) to avoid descending into subdirectories.

Related

linux show head of the first file from ls command

I have a folder, e.g. named 'folder'. There are 50000 txt files under it, e.g, '00001.txt, 00002.txt, etc'.
Now I want to use one command line to show the head 10 lines in '00001.txt'. I have tried:
ls folder | head -1
which will show the filename of the first:
00001.txt
But I want to show the contents of folder/00001.txt
So, how do I do something like os.path.join(folder, xx) and show its head -10?
The better way to do this is not to use ls at all; see Why you shouldn't parse the output of ls, and the corresponding UNIX & Linux question Why not parse ls (and what to do instead?).
On a shell with arrays, you can glob into an array, and refer to items it contains by index.
#!/usr/bin/env bash
# ^^^^- bash, NOT sh; sh does not support arrays
# make array files contain entries like folder/00001.txt, folder/00002.txt, etc
files=( folder/* ) # note: if no files found, it will be files=( "folder/*" )
# make sure the first item in that array exists; if it doesn't, that means
# the glob failed to expand because no files matching the string exist.
if [[ -e ${files[0]} || -L ${files[0]} ]]; then
# file exists; pass the name to head
head -n 10 <"${files[0]}"
else
# file does not exist; spit out an error
echo "No files found in folder/" >&2
fi
If you wanted more control, I'd probably use find. For example, to skip directories, the -type f predicate can be used (with -maxdepth 1 to turn off recursion):
IFS= read -r -d '' file < <(find folder -maxdepth 1 -type f -print0 | sort -z)
head -n 10 -- "$file"
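As a rough sketch, the glob approach can also be wrapped in a function that skips anything that is not a regular file. The name head_first_file and its two arguments are illustrative assumptions, not part of the answer above:

```bash
#!/usr/bin/env bash
# head_first_file: hypothetical helper.
# Prints the first N lines (default 10) of the first regular file in a
# directory, using the shell's sorted glob order instead of ls.
head_first_file() {
    local dir=$1 lines=${2:-10} f
    for f in "$dir"/*; do
        if [[ -f $f ]]; then                # skips directories and an unmatched glob
            head -n "$lines" -- "$f"
            return 0
        fi
    done
    echo "No regular files found in $dir/" >&2
    return 1
}
```

Calling head_first_file folder would then show the start of folder/00001.txt, assuming that name sorts first.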
Although it's hard to understand what you are asking, I think something like this will work:
head -10 $(ls | head -1)
Basically, you get the file name from $(ls | head -1) and then print that file's contents.
Run this from inside the folder, because ls prints bare file names. If you need names that include the absolute path, list with a glob instead, as in ls -d "$PWD"/folder/*.

How to use a while read filename; do to take filenames strip "(-to the end" and then create a directory with that information?

I have hundreds of movies saved as "Title (year).mkv". They are all in one directory, however, I wish to create a directory by just using the "Title" of the file and then mv the filename into the newly created directory to clean things up a little bit.
Here is what I have so far:
dest=/storage/Uploads/destination/
find "$dest" -maxdepth 1 -mindepth 1 -type f -printf "%P\n" | sort -n | {
while read filename ; do
echo $filename;
dir=${filename | cut -f 1 -d '('};
echo $dir;
# mkdir $dest$dir;
# rename -n "s/ *$//" *;
done;
}
dest=/storage/Uploads/destination/
is my working directory
find $dest -maxdepth 1 -mindepth 1 type f -printf "%P\n" | sort -n | {
finds all files in the directory named by the $dest variable
while read filename ; do
as long as there's a filename to read, the loop continues
echo $filename
just so I can see what it is
dir=${filename | cut -f 1 -d '('};
dir = the results of command within the {}
echo $dir;
So I can see the name of the upcoming directory
mkdir $dest$dir;
Make the directory
rename -n "s/ *$//" *;
will rename the pesky directories that have a trailing space
And since we have more files to read, starts over until the last one, and
done;
}
When I run it, I get:
./new.txt: line 8: ${$filename | cut -f 1 -d '('}: bad substitution
I have two lines commented so it won't use those until I get the other working. Anyone have a way to do what I'm trying to do? I would prefer a bash script so I can run it again when necessary.
Thanks in advance!
dir=${filename | cut -f 1 -d '('}; is invalid. To run a command and capture its output, use $( ) and echo the text into the pipe, e.g. dir=$(echo "$filename" | cut -f 1 -d '('). By the way, that cut will leave a trailing space, which you probably don't want.
But don't use external programs like cut when there is no need, bash expansion will do it for you, and get rid of the trailing space:
filename="Title (year).mkv"
# remove all the characters on the right after and including <space>(
dir=${filename%% (*}
echo "$dir"
Gives
Title
The general syntax is ${var%%pattern}, which removes the longest match of pattern from the right. The pattern uses glob (filename expansion) syntax, so " (*" is a space, followed by (, followed by zero or more of any character.
% removes the shortest match instead, and ## and # do the same but remove from the left of the value.
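Putting the expansion to work on the original task might look like the sketch below. The function name move_into_title_dirs is made up for illustration, and it assumes every movie file sits directly in the given directory and is named like "Title (year).mkv":

```bash
#!/usr/bin/env bash
# move_into_title_dirs: hypothetical helper.
# Moves each file whose name contains " (" into a subdirectory named
# after the part of the file name before " (".
move_into_title_dirs() {
    local dest=${1%/} path filename dir
    for path in "$dest"/*" ("*; do
        [[ -f $path ]] || continue          # skips directories and an unmatched glob
        filename=${path##*/}                # strip the leading directory
        dir=${filename%% (*}                # strip " (" and everything after it
        mkdir -p -- "$dest/$dir" && mv -- "$path" "$dest/$dir/"
    done
}
```

Because the loop uses a glob rather than find piped into while read, file names with spaces need no special handling, and no trailing space ends up in the directory name.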

How to use bash string formatting to reverse date format?

I have a lot of files that are named as: MM-DD-YYYY.pdf. I want to rename them as YYYY-MM-DD.pdf I’m sure there is some bash magic to do this. What is it?
For files in the current directory:
for name in ./??-??-????.pdf; do
if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}.pdf"
fi
done
Recursively, in or under the current directory:
find . -type f -name '??-??-????.pdf' -exec bash -c '
for name do
if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}.pdf"
fi
done' bash {} +
Enabling the globstar shell option in bash lets us do the following (will also, like the above solution, handle all files in or below the current directory):
shopt -s globstar
for name in **/??-??-????.pdf; do
if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}.pdf"
fi
done
All three of these solutions use a regular expression to pick out the relevant parts of the filenames, and then rearrange these parts into the new name. The only difference between them is how the list of pathnames is generated.
The code prefixes mv with echo for safety. To actually rename files, remove the echo (but run at least once with echo to see that it does what you want).
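To check the group order in isolation, the rearrangement can be pulled into a small function that only prints the new name. This is a sketch; iso_name is a hypothetical helper, and it assumes the input really is MM-DD-YYYY.pdf, optionally preceded by a directory:

```bash
#!/usr/bin/env bash
# iso_name: hypothetical helper.
# Prints the YYYY-MM-DD.pdf form of an MM-DD-YYYY.pdf path, keeping any
# leading directory; returns 1 if the name has a different shape.
iso_name() {
    local re='^(.*/)?([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf$'
    [[ $1 =~ $re ]] || return 1
    # groups: 1 = directory (may be empty), 2 = MM, 3 = DD, 4 = YYYY
    printf '%s%s-%s-%s.pdf\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[4]}" \
        "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
}

iso_name 10-01-2018.pdf
```

This prints 2018-10-01.pdf. A rename loop could then use mv -- "$name" "$(iso_name "$name")" for each name the function accepts.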
A direct approach example from the command line:
$ ls
10-01-2018.pdf 11-01-2018.pdf 12-01-2018.pdf
$ ls [0-9]*-[0-9]*-[0-9]*.pdf|sed -r 'p;s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\1-\2/'|xargs -n2 mv
$ ls
2018-10-01.pdf 2018-11-01.pdf 2018-12-01.pdf
The ls output is piped to sed , then we use the p flag to print the argument without modifications, in other words, the original name of the file, and s to perform and output the conversion.
The ls + sed result is a combined output that consist of a sequence of old_file_name and new_file_name.
Finally we pipe the resulting feed through xargs to get the effective rename of the files.
From xargs man:
-n number Execute command using as many standard input arguments as possible, up to number arguments maximum.
You can use the following command, which is very close to klashxx's:
for f in *.pdf; do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\1-\2#')"; done
before:
ls *.pdf
12-01-1998.pdf 12-03-2018.pdf
after:
ls *.pdf
1998-12-01.pdf 2018-12-03.pdf
Also, if your folder contains other PDF files that do not follow this format, you can select only the files named MM-DD-YYYY.pdf with the following command:
for f in `find . -maxdepth 1 -type f -regextype sed -regex '\./[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}\.pdf' | xargs -n1 basename`; do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\1-\2#')"; done
Explanations:
find . -maxdepth 1 -type f -regextype sed -regex '\./[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}\.pdf' looks only for regular files in the current working directory whose names follow your format, and basename strips the leading ./ from each result. Folders and other file types with a matching name are not taken into account, and other *.pdf files are ignored.
For each file, mv is run with a new name computed by sed, using back-references to the three groups for MM, DD and YYYY.
For these simple filenames, using a more verbose pattern, you can simplify the body of the loop a bit:
twodigit=[[:digit:]][[:digit:]]
fourdigit="$twodigit$twodigit"
for f in $twodigit-$twodigit-$fourdigit.pdf; do
IFS=- read month day year <<< "${f%.pdf}"
mv "$f" "$year-$month-$day.pdf"
done
This is basically @Kusalananda's answer, but without the verbosity of regular-expression matching.

List files whose last line doesn't contain a pattern

The very last line of each of my files should be "#".
If I run tail -n 1 * | grep -L "#", the result is (standard input), obviously because the input is being piped.
I was hoping for a grep solution rather than reading the entire file just to search the last line.
for i in *; do tail -n 1 "$i" | grep -q -v '#' && echo "$i"; done
You can use sed for that:
sed -n 'N;${/pattern/!p}' file
The above command is meant to print the file's content when its last line doesn't contain the pattern.
However, it looks like I misunderstood you: you only want to print the names of those files whose last line doesn't match the pattern. In this case I would use find together with the following (GNU) sed command:
find -maxdepth 1 -type f -exec sed -n '${/pattern/!F}' {} \;
The find command iterates over all files in the current folder and executes the sed command. $ marks the last line of input. If /pattern/ isn't found ! then F prints the file name.
The solution above looks nice and executes fast, but it has a drawback: it does not print the names of empty files, since for them the last line is never reached and $ never matches.
For a stable solution I would suggest putting the commands into a script:
script.sh
#!/bin/bash
# Check whether the file is empty ...
if [ ! -s "$1" ] ; then
echo "$1"
else
# ... or if the last line contains a pattern
sed -n '${/pattern/!F}' "$1"
# If you don't have GNU sed you can use this
# (($(tail -n1 "$1" | grep -c pattern))) || echo "$1"
fi
Make it executable:
chmod +x script.sh
And use the following find command:
find -maxdepth 1 -type f -exec ./script.sh {} \;
Consider this one-liner:
while read name ; do tail -n1 "$name" | grep -q \# || echo "$name" does not contain the pattern ; done < <( find -type f )
It uses tail to get the last line of each file and grep to test that line against the pattern. Performance will not be the best on many files because two new processes are started in each iteration.
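The tail and grep combination can also be written as a function that treats empty files as missing the pattern, which covers the empty-file drawback mentioned earlier. This is only a sketch; list_missing_last_line is a hypothetical name, and the pattern is passed to grep as a basic regular expression:

```bash
#!/usr/bin/env bash
# list_missing_last_line: hypothetical helper.
# Usage: list_missing_last_line PATTERN FILE...
# Prints the name of every regular file whose last line does not match
# PATTERN. Empty files are printed too, since they have no last line.
list_missing_last_line() {
    local pattern=$1 f
    shift
    for f in "$@"; do
        [[ -f $f ]] || continue             # ignore directories and the like
        if ! tail -n 1 -- "$f" | grep -q -e "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For the question's case it could be called as list_missing_last_line '#' *. It still starts two processes per file, so the performance remark above applies here as well.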

How to get the number of files in a folder as a variable?

Using bash, how can one get the number of files in a folder, excluding directories from a shell script without the interpreter complaining?
With the help of a friend, I've tried
$files=$(find ../ -maxdepth 1 -type f | sort -n)
$num=$("ls -l" | "grep ^-" | "wc -l")
which returns from the command line:
../1-prefix_blended_fused.jpg: No such file or directory
ls -l : command not found
grep ^-: command not found
wc -l: command not found
respectively. These commands work on the command line, but NOT with a bash script.
Given a folder filled with image files named like 1-pano.jpg, I want to grab all the images in the directory and find the largest numbered file, so that I can number the next image being processed.
Why the discrepancy?
The quotes are causing the error messages.
To get a count of files in the directory:
shopt -s nullglob
numfiles=(*)
numfiles=${#numfiles[@]}
which creates an array and then replaces it with the count of its elements. This will include files and directories, but not dotfiles or . or .. or other dotted directories.
Use nullglob so an empty directory gives a count of 0 instead of 1.
You can instead use find -type f or you can count the directories and subtract:
# continuing from above
numdirs=(*/)
numdirs=${#numdirs[@]}
(( numfiles -= numdirs ))
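The two counts can also be combined inside a subshell, so that neither the nullglob setting nor a directory change leaks into the calling script. A minimal sketch, with count_files as a hypothetical name:

```bash
#!/usr/bin/env bash
# count_files: hypothetical helper.
# Prints the number of non-directory, non-hidden entries in a directory
# (default: the current one). The body is wrapped in ( ) instead of { },
# so it runs in a subshell and cd/shopt do not affect the caller.
count_files() (
    cd -- "${1:-.}" || exit 1
    shopt -s nullglob
    entries=(*)                             # everything except dotfiles
    dirs=(*/)                               # directories, including symlinks to them
    echo $(( ${#entries[@]} - ${#dirs[@]} ))
)
```

It can be captured into a variable with num=$(count_files ..), which is the form the question was after.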
Also see "How can I find the latest (newest, earliest, oldest) file in a directory?"
You can have as many spaces as you want inside an execution block. They often aid in readability. The only downside is that they make the file a little larger and may slow initial parsing (only) slightly. There are a few places that must have spaces (e.g. around [, [[, ], ]] and = in comparisons) and a few that must not (e.g. around = in an assignment).
ls -l | grep -v ^d | wc -l
One line.
How about:
count=$(find .. -maxdepth 1 -type f|wc -l)
echo $count
let count=count+1 # Increase by one, for the next file number
echo $count
Note that this solution is not efficient: it spawns sub shells for the find and wc commands, but it should work.
file_num=$(ls -1 --file-type | grep -v '/$' | wc -l)
This is a bit more lightweight than a find command, and counts all files in the current directory.
The most straightforward, reliable way I can think of is using the find command to create a reliably countable output.
Counting characters output of find with wc:
find . -maxdepth 1 -type f -printf '.' | wc --char
or string length of the find output:
a=$(find . -maxdepth 1 -type f -printf '.')
echo ${#a}
or using find output to populate an arithmetic expression:
echo $(($(find . -maxdepth 1 -type f -printf '+1')))
Simple efficient method:
#!/bin/bash
RES=$(find "${SOURCE}" -type f | wc -l)
Get rid of the quotes. The shell treats each quoted string as a single command name, so it's looking for a command called "ls -l".
Remove the quotes and you will be fine.
Expanding on the accepted answer (by Dennis W): when I tried this approach I got incorrect counts for dirs without subdirs in Bash 4.4.5.
The issue is that nullglob is not set by default in Bash, so numdirs=(*/) produces a one-element array containing the literal glob pattern */. Likewise, I suspect numfiles=(*) would have one element for an empty folder.
Setting shopt -s nullglob to enable nullglob resolves the issue for me. For an excellent discussion of why nullglob is not set by default in Bash, see the answer here: Why is nullglob not default?
Note: I would have commented on the answer directly but lack the reputation points.
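The effect is easy to reproduce. In the sketch below, count_subdirs is a hypothetical helper that counts the matches of */ in a directory with nullglob switched on or off (it assumes failglob is not enabled):

```bash
#!/usr/bin/env bash
# count_subdirs: hypothetical helper.
# Usage: count_subdirs DIR on|off
# Counts the matches of */ in DIR with nullglob enabled ("on") or
# disabled ("off"). Runs in a subshell so the caller's options are kept.
count_subdirs() (
    cd -- "$1" || exit 1
    if [[ $2 == on ]]; then
        shopt -s nullglob
    else
        shopt -u nullglob
    fi
    dirs=(*/)
    echo "${#dirs[@]}"
)
```

In a directory without subdirectories, count_subdirs . off prints 1 because the array holds the unexpanded pattern, while count_subdirs . on prints 0.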
Here's one way you could do it as a function. Note: you can pass dirs for a directory count, files for a file count, or anything else (e.g. "all") for a count of everything in the directory. It does not traverse the tree, as we aren't looking to do that.
function get_counts_dir() {
    # -- handle inputs (e.g. get_counts_dir "files" /path/to/folder)
    local type="${1,,}" dir="${2:-$PWD}" # lowercase the type only, never the path
    [[ -z "$type" ]] && type="files"
    local olddir="$PWD" # do not assign to PWD itself; the shell maintains it
    cd "$dir" || return 1
    shopt -s nullglob
    local entries=(*) subdirs=(*/)
    shopt -u nullglob
    cd "$olddir" || return 1
    local numfiles=${#entries[@]} numdirs=${#subdirs[@]}
    # -- handle input types files/dirs/or both
    local result=0
    case "$type" in
        "files")
            result=$(( numfiles - numdirs ))
            ;;
        "dirs")
            result=$numdirs
            ;;
        *) # -- returns all files/dirs
            result=$numfiles
            ;;
    esac
    # -- return result --
    echo "$result"
}
Examples of using the function :
folder="/home"
get_counts_dir "files" "${folder}"
get_counts_dir "dirs" "${folder}"
get_counts_dir "both" "${folder}"
Will print something like :
2
4
6
Short and sweet method which also ignores symlinked directories.
count=$(ls -l | grep ^- | wc -l)
or if you have a target:
count=$(ls -l /path/to/target | grep ^- | wc -l)

Resources