Find missing files by their number? - bash

I have a big list of ordered files with names like this
file_1.txt
file_2.txt
file_3.txt
file_6.txt
file_7.txt
file_8.txt
file_10.txt
In this case it is easy to see that file_4.txt, file_5.txt, and file_9.txt are missing, but if I have a big list, how can I find the missing files? I am just learning bash, so I only know some simple examples, like this:
for i in $(seq 1 1000) ;
do
if [i not in *.txt]; then
echo $i;
done
But this doesn't even work unless I erase the "if [i not in *.txt]; then" line,
and then it just writes all the numbers between 1 and 1000.
I hope you can help me.
Thanks in advance.
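For reference, the membership test that loop is reaching for can be written as a file-existence check with bash's -e test (a minimal sketch, assuming the files live in the current directory):
for i in $(seq 1 1000); do
    if [[ ! -e "file_${i}.txt" ]]; then
        echo "file_${i}.txt"    # report each missing file
    fi
done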

If the filenames are in a file then this should work (note: the three-argument form of match() is a GNU awk extension):
awk 'match($0,/([0-9]+)/,a){a[1]>max&&max=a[1];b[a[1]]++}
END{for(i=1;i<max;i++)if(!b[i])print "file_"i".txt"}' file
Output
file_4.txt
file_5.txt
file_9.txt
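For readers newer to awk, the same logic expanded with comments (again GNU awk; adding 0 forces a numeric comparison):
gawk '
match($0, /([0-9]+)/, a) {              # capture the first run of digits
    n = a[1] + 0                        # force numeric context
    if (n > max) max = n                # track the highest number seen
    b[n]++                              # mark this number as present
}
END {
    for (i = 1; i < max; i++)           # walk 1 .. max-1
        if (!b[i]) print "file_" i ".txt"   # print each gap
}' file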

The suggestion from user4453924 really helped me out. The names do not have to be in a file; just pipe the output from ls into his awk command and you should be fine:
ls *.txt | awk 'match($0,/([0-9]+)/,a){a[1]>max&&max=a[1];b[a[1]]++}
END{for(i=1;i<max;i++)if(!b[i])print "file_"i".txt"}'
Outputs:
file_4.txt
file_5.txt
file_9.txt
Alternatively, if you prefer to do it in two steps, it is quite simple to pipe the output from ls into a file and then use his command directly on that file, as is:
ls *.txt > filelist.txt
awk 'match($0,/([0-9]+)/,a){a[1]>max&&max=a[1];b[a[1]]++}
END{for(i=1;i<max;i++)if(!b[i])print "file_"i".txt"}' filelist.txt
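If you would rather not parse ls output at all (it can misbehave with unusual filenames), printf with a glob feeds the same awk script:
printf '%s\n' *.txt | awk 'match($0,/([0-9]+)/,a){a[1]>max&&max=a[1];b[a[1]]++}
END{for(i=1;i<max;i++)if(!b[i])print "file_"i".txt"}'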

One way to do this:
## TODO: You need to change the following path:
THELIST=/path/to/input-file
for i in $(seq 1 10); do
    # Look for file $i in $THELIST. The dot is escaped so it matches literally,
    # and $THELIST is double-quoted in case the filename contains whitespace.
    FOUND=$(grep "file_${i}\.txt" "$THELIST")
    [[ "$FOUND" == "" ]] && echo "$i"   # if nothing was found, output $i
done
You can find info about [[ ... ]] here: What is the difference between single and double square brackets in Bash?
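Another way to find the gaps, as a sketch using comm and GNU seq's -f option in bash: build the full expected list and subtract the files that actually exist.
# Lines only in the first (expected) list are the missing files.
# comm requires both inputs to be sorted the same way.
comm -23 <(seq -f 'file_%g.txt' 1 10 | sort) \
         <(printf '%s\n' file_*.txt | sort)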

Related

Renaming numbered files using names from a list in another file

I have a folder of books, and a file with the real name of each one. I renamed the books so I can easily see whether they are ordered, say "00.pdf", "01.pdf", and so on.
I want to know if there is a way, using the shell, to match each line of the file, say "names", with each book. That is, match line i of the file with the book at position i in sorted order.
<name-of-the-book-in-the-1-line> -> <book-in-the-1-position>
<name-of-the-book-in-the-2-line> -> <book-in-the-2-position>
.
.
.
<name-of-the-book-in-the-i-line> -> <book-in-the-i-position>
.
.
.
I'm doing this on Windows, using Total Commander, but I want to do it in Ubuntu so I don't have to reboot.
I know about mv and rename, but I'm not as good as I'd like to be with regular expressions...
renamer.sh:
#!/bin/bash
for i in $(ls -v | grep -Ev '(renamer\.sh|names\.txt)'); do
    read name
    mv "$i" "$name.pdf"
    echo "$i" renamed to "$name.pdf"
done < names.txt
names.txt (its line count must exactly equal the number of files to rename):
name of first book
second-great-book
...
explanation:
ls -v returns the file list in natural (version) sort order
grep excludes the script itself and the names file so they are not renamed
we cycle through the found file names, read a value from names.txt, and rename each target file to that value
For testing purposes, you can comment out the mv command:
#mv "$i" "$name.pdf"
And now, simply run the script:
bash renamer.sh
This loops through names.txt, creates a filename based on a counter (padding to two digits with printf, assigning to a variable using -v), then renames using mv. ((++i)) increases the counter for the next filename.
#!/bin/bash
i=0
while IFS= read -r line; do
    printf -v fname "%02d.pdf" "$i"   # build the counter-based name, e.g. 00.pdf
    mv "$fname" "$line.pdf"           # rename it to the title read from names.txt
    ((++i))
done < names.txt
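Since the line count must match the file count (see the note above), a quick sanity check before running either script might look like this (a sketch; adjust the glob to your files):
files=$(ls -v -- *.pdf | wc -l)    # how many numbered PDFs
lines=$(wc -l < names.txt)         # how many names
if (( files != lines )); then
    echo "mismatch: $files files vs $lines names" >&2
    exit 1
fi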

In ksh how do I iterate through a list of directories and cd to them using a loop?

Basically I have a flat file that is set up like so:
/this/is/a/log/directory | 30
/this/also/is/having/logs | 45
/this/logs/also | 60
What I'm trying to do is extract the first column of this flat file, which is the directory path, and check whether it has more than 500 log files in it. If it does, remove all but the newest 500 files.
I was trying to do something like this
#!/bin/ksh
for each in "$(awk '{ print $1 }' flat_file)"; do
cd "$each";
if [ "ls -l | wc -l >= 500" ]; then
rm `ls -t | awk 'NR>500'`
else
:
fi
done
However, from what I've read, I cannot cd from within the for loop in my script like I was trying to do, though apparently you can do it from within a function; so I made a function and copied that code into it, and of course it didn't work (I'm not too familiar with shell scripting). Something like Python's os module, where I could just use os.listdir() and pass in the directory names, would be perfect, but I have yet to figure out an easy way to do this.
OK, you're on the right track, but you'll confuse the csh programmers who look at your code with "for each". Why not:
for dir in $( awk '{ print $1 }' flat_file ) ; do
    cd "$dir" || continue    # skip entries that aren't directories we can enter
    if (( $(ls | wc -l) >= 500 )); then    # plain ls: ls -l adds a "total" line
        rm $( ls -t | awk 'NR>500' )
    fi
    cd -
done
Lots of little things in your original code. Why use backticks sometimes when you use the preferred form of command substitution, $( cmd ), other times?
Enclosing your "$(awk '{print $1}' file)" in double quotes turns the complete output of the command substitution into one long string; it won't find a directory named "dir1 dir2 dir3 .... dirn", right?
You don't need a null (:) else. You can just eliminate that block of code.
ksh supports math operations inside (( .... )) pairs (just like bash).
cd - will take you back to the previous directory.
Learn to use the shell's debug/trace option, set -vx. It will first show you what is going to be executed (sometimes a very large loop structure), and then show each line as it actually executes, preceded by + and with variables expanded to their values. You might also want export PS4='$LINENO > ' so the trace shows the line number currently being executed.
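For example, near the top of the script:
export PS4='$LINENO > '   # single quotes, so $LINENO expands at trace time
set -vx                   # -v echoes lines as read, -x traces execution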
IHTH

How to search for numbers in filename/data in shell script

I have 10 files in a folder, all with a similar pattern of text and numbers:
ABCDEF20141010_12345.txt
ABCDEF20141010_23456.txt
ABCDEF20141010_34567.txt
...
I need to process these files in a loop.
for filename in `ls -1 | egrep "ABCDEF[0-9]+\_[0-9]+.txt"`
do
<code>
done
The egrep code above never enters the loop. Can you please help me fix this search?
You don't have to use ls and grep; shell globbing alone can do it:
for filename in ABCDEF[0-9]*_[0-9]*.txt
do
    echo "$filename"
    # do whatever
done
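If you want the glob to be as strict as the original egrep (one or more digits in each run), bash's extglob can express that; a sketch:
shopt -s extglob                  # enable extended globbing in bash
for filename in ABCDEF+([0-9])_+([0-9]).txt
do
    echo "$filename"
done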

Find the file with highest numeric value using shell script

I have many files with alphanumeric names like
abc2,abc5,cat1,dog6,horse5,abc3,cat3,dog8,horse9,abc8
I want to find the file that starts with abc and has the highest numeric value; in this case the file I'm looking for is abc8. I would like a shell script that does this.
Could anyone please help me?
Thanks for your time.
You can use:
p=0; for f in abc*; do n="${f#abc}"; ((n>p)) && p=$n && of="$f"; done
echo "$of"
abc8
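Spread out with comments, the same loop reads (this assumes the numeric suffixes are plain integers):
p=0                                   # highest number seen so far
for f in abc*; do
    n="${f#abc}"                      # strip the "abc" prefix, keep the number
    ((n > p)) && p=$n && of="$f"      # remember the file with the biggest number
done
echo "$of"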
Another way
ls -d abc*|sort -nr |head -1
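Note that -n effectively ignores the alphabetic prefix, so with multi-digit numbers (abc9 vs. abc10) the tie falls back to a byte-wise comparison and abc9 would win. If your sort has it (GNU coreutils does), version sort handles that case:
ls -d abc* | sort -Vr | head -1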

Extract part of a filename shell script

In bash I would like to extract part of many filenames and save that output to another file.
The files are formatted as coffee_{SOME NUMBERS I WANT}.freqdist.
#!/bin/sh
for f in $(find . -name 'coffee*.freqdist')
That code will find all the coffee_{SOME NUMBERS I WANT}.freqdist files. Now, how do I make an array containing just {SOME NUMBERS I WANT} and write that to a file?
I know that to write to a file one would end the line with the following:
> log.txt
What I'm missing is the middle part: how to filter the list of filenames.
You can do it natively in bash as follows:
filename=coffee_1234.freqdist
tmp=${filename#*_}
num=${tmp%.*}
echo "$num"
This is a pure bash solution. No external commands (like sed) are involved, so this is faster.
Append these numbers to a file using:
echo "$num" >> file
(You will need to delete/clear the file before you start your loop.)
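Putting that together with a loop and the log.txt output the question mentions (a sketch, assuming the files are in the current directory):
> log.txt                          # clear the output file first
for f in coffee_*.freqdist; do
    tmp=${f#*_}                    # strip everything through the first "_"
    num=${tmp%.*}                  # strip the ".freqdist" suffix
    echo "$num" >> log.txt
done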
If the intention is just to write the numbers to a file, you do not need the find command:
ls coffee*.freqdist
coffee112.freqdist coffee12.freqdist coffee234.freqdist
The below should do it, and the output can then be redirected to a file:
$ ls coffee*.freqdist | sed 's/coffee\(.*\)\.freqdist/\1/'
112
12
234
The previous answers have indicated some necessary techniques. This answer organizes the pipeline in a simple way that might apply to other jobs as well. (If your sed doesn't support ‘;’ as a separator, replace ‘;’ with ‘|sed’.)
$ ls */c*; ls c*
fee/coffee_2343.freqdist
coffee_18z8.x.freqdist coffee_512.freqdist coffee_707.freqdist
$ find . -name 'coffee*.freqdist' | sed 's/.*coffee_//; s/[.].*//' > outfile
$ cat outfile
512
18z8
2343
707
