How to search for numbers in filename/data in shell script - shell

I have 10 files in a folder. All with similar pattern with text and number:
ABCDEF20141010_12345.txt
ABCDEF20141010_23456.txt
ABCDEF20141010_34567.txt
...
I need to process these files in a loop.
for filename in `ls -1 | egrep "ABCDEF[0-9]+\_[0-9]+.txt"`
do
<code>
done
The egrep pattern above doesn't match, so execution never enters the loop. Can you please help me modify this search?

You don't have to use ls and grep; the shell's own globbing does the job:
for filename in ABCDEF[0-9]*_[0-9]*.txt
do
echo "$filename"
#do whatever
done
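The glob-based loop can be sketched on hypothetical files in a scratch directory. Note that in glob syntax `[0-9]*` means "one digit, then anything" (not "one or more digits" as in regex), which is still enough to select the files in question:

```shell
#!/bin/bash
# Scratch directory with hypothetical file names matching the question's pattern.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch ABCDEF20141010_12345.txt ABCDEF20141010_23456.txt notes.txt

shopt -s nullglob   # if nothing matches, expand to nothing rather than the literal pattern
matched=0
for filename in ABCDEF[0-9]*_[0-9]*.txt
do
    echo "processing $filename"
    matched=$((matched + 1))
done
echo "matched $matched"
```

Without `shopt -s nullglob` (a bash feature), an empty match would hand the loop the unexpanded pattern string itself, so the guard is worth keeping.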

Related

How to select files in a directory that begin with explicit names in bash?

I have a shell script as below
dcacheDirIn="/mypath/"
for files in `ls $dcacheDirIn | grep txt`
do
.....
done
I have some .txt files in this directory, some of them begins with Data2012*.txt and some of Data2011*.txt. How can I choose "Data2012" files?
EDIT: my bad, I mixed it up with my Python file. This is a shell script for sure.
You can try this
dcacheDirIn="/mypath/"
for files in `ls "$dcacheDirIn" | grep Data2012`
do
echo "$files"
done
To avoid directories with that name, try
ls -p "$dcacheDirIn" | grep -v / | grep Data2012
In Python you can use the glob library as follows:
import glob
for file2012 in glob.glob("/mypath/Data2012*.txt"):
print file2012
Tested using Python 2.7
You can use grep to achieve this directly:
dcacheDirIn="/mypath/"
for files in `ls "$dcacheDirIn" | grep -E 'Data2012.*\.txt'`
do
.....
done
grep uses regex to filter the output from ls. The regex I provided for grep will filter out files in the format Data2012*.txt, like you wanted.
The python glob library has that capability and it also supports regex expressions. So, for instance, you would do:
for file in glob.glob('*2012.txt'):
print file
and that would print the files matching that expression (assuming you're running it from the same directory). It has a heap-load more functionality though; you should dive deeper.
In bash the wildcards will do the work of ls for you.
Just use
dcacheDirIn="/mypath"
for file in "$dcacheDirIn"/Data2012*.txt
do
echo "File $file"
done

Find missing files by their number?

I have a big list of ordered files with names like this
file_1.txt
file_2.txt
file_3.txt
file_6.txt
file_7.txt
file_8.txt
file_10.txt
In this case it is easy to see that file_4.txt, file_5.txt and file_9.txt are missing, but if I have a big list, how can I find the missing files? I am just learning bash, so I only know some simple examples, like this:
for i in $(seq 1 1000) ;
do
if [i not in *.txt]; then
echo $i;
done
But this doesn't even work unless I erase the if [i not in *.txt]; then line, and then it just writes all the numbers between 1 and 1000.
I hope you can help me.
Thanks in advance.
If they are in a file then this should work
awk 'match($0,/([0-9]+)/,a){a[1]>max&&max=a[1];b[a[1]]++}
END{for(i=1;i<max;i++)if(!b[i])print "file_"i".txt"}' file
Output
file_4.txt
file_5.txt
file_9.txt
The suggestion from @user4453924 really helped me out. The list does not have to be in a file; just pipe the output from ls into his awk command and you should be fine:
ls *.txt | awk 'match($0,/([0-9]+)/,a){a[1]>max&&max=a[1];b[a[1]]++}
END{for(i=1;i<max;i++)if(!b[i])print "file_"i".txt"}'
Outputs:
file_4.txt
file_5.txt
file_9.txt
Alternatively, if you prefer to do it in a two step fashion, it would be quite simple to pipe the output from ls into a file, and then use his command directly on the file, as it is:
ls *.txt > filelist.txt
awk 'match($0,/([0-9]+)/,a){a[1]>max&&max=a[1];b[a[1]]++}
END{for(i=1;i<max;i++)if(!b[i])print "file_"i".txt"}' filelist.txt
One way to do this is by
## TODO: You need to change the following path:
THELIST=/path/to/input-file
for i in $(seq 1 10);
do
FOUND=$(grep -F "file_$i.txt" "$THELIST") #look for file $i in $THELIST
#Note: -F makes grep treat the pattern as a fixed string (so the "." is literal);
# double quotes around $THELIST guard against whitespace in the filename
[[ "$FOUND" == "" ]] && echo $i #if what you found is empty, then output $i
done
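If the files live in the current directory rather than in a list file, a minimal sketch (with hypothetical gappy file names) can test existence directly instead of grepping:

```shell
#!/bin/bash
# Create a gappy set of demo files in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch file_1.txt file_2.txt file_3.txt file_6.txt

missing=""
for i in $(seq 1 6)
do
    # -e tests whether the path exists; collect the numbers that don't
    [ -e "file_$i.txt" ] || missing="$missing $i"
done
echo "missing:$missing"
```

The upper bound (6 here) is an assumption for the demo; in practice you would take it from the highest-numbered file.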
You can find info about [[ ... ]] here: What is the difference between single and double square brackets in Bash?

Using regular expressions to get parts of file name

On Mac (OS X) I have a directory with many images named like this:
IMG_250x333_1.jpg
IMG_250x333_2.jpg
IMG_250x333_3.jpg
...
I need to rename all of them to:
IMG_1.jpg
IMG_2.jpg
IMG_3.jpg
...
I think using a UNIX command line with "mv" and a kind of regex would do the job, but I don't know how! Can someone please help?
Thanks!
What happens if there's a IMG_111x333_1.jpg and also a IMG_444x222_1.jpg? You risk mangling/overwriting something...
But if that is what you want, you can do it like this:
#!/bin/bash
for f in *.jpg; do
new=${f/_*_/_}
echo mv "$f" "$new"
done
If you like what it is doing, remove the word echo.
Here's an approach I like:
ls | sed 's/\(.*\)250x333_\(.*\)/mv "&" "\1\2"/' | sh
List the files with ls.
Then, transform the filenames with sed, and generate a mv command. Note that the & in the sed command outputs the full input string.
Finally, evaluate the mv command with sh
The nice thing about this approach is you can remove | sh and test that your regex is correct.
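That dry-run property can be checked on scratch files (the 250x333 size in the pattern is taken from the question):

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch IMG_250x333_1.jpg IMG_250x333_2.jpg

# Dry run: print the generated mv commands without executing them
ls | sed 's/\(.*\)250x333_\(.*\)/mv "&" "\1\2"/'

# Same pipeline, now executed by sh: performs the renames
ls | sed 's/\(.*\)250x333_\(.*\)/mv "&" "\1\2"/' | sh
```

Because the `.*` anchors make the match span the whole line, `&` reproduces the complete old filename and `\1\2` splices out the size segment.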

Shell script: Count number of files in a particular type extension in single folder

I am new with shell script.
I need to save the number of files with a particular extension (.properties) in a variable using a shell script.
I have used
ls |grep .properties$ |wc -l
but this command only prints the number of properties files in the folder. How can I assign this value to a variable?
I have tried
count=${ls |grep .properties$ |wc -l}
But it is showing error like:
./replicate.sh: line 57: ${ls |grep .properties$ |wc -l}: bad substitution
What is this type of errors?
Could anyone please help me save the number of matching files in a variable for future use?
You're using the wrong brackets, it should be $() (command output substitution) rather than ${} (variable substitution).
count=$(ls -1 | grep '\.properties$' | wc -l)
You'll also notice I've used ls -1 to force one file per line in case your ls doesn't do this automatically for pipelines, and changed the pattern to match the . correctly.
You can also bypass the grep totally if you use something like:
count=$(ls -1 *.properties 2>/dev/null | wc -l)
Just watch out for "evil" filenames like those with embedded newlines for example, though my ls seems to handle these fine by replacing the newline with a ? character - that's not necessarily a good idea for doing things with files but it works okay for counting them.
There are better tools to use if you have such beasts and you need the actual file name, but they're rare enough that you generally don't have to worry about it.
You could use a loop with globbing:
count=0
for i in *.properties; do
[ -e "$i" ] || continue  # skip the unexpanded pattern when there are no matches
count=$((count+1))
done
If you are using a shell that supports arrays, you can simply capture all such file names
files=( *.properties )
and then determine the number of array elements
count=${#files[@]}
(The above assumes bash; other shells may require slightly different syntax.)
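A bash sketch of the array approach, run against scratch files (names are arbitrary):

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch a.properties b.properties c.txt

shopt -s nullglob      # empty array (count 0) when nothing matches, not the literal pattern
files=( *.properties )
count=${#files[@]}
echo "$count"
```

Unlike the ls-based pipelines, this never word-splits filenames, so spaces and newlines in names cannot skew the count.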
You'd better use find instead of parsing ls. Then, use the var=$(command) syntax to store the value.
var=$(find . -maxdepth 1 -name "*.properties" | wc -l)
Reference: Why you shouldn't parse the output of ls.
To solve the problem that appears if any file name contains newlines, you can use what chepner suggests in the comments:
var=$(find . -maxdepth 1 -name "*.properties" -exec echo \; | wc -l)
so that for every match it prints an empty line instead of the file name; counting those lines then gives the correct total.
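A working sketch of that idea on scratch files, using -exec echo \; to emit one empty line per match so the line count equals the match count even for awkward names:

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
# Include a name with an embedded newline to show why counting raw names is fragile
touch a.properties b.properties other.txt
touch 'bad
name.properties'

# One empty line per match, so embedded newlines in names don't inflate the count
var=$(find . -maxdepth 1 -name "*.properties" -exec echo \; | wc -l)
echo "$var"
```

Counting the raw `find` output lines instead would report four matches here; the echo-per-match trick reports the true three.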
Use:
count=`ls | grep '\.properties$' | wc -l`
echo $count
You could write your assignment like this:
count=$(ls -q | grep -c '\.properties$')
or
count=$(ls -qA | grep -c '\.properties$')
if you want to include hidden files.
This works with all kinds of filenames because we're using ls with -q.
Sure, it's easier to link to some web page that tells you to "never parse ls" than to read the ls manual and discover the -q option (and that most implementations default to -q when the output is a terminal device, which explains why some people here state their ls seems to handle filenames with newlines just fine by replacing each newline with a ? character).

How to copy multiple files and rename them at once by appending a string in between the file names in Unix?

I have a few files that I want to copy and rename with the new file names generated by adding a fixed string to each of them.
E.g:
ls -ltr | tail -3
games.txt
files.sh
system.pl
Output should be:
games_my.txt
files_my.sh
system_my.pl
I am able to append at the end of the file names, but not before the extension.
for i in `ls -ltr | tail -10`; do cp $i `echo $i\_my`;done
I am thinking that if I can save the extension of each file with a simple cut, as follows,
ext=cut -d'.' -f2
then I can append the same in the above for loop.
do cp $i `echo $i$ext\_my`;done
How do I achieve this?
You can use the following:
for file in *
do
name="${file%.*}"
extension="${file##*.}"
cp "$file" "${name}_my.${extension}"
done
Note that ${file%.*} returns the file name without extension, so that from hello.txt you get hello. By doing ${file%.*}_my.txt you then get from hello.txt -> hello_my.txt.
Regarding the extension, extension="${file##*.}" gets it. It is based on the question Extract filename and extension in bash.
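The two expansions can be checked in isolation (the example filename is arbitrary):

```shell
#!/bin/bash
file="hello.txt"
name="${file%.*}"        # remove the shortest suffix matching ".*"  -> hello
extension="${file##*.}"  # remove the longest prefix matching "*."   -> txt
target="${name}_my.${extension}"
echo "$target"
```

The dot is stripped by both expansions, which is why it has to be re-inserted by hand when rebuilding the name.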
If the shell variable expansion mechanisms provided by fedorqui's answer look too unreadable to you, you also can use the unix tool basename with a second argument to strip off the suffix:
for file in *.txt
do
cp -i "$file" "$(basename "$file" .txt)_my.txt"
done
Btw, in such cases I always propose to apply the -i option for cp to prevent any unwanted overwrites due to typing errors or similar.
It's also possible to use a direct replacement with shell methods:
cp -i "$file" "${file/.txt/_my.txt}"
The ways are numerous :)
