Rename the extension of a huge number of files in Bash - bash

I want to change the extension of a large number of files.
Example: "ABC.dat.length.doc.fasta"
Desired: ABC.mat

You can use bash parameter expansion in a for loop:
for file in *.dat.length.doc.fasta
do
    mv "$file" "${file%dat.length.doc.fasta}mat"
done
Another way to do it:
for file in *.dat.length.doc.fasta
do
    mv "$file" "${file%%.*}.mat"
done
Note that the second form keeps everything before the first dot, so it assumes the base name itself contains no dots.
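To check what each expansion produces before committing to the mv, you can try it on a single name; a minimal sketch:
file=ABC.dat.length.doc.fasta
echo "${file%dat.length.doc.fasta}"   # strips the suffix, leaving: ABC.
echo "${file%%.*}"                    # strips from the first dot, leaving: ABC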

As an alternative to bash or awk, you can use the rename command-line tool:
rename .dat.length.doc.fasta .mat *.dat.length.doc.fasta
This replaces the first occurrence of the first argument with the second argument in each of the file names given as the remaining arguments (here produced by the shell expansion of *.dat.length.doc.fasta). Note that this is the util-linux rename syntax; Debian and Ubuntu ship a perl-based rename that expects an s/// expression instead, as in the next answer.

prename will do this easily for you. It uses regular expressions to locate and rename files.
prename 's/(.*)\.dat\.length\.doc\.fasta/$1.mat/' *.dat.length.doc.fasta
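prename also accepts -n, which prints the intended renames without performing them; it is worth running that first:
prename -n 's/(.*)\.dat\.length\.doc\.fasta/$1.mat/' *.dat.length.doc.fasta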

You can do it in a single line using AWK:
ls -1 *.dat.length.doc.fasta | awk -F"." '{system("mv "$0" "$1".mat")}' -
Input:
$ ls
ABC.dat.length.doc.fasta
ABCD.dat.length.doc.fasta
ABCDE.dat.length.doc.fasta
do_not_touch.txt
Output:
$ ls
ABCDE.mat
ABCD.mat
ABC.mat
do_not_touch.txt
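Be aware that this pipeline hands unquoted names to the shell, so it breaks on filenames containing spaces. A hardened sketch of the same approach, quoting the arguments inside the generated command (it still assumes the names contain no double quotes; for fully arbitrary names, prefer the bash loop above):
ls -1 *.dat.length.doc.fasta | awk -F"." '{system("mv \"" $0 "\" \"" $1 ".mat\"")}'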

A similar one-liner with nawk, which saves the original name in p, rewrites the suffix with gsub, and hands both names to mv:
ls -1 *.dat.length.doc.fasta | nawk '{p=$0;gsub("dat.length.doc.fasta","mat");system("mv "p" "$0); }'

Related

batch rename matching files using the 1st field as search criteria and the 2nd as replacement

I have a very large selection of files, e.g.:
foo_de.vtt, foo_en.vtt, foo_es.vtt, foo_fr.vtt, foo_pt.vtt, baa_de.vtt, baa_en.vtt, baa_es.vtt, baa_fr.vtt, baa_pt.vtt... etc.
I have created a tab-separated file, filenames.txt, containing the current string and its replacement, e.g.:
foo 1000
baa 1016
...etc
I want to rename all of the files to get the following:
1000_de.vtt, 1000_en.vtt, 1000_es.vtt, 1000_fr.vtt, 1000_pt.vtt, 1016_de.vtt, 1016_en.vtt, 1016_es.vtt, 1016_fr.vtt, 1016_pt.vtt
I know I can use a utility like rename to do it manually, term by term, e.g.:
rename 's/foo/1000/g' *.vtt
Could I chain this into an awk command so that it runs through filenames.txt?
Or is there an easier way to do it just in awk? I know I can rename with awk, such as:
find . -type f | awk -v mvCmd='mv "%s" "%s"\n' \
  '{ old=$0;
     gsub(/foo/,"1000");
     printf mvCmd,old,$0;
   }' | sh
How can I get awk to process filenames.txt and do all of this in one go?
This question is similar but uses sed. Since the file is tab-separated, I feel this should be quite easy in awk.
First ever post, so please be gentle!
Solution
Thanks for all your help. Ultimately I was able to solve it by adapting your answers to the following (reading the current string into old and the replacement into new, matching the column order of the file):
while read old new; do
    rename "s/$old/$new/g" *.vtt
done < filenames.txt
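Assuming rename here is the perl variant, its -n flag previews the substitutions without renaming anything, which makes a sensible first pass:
while read old new; do
    rename -n "s/$old/$new/g" *.vtt
done < filenames.txt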
I'm assuming that the strings in the TSV file are literals (neither regexes nor globs) and that the part to be replaced can occur anywhere in the filenames.
With that said, you can use mv with shell globs and bash parameter expansion:
#!/bin/bash
while IFS=$'\t' read -r old new
do
    for f in *"$old"*.vtt
    do
        mv "$f" "${f/"$old"/$new}"
    done
done < file.tsv
Or with the util-linux rename, which is faster since each invocation renames all matching files at once (note that, like ${f/.../...} above, it replaces only the first occurrence in each name, which suffices for these filenames):
while IFS=$'\t' read -r old new
do
    rename "$old" "$new" *"$old"*.vtt
done < file.tsv
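If your util-linux rename supports them, -v echoes each rename as it happens and -n shows the renames without doing them, both useful for a trial run:
while IFS=$'\t' read -r old new
do
    rename -n "$old" "$new" *"$old"*.vtt
done < file.tsv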
This might work for you (GNU sed and rename):
sed -E 's#(.*)\t(.*)#rename -n '\''s/\1/\2/'\'' \1*#e' ../file
This builds a rename command from each line of the file and executes it (GNU sed's e flag runs the generated command on the spot), matching and replacing parts of the filenames in the current directory.
Once you are happy with the results shown by -n, remove the -n and the renaming will be enacted.
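For instance, a line reading foo<TAB>1000 makes sed generate and run (still in preview mode because of -n):
rename -n 's/foo/1000/' foo*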

How to read text files with the head Linux command?

I can't use cat, strings, or any similar command on .txt files because they are not allowed. I need to read a file named flag.txt, but its name is also on the blacklist. So, is there any way to read *.txt using the head command? The head command is allowed.
blacklist=\
'flag\|<\|$\|"\|'"'"'\|'\
'cat\|tac\|*\|?\|less\|more\|pico\|nano\|edit\|hexdump\|xxd\|'\
'sed\|tail\|diff\|grep\|paste\|strings\|bas64\|sort\|uniq\|cut\|awk\|'\
'bzip\|gzip\|xz\|tar\|ar\|'\
'mv\|cp\|ln\|nl\|'\
'python\|perl\|sh\|cc\|g++\|php\|hd\|g++\|gcc\|curl\|tcp\|udp\|'\
'scp\|sftp\|wget\|nc\|netcat'
Thanks
Do you want an alternative to the command head *.txt? If so, ls or find combined with xargs will help, but this cannot single out .txt files; it will read every file in the directory:
ls -1 | xargs head
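When head is given several files it prints a ==> name <== header before each one; with GNU head, -q suppresses those headers (assuming GNU coreutils):
ls -1 | xargs head -q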
You can use the ` (backtick) in the following way:
head `ls -1`
Backticks have a very special meaning: everything you type between backticks is evaluated (executed) by the shell before the main command runs.
So the command does the following:
`ls -1` expands to the names of the files in the directory
head then shows the start of each of those files
If you need a glob that matches flag.txt but can use neither * nor the string flag, you can use fl[a]g.txt instead. Then, to print the entire file using head, use -c and pass it the size of the file:
head -c $(stat -c '%s' fl[a]g.txt) fl[a]g.txt
Another approach would be to use the shell itself to read the file:
while IFS= read -r c; do echo "$c"; done < fl[a]g.txt
You could also just use paste, which with a single file simply copies it to standard output:
paste fl[a]g.txt

How to select files in a directory that begin with specific names in bash?

I have a shell script as below
dcacheDirIn="/mypath/"
for files in `ls $dcacheDirIn | grep txt`
do
.....
done
I have some .txt files in this directory; some of them begin with Data2012 and some with Data2011. How can I choose only the "Data2012" files?
EDIT: my bad, I mixed it up with my Python file. This is a shell script for sure.
You can try this:
dcacheDirIn="/mypath/"
for files in `ls $dcacheDirIn | grep Data2012`
do
    echo "$files"
done
To avoid directories with that name, try:
ls $dcacheDirIn -p | grep -v / | grep Data2012
In Python you can use the glob library as follows:
import glob
for file2012 in glob.glob("/mypath/Data2012*.txt"):
    print file2012
Tested using Python 2.7
You can use grep to achieve this directly:
dcacheDirIn="/mypath/"
for files in `ls $dcacheDirIn | grep -E 'Data2012.*\.txt'`
do
    .....
done
grep uses a regex to filter the output of ls. The pattern above keeps only filenames of the form Data2012...txt, as you wanted.
The Python glob library has that capability too; note it supports shell-style wildcard patterns, not full regular expressions. So, for instance, you would do:
for file in glob.glob('Data2012*.txt'):
    print file
and that would print the files matching that pattern (assuming you're running it from the same directory). It has a heap-load more functionality though; you should dive deeper.
In bash the wildcards will do the work of ls for you.
Just use:
dcacheDirIn="/mypath"
for file in "$dcacheDirIn"/Data2012*.txt
do
    echo "File $file"
done
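One caveat with this approach: if nothing matches, the loop still runs once with the literal pattern in $file. In bash, setting nullglob makes an unmatched pattern expand to nothing instead; a minimal sketch:
shopt -s nullglob
for file in "$dcacheDirIn"/Data2012*.txt
do
    echo "File $file"
done
shopt -u nullglob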

Removing the last n characters from a Unix filename before the extension

I have a bunch of files in a Unix directory:
test_XXXXX.txt
best_YYY.txt
nest_ZZZZZZZZZ.txt
I need to rename these files as:
test.txt
best.txt
nest.txt
I am using ksh on AIX. Please let me know how I can accomplish the above using a single command.
Thanks,
In this case, it seems you have an _ starting every section you want to remove. If that's the case, then this ought to work:
for f in *.txt
do
    g="${f%%_*}.txt"
    echo mv "${f}" "${g}"
done
Remove the echo if the output looks correct, or replace the last line with done | ksh.
If the files aren't all .txt files, this is a little more general:
for f in *
do
    ext="${f##*.}"
    g="${f%%_*}.${ext}"
    echo mv "${f}" "${g}"
done
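Note that ${f%%_*} leaves a name without an underscore unchanged, so for a file like readme.txt the first loop would generate mv readme.txt readme.txt.txt. A minimal guard (the [[ ... ]] test works in ksh and bash):
for f in *.txt
do
    [[ $f = *_* ]] || continue    # nothing to strip; leave this file alone
    g="${f%%_*}.txt"
    echo mv "${f}" "${g}"
done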
If this is a one-time (or infrequent) task, I would create a script with:
$ ls > rename.sh
$ vi rename.sh
:%s/\(.*\)/mv \1 \1/
(edit manually to remove all the XXXXX from the second file names)
:x
$ source rename.sh
If this need occurs frequently, I would need more insight into what XXXXX, YYY, and ZZZZZZZZZ are.
Addendum
Modify this to your liking:
ls | sed "{s/\(.*\)\(............\)\.txt$/mv \1\2.txt \1.txt/}" | sh
It transforms each filename by dropping the 12 characters before .txt and passes the resulting mv command to a shell.
Beware: if a filename does not match the pattern, sed passes it through unchanged and the shell then tries to execute the filename itself rather than a mv command. I omitted a way to select only matching filenames.
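One way to close that gap is sed's -n option combined with the p flag on the substitution, so only lines where a substitution actually happened reach the shell; a sketch under the same 12-character assumption:
ls | sed -n 's/\(.*\)\(............\)\.txt$/mv "\1\2.txt" "\1.txt"/p' | sh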

Extract part of a filename in a shell script

In bash I would like to extract part of many filenames and save the output to another file.
The files are formatted as coffee_{SOME NUMBERS I WANT}.freqdist.
#!/bin/sh
for f in $(find . -name 'coffee*.freqdist')
That code will find all the coffee_{SOME NUMBERS I WANT}.freqdist files. Now, how do I make an array containing just {SOME NUMBERS I WANT} and write that to a file?
I know that to write to a file one would end the line with the following:
> log.txt
I'm missing the middle part, though: how to filter the list of filenames.
You can do it natively in bash as follows:
filename=coffee_1234.freqdist
tmp=${filename#*_}
num=${tmp%.*}
echo "$num"
This is a pure bash solution. No external commands (like sed) are involved, so it is faster.
Append these numbers to a file using:
echo "$num" >> file
(You will need to delete/clear the file before you start your loop.)
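Putting that together with a loop over the files, a minimal sketch (log.txt is the output name taken from the question):
: > log.txt    # clear the output file first
for f in coffee_*.freqdist
do
    tmp=${f#*_}                    # drop everything through the first _
    echo "${tmp%.*}" >> log.txt    # drop the extension, keep the numbers
done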
If the intention is just to write the numbers to a file, you do not need the find command:
ls coffee*.freqdist
coffee112.freqdist coffee12.freqdist coffee234.freqdist
The command below should do it; its output can then be redirected to a file:
$ ls coffee*.freqdist | sed 's/coffee\(.*\)\.freqdist/\1/'
112
12
234
The previous answers have indicated some necessary techniques. This answer organizes the pipeline in a simple way that might apply to other jobs as well. (If your sed doesn't support ';' as a separator, replace each ';' with '| sed'.)
$ ls */c*; ls c*
fee/coffee_2343.freqdist
coffee_18z8.x.freqdist coffee_512.freqdist coffee_707.freqdist
$ find . -name 'coffee*.freqdist' | sed 's/.*coffee_//; s/[.].*//' > outfile
$ cat outfile
512
18z8
2343
707
