I am working on a Windows machine and I have a directory filled with ~200k files which I need to organise. This is a job I will need to do regularly with different filename sets but with similar patterns, so Perl seemed a good tool to use.
Each filename is made up of {a string A}{2 or 3 digit number B}{single letter "r" or "x"}{3 digit number}.extension
I want to create a folder for each string A
Within each folder I want a sub-folder for each B
I then want to move each file into its relevant sub-folder
So it will end up looking something like
/CustomerA/1
/CustomerA/2
/CustomerA/3
/CustomerB/1
/CustomerB/2
/CustomerB/3
etc with the files in each sub-folder
so CustomerA888x123.xml is moved into /CustomerA/888/
I have the list of files in an array but I am struggling with splitting the file name out to its constituent parts and using the parts effectively.
Thanks for the answer. I ended up with this:
#!/usr/bin/perl
use warnings;
use strict;
use File::Copy qw(move);
use File::Path qw(make_path);
opendir my $dir, "." or die "Cannot open current directory: $!";
my @files = readdir($dir);
closedir $dir;
foreach my $file (@files) {
    # skip ".", "..", subdirectories, and names that do not fit the pattern
    next unless -f $file;
    my ($cust, $num) = $file =~ m/(\D+)(\d+)/ or next;
    my $dirname = "$cust/$num";
    my @dirs_made = make_path($dirname, { verbose => 1 });
    move($file, $dirname) or warn "cant move $file to $dirname: $!";
}
Given your description of file names, this regex should parse what you need
my ($cust, $num) = $filename =~ m/(\D+)(\d+)/;
Use a more precise pattern if you wish or need to be more specific about what precedes the number, for example [a-zA-Z] for letters only.
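As a rough sketch of what a more precise pattern could look like for the naming scheme in the question, the helper below (parse_name is a made-up name, not part of any module) anchors the match and captures every part. It assumes string A consists of letters only and that the extension contains no further dots; adjust the first group if your names differ.

```perl
use strict;
use warnings;

# Hypothetical helper: split a name such as "CustomerA888x123.xml" into
# (A, B, letter, 3-digit number, extension).
# Assumption: string A contains letters only.
sub parse_name {
    my ($filename) = @_;
    my @parts = $filename =~ /\A([A-Za-z]+)(\d{2,3})([rx])(\d{3})\.([^.]+)\z/;
    return @parts;    # empty list when the name does not fit the pattern
}

my ($cust, $num, $flag, $serial, $ext) = parse_name('CustomerA888x123.xml');
print "$cust/$num\n" if defined $cust;    # prints "CustomerA/888"
```

Names that do not fit (the script itself, for instance) give an empty list, so they are easy to skip in the loop.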
With that on hand, you can create directories using the core module File::Path, for example
use File::Path qw(make_path);
my $dirname = "$cust/$num";
my @dirs_made = make_path($dirname, { verbose => 1 });
This creates the path as needed and returns the names of the directories it created. With the verbose option it also prints those names. If the directory already exists it is quietly skipped. If there are problems make_path dies, so you may want to wrap it in eval
eval { make_path($dirname) };
if ($@) {
    warn "Error with make_path($dirname): $@";
}
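File::Path also documents an error option for make_path that collects problems into an array instead of dying, which is another way to avoid the eval. A minimal sketch, assuming File::Path 2.x; try_make_path is a hypothetical helper name and the temporary directory is only there to make the demo harmless:

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: create $dirname and return a list of error
# messages (empty on success) instead of dying.
sub try_make_path {
    my ($dirname) = @_;
    make_path($dirname, { error => \my $err });
    my @messages;
    for my $diag (@{ $err || [] }) {
        # each entry is a one-pair hash: path => message
        my ($path, $message) = %$diag;
        push @messages, $path eq '' ? "general error: $message" : "$path: $message";
    }
    return @messages;
}

my $base = tempdir(CLEANUP => 1);    # scratch directory for the demo
my @errors = try_make_path("$base/CustomerA/888");
warn "make_path problem: $_\n" for @errors;
```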
Also note the File::Path::Tiny module as an alternative, thanks to Sinan Ünür for bringing it up. Other than being far lighter, it also has the more common error-handling policy whereby a false is returned on failure so you don't need an eval but only the usual check
use File::Path::Tiny;
File::Path::Tiny::mk($path) or warn "Can't mk($path): $!";
The module behaves similarly to mkdir in many ways, see the linked documentation.
Move the files using the move function from the core module File::Copy, for example
use File::Copy qw(move);
move($file, $dirname) or warn "Can't move $file to $dirname: $!";
All this can be in a loop over the array with the file names.
Related
I am new to Perl and was wondering if you could help me pass more than one set of files in the code below:
my @files=<data/j*.*.txt>;
if (@ARGV) {
    my $test=$ARGV[0];
    $test=lc($test);
    print "Using $test instead\n";
    @files=</data/$test*.*.txt>;
    print "Found @files instead\n";
}
my $outfile='/data/w_c.txt';
my $lotfile='/data/completed.txt';
if (-e $outfile) {
    unlink $outfile;
}
In the above code, my @files=<data/j*.*.txt>; currently picks up all the files matching j*.*.txt, but I would like to pass only the files below:
j*.1.txt
c*.3.1.txt
a*.a.b.txt
etc..
How could I pass the list of files in the program itself? I am trying to read all those files and extract information from them.
Thank you in advance..
You can use something like this:
<data/j*.*.txt data/j*.1.txt data/a*.a.b.txt>
There comes a point where it might be best to use <data/*.txt> and use a regex to filter out all but those you want.
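A minimal sketch of that glob-then-filter idea, assuming the patterns are written as regexes for the bare file name; select_files is a hypothetical helper, and since glob splits its argument on whitespace, $dir is assumed to contain no spaces:

```perl
use strict;
use warnings;

# Hypothetical helper: glob every .txt file in $dir, then keep only the
# names that match one of the given regex patterns in full.
sub select_files {
    my ($dir, @patterns) = @_;
    my $re = join '|', map { "(?:$_)" } @patterns;
    return grep { m{/(?:$re)\z} } glob("$dir/*.txt");
}

my @wanted = select_files('data',
    'j[^/]*\.1\.txt', 'c[^/]*\.3\.1\.txt', 'a[^/]*\.a\.b\.txt');
print "$_\n" for @wanted;
```

Here the shell-style wildcard appears only once, in the glob, and everything selective is expressed as a regex.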
Rather than using globs this way I'd be tempted to switch to opendir and readdir and to use an array of patterns in a regex with alternation to select my files. That way you're not using two different text wildcard syntaxes (for glob and for regex) in the same short snippet of code, which I've seen confuse programmers new to Perl before.
# Set your data directory.
my $dir = '/data';
# Take the whole array of arguments on the command line as patterns to
# match in the regex, or default to a short list of patterns if there
# are none.
# (Consider using an options library later rather than messing
# with @ARGV directly if the program becomes more complex.)
my @filespecs = ( scalar @ARGV ? @ARGV : qw( j.*?\.1\.txt c.*?\.3\.1\.txt ) );
# Join the multiple patterns with the regex alternation character.
# This makes them multiple matching options in a single regex.
my $re = join '|', @filespecs;
# Open the directory for reading, or terminate with an error.
opendir my $d, $dir or die "Cannot open directory $dir : $!\n";
# Select into the @files array things read from the directory
# entry that are regular files (-f), do not start with '.',
# and which match the regex. readdir returns bare names, so the
# -f test needs the directory prepended.
my @files = grep { (-f "$dir/$_") && (!/^\./) && (/$re/) } readdir $d;
# Close the directory handle now that we're done using it.
closedir $d;
Without the overly verbose comments, that boils down to just this.
my $dir = '/data';
my @filespecs = ( scalar @ARGV ? @ARGV : qw( j.*?\.1\.txt c.*?\.3\.1\.txt ) );
my $re = join '|', @filespecs;
opendir my $d, $dir or die "Cannot open directory $dir : $!\n";
my @files = grep { (-f "$dir/$_") && (!/^\./) && (/$re/) } readdir $d;
closedir $d;
I elided the last few lines of your original code because they don't seem directly related to your question.
Some sources for you to read that may help make sense of this solution:
perldoc perlop for the Conditional Operator https://perldoc.perl.org/perlop#Conditional-Operator , and for qw() https://perldoc.perl.org/perlop#qw/STRING/
perldoc perlre to learn about Perl regexes, especially in this case alternation https://perldoc.perl.org/perlre#Metacharacters
perldoc perlfunc for the -f file test https://perldoc.perl.org/perlfunc#-X-FILEHANDLE , opendir https://perldoc.perl.org/perlfunc#opendir-DIRHANDLE,EXPR , readdir https://perldoc.perl.org/perlfunc#readdir-DIRHANDLE , closedir https://perldoc.perl.org/perlfunc#closedir-DIRHANDLE , and grep https://perldoc.perl.org/perlfunc#grep-BLOCK-LIST
## removesmalls.pl
#!/usr/bin/perl
use strict;
use warnings;
my $minlen = shift or die "Error: `minlen` parameter not provided\n";
{
    local $/=">";
    while(<>) {
        chomp;
        next unless /\w/;
        s/>$//gs;
        my @chunk = split /\n/;
        my $header = shift @chunk;
        my $seqlen = length join "", @chunk;
        print ">$_" if($seqlen >= $minlen);
    }
    local $/="\n";
}
Executing the script as follows:
perl removesmalls.pl 1000 contigs.fasta > contigs-1000.fasta
The above script works for me, but there is a problem:
I have 109 different FASTA files with different file names.
I can run the script on an individual file, but I want to run it once for all the files and get a separate result file for each.
The file names are like SRR8224532.fasta, SRR8224533.fasta, SRR8224534.fasta, and so on.
I want the result files, after removing the short contigs (for me, those shorter than 1000), to be named something like SRR8224532-out.fasta,
SRR8224533-out.fasta, and so on.
Any help or suggestion would be helpful.
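One possible way to do this, sketched under a few assumptions: the record logic is the same as in removesmalls.pl, but moved into a sub (filter_fasta, a made-up name) that reads and writes named files, and the script is given the minimum length followed by the input files. The script name removesmalls_all.pl in the usage comment is hypothetical too.

```perl
use strict;
use warnings;

# Copy records from $in to $out, keeping those whose sequence is at
# least $minlen characters long (same logic as removesmalls.pl).
sub filter_fasta {
    my ($in, $out, $minlen) = @_;
    open my $ifh, '<', $in  or die "Cannot read $in: $!\n";
    open my $ofh, '>', $out or die "Cannot write $out: $!\n";
    local $/ = ">";                     # read one FASTA record at a time
    while (my $record = <$ifh>) {
        chomp $record;                  # drop the trailing ">"
        next unless $record =~ /\w/;    # skip the empty first chunk
        my ($header, @chunk) = split /\n/, $record;
        my $seqlen = length join "", @chunk;
        print {$ofh} ">$record" if $seqlen >= $minlen;
    }
    close $ofh or die "Cannot close $out: $!\n";
    return;
}

# Assumed usage: perl removesmalls_all.pl 1000 SRR*.fasta
if (@ARGV) {
    my ($minlen, @inputs) = @ARGV;
    for my $in (@inputs) {
        # SRR8224532.fasta becomes SRR8224532-out.fasta; other names are skipped
        (my $out = $in) =~ s/\.fasta\z/-out.fasta/ or next;
        filter_fasta($in, $out, $minlen);
    }
}
```

This relies on the shell expanding SRR*.fasta. On a shell that does not expand wildcards (cmd.exe, for instance), the list could be built with glob inside the script instead.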
I want to move files from a server that my Windows computer is connected to, to the actual computer. I have tried the code on my mac and it works fine, so I suspect the problem has to do with the fact that the files I wish to move are on a server or perhaps with Windows (I am unfamiliar with this OS). It is important to me to be able to use File::Find::Rule because there are many subdirectories within subdirectories that need to be searched.
use strict;
use warnings;
use File::Find::Rule;
use File::Copy;
# directory where files live
# my $dir = "\\172.18\user\folder\folder2";
# directory where TextGrids will be moved to
my $outdir = "\users\lisa\desktop\test";
my @files;
@files = File::Find::Rule -> file()
                         -> name("*_clean.TextGrid")
                         -> maxdepth()
                         -> in($dir);
foreach my $file (@files) {
    $file =~ /(.*\\)(.*)/;
    my $name = $2;
    copy("$file", "$outdir/$name") or die "Copy failed: $!";
}
Edit: Ok, I've made some changes to the script below. But the strange thing is, that when I ask it to print each file, it gives me something like \\172.18\user\folder\folder/255/file.txt. I changed the regex to be (.*\/)(.*) and now the script works perfectly, though I don't know why!
use strict;
use warnings;
use File::Find::Rule;
use File::Copy;
# directory where files live
my $dir = "\\\\172.18\\user\\folder\\folder2";
# directory where TextGrids will be moved to
my $outdir = "C:\\Users\\lisa\\desktop\\test";
my @files;
@files = File::Find::Rule -> file()
                         -> name("*_clean.TextGrid")
                         -> maxdepth()
                         -> in($dir);
foreach my $file (@files) {
    print "$file\n";
    $file =~ /(.*\\)(.*)/;
    my $name = $2;
    copy("$file", "$outdir\\$name") or die "Copy failed: $!";
}
After your edit, the script works because the last directory separator in the string happens to be /, which is matched by the \/ in the regular expression. Even though you had \ in the input, the library you used to find the files added /s.
I have some suggestions:
You can avoid the need to escape (most) backslashes by using single quoted strings, unless you need the interpolation of the double quoted ones.
Escaping backslashes is optional unless followed by a single quote or another backslash:
my $outdir = '\users\lisa\desktop\test';
but
my $outdir = '\users\lisa\desktop\test\\';
$outdir = '\users\lisa\desktop\test\\\'ere is a path';
my $not_a_path = 'three backslashes\\\\\in between, all but the last need escaping';
'ere is a path is the last element in that path.
If you're dealing with Windows, consider using [\\/] in place of directory separator in regular expressions. (Or [\\\/] if you absolutely must use / as regular expression delimiter.)
Even if you have control over user input to only use \ in paths, libraries you use will usually add /, so it's better to be prepared for a combination of both.
$file =~ /(.*[\\\/])(.*)/;
$file =~ m{(.*[\\/])(.*)};
$file =~ m¤(.*[\\/])(.*)¤;
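If only the file name is needed, the same character class can be turned around to capture everything after the last separator of either kind. A small sketch; base_name is a hypothetical helper, not a library function:

```perl
use strict;
use warnings;

# Return the last path element, treating both "\" and "/" as separators.
sub base_name {
    my ($path) = @_;
    my ($name) = $path =~ m{([^\\/]+)\z};
    return $name;    # undef if the path ends in a separator
}

# A mixed path like the one File::Find::Rule printed in the question
print base_name('\\\\172.18\user\folder\folder/255/file.txt'), "\n";   # prints "file.txt"
```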
I also removed the superfluous quotes from around $file in the copy() call. Final result:
use strict;
use warnings;
use File::Find::Rule;
use File::Copy;
# directory where files live
# four backslashes here: even in single quotes \\ collapses to one \,
# and a UNC path has to start with two
my $dir = '\\\\172.18\user\folder\folder2';
# directory where TextGrids will be moved to
my $outdir = 'C:\Users\lisa\desktop\test';
my @files;
@files = File::Find::Rule -> file()
                         -> name("*_clean.TextGrid")
                         -> maxdepth()
                         -> in($dir);
foreach my $file (@files) {
    print "$file\n";
    $file =~ /(.*[\\\/])(.*)/;
    my $name = $2;
    copy($file, "$outdir\\$name") or die "Copy failed: $!";
}
use strict;
use warnings;
my $dir = "\\172.18\user\folder\folder2";
print("$dir\n");
my $outdir = "\users\lisa\desktop\test";
print("$outdir\n");
outputs
Unrecognized escape \d passed through at a.pl line 7.
\172.18Ser?older?older2
Sersisadesktop est
You need to escape your backslashes!
use strict;
use warnings;
my $dir = "\\\\172.18\\user\\folder\\folder2";
print("$dir\n");
my $outdir = "\\users\\lisa\\desktop\\test";
print("$outdir\n");
Don't ignore warnings.
I have multiple folders where two files are present.
For example, 123.jpg, 456.jpg under folder ABC. I want to rename the files to IT1_ABC_123.v1.jpg and IT2_ABC_456.v1.jpg. Similarly, other folders have two files.
How can I do this in shell or Perl?
Try this, using shell and perl:
mkdir /tmp/test; cd $_
mkdir ABC DEF
touch {ABC,DEF}/{123,456}.jpg #creates four files, two in each directory
find|perl -nlE's,((.*)/(.+))/((123|456).jpg),$1/IT@{[++$n]}_$3_$4,&&say"$&\n$_\n"'
./ABC/123.jpg
./ABC/IT1_ABC_123.jpg
./ABC/456.jpg
./ABC/IT2_ABC_456.jpg
./DEF/123.jpg
./DEF/IT3_DEF_123.jpg
./DEF/456.jpg
./DEF/IT4_DEF_456.jpg
Now, after confirming this is what you want, replace the say with a rename:
find|perl -nlE's,((.*)/(.+))/((123|456).jpg),$1/IT@{[++$n]}_$3_$4, and rename$&,$_'
The new filenames:
find -type f
./ABC/IT1_ABC_123.jpg
./ABC/IT2_ABC_456.jpg
./DEF/IT3_DEF_123.jpg
./DEF/IT4_DEF_456.jpg
This will find filenames with 123.jpg or 456.jpg and rename them.
s,,, is the search-and-replace operator. It returns 1 (the number of substitutions it made), which is true, so the right side of the and (the rename) is executed.
Filenames that don't match 123.jpg or 456.jpg aren't renamed: s,,, then returns a false value and the and short-circuits, since it cannot be true with a false left side. So the rename is not executed.
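The same short-circuit logic can be tried without touching any files. In this sketch new_name is a hypothetical helper that returns the new path when the substitution succeeds and nothing otherwise; the counter is passed in rather than incremented inside the replacement, and the dot in .jpg is escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: return the new name if the substitution applies,
# or nothing if the path does not match (s/// then returns false and
# the "and" short-circuits).
sub new_name {
    my ($path, $n) = @_;
    $path =~ s{((.*)/(.+))/((123|456)\.jpg)}{$1/IT${n}_$3_$4} and return $path;
    return;
}

my $n = 0;
for my $old ('./ABC/123.jpg', './ABC/notes.txt', './ABC/456.jpg') {
    my $new = new_name($old, $n + 1) or next;   # unmatched names are skipped
    $n++;
    print "$old -> $new\n";
}
```

Only the two .jpg names are printed, with IT1 and IT2; notes.txt never reaches the print.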
This variant does the same, but might be easier to read:
find|perl -nlE 'rename$&,$_ if s,((.*)/(.+))/((123|456).jpg),$1/IT@{[++$n]}_$3_$4,'
I have found this pattern useful in many cases of mass renamings. Also, dedicated software for mass renaming with GUIs exists, which for some might be easier to use.
Rewritten as a program abc.pl, it could be:
#!/usr/bin/perl
while(<>){
    chomp;
    next if not s,((.*)/([A-Z]{3}))/(\d{3}\.jpg),$1/IT@{[++$n]}_$3_$4,;
    print "Found: $&\nNew name: $_\n\n";
    #rename $&, $_;
}
Run:
find|perl abc.pl
You can do this in core Perl using the File::Find, File::Basename, and File::Copy modules. You can test it out with the script below. It won't make any changes until you uncomment the line with the "move" function.
#! perl
use strict;
use warnings;
use File::Basename;
use File::Copy;
use File::Find;
my $root_dir = '/path/to/main/folder';
# Recursively searches for all files below the $root_dir
my @fileset;
find(
    sub {
        # Get the absolute file path
        my $path = $File::Find::name;
        # Only capture the path if not a directory
        # You can add any number of conditions here
        if (!-d $path) {
            push @fileset, $path;
        }
    },
    $root_dir
);
# set the IT counter in new file name
my $int = 1;
# list of all possible file suffixes to have fileparse() look for. It will
# capture the end of the file path verbatim (including the period) if it's
# in this array
my @suffixes = ('.jpg', '.txt');
my $previous_dir;
foreach my $old_path (@fileset) {
    # split apart the basename of the file, the directory path, and the file suffix
    my ($basename, $parent_dir, $suffix) = fileparse($old_path, @suffixes);
    # strip off trailing slash so fileparse() will capture parent dir name correctly
    $parent_dir =~ s{[/]$}{};
    # capture just the name of the parent directory
    my $parent_name = fileparse($parent_dir);
    # Reset counter when dir changes. This has to happen before the new
    # name is assembled so the first file in each dir gets IT1.
    if (defined $previous_dir && $previous_dir ne $parent_dir) {
        $int = 1; # reset counter
    }
    $previous_dir = $parent_dir; # remember dir to check next loop
    # Assemble the new path
    my $new_path = $parent_dir . '/IT' . $int . '_'
        . $parent_name . '_' . "$basename.v1" . $suffix;
    # Move the file to rename (this is safer than using rename() for cross-platform)
    # move $old_path, $new_path;
    print "OLD PATH: $old_path\n";
    print "NEW PATH: $new_path\n\n";
    $int++; # iterate the counter
}
Edit 2018-07-12: I've updated the answer to show how to reset the counter when the directory changes by evaluating the current path with the one used in the previous loop and updating accordingly. This is not tested so it may need some adjustments.
Given the abc/def examples given, the output should look something like this:
OLD PATH: /path/to/main/folder/abc/123.jpg
NEW PATH: /path/to/main/folder/abc/IT1_abc_123.v1.jpg
OLD PATH: /path/to/main/folder/abc/456.jpg
NEW PATH: /path/to/main/folder/abc/IT2_abc_456.v1.jpg
OLD PATH: /path/to/main/folder/def/123.jpg
NEW PATH: /path/to/main/folder/def/IT1_def_123.v1.jpg
OLD PATH: /path/to/main/folder/def/456.jpg
NEW PATH: /path/to/main/folder/def/IT2_def_456.v1.jpg
I am a Perl noob and am trying to do the following:
Search for files with specific string in a directory recursively. Say string is 'abc.txt'
The file can be in two different sub-directories, say dir_1 or dir_2
Once the file is found, if it is found in dir_1, rename it to dir_1_abc.txt. If it is in dir_2, then rename it to dir_2_abc.txt.
Once all the files have been found and renamed, move them all to a new directory named, say dir_3
I don't care if I have to use any module to accomplish this. I have been trying to do it using File::Find::Rule and File::Copy, but I am not getting the desired result. Here is my sample code:
#!/usr/bin/perl -sl
use strict;
use warnings;
use File::Find::Rule;
use File::Copy;
my $dir1 = '/Users/macuser/ParentDirectory/logs/dir_1'
my $dir2 = '/Users/macuser/ParentDirectory/logs/dir_2'
#ideally I just want to define one directory but because of the logic I am using in IF
#statement, I am specifying two different directory paths
my $dest_dir = '/Users/macuser/dir_3';
my(@old_files) = find(
file => (),
name => '*abc.txt',
in => $dir1, $dir2 ); #not sure if I can give two directories, works with one
foreach my $old_file(@old_files) {
print $old_file; #added this for debug
if ($dest_dir =~ m/dir_1/)
{
print "yes in the loop";
rename ($old_file, "dir_1_$old_file");
print $old_file;
copy "$old_file", "$dest_dir";
}
if ($dest_dir =~ m/dir_2/)
{
print "yes in the loop";
rename ($old_file, "dir_2_$old_file");
print $old_file;
copy "$old_file", "dest_dir";
}
}
The code above does not change the file name, instead when I am printing $old_file inside if, it spits the whole directory path, where the file is found, and it is prefixing the path with dir_1 and dir_2 respectively. Something is horribly wrong. Please help simply.
If you have bash ( I assume in OSX it is available), you can do this in a few lines (usually I put them in one line).
destdir="your_dest_dir"
for i in `find /Users/macuser/ParentDirectory/logs -type f -iname '*abc.txt' `
do
    # the prefix is the name of the directory holding the file, e.g. dir_1 or dir_2
    prefix=`basename "$(dirname "$i")"`
    dest="$destdir/${prefix}_`basename "$i"`"
    mv "$i" "$dest"
done
The advantage of this method is that you can have many sub dirs under logs and you don't need to specify them. You can search for files like blah_abc.txt and tada_abc.txt too. If you want an exact match, just use abc.txt instead of *abc.txt.
If the files can be placed in the destination as you rename them, try this:
#!/usr/bin/perl
use strict;
use File::Find;
use File::Copy;
my $dest_dir = '/Users/macuser/dir_3';
foreach my $dir ('/Users/macuser/ParentDirectory/logs/dir_1', '/Users/macuser/ParentDirectory/logs/dir_2') {
my $prefix = $dir; $prefix =~ s/.*\///;
find(sub {
move($File::Find::name, "$dest_dir/${prefix}_$_") if /abc\.txt$/;
}, $dir);
}
If you need to do all the renaming first and then move them all, you could either remember the list of files you have to move or you can make two passes making sure the pattern on the second pass is still OK after the initial rename in the first pass.
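The "remember the list" variant could look roughly like this. It is only a sketch: plan_moves and run_moves are made-up names, and it folds the rename and the move into a single move per file, on the assumption that the final location is all that matters.

```perl
use strict;
use warnings;
use File::Find;
use File::Copy qw(move);
use File::Basename qw(basename);

# Pass 1: collect [old path, new path] pairs without touching any file.
sub plan_moves {
    my ($dest_dir, @dirs) = @_;
    my @plan;
    for my $dir (@dirs) {
        my $prefix = basename($dir);    # e.g. dir_1
        find(sub {
            push @plan, [ $File::Find::name, "$dest_dir/${prefix}_$_" ]
                if -f $_ && /abc\.txt\z/;
        }, $dir);
    }
    return @plan;
}

# Pass 2: perform the moves from the remembered list.
sub run_moves {
    for my $pair (@_) {
        my ($old, $new) = @$pair;
        move($old, $new) or warn "Can't move $old to $new: $!\n";
    }
    return;
}

# Assumed usage, with the paths from the question:
# run_moves(plan_moves('/Users/macuser/dir_3',
#     '/Users/macuser/ParentDirectory/logs/dir_1',
#     '/Users/macuser/ParentDirectory/logs/dir_2'));
```

Because nothing is moved until the whole tree has been walked, the plan can also be printed and checked first.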