How can I make Perl's File::Find faster? - performance

I have a folder named Lib and I am using the File::Find module to search for that folder across a whole drive, say D:\. It takes a long time, even 5 minutes if the drive has a lot of subdirectories. How can I search for Lib faster, so it finishes in seconds?
My code looks like this:
find( \&Lib_files, $dir );

sub Lib_files
{
    return unless -d;
    if ( $_ =~ m/^([L|l]ib(.*))/ )
    {
        print "$_";
    }
    return;
}

Searching the file system without a preexisting index is IO bound. Otherwise, products ranging from locate to Windows Desktop Search would not exist.
Type D:\> dir /b/s > directory.lst and observe how long it takes for that command to run. You should not expect to beat that without indexing files first.
One major improvement you can make is to print less often. A minor improvement is not to use capturing parentheses if you are not going to capture:
my @dirs;

sub Lib_files {
    return unless -d $File::Find::name;
    if ( /^[Ll]ib/ ) {
        push @dirs, $File::Find::name;
    }
    return;
}
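With the matches collected in @dirs, you print once after the traversal finishes rather than inside the callback (a usage sketch reusing the question's $dir):

use File::Find qw(find);

find( \&Lib_files, $dir );

# one print at the end instead of one print per match
print "$_\n" for @dirs;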
On my system, a simple script using File::Find to print the names of all subdirectories under my home directory with about 150,000 files takes a few minutes to run compared to dir %HOME% /ad/b/s > dir.lst which completes in about 20 seconds.
I would be inclined to use:
use File::Basename;
my @dirs = grep { fileparse($_) =~ /^[Ll]ib/ }
           split /\n/, `dir %HOME% /ad/b/s`;
which completed in under 15 seconds on my system.
Note that with the backticks above there is a chance that some other dir.exe in %PATH% will be invoked rather than cmd.exe's built-in dir. You can use qx! cmd.exe /c dir %HOME% /ad/b/s ! to make sure the right dir is invoked.
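Putting the two pieces together, here is a minimal sketch (the %HOME% path and dir switches are taken from the examples above; substitute the D:\ drive as needed):

use strict;
use warnings;
use File::Basename;

# invoke cmd.exe explicitly so a stray dir.exe on PATH cannot shadow the built-in dir
my @lines = split /\n/, qx! cmd.exe /c dir %HOME% /ad/b/s !;

# keep only directories whose final component starts with "Lib" or "lib"
my @dirs = grep { fileparse($_) =~ /^[Ll]ib/ } @lines;

print "$_\n" for @dirs;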

How about not using the File::Find module?
use Cwd;

sub find {
    my ($wdir) = shift;
    my $sdir   = cwd();

    chdir($wdir)         or die "Unable to enter dir $wdir: $!\n";
    opendir(my $dh, ".") or die "Unable to open $wdir: $!\n";

    foreach my $name ( readdir($dh) ) {
        next if $name eq "." or $name eq "..";
        if ( -d $name ) {
            # recurse into the subdirectory, then continue with the next entry
            find($name);
            next;
        }
        print $name . "\n";
    }
    closedir($dh);

    # change back to where we started only after the whole directory is processed
    chdir($sdir) or die "Unable to change to dir $sdir: $!\n";
}

find(".");

Related

File organisation in Windows using Perl

I am working on a Windows machine and I have a directory filled with ~200k files which I need to organise. This is a job I will need to do regularly with different filename sets but with similar patterns, so Perl seemed a good tool to use.
Each filename is made up of {a string A}{2 or 3 digit number B}{single letter "r" or "x"}{3 digit number}.extension
I want to create a folder for each string A
Within each folder I want a sub-folder for each B
I then want to move each file into its relevant sub-folder
So it will end up looking something like
/CustomerA/1
/CustomerA/2
/CustomerA/3
/CustomerB/1
/CustomerB/2
/CustomerB/3
etc with the files in each sub-folder
so CustomerA888x123.xml is moved into /CustomerA/888/
I have the list of files in an array but I am struggling with splitting the file name out to its constituent parts and using the parts effectively.
Thanks for the answer. I ended up with this:
#!/usr/bin/perl
use warnings;
use strict;
use File::Copy qw(move);
use File::Path qw(make_path);

opendir my $dir, ".";
my @files = readdir($dir);
closedir $dir;

foreach my $file (@files) {
    my ($cust, $num) = $file =~ m/(\D+)(\d+)/;
    next unless defined $cust and defined $num;   # skip ".", ".." and anything else that doesn't match
    my $dirname = "$cust/$num";
    my @dirs_made = make_path($dirname, { verbose => 1 });
    move($file, $dirname) or warn "can't move $file to $dirname: $!";
}
Given your description of file names, this regex should parse what you need
my ($cust, $num) = $filename =~ m/(\D+)(\d+)/;
Use a more precise pattern if you wish or need to be more specific about what precedes the number, for example [a-zA-Z] for letters only.
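For instance, a stricter pattern for the exact layout described in the question ({string A}{2 or 3 digit number B}{"r" or "x"}{3 digit number}.extension) could look like the sketch below; the extra capture names are only illustrative:

my ($cust, $num, $letter, $seq, $ext) =
    $filename =~ m/^([A-Za-z]+)(\d{2,3})([rx])(\d{3})\.(\w+)$/;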
With that on hand, you can create directories using the core module File::Path, for example
use File::Path qw(make_path);
my $dirname = "$cust/$num";
my @dirs_made = make_path($dirname, { verbose => 1 });
This creates the path as needed and returns the names of the directories it created. It also prints those names when the verbose option is set. If a directory already exists it quietly skips it. If there are problems it dies, so you may want to wrap it in eval
eval { make_path($dirname) };
if ($@) {
    warn "Error with make_path($dirname): $@";
}
Also note the File::Path::Tiny module as an alternative, thanks to Sinan Ünür for bringing it up. Besides being far lighter, it also has the more common error-handling policy of returning false on failure, so you don't need an eval, only the usual check
use File::Path::Tiny;
File::Path::Tiny::mk($path) or warn "Can't mk($path): $!";
The module behaves similarly to mkdir in many ways, see the linked documentation.
Move the files using the move function from the core module File::Copy, for example
use File::Copy qw(move);
move($file, $dirname) or warn "Can't move $file to $dirname: $!";
All this can be in a loop over the array with the file names.

How to remove partial path from Get-Location output?

I'm trying to write a custom prompt for PowerShell and I was wondering how I would filter out the 1...n directories in the output of Get-Location.
function prompt {
    "PS " + $(get-location) + "> "
}
So, if the path is too long I would like to omit some of the directories and just display PS...blah\blah> or something. I tried (get-container) - 1 but it doesn't work.
Use Split-Path with the -Leaf parameter if you want just the last element of a path:
function prompt {
    "PS {0}> " -f (Split-Path -Leaf (Get-Location))
}
I wanted to make a more dynamic function, so I do just basic string manipulation. You could do some logic with nested Split-Path calls, but the string manipulation approach is much terser. Since what you want returned won't be a fully validated path, I feel better offering this solution.
Function Get-PartialPath($path, $depth){
    If(Test-Path $path){
        "PS {0}>" -f (($path -split "\\")[-$depth..-1] -join "\")
    } else {
        Write-Warning "$path is not a valid path"
    }
}
Sample Function call
Get-PartialPath C:\temp\folder1\sfg 2
PS folder1\sfg>
So you can use this simple function. Pass it a string for the path. Assuming the path is valid, the function carves it up into as many trailing chunks as you want and uses -join to rebuild it. If you give a $depth that is too high, the whole path is returned. So if you only want three folders shown, set $depth to 3.
Ansgar Wiechers' answer will give you the last directory, but if you want multiple directories at the end of the file path (using the triple-dot notation), you can cast the directory path to a URI and then get and join the segments:
function prompt {
    $curPath = pwd
    $pathUri = ([uri] $curPath.ToString())
    if ($pathUri.Segments.Count -le 3) {
        "PS {0}>" -f $curPath
    } else {
        "PS...{0}\{1}>" -f $pathUri.Segments[-2..-1].trim("/") -join ""
    }
}
Or using just a string (no uri cast)
function prompt {
    $curPath = pwd
    $pathString = $curPath.ToString().split('\') #Changed; no reason for escaping
    if ($pathString.Count -le 3) {
        "PS {0}>" -f $curPath
    } else {
        "PS...{0}\{1}>" -f $pathString[-2..-1] -join ""
    }
}
$a = prompt
Write-Host $a
Then just change -2 to whatever you want the first displayed directory to be, and change -le 3 to match. I typically use the URI cast when I have to run stuff through a browser or over connections to Linux machines (as they use "/" as a path separator), but there is no reason not to use the string method for normal operations.

Copy works very well, while Move doesn't work at all

use File::Copy;
#Variable with my directory I work on
$dir = "C:/projekty/perl/muzyka";
#Variables used to find all mp3 files
$dir_tmp = $dir."/*.mp3";
@files = glob( $dir_tmp );
#Variable with directory I want to create and put my files to
$new_dir = "C:/projekty/perl/muzyka/new_dir";
#Creating new directory
mkdir ( $new_dir ) or print "MKDIR PROBLEM";
Up to this point everything is all right. Now I add the loop:
foreach ( @files )
{
    copy( $_, $new_dir ) or print "COPY PROBLEM";
}
or:
foreach ( @files )
{
    move( $_, $new_dir ) or print "MOVE PROBLEM";
}
And the problem is: copy works perfectly fine, but move doesn't want to do its job. It sometimes works, depending on modifications to the code, but never in a loop. Simple code with one line:
move($a, $b);
works perfectly. But if I use some conditions or loops it stops working, even though the arguments (directories) seem OK (I checked them with print statements inside the loop). Why is it not working? Are there any circumstances that would cause errors?
copy and move are documented to set $! on error. It's also good to check whether the arguments are what you expect them to be. Pay particular attention to the presence of newlines and trailing spaces.
move($_, $new_dir)
    or warn("Can't move \"$_\" to \"$new_dir\": $!\n");
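A rough sketch of that kind of checking, reusing the question's array and directory variables (the whitespace stripping only matters if the names come from somewhere that can leave a trailing newline or spaces):

use File::Copy qw(move);

foreach my $file (@files) {
    # print the name in brackets so stray newlines or trailing spaces become visible
    printf "about to move [%s]\n", $file;

    # strip any trailing whitespace, including a newline
    $file =~ s/\s+\z//;

    move( $file, $new_dir )
        or warn qq{Can't move "$file" to "$new_dir": $!\n};
}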

How to find specific files recursively in a directory, rename them by prefixing the sub-directory name, and move them to a different directory

I am a Perl noob, and I am trying to do the following:
Search recursively for files with a specific string in their name in a directory. Say the string is 'abc.txt'
The file can be in two different sub-directories, say dir_1 or dir_2
Once the file is found, if it is found in dir_1, rename it to dir_1_abc.txt. If it is in dir_2, then rename it to dir_2_abc.txt.
Once all the files have been found and renamed, move them all to a new directory named, say dir_3
I don't care if I have to use any module to accomplish this. I have been trying to do it using File::Find::Rule and File::Copy, but not getting the desired result. Here is my sample code:
#!/usr/bin/perl -sl
use strict;
use warnings;
use File::Find::Rule;
use File::Copy;

my $dir1 = '/Users/macuser/ParentDirectory/logs/dir_1';
my $dir2 = '/Users/macuser/ParentDirectory/logs/dir_2';
# ideally I just want to define one directory, but because of the logic I am using in the IF
# statement, I am specifying two different directory paths
my $dest_dir = '/Users/macuser/dir_3';

my (@old_files) = find(
    file => (),
    name => '*abc.txt',
    in   => $dir1, $dir2 ); # not sure if I can give two directories, works with one

foreach my $old_file (@old_files) {
    print $old_file; # added this for debug
    if ($dest_dir =~ m/dir_1/)
    {
        print "yes in the loop";
        rename ($old_file, "dir_1_$old_file");
        print $old_file;
        copy "$old_file", "$dest_dir";
    }
    if ($dest_dir =~ m/dir_2/)
    {
        print "yes in the loop";
        rename ($old_file, "dir_2_$old_file");
        print $old_file;
        copy "$old_file", "dest_dir";
    }
}
The code above does not change the file name; instead, when I print $old_file inside the if, it prints the whole directory path where the file was found, prefixed with dir_1 and dir_2 respectively. Something is horribly wrong. Please help.
If you have bash (I assume it is available on OS X), you can do this in a few lines (usually I put them in one line).
destdir="your_dest_dir"
for i in `find /Users/macuser/ParentDirectory/logs -type f -iname '*abc.txt' `
do
    prefix=`dirname $i`
    if [[ $prefix = *dir_1* ]] ; then
        prefix="dir_1"
    fi
    dest="$destdir/${prefix}_`basename $i`"
    mv "$i" "$dest"
done
The advantage of this method is that you can have many sub-dirs under logs and you don't need to specify them. You can search for files like blah_abc.txt and tada_abc.txt too. If you want an exact match, just use abc.txt instead of *abc.txt.
If the files can be placed in the destination as you rename them, try this:
#!/usr/bin/perl
use strict;
use File::Find;
use File::Copy;

my $dest_dir = '/Users/macuser/dir_3';
foreach my $dir ('/Users/macuser/ParentDirectory/logs/dir_1', '/Users/macuser/ParentDirectory/logs/dir_2') {
    my $prefix = $dir;
    $prefix =~ s/.*\///;
    find(sub {
        move($File::Find::name, "$dest_dir/${prefix}_$_") if /abc\.txt$/;
    }, $dir);
}
If you need to do all the renaming first and then move them all, you could either remember the list of files you have to move, or make two passes, making sure the pattern on the second pass still matches after the initial rename in the first pass.
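A sketch of the "remember the list" variant, following the same structure as the code above: the first pass only records the source and destination names, and the moves happen afterwards.

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Copy;

my $dest_dir = '/Users/macuser/dir_3';
my @moves;

foreach my $dir ('/Users/macuser/ParentDirectory/logs/dir_1', '/Users/macuser/ParentDirectory/logs/dir_2') {
    my $prefix = $dir;
    $prefix =~ s/.*\///;
    # first pass: record what should move where, without touching anything
    find(sub {
        push @moves, [ $File::Find::name, "$dest_dir/${prefix}_$_" ] if /abc\.txt$/;
    }, $dir);
}

# second pass: perform the moves once the scan is complete
foreach my $pair (@moves) {
    move( $pair->[0], $pair->[1] )
        or warn "Can't move $pair->[0] to $pair->[1]: $!\n";
}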

Locating a file on the path

Does anybody know how to determine the location of a file that's in one of the folders specified by the PATH environment variable, other than doing a dir filename.exe /s from the root folder?
I know this is stretching the bounds of a programming question but this is useful for deployment-related issues, also I need to examine the dependencies of an executable. :-)
You can use the where.exe utility in the C:\Windows\System32 directory.
For WindowsNT-based systems:
for %i in (file) do @echo %~dp$PATH:i
Replace file with the name of the file you're looking for.
If you want to locate the file at the API level, you can use PathFindOnPath. It has the added bonus of being able to specify additional directories, in case you want to search in additional locations apart from just the system or current user path.
On Windows I'd say use %WINDIR%\system32\where.exe
Your question's title doesn't specify Windows, so I imagine some folks might find this question looking for the same thing with a POSIX OS in mind (like myself).
This PHP snippet might help them:
<?php
function Find( $file )
{
    foreach( explode( ':', $_ENV['PATH'] ) as $dir )
    {
        $command = sprintf( 'find -L %s -name "%s" -print', $dir, $file );
        $output = array();
        $result = -1;
        exec( $command, $output, $result );
        if ( count( $output ) == 1 )
        {
            return( $output[ 0 ] );
        }
    }
    return null;
}
?>
This is slightly altered production code I'm running on several servers (i.e. taken out of its OO context, with some sanitation and error checking left out for brevity).
Using PowerShell on Windows...
Function Get-ENVPathFolders {
    #.Synopsis Split $env:Path into an array
    #.Notes
    # - Handle 1) folders ending in a backslash 2) double-quoted folders 3) folders with semicolons 4) folders with spaces 5) double-semicolons i.e. blanks
    # - Example path: 'C:\WINDOWS\;"C:\Path with semicolon; in the middle";"E:\Path with semicolon at the end;";;C:\Program Files;
    # - 2018/01/30 by Chad@ChadsTech.net - Created
    $NewPath = @()
    $env:Path.ToString().TrimEnd(';') -split '(?=["])' | ForEach-Object { # remove a trailing semicolon from the path, then split it into an array using a double-quote as the delimiter, keeping the delimiter
        If ($_ -eq '";') {
            # throw away a blank line
        } ElseIf ($_.ToString().StartsWith('";')) {
            # if the line starts with "; remove the "; and any trailing backslash
            $NewPath += ($_.ToString().TrimStart('";')).TrimEnd('\')
        } ElseIf ($_.ToString().StartsWith('"')) {
            # if the line starts with " remove the " and any trailing backslash
            $NewPath += ($_.ToString().TrimStart('"')).TrimEnd('\') #$_ + '"'
        } Else {
            # split by semicolon and remove any trailing backslash
            $_.ToString().Split(';') | ForEach-Object { If ($_.Length -gt 0) { $NewPath += $_.TrimEnd('\') } }
        }
    }
    Return $NewPath
}

$myFile = 'desktop.ini'
Get-ENVPathFolders | ForEach-Object { If (Test-Path -Path $_\$myFile) { Write-Output "Found [$_\$myFile]" } }
I also blogged the answer with some details over at http://blogs.catapultsystems.com/chsimmons/archive/2018/01/30/parse-envpath-with-powershell
In addition to the 'where' (MS Windows) and 'which' (unix/linux) utilities, I have written my own utility which I call 'findinpath'. In addition to finding the executable that would be executed if handed to the command line interpreter (CLI), it finds all matches, returned in path-search order, so you can find path-order problems. In addition, my utility returns not just executables but any file-specification match, to catch those times when a desired file isn't actually executable.
I also added a feature that has turned out to be very nifty; the -s flag tells it to search not just the system path, but everything on the system disk, known user-directories excluded. I have found this feature to be incredibly useful in systems administration tasks...
Here's the 'usage' output:
usage: findinpath [ -p <path> | -path <path> ] | [ -s | -system ] <file>
or findinpath [ -h | -help ]
where: <file> may be any file spec, including wild cards
-h or -help returns this text
-p or -path uses the specified path instead of the PATH environment variable.
-s or -system searches the system disk, skipping /d /l/ /nfs and /users
Writing such a utility is not hard and I'll leave it as an exercise for the reader. Or, if asked here, I'll post my script - it's in 'bash'.
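As a rough Perl sketch of the core idea (exact-name matches on the PATH only; the wild-card handling and -s system-disk option described above are left out):

#!/usr/bin/perl
use strict;
use warnings;
use Config;
use File::Spec;

# report every match in path-search order, so path-order problems become visible
my $wanted = shift @ARGV or die "usage: $0 <file>\n";

for my $dir ( split /\Q$Config{path_sep}\E/, $ENV{PATH} ) {
    next unless length $dir;
    my $candidate = File::Spec->catfile($dir, $wanted);
    print "$candidate\n" if -e $candidate;
}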
Just for kicks, here's a one-liner PowerShell implementation:
function PSwhere($file) { $env:Path.Split(";") | ? { test-path $_\$file* } }
