How can I reduce this to a single file open? - windows

Using Strawberry Perl 5.22.0 in Windows 7. Is there a more "perlish" way to write this snippet of code? I hate the duplication of file open sections, but cannot think of a way to make it only open once because of the requirement to test the creation time.
...
my $x;
my $fh;
my $sentinelfile = "Logging.yes"; #if this file exists then enable logging
my $logfile = "transfers.log";
my $log = 0; #default to NO logging
$log = 1 if -e $sentinelfile; #enable logging if sentinel file exists
if($log){
#logfile remains open after this so remember to close at end of program!
if (-e $logfile) { #file exists
open($fh, "<", $logfile); #open for read will NOT create if not exist
chomp ($x = <$fh>); #grab first row
close $fh;
if (((scalar time - $x)/3600/24) > 30) { #when ~30 days since created
rename($logfile, $logfile . time); #rename existing logfile
open($fh, ">", $logfile); #open for write and truncate
print $fh time,"\n"; #save create date
print $fh "--------------------------------------------------\n";
} else { #file is not older than 30 days
open($fh, ">>", $logfile); #open for append
}
} else { #file not exist
open($fh, ">", $logfile); #open new for write
print $fh time,"\n"; #save create date
print $fh "--------------------------------------------------\n";
}
} #if $log
...
To recap: logfile logs stuff. First row of file contains the logfile creation date. Second row contains horizontal rule. Rest of file contains text. Around 30 days after file was created, rename file and start a new one. After the above chunk of code the logfile is open and ready for logging stuff. It gets closed at the end of the rest of the program.

There are other, non-cosmetic problems with your code: a) You do not ever check if your calls to open succeeded; b) You are creating a race condition. The file can come into existence after the -e check has failed. The subsequent open $fh, '>' ... would then clobber it; c) You don't check if your rename call succeeded etc.
The following would be a partial improvement on your existing code:
if ($log) {
if (open $fh, '<', $logfile) { #file exists
chomp ($x = <$fh>);
close $fh
or die "Failed to close '$logfile': $!";
if (((time - $x)/3600/24) > 30) {
my $rotated_logfile = join '.', $logfile, time;
rename $logfile => $rotated_logfile
or die "Failed to rename '$logfile' to '$rotated_logfile': $!";
open $fh, '>', $logfile
or die "Failed to create '$logfile': $!";
print $fh time, "\n", '-' x 50, "\n";
}
else {
open $fh, '>>', $logfile
or die "Cannot open '$logfile' for appending: $!";
}
}
else {
open $fh, '>', $logfile
or die "Cannot create '$logfile': $!";
print $fh time, "\n", '-' x 50, "\n";
}
}
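The race condition mentioned in point b) can be avoided by asking the operating system to create the file only if it is not already there, instead of testing with -e first. The following is a minimal sketch of that idea, not part of the code above; create_if_absent is a hypothetical helper name, and it relies on sysopen with the O_EXCL flag from the core Fcntl module.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Hypothetical helper: create $file only if it does not exist yet.
# Returns a write handle on success, or undef if the file was already there.
sub create_if_absent {
    my ($file) = @_;
    if ( sysopen my $fh, $file, O_WRONLY | O_CREAT | O_EXCL ) {
        return $fh;
    }
    return if $!{EEXIST};    # the file already exists: leave it untouched
    die "Failed to create '$file': $!";
}
```

A caller could try create_if_absent($logfile) first and write the header only when a handle comes back, falling back to opening for append otherwise.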
It would be better to abstract every bit of discrete functionality to suitably named functions.
For example, here is a completely untested re-write:
use autouse Carp => qw( croak );
use constant SENTINEL_FILE => 'Logging.yes';
use constant ENABLE_LOG => -e SENTINEL_FILE;
use constant HEADER_SEPARATOR => '-' x 50;
use constant SECONDS_PER_DAY => 24 * 60 * 60;
use constant ROTATE_AFTER => 30 * SECONDS_PER_DAY;
my $fh;
if (ENABLE_LOG) {
if (my $age = read_age( $logfile )) {
if ( is_time_to_rotate( $age ) ) {
rotate_log( $logfile );
}
else {
$fh = open_log( $logfile );
}
}
unless ($fh) {
$fh = create_log( $logfile );
}
}
sub is_time_to_rotate {
my $age = shift;
return $age > ROTATE_AFTER;
}
sub rotate_log {
my $file = shift;
my $saved_file = join '.', $file, time;
rename $file => $saved_file
or croak "Failed to rename '$file' to '$saved_file': $!";
return;
}
sub create_log {
my $file = shift;
open my $fh, '>', $file
or croak "Failed to create '$file': $!";
print $fh time, "\n", HEADER_SEPARATOR, "\n"
or croak "Failed to write header to '$file': $!";
return $fh;
}
sub open_log {
my $file = shift;
open my $fh, '>>', $file
or croak "Failed to open '$file': $!";
return $fh;
}
sub read_age {
my $file = shift;
open my $fh, '<', $file
or return;
defined (my $creation_time = <$fh>)
or croak "Failed to read creation time from '$file': $!";
return time - $creation_time;
}

If you need to read a line of a file, rename it and then work with it, you have to open it twice.
However, you can also do away with using that first line.
On Windows, according to perlport (Files and Filesystems), the inode change time time-stamp (ctime) "may really" mark the file creation time. This is likely to be completely suitable for a log file that doesn't get manipulated and moved around. It can be obtained with the -C file-test operator
my $days_float = -C $filename;
Now you can numerically test this against 30. Then there is no need to print the file's creation time to its first line (but you may as well if it is useful for viewing or other tools).
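As a minimal sketch of that numeric test, the check can be wrapped in a small function. The name is_older_than and its parameters are assumptions made for illustration; -C reports the age in fractional days relative to the start of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $file is a plain file whose ctime-based age
# (as reported by -C, in fractional days) exceeds $max_days.
sub is_older_than {
    my ( $file, $max_days ) = @_;
    return 0 unless -f $file;
    my $age_days = -C _;    # "_" reuses the stat data from the -f test
    return $age_days > $max_days ? 1 : 0;
}
```

With this in place the caller's test would read is_older_than($logfile, 30).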
Also, there is the module Win32API::File::Time, with the purpose to
provide maximal access to the file creation, modification, and access times under MSWin32
Please do read the docs for some caveats. I haven't used it but it seems tailored for your need.
A good point is raised in a comment: apparently the OS retains the original time-stamp when a file is renamed. In that case, once the file is too old, copy it to a new file (under the archive name) and delete the original, instead of using rename. Then open the log file anew, so that it gets a fresh time-stamp.
Here is a complete example
archive_log($logfile) if -f $logfile and -C $logfile > 30;
open my $fh_log, '>>', $logfile or die "Can't open $logfile: $!";
say $fh_log "Log a line";
sub archive_log {
my ($file) = @_;
require POSIX; POSIX->import('strftime');
my $ts = strftime("%Y%m%d_%H%M%S", localtime); # 20170629_124410 (no colons: they are not allowed in Windows file names)
require File::Copy; File::Copy->import('copy');
my $archive = $file . "_$ts";
copy ($file, $archive) or die "Can't copy $file to $archive: $!";
unlink $file or die "Can't unlink $file: $!";
}
The archive_log archives the current log by copying it and then removes it.
So after that we can just open for append, which creates the file if not there.
The -C tests for file existence but since its output is used in a numerical test we need -f first.
Since this happens once a month, I load the modules at runtime, with require and import, only once the log actually needs to be rotated. If you already use File::Copy then there is no need for this. As for the time-stamp, I threw in something to make this a working example.
I tested this on UNIX, by changing -C to -M and tweaking the timestamp by touch -t -c.
Better yet, to reduce the caller's code fully, move the tests into the sub as well:
my $fh_log = open_log($logfile);
say $fh_log "Log a line";
sub open_log {
my ($file) = @_;
if (-f $file and -C $file > 30) {
# code from archive_log() above, to copy and unlink $file
}
open my $fh_log, '>>', $file or die "Can't open $file: $!";
return $fh_log;
}
Note. On UNIX the file's creation time is not kept anywhere. The closest notion is the ctime above, but this is of course different. For one thing, it changes with many operations, for instance mv, ln, chmod, chown, chgrp (and probably others).

Related

Keep shared entries among many files

I have hundreds of files, each with a different number of entries (>xxxx), and want to keep only the entries shared among all files, separately for each file. I'm not sure what the best method is, maybe Perl! I used bash's sort and uniq, but I didn't get the correct answer. In all files, each ID starts with > followed by 4 characters.
1.fa
>abcd
CTGAATGCC
2.fa
>abcd
AAATGCGCG
>efgh
CGTAC
3.fa
>abcd
ATGCAATA
>efgh
TAACGTAA
>ijkl
TGCAA
Final results, of this example would be:
1.fa
>abcd
CTGAATGCC
2.fa
>abcd
AAATGCGCG
3.fa
>abcd
ATGCAATA
This Perl program will do as you ask. It uses Perl's built-in edit in place functionality and renames the original files to 1.fa.bak etc. It shouldn't have a problem with blank lines in your data as long as the sequence is always on one line immediately following the ID
use strict;
use warnings 'all';
my @files = glob '*.fa';
printf "Processing %d file%s\n", scalar @files, @files == 1 ? "" : "s";
exit if @files < 2;
my %ids;
{
local @ARGV = @files;
while ( <> ) {
++$ids{$1} if /^>(\S+)/;
}
}
# remove keys that aren't in all files
delete @ids{ grep { $ids{$_} < @files } keys %ids };
my $n = keys %ids;
printf "%d ID%s common to all files\n", $n, $n == 1 ? '' : "s";
exit unless $n;
{
local @ARGV = @files;
local $^I = '.bak';
while ( <> ) {
next unless /^>(\S+)/ and $ids{$1};
print;
print scalar <>;
}
}
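The count-and-filter step in the program above can be tried on its own, without touching any files. This is only an illustrative sketch: the %ids_in hash is a stand-in built from the question's sample data, and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Stand-in data modelled on the question: file name => IDs found in that file.
my %ids_in = (
    '1.fa' => [qw(abcd)],
    '2.fa' => [qw(abcd efgh)],
    '3.fa' => [qw(abcd efgh ijkl)],
);

# Count in how many files each ID appears.
my %count;
for my $file ( keys %ids_in ) {
    $count{$_}++ for @{ $ids_in{$file} };
}

# Hash-slice delete: drop every ID seen in fewer files than there are files.
my $n_files = keys %ids_in;
delete @count{ grep { $count{$_} < $n_files } keys %count };

my @common = sort keys %count;
print "@common\n";    # prints "abcd"
```

Note that counting this way assumes an ID occurs at most once per file; an ID repeated inside a single file would inflate its count.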
Here is a Perl solution that may help you:
use feature qw(say);
use strict;
use warnings;
my $file_dir = 'files';
chdir $file_dir;
my @files = <*.fa>;
my $num_files = scalar @files;
my %ids;
for my $file (@files) {
open ( my $fh, '<', $file) or die "Could not open file '$file': $!";
while (my $id = <$fh>) {
chomp $id;
chomp (my $sequence = <$fh>);
$ids{$id}++;
}
close $fh;
}
for my $file (@files) {
open ( my $fh, '<', $file) or die "Could not open file '$file': $!";
my $new_name = $file . '.new';
open ( my $fh_write, '>', $new_name ) or die "Could not open file '$new_name': $!";
while (my $id = <$fh>) {
chomp $id;
chomp (my $sequence = <$fh>);
if ( $ids{$id} == $num_files ) {
say $fh_write $id;
say $fh_write $sequence;
}
}
close $fh_write;
close $fh;
}
It assumes that all the .fa files are located in the directory named $file_dir, and it writes the new sequences to new files in the same directory. The new file names get the .new extension.

Need to split multiple files in a directory based on string, rename properly using powershell or fix my perl script

I have a directory full of files (text exports of Dynamics NAV objects that have been exported) in Windows. Each file contains multiple objects. I need to split each file into separate files based on lines that begin with OBJECT, and name each file appropriately.
The purpose of this is to get our Dynamics NAV system into git.
I wrote a nifty perl program to do this that works great on linux. But it hangs on the while(<>) loop in Windows (Server 2012 if that matters).
So, I need to either figure out how to do this in the PowerShell script that I wrote that generates all of the files, or fix my perl script that I'm calling from PowerShell. Does Windows perl handle filehandles differently than linux?
Here's my code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use POSIX qw(strftime);
my $username = getlogin || getpwuid($<);
my $datestamp = strftime("%Y%m%d-%H%M%S", localtime);
my $work_dir = "/temp/nav_export";
my $objects_dir = "$work_dir/$username/objects";
my $export_dir = "$work_dir/$username/$datestamp";
print "Objects being exported to $export_dir\n";
make_path("$export_dir/Page", "$export_dir/Codeunit", "$export_dir/MenuSuite", "$export_dir/Query", "$export_dir/Report", "$export_dir/Table", "$export_dir/XMLport");
chdir $objects_dir or die "Could not change to $objects_dir: $!";
# delete empty files
foreach(glob('*.*')) {
unlink if -f and !-s _;
}
my @files = <*>;
my $count = @files;
print "Processing $count files\n";
open (my $fh, ">-") or die "Could not open standard out: $!";
# OBJECT Codeunit 1 ApplicationManagement
while(<>)
{
if (m/^OBJECT ([A-Za-z]+) ([0-9]+) (.*)/o)
{
my $objectType = $1;
my $objectID = $2;
my $objectName = my $firstLine = $3;
$objectName =~ s/[\. \/\(\)\\]/_/g; # translate spaces, (, ), ., \ and / to underscores
$objectName =~ tr/\cM//d; # get rid of Ctrl-M
my $filename = $export_dir . "/" . $objectType . "/" . $objectType . "~" . $objectID . "~" . $objectName;
close $fh and open($fh, '>', $filename) or die "Could not open file '$filename' $!";
print $fh "OBJECT $objectType $objectID $firstLine\n";
next;
}
print $fh $_;
}
I've learned quite a bit of PowerShell in the past few days. There are some things that it really does quite well. And some (such as calling an executable with variables and command line options that have spaces) that are maddeningly difficult to figure out. To call curl, this is what I resorted to:
$curl = "C:\Program Files (x86)\cURL\bin\curl"
$arg10 = '-s'
$arg1 = '-X'
$arg11 = 'post'
$arg2 = '-H'
$arg22 = '"Accept-Encoding: gzip,deflate"'
$arg3 = '-H'
$arg33 = '"Content-Type: text/xml;charset=UTF-8"'
$arg4 = '-H'
$arg44 = '"SOAPAction:urn:microsoft-dynamics-schemas/page/permissionrange:ReadMultiple"'
$arg5 = '--ntlm'
$arg6 = '-u'
$arg66 = 'username:password'
$arg7 = '-d'
$arg77 = '"@soap_envelope.txt"'
$arg8 = "http://$servicetier.corp.company.net:7047/$database/WS/DBDOC/Page/PermissionRange"
$arg9 = "-o"
$arg99 = "c:\temp\nav_export\$env:username\raw_list.xml"
&"$curl" $arg10 $arg1 $arg11 $arg2 $arg22 $arg3 $arg33 $arg4 $arg44 $arg5 $arg6 $arg66 $arg7 $arg77 $arg8 $arg9 $arg99
I realize that part is a bit of a tangent. But I've been working really hard at trying to figure this out and not have to bother you nice folk here at stackoverflow!
I'm ambivalent about making it work in PowerShell or fixing the Perl code at this point. I just need to make it work. But I'm hoping it's just some little difference in filehandle handling between linux and Windows.
It's hard to believe that the Perl code that you show does anything on Linux either. It looks like your while loop is supposed to be reading through all of the files in the @files array, but to make it do that you have to copy the names to @ARGV. With @ARGV empty, <> reads from standard input instead, which is why the script appears to hang.
Also note that @files will contain directories as well as files.
I suggest you change the lines starting with my @files = <*> to this. There's no reason why it shouldn't work on both Windows and Linux.
our @ARGV = grep -f, glob '*';
my $count = @ARGV;
print "Processing $count files\n";
my $fh;
while (<>) {
s/\s+\z//; # Remove trailing whitespace (including CR and LF)
my @fields = split ' ', $_, 4;
if ( @fields == 4 and $fields[0] eq 'OBJECT' ) {
my ($object_type, $object_id, $object_name) = @fields[1,2,3];
$object_name =~ tr{ ().\\/}{_}; # translate spaces, (, ), ., \ and / to underscores
my $filename = "$export_dir/$object_type/$object_type~$object_id~$object_name";
open $fh, '>', $filename or die "Could not open file '$filename': $!";
}
print $fh "$_\n" if $fh;
if (eof) {
close $fh;
$fh = undef;
}
}
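The way <> follows @ARGV, and the way a bare eof marks the end of each individual file, can be seen in a small self-contained sketch. The file names and contents below are throwaway assumptions created in a temporary directory; they have nothing to do with the NAV exports.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two small throwaway files (hypothetical names) to read back.
my $dir = tempdir( CLEANUP => 1 );
my %content = ( 'a.txt' => "one\ntwo\n", 'b.txt' => "three\n" );
for my $name ( sort keys %content ) {
    open my $out, '>', "$dir/$name" or die "Cannot create '$dir/$name': $!";
    print {$out} $content{$name};
    close $out or die "Cannot close '$dir/$name': $!";
}

my ( @lines, @finished );
{
    # <> reads the files named in @ARGV; localising it keeps the change contained.
    local @ARGV = map { "$dir/$_" } sort keys %content;
    while ( my $line = <> ) {
        chomp $line;
        push @lines, $line;
        # eof without parentheses is true at the end of each individual file.
        push @finished, $ARGV if eof;
    }
}
print scalar(@lines), " lines from ", scalar(@finished), " files\n";
```

The $ARGV variable holds the name of the file currently being read, which is handy for error messages or, as here, for noting which file just ended.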

Perl: Bad Symbol for dirhandle

This is my code:
opendir(DIR, $directoryPath) or die "Cant open $directoryPath$!";
my @files = readdir(DIR); #Array of file names
closedir (DIR) or die "Cant close $directoryPath$!";
I'm using @files to create an array of the file names within the directory for renaming later in the program.
The problem is:
1. I am getting the error "Bad Symbol for dirhandle" at the closedir line.
2. If I don't closedir to avoid this, I don't have permission to change file names (I'm using Windows).
3. I tried an alternative way of renaming the files (below), doing the rename while the dirhandle is still open, but this just repeats the permission errors.
opendir(DIR, $directoryPath) or die "Cant open $directoryPath$!";
while( (my $filename = readdir(DIR)))
{
rename($filename, $nFileName . $i) or die "Cant rename file $filename$!";
i++;
}
closedir (DIR) or die "Cant close $directoryPath$!";
From a quick bit of research I think the permission error is a Windows security feature so you can't edit a file while it's open, but I haven't been able to find a solution simple enough for me to understand.
An answer to point 1. or point 3. is preferable, but an answer to point 2. will also be useful.
Full code used in points 1. and 2. below
use 5.16.3;
use strict;
print "Enter Directory: ";
my $directoryPath = <>;
chomp($directoryPath);
chdir("$directoryPath") or die "Cant chdir to $directoryPath$!";
opendir(DIR, $directoryPath) or die "Cant open $directoryPath$!";
my @files = readdir(DIR); #Array of file names
closedir (DIR) or die "Cant close $directoryPath$!";
my $fileName = "File ";
for my $i (0 .. @files)
{
rename($files[$i], $fileName . ($i+1)) or die "Cant rename file $files[$i]$!";
}
chdir; #return to home directory
I can input the path correctly, but then error message (copied exactly) is:
Can't rename file .Permission denied at C:\path\to\file\RenameFiles.pl line 19, <> line 1.
The error
Can't rename file .Permission denied at C:\path\to\file\RenameFiles.pl line 19, <> line 1.
says that you are trying to rename the file ".", which is a special file that is a shortcut for "current directory". You should add exceptions to your code to not rename this file, and the one called "..". Something like:
next if $files[$i] =~ /^\./;
Would do. This will skip over any file that begins with a period. Alternatively you can skip directories:
next if -d $files[$i]; # skip directories (includes . and ..)
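The same filtering can be done up front, before the rename loop starts. The sketch below is one possible way, assuming the core File::Spec module; plain_files_in is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: list the plain files in $dir, skipping '.' and '..'
# as well as any subdirectories.
sub plain_files_in {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't open directory '$dir': $!";
    my @names = File::Spec->no_upwards( readdir($dh) );
    closedir $dh or die "Can't close directory '$dir': $!";

    # -f needs the directory prefix unless $dir is the current directory.
    my @files = grep { -f File::Spec->catfile( $dir, $_ ) } @names;
    @files = sort @files;
    return @files;
}
```

Because the directory handle is closed before anything is renamed, the loop that follows only ever sees a plain list of names.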
As TLP has already pointed out, readdir returns . and .., which correspond to the current and parent directory.
You'll need to filter those out in order to avoid renaming directories.
use strict;
use warnings;
use autodie;
print "Enter Directory: ";
chomp( my $dirpath = <> );
opendir my $dh, $dirpath or die "Can't open $dirpath: $!";
my $number = 0;
while ( my $file = readdir($dh) ) {
next if $file =~ /^\.+$/;
my $newfile = "$dirpath/File " . ++$number;
rename "$dirpath/$file", $newfile or die "Cant rename file $file -> $newfile: $!";
}
closedir $dh;
Cross Platform Compatibility using Path::Class
One way to simplify this script and logic is to use Path::Class to handle file and directory operations.
use strict;
use warnings;
use autodie;
use Path::Class;
print "Enter Directory: ";
chomp( my $dirname = <> );
my $dir = dir($dirname);
my $number = 0;
for my $file ( $dir->children ) {
next if $file->is_dir();
my $newfile = $dir->file( "File" . ++$number );
$file->move_to($newfile);
}

Perl Hash of hash issue

I have hash which looks like this:
$VAR1 = {
'638047' => {
'commands' => 'perl nSWIFT/bin/tqtest.pl -var clist=',
},
'638049' => {
'commands' => 'perl nSWIFT/bin/tqtest.pl-var clist=',
}
};
I want to create a directory such that the file name starts with hash values and it stores the numbers that are generated by running the commands given as perl nswift/bin/tqtest.pl -var clist=. The numbers should be stored in the same file as their key. For example:
'638047' -> '638050' and '638049' -> '638051'
then it should be stored accordingly.
I have tried this but I am not getting there.
my %stat
my @qbsid_filename = keys %stat;
foreach (@qbsid_filename){
open QBS, ">/root/qbsid/$_";
}
my $newqbsid, @files;
opendir (DIREC, '/root/qbsid') or die $!;
foreach my $qbsid ( keys %stat){
my $cmd = $stat{$qbsid}->{commands};
if ($cmd =~ m/perl(.*)/){
$ex_cmd = $1;
}
$newqbsid = qx | perl $ex_cmd|;
}
close (DIREC);
I am not very good with algorithms. So, finding it difficult.
Please inform me if I am going wrong somewhere or you need more information. Thanks.
Try this:
my %stat = ( ... ); # a hash is initialised from a list in parentheses, not from a { } reference
while (my ($qbsid_filename, $qbsid) = each %stat) {
my $cmd = $qbsid->{commands};
next unless $cmd =~ m/^perl/;
open my $qbs, '>', "/root/qbsid/$qbsid_filename"
or die "Cannot open $qbsid_filename: $!";
open my $qbscmd, '-|', "$cmd"
or die "Cannot execute $cmd: $!";
while (<$qbscmd>) {
print $qbs $_;
}
close ($qbscmd);
close ($qbs);
}
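For reference, a hash of hashes like the one in the question can be initialised and walked as in the sketch below. The data is modelled on the question's dump and nothing is executed; note that a hash variable takes a parenthesised list, whereas braces would create a reference to an anonymous hash. The @report array is only there to make the example easy to inspect.

```perl
use strict;
use warnings;

# Sample data modelled on the question's dump (commands are not run here).
my %stat = (
    '638047' => { commands => 'perl nSWIFT/bin/tqtest.pl -var clist=' },
    '638049' => { commands => 'perl nSWIFT/bin/tqtest.pl -var clist=' },
);

# Walk the outer hash in a stable order and reach into the inner hash.
my @report;
for my $qbsid ( sort keys %stat ) {
    my $cmd = $stat{$qbsid}{commands};
    push @report, "$qbsid => $cmd";
}
print "$_\n" for @report;
```

Sorting the keys gives a predictable order, which each does not guarantee.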

Why can't I use more than 20 files with my Perl script and Windows's SendTo?

I'm trying to emulate RapidCRC's ability to check crc32 values within filenames on Windows Vista Ultimate 64-bit. However, I seem to be running into some kind of argument limitation.
I wrote a quick Perl script, created a batch file to call it, then placed a shortcut to the batch file in %APPDATA%\Microsoft\Windows\SendTo
This works great when I select about 20 files or less, right-click and "send to" my batch file script. However, nothing happens at all when I select more than that. I suspect there's a character or number of arguments limit somewhere.
Hopefully I'm missing something simple and that the solution or a workaround isn't too painful.
References:
batch file (crc32_inline.bat):
crc32_inline.pl %*
Perl notes:
I'm using (strawberry) perl v5.10.0
I have C:\strawberry\perl\bin in my path, which is where crc32.bat exists.
perl script (crc32_inline.pl):
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd;
use English qw( -no_match_vars );
use File::Basename;
$OUTPUT_AUTOFLUSH = 1;
my $crc32_cmd = 'crc32.bat';
my $failure_report_basename = 'crc32_failures.txt';
my %failures = ();
print "\n";
foreach my $arg (@ARGV) {
# if the file has a crc, check to see if it matches the calculated
# crc.
if (-f $arg and $arg =~ /\[([0-9a-f]{8})\]/i) {
my $crc = uc $1;
my $basename = basename($arg);
print "checking ${basename}... ";
my $calculated_crc = uc `${crc32_cmd} "${arg}"`;
chomp($calculated_crc);
if ($crc eq $calculated_crc) {
print "passed.\n";
}
else {
print "FAILED (calculated ${calculated_crc})\n";
my $dirname = dirname($arg);
$failures{$dirname}{$basename} = $calculated_crc;
}
}
}
print "\nReport Summary:\n";
if (scalar keys %failures == 0) {
print " All files OK\n";
}
else {
print sprintf(" %d / %d files failed crc32 validation.\n" .
" See %s for details.\n",
scalar keys %failures,
scalar @ARGV,
$failure_report_basename);
my $failure_report_fullname = $failure_report_basename;
if (defined -f $ARGV[0]) {
$failure_report_fullname
= dirname($ARGV[0]) . '/' . $failure_report_basename;
}
$OUTPUT_AUTOFLUSH = 0;
open my $fh, '>' . $failure_report_fullname or die $!;
foreach my $dirname (sort keys %failures) {
print {$fh} $dirname . "\n";
foreach my $basename (sort keys %{$failures{$dirname}}) {
print {$fh} sprintf(" crc32(%s) basename(%s)\n",
$failures{$dirname}{$basename},
$basename);
}
}
close $fh;
$OUTPUT_AUTOFLUSH = 1;
}
print sprintf("\n%s done! (%d seconds elapsed)\n" .
"Press enter to exit.\n",
basename($0),
time() - $BASETIME);
<STDIN>;
I would recommend just putting a shortcut to your script in the "Send To" directory instead of going through a batch file (which is subject to cmd.exe's limits on command line length).

Resources