Perl: Weird Tie::File behaviour in Windows as opposed to Unix - windows

I have this perl script that uses Tie::File.
In Linux(Ubuntu) when I invoke the script via Bash it works as expected but in Windows when I invoke the script via Powershell it behaves weirdly (check P.S. below).
Code:
#!/usr/bin/perl -T
use strict;
use warnings;
use Tie::File;
use CommonStringTasks;
if ( #ARGV != 4 ) {
print "ERROR:Inadequate/Redundant arguments.\n";
print "Usage: perl <pl_executable> <path/to/peer_main.java> <peer_main.java>\n";
print " <score_file_index> <port_step_index>\n";
print $ARGV[0], "\n";
print $ARGV[1], "\n";
print $ARGV[2], "\n";
print $ARGV[3], "\n";
exit 1;
}
my $PEER_DIR = $ARGV[0];
my $PEER_FILE = $ARGV[1];
my $PEER_PACKAGE = "src/planetlab/app";
my $PEER_PATH = "${PEER_DIR}/${PEER_PACKAGE}/${PEER_FILE}";
# Check if args are tainted ...
# Check $PEER_PATH file permissions ...
open(my $file, "+<", "$PEER_PATH")
or
die("File ", $PEER_FILE, " could not be opened for editing:$!");
# Edit the file and change variables for debugging/deployment setup.
# Number demanglers:
# -flock -> arg2 -> 2 stands for FILE_EX
# Options (critical!):
# -Memory: Inhibit caching as this will allow record changes on the fly.
tie my #fileLines,
'Tie::File',
$file,
memory => 0
or
die("File ", $PEER_FILE, " could not be tied with Tie::File:$!");
flock $file, 2;
my $i = 0;
my $scoreLine = "int FILE_INDEX = " . $SCORE . ";";
my $portLine = "int SERVER_PORT = " . $PORT . ";";
my $originalScoreLine = "int FILE_INDEX =";
my $originalPortLine = "int SERVER_PORT =";
(tied #fileLines)->defer;
while (my $line = <$file>) {
if ( ($line =~ m/($scoreLine)/) && ($SCORE+1 > 0) ) {
print "Original line (score): ", "\n", $scoreLine, "\n";
chomp $line;
$line = substr($line, 0, -($scoreDigits+1));
$line = $line . (++$SCORE) . ";";
print "Editing line (score): ", $i, "\n", trimLeadSpaces($fileLines[$i]), "\n";
$fileLines[$i] = $line;
print "Line replaced with:\n", trimLeadSpaces($line), "\n";
next;
}
if ( ($line =~ m/($portLine)/) && ($PORT > 0) ) {
print "Original line (port): ", "\n", $portLine, "\n";
chomp $line;
$line = substr($line, 0, -($portDigits+1));
$line = $line . (++$PORT) . ";";
print "Editing line (port): ", $i, "\n", trimLeadSpaces($fileLines[$i]), "\n";
$fileLines[$i] = $line;
print "Line replaced with:\n", trimLeadSpaces($line), "\n";
last;
}
# Restore original settings.
if ( ($line =~ m/($originalScoreLine)/) && ($SCORE < 0) ) {
print "Restoring line (score) - FROM: ", "\n", $fileLines[$i], "\n";
$fileLines[$i] = " private static final int FILE_INDEX = 0;";
print "Restoring line (score) - TO: ", "\n", $fileLines[$i], "\n";
next;
}
if ( ($line =~ m/($originalPortLine)/) && ($PORT < 0) ) {
print "Restoring line (port) - FROM: ", "\n", $fileLines[$i], "\n";
$PORT = abs($PORT);
$fileLines[$i] = " private static final int SERVER_PORT = " . $PORT . ";";
print "Restoring line (port) - TO: ", "\n", $fileLines[$i], "\n";
last;
}
} continue {
$i++;
}
(tied #fileLines)->flush;
untie #fileLines;
close $file;
The perl version in both OSes is 5+(in Windows Active-State Perl with CPAN modules).
Could it be the way I open the filehandle? Any ideas anyone?
P.S.: The first version had a while (<$file>) and instead of $line I used the $_ variable but when I did that I had a behaviour where specific lines would not be edited but instead the file would get appended with a hundred newlines or so followed by the (correctly) edited line and so on. I also had a warning about $fileLines[$i] being uninitialized!Clearly something's wrong with the Tie::File structure in Windows or something else that I am not aware of. Same erratic behaviour takes place with the changes and in Linux(Ubuntu) behaviour again is as expected.

The OPs question is vague, and lacks input and expected output. Therefore I will simply note some of my concerns:
First, using Tie::File and <$file> and flock on the same handle seems to be both overkill and dangerous. I would recommend simply using Tie::File to iterate and to edit, such as:
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::File;
tie my #lines, 'Tie::File', 'filename';
foreach my $linenum ( 0..$#lines ) {
if ($lines[$linenum] =~ /something/) {
$lines[$linenum] = 'somethingelse';
}
}
Perhaps better than edit inline, as Tie::File allows, copy the file to a backup, iterate over the lines using <$file>, then write to a new file with the old name.
#!/usr/bin/env perl
use strict;
use warnings;
use File::Copy 'move';
my $infile = $ARGV[0];
move( $infile, "$infile.bak");
open my $inhandle, '<', "$infile.bak";
open my $outhandle, '>', $infile;
while( my $line = <$inhandle> ) {
if ($line =~ /something/) {
$line = 'somethingelse';
}
print $outhandle $line;
}
Second, the -MModule flag simply translates to a use Module; at the top of the script. Therefore -MCPAN is use CPAN;, however loading the CPAN module does nothing for the script. CPAN.pm gives a script the ability to install modules.
Third, we will be able to help better if you give and example input, an expected output, and a stripped down script that clearly shows how this operation is to perform while still failing in the same way that the actual script does.

I found out the source of my problems. The reason was the record separator!
Tie::File expected in Windows a /r/n record separator so it read the whole file in just one pass. My files are in UTF-8, with Unix line endings.
That is why when I was traversing the $fileLines and accessed any index beyond 0 I got from perl a warning that the string was not initialized. Fixed the problem and now I am ready to go on! :D
P.S.: Mr Joel Berger I am marking your answer as valid/appropriate because you really tried helping me and I followed your first advice about the file handle :).
Thank you everyone for assisting me xD xD xD

Related

Executing Perl script from windows-command line with 2 entry

this is my Perl script
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;
sub xml2array{
my $path = shift;
my $twig = XML::Twig->new->parsefile($path);
return map { $_ -> att('VirtualPath') } $twig -> get_xpath('//Signals');
}
sub compareMappingToArray {
my $mapping = shift;
my $signalsRef = shift;
my $i = 1;
print "In file : $mapping\n";
open(my $fh, $mapping);
while (my $r = <$fh>) {
chomp $r;
if ($r =~ /\'(ModelSpecific.*)\'/) {
my $s = $1;
my #matches = grep { /^$s$/ } #{$signalsRef};
print "line $i : not found - $s\n" if scalar #matches ==0;
print "line $i : multiple $s\n" if scalar #matches > 1;
}
$i = $i + 1 # keep line index
}
}
my $mapping = "C:/Users/HOR1DY/Desktop/Global/TA_Mapping/CAN/CAN_ESP_002_mapping.pm";
my #virtualpath = xml2array("SignalModel.xml");
compareMappingToArray($mapping, \#virtualpath);
The script works well, the aim of it is to compare the file "SignalModel.xml" and "CAN_ESP_002_mapping.pm" and putting the lines that didn't matches in a .TXT file. Here is how the .TXT file looks like:
In file : C:/Users/HOR1DY/Desktop/Global/TA_Mapping/CAN/CAN_ESP_002_mapping.pm
line 331 : not found - ModelSpecific.EID.NET.CAN_Engine.VCU.Transmit.VCU_202.R2B_VCU_202__byte_3
line 348 : not found - ModelSpecific.EID.NET.CAN_Engine.CMM_WX.Transmit.CMM_HYB_208.R2B_CMM_HYB_208__byte_2
line 368 : not found - ModelSpecific.EID.NET.CAN_Engine.VCU.Transmit.VCU_222.R2B_VCU_222__byte_0
But for this script, I put the two files that need to be compare inside of the code and instead of doing that, I would like to run the script in windows cmd line and having something like:
C:\Users>perl CANMappingChecker.pl -'file 1' 'file 2'
All the files are in .zip file so if I can execute the script that he goes inside and take the 2 files that I need for comparison, it should be perfect.
I really don't know how to do and what to put inside my script to make that in the cmd windows. Thanks for your help !
Program (or script) parameters are stored in the #ARGV array. shift and pop without any parameter will work on #ARGV when used outside of a sub, in a sub they operate on #_.
See Archive::Zip for zip file handling.

Kaldi librispeech data preparation error

I'm trying to do ASR system. Im using kaldi manual and librispeech corpus.
In data preparation step i get this error
utils/data/get_utt2dur.sh: segments file does not exist so getting durations
from wave files
utils/data/get_utt2dur.sh: could not get utterance lengths from sphere-file
headers, using wav-to-duration
utils/data/get_utt2dur.sh: line 99: wav-to-duration: command not found
And here the piece of code where this error occures
if cat $data/wav.scp | perl -e '
while (<>) { s/\|\s*$/ |/; # make sure final | is preceded by space.
#A = split;
if (!($#A == 5 && $A[1] =~ m/sph2pipe$/ &&
$A[2] eq "-f" && $A[3] eq "wav" && $A[5] eq "|")) { exit (1); }
$utt = $A[0]; $sphere_file = $A[4];
if (!open(F, "<$sphere_file")) { die "Error opening sphere file $sphere_file"; }
$sample_rate = -1; $sample_count = -1;
for ($n = 0; $n <= 30; $n++) {
$line = <F>;
if ($line =~ m/sample_rate -i (\d+)/) { $sample_rate = $1; }
if ($line =~ m/sample_count -i (\d+)/) { $sample_count = $1;
}
if ($line =~ m/end_head/) { break; }
}
close(F);
if ($sample_rate == -1 || $sample_count == -1) {
die "could not parse sphere header from $sphere_file";
}
$duration = $sample_count * 1.0 / $sample_rate;
print "$utt $duration\n";
} ' > $data/utt2dur; then
echo "$0: successfully obtained utterance lengths from sphere-file headers"
else
echo "$0: could not get utterance lengths from sphere-file headers,
using wav-to-duration"
if command -v wav-to-duration >/dev/null; then
echo "$0: wav-to-duration is not on your path"
exit 1;
fi
In file wav.scp i got such lines:
6295-64301-0002 flac -c -d -s /home/tinin/kaldi/egs/librispeech/s5/LibriSpeech/dev-clean/6295/64301/6295-64301-0002.flac |
In this dataset i have only flac files(they downloaded via provided script) and i dont understand why we search wav-files? And how run data preparation correctly(i didnt change source code in this manual.
Also, if you explain to me what is happening in this code, then I will be very grateful to you, because i'm not familiar with bash and perl.
Thank you a lot!
The problem I see from this line
utils/data/get_utt2dur.sh: line 99: wav-to-duration: command not found
is that you have not added the kaldi tools in your path.
Check the file path.sh and see if the directories that it adds to your path are correct (because it has ../../.. inside and it might not match your current folder setup)
As for the perl script, it counts the samples of the sound file and then it divides with the sample rate in order to get the duration. Don't worry about the 'wav' word, your files might be on another format, it's just the name of the kaldi functions.

How to find out if a command exists in a POSIX compliant manner?

See the discussion at Is `command -v` option required in a POSIX shell? Is posh compliant with POSIX?. It describes that type as well as command -v option is optional in POSIX.1-2004.
The answer marked correct at Check if a program exists from a Bash script doesn't help either. Just like type, hash is also marked as XSI in POSIX.1-2004. See http://pubs.opengroup.org/onlinepubs/009695399/utilities/hash.html.
Then what would be a POSIX compliant way to write a shell script to find if a command exists on the system or not?
How do you want to go about it? You can look for the command on directories in the current value of $PATH; you could look in the directories specified by default for the system PATH (getconf PATH as long as getconf
exists on PATH).
Which implementation language are you going to use? (For example: I have a Perl implementation that does a decent job finding executables on $PATH — but Perl is not part of POSIX; is it remotely relevant to you?)
Why not simply try running it? If you're going to deal with Busybox-based systems, lots of the executables can't be found by searching — they're built into the shell. The major caveat is if a command does something dangerous when run with no arguments — but very few POSIX commands, if any, do that. You might also need to determine what command exit statuses indicate that the command is not found versus the command objecting to not being called with appropriate arguments. And there's little guarantee that all systems will be consistent on that. It's a fraught process, in case you hadn't gathered.
Perl implementation pathfile
#!/usr/bin/env perl
#
# #(#)$Id: pathfile.pl,v 3.4 2015/10/16 19:39:23 jleffler Exp $
#
# Which command is executed
# Loosely based on 'which' from Kernighan & Pike "The UNIX Programming Environment"
#use v5.10.0; # Uses // defined-or operator; not in Perl 5.8.x
use strict;
use warnings;
use Getopt::Std;
use Cwd 'realpath';
use File::Basename;
my $arg0 = basename($0, '.pl');
my $usestr = "Usage: $arg0 [-AafhqrsVwx] [-p path] command ...\n";
my $hlpstr = <<EOS;
-A Absolute pathname (determined by realpath)
-a Print all possible matches
-f Print names of files (as opposed to symlinks, directories, etc)
-h Print this help message and exit
-q Quiet mode (don't print messages about files not found)
-r Print names of files that are readable
-s Print names of files that are not empty
-V Print version information and exit
-w Print names of files that are writable
-x Print names of files that are executable
-p path Use PATH
EOS
sub usage
{
print STDERR $usestr;
exit 1;
}
sub help
{
print $usestr;
print $hlpstr;
exit 0;
}
sub version
{
my $version = 'PATHFILE Version $Revision: 3.4 $ ($Date: 2015/10/16 19:39:23 $)';
# Beware of RCS hacking at RCS keywords!
# Convert date field to ISO 8601 (ISO 9075) notation
$version =~ s%\$(Date:) (\d\d\d\d)/(\d\d)/(\d\d) (\d\d:\d\d:\d\d) \$%\$$1 $2-$3-$4 $5 \$%go;
# Remove keywords
$version =~ s/\$([A-Z][a-z]+|RCSfile): ([^\$]+) \$/$2/go;
print "$version\n";
exit 0;
}
my %opts;
usage unless getopts('AafhqrsVwxp:', \%opts);
version if ($opts{V});
help if ($opts{h});
usage unless scalar(#ARGV);
# Establish test and generate test subroutine.
my $chk = 0;
my $test = "-x";
my $optlist = "";
foreach my $opt ('f', 'r', 's', 'w', 'x')
{
if ($opts{$opt})
{
$chk++;
$test = "-$opt";
$optlist .= " -$opt";
}
}
if ($chk > 1)
{
$optlist =~ s/^ //;
$optlist =~ s/ /, /g;
print STDERR "$arg0: mutually exclusive arguments ($optlist) given\n";
usage;
}
my $chk_ref = eval "sub { my(\$cmd) = \#_; return -f \$cmd && $test \$cmd; }";
my #PATHDIRS;
my %pathdirs;
my $path = defined($opts{p}) ? $opts{p} : $ENV{PATH};
#foreach my $element (split /:/, $opts{p} // $ENV{PATH})
foreach my $element (split /:/, $path)
{
$element = "." if $element eq "";
push #PATHDIRS, $element if $pathdirs{$element}++ == 0;
}
my $estat = 0;
CMD:
foreach my $cmd (#ARGV)
{
if ($cmd =~ m%/%)
{
if (&$chk_ref($cmd))
{
print "$cmd\n" unless $opts{q};
next CMD;
}
print STDERR "$arg0: $cmd: not found\n" unless $opts{q};
$estat = 1;
}
else
{
my $found = 0;
foreach my $directory (#PATHDIRS)
{
my $file = "$directory/$cmd";
if (&$chk_ref($file))
{
$file = realpath($file) if $opts{A};
print "$file\n" unless $opts{q};
next CMD unless defined($opts{a});
$found = 1;
}
}
print STDERR "$arg0: $cmd: not found\n" unless $found || $opts{q};
$estat = 1;
}
}
exit $estat;

Need to split multiple files in a directory based on string, rename properly using powershell or fix my perl script

I have a directory full of files (text exports of Dynamics NAV objects that have been exported) in Windows. Each file contains multiple objects. I need to split each file into separate files based on lines that begin with OBJECT, and name each file appropriately.
The purpose of this is to get our Dynamics NAV system into git.
I wrote a nifty perl program to do this that works great on linux. But it hangs on the while(<>) loop in Windows (Server 2012 if that matters).
So, I need to either figure out how to do this in the PowerShell script that I wrote that generates all of the files, or fix my perl script that I'm calling from PowerShell. Does Windows perl handle filehandles differently than linux?
Here's my code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use POSIX qw(strftime);
my $username = getlogin || getpwuid($<);
my $datestamp = strftime("%Y%m%d-%H%M%S", localtime);
my $work_dir = "/temp/nav_export";
my $objects_dir = "$work_dir/$username/objects";
my $export_dir = "$work_dir/$username/$datestamp";
print "Objects being exported to $export_dir\n";
make_path("$export_dir/Page", "$export_dir/Codeunit", "$export_dir/MenuSuite", "$export_dir/Query", "$export_dir/Report", "$export_dir/Table", "$export_dir/XMLport");
chdir $objects_dir or die "Could not change to $objects_dir: $!";
# delete empty files
foreach(glob('*.*')) {
unlink if -f and !-s _;
}
my #files = <*>;
my $count = #files;
print "Processing $count files\n";
open (my $fh, ">-") or die "Could not open standard out: $!";
# OBJECT Codeunit 1 ApplicationManagement
while(<>)
{
if (m/^OBJECT ([A-Za-z]+) ([0-9]+) (.*)/o)
{
my $objectType = $1;
my $objectID = $2;
my $objectName = my $firstLine = $3;
$objectName =~ s/[\. \/\(\)\\]/_/g; # translate spaces, (, ), ., \ and / to underscores
$objectName =~ tr/\cM//d; # get rid of Ctrl-M
my $filename = $export_dir . "/" . $objectType . "/" . $objectType . "~" . $objectID . "~" . $objectName;
close $fh and open($fh, '>', $filename) or die "Could not open file '$filename' $!";
print $fh "OBJECT $objectType $objectID $firstLine\n";
next;
}
print $fh $_;
}
I've learned quite a bit of PowerShell in the past few days. There are some things that it really does quite well. And some (such as calling an executable with variables and command line options that have spaces) that are maddeningly difficult to figure out. To call curl, this is what I resorted to:
$curl = "C:\Program Files (x86)\cURL\bin\curl"
$arg10 = '-s'
$arg1 = '-X'
$arg11 = 'post'
$arg2 = '-H'
$arg22 = '"Accept-Encoding: gzip,deflate"'
$arg3 = '-H'
$arg33 = '"Content-Type: text/xml;charset=UTF-8"'
$arg4 = '-H'
$arg44 = '"SOAPAction:urn:microsoft-dynamics-schemas/page/permissionrange:ReadMultiple"'
$arg5 = '--ntlm'
$arg6 = '-u'
$arg66 = 'username:password'
$arg7 = '-d'
$arg77 = '"#soap_envelope.txt"'
$arg8 = "http://$servicetier.corp.company.net:7047/$database/WS/DBDOC/Page/PermissionRange"
$arg9 = "-o"
$arg99 = "c:\temp\nav_export\$env:username\raw_list.xml"
&"$curl" $arg10 $arg1 $arg11 $arg2 $arg22 $arg3 $arg33 $arg4 $arg44 $arg5 $arg6 $arg66 $arg7 $arg77 $arg8 $arg9 $arg99
I realize that part is a bit of a tangent. But I've been working really hard at trying to figure this out and not have to bother you nice folk here at stackoverflow!
I'm ambivalent about making it work in PowerShell or fixing the Perl code at this point. I just need to make it work. But I'm hoping it's just some little difference in filehandle handling between linux and Windows.
It's hard to believe that the Perl code that you show does anything on Linux either. It looks like your while loop is supposed to be reading through all of the files in the #files array, but to make it do that you have to copy the names to #ARGV.
Also note that #files will contain directories as well as files.
I suggest you change the lines starting with my #files = <*> to this. There's no reason why it shouldn't work on both Windows and Linux.
our #ARGV = grep -f, glob '*';
my $count = #ARGV;
print "Processing $count files\n";
my $fh;
while (<>) {
s/\s+\z//; # Remove trailing whitespace (including CR and LF)
my #fields = split ' ', $_, 4;
if ( #fields == 4 and $fields[0] eq 'OBJECT' ) {
my ($object_type, $object_id, $object_name) = #fields[1,2,3];
$object_name =~ tr{ ().\\/}{_}; # translate spaces, (, ), ., \ and / to underscores
my $filename = "$export_dir/$object_type/$object_type~$object_id~$object_name";
open $fh, '>', $filename or die "Could not open file '$filename': $!";
}
print $fh "$_\n" if $fh;
if (eof) {
close $fh;
$fh = undef;
}
}

Why can't I use more than 20 files with my Perl script and Windows's SendTo?

I'm trying to emulate RapidCRC's ability to check crc32 values within filenames on Windows Vista Ultimate 64-bit. However, I seem to be running into some kind of argument limitation.
I wrote a quick Perl script, created a batch file to call it, then placed a shortcut to the batch file in %APPDATA%\Microsoft\Windows\SendTo
This works great when I select about 20 files or less, right-click and "send to" my batch file script. However, nothing happens at all when I select more than that. I suspect there's a character or number of arguments limit somewhere.
Hopefully I'm missing something simple and that the solution or a workaround isn't too painful.
References:
batch file (crc32_inline.bat):
crc32_inline.pl %*
Perl notes:
I'm using (strawberry) perl v5.10.0
I have C:\strawberry\perl\bin in my path, which is where crc32.bat exists.
perl script (crc32_inline.pl):
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd;
use English qw( -no_match_vars );
use File::Basename;
$OUTPUT_AUTOFLUSH = 1;
my $crc32_cmd = 'crc32.bat';
my $failure_report_basename = 'crc32_failures.txt';
my %failures = ();
print "\n";
foreach my $arg (#ARGV) {
# if the file has a crc, check to see if it matches the calculated
# crc.
if (-f $arg and $arg =~ /\[([0-9a-f]{8})\]/i) {
my $crc = uc $1;
my $basename = basename($arg);
print "checking ${basename}... ";
my $calculated_crc = uc `${crc32_cmd} "${arg}"`;
chomp($calculated_crc);
if ($crc eq $calculated_crc) {
print "passed.\n";
}
else {
print "FAILED (calculated ${calculated_crc})\n";
my $dirname = dirname($arg);
$failures{$dirname}{$basename} = $calculated_crc;
}
}
}
print "\nReport Summary:\n";
if (scalar keys %failures == 0) {
print " All files OK\n";
}
else {
print sprintf(" %d / %d files failed crc32 validation.\n" .
" See %s for details.\n",
scalar keys %failures,
scalar #ARGV,
$failure_report_basename);
my $failure_report_fullname = $failure_report_basename;
if (defined -f $ARGV[0]) {
$failure_report_fullname
= dirname($ARGV[0]) . '/' . $failure_report_basename;
}
$OUTPUT_AUTOFLUSH = 0;
open my $fh, '>' . $failure_report_fullname or die $!;
foreach my $dirname (sort keys %failures) {
print {$fh} $dirname . "\n";
foreach my $basename (sort keys %{$failures{$dirname}}) {
print {$fh} sprintf(" crc32(%s) basename(%s)\n",
$failures{$dirname}{$basename},
$basename);
}
}
close $fh;
$OUTPUT_AUTOFLUSH = 1;
}
print sprintf("\n%s done! (%d seconds elapsed)\n" .
"Press enter to exit.\n",
basename($0),
time() - $BASETIME);
<STDIN>;
I will recommend just putting a shortcut to your script in the "Send To" directory instead of doing it via a batch file (which is subject to cmd.exes limits on command line length).

Resources