Need to split multiple files in a directory based on string, rename properly using powershell or fix my perl script - windows

I have a directory full of files (text exports of Dynamics NAV objects that have been exported) in Windows. Each file contains multiple objects. I need to split each file into separate files based on lines that begin with OBJECT, and name each file appropriately.
The purpose of this is to get our Dynamics NAV system into git.
I wrote a nifty perl program to do this that works great on linux. But it hangs on the while(<>) loop in Windows (Server 2012 if that matters).
So, I need to either figure out how to do this in the PowerShell script that I wrote that generates all of the files, or fix my perl script that I'm calling from PowerShell. Does Windows perl handle filehandles differently than linux?
Here's my code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use POSIX qw(strftime);
my $username = getlogin || getpwuid($<);
my $datestamp = strftime("%Y%m%d-%H%M%S", localtime);
my $work_dir = "/temp/nav_export";
my $objects_dir = "$work_dir/$username/objects";
my $export_dir = "$work_dir/$username/$datestamp";
print "Objects being exported to $export_dir\n";
make_path("$export_dir/Page", "$export_dir/Codeunit", "$export_dir/MenuSuite", "$export_dir/Query", "$export_dir/Report", "$export_dir/Table", "$export_dir/XMLport");
chdir $objects_dir or die "Could not change to $objects_dir: $!";
# delete empty files
foreach(glob('*.*')) {
unlink if -f and !-s _;
}
my #files = <*>;
my $count = #files;
print "Processing $count files\n";
open (my $fh, ">-") or die "Could not open standard out: $!";
# OBJECT Codeunit 1 ApplicationManagement
while(<>)
{
if (m/^OBJECT ([A-Za-z]+) ([0-9]+) (.*)/o)
{
my $objectType = $1;
my $objectID = $2;
my $objectName = my $firstLine = $3;
$objectName =~ s/[\. \/\(\)\\]/_/g; # translate spaces, (, ), ., \ and / to underscores
$objectName =~ tr/\cM//d; # get rid of Ctrl-M
my $filename = $export_dir . "/" . $objectType . "/" . $objectType . "~" . $objectID . "~" . $objectName;
close $fh and open($fh, '>', $filename) or die "Could not open file '$filename' $!";
print $fh "OBJECT $objectType $objectID $firstLine\n";
next;
}
print $fh $_;
}
I've learned quite a bit of PowerShell in the past few days. There are some things that it really does quite well. And some (such as calling an executable with variables and command line options that have spaces) that are maddeningly difficult to figure out. To call curl, this is what I resorted to:
$curl = "C:\Program Files (x86)\cURL\bin\curl"
$arg10 = '-s'
$arg1 = '-X'
$arg11 = 'post'
$arg2 = '-H'
$arg22 = '"Accept-Encoding: gzip,deflate"'
$arg3 = '-H'
$arg33 = '"Content-Type: text/xml;charset=UTF-8"'
$arg4 = '-H'
$arg44 = '"SOAPAction:urn:microsoft-dynamics-schemas/page/permissionrange:ReadMultiple"'
$arg5 = '--ntlm'
$arg6 = '-u'
$arg66 = 'username:password'
$arg7 = '-d'
$arg77 = '"#soap_envelope.txt"'
$arg8 = "http://$servicetier.corp.company.net:7047/$database/WS/DBDOC/Page/PermissionRange"
$arg9 = "-o"
$arg99 = "c:\temp\nav_export\$env:username\raw_list.xml"
&"$curl" $arg10 $arg1 $arg11 $arg2 $arg22 $arg3 $arg33 $arg4 $arg44 $arg5 $arg6 $arg66 $arg7 $arg77 $arg8 $arg9 $arg99
I realize that part is a bit of a tangent. But I've been working really hard at trying to figure this out and not have to bother you nice folk here at stackoverflow!
I'm ambivalent about making it work in PowerShell or fixing the Perl code at this point. I just need to make it work. But I'm hoping it's just some little difference in filehandle handling between linux and Windows.

It's hard to believe that the Perl code that you show does anything on Linux either. It looks like your while loop is supposed to be reading through all of the files in the #files array, but to make it do that you have to copy the names to #ARGV.
Also note that #files will contain directories as well as files.
I suggest you change the lines starting with my #files = <*> to this. There's no reason why it shouldn't work on both Windows and Linux.
our #ARGV = grep -f, glob '*';
my $count = #ARGV;
print "Processing $count files\n";
my $fh;
while (<>) {
s/\s+\z//; # Remove trailing whitespace (including CR and LF)
my #fields = split ' ', $_, 4;
if ( #fields == 4 and $fields[0] eq 'OBJECT' ) {
my ($object_type, $object_id, $object_name) = #fields[1,2,3];
$object_name =~ tr{ ().\\/}{_}; # translate spaces, (, ), ., \ and / to underscores
my $filename = "$export_dir/$object_type/$object_type~$object_id~$object_name";
open $fh, '>', $filename or die "Could not open file '$filename': $!";
}
print $fh "$_\n" if $fh;
if (eof) {
close $fh;
$fh = undef;
}
}

Related

Add automatic header comments for all source files

I have a standardized way of writing comments before and after a function.
For example before declaring any function I write,
!---------------------------
! NAME_OF_FUNC (no)
!---------------------------
where no is the nth function in a given file containing multiple functions.
I know that a function e.g begins with (Fortran convention) either subroutine NAME_OF_SUB or function NAME_OF_FUNC. Hence, my end result would be
!---------------------------
! NAME_OF_FUNC (no)
!---------------------------
function NAME_OF_FUNC(...)
end function
!---------------------------
Can somebody show an example of how to write a bash script or in any other scripting language a code that can go through all my source files and the standard convention I just showed an example of?
Here is an example in Perl. It does not take backup before overwriting (I would recommend that you try to improve this script and add backup functionality). It also does not add the end of subroutine marker. But it would easy to add that functionality, please try. It also assumes that you want to modify all *.f95 files in the current directory and all its sub directories:
use feature qw(say state);
use strict;
use warnings;
use File::Find::Rule;
my #files = File::Find::Rule->new->name('*.f95')->in('.');
for my $fn (#files) {
open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
my $txt = do {local $/; <$fh>};
close $fh;
process_txt( \$txt );
save_txt( $fn, \$txt );
}
sub save_txt {
my ( $fn, $txt ) = #_;
open ( my $fh, '>', $fn ) or die "Could not open file '$fn': $!";
print $fh $$txt;
close $fh;
}
sub process_txt {
my ( $txt ) = #_;
my $i = 1;
$$txt =~ s/^(.*(?i:function|subroutine)\s+)([a-zA-Z_]+)(\s*\(.*$)/
do_substitution($1, $2, $3, $i++)/egmx;
}
sub do_substitution {
my ( $head, $name, $end, $i ) = #_;
my $line = $head . $name . $end;
$line =~ s/\s+$//;
my $N = length $line;
my $cline = '!' . '-' x $N;
my $mline = '! ' . $name;
my $snum = "($i)";
my $M = (length $mline) + (length $snum);
my $mspc = ($N > $M) ? (' ' x ($N-$M)) : ' ';
$mline = $mline . $mspc . $snum;
my $new_txt = join "\n", $cline, $mline, $cline, $line;
return $new_txt;
}

Executing Perl script from windows-command line with 2 entry

this is my Perl script
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;
sub xml2array{
my $path = shift;
my $twig = XML::Twig->new->parsefile($path);
return map { $_ -> att('VirtualPath') } $twig -> get_xpath('//Signals');
}
sub compareMappingToArray {
my $mapping = shift;
my $signalsRef = shift;
my $i = 1;
print "In file : $mapping\n";
open(my $fh, $mapping);
while (my $r = <$fh>) {
chomp $r;
if ($r =~ /\'(ModelSpecific.*)\'/) {
my $s = $1;
my #matches = grep { /^$s$/ } #{$signalsRef};
print "line $i : not found - $s\n" if scalar #matches ==0;
print "line $i : multiple $s\n" if scalar #matches > 1;
}
$i = $i + 1 # keep line index
}
}
my $mapping = "C:/Users/HOR1DY/Desktop/Global/TA_Mapping/CAN/CAN_ESP_002_mapping.pm";
my #virtualpath = xml2array("SignalModel.xml");
compareMappingToArray($mapping, \#virtualpath);
The script works well, the aim of it is to compare the file "SignalModel.xml" and "CAN_ESP_002_mapping.pm" and putting the lines that didn't matches in a .TXT file. Here is how the .TXT file looks like:
In file : C:/Users/HOR1DY/Desktop/Global/TA_Mapping/CAN/CAN_ESP_002_mapping.pm
line 331 : not found - ModelSpecific.EID.NET.CAN_Engine.VCU.Transmit.VCU_202.R2B_VCU_202__byte_3
line 348 : not found - ModelSpecific.EID.NET.CAN_Engine.CMM_WX.Transmit.CMM_HYB_208.R2B_CMM_HYB_208__byte_2
line 368 : not found - ModelSpecific.EID.NET.CAN_Engine.VCU.Transmit.VCU_222.R2B_VCU_222__byte_0
But for this script, I put the two files that need to be compare inside of the code and instead of doing that, I would like to run the script in windows cmd line and having something like:
C:\Users>perl CANMappingChecker.pl -'file 1' 'file 2'
All the files are in .zip file so if I can execute the script that he goes inside and take the 2 files that I need for comparison, it should be perfect.
I really don't know how to do and what to put inside my script to make that in the cmd windows. Thanks for your help !
Program (or script) parameters are stored in the #ARGV array. shift and pop without any parameter will work on #ARGV when used outside of a sub, in a sub they operate on #_.
See Archive::Zip for zip file handling.

Keep shared entries among many files

I have hundreds of files, each with different number of entries (>xxxx) and want to keep only shared entries among all files, separately. I'm not sure what is the best method to do this, maybe perl! I used sort, uniq of bash, but I didn't get the correct answer. The format of IDs start with > and follows 4 characters among all files.
1.fa
>abcd
CTGAATGCC
2.fa
>abcd
AAATGCGCG
>efgh
CGTAC
3.fa
>abcd
ATGCAATA
>efgh
TAACGTAA
>ijkl
TGCAA
Final results, of this example would be:
1.fa
>abcd
CTGAATGCC
2.fa
>abcd
AAATGCGCG
3.fa
>abcd
ATGCAATA
This Perl program will do as you ask. It uses Perl's built-in edit in place functionality and renames the original files to 1.fa.bak etc. It shouldn't have a problem with blank lines in your data as long as the sequence is always on one line immediately following the ID
use strict;
use warnings 'all';
my #files = glob '*.fa';
printf "Processing %d file%s\n", scalar #files, #files == 1 ? "" : "s";
exit if #files < 2;
my %ids;
{
local #ARGV = #files;
while ( <> ) {
++$ids{$1} if /^>(\S+)/;
}
}
# remove keys that aren't in all files
delete #ids{ grep { $ids{$_} < #files } keys %ids };
my $n = keys %ids;
printf "%d ID%s common to all files\n", $n, $n == 1 ? '' : "s";
exit unless $n;
{
local #ARGV = #files;
local $^I = '.bak';
while ( <> ) {
next unless /^>(\S+)/ and $ids{$1};
print;
print scalar <>;
}
}
Here is Perl solution, that may help you:
use feature qw(say);
use strict;
use warnings;
my $file_dir = 'files';
chdir $file_dir;
my #files = <*.fa>;
my $num_files = scalar #files;
my %ids;
for my $file (#files) {
open ( my $fh, '<', $file) or die "Could not open file '$file': $!";
while (my $id = <$fh>) {
chomp $id;
chomp (my $sequence = <$fh>);
$ids{$id}++;
}
close $fh;
}
for my $file (#files) {
open ( my $fh, '<', $file) or die "Could not open file '$file': $!";
my $new_name = $file . '.new';
open ( my $fh_write, '>', $new_name ) or die "Could not open file '$new_name': $!";
while (my $id = <$fh>) {
chomp $id;
chomp (my $sequence = <$fh>);
if ( $ids{$id} == $num_files ) {
say $fh_write $id;
say $fh_write $sequence;
}
}
close $fh_write;
close $fh;
}
It assumes that all the .fa files are located in the directory named $file_dir, and it writes the new sequences to new files in the same directory. The new file names get the .new extension.

Perl: Weird Tie::File behaviour in Windows as opposed to Unix

I have this perl script that uses Tie::File.
In Linux(Ubuntu) when I invoke the script via Bash it works as expected but in Windows when I invoke the script via Powershell it behaves weirdly (check P.S. below).
Code:
#!/usr/bin/perl -T
use strict;
use warnings;
use Tie::File;
use CommonStringTasks;
if ( #ARGV != 4 ) {
print "ERROR:Inadequate/Redundant arguments.\n";
print "Usage: perl <pl_executable> <path/to/peer_main.java> <peer_main.java>\n";
print " <score_file_index> <port_step_index>\n";
print $ARGV[0], "\n";
print $ARGV[1], "\n";
print $ARGV[2], "\n";
print $ARGV[3], "\n";
exit 1;
}
my $PEER_DIR = $ARGV[0];
my $PEER_FILE = $ARGV[1];
my $PEER_PACKAGE = "src/planetlab/app";
my $PEER_PATH = "${PEER_DIR}/${PEER_PACKAGE}/${PEER_FILE}";
# Check if args are tainted ...
# Check $PEER_PATH file permissions ...
open(my $file, "+<", "$PEER_PATH")
or
die("File ", $PEER_FILE, " could not be opened for editing:$!");
# Edit the file and change variables for debugging/deployment setup.
# Number demanglers:
# -flock -> arg2 -> 2 stands for FILE_EX
# Options (critical!):
# -Memory: Inhibit caching as this will allow record changes on the fly.
tie my #fileLines,
'Tie::File',
$file,
memory => 0
or
die("File ", $PEER_FILE, " could not be tied with Tie::File:$!");
flock $file, 2;
my $i = 0;
my $scoreLine = "int FILE_INDEX = " . $SCORE . ";";
my $portLine = "int SERVER_PORT = " . $PORT . ";";
my $originalScoreLine = "int FILE_INDEX =";
my $originalPortLine = "int SERVER_PORT =";
(tied #fileLines)->defer;
while (my $line = <$file>) {
if ( ($line =~ m/($scoreLine)/) && ($SCORE+1 > 0) ) {
print "Original line (score): ", "\n", $scoreLine, "\n";
chomp $line;
$line = substr($line, 0, -($scoreDigits+1));
$line = $line . (++$SCORE) . ";";
print "Editing line (score): ", $i, "\n", trimLeadSpaces($fileLines[$i]), "\n";
$fileLines[$i] = $line;
print "Line replaced with:\n", trimLeadSpaces($line), "\n";
next;
}
if ( ($line =~ m/($portLine)/) && ($PORT > 0) ) {
print "Original line (port): ", "\n", $portLine, "\n";
chomp $line;
$line = substr($line, 0, -($portDigits+1));
$line = $line . (++$PORT) . ";";
print "Editing line (port): ", $i, "\n", trimLeadSpaces($fileLines[$i]), "\n";
$fileLines[$i] = $line;
print "Line replaced with:\n", trimLeadSpaces($line), "\n";
last;
}
# Restore original settings.
if ( ($line =~ m/($originalScoreLine)/) && ($SCORE < 0) ) {
print "Restoring line (score) - FROM: ", "\n", $fileLines[$i], "\n";
$fileLines[$i] = " private static final int FILE_INDEX = 0;";
print "Restoring line (score) - TO: ", "\n", $fileLines[$i], "\n";
next;
}
if ( ($line =~ m/($originalPortLine)/) && ($PORT < 0) ) {
print "Restoring line (port) - FROM: ", "\n", $fileLines[$i], "\n";
$PORT = abs($PORT);
$fileLines[$i] = " private static final int SERVER_PORT = " . $PORT . ";";
print "Restoring line (port) - TO: ", "\n", $fileLines[$i], "\n";
last;
}
} continue {
$i++;
}
(tied #fileLines)->flush;
untie #fileLines;
close $file;
The perl version in both OSes is 5+(in Windows Active-State Perl with CPAN modules).
Could it be the way I open the filehandle? Any ideas anyone?
P.S.: The first version had a while (<$file>) and instead of $line I used the $_ variable but when I did that I had a behaviour where specific lines would not be edited but instead the file would get appended with a hundred newlines or so followed by the (correctly) edited line and so on. I also had a warning about $fileLines[$i] being uninitialized!Clearly something's wrong with the Tie::File structure in Windows or something else that I am not aware of. Same erratic behaviour takes place with the changes and in Linux(Ubuntu) behaviour again is as expected.
The OPs question is vague, and lacks input and expected output. Therefore I will simply note some of my concerns:
First, using Tie::File and <$file> and flock on the same handle seems to be both overkill and dangerous. I would recommend simply using Tie::File to iterate and to edit, such as:
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::File;
tie my #lines, 'Tie::File', 'filename';
foreach my $linenum ( 0..$#lines ) {
if ($lines[$linenum] =~ /something/) {
$lines[$linenum] = 'somethingelse';
}
}
Perhaps better than edit inline, as Tie::File allows, copy the file to a backup, iterate over the lines using <$file>, then write to a new file with the old name.
#!/usr/bin/env perl
use strict;
use warnings;
use File::Copy 'move';
my $infile = $ARGV[0];
move( $infile, "$infile.bak");
open my $inhandle, '<', "$infile.bak";
open my $outhandle, '>', $infile;
while( my $line = <$inhandle> ) {
if ($line =~ /something/) {
$line = 'somethingelse';
}
print $outhandle $line;
}
Second, the -MModule flag simply translates to a use Module; at the top of the script. Therefore -MCPAN is use CPAN;, however loading the CPAN module does nothing for the script. CPAN.pm gives a script the ability to install modules.
Third, we will be able to help better if you give and example input, an expected output, and a stripped down script that clearly shows how this operation is to perform while still failing in the same way that the actual script does.
I found out the source of my problems. The reason was the record separator!
Tie::File expected in Windows a /r/n record separator so it read the whole file in just one pass. My files are in UTF-8, with Unix line endings.
That is why when I was traversing the $fileLines and accessed any index beyond 0 I got from perl a warning that the string was not initialized. Fixed the problem and now I am ready to go on! :D
P.S.: Mr Joel Berger I am marking your answer as valid/appropriate because you really tried helping me and I followed your first advice about the file handle :).
Thank you everyone for assisting me xD xD xD

Why can't I use more than 20 files with my Perl script and Windows's SendTo?

I'm trying to emulate RapidCRC's ability to check crc32 values within filenames on Windows Vista Ultimate 64-bit. However, I seem to be running into some kind of argument limitation.
I wrote a quick Perl script, created a batch file to call it, then placed a shortcut to the batch file in %APPDATA%\Microsoft\Windows\SendTo
This works great when I select about 20 files or less, right-click and "send to" my batch file script. However, nothing happens at all when I select more than that. I suspect there's a character or number of arguments limit somewhere.
Hopefully I'm missing something simple and that the solution or a workaround isn't too painful.
References:
batch file (crc32_inline.bat):
crc32_inline.pl %*
Perl notes:
I'm using (strawberry) perl v5.10.0
I have C:\strawberry\perl\bin in my path, which is where crc32.bat exists.
perl script (crc32_inline.pl):
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd;
use English qw( -no_match_vars );
use File::Basename;
$OUTPUT_AUTOFLUSH = 1;
my $crc32_cmd = 'crc32.bat';
my $failure_report_basename = 'crc32_failures.txt';
my %failures = ();
print "\n";
foreach my $arg (#ARGV) {
# if the file has a crc, check to see if it matches the calculated
# crc.
if (-f $arg and $arg =~ /\[([0-9a-f]{8})\]/i) {
my $crc = uc $1;
my $basename = basename($arg);
print "checking ${basename}... ";
my $calculated_crc = uc `${crc32_cmd} "${arg}"`;
chomp($calculated_crc);
if ($crc eq $calculated_crc) {
print "passed.\n";
}
else {
print "FAILED (calculated ${calculated_crc})\n";
my $dirname = dirname($arg);
$failures{$dirname}{$basename} = $calculated_crc;
}
}
}
print "\nReport Summary:\n";
if (scalar keys %failures == 0) {
print " All files OK\n";
}
else {
print sprintf(" %d / %d files failed crc32 validation.\n" .
" See %s for details.\n",
scalar keys %failures,
scalar #ARGV,
$failure_report_basename);
my $failure_report_fullname = $failure_report_basename;
if (defined -f $ARGV[0]) {
$failure_report_fullname
= dirname($ARGV[0]) . '/' . $failure_report_basename;
}
$OUTPUT_AUTOFLUSH = 0;
open my $fh, '>' . $failure_report_fullname or die $!;
foreach my $dirname (sort keys %failures) {
print {$fh} $dirname . "\n";
foreach my $basename (sort keys %{$failures{$dirname}}) {
print {$fh} sprintf(" crc32(%s) basename(%s)\n",
$failures{$dirname}{$basename},
$basename);
}
}
close $fh;
$OUTPUT_AUTOFLUSH = 1;
}
print sprintf("\n%s done! (%d seconds elapsed)\n" .
"Press enter to exit.\n",
basename($0),
time() - $BASETIME);
<STDIN>;
I will recommend just putting a shortcut to your script in the "Send To" directory instead of doing it via a batch file (which is subject to cmd.exes limits on command line length).

Resources