Executing a Perl script from the Windows command line with 2 arguments - windows

This is my Perl script:
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;
sub xml2array{
my $path = shift;
my $twig = XML::Twig->new->parsefile($path);
return map { $_ -> att('VirtualPath') } $twig -> get_xpath('//Signals');
}
sub compareMappingToArray {
my $mapping = shift;
my $signalsRef = shift;
my $i = 1;
print "In file : $mapping\n";
open(my $fh, $mapping);
while (my $r = <$fh>) {
chomp $r;
if ($r =~ /\'(ModelSpecific.*)\'/) {
my $s = $1;
my @matches = grep { /^$s$/ } @{$signalsRef};
print "line $i : not found - $s\n" if scalar @matches == 0;
print "line $i : multiple $s\n" if scalar @matches > 1;
}
$i = $i + 1; # keep line index
}
}
my $mapping = "C:/Users/HOR1DY/Desktop/Global/TA_Mapping/CAN/CAN_ESP_002_mapping.pm";
my @virtualpath = xml2array("SignalModel.xml");
compareMappingToArray($mapping, \@virtualpath);
The script works well. Its aim is to compare the files "SignalModel.xml" and "CAN_ESP_002_mapping.pm" and write the lines that didn't match to a .TXT file. Here is what the .TXT file looks like:
In file : C:/Users/HOR1DY/Desktop/Global/TA_Mapping/CAN/CAN_ESP_002_mapping.pm
line 331 : not found - ModelSpecific.EID.NET.CAN_Engine.VCU.Transmit.VCU_202.R2B_VCU_202__byte_3
line 348 : not found - ModelSpecific.EID.NET.CAN_Engine.CMM_WX.Transmit.CMM_HYB_208.R2B_CMM_HYB_208__byte_2
line 368 : not found - ModelSpecific.EID.NET.CAN_Engine.VCU.Transmit.VCU_222.R2B_VCU_222__byte_0
But for this script I put the two files to be compared inside the code. Instead of doing that, I would like to run the script from the Windows cmd line with something like:
C:\Users>perl CANMappingChecker.pl -'file 1' 'file 2'
All the files are in a .zip file, so if the script could go inside the archive and take the 2 files I need for comparison, that would be perfect.
I really don't know how to do this or what to put inside my script to make it work from the Windows cmd line. Thanks for your help!

Program (or script) parameters are stored in the @ARGV array. shift and pop without any parameter will work on @ARGV when used outside of a sub; inside a sub they operate on @_.
See Archive::Zip for zip file handling.
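As a rough sketch (not a drop-in solution), the two member names could be passed on the command line and pulled out of the archive with Archive::Zip before reusing the xml2array and compareMappingToArray subs from the question; the argument layout below is only an assumption:
#!/usr/bin/perl
# Sketch only: perl CANMappingChecker.pl archive.zip SignalModel.xml CAN_ESP_002_mapping.pm
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);

die "Usage: $0 <zip> <xml member> <mapping member>\n" unless @ARGV == 3;
my ($zipfile, $xml_name, $mapping_name) = @ARGV;

my $zip = Archive::Zip->new();
$zip->read($zipfile) == AZ_OK or die "Cannot read '$zipfile'";

# Extract both members into the current directory, then reuse the existing subs
$zip->extractMember($xml_name)     == AZ_OK or die "Cannot extract '$xml_name'";
$zip->extractMember($mapping_name) == AZ_OK or die "Cannot extract '$mapping_name'";

my @virtualpath = xml2array($xml_name);
compareMappingToArray($mapping_name, \@virtualpath);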

Related

How to remove small contigs from fasta files?

## removesmalls.pl
#!/usr/bin/perl
use strict;
use warnings;
my $minlen = shift or die "Error: `minlen` parameter not provided\n";
{
local $/=">";
while(<>) {
chomp;
next unless /\w/;
s/>$//gs;
my @chunk = split /\n/;
my $header = shift @chunk;
my $seqlen = length join "", @chunk;
print ">$_" if($seqlen >= $minlen);
}
local $/="\n";
}
Executing the script as follows:
perl removesmalls.pl 1000 contigs.fasta > contigs-1000.fasta
The above script works for me, but there is a problem:
I have 109 different fasta files with different file names.
I can run the script on an individual file, but I want to run it once for all files, with a separate result file for each.
The file names are like SRR8224532.fasta, SRR8224533.fasta, SRR8224534.fasta, and so on.
I want the result files after removing the contigs (i.e., for me, those shorter than 1000) to be named something like SRR8224532-out.fasta,
SRR8224533-out.fasta, and so on.
Any help or suggestion would be helpful.
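One way to do that (a sketch only, assuming removesmalls.pl and the data files sit in the current directory) is a small wrapper that globs the .fasta files and redirects each result to a matching -out.fasta file:
#!/usr/bin/perl
# Wrapper sketch: run removesmalls.pl once per *.fasta file,
# writing SRR8224532-out.fasta, SRR8224533-out.fasta, and so on.
use strict;
use warnings;

my $minlen = 1000;
for my $fasta (glob '*.fasta') {
    (my $out = $fasta) =~ s/\.fasta$/-out.fasta/;
    system("perl removesmalls.pl $minlen $fasta > $out") == 0
        or warn "removesmalls.pl failed for '$fasta'\n";
}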

Keep shared entries among many files

I have hundreds of files, each with a different number of entries (>xxxx), and want to keep only the entries shared among all files, separately per file. I'm not sure what the best method to do this is, maybe Perl! I used bash's sort and uniq, but I didn't get the correct answer. The IDs start with > followed by 4 characters, in the same format across all files.
1.fa
>abcd
CTGAATGCC
2.fa
>abcd
AAATGCGCG
>efgh
CGTAC
3.fa
>abcd
ATGCAATA
>efgh
TAACGTAA
>ijkl
TGCAA
Final results, of this example would be:
1.fa
>abcd
CTGAATGCC
2.fa
>abcd
AAATGCGCG
3.fa
>abcd
ATGCAATA
This Perl program will do as you ask. It uses Perl's built-in edit-in-place functionality and renames the original files to 1.fa.bak etc. It shouldn't have a problem with blank lines in your data, as long as the sequence is always on one line immediately following the ID.
use strict;
use warnings 'all';
my @files = glob '*.fa';
printf "Processing %d file%s\n", scalar @files, @files == 1 ? "" : "s";
exit if @files < 2;
my %ids;
{
local @ARGV = @files;
while ( <> ) {
++$ids{$1} if /^>(\S+)/;
}
}
# remove keys that aren't in all files
delete @ids{ grep { $ids{$_} < @files } keys %ids };
my $n = keys %ids;
printf "%d ID%s common to all files\n", $n, $n == 1 ? '' : "s";
exit unless $n;
{
local @ARGV = @files;
local $^I = '.bak';
while ( <> ) {
next unless /^>(\S+)/ and $ids{$1};
print;
print scalar <>;
}
}
Here is a Perl solution that may help you:
use feature qw(say);
use strict;
use warnings;
my $file_dir = 'files';
chdir $file_dir;
my @files = <*.fa>;
my $num_files = scalar @files;
my %ids;
for my $file (@files) {
open ( my $fh, '<', $file) or die "Could not open file '$file': $!";
while (my $id = <$fh>) {
chomp $id;
chomp (my $sequence = <$fh>);
$ids{$id}++;
}
close $fh;
}
for my $file (@files) {
open ( my $fh, '<', $file) or die "Could not open file '$file': $!";
my $new_name = $file . '.new';
open ( my $fh_write, '>', $new_name ) or die "Could not open file '$new_name': $!";
while (my $id = <$fh>) {
chomp $id;
chomp (my $sequence = <$fh>);
if ( $ids{$id} == $num_files ) {
say $fh_write $id;
say $fh_write $sequence;
}
}
close $fh_write;
close $fh;
}
It assumes that all the .fa files are located in the directory named $file_dir, and it writes the new sequences to new files in the same directory. The new file names get the .new extension.

Need to split multiple files in a directory based on string, rename properly using powershell or fix my perl script

I have a directory full of files (text exports of Dynamics NAV objects that have been exported) in Windows. Each file contains multiple objects. I need to split each file into separate files based on lines that begin with OBJECT, and name each file appropriately.
The purpose of this is to get our Dynamics NAV system into git.
I wrote a nifty Perl program to do this that works great on Linux. But it hangs on the while(<>) loop in Windows (Server 2012, if that matters).
So I need to either figure out how to do this in the PowerShell script I wrote that generates all of the files, or fix the Perl script that I'm calling from PowerShell. Does Windows Perl handle filehandles differently than Linux?
Here's my code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use POSIX qw(strftime);
my $username = getlogin || getpwuid($<);
my $datestamp = strftime("%Y%m%d-%H%M%S", localtime);
my $work_dir = "/temp/nav_export";
my $objects_dir = "$work_dir/$username/objects";
my $export_dir = "$work_dir/$username/$datestamp";
print "Objects being exported to $export_dir\n";
make_path("$export_dir/Page", "$export_dir/Codeunit", "$export_dir/MenuSuite", "$export_dir/Query", "$export_dir/Report", "$export_dir/Table", "$export_dir/XMLport");
chdir $objects_dir or die "Could not change to $objects_dir: $!";
# delete empty files
foreach(glob('*.*')) {
unlink if -f and !-s _;
}
my @files = <*>;
my $count = @files;
print "Processing $count files\n";
open (my $fh, ">-") or die "Could not open standard out: $!";
# OBJECT Codeunit 1 ApplicationManagement
while(<>)
{
if (m/^OBJECT ([A-Za-z]+) ([0-9]+) (.*)/o)
{
my $objectType = $1;
my $objectID = $2;
my $objectName = my $firstLine = $3;
$objectName =~ s/[\. \/\(\)\\]/_/g; # translate spaces, (, ), ., \ and / to underscores
$objectName =~ tr/\cM//d; # get rid of Ctrl-M
my $filename = $export_dir . "/" . $objectType . "/" . $objectType . "~" . $objectID . "~" . $objectName;
close $fh and open($fh, '>', $filename) or die "Could not open file '$filename' $!";
print $fh "OBJECT $objectType $objectID $firstLine\n";
next;
}
print $fh $_;
}
I've learned quite a bit of PowerShell in the past few days. There are some things that it really does quite well, and some (such as calling an executable with variables and command-line options that have spaces) that are maddeningly difficult to figure out. To call curl, this is what I resorted to:
$curl = "C:\Program Files (x86)\cURL\bin\curl"
$arg10 = '-s'
$arg1 = '-X'
$arg11 = 'post'
$arg2 = '-H'
$arg22 = '"Accept-Encoding: gzip,deflate"'
$arg3 = '-H'
$arg33 = '"Content-Type: text/xml;charset=UTF-8"'
$arg4 = '-H'
$arg44 = '"SOAPAction:urn:microsoft-dynamics-schemas/page/permissionrange:ReadMultiple"'
$arg5 = '--ntlm'
$arg6 = '-u'
$arg66 = 'username:password'
$arg7 = '-d'
$arg77 = '"#soap_envelope.txt"'
$arg8 = "http://$servicetier.corp.company.net:7047/$database/WS/DBDOC/Page/PermissionRange"
$arg9 = "-o"
$arg99 = "c:\temp\nav_export\$env:username\raw_list.xml"
&"$curl" $arg10 $arg1 $arg11 $arg2 $arg22 $arg3 $arg33 $arg4 $arg44 $arg5 $arg6 $arg66 $arg7 $arg77 $arg8 $arg9 $arg99
I realize that part is a bit of a tangent. But I've been working really hard at trying to figure this out and not have to bother you nice folk here at stackoverflow!
I'm ambivalent about making it work in PowerShell or fixing the Perl code at this point. I just need to make it work. But I'm hoping it's just some little difference in filehandle handling between linux and Windows.
It's hard to believe that the Perl code that you show does anything on Linux either. It looks like your while loop is supposed to be reading through all of the files in the @files array, but to make it do that you have to copy the names to @ARGV.
Also note that #files will contain directories as well as files.
I suggest you change the lines starting with my @files = <*> to this. There's no reason why it shouldn't work on both Windows and Linux.
our @ARGV = grep -f, glob '*';
my $count = @ARGV;
print "Processing $count files\n";
my $fh;
while (<>) {
s/\s+\z//; # Remove trailing whitespace (including CR and LF)
my @fields = split ' ', $_, 4;
if ( @fields == 4 and $fields[0] eq 'OBJECT' ) {
my ($object_type, $object_id, $object_name) = @fields[1,2,3];
$object_name =~ tr{ ().\\/}{_}; # translate spaces, (, ), ., \ and / to underscores
my $filename = "$export_dir/$object_type/$object_type~$object_id~$object_name";
open $fh, '>', $filename or die "Could not open file '$filename': $!";
}
print $fh "$_\n" if $fh;
if (eof) {
close $fh;
$fh = undef;
}
}

Perl: Weird Tie::File behaviour in Windows as opposed to Unix

I have this perl script that uses Tie::File.
In Linux (Ubuntu), when I invoke the script via Bash, it works as expected, but in Windows, when I invoke the script via PowerShell, it behaves weirdly (see the P.S. below).
Code:
#!/usr/bin/perl -T
use strict;
use warnings;
use Tie::File;
use CommonStringTasks;
if ( @ARGV != 4 ) {
print "ERROR:Inadequate/Redundant arguments.\n";
print "Usage: perl <pl_executable> <path/to/peer_main.java> <peer_main.java>\n";
print " <score_file_index> <port_step_index>\n";
print $ARGV[0], "\n";
print $ARGV[1], "\n";
print $ARGV[2], "\n";
print $ARGV[3], "\n";
exit 1;
}
my $PEER_DIR = $ARGV[0];
my $PEER_FILE = $ARGV[1];
my $PEER_PACKAGE = "src/planetlab/app";
my $PEER_PATH = "${PEER_DIR}/${PEER_PACKAGE}/${PEER_FILE}";
# Check if args are tainted ...
# Check $PEER_PATH file permissions ...
open(my $file, "+<", "$PEER_PATH")
or
die("File ", $PEER_FILE, " could not be opened for editing:$!");
# Edit the file and change variables for debugging/deployment setup.
# Number demanglers:
# -flock -> arg2 -> 2 stands for FILE_EX
# Options (critical!):
# -Memory: Inhibit caching as this will allow record changes on the fly.
tie my @fileLines,
'Tie::File',
$file,
memory => 0
or
die("File ", $PEER_FILE, " could not be tied with Tie::File:$!");
flock $file, 2;
my $i = 0;
my $scoreLine = "int FILE_INDEX = " . $SCORE . ";";
my $portLine = "int SERVER_PORT = " . $PORT . ";";
my $originalScoreLine = "int FILE_INDEX =";
my $originalPortLine = "int SERVER_PORT =";
(tied @fileLines)->defer;
while (my $line = <$file>) {
if ( ($line =~ m/($scoreLine)/) && ($SCORE+1 > 0) ) {
print "Original line (score): ", "\n", $scoreLine, "\n";
chomp $line;
$line = substr($line, 0, -($scoreDigits+1));
$line = $line . (++$SCORE) . ";";
print "Editing line (score): ", $i, "\n", trimLeadSpaces($fileLines[$i]), "\n";
$fileLines[$i] = $line;
print "Line replaced with:\n", trimLeadSpaces($line), "\n";
next;
}
if ( ($line =~ m/($portLine)/) && ($PORT > 0) ) {
print "Original line (port): ", "\n", $portLine, "\n";
chomp $line;
$line = substr($line, 0, -($portDigits+1));
$line = $line . (++$PORT) . ";";
print "Editing line (port): ", $i, "\n", trimLeadSpaces($fileLines[$i]), "\n";
$fileLines[$i] = $line;
print "Line replaced with:\n", trimLeadSpaces($line), "\n";
last;
}
# Restore original settings.
if ( ($line =~ m/($originalScoreLine)/) && ($SCORE < 0) ) {
print "Restoring line (score) - FROM: ", "\n", $fileLines[$i], "\n";
$fileLines[$i] = " private static final int FILE_INDEX = 0;";
print "Restoring line (score) - TO: ", "\n", $fileLines[$i], "\n";
next;
}
if ( ($line =~ m/($originalPortLine)/) && ($PORT < 0) ) {
print "Restoring line (port) - FROM: ", "\n", $fileLines[$i], "\n";
$PORT = abs($PORT);
$fileLines[$i] = " private static final int SERVER_PORT = " . $PORT . ";";
print "Restoring line (port) - TO: ", "\n", $fileLines[$i], "\n";
last;
}
} continue {
$i++;
}
(tied @fileLines)->flush;
untie @fileLines;
close $file;
The Perl version in both OSes is 5+ (on Windows, ActiveState Perl with CPAN modules).
Could it be the way I open the filehandle? Any ideas anyone?
P.S.: The first version used while (<$file>) and the $_ variable instead of $line, but when I did that, specific lines would not be edited; instead the file would get appended with a hundred newlines or so, followed by the (correctly) edited line, and so on. I also had a warning about $fileLines[$i] being uninitialized! Clearly something is wrong with the Tie::File structure in Windows, or something else I am not aware of. The same erratic behaviour takes place with these changes, and in Linux (Ubuntu) the behaviour is again as expected.
The OP's question is vague and lacks sample input and expected output. Therefore I will simply note some of my concerns:
First, using Tie::File and <$file> and flock on the same handle seems to be both overkill and dangerous. I would recommend simply using Tie::File to iterate and to edit, such as:
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::File;
tie my @lines, 'Tie::File', 'filename';
foreach my $linenum ( 0..$#lines ) {
if ($lines[$linenum] =~ /something/) {
$lines[$linenum] = 'somethingelse';
}
}
Perhaps better than edit inline, as Tie::File allows, copy the file to a backup, iterate over the lines using <$file>, then write to a new file with the old name.
#!/usr/bin/env perl
use strict;
use warnings;
use File::Copy 'move';
my $infile = $ARGV[0];
move( $infile, "$infile.bak");
open my $inhandle, '<', "$infile.bak";
open my $outhandle, '>', $infile;
while( my $line = <$inhandle> ) {
if ($line =~ /something/) {
$line = 'somethingelse';
}
print $outhandle $line;
}
Second, the -MModule flag simply translates to a use Module; at the top of the script. Therefore -MCPAN is use CPAN;, but loading the CPAN module does nothing for the script; CPAN.pm only gives a script the ability to install modules. A short sketch of the -M equivalence follows the next point.
Third, we will be able to help better if you give an example input, an expected output, and a stripped-down script that clearly shows how this operation is to perform while still failing in the same way that the actual script does.
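To illustrate the second point (the module name here is just an example), running a script with -M behaves as if the corresponding use line were already at the top:
#!/usr/bin/perl
# Running:  perl -MData::Dumper dump_args.pl foo bar
# behaves exactly as if the script began with the use line below.
use strict;
use warnings;
use Data::Dumper;          # what -MData::Dumper would add implicitly

print Dumper( \@ARGV );    # prints the arguments passed to the script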
I found out the source of my problems. The reason was the record separator!
Tie::File on Windows expected a \r\n record separator, so it read the whole file as just one record. My files are in UTF-8, with Unix line endings.
That is why, when I traversed @fileLines and accessed any index beyond 0, I got a warning from Perl that the string was not initialized. Fixed the problem and now I am ready to go on! :D
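For reference, a minimal sketch of that kind of fix (the file name is a placeholder; Tie::File's recsep option sets the record separator explicitly):
use strict;
use warnings;
use Tie::File;

# Tie the file with an explicit LF record separator so a file with
# Unix line endings is split into lines even on Windows.
tie my @fileLines, 'Tie::File', 'peer_main.java', recsep => "\n"
    or die "Could not tie file: $!";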
P.S.: Mr Joel Berger I am marking your answer as valid/appropriate because you really tried helping me and I followed your first advice about the file handle :).
Thank you everyone for assisting me xD xD xD

Why can't I use more than 20 files with my Perl script and Windows's SendTo?

I'm trying to emulate RapidCRC's ability to check crc32 values within filenames on Windows Vista Ultimate 64-bit. However, I seem to be running into some kind of argument limitation.
I wrote a quick Perl script, created a batch file to call it, then placed a shortcut to the batch file in %APPDATA%\Microsoft\Windows\SendTo
This works great when I select about 20 files or less, right-click and "send to" my batch file script. However, nothing happens at all when I select more than that. I suspect there's a character or number of arguments limit somewhere.
Hopefully I'm missing something simple and that the solution or a workaround isn't too painful.
References:
batch file (crc32_inline.bat):
crc32_inline.pl %*
Perl notes:
I'm using (strawberry) perl v5.10.0
I have C:\strawberry\perl\bin in my path, which is where crc32.bat exists.
perl script (crc32_inline.pl):
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd;
use English qw( -no_match_vars );
use File::Basename;
$OUTPUT_AUTOFLUSH = 1;
my $crc32_cmd = 'crc32.bat';
my $failure_report_basename = 'crc32_failures.txt';
my %failures = ();
print "\n";
foreach my $arg (@ARGV) {
# if the file has a crc, check to see if it matches the calculated
# crc.
if (-f $arg and $arg =~ /\[([0-9a-f]{8})\]/i) {
my $crc = uc $1;
my $basename = basename($arg);
print "checking ${basename}... ";
my $calculated_crc = uc `${crc32_cmd} "${arg}"`;
chomp($calculated_crc);
if ($crc eq $calculated_crc) {
print "passed.\n";
}
else {
print "FAILED (calculated ${calculated_crc})\n";
my $dirname = dirname($arg);
$failures{$dirname}{$basename} = $calculated_crc;
}
}
}
print "\nReport Summary:\n";
if (scalar keys %failures == 0) {
print " All files OK\n";
}
else {
print sprintf(" %d / %d files failed crc32 validation.\n" .
" See %s for details.\n",
scalar keys %failures,
scalar @ARGV,
$failure_report_basename);
my $failure_report_fullname = $failure_report_basename;
if (defined -f $ARGV[0]) {
$failure_report_fullname
= dirname($ARGV[0]) . '/' . $failure_report_basename;
}
$OUTPUT_AUTOFLUSH = 0;
open my $fh, '>' . $failure_report_fullname or die $!;
foreach my $dirname (sort keys %failures) {
print {$fh} $dirname . "\n";
foreach my $basename (sort keys %{$failures{$dirname}}) {
print {$fh} sprintf(" crc32(%s) basename(%s)\n",
$failures{$dirname}{$basename},
$basename);
}
}
close $fh;
$OUTPUT_AUTOFLUSH = 1;
}
print sprintf("\n%s done! (%d seconds elapsed)\n" .
"Press enter to exit.\n",
basename($0),
time() - $BASETIME);
<STDIN>;
I recommend just putting a shortcut to your script in the "Send To" directory instead of going through a batch file (which is subject to cmd.exe's limits on command-line length).
