How to call CMD with utf8 arguments from perl? - windows

How to call CMD with utf8 arguments from perl without messing the argument's characters?
One of the things I've tried is to convert a string of unicode characters to its unicode character codes then use system($cmd):
use utf8;
`chcp 65001`;
binmode STDOUT, ":encoding(UTF-8)";
$string = "αω";
$converted_string = convert_to_unicode_code($string);
# gets $converted_string = '\x{03B1}\x{03C9}'
$cmd = 'program "'.$converted_string.'"';
# $cmd's value is: program "\x{03B1}\x{03C9}"
system($cmd);
sub convert_to_unicode_code {
my $input = shift;
$input =~ s/(.)/"\\x{" . (sprintf "%04X", ord $1) . "}"/eg;
return $input;
}
Actually this solution doesn't work as expected and calls program "\x{03B1}\x{03C9}" instead of program "αω".

See Win32::Unicode.
αω.bat
#echo hiαω
so48996757.pl
use utf8;
use Win32::Unicode::Process qw(systemW);
system 'chcp 65001';
systemW 'αω';

Related

Passing bash variable into inline perl script

Reference the 2nd to last line in my script. For some reason Perl is not able to access the variable $perlPort how can I fix this? Note: $perlPort is a bash variable location before my perl script
perl -e '
{
package MyWebServer;
use HTTP::Server::Simple::CGI;
use base qw(HTTP::Server::Simple::CGI);
my %dispatch = (
"/" => \&resp_hello,
);
sub handle_request {
my $self = shift;
my $cgi = shift;
my $path = $cgi->path_info();
my $handler = $dispatch{$path};
if (ref($handler) eq "CODE") {
print "HTTP/1.0 200 OK\r\n";
$handler->($cgi);
} else {
print "HTTP/1.0 404 Not found\r\n";
print $cgi->header,
$cgi->start_html("Not found"),
$cgi->h1("Not found"),
$cgi->end_html;
}
}
sub resp_hello {
my $cgi = shift; # CGI.pm object
return if !ref $cgi;
my $who = $cgi->param("name");
print $cgi->header,
$cgi->start_html("Hello"),
$cgi->h1("Hello Perl"),
$cgi->end_html;
}
}
my $pid = MyWebServer->new($perlPort)->background();
print "Use 'kill $pid' to stop server.\n";'
export perlPort
perl -e '
...
my $pid = MyWebServer->new($ENV{perlPort})->background();
'
You can use -s switch to pass variables. See http://perldoc.perl.org/perlrun.html
perl -se '
...
my $pid = MyWebBrowser->new($perlPort)->background();
...' -- -perlPort="$perlPort"
You can still pass command line arguments to your script. Replace $perlPort with $ARGV[0], then call you script as
perl -e $' ...
my $pid = MyWebServer->new($ARGV[0])->background();
print "Use \'kill $pid\' to stop server.\n";' "$perlPort"
Note the other problem: You can't include single quotes inside a single-quoted string in bash. You can work around this by using a $'...'-quoted string as the argument to Perl, which can contain escaped single quotes. If your script doesn't need to read from standard input, it would be a better idea to have perl read from a here-document instead.
perl <<'EOF' "$perlPort"
{
package MyWebServer;
use HTTP::Server::Simple::CGI;
...
my $pid = MyWebServer->new($ARGV[0])->background();
print "Use 'kill $pid' to stop server.\n";
EOF
The best idea is to simply use a script file instead of trying to construct the script on the command line.
perl -e '
...
my $pid = MyWebServer->new('$perlPort')->background();
...

How to find out if a command exists in a POSIX compliant manner?

See the discussion at Is `command -v` option required in a POSIX shell? Is posh compliant with POSIX?. It describes that type as well as command -v option is optional in POSIX.1-2004.
The answer marked correct at Check if a program exists from a Bash script doesn't help either. Just like type, hash is also marked as XSI in POSIX.1-2004. See http://pubs.opengroup.org/onlinepubs/009695399/utilities/hash.html.
Then what would be a POSIX compliant way to write a shell script to find if a command exists on the system or not?
How do you want to go about it? You can look for the command on directories in the current value of $PATH; you could look in the directories specified by default for the system PATH (getconf PATH as long as getconf
exists on PATH).
Which implementation language are you going to use? (For example: I have a Perl implementation that does a decent job finding executables on $PATH — but Perl is not part of POSIX; is it remotely relevant to you?)
Why not simply try running it? If you're going to deal with Busybox-based systems, lots of the executables can't be found by searching — they're built into the shell. The major caveat is if a command does something dangerous when run with no arguments — but very few POSIX commands, if any, do that. You might also need to determine what command exit statuses indicate that the command is not found versus the command objecting to not being called with appropriate arguments. And there's little guarantee that all systems will be consistent on that. It's a fraught process, in case you hadn't gathered.
Perl implementation pathfile
#!/usr/bin/env perl
#
# #(#)$Id: pathfile.pl,v 3.4 2015/10/16 19:39:23 jleffler Exp $
#
# Which command is executed
# Loosely based on 'which' from Kernighan & Pike "The UNIX Programming Environment"
#use v5.10.0; # Uses // defined-or operator; not in Perl 5.8.x
use strict;
use warnings;
use Getopt::Std;
use Cwd 'realpath';
use File::Basename;
my $arg0 = basename($0, '.pl');
my $usestr = "Usage: $arg0 [-AafhqrsVwx] [-p path] command ...\n";
my $hlpstr = <<EOS;
-A Absolute pathname (determined by realpath)
-a Print all possible matches
-f Print names of files (as opposed to symlinks, directories, etc)
-h Print this help message and exit
-q Quiet mode (don't print messages about files not found)
-r Print names of files that are readable
-s Print names of files that are not empty
-V Print version information and exit
-w Print names of files that are writable
-x Print names of files that are executable
-p path Use PATH
EOS
sub usage
{
print STDERR $usestr;
exit 1;
}
sub help
{
print $usestr;
print $hlpstr;
exit 0;
}
sub version
{
my $version = 'PATHFILE Version $Revision: 3.4 $ ($Date: 2015/10/16 19:39:23 $)';
# Beware of RCS hacking at RCS keywords!
# Convert date field to ISO 8601 (ISO 9075) notation
$version =~ s%\$(Date:) (\d\d\d\d)/(\d\d)/(\d\d) (\d\d:\d\d:\d\d) \$%\$$1 $2-$3-$4 $5 \$%go;
# Remove keywords
$version =~ s/\$([A-Z][a-z]+|RCSfile): ([^\$]+) \$/$2/go;
print "$version\n";
exit 0;
}
my %opts;
usage unless getopts('AafhqrsVwxp:', \%opts);
version if ($opts{V});
help if ($opts{h});
usage unless scalar(#ARGV);
# Establish test and generate test subroutine.
my $chk = 0;
my $test = "-x";
my $optlist = "";
foreach my $opt ('f', 'r', 's', 'w', 'x')
{
if ($opts{$opt})
{
$chk++;
$test = "-$opt";
$optlist .= " -$opt";
}
}
if ($chk > 1)
{
$optlist =~ s/^ //;
$optlist =~ s/ /, /g;
print STDERR "$arg0: mutually exclusive arguments ($optlist) given\n";
usage;
}
my $chk_ref = eval "sub { my(\$cmd) = \#_; return -f \$cmd && $test \$cmd; }";
my #PATHDIRS;
my %pathdirs;
my $path = defined($opts{p}) ? $opts{p} : $ENV{PATH};
#foreach my $element (split /:/, $opts{p} // $ENV{PATH})
foreach my $element (split /:/, $path)
{
$element = "." if $element eq "";
push #PATHDIRS, $element if $pathdirs{$element}++ == 0;
}
my $estat = 0;
CMD:
foreach my $cmd (#ARGV)
{
if ($cmd =~ m%/%)
{
if (&$chk_ref($cmd))
{
print "$cmd\n" unless $opts{q};
next CMD;
}
print STDERR "$arg0: $cmd: not found\n" unless $opts{q};
$estat = 1;
}
else
{
my $found = 0;
foreach my $directory (#PATHDIRS)
{
my $file = "$directory/$cmd";
if (&$chk_ref($file))
{
$file = realpath($file) if $opts{A};
print "$file\n" unless $opts{q};
next CMD unless defined($opts{a});
$found = 1;
}
}
print STDERR "$arg0: $cmd: not found\n" unless $found || $opts{q};
$estat = 1;
}
}
exit $estat;

Need to split multiple files in a directory based on string, rename properly using powershell or fix my perl script

I have a directory full of files (text exports of Dynamics NAV objects that have been exported) in Windows. Each file contains multiple objects. I need to split each file into separate files based on lines that begin with OBJECT, and name each file appropriately.
The purpose of this is to get our Dynamics NAV system into git.
I wrote a nifty perl program to do this that works great on linux. But it hangs on the while(<>) loop in Windows (Server 2012 if that matters).
So, I need to either figure out how to do this in the PowerShell script that I wrote that generates all of the files, or fix my perl script that I'm calling from PowerShell. Does Windows perl handle filehandles differently than linux?
Here's my code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use POSIX qw(strftime);
my $username = getlogin || getpwuid($<);
my $datestamp = strftime("%Y%m%d-%H%M%S", localtime);
my $work_dir = "/temp/nav_export";
my $objects_dir = "$work_dir/$username/objects";
my $export_dir = "$work_dir/$username/$datestamp";
print "Objects being exported to $export_dir\n";
make_path("$export_dir/Page", "$export_dir/Codeunit", "$export_dir/MenuSuite", "$export_dir/Query", "$export_dir/Report", "$export_dir/Table", "$export_dir/XMLport");
chdir $objects_dir or die "Could not change to $objects_dir: $!";
# delete empty files
foreach(glob('*.*')) {
unlink if -f and !-s _;
}
my #files = <*>;
my $count = #files;
print "Processing $count files\n";
open (my $fh, ">-") or die "Could not open standard out: $!";
# OBJECT Codeunit 1 ApplicationManagement
while(<>)
{
if (m/^OBJECT ([A-Za-z]+) ([0-9]+) (.*)/o)
{
my $objectType = $1;
my $objectID = $2;
my $objectName = my $firstLine = $3;
$objectName =~ s/[\. \/\(\)\\]/_/g; # translate spaces, (, ), ., \ and / to underscores
$objectName =~ tr/\cM//d; # get rid of Ctrl-M
my $filename = $export_dir . "/" . $objectType . "/" . $objectType . "~" . $objectID . "~" . $objectName;
close $fh and open($fh, '>', $filename) or die "Could not open file '$filename' $!";
print $fh "OBJECT $objectType $objectID $firstLine\n";
next;
}
print $fh $_;
}
I've learned quite a bit of PowerShell in the past few days. There are some things that it really does quite well. And some (such as calling an executable with variables and command line options that have spaces) that are maddeningly difficult to figure out. To call curl, this is what I resorted to:
$curl = "C:\Program Files (x86)\cURL\bin\curl"
$arg10 = '-s'
$arg1 = '-X'
$arg11 = 'post'
$arg2 = '-H'
$arg22 = '"Accept-Encoding: gzip,deflate"'
$arg3 = '-H'
$arg33 = '"Content-Type: text/xml;charset=UTF-8"'
$arg4 = '-H'
$arg44 = '"SOAPAction:urn:microsoft-dynamics-schemas/page/permissionrange:ReadMultiple"'
$arg5 = '--ntlm'
$arg6 = '-u'
$arg66 = 'username:password'
$arg7 = '-d'
$arg77 = '"#soap_envelope.txt"'
$arg8 = "http://$servicetier.corp.company.net:7047/$database/WS/DBDOC/Page/PermissionRange"
$arg9 = "-o"
$arg99 = "c:\temp\nav_export\$env:username\raw_list.xml"
&"$curl" $arg10 $arg1 $arg11 $arg2 $arg22 $arg3 $arg33 $arg4 $arg44 $arg5 $arg6 $arg66 $arg7 $arg77 $arg8 $arg9 $arg99
I realize that part is a bit of a tangent. But I've been working really hard at trying to figure this out and not have to bother you nice folk here at stackoverflow!
I'm ambivalent about making it work in PowerShell or fixing the Perl code at this point. I just need to make it work. But I'm hoping it's just some little difference in filehandle handling between linux and Windows.
It's hard to believe that the Perl code that you show does anything on Linux either. It looks like your while loop is supposed to be reading through all of the files in the #files array, but to make it do that you have to copy the names to #ARGV.
Also note that #files will contain directories as well as files.
I suggest you change the lines starting with my #files = <*> to this. There's no reason why it shouldn't work on both Windows and Linux.
our #ARGV = grep -f, glob '*';
my $count = #ARGV;
print "Processing $count files\n";
my $fh;
while (<>) {
s/\s+\z//; # Remove trailing whitespace (including CR and LF)
my #fields = split ' ', $_, 4;
if ( #fields == 4 and $fields[0] eq 'OBJECT' ) {
my ($object_type, $object_id, $object_name) = #fields[1,2,3];
$object_name =~ tr{ ().\\/}{_}; # translate spaces, (, ), ., \ and / to underscores
my $filename = "$export_dir/$object_type/$object_type~$object_id~$object_name";
open $fh, '>', $filename or die "Could not open file '$filename': $!";
}
print $fh "$_\n" if $fh;
if (eof) {
close $fh;
$fh = undef;
}
}

Non-determinism in encoding when using open() with scalar and I/O layers in Perl

For several hours now I am fighting a bug in my Perl program. I am not sure if I do something wrong or the interpreter does, but the code is non-deterministic while it should be deterministic, IMO. Also it exhibits the same behavior on ancient Debian Lenny (Perl 5.10.0) and a server just upgraded to Debian Wheezy (Perl 5.14.2). It boiled down to this piece of Perl code:
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";
my $c = "";
open C, ">:utf8", \$c;
print C "š";
close C;
die "Does not happen\n" if utf8::is_utf8($c);
print utf8::decode($c) ? "Decoded\n" : "Undecoded\n";
It initializes Perl 5 interpreter in strict mode with warnings enabled, with character strings (as opposed to byte strings) and named standard streams encoded in UTF8 (internal notion of UTF-8, but pretty close; changing to full UTF-8 makes no difference). Then it opens a file handle to an “in-memory file” (scalar variable), prints a single two-byte UTF-8 character into it and examines the variable upon closure.
The scalar variable now always has UTF8 bit flipped off. However it sometimes contains a byte string (converted to character string via utf8::decode()) and sometimes a character string that just needs to flip on its UTF8 bit (Encode::_utf8_on()).
When I execute my code repeatedly (1000 times, via Bash), it prints Undecoded and Decoded with approximately the same frequencies. When I change the string I write into the “file”, e.g. add a newline at its end, Undecoded disappears. When utf8::decode succeeds and I try it for the same original string in a loop, it keeps succeeding in the same instance of interpreter; however, if it fails, it keeps failing.
What is the explanation for the observed behavior? How can I use file handle to a scalar variable together with character strings?
Bash playground:
for i in {1..1000}; do perl -we 'use strict; use utf8; binmode STDOUT, ":utf8"; binmode STDERR, ":utf8"; my $c = ""; open C, ">:utf8", \$c; print C "š"; close C; die "Does not happen\n" if utf8::is_utf8($c); print utf8::decode($c) ? "Decoded\n" : "Undecoded\n";'; done | grep Undecoded | wc -l
For reference and to be absolutely sure, I also made a version with pedantic error handling – same results.
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
binmode STDOUT, ":utf8" or die "Cannot binmode STDOUT\n";
binmode STDERR, ":utf8" or die "Cannot binmode STDERR\n";
my $c = "";
open C, ">:utf8", \$c or die "Cannot open: $!\n";
print C "š" or die "Cannot print: $!\n";
close C or die "Cannot close: $!\n";
die "Does not happen\n" if utf8::is_utf8($c);
print utf8::decode($c) ? "Decoded\n" : "Undecoded\n";
Examining $c in details reveals it has nothing to do with the content of $c or its internals, and the result of decode accurately represents what it did or didn't do.
$ for i in {1..2}; do
perl -MDevel::Peek -we'
use strict; use utf8;
binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";
my $c = "";
open C, ">:utf8", \$c;
print C "š";
close C;
die "Does not happen\n" if utf8::is_utf8($c);
Dump($c);
print utf8::decode($c) ? "Decoded\n" : "Undecoded\n";
Dump($c)
'
echo
done
SV = PV(0x17c8470) at 0x17de990
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x17d7a40 "\305\241"
CUR = 2
LEN = 16
Decoded
SV = PV(0x17c8470) at 0x17de990
REFCNT = 1
FLAGS = (PADMY,POK,pPOK,UTF8)
PV = 0x17d7a40 "\305\241" [UTF8 "\x{161}"]
CUR = 2
LEN = 16
SV = PV(0x2d0fee0) at 0x2d26400
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x2d1f4b0 "\305\241"
CUR = 2
LEN = 16
Undecoded
SV = PV(0x2d0fee0) at 0x2d26400
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x2d1f4b0 "\305\241"
CUR = 2
LEN = 16
This was a bug in utf8::decode, but it was fixed in 5.16.3 or earlier, probably 5.16.0 since it was still present in 5.14.2.
A suitable workaround it to use Encode's decode_utf8 instead.

Perl - Using Variables from an Input File in the URL when a Variable has a Space (two words)

What am I doing? The script loads a string from a .txt (locations.txt), and separates it into 6 variables. Each variable is separated by a comma. Then I go to a website, whose address depends on these 6 values.
What is the problem? If there is a space as a character in a variable as part of a string in locations.txt. When there is a space, it does not get the correct url.
The input file is:
locations.txt = Heinz,Weber,Sierra Leone,1915,M,White
Because Sierra Leone has a space, the url is:
https://familysearch.org/search/collection/results#count=20&query=%2Bgivenname%3AHeinz%20%2Bsurname%3AWeber%20%2Bbirth_place%3A%22Sierra%20Leone%22%20%2Bbirth_year%3A1914-1918~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219
But that does not get processed correctly in the code below.
I'm using the packages:
use strict;
use warnings;
use WWW::Mechanize::Firefox;
use HTML::TableExtract;
use Data::Dumper;
use LWP::UserAgent;
use JSON;
use CGI qw/escape/;
use HTML::DOM;
This is the beginning of the code :
open(my $l, 'locations26.txt') or die "Can't open locations: $!";
open(my $o, '>', 'out2.txt') or die "Can't open output file: $!";
while (my $line = <$l>) {
chomp $line;
my %args;
#args{qw/givenname surname birth_place birth_year gender race/} = split /,/, $line;
$args{birth_year} = ($args{birth_year} - 2) . '-' . ($args{birth_year} + 2);
my $mech = WWW::Mechanize::Firefox->new(create => 1, activate => 1);
$mech->get("https://familysearch.org/search/collection/results#count=20&query=%2Bgivenname%3A".$args{givenname}."%20%2Bsurname%3A".$args{surname}."%20%2Bbirth_place%3A".$args{birth_place}."%20%2Bbirth_year%3A".$args{birth_year}."~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219");
# REST OF THE SCRIPT HERE. MANY LINES.
}
As another example, the following would work:
locations.txt = Benjamin,Schuvlein,Germany,1913,M,White
I have not used Mechanize, so not sure whether you need to encode the URL. Try encoding space to %20 or + before running $mech->get
$url =~ s/ /+/g;
Or
$url =~ s/ /%20/g
whichever works :)
====
Edit:
my $url = "https://familysearch.org/search/collection/results#count=20& query=%2Bgivenname%3A".$args{givenname}."%20%2Bsurname%3A".$args{surname}."%20%2Bbirth_place%3A".$args{birth_place}."%20%2Bbirth_year%3A".$args{birth_year}."~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219";
$url =~ s/ /+/g;
$mech->get($url);
Try that.
If you have the error
Global symbol "$url" requires explicit package name.
this means that you forgot to declare $url with :
my $url;
Your use part seems freaky, I'm pretty sure that you don't need all of those modules # the same time. If you use WWW::Mechanize, no need LWP::UserAgent and CGI I guess...

Resources