How do you create Unicode file names in Windows using Perl?

I have the following code
use utf8;
open($file, '>:encoding(UTF-8)', "さっちゃん.txt") or die $!;
print $file "さっちゃん";
But I get the file name as ã•ã£ã¡ã‚ƒã‚“.txt
Is there a way to make this work as I expect (that is, produce a Unicode file name) without resorting to Win32::API or Win32API::*, or moving to another platform and using a Samba share to modify the files?
The intent is to ensure we do not have any Win32-specific modules that need to be loaded (even conditionally).

Perl treats file names as opaque strings of bytes. They need to be encoded as per your "locale"'s encoding (ANSI code page).
In Windows, this is usually cp1252. The code page is returned (as a number) by the GetACP system call; prepend "cp" to get the encoding name. However, cp1252 doesn't support Japanese characters.
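A quick way to check whether a given name can survive this step is to encode it with FB_CROAK and see whether Encode complains. This is a minimal sketch, not code from the answer; the helper `ansi_file_name` is hypothetical, and the code page numbers are assumptions (1252 for a typical Western install, 932 for a Japanese one). On Windows you would pass the number GetACP returns.

```perl
use strict;
use warnings;
use utf8;                     # the literal below is UTF-8 in this source file
use Encode qw(encode);

# Hypothetical helper: encode a file name for the given ANSI code page.
# Returns the bytes, or undef if a character has no mapping (or the
# code page is unknown to Encode).
sub ansi_file_name {
    my ($name, $code_page) = @_;
    return eval {
        encode("cp$code_page", $name, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
}

my $name = "さっちゃん.txt";
for my $cp (1252, 932) {     # assumed code pages: Western and Japanese
    my $bytes = ansi_file_name($name, $cp);
    printf "cp%d: %s\n", $cp,
        defined $bytes ? length($bytes) . " bytes" : "cannot represent this name";
}
```

On a cp1252 system the name simply has no byte representation, which is why no amount of tweaking the `open` call helps there.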
Windows also provides a "Unicode" aka "Wide" interface, but Perl doesn't provide access to it using builtins*. You can use Win32API::File's CreateFileW, though. IIRC, you still need to encode the file name yourself. If so, you'd use UTF-16le as the encoding.
* — Perl's support for Windows sucks in some respects.
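For the Wide interface, the name has to be handed over as UTF-16LE bytes. Below is a small sketch of building that byte string. `wide_file_name` is a hypothetical helper, and the two-byte NUL terminator copies what the CreateFileW example in this thread does. I haven't confirmed that Win32API::File actually requires it.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper: build the byte string a Wide ("W") Windows API
# expects: the name as UTF-16LE, followed by a two-byte NUL terminator.
sub wide_file_name {
    my ($name) = @_;
    return encode('UTF-16LE', $name) . "\0\0";
}

my $wide = wide_file_name("さっちゃん.txt");
printf "%d bytes: %s\n", length $wide,
    join ' ', map { sprintf '%02X', ord } split //, $wide;
```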

Use Encode::Locale:
use utf8;
use Encode::Locale;
use Encode;
open($file, '>:encoding(UTF-8)', encode(locale_fs => "さっちゃん.txt") ) or die $!;
print $file "さっちゃん";

The following produces a Unicode file name on Windows 7 using ActiveState Perl.
#-----------------------------------------------------------------------
# Unicode file names on Windows using Perl
# Philip R Brenan at gmail dot com, Appa Apps Ltd, 2013
#-----------------------------------------------------------------------
use feature ":5.16";
use Encode qw(encode);
use Win32API::File qw(:ALL);
# Create a file with a Unicode name (six Hebrew letters followed by ".data")
my $e = "\x{05E7}\x{05EA}\x{05E7}\x{05D5}\x{05D5}\x{05D4}".
"\x{002E}\x{0064}\x{0061}\x{0074}\x{0061}"; # File name as a string of Unicode code points
my $f = encode("UTF-16LE", $e); # Format used by the Wide API; encode() already returns plain bytes
$f .= chr(0).chr(0); # NUL-terminate the wide string (two zero bytes)
my $F = Win32API::File::CreateFileW
($f, GENERIC_WRITE, 0, [], OPEN_ALWAYS, 0, 0) # Create the file via the Wide API
or die "CreateFileW failed: $^E";
# Wrap the native handle in a Perl file handle and write to it
OsFHandleOpen(FILE, $F, "w") or die "Cannot open file";
binmode FILE;
print FILE "hello there\n";
close(FILE);

Related

How to print Hexagram in Perl?

I am trying to print the first hexagram from here in Perl.
The code below doesn't generate any errors, but doesn't print any hexagrams either.
use warnings;
use open ':encoding(utf8)';
binmode(STDOUT, ":utf8");
print "\x{4DC0}\n";
I was hoping to see this "䷀" not "Σ╖Ç".
You tell Perl your terminal is expecting UTF-8, but your terminal appears to expect one of the following:[1]
cp437
cp860
cp861
cp863
cp865
Seeing as these are all Windows code pages, I presume the terminal in question is a Windows console. If so, you can find out which encoding is expected using either of these commands:
chcp
perl -le"use Win32; print Win32::GetConsoleOutputCP()"
Prepend cp to the number to get a name you can use with the Encode module (which is used by the :encoding layer).
Knowing the expected encoding won't help you, however. None of the character sets of these encodings contains "䷀", so your terminal can't display "䷀" without change.
You can switch the encoding expected by a Windows console to UTF-8 by issuing the following command:
chcp 65001
You may have to adjust the font in the console's properties.
I obtained the list of possible encodings using the following program:
use strict;
use warnings;
use feature qw( say );
use utf8;
use Encode qw( decode encode_utf8 );
my $output = encode_utf8("\x{4DC0}");
my $displayed = "Σ╖Ç"; # what the console actually showed
for my $encoding (Encode->encodings(":all")) {
defined( my $got = eval { decode($encoding, $output, Encode::FB_CROAK|Encode::LEAVE_SRC) } )
or next;
say $encoding if $got eq $displayed;
}
(Make sure the file is encoded using UTF-8.)

Strawberry Perl -- where are encoding conversions done by default?

Basically, I wrote a Perl script that creates an encoded command for PowerShell and tries to run it. I had to explicitly convert the command string to UTF-16 before base64-encoding it. I'm wondering why that's all I had to do to get the script to work. What conversions is Perl on Windows* performing by default in the run of an "ordinary" program that interacts with the console and perhaps the file system? For instance, is argv converted? Is stdin/stdout converted? Does file IO go through a conversion?
✱ in particular, the Strawberry Perl distribution in case ActivePerl does something different
I'm trying to write a Perl script that calls many PowerShell fragments and depends on Strawberry Perl distribution.
PowerShell, rather conveniently, has an -encodedCommand flag that accepts a base64-encoded string and then processes it. This is helpful for avoiding quoting-related problems.
I tried the Simplest Thing That Could Possibly Work.
powersheller.pl:
#! /usr/bin/env perl
use strict;
use warnings;
use MIME::Base64;
use Encode qw/encode decode/;
use vars ('$powershell_command');
sub run_powershell_fragment {
my ($contents) = @_;
my $encoded = encode_base64($contents);
printf "encoded: %s\n", $encoded;
return `powershell.exe -noprofile -encodedCommand $encoded`;
}
printf "%s\n---\n", run_powershell_fragment($powershell_command);
BEGIN {
$powershell_command = <<EOF
echo "hi"
EOF
}
And ran it. Here's the output of the ... standard output channels (?) from running the perl script in the powershell window.
PS C:\...> perl .\powersheller.pl
encoded: ZWNobyAiaGkiCQo=
Redundant argument in printf at .\powersheller.pl line 18.
?????? : The term '??????' is not recognized as the name of a cmdlet, function, script file, or operable program.
---
This looked like an encoding issue. I guessed that Perl was using something resembling utf-8 by default and powershell was expecting utf16-le or similar.
sub run_powershell_fragment {
my ($contents) = @_;
my $utf16_le_contents = encode("utf-16le", $contents);
my $encoded = encode_base64($utf16_le_contents);
printf "encoded: %s\n", $encoded;
return `powershell.exe -noprofile -encodedCommand $encoded`;
}
Technically, using "ucs-2le" also works. I don't know which is appropriate.
Anyway, all together, the program works as expected with the extra conversion inserted.
PS C:\...> perl .\powersheller.pl
encoded: ZQBjAGgAbwAgACIAaABpACIACQAKAA==
hi
---
Why was this all that I needed to do? Is Perl handling conversions related to argv and stdout &c?
qx`` performs no conversion. The command is expected to be encoded using the system's ANSI code page as it will be passed unmodified to CreateProcessA or similar.[1]
use Encode qw( encode );
use Win32 qw( );
my $cmd_ansi = encode("cp".Win32::GetACP(), $cmd);
`$cmd_ansi`
Of course, if the command contains only ASCII characters, encoding is moot.
Similarly, the values in @ARGV have not been decoded. They are received from the system encoded using the system's ANSI code page.
use Encode qw( decode );
use Win32 qw( );
my @decoded_argv = map { decode("cp".Win32::GetACP(), $_) } @ARGV;
Of course, if the arguments contain only ASCII characters, decoding is moot.
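You can try both directions without Windows by substituting a fixed code page for Win32::GetACP(). The sketch below assumes code page 1252. It uses the euro sign because that is one of the characters where cp1252 differs from Latin-1: U+20AC is stored as byte 0x80. It doesn't actually run a command or read the real @ARGV; `@raw_argv` is a hypothetical stand-in.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# $acp stands in for Win32::GetACP(); 1252 is an assumed value
# (a Western-European Windows install). On Windows, use the real one.
my $acp = 1252;

# Outgoing: encode a command string before handing it to qx``.
my $cmd      = "echo \x{20AC}";          # EURO SIGN
my $cmd_ansi = encode("cp$acp", $cmd);   # cp1252 stores U+20AC as byte 0x80

# Incoming: decode raw bytes as they would arrive in @ARGV.
my @raw_argv     = ("\x80100");          # hypothetical argument "€100"
my @decoded_argv = map { decode("cp$acp", $_) } @raw_argv;

printf "cmd bytes: %vX\n", $cmd_ansi;
printf "arg chars: %vX\n", $decoded_argv[0];
```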
By default, file handles do not perform any encoding or decoding except for CRLF ⇔ LF conversion (CRLF ⇒ LF on read, LF ⇒ CRLF on write). You are expected to provide a string of bytes (a string of characters with values in 0..255) to print/printf/say[1], and you will receive a string of bytes from the readline/read/readpipe.
You may provide an encoding/decoding layer when opening the file.
open(my $fh, '>:encoding(UTF-8)', $qfn)
You may provide a default encoding/decoding layer via the open pragma.
use open ':encoding(UTF-8)';
open(my $fh, '>', $qfn)
In both cases, you will now need to provide a string of Unicode Code Points to print/printf/say, and you will similarly receive a string of Unicode Code Points from the readline/read/readpipe.
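The difference between the two cases can be observed with in-memory file handles, so no file is needed. This is a minimal sketch; the euro sign is an arbitrary character outside 0..255.

```perl
use strict;
use warnings;

my $text = "\x{20AC}";    # EURO SIGN: a character outside 0..255

# Collect warnings so we can see whether Perl complains.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# No layer: print expects bytes, so Perl warns ("Wide character")
# and writes the UTF-8 encoding of the string instead.
open(my $raw_fh, '>', \my $raw_buf) or die "open: $!";
print $raw_fh $text;
close $raw_fh;
my $raw_warnings = @warnings;

# :encoding(UTF-8) layer: print now expects characters and encodes them.
open(my $enc_fh, '>:encoding(UTF-8)', \my $enc_buf) or die "open: $!";
print $enc_fh $text;
close $enc_fh;

printf "no layer:       %vX (%d warning)\n", $raw_buf, $raw_warnings;
printf "encoding layer: %vX (%d new warnings)\n", $enc_buf, @warnings - $raw_warnings;
```

Both buffers end up holding the same three bytes (E2 82 AC). Only the first print complains, because without a layer Perl is falling back to its internal representation rather than encoding on purpose.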
I'm not sure what's best for STDIN/STDOUT/STDERR, but you could start with the following:
use Win32 qw( );
my ($in_enc, $out_enc);
BEGIN {
$in_enc = "cp".Win32::GetConsoleCP();
$out_enc = "cp".Win32::GetConsoleOutputCP();
binmode STDIN, ":encoding($in_enc)";
binmode STDOUT, ":encoding($out_enc)";
binmode STDERR, ":encoding($out_enc)";
}
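You should use UTF-16le rather than UCS-2le. UCS-2 cannot represent characters outside the Basic Multilingual Plane, whereas UTF-16 encodes them as surrogate pairs.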
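The difference only shows up for characters outside the BMP, such as emoji. The sketch below compares the two encodings on U+1F600, which was chosen arbitrarily.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "hi \x{1F600}";   # U+1F600 lies outside the Basic Multilingual Plane

# UTF-16LE represents it as a surrogate pair (two 16-bit units).
my $utf16 = encode('UTF-16LE', $text);
printf "UTF-16LE: %d bytes, round-trips: %s\n",
    length $utf16, decode('UTF-16LE', $utf16) eq $text ? 'yes' : 'no';

# UCS-2LE has no surrogates; with FB_CROAK, Encode reports an error instead.
my $ucs2 = eval { encode('UCS-2LE', $text, Encode::FB_CROAK | Encode::LEAVE_SRC) };
print "UCS-2LE: ", (defined $ucs2 ? "encoded" : "failed, cannot represent U+1F600"), "\n";
```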
If you provide a string that contains non-bytes (characters outside of 0..255), Perl will assume you meant to encode the string using UTF-8. It will warn ("Wide character") and encode the string using utf8.

Merge CSV file using Perl

I have a CSV file that I need to merge using the following Perl code, but I am not able to run it. It is supposed to output multiple text files, but it is not working.
#!/usr/local/bin/perl
#
$template_file_name="rtr-template.txt";
while(<>) {
($location, $name, $lo0ip, $frameip, $framedlci, $eth0ip, $x)
= split (/,/);
open(TFILE, "< $template_file_name") || die "config template file $template_file_name: $!\n";
$ofile_name = $name . ".txt";
open(OFILE, "> $ofile_name") || die "output config file $ofile_name: $!\n";
while (<TFILE>) {
s/##location##/$location/;
s/##rtrname##/$name/;
s/##eth0-ip##/$eth0ip/;
s/##loop0-ip##/$lo0ip/;
s/##frame-ip##/$frameip/;
s/##frame-DLCI##/$framedlci/;
printf OFILE $_;
}
}
The CSV file looks like this:
Toronto, Router1, 172.25.15.1, 172.25.16.6,101, 172.25.100.1
And this is the rtr-template.txt file:
!
version 12.1
service timestamps debug datetime msec
service timestamps log datetime msec
service password-encryption
!
hostname ##rtrname##
!
enable password cisco
enable secret cisco
!
interface Loopback0
ip address ##loop0-ip## 255.255.255.255
!
interface Serial0/0
description Frame-Relay Circuit
no ip address
encapsulation frame-relay
ip route-cache policy
frame-relay lmi-type ansi
no shutdown
!
interface Serial0/0.1 point-to-point
ip address ##frame-ip## 255.255.255.252
frame-relay interface-dlci ##frame-DLCI##
!
interface FastEthernet0/1
description User LAN Segment
ip address ##eth0-ip## 255.255.255.0
no shutdown
!
router eigrp 99
network 172.25.0.0
!
snmp-server location ##location##
!
line con 0
password cisco
login
transport input none
line aux 0
password cisco
login
line vty 0 4
password cisco
login
transport input telnet
!
end
The main problem is that you're running your program by double-clicking on its file name in Windows Explorer.
The way <> works is that it will read from any files that you specify on the command line (that appear in the @ARGV array) or, if that array is empty, then it will read from STDIN, which is usually the keyboard.
Double-clicking the file gives it no command-line parameters, so it waits for you to type input in the black window that appears. That means you've entered <RTR-DATA.CSV as input to your while loop and Perl has tried to split it on commas, giving only a single field, so it sets $location to <RTR-DATA.CSV. Not what you wanted!
So, if you run your program from the cmd window by entering
create-configs.pl RTR-DATA.CSV
then within the program @ARGV will contain RTR-DATA.CSV and the <> will automatically read from that file.
Here are some further notes on your code
There is no need for the #! line on a Windows system, which will normally have the .pl file extension tied to the perl executable
You must always use strict and use warnings at the top of every Perl program you write, and declare all your variables at their first point of use. That would have given some very strong clues about the nature of your problem
You should normally chomp each line read from a file, as it will have a newline character at the end that can cause problems if you leave it at the end of the last field returned by split
In this case you should also probably add optional whitespace either side of the comma in the pattern you are splitting on, so as to remove leading and trailing spaces from the fields it returns
You should always use lexical file handles ($out_fh instead of OFILE) with the three-parameter form of open
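To see what the chomp and the whitespace-tolerant split buy you, here is the sample CSV line from the question split both ways. This is a small standalone sketch.

```perl
use strict;
use warnings;

my $line = "Toronto, Router1, 172.25.15.1, 172.25.16.6,101, 172.25.100.1\n";

# Naive: split on bare commas without chomp
my @naive = split /,/, $line;

# As recommended: chomp first, then allow whitespace around each comma
chomp(my $clean = $line);
my @fields = split /\s*,\s*/, $clean;

printf "naive:  [%s]\n", join '|', map { s/\n/\\n/r } @naive;
printf "fields: [%s]\n", join '|', @fields;
```

The naive version leaves a leading space on most fields and a trailing newline on the last one, which would end up inside the generated config (and, for the name field, inside the output file name).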
And here's a rewrite of your code that takes into account all of those points. I hope it helps
use strict;
use warnings;
my $template_file = 'rtr-template.txt';
while ( <> ) {
chomp;
my ($location, $name, $lo0ip, $frameip, $framedlci, $eth0ip) = split /\s*,\s*/;
open my $t_fh, '<', $template_file
or die qq{Unable to open "$template_file" for input: $!};
my $out_file = "$name.txt";
open my $out_fh, '>', $out_file
or die qq{Unable to open "$out_file" for output: $!};
while (<$t_fh>) {
s/##location##/$location/g;
s/##rtrname##/$name/g;
s/##eth0-ip##/$eth0ip/g;
s/##loop0-ip##/$lo0ip/g;
s/##frame-ip##/$frameip/g;
s/##frame-DLCI##/$framedlci/g;
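print $out_fh $_; # print, not printf: a "%" in the template would be taken as a format directive
}
close $out_fh;
close $t_fh;
}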
Use Text::CSV to parse the CSV file and Template Toolkit or similar to do the templating. Don't reinvent the wheel.
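If pulling in a templating module isn't an option, the chain of s/// lines can at least be driven by a hash, so adding a placeholder means adding one hash entry. This is a swapped-in minimal technique, not Template Toolkit. `fill_template` is a hypothetical helper. Unknown placeholders are deliberately left in place so they are easy to spot in the output.

```perl
use strict;
use warnings;

# Placeholder name => value for one CSV row (values from the sample data).
my %vars = (
    'location'   => 'Toronto',
    'rtrname'    => 'Router1',
    'loop0-ip'   => '172.25.15.1',
    'frame-ip'   => '172.25.16.6',
    'frame-DLCI' => '101',
    'eth0-ip'    => '172.25.100.1',
);

# Hypothetical helper: replace every ##name## whose name is in %$vars;
# unknown placeholders are left untouched.
sub fill_template {
    my ($text, $vars) = @_;
    $text =~ s{##([\w-]+)##}{ exists $vars->{$1} ? $vars->{$1} : "##$1##" }ge;
    return $text;
}

my $out = fill_template("hostname ##rtrname##\n ip address ##loop0-ip## 255.255.255.255\n", \%vars);
print $out;
```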

How to convert Unicode file to ASCII file in perl script on windows machine

I have a file in Unicode format on a Windows machine. Is there any way to convert it to ASCII format on a Windows machine using a Perl script?
It's UTF-16 with a BOM.
If you want to convert Unicode to ASCII, you must be aware that some characters can't be converted, because they just don't exist in ASCII.
If you can live with that, you can try this:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use open IN => ':encoding(UTF-16)';
use open OUT => ':encoding(ascii)';
my $buffer;
open(my $ifh, '<', 'utf16bom.txt');
read($ifh, $buffer, -s $ifh);
close($ifh);
open(my $ofh, '>', 'ascii.txt');
print($ofh $buffer);
close($ofh);
If you do not have autodie, just remove that line; you should then change your open/close statements to something like
open(...) or die "error: $!\n";
If you have characters that can't be converted, you will get warnings on the console and your output file will have e.g. text like
\x{00e4}\x{00f6}\x{00fc}\x{00df}
in it.
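The same conversion can be done on a string with Encode directly, which is handy for checking what the output will look like. The sketch below builds a small UTF-16LE sample with a BOM in memory; the sample text is made up. It uses FB_PERLQQ, which produces the same kind of \x{....} escapes described above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample input: "Hi ä" as UTF-16LE with a BOM (FF FE), built in memory
my $utf16_bytes = "\xFF\xFE" . encode('UTF-16LE', "Hi \x{00E4}");

# 'UTF-16' (no endianness suffix) reads the BOM to pick the byte order
# and strips it from the result.
my $text = decode('UTF-16', $utf16_bytes);

# FB_PERLQQ replaces characters that ASCII cannot represent with \x{....}
my $ascii = encode('ascii', $text, Encode::FB_PERLQQ);
print "$ascii\n";
```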
BTW: If you don't have a BOM but know the file is Big Endian (or Little Endian), you can change the encoding line to
use open IN => ':encoding(UTF-16BE)';
or
use open IN => ':encoding(UTF-16LE)';
Hope it works under Windows as well. I can't give it a try right now.
Take a look at the encoding option of Perl's open function. You can specify the encoding when opening a file for reading or writing.
Something like this should work:
#! /usr/bin/env perl
use strict;
use warnings;
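use autodie;
# UTF-16 (no BE/LE suffix) uses the BOM to determine the byte order and strips it
open (my $utf16_fh, "<:encoding(UTF-16)", "test.utf16.txt");
# Characters with no ASCII equivalent are written as \x{....} escapes, with a warning
open (my $ascii_fh, ">:encoding(ASCII)", "test.ascii.txt");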
while (my $line = <$utf16_fh>) {
print $ascii_fh $line;
}
close $utf16_fh;
close $ascii_fh;

How do I write a file whose *filename* contains utf8 characters in Perl?

I am struggling creating a file that contains non-ascii characters.
The following script works fine if it is called with 0 as a parameter, but dies when called with 1.
The error message is open: Invalid argument at C:\temp\filename.pl line 15.
The script is started within cmd.exe.
I expect it to write a file whose name is either (depending on the parameter) äöü.txt or äöü☺.txt. But I fail to create the file name containing a smiley.
use warnings;
use strict;
use Encode 'encode';
# Text is stored in utf8 within *this* file.
use utf8;
my $with_smiley = $ARGV[0];
my $filename = 'äöü' .
($with_smiley ? '☺' : '' ).
'.txt';
open (my $fh, '>', encode('cp1252', $filename)) or die "open: $!";
print $fh "Filename: $filename\n";
close $fh;
I am probably missing something that is obvious to others, but I can't find, so I'd appreciate any pointer towards solving this.
First of all, saying "UTF-8 character" is weird. UTF-8 can encode any Unicode character, so the UTF-8 character set is the Unicode character set. That means you want to create a file whose name contains Unicode characters, and more specifically, Unicode characters that aren't in cp1252.
I've answered this on PerlMonks in the past. Answer copied below.
Perl treats file names as opaque strings of bytes. That means that file names need to be encoded as per your "locale"'s encoding (ANSI code page).
In Windows, code page 1252 is commonly used, and thus the encoding is usually cp1252.* However, cp1252 doesn't support Tamil and Hindi characters [or "☺"].
Windows also provides a "Unicode" aka "Wide" interface, but Perl doesn't provide access to it using builtins**. You can use Win32API::File's CreateFileW, though. IIRC, you still need to encode the file name yourself. If so, you'd use UTF-16le as the encoding.
Win32::Unicode appears to handle some of the dirty work of using Win32API::File for you, so I'd recommend starting with that.
* — The code page is returned (as a number) by the GetACP system call. Prepend "cp" to get the encoding.
** — Perl's support for Windows sucks in some respects.
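One detail is worth spelling out. Called without a CHECK argument, encode('cp1252', ...) does not fail on "☺"; it silently replaces it with "?". Since "?" is not allowed in Windows file names, that is the likely source of the "Invalid argument" error. The sketch below shows the substitution and how FB_CROAK surfaces the problem before open is ever called.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $filename = "\x{E4}\x{F6}\x{FC}\x{263A}.txt";   # "äöü☺.txt"

# Default CHECK: characters missing from cp1252 are silently replaced
my $bytes = encode('cp1252', $filename);
print $bytes =~ /\?/ ? "contains '?', which Windows forbids in file names\n"
                     : "no substitution\n";

# Stricter: FB_CROAK makes the problem visible up front
my $ok = eval { encode('cp1252', $filename, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 };
print $ok ? "encodable\n" : "not encodable in cp1252: $@";
```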
The following runs on Windows 7, ActiveState Perl. It writes "hello there" to a file with Hebrew characters in its name:
#-----------------------------------------------------------------------
# Unicode file names on Windows using Perl
# Philip R Brenan at gmail dot com, Appa Apps Ltd, 2013
#-----------------------------------------------------------------------
use feature ":5.16";
use Encode qw(encode);
use Win32API::File qw(:ALL);
# Create a file with a Unicode name (six Hebrew letters followed by ".data")
my $e = "\x{05E7}\x{05EA}\x{05E7}\x{05D5}\x{05D5}\x{05D4}".
"\x{002E}\x{0064}\x{0061}\x{0074}\x{0061}"; # File name as a string of Unicode code points
my $f = encode("UTF-16LE", $e); # Format used by the Wide API; encode() already returns plain bytes
$f .= chr(0).chr(0); # NUL-terminate the wide string (two zero bytes)
my $F = Win32API::File::CreateFileW
($f, GENERIC_WRITE, 0, [], OPEN_ALWAYS, 0, 0) # Create the file via the Wide API
or die "CreateFileW failed: $^E";
# Wrap the native handle in a Perl file handle and write to it
OsFHandleOpen(FILE, $F, "w") or die "Cannot open file";
binmode FILE;
print FILE "hello there\n";
close(FILE);
There is no need to encode the file name (at least not on Linux). This code works on my Linux system:
use warnings;
use strict;
# Text is stored in utf8 within *this* file.
use utf8;
my $with_smiley = $ARGV[0] || 0;
my $filename = 'äöü' .
($with_smiley ? '☺' : '' ).
'.txt';
open my $fh, '>', $filename or die "open: $!";
binmode $fh, ':utf8';
print $fh "Filename: $filename\n";
close $fh;
HTH, Paul
