Problems in Perl script with localized user name on Windows

I have a Perl script that starts a script file with its default program:
system("Start C:\\Temp\\test.jsx");
It works fine with English user names, but when I change the user name to ai𥹖Ц中 it doesn't work.
Also, no error message appears, so I'm not able to debug.

perl on Windows uses the so-called ANSI functions to interface with the outside world. That means that if you use interesting characters (for example, certain Turkish letters on a US-English Windows install), perl cannot see them. As I wrote on my blog:
You can't pass characters that are outside of the Windows code page to perl on the command line. It doesn't matter whether you have set the code page to 65001 and use the -CA command line argument to perl: Because perl uses main instead of wmain as the entry point, it never sees anything other than characters in the ANSI code page.
For example:
$ chcp 65001
$ perl -CAS -E "say for @ARGV" şey
sey
That's because ş does not appear in CP 437, which is what my laptop is using. By the time it reaches the internals of perl, it has already become s.
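As a quick sanity check, you can ask Windows from within perl which code pages are in effect. A minimal sketch using the Win32 module bundled with Strawberry Perl (GetACP reports the ANSI code page, GetOEMCP the console/OEM one, such as the CP 437 above):
use strict;
use warnings;
use Win32;

# Report the code pages this perl is living in. On a US-English
# install this typically prints ANSI 1252 and OEM 437.
printf "ANSI code page: %d\n", Win32::GetACP();
printf "OEM code page:  %d\n", Win32::GetOEMCP();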
So, there is not much you can do with a stock perl. I was working on a set of patches, but things intervened. I may still get around to it this summer.
Now, in your case, you are passing "interesting characters" to an external program via system. The same problem applies: because perl uses the ANSI versions of the functions that spawn processes, the spawned program will not see a Unicode environment. So, if you are trying to use Korean or Japanese programs with a system code page that does not include those characters, I am not sure what will happen.
There is not much you can do once perl is running. The environment, command line arguments, everything lives in the ANSI world from that point on. There may be funky work-arounds, but for that one would need to know exactly how 'ai𥹖Ц中' gets from your perl program to the external program.
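One workaround worth trying, assuming it is the file path rather than the command line or environment that carries the non-ANSI characters: ask Windows for an ANSI-representable alias of the path before handing it to system. The sketch below is untested against your setup; Win32::GetANSIPathName is documented in the Win32 module bundled with Strawberry Perl, the path is hypothetical, and the file must already exist for an alias to be found.
use strict;
use warnings;
use utf8;
use Win32;

# Hypothetical path under a user profile whose name lies outside
# the ANSI code page.
my $path = "C:\\Users\\ai𥹖Ц中\\Temp\\test.jsx";

# GetANSIPathName returns a name representable in the system code
# page (usually the 8.3 short alias), which the ANSI spawning
# functions perl uses can still resolve.
my $ansi = Win32::GetANSIPathName($path);

# start expects a (possibly empty) window title as its first quoted
# argument, hence the "".
system(qq{start "" "$ansi"}) == 0
    or warn "start failed: $?\n";
Whether this helps depends on where exactly the name gets mangled on its way to the external program, so treat it as an experiment rather than a fix.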

Related

Detect prompt with xterm.js

TL;DR: I want to know how to detect, from the output of a shell (e.g. zsh, bash), the location of the prompts (e.g. user#machine /etc %).
Details
I have made a working shell frontend in the browser based on xterm.js. It is now feature-equivalent to e.g. the default macOS terminal application with zsh, bash and powershell. In a nutshell, it works by executing a shell process (e.g. zsh) as the child of a parent process that pipes the input/output from/to the browser via web sockets.
I now want to step up and implement a "collapse" functionality that hides the output of the selected commands in the history (like Visual Studio Code does now).
To this end, I need to detect the location of the prompts from the terminal output: the collapse function would then hide the characters between two consecutive prompts.
I know I can use the approaches below:
detect the prompt with a regular expression (I would need to parse the PS1 variable)
inject some special character sequence before and after the prompt (e.g. in variable PS1)
But neither seems very robust, and they may not work with some specific command interpreters. I have not yet found where this functionality is implemented in the Visual Studio Code source code.
My question is: is there a robust way to achieve this functionality for at least zsh, bash and powershell? (It is fine if it is specific to xterm.js.)
Edit 1
This SO question is related: ANSI escape sequence for collapsing/folding text (maybe hierarchically)
It links to this interesting thread: https://github.com/PerBothner/DomTerm/issues/54
It appears that DomTerm uses escape sequences at folding points (my solution 2).
Yet I don't see how to inject them into the terminal, besides hacking the PS1 env var.
Edit 2
While reading iTerm's documentation I found out that it takes advantage of the hooks provided by the shell (e.g. for zsh) in order to print some special escape sequence at various locations, including before showing the prompt.
For example, in zsh, I can print the string "🐮" before each prompt by executing precmd() { echo '🐮' }. Then when I execute e.g. ls I get
$> ls
[...]
🐮
$>
There is a more extensive explanation of the various available hooks for various shells here.
It looks like PowerShell uses a very different system though.
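For what it's worth, the conventional marker here is the FinalTerm/iTerm2 escape sequence OSC 133: shell-integration scripts make the shell emit ESC ] 133 ; A BEL just before drawing the prompt (in zsh, roughly precmd() { printf '\e]133;A\a' }), and VS Code's shell integration builds on the same idea. The relay process between the pty and the websocket can then scan the output stream for that marker. Below is a minimal sketch of such a scan, written in Perl for brevity; the same logic would live in your JavaScript middle layer, and the sample chunk is made up.
use strict;
use warnings;

# OSC 133;A terminated by BEL or by ST (ESC \) marks a prompt start.
my $OSC_PROMPT = qr/\e\]133;A(?:\x07|\e\\)/;

sub find_prompts {
    my ($output) = @_;
    my @offsets;
    while ($output =~ /$OSC_PROMPT/g) {
        push @offsets, pos($output);   # offset just past the marker
    }
    return @offsets;
}

# Hypothetical captured output with two prompts around an `ls`.
my $chunk = "\e]133;A\x07\$> ls\r\nfile1 file2\r\n\e]133;A\x07\$> ";
printf "prompt at offset %d\n", $_ for find_prompts($chunk);
The collapse feature would then hide everything between two consecutive offsets.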

How to prevent \n from being translated to \r\n on Windows

I am using Windows 10 and Strawberry Perl.
It is well known that the line terminator is \n on Linux and \r\n on Windows.
I found that, on my computer, files with Linux line endings are automatically converted to the Windows type (\r\n) after a replacement operation like
perl -i.bak -pe "s/aaa/bbb/g" test.txt
But this is not what I want, and it seems unreasonable. Is this a Strawberry Perl issue, or is something else at work?
How can I leave the line terminator unaffected on Windows?
This is standard behavior of Perl on Windows: it converts \n to \r\n on output.
You can get around it by using binmode, which prevents Perl from doing the automatic line-ending conversion.
Your command would then change to the one below. It calls binmode on STDOUT, and that output has to be redirected to another file, so the following does what you want (though not in place):
perl -pe "BEGIN{ binmode(STDOUT) } s/aaa/bbb/g" test.txt > newtest.txt
"Actually I set unix format as notepad++ default which is my main editor" I think you should make the effort the keep files with the correct line endings for the appropriate system. You won't make any friends if you keep Linux files everywhere, as it will make it very hard for others to work with your non-standard methodology
It isn't very hard to work with both systems properly, as all you have to do is make the change automatically when copying from one system to another. You can use dos2unix and unix2dos when making the copy, but it would also be a simple job to write a Perl program that updates all of your systems with the relevant version of the text files.
However, if you insist on this plan, this should help you to achieve it
By default, when running on Windows, perl will use the IO layers :unix and :crlf, which means it works the same as on a Linux system but will translate CRLF to LF on input, and LF to CRLF on output
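You can check which layers are actually in effect with the core PerlIO::get_layers function; on Windows you would expect to see crlf on STDOUT (the exact list can vary between perl builds):
use strict;
use warnings;

# List the IO layers on STDOUT; typically "unix crlf" on Windows
# and "unix perlio" on Linux.
print join(' ', PerlIO::get_layers(*STDOUT)), "\n";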
You can make individual open calls behave differently by adding an explicit pseudo-layer :raw, which removes the :crlf layer. But if you want to modify the special file handles STDIN, STDOUT and ARGV then you need a different tactic, because those handles are opened for you by perl
You can use the open pragma at the top of your program, like this
use open IO => ':raw';
which will implicitly apply the :raw layer to every input or output file handle, including the special handles. You can set this from the command line by using
perl -Mopen=IO,raw program.pl
Or you can set the PERLIO environment variable
set PERLIO=raw
which will affect every program run henceforth from the same cmd window
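Putting this together with the original one-liner: since the open pragma also reaches the ARGV and ARGVOUT handles that -i uses (they are opened after the pragma takes effect), an in-place replacement that leaves LF endings alone should look like the untested sketch below. Verify it on a copy of the file first:
perl -Mopen=IO,raw -i.bak -pe "s/aaa/bbb/g" test.txt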

What does #!perl do exactly?

I recently received a perl script with the first line
#!perl
This of course doesn't work, but I would like to know exactly what it does. Can anyone help?
That is called a shebang and is used (in Unix) to specify which interpreter binary should be used to run a script.
It's a very nice mechanism, especially together with the way the file system permissions can be used to turn a script file into something the shell (and program loader) considers to be executable.
It seems the interpreter name must be absolute. The linked text says that a relative name (like the bare perl here) will be interpreted as ./perl, so it might work if executed from the directory the perl binary is in. That's not a very common use case, but at least it could work if used that way: if you want to wrap a perl binary with a script, you want that script to run the binary that sits in the same place as the script, not use an absolute path to pick some other binary. I haven't tested this.
A more typical approach (at least in Linux) is to use the env program to pick the perl:
#!/usr/bin/env perl
If you give the shebang line like this,
#!perl
it will look for the perl interpreter in the current directory. If a perl interpreter exists there, the script will start to execute; otherwise you get a "bad interpreter" error.
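One detail worth adding: when you start the script explicitly as perl script.pl, perl itself re-scans the #! line, and if that line contains the word "perl", any switches on it are honored (see perlrun). So a bare #!perl line still serves as a place to hang flags. A small sketch:
#!perl -w
# Run as:  perl script.pl
# perl parses the -w off the #! line itself, so warnings are on even
# though the OS never consulted this line.
my $x;
print $x + 1, "\n";   # warns: Use of uninitialized value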

Why does perl's directory test -d " " return true on Windows? Bug or not?

Is there any explanation for perl returning true when testing a space-string dir on Windows?
Run on Windows7:
perl -e "print qq{found\n} if -d qq{ }"
You will get output: found
But same perl code returns false on Linux.
Tested on perl 5.8 and strawberry perl 5.18 on Windows
Is it a bug, or is there some unconventional reasoning behind it?
Under Windows, any perl operation that internally tries to test a file or directory for existence uses the Win32 function CreateFile. Under Windows a filename ending in spaces is not legal (although this is not clearly documented), and for some strange reason CreateFile internally strips all trailing spaces before trying to open the file/directory.
Since a name starting with a space looks like a relative path, the space is first appended to your current working directory but then internally ignored by the Win32 function. This results in the directory test seeing your current directory and reporting success.
The perl stat function then proceeds to acquire additional information and seems to handle the trailing space(s) differently somewhere down the line, and therefore fails to get any further information. However, it seems to explicitly leave the mode attribute of the stat result set, because it had earlier deduced that a directory existed.
So yes, I would call this a bug in the Windows port of perl, but fixing it is probably not as easy as it sounds: there are lots of special cases for UNC paths, reparse points, NTFS/FAT filesystems etc. that make correct handling of trailing spaces rather tricky.
Your best bet would probably be to explicitly strip trailing spaces from any assumed directory name at a very early point (who needs them anyway) and croak if nothing is left after stripping.
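A minimal sketch of that early check (the helper name is mine):
use strict;
use warnings;
use Carp;

sub dir_exists {
    my ($name) = @_;
    $name =~ s/\s+\z//;    # drop trailing whitespace up front
    croak "empty directory name" unless length $name;
    return -d $name;
}

print dir_exists("C:\\Temp") ? "found\n" : "not found\n";
print dir_exists(" ")        ? "found\n" : "not found\n";   # croaks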

what is the use of "#!/usr/local/bin/ruby -w" at the start of a ruby program

What is the use of writing the following line at the start of a Ruby program?
#!/usr/local/bin/ruby -w
Is it an OS-specific command? Is it valid for Ruby on Windows? If not, what is the equivalent command on Windows?
It is called a Shebang. It tells the program loader what command to use to execute the file. So when you run ./myscript.rb, it actually translates to /usr/local/bin/ruby -w ./myscript.rb.
Windows uses file associations for the same purpose; the shebang line has no effect (edit: see FMc's answer) but causes no harm either.
A portable way (working, say, under Cygwin and RVM) would be:
#!/usr/bin/env ruby
This will use the env command to figure out where the Ruby interpreter is, and run it.
Edit: apparently Cygwin in particular will misbehave with /usr/bin/env ruby -w and try to look up ruby -w instead of ruby. You might want to move the effect of -w into the script itself.
The Shebang line is optional, and if you run the ruby interpreter and pass the script to it as a command line argument, then the flags you set on the command line are the flags ruby runs with.
A Shebang line is not ruby at all (unless you want to call it a ruby comment). It's really shell scripting. Most Linux and Unix users are running the BASH shell (which stands for Bourne Again SHell), but pretty much every OS has a command interpreter that will honor the Shebang.
"#!/usr/local/bin/ruby -w"
The "she" part is the octothorp (#), aka pound sign, number sign, hash mark, and now hash tag (I still call it tic-tac-toe just cuz).
The "bang" part is the exclaimation mark (!), and it's like banging your fist on the table to exclaim the command.
On Windows, the "shell" is the command prompt, but even without a black DOS window, the command interpreter will run the script based on file associations. It doesn't really matter whether the command interpreter or the programming language is reading the shebang and making sure the flags are honored; the important point is that they are honored.
The "-w" is a flag. Basically it's an instruction for ruby to follow when it runs the script. In this case "-w" turns on warnings, so you'll get extra warnings (script keeps running) or errors (script stops running) during the execution of the script. Warnings and exceptions can be caught and acted upon during the program. These help programmers find problems that lead to unexpected behavior.
I'm a fan of quick and dirty scripts to get a job done, so no -w. I'm also a fan of high-quality reusable code, so definitely use -w. The right tool for the right job. If you're learning, then always use -w. When you know what you're doing and stop using -w on quick tasks, you'll start to figure out when it would have helped to use -w instead of spending hours troubleshooting. (Hint: when the cause of a problem isn't obvious, just add -w and run it to see what you get.)
"-w" requires some extra coding to make it clear to ruby what you mean, so it doesn't immediately solve things, but if you already write code with -w, then you won't have much trouble adding the necessary bits to make a small script run with warnings. In fact, if you're used to using -w, you're probably already writing code that way and -w won't change anything unless you've forgotten something. Ruby requires far less "plumbing code" then most (maybe all) compiled languages like C++, so choosing to not use -w doesn't allow you to save much typing, it just lets you think less before you try running the script (IMHO).
-v is verbose mode and does NOT change the running of the script (no warnings are raised, and the script does not stop in new places). Several sites and discussions call -w verbose mode, but -w is warning mode, and it changes the execution of the script.
Although the execution behavior of a shebang line does not translate directly to the Windows world, the flags included on that line (for example the -w in your question) do affect the running Ruby script.
Example 1 on a Windows machine:
#!/usr/local/bin/ruby -w
puts $VERBOSE # true
Example 2 on a Windows machine:
#!/usr/local/bin/ruby
puts $VERBOSE # false
