I launch powershell with CreateProcess (Win32) and read the raw bytes it writes.
I see that it produces a lot of invisible characters.
For example \u{1b}[2J\u{1b}[m\u{1b}[
Is there any way to stop it?
*Of course it's possible to strip them manually, but I hope there is another way.
You mention powershell (powershell.exe), i.e. the CLI of Windows PowerShell.
Windows PowerShell (unlike PowerShell (Core) 7+, see below) itself does not use coloring / formatting based on VT / ANSI escape sequences.
The implication is that third-party code is producing the VT sequences in your case, so you must deactivate (or reconfigure) it to avoid such sequences in the output.
A prime candidate is a custom prompt function; such functions often involve coloring for a better command-line experience.
In programmatic use of powershell.exe, however, you would only see what the prompt function prints if you feed PowerShell commands to the CLI's stdin, either with the argument -File - (which instructs the CLI to read commands from stdin) or with no such argument, in which case reading from stdin is the default.
To exclude the prompt-function output from the output altogether, use -Command -, as discussed in the answer to your previous question.
If you do want it, but want to use the default prompt string, suppress $PROFILE loading with the -NoProfile parameter, which is generally preferable in programmatic processing.
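As a rough sketch of the programmatic use described above, the following Python snippet assembles a powershell.exe argument list with -NoProfile and -Command - and feeds commands through stdin. The helper names build_powershell_argv and run_commands, and the sample Get-Date command, are illustrative assumptions; nothing is launched unless powershell.exe is found on PATH.

```python
import shutil
import subprocess


def build_powershell_argv(exe="powershell.exe"):
    """Return an argument list that skips $PROFILE and reads commands from stdin."""
    # -NoProfile: do not load $PROFILE, so no custom prompt function is defined.
    # -Command -: read commands from stdin and keep prompt output out of stdout.
    return [exe, "-NoProfile", "-Command", "-"]


def run_commands(script, exe="powershell.exe"):
    """Feed `script` to PowerShell via stdin and return its raw stdout bytes."""
    completed = subprocess.run(
        build_powershell_argv(exe),
        input=script.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=False,
    )
    return completed.stdout


if __name__ == "__main__":
    # Only attempt the launch where Windows PowerShell is actually available.
    if shutil.which("powershell.exe"):
        print(run_commands("Get-Date\n"))
```

The same argument list could be joined into the command line passed to CreateProcess; the Python wrapper is only there to keep the sketch short.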
Controlling use of colored output (VT / ANSI escape sequences) in PowerShell (Core) 7.2+
In PowerShell (Core) 7+ (pwsh.exe) - but not in Windows PowerShell (powershell.exe) - PowerShell itself situationally uses VT (ANSI) escape sequences to produce formatted/colored output, such as in the output of Select-String and, in v7.2+, in formatted output in general, notably column headers in tabular output / property names in list output.
In PowerShell 7.2+ you can categorically suppress these as follows:
Note: Categorically disabling VT (ANSI) sequences is generally not necessary, because PowerShell automatically suppresses them when output is not sent to the host (display); that is, $PSStyle.OutputRendering defaults to Host[1].
This amounts to the same behavior that many Unix utilities (sensibly) exhibit: coloring is by default only applied when printing to the display (terminal), not when piping to another command or redirecting to a file.
However, note that $PSStyle.OutputRendering only applies to objects that are formatted by PowerShell's for-display formatting system and only when such formatted representations are converted to string data, either explicitly with Out-String, or implicitly with > / Out-File or when piping to an external program.[2]
From outside PowerShell, before launching it:
By defining the NO_COLOR environment variable, with any value, such as 1.
This causes PowerShell to set $PSStyle.OutputRendering to PlainText on startup, which instructs PowerShell not to use VT / ANSI escape sequences.
A growing number of external programs also respect this env. variable - see no-color.org
Alternatively, by setting the TERM environment variable to xtermm / xterm-mono:
Note that value dumb, at least as of PowerShell Core 7.2.0-preview.9 (the most recent version as of this writing), doesn't work: while it does cause $host.UI.SupportsVirtualTerminal to then reflect $false, as documented, $PSStyle.OutputRendering remains at its default, Host, and actual formatted output (such as from Get-Item /) still uses colors.
However, setting TERM may be more effective than NO_COLOR in getting external programs not to emit VT sequences, though ultimately there is no guarantee either way.
That said, modifying TERM, especially setting it to value dumb, is best avoided in general, because external programs (on Unix-like platforms) may rely on the TERM variable for inferring (also non-color-related) capabilities of the hosting terminal application (which setting the variable to dumb takes away altogether and - at least hypothetically - value xtermm / xterm-mono may misrepresent the true fundamental terminal type).
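To illustrate the "from outside" approach, here is a minimal Python sketch that defines NO_COLOR in a copy of the environment and passes that copy to a child process. The helper names no_color_env and child_sees_no_color are hypothetical, and the Python interpreter stands in for pwsh so that the sketch runs anywhere; with PowerShell 7.2+ you would launch pwsh with the same env argument instead.

```python
import os
import subprocess
import sys


def no_color_env(base=None):
    """Return a copy of the environment with NO_COLOR defined."""
    env = dict(os.environ if base is None else base)
    env["NO_COLOR"] = "1"  # the value itself is arbitrary, e.g. 1
    return env


def child_sees_no_color():
    """Launch a child process and report the NO_COLOR value it inherited."""
    # Stand-in child: a Python one-liner instead of pwsh (an assumption).
    probe = "import os; print(os.environ.get('NO_COLOR', ''))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=no_color_env(),
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_sees_no_color())
```

Copying the environment rather than modifying os.environ keeps the change local to the child, which matters if the parent process itself still wants colored output.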
From inside PowerShell:
With $PSStyle.OutputRendering = 'PlainText'
Note that this alone - unlike the environment variables discussed above - does not affect the behavior of external programs.
Note that third-party PowerShell code that uses VT sequences, especially if it predates PowerShell (Core) 7+, may not respect any of the standard mechanisms described above for disabling them (though may conceivably offer a custom mechanism).
[1] This applies since v7.2.0-preview.9, where Host was made the default and the previous default, Automatic, was removed altogether. In preview versions of v7.3.0, Ansi was temporarily the default, but since the official v7.3.0 release the (sensible) default is again Host.
[2] Notably, this means that string data that has embedded ANSI / VT escape sequences is not subject to $PSStyle.OutputRendering in v7.3+ (a change from v7.2), because strings aren't handled by the formatting system (they print as-is).
You can disable ANSI output rendering by setting the environment variable TERM to dumb:
SetEnvironmentVariable(TEXT("TERM"), TEXT("dumb"));
// proceed with your call to CreateProcess
Related
I am using Python's Paramiko library to SSH a remote machine and fetch some output from command-line. I see a lot of junk printing along with the actual output. How to get rid of this?
chan1.send("ls\n")
output = chan1.recv(1024).decode("utf-8")
print(output)
[u'Last login: Wed Oct 21 18:08:53 2015 from 172.16.200.77\r', u'\x1b[2J\x1b[1;1H[local]cli#BENU>enable', u'[local]cli#BENU#Configure',
I want to eliminate \x1b[2J\x1b[1;1H and the u prefix from the output. They are junk.
It's not junk. These are ANSI escape codes that are normally interpreted by a terminal client to pretty-print the output.
If the server is correctly configured, you get these only when you use an interactive terminal, that is, when you have requested a pseudo terminal for the session (which you should not do if you are automating the session).
Paramiko automatically requests a pseudo terminal if you use SSHClient.invoke_shell, as that method is intended for implementing an interactive terminal. See also How do I start a shell without terminal emulation in Python Paramiko?
If you are automating the execution of remote commands, you should use SSHClient.exec_command instead, which does not allocate a pseudo terminal by default (unless you override that with the get_pty=True argument).
stdin, stdout, stderr = client.exec_command('ls')
See also What is the difference between exec_command and send with invoke_shell() on Paramiko?
Or as a workaround, see How can I remove the ANSI escape sequences from a string in python.
Though that's rather a hack and might not be sufficient. You might have other problems with the interactive terminal, not only the escape sequences.
In particular, you are probably not interested in the "Last login" message or the command prompt (cli#BENU>) either. You do not get these with exec_command.
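For completeness, the stripping workaround mentioned above could look roughly like the following. This is a sketch, not a complete terminal parser: the pattern covers CSI sequences (ESC [ ... final byte) and simple two-character escapes, and the name strip_ansi is an illustrative choice. Other sequence types, such as OSC strings, would need extra handling.

```python
import re

# CSI sequences: ESC [ then parameter bytes, intermediate bytes, one final byte.
# Other two-character escapes: ESC followed by one byte from @ to _ (except [).
ANSI_ESCAPE = re.compile(r"\x1b(?:\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])")


def strip_ansi(text):
    """Remove the ANSI escape sequences matched by ANSI_ESCAPE from text."""
    return ANSI_ESCAPE.sub("", text)


if __name__ == "__main__":
    sample = "\x1b[2J\x1b[1;1H[local]cli#BENU>enable"
    print(strip_ansi(sample))
```

Note that this removes only the escape sequences; the login banner and prompt text remain, which is one reason exec_command is the cleaner route.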
If you need to use the "shell" channel due to some specific requirements or limitations of the server, note that it is technically possible to use the "shell" channel without the pseudo terminal. But Paramiko SSHClient.invoke_shell does not allow that. Instead, you can create the "shell" channel manually. See Can I call Channel.invoke_shell() without calling Channel.get_pty() beforehand, when NOT using Channel.exec_command().
And finally the u is not a part of the actual string value (note that it's outside the quotes). It's an indication that the string value is in the Unicode encoding. You want that!
This is actually not junk. The u before the string indicates that this is a unicode string. The \x1b[2J\x1b[1;1H is an escape sequence. I don't know exactly what it is supposed to do, but it appears to clear the screen when I print it out.
To see what I mean, try this code:
for string in output:
    print(string)
In the following code, the ü is not the single Unicode character U+00FC but is a single grapheme cluster composed of two Unicode characters, the plain ASCII u U+0075 followed by the combining diaeresis U+0308.
fmt.Println("Jürgen Džemal")
fmt.Println("Ju\u0308rgen \u01c5emel")
If I run it in the go playground, it works as expected.
If I run it in a MS Windows 10 "Command Prompt" window, it doesn't visually combine the combining character with the prior character.
However when I cut and paste the text into here it appears correctly:
C:\> ver
Microsoft Windows [Version 10.0.17134.228]
C:\> test
Jürgen Džemal
Jürgen Džemel
On screen, in the "Command Prompt" window it looked more like:
Ju¨rgen Džemel
Changing the code page (chcp) from 850 to 65001 made no difference. Changing fonts (Consolas, Courier etc) made no difference.
In the past I have experienced problems that were fundamentally because Microsoft require Windows programs to use a different API to output characters to STDOUT depending on whether STDOUT is attached to a console or to a file. I don't know if this is a different manifestation of the same issue.
Is there something I can do to make this Unicode grapheme-cluster appear correctly?
As eryksun and Peter commented,
The Windows console (conhost.exe) doesn't support combining codes. You'll have to first normalize to an equivalent string that uses precomposed characters.
you can use golang.org/x/text/unicode/norm to do the normalization (e.g. norm.NFC.String("Jürgen Džemal"))
I tried this
s := "Ju\u0308rgen \u01c5emel"
fmt.Println(s) // dieresis not combined with u by conhost.exe
s = norm.NFC.String(s)
fmt.Println(s) // shows correctly
And the output looked like this
or, for the visually impaired with fabulously sophisticated screen readers - a bit like this:
Ju¨rgen Džemel
Jürgen Džemel
Note that Unicode has four different normalised forms but NFC is the most used on the Internet in web-pages and is also appropriate for this situation.
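The same NFC normalization can be checked outside Go. The sketch below uses Python's standard unicodedata module purely as an illustration (the helper name to_nfc is hypothetical); it shows the decomposed u + U+0308 pair collapsing into the single precomposed U+00FC, while U+01C5 is left unchanged.

```python
import unicodedata


def to_nfc(text):
    """Normalize text to Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


decomposed = "Ju\u0308rgen \u01c5emel"  # plain u followed by combining diaeresis
composed = to_nfc(decomposed)

if __name__ == "__main__":
    # One code point fewer after composition: u + U+0308 becomes U+00FC.
    print(len(decomposed), len(composed))
```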
There are other methods in this package that may be more efficient or more useful.
I have read that some visible characters can only be represented in Unicode using combining characters; in other words, no precomposed character exists for them. A more thorough approach would be needed to handle those appropriately. The complications of Unicode (or, perhaps more accurately, of human languages and their typography) sometimes seem almost without end.
References
https://blog.golang.org/normalization
https://godoc.org/golang.org/x/text/unicode/norm
https://learn.microsoft.com/en-us/windows/desktop/intl/using-unicode-normalization-to-represent-strings
For example, several characters used in writing Lithuanian have double diacritics, and they have only decomposed forms. An example is lowercase U with macron and tilde ("ū̃", U+016b U+0303, where the first code point is a lowercase U with macron and the second is a combining tilde).
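One way to find text that will still contain combining marks after normalization (and so may still render badly in a console without combining support) is to normalize first and then look for leftover marks. The following Python sketch does that with the standard unicodedata module; the function name leftover_combining_marks is an illustrative assumption.

```python
import unicodedata


def leftover_combining_marks(text):
    """Return the combining marks that survive NFC normalization of text."""
    normalized = unicodedata.normalize("NFC", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return [ch for ch in normalized if unicodedata.combining(ch) != 0]


if __name__ == "__main__":
    # u + U+0308 has a precomposed form, so nothing is left over.
    print(leftover_combining_marks("u\u0308"))
    # U+016B U+0303 has no precomposed form, so U+0303 remains.
    print(leftover_combining_marks("\u016b\u0303"))
```

What to do with such leftovers (drop the mark, substitute a fallback character, or leave the text alone) depends on the application.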
I'm currently in the process of planning out a custom Vim-like editor. It's going to be written in C and I want it to be as portable as possible between as many types of systems as possible.
I'm aware of curses (ncurses, I suppose), the tput command, and how terminals use control sequences (Esc-[ and the CSI character) to change backgrounds, move the cursor, etc.
Of the options above, it seems like ncurses would be the most recommended way of printing for the editor. BUT ncurses also has a LOT of stuff that I would rather not use, and if it's reasonably feasible I'd rather make my own system. I'm not against using it, but I'd like to know the alternatives.
So, my question is: Is there any way to use control sequences in the vast majority of terminals without using a library? Whether through tput or another method?
Thanks!
tput(1) uses the terminfo(5) (or older termcap(5)) database, which provides the mapping from abstract commands such as move cursor to x,y to escape sequences for different terminals. When you run a command such as
$ tput cup 10 3 # move cursor to row/column 10/3
, the terminfo database is queried to find the correct string for your terminal, which is then simply written to stdout. To find the available commands (e.g., cup), look at the cap-name column in terminfo(5). tput determines what terminal you are using by looking at the TERM environment variable.
(This means that you can check what escape characters are being generated by simply doing $ tput [command] > [file] and opening [file] in some editor that can show control characters, which can be handy for exploration. The infocmp(1) command can also be used for this.)
If you use tput (or the underlying tputs(3)), your program is hence automatically portable to different terminals. This is what Vim uses by the way.
However -- in the modern world, pretty much all terminals (or terminal emulators rather) use ANSI escape codes, along with some extensions (see XTerm Control Sequences). I believe the escapes supported by xterm and their behavior have become something of a de facto standard at this point, with other terminal emulators simply copying xterm's behavior. Some text-based UI libraries like termbox seem to do away with support for non-ANSI terminals altogether, and output ANSI escapes directly.
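To make the "output ANSI escapes directly" option concrete, here is a small Python sketch that builds two common sequences by hand. The function names cup and clear_screen are hypothetical, and the sketch assumes an ANSI/xterm-compatible terminal; on such a terminal, cup(10, 3) should produce the same bytes as tput cup 10 3, because tput takes zero-based coordinates while the ANSI sequence is one-based.

```python
import sys

CSI = "\x1b["  # Control Sequence Introducer: ESC followed by [


def cup(row, col):
    """Cursor-position sequence for a zero-based row and column."""
    # The ANSI CUP sequence is one-based, so both coordinates are incremented.
    return f"{CSI}{row + 1};{col + 1}H"


def clear_screen():
    """Sequence that erases the entire display."""
    return f"{CSI}2J"


if __name__ == "__main__":
    # Only emit escapes when stdout is a terminal, not a pipe or a file.
    if sys.stdout.isatty():
        sys.stdout.write(clear_screen() + cup(10, 3) + "hello\n")
```

The same strings could be written from C with plain fputs; the trade-off is that hard-coded sequences give up the terminfo lookup that makes tput portable to non-ANSI terminals.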
Besides the already-mentioned termbox, there's also S-Lang, which includes a terminal handling component. I believe those are the two most popular "ncurses replacements". I'd give ncurses some time first though.
I have been writing a new command line application in C++. One platform we support is, of course, Windows.
The Windows console, by default, uses the OEM code pages depending on the locale (for example, on my machine it is CP437 / DOS.Western). I think, if it was a Windows Cyrillic version, it would have been CP866, and so on. (These OEM code pages contain only 256 characters.)
I think what this means is the Windows console translates the input key strokes into characters based on the default code page. (And, depending on the currently selected fonts, if there is a corresponding glyph, it is displayed).
In such a case, does it make sense to use wmain/wchar_t and wide char types in my application?
Is there any advantage of using wide types? Or is there any grave problem if just char * is used?
When wide char types are used, what is the encoding of the command line arguments and environment strings (wchar_t * argv[] and wchar_t * envp[])? Are they converted to UTF-16 by the Windows CRT, or are they left untouched?
Thanks for your contributions.
You seem to be assuming that Windows internally works in the specified codepage. That's not true. Windows internally works in Unicode (UTF-16). For legacy software that uses char instead of wchar_t, input and output are translated into the specified codepage.
I think what this means is the Windows console translates the input key strokes into characters based on the default code page
This is not correct. The mapping of key strokes to (Unicode) characters is defined by the keyboard layout. This is totally independent of the code page. E.g you could use a Chinese keyboard layout on a system using a Cyrillic code page.
Not only does it make sense to use wchar_t, it is the recommended way.
Yes, there is an advantage: your program can process all characters supported by Windows. If you use char, you can't handle any characters that are not in the current code page.
They are not converted - they stay what they are, namely UTF-16 characters.
Unfortunately, the command prompt itself is an 'ANSI' application, so it suffers from all of the limitations of 'ANSI', and this affects your application if you use it from the command prompt. However, a console application can be used in other ways, without a command prompt window, and then it can support Unicode fully.
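The limitation of a single code page is easy to demonstrate. In the Python sketch below, the built-in cp437 and cp866 codecs stand in for the Windows OEM code pages (an assumption made for illustration, as is the helper name encodable): a Cyrillic letter fits in CP866 but not in CP437, while UTF-16 can represent it regardless of the code page.

```python
def encodable(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


letter = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE

if __name__ == "__main__":
    print(encodable(letter, "cp437"))      # Western OEM code page
    print(encodable(letter, "cp866"))      # Cyrillic OEM code page
    print(encodable(letter, "utf-16-le"))  # what wchar_t strings hold on Windows
```

A char-based program running under CP437 would see a substitute character in place of that letter, whereas a wchar_t-based program receives it intact.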
Does the Windows console support ANSI control characters?
It doesn't support many ANSI control characters by default (which is also mentioned in the wikipedia article http://en.wikipedia.org/wiki/ANSI_escape_code), but there are ways to make that possible.
Look into the answers to this question: How to load ANSI escape codes or get coloured file listing in WinXP cmd shell?
You might happen upon something useful.
I assume you're referring to ASCII control characters.
The answer is "some". You can read backspace keypresses, for example, and you can pipe-in things like the ASCII "Bell" character.
However, if you mean that the Windows console automatically resolves escaped characters, such as converting "\a" into "Bell", then no, you have to do that yourself.
Note that I am talking about entering keypresses directly into the console and not about batch files; for that, see @ProblemFactory's answer.