I'm trying to send non-ASCII bytes (codes 128 - 255) through telnet to a Ruby app that uses Socket objects to read data in.
When I try to send \x80 through telnet, I expect Ruby to receive a string of 3 bytes: 128 13 10.
I actually receive a string of 6 bytes: 92 120 56 48 13 10.
Do I need to change something about how telnet is sending the information, or how the Ruby socket is accepting it? I've read through all the telnet jargon I can comprehend. A point in the right direction would be very much appreciated.
92 120 56 48 13 10
is in decimal ASCII:
\ x 8 0 \r \n
So you are doing something wrong on your side, and it's not telnet: the escape sequence \x80 was sent as literal text instead of being interpreted as a single character with code 128.
I guess you used '\x80' instead of "\x80". Note the different quotes: single-quoted Ruby strings do not interpret \x escape sequences. For a single character you can also use Ruby's ?-literal syntax, ?\x80, so for example:
"\x80\r\n" == ?\x80 + ?\r + ?\n
=> true
while of course
'\x80\r\n' == "\x80\r\n"
=> false
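The same pitfall exists outside Ruby. As an illustration only (Python here, not the asker's Ruby code), a raw string keeps the backslash escape as literal text, while a normal string interprets it:
# Python illustration: raw strings behave like Ruby's single quotes in this respect.
literal = r"\x80\r\n"   # escapes NOT interpreted: 8 characters
escaped = "\x80\r\n"    # escapes interpreted: 3 characters
print([ord(c) for c in escaped])   # [128, 13, 10]  (what the asker expected)
print([ord(c) for c in literal])   # [92, 120, 56, 48, 92, 114, 92, 110]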
--
To summarize the long story from the comments:
originally, the data to be sent was entered manually through a telnet terminal
telnet terminals often don't accept escape codes and "just send" everything they get verbatim; sometimes copying & pasting text with special characters works, and sometimes terminals provide extra UI goodies for sending special characters - but this terminal was very basic, pasting didn't work, and there were no UI goodies
instead of entering the data manually, sending a file through a pipe to the telnet terminal seemed to work much better: some data arrived, but it still wasn't right
piping the data to nc (netcat) instead of the telnet terminal almost worked; binary data arrived, but it was not perfect yet
examining the input file (the one piped to nc) with the hexdump utility showed that the file did not contain exactly what we thought: the editor used to create it had saved the text with the wrong encoding and added some extra, unwanted bytes
finally, the xxd utility helped produce correct binary data from a hand-tailored hex text; the output of xxd could be piped directly to nc (netcat)
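For reference, a rough Python equivalent of that final xxd | nc step might look like the sketch below; the file name, host, and port are placeholders, not values from the original setup:
import socket

# Turn a hand-written hex text such as "80 0d 0a" into raw bytes and send it over
# TCP, roughly what `xxd -r -p payload.hex | nc host port` does.
with open("payload.hex") as f:                            # placeholder file name
    raw = bytes.fromhex("".join(f.read().split()))
with socket.create_connection(("127.0.0.1", 4000)) as s:  # placeholder host/port
    s.sendall(raw)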
Related
I have a piece of software running on Concurrent DOS 3.1, which I emulate with QEMU 5.1.
In this program, there are several options to print data. The problem is that the data arriving at my host does not correspond to the data sent.
The command to start QEMU:
qemu-system-i386 -chardev file,id=imp0,path=/path/to/file -parallel chardev:imp0 -hda DISK.Raw
So the output sent on parallel port of my guest is redirected to /path/to/file.
When I send the character 'é' from CDOS:
echo é>>PRN
The code page used on CDOS is Code Page 437, and in this character set the character é is represented by 0x82, but on my host I instead receive the following:
cp437 é -> 0x82 ---------> host -> 1B 52 01 7B 1B 52 00
So I tried something else. I wrote the character 'é' into a file and sent the file with nc.exe (from Brutman's mTCP), and with nc the value stays 0x82.
So my question: what happens when I send my data to the virtual parallel port? Where does my data get transformed? Is it the parallel port handling in Concurrent DOS? Is it QEMU? I can't figure out how to send my data through LPT1 properly.
I also tried this:
qemu-system-i386 -chardev socket,id=imp0,host=127.0.0.1,port=2222,server,nowait -parallel chardev:imp0 -hda DISK.Raw
I can read the socket fine, but I get the same output as when writing to a file: the é gets transformed into 1B 52 01 7B 1B 52 00.
The byte sequence "1B 52 01 7B 1B 52 00" is using Epson FX-style printer escape sequences (there's a reference here). Specifically, "1B 52 nn" is ESC R n which selects the international character set, where character set 0 is USA and 1 is France. So the sequence as a whole is "Select French character set; print byte 0x7b; select US character set". In the French character set for this printer standard, 0x7B is the e-acute é.
This is almost certainly the CDOS printer driver assuming that the thing on the end of PRN: is an Epson printer and emitting the appropriate escape sequences to output your text on that kind of printer.
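To make the interpretation concrete, here is a small Python sketch (illustration only, with the charset table abbreviated to the two sets involved) that decodes the captured bytes under the Epson FX "ESC R n" assumption, and checks the CP437 byte CDOS actually wrote:
# Decode the captured printer output assuming Epson FX-style "ESC R n" charset switching.
captured = bytes.fromhex("1b 52 01 7b 1b 52 00")
charsets = {0: "USA", 1: "France"}   # abbreviated: only the sets that appear here

i = 0
while i < len(captured):
    if captured[i] == 0x1B and i + 2 < len(captured) and captured[i + 1] == ord("R"):
        print("ESC R %d -> select %s character set" % (captured[i + 2], charsets[captured[i + 2]]))
        i += 3
    else:
        print("print byte 0x%02X" % captured[i])   # 0x7B is é in the French set
        i += 1

# For comparison, the single byte CDOS wrote is the CP437 encoding of 'é':
print(bytes([0x82]).decode("cp437"))   # é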
OK, so I finally figured it out... After hours of searching for where this printer.sys driver could be, or how to remove it: on Concurrent DOS, the setup command is "n". And of course, it is not listed in the "help" command...
Anyway, in there you can set up your printer and select "no conversion" for the port you want. And it was indeed set to Epson MX-80/MX-100.
So thanks to Peter's answer, which led me down the right path!
I'm struggling to get started with xterm.js, and I have some questions that aren't covered in the official documentation, or at least I didn't find them there.
I understand that when I use some app within the terminal, for example Vim, the terminal needs to be switched to the alternate buffer, and after I quit the app the terminal is switched back to the normal buffer. Is this right?
To switch between buffers (and to control terminal behavior in general) I need to use a control sequence. This isn't something unique to xterm.js, but a common pattern, and the control sequences are unified between terminals?
The control sequence to switch to the alternate buffer is CSI ? Pm h with parameter 47, according to the documentation:
DECSET - DEC Private Set Mode - CSI ? Pm h - Set various terminal attributes.
Where parameter 47 means "Use Alternate Screen Buffer".
How do I use this control sequence with xterm.js? For example, I want to switch to the alternate buffer; what string should be passed to terminal.write(...)?
Yes, see the description in this question: Using the "alternate screen" in a bash script
The alternate screen is used by many "user-interactive" terminal applications like vim, htop, screen, alsamixer, less, ... It is like a different buffer of the terminal content, which disappears when the application exits, so the whole terminal gets restored and it looks like the application hasn't output anything
Yes, see ANSI escape code:
ANSI escape sequences are a standard for in-band signaling to control the cursor location, color, and other options on video text terminals and terminal emulators. Certain sequences of bytes, most starting with Esc (ASCII character 27) and '[', are embedded into the text, which the terminal looks for and interprets as commands, not as character codes.
Control sequence to switch to alternate buffer: CSI ? 47 h
Control sequence to switch to regular buffer: CSI ? 47 l
Code to apply control sequence to switch to alternate buffer:
terminal.write("\x9B?47h"); //CSI ? 47 h
I need to send a Unicode string message to an A16 COMS (mainframe) via TCP/IP. What algorithm do I need, and what transformation of the string? The string can contain one or more Unicode characters.
When sending an ASCII-only string I convert (map) it to EBCDIC and send it over the TCP/IP connection. I know that EBCDIC doesn't handle Unicode characters. Besides, over TCP/IP I can only send a byte array, where in the case of an ASCII string one character maps to one array cell; a Unicode character can occupy from 1 to 4 array cells.
The question is: how do I send the Unicode-containing string to the A16 mainframe?
Further clarification:
When I run the code, the TCP client does not receive any response; it hits the timeout and gives an error, and increasing the timeout does not help. C# can convert a Unicode string to UTF-8 either with System.Text.Encoding or even almost manually with an algorithm; those are not the problem. The problem is that A16 COMS expects "one character = one byte" (mapped to EBCDIC), and with UTF-8 one character may occupy 2, 3 or 4 cells of the array. Plain EBCDIC mapping does not help here, because EBCDIC is designed to work with non-Unicode (ASCII-based) strings.
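To illustrate the width problem (a quick check in Python, since the issue is language independent):
# UTF-8 is variable width: "one character = one byte" only holds for ASCII.
for ch in ("A", "é", "€", "😀"):
    encoded = ch.encode("utf-8")
    print(ch, "->", encoded.hex(), "-", len(encoded), "byte(s)")
# A -> 41 - 1 byte(s)
# é -> c3a9 - 2 byte(s)
# € -> e282ac - 3 byte(s)
# 😀 -> f09f9880 - 4 byte(s)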
I hope that someone who has done this at some point in their career might read my post, because not much can be achieved by just trying to figure it out. Can it be done with TcpClient and its NetworkStream? The Send method only takes an array of bytes in its signature, but with UTF-8 the byte array can be much longer than the limit.
It is a question asking to share experience, not knowledge.
I have a little program that prints out a directory structure.
It works fine except when the directory names contain German umlaut characters.
In that case it prints a blank line after the directory line.
I'm running Python 3.5.0 on Windows 7 64-bit.
This code ...
class dm():
...
def print(self, rootdir=None, depth=0):
if rootdir is None:
rootdir = self.initialdir
if rootdir in self.dirtree:
print('{}{} ({} files)'.format(' '*depth,
rootdir,
len(self.dirtree[rootdir]['files'])))
for _dir in self.dirtree[rootdir]['dirs']:
self.print(os.path.join(rootdir, _dir), depth+1)
else:
pass
...produces the following output:
B:\scratch (11 files)
B:\scratch\Test1 (3 files)
B:\scratch\Test1 - Kopie (0 files)
B:\scratch\Test1 - Übel (0 files)
B:\scratch\Test2 (3 files)
B:\scratch\Test2\Test21 (0 files)
This is with the codepage set to 65001. If I change the codepage to e.g. 850, the blank line disappears, but of course the "Ü" isn't printed correctly.
The structure self.dirtree is a dict of dicts of lists; it is built with os.walk and seems OK.
Python or Windows? Any suggestions?
Marvin
There are several bugs when using codepage 65001 (UTF-8) -- all of which are due to the Windows console (i.e. conhost.exe), not Python. The best solution is to avoid this buggy codepage, and instead use the wide-character API, such as by loading win_unicode_console.
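A minimal sketch of that workaround, assuming the win_unicode_console package is installed (pip install win_unicode_console):
# Bypass the UTF-8 codepage bugs by using the wide-character console API.
import win_unicode_console
win_unicode_console.enable()

# After this, printing non-ASCII directory names should no longer produce the
# spurious blank lines described above.
print('B:\\scratch\\Test1 - Übel (0 files)')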
You're experiencing a bug that exists in the legacy console that was used prior to Windows 10. (It's still available in Windows 10 if you select the option "Use legacy console".) The console decodes the UTF-8 buffer to UTF-16 and reports back that it writes b'\xc3\x9c' (i.e. "Ü" encoded as UTF-8) as one character, but it's supposed to report back the number of bytes that it writes, which is two. Python's buffered sys.stdout sees that apparently one byte wasn't written, so it dutifully writes the last byte of the line again, which is b'\n'. That's why you get an extra newline. The result can be far worse if a written buffer has many non-ASCII characters, especially codes above U+07FF that get encoded as three UTF-8 bytes.
There's a worse bug if you try to paste "Ü" into the interactive REPL. This bug is still present even in Windows 10. In this case a process is reading the console's wide-character (UTF-16) input buffer encoded as UTF-8. The console does the conversion via WideCharToMultiByte with a buffer that assumes one Unicode character is a single byte in the target codepage. But that's completely wrong for UTF-8, in which one UTF-16 code may map to as many as three bytes. In this case it's two bytes, and the console only allocates one byte in the translation buffer. So WideCharToMultiByte fails, but does the console try to increase the translation buffer size? No. Does it fail the call? No. It actually returns back that it 'successfully' read 0 bytes. To Python's REPL that signals EOF (end of file), so the interpreter just exits as if you had entered Ctrl+Z at the prompt.
I'm building an embedded system that talks RS-232 to a serial terminal, "full-duplex" style, so the host echoes what the terminal sends.
I know printables (at least ASCII 0x20 to 0x7E) are normally echoed, but which control characters (if any) are normally echoed in a case like this?
Is there some Posix or other standard for this? How does Linux do it?
For example, if I type a ^C at the terminal, should the ^C be echoed by the host? What about ^G (bell)? Etc?
I'll try to answer my own question. Here's what I'm planning to do:
Printables (ASCII 0x20 to 0x7E) are echoed.
CR is echoed as CR LF (because the Enter key on terminals normally sends CR, and an ANSI terminal requires CR to move the cursor to the left, and LF to move it to the next line).
BS (backspace, 0x08) and DEL (0x7F) are treated identically and are echoed as "\b \b" (in C syntax) - that is, backspace, space, backspace, to erase the last character on the terminal.
All other control characters are not echoed. (Not to say they're not processed, but they're not automatically echoed. What they do is outside the scope of what I'm asking about.)
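A minimal sketch of that policy (Python for brevity; a real embedded implementation would likely be in C, and the function name here is made up):
def echo_for(byte):
    """Return the bytes to echo back to the terminal for one received byte."""
    if 0x20 <= byte <= 0x7E:      # printable ASCII: echo as-is
        return bytes([byte])
    if byte == 0x0D:              # CR: echo CR LF
        return b"\r\n"
    if byte in (0x08, 0x7F):      # BS or DEL: erase the last character
        return b"\b \b"
    return b""                    # all other control characters: not echoed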
My reasoning is that the remaining control characters are usually meant to do something, and that something is meant to happen at the host, not at the terminal.
For example DC1/DC3 (^Q/^S) are often used as flow control ("XON/XOFF") - it doesn't make sense to echo the ^S (XOFF) back to the terminal, since the purpose is to flow control the host. Echoing XOFF back to the terminal would flow control the terminal, clearly not what was intended. So echoing this makes no sense.
Similarly, ANSI escape sequences sent by the terminal (cursor up/down/left/right, etc.) should not be echoed.
Bottom line - echo printables only. Control characters as a rule should not be echoed, except on a case-by-case basis depending on their function (carriage return, backspace, etc.).
I'd like comments on whether or not this is the right thing to do (and why).