Control Characters and How OS/TextEditors interprets them? - windows

I was going thru some content about control characters especially newline character(will focus on this).After going thru
http://en.wikipedia.org/wiki/Control_characters, got to know that \n is the line character in unix
while it is \r\n in windows. Now i got the question how OS comes into picture when iterpreting
ASCII Codes becoz i was under impression when we type any given character on keyboard, any OS send the same
bits and editor interprets that bit and display the corresponding character. Looks like this understanding is
wrong, Because different bit is sent in case of unix(\n) and windows(\r\n) when we press ENTER(new line terminator).As per
new understanding if we press ENTER on diff OS(say unix and windows),different bits are sent to editor and its
responsibilty of text editor to show the typed stuff in new line keeping the underlying OS in picture.Please let me
know if my understanding is correct as this will help me to understand other basics also?
Next question is if above is correct, what can be the reason different OS treat some control characters differently
when they treat all other characters equally? Is it becoz specific bits are already reserved in specific OS?

How an application treats keyboard input varies a bit, actually. When you press return the application is under no obligation to actually generate LF or CR+LF anywhere. E.g. it might decide to just end the current paragraph object and start a new one (e.g. in a word processor). If it's a Windows text editor then it will probably just write CR+LF into the file, while on Unix it just writes an LF.
They keyboard itself is very, very far removed from things you see on the screen or even on the disk. This goes through scan codes, keyboard layouts and other transformations before it ends up as text or markup somewhere.

Related

Replace any keyboard character(s) with keyboard shortcut or different keystroke

For a project using a barcode scanner I need to know if it is possible to replace a special character like
!
"
§
$
%
=
with different keyboard strokes like
arrow down or arrow up or even shortcuts like
ctrl+a or ctrl+v?
Would be also possible if a specific series of characters resulted in a keystroke/keyboardshortcut,
for instance this text InsertArrowLeftHere would result in this keypress arrow left
Is there any way to make something like this work?
A bar-code scanner (unless is used with special hardware in-between) just reads the data, and sends keystrokes to the computer as keyboard interrupts. How the bar-code "string" is interpreted depends on software that, at that moment, has the focus. If you open a Notepad and read something with the barcode scanner, the number will be printed in notepad. In many cases no software comes with the scanner because there is no need for that.
But your software (the program that receives the data from the scanner) can catch anything typed in the textbox (or other control that has the focus, for example the whole form can catch the keystrokes). Maybe you can also identify where the keystrokes come from, (means: from which keyboard-input device: keyboard1, keyboard2, barcodescanner etc.) and act accordingly (if from keyboard1 or keyboard2 do nothing, if from barcodescanner then do this).

Has anyone heard of this strange bug with the standard Windows message box?

Years ago, I was messing around with Visual Basic and I discovered a bug with the MsgBox function. I tried searching for it, but nobody had ever said anything about it. It's not just with Visual Basic though; it's with anything that uses the standard Windows MessageBox API call.
The bug is triggered when the title text has more than one character, and the first character is a lowercase 'y' with an umlaut ('ÿ'). What's so special about this character? It almost definitely not the character itself, but rather its ASCII value that's special. 'ÿ' is character 255 (0xFF), meaning it's the highest value that can be stored in an unsigned byte, and all its bits are set to 1.
What does this bug do? Well, there are two different possibilities, which depend on the number of characters in the title text. If there are an even number of characters (unless it's 2) in the title text, no message box appears, and you just hear the alert sound. If there are two characters in the title text, or any odd number other than 1 (in which case the bug wouldn't be triggered)...then this happens:
And that's not all--the message will also be truncated to one line. It seems like the kind of bug that would occur in at least one semi-high-profile incident, considering how often this API call is used. Are there any reports of this on the Internet, or anything showing what could cause it? Maybe it's a Unicode-related glitch, like that "bush hid the facts" glitch in Notepad?
I made a program in case you want to play around with this; download it here.
Alternatively, copy the following into Notepad, save it with a .vbs extension, and double-click it to display the dialog box seen above:
MsgBox "Windows 3.1 font, anyone?", 0, "ÿ ODD NUMBER!"
Or for a different font:
MsgBox "I CAN HAS CHEEZBURGER?", 0, "ÿ HImpact"
EDIT: It seems that if the first four characters are ÿ's, it doesn't ever display the message, even if there's an odd number of characters.
This is a bug with dialog templates generally. It is not a message box bug as such.
For example, in Visual Studio create the default win32 application. In the .rc file, change the caption in the template for the about box from
CAPTION "About sampleapp"
to
CAPTION "ÿT"
and the bug will manifest itself when you display the about box.
In the DLGTEMPLATEEX documentation note that the menu and class name have type sz_Or_Ord which means either a null-terminated string or 0xFFFF followed by a single word resource identifier.
Windows incorrectly applies a similar scheme to the dialog title: if the first character is 0xFF then it treats the title as being two WORDs long, but only when it is trying to locate the font information. When it is displaying the title it correctly treats the title as a string.
In other words, Windows is looking for the font information inside the title string. In most case this won't specify a valid font, so Windows defaults to the system font.
To prove this, I constructed a dialog template in memory (based on this). Once this was working I deleted the code that writes the font information to the template and used the dialog title "ÿa\xd\x200\x21SimSun". This displays the dialog in italic SimSun because windows is reading the font information from the title string.
This bug is likely a hangover from 16-bit Windows, where (I guess) 0xFF was used as the resource ID marker.
A strange bug. I suspect the symptoms are the result of the way the MessageBox() actually displays the dialog.
Internally, MessageBox() builds a dialog template dynamically. If you look at the description of a DLGTEMPLATE structure you'll find the following nugget of information:
In a standard template for a dialog box, the DLGTEMPLATE structure is
always immediately followed by three variable-length arrays that
specify the menu, class, and title for the dialog box. When the
DS_SETFONT style is specified, these arrays are also followed by a
16-bit value specifying point size and another variable-length array
specifying a typeface name.
So, the in-memory layout of a dialog template has the font specification immediately following the dialog box title.
Visual Basic does not use Unicode and so the function you're calling is actually MessageBoxA(). This is simply a thunk that converts the passed-in strings from multibyte to Unicode and then calls MessageBoxW().
I believe what's happening is that, for some reason, the conversion of that string from multibyte to Unicode is either going wrong, or returning a spurious length value. This has the knock-on effect, when the dialog template is built in memory, of corrupting the memory immediately following the title string - which, as we know, is the font specification.

How to create code box without rich text formatting

My question is related to this topic How to copy and paste code without rich text formatting? except its from the opposite viewpoint: I'm creating a document from PowerPoint in which I have code snippets in text boxes. I want to make the document as simple as possible by making the code snippet text boxes easy to copy and paste the code into a terminal to run without editing anything. However, the way I have it right now is that when I copy and paste it keeps the formatting and I have to go though letter by letter to erase the end of line symbols. How should I format this in PowerPoint?
You can get rid of most formatting by copy/pasting from PPT to Notepad and then copy/pasting from there to your terminal program, or if the latter has a Paste Special command, you should be able to paste as plain text, which'd get rid of formatting.
Line/Paragraph breaks are another matter. If the end of line symbols are the only formatting problem when you've pasted the text into a terminal (emulator program, I assume), it sounds as though the terminal's using CR or LF as a line ending, whereas PPT's using CR/LF pairs. It might only be necessary to reconfigure the terminal software to use CR/LF.
It's worth a look at this page on my site, where I explain what line and paragraph ending characters are used by different versions of PowerPoint in different situations.
Paragraph endings and line breaks
http://www.pptfaq.com/FAQ00992_Paragraph_endings_and_line_breaks.htm
Sorry, my mistake was not realizing that PowerPoint auto formats hyphens and quotation marks to make them stylized, and the terminal was not recognizing the symbols. All I did was type in a quotation mark/hyphen then copying that before I pressed the space bar after it to save the original formatting.

How to manipulate the contents of CEdit?

I have a situation with edit control and I need some guidance. The text editor functions normally in most cases but in other cases, depending on the last few characters before typing and based on the characters typed, the last few characters must be replaced with different characters.
The solution that looks obvious to me is to have a character buffer, GetWindowText() just before the contents are changed, add the characters typed into the buffer, manipulate the buffer if necessary and then SetWimdowText().
I know the edit control has its own buffer. So is this the right approach to have my own buffer or are there ways I can share the buffer with it etc? The editor might not have more than 4MB worth of characters.
I need this to work on Windows 7 and XP, not keen on older ones.I use MFC.
Thanks for your help.
You don't need your own buffer and indeed it would be dangerous to have one since it will likely get out of synchronisation.
But you don't need to set the entire edit text at once. From the documentation:
Also, if an edit control is multiline, get and set part of the control's text by calling the CEdit member functions GetLine, SetSel, GetSel, and ReplaceSel.
ReplaceSel is what you are looking for I think. Although this text talks about multiline edit controls, SetSel, ReplaceSel etc. work fine with single line edit controls.

How do shell text editors work?

I'm fairly new at programming, but I've wondered how shell text editors such as vim, emacs, nano, etc are able to control the command-line window. I'm primarily a Windows programmer, so maybe it's different on *nix. As far as I know, it's only possible to print text to a console, and ask for input. How do text editors create a navigable, editable window in a command line environment?
By using libraries such as the following which, in turn, use escape character sequences
NAME
ncurses - CRT screen handling and optimization package
SYNOPSIS
#include
DESCRIPTION
The ncurses library routines give the user a terminal-independent
method of updating character screens with reasonable optimization. This
implementation is ‘‘new curses’’ (ncurses) and is the approved replacement
for 4.4BSD classic curses, which has been discontinued.
[...snip....]
The ncurses package supports: overall screen, window and pad
manipulation; output to windows and pads; reading terminal input; control
over terminal and curses input and output options; environment query
routines; color manipulation; use of soft label keys; terminfo capabilities;
and access to low-level terminal-manipulation routines.
Short answer: there are libraries for it (like curses, slang).
Longer answer: doing things like jumping around with the cursor or changing colors are done by printing special character sequences (called escape-secquences, because they start with the ESC character).
Learning about ncurses might be a good starting point.
There is an old protocol called vt100 based on a "VT100" terminal. It used codes starting with escape to control cursor position, color, clearing the screen, etc.
It's also how you get colored prompts.
Google VT100 or "terminal escape codes"
edit: I Googled it for you: http://www.termsys.demon.co.uk/vtansi.htm
You will also notice this if you type "edit" in a Windows command line console. This "feature" is not unique to unix-like systems, though the concepts for manipulating the windows console in that way are quite different to in unix.
On Unix systems, a console window emulates an ancient serial terminal (usually a VT100). You can print special control characters and escape sequences to move the cursor around, change colors, and do other special effects. There are libraries to help handle the details; ncurses is the most popular.
On Windows, the [Win32 Console API](http://msdn.microsoft.com/en-us/library/ms682073(VS.85%29.aspx) provides similar functionality, but in a rather different manner.
Type "c:\winnt\system32\edit" or "c:\windows\system32\edit" at the command line, and you'll be shown a command line text editor.
People are mostly right about the ESC character being used to control the command screen, but some older programs also write characters directly to the memory space used by the Windows Command Line screen.
In order to control the command line window, you used to have to write your own windowing forms, entry box, menus, etc. You'd also have to wrap all that up in a big loop for handling events.
More Windows command line specific, the app typically calls DOS or BIOS functions that do the same. Sometimes ANSI command code support is available, sometimes it isn't (depending on exact MS OS version and whether or not it's configured to load it).

Resources