We have an old 32-bit PowerBuilder application (third-party) that was written for Windows XP and, although it runs under Windows 8, we have noticed that the password masking character is a different size (XP uses a small black circle and Windows 8 a larger one). This is a problem because the application was written to limit the space available in the password field (22 characters fit in XP but only 13 in Windows 8). Our password policies require 15-character minimum passwords, and obviously these won't fit when we run the application in Windows 8.
Because the character size changes with the operating system (not on the application side), we suspect the problem is with a .dll file or a font that is being referenced by the PowerBuilder application. Does anyone have ideas about where the password masking character is coming from?
As a workaround, you might be able to type more characters if you increase the width of the edit control.
You could give either uuspy or WinCheat a try.
If that helps, you could then find a way to script sending a WM_SIZE message to the control, or inject a DLL to do so...
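A minimal sketch of that idea using Python's ctypes, under the assumption that the password field is a standard Edit control; the window title, the 64-character limit and the geometry are made-up placeholders (take the real class name and title from whatever spy tool you use):

import ctypes

user32 = ctypes.windll.user32
EM_SETLIMITTEXT = 0x00C5

# Hypothetical title/class -- use whatever uuspy/WinCheat reports for the real app.
hwnd_main = user32.FindWindowW(None, "Legacy PowerBuilder App")
hwnd_edit = user32.FindWindowExW(hwnd_main, None, "Edit", None)

if hwnd_edit:
    # Raise the control's character limit...
    user32.SendMessageW(hwnd_edit, EM_SETLIMITTEXT, 64, 0)
    # ...and widen the control so more masked characters fit on screen
    # (x, y, width, height are placeholders).
    user32.MoveWindow(hwnd_edit, 10, 10, 300, 24, True)

Whether the control honors these changes depends on how the PowerBuilder runtime created it, so treat this strictly as a starting point for experimentation.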
I'm afraid you're looking for a complicated solution to a simple problem.
I'm guessing this is a DataWindow-independent (i.e. not inside a DataWindow object) SingleLineEdit. There are generally two reasons you can't type all the characters you want (regardless of the Password attribute) into one of these fields:
You have a non-zero value set in the Limit attribute. However, since your limit varies between operating systems, this likely isn't your culprit.
You have the attribute AutoHScroll (automatic horizontal scroll) set to FALSE (or unchecked in the IDE painters). This means that when the field fills up graphically, you can't enter more characters. (It also means that, with a variable-width font, which most are, you can fit more i's into the field than W's; that isn't a good way to go when data length is important. In fact, it's seldom a good way to go at all.)
Your symptom sounds like AutoHScroll is turned off (the default is on when you drop an SLE on the window). Switch that back on, optionally set the Limit attribute to something that makes sense (if the original intention was to constrain the number of characters entered, and it was just implemented poorly), and my guess is that you'll be good to go.
Good luck,
Terry
Windows uses an encoding table to map characters from the Unicode table to a one-byte table for non-Unicode applications. There are many predefined character sets, and the user can choose one in the Windows settings. I need to create a custom character set. Where can I find some information about that process? I tried to Google it, but didn't have any luck; I guess few people are doing that.
AFAIK, you can't do that. I don't think there's even a way to write some kernel-mode "driver" for it, but I haven't looked into these things for a while, so maybe there is some way (now).
In any case, you might be better off using a library you can change/update, such as libiconv.
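If changing the mapping inside your own code is acceptable, here is a minimal sketch of the idea using Python's codecs module; the table (Latin-1 with one byte remapped) and the codec name 'custom8bit' are invented purely for illustration:

import codecs

# Made-up single-byte table: Latin-1, except byte 0x80 maps to Greek capital omega.
decoding_table = ''.join(chr(i) for i in range(256)).replace('\x80', '\u03a9')
encoding_table = codecs.charmap_build(decoding_table)

def encode(text, errors='strict'):
    return codecs.charmap_encode(text, errors, encoding_table)

def decode(data, errors='strict'):
    return codecs.charmap_decode(data, errors, decoding_table)

def search(name):
    if name == 'custom8bit':
        return codecs.CodecInfo(encode, decode, name='custom8bit')
    return None

codecs.register(search)

print(b'\x80 test'.decode('custom8bit'))   # 'Ω test'
print('\u03a9 test'.encode('custom8bit'))  # b'\x80 test'

This only affects your own program, of course; it does not register anything with Windows.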
UPDATE:
Since you don't have the source code, you're in a very unfortunate position.
For all string resources (in the EXE or any DLLs or, though unlikely, in some other files), you can read them out, figure out which code page they use, and change it (and the strings themselves), tweaking them in some way that achieves your purpose: having the right glyphs appear. (Yes, you might actually see different glyphs in Notepad, but who cares if your application shows the right ones; FWIW, for such hacks it's best to use a hex editor.) Then, of course, put the (changed) resources back into the EXE/DLL. But it's quite possible that not all strings are in resources, and that's when the real problems start.
There's any number of hacks that could have been done here. Your best option is to use a good debugger (WinDbg or better) and figure out what's going on and how character sets are handled; since you don't have the source code, it's going to be quite painful. You want to find out:
Are the default charsets used (OEM/ANSI), or some specific one (via the NLS APIs)?
Whatever charset is used, is it a standard one or not? The charset here is the "code" Windows assigns to it; look at Windows' lists of available charsets.
Is the application installing fonts? If it is, use a font tool to examine them; maybe they support a specific (non-standard?) code page.
Is the application installing drivers? If it is, the only way to gain more insight is to use a kernel debugger (which is very tricky and annoying but, as already said, you're in an unfortunate situation).
It appears that those tables are located at C:\Windows\system32\*.nls. I'm not sure whether there's proper documentation for their structure; there's some information in Russian here. You might also want to tinker with the registry at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls.
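To see which code page numbers Windows maps to which .nls files on a given machine, here is a small read-only sketch using Python's winreg that simply enumerates the CodePage subkey mentioned above:

import winreg

key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                     r"SYSTEM\CurrentControlSet\Control\Nls\CodePage")
i = 0
while True:
    try:
        name, value, _ = winreg.EnumValue(key, i)
    except OSError:
        break
    # Most entries map a code page number to its backing .nls file
    # (a few, such as ACP/OEMCP, hold the default code page instead).
    print(name, value)
    i += 1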
I'm using Notepad++ as my test case, but I was also able to replicate the exact same behavior on a program I wrote that (indirectly) uses the Win32 API, so I'm trusting at this time that the behavior is specific to Windows, and not to the individual programs.
At any rate, I'm trying to use alt-codes to transmit unicode text to a program. Specifically, I'm using the "Alt Plus" method described in the first point here. For any and all alt-codes below 0xffff, this method works perfectly fine. But the moment I try to transmit an alt-code above this threshold, the input only retains the last four digits I typed.
From notepad++ using UCS-2 Big Endian:
To generate this output, I used the following keystrokes:
[alt↓][numpad+][numpad1][numpad2][numpad3][numpad4][alt↑]
[alt↓][numpad+][numpad1][numpad2][numpad3][numpad5][alt↑]
[alt↓][numpad+][numpad1][numpad2][numpad3][numpad4][numpad5][alt↑]
Knowing that the first two bytes are encoding-specific data, it's clear that the first two characters were transmitted successfully and correctly, but the third was truncated to only its last two bytes. Specifically, it did not try to perform a codepoint→UTF-16 conversion; it simply dropped the higher bits.
I observed identical behavior on the program I wrote.
So my questions then are in two parts:
Is this a hard constraint of Windows (or at least the Win32 API), or is it possible to alter this behavior to split higher codepoints into multiple UTF-16 (or UTF-8) characters?
Is there a better solution for transmitting unicode to the application that doesn't involve writing a manual parser in the software itself?
To expand on that second point: in the program I wrote, I was able to implement the intended functionality by discarding the "Text Handling" code and writing, by hand, a Unicode parser in the "Key Handling" code which would, if the [alt] key were held down, save numpad inputs to an internal buffer and convert them to a codepoint when [alt] was released. I can use this as a solution if I must, but I'd like to know if there's a better solution, especially on the OS side of things, rather than a software one.
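For reference, the codepoint→UTF-16 split that the Alt-Plus path evidently is not performing, and that a hand-rolled parser like the one described would need, is only a few lines (a sketch; the function name is mine):

def to_utf16_code_units(cp):
    # Code points above U+FFFF become a surrogate pair; the rest pass through.
    if cp <= 0xFFFF:
        return [cp]
    cp -= 0x10000
    return [0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)]

print([hex(u) for u in to_utf16_code_units(0x1234)])   # ['0x1234']
print([hex(u) for u in to_utf16_code_units(0x12345)])  # ['0xd808', '0xdf45']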
Style guides for various languages recommend a maximum line length range of 80–100 characters for readability and comprehension.
My question is, how do people using tabs for indentation comply with the line length limits? Because, if the length of a tab is set to 2 in your editor, the same code could be over 80 characters long in someone else's editor with tab length set to 3/4/6/8.
Some suggest that I mentally treat my tab length as 8 characters. This isn't realistic and won't let me focus on programming, as I'd have to check it on every other line.
So, how do you (if you use only tabs in your code) do it?
Almost all languages have a convention for tab size. Most projects have a convention for tab size as well, which is usually (though not always) the same as the language's.
For example, Ruby uses 2, Python and PHP use 4, C uses 8, etc.
Sticking with the project's tab size, or the language's tab size if there is no obvious project convention, is by far the sanest thing to do, since that is what almost everyone else will use, except perhaps a few.
The ability to set a different tab size is probably the largest advantage of using tabs instead of spaces, and if a user wants to deviate from the standard, then that's fine, but there is no reasonable way to also comply with the maximum line length in every possible situation.
You can compare it to most web sites; they're designed for a 100% zoom level. Zooming in or changing the font size almost always works, but some minor things may break. It's practically impossible to design for every possible zoom level & font size, so artefacts are accepted.
If you want to make sure the code complies to both the line length and tab size, you should probably use spaces, and not "real" tabs. Personally, I feel these guidelines are exactly that: guidelines; and not strict rules, so I don't think it matters that much. Other people's opinion differs on this matter, though.
Personally, I like tabs, for a number of reasons which are not important now, but I only deviate from the standard temporarily, usually when editing heavily nested code where I would like some more clarity about which block I'm editing.
You probably don't write very long lines often, so you probably won't have to reformat your code too often.
Therefore, you can use an approach where you periodically check your file for compliance, and fix anything that's out of whack (for example, you could do this before submitting the file to code-review).
Here's a trivially simple Python script that reads a file on stdin and tells you which lines exceed 80 characters:

import sys

TABSIZE = 8
MAXLENGTH = 80

for i, line in enumerate(sys.stdin):
    # Expand tabs to the agreed tab size before measuring the line.
    if len(line.rstrip('\n').expandtabs(TABSIZE)) > MAXLENGTH:
        print("%d: line exceeds %d characters" % (i + 1, MAXLENGTH))
You can extend this in various ways: have it take a filename/directory and automatically scan everything in it, make the tab size or length configurable, etc.
Some editors can be set up to insert a number of spaces equal to your tab length instead of an actual tab (Visual Studio has this enabled by default). This allows you to use the tab key to indent while writing code, but also guarantees that the code will look the same when someone else reads it, regardless of what their tab length may be.
Things get a little trickier if the reader also needs to edit the code, as is the case in pretty much every development team. Teams will usually have tabs vs. spaces and tab length as part of their coding standard. Each developer is responsible for making their own commits align with the standard, generally by setting up their environment to do this automatically. Changes that don't meet the standard can be reverted, and the developer made to fix them before they get integrated.
If you are going to have style guidelines that everyone has to follow, then one way to ensure that all code conforms to a maximum column width of 100 is to use a code formatter such as Clang Format on check-in. For example, it includes an option to set the ColumnLimit:
The column limit.
A column limit of 0 means that there is no column limit. In this case, clang-format will respect the input's line-breaking decisions within statements unless they contradict other rules.
Once your team agrees on the settings, no one has to think about these details: formatting is applied automatically on check-in. This requires a degree of compromise, but once it's in place you stop thinking about it pretty quickly, and you just worry about your code rather than its formatting.
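For example, a team's .clang-format file might look something like the following; ColumnLimit is the option quoted above, and the other values are just illustrative choices, not recommendations:

BasedOnStyle: LLVM
UseTab: ForIndentation
IndentWidth: 4
TabWidth: 4
ColumnLimit: 100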
If you want to ensure you don't go over the limit prescribed in a team/project coding standard, then it would be irrational not to also have a team/project standard for tab width. Modern editors let you set the tab width, so if your team decides on one and everyone sets their editor to it, everyone will be on the same page.
Suppose a line is 100 characters long for you, and someone comes by and says it's over the limit because their tab width is set to 8 while yours is 2. You could tell them their tab width is wrong, and without an agreed-upon standard your point would be just as valid as theirs. Likewise, the 8-width developer will be "wrong" when he writes a 100-character line and another developer who prefers 10-width tabs complains.
Since tabs don't inherently have a defined width, it's up to the team to either choose one or choose not to engage in irrational arguments about it.
That said, spaces are shown to be correlated with higher income, so, y'know... ;)
I'm using the keyboard to enter multi-lingual text into a field in a form displayed by a Web browser. At an O/S-agnostic and browser-agnostic level, I think the following events take place (please correct me if I'm wrong, because I think I am):
On each keypress, there is an interrupt indicating a key was pressed
The O/S (or the keyboard driver?) determines the keycode and converts that to some sort of keyboard event (character, modifiers, etc).
The O/S' window manager looks for the currently-focused window (the browser) and passes the keyboard event to it
The browser's GUI toolkit looks for the currently-focused element (in this case, the field I'm entering into) and passes the keyboard event to it
The field updates itself to include the new character
When the form is sent, the browser encodes the entered text before sending it to the form target (what encoding?)
Before I go on, is this what actually happens? Have I missed or glossed over anything important?
Next, I'd like to ask: how is the character represented at each of the above steps? At step 1, the keycode could be a device-specific magic number. At step 2, the keyboard driver could convert that to something the O/S understands (for example, the USB HID spec: http://en.wikipedia.org/wiki/USB_human_interface_device_class). What about at subsequent steps? I think the encodings at steps 3 and 4 are OS-dependent and application-dependent (browser), respectively. Can they ever be different, and if yes, how is that problem resolved?
The reason I'm asking is I've run into a problem that is specific to a site that I started to use recently:
Things appear to be working until step 6 above, where the form with the entered text gets submitted, after which the text is mangled beyond recognition. While it's pretty obvious the site isn't handling Unicode input correctly, the incident has led me to question my own understanding of how things work, and now I'm here.
Anatomy of a character from key press to application:
1 - The PC Keyboard:
PC keyboards are not the only type of keyboard, but I'll restrict myself to them.
PC keyboards, surprisingly enough, do not understand characters; they understand keyboard buttons. This allows the same hardware used by a US keyboard to be used for QWERTY or Dvorak, and for English and any other language that uses the US 101/104-key format (some languages have extra keys).
Keyboards use standard scan codes to identify the keys, and to make matters more interesting keyboards can be configured to use a specific set of codes:
Set 1 - used in the old XT keyboards
Set 2 - used currently
Set 3 - used by PS/2 keyboards, which no one uses today.
Sets 1 and 2 use make and break codes (i.e. key-press and key-release codes). Set 3 uses make and break codes only for some keys (like Shift) and only make codes for letters; this allows the keyboard itself to handle key repeat when a key is held down. That is good for offloading key-repeat processing from the PS/2's 8086 or 80286 processor, but rather bad for gaming.
You can read more about all this here, and I also found a Microsoft specification for scan codes in case you want to build and certify your own 104-key Windows keyboard.
In any case, we can assume a PC keyboard using set 2, which means it sends one code to the computer when a key is pressed and another when the key is released.
By the way, the USB HID spec does not specify the scan codes sent by the keyboard; it only specifies the structures used to send those scan codes.
Now, since we're talking about hardware, this is true for all operating systems, but how each operating system handles these codes may differ. I'll restrict myself to what happens in Windows, but I assume other operating systems follow roughly the same path.
2 - The Operating System
I don't know exactly how Windows handles the keyboard internally (which parts are handled by drivers, which by the kernel, and which in user mode), but suffice it to say the keyboard is periodically polled for changes to key state, and the scan codes are translated and converted into WM_KEYDOWN/WM_KEYUP messages which contain virtual key codes.
To be precise, Windows also generates WM_SYSKEYUP/WM_SYSKEYDOWN messages, and you can read more about them here.
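As a small illustration of the scan-code-to-virtual-key step, here is a sketch using Python's ctypes and the Win32 MapVirtualKeyW call (the scan code and the expected result assume a US layout):

import ctypes

user32 = ctypes.windll.user32
MAPVK_VSC_TO_VK = 1

# 0x1E is the set-1 style scan code for the key in the 'A' position on a US keyboard.
vk = user32.MapVirtualKeyW(0x1E, MAPVK_VSC_TO_VK)
print(hex(vk))  # 0x41, i.e. the virtual-key code for 'A'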
3 - The Application
For Windows, that is it: the application gets the raw virtual key codes, and it is up to it to decide whether to use them as-is or translate them into character codes.
Nowadays nobody writes good honest C Windows programs, but once upon a time programmers used to roll their own message-pump handling code, and most message pumps would contain code similar to:
while (GetMessage( &msg, NULL, 0, 0 ) != 0)
{
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}
TranslateMessage is where the magic happens. The code in TranslateMessage would keep track of the WM_KEYDOWN (and WM_SYSKEYDOWN) messages and generate WM_CHAR messages (and WM_DEADCHAR, WM_SYSCHAR, WM_SYSDEADCHAR.)
WM_CHAR messages contain the UTF-16 (actually UCS-2, but let's not split hairs) code for the character, translated from the WM_KEYDOWN message by taking into account the keyboard layout active at the time.
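The same translation can also be requested directly with the Win32 ToUnicode call; here is a sketch using Python's ctypes (the result assumes a US layout with no modifier keys pressed):

import ctypes

user32 = ctypes.windll.user32

key_state = (ctypes.c_ubyte * 256)()     # all keys up, no Shift/Caps Lock
buf = ctypes.create_unicode_buffer(8)
# Virtual key 0x41 ('A') + keyboard state + active layout -> UTF-16 character(s).
n = user32.ToUnicode(0x41, 0, key_state, buf, len(buf), 0)
print(n, repr(buf.value))                # 1 'a'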
What about applications written before Unicode? Those applications used the ANSI version of RegisterClassEx (i.e. RegisterClassExA) to register their windows. In this case, TranslateMessage generates WM_CHAR messages with an 8-bit character code based on the keyboard layout and the active culture.
4 - 5 - Dispatching and displaying characters.
In modern code using UI libraries, it is entirely possible (though unlikely) not to use TranslateMessage and to have custom translation of WM_KEYDOWN events. Standard Windows controls (widgets) understand and handle WM_CHAR messages dispatched to them, but UI libraries/VMs running under Windows can implement their own dispatch mechanism, and many do.
Hope this answers your question.
Your description is more or less correct.
However it is not essential to understand what is wrong with the site.
The question marks instead of characters indicate a translation between encodings has taken place, as opposed to a misrepresentation of encodings (which would probably result in gibberish.)
The characters used to represent letters can be encoded in different ways. For example, 'a' is 0x61 in ASCII but 0x81 in EBCDIC. This you probably know; what people tend to forget is that ASCII is a 7-bit code containing only English characters. Since PCs use bytes as their storage unit, early on the unused top 128 places in the ASCII code were used to represent letters in other alphabets. But which ones? Cyrillic? Greek? etc.
DOS used code page numbers to specify which symbols were used. Most (all?) of the DOS code pages left the lower 128 symbols unchanged, so English looked like English no matter what code page was used; but try to use a Greek code page to read a Russian text file and you'd end up with gibberish made of Greek letters and symbols.
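You can see the effect easily: the same byte decodes to a different letter depending on the code page. A quick Python illustration using three DOS code pages:

b = bytes([0x80])
print(b.decode('cp437'))   # US DOS code page       -> 'Ç'
print(b.decode('cp737'))   # Greek DOS code page    -> 'Α'
print(b.decode('cp866'))   # Cyrillic DOS code page -> 'А'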
Later, Windows added its own encodings, some of them variable-length (as opposed to DOS code pages, in which each character was represented by a single byte), and then Unicode came along, introducing the concept of code points.
With code points, each character is assigned a code point identified by a generic number; that is, the code point is just a number, not specifically a 16-bit number.
Unicode also defined encodings to encode code points into bytes. UCS-2 is a fixed-length encoding that encodes code point numbers as 16-bit numbers. What happens to code points that need more than 16 bits? Simple: they cannot be encoded in UCS-2.
When translating from an encoding that supports a specific code point to one that doesn't, the character is replaced with a designated replacement character, usually the question mark.
So if I get a transmission in UTF-16 containing the Hebrew character aleph ('א') and translate it to, say, Latin-1, which has no such character (formally, Latin-1 has no code point to represent the Unicode code point U+05D0), I'll get a question mark character '?' instead.
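That round trip is easy to reproduce; a short Python illustration of the aleph example:

s = '\u05d0'                            # Hebrew aleph
print(s.encode('utf-16-le'))            # b'\xd0\x05' -- UTF-16 can represent it
print(s.encode('latin-1', 'replace'))   # b'?'        -- Latin-1 cannot, so it becomes '?'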
What is happening on the web site is exactly that: it is receiving your input just fine, but the input is being translated into an encoding that does not support all the characters in it.
Unfortunately, unlike encoding misrepresentations, which can be fixed by manually specifying the encoding of the page, there is nothing you can do on the client to fix this.
A related problem is using fonts that do not have the characters shown. In this case you'd see a blank square instead of the character. This problem can be fixed on the client by overriding the site CSS or installing appropriate fonts.
I need to send some keys to another program, and I tried keystroke in AppleScript. Everything worked well until I found that when I sent numbers to a Windows program running in a Parallels virtual machine, it didn't work; instead, it changed the position of the cursor.
Then I used the program keyboardSee to find out what was wrong, and I found that all numbers and some symbols are mapped to the keys on the numeric keypad, not the number row. So maybe NumLock is off in the virtual machine, and the keys mapped to the keypad trigger cursor-control keys instead of numbers.
I found some people saying that using key code can solve this problem, but I cannot find a complete char-to-keycode table, and I have also noticed that the key code can be different for the same character in different keyboard layouts.
So how can I solve this problem properly? That is, how can I make it always map numbers to the number row and behave the same no matter which keyboard layout is used?
I'm afraid to say that I don't think this is possible at the moment. In general, Mac apps know that Macs don't even have the same characters on their number pads that PCs do (no arrows, no Num Lock, etc.), so the distinction is meaningless. Some poorly ported Mac apps do make a distinction, though, such as World of Warcraft (although it isn't the worst port out there by any means). I believe Parallels has a Num Lock option in its menus somewhere (if I recall correctly), so you should be able to get around it that way.