Why is this displaying weird characters on the console? - ruby

Take this text for example:
the three umlauts are ä, ö, and ü.
Let's assume they are in a text file, which I'm reading like this:
data = File.read("umlauts.txt")
Now, if I try to output them, I get this:
the three umlauts are Σ, ÷, and ⁿ.
If I write the text to a file, it comes out correctly. How can I make it show up properly in a Windows command prompt? I'm using Ruby 1.8.6, and I want to be able to do quick debugging from the command prompt.

What encoding is the file in? I'm guessing UTF-8. The Windows command prompt does not use UTF-8 by default.
Here's a good article that covers this: http://illegalargumentexception.blogspot.com/2009/04/i18n-unicode-at-windows-command-prompt.html

Maybe set a different code page for cmd?
For an explanation of encodings, read this.
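If you only need readable debug output, one option is to convert the string to the console's code page before printing it. Here is a minimal sketch for Ruby 1.8.6; it assumes the file is Latin-1 (ISO-8859-1) and the console uses code page 437 (the mangled output above is what CP437 makes of Latin-1 bytes; check yours with chcp), so adjust both names to match your setup:

require 'iconv'

data = File.read("umlauts.txt")
# Convert from the file's encoding to the console's code page before printing.
puts Iconv.conv("CP437", "ISO-8859-1", data)

If the source encoding you pass in is wrong, Iconv raises Iconv::IllegalSequence, which is itself a useful clue. Alternatively, running chcp 1252 in the console before starting the script (with a TrueType console font such as Lucida Console) may be enough, since CP-1252 is nearly a superset of Latin-1.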

Related

RStudio: keeping special characters in a script

I wrote a script with German special characters e.g. ü.
However, whenever I close R and reopen the script the characters are substituted:
Before "für"; "hinzufügen"; "Ø" - After "für"; "hinzufügen"; "Ã".
I tried to remedy it using Save with Encoding and choosing UTF-8, as stated here, but it did not work.
What am I missing?
You don't say what OS you're using, but this kind of thing really only happens on Windows nowadays, so I'll assume that.
The problem is that Windows has a local encoding that is not UTF-8. It is commonly something like Latin1 in English-speaking countries. I'm not sure what encoding people use in German-speaking countries, if that's where you are. From the junk you saw, it looks as though you saved the file in UTF-8, then read it using your local encoding. The encodings for writing and reading have to match if you want things to work.
In RStudio you can try "Reopen with encoding..." and specify UTF-8, and you'll probably get your original back, as long as you haven't saved it after the bad read. If you did that, you've got a much harder cleanup to do.
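To illustrate the mismatch described above, here is a tiny Ruby sketch (Ruby 1.9+, purely for demonstration; RStudio does the equivalent when it reads a UTF-8 file using a Latin-1 locale):

text = "für".dup                              # text saved as UTF-8
misread = text.force_encoding("ISO-8859-1")   # the same bytes re-read as Latin-1
puts misread.encode("UTF-8")                  # prints "für"

Reading the bytes back with the encoding they were written in is the only real fix; the demo just shows where the extra characters come from.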

In Windows, how can I restore original characters to a csv file that have been replaced with hex ASCII?

I have a CSV file that contains names like O'Brien that appear as O%27Brien, or names like St. Something that appear as St%2ESomething. I don't have access to generate a new CSV of this data, and I need the names in the correct format because I'm writing a PowerShell script to search for the names on another server.
I tried implementing something similar to the answer to this but I can't get it to work for the problem I'm experiencing.
It doesn't matter to me if the solution is in PowerShell as long as I can run it on Windows 7.
Use the Uri.UnescapeDataString method, which you can call from PowerShell like this:
PS> [Uri]::UnescapeDataString("O%27Brien")
O'Brien
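If you would rather do the clean-up in Ruby (in the spirit of the rest of this thread), here is a rough sketch; the file names are made up, and CGI.unescape is applied only to the %XX sequences so that any literal "+" signs in the data are left alone:

require 'cgi'

raw = File.read("names.csv")
# Decode every percent-escaped byte (%27 -> ', %2E -> ., and so on).
decoded = raw.gsub(/%[0-9A-Fa-f]{2}/) { |seq| CGI.unescape(seq) }
File.open("names_decoded.csv", "w") { |f| f.write(decoded) }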

How can I make Visual Studio build programs with 1252 encoding?

I have a solution in Visual Studio, and my program's text is in Brazilian Portuguese.
Every time I compile and run it, it simply doesn't show the characters I wrote.
Example:
#include <stdio.h>

int main(void) {
    printf("áéíóúàèìòù\n");
    return 0;
}
It simply shows something really strange.
However, when I redirected the output to a file it came out correctly, so I think the problem is in cmd.
I then searched for what might be causing the problem, and the results mostly pointed to the code page cmd uses.
I finally tried chcp 1252, but it doesn't seem to work for me, so here I am. Does anyone know what code page I should use, or what I can do to the source file so it shows the right output? Thanks in advance.
I'm assuming C++.
The reason is that the file is saved with UTF-8 encoding, and the string literals are treated as a sequence of bytes.
So if you have "é" in your source code, it's stored as the bytes "\xC3\xA9", and displayed in CP-437 (the default Western encoding for the Windows command prompt) it comes out as ├⌐.
Solution: either:
save your source files in some 8-bit encoding (for example CP-1252), change the default encoding in VS, and set the terminal to use the same encoding,
or change your terminal to something that supports UTF-8, like Cygwin.

Does the Windows console support ANSI?

Does the Windows console support ANSI control characters?
It doesn't support many ANSI control characters by default (which is also mentioned in the Wikipedia article http://en.wikipedia.org/wiki/ANSI_escape_code), but there are ways to make that possible.
Look into the answers to this question: How to load ANSI escape codes or get coloured file listing in WinXP cmd shell?
You might happen upon something useful.
I assume you're referring to ASCII control characters.
The answer is "some". You can read backspace keypresses, for example, and you can pipe-in things like the ASCII "Bell" character.
However if you mean that the Windows console automatically resolves escaped characters, such as converting "\b" into "Bell", then no, you have to do that yourself.
Note that I speak about entering keypresses directly into the console and not batch files, for that see #ProblemFactory's answer.
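A quick way to check what your particular console does is to print a couple of escape sequences and see whether you get formatting or literal garbage. This is just a probe, not a fix; on a stock pre-Windows-10 cmd.exe you will usually see the raw characters unless a helper such as ANSICON is installed. The example uses Ruby like the rest of this thread, but any language that can write to stdout will do:

puts "\e[31mThis line should be red\e[0m"
puts "\e[1mThis line should be bold\e[0m"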

Testing for extended characters in watir-webdriver

I need to check for text with extended character set characters in my watir-webdriver scripts.
For example, checking that a link has the following text:
Weiß
I read the text from a CSV file, which when edited looks like the above text.
But when running the test in Firefox, I get the following failure.
Wrong values on attribute table after add all save.
<"Wei\247"> expected but was
<"Wei\303\237>.
I tried saving it in the CSV as Wei\303\237 but the expected value then had double backslash characters.
How can I encode this in the CSV so I can check the text value safely cross platform and browser?
I had this problem, and I got around it by writing it in the spreadsheet as something like {S} and gsubbing it when I read the file into Ruby. If you gsub the text when you check the link too then basically you have your own encoding method for special characters. This is a long way around, so I'd be very interested in other answers.
The double backslash is probably because when your code reads from the CSV it escapes the backslashes in the file to preserve the text. Therefore you can't put the Unicode escape in your CSV file directly. I don't really know a way around this. I hear that Ruby's Unicode support isn't that great, but it is being improved as of 1.9.x.
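For what it's worth, here is a rough sketch of the placeholder idea from the first answer. The {S} token and the helper name are made up, and "\303\237" is just the UTF-8 byte sequence for ß written in a form that Ruby 1.8 accepts:

# Placeholder tokens used in the CSV and the real characters they stand for.
SPECIALS = { "{S}" => "\303\237" }   # {S} stands in for ß

def decode_specials(text)
  SPECIALS.inject(text) { |result, (token, char)| result.gsub(token, char) }
end

expected = decode_specials("Wei{S}")   # => "Weiß"
# Compare `expected` against the link text the same way you do now.

If you apply the same substitution to the value you read from the page (or keep everything in escaped form on both sides), the comparison stays consistent across platforms.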
