Conversion to Python 3 using 2to3 (and UTF-8) [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I've been given the task of converting a bunch of code written in Python 2 to Python 3,
and the task was given with emphasis on having UTF-8 (I didn't quite comprehend the concept, but anyway...).
I've automated the conversion using 2to3, but I'm not sure whether using 2to3 achieves the goal of having UTF-8, or whether there are other parts that I should work on manually.
What is it exactly, and is it done automatically by using 2to3?
Thank you in advance.

"I was just told the importance of converting it into Python 3 due to importance of UTF-8 so that the program can work with any other language"
Whoever told you that was misinformed.
2to3 does nothing towards "having UTF-8", whatever that means. 2to3 exists to move your code from Python 2 to Python 3. Python 3 does mean you can have Unicode variable names, but I would strongly recommend against that anyway. Bad idea. Otherwise, Python 2 supports Unicode and UTF-8 perfectly well.
It seems your actual goal is not UTF-8 but translating the program into other languages, also known as internationalization, or "i18n". That's a completely different issue and has nothing to do with 2to3. Instead, you need to manually change all your text strings to gettext tokens that will be translated when rendered. See http://docs.python.org/library/gettext.html
See also http://regebro.wordpress.com/2011/03/23/unconfusing-unicode-what-is-unicode/ for more information on Unicode.
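As a rough illustration of where UTF-8 actually enters the picture in Python 3 (a sketch, not something 2to3 generates): text lives in str objects as Unicode, and UTF-8 only matters when that text is turned into bytes or read back from bytes. The sample string is arbitrary.

```python
# A str in Python 3 is a sequence of Unicode code points, not bytes.
text = "caf\u00e9"  # "café", written with an escape to keep this file ASCII-only

# Encoding is an explicit step at the I/O boundary.
data = text.encode("utf-8")

# "é" is one code point but two bytes in UTF-8.
print(len(text), len(data))

# Decoding with the same encoding restores the original text.
round_trip = data.decode("utf-8")
print(round_trip == text)
```

The same applies to files: passing encoding="utf-8" to open() is a choice the programmer makes, independent of the conversion tool.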

Related

What happens in the code when emails and websites display the � character? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I know that when emails and websites display �, something went wrong with character encoding along the way, but my understanding of exactly how encoding works is very limited - pretty much limited to knowing that (1) it makes no sense to have a string without an encoding - no such thing as "plain text," and (2) trying to merge two strings with different encodings leads to a corrupt result. But I don't really understand exactly where in the process the interpretation of the foreign character is off.
Essentially what I'm looking for is a breakdown of the process of how a foreign character that is entered into an email by a user OR a foreign character that is entered into a string in the code by the programmer ultimately gets displayed incorrectly as �. What is the original encoding? What is the outputted encoding? What happens along the way? And how should this be fixed?
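One common way the � character (U+FFFD, the replacement character) appears can be reproduced in a few lines. This is only a sketch of a single failure mode, assuming the sender wrote Latin-1 bytes and the receiver decoded them as UTF-8. Real mail and web pipelines can go wrong in other places too.

```python
original = "caf\u00e9"  # "café"

# Sender side: the text is written out as Latin-1 bytes.
latin1_bytes = original.encode("latin-1")

# Receiver side: the bytes are wrongly assumed to be UTF-8. The byte 0xE9 is
# not valid there, so a lenient decoder substitutes U+FFFD for it.
garbled = latin1_bytes.decode("utf-8", errors="replace")
print(garbled)

# The opposite mistake (UTF-8 bytes read as Latin-1) never fails; it yields
# two wrong characters in place of the accented letter.
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)
```

In both cases the fix is the same: make sure the encoding used to decode matches the one used to encode, usually by declaring it explicitly (for example in a Content-Type header).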

ANTLR 3 - Can I use it in other way around? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have in my hands a Java project that has a parser for some language.
It works very well for reading and parsing.
But I would like to know whether, using ANTLR, I can also go the other way around: given the Java object, produce the string representation of that same object in the language the parser was built for.
So if I had a CSV parser, I would like to go from the Java object to a CSV file (or a string that represents the file).
Not sure if ANTLR is the way to do that.
How can any tool know what the CSV representation of a given object is? Not even the CSV parser knows this. It can only take a CSV input and see if that matches any predefined rule.
ANTLR certainly is not a tool for that. It is a parser generator that takes a grammar of a language and creates recognition classes (a lexer and a parser at the bare minimum) out of it. That's it, and ANTLR is very good at it.
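To illustrate the point that going from an object back to text is a separate, hand-written step rather than something a parser generator provides, here is a small sketch. It is in Python rather than Java, and the Person class and to_csv() function are hypothetical names chosen for the example.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class Person:
    # Hypothetical record type; the serializer must know its layout.
    name: str
    age: int


def to_csv(people):
    """Serialize Person objects to CSV text: a header row, then one row each."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([field.name for field in fields(Person)])
    for person in people:
        writer.writerow(astuple(person))
    return buffer.getvalue()


csv_text = to_csv([Person("Ada", 36), Person("Linus", 54)])
print(csv_text)
```

The mapping from fields to columns lives in to_csv(), not in any grammar, which is why a recognizer alone cannot produce it.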

Issue with UTF-8 encoding in notepad++ [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I am having a problem with utf-8 characters displaying correctly when being viewed with Notepad++.
I am viewing a list of geographic locations downloaded from:
http://www.world-gazetteer.com/wg.php?x=1322834778&men=stdl&lng=en&des=wg&srt=npan&col=adhoq&msz=1500
I have already set Encoding -> Encode in UTF-8.
An example of a display problem is the city "H̨alīmābād". I see it as H, then a square character, then alīmābād. However, if I copy and paste from Notepad++ to, say, this text area, the city name shows up properly.
I've tried Googling around but most of the answers are to set the encoding to utf8 in the editor which, as I mentioned earlier, I have already done.
If anyone could suggest how to fix this issue I would very much appreciate it. Thanks much!
In your example, the first visible letter is encoded as the letter H followed by a combining ogonek: code point U+0048 followed by U+0328. Your other accented letters are each encoded by a single code point, e.g. U+012B for "latin small letter I with macron".
You might care to read the Unicode FAQ on Characters and Combining Marks. The question with the example of an "X with circumflex by use of X with a combining circumflex" is equivalent to your situation. You'll note that it says "Your problem is most likely a limitation of the layout engine and/or font you are using". As such, the first thing you might want to try is seeing whether you can view the file using a different font.
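The difference between the two kinds of accented letters can be checked with Python's standard unicodedata module. This is a small sketch; the variable names are arbitrary.

```python
import unicodedata

# "H" followed by a combining ogonek, as in the city name from the question.
h_ogonek = "H\u0328"
names = [unicodedata.name(ch) for ch in h_ogonek]
print(names)

# NFC normalization composes sequences when a precomposed code point exists.
# "i" + combining macron has one (U+012B); "H" + ogonek does not, so it stays
# as two code points and the font/layout engine has to combine them.
i_macron = unicodedata.normalize("NFC", "i\u0304")
h_normalized = unicodedata.normalize("NFC", h_ogonek)
print(len(i_macron), len(h_normalized))
```

This is why only the first letter renders badly: the editor's font has to position the combining mark itself, while the other letters are ordinary single glyphs.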

Writing a code beautifier [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I'd like to write a code beautifier and I thought of using Ruby to do it. Could someone show me a place to get started? I've seen a lot of code beautifiers online, but I've never come across any tutorials on how to write one. Is this a very challenging task for someone who has never undertaken a project such as writing a compiler or parser before?
(Is there another language that would be better suited to this kind of task, excluding C/C++?)
Python has an interesting feature - it exposes its own parser to scripts. There are examples that use the AST - abstract syntax tree - and do the pretty printing.
I'm not aware that Ruby exposes its own parser to its scripts in such a way, but there are parsers for Ruby written in Ruby here.
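As a minimal sketch of the Python approach mentioned above, the standard ast module can parse source into a tree and, on Python 3.9 or later, regenerate source from it with ast.unparse. The input string here is made up.

```python
import ast

# Deliberately cramped input for the sketch.
messy_source = "def add(a,b):return a+b"

# Parse into an abstract syntax tree, then regenerate source from the tree.
tree = ast.parse(messy_source)
pretty_source = ast.unparse(tree)  # ast.unparse requires Python 3.9+
print(pretty_source)
```

Note that the AST does not keep comments or the original layout, so a real beautifier built this way would need extra work to preserve them.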
Well... I think the initial steps are what you'd do for any project.
Write a list of requirements.
Describe a user interface to your program, that you like and won't prevent you meeting those requirements.
Now you can write down more of a "code" design, and pick the language that would be easiest for you to meet that design.
Here's some requirements off the top of my head:
Supports code beautifying of these languages: Ruby, Python, Perl
Output code behaves identically to input
Output has consistent use of tabs/spaces
Output has consistent function naming convention
Output has consistent variable naming convention
Output has matching braces and indentation
Make as many as you want, it's your program. ;p I was kidding about the Perl, but I think every language you support is going to add more work.

Should command line options in POSIX-style operating systems be underscore style? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Should the name of command line options for a program in a POSIX-style operating system be underscore-style, like
--cure_world_hunger
or maybe some other style?
--cureworldhunger
--cure-world-hunger
--cureWorldHunger
What's most common? What's better style? What's more Bash-friendly (if such a thing exists)?
Underscore is not a good idea: sometimes it gets "eaten" by a terminal border and thus looks like a space.
The easiest to read, and most standard way is to use a dash:
--cure-world-hunger
Always hyphens! Let's cite a reputable reference, the GNU style guide:
GNU adds long options to these conventions. Long options consist of
‘--’ followed by a name made of alphanumeric characters and dashes.
Option names are typically one to three words long, with hyphens to
separate words. Users can abbreviate the option names as long as the
abbreviations are unique.
Another problem with underscores is that if the documentation is linked in an HTML document, the link underline will hide the underscore and confuse the user.
The double-dash prefix is a GNU convention, I believe. Check out the getopt_long(3) man page on the GNU/Linux operating system.
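For what it's worth, the hyphenated style is also what Python's standard argparse module expects. A minimal sketch follows; the program name "demo" and the option are taken from the question purely for illustration.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
# The option is spelled with hyphens on the command line; argparse stores it
# under the attribute name cure_world_hunger (hyphens become underscores).
parser.add_argument("--cure-world-hunger", action="store_true",
                    help="hypothetical flag used only for this example")

full = parser.parse_args(["--cure-world-hunger"])
print(full.cure_world_hunger)

# Unique abbreviations of long options are accepted by default,
# matching the behaviour described in the GNU text quoted above.
short = parser.parse_args(["--cure"])
print(short.cure_world_hunger)
```

So users type hyphens while the code still gets a valid identifier, which removes the main practical argument for underscores.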
