Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I am having a problem with utf-8 characters displaying correctly when being viewed with Notepad++.
I am viewing a list of geographic locations downloaded from:
http://www.world-gazetteer.com/wg.php?x=1322834778&men=stdl&lng=en&des=wg&srt=npan&col=adhoq&msz=1500
I have already set Encoding -> Encode in UTF-8.
An example of a display problem is the city "H̨alīmābād". I see it as H, then a square character, then alīmābād. However, if I copy and paste from Notepad++ to, say, this text area, the city name shows up properly.
I've tried Googling around but most of the answers are to set the encoding to utf8 in the editor which, as I mentioned earlier, I have already done.
If anyone could suggest how to fix this issue I would very much appreciate it. Thanks much!
In your example, the first visible letter is encoded as the letter H followed by a combining ogonek: code point U+0048 followed by U+0328. Your other accented letters are each encoded as a single code point, e.g. U+012B for the "latin small letter I with macron".
You might care to read the Unicode FAQ on Characters and Combining Marks. The question with the example of an "X with circumflex by use of X with a combining circumflex" is equivalent to your situation. You'll note that it says "Your problem is most likely a limitation of the layout engine and/or font you are using". As such, the first thing you might want to try is viewing the file with a different font.
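If you want to check which code points a name actually contains, a minimal Python sketch along these lines can help. The helper name `describe` is made up for illustration; it only relies on the standard `unicodedata` module. It also shows why normalizing the text will not rescue this particular letter: NFC can compose "i" plus a combining macron into one code point, but Unicode has no precomposed "H with ogonek", so that pair always stays as two code points and depends on the font and layout engine.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# First visible letter of the city name: H followed by a combining ogonek.
h_ogonek = "H\u0328"
# The "i with macron" from the same name, stored as one precomposed code point.
i_macron = "\u012B"

for code_point, name in describe(h_ogonek + i_macron):
    print(code_point, name)

# NFC composes a base letter and a combining mark only when a precomposed
# character exists. That is the case for i + macron, but not for H + ogonek.
composed_i = unicodedata.normalize("NFC", "i\u0304")
composed_h = unicodedata.normalize("NFC", h_ogonek)
print(len(composed_i), len(composed_h))
```

If the second number printed on the last line is 2, the editor has to render a base letter plus a combining mark, which is where a font without good combining-mark support shows a square.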
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 1 year ago.
I want to include accented characters in a Sphinx heading such that I will see the following result:
Finalé
I have tried the following strategies:
Use the unicode character directly such as:
Finalé
======
but the character is considered invalid by Docutils.
Also, I have tried using the strip_html specification in conf.py along with the following text:
Finalé
==============
But the text is left unchanged and the HTML entity is passed through.
Can anyone suggest a way of handling accented characters in Sphinx that will work in all situations, whether the final output is HTML or LaTeX?
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I know that when emails and websites display �, something went wrong with character encoding along the way, but my understanding of exactly how encoding works is very limited - pretty much limited to knowing that (1) it makes no sense to have a string without an encoding - no such thing as "plain text," and (2) trying to merge two strings with different encodings leads to a corrupt result. But I don't really understand exactly where in the process the interpretation of the foreign character is off.
Essentially what I'm looking for is a breakdown of the process of how a foreign character that is entered into an email by a user OR a foreign character that is entered into a string in the code by the programmer ultimately gets displayed incorrectly as �. What is the original encoding? What is the outputted encoding? What happens along the way? And how should this be fixed?
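As a rough illustration of where things typically go wrong, here is a small Python sketch. It assumes the usual culprit, a mismatch between the encoding used to write the bytes and the one used to read them back; the sample word and variable names are made up for the example. Text encoded as Latin-1 and then decoded as UTF-8 produces an invalid byte sequence, and a lenient decoder substitutes U+FFFD (�) for it. The opposite mismatch produces the other familiar symptom, "Ã©" style garbage.

```python
original = "café"

# The same text turned into bytes with two different encodings.
utf8_bytes = original.encode("utf-8")      # é becomes two bytes: C3 A9
latin1_bytes = original.encode("latin-1")  # é becomes one byte: E9

# Mismatch 1: UTF-8 bytes read as Latin-1. Every byte is valid Latin-1,
# so nothing fails, but é shows up as two wrong characters.
mojibake = utf8_bytes.decode("latin-1")

# Mismatch 2: Latin-1 bytes read as UTF-8. The lone byte E9 is not valid
# UTF-8, so a lenient decoder replaces it with U+FFFD.
replaced = latin1_bytes.decode("utf-8", errors="replace")

# No mismatch: decoding with the encoding that was used to encode.
round_trip = utf8_bytes.decode("utf-8")

print(mojibake)
print(replaced)
print(round_trip)
```

Once the byte has been replaced with U+FFFD the original character is gone, so the fix has to happen earlier: declare the encoding wherever bytes cross a boundary (file, HTTP header, email header, database connection) and decode with that same encoding.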
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I've been cutting and pasting some code snippets from an Amazon Kindle eBook into a text editor (JetBrains PhpStorm), and apparently each time it comes with some extended (>127) ASCII characters.
Is there a simple command-line sed/awk/tr command, or a simple OS X app, to strip them out?
Thanks to this blog post, here is a solution that worked well for me:
tr -cd '\11\12\15\40-\176' < infile > outfile
Note that if you get this error: tr: Illegal byte sequence, this can be solved by setting LANG=C via:
export LANG=C
(not sure why setting LANG=C helps, but that's what others with the same problem were doing)
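If tr keeps complaining, the same filter can be sketched in Python, which works on raw bytes and so does not care about the locale. This is only an equivalent of the tr character set above (octal 11, 12, 15 and 40 through 176, i.e. tab, line feed, carriage return and printable ASCII); the function name `strip_non_ascii` is made up for the example.

```python
# Bytes to keep, matching the tr set '\11\12\15\40-\176':
# tab (0x09), line feed (0x0A), carriage return (0x0D), and 0x20 through 0x7E.
KEEP = frozenset([0x09, 0x0A, 0x0D]) | frozenset(range(0x20, 0x7F))


def strip_non_ascii(data: bytes) -> bytes:
    """Drop every byte that is not tab, LF, CR, or printable ASCII."""
    return bytes(b for b in data if b in KEEP)


# Example: curly quotes pasted from an eBook are multi-byte in UTF-8
# and disappear entirely, leaving only the plain ASCII around them.
sample = "echo \u201chi\u201d\n".encode("utf-8")
print(strip_non_ascii(sample))
```

To process a file, read it in binary mode, pass the contents through `strip_non_ascii`, and write the result back in binary mode. Note that, like the tr command, this deletes the offending characters rather than replacing curly quotes with straight ones.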
Plain Clip has always been my go-to OS X app for stripping out unwanted characters/whitespace/etc.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I've been given the task of converting a bunch of code written in Python 2 to Python 3, with an emphasis on "having UTF-8" (I didn't quite comprehend the concept, but anyway).
I've automated the conversion using 2to3, but I'm not sure whether 2to3 achieves the goal of having UTF-8, or whether there are other parts that I should work on manually.
What is it exactly, and is it done automatically by using 2to3?
Thank you in advance.
"I was just told the importance of converting it into Python 3 due to importance of UTF-8 so that the program can work with any other language"
Whoever told you that was misinformed.
2to3 does not do anything towards "having UTF-8", whatever that means. 2to3 is for moving your code from Python 2 to Python 3. Python 3 does mean you can have Unicode variable names, but I would strongly recommend against that anyway. Bad idea. Otherwise Python 2 supports Unicode and UTF-8 perfectly well.
It seems your actual goal is not UTF-8, but translating the program into other languages, also known as internationalization, or "i18n". That's a completely different issue, and has nothing to do with 2to3. Instead you need to manually change all your text strings to gettext tokens that will be translated when rendered. See http://docs.python.org/library/gettext.html
See also http://regebro.wordpress.com/2011/03/23/unconfusing-unicode-what-is-unicode/ for more information on Unicode.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
What is the best Windows program to print out source code (more generally, text files)? I'd like the following features:
Includes line numbers
Option of printing 2 or 4 pages on a single sheet of paper.
Header includes filename and timestamp.
Notepad++ is an excellent tool for this (and it's free!). You can print the code out both in normal text, as well as marked-up with colour as you see it on the screen!
I tried the suggested Notepad++ and Codex, but I find them too limiting.
I could not print two columns per sheet in either one of them.
I like to maximize the amount of code per sheet.
A decade ago I would use pcps to print multiple columns of source code, but that software is just too old and cumbersome in this day and age.
For now, I would suggest this, if you want multi column output: http://www.lerup.com/printfile/
UltraEdit works pretty well for all three of those.
www.ultraedit.com
I use Context for most of my non-Visual Studio development, and it does what you asked for and is free. I don't know how well it does color, but the source code colors print in a couple of varying boldnesses, which makes it pretty readable in black and white.
I just use the printer dialog to set the multi-pages per sheet option.
I'm using Codex: http://www.snapfiles.com/get/codex.html
Works pretty good, can both print and publish (export to HTML).
Crimson Editor looks great too!