InstallShield 2011 error -7185 importing Japanese strings into the string table of a Basic MSI project - UTF-8

I am trying to import Japanese strings into my "Basic MSI" project. It used to work without any issues, but now when I try to import some Japanese strings from a text file it throws the following error (I have changed some of the personal data in the error message):
ISDEV : error -7185: The Japanese: 日本語 translation for string identifier IDS_XXXX_1111 includes characters that are not available on code page 932.
I think some of the characters inside IDS_XXXX_1111 are not part of code page 932. How can I detect those characters using some tool?
Also, the documentation mentions changing some encoding settings to UTF-8 in InstallShield 2011; if you are aware of this, please guide me.
Thanks in advance
Rahul

My favorite way to detect such characters is with Python. For example, reading a file like the InstallShield string tables in Python 2.x:
import codecs

# the exported string table is assumed to be UTF-16 text
strings = codecs.open("strings.txt", "r", "UTF-16")
for line in strings.readlines():
    line = line.strip()
    try:
        line.encode("cp932")
    except UnicodeError:
        # report the line, substituting '?' for characters code page 932 can't represent
        print "Can't encode: " + line.encode("cp932", "replace")
Your alternatives are to pinpoint the characters that cannot be represented on the relevant code page and replace them with ones that can, or to go to the Releases view and select yes for the Build UTF-8 Database setting.

Related

AWS SAM throws UnicodeEncodeError when invoking NodeJS 12.x lambda function [duplicate]

What could be causing this error when I try to insert a foreign character into the database?
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)
And how do I resolve it?
Thanks!
I ran into this same issue when using the Python MySQLdb module. Since MySQL will let you store just about any binary data you want in a text field regardless of character set, I found my solution here:
Using UTF8 with Python MySQLdb
Edit: Quote from the above URL to satisfy the request in the first comment...
"UnicodeEncodeError:'latin-1' codec can't encode character ..."
This is because MySQLdb normally tries to encode everything to latin-1.
This can be fixed by executing the following commands right after
you've established the connection:
db.set_character_set('utf8')
dbc.execute('SET NAMES utf8;')
dbc.execute('SET CHARACTER SET utf8;')
dbc.execute('SET character_set_connection=utf8;')
"db" is the result of MySQLdb.connect(), and "dbc" is the result of
db.cursor().
Character U+201C Left Double Quotation Mark is not present in the Latin-1 (ISO-8859-1) encoding.
It is present in code page 1252 (Western European). This is a Windows-specific encoding that is based on ISO-8859-1 but which puts extra characters into the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it's an annoying but now-standard web browser behaviour that if you serve your pages as ISO-8859-1, the browser will treat them as cp1252 instead. However, they really are two distinct encodings:
>>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
UnicodeEncodeError
>>> u'He said \u201CHello\u201D'.encode('cp1252')
'He said \x93Hello\x94'
If you are using your database only as a byte store, you can use cp1252 to encode “ and other characters present in the Windows Western code page. But other Unicode characters that are not present in cp1252 will still cause errors.
You can use encode(..., 'ignore') to suppress the errors by getting rid of the characters, but really in this century you should be using UTF-8 in both your database and your pages. This encoding allows any character to be used. You should also ideally tell MySQL you are using UTF-8 strings (by setting the database connection and the collation on string columns), so it can get case-insensitive comparison and sorting right.
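For comparison, UTF-8 has no trouble with the same characters (continuing the interactive example above):
>>> u'He said \u201CHello\u201D'.encode('utf-8')
'He said \xe2\x80\x9cHello\xe2\x80\x9d'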
The best solution is to set MySQL's character set to UTF-8 and, as KyungHoon Kim suggests in the comments, add use_unicode=True and charset="utf8" to the connection call:
db = MySQLdb.connect(host="localhost", user="root", passwd="", db="testdb", use_unicode=True, charset="utf8")
For details, see the MySQLdb Connection class docstring:
class Connection(_mysql.connection):
"""MySQL Database Connection Object"""
default_cursor = cursors.Cursor
def __init__(self, *args, **kwargs):
"""
Create a connection to the database. It is strongly recommended
that you only use keyword parameters. Consult the MySQL C API
documentation for more information.
host
string, host to connect
user
string, user to connect as
passwd
string, password to use
db
string, database to use
port
integer, TCP/IP port to connect to
unix_socket
string, location of unix_socket to use
conv
conversion dictionary, see MySQLdb.converters
connect_timeout
number of seconds to wait before the connection attempt
fails.
compress
if set, compression is enabled
named_pipe
if set, a named pipe is used to connect (Windows only)
init_command
command which is run once the connection is created
read_default_file
file from which default client values are read
read_default_group
configuration group to use from the default file
cursorclass
class object, used to create cursors (keyword only)
use_unicode
If True, text-like columns are returned as unicode objects
using the connection's character set. Otherwise, text-like
columns are returned as normal strings. Unicode objects will
always be encoded to the connection's character set regardless
of this setting.
charset
If supplied, the connection character set will be changed
to this character set (MySQL-4.1 and newer). This implies
use_unicode=True.
sql_mode
If supplied, the session SQL mode will be changed to this
setting (MySQL-4.1 and newer). For more details and legal
values, see the MySQL documentation.
client_flag
integer, flags to use or 0
(see MySQL docs or constants/CLIENTS.py)
ssl
dictionary or mapping, contains SSL connection parameters;
see the MySQL documentation for more details
(mysql_ssl_set()). If this is set, and the client does not
support SSL, NotSupportedError will be raised.
local_infile
integer, non-zero enables LOAD LOCAL INFILE; zero disables
autocommit
If False (default), autocommit is disabled.
If True, autocommit is enabled.
If None, autocommit isn't set and server default is used.
There are a number of undocumented, non-standard methods. See the
documentation for the MySQL C API for some hints on what they do.
"""
I hope your database is at least UTF-8. Then you will need to run yourstring.encode('utf-8') before you try putting it into the database.
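For example, a minimal Python 2 sketch using the same sample string as the earlier answers:
title = u'He said \u201CHello\u201D'
encoded = title.encode('utf-8')   # a byte string of UTF-8 data, safe to send to a UTF-8 database
print repr(encoded)               # 'He said \xe2\x80\x9cHello\xe2\x80\x9d'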
Use the snippet below to strip accents from Latin text, so accented characters fall back to their plain ASCII base letters:
import unicodedata

def strip_accents(text):
    # decompose each character, then drop the combining marks (category 'Mn')
    return "".join(char for char in unicodedata.normalize('NFKD', text)
                   if unicodedata.category(char) != 'Mn')

strip_accents('áéíñóúü')
output:
'aeinouu'
You are trying to store the Unicode code point \u201c using an encoding, ISO-8859-1 / Latin-1, that cannot represent that code point. Either alter the database to use UTF-8 and store the string data with an appropriate encoding, or sanitise your inputs before storing the content, e.g. by following something like Sam Ruby's excellent i18n guide. That guide discusses the issues that windows-1252 can cause, suggests how to process them, and links to sample code.
SQLAlchemy users can simply specify their field as convert_unicode=True.
Example:
sqlalchemy.String(1000, convert_unicode=True)
SQLAlchemy will simply accept unicode objects and return them back, handling the encoding itself.
Docs
Latin-1 (aka ISO 8859-1) is a single octet character encoding scheme, and you can't fit \u201c (“) into a byte.
Did you mean to use UTF-8 encoding?
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' in position 106: ordinal not in range(256)
Solution 1:
Look up \u2013 to find out which character is actually causing this error. Then you can replace that specific character in the string with some other character that is part of the encoding you are using.
Solution 2:
Encode the string with an encoding that includes all of the characters in your string; then you can print that string and it will work just fine.
The code below changes the encoding of the string (borrowed from bobince's answer above):
u'He said \u201CHello\u201D'.encode('cp1252')
The latest version of mysql.connector only has
db.set_charset_collation('utf8', 'utf8_general_ci')
and NOT
db.set_character_set('utf8')  # this method is not available in mysql.connector
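A minimal sketch of how that might look with mysql.connector (the connection parameters below are placeholders, not part of the original answer):
import mysql.connector

# placeholder credentials for illustration only
db = mysql.connector.connect(host="localhost", user="root",
                             password="", database="testdb")

# mysql.connector's replacement for MySQLdb's set_character_set()
db.set_charset_collation('utf8', 'utf8_general_ci')

cursor = db.cursor()
cursor.execute("SELECT %s", (u'He said \u201cHello\u201d',))
print(cursor.fetchone()[0])
db.close()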
I ran into the same problem when I was using PyMySQL. I checked the package version; it was 0.7.9.
Then I uninstalled it and reinstalled PyMySQL 1.0.2, and the issue was solved.
pip uninstall PyMySQL
pip install PyMySQL
Python: You will need to add
# -*- coding: UTF-8 -*-
to the first line of the Python file, and then call .encode('ascii', 'xmlcharrefreplace') on the text you want to encode. This replaces every non-ASCII Unicode character with an XML character reference, leaving pure ASCII.
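For example, a quick Python 2 sketch using the quotation marks from the error above:
# -*- coding: UTF-8 -*-
text = u'He said \u201CHello\u201D'
# every non-ASCII character becomes a decimal XML character reference
print text.encode('ascii', 'xmlcharrefreplace')   # He said &#8220;Hello&#8221;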

Can't import product names/descriptions with special characters

I'm trying to import a product CSV which has Bulgarian product names/descriptions (using the standard import under import/export->import)
The only way I've been able to import any so far is by wrapping them in quotes or by putting roman characters in front of the Bulgarian.
e.g. 'Ламинирани ПДЧ' or xxx Ламинирани ПДЧ
Without adding these characters it outputs the error: Required attribute 'name' has an empty value in rows: 1
It seems like the Bulgarian is being stripped out completely. My file is encoded as UTF-8 and I've also set the default charset to UTF-8 in the .htaccess file.
Is it possible to import the Bulgarian without quotes/roman characters?
I haven't personally tried it, but I have seen this answer that seems to work.
Try saving your CSV file through LibreOffice Calc with UTF-8 encoding and try the import again.
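If you want to verify what encoding the exported file actually uses, a small Python check can help. This is only a sketch: products.csv is a placeholder name, and the cp1251 fallback is an assumption for a Bulgarian file saved by Excel on Windows.
try:
    # if this succeeds, the file really is UTF-8
    with open("products.csv", encoding="utf-8") as f:
        f.read()
    print("File is valid UTF-8")
except UnicodeDecodeError:
    # otherwise re-save it as UTF-8, assuming it was exported as Windows-1251
    with open("products.csv", encoding="cp1251") as f:
        data = f.read()
    with open("products-utf8.csv", "w", encoding="utf-8") as f:
        f.write(data)
    print("Re-encoded from cp1251 to products-utf8.csv")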

How do I prevent Turkish letters from dropping when using UIFont in cocos2d?

I'm doing the following to create a label that I use as part of attribution for a photo:
CCLabelTTF *imageSourceLabel = [CCLabelTTF labelWithString:[_organism imageSource] fontName:[[UIFont systemFontOfSize:12] fontName] fontSize:12];
Several of the image sources include Turkish letters. For example, in this URL:
http://commons.wikimedia.org/wiki/File:Şahlûr-33.jpg
This displays improperly in my iPad app; the Turkish letters are missing.
How do I create a label that will work with text like that in the URL above?
Edit:
Nevermind... the problem is with exporting from Excel. See the comments on the answer below. This link provides some additional information: Excel to CSV with UTF8 encoding
Additional Edit:
Actually, it's still a problem, even after I export correctly and verify that I have the proper UTF-8 (or is it 16?) letters in the CSV file. For example, this string:
Dûrzan cîrano / CC BY-SA 3.0
is displayed incorrectly (screenshot of the garbled output omitted), and this string:
Christian Mehlführer / CC-BY 2.5
is also displayed incorrectly (screenshot omitted).
It's definitely being processed improperly upon import, as CCLOG generates the following:
Photo Credit: Dûrzan cîrano / CC BY-SA 3.0
More Info:
Upon import, I'm storing the following value as a string in an array:
"Christian Mehlf\U00c3\U00bchrer / CC-BY 2.5"
Wikipedia says the UTF-8 value for ü, in hex, is C3 BC. It looks like the c3bc is in there, but masked as \U00c3\U00bc.
Is there any way to convert this properly? Or is something fundamentally broken at the CSV import level?
The solution is below.
There were several problems:
Excel on the Mac doesn't export UTF-8 properly. The solution I used was to paste the data into Google Spreadsheet and export from there. More info here: Excel to CSV with UTF8 encoding
I realized that once I had the proper data in the CSV file, that I was importing it with the improper settings. I'm using parseCSV and needed to set _encoding in the -init method to NSUTF8StringEncoding instead of the default, NSISOLatin1StringEncoding.
if you try this:
[CCLabelTTF labelWithString:[[_organism imageSource] stringByUnescapingHTML] fontName:[[UIFont systemFontOfSize:12] fontName] fontSize:12];
it will likely work better. I suspect your URL string is escaped HTML.

Extended charset chars not recognized and converted to ? mark

I have a string containing a special char like "\u2012", i.e. FIGURE DASH. When I try to print it on the console I get a '?' mark instead of the symbol. I have an editor in which I can insert the symbol using Alt+numpad, like Alt+2012. In the editor I can see the symbol, but when I save it in an XML file and read the value using nodeValue, I get a '?' mark.
To summarize, I am having trouble reading extended Latin-A charset characters. What I need is: when I insert such symbols and read them back, I should get something like &#xXXXX;.
Please help!
TIA :)
Put simply, I have a String inpath = "À"; and I want to get its Unicode value, like &#xXXXX;.
The default console encoding in Windows is some MS-DOS code page, and those don't support the character. You can try running chcp 65001 before running the program, but you might also need to change the console font.
You don't need to do anything you wouldn't do with any other character, as long as you use UTF-8. You aren't doing that in several places: you need to explicitly tell your code to save and read the file as UTF-8, and not rely on the platform default encoding.
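The question doesn't say which language is involved, but as a language-neutral illustration, here is a Python sketch of both points: read the file with an explicit UTF-8 encoding, and turn non-ASCII characters into &#xXXXX;-style references (input.xml and output.xml are placeholder paths):
def to_hex_refs(text):
    # replace every non-ASCII character with an &#xXXXX; numeric reference
    return "".join(ch if ord(ch) < 128 else "&#x%04X;" % ord(ch) for ch in text)

with open("input.xml", encoding="utf-8") as f:        # read explicitly as UTF-8
    data = f.read()

with open("output.xml", "w", encoding="ascii") as f:  # pure-ASCII output
    f.write(to_hex_refs(data))

print(to_hex_refs("À"))        # &#x00C0;
print(to_hex_refs("\u2012"))   # &#x2012;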

Ruby CSV parsing from Excel with multilingual document

I have a CSV document exported from Excel that contains both English and non-English (Russian) letters.
I've managed to open it with
CSV.open #tmp, "rb:ISO-8859-1", {col_sep: ";"}
but it reads the Russian symbols as \xCE\xF1\xF2\xE0\xEB\xFC\xED\xFB\xE5 \xE7\xE0\xEF\xF7 etc.
I've tried "rb:ISO-8859-1:UTF-8" but I get "ArgumentError: invalid byte sequence in UTF-8", the same as CSV.open run without a mode.
How can this be fixed? Also, how can I find the options for the 'mode' argument? I couldn't work out from the docs where they are described.
The main environment is an Ubuntu server, if it matters.
Try using this format:
r:ISO-8859-15:UTF-8
