I have just started working with internationalization and i have configured my system as follows :
1 .I have specified my database character encoding to be in UTF-8
2 .Added 2 line in my.cnf
default-character-set=utf8
default-collation=utf8_general_ci
3. At each jsp page added
<%# page contentType="text/html;charset=UTF-8" language="java"%>
4.In my server.xml added URIEncoding="UTF-8"
Now my question is whenever i enter a "Number" which has variables in "int" in Bengali font its stored in English in the database . But if i enter any "String" in Bengali font its stored in the database as Bengali.
What is happening to the int type variables ?
It's not being stored "in English" in the database; it's stored as an integer using the database's own representation of integers (probably binary or binary-coded decimal depending on the database). For most cases, this is almost certainly what you want.
Converting from a String (representing a number) to an int is called 'parsing'. Some code somewhere is probably calling parse() on a NumberFormat object, and this format object knows about the user's locale (which presumably allows for however numbers are written in Bengali).
The opposite (displaying an int as a String) is called 'formatting', and this can also know about the user's locale.
Related
So I was having an issue with converting French characters correctly. Basically, I have a form which sends data to an SQL Database. Then, on another page, data from this DB is retrieved and displayed to the user. But the data (strings) were being displayed with wierd corrupt characters because the input in the form on the other page was in French. I overcame this problem by using the following function which converters a string to the correct charset. HOWEVER, obviously the better solution is to convert it FIRST and then send it to the database. Now here's the code to convert a string retrieved from a DB to the appropriate charset:
Function ConvertFromUTF8(sIn)
Dim oIn: Set oIn = CreateObject("ADODB.Stream")
oIn.Open
oIn.CharSet = "WIndows-1252"
oIn.WriteText sIn
oIn.Position = 0
oIn.CharSet = "UTF-8"
ConvertFromUTF8 = oIn.ReadText
oIn.Close
End Function
I got this function from here: Classic ASP - How to convert a UTF-8 string to UCS-2?
Now my question is, what function do I use to convert strings beforehand and then send them to the database, so that when I retrieve them they will be good-to-go?
Tried Paul's Method:
So there's page 1, and page 2. Page 1 contains a form which, when submitted, sends the string to the DB which is then retrieved in page 2. I tried Paul's solution by removing the function ConvertFromUTF8 and leaving it to as it was before (it returned wierd mangolian characters). After that, I added the following line on top of Page 1 as well as Page 2.
<%#LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>
I also have the following on both of the pages:
Response.CodePage = 65001
Response.CharSet = "UTF-8"
But it didn't work :(
Edit: it works!, thank you so much everyone for your help!
All I needed to do was add "CodePage = 65001" on top of Page 3 (which I didn't even talk about), where the writing to the DB part was happening.
Paul's answer isn't wrong but it is not the only part to consider:
You will need to go through each of these steps to make sure that you are getting consistent results;
IMPORTANT: These steps have to be performed on each and every page in your web application or you will have problems (emphasized by Paul's comment).
Each page needs to be saved using UTF-8 encoding double check this as some IDEs will default to Windows-1252 (also often misnamed as "ANSI").
Each page will need the following line added as the very first line in the page, to make this easier I put this along with some other values in an include file so I can include them in each page as I go.
Include File - page_encoding.asp
<%#Language="VBScript" CodePage = 65001 %>
<%
Response.CharSet = "UTF-8"
Response.CodePage = 65001
%>
Usage in the top of an ASP page (prefer to put in a config folder at the root of the web)
<!-- #include virtual="/config/page_encoding.asp" -->
Response.Charset = "UTF-8" is the equivalent of setting the ;charset in the HTTP content-type header.
Response.CodePage = 65001 tell's ASP to process all dynamic strings as UTF-8.
Include files in the page will also have to be saved using UTF-8 encoding (double check these also).
Follow these steps and your page will work, your problem at the moment is some pages are being interpreted as Windows-1252 while others are being treated as UTF-8 and you're ending up with a mis-match in encoding.
Normally - and that word has a veryyyyy long stretch - you do not need to convert on hand, even more it's discouraged. At the top off your asp page you write:
<%#LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>
that tell's ASP to send and to receive (from a server point of view) UTF-8. Furthermore it instructs the interpreter to use 2 byte strings. So when writing to a database or reading from a database everything goes auto-magically, so if your database uses 1 byte char or 2 byte nchar conversions are taken care of. And actually that's about it. You can test if all goes well by testing with this set:
áäÇçéčëíďńóöçÖöÚü
This set contains some 'European' but also some 'Unicode' chars... those Unicode will always fail if you use codepage 1252, so it's a nice test set.
What could be causing this error when I try to insert a foreign character into the database?
>>UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)
And how do I resolve it?
Thanks!
I ran into this same issue when using the Python MySQLdb module. Since MySQL will let you store just about any binary data you want in a text field regardless of character set, I found my solution here:
Using UTF8 with Python MySQLdb
Edit: Quote from the above URL to satisfy the request in the first comment...
"UnicodeEncodeError:'latin-1' codec can't encode character ..."
This is because MySQLdb normally tries to encode everythin to latin-1.
This can be fixed by executing the following commands right after
you've etablished the connection:
db.set_character_set('utf8')
dbc.execute('SET NAMES utf8;')
dbc.execute('SET CHARACTER SET utf8;')
dbc.execute('SET character_set_connection=utf8;')
"db" is the result of MySQLdb.connect(), and "dbc" is the result of
db.cursor().
Character U+201C Left Double Quotation Mark is not present in the Latin-1 (ISO-8859-1) encoding.
It is present in code page 1252 (Western European). This is a Windows-specific encoding that is based on ISO-8859-1 but which puts extra characters into the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it's an annoying but now-standard web browser behaviour that if you serve your pages as ISO-8859-1, the browser will treat them as cp1252 instead. However, they really are two distinct encodings:
>>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
UnicodeEncodeError
>>> u'He said \u201CHello\u201D'.encode('cp1252')
'He said \x93Hello\x94'
If you are using your database only as a byte store, you can use cp1252 to encode “ and other characters present in the Windows Western code page. But still other Unicode characters which are not present in cp1252 will cause errors.
You can use encode(..., 'ignore') to suppress the errors by getting rid of the characters, but really in this century you should be using UTF-8 in both your database and your pages. This encoding allows any character to be used. You should also ideally tell MySQL you are using UTF-8 strings (by setting the database connection and the collation on string columns), so it can get case-insensitive comparison and sorting right.
The best solution is
set mysql's charset to 'utf-8'
do like this comment(add use_unicode=True and charset="utf8")
db = MySQLdb.connect(host="localhost", user = "root", passwd = "", db = "testdb", use_unicode=True, charset="utf8") – KyungHoon Kim Mar
13 '14 at 17:04
detail see :
class Connection(_mysql.connection):
"""MySQL Database Connection Object"""
default_cursor = cursors.Cursor
def __init__(self, *args, **kwargs):
"""
Create a connection to the database. It is strongly recommended
that you only use keyword parameters. Consult the MySQL C API
documentation for more information.
host
string, host to connect
user
string, user to connect as
passwd
string, password to use
db
string, database to use
port
integer, TCP/IP port to connect to
unix_socket
string, location of unix_socket to use
conv
conversion dictionary, see MySQLdb.converters
connect_timeout
number of seconds to wait before the connection attempt
fails.
compress
if set, compression is enabled
named_pipe
if set, a named pipe is used to connect (Windows only)
init_command
command which is run once the connection is created
read_default_file
file from which default client values are read
read_default_group
configuration group to use from the default file
cursorclass
class object, used to create cursors (keyword only)
use_unicode
If True, text-like columns are returned as unicode objects
using the connection's character set. Otherwise, text-like
columns are returned as strings. columns are returned as
normal strings. Unicode objects will always be encoded to
the connection's character set regardless of this setting.
charset
If supplied, the connection character set will be changed
to this character set (MySQL-4.1 and newer). This implies
use_unicode=True.
sql_mode
If supplied, the session SQL mode will be changed to this
setting (MySQL-4.1 and newer). For more details and legal
values, see the MySQL documentation.
client_flag
integer, flags to use or 0
(see MySQL docs or constants/CLIENTS.py)
ssl
dictionary or mapping, contains SSL connection parameters;
see the MySQL documentation for more details
(mysql_ssl_set()). If this is set, and the client does not
support SSL, NotSupportedError will be raised.
local_infile
integer, non-zero enables LOAD LOCAL INFILE; zero disables
autocommit
If False (default), autocommit is disabled.
If True, autocommit is enabled.
If None, autocommit isn't set and server default is used.
There are a number of undocumented, non-standard methods. See the
documentation for the MySQL C API for some hints on what they do.
"""
I hope your database is at least UTF-8. Then you will need to run yourstring.encode('utf-8') before you try putting it into the database.
Use the below snippet to convert the text from Latin to English
import unicodedata
def strip_accents(text):
return "".join(char for char in
unicodedata.normalize('NFKD', text)
if unicodedata.category(char) != 'Mn')
strip_accents('áéíñóúü')
output:
'aeinouu'
You are trying to store a Unicode codepoint \u201c using an encoding ISO-8859-1 / Latin-1 that can't describe that codepoint. Either you might need to alter the database to use utf-8, and store the string data using an appropriate encoding, or you might want to sanitise your inputs prior to storing the content; i.e. using something like Sam Ruby's excellent i18n guide. That talks about the issues that windows-1252 can cause, and suggests how to process it, plus links to sample code!
SQLAlchemy users can simply specify their field as convert_unicode=True.
Example:
sqlalchemy.String(1000, convert_unicode=True)
SQLAlchemy will simply accept unicode objects and return them back, handling the encoding itself.
Docs
Latin-1 (aka ISO 8859-1) is a single octet character encoding scheme, and you can't fit \u201c (“) into a byte.
Did you mean to use UTF-8 encoding?
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' in position 106: ordinal not in range(256)
Solution 1:
\u2013 - google the character meaning to identify what character actually causing this error, Then you can replace that specific character, in the string with some other character, that's part of the encoding you are using.
Solution 2:
Change the string encoding to some encoding which includes all the character of your string. and then you can print that string, it will work just fine.
below code is used to change encoding of the string , borrowed from #bobince
u'He said \u201CHello\u201D'.encode('cp1252')
The latest version of mysql.connector has only
db.set_charset_collation('utf8', 'utf8_general_ci')
and NOT
db.set_character_set('utf8') //This feature is not available
I ran into the same problem when I was using PyMySQL. I checked this package version, it's 0.7.9.
Then I uninstall it and reinstall PyMySQL-1.0.2, the issue is solved.
pip uninstall PyMySQL
pip install PyMySQL
Python: You will need to add
# - * - coding: UTF-8 - * - (remove the spaces around * )
to the first line of the python file. and then add the following to the text to encode: .encode('ascii', 'xmlcharrefreplace'). This will replace all the unicode characters with it's ASCII equivalent.
I got a CSV file from my front-end as a XString and after I convert it into String it looks as follows:
In the next step I'm trying to perform SPLIT lv_string AT '##' INTO TABLE itab so I can get my data but it doesn't split anything, itab contains one line equal to lv_string.
If I try REPLACE '#' IN lv_string WITH space, lv_string doesn't change and sy-subrc is 4.
From my point of view I have this problem because the symbol # is used by SAP in this context as a symbol for non-printable symbols (that result from the conversion byte->string).
My question is: how may I use SPLIT/REPLACE with # in this case?
I also thought that I can change the SAP code page when converting XString to String but I already use the SAP code page 4110 (utf-8) and don't know a better alternative...
When you display a variable with the debugger, it displays the generic character # (U+0023) for all control characters which are not assigned a glyph ("non-printable symbols" as you say).
If the variable corresponds to the contents of a text file, and ## frequently occurs, there is a big chance that it's the combination of the control characters U+000D and U+000A which correspond to "newline" in Windows files.
In the backend debugger, you can check the hexadecimal values of those characters by clicking the button "Hexadezimal" (shown in your screenshot).
You may use the variable CL_ABAP_CHAR_UTILITIES=>CR_LF which contains those two control characters.
I have created layout based on cobol copybook.
Layout snap-shot:
I tried to load data also selecting same layout, it gives me wrong result for some columns. I try using all binary numeric type.
CLASS-ORDER-EDGE
DIV-NO-EDG
OFFICE-NO-EDG
REG-AREA-NO-EDG
CITY-NO-EDG
COUNTY-NO-EDG
BILS-COUNT-EDG
REV-AMOUNT-EDG
USAGE-QTY-EDG
GAS-CCF-EDG
result snapshot
Input file can be find below attachment
enter link description here
or
https://drive.google.com/open?id=0B-whK3DXBRIGa0I0aE5SUHdMTDg
Expected output:
Related thread
Unpacking COMP-3 digit using Java
First Problem you have done an EBCDIC --> ascii conversion on the file !!!!
The EBCDIC --> ascii conversion will also try and convert binary fields as well as text.
For example:
Comp-3 value hex hex after Ascii conversion
400 x'400c' x'200c' x'40' is the ebcdic space character
it gets converted to the ascii
space character x'20'
You need to do binary transfer, keeping the file as ebcdic:
Check the file on the Mainframe if it has a RECFM=FB you can do a transfer
If the file is RECFM=VB make sure you transfer the RDW (Record Descriptor word) (or copy the VB file to a FB file on the mainframe).
Other points:
You will have to update RecordEditor/JRecord
The font will need to be ebcdic (cp037 for US ebcdic; for other lookup)
The FileStructure/FileOrganisation needs to change (Fixed length / VB)
Finally
BILS-Count-EDG is either 9 characters long or starts in column 85 (and is 8 bytes long).
You should include Xml in as text not copy a picture in.
In the RecordEditor if you Right click >>> Edit Record; it will show the fields as Value, Raw Text and Hex. That is useful for seeing what is going on
You do not seem to accept many answers; it is not relevant whether the answer solves your problem; it is whether the answer is correct answer for the question.
I am maintaining an app for a client that is used in two locations. One in England and one in Poland.
The database is stored in England and uses the format £1000.00 for currency, but the information is being gathered locally in Poland where 1000,00 is the format.
My question is, in VB6 is there a function that takes a currency string in a local format and converts to another, or will I just have to parse the string and replace , or . ?
BTW I have looked at CCur, but not sure if that will do what I want.
The data is not actually stored as the string "£1000.00"; it's stored in some numeric format.
Sidebar: Usually databases are set up to store money amounts using either the decimal data type (also called money in some DBs), or as a floating point number (also called double).
The difference is that when it's stored as decimal certain numbers like 0.01 are represented exactly whereas in double those numbers can only be stored approximately, causing rounding errors.
The database appears to be storing the number as "£1000.00" because something is formatting it for display. In VB6, there's a function FormatCurrency which would take a number like 1000 and return a string like "£1000.00".
You'll notice that the FormatCurrency function does not take an argument specifying what type of currency to use. That's because it, along with all the other locale-specific functions in VB, figures out the currency from the current locale of the system (from the Windows Control Panel).
That means that on my system,
Debug.Print FormatCurrency(1000)
will print $1,000.00, but if I run that same program on a Windows computer set to the UK locale, it will probably print £1,000.00, which, of course, is something completely different.
Similarly, you've got some code, somewhere, I can't tell where, in Poland, it seems, that is responsible for parsing the user's string and converting it to a number. And if that code is in Visual Basic, again, it's relying on the control panel to decide whether "." or "," is the thousands separator and whether "," or "." is the decimal point.
The function CDbl converts its argument to a number. So for example on my system in the US
Debug.Print CDbl("1.200")
produces the number one point two, on a system with the Control Panel set to European formatting, it would produce the number one thousand, two hundred.
It's possible that the problem is that you have someone sitting a computer with the regional control panel set to use "." as the decimal separator, but they're typing "," as the decimal separator.
What database are you using? And what data type is the amount stored in?
As long as you are always converting from one format to another, you do not need to do any parsing, just replace "." with "," or the other way around. You may need to remove the "£"-sign as well if that is stored in your string.
There's probably a correct answer dealing with culture objects and such, but the easiest way would be to taken the input from the polish input, and replace the , with a ., and then store it in your database as type "money" or "decimal". If you know they (possibly configurable per user) are always entering numbers in either Polish or English, you could have a function that you run all the input numbers through to convert the string to a proper "decimal" typed variable. Also, for display purposes you could run it through another similar function to ensure that the user always sees the number format they are comfortable with. The key here is to switch it to a decimal as soon as you get it from the user, and only switch it back to a string at the last step before sending it out to the user.
#KiwiBastard yes i would think so. Are you storing your amount in an "(n)varchar" field or are you using a currency/decimal type field? If the latter is the case, the currency-symbols and separators are added by your client, and there would be no need to replace anything in the database.