How to test webservice for unicode handling - utf-8

Are there test tools available to test if a webservice can handle unicode utf-8 encoded posts? How do I generate utf-8 encoded data?

The sad-but-true answer is that there is no way to know what encoding some program expects if they don't either document it, or provide encoding metadata in whatever protocol you're using.
As for generating utf-8, well, that depends on what programming language you're using.
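As one illustration in Python, here is a minimal sketch that generates UTF-8 encoded data and declares the encoding in the request metadata. The endpoint URL and the sample text are hypothetical placeholders, and the request is only constructed, not sent.

```python
import urllib.request

# Sample text containing non-ASCII characters (illustrative only).
text = "ñ, é and 私のユーザー"

# str.encode turns Unicode text into UTF-8 bytes.
body = text.encode("utf-8")

# Hypothetical endpoint; replace it with the service under test.
request = urllib.request.Request(
    "http://localhost:8080/echo",
    data=body,
    headers={"Content-Type": "text/plain; charset=utf-8"},
    method="POST",
)

# Decoding the bytes again must give back the original text.
round_trip = request.data.decode("utf-8")
```

Sending the request with urllib.request.urlopen(request) and comparing what the service stores or echoes back against text is one way to check how it handles the data.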

Related

How to change default character encoding configuration in Jetty app server from UTF-8 to ISO-8859-1

I want my application to fully support ISO-8859-1 on a Jetty server, but I am unable to change the default character encoding to ISO-8859-1. Where do I need to set the encoding/charset?
This is for jetty-distribution-9.4.12, running a Struts web application. I have tried modifying webdefault.xml to add encoding mappings, but somehow it still fails to accept UTF-8 as the encoding.
I see the issue when giving an XML resource a name with Japanese characters (私のユーザー). The Jetty server always fails to accept this name for my resource. When I inspect the request, I see that the content type is UTF-8 and the protocol is HTTP 1.1.
I want my server to accept 私のユーザー as a resource name, so I wanted to add that compatibility to the server.
With the little knowledge I have, I did some research and tried several server configurations, but nothing seems to work.
Trial 1
Changing webdefault.xml with a locale-encoding mapping
<locale-encoding-mapping>
<locale>en</locale>
<encoding>ISO-8859-1</encoding>
</locale-encoding-mapping>
Trial 2
Adding the encoding property to JAVA_OPTIONS in the jetty.sh file
JAVA_OPTIONS+=("-Dfile.encoding=UTF-8")
Referenced links
Jetty Character encoding issue
Jetty 9, character encoding UTF-8
Jetty uses the current HTTP/1.1 specs (yep, all of these specs talk about current HTTP/1.1 specific behavior)
RFC7230: HTTP/1.1: Message Syntax and Routing
RFC7231: HTTP/1.1: Semantics and Content
RFC7232: HTTP/1.1: Conditional Requests
RFC7233: HTTP/1.1: Range Requests
RFC7234: HTTP/1.1: Caching
RFC7235: HTTP/1.1: Authentication
I think the most relevant spec to your question is from RFC7231 - Appendix B: Updates from RFC2616
The default charset of ISO-8859-1 for text media types has been
removed; the default is now whatever the media type definition says.
Likewise, special treatment of ISO-8859-1 has been removed from the
Accept-Charset header field. (Section 3.1.1.3 and Section 5.3.3)
The idea of ISO-8859-1 being the default charset was deprecated long ago; the only place you'll find ISO-8859-1 indicated as a default charset is in old specs that have since been labelled "obsolete" (such as RFC2616).
Timeline:
The older HTTP/1.1 spec, RFC2616, was released in 1999.
The faults in RFC2616 were identified and a revised spec started being discussed in 2006.
The updated specs RFC7230 through RFC7235 were released in June 2014.
All of the major browser vendors (Chrome, Firefox, Edge, Safari, etc.) updated that year to support RFC7230 and related specs.
In the years since, the major browsers have started to drop RFC2616 concepts and support, removing behaviors and even quietly dropping features from other obsolete specs (e.g., older Set-Cookie header syntax now results in a no-op on the browser side, with the cookie being dropped).
Today (Sept 2019):
The HTTP 1.1 protocol has a default character encoding of UTF-8.
The HTTP 1.1 document default character encoding is UTF-8.
The HTTP 2 protocol has a default character encoding of UTF-8.
The HTTP 2 document default character encoding is UTF-8.
What all Web Developers today are responsible for:
You MUST limit your HTTP 1.1 protocol usages (header names, header values) to US-ASCII.
Header names should follow HTTP 1.1 token rules. (This is a subset of US-ASCII.)
Header values that contain a character outside of US-ASCII [1] MUST first be encoded in UTF-8, with the resulting bytes then percent-encoded (as hex values) for representation in the header value.
If you intend to send an ISO-8859-1 document as a response body, then you MUST indicate this in the HTTP response Content-Type header by giving the mime-type and charset. (e.g., Content-Type: text/html; charset=ISO-8859-1)
But seeing as you didn't indicate where in the HTTP exchange you want to set this default character encoding, it's hard to give a detailed answer or solution to your issue. (e.g., it could be a problem with the encoding of your application/x-www-form-urlencoded request body content and its interaction with the Servlet spec, which can be fixed with an additional field in your HTML5 form, by the way)
[1] This might seem harsh, but if you check RFC 7230, Section 3.2.4 (Field Parsing), you'll see that characters outside of US-ASCII in HTTP header fields will at best be dropped, or at worst be interpreted as an obs-fold or obs-text character, rendering the entire request bad and resulting in a 400 Bad Request.
Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.
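The rule given in this answer for non-ASCII header values (encode as UTF-8, then percent-encode the bytes) can be sketched in Python as follows. This is only an illustration using the standard library's urllib.parse; the variable names are made up for the example, and how a given server decodes such a value is not shown here.

```python
from urllib.parse import quote, unquote

name = "私のユーザー"

# quote() encodes the text as UTF-8 first, then percent-encodes each byte.
encoded = quote(name, safe="")

# The result contains only US-ASCII characters.
is_ascii = encoded.isascii()

# unquote() reverses the process (UTF-8 is its default encoding as well).
decoded = unquote(encoded)
```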

French character é is printing as � in jhipster thymeleaf template

I am using the JHipster application's email functionality to send mail on user creation. When sending mail in French, characters like é are rendered incorrectly.
These characters come from the standard messages_fr.properties file.
Obviously it's an encoding issue, but both the email template HTML and the Java code set the encoding to UTF-8, which should display this correctly.
While debugging, I found that in the MailService.java class, the content returned by SpringTemplateEngine's process method already contains the wrong characters before the encoding is set to UTF-8.
My code:
String content = templateEngine.process("activationEmail", context);
It looks like I know the root cause, but since it is in Spring's internal API class, I don't know how to fix this issue.
As Gael pointed out, the exact root cause of the issue is that STS changes the encoding of .properties files to ISO-8859-1 by default. Changing the file back to UTF-8 solves the issue. Thanks, Gael, for pointing me in this direction.
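The symptom can be reproduced outside of Spring. A minimal Python sketch, assuming the file was saved as ISO-8859-1 but then read back as UTF-8 (the sample string is illustrative):

```python
text = "é"

# Bytes as written by an editor that saves the file as ISO-8859-1.
latin1_bytes = text.encode("iso-8859-1")

# Reading those bytes as UTF-8 fails; with errors="replace" the invalid
# byte becomes U+FFFD, the replacement character shown as "�".
misread = latin1_bytes.decode("utf-8", errors="replace")

# Saving and reading with the same encoding preserves the character.
correct = text.encode("utf-8").decode("utf-8")
```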

ANSI Message format validator

We know that ANSI is one of the standard formats for transferring financial and billing information between organisations. For integration purposes in our application, while developing ANSI-format integrations, we need message format validator software that helps identify each segment's required fields and match the segments and values between the templates and the actual messages we are constructing.
I have a validator for HL7 messages, 7edit. Is there a similar ANSI message validator that covers everything, such as ANSI and UB04 message elements?
Thanks in advance.
Your best bet might be to use open-source Java libraries and create a small app yourself.
Here are some libraries:
EDIReader http://berryworkssoftware.net/index.php?option=com_content&task=view&id=13&Itemid=27
BOTS http://bots.sourceforge.net/en/index.shtml
SMOOKS http://www.smooks.org/

UTF-8 encoding Google Apps Email Settings API

I've been using the Google Apps Email Settings API for a while, but I ran into a problem when I tried to insert aliases, signatures, or any information containing "ñ" or "Ñ". It adds garbage instead of those characters, and it doesn't seem to respect the charset specified (utf-8) in the HTTP header or the XML character encoding.
I have tried via my own Python code and also using OAuth Playground[1], but it has been impossible to add the mentioned characters properly.
Any ideas or suggestions?
Thanks in advance.
EDIT: It seems that the problem is not in the request but in the response. I have encoded it successfully in my code but it should be also fixed in OAuth Playground.
[1] https://developers.google.com/oauthplayground/
I have successfully called Google API client methods using UTF-8 encoded strings, so it is definitely an issue with your Python setup.
I would work around this issue by sending Unicode strings instead of UTF-8 encoded ones:
u'literal string' # This is unicode
'encoded utf-8 string'.decode('utf-8') # This is unicode
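The two lines above use Python 2 syntax. As a rough sketch of the same idea in Python 3, where str is already Unicode and bytes holds encoded data (the sample value is illustrative):

```python
# In Python 3, a plain string literal is already Unicode text.
alias = "señor Ñandú"

# Encoded UTF-8 data is a separate bytes type.
encoded = alias.encode("utf-8")

# Decoding the bytes gives back Unicode text, like .decode('utf-8') above.
decoded = encoded.decode("utf-8")
```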
EDIT: Re-reading your answer, it seems that you are making raw HTTP calls with hand-made XML documents. I can't understand why. If that's the way you want to go, take a look at the Email Settings API client code to learn how to build the XML documents.

Internationalization of text data using Apache POI for Excel Import export

I work with Java and am a beginner with Apache POI. I am implementing Excel import/export using the Apache POI API, with the data stored in a MySQL database. I have to read and write localized data, such as Chinese characters and other characters that are UTF-8 / UTF-16 encoded. For example, the titles in the Excel file will be localized, but the data can be in English.
I want to know whether POI provides an API for writing to Excel that takes the encoding as a parameter, or some other way of doing this. Please suggest.
The API that I know is workbook.write(fileoutputstream). Please suggest whether there is a way to write characters in other encodings.
Thanks in advance,
Pallavi
Apache POI works with Java Strings. Java Strings are Unicode; see the official Java Strings and Bytes tutorial for more if this is all new to you.
As long as you give POI a valid Java String for your characters, it will save that into the file for you. However, you do need to ensure you get the strings into your program correctly!
POI has loads of unit tests which verify that it handles Unicode characters just fine; take a look at TestBugs for quite a few.

Resources