freemarker problem with encoding in template - freemarker

I have a template like this:
<?xml version="1.0" encoding="utf-8"?><ContractRequest><SORequest/><Operator/><Contract>
<Engine_Type_Name>${Body.Contract.Engine.ENGINE_TYPE_NAME!""}</Engine_Type_Name>
I know that Body.Contract.Engine.ENGINE_TYPE_NAME = "Véhicule léger"
but in FreeMarker's output the accented characters come out garbled.
What do I need to change?

Most likely you just need to view that output as UTF-8. If you don't want UTF-8 output, note that FreeMarker only sends its output to a java.io.Writer; encoding the characters into bytes with a particular charset happens outside of FreeMarker, in whatever code creates that Writer. Also, since you are outputting XML, make sure that the <?xml ... encoding=... ?> declaration in the output specifies the correct charset too.
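A minimal JDK-only sketch of that point. It does not use the FreeMarker API; the hypothetical render method just writes a string the way template output is written to a Writer. The bytes are produced by the OutputStreamWriter, and the same UTF-8 bytes look garbled or correct depending only on the charset used to read them back.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriterEncodingDemo {
    // Hypothetical stand-in for template output: FreeMarker itself only writes chars to a Writer.
    static byte[] render(String engineTypeName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The chars-to-bytes encoding is decided here, by whoever creates the Writer.
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write("<Engine_Type_Name>" + engineTypeName + "</Engine_Type_Name>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and flushed) at this point, so all bytes are in `out`.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+00E9 is e-acute, so this is the value from the question.
        byte[] bytes = render("V\u00e9hicule l\u00e9ger");
        // Read back with a single-byte charset: each 2-byte UTF-8 sequence shows up as 2 chars.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
        // Read back as UTF-8: the text is intact.
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

With FreeMarker itself, the equivalent choice is the charset of the Writer you pass to Template.process; keep it consistent with the encoding="utf-8" declaration at the top of the template.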

Related

unmarshalling fixedlength utf-8 strings with beanio and camel

When there are no diacritics that are represented with two bytes, unmarshalling a message works; otherwise it fails, complaining about the length. I tried to convert the body to type String and set the charset to UTF-8
<convertBodyTo type="java.lang.String" charset="UTF-8" />
before unmarshalling using BeanIO in a Camel route, but it doesn't help. What is the right way to solve the problem?
In fact, I think the purpose of convertBodyTo may be to perform an actual conversion, rather than to tell the class that does the unmarshalling that a string declared as fixed-length might actually be variable length. But that would require that I first specify somewhere, probably in the from endpoint, that the source is actually UTF-8. Then I could temporarily convert it to some single-byte charset before unmarshalling, and back to UTF-8 afterwards?
After someone suggested that the point is to tell BeanIO which charset to use, I came up with:
<dataFormats>
<beanio id="parseTransactions464" mapping="mapping.xml" streamName="Transactions464" encoding="UTF-8"/>
</dataFormats>
but this gives me:
Exhausted after delivery attempt: 1 caught: java.lang.NullPointerException: charset
I basically copied the usage of encoding with the beanio data format from here; I don't know if it is correct:
Cannot find data format in registry - Camel
This is a defect in camel-beanio; see:
http://camel.465427.n5.nabble.com/Re-Exhausted-after-delivery-attempt-1-caught-java-lang-NullPointerException-charset-tc5817807.html
http://camel.465427.n5.nabble.com/Exhausted-after-delivery-attempt-1-caught-java-lang-NullPointerException-charset-tc5817815.html
https://issues.apache.org/jira/browse/CAMEL-12284
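The length errors described in the question are consistent with the bytes being decoded with a single-byte charset instead of UTF-8. The following JDK-only sketch (no BeanIO or Camel involved; the 10-character record is hypothetical) shows how the decoded character count depends on the charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FixedLengthCharsetDemo {
    // Hypothetical 10-character fixed-length record; U+00E9 (e-acute) takes 2 bytes in UTF-8.
    static final String RECORD = "Caf\u00e9 12345";

    // Number of characters a reader sees after decoding the raw bytes with the given charset.
    static int decodedLength(byte[] raw, Charset charset) {
        return new String(raw, charset).length();
    }

    public static void main(String[] args) {
        byte[] raw = RECORD.getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + raw.length);                                                    // 11
        System.out.println("chars as UTF-8: " + decodedLength(raw, StandardCharsets.UTF_8));           // 10
        System.out.println("chars as ISO-8859-1: " + decodedLength(raw, StandardCharsets.ISO_8859_1)); // 11
    }
}
```

So the charset presumably has to reach whatever decodes the stream before the fixed-length fields are counted, which is what the encoding attribute of the beanio data format is meant to do; the NullPointerException itself is the camel-beanio defect linked above.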

In freemarker, how to convert a string to utf-8 encoding?

I am trying to convert the encoding of a string to UTF-8 in a FreeMarker template.
Is there a way to change the encoding of a FreeMarker string?
If by UTF-8 encoding you mean percent-escaping (URL encoding), like példa to p%C3%A9lda, then it's done with myString?url (or, if it's more familiar this way, ${myString?url}). However, the charset used by ?url depends on the url_escaping_charset FreeMarker configuration setting, which should be set to UTF-8 in your application. (You can also specify the charset directly, as in myString?url('UTF-8').)
Documentation: http://freemarker.org/docs/ref_builtins_string.html#ref_builtin_url
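For comparison outside FreeMarker, the kind of escaping ?url('UTF-8') performs can be sketched in plain Java: encode the string to UTF-8 bytes, keep a set of safe ASCII characters, and write every other byte as %XX. The safe set used here (RFC 3986 unreserved characters) is an assumption; FreeMarker's exact list of characters it leaves unescaped may differ.

```java
import java.nio.charset.StandardCharsets;

public class PercentEncodeDemo {
    // RFC 3986 unreserved characters are copied through; every other byte is escaped.
    static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    // Percent-encodes s using its UTF-8 bytes (similar in spirit to ?url('UTF-8')).
    static String percentEncodeUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // Java bytes are signed; map to 0..255
            if (isUnreserved(b)) {
                sb.append((char) b);
            } else {
                sb.append('%').append(String.format("%02X", b));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncodeUtf8("p\u00e9lda")); // p%C3%A9lda
        System.out.println(percentEncodeUtf8("a b&c"));      // a%20b%26c
    }
}
```

The JDK's own java.net.URLEncoder is an alternative, but it produces form encoding (spaces become +), which is not the same output as ?url.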

Convert to unicode from UTF-8 [duplicate]

I try to convert a UTF8 string to a Java Unicode string.
String question = request.getParameter("searchWord");
byte[] bytes = question.getBytes();
question = new String(bytes, "UTF-8");
The input is Chinese characters, and when I compare the hex code of each character, it is the same Chinese character. So I'm pretty sure that the charset is UTF-8.
Where do I go wrong?
There's no such thing as a "UTF-8 string" in Java. Everything is in Unicode.
When you call String.getBytes() without specifying an encoding, that uses the platform default encoding - that's almost always a bad idea.
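To make this concrete, the following JDK-only sketch mimics question.getBytes() followed by new String(bytes, "UTF-8"), with the encoding charset passed explicitly instead of taken from the platform default. The roundTrip helper is hypothetical, written just for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Hypothetical helper: encode with one charset, then decode with another.
    // The question's code does this with the platform default as encodeWith.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters
        // Same charset both ways: nothing changes, so the conversion is pointless.
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(chinese)); // true
        // Different charsets: the text is garbled.
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(chinese)); // false
        // Encoding charset that cannot represent the characters: they are replaced by '?'.
        System.out.println(roundTrip(chinese, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)); // ??
        // What getBytes() without an argument would use on this machine.
        System.out.println(Charset.defaultCharset());
    }
}
```

When the two charsets match, the conversion does nothing; when they differ, it either garbles the text or, if the encoding charset cannot represent Chinese characters, loses them entirely.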
You shouldn't have to do anything to get the right characters here - the request should be handling it all for you. If it's not doing so, then chances are it's lost data already.
Could you give an example of what's actually going wrong? Specify the Unicode values of the characters in the string you're receiving (e.g. by using toCharArray() and then converting each char to an int) and what you expected to receive.
EDIT: To diagnose this, use something like this:
public static void dumpString(String text) {
    for (int i = 0; i < text.length(); i++) {
        System.out.println(i + ": " + (int) text.charAt(i));
    }
}
Note that that will give the decimal value of each Unicode character. If you have a handy hex library method around, you may want to use that to give you the hex value. The main point is that it will dump the Unicode characters in the string.
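If you want hex output without a library, String.format is enough. A sketch of a dumpString variant (the describe name is made up here) that returns the listing as a string with each value in U+XXXX form:

```java
public class DumpStringHex {
    // Variant of dumpString: one line per char, value printed as 4-digit uppercase hex.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            sb.append(i).append(": U+")
              .append(String.format("%04X", (int) text.charAt(i)))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("\u4e2d\u6587"));
        // 0: U+4E2D
        // 1: U+6587
    }
}
```

Like the original, this walks UTF-16 code units, so a character outside the Basic Multilingual Plane shows up as two surrogate values; iterate over text.codePoints() instead if you need whole code points.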
First make sure that the data is actually encoded as UTF-8.
There are some inconsistencies between browsers regarding the encoding used when sending HTML form data. The safest way to send UTF-8 encoded data from a web form is to put that form on a page that is served with the Content-Type: text/html; charset=utf-8 header or contains a <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> meta tag.
Then, to decode the data properly, call request.setCharacterEncoding("UTF-8") in your servlet before the first call to request.getParameter().
The servlet container takes care of the encoding for you. If you use setCharacterEncoding() properly you can expect getParameter() to return normal Java strings.
You may also need a filter that takes care of the encoding of your requests; for example, the Spring Framework provides such a filter, org.springframework.web.filter.CharacterEncodingFilter.
String question = request.getParameter("searchWord");
is all you have to do in your servlet code. At this point you don't have to deal with encodings, charsets, etc.; this is all handled by the servlet infrastructure. If you notice problems such as �, ? or ü being displayed somewhere, there may be something wrong with the request the client sent. But without knowing something about the infrastructure or seeing the logged HTTP traffic, it is hard to tell what is wrong.
Possibly:
question = new String(bytes, "UNICODE");

Substituting text in a file with Ruby

I need to read in a file which will be in xml format but all crammed into a single line, and I need to parse that line to find a specific property and replace its value with something I have specified.
The file might contain:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><VerificationPoint type="Screenshot" version="2"><Description/><Verification object=":qP1B11_QLabel" type="PNG">
I need to search through this line, find the property "Verification object=" and replace the :qP1B11 with my own string. Please note that I don't want to replace the _QLabel" type="PNG"> part of the string if possible.
I can't use sub because I don't know the value of the property, which could be anything. I believe I should be able to do this with regular expressions, but I have never had to use them before, and all the examples I've seen just make me more confused than before.
If anyone can present me with an elegant answer (and an explanation if using regexp) it would be a huge help!
Thanks
You have XML so use an XML parser. Nokogiri will make short work of that:
require 'nokogiri'

doc = Nokogiri::XML(that_string)
doc.search('Verification').each do |node|
  node['object'] = node['object'].sub(/:qP1B11/, 'PANCAKES')
end
new_string = doc.to_xml
# <?xml version="1.0" encoding="UTF-8" standalone="no"?>\n<VerificationPoint type="Screenshot" version="2">\n <Description/>\n <Verification object="PANCAKES_QLabel" type="PNG">\n</Verification>\n</VerificationPoint>\n
You can adjust the output format using the options for to_xml.
If you only have one <Verification> then you could do it like this:
node = doc.at('Verification')
node['object'] = node['object'].sub(/:qP1B11/, 'PANCAKES')
new_string = doc.to_xml
In either case you'd adjust your regex and replacement to suit your needs.

Can sitemap.xml processors cope with <!ENTITY name "my text">?

Can sitemap.xml processors cope with this?
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY port ":8080">
<!ENTITY host "http://example.com&port;">
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>&host;/path/</loc>
<!-- ...
I assume so. It will most likely just ignore it though. If there is no Sitemaps DTD, I think it has to ignore it unless it expects it.
From Wikipedia:
In the markup languages SGML, HTML, XHTML and XML, a character entity reference is a reference to a particular kind of named entity that has been predefined or explicitly declared in a Document Type Definition (DTD). The "replacement text" of the entity consists of a single character from the Universal Character Set/Unicode. The purpose of a character entity reference is to provide a way to refer to a character that is not universally encodable.
In short, no. Not unless the preprocessor is very forgiving.
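To see what a conforming XML parser does with this markup, here is a small check using the JDK's built-in DOM parser; individual sitemap processors may behave differently, and some may not process DTDs at all. XML only allows <!ENTITY> declarations inside a <!DOCTYPE ...> internal subset, so the sketch compares that form with the bare declarations from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SitemapEntityCheck {
    static final String BODY = "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
            + "<url><loc>&host;/path/</loc></url></urlset>";

    // Declarations inside a DOCTYPE internal subset, which is where XML allows them.
    static final String WITH_DOCTYPE = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE urlset [\n"
            + "  <!ENTITY port \":8080\">\n"
            + "  <!ENTITY host \"http://example.com&port;\">\n"
            + "]>\n" + BODY;

    // Bare declarations before the root element, as in the question.
    static final String WITHOUT_DOCTYPE = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!ENTITY port \":8080\">\n"
            + "<!ENTITY host \"http://example.com&port;\">\n" + BODY;

    /** Text of the first loc element, or null if the parser rejects the document. */
    static String firstLoc(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors without printing
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("loc").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLoc(WITH_DOCTYPE));    // http://example.com:8080/path/
        System.out.println(firstLoc(WITHOUT_DOCTYPE)); // null
    }
}
```

So a parser that processes the internal DTD subset can expand the entities, but the markup exactly as written in the question is not well-formed, and a strict parser rejects the whole file.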
