Reading Japanese query parameters from the URL - spring

I am getting the query parameter as "ã\u0083\u008eã\u0083¼ã\u0083\u0096ã\u0083©ã\u0083³ã\u0083\u0089å\u0093\u0081" in my controller for the Japanese text "ノーブランド品".
Is there a way to translate all query parameters into UTF-8?
I have tried several solutions, but none of them seems to work.
The first thing I tried:
URLDecoder.decode(string, "UTF-8");
Another thing I tried:
ByteBuffer buffer = StandardCharsets.UTF_8.encode(encodedName);
decodedName = StandardCharsets.UTF_8.decode(buffer).toString();
Is there a way to decode the string back to Japanese once it has been garbled like this? I am asking because the page that calls us is not owned by us.
Thanks
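The value shown in the question looks like the UTF-8 bytes of the text decoded one byte at a time as ISO-8859-1 (for example, ノ is E3 83 8E in UTF-8, which shows up as ã, \u0083, \u008e). Assuming that is what happened, and that no bytes were lost on the way, re-encoding the string as ISO-8859-1 and decoding the result as UTF-8 may recover the original. The class and method names below are hypothetical, and this is only a sketch of a workaround; having the container or framework decode the query string as UTF-8 in the first place would be the cleaner fix.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {
    // Assumption: the input was produced by decoding UTF-8 bytes as ISO-8859-1.
    static String latin1ToUtf8(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "ノーブランド品" written with Unicode escapes to avoid source-encoding issues
        String original = "\u30CE\u30FC\u30D6\u30E9\u30F3\u30C9\u54C1";
        // Simulate the faulty decoding step
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(latin1ToUtf8(garbled).equals(original)); // true
    }
}
```

If the parameter was decoded with some other single-byte charset (for example windows-1252), some bytes may not survive the round trip, so it is worth dumping the received characters before relying on this.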

Related

Disabling visually ambiguous characters in Google URL-shortener output

Is there a way to tell the Google URL shortener (programmatically, through its API) not to produce short URLs containing characters like:
0 O
1 l
People often make mistakes when reading those characters from a display and typing them elsewhere.
No. You cannot ask the API to use a custom character set.
This is not a proper solution, but you could check the short URL for unwanted characters and request another short URL for the same long URL until you get one you like. The Google URL shortener issues a unique short URL for an already shortened URL if you provide an OAuth token with the request. However, I am not sure whether a user is limited to one unique short URL per long URL, in which case this won't work either.
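The checking step of that retry idea could look roughly like the sketch below. The class and method names are hypothetical, the ambiguous set is just the four characters from the question, and the call that actually requests a new short URL is left out because it depends on the API client in use.

```java
class AmbiguityCheck {
    // The characters the question lists as easy to confuse; adjust as needed.
    private static final String AMBIGUOUS = "0O1l";

    // Returns true if the short code contains any character from the ambiguous set.
    static boolean hasAmbiguousChars(String shortCode) {
        for (int i = 0; i < shortCode.length(); i++) {
            if (AMBIGUOUS.indexOf(shortCode.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasAmbiguousChars("fbsS")); // false
        System.out.println(hasAmbiguousChars("aB0x")); // true
    }
}
```

A caller would keep requesting short URLs while this returns true, ideally with an upper bound on the number of attempts.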
Since you're doing it programmatically, you could swap those characters for their percent-encoded ASCII values, for instance '%6F' for the letter o. Then just tell users that, when in doubt, the character is a numeral.
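A minimal sketch of that substitution, assuming only the letters O and l are to be escaped (so that any remaining 0 or 1 is known to be a digit). The class and method names are hypothetical, and whether the shortener resolves a percent-encoded path the same way is an assumption to verify.

```java
class AmbiguousLetterEscaper {
    // Replaces the ambiguous letters 'O' and 'l' with their percent-encoded form.
    static String escapeAmbiguousLetters(String shortCode) {
        StringBuilder out = new StringBuilder();
        for (char c : shortCode.toCharArray()) {
            if (c == 'O' || c == 'l') {
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAmbiguousLetters("aO1l")); // a%4F1%6C
    }
}
```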
Alternatively, use a font that distinguishes ambiguous characters or, better yet, color-code them (or underline numerals, or use some other visual mark).

Convert to unicode from UTF-8 [duplicate]

I am trying to convert a UTF-8 string to a Java Unicode string.
String question = request.getParameter("searchWord");
byte[] bytes = question.getBytes();
question = new String(bytes, "UTF-8");
The input is Chinese characters, and when I compare the hex code of each character it matches the same Chinese character, so I'm pretty sure the charset is UTF-8.
Where am I going wrong?
There's no such thing as a "UTF-8 string" in Java. Everything is in Unicode.
When you call String.getBytes() without specifying an encoding, that uses the platform default encoding - that's almost always a bad idea.
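To illustrate the point about the platform default, here is a small sketch that always names the charset explicitly when converting between strings and bytes. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Encode with a named charset instead of the platform default.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same named charset.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // two Chinese characters, written as escapes
        byte[] utf8 = toUtf8(text);
        System.out.println(utf8.length);                 // 6 (three bytes per character)
        System.out.println(fromUtf8(utf8).equals(text)); // true
    }
}
```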
You shouldn't have to do anything to get the right characters here - the request should be handling it all for you. If it's not doing so, then chances are it's lost data already.
Could you give an example of what's actually going wrong? Specify the Unicode values of the characters in the string you're receiving (e.g. by using toCharArray() and then converting each char to an int) and what you expected to receive.
EDIT: To diagnose this, use something like this:
// Print the index and numeric value of each char in the string.
public static void dumpString(String text) {
    for (int i = 0; i < text.length(); i++) {
        System.out.println(i + ": " + (int) text.charAt(i));
    }
}
Note that this prints the decimal value of each char (UTF-16 code unit) in the string. If you have a hex-formatting method handy, you may want to use it to print the hex value instead. The main point is that it dumps the Unicode characters in the string.
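A hex variant of the same diagnostic could be sketched as follows. It walks code points rather than chars, so characters outside the Basic Multilingual Plane come out as a single value. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointDump {
    // Returns each code point of the string formatted as U+XXXX.
    static List<String> hexCodePoints(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(hexCodePoints("A\u4E2D")); // [U+0041, U+4E2D]
    }
}
```

Values in the range U+0080 to U+00FF where Chinese characters were expected would suggest that the bytes were decoded with a single-byte charset somewhere before the string reached this code.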
First make sure that the data is actually encoded as UTF-8.
There is some inconsistency between browsers regarding the encoding used when sending HTML form data. The safest way to send UTF-8 encoded data from a web form is to put that form on a page that is served with the Content-Type: text/html; charset=utf-8 header or contains a <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> meta tag.
Now, to properly decode the data, call request.setCharacterEncoding("UTF-8") in your servlet before the first call to request.getParameter(). Note that setCharacterEncoding() applies to the request body; how query-string parameters are decoded is typically controlled by the container's own configuration.
The servlet container takes care of the encoding for you. If you use setCharacterEncoding() properly you can expect getParameter() to return normal Java strings.
You may also need a filter that takes care of the encoding of your requests. The Spring Framework, for example, provides one: org.springframework.web.filter.CharacterEncodingFilter.
String question = request.getParameter("searchWord");
is all you have to do in your servlet code. At this point you do not have to deal with encodings, charsets, etc. This is all handled by the servlet infrastructure. When you notice problems such as �, ?, or ü being displayed somewhere, there may be something wrong with the request the client sent. But without knowing something about the infrastructure or seeing the logged HTTP traffic, it is hard to tell what is wrong.
Possibly:
question = new String(bytes, "UNICODE");

How is Illegal char's URL working?

There are many sites (such as Stack Overflow) that have the title of the page in the URL.
I am looking for the algorithm they use to avoid illegal URL characters. (I don't want URL encoding; I want a replace/remove algorithm.)
For example, 'How is Illegal char's URL working?' would become 'How-is-Illegal-chars-URL-working'.
Thanks!
The algorithm to do this is generally called 'slugify', because it turns a string into a 'slug' to be used in a URL. Searching for that should give you plenty of useful implementations.
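As a rough illustration of what a slugify function does, here is a minimal sketch that reproduces the example from the question. It is not how any particular site implements it; the class and method names are hypothetical, and it simply drops non-ASCII characters rather than transliterating them.

```java
class Slugifier {
    static String slugify(String title) {
        // Drop everything except ASCII letters, digits, whitespace and hyphens.
        String kept = title.replaceAll("[^A-Za-z0-9\\s\\-]", "");
        // Collapse runs of whitespace/hyphens into one hyphen, then trim hyphens at the ends.
        return kept.trim()
                   .replaceAll("[\\s\\-]+", "-")
                   .replaceAll("^-+|-+$", "");
    }

    public static void main(String[] args) {
        System.out.println(slugify("How is Illegal char's URL working?"));
        // How-is-Illegal-chars-URL-working
    }
}
```

Real implementations often also lowercase the result, transliterate accented letters, and cap the length.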
No idea how SO does it, but I would just strip every non-alphanumeric character and replace spaces with underscores.
In Python:
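def cleanTitle(title):
    # Keep lowercase letters and digits, turn spaces into underscores, drop everything else.
    temp = ''
    for character in title.lower():
        if character == ' ':
            temp += '_'
        elif character in 'abcdefghijklmnopqrstuvwxyz1234567890':
            temp += character
    return temp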
I see you are working in C#. I don't know C#, so you'll have to translate this code. I doubt it's hard to do, though.

C# MVC3 and non-latin characters

My database results contain characters such as áéíóúàâêô..., and when I display any of these characters I get codes like:
&#225;
My controller is like this:
ViewBag.EstadosDeAlma = (from e in db.EstadosDeAlma select e.Title).ToList();
My cshtml page is like this:
var data = '@foreach (dynamic item in ViewBag.EstadosDeAlma){ @(item + " ") }';
In addition, if I use a rich text editor such as TinyMCE, all non-Latin characters come out like this too.
What should I do to avoid this problem?
What output encoding are you using on your web pages? I would suggest using UTF-8 since you want a lot of non-ascii characters to work.
I think you should HTML encode/decode the values before comparing them.
Since you are using jQuery, you can take advantage of the encoding functions built into it. For example:
$('<div/>').html('& #225;gil').html()
gives you "ágil" (notice that I added an extra space between the & and the # so that Stack Overflow does not encode it; you won't need it).
This other question has more information about this:
HTML-encoding lost when attribute read from input field

How to SMS string that base on Base64

Hi,
Is there any difference between a plain string and a Base64-encoded string? I can send a string by SMS programmatically, but I am not sure whether the same applies to a Base64-encoded string. Can someone provide guidance on what needs to be done to send a Base64-encoded string by SMS? Thanks.
Do you mean you want the text message to just contain the base64 string itself? That should work fine - as far as I'm aware, all the characters used within base64 are also available in text messages. If you mean you want to send arbitrary binary data which you happen to have in base64 form at the moment, that could be harder. It wouldn't really be "text" at that point - I dare say there's a way of sending it (possibly MMS etc) but you should be looking at APIs which take byte arrays rather than strings at that point.
Have you tried to send a Base64 string via SMS?
EDIT... thanks to Jon :)
At the risk of another downvote by the almighty Jon Skeet ;) I will provide "some" help... @MilkBottle, if you have a byte[] and you convert it (for example with Convert.ToBase64String), you can send the result as the SMS body. I cannot understand why you want to do this, but that doesn't matter. Another (bad) example:
byte[] arr = new byte[] {0x02, 0x04, 0x07};
String smsbody = System.Convert.ToBase64String(arr);
The result will look like this: "AgQH", and it can be sent programmatically with the SmsComposeTask.
Hope that helps...
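For comparison, the same round trip can be sketched in Java with java.util.Base64 (the answer above uses C#; the class and method names here are hypothetical). The three bytes 0x02, 0x04, 0x07 encode to "AgQH", and the receiver can decode the text back to the original bytes.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64SmsBody {
    // Turn arbitrary bytes into text that is safe to place in an SMS body.
    static String toSmsBody(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recover the original bytes on the receiving side.
    static byte[] fromSmsBody(String body) {
        return Base64.getDecoder().decode(body);
    }

    public static void main(String[] args) {
        byte[] arr = {0x02, 0x04, 0x07};
        String body = toSmsBody(arr);
        System.out.println(body);                                  // AgQH
        System.out.println(Arrays.equals(arr, fromSmsBody(body))); // true
    }
}
```

Base64 produces four characters for every three bytes, so the payload that fits in one message shrinks accordingly.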
