For example, it prints the string below:
"user_description" = "\U5efa\U5b50\Uff0c\U6b4c\U540e\Uff0c\U5c0f\U5e86\Uff0c\U5c0fKen\Uff0c\U8fd9\U4e9b\U90fd\U662f\U6211\U3002\U6211\U60f3\U505a\U7684\U5c31\U662f\Uff0c\U6253\U5f00\U53cc\U624b\Uff0c\U62e5\U62b1\U4f60\U3002";
Does anyone know how to print the actual string instead of these unreadable characters?
Or do you know why this happens and how to avoid it? Is it an encoding issue?
I'm not sure how you print it out, or what type of string it is, but have you played with different types of format specifiers? Apple Doc
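To see what the escaped output actually contains, the \U escapes can be turned back into characters. The following is a minimal sketch in Python, not the Objective-C fix itself; it assumes each escape is \U followed by exactly four hexadecimal digits, as in the sample above, and the helper name decode_u_escapes is made up for illustration. It does not handle surrogate pairs.

```python
import re

def decode_u_escapes(text):
    """Replace each \\Uxxxx escape (four hex digits) with the character it names."""
    return re.sub(
        r"\\U([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )

# A shortened piece of the string from the question (raw string, so the
# backslashes are kept literally).
sample = r"\U5efa\U5b50\Uff0c\U5c0fKen"
decoded = decode_u_escapes(sample)
print(decoded)
```

If this prints readable Chinese text, the data itself is intact and only the way it was printed escaped the non-ASCII characters.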
I have been using printf "text" to print some text from a bash script. Is it valid to use printf with the whole message as the format string and no arguments?
Putting the entire message in the format string is a reasonable thing to do provided it doesn't contain any dynamic data. That holds as long as you have full control over the string (i.e. it's either just a fixed string, or one selected from a set of fixed strings, or something like that), and you've used that control to make sure it doesn't contain any unintended escape characters, all % characters in it are doubled (making them literal, rather than format specifiers), and the string doesn't start with -.
Basically, if it's a fixed string and it doesn't obviously fail, it'll work consistently.
But if it contains any sort of dynamic data -- filenames, user-entered data, anything at all like that -- you should put format specifiers in the format string, and the dynamic data in separate arguments.
So these are ok:
printf 'Help, Help, the Globolinks!\n'
printf 'Help, Help, the %s!\n' "$monster_name"
But this is not:
printf "Help, Help, the $monster_name!\n" # Don't do this
I am seeing this in my browser:
\xe18\xe23\xe23\xe21\xe0a\xe32\xe15
I believe it's valid Thai script. But how do I find out what format it is in?
Thanks
It's hard to know if this is the correct answer without more details, but the sample you provided looks like a series of hexadecimal escape sequences.
\x followed by two hexadecimal digits represents the character with that code (0x00 to 0xFF, so not only ASCII). Any digit after the first two is treated as an ordinary character.
You can check directly what the value is in your browser console:
console.log("\xe18\xe23\xe23\xe21\xe0a\xe32\xe15");
Output is:
á8â3â3â1àaã2á5
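The output above comes from reading only two hex digits after each \x. Another possible reading, which is an assumption and not something the source confirms, is that each \x is followed by three hex digits naming a code point (for example U+0E18). The sketch below, in Python, tries that reading and checks whether the results fall in the Unicode Thai block (U+0E00 to U+0E7F). The helper name decode_three_digit is hypothetical.

```python
import re

# Raw string, so the backslashes are kept literally.
raw = r"\xe18\xe23\xe23\xe21\xe0a\xe32\xe15"

def decode_three_digit(text):
    """Treat each \\x plus three hex digits as one Unicode code point."""
    digits = re.findall(r"\\x([0-9a-fA-F]{3})", text)
    return "".join(chr(int(d, 16)) for d in digits)

candidate = decode_three_digit(raw)
# True if every decoded character lies in the Thai block.
all_thai = all(0x0E00 <= ord(c) <= 0x0E7F for c in candidate)
print(candidate, all_thai)
```

If all_thai is true, the three-digit reading is at least consistent with the guess that the data is Thai text, but whoever produced the string would need to confirm the actual format.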
I have a simple struct containing some stuff, and also a Text field. I was looking at the result of encoding this data using Capnp, and for some reason the value of the text field appears in the encoded output twice! That doesn't seem very efficient or sane. Why does this happen?
Cap'n Proto does not encode text fields twice. To understand what happened in your case, we'd need to see your code.
I would like to extract a line of text but am having difficulty finding the correct RegEx. Any help would be appreciated.
String to extract: KSEA 122053Z 21008KT 10SM FEW020 SCT250 17/08 A3044 RMK AO2 SLP313 T01720083 50005
For some reason StackOverflow won't let me cut and paste the XML data here since it includes "<>" characters. Basically I am trying to extract the data between "raw_text" ... "/raw_text" from an XML document that will always be formatted like the following: http://www.aviationweather.gov/adds/dataserver_current/httpparam?dataSource=metars&requestType=retrieve&format=xml&hoursBeforeNow=3&mostRecent=true&stationString=PHNL%20KSEA
However, the station name, in this case "KSEA", will not always be the same. It will change based on user input into a search variable.
Thanks in advance
If I can assume that every string you want starts with KSEA, then the answer would be:
.*(KSEA.*?)KSEA.*
Using ? makes .* lazy, so it matches as little as possible.
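Since the station name changes, another option is to match on the raw_text tags themselves instead of on KSEA. The following is a minimal sketch in Python; the XML fragment is a simplified, hypothetical stand-in for the real response (only the raw_text element name comes from the question), and extract_raw_text is a made-up helper name.

```python
import re

# Hypothetical, simplified fragment; the real response has more elements.
sample_xml = (
    "<METAR><raw_text>KSEA 122053Z 21008KT 10SM FEW020 SCT250 17/08 A3044 "
    "RMK AO2 SLP313 T01720083 50005</raw_text>"
    "<station_id>KSEA</station_id></METAR>"
)

# Lazy .*? stops at the first closing tag, so each element is captured separately.
RAW_TEXT = re.compile(r"<raw_text>(.*?)</raw_text>", re.DOTALL)

def extract_raw_text(xml):
    """Return the contents of every raw_text element, in document order."""
    return RAW_TEXT.findall(xml)

reports = extract_raw_text(sample_xml)
print(reports)
```

Because the pattern does not mention the station, the same code works when the user searches for a different one. For anything beyond a quick extraction, a real XML parser is usually the more robust choice.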
I scraped some text from the internet, which I put in a UTF8String. I can use this string normally, but when I select some specific characters (strange characters with accents, like ú in my case), which are not part of the UTF8 standard, I get an error saying that I used invalid indexes. This only happens when the string contains strange characters; my code works with normal strings that do not contain them.
Any way to solve this?
EDIT:
I have a variable word of type SubString{UTF8String}
When I call method(word), no problems occur. When I call method(word[2:end]) (assuming a length of at least 2), I get an error if the second character is strange (not in UTF8).
Julia indexes strings by byte position instead of character position. That is much more efficient for a variable-length encoding like UTF-8, but it means some operations need a bit more boilerplate.
The problem is that some code points are encoded as multiple bytes, so when you slice the string from 2:end you can end up starting in the middle of the first character (which is invalid, so you get an error).
The solution is to get the second valid index instead of 2 in the slice. I think that is something like str[nextind(str, 1):end]
PS. Sorry for a less than clear answer on my phone.
EDIT:
I tried this, and it seems like SubString{UTF8String} and UTF8String have different behaviour on slicing. I've reported it as bug #7811 on GitHub.
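To illustrate the byte-versus-character point from the answer above, here is a small sketch in Python rather than Julia. It works on the raw UTF-8 bytes, shows that cutting at a fixed byte offset can land inside a multi-byte character, and then finds the next valid start position, which is roughly what nextind does. The helper name next_char_start is hypothetical, and the indexes are 0-based here while Julia's are 1-based.

```python
word = "úber"
data = word.encode("utf-8")  # ú takes two bytes, the other letters one each

# Slicing at byte offset 1 starts in the middle of ú, so decoding fails.
try:
    data[1:].decode("utf-8")
    cut_ok = True
except UnicodeDecodeError:
    cut_ok = False

def next_char_start(buf, i):
    """Return the smallest byte index greater than i that begins a character.

    UTF-8 continuation bytes have the bit pattern 10xxxxxx, so they are skipped.
    """
    i += 1
    while i < len(buf) and (buf[i] & 0xC0) == 0x80:
        i += 1
    return i

start = next_char_start(data, 0)
rest = data[start:].decode("utf-8")
print(cut_ok, start, rest)
```

The slice that begins at the computed position decodes cleanly, which is the same idea as using str[nextind(str, 1):end] instead of str[2:end].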