In Python, I can use \N{...} to print Unicode characters by name.
print('\N{White Smiling Face}')
This prints ☺
Is there a way to do the same thing in Go? I couldn't find anything in unicode or x/text.
I’m printing a parameter returned from a query that’s a string of letters and underscores.
The label prints just the letters without the underscores, and I’m not sure how to fix it.
Thank you very much.
(Removing the ^FH only reads up to the first underscore.)
Without a parameter, the ^FH command defaults to the underscore as the hexadecimal escape character, so each underscore in the field data is treated as the start of a hex code. Either remove the ^FH or specify a different escape character, such as a backslash, using ^FH\^FD<String>^FS.
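As a rough sketch of the second option, the field can be built in code so that the escape character is never confused with the data. The helper name `zpl_field` is hypothetical, and the choice to hex-encode `^` and `~` as well is an assumption on my part (they are ZPL command prefixes); adjust it to whatever your printer setup expects.

```ruby
# Hypothetical helper: builds a ^FH...^FD...^FS field for a ZPL label.
# escape_char is the hex indicator passed to ^FH (backslash by default here).
def zpl_field(text, escape_char = "\\")
  # Characters that must not appear literally: the indicator itself,
  # plus ^ and ~ (assumed to need encoding because they start ZPL commands).
  specials = [escape_char, "^", "~"]
  escaped = text.each_char.map do |ch|
    specials.include?(ch) ? format("%s%02X", escape_char, ch.ord) : ch
  end.join
  "^FH#{escape_char}^FD#{escaped}^FS"
end

puts zpl_field("PART_NO_123")       # underscores pass through untouched
puts zpl_field("PART_NO_123", "_")  # underscores hex-encoded as _5F
```

With a backslash as the indicator the underscores are sent as plain data; if you keep the underscore as the indicator, each literal underscore has to be sent as _5F instead.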
Here is the Unicode character table for the Tibetan language,
How do I use the codes in that chart in a fmt.Printf(mycode) statement in order to print, say, the Tibetan letter ཏ, which is located at row U+0F4x and column F of that Unicode chart?
Do I have to write:
or something like that, or do I have to drop the “U” or the “U+”?
To print ཏ (U+0F4F TIBETAN LETTER TA), or any other Unicode character, you can put the character directly into your string literal, use a \u0F4F escape, or use the corresponding rune (Unicode code point):
fmt.Printf("Direct: ཏ\n")
fmt.Printf("Escape: \u0F4F\n")
fmt.Printf("Rune: %c\n", rune(0x0F4F))
The Go blog has some details...
I am using an Arduino with the OPEN-SMART Touch Screen Expansion Shield, which uses the Adafruit_GFX library. I need to print characters from across the whole Unicode range (encoded as UTF-8), for example letters with diacritics, Greek letters and so on. If I try to print these characters with the default font, it prints nonsense. What should I do?
I need to match emojis in a string in Ruby using a regex. I have tried several Unicode sequences and none seem to quite do the job. I am also not sure where the start and end of the range for emojis would be.
This regex matches all 845 emoji, taken from Emoji unicode characters for use on the web:
I generated this regex directly from the raw list of Unicode emoji. The algorithm is here:
Example usage:
regex = /[\u{203C}\u{2049}\u{20E3}\u{2122}\u{2139}\u{2194}-\u{2199}\u{21A9}-\u{21AA}\u{231A}-\u{231B}\u{23E9}-\u{23EC}\u{23F0}\u{23F3}\u{24C2}\u{25AA}-\u{25AB}\u{25B6}\u{25C0}\u{25FB}-\u{25FE}\u{2600}-\u{2601}\u{260E}\u{2611}\u{2614}-\u{2615}\u{261D}\u{263A}\u{2648}-\u{2653}\u{2660}\u{2663}\u{2665}-\u{2666}\u{2668}\u{267B}\u{267F}\u{2693}\u{26A0}-\u{26A1}\u{26AA}-\u{26AB}\u{26BD}-\u{26BE}\u{26C4}-\u{26C5}\u{26CE}\u{26D4}\u{26EA}\u{26F2}-\u{26F3}\u{26F5}\u{26FA}\u{26FD}\u{2702}\u{2705}\u{2708}-\u{270C}\u{270F}\u{2712}\u{2714}\u{2716}\u{2728}\u{2733}-\u{2734}\u{2744}\u{2747}\u{274C}\u{274E}\u{2753}-\u{2755}\u{2757}\u{2764}\u{2795}-\u{2797}\u{27A1}\u{27B0}\u{2934}-\u{2935}\u{2B05}-\u{2B07}\u{2B1B}-\u{2B1C}\u{2B50}\u{2B55}\u{3030}\u{303D}\u{3297}\u{3299}\u{1F004}\u{1F0CF}\u{1F170}-\u{1F171}\u{1F17E}-\u{1F17F}\u{1F18E}\u{1F191}-\u{1F19A}\u{1F1E7}-\u{1F1EC}\u{1F1EE}-\u{1F1F0}\u{1F1F3}\u{1F1F5}\u{1F1F7}-\u{1F1FA}\u{1F201}-\u{1F202}\u{1F21A}\u{1F22F}\u{1F232}-\u{1F23A}\u{1F250}-\u{1F251}\u{1F300}-\u{1F320}\u{1F330}-\u{1F335}\u{1F337}-\u{1F37C}\u{1F380}-\u{1F393}\u{1F3A0}-\u{1F3C4}\u{1F3C6}-\u{1F3CA}\u{1F3E0}-\u{1F3F0}\u{1F400}-\u{1F43E}\u{1F440}\u{1F442}-\u{1F4F7}\u{1F4F9}-\u{1F4FC}\u{1F500}-\u{1F507}\u{1F509}-\u{1F53D}\u{1F550}-\u{1F567}\u{1F5FB}-\u{1F640}\u{1F645}-\u{1F64F}\u{1F680}-\u{1F68A}]/
str = "I am a string with emoji 😍😍😱😱👿👿🐔🌚 and other Unicode characters 比如中文."
str.gsub(regex, '')
# => "I am a string with emoji  and other Unicode characters 比如中文."
Other Unicode characters, such as Asian characters, are preserved.
EDIT: I updated the regex to exclude ASCII numbers and symbols. See comments from How do I remove emoji from string for details.
Emojis don't exist in one single range. They are scattered about. This is a collection of codes, and ranges where possible, that will match emojis. Tested in Ruby 2.0.0p451:
str = "😣"
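To illustrate the "scattered ranges" point, here is a minimal sketch that uses only three of the ranges from the long regex earlier in this thread. It is deliberately an incomplete subset, not the full list from this answer, and the names `EMOJI_SUBSET` and `contains_emoji?` are made up for the example.

```ruby
# Illustrative subset only: three ranges copied from the longer regex above.
# A real matcher needs many more ranges than this.
EMOJI_SUBSET = /[\u{2600}-\u{2601}\u{1F300}-\u{1F320}\u{1F5FB}-\u{1F640}]/

# Returns true if the text contains at least one code point in the subset.
def contains_emoji?(text)
  !!(text =~ EMOJI_SUBSET)
end

puts contains_emoji?("\u{1F623}")        # 😣 (U+1F623) falls in the third range
puts contains_emoji?("plain text 中文")   # CJK characters are outside all three
```

Because the ranges sit in different blocks, a single start/end pair cannot cover them; the character class has to list each range separately.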
You can use the emoji_data gem to canonically match emoji in a string via its .scan method:
(disclaimer: I am the author)
Some of the more recent emoji are constructed from multiple emoji-related code points, for example by using the invisible "zero-width joiner" (U+200D) code point to build so-called Emoji ZWJ sequences. You can use my unicode-emoji gem, which comes with a regex built from the latest emoji data published by the Unicode Consortium.
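To make the ZWJ point concrete without relying on any gem, this sketch builds a family emoji from three person code points joined by U+200D and shows what a per-code-point character class does to it. The helper name and the single range used here are assumptions chosen for the illustration.

```ruby
ZWJ = "\u{200D}" # zero-width joiner

# Hypothetical helper: joins individual emoji into one ZWJ sequence.
def emoji_zwj_sequence(*parts)
  parts.join(ZWJ)
end

# MAN + ZWJ + WOMAN + ZWJ + GIRL renders as a single family glyph where supported.
family = emoji_zwj_sequence("\u{1F468}", "\u{1F469}", "\u{1F467}")
puts family.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")

# A class that matches one code point at a time sees three separate emoji
# and leaves the two joiners behind when used for removal.
per_codepoint = /[\u{1F300}-\u{1F6FF}]/
puts family.scan(per_codepoint).length
puts family.gsub(per_codepoint, "").codepoints.inspect
```

This is why a regex made only of code point ranges can split or partially strip newer emoji, and why a sequence-aware pattern is needed for them.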
I use the code below:
puts "matched" if "中国" =~ /\w+/
It prints "matched", which surprised me: "中国" is two Chinese characters and doesn't contain any of 0-9, a-z, A-Z or _, so why does it output "matched"?
Could somebody give me some clues?
I'm not sure of the exact flavor of regex that Ruby uses, but this isn't just a Ruby aberration, as .NET works this way as well. MSDN says this about it:
Matches any word character. For non-Unicode and ECMAScript implementations, this is the same as [a-zA-Z_0-9]. In Unicode categories, this is the same as [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
So it's not the case that \w necessarily just means [a-zA-Z_0-9]: it (and other operators) behaves differently on Unicode strings than on ASCII ones.
This still makes it different from . though, as \w wouldn't match punctuation characters (sort of; see \p{Pc} in the list below), spaces, new lines and various other non-word symbols.
As for what exactly \p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc} does match, you can see on a Unicode reference list:
\p{Ll} Lowercase Unicode letter
\p{Lu} Uppercase Unicode letter
\p{Lt} Titlecase Unicode letter
\p{Lo} Other Unicode letter
\p{Nd} Number, decimal digit
\p{Pc} Punctuation, connector
Oniguruma, which is the regex engine in Ruby 1.9 (Ruby 2.0+ uses its fork, Onigmo), defines \w as:
\w  word character
    Not Unicode:
      * alphanumeric, "_" and multibyte char.
    Unicode:
      * General_Category -- (Letter|Mark|Number|Connector_Punctuation)
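In 1.9+, Ruby knows the encoding of each string, but its own documentation narrows \w to [a-zA-Z0-9_] even on UTF-8 strings; the Unicode definition above applies to the POSIX bracket [[:word:]] and the property \p{Word}. Whether "中国" matches /\w+/ therefore depends on the Ruby version and engine options in use.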
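Since the behaviour differs between Ruby versions, it is worth checking on your own interpreter. The sketch below assumes Ruby 1.9 or later with UTF-8 source encoding, where the documented behaviour is that \w is ASCII-only while [[:word:]] and \p{Word} follow the Unicode categories; the method name `word_matches` is made up for the example.

```ruby
# Reports which "word character" patterns match at least once in the text.
def word_matches(text)
  {
    ascii_w:    !!(text =~ /\w+/),          # documented as [a-zA-Z0-9_] in 1.9+
    posix_word: !!(text =~ /[[:word:]]+/),  # Letter | Mark | Number | Connector_Punctuation
    prop_word:  !!(text =~ /\p{Word}+/)     # same Unicode definition as [[:word:]]
  }
end

p word_matches("中国")
p word_matches("abc_1")
```

If the first entry comes back true for "中国", you are most likely on an older Ruby or running the engine in a different mode; on current versions, [[:word:]] or \p{Word} is the way to match word characters in any script.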