How to display UTF-8 characters in hbase shell?

When I use the get command in hbase shell, like:
hbase(main)> get 't1','00003ab'
the result is:
PATHID:path0 timestamp=1463537742385, value={"pathSign":"\xE5\x8C\x97\xE5\xAE\x89\xE9\x97\xA8\xE8\xA1\x97"} //hexadecimal
The UTF-8 characters are displayed as hexadecimal escapes instead of readable text.
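The shell escapes non-ASCII bytes by default, but it supports per-column converters: appending :toString to the column specification asks it to decode the value bytes with Bytes.toString instead of escaping them. A minimal sketch, assuming a reasonably recent HBase version and that PATHID:path0 is the column shown above:
hbase(main)> get 't1', '00003ab', {COLUMN => 'PATHID:path0:toString'}
If your terminal's locale is UTF-8, the value should then print as the original characters rather than \xE5... escapes.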

Related

Lua: How to print a Latin1 string with io.write()?

In Lua 5.4, I tried to print some strings in Latin1 encoding with io.write(), but some characters (à, é, ...) are not printed correctly.
How can I fix this?
Here is a screenshot of the failed print with win-125x.lua
I guess you are running Lua on Windows.
Because you are converting Latin1 characters to UTF8, you should set the Windows console codepage to UTF8 before running your Lua script, with the following command:
chcp 65001
Another option is to save your script with UTF8 encoding, so there is no need to convert strings from cp1252 to UTF8, and use the chcp command before running your script.
Remember that standard Lua has no concept of string encoding and that Windows support for UTF8 characters in the console is incomplete; hence this kind of problem.
Check this related question too: Problem with accents while copying files in LUA
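If you would rather not run chcp by hand, the script can switch the codepage itself before writing any output. A minimal sketch, assuming Windows and that the strings have already been converted to UTF-8:
os.execute("chcp 65001") -- switch the console to the UTF-8 codepage
io.write("à é è ü\n")    -- UTF-8 output should now display correctly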
If you have the utf8 table you can do...
> do io.write(utf8.char(8364):rep(3)..'\n'):flush() end
€€€
To get the code you can do...
> do io.write(utf8.codepoint('€')..'\n'):flush() end
8364
But I am not sure if that works on Windows; I am on Linux.

Text file encoding properly as utf-8 in Scite editor but fails to encode to utf-8 in ruby

I have a text file which, if viewed in the Scite editor with the encoding set to utf-8, displays all text correctly, including capital letters with an acute accent (i.e. Á).
However, if I write a ruby script and use mystring.encode("utf-8"), it gives me this error on capital letters that carry an acute accent (i.e. Á):
encode': "\x81" to UTF-8 in conversion from Windows-1252 to UTF-8 (Encoding::UndefinedConversionError)
Is this expected behaviour? How can I encode the whole text to utf-8 using ruby, given that it displays correctly in the Scite editor?
Code:
ine_file = File.open("../../_data/ine_spain_demographics.csv", 'r')
ine_towns_population_hash = Hash.new
ine_file.each do |line|
  values = line.split(";")
  town_name = values[3]
  population = values[4]
  begin
    ine_towns_population_hash[town_name.encode("utf-8")] = population
  rescue
    puts "problematic string: " + town_name
  end
end
You say that ine_file.external_encoding says Windows-1252, so the file is being opened as a Windows-1252 encoded file. Then you call town_name.encode("utf-8") in an attempt to encode the string as UTF-8, and Ruby complains. But the file is actually UTF-8; reading UTF-8 bytes as Windows-1252 and then trying to recode those bytes as UTF-8 isn't going to work.
You need to open the file in UTF-8 mode:
File.open("../../_data/ine_spain_demographics.csv", 'r:UTF-8')
and stop trying to change the encoding of town_name, just use town_name as-is.
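Putting both fixes together, a minimal sketch of the corrected loop (assuming the file really is UTF-8 encoded):
ine_file = File.open("../../_data/ine_spain_demographics.csv", 'r:UTF-8')
ine_towns_population_hash = Hash.new
ine_file.each do |line|
  values = line.split(";")
  # the strings are already UTF-8, so no encode call is needed
  ine_towns_population_hash[values[3]] = values[4]
end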
It seems like it's misinterpreting the encoding of ine_spain_demographics.csv.
Looking at the docs for encode and open you have two options:
Use the :replace option of encode to tell Ruby what to substitute for characters it cannot convert: town_name.encode("utf-8", invalid: :replace, undef: :replace, replace: '').
Identify the correct file encoding and specify it when opening the file: File.open("../../_data/ine_spain_demographics.csv", 'r:ISO-8859-1')

Character ก changes to Ď during xls2csv -d utf-8 /source.xls > destination.csv

When I convert an XLS file to CSV using xls2csv, the character 'ก' is converted to 'Ď'.
My command is:
xls2csv -d utf-8 /test.xls > /test.csv
Other characters in Thai and other languages convert normally.
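If your xls2csv is the one from the catdoc package, it also accepts a source charset via -s. A hedged sketch, assuming the XLS stores its Thai text in Windows codepage 874 and that your installation ships a cp874 charset table:
xls2csv -s cp874 -d utf-8 /test.xls > /test.csv
A wrongly guessed source charset maps individual bytes to visually unrelated characters, which would fit the ก to Ď symptom.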

Does Bash support Unicode 6.0?

When I use a Unicode 6.0 character (for example, 'beer mug') in Bash (4.3.11), it doesn't display correctly.
Just copying and pasting the character is okay, but if you use a UTF-16 hex code like
$ echo -e '\ud83c\udf7a'
the output is '??????'.
What's the problem?
You can't use UTF-16 with bash and a unix(-like) terminal. Bash strings are strings of bytes, and the terminal will (if you have it configured correctly) be expecting UTF-8 sequences. In UTF-8, surrogate pairs are illegal. So if you want to show your beer mug, you need to provide the UTF-8 sequence.
Note that echo -e interprets Unicode escapes in the forms \uXXXX and \UXXXXXXXX, producing the corresponding UTF-8 sequence. So you can get your beer mug (assuming your terminal font includes it) with:
echo -e '\U0001f37a'
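Equivalently, you can spell out the raw UTF-8 byte sequence yourself; U+1F37A encodes to the four bytes F0 9F 8D BA:
echo -e '\xf0\x9f\x8d\xba'
Bash 4.2 and later also accept the same escape in ANSI-C quoting, e.g. printf '%s\n' $'\U0001f37a'.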

What Encoding does Ruby 1.9.3 use to parse the output of a shell command using backticks?

When executing
lines = `gpg --list-keys --with-colons horst`
What Encoding will the string lines have? How do I change how Ruby interprets it?
Background:
I have some Umlauts in some gpg keys, and I get this error when trying to split by newline:
invalid byte sequence in UTF-8
My current workaround is this:
lines.force_encoding('ISO-8859-1')
However, I don't get why this should be ISO-8859-1, as my locale is en_US.UTF-8.
I'm not sure if you still need an answer on this or not, but it looks like you'll have to use the --display-charset or --charset option in your gpg command in order to set the name of the native character set. This is used to convert some strings to proper UTF-8 encoding. You shouldn't have to force the encoding downstream after you've done that.
Check the gpg man page on your server to see which option is available to you.
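In backtick form the call could look like this; a minimal sketch, assuming your gpg build accepts --display-charset utf-8:
lines = `gpg --display-charset utf-8 --list-keys --with-colons horst`
lines.split("\n") # should no longer raise 'invalid byte sequence in UTF-8'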
