Supporting multiple languages in Cassandra - UTF-8

I am analyzing Facebook data with Cassandra, and as a result I need to store text in multiple languages in one of my columns.
I am unable to insert non-English text into Cassandra:
<stdin>:1:'ascii' codec can't encode character u'\u010c' in position 51: ordinal not in range(128)
<stdin>:1:Invalid syntax at char 7623
I browsed through the Internet and found that I need to override the default encoding (link),
but I am not sure how to configure this.
Note: a single row may contain text in multiple languages.

Your column seems to be of type ascii, which only supports US-ASCII-encoded text. If you need a wider range of characters, use varchar instead (see here for details on CQL types).
To change the column type, use this ALTER TABLE statement:
ALTER TABLE my_table ALTER my_column TYPE varchar;
See here for details on ALTER TABLE.
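Once the column is varchar (or text, its alias), UTF-8 strings in any language can go in directly. A minimal sketch with a made-up table and data:
CREATE TABLE posts (id int PRIMARY KEY, message text);
INSERT INTO posts (id, message) VALUES (1, 'Čeština, 日本語, Español');
INSERT INTO posts (id, message) VALUES (2, 'English and Русский in one row');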

Related

Accommodate more data in an Oracle column without increasing size

I have a scenario where I would like to know whether we can fit more characters into an Oracle column without increasing the column size.
I have an Oracle column bname of type varchar2(256). The column is updated via Java code. I would like to know if there is any way to store more than 256 characters in this column without increasing its size.
I also wanted to know if there are any column compression techniques available to accomplish this.
Use smaller font. Just kidding.
As far as I can tell, you can't do that. 256 is the limit you set, so the only option is to
alter table that_table modify bname varchar2(500);
Depending on the database version, you can go up to 4000 bytes (or 32767, if MAX_STRING_SIZE is set to EXTENDED). If that's not enough, CLOB is your choice.
If you want to store compressed data, use the BLOB datatype (so you could, e.g., put a ZIP file into that column).
~ o ~
Or, perhaps you could alter the table and add another column:
alter table that_table add bname_1 varchar2(256);
and make your Java code split the value in two parts, storing the first 256 characters into bname and the rest into bname_1.
Other than that, no luck, I'm afraid.
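If CLOB turns out to be the way to go, keep in mind that Oracle won't let you MODIFY a VARCHAR2 column to CLOB in place; a rough sketch of the usual workaround (column names follow the question, try it on a test copy first):
ALTER TABLE that_table ADD bname_clob CLOB;
UPDATE that_table SET bname_clob = bname;
ALTER TABLE that_table DROP COLUMN bname;
ALTER TABLE that_table RENAME COLUMN bname_clob TO bname;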

How to filter on a RAW data type in Oracle

We are using Oracle for one of our client databases. I am not very well versed in it. There is a column on the basis of which I need to filter records. The column was displayed as System.Byte before, and when I converted it to VARCHAR(50) it showed up as 000B000000000000000000000000000A.
I need to know how to filter the records with this value in the mentioned column.
If the idea of the column is to represent a hex string:
SELECT UTL_I18N.RAW_TO_CHAR ('000B000000000000000000000000000A', 'AL32UTF8')
FROM DUAL;
could work for you; however, more information about the application and the expected results would be needed for a more fitting solution.
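If the column really is of type RAW and you just need to filter on that value, comparing against HEXTORAW of the hex string should also work; a sketch with made-up table and column names:
SELECT *
  FROM client_table
 WHERE raw_col = HEXTORAW('000B000000000000000000000000000A');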

Hive - column type name too long

I want to use rcongiu's hive-json-serde to store non-trivial JSON documents complying with an open standard. I've used Michael Peterson's convenient hive-json-schema generator to produce a CREATE TABLE statement that should work, except for its size.
The JSON documents I am encoding follow a well-defined schema, but the schema contains maybe a hundred fields, nested up to four levels deep. A Hive column type that matches the standard is very, very long (around 3700 characters), and when I run my generated create table statement I get the error
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
InvalidObjectException(message:Invalid column type name is too long: <the
really long type name>)
The statement looks like this:
CREATE TABLE foobar_requests (
  `event_id` int,
  `client_id` int,
  `request` struct<very long and deeply nested struct definition>,
  `timestamp` timestamp)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
Any path forward to storing these documents?
Hive has a problem with very long column definitions. By default the maximum number of characters supported is 4000, so if you really need more than that, you'll have to alter the metastore database by extending the length of COLUMNS_V2.TYPE_NAME.
If you would like to read more about the issue, see this link:
https://issues.apache.org/jira/browse/HIVE-12274
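As a rough sketch, assuming a MySQL-backed metastore, widening that column could look like this (run it against the metastore database itself, not through HiveQL; the new length is just an example, and back up the metastore first):
ALTER TABLE COLUMNS_V2 MODIFY TYPE_NAME VARCHAR(14000);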
Add the following property through Ambari > Hive > Configs > Advanced > Custom hive-site:
hive.metastore.max.typename.length=14000
This issue occurs when the name of one of the column types is longer than the default of 2000 characters.
Solution:
To resolve this issue, do the following:
1. Add the following property through Ambari > Hive > Configs > Advanced > Custom hive-site: hive.metastore.max.typename.length=10000
The value above is an example and needs to be tuned according to the specific use case.
2. Save the changes, restart the services, and recreate the table.

How did the Unicode characters end up in the database table column?

Recently I came across a Unicode character (\u2019) in a database table column while parsing it with Python.
Question: what are the reasons that can result in Unicode characters showing up in a database table? Is it a data-entry issue?
I appreciate any input.
When you set up your Oracle database, you choose a character set which will be used in the SQL char datatypes (char, varchar2, etc.).
Suppose you chose your character set and you have a table with a VARCHAR2 column. At some point you need to store a string with non-ASCII symbols that the chosen character set does not support. You could convert this string into an ASCII string, for example by calling the ASCIISTR function, and store it in your VARCHAR2 column (not a good idea, though, because many SQL built-in functions don't understand the escape sequence and treat it as plain characters). That's how Unicode escapes such as '\u2019' may end up in your table column: ASCIISTR converts non-ASCII symbols into a Unicode representation.
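A small sketch of that round trip (note that Oracle's escape format is a backslash followed by four hex digits, without the 'u'):
SELECT ASCIISTR('it’s') FROM DUAL;
SELECT UNISTR('it\2019s') FROM DUAL;
The first query returns 'it\2019s'; the second restores the curly apostrophe.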
Another option is Oracle's special NCHAR datatypes, which were designed to store Unicode without altering global database settings.
Here is the link with Oracle documentation: https://docs.oracle.com/cd/B19306_01/server.102/b14225/ch6unicode.htm

Char vs Byte in Oracle

I am comparing two databases which have similar schemas. Both should support Unicode characters.
When I describe the same table in both databases, DB 1 shows all the varchar fields with char (e.g. varchar(20 char)), but DB 2 shows them without char (varchar(20)), so the second schema supports only one byte per character.
When I compare nls_database_parameters and v$nls_parameters in both databases, they are all the same.
Could someone let me know what the difference may be here?
Have you checked NLS_LENGTH_SEMANTICS? You can set the default to BYTE or CHAR for CHAR/VARCHAR2 types.
If these parameters are the same on both databases, then maybe the table was created by explicitly specifying it that way.
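A quick way to check both, as a sketch (substitute your own table name):
SELECT value FROM nls_session_parameters WHERE parameter = 'NLS_LENGTH_SEMANTICS';
SELECT column_name, data_type, char_length, char_used
  FROM all_tab_columns
 WHERE table_name = 'MY_TABLE';
In ALL_TAB_COLUMNS, CHAR_USED is 'C' for columns declared with CHAR semantics and 'B' for BYTE semantics.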
