How to handle extended ASCII in hive? - hadoop

Just wondering how anyone has dealt with extended ASCII in Hive. For example, characters like §.
I can see that character in the raw data stored as a string in Hive, but once I query or export the data it does not show up properly. Is there any way to retain the §?
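A plausible explanation (sketched below in Python, assuming the source file is Latin-1/Windows-1252 while Hive and the export path decode it as UTF-8): the single byte that encodes § is not valid UTF-8 on its own, so UTF-8 readers render it as a replacement character.

# Minimal sketch; assumes a Latin-1/Windows-1252 source read back as UTF-8.
raw = "§".encode("latin-1")                    # b'\xa7' - how the byte may sit in the HDFS file
print(raw.decode("latin-1"))                   # '§' - correct when decoded with the matching charset
print(raw.decode("utf-8", errors="replace"))   # '�' - what a UTF-8 reader shows for that lone byte

If that is the cause, re-encoding the source data to UTF-8 before loading it into Hive typically restores the §.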

Related

Apache NiFi JSON to SQL with pre processing

I am completely new to NiFi, but I understand from people that it is good.
However, I am going to be sent a JSON document containing an embedded array which can hold hex, byte, and ASCII characters. These values will need converting to string values before inserting into Oracle.
Searching the internet, I have found no proper examples that convert JSON to SQL and convert data from hex to string, etc. Are there any examples to follow? Has anyone done something similar who can advise?
There are two ways that I know of to convert JSON to SQL:
The first is by using a Jolt transformation, which is comparatively not very efficient with large data.
The second, which I prefer, is using a series of processors to convert JSON to SQL: EvaluateJsonPath --> AttributesToJSON --> ConvertJSONToSQL --> PutSQL.
There is also a processor known as EncodeContent (or EncodeAttribute) for converting hex to different formats.
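As an illustration of what the hex-to-string step itself does (independent of NiFi), a small Python sketch; the hex payload below is made up for the example:

# Hypothetical value; only illustrates a hex-to-string conversion.
hex_value = "48656c6c6f2c204f7261636c6521"          # hex digits as they might arrive in the JSON array
decoded = bytes.fromhex(hex_value).decode("ascii")  # interpret the bytes as ASCII text
print(decoded)                                      # 'Hello, Oracle!'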

How to upload unstructured data into Google Storage that has latin-1 encoding

We have unstructured data, already transformed into tabular structures, that we want to upload to GCP Storage in order to process it with BigQuery and feed our data teams. However, our data is not encoded in UTF-8 and it contains all sorts of special Spanish characters, so every time we try to upload the data, encoding issues occur and the data gets garbled. I was wondering if anybody here knows of an API that can help us with this issue, or if there is an existing method or pipeline within GCP to handle this kind of data transformation.
Spanish Characters such as: ñ, ó, ¡
Summary
We want to ingest a lot of Latin-1-encoded data into our GCP project. How can we do it so that the characters are preserved while the data remains usable by BigQuery and friends?
PS: We cannot transform it to UTF-8 before uploading, because the data is too big and we would like to process it within the cloud!
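One way to avoid converting before upload, sketched below with the google-cloud-bigquery Python client: BigQuery CSV load jobs accept an encoding setting of ISO-8859-1, so the file can stay Latin-1 in Cloud Storage and be decoded at load time. The bucket, dataset, and table names are placeholders, and this is only a sketch of the approach, not a vetted pipeline.

# Sketch only: assumes the google-cloud-bigquery client library; names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    encoding="ISO-8859-1",      # decode the Latin-1 bytes correctly at load time
    skip_leading_rows=1,        # assumes a header row
    autodetect=True,            # let BigQuery infer the schema for the example
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/raw/data_latin1.csv",   # placeholder source URI
    "my_dataset.my_table",                  # placeholder destination table
    job_config=job_config,
)
load_job.result()   # wait for the load; the stored table is UTF-8 internally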

Get data and read short string from txt-file in SPSS syntax

I would like to use GET DATA to open my data, then read a string from a text file. The string would be a date (e.g. "2017-09-02 13:24") which I would use to filter the data set before saving it as a .sav file.
Is this possible? Or is there any other suggestion for how to import external information to use while processing the data set?
I know it's possible to open two different data sets with ADD FILES, but I have to use GET DATA.
The .sps file is run from an SPSS job file.

Charset, Accents, Special Characters in Apache Hive

The Problem
I'm having quite a few problems with my Hive tables that contain special characters (French) in some of their row values. Basically, every special character (such as an accented letter or other diacritic) gets transformed into pure gibberish (various weird symbols) when querying the data (via the Hive CLI or other methods). The problem is not with the column names, but with the actual row values and content.
For example, instead of printing "Variat°" or any other special character or accent mark, I get this as a result (when using a SELECT statement):
Variat�
Info & Config
The Hive table is external, built from a CSV file in HDFS that is encoded in the ISO-8859-1 charset. Changing the original file's encoding doesn't produce any better result.
I'm using a Hortonworks distribution (2.2) on Red Hat Enterprise Linux 6. The original CSV displays correctly in Linux.
The Question
I've looked on the web for similar problems, but it would seem that no one has encountered it, or at least everybody uses only English when using Hive :) Some JIRAs have addressed issues with special characters in Hive table column names, but my problem is with the actual content of the rows.
How can I deal with this problem in Hive?
Is it not possible to display special characters in Hive?
Is there any "charset" option for Hive?
Any help would be greatly appreciated as I’m currently stuck.
Thank you in advance!
I had a similar issue, but since my source file was small I used Notepad++ to convert it to UTF-8 encoding.
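For files too large to open in an editor, the same conversion can be done as a streaming re-encode; a minimal Python sketch with placeholder file names (an equivalent iconv command would also work):

# Streaming re-encode from ISO-8859-1 to UTF-8; file names are placeholders.
with open("table_data_latin1.csv", "r", encoding="iso-8859-1") as src, \
     open("table_data_utf8.csv", "w", encoding="utf-8") as dst:
    for line in src:      # line by line, so memory use stays flat for large files
        dst.write(line)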

Hive support for filtering Unicode data

I have a Hive table with Unicode data. When performing a simple query such as "SELECT * FROM table", I get back the correct data in the correct Unicode encoding. However, when I try to add a filtering criterion such as "... WHERE column = 'some unicode value'", my query returns nothing.
Is this a Hive limitation? Or is there any way to make Unicode filtering work with Hive?
Thank you!
You should use UTF-8 encoding when loading the data into the Hive table; then you can retrieve the data with what you wrote before, e.g. ... name like '%你好%'.
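The usual reason such a filter returns nothing is a byte-level mismatch between the stored data and the query literal. A small Python sketch of that mismatch (GBK is only an assumed example of a non-UTF-8 source encoding):

# Illustration only: the same characters, but different bytes, so a byte-wise comparison fails.
stored = "你好".encode("gbk")      # how the rows might have been written (assumed encoding)
literal = "你好".encode("utf-8")   # the bytes of the literal in a UTF-8 query
print(stored)                      # b'\xc4\xe3\xba\xc3'
print(literal)                     # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(stored == literal)           # False - which is why the WHERE clause matches nothing

Loading the data as UTF-8, as suggested above, makes the stored bytes and the literal bytes agree.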
