simple JSON file analysing in Hive-0.14 using serde

simple JSON file analysing in Hive-0.14 using serde - hadoop

I am trying to execute hive commands on json file using jsonserde's,but I am always getting null values ,but not actual data. I have used serde's provided in "code.google.com/p/hive-json-serde/downloads/list" link. I have tried multiple ways but all of the attempts were not successful. Please can some one help me with the exact steps to be followed and serde's to be used in order to work with json files in apache hive latest version (0.14)
BR,
San

Here are the simple steps to play around with JSON in Hive
Create a hive table
CREATE EXTERNAL TABLE IF NOT EXISTS json_table (
field1 string COMMENT 'This is a field1',
field2 int COMMENT 'This is a field2',
field3 string COMMENT 'This is a field3',
field4 double COMMENT 'This is a field4'
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
Location '/path/to/json_table';
Sample data for your table. Copy the below content into a json file and store into a file location pointed by json_table.
{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}
Make sure JSON Serde Jar file is added in the HIVE class path. For this example we have used openx json serde. It can be downloaded from here
Command to add the jar
ADD JAR /path-to/json-serde-1.3.6-jar-with-dependencies.jar;
Now we can query the entries from json_table
select * from json_table;

Related

Can I import a CSV file to livesql.oracle.com?

UPDATE: I finally found a way to do the process I want. But I needed to first convert CSV file to .sql file from an external web and then uploaded the script to LIVE SQL and run the script. It worked smoothly and fulfilled my requirements.
I need to insert some values from a CSV file to database. Since I can't install Oracle into my laptop for not having requirements, I am using Live SQL (web version of Oracle DBMS) from Oracle. But It seems like I can't import any data from external files into the web version. Now, I want to know how I can import those data from csv files to my database table easily? Is there any way in LIVE SQL (possibly I didn't find any as I am a beginner) or any other way to proceed this?

Import the CSV file into a spreadsheet then if you have the columns A, B and C with data that is, respectively, a number, string and date data type then into the D column enter the formula:
="INSERT INTO your_table ( column1, column2, column3 ) VALUES ("&A1&", '"&B1&"', DATE '"&TEXT(C1,"yyyy-mm-dd")&"');"
Copy/paste the formula down for all the rows.
Copy/paste the output of the D column into Oracle's LiveSQL.
(Note: this assumes that the CSV file will not attempt any SQL injection. If it will then you will need to guard against it.)

Unable to load data from Apache hive to ElasticSearch -

I am using CDH5.5,ElasticSearch-2.4.1.
I have created Hive table and trying to push the hive table data to ElasticSearch using the below query.
CREATE EXTERNAL TABLE test1_es(
id string,
timestamp string,
dept string)<br>
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
LOCATION
'hdfs://quickstart.cloudera:8020/user/cloudera/elasticsearch/test1_es'
TBLPROPERTIES ( 'es.nodes'='localhost',
'es.resource'='sample/test1',
'es.mapping.names' = 'timestamp:#timestamp',
'es.port' = '9200',
'es.input.json' = 'false',
'es.write.operation' = 'index',
'es.index.auto.create' = 'yes'
);<br>
INSERT INTO TABLE default.test1_es select id,timestamp,dept from test1_hive;
I'm getting the below error in the Job Tracker URL
"
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured. <br>
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all. "
It will throw "FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask" in hive terminal.
I tried all the steps mentioned in forums like including /usr/lib/hive/bin/elasticsearch-hadoop-2.0.2.jar in hive-site.xml, adding ES-hadoop jar to HIVEAUXJARS_PATH, copied yarn jar to /usr/lib/hadoop/elasticsearch-yarn-2.1.0.Beta3.jar also. Please suggest me how to fix the error.
Thanks in Advance,
Sreenath

I'm dealing with the same problem, and I found the execution error thrown by hive is caused by a timestamp field of string type which could not be parsed. I'm wondering whether timestamp fields of string type could be properly mapped to es, and if not this could be the root cause.
BTW, you should go to the hadoop MR log to find more details about the error.

REATE EXTERNAL TABLE test1_es(
id string,
timestamp string,
dept string)<br>
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ...........
don't need location

Creating Hive table for handling fixed length file

I have a fixed length file in HDFS on top of which i have to create external table using regex.
My file is something like this:
12piyush34stack10
13pankaj21abcde41
I want it to convert it into a table like:
key_column Value_column
---------- -----------------
1234stack 12piyush34stack10
1321stack 13pankaj21abcde41
I tried even by substr using insert but i am unable to point to key_columns.
Please help with solving this problem.

I don't know why you've used regexp external table but the way cannot workout so as to need also use another substring operation.
If me , I would make a regexp serde table then create two columns(key_column , Value_column) and just specify serde option as follows:
SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" ="(\d\d)\w{6}(\d\d).*",
"output.format.string" = "%1$s%2$sstack %0$s"
)
The output option will write the space separated data to corresponding columns by order.
Haven't yet test it , mind the back slash may not interpreted right in java.

Oracle: Import CSV file

I've been searching for a while now but can't seem to find answers so here goes...
I've got a CSV file that I want to import into a table in Oracle (9i/10i).
Later on I plan to use this table as a lookup for another use.
This is actually a workaround I'm working on since the fact that querying using the IN clause with more that 1000 values is not possible.
How is this done using SQLPLUS?
Thanks for your time! :)

SQL Loader helps load csv files into tables: SQL*Loader
If you want sqlplus only, then it gets a bit complicated. You need to locate your sqlloader script and csv file, then run the sqlldr command.

Another solution you can use is SQL Developer.
With it, you have the ability to import from a csv file (other delimited files are available).
Just open the table view, then:
choose actions
import data
find your file
choose your options.
You have the option to have SQL Developer do the inserts for you, create an sql insert script, or create the data for a SQL Loader script (have not tried this option myself).
Of course all that is moot if you can only use the command line, but if you are able to test it with SQL Developer locally, you can always deploy the generated insert scripts (for example).
Just adding another option to the 2 already very good answers.

An alternative solution is using an external table: http://www.orafaq.com/node/848
Use this when you have to do this import very often and very fast.

SQL Loader is the way to go.
I recently loaded my table from a csv file,new to this concept,would like to share an example.
LOAD DATA
infile '/ipoapplication/utl_file/LBR_HE_Mar16.csv'
REPLACE
INTO TABLE LOAN_BALANCE_MASTER_INT
fields terminated by ',' optionally enclosed by '"'
(
ACCOUNT_NO,
CUSTOMER_NAME,
LIMIT,
REGION
)
Place the control file and csv at the same location on the server.
Locate the sqlldr exe and invoce it.
sqlldr userid/passwd#DBname control=
Ex : sqlldr abc/xyz#ora control=load.ctl
Hope it helps.

Somebody asked me to post a link to the framework! that I presented at Open World 2012. This is the full blog post that demonstrates how to architect a solution with external tables.

I would like to share 2 tips: (tip 1) create a csv file (tip 2) Load rows from a csv file into a table.
====[ (tip 1) SQLPLUS to create a csv file form an Oracle table ]====
I use SQLPLUS with the following commands:
set markup csv on
set lines 1000
set pagesize 100000 linesize 1000
set feedback off
set trimspool on
spool /MyFolderAndFilename.csv
Select * from MYschema.MYTABLE where MyWhereConditions ;
spool off
exit
====[tip 2 SQLLDR to load a csv file into a table ]====
I use SQLLDR and a csv ( comma separated ) file to add (APPEND) rows form the csv file to a table.
the file has , between fields text fields have " before and after the text
CRITICAL: if last column is null there is a , at the end of the line
Example of data lines in the csv file:
11,"aa",1001
22,"bb',2002
33,"cc",
44,"dd",4004
55,"ee',
This is the control file:
LOAD DATA
APPEND
INTO TABLE MYSCHEMA.MYTABLE
fields terminated by ',' optionally enclosed by '"'
TRAILING NULLCOLS
(
CoulmnName1,
CoulmnName2,
CoulmnName3
)
This is the command to execute sqlldr in Linux. If you run in Windows use \ instead of / c:
sqlldr userid=MyOracleUser/MyOraclePassword#MyOracleServerIPaddress:port/MyOracleSIDorService DATA=datafile.csv CONTROL=controlfile.ctl LOG=logfile.log BAD=notloadedrows.bad
Good luck !

From Oracle 18c you could use Inline External Tables:
Inline external tables enable the runtime definition of an external table as part of a SQL statement, without creating the external table as persistent object in the data dictionary.
With inline external tables, the same syntax that is used to create an external table with a CREATE TABLE statement can be used in a SELECT statement at runtime. Specify inline external tables in the FROM clause of a query block. Queries that include inline external tables can also include regular tables for joins, aggregation, and so on.
INSERT INTO target_table(time_id, prod_id, quantity_sold, amount_sold)
SELECT time_id, prod_id, quantity_sold, amount_sold
FROM EXTERNAL (
(time_id DATE NOT NULL,
prod_id INTEGER NOT NULL,
quantity_sold NUMBER(10,2),
amount_sold NUMBER(10,2))
TYPE ORACLE_LOADER
DEFAULT DIRECTORY data_dir1
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY '|')
LOCATION ('sales_9.csv') REJECT LIMIT UNLIMITED) sales_external;

Oracle sqlldr timestamp format headache

I'm struggling to get sqlldr to import a csv data file into my table, specifically with the field that is a timestamp.
The data in my csv file is in this format:
16-NOV-09 01.57.48.001000 PM
I've tried all manner of combinations in my control file and am going around in circles. I can't find anything online - not even the Oracle reference page that details what all the date/timestamp format strings are.
Does anyone know where this reference page is, or what format string I should be using in my control file for this timestamp format.
For reference, this is what I've most recently tried:
load data
infile 'kev.csv'
into table page_hits
fields terminated by "~"
( ...
event_timestamp TIMESTAMP "dd-mmm-yy hh24.mi.ss",
...)

you can try this format:
event_timestamp TIMESTAMP "dd-MON-yy hh.mi.ss.ff6 PM"
You can browse all available formats in the SQL reference documentation.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

simple JSON file analysing in Hive-0.14 using serde - hadoop

Related

Can I import a CSV file to livesql.oracle.com?

Unable to load data from Apache hive to ElasticSearch -

Creating Hive table for handling fixed length file

Oracle: Import CSV file

Oracle sqlldr timestamp format headache

Categories

Resources