Table or view not found - convert Hive table to Spark DataFrame - hadoop

I am trying to do the following operation:
import hiveContext.implicits._
val productDF = hiveContext.sql("select * from productstorehtable2")
println(productDF.show())
The error I am getting is
org.apache.spark.sql.AnalysisException: Table or view not found:
productstorehtable2; line 1 pos 14
I am not sure why that is occurring.
I have used this in the Spark configuration:
set("spark.sql.warehouse.dir", "hdfs://quickstart.cloudera:8020/user/hive/warehouse")
and the location shown when I run describe formatted productstorehtable2 is
hdfs://quickstart.cloudera:8020/user/hive/warehouse/productstorehtable2
I used this statement to create the table:
create external table if not exists productstorehtable2
(
device string,
date string,
word string,
count int
)
row format delimited fields terminated by ','
location 'hdfs://quickstart.cloudera:8020/user/cloudera/hadoop/hive/warehouse/VerizonProduct2';
I use sbt (with Spark dependencies) to run the application. My OS is CentOS and I have Spark 2.0.
Could someone help me out in spotting where I am going wrong?
Edit:
when I run println(hiveContext.sql("show tables")) it just outputs a blank line
Thanks
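One direction worth checking (a sketch, not a confirmed fix): if the sbt-built job cannot see the Hive metastore, for example because hive-site.xml is not on its classpath, Spark 2.0 silently falls back to a local embedded metastore that contains no tables, which would match show tables coming back empty. In Spark 2.0 the Hive-aware entry point is a SparkSession built with enableHiveSupport(); a minimal sketch using the warehouse setting from the question:

import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming hive-site.xml is on the application classpath so this session
// talks to the same metastore that "describe formatted productstorehtable2" was run against.
val spark = SparkSession.builder()
  .appName("HiveTableToDataFrame")
  .config("spark.sql.warehouse.dir", "hdfs://quickstart.cloudera:8020/user/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()

val productDF = spark.sql("select * from productstorehtable2")
productDF.show()  // show() prints the rows itself; no surrounding println needed

Running spark.sql("show tables").show() against this session is a quick way to confirm whether it is connected to the intended metastore at all.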

Related

PostgreSQL Sqoop import + data line break issue

We are trying to import PostgreSQL data into a Hadoop environment using Apache Sqoop. We identified that the direct mode (keyword: --direct) of the Sqoop import uses the PostgreSQL COPY operation to import the data into HDFS quickly. If a column contains a line break (\n) in its value, the value is quoted (example 1 below), but the line break is still treated as the start of another record when the file is loaded into a Hive table (LOAD DATA INPATH). Is there an alternative available to make this work?
Example 1: sample data in HDFS (tried importing with the defaults, with --input-escaped-by '\', and with --input-escaped-by '\n'; none of them help):
value1,"The some data
has line break",value3
The Hive table treats it as 2 records (even with --hive-delims-replacement '', the data at the HDFS level still contains the \n, so Hive detects a new record):
value1 "the same data NULL
has line break" value3 NULL
It seems Apache has retired this project, so it no longer gets bug fixes or new releases.
Has any of you faced the same problem, or could anyone help me with this?
Note: I am able to import using non-direct mode and select-query mode.
You could try importing your data as a non-text format (e.g. Parquet, via Sqoop's --as-parquetfile flag). That would fix the issue with newlines.
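If the data has to stay as delimited text, another option (a hypothetical sketch, not part of the original answer) is to read the quoted, multi-line records with Spark's CSV source, which from Spark 2.2 onwards can keep a quoted \n inside a single record via the multiLine option:

import org.apache.spark.sql.SparkSession

// Hypothetical sketch, assuming Spark 2.2+: with multiLine enabled, a quoted field that
// contains a newline stays inside one record, unlike a textfile-backed Hive table,
// which splits on every physical line. The input path below is a placeholder.
val spark = SparkSession.builder().appName("MultilineCsvRead").getOrCreate()

val df = spark.read
  .option("header", "false")
  .option("multiLine", "true")   // keep quoted newlines inside the field
  .option("quote", "\"")
  .csv("hdfs:///path/to/sqoop/output")

df.show(truncate = false)       // value1 / "The some data\nhas line break" / value3 as one row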

sum() function gives wrong answer in HiveQL

I was playing around with a simple dataset that you can find here.
No matter what I do, calling the SUM() aggregate function on the 4th column of the given data set returns the wrong answer.
Here is the exact code that I have used:
create database beep_boop;
use beep_boop;
create table cause (year INT, sex STRING, cause STRING, value INT)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile
tblproperties("skip.header.line.count" = "1");
load data inpath '/user/verterse/CauseofDeath.csv' into table cause;
select sum(value) from cause;
The answer that I get is 11478567 as shown in the screenshot here.
But using the SUM() in MS Excel gives an answer of 12745563.
I tried deleting the table/database and recreating them from scratch. I tried uploading the csv file again. I tried using different datatypes like INT and BIGINT for the value column. I tried skipping and not skipping the header line. Nothing works. I also know that the file is being read completely because select count(*) from cause; returns a correct answer of 1016.
P.S.: I am new to Hadoop, Hive and big data in general.
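One thing worth checking (a hypothetical diagnostic, not part of the original question): in Hive, a value in the 4th column that fails to parse as INT is loaded as NULL, and SUM() silently skips NULLs, which would produce a total smaller than Excel's while count(*) still matches. The two queries below can be run from the Hive shell as-is, or through a Hive-enabled Spark session:

import org.apache.spark.sql.SparkSession

// Hypothetical diagnostic sketch, assuming the table is reachable through a
// Hive-enabled SparkSession (hive-site.xml on the classpath).
val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// How many rows lost their value during parsing?
spark.sql("SELECT count(*) FROM beep_boop.cause WHERE value IS NULL").show()

// Inspect a few of the affected rows to see what the raw values looked like.
spark.sql("SELECT year, sex, cause FROM beep_boop.cause WHERE value IS NULL LIMIT 10").show()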

Identifying character delimiter in Hive

Below is my data, delimited with the thorn character þ:
1þNaveenþ"Bangalore ,"Karnataka"þ
2þNaveenþ"Srikanth ^ Karnatakaþ562114
The create table statement is below:
CREATE External TABLE adh_dev.delimiter_test (Number INT, Name STRING,
Address string , Pincode int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\-61'
STORED AS TEXTFILE
LOCATION '/xyz/test_delimiter';
I tried the approaches below; nothing worked:
1) Followed this link: Thorn character delimiter is not recognized in Hive
2) Tried to put '-2'
3) Followed this link: http://www.theasciicode.com.ar/extended-ascii-code/capital-letter-thorn-ascii-code-232.html
4) Tried to put '\xFE'
I am using Cloudera CDH 5.11.1 and Hive 1.1.0.
Please help me resolve this issue; I have been struggling with it for the past 3 days.
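One workaround sketch from outside Hive (hypothetical, not from the original thread): þ is a single character, so Spark's CSV reader accepts it directly as the separator, which sidesteps the single-byte field delimiter limitation that makes the two-byte UTF-8 encoding of þ awkward for Hive's default SerDe. Assuming Spark 2.x is available on the CDH cluster and the file is UTF-8:

import org.apache.spark.sql.SparkSession

// Hypothetical sketch: read the thorn-delimited file with Spark's CSV source instead of a
// Hive SerDe. "þ" is one character, so it is accepted as the separator even though its
// UTF-8 encoding is two bytes.
val spark = SparkSession.builder().appName("ThornDelimitedRead").getOrCreate()

val df = spark.read
  .option("sep", "þ")
  .csv("/xyz/test_delimiter")   // same HDFS path as the external table

df.show(truncate = false)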

Unable to load data from Apache Hive to Elasticsearch

I am using CDH 5.5 and Elasticsearch 2.4.1.
I have created a Hive table and am trying to push its data to Elasticsearch using the query below.
CREATE EXTERNAL TABLE test1_es(
id string,
timestamp string,
dept string)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
LOCATION
'hdfs://quickstart.cloudera:8020/user/cloudera/elasticsearch/test1_es'
TBLPROPERTIES ( 'es.nodes'='localhost',
'es.resource'='sample/test1',
'es.mapping.names' = 'timestamp:#timestamp',
'es.port' = '9200',
'es.input.json' = 'false',
'es.write.operation' = 'index',
'es.index.auto.create' = 'yes'
);
INSERT INTO TABLE default.test1_es select id,timestamp,dept from test1_hive;
I'm getting the below error at the Job Tracker URL:
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured.
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
It throws "FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask" in the Hive terminal.
I tried all the steps mentioned in forums, like including /usr/lib/hive/bin/elasticsearch-hadoop-2.0.2.jar in hive-site.xml, adding the ES-Hadoop jar to HIVE_AUX_JARS_PATH, and also copying the YARN jar to /usr/lib/hadoop/elasticsearch-yarn-2.1.0.Beta3.jar. Please suggest how to fix the error.
Thanks in Advance,
Sreenath
I'm dealing with the same problem, and I found that the execution error thrown by Hive is caused by a timestamp field of string type that could not be parsed. I'm wondering whether timestamp fields of string type can be properly mapped to ES; if not, this could be the root cause.
BTW, you should check the Hadoop MapReduce logs for more details about the error.
CREATE EXTERNAL TABLE test1_es(
id string,
timestamp string,
dept string)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ...........
You don't need LOCATION.

How to give a function as an input for the S3 location in a Hive script

I am trying to achieve this:
location/11.11
location/12.11
location/13.11
In order to do that, I have tried many things and couldn't make it happen.
Now I have a Hive UDF which returns the S3 location of the table, but I am facing an error:
ParseException line 1:0 cannot recognize input near 'LOCATION'
'datenow' '(' LOCATION datenow(); NoViableAltException(143#[])
This is my Hive script; I have two external tables.
CREATE TEMPORARY FUNCTION datenow AS 'LocationUrlGenerator';
CREATE EXTERNAL TABLE IF NOT EXISTS s3( file Array<String>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\001' LINES TERMINATED BY '\n';
LOCATION datenow();
LOCATION accepts a string, not a UDF. The Language Manual is a bit unclear because it only specifies [LOCATION hdfs_path] and leaves hdfs_path undefined, but it can only be a URL location path, i.e. a string. In general, UDFs are not acceptable in a DDL context.
Build the script with any text tool of your choice and run that script.
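For example, a minimal sketch of that approach in Scala (hypothetical bucket name and path layout; the same could be done with a shell script or any templating tool): the date-based suffix is computed in ordinary code and spliced into the DDL as plain text before anything is sent to Hive.

import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical sketch of "build the script yourself": compute the dd.MM suffix, splice it
// into the DDL string, then hand the finished statement to Hive.
val day = LocalDate.now().format(DateTimeFormatter.ofPattern("dd.MM"))
val location = s"s3://my-bucket/location/$day"   // bucket name is a placeholder

// raw interpolator keeps '\t', '\001' and '\n' as literal text for Hive to parse
val ddl = raw"""CREATE EXTERNAL TABLE IF NOT EXISTS s3 (file Array<String>)
  |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  |COLLECTION ITEMS TERMINATED BY '\001'
  |LINES TERMINATED BY '\n'
  |LOCATION '$location'""".stripMargin

// Write ddl to a .hql file and run `hive -f`, or submit it from a Hive-enabled SparkSession:
// org.apache.spark.sql.SparkSession.builder().enableHiveSupport().getOrCreate().sql(ddl)
println(ddl)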
I managed it like this:
INSERT INTO TABLE S3
PARTITION(time)
SELECT func(json),from_unixtime(unix_timestamp(),'yyyy-MM-dd') AS time FROM tracksTable;
