Hive throws error after sqooping data - hadoop

I want to import data from a database into HDFS in Parquet format and then populate a Hive table. I can't use sqoop import --hive-import because Sqoop moves the data from the --target-dir into the Hive metastore warehouse directory.
So I am obliged to create the Hive schema with sqoop create-hive-table, convert the Hive table to Parquet with SET FILEFORMAT parquet, change the location of the Hive table to point to the suitable files in HDFS, and finally import the data with sqoop import --as-parquetfile.
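Concretely, the steps I follow look roughly like this (the connection string, the orders table and the HDFS path are placeholders, not my real values):

sqoop create-hive-table --connect jdbc:mysql://dbhost/mydb \
  --username user --password pass \
  --table orders --hive-table orders

Then in Hive:

ALTER TABLE orders SET FILEFORMAT PARQUET;
ALTER TABLE orders SET LOCATION 'hdfs:///user/etl/orders_parquet';

Then the import:

sqoop import --connect jdbc:mysql://dbhost/mydb \
  --username user --password pass \
  --table orders --as-parquetfile \
  --target-dir /user/etl/orders_parquet -m 1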
I am facing a problem in Hive: I cannot preview the data of my table because of this error:
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
1) How can I solve this problem?
2) Is there a better solution for this use case?

What is your Hive version? If your version is 1.0.0, it's a bug; please follow this link.
This bug is fixed in Hive 1.2.0.

Related

Can sqoop write data to hive and hbase together

Can we write Sqoop data to Hive and HBase together in Hadoop?
I want to use Sqoop to write to Hive (RDBMS) and HBase (NoSQL) together.
No, it cannot. If you want the data to show up in both Hive and HBase, you will have to import it into two separate locations: create a Hive table on one location for use in Hive, and on the second location create an external Hive table with HBase SerDe properties.
See Integrating Hive and HBase; that link gives the required steps.
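For the HBase side, a minimal sketch of such an external mapping table, assuming a hypothetical existing HBase table my_table with a single column family cf:

CREATE EXTERNAL TABLE hbase_mapped (rowkey STRING, val STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:val')
TBLPROPERTIES ('hbase.table.name' = 'my_table');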

Sqoop and Vertica

Is there anyone here who has worked with Sqoop and HP Vertica?
I am trying to export data from Sqoop to Vertica and I find that the performance is extremely poor.
I can switch to the HP Vertica connector... but I still want to know why Sqoop is so slow when exporting data to Vertica.
I also found that, when inserting data, Sqoop does not support upserts against Vertica. Will this issue be fixed anytime soon?
sqoop export -Dsqoop.export.records.per.statement=1 \
  --driver com.vertica.jdbc.Driver --mysql-delimiters \
  --username **** --password **** \
  --connect jdbc:vertica://hostname/schema?ConnectionLoadBalance=1 \
  --export-dir <hdfs-data-dir> --table <table_name>
One of the issues is that Sqoop is forcing us to set sqoop.export.records.per.statement to 1 for Vertica; otherwise it throws an error.
I've never used Sqoop, but the command-line data import in Vertica uses the COPY statement; basically it makes a temp file and then runs a file import in the background. It wouldn't be a graceful solution, but you could try dumping your data to a gzip file and then running COPY directly. I find that the gzip is always the bottleneck for files over a certain threshold (~50 MB+), never the COPY. It could be a backdoor to a faster import.
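A rough sketch of that approach, assuming a hypothetical CSV dump at /tmp/orders.csv and placeholder host, credentials and table name:

# compress the dump, then load it with Vertica's COPY via vsql
gzip /tmp/orders.csv
vsql -h vertica-host -U dbadmin -w secret -c \
  "COPY myschema.orders FROM LOCAL '/tmp/orders.csv.gz' GZIP DELIMITER ',';"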
I use Sqoop with a Vertica database: I use it to move data from Vertica to Hive/HDFS and it works great; you just need to add the Vertica JDBC jar to the Sqoop lib folder.
When I want to query Vertica against data that sits in HDFS/Hive, I use Vertica's HCatalog connector. In version 8.1.* it ships with the Vertica database and you don't need any extra connectors.

How to create external table in Hive using sqoop. Need suggestions

Using Sqoop I can create a managed table but not an external table.
Please let me know the best practices for unloading data from a data warehouse and loading it into a Hive external table.
The tables in the warehouse are partitioned; some are partitioned by date and some by state.
Please share your thoughts or the practices used in production environments.
Sqoop does not support creating Hive external tables. Instead you might:
Use the Sqoop codegen command to generate the SQL for creating the Hive internal table that matches your remote RDBMS table (see http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_codegen_literal)
Modify the generated SQL to create a Hive external table
Execute the modified SQL in Hive
Run your Sqoop import command, loading into the pre-created Hive external table
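A rough sketch of steps 2-4, using hypothetical names (an orders table, a /user/etl/orders_ext directory) and a hand-written column list standing in for the generated DDL:

# 2-3) create the external table over an HDFS location
hive -e "CREATE EXTERNAL TABLE orders_ext (id INT, state STRING, order_date STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/user/etl/orders_ext';"

# 4) load that directory with Sqoop
sqoop import --connect jdbc:mysql://dbhost/mydb \
  --username user --password pass \
  --table orders --target-dir /user/etl/orders_ext \
  --fields-terminated-by ',' -m 4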
Step 1: import data from MySQL into a Hive table.
sqoop import \
  --connect jdbc:mysql://localhost/<database> \
  --username training --password training \
  --table <table-name> --hive-import --hive-table <hive-table-name> -m 1 \
  --fields-terminated-by ','
Step 2: In Hive, change the table type from managed to external.
ALTER TABLE <table-name> SET TBLPROPERTIES('EXTERNAL'='TRUE');
Note: you can either import directly into a Hive table, or import into HDFS (Hive's backing storage) and build the table on top of it.
My best suggestion is to Sqoop your data to HDFS and create an EXTERNAL table over it for raw operations and transformations.
Finally, load the mashed-up data into an internal table. I believe this is one of the best practices to get things done in a proper way.
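As a rough illustration of that pattern (table names, columns and paths are assumptions, not from the question):

-- external table over the Sqoop target directory, used for raw reads and transformations
CREATE EXTERNAL TABLE sales_raw (id INT, state STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/etl/sales_raw';

-- managed (internal) table holding the cleaned, mashed-up result
CREATE TABLE sales STORED AS PARQUET
AS SELECT id, state, amount FROM sales_raw WHERE amount IS NOT NULL;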
Hope this helps!!!
Refer to these links:
https://mapr.com/blog/what-kind-hive-table-best-your-data/
In the link above, if you want to skip ahead, go directly to section 2.2.1, External or Internal.
https://hadoopsters.net/2016/07/15/hive-tables-internal-and-external-explained/
After going through the first link, the second will clarify most of your questions.
Cheers!!

import table from HDFS into spark

Is there a way to import a table from HDFS directly into Spark and store it as an RDD, or does it need to be made into a text file to do so?
PS - I get the table onto HDFS from my local system using Sqoop (if that matters), and when I do so it comes in the form of 4 files.
While I haven't used Sqoop before myself, you can use it to create Hive tables which you can then query with Spark SQL, which will give you back SchemaRDDs :)
You can use read.jdbc() on your sqlContext to import a table from an external DB into a Spark DataFrame.

Sqooping same table for different schema in parallal is failing

We have different database schemas in Oracle. We are planning to Sqoop some of the tables from Oracle into the Hive warehouse. If we Sqoop the tables of an OLTP schema sequentially, it works. But to make better use of resources we plan to Sqoop the different OLTP schemas' tables in parallel, and Sqooping the same table in parallel fails.
It seems that while Sqooping a table, Sqoop creates a temporary directory in HDFS and from there moves the data to the Hive table; because of that we are not able to Sqoop in parallel.
Is there any way that we can Sqoop the same table in parallel?
You can use the --target-dir parameter to specify an arbitrary temporary directory on HDFS where Sqoop will import the data first. This parameter works in conjunction with --hive-import.
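A hedged sketch of two parallel imports, assuming two hypothetical Oracle schemas OLTP1 and OLTP2 that both contain an ORDERS table (connection details are placeholders):

# distinct --target-dir values keep the temporary HDFS directories from colliding
sqoop import --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
  --username user1 --password pass \
  --table OLTP1.ORDERS --target-dir /tmp/sqoop/oltp1_orders \
  --hive-import --hive-table oltp1_orders -m 4 &

sqoop import --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
  --username user2 --password pass \
  --table OLTP2.ORDERS --target-dir /tmp/sqoop/oltp2_orders \
  --hive-import --hive-table oltp2_orders -m 4 &

wait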
