HCatalog/Hive table creation does not import data into /app/hive/warehouse folder in Hadoop cluster

I ran into a very weird problem with a Hadoop cluster (HDP 2.2) I set up in Amazon EC2 (3 data nodes + one name node + one secondary name node). The Hue server runs on the main name node and the Hive server runs on the secondary name node. I was using the Hue web interface to create a table "mytable" in HCatalog from a CSV file loaded into HDFS. The table creation returned successfully without error, and the table was created and displayed in the Hue web interface. However, when I tried to query the table, it returned 0 records. I went to the /app/hive/warehouse folder and could see the table folder "mytable" had been created, but the CSV file was never copied into that folder. I reproduced the same behavior using the hive shell.
If I do the same operation in the HDP sandbox VM, everything works as expected. After the table creation, the /app/hive/warehouse/mytable folder contains the CSV file I imported into the table.
Any help is highly appreciated.
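For reference, the hive-shell reproduction looked roughly like this (a minimal sketch; the table name and warehouse path match the question, but the column schema and CSV location are hypothetical):

    # hypothetical schema; the real CSV columns are not shown in the question
    hive -e "
    CREATE TABLE mytable (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;
    LOAD DATA INPATH '/user/hue/mydata.csv' INTO TABLE mytable;
    "

LOAD DATA INPATH moves the source file into /app/hive/warehouse/mytable/, which is exactly the step that was silently failing here.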

I solved the issue. I realized the server in the cluster running the Hive server was low on physical memory. After freeing up some memory on that box, the HCatalog table creation worked as expected.
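A quick way to confirm both the symptom and the fix (a sketch; the warehouse path is the one from the question):

    # check whether the CSV actually landed in the table's warehouse folder
    hdfs dfs -ls /app/hive/warehouse/mytable

    # check available physical memory on the box running the Hive server
    free -m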

Related

Lineage is not visible for Hive Managed Table in HDP Atlas

I am using Atlas with HDP to create the lineage flow for my Hive tables, but the lineage is only visible for Hive external tables. I created Hive managed tables, performed a join operation to create a new table, and imported the Hive metastore using import-hive.sh placed under the hook-bin folder. But the lineage for the managed table is not visible.
The HDFS directory is not listed for the managed table either; if I check the external table, its HDFS directory is available.
Can anyone help me over here? Thanks in advance.
There were two factors causing the issue in my case: the first was the Hive hook, and the second was offsets.topic.replication.factor. To resolve this, the steps below were implemented:
1) Re-install the Hive hook for Atlas
List the services installed for Apache Atlas and re-install the Hive hook jar.
2) Kafka offset replication property
Change the offsets.topic.replication.factor value to 1.
After implementing the above changes, lineage appears in Atlas for Hive as well as Sqoop.
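A sketch of the second step plus the metastore re-import, assuming an Ambari-managed HDP install (the Kafka config path and the hook-bin location vary by version, so treat both as examples):

    # set the Kafka offsets topic replication factor to 1
    # (in Ambari: Kafka -> Configs; or directly in server.properties)
    echo "offsets.topic.replication.factor=1" >> /etc/kafka/conf/server.properties

    # re-import the existing Hive metadata into Atlas
    cd /usr/hdp/current/atlas-server/hook-bin
    ./import-hive.sh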

Data from Cloudera cluster(which was mostly in hive tables) missing after a cluster restart

I recently restarted a Cloudera cluster as I was having some problems with the Hue service. After the restart I can see the databases and tables in Hive, but once I run a statement to see the data, I get no rows. I have checked the /user/hive directory in HDFS and see no warehouse directory. I feel like this is a case where the name node is not connecting to my data nodes, but I am not sure about it. (The cluster has 5 nodes.)
I have tried to look through the logs and did not find anything. There was one single CSV file in HDFS and I can see that it still exists, but the whole /hive/warehouse directory is missing.
Thanks.
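For reference, one way to confirm where Hive actually expects its warehouse directory, since it is not always /user/hive/warehouse (a sketch; the property name is standard, the listed path is an example):

    # print the warehouse location Hive is configured to use
    hive -e "set hive.metastore.warehouse.dir;"

    # then list that exact location in HDFS
    hdfs dfs -ls /user/hive/warehouse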

hadoop architecture query example

Currently I have 2 machines. One of them is the Hortonworks sandbox, which I have configured as the name node after decommissioning the data node on it. The other machine I set up as a data node, and I have installed the Hive server on it.
I also assigned the slave role to it, and I used Ambari to finish the setup.
My question, as it's my first time ever using Hadoop: my plan is to transfer data from a SQL database into Hadoop, so does this mean I have to install MySQL on the data node, given that I will be using Sqoop? And another thing: what will the name node do? Shall I query it so that it passes the queries on to the data node? I am really very confused and under huge pressure to finish, so forgive me as I am a newbie. The installations of the machines are all default; I chose datanode for the first machine and nodemanager for the second one with no special configurations. I would appreciate a simple example from which I can understand.
Thanks a lot, fellows.
Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export from the Hadoop file system back to relational databases.
For example: you have some data in MySQL on another machine and you have to transfer that data into your Hadoop HDFS. In this situation Sqoop is used, as in the sketch below.
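A minimal Sqoop import (a sketch; the host, database, table, and credentials are all hypothetical):

    # pull the MySQL table "customers" into HDFS with a single mapper
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/shopdb \
      --username dbuser -P \
      --table customers \
      --target-dir /user/hadoop/customers \
      -m 1

Note that MySQL does not have to run on the data node; Sqoop only needs network access to it via the JDBC connection string.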
The NameNode stores metadata (number of blocks, which rack and which DataNode the data is stored on, and other details) about the data held in the DataNodes, whereas the DataNodes store the actual data.
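You can see this split for yourself (a sketch; the path is an example): fsck reports the block locations that the NameNode tracks for a file, while the blocks themselves live on the DataNodes.

    # show blocks, their DataNode locations, and replication for one path
    hdfs fsck /user/hadoop/customers -files -blocks -locations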

Unable to retain HIVE tables

I have set up a single-node Hadoop cluster on Ubuntu. I have installed the Hadoop 2.6 version on my machine.
Problem:
Every time I create Hive tables and load data into them, I can see the data by querying them, but once I shut down my Hadoop, the tables get wiped out. Is there any way I can retain them, or is there any setting I am missing?
I tried some online solutions provided, but nothing worked. Kindly help me out with this.
Thanks
B
The Hive table data is on Hadoop HDFS; Hive just adds metadata and provides users with SQL-like commands so they don't have to write basic MR jobs. So if you shut down the Hadoop cluster, Hive can't find the data in the table.
But if you are saying the data is lost after you restart the Hadoop cluster, that's another problem.
It seems you are using the default Derby database as the metastore. Configure a proper Hive metastore; I am pointing you to the link, please follow it.
Hive is not showing tables
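For reference, making the metastore persistent means pointing hive-site.xml at an external database instead of the embedded Derby instance. A sketch of the properties to add inside the <configuration> element, assuming a MySQL database named "metastore" and hypothetical credentials:

    <!-- hive-site.xml: external metastore connection (values are examples) -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hiveuser</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hivepass</value>
    </property>

With this in place the table definitions survive restarts, because they live in MySQL rather than in a local Derby directory.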

Hive Server doesn't see old hdfs tables

I'm having a problem with the Hive server that I don't understand. I've just set up a Hadoop cluster and want to access it from a Hive service. My first try was running the Hive server on one of the cluster machines.
Everything worked nicely but I wanted to move the hive service to another machine outside the hadoop cluster.
So I just started a new machine outside this Hadoop cluster. I installed Hive (+ the Hadoop libraries) and copied the Hadoop config from the cluster. When I run the hiveserver almost everything goes OK. I can connect with the hive CLI from a different machine to my hiveserver, create new tables in the Hive warehouse within the HDFS filesystem of the Hadoop cluster, query them, and so on.
The thing I don't understand is that the hiveserver does not seem to recognize the old tables which were created in my first try.
Some notes about my config: all tables are managed by Hive and stored in HDFS, and the Hive configuration is the default one. I suppose it has to do with my Hive metastore, but I couldn't say what.
Thank you!!
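Since the question already suspects the metastore, one quick check (a sketch) is to print which metastore database each machine is actually using; with the default configuration this is an embedded Derby database local to the directory where Hive was started, so a new machine starts with an empty one and never sees tables created elsewhere:

    # print the metastore connection URL on each machine
    hive -e "set javax.jdo.option.ConnectionURL;"
    # the Derby default looks like jdbc:derby:;databaseName=metastore_db;create=true,
    # i.e. a local metastore_db directory that is not shared between machines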
