Hive is not showing tables - hadoop

I am new to the Hadoop and Hive world.
I have a strange problem. While working at the Hive prompt I created a few tables, and Hive showed them.
After exiting the Hive session and starting the Hive terminal again, "show tables;" does not list any of them, yet I can still see the tables under '/user/hive/warehouse' in HDFS.
What am I doing wrong? Can you please help me with this?

BalduZ is right. Set this in $HIVE_HOME/conf/hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/youruser/hive_metadata/metastore_db;create=true</value>
</property>
From then on you can run hive from any directory. This will solve your problem.

I assume you are using the default configuration, so the problem is the directory from which you start hive: the embedded Derby metastore is created in the current working directory, so you need to call hive from the same directory in order to see the tables you created in the previous hive session.
For example, if you call hive from ~/test/hive and create some tables, and the next time you start hive from ~/test, you will not see the tables you created earlier. The easiest solution is to always start hive from the same directory.
However, a better solution would be to configure Hive to use a database such as MySQL as the metastore. You can find how to do this here.
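For reference, a minimal hive-site.xml sketch for a MySQL-backed metastore could look like the following (the host, database name, user and password are placeholders, and the MySQL JDBC driver jar must be on Hive's classpath, e.g. in $HIVE_HOME/lib):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>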

Related

SAS to HIVE2 Cloudera - Error trying to write

I get the following error while trying to write to the hive2 db:
ERROR: java.io.IOException: Could not get block locations. Source file "/tmp/sasdata-e1-...dlv - Aborting...block==null
The error appears when trying to write a new table or append rows to an existing table. I can connect to the db correctly (through a libname) and read tables from the schema, but when I try to create a new table, it is created empty because the error above occurs.
Can someone help, please?
Thank you
Remember that Hive is mostly just a metadata store that helps you read files from HDFS. Yes, it does this through a database paradigm, but it really operates on HDFS: each table is created in an HDFS directory, and files are created inside it.
This sounds like you don't have write permission to the HDFS folder you are writing to (but you do have read permission).
To solve this you need to understand which user you are running as and where the data is being written.
If you are creating a simple table, check whether you can write to the Hive warehouse directory. If you are purposely creating files in a specific HDFS folder, check that folder instead.
Here's a command to help you determine where the data is being written:
show create table [mytable]
If it doesn't mention an HDFS location, you need to get write permission on the Hive warehouse directory (typically located at hdfs:/user/hive/warehouse, but the actual location is set by hive.metastore.warehouse.dir in the Hive configuration under $HIVE_HOME/conf if it has been moved elsewhere).
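Assuming the default warehouse location, commands along these lines can show who owns the table directory and, if necessary, hand it over to the user SAS connects as (database, table and user names below are placeholders):
hdfs dfs -ls /user/hive/warehouse
hdfs dfs -ls /user/hive/warehouse/mydb.db/mytable
hdfs dfs -chown -R sasuser:hadoop /user/hive/warehouse/mydb.db/mytable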

Spark(2.3) not able to identify new columns in Parquet table added via Hive Alter Table command

I have a Hive Parquet table which I am creating using the Spark 2.3 API df.write.saveAsTable. There is a separate Hive process that alters the same Parquet table to add columns (based on requirements).
However, the next time I read the same Parquet table into a Spark dataframe, the new column that was added to the Parquet table using the Hive ALTER TABLE command does not show up in the df.printSchema() output.
Based on initial analysis, it seems there is some conflict and Spark is using its own schema instead of reading the Hive metastore.
Hence, I tried the options below.
Changing the Spark setting:
spark.sql.hive.convertMetastoreParquet=false
and refreshing the Spark catalog:
spark.catalog.refreshTable("table_name")
However, neither of these options solves the problem.
Any suggestions or alternatives would be super helpful.
This sounds like the bug described in SPARK-21841. The JIRA description also contains an idea for a possible workaround:
...Interestingly enough it appears that if you create the table
differently like:
spark.sql("create table mydb.t1 select ip_address from mydb.test_table limit 1")
Run your alter table on mydb.t1
val t1 = spark.table("mydb.t1")
Then it works properly...
To apply this workaround, run the same ALTER TABLE command you used in Hive from spark-shell as well:
spark.sql("alter table TABLE_NAME add COLUMNS (col_A string)")

Hive insert and load data query is not working

My Hive query is not working. Hive allows me to create databases, show databases, and create tables, but it does not let me load a local file into an HDFS table, and insert queries are not working either.
I tried reinitializing my metastore, formatting the namenode, and recreating every directory, but still nothing works.
My datanode is not starting. Is this problem related to the datanode? What should I do?
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Error caching map.xml.
This error comes up when I try to run any query except create table and create database.
From the errors above, you are not able to write to HDFS.
Hive allows me to create databases, show databases, and create tables, but it does not let me load a local file into an HDFS table, and insert queries are not working either.
Freeing up HDFS space would work.
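Since the question also says the datanode is not starting, a quick sketch of commands to check HDFS health and remaining space on a plain Apache Hadoop install:
jps                      # the DataNode process should be listed here
hdfs dfsadmin -report    # live datanodes and remaining capacity
hdfs dfs -df -h /        # free space as seen by HDFS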

Oozie shell action using beeline

I am creating an Oozie workflow in which I execute a shell script. This shell script calls one ".hql" file using beeline.
The hql file selects from table one and inserts into table two; both tables are partitioned.
When I run the Oozie job, the beeline operation executes with no error, but the data does not get inserted into table two.
The same hql command works fine and inserts data into table two when I execute it on the beeline terminal.
What could be the possible reason for the hql file not behaving as expected?
Read the Hortonworks article below:
https://community.hortonworks.com/questions/28224/strange-issue-with-beeline.html
After a lot of trial and error I found the issue: the hive.hql file was expecting a newline at the end of the query. This is a bug in Hive 0.13.1 that has been fixed in Hive 0.14.0.
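As a workaround on Hive 0.13.1, the shell script can make sure the .hql file ends with a newline before invoking beeline; the file path and JDBC URL below are placeholders:
echo "" >> /path/to/insert_into_table_two.hql
beeline -u "jdbc:hive2://your-hiveserver2-host:10000/default" -f /path/to/insert_into_table_two.hql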

Repairing hive table using hiveContext in java

I want to repair the Hive table for any newly added/deleted partitions. Instead of manually running the msck repair command in Hive, is there any way to achieve this in Java? I am trying to get all partitions from HDFS and from the Hive metastore, and after comparing them I will put the newly added/deleted partitions into the Hive metastore. But I am not able to get the API from hiveContext. I tried to get all the partitions using hiveContext, but it throws a "table not found" error.
System.out.println(hiveContext.metadataHive().getTable("anshu","mytable").getAllPartitions());
Is there any way to add/remove partitions in Hive using Java?
Spark option:
Using hiveContext you can execute an MSCK REPAIR as in the example below; there is no need to do it manually.
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)  # sc is an existing SparkContext
sqlContext.sql("MSCK REPAIR TABLE your_table")
Is there any way to add/remove partitions in Hive using Java?
Plain Java option:
If you want to do it in plain Java, without using Spark, then:
You can use the class HiveMetaStoreClient to query the HiveMetaStore directly.
Please see my answer here with example usage
