Oozie shell action using beeline

I am creating an Oozie workflow in which I'm executing a shell script. The shell script calls an ".hql" file using Beeline.
The .hql file selects from table one and inserts into table two; both tables are partitioned.
When I run the Oozie job, the Beeline step executes with no error, but no data gets inserted into table two.
The same HQL, when I execute it in a Beeline terminal, works fine and inserts data into table two.
What could be the possible reason for the .hql file not behaving as expected?

See the Hortonworks article below:
https://community.hortonworks.com/questions/28224/strange-issue-with-beeline.html
After a lot of trial and error I found the issue. Hive 0.13.1 requires the .hql file to end with a newline after the last query; this is a bug that was fixed in Hive 0.14.0.
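A minimal sketch of a guard you could add to the wrapper shell script before calling Beeline (the file name and JDBC URL are illustrative):
#!/bin/sh
# Hive 0.13.1 silently drops the last query if the .hql file lacks a trailing
# newline, so append one if the final byte of the file is not a newline.
if [ -n "$(tail -c 1 hive.hql)" ]; then
  echo >> hive.hql
fi
beeline -u "jdbc:hive2://hiveserver:10000/default" -f hive.hql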

Related

How to create fact and dimension tables for single hive table along with incremental data

I am new to creating a data warehouse (star schema) in Hive. My situation is that I can load one Hive table, along with incremental data, using a Sqoop job. From this Hive table I have to create fact and dimension tables that are continuously updated.
Is this possible in Hive? If yes, how do I create them? If not, what is an alternative approach?
If anybody has any ideas, please share them with me.
You can use the following steps to automate your requirement.
Create a shell script which contains your Hive queries for
creating the fact and dimension tables,
e.g. your_shell_script.sh will contain:
#!/bin/sh
hive -e "use hivedb; CREATE TABLE FACTS as select your columns from Source_table;"
hive -e "use hivedb; CREATE TABLE DIMENSIONS as Select your coloumns from Source_table;"
Note: you can use any CREATE TABLE method you want, depending on how you want to create your tables; you can also add partitions (see the partitioned sketch below).
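For instance, a minimal sketch of a partitioned fact table built with dynamic partitioning (table and column names are illustrative):
hive -e "use hivedb;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
CREATE TABLE FACTS_PART (order_id BIGINT, amount DOUBLE) PARTITIONED BY (load_date STRING);
INSERT OVERWRITE TABLE FACTS_PART PARTITION (load_date)
SELECT order_id, amount, load_date FROM Source_table;"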
Start the cron daemon on your Linux OS.
Make a crontab entry which executes your shell script at a set time, after your main source table's Sqoop load is complete,
e.g. crontab -e
0 11 * * * /path/to/your/script/your_shell_script.sh
Note: this crontab entry will run your shell script at 11 AM every day (after your Sqoop load).
Hope this helps.

Why is MapReduce not executed for Hive queries?

I had a query: we have a Hive table, and when we run select * from table where <condition>; it returns results immediately without invoking an MR job. When I create a duplicate of the same table and run the same query, an MR job is invoked. What could be the possible reason for this?
I got the answer. The reason was that the Hive ANALYZE command had been issued on the table. Once you execute an ANALYZE command, Hive stores the number of rows and the file size in the Hive metastore. So when you do select count(*) from table, it fetches the answer directly from the metastore instead of invoking a MapReduce job.
You can also issue an ANALYZE command at the column level:
ANALYZE TABLE [db_name.]tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] -- (Note: Fully support qualified table name since Hive 1.2.0, see HIVE-10007.)
COMPUTE STATISTICS
[FOR COLUMNS] -- (Note: Hive 0.10.0 and later.)
[CACHE METADATA] -- (Note: Hive 2.1.0 and later.)
[NOSCAN];
Documentation link:
https://cwiki.apache.org/confluence/display/Hive/StatsDev
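For example, a quick sketch (the database, table, and partition values are illustrative):
ANALYZE TABLE mydb.sales PARTITION(state='CA') COMPUTE STATISTICS;
ANALYZE TABLE mydb.sales PARTITION(state='CA') COMPUTE STATISTICS FOR COLUMNS;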
Local mode (hive not invoking MR) depends on several conditions (see HIVE-1408):
hive.exec.mode.local.auto=true/false - Lets Hive determine whether to run in local mode automatically.
hive.exec.mode.local.auto.input.size.max=1G - When hive.exec.mode.local.auto is true, input bytes should be less than this for local mode.
hive.exec.mode.local.auto.input.files.max=4 - When hive.exec.mode.local.auto is true, the number of tasks should be less than this for local mode.
If the tables have the same data, my guess is that there is a difference in the number of tasks spawned when querying the two tables, causing one query to run in local mode and the other to spawn an MR job.
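One way to test this hypothesis from the Hive CLI (hive.exec.mode.local.auto is the property from HIVE-1408 mentioned above):
-- Show the current setting
SET hive.exec.mode.local.auto;
-- Disable automatic local mode and re-run the "fast" query; if it now
-- launches an MR job, local mode explains the difference between the tables.
SET hive.exec.mode.local.auto=false;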

Using Hive in real world applications?

I am a newbie to the Hadoop stack; I have learned MapReduce and now Hive.
But I am not sure how Hive is used in practice.
In MapReduce we get one or more output files as our final result, but in Hive we select records using SQL-like queries (HQL) and do not get any final output file; results are shown on the terminal only.
Now my question is: how can the results of these HQL SELECT queries be consumed by some other analytics team?
There are lots of ways to extract/export Hive query results.
If you want the results in RDBMS storage, you can use Sqoop; a sketch follows.
I suggest you go through what Sqoop is and what it does.
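For example, a minimal Sqoop export sketch (the connection string, credentials, table, directory, and delimiter are illustrative and must match how your Hive table's data is stored):
sqoop export \
  --connect jdbc:mysql://dbhost/analytics \
  --username dbuser -P \
  --table query_results \
  --export-dir /user/hive/warehouse/query_results \
  --input-fields-terminated-by ','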
And if you want your query results in a file, there are also several ways.
Hive supports exporting data from tables:
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select * from table;
Another simple approach is to redirect your Hive query output to a file when running queries from the CLI:
hive -e "select * from table" > output.txt

Sqoop - Create empty Hive partitioned table based on schema of Oracle partitioned table

I have an Oracle table which has 80 columns and is partitioned on the state column. My requirement is to create a Hive table with a schema similar to the Oracle table, partitioned on state.
I tried using Sqoop's create-hive-table tool, but keep getting an error:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: Partition key state cannot be a column to import.
I understand that in Hive the partition column should not be in the table definition, but then how do I get around the issue?
I do not want to write the CREATE TABLE commands manually, as I have 50 such tables to import and would like to use Sqoop.
Any suggestions or ideas?
Thanks
There is a workaround for this.
Below is the procedure I follow:
On Oracle, run a query to get the schema for the table and store it in a file.
Move that file to Hadoop.
On Hadoop, create a shell script which constructs an HQL file.
The HQL file contains the Hive CREATE TABLE statement along with the columns; for this we can use the file from step 1 (the Oracle schema file copied to Hadoop).
To run this script you just pass the Hive database name, table name, partition column name, path, etc., depending on your level of customization. At the end of the shell script, add hive -f <HQL filename>.
If everything is ready, it takes just a couple of minutes per table.
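A minimal sketch of such a generator script, assuming the Oracle schema was exported as lines of "COLUMN_NAME DATA_TYPE" and that mapping every column to STRING is acceptable for a first pass (all names, paths, and the type mapping are illustrative):
#!/bin/sh
# Usage: ./gen_table.sh <hive_db> <table> <partition_column> <schema_file>
DB=$1; TABLE=$2; PART_COL=$3; SCHEMA_FILE=$4
HQL="create_${TABLE}.hql"
{
  echo "CREATE TABLE ${DB}.${TABLE} ("
  # Emit every column except the partition column; crudely map all types to STRING
  grep -iv "^${PART_COL} " "$SCHEMA_FILE" | awk '{printf "  %s STRING,\n", $1}' | sed '$ s/,$//'
  echo ") PARTITIONED BY (${PART_COL} STRING);"
} > "$HQL"
hive -f "$HQL"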

Hive is not showing tables

I am new to the Hadoop and Hive world.
I have a strange problem. While working at the Hive prompt, I created a few tables, and Hive was showing those tables.
After exiting that Hive session, when I start the Hive terminal again, "show tables;" does not show any tables! I can see the tables under '/user/hive/warehouse' in HDFS.
What am I doing wrong? Can you please help me with this?
BalduZ is right. Set this in $HIVE_HOME/conf/hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/youruser/hive_metadata/metastore_db;create=true</value>
</property>
From then on you can run Hive from any directory. This will solve your problem.
I assume you are using the default configuration, so the problem is where you call hive from, since you need to call it from the same directory in order to see the tables you created in the previous Hive session.
For example, if you call hive from ~/test/hive and create some tables, and the next time you start hive from ~/test, you will not see the tables you created earlier. The easiest solution is to always start hive from the same directory.
However, a better solution would be to configure Hive to use a database like MySQL as its metastore. You can find how to do this here.
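For reference, a minimal hive-site.xml sketch for a MySQL-backed metastore (the host, database name, and credentials are illustrative, and the MySQL JDBC driver jar must be on Hive's classpath):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>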
