Set a variable to a query's result in Hive from Hue - hadoop

I am currently using Hue and doing all my work through its Hive editor, and now I want to store the result of a query in a variable.
I know that hiveconf does not support this. I have seen people use the Hive CLI or a shell script to achieve it, but I don't know how to write a shell script and make it communicate with Hue or HDFS. I would prefer using a variable, if possible, instead of a table to store the value. Would someone give me some advice?
May I know if I can do it through Oozie workflows as well? Thanks.
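For reference, the shell-script approach people usually mean looks something like the sketch below (hedged; some_table, other_table, and the columns are hypothetical names for illustration):
# Capture a scalar query result into a shell variable using the silent (-S) CLI mode,
# then feed it back into a second query via --hiveconf substitution.
RESULT=$(hive -S -e "SELECT max(id) FROM some_table")
hive --hiveconf my_val="${RESULT}" -e 'SELECT * FROM other_table WHERE id = ${hiveconf:my_val}'
In Oozie, the analogous pattern is a shell action with <capture-output/> whose output feeds a later Hive action, though wiring that up is beyond this sketch.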

Related

Create Hive table through Spark job

I am trying to create Hive tables as outputs of my Spark (version 1.5.1) job on a Hadoop cluster (BigInsights 4.1 distribution) and am facing permission issues. My guess is that Spark is using a default user (in this case 'yarn', not the job submitter's username) to create the tables and therefore fails to do so.
I tried to customize the hive-site.xml file to set an authenticated user that has permission to create Hive tables, but that didn't work.
I also tried to set the Hadoop user variable to an authenticated user, but it didn't work either.
I want to avoid saving text files and then creating Hive tables from them, in order to optimize performance and reduce the size of the outputs through ORC compression.
My questions are:
Is there any way to call the write function of the Spark DataFrame API with a specified user?
Is it possible to choose a username using Oozie's workflow file?
Does anyone have an alternative idea, or has anyone ever faced this problem?
Thanks.
Hatak!
Assuming df holds your data, you can write:
In Java:
df.write().saveAsTable("tableName");
You can use a different SaveMode, such as Overwrite or Append:
df.write().mode(SaveMode.Append).saveAsTable("tableName");
In Scala:
df.write.mode(SaveMode.Append).saveAsTable(tableName)
Many other options can be specified depending on the format you would like to save to: text, ORC (with buckets), JSON, and so on.
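On the original permission question, two commonly tried submit-time approaches are sketched below (hedged; whether they work depends on the cluster's security setup, and etl_user and com.example.MyJob are hypothetical):
# 1) On a non-Kerberos cluster, set the Hadoop user for the submitting process:
export HADOOP_USER_NAME=etl_user
spark-submit --class com.example.MyJob my-job.jar
# 2) With a superuser that has proxy privileges configured in core-site.xml,
#    impersonate the target user at submit time:
spark-submit --proxy-user etl_user --class com.example.MyJob my-job.jar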

Documentation of manually passing parameters ${parameter} inside a query

Hive documents setting variables via hiveconf:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
I know there is also a way of passing parameters using ${parameter} (not hiveconf), e.g.
select * from table_one where variable = ${parameter}
The Hive editor then prompts you to enter a value for parameter when you submit the query.
I can't find where Apache Hadoop documents this way of passing parameters. Is this way of passing parameters inherent to Hive or Oozie? If it is Oozie, why can it be used in the Hive editor?
This is a feature of Hue. There is a reference to it in Cloudera's documentation, at least for older versions; for example, the Hive Query Editor User Guide describes it:
PARAMETERIZATION Indicate that a dialog box should display to enter parameter values when a query containing the string $parametername is executed. Enabled by default.
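For comparison, the hiveconf substitution that Hive itself documents looks like this on the CLI (a sketch reusing table_one from the question; the value is supplied up front, so there is no prompt):
# Hive substitutes ${hiveconf:parameter} with the value passed on the command line.
hive --hiveconf parameter=42 -e 'SELECT * FROM table_one WHERE variable = ${hiveconf:parameter}'
A bare ${parameter} with no namespace is what triggers Hue's dialog box instead.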

SAS Macro code to Pig/Hive

I am working on converting SAS programs to Hadoop, i.e. Pig or Hive, and I am having trouble converting the SAS macro code to something in Hive. Is there any equivalent for it, given that I have already read that Hive does not support stored procedures? I need to write a Hive script with macro-like functionality to call variables and use them in the script.
I figured out a way to write the macro code as an if...else statement within Hive itself. Thanks, guys, for all the help! I know the question was not that well put, but I will learn over time.
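For anyone doing a similar conversion, a macro-like effect can often be approximated with hivevar substitution plus Hive's conditional functions; a minimal sketch (threshold, sales, and amount are hypothetical names):
# Pass a "macro variable" in from the shell and branch on it inside the query.
hive --hivevar threshold=100 -e \
  'SELECT id, IF(amount > ${hivevar:threshold}, "high", "low") AS bucket FROM sales'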

Show Hadoop files on HDFS created only on a specific day

I want to show the Hadoop files on HDFS under a specific folder that were created on a specific day. Is there a command/option to do this?
Thanks in advance,
Lin
As far as I know, the hadoop command won't support this.
You can write a script to achieve it, though that is not a clean implementation.
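For example, a rough sketch of such a script (note that hdfs dfs -ls shows modification time, not strictly creation time, and the path and date here are placeholders):
# Field 6 of the listing is the modification date (YYYY-MM-DD); field 8 is the path.
hdfs dfs -ls /path/to/folder | awk '$6 == "2016-03-01" {print $8}'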
My suggestions:
Organize your files in a way that is more convenient to use; in your case, a time-based partition would be better (see the sketch after the links below).
If you want to make data analysis easier, use an HDFS-based database such as Hive. Hive supports partitions and SQL-like queries and inserts.
More about Hive and Hive partitions:
https://hive.apache.org/
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables
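As an illustration of the partitioning suggestion (hypothetical table and date):
# One directory per day; a query on a single day only scans that partition.
hive -e 'CREATE TABLE logs (line STRING) PARTITIONED BY (dt STRING)'
hive -e "SELECT count(*) FROM logs WHERE dt = '2016-03-01'"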

Can we run queries from a custom UDF in Hive?

I am a newbie to Hive and have some doubts about it.
Normally we write a custom UDF in Hive for a particular set of columns (consider a UDF written in Java), meaning it performs some operation on those columns.
What I am wondering is: can we write a UDF that takes a particular column as input to some query, and return that query from the UDF so that it executes on the Hive CLI with the column as its input?
Can we do this? If yes, please advise.
Thanks, and sorry for my bad English.
This is not possible out of the box, because by the time the Hive query is running, a plan has already been built and is executing. What you suggest is dynamically changing that plan while it runs, which is hard not only because the plan is already built, but also because the Hadoop MapReduce jobs are already running.
What you can do is have your initial Hive query output new Hive queries to a file, then have some sort of bash/perl/python script go through it, formulate the new queries, and pass them to the CLI.
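A sketch of that two-step pattern (hedged; summary, detail, and id are hypothetical names):
# Step 1: have Hive emit one generated query per output row.
hive -S -e "SELECT concat('SELECT * FROM detail WHERE id = ', cast(id AS STRING), ';') FROM summary" > generated.hql
# Step 2: replay the generated queries through the CLI.
hive -f generated.hql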
