Documentation of manually passing parameters ${parameter} inside query - hadoop

Hive documents setting variables with hiveconf:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
I know there is also a way of passing parameters using ${parameter} (not hiveconf), e.g.
select * from table_one where variable = ${parameter}
The Hive editor then prompts you to enter the value for parameter when you submit the query.
I can't find where Apache Hadoop documents this way of passing parameters. Is it inherent to Hive or to Oozie? And if it is Oozie, why can it be used in the Hive editor?

This is a feature of Hue. There is a reference to this feature in Cloudera documentation, at least for older versions. For example, the Hive Query Editor User Guide describes it.
PARAMETERIZATION: Indicate that a dialog box should display to enter parameter values when a query containing the string $parametername is executed. Enabled by default.
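For contrast, here is a minimal sketch of the two mechanisms side by side (table_one and variable come from the question above; the dt variable is made up):

-- Hive variable substitution, as documented in the Hive wiki
set hiveconf:dt=2020-01-01;
select * from table_one where variable = '${hiveconf:dt}';

-- Hue parameterization: Hue prompts for the value at submit time
select * from table_one where variable = ${parameter};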

Import Sqoop column names issue

I have a question on Kylo and NiFi.
The version of Kylo used is 0.10.1.
The version of NiFi used is 1.6.0.
When we create a feed for database ingest (using a database as the source), there is no provision in the Additional Options step to enter the source table's column names.
However, on the NiFi side we use an ImportSqoop processor, which has a mandatory field called Source Fields that requires the columns to be entered, separated by commas. If this is not done, we get an error:
ERROR tool.ImportTool: Imported Failed: We found column without column name. Please verify that you've entered all column names in your query if using free form query import (consider adding clause AS if you're using column transformation)
For our requirement, we want ImportSqoop to pick up all the columns of the table for this property automatically, without manual intervention on the NiFi side. Is there an option to include all columns of a database table automatically in the background? Or is there some other way of supplying this value in an UpdateAttribute processor?
As mentioned in the comments, ImportSqoop is not a normal NiFi processor. This does not have to be a problem, but it means it is probably not possible to troubleshoot the issue without involving its creator.
Also, though I am still debating whether driving Sqoop from NiFi is an antipattern, it is certainly not necessary.
Please look into the standard options first:
The standard way to get data into NiFi from tables is with standard processors such as ExecuteSQL.
If that doesn't suffice, the standard way to use Sqoop (a batch tool) is with a batch scheduler, such as Oozie or Airflow.
This thread may resolve remaining doubts on point 1: http://apache-nifi.1125220.n5.nabble.com/Sqoop-Support-in-NIFI-td5653.html
Yes, the Teradata Kylo ImportSqoop is not a standard NiFi processor, but it is there for us to use. Looking deeper at the processor's properties, we can see that SOURCE_TABLE_FIELDS is indeed required. You then have the option to hard-code the list of columns manually or to set up a method that generates the list dynamically.
The typical way to provide the list of fields is to query the table's metadata. The particular solution depends on where the source and target tables are set up and how the mapping between source and target columns is defined. For example, one could use the databases' INFORMATION_SCHEMA tables and match columns by name. Because Sqoop's output should match the source, one has to find a way to generate the column list and provide it to the ImportSqoop processor. A better approach yet could involve separate metadata that stores the source and target information along with mappings and possible transforms (many tools are available for that purpose, for example WhereScape).
More specifically, I would use LookupAttribute paired with a database or scripted lookup service to retrieve the column list from some metadata provider, as sketched below.
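As a rough sketch of that idea, assuming a MySQL source (source_db and source_table are placeholders), the value for Source Fields could be generated with a metadata query like:

SELECT GROUP_CONCAT(COLUMN_NAME ORDER BY ORDINAL_POSITION SEPARATOR ',') AS source_fields
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'source_db'    -- placeholder: source database
  AND TABLE_NAME = 'source_table';  -- placeholder: source table

The resulting string can then be placed on the flow file as an attribute and referenced from the ImportSqoop processor's Source Fields property.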

How can I access a Hive variable within a Hive UDF without passing it as an argument

I want to access one or more Hive variables, set with set var=XXX, in my Hive UDF's evaluate() function/class.
As per this answer, I can pass these in using ${hiveconf:var}, but can I access them without passing them as arguments to the UDF?
If the above is not possible, I am open to any other means of accessing, from within the UDF, a specific set of properties that can be set externally.
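For reference, the pass-as-argument route from that answer looks roughly like this (my_udf, col, and some_table are hypothetical names; var and XXX come from the question):

-- set a session variable, then substitute it as a literal UDF argument
set var=XXX;
select my_udf(col, '${hiveconf:var}') from some_table;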

Example of how to set a Hive property from within a Hive query

I need a quick example of how to change a property in Hive using a query. For instance, I would like to change the property 'mapred.reduce.tasks'; how do I perform this change from within a query?
I'm training myself for the HDPCD exam, and one of the goals in the exam is 'Set a Hadoop or Hive configuration property from within a Hive query', so I suppose it's not the same as running something like this in the Hive console:
set mapred.reduce.tasks=2;
To change a Hadoop or Hive configuration variable, you use set in the Hive query.
The change applies only to that query session.
set -v prints all Hadoop and Hive configuration variables.
SET mapred.reduce.tasks=XX;    -- in Hadoop 1.x
SET mapreduce.job.reduces=XX;  -- in Hadoop 2.x (YARN)
reset in a query resets the configuration to the default values.
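Putting the answer together, a minimal in-query sketch (some_table and group_col are made-up names):

-- applies only to the current session
set mapreduce.job.reduces=2;
select group_col, count(*) from some_table group by group_col;
-- restore the configuration to the default values
reset;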

Set a variable to a query's result in Hive from Hue

I am currently using Hue and doing all my work through its Hive editor, and now I want to store the result of a query in a variable.
I know that hiveconf does not support this. I have seen people achieve it with the Hive CLI or a shell script, but I don't know how to write a shell script and use it to communicate with Hue or HDFS. I would prefer using a variable, if possible, instead of a table to store the value. Would someone give me some advice?
May I also know whether I can do this through Oozie workflows? Thanks.

Dynamically calculating an Oozie parameter (number of reducers for an MR action)

In my Oozie workflow I dynamically create a Hive table, say T1. This Hive action is then followed by a map-reduce action. I want to set the number-of-reducers property (mapred.reduce.tasks) equal to the number of distinct values of a field, say T1.group. Any ideas how to set the value of an Oozie parameter dynamically, and how to get the value from the Hive distinct-count action into that Oozie parameter?
I hope this can help:
1. Create the Hive table as you are doing already.
2. Execute another Hive query that calculates the number of distinct values for the column and writes it to a file in HDFS (see the sketch after these steps).
3. Create a Shell action that reads the file and echoes the value in the form key=value. Enable capture-output for the Shell action.
4. In your MR action, access the Shell action's data using the Oozie EL functions, e.g. ${wf:actionData('ShellAction')['key']}, and pass that value to mapred.reduce.tasks in the configuration tag of the MR action.
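A minimal sketch of the Hive query in step 2, assuming plain-text output to a placeholder HDFS directory (`group` is backquoted because it is a reserved word in Hive):

-- write the number of distinct values of T1.group to HDFS
insert overwrite directory '/tmp/reducer_count'
select count(distinct `group`) from T1;

The Shell action can then read the file under that directory and echo it as key=value so that the EL expression in step 4 resolves.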
