SAS macro code to Pig/Hive - Hadoop

I am working on converting SAS programs to Hadoop, i.e. Pig or Hive, and I am having trouble converting the SAS macro code into something in Hive. Is there any equivalent, since I have already read that Hive does not support stored procedures? I need to write a Hive script with macro-like functionality for defining variables and using them in the script.

I figured out a way to write the macro code as if...else-style logic within Hive itself (a rough sketch of one way to do this follows). Thanks, everyone, for all the help! I know the question was not put very well, but I will learn over time.
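A minimal sketch of that idea, assuming the script is driven from the shell with --hivevar substitution and Hive's built-in IF() conditional; the table and column names here are hypothetical:

#!/bin/bash
# Hypothetical example: resolve "macro variables" via --hivevar substitution
# and branch with IF() inside the query instead of SAS %IF macro logic.
cat > report.hql <<'HQL'
SELECT id,
       IF('${hivevar:region}' = 'EU', amount_eur, amount_usd) AS amount
FROM   sales
WHERE  load_date = '${hivevar:run_date}';
HQL

hive --hivevar run_date=2015-01-31 --hivevar region=EU -f report.hql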

Related

How to test Hive CRUD queries from Shell scripting

I am creating a shell script which should execute basic Hive queries and assert the results against expected values.
Where should I start with the shell scripting?
Thanks in advance.
I have found an answer.
One thing we can do is create an .hql file containing the basic queries to be tested and trigger that file through beeline (which is what I was using) from a bash script; a sketch follows.
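A rough sketch of that setup; the JDBC URL, credentials, table name, and expected value are assumptions, not from my actual script:

#!/bin/bash
# Run a test query through beeline and assert on the result.
# For a whole file of queries you can use: beeline -u <url> -f test.hql
EXPECTED="100"

ACTUAL=$(beeline -u "jdbc:hive2://localhost:10000/default" \
                 --silent=true --showHeader=false --outputformat=tsv2 \
                 -e "SELECT COUNT(*) FROM test_table;")

if [ "$ACTUAL" = "$EXPECTED" ]; then
    echo "PASS: row count matches"
else
    echo "FAIL: expected $EXPECTED, got $ACTUAL"
    exit 1
fi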

Set a variable to a query's result in Hive from Hue

I am currently using Hue and doing all my work through its Hive editor, and now I want to store the result of a query inside a variable.
I know that hiveconf does not support this. I have seen people use the Hive CLI or a shell script to achieve it, but I don't know how to write such a shell script or how to make it communicate with Hue or HDFS. I would prefer to use a variable, if possible, instead of storing the value in a table. Would someone give me some advice?
May I know if I can do it through Oozie workflows as well? Thanks.
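For reference, the CLI/shell-script approach mentioned above usually looks something like this (the query and table names are hypothetical, and it has to run outside the Hue editor, e.g. over SSH or from an Oozie shell action):

#!/bin/bash
# Capture a query result into a shell variable with the Hive CLI in silent mode.
MAX_DT=$(hive -S -e "SELECT MAX(load_date) FROM sales;")

# Reuse the captured value in a second query; the shell expands $MAX_DT
# before the statement reaches Hive.
hive -S -e "SELECT * FROM sales WHERE load_date = '${MAX_DT}';"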

MapReduce code generated by Hive

Where does Apache HiveQL store the Map/Reduce code it generates?
I believe Hive doesn't really generate Map/Reduce code in the sense of something you could read the way you would hand-written Java; the query is interpreted by the Hive query planner into a plan of stages.
If you want to get an idea of what kind of operations your Hive queries generate, you can prefix them with EXPLAIN and you will see the abstract syntax tree, the dependency graph, and the plan of each stage. More information is in the Hive documentation for EXPLAIN.
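For example (the query and table are hypothetical):

hive -e "EXPLAIN SELECT dept, COUNT(*) FROM employees GROUP BY dept;"
# Use EXPLAIN EXTENDED for more detail about each stage.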
If you really want to see some Map/Reduce code, you could try YSmart, which translates HiveQL statements into working Java Map/Reduce code. I haven't used it personally, but I know people who have, and they said good things about it.
It seems that Hive goes through this method on every query execution:
http://hive.apache.org/docs/r0.9.0/api/org/apache/hadoop/hive/ql/exec/Task.html#execute(org.apache.hadoop.hive.ql.DriverContext)

How to pump data to a txt file using Oracle Data Pump?

All my hope is with you.
I need to export a huge table (900 columns, 1,000,000 rows) into a plain-text ANSI file.
UTL_FILE takes a lot of time and is not suitable for this task.
I am trying to use Oracle Data Pump, but I cannot get a text file with ANSI characters in it (only binary garbage like 2TTЁ©QRўҐEJЉ•).
Can anybody advise me?
Thank you in advance.
Oracle Data Pump can only export in its proprietary binary format.
If you want to export data to text you have only a few options:
A PL/SQL or Java (stored) procedure that writes a file using UTL_FILE or the equivalent Java API.
A program running outside the database that writes to a file. Use whichever language you're comfortable with.
Pro*C might be a good choice as it is apparently much faster than the UTL_FILE approach, see http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:459020243348
Use a special SQL script and run it in SQL*Plus using spooling (a minimal spool sketch follows below). This is the "SQL Unloader" approach, see http://www.orafaq.com/wiki/SQL*Loader_FAQ#Is_there_a_SQL.2AUnloader_to_download_data_to_a_flat_file.3F
Googling "SQL Unloader" comes up with a few ready-made solutions that you might be able to use directly or modify for your needs.
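A minimal sketch of the spooling approach; the connection string, table, and delimiter are assumptions, and the SET options keep headers, feedback, and padding out of the spool file:

#!/bin/bash
# Unload a table to a delimited text file via SQL*Plus spooling.
sqlplus -s scott/tiger@ORCL <<'SQL'
SET FEEDBACK OFF HEADING OFF PAGESIZE 0 TRIMSPOOL ON LINESIZE 32767
SPOOL /tmp/big_table.txt
-- For 900 columns you would generate this SELECT list, e.g. from user_tab_columns.
SELECT col1 || ';' || col2 || ';' || col3 FROM big_table;
SPOOL OFF
EXIT
SQL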

Can we run queries from a custom UDF in Hive?

Guys, I am a newbie to Hive and have some doubts about it.
Normally we write a custom UDF in Hive for a particular set of columns (consider the UDF is in Java), meaning it performs some operation on those columns.
What I am wondering is: can we write a UDF that takes a particular column as input to some query, and have the UDF return that query so it executes on the Hive CLI with the column as its input?
Can we do this? If yes, please advise.
Thanks, and sorry for my bad English.
This is not possible out of the box: by the time a Hive query is running, the execution plan has already been built. What you suggest amounts to dynamically changing that plan while it runs, which is hard not only because the plan is already fixed, but also because the Hadoop MapReduce jobs are already running.
What you can do is have your initial Hive query write new Hive queries out to a file, then have some sort of bash/perl/python script go through that file and pass each generated query back to the CLI; a sketch of that two-pass approach is below.
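A rough sketch of the two-pass approach (table and column names are hypothetical):

#!/bin/bash
# Pass 1: the initial query emits one complete Hive statement per row.
hive -S -e \
  'SELECT CONCAT("SELECT COUNT(*) FROM logs WHERE dt = \"", dt, "\";")
   FROM   distinct_dates;' > generated_queries.hql

# Pass 2: feed each generated statement back to the CLI.
while read -r query; do
    hive -S -e "$query"
done < generated_queries.hql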
