How can I run Hive Explain command from java code? - hadoop

I want to run Hive and Impala Explain and compute stats command from java code. So that I can use the collected information for my analysis purpose. If any one have any idea please help

You can run it as any other jdbc query against impala.
The compute stats query for a table called temp would be "compute stats temp" and you can pass this as an argument for the jdbc statement.execute
Similarly, to explain a query, say "select count( * ) from temp" the query to pass as an argument for statement.execute is "explain select count(*) from temp".

Related

Why use Sparksql to query elasticsesarch the result incorrect

I'm using sparkSql to query elasticsearch. The program runs successfully, but the query result incorrect. The SQL is:
select count(1) from myTableName;
Every time run the program the result is nonconformity. How can I guarantee the result is the same?

How to execute select query on oracle database using pi spark?

I have written a program using pyspark to connect to oracle database and fetch data. Below command works fine and returns the contents of the table:
sqlContext.read.format("jdbc")
.option("url","jdbc:oracle:thin:user/password#dbserver:port/dbname")
.option("dbtable","SCHEMA.TABLE")
.option("driver","oracle.jdbc.driver.OracleDriver")
.load().show()
Now I do not want to load the entire table data. I want to load selected records. Can I specify select query as part of this command? If yes how?
Note: I can use dataframe and execute select query on the top of it but I do not want to do it. Please help!!
You can use subquery in dbtable option
.option("dbtable", "(SELECT * FROM tableName) AS tmp where x = 1")
Here is similar question, but about MySQL
In general, the optimizer SHOULD be able to push down any relevant select and where elements so if you now do df.select("a","b","c").where("d<10") then in general this should be pushed down to oracle. You can check it by doing df.explain(true) on the final dataframe.

sqoop2 import very large postgreSQL table failed

I am trying to use sqoop transfer from cdh5 to import large postgreSQL table to HDFS. The whole table is about 15G.
First, I tried to import just use the basic information, by entering schema and table name, it didn't work. I always get GC overhead limit exceeded. I tried to change the JVM heap size on Cloudera manager configuration for Yarn and sqoop to maximum (4G), still no help.
Then, I am trying to use sqoop transfer SQL statement to transfer partly of the table, I added SQL statement in the field as the following:
select * from mytable where id>1000000 and id<2000000 ${CONDITIONS}
(partition column is id).
The statement is failed, actually any kind of statements with my own "where" condition were having the error: "GENERIC_JDBC_CONNECTOR_0002:Unable to execute the SQL statement"
Also I tried to use the boundary query, I can use "select min(id), 1000000 from mutable", and it worked, but I tried to use "select 1000000, 2000000 from mytable" to select data further ahead which caused the sqoop server crash and down.
Could someone help? How to add where condition? or how to use the boundary query. I have searched in many places, I didn't find any good document about how to write SQL statement with sqoop2. Also is that possible to use direct on sqoop2?
Thanks

How can I see the SQL execution plan in Oracle?

I'm learning about database indexes right now, and I'm trying to understand the efficiency of using them.
I'd like to see whether a specific query uses an index.
I want to actually see the difference between executing the query using an index and without using the index (so I want to see the execution plan for my query).
I am using sql+.
How do I see the execution plan and where can I found in it the information telling me whether my index was used or not?
Try using this code to first explain and then see the plan:
Explain the plan:
explain plan
for
select * from table_name where ...;
See the plan:
select * from table(dbms_xplan.display);
Edit: Removed the brackets
The estimated SQL execution plan
The estimated execution plan is generated by the Optimizer without executing the SQL query. You can generate the estimated execution plan from any SQL client using EXPLAIN PLAN FOR or you can use Oracle SQL Developer for this task.
EXPLAIN PLAN FOR
When using Oracle, if you prepend the EXPLAIN PLAN FOR command to a given SQL query, the database will store the estimated execution plan in the associated PLAN_TABLE:
EXPLAIN PLAN FOR
SELECT p.id
FROM post p
WHERE EXISTS (
SELECT 1
FROM post_comment pc
WHERE
pc.post_id = p.id AND
pc.review = 'Bingo'
)
ORDER BY p.title
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY
To view the estimated execution plan, you need to use DBMS_XPLAN.DISPLAY, as illustrated in the following example:
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY (FORMAT=>'ALL +OUTLINE'))
The ALL +OUTLINE formatting option allows you to get more details about the estimated execution plan than using the default formatting option.
Oracle SQL Developer
If you have installed SQL Developer, you can easily get the estimated execution plan for any SQL query without having to prepend the EXPLAIN PLAN FOR command:
##The actual SQL execution plan
The actual SQL execution plan is generated by the Optimizer when running the SQL query. So, unlike the estimated Execution Plan, you need to execute the SQL query in order to get its actual execution plan.
The actual plan should not differ significantly from the estimated one, as long as the table statistics have been properly collected by the underlying relational database.
GATHER_PLAN_STATISTICS query hint
To instruct Oracle to store the actual execution plan for a given SQL query, you can use the GATHER_PLAN_STATISTICS query hint:
SELECT /*+ GATHER_PLAN_STATISTICS */
p.id
FROM post p
WHERE EXISTS (
SELECT 1
FROM post_comment pc
WHERE
pc.post_id = p.id AND
pc.review = 'Bingo'
)
ORDER BY p.title
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY
To visualize the actual execution plan, you can use DBMS_XPLAN.DISPLAY_CURSOR:
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST ALL +OUTLINE'))
Enable STATISTICS for all queries
If you want to get the execution plans for all queries generated within a given session, you can set the STATISTICS_LEVEL session configuration to ALL:
ALTER SESSION SET STATISTICS_LEVEL='ALL'
This will have the same effect as setting the GATHER_PLAN_STATISTICS query hint on every execution query. So, just like with the GATHER_PLAN_STATISTICS query hint, you can use DBMS_XPLAN.DISPLAY_CURSOR to view the actual execution plan.
You should reset the STATISTICS_LEVEL setting to the default mode once you are done collecting the execution plans you were interested in. This is very important, especially if you are using connection pooling, and database connections get reused.
ALTER SESSION SET STATISTICS_LEVEL='TYPICAL'
Take a look at Explain Plan. EXPLAIN works across many db types.
For sqlPlus specifically, see sqlplus's AUTO TRACE facility.
Try this:
http://www.dba-oracle.com/t_explain_plan.htm
The execution plan will mention the index whenever it is used. Just read through the execution plan.

How does oracle execute an sql statement?

such as:
select country
from table1
inner join table2 on table1.id=table2.id
where table1.name='a' and table2.name='b'
group by country
after the parse, which part will be executed first?
It looks like you want to know the execution plan chosen by Oracle. You can get that ouput from Oracle itself:
set serveroutput off
< your query with hint "/*+ gather_plan_statistics */" inserted after SELECT >
select * from table(dbms_xplan.display_cursor(null, null, 'last allstats'));
See here for an explanation how to read a query plan: http://download.oracle.com/docs/cd/E11882_01/server.112/e16638/ex_plan.htm#i16971
Be aware however that the choice of a query plan is not fixed. Oracle tries to find the currently best query plan, based on available statistics data.
There are plenty of places you can find the order in which SQL is executed:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
But note that this is the "theoretical" order - SQL engines are allowed to perform the operations in other orders, provided that the end result appears to have been produced by using the above order.
If you install the free tool SQL*Developer from Oracle, then you can click a button to get the explain plan.
A quick explanation is at http://www.seeingwithc.org/sqltuning.html

Resources