Why is the result incorrect when I use Spark SQL to query Elasticsearch?

I'm using Spark SQL to query Elasticsearch. The program runs successfully, but the query result is incorrect. The SQL is:
select count(1) from myTableName;
Every time I run the program, the result is different. How can I guarantee that the result is the same each time?

Related

Querying an Oracle database from a Spring Boot application gets no response

A SQL statement like "SELECT COUNT(1) FROM tablename..." was executed by my Spring Boot application with MyBatis, but there was no response for a long time.
But when I executed the same SQL directly in Oracle, it worked and returned a result.
By executing this SQL:
select event,count(*) from v$session_wait group by event order by 2 desc;
I found the 'latch: cache buffers chains' event.
I asked my friend, and he told me that I could execute analyze table tablename compute statistics.
After executing that SQL, the problem was resolved: the select count SQL executed by the Spring Boot application worked!
I'm puzzled: why did the analyze statement work?
Thank you.

Why doesn't the FINAL modifier trigger ClickHouse's merge procedure?

There's a ReplacingMergeTree table in ClickHouse. When executing select count(1) from tbl, it yields 71961920. Then I executed select count(1) from tbl FINAL, trying to trigger the merge procedure as described in the official documentation.
But it turns out that after running the above SQL with FINAL, the SQL without FINAL still yields the "wrong" result, as if the data had not been merged at all. Could anyone help explain? Thanks~
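SELECT ... FINAL only applies the merge logic while reading, to build that query's result. It does not rewrite the data parts on disk, so later queries without FINAL still see the unmerged rows. To force ClickHouse to actually merge the parts, use the OPTIMIZE statement (for example, OPTIMIZE TABLE tbl FINAL):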
https://clickhouse.com/docs/en/sql-reference/statements/optimize/
But keep this in mind:
Although you can run an unscheduled merge using the OPTIMIZE query, do not count on using it, because the OPTIMIZE query will read and write a large amount of data. - ClickHouse ReplacingMergeTree
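To make the difference between FINAL and OPTIMIZE concrete, here is a toy Java model of ReplacingMergeTree deduplication. It is not ClickHouse code and not a ClickHouse API. The class, the in-memory "parts" and the method names are hypothetical, and the model ignores partitions and background merges. It only shows that a read-time FINAL computes a deduplicated answer without changing what is stored, whereas an OPTIMIZE-style merge rewrites the stored rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of ReplacingMergeTree semantics; NOT ClickHouse code (hypothetical names).
public class ReplacingMergeTreeModel {

    // One stored row: sorting key, version column, payload.
    static final class Row {
        final long key;
        final long version;
        final String value;

        Row(long key, long version, String value) {
            this.key = key;
            this.version = version;
            this.value = value;
        }
    }

    // Each INSERT creates a separate data part; duplicates across parts stay until a merge.
    private final List<List<Row>> parts = new ArrayList<>();

    void insert(List<Row> batch) {
        parts.add(new ArrayList<>(batch));
    }

    // Like "SELECT count(1) FROM tbl": counts every stored row, duplicates included.
    long count() {
        long n = 0;
        for (List<Row> part : parts) n += part.size();
        return n;
    }

    // Like "SELECT count(1) FROM tbl FINAL": deduplicates while reading; parts stay untouched.
    long countFinal() {
        return deduplicate().size();
    }

    // Like "OPTIMIZE TABLE tbl FINAL": rewrites all parts into one deduplicated part.
    void optimizeFinal() {
        List<Row> merged = deduplicate();
        parts.clear();
        parts.add(merged);
    }

    // Per key, keep the row with the highest version; on a tie, the later insert wins.
    private List<Row> deduplicate() {
        Map<Long, Row> latest = new LinkedHashMap<>();
        for (List<Row> part : parts) {
            for (Row r : part) {
                Row prev = latest.get(r.key);
                if (prev == null || r.version >= prev.version) latest.put(r.key, r);
            }
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        ReplacingMergeTreeModel tbl = new ReplacingMergeTreeModel();
        tbl.insert(List.of(new Row(1, 1, "a"), new Row(2, 1, "b")));
        tbl.insert(List.of(new Row(1, 2, "a2"))); // newer version of key 1, in a second part
        System.out.println(tbl.count());      // counts all stored rows
        System.out.println(tbl.countFinal()); // deduplicated at read time
        System.out.println(tbl.count());      // unchanged: FINAL rewrote nothing
        tbl.optimizeFinal();
        System.out.println(tbl.count());      // now reflects the merged part
    }
}
```

Running main prints 3, 2, 3 and then 2. This mirrors the behaviour described in the question: the FINAL count is correct, but the plain count changes only after the parts are actually merged.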

T-SQL: why is running multiple SQL statements in one batch slower without GO?

I have come across a very interesting problem (at least for me).
When I run the following SQL:
SELECT count(*) AS [count]
FROM [dbo].[contract_v] AS [contract_v]
WHERE 1 = 0;
SELECT *
FROM [dbo].[contract] AS [contract]
LEFT JOIN ([dbo].[contract_accepted_garbage_type] AS [garbageTypes->contract_accepted_garbage_type]
INNER JOIN [dbo].[garbage_type] AS [garbageTypes] ON [garbageTypes].[id] = [garbageTypes->contract_accepted_garbage_type].[garbage_type_id])
ON [contract].[id] = [garbageTypes->contract_accepted_garbage_type].[contract_id]
WHERE [contract].[id] IN (125018);
Execution takes 21s.
However, when I add a GO statement as follows:
SELECT count(*) AS [count]
FROM [dbo].[contract_v] AS [contract_v]
WHERE 1 = 0;
GO
SELECT *
FROM [dbo].[contract] AS [contract]
LEFT JOIN ([dbo].[contract_accepted_garbage_type] AS [garbageTypes->contract_accepted_garbage_type]
INNER JOIN [dbo].[garbage_type] AS [garbageTypes] ON [garbageTypes].[id] = [garbageTypes->contract_accepted_garbage_type].[garbage_type_id])
ON [contract].[id] = [garbageTypes->contract_accepted_garbage_type].[contract_id]
WHERE [contract].[id] IN (125018);
It takes only 2s.
The view used in the first SQL statement is based on the table queried in the second statement.
Could you please explain this behaviour? I know that the GO statement makes the database create a separate execution plan for every batch. I have checked the execution plans, and the actual steps are identical.
Thank you!
The GO command separates execution batches. If the underlying tables are the same in both queries and they are executed in the same batch, both queries have to be executed within the same transaction context. This ensures that the underlying data in both tables is the same during both executions.
If you use separate batches (a GO statement in between), you cannot guarantee that the data will be consistent, in that rows could theoretically be modified between executions.
If you don't care about the chance of the data changing between queries, then by all means use GO for performance. If you do care, consider it a dangerous move.
SQL Server applications can send multiple Transact-SQL statements to an instance of SQL Server for execution as a batch. The statements in the batch are then compiled into a single execution plan. Programmers executing ad hoc statements in the SQL Server utilities, or building scripts of Transact-SQL statements to run through the SQL Server utilities, use GO to signal the end of a batch.
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/sql-server-utilities-statements-go?view=sql-server-ver15
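GO is recognized by the SQL Server utilities (sqlcmd, SSMS) rather than by the server. A program that sends a script through a driver such as JDBC therefore has to split the script into batches itself. The following Java sketch shows one simple way to do that. The class name and the line-based rule are assumptions, and the sketch does not handle GO appearing inside block comments or string literals that span lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class GoBatchSplitter {

    // A line holding only GO (any case), optionally followed by a repeat count, ends a batch.
    // The count is accepted but ignored here; sqlcmd would run the batch that many times.
    private static final Pattern GO_LINE =
            Pattern.compile("^\\s*GO(\\s+\\d+)?\\s*$", Pattern.CASE_INSENSITIVE);

    static List<String> splitBatches(String script) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : script.split("\\R")) {
            if (GO_LINE.matcher(line).matches()) {
                flush(batches, current);
            } else {
                current.append(line).append('\n');
            }
        }
        flush(batches, current);
        return batches;
    }

    // Adds the collected text as a batch (unless it is blank) and clears the buffer.
    private static void flush(List<String> batches, StringBuilder current) {
        String batch = current.toString().trim();
        if (!batch.isEmpty()) batches.add(batch);
        current.setLength(0);
    }

    public static void main(String[] args) {
        String script = String.join("\n",
                "SELECT count(*) AS [count] FROM [dbo].[contract_v] WHERE 1 = 0;",
                "GO",
                "SELECT * FROM [dbo].[contract] WHERE [contract].[id] IN (125018);");
        for (String batch : splitBatches(script)) {
            // With JDBC, each batch would be sent in its own statement.execute(batch) call.
            System.out.println("--- batch ---");
            System.out.println(batch);
        }
    }
}
```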

How can I run Hive Explain command from java code?

I want to run the Hive and Impala EXPLAIN and COMPUTE STATS commands from Java code, so that I can use the collected information for my analysis. If anyone has any idea, please help.
You can run them like any other JDBC query against Impala.
The compute stats query for a table called temp would be "compute stats temp", and you can pass this as the argument to the JDBC statement.execute method.
Similarly, to explain a query, say "select count(*) from temp", the string to pass to statement.execute is "explain select count(*) from temp".
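Here is a minimal Java sketch of that approach. It assumes an Impala JDBC driver is on the classpath. The connection URL (host, port, auth setting) is a hypothetical placeholder. Note that compute stats is Impala syntax; Hive uses ANALYZE TABLE ... COMPUTE STATISTICS instead. Statement.execute returns true when the statement produces a result set, and that is how the plan or stats summary rows are read back.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class ImpalaStatsAndExplain {

    // Prefixes a query with EXPLAIN, as described above.
    static String explainOf(String query) {
        return "explain " + query;
    }

    // Runs any statement and collects every returned row as a tab-separated string.
    static List<String> runAndCollect(Connection conn, String sql) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Statement stmt = conn.createStatement()) {
            // execute() returns true when the statement produced a ResultSet.
            if (stmt.execute(sql)) {
                try (ResultSet rs = stmt.getResultSet()) {
                    ResultSetMetaData md = rs.getMetaData();
                    while (rs.next()) {
                        StringBuilder row = new StringBuilder();
                        for (int i = 1; i <= md.getColumnCount(); i++) {
                            if (i > 1) row.append('\t');
                            row.append(rs.getString(i));
                        }
                        rows.add(row.toString());
                    }
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical URL: replace host, port and auth settings with your own.
        String url = "jdbc:hive2://impala-host:21050/default;auth=noSasl";
        try (Connection conn = DriverManager.getConnection(url)) {
            runAndCollect(conn, "compute stats temp").forEach(System.out::println);
            runAndCollect(conn, explainOf("select count(*) from temp")).forEach(System.out::println);
        }
    }
}
```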

How can I see the SQL execution plan in Oracle?

I'm learning about database indexes right now, and I'm trying to understand the efficiency of using them.
I'd like to see whether a specific query uses an index.
I want to actually see the difference between executing the query using an index and without using the index (so I want to see the execution plan for my query).
I am using SQL*Plus.
How do I see the execution plan, and where in it can I find the information telling me whether my index was used?
Try using this code to first explain and then see the plan:
Explain the plan:
explain plan
for
select * from table_name where ...;
See the plan:
select * from table(dbms_xplan.display);
Edit: Removed the brackets
The estimated SQL execution plan
The estimated execution plan is generated by the Optimizer without executing the SQL query. You can generate the estimated execution plan from any SQL client using EXPLAIN PLAN FOR or you can use Oracle SQL Developer for this task.
EXPLAIN PLAN FOR
When using Oracle, if you prepend the EXPLAIN PLAN FOR command to a given SQL query, the database will store the estimated execution plan in the associated PLAN_TABLE:
EXPLAIN PLAN FOR
SELECT p.id
FROM post p
WHERE EXISTS (
SELECT 1
FROM post_comment pc
WHERE
pc.post_id = p.id AND
pc.review = 'Bingo'
)
ORDER BY p.title
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY;
To view the estimated execution plan, you need to use DBMS_XPLAN.DISPLAY, as illustrated in the following example:
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY (FORMAT=>'ALL +OUTLINE'));
The ALL +OUTLINE formatting option gives you more details about the estimated execution plan than the default formatting option does.
Oracle SQL Developer
If you have installed SQL Developer, you can easily get the estimated execution plan for any SQL query without having to prepend the EXPLAIN PLAN FOR command.
The actual SQL execution plan
The actual SQL execution plan is generated by the Optimizer when running the SQL query. So, unlike the estimated Execution Plan, you need to execute the SQL query in order to get its actual execution plan.
The actual plan should not differ significantly from the estimated one, as long as the table statistics have been properly collected by the underlying relational database.
GATHER_PLAN_STATISTICS query hint
To instruct Oracle to store the actual execution plan for a given SQL query, you can use the GATHER_PLAN_STATISTICS query hint:
SELECT /*+ GATHER_PLAN_STATISTICS */
p.id
FROM post p
WHERE EXISTS (
SELECT 1
FROM post_comment pc
WHERE
pc.post_id = p.id AND
pc.review = 'Bingo'
)
ORDER BY p.title
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY;
To visualize the actual execution plan, you can use DBMS_XPLAN.DISPLAY_CURSOR:
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST ALL +OUTLINE'));
Enable STATISTICS for all queries
If you want to get the execution plans for all queries executed within a given session, you can set the STATISTICS_LEVEL session parameter to ALL:
ALTER SESSION SET STATISTICS_LEVEL='ALL';
This has the same effect as adding the GATHER_PLAN_STATISTICS query hint to every query executed. So, just as with the GATHER_PLAN_STATISTICS query hint, you can use DBMS_XPLAN.DISPLAY_CURSOR to view the actual execution plan.
You should reset the STATISTICS_LEVEL setting to the default mode once you are done collecting the execution plans you were interested in. This is very important, especially if you are using connection pooling, and database connections get reused.
ALTER SESSION SET STATISTICS_LEVEL='TYPICAL';
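When you collect actual plans from application code over a pooled connection, a try/finally block is one way to make sure the session setting is always reset. This Java sketch assumes a plain java.sql.Connection to Oracle. It also assumes that nothing else runs on that session between the query and the DISPLAY_CURSOR call, because DISPLAY_CURSOR with no SQL_ID argument reports the last cursor the session executed. The class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class ActualPlanCollector {

    static final String ENABLE_STATS = "ALTER SESSION SET STATISTICS_LEVEL='ALL'";
    static final String RESET_STATS = "ALTER SESSION SET STATISTICS_LEVEL='TYPICAL'";
    static final String SHOW_LAST_PLAN =
            "SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST ALL +OUTLINE'))";

    // Runs the query with STATISTICS_LEVEL=ALL and returns the lines of its actual plan.
    static List<String> actualPlan(Connection conn, String query) throws SQLException {
        List<String> plan = new ArrayList<>();
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(ENABLE_STATS);
            try {
                // Fetch every row so the row-source statistics describe the complete execution.
                try (ResultSet rs = stmt.executeQuery(query)) {
                    while (rs.next()) {
                        // rows are discarded; only the plan statistics matter here
                    }
                }
                // With no SQL_ID argument, DISPLAY_CURSOR reports the session's last executed cursor.
                try (ResultSet rs = stmt.executeQuery(SHOW_LAST_PLAN)) {
                    while (rs.next()) {
                        plan.add(rs.getString(1));
                    }
                }
            } finally {
                // Always restore the default, because pooled connections are reused by other code.
                stmt.execute(RESET_STATS);
            }
        }
        return plan;
    }
}
```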
Take a look at EXPLAIN PLAN. EXPLAIN works across many database types.
For SQL*Plus specifically, see its AUTOTRACE facility.
Try this:
http://www.dba-oracle.com/t_explain_plan.htm
The execution plan will mention the index whenever it is used. Just read through the execution plan.
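As a rough illustration of reading the plan programmatically, this Java sketch runs EXPLAIN PLAN FOR and reads the rows of DBMS_XPLAN.DISPLAY. It then checks whether any INDEX operation line (for example INDEX RANGE SCAN or INDEX UNIQUE SCAN) mentions a given index name. The class and method names are hypothetical, and a query sent through JDBC must not end with a semicolon. The name check is a plain substring match, so an index whose name contains another index's name can produce a false positive.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class IndexUsageCheck {

    // Stores the estimated plan in PLAN_TABLE, then reads it back via DBMS_XPLAN.DISPLAY.
    static List<String> estimatedPlan(Connection conn, String query) throws SQLException {
        List<String> lines = new ArrayList<>();
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("EXPLAIN PLAN FOR " + query);
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)")) {
                while (rs.next()) {
                    lines.add(rs.getString(1));
                }
            }
        }
        return lines;
    }

    // True if any plan line describes an INDEX operation and names the given index.
    static boolean usesIndex(List<String> planLines, String indexName) {
        String wanted = indexName.toUpperCase(Locale.ROOT);
        for (String line : planLines) {
            if (line == null) continue; // Oracle returns empty plan lines as NULL
            String upper = line.toUpperCase(Locale.ROOT);
            if (upper.contains("INDEX") && upper.contains(wanted)) return true;
        }
        return false;
    }
}
```

Comparing the result of usesIndex for the same query before and after creating (or dropping) the index is one way to see the difference the question asks about.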
