JPA self-join performance issue

I have implemented a JPA self-join relationship in one of my projects. We have more than 200k records in this table. I am also using the orphan-removal annotation. Now when my app tries to delete orphaned entities that have this self-join, it takes too long to delete the records in the DB. Our DBA checked the indexes and everything else.
Can someone please explain what could be causing this performance issue? Looking at the query, I believe it is scanning the whole table to verify the parent-child relation before deleting those records.

For huge tables and self-joins, the simple join queries generated by ORM tools are usually not performant. To confirm this, log the SQL produced by the ORM and run it through EXPLAIN PLAN FOR or EXPLAIN (depending on whether the database is Oracle or MySQL), then analyse the output. That should give you a fair idea of what is wrong. You might need to write an alternate SQL query by hand, tune it, and then use the native-query feature of JPA to get the query result efficiently.
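As a rough illustration of that last step, here is a minimal sketch of running a hand-tuned delete as a JPA native query. The table (CATEGORY), its columns, and the persistence-unit name are assumptions, not taken from the question:

import javax.persistence.EntityManager;
import javax.persistence.EntityTransaction;
import javax.persistence.Persistence;

public class OrphanCleanup {
    public static void main(String[] args) {
        // "my-unit" is a placeholder persistence-unit name.
        EntityManager em = Persistence.createEntityManagerFactory("my-unit")
                                      .createEntityManager();
        EntityTransaction tx = em.getTransaction();
        tx.begin();
        // Hand-tuned SQL: delete rows whose parent row no longer exists.
        // Check with EXPLAIN that PARENT_ID is indexed before relying on this.
        // Note: subquery-on-target syntax varies by database (MySQL restricts it).
        int deleted = em.createNativeQuery(
                "DELETE FROM CATEGORY c "
              + "WHERE c.PARENT_ID IS NOT NULL "
              + "AND NOT EXISTS (SELECT 1 FROM CATEGORY p WHERE p.ID = c.PARENT_ID)")
              .executeUpdate();
        tx.commit();
        System.out.println("Deleted " + deleted + " orphaned rows");
        em.close();
    }
}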

Related

Truncate all database tables with Hibernate and Spring Boot

I need to truncate all database tables after each test. Is there a way to do so, or at least a database-agnostic way to get all table names so that they can be truncated?
Any other alternatives are welcome. But keep in mind @Transactional and @Rollback will not help, as I'm dealing with integration tests which fire HTTP requests at the server.
I think you're going to struggle to truncate tables in a simple, database-agnostic way. For example, what do you do about foreign key constraints? Some DBs will let you just truncate the tables in the correct order, leaving you with the problem of how to define that order. But if I recall correctly, some won't let you truncate tables with foreign key constraints at all, even if empty. Then you need to use some DB-specific DDL to disable the constraints, or worse, drop and recreate them.
You are also ruling out parallelising your integration tests if you take this approach.
I've always found a better approach is to make each test clear up just the data that it created. For example, for your create API, you may be able to register a listener that records the IDs of all created entities in your test code; then on teardown you can iterate this list of IDs in reverse, calling your delete API. The downside of this approach is that you may need to implement APIs that your application doesn't actually need, just to support the tests. However, these can then be disabled by a flag on deployment to production.
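A minimal sketch of that teardown pattern, assuming JUnit 4 and a hypothetical TestClient wrapper around the application's HTTP create/delete APIs:

import java.util.ArrayDeque;
import java.util.Deque;
import org.junit.After;
import org.junit.Test;

public class CreateApiIT {
    private final TestClient client = new TestClient();        // hypothetical HTTP helper
    private final Deque<Long> createdIds = new ArrayDeque<>(); // LIFO: delete in reverse

    @Test
    public void createsAWidget() {
        long id = client.createWidget("example"); // fires a real HTTP request
        createdIds.push(id);                      // record for teardown
        // ... assertions against the server state ...
    }

    @After
    public void tearDown() {
        // Delete in reverse creation order so dependent rows go first.
        while (!createdIds.isEmpty()) {
            client.deleteWidget(createdIds.pop());
        }
    }
}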
I read this property from a text file and add the two Hibernate properties in the if statement below, which rebuilds the database every time I run my project. Perhaps this can help you?
// "ON".equals(...) avoids a NullPointerException when the property is absent.
if ("ON".equals(environment.getProperty("dbReset")))
{
    properties.put("hbm2ddl.auto", "create");
    properties.put("hibernate.hbm2ddl.auto", "create");
}

Immediate evaluation of CTE

I am trying to optimize a very long and complex Impala query which contains multiple CTEs. Each CTE is used multiple times. My expectation is that once a CTE is created, I should be able to direct Impala to reuse the results of that CTE in the main query as-is, instead of repeating the SCAN HDFS operation on the tables involved in the CTE. Is this possible? If yes, how?
I am using impalad version 2.1.1-cdh5 RELEASE (build 7901877736e29716147c4804b0841afc4ebc9037).
I do not think so. I believe the WITH clause does not create any permanent object; it is just there so you can avoid cluttering the namespace with new tables or views, and to make it easier to refactor large, complex queries by reordering and replacing their individual parts. The queries used in the WITH clause are good candidates to someday become views, or to be materialized as summary tables during the ETL process.
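To illustrate the summary-table route, a sketch over Impala's JDBC driver; the connection URL and the sales table are made up for the example:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MaterializeCte {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:impala://host:21050/default"); // hypothetical URL
             Statement stmt = conn.createStatement()) {
            // Evaluate the CTE body once during ETL and persist the result,
            // so the main query scans the small summary table instead of
            // re-scanning the base table for every CTE reference.
            stmt.execute("CREATE TABLE daily_totals AS "
                       + "SELECT id, SUM(amount) AS total FROM sales GROUP BY id");
        }
    }
}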
Is this possible?
The very purpose of a CTE is to reuse the results obtained from a preceding query (the one in the WITH clause) in a following query, say a SELECT. So I don't see a reason why it would not be possible.
Use EXPLAIN on your query to find out the actual SCAN HDFS details.
For more I/O-related insights, use PROFILE, as described in the official documentation: https://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_explain_plan.html#perf_profile
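A sketch of checking the plan over JDBC, again with a made-up connection URL and sales table; the point is to count how many SCAN HDFS nodes appear for the repeated CTE references:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExplainCte {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:impala://host:21050/default"); // hypothetical URL
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "EXPLAIN WITH t AS "
                   + "(SELECT id, SUM(amount) AS total FROM sales GROUP BY id) "
                   + "SELECT a.id FROM t a JOIN t b ON a.id = b.id")) {
            while (rs.next()) {
                // Each plan line comes back as a row; look for SCAN HDFS nodes.
                System.out.println(rs.getString(1));
            }
        }
    }
}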

Can full information about an Oracle schema data-model be selected atomically?

I'm instantiating a client-side representation of an Oracle Schema data-model in custom Table/Column/Constraint/Index data structures, in C/C++ using OCI. For this, I'm selecting from:
all_tables
all_tab_comments
all_col_comments
all_cons_columns
all_constraints
etc...
And then I'm using OCI to describe all tables, for precise information about column types. This is working, but our CI testing farm is often failing inside this schema data-model introspection code, because another test running in parallel is creating/deleting tables in the middle of this series of queries and describe calls I'm making.
My question is thus: how can I introspect this schema atomically, such that another session does not concurrently change the very schema I'm introspecting?
Would using a read-only serializable transaction around the selects and describes be enough? I.e. does MVCC apply to Oracle's data dictionaries? What would be the likelihood of Snapshot Too Old errors on such system dictionaries?
If full atomicity is not possible, are there steps I could take to minimize the possibility of getting inconsistent / stale info?
I was thinking of maybe using left joins to reduce the number of queries, and/or replacing the OCIDescribeAny() calls with other dictionary accesses joined to the other tables, to get all table/column info in a single query each?
I'd appreciate some expert input on this concurrency issue. Thanks, --DD
A typical read-write conflict. Off the top of my head, I see two ways around it:
use the dbms_lock package in both the "introspection" code and the "another test".
rewrite your introspection query so that it returns one big result containing everything you need. There are multiple ways to do that:
use xmlagg and the like.
use listagg and get one big string or CLOB.
just use a bunch of unions to get one result set, as a single query is guaranteed to be consistent (see the sketch below).
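A minimal sketch of the union approach from Java/JDBC (the original code is C/C++ OCI, but the single-statement idea is the same); the owner parameter and the trimmed column lists are assumptions:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SchemaSnapshot {
    // One SELECT is read-consistent in Oracle, so all three dictionary views
    // are seen as of the same point in time.
    static void load(Connection conn, String owner) throws Exception {
        String sql =
            "SELECT 'TABLE' AS kind, table_name AS name, CAST(NULL AS VARCHAR2(128)) AS detail "
          + "  FROM all_tables WHERE owner = ? "
          + "UNION ALL "
          + "SELECT 'CONSTRAINT', constraint_name, table_name "
          + "  FROM all_constraints WHERE owner = ? "
          + "UNION ALL "
          + "SELECT 'INDEX', index_name, table_name "
          + "  FROM all_indexes WHERE owner = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, owner);
            ps.setString(2, owner);
            ps.setString(3, owner);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Dispatch on "kind" into Table/Constraint/Index structures.
                    System.out.println(rs.getString("kind") + " " + rs.getString("name"));
                }
            }
        }
    }
}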
Hope that helps.

How to write JPQL query to get all the views and indexes from Oracle/postgreSQL DB

Hi everyone,
I have a requirement to get all the views and indexes from an Oracle/PostgreSQL DB.
Here I have written a query like the one below to get all views:
SELECT * FROM INFORMATION_SCHEMA.VIEWS where table_schema = 'public'
But that is a PostgreSQL-dependent query, right? Because Oracle implements information_schema differently, I thought to write a JPQL query, but I don't know how.
Could anybody please help me out?
Thanks & regards,
Sridhar Kosna.
It's not possible. JPQL is only able to query mapped entities, not database tables. You'll need to use SQL, and if you need to support multiple databases, have one DAO per database and use the appropriate DAO based on the database in use.
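A rough sketch of that per-database DAO idea; the interface name and the exact dictionary queries are illustrative, not prescriptive:

import java.util.List;
import javax.persistence.EntityManager;

interface ViewNameDao {
    List<String> findViewNames();
}

class PostgresViewNameDao implements ViewNameDao {
    private final EntityManager em;
    PostgresViewNameDao(EntityManager em) { this.em = em; }

    @Override
    @SuppressWarnings("unchecked")
    public List<String> findViewNames() {
        // PostgreSQL: the standard information_schema works.
        return em.createNativeQuery(
            "SELECT table_name FROM information_schema.views WHERE table_schema = 'public'")
            .getResultList();
    }
}

class OracleViewNameDao implements ViewNameDao {
    private final EntityManager em;
    OracleViewNameDao(EntityManager em) { this.em = em; }

    @Override
    @SuppressWarnings("unchecked")
    public List<String> findViewNames() {
        // Oracle: query the user_views dictionary view instead.
        return em.createNativeQuery("SELECT view_name FROM user_views")
            .getResultList();
    }
}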

Performance on joins in linq

Hi,
I am going to rewrite a stored procedure in LINQ.
What this SP does is join 12 tables, get the data, and insert it into another table.
It has 7 left outer joins and 4 inner joins, and returns one row of data.
Now the questions:
1) What is the best way to achieve these joins in LINQ?
2) Do you think this affects performance (it's only retrieving one row of data at a given point in time)?
Please advise.
Thanks
SNA.
You might want to check this question for the multiple joins. I usually prefer lambda syntax, but YMMV.
As for performance: I doubt the query performance itself will be affected, but there may be some overhead in figuring out the execution plan, since it's such a complicated query. The biggest performance hit will likely be the extra database round trip you will need compared to the stored procedure. If I understand you correctly, your current SP does the SELECT and INSERT all at once. Using LINQ to SQL or LINQ to Entities, you will need to fetch the data first before you can actually write it to the other table.
So, it depends on your usage if rewriting is warranted. Alternatively, you can add stored procedures to your data model. It will be exposed as a method on your data context.
