H2 database table getting cleared automatically - Spring

I am using an H2 database to test my Spring Boot application. I do not use a file to store the data; instead I just use the in-memory database. In the properties file, my JDBC URL looks like this:
spring.datasource.url=jdbc:h2:mem:;MODE=MSSQLServer;INIT=runscript from 'classpath:/schema.sql'\\;runscript from 'classpath:/data.sql'
Now, when I run the tests, I have the following test scenario:
1. Add some entities in a table (this adds some rows to the table).
2. Search those entities by some criteria.
3. Do the assertion.
Sometimes this runs successfully, but other times the search query returns an empty list, which causes the test to fail.
I tried to add print statements just to check whether my entities are getting inserted properly. So in the insert function, after each insertion, I run the query below:
SELECT * FROM tableName;
This returns the correct list, which means each insertion is writing to the table correctly. Now in the search function, before running the actual search query, I run the same query again:
SELECT * from tableName;
Surprisingly, this returns an empty result as well, which means there is no data in the table. Please suggest what I should check for.

Pretty sure @Evgenij Ryazanov's comment is correct here.
Closing the last connection to a database closes the database.
When using in-memory databases this means the content is lost.
After step 1 (add some entities in a table), is the connection closing?
If so, to keep the database open, add ;DB_CLOSE_DELAY=-1 to the database URL.
e.g.
spring.datasource.url=jdbc:h2:mem:;DB_CLOSE_DELAY=-1;MODE=MSSQLServer;INIT=runscript from 'classpath:/schema.sql'\\;runscript from 'classpath:/data.sql'
Note, this can create a memory leak!
see: http://www.h2database.com/html/features.html#in_memory_databases
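To make the effect concrete, here is a minimal sketch using plain JDBC against H2 (the database name testdb and the person table are invented for illustration, and the H2 driver is assumed to be on the classpath). With DB_CLOSE_DELAY=-1 in the URL, a second connection still sees the rows written by the first; without the flag, the whole database is dropped as soon as the first connection closes and the second query would fail.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class H2CloseDelayDemo {
    public static void main(String[] args) throws Exception {
        // DB_CLOSE_DELAY=-1 keeps the in-memory database alive after the last
        // connection closes; without it the content below would be lost.
        String url = "jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1;MODE=MSSQLServer";

        try (Connection c1 = DriverManager.getConnection(url);
             Statement st = c1.createStatement()) {
            st.execute("CREATE TABLE person(id INT PRIMARY KEY, name VARCHAR(100))");
            st.execute("INSERT INTO person VALUES (1, 'Alice')");
        } // the first (and so far only) connection closes here

        // A second, independent connection still sees the row thanks to DB_CLOSE_DELAY=-1.
        try (Connection c2 = DriverManager.getConnection(url);
             Statement st = c2.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM person")) {
            rs.next();
            System.out.println("rows visible after reconnect: " + rs.getInt(1)); // prints 1
        }
    }
}
```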

Related

Is there a way around Hibernate calling OracleStatement.getColumnIndex for each row and column?

I am puzzled by Hibernate’s behavior when loading ResultSets with many columns via HQL: it seems like OracleStatement.getColumnIndex(String) is called over and over again, not just once for every column at the beginning of the load but once for every column when retrieving each and every row of data.
For a ResultSet with many columns this seems to take a significant amount of time (in our case about 43% of the total time; see the attached screenshot of a VisualVM profile). Our HQL loads a table with many columns, join fetching two other tables with to-many relations (both of which also have lots of columns). Unfortunately we cannot restrict the columns to be loaded, because the task is to preload all objects into a Coherence cache on startup of the system, so the objects have to be complete.
As far as I can tell, the problem arises because hydrating the mapped result objects of an HQL query from the ResultSet uses nullSafeGet() for each column, which takes String arguments to identify the column and therefore has to call getColumnIndex().
(When loading the data from a ResultSet of a plain SQL query one can use getString(int), getTimestamp(int), etc. instead of the String-based versions to avoid this issue.)
We are still using an old version of Hibernate (3.6), but the source on GitHub indicates that the same behavior is still present, as nullSafeGet() is still String-based instead of taking an index (or an object containing the index) that could be precomputed once at the beginning of the load.
Is there something that I am missing?
Is there a reason for calling getColumnIndex() for each column of each row of data over and over again?
Is there a way around this which does not involve rewriting the query into SQL and using the index based accessors to build up the mapped objects manually?
The only similar issue I was able to find on the internet was this question which has no answer.
The query there had many columns, too.
Thanks for any help!
Thorsten
This problem is addressed in Hibernate 6, which switches from reading the JDBC ResultSet by name to reading it by position.
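For what it's worth, here is a hedged sketch of the index-based JDBC access pattern mentioned in the question: resolve each column's position once before the row loop and then read every row by index, so the driver never has to map a column name to an index per row. The table and column names are invented for illustration; this is not Hibernate's internal code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class IndexedReadSketch {
    static void readByIndex(Connection conn) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id, name, created_at FROM my_table");
             ResultSet rs = ps.executeQuery()) {
            // Resolve each column's position once, before iterating the rows.
            int idIdx = rs.findColumn("id");
            int nameIdx = rs.findColumn("name");
            int createdIdx = rs.findColumn("created_at");
            while (rs.next()) {
                long id = rs.getLong(idIdx);                // no per-row name lookup
                String name = rs.getString(nameIdx);
                Timestamp created = rs.getTimestamp(createdIdx);
                // ... build the mapped object from the values ...
            }
        }
    }
}
```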

Oracle DB links and retrieving stale data

I have two databases, DBa and DBb, and two record sets, RecordsA and RecordsB. The concept is that in our app you can add records from A to B. I am having an issue where I add a record from A to B and then try to query the records again: the particular property on the added record is stale/incorrect.
RecordsA lives on DBa and RecordsB lives on DBb. I make my stored proc call to add the record to the B side and modify a column's value on DBa, which performs the insert/update on DBb using a dblink. The problem is, when I do an insert/update followed by an immediate get call on DBa (calling DBb), that modified property is incorrect: it is null, as if the insert never went through. However, if I put a breakpoint before the pull call and wait about a second, the correct data is returned. This makes me wonder if there are latency issues with dblinks.
This seems like an async issue, but we verified that no async calls are being made and everything is running on the same thread. Is this type of behavior likely with a dblink? That is, could inserting/updating a record on a remote server and retrieving it right away hit some latency where the record wasn't quite updated at the time of the re-pull?
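For reference, a hypothetical sketch of the call sequence described above, assuming a stored procedure add_record_to_b on DBa that writes to DBb over a dblink named dbb_link (all identifiers are invented). The point is only that the write and the immediate re-read happen back to back on the same connection and thread:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class DbLinkTimingSketch {
    static String addAndReread(Connection dbaConn, long recordId) throws Exception {
        // 1) Stored procedure on DBa inserts/updates the record on DBb via the dblink.
        try (CallableStatement cs = dbaConn.prepareCall("{call add_record_to_b(?)}")) {
            cs.setLong(1, recordId);
            cs.execute();
        }
        // 2) Immediate re-read over the same dblink; this is where the stale/null
        //    value shows up unless a short delay is introduced first.
        try (PreparedStatement ps = dbaConn.prepareStatement(
                "SELECT modified_property FROM records_b@dbb_link WHERE id = ?")) {
            ps.setLong(1, recordId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```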

Spring Boot application with Postgres: indexes not being used during first use

I have a Spring Boot application that is using a Postgres database. When the application is deployed, I need to run a transactional operation that uploads a zip file used to populate the database. The application checks for duplicate rows before inserting them (because users can upload duplicate data that should just be ignored).
The problem I am having is that the first time I upload the file, even though the indexes are created, they are not used when checking for the existence of a row. My theory is that the query planner decides not to use the index because it is looking at the original statistics, which show that the tables are empty. If I upload a small zip file first, the problem goes away because the tables then have data.
I have two questions. First, is my theory correct, or is there some other reason for this behaviour? Second, if so, is there a way to force Postgres to update the query plan it uses at some predefined interval within the same transaction, and can this be done using JPA? Any ideas are appreciated.
Just in case someone runs into this issue, I'll post the solution I found. It appears my theory was correct. The queries will not use the indexes until some statistics are collected. One way to force this is to call ANALYZE after a number of rows have been written to the database. You can do this using a native query like this:
entityManager.createNativeQuery("ANALYZE " + tbl).executeUpdate();
You can wrap this call in a try/catch and ignore any exceptions that might occur if you change the database engine, as shown in the sketch below. I couldn't find a database-independent way of doing this, but this approach works fine and the initial upload now performs as expected.
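A minimal sketch of that workaround, assuming a small helper invoked right after the bulk insert (the class, method, and table names are illustrative, and the import may be jakarta.persistence on newer Spring Boot versions):

```java
import javax.persistence.EntityManager;

public class StatisticsRefresher {
    private final EntityManager entityManager;

    public StatisticsRefresher(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    // Call this after the initial rows have been written so Postgres collects
    // fresh statistics and starts considering the indexes.
    public void refreshStatistics(String tableName) {
        try {
            entityManager.createNativeQuery("ANALYZE " + tableName).executeUpdate();
        } catch (Exception e) {
            // Ignore: ANALYZE is engine-specific and may not exist if the
            // application is ever pointed at a different database.
        }
    }
}
```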

PDI: Returning the result of a SELECT-statement to the datastream

Using PDI (Kettle), I am filling the entry stage of my database using a CSV Input and a Table Output step. This works great; however, I also want to make sure that the data that was just inserted fulfills certain criteria, e.g. fields not being NULL.
Normally this would be a job for database constraints; however, we want to keep the data in the database even if it's faulty (for debugging purposes; it is a pain trying to debug a .csv file...). As it is just a staging table anyway, it doesn't cause any trouble for integrity.
So to do just that, I wrote some SELECT Count(*) as test123 ... statements that instantly show whether something is wrong and are easy to handle (if the value of test123 is 0 all is good; otherwise the job needs to be aborted).
I am executing these statements using an Execute SQL Statements step within a PDI transformation. I expected the result to be automatically passed to my datastream, so I also used a Copy rows to result step to pass it up to the executing job.
This is the point where the problem is most likely located.
I think that the result of the SELECT statement was not automatically passed to my datastream, because when I do a Simple evaluation in the main job using the variable ${test123} (which I thought would be implicitly created by executing SELECT Count(*) as test123 ...) I never get the expected result.
I couldn't really find any clues to this problem in the PDI documentation so I hope that someone here has some experience with PDI and might be able to help. If something is still unclear, just hint at it and I will edit the post with more information.
best regards
Edit:
This is a simple model of my main job:
Start --> Load data (Transformation) --> Check data (Transformation) --> Simple Evaluation --> ...
You are mixing up a few concepts, if I read your post correctly.
You don't need an Execute SQL script; this is a job for the Table input step.
Just type your query in the Table input and you can preview your data and see it coming from the step into the data stream by using the preview on a subsequent step. The Execute SQL script is not an input step, which means it will not add external data to your data stream.
The output fields are not Variables. A Variable is set using the Set Variables step, which takes a single input row and maps a specific field to a variable, which can be persisted at parent job or root job levels. Fields are just that: fields. They are passed from one step to the next through hops and eventually to the parent job if you have a Copy rows to result step, but they are NOT variables.
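For illustration, here is a rough sketch of how the check transformation could be wired according to this answer (the step wiring and the variable scope are assumptions, not a verified layout):
Table input (SELECT Count(*) AS test123 ...) --> Set Variables (field test123 -> variable test123, scope: valid in the parent job)
The Simple evaluation entry in the main job can then test ${test123} as originally intended.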

Oracle: performance about filtering results from remote view

I have a remote database A which has a view v_myview. I am working on a local database, which has a dblink to access v_myview on database A. If I query the view like this:
select * from v_myview@dblink;
it returns half a million rows. I just want to get some specific rows from the view; e.g., to get rows with id=123, my query is
select * from v_myview@dblink where id=123;
This works as expected. Here comes my question: when I run this query, will the remote database generate the half million rows first and then find the rows with id=123 among them? Or will the remote view apply my filter first and query the database without retrieving the half million rows? How can I tell? Thank you!
Oracle is free to do either. You'd need to look at the query plan to see whether the filtering is being done locally or remotely.
Presumably, in a case as simple as the one you present, the optimizer would expect it to be more efficient to send the filter to the remote server rather than pulling half a million rows over the network only to filter them locally. That calculation may be different if the optimizer expects the unfiltered query to return a single row rather than half a million rows, and it may be different if the query gets more complicated, such as joining to a local table or calling a function on the local server.
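If it helps, here is a hedged sketch of how one might inspect that plan from a JDBC client, reusing the v_myview@dblink names from the question (the interpretation hint is an assumption about typical DBMS_XPLAN output): a REMOTE operation together with the "Remote SQL Information" section listing the predicate indicates the filter was shipped to the remote site.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class RemoteFilterPlanCheck {
    static void printPlan(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            // Populate PLAN_TABLE for the statement of interest.
            st.execute("EXPLAIN PLAN FOR SELECT * FROM v_myview@dblink WHERE id = 123");
            // Render the plan with DBMS_XPLAN and print it line by line.
            try (ResultSet rs = st.executeQuery(
                    "SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY())")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```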
