I'm working on an app where some entities in the database have a column representing the date until which that particular entity is available for certain actions. When that date passes, I need to change the entity's state, meaning I update the column representing its state.
What I'm doing so far: whenever I ask the database for those entities in order to do something with them, I first check whether they have expired, and if they have, I update them. I don't particularly like this approach, since it means I will have a bunch of records in the database sitting in the wrong state just because I haven't queried them yet. Another approach would be a periodic task that runs over those records and updates them as necessary. I don't like that either, since records would again spend time in an inconsistent state, and in that case the first approach seems more reasonable.
Is there another way of doing this, or am I missing something? I should mention that I use Spring Boot + Hibernate for my application, and the underlying DB is PostgreSQL. Is there any technology-specific trick I can use to get what I want?
There is no "expired" trigger type in databases. If you have something that expires and you need to act on it, there are two solutions (you already described both): do the extra work on expired records before you use the data, or run a cron/scheduled task (either at the database level or on the server side).
I recommend the cron approach. Here is the explanation:
Update expired records before you read the data ("update before select"):
+: You update expired data right before you need it. One open question: do you update only the records you requested, or everything that has expired? Updating everything can be time-consuming if, say, you need just 2 records from your working set but 2,000 unrelated records have expired.
-: Updating all records can take a long time. If the database is shared and accessed by more than just your application, the expiration logic never runs for the other clients. You have to keep track of every entry point where the expiration check must happen. If records expire on a scale of minutes or seconds, new records may expire right after you have run the check. And if the expiration-handling workflow changes, with a cron job you change it in one place, whereas with update-before-select you have to change it at every call site. A minimal sketch of this variant follows below.
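For reference, here is a minimal sketch of the update-before-select variant with Spring and JPA. The entity name (MyEntity), its fields (availableUntil, state), the State enum, and the Instant mapping are all assumptions about your schema:

import java.time.Instant;
import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class EntityService {

    @PersistenceContext
    private EntityManager em;

    @Transactional
    public List<MyEntity> findAvailable() {
        // Flip everything past its deadline to EXPIRED first...
        em.createQuery("update MyEntity e set e.state = :expired "
                     + "where e.availableUntil < :now and e.state <> :expired")
          .setParameter("expired", State.EXPIRED)
          .setParameter("now", Instant.now()) // assumes availableUntil maps to java.time.Instant
          .executeUpdate();
        // ...then read: the result can no longer contain stale rows.
        return em.createQuery("select e from MyEntity e where e.state = :s", MyEntity.class)
                 .setParameter("s", State.AVAILABLE)
                 .getResultList();
    }
}

Note that every read path has to go through a method like this, which is exactly the entry-point problem listed above.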
CRON/TASK
-: You have to spend time configuring it, but only once, 30-60 minutes at most :).
+: It runs in the background. If your database is used by more than just your application, the expired-data logic still takes effect. You don't have to check for stale data in your Java code before every select (or remember that the check exists, or explain it to every new employee...). And you get a clean split between the logic that takes care of stale data and the normal queries against the database.
In the cron job you can run a 'select for update'; then even if a server-side select arrives while the update is in progress, it will simply wait until the stale-data logic completes, and the select will return up-to-date data.
For Spring: see the Spring scheduling documentation; spring-quartz-schedule is a simple example.
At the database level: use a PostgreSQL job scheduler.
A scheduler/cron job is best practice for this kind of task.
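For illustration, here is a minimal sketch of the cron approach with Spring scheduling. The entity and column names are the same assumptions as above, the one-minute interval is arbitrary, and @EnableScheduling must be present on a configuration class:

import java.time.Instant;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class ExpirationJob {

    @PersistenceContext
    private EntityManager em;

    // Runs one minute after the previous run finishes; pick an interval
    // that matches how precise the expiration needs to be.
    @Scheduled(fixedDelay = 60_000)
    @Transactional
    public void expireEntities() {
        em.createQuery("update MyEntity e set e.state = :expired "
                     + "where e.availableUntil < :now and e.state <> :expired")
          .setParameter("expired", State.EXPIRED)
          .setParameter("now", Instant.now())
          .executeUpdate();
    }
}

With this in place, the normal query paths don't need any expiration check at all.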
Let's start with the background. I have an API endpoint that I have to query every 15 minutes and that returns complex data. Unfortunately, this endpoint does not provide any information about what exactly changed, so I have to compare everything against the data I already have in the db and then execute updates, inserts, or deletes. This is pretty tedious...
I came to the idea that I could simply remove all data from certain tables and rebuild everything from scratch... But I also have to return this cached data to my clients, so there might be a situation where the db is empty during a client request because it is being "refreshed/rebuilt". And that can't happen, because I have to return something.
So I came to the idea to either:
lock the relevant db tables so that the client has to wait for the db refresh,
or
use CQRS: https://martinfowler.com/bliki/CQRS.html
Do you have any suggestions on how to solve this problem?
It sounds like you're using a relational database, so I'll outline a solution in database terms. The idea, however, is more general than that; it's essentially Blue-Green deployment.
Have two data tables (or two databases, for that matter); one is active, and one is inactive.
When the software starts the update process, it can wipe the inactive table and write new data into it. During this process, the system keeps serving data from the active table.
Once the data update is entirely done, the system can begin to serve data from the previously inactive table. In other words, the inactive table becomes the active table, and vice versa.
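A minimal sketch of the swap in plain JDBC, assuming PostgreSQL (so that the view switch is transactional); the table names data_a/data_b, the data_active view, and the loading step are all assumptions:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

import javax.sql.DataSource;

public class BlueGreenRefresher {

    private final DataSource dataSource;
    private String inactiveTable = "data_b"; // data_a starts as the active table

    public BlueGreenRefresher(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Called every 15 minutes by whatever scheduler triggers the API poll.
    public synchronized void refresh() throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false);
            try (Statement st = con.createStatement()) {
                // 1. Rebuild the inactive table. Readers keep hitting the
                //    data_active view, which still points at the active table.
                st.executeUpdate("DELETE FROM " + inactiveTable);
                loadFreshData(con, inactiveTable);
                // 2. Re-point the view. After commit, readers see the fresh
                //    table; at no point do they see an empty one.
                st.executeUpdate("CREATE OR REPLACE VIEW data_active AS SELECT * FROM " + inactiveTable);
            }
            con.commit();
            inactiveTable = inactiveTable.equals("data_a") ? "data_b" : "data_a";
        }
    }

    private void loadFreshData(Connection con, String table) throws SQLException {
        // Call the upstream API and insert the fresh rows into `table`.
    }
}

Clients always query the data_active view, never the underlying tables.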
I am writing some data loading code that pulls data from a large, slow table in an Oracle database. I have read-only access to the data and do not have the ability to change indexes or affect the speed of the query in any way.
My select statement takes 5 minutes to execute and returns around 300,000 rows. The system is inserting large batches of new records constantly, and I need to make sure I get every last one, so I need to save a timestamp for the last time I downloaded the data.
My question is: If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?
My gut tells me that the answer is 'no', especially since a large portion of those 5 minutes is just the time spent on the data transfer from the database to the local environment, but I can't find any direct documentation on the scenario.
"If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?"
No. Oracle enforces strict isolation levels and does not permit dirty reads.
The default isolation level is Read Committed. This means the result set you get after five minutes will be identical to the one you would have got if Oracle could have delivered you all the records in 0.0000001 seconds. Anything committed after your query started running will not be included in the results. That includes updates to the records as well as inserts.
Oracle does this by tracking changes to the table in the UNDO tablespace. Provided it can reconstruct the original image from that data, your query will run to completion; if for any reason the undo information has been overwritten, your query will fail with the dreaded ORA-1555: Snapshot too old. That's right: Oracle would rather hurl an exception than provide us with an inconsistent result set.
Note that this consistency applies at the statement level. If we run the same query twice within one transaction, we may see two different result sets. If that is a problem (I think not in your case), we need to switch from Read Committed to Serializable isolation.
The Concepts Manual covers Concurrency and Consistency in great depth. Find out more.
So, to answer your question: take the timestamp from the moment you start the select. Specifically, take the max(created_ts) from the table before you kick off the main query. This should protect you from the gap Alex mentions (if records are not committed the moment they are inserted, there is the potential to lose records if you base the select on a comparison with the system timestamp). Doing this does mean you're issuing two queries in the same transaction, which means you do need Serializable isolation after all!
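A sketch of that pattern in plain JDBC; the table and column names (big_table, created_ts) are assumptions, and SERIALIZABLE ensures both statements read from the same snapshot:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class IncrementalLoader {

    // Returns the watermark to persist for the next run.
    public Timestamp loadSince(Connection con, Timestamp lastWatermark) throws Exception {
        con.setAutoCommit(false);
        // Both queries below now see one consistent snapshot of the table.
        con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);

        Timestamp newWatermark;
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT MAX(created_ts) FROM big_table");
             ResultSet rs = ps.executeQuery()) {
            rs.next();
            newWatermark = rs.getTimestamp(1);
        }

        try (PreparedStatement ps = con.prepareStatement(
                "SELECT * FROM big_table WHERE created_ts > ? AND created_ts <= ?")) {
            ps.setTimestamp(1, lastWatermark);
            ps.setTimestamp(2, newWatermark);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process the row
                }
            }
        }
        con.commit(); // releases the snapshot
        return newWatermark;
    }
}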
I have 2 databases, DBa and DBb, and 2 record sets, RecordsA and RecordsB. The concept is that in our app you can add records from A to B. I am having an issue where I add a record from A to B and then try to query the records again: a particular property on the added record is stale/incorrect.
RecordsA lives on DBa and RecordsB lives on DBb. I make a stored proc call to add the record to the B side and modify a column's value on DBa, which performs the insert/update through a dblink on DBb. The problem is, when I do an insert/update followed by an immediate get call on DBa (calling DBb), the modified property is incorrect: it's null, as if the insert never went through. However, if I put a breakpoint before the pull call and wait about 1 second, the correct data is returned. This makes me wonder whether there are latency issues with dblinks.
This seems like an async issue, but we verified that no async calls are being made and everything runs on the same thread. Would this type of behavior be likely with a dblink? That is, could inserting/updating a record on a remote server and retrieving it right away hit some latency window where the record wasn't quite updated at the time of the re-pull?
I am working on an EJB (3.0)-Hibernate (3) project with an Oracle 11g DB.
First of all, for security reasons I am unable to share my code; I am really sorry about that.
The issue:
My application calls the DB from different locations to retrieve, persist, and merge records, which touches a number of tables.
But one particular retrieval query (a select that fetches a single record by primary key in the where clause) takes far too long (almost 4 minutes) to get a response from the DB (the response itself is correct, a single record).
I can measure the time by debugging from the calling point down to the DB and from the DB's response back up to my application.
So I want to know why fetching this single record takes so much time when other queries return within seconds or far less.
I also want to know how to capture the timestamp at which the query request from the application actually hits the database (after passing through the Hibernate layer), and what is going on inside the DB during this flow.
Please share any advice or suggestions from your experience if you have faced this kind of issue, and help me trace the whole flow:
Application <-> Hibernate Layer <-> Database
Thanks in advance!
Question: How can I process (read in) batches of records 1000 at a time and ensure that only the current batch of 1000 records is in memory? Assume my primary key is called 'ID' and my table is called Customer.
Background: This is not for user pagination, it is for compiling statistics about my table. I have limited memory available, therefore I want to read my records in batches of 1000 records at a time. I am only reading in records, they will not be modified. I have read that StatelessSession is good for this kind of thing and I've heard about people using ScrollableResults.
What I have tried: Currently I am working on a custom-made solution where I implemented Iterable and basically did the pagination using setFirstResult and setMaxResults. This seems very slow, but it lets me get 1000 records at a time. I would like to know how to do this more efficiently, perhaps with something like ScrollableResults. I'm not yet sure why my current method is so slow; I'm ordering by ID, but ID is the primary key, so the table should already be indexed that way.
As you might be able to tell, I keep reading bits and pieces about how to do this. If anyone can describe a complete way to do it, it would be greatly appreciated. I do know that you have to set FORWARD_ONLY on ScrollableResults, and that calling evict(entity) will take an entity out of memory (unless you're using second-level caching, which I don't yet know how to check). However, I don't see any method in the JavaDoc to read in, say, 1000 records at a time. I want a balance between my limited available memory and my slow network performance, so sending records over the network one at a time really isn't an option here. I am using the Criteria API where possible. Thanks for any detailed replies.
Using Oracle's ROWNUM feature may help you.
Let's say we need to fetch pages of 1000 rows (pageSize) from the Customer table and we want the second page (pageNumber = 2).
Creating and running a query like the following may be the answer (note that the ORDER BY has to sit in the innermost subquery, because otherwise ROWNUM is assigned to the rows before sorting):
select *
  from (select rownum row_number, ordered.*
          from (select c.* from Customer c order by c.ID) ordered
         where rownum <= :pageSize * :pageNumber)
 where row_number > :pageSize * (:pageNumber - 1)
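If you run this from Hibernate, one possible way is a native query (a sketch; PAGE_SQL holds the statement above, and Hibernate maps the Customer columns by name, so the extra row_number column in the projection should simply be ignored):

List<Customer> page = session.createSQLQuery(PAGE_SQL)
        .addEntity(Customer.class)
        .setInteger("pageSize", 1000)
        .setInteger("pageNumber", 2)
        .list();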
Load entities as read-only.
For HQL
Query.setReadOnly( true );
For Criteria
Criteria.setReadOnly( true );
http://docs.jboss.org/hibernate/orm/3.6/reference/en-US/html/readonly.html#readonly-api-querycriteria
A stateless session is quite different from a stateful Session:
operations performed using a stateless session never cascade to associated instances, and collections are ignored by a stateless session.
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html#batch-statelesssession
Use flush() and clear() to clean up the session cache.
session.flush();
session.clear();
Question about Hibernate session.flush()
ScrollableResults should work the way you expect.
Do not forget that each item you load takes up memory until you evict or clear the session, so you need to check that this really works well.
ScrollableResults in MySQL's Connector/J is fake: it loads the entire result set anyway. I think the Oracle connector works properly, though.
Using Hibernate's ScrollableResults to slowly read 90 million records
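Putting the pieces together, here is a sketch that combines a StatelessSession with a forward-only cursor; the Customer entity is from your question, and whether setFetchSize is honored is up to the JDBC driver, as noted above:

import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;
import org.hibernate.Transaction;

public class CustomerStatistics {

    public void compute(SessionFactory sessionFactory) {
        StatelessSession session = sessionFactory.openStatelessSession();
        Transaction tx = session.beginTransaction();
        try {
            ScrollableResults results = session.createQuery("from Customer c order by c.id")
                    .setReadOnly(true)
                    .setFetchSize(1000) // rows per database round trip, driver permitting
                    .scroll(ScrollMode.FORWARD_ONLY);
            while (results.next()) {
                Customer c = (Customer) results.get(0);
                accumulate(c); // your per-row statistics
            }
            results.close();
            tx.commit();
        } finally {
            session.close();
        }
    }

    private void accumulate(Customer c) {
        // update counters, sums, etc.
    }
}

Because a StatelessSession keeps no first-level cache, there is nothing to evict or clear between rows.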
If you are looking for alternatives, you may consider this approach:
1. Select the primary key of every row you will process.
2. Chop the keys into PK chunks.
3. Iterate:
select the rows for each PK chunk (using an IN query)
and process them however you want.
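A sketch of that alternative (entity and method names assumed; with Oracle, keep the chunk size at or below 1000 because of the IN-list limit):

import java.util.List;

import org.hibernate.Session;

public class ChunkedReader {

    private static final int CHUNK_SIZE = 1000; // Oracle allows at most 1000 items in an IN list

    @SuppressWarnings("unchecked")
    public void processInChunks(Session session) {
        // 1. Load only the primary keys: cheap to hold in memory.
        List<Long> ids = session.createQuery("select c.id from Customer c order by c.id").list();

        // 2. + 3. Walk the ids in chunks and fetch each chunk with an IN query.
        for (int i = 0; i < ids.size(); i += CHUNK_SIZE) {
            List<Long> chunk = ids.subList(i, Math.min(i + CHUNK_SIZE, ids.size()));
            List<Customer> batch = session.createQuery("from Customer c where c.id in (:ids)")
                    .setParameterList("ids", chunk)
                    .list();
            for (Customer c : batch) {
                process(c); // your per-row processing
            }
            session.clear(); // drop the processed batch from the session cache
        }
    }

    private void process(Customer c) {
        // compile statistics, etc.
    }
}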