I have a function with one argument (a date) which encapsulates a single query like
SELECT COUNT(*)
FROM tbl
WHERE some_date_field BETWEEN param_date - INTERVAL '0 1:00:00' DAY TO SECOND
AND param_date
What I want to do is cache the result of this query somewhere with a TTL of 1 minute. The cached result should be shared across all sessions, not just the current one.
Any proposals?
PS: Yes, I know about Oracle's function result cache, but it doesn't fit the requirements.
PPS: Yes, we could add a second, artificial argument with a value such as the date in yyyymmddhh24mi format, so that it changes each minute and the function result cache becomes usable; but I am hoping for a solution that hides the caching machinery inside the function.
I'd use a global application context, and a job with a refresh interval of 1 minute to set the context.
PS: INTERVAL '1' HOUR is shorter and more meaningful than INTERVAL '0 1:00:00' DAY TO SECOND
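A minimal sketch of the global-context approach, with invented names (cnt_ctx, pkg_cnt_cache). The context must be declared ACCESSED GLOBALLY so every session sees the same value. Note this pins the parameter to SYSDATE; honouring an arbitrary param_date would need one context attribute per date:

-- Global context, visible to all sessions, settable only via pkg_cnt_cache
CREATE OR REPLACE CONTEXT cnt_ctx USING pkg_cnt_cache ACCESSED GLOBALLY;

CREATE OR REPLACE PACKAGE pkg_cnt_cache AS
  PROCEDURE refresh;
  FUNCTION cached_count RETURN NUMBER;
END pkg_cnt_cache;
/
CREATE OR REPLACE PACKAGE BODY pkg_cnt_cache AS
  PROCEDURE refresh IS
    l_cnt NUMBER;
  BEGIN
    SELECT COUNT(*) INTO l_cnt
    FROM tbl
    WHERE some_date_field BETWEEN SYSDATE - INTERVAL '1' HOUR AND SYSDATE;
    DBMS_SESSION.SET_CONTEXT('cnt_ctx', 'cnt', TO_CHAR(l_cnt));
  END refresh;

  FUNCTION cached_count RETURN NUMBER IS
  BEGIN
    RETURN TO_NUMBER(SYS_CONTEXT('cnt_ctx', 'cnt'));
  END cached_count;
END pkg_cnt_cache;
/
-- Job that re-sets the context every minute
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'refresh_cnt_ctx',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN pkg_cnt_cache.refresh; END;',
    repeat_interval => 'FREQ=MINUTELY',
    enabled         => TRUE);
END;
/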
You want to cache the result of this query and share the cache across all sessions. The only way I can think of is to wrap the query in a function call and store the result in a small table. The function will query the small table to see if the count has already been stored within the last minute, and if so, return it.
You would keep the table small by running a job periodically to delete rows in the "cache table" that are older than 1 minute - or better still, perhaps truncate it.
However, I can only see this being of benefit if the original SELECT COUNT(*) is a relatively expensive query.
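A hedged sketch of the cache-table idea; the table and function names are invented, and the autonomous transaction is there so the function can COMMIT when called from SQL. Concurrent sessions may occasionally recompute the same count, which this version simply tolerates:

CREATE TABLE cnt_cache (
  param_date  DATE      PRIMARY KEY,
  cnt         NUMBER    NOT NULL,
  computed_at TIMESTAMP NOT NULL
);

CREATE OR REPLACE FUNCTION cached_cnt(p_date IN DATE) RETURN NUMBER IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  l_cnt NUMBER;
BEGIN
  -- Return the cached value if it is younger than one minute
  SELECT cnt INTO l_cnt
  FROM cnt_cache
  WHERE param_date = p_date
    AND computed_at > SYSTIMESTAMP - INTERVAL '1' MINUTE;
  RETURN l_cnt;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    SELECT COUNT(*) INTO l_cnt
    FROM tbl
    WHERE some_date_field BETWEEN p_date - INTERVAL '1' HOUR AND p_date;
    -- MERGE so a stale row is overwritten rather than duplicated
    MERGE INTO cnt_cache c
    USING (SELECT p_date d FROM dual) s ON (c.param_date = s.d)
    WHEN MATCHED THEN UPDATE SET c.cnt = l_cnt, c.computed_at = SYSTIMESTAMP
    WHEN NOT MATCHED THEN INSERT (param_date, cnt, computed_at)
                          VALUES (p_date, l_cnt, SYSTIMESTAMP);
    COMMIT;
    RETURN l_cnt;
END cached_cnt;
/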
I use crontab to schedule a SQL query against a big table every 2 hours.
select a,b,c,d,e,f,g,h,i,j,k,many_cols from big_table format Null
It takes anywhere from 30 seconds to 5 minutes at a time.
What I can see from the query_log is that when the query time is low, the MarkCacheHits value is high; when the time is high, MarkCacheHits is low and MarkCacheMisses is high.
I'm wondering how to get as many mark cache hits as possible. (This is probably not the only big table that needs to be warmed up.)
Will the mark cache be evicted by other queries, and what is its limit?
Does warming up by selecting specific columns really work for an aggregate query over those columns? For example, the warm-up SQL is as above, and the aggregate query could be select a, sum(if(b,c,0)) from big_table group by a.
My ClickHouse server has been hanging occasionally recently, and I can't see any errors or exceptions at the corresponding times in the log. Could this be related to my regular warm-up query of the big table?
In reality you are placing the data into the Linux disk cache.
Will the mark cache be evicted by other queries, and what is its limit?
Yes, it will be evicted. The default limit is 5 GB: <mark_cache_size>5368709120</mark_cache_size>
Does warming up by selecting specific columns really work for an aggregate query over those columns?
Yes, because you are putting the files into the Linux cache.
Could this be related to my regular warm-up query of the big table?
No.
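To see the correlation yourself, you can pull the counters from system.query_log. This assumes a ClickHouse version where ProfileEvents is a Map column; older versions expose ProfileEvents.Names / ProfileEvents.Values arrays instead:

SELECT
    event_time,
    query_duration_ms,
    ProfileEvents['MarkCacheHits']   AS mark_cache_hits,
    ProfileEvents['MarkCacheMisses'] AS mark_cache_misses
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query LIKE '%big_table%'
ORDER BY event_time DESC
LIMIT 10;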
I am writing some data loading code that pulls data from a large, slow table in an Oracle database. I have read-only access to the data, and do not have the ability to change indexes or affect the speed of the query in any way.
My select statement takes 5 minutes to execute and returns around 300,000 rows. The system is inserting large batches of new records constantly, and I need to make sure I get every last one, so I need to save a timestamp for the last time I downloaded the data.
My question is: If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?
My gut tells me that the answer is 'no', especially since a large portion of those 5 minutes is just the time spent on the data transfer from the database to the local environment, but I can't find any direct documentation on the scenario.
"If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?"
No. Oracle enforces strict isolation levels and does not permit dirty reads.
The default isolation level is Read Committed. This means the result set you get after five minutes will be identical to the one you would have got if Oracle could have delivered all the records in 0.0000001 seconds. Anything committed after your query started running will not be included in the results. That includes updates to the records as well as inserts.
Oracle does this by tracking changes to the table in the UNDO tablespace. Provided it can reconstruct the original image from that data, your query will run to completion; if for any reason the undo information is overwritten, your query will fail with the dreaded ORA-01555: Snapshot too old. That's right: Oracle would rather hurl an exception than provide us with an inconsistent result set.
Note that this consistency applies at the statement level. If we run the same query twice within one transaction we may see two different result sets. If that is a problem (I think not in your case) we need to switch from Read Committed to Serializable isolation.
The Concepts Manual covers Concurrency and Consistency in great depth. Find out more.
So to answer your question, take the timestamp from the time you start the select. Specifically, take the max(created_ts) from the table before you kick off the query. This should protect you from the gap Alex mentions (if records are not committed the moment they are inserted, there is the potential to lose records if you base the select on a comparison with the system timestamp). Although doing this means you are issuing two queries in the same transaction, which means you do need Serializable isolation after all!
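A sketch of that ordering, with a hypothetical created_ts column and a :last_run_watermark bind saved from the previous run:

ALTER SESSION SET ISOLATION_LEVEL = SERIALIZABLE;

-- 1. Remember the high-water mark first...
SELECT MAX(created_ts) FROM big_slow_table;

-- 2. ...then pull everything not yet seen, based on the
--    watermark saved by the previous run
SELECT *
FROM big_slow_table
WHERE created_ts > :last_run_watermark;

COMMIT;  -- ends the serializable transaction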
I have a simple app which executes a query against a DB. Since a lot of rows are returned (~300-400k), which is too much to retrieve at once and causes an out-of-memory error, I have to use pagination. groovy.sql.Sql has rows(String sql, int offset, int maxRows), but it works very slowly: for example, with a step of 20k rows the execution time of the rows method starts at around 10 seconds and increases with every next call. The second way of achieving pagination is to use some built-in mechanism, for example:
select *
from (select /*+ first_rows(25) */
             your_columns,
             row_number() over (order by something_unique) rn
      from your_tables)
where rn between :n and :m
order by rn;
For my query the second approach takes 5 seconds with a step of 20k. My question is: which method is better for the database? And what is the reason for the slow execution of Sql.rows?
The first_rows hint is no longer needed since Oracle 11g. For Oracle the best approach is the producer-consumer design pattern, as the database generates data on the fly.
So a simple plain select would be suitable:
select your_columns,
       row_number() over (order by something_unique) rn
from your_tables;
But unfortunately, Java frameworks usually cannot keep the DB connection open. They simply fetch all the data at once and then hand the whole result set over to the caller.
You do not have many options. Either:
you need a lot of RAM to fetch everything, possibly combined with lazy loading at the JPA level,
or you have to find a way to keep the DB connection open in a web application, which is practically impossible. Such an approach is also not suitable for applications with more than a few thousand concurrent users.
PS: under usual circumstances, the way pagination is usually implemented does not return consistent data, since the data can change between executions. So it should not be used for anything other than display purposes.
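For completeness, a hedged sketch of the producer-consumer style in plain JDBC, outside any framework: keep one statement open, set a fetch size, and consume rows as they stream in (the connection details and column names are invented):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamRows {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/SVC", "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                 "select your_columns,"
               + " row_number() over (order by something_unique) rn"
               + " from your_tables")) {
            ps.setFetchSize(5000);  // rows per round trip, not a page size
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    process(rs);    // consume each row as it arrives
                }
            }
        }
    }

    static void process(ResultSet rs) {
        // handle one row
    }
}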
I'm using Oracle PL/SQL Developer on an Oracle Database 11g. I have recently written a view with some weird behaviour. When I run the simple query below without fetching the last page of the result, the query time is about 0.5 sec (0.2 when cached).
select * from covenant.v_status_covenant_tuning where bankkode = '4210';
However, if I fetch the last page in PL/SQL Developer, or if I run the query from Java code (i.e. run a query that retrieves all the rows), something happens to the view and the query time increases to about 20-30 secs.
The view does not start working properly again until I recompile it. The explain plan is exactly the same before and after. All indexes and tables are analyzed. I don't know if it's relevant, but the view uses a few analytic expressions like rank() over (partition by ...), lag(), lead() and so on.
As I'm new here I can't post a picture of the explain plan (a reputation of 10 is needed), but in general the optimizer uses indexes efficiently and does a few sorts because of the analytic functions.
If the plan involves a full scan of some sort, the query will not complete until the very last block in the table has been read.
Imagine a table that has lots of matching rows in the very first few blocks in the table, and no matching rows in the rest of it. If there is a large volume of blocks to check, the query might return the first few pages of results very quickly, as it finds them all in the first few blocks of the table. But before it can return the final "no more results" to the client, it must check every last block of the table - it doesn't know if there might be one more result in the very last block of the table, so it has to wait until it has read that last block.
If you'd like more help, please post your query plan.
Question: How can I process (read in) batches of records 1000 at a time and ensure that only the current batch of 1000 records is in memory? Assume my primary key is called 'ID' and my table is called Customer.
Background: This is not for user pagination, it is for compiling statistics about my table. I have limited memory available, therefore I want to read my records in batches of 1000 records at a time. I am only reading in records, they will not be modified. I have read that StatelessSession is good for this kind of thing and I've heard about people using ScrollableResults.
What I have tried: Currently I am working on a custom made solution where I implemented Iterable and basically did the pagination by using setFirstResult and setMaxResults. This seems to be very slow for me but it allows me to get 1000 records at a time. I would like to know how I can do this more efficiently, perhaps with something like ScrollableResults. I'm not yet sure why my current method is so slow; I'm ordering by ID but ID is the primary key so the table should already be indexed that way.
As you might be able to tell, I keep reading bits and pieces about how to do this. If anyone can provide me a complete way to do this it would be greatly appreciated. I do know that you have to set FORWARD_ONLY on ScrollableResults and that calling evict(entity) will take an entity out of memory (unless you're doing second level caching, which I do not yet know how to check if I am or not). However I don't see any methods in the JavaDoc to read in say, 1000 records at a time. I want a balance between my lack of available memory and my slow network performance, so sending records over the network one at a time really isn't an option here. I am using Criteria API where possible. Thanks for any detailed replies.
Maybe using Oracle's ROWNUM feature will help you.
Let's say we need to fetch 1000 rows (pageSize) from the table Customer, and we need the second page (pageNumber).
Creating and calling a query like this may be the answer (note the order by must sit in the innermost query, so that rownum is applied to already ordered rows):
select *
from (select rownum row_number, c.*
      from (select * from Customer order by ID) c
      where rownum <= pageSize * pageNumber)
where row_number > pageSize * (pageNumber - 1)
Load entities as read-only.
For HQL
Query.setReadOnly( true );
For Criteria
Criteria.setReadOnly( true );
http://docs.jboss.org/hibernate/orm/3.6/reference/en-US/html/readonly.html#readonly-api-querycriteria
A stateless session is quite different from a stateful session:
Operations performed using a stateless session never cascade to associated instances. Collections are ignored by a stateless session.
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html#batch-statelesssession
Use flush() and clear() to clean up the session cache.
session.flush();
session.clear();
Question about Hibernate session.flush()
ScrollableResults should work the way you expect.
Do not forget that each item you load takes up memory unless you evict or clear it, and you need to check that this really works well.
ScrollableResults in the MySQL Connector/J is fake: it loads the entire result set anyway, but I think the Oracle connector works fine.
Using Hibernate's ScrollableResults to slowly read 90 million records
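A minimal sketch of the ScrollableResults loop with a StatelessSession; the Customer entity and its id property are assumed from the question:

import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;

public class CustomerStats {
    // Reads Customer rows as a stream without filling a session cache
    public static void compute(SessionFactory sessionFactory) {
        StatelessSession session = sessionFactory.openStatelessSession();
        try {
            ScrollableResults results = session
                    .createQuery("from Customer c order by c.id")
                    .setReadOnly(true)
                    .setFetchSize(1000)  // driver hint: 1000 rows per round trip
                    .scroll(ScrollMode.FORWARD_ONLY);
            while (results.next()) {
                Customer c = (Customer) results.get(0);
                // ...accumulate statistics from c here...
            }
            results.close();
        } finally {
            session.close();
        }
    }
}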
If you are looking for alternatives, you may consider this approach (a sketch follows the list):
1. Select the primary key of every row that you will process.
2. Chop the keys into PK chunks.
3. Iterate: select the rows for each PK chunk (using an IN query) and process them however you want.
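A rough sketch of that alternative in Hibernate; the chunk size and names are arbitrary, and the chunk must stay at or below Oracle's 1000-element IN-list limit:

import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class PkChunkLoader {
    private static final int CHUNK = 1000;  // Oracle caps IN lists at 1000

    @SuppressWarnings("unchecked")
    public static void processAll(SessionFactory sessionFactory) {
        Session session = sessionFactory.openSession();
        try {
            // 1. Select the primary key of every row to process
            List<Long> ids = session
                    .createQuery("select c.id from Customer c order by c.id")
                    .list();
            // 2. Chop them into chunks; 3. fetch each chunk with an IN query
            for (int i = 0; i < ids.size(); i += CHUNK) {
                List<Long> chunk = ids.subList(i, Math.min(i + CHUNK, ids.size()));
                List<Customer> rows = session
                        .createQuery("from Customer c where c.id in (:ids)")
                        .setParameterList("ids", chunk)
                        .list();
                // ...process the rows, then drop them from the session cache...
                session.clear();
            }
        } finally {
            session.close();
        }
    }
}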