SQLite DB open time really long

I am using SQLite from C++ on Windows, and my database is about 60 MB.
When I open the database, it takes about 13 seconds:
sqlite3* mpDB;
nRet = sqlite3_open16(szFile, &mpDB);
If I close my application and reopen it, opening takes less than 1 second.
At first, I thought it was because of the disk cache. So I preload the 60 MB database file before opening it with SQLite, reading the whole file with CFile. However, even after preloading, the first open is still very slow.
BOOL CQFilePro::PreLoad(const CString& strPath)
{
    // Scratch buffer; the data read into it is discarded
    boost::shared_array<BYTE> temp(new BYTE[PRE_LOAD_BUFFER_LENGTH]);
    UINT nReadLength;
    try
    {
        CFile file;
        if (file.Open(strPath, CFile::modeRead) == FALSE)
        {
            return FALSE;
        }
        // Read the whole file in chunks so the OS pulls it into its file cache
        do
        {
            nReadLength = file.Read(temp.get(), PRE_LOAD_BUFFER_LENGTH);
        } while (nReadLength == PRE_LOAD_BUFFER_LENGTH);
        file.Close();
    }
    catch (...)
    {
        // A read error means the preload did not complete
        return FALSE;
    }
    return TRUE;
}
My questions are: what is the difference between the first open and the second open?
How can I speed up opening the SQLite database?

Actually, I don't think it is a caching issue. I'm pretty certain SQLite doesn't load the entire database into memory when you open it; it just reads a relatively small amount of on-disk structures.
One possibility, however, is that it was compiled without the SQLITE_OMIT_AUTOINIT preprocessor define. In that case, a call to sqlite3_open16 will result in a call to sqlite3_initialize().
Quite a bit happens within that function, though I'm unsure how much time it takes. The sqlite3_initialize() function maintains a flag indicating that it has been called before, and on subsequent calls it exits (almost) immediately. That's why I mention it as a possible culprit in the difference between first and subsequent opens.
I'd suggest changing your code from:
sqlite3* mpDB;
nRet = sqlite3_open16(szFile, &mpDB);
to:
sqlite3* mpDB;
nRet = sqlite3_initialize();
if (nRet == SQLITE_OK)
nRet = sqlite3_open16(szFile, &mpDB);
and timing the two function calls independently. It may be that it's the initialisation taking up the time.
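Python's standard sqlite3 module calls the C library for you and does not expose sqlite3_initialize(), so it cannot separate that particular call. What it can show is the same split-timing approach: time connect() on its own, then the first query. SQLite typically reads the schema lazily, when the first statement is prepared, so the first query is usually where file I/O shows up. This is a sketch; the database path and the query on sqlite_master are illustrative choices.

```python
import sqlite3
import time

def time_open_stages(path):
    """Time sqlite3.connect() separately from the first query on the connection."""
    t0 = time.perf_counter()
    conn = sqlite3.connect(path)
    t1 = time.perf_counter()
    # The first statement forces SQLite to read the schema from the file
    conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
    t2 = time.perf_counter()
    conn.close()
    return {"connect": t1 - t0, "first_query": t2 - t1}
```

Running it twice in a row on the same large file, for example print(time_open_stages("my.db")), is one way to see which stage accounts for the slow first open.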

Related

Responsive asynchronous search-as-you-type in Java 8

I'm trying to implement a "search as you type" pattern in Java.
The goal of the design is that no change gets lost but at the same time, the (time consuming) search operation should be able to abort early and try with the updated pattern.
Here is what I've come up with so far (Java 8 pseudocode):
final AtomicReference<String> patternRef = new AtomicReference<>();
final AtomicLong modificationCount = new AtomicLong();
final ReentrantLock busy = new ReentrantLock();
Consumer<List<ResultType>> resultConsumer;
// This is called in a background thread every time the user presses a key
void search(String pattern) {
    // Update the pattern and the modification count together
    synchronized (this) {
        patternRef.set(pattern);
        modificationCount.incrementAndGet();
    }
    if (!busy.tryLock()) {
        // Another search is already running, let it handle the change
        return;
    }
    try {
        // Get local copies of the pattern and modCount
        String patternCopy;
        long modCount;
        synchronized (this) {
            patternCopy = patternRef.get();
            modCount = modificationCount.get();
        }
        while (true) {
            // Try the search. It will return false when modificationCount changes before the search is finished
            boolean success = doSearch(patternCopy, modCount);
            if (success) {
                // Search completed before modCount was changed again
                break;
            }
            // Try again with new pattern+modCount
            synchronized (this) {
                patternCopy = patternRef.get();
                modCount = modificationCount.get();
            }
        }
    } finally {
        // Only reached when tryLock() succeeded, so the lock is held here
        busy.unlock();
    }
}
boolean doSearch(String pattern, long modCount) {
    // ... search database ...
    if (modCount != modificationCount.get()) {
        return false;
    }
    // ... prepare results ...
    if (modCount != modificationCount.get()) {
        return false;
    }
    resultConsumer.accept(result); // Consumer for the UI code to do something
    return modCount == modificationCount.get();
}
Did I miss some important point? A race condition or something similar?
Is there something in Java 8 that would make the code above simpler?
The fundamental problem of this code can be summarized as “trying to achieve atomicity by multiple distinct atomic constructs”. The combination of multiple atomic constructs is not atomic and trying to reestablish atomicity leads to very complicated, usually broken, and inefficient code.
In your case, doSearch’s last check modCount == modificationCount.get() happens while the lock is still held. After that, another thread (or multiple other threads) could update the search string and mod count, then find the lock occupied and hence conclude that another search is running and will take care of the change.
But the running search no longer looks at the mod count after that last modCount == modificationCount.get() check. The caller just does if (success) { break; }, followed by the finally { busy.unlock(); }, and returns.
So the answer is, yes, you have potential race conditions.
So, instead of settling on two atomic variables, synchronized blocks, and a ReentrantLock, you should use one atomic construct, e.g. a single atomic variable:
final AtomicReference<String> patternRef = new AtomicReference<>();
Consumer<List<ResultType>> resultConsumer;
// This is called in a background thread every time the user presses a key
void search(String pattern) {
    if (patternRef.getAndSet(pattern) != null) return;
    // Try the search. doSearch will return false when not completed
    while (!doSearch(pattern) || !patternRef.compareAndSet(pattern, null))
        pattern = patternRef.get();
}
boolean doSearch(String pattern) {
    //... search database ...
    if (pattern != (Object) patternRef.get()) {
        return false;
    }
    //... prepare results ...
    if (pattern != (Object) patternRef.get()) {
        return false;
    }
    resultConsumer.accept(result); // Consumer for the UI code to do something
    return true;
}
Here, a value of null indicates that no search is running, so if a background thread sets this to a non-null value and finds the old value to be null (in an atomic operation), it knows it has to perform the actual search. After the search, it tries to set the reference to null again, using compareAndSet with the pattern used for the search. Thus, it can only succeed if it has not changed again. Otherwise, it will fetch the new value and repeat.
These two atomic updates are already sufficient to ensure that there is only a single search operation at a time while not missing an updated search pattern. The ability of doSearch to return early when it detects a change, is just a nice to have and not required by the caller’s loop.
Note that in this example, the check within doSearch has been reduced to a reference comparison (using a cast to Object to prevent compiler warnings), to demonstrate that it can be as cheap as the long comparison of your original approach. As long as no new string has been set, the reference will be the same.
But, in fact, you could also use a string comparison, i.e. if(!pattern.equals(patternRef.get())) { return false; }, without significant performance degradation. String comparison is not (necessarily) expensive in Java. The first thing String’s equals implementation does is a reference comparison, so if the string has not changed, it returns true immediately. Otherwise, it then compares the lengths (unlike C strings, the length is known beforehand) and returns false immediately on a mismatch. So in the typical scenario of the user typing another character or pressing backspace, the lengths will differ and the comparison bails out immediately.
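The single-variable idea can also be tried outside Java. Python has no lock-free compare-and-set, so in this minimal sketch the small AtomicReference class is an assumption: it emulates Java's get, getAndSet and compareAndSet with a lock, comparing by identity as Java's compareAndSet does. The search callback passed to Searcher is a hypothetical stand-in for doSearch.

```python
import threading

class AtomicReference:
    """Lock-based stand-in for java.util.concurrent.atomic.AtomicReference."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def get_and_set(self, new):
        with self._lock:
            old, self._value = self._value, new
            return old

    def compare_and_set(self, expected, new):
        # Identity comparison, like Java's reference-based compareAndSet
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Searcher:
    def __init__(self, do_search):
        self.pattern_ref = AtomicReference()  # None means "no search running"
        self._do_search = do_search           # hypothetical doSearch stand-in

    def search(self, pattern):
        # Publish the new pattern; if a search is already running, it picks it up
        if self.pattern_ref.get_and_set(pattern) is not None:
            return
        # Repeat until a search completes and the pattern was not changed meanwhile
        while (not self._do_search(pattern, self.pattern_ref)
               or not self.pattern_ref.compare_and_set(pattern, None)):
            pattern = self.pattern_ref.get()
```

As in the Java version, the callback may return False as soon as it sees that pattern_ref no longer holds its pattern; the loop in search() then retries with the newest one, and only a successful compare_and_set back to None ends the search.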

V8 Garbage Collection Differs For ObjectTemplates and Objects Created With Them

V8's garbage collector seems to clean up Local<T> values (for any T stored in the Local) easily as it goes. However, if you create an ObjectTemplate and then create an instance from it, V8 waits to clean up the memory. Consider the following example, where the resident set size remains stable throughout program execution:
Isolate* isolate = Isolate::New(create_params);
Persistent<Context>* context = ContextNew(isolate); // creates a persistent context
for (int i = 1; i <= 1000000; i++) {
    isolate->Enter();
    EnterContext(isolate, context); // enters the context
    {
        HandleScope handle_scope(isolate);
        Local<Object> result = Object::New(isolate);
    }
    ExitContext(isolate, context);
    isolate->Exit();
}
Above, all we do is create a new Object in each iteration; once handle_scope goes out of scope, the allocated Local values appear to be garbage collected right away, since the resident set size stays steady. However, there is an issue when the object is created through an ObjectTemplate that is also created in the loop:
Isolate* isolate = Isolate::New(create_params);
Persistent<Context>* context = ContextNew(isolate); // creates a persistent context
for (int i = 1; i <= 1000000; i++) {
    isolate->Enter();
    EnterContext(isolate, context); // enters the context
    {
        HandleScope handle_scope(isolate);
        Local<Object> result;
        Local<ObjectTemplate> templ = ObjectTemplate::New(isolate);
        if (!templ->NewInstance(context->Get(isolate)).ToLocal(&result)) { exit(1); }
    }
    ExitContext(isolate, context);
    isolate->Exit();
}
Here, the resident set size increases linearly until the program uses far more RAM than such a small program should. I'm just looking to understand what is happening here. Sorry for the long explanation, I tried to keep it short and to the point :p. Thanks in advance!
V8 assumes that ObjectTemplates are long-lived and hence allocates them in the "old generation" part of the heap, where it takes longer for them to get collected by a (comparatively slow and rare) full GC cycle -- if the assumption was right and they actually are long-lived, this is an overall performance win. Objects themselves, on the other hand, are allocated in the "young generation", where they are quick and easy to collect by the (comparatively frequent) young-generation GC cycles.
If you run with --trace-gc you should see this explanation confirmed.

Cancel ajax calls?

I'm using the Select2 select boxes in my Django project. The ajax calls it makes can be fairly time-consuming if you've only entered a character or two in the query box, but go quicker if you've entered several characters. So what I'm seeing is you'll start typing a query, and it will make 4 or 5 ajax calls, but the final one returns and the results display. It looks fine on the screen, but meanwhile, the server is still churning away on the earlier queries. I've increased the "delay" parameter to 500 ms, but it's still a bit of a problem.
Is there a way to have the AJAX handler on the server detect that this is a new request from the same client as one that is currently processing, and tell the older one to exit immediately? It appears from reading other answers here that merely calling .abort() on the client side doesn't stop the query running on the server side.
If they are DB queries that are taking up the time, then basically nothing will stop them short of stopping the database server, which is of course not practical. If it is computation in nested loops, for example, then you could use the cache to detect whether another request has been submitted by the same user. Basically:
from django.core.cache import cache
from django.utils import timezone
def view(request):
    key = request.session.session_key + 'some_identifier'
    # Remember when this request started; a newer request overwrites the value
    start_time = timezone.now().isoformat()
    cache.set(key, start_time)
    for q in werty:
        # Very expensive computation with millions of loops
        if start_time != cache.get(key):
            # A newer request from the same session has started: give up
            break
        # ... continue the nasty computations ...
    else:
        # Finished without being superseded: remove the marker
        cache.delete(key)
    # ... build and return the response ...
But the Django part aside, here is what I would do: in JS, add a condition so that when the search word is shorter than 3 characters, it waits 0.5 s (or less, whatever you like) before searching, and if another character is added, it searches right away.
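For example:
var timer = null;
function srch(param) {
    // Cancel any short-query search that is still waiting to fire
    clearTimeout(timer);
    timer = null;
    if (param.length < 3) {
        // Short queries are expensive: wait 500 ms in case the user keeps typing
        timer = setTimeout(function () {
            timer = null;
            $.ajax({blah: blah});
        }, 500);
    } else {
        // Long enough to be cheap: search right away
        $.ajax({blah: blah});
    }
}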

SCAN command with spring redis template

I am trying to execute the SCAN command with RedisConnection. I don't understand why the following code throws a NoSuchElementException:
RedisConnection redisConnection = redisTemplate.getConnectionFactory().getConnection();
Cursor c = redisConnection.scan(scanOptions);
while (c.hasNext()) {
c.next();
}
Exception:
java.util.NoSuchElementException
    at java.util.Collections$EmptyIterator.next(Collections.java:4189)
    at org.springframework.data.redis.core.ScanCursor.moveNext(ScanCursor.java:215)
    at org.springframework.data.redis.core.ScanCursor.next(ScanCursor.java:202)
Yes, I have tried this with spring-data-redis 1.6.6.RELEASE. No issues; the simple while loop below is enough. I have also set the count value to 100 (a higher value means fewer round trips).
RedisConnection redisConnection = null;
try {
    redisConnection = redisTemplate.getConnectionFactory().getConnection();
    ScanOptions options = ScanOptions.scanOptions().match(workQKey).count(100).build();
    Cursor<byte[]> c = redisConnection.scan(options);
    while (c.hasNext()) {
        logger.info(new String(c.next()));
    }
} finally {
    if (redisConnection != null) {
        redisConnection.close(); // Ensure this connection is closed
    }
}
I'm using spring-data-redis 1.6.0-RELEASE and Jedis 2.7.2; I do think that the ScanCursor implementation in this version is slightly flawed with regard to handling this case. I've not checked previous versions, though.
So: it's rather complicated to explain, but the ScanOptions object has a "count" field that needs to be set (the default is 10). This field contains the "intended" or "expected" number of results for this search. As explained (not really clearly, IMHO) here, you may change the value of count at each invocation, especially if no result has been returned. I understand this as "a work intent": if you do not get anything back, maybe your "key space" is vast and the SCAN command has not worked "hard enough". Obviously, as long as you're getting results back, you do not need to increase it.
A "simple-but-dangerous" approach would be to use a very large count (e.g. 1 million or more). This makes REDIS go off and search your vast key space to find "at least or nearly as many" keys as your large count. Don't forget: REDIS is single-threaded, so you just killed your performance. Try this with a REDIS of 12M keys and you'll see that although SCAN may happily return results with a very high count value, it will do absolutely nothing else for the duration of that search.
Now, the solution to your problem:
ScanOptions options = ScanOptions.scanOptions().match(pattern).count(countValue).build();
boolean done = false;
// the while-loop below makes sure that we get a valid cursor,
// by looking harder if we don't get a result initially
while (!done) {
    try (Cursor<byte[]> c = redisConnection.scan(options)) {
        while (c.hasNext()) {
            c.next();
        }
        done = true; // we've made it here, let's go away
    } catch (NoSuchElementException nse) {
        System.out.println("Going for " + countValue + " was not hard enough. Trying harder");
        countValue *= 2;
        options = ScanOptions.scanOptions().match(pattern).count(countValue).build();
    }
}
Do note that the ScanCursor implementation of Spring Data REDIS properly follows the SCAN instructions and loops correctly, as many times as needed, to reach the end of the iteration as per the documentation. I've not found a way to change the scan options within the same cursor, so there is a risk that if you get halfway through your results and hit a NoSuchElementException, you'll start again (and essentially do some of the work twice).
Of course, better solutions are always welcome :)
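Because COUNT may change between SCAN calls, another way to "look harder" is to keep following the same cursor and raise COUNT only after an empty batch. The Python sketch below shows that loop against fake_scan, a hypothetical in-memory stand-in for the SCAN command (its cursor is just a list offset, and fnmatch only approximates Redis's MATCH globbing); it is not the Spring Data or Jedis API. The point it illustrates is that an empty batch with a nonzero cursor does not end the iteration; only a returned cursor of 0 does.

```python
import fnmatch

def fake_scan(keys, cursor, match, count):
    """Hypothetical SCAN stand-in: examines `count` slots starting at `cursor`.
    Returns (next_cursor, matching_keys); next_cursor is 0 when finished."""
    end = min(cursor + count, len(keys))
    batch = [k for k in keys[cursor:end] if fnmatch.fnmatchcase(k, match)]
    return (0 if end >= len(keys) else end), batch

def scan_all(keys, match, count=10, max_count=1000):
    """Follow the cursor until it returns to 0, doubling COUNT after empty batches."""
    found = []
    cursor = 0
    while True:
        cursor, batch = fake_scan(keys, cursor, match, count)
        found.extend(batch)
        if cursor == 0:
            return found
        if not batch:
            # Empty batch, nonzero cursor: keep going, but work harder next call
            count = min(count * 2, max_count)
```

The max_count cap is a deliberate choice: it keeps the sketch from drifting into the very large COUNT values that the answer above warns will block a single-threaded Redis.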
My old code
ScanOptions.scanOptions().match("*" + query + "*").count(10).build();
Working code
ScanOptions.scanOptions().match("*" + query + "*").count(Integer.MAX_VALUE).build();

Efficient Independent Synchronized Blocks?

I have a scenario where, at certain points in my program, a thread needs to update several shared data structures. Each data structure can be safely updated in parallel with any other data structure, but each data structure can only be updated by one thread at a time. The simple, naive way I've expressed this in my code is:
synchronized updateStructure1();
synchronized updateStructure2();
// ...
This seems inefficient because if multiple threads are trying to update structure 1, but no thread is trying to update structure 2, they'll all block waiting for the lock that protects structure 1, while the lock for structure 2 sits untaken.
Is there a "standard" way of remedying this? In other words, is there a standard threading primitive that tries to update all structures in a round-robin fashion, blocks only if all locks are taken, and returns when all structures are updated?
This is a somewhat language agnostic question, but in case it helps, the language I'm using is D.
If your language supported lightweight threads or Actors, you could always have the updating thread spawn a new thread to change each object, where each thread just locks, modifies, and unlocks its object. Then have your updating thread join on all its child threads before returning. This punts the problem to the runtime's scheduler, which is free to schedule those child threads any way it can for best performance.
You could do this in languages with heavier threads, but the spawn and join might have too much overhead (though thread pooling might mitigate some of this).
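In Python, a thread pool can stand in for the lightweight threads: submit one task per structure, then wait on all of them. Structure and update_all are hypothetical names for this sketch. Each structure carries its own lock, so updates to different structures can run in parallel while updates to the same structure are serialized.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Structure:
    """Hypothetical shared data structure guarded by its own lock."""
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.updates = 0

    def update(self):
        with self.lock:        # only one thread may modify this structure at a time
            self.updates += 1

def update_all(structures, pool):
    # One task per structure; the pool schedules them however it likes
    futures = [pool.submit(s.update) for s in structures]
    for f in futures:
        f.result()             # "join": wait for every update (re-raises errors)
```

Reusing one pool across calls is what keeps the spawn/join overhead low compared with creating a fresh thread per structure.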
I don't know if there's a standard way to do this. However, I would implement this something like the following:
bool updatedA = false, updatedB = false;
do
{
    if (!updatedA && mutexA.tryLock())
    {
        scope(exit) mutexA.unlock();
        updateA();
        updatedA = true;
    }
    if (!updatedB && mutexB.tryLock())
    {
        scope(exit) mutexB.unlock();
        updateB();
        updatedB = true;
    }
}
while (!(updatedA && updatedB));
Some clever metaprogramming could probably cut down the repetition, but I leave that as an exercise for you.
Sorry if I'm being naive, but don't you just synchronize on separate objects to make the concerns independent?
e.g.
public final Object lock1 = new Object(); // guards access to resource 1
public final Object lock2 = new Object(); // guards access to resource 2
void updateStructure1() {
    synchronized (lock1) {
        // ... update structure 1 ...
    }
}
void updateStructure2() {
    synchronized (lock2) {
        // ... update structure 2 ...
    }
}
To my knowledge, there is not a standard way to accomplish this, and you'll have to get your hands dirty.
To paraphrase your requirements, you have a set of data structures, and you need to do work on them, but not in any particular order. You only want to block waiting on a data structure if all other objects are blocked. Here's the pseudocode I would base my solution on:
work = unshared list of objects that need updating
while work is not empty:
    found = false
    for each obj in work:
        try locking obj
        if successful:
            remove obj from work
            found = true
            obj.update()
            unlock obj
    if !found:
        // Everything is locked, so we have to wait
        obj = randomly pick an object from work
        remove obj from work
        lock obj
        obj.update()
        unlock obj
An updating thread will only block if it finds that all objects it needs to use are locked. Then it must wait on something, so it just picks one and locks it. Ideally, it would pick the object that will be unlocked earliest, but there's no simple way of telling that.
Also, it's conceivable that an object might become free while the updater is in the try loop and so the updater would skip it. But if the amount of work you're doing is large enough, relative to the cost of iterating through that loop, the false conflict should be rare, and it would only matter in cases of extremely high contention.
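The pseudocode above translates fairly directly to Python, using Lock.acquire(blocking=False) as the try-lock. In this sketch each work item is a hypothetical (lock, update_fn) pair, and instead of removing items from the list while iterating over it, each pass builds a new list of the items that were still locked.

```python
import random

def update_all(work_items):
    """work_items: list of (lock, update_fn) pairs, one per shared structure."""
    pending = list(work_items)           # unshared copy of the work
    while pending:
        still_locked = []
        for lock, update in pending:
            if lock.acquire(blocking=False):   # non-blocking try-lock
                try:
                    update()
                finally:
                    lock.release()
            else:
                still_locked.append((lock, update))
        if len(still_locked) == len(pending):
            # Everything is locked, so we have to wait: block on a random item
            i = random.randrange(len(still_locked))
            lock, update = still_locked.pop(i)
            with lock:
                update()
        pending = still_locked
```

Picking the item to block on at random follows the pseudocode; as the answer notes, picking the one that will be released earliest would be better, but there is no simple way to know which one that is.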
I don't know any "standard" way of doing this, sorry. So below is just a ThreadGroup, wrapped in a Swarm class, that "hacks" at a job list until all jobs are done, round-robin style, keeping as many threads as possible busy. I don't know how to do this without a job list.
Disclaimer: I'm very new to D and to concurrent programming, so the code is rather amateurish. I saw this more as a fun exercise. (I'm also dealing with some concurrency stuff.) I also understand that this isn't quite what you're looking for. If anyone has any pointers, I'd love to hear them!
import core.thread,
       core.sync.mutex,
       core.stdc.stdio,
       std.stdio;
class Swarm {
    ThreadGroup group;
    Mutex mutex;
    auto numThreads = 1;
    void delegate()[int] jobs;
    this(void delegate()[int] aJobs, int aNumThreads) {
        jobs = aJobs;
        numThreads = aNumThreads;
        group = new ThreadGroup;
        mutex = new Mutex();
    }
    // Start the workers and wait until every job has run
    void runBlocking() {
        run();
        group.joinAll();
    }
    // Start numThreads worker threads that share the job list
    void run() {
        foreach (c; 0 .. numThreads)
            group.create(&swarmJobs);
    }
    // Worker loop: take one job at a time until the list is empty
    void swarmJobs() {
        void delegate() myJob;
        do {
            myJob = null;
            synchronized (mutex) {
                if (jobs.length > 0) {
                    // Pick any remaining job; don't modify the AA while iterating it
                    auto key = jobs.keys[0];
                    myJob = jobs[key];
                    jobs.remove(key);
                }
            }
            if (myJob)
                myJob();
        } while (myJob);
    }
}
class Jobs {
    void job1() {
        foreach (c; 0 .. 1000) {
            foreach (j; 0 .. 2_000_000) {} // busy work
            writef("1");
            fflush(core.stdc.stdio.stdout);
        }
    }
    void job2() {
        foreach (c; 0 .. 1000) {
            foreach (j; 0 .. 1_000_000) {} // busy work
            writef("2");
            fflush(core.stdc.stdio.stdout);
        }
    }
}
void main() {
    auto jobs = new Jobs();
    void delegate()[int] jobsList =
        [1: &jobs.job1, 2: &jobs.job2, 3: &jobs.job1, 4: &jobs.job2];
    int numThreads = 2;
    auto swarm = new Swarm(jobsList, numThreads);
    swarm.runBlocking();
    writefln("end");
}
There's no standard solution but rather a class of standard solutions depending on your needs.
http://en.wikipedia.org/wiki/Scheduling_algorithm
