Slow Update/insert into SQL Server CE using LinqToDatasets - linq

I have a mobile app that is using LinqToDatasets to update/insert into a SQL Server CE 3.5 File.
My Code looks like this:
// All the MyClass updates
MyTableAdapter myTableAdapter = new MyTableAdapter();
foreach (MyClassToInsert myClass in updates.MyClassChanges)
{
    // Update the row if it is already there
    int result = myTableAdapter.Update(myClass.FirstColumn,
                                       myClass.SecondColumn,
                                       myClass.FirstColumn);
    // If the row was not there then insert it.
    if (result == 0)
    {
        myTableAdapter.Insert(myClass.FirstColumn, myClass.SecondColumn);
    }
}
This code is used to keep the handheld database in sync with the server database. The problem is that a full update (the first time, for example) involves a lot of updates (about 125), which makes this code, and other loops like it, take a very long time; I have three such loops that each take over 30 seconds.
Is there a faster or better way to do updates/inserts like this?
(I did see this Codeplex Project, but I could not see how to make it work with both updates and inserts.)

For maximum performance and minimal memory usage on mobile devices you should always use SqlCeResultSet for data access. Identify the rows that need to be inserted and handle them with code similar to the SqlCeBulkCopy sample, and handle the updates in a similar way using the Seek and Update methods of the SqlCeResultSet.
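Not part of the original answer, but here is a minimal sketch of what that Seek/Update-or-Insert pattern could look like with a TableDirect SqlCeResultSet. The table name MyTable, index name PK_MyTable, connection string and column ordinals are illustrative assumptions, not taken from the question.

// Minimal sketch (illustrative only): open the table directly and upsert each row
// with Seek/Update or Insert. Requires System.Data and System.Data.SqlServerCe.
using (SqlCeConnection conn = new SqlCeConnection(@"Data Source=\MyApp\MyDb.sdf"))
{
    conn.Open();
    using (SqlCeCommand cmd = conn.CreateCommand())
    {
        cmd.CommandType = CommandType.TableDirect;
        cmd.CommandText = "MyTable";
        cmd.IndexName = "PK_MyTable";   // index over FirstColumn (assumed name)

        using (SqlCeResultSet rs = cmd.ExecuteResultSet(
            ResultSetOptions.Updatable | ResultSetOptions.Scrollable))
        {
            foreach (MyClassToInsert myClass in updates.MyClassChanges)
            {
                // Seek positions the cursor on the first matching index key; Read loads it.
                if (rs.Seek(DbSeekOptions.FirstEqual, myClass.FirstColumn) && rs.Read())
                {
                    rs.SetValue(1, myClass.SecondColumn);   // update the existing row
                    rs.Update();
                }
                else
                {
                    SqlCeUpdatableRecord rec = rs.CreateRecord();   // insert a new row
                    rec.SetValue(0, myClass.FirstColumn);
                    rec.SetValue(1, myClass.SecondColumn);
                    rs.Insert(rec);
                }
            }
        }
    }
}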

Related

How can write performance be improved for RecordWriter?

Can anyone help me find the correct API to improve write performance?
We use the MultipleOutputs<ImmutableBytesWritable, Result> class to write data we read from a table, and we use the newly created file as a backup. We face a performance issue when writing through MultipleOutputs: it takes nearly 5 seconds for every 10,000 records we write.
This is the code we use:
Result[] results = ...; // results from another table
MultipleOutputs<ImmutableBytesWritable, Result> mos =
        new MultipleOutputs<ImmutableBytesWritable, Result>(context); // 'context' is the mapper/reducer context
for (Result res : results) {
    mos.write(new ImmutableBytesWritable(res.getRow()), res, baseoutputpath);
}
We get a batch of 10,000 rows and write them in a loop, with baseoutputpath changing depending on the Result content.
We are facing a performance dip when writing into MultipleOutputs, and we suspect it might be due to writing in a loop.
Is there any other API in MapR-DB or HBase that pushes data to the database using fewer RPC calls by buffering up to a certain limit?
We write data as records, so no file-system write class would work for us.
Please note that we use a MapReduce job to do all of the above.

Managing shared resources with threads in Huey

I have to update many rows (increment one value in each row) in a peewee database (SqliteDatabase). Some objects may not exist yet, so I have to create them with default values before working with them. I would use the approach from the peewee docs (Atomic updates), but I couldn't figure out how to combine model.get_or_create() with an IN [my_array] filter.
So I decided to run the queries in a transaction and commit once at the end (I hope it works that way).
What brings me to Stack Overflow is that I don't know how to make db.atomic() work with threading in Huey (I tested with 4 workers), because .atomic() locks the connection (peewee.OperationalError: database is locked). I've tried @huey.lock_task, but from what I've found it doesn't solve my problem.
Code of my class:
class Article(Model):
    name = CharField()
    mention_number = IntegerField(default=0)

    class Meta:
        database = db
Code of my task:
@huey.task(priority=30)
def update(names):  # "names" is a list of strings
    with db.atomic():
        for name in names:
            article, success = Article.get_or_create(name=name)
            article.mention_number += 1
            article.save()
Well, if you're using a recent version of Sqlite (3.24 or newer) you can use Postgres-style upsert queries. This is well supported by Peewee: http://docs.peewee-orm.com/en/latest/peewee/api.html#Insert.on_conflict
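As a rough sketch (not from the original answer), the per-name get_or_create/save pair in the task could become a single upsert per name. One assumption here: Article.name would need a UNIQUE constraint (e.g. CharField(unique=True)) so it can serve as the conflict target.

# Sketch only: upsert with ON CONFLICT, assuming Article.name is declared unique.
from peewee import SqliteDatabase, Model, CharField, IntegerField

db = SqliteDatabase('articles.db')

class Article(Model):
    name = CharField(unique=True)            # unique constraint assumed for the conflict target
    mention_number = IntegerField(default=0)

    class Meta:
        database = db

def bump_mentions(names):
    with db.atomic():
        for name in names:
            # INSERT ... ON CONFLICT (name) DO UPDATE SET mention_number = mention_number + 1
            (Article
             .insert(name=name, mention_number=1)
             .on_conflict(conflict_target=[Article.name],
                          update={Article.mention_number: Article.mention_number + 1})
             .execute())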
To answer the other question about shared resources, it's not clear from your example what you would like to happen... Sqlite only allows one write transaction at a time. So if you are running several threads, only one of them may be writing at any given time.
Peewee stores database connections in a thread local, so Peewee databases can be safely used in multithreaded applications.
You didn't mention why huey lock_task wouldn't work.
Another suggestion is to try using WAL-mode with Sqlite, as WAL-mode allows multiple reader transactions to co-exist with a single writer.
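For example (a sketch, with an illustrative filename and timeout value), WAL mode and a busy timeout can be set via pragmas when the database is constructed:

# Sketch: enable WAL mode so readers don't block the single writer,
# and wait briefly instead of raising "database is locked".
from peewee import SqliteDatabase

db = SqliteDatabase('articles.db', pragmas={
    'journal_mode': 'wal',
    'busy_timeout': 5000,   # milliseconds
})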

How do we improve a MongoDB MapReduce function that takes too long to retrieve data and gives out of memory errors?

Retrieving data from MongoDB takes too long, even for small datasets. For bigger datasets we get out-of-memory errors from the JavaScript engine. We've tried several schema designs and several ways to retrieve data. How do we optimize MongoDB, the mapReduce function, or MongoWire to retrieve more data more quickly?
We're not very experienced with MongoDB yet and are therefore not sure whether we're missing optimization steps or if we're just using the wrong tools.
1. Background
For graphing and playback purposes we want to store changes for several objects over time. Currently we have tens of objects per project, but we expect to need to store thousands of objects. The objects may change every second or not change for long periods of time. A Delphi backend writes to and reads from MongoDB through MongoWire and SuperObjects, and the data is displayed in a web frontend.
2. Schema design
We're storing the object changes in minute-second-millisecond objects, in one record per hour. The schema design is like the one described here. Sample:
{
    o: object1,
    dt: $date,
    v: {0: {0: {0: {speed: 8, rate: 0.8}}}, 1: {0: {0: {speed: 9}}}, …}
}
We've put indexes on {dt: -1, o: 1} and {o:1}.
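For reference, those indexes correspond to mongo shell commands like the following (the collection name is a placeholder):

db.collection.createIndex({dt: -1, o: 1})
db.collection.createIndex({o: 1})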
3. Retrieving data
We use a mapReduce to construct a new date based on the minute-second-millisecond objects and to put the object back in v:
{
    o: object1,
    dt: $date,
    v: {speed: 8, rate: 0.8}
}
An average document is about 525 kB before the mapReduce function and has had ~29000 updates. After mapReduce of such a document, the result is about 746 kB.
3.1 Retrieving data through the mongo shell with mapReduce
We're using the following map function:
function mapF() {
    for (var i = 0; i < 3600; i++) {
        var imin = Math.floor(i / 60);
        var isec = (i % 60);
        var min = '' + imin;
        var sec = '' + isec;
        if (this.v.hasOwnProperty(min) && this.v[min].hasOwnProperty(sec)) {
            for (var ms in this.v[min][sec]) {
                if (imin !== 0 && isec !== 0 && ms !== '0' && this.v[min][sec].hasOwnProperty(ms)) { // is our keyframe
                    var currentV = this.v[min][sec][ms];
                    // newT is a new date computed from the min, sec and ms above
                    if (toDate > newT && newT > fromDate) {
                        if (fields && fields.length > 0) {
                            for (var p = 0, length = fields.length; p < length; p++) {
                                // check if the field is present and put it in newV
                            }
                            if (newV) {
                                emit(this.o, {vs: [{o: this.o, dt: newT, v: newV}]});
                            }
                        } else {
                            emit(this.o, {vs: [{o: this.o, dt: newT, v: currentV}]});
                        }
                    }
                }
            }
        }
    }
}
The reduce function basically just passes the data on. The call to mapReduce:
db.collection.mapReduce(mapF, reduceF, {
    out: {inline: 1},
    query: {o: {$in: objectNames}, dt: {$gte: keyframeFromDate, $lt: keyframeToDate}},
    sort: {dt: 1},
    scope: {toDate: toDateWithinKeyframe, fromDate: fromDateWithinKeyframe, fields: []},
    jsMode: true
});
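The reduceF itself isn't shown in the question; a pass-through reduce consistent with the emit format above might look something like this (a sketch, not the original code):

// Sketch: merge the emitted {vs: [...]} values by concatenating their arrays.
function reduceF(key, values) {
    var merged = {vs: []};
    values.forEach(function (value) {
        merged.vs = merged.vs.concat(value.vs);
    });
    return merged;
}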
Retrieving 2 objects over 1 hour: 2.4 seconds.
Retrieving 2 objects over 5 hours: 8.3 seconds.
For this method we would have to write js and bat files at runtime and read the JSON data back in. We have not measured times for this yet because, frankly, we don't like the idea very much.
Another problem with this method is that we get out-of-memory errors from the V8 JavaScript engine when we try to retrieve data for longer periods and/or more objects. Using a PC with more RAM works to some extent in preventing the out-of-memory errors, but it doesn't make retrieving data faster.
This article mentions splitVector, which we might use to divide the workload. But we're not sure how to use the keyPattern and maxChunkSizeBytes options. Can we use a keyPattern for both o and dt?
We might use multiple collections, but our dataset isn't that big to start with at the moment, so we're worried about how many collections we'd need.
3.2 Retrieving data through MongoWire with mapReduce
For retrieving data through MongoWire with mapReduce, we use the same mapReduce functions as above. We use the following Delphi code to start the query:
FMongoWire.Get('$cmd', BSON([
    'mapreduce', 'collection',
    'map', bsonJavaScriptCodePrefix + FMapVCRFunction.Text,
    'reduce', bsonJavaScriptCodePrefix + FReduceVCRFunction.Text,
    'out', BSON(['inline', 1]),
    'query', mapquery,
    'sort', BSON(['dt', -1]),
    'scope', scope
]));
Retrieving data with this method is about 3-4 times (!) slower. And then the data has to be translated from BSON (IBSONDocument) to JSON (SuperObject), which is a major time-consuming part of this method. For retrieving raw data we use TMongoWireQuery, which translates the BSON document in parts, while this mapReduce call uses TMongoWire directly and tries to translate the complete result. This might explain why it takes so long, while normally it's quite fast. If we can reduce the time it takes for the mapReduce to return results, this might be a next step for us to focus on.
3.3 Retrieving raw data and parsing in Delphi
Retrieving raw data into Delphi takes a bit longer than the previous method, but, probably because of the use of TMongoWireQuery, the translation from BSON to JSON is much quicker.
4. Questions
Can we do further optimizations on our schema design?
How can we make the mapReduce function faster?
How can we prevent the out-of-memory errors of the V8 engine? Can someone give more information on the splitVector function?
How can we best use mapReduce from Delphi? Can we use MongoWireQuery instead of MongoWire?
5. Specs
MongoDB 3.0.3
MongoWire from 2015 (recently updated)
Delphi 2010 (got XE5 as well)
4GB RAM (tried on 8GB RAM as well, less out of memory, but reading times are about the same)
Phew, what a question! First up: I'm not an expert at MongoDB. I wrote TMongoWire as a way to get to know MongoDB a little. Also I really (really) dislike it when wrappers have a plethora of overloads to do the same thing for all kinds of specific types. A long time ago programmers didn't have generics, but we did have Variant. So I built a MongoDB wrapper (and IBSONDocument) based around variants. That said, I apparently made something people like to use, and by keeping it simple it performs quite well. (I haven't been putting much time into it lately, but at the top of the list is catering for the new authentication schemes since version 3.)
Now, about your specific setup. You say you use mapreduce to get from 500KB to 700KB? I think there's a hint there you're using the wrong tool for the job. I'm not sure what the default mongo shell does differently than when you do the same over TMongoWire.Get, but if I assume mapReduce assembles the response first before sending it over the wire, that's where the performance gets lost.
So here's my advice: you're right with thinking about using TMongoWireQuery. It offers a way to process data faster as the server will be streaming it in, but there's more.
I strongly suggest to use an array to store the list of seconds. Even if not all seconds have data, store null on the seconds without data so each minute array has 60 items. This is why:
One nicety that turned up in designing TMongoWireQuery is the assumption that you'll be processing a single (BSON) document at a time, and that the contents of the documents will be roughly similar, at least in the value names. So by using the same IBSONDocument instance when enumerating the response, you actually save a lot of time by not having to de-allocate and re-allocate all those variants.
That goes for simple documents, but would actually be nice to have on arrays as well. That's why I created IBSONDocumentEnumerator. You need to pre-load an IBSONDocument instance with an IBSONDocumentEnumerator in the place where you're expecting the array of documents, and you need to process the array in roughly the same way as with TMongoWireQuery: enumerate it using the same IBSONDocument instance, so when subsequent documents have the same keys, time is saved not having to re-allocate them.
In your case, though, you would still need to pull an entire hour of data over the wire just to select the seconds you need. As I said before, I'm not a MongoDB expert, but I suspect there could be a better way to store data like this. Either with a separate document per second (I guess this would let the indexes do more of the work, and MongoDB can take that insert rate), or with a specific query construction so that MongoDB knows to shorten the seconds array to just the data you're requesting (is that what $slice does?).
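For what it's worth, if the seconds were stored as fixed 60-element arrays under each minute key, a $slice projection could make the server trim the array before it crosses the wire. A rough mongo shell sketch (not the author's code; field names and values are illustrative):

// Return only seconds 30-39 of minute "12" for one object/hour document.
db.collection.find(
    {o: "object1", dt: keyframeDate},
    {o: 1, dt: 1, "v.12": {$slice: [30, 10]}}
)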
Here's an example of how to use IBSONDocumentEnumerator on documents like {name:"fruit",items:[{name:"apple"},{name:"pear"}]}
q := TMongoWireQuery.Create(db);
try
  q.Query('test', BSON([]));
  e := BSONEnum;
  d := BSON(['items', e]);
  d1 := BSON;
  while q.Next(d) do
  begin
    i := 0;
    while e.Next(d1) do
    begin
      Memo1.Lines.Add(d['name'] + '#' + IntToStr(i) + d1['name']);
      inc(i);
    end;
  end;
finally
  q.Free;
end;

Zend Framework Cache

I'm trying to make an AJAX autocomplete search box that of course uses SQL (minimum 3 characters), and I have a SQL view of the relevant fields already set up and indexed in the db. The CPU still spikes when searching, which I expected, as it's running a query for every character. I want to use the Zend shm cache to speed up results and reduce CPU usage. The results are stored in an array which is to be cached like this:
while ($row = db2_fetch_row($stmt)) {
    $fSearch[trim($row[0]) . trim($row[1])] = array(/* array built here */);
}
if (zend_shm_cache_store('fSearch', $fSearch, 10 * 60) === false) {
    error_log('Failed to store search cache!');
}
Of course there's actual data inside the array instead of comments; I just shortened the code for simplicity. Rows 0 & 1 form the PK, and this has been tested to work properly. It's the zend_shm_cache_store call that fails, because the error log gets flooded with 'Failed to store search cache!'. I read that zend_shm_cache_store can store any array that can be serialized, so how can I tell if my data is serialized or can be serialized? Are there any other potential causes? I did make a test page that only stored a string, and that was successful, so I know caching is on.
Solved: the cache size was too small for the array - I increased the cache size and it worked fine. Sorry for the trouble.

Extremely slow WordPress user import on XAMPP

I'm posting this question here because I'm not sure it's a WordPress issue.
I'm running XAMPP on my local system, with 512MB max headroom and a 2.5-hour php timeout. I'm importing about 11,000 records into the WordPress wp_user and wp_usermeta tables via a custom script. The only unknown quantity (performance-wise) on the WordPress end is the wp_insert_user and update_user_meta calls. Otherwise it's a straight CSV import.
The process to import 11,000 users and create 180,000 usermeta entries took over 2 hours to complete. It was importing about 120 records a minute. That seems awfully slow.
Are there known performance issues importing user data into WordPress? A quick Google search was unproductive (for me).
Are there settings I should be tweaking beyond the timeout in XAMPP? Is its MySQL implementation notoriously slow?
I've read something about virus software dramatically slowing down XAMPP. Is this a myth?
Yes, there are a few issues with local vs. hosted. One of the important things to remember is the max_execution_time for the PHP script. You may need to reset the timer once in a while during the data upload.
I suppose you have some loop which takes the data row by row from a CSV file, for example, and uses an SQL query to insert it into the WP database. I usually put this simple snippet into my loop so it keeps the PHP max_execution_time reset:
$counter = 1;
if (($handle = fopen("some-file.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        // your upload/insert query goes here, e.g. mysql_query(...)

        // snippet: every 20 loops, reset the execution timer and the counter
        if ($counter == 20) {
            set_time_limit(0);
            $counter = 0;
        }
        $counter = $counter + 1;
    } // end of the loop
    fclose($handle);
}
Also, BTW, 512MB of room is not much if the database is big. Count how many resources your OS and all running apps are taking. I have an over-2GB WP database and my MySQL needs a lot of RAM to run fast (it depends on the query you are using as well).
