Odata filter on cached data - caching

Say for instance we want to pull back a list of 50 individuals. When using odata with this filter, $top=2&$skip=4, it is returning those 2 records, but what we are wanting to do is possibly sending back 50 individuals, and be able to run the filter against those.
When I debug my program, I run it to get the 50, and every subsequent call brings me the 50 back from cache. When I run the same thing, but add the following $top=2&$skip=4, it runs through my code and gets all the records and returns the two objects from the code, not the cache.
GetIndividuals(ODataQueryOptions<Individuals> opts)
{
.....
var results = opts.ApplyTo(objAll.AsQueryable(), settings);
return new PageResult<Individual>(
results as IEnumerable<Individual>,
Request.ODataProperties().NextLink,
objAll.AsQueryable().Count());
}
I hope this is clear.....any ideas on how to return a larger group of data and then run odata on it after the fact?

Related

Redis pipeline, dealing with cache misses

I'm trying to figure out the best way to implement Redis pipelining. We use redis as a cache on top of MySQL to store user data, product listings, etc.
I'm using this as a starting point: https://joshtronic.com/2014/06/08/how-to-pipeline-with-phpredis/
My question is, assuming you have an array of ids properly sorted. You loop through the redis pipeline like this:
$redis = new Redis();
// Opens up the pipeline
$pipe = $redis->multi(Redis::PIPELINE);
// Loops through the data and performs actions
foreach ($users as $user_id => $username)
{
// Increment the number of times the user record has been accessed
$pipe->incr('accessed:' . $user_id);
// Pulls the user record
$pipe->get('user:' . $user_id);
}
// Executes all of the commands in one shot
$users = $pipe->exec();
What happens when $pipe->get('user:' . $user_id); is not available, because it hasn't been requested before or has been evicted by Redis, etc? Assuming it's result # 13 from 50, how do we a) find out that we weren't able to retrieve that object and b) keep the array of users properly sorted?
Thank you
I will answer the question referring to Redis protocol. How it works in particular language is more or less the same in that case.
First of all, let's check how Redis pipeline works:
It is just a way to send multiple commands to server, execute them and get multiple replies. There is nothing special, you just get an array with replies for each command in the pipeline.
Why pipelines are much faster is because roundtrip time for each command is saved, i.e. for 100 commands there is only one round-trip time instead of 100. In addition, Redis executes every command synchronously. Executing 100 commands needs potentially fighting 100 times, for Redis to pick that singular command, pipeline is treated as one long command, thus requiring only once to wait being picked synchronously.
You can read more about pipelining here: https://redis.io/topics/pipelining. One more note, because each pipelined batch runs uninterruptible (in terms of Redis) it makes sense to send these commands in overviewable chunks, i.e. don't send 100k commands in a single pipeline, that might block Redis for a long period of time, split them into chunks of 1k or 10k commands.
In your case you run in the loop the following fragment:
// Increment the number of times the user record has been accessed
$pipe->incr('accessed:' . $user_id);
// Pulls the user record
$pipe->get('user:' . $user_id);
The question is what is put into pipeline? Let's say you'd update data for u1, u2, u3, u4 as user ids. Thus the pipeline with Redis commands will look like:
INCR accessed:u1
GET user:u1
INCR accessed:u2
GET user:u2
INCR accessed:u3
GET user:u3
INCR accessed:u4
GET user:u4
Let's say:
u1 was accessed 100 times before,
u2 was accessed 5 times before,
u3 was not accessed before and
u4 and accompanying data does not exist.
The result will be in that case an array of Redis replies having:
101
u1 string data stored at user:u1
6
u2 string data stored at user:u2
1
u3 string data stored at user:u3
1
NIL
As you can see, Redis will treat missing INCR values as being 0 and execute incr(0). Finally, there is nothing being sorted by Redis and the results will come in the oder as requested.
The language binding, e.g. Redis driver, will just parse for you that protocol and give the view to parsed data. Without keeping the oder of commands it'll be impossible for Redis driver to work correctly and for you as programmer to deduce smth. Just keep in mind, that request is not duplicated in the reply i.e. you will not receive key for u1 or u2 when doing GET, but just the data for that key. Thus your implementation must remember that on position 1 (zero based index) comes the result of GET for u1.

Unable to get results more than 100 results on google custom search api

I need to use Google Custom Search API https://developers.google.com/custom-search/v1/overview. From that page, it said:
For CSE users, the API provides 100 search queries per day for free.
If you need more, you may sign up for billing in the Developers
Console. Additional requests cost $5 per 1000 queries, up to 10k
queries per day.
I already sign up for billing inside the developer console. However, I still could not retrieve results more than 100. What things should I do more? https://www.googleapis.com/customsearch/v1?cx=CSE_INSTANCE&key=API_KEY&q=QUERY&start=100
{ error: { errors: [ { domain: "global", reason: "invalid", message:
"Invalid Value" } ], code: 400, message: "Invalid Value" } }
Query: Definition
https://support.google.com/customsearch/answer/1361951
Any actual user query from a Google Site Search engine, including but
not limited to search engines installed on your website using XML,
iFrame, or the Custom Search Element.
That means you would probably need to send eleven queries to get more than 100 results.
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=1
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=11
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=21
GET ...
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=81
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=91
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=101
Check every response and if error code is 400, you can stop - there is probably no need to send next (&start=previous+10) request.
Now you can merge responses and start building results page.
Google Custom Search and Google Site Search return up to 10 results
per query. If you want to display more than 10 results to the user,
you can issue multiple requests (using the start=0, start=11 ...
parameters) and display the results on a single page. In this case,
Google will consider each request as a separate query, and if you are
using Google Site Search, each query will count towards your limit.
There might be a better way to do this then I described above. (But, I'm not sure about batching API calls.)
And (finally) possible answer to your question: I made more than few tests, but I haven't had any luck with start greater than 100 (I was getting the same as you - <Response [400]>). I'm using "Browser key" from my billing-enabled project. That could mean we can't get 101st, 102nd, 103rd, etc. results with CSE API.
The API documentation says it never returns more than 100 items.
https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list
start
integer (uint32 format)
The index of the first result to return. The default number of results
per page is 10, so &start=11 would start at the top of the second page
of results. Note: The JSON API will never return more than 100
results, even if more than 100 documents match the query, so setting
the sum of start + num to a number greater than 100 will produce an
error. Also note that the maximum value for num is 10.

How do we improve a MongoDB MapReduce function that takes too long to retrieve data and gives out of memory errors?

Retrieving data from mongo takes too long, even for small datasets. For bigger datasets we get out of memory errors of the javascript engine. We've tried several schema designs and several ways to retrieve data. How do we optimize mongoDB/mapReduce function/MongoWire to retrieve more data quicker?
We're not very experienced with MongoDB yet and are therefore not sure whether we're missing optimization steps or if we're just using the wrong tools.
1. Background
For graphing and playback purposes we want to store changes for several objects over time. Currently we have tens of objects per project, but expectations are we need to store thousands of objects. The objects may change every second or not change for long periods of time. A Delphi backend writes to and reads from MongoDB through MongoWire and SuperObjects, the data is displayed in a web frontend.
2. Schema design
We're storing the object changes in minute-second-millisecond objects in a record per hour. The schema design is like described here. Sample:
o: object1,
dt: $date,
v: {0: {0:{0: {speed: 8, rate: 0.8}}}, 1: {0:{0: {speed: 9}}}, …}
We've put indexes on {dt: -1, o: 1} and {o:1}.
3. Retrieving data
We use a mapReduce to construct a new date based on the minute-second-millisecond objects and to put the object back in v:
o: object1,
dt: $date,
v: {speed: 8, rate:0.8}
An average document is about 525 kB before the mapReduce function and has had ~29000 updates. After mapReduce of such a document, the result is about 746 kB.
3.1 Retrieving data from through mongo shell with mapReduce
We're using the following map function:
function mapF(){
for (var i = 0; i < 3600; i++){
var imin = Math.floor(i / 60);
var isec = (i % 60);
var min = ''+imin;
var sec = ''+isec;
if (this.v.hasOwnProperty(min) && this.v[min].hasOwnProperty(sec)) {
for (var ms in this.v[min][sec]) {
if (imin !== 0 && isec !== 0 && ms !== '0' && this.v[min][sec].hasOwnProperty(ms)) {// is our keyframe
var currentV = this.v[min][sec][ms];
//newT is new date computed by the min, sec, ms above
if (toDate > newT && newT > fromDate) {
if (fields && fields.length > 0) {
for (var p = 0, length = fields.length; p < length; p++){
//check if field is present and put it in newV
}
if (newV) {
emit(this.o, {vs: [{o: this.o, dt: newT, v: newV}]});
}
} else {
emit(this.o, {vs: [{o: this.o, dt: newT, v: currentV}]});
}
}
}
}
}
}
};
The reduce function basically just passes the data on. The call to mapReduce:
db.collection.mapReduce( mapF,reduceF,
{out: {inline: 1},
query: {o: {$in: objectNames]}, dt: {$gte: keyframeFromDate, $lt: keyframeToDate}},
sort: {dt: 1},
scope: {toDate: toDateWithinKeyframe, fromDate: fromDateWithinKeyframe, fields: []},
jsMode: true});
Retrieving 2 objects over 1 hour: 2,4 seconds.
Retrieving 2 objects over 5 hour: 8,3 seconds.
For this method we would have to write js and bat files runtime and read the json data back in. We have not measured times fort his yet, because frankly, we don’t like the idea very much.
Another problem with this method is that we get out of memory errors of the v8 javascript engine when we try to retrieve data for longer periods and/or more objects. Using a pc with more RAM works to some extend in preventing out of memory, but it doesn't make retrieving data faster.
This article mentions splitVector, which we might use to devide the workload. But we're not sure on how to use the keyPattern and maxChunkSizeBytes options. Can we use a keyPattern for both o and dt?
We might use multiple collections, but our dataset isn’t that big to start with at the moment, so we’re worried about how much collections we’d need.
3.2 Retrieving data through mongoWire with mapReduce
For retrieving data through mongoWire with mapReduce, we use the same mapReduce functions as above. We use the following Delphi code to start te query:
FMongoWire.Get('$cmd',BSON([
'mapreduce', ‘collection’,
'map', bsonJavaScriptCodePrefix + FMapVCRFunction.Text,
'reduce', bsonJavaScriptCodePrefix + FReduceVCRFunction.Text,
'out', BSON(['inline', 1]),
'query', mapquery,
'sort', BSON(['dt', -1]),
'scope', scope
]));
Retrieving data with this method is about 3-4 times (!) slower. And then the data has to be translated from BSON (IBSONDocument to JSON (SuperObject), which is a major time consuming part in this method. For retrieving raw data we use TMongoWireQuery which translates the BSONdocument in parts, while this mapReduce function uses TMongoWire directly and tries to translate the complete result. This might explain why this takes so long, while normally it's quite fast. If we can reduce the time it takes for the mapReduce to return results, this might be a next step for us to focus on.
3.3 Retrieving raw data and parsing in Delphi
Retrieving raw data to Delphi takes a bit longer then the previous method, but probably because of the use of TMongoWireQuery, the translation from BSON to JSON is much quicker.
4. Questions
Can we do further optimizations on our schema design?
How can we make the mapReduce function faster?
How can we prevent the out of
memory errors of the v8 engine? Can someone give more information on
the splitVector function?
How can we best use of mapReduce from Delphi? Can we use
MongoWireQuery in stead of MongoWire?
5. Specs
MongoDB 3.0.3
MongoWire from 2015 (recently updated)
Delphi 2010 (got XE5 as well)
4GB RAM (tried on 8GB RAM as well, less out of memory, but reading times are about the same)
Phew what a question! First up: I'm not an expert at MongoDB. I wrote TMongoWire as a way to get to know MongoDB a little. Also I really (really) dislike when wrappers have a plethora of overloads to do the same thing but for all kinds of specific types. A long time ago programmers didn't have generics, but we did have Variant. So I built a MongoDB wrapper (and IBSONDocument) based around variants. That said, I apparently made something people like to use, and by keeping it simple performs quite well. (I haven't been putting much time in it lately, but on the top of the list is catering for the new authentication schemes since version 3.)
Now, about your specific setup. You say you use mapreduce to get from 500KB to 700KB? I think there's a hint there you're using the wrong tool for the job. I'm not sure what the default mongo shell does differently than when you do the same over TMongoWire.Get, but if I assume mapReduce assembles the response first before sending it over the wire, that's where the performance gets lost.
So here's my advice: you're right with thinking about using TMongoWireQuery. It offers a way to process data faster as the server will be streaming it in, but there's more.
I strongly suggest to use an array to store the list of seconds. Even if not all seconds have data, store null on the seconds without data so each minute array has 60 items. This is why:
One nicety that turned up in designing TMongoWireQuery, is the assumption you'll be processing a single (BSON) document at a time, and that the contents of the documents will be roughly similar, at least in the value names. So by using the same IBSONDocument instance when enumerating the response, you actually save a lot of time by not having to de-allocate and re-allocate all those variants.
That goes for simple documents, but would actually be nice to have on arrays as well. That's why I created IBSONDocumentEnumerator. You need to pre-load an IBSONDocument instance with an IBSONDocumentEnumerator in the place where you're expecting the array of documents, and you need to process the array in roughly the same way as with TMongoWireQuery: enumerate it using the same IBSONDocument instance, so when subsequent documents have the same keys, time is saved not having to re-allocate them.
In your case though, you would still need to pull the data of an entire hour through the wire just to select the seconds you need. As I said before, I'm not a MongoDB expert, but I suspect there could be a better way to store data like this. Either with a separate document per second (I guess this would let the indexes do more of the work, and MongoDB can take that insert-rate), or with a specific query construction so that MongoDB knows to shorten the seconds array into just that data you're requesting (is that what $splice does?)
Here's an example of how to use IBSONDocumentEnumerator on documents like {name:"fruit",items:[{name:"apple"},{name:"pear"}]}
q:=TMongoWireQuery.Create(db);
try
q.Query('test',BSON([]));
e:=BSONEnum;
d:=BSON(['items',e]);
d1:=BSON;
while q.Next(d) do
begin
i:=0;
while e.Next(d1) do
begin
Memo1.Lines.Add(d['name']+'#'+IntToStr(i)+d1['name']);
inc(i);
end;
end;
finally
q.Free;
end;

Zend Framework Cache

I'm trying to make an ajax autocomplete search box that of course uses SQL, min 3 characters, and have a SQL view of relevant fields already set up and indexed in the db. The CPU still spikes when searching, which I expected as it's running a query for every character. I want to use Zend shm cache to speed up results and reduce CPU usage. The results are stored in an array which is to be cached like this:
while($row = db2_fetch_row($stmt)) {
$fSearch[trim($row[0]).trim($row[1])] = array(/*array built here*/);
}
if (zend_shm_cache_store('fSearch', $fSearch, 10 * 60) === false) {
error_log('Failed to store search cache!');
}
Of course there's actual data inside the array instead of comments, I just shortened the code for simplicity. Rows 0&1 form the PK, and this has tested to be working properly. It's the zend_shm_cache_store that fails because the error log gets flooded with 'Failed to store search cache!'. I read that zend_shm_cache_store can store any array that can be serialized - how can I tell if my data is serialized or can be serialized? Are there any other potential causes? I did make a test page that only stored a string and that was successful, so I know caching is on.
Solved: cache size was too small for array - increased cache size and it worked fine. Sorry for the trouble.

MongoDB Geospatial Load More Between HTTP Requests

AcaniUsers loads the first 20 users in MongoDB (on Heroku via Sinatra) closest to me from my iPhone. I want to add a Load More button that will load the next 20 users closest to me. Keep in mind, my location and the locations of the users on my phone may have changed. I was thinking of switching from Sinatra to Node.js and opening a WebSocket, so I could have realtime updates of the presences & locations of the users on my phone, but think I should save that challenge for a next iteration. Basically, how should I implement the load more functionality?
To paginate queries in MongoDB you can use a combination of limit() and skip().
So, the first query will be:
your_query.limit(20)
Then if you want to load the second 20 (you will have to remember the first query somewhere):
your_query.skip(20).limit(20)
btw I suggest you to execute in the first place the query with a limit higher than 20 and put in the cache the result you don't display. When requested, just get them from the cache (you can store it in the user session). If the position change, restart from scratch and re-query the db invalidating the cache.
think of it more as a client side question: use subscriptions based on the current group - encode the group into a geo-square if possible (more efficient than circle, I think?) - periodically (t) executes an operation that checks the locations of each user and simply sends them out with a group id to match the subscriptions
actually...to build your subscription groups, just use the geonear command on all of your subscribers
- build a hash of your subscribers and their groups
- each subscriber is subscribed to one group and themselves (for targeted communication => indicate that a specific subscriber should change their subscription)
- iterate through the results i number of times where i is the number of individuals in an update group
- execute an action that checks the current value of j, the group number for a specific subscriber, against the new j value - if there is a change, notify the subscriber on the subsriber's private channel
- notifications synchronously follow subscriber adjustments
something like:
var pageSize;
// assign pageSize in method call
var documents = collection.Find(query);
var max = documents.Size();
for (int i = 0; i == max ; i++)
{
var level = i*pageSize;
if (max / level > 1)
{
documents.Skip(pageSize);
}
else
{
documents.Skip(pageSize).Limit(level);
break;
}
}
:)

Resources