DynamoDB Query Returns Incomplete Data - aws-lambda

Why does my query in Lambda not return everything between the two Timestamps? It consistently returns the same, incomplete data.
In DynamoDB Item Explorer, I can query recent Timestamps and find the appropriate items by Device.
When I query in my Lambda, much of that same data is missing.
var params = {
Statement : `SELECT * FROM temps WHERE "Timestamp" >= ${fromParam} AND "Timestamp" <= ${toParam}`,
}
dynamodb.executeStatement(params, function(err, data) { ...
My DynamoDB table looks like this:

DDB will only read (note read, not the same as return) 1MB of data at a time.
If you are doing any filtering, then the returned data will be less than the 1MB read.
If there's more data to read, DDB will include LastEvaluatedKey in its response. You'll need to call Query() again, passing the returned LastEvaluatedKey as ExclusiveStartKey.
Thus, unless you can guarantee you'll never have more than 1MB of data to read, you'll want to call Query() in a loop until you get back all the data.
EDIT
Yes, if NextToken is returned, you'll need to pass it back in the next call.
I've never used ExecuteStatement, but it appears you're doing a full table scan rather than a query. You need to include a where device = condition for it to use a query behind the scenes.
If you really need records for all devices, consider adding a GSI with a single value as the partition key and the timestamp as the sort key. Then select from the index with FROM "temps"."mytsidx", including a condition on that partition key.
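The paging loop this answer describes can be sketched roughly as follows. This is a minimal illustration, not the SDK's own helper: `collectAllItems`, `fakeFetchPage` and `fakePages` are hypothetical names, and the fake pages stand in for DynamoDB so the loop can run without AWS. It assumes each page looks like `{ Items, NextToken }`, which is the shape ExecuteStatement responses use; for Query() the same loop would pass LastEvaluatedKey back as ExclusiveStartKey instead.

```javascript
// Collect every page from a paginated call.
// fetchPage(token) must resolve to { Items, NextToken };
// NextToken is absent on the last page.
async function collectAllItems(fetchPage) {
  const items = [];
  let token; // undefined on the first call
  do {
    const page = await fetchPage(token);
    items.push(...(page.Items || []));
    token = page.NextToken;
  } while (token);
  return items;
}

// Hypothetical stand-in for DynamoDB: three pages of fake rows keyed by token.
const fakePages = {
  start: { Items: [{ id: 1 }, { id: 2 }], NextToken: "p2" },
  p2: { Items: [], NextToken: "p3" }, // a page can be empty and still carry a token
  p3: { Items: [{ id: 3 }] },
};
const fakeFetchPage = async (token) => fakePages[token || "start"];
```

With the AWS SDK for JavaScript v2, the real `fetchPage` would presumably be something like `(token) => dynamodb.executeStatement({ Statement: statement, NextToken: token }).promise()`, where `statement` is the PartiQL string from the question. Note that the loop stops on a missing token, not on an empty page, because a filtered page can come back empty while more data remains.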

Related

GraphQL: how to build a diff query

I'm new to GraphQL. I have a server API that only returns changes done to a list of objects since a certain timestamp given by the client. This is for performance reasons - so that polling will return a smaller result after the initial query. So the first query result would be something like:
{
data: [fullobj1, fullobj2, fullobj3, ...],
timestamp
}
and a subsequent query, sent with the previous query's timestamp, would return only the objects that changed, and only the fields that changed in them:
{
data: [partialobj1, partialobj3],
timestamp
}
Since I define this in GQL as a "query", Apollo has no way of knowing that I want the results to merge rather than replace one another. What is the "proper" way of implementing this using Apollo client? (I'm using the React variant)
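To make the desired merge behaviour concrete, here is a small sketch of folding a diff response into the previously received list. It is plain JavaScript rather than Apollo API, and it assumes every object carries a stable `id` field, which the question does not state; `mergeDiff` is a hypothetical helper name.

```javascript
// Merge a diff response into the previously cached list.
// Assumes each object has a stable `id` (hypothetical field name).
function mergeDiff(cached, diff) {
  const byId = new Map(cached.map((obj) => [obj.id, obj]));
  for (const partial of diff) {
    // Changed fields overwrite old ones; unseen ids are added as new objects.
    byId.set(partial.id, { ...(byId.get(partial.id) || {}), ...partial });
  }
  return [...byId.values()];
}

const first = [
  { id: 1, name: "a", n: 1 },
  { id: 2, name: "b", n: 2 },
];
const merged = mergeDiff(first, [
  { id: 2, n: 5 },
  { id: 3, name: "c", n: 0 },
]);
```

In Apollo Client 3, a field policy `merge` function on the cache is one place where logic of this kind could live, though whether that fits this API is a judgment call the sketch does not settle.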

Update (increment) value when Parse Object is retrieved?

Is there a way to update a Parse Object field (increment) before it is returned from a Parse Query or retrieve method?
An example is for a view count to be incremented every time it is returned.
I've looked at the beforeFind method, but it looks like it is just for modifying queries before they are executed.
Add a cloud function to handle the query instead of calling the query directly from the client. In the find() success handler, iterate through the results and increment that value, call Parse.Object.saveAll on the results, and then return all the results in the response.success() call.
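The steps in that answer (find, increment, save all, return) might look roughly like this. It is a sketch under assumptions: `findAndCountViews` is a hypothetical helper, the counter field name `views` is made up, and the fake query and objects only stand in for Parse.Query and Parse.Object so the flow can run without a Parse server.

```javascript
// Run a query, bump a counter on every hit, persist, and return the results.
// `query` needs find(); each result needs increment(); `saveAll` persists a batch.
async function findAndCountViews(query, saveAll, field = "views") {
  const results = await query.find();
  results.forEach((obj) => obj.increment(field));
  await saveAll(results);
  return results;
}

// Hypothetical in-memory stand-ins for Parse.Query and Parse.Object.
class FakeObject {
  constructor(views) {
    this.attrs = { views };
  }
  increment(field) {
    this.attrs[field] = (this.attrs[field] || 0) + 1;
  }
  get(field) {
    return this.attrs[field];
  }
}
const fakeQuery = { find: async () => [new FakeObject(0), new FakeObject(7)] };
const saved = [];
const fakeSaveAll = async (objs) => {
  saved.push(...objs);
};
```

Inside Cloud Code the helper would presumably be called from a Parse.Cloud.define handler with a real Parse.Query and `(objs) => Parse.Object.saveAll(objs)`. Older Parse versions hand the results back through response.success(), as the answer says, while newer Parse Server versions return them from the async handler.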
Currently there is no way to modify the values as a query retrieves them; you can only change the query itself. If you need to update the value in the database, you have to retrieve the objects first, save the update, and then return the incremented value.

RethinkDb OrderBy Before Filter, Performance

The data table is the biggest table in my db. I would like to query the db and then order the results by the entries' timestamps. Common sense would be to filter first and then manipulate the data.
queryA = r.table('data').filter(filter).filter(r.row('timestamp').minutes().lt(5)).orderBy('timestamp')
But this is not possible, because the filter creates a side table and the command would throw an error (https://github.com/rethinkdb/rethinkdb/issues/4656).
So I was wondering whether putting the orderBy first would hurt performance once the database gets huge over time.
queryB = r.table('data').orderBy('timestamp').filter(filter).filter(r.row('timestamp').minutes().lt(5))
Currently I order the results after querying, but databases are usually quicker at this kind of processing.
queryA.run (err, entries)->
...
entries = _.sortBy(entries, 'timestamp').reverse() #this process takes on my local machine ~2000ms
Question:
What is the best approach (performance-wise) to query these entries ordered by timestamp?
Edit:
The db is run with one shard.
Using an index is often the best way to improve performance.
For example, an index on the timestamp field can be created:
r.table('data').indexCreate('timestamp')
It can be used to sort documents:
r.table('data').orderBy({index: 'timestamp'})
Or to select a given range, for example the past hour:
r.table('data').between(r.now().sub(60*60), r.now(), {index: 'timestamp'})
The last two operations can be combined into one:
r.table('data').between(r.now().sub(60*60), r.maxval, {index: 'timestamp'}).orderBy({index: 'timestamp'})
Additional filters can also be added. A filter should always be placed after an indexed operation:
r.table('data').orderBy({index: 'timestamp'}).filter({colour: 'red'})
This restriction on filters is only for indexed operations. A regular orderBy can be placed after a filter:
r.table('data').filter({colour: 'red'}).orderBy('timestamp')
For more information, see the RethinkDB documentation: https://www.rethinkdb.com/docs/secondary-indexes/python/

Sesame caching common queries

I use Sesame in a JSP web based application and I would like to know if there is any way to cache some queries that are used consistently.
I assume that what you want to "cache" is the query result for a given query with specific value. You can very easily build such a cache yourself. Just create a class for the general query that internally keeps a reference to a HashMap that maps from a value key (e.g. the placeid for your example query) to a query result:
HashMap<URI, TupleQueryResult> cache = new HashMap<>();
Then all you do is check, for a given place id, if it is present in the cache. If it is not, you execute the query, get the result back and materialize it as a MutableTupleQueryResult which you can then put in that cache:
if (!cache.containsKey(placeId)) {
// reuse the prepared query with the specific binding for which we want a result
preparedQuery.setBinding("placeid", placeId);
// execute the query and add the result to a result object we can reuse multiple times
TupleQueryResult result = new MutableTupleQueryResult(preparedQuery.evaluate());
// put the result in the cache.
cache.put(placeId, result);
}
return cache.get(placeId);
If you want something a bit more sophisticated (e.g. something that throws out cached items after a certain time, or sets a size limit on your cache), I would have a look at using something like a Guava Cache instead of a simple HashMap, but the basic setup would remain the same.

This filters in memory right?

I just want to make sure I understand this correctly...
search is an object that contains a querystring.
Repo.Query returns an ObjectQuery<T>.
From my understanding, the chained LINQ statements will filter the results after Entity Framework has returned all the rows satisfying the query. So really ALL the rows are being returned and THEN filtered in memory, which means we are returning a bunch of data that we don't really want. There are about 10k rows being returned, so this is kind of important. I'd just like to get my confusion cleared up.
var searchQuery = Repo.Query(search)
.Where(entity =>
entity.Prop1.ToUpper().Equals(prop1.ToUpper()) &&
entity.Prop2.ToUpper().Equals(prop2.ToUpper()))
.OrderBy(entity => Repo.SortExpression ?? entity.prop1);
Your Repo.Query(string query) function should return IQueryable<T>.
Then you can filter and order without getting all rows first.
See the documentation for the IQueryable(Of T) interface.
hope this helps
If this is to SQL, this will most likely create a SQL query and filter on the server and not in memory.
As a matter of fact, the statement above wouldn't actually do anything.
It's only when you iterate over it that the query will be executed. This is why certain providers (like the EF to SQL one) can collapse expression trees into a SQL query.
The easiest way to check is to use LINQPad or the SQL Profiler to see what query is actually executed.
