Elasticsearch - How to return records within time intervals

I have an Elasticsearch database deployed within an AWS VPC. It holds millions of records, each with a timestamp set from the unix epoch time in milliseconds (new Date().getTime()). I am trying to pull one (1) record per time slot based on min/max hour and minute values.
Index Mapping:
{ timestamp: "date", ...rest of record }
Elasticsearch Query:
let params = {
  query: {
    bool: {
      must: [
        {
          range: {
            timestamp: {
              gte: (unix date),
              lte: (unix date)
            }
          }
        },
        {
          script: {
            script: {
              source: "long datestamp = doc['timestamp'].value.getMillis(); " +
                "Date dt = new java.util.Date(datestamp*1L); " +
                "Calendar instance = Calendar.getInstance(); " +
                "instance.setTime(dt); " +
                "int hod = instance.get(Calendar.HOUR_OF_DAY); " +
                "int tod = instance.get(Calendar.MINUTE); " +
                "if (hod >= params.hourMin && hod <= params.hourMax && (hod === params.hourMin && tod >= params.timeMin || hod === params.hourMax && tod <= params.timeMax)) { return true; } else { return false }",
              params: {
                hourMin: 7,
                hourMax: 8,
                timeMin: 30,
                timeMax: 10
              }
            }
          }
        }
      ]
    }
  },
  from: 0,
  size: 500
};
Issue:
I often run into an error while searching indicating that
"dynamic method [java.lang.Long, getMillis/0] not found"
It shows up roughly every fourth or fifth query.
Question:
Is there a better way? I have pored over the Elasticsearch docs on intervals, histograms, etc., and came up with the query above. I am not sure it is the most efficient or the most robust method.
If this is a community-accepted approach to finding records within an interval, how do I mitigate the errors I am encountering? Do I skip over a specific record, or reformat the unix timestamp another way?
Appreciate your support ahead of time.
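One possible way to avoid the Calendar/getMillis round trip entirely (a sketch only, assuming Elasticsearch 7.x or later, where doc['timestamp'].value is a ZonedDateTime in UTC; the field name and params mirror the query above):

script: {
  script: {
    source:
      // hour-of-day and minute-of-hour read straight off the ZonedDateTime (UTC)
      "int hod = doc['timestamp'].value.getHour(); " +
      "int moh = doc['timestamp'].value.getMinute(); " +
      "if (hod < params.hourMin || hod > params.hourMax) { return false; } " +
      "if (hod == params.hourMin && moh < params.timeMin) { return false; } " +
      "if (hod == params.hourMax && moh > params.timeMax) { return false; } " +
      "return true;",
    params: { hourMin: 7, hourMax: 8, timeMin: 30, timeMax: 10 }
  }
}

Separately, the intermittent "getMillis/0 not found" error says the value was a plain java.lang.Long on those documents, which usually points at a mapping mismatch (for example, one index covered by the search mapping timestamp as long rather than date); that is worth checking regardless of which script is used.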

Related

How to use orderBy in GraphQL

I am trying to sort reservesUSD of the nested object dailyPoolSnapshots in descending order by timestamp and return its first value (in other words, return the latest entry).
I know almost nothing of GraphQL, and its documentation seems confusing and scarce. Can someone help me figure out how to sort my objects?
I am using subgraphs on the Ethereum mainnet for Curve.fi to get information about pools.
My code:
{
  pools(first: 1000) {
    name
    address
    coins
    coinDecimals
    dailyPoolSnapshots(first: 1,
      orderBy: {field: timestamp, order: DESC}) {
      reservesUSD
      timestamp
    }
  }
}
It throws an error:
"errors": [
{
"locations": [
{
"line": 0,
"column": 0
}
],
"message": "Invalid value provided for argument `orderBy`: Object({\"direction\": Enum(\"DESC\"), \"field\": Enum(\"timestamp\")})"
}
]
}```
Here is your solution
{
  pools(first: 1000) {
    name
    address
    coins
    coinDecimals
    dailyPoolSnapshots(first: 1,
      orderBy: timestamp, orderDirection: desc) {
      reservesUSD
      timestamp
    }
  }
}
In the playground you have the docs panel on the right; search for dailyPoolSnapshots and click on it to see the documentation for this query.
Here is what it shows for this query:
Type
[DailyPoolSnapshot!]!
Arguments
skip: Int = 0
first: Int = 100
orderBy: DailyPoolSnapshot_orderBy
orderDirection: OrderDirection
where: DailyPoolSnapshot_filter
block: Block_height
The Arguments section lists all the params you can use.
The orderBy and orderDirection are separate query params, and orderDirection needs to be lowercase for their enum syntax.
{
  platforms(first: 5) {
    id
    pools {
      id
      dailyPoolSnapshots(first: 1, orderBy: timestamp, orderDirection: asc) {
        timestamp
      }
    }
    poolAddresses
    latestPoolSnapshot
  }
  registries(first: 5) {
    id
  }
}

2 items added to DynamoDB when I run putItem

I am working on a bookmark skill for Alexa to teach myself DynamoDB. I've gotten over various hurdles and can now write to my table. The issue is that whenever I call putItem it adds two items. I'm trying to store the userID (the partition key in DynamoDB), the timestamp of the request (as a string, and the sort key in DynamoDB), the title of a book, and the page the user is on. This issue has only appeared since I started working with a composite key, but I think I will need both of these fields to a) get a unique primary key and b) be able to find the last item saved by a user.
Here's my intent code in Lambda:
'addBookmark': function() {
  // delegate to Alexa to collect all the required slot values
  var filledSlots = delegateSlotCollection.call(this);
  // Get slot values as variables
  var userID = this.event.session.user.userId;
  var pageNumber = this.event.request.intent.slots.pageNumber.value;
  var bookTitle = this.event.request.intent.slots.bookTitle.value;
  // DynamoDB expects the timestamp as a string, so we convert it
  var timeStamp = Date.now().toString();
  var params = {
    TableName: 'bookmarkV6',
    Item: {
      'userID': { S: userID },
      'timeStamp': { S: timeStamp },
      'bookTitle': { S: bookTitle },
      'pageNumber': { N: pageNumber },
    }
  };
  // Call DynamoDB to add the item to the table
  ddb.putItem(params, function(err, data) {
    if (err) {
      console.log("Error", err);
    } else {
      console.log("Success", data);
    }
  });
  const speechOutput = "OK, I've made a note that you're on page " + pageNumber + " of " + bookTitle + ".";
  this.response.cardRenderer("Bookmark", "Page " + pageNumber + " of " + sentenceCase(bookTitle) + "\n \n" + stringToDate(timeStamp));
  this.response.speak(speechOutput);
  this.emit(':responseReady');
},
The "duplicate" items have slightly different timestamp values.
I am also having the same issue. It happens when delegate slot collection is used, but I have not been able to solve it. I have delegate slot confirmation for 6 slots, and when I provide all 6 slot values I end up with 7 records in the table.
In the delegateSlotCollection() function, return "COMPLETED" in the else block, and in your addBookmark intent add a check like the one below after the delegateSlotCollection.call:
var filledSlots = delegateSlotCollection.call(this);
if (filledSlots === 'COMPLETED') {
  // place all your DynamoDB save logic here
}
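For reference, a minimal sketch of what that delegateSlotCollection helper could look like with the v1 alexa-sdk dialog-delegation pattern (an assumption about the helper; your implementation may differ):

function delegateSlotCollection() {
  if (this.event.request.dialogState !== 'COMPLETED') {
    // Dialog still in progress: hand control back to Alexa to elicit the
    // remaining slots. The intent handler runs again on every turn, which is
    // why an unguarded putItem call produces extra records.
    this.emit(':delegate');
  } else {
    // All slots collected and confirmed; the caller can now safely run
    // side effects such as the DynamoDB putItem.
    return 'COMPLETED';
  }
}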

elasticsearch-painless - Manipulate date

I am trying to manipulate a date in Elasticsearch's scripting language, Painless.
Specifically, I am trying to add 4 hours, which is 14,400 seconds.
{
  "script_fields": {
    "new_date_field": {
      "script": {
        "inline": "doc['date_field'] + 14400"
      }
    }
  }
}
This throws Cannot apply [+] operation to types [org.elasticsearch.index.fielddata.ScriptDocValues.Longs] and [java.lang.Integer].
Thanks
The solution was to use .value. (Given the ScriptDocValues.Longs in the error above, .value here is an epoch-milliseconds long, so adding 14400 adds only 14.4 seconds; four hours would be 14400 * 1000.)
{
  "script_fields": {
    "new_date_field": {
      "script": {
        "inline": "doc['date_field'].value + 14400"
      }
    }
  }
}
However, I actually wanted to use it for reindexing, where the format is a bit different.
Here is my version for manipulating time in the _reindex API:
POST _reindex
{
  "source": {
    "index": "some_index_v1"
  },
  "dest": {
    "index": "some_index_v2"
  },
  "script": {
    "inline": "def sf = new SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss\"); def dt = sf.parse(ctx._source.date_field); def calendar = sf.getCalendar(); calendar.setTime(dt); def instant = calendar.toInstant(); def localDateTime = LocalDateTime.ofInstant(instant, ZoneOffset.UTC); ctx._source.date_field = localDateTime.plusHours(4);"
  }
}
Here is the inline script in a readable version
def sf = new SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss\");
def dt = sf.parse(ctx._source.date_field);
def calendar = sf.getCalendar();
calendar.setTime(dt);
def instant = calendar.toInstant();
def localDateTime = LocalDateTime.ofInstant(instant, ZoneOffset.UTC);
ctx._source.date_field = localDateTime.plusHours(4);
See the list of functions supported by Painless in the Elasticsearch docs; it was painful.
An addition: converting the date to a string (your first part, I believe) can be done with:
def dt = String.valueOf(ctx._source.date_field);
I just spent a couple of hours playing with this so that I could concatenate a date field (in UTC format with 00:00:00 appended) with a string holding the time, to get a valid datetime to add to ES. Don't ask why it was split; it's an old Oracle system.
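For illustration, a minimal sketch of that concatenation inside a reindex script, assuming hypothetical time_field and datetime_field fields (substitute your own names):

def datePart = String.valueOf(ctx._source.date_field);   // e.g. "2018-06-01T00:00:00Z"
def timePart = String.valueOf(ctx._source.time_field);   // hypothetical time-of-day field, e.g. "13:45:00"
// keep the calendar date, swap in the real time of day
ctx._source.datetime_field = datePart.substring(0, 10) + "T" + timePart;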

Script to return array for scripted metric aggregation from combine

For a scripted metric aggregation, in the example shown in the documentation, the combine script returns a single number.
Can I instead return an array or hash from it?
I tried doing it; although it did not return any error, I am not able to access those values from the reduce script.
In the reduce script, per shard, I am getting an instance that when converted to a string reads as 'Script2$_run_closure1#52ef3bd9'.
Kindly let me know if this can be accomplished in any way.
At least as of Elasticsearch 1.5.1 you can do so.
For example, we can modify the Elasticsearch example (scripted metric aggregation) to compute the average profit (profit divided by the number of transactions):
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "avg_profit": {
      "scripted_metric": {
        "init_script": "_agg['transactions'] = []",
        "map_script": "if (doc['type'].value == \"sale\") { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }",
        "combine_script": "profit = 0; num_of_transactions = 0; for (t in _agg.transactions) { profit += t; num_of_transactions += 1 }; return [profit, num_of_transactions]",
        "reduce_script": "profit = 0; num_of_transactions = 0; for (a in _aggs) { profit += a[0] as int; num_of_transactions += a[1] as int }; return profit / num_of_transactions as float"
      }
    }
  }
}
NOTE: this is just a demo of returning an array from the combine script; you can calculate the average easily without using any arrays.
The response will look like:
"aggregations" : {
"avg_profit" : {
"value" : 42.5
}
}
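For reference, a sketch of the same idea on newer Elasticsearch versions (assuming 6.4+/7.x, where the scripts are Painless, the context uses state / states instead of _agg / _aggs, and a reduce_script is required):

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "avg_profit": {
      "scripted_metric": {
        "init_script": "state.transactions = []",
        "map_script": "state.transactions.add(doc['type'].value == 'sale' ? doc['amount'].value : -1 * doc['amount'].value)",
        "combine_script": "double profit = 0; long n = 0; for (t in state.transactions) { profit += t; n++; } return [profit, n];",
        "reduce_script": "double profit = 0; long n = 0; for (a in states) { profit += (double) a[0]; n += (long) a[1]; } return n == 0 ? 0 : profit / n;"
      }
    }
  }
}

As before, each per-shard combine result is just a List, and the reduce script iterates states to fold them together.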

MongoDB multikey index write performance degrading

In MongoDB I have a collection whose documents contain an array of subdocuments that I would like to index:
{
  _id : ObjectId(),
  members : [
    { ref : ObjectId().str, ... },
    { ref : ObjectId().str, ... },
    ...
  ]
}
The index is on the ref field, so that I can quickly find all documents having a particular 'ref' in their members:
db.test.ensureIndex({ "members.ref" : 1 });
I noticed that the performance of pushing an additional subdocument to the array degrades fast as the array length goes above a few thousand. If I instead use an index on an array of strings, the performance does not degrade.
The following code demonstrates the behavior:
var _id = ObjectId("522082310521b655d65eda0f");

function initialize() {
  db.test.drop();
  db.test.insert({ _id : _id, members : [], memberRefs : [] });
}

function pushToArrays(n) {
  var total, err, ref;
  total = Date.now();
  for (var i = 0; i < n; i++) {
    ref = ObjectId().str;
    db.test.update({ _id : _id }, { $push : { members : { ref : ref }, memberRefs : ref } });
    err = db.getLastError();
    if (err) {
      throw err;
    }
    if ((i + 1) % 1000 === 0) {
      print("pushed " + (i + 1));
    }
  }
  total = Date.now() - total;
  print("pushed " + n + " in " + total + "ms");
}

initialize();
pushToArrays(5000);
db.test.ensureIndex({ "members.ref" : 1 });
pushToArrays(10);
db.test.dropIndexes();
db.test.ensureIndex({ "memberRefs" : 1 });
pushToArrays(10);
db.test.dropIndexes();
E.g., using MongoDB 2.4.6 on my machine I see the following times used to push 10 elements on arrays of length 5000:
Index on "members.ref": 37272ms
Index on "memberRefs": 405ms
That difference seems unexpected. Is this a problem with MongoDB or my use of the multikey index? Is there a recommended way of handling this? Thanks.
Take a look at SERVER-8192 and SERVER-8193. Hopefully that will help answer your question!
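In the meantime, one workaround is the pattern the benchmark above already hints at: maintain a parallel array of plain ref strings, index that, and query it instead of the multikey subdocument index. A sketch of the query side (same collection and fields as above; whether this fits depends on your workload):

db.test.ensureIndex({ memberRefs: 1 });
// match documents whose memberRefs array contains a particular ref
var someRef = ObjectId("522082310521b655d65eda0f").str;
db.test.find({ memberRefs: someRef });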
