Aggregating sequence of connected events - elasticsearch

Let's say I have events like this in my log:
{type:"approval_revokation", approval_id=22}
{type:"approval", request_id=12, approval_id=22}
{type:"control3", request_id=12}
{type:"control2", request_id=12}
{type:"control1", request_id=12}
{type:"request", request_id=12 requesting_user="user1"}
{type:"registration", userid="user1"}
I would like to run a search that produces one bucket per approval_id, containing all the events connected to it as above. As you can see, no single ID field is present on every event, but they are all connected in a chain.
The reason I would like this is to feed the result into an anomaly detector that verifies things such as whether all controls were executed, and validates the registration event for an eventual approval.
Can this be done using aggregations, or are there any other suggestions?

If there's no single unique "glue" parameter to tie these events together, I'm afraid the only choice is a brute-force map-reduce iterator on all the docs in the index.
After ingesting the above events:
POST _bulk
{"index":{"_index":"events","_type":"_doc"}}
{"type":"approval_revokation","approval_id":22}
{"index":{"_index":"events","_type":"_doc"}}
{"type":"approval","request_id":12,"approval_id":22}
{"index":{"_index":"events","_type":"_doc"}}
{"type":"control3","request_id":12}
{"index":{"_index":"events","_type":"_doc"}}
{"type":"control2","request_id":12}
{"index":{"_index":"events","_type":"_doc"}}
{"type":"control1","request_id":12}
{"index":{"_index":"events","_type":"_doc"}}
{"type":"request","request_id":12,"requesting_user":"user1"}
{"index":{"_index":"events","_type":"_doc"}}
{"type":"registration","userid":"user1"}
we can link them together like so:
POST events/_search
{
"size": 0,
"aggs": {
"log_groups": {
"scripted_metric": {
"init_script": "state.groups = [];",
"map_script": """
int fetchIndex(List groups, def key, def value, def backup_key) {
if (key == null || value == null) {
// nothing to search
return -1;
}
return IntStream.range(0, groups.size())
.filter(i -> groups.get(i)['docs']
.stream()
.anyMatch(_doc -> _doc.get(key) == value
|| (backup_key != null
&& _doc.get(backup_key) == value)))
.findFirst()
.orElse(-1);
}
def approval_id = doc['approval_id'].size() != 0
? doc['approval_id'].value
: null;
def request_id = doc['request_id'].size() != 0
? doc['request_id'].value
: null;
def requesting_user = doc['requesting_user.keyword'].size() != 0
? doc['requesting_user.keyword'].value
: null;
def userid = doc['userid.keyword'].size() != 0
? doc['userid.keyword'].value
: null;
HashMap valueMap = ['approval_id':approval_id,
'request_id':request_id,
'requesting_user':requesting_user,
'userid':userid];
def found = false;
for (def entry : valueMap.entrySet()) {
def field = entry.getKey();
def value = entry.getValue();
def backup_key = field == 'userid'
? 'requesting_user'
: field == 'requesting_user'
? 'userid'
: null;
def found_index = fetchIndex(state.groups, field, value, backup_key);
if (found_index != -1) {
state.groups[found_index]['docs'].add(params._source);
if (approval_id != null) {
state.groups[found_index]['approval_id'] = approval_id;
}
found = true;
break;
}
}
if (!found) {
HashMap nextInLine = ['docs': [params._source]];
if (approval_id != null) {
nextInLine['approval_id'] = approval_id;
}
state.groups.add(nextInLine);
}
""",
"combine_script": "return state",
"reduce_script": "return states"
}
}
}
}
returning the grouped events + the inferred approval_id:
"aggregations" : {
"log_groups" : {
"value" : [
{
"groups" : [
{
"docs" : [
{...}, {...}, {...}, {...}, {...}, {...}, {...}
],
"approval_id" : 22
},
{ ... }
]
}
]
}
}
Keep in mind that such scripts are going to be quite slow, especially when run on large numbers of events.
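The linking logic can also be tried outside Elasticsearch. Below is a minimal Python sketch, not the script above: it swaps in a union-find structure so that groups merge no matter in which order the events arrive, whereas the single-pass script appears to depend on document order. The names LINK_FIELDS and group_events are made up for illustration, and treating userid and requesting_user as the same kind of value is an assumption taken from the sample events.

```python
from collections import defaultdict

# Assumption: these fields link events; userid and requesting_user
# are treated as the same kind of value ("user").
LINK_FIELDS = {
    "approval_id": "approval_id",
    "request_id": "request_id",
    "requesting_user": "user",
    "userid": "user",
}


def group_events(events):
    """Group events that share a linking value, directly or through a chain."""
    parent = list(range(len(events)))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (kind, value) -> index of the first event carrying it
    for i, event in enumerate(events):
        for field, kind in LINK_FIELDS.items():
            if event.get(field) is None:
                continue
            key = (kind, event[field])
            if key in first_seen:
                # Merge this event's group into the group of the earlier event.
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    groups = defaultdict(lambda: {"approval_id": None, "docs": []})
    for i, event in enumerate(events):
        group = groups[find(i)]
        group["docs"].append(event)
        if event.get("approval_id") is not None:
            group["approval_id"] = event["approval_id"]
    return list(groups.values())
```

Calling group_events on the seven sample events should give a single group carrying approval_id 22; an event sharing no value with the chain ends up in its own group with approval_id None.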

Related

Elasticsearch: Multiply each nested element plus aggregation

Let's imagine an index composed of 2 documents like this one:
doc1 = {
"x":1,
"y":[{brand:b1, value:1},
{brand:b2, value:2}]
},
doc2 = {
"x":2,
"y":[{brand:b1, value:0},
{brand:b2, value:3}]
}
Is it possible to multiply each value of y by x for each document and then run a sum aggregation per brand term to get this result:
b1: 1
b2: 8
If not, could it be done with any other mapping types ?
This is a highly custom use-case so I don't think there's some sort of a pre-optimized mapping for it.
What I would suggest is the following:
Set up an index w/ y being nested:
PUT xy/
{"mappings":{"properties":{"y":{"type":"nested"}}}}
Ingest the docs from your example:
POST xy/_doc
{"x":1,"y":[{"brand":"b1","value":1},{"brand":"b2","value":2}]}
POST xy/_doc
{"x":2,"y":[{"brand":"b1","value":0},{"brand":"b2","value":3}]}
Use a scripted_metric aggregation to compute the products and add them up in a shared HashMap:
GET xy/_search
{
"size": 0,
"aggs": {
"multiply_and_add": {
"scripted_metric": {
"init_script": "state.by_brands = [:]",
"map_script": """
def x = params._source['x'];
for (def brand_pair : params._source['y']) {
def brand = brand_pair['brand'];
def product = x * brand_pair['value'];
if (state.by_brands.containsKey(brand)) {
state.by_brands[brand] += product;
} else {
state.by_brands[brand] = product;
}
}
""",
"combine_script": "return state",
"reduce_script": "return states"
}
}
}
}
which would yield something along the lines of
{
...
"aggregations":{
"multiply_and_add":{
"value":[
{
"by_brands":{ <----
"b2":8,
"b1":1
}
}
]
}
}
}
UPDATE
Since it iterates over states, which is only available in the reduce phase, this belongs in the reduce_script, which could look like this:
def combined_states = [:];
for (def state : states) {
for (def brand_pair : state['by_brands'].entrySet()) {
def key = brand_pair.getKey();
def value = brand_pair.getValue();
if (combined_states.containsKey(key)) {
combined_states[key] += (float) value;
} else {
combined_states[key] = (float) value;
}
}
}
return combined_states;
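To see what the map and reduce phases are meant to compute, here is a minimal Python sketch of the same arithmetic. It is not Elasticsearch code: map_shard and reduce_shards are hypothetical names, and placing each sample document on its own shard is an assumption made only to exercise the merge step.

```python
from collections import Counter


def map_shard(docs):
    """Per-shard phase: add x * value to a running total per brand."""
    totals = Counter()
    for doc in docs:
        for pair in doc["y"]:
            totals[pair["brand"]] += doc["x"] * pair["value"]
    return totals


def reduce_shards(shard_totals):
    """Reduce phase: merge the per-shard totals into one map."""
    merged = Counter()
    for totals in shard_totals:
        merged.update(totals)  # Counter.update adds values key by key
    return dict(merged)


doc1 = {"x": 1, "y": [{"brand": "b1", "value": 1}, {"brand": "b2", "value": 2}]}
doc2 = {"x": 2, "y": [{"brand": "b1", "value": 0}, {"brand": "b2", "value": 3}]}
result = reduce_shards([map_shard([doc1]), map_shard([doc2])])
```

With the two sample documents, result holds b1 = 1 and b2 = 8, matching the totals asked for in the question.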

Why does Spring Data fail on date queries?

I have records in my mongodb which are like this example record.
{
"_id" : ObjectId("5de6e329bf96cb3f8d253163"),
"changedOn" : ISODate("2019-12-03T22:35:21.126Z"),
"bappid" : "BAPP0131337",
}
I have code which is implemented as:
public List<ChangeEvent> fetchChangeList(Application app, Date from, Date to) {
Criteria criteria = null;
criteria = Criteria.where("bappid").is(app.getBappid());
Query query = Query.query(criteria);
if(from != null && to == null) {
criteria = Criteria.where("changedOn").gte(from);
query.addCriteria(criteria);
}
else if(to != null && from == null) {
criteria = Criteria.where("changedOn").lte(to);
query.addCriteria(criteria);
} else if(from != null && to != null) {
criteria = Criteria.where("changedOn").gte(from).lte(to);
query.addCriteria(criteria);
}
logger.info("Find change list query: {}", query.toString());
List<ChangeEvent> result = mongoOps.find(query, ChangeEvent.class);
return result;
}
This code always comes up empty. The logging statement generates a log entry like:
Find change list query: Query: { "bappid" : "BAPP0131337", "changedOn" : { "$gte" : { "$date" : 1575418473670 } } }, Fields: { }, Sort: { }
Running variants of the query above against a database that contains the record above gives the following results.
Returns records:
db["change-events"].find({ "bappid" : "BAPP0131337" }).pretty();
Returns empty set:
db["change-events"].find({ "bappid" : "BAPP0131337", "changedOn" : { "$gte" : { "$date" : 1575418473670 } } }).pretty();
Returns empty set:
db["change-events"].find({ "bappid" : "BAPP0131337", "changedOn" : { "$lte" : { "$date" : 1575418473670 } } }).pretty();
The record returned by the query without the date filter should also be matched by one of the two date queries above, but both come back empty.
What is wrong here?
Since the collection name change-events differs from the class name ChangeEvent, you have to pass the collection name to the find call on mongoOps, as below:
List<ChangeEvent> result = mongoOps.find(query, ChangeEvent.class, "change-events");
I tried to replicate this and found that your query does not work even without the dates in the where clause, i.e.:
Criteria criteria = null;
criteria = Criteria.where("bappid").is(bappid);
Query query = Query.query(criteria);
And the find call below:
List<ChangeEvent> result = mongoTemplate.find(query, ChangeEvent.class);
will not work because the collection name is missing. The call below, with the collection name, executes fine:
List<ChangeEvent> result1 = mongoTemplate.find(query, ChangeEvent.class, "changeEvents");
A detailed walkthrough of the above discussion is in my GitHub repo: https://github.com/krishnaiitd/learningJava/blob/master/spring-boot-sample-data-mongodb/src/main/java/sample/data/mongo/main/Application.java#L157
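As a side check, the $date value in the logged query from the question can be decoded by hand. This is a minimal Python sketch, assuming the number is epoch milliseconds in UTC; the helper name from_epoch_millis is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(ms):
    """Convert epoch milliseconds to an aware UTC datetime (integer-exact)."""
    return EPOCH + timedelta(milliseconds=ms)


# Lower bound taken from the logged query in the question.
logged_from = from_epoch_millis(1575418473670)

# changedOn of the sample record: ISODate("2019-12-03T22:35:21.126Z").
changed_on = datetime(2019, 12, 3, 22, 35, 21, 126000, tzinfo=timezone.utc)

# True when the sample record lies at or after the logged lower bound.
record_matches_gte = changed_on >= logged_from
```

Under that assumption the logged bound decodes to 2019-12-04T00:14:33.670Z, which is later than the sample record's changedOn, so that particular $gte filter would not be expected to match the sample record, independent of the collection name.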

Count of unique aggregation doc_count in Elasticsearch

Using Elasticsearch 7.0, I can get how many logs I have for each user with an aggregation:
"aggs": {
"by_user": {
"terms": {
"field": "user_id"
}
}
}
This returns something like:
user32: 25
user52: 20
user10: 20
...
What I would like to know is how many users have 25 logs, how many users have 20 logs, and so on. The ideal result would be something like:
25: 1
20: 2
19: 4
12: 54
Because 54 users have 12 log lines.
How can I write an aggregation that returns this result?
It sounds like a Bucket Script Aggregation could simplify your query, but the problem is that there is still an open PR on this topic.
So for now, I think the simplest option is a Painless script with a Scripted Metric Aggregation. I recommend reading carefully about the stages of its execution.
In terms of code, I know it's not the best algorithm for your problem, but a quick and dirty version of your query could look something like this:
GET my_index/_search
{
"size": 0,
"query" : {
"match_all" : {}
},
"aggs": {
"profit": {
"scripted_metric": {
"init_script" : "state.transactions = [:];",
"map_script" :
"""
// firstName.keyword is the field used in this example; use your own field, e.g. user_id
def key = doc['firstName.keyword'];
if (key.size() != 0) {
def value = state.transactions[key.value];
if (value == null) value = 0;
state.transactions[key.value] = value + 1;
}
""",
"combine_script" : "return state.transactions",
"reduce_script" :
"""
def result = [:];
for (state in states) {
for (item in state.entrySet()) {
def key=item.getValue().toString();
def value = result[key];
if(value==null)value = 0;
result[key]=value+1;
}
}
return result;
"""
}
}
}
}
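The "count of counts" idea can be checked with a small Python sketch. It is not the query above: it merges the per-shard counts per user before building the histogram, a step the quick and dirty reduce script skips, so a user whose logs are spread over several shards is still counted once. map_shard and reduce_shards are hypothetical names, and the shard contents are invented sample data.

```python
from collections import Counter


def map_shard(user_ids):
    """Per-shard phase: number of log lines per user on this shard."""
    return Counter(user_ids)


def reduce_shards(shard_counts):
    """Merge per-user counts across shards, then count users per log count."""
    per_user = Counter()
    for counts in shard_counts:
        per_user.update(counts)
    # Key: number of logs, value: how many users have exactly that many logs.
    return dict(Counter(per_user.values()))


# Invented sample: u1 has 3 logs in total, u2 and u3 have 2 each.
shard_a = map_shard(["u1", "u1", "u2"])
shard_b = map_shard(["u1", "u2", "u3", "u3"])
histogram = reduce_shards([shard_a, shard_b])
```

Here histogram maps 3 to 1 and 2 to 2: one user with three logs and two users with two logs each.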

How to cast a SwiftyJSON value that is sometimes detected as Int and sometimes as String

I am getting the login info using Alamofire and SwiftyJSON:
Alamofire.request(.POST, postsEndpoint, parameters: newPost)
.responseSwiftyJSON({ (request, response, json, error) in
In my POST response I have the value
json["id_usuario"]
The problem is that when the value is -1 or 0 (zero), it can be obtained as an Int
using
let idUser = json["id_usuario"].int
An example of the response with the value -1:
{
"id_usuario" : -1
}
And the response when the value is greater than 0, i.e. a successful login:
{
"estado" : "Jalisco",
"plan_activo" : "0",
"datos_registro_completos" : 1,
"plan" : 0,
"genero" : "H",
"id_usuario_openpay" : "annvwl3didjylvex0wzh",
"fb_id" : "10205386840402780",
"email" : "steel.edward#hotmail.com",
"postal_code" : "44630",
"address" : "Nueva Escocia #1514 Interior 106",
"nombres" : "Steel Edward",
"app_mat" : "George",
"app_pat" : "Vázquez",
"ciudad" : "Guadalajara",
"id_usuario" : "204",
"admin" : "1",
"phone_number" : "3334691505"
}
But if the value is greater than 0, .int returns nil and the value can only be obtained as a String:
let idUser = json["id_usuario"].string
My final code works and looks like this:
if let idUser = json["id_usuario"].int {
if(idUser == -1) {
//specific error from the server
} else if(idUser == 0) {
//another specific error from the server
}
} else if let idUser = json["id_usuario"].string {
if(idUser != "") {
//success
}
}
I would like to always store the value as an Int and perform the validation on it, with code like this:
if(idUser == -1) {
//specific error from the server
} else if(idUser == 0) {
//another specific error from the server
} else if (idUser > 0) {
//success
}
var id = 0
if let num = json["id_usuario"].int {
id = num
} else if let str = json["id_usuario"].string,
let num = Int(str) {
id = num
}
//
// do something with id, which is an Int value
If you own the server code, you would be better off fixing the bug there.
This was my first solution, which does not use SwiftyJSON:
let id_usuario_object:NSObject = jsonData.valueForKey("id_usuario") as! NSObject
var id_usuario:Int
if ( id_usuario_object.isKindOfClass(NSString) ) {
let id_usuario_string:NSString = jsonData.valueForKey("id_usuario") as! NSString
id_usuario = Int(id_usuario_string.intValue)
} else {
id_usuario = jsonData.valueForKey("id_usuario") as! NSInteger
}
The problem resides in the JSON response produced by the PHP code:
Before
echo json_encode( $results);
After
echo json_encode( $results, JSON_NUMERIC_CHECK );
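The client-side normalization from the answers above (take the value as a number if it is one, otherwise parse the string) can be sketched in a few lines of Python. This is an illustration of the idea only, not SwiftyJSON code; parse_user_id is a hypothetical helper name, and returning None for anything that cannot be read as an integer is a choice made for this sketch.

```python
import json


def parse_user_id(payload):
    """Return id_usuario as an int whether the server sent a number or a string."""
    value = json.loads(payload).get("id_usuario")
    if isinstance(value, bool):
        # bool is a subclass of int in Python; a JSON true/false is not an id.
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return None
    return None


failed_login = parse_user_id('{"id_usuario": -1}')
successful_login = parse_user_id('{"id_usuario": "204"}')
```

Once the value is always an integer, the -1 / 0 / greater-than-0 checks from the question can run on a single variable.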

Script to return array for scripted metric aggregation from combine

For the scripted metric aggregation, in the example shown in the documentation, the combine script returns a single number.
Can I return an array or hash there instead?
I tried doing so; it did not raise any error, but I am not able to access those values from the reduce script.
In the reduce script, I get one instance per shard which, when converted to a string, reads as 'Script2$_run_closure1@52ef3bd9'.
Kindly let me know if this can be accomplished in any way.
At least in Elasticsearch version 1.5.1, you can do so.
For example, we can modify the Elasticsearch example (scripted metric aggregation) to compute an average profit (profit divided by the number of transactions):
{
"query": {
"match_all": {}
},
"aggs": {
"avg_profit": {
"scripted_metric": {
"init_script": "_agg['transactions'] = []",
"map_script": "if (doc['type'].value == \"sale\") { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }",
"combine_script": "profit = 0; num_of_transactions = 0; for (t in _agg.transactions) { profit += t; num_of_transactions += 1 }; return [profit, num_of_transactions]",
"reduce_script": "profit = 0; num_of_transactions = 0; for (a in _aggs) { profit += a[0] as int; num_of_transactions += a[1] as int }; return profit / num_of_transactions as float"
}
}
}
}
NOTE: this is just a demo of returning an array from the combine script; you can calculate the average easily without using any arrays.
The response will look like:
"aggregations" : {
"avg_profit" : {
"value" : 42.5
}
}
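The reason for returning a pair from the combine script is that an average cannot be rebuilt from per-shard averages alone; each shard has to hand over both its sum and its count. A minimal Python sketch of that pattern follows. The function names and the transaction amounts are invented for illustration and are not taken from the Elasticsearch example data.

```python
def combine(transactions):
    """Per-shard phase: return [profit, number_of_transactions] as a pair."""
    return [sum(transactions), len(transactions)]


def reduce_shards(partials):
    """Reduce phase: add up the pairs, then divide once at the end."""
    profit = sum(pair[0] for pair in partials)
    count = sum(pair[1] for pair in partials)
    return profit / count if count else None


# Invented signed amounts: sales are positive, costs are negative.
shard_a = combine([80, -30])
shard_b = combine([100, 20])
avg_profit = reduce_shards([shard_a, shard_b])
```

With these made-up amounts the total profit is 170 over 4 transactions, so avg_profit is 42.5.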

Resources