Recursively create hierarchical hash from Mongo collection in Ruby

I am trying to create a hash that looks like this:
{"difficulty"=>{"easy"=>{}, "normal"=>{}, "hard"=>{}}, "terrain"=>{"snow"=>{"sleet"=>{}, "powder"=>{}}, "jungle"=>{}, "city"=>{}}}
From a MongoDB collection, which is an enumerable list of hashes that looks like this:
{
  "_id" : "globalSettings",
  "groups" : [
    "difficulty",
    "terrain"
  ],
  "parent" : null,
  "settings" : {
    "maxEnemyCount" : 10,
    "maxDamageInflicted" : 45,
    "enemyHealthPoints" : 40,
    "maxEnemySpeed" : 25,
    "maxPlayerSpeed" : 32,
    "lightShader" : "diffuse",
    "fogDepth" : 12,
    "terrainModifier" : 9
  }
}
{
  "_id" : "difficulty",
  "groups" : [
    "easy",
    "normal",
    "hard"
  ],
  "parent" : "globalSettings",
  "settings" : { }
}
{
  "_id" : "terrain",
  "groups" : [
    "snow",
    "jungle",
    "city"
  ],
  "parent" : "globalSettings",
  "settings" : { }
}
{
  "_id" : "snow",
  "groups" : [
    "sleet",
    "powder"
  ],
  "parent" : "terrain",
  "settings" : {
    "fogDepth" : 4
  }
}
{
  "_id" : "jungle",
  "groups" : [ ],
  "parent" : "terrain",
  "settings" : {
    "terrainModifier" : 6
  }
}
{
  "_id" : "city",
  "groups" : [ ],
  "parent" : "terrain",
  "settings" : {
    "lightShader" : "bumpedDiffuse"
  }
}
{
  "_id" : "easy",
  "groups" : [ ],
  "parent" : "difficulty",
  "settings" : {
    "maxEnemyCount" : 5
  }
}
{
  "_id" : "normal",
  "groups" : [ ],
  "parent" : "difficulty",
  "settings" : { }
}
{
  "_id" : "hard",
  "groups" : [ ],
  "parent" : "difficulty",
  "settings" : {
    "maxEnemyCount" : 20
  }
}
{
  "_id" : "sleet",
  "groups" : [ ],
  "parent" : "snow",
  "settings" : {
    "fogDepth" : 2
  }
}
{
  "_id" : "powder",
  "groups" : [ ],
  "parent" : "snow",
  "settings" : {
    "terrainModifier" : 2
  }
}
Every time I try to write the function, I get stuck when setting the parent of the group. How do I recurse yet keep track of the path through the hierarchy?
The closest I've come is with this:
def dbCurse(nodes, parent = nil)
  withParent, withoutParent = nodes.partition { |n| n['parent'] == parent }
  withParent.map do |node|
    newNode = {}
    newNode[node["_id"]] = {}
    newNode[node["_id"]].merge(node["_id"] => dbCurse(withoutParent, node['_id']))
  end
end
which gives me a crazy mix of arrays and hashes:
{"globalSettings"=>[{"difficulty"=>[{"easy"=>[]}, {"normal"=>[]}, {"hard"=>[]}]}, {"terrain"=>[{"snow"=>[{"sleet"=>[]}, {"powder"=>[]}]}, {"jungle"=>[]}, {"city"=>[]}]}]}
I think the arrays are getting mixed in there from the #map, but I'm not sure how to get rid of them to get the clean hash of hashes I show at the top of my question.
Thank you,
David

So looking at your sample input, I'm going to make a core assumption:
The order of the hash objects in the input list is somewhat well-defined. Namely, that if the globalSettings hash refers to the group difficulty, then the next object in the list with _id == 'difficulty' and parent == 'globalSettings' is the correct match.
If this is true, then you can write a function that accepts a description of what you're looking for (i.e., the object with _id == 'difficulty' and parent == 'globalSettings') along with a reference to where you want that data stored, which can then recurse using deeper references.
def doit(obj_list, work = {})
  work.each do |key, data|
    # fetch the node
    node_i = obj_list.index { |n| n['_id'] == key && n['parent'] == data[:parent] } or next
    node = obj_list.delete_at(node_i)
    # for each group of this node, create a new hash
    # under this node's output pointer and queue it for parsing
    new_work = {}
    node['groups'].each do |group|
      data[:output][group] = {}
      new_work[group] = { parent: key, output: data[:output][group] }
    end
    # get the group data for this node
    doit(obj_list, new_work)
  end
end

require 'json'

input_data = JSON.parse(IO.read './input.json')
output_data = {}
doit(input_data, 'globalSettings' => { parent: nil, output: output_data })
The trick here is that I'm handing the recursive call to doit the names of the next objects that I'm looking for from the list (using the current object's group list) and pairing each of those desired names with their parent and a reference to where I want the function to put the found data. Each recursive call to doit will use deeper and deeper references into the original output hash.
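If you'd rather keep your original recursive shape, the arrays can be eliminated by building a hash directly instead of mapping to an array of one-entry hashes. A minimal sketch of that fix (same partition logic as your dbCurse, with each_with_object replacing map):
def dbCurse(nodes, parent = nil)
  withParent, withoutParent = nodes.partition { |n| n['parent'] == parent }
  # build one hash for this level instead of an array of single-key hashes
  withParent.each_with_object({}) do |node, tree|
    tree[node['_id']] = dbCurse(withoutParent, node['_id'])
  end
end
Called as dbCurse(nodes, 'globalSettings') this should return exactly the target hash from the question; called with the default nil parent it wraps the same structure in one extra "globalSettings" level.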

Related

Enriching documents in ElasticSearch with only matching nested elements by ID

We're creating some packages, but that process is currently rather slow, because of the sheer amount of data being sent between microservices. Therefore, I have pruned the information being sent between those microservices and instead want to enrich the documents with the necessary information directly from within ElasticSearch. This gives documents of the following shape:
{
  "_index" : "packages-2022.02.28",
  "_type" : "_doc",
  "_id" : "SG_DH-8019-ao-74783-20220315-12",
  "_score" : 1.0,
  "_source" : {
    "id" : "SG_DH-8019-ao-74783-20220315-12",
    "updatedOn" : "2022-02-28T14:45:57.7511562+01:00",
    "code" : "SG",
    "createdDate" : "2022-02-28T15:17:48.2571391+01:00",
    "content" : {
      "contentId" : "74783",
      "units" : [
        {
          "id" : "HB_DBL.ST_RO_NFP",
          "globalId" : "74783_HB_DBL.ST_RO_NFP",
          "globalIntId" : -592692223,
          "forPackaging" : false
        },
        {
          "id" : "HB_DBL.ST_BB_NFP",
          "globalId" : "74783_HB_DBL.ST_BB_NFP",
          "globalIntId" : 446952442,
          "forPackaging" : false
        },
        {
          "id" : "HB_DBL.ST_AI_NFP",
          "globalId" : "74783_HB_DBL.ST_AI_NFP",
          "globalIntId" : -1174348304,
          "forPackaging" : false
        },
        {
          "id" : "HB_DBL.SU_RO_NFP",
          "globalId" : "74783_HB_DBL.SU_RO_NFP",
          "globalIntId" : -2111509049,
          "forPackaging" : false
        },
        {
          "id" : "HB_DBL.SU_BB_NFP",
          "globalId" : "74783_HB_DBL.SU_BB_NFP",
          "globalIntId" : 307969427,
          "forPackaging" : false
        },
        {
          "id" : "HB_DBL.SU_AI_NFP",
          "globalId" : "74783_HB_DBL.SU_AI_NFP",
          "globalIntId" : 1418623211,
          "forPackaging" : false
        },
        {
          "id" : "HB_DBL.PO-1_RO_NFP",
          "globalId" : "74783_HB_DBL.PO-1_RO_NFP",
          "globalIntId" : 1328251159,
          "forPackaging" : false
        },
        {
          "id" : "HB_DBL.PO-1_BB_NFP",
          "globalId" : "74783_HB_DBL.PO-1_BB_NFP",
          "globalIntId" : -1228155826,
          "forPackaging" : false
        },
        {
          "id" : "HB_DBL.PO-1_AI_NFP",
          "globalId" : "74783_HB_DBL.PO-1_AI_NFP",
          "globalIntId" : 749215308,
          "forPackaging" : false
        },
        {
          "id" : "HB_DBL.OF_RO_NFP",
          "globalId" : "74783_HB_DBL.OF_RO_NFP",
          "globalIntId" : 1981865239,
          "forPackaging" : false
        },
        {
          "id" : "HB_DBL.OF_BB_NFP",
          "globalId" : "74783_HB_DBL.OF_BB_NFP",
          "globalIntId" : 545563435,
          "forPackaging" : false
        },
        {
          "id" : "HB_DBL.OF_AI_NFP",
          "globalId" : "74783_HB_DBL.OF_AI_NFP",
          "globalIntId" : -481310774,
          "forPackaging" : false
        }
      ],
      "duration" : {
        "value" : 12,
        "durationType" : "Day"
      }
    },
    "generatedInfo" : {
      "productGroupName" : null,
      "subProductGroupName" : "Foo",
      "version" : 0
    }
  }
}
with information from an enrich policy's index of the shape (when queried):
{
  "_index" : ".enrich-package-enrich-1646044129711",
  "_type" : "_doc",
  "_id" : "zt_gP38BZeMUiw0-LxLa",
  "_score" : 1.0,
  "_source" : {
    "contentId" : "365114",
    "name" : "PackageName",
    "board" : [
      "B1",
      "B2"
    ],
    "units" : [
      {
        "price" : [
          {
            "margin" : 0,
            "combination" : 10000,
            "value" : 189030,
            "currency" : "EUR"
          }
        ],
        "id" : "W2M_AX2_SC_NFP",
        "globalId" : "365114_W2M_AX2_SC_NFP",
        "globalIntId" : -988330164,
        "name" : "UnitName",
        "prop1" : "Foo",
        "prop2" : "Bar"
      }
    ]
  }
}
I originally could get this working. However, when enriching, I only want to keep the units whose global IDs match those in the document being saved. To this end, I have tried also enriching each unit with a simple Enrich processor plus a ForEach processor referencing the enrich policy, matching on globalId; I have even attempted matching on its hash code globalIntId (although even in the latter case I would often get the error that it 'is not an integer', even though it clearly is one). This separate enrich-policy index has a shape similar to the following:
{
  "_index" : ".enrich-package-unit-enrich-1646044158417",
  "_type" : "_doc",
  "_id" : "dN_gP38BZeMUiw0-t2Io",
  "_score" : 1.0,
  "_source" : {
    "units" : [
      {
        "price" : [
          {
            "margin" : 0,
            "combination" : 10000,
            "value" : 189030,
            "currency" : "EUR"
          }
        ],
        "globalId" : "365114_W2M_AX2_SC_NFP",
        "globalIntId" : -988330164,
        "name" : "UnitName",
        "prop1" : "Foo",
        "prop2" : "Bar",
        "id" : "W2M_AX2_SC_NFP"
      }
    ]
  }
}
I have also tried to use Painless script, but so far my experience hasn't been exactly painless (pun intended). Every time I would try to access any data (I've tried various ways I encountered), I would get nothing but compilation errors. Also, given that I'm working on making this process faster, I'm a bit worried about performance here if I were to get it to work. I've read that Painless is fast, yet I've also heard it's actually fairly slow (I think compared to using processors, not necessarily other scripts).
Now, I'm at a loss about how to get this to work. I would prefer to do this without scripting if possible. However, if it is only possible using scripting, that's okay as long as the performance is acceptable. I'm using Elastic 7.12.
Update 1:
I'm creating the enrich policy from C# using Nest like so:
var enrichPolicyRequest = new PutEnrichPolicyRequest(enrichPolicyName)
{
    Match = new MyPackageBedEnrichPolicy(index)
};

var putEnrichPolicyResponse = await elasticClient.Enrich.PutPolicyAsync(enrichPolicyRequest);
var executeEnrichPolicyResponse = await elasticClient.Enrich.ExecutePolicyAsync(enrichPolicyName);
...

public class MyPackageBedEnrichPolicy : IEnrichPolicy
{
    public MyPackageBedEnrichPolicy(string index)
    {
        Indices = index;
        MatchField = "contentId";
        EnrichFields = new[] { "name", "board", "units" };
    }

    public Indices Indices { get; set; }
    public Field MatchField { get; set; }
    public Fields EnrichFields { get; set; }
    public string Query { get; set; }
}
and the policy for the units very similarly, but with:
public class MyPackageUnitEnrichPolicy : IEnrichPolicy
{
    public MyPackageUnitEnrichPolicy(string index)
    {
        Indices = index;
        MatchField = "units.globalId";
        EnrichFields = new[] { "units" };
    }
    ...
For now, I have created the ingest processors in Kibana for easier prototyping, though I will have to take care of that using Nest later as well. This is the definition of the ingest pipeline in JSON:
[
  {
    "enrich": {
      "field": "content.contentId",
      "policy_name": "enrichPolicyName",
      "target_field": "enrichTest"
    }
  },
  {
    "foreach": {
      "field": "content.units.globalId",
      "processor": {
        "enrich": {
          "field": "content.units.globalId",
          "policy_name": "unitEnrichPolicyName",
          "target_field": "enrichTest.units",
          "tag": "enrich-units-on-globalId-processor"
        }
      }
    }
  }
]
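One direction worth trying for the unit filtering (a sketch, not a confirmed idiomatic solution): keep the single document-level enrich processor, then add a script processor that drops enriched units whose globalId does not occur in content.units. Painless supports HashSet and List.removeIf, so the script stays short. Shown here with the Ruby elasticsearch client for brevity, since this thread's main question is Ruby; the pipeline body is the same JSON you would PUT from Kibana or NEST, and the pipeline id is a placeholder:
require 'elasticsearch'

client = Elasticsearch::Client.new # assumes a local cluster

client.ingest.put_pipeline(
  id: 'package-enrich-pipeline', # hypothetical name
  body: {
    processors: [
      # 1) document-level enrich, as in the question
      { enrich: { field: 'content.contentId', policy_name: 'enrichPolicyName', target_field: 'enrichTest' } },
      # 2) drop enriched units whose globalId is not on the package itself
      { script: {
        source: <<~PAINLESS
          def wanted = new HashSet();
          for (u in ctx.content.units) { wanted.add(u.globalId); }
          ctx.enrichTest.units.removeIf(u -> !wanted.contains(u.globalId));
        PAINLESS
      } }
    ]
  }
)
This assumes the document-level enrich has already copied the policy's units array into enrichTest.units; the field names are taken from the question.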

Get document by min size of array in MongoDB

I have a Mongo collection:
{
  "_id" : 123,
  "index" : "111",
  "students" : [
    {
      "firstname" : "Mark",
      "lastname" : "Smith"
    }
  ]
}
{
  "_id" : 456,
  "index" : "222",
  "students" : [
    {
      "firstname" : "Mark",
      "lastname" : "Smith"
    }
  ]
}
{
  "_id" : 789,
  "index" : "333",
  "students" : [
    {
      "firstname" : "Neil",
      "lastname" : "Smith"
    },
    {
      "firstname" : "Sofia",
      "lastname" : "Smith"
    }
  ]
}
I want to get the document whose index is in a given set, for example givenSet = ["111","333"], and whose students array has the minimum length.
The result should be the first document, with _id: 123, because its index is in the givenSet and its students array has length 1, which is smaller than the third document's.
I need to write a custom JSON @Query for a Spring Mongo repository. I am new to Mongo and am a bit stuck on this problem.
I wrote something like this:
@Query("{'index':{$in : ?0}, length:{$size:$students}, $sort:{length:-1}, $limit:1}")
Department getByMinStudentsSize(Set<String> indexes);
And got the error message: '$size needs a number'.
Should I just use .count() or something like that?
You should use the aggregation framework for this type of query:
1. Filter the result based on your condition.
2. Add a new field and assign the array size to it.
3. Sort based on the new field.
4. Limit the result.
The solution should look something like this:
db.collection.aggregate([
  {
    "$match": {
      "index": {
        "$in": [
          "111",
          "333"
        ]
      }
    }
  },
  {
    "$addFields": {
      "students_size": {
        "$size": "$students"
      }
    }
  },
  {
    "$sort": {
      "students_size": 1
    }
  },
  {
    "$limit": 1
  }
])
working example: https://mongoplayground.net/p/ih4KqGg25i6
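For what it's worth, the same pipeline is easy to try outside Spring; here is a sketch with the Ruby mongo gem (connection details, database, and collection name are placeholders):
require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'school') # hypothetical connection
pipeline = [
  { '$match'     => { 'index' => { '$in' => %w[111 333] } } },
  { '$addFields' => { 'students_size' => { '$size' => '$students' } } },
  { '$sort'      => { 'students_size' => 1 } },
  { '$limit'     => 1 }
]
smallest = client[:departments].aggregate(pipeline).first
# => the document with _id 123 for the sample data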
You are getting the issue because the second param should be enclosed in curly braces, and the second param is the projection:
@Query("{{'index':{$in : ?0}}, {length:{$size:'$students'}}, $sort:{length:1}, $limit:1}")
Department getByMinStudentsSize(Set<String> indexes);
Below is the MongoDB query:
db.collection.aggregate(
  [
    {
      "$match" : {
        "index" : {
          "$in" : [
            "111",
            "333"
          ]
        }
      }
    },
    {
      "$project" : {
        "studentsSize" : {
          "$size" : "$students"
        },
        "students" : 1.0
      }
    },
    {
      "$sort" : {
        "studentsSize" : 1.0
      }
    },
    {
      "$limit" : 1.0
    }
  ],
  {
    "allowDiskUse" : false
  }
);

Partial update overwriting whole structure

I'm indexing a new document with the following content
{
  "lastUpdate" : "20180114144020452",
  "name" : "My Process",
  "startDate" : "20180114162356585",
  "endData" : "",
  "tasks" : [
    {
      "1" : {
        "lastUpdate" : "20180114144020452",
        "taskId" : "123",
        "subject" : "Terceira Atividade",
        "status" : "Active",
        "type" : "userTask",
        "assign" : [
          {
            "date" : "20180114144020452",
            "type" : "role",
            "name" : "Time 3",
            "id" : "Team3_345"
          }
        ],
        "receivedDate" : "",
        "readDate" : "",
        "finishDate" : ""
      }
    }
  ]
}
And then I'm trying to change the tasks.1.status value with the following update content:
{
  "doc" : {
    "tasks" : [
      {
        "1" : {
          "status" : "Closed"
        }
      }
    ]
  }
}
But it's overwriting the whole tasks.1 structure, deleting the other values and leaving only status set to Closed, instead of keeping the other values and changing only the status value.
How can I solve this? Thanks
You need to do it via a scripted partial update, like this:
POST updates/update/1/_update
{
  "script": {
    "source": "ctx._source.tasks[0]['1'].status = 'Closed'"
  }
}
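(Note the map-style access tasks[0]['1'], since "1" is a field name and .1 is not a valid Painless identifier.) An equivalent call from the Ruby elasticsearch client, with the status passed as a script parameter rather than baked into the source, might look like this sketch (index name and id taken from the example URL; on modern typeless versions the mapping type in the URL goes away):
require 'elasticsearch'

client = Elasticsearch::Client.new # assumes a local cluster

client.update(
  index: 'updates',
  id: 1,
  body: {
    script: {
      # tasks is an array of objects keyed by task id, hence tasks[0]['1']
      source: "ctx._source.tasks[0]['1'].status = params.status",
      params: { status: 'Closed' }
    }
  }
)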

Count Documents Matching Multiple Array Criteria

Schema is:
{
  "_id" : ObjectId("594b7e86f59ccd05bb8a90b5"),
  "_class" : "com.notification.model.entity.Notification",
  "notificationReferenceId" : "7917a5365ba246d1bb3664092c59032a",
  "notificationReceivedAt" : ISODate("2017-06-22T08:23:34.382+0000"),
  "sendTo" : [
    {
      "userReferenceId" : "check",
      "mediumAndDestination" : [
        {
          "medium" : "API",
          "status" : "UNREAD"
        }
      ]
    }
  ]
}
{
  "_id" : ObjectId("594b8045f59ccd076dd86063"),
  "_class" : "com.notification.model.entity.Notification",
  "notificationReferenceId" : "6990329330294cbc950ef2b38f6d1a4f",
  "notificationReceivedAt" : ISODate("2017-06-22T08:31:01.299+0000"),
  "sendTo" : [
    {
      "userReferenceId" : "check",
      "mediumAndDestination" : [
        {
          "medium" : "API",
          "status" : "UNREAD"
        }
      ]
    }
  ]
}
{
  "_id" : ObjectId("594b813ef59ccd076dd86064"),
  "_class" : "com.notification.model.entity.Notification",
  "notificationReferenceId" : "3c910cf5fcec42d6bfb78a9baa393efa",
  "notificationReceivedAt" : ISODate("2017-06-22T08:35:10.474+0000"),
  "sendTo" : [
    {
      "userReferenceId" : "check",
      "mediumAndDestination" : [
        {
          "medium" : "API",
          "status" : "UNREAD"
        }
      ]
    },
    {
      "userReferenceId" : "hello",
      "mediumAndDestination" : [
        {
          "medium" : "API",
          "status" : "READ"
        }
      ]
    }
  ]
}
I want to count a user's notifications based on statusList, which is a List. I used mongoOperations to make a query:
Query query = new Query();
query.addCriteria(Criteria.where("sendTo.userReferenceId").is(userReferenceId)
    .andOperator(Criteria.where("sendTo.mediumAndDestination.status").in(statusList)));
long count = mongoOperations.count(query, Notification.class);
I realise I'm doing it wrong because I get a count of 1 when I query for the user with reference ID hello and a statusList containing the single element UNREAD.
How do I perform an aggregated query on array element?
The query needs $elemMatch in order to actually match "within" the array element that matches both criteria:
Query query = new Query(Criteria.where("sendTo")
    .elemMatch(
        Criteria.where("userReferenceId").is("hello")
            .and("mediumAndDestination.status").is("UNREAD")
    ));
Which essentially serializes to:
{
  "sendTo": {
    "$elemMatch": {
      "userReferenceId": "hello",
      "mediumAndDestination.status": "UNREAD"
    }
  }
}
Note that in your question there is no such document; the only matching thing with "hello" actually has the "status" of "READ". If I supply those criteria instead:
{
  "sendTo": {
    "$elemMatch": {
      "userReferenceId": "hello",
      "mediumAndDestination.status": "READ"
    }
  }
}
Then I get the last document:
{
  "_id" : ObjectId("594b813ef59ccd076dd86064"),
  "_class" : "com.notification.model.entity.Notification",
  "notificationReferenceId" : "3c910cf5fcec42d6bfb78a9baa393efa",
  "notificationReceivedAt" : ISODate("2017-06-22T08:35:10.474Z"),
  "sendTo" : [
    {
      "userReferenceId" : "check",
      "mediumAndDestination" : [
        {
          "medium" : "API",
          "status" : "UNREAD"
        }
      ]
    },
    {
      "userReferenceId" : "hello",
      "mediumAndDestination" : [
        {
          "medium" : "API",
          "status" : "READ"
        }
      ]
    }
  ]
}
But with "UNREAD" the count is actually 0 for this sample.
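The serialized filter also drops straight into other drivers; for example, a sketch with the Ruby mongo gem (connection, database, and collection name are placeholders):
require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'test') # hypothetical connection
filter = {
  'sendTo' => {
    '$elemMatch' => {
      'userReferenceId' => 'hello',
      'mediumAndDestination.status' => 'UNREAD'
    }
  }
}
client[:notification].count_documents(filter) # => 0 for this sample data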

Update object in array with new fields mongodb

I have a MongoDB document in which horses is an array of objects with id, name, and type:
{
  "_id" : 33333333333,
  "horses" : [
    {
      "id" : 72029,
      "name" : "Awol",
      "type" : "flat"
    },
    {
      "id" : 822881,
      "name" : "Give Us A Reason",
      "type" : "flat"
    },
    {
      "id" : 826474,
      "name" : "Arabian Revolution",
      "type" : "flat"
    }
  ]
}
I need to add new fields to one of the horses. I tried something like this, but it didn't work:
horse = {
  "place" => 1,
  "body" => 11
}
Card.where({'_id' => 33333333333}).find_and_modify({'$set' => {'horses.' + index.to_s => horse}}, upsert: true)
But all existing fields are removed and the new ones inserted. How can I make it add the new fields to the existing ones?
Indeed, this command will overwrite the whole subdocument:
'$set': {
  'horses.0': {
    "place" : 1,
    "body" : 11
  }
}
You need to set individual fields:
'$set': {
  'horses.0.place': 1,
  'horses.0.body': 11
}
