How to update specific data in this user collection in MongoDB?

I need to do an update to a specific soldier in this user collection:
For example:
user: {
    myArmy: {
        money: 100,
        fans: 100,
        mySoldiers: [{
            _id: ddd111bbb,
            mySkill: 50,
            myStamina: 50,
            myMoral: 50
        }, {
            _id: ddd111dd,
            mySkill: 50,
            myStamina: 50,
            myMoral: 50
        }]
    }
}
I want my update query to do something like the following:
conditions = { _id: user._id };
update =
    { 'myArmy.mySoldiers._id': soldierId },
    {
        '$set': {
            'myArmy.money': balanceToSet,
            'myArmy.fans': fansToSet,
            'myArmy.mySoldiers.$.skill': skillToSet,
            'myArmy.mySoldiers.$.stamina': staminaToSet,
            'myArmy.mySoldiers.$.moral': moralToSet
        }
    }
and this is the final query:
User.update(conditions, update, options, function(err){
    if (err) deferred.reject(err);
    stream.resume();
});
And the end result if soldierId is 'ddd111bbb':
user: {
    myArmy: {
        money: 200,
        fans: 100,
        mySoldiers: [{
            _id: ddd111bbb,
            mySkill: 150,
            myStamina: 250,
            myMoral: 50
        }, {
            _id: ddd111dd,
            mySkill: 50,
            myStamina: 50,
            myMoral: 50
        }]
    }
}
The skill, stamina, and moral values should change only on that specific soldier.
How do I get $ to resolve to the index of this soldier? What is missing from the update query above?

This is what I was looking for:
conditions = { _id: user._id, 'myArmy.mySoldiers._id': soldierId };
update = {
    $set: {
        'myArmy.balance': balanceToSet,
        'myArmy.fans': fansToSet,
        'myArmy.tokens': tokensToSet,
        'myArmy.mySoldiers.$.skill': skillToSet,
        'myArmy.mySoldiers.$.stamina': staminaToSet,
        'myArmy.mySoldiers.$.moral': moralToSet
    }
}
This gave me the result I wanted; earlier I had accidentally merged the condition query into the update one...
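For completeness, here is a minimal sketch of the full corrected call (assumptions: a Mongoose-style User.update, with options, deferred, and stream carried over from the question). Note that the sample documents above actually store mySkill, myStamina, and myMoral, so the $set paths should use those names for the values to land where the example result shows them:

// Matching on both the document _id and the embedded soldier _id is what
// lets the positional $ operator resolve to that soldier's array index.
var conditions = { _id: user._id, 'myArmy.mySoldiers._id': soldierId };
var update = {
    $set: {
        'myArmy.money': balanceToSet,
        'myArmy.fans': fansToSet,
        'myArmy.mySoldiers.$.mySkill': skillToSet,
        'myArmy.mySoldiers.$.myStamina': staminaToSet,
        'myArmy.mySoldiers.$.myMoral': moralToSet
    }
};

User.update(conditions, update, options, function (err) {
    if (err) deferred.reject(err);
    stream.resume();
});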


Assert on the basis of queryTime from the sample response

From the response below, I want to fail the test if the queryTime value is more than 1000 ms.
Response Data:
{
    "metadata": {
        "count": 1,
        "pageSize": 100,
        "page": 1,
        "TotalPages": 1,
        "queryTime": "5224ms"
    },
    "result": {
        "transactionName": "Test"
    }
}
You can use a JSR223 Assertion for this, with a script like the following:
var json = JSON.parse(prev.getResponseDataAsString());
var queryTime = json.metadata.queryTime;
var time = parseInt(queryTime.split("m")[0]);
if (time > 1000) {
    log.info("QueryTime " + time);
    AssertionResult.setFailure(true);
}
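Place the JSR223 Assertion as a child of the sampler whose response you want to check; prev, log, and AssertionResult are the standard bindings JMeter exposes to JSR223 test elements. Optionally, setting a failure message makes the reason visible in the results tree, e.g. inside the if block:
    AssertionResult.setFailureMessage("queryTime was " + time + "ms, expected <= 1000ms");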

Group data in Elasticsearch when the same value appears in two keys

I have just started to learn about Elasticsearch and am facing a problem with a group aggregation. I have a data set in Elasticsearch like:
[{
    srcIP: "10.0.11.12",
    dstIP: "19.67.78.91",
    totalMB: "0.25"
}, {
    srcIP: "10.45.11.62",
    dstIP: "19.67.78.91",
    totalMB: "0.50"
}, {
    srcIP: "13.67.52.91",
    dstIP: "10.0.11.12",
    totalMB: "0.75"
}, {
    srcIP: "10.23.64.12",
    dstIP: "10.45.11.62",
    totalMB: "0.25"
}]
I want to group the data by srcIP and sum the totalMB field, but with one addition: while grouping on srcIP, each key should also be matched against dstIP, and the totalMB for documents where it appears as dstIP should be summed as well.
Output should be like this:
buckets: [{
    key: "10.0.11.12",
    total_GB_SrcIp: {
        value: "0.25"
    },
    total_GB_dstIP: {
        value: "0.75"
    }
},
{
    key: "10.45.11.62",
    total_MB_SrcIp: {
        value: "0.50"
    },
    total_MB_dstIP: {
        value: "0.25"
    }
}]
I have done a normal aggregation on one key, but couldn't get to the final query for my problem.
Query:
GET /index*/_search
{
    "size": 0,
    "aggs": {
        "group_by_srcIP": {
            "terms": {
                "field": "srcIP",
                "size": 100,
                "order": {
                    "total_MB_SrcIp": "desc"
                }
            },
            "aggs": {
                "total_MB_SrcIp": {
                    "sum": {
                        "field": "TotalMB"
                    }
                }
            }
        }
    }
}
I hope the sample output makes my problem clear.
Thanks in advance.
As per my understanding, you need a sum aggregation on a field (totalMB) with respect to the distinct values in two other fields (srcIP, dstIP).
AFAIK, Elasticsearch is not that good at aggregating over the values of multiple fields, unless you combine those fields at document-ingestion time or on the application side. (I may be wrong here, though.)
I gave it a try and got the required output using a scripted_metric aggregation. (Please read about it if you don't know what it is or how it works.)
I wrote a painless script to do the following in the aggregation:
pick srcIp, dstIp & totalMB from each doc
populate a cross-mapping like IP -> { (src : totalMBs), (dst : totalMBs) } in a map
return this map as result of aggregation
Here is the actual search query with aggregation:
GET /testIndex/testType/_search
{
    "size": 0,
    "aggs": {
        "ip-addr": {
            "scripted_metric": {
                "init_script": "params._agg.addrs = []",
                "map_script": "def lst = []; lst.add(doc.srcIP.value); lst.add(doc.dstIP.value); lst.add(doc.totalMB.value); params._agg.addrs.add(lst);",
                "combine_script": "Map ipMap = new HashMap(); for(entry in params._agg.addrs) { def srcIp = entry.get(0); def dstIp = entry.get(1); def mbs = entry.get(2); if(ipMap.containsKey(srcIp)) {def srcMbSum = mbs + ipMap.get(srcIp).get('srcMB'); ipMap.get(srcIp).put('srcMB',srcMbSum); } else {Map types = new HashMap(); types.put('srcMB', mbs); types.put('dstMB', 0.0); ipMap.put(srcIp, types); } if(ipMap.containsKey(dstIp)) {def dstMbSum = mbs + ipMap.get(dstIp).get('dstMB'); ipMap.get(dstIp).put('dstMB',dstMbSum); } else {Map types = new HashMap(); types.put('srcMB', 0.0); types.put('dstMB', mbs); ipMap.put(dstIp, types); } } return ipMap;",
                "reduce_script": "Map resultMap = new HashMap(); for(ipMap in params._aggs) {for(entry in ipMap.entrySet()) {def ip = entry.getKey(); def srcDestMap = entry.getValue(); if(resultMap.containsKey(ip)) {Map types = new HashMap(); types.put('srcMB', srcDestMap.get('srcMB') + resultMap.get(ip).get('srcMB')); types.put('dstMB', srcDestMap.get('dstMB') + resultMap.get(ip).get('dstMB')); resultMap.put(ip, types); } else {resultMap.put(ip, srcDestMap); } } } return resultMap;"
            }
        }
    }
}
Here are experiment details:
Index mapping:
GET testIndex/_mapping
{
    "testIndex": {
        "mappings": {
            "testType": {
                "dynamic": "true",
                "_all": {
                    "enabled": false
                },
                "properties": {
                    "dstIP": {
                        "type": "ip"
                    },
                    "srcIP": {
                        "type": "ip"
                    },
                    "totalMB": {
                        "type": "double"
                    }
                }
            }
        }
    }
}
Sample input:
POST testIndex/testType
{
    "srcIP": "10.0.11.12",
    "dstIP": "19.67.78.91",
    "totalMB": "0.25"
}

POST testIndex/testType
{
    "srcIP": "10.45.11.62",
    "dstIP": "19.67.78.91",
    "totalMB": "0.50"
}

POST testIndex/testType
{
    "srcIP": "13.67.52.91",
    "dstIP": "10.0.11.12",
    "totalMB": "0.75"
}

POST testIndex/testType
{
    "srcIP": "10.23.64.12",
    "dstIP": "10.45.11.62",
    "totalMB": "0.25"
}
Query output:
{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 4,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "ip-addr": {
            "value": {
                "13.67.52.91": {
                    "srcMB": 0.75,
                    "dstMB": 0
                },
                "10.23.64.12": {
                    "srcMB": 0.25,
                    "dstMB": 0
                },
                "10.45.11.62": {
                    "srcMB": 0.5,
                    "dstMB": 0.25
                },
                "19.67.78.91": {
                    "srcMB": 0,
                    "dstMB": 0.75
                },
                "10.0.11.12": {
                    "srcMB": 0.25,
                    "dstMB": 0.75
                }
            }
        }
    }
}
Here is the readable version of the query, for better understanding:
"scripted_metric": {
"init_script": "params._agg.addrs = []",
"map_script": """
def lst = [];
lst.add(doc.srcIP.value);
lst.add(doc.dstIP.value);
lst.add(doc.totalMB.value);
params._agg.addrs.add(lst);
""",
"combine_script": """
Map ipMap = new HashMap();
for(entry in params._agg.addrs) {
def srcIp = entry.get(0);
def dstIp = entry.get(1);
def mbs = entry.get(2);
if(ipMap.containsKey(srcIp)) {
def srcMbSum = mbs + ipMap.get(srcIp).get('srcMB');
ipMap.get(srcIp).put('srcMB',srcMbSum);
} else {
Map types = new HashMap();
types.put('srcMB', mbs);
types.put('dstMB', 0.0);
ipMap.put(srcIp, types);
}
if(ipMap.containsKey(dstIp)) {
def dstMbSum = mbs + ipMap.get(dstIp).get('dstMB');
ipMap.get(dstIp).put('dstMB',dstMbSum);
} else {
Map types = new HashMap();
types.put('srcMB', 0.0);
types.put('dstMB', mbs);
ipMap.put(dstIp, types);
}
}
return ipMap;
""",
"reduce_script": """
Map resultMap = new HashMap();
for(ipMap in params._aggs) {
for(entry in ipMap.entrySet()) {
def ip = entry.getKey();
def srcDestMap = entry.getValue();
if(resultMap.containsKey(ip)) {
Map types = new HashMap();
types.put('srcMB', srcDestMap.get('srcMB') + resultMap.get(ip).get('srcMB'));
types.put('dstMB', srcDestMap.get('dstMB') + resultMap.get(ip).get('dstMB'));
resultMap.put(ip, types);
} else {
resultMap.put(ip, srcDestMap);
}
}
}
return resultMap;
"""
}
However, before going further in depth, I would suggest testing this on some sample data first to check that it works; scripted metric aggregations have a considerable impact on query performance.
One more thing: to get the required key strings in the aggregation result, replace all occurrences of 'srcMB' & 'dstMB' in the scripts with 'total_GB_SrcIp' & 'total_GB_DstIp' as per your need.
Hope this helps you or someone else.
FYI, I tested this on ES v5.6.11.
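If you are on Elasticsearch 7.x or later, note that the scripted_metric script bindings changed: params._agg and params._aggs in the scripts above become state and states respectively; the script logic itself carries over unchanged.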

Formatting currency when exporting amCharts

I'm working with amCharts 3.20.9. I successfully draw a graph and can export the data into an XLSX file. However, one of the columns I'm exporting is a currency; is there a way of setting such a format in the resulting file?
The script I have for the graph is:
var chart = AmCharts.makeChart("graph", {
    "type": "serial",
    "theme": "light",
    "dataProvider": data,
    "valueAxes": [{
        "stackType": "regular",
        "gridColor": "#FFFFFF",
        "gridAlpha": 0.2,
        "dashLength": 0,
        "title": "Metros cúbicos"
    }],
    "gridAboveGraphs": true,
    "startDuration": 1,
    "graphs": graphs,
    "chartCursor": {
        "categoryBalloonEnabled": false,
        "cursorAlpha": 0,
        "zoomable": false
    },
    "categoryField": "formatedTime",
    "categoryAxis": {
        "gridPosition": "start",
        "gridAlpha": 0,
        "tickPosition": "start",
        "tickLength": 20,
        "parseDates": false,
        "labelsEnabled": true,
        "labelFrequency": 3
    },
    "export": {
        "enabled": true,
        "fileName": "Reporte",
        "exportTitles": true,
        "exportFields": fields,
        "columnNames": columnNames,
        "menu": [{
            "class": "export-main",
            "menu": ["PDF", "XLSX"]
        }]
    }
});
Where:
graphs contains the graph definitions, something like:
[{
    "balloonText": "[[formatedTime]]: <b>[[" + sites[i] + "]]</b>",
    "balloonFunction": formater,
    "lineThickness": 1,
    "lineAlpha": 0.2,
    "type": "line",
    "valueField": sites[i]
}];
fields: ["formatedTime", "Viva Villavicencio", "Viva Villavicencio_COST_"]
columnNames: {"formatedTime": "Fecha", "Viva Villavicencio": "Metros cúbicos para: Viva Villavicencio", "Viva Villavicencio_COST_": "Costo para: Viva Villavicencio"}
So far so good; I have my XLSX with the proper data, but in the end I want the column "Viva Villavicencio_COST_" to be defined as currency in the resulting file and therefore formatted and displayed that way.
Any help will be appreciated.
Have a look at the processData option. It takes a callback function that lets you make changes to your dataset before it gets written to your exported file.
So, add to your code:
"export": {
"processData": function(data){
for(var i = 0; i < data.length; i++){
data[i].Viva Villavicencio_COST_ = toCurrency(data[i].Viva Villavicencio_COST_);
}
return data;
}
...
}
This returns the same dataset as before, but with a formatted Viva Villavicencio_COST_ field.
Then add the function toCurrency. I don't believe amCharts has a built-in function for formatting currency. If you need a better formatting function you could use something like numeral.js or accounting.js, but for now try:
function toCurrency(value) {
    return '$' + value;
}
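If you do pull in numeral.js, a sketch of the same helper (assuming the library is loaded on the page; '$0,0.00' is one of its stock format strings):

function toCurrency(value) {
    // numeral() parses the raw number; format() renders it like $1,234.50
    return numeral(value).format('$0,0.00');
}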
Complete docs for the export plugin are here: https://github.com/amcharts/export
Hope that helps.

How to take the shortest distance per person (with multiple addresses) to an origin point and sort on that value

I have People documents in my Elasticsearch index; each person has multiple addresses, and each address has an associated lat/lon point.
I'd like to geo-sort all the people by proximity to a specific origin location, but multiple locations per person complicates this. What has been decided is to take the shortest distance per person to the origin point and use that number as the sort value.
Example of my people index roughed out in 'pseudo-JSON' showing a couple of person documents each having multiple addresses:
person {
    name: John Smith
    addresses [
        { lat: 43.5234, lon: 32.5432, 1 Main St. }
        { lat: 44.983, lon: 37.3432, 2 Queen St. W. }
        { ... more addresses ... }
    ]
}
person {
    name: Jane Doe
    addresses [
        ... she has a bunch of addresses too ...
    ]
}
... many more people docs, each having multiple addresses like above ...
Currently I'm using an Elasticsearch script field with an inline Groovy script, like so. The script calculates metres from the origin for each address, collects those distances into an array per person, and picks the minimum from that array as the person's sort value.
string groovyShortestDistanceMetersSortScript = string.Format(
    "[doc['geo1'].distance({0}, {1}), doc['geo2'].distance({0}, {1})].min()",
    origin.Latitude,
    origin.Longitude);

var shortestMetersSort = new SortDescriptor<Person>()
    .Script(sd => sd
        .Type("number")
        .Script(script => script
            .Inline(groovyShortestDistanceMetersSortScript)
        )
        .Order(SortOrder.Ascending)
    );
Although this works, I wonder whether the scripted field might be too expensive or too complex at query time, and whether there is a better way to achieve the desired sort order by indexing the data differently and/or by using aggregations, maybe even doing away with the script field altogether.
Any thoughts and guidance are appreciated; I'm sure somebody else has run into this same requirement (or similar) and has found a different or better solution.
I'm using the NEST API in this code sample, but I will gladly accept answers in Elasticsearch JSON format because I can port those into NEST code.
When sorting on distance from a specified origin, where the field being sorted on contains a collection of values (in this case geo_point types), we can specify how a value should be picked from the collection using sort_mode. In this case, we can specify a sort_mode of "min" to use the nearest location to the origin as the sort value. Here's an example:
public class Person
{
    public string Name { get; set; }
    public IList<Address> Addresses { get; set; }
}

public class Address
{
    public string Name { get; set; }
    public GeoLocation Location { get; set; }
}

void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var indexName = "people";
    var connectionSettings = new ConnectionSettings(pool)
        .InferMappingFor<Person>(m => m.IndexName(indexName));

    var client = new ElasticClient(connectionSettings);

    if (client.IndexExists(indexName).Exists)
        client.DeleteIndex(indexName);

    client.CreateIndex(indexName, c => c
        .Settings(s => s
            .NumberOfShards(1)
            .NumberOfReplicas(0)
        )
        .Mappings(m => m
            .Map<Person>(mm => mm
                .AutoMap()
                .Properties(p => p
                    .Nested<Address>(n => n
                        .Name(nn => nn.Addresses.First().Location)
                        .AutoMap()
                    )
                )
            )
        )
    );

    var people = new[] {
        new Person {
            Name = "John Smith",
            Addresses = new List<Address>
            {
                new Address
                {
                    Name = "Buckingham Palace",
                    Location = new GeoLocation(51.501476, -0.140634)
                },
                new Address
                {
                    Name = "Empire State Building",
                    Location = new GeoLocation(40.748817, -73.985428)
                }
            }
        },
        new Person {
            Name = "Jane Doe",
            Addresses = new List<Address>
            {
                new Address
                {
                    Name = "Eiffel Tower",
                    Location = new GeoLocation(48.858257, 2.294511)
                },
                new Address
                {
                    Name = "Uluru",
                    Location = new GeoLocation(-25.383333, 131.083333)
                }
            }
        }
    };

    client.IndexMany(people);

    // call refresh for testing (avoid in production)
    client.Refresh("people");

    var towerOfLondon = new GeoLocation(51.507313, -0.074308);

    client.Search<Person>(s => s
        .MatchAll()
        .Sort(so => so
            .GeoDistance(g => g
                .Field(f => f.Addresses.First().Location)
                .PinTo(towerOfLondon)
                .Ascending()
                .Unit(DistanceUnit.Meters)
                // Take the minimum address location distance from
                // our target location, The Tower of London
                .Mode(SortMode.Min)
            )
        )
    );
}
This creates the following search:
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "_geo_distance": {
                "addresses.location": [
                    {
                        "lat": 51.507313,
                        "lon": -0.074308
                    }
                ],
                "order": "asc",
                "mode": "min",
                "unit": "m"
            }
        }
    ]
}
which returns
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": null,
        "hits": [ {
            "_index": "people",
            "_type": "person",
            "_id": "AVcxBKuPlWTRBymPa4yT",
            "_score": null,
            "_source": {
                "name": "John Smith",
                "addresses": [ {
                    "name": "Buckingham Palace",
                    "location": {
                        "lat": 51.501476,
                        "lon": -0.140634
                    }
                }, {
                    "name": "Empire State Building",
                    "location": {
                        "lat": 40.748817,
                        "lon": -73.985428
                    }
                } ]
            },
            "sort": [ 4632.035195223564 ]
        }, {
            "_index": "people",
            "_type": "person",
            "_id": "AVcxBKuPlWTRBymPa4yU",
            "_score": null,
            "_source": {
                "name": "Jane Doe",
                "addresses": [ {
                    "name": "Eiffel Tower",
                    "location": {
                        "lat": 48.858257,
                        "lon": 2.294511
                    }
                }, {
                    "name": "Uluru",
                    "location": {
                        "lat": -25.383333,
                        "lon": 131.083333
                    }
                } ]
            },
            "sort": [ 339100.6843074794 ]
        } ]
    }
}
The value returned in the sort array for each hit is the minimum distance, in the specified sort unit (in our case, metres), between the specified point (The Tower of London) and each person's addresses.
Per the guidelines in the Sorting by Distance documentation, it can often make more sense to score by distance instead, which can be achieved by using a function_score query with a decay function.
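For reference, a rough sketch of that alternative against the mapping above (untested; the 5km scale is an arbitrary choice). Because addresses is a nested type, the decay function sits inside a nested query, and score_mode: max makes the nearest address determine the person's score:

{
    "query": {
        "nested": {
            "path": "addresses",
            "score_mode": "max",
            "query": {
                "function_score": {
                    "functions": [
                        {
                            "gauss": {
                                "addresses.location": {
                                    "origin": { "lat": 51.507313, "lon": -0.074308 },
                                    "scale": "5km"
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}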

MongoDB Data Model Optimization

About
I have raw data which is processed by the aggregation framework, and the results are later saved in another collection. Let's assume the aggregation results in something like this:
cursor = {
    "result": [
        {
            "_id": {
                "x": 1,
                "version": [
                    "2_0"
                ],
                "date": {
                    "year": 2015,
                    "month": 3,
                    "day": 26
                }
            },
            "x": 1,
            "count": 2
        }
    ],
    "ok": 1
};
Note that in most cases the cursor contains more than about 2k elements.
So now I'm looping through the cursor (cursor.forEach) and performing the following steps:
// Try to increment values:
var inc = db.runCommand({
    findAndModify: my_coll,
    query: {
        "_id.x": "1",
        "value.2_0": {
            "$elemMatch": {
                "date": ISODate("2015-12-18T00:00:00Z")
            }
        }
    },
    update: { $inc: {
        "value.2_0.$.x": 1
    } }
});

// If no row was affected by the inc operation, the sub-element doesn't exist at all,
// so push it
if (inc.value == null) {
    date[date.key] = date.value;
    var up = db.getCollection(my_coll).update(
        {
            "_id.x": 1
        },
        {
            $push: {}
        },
        { writeConcern: { w: "majority", wtimeout: 5000 } }
    );
    // No document found to insert the sub-element into, so create it
    if (up.nMatched == 0) {
        db.getCollection(my_coll).insert({
            "_id": {
                "x": 1
            },
            "value": {}
        });
    }
}
Resulting data-structure:
data = {
    "_id": {
        "x": 1,
        "y": 1
    },
    "value": {
        "2_0": [
            {
                "date": ISODate("2014-12-17T00:00:00.000Z"),
                "x": 1
            },
            {
                "date": ISODate("2014-12-18T00:00:00.000Z"),
                "x": 2
            }
        ]
    }
};
In short, I have to apply these actions to process my data (a consolidated sketch follows this list):
Try to increment values.
If no data was affected by the increment operation, push the data to the array.
If no data was affected by the push operation, create a new document.
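Put together, the per-element flow looks roughly like this in mongo shell JavaScript (a sketch only: upsertPoint is a hypothetical helper name, the version string "2_0" and the increment of 1 are illustrative, and the empty $push placeholder from the snippet above is filled in with the obvious sub-element):

function upsertPoint(coll, x, version, date) {
    var arrayField = "value." + version;

    // 1. Try to increment an existing sub-element matching this date
    var queryInc = { "_id.x": x };
    queryInc[arrayField] = { $elemMatch: { date: date } };
    var update = { $inc: {} };
    update.$inc[arrayField + ".$.x"] = 1;
    var previous = coll.findAndModify({ query: queryInc, update: update });
    if (previous !== null) return;

    // 2. No matching sub-element: push a new one onto the array
    var push = { $push: {} };
    push.$push[arrayField] = { date: date, x: 1 };
    var up = coll.update({ "_id.x": x }, push);
    if (up.nMatched > 0) return;

    // 3. No document at all: create it with the first sub-element
    var doc = { _id: { x: x }, value: {} };
    doc.value[version] = [{ date: date, x: 1 }];
    coll.insert(doc);
}

// e.g. upsertPoint(db.getCollection(my_coll), 1, "2_0", ISODate("2015-12-18T00:00:00Z"));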
Problem:
In some cases the aggregation returns more than 2k results to which I have to apply the steps above, and this causes a performance bottleneck. While I'm processing the already-aggregated data, new raw data accumulates for aggregation, and later I cannot even run the aggregation over this new raw data because, due to the slowness of the first stage, it exceeds the 64MB size limit.
Question:
How can I, with this data structure, improve the performance of increasing the x values (see the data structure above) or of adding sub-elements?
Also, I cannot use MongoDB bulk operations because of the nested structure and the positional parameter.
Maybe the chosen data model is not correct? Or maybe I'm not doing the aggregation task correctly at all?
How can I improve the insertion of aggregated data?
