I have just started to learn about Elasticsearch and I'm facing a problem with group aggregation. I have a data set in Elasticsearch like:
[{
srcIP : "10.0.11.12",
dstIP : "19.67.78.91",
totalMB : "0.25"
},{
srcIP : "10.45.11.62",
dstIP : "19.67.78.91",
totalMB : "0.50"
},{
srcIP : "13.67.52.91",
dstIP : "10.0.11.12",
totalMB : "0.75"
},{
srcIP : "10.23.64.12",
dstIP : "10.45.11.62",
totalMB : "0.25"
}]
I just want to group the data by srcIP and sum the field totalMB, but with one addition: while grouping on srcIP, each key should also be matched against the dstIP values, and the totalMB for those documents summed as well.
The output should look like this:
buckets : [{
    key : "10.0.11.12",
    total_MB_SrcIp : {
        value : "0.25"
    },
    total_MB_dstIP : {
        value : "0.75"
    }
},
{
    key : "10.45.11.62",
    total_MB_SrcIp : {
        value : "0.50"
    },
    total_MB_dstIP : {
        value : "0.25"
    }
}]
I have done a normal aggregation on one key, but couldn't work out the final query for my problem.
Query:
GET /index*/_search
{
"size": 0,
"aggs": {
"group_by_srcIP": {
"terms": {
"field": "srcIP",
"size": 100,
"order": {
"total_MB_SrcIp": "desc"
}
},
"aggs": {
"total_MB_SrcIp": {
"sum": {
"field": "TotalMB"
}
}
}
}
}
}
I hope you can understand my problem from the sample output.
Thanks in advance.
As per my understanding, you need a sum aggregation on one field (totalMB) with respect to the distinct values in two other fields (srcIP, dstIP).
AFAIK, Elasticsearch is not that good at aggregating over the values of multiple fields, unless you combine those fields at document-ingestion time or combine the results on the application side. (I may be wrong here, though.)
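If merging on the application side is acceptable, the per-field sums themselves are easy to get in one request with two independent terms aggregations, whose buckets you then join by IP in your application. A sketch (my addition for illustration, using the test index described below):
GET /testIndex/testType/_search
{
  "size": 0,
  "aggs": {
    "by_src": {
      "terms": { "field": "srcIP", "size": 100 },
      "aggs": { "total_MB_SrcIp": { "sum": { "field": "totalMB" } } }
    },
    "by_dst": {
      "terms": { "field": "dstIP", "size": 100 },
      "aggs": { "total_MB_dstIP": { "sum": { "field": "totalMB" } } }
    }
  }
}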
To get the required output in a single Elasticsearch response, though, I gave it a try using a scripted_metric aggregation. (Please read about it if you don't know what it is or how it works.)
The Painless script does the following in the aggregation:
pick srcIP, dstIP & totalMB from each doc
populate a cross-mapping like IP -> { (src : totalMBs), (dst : totalMBs) } in a map
return this map as the result of the aggregation
Here is the actual search query with aggregation:
GET /testIndex/testType/_search
{
"size": 0,
"aggs": {
"ip-addr": {
"scripted_metric": {
"init_script": "params._agg.addrs = []",
"map_script": "def lst = []; lst.add(doc.srcIP.value); lst.add(doc.dstIP.value); lst.add(doc.totalMB.value); params._agg.addrs.add(lst);",
"combine_script": "Map ipMap = new HashMap(); for(entry in params._agg.addrs) { def srcIp = entry.get(0); def dstIp = entry.get(1); def mbs = entry.get(2); if(ipMap.containsKey(srcIp)) {def srcMbSum = mbs + ipMap.get(srcIp).get('srcMB'); ipMap.get(srcIp).put('srcMB',srcMbSum); } else {Map types = new HashMap(); types.put('srcMB', mbs); types.put('dstMB', 0.0); ipMap.put(srcIp, types); } if(ipMap.containsKey(dstIp)) {def dstMbSum = mbs + ipMap.get(dstIp).get('dstMB'); ipMap.get(dstIp).put('dstMB',dstMbSum); } else {Map types = new HashMap(); types.put('srcMB', 0.0); types.put('dstMB', mbs); ipMap.put(dstIp, types); } } return ipMap;",
"reduce_script": "Map resultMap = new HashMap(); for(ipMap in params._aggs) {for(entry in ipMap.entrySet()) {def ip = entry.getKey(); def srcDestMap = entry.getValue(); if(resultMap.containsKey(ip)) {Map types = new HashMap(); types.put('srcMB', srcDestMap.get('srcMB') + resultMap.get(ip).get('srcMB')); types.put('dstMB', srcDestMap.get('dstMB') + resultMap.get(ip).get('dstMB')); resultMap.put(ip, types); } else {resultMap.put(ip, srcDestMap); } } } return resultMap;"
}
}
}
}
Here are the experiment details:
Index mapping:
GET testIndex/_mapping
{
"testIndex": {
"mappings": {
"testType": {
"dynamic": "true",
"_all": {
"enabled": false
},
"properties": {
"dstIP": {
"type": "ip"
},
"srcIP": {
"type": "ip"
},
"totalMB": {
"type": "double"
}
}
}
}
}
}
Sample input:
POST testIndex/testType
{
"srcIP" : "10.0.11.12",
"dstIP" : "19.67.78.91",
"totalMB" : "0.25"
}
POST testIndex/testType
{
"srcIP" : "10.45.11.62",
"dstIP" : "19.67.78.91",
"totalMB" : "0.50"
}
POST testIndex/testType
{
"srcIP" : "13.67.52.91",
"dstIP" : "10.0.11.12",
"totalMB" : "0.75"
}
POST testIndex/testType
{
"srcIP" : "10.23.64.12",
"dstIP" : "10.45.11.62",
"totalMB" : "0.25"
}
Query output:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"ip-addr": {
"value": {
"13.67.52.91": {
"srcMB": 0.75,
"dstMB": 0
},
"10.23.64.12": {
"srcMB": 0.25,
"dstMB": 0
},
"10.45.11.62": {
"srcMB": 0.5,
"dstMB": 0.25
},
"19.67.78.91": {
"srcMB": 0,
"dstMB": 0.75
},
"10.0.11.12": {
"srcMB": 0.25,
"dstMB": 0.75
}
}
}
}
}
Here is the query in readable form, for better understanding:
"scripted_metric": {
"init_script": "params._agg.addrs = []",
"map_script": """
def lst = [];
lst.add(doc.srcIP.value);
lst.add(doc.dstIP.value);
lst.add(doc.totalMB.value);
params._agg.addrs.add(lst);
""",
"combine_script": """
Map ipMap = new HashMap();
for(entry in params._agg.addrs) {
def srcIp = entry.get(0);
def dstIp = entry.get(1);
def mbs = entry.get(2);
if(ipMap.containsKey(srcIp)) {
def srcMbSum = mbs + ipMap.get(srcIp).get('srcMB');
ipMap.get(srcIp).put('srcMB',srcMbSum);
} else {
Map types = new HashMap();
types.put('srcMB', mbs);
types.put('dstMB', 0.0);
ipMap.put(srcIp, types);
}
if(ipMap.containsKey(dstIp)) {
def dstMbSum = mbs + ipMap.get(dstIp).get('dstMB');
ipMap.get(dstIp).put('dstMB',dstMbSum);
} else {
Map types = new HashMap();
types.put('srcMB', 0.0);
types.put('dstMB', mbs);
ipMap.put(dstIp, types);
}
}
return ipMap;
""",
"reduce_script": """
Map resultMap = new HashMap();
for(ipMap in params._aggs) {
for(entry in ipMap.entrySet()) {
def ip = entry.getKey();
def srcDestMap = entry.getValue();
if(resultMap.containsKey(ip)) {
Map types = new HashMap();
types.put('srcMB', srcDestMap.get('srcMB') + resultMap.get(ip).get('srcMB'));
types.put('dstMB', srcDestMap.get('dstMB') + resultMap.get(ip).get('dstMB'));
resultMap.put(ip, types);
} else {
resultMap.put(ip, srcDestMap);
}
}
}
return resultMap;
"""
}
However, before going further, I would suggest you test it out on some sample data and check whether it works for you; scripted metric aggregations have a considerable impact on query performance.
One more thing: to get the required key strings in the aggregation result, replace all occurrences of 'srcMB' and 'dstMB' in the scripts with 'total_MB_SrcIp' and 'total_MB_dstIP', as per your needs.
Hope this helps you or someone else.
FYI, I tested this on ES v5.6.11.
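A version caveat worth double-checking against the docs for your release: newer Elasticsearch versions changed the scripted_metric script context. In 6.x the params._agg / params._aggs variables were deprecated, and from 7.x the scripts use state and states instead. A sketch of the same skeleton for 7.x, assuming nothing else changes:
"scripted_metric": {
  "init_script": "state.addrs = []",
  "map_script": "def lst = []; lst.add(doc.srcIP.value); lst.add(doc.dstIP.value); lst.add(doc.totalMB.value); state.addrs.add(lst);",
  "combine_script": "... same combine logic as above, iterating over state.addrs ...",
  "reduce_script": "... same reduce logic as above, iterating over states instead of params._aggs ..."
}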
I'm trying to create dynamically nested JSON in Go. I understand that Go is statically typed and that there are various ways to create dynamic objects (interfaces), and I'm wondering if there is a way to tackle my dependency mapping in nested JSON.
Sample JSON:
[
{
"display" : "Environment" ,
"field" : "test_env" ,
"value" : {
"display" : "staging",
"value" : "s"
},
"type" : "drop-down" ,
"data" : [
{
"display" : "version" ,
"field" : "test_version" ,
"value" : {
"display" : "1.1.9" ,
"value" : "1.1.9"
},
"type" : "drop-down" ,
"data" : [
{
"display" : "DataCenter" ,
"field" : "test_dc" ,
"value" : {
"display" : "washington",
"value" : "wa"
},
"type" : "drop-down" ,
"data" : [{
"display" : "Secondary" ,
"field" : "test_secondary_dc" ,
"value" : {
"display" : "miami" ,
"value" : "mi"
},
"type" : "drop-down" ,
"data" : [{
"display" : "Size" ,
"field" : "test_size" ,
"value" : {
"display" : "small" ,
"value" : "s"
}
}]
}]
}
]
}
]
},
{
"display" : "Environment" ,
"field" : "test_env" ,
"value" : {
"display" : "production",
"value" : "p"
},
"type" : "drop-down" ,
"data" : [
{
"display" : "version" ,
"field" : "test_version" ,
"value" : {
"display" : "1.1.9" ,
"value" : "1.1.9"
},
"type" : "drop-down" ,
"data" : [
{
"display" : "DataCenter" ,
"field" : "test_dc" ,
"value" : {
"display" : "miami",
"value" : "mi"
},
"type" : "drop-down" ,
"data" : [{
"display" : "Secondary" ,
"field" : "test_secondary_dc" ,
"value" : {
"display" : "washington" ,
"value" : "wa"
},
"type" : "drop-down" ,
"data" : [{
"display" : "Size" ,
"field" : "test_size" ,
"value" : {
"display" : "medium" ,
"value" : "m"
}
}]
}]
}
]
}
]
}
]
Sample code:
package main
import (
"fmt"
"reflect"
)
// struct definition ///
type RootElem struct {
RDisplay string `json:"display"`
RField string `json:"field"`
RType string `json:"type"`
RData RSlice `json:"data"`
RValue RValue `json:"value"`
}
type RValue struct {
Display string `json:"display"`
Evalue string `json:"value"`
}
type Vars struct {
Env string `json:"environment"`
Version string `json:"version"`
Zone string `json:"zone"`
PDcenter string `json:"primary_dc"`
SDcenter string `json:"secondary_dc,omitempty"`
Size string `json:"size"`
}
type RSlice []RootElem
func valueFactory(etype, evalue string) string {
switch (etype) {
case "ENVIRONMENT":
return environmentValue(evalue);
case "VERSION":
return versionValue(evalue);
case "ZONE":
return zoneValue(evalue);
case "PRIMARYDC":
return primaryValue(evalue);
case "SECONDARYDC":
return secondaryValue(evalue);
case "SIZE":
return sizeValue(evalue);
default:
return("Specifying a type we don't have.");
}
}
func sizeValue(sz string) string {
switch (sz) {
case "Small":
return "s"
case "Medium":
return "m"
case "Large" :
return "l"
default:
return "This is not a size environment value"
}
}
func environmentValue(env string) string {
switch (env) {
case "Production":
return "p"
case "staging":
return "s"
default:
return "This is not a valid environment value"
}
}
func versionValue(ver string) string {
switch (ver) {
case "1.1.9":
return "1.1.9"
default:
return "This is not a valid version value"
}
}
func zoneValue(zone string) string {
switch (zone) {
case "BLACK":
return "Black"
case "GREEN" :
return "Green"
default:
return "This is not a valid zone value"
}
}
func primaryValue(pdc string) string {
switch (pdc) {
case "washington ":
return "wa"
case "Miami" :
return "mi"
default:
return "This is not a valid primary data center value"
}
}
func secondaryValue(sdc string) string {
switch (sdc) {
case "washington":
return "wa"
case "Miami" :
return "mi"
default:
return "This is not a valid secondary data center value"
}
}
func dataGeneric(display, field, etype string) (relm RootElem) {
relm.RDisplay = display
relm.RField = field
relm.RValue.Display = ""
relm.RValue.Evalue = ""
relm.RType = etype
return relm
}
func dataEnvironment() RootElem {
display := "Environment"
field := "test_env"
etype := "dropdown"
return dataGeneric(display, field, etype)
}
func dataVersion() RootElem {
display := "Version"
field := "test_version"
etype := "dropdown"
return dataGeneric(display, field, etype)
}
func dataZone() RootElem {
display := "Zone"
field := "test_zone"
etype := "dropdown"
return dataGeneric(display, field, etype)
}
func dataPrimary() RootElem {
display := "Primary Data Center"
field := "test_dc"
etype := "dropdown"
return dataGeneric(display, field, etype)
}
func dataSecondary() RootElem {
display := "Secondary Data Center"
field := "test_secondary_dc"
etype := "dropdown"
return dataGeneric(display, field, etype)
}
func dataSize() RootElem {
display := "size"
field := "test_size"
etype := "dropdown"
return dataGeneric(display, field, etype)
}
func dataFactory(etype string) RootElem {
var rem RootElem
switch (etype) {
case "ENVIRONMENT":
return dataEnvironment()
case "VERSION":
return dataVersion()
case "ZONE":
return dataZone()
case "PRIMARYDC":
return dataPrimary()
case "SECONDARYDC":
return dataSecondary()
case "SIZE":
return dataSize()
}
return rem
}
func main() {
// sample element ///
var elment = Vars{
Env: "Production" ,
Version: "1.1.9" ,
Zone: "GREEN" ,
PDcenter: "Washington" ,
SDcenter: "Miami" ,
Size: "Small" ,
}
var Dict = []string{"ENVIRONMENT" , "VERSION" , "ZONE" , "PRIMARYDC" , "SECONDARYDC" , "SIZE" }
var newData, finalElem RootElem
for i := 0 ; i < reflect.ValueOf(elment).NumField() ; i++ {
currentElement := reflect.ValueOf(elment).Field(i).Interface()
currentElemType := Dict[i]
newData = dataFactory(currentElemType)
newData.RValue.Display = currentElement.(string)
newData.RValue.Evalue = valueFactory(currentElemType, currentElement.(string))
if finalElem.RDisplay == "" {
finalElem = newData
} else {
if len(finalElem.RData) == 0 {
finalElem.RData = append(finalElem.RData, newData)
} else {
if len(finalElem.RData[0].RData) == 0 {
finalElem.RData[0].RData = append( finalElem.RData[0].RData , newData)
} else {
if len(finalElem.RData[0].RData[0].RData) == 0 {
finalElem.RData[0].RData[0].RData = append (finalElem.RData[0].RData[0].RData , newData)
} else {
if len(finalElem.RData[0].RData[0].RData[0].RData) == 0 {
finalElem.RData[0].RData[0].RData[0].RData = append(finalElem.RData[0].RData[0].RData[0].RData, newData )
} else {
finalElem.RData[0].RData[0].RData[0].RData[0].RData = append(finalElem.RData[0].RData[0].RData[0].RData[0].RData, newData)
}
}
}
}
}
}
fmt.Println("final element" , finalElem)
}
I'm wondering if there is a way to write a recursive function for creating dynamically nested JSON in Go?
Thanks.
I don't know exactly what you are trying to achieve. I ran your application, and you are building a tree from the flat structure; why, and what your original plan was, is not clear.
Yet, your application's ever-growing if tree always does the same thing with the last appended RootElem, and can be written as follows. As you can see, the if structure is now independent of NumField():
var appendHere *RootElem
for i := 0; i < reflect.ValueOf(elment).NumField(); i++ {
[ ... stuff deleted ... ]
if finalElem.RDisplay == "" {
finalElem = newData
appendHere = &finalElem
} else {
appendHere.RData = append(appendHere.RData, newData)
appendHere = &(appendHere.RData[0])
}
}
fmt.Println("final element", finalElem)
}
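Since you asked specifically about recursion, here is a minimal sketch of a recursive builder (buildNested is a hypothetical helper, not from your code; it assumes you first collect the elements into a flat slice in nesting order):
// buildNested turns a flat, ordered slice into the nested RData chain:
// elems[0] wraps elems[1], which wraps elems[2], and so on.
func buildNested(elems []RootElem) *RootElem {
	if len(elems) == 0 {
		return nil
	}
	root := elems[0]
	if child := buildNested(elems[1:]); child != nil {
		root.RData = append(root.RData, *child)
	}
	return &root
}
In main you would append each newData to a []RootElem inside the loop, then call buildNested once afterwards and dereference the result into finalElem; the nested if chain disappears entirely.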
I would have written this as a comment, but the answer is too large for one.
I'm working with amCharts 3.20.9. I successfully draw a graph and can export the data into an XLSX file. However, one of the columns I'm exporting is a currency; is there a way to set such a format in the resulting file?
The script I have for the graph is:
var chart = AmCharts.makeChart("graph", {
"type" : "serial",
"theme" : "light",
"dataProvider" : data,
"valueAxes" : [ {
"stackType": "regular",
"gridColor" : "#FFFFFF",
"gridAlpha" : 0.2,
"dashLength" : 0,
"title" : "Metros cúbicos"
} ],
"gridAboveGraphs" : true,
"startDuration" : 1,
"graphs" : graphs,
"chartCursor" : {
"categoryBalloonEnabled" : false,
"cursorAlpha" : 0,
"zoomable" : false
},
"categoryField" : "formatedTime",
"categoryAxis" : {
"gridPosition" : "start",
"gridAlpha" : 0,
"tickPosition" : "start",
"tickLength" : 20,
"parseDates" : false,
"labelsEnabled": true,
"labelFrequency": 3
},
"export" : {
"enabled" : true,
"fileName" : "Reporte",
"exportTitles" : true,
"exportFields" : fields,
"columnNames" : columnNames,
"menu" : [ {
"class" : "export-main",
"menu" : [ "PDF", "XLSX" ]
} ]
}
});
Where:
graphs contains the graph definitions, something like:
[{
"balloonText" : "[[formatedTime]]: <b>[[" + sites[i] + "]]</b>",
"balloonFunction" : formater,
"lineThickness": 1,
"lineAlpha" : 0.2,
"type" : "line",
"valueField" : sites[i]
}];
fields: ["formatedTime", "Viva Villavicencio", "Viva Villavicencio_COST_"]
columnNames: {"formatedTime": "Fecha", "Viva Villavicencio": "Metros cúbicos para: Viva Villavicencio", "Viva Villavicencio_COST_": "Costo para: Viva Villavicencio"}
So far so good: I have my XLSX with the proper data. But in the end I want the column "Viva Villavicencio_COST_" to be defined as a currency in the resulting file, and therefore formatted and displayed that way.
Any help will be appreciated.
Have a look at the processData option. It takes a callback function that lets you make changes to your dataset before it gets written to your exported file.
So, add to your code:
"export": {
"processData": function(data){
for(var i = 0; i < data.length; i++){
data[i].Viva Villavicencio_COST_ = toCurrency(data[i].Viva Villavicencio_COST_);
}
return data;
}
...
}
This returns the same dataset as before, but with the Viva Villavicencio_COST_ field formatted.
Then, add the toCurrency function. I don't believe amCharts has a built-in formatting function. If you need a better formatting function you could use something like numeral.js or accounting.js, but for now try:
function toCurrency(value){
return '$' + value;
}
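If you want something slightly more robust without pulling in a library, a variant along these lines should work (my own sketch; it assumes the value is numeric):
function toCurrency(value) {
    // Coerce to a number, fix two decimals, then insert thousands separators.
    return '$' + Number(value).toFixed(2).replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}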
Complete docs for the export plugin are here: https://github.com/amcharts/export
Hope that helps.
I need to do an update to a specific soldier in this user collection:
For example:
user: {
myArmy : {
money : 100,
fans : 100,
mySoldiers : [{
_id : ddd111bbb,
mySkill : 50,
myStamina : 50,
myMoral : 50,
},
{
_id : ddd111dd ,
mySkill : 50,
myStamina : 50,
myMoral : 50,
}],
}
}
I want my update query to do something like the following:
conditions = { _id : user._id };
update =
{ 'myArmy.mySoldiers._id' : soldierId},
{
'$set': {
'myArmy.money' : balanceToSet,
'myArmy.fans' : fansToSet,
'myArmy.mySoldiers.$.skill': skillToSet,
'myArmy.mySoldiers.$.stamina': staminaToSet,
'myArmy.mySoldiers.$.moral': moralToSet
}
}
and this is the final query:
User.update(conditions, update, options, function(err){
if (err) deferred.reject;
stream.resume();
});
And the end result if soldierId is 'ddd111bbb':
user: {
myArmy : {
money : 200,
fans : 100,
mySoldiers : [{
_id : ddd111bbb,
mySkill : 150,
myStamina : 250,
myMoral : 50,
},
{
_id : ddd111dd ,
mySkill : 50,
myStamina : 50,
myMoral : 50,
}],
}
}
The skill, moral and stamina values should change only on the specific soldier.
How do I get $ to resolve to the index of this soldier? What is missing from the update query above?
This is what I was looking for:
conditions = { _id : user._id , 'myArmy.mySoldiers._id' : soldierId};
update = {
$set: {
'myArmy.balance': balanceToSet,
'myArmy.fans' : fansToSet,
'myArmy.tokens' : tokensToSet,
'myArmy.mySoldiers.$.skill' : skillToSet,
'myArmy.mySoldiers.$.stamina': staminaToSet,
'myArmy.mySoldiers.$.moral' : moralToSet
}
}
This gave me the result I wanted. My earlier mistake was that I had accidentally put the array-match condition into the update document instead of the query conditions...
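As a side note, on MongoDB 3.6+ the same update can also be written with arrayFilters, which avoids some limitations of the positional $ operator. A sketch using the field names from the sample document (check that your driver passes the option through):
conditions = { _id : user._id };
update = {
    $set : {
        'myArmy.mySoldiers.$[s].mySkill'   : skillToSet,
        'myArmy.mySoldiers.$[s].myStamina' : staminaToSet,
        'myArmy.mySoldiers.$[s].myMoral'   : moralToSet
    }
};
options = { arrayFilters : [ { 's._id' : soldierId } ] };
User.update(conditions, update, options, function(err) { /* ... */ });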
I have raw data which is processed by the aggregation framework, with the results later saved into another collection. Let's assume the aggregation results in something like this:
cursor = {
"result" : [
{
"_id" : {
"x" : 1,
"version" : [
"2_0"
],
"date" : {
"year" : 2015,
"month" : 3,
"day" : 26
}
},
"x" : 1,
"count" : 2
}
],
"ok" : 1
};
Note that in most cases the cursor holds more than about 2k elements.
So now I'm looping through the cursor (cursor.forEach) and performing the following steps:
// Try to increment values:
var inc = db.runCommand({
findAndModify: my_coll,
query : {
"_id.x" : "1",
"value.2_0" : {
"$elemMatch" : {
"date" : ISODate("2015-12-18T00:00:00Z")
}
}
},
update : { $inc: {
"value.2_0.$.x" : 1
} }
});
// If no row was affected by the $inc operation, the sub-element doesn't exist at all,
// so let's push it
if (inc.value == null) {
date[date.key] = date.value;
var up = db.getCollection(my_coll).update(
{
"_id.x" : 1
},
{
$push : {}
},
{ writeConcern: { w: "majority", wtimeout: 5000 } }
);
// No document found for inserting sub element, let's create it
if (up.nMatched == 0) {
db.getCollection(my_coll).insert({
"_id" : {
"x" : 1
},
"value" : {}
});
}}
Resulting data-structure:
data = {
"_id" : {
"x" : 1,
"y" : 1
},
"value" : {
"2_0" : [
{
"date" : ISODate("2014-12-17T00:00:00.000Z"),
"x" : 1
},
{
"date" : ISODate("2014-12-18T00:00:00.000Z"),
"x" : 2
}
]
}
};
In short, I have to apply these actions to process my data:
Try to increment values.
If no data is affected by the increment operation, push the data to the array.
If no data is affected by the push operation, create a new document.
Problem:
In some cases the aggregation returns more than 2k results to which I have to apply the steps above, and this causes a performance bottleneck. While I'm processing the already-aggregated data, new raw data accumulates for aggregation, and later I cannot even run the aggregation on this new raw data because it exceeds the 64MB size limit, due to the slowness of the first stage.
Question:
How can I improve performance with this data structure when incrementing the x values (see the data structure) or adding sub-elements?
Also, I cannot apply MongoDB bulk operations because of the nested structure's use of the positional parameter (but see the sketch at the end of this question).
Maybe the chosen data model is not correct? Or maybe I'm not doing the aggregation task correctly at all?
How can I improve the insertion of aggregated data?
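For reference, a sketch of the bulk direction for the increment step (untested against this schema; someDate is a placeholder for the date derived from each aggregation row, and the push/insert fallbacks would still need a second pass over the rows the bulk $inc did not match):
var bulk = db.getCollection(my_coll).initializeUnorderedBulkOp();
cursor.forEach(function(row) {
    // Queue one positional $inc per aggregated row instead of one round trip each.
    bulk.find({
        "_id.x" : row._id.x,
        "value.2_0" : { "$elemMatch" : { "date" : someDate } }
    }).updateOne({ $inc : { "value.2_0.$.x" : row.count } });
});
bulk.execute();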