MongoDB Data Model Optimization - performance

About
I have raw data that is processed by the aggregation framework, and the results are later saved into another collection. Let's assume the aggregation results in something like this:
cursor = {
"result" : [
{
"_id" : {
"x" : 1,
"version" : [
"2_0"
],
"date" : {
"year" : 2015,
"month" : 3,
"day" : 26
}
},
"x" : 1,
"count" : 2
}
],
"ok" : 1
};
Note that in most cases the cursor holds more than about 2,000 elements.
So now I'm looping through the cursor (cursor.forEach) and performing the following steps:
// Try to increment values:
var inc = db.runCommand({
findAndModify: my_coll,
query : {
"_id.x" : 1,
"value.2_0" : {
"$elemMatch" : {
"date" : ISODate("2015-12-18T00:00:00Z")
}
}
},
update : { $inc: {
"value.2_0.$.x" : 1
} }
});
// If the increment didn't affect any document, the sub-element
// doesn't exist yet, so push it:
if (inc.value == null) {
var up = db.getCollection(my_coll).update(
{
"_id.x" : 1
},
{
$push : {
"value.2_0" : {
"date" : ISODate("2015-12-18T00:00:00Z"),
"x" : 1
}
}
},
{ writeConcern: { w: "majority", wtimeout: 5000 } }
);
// No document matched the push either, so create it:
if (up.nMatched == 0) {
db.getCollection(my_coll).insert({
"_id" : {
"x" : 1
},
"value" : {
"2_0" : [
{
"date" : ISODate("2015-12-18T00:00:00Z"),
"x" : 1
}
]
}
});
}
}
Resulting data-structure:
data = {
"_id" : {
"x" : 1,
"y" : 1
},
"value" : {
"2_0" : [
{
"date" : ISODate("2014-12-17T00:00:00.000Z"),
"x" : 1
},
{
"date" : ISODate("2014-12-18T00:00:00.000Z"),
"x" : 2
}
]
}
};
In short, I have to apply these steps to process my data:
Try to increment values.
If no data was affected by the increment, push the sub-element onto the array.
If no document was matched by the push, create a new document.
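The three steps above can be sketched in plain JavaScript against an in-memory store standing in for the collection; `store` and `upsertPoint` are illustrative names, not MongoDB API:

```javascript
// Minimal sketch of the "increment, else push, else insert" logic against
// a plain in-memory array (a stand-in for the collection).
function upsertPoint(store, id, version, date, by) {
  let doc = store.find(d => d._id.x === id);
  if (!doc) {
    // Step 3: no document at all, create it with the sub-element.
    store.push({ _id: { x: id }, value: { [version]: [{ date: date, x: by }] } });
    return;
  }
  const arr = doc.value[version] || (doc.value[version] = []);
  const entry = arr.find(e => e.date === date);
  if (entry) {
    entry.x += by;                   // Step 1: increment the existing sub-element.
  } else {
    arr.push({ date: date, x: by }); // Step 2: push a new sub-element.
  }
}
```

The question is how to express exactly this logic efficiently server-side for thousands of cursor elements.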
Problem:
In some cases the aggregation returns more than 2k results to which I have to apply the steps above, and this causes a performance bottleneck. While I'm still processing the already-aggregated data, new raw data accumulates, and eventually I can't even run the aggregation on it any more because it exceeds the 64MB size limit, all due to the slowness of the first stage.
Question:
How can I improve performance with this data structure when incrementing x (see the data structure above) or adding sub-elements?
Also, I can't use MongoDB bulk operations, because the nested structure requires the positional operator.
Maybe the chosen data model is not correct? Or maybe I'm not doing the aggregation task correctly at all?
How can I improve the insertion of aggregated data?

Related

Index a dynamic object using NEST 5.0

I found this question, Index a dynamic object using NEST, from 2 years ago.
I basically have exactly the same question, but using NEST 5.0. The proposed solution doesn't work anymore in the newest version:
casting to object and then indexing results in an Elasticsearch document with no fields in the _source
the esClient.Raw.Index API is missing
Working with dynamic types in NEST 5.x is similar to how it was in NEST 1.x; some of the client API has changed a little between these versions, but the premise is still the same.
Here's an example:
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var defaultIndex = "default-index";
var connectionSettings = new ConnectionSettings(pool)
.DefaultIndex(defaultIndex);
var client = new ElasticClient(connectionSettings);
// delete the index if it already exists
if (client.IndexExists(defaultIndex).Exists)
client.DeleteIndex(defaultIndex);
client.CreateIndex(defaultIndex);
// create an anonymous type assigned to a dynamically typed variable
dynamic instance = new
{
Name = "Russ",
CompanyName = "Elastic",
Date = DateTimeOffset.UtcNow
};
// cast the instance to object to index, explicitly
// specify the document type and index
var indexResponse = client.Index((object)instance, i => i
.Type("my_type")
.Index(defaultIndex)
);
// fetch the document just indexed
var getResponse = client.Get<dynamic>(indexResponse.Id, g => g
.Type(indexResponse.Type)
.Index(indexResponse.Index)
);
The request and response JSON for this look like:
HEAD http://localhost:9200/default-index?pretty=true
Status: 200
------------------------------
DELETE http://localhost:9200/default-index?pretty=true
Status: 200
{
"acknowledged" : true
}
------------------------------
PUT http://localhost:9200/default-index?pretty=true
{}
Status: 200
{
"acknowledged" : true,
"shards_acknowledged" : true
}
------------------------------
POST http://localhost:9200/default-index/my_type?pretty=true
{
"name": "Russ",
"companyName": "Elastic",
"date": "2017-03-11T04:03:53.0561954+00:00"
}
Status: 201
{
"_index" : "default-index",
"_type" : "my_type",
"_id" : "AVq7iXhpc_F3ya7MTJiU",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"created" : true
}
------------------------------
GET http://localhost:9200/default-index/my_type/AVq7iXhpc_F3ya7MTJiU?pretty=true
Status: 200
{
"_index" : "default-index",
"_type" : "my_type",
"_id" : "AVq7iXhpc_F3ya7MTJiU",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "Russ",
"companyName" : "Elastic",
"date" : "2017-03-11T04:03:53.0561954+00:00"
}
}
------------------------------
This demonstrates that the document is indexed as expected, and the original source document can be retrieved.
The low-level client can be accessed on the high-level client in NEST 2.x and 5.x through the .LowLevel property, so you can do something similar to the linked question with:
dynamic instance = new
{
Id = "id",
Index = defaultIndex,
Type = "my_type",
Document = new
{
Name = "Russ",
CompanyName = "Elastic",
Date = DateTimeOffset.UtcNow
}
};
string documentJson = client.Serializer.SerializeToString((object)instance.Document);
var result = client.LowLevel.Index<string>(instance.Index, instance.Type, instance.Id, documentJson);

Formatting currency when exporting amCharts

I'm working with amCharts 3.20.9; I successfully draw a graph and can export the data into an XLSX file. However, one of the columns I'm exporting is a currency. Is there a way to set such a format in the resulting file?
The script I have for the graph is:
var chart = AmCharts.makeChart("graph", {
"type" : "serial",
"theme" : "light",
"dataProvider" : data,
"valueAxes" : [ {
"stackType": "regular",
"gridColor" : "#FFFFFF",
"gridAlpha" : 0.2,
"dashLength" : 0,
"title" : "Metros cúbicos"
} ],
"gridAboveGraphs" : true,
"startDuration" : 1,
"graphs" : graphs,
"chartCursor" : {
"categoryBalloonEnabled" : false,
"cursorAlpha" : 0,
"zoomable" : false
},
"categoryField" : "formatedTime",
"categoryAxis" : {
"gridPosition" : "start",
"gridAlpha" : 0,
"tickPosition" : "start",
"tickLength" : 20,
"parseDates" : false,
"labelsEnabled": true,
"labelFrequency": 3
},
"export" : {
"enabled" : true,
"fileName" : "Reporte",
"exportTitles" : true,
"exportFields" : fields,
"columnNames" : columnNames,
"menu" : [ {
"class" : "export-main",
"menu" : [ "PDF", "XLSX" ]
} ]
}
});
Where:
graphs contains the graphs definitions, something like:
[{
"balloonText" : "[[formatedTime]]: <b>[[" + sites[i] + "]]</b>",
"balloonFunction" : formater,
"lineThickness": 1,
"lineAlpha" : 0.2,
"type" : "line",
"valueField" : sites[i]
}];
fields: ["formatedTime", "Viva Villavicencio", "Viva Villavicencio_COST_"]
columnNames: {"formatedTime": "Fecha", "Viva Villavicencio": "Metros cúbicos para: Viva Villavicencio", "Viva Villavicencio_COST_": "Costo para: Viva Villavicencio"}
So far so good: I have my xlsx with the proper data, but in the end I want the column "Viva Villavicencio_COST_" to be defined as a currency in the resulting file, and therefore formatted and displayed that way.
Any help will be appreciated.
Have a look at the processData option. It takes a callback function that lets you make changes to your dataset before it gets written to your exported file.
So, add to your code:
"export": {
"processData": function(data){
for (var i = 0; i < data.length; i++) {
// the property name contains spaces, so bracket notation is required
data[i]["Viva Villavicencio_COST_"] = toCurrency(data[i]["Viva Villavicencio_COST_"]);
}
return data;
}
...
}
This returns the same dataset as before, but with a formatted Viva Villavicencio_COST_ field.
Then add the toCurrency function. I don't believe amCharts has a built-in formatting function. If you need a better one, you could use something like numeral.js or accounting.js, but for now try:
function toCurrency(value){
return '$' + value;
}
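If you'd rather avoid a third-party library, a slightly more robust sketch (my suggestion, not part of amCharts; the locale and currency code are illustrative assumptions) can lean on the built-in Number.prototype.toLocaleString:

```javascript
// A more robust currency formatter using the built-in toLocaleString;
// "en-US" and "USD" are assumptions - substitute your own locale/currency.
function toCurrency(value) {
  var n = Number(value);
  if (isNaN(n)) return value; // leave non-numeric cells untouched
  return n.toLocaleString("en-US", { style: "currency", currency: "USD" });
}
```

Note that either way the XLSX cell will contain a formatted string, not a native Excel currency cell.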
Complete docs for the export plugin are here: https://github.com/amcharts/export
Hope that helps.

How to take the shortest distance per person (with multiple addresses) to an origin point and sort on that value

I have People documents in my Elastic index; each person has multiple addresses, and each address has a lat/lon point associated with it.
I'd like to geo-sort all the people by proximity to a specific origin location; however, multiple locations per person complicates this. What has been decided is to take the shortest distance per person to the origin point and use that number as the sort value.
Example of my people index roughed out in 'pseudo-JSON' showing a couple of person documents each having multiple addresses:
person {
name: John Smith
addresses [
{ lat: 43.5234, lon: 32.5432, 1 Main St. }
{ lat: 44.983, lon: 37.3432, 2 Queen St. W. }
{ ... more addresses ... }
]
}
person {
name: Jane Doe
addresses [
... she has a bunch of addresses too ...
]
}
... many more people docs each having multiple addresses like above ...
Currently I'm using an Elastic script field with an inline Groovy script, like so. It calculates metres from the origin for each address, puts all those distances into an array per person, and picks the minimum number from the array, making that the sort value.
string groovyShortestDistanceMetersSortScript = string.Format("[doc['geo1'].distance({0}, {1}), doc['geo2'].distance({0}, {1})].min()",
origin.Latitude,
origin.Longitude);
var shortestMetersSort = new SortDescriptor<Person>()
.Script(sd => sd
.Type("number")
.Script(script => script
.Inline(groovyShortestDistanceMetersSortScript)
)
.Order(SortOrder.Ascending)
);
Although this works, I wonder if using a scripted field might be more expensive or too complex at querying time, and if there is a better way to achieve the desired sort order outcome by indexing the data differently and/or by using aggregations, maybe even doing away with the script field altogether.
Any thoughts and guidance are appreciated. I'm sure somebody else has run into this same requirement (or similar) and has found a different or better solution.
I'm using the NEST API in this code sample, but will gladly accept answers in Elasticsearch JSON format, because I can port those into NEST API code.
When sorting on distance from a specified origin, where the field being sorted on contains a collection of values (in this case geo_point types), we can specify how a value should be collected from the collection using sort_mode. Here, a sort_mode of "min" uses the nearest location to the origin as the sort value. Here's an example:
public class Person
{
public string Name { get; set; }
public IList<Address> Addresses { get; set; }
}
public class Address
{
public string Name { get; set; }
public GeoLocation Location { get; set; }
}
void Main()
{
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var indexName = "people";
var connectionSettings = new ConnectionSettings(pool)
.InferMappingFor<Person>(m => m.IndexName(indexName));
var client = new ElasticClient(connectionSettings);
if (client.IndexExists(indexName).Exists)
client.DeleteIndex(indexName);
client.CreateIndex(indexName, c => c
.Settings(s => s
.NumberOfShards(1)
.NumberOfReplicas(0)
)
.Mappings(m => m
.Map<Person>(mm => mm
.AutoMap()
.Properties(p => p
.Nested<Address>(n => n
.Name(nn => nn.Addresses.First().Location)
.AutoMap()
)
)
)
)
);
var people = new[] {
new Person {
Name = "John Smith",
Addresses = new List<Address>
{
new Address
{
Name = "Buckingham Palace",
Location = new GeoLocation(51.501476, -0.140634)
},
new Address
{
Name = "Empire State Building",
Location = new GeoLocation(40.748817, -73.985428)
}
}
},
new Person {
Name = "Jane Doe",
Addresses = new List<Address>
{
new Address
{
Name = "Eiffel Tower",
Location = new GeoLocation(48.858257, 2.294511)
},
new Address
{
Name = "Uluru",
Location = new GeoLocation(-25.383333, 131.083333)
}
}
}
};
client.IndexMany(people);
// call refresh for testing (avoid in production)
client.Refresh("people");
var towerOfLondon = new GeoLocation(51.507313, -0.074308);
client.Search<Person>(s => s
.MatchAll()
.Sort(so => so
.GeoDistance(g => g
.Field(f => f.Addresses.First().Location)
.PinTo(towerOfLondon)
.Ascending()
.Unit(DistanceUnit.Meters)
// Take the minimum address location distance from
// our target location, The Tower of London
.Mode(SortMode.Min)
)
)
);
}
This creates the following search
{
"query": {
"match_all": {}
},
"sort": [
{
"_geo_distance": {
"addresses.location": [
{
"lat": 51.507313,
"lon": -0.074308
}
],
"order": "asc",
"mode": "min",
"unit": "m"
}
}
]
}
which returns
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : null,
"hits" : [ {
"_index" : "people",
"_type" : "person",
"_id" : "AVcxBKuPlWTRBymPa4yT",
"_score" : null,
"_source" : {
"name" : "John Smith",
"addresses" : [ {
"name" : "Buckingham Palace",
"location" : {
"lat" : 51.501476,
"lon" : -0.140634
}
}, {
"name" : "Empire State Building",
"location" : {
"lat" : 40.748817,
"lon" : -73.985428
}
} ]
},
"sort" : [ 4632.035195223564 ]
}, {
"_index" : "people",
"_type" : "person",
"_id" : "AVcxBKuPlWTRBymPa4yU",
"_score" : null,
"_source" : {
"name" : "Jane Doe",
"addresses" : [ {
"name" : "Eiffel Tower",
"location" : {
"lat" : 48.858257,
"lon" : 2.294511
}
}, {
"name" : "Uluru",
"location" : {
"lat" : -25.383333,
"lon" : 131.083333
}
} ]
},
"sort" : [ 339100.6843074794 ]
} ]
}
}
The value returned in the sort array for each hit is the minimum distance, in the sort unit specified (in our case metres), between the specified point (the Tower of London) and that person's addresses.
Per the guidelines in the Sorting by Distance documentation, it can often make more sense to score by distance, which can be achieved with a function_score query and a decay function.
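To make the sort value concrete, here is a client-side sketch of what sort_mode "min" computes: the distance from the origin to each address, reduced to the minimum per person. It is purely illustrative (Elasticsearch does this server-side, with its own arc-distance formula, so the numbers differ slightly from the response above):

```javascript
// Haversine great-circle distance between two {lat, lon} points, in metres.
function haversineMeters(a, b) {
  const R = 6371000; // mean Earth radius in metres
  const rad = deg => deg * Math.PI / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 +
            Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// What sort_mode "min" does per person: the smallest address distance.
function minDistance(person, origin) {
  return Math.min(...person.addresses.map(addr => haversineMeters(addr.location, origin)));
}
```

Running this against the sample people yields John Smith at roughly 4.6 km (Buckingham Palace) and Jane Doe at roughly 339 km (Eiffel Tower), matching the ordering in the response above.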

Specifying "groups" dynamically in stacked bar chart c3js

// take this code as an example
Here I have specified yvalue[0], yvalue[1] in groups.
But I need a general design where I don't know the number of groups I have to create (i.e. the number of segments); this varies according to the JSON data.
Consider this example: here I have total and total1, therefore I have only 2 values. But if a third variable, say total2, is specified in the JSON, I should have a segment for it in my bar chart, and so on. This has to be done without altering the groups every time a field is added. Is there any way to achieve this?
Thanks
var datajson = [ {
country : "china",
total : 20,
total1 : 10
}, {
country : "India",
total : 40,
total1 : 20
}, {
country : "aus",
total : 10,
total1 : 30
}, {
country : "nxz",
total : 50,
total1 : 40
}
];
var xvalue;
var yvalue = [];
var i = 0;
var obj = datajson[0]
for ( var key in obj) {
if (xvalue === undefined)
xvalue = key;
else {
yvalue[i] = key;
i++;
}
}
var chart = c3.generate({
bindto : '#chart',
data : {
json : datajson,
type : 'bar',
keys : {
x : xvalue, // it's possible to specify 'x' when category axis
value : [yvalue[0],yvalue[1]],
},
groups : [ [yvalue[0],yvalue[1]] ]
},
bar : {
width : {
ratio : 0.3
// this makes the bar width 30% of the length between ticks
}
},
axis : {
x : {
type : 'category'
},
}
});
Your question already contains most of the answer (you are already generating the yvalue array from the object properties).
You just don't have to specify the elements one by one; instead, you can pass the array in directly and you are done:
groups: [yvalue]
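The key-extraction loop from the question can be distilled into a small helper so that both keys.value and groups are built without hard-coding column names (the helper name is illustrative):

```javascript
// Split the first data row's keys into the category key (first property)
// and the value keys (everything else).
function splitKeys(row) {
  const keys = Object.keys(row);
  return { x: keys[0], values: keys.slice(1) };
}

const datajson = [
  { country: "china", total: 20, total1: 10, total2: 5 },
  { country: "India", total: 40, total1: 20, total2: 15 }
];

const { x, values } = splitKeys(datajson[0]);
const chartData = {
  json: datajson,
  type: "bar",
  keys: { x: x, value: values }, // every non-x field becomes a series
  groups: [values]               // and all series stack into one group
};
```

Like the original for...in loop, this relies on the object's property insertion order putting the category field first.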

Generating Different Colours On the JVectorMap Regions Based On The Range Of Values

I'm using JVectorMap for creating a world map.
As part of my JVectorMap I'm displaying the country name with its population on the regions.
My question is:
How do I show different colours for the regions (countries) based on population ranges?
Ex: for a population of 1-1000 I have to show a red colour.
For a population of 1000-5000 I have to show a blue colour.
I'm using code like this, but it is not displaying different colours based on the population range:
var mapData = {
"AF": 1000,
"AL": 5000,
"DZ": 20000,
...
};
try{
$('#id').vectorMap(
{
map : 'world_mill_en',
series : {
regions : [ {
initial : {
fill : 'white',
"fill-opacity" : 1,
stroke : 'none',
"stroke-width" : 0,
"stroke-opacity" : 1
},
hover : {
"fill-opacity" : 0.8
},
selected : {
fill : 'yellow'
},
selectedHover : {},
values : mapData,
scale : [ '#C8EEFF', '#0071A4' ],
normalizeFunction : 'polynomial'
} ]
},
onRegionLabelShow : function(e, el, code) {
el.html(el.html()+' (Population - '+mapData[code]+')');
}
});
}
catch(err){
alert(err);
}
Can anyone help me display different colours for the regions based on the population range?
Thanks in advance.
Create a JSON object with bucket numbers and colour codes like this, according to your regions and colours:
var colorData = {
"1" : "#C8EEFF",
"2" : "#0071A4",
"3" : "#C8EEFF",
"4" : "#0071A4",
"5" : "#C8EEFF",
"6" : "#0071A4"
}
and pass this object as the scale (scale : colorData). Hope it helps you.
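The answer above implies first mapping each region's raw population into a bucket index, so that a discrete scale (one colour per bucket) can colour the regions. A sketch of that step, with illustrative thresholds and colours:

```javascript
// Map raw populations to bucket indices; the thresholds match the
// ranges in the question, and the colours are only examples.
function toBucket(population) {
  if (population <= 1000) return 1;  // red bucket
  if (population <= 5000) return 2;  // blue bucket
  return 3;                          // everything larger
}

const mapData = { AF: 1000, AL: 5000, DZ: 20000 };

const bucketedData = {};
for (const code in mapData) {
  bucketedData[code] = toBucket(mapData[code]);
}

// One colour per bucket index, to be passed as `scale`
const scaleColors = { 1: "#FF0000", 2: "#0000FF", 3: "#0071A4" };
```

You would then use bucketedData as the series values and scaleColors as the scale, while keeping the original mapData for the population shown in the label.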
