Index a dynamic object using NEST 5.0 - elasticsearch

I found this question, Index a dynamic object using NEST, from 2 years ago.
I basically have exactly the same question, but using NEST 5.0. The proposed solution doesn't work anymore in the newest version:
Casting to object and then indexing results in an Elasticsearch document with no fields in the _source.
The esClient.Raw.Index API is missing.

Working with dynamic types in NEST 5.x is similar to how it was in NEST 1.x; some of the client API has changed a little between these versions, but the premise is still the same.
Here's an example:
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var defaultIndex = "default-index";
var connectionSettings = new ConnectionSettings(pool)
    .DefaultIndex(defaultIndex);

var client = new ElasticClient(connectionSettings);

// delete the index if it already exists
if (client.IndexExists(defaultIndex).Exists)
    client.DeleteIndex(defaultIndex);

client.CreateIndex(defaultIndex);

// create an anonymous type assigned to a dynamically typed variable
dynamic instance = new
{
    Name = "Russ",
    CompanyName = "Elastic",
    Date = DateTimeOffset.UtcNow
};

// cast the instance to object to index, explicitly
// specify the document type and index
var indexResponse = client.Index((object)instance, i => i
    .Type("my_type")
    .Index(defaultIndex)
);

// fetch the document just indexed
var getResponse = client.Get<dynamic>(indexResponse.Id, g => g
    .Type(indexResponse.Type)
    .Index(indexResponse.Index)
);
The request and response JSON for this look like
HEAD http://localhost:9200/default-index?pretty=true
Status: 200
------------------------------
DELETE http://localhost:9200/default-index?pretty=true
Status: 200
{
"acknowledged" : true
}
------------------------------
PUT http://localhost:9200/default-index?pretty=true
{}
Status: 200
{
"acknowledged" : true,
"shards_acknowledged" : true
}
------------------------------
POST http://localhost:9200/default-index/my_type?pretty=true
{
"name": "Russ",
"companyName": "Elastic",
"date": "2017-03-11T04:03:53.0561954+00:00"
}
Status: 201
{
"_index" : "default-index",
"_type" : "my_type",
"_id" : "AVq7iXhpc_F3ya7MTJiU",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"created" : true
}
------------------------------
GET http://localhost:9200/default-index/my_type/AVq7iXhpc_F3ya7MTJiU?pretty=true
Status: 200
{
"_index" : "default-index",
"_type" : "my_type",
"_id" : "AVq7iXhpc_F3ya7MTJiU",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "Russ",
"companyName" : "Elastic",
"date" : "2017-03-11T04:03:53.0561954+00:00"
}
}
------------------------------
This demonstrates that the document is indexed as expected, and the original source document can be retrieved.
The low-level client can be accessed on the high-level client in NEST 2.x and 5.x through the .LowLevel property, so you can do something similar to the linked question with:
dynamic instance = new
{
    Id = "id",
    Index = defaultIndex,
    Type = "my_type",
    Document = new
    {
        Name = "Russ",
        CompanyName = "Elastic",
        Date = DateTimeOffset.UtcNow
    }
};

string documentJson = client.Serializer.SerializeToString((object)instance.Document);
var result = client.LowLevel.Index<string>(instance.Index, instance.Type, instance.Id, documentJson);

Related

How to use the attachment processor and remove processor within an array of attachments with NEST client?

Description of problem
I want to use the attachment processor and remove processor within an array of attachments. I am aware of the fact that the foreach processor is required for this purpose.
This enables the attachment processor and remove processor to be run on the individual elements of the array (https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment-with-arrays.html).
I can't find any good NEST (C#) examples for indexing an array of attachments and removing the content field. Can someone provide a NEST (C#) example for my use case?
UPDATE: Thanks to Russ Cam, it's now possible to index an array of attachments and remove the base64-encoded file content with the following pipeline:
_client.PutPipeline("attachments", p => p
    .Description("Document attachments pipeline")
    .Processors(pp => pp
        .Foreach<ApplicationDto>(fe => fe
            .Field(f => f.Attachments)
            .Processor(fep => fep
                .Attachment<Attachment>(a => a
                    .Field("_ingest._value._content")
                    .TargetField("_ingest._value.attachment")
                )
            )
        )
        .Foreach<ApplicationDto>(fe => fe
            .Field(f => f.Attachments)
            .Processor(fep => fep
                .Remove<Attachment>(r => r
                    .Field("_ingest._value._content")
                )
            )
        )
    )
);
Your code is missing the ForeachProcessor; the NEST implementation for this is pretty much a direct translation of the Elasticsearch JSON example. It's also a little easier to use the Attachment type available in NEST, to which the attachment object that the data is extracted into will deserialize.
void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var defaultIndex = "default-index";
    var connectionSettings = new ConnectionSettings(pool)
        .DefaultIndex(defaultIndex);

    var client = new ElasticClient(connectionSettings);

    if (client.IndexExists(defaultIndex).Exists)
        client.DeleteIndex(defaultIndex);

    client.PutPipeline("attachments", p => p
        .Description("Document attachment pipeline")
        .Processors(pp => pp
            .Foreach<Document>(fe => fe
                .Field(f => f.Attachments)
                .Processor(fep => fep
                    .Attachment<Attachment>(a => a
                        .Field("_ingest._value.data")
                        .TargetField("_ingest._value.attachment")
                    )
                )
            )
        )
    );

    var indexResponse = client.Index(new Document
        {
            Attachments = new List<DocumentAttachment>
            {
                new DocumentAttachment { Data = "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo=" },
                new DocumentAttachment { Data = "VGhpcyBpcyBhIHRlc3QK" }
            }
        },
        i => i.Pipeline("attachments")
    );

    var getResponse = client.Get<Document>(indexResponse.Id);
}

public class Document
{
    public List<DocumentAttachment> Attachments { get; set; }
}

public class DocumentAttachment
{
    public string Data { get; set; }
    public Attachment Attachment { get; set; }
}
returns
{
"_index" : "default-index",
"_type" : "document",
"_id" : "AVrOVuC1vjcwkxZzCHYS",
"_version" : 1,
"found" : true,
"_source" : {
"attachments" : [
{
"data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo=",
"attachment" : {
"content_type" : "text/plain; charset=ISO-8859-1",
"language" : "en",
"content" : "this is\njust some text",
"content_length" : 24
}
},
{
"data" : "VGhpcyBpcyBhIHRlc3QK",
"attachment" : {
"content_type" : "text/plain; charset=ISO-8859-1",
"language" : "en",
"content" : "This is a test",
"content_length" : 16
}
}
]
}
}
You can chain the RemoveProcessor on to remove the data field from _source too.
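For example, a second Foreach processor wrapping a Remove processor can be chained onto the pipeline above. This is a sketch that adapts the pipeline from the question's update to the Document type used here; adjust the field names to your own mapping:
client.PutPipeline("attachments", p => p
    .Description("Document attachment pipeline")
    .Processors(pp => pp
        .Foreach<Document>(fe => fe
            .Field(f => f.Attachments)
            .Processor(fep => fep
                .Attachment<Attachment>(a => a
                    .Field("_ingest._value.data")
                    .TargetField("_ingest._value.attachment")
                )
            )
        )
        // remove the base64 encoded data from each array element
        // once the content has been extracted from it
        .Foreach<Document>(fe => fe
            .Field(f => f.Attachments)
            .Processor(fep => fep
                .Remove<Document>(r => r
                    .Field("_ingest._value.data")
                )
            )
        )
    )
);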

Formatting currency when exporting amCharts

I'm working with amCharts 3.20.9. I successfully draw a graph and can export the data into an XLSX file. However, one of the columns I'm exporting is a currency; is there a way of setting such a format in the resulting file?
The script I have for the graph is:
var chart = AmCharts.makeChart("graph", {
"type" : "serial",
"theme" : "light",
"dataProvider" : data,
"valueAxes" : [ {
"stackType": "regular",
"gridColor" : "#FFFFFF",
"gridAlpha" : 0.2,
"dashLength" : 0,
"title" : "Metros cúbicos"
} ],
"gridAboveGraphs" : true,
"startDuration" : 1,
"graphs" : graphs,
"chartCursor" : {
"categoryBalloonEnabled" : false,
"cursorAlpha" : 0,
"zoomable" : false
},
"categoryField" : "formatedTime",
"categoryAxis" : {
"gridPosition" : "start",
"gridAlpha" : 0,
"tickPosition" : "start",
"tickLength" : 20,
"parseDates" : false,
"labelsEnabled": true,
"labelFrequency": 3
},
"export" : {
"enabled" : true,
"fileName" : "Reporte",
"exportTitles" : true,
"exportFields" : fields,
"columnNames" : columnNames,
"menu" : [ {
"class" : "export-main",
"menu" : [ "PDF", "XLSX" ]
} ]
}
});
Where:
graphs contains the graph definitions, something like:
[{
"balloonText" : "[[formatedTime]]: <b>[[" + sites[i] + "]]</b>",
"balloonFunction" : formater,
"lineThickness": 1,
"lineAlpha" : 0.2,
"type" : "line",
"valueField" : sites[i]
}];
fields: ["formatedTime", "Viva Villavicencio", "Viva Villavicencio_COST_"]
columnNames: {"formatedTime": "Fecha", "Viva Villavicencio": "Metros cúbicos para: Viva Villavicencio", "Viva Villavicencio_COST_": "Costo para: Viva Villavicencio"}
So far so good: I have my XLSX with the proper data, but I want the column Viva Villavicencio_COST_ to be defined as a currency in the resulting file and therefore formatted and displayed that way.
Any help will be appreciated.
Have a look at the processData option. It takes a callback function that lets you make changes to your dataset before it gets written to your exported file.
So, add to your code:
"export": {
"processData": function(data){
for(var i = 0; i < data.length; i++){
data[i].Viva Villavicencio_COST_ = toCurrency(data[i].Viva Villavicencio_COST_);
}
return data;
}
...
}
This returns the same dataset as before, but with the Viva Villavicencio_COST_ field formatted (note the bracket notation, since the field name contains spaces).
Then add the toCurrency function. I don't believe amCharts has a built-in formatting function. If you need a better one you could use something like numeral.js or accounting.js, but for now try:
function toCurrency(value) {
    return '$' + value;
}
Complete docs for the export plugin are here: https://github.com/amcharts/export
Hope that helps.

How to take the shortest distance per person (with multiple addresses) to an origin point and sort on that value

I have People documents in my Elastic index and each person has multiple addresses, each with an associated lat/lon point.
I'd like to geo-sort all the people by proximity to a specific origin location; however, multiple locations per person complicates this. What has been decided is to take the shortest distance per person to the origin point and use that number as the sort value.
Example of my people index roughed out in 'pseudo-JSON' showing a couple of person documents each having multiple addresses:
person {
name: John Smith
addresses [
{ lat: 43.5234, lon: 32.5432, 1 Main St. }
{ lat: 44.983, lon: 37.3432, 2 Queen St. W. }
{ ... more addresses ... }
]
}
person {
name: Jane Doe
addresses [
... she has a bunch of addresses too ...
]
}
... many more people docs each having multiple addresses like above ...
Currently I'm using an inline Groovy script in a script sort, like so: it calculates metres from the origin for each address, puts all those metre distances into an array per person, and picks the minimum number from the array as the sort value.
string groovyShortestDistanceMetersSortScript = string.Format(
    "[doc['geo1'].distance({0}, {1}), doc['geo2'].distance({0}, {1})].min()",
    origin.Latitude,
    origin.Longitude);

var shortestMetersSort = new SortDescriptor<Person>()
    .Script(sd => sd
        .Type("number")
        .Script(script => script
            .Inline(groovyShortestDistanceMetersSortScript)
        )
        .Order(SortOrder.Ascending)
    );
Although this works, I wonder if using a script sort might be more expensive or too complex at query time, and whether there is a better way to achieve the desired sort order by indexing the data differently and/or by using aggregations, maybe even doing away with the script altogether.
Any thoughts and guidance are appreciated. I'm sure somebody else has run into this same requirement (or similar) and has found a different or better solution.
I'm using the NEST API in this code sample, but will gladly accept answers in Elasticsearch JSON format because I can port those into NEST API code.
When sorting on distance from a specified origin, where the field being sorted on contains a collection of values (in this case geo_point types), we can specify how a value should be picked from the collection using the sort mode. In this case, we can specify a sort mode of "min" to use the nearest location to the origin as the sort value. Here's an example:
public class Person
{
    public string Name { get; set; }
    public IList<Address> Addresses { get; set; }
}

public class Address
{
    public string Name { get; set; }
    public GeoLocation Location { get; set; }
}

void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var indexName = "people";
    var connectionSettings = new ConnectionSettings(pool)
        .InferMappingFor<Person>(m => m.IndexName(indexName));

    var client = new ElasticClient(connectionSettings);

    if (client.IndexExists(indexName).Exists)
        client.DeleteIndex(indexName);

    client.CreateIndex(indexName, c => c
        .Settings(s => s
            .NumberOfShards(1)
            .NumberOfReplicas(0)
        )
        .Mappings(m => m
            .Map<Person>(mm => mm
                .AutoMap()
                .Properties(p => p
                    .Nested<Address>(n => n
                        .Name(nn => nn.Addresses.First().Location)
                        .AutoMap()
                    )
                )
            )
        )
    );

    var people = new[] {
        new Person {
            Name = "John Smith",
            Addresses = new List<Address>
            {
                new Address
                {
                    Name = "Buckingham Palace",
                    Location = new GeoLocation(51.501476, -0.140634)
                },
                new Address
                {
                    Name = "Empire State Building",
                    Location = new GeoLocation(40.748817, -73.985428)
                }
            }
        },
        new Person {
            Name = "Jane Doe",
            Addresses = new List<Address>
            {
                new Address
                {
                    Name = "Eiffel Tower",
                    Location = new GeoLocation(48.858257, 2.294511)
                },
                new Address
                {
                    Name = "Uluru",
                    Location = new GeoLocation(-25.383333, 131.083333)
                }
            }
        }
    };

    client.IndexMany(people);

    // call refresh for testing (avoid in production)
    client.Refresh("people");

    var towerOfLondon = new GeoLocation(51.507313, -0.074308);

    client.Search<Person>(s => s
        .MatchAll()
        .Sort(so => so
            .GeoDistance(g => g
                .Field(f => f.Addresses.First().Location)
                .PinTo(towerOfLondon)
                .Ascending()
                .Unit(DistanceUnit.Meters)
                // Take the minimum address location distance from
                // our target location, The Tower of London
                .Mode(SortMode.Min)
            )
        )
    );
}
This creates the following search
{
"query": {
"match_all": {}
},
"sort": [
{
"_geo_distance": {
"addresses.location": [
{
"lat": 51.507313,
"lon": -0.074308
}
],
"order": "asc",
"mode": "min",
"unit": "m"
}
}
]
}
which returns
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : null,
"hits" : [ {
"_index" : "people",
"_type" : "person",
"_id" : "AVcxBKuPlWTRBymPa4yT",
"_score" : null,
"_source" : {
"name" : "John Smith",
"addresses" : [ {
"name" : "Buckingham Palace",
"location" : {
"lat" : 51.501476,
"lon" : -0.140634
}
}, {
"name" : "Empire State Building",
"location" : {
"lat" : 40.748817,
"lon" : -73.985428
}
} ]
},
"sort" : [ 4632.035195223564 ]
}, {
"_index" : "people",
"_type" : "person",
"_id" : "AVcxBKuPlWTRBymPa4yU",
"_score" : null,
"_source" : {
"name" : "Jane Doe",
"addresses" : [ {
"name" : "Eiffel Tower",
"location" : {
"lat" : 48.858257,
"lon" : 2.294511
}
}, {
"name" : "Uluru",
"location" : {
"lat" : -25.383333,
"lon" : 131.083333
}
} ]
},
"sort" : [ 339100.6843074794 ]
} ]
}
}
The value returned in the sort array for each hit is the minimum distance, in the sort unit specified (in our case, metres), between the specified point (the Tower of London) and that person's addresses.
Per the guidelines in the Sorting by Distance documentation, it can often make more sense to score by distance, which can be achieved by using a function_score query with a decay function.
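As a rough sketch of what that might look like with NEST (untested here; the gauss decay function, the 2km scale, and the 0.5 decay are illustrative assumptions rather than values from the original answer, and a nested addresses mapping may need extra handling):
var scoredResponse = client.Search<Person>(s => s
    .Query(q => q
        .FunctionScore(fs => fs
            .Query(qq => qq.MatchAll())
            .Functions(fu => fu
                // score people higher the closer one of their addresses
                // is to the Tower of London; scale and decay values
                // here are only illustrative
                .GaussGeoLocation(g => g
                    .Field(f => f.Addresses.First().Location)
                    .Origin(towerOfLondon)
                    .Scale("2km")
                    .Decay(0.5)
                    .MultiValueMode(MultiValueMode.Min)
                )
            )
        )
    )
);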

How to update this specific data in this user collection in mongodb?

I need to do an update to a specific soldier in this user collection:
For example:
user: {
myArmy : {
money : 100,
fans : 100,
mySoldiers : [{
_id : ddd111bbb,
mySkill : 50,
myStamina : 50,
myMoral : 50,
},
{
_id : ddd111dd ,
mySkill : 50,
myStamina : 50,
myMoral : 50,
}],
}
}
In my update query I want to do something like the following:
conditions = { _id : user._id };
update =
{ 'myArmy.mySoldiers._id' : soldierId},
{
'$set': {
'myArmy.money' : balanceToSet,
'myArmy.fans' : fansToSet,
'myArmy.mySoldiers.$.skill': skillToSet,
'myArmy.mySoldiers.$.stamina': staminaToSet,
'myArmy.mySoldiers.$.moral': moralToSet
}
}
And this is the final query:
User.update(conditions, update, options, function(err){
if (err) deferred.reject;
stream.resume();
});
And the end result if soldierId is 'ddd111bbb':
user: {
myArmy : {
money : 200,
fans : 100,
mySoldiers : [{
_id : ddd111bbb,
mySkill : 150,
myStamina : 250,
myMoral : 50,
},
{
_id : ddd111dd ,
mySkill : 50,
myStamina : 50,
myMoral : 50,
}],
}
}
The skill, stamina and moral should change only on the specific soldier.
How do I get the $ to be the index of this soldier? What is missing from the update query above?
This is what I was looking for:
conditions = { _id : user._id , 'myArmy.mySoldiers._id' : soldierId};
update = {
$set: {
'myArmy.balance': balanceToSet,
'myArmy.fans' : fansToSet,
'myArmy.tokens' : tokensToSet,
'myArmy.mySoldiers.$.skill' : skillToSet,
'myArmy.mySoldiers.$.stamina': staminaToSet,
'myArmy.mySoldiers.$.moral' : moralToSet
}
}
This gave me the result I wanted. My mistake before was that I had accidentally put the soldier-matching condition into the update document instead of into the query conditions.

MongoDB Data Model Optimization

About
I have raw data which is processed by the aggregation framework, with the results later saved in another collection. Let's assume the aggregation results in something like this:
cursor = {
"result" : [
{
"_id" : {
"x" : 1,
"version" : [
"2_0"
],
"date" : {
"year" : 2015,
"month" : 3,
"day" : 26
}
},
"x" : 1,
"count" : 2
}
],
"ok" : 1
};
Note that in most cases the cursor contains more than about 2k elements.
So now I'm looping through the cursor (cursor.forEach) and performing the following steps:
// Try to increment values:
var inc = db.runCommand({
    findAndModify: my_coll,
    query : {
        "_id.x" : "1",
        "value.2_0" : {
            "$elemMatch" : {
                "date" : ISODate("2015-12-18T00:00:00Z")
            }
        }
    },
    update : { $inc: {
        "value.2_0.$.x" : 1
    } }
});
// If no row was affected by the inc operation, the sub-element doesn't exist at all,
// so push it
if (inc.value == null) {
    date[date.key] = date.value;

    var up = db.getCollection(my_coll).update(
        {
            "_id.x" : 1
        },
        {
            $push : {}
        },
        { writeConcern: { w: "majority", wtimeout: 5000 } }
    );

    // No document was found to push the sub-element into, so create it
    if (up.nMatched == 0) {
        db.getCollection(my_coll).insert({
            "_id" : {
                "x" : 1
            },
            "value" : {}
        });
    }
}
Resulting data-structure:
data = {
"_id" : {
"x" : 1,
"y" : 1
},
"value" : {
"2_0" : [
{
"date" : ISODate("2014-12-17T00:00:00.000Z"),
"x" : 1
},
{
"date" : ISODate("2014-12-18T00:00:00.000Z"),
"x" : 2
}
]
}
};
In short, I have to apply these actions to process my data:
Try to increment values.
If no data is affected by the increment operation, push the data to the array.
If no data is affected by the push operation, create a new document.
Problem:
In some cases the aggregation returns more than 2k results to which I have to apply the steps above, and this causes a performance bottleneck. While I'm processing the already-aggregated data, new raw data accumulates for aggregation, and later I can't even run the aggregation on this new raw data because it exceeds the 64MB size limit, due to the earlier slowness.
Question:
With this data structure, how can I improve performance when incrementing the x values (see the data structure) or adding sub-elements?
Also, I cannot apply MongoDB bulk operations because of the nested structure and the use of the positional parameter.
Maybe the chosen data model is not correct? Or maybe I'm not doing the aggregation task correctly at all?
How can I improve the insertion of aggregated data?
