Can't select sub aggregation in Nest - elasticsearch

I get these results in my Elastic query:
"Results" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "73c47133-8656-45e7-9499-14f52df07b70",
"doc_count" : 1,
"foo" : {
"doc_count" : 40,
"bar" : {
"doc_count" : 1,
"customscore" : {
"value" : 10.496919917864476
}
}
}
}
]
I am trying to get a list of anonymous objects with the key field as the key and customscore field as the value.
No matter what I try, I can't seem to write code in Nest that accesses the customscore value. Apparently, I'm the very first person in the world to use nested Aggregations with the Nest library. Either that, or the documentation is very lacking. I can easily reach the Buckets:
response?.Aggregations.Terms("Results").Buckets;
But I have no idea what to do with this object. Buckets contains several objects, which I would assume I could navigate by doing this:
bucketObject["foo"]["bar"]["customscore"]
But apparently not. I have found solutions that use for loops, solutions with long Linq queries, and all of them seem to return null for me. What am I missing?

Assuming the following query, which I think would match the response in the question:
var client = new ElasticClient();
var response = client.Search<object>(s => s
.Index("some_index")
.Aggregations(a => a
.Terms("Results", t => t
.Field("some_field")
.Aggregations(aa => aa
.Filter("foo", f => f
.Filter(q => q.MatchAll())
.Aggregations(aaa => aaa
.Filter("bar", ff => ff
.Filter(q => q.MatchAll())
.Aggregations(aaaa => aaaa
.ValueCount("customscore", vc => vc
.Field("some_other_field")
)
)
)
)
)
)
)
)
);
To get a collection of anonymous types, you would use:
var kvs = response.Aggregations.Terms("Results").Buckets
.Select(b => new
{
key = b.Key,
value = b.Filter("foo").Filter("bar").ValueCount("customscore").Value
});
.Aggregations exposes methods (such as Terms(), Filter() and ValueCount() above) that convert the IAggregate response to the expected type.

Related

Elasticsearch NEST 2 How to correctly map and use nested classes and bulk index

I have three main questions I need help answering.
How do you correctly map and store a nested map?
How do you search a nested part of a document?
How do you bulk index?
I'm using Nest version 2 and have been looking over the new documentation which can be found Here. The documentation has been useful in creating certain parts of the code but unfortunately doesn't explain how they fit together.
Here is the class I'm trying to map.
[ElasticsearchType(Name = "elasticsearchproduct", IdProperty = "ID")]
public class esProduct
{
public int ID { get; set; }
[Nested]
public List<PriceList> PriceList { get; set; }
}
[ElasticsearchType(Name = "PriceList")]
public class PriceList
{
public int ID { get; set; }
public decimal Price { get; set; }
}
and my mapping code
var node = new Uri(HOST);
var settings = new ConnectionSettings(node).DefaultIndex("my-application");
var client = new ElasticClient(settings);
var map = new CreateIndexDescriptor("my-application")
.Mappings(ms => ms
.Map<esProduct>(m => m
.AutoMap()
.Properties(ps => ps
.Nested<PriceList>(n => n
.Name(c => c.PriceList)
.AutoMap()
)
)
)
);
var response = client.Index(map);
This is the response I get:
Valid NEST response built from a succesful low level call on POST: /my-application/createindexdescriptor
So that seems to work. Next, indexing:
foreach (DataRow dr in ProductTest.Tables[0].Rows)
{
int id = Convert.ToInt32(dr["ID"].ToString());
List<PriceList> PriceList = new List<PriceList>();
DataRow[] resultPrice = ProductPriceTest.Tables[0].Select("ID = " + id);
foreach (DataRow drPrice in resultPrice)
{
PriceList.Add(new PriceList
{
ID = Convert.ToInt32(drPrice["ID"].ToString()),
Price = Convert.ToDecimal(drPrice["Price"].ToString())
});
esProduct product = new esProduct
{
ProductDetailID = id,
PriceList = PriceList
};
var updateResponse = client.Update<esProduct>(DocumentPath<esProduct>.Id(id), descriptor => descriptor
.Doc(product)
.RetryOnConflict(3)
.Refresh()
);
var index = client.Index(product);
}
}
Again this seems to work, but when I come to search it does not seem to work as expected.
var searchResults = client.Search<esProduct>(s => s
.From(0)
.Size(10)
.Query(q => q
.Nested(n => n
.Path(p => p.PriceList)
.Query(qq => qq
.Term(t => t.PriceList.First().Price, 100)
)
)
));
It does return results but I was expecting
.Term(t => t.PriceList.First().Price, 100)
to look more like
.Term(t => t.Price, 100)
and to know that it was searching the nested PriceList class. Is this not the case?
In the new version 2 documentation I can't find the bulk index section. I tried using this code
var descriptor = new BulkDescriptor();
***Inside foreach loop***
descriptor.Index<esProduct>(op => op
.Document(product)
.Id(id)
);
***Outside foreach loop***
var result = client.Bulk(descriptor);
which does return a success response but when I search I get no results.
Any help would be appreciated.
UPDATE
After a bit more investigation on @Russ's advice, I think the error must be with my bulk indexing of a class with a nested object.
When I use
var index = client.Index(product);
to index each product I can use
var searchResults = client.Search<esProduct>(s => s
.From(0)
.Size(10)
.Query(q => q
.Nested(n => n
.Path(p => p.PriceList)
.Query(qq => qq
.Term(t => t.PriceList.First().Price, 100)
)
)
)
);
to search and return results, but when I bulk index this no longer works, whereas
var searchResults = client.Search<esProduct>(s => s
.From(0)
.Size(10)
.Query(q => q
.Term(t => t.PriceList.First().Price, 100)
)
);
will work. This second query doesn't work with the individual index method. Does anyone know why this has happened?
UPDATE 2
Following @Russ's suggestion, I have taken a look at the mapping.
the code I'm using to index is
var map = new CreateIndexDescriptor(defaultIndex)
.Mappings(ms => ms
.Map<esProduct>(m => m
.AutoMap()
.Properties(ps => ps
.Nested<PriceList>(n => n
.Name(c => c.PriceList)
.AutoMap()
)
)
)
);
var response = client.Index(map);
Which is posting
http://HOST/fresh-application2/createindexdescriptor {"mappings":{"elasticsearchproduct":{"properties":{"ID":{"type":"integer"},"priceList":{"type":"nested","properties":{"ID":{"type":"integer"},"Price":{"type":"double"}}}}}}}
and on the call to http://HOST/fresh-application2/_all/_mapping?pretty I'm getting
{
"fresh-application2" : {
"mappings" : {
"createindexdescriptor" : {
"properties" : {
"mappings" : {
"properties" : {
"elasticsearchproduct" : {
"properties" : {
"properties" : {
"properties" : {
"priceList" : {
"properties" : {
"properties" : {
"properties" : {
"ID" : {
"properties" : {
"type" : {
"type" : "string"
}
}
},
"Price" : {
"properties" : {
"type" : {
"type" : "string"
}
}
}
}
},
"type" : {
"type" : "string"
}
}
},
"ID" : {
"properties" : {
"type" : {
"type" : "string"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
The mapping returned for fresh-application2 doesn't mention the nested type at all, which I'm guessing is the issue.
The mapping for my working nested query looks more like this:
{
"my-application2" : {
"mappings" : {
"elasticsearchproduct" : {
"properties" : {
"priceList" : {
"type" : "nested",
"properties" : {
"ID" : {
"type" : "integer"
},
"Price" : {
"type" : "double"
}
}
},
"ID" : {
"type" : "integer"
}
}
}
}
}
}
This one has the nested type returned. I think the mapping stopped returning nested as a type when I started using .AutoMap(). Am I using it correctly?
UPDATE
I have fixed my mapping problem. I have changed my mapping code to
var responseMap = client.Map<esProduct>(ms => ms
.AutoMap()
.Properties(ps => ps
.Nested<PriceList>(n => n
.Name(c => c.PriceList)
.AutoMap()
)
)
);
Whilst you're developing, I would recommend logging out requests and responses to Elasticsearch so you can see what is being sent when using NEST; this'll make it easier to relate to the main Elasticsearch documentation and also ensure that the body of the requests and responses match your expectations (for example, useful for mappings, queries, etc).
The mappings that you have look fine, although you can forgo the attributes since you are using fluent mapping; there's no harm in having them there but they are largely superfluous (the type name for the esProduct is the only part that will apply) in this case because .Properties() will override inferred or attribute based mapping that is applied from calling .AutoMap().
In your indexing part, you update the esProduct and then immediately after that, index the same document again; I'm not sure what the intention is here but the update call looks superfluous to me; the index call will overwrite the document with the given id in the index straight after the update (and will be visible in search results after the refresh interval). The .RetryOnConflict(3) on the update will use optimistic concurrency control to perform the update (which is effectively a get then index operation on the document inside of the cluster, that will try 3 times if the version of the document changes in between the get and index). If you're replacing the whole document with an update, i.e. not a partial update, then the retry on conflict is not really necessary (and as per the previous note, the update call in your example looks unnecessary altogether since the index call is going to overwrite the document with the given id in the index).
The nested query looks correct; you specify the path to the nested type, and then the query to a field on the nested type will also include the path. I'll update the NEST nested query usage documentation to better demonstrate.
The bulk call looks fine; you may want to send documents in batches, e.g. bulk index 500 documents at a time, if you need to index a lot of documents. How many to send in one bulk call is going to depend on a number of factors including the document size, how it is analyzed, and the performance of the cluster, so you will need to experiment to find a good bulk size for your circumstances.
I'd check to make sure that you are hitting the right index, that the index contains the count of documents that you expect, and find a document that you know has a PriceList.Price of 100 and see what is indexed for it. It might be quicker to do this using Sense while you're getting up and running.

LINQ to JSON - Querying an array

I need to select users that have a "3" in their json array.
{
"People":[
{
"id" : "123",
"firstName" : "Bill",
"lastName" : "Gates",
"roleIds" : {
"int" : ["3", "9", "1"]
}
},
{
"id" : "456",
"firstName" : "Steve",
"lastName" : "Jobs",
"roleIds" : {
"int" : ["3", "1"]
}
},
{
"id" : "789",
"firstName" : "Elon",
"lastName" : "Musk",
"roleIds" : {
"int" : ["3", "7"]
}
},
{
"id" : "012",
"firstName" : "Agatha",
"lastName" : "Christie",
"roleIds" : {
"int" : "2"
}
}
]}
In the end, my results should be Elon Musk & Steve Jobs. This is the code that I used (& other variations):
var roleIds = pplFeed["People"]["roleIds"].Children()["int"].Values<string>();
var resAnAssocInfo = pplFeed["People"]
.Where(p => p["roleIds"].Children()["int"].Values<string>().Contains("3"))
.Select(p => new
{
id = p["id"],
FName = p["firstName"],
LName = p["lastName"]
}).ToList();
I'm getting the following error:
"Accessed JArray values with invalid key value: "roleIds". Int32 array index expected"
I changed .Values<string>() to .Values<int>() and still no luck.
What am I doing wrong?
You are pretty close. Change your Where clause from this:
.Where(p => p["roleIds"].Children()["int"].Values<string>().Contains("3"))
to this:
.Where(p => p["roleIds"]["int"].Children().Contains("3"))
and you will get the result you want (although there are actually three users in your sample data with a role id of "3", not two).
However, there's another issue that you might hit for which this code still won't work. You'll notice that for Agatha Christie, the value of int is not an array like the others, it is a simple string. If the value will sometimes be an array and sometimes not, then you need a where clause that can handle both. Something like this should work:
.Where(p => p["roleIds"]["int"].Children().Contains(roleId) ||
p["roleIds"]["int"].ToString() == roleId)
...where roleId is a string containing the id you are looking for.
Fiddle: https://dotnetfiddle.net/Zr1b6R
The problem is that not all objects follow the same interface. The last item in that list has a single string value in the roleIds.int property while all the others have an array. You need to normalize that property and then do the check. It would be easiest if they were all arrays.
You should be able to do this:
var roleId = "3";
var query =
from p in pplFeed["People"]
let roleIds = p.SelectToken("roleIds.int")
let normalized = roleIds.Type == JTokenType.Array ? roleIds : new JArray(roleIds)
where normalized.Values().Contains(roleId)
select new
{
id = p["id"],
FName = p["firstName"],
LName = p["lastName"],
};

Mongo Group Query using the Ruby driver

I've got a working Mongo query that I need to translate into Ruby:
var reducer = function(current, result){
result.loginsCount++;
result.lastLoginTs = Math.max(result.lastLoginTs, current.timeStamp);
}
var finalizer = function(result){
result.lastLoginDate = new Date(result.lastLoginTs).toISOString().split('T')[0];
}
db.audit_log.group({
key : {user : true},
cond : {events : { $elemMatch : { action : 'LOGIN_SUCCESS'}}},
initial : {lastLoginTs : -1, loginsCount : 0},
reduce : reducer,
finalize : finalizer
})
I'm hitting several sticking points getting this to work in Ruby. I'm not really all that familiar with Mongo, and I'm not sure what to pass as arguments to the method calls. This is my best guess, after connecting to the database and a collection called audit_log:
audit_log.group({
"key" => {"user" => "true"},
"cond" => {"events" => { "$elemMatch" => { "action" => "LOGIN_SUCCESS"}}},
"initial" => {"lastLoginTs" => -1, "loginsCount" => 0},
"reduce" => "function(current, result){result.loginsCount += 1}",
"finalize" => "function(result){ result.lastLoginDate = new Date(result.lastLoginTs).toISOString().split('T')[0]; }"
})
Or something like that. I've tried using a simpler aggregate operation using the Mongo docs, but I couldn't get that working, either. I was only able to get really simple queries to return results. Are those keys (key, cond, initial, etc.) even necessary, or is that only for JavaScript?
This is how the function finally took shape using the 1.10.0 Mongo gem:
@db.collection("audit_log").group(
[:user, :events],
{'events' => { '$elemMatch' => { 'action' => 'LOGIN_SUCCESS' }}},
{ 'lastLoginTs' => -1, 'loginsCount' => 0 },
"function(current, result){ result.loginsCount++; result.lastLoginTs = Math.max(result.lastLoginTs, current.timeStamp);}",
"function(result){ result.lastLoginDate = new Date(result.lastLoginTs).toISOString().split('T')[0];}"
)
With the Mongo Driver, you leave off the keys: "key", "cond", "initial", "reduce", "finalize" and simply pass in the respective values.
I've linked to two approaches taken by other SO users here and here.
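To make it concrete what the cond, initial, reduce and finalize arguments compute, here is a plain-Ruby sketch that runs the same logic over an in-memory array instead of MongoDB. The sample documents and the helper name group_logins are hypothetical; only the field names (user, events, action, timeStamp) come from the question, and the sketch groups by user only, as the shell query does.

```ruby
# Hypothetical sample documents shaped like the audit_log entries in the question.
audit_log = [
  { 'user' => 'alice', 'timeStamp' => 1_400_000_000_000, 'events' => [{ 'action' => 'LOGIN_SUCCESS' }] },
  { 'user' => 'alice', 'timeStamp' => 1_400_086_400_000, 'events' => [{ 'action' => 'LOGIN_SUCCESS' }] },
  { 'user' => 'bob',   'timeStamp' => 1_400_000_000_000, 'events' => [{ 'action' => 'LOGIN_FAILURE' }] }
]

# In-memory model of the group call (not the driver API).
def group_logins(docs)
  # cond: keep documents with at least one LOGIN_SUCCESS event ($elemMatch)
  matching = docs.select { |d| d['events'].any? { |e| e['action'] == 'LOGIN_SUCCESS' } }

  matching.group_by { |d| d['user'] }.map do |user, entries|
    # initial: starting value of the accumulator for each key
    result = { 'user' => user, 'lastLoginTs' => -1, 'loginsCount' => 0 }

    # reduce: called once per matching document
    entries.each do |current|
      result['loginsCount'] += 1
      result['lastLoginTs'] = [result['lastLoginTs'], current['timeStamp']].max
    end

    # finalize: millisecond timestamp -> ISO date string (UTC)
    result['lastLoginDate'] = Time.at(result['lastLoginTs'] / 1000).utc.strftime('%Y-%m-%d')
    result
  end
end

grouped = group_logins(audit_log)
```

Here grouped holds one hash per user that has at least one successful login, which is the shape the group call is expected to return from the server.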

Compare three arrays of hashes and get the result without duplicates in ruby?

I'm using the fql gem to retrieve data from Facebook. The original array of hashes is like this: Here. When I compare these three arrays of hashes, I want to get the final result in this form:
{
"photo" => [
[0] {
"owner" : "1105762436",
"src_big" : "https://fbcdn-sphotos-b-a.akamaihd.net/hphotos-ak-xap1/t31.0-8/q71/s720x720/10273283_10203050474118531_5420466436365792507_o.jpg",
"caption" : "Rings...!!\n\nView Full Screen.",
"created" : 1398953040,
"modified" : 1398953354,
"like_info" : {
"can_like" : true,
"like_count" : 22,
"user_likes" : true
},
"comment_info" : {
"can_comment" : true,
"comment_count" : 2,
"comment_order" : "chronological"
},
"object_id" : "10203050474118531",
"pid" : "4749213500839034982"
}
],
"comment" => [
[0] {
"text" : "Wow",
"text_tags" : [],
"time" : 1398972853,
"likes" : 1,
"fromid" : "100001012753267",
"object_id" : "10203050474118531"
},
[1] {
"text" : "Woww..",
"text_tags" : [],
"time" : 1399059923,
"likes" : 0,
"fromid" : "100003167704574",
"object_id" : "10203050474118531"
}
],
"users" =>[
[0] {
"id": "1105762436",
"name": "Nilanjan Joshi",
"username": "NilaNJan219"
},
[1] {
"id": "1105762436",
"name": "Ashish Joshi",
"username": "NilaNJan219"
}
]
}
Here is my attempt:
datas = File.read('source2.json')
all_data = JSON.parse(datas)
photos = all_data[0]['fql_result_set'].group_by{|x| x['object_id']}.to_a
comments = all_data[1]['fql_result_set'].group_by{|x| x['object_id']}.to_a
@photos_comments = []
@comments_users = []
@photo_users = []
photos.each do |a|
comments.each do |b|
if a.first == b.first
@photos_comments << {'photo' => a.last, 'comment' => b.last}
else
@comments_users << {'photo' => a.last, 'comment' => ''} unless @photos_comments.include?(a.last)
end
end
end
@photo_users = @photos_comments | @comments_users
@photo_comment_users = {photos_comments: @photo_users }
Here is the final result I'm getting.
There are still duplicates in the final array. I've grouped the arrays by object id, which is common between the photo and the comment arrays. The problem is that it only takes the photos that have comments. I can't work out how to find the photos that don't have comments.
Also, to find the details of the person who commented, I have the users array; the common attribute between comments and users is fromid and id. I can't work out how to get the user details either.
I think this is what you want:
photos = all_data[0]['fql_result_set']
comments = all_data[1]['fql_result_set'].group_by{|x| x['object_id']}
@photo_comment_users = photos.map do |p|
{ 'photo' => p, 'comment' => comments[p['object_id']] || '' }
end
For each photo it takes all the comments with the same object_id, or if none exist - returns ''.
If you want to connect the users too, you can map them by id, and select the relevant ones by the comment:
users = Hash[all_data[2]['fql_result_set'].map {|x| [x['id'], x]}]
@photo_comment_users = photos.map do |p|
{ 'photo' => p, 'comment' => comments[p['object_id']] || '',
'user' => (comments[p['object_id']] || []).map {|c| users[c['fromid']]} }
end
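As a self-contained illustration of the same lookup-by-key idea, the sketch below runs against small, made-up stand-ins for the three fql_result_set arrays (the ids, names and variable names are hypothetical). It differs from the snippets above in one respect, which is a design choice rather than a requirement: photos without comments get an empty array instead of '', so every entry has the same shape.

```ruby
# Hypothetical, trimmed-down stand-ins for the three fql_result_set arrays.
photos = [
  { 'object_id' => 'p1', 'owner' => 'u1' },
  { 'object_id' => 'p2', 'owner' => 'u1' } # this photo has no comments
]
comments = [
  { 'object_id' => 'p1', 'fromid' => 'u2', 'text' => 'Wow' },
  { 'object_id' => 'p1', 'fromid' => 'u3', 'text' => 'Woww..' }
]
users = [
  { 'id' => 'u2', 'name' => 'User Two' },
  { 'id' => 'u3', 'name' => 'User Three' }
]

# Index comments by photo and users by id so each lookup is a hash access.
comments_by_photo = comments.group_by { |c| c['object_id'] }
users_by_id = users.map { |u| [u['id'], u] }.to_h

photo_comment_users = photos.map do |photo|
  photo_comments = comments_by_photo.fetch(photo['object_id'], [])
  {
    'photo'   => photo,
    'comment' => photo_comments,
    # comments join to users on fromid == id; drop unknown ids and duplicates
    'user'    => photo_comments.map { |c| users_by_id[c['fromid']] }.compact.uniq
  }
end
```

Because the result is built by mapping over photos, each photo appears exactly once, whether or not it has comments.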

Upsert Multiple Records with MongoDb

I'm trying to get MongoDB to upsert multiple records with the following query, ultimately using MongoMapper and the Mongo ruby driver.
db.foo.update({event_id: { $in: [1,2]}}, {$inc: {visit:1}}, true, true)
This works fine if all the records exist, but does not create new records for records that do not exist. The following command has the desired effect from the shell, but is probably not ideal from the ruby driver.
[1,2].forEach(function(id) {db.foo.update({event_id: id}, {$inc: {visit:1}}, true, true) });
I could loop through each id I want to insert from within ruby, but that would necessitate a trip to the database for each item. Is there a way to upsert multiple items from the ruby driver with only a single trip to the database? What's the best practice here? Using mongomapper and the ruby driver, is there a way to send multiple updates in a single batch, generating something like the following?
db.foo.update({event_id: 1}, {$inc: {visit:1}}, true); db.foo.update({event_id: 2}, {$inc: {visit:1}}, true);
Sample Data:
Desired data after command if two records exist.
{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 11 }
{ "_id" : ObjectId("4d6baf56c0d8bb8238d0209a"), "event_id" : 2, "visit" : 2 }
Actual data after command if two records exist.
{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 11 }
{ "_id" : ObjectId("4d6baf56c0d8bb8238d0209a"), "event_id" : 2, "visit" : 2 }
Desired data after command if only the record with event_id 1 exists.
{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 2 }
{ "_id" : ObjectId("4d6baf56c0d8bb8238d0209a"), "event_id" : 2, "visit" : 1 }
Actual data after command if only the record with event_id 1 exists.
{ "_id" : ObjectId("4d6babbac0d8bb8238d02099"), "event_id" : 1, "visit" : 2 }
This - correctly - will not insert any records with event_id 1 or 2 if they do not already exist:
db.foo.update({event_id: { $in: [1,2]}}, {$inc: {visit:1}}, true, true)
This is because the objNew part of the query (see http://www.mongodb.org/display/DOCS/Updating#Updating-UpsertswithModifiers) does not have a value for field event_id. As a result, you will need at least X+1 trips to the database, where X is the number of event_ids, to ensure that you insert a record if one does not exist for a particular event_id (the +1 comes from the query above, which increases the visits counter for existing records). To say it in a different way, how does MongoDB know you want to use value 2 for the event_id and not 1? And why not 6?
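The difference is easier to see in a simplified in-memory model, which is an assumption-laden sketch and not the driver API: the array stands in for the foo collection and upsert_visit is a hypothetical helper. With one equality match per id, an upsert has a concrete event_id to seed the new document with.

```ruby
# In-memory stand-in for the foo collection: only event_id 1 exists.
collection = [{ 'event_id' => 1, 'visit' => 1 }]

# Simplified model of update({event_id: id}, {$inc: {visit: 1}}, upsert = true)
# for a single id.
def upsert_visit(collection, event_id)
  doc = collection.find { |d| d['event_id'] == event_id }
  if doc.nil?
    # The equality condition supplies the event_id for the new document.
    doc = { 'event_id' => event_id, 'visit' => 0 }
    collection << doc
  end
  doc['visit'] += 1
  doc
end

# One call per id, which is where the per-id trips to the database come from.
[1, 2].each { |id| upsert_visit(collection, id) }
```

After the loop the model matches the "desired data" from the question: event_id 1 has visit 2 and event_id 2 has been created with visit 1.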
W.r.t. batch insertion with ruby, I think it is possible as the following link suggests - although I've only used the Java driver: Batch insert/update using Mongoid?
What you are after is the Find and Modify command with the upsert option set to true. See the example from the Mongo test suite (same one linked to in the Find and Modify docs) for an example that looks very much like what you describe in your question.
I found a way to do this using the eval operator for server-side code execution. Here is the code snippet:
def batchpush(body, item_opts = {})
@batch << {
:body => body,
:duplicate_key => item_opts[:duplicate_key] || Mongo::Dequeue.generate_duplicate_key(body),
:priority => item_opts[:priority] || @config[:default_priority]
}
end
def batchprocess()
js = %Q|
function(batch) {
var nowutc = new Date();
var ret = [];
for(i in batch){
e = batch[i];
//ret.push(e);
var query = {
'duplicate_key': e.duplicate_key,
'complete': false,
'locked_at': null
};
var object = {
'$set': {
'body': e.body,
'inserted_at': nowutc,
'complete': false,
'locked_till': null,
'completed_at': null,
'priority': e.priority,
'duplicate_key': e.duplicate_key,
'completecount': 0
},
'$inc': {'count': 1}
};
db.#{collection.name}.update(query, object, true);
}
return ret;
}
|
cmd = BSON::OrderedHash.new
cmd['$eval'] = js
cmd['args'] = [@batch]
cmd['nolock'] = true
result = collection.db.command(cmd)
@batch.clear
#pp result
end
Multiple items are added with batchpush(), and then batchprocess() is called. The data is sent as an array, and the commands are all executed. This code is used in the MongoDequeue GEM, in this file.
Only one request is made, and all the upserts happen server-side.
