NEST Aggregate similar to SQL Group By - elasticsearch

This is the class I use when inserting into ES:
public class BasicDoc
{
    public string Name { get; set; }
    public string Url { get; set; }
}
I managed to insert my documents into ES using NEST, but I'm having trouble doing an aggregation. My goal is to have something similar to SQL GROUP BY. What I have done so far:
var response = elastic.Search<BasicDoc>(s => s
    .Aggregations(a => a
        .Terms("group_by_url", st => st
            .Field(o => o.Url)
        )
    )
);
I tried to aggregate my documents based on BasicDoc.Url. Say I have these in my ES:
/api/call1/v1
/api/call2/v1
/api/call1/v1
When I debug, my Nest.BucketAggregate has 4 items, with the keys api, call1, call2 and v1. I was expecting only 2, which are /api/call1/v1 and /api/call2/v1. What am I doing wrong?

You currently have analysis set up on your Url property, which means that its value is tokenized by the standard analyzer and the resulting terms are stored in the inverted index. The terms aggregation then buckets on those individual terms rather than on the whole URL. If you need to be able to search on Url and also aggregate on it, consider mapping it as a multi-field, where one field mapping analyzes it and another does not. Here's an example index creation with mapping:
client.CreateIndex("index-name", c => c
    .Mappings(m => m
        .Map<BasicDoc>(mm => mm
            .AutoMap()
            .Properties(p => p
                .String(s => s
                    .Name(n => n.Url)
                    .Fields(f => f
                        .String(ss => ss
                            .Name("raw")
                            .NotAnalyzed()
                        )
                    )
                )
            )
        )
    )
);
When you perform your aggregation, you can now use the Url raw field:
var response = client.Search<BasicDoc>(s => s
    .Size(0)
    .Aggregations(a => a
        .Terms("group_by_url", st => st
            .Field(o => o.Url.Suffix("raw"))
        )
    )
);
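
To see why the analyzed field produced four buckets, here is a rough, self-contained C# sketch that imitates the effect without Elasticsearch or NEST. The `Tokenize` helper is hypothetical: it lowercases and splits on non-alphanumeric characters, which only approximates what the standard analyzer does for these particular URLs. Grouping on tokens mimics aggregating on the analyzed field, and grouping on the whole string mimics aggregating on the not-analyzed raw field.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnalyzedVersusRaw
{
    // Hypothetical stand-in for the standard analyzer: lowercase, then split
    // on anything that is not a letter or digit. This is an approximation.
    public static IEnumerable<string> Tokenize(string value) =>
        Regex.Split(value.ToLowerInvariant(), "[^a-z0-9]+")
             .Where(token => token.Length > 0);

    // Counts documents per term, similar to the doc counts a terms aggregation reports.
    public static Dictionary<string, int> CountByTerm(
        IEnumerable<string> urls, Func<string, IEnumerable<string>> toTerms) =>
        urls.SelectMany(url => toTerms(url).Distinct())
            .GroupBy(term => term)
            .ToDictionary(group => group.Key, group => group.Count());

    public static void Main()
    {
        var urls = new[] { "/api/call1/v1", "/api/call2/v1", "/api/call1/v1" };

        // Analyzed field: every token becomes its own bucket.
        foreach (var bucket in CountByTerm(urls, Tokenize))
            Console.WriteLine($"analyzed: {bucket.Key} ({bucket.Value})");

        // Raw field: the whole value is kept as a single term.
        foreach (var bucket in CountByTerm(urls, url => new[] { url }))
            Console.WriteLine($"raw: {bucket.Key} ({bucket.Value})");
    }
}
```

With the three sample URLs, the token-based grouping yields four keys (api, call1, call2 and v1), while the whole-value grouping yields the two URLs the question expected.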

Related

How can I execute ElasticSearch query on multiple indices with nested mapping

I have two indices with the following configuration and mappings:
var settings = new ConnectionSettings(new Uri("http://localhost:9200/"));
settings
    .DefaultMappingFor<OwnerIndex>(m => m
        .IndexName("owner")
    )
    .DefaultMappingFor<PropertyIndex>(m => m
        .IndexName("property")
    );
var client = new ElasticClient(settings);
1) Properties mapping
client.Indices.Create("property", i => i
    .Settings(s => s
        .NumberOfShards(2)
        .NumberOfReplicas(0)
    )
    .Map<PropertyIndex>(map => map
        .AutoMap()
        .Properties(p => p
            .Nested<PropertyData>(n => n
                .Name(c => c.property)
                .AutoMap()
                .Properties(pp => pp
                    .Text(c => c
                        .Name(np => np.city)
                        .Analyzer("standard")
                    )
                    .Text(c => c
                        .Name(np => np.market)
                        .Fields(ff => ff
                            .Text(tt => tt
                                .Name(np => np.market)
                                .Analyzer("standard")
                            )
                            .Keyword(k => k
                                .Name("keyword")
                                .IgnoreAbove(256)
                            )
                        )
                    )
                    .Text(c => c
                        .Name(np => np.name)
                        .Analyzer("standard")
                    )
                )
            )
        )
    )
);
and
2) Owner
if (client.Indices.Exists("owner").Exists)
    client.Indices.Delete("owner");
client.Indices.Create("owner", i => i
    .Settings(s => s
        .NumberOfShards(2)
        .NumberOfReplicas(0)
    )
    .Map<OwnerIndex>(map => map
        .AutoMap()
        .Properties(p => p
            .Nested<OwnerProp>(n => n
                .Name(c => c.owner)
                .AutoMap()
                .Properties(pp => pp
                    .Text(c => c
                        .Name(np => np.market)
                        .Fields(ff => ff
                            .Text(tt => tt
                                .Name(np => np.market)
                                .Analyzer("standard")
                            )
                            .Keyword(k => k
                                .Name("keyword")
                                .IgnoreAbove(256)
                            )
                        )
                    )
                    .Text(c => c
                        .Name(np => np.name)
                        .Analyzer("standard")
                    )
                )
            )
        )
    )
);
with the following POCO definitions
public class PropertyData
{
    public string name { get; set; }
    public string city { get; set; }
    public string market { get; set; }
}
public class PropertyIndex
{
    public PropertyData property { get; set; }
}
public class OwnerProp
{
    public string name { get; set; }
    public string market { get; set; }
}
public class OwnerIndex
{
    public OwnerProp owner { get; set; }
}
I am trying to search across the two indices like so:
public async Task<object> SearchPropertiesAsync(string searchQuery, List<string> description, int limit = 25, int skip = 1)
{
    var propertyfilters = new List<Func<QueryContainerDescriptor<object>, QueryContainer>>();
    var ownerFilters = new List<Func<QueryContainerDescriptor<object>, QueryContainer>>();
    if (description.Any())
    {
        propertyfilters.Add(fq => fq.Terms(t => t.Field("property.market.keyword").Terms(description)));
        ownerFilters.Add(fq => fq.Terms(t => t.Field("owner.market.keyword").Terms(description)));
    }
    var searchResponse = await _elasticClient.SearchAsync<object>(s => s
        .Index(Indices.Index(typeof(PropertyIndex)).And(typeof(OwnerIndex)))
        .Query(q => (q
            .Nested(n => n
                .Path(Infer.Field<PropertyIndex>(ff => ff.property))
                .Query(nq => nq
                    .MultiMatch(m => m
                        .Fields(f => f
                            .Field(Infer.Field<PropertyIndex>(ff => ff.property.city))
                            .Field(Infer.Field<PropertyIndex>(ff => ff.property.market))
                            .Field(Infer.Field<PropertyIndex>(ff => ff.property.name))
                        )
                        .Operator(Operator.Or)
                        .Query(searchQuery)
                        .Fuzziness(Fuzziness.Auto)
                    ) && +q.Bool(bq => bq.Filter(propertyfilters))
                ))
            ) || (q
            .Nested(n => n
                .Path(Infer.Field<OwnerIndex>(ff => ff.owner))
                .Query(nq => nq
                    .MultiMatch(m => m
                        .Fields(f => f
                            .Field(Infer.Field<OwnerIndex>(ff => ff.owner.market))
                            .Field(Infer.Field<OwnerIndex>(ff => ff.owner.name))
                        )
                        .Operator(Operator.Or)
                        .Query(searchQuery)
                        .Fuzziness(Fuzziness.Auto)
                    )
                    && +q.Bool(bq => bq.Filter(ownerFilters))
                ))
            )
        ).From((skip - 1) * limit)
        .Size(limit)
    );
    return searchResponse.Documents;
}
Calling the SearchPropertiesAsync method returns these error messages (truncated for brevity):
....
"index": "owner",
"caused_by": {
"type": "illegal_state_exception",
"reason": "[nested] failed to find nested object under path [property]"
}
....
"index": "property",
"caused_by": {
"type": "illegal_state_exception",
"reason": "[nested] failed to find nested object under path [owner]"
}
Notice that it looks like it is trying to perform a nested search on the owner path in the property index and a nested search on the property path in the owner index, neither of which exists.
I feel like this should be a trivial problem, but I have been using Elasticsearch for only 4 days and am still very new to it.
Is there something I am doing wrong, or something I am missing? I have searched all over the internet just to arrive at the solution I have at the moment.
Note that when I execute the nested query against only one index at a time, the code works fine; the problem appears only when I execute it against multiple indices. Any help will be highly appreciated.
I am using Elasticsearch 7.3.2 and NEST client 7.3.0.
I don't mind downgrading to a lower version that works.
Apparently, according to the docs
ignore_unmapped
(Optional, boolean) Indicates whether to ignore an unmapped path and not return any documents instead of an error. Defaults to false.
If false, Elasticsearch returns an error if the path is an unmapped field.
You can use this parameter to query multiple indices that may not contain the field path.
So chaining .IgnoreUnmapped(true) onto each of the nested queries solved the problem.
I am posting this in case someone else encounters the same problem.
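
As a minimal sketch of that fix, the snippet below shows where the call could sit in a trimmed-down version of the query from the question. It assumes NEST 7.x, a reachable cluster, and a client whose DefaultMappingFor settings map PropertyIndex and OwnerIndex to their index names; the `MultiIndexNestedSearch` class and its `Search` method are hypothetical names introduced only for illustration, and the filters and fuzziness options are left out for brevity.

```csharp
using Nest;

public class PropertyData
{
    public string name { get; set; }
    public string city { get; set; }
    public string market { get; set; }
}

public class PropertyIndex { public PropertyData property { get; set; } }

public class OwnerProp
{
    public string name { get; set; }
    public string market { get; set; }
}

public class OwnerIndex { public OwnerProp owner { get; set; } }

public static class MultiIndexNestedSearch
{
    // Hypothetical helper: searches both indices with one nested query per index.
    public static ISearchResponse<object> Search(IElasticClient client, string searchQuery) =>
        client.Search<object>(s => s
            .Index(Indices.Index(typeof(PropertyIndex)).And(typeof(OwnerIndex)))
            .Query(q =>
                q.Nested(n => n
                    .IgnoreUnmapped(true) // no error on indices without the "property" path
                    .Path(Infer.Field<PropertyIndex>(f => f.property))
                    .Query(nq => nq
                        .MultiMatch(m => m
                            .Fields(f => f
                                .Field(Infer.Field<PropertyIndex>(ff => ff.property.city))
                                .Field(Infer.Field<PropertyIndex>(ff => ff.property.name))
                            )
                            .Query(searchQuery)
                        )
                    )
                )
                || q.Nested(n => n
                    .IgnoreUnmapped(true) // no error on indices without the "owner" path
                    .Path(Infer.Field<OwnerIndex>(f => f.owner))
                    .Query(nq => nq
                        .MultiMatch(m => m
                            .Fields(f => f
                                .Field(Infer.Field<OwnerIndex>(ff => ff.owner.name))
                            )
                            .Query(searchQuery)
                        )
                    )
                )
            )
        );
}
```

Each nested clause then simply matches nothing on the index that lacks its path, so the two clauses combined with || can return hits from both indices.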

Elasticsearch nested aggregation with nested object using NEST

I am trying to do an aggregation on a nested object. The following is my JSON. The first code sample below successfully returns the productCategory id. However, I want to return the category id and name in the aggregation. I thought I could try the second code sample below, but it doesn't work.
"productCategories": [{
    "id": 6,
    "productId": 6,
    "categoryId": 4,
    "category": {
        "parentId": 2,
        "name": "Air Fresheners",
        "id": 6
    }
}]
This one aggregates the productCategory id as the key:
.Aggregations(aggs => aggs
    .Nested("agg-categories", nested => nested
        .Path(p => p.ProductCategories)
        .Aggregations(r => r
            .Terms("agg-category", w => w
                .Field(f => f.ProductCategories.First().Id)
            )
        )
    )
)
But I need the category info, and this one doesn't work:
.Aggregations(aggs => aggs
    .Nested("agg-categories", nested => nested
        .Path(p => p.ProductCategories.First().Category)
        .Aggregations(r => r
            .Terms("agg-category", w => w
                .Field(f => f.ProductCategories.First().Category.Id)
            )
        )
    )
)
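The path of a nested aggregation must point at a field that is mapped as nested. If category is simply mapped as object inside the nested productCategories, keep the path on ProductCategories and reach into Category only in the terms field; the following will then work:
var searchResponse = client.Search<Document>(s => s
    .Aggregations(aggs => aggs
        .Nested("agg-categories", nested => nested
            .Path(p => p.ProductCategories)
            .Aggregations(r => r
                .Terms("agg-category", w => w
                    .Field(f => f.ProductCategories.First().Category.Id)
                )
            )
        )
    )
);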

ElasticSearch C# client (NEST): Filtering results with ES 5.5.0

This code used to work with an earlier version of ES. After moving to ES 5.5, it stopped working and now gives a compiler error.
Error: 'QueryStringQueryDescriptor' does not contain a definition for 'OnFields' and no extension method 'OnFields' accepting a first argument of type 'QueryStringQueryDescriptor'
Below is my code snippet:
public List<EmployeeInfo> SearchText2(string query, List<string> sendersList, int page = 0, int pageSize = 50)
{
    try
    {
        var result = this.client.Search<EmployeeInfo>(s => s
            .From(page * pageSize)
            .Size(int.MaxValue)
            .Query(q => q
                .QueryString(qs => qs.Query(query).UseDisMax()
                    .OnFields(b => b.Subject)
                    .OnFields(b => b.Body)
                ))
            .SortDescending(f => f.ReceivedTime)
            .Filter(f => f.Terms(ak => ak.SenderName, sendersList))
        );
        ...
        // Some code here
    }
Any tips on how to make this work will be great.
In the latest version of the NEST library there are some API changes.
Instead of OnFields in QueryString, you should use Fields:
.QueryString(qs => qs.Query(query).UseDisMax()
    .Fields(descriptor => descriptor.Fields(b => b.Subject, b => b.Body))
)
Instead of SortDescending, you should use Sort:
.Sort(descriptor => descriptor.Field(f => f.ReceivedTime, SortOrder.Descending))
Also, filters are not available in Elasticsearch starting from version 5, so you should use a bool query with a filter clause. Note that Terms now takes a single descriptor, with the field and the values set separately:
.Query(descriptor => descriptor
    .Bool(boolQuery => boolQuery
        // MatchAll is a placeholder; the query string query can go here instead
        .Must(must => must.MatchAll())
        .Filter(f => f.Terms(t => t.Field(ak => ak.SenderName).Terms(sendersList)))
    )
)

Elasticsearch Aggregation on objects by query on other documents

Let's say I have an index that contains documents, each representing a Message in a discussion.
Each document has a discussionId property
(it also has its own Id, which represents the MessageId).
Now I need to find all discussionIds that have no documents (messages) matching a query.
For example:
"Find all discussionIds that have no message containing the text 'YO YO'"
How can I do that?
The class is similar to this:
public class Message
{
    public string Id { get; set; }
    public string DiscussionId { get; set; }
    public string Text { get; set; }
}
You just need to wrap the query that would find matches for the phrase "YO YO" in a bool query must_not clause.
With NEST
client.Search<Message>(s => s
    .Query(q => q
        .Bool(b => b
            .MustNot(mn => mn
                .MatchPhrase(m => m
                    .Field(f => f.Text)
                    .Query("YO YO")
                )
            )
        )
    )
);
which, with operator overloading, can be shortened to
client.Search<Message>(s => s
    .Query(q => !q
        .MatchPhrase(m => m
            .Field(f => f.Text)
            .Query("YO YO")
        )
    )
);
Both produce the query
{
    "query": {
        "bool": {
            "must_not": [
                {
                    "match": {
                        "text": {
                            "type": "phrase",
                            "query": "YO YO"
                        }
                    }
                }
            ]
        }
    }
}
To only return DiscussionId values, you can use source filtering
client.Search<Message>(s => s
    .Source(sf => sf
        .Includes(f => f
            .Field(ff => ff.DiscussionId)
        )
    )
    .Query(q => !q
        .MatchPhrase(m => m
            .Field(f => f.Text)
            .Query("YO YO")
        )
    )
);
And, if you want to get them all, you can use the scroll API
var searchResponse = client.Search<Message>(s => s
    .Scroll("1m")
    .Source(sf => sf
        .Includes(f => f
            .Field(ff => ff.DiscussionId)
        )
    )
    .Query(q => !q
        .MatchPhrase(m => m
            .Field(f => f.Text)
            .Query("YO YO")
        )
    )
);
// fetch the next batch of documents, using the scroll id returned from
// the previous call. Do this in a loop until no more docs are returned.
searchResponse = client.Scroll<Message>("1m", searchResponse.ScrollId);
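
The loop mentioned in the comment could look roughly like the sketch below. It assumes a NEST version in which both Search and Scroll return ISearchResponse<Message>, plus a reachable cluster; the `ScrollExample` class and `CollectDiscussionIds` method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Nest;

public class Message
{
    public string Id { get; set; }
    public string DiscussionId { get; set; }
    public string Text { get; set; }
}

public static class ScrollExample
{
    // Hypothetical helper: pages through every hit and collects distinct discussion ids.
    public static HashSet<string> CollectDiscussionIds(IElasticClient client)
    {
        var discussionIds = new HashSet<string>();

        var searchResponse = client.Search<Message>(s => s
            .Scroll("1m")
            .Source(sf => sf
                .Includes(f => f
                    .Field(ff => ff.DiscussionId)
                )
            )
            .Query(q => !q
                .MatchPhrase(m => m
                    .Field(f => f.Text)
                    .Query("YO YO")
                )
            )
        );

        // Keep scrolling until a batch comes back empty (or a call fails).
        while (searchResponse.IsValid && searchResponse.Documents.Any())
        {
            foreach (var message in searchResponse.Documents)
                discussionIds.Add(message.DiscussionId);

            searchResponse = client.Scroll<Message>("1m", searchResponse.ScrollId);
        }

        return discussionIds;
    }
}
```

The HashSet removes duplicate ids, since many messages can share one discussion. Note that the query selects messages that do not match the phrase, so a discussion containing both matching and non-matching messages would still show up in this set.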

NEST 2.x Terms Query cannot accept 2 arguments

How can the NEST 1.x expression below be rewritten for NEST 2.x or 5.x?
var searchResult = _elasticClient.Search<SearchResult>(
    request => request
        .MinScore(0.7)
        .Query(q =>
        {
            QueryContainer query = null;
            query &= q.Terms<int>(t => t.Categories
                    .SelectMany(s => s.ChildCategories.Select(c => c.Id))
                    .ToArray(),
                categories.Select(c => Convert.ToInt32(c)));
It should accept a List() containing the ids that the Elasticsearch query should match:
query &= q.Terms(c => c.Field(t => t.Categories.SelectMany(s => s.ChildCategories.Select(d => d.Id))));
The line below complains that Terms takes 1 parameter but is invoked with 2:
query &= q.Terms(c => c.Field(t => t.Categories.SelectMany(s => s.ChildCategories.Select(d => d.Id))), new List<int> {1});
UPDATE:
The last example in the Elasticsearch documentation for 1.x passes both a field and the values, as in
qff.Terms(p => p.Country, userInput.Countries)
which is what I want to achieve in NEST 5.x or 2.x.
Take a look at the Terms query documentation. A terms query needs a field that contains the term(s) to match and the collection of terms to match against.
The field to match can be specified using .Field(), which accepts anything from which a Field can be inferred, including a string or a lambda expression.
The values to match against can be specified using .Terms(), which takes a collection of terms.
Given the following POCO
public class Project
{
    public IEnumerable<string> Tags { get; set; }
}
A terms query on the tags field would be
var searchResponse = client.Search<Project>(s => s
    .Query(q => q
        .Terms(t => t
            .Field(f => f.Tags)
            .Terms("tag1", "tag2", "tag3")
        )
    )
);
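
For the case in the question, where the values arrive as a collection of ids rather than as literals, a sketch along the same lines might be the following. The `CategoryIds` property and the `TermsWithCollection` helper are hypothetical and introduced only for illustration; the sketch assumes NEST 2.x or later, where .Terms() also accepts an IEnumerable of values.

```csharp
using System.Collections.Generic;
using Nest;

public class Project
{
    public IEnumerable<string> Tags { get; set; }

    // Hypothetical numeric field, added only for this example.
    public IEnumerable<int> CategoryIds { get; set; }
}

public static class TermsWithCollection
{
    // The field goes into .Field() and the ids go into .Terms(),
    // replacing the two-argument Terms call from NEST 1.x.
    public static ISearchResponse<Project> Search(IElasticClient client, IEnumerable<int> categoryIds) =>
        client.Search<Project>(s => s
            .Query(q => q
                .Terms(t => t
                    .Field(f => f.CategoryIds)
                    .Terms(categoryIds)
                )
            )
        );
}
```

A List<int> can be passed directly as the categoryIds argument, for example `TermsWithCollection.Search(client, new List<int> { 1, 2 })`.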

Resources