Elasticsearch Aggregation on objects by query on other documents

Let's say I have an index that contains documents, each representing a Message in a discussion.
Each document has a discussionId property.
(It also has its own ID, which represents the MessageId.)
Now I need to find all discussionIds that have no documents (messages) matching a query.
For example:
"Find all discussionIds that have no message containing the text 'YO YO'."
How can I do that?
The class is similar to this:
public class Message
{
public string Id { get; set; }
public string DiscussionId { get; set; }
public string Text { get; set; }
}

You just need to wrap the query that would find matches for the phrase "YO YO" in a bool query must_not clause.
With NEST
client.Search<Message>(s => s
.Query(q => q
.Bool(b => b
.MustNot(mn => mn
.MatchPhrase(m => m
.Field(f => f.Text)
.Query("YO YO")
)
)
)
)
);
which, with operator overloading, can be shortened to
client.Search<Message>(s => s
.Query(q => !q
.MatchPhrase(m => m
.Field(f => f.Text)
.Query("YO YO")
)
)
);
Both produce the query
{
"query": {
"bool": {
"must_not": [
{
"match": {
"text": {
"type": "phrase",
"query": "YO YO"
}
}
}
]
}
}
}
To only return DiscussionId values, you can use source filtering
client.Search<Message>(s => s
.Source(sf => sf
.Includes(f => f
.Field(ff => ff.DiscussionId)
)
)
.Query(q => !q
.MatchPhrase(m => m
.Field(f => f.Text)
.Query("YO YO")
)
)
);
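For reference, here is a rough sketch of the request body this source-filtered search corresponds to. It assumes NEST's default camel-casing of property names (so DiscussionId becomes discussionId), and it uses the match_phrase query form that newer Elasticsearch versions expect in place of match with "type": "phrase". Treat it as illustrative rather than the exact output of the client.

```json
{
  "_source": {
    "includes": ["discussionId"]
  },
  "query": {
    "bool": {
      "must_not": [
        { "match_phrase": { "text": "YO YO" } }
      ]
    }
  }
}
```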
And, if you want to get them all, you can use the scroll API
var searchResponse = client.Search<Message>(s => s
.Scroll("1m")
.Source(sf => sf
.Includes(f => f
.Field(ff => ff.DiscussionId)
)
)
.Query(q => !q
.MatchPhrase(m => m
.Field(f => f.Text)
.Query("YO YO")
)
)
);
// fetch the next batch of documents, using the scroll id returned from
// the previous call. Do this in a loop until no more docs are returned.
searchResponse = client.Scroll<Message>("1m", searchResponse.ScrollId);
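At the REST level, each follow-up call is a POST to the _search/scroll endpoint with a body along these lines. The scroll_id value below is a placeholder for whatever ID the previous response returned. The loop ends when a response comes back with an empty hits array.

```json
{
  "scroll": "1m",
  "scroll_id": "<scroll id returned by the previous response>"
}
```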

Related

How can I execute ElasticSearch query on multiple indices with nested mapping

I have two indices with the following configuration with mappings
var settings = new ConnectionSettings(new Uri("http://localhost:9200/"));
settings
.DefaultMappingFor<ManagementIndex>(m => m
.IndexName("management")
)
.DefaultMappingFor<PropertyIndex>(m => m
.IndexName("apartmentproperty")
);
var client = new ElasticClient(settings);
1) Properties mapping
client.Indices.Create("property", i => i
.Settings(s => s
.NumberOfShards(2)
.NumberOfReplicas(0)
)
.Map<PropertyIndex>(map => map
.AutoMap()
.Properties(p => p
.Nested<PropertyData>(n => n
.Name(c => c.property)
.AutoMap()
.Properties(pp => pp
.Text(c => c
.Name(np => np.city)
.Analyzer("standard")
)
.Text(c => c
.Name(np => np.market)
.Fields(ff => ff
.Text(tt => tt
.Name(np => np.market)
.Analyzer("standard")
)
.Keyword(k => k
.Name("keyword")
.IgnoreAbove(256)
)
)
).Text(c => c
.Name(np => np.name)
.Analyzer("standard")
)
)
)
)
)
);
and
2) Owner
if (client.Indices.Exists("owner").Exists)
client.Indices.Delete("owner");
client.Indices.Create("owner", i => i
.Settings(s => s
.NumberOfShards(2)
.NumberOfReplicas(0)
)
.Map<OwnerIndex>(map => map
.AutoMap()
.Properties(p => p
.Nested<OwnerProp>(n => n
.Name(c => c.owner)
.AutoMap()
.Properties(pp => pp
.Text(c => c
.Name(np => np.market)
.Fields(ff => ff
.Text(tt => tt
.Name(np => np.market)
.Analyzer("standard")
)
.Keyword(k => k
.Name("keyword")
.IgnoreAbove(256)
)
)
).Text(c => c
.Name(np => np.name)
.Analyzer("standard")
)
)
)
)
)
);
with the following POCO definitions
public class PropertyData
{
public string name { get; set; }
public string city { get; set; }
public string market { get; set; }
}
public class PropertyIndex
{
public PropertyData property { get; set; }
}
public class OwnerProp
{
public string name { get; set; }
public string market { get; set; }
}
public class OwnerIndex
{
public OwnerProp owner { get; set; }
}
I am trying to search across the two indices like so
public async Task<object> SearchPropertiesAsync(string searchQuery, List<string> description, int limit = 25, int skip = 1)
{
var propertyfilters = new List<Func<QueryContainerDescriptor<object>, QueryContainer>>();
var ownerFilters = new List<Func<QueryContainerDescriptor<object>, QueryContainer>>();
if (description.Any())
{
propertyfilters.Add(fq => fq.Terms(t => t.Field("property.market.keyword").Terms(description)));
ownerFilters.Add(fq => fq.Terms(t => t.Field("owner.market.keyword").Terms(description)));
}
var searchResponse = await _elasticClient.SearchAsync<object>(s => s
.Index(Indices.Index(typeof(PropertyIndex)).And(typeof(OwnerIndex)))
.Query(q => (q
.Nested(n => n
.Path(Infer.Field<PropertyIndex>(ff => ff.property))
.Query(nq => nq
.MultiMatch(m => m
.Fields(f => f
.Field(Infer.Field<PropertyIndex>(ff => ff.property.city))
.Field(Infer.Field<PropertyIndex>(ff => ff.property.market))
.Field(Infer.Field<PropertyIndex>(ff => ff.property.name))
)
.Operator(Operator.Or)
.Query(searchQuery)
.Fuzziness(Fuzziness.Auto)
) && +q.Bool(bq => bq.Filter(propertyfilters))
))
) || (q
.Nested(n => n
.Path(Infer.Field<OwnerIndex>(ff => ff.owner))
.Query(nq => nq
.MultiMatch(m => m
.Fields(f => f
.Field(Infer.Field<OwnerIndex>(ff => ff.owner.market))
.Field(Infer.Field<OwnerIndex>(ff => ff.owner.name))
)
.Operator(Operator.Or)
.Query(searchQuery)
.Fuzziness(Fuzziness.Auto)
)
&& +q.Bool(bq => bq.Filter(ownerFilters))
))
)
).From((skip - 1) * limit)
.Size(limit)
);
return searchResponse.Documents;
}
Calling the SearchPropertiesAsync method returns these error messages (truncated for brevity)
....
"index": "owner",
"caused_by": {
"type": "illegal_state_exception",
"reason": "[nested] failed to find nested object under path [property]"
}
....
"index": "property",
"caused_by": {
"type": "illegal_state_exception",
"reason": "[nested] failed to find nested object under path [owner]"
}
Notice that it appears to be running the nested query on owner against the property index, and the nested query on property against the owner index, where those paths don't exist.
I feel like this should be a trivial problem, but I have been using Elasticsearch for only 4 days and am still very new to it.
Is there something I am doing wrong, or something I am missing? I have searched all over the internet just to arrive at the solution I have at the moment.
Note that when I execute the nested query against only one index at a time, the code works fine; the problem arises only when I execute it against multiple indices. Any help will be highly appreciated.
I am using Elasticsearch 7.3.2 and NEST client 7.3.0.
I don't mind downgrading to a lower version that works.
According to the docs for the nested query:
ignore_unmapped
(Optional, boolean) Indicates whether to ignore an unmapped path and not return any documents instead of an error. Defaults to false.
If false, Elasticsearch returns an error if the path is an unmapped field.
You can use this parameter to query multiple indices that may not contain the field path.
So chaining .IgnoreUnmapped(true) onto each of the nested queries solved the problem.
Posting this in case someone else encounters the same problem.
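As a sketch of what the resulting request might look like, assuming the two nested queries are combined in a bool should clause (which is what || produces in NEST) and using a placeholder search string; the terms filters from the question are left out for brevity:

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "property",
            "ignore_unmapped": true,
            "query": {
              "multi_match": {
                "query": "<search text>",
                "fields": ["property.city", "property.market", "property.name"],
                "fuzziness": "AUTO"
              }
            }
          }
        },
        {
          "nested": {
            "path": "owner",
            "ignore_unmapped": true,
            "query": {
              "multi_match": {
                "query": "<search text>",
                "fields": ["owner.market", "owner.name"],
                "fuzziness": "AUTO"
              }
            }
          }
        }
      ]
    }
  }
}
```

With ignore_unmapped set, an index that lacks the nested path simply contributes no hits for that clause instead of failing the whole request.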

Async way of implementing search query in Elastic Search Nest Client .NET

I have implemented a Search Query through NEST client and was able to get the records. The code is as follows.
var response = clientProvider.Client.Search<ProjectModel>(s => s
.Index("project_index")
.Type("projects")
.Source(so => so.Excludes(f => f.Field(x => x.FileInfo.FileBase64Data)))
.Size(100)
.Query(q => q
.Bool(b => b
.Should(
m => m.QueryString(qs => qs
.Query(searchOptions.SearchTerm)
.Fields(ff => ff.Fields(fields))
.Fuzziness(Fuzziness.Auto)
),
m => m.MultiMatch(qs => qs
.Query(searchOptions.SearchTerm)
.Type(Nest.TextQueryType.PhrasePrefix)
.Fields(ff => ff.Fields(fields))
)
)
)
)
.Sort(ss => ss.Descending(SortSpecialField.Score))
);
And I am mapping the response to my Project Model as follows.
var project = response.Hits.Select(h =>
{
h.Source._id = h.Id;
h.Source.Score = h.Score;
return h.Source;
}).ToList();
When I try to implement the same search asynchronously, that is
var response = clientProvider.Client.SearchAsync<ProjectModel>(s => s
.Index("project_index")
.Type("projects")
.Source(so => so.Excludes(f => f.Field(x => x.FileInfo.FileBase64Data)))
.Size(100)
.Query(q => q
.Bool(b => b
.Should(
m => m.QueryString(qs => qs
.Query(searchOptions.SearchTerm)
.Fields(ff => ff.Fields(fields))
.Fuzziness(Fuzziness.Auto)
),
m => m.MultiMatch(qs => qs
.Query(searchOptions.SearchTerm)
.Type(Nest.TextQueryType.PhrasePrefix)
.Fields(ff => ff.Fields(fields))
)
)
)
)
.Sort(ss => ss.Descending(SortSpecialField.Score))
);
I don't get any errors when executing it, but I can't access response.Hits to map the results back to my original Project Model.
Thanks in advance.
In SearchAsync<T>(), response is a Task<ISearchResponse<T>>, so you probably want to await it.
For example, to map all documents found in the products index onto a ProductDto type:
var result = await _elasticClient.SearchAsync<ProductDto>(x => x.Index("products").MatchAll());
var documents = result.Documents;

Elasticsearch nested aggregation with nested object using NEST

I am trying to do an aggregation on a nested object. The following is my json. The first code sample below successfully returns the productCategory Id. However, I want to return the category id and name in the aggregation. I thought I could try the second code sample below but it doesn't work.
"productCategories": [{
"id":6,
"productId":6,
"categoryId":4,
"category":{
"parentId":2,
"name":"Air Fresheners",
"id":6
}
}]
This one aggregates the productCategory id as the key:
.Aggregations(aggs => aggs
.Nested("agg-categories", nested => nested
.Path(p => p.ProductCategories)
.Aggregations(r => r
.Terms("agg-category", w => w
.Field(f => f.ProductCategories.First().Id)
)
)
)
)
But I need the category info, and this one doesn't work:
.Aggregations(aggs => aggs
.Nested("agg-categories", nested => nested
.Path(p => p.ProductCategories.First().Category)
.Aggregations(r => r
.Terms("agg-category", w => w
.Field(f => f.ProductCategories.First().Category.Id)
)
)
)
)
If category is simply mapped as object, then the following will work
var searchResponse = client.Search<Document>(s => s
.Aggregations(aggs => aggs
.Nested("agg-categories", nested => nested
.Path(p => p.ProductCategories)
.Aggregations(r => r
.Terms("agg-category", w => w
.Field(f => f.ProductCategories.First().Category.Id)
)
)
)
)
);
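In request-body terms, this aggregation should correspond roughly to the following, assuming the default camel-cased field names. The nested path stays at productCategories, and the terms aggregation reaches into the category object through the dotted field name.

```json
{
  "aggs": {
    "agg-categories": {
      "nested": {
        "path": "productCategories"
      },
      "aggs": {
        "agg-category": {
          "terms": {
            "field": "productCategories.category.id"
          }
        }
      }
    }
  }
}
```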

elasticsearch only show where nested object has no values

I have the following structure (simplified):
{
"id": 100,
"vendorStatuses": [
{
"id": 200,
"status": "Open"
}
]
}
What I want to find is records where there are no vendor statuses. We recently upgraded from Elasticsearch 1.x to 5.x and I'm having trouble converting the query to get this functionality back.
My old Nest query looked like this:
!Filter<PurchaseOrder>.Nested(nfd => nfd.Path(x => x.VendorStatuses.First())
.Filter(f2 => f2.Missing(y => y.Id)));
The new query (now that Missing isn't available) looks like this so far:
Query<PurchaseOrder>
.Bool(z => z
.MustNot(a => a
.Exists(t => t
.Field(f => f.VendorStatuses)
)
)
);
Which generates this:
GET purchaseorder/_search
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "vendorStatuses"
}
}
]
}
}
}
But I'm still seeing results that have vendorStatuses records.
What am I doing wrong? I've tried searching for vendorStatuses.id or other fields, but it's not working. When I try to reverse the logic and use must, I see no results. I also tried doing it as a nested query but couldn't get any closer with that.
The query using must_not and exists is not a nested query like the 1.x query. I think you're looking for something like
var query = Query<PurchaseOrder>
.Bool(z => z
.MustNot(a => a
.Nested(n => n
.Path(p => p.VendorStatuses)
.Query(nq => nq
.Exists(t => t
.Field(f => f.VendorStatuses)
)
)
)
)
);
client.Search<PurchaseOrder>(s => s.Query(_ => query));
which yields
{
"query": {
"bool": {
"must_not": [
{
"nested": {
"query": {
"exists": {
"field": "vendorStatuses"
}
},
"path": "vendorStatuses"
}
}
]
}
}
}
You can use operator overloading to make the query more succinct too
var query = !Query<PurchaseOrder>
.Nested(n => n
.Path(p => p.VendorStatuses)
.Query(nq => nq
.Exists(t => t
.Field(f => f.VendorStatuses)
)
)
);
I found a workaround that is far from ideal in my opinion. I created a new property on my PurchaseOrder model for NumberOfStatuses, then I just do a term search on that for value of 0.
public int NumberOfStatuses => VendorStatuses.OrEmptyIfNull().Count();
Query<PurchaseOrder>.Term(t => t.Field(po => po.NumberOfStatuses).Value(0));

NEST Aggregate similar to SQL Group By

This is my class when inserting to ES
public class BasicDoc
{
public string Name { get; set; }
public string Url { get; set; }
}
I managed to insert my documents into ES using NEST, but I'm having trouble doing an aggregation. My goal is to have something similar to SQL GROUP BY. What I did so far:
var response = elastic.Search<BasicDoc>(s => s
.Aggregations(a => a
.Terms("group_by_url", st => st
.Field(o => o.Url)
))
);
I tried to aggregate my document based on BasicDoc.Url. Say I have these in my ES:
/api/call1/v1
/api/call2/v1
/api/call1/v1
When I debug, my Nest.BucketAggregate has 4 items, with the keys api, call1, call2 and v1. I was expecting only 2, namely /api/call1/v1 and /api/call2/v1. What am I doing wrong?
Your Url property is currently analyzed, which means it is tokenized by the standard analyzer and the resulting terms are stored in the inverted index. If you need to be able to search on Url and also aggregate on it, consider mapping it as a multi_field, where one field mapping analyzes it and another does not. Here's an example index creation with mapping
client.CreateIndex("index-name", c => c
.Mappings(m => m
.Map<BasicDoc>(mm => mm
.AutoMap()
.Properties(p => p
.String(s => s
.Name(n => n.Url)
.Fields(f => f
.String(ss => ss
.Name("raw")
.NotAnalyzed()
)
)
)
)
)
)
);
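As a rough illustration, the mapping this produces for the url property would look something like the following on Elasticsearch 2.x. The type name basicdoc is an assumption based on NEST's default type-name inference, and only the url property is shown. On 5.x and later, where string was split into text and keyword, the equivalent would be a text field with a keyword sub-field.

```json
{
  "mappings": {
    "basicdoc": {
      "properties": {
        "url": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
```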
When you perform your aggregation, you can now use the Url raw field
var response = client.Search<BasicDoc>(s => s
.Size(0)
.Aggregations(a => a
.Terms("group_by_url", st => st
.Field(o => o.Url.Suffix("raw"))
)
)
);
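The request body for this search should come out along these lines, assuming the default camel-cased field name url. Because url.raw is not analyzed, each bucket key is a whole URL such as /api/call1/v1 rather than an individual token.

```json
{
  "size": 0,
  "aggs": {
    "group_by_url": {
      "terms": {
        "field": "url.raw"
      }
    }
  }
}
```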
