Filtering DSL Query Search - Elasticsearch

I was reading a few articles and documents on query context and filter context and learned that it is best to use filter context when you do not need full-text search or when scoring does not matter. In my case, I want to return the logs that contain an ID, so I realized I should use a filter context instead of a query context. Besides full-text search and scoring, is there a hard rule that defines when you should use one over the other?
So I went from my original DSL query search command:
GET /customer-simulation-es-app-logs*/_search
{
  "query": {
    "match": {
      "_id": "mJvG0nkBiU3wk_hrEd-8"
    }
  }
}
to the filter context:
GET /customer-simulation-es-app-logs*/_search
{
  "query": {
    "bool": {
      "filter": [
        {"match": {"_id": "mJvG0nkBiU3wk_hrEd-8"}}
      ]
    }
  }
}
Since I want to use NEST to perform the search, I took this approach:
[HttpGet("GetAll/{_id}")]
public async Task<EsSource> GetAll(String _id)
{
    var response = await _elasticClient.SearchAsync<EsSource>(s => s
        .Index("customer-simulation-es-app-logs*")
        .Query(q => q
            .Bool(b => b
                .Filter(f => f
                    .Match(m => m.Field("_id").Query(_id))))));
    return response?.Documents?.FirstOrDefault();
}
Would this be the correct way to do a filter context using NEST?

That would be the correct way to issue a query with only a filter context. Some additional points that might help:
A term-level query on the _id field, such as a term query, should suffice, as there's no analysis chain involved.
If you know the index that contains the document, the get API would be a better option. The use of a wildcard index pattern, though, implies that the index might not be known.
NEST has convenient operator overloads on queries that make writing bool queries more succinct. With the unary + operator, the final query can be written as
var response = await _elasticClient.SearchAsync<EsSource>(s => s
    .Index("customer-simulation-es-app-logs*")
    .Query(q => +q
        .Match(m => m
            .Field("_id")
            .Query(_id)
        )
    )
);
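For anyone who wants to see the request body that the term-query suggestion corresponds to, here is a minimal sketch in Python that only builds the JSON (it does not call Elasticsearch). The function name id_filter_body is made up for illustration; the ID is the one from the question.

```python
import json


def id_filter_body(doc_id):
    """Build a search body that matches one _id in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                # Clauses under "filter" do not contribute to the score.
                "filter": [
                    # term is a term-level query: the value is not analyzed.
                    {"term": {"_id": doc_id}}
                ]
            }
        }
    }


body = id_filter_body("mJvG0nkBiU3wk_hrEd-8")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted under GET /customer-simulation-es-app-logs*/_search to compare against the match version.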

Related

Trying to filter some Elasticsearch results where the field might not exist

I have some data and I'm trying to add an extra filter that will exclude any results where foo.IsMarried == true.
Now, there are heaps of documents that don't have this field. If the field doesn't exist, then I'm assuming that foo.IsMarried = false, so those documents should be included in the result set.
Can anyone provide any clues, please?
I'm also using the .NET 'NEST' nuget client library - so I'll be really appreciative if the answer could be targeting that, but just happy with any answer, really.
Generally, within Elasticsearch, if a boolean field doesn't exist, it doesn't mean that its value is false. It could be that there is no value against it.
But, based on the assumption you are making in this case - we can check if the field foo.isMarried is explicitly false OR it does not exist in the document itself.
The query presented by Rahul in the other answer does the job. However since you wanted a NEST version of the same, the query can be constructed using the below snippet of code.
// Notice the use of not exists here. If you do not want to check for the 'false' value,
// you can omit the first term filter here. 'T' is the type to which you are mapping your index.
// You should pass the field based on the structure of 'T'.
private static QueryContainer BuildNotExistsQuery()
{
    var boolQuery = new QueryContainerDescriptor<T>().Bool(
        b => b.Should(
            s => s.Term(t => t.Field(f => f.foo.IsMarried).Value(false)),
            s => !s.Exists(ne => ne.Field(f => f.foo.IsMarried))
        )
    );
    // Return the built query so the caller can pass it to Search.
    return boolQuery;
}
You can trigger the search through the NEST client within your project as shown below.
var result = client.Search<T>(s => s
    .From(0)
    .Size(20)
    .Query(q => BuildNotExistsQuery())
    // other methods that you want to chain go here
);
You can use a bool query with should clauses for the following conditions:
IsMarried = false
IsMarried does not exist (must_not exists)
POST test/person/
{"name": "p1", "IsMarried": false}
POST test/person/
{"name": "p2", "IsMarried": true}
POST test/person/
{"name": "p3"}
Raw DSL query
POST test/person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "IsMarried": false
          }
        },
        {
          "bool": {
            "must_not": {
              "exists": {
                "field": "IsMarried"
              }
            }
          }
        }
      ]
    }
  }
}
I hope you can convert this raw DSL query to NEST!
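To make the intent of the two should clauses concrete, here is a small Python sketch. It builds the same bool query as a dict and then applies the same rule locally to the three sample documents. The matches function is only a stand-in for what the query is meant to select, not an Elasticsearch call, and both function names are made up for illustration. The explicit minimum_should_match is an addition, so that at least one clause is still required if must or filter clauses get added later.

```python
def not_married_query(field="IsMarried"):
    """Bool query: the field is explicitly false OR the field is missing."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"term": {field: False}},
                    {"bool": {"must_not": {"exists": {"field": field}}}},
                ],
                # At least one of the two should clauses has to match.
                "minimum_should_match": 1,
            }
        }
    }


def matches(doc, field="IsMarried"):
    """Local stand-in for the query: a missing (or null) value counts as not married."""
    value = doc.get(field)
    return value is None or value is False


docs = [
    {"name": "p1", "IsMarried": False},
    {"name": "p2", "IsMarried": True},
    {"name": "p3"},
]
kept = [d["name"] for d in docs if matches(d)]
print(kept)  # ['p1', 'p3']
```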

Delete all search response documents

Using NEST for Elasticsearch, I am trying to delete an exact number of documents (the oldest I can find) from my index. My mapped object has a TimeStamp field. The only way I managed to make this work is by searching for these documents, then running a foreach over every hit and passing the ID of that hit into the delete API, removing them one by one:
var searchResponseAsc = client.Search<MyPersonalObject>(s => s
    .Sort(sd => sd.Ascending(e => e.TimeStamp))
    .Take(NumberOfDocumentsToBeDeleted));
foreach (IHit<MyPersonalObject> hit in searchResponseAsc.Hits) {
    client.DeleteByQuery<MyPersonalObject>(dbq => dbq
        .Index(IndexName)
        .Query(q => q.Ids(s => s.Values(hit.Id))));
}
Is there a way to call the delete API on a bulk of search response or even better to nest the search query directly into the delete query?
Any tips would be much appreciated!
You can embed an Elasticsearch search query in the delete by query API:
POST twitter/_delete_by_query
{
  "query": {
    "match": {
      "message": "some message"
    }
  }
}
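If you still want to pick the documents with a sorted search first, one option is to collect all hit IDs and send them in a single ids query instead of one delete request per hit. A minimal Python sketch of building that body follows; the function name and the sample IDs are hypothetical, and the hits are assumed to look like the entries of hits.hits in a search response.

```python
def delete_by_ids_body(hits):
    """Build one delete-by-query body covering every hit, instead of one request per hit."""
    ids = [hit["_id"] for hit in hits]
    return {"query": {"ids": {"values": ids}}}


# Hypothetical hits, shaped like the entries of hits.hits in a search response.
oldest_hits = [{"_id": "a1"}, {"_id": "b2"}, {"_id": "c3"}]
body = delete_by_ids_body(oldest_hits)
print(body)
```

The resulting body would be posted once to the index's _delete_by_query endpoint.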

Performing an AND query in elastic search

I have tried looking for another solution to this, but the bool query in ES does not seem to do quite what I am looking for. Or I am just not using it correctly.
In our current implementation of search we are trying to boost performance and reduce the memory footprint of each query by changing our query logic. Today, if you search for "The Red Ball" you may get back 5 million documents, because ES returns any document that matches "the" OR "red" OR "ball". That means we get back way too many irrelevant documents (mostly because of the term "the"). I would like to change our query to use AND instead, so ES returns only documents that match "the" AND "red" AND "ball".
I am using the NEST client to do this with C#, so an example using the client would be best, since that is where I cannot figure out what to do. Thanks
You can simply use a query string query with the AND operator.
{
  "query": {
    "query_string": {
      "default_field": "your_field", <--- remove this if you want to search on all fields
      "query": "the red ball",
      "default_operator": "AND"
    }
  }
}
or simply
{
  "query": {
    "query_string": {
      "query": "the AND red AND ball"
    }
  }
}
I do not know C#, but this is how it might look in NEST (everyone, feel free to edit):
client.Search<your_index>(q => q
    .Query(qu => qu
        .QueryString(qs => qs
            .OnField(x => your_field).Query("the AND red AND ball")
        )
    )
);
I found the appropriate query to make using the NEST client:
SearchDescriptor<BackupEntitySearchDocument> desc = new SearchDescriptor<BackupEntitySearchDocument>();
desc.Query(qq => qq.MultiMatch(m => m.OnFields(_searchFields).Query(query).Operator(Operator.And)));
var searchResp = await _client.SearchAsync<BackupEntitySearchDocument>(desc).ConfigureAwait(false);
Where _searchFields is a List<string> containing the fields to match on and query is the term to search for.
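For comparison, here is a rough Python sketch of the two request bodies discussed above: the query_string form with explicit AND between terms, and the multi_match form with operator set to and. It only builds the JSON; the function names and the field names title and body are placeholders.

```python
def and_query_string(text):
    """Join whitespace-separated terms with AND for a query_string query."""
    return {"query": {"query_string": {"query": " AND ".join(text.split())}}}


def and_multi_match(text, fields):
    """multi_match where every term has to match (operator 'and')."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "operator": "and",
            }
        }
    }


qs_body = and_query_string("The Red Ball")
mm_body = and_multi_match("The Red Ball", ["title", "body"])
print(qs_body["query"]["query_string"]["query"])  # The AND Red AND Ball
```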

Elasticsearch: Field level custom scores in text searches

I just started exploring Elasticsearch. I need to find an approach for specifying custom scores at the field level. For example:
I have a collection named blog whose documents have the following format:
{
"_id": "1736hst26672829",
"name": "Learning regular expressions basics",
"author": "John Lee",
"summery": "Here is summery.",
"body": "Content of the blog."
}
If I search for the text 'xyz' in the collection, then the results should reflect the following scoring criteria:
A match in the name field has 1st priority.
A match in the author field has 2nd priority.
A match in the summery field has 3rd priority.
A match in the body field has the lowest priority.
I need the top 10 results on the basis of the above criteria.
Scoring in Elasticsearch is extremely customizable. The following applies to custom scoring at query time. There are various other scoring options: by index, in your mapping (and thus applied to every query), on filters or facets, using boosts or custom scoring.
While Custom Score Query is generally the most powerful solution, here are the docs for various custom scoring methods to read up on.
Boosting Query
Custom Boost Factor Query
Custom Score Query
The following is probably the simplest method to apply custom scoring at query time, although I suggest you read up on Custom Score Query.
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {"multi_match": {
              "fields": [
                "name^4",
                "author^3",
                "summery^2",
                "body^1"
              ],
              "query": "xyz",
              "operator": "AND",
              "type": "cross_fields",
              "analyzer": "standard"
            }}
          ]
        }
      }
    }
  }
}
For people who find this answer but wish to use NEST, below is the same query using NEST. Use the ^ character to boost specific fields or use OnFieldsWithBoost to give fields custom scoring. The results are sorted by score.
var query = "xyz";
// Add your field names as strings in lower camelCase, which is the ES default.
List<string> searchIn = new List<string>(new string[] {"_id","name","author","summery","body"});
// 'client' and 'Blog' are placeholders for your NEST client and mapped document type.
var result = client.Search<Blog>(s => s
    .Type("blogType")
    .SortDescending("_score")
    .Query(
        q => q.MultiMatch(
            t => t.OnFields(
                searchIn
                    .Select(qs => qs == "name" ? "name^4" : qs)
                    .Select(qs => qs == "author" ? "author^3" : qs)
                    .Select(qs => qs == "summery" ? "summery^2" : qs)
                    .Select(qs => qs == "body" ? "body" : qs)
            )
            .Query(query)
        )
    )
);
If you have the correct (default) mapping in ES (C# object to ES indexed JSON object), you can also use the following in place of OnFields:
t => t.OnFieldsWithBoost(qs => qs
    .Add(entry => entry.Name, 4.0)
    .Add(entry => entry.Author, 3.0)
    .Add(entry => entry.Summary, 2.0)
    .Add(entry => entry.Body, 1.0))
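The field^boost strings used in both versions can also be generated from a mapping of field name to weight. Below is a minimal Python sketch that builds the multi_match body this way, limited to the top 10 hits as the question asks. It uses a plain multi_match rather than the filtered wrapper, since later Elasticsearch versions removed the filtered query. The function names are made up for illustration, and only the JSON is built.

```python
def boosted_fields(boosts):
    """Turn {'name': 4} into ['name^4'], the field^boost syntax of multi_match."""
    return [f"{name}^{boost}" for name, boost in boosts.items()]


def boosted_search_body(text, boosts, size=10):
    """multi_match over weighted fields; hits come back sorted by _score by default."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": boosted_fields(boosts),
            }
        },
    }


# Weights taken from the question's priorities: name > author > summery > body.
body = boosted_search_body("xyz", {"name": 4, "author": 3, "summery": 2, "body": 1})
print(body["query"]["multi_match"]["fields"])
# ['name^4', 'author^3', 'summery^2', 'body^1']
```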

NEST: How to query against multiple indices and handle different subclasses (document types)?

I’m playing around with Elasticsearch in combination with NEST in my C# project. My use case includes several indices with different document types, which I have queried separately so far. Now I want to implement a global search function that queries all existing indices and document types and scores the results properly.
So my question: How do I accomplish that using NEST?
Currently I’m using the function SetDefaultIndex, but how can I define multiple indices?
Maybe for a better understanding, this is the query I want to build with NEST:
{
  "query": {
    "indices": {
      "indices": [
        "INDEX_A",
        "INDEX_B"
      ],
      "query": {
        "term": {
          "FIELD": "VALUE"
        }
      },
      "no_match_query": {
        "term": {
          "FIELD": "VALUE"
        }
      }
    }
  }
}
TIA
You can explicitly tell NEST to use multiple indices:
client.Search<MyObject>(s => s
    .Indices(new [] {"Index_A", "Index_B"})
    ...
)
If you want to search across all indices:
client.Search<MyObject>(s => s
    .AllIndices()
    ...
)
Or if you want to search one index (that's not the default index):
client.Search<MyObject>(s => s
    .Index("Index_A")
    ...
)
Remember that since Elasticsearch 0.19.8 you can also specify wildcards on index names:
client.Search<MyObject>(s => s
    .Index("Index_*")
    ...
)
As for your indices_query:
client.Search<MyObject>(s => s
    .AllIndices()
    .Query(q => q
        .Indices(i => i
            .Indices(new [] { "INDEX_A", "INDEX_B"})
            .Query(iq => iq.Term("FIELD","VALUE"))
            .NoMatchQuery(iq => iq.Term("FIELD", "VALUE"))
        )
    )
);
UPDATE
These tests show off how you can make C#'s covariance work for you:
https://github.com/Mpdreamz/NEST/blob/master/src/Nest.Tests.Integration/Search/SubClassSupport/SubClassSupportTests.cs
In your case, if the types are not all subclasses of a shared base, you can still use 'object', i.e.:
.Search<object>(s => s
    .Types(typeof(Product), typeof(Category), typeof(Manufacturer))
    .Query(...)
);
This will search on /yourdefaultindex/products,categories,manufacturers/_search and setup a default ConcreteTypeSelector that understands what type each returned document is.
Using ConcreteTypeSelector(Func<dynamic, Hit<dynamic>, Type>) you can manually return a type based on some json value (on dynamic) or on the hit metadata.
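On the REST side, the multi-index calls above come down to which index names end up in the request path: a comma-separated list, a wildcard pattern, or _all. A small Python sketch of that mapping follows; the helper name search_path is made up for illustration, and only the path string is built.

```python
def search_path(indices=None):
    """Build the _search path for one, several, or all indices."""
    if not indices:
        # No index given: search across every index.
        return "/_all/_search"
    # Several names are joined with commas; wildcards such as index_* pass through.
    return "/" + ",".join(indices) + "/_search"


print(search_path(["index_a", "index_b"]))  # /index_a,index_b/_search
print(search_path(["index_*"]))             # /index_*/_search
print(search_path())                        # /_all/_search
```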

Resources