Return the five documents following an id with Elasticsearch and NEST

I think I have blinded myself staring at the same error over and over again, and could really use some input. I have a time-series set of documents, and I want to find the five documents that follow a specific id. I start by fetching that single document, then fetch the next five documents, excluding that id:
var documents = client.Search<Document>(s => s
    .Query(q => q
        .ConstantScore(cs => cs
            .Filter(f => f
                .Bool(b => b
                    // must: everything at or after the start document's timestamp
                    .Must(must => must
                        .DateRange(dr => dr
                            .Field(field => field.Time)
                            .GreaterThanOrEquals(startDoc.Time)))
                    // must_not: the start document itself
                    .MustNot(mustNot => mustNot
                        .Term(term => term.Id, startDoc.Id))))))
    .Take(5)
    .Sort(sort => sort.Ascending(asc => asc.Time)))
    .Documents;
My problem is that although 5 documents are returned and sorted correctly, the start document is among them. I'm trying to filter it out with the must_not clause, but that doesn't seem to work. I'm pretty sure I have done this in other places, so it might be a small issue that I simply cannot see :)
Here's the query generated by NEST:
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "time": {
                  "gte": "2020-08-31T10:47:12.2472849Z"
                }
              }
            }
          ],
          "must_not": [
            {
              "term": {
                "id": {
                  "value": "982DBC1BE9A24F0E"
                }
              }
            }
          ]
        }
      }
    }
  },
  "size": 5,
  "sort": [
    {
      "time": {
        "order": "asc"
      }
    }
  ]
}
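For reference, the intended semantics of this query can be mimicked over an in-memory list in Python: time >= start time, exclude the start id, sort ascending, take 5. The `Doc` class and `next_five` function are hypothetical names used only for illustration. The sketch assumes the id comparison is an exact string match, which is precisely the behaviour the Elasticsearch term query does not get when `id` is analyzed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Doc:
    # Hypothetical stand-in for the question's Document type.
    id: str
    time: datetime


def next_five(docs, start, size=5):
    """Mimic the query: range time >= start.time, must_not id == start.id,
    sort by time ascending, then take `size` documents."""
    candidates = [d for d in docs if d.time >= start.time and d.id != start.id]
    candidates.sort(key=lambda d: d.time)
    return candidates[:size]


# Ten documents, one per minute.
docs = [Doc(f"D{i}", datetime(2020, 8, 31, 10, i)) for i in range(10)]
print([d.id for d in next_five(docs, docs[2])])  # ['D3', 'D4', 'D5', 'D6', 'D7']
```

Because the range uses gte, other documents that share the start document's exact timestamp are still returned. Only the start id itself is dropped.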

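This is probably happening because the id field is an analyzed field. Analyzed fields are tokenized at index time and, with the default standard analyzer, lowercased, so the indexed term is something like 982dbc1be9a24f0e. A term query is not analyzed, so it looks for the exact value 982DBC1BE9A24F0E. It matches nothing, and the must_not clause therefore excludes nothing. You mentioned in the comments that you have a non-analyzed version of the field for exact matches. Using it in your filter will fix the difference you are seeing.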
See the Elasticsearch documentation on analyzed vs. non-analyzed fields for more details.
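To illustrate the answer, here is a rough Python sketch of why the exclusion silently does nothing. It rests on two assumptions that are not details from the post:

- `id` was indexed with something like the default standard analyzer, approximated here as splitting on non-alphanumerics plus lowercasing.
- An `id.keyword` sub-field exists, as dynamic mapping creates for string values.

The helper names are hypothetical.

```python
import re


def approx_standard_analyzer(text):
    # Rough approximation of the default standard analyzer: split on
    # non-alphanumeric characters and lowercase each token.
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def term_matches(indexed_terms, value):
    # A term query is not analyzed: it looks for the exact value among the indexed terms.
    return value in indexed_terms


doc_id = "982DBC1BE9A24F0E"
text_field_terms = approx_standard_analyzer(doc_id)  # ['982dbc1be9a24f0e']
keyword_field_terms = [doc_id]                       # keyword fields store the value as-is

print(term_matches(text_field_terms, doc_id))     # False: must_not excludes nothing
print(term_matches(keyword_field_terms, doc_id))  # True: must_not excludes the start doc


def following_docs_query(start_time, start_id, size=5):
    # Same body NEST generated, but the exclusion targets the assumed id.keyword sub-field.
    return {
        "query": {"constant_score": {"filter": {"bool": {
            "must": [{"range": {"time": {"gte": start_time}}}],
            "must_not": [{"term": {"id.keyword": {"value": start_id}}}],
        }}}},
        "size": size,
        "sort": [{"time": {"order": "asc"}}],
    }
```

In NEST the same change is typically written with the `Suffix` extension, for example `.Term(term => term.Id.Suffix("keyword"), startDoc.Id)`, assuming that sub-field exists in your mapping.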

Related

Trying to filter some Elasticsearch results where the field might not exist

I have some data, and I'm trying to add an extra filter that excludes any results where foo.IsMarried == true.
There are lots of documents that don't have this field. If the field doesn't exist, I assume the value is foo.IsMarried = false, so those documents should be included in the result set.
Can anyone provide any clues, please?
I'm also using the .NET 'NEST' NuGet client library, so I'd really appreciate an answer targeting that, but I'm happy with any answer.
Generally, in Elasticsearch, a boolean field that doesn't exist in a document doesn't mean its value is false; it simply has no value indexed.
But, based on the assumption you are making in this case, we can check whether foo.isMarried is explicitly false OR does not exist in the document at all.
The query presented by Rahul in the other answer does the job. However, since you wanted a NEST version, the query can be constructed with the snippet below.
// Notice the use of "not exists" here. If you do not want to check for the 'false' value,
// you can omit the first term query. 'T' is the type to which you are mapping your index;
// pass the field based on the structure of 'T'.
private static QueryContainer BuildNotExistsQuery()
{
    var boolQuery = new QueryContainerDescriptor<T>().Bool(
        b => b.Should(
            // foo.IsMarried is explicitly false...
            s => s.Term(t => t.Field(f => f.foo.IsMarried).Value(false)),
            // ...or foo.IsMarried does not exist (! wraps the query in a bool must_not)
            s => !s.Exists(ne => ne.Field(f => f.foo.IsMarried))
        )
    );
    return boolQuery;
}
You can trigger the search through the NEST client within your project as shown below.
var result = client.Search<T>(s => s
    .From(0)
    .Size(20)
    .Query(q => BuildNotExistsQuery())
    // other methods that you want to chain go here
);
You can use a should query with the following conditions:
IsMarried = false
IsMarried must not exist
POST test/person/
{"name": "p1", "IsMarried": false}
POST test/person/
{"name": "p2", "IsMarried": true}
POST test/person/
{"name": "p3"}
Raw DSL query
POST test/person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "IsMarried": false
          }
        },
        {
          "bool": {
            "must_not": {
              "exists": {
                "field": "IsMarried"
              }
            }
          }
        }
      ]
    }
  }
}
I hope you can convert this raw DSL query to NEST!
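As a sanity check of the logic, here is a small Python sketch that builds the same should query as a plain dict and applies an in-memory equivalent to the three sample documents. The function names are hypothetical. The in-memory `matches` is an approximation of how Elasticsearch evaluates the query, not a call into it.

```python
import json


def not_married_query():
    # Mirrors the raw DSL above: should [ IsMarried is false, IsMarried does not exist ].
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"IsMarried": False}},
                    {"bool": {"must_not": {"exists": {"field": "IsMarried"}}}},
                ]
            }
        }
    }


def matches(doc):
    # In-memory equivalent: with only should clauses, at least one must match.
    # exists treats a missing field and an explicit null the same way, hence `is None`.
    value = doc.get("IsMarried")
    return value is False or value is None


people = [
    {"name": "p1", "IsMarried": False},
    {"name": "p2", "IsMarried": True},
    {"name": "p3"},
]
print(json.dumps(not_married_query()))
print([p["name"] for p in people if matches(p)])  # ['p1', 'p3']
```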

Dynamic field list for MultiMatch - Nest

We have a requirement to have a search for a document type with a variable/dynamic number of fields being queried against. For one search/type it might be Name and Status. For another, the Description field. The fields to be searched against will be chosen by the user at run time.
Doing this statically appears easy. Something like this searches the Name and Description fields (assume that rootQuery is a valid SearchDescriptor ready for the query):
rootQuery.Query(q => q.MultiMatch(mm => mm.Query(filter.Value.ToString()).Fields(f => f.Field(ff => ff.Name).Field(ff => ff.Description))));
However, if possible, we don't want to maintain a library of static queries to handle every potential permutation. We'd rather do something dynamic like:
foreach (var field in fieldsFromUser) // the list of field names chosen by the user
{
    rootQuery.Query(q => q.MultiMatch(mm => mm.Query(filter.Value.ToString()).Fields(f => f.Field(field))));
}
Is this possible? If so, how?
You can pass the list of field names directly to .Fields(...) as a string[] (if they are in a List<string>, call .ToArray() first):
var searchResponse = client.Search<Document>(s => s
    .Query(q => q
        .MultiMatch(mm => mm
            .Query("query")
            .Fields(new string[] { "field1", "field2", "field3" })
        )
    )
);
which yields
{
  "query": {
    "multi_match": {
      "fields": ["field1", "field2", "field3"],
      "query": "query"
    }
  }
}
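The same idea in plain Python: the user's field list goes straight into `multi_match.fields`, so there is no per-field loop. This matters because, with NEST's fluent descriptors, calling `.Query(...)` again in a loop would be expected to replace the previous query rather than add to it. `multi_match_query` and the sample field names are hypothetical.

```python
import json


def multi_match_query(text, fields):
    # `fields` is the list of field names the user picked at run time;
    # it goes straight into multi_match.fields, no per-field loop needed.
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


user_fields = ["name", "description"]  # hypothetical user selection
print(json.dumps(multi_match_query("query", user_fields)))
```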

Split a message using grok

I have logs in the format:
2018-09-17 15:24:34;Count of files in error folder in;C:\Scripts\FOLDER\SUBFOLDER\error;1
I want to put the folder path and the number after it into separate fields,
like:
dirTEST=C:\Scripts\FOLDER\SUBFOLDER\
count.of.error.filesTEST=1
or
dir=C:\Scripts\FOLDER\SUBFOLDER\
count.of.error.files=1
For this I use the following grok pattern in my Logstash config:
if "TestLogs" in [tags] {
  grok {
    match => { "message" => "%{DATE:date_in_log}%{SPACE}%{TIME:time.in.log};%{DATA:message.text.log};%{WINPATH:dir};%{INT:count.of.error.files}" }
    add_field => { "dirTEST" => "%{dir}" }
    add_field => { "count.of.error.filesTEST" => "%{count.of.error.files}" }
  }
}
There are no errors in the Logstash logs.
But in Kibana I only see the usual log entry, without the new fields.
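To confirm the line structure is parseable independently of Logstash, here is a rough Python regex stand-in for the grok pattern. The hand-written groups are an assumption: grok's DATE, TIME, DATA, WINPATH and INT patterns are more permissive than these. The group names use underscores instead of dots purely for illustration.

```python
import re

# Rough stand-in for the grok pattern, one named group per grok capture.
LOG_LINE = re.compile(
    r"^(?P<date_in_log>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<time_in_log>\d{2}:\d{2}:\d{2});"
    r"(?P<message_text_log>[^;]*);"
    r"(?P<dir>[A-Za-z]:\\[^;]*);"
    r"(?P<count_of_error_files>\d+)$"
)

line = r"2018-09-17 15:24:34;Count of files in error folder in;C:\Scripts\FOLDER\SUBFOLDER\error;1"
fields = LOG_LINE.match(line).groupdict()
print(fields["dir"], fields["count_of_error_files"])
```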
A couple of notes here. First of all, your pattern seems to do what you expect, so the problem is probably that your index pattern has not been updated with the new fields. To update it in Kibana, go to Management -> Kibana -> Index Patterns and click the refresh-field-list button in the upper-right corner (next to the delete-index-pattern button).
Second, keep in mind that Elasticsearch treats dotted field names as paths into nested objects, so using dots to separate the terms makes the structured data look like this:
{
  "date_in_log": "18-09-17",
  "count": {
    "of": {
      "error": {
        "files": "1"
      }
    }
  },
  "time": {
    "in": {
      "log": "15:24:34"
    }
  },
  "message": {
    "text": {
      "log": "Count of files in error folder in"
    }
  },
  "dir": "C:\\Scripts\\FOLDER\\SUBFOLDER\\error"
}
I don't know if this is how you want your data to be represented. If not, consider renaming the fields in the grok pattern, for example using underscores instead of dots.
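The nesting shown above can be reproduced with a small Python sketch that expands dotted keys into nested dicts, roughly the way Elasticsearch maps dotted field names to object fields. `expand_dotted` is a hypothetical helper. It assumes no key is both a leaf value and a prefix of another key (for example `a` and `a.b` together), which would conflict. Renamed underscore fields pass through unchanged.

```python
def expand_dotted(flat):
    """Turn {"a.b": 1} into {"a": {"b": 1}}, treating each dot as an object boundary."""
    nested = {}
    for key, value in flat.items():
        parts = key.split(".")
        node = nested
        for part in parts[:-1]:
            # Walk (or create) the intermediate objects for each dotted segment.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


print(expand_dotted({"count.of.error.files": "1", "time.in.log": "15:24:34"}))
print(expand_dotted({"count_of_error_files": "1"}))
```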

Get the first document from every Elasticsearch route

I have an Elasticsearch index whose routing key is the day, in the format "yyyyMMdd". Many new documents are added each day. At the end of the month I would like to check whether there are any days on which, for some reason, a source hasn't added a document. A source_id field identifies the source.
So far I've worked out that I need to pass all the routing keys (20160101, 20160102, etc.) and filter by source_id. But this can return hundreds of thousands of documents, which I may need to paginate through.
Is there a way to find out only whether some routing key has no matching document for the given source_id? Essentially I would return at most 31 results to my application code, so it would be easy to iterate through them and check for a day without a document.
Any ideas?
You can use a terms aggregation on the _routing field to find out which routing values have been used. See the query below:
POST <index>/<type>/_search
{
  "size": 0,
  "query": {
    "term": {
      "source_id": {
        "value": "VALUE" <-- Value of source_id to filter on
      }
    }
  },
  "aggs": {
    "routings": {
      "terms": {
        "field": "_routing",
        "size": 31 <-- We don't expect to get more than 31 unique _routing values
      }
    }
  }
}
The corresponding NEST code:
var response = client.Search<object>(s => s
    .Index("<index name>")
    .Type("<type>")
    .Query(q => q
        .Term("source_id", "<source value>"))
    .Aggregations(a => a
        .Terms("routings", t => t
            .Field("_routing")
            .Size(31))));
var routings = response.Aggs.Terms("routings").Items.Select(b => b.Key);
routings will contain the list of routing values you need. Any day of the month that is missing from that list has no document for the given source_id.
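The last step, finding the days with no bucket, can be sketched in Python. The first function builds the same request body as the DSL above as a plain dict. The second compares the returned bucket keys against every yyyyMMdd key of the month. Both function names and the sample keys are hypothetical; in practice the keys would come from the aggregation response.

```python
import calendar
from datetime import date


def routing_terms_query(source_id, max_days=31):
    # Same shape as the DSL above: no hits, one terms aggregation on _routing.
    return {
        "size": 0,
        "query": {"term": {"source_id": {"value": source_id}}},
        "aggs": {"routings": {"terms": {"field": "_routing", "size": max_days}}},
    }


def missing_days(year, month, routing_keys):
    """Return the yyyyMMdd keys of the month that have no bucket in the aggregation."""
    days = calendar.monthrange(year, month)[1]
    expected = [date(year, month, d).strftime("%Y%m%d") for d in range(1, days + 1)]
    seen = set(routing_keys)
    return [key for key in expected if key not in seen]


# Hypothetical bucket keys returned for January 2016, with the 2nd missing.
keys = ["20160101"] + [f"201601{d:02d}" for d in range(3, 32)]
print(missing_days(2016, 1, keys))  # ['20160102']
```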

Search result fluctuations

I have a bunch of collections with documents, and I have encountered something strange. When I execute the same request a few times in a row, the result changes each time.
Small fluctuations would be fine, but the count of results changes by ~75,000 documents.
So my question is: what's going on?
My request is:
POST mycollection/mytype/_search
{
  "fields": ["timestamp", "bool_field"],
  "filter": {
    "terms": {
      "bool_field": [true]
    }
  }
}
The results alternate like this:
=> 148866
=> 75381
=> 148866
=> 75381
=> 148866
=> 75381
=> 148866
When the count is 148k, I see some records with bool_field: "False" in Sense.
