Elasticsearch C# NEST not match other field value - elasticsearch

I'm trying to build a query with Elasticsearch C# NEST library.
I have a table in SQL; let's say it is called Mail. I need a query that makes sure one field does not equal another.
In SQL:
SELECT * FROM MAILS
WHERE Field_A != Field_B
How should I do it with C# NEST?

Elasticsearch is not intended for this type of functionality, so you may be better served by setting up your project in a way that handles this more efficiently. However, there are some tools that let you shoehorn in this form of query.
While the basic query syntax doesn't cover comparing fields, scripts let you work around this.
Here is an example of a script using NEST:
.Query(q => q
    .Filtered(f => f
        .Query(fq => fq.MatchAll())
        .Filter(s => s.Script(sf => sf.Script("doc['mails.field_A'].value == doc['mails.field_B'].value")))
    )
)
Here is an example of a script without using NEST:
{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc['mails.field_A'].value == doc['mails.field_B'].value"
        }
      }
    }
  }
}
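This works only if script.disable_dynamic is set to false, which enables dynamic scripting and can expose the cluster to security risks. Both examples test for equality; use != in the script to express the inequality from the question.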
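If the request body is assembled in code, the operator can be made a parameter so that the != case from the question is easy to express. The following is only a minimal Python sketch and not part of the original answer: the helper name build_field_comparison_query is hypothetical, and the body simply mirrors the legacy filtered/script structure shown above.

```python
import json

def build_field_comparison_query(field_a, field_b, operator="!="):
    """Build the legacy filtered/script body shown above (hypothetical helper)."""
    if operator not in ("==", "!="):
        raise ValueError("operator must be '==' or '!='")
    # The script compares the two doc values with the chosen operator.
    script = "doc['%s'].value %s doc['%s'].value" % (field_a, operator, field_b)
    return {"query": {"filtered": {"filter": {"script": {"script": script}}}}}

body = build_field_comparison_query("mails.field_A", "mails.field_B")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent with whatever HTTP or Elasticsearch client the project already uses.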

Related

Query to see if a field contains a string using Query DSL

I am trying to filter Kibana for a field that contains the string "pH". The field is called extra.monitor_value_name. Examples of potential values are Temperature_ABC01, DO_ABC01, or pH_ABC01.
Kibana's Elasticsearch Query DSL does not seem to have a "contains string" option, so I need to write a custom query.
I am new to Query DSL, can you help me create the query?
Also, is it proper to call it Query DSL? I'm not even sure of proper wording.
Okay! Circling back with an answer to my own question.
My initial problem stemmed from not knowing about field_name vs field_name.keyword. For information on keyword, read: What's the difference between the 'field' and 'field.keyword' fields in Kibana?
Solution 1
Here's the query I ended up using. I used a regexp query. I found this article useful in figuring out syntax for the regexp:
{
  "query": {
    "regexp": {
      "extra.monitor_value_name.keyword": "pH.*"
    }
  }
}
Solution 2
Another way I could have filtered, without Query DSL, was to type this into the search field: extra.monitor_value_name.keyword:pH*.
One interesting thing to note was the .keyword doesn't seem to be necessary with this method. I am not sure why.
Try this in the filter, using Elasticsearch Query DSL (note that wildcard patterns use * rather than the regexp form .*):
{
  "query": {
    "wildcard": {
      "extra.monitor_value_name": {
        "value": "pH*"
      }
    }
  }
}
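The regexp and wildcard queries use different pattern syntaxes: in a regexp, ".*" means "any characters", while in a wildcard pattern "*" alone does that and "." is a literal dot. The following Python sketch approximates the difference locally with re.fullmatch and fnmatch.fnmatchcase. These are stand-ins chosen for illustration, not the engines Elasticsearch uses, and the sample values are the ones from the question.

```python
import re
from fnmatch import fnmatchcase

values = ["Temperature_ABC01", "DO_ABC01", "pH_ABC01"]

# Regexp-style pattern: ".*" means "any characters"; the whole value must match.
regexp_hits = [v for v in values if re.fullmatch(r"pH.*", v)]

# Wildcard-style pattern: "*" alone means "any characters".
wildcard_hits = [v for v in values if fnmatchcase(v, "pH*")]

# Regexp syntax inside a wildcard pattern: "." is taken as a literal dot.
mixed_hits = [v for v in values if fnmatchcase(v, "pH.*")]

print(regexp_hits, wildcard_hits, mixed_hits)
```

Only the first two patterns find pH_ABC01; the mixed form would need a value that actually starts with "pH." to match.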

Not able to understand this Elasticsearch query

{
  "query": {
    "nested": {
      "path": "product_vendors",
      "query": {
        "bool": {
          "must": {
            "bool": {
              "should": [
                { "terms": { "product_vendors.manufacturer_style": ["FSS235D-26", "SG463-1128-5", "SG463-2879-4"] } },
                { "terms": { "product_vendors.id": ["71320"] } }
              ]
            }
          }
        }
      }
    }
  }
}
I have the Elasticsearch query above, but I am not able to understand it. Would anyone please explain what it means and what documents it will return?
Update: @christinabo, I tried your query and it returned results, but there is a small issue. Apart from the matched documents, two additional documents are returned in which only the vendor id matches. Why are these two extra documents returned? Do we need to add some attribute to enforce a strict match? Please advise.
By observing the query, I can understand that there is a nested object in the data. I can imagine that it has this structure:
product_vendors: {
  'id': 'the_id',
  'manufacturer_style': 'some style'
}
In order to query a nested object, you need a nested query. This is why you have the nested keyword there. In a nested query, you need to specify the path (product_vendors) that leads to the embedded fields (id, manufacturer_style).
Then the query defines a bool query with the must keyword, which means that the clause that follows must match in every returned document. In this case, what must match is another bool query, defined with the should keyword. It contains two terms sub-queries (one for manufacturer_style and one for id), and a matching document has to satisfy at least one of them. Each sub-query addresses the embedded field by its full path within the nested object, using dot notation (e.g. product_vendors.manufacturer_style).
I would expect the query to return the documents that match at least one of the terms queries, with documents that match both receiving a higher score.
I hope that this explanation gives you an overall idea of this query.
More about bool queries from the documentation here.
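The matching logic described above can be mimicked in a few lines of Python. This is only a local sketch of the semantics (no Elasticsearch involved, scoring ignored); the function names and the three sample documents are invented for illustration.

```python
STYLES = {"FSS235D-26", "SG463-1128-5", "SG463-2879-4"}
VENDOR_IDS = {"71320"}

def vendor_matches(vendor):
    """'should' with two terms clauses: at least one clause has to match."""
    return (vendor.get("manufacturer_style") in STYLES
            or vendor.get("id") in VENDOR_IDS)

def document_matches(doc):
    """'nested': the document matches if any single nested object matches."""
    return any(vendor_matches(v) for v in doc.get("product_vendors", []))

# Hypothetical sample documents, invented for illustration.
docs = [
    {"name": "both", "product_vendors": [{"id": "71320", "manufacturer_style": "FSS235D-26"}]},
    {"name": "id only", "product_vendors": [{"id": "71320", "manufacturer_style": "OTHER"}]},
    {"name": "neither", "product_vendors": [{"id": "1", "manufacturer_style": "OTHER"}]},
]
matched = [d["name"] for d in docs if document_matches(d)]
print(matched)
```

The "id only" document is returned as well, which is consistent with the extra results mentioned in the update: should behaves like an OR. Requiring both conditions would correspond to replacing or with and here, that is, listing both terms clauses under must in the query.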

Logstash -> Elasticsearch - update denormalized data

Use case explanation
We have a relational database with data about our day-to-day operations. The goal is to allow users to search the important data with a full-text search engine. The data is normalized and thus not in the best form to make full-text queries, so the idea was to denormalize a subset of the data and copy it in real-time to Elasticsearch, which allows us to create a fast and accurate search application.
We already have a system in place that enables Event Sourcing of our database operations (inserts, updates, deletes). The events contain only the changed columns and primary keys (on an update we don't get the whole row). Logstash already gets notified of each event, so this part is already handled.
Actual problem
Now we are getting to our problem. Since the plan is to denormalize our data we will have to make sure updates on parent objects are propagated to the denormalized child objects in Elasticsearch. How can we configure logstash to do this?
Example
Let's say we maintain a list of Employees in Elasticsearch. Each Employee is assigned to a Company. Since the data is denormalized (for the purpose of faster search), each Employee also carries the name and address of the Company. An update changes the name of a Company. How can we configure Logstash to update the company name in all Employees assigned to that Company?
Additional explanation
@Darth_Vader:
The problem we are facing is that we get an event that a Company has changed, but we want to modify documents of type Employee in Elasticsearch, because they carry the company data themselves. Your answer assumes that we will get an event for every Employee, which is not the case.
Maybe this will make it clearer. We have 3 employees in Elasticsearch:
{type:'employee',id:'1',name:'Person 1',company.cmp_id:'1',company.name:'Company A'}
{type:'employee',id:'2',name:'Person 2',company.cmp_id:'1',company.name:'Company A'}
{type:'employee',id:'3',name:'Person 3',company.cmp_id:'2',company.name:'Company B'}
Then an update happens in the source DB.
UPDATE company SET name = 'Company NEW' WHERE cmp_id = 1;
We get an event in logstash, where it says something like this:
{type:'company',cmp_id:'1',old.name:'Company A',new.name:'Company NEW'}
This should then be propagated to Elasticsearch, so that the resulting employees are:
{type:'employee',id:'1',name:'Person 1',company.cmp_id:'1',company.name:'Company NEW'}
{type:'employee',id:'2',name:'Person 2',company.cmp_id:'1',company.name:'Company NEW'}
{type:'employee',id:'3',name:'Person 3',company.cmp_id:'2',company.name:'Company B'}
Notice that the field company.name changed.
I suggest a similar solution to what I've posted here, i.e. to use the http output plugin in order to issue an update by query call to the Employee index. The query would need to look like this:
POST employees/_update_by_query
{
  "script": {
    "source": "ctx._source.company.name = params.name",
    "lang": "painless",
    "params": {
      "name": "Company NEW"
    }
  },
  "query": {
    "term": {
      "company.cmp_id": "1"
    }
  }
}
So your Logstash config should look like this:
input {
  ...
}
filter {
  mutate {
    add_field => {
      "[script][lang]" => "painless"
      "[script][source]" => "ctx._source.company.name = params.name"
      "[script][params][name]" => "%{new.name}"
      "[query][term][company.cmp_id]" => "%{cmp_id}"
    }
    remove_field => ["host", "@version", "@timestamp", "type", "cmp_id", "old.name", "new.name"]
  }
}
output {
  http {
    url => "http://localhost:9200/employees/_update_by_query"
    http_method => "post"
    format => "json"
  }
}
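To make the mapping from event to request body explicit, here is a small Python sketch of the same translation. It is not part of the Logstash setup: build_update_by_query and apply_locally are hypothetical helper names, and apply_locally only imitates the effect of the update on an in-memory list so the expected outcome from the question can be checked without a cluster.

```python
def build_update_by_query(event):
    """Translate a company-change event into an _update_by_query body."""
    return {
        "script": {
            "source": "ctx._source.company.name = params.name",
            "lang": "painless",
            "params": {"name": event["new.name"]},
        },
        "query": {"term": {"company.cmp_id": event["cmp_id"]}},
    }

def apply_locally(employees, body):
    """Local stand-in for what the update would do (no Elasticsearch involved)."""
    cmp_id = body["query"]["term"]["company.cmp_id"]
    new_name = body["script"]["params"]["name"]
    for emp in employees:
        if emp["company"]["cmp_id"] == cmp_id:
            emp["company"]["name"] = new_name
    return employees

event = {"type": "company", "cmp_id": "1", "old.name": "Company A", "new.name": "Company NEW"}
employees = [
    {"id": "1", "name": "Person 1", "company": {"cmp_id": "1", "name": "Company A"}},
    {"id": "2", "name": "Person 2", "company": {"cmp_id": "1", "name": "Company A"}},
    {"id": "3", "name": "Person 3", "company": {"cmp_id": "2", "name": "Company B"}},
]
body = build_update_by_query(event)
updated = apply_locally(employees, body)
print([e["company"]["name"] for e in updated])
```

Only the two employees of company 1 change, which matches the expected result listed in the question.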

Exclude setting on integer field in term query

My documents contain an integer array field, storing the id of tags describing them. Given a specific tag id, I want to extract a list of top tags that occur most frequently together with the provided one.
I can solve this problem by combining a terms aggregation over the tag id field with a term filter over the same field, but the list I get back obviously always starts with the tag id I provide: all documents matching my filter have that tag, so it is the first in the list.
I thought of using the exclude setting to avoid creating the problematic bucket, but as I'm dealing with an integer field, that does not seem to be possible. This query
{
  "size": 0,
  "query": {
    "term": {
      "tag_ids": "00001"
    }
  },
  "aggs": {
    "tags": {
      "terms": {
        "size": 3,
        "field": "tag_ids",
        "exclude": "00001"
      }
    }
  }
}
returns an error saying that Aggregation [tags] cannot support the include/exclude settings as it can only be applied to string values.
Is it possible to avoid getting back this bucket?
This is, as of Elasticsearch 1.4, a shortcoming of ES itself.
After the community proposed this change, the functionality has been added and will be included in Elasticsearch 1.5.0.
It's supposed to be fixed since version 1.5.0.
Look at this: https://github.com/elasticsearch/elasticsearch/pull/7727
While the fix is en route, my workaround is to have the aggregation use a script instead of direct access to the field, and let that script treat the value as a string.
Works well and without measurable performance loss.
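Another option, not mentioned in the answers above, is to handle this on the client: request one bucket more than needed and drop the bucket of the queried tag from the response. The Python sketch below shows only that post-processing step; the function name top_cooccurring and the bucket data are hypothetical, shaped like the buckets of a terms aggregation.

```python
def top_cooccurring(buckets, excluded_key, size):
    """Drop the bucket of the queried tag and keep the first `size` others."""
    kept = [b for b in buckets if b["key"] != excluded_key]
    return kept[:size]

# Hypothetical terms-aggregation buckets, as if requested with size = 3 + 1.
buckets = [
    {"key": 1, "doc_count": 40},
    {"key": 7, "doc_count": 12},
    {"key": 3, "doc_count": 9},
    {"key": 5, "doc_count": 4},
]
top = top_cooccurring(buckets, excluded_key=1, size=3)
print([b["key"] for b in top])
```

This avoids scripting entirely, at the cost of fetching one extra bucket per request.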

Can elasticsearch return multiple value fields in a single facet?

I am looking for a way to create a facet such that I can essentially return two values for one key.
For instance, I am attempting to retrieve both an amount and schedule properties of an object. I attempted to use a computed value script, but the calculations that have to be done using the two objects are date based, and require an external library to perform them.
Basically, something along the lines of:
"theFacet": {
  "terms_stats": {
    "key_field": "someKeyProbablyADate",
    "value_field": "amount",
    "value_field": "simpleSchedule"
  }
}
Workarounds are also appreciated. Perhaps some way to return a new dynamic object with both fields?
Sounds like you want to pre-process your data before you index it into a single field, then facet on that.
Something along the lines of a single string containing key#amount#schedule.
Then when you get the faceting results back you can split it up again and run whatever logic you want.
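The combine-then-split idea can be sketched in Python as follows. The function names, the separator constant, and the sample values are assumptions made for illustration, and the sketch presumes that the key and amount never contain the separator character.

```python
SEPARATOR = "#"

def combine(key, amount, schedule):
    """Build the single string to index, in the form key#amount#schedule."""
    return SEPARATOR.join([str(key), str(amount), str(schedule)])

def split(combined):
    """Recover the three parts from a facet term."""
    key, amount, schedule = combined.split(SEPARATOR, 2)
    return key, float(amount), schedule

# Hypothetical sample values.
term = combine("2014-01-15", 250.0, "monthly")
print(term)
print(split(term))
```

Once the parts are split out again on the client, any date-based calculation can use whatever external library is needed, since it no longer has to run inside Elasticsearch.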
Try combining different fields with a script element. For example:
"facets": {
  "facet-name": {
    "terms": {
      "field": "some-field",
      "script": "_source['another-field'] + '/' + term"
    }
  }
}
