Elastic Update By Query Updated Entire Index Instead - elasticsearch

I am trying to get a query to work that will update a specific field in a document, provided it matches a query (in this example, where one field matches an exact value).
Here I am trying to query all documents that have the field "Foo" set to "Bar", and set the field "TextField5" in each of them to 1337. Only a handful of documents in the index match this. However, when I run this query, every document in the index has its TextField5 updated.
POST /threat_vuln/_update_by_query
{
"query": {
"match": {
"Foo": "Bar"
}
},
"script" : {
"source" : "ctx._source.TextField5='1337';",
"lang" : "painless"
}
}
I've gone over the Update API and Update By Query API and am still missing something. How can I change this to only update documents that match the query?
I'm on Kibana 7.4.0
EDIT: Also tried this, which still updates every document in the index instead of those matching the query:
POST /threat_vuln/_update_by_query
{
"query": {
"bool" : {
"must": [
{
"match": {
"Foo": "Bar"
}
}
]
}
},
"script" : {
"source" : "ctx._source.TextField5='1337';",
"lang" : "painless"
}
}

I got this to work as intended:
POST /threat_vuln/_update_by_query
{
"query": {
"bool" : {
"must": [
{
"match": {
"Foo.keyword": "Bar"
}
}
]
}
},
"script" : {
"source" : "ctx._source.TextField5='1337';",
"lang" : "painless"
}
}
I still don't understand how/why the examples in the question would just go ahead and update everything with what now appears to be a query that should return nothing, but I digress.
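One way to guard against an update touching more documents than intended is to send the same query to the _count API first and check the number it reports. The following is a minimal sketch under assumptions: the helper name build_requests is hypothetical, the index and field names are taken from the question, and the code only builds request bodies without contacting a cluster.

```python
import json


def build_requests(index, query, script_source):
    """Return a (count_request, update_request) pair that share one query.

    Hypothetical helper: it only assembles the paths and JSON bodies.
    """
    count_request = {
        "method": "GET",
        "path": f"/{index}/_count",
        "body": {"query": query},
    }
    update_request = {
        "method": "POST",
        "path": f"/{index}/_update_by_query",
        "body": {
            "query": query,
            "script": {"source": script_source, "lang": "painless"},
        },
    }
    return count_request, update_request


# Query from the accepted answer; Foo.keyword is assumed to be a keyword subfield.
query = {"bool": {"must": [{"match": {"Foo.keyword": "Bar"}}]}}
count_req, update_req = build_requests(
    "threat_vuln", query, "ctx._source.TextField5='1337';"
)
print(json.dumps(count_req, indent=2))
print(json.dumps(update_req, indent=2))
```

If the count returned for the first request is far larger than expected, the query is matching more than intended and the update should not be sent yet.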

Related

Elasticsearch: How to filter results with a specific word in a value using elasticsearch

I need to add a parameter to my search that filters results containing a specific word in a value. The query is searching for user history records and contains a url key. I need to filter out /history and any other url containing that string.
Here's my current query:
GET /user_log/_search
{
"size" : 50,
"query": {
"match": {
"user_id": 56678
}
}
}
Here's an example of a record, boiled down to just the value we're looking at:
"_source": {
"url": "/history?page=2&direction=desc",
},
How can the parameters of the search be changed to filter out this result?
You can use the filter param of the bool query in Elasticsearch.
If your url field is of type keyword, you can use the query below (note the filter clause):
{
"query": {
"bool": {
"must": {
"match": {
"user_id": 56678
}
},
"filter": {
"term": {
"url": "/history"
}
}
}
}
}
I found a way to solve my specific issue. Instead of filtering on the url I'm filtering on a different value. Here's what I'm using now:
{
"size" : 50,
"query": {
"bool" : {
"must" : {
"match" : { "user_id" : 56678 }
},
"must_not": {
"match" : { "controller": "History" }
}
}
}
}
I'm still going to leave this question open for a while to see if anyone has other ways of solving the original problem.
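For the original problem, note that a term clause under filter keeps only documents whose url equals the given value exactly; excluding documents calls for must_not instead. The following is a minimal sketch, assuming the url field is mapped as keyword so that a wildcard pattern runs against the whole stored value. The helper name exclude_url_query is hypothetical, and the code only builds the request body.

```python
import json


def exclude_url_query(user_id, url_pattern, size=50):
    """Build a search body that keeps one user's records but drops
    any record whose url matches the wildcard pattern."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {"match": {"user_id": user_id}},
                # must_not removes matching documents from the result set.
                "must_not": {"wildcard": {"url": url_pattern}},
            }
        },
    }


# "*/history*" excludes every url that contains "/history".
# Patterns with a leading wildcard can be slow on large indices.
body = exclude_url_query(56678, "*/history*")
print(json.dumps(body, indent=2))
```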

Elasticsearch - use a field match to boost only and not to fetch the document

I have a query phrase that needs to match in any of the fields name, summary or description, or match the name field exactly.
Now I have one more new field, brand. A match in this field should be used only to boost results, meaning that if there is a match only in the brand field, the doc should not be in the result set.
To handle the case without brand, I have the query below:
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"multi_match": {
"query": "Cadbury chocklate milk",
"fields": ["name", "summary", "description"]
}
},
{
"term": {
"name_keyword": {
"value": "Cadbury chocklate milk"
}
}
}
]
}
}
}
This works fine for me.
How do I fetch the data using the same query but boost docs that have brand:cadbury, without increasing the recall set (that is, without matching on brand:cadbury alone)?
Thanks!
Using a bool inside must should work for you.
multi_match has multiple types and for phrase you have to use type:phrase.
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"multi_match": {
"type": "phrase",
"query": "Cadbury chocklate milk",
"fields": ["name", "summary", "description"]
}
},
{
"term": {
"name_keyword": {
"value": "Cadbury chocklate milk"
}
}
}
]
}
}
],
"should": {
"term": {
"brand": {
"value": "cadbury"
}
}
}
}
}
}
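The reason this works is that the must clause alone decides which documents are returned, while the outer should clause only adds to the score of documents that are already in the result set. The following toy model illustrates that split. It is a sketch, not Elasticsearch's actual scoring: the function toy_bool_search, the sample documents and the scores of 1.0 per clause are all made up for illustration.

```python
def toy_bool_search(docs, must, should):
    """Toy model of a bool query with a must clause and optional should clauses.

    must decides membership; each matching should clause adds 1.0 to the score.
    """
    hits = []
    for doc in docs:
        if not must(doc):
            continue  # a should match alone never admits a document
        score = 1.0 + sum(1.0 for clause in should if clause(doc))
        hits.append((doc["id"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


docs = [
    {"id": 1, "name": "Generic chocklate milk", "brand": "acme"},
    {"id": 2, "name": "Cadbury chocklate milk", "brand": "cadbury"},
    {"id": 3, "name": "Cadbury biscuits", "brand": "cadbury"},
]

results = toy_bool_search(
    docs,
    must=lambda d: "chocklate milk" in d["name"].lower(),
    should=[lambda d: d["brand"] == "cadbury"],
)
print(results)  # [(2, 2.0), (1, 1.0)]
```

Document 3 matches only on brand and is therefore absent, while document 2 is ranked above document 1 because of the brand match.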

ElasticSearch filter by array item

I have the following record in ES:
"authInput" : {
"uID" : "foo",
"userName" : "asdfasdfasdfasdf",
"userType" : "External",
"clientType" : "Unknown",
"authType" : "Redemption_regular",
"uIDExtensionFields" :
[
{
"key" : "IsAccountCreation",
"value" : "true"
}
],
"externalReferences" : []
}
"uIDExtensionFields" is an array of key/value pairs. I want to query ES to find all records where:
"uIDExtensionFields.key" = "IsAccountCreation"
AND "uIDExtensionFields.value" = "true"
This is the filter that I think I should be using but it never returns any data.
GET devdev/authEvent/_search
{
"size": 10,
"filter": {
"and": {
"filters": [
{
"term": {
"authInput.uIDExtensionFields.key" : "IsAccountCreation"
}
},
{
"term": {
"authInput.uIDExtensionFields.value": "true"
}
}
]
}
}
}
Any help you guys could give me would be much appreciated.
Cheers!
UPDATE: With the help of the responses below, here is how I solved my problem:
1. Lowercased the value that I was searching for (changed "IsAccountCreation" to "isaccountcreation").
2. Updated the mapping so that "uIDExtensionFields" is a nested type.
3. Updated my filter to the following:
GET devhilden/authEvent/_search
{
"size": 10,
"filter": {
"nested": {
"path": "authInput.uIDExtensionFields",
"query": {
"bool": {
"must": [
{
"term": {
"authInput.uIDExtensionFields.key": "isaccountcreation"
}
},
{
"term": {
"authInput.uIDExtensionFields.value": "true"
}
}
]
}
}
}
}
}
There are a few things probably going wrong here.
First, as mconlin points out, you probably have a mapping with the standard analyzer for your key field. It'll lowercase the key. You probably want to specify "index": "not_analyzed" for the field.
Secondly, you'll have to use nested mappings for this document structure and specify the key and the value in a nested filter. That's because otherwise, you'll get a match for the following document:
"uIDExtensionFields" : [
{
"key" : "IsAccountCreation",
"value" : "false"
},
{
"key" : "SomeOtherField",
"value" : "true"
}
]
Thirdly, you'll want to use the bool filter's must rather than the and filter, to ensure proper cacheability.
Lastly, you'll want to put your filter in the filtered-query. The top-level filter is for when you want hits to be filtered, but facets/aggregations to not be. That's why it's renamed to post_filter in 1.0.
Here are a few resources you'll want to check out:
Troubleshooting Elasticsearch searches, for Beginners covers the first two issues.
Managing Relations in ElasticSearch covers nested docs (and parent/child)
all about elasticsearch filter bitsets covers and vs. bool.
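The cross-object match described in the second point can be illustrated with a small model of how an array of objects is flattened when it is not mapped as nested. This is a sketch for intuition only: the helpers flatten, flat_match and nested_match are hypothetical, and the model ignores analysis (such as lowercasing).

```python
def flatten(objects, prefix):
    """Collapse an array of objects into one list of values per field path."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{prefix}.{key}", []).append(value)
    return flat


def flat_match(flat, conditions):
    """Each condition is checked on its own against the flattened lists."""
    return all(value in flat.get(field, []) for field, value in conditions.items())


def nested_match(objects, conditions):
    """All conditions must hold within one and the same object."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in objects
    )


fields = [
    {"key": "IsAccountCreation", "value": "false"},
    {"key": "SomeOtherField", "value": "true"},
]
flat = flatten(fields, "uIDExtensionFields")

flat_result = flat_match(
    flat,
    {"uIDExtensionFields.key": "IsAccountCreation", "uIDExtensionFields.value": "true"},
)
nested_result = nested_match(fields, {"key": "IsAccountCreation", "value": "true"})
print(flat_result)    # True: key and value come from different objects
print(nested_result)  # False: no single object has both
```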

elasticsearch filtering by the size of a field that is an array

How can I filter documents that have a field which is an array and has more than N elements?
How can I filter documents that have a field which is an empty array?
Is facets the solution? If so, how?
I would have a look at the script filter. The following filter should return only the documents that have more than 10 elements in the fieldname field, which is an array. Keep in mind that this could be expensive depending on how many documents you have in your index.
"filter" : {
"script" : {
"script" : "doc['fieldname'].values.length > 10"
}
}
Regarding the second question: do you really have an empty array there? Or is it just an array field with no value? You can use the missing filter to get documents which have no value for a specific field:
"filter" : {
"missing" : { "field" : "user" }
}
Otherwise I guess you need to use scripting again, similarly to what I suggested above, just with a different length as input. If the length is constant I'd put it in the params section so that the script will be cached by elasticsearch and reused, since it's always the same:
"filter" : {
"script" : {
"script" : "doc['fieldname'].values.length > params.param1",
"params" : {
"param1" : 10
}
}
}
javanna's answer is correct on Elasticsearch 1.3.x and earlier. Since 1.4, the default scripting language has changed to groovy (it was mvel).
To answer OP's question.
On Elasticsearch 1.3.x and earlier, use this code:
"filter" : {
"script" : {
"script" : "doc['fieldname'].values.length > 10"
}
}
On Elasticsearch 1.4.x and later, use this code:
"filter" : {
"script" : {
"script" : "doc['fieldname'].values.size() > 10"
}
}
Additionally, on Elasticsearch 1.4.3 and later, you will need to enable dynamic scripting, as it has been disabled by default because of a security issue. See: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-scripting.html
Posting here for anyone who is stuck in the same situation as I was.
Let's say your data look like this:
{
"_source": {
"fieldName" : [
{
"f1": "value 11",
"f2": "value 21"
},
{
"f1": "value 12",
"f2": "value 22"
}
]
}
}
Then to filter fieldName with length > 1 for example:
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline": "doc['fieldName.f1'].values.length > 1",
"lang": "painless"
}
}
}
}
}
The script syntax follows the ES 5.4 documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-query.html.
IMHO the correct way of filtering arrays by size using scripting is:
"filter" : {
"script" : {
"script" : "_source.fieldName.size() > 1"
}
}
If I do it as @javanna suggests, it throws the exception groovy.lang.MissingPropertyException: No such property: length for class: java.lang.String
If you have an array of objects that aren't mapped as nested, keep in mind that Elastic will flatten them into:
attachments: [{size: 123}, {size: 456}] --> attachments.size: [123, 456]
So you want to reference your field as doc['attachments.size'].length, not doc['attachments'].length, which is very counter-intuitive.
The same goes for doc.containsKey('attachments.size').
The .values part is deprecated and no longer needed.
Based on this:
https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/RegularImmutableList.java?r=707f3a276d4ea8e9d53621d137febb00cd2128da
and on lisak's answer here, there is a size() function that returns the length of the list:
"filter" : {
"script" : {
"script" : "doc['fieldname'].values.size() > 10"
}
}
The easiest way to do this is to "denormalize" your data so that you have a property that contains the count and a boolean indicating whether any elements exist. Then you can just search on those properties.
For example:
{
"id": 31939,
"hasAttachments": true,
"attachmentCount": 2,
"attachments": [
{
"type": "Attachment",
"name": "txt.txt",
"mimeType": "text/plain"
},
{
"type": "Inline",
"name": "jpg.jpg",
"mimeType": "image/jpeg"
}
]
}
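The denormalized properties have to be computed before the document is indexed. A minimal sketch of that step, assuming documents are prepared in application code; the helper name with_attachment_counts is hypothetical, while the field names follow the example above.

```python
def with_attachment_counts(doc):
    """Return a copy of doc with attachmentCount and hasAttachments filled in."""
    attachments = doc.get("attachments") or []
    enriched = dict(doc)
    enriched["attachmentCount"] = len(attachments)
    enriched["hasAttachments"] = len(attachments) > 0
    return enriched


doc = {
    "id": 31939,
    "attachments": [
        {"type": "Attachment", "name": "txt.txt", "mimeType": "text/plain"},
        {"type": "Inline", "name": "jpg.jpg", "mimeType": "image/jpeg"},
    ],
}
enriched = with_attachment_counts(doc)
empty = with_attachment_counts({"id": 1, "attachments": []})

# A plain range query on the stored count then replaces the script filter.
more_than_one = {"query": {"range": {"attachmentCount": {"gt": 1}}}}
```

The trade-off is that the count must be kept in sync whenever the array changes, in exchange for queries that need no scripting.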
When you need to find documents containing a field whose size/length is larger than zero, @javanna gave the correct answer. I only wanted to add that if your field is a text field and you want to find documents that contain some text in that field, you can't use the same query. You will need to do something like this:
GET index/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"FIELD_NAME": {
"gt": 0
}
}
}
]
}
}
}
This is not an exact answer to this question, since one already exists, but a solution to a similar problem I had, so maybe somebody will find it useful.
A suggestion about the second question:
How can I filter documents that have a field which is an empty array?
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "fieldname"
}
}
}
}
}
This returns docs with empty fieldname: [] arrays. Using must (rather than must_not) returns the opposite.
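The exists query treats a field as absent when it is missing from the source, is null, is an empty array, or is an array containing only nulls, which is why must_not exists also catches empty arrays. A rough model of that rule is sketched below; the function field_exists is hypothetical and covers only top-level fields.

```python
def field_exists(source, field):
    """Approximate whether an exists query would match this field."""
    value = source.get(field)
    if value is None:
        return False  # missing or explicit null
    if isinstance(value, list):
        # [] and [None] do not count; one non-null element is enough
        return any(item is not None for item in value)
    return True  # any other value counts, including an empty string


samples = [
    {"fieldname": []},
    {"fieldname": None},
    {},
    {"fieldname": [None]},
    {"fieldname": ["a"]},
    {"fieldname": ""},
]
print([field_exists(s, "fieldname") for s in samples])
# [False, False, False, False, True, True]
```

So the must_not query above cannot tell an empty array apart from a missing or null field.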
Here is what worked for me:
GET index/_search
{
"query": {
"bool": {
"filter" : {
"script" : {
"script" : "doc['FieldName'].length > 10"
}
}
}
}
}
For version 7+:
"filter": {
"script": {
"script": {
"source": "doc['fieldName.keyword'].length > 10",
"lang": "painless"
}
}
}
Ref. https://medium.com/@felipegirotti/elasticsearch-filter-field-array-more-than-zero-8d52d067d3a0

Elasticsearch DSL query from an SQL statement

I'm new to Elasticsearch. I don't think I fully understand the concept of queries and filters. In my case I just want to use filters, as I don't want to use advanced features like scoring.
How would I convert the following SQL statement into elasticsearch query?
SELECT * FROM advertiser
WHERE company like '%com%'
AND sales_rep IN (1,2)
What I have so far:
curl -XGET 'localhost:9200/advertisers/advertiser/_search?pretty=true' -d '
{
"query" : {
"bool" : {
"must" : {
"wildcard" : { "company" : "*com*" }
}
}
},
"size":1000000
}'
How do I add the OR filters on the sales_rep field?
Thanks
Add a "should" clause after your must clause. Note that when a bool query also contains a must clause, should clauses are optional by default, so set "minimum_should_match" (called "minimum_number_should_match" in older versions) to 1 to require at least one of them to match. Check out the bool query docs.
For your case, this should work.
"should" : [
{
"term" : { "sales_rep_id" : "1" }
},
{
"term" : { "sales_rep_id" : "2" }
}
],
The same concept works for bool filters. Just change "query" to "filter". The bool filter docs are here.
I came across this post 4 years too late...
Anyways, perhaps the following code could be useful...
{
"query": {
"filtered": {
"query": {
"wildcard": {
"company": "*com*"
}
},
"filter": {
"bool": {
"should": [
{
"terms": {
"sales_rep_id": [ "1", "2" ]
}
}
]
}
}
}
}
}
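The filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0; on later versions the same intent is expressed with the filter clause of a bool query, which also skips scoring. A minimal sketch that builds such a body for the SQL statement in the question; the helper name advertiser_query is hypothetical, and the field name sales_rep_id follows the answers rather than a known mapping.

```python
import json


def advertiser_query(company_pattern, sales_reps, rep_field="sales_rep_id"):
    """Build a filter-only bool query:
    company LIKE pattern AND rep_field IN (sales_reps)."""
    return {
        "query": {
            "bool": {
                # Every clause under filter must match; none affects scoring.
                "filter": [
                    {"wildcard": {"company": company_pattern}},
                    {"terms": {rep_field: list(sales_reps)}},
                ]
            }
        }
    }


body = advertiser_query("*com*", [1, 2])
print(json.dumps(body, indent=2))
```

The terms clause plays the role of SQL's IN, so no separate should clauses are needed for the two sales rep values.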
