query must match 2 fields exactly, don't analyze - elasticsearch

I tried a few different ways of doing a simple get request, filtering on two different attributes, example:
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"email": "erik.landvall#example.com"
}
},
{
"term": {
"password": "bb3810356e9b60cf6..."
}
}
]
}
},
"query": {
"match_all": []
}
}
}
The problem is that I get nothing back in return. As I understand it, this is because ElasticSearch analyzes the email field, making the query fail. So if I however would use the term erik.landvall instead of the complete email address, it will match the document - which confirms that's what's going on.
I can define the attribute as type:string and index:not_analyzed when I create the index. But what if I wanna be able to search on the email attribute in a different context? So there should, to my mind, be a way to specify that I wanna filter on the actual value of the attribute in a query. I can however not find how such a query would look.
Is it possible to force Elasticsearch to use "not_analyze" when querying? If so, then how?

You can use scripting for this purpose. You would have to directly access the JSON you have stored with _source. Try following query
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"inline" : "_source.email==param1 && _source.password==param2",
"params" : {
"param1" : "erik.landvall#example.com",
"param2" : "bb3810356e9b60cf6"
}
}
}
}
}
}
}
You would need to enable dynamic scripting. Add script.inline: on to your yml file and restart the node.
If this kind of query is fairly regular then It would be much better to reindex the data as others have suggested in the comments.

Its not possible to turn on/off analyzed or not, the way to do it to "transform" your field to analysis you need by using fields.
curl -XPUT 'localhost:9200/my_index?pretty' -d'
{
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
curl -XPUT 'localhost:9200/my_index/my_type/1?pretty' -d'
{
"city": "New York"
}'
curl -XPUT 'localhost:9200/my_index/my_type/2?pretty' -d'
{
"city": "York"
}'
curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
"query": {
"match": {
"city": "york"
}
},
"sort": {
"city.raw": "asc"
},
"aggs": {
"Cities": {
"terms": {
"field": "city.raw"
}
}
}
}'

Related

elasticsearch aggregation on field containing spaces

I have a field that contains spaces called "CompanyName". The CompanyName field contains things like, "ABC Client", "BCD CLIENT 123", "EFG CLIENT HIJ"
When I index the data I set the mapping to "index" : "not_analyzed". When I run an aggregation, without any other queries, it appears to work fine.
The issue I have is that if I want to first run another query and then get an aggregation of those results, the aggregation then breaks because it interprets the spaces in the company names, so it looks like the aggregation is run over the output of the first query and not over the field that I setup.
The query:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"Stuff": "1"
}
},
{
"term": {
"filename": "FileOfData.sourcedata"
}
}
]
}
}
}
},
"aggs": {
"users": {
"terms": {
"field": "CompanyName"
}
}
}
}
I have also tried adding a custom analyzer using:
"analysis": {
"analyzer": {
"companynamestring": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
And it is still not working. Does anyone know how I can run a query and then get an aggregation that returns only the full CompanyName field and is not tokenized?
Thanks!

Exact match in elastic search query

I want to exactly match the string ":Feed:" in a message field and go back a day pull all such records. The json I have seems to also match the plain word " feed ". I am not sure where I am going wrong. Do I need to add "constant_score" to this query JSON? The JSON I have currently is as shown below:
{
"query": {
"bool": {
"must": {
"query_string": {
"fields": ["message"],
"query": "\\:Feed\\:"
}
},
"must": {
"range": {
"timestamp": {
"gte": "now-1d",
"lte": "now"
}
}
}
}
}
}
As stated here: Finding Exact Values, since the field has been analyzed when indexed - you have no way of exact-matching its tokens (":"). Whenever the tokens should be searchable the mapping should be "not_analyzed" and the data needs to be re-indexed.
If you want to be able to easily match only ":feed:" inside the message field you might want to costumize an analyzer which doesn't tokenize ":" so you will be able to query the field with a simple "match" query instead of wild characters.
Not able to do this with query_string but managed to do so by creating a custom normalizer and then using a "match" or "term" query.
The following steps worked for me.
create a custom normalizer (available >V5.2)
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"filter": ["lowercase"]
}
}
}
}
Create a mapping with type "keyword"
{
"mappings": {
"default": {
"properties": {
"title": {
"type": "text",
"fields": {
"normalize": {
"type": "keyword",
"normalizer": "my_normalizer"
},
"keyword" : {
"type": "keyword"
}
}
}
}
}
}
use match or term query
{
"query": {
"bool": {
"must": [
{
"match": {
"title.normalize": "string to match"
}
}
]
}
}
}
Use match phrase
GET /_search
{
"query": {
"match_phrase": {
"message": "7000-8900"
}
}
}
In java use matchPhraseQuery of QueryBuilder
QueryBuilders.matchPhraseQuery(fieldName, searchText);
Simple & Sweet Soln:
use term query..
GET /_search
{
"query": {
"term": {
"message.keyword": "7000-8900"
}
}
}
use term query instead of match_phrase,
match_phrase this find/match with ES-document stored sentence, It will not exactly match. It matches with those sentence words!

Find documents with empty string value on elasticsearch

I've been trying to filter with elasticsearch only those documents that contains an empty string in its body. So far I'm having no luck.
Before I go on, I should mention that I've already tried the many "solutions" spread around the Interwebz and StackOverflow.
So, below is the query that I'm trying to run, followed by its counterparts:
{
"query": {
"filtered":{
"filter": {
"bool": {
"must_not": [
{
"missing":{
"field":"_textContent"
}
}
]
}
}
}
}
}
I've also tried the following:
{
"query": {
"filtered":{
"filter": {
"bool": {
"must_not": [
{
"missing":{
"field":"_textContent",
"existence":true,
"null_value":true
}
}
]
}
}
}
}
}
And the following:
{
"query": {
"filtered":{
"filter": {
"missing": {"field": "_textContent"}
}
}
}
}
None of the above worked. I get an empty result set when I know for sure that there are records that contains an empty string field.
If anyone can provide me with any help at all, I'll be very grateful.
Thanks!
If you are using the default analyzer (standard) there is nothing for it to analyze if it is an empty string. So you need to index the field verbatim (not analyzed). Here is an example:
Add a mapping that will index the field untokenized, if you need a tokenized copy of the field indexed as well you can use a Multi Field type.
PUT http://localhost:9200/test/_mapping/demo
{
"demo": {
"properties": {
"_content": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
Next, index a couple of documents.
/POST http://localhost:9200/test/demo/1/
{
"_content": ""
}
/POST http://localhost:9200/test/demo/2
{
"_content": "some content"
}
Execute a search:
POST http://localhost:9200/test/demo/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"_content": ""
}
}
}
}
}
Returns the document with the empty string.
{
took: 2,
timed_out: false,
_shards: {
total: 5,
successful: 5,
failed: 0
},
hits: {
total: 1,
max_score: 0.30685282,
hits: [
{
_index: test,
_type: demo,
_id: 1,
_score: 0.30685282,
_source: {
_content: ""
}
}
]
}
}
Found solution here https://github.com/elastic/elasticsearch/issues/7515
It works without reindex.
PUT t/t/1
{
"textContent": ""
}
PUT t/t/2
{
"textContent": "foo"
}
GET t/t/_search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "textContent"
}
}
],
"must_not": [
{
"wildcard": {
"textContent": "*"
}
}
]
}
}
}
Even with the default analyzer you can do this kind of search: use a script filter, which is slower but can handle the empty string:
curl -XPOST 'http://localhost:9200/test/demo/_search' -d '
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "_source._content.length() == 0"
}
}
}
}
}'
It will return the document with empty string as _content without a special mapping
As pointed by #js_gandalf, this is deprecated for ES>5.0. Instead you should use: query->bool->filter->script as in https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
For those of you using elastic search 5.2 or above, and still stuck. Easiest way is to reindex your data correctly with the keyword type. Then all the searches for empty values worked. Like this:
"query": {
"term": {"MY_FIELD_TO_SEARCH": ""}
}
Actually, when I reindex my database and rerun the query. It worked =)
The problem was that my field was type: text and NOT a keyword. Changed the index to keyword and reindexed:
curl -X PUT https://username:password#host.io:9200/mycoolindex
curl -X PUT https://user:pass#host.io:9200/mycoolindex/_mapping/mycooltype -d '{
"properties": {
"MY_FIELD_TO_SEARCH": {
"type": "keyword"
},
}'
curl -X PUT https://username:password#host.io:9200/_reindex -d '{
"source": {
"index": "oldindex"
},
"dest": {
"index": "mycoolindex"
}
}'
I hope this helps someone who was as stuck as I was finding those empty values.
OR using lucene query string syntax
q=yourfield.keyword:""
See Elastic Search Reference https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl-query-string-query.html#query-string-syntax
in order to find the empty string of one field in your document, it's highly relevant to the field's mapping, in other word, its index/analyzer setting .
If its index is not_analyzed, which means the token is just the empty string, you can just use term query to find it, as follows:
{"from": 0, "size": 100, "query":{"term": {"name":""}}}
Otherwise, if the index setting is analyzed and I believe most analyzer will treat empty string as null value So
you can use the filter to find the empty string.
{"filter": {"missing": {"existence": true, "field": "name", "null_value": true}}, "query": {"match_all": {}}}
here is the gist script you can reference: https://gist.github.com/hxuanji/35b982b86b3601cb5571
BTW, I check the commands you provided, it seems you DON'T want the empty string document.
And all my above command are just to find these, so just put it into must_not part of bool query would be fine.
My ES is 1.0.1.
For ES 1.3.0, currently the gist I provided cannot find the empty string. It seems it has been reported: https://github.com/elasticsearch/elasticsearch/issues/7348 . Let's wait and see how it go.
Anyway, it also provides another command to find
{ "query": {
"filtered": {
"filter": {
"not": {
"filter": {
"range": {
"name": {
}
}
}
}
}
} } }
name is the field name to find the empty-string. I've tested it on ES 1.3.2.
I'm using Elasticsearch 5.3 and was having trouble with some of the above answers.
The following body worked for me.
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline": "doc['city'].empty",
"lang": "painless"
}
}
}
}
}
}
Note: you might need to enable the fielddata for text fields, it is disabled by default. Although I would read this: https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html before doing so.
To enable the fielddata for a field e.g. 'city' on index 'business' with type name 'record' you need:
PUT business/_mapping/record
{
"properties": {
"city": {
"type": "text",
"fielddata": true
}
}
}
If you don't want to or can't re-index there is another way. :-)
You can use the negation operator and a wildcard to match any non-blank string *
GET /my_index/_search?q=!(fieldToLookFor:*)
For nested fields use:
curl -XGET "http://localhost:9200/city/_search?pretty=true" -d '{
"query" : {
"nested" : {
"path" : "country",
"score_mode" : "avg",
"query" : {
"bool": {
"must_not": {
"exists": {
"field": "country.name"
}
}
}
}
}
}
}'
NOTE: path and field together constitute for search. Change as required for you to work.
For regular fields:
curl -XGET 'http://localhost:9200/city/_search?pretty=true' -d'{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "name"
}
}
}
}
}'
I didn't manage to search for empty strings in a text field. However it seems to work with a field of type keyword. So I suggest the following:
delete /test_idx
put test_idx
{
"mappings" : {
"testMapping": {
"properties" : {
"tag" : {"type":"text"},
"content" : {"type":"text",
"fields" : {
"x" : {"type" : "keyword"}
}
}
}
}
}
}
put /test_idx/testMapping/1
{
"tag": "null"
}
put /test_idx/testMapping/2
{
"tag": "empty",
"content": ""
}
GET /test_idx/testMapping/_search
{
"query" : {
"match" : {"content.x" : ""}}}
}
}
You need to trigger the keyword indexer by adding .content to your field name. Depending on how the original index was set up, the following "just works" for me using AWS ElasticSearch v6.x.
GET /my_idx/_search?q=my_field.content:""
I am trying to find the empty fields (in indexes with dynamic mapping) and set them to a default value and the below worked for me
Note this is in elastic 7.x
POST <index_name|pattern>/_update_by_query
{
"script": {
"lang": "painless",
"source": """
if (ctx._source.<field name>== "") {
ctx._source.<field_name>= "0";
} else {
ctx.op = "noop";
}
"""
}
}
I followed one of the responses from the thread and came up with below it will do the same
GET index_pattern*/_update_by_query
{
"script": {
"source": "ctx._source.field_name='0'",
"lang": "painless"
},
"query": {
"bool": {
"must": [
{
"exists": {
"field": "field_name"
}
}
],
"must_not": [
{
"wildcard": {
"field_name": "*"
}
}
]
}
}
}
I am also trying to find the documents in the index that dont have the field and add them with a value
one of the responses from this thread helped me to come up with below
GET index_pattern*/_update_by_query
{
"script": {
"source": "ctx._source.field_name='0'",
"lang": "painless"
},
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "field_name"
}
}
]
}
}
}
Thanks to every one who contributed to this thread I am able to solve my problem

ElasticSearch Script Score Using Field Value

ElasticSearch 1.2.1
I am trying to query documents using weighted tags.
curl -X PUT 'http://localhost:9200/test'
curl -X PUT 'http://localhost:9200/test/thing/_mapping' - '{
"thing": {
"properties": {
"tags": {
"type": "nested",
"properties": {
"name": { "type": "string" },
"weight": { "type": "integer" }
}
}
}
}}'
Adding a document:
curl -X PUT 'http://localhost:9200/test/thing/1', -d '{
"tags": [
{ "name": "a", "weight": 2 }
]
}'
Now I am searching for documents having a tag a and boost the score based on the weight.
Note: to run these examples you have to enable dynamic scripting in ElasticSearch: add script.disable_dynamic: false to elasticsearch.yml
curl -X GET 'http://localhost:9200/test/thing/_search?pretty' -d '{
"query": {
"function_score": {
"boost_mode": "replace",
"query": {
"match_all": {}
},
"functions": [
{
"filter": {
"nested": {
"path": "tags",
"filter": {
"term": {
"tags.name": "a"
}
}
}
},
"script_score": {
"script": "doc.weight.value"
}
}
]
}
}
}'
The document is found, as expected, however the score is 0. It seems as if the property doc.weight was empty.
Let's test this by replacing the script with doc.weight.empty ? 50 : 100. The hit now has a score of 50 indicating that the field doc.weight is empty. It is found though, because using a non-existant field name (e.g. doc.foobar) gives an error.
Background: The match_all part would be replaced by a real query. I want to use the tags to boost results matching the tags before the ones not matching the tags.
What am I missing?

I don't get any documents back from my elasticsearch query. Can someone point out my mistake?

I thought I had figured out Elasticsearch but I suspect I have failed to grok something, and hence this problem:
I am indexing products, which have a huge number of fields, but the ones in question are:
{
"show_in_catalogue": {
"type": "boolean",
"index": "no"
},
"prices": {
"type": "object",
"dynamic": false,
"properties": {
"site_id": {
"type": "integer",
"index": "no"
},
"currency": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "float"
},
"gross_tax": {
"type": "integer",
"index": "no"
}
}
}
}
I am trying to return all documents where "show_in_catalogue" is true, and there is a price with site_id 1:
{
"filter": {
"term": {
"prices.site_id": "1",
"show_in_catalogue": true
}
},
"query": {
"match_all": {}
}
}
This returns zero results. I also tried an "and" filter with two separate terms - no luck.
A subset of one of the documents returned if I have no filters looks like:
{
"prices": [
{
"site_id": 1,
"currency": "GBP",
"value": 595,
"gross_tax": 1
},
{
"site_id": 2,
"currency": "USD",
"value": 745,
"gross_tax": 0
}
]
}
I hope I am OK to omit so much of the document here; I don't believe it to be contingent but I cannot be certain, of course.
Have I missed a vital piece of knowledge, or have I done something terminally thick? Either way, I would be grateful for an expert's knowledge at this point. Thanks!
Edit:
At the suggestion of J.T. I also tried reindexing the documents so that prices.site_id was indexed - no change. Also tried the bool/must filter below to no avail.
To clarify, the reason I'm using an empty query is that the web interface may supply a query string, but the same code is used to simply filter all products. Hence I left in the query, but empty, since that's what Elastica seems to produce with no query string.
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"show_in_catalogue": true
}
},
{
"term": {
"prices.site_id": 1
}
}
]
}
}
}
}
}
You have site_id set as {"index": "no"}. This tells ElasticSearch to exclude the field from the index which makes it impossible to query or filter on that field. The data will still be stored. Likewise, you can set a field to only be in the index and searchable, but not stored.
I'm new to ElasticSearch as well and can't always grok the questions! I'm actually confused by you query. If you are going to "just filter" then you don't need a query. What I don't understand is your use of two fields inside the term filter. I've never done this. I guess it acts as an OR? Also, if nothing matches, it seems to return everything. If you wanted a query with the results of that query filtered, then you would want to use a
-d '{
"query": {
"filtered": {
"query": {},
"filter": {}
}
}
}'
If you just want to apply filters is the filter that should work without any "query" necessary
-d '{
"filter": {
"bool": {
"must": [
{
"term": {
"show_in_catalogue": true
}
},
{
"term": {
"prices.site_id": 1
}
}
]
}
}
}'

Resources