Some Elasticsearch fields are searchable via DSL query and some are not

I'm using Elasticsearch 6.8.1 with dynamic mapping. I have one document in the index right now, and I'm testing searches on various fields. I POST to http://localhost:9200/documents/_search with this DSL query:
{
  "query": {
    "bool": { "must": { "term": { "name": "item2" } } }
  }
}
and I get the document I expect:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "documents",
        "_type": "document",
        "_id": "nRMOs5DZg",
        "_score": 0.2876821,
        "_source": {
          "freeform": "DEF",
          "name": "item2",
          "url": "s3://mybucket/key",
          "visible": true
        }
      }
    ]
  }
}
Now, I want to make sure that I can search on the "freeform" field by changing the query to
{
  "query": {
    "bool": { "must": { "term": { "freeform": "DEF" } } }
  }
}
This results in no hits and I can't understand why.
[EDIT]
Here is the dynamic mapping:
{
  "documents": {
    "aliases": {},
    "mappings": {
      "document": {
        "properties": {
          "freeform": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "url": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "visible": {
            "type": "boolean"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1564776393764",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "2er2TF-ySEKgk6gd32K6Ig",
        "version": {
          "created": "6080199"
        },
        "provided_name": "documents"
      }
    }
  }
}

It's hard to answer without seeing your mapping, but my guess would be this:
The dynamic mapping tries to guess the data type to assign to your fields; the default for string fields is the "text" data type, which means their value is analyzed and stored as a list of normalized terms, which is useful for free-text search. The string "item2" happens to survive this analysis unchanged, but "DEF" would be analyzed to "def".
Since you're using a term query, the queried term doesn't go through the same analysis process, so you have to query using the analyzed term in order to match the document.
Try searching for "def" instead of "DEF" to test this hypothesis. Also, take a look at the automatically-generated mapping for your index and you'll see which data type each field was mapped to.
If this is indeed the case, you can do one of several things:
If you want exact-string matching: change the mapping from text to keyword (you can control dynamic mapping using Dynamic Templates); or, alternatively, use the keyword sub-field that is created automatically for you, by searching against freeform.keyword instead of freeform.
If you want "free-text" matching: use a match query instead of a term query, so that both the query input and the document value undergo the same analysis (but make sure you understand how analysis and match queries work).
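For instance, based on the mapping shown above (the keyword sub-field is the one dynamic mapping creates by default), either of these sketches should return the document:

```json
POST /documents/_search
{
  "query": {
    "term": { "freeform.keyword": "DEF" }
  }
}
```

```json
POST /documents/_search
{
  "query": {
    "match": { "freeform": "DEF" }
  }
}
```

The first compares against the exact, unanalyzed value; the second analyzes "DEF" to "def" at query time, so it matches the analyzed text field.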

Related

How does type ahead in ElasticSearch work on multiple words and partial text match

I would like to explain with an example.
Documents in my Elasticsearch dataset have a field 'product_name'.
One document has product_name = 'Anmol Twinz Biscuit'.
When the user types (a) 'Anmol Twin' or (b) 'Twin Anmol' or (c) 'Twinz Anmol' or (d) 'Anmol Twinz', I want this specific record returned as a search result.
However, this works only if I specify the complete words in the search query. Partial matches are not working. Thus (a) and (b) are not returning the desired result.
Mapping defined (obtained by _mapping query)
{
  "sbis_product_idx": {
    "mappings": {
      "items": {
        "properties": {
          "category_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "product_company": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "product_id": {
            "type": "long"
          },
          "product_name": {
            "type": "text"
          },
          "product_price": {
            "type": "float"
          },
          "suggest": {
            "type": "completion",
            "analyzer": "simple",
            "preserve_separators": true,
            "preserve_position_increments": true,
            "max_input_length": 50
          }
        }
      }
    }
  }
}
Query being used:
{
  "_source": "product_name",
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Twin Anmol",
      "fields": [ "product_name", "product_company" ],
      "operator": "and"
    }
  }
}
The document in ES
{
  "_index": "sbis_product_idx",
  "_type": "misc",
  "_id": "107996",
  "_version": 1,
  "_score": 0,
  "_source": {
    "suggest": {
      "input": [
        "Anmol",
        "Twinz",
        "Biscuit"
      ]
    },
    "category_name": "Other Product",
    "product_company": "Anmol",
    "product_price": 30,
    "product_name": "Anmol Twinz Biscuit",
    "product_id": 107996
  }
}
Result
"hits": {
  "total": 0,
  "max_score": null,
  "hits": []
}
Mistake in query / mapping?
I just created the index with your mapping, indexed the ES doc from your example, and changed the operator in your query from and to or; it then returns the document for all 4 query combinations.
Find below my query
{
  "_source": "product_name",
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Anmol Twinz",
      "fields": [ "product_name", "product_company" ],
      "operator": "or"
    }
  }
}
(The only change is "operator", from and to or.)
With the and operator, your query requires every term in the search string to be present; some of them, like Twin, are not complete tokens in ES, hence you were not getting results for those combinations. When you change the operator to or, the query matches as long as any one of the tokens is present.
Note: if you want to match on partial tokens like Twin or Twi, you need to use n-gram tokens, as explained in the official ES docs at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html, and that is a completely different design.
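A minimal sketch of such a design (the index name, filter name, and analyzer name here are made up for illustration; adjust min_gram/max_gram to your needs):

```json
PUT /sbis_product_idx_v2
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      },
      "analyzer": {
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "autocomplete_filter" ]
        }
      }
    }
  },
  "mappings": {
    "items": {
      "properties": {
        "product_name": {
          "type": "text",
          "analyzer": "autocomplete_analyzer",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
```

Indexing with the edge n-gram analyzer but searching with standard ensures the query terms themselves are not n-grammed, so a query for "Twin" can match the stored gram "twin" produced from "Twinz".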

Nested query not working on Elasticsearch 1.7 if mapping with same name exists

I just downgraded my local ES from 2.1.8 to 1.7.5 to match AWS Elasticsearch and now my nested queries aren't working. I have to admit I'm baffled and couldn't find anything helpful online.
I've abbreviated the following for clarity and changed some of the names but otherwise these are real outputs from my local ES. The final nested result correctly returned file documents with the matching package on 2.1 but nothing on 1.7.
Update: I actually have another nested field that is not exhibiting this problem. The difference is the value for that is a single nested object instead of an array. Known issue?
Update #2: Changing the value to a single value made no difference. However, changing the nested property name from package to packages made the problem go away. The only thing I can think of is that I also have a mapping called package, would that cause a problem?
Mapping
"file": {
  "dynamic": "strict",
  "_all": {
    "enabled": false
  },
  "properties": {
    "name": {
      "type": "string"
    },
    "type": {
      "type": "string",
      "index": "not_analyzed"
    },
    "package": {
      "type": "nested",
      "dynamic": "strict",
      "properties": {
        "name": {
          "type": "string",
          "index": "not_analyzed"
        },
        "path": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
Document
Search
{ "query": {"term": {"type": "file"}} }
Result
{
  "_index": "blah",
  "_type": "file",
  "_id": "slkdfjsdfjsoijfoisjfisdjf",
  "_score": 7.8872123,
  "_source": {
    "name": "foo",
    "type": "file",
    "package": [
      {
        "name": "the_package",
        "path": "the_package!path"
      }
    ]
  }
}
Term Vectors
localhost:9200/blah/file/slkdfjsdfjsoijfoisjfisdjf/_termvector?pretty=true&fields=package.name
{
  "_index": "blah",
  "_type": "file",
  "_id": "slkdfjsdfjsoijfoisjfisdjf",
  "_version": 1,
  "found": true,
  "took": 1,
  "term_vectors": {
    "package.name": {
      "field_statistics": {
        "sum_doc_freq": 1040,
        "doc_count": 1040,
        "sum_ttf": 1040
      },
      "terms": {
        "the_package": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 7
            }
          ]
        }
      }
    }
  }
}
Nested Query
{
  "query": {
    "nested": {
      "path": "package",
      "query": {
        "term": {
          "package.name": "the_package"
        }
      }
    }
  }
}
Result
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
Following update #2, I tried deleting the package mapping and, sure enough, the nested query now works as expected. I'll update my mappings to avoid this issue.
Nothing in the ES nested object documentation suggests this should be an issue, and it has evidently been fixed between 1.7.5 and 2.1.8, so if anyone knows of such documentation or has a link to a fixed bug, feel free to add it. Posting this as an answer in case anyone else hits this.
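For reference, the workaround from update #2 amounts to renaming the nested field so it no longer shares a name with the package mapping type. A sketch of the adjusted mapping fragment (only the renamed part shown; queries would then use "path": "packages" and "packages.name"):

```json
"packages": {
  "type": "nested",
  "dynamic": "strict",
  "properties": {
    "name": { "type": "string", "index": "not_analyzed" },
    "path": { "type": "string", "index": "not_analyzed" }
  }
}
```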

How to switch on Elasticsearch stemming

I don't know how to turn on English word stemming in Elasticsearch, and I couldn't find a clear example of how to do it.
Here is what I did
Creating the index
PUT /staff/list/ -d
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard": {
          "type": "standard"
        }
      }
    }
  }
}
Adding document
PUT /staff/list/jason
{
"Title" : "searches"
}
when I search for search
GET /staff/list/_search?q=search
the document doesn't appear in the results.
What index settings do I need for stemming to work?
Many thanks in advance
Please note that the default Elasticsearch analyzer does not support stemming.
In order to support stemming, you need to create a custom analyzer.
Here is how you do it:
Create the index and define an analyzer called my_analyzer
PUT /staff
{
  "settings": {
    "analysis": {
      "filter": {
        "filter_snowball_en": {
          "type": "snowball",
          "language": "English"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "filter_snowball_en"
          ]
        }
      }
    }
  }
}
Configure a mapping that assigns my_analyzer to the list type
PUT /staff/_mapping/list
{
  "list": {
    "properties": {
      "title": {
        "type": "string",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Index documents
PUT /staff/list/jason
{
"title": "searches"
}
PUT /staff/list/debby
{
"title": "searched open"
}
Search and stemmed results
GET staff/list/_search
{
  "query": {
    "query_string": {
      "query": "title:opened"
    }
  }
}
Result
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "staff",
        "_type": "list",
        "_id": "debby",
        "_score": 1,
        "_source": {
          "title": "searched open"
        }
      }
    ]
  }
}
As you can see in the search results, the debby document, which contains the term open, was returned although we were searching for opened.
Hope that helps.
When you create the index that way, you are doing nothing (just re-declaring the standard analyzer).
The standard analyzer is Elasticsearch's default, and it doesn't stem any word.
You need to map the fields to their respective analyzers at index creation (see the mapping documentation):
PUT /staff -d
{
  "mappings": {
    "list": {
      "properties": {
        "Title": {
          "type": "string",
          "analyzer": "english"
        }
      }
    }
  }
}
I guess the english analyzer fits your case (it uses the standard tokenizer).
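An easy way to check what an analyzer does to a term is the _analyze API. A quick sketch: the english analyzer should reduce searches to the stem search, whereas standard leaves it as searches.

```json
GET /_analyze
{
  "analyzer": "english",
  "text": "searches"
}
```

Comparing the tokens returned for "analyzer": "english" versus "analyzer": "standard" makes it obvious whether stemming is in effect before you index anything.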

ElasticSearch - Upgraded indexed field to a Multi Field - new field is empty

Having noticed that my sort on an indexed string field doesn't work properly, I've discovered that it sorts on analyzed strings ("bags of words"), and if I want it to work properly I have to sort on the non-analyzed string. My plan was to change the string field to a multi-field, using information I found in these two articles:
https://www.elastic.co/blog/changing-mapping-with-zero-downtime (Upgrade to a multi-field part)
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
Using Sense I've created this field mapping
PUT myindex/_mapping/type
{
  "properties": {
    "Title": {
      "type": "string",
      "fields": {
        "Raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
And then I try to sort my search results using the newly made field. I've put all of the name variations I could think of after reading the articles:
POST myindex/_search
{
  "_source": [ "Title", "titlemap.Raw", "titlemap.Title", "titlemap.Title.Raw", "Title.Title", "Raw", "Title.Raw" ],
  "size": 6,
  "query": {
    "multi_match": {
      "query": "title",
      "fields": [ "Title^5" ],
      "fuzziness": "auto",
      "type": "best_fields"
    }
  },
  "sort": {
    "Title.Raw": "asc"
  }
}
And that's what I get in response:
{
  "_index": "myindex_2015_11_26_12_22_38",
  "_type": "type",
  "_id": "1205",
  "_score": null,
  "_source": {
    "Title": "The title of the item"
  },
  "sort": [
    null
  ]
}
Only the Title field's value is shown in the response, and the sort criterion is null for every result.
Am I doing something wrong, or is there another way to do this?
The index name is not the same after re-indexing, and thus the default mapping gets installed; that's probably why.
I suggest using an index template instead, so you don't have to care about when the index gets created. The idea is to create a template with the proper mapping you need; then, whenever a new matching index is created, ES will add the myindex alias and apply the proper mapping to it.
curl -XPUT localhost:9200/_template/myindex_template -d '{
  "template": "myindex_*",
  "settings": {
    "number_of_shards": 1
  },
  "aliases": {
    "myindex": {}
  },
  "mappings": {
    "type": {
      "properties": {
        "Title": {
          "type": "string",
          "fields": {
            "Raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}'
Then whenever you launch your re-indexing process a new index with a new name will be created BUT with the proper mapping and the proper alias.
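Once an index created from the template exists, sorting on the raw sub-field should work. A sketch, using the field names from the template above:

```json
POST myindex/_search
{
  "query": { "match_all": {} },
  "sort": { "Title.Raw": "asc" }
}
```

Since Title.Raw is not_analyzed, the sort is on the whole original string rather than on individual analyzed terms.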

Elasticsearch: query for multiple words across multiple fields (with prefix)

I'm trying to implement an auto-suggest control powered by an ES index. The index has multiple fields and I want to be able to query across multiple fields using the AND operator and allowing for partial matches (prefix only).
Just as an example, let's say I got 2 fields I want to query on: "colour" and "animal".
I would like to be able to fulfil queries like "duc", "duck", "purpl", "purple", "purple duck".
I managed to get all these working using multi_match() with AND operator.
What I don't seem to be able to do is match on queries like "purple duc", since multi_match doesn't allow wildcards.
I've looked into match_phrase_prefix(), but as I understand it, it doesn't span multiple fields.
I'm turning toward a custom tokenizer: it feels like the solution may lie there, so ultimately the questions are:
1) can someone confirm there's no out-of-the-box function to do what I want to do? It feels like a common enough pattern that there could be something ready to use.
2) can someone suggest any solution? Are tokenizers part of the solution?
I'm more than happy to be pointed in the right direction and do more research myself.
Obviously if someone has working solutions to share that would be awesome.
Thanks in advance
- F
I actually wrote a blog post about this a while back for Qbox, which you can find here: http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams. (Unfortunately some of the links in the post are broken and can't easily be fixed at this point, but hopefully you'll get the idea.)
I'll refer you to the post for the details, but here is some code you can use to test it out quickly. Note that I'm using edge n-grams instead of full n-grams.
Also note in particular the use of the _all field, and the match query operator.
Okay, so here is the mapping:
PUT /test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "edgeNGram_filter": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "edgeNGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "edgeNGram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "_all": {
        "enabled": true,
        "index_analyzer": "edgeNGram_analyzer",
        "search_analyzer": "standard"
      },
      "properties": {
        "field1": {
          "type": "string",
          "include_in_all": true
        },
        "field2": {
          "type": "string",
          "include_in_all": true
        }
      }
    }
  }
}
Now add a few documents:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"field1":"purple duck","field2":"brown fox"}
{"index":{"_id":2}}
{"field1":"slow purple duck","field2":"quick brown fox"}
{"index":{"_id":3}}
{"field1":"red turtle","field2":"quick rabbit"}
And this query seems to illustrate what you're wanting:
POST /test_index/_search
{
  "query": {
    "match": {
      "_all": {
        "query": "purp fo slo",
        "operator": "and"
      }
    }
  }
}
returning:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.19930676,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.19930676,
        "_source": {
          "field1": "slow purple duck",
          "field2": "quick brown fox"
        }
      }
    ]
  }
}
Here is the code I used to test it out:
http://sense.qbox.io/gist/b87e426062f453d946d643c7fa3d5480cd8e26ec
