Elasticsearch - Can't search using suggestion field (“is not a completion suggest field”)

I'm completely new to Elasticsearch and I'm trying to use the Elasticsearch completion suggester on an existing field called "identity.full_name" (index "search", type "person").
I followed the steps below to change the mapping of the field:
1)
POST /search/_close
2)
POST search/person/_mapping
{
  "person": {
    "properties": {
      "identity.full_name": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}
3)
POST /search/_open
When I check the mapping at this point using
GET search/_mapping/person/field/identity.full_name
I get this result:
{
  "search": {
    "mappings": {
      "person": {
        "identity.full_name": {
          "full_name": "identity.full_name",
          "mapping": {
            "full_name": {
              "type": "text",
              "fields": {
                "completion": {
                  "type": "completion",
                  "analyzer": "simple",
                  "preserve_separators": true,
                  "preserve_position_increments": true,
                  "max_input_length": 50
                },
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                },
                "suggest": {
                  "type": "completion",
                  "analyzer": "simple",
                  "preserve_separators": true,
                  "preserve_position_increments": true,
                  "max_input_length": 50
                }
              }
            }
          }
        }
      }
    }
  }
}
This suggests that it has been updated to be a completion field.
However, when I query it to check whether this works:
GET search/person/_search
{
  "suggest": {
    "person-suggest": {
      "prefix": "EMANNUEL",
      "completion": {
        "field": "identity.full_name"
      }
    }
  }
}
I get the error "Field [identity.full_name] is not a completion suggest field".
I'm not sure why I'm getting this error. Is there anything else I can try?
Sample data:
{
  "_index": "search",
  "_type": "person",
  "_id": "3106105149",
  "_score": 1,
  "_source": {
    "identity": {
      "id": "3106105149",
      "first_name": "FLORENT",
      "last_name": "TEBOUL",
      "full_name": "FLORENT TEBOUL"
    }
  }
}
{
  "_index": "search",
  "_type": "person",
  "_id": "125296353",
  "_score": 1,
  "_source": {
    "identity": {
      "id": "125296353",
      "first_name": "CHRISTINA",
      "last_name": "BHAN",
      "full_name": "CHRISTINA K BHAN"
    }
  }
}
So when I run a suggest query with the prefix "CHRISTINA":
GET search/person/_search
{
  "suggest": {
    "person-suggest": {
      "prefix": "CHRISTINA",
      "completion": {
        "field": "identity.full_name.suggest"
      }
    }
  }
}
I get all the documents back, as if it were a match_all query.

You should query the identity.full_name.suggest sub-field instead:
GET search/person/_search
{
  "suggest": {
    "person-suggest": {
      "prefix": "EMANNUEL",
      "completion": {
        "field": "identity.full_name.suggest"
      }
    }
  }
}
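About the follow-up where every document comes back: a _search request without a query runs a match_all, so the hits are returned alongside the suggestions, and the suggestions themselves sit in the separate "suggest" section of the response. The Python sketch below builds the request body for the .suggest sub-field and reads the option texts out of a response dict. Adding "size": 0 is an assumption (it suppresses the hits), the helper names and the sample response are hypothetical, and no client library is involved. Note also that adding a multi-field to an existing mapping only affects documents indexed afterwards; existing documents have to be reindexed (for example with _update_by_query) before the suggest sub-field holds their values.

```python
import json


def build_suggest_body(prefix, field="identity.full_name.suggest", name="person-suggest"):
    """Build a _search body that only runs a completion suggestion."""
    return {
        "size": 0,  # assumption: skip the implicit match_all hits
        "suggest": {
            name: {
                "prefix": prefix,
                # Must point at the completion-typed sub-field, not the text parent.
                "completion": {"field": field},
            }
        },
    }


def extract_suggestions(response, name="person-suggest"):
    """Return the suggested texts from a _search response dict."""
    texts = []
    # Each suggestion name maps to a list of entries, each with its own "options".
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            texts.append(option["text"])
    return texts


print(json.dumps(build_suggest_body("CHRISTINA"), indent=2))

# Hypothetical response, trimmed to the fields used above.
sample = {
    "hits": {"total": 2, "hits": []},
    "suggest": {
        "person-suggest": [
            {"text": "CHRISTINA", "offset": 0, "length": 9,
             "options": [{"text": "CHRISTINA K BHAN", "_id": "125296353"}]}
        ]
    },
}
print(extract_suggestions(sample))  # ['CHRISTINA K BHAN']
```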
For reference, this is the mapping returned by GET search/_mapping/person/field/identity.full_name:
{
  "search" : {
    "mappings" : {
      "person" : {
        "identity.full_name" : {
          "full_name" : "identity.full_name",
          "mapping" : {
            "full_name" : {
              "type" : "text",
              "fields" : {
                "suggest" : {
                  "type" : "completion",
                  "analyzer" : "simple",
                  "preserve_separators" : true,
                  "preserve_position_increments" : true,
                  "max_input_length" : 50
                }
              }
            }
          }
        }
      }
    }
  }
}
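The completion suggester matches the analyzed prefix against the beginning of each indexed input. With the simple analyzer shown in the mapping, text is split at non-letter characters and lowercased, so "CHRISTINA" or "chris" will find "CHRISTINA K BHAN" but "BHAN" will not. The following is a toy Python model of that behaviour, not Elasticsearch's FST-based implementation. The function names are hypothetical, and separators are simplified to single spaces.

```python
import re


def simple_analyze(text):
    # Approximates the "simple" analyzer: split at non-letters, lowercase.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def model_suggest(prefix, inputs):
    """Toy model: an input matches if its analyzed form starts with the analyzed prefix."""
    wanted = " ".join(simple_analyze(prefix))
    return [s for s in inputs if " ".join(simple_analyze(s)).startswith(wanted)]


names = ["FLORENT TEBOUL", "CHRISTINA K BHAN"]
print(model_suggest("CHRISTINA", names))  # ['CHRISTINA K BHAN']
print(model_suggest("chris", names))      # ['CHRISTINA K BHAN']
print(model_suggest("BHAN", names))       # []
```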

Related

Count total number of words of all documents pointing to specific fields

Someone asked this question, but no one seems to have answered it or suggested a possible way to solve it: https://discuss.elastic.co/t/count-the-number-of-words-in-the-field-elastic-search-6-2/121373
Now I'm trying to produce a report from Elasticsearch that counts the number of words/tokens in two specific fields, title and content.
Is there a proper aggregation for this?
For example, I have this query:
GET web/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": [
              "title",
              "content"
            ],
            "query": "((\"Hello\") AND (\"World\"))"
          }
        },
        {
          "range": {
            "pub_date": {
              "from": 1569456000,
              "to": 1570060800
            }
          }
        }
      ]
    }
  }
}
Say this query matches 23 documents. I want a response that tells me how many words those 23 documents contain in their title and content fields.
I would leverage the token_count data type. In your index, you can add a sub-field of type token_count to your title and content fields, like this:
PUT web
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      },
      "content": {
        "type": "text",
        "fields": {
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
Then, to find out the number of tokens, you can simply run a sum aggregation on the .length sub-fields. Aggregations run over the documents matched by the query, so adding your bool query to the same request restricts the sums to those 23 documents:
POST web/_search
{
  "size": 0,
  "aggs": {
    "title_tokens": {
      "sum": {
        "field": "title.length"
      }
    },
    "content_tokens": {
      "sum": {
        "field": "content.length"
      }
    }
  }
}
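To get a single figure for the matching documents, the query and the two sum aggregations can go in one request, with the two sums added on the client side. Here is a minimal Python sketch. `build_word_count_body` and `total_tokens` are hypothetical helpers, the sample response is invented for illustration, and nothing is sent to a cluster.

```python
def build_word_count_body(query):
    """Combine a query with sum aggregations over the token_count sub-fields."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": query,
        "aggs": {
            "title_tokens": {"sum": {"field": "title.length"}},
            "content_tokens": {"sum": {"field": "content.length"}},
        },
    }


def total_tokens(response):
    """Add the title and content sums from a _search response dict."""
    aggs = response["aggregations"]
    return aggs["title_tokens"]["value"] + aggs["content_tokens"]["value"]


query = {
    "bool": {
        "must": [
            {"query_string": {"fields": ["title", "content"],
                              "query": '("Hello") AND ("World")'}},
            {"range": {"pub_date": {"from": 1569456000, "to": 1570060800}}},
        ]
    }
}
body = build_word_count_body(query)

# Invented response, shaped like a real sum-aggregation result.
sample = {"aggregations": {"title_tokens": {"value": 120.0},
                           "content_tokens": {"value": 4310.0}}}
print(total_tokens(sample))  # 4430.0
```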
I am using the token_count data type. It calculates and stores the number of tokens in each text value, and that stored count can then be aggregated to get the token count of a field:
PUT index18
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      },
      "content": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
Data:
"hits" : [
  {
    "_index" : "index18",
    "_type" : "_doc",
    "_id" : "edJPtW0BVHM68p7X-Wlu",
    "_score" : 1.0,
    "_source" : {
      "title" : "Mayor Isko"
    }
  },
  {
    "_index" : "index18",
    "_type" : "_doc",
    "_id" : "etJQtW0BVHM68p7XGmmr",
    "_score" : 1.0,
    "_source" : {
      "title" : "Isko"
    }
  }
]
Query
GET index18/_search
{
  "query": {"match_all": {}},
  "aggs": {
    "WordCount": {
      "sum": {
        "field": "title.length"
      }
    }
  }
}
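For the two documents shown, WordCount should come out as 3: two tokens for "Mayor Isko" and one for "Isko". The arithmetic can be mimicked with a toy Python tokenizer. The real standard analyzer follows Unicode text segmentation rules, so the `\w+` split used here is only an approximation for simple text. The helper names are hypothetical.

```python
import re


def approx_token_count(text):
    # Rough stand-in for the standard analyzer: count runs of word characters.
    return len(re.findall(r"\w+", text))


def sum_token_counts(docs, field):
    """Sum per-document token counts, like a sum agg over a token_count sub-field."""
    # Documents without the field contribute nothing, as the sum agg skips missing values.
    return sum(approx_token_count(d[field]) for d in docs if field in d)


docs = [{"title": "Mayor Isko"}, {"title": "Isko"}]
print(sum_token_counts(docs, "title"))  # 3
```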

ElasticSearch: Highlighting with Stemming

I have read this question and attempted to understand the documentation here, but it is complicated.
The problem (I think):
[update 1]
I am using Scala for my code and interface with ES High Level Java API.
I have a stemming analyzer configured. If I search for responsibilities, I get results for both responsibilities and responsibility. That's great.
BUT
Only the documents with the term responsibilities return highlights.
This is because the search is on the stemmed content, i.e., responsib. However, the highlight is against the unstemmed content. Hence, it finds responsibilities, which was a search criterion, but not responsibility, which wasn't.
If I set the highlighter to highlight on the stemmed content, it returns nothing at all. I guess that is because it is comparing responsib with responsibilities.
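The mechanism described above can be reproduced with a toy Python model. Roughly, a highlighter re-analyzes the text with the highlighted field's analyzer and marks the words whose analyzed form equals an analyzed query term. `toy_stem` is a hypothetical stand-in that only rewrites a trailing "ies" to "y". It is not the english stemmer configured in the mapping, and the model ignores offsets, stop words and fragments.

```python
import re


def lowercase_only(token):
    # Stand-in for the unstemmed "content" field: lowercase, no stemming.
    return token.lower()


def toy_stem(token):
    # Hypothetical stand-in for "content.english": lowercase plus a crude "ies" -> "y" rule.
    t = token.lower()
    return t[:-3] + "y" if t.endswith("ies") else t


def highlight(text, query, analyze):
    """Wrap every word whose analyzed form matches an analyzed query word in <em>."""
    query_terms = {analyze(w) for w in re.findall(r"\w+", query)}

    def mark(match):
        word = match.group(0)
        return f"<em>{word}</em>" if analyze(word) in query_terms else word

    return re.sub(r"\w+", mark, text)


doc = "This is my responsibility"
print(highlight(doc, "responsibilities", lowercase_only))  # This is my responsibility
print(highlight(doc, "responsibilities", toy_stem))        # This is my <em>responsibility</em>
```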
Search
I am using the Java high level API. The problem is not the code itself.
Currently, I am highlighting only the content field, which highlights only responsibilities. Highlighting content.english seems to return nothing.
private def buildHighlighter(): HighlightBuilder = {
  import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder
  val highlightBuilder = new HighlightBuilder
  val highlightContent = new HighlightBuilder.Field("content")
  highlightContent.highlighterType("unified")
  highlightBuilder.field(highlightContent)
  highlightBuilder
}
Mapping (abridged)
{
  "settings": {
    "number_of_shards": 3,
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_keywords": {
          "type": "keyword_marker",
          "keywords": []
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "english": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            }
          }
        },
        "content": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
[update 2]
Scala code to implement search:
def searchByField(indices: Seq[ESIndexName], terms: Seq[(String, String)], size: Int = 20): SearchResponse = {
  val searchRequest = new SearchRequest
  searchRequest.indices(indices.map(idx => idx.completeIndexName()): _*)
  searchRequest.source(buildTargetFieldsMatchQuery(terms, size))
  searchRequest.indicesOptions(IndicesOptions.strictSingleIndexNoExpandForbidClosed())
  client.search(searchRequest, RequestOptions.DEFAULT)
}
And the query is built as follows:
private def buildTargetFieldsMatchQuery(termsByField: Seq[(String, String)], size: Int): SearchSourceBuilder = {
  val query = new BoolQueryBuilder
  termsByField.foreach {
    case (field, term) =>
      if (field == "content") {
        logger.debug(field + " should have " + term)
        query.should(new MatchQueryBuilder(field + standardAnalyzer, term.toLowerCase))
        query.should(new MatchQueryBuilder(field, term.toLowerCase))
      }
      else if (field == "title") {
        logger.debug(field + " should have " + term)
        query.should(new MatchQueryBuilder(field + standardAnalyzer, term.toLowerCase())).boost
      }
      else {
        logger.debug(field + " should have " + term)
        query.should(new MatchQueryBuilder(field, term.toLowerCase))
      }
  }
  val sourceBuilder: SearchSourceBuilder = new SearchSourceBuilder()
  sourceBuilder.query(query)
  sourceBuilder.from(0)
  sourceBuilder.size(size)
  sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS))
  sourceBuilder.highlighter(buildHighlighter())
}
With plain REST, the following works fine for me:
PUT test
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_keywords": {
          "type": "keyword_marker",
          "keywords": []
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "english": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
POST test/_doc/
{
  "content": "This is my responsibility"
}
POST test/_doc/
{
  "content": "These are my responsibilities"
}
GET test/_search
{
  "query": {
    "match": {
      "content.english": "responsibilities"
    }
  },
  "highlight": {
    "fields": {
      "content.english": {
        "type": "unified"
      }
    }
  }
}
The result is then:
"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "5D5PPGoBqgTTLzdtM-_Y",
    "_score" : 0.18232156,
    "_source" : {
      "content" : "This is my responsibility"
    },
    "highlight" : {
      "content.english" : [
        "This is my <em>responsibility</em>"
      ]
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "5T5PPGoBqgTTLzdtZe8U",
    "_score" : 0.18232156,
    "_source" : {
      "content" : "These are my responsibilities"
    },
    "highlight" : {
      "content.english" : [
        "These are my <em>responsibilities</em>"
      ]
    }
  }
]
Your Scala code looks close enough to the example in the docs. Could you log the actual query you are running, so we can figure out what is going wrong? Generally, it should work like this.
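One thing worth checking in the logged query: by default (require_field_match is true), only fields that the query actually matched are highlighted. Highlighting content.english therefore only produces fragments if the query also targets content.english, as the REST example does. The Python sketch below only builds that request body, keeping the query field and the highlight field identical. The helper name is hypothetical.

```python
import json


def build_stemmed_highlight_body(text, field="content.english"):
    """Query and highlight the same (stemmed) sub-field, mirroring the REST example."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            # With the default require_field_match=true, this field is highlighted
            # only if the query above matched on it.
            "fields": {field: {"type": "unified"}}
        },
    }


body = build_stemmed_highlight_body("responsibilities")
print(json.dumps(body, indent=2))
```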

Is it impossible to index a document where a property has multiple fields, one of them being a completion type with contexts?

Here is my mapping (some fields renamed/removed). I'm using ES 6.0:
{
  "mappings": {
    "_doc": {
      "properties": {
        "username": {
          "type": "keyword",
          "fields": {
            "suggest": {
              "type": "completion",
              "contexts": [
                {
                  "name": "user_id",
                  "type": "category"
                }
              ]
            }
          }
        },
        "user_id": {
          "type": "integer"
        }
      }
    }
  }
}
Now I try to index a document with either
PUT usernames/_doc/1
{
  "username" : "JOHN",
  "user_id": 1
}
OR
PUT usernames/_doc/1
{
  "username" : {
    "input": "JOHN",
    "contexts": {
      "user_id": 1
    }
  },
  "user_id": 1
}
The first doesn't index with a context, and the second just fails. I've attempted to add a path like so:
{
  "mappings": {
    "_doc": {
      "properties": {
        "username": {
          "type": "keyword",
          "fields": {
            "suggest": {
              "type": "completion",
              "contexts": [
                {
                  "name": "user_id",
                  "type": "category",
                  "path": "user_id"
                }
              ]
            }
          }
        },
        "user_id": {
          "type": "integer"
        }
      }
    }
  }
}
Then I tried indexing again:
PUT usernames/_doc/1
{
  "username" : "JOHN",
  "user_id": 1
}
But it just throws a "context must be a keyword or text" error. Do I have to give up and create a totally new property, username-autocomplete, instead? Or is there some magical way to have a context completion suggester and another field on the same property, and still index it like any other multi-field property?
The second approach (i.e. with the path inside the context) is the right one, but you need to map the user_id field as keyword, and then it will work:
{
  "mappings": {
    "_doc": {
      "properties": {
        "username": {
          "type": "keyword",
          "fields": {
            "suggest": {
              "type": "completion",
              "contexts": [
                {
                  "name": "user_id",
                  "type": "category",
                  "path": "user_id"
                }
              ]
            }
          }
        },
        "user_id": {
          "type": "keyword" <--- change this
        }
      }
    }
  }
}
Then you can index your document without creating an additional field, like this:
PUT usernames/_doc/1
{
  "username" : "JOHN",
  "user_id": "1" <--- wrap in double quotes
}
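Once documents are indexed this way, the context can filter suggestions at query time. In ES 6 the values go under "contexts" inside the completion section. The Python sketch below only builds such a request body for the mapping above. The helper name, the suggestion name "username-suggest" and the "size": 0 choice are assumptions. User ids are sent as strings to match the keyword-mapped user_id field.

```python
import json


def build_username_suggest(prefix, user_ids, size=5):
    """Build a _search body for username.suggest, filtered by the user_id category context."""
    return {
        "size": 0,  # assumption: hits are not needed, only the "suggest" section
        "suggest": {
            "username-suggest": {  # hypothetical suggestion name
                "prefix": prefix,
                "completion": {
                    "field": "username.suggest",
                    "size": size,
                    # A list of values for one category context matches any of them.
                    "contexts": {"user_id": [str(uid) for uid in user_ids]},
                },
            }
        },
    }


print(json.dumps(build_username_suggest("jo", [1]), indent=2))
```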

How do I increment the weight of a completion suggest field?

I am building a completion suggester, and I would like to increase the weight of some of the indexed documents by incrementing it. I have:
POST /tester/
{
  "mappings": {
    "song": {
      "properties": {
        "suggest": {
          "type": "completion",
          "analyzer": "simple",
          "search_analyzer" : "simple",
          "payloads": true,
          "preserve_separators": true,
          "preserve_position_increments": true,
          "max_input_length": 100
        }
      }
    }
  }
}
// Index a doc
PUT tester/song/1
{
  "name" : "Nevermind",
  "suggest" : {
    "input": [ "Nevermind", "Nirvana" ],
    "output": "Nirvana - Nevermind",
    "payload" : { "artistId" : 2321 },
    "weight" : 1
  }
}
// Increment the weight
POST /tester/song/1
{
  "script" : {
    "inline": "ctx._source.suggest.weight += 1"
  }
}
// The result of GET /tester
{
  "_index": "tester",
  "_type": "song",
  "_id": "1",
  "_score": 1,
  "_source": {
    "script": {
      "inline": "ctx._source.suggest.weight += 1"
    }
  }
}
Rather than incrementing the weight it rewrites the document. What am I doing wrong here?
Sending a body to /tester/song/1 indexes it as the document, replacing the existing one, which is why the script ended up in _source.
First, enable dynamic scripting by adding these lines to your configuration (elasticsearch.yml):
script.inline: true
script.indexed: true
Then use the _update endpoint to update the document:
curl -XPOST 'localhost:9200/tester/song/1/_update' -d '
{
  "script" : {
    "inline": "ctx._source.suggest.weight += 1"
  }
}'
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#_scripted_updates
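The difference between the two endpoints can be sketched with a toy in-memory store in Python. Indexing replaces the whole _source. A scripted update reads the current _source, lets the script modify it and writes it back. This is only a model: the `ToyIndex` class and its methods are hypothetical, and a Python function stands in for the inline script.

```python
import copy


class ToyIndex:
    """Hypothetical in-memory model of index-vs-update semantics."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, body):
        # PUT/POST <index>/<type>/<id>: the body becomes the new _source wholesale.
        self.docs[doc_id] = copy.deepcopy(body)

    def update(self, doc_id, script):
        # POST .../<id>/_update: read _source, let the script modify it, write it back.
        source = copy.deepcopy(self.docs[doc_id])
        script(source)
        self.docs[doc_id] = source


def bump_weight(source):
    # Stand-in for: ctx._source.suggest.weight += 1
    source["suggest"]["weight"] += 1


song = {"name": "Nevermind",
        "suggest": {"input": ["Nevermind", "Nirvana"], "weight": 1}}

store = ToyIndex()
store.index("1", song)
store.update("1", bump_weight)
print(store.docs["1"]["suggest"]["weight"])  # 2

# What the original POST /tester/song/1 did: the script object replaced the song.
store.index("1", {"script": {"inline": "ctx._source.suggest.weight += 1"}})
print(store.docs["1"])
```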

Elasticsearch context suggester, bool on contexts

I'm using the context suggester and am wondering whether we can set the scope of the contexts used for suggestions, rather than using all contexts.
Currently, the query needs to match all contexts. Can we add an "OR" operation on the contexts and/or specify which context to use for a particular query?
Taking the example from here:
Mapping:
PUT /venues/poi/_mapping
{
  "poi" : {
    "properties" : {
      "suggest_field": {
        "type": "completion",
        "context": {
          "type": {
            "type": "category"
          },
          "location": {
            "type": "geo",
            "precision" : "500m"
          }
        }
      }
    }
  }
}
Then I index a document:
{
  "suggest_field": {
    "input": ["The Shed", "shed"],
    "output" : "The Shed - fresh sea food",
    "context": {
      "location": {
        "lat": 51.9481442,
        "lon": -5.1817516
      },
      "type" : "restaurant"
    }
  }
}
Query:
{
  "suggest" : {
    "text" : "s",
    "completion" : {
      "field" : "suggest_field",
      "context": {
        "location": {
          "value": {
            "lat": 51.938119,
            "lon": -5.174051
          }
        }
      }
    }
  }
}
If I query using only one context ("location" in the example above), I get an error; I need to pass both contexts. Is it possible to specify which context to use, or to pass something like a "Context_Operation" parameter set to "OR"?
You have 2 choices:
First, add all available type values as the default in your mapping (not scalable):
{
  "poi" : {
    "properties" : {
      "suggest_field": {
        "type": "completion",
        "context": {
          "type": {
            "type": "category",
            "default": ["restaurant", "pool", "..."]
          },
          "location": {
            "type": "geo",
            "precision" : "500m"
          }
        }
      }
    }
  }
}
Second, add a common value (here "any") to the type context of every indexed document, and set only this value as the default. Because the default is used when a query does not supply the type context, a query on location alone then matches every document.
Mapping:
{
  "poi" : {
    "properties" : {
      "suggest_field": {
        "type": "completion",
        "context": {
          "type": {
            "type": "category",
            "default": "any"
          },
          "location": {
            "type": "geo",
            "precision" : "500m"
          }
        }
      }
    }
  }
}
Document:
{
  "suggest_field": {
    "input": ["The Shed", "shed"],
    "output" : "The Shed - fresh sea food",
    "context": {
      "location": {
        "lat": 51.9481442,
        "lon": -5.1817516
      },
      "type" : ["any", "restaurant"]
    }
  }
}
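Here is a toy Python model of the second option, covering the category context only; the geo context is left out for brevity. Contexts missing from the query fall back to the mapping default, every context must match, and within one context any of the listed values may match. The function and data names are hypothetical. This is a simplified reading of the behaviour described above, not Elasticsearch's implementation.

```python
DEFAULTS = {"type": ["any"]}  # mapping default for the "type" category context


def matches(doc_contexts, query_contexts, defaults=DEFAULTS):
    """True if every category context matches; values within one context are OR-ed."""
    for name, default in defaults.items():
        wanted = query_contexts.get(name, default)  # omitted in the query -> default
        indexed = doc_contexts.get(name, default)   # omitted at index time -> default
        if not set(wanted) & set(indexed):
            return False
    return True


shed = {"type": ["any", "restaurant"]}
pool = {"type": ["any", "pool"]}

print(matches(shed, {}))                  # True: the "any" default matches
print(matches(shed, {"type": ["pool"]}))  # False
print(matches(pool, {"type": ["pool"]}))  # True
```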
