I'm trying to use ElasticSearch 1.7.3 to implement a Did-you-mean function for my company's search engine. I've followed the documentation to set up a Phrase Suggester and created a customized mapping to support that.
However, when I do a _suggest query, I get ElasticsearchIllegalArgumentException[Suggester[simple_phrase] not supported]. What am I doing wrong?
This is my query:
POST knowledge_graph/entities/_suggest
{
"suggest": {
"text" : "apple in",
"simple_phrase": {
"phrase" : {
"field" : "canonical_name"
}
}
}
}
I get the following response:
{
"_shards": {
"total": 5,
"successful": 0,
"failed": 5,
"failures": [
{
"index": "knowledge_graph",
"shard": 0,
"status": 500,
"reason": "BroadcastShardOperationFailedException[[knowledge_graph][0] ]; nested: ElasticsearchException[failed to execute suggest]; nested: ElasticsearchIllegalArgumentException[Suggester[simple_phrase] not supported]; "
},
...
]
}
}
Here's my index's settings and mappings:
{
"knowledge_graph": {
"aliases": {},
"mappings": {
"entities": {
"properties": {
"autocomplete": {
"type": "completion",
"analyzer": "simple",
"payloads": true,
"preserve_separators": true,
"preserve_position_increments": true,
"max_input_length": 50
},
"canonical_name": {
"type": "string",
"analyzer": "simple",
"fields": {
"shingles": {
"type": "string",
"analyzer": "simple_shingle_analyzer"
}
}
},
"entity_query": {
"properties": {
"simple_phrase": {
"properties": {
"phrase": {
"properties": {
"field": {
"type": "string"
}
}
}
}
},
"text": {
"type": "string"
}
}
},
"suggest": {
"properties": {
"simple_phrase": {
"properties": {
"phrase": {
"properties": {
"field": {
"type": "string"
}
}
}
}
},
"text": {
"type": "string"
}
}
}
}
}
},
"settings": {
"index": {
"creation_date": "1449251691345",
"analysis": {
"filter": {
"shingles_1_6": {
"type": "shingle",
"max_shingle_size": "6",
"output_unigrams_if_no_shingles": "true"
}
},
"analyzer": {
"simple_shingle_analyzer": {
"type": "custom",
"filter": [
"lowercase",
"shingles_1_6"
],
"tokenizer": "standard"
}
}
},
"number_of_shards": "5",
"number_of_replicas": "0",
"version": {
"created": "1070399"
},
"uuid": "g_Yp7z6kQHCDRtd6TvVlzQ"
}
},
"warmers": {}
}
}
There are two ways to execute suggest requests, one with _search endpoint and another with _suggest endpoint.
From the Docs
Suggest requests executed against the _suggest endpoint should omit
the surrounding suggest element which is only used if the suggest
request is part of a search.
Your query will work if you execute it against _search api
POST knowledge_graph/entities/_search <---- here
{
"suggest": {
"text" : "apple in",
"simple_phrase": {
"phrase" : {
"field" : "canonical_name"
}
}
},
"size" : 0
}
If you want to query with _suggest endpoint try this
POST knowledge_graph/_suggest
{
"suggest": {
"text": "apple in",
"phrase": {
"field": "canonical_name"
}
}
}
Note - I think you should be executing phrase suggest against canonical_name.shingles
Related
I am trying to have title field as both text and completion types in elastic search.
As shown below
PUT playlist
{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 2,
"analysis": {
"filter": {
"custom_english_stemmer": {
"type": "stemmer",
"name": "english"
},
"english_stop": {
"type": "stop",
"stopwords": "_english_"
}
},
"analyzer": {
"custom_lowercase_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"english_stop",
"custom_english_stemmer"
]
}
}
}
},
"mappings": {
"properties": {
"id": {
"type": "long",
"index": false,
"doc_values": false
},
"title": {
"type": "text",
"analyzer": "custom_lowercase_analyzer",
"fields": {
"raw": {
"type": "completion"
}
}
}
}
}
}
The below suggestion query works
POST media/_search
{
"_source": ["id", "title"],
"suggest": {
"job-suggest": {
"prefix": "sri",
"completion": {
"field": "title"
}
}
}
}
But normal search would fail on the same title
GET media/_search
{
"_source": ["id", "title"],
"query" : {
"query_string": {
"query" : "*sri*",
"fields" : [
"title"
]
}
}
}
Please help me solve this problem
(Using ES 6.7)
I have an index and want to support search-as-you-type feature. For that, I want to try completion suggester but I'm having trouble in reindexing to change the mappings old index.
Here's the old index mappings
{
"old-index": {
"mappings": {
"doc": {
"properties": {
"content": {
"type": "text"
},
"project": {
"type": "keyword"
},
"title": {
"type": "text"
},
"version": {
"type": "keyword"
}
}
}
}
}
}
Here's the new test index mappings
PUT test-completion
{
"mappings": {
"doc": {
"properties": {
"content": {
"type": "text",
"fields": {
"autocomplete": {
"type": "completion",
"contexts": [
{
"name": "project",
"type": "category",
"path": "project"
},
{
"name": "version",
"type": "category",
"path": "version"
}
]
}
}
},
"title": {
"type": "text"
},
"project": {
"type": "keyword"
},
"version": {
"type": "keyword"
}
}
}
}
}
Here's the reindexing query
POST _reindex
{
"source": {
"index": "old-index"
},
"dest": {
"index": "test-completion"
}
}
And here's the query which returns no results
POST test-completion/_search
{
"suggest": {
"autocompletion_suggest": {
"prefix": "part of documentation",
"completion": {
"field": "content.autocomplete",
"fuzzy": {
"fuzziness": "AUTO"
},
"contexts": {
"project": "xyz-project",
"version": "abc-version"
}
}
}
}
}
If the prefix is set to a or b, it returns results outside of context.
Where I'm doing wrong?
https://discuss.elastic.co/t/problem-with-completion-suggester/181695
I am trying to compare hours using painless language in my elasticsearch query. I would like query something like:
{
"script":"doc['schedule.from_time'] >= doc['schedule.to_time']"
}
But I have the error:
Cannot apply [>] operation to types [org.elasticsearch.index.fielddata.ScriptDocValues.Dates]
The scheme of the nested document is:
{
"settings": {
"index.mapping.total_fields.limit": 10000
},
"mappings": {
"_doc": {
"dynamic_templates": [{
"integers": {
"match_mapping_type": "long",
"mapping": {
"type": "long",
"index": false
}
}
}],
"properties": {
"enabled_services": {
"type": "nested",
"properties": {
"service_id": {
"type": "text",
"analyzer": "whitespace",
"search_analyzer": "whitespace"
},
"available_day_of_week": {
"type": "long"
},
"available_from_time": {
"type": "date",
"format": "hour_minute"
},
"available_to_time": {
"type": "date",
"format": "hour_minute"
}
}
}
}
}
}
}
(The values are formated like "2:00" or "18:00").
I have tried to use .date or .value but it does not work because my variable contains only hours not datetime.
Can someone help me :)
I think you are looking for:
doc['enabled_services.available_from_time'].value.isAfter(doc['enabled_services.available_to_time'].value)
You also need to remove "type": "nested" from your mapping. I think it is not needed in your case.
Working code is below:
Mapping
PUT /painless-dates
{
"settings": {
"index.mapping.total_fields.limit": 10000
},
"mappings": {
"_doc": {
"dynamic_templates": [
{
"integers": {
"match_mapping_type": "long",
"mapping": {
"type": "long",
"index": false
}
}
}
],
"properties": {
"enabled_services": {
"properties": {
"service_id": {
"type": "text",
"analyzer": "whitespace",
"search_analyzer": "whitespace"
},
"available_day_of_week": {
"type": "long"
},
"available_from_time": {
"type": "date",
"format": "hour_minute"
},
"available_to_time": {
"type": "date",
"format": "hour_minute"
}
}
}
}
}
}
}
Add two elements
POST /painless-dates/_doc
{
"enabled_services": {
"available_from_time": "02:00",
"available_to_time": "18:00"
}
}
POST /painless-dates/_doc
{
"enabled_services": {
"available_from_time": "04:00",
"available_to_time": "03:00"
}
}
Query
GET /painless-dates/_search
{
"query": {
"bool": {
"must": {
"script": {
"script": {
"source": "doc['enabled_services.available_from_time'].value.isAfter(doc['enabled_services.available_to_time'].value)",
"lang": "painless"
}
}
}
}
}
}
Answer
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "painless-dates",
"_type": "_doc",
"_id": "0wFw7mYBueYINcTmJsMG",
"_score": 1,
"_source": {
"enabled_services": {
"available_from_time": "04:00",
"available_to_time": "03:00"
}
}
}
]
}
}
OK I found the answer:
{
"script": {
"script": "doc['enabled_services.available_from_time'].date.isBefore(doc['enabled_services.available_to_time'].date)"
}
}
Thank you all !
I have an index which contains CustomerProfile documents. Each of this document in the CustomerInsightTargets(with the properties Source,Value) property can be an array with x items. What I am trying to achieve is an autocomplete (of top 5) on CustomerInsightTargets.Value grouped by CustomerInisghtTarget.Source.
It will be helpful if anyone gives me hint about how to select only a subset of nested objects from each document and use that nested obj in aggregations.
{
"customerinsights": {
"aliases": {},
"mappings": {
"customerprofile": {
"properties": {
"CreatedById": {
"type": "long"
},
"CreatedDateTime": {
"type": "date"
},
"CustomerInsightTargets": {
"type": "nested",
"properties": {
"CustomerInsightSource": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"CustomerInsightValue": {
"type": "text",
"term_vector": "yes",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "ngram_tokenizer_analyzer"
},
"CustomerProfileId": {
"type": "long"
},
"Guid": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"Id": {
"type": "long"
}
}
},
"DisplayName": {
"type": "text",
"term_vector": "yes",
"analyzer": "ngram_tokenizer_analyzer"
},
"Email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"Id": {
"type": "long"
},
"ImageUrl": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
},
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "customerinsights",
"creation_date": "1484860145041",
"analysis": {
"analyzer": {
"ngram_tokenizer_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer"
}
},
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "1",
"max_gram": "10"
}
}
},
"number_of_replicas": "2",
"uuid": "nOyI0O2cTO2JOFvqIoE8JQ",
"version": {
"created": "5010199"
}
}
}
}
}
Having as example a document:
{
{
"Id": 9072856,
"CreatedDateTime": "2017-01-12T11:26:58.413Z",
"CreatedById": 9108469,
"DisplayName": "valentinos",
"Email": "valentinos#mail.com",
"CustomerInsightTargets": [
{
"Id": 160,
"CustomerProfileId": 9072856,
"CustomerInsightSource": "Tags",
"CustomerInsightValue": "Tag1",
"Guid": "00000000-0000-0000-0000-000000000000"
},
{
"Id": 160,
"CustomerProfileId": 9072856,
"CustomerInsightSource": "ProfileName",
"CustomerInsightValue": "valentinos",
"Guid": "00000000-0000-0000-0000-000000000000"
},
{
"Id": 160,
"CustomerProfileId": 9072856,
"CustomerInsightSource": "Playground",
"CustomerInsightValue": "Wiki",
"Guid": "00000000-0000-0000-0000-000000000000"
}
]
}
}
If i ran an aggregation on the top_hits the result will include all targets from a document -> if one of them match my search text.
Example
GET customerinsights/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "CustomerInsightTargets",
"query": {
"bool": {
"must": [
{
"match": {
"CustomerInsightTargets.CustomerInsightValue": {
"query": "2017",
"operator": "AND",
"fuzziness": 2
}
}
}
]
}
}
}
}
]
}
} ,
"aggs": {
"root": {
"nested": {
"path": "CustomerInsightTargets"
},
"aggs": {
"top_tags": {
"terms": {
"field": "CustomerInsightTargets.CustomerInsightSource.keyword"
},
"aggs": {
"top_tag_hits": {
"top_hits": {
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"size": 5,
"_source": "CustomerInsightTargets"
}
}
}
}
}
}
},
"size": 0,
"_source": "CustomerInsightTargets"
}
My question is how I should use the aggregation to get the "autocomplete" Values grouped by Source and order by the _score. I tried to use a significant_terms aggregation but doesn't work so well, also terms aggs doesn't sort by score (and by _count) and having fuzzy also adds complexity.
I have my analyzers set like this:
"analyzer": {
"edgeNgram_autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "autocomplete"]
},
"full_name": {
"filter":["standard","lowercase","asciifolding"],
"type":"custom",
"tokenizer":"standard"
}
My filter:
"filter": {
"autocomplete": {
"type": "edgeNGram",
"side":"front",
"min_gram": 1,
"max_gram": 50
}
Name field analyzer:
"textbox": {
"_parent": {
"type": "document"
},
"properties": {
"text": {
"fields": {
"text": {
"type":"string",
"analyzer":"full_name"
},
"autocomplete": {
"type": "string",
"index_analyzer": "edgeNgram_autocomplete",
"search_analyzer": "full_name",
"analyzer": "full_name"
}
},
"type":"multi_field"
}
}
}
Put all together, makes up my mapping for docstore index:
PUT http://localhost:9200/docstore
{
"settings": {
"analysis": {
"analyzer": {
"edgeNgram_autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "autocomplete"]
},
"full_name": {
"filter":["standard","lowercase","asciifolding"],
"type":"custom",
"tokenizer":"standard"
}
},
"filter": {
"autocomplete": {
"type": "edgeNGram",
"side":"front",
"min_gram": 1,
"max_gram": 50
} }
}
},
"mappings": {
"space": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"document": {
"_parent": {
"type": "space"
},
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"textbox": {
"_parent": {
"type": "document"
},
"properties": {
"bbox": {
"type": "long"
},
"text": {
"fields": {
"text": {
"type":"string",
"analyzer":"full_name"
},
"autocomplete": {
"type": "string",
"index_analyzer": "edgeNgram_autocomplete",
"search_analyzer": "full_name",
"analyzer":"full_name"
}
},
"type":"multi_field"
}
}
},
"entity": {
"_parent": {
"type": "document"
},
"properties": {
"bbox": {
"type": "long"
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Add a space to hold all docs:
POST http://localhost:9200/docstore/space
{
"name": "Space 1"
}
When user enters word: proj
this should return, all text:
SampleProject
Sample Project
Project Name
myProjectname
firstProjectName
my ProjectName
But it returns nothing.
My query:
POST http://localhost:9200/docstore/textbox/_search
{
"query": {
"match": {
"text": "proj"
}
},
"filter": {
"has_parent": {
"type": "document",
"query": {
"term": {
"name": "1-a1-1001.pdf"
}
}
}
}
}
If I search by project, I get:
{ "took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 3.0133555,
"hits": [
{
"_index": "docstore",
"_type": "textbox",
"_id": "AVRuV2d_f4y6IKuxK35g",
"_score": 3.0133555,
"_routing": "AVRuVvtLf4y6IKuxK33f",
"_parent": "AVRuV2cMf4y6IKuxK33g",
"_source": {
"bbox": [
8750,
5362,
9291,
5445
],
"text": [
"Sample Project"
]
}
},
{
"_index": "docstore",
"_type": "textbox",
"_id": "AVRuV2d_f4y6IKuxK35Y",
"_score": 2.4106843,
"_routing": "AVRuVvtLf4y6IKuxK33f",
"_parent": "AVRuV2cMf4y6IKuxK33g",
"_source": {
"bbox": [
8645,
5170,
9070,
5220
],
"text": [
"Project Name and Address"
]
}
}
]
}
}
Maybe my edgengram is not suited for this?
I am saying:
side":"front"
Should I do it differently?
Does anyone know what I am doing wrong?
The problem is with the autocomplete indexing analyzer field name.
Change:
"index_analyzer": "edgeNgram_autocomplete"
To:
"analyzer": "edgeNgram_autocomplete"
And also search like (#Andrei Stefan) showed in his answer:
POST http://localhost:9200/docstore/textbox/_search
{
"query": {
"match": {
"text.autocomplete": "proj"
}
}
}
And it will work as expected!
I have tested your configuration on Elasticsearch 2.3
By the way, type multi_field is deprecated.
Hope I have managed to help :)
Your query should actually try to match on text.autocomplete and not text:
"query": {
"match": {
"text.autocomplete": "proj"
}
}