Elasticsearch Highlight Not Working With Custom Analyzer/Tokenizer

I can't figure out why highlighting is not working. The query works, but the highlight just shows the field content without em tags. Here are my settings and mappings:
PUT wmsearch
{
"settings": {
"index.mapping.total_fields.limit": 2000,
"analysis": {
"analyzer": {
"custom": {
"type": "custom",
"tokenizer": "custom_token",
"filter": [
"lowercase"
]
},
"custom2": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"custom_token": {
"type": "ngram",
"min_gram": 3,
"max_gram": 10
}
}
}
},
"mappings": {
"doc": {
"properties": {
"document": {
"properties": {
"reference": {
"type": "text",
"analyzer": "custom"
}
}
},
"scope" : {
"type" : "nested",
"properties" : {
"level" : {
"type" : "integer"
},
"ancestors" : {
"type" : "keyword",
"index" : "true"
},
"value" : {
"type" : "keyword",
"index" : "true"
},
"order" : {
"type" : "integer"
}
}
}
}
}
}
}
Here is my query:
GET wmsearch/_search
{
"query": {
"simple_query_string" : {
"fields": ["document.reference"],
"analyzer": "custom2",
"query" : "bloom"
}
},
"highlight" : {
"fields" : {
"document.reference" : {}
}
}
}
The query does return the correct results, and the highlight field exists within the results. However, there are no em tags around "bloom"; it just shows the entire string with no tags at all.
Does anyone see any issues here, or can anyone help?
Thanks

I got it to work by adding "index_options": "offsets" to my mappings for document.reference.
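For reference, the fix amounts to one extra key on the field mapping. Below is a minimal Python sketch, assuming the same index layout and field names as in the question; it only builds and prints the mapping fragment and does not talk to a cluster.

```python
import json

# Mapping fragment for document.reference from the question, with the
# single key from the answer added. Nothing here contacts Elasticsearch.
reference_mapping = {
    "type": "text",
    "analyzer": "custom",
    # Index start/end character offsets so the highlighter can use them.
    "index_options": "offsets",
}

mappings = {
    "doc": {
        "properties": {
            "document": {"properties": {"reference": reference_mapping}}
        }
    }
}

print(json.dumps(mappings, indent=2))
```

The printed JSON can be pasted under "mappings" in the PUT wmsearch request in place of the original block; the "scope" field is left out here only to keep the sketch short.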

Elasticsearch - Return filtered fields based on values

I am writing a query that should return only those entries within "ces" whose "ces.desc" or "ces.output" field has the value "jack":
{"_source": ["ces.desc","ces.output"] ,
"query": {
"nested": {
"path": "ces",
"query": {
"bool": {
"should": [
{"term": {"ces.desc": "jack"}},
{"term": {"ces.output": "jack"}}
]
}
}
}
},
"aggs": {
"nestedData": {
"nested": {
"path": "ces"
},
"aggs": {
"data_desc": {
"filter": {
"term": {
"ces.desc": "jack"
}
}
}
}
}
}
}
And the output is :
{
"ces" : [
{
"output" : "Laura", <-------------- WRONG
"desc" : "fernando" <-------------- WRONG
},
{
"output" : "",
"desc" : "jack" <-------------- RIGHT
},
{
"output" : "jack", <-------------- RIGHT
"desc" : "Fer"
}
]
}
mapping:
{
"names_1" : {
"aliases" : { },
"mappings" : {
"properties" : {
"created_at" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"data" : {
"properties" : {
"addresses" : {
"properties" : {
"asn" : {
"type" : "long"
},
"ces" : {
"type" : "nested",
"properties" : {
"banner" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"desc" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"output" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"source" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"tag" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"error" : {
"type" : "long"
},
"finished_at" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"total" : {
"type" : "long"
}
}
}
}
}
I would like to return only the entries that satisfy the "bool" condition:
if (ces.desc == "jack") or (ces.output == "jack")
return
ces.desc, ces.output key and value
In addition, if I add an "aggs" section, it should count the "jack" matches:
doc_count = 2
Which part of my query is wrong?
query mapping:
{
"mappings": {
"properties": {
"data.addresses":{
"type":"nested",
"properties": {
"data.addresses.ces": {
"type": "nested"
}
}
}
}
}
}
You need to change your mapping to the following, i.e. BOTH addresses AND ces need to be nested:
{
"aliases": {},
"mappings": {
"properties": {
"created_at": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"data": {
"properties": {
"addresses": {
"type": "nested", <------ MUST BE NESTED
"properties": {
"asn": {
"type": "long"
},
"ces": {
"type": "nested", <------ MUST BE NESTED
"properties": {
"banner": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"desc": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"output": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"source": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"tag": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"error": {
"type": "long"
},
"finished_at": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"total": {
"type": "long"
}
}
}
}
Then you simply need to use nested inner_hits:
{
"_source": false,
"query": {
"nested": {
"path": "data.addresses.ces",
"inner_hits": {}, <---- ADD THIS
"query": {
"bool": {
"should": [
{
"term": {
"data.addresses.ces.desc": "jack"
}
},
{
"term": {
"data.addresses.ces.output": "jack"
}
}
]
}
}
}
},
"aggs": {
"nestedData": {
"nested": {
"path": "data.addresses.ces"
},
"aggs": {
"data_desc": {
"filter": {
"term": {
"data.addresses.ces.desc": "jack"
}
}
}
}
}
}
}
The response will then contain only the nested inner hits that contain jack.
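On the client side, the matching entries then have to be read from inner_hits rather than from the top-level _source. Below is a rough Python sketch of that step. It assumes the inner hits are keyed by the nested path (the default name when none is given). The sample response is hand-written and abbreviated, not real cluster output, and the exact shape of _source inside an inner hit may vary between versions.

```python
from typing import Any, Dict, List

# Assumption: no custom inner_hits name was set, so the key is the nested path.
INNER_HITS_NAME = "data.addresses.ces"


def matching_ces(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the _source of every nested inner hit in a search response."""
    matches = []
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(INNER_HITS_NAME, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            matches.append(inner_hit.get("_source", {}))
    return matches


# Hand-written, abbreviated response (the document id is hypothetical).
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "names_1",
                "_id": "1",
                "inner_hits": {
                    "data.addresses.ces": {
                        "hits": {
                            "hits": [
                                {"_source": {"output": "", "desc": "jack"}},
                                {"_source": {"output": "jack", "desc": "Fer"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

for ces in matching_ces(sample_response):
    print(repr(ces["desc"]), repr(ces["output"]))
```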

Elasticsearch mappings api not showing my lists as nested type

Elasticsearch is not recognizing my list of objects as a nested type.
I would like that to happen automatically, without needing to update the mapping for every such field.
I need the response of the mappings API to have some sort of identifier that distinguishes properties which are of list type.
For example, when I index a document like this into a new test index ('mapping_index')
{
"text":"value",
"list":[{"a":"b","c":"d"},{"a":"q","c":"f"}]
}
and call the mappings API
localhost:9200/mapping_index/_mapping
I get
{
"mapping_index": {
"mappings": {
"_doc": {
"properties": {
"list": {
"properties": {
"a": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"c": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"text": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
I would like to see something like
"type" : "nested"
for the "list" key in this response, so that another service which uses these fields stored in ES can tell that "list" is a multi-value key.
I've read about dynamic templates and think they might help me, but I'm not really sure
(https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html).
Any help is much appreciated.
You can use dynamic_templates.
A template with match_mapping_type: "object" maps every dynamically detected object field as nested:
{
"mappings": {
"dynamic_templates": [
{
"objects": {
"match": "*",
"match_mapping_type": "object",
"mapping": {
"type": "nested"
}
}
}
]
}
}
Data:
{
"list": [
{
"a": "b",
"c": "d"
},
{
"a": "q",
"c": "f"
}
]
}
Result:
{
"index80" : {
"mappings" : {
"dynamic_templates" : [
{
"objects" : {
"match" : "*",
"match_mapping_type" : "object",
"mapping" : {
"type" : "nested"
}
}
}
],
"properties" : {
"list" : {
"type" : "nested",
"properties" : {
"a" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"c" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
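With the template in place, the consuming service can find the multi-value fields by walking the mapping response and looking for "type": "nested". Below is a small Python sketch of that walk. The function name nested_paths is made up for illustration, and the mapping literal is an abbreviated copy of the result above, not a live API call.

```python
from typing import Any, Dict, Iterator


def nested_paths(properties: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every field mapped as type "nested"."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            yield path
        # Object and nested fields keep their children under "properties".
        yield from nested_paths(spec.get("properties", {}), prefix=f"{path}.")


# Abbreviated copy of the mapping result shown above.
mapping_response = {
    "index80": {
        "mappings": {
            "properties": {
                "list": {
                    "type": "nested",
                    "properties": {
                        "a": {"type": "text"},
                        "c": {"type": "text"},
                    },
                }
            }
        }
    }
}

props = mapping_response["index80"]["mappings"]["properties"]
print(list(nested_paths(props)))
```

For this mapping the script prints ['list']. Nested fields inside other nested or object fields would be reported with their full dotted path.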

Completion Suggester Foreign Language Accents Greek

I am trying to use the Completion suggester with the Greek language. Unfortunately I have problems with accents like ά. I've tried a few ways.
One was simply to set the greek analyzer in the mapping; the other was a lowercase analyzer with asciifolding. Neither worked: with the greek analyzer I don't get a result even when I type the accent.
Below is what I did. It would be great if anyone could help me out here.
Mapping
PUT t1
{
"mappings": {
"profession" : {
"properties" : {
"text" : {
"type" : "keyword"
},
"suggest" : {
"type" : "completion",
"analyzer": "greek"
}
}
}
}
}
Dummy
POST t1/profession/?refresh
{
"suggest" : {
"input": [ "Μάγειρας"]
}
,"text": "Μάγειρας"
}
Query
GET t1/profession/_search
{ "suggest":
{ "profession" :
{ "prefix" : "Μα"
, "completion" :
{ "field" : "suggest"}
}}}
I found two ways to do it: with a custom analyzer, or via the ICU plugin for Elasticsearch, which I highly recommend when it comes to non-Latin text.
Option 1
PUT t1
{ "settings":
{ "analysis":
{ "filter":
{ "greek_lowercase":
{ "type": "lowercase"
, "language": "greek"
}
}
, "analyzer":
{ "autocomplete":
{ "tokenizer": "lowercase"
, "filter":
[ "greek_lowercase" ]
}
}
}}
, "mappings": {
"profession" : {
"properties" : {
"text" : {
"type" : "keyword"
},
"suggest" : {
"type" : "completion",
"analyzer": "autocomplete"
}
}}}
}
Option 2 ICU Plugin
Install ES Plugin:
https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html
{ "settings": {
"index": {
"analysis": {
"normalizer": {
"latin": {
"filter": [
"custom_latin_transform"
]
}
},
"analyzer": {
"latin": {
"tokenizer": "keyword",
"filter": [
"custom_latin_transform"
]
}
},
"filter": {
"noDelimiter": {"type": "word_delimiter"},
"custom_latin_transform": {
"type": "icu_transform",
"id": "Greek-Latin/UNGEGN; Lower(); NFD; [:Nonspacing Mark:] Remove; NFC"
}
}
}
}
}
, "mappings":
{ "doc" : {
"properties" : {
"verbose" : {
"type" : "keyword"
},
"name" : {
"type" : "keyword"
},
"slugHash":{
"type" : "keyword",
"normalizer": "latin"
},
"level": { "type": "keyword" },
"hirarchy": {
"type" : "keyword"
},
"geopoint": { "type": "geo_point" },
"suggest" :
{ "type" : "completion"
, "analyzer": "latin"
, "contexts":
[ { "name": "level"
, "type": "category"
, "path": "level"
}
]
}}
}
}}
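To see what the tail of that transform id does ("Lower(); NFD; [:Nonspacing Mark:] Remove; NFC"), here is a rough Python approximation using only the standard library. It is not ICU and it deliberately skips the Greek-Latin/UNGEGN transliteration step, so it only illustrates why accented and unaccented prefixes end up matching the same suggestion. The function name fold_accents is made up for this sketch.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lowercase, decompose, drop nonspacing marks (category Mn), recompose."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


indexed = fold_accents("Μάγειρας")
for prefix in ("Μα", "Μά", "μα"):
    # Each spelling of the prefix folds to the same unaccented characters.
    folded = fold_accents(prefix)
    print(prefix, folded, indexed.startswith(folded))
```

Because the same folding is applied to the indexed input and to the typed prefix, "Μα" and "Μά" both match the suggestion for "Μάγειρας".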

Elasticsearch query multiple types with different bool

I have an index with 3 different types of content: ['media','group','user']. I need to search all three at the same time, but one of them must satisfy some extra conditions before its documents are added to the results list.
Here are my current index settings and mappings:
{
"settings": {
"analysis": {
"filter": {
"nGram_filter": {
"type": "nGram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"nGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"nGram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"media": {
"_all": {
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"UID": {
"type": "integer",
"include_in_all": false
},
"addtime": {
"type": "integer",
"include_in_all": false
},
"title": {
"type": "string",
"index": "not_analyzed"
}
}
},
"group": {
"_all": {
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"UID": {
"type": "integer",
"include_in_all": false
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"desc": {
"type": "string",
"include_in_all": false
}
}
},
"user": {
"_all": {
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"addtime": {
"type": "integer",
"include_in_all": false
},
"username": {
"type": "string"
}
}
}
}
}
Currently I can search the whole index with
{
"query": {
"match": {
"_all": {
"query": "foo",
"operator": "and"
}
}
}
}
and get the results for media, groups or users that contain the word "foo", which is great. However, I need to exclude from the results all media that the current user does not own. So I guess I need a bool query with a "must" clause that sets 'UID' to the current user's ID.
My problem is how to do this, and how to make the filter apply to just one type while leaving the others untouched.
I haven't been able to find an answer in the Elasticsearch documentation.
In the end I was able to accomplish this by following Andrei's comments. I know it is not perfect, since I had to add a should clause for the types "group" and "user", but it fits my design well, since I need to put more filters on those too. Be advised that the search will end up being slower.
curl -X GET 'http://localhost:9200/foo/_search' -d '
{
"query": {
"bool" :
{
"must" :
{
"query" : {
"match" :
{
"_all":
{
"query" : "test"
}
}
}
},
"filter":
{
"bool":
{
"should":
[{
"bool" : {
"must":
[{
"type":
{
"value": "media"
}
},
{
"bool":
{
"should" : [
{ "term" : {"UID" : 2}},
{ "term" : {"type" : "public"}}
]
}
}]
}
},
{
"bool" : {
"should" : [
{ "type" : {"value" : "group"}},
{ "type" : {"value" : "user"}}
]
}
}]
}
}
}
}
}'
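The same filter logic can be written as a small Python helper that only builds the request body. This is a sketch under assumptions: build_search is a made-up name, the match clause sits directly under must rather than inside the extra "query" wrapper used above, and nothing is sent to a cluster.

```python
import json
from typing import Any, Dict


def build_search(text: str, user_id: int) -> Dict[str, Any]:
    """Match `text` on _all; keep media only if owned by user_id or public."""
    # Media documents must also pass the ownership/public check.
    media_clause = {
        "bool": {
            "must": [
                {"type": {"value": "media"}},
                {
                    "bool": {
                        "should": [
                            {"term": {"UID": user_id}},
                            {"term": {"type": "public"}},
                        ]
                    }
                },
            ]
        }
    }
    # Groups and users pass through without extra conditions.
    other_types_clause = {
        "bool": {
            "should": [
                {"type": {"value": "group"}},
                {"type": {"value": "user"}},
            ]
        }
    }
    return {
        "query": {
            "bool": {
                "must": {"match": {"_all": {"query": text}}},
                "filter": {
                    "bool": {"should": [media_clause, other_types_clause]}
                },
            }
        }
    }


print(json.dumps(build_search("test", user_id=2), indent=2))
```

Keeping each per-type clause in its own variable makes it easier to add the further filters for "group" and "user" mentioned above.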

Query_string not returning anything in Elasticsearch?

Hello all. When I was using query_string with the mappings given below, everything worked fine. I was just using the default analyzers with no filters.
mappings : {
places_area1: {
properties:{
area1 : {"type" : "string", "index": "analyzed"},
city : {"type" : "string", "index": "analyzed"}
}
}
}
}
}
But now, when I try to use query_string with the following mapping, it does not work. Can someone please tell me what I am doing wrong? I guess it is because of the whitespace tokenizer, but why?
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym_wildcard": {
"tokenizer": "whitespace",
"filter": ["filter_wildcard"]
},
"synonym_term": {
"tokenizer": "keyword",
"filter": ["filter_term"]
},
"simple_wildcard": {
"tokenizer": "whitespace"
}
},
"filter": {
"filter_term": {
"tokenizer": "keyword", // here you have to write this only for tokenizer keyword but not for whitespace
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"filter_wildcard": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
}
}
}
}
},
mappings : {
places_area1: {
properties:{
area1 : {"type" : "string", "index": "analyzed", "analyzer": "simple_wildcard"},
city : {"type" : "string", "fields": {
"raw": {
"type": "string",
"analyzer": "synonym_term"
},
"raw_wildcard": {
"type": "string",
"analyzer": "synonym_wildcard"
}
} }
}
}
}
}
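I think the problem could be that your wildcard query is lowercased, because "lowercase_expanded_terms" is true by default. Your new analyzers have no lowercase filter, so the indexed terms keep their original case and the lowercased pattern no longer matches them. Try turning it off: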
{
"query": {
"query_string": {
"default_field": "state",
"query": "Ban*",
"lowercase_expanded_terms": false
}
}
}
Now this should match Bangalore.
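The effect can be imitated in a few lines of Python. This is only a toy model, not Elasticsearch: the term list stands in for what a whitespace tokenizer without a lowercase filter would index, "Karnataka" is made-up sample data, and fnmatchcase stands in for wildcard expansion.

```python
from fnmatch import fnmatchcase
from typing import List

# Terms kept in their original case, as a whitespace tokenizer with no
# lowercase filter would index them. "Karnataka" is made-up sample data.
indexed_terms = "Bangalore Karnataka".split()


def wildcard_hits(pattern: str, lowercase_expanded_terms: bool) -> List[str]:
    """Toy stand-in for wildcard expansion against case-preserved terms."""
    if lowercase_expanded_terms:
        pattern = pattern.lower()
    # fnmatchcase compares case-sensitively, like a lookup of exact terms.
    return [term for term in indexed_terms if fnmatchcase(term, pattern)]


print(wildcard_hits("Ban*", lowercase_expanded_terms=True))   # pattern becomes "ban*"
print(wildcard_hits("Ban*", lowercase_expanded_terms=False))  # pattern kept as typed
```

With the flag on, the pattern is lowercased and finds nothing among the case-preserved terms; with it off, "Ban*" matches "Bangalore".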
