How to filter aggregation results in elasticsearch (v 6.3) - elasticsearch

I have an array field, commodity_line, e.g. [3,35,1,11,12] and [3,12]. I am querying the field for autocomplete results, and when I match on 3 I need 3 and 35 in the output. My indexing works fine in every scenario except when I am working with an array data type.
I need to filter the aggregation results so that they return 3 and 35, which I am unable to do. I need something like facet_filter, or a filter with a prefix, similar to facet.prefix in Solr.
Do I need to change the query or the mapping?
Query :
GET contracts/doc/_search
{
"size":0,
"query":{
"bool":{
"must":{
"match":{
"commodity_line.autocomplete":"3"
}
}
}
},
"aggs" : {
"names":{
"terms":{
"field":"commodity_line.keyword"
}
}
}
}
Mapping :
PUT contracts
{
"settings":{
"analysis":{
"filter":{
"gramFilter": {
"type": "edge_ngram",
"min_gram" : 1,
"max_gram" : 20,
"token_chars": [
"letter",
"symbol",
"digit"
]
}
},
"analyzer":{
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trim",
"gramFilter",
"asciifolding"
]
}
}
}
}
,
"mappings":{
"doc":{
"properties":{
"commodity_line" :{
"type":"text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
},
"autocomplete":{
"type":"text",
"analyzer":"autocomplete",
"search_analyzer":"standard"
}
}
}
}
}
}
}

I found a solution: I had to match with a prefix rather than filter the results.
"aggs" : {
"names":{
"terms":{
"field":"commodity_line.keyword",
"include" : "3.*"
}
}
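Putting the pieces together, the full request combines the match query from the question with the include filter on the aggregation (a sketch based on the mapping above; in practice the prefix would be built from the user's input):

GET contracts/doc/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": {
        "match": { "commodity_line.autocomplete": "3" }
      }
    }
  },
  "aggs": {
    "names": {
      "terms": {
        "field": "commodity_line.keyword",
        "include": "3.*"
      }
    }
  }
}

With the sample arrays [3,35,1,11,12] and [3,12], the buckets returned are 3 and 35, because the include regular expression keeps only the keyword terms starting with 3.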

Related

Elasticsearch replacing cross_fields with combined field and fuzzy

We have an index which was previously searching a few fields such as this:
"query":{
"bool":{
"filter":[
{
"term":{
"eventvisibility":"public"
}
}
],
"should":[
{
"multi_match":{
"query":"keyword",
"fields":[
"eventname",
"venue.name",
"venue.town"
],
"type":"cross_fields",
"minimum_should_match":"3<80%"
}
},
{
"match":{
"eventdescshort":{
"query":"keyword",
"minimum_should_match":"2<80%"
}
}
}
],
"minimum_should_match":1
}
}
This works, but it often fails due to spelling mistakes, with letters left off the keyword or transposed.
So I was hoping to implement fuzzy searching. As fuzziness doesn't work with cross_fields, I created a new field in the index:
"mappings": {
"event": {
"properties": {
"basic_search": {
"type": "text",
"analyzer": "nameanalyzer"
},
"eventname":{
"type": "text",
"copy_to": "basic_search" ,
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "nameanalyzer"
},
"venue": {
"properties": {
"name": {
"type": "text",
"copy_to": "basic_search" ,
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "nameanalyzer"
},
...snip (all fields previously in cross_fields now have copy_to: basic_search) ...
}
And our analyzer is as follows:
"nameanalyzer": {
"filter": [
"lowercase",
"stop",
"english_possessive_stemmer",
"english_minimal_stemmer",
"synonym",
"asciifolding",
"word_delimiter"
],
"char_filter": "html_strip",
"type": "custom",
"tokenizer": "standard"
}
I've now run a test search, as follows:
{
"query": {
"fuzzy": {
"basic_search": {
"value": "carers fair"
}
}
}
}
However, this is not giving me any matches at all.
I just get:
"type": "MatchNoDocsQuery",
"description": "MatchNoDocsQuery(\"empty BooleanQuery\")",
I know I can't see the contents of the basic_search field in _source, so how can I debug and know why this isn't matching?
A fuzzy query does not analyze the query text before searching, so its use should generally be avoided.
Excerpt from the ES docs:
fuzzy query: The Elasticsearch fuzzy query type should generally be avoided. It acts much like a term query and does not analyze the query text first.
Please try below query:
{
"query":{
"match":{
"basic_search":{
"query":"carers fair",
"fuzziness":"AUTO"
}
}
}
}
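To the debugging part of the question: you can see exactly which tokens end up in basic_search by running the index's analyzer directly with the _analyze API (the analyzer name is taken from the mapping above; your_index is a placeholder for the actual index name):

GET /your_index/_analyze
{
  "analyzer": "nameanalyzer",
  "text": "carers fair"
}

Comparing these tokens with the raw, unanalyzed string the fuzzy query sends makes it clear why the term-like fuzzy query misses, while the analyzed match query with "fuzziness": "AUTO" matches.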

ElasticSearch accented and no accented words management

I created an index :
PUT members
{
"settings":{
"number_of_shards":1,
"analysis":{
"analyzer":{
"accentedNames":{
"tokenizer":"standard",
"filter":[
"lowercase",
"asciifolding"
]
},
"standardNames":{
"tokenizer":"standard",
"filter":[
"lowercase"
]
}
}
}
},
"mappings":{
"member":{
"properties":{
"id":{
"type":"text"
},
"name":{
"type":"text",
"analyzer":"standardNames",
"fields":{
"accented":{
"type":"text",
"analyzer":"accentedNames"
}
}
}
}
}
}
}
Assume that some documents are in this set (EDIT):
{"1", "Maéllys Macron"};
{"2", "Maêllys Alix"};
{"3", "Maëllys Rosa"};
{"4", "Maèllys Alix"};
{"5", "Maellys du Bois"};
The result I wanted: when searching for "Maéllys", I expect "Maéllys Richard" as the best match, and the others with the same score.
What I did was use my analyzers with a request like this:
GET members/member/_search
{
"query":{
"multi_match" : {
"query" : "Maéllys",
"fields" : [ "name", "name.accented" ]
}
}
}
"Maéllys Richard" has the best score. The documents "Ma(ê|ë|é|è)llys Richard have the same score that is higher than "Maellys Richard" document.
Can someone help me ?
Thanks.
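One possible direction, sketched here as an unverified suggestion rather than a confirmed fix: boost the unfolded name field over name.accented, so an exact accent match outranks the folded matches while the asciifolded subfield still matches every variant:

GET members/member/_search
{
  "query": {
    "multi_match": {
      "query": "Maéllys",
      "fields": [ "name^2", "name.accented" ]
    }
  }
}

The ^2 boost is an arbitrary example value; tuning it controls how far ahead the exact-accent document scores.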

Why can't I get data from Elasticsearch?

Elasticsearch version 6.2.4.
I set up an Elasticsearch environment and created a mapping like this:
{
"state":"open",
"settings":{
"index":{
"number_of_shards":"5",
"provided_name":"lara_cart",
"creation_date":"1529082175034",
"analysis":{
"filter":{
"engram":{
"type":"edgeNGram",
"min_gram":"1",
"max_gram":"36"
},
"maxlength":{
"type":"length",
"max":"36"
},
"word_delimiter":{
"split_on_numerics":"false",
"generate_word_parts":"true",
"preserve_original":"true",
"generate_number_parts":"true",
"catenate_all":"true",
"split_on_case_change":"true",
"type":"word_delimiter",
"catenate_numbers":"true"
}
},
"char_filter":{
"normalize":{
"mode":"compose",
"name":"nfkc",
"type":"icu_normalizer"
},
"whitespaces":{
"pattern":"\s[2,]",
"type":"pattern_replace",
"replacement":"\u0020"
}
},
"analyzer":{
"keyword_analyzer":{
"filter":[
"lowercase",
"trim",
"maxlength"
],
"char_filter":[
"normalize",
"whitespaces"
],
"type":"custom",
"tokenizer":"keyword"
},
"autocomplete_index_analyzer":{
"filter":[
"lowercase",
"trim",
"maxlength",
"engram"
],
"char_filter":[
"normalize",
"whitespaces"
],
"type":"custom",
"tokenizer":"keyword"
},
"autocomplete_search_analyzer":{
"filter":[
"lowercase",
"trim",
"maxlength"
],
"char_filter":[
"normalize",
"whitespaces"
],
"type":"custom",
"tokenizer":"keyword"
}
},
"tokenizer":{
"engram":{
"type":"edgeNGram",
"min_gram":"1",
"max_gram":"36"
}
}
},
"number_of_replicas":"1",
"uuid":"5xyW07F-RRCuIJlvBufNbA",
"version":{
"created":"6020499"
}
}
},
"mappings":{
"products":{
"properties":{
"sale_end_at":{
"format":"yyyy-MM-dd HH:mm:ss",
"type":"date"
},
"image_5":{
"type":"text"
},
"image_4":{
"type":"text"
},
"created_at":{
"format":"yyyy-MM-dd HH:mm:ss",
"type":"date"
},
"description":{
"analyzer":"keyword_analyzer",
"type":"text",
"fields":{
"autocomplete":{
"search_analyzer":"autocomplete_search_analyzer",
"analyzer":"autocomplete_index_analyzer",
"type":"text"
}
}
},
"sale_start_at":{
"format":"yyyy-MM-dd HH:mm:ss",
"type":"date"
},
"sale_price":{
"type":"integer"
},
"category_id":{
"type":"integer"
},
"updated_at":{
"format":"yyyy-MM-dd HH:mm:ss",
"type":"date"
},
"price":{
"type":"integer"
},
"image_1":{
"type":"text"
},
"name":{
"analyzer":"keyword_analyzer",
"type":"text",
"fields":{
"autocomplete":{
"search_analyzer":"autocomplete_search_analyzer",
"analyzer":"autocomplete_index_analyzer",
"type":"text"
},
"keyword":{
"analyzer":"keyword_analyzer",
"type":"text"
}
}
},
"image_3":{
"type":"text"
},
"categories":{
"type":"nested",
"properties":{
"parent_category_id":{
"type":"integer"
},
"updated_at":{
"type":"text",
"fields":{
"keyword":{
"ignore_above":256,
"type":"keyword"
}
}
},
"name":{
"analyzer":"keyword_analyzer",
"type":"text",
"fields":{
"autocomplete":{
"search_analyzer":"autocomplete_search_analyzer",
"analyzer":"autocomplete_index_analyzer",
"type":"text"
}
}
},
"created_at":{
"type":"text",
"fields":{
"keyword":{
"ignore_above":256,
"type":"keyword"
}
}
},
"id":{
"type":"long"
}
}
},
"id":{
"type":"long"
},
"image_2":{
"type":"text"
},
"stock":{
"type":"integer"
}
}
}
},
"aliases":[
],
"primary_terms":{
"0":1,
"1":1,
"2":1,
"3":1,
"4":1
},
"in_sync_allocations":{
"0":[
"clYoJWUKTru2Z78h0OINwQ"
],
"1":[
"MGQC73KiQsuigTPg4SQG4g"
],
"2":[
"zW6v82gNRbe3wWKefLOAug"
],
"3":[
"5TKrfz7HRAatQsJudKX9-w"
],
"4":[
"gqiblStYSYy_NA6fYtkghQ"
]
}
}
I want to provide suggest-as-you-type search using the autocomplete field.
So I added a document like this:
{
"_index":"lara_cart",
"_type":"products",
"_id":"19",
"_version":1,
"_score":1,
"_source":{
"id":19,
"name":"Conqueror, whose.",
"description":"I should think you'll feel it a bit, if you wouldn't mind,' said Alice: 'besides, that's not a regular rule: you invented it just missed her. Alice caught the flamingo and brought it back, the fight.",
"category_id":81,
"stock":79,
"price":11533,
"sale_price":15946,
"sale_start_at":null,
"sale_end_at":null,
"image_1":"https://lorempixel.com/640/480/?56260",
"image_2":"https://lorempixel.com/640/480/?15012",
"image_3":"https://lorempixel.com/640/480/?14138",
"image_4":"https://lorempixel.com/640/480/?94728",
"image_5":"https://lorempixel.com/640/480/?99832",
"created_at":"2018-06-01 16:12:41",
"updated_at":"2018-06-01 16:12:41",
"deleted_at":null,
"categories":{
"id":81,
"name":"A secret, kept.",
"parent_category_id":"33",
"created_at":"2018-06-01 16:12:41",
"updated_at":"2018-06-01 16:12:41",
"deleted_at":null
}
}
}
After that, I tried to search with the query below.
But this query doesn't return anything.
Do you know how to resolve it?
I think the cause lies in the mapping and settings.
{
"query":{
"bool":{
"must":[
{
"term":{
"name.autocomplete":"Conqueror"
}
}
],
"must_not":[
],
"should":[
]
}
},
"from":0,
"size":10,
"sort":[
],
"aggs":{
}
}
It's because the field you are using is analyzed, and a term query doesn't support that, since it does not analyze the query text.
You can try a match query on the field whose analyzer is autocomplete; some basic knowledge of autocomplete and n-grams will help you understand this problem better.
e.g.
Suppose you defined the following analyzer:
PUT /my_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
}
}
After that you can test the autocomplete analyzer with the following request:
GET /my_index/_analyze
{
"analyzer": "autocomplete",
"text": "quick brown"
}
As configured above, the analyzer generates edge n-grams of 1 to 20 characters for the input query. The request returns:
q
qu
qui
quic
quick
b
br
bro
brow
brown
As we know, a term query searches for documents whose field contains the exact query word, much like a WHERE condition in MySQL.
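Applied to the question's index, the fix is to replace the term clause with a match query on the autocomplete subfield (a sketch using the index, type, and field names from the mapping above):

GET lara_cart/products/_search
{
  "query": {
    "match": {
      "name.autocomplete": "Conqueror"
    }
  }
}

Because name.autocomplete is indexed with autocomplete_index_analyzer and searched with autocomplete_search_analyzer, the query is lowercased to conqueror and can match the indexed edge n-grams, whereas the original term query looked for the exact, unanalyzed token Conqueror, which was never indexed.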

Optimising ElasticSearch aggregated search suggestions

I'm working on implementing an autocomplete field where the suggestions also contain the number of matching documents.
I have implemented this simply using a terms aggregation with an include filter. So, for instance, given a user typing 'Chrysler', the following query may be generated:
{
"size": 0,
"query": {
"bool": {
"must": [
...
]
}
},
"aggs": {
"filtered": {
"filter": {
...
},
"aggs": {
"suggestions": {
"terms": {
"field": "prefLabel",
"include": "Chry.*",
"min_doc_count": 0
}
}
}
}
}
}
This works fine and I am able to get the data I need. However, I am concerned that this is not very well optimised and more could be done when the documents are indexed.
Currently we have the following mapping:
{
...
"prefLabel":{
"type":"string",
"index":"not_analyzed"
}
}
And I am wondering whether to add an analysed field, like so:
{
...
"prefLabel":{
"type":"string",
"index":"not_analyzed",
"copy_to":"searchLabel"
},
"searchLabel":{
"type":"string",
"analyzer":"???"
}
}
So my question is: what is the most optimal index-time analyser for this? (or, is this just crazy?)
I think that an edge n-gram tokenizer would speed things up:
curl -XPUT 'localhost:9200/test_ngram' -d '{
"settings" : {
"analysis" : {
"analyzer" : {
"suggester_analyzer" : {
"tokenizer" : "ngram_tokenizer"
}
},
"tokenizer" : {
"ngram_tokenizer" : {
"type" : "edgeNGram",
"min_gram" : "2",
"max_gram" : "7",
"token_chars": [ "letter", "digit" ]
}
}
}
},
"mappings": {
...
"searchLabel": {
"type": "string",
"index_analyzer": "suggster_analyzer",
"search_analyzer": "standard"
}
...
}
}'
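With such a mapping in place, the suggestion request could drop the regex include entirely and filter with a plain match on the n-grammed field, keeping the terms aggregation only for the counts (a sketch under the assumption that prefLabel remains not_analyzed):

{
  "size": 0,
  "query": {
    "match": { "searchLabel": "Chry" }
  },
  "aggs": {
    "suggestions": {
      "terms": { "field": "prefLabel" }
    }
  }
}

Matching against precomputed edge n-grams is an index lookup, which is generally cheaper than running the "Chry.*" regular expression over every candidate term at query time.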

Multi field analyzer not working as expected

I'm confused. I have the following document indexed:
POST test/topic
{
"title": "antiemetics"
}
With the following query:
{
"query": {
"query_string" : {
"fields" : ["title*"],
"default_operator": "AND",
"query" :"anti emetics",
"use_dis_max" : true
}
},
"highlight" : {
"fields" : {
"*" : {
"fragment_size" : 200,
"pre_tags" : ["<mark>"],
"post_tags" : ["</mark>"]
}
}
}
}
and the following settings and mappings:
POST test
{
"settings":{
"index":{
"number_of_shards":1,
"analysis":{
"analyzer":{
"merge":{
"type":"custom",
"tokenizer":"keyword",
"filter":[
"lowercase"
],
"char_filter":[
"hyphen",
"space",
"html_strip"
]
}
},
"char_filter":{
"hyphen":{
"type":"pattern_replace",
"pattern":"[-]",
"replacement":""
},
"space":{
"type":"pattern_replace",
"pattern":" ",
"replacement":""
}
}
}
}
},
"mappings":{
"topic":{
"properties":{
"title":{
"analyzer":"standard",
"search_analyzer":"standard",
"type":"string",
"fields":{
"specialised":{
"type":"string",
"index":"analyzed",
"analyzer":"standard",
"search_analyzer":"merge"
}
}
}
}
}
}
}
I know my use of a multi-field doesn't make sense here, as I'm using the same index analyzer as for title, so please ignore that; I'm more interested in checking my understanding of analyzers.
I was expecting the merge analyzer to turn the query "anti emetics" into "antiemetics", and I was hoping the multi-field with that search analyzer would match the token "antiemetics". However, I don't get any results back, even though I have verified with the _analyze API that the analyzer removes whitespace from the query.
Any idea why?
This seems to work with your setup:
POST /test_index/_search
{
"query": {
"match": {
"title.specialised": "anti emetics"
}
}
}
Here's some code I set up to play with it:
http://sense.qbox.io/gist/3ef6926644213cf7db568557a801fec6cb15eaf9
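A likely explanation, offered here as an assumption: in the Elasticsearch versions this question targets, query_string splits the input on whitespace before per-field analysis runs, so the merge analyzer's char_filters never see "anti emetics" as one string, while match hands the whole query text to the search analyzer. You can confirm what merge produces with the _analyze API:

GET test/_analyze
{
  "analyzer": "merge",
  "text": "anti emetics"
}

This should return the single token antiemetics, which matches the token the standard analyzer indexed for the title "antiemetics".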