I've got an Elasticsearch v5 index set up for mapping config hashes to URLs.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "url-analyzer": {
          "type": "custom",
          "tokenizer": "url-tokenizer"
        }
      },
      "tokenizer": {
        "url-tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "/"
        }
      }
    }
  },
  "mappings": {
    "route": {
      "properties": {
        "uri": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "url-analyzer"
        },
        "config": {
          "type": "object"
        }
      }
    }
  }
}
I would like to match the longest path prefix with the highest score, so that given the documents
{ "uri": "/trousers/", "config": { "foo": 1 }}
{ "uri": "/trousers/grey", "config": { "foo": 2 }}
{ "uri": "/trousers/grey/lengthy", "config": { "foo": 3 }}
when I search for /trousers, the top result should be /trousers, and when I search for /trousers/grey/short, the top result should be /trousers/grey.
Instead, I'm finding that the top result for /trousers is /trousers/grey/lengthy.
How can I index and query my documents to achieve this?
I have one solution, after drinking on it: what if we treat the URI in the index as a keyword, but still use the PathHierarchyTokenizer on the search input?
Now we store the following docs:
/trousers
/trousers/grey
/trousers/grey/lengthy
When we submit a query for /trousers/grey/short, the search_analyzer can break the input into [/trousers, /trousers/grey, /trousers/grey/short].
The first two of our documents will match, and we can trivially select the longest match using a custom sort.
Now our mapping document looks like this:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "uri-analyzer": {
          "type": "custom",
          "tokenizer": "keyword"
        },
        "uri-query": {
          "type": "custom",
          "tokenizer": "uri-tokenizer"
        }
      },
      "tokenizer": {
        "uri-tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "/"
        }
      }
    }
  },
  "mappings": {
    "route": {
      "properties": {
        "uri": {
          "type": "text",
          "fielddata": true,
          "analyzer": "uri-analyzer",
          "search_analyzer": "uri-query"
        },
        "config": {
          "type": "object"
        }
      }
    }
  }
}
and our query looks like this:
{
  "sort": {
    "_script": {
      "script": "doc['uri'].value.length()",
      "order": "desc",
      "type": "number"
    }
  },
  "query": {
    "match": {
      "uri": {
        "query": "/trousers/grey/lengthy",
        "type": "boolean"
      }
    }
  }
}
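To verify that the two analyzers behave as intended, the _analyze API can be run against the index (a sketch; my-routes is a placeholder index name):

```
GET my-routes/_analyze
{
  "analyzer": "uri-query",
  "text": "/trousers/grey/short"
}
```

With the path_hierarchy tokenizer and / delimiter, this should return the tokens /trousers, /trousers/grey, and /trousers/grey/short, each of which can exactly match a keyword-indexed uri.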
I have an index with documents that have three fields: name, summary, and tags.
name is a short text field containing a small phrase, e.g. "Japanese Handmade Sword".
summary is a long text field with the description of a product; it may be more than 200 words.
tags is an array of keyword strings, e.g. ["Japanese", "Antiquity", "Weapon", "Katana"]
I need to combine these fields into one search query to get the desired search results. For example, when a user searches for "Japan", this item should be returned. However, a match query always gives me an empty result, although I have data and can see all documents when searching without a query.
Here are my mapping and index settings, which apply some tokenization to the fields.
PUT lessons
{
"settings": {
"index": {
"number_of_shards": 1
},
"refresh_interval": "5s",
"similarity": {
"string_similarity": {
"type": "BM25"
}
},
"analysis": {
"analyzer": {
"autocomplete": {
"filter": [
"lowercase"
],
"tokenizer": "standard"
},
"autocomplete_search": {
"type": "custom",
"filter": "lowercase",
"tokenizer": "standard"
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search",
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"summary": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search",
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"tags": {
"type": "text",
"search_analyzer": "autocomplete_search",
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
I am using Kibana, and when I run the query below I get no results:
GET lessons/_search
{
"query": {
"match": {
"summary": "Japan"
}
}
}
What is wrong with my index settings or mapping?
You can use a multi_match query to search multiple fields with the same query text:
{
"query": {
"multi_match" : {
"query": "Japan",
"fields": [ "summary", "tags", "name" ]
}
}
}
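If hits in some fields should weigh more than others, multi_match also supports per-field boosting with the ^ syntax (the boost values here are illustrative, not tuned):

```
{
  "query": {
    "multi_match": {
      "query": "Japan",
      "fields": [ "name^3", "tags^2", "summary" ]
    }
  }
}
```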
How can I search for documents in Elasticsearch that have a numeric field whose value has decimal places?
My mapping is as follows:
POST /itemnew/_doc
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "edge_ngram_analyzer"
        },
        "purchase_price": {
          "type": "double"
        },
        "sale_price": {
          "type": "double"
        },
        "sku": {
          "type": "string"
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "autocomplete_filter": {
            "type": "edge_ngram"
          }
        },
        "analyzer": {
          "ngram_analyzer": {
            "tokenizer": "standard"
          }
        }
      }
    }
  }
}
A sample document is as follows:
PUT itemnew/_doc/3
{
"company_id":"4510339694428161" ,
"item_type": "goods",
"name":"Apple sam" ,
"purchase_price":"45.50" ,
"sale_price":"50",
"sku": "sku 123"
}
I get a NumberFormatException when I try the following query:
GET itemnew/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "45.5",
            "fields": [
              "name",
              "purchase_price",
              "sale_price",
              "sku"
            ],
            "type": "most_fields"
          }
        }
      ]
    }
  }
}
How can I search for documents in Elasticsearch that have a numeric field whose value has decimal places? Please help me solve this issue. Thank you.
You can use the lenient top-level parameter of the multi_match query here. Setting lenient to true simply ignores exceptions that occur due to format failures.
lenient (Optional, Boolean) If true, format-based errors, such as
providing a text query value for a numeric field, are ignored.
Defaults to false.
Here is a working example.
Index Mapping:
PUT testidx
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram"
}
},
"analyzer": {
"ngram_analyzer": {
"tokenizer": "standard"
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "ngram_analyzer"
},
"purchase_price": {
"type": "double"
},
"sale_price": {
"type": "double"
},
"sku": {
"type": "text"
}
}
}
}
Index Data:
PUT testidx/_doc/1
{
"company_id": "4510339694428161",
"item_type": "goods",
"name": "Apple sam",
"purchase_price": "45.50",
"sale_price": "50",
"sku": "sku 123"
}
Search Query:
POST testidx/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "hello",
"fields": [
"name",
"purchase_price",
"sale_price",
"sku"
],
"lenient": true,
"type": "most_fields"
}
}
]
}
}
}
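With lenient: true added, the original query from the question should also stop throwing: 45.5 is parsed as a number for the double fields and analyzed as text for name and sku (a sketch against the testidx example above; it should match the sample document via purchase_price):

```
POST testidx/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "45.5",
            "fields": [
              "name",
              "purchase_price",
              "sale_price",
              "sku"
            ],
            "lenient": true,
            "type": "most_fields"
          }
        }
      ]
    }
  }
}
```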
I tried to debug my synonym search. It seems that when I use the wordnet format with the wn_s.pl file it doesn't work, but when I use a custom synonym.txt file it works. Please let me know where I am going wrong. My index is below:
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym": {
            "type": "synonym",
            "format": "wordnet",
            "synonyms_path": "analysis/wn_s.pl"
          }
        },
        "analyzer": {
          "synonym": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonym"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "firebaseId": {
        "type": "text"
      },
      "name": {
        "fielddata": true,
        "type": "text",
        "analyzer": "standard"
      },
      "name_auto": {
        "type": "text"
      },
      "category_name": {
        "type": "text",
        "analyzer": "synonym"
      },
      "sku": {
        "type": "text"
      },
      "price": {
        "type": "text"
      },
      "magento_id": {
        "type": "text"
      },
      "seller_id": {
        "type": "text"
      },
      "square_item_id": {
        "type": "text"
      },
      "square_variation_id": {
        "type": "text"
      },
      "typeId": {
        "type": "text"
      }
    }
  }
}
I am trying to do a synonym search on category_name; I have items like shoes, dress, etc. When I search for boots, flipflop, or slipper, nothing comes back.
Here is my search query:
{
"query": {
"match": {
"category_name": "flipflop"
}
}
}
Your WordNet synonym file format is not correct; please double-check it against the WordNet Prolog format that the synonym token filter expects.
For a fast implementation, please look at the synonyms.json.
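For reference, a WordNet-format synonyms file consists of Prolog s(...) facts, where entries sharing the same synset id are treated as synonyms; a minimal sketch with made-up ids (not taken from the asker's wn_s.pl):

```
s(100000001,1,'shoe',n,1,1).
s(100000001,2,'boot',n,1,1).
s(100000002,1,'sandal',n,1,1).
s(100000002,2,'flipflop',n,1,1).
s(100000002,3,'slipper',n,1,1).
```

If the file deviates from this format, the filter will not load the synonyms as expected.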
I have configured a path_analyzer in Elasticsearch using the configuration below.
PUT /elastic_course
{
"settings": {
"analysis": {
"analyzer": {
"path_analyzer": {
"tokenizer": "path_tokenizer"
},
"reverse_path_analyzer": {
"tokenizer": "path_tokenizer"
}
},
"tokenizer": {
"path_tokenizer": {
"type": "path_hierarchy",
"delimiter": "/",
"replacement": "-"
},
"reverse_path_tokenizer": {
"type": "path_hierarchy",
"delimiter": "/",
"replacement": "-"
}
}
}
},
"mappings": {
"book" : {
"properties": {
"author": {
"type": "string",
"index": "not_analyzed"
},
"genre": {
"type": "string",
"index": "not_analyzed"
},
"score": {
"type": "double"
},
"synopsis": {
"type": "string",
"index":"analyzed",
"analyzer":"english"
},
"title": {
"type": "string"
},
"path":{
"type":"string",
"analyzer":"path_analyzer",
"fields": {
"reverse": {
"type":"string",
"analyzer":"reverse_path_analyzer"
}
}
}
}
}
}
}
Now I have inserted four documents, whose path values are:
/angular/structural
/angular/structural/directives
/angular/structural/components
/angular/logistics
Now I want to query my index such that:
it retrieves only the children of structural, or
it returns all the leaf nodes, i.e. components, directives, and logistics.
I tried running the query below, but it retrieves all the records.
POST elastic_course/book/_search
{
"query": {
"regexp": {
"path.":"/structural"
}
}
}
Any help?
Thanks.
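One avenue worth trying (a sketch, not a verified fix): because the path_hierarchy tokenizer indexes one token per level (/angular, /angular/structural, /angular/structural/directives, ...), a term query for the exact prefix token should match /angular/structural and everything below it, while leaving /angular/logistics out:

```
POST elastic_course/book/_search
{
  "query": {
    "term": {
      "path": "/angular/structural"
    }
  }
}
```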
Given the following document,
{
"domain": "www.example.com",
"tag": [
{
"name": "IIS"
},
{
"name": "Microsoft ASP.NET"
}
]
}
When I launch a query for asp or asp.net, I would like to see the Microsoft ASP.NET document in the result set.
So I need a lowercase analyzer that also stops treating the . character as a word delimiter, so I tried the following mapping:
curl -XPUT http://localhost:9200/tag-test -d '{
"settings": {
"analysis": {
"filter": {
"domain_filter": {
"type": "word_delimiter",
"type_table": [". => ALPHANUM", ": => ALPHANUM"]
}
},
"analyzer": {
"domain_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": ["lowercase", "domain_filter"]
}
}
}
},
"mappings": {
"assets": {
"properties": {
"domain": {
"type": "string",
"analyzer": "domain_analyzer"
},
"tag": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"analyzer": "domain_analyzer"
}
}
}
}
}
}
}'; echo
Then I tried the following queries, all of which yield an empty result:
tag.name:asp
tag.name:asp.net
tag.name:*asp*
I'm using a query_string query:
curl http://localhost:9200/tag-test/_search?q=tag.name:asp
Any ideas?
First of all, the query_string query doesn't have support for nested queries, and unless you use include_in_parent: true in your mapping (which will flatten the nested field into an array in the parent document), the query_string query will never work.
Secondly, with your analyzer you will have asp.net indexed as a single term in Elasticsearch, which means the query_string query will work with tag.name:asp.net and tag.name:*asp*. I recommend not using a leading wildcard, though.
So, in the end your test should be:
PUT /tag-test
{
"settings": {
"analysis": {
"filter": {
"domain_filter": {
"type": "word_delimiter",
"type_table": [
". => ALPHANUM",
": => ALPHANUM"
]
}
},
"analyzer": {
"domain_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"domain_filter"
]
}
}
}
},
"mappings": {
"assets": {
"properties": {
"domain": {
"type": "string",
"analyzer": "domain_analyzer"
},
"tag": {
"type": "nested",
"include_in_parent": true,
"properties": {
"name": {
"type": "string",
"analyzer": "domain_analyzer"
}
}
}
}
}
}
}
Notice "include_in_parent": true in the mapping for tag.
Then the queries should be:
curl -XGET "http://localhost:9200/tag-test/_search?q=tag.name:asp*"
curl -XGET "http://localhost:9200/tag-test/_search?q=tag.name:asp.net"
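To double-check what the custom analyzer actually emits, the _analyze API can be called the same way (a sketch against the tag-test index above):

```
curl -XGET "http://localhost:9200/tag-test/_analyze?analyzer=domain_analyzer" -d 'Microsoft ASP.NET'; echo
```

This should show the terms microsoft and asp.net, which is why tag.name:asp.net matches while a search for the bare token asp does not.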