What type is best for an Elasticsearch "keywords" (like hashtags) field? - elasticsearch

I want to create an Elasticsearch index for keywords, like hashtags, and apply a synonym filter to the keywords.
I can think of two ways to index a keyword. The first is the keyword type:
{
  "mappings": {
    "properties": {
      "keywordField": {
        "type": "keyword"
      }
    }
  }
}
If I index a document for League of Legends, it might look like this:
{
  "keywordField": ["leagueoflegends", "league", "legends", "lol" /* synonyms */]
}
Or the text type:
{
  "mappings": {
    "properties": {
      "keywordField": {
        "type": "text",
        "analyzer": "lowercase_and_whitespace_and_synonym_analyzer"
      }
    }
  }
}
and index it like this:
{
  "keywordField": ["league of legends"] /* with a synonym rule: lol => leagueoflegends */
}
Running the _analyze API on this field, I expect "leagueoflegends", "league", "legends".
The search queries 'lol', 'league of legends', and 'League of Legends' all have to match this field.
Which approach is best?

Adding a working example with index data, mapping, search query, and search result. In the example below, I have defined lol and leagueoflegends as synonyms.
Index Mapping:
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym_filter": {
            "type": "synonym",
            "synonyms": [
              "leagueoflegends, lol"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "filter": [
              "lowercase",
              "synonym_filter"
            ],
            "tokenizer": "standard"
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keywordField": {
        "type": "text"
      }
    }
  }
}
Index Data:
{
  "keywordField": ["leagueoflegends", "league", "legends"]
}
Search Query:
{
  "query": {
    "match": {
      "keywordField": {
        "query": "lol",
        "analyzer": "synonym_analyzer"
      }
    }
  }
}
Search Result:
"hits": [
{
"_index": "66872989",
"_type": "_doc",
"_id": "1",
"_score": 0.19363807,
"_source": {
"keywordField": [
"leagueoflegends",
"league",
"legends"
]
}
}
]
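You can sanity-check the expansion with the _analyze API (a quick check against the example index above; substitute your own index name):
POST /66872989/_analyze
{
  "analyzer": "synonym_analyzer",
  "text": "lol"
}
This should return both lol and leagueoflegends as tokens, which is why the match query above finds the document. Searches for 'league of legends' and 'League of Legends' match through the lowercase filter and the league/legends tokens already in the document.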

Related

How to query by number and disregard special characters

Currently I have a document in my OpenSearch database with the value 1301-003.023.
If I run the following query, the document is returned:
GET test/_search
{
  "query": {
    "match": {
      "my_number": "1301-003.023"
    }
  }
}
The main problem is when the user runs this query:
GET test/_search
{
  "query": {
    "match": {
      "my_number": "1301003.023"
    }
  }
}
In the query above the - symbol is missing, and it returns nothing. I need to create a search that can deal with this, but without returning documents that don't have exactly the same number. So, if I search for 1301003023 I want to find the document with 1301-003.023, but not documents with 1301-003.032 (note that the last two digits are swapped).
I created a new analyzer using a char filter that maps the symbols "." and "-" to the empty string, so the number "1301-003.023" becomes the token "1301003023".
Full example:
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_filter"
          ]
        }
      },
      "char_filter": {
        "my_filter": {
          "type": "mapping",
          "mappings": [
            ". => ",
            "- => "
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_number": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Document
POST test/_bulk
{"index":{}}
{"my_number": "1301-003.023"}
Query
GET test/_search
{
  "query": {
    "match": {
      "my_number": {
        "query": "1301003023"
      }
    }
  }
}
Results
"hits": [
{
"_index": "test",
"_id": "MC7v0IUBKJKciEqCrBP-",
"_score": 0.2876821,
"_source": {
"my_number": "1301-003.023"
}
}
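To confirm the analyzer behaves as intended, the _analyze API can be run against the index (a quick check, not part of the original answer):
GET test/_analyze
{
  "analyzer": "my_analyzer",
  "text": "1301-003.023"
}
This should emit the single token 1301003023, so "1301-003.023", "1301003.023", and "1301003023" all normalize to the same term, while "1301-003.032" produces a different token and does not match.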

How to find the word 'food2u' by searching 'food' in Elasticsearch?

I am a rookie who just started learning Elasticsearch, and I want to find words like 'food2u' by searching for the keyword 'food'. But I can only get results like 'Food Repo', 'Give Food', etc. The field's mapping is 'text', and this is my query:
GET api/_search
{
  "query": {
    "match": {
      "Name": {
        "query": "food"
      }
    }
  },
  "_source": {
    "includes": ["Name"]
  }
}
You are getting results like 'Food Repo' and 'Give Food' because a text field uses the standard analyzer if no analyzer is specified. Food Repo gets tokenized into food and repo; similarly, Give Food gets tokenized into give and food.
But food2u gets tokenized into the single token food2u. Since there is no token matching "food", the food2u document is not returned.
You need to use the edge_ngram tokenizer to do a partial text match.
Adding a working example with index data, mapping, search query and search result
Index Mapping:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 4,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    },
    "max_ngram_diff": 10
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Index Data:
{
  "name": "food2u"
}
Search Query:
{
  "query": {
    "match": {
      "name": "food"
    }
  }
}
Search Result:
"hits": [
{
"_index": "67552800",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"name": "food2u"
}
}
]
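To see why this matches, you can run the analyzer over the indexed value (a quick check against the example index; substitute your own index name):
POST /67552800/_analyze
{
  "analyzer": "my_analyzer",
  "text": "food2u"
}
With min_gram 4 and max_gram 10, this should produce the tokens food, food2, and food2u, so the query term food now has a matching token.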
If you don't want to change the mapping, you can instead use a wildcard query to return the matching documents:
{
  "query": {
    "wildcard": {
      "Name": {
        "value": "food*"
      }
    }
  }
}
Or you can use query_string with a wildcard:
{
  "query": {
    "query_string": {
      "query": "food*",
      "fields": [
        "Name"
      ]
    }
  }
}

Search an array of strings by partial match in Elasticsearch

I have a field like this:
names: ["Red:123", "Blue:45", "Green:56"]
Its mapping is:
"names": {
  "type": "keyword"
},
How could I search like this:
{
  "query": {
    "match": {
      "names": "red"
    }
  }
}
to get all the documents where red appears in an element of the names array?
At the moment it only works with:
{
  "query": {
    "match": {
      "names": "red:123"
    }
  }
}
You can add multi-fields, or just change the type to text, to achieve the result you need.
Index mapping using multi-fields:
{
  "mappings": {
    "properties": {
      "names": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
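With the multi-field mapping, the analyzed names field handles partial matches while names.raw keeps the original exact-match behavior, for example (a sketch, assuming an index created with the mapping above):
{
  "query": {
    "term": {
      "names.raw": "Red:123"
    }
  }
}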
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
  "mappings": {
    "properties": {
      "names": {
        "type": "text"
      }
    }
  }
}
Index Data:
{
  "names": [
    "Red:123",
    "Blue:45",
    "Green:56"
  ]
}
Search Query:
{
  "query": {
    "match": {
      "names": "red"
    }
  }
}
Search Result:
"hits": [
{
"_index": "64665127",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"names": [
"Red:123",
"Blue:45",
"Green:56"
]
}
}
]
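The match query works here because the standard analyzer on the text field splits each array element on the colon, which you can verify with _analyze (the index name is the one from the example):
POST /64665127/_analyze
{
  "analyzer": "standard",
  "text": "Red:123"
}
This should produce the lowercased tokens red and 123, so a search for red matches the document.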

How to create and add values to a standard lowercase analyzer in Elasticsearch

I've been around the houses with this for the past few days, trying things in various orders, but can't figure out why it's not working.
I am trying to create an index in Elasticsearch with an analyzer which is the same as the "standard" analyzer but retains upper case characters when records are stored.
I create my analyzer and index as follows:
PUT /upper
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "rebuilt_standard": {
            "tokenizer": "standard",
            "filter": [
              "standard"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "rebuilt_standard"
        }
      }
    }
  }
}
Then add two records to test like this...
POST /upper/doc
{
  "text": "TEST"
}
Add a second record...
POST /upper/doc
{
  "text": "test"
}
Using /upper/_settings gives the following:
{
  "upper": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "upper",
        "creation_date": "1537788581060",
        "analysis": {
          "analyzer": {
            "rebuilt_standard": {
              "filter": [
                "standard"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "s4oDgdsFTxOwsdRuPAWEkg",
        "version": {
          "created": "6030299"
        }
      }
    }
  }
}
But when I search with the following query I still get two matches: both the upper- and the lower-case document, which must mean the analyzer is not applied when I store the records.
Search like so...
GET /upper/_search
{
  "query": {
    "term": {
      "text": {
        "value": "test"
      }
    }
  }
}
Thanks in advance!
First things first: you set your analyzer on the title field instead of on the text field (your search is on the text property, and the docs you index only have a text property):
"properties": {
  "title": {
    "type": "text",
    "analyzer": "rebuilt_standard"
  }
}
Try:
"properties": {
  "text": {
    "type": "text",
    "analyzer": "rebuilt_standard"
  }
}
and keep us posted ;)
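Putting it together, a minimal corrected sketch (ES 6.x syntax, to match the question; the standard token filter is a no-op there and was removed in 7.x, so the important part is simply that lowercase is absent):
PUT /upper
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "rebuilt_standard": {
            "tokenizer": "standard",
            "filter": []
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "rebuilt_standard"
        }
      }
    }
  }
}
With the analyzer actually applied to text, "TEST" is indexed with its case preserved, so the term query for "test" should now match only the lower-case document.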

Understanding Elasticsearch synonym

Being very new to Elasticsearch, I'm not sure of the best way to use synonyms.
I have two fields: one is hashtag and the other is name. hashtag contains names in lower case without whitespace, whereas name contains the actual name in camel case format.
I want to search by name in the right format and get all matching names, along with the docs where it matches hashtag as well.
For example, name contains "Tom Cruise" and hashtag is "tomcruise". I want to search for "Tom Cruise" and the expected result is that it returns all docs which have either the name "Tom Cruise" or the hashtag "tomcruise".
Here is the way I'm creating this index:
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "synonym": {
          "type": "synonym",
          "ignore_case": true,
          "synonyms": [
            "tom cruise => tomcruise, tom cruise"
          ]
        }
      },
      "analyzer": {
        "synonym": {
          "tokenizer": "whitespace",
          "filter": ["synonym"]
        }
      }
    }
  }
}
PUT /my_index/my_type/_mapping
{
  "my_type": {
    "properties": {
      "hashtag": {
        "type": "string",
        "search_analyzer": "synonym",
        "analyzer": "standard"
      },
      "name": {
        "type": "keyword"
      }
    }
  }
}
POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "hashtag": "tomcruise", "name": "abc" }
{ "index": { "_id": 2 }}
{ "hashtag": "tomhanks", "name": "efg" }
{ "index": { "_id": 3 }}
{ "hashtag": "tomcruise" , "name": "efg" }
{ "index": { "_id": 4 }}
{ "hashtag": "news" , "name": "Tom Cruise"}
{ "index": { "_id": 5 }}
{ "hashtag": "celebrity", "name": "Kate Winslet" }
{ "index": { "_id": 6 }}
{ "hashtag": "celebrity", "name": "Tom Cruise" }
When I run _analyze, it looks like I get the right tokens: [tomcruise, tom, cruise]
GET /my_index/_analyze
{
  "text": "Tom Cruise",
  "analyzer": "synonym"
}
Here's how I'm searching:
POST /my_index/my_type/_search?pretty
{
  "query": {
    "multi_match": {
      "query": "Tom Cruise",
      "fields": [ "hashtag", "name" ]
    }
  }
}
Is this the right way to achieve my search requirement?
What's the best way to run this kind of search in Kibana? Right now I have to use the entire query; what do I need to do if I want to just type "Tom Cruise" and get the expected result? I tried "_all" but it didn't work.
Update:
After discussing with Russ Cam, and with my limited knowledge of Elasticsearch, I decided that synonyms were overkill for my search requirement. So I changed the search analyzer to generate the same tokens and got the same result. I still want to know whether I'm doing it the right way.
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true
        }
      },
      "analyzer": {
        "test_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "word_joiner"]
        }
      }
    }
  }
}
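As a sanity check (assuming an index created with these settings), _analyze should show the same three tokens as the synonym approach:
GET /my_index/_analyze
{
  "analyzer": "test_analyzer",
  "text": "Tom Cruise"
}
The keyword tokenizer keeps "Tom Cruise" as a single token, lowercase turns it into "tom cruise", and word_delimiter with catenate_all splits it into tom and cruise while also emitting the concatenated form tomcruise.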
