Elasticsearch Only Searching From Start

Currently, Elasticsearch is only searching through the mapped items from the beginning of the string instead of throughout the string.
I have a custom analyzer, as well as a custom edge ngram tokenizer.
I am currently using bool queries from within JavaScript to search the index.
Index
{
  "homestead_dev_index": {
    "aliases": {},
    "mappings": {
      "elasticprojectnode": {
        "properties": {
          "archived": {
            "type": "boolean"
          },
          "id": {
            "type": "text",
            "analyzer": "full_name"
          },
          "name": {
            "type": "text",
            "analyzer": "full_name"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "homestead_dev_index",
        "creation_date": "1535439085947",
        "analysis": {
          "analyzer": {
            "full_name": {
              "filter": [
                "standard",
                "lowercase",
                "asciifolding"
              ],
              "type": "custom",
              "tokenizer": "mytok"
            }
          },
          "tokenizer": {
            "mytok": {
              "type": "edge_ngram",
              "min_gram": "3",
              "max_gram": "10"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "iCa7qKJVRU-_MA8sCYIAXw",
        "version": {
          "created": "5060399"
        }
      }
    }
  }
}
Query Body
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": this.searchString } },
        { "match": { "id": this.searchString } }
      ]
    }
  },
  "highlight": {
    "pre_tags": ["<b style='background-color:yellow'>"],
    "post_tags": ["</b>"],
    "fields": {
      "name": {},
      "id": {}
    }
  }
}
Example
If I have projects with the names "Road - Area 1", "Road - Area 2" and "Sub-area 5 - Road" and the user searches for "Road", only "Road - Area 1" and "Road - Area 2" will display with the word "Road" highlighted in yellow.
The code needs to pick up the final project as well.

I seem to have figured it out.
In the original description, I am using the edge_ngram tokenizer when I am supposed to be using the ngram tokenizer.
Found on: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html#_partial_word_tokenizers
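The difference between the two tokenizers can be sketched outside Elasticsearch. The Python below is only an approximation of the two tokenizers (default token_chars, input already lowercased as the full_name analyzer would do), but it shows why edge_ngram only ever matches from the start of the string while ngram matches throughout:

```python
def edge_ngrams(text, min_gram=3, max_gram=10):
    """Prefix-only grams, like the edge_ngram tokenizer with default token_chars."""
    return [text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)]

def ngrams(text, min_gram=3, max_gram=10):
    """Grams starting at every position, like the ngram tokenizer."""
    out = []
    for i in range(len(text)):
        for n in range(min_gram, max_gram + 1):
            if i + n <= len(text):
                out.append(text[i:i + n])
    return out

doc = "sub-area 5 - road"  # lowercased form of "Sub-area 5 - Road"
print("road" in edge_ngrams(doc))  # False: grams only cover the string's start
print("road" in ngrams(doc))       # True: grams are taken throughout the string
```

This is why switching the mytok tokenizer from "edge_ngram" to "ngram" makes "Sub-area 5 - Road" findable by the term "Road".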

Related

field type as text and completion in Elasticsearch

I am trying to have the title field as both text and completion types in Elasticsearch, as shown below:
PUT playlist
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 2,
    "analysis": {
      "filter": {
        "custom_english_stemmer": {
          "type": "stemmer",
          "name": "english"
        },
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "custom_lowercase_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "english_stop",
            "custom_english_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long",
        "index": false,
        "doc_values": false
      },
      "title": {
        "type": "text",
        "analyzer": "custom_lowercase_analyzer",
        "fields": {
          "raw": {
            "type": "completion"
          }
        }
      }
    }
  }
}
The suggestion query below works:
POST media/_search
{
  "_source": ["id", "title"],
  "suggest": {
    "job-suggest": {
      "prefix": "sri",
      "completion": {
        "field": "title"
      }
    }
  }
}
But a normal search fails on the same title:
GET media/_search
{
  "_source": ["id", "title"],
  "query": {
    "query_string": {
      "query": "*sri*",
      "fields": [
        "title"
      ]
    }
  }
}
Please help me solve this problem.

Elasticsearch - How to search for multiple words in one string

I'm having issues getting the Elasticsearch results I need.
My mappings look like this:
"mappings": {
  "product": {
    "_meta": {
      "model": "App\\Entity\\Product"
    },
    "dynamic_date_formats": [],
    "properties": {
      "articleNameSearch": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "articleNumberSearch": {
        "type": "text",
        "fielddata": true
      },
      "brand": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "text"
          }
        }
      }
    }
  }
},
My settings:
"settings": {
  "index": {
    "number_of_shards": "5",
    "provided_name": "my_index",
    "creation_date": "1572252785482",
    "analysis": {
      "filter": {
        "standard": {
          "type": "standard"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "standard"
          ],
          "type": "custom",
          "tokenizer": "lowercase"
        }
      }
    },
    "number_of_replicas": "1",
    "uuid": "bwmc7NZ9RXqB1lpQ3e8HTQ",
    "version": {
      "created": "5060399"
    }
  }
}
The data inside:
"hits": [
  {
    "_index": "my_index",
    "_type": "product",
    "_id": "14",
    "_score": 1.0,
    "_source": {
      "articleNumberSearch": "5003xx843",
      "articleNameSearch": "this is a test string",
      "brand": {
        "name": "Brand name"
      }
    }
  },
Currently the PHP code for the query looks like this (it does not return the correct records):
$searchQuery = new BoolQuery();
$formattedQuery = "*" . str_replace(['.', '|'], '', trim(mb_strtolower($query))) . "*";

/**
 * Test NGRAM analyzer
 */
$matchQuery = new Query\MultiMatch();
$matchQuery->setFields([
    'articleNumberSearch',
    'articleNameSearch',
]);
$matchQuery->setQuery($formattedQuery);
$searchQuery->addMust($matchQuery);

/**
 * Nested query
 */
$nestedQuery = new Nested();
$nestedQuery->setPath('brand');
$nestedQuery->setQuery(
    new Match('brand.name', 'Brand name')
);
$searchQuery->addMust($nestedQuery);
I'm creating an auto-complete search field, where you can search articleNumberSearch and articleNameSearch while the brand name is always a fixed value.
I want to be able to search, for example:
500 should find this hit, because "500" occurs in the articleNumberSearch value.
But I also want to be able to search:
this is string
A couple of questions:
Which query do I need to use?
Am I using the right analyzer?
Is my analyzer correctly configured?
You should create an ngram type tokenizer.
The ngram tokenizer first breaks text down into words whenever it encounters one of a list of specified characters, then emits n-grams of each word of the specified lengths.
Something like this:
"analysis": {
  "analyzer": {
    "autocomplete": {
      "filter": [
        "lowercase"
      ],
      "type": "custom",
      "tokenizer": "my_tokenizer"
    }
  },
  "tokenizer": {
    "my_tokenizer": {
      "token_chars": [
        "letter",
        "digit",
        "symbol",
        "punctuation"
      ],
      "min_gram": "1",
      "type": "ngram",
      "max_gram": "2"
    }
  }
}
NGram Tokenizer
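To see roughly why such short grams match anywhere in the value, here is a small Python approximation of the ngram tokenizer configured above (min_gram 1, max_gram 2). It sketches only the gram generation, not Elasticsearch's actual analysis or scoring:

```python
def ngrams(text, min_gram=1, max_gram=2):
    # Emit every substring of length min_gram..max_gram starting at every
    # position, roughly what the ngram tokenizer does.
    return [text[i:i + n]
            for i in range(len(text))
            for n in range(min_gram, max_gram + 1)
            if i + n <= len(text)]

indexed = set(ngrams("5003xx843"))
query = set(ngrams("500"))
# Every gram of the query ("5", "50", "0", "00") also occurs somewhere in the
# indexed value, so a match query on the analyzed field finds this document.
print(query <= indexed)  # True
```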

Elasticsearch match query on tokens returns less relevant result

Hi, can somebody help me understand how Elasticsearch evaluates the relevance of tokens? I have a field nn whose mapping looks like this:
{
  "settings": {
    "index": {
      "refresh_interval": "-1",
      "number_of_shards": "4",
      "analysis": {
        "filter": {
          "stopwords_SK": {
            "ignore_case": "true",
            "type": "stop",
            "stopwords_path": "stopwords/slovak.txt"
          },
          "autocomplete_filter": {
            "type": "edge_ngram",
            "min_gram": "2",
            "max_gram": "20"
          }
        },
        "analyzer": {
          "autocomplete": {
            "filter": [
              "stopwords_SK",
              "lowercase",
              "stopwords_SK",
              "autocomplete_filter"
            ],
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      },
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "doc": {
      "dynamic": "strict",
      "properties": {
        "nn": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "raw": {
              "type": "keyword"
            }
          },
          "boost": 10,
          "analyzer": "autocomplete"
        }
      }
    }
  }
}
The nn field is tokenized via the standard tokenizer. The following simple query works well and returns relevant results like "softone sro", "softec sro"...
{
  "_source": [
    "nn",
    "nazov"
  ],
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "nn": "softo"
          }
        }
      ]
    }
  }
}
But if I add a should condition to the query, it returns completely irrelevant results, and the previously most relevant hits like "sofone" or "softex" are missing. It returns e.g. "zo soz kovo zts nova as zts elektronika as" or "agentura socialnych sluzieb ass no"...
Here is the should query:
{
  "_source": [
    "nn",
    "nazov"
  ],
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "nn": "softo"
          }
        }
      ],
      "should": [
        {
          "match": {
            "nn": "as"
          }
        },
        {
          "match": {
            "nn": "sro"
          }
        }
      ]
    }
  }
}
Why is the should query result missing the "sofone" and "softex" items, which are the most relevant in the first query? I thought relevance was based on token length, meaning the "soft" token is more relevant than the "so" token.
Thanks.

Completion Suggester Not working as expected

{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "suggest": {
          "type": "completion",
          "analyzer": "autocomplete"
        },
        "hostname": {
          "type": "text"
        }
      }
    }
  }
}
The above mapping is stored in Elasticsearch.
POST index/test
{
  "hostname": "testing-01",
  "suggest": [{ "input": "testing-01" }]
}

POST index/test
{
  "hostname": "testing-02",
  "suggest": [{ "input": "testing-02" }]
}

POST index/test
{
  "hostname": "w1-testing-01",
  "suggest": [{ "input": "w1-testing-01" }]
}

POST index/test
{
  "hostname": "w3-testing-01",
  "suggest": [{ "input": "w3-testing-01" }]
}
When there are 30 documents with hostnames starting with w1 and others starting with w3, and the term "w3" is searched, I get suggestions for all the w1 hosts first and only then the w3 hosts.
Suggestion Query
{
  "_source": {
    "include": [
      "text"
    ]
  },
  "suggest": {
    "server-suggest": {
      "text": "w1",
      "completion": {
        "field": "suggest",
        "size": 10
      }
    }
  }
}
I tried different analyzers, with the same issue.
Can somebody guide me?
It's a common trap. Because min_gram is 1, both w1-testing-01 and w3-testing-01 produce the token w. Since you only specified analyzer, the autocomplete analyzer also kicks in at search time, so searching suggestions for w3 also produces the token w, which is why both w1-testing-01 and w3-testing-01 match.
The solution is to add a search_analyzer to your suggest field so that the autocomplete analyzer is not used at search time (you can use the standard, keyword or whatever analyzer makes sense for your use case), but only at indexing time.
"mappings": {
  "test": {
    "properties": {
      "suggest": {
        "type": "completion",
        "analyzer": "autocomplete",
        "search_analyzer": "standard" <-- add this
      },
      "hostname": {
        "type": "text"
      }
    }
  }
}
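The trap described above can be sketched in a few lines of Python. This is only an approximation of the whitespace tokenizer + lowercase + edge_ngram filter chain from the mapping, not the real analyzer:

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    # Prefix grams of a single token, like the edgeNGram/edge_ngram filter.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def autocomplete(text):
    # whitespace tokenizer + lowercase + autocomplete_filter, per the mapping.
    tokens = []
    for token in text.lower().split():
        tokens.extend(edge_ngrams(token))
    return tokens

# With min_gram 1, both hostnames index the one-character gram "w" ...
print("w" in autocomplete("w1-testing-01"))  # True
print("w" in autocomplete("w3-testing-01"))  # True
# ... and because the same analyzer runs at search time, the search text "w3"
# is also expanded to ["w", "w3"], so suggestions for BOTH hosts match on "w".
print(autocomplete("w3"))  # ['w', 'w3']
# A search_analyzer without the edge_ngram filter leaves the query as the
# single token "w3", which only the w3-* suggestions match.
```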

Not able to search a phrase in elasticsearch 5.4

I am searching for a phrase in an email body. I need the results filtered exactly: if I search for 'Avenue New', it should return only results that contain the phrase 'Avenue New', not 'Avenue Street', 'Park Avenue', etc.
My mapping looks like this:
{
  "exchangemailssql": {
    "aliases": {},
    "mappings": {
      "email": {
        "dynamic_templates": [
          {
            "_default": {
              "match": "*",
              "match_mapping_type": "string",
              "mapping": {
                "doc_values": true,
                "type": "keyword"
              }
            }
          }
        ],
        "properties": {
          "attachments": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "body": {
            "type": "text",
            "analyzer": "keylower",
            "fielddata": true
          },
          "count": {
            "type": "short"
          },
          "emailId": {
            "type": "long"
          }
        }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "3s",
        "number_of_shards": "1",
        "provided_name": "exchangemailssql",
        "creation_date": "1500527793230",
        "analysis": {
          "filter": {
            "nGram": {
              "min_gram": "4",
              "side": "front",
              "type": "edge_ngram",
              "max_gram": "100"
            }
          },
          "analyzer": {
            "keylower": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "keyword"
            },
            "email": {
              "filter": [
                "lowercase",
                "unique",
                "nGram"
              ],
              "type": "custom",
              "tokenizer": "uax_url_email"
            },
            "full": {
              "filter": [
                "lowercase",
                "snowball",
                "nGram"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "0",
        "uuid": "2XTpHmwaQF65PNkCQCmcVQ",
        "version": {
          "created": "5040099"
        }
      }
    }
  }
}
My search query looks like this:
{
  "query": {
    "match_phrase": {
      "body": "Avenue New"
    }
  },
  "highlight": {
    "fields": {
      "body": {}
    }
  }
}
The problem here is that you're tokenizing the full body content using the keyword tokenizer, i.e. it will be one big lowercase string and you cannot search inside of it.
If you simply change the analyzer of your body field to standard instead of keylower, you'll find what you need using the match_phrase query.
"body": {
  "type": "text",
  "analyzer": "standard", <---change this
  "fielddata": true
},
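Why the keyword tokenizer breaks match_phrase can be sketched quickly. The Python below only approximates the two analyzers (the standard analyzer is simulated with a plain \w+ split), but it shows the difference in the token streams:

```python
import re

def keylower(text):
    # keyword tokenizer + lowercase filter: the entire body is ONE token.
    return [text.lower()]

def standard_like(text):
    # Rough stand-in for the standard analyzer: one token per word.
    return re.findall(r"\w+", text.lower())

body = "Our office is at Avenue New 42"
print(keylower(body))       # one giant token; no phrase can match inside it
print(standard_like(body))  # ['our', 'office', 'is', 'at', 'avenue', 'new', '42']

# match_phrase "Avenue New" needs 'avenue' and 'new' as adjacent tokens,
# which only the word-level token stream provides.
tokens = standard_like(body)
i = tokens.index("avenue")
print(tokens[i:i + 2] == ["avenue", "new"])  # True
```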

Resources