Elastic Search Closest match with space in fields

I have data in Elasticsearch whose fields contain "face mask". I can fetch it with the following query when search_term is set to "face", "face mask", "mask", or "face masks".
{
"query": {
"bool": {
"must": [{
"multi_match": {
"query": ${search_term},
"fields": [
"title^4",
"titleNgram^3"
],
"fuzziness": "auto"
}
}]
}
}
}
What I cannot achieve is this: when I query for "facemask" (no space in search_term), no document containing "face" or "mask" is returned; instead I get documents whose fields contain "facewash". Is there a way to make "facemask" match "face mask"?
The mapping is shown below:
{
"mappings": {
"_doc": {
"properties": {
"title": {
"type": "text"
},
"titleNgram": {
"search_analyzer": "whitespace_analyzer",
"analyzer": "nGram_analyzer",
"type": "text"
},
"id": {
"index": false,
"type": "text"
}
}
}
}
}

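You can try removing "search_analyzer": "whitespace_analyzer" from the titleNgram field, reindexing the data, and running the same query; it should then return the expected results. Without a separate search analyzer, the query text is analyzed with nGram_analyzer as well, so "facemask" is split into n-grams that can match the grams indexed for "face mask" (assuming the configured gram sizes are short enough to produce "face" and "mask").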
{
"mappings": {
"_doc": {
"properties": {
"title": {
"type": "text"
},
"titleNgram": {
"search_analyzer": "whitespace_analyzer", --> remove this line.
"analyzer": "nGram_analyzer",
"type": "text"
},
"id": {
"index": false,
"type": "text"
}
}
}
}
}
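To see why dropping the search analyzer can help, here is a rough Python sketch of the token overlap. It is only a toy model, not Elasticsearch's analysis chain: the settings of nGram_analyzer are not shown in the question, so min_gram=3 and max_gram=4 are assumptions, and the helper names ngrams and analyze_ngram are made up for illustration.

```python
def ngrams(token, min_gram=3, max_gram=4):
    """Return the set of character n-grams of a single token."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def analyze_ngram(text, min_gram=3, max_gram=4):
    """Lowercase, split on whitespace, then n-gram every token (assumed behaviour)."""
    grams = set()
    for token in text.lower().split():
        grams |= ngrams(token, min_gram, max_gram)
    return grams


# Grams stored in the index for the document title.
indexed = analyze_ngram("face mask")

# With a whitespace search analyzer the query stays one token.
whitespace_query = {"facemask"}

# Without it, the query is n-grammed the same way as the document.
ngram_query = analyze_ngram("facemask")

shared_whitespace = indexed & whitespace_query
shared_ngram = indexed & ngram_query

print(sorted(shared_whitespace))
print(sorted(shared_ngram))
```

Under these assumptions the single token "facemask" shares nothing with the indexed grams, while the n-grammed query shares "face", "mask", and their shorter grams with the document.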
Edit: I tested it locally and it worked for me. Below is the sample I used.
Search query
{
"query": {
"multi_match": {
"query": "face mask",
"fields": [
"titlengram",
"title"
]
}
}
}
And it returns the expected document:
"hits": [
{
"_index": "ngram",
"_type": "_doc",
"_id": "1",
"_score": 2.3014567,
"_source": {
"title": "face mask",
"titlengram": "face mask"
}
}
]

Related

Searching Elasticsearch document by existing field not found but the field exists

First of all, I should say that I'm on Elasticsearch 5.6.16.
I'm trying to figure out what's happening here. I have several documents indexed like this one (copied directly from Kibana):
{
"_index": "my_index",
"_type": "doc",
"_id": "Outbreak_10346",
"_version": 1,
"_score": 1,
"_source": {
"outbreakId": 10346,
"reference": "XX-AD-2021-00003",
"countryCode": "BE",
"adisNotificationReasonType": {
"code": "TERRESTRIAL"
},
"approximateLocation": false,
"latitude": 50.93766,
"longitude": 3.97156,
"adminZoneLevelOne": {
"zoneId": 40,
"zoneCode": "BE2"
},
"affectedSpecies": [
{
"speciesId": 16703,
"name": "Swine",
"measuringUnit": "ANIMAL",
"casesQuantity": 10,
"deadQuantity": 1,
"susceptibleQuantity": 100,
"isAquatic": false
}
],
"affectedSpeciesTotalSusceptible": 100,
"affectedSpeciesTotalCases": 10
}
}
If I do this query in Kibana:
GET my_index/_search
{
"query": {
"exists": {
"field": "adminZoneLevelOne"
}
}
}
I don't get any results, but if I change the field to any of the others, the documents are found.
Also, when I retrieve the documents I can access the adminZoneLevelOne field.
How is this possible? Why doesn't Elasticsearch find any document with that field?
The index mapping for adminZoneLevelOne field is:
"adminZoneLevelOne": {
"type": "nested",
"properties": {
"zoneCode": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "WHITESPACE"
},
"zoneId": {
"type": "long"
}
}
}
And the mapping for adisNotificationReasonType, which works fine, is:
"adisNotificationReasonType": {
"properties": {
"code": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "LOWERCASE_KEYWORD"
}
}
}

Since adminZoneLevelOne is of nested type, its objects are indexed as separate hidden documents, so a top-level exists query does not see them. You need to wrap the exists query in a nested query:
{
"query": {
"nested": {
"path": "adminZoneLevelOne",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "adminZoneLevelOne"
}
}
]
}
}
}
}
}
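If the query body is assembled in application code, the same shape can be produced by a small helper. This is only a sketch: nested_exists_query is a hypothetical name, and checking a concrete sub-field such as adminZoneLevelOne.zoneId instead of the path itself is an assumption on my side, not something the answer above requires.

```python
import json


def nested_exists_query(path, field=None):
    """Build a nested query matching documents that have a value under `path`.

    `field` defaults to the nested path itself, as in the answer above.
    """
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"exists": {"field": field or path}},
            }
        }
    }


# Same check as the answer, expressed without the bool/must wrapper.
body = nested_exists_query("adminZoneLevelOne")

# Variant that checks one sub-field (an assumption, see the note above).
body_zone_id = nested_exists_query("adminZoneLevelOne", "adminZoneLevelOne.zoneId")

print(json.dumps(body, indent=2))
```

The printed body can be sent as-is to GET my_index/_search.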

Elastic Search_as_you_type case insensitive match

I want to perform a partial search on 3 fields: UUID, tracking_id, and zip_code. Each contains a single word with no special characters or spaces, except for the hyphens in UUID.
I'm not sure whether I should use search_as_you_type, the edge ngram tokenizer, or the edge ngram token filter, so I tried search_as_you_type first.
I have created this index:
{
"settings": {
"index": {
"sort.field": [ "created_at", "id" ],
"sort.order": [ "desc", "desc" ]
}
},
"mappings": {
"properties": {
"id": { "type": "keyword", "fields": { "raw": { "type": "search_as_you_type" }}},
"current_status": { "type": "keyword" },
"tracking_id": { "type": "wildcard" },
"invoice_number": { "type": "keyword" },
"created_at": { "type": "date" }
}
}
}
and inserted this doc:
{
"id": "SIGRID",
"current_status": "unassigned",
"tracking_id": "AXXH",
"invoice_number": "xxx",
"created_at": "2021-03-24T09:36:10.717672467Z"
}
I sent this query:
{"query": {
"multi_match": {
"query": "sigrid",
"type": "bool_prefix",
"fields": [
"id"
]
}
}
}
This returns no result, but SIGRID, S, and SIG each return the document. How can I make the search_as_you_type query case insensitive? Should I use the edge ngram tokenizer instead? Thanks

You can define a custom normalizer with a lowercase filter. The lowercase filter ensures that all letters are converted to lowercase both when the document is indexed and when it is searched. Modify your index mapping as follows:
{
"settings": {
"index": {
"sort.field": [
"created_at",
"id"
],
"sort.order": [
"desc",
"desc"
]
},
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom", // note this
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"properties": {
"id": {
"type": "keyword",
"normalizer": "my_normalizer", // note this
"fields": {
"raw": {
"type": "search_as_you_type"
}
}
},
"current_status": {
"type": "keyword"
},
"tracking_id": {
"type": "wildcard"
},
"invoice_number": {
"type": "keyword"
},
"created_at": {
"type": "date"
}
}
}
}
Search Query:
{
"query": {
"multi_match": {
"query": "sigrid",
"type": "bool_prefix"
}
}
}
Search Result:
"hits": [
{
"_index": "66792606",
"_type": "_doc",
"_id": "1",
"_score": 2.0,
"_source": {
"id": "SIGRID",
"current_status": "unassigned",
"tracking_id": "AXXH",
"invoice_number": "xxx",
"created_at": "2021-03-24T09:36:10.717672467Z"
}
}
]
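As a rough illustration of what the normalizer changes, the following Python sketch mimics a prefix match against a keyword value with and without lowercasing. It is a toy model and not how Elasticsearch evaluates the query internally; keyword_prefix_match is a made-up helper name.

```python
def keyword_prefix_match(stored, query, normalize=None):
    """Mimic a prefix match on a keyword value, optionally normalizing both sides."""
    if normalize is not None:
        stored, query = normalize(stored), normalize(query)
    return stored.startswith(query)


# Without a normalizer the comparison is case sensitive.
without_normalizer = keyword_prefix_match("SIGRID", "sigrid")

# With a lowercase normalizer both the stored value and the query are lowercased.
with_normalizer = keyword_prefix_match("SIGRID", "sigrid", normalize=str.lower)

print(without_normalizer, with_normalizer)
```

The key point is that the same transformation is applied at index time and at search time, so "sigrid", "SIGRID", and "Sig" all end up being compared in lowercase.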

Elasticsearch: Full text search

I'm trying to build an Elasticsearch full-text search query with the following text "Gold Cartier watches" on multiple fields.
I have to follow this rule: first find all "Gold" documents. From the retrieved "Gold" documents, find all "Cartier" documents, and from those, find all "watches" documents.
This is my multi_match query:
{
"query": {
"multi_match": {
"query": "Fred or rose",
"fields": [
"name",
"status",
"categories.name",
"brand.name",
"reference"
]
}
}
}
Here is my mapping:
{
"product": {
"mappings": {
"product": {
"dynamic_date_formats": [],
"properties": {
"available": {
"type": "text"
},
"brand": {
"properties": {
"available": {
"type": "text"
},
"name": {
"type": "keyword"
},
"shopProductBrands": {
"properties": {
"available": {
"type": "text"
},
"priority": {
"type": "integer"
},
"slug": {
"type": "keyword"
}
}
},
"slug": {
"type": "keyword"
}
}
},
"categories": {
"type": "nested",
"properties": {
"available": {
"type": "text"
},
"brand": {
"properties": {
"available": {
"type": "text"
},
"name": {
"type": "keyword"
},
"slug": {
"type": "keyword"
}
}
},
"name": {
"type": "keyword"
},
"parent": {
"type": "keyword"
},
"slug": {
"type": "keyword"
}
}
},
"createdAt": {
"type": "date",
"format": "date_time_no_millis"
},
"longDescription": {
"type": "text",
"analyzer": "french_search"
},
"name": {
"type": "text",
"boost": 15,
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "french_search"
},
"purchasePrice": {
"type": "double"
},
"rawPrice": {
"type": "double"
},
"reference": {
"type": "keyword",
"boost": 10
},
"shortDescription": {
"type": "text",
"boost": 3,
"analyzer": "french_search"
},
"slug": {
"type": "keyword"
},
"status": {
"type": "text"
},
"updatedAt": {
"type": "date",
"format": "date_time_no_millis"
}
}
}
}
}
}
My search currently retrieves all "Gold", "Cartier", and "watches" documents combined.
How can I build a query that follows my rule?
Thanks

I'm not sure that there's an easy solution. I think the closest you can get is to use cross_fields with "operator": "and" and only search fields that have the same analyzer. Can you add "french_search" versions of each of these fields?
cross_fields analyzes the query string into individual terms, then
looks for each term in any of the fields, as though they were one big
field.
However:
The cross_field type can only work in term-centric mode on fields that
have the same analyzer. ... If there are multiple groups, they are
combined with a bool query.
So this query:
{
"query": {
"multi_match": {
"type": "cross_fields",
"query": "gold Cartier watches",
"fields": [
"name",
"status",
"categories.name",
"brand.name",
"reference"
]
}
}
}
Will become something like this:
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "gold Cartier watches",
"fields": ["name"]
}
},
{
"multi_match": {
"query": "gold Cartier watches",
"fields": ["status"]
}
},
{
"multi_match": {
"query": "gold Cartier watches",
"fields": [
"categories.name",
"brand.name",
"reference"
]
}
}
]
}
}
That query is too loose, but adding "operator": "and" or "minimum_should_match": "100%" would be too strict.
It's not pretty or efficient, but you could do application-side term parsing and build a boolean query. Something like this:
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "gold",
"fields": [
"name",
"status",
...
"reference"
]
}
},
{
"multi_match": {
"query": "Cartier",
"fields": [
"name",
"status",
...
"reference"
]
}
}
...
]
}
}
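The application-side term parsing described above could look roughly like this in Python. It is a minimal sketch: all_terms_query is a hypothetical helper, terms are split on whitespace only, and the field list is simply the one from the question.

```python
import json


def all_terms_query(text, fields):
    """Require every whitespace-separated term to match in at least one field."""
    clauses = [
        {"multi_match": {"query": term, "fields": list(fields)}}
        for term in text.split()
    ]
    return {"query": {"bool": {"must": clauses}}}


FIELDS = ["name", "status", "categories.name", "brand.name", "reference"]
body = all_terms_query("gold Cartier watches", FIELDS)

print(json.dumps(body, indent=2))
```

Each term becomes its own must clause, so a document has to match "gold" in some field, "Cartier" in some field, and "watches" in some field, which is the narrowing behaviour the question asks for.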

You can use the boolean operators of the query_string query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_boolean_operators
The preferred operators are + (this term must be present) and - (this term must not be present). All other terms are optional. For example, this query:
quick brown +fox -news
states that:
fox must be present
news must not be present
quick and brown are optional — their presence increases the relevance
The familiar boolean operators AND, OR and NOT (also written &&, || and !) are also supported but beware that they do not honor the usual precedence rules, so parentheses should be used whenever multiple operators are used together. For instance, the previous query could be rewritten as:
((quick AND fox) OR (brown AND fox) OR fox) AND NOT news
You can also use boosting to give more weight to a specific term: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_boosting
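Applied to the "Gold Cartier watches" case, one way to use the + operator is to prefix every term with it. The Python sketch below builds such a query_string body; require_all_terms is a hypothetical helper, and it assumes the terms contain no reserved query_string characters that would need escaping.

```python
import json


def require_all_terms(text, fields):
    """Build a query_string body in which every term is mandatory (+term)."""
    query = " ".join("+" + term for term in text.split())
    return {"query": {"query_string": {"query": query, "fields": list(fields)}}}


body = require_all_terms("gold Cartier watches", ["name", "status", "reference"])

print(json.dumps(body, indent=2))
```

The field list here is shortened for the example; any fields from the mapping could be passed instead.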

Boost score based on integer value - Elasticsearch

I'm not very experienced with ElasticSearch and would like to know how to boost a search based on a certain integer value.
This is an example of a document:
{
"_index": "links",
"_type": "db1",
"_id": "mV32vWcBZsblNn1WqTcN",
"_score": 8.115617,
"_source": {
"url": "example.com",
"title": "Example website",
"description": "This is an example website, used for various of examples around the world",
"likes": 9,
"popularity": 543,
"tags": [
{
"name": "example",
"votes": 5
},
{
"name": "test",
"votes": 2
},
{
"name": "testing",
"votes": 1
}
]
}
}
Now in this particular search, the focus is on the tags and I would like to know how to boost the _score and multiply it by the integer in the votes under tags.
If this is not possible (or very hard to achieve), I would simply like to know how to boost the _score by the votes (not under tags).
For example, add 0.1 to the _score for each vote.
This is the current search query I'm using (for searching tags only):
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool":{
"should":{
"match":{
"tags.name":"example,testing,something else"
}
}
}
}
}
}
}
I couldn't find much online, and hope someone can help me out.
How do I boost the _score with an integer value?
Update
For more info, here is the mapping:
{
"links": {
"mappings": {
"db1": {
"properties": {
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"likes": {
"type": "long"
},
"popularity": {
"type": "long"
},
"tags": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"votes": {
"type": "long"
}
}
}
}
}
}
}
}
Update 2
Changed the tags.likes/tags.dislikes to tags.votes, and added a nested property to the tags

This took a long time to figure out. I have learnt so much on my way there.
Here is the final result:
{
"query": {
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"tags.name": "example"
}
},
{
"match": {
"tags.name": "testing"
}
},
{
"match": {
"tags.name": "test"
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "tags.votes"
}
}
],
"boost_mode": "multiply"
}
}
}
}
}
Using an array in should helped a lot, and I was glad I could combine it with function_score.

You are looking for the function score query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
and, within it, field value factor: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor
Snippet from the documentation:
GET /_search
{
"query": {
"function_score": {
"field_value_factor": {
"field": "tags.dislikes",
"factor": 1.2,
"modifier": "sqrt",
"missing": 1
}
}
}
}
Or use script score instead, because your tags field is nested (I'm not sure whether field value factor works with nested structures).
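To get a feel for the numbers, here is a small Python sketch of how field_value_factor and boost_mode combine with the query score. It only models the arithmetic (factor times field value, then the modifier, then multiply or sum) for a few modifiers and is not Elasticsearch code; the function names and the sample score of 8.0 are made up for illustration.

```python
import math

# A subset of the modifiers; each is applied to factor * value.
MODIFIERS = {
    "none": lambda v: v,
    "sqrt": math.sqrt,
    "log1p": lambda v: math.log10(v + 1),
    "square": lambda v: v * v,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Compute the function score for one field value."""
    if value is None:
        if missing is None:
            raise ValueError("field has no value and no 'missing' default was given")
        value = missing
    return MODIFIERS[modifier](factor * value)


def combine(query_score, function_score, boost_mode="multiply"):
    """Combine the query score with the function score."""
    if boost_mode == "multiply":
        return query_score * function_score
    if boost_mode == "sum":
        return query_score + function_score
    raise ValueError("unsupported boost_mode in this sketch: " + boost_mode)


# Multiply the score by the raw vote count, as in the accepted query.
multiplied = combine(8.0, field_value_factor(5), "multiply")

# "Add 0.1 to the _score for each vote": factor 0.1 with boost_mode sum.
summed = combine(8.0, field_value_factor(5, factor=0.1), "sum")

print(multiplied, summed)
```

With boost_mode set to multiply, a tag with 0 votes would zero out the score, which is one reason to consider sum or a modifier such as log1p.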

How can I retrieve matching children only?

Consider a very simple model where we have locations and each location can have zero or more events. A location would have properties such as name, description and geo point data (lon/lat). An event should be attached to one location (its parent) and should have a name and description.
{
"location" : {
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" },
"geo": { "type": "geo_point" },
"exhibits": {
"type": "nested",
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" }
}
}
}
}
}
What I want to be able to do, is to query for the child documents (events) performing a full text search on their names and descriptions. I would like to get the matching events back and be able to also get their parent location's name. I would also like to narrow down the result set by location's coordinates. I don't want to get any events that do not match the query. Is that possible in Elastic Search? What types of queries should I use?
I have tried putting events as an array property under location (see above) and using the nested query but it does not return the kind of results I want (I think it returns the whole location, including all events, even the ones that do not match my query). I have tried putting events into a separate index (mapping?) providing the _parent property and then performing the top_children query on locations, but I don't get any results.
{
"exhibit": {
"_parent": { "type": "locations" },
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" }
}
}
}
Could anyone shed some light? I don't know where to begin...

Here's the working solution to my problem, perhaps it will be useful to somebody.
Location mapping:
{
"location" : {
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" },
"geo": { "type": "geo_point" }
}
}
}
Exhibit mapping:
{
"exhibit": {
"_parent": { "type": "locations" },
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" }
}
}
}
Query:
{
"fields": [ "_parent", "name", "_source" ],
"query": {
"bool": {
"should": [
{ "text": { "name": "candy" } },
{ "text": { "description": "candy" } }
]
}
},
"filter": {
"and": [
{
"terms" : {
"_parent": [ "4e7089a9b97d640b30695b7a", "4e7089eeb97d640b30695b7b" ]
}
},
{ "range": { "start": { "lte": "2011-09-22" } } },
{ "range": { "end": { "gte": "2011-09-22" } } }
]
}
}
You should query using the _parent field and pass it an array of the IDs of the locations to which you want to limit the exhibits.
