Searching Elasticsearch document by existing field not found but the field exists


First of all, I must say I'm on Elasticsearch 5.6.16.
I'm trying to figure out what's happening here. I have several documents indexed like this one (I copied the document directly from Kibana):
{
"_index": "my_index",
"_type": "doc",
"_id": "Outbreak_10346",
"_version": 1,
"_score": 1,
"_source": {
"outbreakId": 10346,
"reference": "XX-AD-2021-00003",
"countryCode": "BE",
"adisNotificationReasonType": {
"code": "TERRESTRIAL"
},
"approximateLocation": false,
"latitude": 50.93766,
"longitude": 3.97156,
"adminZoneLevelOne": {
"zoneId": 40,
"zoneCode": "BE2"
},
"affectedSpecies": [
{
"speciesId": 16703,
"name": "Swine",
"measuringUnit": "ANIMAL",
"casesQuantity": 10,
"deadQuantity": 1,
"susceptibleQuantity": 100,
"isAquatic": false
}
],
"affectedSpeciesTotalSusceptible": 100,
"affectedSpeciesTotalCases": 10
}
}
If I do this query in Kibana:
GET my_index/_search
{
"query": {
"exists": {
"field": "adminZoneLevelOne"
}
}
}
I don't get any results. But if I change the field to any of the others, I find the documents.
Also, when I retrieve the documents I can access the adminZoneLevelOne field.
How is this possible? Why doesn't Elasticsearch find any document with that field?
The index mapping for the adminZoneLevelOne field is:
"adminZoneLevelOne": {
"type": "nested",
"properties": {
"zoneCode": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "WHITESPACE"
},
"zoneId": {
"type": "long"
}
}
}
And the mapping for adisNotificationReasonType, which works fine, is:
"adisNotificationReasonType": {
"properties": {
"code": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "LOWERCASE_KEYWORD"
}
}
}

Since adminZoneLevelOne is mapped as nested, its objects are indexed as separate hidden documents, so a top-level exists query does not see them. You need to wrap the exists query in a nested query:
{
"query": {
"nested": {
"path": "adminZoneLevelOne",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "adminZoneLevelOne"
}
}
]
}
}
}
}
}
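To make the shape of that request easier to reuse, here is a minimal Python sketch that only builds the request body as a dictionary. The helper name nested_exists_query is hypothetical (it is not part of any Elasticsearch client), and the choice of adminZoneLevelOne.zoneId as the inner field is just an illustration taken from the mapping in the question.

```python
import json


def nested_exists_query(path, field=None):
    """Build a search body that wraps an exists clause in a nested query.

    Hypothetical helper for illustration only; it just assembles a dict.
    If no field is given, the nested path itself is used, as in the answer.
    """
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"exists": {"field": field or path}},
            }
        }
    }


# Same query as in the answer (exists on the nested path itself).
body = nested_exists_query("adminZoneLevelOne")

# Variant that checks one specific sub-field of the nested object.
body_zone_id = nested_exists_query("adminZoneLevelOne", "adminZoneLevelOne.zoneId")

print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of GET my_index/_search with whichever HTTP or Elasticsearch client you already use.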

Related

Select documents by array of objects when at least one object doesn't contain necessary field Elasticsearch

I have documents in Elasticsearch and can't work out how to write a search script that returns a document if any of its attachments doesn't contain uuid or has a null uuid. The Elasticsearch version is 5.2.
Mapping of documents
"mappings": {
"documentType": {
"properties": {
"attachment": {
"properties": {
"uuid": {
"type": "text"
},
"path": {
"type": "text"
},
"size": {
"type": "long"
}
}
}}}}
In Elasticsearch the documents look like this:
{
"_index": "documents",
"_type": "documentType",
"_id": "1",
"_score": 1.0,
"_source": {
"attachment": [
{
"uuid": "21321321",
"path": "../uploads/somepath",
"size": 1231
},
{
"path": "../uploads/somepath",
"size": 1231
}
]
}
},
{
"_index": "documents",
"_type": "documentType",
"_id": "2",
"_score": 1.0,
"_source": {
"attachment": [
{
"uuid": "223645641321321",
"path": "../uploads/somepath",
"size": 1231
},
{
"uuid": "22341424321321",
"path": "../uploads/somepath",
"size": 1231
}
]
}
},
{
"_index": "documents",
"_type": "documentType",
"_id": "3",
"_score": 1.0,
"_source": {
"attachment": [
{
"uuid": "22789789341321321",
"path": "../uploads/somepath",
"size": 1231
},
{
"path": "../uploads/somepath",
"size": 1231
}
]
}
}
As a result, I want to get the documents with _id 1 and 3, but instead I get a script error.
I tried the following script:
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "attachment"
}
},
{
"script": {
"script": {
"inline": "for (item in doc['attachment'].value) { if (item['uuid'] == null) { return true}}",
"lang": "painless"
}
}
}
]
}
}
}
The error is:
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:77)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:36)",
"for (item in doc['attachment'].value) { ",
" ^---- HERE"
],
"script": "for (item in doc['attachment'].value) { if (item['uuid'] == null) { return true}}",
"lang": "painless"
}
],
Is it possible to select documents when even one attachment object doesn't contain uuid?

Iterating arrays of objects is not as trivial as one would expect. I've written extensively about it here and here.
Since your attachments are not defined as nested, ES will internally represent them as flattened lists of values (also called "doc values"). For instance, attachment.uuid in doc#2 will become ["223645641321321", "22341424321321"], and attachment.size will turn into [1231, 1231].
This means that you can simply compare the .length of these flattened representations! I assume attachment.size will always be present and can be thus taken as the comparison baseline.
One more thing. To take advantage of these optimized doc values for textual fields, it'll require one small mapping change:
PUT documents/documentType/_mappings
{
"properties": {
"attachment": {
"properties": {
"uuid": {
"type": "text",
"fielddata": true <---
},
"path": {
"type": "text"
},
"size": {
"type": "long"
}
}
}
}
}
When that's done and you've reindexed your docs (which can be done with this little Update by query trick):
POST documents/_update_by_query
You can then use the following script query:
POST documents/_search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "attachment"
}
},
{
"script": {
"script": {
"inline": "def size_field_length = doc['attachment.size'].length; def uuid_field_length = doc['attachment.uuid'].length; return uuid_field_length < size_field_length",
"lang": "painless"
}
}
}
]
}
}
}
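To see why comparing lengths works, here is a small Python sketch that mimics the flattening on the three sample documents from the question. It is a simplified model, not how Elasticsearch stores data: the flatten helper is hypothetical, and it ignores details such as Elasticsearch possibly deduplicating or reordering repeated values for some field types, so it assumes the uuid values within one document are distinct.

```python
def flatten(doc, path):
    """Collect every non-null value found at a dotted path, walking through arrays of objects."""
    values = [doc]
    for key in path.split("."):
        found = []
        for value in values:
            for item in (value if isinstance(value, list) else [value]):
                if isinstance(item, dict) and item.get(key) is not None:
                    found.append(item[key])
        values = found
    # A leaf may itself be an array, so expand one last time.
    flat = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def has_attachment_without_uuid(source):
    """Mirror the script query: fewer uuid values than size values."""
    uuid_count = len(flatten(source, "attachment.uuid"))
    size_count = len(flatten(source, "attachment.size"))
    return uuid_count < size_count


# The three sample documents from the question, keyed by _id.
docs = {
    "1": {"attachment": [
        {"uuid": "21321321", "path": "../uploads/somepath", "size": 1231},
        {"path": "../uploads/somepath", "size": 1231},
    ]},
    "2": {"attachment": [
        {"uuid": "223645641321321", "path": "../uploads/somepath", "size": 1231},
        {"uuid": "22341424321321", "path": "../uploads/somepath", "size": 1231},
    ]},
    "3": {"attachment": [
        {"uuid": "22789789341321321", "path": "../uploads/somepath", "size": 1231},
        {"path": "../uploads/somepath", "size": 1231},
    ]},
}

matching_ids = [doc_id for doc_id, source in docs.items() if has_attachment_without_uuid(source)]
print(matching_ids)  # prints ['1', '3']
```

Documents 1 and 3 each yield one uuid value against two size values, so they are selected; document 2 yields two of each and is skipped.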

Just to supplement this answer: if the mapping for the uuid field was created automatically, Elasticsearch adds it in this way:
"uuid": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
then script could look like:
POST documents/_search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "attachment"
}
},
{
"script": {
"script": {
"inline": "doc['attachment.size'].length > doc['attachment.uuid.keyword'].length",
"lang": "painless"
}
}
}
]
}
}
}

Elastic Search_as_you_type case insensitive match

I want to perform partial search on 3 fields: UUID, tracking_id, and zip_code. They each contain only 1 word and no special characters or spaces, except hyphens for UUID.
I'm not sure whether I should use search_as_you_type, the edge ngram tokenizer, or the edge ngram token filter, so I tried search_as_you_type first.
I have created this index:
{
"settings": {
"index": {
"sort.field": [ "created_at", "id" ],
"sort.order": [ "desc", "desc" ]
}
},
"mappings": {
"properties": {
"id": { "type": "keyword", "fields": { "raw": { "type": "search_as_you_type" }}},
"current_status": { "type": "keyword" },
"tracking_id": { "type": "wildcard" },
"invoice_number": { "type": "keyword" },
"created_at": { "type": "date" }
}
}
}
and inserted this doc:
{
"id": "SIGRID",
"current_status": "unassigned",
"tracking_id": "AXXH",
"invoice_number": "xxx",
"created_at": "2021-03-24T09:36:10.717672467Z"
}
I sent this query:
{"query": {
"multi_match": {
"query": "sigrid",
"type": "bool_prefix",
"fields": [
"id"
]
}
}
}
This returns no result, but SIGRID, S, and SIG each return the document. How can I make the search_as_you_type query case insensitive? Should I use the edge ngram tokenizer instead? Thanks

You can define a custom normalizer with a lowercase filter. The lowercase filter ensures that all letters are changed to lowercase both before indexing the document and before searching. Modify your index mapping as follows:
{
"settings": {
"index": {
"sort.field": [
"created_at",
"id"
],
"sort.order": [
"desc",
"desc"
]
},
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom", // note this
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"properties": {
"id": {
"type": "keyword",
"normalizer": "my_normalizer", // note this
"fields": {
"raw": {
"type": "search_as_you_type"
}
}
},
"current_status": {
"type": "keyword"
},
"tracking_id": {
"type": "wildcard"
},
"invoice_number": {
"type": "keyword"
},
"created_at": {
"type": "date"
}
}
}
}
Search Query:
{
"query": {
"multi_match": {
"query": "sigrid",
"type": "bool_prefix"
}
}
}
Search Result:
"hits": [
{
"_index": "66792606",
"_type": "_doc",
"_id": "1",
"_score": 2.0,
"_source": {
"id": "SIGRID",
"current_status": "unassigned",
"tracking_id": "AXXH",
"invoice_number": "xxx",
"created_at": "2021-03-24T09:36:10.717672467Z"
}
}
]
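The effect of the normalizer can be illustrated with a short Python sketch. This is only a model of the idea (lowercase the stored value and the query text, then do a prefix comparison); the function names are hypothetical and nothing here calls Elasticsearch.

```python
def normalize(value: str) -> str:
    """Stand-in for a custom normalizer whose only filter is lowercase."""
    return value.lower()


def prefix_match(indexed_value: str, query_text: str, use_normalizer: bool) -> bool:
    """Model a prefix match against a keyword value, with or without normalization."""
    if use_normalizer:
        # The same normalizer is applied at index time and at search time.
        indexed_value = normalize(indexed_value)
        query_text = normalize(query_text)
    return indexed_value.startswith(query_text)


# Without a normalizer the comparison is case sensitive.
without_normalizer = prefix_match("SIGRID", "sigrid", use_normalizer=False)

# With the lowercase normalizer both sides become "sigrid...".
with_normalizer = prefix_match("SIGRID", "sigrid", use_normalizer=True)

print(without_normalizer, with_normalizer)  # prints False True
```

Upper-case prefixes such as SIG keep matching in this model as well, because the query text goes through the same lowercase step as the stored value.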

Elastic Search Closest match with space in fields

I have data in elasticsearch fields which contains "face mask", I am able to fetch this data with the following query with search_term set to "face", "face mask", "mask" and "face masks".
{
"query": {
"bool": {
"must": [{
"multi_match": {
"query": ${search_term},
"fields": [
"title^4",
"titleNgram^3"
],
"fuzziness": "auto"
}
}]
}
}
}
What I can't achieve is this: when I query for "facemask" (no space in search_term), no documents containing "face" or "mask" are returned, yet documents with fields containing "facewash" are. Is there a way to make "facemask" match them?
Mapping is shared below
{
"mappings": {
"_doc": {
"properties": {
"title": {
"type": "text"
},
"titleNgram": {
"search_analyzer": "whitespace_analyzer",
"analyzer": "nGram_analyzer",
"type": "text"
},
"id": {
"index": false,
"type": "text"
}
}
}
}
}

Try removing "search_analyzer": "whitespace_analyzer" from the mapping, reindex the data, and run the same query; it should then return the expected results.
{
"mappings": {
"_doc": {
"properties": {
"title": {
"type": "text"
},
"titleNgram": {
"search_analyzer": "whitespace_analyzer", --> remove this line.
"analyzer": "nGram_analyzer",
"type": "text"
},
"id": {
"index": false,
"type": "text"
}
}
}
}
}
Edit: I tested it locally and it worked for me. Below is the sample I used.
Search query
{
"query": {
"multi_match": {
"query": "face mask",
"fields": [
"titlengram",
"title"
]
}
}
}
And it returns the expected doc:
"hits": [
{
"_index": "ngram",
"_type": "_doc",
"_id": "1",
"_score": 2.3014567,
"_source": {
"title": "face mask",
"titlengram": "face mask"
}
}
]
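A rough Python sketch of why dropping the search analyzer can change the outcome for "facemask". The definition of nGram_analyzer is not shown in the question, so this assumes, purely for illustration, that it lowercases the text and emits 3- and 4-character grams of each whitespace-separated word, and that a match only needs one shared token.

```python
def ngrams(text, min_gram=3, max_gram=4):
    """Assumed stand-in for nGram_analyzer: lowercase, then 3- to 4-char grams per word."""
    grams = set()
    for word in text.lower().split():
        for n in range(min_gram, max_gram + 1):
            for i in range(len(word) - n + 1):
                grams.add(word[i:i + n])
    return grams


def whitespace_tokens(text):
    """Assumed stand-in for whitespace_analyzer: split on whitespace only."""
    return set(text.lower().split())


indexed = ngrams("face mask")

# Search side analyzed by whitespace only: the single token "facemask" is not an indexed gram.
matches_with_whitespace = bool(indexed & whitespace_tokens("facemask"))

# Search side analyzed with n-grams too: grams such as "face" and "mask" are shared.
matches_with_ngrams = bool(indexed & ngrams("facemask"))

print(matches_with_whitespace, matches_with_ngrams)  # prints False True
```

Under the same assumptions, "facewash" also shares grams such as "face" with "facemask", so n-gram analysis at search time makes matching looser in general, not just for this one term.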

How to apply filter only if nested mapping exists

I'm trying to apply a location radius filter in a nested ES query, but the nested value is not always present, which causes this exception:
"[nested] nested object under path [contact.address] is not of nested type"
I tried to check whether the property exists before applying the filter, but nothing has worked so far.
The mapping is like:
{
"records": {
"mappings": {
"user": {
"properties": {
"user_id": {
"type": "long"
},
"contact": {
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"location": {
"properties": {
"lat": {
"type": "long"
},
"lng": {
"type": "long"
},
"lon": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
},
"email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"properties": {
"first_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
},
"created_at": {
"type": "date"
}
}
}
}
}
}
Sometimes the records do not have the location or address data, which causes problems. Sample record:
{
"contact": {
"name": {
"first_name": "Test",
"last_name": "User"
},
"email": "test#user.com",
"address": {}
},
"user_id": 532188
}
Here is what I'm trying:
GET records/_search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "contact.address"
}
},
{
"exists": {
"field": "contact.address.location"
}
}
],
"minimum_should_match": 1,
"should": [
{
"bool": {
"filter": {
"nested": {
"ignore_unmapped": true,
"path": "contact.address",
"query": {
"geo_distance": {
"distance": "50mi",
"contact.address.location": {
"lat": 51.5073509,
"lon": -0.1277583
}
}
}
}
}
}
}
]
}
}
}

You need to define an explicit mapping with the nested datatype to avoid this issue; it looks like dynamic mapping is causing the problem (in your mapping, contact.address is a plain object, not nested).
I defined my own mapping with the nested datatype, and the query doesn't complain even when some data is missing from the nested fields.
Index def
{
"mappings": {
"properties": {
"user_id": {
"type": "long"
},
"contact": {
"type": "nested"
}
}
}
}
Index sample doc
{
"contact": {
"name": {
"first_name": "raza",
"last_name": "ahmed"
},
"email": "opster#user.com",
"address" :{ --> note empty nested field
}
},
"user_id": 123456
}
Index another doc with data in the nested field
{
"contact": {
"name": {
"first_name": "foo",
"last_name": "bar"
},
"email": "opster#user.com",
"address": {
"location" :{ --> note nested data as well
"lat" : 51.5073509,
"lon" : -0.1277583
}
}
},
"user_id": 123456
}
Index another doc, which doesn't have even empty nested data
{
"contact": {
"name": {
"first_name": "foo",
"last_name": "bar"
},
"email": "opster#user.com"
},
"user_id": 123456
}
Search query using nested field
{
"query": {
"nested": {
"path": "contact", --> note this
"query": {
"bool": {
"must": [
{
"exists": {
"field": "contact.address"
}
},
{
"exists": {
"field": "contact.name.first_name"
}
}
]
}
}
}
}
}
The search runs without complaining about the documents that lack the nested data (unlike the query that gave you issues):
"hits": [
{
"_index": "nested",
"_type": "_doc",
"_id": "3",
"_score": 2.0,
"_source": {
"contact": {
"name": {
"first_name": "foo",
"last_name": "bar"
},
"email": "opster#user.com",
"address": { --> note the nested doc
"location": {
"lat": 51.5073509,
"lon": -0.1277583
}
}
},
"user_id": 123456
}
}
]

Boost score based on integer value - Elasticsearch

I'm not very experienced with ElasticSearch and would like to know how to boost a search based on a certain integer value.
This is an example of a document:
{
"_index": "links",
"_type": "db1",
"_id": "mV32vWcBZsblNn1WqTcN",
"_score": 8.115617,
"_source": {
"url": "example.com",
"title": "Example website",
"description": "This is an example website, used for various of examples around the world",
"likes": 9,
"popularity": 543,
"tags": [
{
"name": "example",
"votes": 5
},
{
"name": "test",
"votes": 2
},
{
"name": "testing",
"votes": 1
}
]
}
}
Now in this particular search, the focus is on the tags and I would like to know how to boost the _score and multiply it by the integer in the votes under tags.
If this is not possible (or very hard to achieve), I would simply like to know how to boost the _score by the votes (not under tags)
For example, add 0.1 to the _score for each integer in votes.
This is the current search query I'm using (for searching tags only):
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool":{
"should":{
"match":{
"tags.name":"example,testing,something else"
}
}
}
}
}
}
}
I couldn't find much online, and hope someone can help me out.
How do I boost the _score with an integer value?
Update
For more info, here is the mapping:
{
"links": {
"mappings": {
"db1": {
"properties": {
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"likes": {
"type": "long"
},
"popularity": {
"type": "long"
},
"tags": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"votes": {
"type": "long"
}
}
}
}
}
}
}
}
Update 2
Changed the tags.likes/tags.dislikes to tags.votes, and added a nested property to the tags

This took a long time to figure out. I have learnt so much on my way there.
Here is the final result:
{
"query": {
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"tags.name": "example"
}
},
{
"match": {
"tags.name": "testing"
}
},
{
"match": {
"tags.name": "test"
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "tags.votes"
}
}
],
"boost_mode": "multiply"
}
}
}
}
}
Using an array in should helped a lot, and I was glad I could combine it with function_score.

You are looking for the function score query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
And the field value factor function: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor.
Snippet from the documentation:
GET /_search
{
"query": {
"function_score": {
"field_value_factor": {
"field": "tags.dislikes",
"factor": 1.2,
"modifier": "sqrt",
"missing": 1
}
}
}
}
Alternatively, use script score, because your tags field is nested (I'm not sure whether field value factor works with nested structures).
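As a rough illustration of the arithmetic behind field_value_factor, here is a Python sketch of the factor, modifier and missing parameters, followed by a boost_mode of multiply. Only a few modifiers are modelled, the function name simply mirrors the query clause, and the query score of 2.0 is a made-up number for the example.

```python
import math

# Subset of the documented modifiers, modelled as plain functions.
MODIFIERS = {
    "none": lambda x: x,
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
    "log1p": lambda x: math.log10(x + 1),
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Model of the field_value_factor function: modifier(factor * field value)."""
    if value is None:
        if missing is None:
            raise ValueError("field has no value and no 'missing' default was given")
        value = missing
    return MODIFIERS[modifier](factor * value)


# Defaults, as in the accepted query: the function value is just the votes count.
votes_factor = field_value_factor(5)

# Settings from the documentation snippet: sqrt(1.2 * value), with 1 used when the field is missing.
doc_snippet_factor = field_value_factor(None, factor=1.2, modifier="sqrt", missing=1)

# boost_mode "multiply": the query score is multiplied by the function value.
hypothetical_query_score = 2.0
boosted_score = hypothetical_query_score * votes_factor

print(votes_factor, boosted_score)  # prints 5.0 10.0
```

With the default factor and no modifier, a tag with 5 votes multiplies its match score by 5, which is the "multiply by votes" behaviour asked for in the question.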
