Return field even if specific field value isn't available - elasticsearch

I have this bool query:
{
"bool": {
"must_not": [
{
"exists": {
"field": "*multiparttype.doNotDisplay",
"boost": 1
}
}
],
"should": [
{
"exists": {
"field": "multiparttype",
"boost": 1
}
},
{
"exists": {
"field": "*multiparttype.oldValue",
"boost": 1
}
},
{
"exists": {
"field": "*multiparttype.newValue",
"boost": 1
}
}
]
}
}
It returns data when the document has the following structure. For example, a document like the one below is matched and returned:
multiparttype{
oldValue: "YY",
newValue:"XXX",
type:10
}
But if the document only has this:
multiparttype{
type:10
}
OR
multiparttype{
}
The query above won't return these documents.
How can I make it return them as well?

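Based on your problem, you need to add a match_all clause, which matches every document and gives each one a base score of 1.0. Without it, a bool query that has should clauses but no must or filter clause requires at least one should clause to match, so documents that lack those fields are dropped.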
The following data was in the index:
multiparttype = { "oldValue" : "versionX","newValue" : "versionY"}
multiparttype = { "oldValue" : "versionX","newValue" : "versionY"}
empty_field : "test",multiparttype : {}
multiparttype = {"type" : "typetest"}
The following corrected query keeps the boost values, which can be changed based on your requirements.
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"exists": {
"field": "multiparttype.oldValue",
"boost": 1
}
},
{
"exists": {
"field": "multiparttype.newValue",
"boost": 1
}
}
],
"must_not": [
{
"exists": {
"field": "*multiparttype.doNotDisplay"
}
}
]
}
}
The following response will be generated:
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 3.0,
"hits" : [
{
"_index" : "stackoverflow-field",
"_type" : "_doc",
"_id" : "7Qg7TnQB3IIDvL59KA7i",
"_score" : 3.0,
"_source" : {
"multiparttype" : {
"oldValue" : "versionX",
"newValue" : "versionY"
}
}
},
{
"_index" : "stackoverflow-field",
"_type" : "_doc",
"_id" : "1wmWTnQB3IIDvL59lAAL",
"_score" : 1.0,
"_source" : {
"multiparttype" : {
"type" : "typetest"
}
}
},
{
"_index" : "stackoverflow-field",
"_type" : "_doc",
"_id" : "tQmbTnQB3IIDvL59Zgy7",
"_score" : 1.0,
"_source" : {
"empty_field" : "test"
}
},
{
"_index" : "stackoverflow-field",
"_type" : "_doc",
"_id" : "tQmcTnQB3IIDvL59fA8Z",
"_score" : 1.0,
"_source" : {
"empty_field" : "test",
"multiparttype" : { }
}
}
]
}
Documentation : https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-all-query.html
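To see why the extra match_all clause changes the result, the scoring can be mimicked in plain Python. This is only a rough sketch under stated assumptions: field_exists and score are hypothetical helpers (not part of any Elasticsearch client), each matching should clause is assumed to contribute exactly 1.0, and the leading wildcard in the must_not field name is ignored.

```python
def field_exists(doc, path):
    """Rough stand-in for an `exists` clause on a leaf field (an assumption, not the real implementation)."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return False
        value = value[key]
    # Treat null, empty objects and empty arrays as "no value".
    return value not in (None, {}, [])


def score(doc):
    """Return None when must_not excludes the doc, else the summed score of the matching should clauses."""
    if field_exists(doc, "multiparttype.doNotDisplay"):
        return None
    total = 1.0  # match_all matches every document
    for path in ("multiparttype.oldValue", "multiparttype.newValue"):
        if field_exists(doc, path):
            total += 1.0
    return total


# The four documents shown in the response above.
docs = [
    {"multiparttype": {"oldValue": "versionX", "newValue": "versionY"}},
    {"multiparttype": {"type": "typetest"}},
    {"empty_field": "test"},
    {"empty_field": "test", "multiparttype": {}},
]
scores = [score(d) for d in docs]
print(scores)
```

Under these assumptions every document gets at least the match_all score, and only the document carrying both oldValue and newValue reaches 3.0, which is in line with the response shown above.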

Related

How to query all content from a field in Elasticsearch

I'm querying data from Elasticsearch with Python. I can query a certain value in a field like this:
GET index/_search
{
"query": {
"match" : {
"somefieldname": "somevalue"
}
}
}
But how can I query all values inside the field somefieldname?
UPDATE:
Here's an example index:
"_index" : "indexname",
"_type" : "_doc",
"_id" : "lJlcO3wBhlKWxmXE9jrd",
"_score" : 0,
"_source": {
"field1": "abc",
"field2": "123",
"field3": "def"
},
"_index" : "indexname",
"_type" : "_doc",
"_id" : "lJlcO3wBhlKWxmXE9jrd",
"_score" : 0,
"_source": {
"field1": "fgh",
"field2": "654",
"field3": "kui"
},
"_index" : "indexname",
"_type" : "_doc",
"_id" : "lJlcO3wBhlKWxmXE9jrd",
"_score" : 0,
"_source": {
"field1": "567",
"field2": "gfr",
"field3": "234"
},
Now I want to query all content from field2 across all docs, so that my output is ["123", "654", "gfr"].
UPDATE:
Index mapping for the field:
{
"myindex" : {
"mappings" : {
"field2" : {
"full_name" : "field2",
"mapping" : {
"field2" : {
"type" : "keyword"
}
}
}
}
}
}
You can use a terms aggregation to get the unique values of field2:
{
"size": 0,
"aggs": {
"field2values": {
"terms": {
"field": "field2"
}
}
}
}
The search result would be:
"aggregations": {
"field2values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "123",
"doc_count": 1
},
{
"key": "654",
"doc_count": 1
},
{
"key": "gfr",
"doc_count": 1
}
]
}
}
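Since the question mentions Python, here is a minimal sketch of turning such an aggregation response into a plain list. The helper names terms_request and bucket_keys and the aggregation name "values" are made up for this illustration, and the response dict is the sample from above rather than output from a live cluster. Note that a terms aggregation returns only the top 10 buckets by default, so the size is raised here as an assumption about how many distinct values exist.

```python
def terms_request(field, size=10):
    """Build a terms-aggregation body; "values" is an arbitrary aggregation name chosen here."""
    return {"size": 0, "aggs": {"values": {"terms": {"field": field, "size": size}}}}


def bucket_keys(response, agg_name="values"):
    """Pull the bucket keys out of the `aggregations` part of a search response."""
    return [bucket["key"] for bucket in response["aggregations"][agg_name]["buckets"]]


# Sample response shaped like the one shown above (not fetched from a server).
sample_response = {
    "aggregations": {
        "values": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "123", "doc_count": 1},
                {"key": "654", "doc_count": 1},
                {"key": "gfr", "doc_count": 1},
            ],
        }
    }
}

body = terms_request("field2", size=100)
keys = bucket_keys(sample_response)
print(keys)
```

The body dict is what would be sent as the search request; how it is sent depends on the client in use and is left out on purpose.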

elasticsearch groupby and filter by regex condition

It's a bit hard for me to define the question as I'm not very experienced with Elasticsearch. I'm focusing the question on my specific problem:
Assuming I have the following records:
{
id: 1
name: bla1_1.aaa
},
{
id: 1
name: bla1_2.bbb
},
{
id: 2
name: bla2_1.aaa
},
{
id: 2
name: bla2_2.aaa
}
What I want is to GET all the ids that have all of their names ending with aaa.
I was thinking about grouping by id and then running a regex query like *\.aaa, so that all the names must satisfy the regex.
On this particular example I would get id: 2 back.
How do I do it?
Let me know if there's anything I need to add to clarify the question.
A regexp query can be used.
The pattern .* matches any character any number of times, including zero.
A terms aggregation will give you the unique ids and the number of docs under each of them.
Mapping :
PUT regex
{
"mappings": {
"properties": {
"id":{
"type":"integer"
},
"name":{
"type":"text",
"fields": {
"keyword":{
"type":"keyword"
}
}
}
}
}
}
Data:
"hits" : [
{
"_index" : "regex",
"_type" : "_doc",
"_id" : "olQXjW0BywGFQhV7k84P",
"_score" : 1.0,
"_source" : {
"id" : 1,
"name" : "bla1_1.aaa"
}
},
{
"_index" : "regex",
"_type" : "_doc",
"_id" : "o1QXjW0BywGFQhV7us6B",
"_score" : 1.0,
"_source" : {
"id" : 1,
"name" : "bla1_2.bbb"
}
},
{
"_index" : "regex",
"_type" : "_doc",
"_id" : "pFQXjW0BywGFQhV77c6J",
"_score" : 1.0,
"_source" : {
"id" : 2,
"name" : "bla2_1.aaa"
}
},
{
"_index" : "regex",
"_type" : "_doc",
"_id" : "pVQYjW0BywGFQhV7Dc6F",
"_score" : 1.0,
"_source" : {
"id" : 2,
"name" : "bla2_2.aaa"
}
}
]
Query:
GET regex/_search
{
"size":0,
"query": {
"regexp": {
"name.keyword": {
"value": ".*\\.aaa" ---> name ending with .aaa (the dot is escaped so it matches a literal ".")
}
}
},
"aggs": {
"unique_ids": {
"terms": {
"field": "id",
"size": 10
}
}
}
}
Result:
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"unique_ids" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 2, ---> 2 doc under id 2
"doc_count" : 2
},
{
"key" : 1, ----> 1 doc under id 1
"doc_count" : 1
}
]
}
}
Edit:
Use a bucket selector to keep only the buckets where the total count of docs for an id equals the count of docs matched by the regex:
GET regex/_search
{
"size": 0,
"aggs": {
"unique_ids": {
"terms": {
"field": "id",
"size": 10
},
"aggs": {
"totalCount": { ---> to get total count of id(all docs)
"value_count": {
"field": "id"
}
},
"filter_agg": {
"filter": {
"bool": {
"must": [
{
"regexp": {
"name.keyword": ".*\\.aaa"
}
}
]
}
},
"aggs": {
"finalCount": { -->total count of docs matching regex
"value_count": {
"field": "id"
}
}
}
},
"mybucket_selector": { ---> include buckets where totalcount==finalcount
"bucket_selector": {
"buckets_path": {
"FinalCount": "filter_agg>finalCount",
"TotalCount": "totalCount"
},
"script": "params.FinalCount==params.TotalCount"
}
}
}
}
}
}
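The idea behind the bucket selector (keep an id only when its total doc count equals its regex-matching doc count) can be sketched in plain Python. This only mirrors the logic on the four sample records; ids_where_all_match is a hypothetical helper, and re.fullmatch is used on the assumption that it behaves like the regexp query, which matches against the whole value.

```python
import re
from collections import defaultdict

records = [
    {"id": 1, "name": "bla1_1.aaa"},
    {"id": 1, "name": "bla1_2.bbb"},
    {"id": 2, "name": "bla2_1.aaa"},
    {"id": 2, "name": "bla2_2.aaa"},
]


def ids_where_all_match(records, pattern):
    """Return the ids whose names all match the pattern (total count == matching count)."""
    regex = re.compile(pattern)
    total = defaultdict(int)    # plays the role of totalCount
    matched = defaultdict(int)  # plays the role of finalCount
    for record in records:
        total[record["id"]] += 1
        if regex.fullmatch(record["name"]):
            matched[record["id"]] += 1
    return sorted(i for i in total if matched[i] == total[i])


result = ids_where_all_match(records, r".*\.aaa")
print(result)
```

Id 1 is dropped because only one of its two names ends with .aaa, which is the same comparison the script in the bucket selector performs.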

Simple way to find which one are in the same company with me?

There is an index with fields like the ones below; it records which person is in which company and in which position:
{
"createtime" : 1562844632272,
"post" : "director",
"personId" : 30007346088,
"comId" : 20010774891
}
Now I want to find someone's partners, that is, the people who are in the same company. My current implementation is:
First, find the person's related companies (at most 500):
{
"query": { "term": { "personId": 30007346088 } },
"sort": [ { "createtime": "desc" } ],
"_source": ["comId"],
"size":500
}
Then find the persons related to these companies, exclude the current person and remove duplicate partners (similarly, at most 500 partners):
{
"query": {
"bool": {
"must": [{ "terms": { "comId": [20010774891,...] } } ],
"must_not": [ {"term":{"personId":30007346088}} ]
}
},
"aggs" : {
"personId" : {
"terms" : {
"field" : "personId",
"size": 500
}
}
},
"size":0
}
Obviously this is a little complicated. Is there a simpler way to implement it?
It can work if the data is stored in the format below.
A unique document for each person, with the document id the same as the person id and the companies stored as an array:
POST indexperson/_doc/1
{
"createtime": 1562844632272,
"personId": 1,
"company": [
{
"id": 100,
"post": "director"
},
{
"id": 101,
"post": "director"
}
]
}
Data:
[
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 1,
"company" : [
{
"id" : 100,
"post" : "director"
},
{
"id" : 101,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 2,
"company" : [
{
"id" : 101,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 3,
"company" : [
{
"id" : 100,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 4,
"company" : [
{
"id" : 104,
"post" : "director"
}
]
}
}
]
Query:
Use a terms lookup (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html). A terms lookup takes a document id as a parameter.
GET indexperson/_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"company.id": {
"index": "indexperson",
"id": "1", --> look up the company ids of doc 1 and match all docs in indexperson that share one of them
"path": "company.id"
}
}
}
],
"must_not": [
{
"term": {
"personId": {
"value": 2
}
}
}
]
}
}
}
Result:
"hits" : [
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 1,
"company" : [
{
"id" : 100,
"post" : "director"
},
{
"id" : 101,
"post" : "director"
}
]
}
},
{
"_index" : "indexperson",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"createtime" : 1562844632272,
"personId" : 3,
"company" : [
{
"id" : 100,
"post" : "director"
}
]
}
}
]
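What the terms lookup does can be sketched in plain Python on the four sample documents: read the company ids from one document, then keep every person who shares at least one of them. The colleagues function is a hypothetical helper written for this illustration, not an Elasticsearch API, and the excluded person id is passed in explicitly so it can be set to the person being looked up.

```python
# Sample documents keyed by document id, as in the data above.
people = {
    "1": {"personId": 1, "company": [{"id": 100, "post": "director"}, {"id": 101, "post": "director"}]},
    "2": {"personId": 2, "company": [{"id": 101, "post": "director"}]},
    "3": {"personId": 3, "company": [{"id": 100, "post": "director"}]},
    "4": {"personId": 4, "company": [{"id": 104, "post": "director"}]},
}


def colleagues(people, lookup_doc_id, excluded_person_id):
    """Return person ids sharing a company with the looked-up doc, minus the excluded person."""
    wanted = {company["id"] for company in people[lookup_doc_id]["company"]}
    return sorted(
        person["personId"]
        for person in people.values()
        if person["personId"] != excluded_person_id
        and wanted & {company["id"] for company in person["company"]}
    )


as_in_query = colleagues(people, "1", 2)    # same lookup doc and exclusion as the query above
partners_of_1 = colleagues(people, "1", 1)  # excluding the looked-up person instead
print(as_in_query, partners_of_1)
```

With the exclusion used in the query above, persons 1 and 3 come back, matching the result shown. Excluding person 1 instead leaves persons 2 and 3, which is closer to what the question asks for.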

Search in nested object

I'm having trouble making a query on elasticsearch 7.3
I created an index like this:
PUT myindex
{
"mappings": {
"properties": {
"files": {
"type": "nested"
}
}
}
}
Then I created three documents:
PUT myindex/_doc/1
{
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2",
"files" : [
{
"filename" : "firstfilename.exe",
"datetime" : 111111111
},
{
"filename" : "secondfilename.exe",
"datetime" : 111111144
}
]
}
PUT myindex/_doc/2
{
"SHA256" : "87ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
"files" : [
{
"filename" : "thirdfilename.exe",
"datetime" : 111111133
},
{
"filename" : "fourthfilename.exe",
"datetime" : 111111122
}
]
}
PUT myindex/_doc/3
{
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
"files" : [
{
"filename" : "fifthfilename.exe",
"datetime" : 111111155
}
]
}
How can I get the last two files based on the datetime (ids: 1 and 3)?
I would like the SHA256 of the documents with the two latest datetime values, ordered descending.
I ran dozens of tests but none went well.
I'm not posting the code I tried because I'm still far from a solution.
I would like a result like this or similar:
{
"SHA256": [
"94ee05933....a2af6a22cc2",
"565e04933....a2af6a22c5a"
]
}
Query:
GET myindex/_search
{
"_source":"SHA256",
"sort": [
{
"files.datetime": {
"mode":"max",
"order": "desc",
"nested": {
"path": "files"
}
}
}
],
"size": 2
}
Result:
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a"
},
"sort" : [
111111155
]
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2"
},
"sort" : [
111111144
]
}
]
The sort array of each hit holds the maximum datetime value. So if you need the file names too, you can add files to _source and use the sort value to pick the matching file name.
A slightly more complicated query, using inner_hits, will give you exactly two values.
GET myindex/_search
{
"_source": "SHA256",
"query": {
"bool": {
"must": [
{
"nested": {
"path": "files",
"query": {
"match_all": {}
},
"inner_hits": {
"size":1,
"sort": [
{
"files.datetime": "desc"
}
]
}
}
}
]
}
},
"sort": [
{
"files.datetime": {
"mode": "max",
"order": "desc",
"nested": {
"path": "files"
}
}
}
],
"size": 2
}
Result:
[
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a"
},
"sort" : [
111111155
],
"inner_hits" : {
"files" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_nested" : {
"field" : "files",
"offset" : 0
},
"_score" : null,
"_source" : {
"filename" : "fifthfilename.exe",
"datetime" : 111111155
},
"sort" : [
111111155
]
}
]
}
}
}
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2"
},
"sort" : [
111111144
],
"inner_hits" : {
"files" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "files",
"offset" : 1
},
"_score" : null,
"_source" : {
"filename" : "secondfilename.exe",
"datetime" : 111111144
},
"sort" : [
111111144
]
}
]
}
}
}
}
]
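The combination of a nested sort with mode max and inner_hits of size 1 amounts to the following logic, sketched here in plain Python on the three sample documents. latest_files is a hypothetical helper that only mirrors the ordering; it does not call Elasticsearch.

```python
docs = [
    {
        "SHA256": "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2",
        "files": [
            {"filename": "firstfilename.exe", "datetime": 111111111},
            {"filename": "secondfilename.exe", "datetime": 111111144},
        ],
    },
    {
        "SHA256": "87ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
        "files": [
            {"filename": "thirdfilename.exe", "datetime": 111111133},
            {"filename": "fourthfilename.exe", "datetime": 111111122},
        ],
    },
    {
        "SHA256": "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
        "files": [{"filename": "fifthfilename.exe", "datetime": 111111155}],
    },
]


def latest_files(docs, size=2):
    """Pick each doc's newest file, order docs by that datetime descending, keep the first `size`."""
    ranked = []
    for doc in docs:
        newest = max(doc["files"], key=lambda f: f["datetime"])  # like inner_hits with size 1
        ranked.append((newest["datetime"], doc["SHA256"], newest["filename"]))
    ranked.sort(reverse=True)  # like sort with mode "max" and order "desc"
    return [(sha, name) for _, sha, name in ranked[:size]]


top_two = latest_files(docs)
for sha, name in top_two:
    print(sha, name)
```

On the sample data this yields document 3 (fifthfilename.exe) followed by document 1 (secondfilename.exe), the same order as the hits above.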

Elasticsearch unable retrieve child documents

I recently migrated from Elasticsearch 2.4 to 6.2.1 and my previous GET query no longer works. Below is the query I use to retrieve the child document based on its _id and _parent values. Do I have to change the implementation to retrieve the documents from ES?
{
"query": {
"bool": {
"must": [
{
"term": {
"_id": {
"value": "9:v0",
"boost": 1
}
}
},
{
"term": {
"_parent": {
"value": "v0",
"boost": 1
}
}
},
{
"terms": {
"assoc.domainId": [
"XX"
],
"boost": 1
}
},
{
"terms": {
"assoc.nodeId": [
"YY"
],
"boost": 1
}
}
],
"adjust_pure_negative": false,
"boost": 1
}
}
}
parent document in ES:
{
"_index" : "test",
"_type" : "assocjoin",
"_id" : "v0",
"_score" : 1.0,
"_source" : {
"my_join_field" : {
"name" : "version"
},
"versionnumber" : "v0",
"versiondate" : "2018/03/29 13:25:02"
}
}
Child document in ES:
{
"_index" : "test",
"_type" : "versionjoin",
"_id" : "9:v0",
"_score" : 0.18232156,
"_routing" : "v0",
"_source" : {
"id" : 0,
"assocDTO" : {
"id" : 9,
"domainId" : "XX",
"nodeId" : "YY"
},
"biomarkers" : [
{
....
}
],
"contexts" : [
{
....
}
]
},
"my_join_field" : {
"name" : "assocversion",
"parent" : "v0"
}
}
}
]
}
