Elasticsearch Grouped Queries

I'm indexing an array of key-value pairs. The key is always a UUID and the value is a user-entered value. I've been crawling through the documentation, but I can't figure out exactly how to query in this scenario.
Example schema:
{
"id": 1,
"owner_id": 1,
"values": [
{ "key": "k3kfa23rewf", "value": "the red card" },
{ "key": "23a2dd23108", "value": "purple balloons" }
]
},
{
"id": 2,
"owner_id": 1,
"values": [
{ "key": "k3kfa23rewf", "value": "the blue card" },
{ "key": "23a2dd23108", "value": "purple balloons" }
]
}
I would like to query:
{ "term": { "owner_id": 1 } },
{ "term": { "values.key": "23a2dd23108" }, "match": { "values.value": "purple" } },
{ "term": { "values.key": "k3kfa23rewf" }, "match": { "values.value": "blue" } }
So that the record with ID 2 is returned. Any suggestions?

I think you need to use nested documents here.
That way, you can build a bool query with a must clause containing a term query on owner_id plus two nested queries, each combining a term query on values.key with a match query on values.value.
Does that help?
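As a rough sketch of what the answer describes, the request body could be built like this in Python. The mapping is an assumption (the question does not show one): "values" is mapped as nested, values.key as keyword and values.value as text. The helper name pair_clause is made up for illustration.

```python
import json

# Assumed mapping (not shown in the question): "values" must be nested so that
# each key/value pair is matched as a unit instead of being flattened.
mapping = {
    "properties": {
        "owner_id": {"type": "integer"},
        "values": {
            "type": "nested",
            "properties": {
                "key": {"type": "keyword"},
                "value": {"type": "text"},
            },
        },
    }
}


def pair_clause(key, text):
    """Nested query matching one array element that has this key AND this text."""
    return {
        "nested": {
            "path": "values",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"values.key": key}},
                        {"match": {"values.value": text}},
                    ]
                }
            },
        }
    }


# One term clause on the owner plus one nested clause per key/value condition.
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"owner_id": 1}},
                pair_clause("23a2dd23108", "purple"),
                pair_clause("k3kfa23rewf", "blue"),
            ]
        }
    }
}

print(json.dumps(query, indent=2))
```

Each condition gets its own nested clause because the two key/value pairs live in different array elements; putting both pairs into a single nested query would require one element to satisfy both.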

Related

Filter documents out of the facet count in enterprise search

We use enterprise search indexes to store items that can be tagged by multiple tenants.
For example:
[
{
"id": 1,
"name": "document 1",
"tags": [
{ "company_id": 1, "tag_id": 1, "tag_name": "bla" },
{ "company_id": 2, "tag_id": 1, "tag_name": "bla" }
]
}
]
I'm looking for a way to retrieve all documents with only the tags of company 1.
This request:
{
"query": "",
"facets": {
"tags": {
"type": "value"
}
},
"sort": {
"created": "desc"
},
"page": {
"size": 20,
"current": 1
}
}
comes back with:
...
"facets": {
"tags": [
{
"type": "value",
"data": [
{
"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
},
{
"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
}
]
}
]
}
...
Can I modify the request so that no tags with "company_id" = 2 are returned?
I have a solution that strips the extra data from the results after they are retrieved, but I'm looking for a better one.

Group by terms and get count of nested array property?

I would like to get the count from a document series where an array item matches some value.
I have documents like these:
{
"Name": "jason",
"Todos": [{
"State": "COMPLETED",
"Timer": 10
},{
"State": "PENDING",
"Timer": 5
}]
}
{
"Name": "jason",
"Todos": [{
"State": "COMPLETED",
"Timer": 5
},{
"State": "PENDING",
"Timer": 2
}]
}
{
"Name": "martin",
"Todos": [{
"State": "COMPLETED",
"Timer": 15
},{
"State": "PENDING",
"Timer": 10
}]
}
I would like to count the documents that have at least one Todo in the COMPLETED state, grouped by Name.
So from the above I would need to get:
jason: 2
martin: 1
Usually I do this with a terms aggregation on Name and another sub-aggregation for the other items:
"aggs": {
"statistics": {
"terms": {
"field": "Name"
},
"aggs": {
"test": {
"filter": {
"bool": {
"must": [{
"match_phrase": {
"SomeProperty.keyword": {
"query": "THEVALUE"
}
}
}
]
}
}
}
}
}
}
But I'm not sure how to do this here, since the items are in an array.
Elasticsearch has no problem with arrays because in fact it flattens them by default:
Arrays of inner object fields do not work the way you may expect. Lucene has no concept of inner objects, so Elasticsearch flattens object hierarchies into a simple list of field names and values.
So a query like the one you posted will work. I would use a term query for the keyword datatype, though:
POST mytodos/_search
{
"size": 0,
"aggs": {
"by name": {
"terms": {
"field": "Name"
},
"aggs": {
"how many completed": {
"filter": {
"term": {
"Todos.State": "COMPLETED"
}
}
}
}
}
}
}
I am assuming your mapping looks something like this:
PUT mytodos/_mappings
{
"properties": {
"Name": {
"type": "keyword"
},
"Todos": {
"properties": {
"State": {
"type": "keyword"
},
"Timer": {
"type": "integer"
}
}
}
}
}
The example documents that you posted will be transformed internally into something like this:
{
"Name": "jason",
"Todos.State": ["COMPLETED", "PENDING"],
"Todos.Timer": [10, 5]
}
However, if you need to query on Todos.State and Todos.Timer together, for example to find Todos that are "COMPLETED" and also have Timer > 10, that is not possible with this mapping, because Elasticsearch loses the association between the fields of each object in the array.
In that case you would need to use something like the nested datatype for such arrays, and query them with the special nested query.
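To make the difference concrete, here is a small plain-Python simulation (it does not talk to Elasticsearch). The functions flatten, flattened_match and nested_match are illustrative names, and the sample document is hypothetical, chosen so that the two behaviours disagree.

```python
def flatten(doc):
    """Mimic object-array flattening: one multi-valued list per field path."""
    flat = {"Name": [doc["Name"]]}
    for todo in doc["Todos"]:
        for field, value in todo.items():
            flat.setdefault("Todos." + field, []).append(value)
    return flat


def flattened_match(doc, state, min_timer):
    """Each condition is checked independently against the flattened lists."""
    flat = flatten(doc)
    return state in flat["Todos.State"] and any(
        timer > min_timer for timer in flat["Todos.Timer"]
    )


def nested_match(doc, state, min_timer):
    """Both conditions must hold inside the same array element."""
    return any(
        todo["State"] == state and todo["Timer"] > min_timer
        for todo in doc["Todos"]
    )


# Hypothetical document: the COMPLETED todo has a small Timer,
# while the large Timer belongs to the PENDING todo.
doc = {
    "Name": "martin",
    "Todos": [
        {"State": "COMPLETED", "Timer": 5},
        {"State": "PENDING", "Timer": 15},
    ],
}

print(flatten(doc))
print(flattened_match(doc, "COMPLETED", 10))  # True: a false positive
print(nested_match(doc, "COMPLETED", 10))     # False: no single todo satisfies both
```

The flattened check matches because "COMPLETED" and a Timer above 10 both occur somewhere in the document, even though no single Todo has both.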
Hope that helps!

Elasticsearch sorting by array of objects

I have a column, engagement, like this, along with other columns:
record 1
"date":"2017-11-23T06:46:04.358Z",
"remarks": "test1",
"engagement": [
{
"name": "comment_count",
"value": 6
},
{
"name": "like_count",
"value": 2
}
],
....
....
record 2
"date":"2017-11-23T07:16:14.358Z",
"remarks": "test2",
"engagement": [
{
"name": "comment_count",
"value": 3
},
{
"name": "like_count",
"value": 9
}
],
....
....
I am storing the objects in an array. Now I want to sort the data in descending order by the value of any given object name, e.g. the value of like_count or the value of share_count.
So if I sort by like_count, the 2nd record should come before the 1st, because its like_count value is 9 while the 1st record's like_count value is 2.
How can I do this in Elasticsearch?
You should have something like the following:
{
"query": {
"nested": {
"path": "engagement",
"query": {
...somefilter...
}
}
},
"sort": {
"engagement.value": {
"order": "desc",
"mode": "min",
"nested_path": "engagement",
"nested_filter": {
...same.filter.as.before
}
}
}
}
Source: Elastic Docs
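For the like_count case from the question, a filled-in sketch might look like the following. It assumes "engagement" is mapped as nested with engagement.name as a keyword field, and it uses the "nested" sort option available from Elasticsearch 6.1 (older versions use nested_path and nested_filter instead). The function names are made up, and sort_locally is only a plain-Python check of the intended ordering, not something Elasticsearch runs.

```python
def nested_sort_body(metric, order="desc"):
    """Request body sorting by engagement.value of the entry named `metric`.

    Assumes "engagement" is a nested field and "engagement.name" is a keyword.
    """
    return {
        "query": {"match_all": {}},
        "sort": [
            {
                "engagement.value": {
                    "order": order,
                    "nested": {
                        "path": "engagement",
                        # Only the entry with the requested name feeds the sort.
                        "filter": {"term": {"engagement.name": metric}},
                    },
                }
            }
        ],
    }


def sort_locally(records, metric, descending=True):
    """Plain-Python equivalent of the intended ordering, for sanity checks."""

    def value_of(record):
        for item in record["engagement"]:
            if item["name"] == metric:
                return item["value"]
        # Records without the metric end up last in either direction.
        return float("-inf") if descending else float("inf")

    return sorted(records, key=value_of, reverse=descending)


# The two records from the question, reduced to the relevant fields.
records = [
    {
        "remarks": "test1",
        "engagement": [
            {"name": "comment_count", "value": 6},
            {"name": "like_count", "value": 2},
        ],
    },
    {
        "remarks": "test2",
        "engagement": [
            {"name": "comment_count", "value": 3},
            {"name": "like_count", "value": 9},
        ],
    },
]

print([r["remarks"] for r in sort_locally(records, "like_count")])
```

The key point is that the sort field is the value, while the name only appears in the filter that selects which array entry contributes to the sort.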

Elasticsearch query fails to return results when querying a nested object

I have an object which looks something like this:
{
"id": 123,
"language_id": 1,
"label": "Pablo de la Pena",
"office": {
"count": 2,
"data": [
{
"id": 1234,
"is_office_lead": false,
"office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
},
{
"id": 5678,
"is_office_lead": false,
"office": {
"id": 2,
"address_line_1": "77 High Road",
"address_line_2": "Edinburgh",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "EH1 2DE",
"city_id": 2
}
}
]
},
"primary_office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
}
My Elasticsearch mapping looks like this:
"mappings": {
"item": {
"properties": {
"office": {
"properties": {
"data": {
"type": "nested"
}
}
}
}
}
}
My Elasticsearch query looks something like this:
GET consultant/item/_search
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"term": {
"language_id": 1
}
},
{
"term": {
"office.data.office.city_id": 1
}
}
]
}
}
}
This returns zero results, however, if I remove the second term and leave it only with the language_id clause, then it works as expected.
I'm sure this is down to a misunderstanding on my part of how the nested object is flattened, but I'm out of ideas. I've tried all kinds of permutations of the query and mappings.
Any guidance hugely appreciated. I am using Elasticsearch 6.1.1.
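I'm not sure whether you need the entire record. This solution returns every record that has language_id: 1 and an office.data.office.city_id: 1 value. Because office.data is mapped as nested, its fields are only reachable through a nested query, which is why the plain term clause matched nothing.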
GET consultant/item/_search
{
"from": 0,
"size": 100,
"query": {
"bool":{
"must": [
{
"term": {
"language_id": {
"value": 1
}
}
},
{
"nested": {
"path": "office.data",
"query": {
"match": {
"office.data.office.city_id": 1
}
}
}
}
]
}
}
}
To guard against false hits, I put three records in my test index: one with a different language_id, one with different office ids, and one that matches. Only the matching one was returned.
If you only need the office data, then that's a bit different but still solvable.
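One possible way to get just the matching office entries (not necessarily what the answerer had in mind) is the inner_hits option of the nested query, which returns the nested objects that matched alongside each hit. A minimal sketch of the request body, with offices_in_city as a made-up helper name:

```python
import json


def offices_in_city(language_id, city_id):
    """Search body that also returns the matching office.data entries."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"language_id": language_id}},
                    {
                        "nested": {
                            "path": "office.data",
                            "query": {
                                "match": {"office.data.office.city_id": city_id}
                            },
                            # Ask for the nested objects that caused the match.
                            "inner_hits": {},
                        }
                    },
                ]
            }
        }
    }


body = offices_in_city(language_id=1, city_id=1)
print(json.dumps(body, indent=2))
```

With this body, each hit in the response should carry an inner_hits section keyed by the nested path, containing only the office.data elements that matched.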

aggregation on fields values (regex)

I am trying to perform an aggregation to group documents by the first two letters of a specific field value.
I successfully aggregated my documents by a specific field name, but I don't know how to work with the values.
For example, for the docs:
[
{
"name": "John"
},
{
"name": "Jog"
},
{
"name": "James"
},
{
"name": "Robert"
},
{
"name": "Jessica"
}
]
I would like to get the following response:
[
{
"key": "Jo",
"doc_count": 2
},
{
"key": "Ja",
"doc_count": 1
},
{
"key": "Ro",
"doc_count": 1
},
{
"key": "Je",
"doc_count": 1
}
]
Is there an aggregation query able to do that?
You could use a terms aggregation with a script instead of a field, like this:
{
"size": 0,
"aggs": {
"first_two": {
"terms": {
"script": "doc.name.value?.size() >=2 ? doc.name.value?.substring(0, 2) : doc.name.value"
}
}
}
}
Note that if your name fields all have at least two characters, the script could simply be doc.name.value?.substring(0, 2). My script above accounts for single character names.
Also make sure to enable dynamic scripting in order for this to work.
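To check what buckets to expect, the script's logic can be reproduced locally in plain Python. This is only a simulation of the bucketing on the sample names from the question; the function name first_two is made up for illustration.

```python
from collections import Counter


def first_two(name):
    """Same rule as the script: first two characters, or the whole name if shorter."""
    return name[:2]


names = ["John", "Jog", "James", "Robert", "Jessica"]

# One bucket per distinct prefix, with the number of names that fall into it.
buckets = Counter(first_two(name) for name in names)

for key, doc_count in buckets.items():
    print({"key": key, "doc_count": doc_count})
```

Python slicing already handles single-character names, which is the case the ternary in the script guards against.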
