Filter documents out of the facet count in enterprise search - elasticsearch

We use enterprise search indexes to store items that can be tagged by multiple tenants.
For example:
[
{
"id": 1,
"name": "document 1",
"tags": [
{ "company_id": 1, "tag_id": 1, "tag_name": "bla" },
{ "company_id": 2, "tag_id": 1, "tag_name": "bla" }
]
}
]
I'm looking for a way to retrieve all documents with only the tags of company 1.
This request:
{
"query": "",
"facets": {
"tags": {
"type": "value"
}
},
"sort": {
"created": "desc"
},
"page": {
"size": 20,
"current": 1
}
}
comes back with:
...
"facets": {
"tags": [
{
"type": "value",
"data": [
{
"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
},
{
"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
}
]
}
]
}
...
Can I modify the request so that I get no tags with "company_id" = 2?
I have a solution that modifies the results after they are retrieved to strip the extra data, but I'm looking for a better one.
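For reference, the post-processing workaround mentioned above might look roughly like the sketch below. It is not a server-side fix: it only decodes the JSON-encoded facet values from the response shown earlier and drops those belonging to other companies. The function name `filter_tag_facets` is made up for illustration.

```python
import json


def filter_tag_facets(response, company_id):
    """Keep only the "tags" facet values whose embedded company_id matches."""
    filtered = []
    for facet in response.get("facets", {}).get("tags", []):
        kept = []
        for entry in facet.get("data", []):
            # Each facet value is a JSON-encoded string, as in the response above.
            tag = json.loads(entry["value"])
            if tag.get("company_id") == company_id:
                kept.append({"tag": tag, "count": entry["count"]})
        filtered.append({"type": facet.get("type"), "data": kept})
    return filtered


# Sample response copied from the question.
response = {
    "facets": {
        "tags": [
            {
                "type": "value",
                "data": [
                    {"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}", "count": 1},
                    {"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}", "count": 1},
                ],
            }
        ]
    }
}

company_1_facets = filter_tag_facets(response, company_id=1)
```

Only the entry for company 1 survives in `company_1_facets`. The counts are still computed over all documents, which is why a request-side solution would be preferable.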

Related

Calculate the counts of last snapshot of a record in ElasticSearch

I am storing snapshots of data in Elasticsearch. I want to run a count metric aggregation on the latest snapshot of each entry, in order to know what state my current (latest) data is in.
I have something like this:
[
{
"id": 2,
"state": "deleted",
"timestamp": "2019-11-20T18:18:09+00:00"
},
{
"id": 2,
"state": "published",
"timestamp": "2019-11-19T18:18:09+00:00"
},
{
"id": 3,
"state": "published",
"timestamp": "2019-10-17T18:18:09+00:00"
},
{
"id": 3,
"state": "draft",
"timestamp": "2019-10-16T18:18:09+00:00"
}
]
I tried this
POST /snapshots/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"2": {
"terms": {
"field": "state.keyword"
},
"aggs": {
"1": {
"top_hits": {
"size": 1,
"sort": [
{
"timestamp": {
"order": "desc"
}
}
]
}
}
}
}
}
}
But the problem is that it first creates a bucket per state and only then sorts and computes the top_hits inside each bucket, so instead of
deleted = 1
published = 1
draft = 0
It returns
deleted = 1
published = 1
draft = 1
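To make the intended result concrete, here is a client-side sketch of the computation: pick the latest snapshot per id first, then count states. It is plain Python over the sample data, not an Elasticsearch aggregation, and the function name `count_latest_states` is invented for this example.

```python
from collections import Counter
from datetime import datetime


def count_latest_states(snapshots):
    """Count states using only the most recent snapshot of each id."""
    latest = {}  # id -> (timestamp, state) of the newest snapshot seen so far
    for snap in snapshots:
        ts = datetime.fromisoformat(snap["timestamp"])
        current = latest.get(snap["id"])
        if current is None or ts > current[0]:
            latest[snap["id"]] = (ts, snap["state"])
    return Counter(state for _, state in latest.values())


# Sample snapshots copied from the question.
snapshots = [
    {"id": 2, "state": "deleted", "timestamp": "2019-11-20T18:18:09+00:00"},
    {"id": 2, "state": "published", "timestamp": "2019-11-19T18:18:09+00:00"},
    {"id": 3, "state": "published", "timestamp": "2019-10-17T18:18:09+00:00"},
    {"id": 3, "state": "draft", "timestamp": "2019-10-16T18:18:09+00:00"},
]

counts = count_latest_states(snapshots)
```

A `Counter` returns 0 for keys it has never seen, so `counts["draft"]` reads as 0 here, which matches the expected output listed above.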

Extract record from multiple arrays based on a filter

I have documents in ElasticSearch with the following structure :
"_source": {
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": [
"€ 139",
"€ 125",
"€ 120",
"€ 108"
],
"max_occupancy": [
2,
2,
1,
1
],
"type": [
"Type 1",
"Type 1 - (Tag)",
"Type 2",
"Type 2 (Tag)"
],
"availability": [
10,
10,
10,
10
],
"size": [
"26 m²",
"35 m²",
"47 m²",
"31 m²"
]
}
}
Basically, each record's details are split across 5 arrays (price, max_occupancy, type, availability, size), and the fields of one record share the same index position in all of them. I want to extract the record whose max_occupancy is greater than or equal to 2 (if there is no record with 2, take one with 3; if there is no 3, take one with 4; and so on) and that has the lowest price, in this case the second record, and place the result into a new JSON object like the following:
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": "€ 125",
"max_occupancy": "2",
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
}
Basically, the result should contain the extracted record (in this case, the second element of every array) plus the general information (the fields "last_updated" and "country").
Is it possible to extract such a result from Elasticsearch? What kind of query do I need to perform?
Could someone suggest the best approach?
My best approach: go nested with the Nested datatype.
Besides making queries easier, it makes the connections between those objects, which are currently scattered across different arrays, easier to read and understand.
Yes, if you choose this approach, you will have to edit your mapping and re-index all your data.
What would the mapping look like? Something like this:
{
"mappings": {
"properties": {
"last_updated": {
"type": "date"
},
"country": {
"type": "keyword"
},
"records": {
"type": "nested",
"properties": {
"price": {
"type": "keyword"
},
"max_occupancy": {
"type": "long"
},
"type": {
"type": "keyword"
},
"availability": {
"type": "long"
},
"size": {
"type": "keyword"
}
}
}
}
}
}
EDIT: New document structure (containing nested documents) -
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"records": [
{
"price": "€ 139",
"max_occupancy": 2,
"type": "Type 1",
"availability": 10,
"size": "26 m²"
},
{
"price": "€ 125",
"max_occupancy": 2,
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
},
{
"price": "€ 120",
"max_occupancy": 1,
"type": "Type 2",
"availability": 10,
"size": "47 m²"
},
{
"price": "€ 108",
"max_occupancy": 1,
"type": "Type 2 (Tag)",
"availability": 10,
"size": "31 m²"
}
]
}
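The restructuring itself can be done while re-indexing. The following is a minimal sketch of that transformation, assuming every array in a document has the same length; the helper name `to_nested` is hypothetical and the actual re-indexing call is left out.

```python
# The five parallel arrays from the original document structure.
ARRAY_FIELDS = ("price", "max_occupancy", "type", "availability", "size")


def to_nested(source):
    """Turn the parallel arrays into a list of per-record objects."""
    # Keep the general fields (last_updated, country, ...) as they are.
    doc = {key: value for key, value in source.items() if key not in ARRAY_FIELDS}
    columns = [source[field] for field in ARRAY_FIELDS]
    # zip(*columns) yields one tuple per index position, i.e. one record.
    doc["records"] = [dict(zip(ARRAY_FIELDS, row)) for row in zip(*columns)]
    return doc


# Sample _source copied from the question.
old_source = {
    "last_updated": "2017-10-25T18:33:51.434706",
    "country": "Italia",
    "price": ["€ 139", "€ 125", "€ 120", "€ 108"],
    "max_occupancy": [2, 2, 1, 1],
    "type": ["Type 1", "Type 1 - (Tag)", "Type 2", "Type 2 (Tag)"],
    "availability": [10, 10, 10, 10],
    "size": ["26 m²", "35 m²", "47 m²", "31 m²"],
}

new_doc = to_nested(old_source)
```

`new_doc` has the same shape as the nested document shown above. Note that `zip` stops at the shortest array, so documents with uneven arrays would need extra handling.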
Now it's easier to query for any specific condition with a Nested Query and Inner Hits. For example:
{
"_source": [
"last_updated",
"country"
],
"query": {
"bool": {
"must": [
{
"term": {
"country": "Italia"
}
},
{
"nested": {
"path": "records",
"query": {
"bool": {
"must": [
{
"range": {
"records.max_occupancy": {
"gte": 2
}
}
}
]
}
},
"inner_hits": {
"sort": {
"records.price": "asc"
},
"size": 1
}
}
}
]
}
}
}
Conditions are: country is Italia AND max_occupancy >= 2.
Inner hits: sort by price in ascending order and return the first result.
Hope you'll find it useful
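To get the flat object the question asks for, the general fields and the single inner hit can be merged on the client. The sketch below assumes the usual search response layout, where each hit carries `_source` and `inner_hits.<path>.hits.hits`. The sample hit is hand-written and trimmed for illustration, not real output, and `flatten_hit` is a made-up helper name.

```python
def flatten_hit(hit, path="records"):
    """Merge a hit's general fields with its first inner hit, if any."""
    result = dict(hit["_source"])
    inner = hit["inner_hits"][path]["hits"]["hits"]
    if inner:
        # With "size": 1 in inner_hits there is at most one entry here.
        result.update(inner[0]["_source"])
    return result


# Hand-written, trimmed example of one search hit (assumed shape).
hit = {
    "_source": {"last_updated": "2017-10-25T18:33:51.434706", "country": "Italia"},
    "inner_hits": {
        "records": {
            "hits": {
                "hits": [
                    {
                        "_source": {
                            "price": "€ 125",
                            "max_occupancy": 2,
                            "type": "Type 1 - (Tag)",
                            "availability": 10,
                            "size": "35 m²",
                        }
                    }
                ]
            }
        }
    },
}

flat = flatten_hit(hit)
```

One caveat: the price is stored as a string such as "€ 125", so sorting on it compares text rather than numbers. Storing a separate numeric price field is probably safer if prices can differ in their number of digits.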

es query of suggest in elasticsearch 5.0.1

I want to filter the results returned by a suggest query.
My type schema looks like this:
{
"name": {
"input": [
"user1"
]
},
"usertype": 1
}
{
"name": {
"input": [
"user2"
]
},
"usertype": 2
}
I want to search the data with a suggester, using a query like this:
{
"suggest": {
"person_suggest": {
"text": "us",
"completion": {
"field": "name"
}
}
}
}
And the result looks like this:
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"person_suggest": [
{
"text": "word",
"offset": 0,
"length": 4,
"options": [
{
"name": "user1",
"usertype": 1,
"score": 1
},
{
"text": "user2",
"usertype": 2,
"score": 1
}
]
}
]
}
But I only want results where usertype = 1, like adding a WHERE condition in MySQL. Can anybody help me? I want a DSL query. Thanks a lot.
You can't filter in completion suggest queries. One solution to your problem is to create a separate completion field for each usertype, or to use standard queries with nGram analyzers.
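The first suggestion could be sketched as follows. The field naming scheme `name_usertype_<n>` is purely an assumption for illustration, and each such field would have to be mapped as a completion field in the index. The helpers only build the document and request bodies; sending them is left out.

```python
def suggest_field(usertype):
    """Hypothetical naming scheme: one completion field per usertype."""
    return f"name_usertype_{usertype}"


def build_document(name, usertype):
    """Index body that stores the suggestion input in the per-usertype field."""
    return {suggest_field(usertype): {"input": [name]}, "usertype": usertype}


def build_suggest_query(text, usertype):
    """Suggest request that only looks at one usertype's completion field."""
    return {
        "suggest": {
            "person_suggest": {
                "text": text,
                "completion": {"field": suggest_field(usertype)},
            }
        }
    }


doc = build_document("user1", 1)
query = build_suggest_query("us", 1)
```

Because documents with usertype 2 never populate `name_usertype_1`, a suggest request against that field can only return usertype 1 entries. The trade-off is one extra mapped field per usertype.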

elasticsearch sort nested_filter not matching nested_path

I have an index with items of this form
{
"_index": "identity-index",
"_source": {
"names": [
"test"
],
"private": {
"lists": [
{
"listId": "56b8a0197f3c56654f8751b5",
"ratings": [
{
"rating": 4,
"authorId": "56499b7a97e3aa857cdc4f1d"
},
{
"rating": 4,
"authorId": "56b36646a24d50866de77928"
},
{
"rating": 4,
"authorId": "56cb16005082871b33ab1a60"
},
{
"rating": 4,
"authorId": "56b216a4c28edca956fe96d4"
},
{
"rating": 4,
"authorId": "56b34e8d8e324180259252f7"
}
]
},
{
"listId": "56c1c508da49cdd9662b102c"
}
]
}
},
"sort": [
"-Infinity"
]
}
I want to sort them by average rating for a given listId.
I've tried a lot of approaches, and the closest I got was this:
"sort": {
"private.lists.ratings.rating": {
"missing": "_last",
"order": "desc",
"mode": "avg",
"nested_path": "private.lists.ratings",
"nested_filter": {
"term": {
"private.lists.listId": "56c1c508da49cdd9662b102c"
}
}
}
},
The problem is that this scores everything as -Infinity. I can't find any way to sort by the nested elements in private.lists.ratings while taking into account the filter on private.lists.listId. The nested_path and nested_filter fields are different, and I don't think they are supposed to be.
If the ratings field is mapped as type nested, you can get what you want by copying the listId into each of the nested objects.
Unfortunately, nested objects are not part of the main document, and nested_filter (and nested_sort) can only disambiguate based on properties contained in each subdocument.
One solution could be to flatten your structure into a simple list of objects like the following:
{
"listId": "56b8a0197f3c56654f8751b5",
"rating": 4,
"authorId": "56499b7a97e3aa857cdc4f1d"
}
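A minimal sketch of that flattening, done before indexing, might look like this. The target field `private.ratings` used in the sort body is an assumed name for the flattened nested field, and the sort reuses the `nested_path` / `nested_filter` syntax from the question.

```python
def flatten_ratings(private):
    """Copy each list's listId into its ratings and return one flat list."""
    flat = []
    for lst in private.get("lists", []):
        # Lists without ratings simply contribute nothing.
        for rating in lst.get("ratings", []):
            flat.append({"listId": lst["listId"], **rating})
    return flat


# Shortened sample based on the document in the question.
private = {
    "lists": [
        {
            "listId": "56b8a0197f3c56654f8751b5",
            "ratings": [
                {"rating": 4, "authorId": "56499b7a97e3aa857cdc4f1d"},
                {"rating": 4, "authorId": "56b36646a24d50866de77928"},
            ],
        },
        {"listId": "56c1c508da49cdd9662b102c"},
    ]
}

flat_ratings = flatten_ratings(private)

# Sort body against the assumed flattened nested field "private.ratings";
# the filter now refers to a property inside each nested object.
sort_body = {
    "private.ratings.rating": {
        "missing": "_last",
        "order": "desc",
        "mode": "avg",
        "nested_path": "private.ratings",
        "nested_filter": {"term": {"private.ratings.listId": "56b8a0197f3c56654f8751b5"}},
    }
}
```

With the listId stored next to each rating, the filter and the sorted field live in the same nested object, which is what the sort needs in order to pick the right ratings.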

Elastic Search Grouped Queries

I'm indexing an array of key-value pairs. The key is always a UUID and the value is a user-entered value. I've been crawling through the documentation, but I can't figure out exactly how to query in this scenario.
Example schema:
{
"id": 1,
"owner_id": 1,
"values": [
{ "key": "k3kfa23rewf", "value": "the red card" },
{ "key": "23a2dd23108", "value": "purple balloons" }
]
},
{
"id": 2,
"owner_id": 1,
"values": [
{ "key": "k3kfa23rewf", "value": "the blue card" },
{ "key": "23a2dd23108", "value": "purple balloons" }
]
}
I would like to query:
{ "term": { "owner_id": 1 } },
{ "term": { "values.key": "23a2dd23108" }, "match": { "values.value": "purple" } },
{ "term": { "values.key": "k3kfa23rewf" }, "match": { "values.value": "blue" } }
So that the record with ID 2 is returned. Any suggestions?
I think you need to use nested documents here.
That way, you will be able to build a bool query with a must clause containing a term query on owner_id and two more must clauses, each a nested query combining a term query on values.key with a match query on values.value.
Does that help?
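The query described above could be built roughly like this. It is a sketch that assumes `values` is mapped as a nested field and that `values.key` is indexed without analysis, so that the term query matches the UUID exactly. The helper name `key_value_clause` is invented for the example.

```python
def key_value_clause(key, text):
    """Nested clause: the same values entry must match both key and text."""
    return {
        "nested": {
            "path": "values",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"values.key": key}},
                        {"match": {"values.value": text}},
                    ]
                }
            },
        }
    }


# Request body: owner filter plus one nested clause per key/value condition.
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"owner_id": 1}},
                key_value_clause("23a2dd23108", "purple"),
                key_value_clause("k3kfa23rewf", "blue"),
            ]
        }
    }
}
```

Each nested clause is evaluated against one key-value pair at a time, so "blue" has to appear in the value stored under key k3kfa23rewf. Against the sample data that should leave only the record with ID 2.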
