Order not working in Elasticsearch aggregation when using partitions - sorting

How can I create pagination in an Elasticsearch aggregation like the one below?
"CONTACT_ID" is not unique across my documents, since I have documents with the same ID split into several parts.
So I have to use an aggregation to paginate and top_hits to fetch the documents, but when I use partitions the order is not correct.
I have an aggregation like this:
"aggregations": {
"CONTACT_ID": {
"terms": {
"field": "CONTACT_ID",
"size": 1000,
"order": {
"_key": "desc"
}
},
"aggregations": {
"DOCUMENTS": {
"top_hits": {
"from": 0,
"size": 1,
"version": false,
"seq_no_primary_term": false,
"explain": false,
"sort": [
{
"_id": {
"order": "desc"
}
}
]
}
}
}
}
}
with the output
[CONTACT_ID] => 367884276
[CONTACT_ID] => 367884262
[CONTACT_ID] => 367884240
[CONTACT_ID] => 367884190
[CONTACT_ID] => 367884178
[CONTACT_ID] => 367884166
[CONTACT_ID] => 367884164
[CONTACT_ID] => 367884142
[CONTACT_ID] => 367884140
[CONTACT_ID] => 367884128
[CONTACT_ID] => 367884106
[CONTACT_ID] => 367884104
[CONTACT_ID] => 367884092
[CONTACT_ID] => 367884090
[CONTACT_ID] => 367884088
[CONTACT_ID] => 367884034
[CONTACT_ID] => 367884032
[CONTACT_ID] => 367884008
All documents are sorted by ID.
But when I use a partition, the ordering no longer works:
"terms": {
"field": "CONTACT_ID",
"size": 1000,
"order": {
"_key": "desc"
},
"include": {
"partition": 0,
"num_partitions": 2
}
},
The output is:
[CONTACT_ID] => 367884276
[CONTACT_ID] => 367884262
[CONTACT_ID] => 367884240
**[CONTACT_ID] => 367884166** --> out of order
[CONTACT_ID] => 367884142
[CONTACT_ID] => 367884140
[CONTACT_ID] => 367884128
[CONTACT_ID] => 367884106
[CONTACT_ID] => 367884104
[CONTACT_ID] => 367884090
[CONTACT_ID] => 367884034
[CONTACT_ID] => 367884008
The documents with
[CONTACT_ID] => 367884190
[CONTACT_ID] => 367884178
end up in my next partition, and the order is not correct.
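(A hedged note, not part of the original question: the terms aggregation assigns each key to a partition by hashing it, so a partition is not a contiguous slice of the sorted keys; each partition is ordered internally, but keys that sort between two members of partition 0 may live in partition 1. If strictly key-ordered pages are needed, one possible alternative is a composite aggregation with an after cursor, sketched below assuming Elasticsearch 6.1+; the aggregation name CONTACT_ID_PAGE is arbitrary, the field names are taken from the question, and this is untested against the poster's index.)
"aggregations": {
  "CONTACT_ID_PAGE": {
    "composite": {
      "size": 1000,
      "sources": [
        { "CONTACT_ID": { "terms": { "field": "CONTACT_ID", "order": "desc" } } }
      ]
    },
    "aggregations": {
      "DOCUMENTS": {
        "top_hits": {
          "size": 1,
          "sort": [ { "_id": { "order": "desc" } } ]
        }
      }
    }
  }
}
On each following request, the last returned key would be passed back as "after": { "CONTACT_ID": <last key> } inside the composite block to get the next page in the same order.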

Related

Logstash: Is there a way to change some of the properties in a document while migrating?

I have been migrating some indexes from a self-hosted Elasticsearch cluster to Amazon Elasticsearch Service using Logstash. While migrating the documents, we need to change some field names in the index based on some logic.
Our Logstash config file:
input {
  elasticsearch {
    hosts => ["https://staing-example.com:443"]
    user => "userName"
    password => "password"
    index => "testingindex"
    size => 100
    scroll => "1m"
  }
}
filter {
}
output {
  amazon_es {
    hosts => ["https://example.us-east-1.es.amazonaws.com:443"]
    region => "us-east-1"
    aws_access_key_id => "access_key_id"
    aws_secret_access_key => "access_key_id"
    index => "testingindex"
  }
  stdout {
    codec => rubydebug
  }
}
Here is one of the documents from testingindex in our self-hosted Elasticsearch:
{
  "uniqueIdentifier" => "e32d331b-ce5f-45c8-beca-b729707fca48",
  "createdDate" => 1527592562743,
  "interactionInfo" => [
    {
      "value" => "Hello this is testing",
      "title" => "msg",
      "interactionInfoId" => "8c091cb9-e51b-42f2-acad-79ad1fe685d8"
    },
    {
      **"value"** => """"{"edited":false,"imgSrc":"asdfadf/soruce","cont":"Collaborated in <b class=\"mention\" gid=\"4UIZjuFzMXiu2Ege6cF3R4q8dwaKb9pE\">#2222222</b> ","chatMessageObjStr":"Btester has quoted your feed","userLogin":"test.comal#google.co","userId":"tester123"}"""",
      "title" => "msgMeta",
      "interactionInfoId" => "f6c7203b-2bde-4cc9-a85e-08567f082af3"
    }
  ],
  "componentId" => "compId",
  "status" => [
    "delivered"
  ],
  "accountId" => "test123",
  "applicationId" => "appId"
}
This is what we expect when the documents get migrated to Amazon Elasticsearch Service:
{
  "uniqueIdentifier" => "e32d331b-ce5f-45c8-beca-b729707fca48",
  "createdDate" => 1527592562743,
  "interactionInfo" => [
    {
      "value" => "Hello this is testing",
      "title" => "msg",
      "interactionInfoId" => "8c091cb9-e51b-42f2-acad-79ad1fe685d8"
    },
    {
      **"value-keyword"** => """"{"edited":false,"imgSrc":"asdfadf/soruce","cont":"Collaborated in <b class=\"mention\" gid=\"4UIZjuFzMXiu2Ege6cF3R4q8dwaKb9pE\">#2222222</b> ","chatMessageObjStr":"Btester has quoted your feed","userLogin":"test.comal#google.co","userId":"tester123"}"""",
      "title" => "msgMeta",
      "interactionInfoId" => "f6c7203b-2bde-4cc9-a85e-08567f082af3"
    }
  ],
  "componentId" => "compId",
  "status" => [
    "delivered"
  ],
  "accountId" => "test123",
  "applicationId" => "appId"
}
What we need is to rename the "value" field to "value-keyword" wherever we find JSON-formatted content. Is there any filter in Logstash to achieve this?
As documented on the Logstash website:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-rename
you can use the mutate filter with its rename option.
For example:
filter {
  mutate {
    rename => { "old-field" => "new-field" }
  }
}
For nested fields, you can just pass the path of the field:
filter {
  mutate {
    rename => { "[interactionInfo][value]" => "[interactionInfo][value-keyword]" }
  }
}
Try adding this to your filter:
filter {
  ruby {
    code => "event.get('interactionInfo').each { |item| if item['value'].match(/{.+}/) then item['value-keyword'] = item.delete('value') end }"
  }
}

Elasticsearch: Is it possible to sort collapsed results by a nested field?

I have a fairly complex mapping that stores products; each document contains a nested array of pre-calculated prices for each customer.
There may be multiple versions of each product in the index (with unique codes). Alternative products are grouped by a common xrefs_hash. The query I'm writing needs to select the best product for each customer (i.e. aggregate/collapse on the xrefs_hash), and then select the top product based on the value of the prices.weight nested field.
The prices.weight field is a float which we've pre-calculated based on the shops' customer settings on how they want to prioritise their own items. A hash is created from these settings (stored in prices.pricing_hash) so that we can store a single set of pricing if multiple customers share the same settings.
The index contains up to 300,000 products and can end up with ~100,000,000 documents once all prices are calculated and inserted.
The mapping looks something like this (shortened for brevity):
'mappings' => [
  '_source' => [
    'enabled' => true,
  ],
  'dynamic' => false,
  'properties' => [
    'dealer_item_id' => [
      'type' => 'integer',
    ],
    'code' => [
      'type' => 'text',
      'analyzer' => 'custom_code_analyzer',
      'fields' => [
        'raw' => [
          'type' => 'keyword',
        ],
      ],
    ],
    'xrefs' => [
      'type' => 'text',
      'analyzer' => 'custom_code_analyzer',
      'fields' => [
        'raw' => [
          'type' => 'keyword',
        ],
      ],
    ],
    'xrefs_hash' => [
      'type' => 'keyword',
    ],
    'title' => [
      'type' => 'text',
      'analyzer' => 'custom_english_analyzer',
      'fields' => [
        'ngram_title' => [
          'type' => 'text',
          'analyzer' => 'custom_title_analyzer',
        ],
        'raw' => [
          'type' => 'keyword',
        ],
      ],
    ],
    ...
    'prices' => [
      'type' => 'nested',
      'dynamic' => false,
      'properties' => [
        'pricing_hash' => [
          'type' => 'keyword',
          'index' => true,
        ],
        'unit_price' => [
          'type' => 'float',
          'index' => true,
        ],
        'pricebreaks' => [
          'type' => 'object',
          'dynamic' => false,
          'properties' => [
            'quantity' => [
              'type' => 'integer',
              'index' => false,
            ],
            'price' => [
              'type' => 'integer',
              'index' => false,
            ],
          ],
        ],
        'weight' => [
          'type' => 'float',
          'index' => true,
        ],
      ],
    ],
  ],
],
Example documents:
{
  "dealer_item_id": 122023,
  "code": "ABC123A",
  "xrefs": [
    "ABC123A",
    "ABC123B"
  ],
  "title": "Product A",
  "xrefs_hash": "16d5415674c8365f63329b11ffc88da109590cec",
  "prices": [
    {
      "pricebreaks": [
        {
          "quantity": 1,
          "price": 9.75,
          "contract": false
        }
      ],
      "weight": 0.20512820512820512,
      "pricing_hash": "aabe06b7",
      "unit_price": 9.75
    },
    {
      "pricebreaks": [
        {
          "quantity": 1,
          "price": 9.75,
          "contract": false
        }
      ],
      "weight": 0.20512820512820512,
      "pricing_hash": "73643f3b",
      "unit_price": 9.75
    }
  ]
},
{
  "dealer_item_id": 124293,
  "code": "ABC1234B",
  "xrefs": [
    "ABC123A",
    "ABC123B"
  ],
  "title": "Product B",
  "xrefs_hash": "16d5415674c8365f63329b11ffc88da109590cec",
  "prices": [
    {
      "contract_item": false,
      "pricebreaks": [
        {
          "quantity": 1,
          "price": 7.39,
          "contract": false
        }
      ],
      "weight": 0.33829499323410017,
      "pricing_hash": "aabe06b7",
      "unit_price": 7.39
    },
    {
      "pricebreaks": [
        {
          "quantity": 1,
          "price": 9.75,
          "contract": false
        }
      ],
      "weight": 0.20512820512820512,
      "pricing_hash": "73643f3b",
      "unit_price": 9.75
    }
  ]
},
Example query:
{
  "track_total_hits": 100000,
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "prices",
                "score_mode": "none",
                "inner_hits": {
                  "_source": {
                    "include": [
                      "prices"
                    ]
                  }
                },
                "query": {
                  "bool": {
                    "must": [
                      {
                        "term": {
                          "prices.pricing_hash": "aabe06b7"
                        }
                      }
                    ]
                  }
                }
              }
            },
            {
              "term": {
                "code.raw": "RX58022"
              }
            }
          ],
          "must_not": [
            {
              "term": {
                "disabled": true
              }
            }
          ]
        }
      }
    }
  },
  "_source": {
    "includes": [
      "code",
      "dealer_item_id",
      "title",
      "xrefs"
    ]
  },
  "collapse": {
    "field": "xrefs_hash",
    "inner_hits": {
      "name": "best_xrefs",
      "sort": {
        "prices.weight": "desc"
      },
      "size": 1
    }
  },
  "aggregations": {
    "xrefs_count": {
      "cardinality": {
        "field": "xrefs_hash",
        "precision_threshold": 40000
      }
    }
  }
}
I have tried using a collapse query to select the best product, but this does not seem to support sorting by the nested prices.weight field.
I've also tried aggregating based on the xrefs_hash, but this seems to make pagination at the category level impossible.
The above example query almost works, but it does not return the collapsed results in the correct order. When inspecting the query, it seems to replace the collapse sort value with Infinity, which apparently Elasticsearch does when a document does not contain the sort field.
So what I'm wondering is: is it possible to
Return one document per unique xrefs_hash value
Return the specific document with the highest prices.weight value, matching the customer's pricing_hash
Also make this work with pagination
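(A hedged suggestion, not from the original post: the Infinity sort values usually mean the sort has no nested context, so Elasticsearch cannot resolve prices.weight on the root document. The standard sort syntax accepts a nested block with a path and an optional filter, and inner_hits sorting uses the same sort syntax, so one thing worth trying is the collapse section sketched below; the pricing_hash value is the one from the example query, and this is untested against the poster's index.)
"collapse": {
  "field": "xrefs_hash",
  "inner_hits": {
    "name": "best_xrefs",
    "size": 1,
    "sort": [
      {
        "prices.weight": {
          "order": "desc",
          "nested": {
            "path": "prices",
            "filter": {
              "term": { "prices.pricing_hash": "aabe06b7" }
            }
          }
        }
      }
    ]
  }
}
The same nested sort block could also be used as the top-level "sort" so that the collapsed groups themselves come back in weight order, which would keep from/size pagination working.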

Elasticsearch should operator on multiple fields

Currently, I am trying to query Elasticsearch with a should clause on multiple fields along with a must clause on one field.
In SQL I would write this query:
SELECT * FROM test where ( titleName='Business' OR titleName='Bear') AND (status='Draft' OR status='Void') AND creator='bob'
I tried this:
$params = [
  'index' => myindex,
  'type' => mytype,
  'body' => [
    "from" => 0,
    "size" => 1000,
    'query' => [
      'bool' => [
        'must' => [
          'bool' => [
            'should' => [
              ['match' => ['titleName' => 'Business']],
              ['match' => ['titleName' => 'Bear']]
            ]
          ]
        ],
        'should' => [
          ['match' => ['status' => 'Draft']],
          ['match' => ['status' => 'Void']]
        ],
        'must' => [
          ['match' => ['creator' => 'bob']]
        ]
      ]
    ]
  ]
];
The above query works with a single status field or a single titleName field, but it does not work with both fields together.
Does anyone have a solution?
You need to AND both of your bool/should pairs. This should work:
$params = [
  'index' => myindex,
  'type' => mytype,
  'body' => [
    "from" => 0,
    "size" => 1000,
    'query' => [
      'bool' => [
        'must' => [
          [
            'bool' => [
              'should' => [
                ['match' => ['titleName' => 'Business']],
                ['match' => ['titleName' => 'Bear']]
              ]
            ]
          ],
          [
            'bool' => [
              'should' => [
                ['match' => ['status' => 'Draft']],
                ['match' => ['status' => 'Void']]
              ]
            ]
          ],
          [
            'match' => ['creator' => 'bob']
          ]
        ]
      ]
    ]
  ]
];
You can write your query something like this: add a must clause, and put your should clauses inside it.
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "bool": {
                "should": [
                  { "term": { "titleName": "business" } },
                  { "term": { "titleName": "bear" } }
                ]
              }
            },
            {
              "bool": {
                "should": [
                  { "term": { "status": "draft" } },
                  { "term": { "status": "void" } }
                ]
              }
            },
            {
              "bool": {
                "must": [
                  { "term": { "creator": "bob" } }
                ]
              }
            }
          ]
        }
      }
    }
  },
  "from": 0,
  "size": 25
}

How can I update the ids field with this RethinkDB document structure?

I'm having trouble trying to update the ids field in this document structure:
[
  [0] {
    "rank" => nil,
    "profile_id" => 3,
    "daily_providers" => [
      [0] {
        "relationships" => [
          [0] {
            "relationship_type" => "friend",
            "count" => 0
          },
          [1] {
            "relationship_type" => "acquaintance",
            "ids" => [],
            "count" => 0
          }
        ],
        "countries" => [
          [0] {
            "country_name" => "United States",
            "count" => 0
          },
          [1] {
            "country_name" => "Great Britain",
            "count" => 0
          }
        ],
        "provider_name" => "foo",
        "date" => 20130912
      },
      [1] {
        "provider_name" => "bar"
      }
    ]
  }
]
In JavaScript, you can do
r.db('test').table('test').get(3).update(function(doc) {
  return {daily_providers: doc("daily_providers").changeAt(
    0,
    doc("daily_providers").nth(0).merge({
      relationships: doc("daily_providers").nth(0)("relationships").changeAt(
        1,
        doc("daily_providers").nth(0)("relationships").nth(1).merge({
          ids: [1]
        })
      )
    })
  )}
})
Which becomes in Ruby
r.db('test').table('test').get(3).update{ |doc|
  { "daily_providers" => doc["daily_providers"].change_at(
    0,
    doc["daily_providers"][0].merge({
      "relationships" => doc["daily_providers"][0]["relationships"].change_at(
        1,
        doc["daily_providers"][0]["relationships"][1].merge({
          "ids" => [1]
        })
      )
    })
  )}
}
You should probably have another table for the daily providers and do joins.
That would make things much simpler.

Elasticsearch: limit results per type

I have the following query:
$queryDefinition = [
  'query' => [
    'bool' => [
      'must' => [
        [
          'query_string' => [
            'default_field' => '_all',
            'query' => $term
          ]
        ]
      ],
      'must_not' => [],
      'should' => []
    ],
  ],
  //'filter' => [
  //  'limit' => ['value' => 3],
  //],
  'from' => 0,
  'size' => 50,
  'sort' => [],
  'facets' => [
    'types' => [
      'terms' => ['field' => '_type']
    ]
  ]
];
In the index we have 5 types, and I would like to show only 3 results from each type, for autocomplete. When I set the filter limit, only the results from the first type are limited; for the other types I get all the results.
How can I do this?
Thanks
{
  "aggregations": {
    "types": {
      "terms": {
        "field": "_type"
      },
      "aggregations": {
        "hits": {
          "top_hits": {
            "size": 3
          }
        }
      }
    }
  }
}
If you're using Elasticsearch 1.4.0 or later, you can use a terms aggregation on the "_type" field with a top_hits sub-aggregation, setting its size to 3 (in your case).
Hope that helps.
