I have the following data:
{
"_id": {
"$oid": "5e8e1be5a78b4479443eae43"
},
"vehicle_component_id": 3,
"damages_types": [
1,
7
],
"mileages": [
"0",
"0",
"0",
"0",
"0",
"0",
"0",
"0",
"0",
"0"
],
"damages": {
"1": {
"upper_bound": [
null
],
"damages": [
"0"
],
"lower_bound": [
null
]
},
"7": {
"upper_bound": [
null
],
"damages": [
"0"
],
"lower_bound": [
null
]
}
},
"created_at": "2020-04-08 20:45:57",
"updated_at": "2020-04-08 20:45:57"
}
I am trying to get the size (length) of the mileages array. The mongo query is:
db.vehicle_component_damages_values.aggregate(
{ $match: { vehicle_component_id: 3} },
{$project: { count: { $size:"$mileages" }}}
)
It returns 10 as expected, but when I try the same thing with jenssegers/laravel-mongodb like this:
$query = DB::connection($this->mongoCon)->collection($this->table)->raw(function($collection) use ($vehicleComponentId) {
return $collection->aggregate([
['$match' => ['vehicle_component_id' => $vehicleComponentId]],
[
'$project' => [
'count' => [
'$size' => '$mileages'
]
]
],
]);
});
it returns a cursor instead of the count:
MongoDB\Driver\Cursor {#550}
So my final code for this is:
public function findMileagesSizeByVehicleComponentId($vehicleComponentId)
{
try {
$cursor = DB::connection($this->mongoCon)->collection($this->table)->raw(function ($collection) use (
$vehicleComponentId
) {
return $collection->aggregate([
['$match' => ['vehicle_component_id' => $vehicleComponentId]],
[
'$project' => [
'count' => [
'$size' => '$mileages',
],
],
],
]);
});
$result = $cursor->toArray();
if (!empty($result)) {
if (sizeof($result) == 1) {
return iterator_to_array($result[0])['count'];
} else {
throw new ValidationError("Result has multiple counts please check your query.");
}
} else {
return 0;
}
} catch (\Throwable $e) {
// \Throwable also covers \Exception, so a single catch block is enough.
return Error::handel($e);
}
}
I have a fairly complex mapping that stores products, and each document contains a nested array of pre-calculated prices, one entry per customer pricing configuration.
There may be multiple versions of each product in the index (with unique codes). Alternative products are grouped by a common xrefs_hash. The query I'm writing needs to select the best product for each customer (i.e. aggregate/collapse on the xrefs_hash), and then select the top product based on the value of the prices.weight nested field.
The prices.weight field is a float which we've pre-calculated based on the shops' customer settings on how they want to prioritise their own items. A hash is created from these settings (stored in prices.pricing_hash) so that we can store a single set of pricing if multiple customers share the same settings.
The index contains up to 300,000 products and can end up with ~100,000,000 documents once all prices are calculated and inserted.
The mapping looks something like this (shortened for brevity):
'mappings' => [
'_source' => [
'enabled' => true,
],
'dynamic' => false,
'properties' => [
'dealer_item_id' => [
'type' => 'integer',
],
'code' => [
'type' => 'text',
'analyzer' => 'custom_code_analyzer',
'fields' => [
'raw' => [
'type' => 'keyword',
],
],
],
'xrefs' => [
'type' => 'text',
'analyzer' => 'custom_code_analyzer',
'fields' => [
'raw' => [
'type' => 'keyword',
],
],
],
'xrefs_hash' => [
'type' => 'keyword',
],
'title' => [
'type' => 'text',
'analyzer' => 'custom_english_analyzer',
'fields' => [
'ngram_title' => [
'type' => 'text',
'analyzer' => 'custom_title_analyzer',
],
'raw' => [
'type' => 'keyword',
],
],
],
...
'prices' => [
'type' => 'nested',
'dynamic' => false,
'properties' => [
'pricing_hash' => [
'type' => 'keyword',
'index' => true,
],
'unit_price' => [
'type' => 'float',
'index' => true,
],
'pricebreaks' => [
'type' => 'object',
'dynamic' => false,
'properties' => [
'quantity' => [
'type' => 'integer',
'index' => false,
],
'price' => [
'type' => 'integer',
'index' => false,
],
],
],
'weight' => [
'type' => 'float',
'index' => true,
],
],
],
],
],
Example documents:
{
"dealer_item_id": 122023,
"code": "ABC123A",
"xrefs": [
"ABC123A",
"ABC123B"
],
"title": "Product A",
"xrefs_hash": "16d5415674c8365f63329b11ffc88da109590cec",
"prices": [
{
"pricebreaks": [
{
"quantity": 1,
"price": 9.75,
"contract": false
}
],
"weight": 0.20512820512820512,
"pricing_hash": "aabe06b7",
"unit_price": 9.75
},
{
"pricebreaks": [
{
"quantity": 1,
"price": 9.75,
"contract": false
}
],
"weight": 0.20512820512820512,
"pricing_hash": "73643f3b",
"unit_price": 9.75,
}
]
},
{
"dealer_item_id": 124293,
"code": "ABC1234B",
"xrefs": [
"ABC123A",
"ABC123B"
],
"title": "Product B",
"xrefs_hash": "16d5415674c8365f63329b11ffc88da109590cec",
"prices": [
{
"contract_item": false,
"pricebreaks": [
{
"quantity": 1,
"price": 7.39,
"contract": false
}
],
"weight": 0.33829499323410017,
"pricing_hash": "aabe06b7",
"unit_price": 7.39
},
{
"pricebreaks": [
{
"quantity": 1,
"price": 9.75,
"contract": false
}
],
"weight": 0.20512820512820512,
"pricing_hash": "73643f3b",
"unit_price": 9.75
}
]
},
Example query:
{
"track_total_hits": 100000,
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "prices",
"score_mode": "none",
"inner_hits": {
"_source": {
"include": [
"prices"
]
}
},
"query": {
"bool": {
"must": [
{
"term": {
"prices.pricing_hash": "aabe06b7"
}
}
]
}
}
}
},
{
"term": {
"code.raw": "RX58022"
}
}
],
"must_not": [
{
"term": {
"disabled": true
}
}
]
}
}
}
},
"_source": {
"includes": [
"code",
"dealer_item_id",
"title",
"xrefs"
]
},
"collapse": {
"field": "xrefs_hash",
"inner_hits": {
"name": "best_xrefs",
"sort": {
"prices.weight": "desc"
},
"size": 1
}
},
"aggregations": {
"xrefs_count": {
"cardinality": {
"field": "xrefs_hash",
"precision_threshold": 40000
}
}
}
}
I have tried using a collapse query to select the best product, but this does not seem to support sorting by the nested prices.weight field.
I've also tried aggregating on the xrefs_hash, but this seems to make pagination at the category level impossible.
The above example query almost works, but does not return the collapsed results in the correct order. When inspecting the query it seems to be replacing the collapse sort with Infinity, which apparently ES does if a document does not contain a sort field.
So what I'm wondering is whether it is possible to:
Return 1 document per unique xrefs_hash value
Return the specific document with the highest prices.weight value, matching the customer's pricing_hash
Also make this work with pagination
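One direction that may be worth testing, offered as a sketch rather than a verified answer: collapse keeps the top-sorted hit for each collapse key, so a top-level nested sort on prices.weight, filtered to the customer's pricing_hash, could pick the best product per xrefs_hash, with from/size paging over the collapsed groups. The helper name best_per_xref_query and its paging parameters are made up for illustration, and the nested sort syntax assumes Elasticsearch 6.1 or later.

```ruby
require 'json'

# Hypothetical helper: builds a search body that sorts by the customer's
# nested prices.weight and collapses on xrefs_hash.
def best_per_xref_query(pricing_hash, from: 0, size: 20)
  # Same filter is used to match documents and to pick the nested entry to sort on.
  price_filter = { term: { 'prices.pricing_hash' => pricing_hash } }
  {
    from: from,
    size: size,
    query: {
      bool: {
        filter: [
          { nested: { path: 'prices', query: price_filter } }
        ],
        must_not: [{ term: { disabled: true } }]
      }
    },
    sort: [
      {
        'prices.weight' => {
          order: 'desc',
          mode: 'max',
          nested: { path: 'prices', filter: price_filter }
        }
      }
    ],
    # Collapse keeps the first hit per xrefs_hash according to the sort above.
    collapse: { field: 'xrefs_hash' }
  }
end

puts JSON.pretty_generate(best_per_xref_query('aabe06b7', from: 0, size: 10))
```

The cardinality aggregation from the original query could still be added alongside to get the number of collapsed groups for the pager.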
I am trying to query Elasticsearch with should clauses on multiple fields combined with a must clause on one field.
With SQL I would write this query:
SELECT * FROM test where ( titleName='Business' OR titleName='Bear') AND (status='Draft' OR status='Void') AND creator='bob'
I tried this:
$params = [
'index' => myindex,
'type' => mytype,
'body' => [
"from" => 0,
"size" => 1000,
'query' => [
'bool' => [
'must' => [
'bool' => [
'should' => [
['match' => ['titleName' => 'Business']],
['match' => ['titleName' => 'Bear']]
]
],
'should' => [
['match' => ['status' => 'Draft']],
['match' => ['status' => 'Void']]
]
'must' => [
['match'] => ['creator' => 'bob']
]
]
]
]
]
];
The above query works with only the status clause or only the titleName clause, but not with both together.
Does anyone have a solution?
You need to AND both of your bool/should pairs. This should work:
$params = [
'index' => myindex,
'type' => mytype,
'body' => [
"from" => 0,
"size" => 1000,
'query' => [
'bool' => [
'must' => [
[
'bool' => [
'should' => [
['match' => ['titleName' => 'Business']],
['match' => ['titleName' => 'Bear']]
]
]
],
[
'bool' => [
'should' => [
['match' => ['status' => 'Draft']],
['match' => ['status' => 'Void']]
]
]
],
[
'match' => ['creator' => 'bob']
]
]
]
]
]
];
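The same shape, (A OR B) AND (C OR D) AND E, can also be built programmatically. Below is a minimal sketch in Ruby; the any_of and all_of helper names are invented for illustration and are not part of any Elasticsearch client. The explicit minimum_should_match is there only to make the OR semantics visible.

```ruby
require 'json'

# Hypothetical helper: one OR group, i.e. field matches any of the values.
def any_of(field, values)
  {
    bool: {
      should: values.map { |v| { match: { field => v } } },
      minimum_should_match: 1
    }
  }
end

# Hypothetical helper: AND together any number of clauses.
def all_of(*clauses)
  { bool: { must: clauses } }
end

query = all_of(
  any_of('titleName', %w[Business Bear]),
  any_of('status', %w[Draft Void]),
  { match: { 'creator' => 'bob' } }
)

puts JSON.pretty_generate({ query: query })
```

Each OR group becomes its own bool/should entry inside the outer must array, which is what the SQL parentheses express.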
You can write your query like this: add a must clause and put each should group inside it as its own bool.
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "bool": {
                "should": [
                  { "term": { "titleName": "business" } },
                  { "term": { "titleName": "bear" } }
                ]
              }
            },
            {
              "bool": {
                "should": [
                  { "term": { "status": "draft" } },
                  { "term": { "status": "void" } }
                ]
              }
            },
            {
              "bool": {
                "must": [
                  { "term": { "creator": "bob" } }
                ]
              }
            }
          ]
        }
      }
    }
  },
  "from": 0,
  "size": 25
}
I need to update this query to be ES 5.5 compatible (NOT and AND are deprecated).
:and => [
{
term: {
type: 'group'
}
},
{
:not => {
terms: {
group_id: group_ids
}
}
},
{
:not => {
terms: {
user_id: user_ids
}
}
}
]
Try this:
:bool => {
  :filter => [
    {
      term: {
        type: 'group'
      }
    }
  ],
  :must_not => [
    {
      terms: {
        group_id: group_ids
      }
    },
    {
      terms: {
        user_id: user_ids
      }
    }
  ]
}
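If there are many such filters to migrate, the rewrite can be mechanised. The following is a rough sketch, assuming the old filter is an array of clauses where negated ones are wrapped in a :not key; and_to_bool is a made-up helper name and the id lists are placeholder values.

```ruby
# Hypothetical converter for the deprecated `and` / `not` filter shape.
def and_to_bool(clauses)
  # Split the clauses into negated ones ({ not: ... }) and plain ones.
  negated, positive = clauses.partition { |c| c.key?(:not) }
  bool = {}
  bool[:filter] = positive unless positive.empty?
  bool[:must_not] = negated.map { |c| c[:not] } unless negated.empty?
  { bool: bool }
end

group_ids = [1, 2]    # placeholder values
user_ids  = [10, 20]  # placeholder values

old_and_filter = [
  { term: { type: 'group' } },
  { not: { terms: { group_id: group_ids } } },
  { not: { terms: { user_id: user_ids } } }
]

converted = and_to_bool(old_and_filter)
p converted
```

Plain clauses go into filter (no scoring), and the inner query of each :not wrapper goes into must_not.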
I'm having trouble updating the ids field in the following document structure:
[
[0] {
"rank" => nil,
"profile_id" => 3,
"daily_providers" => [
[0] {
"relationships" => [
[0] {
"relationship_type" => "friend",
"count" => 0
},
[1] {
"relationship_type" => "acquaintance",
"ids" => [],
"count" => 0
}
],
"countries" => [
[0] {
"country_name" => "United States",
"count" => 0
},
[1] {
"country_name" => "Great Britain",
"count" => 0
}
],
"provider_name" => "foo",
"date" => 20130912
},
[1] {
"provider_name" => "bar"
}
]
}
]
In JavaScript, you can do
r.db('test').table('test').get(3).update(function(doc) {
return {daily_providers: doc("daily_providers").changeAt(
0,
doc("daily_providers").nth(0).merge({
relationships: doc("daily_providers").nth(0)("relationships").changeAt(
1,
doc("daily_providers").nth(0)("relationships").nth(1).merge({
ids: [1]
})
)
})
)}
})
In Ruby, this becomes:
r.db('test').table('test').get(3).update{ |doc|
  {"daily_providers" => doc["daily_providers"].change_at(
    0,
    doc["daily_providers"][0].merge({
      "relationships" => doc["daily_providers"][0]["relationships"].change_at(
        1,
        doc["daily_providers"][0]["relationships"][1].merge({
          "ids" => [1]
        })
      )
    })
  )}
}
You should probably have a separate table for the daily providers and use joins.
That would make things much simpler.
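To see what the nested update above does without a database, here is a plain-Ruby sketch on an in-memory hash. The change_at function below is a local stand-in written for this example (it returns a copy of the array with one index replaced); it is not the RethinkDB driver method, and the document is trimmed to the fields that matter.

```ruby
# Local stand-in for the array operation: copy the array, replace one index.
def change_at(array, index, value)
  copy = array.dup
  copy[index] = value
  copy
end

doc = {
  'profile_id' => 3,
  'daily_providers' => [
    {
      'provider_name' => 'foo',
      'relationships' => [
        { 'relationship_type' => 'friend', 'count' => 0 },
        { 'relationship_type' => 'acquaintance', 'ids' => [], 'count' => 0 }
      ]
    },
    { 'provider_name' => 'bar' }
  ]
}

providers = doc['daily_providers']
relationships = providers[0]['relationships']

# Rebuild each level from the inside out, replacing only the touched index.
updated = doc.merge({
  'daily_providers' => change_at(
    providers, 0,
    providers[0].merge({
      'relationships' => change_at(
        relationships, 1,
        relationships[1].merge({ 'ids' => [1] })
      )
    })
  )
})

p updated['daily_providers'][0]['relationships'][1]
```

Every level has to be rebuilt because only the innermost field changes, which is the verbosity that a separate table would avoid.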