I have ~7 million docs in a bucket and I am struggling to write the right query/index combination to keep it from running for more than 5 seconds.
Here is a similar scenario to the one I am trying to solve:
I have multiple coffee shops, each making coffee with different container/lid combos. These field keys also differ between doc types. With each sale that is generated, I keep track of these combos.
Here are a few example docs:
[{
"shopId": "x001",
"date": "2022-01-01T08:49:00Z",
"cappuccinoContainerId": "a001",
"cappuccinoLidId": "b001"
},
{
"shopId": "x001",
"date": "2022-01-02T08:49:00Z",
"latteContainerId": "a002",
"latteLidId": "b002"
},
{
"shopId": "x001",
"date": "2022-01-02T08:49:00Z",
"espressoContainerId": "a003",
"espressoLidId": "b003"
},
{
"shopId": "x002",
"date": "2022-01-01T08:49:00Z",
"cappuccinoContainerId": "a001",
"cappuccinoLidId": "b001"
},
{
"shopId": "x002",
"date": "2022-01-02T08:49:00Z",
"latteContainerId": "a002",
"latteLidId": "b002"
},
{
"shopId": "x002",
"date": "2022-01-02T08:49:00Z",
"espressoContainerId": "a003",
"espressoLidId": "b003"
}]
What I need to get out of the query is the following:
[{
"shopId": "x001",
"day": "2022-01-01",
"uniqueContainersLidsCombined": 2
},
{
"shopId": "x001",
"day": "2022-01-01",
"uniqueContainersLidsCombined": 4
},
{
"shopId": "x002",
"day": "2022-01-01",
"uniqueContainersLidsCombined": 2
},
{
"shopId": "x002",
"day": "2022-01-01",
"uniqueContainersLidsCombined": 4
}]
That is, I want the total number of unique containers and lids combined per shop per day.
I have tried using composite, adaptive, and FTS indexes, but I am unable to figure this one out.
Does anybody have a different suggestion? Can someone please help?
CREATE INDEX ix1 ON default(shopId, DATE_FORMAT_STR(date,"1111-11-11"), [cappuccinoContainerId, cappuccinoLidId]);
If you are using EE and shopId is immutable, add PARTITION BY HASH(shopId) to the above index definition (with a higher number of partitions).
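A sketch of what that partitioned variant could look like (the number of partitions, 16 here, is only a placeholder; size it for your cluster):
CREATE INDEX ix1 ON default(shopId, DATE_FORMAT_STR(date,"1111-11-11"), [cappuccinoContainerId, cappuccinoLidId])
PARTITION BY HASH(shopId)
WITH {"num_partition": 16};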
SELECT d.shopId,
DATE_FORMAT_STR(d.date,"1111-11-11") AS day,
COUNT(DISTINCT [d.cappuccinoContainerId, d.cappuccinoLidId]) AS uniqueContainersLidsCombined
FROM default AS d
WHERE d.shopId IS NOT NULL
GROUP BY d.shopId, DATE_FORMAT_STR(d.date,"1111-11-11");
Adjust the index key order of shopId and day based on the query predicates.
https://blog.couchbase.com/understanding-index-grouping-aggregation-couchbase-n1ql-query/
Update:
Based on the EXPLAIN output, you have a date predicate and all shopIds, so use the following index:
CREATE INDEX ix2 ON default( DATE_FORMAT_STR(date,"1111-11-11"), shopId, [cappuccinoContainerId, cappuccinoLidId]);
As you need the DISTINCT of cappuccinoContainerId and cappuccinoLidId, store them as a single index key (an array of 2 elements): [cappuccinoContainerId, cappuccinoLidId]. The advantage is that you can reference this key directly in COUNT(DISTINCT ...), which allows index aggregation. (Do not use DISTINCT inside the index definition; that turns it into an array index and things will not work as expected.)
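For reference, a sketch of the query shape that ix2 targets, assuming a date-range predicate (the range below is a placeholder; substitute your real one):
SELECT d.shopId,
DATE_FORMAT_STR(d.date,"1111-11-11") AS day,
COUNT(DISTINCT [d.cappuccinoContainerId, d.cappuccinoLidId]) AS uniqueContainersLidsCombined
FROM default AS d
WHERE DATE_FORMAT_STR(d.date,"1111-11-11") BETWEEN "2022-01-01" AND "2022-01-31"
GROUP BY d.shopId, DATE_FORMAT_STR(d.date,"1111-11-11");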
I assume
That the cup types and lid types can be used for any drink type.
That you don't want to add any precomputed stuff to your data.
Perhaps an index like this would work (my collection keyspace is bulk.sales.amer); note I am not sure if it performs better or worse (or even whether it is equivalent) compared with the solution posted by vsr:
CREATE INDEX `adv_shopId_concat_nvls`
ON `bulk`.`sales`.`amer`(
`shopId` INCLUDE MISSING,
(
nvl(`cappuccinoContainerId`, "") ||
nvl(`cappuccinoLidId`, "") ||
nvl(`latteContainerId`, "") ||
nvl(`latteLidId`, "") ||
nvl(`espressoContainerId`, "") ||
nvl(`espressoLidId`, "")),substr0(`date`, 0, 10)
)
And then, using the covered index above, run your query like this:
SELECT
shopId,
CONCAT(
NVL(cappuccinoContainerId,""),
NVL(cappuccinoLidId,""),
NVL(latteContainerId,""),
NVL(latteLidId,""),
NVL(espressoContainerId,""),
NVL(espressoLidId,"")
) AS uniqueContainersLidsCombined,
SUBSTR(date,0,10) AS day,
COUNT(*) AS cnt
FROM `bulk`.`sales`.`amer`
GROUP BY
shopId,
CONCAT(
NVL(cappuccinoContainerId,""),
NVL(cappuccinoLidId,""),
NVL(latteContainerId,""),
NVL(latteLidId,""),
NVL(espressoContainerId,""),
NVL(espressoLidId,"")
),
SUBSTR(date,0,10)
Note I used the following 16 lines of data:
{"amer":"amer","date":"2022-01-01T08:49:00Z","cappuccinoContainerId":"a001","cappuccinoLidId":"b001","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-01T08:49:00Z","cappuccinoContainerId":"a001","cappuccinoLidId":"b001","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-02T08:49:00Z","latteContainerId":"a002","latteLidId":"b002","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-02T08:49:00Z","latteContainerId":"a002","latteLidId":"b002","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-02T08:49:00Z","espressoContainerId":"a003","espressoLidId":"b003","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-02T08:49:00Z","espressoContainerId":"a003","espressoLidId":"b003","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T08:49:00Z","cappuccinoContainerId":"a007","cappuccinoLidId":"b004","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-03T08:49:00Z","cappuccinoContainerId":"a007","cappuccinoLidId":"b004","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T08:49:00Z","latteContainerId":"a007","latteLidId":"b004","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-03T08:49:00Z","latteContainerId":"a007","latteLidId":"b004","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T01:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-03T02:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T03:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T04:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T05:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T06:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x002"}
Applying some sorting by wrapping the above query with
SELECT T1.* FROM
(
-- paste above --
) AS T1
ORDER BY T1.day, T1.shopId, T1.uniqueContainersLidsCombined
We get
cnt day shopId uniqueContainersLidsCombined
1 "2022-01-01" "x001" "a001b001"
1 "2022-01-01" "x002" "a001b001"
1 "2022-01-02" "x001" "a002b002"
1 "2022-01-02" "x001" "a003b003"
1 "2022-01-02" "x002" "a002b002"
1 "2022-01-02" "x002" "a003b003"
1 "2022-01-03" "x001" "a007b005"
2 "2022-01-03" "x001" "a007b004"
2 "2022-01-03" "x002" "a007b004"
5 "2022-01-03" "x002" "a007b005"
If you still don't get the performance you need, you could possibly use the Eventing service to do a continuous map/reduce and an occasional update query to make sure things stay perfectly in sync.
I want to extract the subtree containing the focused window from i3-msg -t get_tree with jq. I know the focused window can be found with
i3-msg -t get_tree | jq ".. | (.nodes? // empty)[] | select(.focused == true)"
A simple example would be:
{
"node": [
{
"node": {
"foo": "bar"
}
},
{
"node": {
"foo": "foo"
}
}
]
}
And the output, if searching for a node containing .foo == "bar", should be:
{
"node": [
{
"node": {
"foo": "bar"
}
}
]
}
But I can't seem to find a proper method to extract the subtree spanning from the root to this node.
.node |= map(select(.node.foo == "bar"))
This concept is referred to as update assignment (|=).
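For example, assuming the sample above is saved as input.json, running
jq '.node |= map(select(.node.foo == "bar"))' input.json
produces exactly the desired output from the question: the update assignment replaces the node array with the filtered copy returned by map/select.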
The original question has two distinct sub-questions, one related to the use of .. without reference to a posted JSON sample, and the other based on a specific type of JSON input. This response uses a strategy based on using paths along the lines of:
reduce pv as [$p,$v] (null; setpath($p; $v))
This may or may not handle arrays as desired. If null values in arrays are not wanted in general, then adding a call to walk/1 as follows would be appropriate:
walk(if type == "array" then map(select(. != null)) else . end)
Alternatively, if the null values that are present in the original must be preserved, the strategy detailed in the Appendix below may be used.
(1) Problem characterized by using ..
def pv:
  paths as $p
  | getpath($p)
  | . as $v
  | (.nodes? // empty)[] | select(.focused == true)
  | [$p,$v];
reduce pv as [$p,$v] (null; setpath($p; $v))
As mentioned above, to eliminate all the nulls in all arrays, you could tack on a call to walk/1. Otherwise, if the null values present in the original must be preserved while removing those inserted into arrays by setpath, see the Appendix below.
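Concretely, with pv defined as above, the combined program (the two snippets chained together) would be:
reduce pv as [$p,$v] (null; setpath($p; $v))
| walk(if type == "array" then map(select(. != null)) else . end)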
(2) For the sample JSON, the following suffices:
def pv:
  paths as $p
  | getpath($p)
  | . as $v
  | (.node? // empty) | select(.foo == "bar")
  | [$p,$v];
reduce pv as [$p,$v] (null; setpath($p; $v))
For the given sample, this produces:
{"node":[{"node":{"foo":"bar"}}]}
For similar inputs, if one wants to eliminate null values from arrays, simply tack on the call to walk/1 as before; see also the Appendix below.
Appendix
If the null values potentially inserted into arrays by setpath to preserve the original structure are not wanted, the simplest would be to change null values in the original JSON to some distinctive value (e.g. ":null:"), perform the selection, trim the null values, and then convert the distinctive value back to null.
Example
For example, consider this variant of the foo/bar example:
{
"node": [
{
"node": {
"foo": "foo0"
}
},
{
"node": {
"foo": "bar",
"trouble": [
null,
1,
null
]
}
},
{
"node": {
"foo": "foo1"
}
},
{
"node": {
"foo": "bar",
"trouble": [
1,
2,
3
]
}
}
],
"nodes": [
{
"node": {
"foo": "foo0"
}
},
{
"node": {
"foo": "bar",
"trouble": [
null,
1,
null
]
}
}
]
}
Using ":null:" as the distinctive value, the following variant of the "main" program previously shown for this case may be used:
walk(if type == "array" then map(if . == null then ":null:" else . end) else . end)
| reduce pv as [$p,$v] (null; setpath($p; $v))
| walk(if type == "array"
then map(select(. != null) | if . == ":null:" then null else . end)
else . end)
I have two clients updating the same document at about the same time:
{
a: "1",
b: "2",
}
Client A changes a to "8" and client B changes b to "9". Does RethinkDB guarantee the following will be the final result?
{
a: "8",
b: "9",
}
If it does not (i.e. the result may sometimes be 1 & 9 or 2 & 8), does every writer need its own dedicated tables and/or rows to avoid data getting 'trounced' in this way?
Thanks,
Brent
The final result will indeed be:
{
a: "8",
b: "9",
}
But if, for example, a third client reads the record between the two updates, it may see:
{
a: "8",
b: "2",
}
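A minimal sketch of the two writes (the table name t, the key "doc1", and conn are assumptions here): because update only merges the fields you pass, and each write to a single document is applied atomically, neither client clobbers the other's field.
// Client A only touches a
r.table("t").get("doc1").update({a: "8"}).run(conn);
// Client B only touches b
r.table("t").get("doc1").update({b: "9"}).run(conn);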
My documents contain an array, for example;
{
"a": [ 2, 3 ],
"id": ...
}
I want to return only the documents for which a contains only the elements 2 and 3.
This is the simplest I've found yet:
r.table("t").filter(r.row('a').difference([2,3]).eq([]))
Is there a better way?
A nice way to write the same function would be to use .isEmpty instead of .eq([]).
r.table("t")
.filter(r.row('a').difference([2,3]).isEmpty())
This query is equivalent to the function you wrote.
That being said, your current query returns documents where a has only 2 and/or 3. So, for example, a document with a: [2] would also match.
Example result set:
{
"a": [ 2 ] ,
"id": "f60f0e43-a542-499f-9481-11372cc386c8"
} {
"a": [ 2, 3 ] ,
"id": "c6ed9b4e-1399-47dd-a692-3db80df4143c"
}
That might be what you want, but if you only want documents where the a property is [2, 3] or [3, 2] exactly and contains no other elements, you might want to just use .eq:
r.table("t")
.filter(r.row('a').eq([2,3]).or( r.row('a').eq([3, 2]) ))
Example result:
{
"a": [ 2, 3 ] ,
"id": "c6ed9b4e-1399-47dd-a692-3db80df4143c"
}, {
"a": [ 3, 2 ] ,
"id": "cb2b5fb6-7601-43b4-a0fd-4b6b8eb83438"
}
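If listing every permutation with .or becomes unwieldy for longer arrays, one possible alternative (just a sketch, built from the same difference/isEmpty/count terms) is to require the difference in both directions to be empty and the length to match:
r.table("t")
  .filter(
    r.row('a').difference([2, 3]).isEmpty()
      .and(r.expr([2, 3]).difference(r.row('a')).isEmpty())
      .and(r.row('a').count().eq(2))
  )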
This is pretty straightforward, I'm sure, but I'm feeling brain-dead right now and can't figure it out.
I have this JSON response and just want to grab all the values for the key "price", dump them into an array, and get the average of them.
{
"status": "success",
"data": {
"network": "DOGE",
"prices": [
{
"price": "0.00028055",
"price_base": "USD",
"exchange": "bter",
"time": 1407184167
},
{
"price": "0.00022007",
"price_base": "USD",
"exchange": "cryptsy",
"time": 1407184159
}
]
}
}
This is my code thus far:
data = ActiveSupport::JSON.decode(response)
status = data["status"]
if status == "success"
total = ......what do i need to do here?...
end
thanks in advance
:)
How to sum array of numbers in Ruby?
Except here each yielded element is a hash, not a number, so you drill in.
And since the values are strings, you have to convert them to floats to do math.
total = data["data"]["prices"].reduce(0.0) do |sum, hash|
sum + hash["price"].to_f
end
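Since the goal is the average, divide the total by the number of price entries (a small follow-on sketch using the same data hash):
prices = data["data"]["prices"]
average = prices.empty? ? 0.0 : total / prices.size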
Out of curiosity, how were you stuck? What was the logical gap in your understanding that prevented you from finding a solution?