ruby - get all values for particular key from JSON string - ruby

This is pretty straightforward, I'm sure, but I'm feeling braindead right now and can't figure it out.
I have this JSON response and just want to grab all the values for the key "price" and dump them into an array, so I can get their average.
{
"status": "success",
"data": {
"network": "DOGE",
"prices": [
{
"price": "0.00028055",
"price_base": "USD",
"exchange": "bter",
"time": 1407184167
},
{
"price": "0.00022007",
"price_base": "USD",
"exchange": "cryptsy",
"time": 1407184159
}
]
}
}
this is my code thus far:
data = ActiveSupport::JSON.decode(response)
status = data["status"]
if status == "success"
total = ......what do i need to do here?...
end
thanks in advance
:)

See "How to sum array of numbers in Ruby?"
The difference is that each element yielded to the block is a hash, not a number, so you have to drill in.
And since the values are strings, you have to convert them to floats to do math.
total = data["data"]["prices"].reduce(0.0) do |sum, hash|
sum + hash["price"].to_f
end
Out of curiosity, how were you stuck? What was the logical gap in your understanding that prevented you from finding a solution?
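The question asks for the average rather than the total, which is just the sum divided by the number of prices. Here is a minimal sketch of the whole flow, written in Python for comparison rather than Ruby. The function name average_price and the trimmed response string are illustrative assumptions, not part of the original code.

```python
import json
from statistics import mean

# Trimmed copy of the response shown in the question.
response = """
{
  "status": "success",
  "data": {
    "network": "DOGE",
    "prices": [
      {"price": "0.00028055", "price_base": "USD", "exchange": "bter", "time": 1407184167},
      {"price": "0.00022007", "price_base": "USD", "exchange": "cryptsy", "time": 1407184159}
    ]
  }
}
"""


def average_price(raw):
    """Return the mean of all "price" values, or None if there is nothing to average.

    Hypothetical helper name, used only for this sketch.
    """
    data = json.loads(raw)
    if data.get("status") != "success":
        return None
    # Prices arrive as strings, so convert each one before doing math.
    prices = [float(entry["price"]) for entry in data["data"]["prices"]]
    return mean(prices) if prices else None


average = average_price(response)
print(average)
```

In the Ruby answer above, the equivalent step would be dividing total by the size of the prices array.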

Related

N1Ql query grouping on distinct values with different keys

I have ~7 million docs in a bucket and I am struggling to write the correct query/index combo to prevent it from running >5 seconds.
Here is a similar scenario to the one I am trying to solve:
I have multiple coffee shops, each making coffee with different container/lid combos. The field keys also differ between doc types. With each sale being generated, I keep track of these combos.
Here are a few example docs:
[{
"shopId": "x001",
"date": "2022-01-01T08:49:00Z",
"cappuccinoContainerId": "a001",
"cappuccinoLidId": "b001"
},
{
"shopId": "x001",
"date": "2022-01-02T08:49:00Z",
"latteContainerId": "a002",
"latteLidId": "b002"
},
{
"shopId": "x001",
"date": "2022-01-02T08:49:00Z",
"espressoContainerId": "a003",
"espressoLidId": "b003"
},
{
"shopId": "x002",
"date": "2022-01-01T08:49:00Z",
"cappuccinoContainerId": "a001",
"cappuccinoLidId": "b001"
},
{
"shopId": "x002",
"date": "2022-01-02T08:49:00Z",
"latteContainerId": "a002",
"latteLidId": "b002"
},
{
"shopId": "x002",
"date": "2022-01-02T08:49:00Z",
"espressoContainerId": "a003",
"espressoLidId": "b003"
}]
What I need to get out of the query is the following:
[{
"shopId": "x001",
"day": "2022-01-01",
"uniqueContainersLidsCombined": 2
},
{
"shopId": "x001",
"day": "2022-01-02",
"uniqueContainersLidsCombined": 4
},
{
"shopId": "x002",
"day": "2022-01-01",
"uniqueContainersLidsCombined": 2
},
{
"shopId": "x002",
"day": "2022-01-02",
"uniqueContainersLidsCombined": 4
}]
I.e. I want the total number of unique containers and lids combined per site and day.
I have tried using composite, adaptive and FTS indexes, but I am unable to figure this one out.
Does anybody have a different suggestion? Can someone please help?
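To pin down the expected result independently of any index, here is a small in-memory reference computation in Python. It is only a sketch for checking query output on sample data, not a replacement for the query. It assumes that every field whose name ends in ContainerId or LidId counts towards the total; the function name is made up for this example.

```python
from collections import defaultdict

# The six example docs from the question.
docs = [
    {"shopId": "x001", "date": "2022-01-01T08:49:00Z", "cappuccinoContainerId": "a001", "cappuccinoLidId": "b001"},
    {"shopId": "x001", "date": "2022-01-02T08:49:00Z", "latteContainerId": "a002", "latteLidId": "b002"},
    {"shopId": "x001", "date": "2022-01-02T08:49:00Z", "espressoContainerId": "a003", "espressoLidId": "b003"},
    {"shopId": "x002", "date": "2022-01-01T08:49:00Z", "cappuccinoContainerId": "a001", "cappuccinoLidId": "b001"},
    {"shopId": "x002", "date": "2022-01-02T08:49:00Z", "latteContainerId": "a002", "latteLidId": "b002"},
    {"shopId": "x002", "date": "2022-01-02T08:49:00Z", "espressoContainerId": "a003", "espressoLidId": "b003"},
]


def unique_ids_per_shop_day(docs):
    """Count distinct container and lid ids per (shopId, day)."""
    seen = defaultdict(set)
    for doc in docs:
        key = (doc["shopId"], doc["date"][:10])  # first 10 chars are YYYY-MM-DD
        for field, value in doc.items():
            # Assumption: these two suffixes identify the fields to count.
            if field.endswith(("ContainerId", "LidId")):
                seen[key].add(value)
    return [
        {"shopId": shop, "day": day, "uniqueContainersLidsCombined": len(ids)}
        for (shop, day), ids in sorted(seen.items())
    ]


expected = unique_ids_per_shop_day(docs)
for row in expected:
    print(row)
```

For the sample docs this gives 2 for each shop on 2022-01-01 and 4 for each shop on 2022-01-02.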
CREATE INDEX ix1 ON default(shopId, DATE_FORMAT_STR(date,"1111-11-11"), [cappuccinoContainerId, cappuccinoLidId]);
If you are using EE and shopId is immutable, add PARTITION BY HASH (shopId) to the above index definition (with a higher number of partitions).
SELECT d.shopId,
DATE_FORMAT_STR(d.date,"1111-11-11") AS day,
COUNT(DISTINCT [d.cappuccinoContainerId, d.cappuccinoLidId]) AS uniqueContainersLidsCombined
FROM default AS d
WHERE d.shopId IS NOT NULL
GROUP BY d.shopId, DATE_FORMAT_STR(d.date,"1111-11-11");
Adjust the order of the shopId and day index keys based on the query predicates.
https://blog.couchbase.com/understanding-index-grouping-aggregation-couchbase-n1ql-query/
Update:
Based on the EXPLAIN, you have a date predicate and select all shopIds, so use the following index:
CREATE INDEX ix2 ON default( DATE_FORMAT_STR(date,"1111-11-11"), shopId, [cappuccinoContainerId, cappuccinoLidId]);
Because you need the DISTINCT of cappuccinoContainerId and cappuccinoLidId together, store the pair as a single index key (an array of two elements): [cappuccinoContainerId, cappuccinoLidId]. The advantage is that you can reference the same expression directly in COUNT(DISTINCT ...), which allows index aggregation to be used. (Do not put DISTINCT in the index definition: that turns it into an ARRAY index and things will not work as expected.)
I assume
That the cup types and lid types can be used for any drink type.
That you don't want to add any precomputed stuff to your data.
Perhaps an index like this would work (my collection keyspace is bulk.sales.amer). Note that I am not sure whether this performs better or worse than the solution posted by vsr, or whether it is even equivalent:
CREATE INDEX `adv_shopId_concat_nvls`
ON `bulk`.`sales`.`amer`(
`shopId` MISSING,
(
nvl(`cappuccinoContainerId`, "") ||
nvl(`cappuccinoLidId`, "") ||
nvl(`latteContainerId`, "") ||
nvl(`latteLidId`, "") ||
nvl(`espressoContainerId`, "") ||
nvl(`espressoLidId`, "")),substr0(`date`, 0, 10)
)
And then, using the covering index above, run your query like this:
SELECT
shopId,
CONCAT(
NVL(cappuccinoContainerId,""),
NVL(cappuccinoLidId,""),
NVL(latteContainerId,""),
NVL(latteLidId,""),
NVL(espressoContainerId,""),
NVL(espressoLidId,"")
) AS uniqueContainersLidsCombined,
SUBSTR(date,0,10) AS day,
COUNT(*) AS cnt
FROM `bulk`.`sales`.`amer`
GROUP BY
shopId,
CONCAT(
NVL(cappuccinoContainerId,""),
NVL(cappuccinoLidId,""),
NVL(latteContainerId,""),
NVL(latteLidId,""),
NVL(espressoContainerId,""),
NVL(espressoLidId,"")
),
SUBSTR(date,0,10)
Note I used the following 16 lines of data:
{"amer":"amer","date":"2022-01-01T08:49:00Z","cappuccinoContainerId":"a001","cappuccinoLidId":"b001","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-01T08:49:00Z","cappuccinoContainerId":"a001","cappuccinoLidId":"b001","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-02T08:49:00Z","latteContainerId":"a002","latteLidId":"b002","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-02T08:49:00Z","latteContainerId":"a002","latteLidId":"b002","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-02T08:49:00Z","espressoContainerId":"a003","espressoLidId":"b003","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-02T08:49:00Z","espressoContainerId":"a003","espressoLidId":"b003","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T08:49:00Z","cappuccinoContainerId":"a007","cappuccinoLidId":"b004","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-03T08:49:00Z","cappuccinoContainerId":"a007","cappuccinoLidId":"b004","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T08:49:00Z","latteContainerId":"a007","latteLidId":"b004","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-03T08:49:00Z","latteContainerId":"a007","latteLidId":"b004","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T01:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x001"}
{"amer":"amer","date":"2022-01-03T02:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T03:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T04:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T05:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x002"}
{"amer":"amer","date":"2022-01-03T06:49:00Z","espressoContainerId":"a007","espressoLidId":"b005","sales":"sales","shopId":"x002"}
Applying some sorting by wrapping the above query with
SELECT T1.* FROM
(
-- paste above --
) AS T1
ORDER BY T1.day, T1.shopId, T1.uniqueContainersLidsCombined
We get
cnt day shopId uniqueContainersLidsCombined
1 "2022-01-01" "x001" "a001b001"
1 "2022-01-01" "x002" "a001b001"
1 "2022-01-02" "x001" "a002b002"
1 "2022-01-02" "x001" "a003b003"
1 "2022-01-02" "x002" "a002b002"
1 "2022-01-02" "x002" "a003b003"
2 "2022-01-03" "x001" "a007b004"
1 "2022-01-03" "x001" "a007b005"
2 "2022-01-03" "x002" "a007b004"
5 "2022-01-03" "x002" "a007b005"
If you still don't get the performance you need, you could possibly use the Eventing service to do a continuous map/reduce and an occasional update query to make sure things stay perfectly in sync.

Rethink merge array

I have a query -
r.table('orgs')
.filter(function(org) {
return r.expr(['89a26384-8fe0-11e8-9eb6-529269fb1459', '89a26910-8fe0-11e8-9eb6-529269fb1459'])
.contains(org('id'));
})
.pluck("users")
This returns the following output:
{
"users": [
"3a415919-f50b-4e15-b3e6-719a2a6b26a7"
]
} {
"users": [
"d8c4aa0a-8df6-4d47-8b0e-814a793f69e2"
]
}
How do I get the result as -
[
"3a415919-f50b-4e15-b3e6-719a2a6b26a7","d8c4aa0a-8df6-4d47-8b0e-814a793f69e2"
]
First, don't use that complicated and resource-consuming .filter directly on a table. Since your tested field is already indexed (id), you can:
r.table('orgs').getAll('89...59', '89...59')
or
r.table('orgs').getAll(r.args(['89...59', '89...59']))
which is way faster (way!). I recently found an article about how much faster it is.
Now to get an array of users without the wrapping, using the brackets operation:
r.table('orgs').getAll(...).pluck('users')('users')
will provide a result like
[
'123',
'456'
],
[
'123',
'789'
]
We just removed the "users" wrapping, but the result is an array of arrays. Let's flatten this 2D array with .concatMap:
r.table('orgs').getAll(...).pluck('users')('users').concatMap(function (usrs) {
return usrs;
})
Now we've concatenated the sub-arrays into one, however we see duplicates (from my previous result example, you'd have '123' twice). Just .distinct the thing:
r.table('orgs').getAll(...).pluck('users')('users').concatMap(function (usrs) {
return usrs;
}).distinct()
From the example I took, you have now:
'123',
'456',
'789'
Et voilà!
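If the flattening has to happen on the client instead (for example, because the query result is already in memory), the same two steps can be sketched in plain Python. This is only an illustration of what .concatMap followed by .distinct does to the data; it does not talk to RethinkDB, and the function name is hypothetical.

```python
# Stand-in for the rows returned by .pluck('users'), using the example ids above.
rows = [
    {"users": ["123", "456"]},
    {"users": ["123", "789"]},
]


def merged_users(rows):
    """Flatten every "users" list into one list and drop duplicates."""
    # Step 1: concatenate the sub-arrays (the .concatMap part).
    flat = [user for row in rows for user in row["users"]]
    # Step 2: remove duplicates while keeping first-seen order (the .distinct part).
    return list(dict.fromkeys(flat))


users = merged_users(rows)
print(users)
```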

Name of sorting algorithm?

I'm trying to figure out the name of a sorting algorithm (or just a method?) that sorts via 3 values.
We start off with 3 values and the array should sort based on the id of the object, position and then the date it was set to that position, allowing both date and position to be the same. Please excuse my horrible explanation. I will give an example.
we have 6 positions, without any edits the array would look something like this
{id:1,pos:0,date:0}
{id:2,pos:0,date:0}
{id:3,pos:0,date:0}
{id:4,pos:0,date:0}
{id:5,pos:0,date:0}
{id:6,pos:0,date:0}
if I was to move the first object to the second position, it would return this order
{id:2,pos:0,date:0}
{id:1,pos:2,date:1}
{id:3,pos:0,date:0}
{id:4,pos:0,date:0}
{id:5,pos:0,date:0}
{id:6,pos:0,date:0}
However, if we were to then move the third object into the second position
{id:2,pos:0,date:0}
{id:3,pos:2,date:2}
{id:1,pos:2,date:1}
{id:4,pos:0,date:0}
{id:5,pos:0,date:0}
{id:6,pos:0,date:0}
Note the pos does not change but is ordered before positions of the same number based on the higher date value.
We now move the 4th object into position 1
{id:4,pos:1,date:3}
{id:2,pos:0,date:0}
{id:3,pos:2,date:2}
{id:1,pos:2,date:1}
{id:5,pos:0,date:0}
{id:6,pos:0,date:0}
Note that id 2 takes position 2 even though its pos and date are still 0, because its id is lower than the ids behind it.
We now move id 6 to position 2
{id:4,pos:1,date:3}
{id:6,pos:2,date:4}
{id:2,pos:0,date:0}
{id:3,pos:2,date:2}
{id:1,pos:2,date:1}
{id:5,pos:0,date:0}
id 5 to position 4
{id:4,pos:1,date:3}
{id:6,pos:2,date:4}
{id:2,pos:0,date:0}
{id:5,pos:4,date:5}
{id:3,pos:2,date:2}
{id:1,pos:2,date:1}
And finally id 2 to position 6
{id:4,pos:1,date:3}
{id:6,pos:2,date:4}
{id:5,pos:4,date:5}
{id:3,pos:2,date:2}
{id:1,pos:2,date:1}
{id:2,pos:6,date:6}
I hope my examples aid any response given, I know this is not a question of much quality and if answered I will do my best to edit the question as best I can.
Just a guess, because your final order doesn't look "sorted", lexicographical sort? See Lexicographical order.
The movement of objects is similar to insertion sort, where an entire sub-array is shifted in order to insert an object. The date indicates the order of operations that were performed, and the position indicates where the object was moved to, but there's no field for where an object was moved from. There's enough information to reproduce the sequence by starting with the initial ordering and following the moves according to the date. I don't know if the sequence can be followed in reverse with the given information.
The original ordering can be restored using any sort algorithm using the id field.
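The replay described above can be sketched in Python as follows: start from the id order, then apply each recorded move in date order by removing the object and re-inserting it at its 1-based pos. The function name replay_moves is made up for this sketch. It assumes each object has been moved at most once, because only the latest pos and date are stored per object.

```python
def replay_moves(items):
    """Rebuild the display order from id, pos and date alone."""
    # Initial ordering: plain sort by id.
    order = sorted(items, key=lambda item: item["id"])
    # Objects with pos == 0 were never moved; replay the rest oldest first.
    moves = sorted(
        (item for item in items if item["pos"] != 0),
        key=lambda item: item["date"],
    )
    for move in moves:
        order.remove(move)                      # take the object out ...
        order.insert(move["pos"] - 1, move)     # ... and put it at its 1-based position
    return order


# Final state from the question, listed here in id order.
final_state = [
    {"id": 1, "pos": 2, "date": 1},
    {"id": 2, "pos": 6, "date": 6},
    {"id": 3, "pos": 2, "date": 2},
    {"id": 4, "pos": 1, "date": 3},
    {"id": 5, "pos": 4, "date": 5},
    {"id": 6, "pos": 2, "date": 4},
]

ids = [item["id"] for item in replay_moves(final_state)]
print(ids)
```

For the question's final state this yields the ids in the order 4, 6, 5, 3, 1, 2, which matches the last listing in the question.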
Unfortunately, I was unable to find the name of the 'sort'(?). However, I was able to achieve the effect I was aiming for using the code below.
(If I missed something entirely let me know I'll change it and credit you)
PHP Implementation.
$data = '[
    {"id":"1","pos":"1","date":"0"},
    {"id":"2","pos":"5","date":"0"},
    {"id":"3","pos":"4","date":"0"},
    {"id":"4","pos":"3","date":"0"},
    {"id":"5","pos":"4","date":"1"},
    {"id":"6","pos":"2","date":"0"}
]'; // simulated data set
$arr = json_decode($data, true);
$final_arr = $arr;
$tmp_array = array();
// Collect the moved items (pos != 0), grouped by position, oldest move first.
for ($i = 0; $i < sizeof($arr); $i++) {
    $num = $i + 1;
    $tmp = array();
    for ($o = 0; $o < sizeof($arr); $o++) {
        if ($arr[$o]['pos'] == 0) continue;
        if ($arr[$o]['pos'] == $num) {
            array_push($tmp, $arr[$o]);
        }
    }
    if ($tmp) {
        // usort expects an integer from the callback, so use <=> rather than >.
        usort($tmp, function ($a, $b) {
            return $a['date'] <=> $b['date'];
        });
        for ($o = 0; $o < sizeof($tmp); $o++) {
            array_push($tmp_array, $tmp[$o]);
        }
    }
}
// Move each collected item to its target position (pos is 1-based).
for ($i = 0; $i < sizeof($tmp_array); $i++) {
    for ($o = 0; $o < sizeof($arr); $o++) {
        if ($final_arr[$o]['id'] == $tmp_array[$i]['id']) {
            array_splice($final_arr, $tmp_array[$i]['pos'] - 1, 0, array_splice($final_arr, $o, 1));
            break; // the item has been placed, no need to keep scanning
        }
    }
}
$output = json_encode($final_arr, JSON_PRETTY_PRINT);
echo $output; // echo rather than printf: the JSON is data, not a format string
Result:
[
{
"id": "1",
"pos": "1",
"date": "0"
},
{
"id": "6",
"pos": "2",
"date": "0"
},
{
"id": "4",
"pos": "3",
"date": "0"
},
{
"id": "5",
"pos": "4",
"date": "1"
},
{
"id": "2",
"pos": "5",
"date": "0"
},
{
"id": "3",
"pos": "4",
"date": "0"
}
]

How do I dynamically name a collection?

Pseudo-code: collect(n) AS :Label
The primary purpose of this is for easy reading of the properties in the API Server (node application).
Verbose example:
MATCH (user:User)--(n)
WHERE n:Movie OR n:Actor
RETURN user,
CASE
WHEN n:Movie THEN "movies"
WHEN n:Actor THEN "actors"
END as type, collect(n) as :type
Expected output in JSON:
[{
"user": {
....
},
"movies": [
{
"_id": 1987,
"labels": [
"Movie"
],
"properties": {
....
}
}
],
"actors": [ .... ]
}]
The closest I've gotten is:
[{
"user": {
....
},
"type": "movies",
"collect(n)": [
{
"_id": 1987,
"labels": [
"Movie"
],
"properties": {
....
}
}
]
}]
The goal is to be able to read the JSON result with ease like so:
neo4j.cypher.query(statement, function(err, results) {
  for (var result of results) {
    var user = result.user
    var movies = result.movies
  }
})
Edit:
I apologize for any confusion in my inability to correctly name database semantics.
I'm wondering if it's enough just to output the user and their lists of both actors and movies, rather than trying to do a more complicated means of matching and combining both.
MATCH (user:User)
OPTIONAL MATCH (user)--(m:Movie)
OPTIONAL MATCH (user)--(a:Actor)
RETURN user, COLLECT(m) as movies, COLLECT(a) as actors
This query should return each User and his/her related movies and actors (in separate collections):
MATCH (user:User)--(n)
WHERE n:Movie OR n:Actor
RETURN user,
REDUCE(s = {movies:[], actors:[]}, x IN COLLECT(n) |
CASE WHEN x:Movie
THEN {movies: s.movies + x, actors: s.actors}
ELSE {movies: s.movies, actors: s.actors + x}
END) AS types;
As far as a dynamic solution to your question, one that will work with any node connected to your user, there are a few options, but I don't believe you can get the column names to be dynamic like this, or even the names of the collections returned, though we can associate them with the type.
MATCH (user:User)--(n)
WITH user, LABELS(n) as type, COLLECT(n) as nodes
WITH user, {type:type, nodes:nodes} as connectedNodes
RETURN user, COLLECT(connectedNodes) as connectedNodes
Or, if you prefer working with multiple rows, one row each per node type:
MATCH (user:User)--(n)
WITH user, LABELS(n) as type, COLLECT(n) as collection
RETURN user, {type:type, data:collection} as connectedNodes
Note that LABELS(n) returns a list of labels, since nodes can be multi-labeled. If you are guaranteed that every interested node has exactly one label, then you can use the first element of the list rather than the list itself. Just use LABELS(n)[0] instead.
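Since the column names cannot be made dynamic in the query, one option is to fold the per-type rows into the desired shape on the client. Below is a rough Python sketch of that folding step (the question's API server is a Node application, so treat this as pseudocode for the idea). It assumes rows shaped like the output of the last query with LABELS(n)[0] as type, a user map carrying an _id field, and a hand-written LABEL_TO_KEY mapping; all of those names are assumptions for illustration.

```python
# Hypothetical mapping from node label to the key wanted in the JSON output.
LABEL_TO_KEY = {"Movie": "movies", "Actor": "actors"}


def fold_rows(rows):
    """Turn one row per (user, type) into one dict per user with a list per type."""
    by_user = {}
    for row in rows:
        user = row["user"]
        # Create the per-user entry on first sight, with an empty list for every key.
        entry = by_user.setdefault(
            user["_id"],
            {"user": user, **{key: [] for key in LABEL_TO_KEY.values()}},
        )
        nodes = row["connectedNodes"]
        key = LABEL_TO_KEY.get(nodes["type"])
        if key is not None:  # ignore labels we have no key for
            entry[key].extend(nodes["data"])
    return list(by_user.values())


# Hand-written sample rows, assuming a single label per node.
rows = [
    {"user": {"_id": 1}, "connectedNodes": {"type": "Movie", "data": [{"_id": 1987}]}},
    {"user": {"_id": 1}, "connectedNodes": {"type": "Actor", "data": [{"_id": 42}]}},
    {"user": {"_id": 2}, "connectedNodes": {"type": "Movie", "data": [{"_id": 1987}]}},
]

folded = fold_rows(rows)
print(folded)
```

Each resulting entry can then be read as result["user"], result["movies"] and result["actors"], which is the access pattern the question was after.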
You can dynamically sort nodes by label, and then convert to the map using the apoc library:
WITH ['Actor','Movie'] as LBS
// What are the nodes we need:
MATCH (U:User)--(N) WHERE size(filter(l in labels(N) WHERE l in LBS))>0
WITH U, LBS, N, labels(N) as nls
UNWIND nls as nl
// Combine the nodes on their labels:
WITH U, LBS, N, nl WHERE nl in LBS
WITH U, nl, collect(N) as RELS
WITH U, collect( [nl, RELS] ) as pairs
// Convert pairs "label - values" to the map:
CALL apoc.map.fromPairs(pairs) YIELD value
RETURN U as user, value

Keep id order as in query

I'm using elasticsearch to get a mapping of ids to some values, but it is crucial that I keep the order of the results in the order that the ids have.
Example:
def term_mapping(ids)
ids = ids.split(',')
self.search do |s|
s.filter :terms, id: ids
end
end
res = term_mapping("4,2,3,1")
The result collection should contain the objects with the ids in order 4,2,3,1...
Do you have any idea how I can achieve this?
If you need to use search, you can sort the ids before you send them to Elasticsearch and retrieve the results sorted by id, or you can create a custom sort script that returns the position of the current document in the array of ids. However, a simpler and faster solution would be to use Multi-Get instead of search.
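A minimal Python sketch of the Multi-Get route: the request body lists the ids in the wanted order, and the docs array in the response comes back in that same order, so the only work left is skipping ids that were not found. The helper names and the hand-written response are assumptions for illustration; no client call is made here, and the ids are assumed to be the documents' _id values.

```python
def mget_body(ids):
    """Body for a Multi-Get request against a single index (index goes in the URL)."""
    return {"ids": list(ids)}


def sources_in_request_order(response):
    """Extract _source from each found doc, keeping the order of the docs array."""
    return [doc["_source"] for doc in response["docs"] if doc.get("found")]


ids = "4,2,3,1".split(",")
body = mget_body(ids)

# Hand-written stand-in shaped like a Multi-Get response; id "3" is missing on purpose.
fake_response = {
    "docs": [
        {"_id": "4", "found": True, "_source": {"id": 4}},
        {"_id": "2", "found": True, "_source": {"id": 2}},
        {"_id": "3", "found": False},
        {"_id": "1", "found": True, "_source": {"id": 1}},
    ]
}

ordered = sources_in_request_order(fake_response)
print(body, ordered)
```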
One option is to use the Multi GET API. If this doesn't work for you, another solution is to sort the results after you retrieve them from es. In python, this can be done this way:
doc_ids = ["123", "333", "456"] # We want to keep this order
order = {v: i for i, v in enumerate(doc_ids)}
es_results = [{"_id": "333"}, {"_id": "456"}, {"_id": "123"}]
results = sorted(es_results, key=lambda x: order[x['_id']])
# Results:
# [{'_id': '123'}, {'_id': '333'}, {'_id': '456'}]
This problem may already be resolved, but this answer might help someone else.
We can use a pinned query in Elasticsearch, so there is no need for a loop to restore the order. Note that the ids given to "pinned" are matched against the documents' _id.
qs = {
  "size" => drug_ids.count,
  "query" => {
    "pinned" => {
      "ids" => drug_ids,
      "organic" => {
        "terms" => {
          "id" => drug_ids
        }
      }
    }
  }
}
It keeps the results in the same sequence as the input ids.
