RethinkDB simple pluck from nested array - rethinkdb

I am new to RethinkDB and have looked here and elsewhere for the answer to this. I have found several things close, but still can't seem to figure out what seems like it should be simple. I have a query:
r.db('common').table("counters").filter({org: 'myorg'}).pluck('counters').run()
That gives the following results:
{
"counters": [
{
"aid": 0 ,
"pid": 1000 ,
"rid": 0
}
]
}
What I want is to pluck or somehow get a specific counter (e.g. pid). I tried counter[0].pid, counters.pid and a few others, but can't quite seem to find the magic bullet. From what I did find, I suspect this may involve a function, but am not sure where it should go. Any help is appreciated and if you dup this, please make sure it's an exact dup and not something close. Thanks!

OK, had to change the array to an object:
{
"counters": {
"aid": 0 ,
"pid": 1000 ,
"rid": 0
}
}
... then use get(). This works:
r.db('common').table("counters").get('12345-1234-54321-6666-f0dac0b6b68e')('counters')('pid')

Related

Elastic Ingest Pipeline split field and create a nested field

Dear friendly helpers,
I have an index that is fed by a database via Kafka. This database holds a field that aggregates a couple of pieces of information as key/value; key/value; (don't ask for the reason, I have no idea who designed it like that and why ;-) )
93/4; 34/12;
it can be empty, or it can hold 1..n key/value pairs.
I want to use an ingest pipeline and ideally have a "nested" field which holds all values that are in that field.
Probably like this:
{"categories":
{ "93": 7,
"82": 4
}
}
The use case is the following: we want to visualize the sum of a filtered number of these categories (they tell me how many minutes a specific process took longer) and relate them in ranges.
Example: I filter categories x, y, z and then group how many documents for the day had no delay, how many had a delay of up to 5 minutes, and how many had a delay between 5 and 15 minutes.
I have tried to get the fields neatly separated with the kv processor and wanted to work from there, but I guess it was completely the wrong approach.
"kv": {
"field": "IncomingField",
"field_split": ";",
"value_split": "/",
"target_field": "delays",
"ignore_missing": true,
"trim_key": "\\s",
"trim_value": "\\s",
"ignore_failure": true
}
When I test the pipeline it seems ok
"delays": {
"62": "3",
"86": "2"
}
but there are two things that don't work.
I can't know upfront how many of these combinations I have, so converting the values from string to int in the same pipeline is an issue.
When I want to create a Kibana index pattern I end up with many fields like delays.82 and delays.82.keyword, which does not make sense at all for the use case, as I can't filter (get only the sum of delays where the key is one of x, y, z) and aggregate.
I have looked into other processors (dot_expander) but can't really get my head around how to get this working.
I hope my question is clear (I lack English skills, sorry) and that someone can point me in the right direction.
Thank you very much!
You should rather structure them as an array of objects with shared accessors, for instance:
[ {key: 93, value: 7}, ...]
That way, you'll be able to aggregate on categories.key and categories.value.
So this means iterating the categories' entrySet() using a custom script processor like so:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "extracts k/v pairs",
"processors": [
{
"script": {
"source": """
def categories = ctx.categories;
def kv_pairs = new ArrayList();
for (def pair : categories.entrySet()) {
def k = pair.getKey();
def v = pair.getValue();
kv_pairs.add(["key": k, "value": v]);
}
ctx.categories = kv_pairs;
"""
}
}
]
},
"docs": [
{
"_source": {
"categories": {
"82": 4,
"93": 7
}
}
}
]
}
P.S.: Do make sure your categories field is mapped as nested b/c otherwise you'll lose the connections between the keys & the values (also called flattening).
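To make the target shape concrete, here is a minimal Python sketch of the same transformation done outside Elasticsearch: it parses the raw "key/value; key/value;" string into the list of key/value objects suggested above, converts the values to integers, and then sums and buckets the delays for a chosen set of categories. The function names (parse_delays, total_delay, delay_bucket) and the bucket labels are made up for illustration, and the sketch assumes every non-empty chunk is well formed.

```python
def parse_delays(raw):
    """Turn 'key/value; key/value;' into a list of {'key', 'value'} dicts."""
    pairs = []
    for chunk in (raw or "").split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skips an empty field and the piece after the trailing ';'
        key, _, value = chunk.partition("/")
        # Assumption: the value is always a whole number of minutes.
        pairs.append({"key": key.strip(), "value": int(value)})
    return pairs


def total_delay(pairs, wanted_keys):
    """Sum the delays of the categories we are filtering on."""
    return sum(p["value"] for p in pairs if p["key"] in wanted_keys)


def delay_bucket(minutes):
    """Map a total delay to the ranges described in the question."""
    if minutes == 0:
        return "no delay"
    if minutes <= 5:
        return "up to 5 min"
    if minutes <= 15:
        return "5 to 15 min"
    return "over 15 min"


pairs = parse_delays("93/4; 34/12;")
print(pairs)
print(delay_bucket(total_delay(pairs, {"93"})))
```

Because every pair shares the same two accessors (key and value), the number of distinct categories no longer affects the mapping, which is the point of the array-of-objects layout.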

How can I filter if any value of an array is contained in another array in rethinkdb/reql?

I want to find any user who is a member of a group I can manage (using the web interface/JavaScript):
Users:
{
"id": 1,
"member_in_groups": ["all", "de-south"]
},
{
"id": 2,
"member_in_groups": ["all", "de-north"]
}
I tried:
r.db('mydb').table('users').filter(r.row('member_in_groups').map(function(p) {
return r.expr(['de-south']).contains(p);
}))
but always both users are returned. Which command do I have to use and how can I use an index for this (I read about multi-indexes in https://rethinkdb.com/docs/secondary-indexes/python/#multi-indexes but there only one value is searched for)?
I got the correct answer on the Slack channel, so I am posting it here in case anyone else comes to this thread through Googling:
First create a multi index as described in
https://rethinkdb.com/docs/secondary-indexes/javascript/, e.g.
r.db('<db-name>').table('<table-name>').indexCreate('<some-index-name>', {multi: true}).run()
(you can omit .run() if using the webadmin)
Then query the data with
r.db('<db-name>').table('<table-name>').getAll('de-north', 'de-west', {index:'<some-index-name>'}).distinct()
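For anyone wondering why the original filter returned both users and why the answer ends with distinct(), here is a small plain-Python model of the logic, with no RethinkDB involved. The original map() yields one boolean per group, and a non-empty list is not a single true/false answer, whereas the intended test is "does any group match". The helper names (users_in_any_group, build_multi_index, get_all) are hypothetical and only imitate the behaviour described above.

```python
users = [
    {"id": 1, "member_in_groups": ["all", "de-south"]},
    {"id": 2, "member_in_groups": ["all", "de-north"]},
]


def users_in_any_group(users, managed_groups):
    """Keep users that share at least one group with managed_groups."""
    managed = set(managed_groups)
    return [u for u in users if managed.intersection(u["member_in_groups"])]


def build_multi_index(users):
    """Model of a multi index: one entry per (group, user) pair."""
    index = {}
    for u in users:
        for group in u["member_in_groups"]:
            index.setdefault(group, []).append(u["id"])
    return index


def get_all(index, *keys):
    """Model of a multi-key lookup: a user is returned once per matching key."""
    hits = []
    for key in keys:
        hits.extend(index.get(key, []))
    return hits


index = build_multi_index(users)
print([u["id"] for u in users_in_any_group(users, ["de-south"])])
print(get_all(index, "all", "de-south"))               # user 1 appears twice
print(sorted(set(get_all(index, "all", "de-south"))))  # de-duplicated
```

The duplicate in the second lookup is the reason for de-duplicating the result when several keys are passed at once.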

Elastic Search - Sort by multiple fields with the missing parameter

I am trying to apply a sort to an Elastic Search query by two different fields:
price_sold and price_list
I would like to first sort on price_sold, but if that value is null, I would like to then sort by price_list
Would the query be correct if I just set the sorts to:
"sort": [
{ "price_sold": { "order": "desc"}},
{ "price_list": { "order": "desc"}}
]
I have executed the query, and I do not get any errors, and it seems like the results are correct, however I am curious if I have overlooked something.
I have been reading about the missing filter, along with possibly using a custom value. This may not be required, but I am not quite sure.
Would there be a way to define a second field to sort on if the first field is missing, or is that not necessary? Something like:
"sort": [{"price_sold": {"order": "desc", "missing": "doc['field_name']"}}]
Would simply adding these two sorts give me the desired result?
Thanks.
I think I understand what you're asking. In SQL terms, you'd like to ORDER BY COALESCE(price_sold, price_list) DESC.
The first sort you listed is a little different. It's similar to ORDER BY price_sold DESC, price_list DESC - in other words, primary sort is by price_sold, and for entries where price_sold is equal, secondary sort is by price_list.
Your second sort attempt would be great if "missing" worked that way. Unfortunately, missing's "custom" option appears to allow you to specify a constant value only.
If you don't need to limit your search using from and size, you should be able to use sort's _script option to write some logic that works for you. I ended up here because I do use from and size to retrieve batches, and when I sort by _script, the items I'm getting don't make sense - the items are sorted correctly, but I'm not getting the right set of items. So, I added a new analyzer and expanded my fields to use the new analyzer, and I was hoping to be able to sort using the new field or, if the new field doesn't exist (for previously-indexed items), use the old field's value instead. But that doesn't seem to be possible. I think I'm going to have to reindex my items so my new field is populated.
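The difference between the two orderings is easy to see on a tiny data set. The following Python sketch uses made-up items and sorts them once by the COALESCE-style key and once by the two-field sort from the question; for the latter it assumes documents with a missing price_sold are placed last, which is the default "missing" behaviour.

```python
items = [
    {"name": "a", "price_sold": 100, "price_list": 500},
    {"name": "b", "price_sold": None, "price_list": 300},
    {"name": "c", "price_sold": 200, "price_list": 50},
]


def coalesce_key(item):
    """price_sold if present, otherwise price_list."""
    sold = item["price_sold"]
    return sold if sold is not None else item["price_list"]


def two_field_key(item):
    """Primary key price_sold (missing values last), secondary key price_list."""
    sold = item["price_sold"]
    return (sold is not None, sold if sold is not None else 0, item["price_list"])


by_coalesce = [i["name"] for i in sorted(items, key=coalesce_key, reverse=True)]
by_two_fields = [i["name"] for i in sorted(items, key=two_field_key, reverse=True)]

print(by_coalesce)    # ['b', 'c', 'a']
print(by_two_fields)  # ['c', 'a', 'b']
```

Item b has no price_sold, so the two-field sort pushes it to the end, while the coalesce-style sort ranks it by its price_list of 300.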
In case someone is still looking I ended up creating a script similar to this:
curl -XGET 'localhost:9200/_search?pretty&size=10&from=0' -H 'Content-Type: application/json' -d'
{
"sort" : {
"_script" : {
"type" : "number",
"script" : {
"lang": "painless",
"inline": "doc[\u0027price_sold\u0027].size() == 0 ? doc[\u0027price_list\u0027].value : doc[\u0027price_sold\u0027].value"
},
"order" : "desc"
}
}
}
'
For sorting dates, the type still has to remain number but you replace .value with .date.getMillisOfDay() as discussed here.
The from and size worked fine in my version of ElasticSearch (5.1.1).
To make sure your algorithm is working fine check the generated value in the response, e.g.: "sort" : [ 5.0622E7 ].

nested count aggregations in elasticsearch

I have a type in Elasticsearch where each user can post any number of posts (the fields are "userid" and "post"). Now I need the count of users who posted 0 posts, 1 post, 2 posts, and so on. How do I do it? I think it needs some nested aggregations, but I don't know how to proceed. Thanks in advance!
The best way of doing this is to add a separate field to store the number of posts.
Scripts are not very efficient, because the value is re-evaluated every time a query executes. With a stored count, the value is indexed properly, which makes queries and aggregations very fast.
Of course you need to be sure you update this count each time you update the document.
You can use a script in the aggregation:
POST index_name/type_name/_search
{
"aggs": {
"group By Post Count": {
"terms": {
"script" : "doc['post'].size()"
}
}
}
}
Make sure you enable scripting.
Hope this helps you.
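As a rough illustration of what both answers compute, here is a Python sketch on made-up data. It assumes one document per user with a multi-valued "post" field (which is what doc['post'].size() implies), stores a post_count next to each document as the first answer suggests, and then counts how many users fall under each post count, like the terms aggregation does. The field name post_count and both helper names are hypothetical.

```python
from collections import Counter

docs = [
    {"userid": "u1", "post": ["p1", "p2"]},
    {"userid": "u2", "post": []},
    {"userid": "u3", "post": ["p3"]},
    {"userid": "u4", "post": ["p4", "p5"]},
]


def with_post_count(doc):
    """Return a copy of the document with the precomputed count added."""
    return {**doc, "post_count": len(doc["post"])}


def users_per_post_count(docs):
    """Group users by how many posts they have (post count -> number of users)."""
    return Counter(with_post_count(d)["post_count"] for d in docs)


histogram = users_per_post_count(docs)
for count in sorted(histogram):
    print(count, "post(s):", histogram[count], "user(s)")
```

Users with zero posts only show up here because they have a document of their own; if the data had one document per post instead, such users would need to come from a separate user list.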

Is there a better way to parse a mongodb query in go?

Hi, I have a fairly complex aggregate query that I must write with mgo, but I got really dazed halfway through working it out :-(. Is there a better way to do that?
Here is a console query aggregate command that I have tested and it works.
db.event.aggregate([{$match:{clktime:{$gt:1425289561}}},{$group:{"_id":{$subtract:["$clktime",{$mod:["$clktime", 60*5]}]}, count:{$sum:1}}}])
And here is what I have got so far:
c.Pipe([]bson.M{bson.M{"$match": bson.M{"clktime": bson.M{"$gt": 1425289561}}}, bson.M{"$group": bson.M{"_id": bson.M{"$subtract": []bson.M{bson.M{"$clktime"}, bson.M{"$mod": []bson.M{bson.M{"$clktime"}, bson.M{60 * 5}}}}}}, "count": bson.M{"$sum": 1}}})
It says that there is a missing key in a map literal, but I can't find where.
I thought human beings don't deserve that, I am so desperate T_T.
Is there a better, or more humane, way to do that?
x := []bson.M{{"$match": bson.M{"clktime": bson.M{"$gt": 1425289561}}},{"$group": bson.M{"_id": bson.M{"$subtract": []interface{}{"$clktime", bson.M{"$mod": []interface{}{"$clktime", 60 * 5}}}}, "count": bson.M{"$sum": 1}}}}
Yes, there is a better way. Break your code into multiple lines and put a comma after the last element of each map or array literal. Then the code can be formatted automatically (e.g. by gofmt), and you will also get readable error messages that point to the offending line.
package main
type M map[string]M
var x = M{
"a": M{
"b": M{},
"c": M{},
},
}
By the way, look at this part: bson.M{"$clktime"}, bson.M{60 * 5}}. A bson.M is a map, so each of these literals has a value but no key, which is what the "missing key in map literal" error refers to.
Finally, I have written this out, following the suggestion @Grzegorz gave (splitting the query construction across multiple lines for convenience) and also taking @Morty's advice (using []interface{} for arrays in the query command). Here is what I got, which works:
q := []bson.M{
bson.M{
"$match": bson.M{
"clktime": bson.M{
"$gt": 1425289561,
},
},
},
bson.M{
"$group": bson.M{
"_id": bson.M{
"$subtract": []interface{}{
"$clktime",
bson.M{
"$mod": []interface{}{
"$clktime",
60 * 5,
},
},
},
},
"count": bson.M{"$sum": 1},
},
},
}
I hope it will be helpful for other people who run into a similar issue.
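For readers who want to check what the pipeline computes, the arithmetic can be reproduced in a few lines of Python with no MongoDB involved. The $subtract/$mod expression rounds each clktime down to the start of its 5-minute window, and $group with $sum: 1 counts the events per window. The function names and the sample timestamps are made up for illustration.

```python
from collections import Counter

BUCKET_SECONDS = 60 * 5


def bucket_start(clktime):
    """Same as {$subtract: ["$clktime", {$mod: ["$clktime", 300]}]}."""
    return clktime - clktime % BUCKET_SECONDS


def count_per_bucket(clktimes, after):
    """Model of $match (clktime > after) followed by $group with $sum: 1."""
    return Counter(bucket_start(t) for t in clktimes if t > after)


events = [1425289561, 1425289600, 1425289799, 1425289800, 1425290000]
counts = count_per_bucket(events, after=1425289561)
for start in sorted(counts):
    print(start, counts[start])
```

The first event is dropped because $gt is a strict comparison, and the remaining four fall into two windows of two events each.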
