How can I filter if any value of an array is contained in another array in rethinkdb/reql?

I want to find any user who is a member of a group I can manage (using the web interface/JavaScript):
Users:
{
  "id": 1,
  "member_in_groups": ["all", "de-south"]
},
{
  "id": 2,
  "member_in_groups": ["all", "de-north"]
}
I tried:
r.db('mydb').table('users').filter(r.row('member_in_groups').map(function(p) {
return r.expr(['de-south']).contains(p);
}))
but both users are always returned. Which command do I have to use, and how can I use an index for this? (I read about multi indexes at https://rethinkdb.com/docs/secondary-indexes/python/#multi-indexes, but there only a single value is searched for.)

I got the correct answer in the Slack channel, so I'm posting it here in case anyone else comes to this thread through googling:
First create a multi index as described in
https://rethinkdb.com/docs/secondary-indexes/javascript/, e.g.
r.db('<db-name>').table('<table-name>').indexCreate('<some-index-name>', {multi: true}).run()
(you can omit .run() if using the web admin; note that when you pass only an index name and no index function, the index is built on the field of that name, so here the name has to be the array field, member_in_groups)
Then query the data with
r.db('<db-name>').table('<table-name>').getAll('de-north', 'de-west', {index:'<some-index-name>'}).distinct()
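If you later want to run the same two steps from code, here is a minimal sketch using the official Go driver (rethinkdb-go). The connection address is an assumption, the database/table names are taken from the question, and the driver choice itself is an assumption since the question targets the web interface/JavaScript:

package main

import (
	"fmt"
	"log"

	r "gopkg.in/rethinkdb/rethinkdb-go.v6"
)

func main() {
	session, err := r.Connect(r.ConnectOpts{Address: "localhost:28015"})
	if err != nil {
		log.Fatal(err)
	}

	// Step 1: create the multi index on the array field (run once).
	// With only a name and no index function, the index is built on the
	// field of the same name, hence "member_in_groups".
	if err := r.DB("mydb").Table("users").
		IndexCreate("member_in_groups", r.IndexCreateOpts{Multi: true}).
		Exec(session); err != nil {
		log.Println("index create (may already exist):", err)
	}
	r.DB("mydb").Table("users").IndexWait("member_in_groups").Exec(session)

	// Step 2: fetch every user that is a member of any of the given groups.
	cursor, err := r.DB("mydb").Table("users").
		GetAllByIndex("member_in_groups", "de-north", "de-west").
		Distinct().
		Run(session)
	if err != nil {
		log.Fatal(err)
	}
	defer cursor.Close()

	var users []map[string]interface{}
	if err := cursor.All(&users); err != nil {
		log.Fatal(err)
	}
	fmt.Println(users)
}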

Related

gorm: using Find returns an empty struct instead of ErrRecordNotFound

I use the Find method to look up some records, but when I pass a condition that matches nothing it should return ErrRecordNotFound; instead it returns an empty struct.
var projectSetBudget domain.ProjectSetBudget
filter := &domain.ProjectSetBudget{}
filter.TenantID = tenantID
filter.ID = ProjectSetBudgetID
res := r.db.Where(filter).Find(&projectSetBudget)
I just log the check
zap.S().Info(errors.Is(res.Error, gorm.ErrRecordNotFound))
and it prints false and returns the empty struct
{
"ID": 0,
"CreatedAt": "0001-01-01T00:00:00Z",
"UpdatedAt": "0001-01-01T00:00:00Z",
"DeletedAt": null,
"title": "",
}
GORM provides First, Take, Last methods to retrieve a single object from the database, it adds LIMIT 1 condition when querying the database, and it will return the error ErrRecordNotFound if no record is found.
https://gorm.io/docs/query.html
That only applies to First, Take and Last, not to Find.
If you use First, Take or Last you will get ErrRecordNotFound.
First, Take and Last set the flag below, but Find does not. Find can return any number of rows, so you can tell whether anything was found by checking the row count instead:
tx.Statement.RaiseErrorOnNotFound = true
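To make that concrete, here is a minimal, self-contained sketch of both ways to detect "nothing found". The model is a stand-in for the one in the question, and the SQLite driver and filter values are assumptions:

package main

import (
	"errors"
	"fmt"

	"gorm.io/driver/sqlite"
	"gorm.io/gorm"
)

// Stand-in for domain.ProjectSetBudget from the question.
type ProjectSetBudget struct {
	ID       uint
	TenantID uint
	Title    string
}

func main() {
	db, err := gorm.Open(sqlite.Open("test.db"), &gorm.Config{})
	if err != nil {
		panic(err)
	}
	db.AutoMigrate(&ProjectSetBudget{})

	// Option 1: First sets RaiseErrorOnNotFound, so a missing row is an error.
	var one ProjectSetBudget
	res := db.Where(&ProjectSetBudget{TenantID: 42, ID: 7}).First(&one)
	if errors.Is(res.Error, gorm.ErrRecordNotFound) {
		fmt.Println("not found via First")
	}

	// Option 2: keep Find, but check how many rows actually matched.
	var many []ProjectSetBudget
	res = db.Where(&ProjectSetBudget{TenantID: 42, ID: 7}).Find(&many)
	if res.Error == nil && res.RowsAffected == 0 {
		fmt.Println("not found via Find (no error is returned)")
	}
}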

FaunaDB search document and get its ranking based on a score

I have the following Collection of documents with structure:
type Streak struct {
UserID string `fauna:"user_id"`
Username string `fauna:"username"`
Count int `fauna:"count"`
UpdatedAt time.Time `fauna:"updated_at"`
CreatedAt time.Time `fauna:"created_at"`
}
This looks like the following in FaunaDB Collections:
{
"ref": Ref(Collection("streaks"), "288597420809388544"),
"ts": 1611486798180000,
"data": {
"count": 1,
"updated_at": Time("2021-01-24T11:13:17.859483176Z"),
"user_id": "276989300",
"username": "yodanparry"
}
}
Basically I need a lambda or a function that takes in a user_id and spits out its rank within the collection. rank is simply sorted by the count field. For example, let's say I have the following documents (I ignored other fields for simplicity):
user_id | count
--------+------
abc     |    12
xyz     |    10
fgh     |   999
If I throw in fgh as an input for this lambda function, I want it to spit out 1 (or 0 if you start counting from 0).
I already have an index for user_id so I can query and match a document reference from this index. I also have an index sorted_count that sorts documents based on the count field in ascending order.
My current solution was to query all documents by sorted_count index, then get the rank by iterating through the array. I think there should be a better solution for this. I'm just not seeing it.
Please help. Thank you!
Counting things in Fauna isn't as easy as one might expect. But you might still be able to do something more efficient than you describe.
Assuming you have:
CreateIndex(
{
name: "sorted_count",
source: Collection("streaks"),
values: [
{ field: ["data", "count"] }
]
}
)
Then you can query this index like so:
Count(
Paginate(
Match(Index("sorted_count")),
{ after: 10, size: 100000 }
)
)
Which will return an object like this one:
{
before: [10],
data: [123]
}
Which tells you that there are 123 documents with count >= 10, which I think is what you want.
This means that, in order to get a user's rank based on their user_id, you'll need to implement this two-step process:
Determine the count of the user in question using your index on user_id.
Query sorted_count using the user's count as described above.
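Putting the two steps together, a sketch with the Go driver (faunadb-go) could look roughly like this. The index names user_id and sorted_count follow the question; the secret, the import path and the page size are assumptions, so adapt as needed:

package main

import (
	"fmt"
	"log"

	f "github.com/fauna/faunadb-go/v4/faunadb"
)

func main() {
	client := f.NewFaunaClient("YOUR_FAUNA_SECRET")

	query := f.Let().
		// Step 1: look up the user's count via the user_id index.
		Bind("count", f.Select(
			f.Arr{"data", "count"},
			f.Get(f.MatchTerm(f.Index("user_id"), "fgh")),
		)).
		// Step 2: count how many index entries have count >= that value.
		In(f.Count(f.Paginate(
			f.Match(f.Index("sorted_count")),
			f.After(f.Var("count")),
			f.Size(100000),
		)))

	rank, err := client.Query(query)
	if err != nil {
		log.Fatal(err)
	}
	// Expect a page-shaped result such as { before: [999], data: [1] },
	// i.e. user "fgh" ranks 1st.
	fmt.Println(rank)
}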
Note that, in case your collection has more than 100,000 documents, you'll need your Go code to iterate through all the pages based on the returned object's after field. 100,000 is Fauna's maximum allowed page size. See the Fauna docs on pagination for details.
Also note that this might not reflect whatever your desired logic is for resolving ties.

Elastic Ingest Pipeline split field and create a nested field

Dear friendly helpers,
I have an index that is fed by a database via Kafka. Now this database holds a field that aggregates a couple of pieces of information like so: key/value; key/value; (don't ask for the reason, I have no idea who designed it like that and why ;-) )
93/4; 34/12;
it can be empty, or it can hold 1..n key/value pairs.
I want to use an ingest pipeline and ideally have a "nested" field which holds all values that are in that field.
Probably like this:
{"categories":
{ "93": 7,
"82": 4
}
}
The use case is the following: we want to visualize the sum of a filtered set of these categories (they tell me how many minutes longer a specific process took) and group them into ranges.
Example: I filter categories x, y, z and then group how many documents for the day had no delay, which had a delay up to 5 minutes and which had a delay between 5 and 15 minutes.
I have tried to get the fields neatly separated with the kv processor and wanted to work from there, but it was a completely wrong approach, I guess.
"kv": {
"field": "IncomingField",
"field_split": ";",
"value_split": "/",
"target_field": "delays",
"ignore_missing": true,
"trim_key": "\\s",
"trim_value": "\\s",
"ignore_failure": true
}
When I test the pipeline it seems ok
"delays": {
"62": "3",
"86": "2"
}
but there are two things that don't work.
I can't know upfront how many of these combinations I have, so converting the values from string to int in the same pipeline is an issue.
When I want to create a Kibana index pattern I end up with many fields like delay.82 and delay.82.keyword, which does not make sense at all for the use case, as I can't filter (get only the sum of delays where the key is one of x, y, z) and aggregate.
I have looked into other processors (dot_expander) but can't really get my head around how to get this working.
I hope my question is clear (I lack English skills, sorry) and that someone can point me in the right direction.
Thank you very much!
You should rather structure them as an array of objects with shared accessors, for instance:
[ {key: 93, value: 7}, ...]
That way, you'll be able to aggregate on categories.key and categories.value.
So this means iterating the categories' entrySet() using a custom script processor like so:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "extracts k/v pairs",
"processors": [
{
"script": {
"source": """
def categories = ctx.categories;
def kv_pairs = new ArrayList();
for (def pair : categories.entrySet()) {
def k = pair.getKey();
def v = pair.getValue();
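// note: if the kv processor left the values as strings (e.g. "3"), convert them here, e.g. v = Integer.parseInt(v);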
kv_pairs.add(["key": k, "value": v]);
}
ctx.categories = kv_pairs;
"""
}
}
]
},
"docs": [
{
"_source": {
"categories": {
"82": 4,
"93": 7
}
}
}
]
}
P.S.: Do make sure your categories field is mapped as nested, because otherwise you'll lose the connections between the keys and the values (also known as flattening).
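For illustration, a nested mapping for that structure could look like the sketch below, sent here via the official Go client (go-elasticsearch); the index name and the key/value sub-field types are assumptions, and the same JSON body can just as well be sent from Kibana Dev Tools:

package main

import (
	"log"
	"strings"

	"github.com/elastic/go-elasticsearch/v8"
)

func main() {
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatal(err)
	}

	// "nested" keeps each key/value pair as its own sub-document, so the
	// key stays tied to its value when filtering and aggregating.
	mapping := `{
	  "mappings": {
	    "properties": {
	      "categories": {
	        "type": "nested",
	        "properties": {
	          "key":   { "type": "keyword" },
	          "value": { "type": "integer" }
	        }
	      }
	    }
	  }
	}`

	res, err := es.Indices.Create(
		"delays-index",
		es.Indices.Create.WithBody(strings.NewReader(mapping)),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	log.Println(res.Status())
}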

Mongoid each + set vs Criteria#set vs update_all + $addToSet

I was wondering what is better performance/memory wise: iterating over all objects in a collection and calling set/add_to_set on each, calling set/add_to_set directly on the Criteria, or using update_all with set/add_to_set.
# update_all
User.where(some_query).update_all(
{
'$addToSet': {
:'some.field.value' => :value_to_add
}
}
)
# each do + add_to_set
User.where(some_query).each do |user|
user.add_to_set(:'some.field.value' => :value_to_add)
end
# Criteria#add_to_set
User.where(some_query).add_to_set(
:'some.field.value' => :value_to_add
)
Any input is appreciated. Thanks!
I started the MongoDB server with the verbose flag. Here is what I got.
Option 1. update_all applied on a selector
2017-04-25 COMMAND command production_v3.$cmd command: update { update: "products", updates: [ { q: { ... }, u: { $addToSet: { test_field: "value_to_add" } }, multi: true, upsert: false } ], ordered: true }
I removed some output so that it is easier to read. The flow is:
MongoID generates a single command with the query and update specified.
The MongoDB server gets the command. It goes through the collection and updates each match in one go.
Note! You can verify this in the source code or take it as given. Since MongoID, as per my terminology, generates the command to send in step 1, it does not check your models. E.g. if 'some.field.value' is not one of the fields in your User model, the command will still go through and persist to the DB.
Option 2. each on a selector
I get find commands like below followed by multiple getMore-s:
2017-04-25 COMMAND command production_v3.products command: find { find: "products", filter: { ... } } 0ms
I also get a huge number of update-s:
2017-04-25 COMMAND command production_v3.$cmd command: update { update: "products", updates: [ { q: { _id: ObjectId('52a6db196c3f4f422500f255') }, u: { $addToSet: { test_field: { $each: [ "value_to_add" ] } } }, multi: false, upsert: false } ], ordered: true } 0ms
The flow is radically different from the 1st option:
MongoID sends a simple query to the MongoDB server. Provided your collection is large enough and the query covers a material chunk of it, the following happens in a loop:
[loop] MongoDB responds with a subset of all matches and leaves the rest for the next iteration.
[loop] MongoID gets an array of matching items in Hash format. MongoID parses each entry and initializes a User instance for it. That's an expensive operation!
[loop] For each User instance from the previous step, MongoID generates an update command and sends it to the server. Sockets are expensive too.
[loop] MongoDB gets the command and goes through the collection until the first match, then updates it. Each update is quick, but it adds up over the loop.
[loop] MongoID parses the response and updates its User instance accordingly. Expensive and unnecessary.
Option 3. add_to_set applied on a selector
Under the hood it is equivalent to Option 1. Its CPU and Memory overhead is immaterial for the sake of the question.
Conclusion.
Option 2 is so much slower that there is no point in benchmarking. In the particular case I tried, it resulted in thousands of requests to MongoDB and thousands of User class initializations. Options 1 and 3 resulted in a single request to MongoDB and relied on MongoDB's highly optimized engine.

Multiple atomic updates using MongoDB?

I am using Codeigniter and Alex Bilbie's MongoDB library.
In my API that I am developing users can upload images and other users can comment on them.
I have chosen to include the comments as sub documents to the images.
Each comment contains:
Fullname (of author)
Comment
Created_at
So in other words, the user's full name is "hard coded" into each comment, so if they later decide to change their name I have a problem.
I read that I can use atomic updates to update all occurrences of the name (like in comments), but how can I do this using Alex's library? Can I update all places where the name is wrong?
UPDATE
This is how the image document looks like with the comments.
I think that it is pretty strange that MongoDB encourages the use of subdocuments but then does not include a way to update multiple items in an array.
{
"_id": ObjectId("4e9ead773dc793dc01020000"),
"description": "An image",
"category": "accident",
"comments": [
{
"id": ObjectId("4e96bd063dc7937202000000"),
"fullname": "James Bond",
"comment": "This is a comment.",
"created_at": "2011-10-19 13:02:40"
}
],
"created_at": "2011-10-19 12:59:03"
}
Thankful for all help!
I am not familiar with CodeIgniter, but maybe the MongoDB shell syntax will help you:
db.comments.update( {"Fullname":"Andrew Orsich"},
{ $set : { Fullname: "New name"} }, false, true )
The last true flag indicates that you want to update multiple documents, so it is possible to update all comments in one update operation.
BTW: denormalizing (not 'hard coding') data in MongoDB, and in NoSQL in general, is a usual operation. Also, operations that require updating a lot of documents usually run asynchronously. But it is up to you.
Update:
db.comments.update( {"comments.Fullname":"Andrew Orsich"},
{ $set : { "comments.$.Fullname": "New name"} }, false, true )
But the above query will only update the full name in the first matching comment of the nested array in each document. If you need to apply the change to more than one array element you will need to use multiple update statements.
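For comparison, the same multi-document update through the official Go driver (mongo-go-driver) could be sketched like this; the database and collection names are assumptions, and the same positional-operator caveat applies:

package main

import (
	"context"
	"fmt"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// Collection holding the image documents with embedded comments.
	images := client.Database("mydb").Collection("images")

	// Updates the author's name in the first matching comment of every
	// image document ($ only touches one array element per document).
	res, err := images.UpdateMany(ctx,
		bson.M{"comments.fullname": "James Bond"},
		bson.M{"$set": bson.M{"comments.$.fullname": "New name"}},
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("matched:", res.MatchedCount, "modified:", res.ModifiedCount)
}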
