Two (seemingly) identical queries, one is faster, why? - ruby

Two seemingly identical queries (as far as a newbie like me can tell), but the first is faster overall in partial template rendering time (nothing changed but the ids statement). Also, when testing through the Rails console, the latter visibly runs a query while the former does not. I do not understand why, or why the first statement is a few ms faster than the second, though I can guess it is due to the shorter method chain used to get the same result.
UPDATE: My bad. They are not running the same query, but it is still interesting that a select on all columns is faster than a select on one column. Maybe the difference is negligible compared to the method chaining, though.
ids = current_user.activities.map(&:person_id).reverse
SELECT "activities".* FROM "activities" WHERE "activities"."user_id" = 1
SELECT "people".* FROM "people" WHERE "people"."id" IN (1, 4, 12, 15, 3, 14, 17, 10, 5, 6)
Rendered activities/_activities.html.haml (7.4ms)
ids = current_user.activities.order('id DESC').select{person_id}.map(&:person_id)
SELECT "activities"."person_id" FROM "activities" WHERE "activities"."user_id" = 1 ORDER BY id DESC
SELECT "people".* FROM "people" WHERE "people"."id" IN (1, 4, 12, 15, 3, 14, 17, 10, 5, 6)
Rendered activities/_activities.html.haml (10.3ms)
The purpose of the statement is to retrieve the foreign key references to people in the order in which they appear in the activities table (ordered by its PK).
Note: I use Squeel for SQL.

In the first query you chained .map and .reverse, while in the second you used .order('id DESC') and .select{person_id}. Those two calls are unnecessary: the first version already gives you the same ids once .reverse is applied.

Why ->latest method in model method read all rows from related table?

In Laravel 9, I get the latest value from the related CurrencyHistory table:
$currencies = Currency
    ::getByActive(true)
    ->withCount('currencyHistories')
    ->with('latestCurrencyHistory')
    ->orderBy('ordering', 'asc')
    ->get();
In the model app/Models/Currency.php I have:
public function latestCurrencyHistory()
{
    return $this->hasOne('App\Models\CurrencyHistory')->latest();
}
But checking the generated SQL, I see statements like:
SELECT *
FROM `currency_histories`
WHERE `currency_histories`.`currency_id` in (8, 13, 16, 19, 27, 30)
ORDER BY `created_at` desc
I suppose this query is generated by the latestCurrencyHistory method. Can I add some kind of LIMIT 1 condition here, as the resulting data is too big?
Thanks!
The query is correct. Because you eager load the relation for the whole collection using the with method, currency_histories rows are loaded for all of the Currency models in the collection.
If you dump the result, you will have currencies with IDs 8, 13, 16, 19, 27, 30 and one latestCurrencyHistory (if present) for each.
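To illustrate why every history row is read even though each currency ends up with a single record, here is a rough in-memory model in Python. It is only a sketch of the behaviour described above, not Laravel's actual implementation; the function name and sample rows are hypothetical.

```python
def eager_load_latest(parent_ids, history_rows):
    """Model an eager-loaded 'latest' has-one relation in memory."""
    # Step 1: a single query fetches every child row for all parents,
    # newest first (WHERE currency_id IN (...) ORDER BY created_at DESC).
    fetched = sorted(
        (row for row in history_rows if row["currency_id"] in parent_ids),
        key=lambda row: row["created_at"],
        reverse=True,
    )
    # Step 2: the rows are matched to parents in memory; only the first
    # (newest) row seen for each parent is kept.
    latest = {}
    for row in fetched:
        latest.setdefault(row["currency_id"], row)
    return fetched, latest


# Hypothetical sample data
rows = [
    {"id": 1, "currency_id": 8, "created_at": "2022-01-01"},
    {"id": 2, "currency_id": 8, "created_at": "2022-03-01"},
    {"id": 3, "currency_id": 13, "created_at": "2022-02-01"},
    {"id": 4, "currency_id": 99, "created_at": "2022-02-15"},
]
fetched, latest = eager_load_latest({8, 13}, rows)
print(len(fetched), latest[8]["id"], latest[13]["id"])
```

All three rows belonging to currencies 8 and 13 are fetched, but only the newest one per currency is attached, which matches the SQL shown in the question.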

Query using whereIn() not working to generate collection

I have a store_photos table and I am trying to get the photos of users who have a complete set of raffle_date_id values, 1 up to 12, and then group them by user_id. The query works, but it still generates a collection even when the set is not complete up to 12. How can I achieve this using whereIn() or a similar Eloquent method?
public function collection(): Collection
{
    $months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
    return StorePhoto::with('user.location')
        ->whereIn('raffle_date_id', $months)
        ->groupBy('user_id')
        ->get();
}
You can try constructing the RAW SQL query, to find what you are trying to achieve.
Then you can slowly build your way in Laravel.
For example, find the users who have a count of 12 (i.e. 12 records, one per month), then fetch only those users.
SELECT user_id FROM store_photos AS sp
GROUP BY sp.user_id
HAVING COUNT(sp.user_id) = 12;
Then in Laravel you can write:
DB::table('store_photos')
    ->groupBy('user_id')
    ->having(DB::raw('count(user_id)'), '=', 12)
    ->select('user_id');
*I do not know much about your app and its business logic or requirements, but you get the idea :)
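As a quick way to try the GROUP BY / HAVING idea outside Laravel, here is a small Python sketch against an in-memory SQLite table. The sample data is made up, and counting DISTINCT raffle_date_id values within 1..12 is my own assumption about what "complete" means; the SQL above counts rows per user instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store_photos (user_id INTEGER, raffle_date_id INTEGER)"
)

# Hypothetical data: user 1 has all 12 months, user 2 has only 3.
rows = [(1, month) for month in range(1, 13)]
rows += [(2, month) for month in (1, 2, 3)]
conn.executemany("INSERT INTO store_photos VALUES (?, ?)", rows)

# Keep only users that have every raffle_date_id from 1 to 12.
query = """
    SELECT user_id
    FROM store_photos
    WHERE raffle_date_id BETWEEN 1 AND 12
    GROUP BY user_id
    HAVING COUNT(DISTINCT raffle_date_id) = 12
"""
complete_users = [user_id for (user_id,) in conn.execute(query)]
print(complete_users)
```

The resulting list of user ids could then feed a whereIn('user_id', ...) on the photo query.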

RethinkDB: Can I group by fields between dates efficiently?

I'd like to group by multiple fields, between two timestamps.
I tried something like:
r.table('my_table').between(r.time(2015, 1, 1, 'Z'), r.now(), {index: "timestamp"}).group("field_a", "field_b").count()
This takes a lot of time since my table is pretty big. I started thinking about using an index in the 'group' part of the query, then I remembered that it's impossible to use more than one index in the same ReQL query.
Can I achieve what I need efficiently?
You could create a compound index, and then efficiently compute the count for any of the groups without computing all of them:
r.table('my_table').indexCreate('compound', function(row) {
  return [row('field_a'), row('field_b'), row('timestamp')];
})
r.table('my_table').between(
  [field_a_val, field_b_val, r.time(2015, 1, 1, 'Z')],
  [field_a_val, field_b_val, r.now()],
  {index: 'compound'}
).count()
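To see why a compound index makes the per-group count cheap, here is a minimal Python model in which a sorted list of tuples stands in for the index. The data and the count_in_range helper are hypothetical; the point is only that one group's time range is a contiguous slice of the sorted keys.

```python
from bisect import bisect_left

# Hypothetical index entries: (field_a, field_b, timestamp), kept sorted
# the way a compound index would keep them.
index = sorted([
    ("x", "p", 10),
    ("x", "p", 20),
    ("x", "p", 30),
    ("x", "q", 15),
    ("y", "p", 25),
])


def count_in_range(index, field_a, field_b, start, end):
    """Count entries for one (field_a, field_b) group with start <= ts < end."""
    lo = bisect_left(index, (field_a, field_b, start))
    # Right bound is open, like the default right bound of between().
    hi = bisect_left(index, (field_a, field_b, end))
    return hi - lo


print(count_in_range(index, "x", "p", 15, 35))
```

Two binary searches locate the slice, so rows belonging to other groups are never touched.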

Data structure for user input of complex nested if-then rules

My application requires the processing of measurement data in part via logical rules that are unknown while coding and will be input manually by the user. An example of such a rule is
IF ( Column_3 < 4.5 ) AND ( ( Column_5 > 3.2 ) OR ( Column_7 <= 0 ) ) THEN Result = 2
where the number of elementary comparisons and the bracketing is, a priori, unknown.
This leads to a design question: What is the most efficient way to allow the user to enter this information in a GUI and how can I represent this information in my program in the best way in order to actually compute the whole IF clause? Actually, I would like to represent the rule in an SQL database and so I need a specific data structure.
Thank you all for your kind help!
Regarding the GUI, I would be comfortable entering the rule in a text-area box.
Unless your typical conditions are more than 2-3 lines long, that should be OK.
The data structure can be something similar to the below design:
Base_Conditions table
  ID
  Left_operand
  Operator_code (> = <)
  Right_operand
Logical_conditions table
  ID
  Left_condition_id
  Left_condition_type ("1" for base condition or "2" for another logical condition)
  Operator_code (and/or)
  Right_condition_id
  Right_condition_type
Rules table
  ID
  Condition_id
  Result_action
For the example rule from the question, the rows stored in the relational DB would look something like this:
Base_Conditions
[1, Column_3, <, 4.5]
[2, Column_5, >, 3.2]
[3, Column_7, <=, 0]
Logical_conditions
[1, 2, 1, OR, 3, 1]
[2, 1, 1, AND, 1, 2]
Rules
[1, 2, "Result = 2"]
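To show how rows like these could be evaluated, here is a small recursive sketch in Python. The dictionaries mirror the example rows above; the evaluate function, the BASE/LOGICAL constants and the sample measurement row are my own illustrative names, not part of the proposed schema.

```python
import operator

COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}

BASE, LOGICAL = 1, 2  # condition type codes from the schema above

# id -> (left_operand, operator_code, right_operand)
base_conditions = {
    1: ("Column_3", "<", 4.5),
    2: ("Column_5", ">", 3.2),
    3: ("Column_7", "<=", 0),
}
# id -> (left_id, left_type, operator_code, right_id, right_type)
logical_conditions = {
    1: (2, BASE, "OR", 3, BASE),
    2: (1, BASE, "AND", 1, LOGICAL),
}


def evaluate(cond_id, cond_type, row):
    """Recursively evaluate a base or logical condition for one data row."""
    if cond_type == BASE:
        column, op, value = base_conditions[cond_id]
        return COMPARATORS[op](row[column], value)
    left_id, left_type, op, right_id, right_type = logical_conditions[cond_id]
    left = evaluate(left_id, left_type, row)
    right = evaluate(right_id, right_type, row)
    return (left and right) if op == "AND" else (left or right)


# Hypothetical measurement row; rule 1 points at logical condition 2.
measurement = {"Column_3": 4.0, "Column_5": 1.0, "Column_7": 0}
if evaluate(2, LOGICAL, measurement):
    print("Result = 2")
```

Because a logical condition can reference another logical condition, arbitrary bracketing depth is handled by the recursion.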

MongoDB ranged pagination

It's said that using skip() for pagination in MongoDB collection with many records is slow and not recommended.
Ranged pagination (based on a > _id comparison) could be used instead:
db.items.find({_id: {$gt: ObjectId('4f4a3ba2751e88780b000000')}});
It's good for displaying prev. & next buttons - but it's not very easy to implement when you want to display actual page numbers 1 ... 5 6 7 ... 124 - you need to pre-calculate from which "_id" each page starts.
So I have two questions:
1) When should I start worrying about that? At how many records does skip() show a noticeable slowdown? 1 000? 1 000 000?
2) What is the best approach to show links with actual page numbers when using ranged pagination?
Good question!
"How many is too many?" - that, of course, depends on your data size and performance requirements. I, personally, feel uncomfortable when I skip more than 500-1000 records.
The actual answer depends on your requirements. Here's what modern sites do (or, at least, some of them).
First, navbar looks like this:
1 2 3 ... 457
They get the final page number from the total record count and page size. Let's jump to page 3. That will involve some skipping from the first record. When the results arrive, you know the id of the first record on page 3.
1 2 3 4 5 ... 457
Let's skip some more and go to page 5.
1 ... 3 4 5 6 7 ... 457
You get the idea. At each point you see first, last and current pages, and also two pages forward and backward from the current page.
Queries
var current_id; // id of first record on current page.
// go to page current+N
db.collection.find({_id: {$gte: current_id}}).
    skip(N * page_size).
    limit(page_size).
    sort({_id: 1});
// go to page current-N
// note that due to the nature of skipping back,
// this query will get you records in reverse order
// (last records on the page being first in the resultset)
// You should reverse them in the app.
db.collection.find({_id: {$lt: current_id}}).
    skip((N-1)*page_size).
    limit(page_size).
    sort({_id: -1});
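To check the skip arithmetic of those two queries, here is a small Python simulation over a plain list of ids. It does not talk to MongoDB; page_forward and page_backward are hypothetical helpers that mimic the filter, sort, skip and limit steps.

```python
def page_forward(ids, current_id, n, page_size):
    """Mimic find({_id: {$gte: current_id}}).sort(1).skip(n * size).limit(size)."""
    candidates = [i for i in sorted(ids) if i >= current_id]
    return candidates[n * page_size:(n + 1) * page_size]


def page_backward(ids, current_id, n, page_size):
    """Mimic find({_id: {$lt: current_id}}).sort(-1).skip((n-1) * size).limit(size)."""
    candidates = [i for i in sorted(ids, reverse=True) if i < current_id]
    page = candidates[(n - 1) * page_size:n * page_size]
    # Results arrive newest-first, so reverse them for display.
    return page[::-1]


ids = list(range(1, 101))   # 100 records, ids 1..100
page_size = 10
current_id = 21             # first id on page 3

page_5 = page_forward(ids, current_id, 2, page_size)
page_1 = page_backward(ids, current_id, 2, page_size)
print(page_5[0], page_5[-1], page_1[0], page_1[-1])
```

From page 3, jumping two pages forward lands on ids 41 to 50 and two pages back on ids 1 to 10, while only skipping at most a couple of pages' worth of records.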
It's hard to give a general answer because it depends a lot on what query (or queries) you are using to construct the set of results that are being displayed. If the results can be found using only the index and are presented in index order then db.dataset.find().limit().skip() can perform well even with a large number of skips. This is likely the easiest approach to code up. But even in that case, if you can cache page numbers and tie them to index values you can make it faster for the second and third person that wants to view page 71, for example.
In a very dynamic dataset where documents will be added and removed while someone else is paging through data, such caching will become out-of-date quickly and the limit and skip method may be the only one reliable enough to give good results.
I recently encountered the same problem when trying to paginate a request on a field that wasn't unique, for example "FirstName". The idea of this query is to implement pagination on a non-unique field without using skip().
The main problem is querying on a non-unique field such as "FirstName", because the following will happen:
{"FirstName": {$gt: "Carlos"}} -> this will skip all the remaining records where the first name is "Carlos"
{"FirstName": {$gte: "Carlos"}} -> this will always return the same set of data
Therefore the solution I came up with was making the $match portion of the query unique by combining the targeted search field with a secondary field in order to make it a unique search.
Ascending order:
db.customers.aggregate([
{$match: { $or: [ {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]}, {'FirstName': {$gt: 'Carlos'}}]}},
{$sort: {'FirstName': 1, '_id': 1}},
{$limit: 10}
])
Descending order:
db.customers.aggregate([
{$match: { $or: [ {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]}, {'FirstName': {$lt: 'Carlos'}}]}},
{$sort: {'FirstName': -1, '_id': 1}},
{$limit: 10}
])
The $match part of this query is basically behaving as an if statement:
if firstName is "Carlos" then it needs to also be greater than this id
if firstName is not equal to "Carlos" then it needs to be greater than "Carlos"
The only problem is that you cannot navigate to a specific page number (it can probably be done with some code manipulation), but other than that it solved my problem of paginating on non-unique fields without using skip(), which eats a lot of memory and processing power as you approach the end of whatever dataset you are querying.
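The ascending-order $match logic can be tried out in plain Python. This is only an in-memory sketch of the same filter and sort, with made-up documents and a hypothetical next_page helper; it shows that records sharing a first name are neither skipped nor repeated.

```python
# Hypothetical documents with a non-unique FirstName and a unique _id.
customers = [
    {"_id": 1, "FirstName": "Ana"},
    {"_id": 2, "FirstName": "Carlos"},
    {"_id": 3, "FirstName": "Carlos"},
    {"_id": 4, "FirstName": "Carlos"},
    {"_id": 5, "FirstName": "Dora"},
]


def next_page(docs, last_name, last_id, limit):
    """Return the page that follows the record (last_name, last_id)."""
    matches = [
        d for d in docs
        # Same name but a later _id, OR a strictly greater name.
        if (d["FirstName"] == last_name and d["_id"] > last_id)
        or d["FirstName"] > last_name
    ]
    matches.sort(key=lambda d: (d["FirstName"], d["_id"]))
    return matches[:limit]


page_a = next_page(customers, "Carlos", 2, 2)
page_b = next_page(customers, "Carlos", 4, 2)
print([d["_id"] for d in page_a], [d["_id"] for d in page_b])
```

Each call only needs the name and _id of the last record on the previous page, so no skip is involved.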
