Will this type of pagination scale? - performance

I need to paginate a set of models that can/will become large. The results have to be sorted so that the latest entries appear on the first page (and then we can go all the way back to the start using 'next' links).
The query to retrieve the first page is the following (4 is the number of entries I need per page):
SELECT "relationships".* FROM "relationships" WHERE ("relationships".followed_id = 1) ORDER BY created_at DESC LIMIT 4 OFFSET 0;
Since this needs to be sorted and since the number of entries is likely to become large, am I going to run into serious performance issues?
What are my options to make it faster?
My understanding is that an index on followed_id will only help the WHERE clause; my concern is with the ORDER BY.

Create an index that contains these two fields, in this order: (followed_id, created_at).
Now, how large is the "large" we are talking about here? If it will be on the order of millions, how about something like the following?
Create an index on the keys (followed_id, created_at, id). (The exact columns may change depending on the fields in your SELECT, WHERE and ORDER BY clauses; I have tailored this one to your question.)
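For reference, a minimal sketch of that index DDL (the index name here is just an assumption):
CREATE INDEX idx_relationships_followed_created_id
    ON relationships (followed_id, created_at, id);
With that in place, the query below first pulls the matching ids from the index and only then joins back to the table: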
SELECT relationships.*
FROM relationships
JOIN (SELECT id
      FROM relationships
      WHERE followed_id = 1
      ORDER BY created_at
      LIMIT 10 OFFSET 10) itable
  ON relationships.id = itable.id
ORDER BY relationships.created_at
An explain would yield this:
+----+-------------+---------------+------+---------------+-------------+---------+------+------+-----------------------------------------------------+
| id | select_type | table         | type | possible_keys | key         | key_len | ref  | rows | Extra                                               |
+----+-------------+---------------+------+---------------+-------------+---------+------+------+-----------------------------------------------------+
|  1 | PRIMARY     | NULL          | NULL | NULL          | NULL        | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
|  2 | DERIVED     | relationships | ref  | sample_rel2   | sample_rel2 | 5       |      |    1 | Using where; Using index                            |
+----+-------------+---------------+------+---------------+-------------+---------+------+------+-----------------------------------------------------+
If you examine it carefully, the sub-query containing the ORDER BY, LIMIT and OFFSET clauses operates directly on the index instead of the table, and only at the end joins back to the table to fetch the 10 records.
It makes a difference once your query reaches something like LIMIT 10 OFFSET 10000: without this trick it has to read past all 10000 skipped rows in the table before returning the 10 you want, whereas here the traversal is restricted to just the index.
An important note: I tested this in MySQL. Other databases might have subtle differences in behavior, but the concept holds regardless.

You can index these fields, but it depends:
You can assume (mostly) that created_at is already ordered, so indexing it might be unnecessary. But that depends more on your app.
In any case, you should index followed_id (unless it's the primary key).
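A minimal sketch of such an index (the index name is just an assumption):
CREATE INDEX index_relationships_on_followed_id
    ON relationships (followed_id);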

Related

Query a large list of partition ids in dynamo db where partition key is unique

I am new to DynamoDB.
The table looks like this:
| id  | rangekey | timestamp     | dimensions                                            |
| --- | -------- | ------------- | ----------------------------------------------------- |
| of1 | ACTIVE   | 1631460979529 | {"type":"test","content":"abc"}                       |
| of2 | ACTIVE   | 1631499979529 | {"type":"test","content":"bxh"}                       |
| of3 | ACTIVE   | 1631499979520 | {"type":"practice","content":"xyz"}                   |
| of4 | ACTIVE   | 1631499979528 | {"type":"practice","content":"olp","component":"one"} |
| of5 | ACTIVE   | 1631460979927 | {"type":"lecture","content":"lll"}                    |
| ... | ...      | ...           | ...                                                   |
and so on.
The id is the partition key and rangekey is the sort key. The id values are unique.
It seems like a poorly designed table when it comes to querying all the ids for which dimensions contains (or begins with)
"type":"test" or "type":"practice"
I am aware of the approaches below:
1. Scan the table with a filter expression like
   contains(dimensions,'"type":"test"') or contains(dimensions,'"type":"practice"')
   (sketched in PartiQL below).
2. Query the partition ids one by one with the same filter expression. This seems like a problem because I have a large list of ids (partition keys), up to approximately 5000, but the queries could be run in parallel to reduce the time.
3. Create a DynamoDB stream feeding a sort of materialized view that contains all the ids whose dimensions is of type test or practice. I need more insight on this one.
Does any of the above approaches seem good cost- or efficiency-wise? Are there better ways of doing this? Thanks in advance!
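For what it is worth, approach 1 can also be written as a PartiQL statement (just a sketch; "my_table" is a placeholder name, and under the hood this still runs as a full scan, so it does not reduce the read cost):
SELECT id
FROM "my_table"
WHERE contains(dimensions, '"type":"test"')
   OR contains(dimensions, '"type":"practice"');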

How to select nth row in CockroachDB?

If I use something like a SERIAL (which is a random number) for my table's primary key, how can I select a numbered row from my table? In MySQL, I just use the auto incremented ID to select a specific row, but not sure how to approach the problem with an arbitrary numbering sequence.
For reference, here is the table I'm working with:
+--------------------+------+-------+
| id | name | score |
+--------------------+------+-------+
| 235451721728983041 | ABC | 1000 |
| 235451721729015809 | EDF | 1100 |
| 235451721729048577 | GHI | 1200 |
| 235451721729081345 | JKL | 900 |
+--------------------+------+-------+
Using the LIMIT and OFFSET clauses will return the nth row. For example, SELECT * FROM tbl ORDER BY col1 LIMIT 1 OFFSET 9 returns the 10th row.
Note that it's important to include the ORDER BY clause here because you care about the order of the results (without ORDER BY, the results may come back in an arbitrary order).
If you care about the order in which things were inserted, you could ORDER BY the SERIAL column (id in your case), though this isn't guaranteed to match insertion order exactly: transaction contention and other factors can cause the generated SERIAL values to not be strictly ordered.
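Applied to the table above (assuming it is called tbl), a quick sketch:
-- returns the 3rd row by id order (GHI / 1200 with the sample data)
SELECT * FROM tbl ORDER BY id LIMIT 1 OFFSET 2;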

Column that sums values once per unique ID, while filtering on type (Oracle Fusion Transportation Intelligence)

I realize this has been discussed before, but I haven't seen a solution using a simple CASE expression for adding a column in Oracle FTI, which is as far as my experience goes at the moment, unfortunately. My end goal is to have a total Weight for each Category, counting only the null-Type entries and only one Weight per ID (I don't know why null was chosen as the default Type). I need to break the data apart by Type for a total Cost column, which is working fine, so I didn't include that in the example data below; but because I have to break the data up by Type, I am having trouble eliminating redundant values in my Total Weight results.
My original column which included redundant weights was as follows:
SUM(CASE Type
      WHEN null
      THEN 'Weight'
      ELSE null
    END)
Some additional info:
Each ID can have multiple Types (additionally each ID may not always have A or B but should always have null)
Each ID can only have one Weight (But when broken apart by type the value just repeats and messes up results)
Each ID can only have one Category (This doesn't really matter since I already separate this out with a Category column in the results)
Example Data:
ID | Categ. | Type   | Weight
 1 | Old    | A      |   1600
 1 | Old    | B      |   1600
 1 | Old    | (null) |   1600
 2 | Old    | B      |    400
 2 | Old    | (null) |    400
 2 | Old    | (null) |    400
 3 | New    | A      |    500
 3 | New    | B      |    500
 3 | New    | (null) |    500
 4 | New    | A      |    500
 4 | New    | (null) |    500
 4 | New    | (null) |    500
Desired Results:
Categ. | Total Weight
Old | 2000
New | 1000
I was trying to figure out how to include a DISTINCT based on ID in the column, but when I put DISTINCT in front of CASE it just eliminates redundant weights so I would just get 500 for Total Weight New.
Additionally, I thought it would be possible to divide the weight by the count of weights before aggregating them, but this didn't seem to work either:
SUM(CASE Type
      WHEN null
      THEN 'Weight' / COUNT(CASE Type
                              WHEN null
                              THEN 'Weight'
                              ELSE null
                            END)
      ELSE null
    END)
I am very appreciative of any help that can be offered, please let me know if there is a simple way to create a column that achieves the desired results. As it may be apparent, I am pretty new to Oracle, so please let me know if there is any additional information that is needed.
Thanks,
Will
You don't need a case statement here. You were on the right track with DISTINCT, but you also need to use an inline view (a subquery in the FROM clause).
The subquery in the FROM clause selects all distinct combinations of (id, categ, weight); you then select from that result set only categ and the sum of weight, grouping by categ. The subquery has no repeated weights for a given id (unlike the table itself), which is why it is needed.
This would have to be done a little differently if an id were ever to have more than one category, but you noted that an id only ever has one category.
select categ,
       sum(weight)
from (select distinct id,
             categ,
             weight
      from tbl)
group by categ;
Fiddle: http://sqlfiddle.com/#!4/11a56/1/0

cassandra query on map in select clause

I am new to Cassandra and I am trying to read a row from the database, which contains these values:
siteId | country | someMap
1      | US      | {a:b, x:z}
2      | PR      | {a:b, x:z}
I have also created an index on the table using create index on columnfamily(keys(someMap));
but still, when I query with select * from table where siteId=1 and someMap contains key 'a',
it returns the entire map:
1 | US | {a:b, x:z}
Can somebody tell me what I should do to get the value as
1 | US | {a:b}
You cannot: even though internally each entry of a Map|List|Set is stored as a column, you can only retrieve the whole collection, not part of it. You are not asking Cassandra to give you the entry of the map containing X, but rather the row whose map contains X.
HTH,
Carlo

Simple Oracle UPDATE Statement unusually bad performance

Every month I run a simple UPDATE statement on my Oracle database, but since Monday it takes very long. The table grows by 5 percent every month; there are now 8 million records stored.
The Statement:
update /*+ parallel(destination_tab, 4) */ destination_tab dest
set (full_name, state) =
(select /*+ parallel(source_tab, 4) */ dest.name, src.state
from source_tab src
where src.city = dest.city);
In reality there are 20 fields to update, not just two, but this makes the problem easier to describe.
explain plan:
-------------------------------------------------------------------------------------------------
| Id | Operation                    | Name            | Rows  | Bytes | Cost (%CPU) | Time      |
-------------------------------------------------------------------------------------------------
|  0 | UPDATE STATEMENT             |                 | 8517K | 3167M | 579M (50)   | 999:59:59 |
|  1 | UPDATE                       | DESTINATION_TAB |       |       |             |           |
|  2 | PX COORDINATOR               |                 |       |       |             |           |
|  3 | PX SEND QC (RANDOM)          | :TQ10000        | 8517K | 3167M | 6198 (1)    | 00:01:27  |
|  4 | PX BLOCK ITERATOR            |                 | 8517K | 3167M | 6198 (1)    | 00:01:27  |
|  5 | TABLE ACCESS FULL            | DESTINATION_TAB | 8517K | 3167M | 6198 (1)    | 00:01:27  |
|  6 | TABLE ACCESS BY INDEX ROWID  | SOURCE_TAB      |     1 |    56 | 1 (0)       | 00:00:01  |
|* 7 | INDEX UNIQUE SCAN            | CITY_PK         |     1 |       | 1 (0)       | 00:00:01  |
-------------------------------------------------------------------------------------------------
Could anyone explain to me how this can be? The plan looks very bad! Thank you very much.
You didn't say how long is too long. You are joining an 8 million row table. Not sure how many rows are in source_tab.
I noticed the execution plan indicates a full table scan of destination_tab. Is the city column on the destination_tab table indexed? If not, try adding an index. If it is, Oracle may be ignoring it because it knows it needs to return every value anyway and destination_tab is the driving table.
No matter how you optimize it, this will always degrade in performance as the tables grow because you are updating every row by fetching a value from the same table joined to another. That is, you are always doing N operations where N is the number of rows in destination_tab.
High-level questions/suggestions:
Do you need to update every row every time? Are only certain rows likely to have changed values? If so, can you somehow predict which rows you need to update and limit your updates to those rows?
Why are the hints there? If performance changes, I would experiment with dropping hints. It's the optimizer's job to find the best plan for you. By using hints, you are telling the optimizer how to do its job. You'd better be right.
You are updating the full_name column on destination_tab to the name column of the same row. But you are obtaining the name column through a join to the table. It may be quicker to take that out of your select and use something like below. This is a guess. It may not matter.
update destination_tab dest
set    full_name = name,
       state = (select src.state
                from   source_tab src
                where  src.city = dest.city);
Try the following.
merge
into  destination_tab d
using source_tab s
on    (d.city = s.city)
when matched then
  update
  set   d.state = s.state
  where decode(d.state, s.state, 1, 0) = 0;
If this is a data warehouse, I wouldn't do updates, especially not every row in a large table. I'd probably create a materialized view combining the pieces from various base tables, and do a full refresh when needed (non-atomic: truncate + insert append).
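For illustration, a rough sketch of that approach (all object names are assumptions, and the column list is trimmed to the two columns from the question):
CREATE MATERIALIZED VIEW destination_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT d.city, d.name AS full_name, s.state
FROM   destination_tab d
JOIN   source_tab s ON s.city = d.city;

-- non-atomic complete refresh (truncate + insert /*+ append */)
EXEC DBMS_MVIEW.REFRESH('destination_mv', method => 'C', atomic_refresh => FALSE);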
Edit:
As for WHY the current update approach is taking much longer than usual, my guess is that in previous runs Oracle found a good number of blocks needed for the update in buffer cache, and lately Oracle has had to pull a lot from disk into buffer first. You can look into consistent gets and db block gets (logical io) vs physical io (disk).
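A quick way to compare logical and physical I/O for the statement is to look at V$SQL (a sketch; the LIKE pattern is only a guess at matching your UPDATE):
SELECT sql_id, executions, buffer_gets, disk_reads
FROM   v$sql
WHERE  sql_text LIKE 'update%destination_tab%';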
I understand the comments about whether a data warehouse makes sense and so on. However, I have to do this update this way. The update is part of an ETL workflow: every month I have to copy the complete 8 million records of the table "destination", and after this step I have to run the UPDATE that is causing the problems.
What I do not understand is why the performance got so much worse from one day to the next. Usually the update runs in 45 minutes; now it runs for about 4 hours. But why? No sorting is necessary, so the famous reason of "sorting on disk instead of in main memory" cannot apply. What is the problem in my case?
Could there be a performance difference between a normal update (the way I do it) and the merge update?
