How to convert vinyl to memtx in Tarantool?

I've created a space with the vinyl engine (disk); how can I convert it to the memtx engine (RAM)?
I have a small table that would be better off in memtx for a query like this:
WITH x AS ( -- big table that won't fit in memory
    SELECT cat_id, click_count FROM bla WHERE user_id = ?
), mx AS (
    SELECT MAX(click_count) AS max_click FROM x
)
SELECT IFNULL(x.click_count, 0) / IFNULL((SELECT max_click FROM mx), 1),
       listings.*
FROM listings -- a small table, fewer than 100k records, better kept in memory
LEFT JOIN x ON listings.cat_id = x.cat_id
ORDER BY 1 DESC
LIMIT 10
But currently listings is a vinyl space; how do I convert it to memtx?
I tried to find this in the documentation, but there is no engine-related method other than setting the engine when creating a space with box.schema.space.create.
Failing that, is there a way to rename the old space, create a new space with the proper engine, insert the rows from the old one, and then drop the old space?

You can do a dump/restore with tarantool/dump.
Or you can walk the whole space and copy it row by row.
Or you can create a replica with the engine changed.
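A minimal Lua sketch of the row-by-row approach, assuming a vinyl space named listings whose primary key is an unsigned field 1; the new space name and the index definition are illustrative, not from the question:

-- Copy a vinyl space into a new memtx space, tuple by tuple.
local old = box.space.listings

local new = box.schema.space.create('listings_memtx', { engine = 'memtx' })
new:format(old:format())
new:create_index('primary', { parts = { { field = 1, type = 'unsigned' } } })

for _, tuple in old:pairs() do
    new:insert(tuple)
end

-- Once the copy is verified, take over the old name.
old:drop()
new:rename('listings')

For a large space you would batch the inserts inside box.begin() / box.commit() transactions instead of committing one tuple at a time.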

Related

Too many or too few rows when using different engines to create one table from another

I'm trying to create one table from another using
CREATE TABLE IF NOT EXISTS new_data
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/fedor/tables/{shard}/subfolder/new_data', '{replica}')
ORDER BY created_at
SETTINGS index_granularity = 8192, allow_nullable_key = TRUE
AS
SELECT *
FROM table
WHERE column IS NOT NULL
When I use
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/fedor/tables/{shard}/subfolder/new_data', '{replica}')
I get around 7-9% of the number of rows returned by the plain SELECT ... FROM ... WHERE query.
When I use
ENGINE = ReplicatedMergeTree('/clickhouse/fedor/tables/{shard}/subfolder/new_data', '{replica}')
I get 3 times more rows than expected (I assume every row occurs exactly 3 times).
I would like to get the exact number of rows, without losses and without duplication.
ReplicatedReplacingMergeTree with ORDER BY created_at
will collapse all rows that share the same created_at value into a single row, which would explain the missing rows.
How did you delete the existing table data before creating
ReplicatedMergeTree('/clickhouse/fedor/tables/{shard}/subfolder/new_data'...)?
Did you use DROP TABLE new_data SYNC? Leftover replica data would explain the triplicated rows.
Which engine does the source table have?
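If deduplication by created_at alone is not intended, one hedged fix is to include a unique column in the sorting key, so that ReplacingMergeTree collapses only true duplicates. The id column below is an assumption about the source schema:

CREATE TABLE IF NOT EXISTS new_data
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/fedor/tables/{shard}/subfolder/new_data', '{replica}')
ORDER BY (created_at, id) -- id assumed unique per row, so distinct rows are no longer merged
SETTINGS index_granularity = 8192, allow_nullable_key = TRUE
AS
SELECT *
FROM table
WHERE column IS NOT NULL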

Create a dynamic view based on a changeable partitioned table

I have an application that reads from a view in Oracle; the view reads from a big table and includes functions and joins with other tables.
The view takes a while to run because the table grows bigger each month.
I partitioned the table by year, and it became faster than before.
My problem is how to create a view that targets the changing partition (by year).
Assuming that PARTITION_COL is a date column that is your partitioning key, you could do this:
create or replace view THIS_CURRENT_YEAR as
select *
  from MY_PARTITIONED_TABLE
 where PARTITION_COL >= trunc(sysdate, 'YYYY')
   and PARTITION_COL <  add_months(trunc(sysdate, 'YYYY'), 12)
This way you'll get partition pruning where possible.

Trying to figure out max length of Rowid in Oracle

As per my design, I want to fetch the rowid, as in
select rowid r from table_name;
into a C variable, and I was wondering what the maximum size/length in characters of a rowid is.
Currently, in one of the biggest tables in my DB, the maximum length is 18, and it is 18 for every rowid in the table.
Thanks in advance.
Edit:
Currently the block of code below is iterated over multiple tables, so to keep the code flexible without having to define every table's PK in the query, we use ROWID:
select rowid from table_name ... where ....;
delete from table_name where rowid = selectedrowid;
I think that since the rowid is picked and used immediately, without being stored for future use, it is safe to use in this particular scenario.
Please refer to the answer below:
Is it safe to use ROWID to locate a Row/Record in Oracle?
I'd say no. It could be safe if, for instance, the application stores the ROWID only temporarily (say, to generate a list of selectable items, each identified by ROWID, where the list is routinely regenerated and never stored). But if the ROWID is used in any persistent way, it is not safe.
A physical ROWID has a fixed size in a given Oracle version; it does not depend on the number of rows in a table. It consists of the number of the datafile, the number of the block within that file, and the number of the row within that block. It is therefore unique in the whole database and allows direct access to the block and row without any further lookup.
As things in the IT world continue to grow, it is safe to assume that the format will change in the future.
Besides volume, there have also been structural changes, like the advent of transportable tablespaces, which made it necessary to store the object number (the internal number of the table/partition/subpartition) inside the ROWID.
Or the advent of index-organized tables (mentioned by @ibre5041), which look like tables but are in reality just indexes without such a physical address (because things are constantly moving in an index). This made it necessary to introduce UROWIDs, which can store both physical and index-based ROWIDs.
Please be aware that a ROWID can change, for instance if the row moves from one table partition to another, or if the table is defragmented to fill the holes left by many DELETEs.
According to the documentation, a ROWID has a length of 10 bytes:
Rowids of Row Pieces
A rowid is effectively a 10-byte physical address of a row. Every row in a heap-organized table has a rowid unique to this table that corresponds to the physical address of a row piece. For table clusters, rows in different tables that are in the same data block can have the same rowid.
Oracle also documents the (current) format; see Rowid Format.
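If you want to look at the components of that physical address yourself, here is a hedged sketch using the documented DBMS_ROWID package (my_table is a placeholder name):

select rowid,
       dbms_rowid.rowid_relative_fno(rowid) as file_no,  -- datafile number
       dbms_rowid.rowid_block_number(rowid) as block_no, -- block within the file
       dbms_rowid.rowid_row_number(rowid)   as row_no    -- row within the block
  from my_table
 where rownum <= 3;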
In general you could use the ROWID in your application, provided the affected rows are locked!
Thus your statement may look like this:
CURSOR ... IS
    select rowid from table_name ... where .... FOR UPDATE;
delete from table_name where rowid = selectedrowid;
See SELECT FOR UPDATE and FOR UPDATE Cursors.
Oracle even provides a shortcut: instead of where rowid = selectedrowid you can use WHERE CURRENT OF ...
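Putting the pieces together, a minimal PL/SQL sketch of that pattern; the table name, retention filter, and cursor name are assumptions, not from the question:

declare
    cursor c_doomed is
        select rowid as rid
          from my_table
         where created_at < sysdate - 365 -- assumed retention rule
           for update;                    -- locks the rows, so the rowids stay valid
begin
    for r in c_doomed loop
        delete from my_table
         where current of c_doomed;       -- shortcut for: where rowid = r.rid
    end loop;
    commit;                               -- releases the locks
end;
/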

Optimizing a delete... where query with rownum

I'm working with an application that has a large amount of outdated data clogging up a table in my database. Ideally, I'd want to delete all entries in the table whose reference date is too old:
delete outdatedTable where referenceDate < :deletionCutoffDate
If this statement were run as-is, it would take ages to complete, so I'd rather break it up into chunks like this:
delete outdatedTable where referenceDate < :deletionCutoffDate and rownum <= 10000
In testing, this works surprisingly slowly. The following query, however, runs dramatically faster:
delete outdatedTable where rownum <= 10000
I've been reading through multiple blogs and similar questions on StackOverflow, but I haven't yet found a straightforward description of how, or whether, rownum affects the Oracle optimizer when there are other WHERE clauses in the query. In my case, it seems as if Oracle checks
referenceDate < :deletionCutoffDate
on every single row, executes a massive select over all matching rows, and only then filters out the top 10000 rows. Is this in fact the case? If so, is there any clever way to make Oracle stop checking the WHERE clause as soon as it has found enough matching rows?
How about a different approach with much less DML on the table? As a permanent solution for the future, you could go for table partitioning:
Create a new table with the required partition(s).
Move ONLY the required rows from your existing table to the new partitioned table.
Once the new table is populated, add the required constraints and indexes.
Drop the old table.
In the future, you would just need to DROP the old partitions.
CTAS (create table as select) is another way; however, if you want the new table to be partitioned, you would have to use the exchange-partition concept, sketched below.
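A hedged sketch of that exchange-partition step; all table and partition names are made up for illustration:

-- Swap a populated staging table into a partition of the new table.
alter table new_partitioned_table
    exchange partition p_current
    with table staging_table
    including indexes
    without validation;

The exchange is a data-dictionary operation, so it completes almost instantly regardless of row counts.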
First of all, you should read about SQL statement execution plans and learn how to read them. That will help you answer questions like this one.
Generally, one single delete is more efficient than several chunked ones. Its main disadvantage is heavy use of the undo tablespace.
If you wish to delete most of the rows of a table, a much faster way is usually this trick:
create table new_table as select * from old_table where date >= :date_limit;
drop table old_table;
rename new_table to old_table;
... recreate indexes and other stuff ...
If you wish to do it more than once, partitioning is a much better way. If the table is partitioned by date, you can select the current data quickly, and you can drop a partition with outdated data in milliseconds.
Finally, partitioning is a way to dismiss 'deleting outdated records' altogether. Sometimes we need old data, and it's sad if we have deleted it with our own hands. With partitioning you can archive outdated partitions outside of the database, and reconnect them when you need to access the old data.
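For completeness, the partition-drop path is a hedged one-liner (table and partition names assumed):

alter table outdatedTable drop partition p_2019 update global indexes;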
This is an old request, but I'd like to show another approach (also using partitions).
Depending on what you consider old, you could create corresponding partitions (optimally exactly two: one current, one old; but you could just as well make more), e.g.:
PARTITION BY LIST ( year_parity ) -- year_parity: a virtual column such as MOD(EXTRACT(YEAR FROM referenceDate), 2); LIST partitioning takes a column, not a raw expression
(
    PARTITION year_odd  VALUES (1),
    PARTITION year_even VALUES (0)
);
This could just as well be months (Jan, Feb, ... Dec), decades (XX0X, XX1X, ... XX9X), half-years (first_half, second_half), etc.; anything circular.
Then, whenever you want to get rid of old data, truncate:
ALTER TABLE mytable TRUNCATE PARTITION year_even;
delete from your_table
 where PK not in
       (select PK from your_table where rownum <= ...) -- these are the records you want to keep
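And a hedged sketch of the chunked delete the question was aiming at, committing between batches to bound undo usage (table, column, and bind names come from the question; the batch size is arbitrary):

begin
    loop
        delete from outdatedTable
         where referenceDate < :deletionCutoffDate
           and rownum <= 10000;
        exit when sql%rowcount = 0; -- nothing left to delete
        commit;                     -- release undo before the next batch
    end loop;
    commit;
end;
/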

T-SQL Performance dilemma when looping through a Big table (details inside)

Let's say I have a Big and a Bigger table.
I need to cycle through the Big table, which is indexed but not sequential (since it is a filter of the sequentially indexed Bigger table).
For this example, let's say I need to cycle through about 20000 rows.
Should I do 20000 of these:
set @currentID = (select min(ID) from myData where ID > @currentID)
or create a (big) temporary, sequentially indexed table (a copy of the Big table) and do 20000 of these:
set @Row = @Row + 1
?
I imagine that doing 20000 filters of the Bigger table just to fetch the next ID is heavy, but so must be filling a big (Big-sized) temporary table just to add a dummy identity column.
Is the solution somewhere else?
For example, if I could loop through the results of the select statement (the filter of the Bigger table that produces the result set Big) without needing to create temporary tables, that would be ideal, but I seem unable to add something like an IDENTITY(1,1) dummy column to the results.
Thanks!
You may want to consider finding out how to do your work set-based instead of RBAR (row by agonizing row). That said, for very big tables you may want to avoid a temp table, so that you are sure you are working with live data if you suspect the proc may run for a while in production; if the proc fails, you'll be able to pick up where you left off. If you use a temp table and your proc crashes, you could lose track of the work that hasn't been completed yet.
You need to provide more information on what your end result is. It is only very rarely necessary to do row-by-row processing, and it is almost always the worst possible choice from a performance perspective. This article will get you started on how to do many tasks in a set-based manner:
http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them
If you just want a temp table with an identity, here are two methods:
create table #temp (test varchar(10), id int identity)
insert #temp (test)
select test from mytable
Or, using SELECT INTO:
select test, identity(int) as id into #temp from mytable
I think a join will serve your purposes better.
SELECT BIG.*, BIGGER.* -- add additional calcs here involving BIG and BIGGER
FROM TableBig BIG (NOLOCK)
JOIN TableBigger BIGGER (NOLOCK)
  ON BIG.ID = BIGGER.ID
This will limit the set you are working with. But again, it comes down to the specifics of your solution.
Remember too, you can do bulk inserts and bulk updates in this manner as well.
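If the loop really is necessary, here is a hedged T-SQL sketch of a keyset batch walk that avoids both a per-row min(ID) lookup and a full temporary copy (myData and ID come from the question; the batch size and the commented-out update are illustrative):

declare @currentID int = 0;
declare @batchSize int = 1000;
declare @lastID int;

while 1 = 1
begin
    -- Find the upper bound of the next batch of keys.
    select @lastID = max(ID)
      from (select top (@batchSize) ID
              from myData
             where ID > @currentID
             order by ID) as batch;

    if @lastID is null break; -- no rows left

    -- Process the whole batch with one set-based statement, e.g.:
    -- update myData set ... where ID > @currentID and ID <= @lastID;

    set @currentID = @lastID; -- advance the keyset
end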
