My Spring Boot in-memory H2 database reads data back in ascending order, whereas the same data in Oracle comes back in insertion order. How can I make the H2 database return data in insertion order?
If your SELECT statement does not specify an ORDER BY clause explicitly, that means you do not care about order: the result order is implementation-dependent and non-deterministic. There is no way to match the order in database X against database Y; even within a single database, different runs may produce different results. Fix your query if you care about order, or build your test to compare unordered collections if you don't.
And if you really need "insertion order", then use a column populated by SEQUENCE values for ordering.
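For illustration, a minimal sketch of that approach in H2 (the table, sequence and column names are hypothetical; an IDENTITY column would work the same way):

CREATE SEQUENCE ins_seq;
CREATE TABLE my_table (
    ins_order BIGINT DEFAULT NEXT VALUE FOR ins_seq,  -- captures insertion order
    payload   VARCHAR(100)
);
-- Read back in insertion order, explicitly:
SELECT payload FROM my_table ORDER BY ins_order;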
In Oracle 12c, when fetching data with multiple joins and filters and then paginating with ROWNUM, the tables are first joined and filtered, the result set is ordered with ORDER BY, and only then is the data fetched. So with a large number of rows, all the join operations are very costly: even though we need just 100 rows out of a 1-million-row result set, all the data still has to be prepared. Any idea how to improve this process?
If you only need a subset of the data, you can hint the query with FIRST_ROWS(n).
Improved Response Time with FIRST_ROWS(n) Hint for ORDER BY Queries
You use the FIRST_ROWS(n) hint in cases where you want the first number (n) of rows in the shortest possible time.
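For illustration, a sketch of the hint applied to a ROWNUM-paginated query like the one described (the table and column names are assumptions):

-- Ask the optimizer to favor a plan that returns the first 100 rows
-- quickly, instead of optimizing for total throughput:
SELECT *
FROM (
    SELECT /*+ FIRST_ROWS(100) */ o.order_id, o.order_date
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id
    WHERE  c.region = 'EU'
    ORDER  BY o.order_date
)
WHERE ROWNUM <= 100;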
What happens if I create a table with
create table X (...) clustered by(date) sorted by (time)
but insert into it without sorting:
insert into x select * from raw
Will the data be sorted after it is fetched from raw, before being inserted? If unsorted data is inserted, what does "sorted by" do in the CREATE TABLE statement? Does it act just as a hint for later SELECT queries?
The documentation explains:
The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.
I think it is clear that you are expected to insert the data already sorted if you use that option.
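A hedged sketch of what the documentation advises, for the table above (the bucket count of 32 and the setting are assumptions for illustration):

-- Either match the reducer count to the table's bucket count manually,
-- or let Hive enforce it (the setting below exists in Hive 1.x;
-- Hive 2.x always enforces bucketing):
SET hive.enforce.bucketing = true;

INSERT OVERWRITE TABLE x
SELECT * FROM raw
DISTRIBUTE BY date   -- route rows to buckets the way the table is clustered
SORT BY time;        -- sort within each bucket, matching SORTED BY (time)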
No, the data will not be sorted.
As another answer explains, the SORTED BY and CLUSTERED BY options do not change how data will be returned from queries. While the documentation is technically accurate, the purpose of CLUSTER BY is to write the underlying data to HDFS in a way that makes subsequent queries faster in some cases. Clustering (bucketing) is similar to partitioning in that it allows the query processor to skip reading rows, if the clustering key is chosen wisely. A common use of buckets is sampling data, where you explicitly include only certain buckets, thereby avoiding reads against those excluded.
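For example, a bucket-sampling query of the kind described might look like this (the bucket count of 32 is an assumption):

-- Read only the first of 32 buckets instead of scanning the whole table:
SELECT * FROM x TABLESAMPLE (BUCKET 1 OUT OF 32 ON date) t;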
I want to optimize a query in a Vertica database. I have a table like this
CREATE TABLE data (a INT, b INT, c INT);
and a lot of rows in it (billions)
I fetch some data using this query
SELECT b, c FROM data WHERE a = 1 AND b IN ( 1,2,3, ...)
but it runs slowly. The query plan shows something like this
[Cost: 3M, Rows: 3B (NO STATISTICS)]
The same is shown when I perform explain on
SELECT b, c FROM data WHERE a = 1 AND b = 1
It looks like a scan over some part of the table. In other databases I could create an index to make such a query really fast, but what can I do in Vertica?
Vertica does not have a concept of indexes. You would want to create a query specific projection using the Database Designer if this is a query that you feel is run frequently enough. Each time you create a projection, the data is physically copied and stored on disk.
I would recommend reviewing projection concepts in the documentation.
If you see a NO STATISTICS message in the plan, you can run ANALYZE_STATISTICS on the object.
For further optimization, you might want to use a JOIN rather than IN. Consider using partitions if appropriate.
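For instance, against the table from the question (the temporary table used for the join is a hypothetical illustration):

-- Collect statistics so the optimizer can cost the plan properly:
SELECT ANALYZE_STATISTICS('data');

-- Replace the long IN list with a join against a small table of values:
CREATE LOCAL TEMPORARY TABLE wanted_b (b INT) ON COMMIT PRESERVE ROWS;
INSERT INTO wanted_b VALUES (1);
INSERT INTO wanted_b VALUES (2);
INSERT INTO wanted_b VALUES (3);

SELECT d.b, d.c
FROM data d
JOIN wanted_b w ON w.b = d.b
WHERE d.a = 1;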
Creating good projections is the "secret sauce" of making Vertica perform well. Projection design is a bit of an art form, but there are three fundamental concepts to keep in mind (a sketch combining all three follows the list):
1) SEGMENTATION: For every row, this determines which node to store the data on, based on the segmentation key. This is important for two reasons: a) DATA SKEW -- if data is heavily skewed then one node will do too much work, slowing down the entire query. b) LOCAL JOINS - if you frequently join two large fact tables, then you want the data to be segmented the same way so that the joined records exist on the same nodes. This is extremely important.
2) ORDER BY: If you are performing frequent FILTER operations in the where clause, such as in your query WHERE a=1, then consider ordering the data by this key first. Ordering will also improve GROUP BY operations. In your case, you would order the projection by columns a then b. Ordering correctly allows Vertica to perform MERGE joins instead of HASH joins which will use less memory. If you are unsure how to order the columns, then generally aim for low to high cardinality which will also improve your compression ratio significantly.
3) PARTITIONING: By partitioning your data on a column that is frequently used in queries, such as transaction_date, you allow Vertica to perform partition pruning, which reads much less data. It also helps during insert operations, since a load then affects only one small ROS container instead of the entire file.
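Putting the three concepts together for the data table from the earlier question, a sketch might look like this (the projection name, segmentation key and partition column are illustrative choices, not the only reasonable ones):

CREATE PROJECTION data_ab
AS SELECT a, b, c
   FROM data
   ORDER BY a, b                      -- 2) ordering matches WHERE a = ... AND b = ...
   SEGMENTED BY HASH(a, b) ALL NODES; -- 1) spread rows evenly across nodes

SELECT START_REFRESH();               -- populate the new projection

-- 3) Partitioning is declared on the table itself, assuming it had a
--    suitable date column:
-- ALTER TABLE data PARTITION BY EXTRACT(year FROM txn_date);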
I'm currently working on optimizing my database schema with regard to index structures. As I'd like to increase my DDL performance, I'm searching for potential drop candidates on my Oracle 12c system. Here's the scenario in which I don't know what the consequences for query performance might be if I drop the index.
Given two indexes on the same table:
- non-unique, single column index IX_A (indexes column A)
- unique, combined index UQ_AB (indexes column A, then B)
Using index monitoring I found that the query optimizer didn't choose UQ_AB, only IX_A (probably because it's smaller and thus faster to read). As UQ_AB contains column A and additionally column B, I'd like to drop IX_A, though I'm not sure whether I'd incur any performance penalties by doing so. Does the higher selectivity of the combined unique index have any influence on the execution plans?
It could, though the effect is usually minor. Of course it depends on various things, for example how large the values in column B are.
You can look at various columns in USER_INDEXES to compare the two indexes, such as:
BLEVEL: tells you the "height" of the index tree (well, height is BLEVEL+1)
LEAF_BLOCKS: how many data blocks are occupied by the index values
DISTINCT_KEYS: how "selective" the index is
(You need to have analyzed the table first for these to be accurate.) That will give you an idea of how much work Oracle needs to do to find a row using the index.
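For example (the index names are from the question; the table name is an assumption):

-- Gather fresh statistics first:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'MY_TABLE', cascade => TRUE);
END;
/

-- Then compare the two indexes:
SELECT index_name, blevel, leaf_blocks, distinct_keys
FROM   user_indexes
WHERE  index_name IN ('IX_A', 'UQ_AB');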
Of course the only way to really be sure is to benchmark and compare timings or even trace output.
We are using Hibernate, JPA and Spring, and our DB is Postgres 9. We use a sequence to auto-generate the primary key, but we have noticed that 20 numbers are skipped when new records are inserted into those tables, even though the sequence is defined with increment by 1. Why is Postgres incrementing the next value by 20? We do use a cache value of "20".
That's normal. You can tell Hibernate not to cache sequence values - at a performance cost to inserts - but this still doesn't mean you won't have sequence gaps.
I wrote more about this on an older answer - here.
Sequences have gaps. That's their nature. If they couldn't have gaps, you could only have one transaction inserting at a time.
See:
CREATE SEQUENCE
Sequence manipulation functions
for details.
If you expect gapless sequences, you need to understand that you'll have to do all your inserts serially, with only one transaction able to do work at a time. To learn more, search for "postgresql gapless sequence". Relying on gapless sequences in the DB is usually a bad idea; instead, have your application construct the user-visible values when it fetches them, using the row_number() window function or similar.
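For example, a gapless user-visible numbering can be computed at read time (the table is hypothetical):

-- Gap-free display numbers derived from the gappy sequence-generated ids:
SELECT id,
       row_number() OVER (ORDER BY id) AS display_number
FROM invoices;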
Related:
Re-using deleted IDs