Why does the Vertica query_requests table report that a query took a few milliseconds, while it actually took 10 seconds?

I'm running queries against a Vertica table with close to 500 columns and only 100 000 rows.
A simple query (like select avg(col1) from mytable) takes 10 seconds, as reported by the Vertica vsql client with the \timing command.
But when I check the query_requests.request_duration_ms column for this query, there's no trace of the 10 seconds: it reports less than 100 milliseconds.
The query_requests.start_timestamp column indicates that processing started 10 seconds after I actually executed the command.
The resource_acquisitions table shows no delay in resource acquisition, but its queue_entry_timestamp column also shows that the queue entry occurred 10 seconds after I actually executed the command.
The same query run on the same data, but on a table with only one column, returns immediately. And since I'm running the queries directly on a Vertica node, I can rule out network latency.
It feels like Vertica is doing something before executing the query, that this something takes most of the time, and that it is related to the number of columns in the table. Any idea what it could be, and what I could try in order to fix it?
I'm using Vertica 8, in a test environment with no load.

I was running Vertica 8.1.0-1; it seems the issue was caused by a Vertica bug in the query planning phase that degraded performance. It was fixed in versions >= 8.1.1:
https://my.vertica.com/docs/ReleaseNotes/8.1./Vertica_8.1.x_Release_Notes.htm
VER-53602 - Optimizer - This fix improves complex query performance during the query planning phase.

Related

Spoon runs slow from Postgres to Oracle

I have an ETL in Spoon that reads a table from Postgres and writes into Oracle.
No transformation, no sort. SELECT col1, col2, ... col33 from table.
350 000 rows in input. The performance is 40-50 rec/sec.
If I read/write the same table from Postgres to Postgres with ALL columns (col1...col100), I get 4-5 000 rec/sec.
The same goes for reading/writing from Oracle to Oracle: 4-5 000 rec/sec.
So, for me, it is not a network problem.
If I try with another Postgres table that has only 7 columns, performance is good.
Thanks for the help.
The same happened in my case: while loading data from Oracle and running the transformation on my local machine (Windows), the processing rate was 40 r/s, but it was 3000 r/s for a Vertica database.
I couldn't figure out what the exact problem was, but I found a way to increase the row rate. It worked for me; you can do the same.
Right-click on the Table Input step and you will see "Change Number Of Copies to Start".
Also include the condition below in the WHERE clause; this is to avoid duplicates. When you choose "Change Number Of Copies to Start", the query is triggered N times and would return duplicates, but keeping the code below in the WHERE clause returns only distinct records:
where ora_hash(v_account_number,10)=${internal.step.copynr}
v_account_number is the primary key in my case.
The 10 is the number of copies minus one: say you have chosen 11 copies to start, then 11 - 1 = 10, so set it according to your own copy count.
Please note that this works, but I suggest using it on a local machine for testing purposes; on the server you will not face this issue, so comment the line out when deploying to servers.
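For intuition only, here is a small sketch (Python, purely illustrative; this is not Kettle code, and md5 is just a stand-in for ora_hash) of why hashing the key into copies - 1 buckets and comparing the bucket with the copy number splits the rows disjointly across the N copies, so no duplicates are produced:

import hashlib

def bucket(key, n_copies):
    # Stand-in for ora_hash(key, n_copies - 1): a stable hash mapped to 0..n_copies-1.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % n_copies

account_numbers = range(1, 21)   # pretend primary keys
n_copies = 11                    # "Change Number Of Copies to Start" = 11 -> max bucket = 10

for copynr in range(n_copies):
    mine = [k for k in account_numbers if bucket(k, n_copies) == copynr]
    print(f"copy {copynr} reads rows: {mine}")

Every key lands in exactly one bucket, so each of the N parallel copies reads a disjoint slice of the table and the union of the slices is the full table.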

Sporadic long query execution in Oracle RDS

We are experiencing sporadically long query executions in our application. The database is Oracle 12.1 RDS. I can see in AppDynamics that a query executed for 13s, yet when I run it myself in Oracle SQL Developer it never takes longer than 0.1s. I can't post the queries here, as there are 3 of them that sporadically take longer than 10s, and I can't reproduce the behavior in SQL Developer for any of them.
We've started logging the execution plan for long-running queries using /*+ gather_plan_statistics */, and it is the same as when the query executes in 0.1s, except that it doesn't contain the note "1 SQL Plan Directive used for this statement".
I'm looking for any ideas that could help identify the root cause of this behavior.
One possibility is that you've got a cached execution plan which works fine for most parameter values, or combinations of parameter values, but which fails badly for certain values/combinations. You can try adding a non-filtering predicate such as 1 = 1 to your WHERE clause. I've read, but haven't tested, that this can be used to force a hard parse, though it may be that you need to change the value (e.g. 1 = 1, then 2 = 2, then 3 = 3, etc.) for each execution of your query.
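As a rough sketch of that idea (assuming a Python client with cx_Oracle; the orders table, account_id column and connection details are placeholders, not taken from the question): varying the no-op predicate changes the SQL text, so Oracle cannot reuse the cached cursor and must hard-parse.

import itertools
import cx_Oracle  # any client works; the trick is in the SQL text, not the driver

conn = cx_Oracle.connect("scott", "tiger", "db-host/ORCLPDB1")
counter = itertools.count(1)

def run_with_hard_parse(cur, acct_id):
    n = next(counter)
    # "n = n" filters nothing, but a new literal on each call makes the statement
    # text unique, defeating the shared (possibly bad) cached execution plan.
    sql = ("SELECT /* force hard parse */ * FROM orders "
           f"WHERE account_id = :acct_id AND {n} = {n}")
    cur.execute(sql, acct_id=acct_id)
    return cur.fetchall()

with conn.cursor() as cur:
    rows = run_with_hard_parse(cur, 42)

The obvious downside is extra parse overhead and shared-pool churn, so treat this as a diagnostic workaround rather than a fix.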

Oracle database performance issue

We are trying to pull data from an Oracle database but seem to be getting very low performance.
We have a table of around 10M rows and we have an index via which we are pulling around 1.3k rows {select * from tab where indexed_field = 'value'} (in a simplified form).
SQuirreL reports the query taking "execution: 0.182s, building output: 28.921s". The returned data occupies something like 340kB (eg, when copied/pasted into a text file).
Sometimes the building output phase takes much longer (>5 minutes), particularly the first time a query is run. Repeating it seems to run much faster - eg the 29s value above. Is this likely to just be the result of a transient overload on the database, or might it be due to buffering of the repeated data?
Is a second per 50 rows (13kB) a reasonable figure or is this unexpectedly large? (This is unlikely to be a network issue.)
Is it possible that the DBMS is failing to leverage the fact that the data could be grouped physically (by having the physical order the same as the index order) and is doing a separate disk read per row, and if so, how can it be persuaded to be more efficient?
There isn't much odd about the data - 22 columns per row, mostly defined as varchar2(250) though usually containing a few tens of chars. I'm not sure how big the ironware running Oracle is, but it lives in a datacentre so probably not too puny.
Any thoughts gratefully received.
kfinity> Have you tried setting your fetch size larger, like 500 or so?
That's the one! Speeds it up by an order of magnitude. 1.3k rows in 2.5s, 9.5k rows in 19s. Thanks for that suggestion.
BTW, doing select 1 only provides a speedup of about 10%, which I guess suggests that disk access wasn't the bottleneck.
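For reference, here is what the same fix looks like at the driver level in a minimal Python/cx_Oracle sketch (an assumption on my part; SQuirreL itself exposes this through its JDBC fetch-size setting, and the connection details and query below are placeholders):

import cx_Oracle

conn = cx_Oracle.connect("user", "password", "db-host/SERVICE")
cur = conn.cursor()

# Rows are fetched from the server in batches of `arraysize`; raising it from the
# default means far fewer network round trips per result set, at the cost of a
# larger client-side buffer. 500 is reasonable for rows of a few hundred bytes.
cur.arraysize = 500

cur.execute("select * from tab where indexed_field = :v", v="value")
rows = cur.fetchall()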
others>
The fetch plan is:
Id  Operation         Options                 Object   Mode      Cost  Bytes  Cardinality
0   SELECT STATEMENT                                   ALL_ROWS  6     17544  86
1   TABLE ACCESS      BY INDEX ROWID BATCHED  TAB      ANALYZED  6     17544  86
2   INDEX             RANGE SCAN              TAB_IDX  ANALYZED  3            86
which, with my limited understanding, looks OK.
The "sho parameter" things didn't work (SQL errors), apart from the select which gave:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE 12.1.0.2.0 Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production
I guess the only outstanding question is "what's the downside of setting the fetch size to a large value?". Given that we will always end up reading the entire result set (unless there is an exception) my guess would be "not much". Is that right?
Anyway, many thanks to those who responded and a big thanks for the solution.
1.3k rows out of a table of 10M rows is not too big for Oracle.
The reason the second run is faster than the first is that Oracle loads the data into RAM on the first query and just reads it from RAM on the second.
Are you sure the index is being used well? Maybe you can run an explain plan and show us the result?
A few immediate actions to take are:
Rebuild the index on the table.
Gather the stats on the table.
Execute the following before rerunning the query to extract the execution plan:
sql> set autotrace traceonly enable ;
Turn this off with:
sql> set autotrace off ;
Also, provide the result of the following:
sql> sho parameter SGA
sql> sho parameter cursor
sql> select banner from v$version;
Abhi

Update Oracle statistics while query is running to improve performance?

I've got an Oracle INSERT query that has been running for almost 24 hours.
The SELECT part of the statement had a cost of 211M.
Now I've updated the statistics on the source tables, and the cost has come down significantly, to 2M.
Should I stop and restart my INSERT statement, or will the new updated statistics automatically have an effect and start speeding up the performance?
I'm using Oracle 11g.
Should I stop and restart my INSERT statement, or will the new updated statistics automatically have an effect and start speeding up the performance?
The new statistics will be used the next time Oracle parses the statement.
So the optimizer cannot update the execution plan based on statistics gathered while the statement is running, since the query has already been parsed and its execution plan has already been chosen.
What you can expect from the 12c optimizer is adaptive dynamic execution: it has the ability to adapt the plan at run time based on actual execution statistics. You can read more about it here: http://docs.oracle.com/database/121/TGSQL/tgsql_optcncpt.htm
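To make "the next time Oracle parses the statement" concrete, here is a hedged sketch (Python/cx_Oracle; the schema and table names are placeholders) of gathering stats with no_invalidate => FALSE, which invalidates dependent cursors immediately so that a restarted INSERT is re-parsed with the new statistics. It still does nothing for the statement that is already running.

import cx_Oracle

conn = cx_Oracle.connect("scott", "tiger", "db-host/ORCLPDB1")
with conn.cursor() as cur:
    # no_invalidate => FALSE invalidates cursors depending on the table right away;
    # the default (AUTO_INVALIDATE) may keep the old plan around for a while.
    cur.execute("""
        BEGIN
          DBMS_STATS.GATHER_TABLE_STATS(
            ownname       => 'SCOTT',
            tabname       => 'SOURCE_TABLE',
            cascade       => TRUE,
            no_invalidate => FALSE);
        END;""")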

SQLAlchemy raw sql vs expression language statements

When inserting multiple rows into a MySQL DB via an SQLA Expression Language statement, e.g.
Foo.__table__.insert().execute([{'bar': 1}, {'bar': 2}, {'bar': 3}])
it's extremely slow compared to the execution of a "raw" SQL statement for the same task, i.e.
engine.execute("insert into foo (bar) values (1),(2),(3)")
What is the reason for this? Can't SQLA generate a single bulk INSERT statement, and is it instead executing multiple inserts? Due to the speed limits of the ORM, I need a fast way to add several thousand rows at once, but the SQLA Expression Language version is too slow. So, do I need to write the raw SQL myself? The documentation isn't too clear about this.
I ran a speed test with the ORM insert, the ORM with preassigned PK and the SQLA bulk insert (see SQLA bulk insert speed) like this (https://gist.github.com/3341940):
SqlAlchemy ORM: Total time for 500 records 9.61418914795 secs
SqlAlchemy ORM pk given: Total time for 500 records 9.56391906738 secs
SqlAlchemy Core: Total time for 500 records 9.5362598896 secs
SQLAlchemy RAW String Execution: Total time for 500 records 1.233677 secs
As you can see, there is practically no difference between the first three versions. Only the execution of a raw string insert, where all the records are included in the raw SQL statement, is significantly faster. Thus, for fast inserts, SQLA seems sub-optimal.
It seems that the special INSERT with multiple VALUES only recently became supported (in the then-unreleased 0.8); you can see the note at the bottom of this section regarding the difference between executemany (which is what execute with a list does) and a multi-VALUES INSERT:
http://docs.sqlalchemy.org/ru/latest/core/expression_api.html#sqlalchemy.sql.expression.Insert.values
This should explain the performance difference you see. You could try installing the development version and repeating the tests with the altered calling syntax, mentioned in the link, to confirm.
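For illustration, a minimal sketch of the two calling styles being compared, assuming a toy foo table like the question's (the engine URL is a placeholder, and the multi-VALUES form needs SQLAlchemy 0.8 or later):

from sqlalchemy import Column, Integer, MetaData, Table, create_engine

engine = create_engine("mysql+pymysql://user:pass@localhost/test")  # placeholder URL
metadata = MetaData()
foo = Table("foo", metadata,
            Column("id", Integer, primary_key=True),
            Column("bar", Integer))
metadata.create_all(engine)

rows = [{"bar": i} for i in range(3000)]

with engine.begin() as conn:
    # Style 1: DBAPI executemany -- one statement, many parameter sets; some
    # drivers run this as one INSERT per row, which is the slow case above.
    conn.execute(foo.insert(), rows)

    # Style 2 (0.8+): a single multi-VALUES INSERT, close to the hand-written
    # "insert into foo (bar) values (1),(2),(3)" that was measured as fastest.
    conn.execute(foo.insert().values(rows))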

Resources