IDS 9.04 on Unix.
I have a table with 200,000+ rows, and each row has 200+ columns.
When I execute a query on this table (expected to return 470+ rows with 50 columns), it takes 100+ seconds to return, and DbVisualizer tells me:
execution time: 4.87 secs
fetch time: 97.56 secs
If I export all 470+ rows to a file, the file size is less than 800 KB.
UPDATE STATISTICS has been run, only 50 columns are selected, and no BLOBs are involved. If I select only the first 100 rows, it takes just 5 seconds to return.
Please help!
If SELECT FIRST 100 only takes a few seconds, it suggests that the query plan for FIRST_ROWS is dramatically different from the one for ALL_ROWS.
Try running the query with SET EXPLAIN ON, both with and without the FIRST n; it might give you a clue about what's going on.
Use:
set explain on avoid_execute;
YOUR_QUERY
set explain off;
Then review the sqexplain.out file in your current directory.
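For example (a sketch; the column, table, and WHERE clause here are placeholders, not from the question):
set explain on avoid_execute;
select first 100 col1, col2 from big_table where ...;  -- FIRST_ROWS-style plan
select col1, col2 from big_table where ...;            -- ALL_ROWS-style plan
set explain off;
With AVOID_EXECUTE the statements are only optimized, not run, so both plans land in sqexplain.out without waiting for the slow fetch.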
Related
Toad by default fetches only 500 rows. I have a query returning 100,000 rows. How can I determine the elapsed time for the query? When I press Ctrl+End to fetch all records, it still displays only the time taken to fetch the first 500 rows.
One option is to run
select * from your_table order by 1;
as it requires Oracle to fetch all of the data in order to sort it. You'll get some overhead (because of the sort, of course), but it will still be way faster than the Ctrl+End option.
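If you also have SQL*Plus available, another option (a sketch, not part of the original answer; it assumes the PLUSTRACE role is set up for the statistics output) is to let it fetch every row without displaying any of them:
set timing on
set autotrace traceonly
select * from your_table;
set autotrace off
The elapsed time reported by SET TIMING ON then covers the full fetch, without the overhead of rendering 100,000 rows on screen and without the extra sort.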
I have a Spoon ETL transformation that reads a table from Postgres and writes into Oracle.
No transformation, no sort. SELECT col1, col2, ... col33 from table.
350 000 rows in input. The performance is 40-50 rec/sec.
If I read/write the same table from Postgres to Postgres with ALL the columns (col1...col100), I get 4-5,000 rec/sec.
The same if I read/write from Oracle to Oracle: 4-5,000 rec/sec.
So, for me, it is not a network problem.
If I try with another Postgres table and only 7 columns, the performance is good.
Thanks for the help.
The same happened in my case: while loading data from Oracle and running the transformation on my local machine (Windows), the processing rate was 40 rows/sec, but it was 3,000 rows/sec for a Vertica database.
I couldn't figure out the exact cause, but I found a way to increase the throughput, and it worked for me. You can do the same:
Right-click on the Table Input step and you will see "Change Number Of Copies to Start".
Also include the condition below in the WHERE clause; this is to avoid duplicates. When you choose "Change Number Of Copies to Start", the query is triggered N times and would return duplicates, but keeping the code below in the WHERE clause makes each copy read a distinct subset of the records:
where ora_hash(v_account_number,10)=${internal.step.copynr}
v_account_number is the primary key in my case.
The 10 comes from the number of copies: if, say, you have chosen 11 copies to start, then 11 - 1 = 10, so set it according to your own copy count.
Please note that this works, but I suggest you use it only on a local machine for testing purposes; on the server you will most likely not face this issue, so comment out that line when deploying to servers.
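For reference, a sketch of how the full Table Input query might look (the table name accounts is a placeholder; the key column and the variable come from the answer above, and variable substitution must be enabled on the step):
SELECT *
FROM accounts
WHERE ora_hash(v_account_number, 10) = ${internal.step.copynr}
Each of the 11 copies then reads only the rows whose key hashes to its own copy number, so no row is read twice.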
I'm running queries against a Vertica table with close to 500 columns and only 100 000 rows.
A simple query (like select avg(col1) from mytable) takes 10 seconds, as reported by the Vertica vsql client with the \timing command.
But when checking the query_requests.request_duration_ms column for this query, there's no sign of the 10 seconds: it reports less than 100 milliseconds.
The query_requests.start_timestamp column indicates that processing began 10 seconds after I actually executed the command.
The resource_acquisitions table shows no delay in resource acquisition, but its queue_entry_timestamp column also shows that the queue entry occurred 10 seconds after I actually executed the command.
The same query run on the same data but on a table with only one column returns immediately. And since I'm running the queries directly on a Vertica node, I'm excluding any network latency issue.
It feels like Vertica is doing something before executing the query, that this step takes most of the time, and that it is related to the number of columns in the table. Any idea what it could be, and what I could try to fix it?
I'm using Vertica 8, in a test environment with no load.
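For reference, the check against the system tables looks roughly like this (a sketch; the filter on the query text is just an example):
SELECT start_timestamp, request_duration_ms, request
FROM v_monitor.query_requests
WHERE request ILIKE '%avg(col1)%'
ORDER BY start_timestamp DESC
LIMIT 5;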
I was running Vertica 8.1.0-1; it turns out the issue was caused by a Vertica bug in the query planning phase that degraded performance. It was fixed in versions >= 8.1.1:
https://my.vertica.com/docs/ReleaseNotes/8.1./Vertica_8.1.x_Release_Notes.htm
VER-53602 - Optimizer - This fix improves complex query performance during the query planning phase.
I am getting started with Amazon Redshift.
I have just loaded a big table: millions of rows and 171 fields. The data quality is poor, and there are a lot of characters that must be removed.
I have prepared an UPDATE for every column; since Redshift stores data by column, I assumed it would be faster to update column by column.
UPDATE MyTable SET Field1 = REPLACE(Field1, '~', '');
UPDATE MyTable SET Field2 = REPLACE(Field2, '~', '');
.
.
.
UPDATE MyTable set FieldN = Replace(FieldN, '~', '');
The first UPDATE took 1 minute. The second one took 1 minute and 40 seconds...
Every time I run one of the updates, it takes longer than the previous one. I have run 19 of them, and the last one took almost 25 minutes: the time consumed by each UPDATE keeps increasing.
Another thing: with the first update, CPU utilization was minimal, but with the last update it reached 100%.
I have a 3-node cluster of dc1.large instances.
I have rebooted the cluster, but the problem continues.
Please, I need some guidance to find the cause of this problem.
When you update a column, Redshift actually deletes the affected rows and inserts new rows with the new value, so there is a lot of space that needs to be reclaimed. You need to VACUUM your table after the updates.
The documentation also recommends that you run ANALYZE after each update to refresh the statistics used by the query planner.
http://docs.aws.amazon.com/redshift/latest/dg/r_UPDATE.html
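For example, after the whole batch of updates on the table from the question:
-- reclaim the space left behind by the updated (i.e. deleted and re-inserted) rows
VACUUM MyTable;
-- refresh the statistics used by the query planner
ANALYZE MyTable;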
A more optimal way might be
Create another identical table.
Read N (say 10,000) rows at a time from the first table, process them, and load them into the second table using S3 loading (instead of INSERT).
Delete the first table and rename the second table.
If you are running into space issues, delete the N migrated rows from the first table after every iteration and run VACUUM DELETE ONLY <name_of_first_table>.
References
S3 loading: http://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-run-copy.html
copy table from 's3://<your-bucket-name>/load/key_prefix' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' options;
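Putting the steps together, a rough sketch (MyTable is the table from the question; MyTable_clean is a hypothetical name for the copy):
-- 1. create an identical, empty table
CREATE TABLE MyTable_clean (LIKE MyTable);
-- 2. for each batch of N rows: clean the data outside Redshift and load it into
--    MyTable_clean with the COPY command shown above; if space is tight, delete the
--    migrated batch from MyTable and run
VACUUM DELETE ONLY MyTable;
-- 3. swap the tables
DROP TABLE MyTable;
ALTER TABLE MyTable_clean RENAME TO MyTable;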
I have a dataset that has about 1,100,000 rows.
When I load this into my jqGrid, SQL Profiler tells me it takes 29.7 seconds just to return the count of records and then a further 29.8 seconds to return the data to display in the grid.
Below is the SQL that performs the row count against my SQL Server table.
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[vw_ProductSearch_FULL] AS [Extent1]
) AS [GroupBy1]
Can anyone suggest how to improve the performance of this "count" query that is generated by jqGrid?
We need more information about your database in order to recommend improvements to your query. But as Oleg said, you may not need to query for the count.
As to the data in the grid, you have seen that having ~1 million rows in the grid just does not work well. I suggest you use either Pagination or True Scrolling Rows to load only a small subset of the rows at any given time. This should bring performance back to an acceptable level.
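If you go with pagination, the data query only has to return one page at a time. A sketch of a server-side paging query against the view from the question (SQL Server 2012 or later; the sort column and the @PageNumber / @PageSize parameters are placeholders supplied by the grid):
SELECT *
FROM dbo.vw_ProductSearch_FULL
ORDER BY ProductID  -- placeholder sort column; paging needs a deterministic order
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;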