How do I use pg_stat_statements extension in greenplum open source version? - greenplum

I am trying to use a modified greenplum open source version for development. The greenplum version is Greenplum Database 6.0.0-beta.1 build dev (based on PostgreSQL 9.4.24).
I wanted to add pg_stat_statements extension to my database, and I did manage to install it on the database, following https://www.postgresql.org/docs/9.5/pgstatstatements.html. However, this extension doesn't work as expected. It only records non-plannable queries and utility queries. For all plannable queries which modify my tables, there is not a single record.
My question is, is pg_stat_statements compatible with greenplum? Since I am not using the official release, I would like to make sure the original one can work with pg_stat_statements. If so, how can I use it to track all sql queries in greeplum? Thanks.
Below is a example of not recording my select query.
postgres=# select pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------
(1 row)
postgres=# select query from pg_stat_statements;
query
------------------------------------
select pg_stat_statements_reset();
(1 row)
postgres=# select * from test;
id | num
----+-----
1 | 2
3 | 4
(2 rows)
postgres=# select query from pg_stat_statements;
query
------------------------------------
select pg_stat_statements_reset();
(1 row)

Here's what I get from #leskin-in in greenplum slack:
I suppose this is currently not possible.
When Greenplum executes "normal" queries, each of Greenplum segments acts as an independent PostgreSQL instance which executes a plan created by GPDB master.
pg_stat_statements tracks resources usage for a single PostgreSQL instance; as a result, in GPDB it is able to track the resources consumption for each segment independently.
There are several complications PostgreSQL pg_stat_statements does not deal with. An example is that GPDB uses slices. On GPDB segments, these are independent parts of plan trees and are executed as independent queries.
I suppose when a query to pg_stat_statemens is made on GPDB master in current version, the results retrieved are for master only. As "normal" queries are executed in most part by segments, the results are inconsistent with actual resources' consumption by the query.
In open-source Greenplum 5 & 6 there is a Greenplum-specific utility gpperfmon. It provides some of the pg_stat_statements features, and is cluster-aware (shows actual resources consumption for the cluster as a whole, and also several cluster-specific metrics).

Related

Auto Optimizer Stats Collection Job Causing Oracle RDS Database to Restart

We have a Oracle 19C database (19.0.0.0.ru-2021-04.rur-2021-04.r1) on AWS RDS which is hosted on an 4 CPU 32 GB RAM instance. The size of the database is not big (35 GB) and the PGA Aggregate Limit is 8GB & Target is 4GB. Whenever the scheduled internal Oracle Auto Optimizer Stats Collection Job (ORA$AT_OS_OPT_SY_nnn) runs then it consumes substantially high PGA memory (approx 7GB) and sometimes this makes database unstable and AWS loses communication with the RDS instance so it restarts the database.
We thought this may be linked to existing Oracle bug 30846782 (19C+: Fast/Excessive PGA growth when using DBMS_STATS.GATHER_TABLE_STATS) but Oracle & AWS had fixed it in the current 19C version we are using. There are no application level operations that consume this much PGA and the database restart have always happened when the Auto Optimizer Stats Collection Job was running. There are couple of more databases, which are on same version, where same pattern was observed and the database was restarted by AWS. We have disabled the job now on those databases to avoid further occurrence of this issue however we want to run this job as disabling it may cause old stats being available in the database.
Any pointers on how to tackle this issue?
I found the same issue in my AWS RDS Oracle 18c and 19c instances, even though I am not in the same patch level as you.
In my case, I applied this workaround and it worked.
SQL> alter system set "_fix_control"='20424684:OFF' scope=both;
However, before applying this change, I strongly suggest that you test it on your non production environments, and if you can, try to consult with Oracle Support. Dealing with hidden parameters might lead to unexpected side effects, so apply it at your own risk.
Instead of completely abandoning automatic statistics gathering, try find any specific objects that are causing the problem. If only a small number of tables are responsible for a large amount of statistics gathering, you can manually analyze those tables or change their preferences.
First, use the below SQL to see which objects are causing the most statistics gathering. According to the test case in bug 30846782, the problem seems to be only related to the number of times DBMS_STATS is called.
select *
from dba_optstat_operations
order by start_time desc;
In addition, you may be able to find specific SQL statements or sessions that generate a lot of PGA memory with the below query. (However, if the database restarts, it's possible that AWR won't save the recorded values.)
select username, event, sql_id, pga_allocated/1024/1024/1024 pga_allocated_gb, gv$active_session_history.*
from gv$active_session_history
join dba_users on gv$active_session_history.user_id = dba_users.user_id
where pga_allocated/1024/1024/1024 >= 1
order by sample_time desc;
If the problem is only related to a small number of tables with a large number of partitions, you can manually gather the stats on just that table in a separate session. Once the stats are gathered, the table won't be analyzed again until about 10% of the data is changed.
begin
dbms_stats.gather_table_stats(user, 'PGA_STATS_TEST');
end;
/
It's not uncommon for a database to spend a long time gathering statistics, but it is uncommon for a database to constantly analyze thousands of objects. Running into this bug implies there is something unusual about your database - are you constantly dropping and creating objects, or do you have a large number of objects that have 10% of their data modified every day? You may need to add a manual gather step to a few of your processes.
Turning off the automatic statistics job entirely will eventually cause many performance problems. Even if you can't add manual gathering steps, you may still want to keep the job enabled. For example, if tables are being analyzed too frequently, you may want to increase the table preference for the "STALE_PERCENT" threshold from 10% to 20%:
begin
dbms_stats.set_table_prefs
(
ownname => user,
tabname => 'PGA_STATS_TEST',
pname => 'STALE_PERCENT',
pvalue => '20'
);
end;
/

Performance balancing while querying in a remote database

We are working on 2 AIX 7 server and 2 Oracle databases 12.1.0.2.
1 database (called in this topic DB1) is our central PROD db.
The second database (called in this topic DB2) is a production DB too, but for used for a non critical application.
We want to isolate traitement (impact as less as possible DB1) executed on DB2 (with joins) from the central production database DB1.
These traitements uses DBLINK to read DB1 datas.
So the question is:
If we perform a query like
select col1, col2 from table1#dblink_DB1, table2#dblink_DB1 where JOIN DB1/DB2
On which server the JOIN treatment is executed?
Are only reads occurring on DB1 (so low performance case) and JOIN treatment is executed with SGA/CPU on DB2?
Or is everything executing on DB1?
Such queries (which can be executed fully remotely, without access to local database) usually work on the remote db link site and it's much better than if it work on local database, since in this case it would read leading table and run (Select * from table#dblink_DB1 where col=:a) so many times as a number of rows returned from table1#dblink_DB2. Of course, you can force it run locally using hint driving_site, but this case it would be far less effective for both databases. Read more about driving_site hint. And also you should now that dml statements (update/delete/merge/insert) work always on the database where you change data.

SP using table/index with volatile statistics that differ at compile and run time

I’m a longtime MSSQL developer who finds himself back in PL/SQL for the first time since Oracle 7. I’m looking for some tuning advice re a large export stored procedure, which is sporadically and not very reproducably running slow at certain points. This happens around some static working tables which it truncates, fills and uses as part of the export. The code in outline typically looks like this:
create or replace Procedure BigMultiPurposeExport as (
-- about 2000 lines of other code
INSERT WORK_TABLE_5 SELECT WHATEVER1 FROM WHEREVER1;
INSERT WORK_TABLE_5 SELECT WHATEVER2 FROM WHEREVER2;
INSERT WORK_TABLE_5 SELECT WHATEVER3 FROM WHEREVER3;
INSERT WORK_TABLE_5 SELECT WHATEVER4 FROM WHEREVER4;
-- WORK_TABLE_5 now has 0 to ~500k rows whose content can vary drastically from run to run
-- e.g. one hourly run exports 3 whale sightings, next exports all tourist visits to Kenya this decade
-- about 1000 lines of other code
INSERT OUTPUT_TABLE_3
SELECT THIS, THAT, THE_OTHER
FROM BUSINESS_TABLE_1 BT1
INNER JOIN BUSINESS_TABLE_2 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_3 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_1 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_2 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_3 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_5 WT5 ON BT1.ID = WT5.BT1_ID AND WT5.RECORD_TYPE = 21
-- join above is now supported by indexes on BUSINESS_TABLE_1 (ID) and WORK_TABLE_5 (BT1_ID, RECORD_TYPE), originally wasn't
LEFT OUTER JOIN WORK_TABLE_6 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_7 ON etc -- typical join on indexed columns
-- about 4000 lines of other code
)
That final insert into OUTPUT_TABLE_3 usually runs in under 10 seconds, but once in a while on certain customer servers it times out at our default 99 minutes. Then we have them take the tiemout off and run it on Friday night, and it finishes but takes 16 hours.
I narrowed the problem down to the join to WORK_TABLE_5, which had no index support, and put an index on the join terms. The next run took 4 seconds. But success has been intermittent, the customer occasionally gets some slow runs when they drastically change their export selection (i.e. drastically change the data in WORK_TABLE_5). And if we update statistics and rebuild indexes after a timed out export, it runs fine at the next attempt.
So, I am wondering about how best to handle truncating/filling static work tables with static indexes, statistics updated overnight, and a stored procedure compiled when the statistics are nothing like runtime.
I have a few general questions about things I'd like to understand better:
Is the nature of the data in the work table going to substantially effect the query plan? Does Oracle form its query plan when you compile the stored procedure? Could we get a highly inappropriate query plan if we compile the stored procedure with the table empty then use a table with 500k rows at runtime?
I expect that if this were an ad-hoc script then updating statistics on the problem table just before selecting from it would eliminate the sporadic slowdowns. But what if I were to update statistics inside the stored procedure, which is compiled with different statistics from runtime?
Anything else you'd like to add...
Thanks for any advice. I hope my MSSQL preconceptions haven't made me too far off base.
This is happening in Oracle 11g, but the code is deployed to assorted customers using Oracle 10 through 12 and I'd like to cater to all of those if possible.
-- Joel
Huge differences in table or index sizes can most definitely cause performance problems. The solution is to add statistics gathering to the procedure instead of relying on the default statistics jobs.
If you've been away from Oracle since version 7, the most important new feature is the Cost Based Optimizer. Oracle now builds query execution plans based on the optimizer statistics of tables, indexes, columns, expressions, system statistics, outlines, directives, dynamic sampling, etc. If you're a full time Oracle developer you should probably spend a day reading about optimizer statistics. Start with Managing Optimizer Statistics and DBMS_STATS in the official documentation.
Eventually the stored procedure should look like this:
--1: Insert into working tables.
insert into work_table...
--2: Gather statistics on working tables.
dbms_stats.gather_table_stats('SCHEMA_NAME', 'WORK_TABLE', ...);
--3: Use working tables.
insert into other_table select * from work_table...
There are so many statistics features it's hard to know exactly what parameters to use in that second step above. Here are some guesses about some features you might find useful:
DEGREE - One reason people avoid gathering statistics inside a process is the time is takes. You can significantly improve the run time by setting the degree. Although this also uses significantly more resources.
NO_INVALIDATE - It can be tricky to know when exactly are the statistics "set" for a query. Gathering statistics usually quickly invalidates execution plans that were based on old statistics. But not always. If you want to be 100% sure that the next query is using the latest statistics you want to set NO_INVALIDATE=>FALSE.
ESTIMATE_PERCENT In 11g and above you definitely want to use the default, which uses a faster algorithm. In 10g and below you may need to set the value to something low to make the gathering fast enough.
Although Oracle 10g and above comes with default statistics gathering jobs you cannot rely on them for a few reasons:
They are scheduled and may not run at the right time. If a process significantly changes the data then new stats are needed right away, not at 10 PM. If there are a lot of tables that need to be analyzed the job may not get to them all in one day.
Many DBAs disable the jobs. This is ridiculous and almost always a mistake. But you'll find many DBAs that disabled the job because they think they can do it better. Instead of working with the auto tasks, and settings preferences, many DBAs like to throw the whole thing out and replace it with a custom procedure that rots over time.

Read Oracle Cluster name from Oracle RAC using SQL query

I'd like to know what is my RAC cluster name using SQL query. I've found out that it can be retrieved using Oracle tool cemutlo -n or just ocrdump (see http://www.br8dba.com/tag/how-to-display-oracle-cluster-name/). However, it's not possible in this case, because on target environment, I can only execute SQL queries and I don't have access to DBMS installation directory.
I've found out (here https://community.oracle.com/thread/2510788?tstart=0) that it can be done using some unusual queries:
SELECT a.ID, a.CLUSTER_ID FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_OC('CLUS_OC_1_15',NULL,NULL,1,0,0)) a
select * from table(dbms_data_mining.get_model_details_km('CLUS_KM_1_25'))
However, they don't work on my environment and I'm unable to create new model.
Most preferably, I'd just read this from some kind of v$/gv$ tables - but I can't find it there. I guess that's because cluster is far below DBMS.
Finally, I found out that there is no way to do that :(.

SDO_JOIN table join on large tables no longer returns results after upgrade from Oracle 11.2.0.1 to 11.2.0.4

I have the same database on 2 different Oracle servers, one is 11.2.0.1.0 and the other is 11.2.0.4.0.
I have the same 2 geometry tables in both databases and run the following query on both servers. When run on an 11.2.0.1.0 version of Oracle, the query runs for a few minutes and I get results, the same query when run on 11.2.0.4.0 runs for about 3 seconds and returns no results.
The BLPUs table holds 36 million points and the PD_B2 table holds a polygon. I am trying to find all the points that fall in the polygon.
Other spatial queries do return rows but it takes hours and hours whereas the table join suggested in the Oracle Spatial documentation, takes 15 minutes to return all the points.
SELECT /*+ ordered */ a.uprn
FROM TABLE(SDO_JOIN('BLPUS', 'GEOLOC', 'PD_B2', 'GEOLOC','mask=ANYINTERACT')) c, blpus a, PD_B2 b
WHERE c.rowid1 = a.rowid
AND c.rowid2 = b.rowid;
The spatial queryies below return SDO_ROWIDSET() when run on the 11.2.0.4 server
select SDO_JOIN('BLPUS', 'GEOLOC', 'PD_B2', 'GEOLOC','mask=ANYINTERACT')
from dual;
select SDO_JOIN('BLPUS', 'GEOLOC', 'PD_B2', 'GEOLOC')
from dual;
On the 11.2.0.1 server they return results.
I have discovered that a much smaller table of points will work on 11.2.0.4 so it seems that there is a size limit on 11.2.0.4 when using SDO_JOIN where as 11.2.0.1 seems to cope with the large table.
Does anyone know why this is or if there is an actual limit on table size when using SDO_JOIN?
This is strange. I see no reason why SDO_JOIN will not work the same way in 11.2.0.4. At least I have not seen that sort of behavior before. It looks like a bug to me and I suggest you file a service request with Oracle Support so we can take a look. You may need to provide a dump of the tables - or at least of a small enough subset that demonstrates the problem.
That said there are a few things to check: did you apply the 11.2.0.4 patch on the same database ? I.e. nothing changed in terms of table structures or content, grants, etc ?
When you say that the query returns no rows, does it do so immediately ? Or does it perform some processing before completing without anything being returned ?
How large is the PD_B2 table ?
What happens when you do:
select SDO_JOIN('BLPUS', 'GEOLOC', 'PD_B2', 'GEOLOC','mask=ANYINTERACT')
from dual;
Does this also return nothing ?
What happens if you do
select SDO_JOIN('BLPUS', 'GEOLOC', 'PD_B2', 'GEOLOC')
from dual;
Both should return something that looks like this:
SDO_ROWIDSET(SDO_ROWIDPAIR('AAAW3LAAGAAAADmAAu', 'AAAW3TAAGAAAAg7AAC'), SDO_ROWIDPAIR('AAAW3LAAGAAAADmABE', 'AAAW3TAAGAAAAgrAAA'),...)
You will see this if you run the query in sqlplus. [If you use a GUI (like TOAD or SQLDeveloper) then you may not see that. All those GUIs have problems dealing with objects or arrays.]
But the fact that your 11.2.0.4 tests complete very quickly probably means that you get an empty result back, maybe like this:
SDO_ROWIDSET()
and that confirms that something did not work.
Now, from what you say PD_B2 only contains one row ? If so then there is no reason whatsoever to go via SDO_JOIN. A straightforward spatial query is easier to write. SDO_JOIN only makes sense when both tables being joined contain multiple rows. Then again, if one of the tables is very small (like the PD_B2 table in your case), then it anyway falls back to a simple query.
A simple query would look like this:
SELECT a.uprn
FROM blpus a, PD_B2 b
WHERE sdo_anyinteract (a.geoloc, b.geoloc) = 'TRUE';
Does this return what you expect in 11.2.0.4 ?
You may also want to examine the cost of the query and the query plan used. In sqlplus, do this:
set timing on
set autotrace traceonly
then run the above query. The effect is that sqlplus will not display any results - but it will still fetch and format the output: it will just not print them. At the end you will get a printout of the query plan used as well as some execution statistics and the elapsed time. Please add those results to your question.
Running the query from within your application should have a similar profile: similar response time and database-side costs - assuming you do fetch all the 1.3 million rows.
BTW, what do you do with those results ? Do you show them out as a report ? Do you save them into a table for later analysis ? Surely you do not want to show them all on a map ?
To clarify: SDO_JOIN is only for matching many to many geometries. For matching one to many, the simple SDO_ANYINTERACT() is what you need. As a matter of fact, SDO_JOIN will automatically fall back to a simple SDO_ANYINTERACT when one of the sets is very much smaller than the other. I can't see how SDO_JOIN could be faster in your circumstances (i.e. when searching for all objects that match one object) since it will perform the same as an SDO_ANYINTERACT and will require extra joins to the two tables.
Going back to the original issue - that of a difference in behavior in SDO_JOIN between 11.2.0.1 and 11.2.0.4 that IMO is definitely a bug that you need to report to Oracle Support.

Resources