Oracle 12c full text index maintainence - oracle

We are planning to use a context index for full text search in Oracle 12c standard edition.
The data on which search will run is a JSON containing one Channel post and its replies from a 3rd party tool that is loaded into our database.(basically, all the chats and replies(including other attributes like timestamp/user etc) are stored in this table).
We are expecting about 50k rows of data per year and a daily of 100-150 DMLs per day. Our index is "SYNC ON COMMIT" currently,so what are the recommendations for optimizing the Oracle Text index?

First, let me preface my response with a disclaimer: I am exploring using Oracle Text as part of a POC currently, and my knowledge is somewhat limited as we're still in the research phase. Additionally, our datasets are in the 10s of millions with 100k DML operations daily.
From what I've read, Oracle Docs suggest scheduling both a FULL and REBUILD optimization for indexes which incur DML, so I currently have implemented the following in our dev environment.
execute ctx_ddl.optimize_index('channel_json_ctx_idx', 'FULL'); --run daily
execute ctx_ddl.optimize_index('channel_json_ctx_idx', 'REBUILD'); --run weekly
I cannot imagine with the dataset you've identified that your index will really become that fragmented and cause performance issues. You could probably get away with less frequent optimizations than what I've mentioned.
You could even forego scheduling the optimization and benchmark your performance. If you see it start to degrade, then note the timespan and perhaps count of DML operations for reference. Then run a 'FULL'. Then test performance. If performance improves, create a schedule. If performance does not improve, then run 'REBUILD'. Test performance. Assuming performance improves then you could schedule the 'REBUILD'for that time range and consider adding a more frequent 'FULL'.

Related

Monitor on all indexes

I'm using Oracle 11g.
A lot of times I need to check in
stats$
tables and snapshots tables if an index has used or no.
My question is - are there any disadvantages or problems to use
MONITORING USAGE
for all of my indexes? Just put them all inside.
Maybe build a procedure that perform it on every new index?
Thanks a lot.
Index monitoring has serious issues pre-12.2:
1) A boolean flag generally isn't enough information to make a reasonable determination on whether to keep an index or not. Was that index used 1 time because a developer forced an index using a hint or was the index called hundreds of times?
2) The check happens at parse phase, not execute phase. (See the previous point why this is an issue).
3) There is a performance impact. While that performance impact is small, especially if you are only turning it on for a single index, that impact will be more significant if you try turning it on for all indexes, system wide.
Index Monitoring is designed to be turned on for a single index (or small group of indexes), wait some reasonable time, then turn it off and check the stats. It isn't meant to just be on all the time.
In 12.2, Index Monitoring was completely overhauled so that it is on by default for all indexes (I'm pretty sure you can't even turn it off). Oracle has largely solved most of the issues with index monitoring in previous Oracle versions: the performance impact is insignificant, more meaningful stats (an actual count of number of times used), and the stats are updated on execute phase, not parse phase.
Tim Hall has a good write up on Index Monitoring here. Connor McDonald has an excellent YouTube video on why Index Monitoring had issues pre-12.2, and what Oracle has done to address these issues here (start watching from 19:15 - 27:05).

Oracle SQL table and index statistics

I've been doing some reading on gathering table and index statistics on Oracle databases but it's left me ... confused.
For the sake of argument, let's assume Oracle 11gR2 as the RDBMS. Regarding gathering table and index statistics, when should it be done, which is the preferred way of doing it, and does Oracle really automatically gather the necessary statistics for us?
Regarding the first point: when should it be done. I've read that, as a rule of thumb, gathering table and index statistics should be done after around 10% of the table's records have been modified (inserted, updated, etc) since the last time the table was analyzed.
Regarding the second point: which is the preferred way of doing it. If we want to calculate both table and index statistics, does executing DBMS_STATS.GATHER_TABLE_STATS with default options, assuming the table is not partitioned, suffice?
Regarding the third point:does Oracle really gather the necessary statistics automatically for us. If this is the case, should i not worry abouth gathering table statistics (see points 1 and 2)?
Thanks in advance.
EDIT: Following the comment by ammoQ, i realized that the question is not clear in what the use case really is, here. My question is about tables that aren' "manipulated" via a user's actions, i.e manually, rather via procedures typically ran by database jobs. Take my example, for instance. My ETL process loads several tables on a daily basis and it does so in approximately 1 hour. Of that 1 hour, about half is spent analyzing the tables themselves. Thus, the tables area analyzed daily, following insertions or updates. This seems overkill, hence the question.
In general, you need to have statistics that are representative (not necessarily accurate) and that give you the right execution plan. By default, Oracle will run a statistics collection job, during the nightly batch window. That may be fine for some applications, but if you have a data warehouse, which presumably includes a regular data load process, then managing the stats should be part of that process. Note that I have said "managing" and not "collecting" statistics. That's just my way of saying that there are other options for statistics in addition to just gathering statistics, although gathering statistics would be where I would start.
There are also things that can be done to optimize statistics gathering, incremental statistics for example. The other thing that is very important is is to use the AUTO Sample size when gathering stats. Do not specify a percentage, not even 100%. The reason is that auto sample size enables a number of internal optimizations and capabilities that are disabled if you do not use AUTO sample size.
So, taking your specific points
10% staleness is pretty random, and is just a number used by the auto stats.
dbms_stats.gather_table_stats() with default values is the preferred method. One parameter that I may change would be the DEGREE, to enable stats gathering in parallel
In 12c, basic stats are gathered on load into an empty table (or empty partition). Stats are built on indexes when indexes are created. So to reiterate what I said above, stats gathering should be part of your ELT process.
I hope that makes sense and helps.

Building an sql execution plan history

Using Oracle 11.2.
Is there a standard approach to build an execution plan history like how an execution plan may change ofer time? I thought about tracking changes in dba_hist_sql_plan (plans are kept there for 10 days in our env.). But are all plans stored in dba_hist_sql_plan or does that depent on the awr config? Or is there a standard approach for doing that (may without dba_hist_sql_plan).
I'm rather new to Oracle so every idea is welcomed.
Not all plans are stored in the AWR. Generally, though, all the plans that you're interested in are. Few people are interested in tracking changes to query plans for queries that don't consume a lot of resources in a particular snapshot window. They're generally only interested in the behavior of the top resource consumers. You can adjust the N in the top N queries that AWR captures with each snapshot.
If you're looking at a single database, the simplest option is to keep data in the AWR as long as you're likely to need it. If you want data back beyond 10 days, the simplest option is to increase your AWR retention. AWR data doesn't consume that much disk space so it's generally easier to let Oracle retain more data than to write anything of your own.
Sometimes, people configure replication processes that take AWR data from many individual databases and write them to a single central database. That requires a decent amount of setup but it makes it much easier to compare information across databases (either to compare data among dev/ test/ staging/ production environments or to compare across multiple client-specific instances).

How do I correctly performance test SELECT queries with Oracle?

I would like to test two queries to find out their performance as apposed to just looking at the execution plan. I have seen Tom Kyte do this all the time on his website as a way to gather evidence on his theories.
I believe there are many pitfalls in performance testing, for example, when i run a query in SQL developer for the first time, that query might return some fair number. Running that exact same query again, returns instantaneously. There must be some sort of caching on the server or client going on and I understand this is important - however I am only interested in non cached performance.
What are the guidelines to performance test? AND how do I write a performance test which repeats the query? Do i just write an anonymous block & loop? How do i get timing information, averages, medians, std deviations?
Oracle (and other databases) cache queries, which is where you see the behavior you describe. A "hard" parse means there's no query plan for the query, which leaves Oracle to figure out the query plan based on indexes and statistics. A "soft" parse is what happens when you run the identical query afterwards, and receive an instantaneous result, because the query plan exists & Oracle re-uses it. See the Ask Tom question about it for more details.
Be aware of the EXPLAIN output:
With the cost-based optimizer, execution plans can and do change as the underlying costs change. EXPLAIN PLAN output shows how Oracle runs the SQL statement when the statement was explained. This can differ from the plan during actual execution for a SQL statement, because of differences in the execution environment and explain plan environment.
Focusing on the non-cached performance gives a worst-case scenario, but given that caching will occur - non-cached benchmarks aren't realistic in everyday use.
To build off OMG Ponies answer, tuning based on timing is something that's possible, but not realistic. You'd have to start either with a fully-cached buffer cache in every case, or a fully-empty buffer cache, and neither of those is going to be representative of reality - especially if there's no competing load.
When I'm tuning, it's generally against a live system with activity, and I focus on tuning logical I/Os, either through using the extended SQL trace (dbms_monitor.session_trace_enable / dbms_monitor.session_trace_disable) and the tkprof utility, or using SQL*Plus and set autotrace traceonly - which does all the work of the query, but throws the output away, because I'm usually not interested in watching a jillion rows scroll by.
The exact mechanism usually involves bound SQL, using something like the following:
variable :my_bind1 number;
variable :my_bind2 varchar2(30);
begin
:my_bind1 := 42;
:my_bind2 := 'some meaningful string';
end;
/
set timing on;
set autotrace traceonly;
[godawful query with binds]
set autotrace off;
Within the results, I'm looking for the plan I'd expect, a comparative value for sorts - assuming any exist - and most importantly, the number of consistent I/Os. That's how many blocks Oracle had to read in consistent mode to satisfy the query. I can't find the original source of the quote, but I think it's Cary Milsap of Method R.
"Tune your logical I/Os, and your physical I/Os will follow."
In performance tuning, if the only piece of data you look at is wall-clock time, you will only be getting a small part of the whole picture. You need to at least look at the execution plan, as well as IO stats, in order to work out how best to tune the query.
Also, you need to eliminate other causes of performance issues - e.g. if there is a general performance issue across many queries, it might not be the fault of just one of them - it might be an architecture problem, or significant concurrent activity on the database, or even an underlying hardware issue.
I've had similar issues to what you describe before; e.g. a certain type of query which should be very fast was taking 30 seconds to run on the first time, then would settle down to a second or two. As soon as I looked at the execution plan, however, it was obvious that it was using a full table scan, because it couldn't use the unique index that had been created. The first time the query ran, most of the data was loaded into the cache (in fact, there were two levels of cache involved - the database buffer cache, as well as a storage-level cache over the disks) so subsequent full table scans were extremely fast.
What is correctly ?
Since 11g there are a few extra complications to take into account. The optimizer pre peeking has become a lot smarter and sql plan stability has a BIG influence. These two features make the database auto tuning but can also have unexpected effects during performance tests, for example because not all variations of the plans are known and accepted at the beginning of the tests.
This might be the cause that a second test run, the day after the first run, suddenly runs much quicker, without any apparent changes.
Since 11g performance testing is less important, compared to writing logically correct code. For example a Cartesian product and filtering out one distinct value van be functional correct but is in most of the cases wrong code because it fetches more data than logically needed.
If the queries fetches the data that is really needed and is in the correct control structure, have the database processes tune the code during the maintenance windows. In many cases the differences between the test environment and production are such that a comparison can not be safely made.
Don't get me wrong, testing is important but mostly for the logic compared to performance testing before 11g, there are extra steps to be taken.
For nice reading see Oracle® Database 2 Day + Performance Tuning Guide 11g Release 2 (11.2)

Does Oracle 11g automatically index fields frequently used for full table scans?

I have an app using an Oracle 11g database. I have a fairly large table (~50k rows) which I query thus:
SELECT omg, ponies FROM table WHERE x = 4
Field x was not indexed, I discovered. This query happens a lot, but the thing is that the performance wasn't too bad. Adding an index on x did make the queries approximately twice as fast, which is far less than I expected. On, say, MySQL, it would've made the query ten times faster, at the very least. (Edit: I did test this on MySQL, and there saw a huge difference.)
I'm suspecting Oracle adds some kind of automatic index when it detects that I query a non-indexed field often. Am I correct? I can find nothing even implying this in the docs.
As has already been indicated, Oracle11g does NOT dynamically build indexes based on prior experience. It is certainly possible and indeed happens often that adding an index under the right conditions will produce the order of magnitude improvement you note.
But as has also already been noted, 50K (seemingly short?) rows is nothing to Oracle. The Oracle database in fact has a great deal of intelligence that allows it to scan data without indexes most efficiently. Every new release of the Oracle RDBMS gets better at moving large amounts of data. I would suggest to you that the reason Oracle was so close to its "best" timing even without the index as compared to MySQL is that Oracle is just a more intelligent database under the covers.
However, the Oracle RDBMS does have many features that touch upon the subject area you have opened. For example:
10g introduced a feature called AUTOMATIC SQL TUNING which is exposed via a gui known as the SQL TUNING ADVISOR. This feature is intended to analyze queries on its own, in depth and includes the ability to do WHAT-IF analysis of alternative query plans. This includes simulation of indexes which do not actually exist. However, this would not explain any performance differences you have seen because the feature needs to be turned on and it does not actually build any indexes, it only makes recommendations for the DBA to make indexes, among other things.
11g includes AUTOMATIC STATISTICS GATHERING which when enabled will automatically collect statistics on database objects as it deems necessary based on activity on those objects.
Thus the Oracle RDBMS is doing what you have suggested, dynamically altering its environment on its own based on its experience with your workload over time in order to improve performance. Creating indexes on the fly is just not one of the things is does yet. As an aside, this has been hinted to by Oracle in private sevearl times so I figure it is in the works for some future release.
Does Oracle 11g automatically index fields frequently used for full table scans?
No.
In regards the MySQL issue, what storage engine you use can make a difference.
"MyISAM relies on the operating system for caching reads and writes to the data rows while InnoDB does this within the engine itself"
Oracle will cache the table/data rows, so it won't need to hit the disk. depending on the OS and hardware, there's a chance that MySQL MyISAM had to physically read the data off the disk each time.
~50K rows, depending greatly on how big each row is, could conceivably be stored in under 1000 blocks, which could be quickly read into the buffer cache by a full table scan (FTS) in under 50 multi-block reads.
Adding appropriate index(es) will allow queries on the table to scale smoothly as the data volume and/or access frequency goes up.
"Adding an index on x did make the
queries approximately twice as fast,
which is far less than I expected. On,
say, MySQL, it would've made the query
ten times faster, at the very least."
How many distinct values of X are there? Are they clustered in one part of the table or spread evenly throughout it?
Indexes are not some voodoo device: they must obey the laws of physics.
edit
"Duplicates could appear, but as it
is, there are none."
If that column has neither a unique constraint nor a unique index the optimizer will choose an execution path on the basis that there could be duplicate values in that column. This is the value of declaring the data model as accuratley as possible: the provision of metadata to the optimizer. Keeping the statistics up to date is also very useful in this regard.
You should have a look at the estimated execution plan for your query, before and after the index has been created. (Also, make sure that the statistics are up-to-date on your table.) That will tell you what exactly is happening and why performance is what it is.
50k rows is not that big of a table, so I wouldn't be surprised if the performance was decent even without the index. Thus adding the index to equation can't really bring much improvement to query execution speed.

Resources