Will reloading all TDE templates trigger reindexing and cause a MarkLogic performance issue? - performance

I am currently using the Gradle mlReloadSchemas task to reload TDE templates.
I suspect that even if only one TDE file has changed, the reload-schemas task deletes everything in the schemas database and loads all TDE templates back into the MarkLogic database.
I wonder whether this will cause a performance issue for MarkLogic. Will it trigger reindexing even for the TDE files that have not changed?
I am using a DevOps pipeline to trigger the schema reload from a Git repository, so I cannot load only the changed TDE file; I have to reload everything. If there is a performance issue, how can I load only the changed file with the pipeline?

Redeploying TDE templates can cause reindexing. How many records get reindexed depends on the context matching for those templates.
A properly resourced cluster should be able to handle the load of reindexing.
That said, the merge activity can compete with online traffic and query demands. You can help minimize the impact by setting the reindexer throttle to a lower level (1-5, with 1 being the lowest), and you can set a background-io limit to restrict the amount of I/O any node will use for background activities such as merges and backups.
You can also choose when to enable or disable reindexing, and adjust the reindexing level up or down at different periods.
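These settings can be changed in the Admin UI or scripted against the Management REST API so the pipeline can dial them down before a deploy and back up afterwards. Below is a minimal sketch using Python requests; the host, credentials, database name "my-content-db", and group name "Default" are assumptions to replace with your own values.
import requests
from requests.auth import HTTPDigestAuth

BASE = "http://localhost:8002/manage/v2"   # Management API, port 8002 (assumed host)
AUTH = HTTPDigestAuth("admin", "admin")    # assumed credentials

# Lower the reindexer throttle on the content database (1 = lowest, 5 = highest).
r = requests.put(f"{BASE}/databases/my-content-db/properties",
                 json={"reindexer-throttle": 2}, auth=AUTH)
r.raise_for_status()

# Cap background I/O (merges, backups, reindexing) per host at the group level, in MB/sec.
r = requests.put(f"{BASE}/groups/Default/properties",
                 json={"background-io-limit": 100}, auth=AUTH)
r.raise_for_status()

# Reindexing can also be disabled entirely and re-enabled during a quiet window.
r = requests.put(f"{BASE}/databases/my-content-db/properties",
                 json={"reindexer-enable": False}, auth=AUTH)
r.raise_for_status()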
https://help.marklogic.com/Knowledgebase/Article/View/how-reindexing-works-and-its-impact-on-performance
https://help.marklogic.com/Knowledgebase/Article/View/indexing-best-practices

Related

Azure cognitive search indexer blob storage

I am stuck in a complicated situation and would appreciate it if somebody could help.
I was testing indexing of blob storage (PDF files) and indexed a copy of my storage in the QA environment, which cost me some money.
My question is:
Is there any way to use this index in production without indexing again?
I found a way to copy the index and that works fine, but when I add an indexer connected to the production blob storage, it starts indexing from scratch again (as I expected). Is there any way to avoid this? Is there any way to ask the indexer to index only from now on?
I tried to use the index and the indexer I already have by changing the subscription to prod, but I have to change the indexer's data source to point at the production blob storage, and in that case I get this error:
Indexer 'filesIndexer' currently references data source 'qafilesds' and cannot be updated to reference a different datasource 'prodfilesds' because it has a non-empty change tracking state, or it is currently in progress. You can use Reset API to reset the indexer's change tracking state when it is no longer in progress, and retry this call.
A simple answer to your first question is to just use the QA index you built.
A more involved answer is to switch from the pull model you are using now to a push model. From your explanation above I assume all of your content comes from blob storage, and you have configured an indexer to do the indexing for you. This is known as the pull model.
The alternative is to use the Azure Cognitive Search SDK to write your own application that submits content to the index instead. In this case you do not use the built-in indexer, only the index itself. Then you are free to use whatever logic you want to determine what to index and what to skip. You can even enable your storage accounts to notify your application with events when content is updated.
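As a rough illustration, here is what pushing documents with the azure-search-documents Python SDK could look like. The endpoint, admin key, index name "files-index", and the document fields are assumptions and must match the schema of the index you copied from QA.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",  # assumed service URL
    index_name="files-index",                               # the index copied from QA (assumed name)
    credential=AzureKeyCredential("<admin-api-key>"),
)

# Your application decides what to submit; documents already indexed in QA are
# left untouched simply by not pushing them again.
docs = [
    {"id": "doc-001", "fileName": "report.pdf", "content": "extracted text ..."},
]
result = client.upload_documents(documents=docs)
print([(r.key, r.succeeded) for r in result])  # per-document indexing outcome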

Retention policy for TFS Code Search server (Elasticsearch)

We have TFS 2017.3 with a separate Code Search server.
We have a huge TFS DB (about 1.6 TB); on the Code Search server we have 700 GB of disk space.
After a few weeks the disk space ran out and Code Search stopped working in TFS.
After we increased the disk space, search started working again.
How can we set up a retention policy to delete old Code Search data (the index)? We don't want to increase the disk space any further.
Search indexing (Code and Work Item) works in two phases:
Bulk Indexing (BI), where the entire set of code and work item artifacts in all projects/repositories under a collection is indexed. This is a time-consuming operation and depends on the size of the artifacts under the collection.
Continuous Indexing (CI), which handles all incremental updates to the artifacts (add/update/delete) and indexes them. This is a notification-based model where the indexer listens to TFS events and operates based on those event notifications. CI handles almost all update operations, including CRUD operations at the Project/Repository/Collection layer (such as repository renames, project adds/deletes, etc.). The operation time for CI again depends on the size of the incremental update. BI always precedes CI, i.e. CI will never execute on a project/repository until BI has completed for that project/repository.
To clean up the index data and re-index, follow these steps:
Pause indexing for all collections by running the following script on the TFS configuration DB:
https://github.com/Microsoft/Code-Search/blob/master/PauseIndexing.ps1
Log in to the machine where Elasticsearch (ES) is running
Stop the ES service
Delete the entire search index folder (something like C:\TfsData\Search\IndexStore, or wherever you had configured it to be)
Restart the TFS Job Agent service(s) on the AT machines
Delete the data in the following tables from each of the collection DBs
DELETE FROM [Search].[tbl_IndexingUnit]
DELETE FROM [Search].[tbl_IndexingUnitChangeEvent]
DELETE FROM [Search].[tbl_IndexingUnitChangeEventArchive]
DELETE FROM [Search].[tbl_JobYield]
DELETE FROM [Search].[tbl_TreeStore]
DELETE FROM [Search].[tbl_DisabledFiles]
DELETE FROM [Search].[tbl_ResourceLockTable]
Restart the ES service
Run this script on the TFS configuration DB:
https://github.com/Microsoft/Code-Search/blob/master/ResumeIndexing.ps1
Run this script (picked from the correct TFS release folder) on each of the collections:
https://github.com/Microsoft/Code-Search/blob/master/TFS_2017Update2/MissingIndexFolderTriggerCollectionIndexing.ps1
Try the last script on a smaller collection first (one with fewer repositories) so that you can verify that indexing happened correctly and the results are queryable.
For more details, please refer to this MSDN blog post: Resetting Search Index in Team Foundation Server.
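If you would rather script the table clean-up step than run the DELETE statements by hand, here is a minimal sketch with pyodbc. The SQL Server host and collection database names are assumptions; run it once per collection DB, after the index folder has been deleted.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tfs-sql-server;DATABASE=Tfs_DefaultCollection;"  # assumed server/DB names
    "Trusted_Connection=yes;"
)
tables = [
    "tbl_IndexingUnit", "tbl_IndexingUnitChangeEvent",
    "tbl_IndexingUnitChangeEventArchive", "tbl_JobYield",
    "tbl_TreeStore", "tbl_DisabledFiles", "tbl_ResourceLockTable",
]
with conn:  # commits on success, rolls back on error
    cursor = conn.cursor()
    for table in tables:
        # same deletes as listed in the steps above
        cursor.execute(f"DELETE FROM [Search].[{table}]")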
I was able to reduce the disk usage after deleting the ES folders and reinstalling the Code Search extension, and sometimes I also had to run MissingIndexFolderTriggerCollectionIndexing.ps1.
But I came to the conclusion that it was not worth doing: the disk usage grew rapidly back to its original size, so I did not save anything.
Although Microsoft recommends allocating disk space of about 35% of the DB, that is not enough for us, and we increase the size whenever the disk fills up completely (currently about 45% of the DB size).
The conclusion: don't touch ES; if the disk fills up, increase the disk size.

Quality profile weirdness (active/inactive rules) after SonarQube upgrade to 6.3.1

I have upgraded the SonarQube server from 6.2 to 6.3.1, and since then I see weird behaviour regarding the quality profile (it might have occurred before; it is only now that I see it).
When I click on the Quality Profile SonarWay (Java), it seems that all rules are inactive.
When I click "Activate More", it looks as though rules are active (I assume so because of the "Deactivate" option).
But switching to "active" in the left bar under the Quality Profile shows that, clearly, no rules are active.
What is the second view showing, then? What does "Deactivate" mean if the rule is inactive?
How could it happen that suddenly all rules seem to be deactivated?
This specific behaviour is a common symptom of a corrupted Elasticsearch index (no longer in sync with the SonarQube database).
Solution
Rebuild the SonarQube Elasticsearch index:
stop your SonarQube server
delete the Elasticsearch index at sonar_install_dir/data/es
start your SonarQube server
(reminder: ElasticSearch is a search engine used by SonarQube to index issues, rules etc. so that it can access this data rapidly without having to query the database all the time, see SonarQube Architecture)
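For convenience, here is a minimal scripted sketch of those three steps, assuming a Linux install under /opt/sonarqube and the bundled sonar.sh wrapper; both paths are assumptions, and on Windows you would stop/start the service instead.
import shutil
import subprocess
from pathlib import Path

SONAR_HOME = Path("/opt/sonarqube")                       # assumed install dir
SONAR_SH = SONAR_HOME / "bin" / "linux-x86-64" / "sonar.sh"

subprocess.run([str(SONAR_SH), "stop"], check=True)       # 1. stop the server

es_index = SONAR_HOME / "data" / "es"                     # 2. delete the ES index folder
if es_index.exists():
    shutil.rmtree(es_index)

subprocess.run([str(SONAR_SH), "start"], check=True)      # 3. start the server; the index is rebuilt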
Root cause
Why did that happen? A common case is an Elasticsearch index not being properly rebuilt after upgrading and/or changing the database. Here's a typical scenario: you first start SonarQube on the embedded H2 database, experiment a bit with it, then plug it into a full-fledged database. If the Elasticsearch index does not get scratched/rebuilt in between, the index ends up corrupted, as the database/dataset it used to be in sync with has suddenly changed.
FYI, there's an improvement planned to handle this more gracefully: SONAR-5681.
Note: independently of the above solution, do not treat an Elasticsearch index rebuild as a lightweight operation to be performed regularly. SonarQube self-manages its Elasticsearch index, so any issue must be investigated first.

PostgreSQL statistics issue - could not rename temporary statistics file

I am running PostgreSQL 9.4 on Windows and constantly get the error:
2015-06-15 09:35:36 EDT LOG could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied
I also see constant 200-800k writes to global.stat and global.tmp. I have seen other users with the same issue, but no solution.
It is a big database server, with 300 GB of data and 6,000 databases.
I tried setting
track_activities=off
in the config file, but it did not seem to have any effect.
Any help with the error, or with reducing the writes?
After my initial answer, I decided to research the operation of the stats collector, and in particular what it does with the files in pg_stat_tmp. I've substantially rewritten the answer as a result.
What are the global.stat / global.tmp files used for?
PostgreSQL contains functionality to collect statistics and status information about its operation. The feature is described in Section 27.2 of the manual.
This information is collated by the stats collector process. It is made available to the other PostgreSQL processes via the global.stat file. The first time you run a query that accesses this data within a transaction, the backend you are connected to reads the global.stat file and caches the result, using it until the end of the transaction.
To keep this file up to date, the stats collector process periodically re-writes it with updated information. It typically does this several times a second. The process is as follows:
Create a new file global.tmp
Write data to this file
Rename global.tmp as global.stat, overwriting the previous global.stat
The global.tmp and global.stat files are written into the directory configured by the stats_temp_directory configuration parameter. Normally this is set to $PGDATA/pg_stat_tmp.
On shutdown, the stats file is written into the file $PGDATA/global/pgstat.stat, and the files in the tmp dir above are removed. This file is then read and removed when the database is started up again.
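To make the sequence concrete, here is a toy Python illustration (not PostgreSQL code) of the write-then-rename pattern the stats collector uses, and of why it can fail on Windows.
import os

def rewrite_stats_file(stats_dir: str, payload: bytes) -> None:
    tmp_path = os.path.join(stats_dir, "global.tmp")
    stat_path = os.path.join(stats_dir, "global.stat")

    # 1. Create global.tmp and 2. write the fresh statistics into it.
    with open(tmp_path, "wb") as f:
        f.write(payload)

    # 3. Rename global.tmp over global.stat. On Linux this succeeds even while a
    # backend still has global.stat open; on Windows the same call raises
    # PermissionError if another process holds the file open, which is the
    # "Permission denied" message seen in the log.
    os.replace(tmp_path, stat_path)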
Why is the stats collector process creating so much I/O load?
Normally, the amount of data written to global.stat is relatively modest and writing it does not generate that much I/O traffic. However, under some circumstances it does seem to get very bloated. When this happens, the amount of load generated can start to get excessive as the entire file is rewritten more than once a second.
I have had one experience where it grew by a factor of 10 or more compared to other similar servers. This machine did have an unusually large number of databases (for our application at least - 30-40 databases - but nothing like the 6,000 you say you have). It is possible that having a large number of databases exacerbates this.
Some of the references below talk about a pattern of creating / dropping lots of tables causing bloat in these files, and that perhaps autovacuum is not running aggressively enough to remove the associated bloat. You may wish to consider your autovac settings.
Why do I get 'Permission Denied' errors on Windows?
After examining the PostgreSQL source code, I think there may be a race condition in accessing the global.stat file which could happen at any time, but is exacerbated by the size of the file.
The default mode of operation in Windows is that it is not possible to rename or remove a file while another process has it open. This is different to Linux (or Unix) where a file can be renamed or removed while other processes are accessing it.
In the sequence above you can see that if one of the backend processes is reading the file at the same time as the stats collector is rewriting it, then the backend process may still have the file open at the time the rename is attempted. That leads to the 'Permission Denied' error you are seeing.
Naturally when the file becomes very large, then the amount of time taken to read it becomes more significant, therefore the probability of the stats collector process attempting a rename while a backend still has it open increases.
However, since the file is frequently rewritten, the impact of these errors is relatively mild. It just means that this particular update fails, leaving the backends with slightly out-of-date statistics. The next update will probably succeed.
Note that Windows does offer a file-opening mode which allows files to be deleted or renamed while they are open in another process; however, as far as I could tell, this mode is not used by PostgreSQL. I could not find any bug report on this - it seems like it should be reported.
In summary, these errors are a side effect of the main problem, which is the excessive size of the global.stat file.
I've turned track_activities off but the file is still being written - Why?
From what I can see, track_activities affects only one of the sets of information that the stats collector collects.
In addition, it looks as though the stats collector process is started regardless of these settings, and will continue to re-write the file. The settings appear to control only the collection of fresh data.
My conclusion is that once the file has become bloated, it will remain so and continue to be re-written, even once all of the stats collection options are turned off.
What can I do to avoid this problem?
Once the file has become bloated, it seems that the easiest way to get the database back into a good working state is to remove the file, using the following steps:
Stop the database
When the DB is stopped, the pg_stat_tmp directory is empty and a file $PGDATA/global/pgstat.stat is written. We renamed this file to pgstat.stat.old.
Start the database. It creates a fresh set of pgstat files. After confirming the server is operating correctly, you can remove the old file you renamed.
This is the process we used when one of our servers suffered from this problem.
Needless to say, be very careful when manually manipulating any files under the PostgreSQL data directory.
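As a minimal sketch of those clean-up steps on a Windows install: the service name "postgresql-x64-9.4" and the data directory C:\PostgreSQL\data are assumptions to adjust for your environment.
import subprocess
from pathlib import Path

PGDATA = Path(r"C:\PostgreSQL\data")          # assumed data directory
SERVICE = "postgresql-x64-9.4"                # assumed Windows service name

subprocess.run(["net", "stop", SERVICE], check=True)         # stop the database

stats_file = PGDATA / "global" / "pgstat.stat"                # rename rather than delete,
if stats_file.exists():                                       # so the step can be undone
    stats_file.rename(PGDATA / "global" / "pgstat.stat.old")

subprocess.run(["net", "start", SERVICE], check=True)        # start; fresh stats files are created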
After this you may want to monitor the server to see if the file becomes bloated again. If it does, here are some additional ideas to consider:
As mentioned above, I have seen some references to this file becoming bloated if autovacuum is not running aggressively enough. You may wish to tune the autovacuum settings.
Disabling any of the track_xxx options described in Section 18.9.1 of the manual that are not required may help.
It is possible to place the pg_stat_tmp directory on a tmpfs filesystem (or whatever equivalent RAM-based filesystem is available on Windows); see the sketch below. Doing so should eliminate I/O as a concern for these files.
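Here is a minimal sketch (assuming psycopg2 and a superuser connection) of pointing the stats temp files at a RAM-backed volume; the R:/pg_stat_tmp path is an assumption, so use whatever RAM disk or tmpfs location you actually have.
import psycopg2

conn = psycopg2.connect("dbname=postgres user=postgres")
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
with conn.cursor() as cur:
    # Move the stats temp directory to a RAM-backed location.
    cur.execute("ALTER SYSTEM SET stats_temp_directory = 'R:/pg_stat_tmp'")
    # stats_temp_directory is a sighup parameter, so a reload is enough.
    cur.execute("SELECT pg_reload_conf()")
conn.close()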
References:
Postgres stats collector showing high disk I/O
Too much I/O generated by postgres stats collector process
stats collector suddenly causing lots of IO
This might be a solution for your problem: https://wiki.postgresql.org/wiki/May_2015_Fsync_Permissions_Bug
Another possibility could be antivirus settings. Try turning the antivirus off temporarily.
This happened to me a few days ago. I rebooted the machine, but the error did not disappear.
I don't know why, but performing a VACUUM ANALYZE VERBOSE did the trick, and the error stopped showing up.

OBIEE: how to reload the RPD file quickly?

I'm new to Oracle BIEE. My development environment is installed, and the project is fairly big. We are using multi-user development. The problem happens when one developer publishes the RPD to the network and wants to test the data: the server takes too much time reloading the RPD file and I can hardly wait! When multiple users want to test the RPD file, it's unbearable... Is there any other way to solve the problem, or how can I make the BIEE server reload the RPD file more quickly?
It's hard to say specifically without knowing a bit more about your setup, but here are a few general advice pointers:
When stopping the service, OBI will wait for any running queries to complete before stopping, so make sure there's nothing running before you try to do this.
Make sure you're only restarting the BI Server component; you don't need to wait for the other services to restart if you're just changing the RPD (if you're on 11g, deploying through EM should mean this happens anyway, so you don't need to worry).
If you're using 11g, you could try incremental updates by creating patches.
Check whether the hardware you're running on is adequate, most importantly that you've enough RAM so it's not having to page out to disk when it loads the RPD.
Remove anything unused from the RPD to make it smaller.