How to ignore backing up certain tables with innobackupex

I am using innobackupex to back up my database, but I need to ignore a few tables.
Is there an --ignore-tables option? I can't seem to find any examples.

Update (thanks to Shlomi's comment): in XtraBackup 2.3, Percona added a --tables-exclude option.
https://www.percona.com/doc/percona-xtrabackup/2.3/xtrabackup_bin/xbk_option_reference.html says:
--tables-exclude=name
Filtering by regexp for table names. Operates the same way as xtrabackup --tables, but matched names are excluded from backup. Note that this option has a higher priority than xtrabackup --tables.
Before XtraBackup 2.3, there was no option in xtrabackup or innobackupex to ignore certain tables while backing up the rest.
You could use --include and give a regular expression that matches all tables except the ones you want to ignore.
For example, you could put the tables to ignore into a separate database, and then back up all the databases except for that one.
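As a rough sketch of both approaches (the database, table, and path names below are placeholders, and credentials are omitted):

# XtraBackup 2.3+: skip mydb.debug_log while backing up everything else
xtrabackup --backup --target-dir=/backups/full \
    --tables-exclude='^mydb[.]debug_log$'

# Pre-2.3 workaround with innobackupex: --include takes a regex that
# matches only what you DO want, e.g. every database except the one
# holding the tables you want to skip
innobackupex --include='^(app|reporting)[.]' /backups/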

Related

How to identify if a schema in a database (its structure/metadata) has changed or not

I need to identify whether a schema in a database has any change in metadata, such as changed table columns, changed procedure/package PL/SQL code, added/deleted triggers, etc. I've tried making an expdp with content=metadata_only and calculating a checksum of the dump, but this doesn't work because the checksum changes every time despite the database being unchanged. How can I identify if a schema in a database (its structure) has changed or not? Do I have to export the plain-text metadata instead? Thanks.
If you only need to know who did what when, use database auditing.
If you only need to know that something might have changed, but don't care what and are okay with the possibility of the change not being significant, you can use last_ddl_time from dba_objects and compare it to the maximum value from your previous check. This can be done at either the schema or the object level.
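For example, a quick schema-level check might look like this (APP_OWNER is a placeholder):

-- Cheap "did anything change?" check; compare against the previous run
SELECT MAX(last_ddl_time)
  FROM dba_objects
 WHERE owner = 'APP_OWNER';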
If you really do need to generate a delta and know for certain that something changed, you have two choices:
Construct data dictionary queries against all application dictionary views (lots of work, because there are a lot of views: columns, tables, partitions, subpartitions, indexes, index partitions, index subpartitions, LOBs, LOB partitions, etc., etc.)
(Recommended) Use dbms_metadata to extract the DDL of the entire schema. See this answer for a query that will export almost every object you would likely care about.
With either #1 or #2, you can then compare old/new strings, or use a hash function (e.g. dbms_crypto.hash) to compute a hash value and compare that. I wrote a schema upgrade tool that does exactly this: it surgically identifies and upgrades individual objects that differ from some template source schema, using dbms_metadata and looking for diffs in the hash values. You will, however, need to set certain transforms to omit clauses you don't care about and that could change arbitrarily, or mask them with regexp_replace after the fact (e.g. a sequence's DDL contains its current value, which will always be different; you don't want to see that as a change). It can be a bit of work.
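A minimal sketch of option #2 for tables only, assuming an APP_OWNER schema and EXECUTE privilege on DBMS_CRYPTO (both assumptions):

DECLARE
  l_ddl  CLOB;
  l_hash RAW(20);
BEGIN
  -- omit storage clauses so purely physical changes don't disturb the hash
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'STORAGE', FALSE);
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'SEGMENT_ATTRIBUTES', FALSE);

  FOR t IN (SELECT table_name FROM dba_tables WHERE owner = 'APP_OWNER') LOOP
    l_ddl  := DBMS_METADATA.GET_DDL('TABLE', t.table_name, 'APP_OWNER');
    -- SHA-1 of the extracted DDL; store it and compare on the next run
    l_hash := DBMS_CRYPTO.HASH(l_ddl, DBMS_CRYPTO.HASH_SH1);
    DBMS_OUTPUT.PUT_LINE(t.table_name || ' ' || RAWTOHEX(l_hash));
  END LOOP;
END;
/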

When should I use CREATE and when MERGE in Cypher queries?

I've seen that sometimes CREATE is used to create nodes, and in other situations MERGE is used. What's the difference, and when should one be used in place of the other?
CREATE does just what it says. It creates, and if that means creating duplicates, well then it creates.
MERGE does the same thing as CREATE, but also checks whether a node already exists with the properties you specify. If it does, MERGE doesn't create a new one. This helps avoid duplicates.
Here's an example: I use CREATE twice to create a person with the same name.
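A sketch of what that looks like (the Person label and the name value are just placeholders; each pair of statements is run against a fresh graph):

// CREATE, run twice, happily makes two Alice nodes
CREATE (p:Person {name: 'Alice'});
CREATE (p:Person {name: 'Alice'});
// MATCH (p:Person {name: 'Alice'}) RETURN count(p)  // -> 2

// MERGE creates on the first run and merely matches on the second
MERGE (p:Person {name: 'Alice'});
MERGE (p:Person {name: 'Alice'});
// MATCH (p:Person {name: 'Alice'}) RETURN count(p)  // -> 1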
CREATE should be used when you are absolutely certain the information doesn't already exist in the database (for example, when you are loading data). MERGE is used whenever there is a possibility that the node or relationship already exists and you don't want to duplicate it. MERGE shouldn't be used indiscriminately, though, as it is considerably slower than CREATE.

Can multiple CSVs be read using an external table in Oracle without specifying a list of file names?

I have 150-odd CSV files, but their file names may vary.
So I want to know whether, with the external table concept, we can use *.csv instead of providing a file name list:
LOCATION ('*.csv')
According to this article, you can do it from 12c Release 2 (not tested):
A number of minor usability improvements have been made to the
ORACLE_LOADER access driver to make external table creation simpler.
The LOCATION clause now accepts wildcards. An "*" matches multiple
characters, while a "?" matches a single character.
LOCATION ('emp_ext*.dmp')
LOCATION ('emp_ext?.dmp')
From the docs:
The LOCATION clause lets you specify one or more external data sources. Usually the location_specifier is a file, but it need not be. Oracle Database does not interpret this clause. It is up to the access driver to interpret this information in the context of the external data.
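Putting it together for the CSV case, a minimal sketch (it assumes 12c Release 2 or later, an existing directory object EXT_DIR, and a uniform two-column file layout; all names are placeholders):

CREATE TABLE csv_files_ext (
  id   NUMBER,
  name VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('*.csv')  -- wildcard picks up every CSV in the directory
)
REJECT LIMIT UNLIMITED;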
So what happened when you tested it for yourself?

Effect of renaming table column on explain/execution plans

I have a table with 300+ columns and hundreds of thousands of records, and I need to rename one of the existing columns.
Is there anything I need to be worried about? Will this operation have any effect on the explain plans, etc.?
Notes:
I am working on a live production database on Oracle 11g.
This column is not currently being used. It's not populated for any of the rows, and I am 100% sure none of the existing queries refer to it.
If "working on a live production database" means that you are going to try to do this without testing in lower environments while people are working, I would strongly caution against that plan.
Existing query plans that involve the table you're doing DDL on will be invalidated, so those queries will need to be hard parsed again. That can easily be an expensive operation if there are large numbers of such queries. It is certainly possible that some query plans will change because something else has changed (e.g. statistics are different, settings are different, bind variables are different). They won't change because of the column name change itself, but the column name change may result in changed plans.
Any queries that you're executing will, obviously, need to use the new name as soon as you rename the column. That generally means that you need to do a coordinated release where you modify the code (including stored procedures) as well as the column name. That, in turn, generally implies that you're doing this as part of a build that includes at least a bit of downtime. You probably could, if you have the enterprise edition, do edition-based redefinition without downtime but that adds complexity to the process and is something that you would absolutely need to test thoroughly before implementing it in prod.
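For what it's worth, the rename itself is a quick metadata-only DDL, and you can at least inventory dependent stored code beforehand. A sketch with placeholder names:

ALTER TABLE app_owner.big_table RENAME COLUMN old_name TO new_name;

-- Stored code that references the table (won't catch ad hoc SQL)
SELECT owner, name, type
  FROM dba_dependencies
 WHERE referenced_owner = 'APP_OWNER'
   AND referenced_name  = 'BIG_TABLE';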

How can I query two databases and combine the results using LINQ?

I need to pull values in similar tables from two different databases, combine them and then write the output to a CSV file. Can I just create a second connection string in the Properties file and explicitly pass the DataContext the second connection string for the other LINQ query? Or do I need to do something else? The tables are nearly identical except for an ID used for some criteria.
I've never used LINQ before, but it seems the easier way to handle this instead of having to write SQL by hand.
If the schema matches in both databases, then you should be able to just create a second DataContext instance (giving it the second connection string as an argument). LINQ to SQL doesn't check in any way whether you're using "the right" database; if it has the right columns and tables, it will work.
However, LINQ doesn't automatically work with multiple databases in any "smart" way, so it will need to download the content into memory before doing any operations that involve multiple data sources. You can still use a single LINQ query to do this, but you have to be careful about which part of it runs on in-memory data. (By the way, you can use extension methods like ToList to say explicitly: get the data from the database at this point.)
You also mention that the tables are nearly identical except for an ID in some cases. Does that mean the primary/foreign keys are different? In that case, some autogenerated relations may not work. If it means there is a differently named column, you could manually edit the generated schema to contain both columns and then use only the right one. However, this feels a bit odd; unless you're planning to make other manual edits to the schema, you might as well just generate two very similar schemas.
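A rough sketch of the two-context approach; the AppDataContext class, the settings names, and the Widgets table are all placeholders:

using System.IO;
using System.Linq;

class CombineToCsv
{
    static void Main()
    {
        // Connection strings from the project's settings/Properties file;
        // the setting names here are hypothetical
        string connA = Properties.Settings.Default.DatabaseA;
        string connB = Properties.Settings.Default.DatabaseB;

        using (var dbA = new AppDataContext(connA))  // generated LINQ to SQL context
        using (var dbB = new AppDataContext(connB))
        using (var writer = new StreamWriter("combined.csv"))
        {
            // ToList() forces each query to run against its own database;
            // the Concat below then happens in memory (LINQ to Objects)
            var fromA = dbA.Widgets.Select(w => new { w.Id, w.Name }).ToList();
            var fromB = dbB.Widgets.Select(w => new { w.Id, w.Name }).ToList();

            foreach (var row in fromA.Concat(fromB))
                writer.WriteLine(row.Id + "," + row.Name);
        }
    }
}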
