INFA SFDC Connector: refresh staging table

I have a problem in INFA where the metadata definitions for source/target tables get out of sync with the actual definitions in the underlying database. I'm working with the SFDC connector in INFA, and when the SFDC object is changed my integration fails. Is it possible to script an update of my source/target table metadata before processing?

No, this is not possible, I'm afraid. It would require refreshing the mappings and sessions as well. By design, Informatica Source/Target definitions are not tied to any particular implementation. Tying them together could cause a lot of issues if you used one mapping in multiple sessions to access many different servers: if one of them changed, it would impact all the others.
I guess you could add a session that performs a simple read operation to make sure the structure has not been modified before actually starting the main ETL part; a sketch of that kind of check follows.
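If you go that route, the pre-check can also live outside Informatica entirely. Here is a minimal sketch in plain JDBC (the connection details, schema, table, and column names are all hypothetical) that compares the columns a mapping expects against what the database currently reports, so a scheduler can abort the workflow before it runs against a drifted schema:

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class SchemaPreCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details and expected layout -- adjust to your environment.
        List<String> expected = List.of("ID", "NAME", "STATUS", "MODIFIED_DATE");
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/STAGE", "etl_user", "secret")) {
            DatabaseMetaData md = con.getMetaData();
            List<String> actual = new ArrayList<>();
            try (ResultSet rs = md.getColumns(null, "ETL", "SFDC_ACCOUNT_STG", null)) {
                while (rs.next()) {
                    actual.add(rs.getString("COLUMN_NAME"));
                }
            }
            if (!actual.containsAll(expected)) {
                // A non-zero exit code lets the scheduler abort the Informatica workflow.
                System.err.println("Schema drift detected: expected " + expected
                        + " but found " + actual);
                System.exit(1);
            }
        }
    }
}
```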
Well, to be honest, another solution would be to have a tool that checks the source and target definitions, generates the whole mapping/session/workflow, performs the import operation on the repository with conflict resolution, and runs the workflow afterwards. This is possible, but... very complex.

Related

Realtime one-way mirroring of a SQLite database

I am dealing with a 3rd party application that's running a SQLite 3 database with WAL (Write-Ahead Logging) on a local computer, and I'm looking to mirror that database (read-only; this is one-way mirroring) to another system. The challenge is that I'm running in a separate process, which seems to complicate things somewhat.
The database is being created and opened with the normal locking mode, so there's no problem reading it from another process; I'm just trying to either find an existing implementation or get some pointers on where to start. My understanding, based on other posts, is that the standard SQLite update hooks (such as sqlite3_update_hook) will not work out of process.
A key issue is speed: ideally I'd like to detect each update as soon as it happens and begin transmitting it. That rules out most polling options, but even if it didn't, how would you detect the most recent changes?
I'm seeing two files that look promising: the actual WAL file (foo.db-wal) and the memory-mapped index file (foo.db-shm). I'm hoping that those two contain the information I need to: A. detect when changes occur in the database, and B. grab just the incremental changes since the last update.
But a pointer to some existing solution would be much preferred... :-)
SymmetricDS might be the solution for you
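If an existing tool like SymmetricDS doesn't fit, the detection half (part A) can be approximated without parsing the WAL format: watch foo.db-wal for modifications, then ask SQLite whether another connection has committed by reading PRAGMA data_version, whose value changes whenever a different connection commits. A minimal sketch, assuming the xerial sqlite-jdbc driver and a hypothetical database at /data/foo.db:

```java
import java.nio.file.*;
import java.sql.*;

public class WalWatcher {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("/data");
        try (Connection con = DriverManager.getConnection("jdbc:sqlite:/data/foo.db");
             WatchService ws = FileSystems.getDefault().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_MODIFY);
            long lastVersion = dataVersion(con);
            while (true) {
                WatchKey key = ws.take(); // blocks until something in /data changes
                for (WatchEvent<?> ev : key.pollEvents()) {
                    if (ev.context().toString().equals("foo.db-wal")) {
                        long v = dataVersion(con);
                        if (v != lastVersion) { // another connection committed
                            lastVersion = v;
                            System.out.println("change committed; re-read mirrored tables");
                        }
                    }
                }
                key.reset();
            }
        }
    }

    // PRAGMA data_version changes when a different connection commits to the DB.
    private static long dataVersion(Connection con) throws SQLException {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("PRAGMA data_version")) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```

Note this only solves detection; part B (extracting just the incremental changes) still requires diffing or re-reading the mirrored tables, which is where a replication tool earns its keep.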

How to overcome data mismatch across several databases

In my system I have more than one project, and each project connects to its own individual DB. When an insert transaction occurs in any project, the record is inserted into all of the DBs; but when an update occurs in a project, the update is applied only to that project's DB and does not affect the other projects' DBs. That is how my system works. After this process runs for a while, the data differs between the DBs. Without changing this process, what can I do to overcome this data mismatch problem?
Suppose transaction activity happens on system-1:
Transaction --> Update --> the modification occurs only in system-1's DB, not in the system-2 or system-3 DBs.
Any kind of suggestion is welcome; if you have any questions, please ask. Thanks in advance.
I'm currently working on almost the same project architecture. Our solution was to create an Orchestration module that manages a Single_entry_point module. The latter is responsible for unifying the information from the upstream (a cluster of different databases and service systems) and then uploading/distributing it to the downstream (a single data warehouse). By doing so you can guarantee that all your information is current at every moment. The Orchestrator communicates via service messages when dealing with all the other modules.
This design is based on the Pipes and Filters pattern.
I think that in your case you only need to add logic for the update path and can reuse everything you have at this point, if you spend some time on such a Single_entry_point module and make it handle not only inserts but transaction updates too. A sketch of that idea follows.
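A minimal sketch of that fan-out idea, assuming plain JDBC and hypothetical connection strings: the single entry point applies the same update to every project DB, so no database is skipped on updates the way it is today.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class SingleEntryPoint {
    // Hypothetical JDBC URLs, one per project DB.
    private static final List<String> DB_URLS = List.of(
            "jdbc:sqlserver://system1;databaseName=proj1",
            "jdbc:sqlserver://system2;databaseName=proj2",
            "jdbc:sqlserver://system3;databaseName=proj3");

    /** Applies the same update to every project DB instead of just the local one. */
    public static void updateEverywhere(long recordId, String newValue) throws Exception {
        for (String url : DB_URLS) {
            try (Connection con = DriverManager.getConnection(url, "user", "pass");
                 PreparedStatement ps = con.prepareStatement(
                         "UPDATE records SET value = ? WHERE id = ?")) {
                ps.setString(1, newValue);
                ps.setLong(2, recordId);
                ps.executeUpdate();
            }
        }
    }
}
```

In practice you would put a message queue and retry logic behind this loop, so one unreachable DB does not silently drop the update.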
When it comes to database "eyeballing" validation (done by SQL scripting), you should definitely consider Informatica; to be more specific, for validating data as it is being moved into production systems. The data in your production systems has to be right in order to support your business decision making. Informatica Data Validation Option provides the ETL testing automation and management capabilities to ensure that your production systems are not compromised by the data update process.
If you find that these options don't suit your needs, here are some resources I found on the topic:
database-synchronization-an-overview-of-approaches
MSDN Synchronizing Databases
how-to-synchronize-databases-in-different-servers-in-sql-server-2008
sql-comparison-sdk-synchronizing-databases

How can Couchbase be used as a caching layer on top of Oracle?

I have Oracle as my main RDBMS for reads and writes, but I want to use Couchbase as a caching layer, since it has map-reduce and can be used like memcached. Any idea how I can implement that, and how to transfer and update data in the caching layer when Oracle rows are updated or inserted, etc.?
You are not telling us anything about your current performance issues.
I have seen too many applications which do not really take advantage of RDBMS/SQL features, especially if an ORM sits in between.
The cure is then to put another cache on top of the database and to synchronize it across a cluster manually, using IP multicast (SwarmCache, for example), message queues (JMS), or nightly import jobs. That can create more problems in the end, and it increases system complexity.
So my answer to your question is: I would not do it, as long as there is room for improvement regarding your data model and/or queries.
I believe your question is about database synchronization. This can be done through a combination of DB dependencies and "read-through" features; I am not sure whether Couchbase offers them. With a DB dependency you have cached items that depend on DB items, and if the DB items are updated or deleted, the corresponding dependent items in the cache are removed. At the same time you can write a "read-through" handler that executes at the server level; the main purpose of this handler is to load fresh copies of the removed items into the cache. So, basically, you write the handler once, register it with the cache server, and the cache server executes it whenever it needs to sync new items from the DB into the cache. This reading on DB synchronization can be useful; it's based on a product called NCache.
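To make the read-through idea concrete, here is a minimal, cache-product-agnostic sketch (all names hypothetical): on a cache miss, the loader fetches a fresh copy from Oracle and populates the cache, so invalidated entries heal themselves on the next read.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReadThroughCache {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    /** Read-through: on a miss, load from Oracle and populate the cache. */
    public String get(long id) throws Exception {
        String hit = cache.get(id);
        if (hit != null) {
            return hit;
        }
        String fresh = loadFromOracle(id);
        if (fresh != null) {
            cache.put(id, fresh);
        }
        return fresh;
    }

    /** Called by whatever change-detection you use (triggers, DCN, ...). */
    public void invalidate(long id) {
        cache.remove(id);
    }

    private String loadFromOracle(long id) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app", "secret");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT payload FROM items WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```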
So your question is not directly related to Couchbase; as others stated, it is more about how you can be alerted when data changes in your Oracle instance.
One thing that is not well known is the Oracle Database Change Notification feature, which is quite cool for this:
http://docs.oracle.com/cd/E11882_01/java.112/e16548/dbchgnf.htm
So you can create an application that listens for your changes and pushes the data into Couchbase.
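A minimal sketch of such a listener, using the DCN API from Oracle's JDBC driver (ojdbc) as described in the linked guide; the table name and the push-to-Couchbase step are placeholders:

```java
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;
import oracle.jdbc.OracleConnection;
import oracle.jdbc.OracleStatement;
import oracle.jdbc.dcn.DatabaseChangeEvent;
import oracle.jdbc.dcn.DatabaseChangeListener;
import oracle.jdbc.dcn.DatabaseChangeRegistration;

public class OracleToCouchbase {
    public static void main(String[] args) throws Exception {
        OracleConnection con = (OracleConnection) java.sql.DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app", "secret");

        Properties props = new Properties();
        props.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");
        DatabaseChangeRegistration dcr = con.registerDatabaseChangeNotification(props);

        dcr.addListener(new DatabaseChangeListener() {
            @Override
            public void onDatabaseChangeNotification(DatabaseChangeEvent event) {
                // Re-read the changed rows and push them into Couchbase here.
                System.out.println("change notification: " + event);
            }
        });

        // Associate the registration with a query; the tables it touches are watched.
        try (Statement st = con.createStatement()) {
            ((OracleStatement) st).setDatabaseChangeRegistration(dcr);
            try (ResultSet rs = st.executeQuery("SELECT id FROM items")) {
                while (rs.next()) { /* drain to complete the registration */ }
            }
        }
        // Keep the process alive; notifications arrive on a driver thread.
    }
}
```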

Multiple programs updating the same database

I have a website developed with ASP.NET MVC, Entity Framework Code First and SQL Server.
The website has entities that each have a history of statuses that we defined (NEW, PACKED, SHIPPED etc.)
The DB contains a table in which a completely separate system inserts parcel tracking data.
I have to read this tracking data and, following certain business rules, add to the existing status history of my entities.
The best way I can think of is to write an independent Windows service to poll the tracking data every so often and update my entity statuses from that. However, that makes me concerned about DB concurrency issues.
Please could someone advise me on the best strategy for this scenario?
Many thanks
There are different ways to do it; it also depends on the response time you need. If you need to update your system as soon as the tracking system updates the record, then a trigger is the preferred way. An alternative is to schedule a job that runs every 15 or 30 minutes and syncs the two systems.
As for the concurrency issue, you can use a concurrency token field; Entity Framework has support for this.
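The underlying pattern, which EF's concurrency-token mapping implements for you, is a version column checked in the UPDATE's WHERE clause. A minimal sketch of that idea in plain JDBC, with a hypothetical table and connection string:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class OptimisticUpdate {
    /**
     * Updates a status row only if nobody else changed it since we read it.
     * Returns false when the row was modified concurrently, so the caller
     * can re-read and retry.
     */
    public static boolean updateStatus(long id, String newStatus, long versionWeRead)
            throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://dbhost;databaseName=shop", "app", "secret");
             PreparedStatement ps = con.prepareStatement(
                     "UPDATE entity_status SET status = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?")) {
            ps.setString(1, newStatus);
            ps.setLong(2, id);
            ps.setLong(3, versionWeRead);
            return ps.executeUpdate() == 1; // 0 rows => concurrent modification
        }
    }
}
```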

How do you manage schema upgrades to a production database?

This seems to be an overlooked area that could really use some insight. What are your best practices for:
making an upgrade procedure
backing out in case of errors
syncing code and database changes
testing prior to deployment
mechanics of modifying the table
etc...
Liquibase
liquibase.org:
it understands Hibernate definitions.
it generates better schema-update SQL than Hibernate.
it logs which upgrades have been made to a database.
it handles two-step changes (i.e. delete a column "foo" and then rename a different column to "foo").
it handles the concept of conditional upgrades.
the developer actually listens to the community (with Hibernate, if you are not in the "in" crowd or are a newbie, you are basically ignored).
http://www.liquibase.org
opinion
The application should never handle a schema update. This is a disaster waiting to happen. Data outlasts applications, and as soon as multiple applications try to work with the same data (the production app plus a reporting app, for example), chances are they will both use the same underlying company libraries... and then both programs decide to do their own DB upgrade... have fun with that mess.
I am a big fan of Red Gate products that help creating SQL packages to update database schemas. The database scripts can be added to source control to help with versioning and rollback.
In general my rule is: "The application should manage its own schema."
This means schema upgrade scripts are part of any upgrade package for the application and run automatically when the application starts. In case of errors the application fails to start, and the upgrade script transaction is not committed. The downside is that the application has to have full modification access to the schema (this annoys DBAs).
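A minimal sketch of that startup hook, with a homegrown schema_version table standing in for whatever migration tool you use (all names hypothetical): each pending script runs in a transaction, and any failure aborts startup with nothing committed.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Map;
import java.util.TreeMap;

public class StartupMigrator {
    /** Runs every script whose version is newer than what the DB has seen. */
    public static void migrate(Connection con, TreeMap<Integer, String> scriptsByVersion)
            throws Exception {
        con.setAutoCommit(false);
        int current = currentVersion(con);
        for (Map.Entry<Integer, String> e : scriptsByVersion.tailMap(current + 1).entrySet()) {
            try (Statement st = con.createStatement()) {
                st.execute(e.getValue()); // the DDL/DML for this upgrade step
                try (PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO schema_version(version) VALUES (?)")) {
                    ps.setInt(1, e.getKey());
                    ps.executeUpdate();
                }
                // Each step is atomic; a failure leaves us at a known version.
                // Caveat: on some RDBMSs DDL auto-commits, so rollback may be partial.
                con.commit();
            } catch (Exception ex) {
                con.rollback();
                throw new IllegalStateException("Upgrade to version " + e.getKey()
                        + " failed; refusing to start", ex);
            }
        }
    }

    private static int currentVersion(Connection con) throws Exception {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT COALESCE(MAX(version), 0) FROM schema_version")) {
            rs.next();
            return rs.getInt(1);
        }
    }
}
```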
I've had great success using Hibernate's SchemaUpdate feature to manage the table structures, leaving the upgrade scripts to handle only actual data initialization and the occasional removal of columns (SchemaUpdate doesn't do that).
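For reference, invoking that feature programmatically looks roughly like this in classic Hibernate 3.x/4.x (the API moved in later versions), assuming a hibernate.cfg.xml on the classpath:

```java
import org.hibernate.cfg.Configuration;
import org.hibernate.tool.hbm2ddl.SchemaUpdate;

public class RunSchemaUpdate {
    public static void main(String[] args) {
        // Reads hibernate.cfg.xml (mappings + connection settings) from the classpath.
        Configuration cfg = new Configuration().configure();
        // execute(script, doUpdate): print the generated SQL and apply it to the DB.
        new SchemaUpdate(cfg).execute(true, true);
    }
}
```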
Regarding testing, since the upgrades are part of the application, testing them becomes part of the test cycle for the application.
Afterthought: taking on board some of the criticism in other posts here, note the rule says "its own". It only really applies where the application owns the schema, as is generally the case with software sold as a product. If your software shares a database with other software, use other methods.
That's a great question. (There is a high chance this is going to end up a normalised-versus-denormalised database debate... which I am not going to start. Okay, now for some input.)
Some off-the-top-of-my-head things I have done (I'll add more when I have some more time or need a break):
client design - this is where the VB approach of inline SQL (even with prepared statements) gets you into trouble. You can spend ages just finding those statements. If you use something like Hibernate and put as much SQL as possible into named queries, you have a single place for most of the SQL (nothing is worse than trying to test SQL that sits inside some IF statement whose "trigger" criteria you never hit in testing). Before I used Hibernate (or other ORMs), when I wrote SQL directly against JDBC or ODBC, I would put all the SQL statements either in public fields of an object (with a naming convention) or in a property file (again with a naming convention for the keys, say PREP_STMT_xxxx), and use reflection or iterate over the values at startup in a) test cases and b) application startup. Some RDBMSs let you pre-compile prepared statements before execution, so after login I would pre-compile all of them at startup, making the application self-testing; see the sketch below. Even for hundreds of statements on a good RDBMS that takes only a few seconds, and only once. It has saved my butt a lot. On one project the DBAs wouldn't communicate (a different team, in a different country) and the schema seemed to change nightly, for no reason. Each morning we got a list of exactly where the change broke the application, on startup.
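A minimal sketch of that startup self-test, assuming the statements live in a properties file on the classpath using the PREP_STMT_ naming convention mentioned above:

```java
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.Properties;

public class StatementSelfTest {
    /** Prepares every PREP_STMT_* entry so schema drift fails fast at startup. */
    public static void verifyAll(Connection con) throws Exception {
        Properties sql = new Properties();
        try (InputStream in = StatementSelfTest.class
                .getResourceAsStream("/statements.properties")) {
            sql.load(in);
        }
        for (String name : sql.stringPropertyNames()) {
            if (!name.startsWith("PREP_STMT_")) continue;
            try (PreparedStatement ps = con.prepareStatement(sql.getProperty(name))) {
                // Preparing is enough: the DB parses the SQL against the current schema.
            } catch (Exception ex) {
                throw new IllegalStateException(name + " no longer matches the schema", ex);
            }
        }
    }
}
```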
If you need ad hoc functionality, put it in a well-named class (again, a naming convention helps with automated testing) that acts as a factory for your query (i.e. it builds the query). You are going to have to write the equivalent code anyway, so put it in a place where you can test it. You can even write some basic test methods on the same object or in a separate class.
If you can, also try to use stored procedures. They are a bit harder to test, as above. Some DBs also don't pre-validate the SQL in stored procs against the schema at compile time, only at run time. Validating them usually involves taking a copy of the schema structure (no data) and then creating all stored procs against that copy (in case the DB team making the changes didn't validate correctly); this way the structure can be checked. As a point of change management, though, stored procs are great: on a change, everyone gets it, especially when the DB changes are the result of business process changes, and all languages (Java, VB, etc.) pick up the change.
I usually also set up a table called something like system_setting. In this table we keep a VERSION identifier, so that client libraries can connect and validate that they are valid for this version of the schema. Depending on the changes to your schema, you don't want to allow clients to connect if they can corrupt it (i.e. when many of your referential rules live in the client rather than the DB). It matters if you will have multiple client versions in the wild (which does happen in non-web apps, i.e. users running the wrong binary), and you may have batch tools etc. as well. Another approach I have used is to define a mapping of operations to schema versions in a property file or, again, in a system_info table. That table is loaded on login and then used by each "manager" (I usually have some sort of client-side API that does most DB work) to validate, per operation, that the version is right. Most operations can then succeed, but out-of-date methods fail with an exception that tells you WHY; see the sketch below.
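A minimal sketch of that version gate, with a hypothetical table and version value:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class SchemaVersionGate {
    /** The schema version this client binary was built and tested against. */
    private static final int EXPECTED_VERSION = 42;

    public static void checkOnLogin(Connection con) throws Exception {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT value FROM system_setting WHERE name = 'VERSION'")) {
            rs.next();
            int actual = rs.getInt(1);
            if (actual != EXPECTED_VERSION) {
                throw new IllegalStateException("Client expects schema version "
                        + EXPECTED_VERSION + " but the database is at " + actual
                        + "; refusing to connect so we cannot corrupt the data");
            }
        }
    }
}
```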
managing the change to schema -> do you update the table, or add 1-1 relationships to new tables? I have seen a lot of shops that always access data via a view for this reason; that allows table names, columns, etc. to change. I have played with the idea of actually treating views like interfaces in COM, i.e. you add a new VIEW for new functionality/versions. What often bites you here is that you can have a lot of reports (especially end-user custom reports) that assume table formats. Views let you deploy a new table format while still supporting existing client apps (remember all those pesky ad hoc reports); a sketch follows.
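A minimal sketch of versioned views (hypothetical tables and columns; CREATE OR REPLACE syntax varies by RDBMS): v1 reporting clients keep reading customer_v1 even after the underlying table is restructured, while v2 clients get the new shape.

```java
import java.sql.Connection;
import java.sql.Statement;

public class VersionedViews {
    public static void install(Connection con) throws Exception {
        try (Statement st = con.createStatement()) {
            // Schema change: full_name was split into first_name/last_name.
            // The v1 view preserves the old contract for existing reports.
            st.execute("CREATE OR REPLACE VIEW customer_v1 AS "
                     + "SELECT id, first_name || ' ' || last_name AS full_name "
                     + "FROM customer");
            // New clients and reports code against the v2 contract.
            st.execute("CREATE OR REPLACE VIEW customer_v2 AS "
                     + "SELECT id, first_name, last_name FROM customer");
        }
    }
}
```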
Also, you need to write update and rollback scripts. And again: TEST, TEST, TEST...
------------ OKAY - THIS IS A BIT RANDOM DISCUSSION TIME --------------
I actually worked on a large commercial project (i.e. at a software shop) where we had the same problem. The architecture was two-tier, and they were using a product a bit like PHP, but pre-PHP. Same thing, different name. Anyway, I came in at version 2...
It was costing A LOT OF MONEY to do upgrades. A lot. As in: give away weeks of free consulting time on site.
And it was getting to the point where we wanted to either add new features or optimize the code. Some of the existing code used stored procedures, so we had common points where we could manage code, but other areas were this embedded SQL markup in HTML. That was great for getting to market quickly, but with each iteration of new features the cost to test and maintain at least doubled. So when we were looking at pulling out the PHP-type code, putting in data layers (this was 2001-2002, pre any ORMs, etc.) and adding a lot of new features (customer feedback), we looked at how to engineer UPGRADES into the system. That is a big deal, as upgrades cost a lot of money to do correctly. Now, most patterns and all the other stuff people discuss with such energy deal with OO code that is running; but what about the fact that your data has to a) integrate with this logic, and b) that the meaning and the structure of the data can change over time? And, because of the way data works, you often end up with a lot of sub-processes/applications in your client's organisation that need that data: ad hoc or complex custom reporting, as well as batch jobs built for custom data feeds, etc.
With this in mind I started playing with something a bit left of field. It makes a few assumptions: a) data is read far more heavily than it is written; b) updates do happen, but not at bank levels, i.e. maybe one or two a second.
The idea was to apply a COM/interface view to how data was accessed by clients, over a set of CONCRETE tables (which varied with schema changes). You could create a separate view for each type of operation: update, delete, insert, and read. This is important. The views would either map directly to a table, or let you trigger off a dummy table that does the real updates or inserts, etc. What I actually wanted was some sort of trappable level of indirection that could still be used by Crystal Reports, etc. NOTE: for inserts, updates, and deletes you could also use stored procs. And you had a version for each version of the product. That way version 1.0 had its version of the schema, and if the tables changed you would still have the version 1.0 VIEWS, with NEW backend logic mapping to the new tables as needed, plus version 2.0 views supporting the new fields, etc. This was really there to support ad hoc reporting, which, if you're a BUSINESS person rather than a coder, is probably the whole point of the product. (Your product can be crap, but if you have the best reporting in the world you can still win; the reverse is also true: your product can be the best feature-wise, but if it's the worst at reporting you can very easily lose.)
Okay, I hope some of those ideas help.
These are all weighty topics, but here is my recommendation for updating.
You did not specify your platform, but for NAnt build environments I use Tarantino. For every database update you are ready to commit, you make a change script (using Red Gate or another tool). When you build to production, Tarantino checks whether the script has been run on the database (it adds a table to your database to keep track). If not, the script is run. It takes all the manual work (read: human error) out of managing database versions.
I've heard good things about iBATIS 3 Schema Migrations System:
User Guide: http://svn.apache.org/repos/asf/ibatis/java/ibatis-3/trunk/doc/en/iBATIS-3-Migrations.pdf
As Pat said, use Liquibase, especially when you have several developers with their own dev databases making changes that will become part of the production database.
If there's only one dev, as on one project I'm on now (ha), I just commit the schema changes as SQL text files into a CVS repo, which I check out in batches on the production server when the code changes go in.
But Liquibase is better organized than that!
