I am wondering if anyone has performed bulk deletes to clear all data out of a CRM environment? My plan was to create a console app that performed a number of Bulk Deletes. But my initial testing found this to be very slow.
I am asking because we are doing a data migration from an existing .NET system to CRM. I want to clear all the data from CRM so we can re-run and re-test the data migration component.
Has anyone got any suggestions?
FYI, this is using Dynamics CRM Online.
From experience I find that calling the IOrganizationService.Delete() method from an external application is faster than using the BulkDelete operation.
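For example, a minimal sketch of that approach (treat it as illustrative rather than production code; the page size and lack of batching are arbitrary choices):

    // Sketch: retrieve only record ids for one entity type and delete them one by one
    // through IOrganizationService.
    using System;
    using Microsoft.Xrm.Sdk;
    using Microsoft.Xrm.Sdk.Query;

    public static class EntityPurger
    {
        public static void DeleteAll(IOrganizationService service, string entityLogicalName)
        {
            var query = new QueryExpression(entityLogicalName)
            {
                ColumnSet = new ColumnSet(false),                          // ids only, no attribute data
                PageInfo = new PagingInfo { Count = 5000, PageNumber = 1 }
            };

            EntityCollection page;
            do
            {
                // Records are deleted as we go, so we simply re-query the first page each time.
                page = service.RetrieveMultiple(query);
                foreach (var record in page.Entities)
                {
                    service.Delete(entityLogicalName, record.Id);
                }
            } while (page.Entities.Count > 0);
        }
    }

Batching the deletes into ExecuteMultipleRequest batches, or running a few of these loops in parallel, usually speeds it up further, at the cost of hitting the online throttling limits sooner.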
IOrganizationService.Delete() will simply be faster - but not better. Remember that simply removing a record might leave related records unusable, and using the SDK does not do the checks that the bulk delete does for you. My finding is that bulk delete gives you assurance that the data left behind is consistent: it follows the design and prompts you when something isn't allowed or cannot be deleted until something else is removed first.
My suggestion in your case: start by running the bulk delete on all parent records; once those are cleared, move on to the related records.
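If you stay with bulk deletes from your console app, a rough sketch of submitting one job per parent entity through the standard BulkDeleteRequest message could look like this (the "delete everything of this type" query is just an example; narrow it to your migrated data if needed):

    // Sketch: queue a Bulk Delete job for every record of one entity type and
    // return the system job id so the console app can poll for completion.
    using System;
    using Microsoft.Crm.Sdk.Messages;
    using Microsoft.Xrm.Sdk;
    using Microsoft.Xrm.Sdk.Query;

    public static class BulkDeleteHelper
    {
        public static Guid SubmitBulkDelete(IOrganizationService service, string entityLogicalName)
        {
            var request = new BulkDeleteRequest
            {
                JobName = "Purge all " + entityLogicalName + " records",
                QuerySet = new[] { new QueryExpression(entityLogicalName) }, // no criteria = all records
                StartDateTime = DateTime.UtcNow,   // run as soon as the async service picks it up
                RecurrencePattern = string.Empty,  // one-off job
                SendEmailNotification = false,
                ToRecipients = new Guid[0],        // required even when no e-mail is sent
                CCRecipients = new Guid[0]
            };

            var response = (BulkDeleteResponse)service.Execute(request);
            return response.JobId;
        }
    }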
Another way would be to create non production instances and do your test runs there. You can then refresh the non prod instance with a copy of prod for the succeeding test cycles. The time to reset your target organisation will be close to constant, unless your production data increases rapidly. (i.e. heavy transactions).
This is even better if your CRM is not yet live; I read from your post that you want to remove all records in CRM.
We recently had a duplication in uploaded records to one of our REDCap projects, totaling about 3,800 files that now need to be removed. Currently it is taking a team of three about 30 seconds per record to delete due to the size of the project and traffic in REDCap. I have not found any solution to delete multiple records at once from Vanderbilt or the other university resources I typically use. I was wondering if others have found a workaround?
There is an API method for deleting records, so that would be your best bet for batch deletion. The details, with examples for different languages, are in the API and API playground applications in the project sidebar. You must have an API token generated, and your admin might have to approve that before you can use the API. It also inherits your privileges on the project, so you need to have the privilege to delete records in the User Rights application.
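For example, a sketch of calling the record-delete method over HTTP from C# (the parameter names below, content=record, action=delete and records[n], are what the API Playground typically generates; double-check them against your own instance, and consider deleting in chunks rather than all 3,800 at once):

    // Sketch: batch-delete REDCap records through the API using an API token.
    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class RedcapBatchDelete
    {
        public static async Task<string> DeleteRecordsAsync(
            string apiUrl, string token, IReadOnlyList<string> recordIds)
        {
            using (var client = new HttpClient())
            {
                var fields = new List<KeyValuePair<string, string>>
                {
                    new KeyValuePair<string, string>("token", token),
                    new KeyValuePair<string, string>("content", "record"),
                    new KeyValuePair<string, string>("action", "delete")
                };
                for (int i = 0; i < recordIds.Count; i++)
                {
                    fields.Add(new KeyValuePair<string, string>("records[" + i + "]", recordIds[i]));
                }

                var response = await client.PostAsync(apiUrl, new FormUrlEncodedContent(fields));
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync(); // REDCap reports how many records were deleted
            }
        }
    }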
Another method that might be easier and cleaner would be to ditch this particular project and copy the project without its records, then perform the import on the new, empty project.
Consider REDCapR::redcap_delete() if you're using R, or PyCap's delete_records() if you're using Python.
Packages for other languages are listed in REDCap Tools.
I am new to Sitecore and just trying to understand its architecture/design. I am curious how the intranet and internet servers communicate and how data flows between these two layers, both on-prem and in an AWS EC2 environment. I have searched the web and couldn't find an appropriate explanation.
I would really appreciate it if anyone could help me understand.
When you do a publish from CM, it puts a record in the EventQueue table in the web database.
All CD servers poll the EventQueue table for updates and proceed accordingly.
By default this polling happens every 2 seconds.
In short, they communicate via events in the database(s). Note: This is very simplified but seeing it this way helped me understand how the events work and troubleshoot issues.
For example, when publishing an item, the publisher (running on CM or on a dedicated role) reads its data from the master database and writes it to the web database. When done, it raises an event by writing a row in the EventQueue table in the web database. The CD server(s) pick up this event and clear their corresponding caches etc., causing a reload of that data from the web database.
All Sitecore databases have the EventQueue table, and events go to the table in different databases depending on the type of event. An event is basically just a class name and a set of serialized data. Events can be raised "locally" or "globally", indicating whether several instances should pick up the event. Think of a scenario where you have two CD servers sharing one web database; both CDs would have to pick up the event.
To keep track of which events have been processed, an "EQSTAMP" value is stored in the Properties table. It's named [database]_EQSTAMP_[InstanceName]. It's therefore essential that no two Sitecore instances share the same instance name. If not set, Sitecore will make an instance name by combining the hostname and IIS site name. The decimal value of this timestamp corresponds to the hexadecimal Stamp column in the EventQueue table.
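As a tiny illustration of that correspondence (the value below is made up):

    // The EQSTAMP stored in the Properties table is the decimal form of the
    // hexadecimal Stamp column in the EventQueue table.
    using System;

    long eqStampFromProperties = 4653896;                     // example decimal value from Properties
    string stampHex = eqStampFromProperties.ToString("X16");  // "0000000000470348", comparable to EventQueue.Stamp
    Console.WriteLine("Properties value {0} == EventQueue stamp 0x{1}", eqStampFromProperties, stampHex);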
Normally, you should never have to play with these tables yourself, but I find it good to have some insight into how they work and to keep an eye on them. They can grow in size and cause issues. The CleanupEventQueue scheduled task is responsible for removing old processed events from the EventQueue tables. You may want to adjust the scheduling of this agent if your EventQueue grows too large between cleanups.
Note: This is the most common way of communication between the servers. Later versions of Sitecore have other techniques as well, such as Rebus.
The Event Queues. Why? How? When? article explains this in detail; it also describes the pitfalls of using this mechanism in real life.
Please also be aware that the Sitecore.Link project is a good place to get more knowledge about Sitecore functionality.
It aggregates Sitecore knowledge from around the web.
Thanks.
I have asked some questions here and here about the management of state in an ASP.NET MVC3 application. One of the answers mentions that an option for this is to simply store the state of each step in the database.
I was wondering if anyone had any advice on how this is usually achieved as I had some thoughts when this was first suggested to me.
Invalid entities
Consider a multi-step form (wizard) that has 3 steps. I could save each step in the database to maintain state but a user could close the web application midway through the process leaving my database containing entities that are in an invalid state.
To overcome this I could add a field to the table which indicates whether the wizard has been completed. Any inconsistent items could then be reviewed periodically and deleted if required, e.g. any invalid entities found in the database at the end of the day would be automatically removed.
The problem with this is that I have to add fields to the tables to store metadata about the application. Every table that stores information that is entered in a multi-step form needs to have these fields. This seems wrong to me somehow. One solution might be to create a specific table for managing this rather than polluting each entity table with metadata.
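For illustration, the dedicated table I have in mind might look something like this (all names are hypothetical):

    // Hypothetical wizard-progress table: entity tables stay free of metadata,
    // and a scheduled job can purge rows where IsCompleted is false and
    // LastUpdatedUtc is older than, say, a day.
    using System;

    public class WizardStateRecord
    {
        public Guid WizardId { get; set; }             // handed to the browser via cookie or hidden field
        public int CurrentStep { get; set; }
        public string SerializedStepData { get; set; } // e.g. JSON for the steps completed so far
        public bool IsCompleted { get; set; }          // only completed wizards get promoted to real entities
        public DateTime LastUpdatedUtc { get; set; }
    }

The real entities would then only be created when the final step commits.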
Intermediary database
I thought of having a database that sits in between my application and the 'real' database.
The intermediary database would have tables that stored the state information for each 'step' and only when the last step was completed would this information be transferred over to the 'real' database (and the temporary data deleted from the intermediary).
This also sounds similar to one of the session state options offered by ASP.NET already so personally I think this would be a waste of time.
Use in other applications (e.g. desktop)
At the moment my application is purely web-based, but I plan to have desktop programs that interact with the same database. If the database has a load of metadata used by the web application for storing state, my desktop application will need to be aware of this in order to avoid errors (i.e. my desktop application would need to know that it has to mark an entity as 'valid' so that the web application does not delete the entity at the end of the day because it thinks it is invalid).
Summary
So does anyone have any information or tips on how to best use a database for storing application state?
Is the database option that common?
Is it suitable for large applications with a lot of entities?
Are there any performance implications?
Edit
Just to be clear, I am aware that other options exist for managing state in an ASP.NET MVC application (TempData, cache and session) but I am specifically interested in information about using a database to manage state.
Please refrain from down-voting anyone that has mentioned the other options as my original question may not have been clear about this.
Why not store the data in session state? You just need to come up with a mechanism that allows you to uniquely identify and store items in session state.
To start with, you can use InProc session state mode. As the system grows, you can look into storing your session state on a state server or on a SQL server.
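For example, a minimal sketch (controller, models and repository are all hypothetical) of keeping the wizard data in session until the final step:

    using System;
    using System.Web.Mvc;

    // Hypothetical step models and repository, stand-ins for whatever your wizard collects.
    public class Step1Model { public string Name { get; set; } }
    public class Step3Model { public string PaymentReference { get; set; } }
    public class SignupWizardData { public Step1Model Step1 { get; set; } public Step3Model Step3 { get; set; } }
    public interface ISignupRepository { void SaveCompletedSignup(SignupWizardData data); }

    public class SignupWizardController : Controller
    {
        private const string SessionKey = "SignupWizardData";
        private readonly ISignupRepository _repository;

        public SignupWizardController(ISignupRepository repository) { _repository = repository; }

        [HttpPost]
        public ActionResult Step1(Step1Model model)
        {
            if (!ModelState.IsValid) return View(model);

            // Accumulate partial data in session rather than in the database.
            var wizard = (SignupWizardData)Session[SessionKey] ?? new SignupWizardData();
            wizard.Step1 = model;
            Session[SessionKey] = wizard;
            return RedirectToAction("Step2"); // Step2 and Done actions omitted from this sketch
        }

        [HttpPost]
        public ActionResult Finish(Step3Model model)
        {
            var wizard = (SignupWizardData)Session[SessionKey];
            if (wizard == null) return RedirectToAction("Step1"); // session expired, start over

            wizard.Step3 = model;
            _repository.SaveCompletedSignup(wizard); // only now does anything hit the database
            Session.Remove(SessionKey);
            return RedirectToAction("Done");
        }
    }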
This is a hard one to answer, but basically, I'd see two routes.
If the data in a given step in the wizard is logically right, and meets all the constraints you've imposed, you could write it to your "main" database. For instance, if you've got a multi-step process for managing orders, and step one is to create a customer record if one doesn't already exist, write the customer record to the database when the user completes the form.
This means that if the user goes away, closes the browser or whatever, the data will be there when they come back - which is probably what they expect.
If the data in a given step is NOT coherent, does not meet constraints etc., use session state to manage it until it's ready for writing to the database. Session state in MVC is a bit of a pain, and you should use it sparingly - it makes it hard to write unit tests.
The purpose of session state is to store data that is relevant to the user session, but that isn't (yet) intended to go into the database.
I have a website developed with ASP.NET MVC, Entity Framework Code First and SQL Server.
The website has entities that each have a history of statuses that we defined (NEW, PACKED, SHIPPED etc.)
The DB contains a table in which a completely separate system inserts parcel tracking data.
I have to read this tracking data and, following certain business rules, add to the existing status history of my entities.
The best way I can think of is to write an independent Windows service to poll the tracking data every so often and update my entity statuses from that. However, that makes me concerned about DB concurrency issues.
Please could someone advise me on the best strategy for this scenario?
Many thanks
There are different ways to do it. It also depends on the response time you need. If you need to update your system as soon as the tracking system updates a record, then a trigger is the preferred way. The alternative is to schedule a job that runs every 15-30 minutes and syncs the two systems.
As for the concurrency issue, you can use a concurrency token field. Entity Framework has support for this.
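For example, a minimal EF Code First sketch of an optimistic concurrency token (the Shipment entity is just an illustration):

    using System.ComponentModel.DataAnnotations;
    using System.Data.Entity;
    using System.Data.Entity.Infrastructure;

    public class Shipment
    {
        public int Id { get; set; }
        public string CurrentStatus { get; set; }

        [Timestamp]                       // SQL Server rowversion column used as the concurrency token
        public byte[] RowVersion { get; set; }
    }

    public class ShippingContext : DbContext
    {
        public DbSet<Shipment> Shipments { get; set; }
    }

    public static class StatusUpdater
    {
        public static void ApplyStatus(int shipmentId, string newStatus)
        {
            using (var db = new ShippingContext())
            {
                var shipment = db.Shipments.Find(shipmentId);
                shipment.CurrentStatus = newStatus;
                try
                {
                    db.SaveChanges(); // the UPDATE includes the original RowVersion in its WHERE clause
                }
                catch (DbUpdateConcurrencyException)
                {
                    // Someone else changed the row since it was read: reload and retry,
                    // or apply whatever business rule decides which status wins.
                }
            }
        }
    }

With the token in place, both the website and the polling service can write to the same rows, and a conflicting write surfaces as an exception instead of silently overwriting the other side's change.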
This seems to be an overlooked area that could really use some insight. What are your best practices for:
making an upgrade procedure
backing out in case of errors
syncing code and database changes
testing prior to deployment
mechanics of modifying the table
etc...
Liquibase
liquibase.org:
it understands hibernate definitions.
it generates better schema update sql than hibernate
it logs which upgrades have been made to a database
it handles two-step changes (i.e. delete a column "foo" and then rename a different column to "foo")
it handles the concept of conditional upgrades
the developer actually listens to the community (with hibernate, if you are not in the "in" crowd or are a newbie, you are basically ignored)
http://www.liquibase.org
opinion
the application should never handle a schema update. This is a disaster waiting to happen. Data outlasts the applications and as soon as multiple applications try to work with the same data (the production app + a reporting app for example) -- chances are they will both use the same underlying company libraries... and then both programs decide to do their own db upgrade... have fun with that mess.
I am a big fan of Red Gate products that help creating SQL packages to update database schemas. The database scripts can be added to source control to help with versioning and rollback.
In general my rule is: "The application should manage its own schema."
This means schema upgrade scripts are part of any upgrade package for the application and run automatically when the application starts. In case of errors the application fails to start and the upgrade script transaction is not committed. The downside to this is that the application has to have full modification access to the schema (this annoys DBAs).
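As an illustration only (table and script names are invented; this is a sketch of the idea, not a finished upgrade runner), the startup check could look something like this:

    // Sketch: at startup, run every script not yet recorded in a SchemaVersions
    // table inside a transaction; on failure nothing is committed and startup aborts.
    // Assumes a SchemaVersions(ScriptName, AppliedUtc) table already exists.
    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;

    public static class SchemaUpgrader
    {
        public static void Run(string connectionString, IReadOnlyDictionary<string, string> scriptsByName)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                foreach (var script in scriptsByName)
                {
                    if (AlreadyApplied(connection, script.Key)) continue;

                    using (var tx = connection.BeginTransaction())
                    {
                        // Run the upgrade and record it in the same transaction, so a
                        // failure leaves both the schema and the version table untouched.
                        Execute(connection, tx, script.Value);
                        Execute(connection, tx,
                            "INSERT INTO SchemaVersions (ScriptName, AppliedUtc) VALUES (@name, GETUTCDATE())",
                            ("@name", script.Key));
                        tx.Commit();
                    } // any exception propagates and the application refuses to start
                }
            }
        }

        private static bool AlreadyApplied(SqlConnection connection, string scriptName)
        {
            using (var cmd = new SqlCommand(
                "SELECT COUNT(*) FROM SchemaVersions WHERE ScriptName = @name", connection))
            {
                cmd.Parameters.AddWithValue("@name", scriptName);
                return (int)cmd.ExecuteScalar() > 0;
            }
        }

        private static void Execute(SqlConnection connection, SqlTransaction tx, string sql,
            params (string Name, object Value)[] parameters)
        {
            using (var cmd = new SqlCommand(sql, connection, tx))
            {
                foreach (var p in parameters) cmd.Parameters.AddWithValue(p.Name, p.Value);
                cmd.ExecuteNonQuery();
            }
        }
    }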
I've had great success using Hibernate's SchemaUpdate feature to manage the table structures, leaving the upgrade scripts to handle only actual data initialization and the occasional removal of columns (SchemaUpdate doesn't do that).
Regarding testing, since the upgrades are part of the application, testing them becomes part of the test cycle for the application.
Afterthought: Taking on board some of the criticism in other posts here, note the rule says "its own". It only really applies where the application owns the schema, as is generally the case with software sold as a product. If your software is sharing a database with other software, use other methods.
That's a great question. (There is a high chance this is going to end up a normalised versus denormalised database debate, which I am not going to start... okay, now for some input.)
Some off-the-top-of-my-head things I have done (I will add more when I have some more time or need a break):
Client design - this is where the VB approach of inline SQL (even with prepared statements) gets you into trouble. You can spend AGES just finding those statements. If you use something like Hibernate and put as much SQL as possible into named queries, you have a single place for most of the SQL (nothing is worse than trying to test SQL that sits inside some IF statement when your testing never hits the "trigger" criteria for that IF statement). Before using Hibernate (or other ORMs), when I wrote SQL directly against JDBC or ODBC, I would put all the SQL statements either as public fields of an object (with a naming convention) or in a property file (also with a naming convention for the keys, say PREP_STMT_xxxx), and use reflection or iterate over the values at startup in a) test cases and b) application startup. Some RDBMSs allow you to pre-compile prepared statements before execution, so on startup, after login, I would pre-compile the prepared statements to make the application self-testing. Even for hundreds of statements on a good RDBMS that is only a few seconds, and only once. It has saved my butt a lot: on one project the DBAs wouldn't communicate (a different team, in a different country) and the schema seemed to change NIGHTLY, for no reason. And each morning we got a list of exactly where it broke the application, on startup.
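The same self-testing idea, sketched in C#/ADO.NET rather than the JDBC described above (statement names and queries are made up; schema-only execution just asks the server to compile each statement without returning rows):

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Linq;
    using System.Reflection;

    public static class SqlStatements
    {
        public const string PREP_STMT_GET_CUSTOMER = "SELECT Id, Name FROM Customer WHERE Id = 1";
        public const string PREP_STMT_GET_ORDERS   = "SELECT Id, CustomerId, Total FROM OrderHeader";
        // ... hundreds more, all following the naming convention
    }

    public static class StatementSelfTest
    {
        public static void VerifyAll(string connectionString)
        {
            // Collect every constant whose name follows the PREP_STMT_ convention.
            var statements = typeof(SqlStatements)
                .GetFields(BindingFlags.Public | BindingFlags.Static)
                .Where(f => f.Name.StartsWith("PREP_STMT_"))
                .Select(f => (f.Name, Sql: (string)f.GetRawConstantValue()));

            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                foreach (var (name, sql) in statements)
                {
                    try
                    {
                        using (var cmd = new SqlCommand(sql, connection))
                        using (cmd.ExecuteReader(CommandBehavior.SchemaOnly))
                        {
                            // The server compiles the query against the current schema
                            // without returning any data.
                        }
                    }
                    catch (SqlException ex)
                    {
                        throw new InvalidOperationException(
                            "Statement " + name + " no longer matches the schema.", ex);
                    }
                }
            }
        }
    }

Run this from a test case and from application startup, and schema drift shows up immediately with the name of the offending statement.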
If you need ad hoc functionality, put it in a well-named class (again, a naming convention helps with automated testing) that acts as some sort of factory for your query (i.e. it builds the query). You are going to have to write the equivalent code anyway, right? Just put it in a place where you can test it. You can even write some basic test methods on the same object or in a separate class.
If you can, also try to use stored procedures. They are a bit harder to test, as above. Some DBs also don't pre-validate the SQL in stored procs against the schema at compile time, only at run time. A work-around is to take a copy of the schema structure (no data) and create all the stored procs against this copy (in case the DB team making the changes didn't validate correctly); that way the structure can be checked. As a point of change management, though, stored procs are great: on a change, everyone gets it, especially when the DB changes are a result of business process changes, and all languages (Java, VB, etc.) get the change.
I usually also set up a table called system_setting or similar. In this table we keep a VERSION identifier, so that client libraries can connect and validate whether they are valid for this version of the schema. Depending on the changes to your schema, you don't want to allow clients to connect if they could corrupt it (i.e. when you don't have a lot of referential rules in the DB, but on the client). This matters if you are going to have multiple client versions (which does happen in non-web apps, i.e. clients running the wrong binary), and you could also have batch tools etc. Another approach I have used is to define a set of schema-to-operation versions in some sort of property file, or again in a system_info table. This table is loaded on login and then used by each "manager" (I usually have some sort of client-side API to do most DB work) to validate, per operation, that it is the right version. Thus most operations can succeed, but out-of-date methods can also fail (throw an exception) and tell you WHY.
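A small sketch of that version handshake (table and column names are illustrative):

    using System;
    using System.Data.SqlClient;

    public static class SchemaVersionGuard
    {
        private const string ExpectedSchemaVersion = "4.2"; // what this client build was written against

        public static void EnsureCompatible(string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "SELECT SettingValue FROM system_setting WHERE SettingName = 'SCHEMA_VERSION'", connection))
            {
                connection.Open();
                var actual = (string)cmd.ExecuteScalar();
                if (actual != ExpectedSchemaVersion)
                {
                    // Refuse to run rather than risk corrupting data with an out-of-date client.
                    throw new InvalidOperationException(
                        "Client expects schema " + ExpectedSchemaVersion + " but database reports " + actual + ".");
                }
            }
        }
    }

The per-operation variant just swaps the single constant for a map of operation name to required version, loaded at login.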
Managing the change to the schema: do you update the table or add 1-1 relationships to new tables? I have seen a lot of shops that always access data via a view for this reason. This allows table names, columns etc. to change. I have played with the idea of actually treating views like interfaces in COM, i.e. you add a new VIEW for new functionality/versions. Often what gets you here is that you can have a lot of reports (especially end-user custom reports) that assume table formats. The views allow you to deploy a new table format while still supporting existing client apps (remember all those pesky ad hoc reports).
Also, you need to write update and rollback scripts. And again: TEST, TEST, TEST...
------------ OKAY - THIS IS A BIT RANDOM DISCUSSION TIME --------------
I actually had a large commercial project (i.e. a software shop) where we had the same problem. The architecture was two-tier and they were using a product a bit like PHP, but pre-PHP. Same thing, different name. Anyway, I came in at version 2...
It was costing A LOT OF MONEY to do upgrades. A lot, i.e. giving away weeks of free consulting time on site.
And it was getting to the point where we wanted to either add new features or optimize the code. Some of the existing code used stored procedures, so we had common points where we could manage code, but other areas were embedded SQL markup in HTML. That was great for getting to market quickly, but with each iteration of new features the cost of testing and maintenance at least doubled. So when we were looking at pulling the PHP-type code out, putting in data layers (this was 2001-2002, pre any ORMs) and adding a lot of new features (customer feedback), we looked at this issue of how to engineer UPGRADES into the system. Which is a big deal, as upgrades cost a lot of money to do correctly. Now, most patterns and all the other stuff people discuss with a degree of energy deal with OO code that is running, but what about the fact that a) your data has to integrate with this logic, and b) the meaning and also the structure of the data can change over time? And, because of the way data works, you often end up with a lot of sub-processes/applications in your client's organisation that need that data: ad hoc reporting or any complex custom reporting, as well as batch jobs built for custom data feeds, etc.
With this in mind I started playing with something a bit left of field. It has a few assumptions: a) data is read far more heavily than it is written; b) updates do happen, but not at bank-transaction levels, i.e. one or two a second, say.
The idea was to apply a COM/interface view to how data was accessed by clients over a set of CONCRETE tables (which varied with schema changes). You could create a separate view for each type of operation - update, delete, insert and read. This is important. The views would either map directly to a table, or allow you to trigger off a dummy table that does the real updates or inserts etc. What I actually wanted was some sort of trappable level of indirection that could still be used by Crystal Reports etc. NOTE: for inserts, updates and deletes you could also use stored procs. And you had a version for each version of the product. That way version 1.0 had its version of the schema, and if the tables changed, you would still have the version 1.0 VIEWS but with NEW back-end logic mapping to the new tables as needed, while the version 2.0 views supported new fields etc. This was really just to support ad hoc reporting, which, if you're a BUSINESS person and not a coder, is probably the whole point of why you have the product. (Your product can be crap, but if you have the best reporting in the world you can still win; the reverse is true too - your product can be the best feature-wise, but if it's the worst at reporting you can very easily lose.)
Okay, hope some of those ideas help.
These are all weighty topics, but here is my recommendation for updating.
You did not specify your platform, but for NAnt build environments I use Tarantino. For every database update you are ready to commit, you make a change script (using Red Gate or another tool). When you build to production, Tarantino checks whether the script has been run on the database (it adds a table to your database to keep track). If not, the script is run. It takes all the manual work (read: human error) out of managing database versions.
I've heard good things about iBATIS 3 Schema Migrations System:
User Guide: http://svn.apache.org/repos/asf/ibatis/java/ibatis-3/trunk/doc/en/iBATIS-3-Migrations.pdf
As Pat said, use Liquibase, especially when you have several developers with their own dev databases making changes that will become part of the production database.
If there's only one dev, as on one project I'm on now (ha), I just commit the schema changes as SQL text files into a CVS repo, which I check out in batches on the production server when the code changes go in.
But Liquibase is better organized than that!