Related
I'm in charge of building a complex system for my company, and after some research decided that Camunda fits most of my requirements. But some of my requirements are not common, and after reading the user guide I realized there are many ways of doing the same thing, so I hope this question will clarify my thoughts and also will serve as a base questión for everyone else looking for building something similar.
First of all, I'm planning to build a specific App on top of Camunda BPM. It will use workflow and BPM, but not necessarily all the stuff BPM/Camunda provides. This means it is not in my plans to use mostly of the web apps that came bundled with Camunda (tasks, modeler...), at least not for end users. And to make things more complicated it must support multiple tenants... dynamically.
So, I will try to specify all of my requirements and then hopefully someone with more experience than me could explain which is the best architecture/solution to make this work.
Here we go:
Single App built on top of Camunda BPM
High-performance
Workload (10k new process instances/day after few months).
Users (starting with 1k, expected to be ~ 50k).
Multiple tenants (starting with 10, expected to be ~ 1k)
Tenants dynamically managed (creation, deploy of process definitions)
It will be deployed on cluster
PostgreSQL
WildFly 8.1 preferably
After some research, this are my thoughts
One Process Application
One Process Engine per tenant
Multi tenancy data isolation: schema or table level.
Clustering (2 nodes) at first for high availability, and adding more nodes when amount of tenants and workload start to rise.
Doubts
Should I let camunda manage my users/groups, or better manage this on my app? In this case, can I say to Camunda “User X completed Task Y”, even if camunda does not know about the existence of user X?
What about dynamic multi tenancy? Is it possible to create tenants on the fly and make those tenants persist over time even after restarting the application server? What about re-deployment of processes after restarting?
After which point should I think on partitioning of engines on nodes? It’s hard to figure out how I’m going to do this with dynamic multi tenancy, but moreover... Is this the right way to deal with high workload and growing number of tenants?
With my setup of just one process application, should I take care of something else in a cluster environment?
I'm not ruling out using only one tenant, one process engine and handle everything related to tenants logically within my app, but I understand that this can be very (VERY!) cumbersome.
All answers are welcome, hopefully we'll achieve a good approach to this problem.
1. Should I let camunda manage my users/groups, or better manage this on my app? In this case, can I say to Camunda “User X completed Task Y”, even if camunda does not know about the existence of user X?
Yes, you can choose your app to manage the users and tell Camunda that a task is completed by a user whom Camunda doesn't know about. And same way, you can make Camunda to assign task to users which it doesn't know at all. This is done by implementing their org.camunda.bpm.engine.impl.identity.ReadOnlyIdentityProvider interface and let the configuration know about your implementation.
PS: If you doesn't need all the application that comes with Camunda, I would even suggest you to embed the Camunda engine in your app. It can be done easily and they have good documentation for thier java APIs. And it is easily achievable.
2. What about dynamic multi tenancy? Is it possible to create tenants on the fly and make those tenants persist over time even after restarting the application server? What about re-deployment of processes after restarting?
Yes. Its possible to dynamically add Tenants. While restarting the engine or your application, you can either choose to redeploy / or just use the existing deployed processes. Even when you redeploy a process, if you want Camunda to create a new version of the process only if there is a change in the process, that's also possible. See enableDuplicateFiltering property for their DeploymentBuilder.
3. After which point should I think on partitioning of engines on nodes? It’s hard to figure out how I’m going to do this with dynamic multi tenancy, but moreover... Is this the right way to deal with high workload and growing number of tenants?
In my experience, it is possible. You need to keep track of various parameters here, like memory, number of requests being served, number of open connections available etc., then accordingly add more or remove nodes. With AWS, this will be much easier as they have some of these tools already available for dynamic scaling in / out nodes. But that said, I have done this only with Camunda as embedded engine application(s).
The following questions were posed by a customer who is about to write a large scale XPages application. While I think the questions are actually to broad to fit stackoverflow style, they are interesting and the collective knowledge of the experts here could yield better results than one person answering them:
How many concurent users can use XPages applications on 1 Lotus Domino server (There are several applications on Lotus Domino server. Not one)?
How can we define and analyze memory leaks on Lotus Domino server, when run XPages application?
How can we write XPages the right way for achiving the best performance and avoding memory leaks?
What code methods and objects should not be used?
What are typical errors when the Lotus Script developer begins to write the code for XPages? What are the best practises?
How can we build centralized, consolidated application on XPages for 10000 - 15000 users? How many servers we need? How to configure XPages application in that case?
How to balace users?
I will provide my insights, please share yours
How long is a string? It depends on how the server is configured. And "application" could be a single form or hundreds. Only a test can tell. In general: build a high performance server preferably with 64Bit architecture and lots of RAM. Make that RAM available for the JVM. If the applications use attachments, use DAOS, put it on a separate disk - and of course make sure you have the latest version of Domino (8.5.3FP1 at time of this writing)
There is the XPages Toolbox that includes a memory and CPU profiler.
It depends on the type of application. Clever use of the scopes for caching, Expression Language and beans instead of SSJS. You leak memory whey you forget .recycle. Hire an experienced lead developer and read the book also the other one and two. Consider threading off longer running code, so users don't need to wait.
Depends on your needs. The general lessons of Domino development apply when it comes to db operations, so FTSearch over DBSearch, scope usage over #DBColumn for parameters. EL over SSJS.
Typical errors include: all code in the XPages -> use script libraries. Too much #dblookup, #dbcolumn instead of scope. Validation in buttons instead of validators. Violation of decomposition principles. Forgetting to use .recycle(). Designing applications "like old Notes screens" instead of single page interaction. Too little use of partial refresh. No use of caching. Too little object orientation (crating function graves in script libraries).
This is a summary of question 1-5, nothing new to answer
When clustering Domino servers for XPages and putting a load balancer in front, the load balancer needs to be configured to keep a session on the same server, so partial refreshes and Ajax calls reach the server that has the component tree rendered for that user.
It depends on the server setup, I have i.e XPage extranet with 12000 registered users spanning over aprox 20 XPage applications. That runs on 1 Windows 2003 server with 4GB Ram and quad core cpu. Data aount is about 60GB over these 20 applications. No Daos, no beans just SSJS. Performance is excellent. So when I upgrade this installation to 64 bit and Daos the application will scale when more. So 64Bit and Lots of Ram is the key to alot of users.
I haven't done anything around this
Make sure to recyle when you do document loops, Use the openntf.org debug toolbar it will save alot of time before we have a debugger for XPages.
Always think when you are doing things this will be done by several users, so try to cut down number of lookup or getElementByKey. Try to use ViewNavigator when you can.
It all depends on how many users that uses the system concurrent. If you have 10000 - 15000 users concurrent then you have to look at what the applications does and how many users will use the same application at the same time.
Thats my insights into the question
Recently I had a project in which I had to get some data from particular software system to a portlet. The software used a database, and I spent a fair bit of time modeling the data I wanted and then creating a web service so that my portlet could grab the information.
Then it suddenly struck me that I was wasting my time. I grabbed BIRT, tossed it into a portlet, and then just wrote some reports that directly grabbed the necessary data from the database. I was done in an afternoon.
I understand that reporting is a one way street, but this got me thinking. Reporting tools can be very effective for creating reports (duh) from your actual data, but when you're doing this you're bypassing your model which except in simple cases is not a direct representation of your data as it exists in your database.
If you're writing a data-intensive application and require the ability to perform non-trivial reporting, do you bypass your application and use something like BIRT or Crystal Reports? How do you manage these tools as part of your overall process? Do you consider the reports you write as being part of your application and treat them as such? A report is a view and a model and a controller (if you will) all in one big mess, how do you deal with and interpret and plan for that?
Revised question: it's possible and even common that a report will perform some business calculations that in a perfect world you would like to have contained in your application. This can lead to a mismatch of information given back to the user. On the other hand, reporting tools make it so easy to gather and display information that it's hard to take a purist's approach and do everything from within the application. Are there any good techniques for ensuring that the data in your reports matches the data that you might be showing in the regular GUI?
I see reporting as simply another view on the data, not a view/model/controller in one (well, maybe a view and controller in one).
We have our reports (built in sql 2008 reporting services) consume a service in our application layer to get data (keeping with our standard, that data access is in a repository). These functions could do a simple query or handle very complex processing that would be a nightmare in your reporting evironment or a stored procedure. In practice, we find this takes no longer than coding up some one-off stored procedure that will, as your system grows and grows, become a nightmare to maintain.
Treating reporting as simply a one-off or not integrating into your application design is a huge mistake.
Reporting is crucial. Reporting is mostly crucial to share values collected in one system to external users, e.g. users not directly using the system (eg management for sales figures). So reporting is a lot more than just displaying facts and figures and is something central to almost every system that drives a commercial.
At least the more advanced systems allow you to enhance them: with your own reusable "controls". Even a way back can be implemented - if you just use the correct plugins. Once I wrote a system to send emails out of a report, because the system did not allow for change. It worked - though it was not meant to be used that way ;)
Reports make a good part of the application, and you gain a lot freedom if you make reports changeable for your customers. Sometimes you come up with more possibilities than you thought of when you built the system in the first place.
So yes, for me reporting is part of the system.
Reports are part of your app but because they are generally something a user will have strong ideas about than, say, your data capture UI, I'd sacrifice purity for convenience/speed of delivery and get back to "real" coding... :-)
As soon as you've done a report, users want another one or change the colour or optional grouping or more filtering or... something that takes you away from whizzier stuff... so I don't bust a gut maintaining purity.
This is a fine line indeed. You don't want to spend too much time building reports (that users want you to change all the time anyway) but you don't want to duplicate logic by putting business logic into your reports! With our reporting products at Data Dynamimcs I think we have reached a happy medium between these two tradeoffs.
By using the ObjectDataProvider (see links below for more info) you can bind the report directly to business objects (plain old objects) so you don't have to bypass your business layer for getting data. At the same time we provide a way to reference and use functions from other libraries in your report. This way if you have some code configured already to do some business logic calculations you can reuse those functions directly within your report. You can see an example of this in the links below too.
Binding to Objects for your Data (see "Object Provider" section): http://www.datadynamics.com/Help/ddReports/ddrconDataSetAndObjectDataSource.html
Adding Custom Code to your reports Walkthrough: http://www.datadynamics.com/Help/ddReports/ddrwlkCustomCode.html
Using Custom Assemblies (referencing shared libraries/dlls from your report): http://www.datadynamics.com/Help/ddReports/ddrconCustomCode.html, and http://www.datadynamics.com/Help/ddReports/ddrtskCreatingAnInstanceMethod.html
Scott Willeke
Data Dynamics / GrapeCity
The way I've always worked with reports is to consider part reports as part of the code-base, and stored in the source along with the application. In some contexts, reports are more important than the application, in that management makes business decisions off of report data, having the wrong information can cause them to cancel a product line, cancel a campaign, or fire a sales person. Obviously, this depends highly on your management and your application.
Regarding keeping your model consistent, this is a bit trickier question. One way to ensure consistent model between reports and your application is to use stored procedures (or views) to retrieve data, depending on your application's architecture.
What parts of your application are not coded?
I think one of the most obvious examples would be DB credentials - it's considered bad to have them hard coded. And in most of situations it is easy to decide if you want something to be externalized or coded.For me the rules are simple. Some part of the application should be externalized if:
it can and should be changed by non-developer, but not so often to be included in application settings defined in UI (DB credentials, service URLs, etc)
it does not require programming language and seems unnatural being coded (localization)
Do you have anything to add?
This is a little related to this question about spring cfg.
Spring configuration seems less obvious example for me, because in my practice it is never modified by anyone except the developer. And the road of externalizing can take you far away, to the entire project being "configured", not coded - so where to stop?
So please post here some examples from your experience, when you got benefit from having something configured, not coded - like dependency injection configuration in spring, etc.
And if you use spring - how often is configuration changed without recompiling?
Anything that needs to differ between different deployments of your application. That is, anything specific to the environment.
Examples include:
Database connection strings
URLs for web or WCF services
Logging configuration
Any information your application uses that is "data" and that could change depending on where it is installed. Things like:
smtp mail server used to send e-mails
Database connect strings
Paths to file locations / folders used by the app
FTP servers & connect info
Active Directory servers used for authentication
Any links displayed in the application to external information
sources
Warning limit values
I've even put the RegEx filters used to limit the allowable characters
for data entry fields.
Besides the obvious changing stuff (paths, servers, ports, and so on), some people argue that you should be able to easily change whatever might reasonably change, for instance, say you have a generic engine which operates on the business logic (a rule engine).
You would then define the rules on a "configuration file" which ends up being is no less than programming in a DSL instead of in the generic purpose language. Benefits being it's closer to the domain so it's easier and more maintainable, and that you can easily change things that otherwise would demand a new build.
The main argument behind this is that things you assumed would never change always end up changing nonetheless, so you better be prepared.
paths and server names/addresses come to mind..
I agree with your two conditions, which is why I:
Rarely include a config file as part of a Windows or Windows Mobile application (web apps yes).
If I did include a config file meant to be tweaked by end users, it certainly wouldn't be XML.
Employee emails/names since employees can come and go... (you should typically try to keep them out of an application though)
Configuration files should include:
deployment details
DB credentials
file paths
host names
anything that is used in many places but that may change
contact email addresses
options that aren't in the GUI
The last one is a bit open-ended, but very important. I've found it very useful to foresee variables that the client may, in the future, want to change. If changes are infrequent, I or they can edit the config file. If it becomes a frequent thing, it's trivial to add the option to the GUI, which isn't hardcoded.
I would also add encryption keys (which themselves should be encrypted)...
Basically the rule of thumb is information the application needs BEFORE it's regular, functional operation, data that it MUST have on-hand (i.e. local and not networked).
Note that this data should not be dynamically changing or large amounts of it, otherwise it should be in the database.
With Spring apps I actually distinguish between two types of configuration:
Items externalized into property files which are "deploy time" concerns or "environment-specific": server IP's / addresses, file system locations, etc etc
Spring XML configuration which can do lots of things, like indicate the overall application structure, apply behavior via AOP, etc.
I use Spring to wire all the beans in a J2SE application that has no GUI (a transactional switch). That way it's very easy for me to have different configurations in each deployment (we have this thing running in different countries), without having to code anything different.
Another thing I like to have is to manage all the SQL statements separately from the code, when I use plain JDBC (or Spring JDBC). Like in a properties file or XML or something, sometimes even as String properties in the beans that will use the statement (when there is only one bean that will use the statement, such as a DAO).
I am going to use spring JDBC or vanilla JDBC for data persistence, here we have decided to externalize all the SQL from the Java code, so can be better mangable in terms of SQL query tuning and optimization, we don't need to disturb the java code.
This seems to be an overlooked area that could really use some insight. What are your best practices for:
making an upgrade procedure
backing out in case of errors
syncing code and database changes
testing prior to deployment
mechanics of modifying the table
etc...
Liquibase
liquibase.org:
it understands hibernate definitions.
it generates better schema update sql than hibernate
it logs which upgrades have been made to a database
it handles two-step changes (i.e. delete a column "foo" and then rename a different column to "foo")
it handles the concept of conditional upgrades
the developer actually listens to the community (with hibernate if you are not in the "in" crowd or a newbie -- you are basically ignored.)
http://www.liquibase.org
opinion
the application should never handle a schema update. This is a disaster waiting to happen. Data outlasts the applications and as soon as multiple applications try to work with the same data ( the production app + a reporting app for example) -- chances are they will both use the same underlying company libraries... and then both programs decide to do their own db upgrade ... have fun with that mess.
I am a big fan of Red Gate products that help creating SQL packages to update database schemas. The database scripts can be added to source control to help with versioning and rollback.
In general my rule is: "The application should manage it's own schema."
This means schema upgrade scripts are part of any upgrade package for the application and run automatically when the application starts. In case of errors the application fails to start and the upgrade script transaction is not committed. The downside to this is that the application has to have full modification access to the schema (this annoys DBAs).
I've had great success using Hibernates SchemaUpdate feature to manage the table structures. Leaving the upgrade scripts to only handle actual data initialization and occasional removing of columns (SchemaUpdate doesn't do that).
Regarding testing, since the upgrades are part of the application, testing them becomes part of the test cycle for the application.
Afterthought: Taking on board some of the criticism in other posts here, note the rule says "it's own". It only really applies where the application owns the schema as is generally the case with software sold as a product. If your software is sharing a database with other software, use other methods.
That's a great question. ( There is a high chance this is going to end up a normalised versus denormalised database debate..which I am not going to start... okay now for some input.)
some off the top of my head things I have done (will add more when I have some more time or need a break)
client design - this is where the VB method of inline sql (even with prepared statements) gets you into trouble. You can spend AGES just finding those statements. If you use something like Hibernate and put as much SQL into named queries you have a single place for most of the sql (nothing worse than trying to test sql that is inside of some IF statement and you just don't hit the "trigger" criteria in your testing for that IF statement). Prior to using hibernate (or other orms') when I would do SQL directly in JDBC or ODBC I would put all the sql statements as either public fields of an object (with a naming convention) or in a property file (also with a naming convention for the values say PREP_STMT_xxxx. And use either reflection or iterate over the values at startup in a) test cases b) startup of the application (some rdbms allow you to pre-compile with prepared statements before execution, so on startup post login I would pre-compile the prep-stmts at startup to make the application self testing. Even for 100's of statements on a good rdbms thats only a few seconds. and only once. And it has saved my butt a lot. On one project the DBA's wouldn't communicate (a different team, in a different country) and the schema seemed to change NIGHTLY, for no reason. And each morning we got a list of exactly where it broke the application, on startup.
If you need adhoc functionality , put it in a well named class (ie. again a naming convention helps with auto mated testing) that acts as some sort of factory for you query (ie. it builds the query). You are going to have to write the equivalent code anyway right, just put in a place you can test it. You can even write some basic test methods on the same object or in a separate class.
If you can , also try to use stored procedures. They are a bit harder to test as above. Some db's also don't pre-validate the sql in stored procs against the schema at compile time only at run time. It usually involves say taking a copy of the schema structure (no data) and then creating all stored procs against this copy (in case the db team making the changes DIDn't validate correctly). Thus the structure can be checked. but as a point of change management stored procs are great. On change all get it. Especially when the db changes are a result of business process changes. And all languages (java, vb, etc get the change )
I usually also setup a table I use called system_setting etc. In this table we keep a VERSION identifier. This is so that client libraries can connection and validate if they are valid for this version of the schema. Depending on the changes to your schema, you don't want to allow clients to connect if they can corrupt your schema (ie. you don't have a lot of referential rules in the db, but on the client). It depends if you are also going to have multiple client versions (which does happen in NON - web apps, ie. they are running the wrong binary). You could also have batch tools etc. Another approach which I have also done is define a set of schema to operation versions in some sort of property file or again in a system_info table. This table is loaded on login, and then used by each "manager" (I usually have some sort of client side api to do most db stuff) to validate for that operation if it is the right version. Thus most operations can succeed, but you can also fail (throw some exception) on out of date methods and tells you WHY.
managing the change to schema -> do you update the table or add 1-1 relationships to new tables ? I have seen a lot of shops which always access data via a view for this reason. This allows table names to change , columns etc. I have played with the idea of actually treating views like interfaces in COM. ie. you add a new VIEW for new functionality / versions. Often, what gets you here is that you can have a lot of reports (especially end user custom reports) that assume table formats. The views allow you to deploy a new table format but support existing client apps (remember all those pesky adhoc reports).
Also, need to write update and rollback scripts. and again TEST, TEST, TEST...
------------ OKAY - THIS IS A BIT RANDOM DISCUSSION TIME --------------
Actually had a large commercial project (ie. software shop) where we had the same problem. The architecture was a 2 tier and they were using a product a bit like PHP but pre-php. Same thing. different name. anyway i came in in version 2....
It was costing A LOT OF MONEY to do upgrades. A lot. ie. give away weeks of free consulting time on site.
And it was getting to the point of wanting to either add new features or optimize the code. Some of the existing code used stored procedures , so we had common points where we could manage code. but other areas were this embedded sql markup in html. Which was great for getting to market quickly but with each interaction of new features the cost at least doubled to test and maintain. So when we were looking at pulling out the php type code out, putting in data layers (this was 2001-2002, pre any ORM's etc) and adding a lot of new features (customer feedback) looked at this issue of how to engineer UPGRADES into the system. Which is a big deal, as upgrades cost a lot of money to do correctly. Now, most patterns and all the other stuff people discuss with a degree of energy deals with OO code that is running, but what about the fact that your data has to a) integrate to this logic, b) the meaning and also the structure of the data can change over time, and often due to the way data works you end up with a lot of sub process / applications in your clients organisation that needs that data -> ad hoc reporting or any complex custom reporting, as well as batch jobs that have been done for custom data feeds etc.
With this in mind i started playing with something a bit left of field. It also has a few assumptions. a) data is heavily read more than write. b) updates do happen, but not at bank levels ie. one or 2 a second say.
The idea was to apply a COM / Interface view to how data was accessed by clients over a set of CONCRETE tables (which varied with schema changes). You could create a seperate view for each type operation - update, delete, insert and read. This is important. The views would either map directly to a table , or allow you to trigger of a dummy table that does the real updates or inserts etc. What i actually wanted was some sort of trappable level indirection that could still be used by crystal reports etc. NOTE - For inserts , update and deletes you could also use stored procs. And you had a version for each version of the product. That way your version 1.0 had its version of the schema, and if the tables changed, you would still have the version 1.0 VIEWS but with NEW backend logic to map to the new tables as needed, but you also had version 2.0 views that would support new fields etc. This was really just to support ad hoc reporting, which if your a BUSINESS person and not a coder is probably the whole point of why you have the product. (your product can be crap but if you have the best reporting in the world you can still win, the reverse is true - your product can be the best feature wise, but if its the worse on reporting you can very easily loose).
okay, hope some of those ideas help.
These are all weighty topics, but here is my recommendation for updating.
You did not specify your platform, but for NANT build environments I use Tarantino. For every database update you are ready to commit, you make a change script (using RedGate or another tool). When you build to production, Tarantino checks if the script has been run on the database (it adds a table to your database to keep track). If not, the script is run. It takes all the manual work (read: human error) out of managing database versions.
I've heard good things about iBATIS 3 Schema Migrations System:
User Guide: http://svn.apache.org/repos/asf/ibatis/java/ibatis-3/trunk/doc/en/iBATIS-3-Migrations.pdf
As Pat said, use liquibase. Especially when you have several developers with their own dev databases
making changes that will become part of the production database.
If there's only one dev, as on one project I'm on now(ha), I just commit the schema changes as SQL text files into a CVS repo, which I check out in batches on the production server when the code changes go in.
But liquibase is better organized than that!