Practices for allowing systems to accommodate human error? - validation

Systems sometimes have to accommodate the possibility of bad real-world data. Consider that some data originates on paper forms, and forms inherently have only limited means of validating data.
Example 1: On one form users are expected to enter an integer distance (in miles) into a blank. We capture the information as written, as a string, since we don't always get integer values.
Example 2: On another form we capture a code. That code should map to one of the codes in our system. However, sometimes the code written on the form is incorrect. We capture the code and allow it to exist with an invalid value until some future time of resolution. That is, we temporarily allow bad data, since it's important to capture the record even if part of it is invalid.
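To make Example 1 concrete, here is a minimal sketch (in Python, with illustrative names) of a record type that keeps the value exactly as written, alongside the parsed value and any validation notes:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class CapturedDistance:
        raw_value: str                # what was actually written on the form
        miles: Optional[int] = None   # populated only when the raw value parses cleanly
        issues: list = field(default_factory=list)

    def capture_distance(raw: str) -> CapturedDistance:
        rec = CapturedDistance(raw_value=raw.strip())
        try:
            rec.miles = int(rec.raw_value)
        except ValueError:
            rec.issues.append(f"not an integer: {rec.raw_value!r}")
        return rec

    capture_distance("12")        # parses cleanly: miles=12, no issues
    capture_distance("ten-ish")   # kept as written, flagged for later resolution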
I'm interested in learning more about how systems accommodate bad data, that is, human error. Databases are supposed to be bastions of data integrity, but the real world is messy and people make mistakes. Systems must allow us to reflect those mistakes.
What are some ways systems you've developed accommodate human error? What practices have you used? What lessons have you learned?
Any further reading on the topic? (I had trouble Googling it.)

I agree with you, whatever we do there's no guarantee that we can get rid of bad or incorrect data. Especially, but not only, when it comes to user input. In my experience the same problems exist in complex integration projects, where you have to integrate and merge (often inconsistent) data retrieved from different systems.
A good strategy is to decouple the input from the operational system itself. First, place user-provided (or external-system-provided) data in a separate datastore (e.g. a different schema). In a second step, load this data into your operational datastore, but only if it conforms to strict rules (e.g. use address verification software to verify a given address). This Extract, Transform, Load (ETL) approach is fairly common in Data Warehousing (DWH) solutions, but in my experience it can be applied programmatically in transactional systems as well.
The above approach often leads to asynchronous processes in which the input is submitted first and (maybe) at a later time the external entity (user or system) retrieves feedback on whether its data was correct or not.
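As an illustration of that two-step load, here is a rough Python sketch (the table structures and the validate() rules are hypothetical): rows that fail validation stay in staging, flagged, and the feedback list is what would be reported back asynchronously.

    # Staging and operational stores are modelled as lists of dicts here;
    # in practice they would be separate schemas or tables.
    def promote_staged_rows(staging, operational, validate):
        feedback = []
        for row in list(staging):
            errors = validate(row)    # strict rules, e.g. address verification
            if errors:
                row["status"] = "rejected"           # stays in staging for resolution
                feedback.append((row["id"], errors))
            else:
                operational.append(row)              # promoted to the operational store
                staging.remove(row)
                feedback.append((row["id"], []))
        return feedback  # delivered to the submitter at some later time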
EDIT: For further reading I recommend having a look at DWH concepts. Although you may not want to build such a thing, you could partially apply those concepts:
http://en.wikipedia.org/wiki/Extract,_transform,_load
http://en.wikipedia.org/wiki/Data_warehouse
http://en.wikipedia.org/wiki/Data_cleansing

A government department I worked in does a lot of surveys, most of which are (were) still paper-based.
All the results were OCR'd into the system.
As part of the OCR process a digital scan of the forms is kept.
Data is then validated; data that is undecipherable or which fails validation is flagged.
When a human operator reviews the digital data they can modify it if they are confident that they can correctly interpret what the code could not; they (here's the cool bit) can also bring up the scan of the paper-based original and use that to determine what the user was trying to say.
On a different note: at some point you want to validate the data coming in against the expected ranges you want it to conform to. By rejecting it at the point of entry you give the user a chance to correct it; the trade-off is that every time you reject it you increase the chance of them abandoning the whole process.
At some point in your system you need to specify the rules that will be used for validation. At the end of the day a system is only going to be as smart as those rules. You can build these rules into the code yourself (probably in the business logic) or you might use a third-party component.
Having flexible control over the validation is pretty important, as the rules are likely to change over time.
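One way to keep that control flexible is to express the rules as data rather than hard-coding them, so they can change without touching the surrounding code. A minimal sketch, with invented field and rule names:

    # Each field maps to a list of (check, message) pairs.
    RULES = {
        "distance_miles": [
            (lambda v: v.isdigit(), "must be a whole number"),
            (lambda v: not v.isdigit() or 0 < int(v) < 5000, "out of range"),
        ],
    }

    def validate(field_name, value):
        checks = RULES.get(field_name, [])
        return [msg for check, msg in checks if not check(value)]
        # empty list = passed; otherwise flag the record for review

In a real system the rule definitions could live in a database or configuration file so they can be adjusted without a redeploy.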

To be honest with you, one point of migrating from paper-based systems to IT is to remove these errors and make sure all data is always correct. I doubt any correctly planned and developed IT system (especially business financial systems) would allow such errors. Not in the company I am working for anyway...

There are lots of software tools that address the kinds of problems you mention. There are platforms and tools that let you define rules for scrubbing and transforming data and handling validation errors. Those techniques are widely used for Data Integration and Business Intelligence applications. Google for "Data Quality" or "Data Integration".

The easiest thing to do (though it is not always possible) is to design the interface where users enter the data so as to limit, as much as possible, the amount of free text they need to enter. In my experience this seems to be where a lot of problems come from. One simple example is to provide a select or auto-completing select field.
One thing that you can do is everything possible to determine whether the data is correct before it goes into the DB. I try to give the user entering the data as much feedback as possible so they can (ideally) fix some of the issues before the data gets persisted. For example, it is a very quick check to determine whether the data being entered is of the correct type.
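For instance, a type check of that kind can be as small as this (a sketch; the field names are made up):

    # Compare each submitted value against its expected type and report
    # mismatches before anything is persisted.
    def check_types(record, expected):
        problems = {}
        for name, typ in expected.items():
            if name in record and not isinstance(record[name], typ):
                problems[name] = f"expected {typ.__name__}, got {type(record[name]).__name__}"
        return problems

    check_types({"age": "forty"}, {"age": int})
    # -> {'age': 'expected int, got str'}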

I got started in legal systems before the PC era. Litigation support databases routinely have to accommodate factually incorrect, incomplete, and contradictory information. It takes a different way of thinking.
The short version . . .
Instead of recording a single fact, you record multiple assertions about a fact. It boils down to designing a database to store data from assertions like these.
In an interview at 2011-01-03 08:13, Neil Rimes told Officer Cane that he was at home from 2011-01-02 20:00 until 2011-01-03 08:13.
In an interview at 2011-01-03 08:25, Liza Nevers told Officer Cane that Neil Rimes came home at 2011-01-02 23:45.
In a deposition at 2011-05-13 10:22, Cody Maxon told attorney Kurt Schlagel that he saw Neil Rimes at Kroger at 2011-01-03 03:00.
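A minimal sketch of that assertion model (in Python, with illustrative field names): each statement records who asserted what, to whom, when, and in what context, so contradictory assertions can coexist in the database.

    from dataclasses import dataclass

    @dataclass
    class Assertion:
        source: str        # who made the statement, e.g. "Neil Rimes"
        recipient: str     # who it was made to, e.g. "Officer Cane"
        context: str       # "interview", "deposition", ...
        asserted_at: str   # when the statement itself was made (ISO 8601)
        claim: str         # the content, which may contradict other assertions

    timeline = [
        Assertion("Neil Rimes", "Officer Cane", "interview",
                  "2011-01-03T08:13", "at home from 2011-01-02 20:00 until 2011-01-03 08:13"),
        Assertion("Liza Nevers", "Officer Cane", "interview",
                  "2011-01-03T08:25", "Neil Rimes came home at 2011-01-02 23:45"),
        Assertion("Cody Maxon", "attorney Kurt Schlagel", "deposition",
                  "2011-05-13T10:22", "saw Neil Rimes at Kroger at 2011-01-03 03:00"),
    ]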

Related

Event model or schema in event store

Events in an event store (event sourcing) are most often persisted in a serialized format, with versions to represent a change in the model or schema for an event type. I haven't been able to find good documentation showing the actual model or schema for an actual event (often a data table in the event store schema, if using an RDBMS), but I understand that ideally it should be generic.
What are the most basic fields/properties that should exist in an event?
I've contemplated using json-api as a specification for my events but perhaps that's too "heavy". The benefits I see are flexibility and maturity.
Am I heading down the "wrong path"?
Any well defined examples would be greatly appreciated.
Don't overlook forward and backward compatibility.
You should plan to review Greg Young's book on event versioning; it doesn't directly answer your question, but it does cover a lot about the basics of interpreting an event.
Short answer: pretty much everything is optional, because you need to be able to change it later.
You should also review Hohpe's Enterprise Integration Patterns, in particular his work on messaging, which details a lot of cases you may care about.
de Graauw's Nobody Needs Reliable Messaging helped me understand an important point.
To summarize: if reliability is important on the business level, do it on the business level.
So while there are some interesting bits of meta data tracking that you may want to do, the domain model is really only going to look at the data; and that is going to tend to be specific to your domain.
You also have the fun that the representation of events that you use in the service that produces them may not match the representation that it shares with other services, and in particular may not be the same message that gets broadcast.
I worked through an exercise trying to figure out the minimum amount of information necessary for a subscriber to look at an event and understand whether it cares. My answers were an id (have I seen this specific event before?), a token that tells you the semantic meaning of the message (is that something I care about?), and a location (URI) to get a richer representation if it is something I care about.
But outside of the domain -- for example, when you are looking at the system as a whole trying to figure out what is going on, having correlation identifiers and causation identifiers, time stamps, signatures of the source location, and so on stored in a consistent location in the meta data can be a big help.
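Putting those pieces together, a sketch of such an envelope might look like this (Python; the field names are illustrative, not a standard, and the domain payload lives behind the URI):

    import uuid
    from datetime import datetime, timezone

    def make_envelope(event_type, payload_uri, correlation_id=None, causation_id=None):
        return {
            "id": str(uuid.uuid4()),          # have I seen this specific event before?
            "type": event_type,               # semantic token: is this something I care about?
            "uri": payload_uri,               # where to fetch a richer representation
            "occurred_at": datetime.now(timezone.utc).isoformat(),
            "correlation_id": correlation_id, # ties related events together
            "causation_id": causation_id,     # the event/command that caused this one
        }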
Just modelling with basic types that map to JSON, written as you would for an API, can go a long way.
You can spend a lot of time generating overly complex models if you throw too much tooling at it - things like Apache Thrift and/or Protocol Buffers (or derived things) will provide all sorts of IDL mechanisms for you to generate incidental complexity with.
In .NET land, and on many other platforms, if you namespace the types you can do various projections from them.
Personally, I've used records and discriminated unions (DUs) in F# as a design and representation tool.
You get IntelliSense, syntax highlighting, and types you can use from F# or C# for free.
If someone wants to look, types.fs has all they need.

Date/Time and Internationalization for Enterprise Application -- Development Guidelines

Together with another developer, I have embarked on a journey to create a hosted 'CRM Style' application that will cater to enterprise level businesses. These businesses will be accessing our application remotely and so the hosted nature of the application will require certain features. For example, to guarantee a level of professional service the following things must be true:
internationalization requires multiple languages and presentation of date/time for various timezones and locales
transactional capability for batch processing of tasks and rollback capabilities
security concerns for keeping data safe and remote invocations secure from attack
etcetera, the list goes on and on
Due to these concerns and my role as the developer most responsible for the server side development, I am very interested in the choices I make early on. Regarding timezones and languages for example, are there issues related to my choice of database or data fields? Do I choose to use a UTC timestamp or date field throughout the application and if so is there a standard format for that? Also, regarding different languages, am I supposed to ensure the data is stored in the database as UTF-8 or unicode?
I really want to avoid laying down the infrastructure of the system only to discover later that a fundamental decision was incorrect or not big enough, wide enough, smart enough, etc. Can someone point me in the right direction regarding these basic 'early' decisions?
EDIT: OK, I appreciate the broad responses and now I see my question was a little too non-specific. I'd like to focus on the more specific elements that WERE present in the question, such as how to choose the proper format for storing a UTC date/time, or how to save my text data (do I specify a UTF format?).
If you are targeting enterprise CRM, then you will need a very high level of customizability and integrations with all kinds of systems. You will make mistakes in the design. Your only hope is to isolate each little piece of the code so that you can have a chance of fixing it later.
In short, basic software engineering principles are your best bet.
What you are discussing is called a multi-tenant application, wherein you have the same code base used by multiple customers (tenants) with logical or physical separation of data. Remember the fundamental rule of development: flexibility is proportional to complexity. The more flexible you make the system, the more complicated it will be.
RE: UTC
For a CRM application that stores things like when calls were made and when meetings took place, I would definitely store all those in UTC and let the user set their local timezone. However, you might run into dates which are timezone agnostic and for those, I would store whatever date was entered.
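A small sketch of that convention in Python (timezone names and values are just examples): convert to UTC at the storage boundary, and back to the user's timezone only for display.

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo  # Python 3.9+

    def to_storage(local_dt: datetime) -> datetime:
        return local_dt.astimezone(timezone.utc)      # what goes in the DB

    def for_display(stored_utc: datetime, user_tz: str) -> datetime:
        return stored_utc.astimezone(ZoneInfo(user_tz))

    meeting = datetime(2024, 3, 15, 9, 30, tzinfo=ZoneInfo("Europe/Amsterdam"))
    utc_value = to_storage(meeting)                   # stored as 2024-03-15 08:30 UTC
    for_display(utc_value, "Asia/Hong_Kong")          # shown as 2024-03-15 16:30 local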
RE: Unicode
Yes, I would use Unicode for all user-entered data. However, that will not get you localization. If for a single company for example, you have a user in Hong Kong entering text in Chinese and user in Amsterdam entering text in Dutch, you are not going to get automatic translation. Things like dates and number formats can be localized, but raw text like names used in drop lists and such can be a chore to localize.
As you have not mentioned what you think about the issue, you may find my answer or parts of it rather basic.
If you don't need a low-level language, don't use one. I'd usually use Python for the first version of a CRM application (with the hope that it would be good enough for the next versions), but this decision also depends on the domain community.
Try to write minimal code of your own, and instead rely on third-party libraries. People may disagree on this, but I would write the code myself only as a last resort. The next point is important, though.
When selecting a library/framework to use, make sure the party behind it is going to last, the library is stable, and the software license suits your needs.
Other general rules apply: focus on the customer, use continuous integration/testing, etc., use good software practices like logging etc.
Nothing is ever stored as "unicode", because that is an abstract concept. Unicode text is always stored in some Unicode transformation format (UTF) (or UCS, but I've never seen that used anywhere). The most commonly used UTF is UTF-8, but I suggest using whatever is native/default to your platform (see Wikipedia).
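A quick Python illustration of the distinction: the abstract string is a sequence of code points, but what actually gets stored is bytes in a specific encoding.

    name = "Zoë 北京"
    encoded = name.encode("utf-8")          # the bytes that actually hit disk/wire
    assert encoded.decode("utf-8") == name  # lossless round trip
    print(len(name), len(encoded))          # 6 code points, 11 bytes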

Generating UI from DB - the good, the bad and the ugly?

I've read a statement somewhere that generating UI automatically from DB layout (or business objects, or whatever other business layer) is a bad idea. I can also imagine a few serious challenges that one would have to face in order to make something like this work.
However I have not seen (nor could I find) any examples of people attempting it. Thus I'm wondering: is it really that bad? It's definitely not easy, but can it be done with any measure of success? What are the major obstacles? It would be great to see some examples of successes and failures.
To clarify: by "generating UI automatically" I mean that all forms, with all their controls, are generated completely automatically (at runtime or compile time), based perhaps on hints in metadata about how the data should be represented. This is in contrast to designing forms by hand (as most people do).
Added: Found this somewhat related question
Added 2: OK, it seems that one way this can get pretty fair results is if enough presentation-related metadata is available. For this approach, how much would be "enough", and would it be any less work than designing the form manually? Does it also provide greater flexibility for future changes?
We had a project which would generate the database tables/stored proc as well as the UI from business classes. It was done in .NET and we used a lot of Custom Attributes on the classes and properties to make it behave how we wanted it to. It worked great though and if you manage to follow your design you can create customizations of your software really easily. We also did have a way of putting in "custom" user controls for some very exceptional cases.
All in all it worked out well for us. Unfortunately it is a commercial banking product and the source is not available.
It's OK for something tiny where all you need is a utilitarian method to get the data in.
For anything resembling a real application, though, it's a terrible idea. What makes for a good UI is the humanisation factor, the bits you tweak to ensure that this machine reacts well to a person's touch.
You just can't get that when your interface is generated mechanically... well, maybe with something approaching AI. :)
Edit, to clarify: UI generated from code/DB is fine as a starting point; it's just a rubbish end point.
Hey, this is not difficult to achieve at all, and it's not a bad idea at all. It all depends on your project's needs. A lot of software products (mind you, not projects but products) depend upon this model, so they don't have to rewrite their code/UI logic for different client needs. Clients can customize their UI the way they want using a designer form in the admin system.
I have used XML for preserving metadata for this sort of stuff. Some of the attributes which I saved for every field were:
friendlyname (label caption)
haspredefinedvalues (yes for drop-down list / multi-check-box list)
multiselect (if yes then check-box list, if no then drop-down list)
datatype
maxlength
required
minvalue
maxvalue
regularexpression
enabled (to show or not to show)
sortkey (order on the web form)
Regarding positioning, I did not care much and simply generated table/tr/td tags one below the other. However, if you want to implement this as well, you can have one more attribute called CssClass where you define UI-specific properties (look and feel, positioning, etc.).
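For illustration, a toy renderer over metadata like the attributes above (a Python sketch with made-up field definitions; the original used XML, and a real renderer would emit proper HTML):

    FIELDS = [
        {"name": "color", "friendlyname": "Colour", "haspredefinedvalues": True,
         "multiselect": False, "values": ["red", "green"], "sortkey": 1},
        {"name": "notes", "friendlyname": "Notes", "haspredefinedvalues": False,
         "maxlength": 200, "sortkey": 2},
    ]

    def render(fields):
        for f in sorted(fields, key=lambda f: f["sortkey"]):
            if f.get("haspredefinedvalues"):
                widget = "checkboxlist" if f.get("multiselect") else "dropdown"
                print(f'{f["friendlyname"]}: <{widget} options={f["values"]}>')
            else:
                print(f'{f["friendlyname"]}: <input maxlength={f.get("maxlength")}>')

    render(FIELDS)
    # Colour: <dropdown options=['red', 'green']>
    # Notes: <input maxlength=200>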
UPDATE: Also note that a lot of e-commerce products follow this kind of dynamic UI for entering product information, as their clients can be selling everything under the sun from furniture to sex toys ;-) So instead of rewriting their code for every different industry, they simply let their clients enter metadata for product attributes via an admin form :-)
I would also recommend looking at the Entity-Attribute-Value model. It has its own pros and cons, but I feel it can be used quite well for your requirements.
In my opinion there are some things you should think about:
Does the customer need a function to customize his UI?
Are there a lot of different attributes or elements?
Is the effort of creating such a "rendering engine" worth it?
Okay, I think it's pretty obvious why you should think about these. It really depends on your project whether that kind of model makes sense...
If you want to create a lot of forms that can be customized at runtime, this model could be pretty useful. Also, if you need to build a lot of smaller tools and use this as a kind of "engine", the effort could be worth it because you can save a lot of time.
With that kind of "rendering engine" you could automatically add error reporting, check the values, or add other things that are always built up with the same pattern. But if you have too many of these things, elements or attributes, performance can go down rapidly.
Another thing that becomes interesting in bigger projects is that changes that have to occur in each form only have to be made in the engine, not in each form. This can save A LOT of time if there is a bug in the finished application.
In our company we use a similar model for an interface generator between cash software (right now I can't remember the right word for it...) and our application, except that it doesn't create a UI but an output file for one of the applications.
We use XML to define the structure and how the values need to be converted, and so on.
I would say that in most cases the data is not suitable for UI generation. That's why you almost always put a layer of logic in between to interpret the DB information for the user. Another thing is that when you generate the UI from the DB you end up displaying the inner workings of the system, something you normally don't want to do.
But it depends on where the DB came from. If it was created to exactly reflect the users' goals for the system, if the users' mental model of what the application should help them with is stored in the DB, then it might just work. But then you have to start at the users' end. If not, I suggest you don't go that way.
Can you look at your problem from an application architecture perspective? I see you as another database terrorist, trying to solve everything by writing stored procedures. Why have a UI at all? Try doing it all in a DB script. With such an approach, what kind of composite system will you end up with? When a system serves different businesses, try modularization, selectively discoverable components, and restricted sharing of references. The UI should be replaceable and independent of the business layer. When you store this much in the DB there is a hard dependency on the UI, and the system becomes a monolith. How do you implement the MVVM pattern in a scenario where the UI is generated? Designers like Blend contain lots of features which cannot be replaced by even the most futuristic UI generator, unless your development platform is Notepad only.
There is a hybrid approach where the forms and everything else are described in a database to ensure consistency server-side, and then compiled on deploy to ensure efficiency client-side.
A real-life example is the enterprise software MS Dynamics AX.
It has a 'Data' database and a 'Model' database.
The 'Model' stores forms, classes, jobs and every artefact the application needs to run.
Deploying a new software structure used to mean dumping the model database and initiating a CIL compile (CIL: Common Intermediate Language, used by Microsoft in .NET).
This approach is suitable for enterprise-wide software and can handle large customizations. But keep in mind that it sets up a framework that should be well understood by whoever is going to maintain and customize the application later.
I did this (in PHP / MySQL) to automatically generate sections of a CMS that I was building for a client. It worked OK; my main problem was that the code that generated the forms became very opaque and difficult to understand, and therefore difficult to reuse and modify, so I did not reuse it.
Note that the tables followed strict conventions (naming, etc.) which made it possible for the UI to expect particular columns and infer information from the naming of the columns and tables. There is a need for meta information to help the UI display the data.
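As a sketch of what such convention-based inference can look like (Python; the naming conventions here are invented for illustration, not the ones that project used):

    def infer_field(column_name):
        label = column_name.replace("_", " ").title()
        if column_name.endswith("_id"):
            return label, "foreign-key dropdown"
        if column_name.startswith("is_"):
            return label, "checkbox"
        return label, "text input"

    infer_field("customer_id")   # ('Customer Id', 'foreign-key dropdown')
    infer_field("is_active")     # ('Is Active', 'checkbox')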
Generally it can work; however, if your UI just mirrors the database, there is probably a lot of room for improvement. A good UI should do much more than mirror a database: it should be built around human interaction patterns and preferences, not around the database structure.
So basically, if you want to be cheap and build a quick-and-dirty interface that mirrors your DB, go for it. The main challenge will be to find good-quality code that can do this, or to write it yourself.
From my perspective, it was always a problem to change edit forms when a very simple change was needed in a table structure.
I always had the feeling we had to spend too much time rewriting the CRUD forms instead of developing the useful stuff, like processing / reporting / analyzing data, giving alerts for decisions, etc.
For this reason, a long time ago I made a code generator. It became easier to re-generate the forms, with one simple restriction: keep the CSS class names. As simple as that!
The UI was always based on very "standard" code, controlled by custom CSS.
Whenever I needed to change the database structure, and thus update an edit form, I had to re-generate the code and redeploy.
One disadvantage I noticed: changes (customizations, improvements, etc.) made to the previously generated code are lost when you re-generate it.
But anyway, the advantage of having a lot of the work done by the code generator was great!
I initially did it for 2000s-era Microsoft ASP (Active Server Pages) and Microsoft SQL Server... so when that technology was replaced by .NET, my code generator became obsolete.
I made something similar for PHP but I never finished it...
Anyway, from small experiments I found that generating code ON THE FLY can be way more helpful (and this approach does not exclude SAVED generated code): no worries about changing the database, etc.
So, the next step was to create something that I am very proud to show here, and I think it is a nice resolution to the issue raised in this thread.
I would start with the applicable use cases: https://data-seed.tech/usecases.php.
I have worked to add details on how to use it, but if something is still missing please let me know here!
You can change the database structure, and without a line of code you can start editing data; more than that, you have an API available for CRUD operations.
I am still a fan of the "code generator" approach, and I think it is just a flavor of the XML/XSLT technique that I used for DATA-SEED. I plan to add code-generator functionality.

The future of Naked Objects pattern (and UI auto-generation) [closed]

I'm asking about the pattern, not the framework. This is a kind of follow-up to a question on UI auto-generation.
Do you believe in the concept of UI auto-generation from metadata?
What kind of problems can be approached this way?
The question arose when I created a small library to support my student projects, which generates an interactive CLI at runtime based on an object's metadata. And I think the CLI it generates is quite decent.
On the other extreme is the Naked Objects Framework, which is rather universal, but the UI it generates is horrible, IMO.
It's clear that every problem is specific and needs a specific UI, but maybe there are several classes of problems where auto-generation is acceptable?
Yes, I believe the concept of metadata-based auto-generated applications is very sound - mainly because it drastically reduces development time and improves code quality by reducing the massive redundancy you have in most applications where each domain data field is represented in the database, in the model, in the UI, and often also several times in various mapping layers.
I think the future is auto-generated apps that can be modified wherever necessary. Currently, this is AFAIK not really possible; for example, Rails only allows you to fully customize the UI when you use static scaffolding, which basically means code generation, i.e. many further changes in the domain model are then not automatically represented in the UI because the duplication has happened when the code was generated.
I believe the first framework that manages to combine complete auto-generation with complete modifiability afterwards will become the de-facto development standard to a previously unknown degree. Though most likely we'll get there in small steps so that there will not be such a single dominating framework.
Take a look at JMatter, which is a rather better-looking implementation of Naked Objects.
http://www.jmatter.org
There is also Chris Muller's work on MAUI, and Lukas Renggli's work on Magritte (both Squeak/Smalltalk).
We have lots of generated UI in the configuration part of our apps. All those lists that are around forever and changed once in a blue moon by a system administrator.
I find that most applications with a database back-end tend to have a bad design from an OO and NO perspective, as already shown in the NO book by Pawson and Matthews.
Re: qn #1 ... Do you believe in the concept of UI auto-generation from metadata? ... I'm definitely going to answer 'yes' to your first question, being one of the committers to the Naked Objects (Java) framework and writing a book on DDD + NO.
The question mentions metadata. I think this is key to NO being able to succeed. In the latest version (which will be going beta in Feb) the metamodel has been opened up so that it is very extensible: either so you can write your domain model following your own programming conventions/annotations, or, potentially, so that more sophisticated viewers can look for their own metadata to provide more sophisticated views. (For example, consider that if an object implemented a Location interface then it could be displayed in a Google map.)
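A rough sketch of that viewer-lookup idea (Python; the names are illustrative, and this is not how the framework itself is implemented): viewers register against an interface, and rendering falls back to a generic form when nothing richer matches.

    class Location:
        def __init__(self, latitude, longitude):
            self.latitude = latitude
            self.longitude = longitude

    VIEWERS = []

    def viewer_for(interface):
        def register(fn):
            VIEWERS.append((interface, fn))   # remember which viewer handles which interface
            return fn
        return register

    @viewer_for(Location)
    def map_viewer(obj):
        return f"<map lat={obj.latitude} lon={obj.longitude}>"

    def render(obj):
        for interface, fn in VIEWERS:
            if isinstance(obj, interface):
                return fn(obj)                # richest registered view wins
        return f"<generic form for {type(obj).__name__}>"

    render(Location(52.37, 4.90))   # '<map lat=52.37 lon=4.9>'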
Regarding qn #2 ... what kind of problems can be approached this way ... we've always said that NO is more suitable for "sovereign applications" (transactional, operational systems used internally within an organization) than for "transient applications" (like an airport kiosk, say). An NO GUI does require that the user is familiar with the domain, otherwise they won't know what they are looking at.
What's missing still is sophisticated viewers, of course. You are right about the NO GUI, it is definitely low fidelity (though the .NET version is a big improvement, see recent infoq.com article). On the Java side there is a sister project called scimpi.org that has a lot of promise though... it provides a basic web GUI for free but lets you hand-craft web pages as necessary and incrementally. I'm also working on an Eclipse RCP GUI that'll work similarly.
The other thing to add to this though is that the NO approach has value (I believe) even if you choose to write a custom GUI and/or presentation layer. That is, you can use it as a design tool for building a very solid pojo domain layer, and then skin it as you will. Trouble is that NO was never originally sold in those terms, so most will see the NO pattern as an all-or-nothing affair.
Dan
One way to look at this is to consider the difference between the user interface you get from something like Toad or MySQL Browser, where the user interface is constructed directly from the tables and their associated metadata, and the user interface that a skilled designer would develop for the actual application. If they're not too dissimilar, then it should be fairly low-hanging fruit for an auto-generation framework.
As you say, there are classes of problems which will work quite well with this kind of auto-generation and some which won't. To my mind the key thing is how well the implementation model (or the portion thereof which you are exposing in the user interface) maps to the conceptual model of the user. Secondly, how well the behavior of the application can be expressed through a limited set of user interface components (assuming this is a general-purpose UI generation framework).
This article, "Universal Model of a User Interface", may be of interest.
I think the idea of automatically generated UIs has a lot of potential especially for your average form-and-table layout database user interface. However, even there a human needs to be in the loop, having the ability to override the output without it being overwritten with the next regeneration.
I suspect automatically generated UIs would be more successful today if interaction designers were more involved in developing the generation algorithms. My impression is that historically the creators of these systems didn't know what kinds of UI-related metadata to include or how to use it. Specifying labels, value ranges, formats, and orders for fields is a start, but higher-level information is needed. Sufficient modeling of the tasks and user roles in particular tends to be lacking, along with some basic style-guide-level principles for UI.
Oracle's Designer 2000, for example, was on the right track in including not only the entities and relations in the model, but also the tasks in the form of a functional hierarchy. Then they blew it by misapplying this metadata (e.g., assuming that depth is always preferred to breadth) and including fundamental flaws when generating the UI (e.g., only one primary window can be opened at a time). The result was UIs that were not even consistent with Oracle's own Applications User Interface Standards.
Getting a basic UI up quickly that lets the customer try out the system and create test data must be of value. Naked Objects frameworks can help with the "bootstrapping", even if you have to replace the UI with a "hand-crafted" one before you ship.
In most systems I have worked on, there have been lots of simple housekeeping tables. All these tables need a UI to edit and view them, and there is also great value in these simple editors being consistent. Here a Naked Objects framework could save a lot of time, even if the main "day-to-day" UI is "hand-crafted".
I have seen a couple of failed projects (cases where I was brought in as a rather expensive consultant to help architect the replacement) which used the "naked objects" approach (not the framework, AFAIK), all with simply atrocious UIs, and I worked on replacing a lot of the UI on one project which, in its original incarnation, had a similar approach (the entire application was a tree of objects accessed through context menus and property sheets - this was NetBeans 2.0 circa 1998 - the IDE as a giant hierarchical JavaBean).
The bottom line is, your users don't care about your architecture, they care about getting what they need to do done in the most comprehensible-to-mere-mortals set of interactions you can come up with. If that happens to align with your architecture, you are having a lucky day - but it really is serendipity. Trying to force users to care (or even know) about your architecture is a recipe for software nobody wants to use.
Code generally needs to be designed around two not-always-compatible goals:
Maintainability - people who didn't write the code can understand the code
Stability and performance - i.e. the activities the code asks the computer to physically do are both possible, and can be completed within a reasonable time frame
The abstractions and code structures that it makes sense to create to meet those two goals very, very rarely map exactly to user interface elements of any sort. Sometimes you can get away with it - barely - if your audience is technical. But even there, you are likely to please more users with at least a "presentation layer" adapter layer on top of the architecture that makes sense for programmers and machines.

Documents for a project? [closed]

I work for a CMMI level 5 certified company, and one thing I hate about it is the amount of documentation we prepare (as a programmer, I already hate documents). We have lots and lots of documents: PID (project initiation doc), business requirements, system requirements, tech spec, code review checklist, issue logs, defect logs, configuration management plan, configuration management checklist(s), release documents, and lots more...
Almost 90% of these docs are done just for the sake of the QA audit :) ... What do you think are the most important documents for a project? What documents can be used in the long run by another developer?
Please share your good practices here. I would like to use them for my own projects or the company I am planning to start in the long run.
Thanks
The key document is a good functional spec. There should be one and only one reference document for a system.
Overdoing documentation proliferates a large number of small requirements and spec documents every time someone changes a system or interface. For a system of any complexity, before long you have your spec distributed around several hundred assorted Word, Excel, Visio and even PowerPoint files. When this happens you lose clarity about what is current, or even whether you have located and identified all pertinent documentation.
The BRD-SRD-Tech spec progression is based on an assumption that the business signs off the BRD, a business analyst signs off the SRD against requirements documented in the BRD and the technical specification is signed off against the SRD. This generates a web of sign-offs, multiple documents with redundant information and makes it difficult and clumsy to keep the spec documents up to date.
Because of this, subsequent requirements documentation tends to take the form of a series of change requests and supplemental requirement and spec docs, each with its own sign-off and audit process. You gain CYA and an audit trail (or at least the appearance of one), but you lose clarity. There is now no definitive reference document for the system, and it is difficult to establish what is current or relevant to any particular activity. The net result is that your business analysis process gets bogged down in forensic research, which adds overhead and latency to delivery schedules.
A spec document should be built in such a way that there is one definitive reference for any given system or subsystem. The document should be kept up to date and versioned. Get a good technical documentation tool like FrameMaker, so your process can scale and the document has some structural integrity of the sort lacking in Word.
For me the only real document I ever use is a spec. The more detail the better. However, it doesn't need to be all completed at one time, and it doesn't need to be particularly formal. What is far more useful to me than documents that are checked and signed and double-checked and double-signed is always being able to get the latest version of a document, and being able to talk to people about what they have written and get a decision in the case of any ambiguity. This is far more useful to me than anything else.
To sum up: a spec is the only document I have ever found useful, however it pales in comparison to having a project manager who knows the proposed system inside out, and can make sensible decisions based on what they know.
Documentation is like tofu -- most people hate it until they realize that under the right conditions, it can be really good.
The problem is that what you consider documentation is mostly made for documentation's sake. You, as a developer, don't see any immediate value in the documents you produce because you know you can do your job without all the TPS reports which you're required to make.
Unfortunately, I'm going to wager that there's not a lot you can do about it in a company where you're being forced to eat raw tofu all the time. You'll probably just have to suck it up and write the docs which your company requires, but you can at least do one thing... you can write documents which are at least useful to you, and you can pass them along with your code for others who will maintain it.
Aside from inline documentation, you could set up a wiki to be used by yourself and people on your team. This type of documentation is searchable, which is already a big plus to developers, plus it's more of a living document instead of a homework-like paper you had to write. You already post to SO, so just think of your documentation as pooling your knowledge in a more useful place.
What do you think are the most important documents for a project?
Different people have different needs: for example the documents which the owner needs (e.g. the business contract) aren't the same as the documents which QA needs.
What documents can be used in the long run by another developer?
IMO the most important document (except for the source code) is the functional specification: because what the software is supposed to do (as opposed to, what it is doing) is the one thing that can't necessarily be reverse-engineered. See also How does a good developer keep from creating code with a low bus hit factor?
User Stories, burndown chart, code
I'm a fan of the old 4+1 views:
Use Case view (a/k/a user stories). There are several forms: proper use cases, forward-looking use cases that aren't as well defined and epics which need to be decomposed.
Logical view. The "static" view. UML Class diagrams and the like work well here as a design document. This also includes request and response formats for various protocols. Here is where we document the RESTful requests and responses. This includes the REST URI design.
Process view. The "dynamic" view. UML activity diagrams, sequence diagrams, statecharts and the like go here as design documents. In some cases, simple narratives work well. In other cases, there's a State design pattern, and it requires a combination of class diagrams and statecharts to show how the stateful objects interact.
This also includes protocols (e.g. REST). Here is where we define any special processing for the various REST requests.
This also includes any authentication or authorization rules, and any other cross-cutting aspects like security, logging, etc.
Component view. The pieces we're building for deployment. This includes the stuff we depend on, the structure of the modules and packages, etc. This is often a simple component diagram or a list of components and their dependencies.
Deployment view. We try to generate this from the code as deployed. Since we're using Python, we use epydoc to create the API documentation. We also use Sphinx to import module documentation into this view of the software.
This also includes the parameters, settings, and configuration details.
This, however, isn't sufficient.
When projects start, you have to work up to this through a series of sprints.
The first sprints build just the use case view.
Subsequent sprints build an "architecture" to implement the use cases. The architecture document has 4+1 views, but at a high level of abstraction. It summarizes the structure of the model schemas, the requests and replies, the RESTful processing, other processing, the expected componentry, etc. It never has a Deployment view. We generally reference operator guide and API documents as the deployment view of an architecture.
Then design-and-construction sprints build (and update) detailed 4+1 view documents for various components.
Then release sprints build (and update) the deployment views.
From the project point of view, the most important documents are those that normally include the word Plan, such as the Project Plan, Configuration Management Plan, Quality Plan, etc.
What you are describing is common in process improvement, and normally responds to two major causes. One is that the system really is overreaching and getting in the way of real work being done. The other is actually answered in your question: it is not that the documents are done only for the sake of audits; your focus should be not just how useful the doc is for other developers, but how useful it is for the project or the company as a whole.
One usually looks at things from one's own perspective; sometimes it's necessary to look at the bigger picture.
