Designing a complex workflow diagram - user-interface

We've got a surprisingly complex workflow that needs to be monitored by a quasi-technical employees with an in-house webapp. There's about 30 steps, some of which are manual (editing), some are semi-automated stop points (like "the files have been received" or customer approval of certain templates), and some are completely automated (file conversion, search indexing, etc). The flowchart for all of these steps is large and complicated, and three people might be working on three completely different steps at any one time.
How would you present this vast amount of information as usefully as possible to your users? Just showing the whole diagram seems like the brute force solution. But it's big, and it'll likely get bigger as we do more things. Not to mention the complexity necessary to encode this entire diagram in HTML.

I assume you don't want to show these just for entertainment or mockery, but help the users along the way, automating as much as possible, document the process etc. It would probably help if you clearly define the goals or purpose of your app.
I don't see a point in showing the entire workflow, except for "debugging the business rules" or maybe the clients want to see it.
If your goal is to help users do their job, I would present the state of the "project" (or whatever term fits better) is at, and possible transitions to other states.
The State might be multiple mostly independent variables, e.g. one might describe the progress of content - e.g. "incomplete" / "complete" / "reviewed by 2nd staffer" / "signed off by 2nd staffer", others might contain a schedule that is developed in parallel, e.g. "test print date = not scheduled", "print date = not scheduled", "final delivery = tomorrow, preferredly yesterday".
A transition might be "Seint to customer for review", "mark as content-complete", "content modified", etc.
Is this what you have in mind?

I propose to divide your workflow in modules and represent the active state for each module.
A module is a subset of your main workflow. For example it could be divided by tasks, person, roles, department, etc. This will greatly simplify the representation of the workflow. Let's says someone is responsible for data entry at many critical moments. We can group all his tasks in one module (or sub-workflow) containing the same activities, inputs, outputs and conditions. Modules could be inter-dependants and related.
A state is where we are located in a module. In simple workflows there is only one active task. In real life we are multi-threaded! So maybe in one module many states could be active at the same time. The state also includes active inputs, outputs and memory bits.
An input is something required to perform an activity for evaluation a boolean condition. It could be a document, a piece of data, a signal...
An output is something resulting from a task: an information, a document, a signal...
Enough definitions?
Then simply convert your workflow into a LADDER LOGIC and you have your states!
See Ladder Logic definition on Wikipedia
You display only active states:
Active task(s) for the module
Inputs required / inputs confirmed
Output required / output realized
Conditions to continue
Seems abstract?
Here is a small example...
Janet enters data in the system. She manages the green tasks of the diagram. We focus only on her work, not other tasks. She knows how to do 16 tasks in the workflow. We are waiting the following actions from her to continue, and her Intranet dashboard says:
Priority 1: You must send a PO to order enough pencils for the next month based on the sales report.
Task: Send a purchase order
Inputs: Forecast report from the marketing department
Outputs: PO, vendor, item, quantity
Condition for completion: PO sent and order confirmation received from supplier
Priority 2: You must enter into the financial system the number of erasers rejected by production
Task: Data entry
Inputs: Reject count from production
Outputs: Number of rejects
Condition for completion: data entered and confirmed
We do a lot of troubleshooting on automated production systems having hundreds of thousands ladder steps (the workflow is too complex to be represented in a whole). When the system is blocked we look at each module and determine what inputs are missing to activation task completion.
Good luck!

This sounds like the sort of application for which BPEL is suited.
Of course you don't want to re-architect your system right now. But there are a number of BPEL implmentations out there, some of which include graphical editing tools. One of these might help you in your current situation, because they are good at handling scope and hiding detail. So I think you might derive benefit from drawing your workflow as a BPEL diagram even if you don't do anything else with the language.
The Wikipedia page lists several of the available implementations. In addition, Oracle's JDeveloper IDE includes a BPEL Diagrammer as part of its SOA suite; unfortunately it is no longer part of the standard install but it is still available. Find out more.

Try doing it in layers. You have the most detailed layer done, now add additional docs with the details hidden, grouped into higher-level business processes. Users should be able to safely ignore some of those details, but it's good for them to have visibility of how their part fits in to the whole.
You may need more than one higher-level document.

You can use Prezi to present this information to users in a lucid manner.
Split and present the work flow into phases such that the end user is easily able to identify the phase he is currently in.
Display as many number of phases as the number of inputs. The workflow starts with 6 different inputs so display the six different buttons on screen enabling the user to select the input that he wants.
On selecting the button zoom into the workflow depicting the next steps. This would also help the user to verify the actions that he has done so far to reach the current states.
This would also help the user to verify the actions that he has done so far to reach the current states. But this way of presenting could become cumbersome for the users as the number of steps that he has completed goes up. Say the user has almost reached the end of the workflow. To check for the next step he should go through all the steps which might frustrate the user.
To avoid this you can split the complete work flow chronologically into 3-5 phases. The phases should be split logically. The ultimate aim would be not to overwhelm the users with the full work flow. Personally i would try to avoid the task involving this workflow if presented the way you have shown. No offense. I bet you also feel the same.
Could give you a better picture if you could re-post the image after replacing the state names with numbers.

I'd recommend having the whole flow documented somewhere, but in terms of what is distributed to users, how about focusing on task-oriented flows? No one user will be responsible for the entire process I would imagine.
For example, let's say I have 2 roles, A and B, and 6 tasks, 1 through 6, executed in order. Each task may have multiple steps but is self-contained (e.g. download the file, review, run process, review again, upload). A does the even tasks and B does the odd tasks.
A would need to know about those detailed steps that comprise tasks 2, 4, and 6 but not about what goes on in 1, 3, and 5. So hand A a detailed set of flows for the tasks he is responsible for, along with a diagram that treats each task as a black box.
If the flow can't be made modular in this way, you may want to review the process itself to see why it's so complex.

How about showing an example of a workflow scenario, that is, showing the transitions in one possible passing through the workflow? You could cater this to a specific user profile and highlight the pertinent states, dimming the others. This allows them to get a clear idea of the transitions by seeing a real-life example.

Related

Is there a process for munging data from many different formats in RapidMiner?

I'm trying to help my team streamline a data ingestion process that is taking up a substantial amount of time. We receive data in multiple formats and with attributes arranged differently. Is there a way using RapidMiner to create a process that:
Processes files on a schedule that are dropped into a folder (this
one I think I know but I'd love tips on this as scheduled processes
are new to me)
Automatically identifies input filetype and routes to the correct operator ("Read CSV" for example)
Recognizes a relatively small number of attributes and arranges them accordingly. In some cases, attributes are named the same way as our ingestion format and in others they are not (phone vs phone # vs Phone for example)
The attributes we process mostly consist of name, id, phone, email, address. Also, in some cases names are split first/last and in some they are full name.
I recognize that munging files for such simple attributes shouldn't be that hard but the number of files we receive and lack of order makes it very difficult to streamline a process without a bit of automation. I'm also going to move to a standardized receiving format but for a number of reasons that's on the horizon and not an immediate solution.
I appreciate any tips or guidance you can share.
Your question is relative broad, so unfortunately I can't give you complete answer. But here are some ideas on how I would tackle the points you mentioned:
For a full process scheduling RapidMiner Server is what you are
looking for. In that case you can either define a schedule (e.g.,
check regularly for new files) or even define a web service to
trigger the process.
For selecting the correct operator depending on file type, you could
use a combination of "Loop Files" and macro extraction to get the
correct type and the use either "Branch" or "Select Subprocess" for
switching to different input routes.
The "Select Attributes" operator has some very powerful options to
select specific subsets only. In your example I would go for a
regular expression akin to [pP]hone.* to get the different spelling
variants. Also very helpful in that case would be the "Reorder
Attributes" operator and "Rename by Replacing" to create a common
naming schema.
A general tip when building more complex process pipelines is to organize your different tasks in sub-processes and use the "Execute Process" operator. This makes everything much more readable and maintainable. Also a good error handling strategy is important to handle unforeseen data formats.
For more elaborate answers and tips from many adavanced RapidMiner users, I also highly recommend the RapidMiner community.
I hope this gives a good starting point for your project.

Delays in updating content controls when more context.sync() are used on WORD for Mac

We update content control for every character typed in the task pane’s input field. So that user can see the live updates on the word document.
Recently we added functionality for locking content controls. And it happens as below:
User input (types a character) in a input field
We search a content control for that input field (involves context.sync)
Unlock the content control (involves context.sync)
Update value in content control (involves context.sync)
Lock back the content control (involves context.sync)
All this works nice in Word for windows without problems.
But is extremely (visibly) slow with Word for Mac (apple machines)
How should I overcome the delays happening on Mac?
As Juan mentioned in the comment, there are some important details that the team would need to investigate. Sample code would be good too.
That being said, just looking at what you describe, I think you can dramatically cut down on the context.sync() statements. Unlocking the content control, updating its value, and locking it should all be possible to do in one sync.
I have a bunch of details about minimizing sync-s in my book, "Building Office Add-ins using Office.js. Quoting one of the sections from it:
As an add-in author, your job is to minimize the number of context.sync()
calls. Each sync is an extra round-trip to the host application; and when
that application is Office Online, the cost of each of those round-trip adds up
quickly.
If you set out to write your add-in with this in principle in mind, you will
find that you need a surprisingly small number of sync calls. In fact, when
writing this chapter, I found that I really needed to rack my brain to come up with a scenario that did need more than two sync calls. The trick for
minimizing sync calls is to arrange the application logic in such a way that
you're initially scraping the document for whatever information you need
(and queuing it all up for loading), and then following up with a bunch
of operations that modify the document (based on the previously-loaded
data). You've seen several examples of this already: one in the intro chapter,
when describing why Office.js is async; and more recently in the "canonical
sample" section at the beginning of this chapter. For the latter, note that the
scenario itself was reasonably complex: reading document data, processing
it to determine which city has experienced the highest growth, and then
creating a formatted table and chart out of that data. However, given the
"time-travel" superpowers of proxy objects, you can still accomplish this task
as one group of read operations, followed by a group of write operations.
Still, there are some scenarios where multiple loads may be required. And in
fact, there may be legitimate scenarios where even doing an extra sync is the
right thing to do – if it saves on loading a bunch of unneeded data. You will
see an example of this later in the chapter.

TDD strategy when implementing a multi-stage process?

At the moment I'm developing a piece of code which first gathers sentences from a set of documents, then tokenises these, then uses the results to analyse recurring frequencies of token sequences, including case variations (upper case/lower case/leading cap/other), then prints out the results.
Now I want to introduce two more stages before printing out the results:
1. firstly, removing "stop words" (i.e. words or short sequences the frequency of which can never be of interest, such as, in English, "the", "of the", "of which", etc.) - these stop words/"stop sequences" to be taken from a database table
2. secondly, bringing up a dialog enabling the user to identify sequences of new stop words, which would then remove the token sequences involved and also add the sequence in question to the database table.
The thing is, this is a multi-stage process, and I'm just wondering what TDD experts do faced with a situation like this: do I create a new test method for each individual stage...? The problem being that each individual stage requires the use of "live memory data" from the previous stage: another possibility could be to somehow serialise this data and then deserialise it when testing for the next stage... but then this would involve the app code doing things which were of benefit only for the testing code, i.e. it would mean tweaking ("distorting"?) the app code for the benefit of the testing code, which seems wrong in principle...
Also, if anyone can point me in the direction of a book or site which helps TDD newbs like myself go to "the next level" I would be very grateful.
later
To the person who marked this as "favorite": I've now got hold of a book called "Growing Object-Oriented Software, Guided by Tests", which is well-reviewed and appears to be for someone wanting to move from beginner to intermediate. First impressions good.
Any views on this book by experts also welcome, of course...
On the face of it, you seem to be building a pipeline. From what I can tell, you're currently implementing all of it within a single class, which stores both the data that's being worked on and implements the methods that do the processing. One approach that you could take would be to break down the problem into smaller chunks. Rather than having a single class, you have a class for each stage of the pipeline and another class for orchestrating the process which is responsible for plugging the stages together in the correct order.
So, scanning through what you've described, you appear to have the following processors:
DocumentReader (reads documents from somewhere into in memory document)
SentenceExtractor (document/list of documents in, list of sentences out)
1 or more SentenceAnalysers (sentences in, statistics out), you might want to break this down depending on the type of analysis and how complex it is.
StopWordExtractor (StopWordProvider and sentences in, sentences out)
There are additional supporting classes that would be needed, to support writing of new stopwords to the database and depending on how the stopwordprovider was implemented keeping it in sync as the user selects new ones.
Essentially, what I'm saying is that you appear to be doing too much in a single location. If you're really happy that the code as you've described it is a single unit, then there is nothing wrong with you testing it all in one place, but then your inputs will be your starting documents/sentences and your outputs will be the end of the process. If you agree with me that really, there are several distinct components involved in the process that could change independently, then I would suggest breaking the process down into smaller classes and testing that those perform as expected for given sets of inputs/outputs...

Breaking a project's first User Story in to tasks [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 5 years ago.
Improve this question
I'm starting a new project from scratch and have written User Stores to describe how a given user will interact with the system. But, I'm having trouble understanding how to break the first user story in to tasks without the first one becoming an epic.
For example, if I were building a car and the first user story said something like "As a driver, I would like to be able to change the direction of motion so that I don't hit things.", that would imply a user interface (steering wheel), but also motion (wheels) and everything necessary to link those together (axle, frame, linkage, etc...). In the end, that first user story always seems to represent about 40% of the project because it implies so much about the underlying architecture.
How do you break user stories down for a new project such that the first one doesn't become an epic representing your entire underlying architecture?
You might want to think of your story as a vertical slice of the system. A story may (and often will) touch components in all of the architectural layers of the system. You might therefore want to think of your tasks as the work needed to be done on each of the components that your story touches.
For example, Let's say you have a story like In order to easily be able to follow my friends' tweets, as a registered user, I want to automatically follow all of my gmail contacts that have twitter accounts.
In order to accomplish this, you will have to pass through the UI layer, service layer, persist some data in the data layer, and make an API call to twitter and gmail.
Your tasks might be:
Add an option to the menu
Add a new gmail authentication screen
Add a twitter authentication screen
Add a contact selection screen
Add a controller that calls into your service layer
Write a new service that does the work
Save contacts to the database
Modify your existing gmail API calling service to get contacts
Add a twitter API calling service to follow selected contacts
There: That's 9 possible tasks right there. Now, as a rule, you want your tasks to take roughly 1/2 a day to 2 days, with a bias towards one day (best practice, for sizing). Depending on the difficulty, you might break down these tasks further, or combine some if they are two easy (perhaps the two API calling services are so simple, you'd just have a modify external API services).
At any rate, this is a raw sketch of how to break the stories down.
EDIT:
In response to more question that I got on the subject of breaking stories into tasks, I wrote a blog post about it, and would like to share it here. I've elaborated on the steps needed to break the story. The link is here.
When we started projects under a Scrum management style, the first set of tasks was always broad, or as you describe it: epic. That's inevitable, the framework of any project is usually the most important, largest, and time-consuming portion, but it supports the rest of the project. In order to pare down the scale on overwhelming-ness of how much there is to do see if you can list the MOST essential parts. Then work on defining those tasks as the starting points. Therefore you have a few tasks as starting points for a broad beginning. Hope that makes sense!
A user story describe the what while a task is more about the how.
There is no perfect formula, just add any task that describe how the user story is going to be implemented, documented or tested.
Keep in mind that a task should be estimated in hours, so try to scale and detail the tasks accordingly.
If you feel that you have too many tasks for a story (even if you have 1-8 hours long tasks), then maybe you should consider rewriting your user story in the first place because it's probably too complex.
Good luck
The story that you implement at the beginning can be refined over time. You dont need to think that every story has to be the final version that the user is going to use.
For example, in a recent project we had to develop an application which involved indexing various websites, and matching them against filters created by users, and finally alerting the user of matches (thing of it as google alert on steroids).
If you look at it from one perspective, there is only one story - "As a user I want to get alerts from matching pages". But look at it from another perspective of "what are the risks we want to mitigate". The first risk was that users wouldn't get relevant or better hits compared to google alerts. The second risk was in learning the technology to build this.
So our first user story was simply "As a user I want relevant hits", then we built just the hit matching algorithm on a hardcoded set of pages and hardcoded filters for some early users and got their feedback.
There might actually be a bit of back and forth here with multiple smaller stories to capture learning like "As a user I want more priority to be given to matches in the URL" etc.. these stories comes from the feedback as we iterate over what the early users consider "relevant hits".
Next, we broadened it to "As a user I want hits from specific websites" and we built the indexing architecture to crawl user specified sites and do hit matching on that.
The third story was "As a user I want to define my own filters", and we built this part of the system.
In this way we were able to build up the architecture piece by piece. Through most of the initial part, only early users could use the system, and many pieces of data were hardcoded etc.
After a point, early users could use the system completely. Then we added stories for allowing new users to register and opened it up to the public.
To cut a long story short, the story you implement first could implement only a small part of the final story, hardcoding and scaffolding everything else. And then you can iterate on it over time till you get the story that you might actually release to the public.
I've come to a crossroads with this issue in the past. User stories are supposed to be isolated so you can do them without any other stories, in whatever order, etc. But I found making that happen just made everything more complicated. To me this fell under the "Individuals and interactions over processes and tools" part of the agile manifesto - or at least my interpretation of it.
The ultimate goal is ship. And to ship you have to build, and to build you have to stop futzing with scrum and just get stuff done and make sure you track it.
So what we did was break a cardinal rule of stories and we made some tech stories like "create a preliminary schema". We also declared that some stories were dependent on others, and noted that on the back of the story card.
In the end I felt this type of story was few and far between, and the difficulty of the alternative justified the exception.

What are the drawbacks to merging the Task and Bug Work Items and only use one of them in TFS 2010?

I was thinking that I’d rather only use the Task Work Item and ignore the Bug Work Item. This is my thinking as I set things up for my team. I’m on a quest to see why I shouldn’t do this. From my perspective a Task is either a new item or a bug item. There is no need to use two distinct Work Item Types. To make this happen in TFS I’ll start with the Bug Work Item and create a custom field (“Item Type”) to distinguish the two task types: new/bug. Both new tasks and bugs will share the same fields. Anyone see any major drawbacks to this approach?
The main reason Tasks/Issue/Bugs/etc are different work items are because the individual fields of each work type can be configured differently.
For example, by default, Bugs have a Triage property, Issues have Due date, Tasks have a Discipline. The States of a Bug (Active/Closed/Resolve) are different from an issue (Active/Closed).
By merging them into a single work item type you would loose the ability to configure each one uniquely.
Also, the rules followed when a Bug and Task are closed, for example, are generally different. Segregating them into work items allows a simpler rules set.
Work item type is also a standard column in all queries.
Overall, it depends on how extensively you are using Team Foundation. If your project is small, and the above don't matter, it's not going to hurt. Though I don't see much gain either.
I would suggest keeping Bug and dropping Task if you want to merge them. By default when you check in code and Resolve with a bug, it sets the status to Resolved and assigns it to whoever created it - usually a tester, but in your case possibly a PM. That person can then test to confirm the work is done and close it. You can set up alerts on their work items so they get an email and know that progress has happened. Alternatively if you use Task, when you Resolve at check in it is just closed. No alerts, no further testing. YMMV but on some of our projects we use Bug for things like "user would like to add a new report" and it fits our process well. (For others we keep the distinction for reporting purposes.)
It all boils down to 3 things:
Creation / prioritization
Reporting / Notifications
Completion workflow
Typically creation of a Task involves different fields than a Bug. For a bug you'll want to know things like environment found in, who notified you, severity, priority, etc.
For tasks you usually want to know the requestor, reason behind it, business unit impacted and iteration it is scheduled for. Tasks might be long term goals that result in new or enhanced functionality.
Reporting and Notifications of the two are generally different as well. PM's are going to track tasks to ensure deliverables are met, your tech support area is going to track bugs.
Next, bugs will generally result in hotfixes and service packs. Depending on severity this this might involve a high priority push through QA and release as quickly as possible. Tasks are more laid back and will go through all forms of regression and regular testing with a period of acceptance by the impacted business unit.
Finally, bugs may impact previous versions of your software. Tasks will almost always be for either the version currently under development or the one after that.
In short, they are fundamentally different things. They might share most fields in common, however by combining them you are restricting yourself in both reporting and workflows. Today this might be okay; however within the next month or next year this could seriously restrict you.
Considering that maintainence of work item types is an incredibly easy thing there is almost no benefit to merging them.

Resources