I'm trying to help my team streamline a data ingestion process that is taking up a substantial amount of time. We receive data in multiple formats and with attributes arranged differently. Is there a way using RapidMiner to create a process that:
Processes files on a schedule that are dropped into a folder (this
one I think I know but I'd love tips on this as scheduled processes
are new to me)
Automatically identifies input filetype and routes to the correct operator ("Read CSV" for example)
Recognizes a relatively small number of attributes and arranges them accordingly. In some cases, attributes are named the same way as our ingestion format and in others they are not (phone vs phone # vs Phone for example)
The attributes we process mostly consist of name, id, phone, email, address. Also, in some cases names are split first/last and in some they are full name.
I recognize that munging files for such simple attributes shouldn't be that hard but the number of files we receive and lack of order makes it very difficult to streamline a process without a bit of automation. I'm also going to move to a standardized receiving format but for a number of reasons that's on the horizon and not an immediate solution.
I appreciate any tips or guidance you can share.
Your question is relative broad, so unfortunately I can't give you complete answer. But here are some ideas on how I would tackle the points you mentioned:
For a full process scheduling RapidMiner Server is what you are
looking for. In that case you can either define a schedule (e.g.,
check regularly for new files) or even define a web service to
trigger the process.
For selecting the correct operator depending on file type, you could
use a combination of "Loop Files" and macro extraction to get the
correct type and the use either "Branch" or "Select Subprocess" for
switching to different input routes.
The "Select Attributes" operator has some very powerful options to
select specific subsets only. In your example I would go for a
regular expression akin to [pP]hone.* to get the different spelling
variants. Also very helpful in that case would be the "Reorder
Attributes" operator and "Rename by Replacing" to create a common
naming schema.
A general tip when building more complex process pipelines is to organize your different tasks in sub-processes and use the "Execute Process" operator. This makes everything much more readable and maintainable. Also a good error handling strategy is important to handle unforeseen data formats.
For more elaborate answers and tips from many adavanced RapidMiner users, I also highly recommend the RapidMiner community.
I hope this gives a good starting point for your project.
Does Tin Can API support questions within questions?
If so, what would be the specification for passing data to an LRS?
I was thinking of adding ID's to each sub question.
This would be much easier to answer if you could provide an example, but the flexibility of the Tin Can API is such that you can literally capture anything (which is also part of the complexity) with more or less grace.
Some immediate options come to mind:
Use a single interaction activity statement (likely with type choice) and use the formatting allowed to have multi-value responses (i.e. golf[,]tetris).
Use multiple statements where there is a combined statement (necessary if there is an overall result) such that there is a single main activity and each sub-question has its own statement where the sub-question has its own activity and the main activity would be stored in the context.contextActivities.parent list. When there is a combined statement in this case I would include a reference to the combined statement in the sub-question statements' context.statement property such that you can tie them all together.
Use result, context, and activity definition extensions to capture anything. This should be a last resort option, it usually makes setting things up simple but adds significant complexity on the reporting side. Though tempting because of the simplicity, unless you are trying to capture a specific type of data point (like geo-location data, math equations, etc.) usually you should try to avoid the use of extensions.
Which of the above makes the most sense is probably determined by what sort of response is being given, and whether or not questions are nested such that there is an overall result and sub-results or whether there is just overall results.
Similar to Lab Panel ( Diagnostic Report with logical set of findings), Vitals are captured/displayed on a system in logical group ( BP,Height, Weight, BMI, O2 Saturation etc.).
FHIR does provide a way to group related lab panel item (wrap all related Observations under Diagnostic Report)however for vitals, what I understood is each vital will be stored as separate Observation. This adds an extra set of logic to build the panel and need for querying multiple times.
What is the recommendation/approach around it, to present all vital entries as logical group?
You should look at this, which directly addresses the question: http://build.fhir.org/observation-vitalsigns.html
Both lab and vital signs would be handled similarly - each individual item would be captured as an Observation with the option of grouping them in a DiagnosticReport if appropriate. (Vitals might or might not be colated for reporting purposes, so the decision to create a DiagnosticReport would be dependent on local practice.)
Background:
I have been digging into the FHIR DSTU2 specification to try and determine what is the most appropriate resource(s) to represent a particular patient's historical list of GPs/PCPs. I am struggling to find an ideal resource to house this information.
The primary criteria I have been using is to identify the proper resource is that it must provide values to associate a patient to a practitioner for a period of time.
Question:
What is the proper resource to represent historical pcp/gp information that can be tied back to a patient resource?
What I have explored:
Here is a list of my possible picks thus far. I paired the resource types with my thought process on why I'm not confident about using it:
Episode of Care - This seems to have the most potential. It has the associations between a patient and a set of doctors for a given time period. However, when I read its description and use-case scenarios, it seems like I would be bastardizing its usage to fit my needs, since it embodies a period of time where a group of related health care activities were performed.
Group - Very generic structure that could fit based on its definition. However, I want to rule out other options before taking this approach.
Care Plan - Similar to Episode of Care rational. It seems like a bastardization to just use this to house PCP/GP history information. The scope of this is much bigger and patient/condition-centric.
I understand that there may not be a clear answer and thus, the question might run the risk of becoming subjective and I apologize in advance if this is the case. Just wondering if anyone can provide concrete evidence of where this information should be stored.
Thanks!
That's not a use-case we've really encountered before. The best possibility is to use the new CareTeam resource (we're splitting out CareTeam from EpisodeOfCare and CarePlan) - take a look at the continuous integration build for a draft.
If you need to use DSTU 2, you could just look at Patient.careProvider and rely on "history" to see changes over time. Or use Basic to look like the new CareTeam resource.
We've got a surprisingly complex workflow that needs to be monitored by a quasi-technical employees with an in-house webapp. There's about 30 steps, some of which are manual (editing), some are semi-automated stop points (like "the files have been received" or customer approval of certain templates), and some are completely automated (file conversion, search indexing, etc). The flowchart for all of these steps is large and complicated, and three people might be working on three completely different steps at any one time.
How would you present this vast amount of information as usefully as possible to your users? Just showing the whole diagram seems like the brute force solution. But it's big, and it'll likely get bigger as we do more things. Not to mention the complexity necessary to encode this entire diagram in HTML.
I assume you don't want to show these just for entertainment or mockery, but help the users along the way, automating as much as possible, document the process etc. It would probably help if you clearly define the goals or purpose of your app.
I don't see a point in showing the entire workflow, except for "debugging the business rules" or maybe the clients want to see it.
If your goal is to help users do their job, I would present the state of the "project" (or whatever term fits better) is at, and possible transitions to other states.
The State might be multiple mostly independent variables, e.g. one might describe the progress of content - e.g. "incomplete" / "complete" / "reviewed by 2nd staffer" / "signed off by 2nd staffer", others might contain a schedule that is developed in parallel, e.g. "test print date = not scheduled", "print date = not scheduled", "final delivery = tomorrow, preferredly yesterday".
A transition might be "Seint to customer for review", "mark as content-complete", "content modified", etc.
Is this what you have in mind?
I propose to divide your workflow in modules and represent the active state for each module.
A module is a subset of your main workflow. For example it could be divided by tasks, person, roles, department, etc. This will greatly simplify the representation of the workflow. Let's says someone is responsible for data entry at many critical moments. We can group all his tasks in one module (or sub-workflow) containing the same activities, inputs, outputs and conditions. Modules could be inter-dependants and related.
A state is where we are located in a module. In simple workflows there is only one active task. In real life we are multi-threaded! So maybe in one module many states could be active at the same time. The state also includes active inputs, outputs and memory bits.
An input is something required to perform an activity for evaluation a boolean condition. It could be a document, a piece of data, a signal...
An output is something resulting from a task: an information, a document, a signal...
Enough definitions?
Then simply convert your workflow into a LADDER LOGIC and you have your states!
See Ladder Logic definition on Wikipedia
You display only active states:
Active task(s) for the module
Inputs required / inputs confirmed
Output required / output realized
Conditions to continue
Seems abstract?
Here is a small example...
Janet enters data in the system. She manages the green tasks of the diagram. We focus only on her work, not other tasks. She knows how to do 16 tasks in the workflow. We are waiting the following actions from her to continue, and her Intranet dashboard says:
Priority 1: You must send a PO to order enough pencils for the next month based on the sales report.
Task: Send a purchase order
Inputs: Forecast report from the marketing department
Outputs: PO, vendor, item, quantity
Condition for completion: PO sent and order confirmation received from supplier
Priority 2: You must enter into the financial system the number of erasers rejected by production
Task: Data entry
Inputs: Reject count from production
Outputs: Number of rejects
Condition for completion: data entered and confirmed
We do a lot of troubleshooting on automated production systems having hundreds of thousands ladder steps (the workflow is too complex to be represented in a whole). When the system is blocked we look at each module and determine what inputs are missing to activation task completion.
Good luck!
This sounds like the sort of application for which BPEL is suited.
Of course you don't want to re-architect your system right now. But there are a number of BPEL implmentations out there, some of which include graphical editing tools. One of these might help you in your current situation, because they are good at handling scope and hiding detail. So I think you might derive benefit from drawing your workflow as a BPEL diagram even if you don't do anything else with the language.
The Wikipedia page lists several of the available implementations. In addition, Oracle's JDeveloper IDE includes a BPEL Diagrammer as part of its SOA suite; unfortunately it is no longer part of the standard install but it is still available. Find out more.
Try doing it in layers. You have the most detailed layer done, now add additional docs with the details hidden, grouped into higher-level business processes. Users should be able to safely ignore some of those details, but it's good for them to have visibility of how their part fits in to the whole.
You may need more than one higher-level document.
You can use Prezi to present this information to users in a lucid manner.
Split and present the work flow into phases such that the end user is easily able to identify the phase he is currently in.
Display as many number of phases as the number of inputs. The workflow starts with 6 different inputs so display the six different buttons on screen enabling the user to select the input that he wants.
On selecting the button zoom into the workflow depicting the next steps. This would also help the user to verify the actions that he has done so far to reach the current states.
This would also help the user to verify the actions that he has done so far to reach the current states. But this way of presenting could become cumbersome for the users as the number of steps that he has completed goes up. Say the user has almost reached the end of the workflow. To check for the next step he should go through all the steps which might frustrate the user.
To avoid this you can split the complete work flow chronologically into 3-5 phases. The phases should be split logically. The ultimate aim would be not to overwhelm the users with the full work flow. Personally i would try to avoid the task involving this workflow if presented the way you have shown. No offense. I bet you also feel the same.
Could give you a better picture if you could re-post the image after replacing the state names with numbers.
I'd recommend having the whole flow documented somewhere, but in terms of what is distributed to users, how about focusing on task-oriented flows? No one user will be responsible for the entire process I would imagine.
For example, let's say I have 2 roles, A and B, and 6 tasks, 1 through 6, executed in order. Each task may have multiple steps but is self-contained (e.g. download the file, review, run process, review again, upload). A does the even tasks and B does the odd tasks.
A would need to know about those detailed steps that comprise tasks 2, 4, and 6 but not about what goes on in 1, 3, and 5. So hand A a detailed set of flows for the tasks he is responsible for, along with a diagram that treats each task as a black box.
If the flow can't be made modular in this way, you may want to review the process itself to see why it's so complex.
How about showing an example of a workflow scenario, that is, showing the transitions in one possible passing through the workflow? You could cater this to a specific user profile and highlight the pertinent states, dimming the others. This allows them to get a clear idea of the transitions by seeing a real-life example.