I am not sure how to express my scenario using activity diagrams:
What I am trying to visualise is the fact that:
A message is received
Two independent and concurrent actions take place: logging the message and processing the message
Logging always takes less time than processing
The first activity in the diagram is correct in the sense that the actions are independent, but it does not convey the fact that logging is guaranteed to take less time than processing.
The second activity in the diagram is not correct because, even if logging completes before processing, it looks as though processing depends on logging finishing first, which does not represent reality.
Here is a non-computer related example:
You are a novice in birdwatching, trying to make your first notes in your notebook about birds passing by
A flock of birds approaches, you try to recognise as many details as possible
You want to write down the details in your notebook, but wait: you begin to realise that your theoretical background does not work in practice. What should be a quick scribble amounts to nothing in the end because you did not recognise anything
In the meantime, the birds majestically fly away without waiting for you; the activity is over
Or maybe you did write it down; it took you only a moment, and the birds are still nearby, slowly flying away, ending the activity after some time
Or maybe you were so awestruck that you just kept watching them without taking any notes - they fly away, disappearing over the horizon, ending the activity
After a few hours, you have enough notes and you come home very happy - maybe you did not capture everything but this was enough to make you smile anyway
I can always add a comment to a diagram to express it all somehow but I wonder, is there a more structured way to express what I described in an activity diagram? If not an activity diagram then what kind of a diagram would be better suited in your opinion? Thank you.
Your first diagram assumes that logging always takes less time than processing:
If this assumption holds, the upper flow reaches its flow-final node, and the remaining flows continue until the first of them reaches the activity-final node. Here, the processing continues and the activity ends when the processing ends. This is exactly what you want.
But if execution ever deviated from this assumption and logging were delayed for any reason, the processing flow would reach the activity-final node first, immediately terminating all other ongoing actions. So logging would not complete. Maybe that's not a problem for you, but in most cases auditors expect logs to be complete.
A safer approach is to add a join node:
The advantage is that the activity does not depend on any assumptions. It will always work:
whenever logging is faster, its token waits at the join node; as soon as processing is finished, the join fires and the outgoing token (safely) reaches the end. This is exactly what you currently expect.
if logging is exceptionally slower, no problem: the processing will be over, but the activity will wait for the logging to complete.
This robust notation makes logging like Schrödinger's cat in its box: we don't have to know which action is longer or shorter. At the end of the activity, both actions are completed.
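As a rough analogy only (UML token semantics are not Python), the two diagrams can be mimicked with asyncio: reaching the activity-final node behaves like cancelling whatever is still running, while a join behaves like waiting for both tasks. The function names and the sleep-based delays are hypothetical stand-ins for the real actions.

```python
import asyncio
import contextlib


async def log_message(record: list, delay: float) -> None:
    """Stand-in for the logging action; `delay` simulates how long it takes."""
    await asyncio.sleep(delay)
    record.append("logged")


async def process_message(record: list, delay: float) -> None:
    """Stand-in for the processing action."""
    await asyncio.sleep(delay)
    record.append("processed")


async def first_diagram(log_delay: float, proc_delay: float) -> list:
    """Fork; logging ends at a flow-final node, processing at the activity-final node."""
    record: list = []
    logging_task = asyncio.create_task(log_message(record, log_delay))
    processing_task = asyncio.create_task(process_message(record, proc_delay))
    await processing_task
    # Reaching the activity-final node aborts every action still running.
    logging_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await logging_task
    return record


async def join_diagram(log_delay: float, proc_delay: float) -> list:
    """Fork followed by a join: the activity ends only when both flows arrive."""
    record: list = []
    await asyncio.gather(
        log_message(record, log_delay),
        process_message(record, proc_delay),
    )
    return record
```

Running `asyncio.run(first_diagram(0.05, 0.01))` loses the log entry, whereas `join_diagram` with the same delays keeps both, which is the robustness argument for the join.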
Time in activity diagrams?
Activity diagrams are not really meant to express timing and duration. It's about the flow of control and the synchronization.
However, if time is important to you, you could:
visually make one action shorter than the other. This is super-ambiguous and absolutely meaningless from a formal UML point of view, but it's intuitive when readers see the parallel flows (a kind of subliminal communication ;-) ).
add a comment note to express your assumption in plain English. This has the advantage of being very clear and unambiguous.
use UML duration constraints. These are often used in timing diagrams, sometimes in sequence diagrams, but in general not in activity diagrams (personally I have never seen it, but the UML spec doesn't exclude it either).
Time is a very general concept in the UML spec, defined independently of any particular diagram. For example:
8.4.4.2: A Duration is a value of relative time given in an implementation specific textual format. Often a Duration is a non-negative integer expression representing the number of “time ticks” which may elapse during this duration.
8.5.1: An Interval is a range between two values, primarily for use in Constraints that assert that some other Element has a value in the given range. Intervals can be defined for any type of value, but they are especially useful for time and duration values as part of corresponding TimeConstraints and DurationConstraints.
In your case you have a duration observation for the processing (e.g. d), and a duration constraint for the logging (e.g. 0..d).
8.5.4.2: An IntervalConstraint is shown as an annotation of its constrainedElement. The general notation for Constraints may be used for an IntervalConstraint, with the specification Interval denoted textually (...).
Unfortunately, little more is said. The only graphical examples are for messages in sequence diagrams (Fig 8.5 and 17.5) and for timing diagrams (Fig 17.28 to 17.30). The notation could nevertheless be extrapolated to activity diagrams, but it would be so unusual that I'd rather recommend the comment note.
The following truth table resulted from the circuit below, which uses an SR (NOR) latch. I have tried several times to trace through the circuit to see how the truth-table values are produced, but I can't work it out. Can someone explain to me what is going on? This circuit was introduced in conjunction with race conditions, although I am not sure whether it has anything to do with them.
NOTE: "CLOCK" appears as a straight line to show how it is connected to everything. It is a normal clock that oscillates between 1 and 0 (this is how my instructor drew it).
Strictly, this does belong on EE. The other questions you've found are likely to be old - before EE was established.
You should look at the 1-to-0 transitions of the clock. When that occurs and only when that occurs, the value currently on S is transferred to Q.
The race condition appears when the clock signal is delayed, even by the tiny amount of copper track between real components. The actual waveform is not an instant 1-0 or 0-1 step; it ramps between the two values. A tiny variation between two components, one seeing the transition at, say, 2.7V and the other at 2.5V, would mean that the first component moves the value from S to Q fractionally before the second. So when the second component decides to transfer its value, it may see the value after the transfer has already occurred on the prior component. You therefore may have a race between the two. These delays can also be affected by supply-rail stability and temperature, so the whole arrangement can become unreliable if not carefully designed. The condition is often overcome by deliberately routing the clock so that it arrives at the last component in the chain first, giving that end of the chain a head start.
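The original circuit isn't reproduced here, so the following is only a generic, idealised model (no setup/hold window, clean edges, no metastability) of a chain of falling-edge-triggered stages, using a hypothetical `falling_edge` function. It shows how clock skew lets a later stage capture the previous stage's new value instead of its old one, and why clocking the far end of the chain first avoids it.

```python
def falling_edge(q, d_in, arrival, clk_to_q):
    """Apply one falling clock edge to a chain of edge-triggered stages.

    q        -- current output of each stage (stage 0 is fed by d_in)
    arrival  -- time at which each stage sees the falling edge (clock skew)
    clk_to_q -- delay between a stage seeing the edge and its output changing
    """
    new_q = list(q)
    for i in range(len(q)):
        if i == 0:
            sampled = d_in
        else:
            prev_changes_at = arrival[i - 1] + clk_to_q
            if arrival[i] >= prev_changes_at:
                # The previous stage has already updated, so its NEW value is
                # captured: the data races through two stages on one edge.
                sampled = new_q[i - 1]
            else:
                # Intended behaviour: capture the previous stage's OLD value.
                sampled = q[i - 1]
        new_q[i] = sampled
    return new_q
```

With `clk_to_q=1.0` and `arrival=[0.0, 1.5]`, a 1 at the input reaches both stages on a single edge (`[1, 1]`); with the skew reversed (`[1.5, 0.0]`), so that the last stage is clocked first, the chain shifts correctly (`[1, 0]`).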
I've worked on systems where replacing a component with a faster version caused the circuit to stop working. The new component was working too fast for the remainder of the circuit - and you needed to deliberately select (or use factory-selected) slower versions.
On a related note, before hard drives became cheap, and floppy drives (you may need to google that) before them, it was common to use cassette tapes (you'd be even more likely to need google for those). Cheap and cheerful was best. If you used a professional-quality recorder/player, you'd often get unusable results.
In most examples of DES I've seen, an Event triggers a State change and possibly schedules some new Events in the future. However, if I simulate a game of billiards, this is not the whole story.
In this case the Events of interest are the shots and the collisions of the balls with each other and with the cushion. The State consists of the position and velocity of each ball.
After a collision or a shot I will first recalculate a new State and from there I will calculate all possible future (first) collisions. The strange thing is that I will have to discard all Events which were scheduled previously as these describe collisions which were possible only before the state change.
So there seem to be two ways of doing DES.
One, where the future Events are computed from the State and all Events scheduled in the past are discarded with each State change (as in the Billiard example), and
another one, where each Event causes a state change and possibly schedules new Events, but where old Events are never discarded (as in most examples I've seen).
This is hard to believe.
The Billiard example also has the irritating property that future events are calculated from the global state of the system. All Balls need to be considered, not just the ones which participated in a collision or a shot.
I wonder if my Billiard example is different from classic DES. In any case, I am looking for the correct way to reason about such issues, i.e.
How do I know which Events are to be discarded?
How do I know which States to consider when scheduling future events?
Is there a possible "safe" or "foolproof" way to compute future events (at the cost of performance)?
An obvious answer is "it all depends on your problem domain". A more precise answer or a pointer to literature would be much appreciated.
Your example is not unique or different from other DES models.
There's a third option which you omitted, which is that when certain events occur, specific other events will be cancelled. For example, in an epidemic model you might schedule infection events. Each infection event subsequently schedules either 1) the critical time for the patient beyond which death becomes inevitable, with some probability and some delay corresponding to the patient's demographics, the mortality rate for that demographic, and the rate of progression for the disease; or 2) the patient's recovery. If medical interventions get queued up according to some triage strategy, treatment may or may not occur prior to the critical time. If not, a death gets scheduled; otherwise, the critical-time event is cancelled and a recovery event is scheduled.
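A minimal sketch of this third option, assuming Python's `heapq` as the event list: a cancelled event is simply marked and skipped when it reaches the front of the queue (lazy deletion) rather than being removed from the heap. The epidemic part is a toy with made-up delays, and it simplifies the model above by treating the critical time itself as the death; `Scheduler` and `demo` are hypothetical names.

```python
import heapq
import itertools


class Scheduler:
    """Minimal event list with cancellation by lazy deletion."""

    def __init__(self):
        self.now = 0.0
        self._queue = []               # entries: (time, event_id, action, args)
        self._ids = itertools.count()  # unique ids also break ties in time
        self._cancelled = set()

    def schedule(self, delay, action, *args):
        event_id = next(self._ids)
        heapq.heappush(self._queue, (self.now + delay, event_id, action, args))
        return event_id

    def cancel(self, event_id):
        # The entry stays in the heap and is skipped when it reaches the front.
        self._cancelled.add(event_id)

    def run(self):
        while self._queue:
            time, event_id, action, args = heapq.heappop(self._queue)
            if event_id in self._cancelled:
                self._cancelled.discard(event_id)
                continue
            self.now = time
            action(self, *args)


def demo():
    """Toy version of the epidemic example; all delays are made-up numbers."""
    log, dead = [], set()

    def infection(sim, patient, critical_delay, treatment_delay):
        log.append((sim.now, "infected", patient))
        critical_id = sim.schedule(critical_delay, critical_time, patient)
        sim.schedule(treatment_delay, treatment, patient, critical_id)

    def critical_time(sim, patient):
        dead.add(patient)
        log.append((sim.now, "death", patient))

    def treatment(sim, patient, critical_id):
        if patient in dead:
            return                    # treatment arrived too late
        sim.cancel(critical_id)       # treated in time: cancel the critical event
        log.append((sim.now, "treated", patient))
        sim.schedule(2.0, recovery, patient)

    def recovery(sim, patient):
        log.append((sim.now, "recovered", patient))

    sim = Scheduler()
    sim.schedule(0.0, infection, "p1", 5.0, 3.0)  # treated before critical time
    sim.schedule(0.0, infection, "p2", 2.0, 4.0)  # critical time comes first
    sim.run()
    return log
```

Patient p1 is treated at time 3, so its critical event at time 5 is skipped and a recovery happens instead; patient p2 reaches its critical time before treatment arrives.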
These sorts of event scheduling, event cancellation, and parameterizations so that you can identify which entities the scheduling/cancelling applies to can all be described by a notation called "event graphs," created by Lee Schruben. See 'Schruben, Lee 1983. Simulation modeling with event graphs. Communications of the ACM. 26: 957-963' for the original paper, or check out this tutorial from the 1996 Winter Simulation Conference which is freely available online.
You might also want to look at this paper titled "Simple Movement and Detection in Discrete Event Simulation", which appeared in the 2005 Winter Simulation Conference.
The State consists of the position and velocity of each ball.
Once you get that working, you'll need to add the spin and axis of rotation for each ball, since the proper use of spin is what differentiates the pros from the amateurs.
I will have to discard all Events which were scheduled previously
Yup, that's true, so don't bother scheduling them at all. See below.
So there seem to be two ways of doing DES (both involving the scheduling of events)
Actually, there's a third way. Simply search the problem space to determine the time of the first future event, and then jump to that time. There is no need to schedule Events. You only care about the one Event that will occur first.
All Balls need to be considered
Yes, this is true. Start by considering one of the balls and determining the time of its next collision. That time then puts an upper limit on how far the other balls can move. For example, imagine the first ball will collide after 0.1 seconds. Then the question for the second ball is, "Is it possible for the second ball to hit anything within 0.1 seconds?" If not, move on to the third ball. If so, reduce the time limit to the time it takes for the second ball to collide, and then move on to the third ball.
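To make the "jump straight to the first event" idea and the time-limit pruning concrete, here is a sketch under strong simplifying assumptions: a 1-D table, equal-mass balls of a single radius, no friction or spin. Instead of queuing events, `next_event` searches for the single earliest one. For the pruning it uses the fact that two balls cannot touch sooner than gap / (|v1| + |v2|), so a pair whose optimistic time already exceeds the current limit is skipped. All names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Ball:
    x: float  # centre position along the table
    v: float  # velocity (negative = moving left)


def wall_time(ball, radius, length):
    """Time until the ball touches the cushion at 0 or at `length`."""
    if ball.v > 0:
        return (length - radius - ball.x) / ball.v
    if ball.v < 0:
        return (radius - ball.x) / ball.v
    return math.inf


def next_event(balls, radius, length):
    """Search for the single earliest future event; nothing is queued."""
    best_t, best_event = math.inf, None
    for i, ball in enumerate(balls):
        t = wall_time(ball, radius, length)
        if t < best_t:
            best_t, best_event = t, ("cushion", i)
    for i in range(len(balls)):
        for j in range(i + 1, len(balls)):
            left, right = sorted((balls[i], balls[j]), key=lambda b: b.x)
            gap = right.x - left.x - 2 * radius
            max_closing = abs(left.v) + abs(right.v)
            # The pair cannot touch sooner than gap / max_closing, so skip it
            # if even that optimistic time cannot beat the current limit.
            if max_closing == 0 or gap / max_closing >= best_t:
                continue
            closing = left.v - right.v
            if closing > 0:
                t = gap / closing
                if t < best_t:
                    best_t, best_event = t, ("balls", i, j)
    return best_t, best_event


def step(balls, radius, length):
    """Jump to the next event and resolve it (equal masses, elastic)."""
    t, event = next_event(balls, radius, length)
    if event is None:
        return t, None
    for ball in balls:
        ball.x += ball.v * t
    if event[0] == "cushion":
        balls[event[1]].v = -balls[event[1]].v
    else:
        a, b = balls[event[1]], balls[event[2]]
        a.v, b.v = b.v, a.v  # 1-D elastic collision of equal masses swaps velocities
    return t, event
```

Each call to `step` recomputes from the current state only, so there is never a stale scheduled event to discard.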
An obvious answer is "it all depends on your problem domain"
That's true. My comments apply only to your example of a billiards simulation. For other problem domains, different rules apply.
I am making a BI system for a bank-like institution. This system should manage credit contracts, invoices, payments, penalties and interest.
Now, I need to make a method that builds an invoice. I have to calculate how much the customer has to pay right now. He has a debt, which he has to pay for. He also has to pay for the interest. If he was ever late with due payment, penalties are applied for each day he's late.
I thought there were 2 ways of doing this:
By having only 1 original state - the contract's original state. And each time to compute the monthly payment which the customer has to make, consider the actual, made payments.
By constantly making intermediary states, going from the last intermediary state, and considering only the events that took place between the times of these 2 intermediary states. This means having a job that runs periodically (daily, monthly), takes the last saved state, applies the changes (due payments, actual payments, changes in global constants like the penalty rate, which is controlled by the Central Bank), and saves the resulting state.
The benefits of the first variant:
Always actual. If changes were made with a date from the past (a guy came with a paid invoice 5 days after he made the payment to the bank), they will be correctly reflected in the results.
The flaws of the first variant:
Takes long to compute
Documents printed with the current results may later differ from them if the underlying data changes due to operations entered with a back date.
The benefits of the second variant:
Works fast, and aggregated data is always available for search and reports.
Simpler to compute
The flaws of the second variant:
Vulnerable to failed jobs.
Errors in the past propagate until the end, to the final results.
An intermediary result cannot be changed if new data from past transactions arrives (it can, but it's hard and has many implications, so I'd rather treat it as taboo)
Jobs cannot be performed successfully and without problems if an unfinished transaction exists (an issued invoice that wasn't yet paid)
Is there any other way? Can I combine the benefits from these two? Which one is used in other similar systems you've encountered? Please share any experience.
Problems of this nature are always more complicated than they first appear. This is a consequence of what I like to call the Rumsfeldian problem of the unknown unknown.
Basically, whatever you do now, be prepared to make adjustments for arbitrary future rules.
This is a tough proposition. Some future possibilities that may have a significant impact on your calculation model are back-dated payments, adjustments and charges. Forgiven interest periods may also become an issue (particularly if back-dated). So may requirements to provide various point-in-time (PIT) calculations based either on what was "known" at that PIT (past view of the past) or on transactions occurring after the reference PIT that were back-dated to a PIT before the reference (current view of the past). Calculations of this nature can be a real headache.
My advice would be to calculate from "scratch" (i.e. the first variant). Implement optimizations (e.g. the second variant) only when necessary to meet performance constraints. Doing calculations from the beginning is a compute-intensive model but is generally more flexible with respect to accommodating unexpected left turns.
If performance is a problem but the frequency of complicating factors (e.g. back-dated transactions) is relatively low, you could explore a hybrid model employing the best of both variants. Here you store the current state and calculate forward, using only those transactions that posted since the last stored state, to create a new current state. If you hit a "complication", redo the entire account from the beginning to re-establish the current state.
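The hybrid model could look roughly like this. Everything here is a hypothetical simplification: a single simple daily interest rate, no penalties, and amounts as floats (a real system would use a decimal type). The stored state is only a shortcut; whenever a back-dated transaction arrives, the account is replayed from the contract's original state, so both paths give the same answer.

```python
from dataclasses import dataclass
from datetime import date

DAILY_RATE = 0.0005  # hypothetical daily interest rate on the outstanding debt


@dataclass(frozen=True)
class Txn:
    day: date
    amount: float  # positive = new charge, negative = payment


@dataclass(frozen=True)
class State:
    day: date
    debt: float


def accrue(state, to_day):
    """Add simple interest for the days between state.day and to_day."""
    days = (to_day - state.day).days
    return State(to_day, state.debt + state.debt * DAILY_RATE * days)


def replay(start, txns):
    """Calculate forward from `start`, applying transactions in date order."""
    state = start
    for txn in sorted(txns, key=lambda t: t.day):
        state = accrue(state, txn.day)
        state = State(state.day, state.debt + txn.amount)
    return state


class Account:
    def __init__(self, opening):
        self.opening = opening  # the contract's original state
        self.history = []       # every transaction ever posted
        self.current = opening  # stored state, used only as a shortcut

    def post(self, txns):
        self.history.extend(txns)
        if any(t.day < self.current.day for t in txns):
            # Back-dated entry: the stored state is stale, so rebuild from scratch.
            self.current = replay(self.opening, self.history)
        else:
            # Normal case: roll the stored state forward with the new entries only.
            self.current = replay(self.current, txns)

    def amount_due(self, as_of):
        return accrue(self.current, as_of).debt
```

Because the from-scratch `replay(opening, history)` remains the reference computation, it can also serve as a check on the shortcut in tests.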
Being able to accommodate the unexpected without triggering a re-write is probably more important in the long run than shaving calculation time right now. Do not place restrictions on your computation model until you have to. Saving current state often brings with it a number of built-in assumptions and restrictions that reduce wiggle room for accommodating future requirements.
I'm trying to write a VB6 program (for a laugh) that will compute event times + the critical path JUST BASED ON A PRECEDENCE TABLE. I want my students to use it as a checking mechanism, i.e. to do everything without drawing the activity network. I'm happy that I can do all this once I've got start and finish events for each activity. But how do I allocate events without drawing the network? Everything I come up with works for one specific example and then doesn't work for another one. I need a more general algorithm and it's driving me mental. Help!
I am not a professional programmer - I do this in my spare time to create teaching resources - simple English would really be appreciated.
Okay, so you have a precedence table, which I take to be a table of pairs like
A→B
B→C
and so forth, for activities {A,B,C}. Each of the activities also has a duration and (maybe) a distribution on the duration, so you know A takes 3 days, B takes 2, and so on. This would be interpreted as "A must be finished before B which must be finished before C".
Right?
Now, the obvious thing to do is construct the graph of activities and arrows -- in fact, you basically have the graph there in edge-list form. The critical path is the greatest-weight (biggest sum of times) path. This is a longest-path problem, and assuming your chart isn't cyclic (which would be bad anyway), it can be solved with a topological sort or a transitive closure.
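For the precedence-table question, here is a sketch of that approach in Python (the asker uses VB6, but the steps translate directly): Kahn's topological sort, a forward pass for earliest start times, a backward pass for latest finish times, and the zero-float activities as the critical path. It assumes an activity-on-node view, so it sidesteps numbering events explicitly; `critical_path` and the example data are hypothetical.

```python
from collections import defaultdict, deque


def critical_path(durations, precedences):
    """durations: {activity: time}; precedences: [(before, after), ...]."""
    succ = defaultdict(list)
    indeg = {a: 0 for a in durations}
    for before, after in precedences:
        succ[before].append(after)
        indeg[after] += 1

    # Kahn's algorithm: repeatedly take activities with no unfinished predecessors.
    order = []
    queue = deque(a for a in durations if indeg[a] == 0)
    while queue:
        a = queue.popleft()
        order.append(a)
        for b in succ[a]:
            indeg[b] -= 1
            if indeg[b] == 0:
                queue.append(b)
    if len(order) != len(durations):
        raise ValueError("precedence table contains a cycle")

    # Forward pass: earliest start = latest earliest-finish of any predecessor.
    es = {a: 0 for a in durations}
    for a in order:
        for b in succ[a]:
            es[b] = max(es[b], es[a] + durations[a])
    project = max(es[a] + durations[a] for a in durations)

    # Backward pass: latest finish = earliest latest-start of any successor.
    lf = {a: project for a in durations}
    for a in reversed(order):
        for b in succ[a]:
            lf[a] = min(lf[a], lf[b] - durations[b])

    floats = {a: lf[a] - durations[a] - es[a] for a in durations}
    critical = [a for a in order if floats[a] == 0]
    return project, es, floats, critical
```

For example, with A=3, B=2, C=4, D=1 and A→B, A→C, B→D, C→D, the project takes 8, the critical activities are A, C and D, and B has 2 units of float.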
This question is about a whole class of similar problems, but I'll ask it as a concrete example.
I have a server with a file system whose contents fluctuate. I need to monitor the available space on this file system to ensure that it doesn't fill up. For the sake of argument, let's suppose that if it fills up, the server goes down.
It doesn't really matter what it is -- it might, for example, be a queue of "work".
During "normal" operation, the available space varies within "normal" limits, but there may be pathologies:
Some other (possibly external) component that adds work may run out of control
Some component that removes work seizes up, but remains undetected
The statistical characteristics of the process are basically unknown.
What I'm looking for is an algorithm that takes, as input, timed periodic measurements of the available space (alternative suggestions for input are welcome), and produces as output, an alarm when things are "abnormal" and the file system is "likely to fill up". It is obviously important to avoid false negatives, but almost as important to avoid false positives, to avoid numbing the brain of the sysadmin who gets the alarm.
I appreciate that there are alternative solutions like throwing more storage space at the underlying problem, but I have actually experienced instances where 1000 times wasn't enough.
Algorithms which consider stored historical measurements are fine, although on-the-fly algorithms which minimise the amount of historic data are preferred.
I have accepted Frank's answer, and am now going back to the drawing-board to study his references in depth.
There are, I think, three cases of interest (in no particular order):
The "Harrods' Sale has just started" scenario: a peak of activity that at one-second resolution is "off the dial", but doesn't represent a real danger of resource depletion;
The "Global Warming" scenario: needing to plan for (relatively) stable growth; and
The "Google is sending me an unsolicited copy of The Index" scenario: this will deplete all my resources in relatively short order unless I do something to stop it.
It's the last one that's (I think) the most interesting and challenging from a sysadmin's point of view.
If it is actually related to a queue of work, then queueing theory may be the best route to an answer.
For the general case you could perhaps attempt a (multiple?) linear regression on the historical data, to detect if there is a statistically significant rising trend in the resource usage that is likely to lead to problems if it continues (you may also be able to predict how long it must continue to lead to problems with this technique - just set a threshold for 'problem' and use the slope of the trend to determine how long it will take). You would have to play around with this and with the variables you collect though, to see if there is any statistically significant relationship that you can discover in the first place.
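A minimal sketch of the regression idea, assuming samples of free space taken at known times: fit a least-squares line, require the downward slope to stand out from the noise, and raise an alarm only if the projected time to the threshold falls inside a chosen horizon. The t-statistic cut-off of 3 is an arbitrary assumption, not a calibrated significance test, and the function names and parameters are hypothetical.

```python
import math


def fit_line(ts, ys):
    """Ordinary least-squares fit y = intercept + slope * t (needs >= 3 points)."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, ys))
    slope_se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, intercept, slope_se


def time_to_threshold(ts, free, threshold):
    """Time after the last sample until the fitted trend reaches `threshold`."""
    slope, intercept, _ = fit_line(ts, free)
    if slope >= 0:
        return math.inf
    t_hit = (threshold - intercept) / slope
    return max(0.0, t_hit - ts[-1])


def should_alarm(ts, free, threshold, horizon, min_t_stat=3.0):
    """Alarm only if the downward trend is clear AND would hit the threshold soon."""
    slope, _, slope_se = fit_line(ts, free)
    if slope >= 0:
        return False
    t_stat = math.inf if slope_se == 0 else -slope / slope_se
    return t_stat >= min_t_stat and time_to_threshold(ts, free, threshold) <= horizon
```

Requiring both conditions is what keeps short bursts (a clear but brief slope, or a slow drift far from the threshold) from paging the sysadmin.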
Although it covers a completely different topic (global warming), I've found tamino's blog (tamino.wordpress.com) to be a very good resource on statistical analysis of data that is full of knowns and unknowns. For example, see this post.
edit: as per my comment I think the problem is somewhat analogous to the GW problem. You have short term bursts of activity which average out to zero, and long term trends superimposed that you are interested in. Also there is probably more than one long term trend, and it changes from time to time. Tamino describes a technique which may be suitable for this, but unfortunately I cannot find the post I'm thinking of. It involves sliding regressions along the data (imagine multiple lines fitted to noisy data), and letting the data pick the inflection points. If you could do this then you could perhaps identify a significant change in the trend. Unfortunately it may only be identifiable after the fact, as you may need to accumulate a lot of data to get significance. But it might still be in time to head off resource depletion. At least it may give you a robust way to determine what kind of safety margin and resources in reserve you need in future.
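One simple way to "let the data pick the inflection point" is a two-segment least-squares fit that tries every split and keeps the one with the smallest combined residual. This is a generic changepoint sketch, not necessarily the technique from the post mentioned above; `best_breakpoint` and `min_segment` are hypothetical names, and real data would need a significance check before a detected change is trusted.

```python
def sse_of_line(ts, ys):
    """Residual sum of squares and slope of an ordinary least-squares line."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / sxx
    intercept = mean_y - slope * mean_t
    sse = sum((y - intercept - slope * t) ** 2 for t, y in zip(ts, ys))
    return sse, slope


def best_breakpoint(ts, ys, min_segment=3):
    """Try every split index and keep the one whose two fitted lines fit best."""
    best = None
    for k in range(min_segment, len(ts) - min_segment + 1):
        sse_a, slope_a = sse_of_line(ts[:k], ys[:k])
        sse_b, slope_b = sse_of_line(ts[k:], ys[k:])
        if best is None or sse_a + sse_b < best[0]:
            best = (sse_a + sse_b, k, slope_a, slope_b)
    _, k, slope_before, slope_after = best
    return k, slope_before, slope_after
```

Comparing `slope_before` and `slope_after` then gives a direct measure of how much the trend changed at the detected point.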