Representing parallel operations in flowcharts

I need to do a flowchart of a hydraulic system featuring a temperature regulation module. However, temperature regulation is only activated during one part of the cycle. During this part, the system continues to perform other operations.
I want to represent that in my flowchart diagram. I need a way to tell when the parallel algorithm begins and when it ends within the main flowchart. How do I do that?

Add two new flowchart nodes/operators: fork and join.
Fork takes one control flow coming in, and produces two going out, similar to the diamond decision node in regular flowcharts. Unlike the decision node, the fork node passes control to both of its children, meaning that "parallel execution starts here".
Join takes two control flows in, and produces one control flow out. It means "wait for execution on both input branches to complete, then pass control to the output branch", i.e. "parallel execution stops here".
Now your flowchart can have serial and parallel sections; you can even have the parallel sections generate additional, nested parallelism.
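To make those semantics concrete, here is a minimal sketch of the same fork/join idea in C++ threads (the function names are invented for illustration): creating the thread is the fork, and join() is the join.

    #include <thread>
    #include <iostream>

    void regulate_temperature() { /* active only during this part of the cycle */ }
    void continue_main_cycle()  { /* the other operations keep running */ }

    int main() {
        // Fork: one incoming control flow splits into two parallel branches.
        std::thread branch(regulate_temperature);
        continue_main_cycle();  // the main branch keeps executing meanwhile

        // Join: wait for both branches to complete before passing control on.
        branch.join();
        std::cout << "parallel section done; serial execution resumes\n";
    }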
Now you need a graphical representation of fork and join.
We represent the control flow graph of real computer programs with essentially a flowchart ("actions" and "decisions"). Because of the similarity of "fork" to "decision" (one input, two outputs) we chose to draw "fork" as an upward facing triangle (one input at the top, two outputs at the triangle base endpoints), and "join" as a downward facing triangle, with two inputs at the triangle base endpoints and one output at the peak of the downward facing triangle.
You can see automatically generated examples of this for C and COBOL programs. You might not think these contain parallelism... but, in fact, the semantics of many languages are partially unspecified, allowing parallel execution in effect (according to the C standard, function arguments can be evaluated in an arbitrary order, for example). We model this nondeterminism as "parallelism" because the net effect is the same.
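As a tiny sketch of that argument-order point (f, g, and h are hypothetical):

    #include <cstdio>

    int g() { std::puts("g"); return 1; }
    int h() { std::puts("h"); return 2; }
    void f(int a, int b) { std::printf("%d %d\n", a, b); }

    int main() {
        // The C and C++ standards leave the evaluation order of g() and h()
        // unspecified: a conforming compiler may print "g h" or "h g".
        f(g(), h());
    }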
People building industrial control software also need to express this. They simply split the control flow line going from one flow graph item to another. See Sequential function charts in this document.
The most general "control flow" notation I know of is Colored Petri Nets (CPNs). These can model not just control flow, but data flow, synchronization, and arithmetic, which means you can model systems with very complex flows. You can model a regular flowchart directly with a CPN. CPNs also generalize finite state machines. What that means for programmers is: if you don't know about CPNs, you should learn about them now. You will discover that "flowcharts" (as CPNs) are now useful again in discussing systems whose parts run asynchronously.
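As a minimal sketch of the idea, here is a plain (uncolored) place/transition net in C++; its two transitions are exactly the fork and join described above, and a full CPN would additionally attach data values to the tokens:

    #include <array>
    #include <cstdio>
    #include <vector>

    // A transition fires when every input place holds a token, consuming one
    // token per input place and producing one per output place.
    struct Transition {
        const char* name;
        std::vector<int> in, out;  // indices of input/output places
    };

    int main() {
        std::array<int, 4> marking{1, 0, 0, 0};  // tokens per place
        std::vector<Transition> net{
            {"fork", {0}, {1, 2}},  // one token in, two tokens out
            {"join", {1, 2}, {3}},  // waits for BOTH branches to finish
        };
        for (bool fired = true; fired;) {
            fired = false;
            for (const auto& t : net) {
                bool enabled = true;
                for (int p : t.in) enabled = enabled && marking[p] > 0;
                if (!enabled) continue;
                for (int p : t.in) --marking[p];
                for (int p : t.out) ++marking[p];
                std::printf("fired %s\n", t.name);
                fired = true;
            }
        }
    }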


Meaning of merge, phi, effectphi and dead in v8 terminology

I’m trying to read the v8 source code (in particular the compiler part of it) to understand better the optimisation and reduction procedures (in order to look for bugs).
I ran into a few terms that are used in the comments but seem to be unexplained. The comment is this:
// Check if this is a merge that belongs to an unused diamond, which means
// that:
//
// a) the {Merge} has no {Phi} or {EffectPhi} uses, and
// b) the {Merge} has two inputs, one {IfTrue} and one {IfFalse}, which are
// both owned by the Merge, and
// c) and the {IfTrue} and {IfFalse} nodes point to the same {Branch}.
What do the terms Merge, Phi and EffectPhi mean? Also, does marking a node as “dead” mean that it will be treated as redundant?
Thanks in advance.
The link to the above code is this:
https://chromium.googlesource.com/v8/v8.git/+/refs/heads/master/src/compiler/common-operator-reducer.cc
V8 developer here. As background knowledge, it helps to know that V8's "Turbofan" compiler uses the "SSA" (static single assignment) and "sea of nodes" concepts. There are various compiler textbooks and research papers that explain these in great detail. To answer your questions in short:
A "Merge" node merges two control nodes, i.e. two branches of control flow. You can think of it as the "opposite" of a Branch, or the equivalent of a Phi for control nodes. Control nodes are the mechanism that Turbofan's sea-of-nodes design uses to make sure nodes aren't reordered across certain control flow boundaries.
A "Phi" node merges the two (or more) possibilities for a value that have been computed by different branches. See https://en.wikipedia.org/wiki/Static_single_assignment_form for more.
An "EffectPhi" is a special version of a Phi node that's used for nodes on the "effect chain". The effect chain is the mechanism Turbofan uses to make sure that nodes' external effects (like memory loads and stores) aren't visibly reordered.
A "dead" node is one that's unreachable and can be eliminated. So it's "redundant" in the sense of "superfluous/unnecessary", but not in the sense of "the same as another node".

How to draw sequence diagram for multiple-stages use case?

In the current system, some functionalities/use cases may need multiple stages to complete their whole procedure. Examples include user registration, which may consist of two stages: inserting the user record into the database and email activation. Or money transfer in net banking, which requires the stages of filling in transfer information and SMS verification before the actual transaction happens. I want to know how to draw a sequence diagram for this kind of functionality/use case.
To be more specific, I have drawn a sequence diagram for the user registration example using the MVC pattern, shown at the bottom. On this diagram, the two stages are divided by the red line. To keep it simple, I have also skipped the logic that handles failures or exceptions, like invalid input or a username already existing in the database. It seems that there are many objects involved in this diagram; should I draw a sequence diagram this way, or is it better to use two sequence diagrams, one for each stage?
Generally speaking, the smaller the diagram, the better. If you are doing it for documentation or informational purposes, then clarity should be your priority.
a) breaking it up
In your particular example, breaking it into (at least) two sequence diagrams would be a good option, as the two activities are usually performed at different times.
The way you could look at this is:
the first diagram takes an unregistered user as input and produces an unverified user
the second diagram takes an unverified user as input and produces a verified user
That way you wouldn't need to worry about connecting the two diagrams as they are really not that dependent.
b) using interaction overview
That being said, UML has interaction overview diagrams (http://www.uml-diagrams.org/interaction-overview-diagrams.html) that allow you to sew together various diagrams, so you could split the interaction into two and then connect the pieces in different ways. Or you could use a state machine diagram to track the mutation of the user. The UML vocabulary is rather large, so there is no need to limit yourself to just one diagram type.

FPGA logic cells

I have a small presentation about FPGA technology. My question is: if your FPGA has 85k logic cells, does this mean it can run 85k operations simultaneously?
What I am trying to achieve is to shock the audience with some striking illustrative facts about FPGA technology. The people listening know very little about FPGAs, so I want to impress them.
What's inside a 'cell' can vary per manufacturer, but the Xilinx definition (using this manufacturer as an example, as these are the devices that I'm familiar with) is one four-input look-up table, and one register. Xilinx devices are made up of a number of 'slices', and these contain a number of functional elements. These might include:
Look-up tables
Registers
Multiplexers
Logic for use in carry chains
etc
As an example, a Spartan6 LX4 has 600 slices, and the marketing material claims that this is equivalent to 3840 'logic cells'. You can look in the user guide for a device to determine exactly what is contained inside a slice.
In addition to this, there are other resources such as multipliers, memories, PLLs, etc.
I suppose you could say that one logic cell can perform one operation, but a single cell is only capable of very simple operations, for example an AND gate, 2:1 multiplexer, etc.
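To make "one cell, one simple operation" concrete, here is a small C++ model of a four-input LUT: the whole cell is a 16-entry truth table, so any boolean function of four inputs fits, but nothing wider does.

    #include <cstdint>
    #include <cstdio>

    // Bit i of `table` holds the output for input combination i, so ANY
    // boolean function of four inputs is just a different 16-bit constant.
    struct Lut4 {
        std::uint16_t table;
        bool eval(bool a, bool b, bool c, bool d) const {
            unsigned idx = (a << 0) | (b << 1) | (c << 2) | (d << 3);
            return (table >> idx) & 1u;
        }
    };

    int main() {
        Lut4 and4{0x8000};  // output is 1 only when all four inputs are 1
        Lut4 xor4{0x6996};  // parity of the four inputs
        std::printf("%d %d\n", and4.eval(1, 1, 1, 1), xor4.eval(1, 0, 1, 0));
    }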
I would say no, but it depends on what you mean by an operation. A logic cell has the capability to implement a number of logical functions (and/or/xor), and it has the ability to hold a state with storage elements. These two functions are how every digital system under the sun operates. Even addition and subtraction are higher level constructs built on top of logical functions. As noted in other answers, FPGA manufacturers publish guides on what is inside their logic cell. It is this fundamental cell that is stamped repeatedly into the die to create the "array", as in Field Programmable Gate "Array".
This yields a distinctly "more or less" answer. The logic blocks can be used in multiple modes, and you might even be able to pack more than one function into a single block (including with two independent outputs), but you must also be able to transport meaningful data for them to work on. It sounds like you have a 7z020 as an example. You may want to note that besides those logic cells, it also has 220 hardware multiply+add blocks. That amount is not random; the surrounding logic is enough to keep them fed every cycle, in particular cases. Looking in the 7 Series FPGAs Configurable Logic Block User Guide (UG474), we find that the Logic Cells number given is an estimate of equivalent 4LUT+FF configurations. The reason this number is lower than the number of flipflops (106k) is that the two 5-LUTs you can split a 6-LUT into must share some of their inputs.
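For the arithmetic, assuming the 7z020 figures I remember (53,200 six-input LUTs and 106,400 flip-flops): 53,200 × 1.6 ≈ 85,000, using Xilinx's customary conversion factor of roughly 1.6 equivalent 4LUT+FF logic cells per 6-LUT.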

How can I visually compare refactoring options?

I have an existing system model, representing services calling APIs backed by one or more implementations in perhaps other services, in the form of a DAG. I've been rendering this with GraphViz dot files' digraph, which is fairly reasonable, though it is sometimes difficult to enforce layering.
While considering various possible refactorings of services and APIs, I'd like to be able to chart alternative routes towards an end goal. Each refactoring step would yield a different DAG – represented in terms of diffs from the previous DAG (e.g., convert service A's use of API x in service B to API y in service C) – and similarly renderable.
What tools are there for creating such "refactoring paths" and then visualizing flows between them and determining dependencies and parallelism? Extra bonus points for goal seeking (e.g., no dependencies from any service other than service A on service C; cheapest path based on weights) and for providing a loose ordering of refactorings that demonstrates their (presumably) monotonically increasing system value.
I am picturing two UI components:
a DAG diff that visually shows the nodes/edges that got replaced in the source graph with nodes/edges in the second
a controlling display that acts like D3's force layout and lets you navigate the loosely connected DAG of refactorings, selecting the refactoring whose before/after picture you'd like to see in the DAG diff.
That said, I'm totally open to other tooling, formats, etc. I would just like to be able to produce these and display them to other people to show why what we're doing is valuable (goal assertion) and why it takes as long as it does (dependencies and Gantt charts).
Would it be feasible for you to create a separate tool that, given two similar DAGs, outputs the "merge" of them? If that's possible, then visualizing the merge DAG will probably tell you a lot about both DAGs. You can color-code the nodes by whether they appear in both DAGs or in only one of them.
That's how we originally designed the visual diffs of workflow graphs in VisTrails, see here.
If you insist on showing the two DAGs side by side, creating the merge DAG might still be the right idea, because then dot can lay the merge graph out, and you can simply hide the appropriate nodes for each subgraph. In this way, the shared structure will be laid out consistently by construction.
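As a sketch of that merge idea (the node names and colors here are invented), a small C++ program can compute the merge of two edge lists and emit a color-coded dot graph directly:

    #include <cstdio>
    #include <set>
    #include <string>
    #include <utility>

    using Edge = std::pair<std::string, std::string>;

    // Emit the "merge" of two DAGs as one dot graph: edges present in both
    // are black, edges only in the old DAG red, edges only in the new green.
    void emit_merge(const std::set<Edge>& before, const std::set<Edge>& after) {
        std::puts("digraph merge {");
        for (const auto& [s, t] : before) {
            const char* color = after.count({s, t}) ? "black" : "red";
            std::printf("  \"%s\" -> \"%s\" [color=%s];\n", s.c_str(), t.c_str(), color);
        }
        for (const auto& [s, t] : after)
            if (!before.count({s, t}))
                std::printf("  \"%s\" -> \"%s\" [color=green];\n", s.c_str(), t.c_str());
        std::puts("}");
    }

    int main() {
        // Hypothetical refactoring: service A moves from B's API x to C's API y.
        emit_merge({{"A", "B.x"}, {"B.x", "B"}},
                   {{"A", "C.y"}, {"C.y", "C"}});
    }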

Design tips for synchronising signals through a VHDL pipeline

I am designing a video pixel data processing pipeline in VHDL which involves several steps including multiply and divide.
I want to keep signals synchronised so that I can e.g. maintain a sync signal and output it correctly at the end of the pipeline along with manipulated pixel data which has been through several processing stages.
I assume I want to use shift registers or something to delay signals by the right number of cycles so that the output is correct, but I'm looking for advice about good ways to design this, particularly as the number of pipeline stages for different signals may vary as I evolve the design.
Good question.
I'm not aware of a complete solution but here are two partial strategies...
Interconnecting components... It would be really nice if a component could export a generic whose value was its pipeline depth. Unfortunately you can't, and dedicating a port to this seems silly (though it's probably workable; as it would be an integer constant, it would disappear in synthesis).
Failing that, pass IN a generic indicating the budget for this module. Inside the module, assert (severity FAILURE) if the budget can't be met... (this assert is checkable at synth time and at least Xilinx XST handles similar asserts)
Make the budget a hard number, and either assert if not equal to actual pipeline depth, or add pipe stages inside the module if the budget is too large, and only assert if the budget is too small.
That way you are connecting predictable modules, and the top level can perform pipeline arithmetic to balance things (e.g. passing a computed constant value to a programmable delay line)
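The budget-checking idea is not VHDL-specific; here is the same scheme sketched in C++ templates, with static_assert standing in for the synthesis-time assert (the module and its depth are made up):

    // The module "exports" its true pipeline depth as a constant, and the
    // compile-time check fires if the budget passed in can't be met.
    template <int Budget>
    struct Multiplier {
        static constexpr int kDepth = 4;  // actual pipeline depth
        static_assert(Budget >= kDepth, "pipeline budget too small");
        static constexpr int kPadding = Budget - kDepth;  // extra stages to add
    };

    int main() {
        Multiplier<6> ok;  // meets the budget; two padding stages are implied
        (void)ok;
        // Multiplier<3> bad;  // would fail at compile time, much as the
        //                     // assert severity FAILURE does at synth time
    }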
Within a component... I use a single process, with registers represented as internal signals whose names reflect their pipe stage, exponent_1, exponent_2, exponent_3 and so on. Within the process, the first section describes all the actions for the first cycle, the second section describes the second cycle, and so on. Typically the "easier" paths may be copied verbatim to the next pipe stage, just to sync them with the critical path. The process is fairly organised and easy to maintain.
I might break a 32-bit multiply down into 16*16 chunks and pipeline the partial product additions. The control this gives used to produce better results than XST alone did...
I know some people prefer variables within a process, and I use them for intermediate results in a pipe stage, but using signals I can describe the pipeline in its natural order (thanks to postponed assignment) whereas using variables, I would have to describe it backwards!
I create a package for each of my major processing blocks, one of the constants in there is the processing delay of that block. I can then connect that up to my general-purpose "delay-line" block which has a generic for the number of cycles.
Keeping that constant in "sync" with the actual implementation is best done by a self-checking testbench.
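As a software model of that pairing (the constant name is invented), a delay line is just a shift register whose depth comes from the block's published constant:

    #include <array>
    #include <cstddef>

    constexpr std::size_t kBlockDelay = 5;  // would live in the block's package

    // General-purpose delay line: returns the input from Depth cycles ago.
    template <typename T, std::size_t Depth>
    class DelayLine {
        std::array<T, Depth> taps_{};
        std::size_t head_ = 0;
    public:
        T clock(T in) {  // call once per clock cycle
            T out = taps_[head_];
            taps_[head_] = in;
            head_ = (head_ + 1) % Depth;
            return out;
        }
    };

    int main() {
        DelayLine<bool, kBlockDelay> sync_delay;  // keeps the sync flag aligned
        for (int i = 0; i < 8; ++i)
            sync_delay.clock(i == 0);  // the pulse re-emerges kBlockDelay clocks later
    }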
Something to consider is delay lines (i.e. back-to-back registers) vs. FIFOs.
Consider a module X with a pipeline delay N. FIFOs work well when N is variable. The trick is remembering that you can only request new work when both the module and the FIFO can accept it. Ideally you size the FIFO so that it can contain the maximum number of items that X can work on concurrently, but sometimes that's not practical, for example if your calculation includes accesses to a distant memory.
Another option is integrating the side channel (i.e. the path that your sync flag takes) into module X rather than routing it around the outside. If you do this, then whenever any part of the calculation has to stall, you can also stall the side channel and the two stay in sync. You can do this because you're in a scope that has all the necessary signals in it. Then all signals, whether used in the calculation or not, appear at the output at the same time.
