What are coregions in UML sequence diagrams?
Coregions are used when the sequence of events does not matter, that is, they can occur safely in any order.
This is one of the first few pages I found when I searched for "coregion sequence diagram" on Google.
The coregion is a notational/syntax choice for representing parallel CombinedFragments. The UML 2.2 Superstructure spec (14.3.3) says:
Parallel: The interactionOperator par designates that the CombinedFragment represents a parallel merge between the behaviors of the operands. The OccurrenceSpecifications of the different operands can be interleaved in any way as long as the ordering imposed by each operand as such is preserved. A parallel merge defines a set of traces that describes all the ways that OccurrenceSpecifications of the operands may be interleaved without obstructing the order of the OccurrenceSpecifications within the operand.
The answer above is correct; this is just more context.
UML is specified by the OMG in two documents (http://www.omg.org/spec/uml): the UML Infrastructure and the UML Superstructure. Any other documentation may not be official.
In the UML Superstructure, section 14.3.3, it is said:
A notational shorthand for parallel combined fragments are available for the common situation where the order of event occurrences (or other nested fragments) on one Lifeline is insignificant. This means that in a given “coregion” area of a Lifeline all the directly contained fragments are considered separate operands of a parallel combined fragment.
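To make the trace semantics concrete, here is a small Python sketch (my own illustration, not from the spec) that enumerates the traces of a two-operand par fragment: the event order within each operand is preserved, while the operands interleave freely. The event names m1, m2, m3 are made up.

    def interleavings(a, b):
        """Yield every merge of sequences a and b that preserves the internal
        order of each sequence -- the trace set of a two-operand par fragment."""
        if not a or not b:
            yield list(a) + list(b)
            return
        for rest in interleavings(a[1:], b):
            yield [a[0]] + rest
        for rest in interleavings(a, b[1:]):
            yield [b[0]] + rest

    # Operand 1 sends m1 then m2; operand 2 sends m3 (hypothetical event names).
    for trace in interleavings(["m1", "m2"], ["m3"]):
        print(trace)
    # ['m1', 'm2', 'm3'], ['m1', 'm3', 'm2'], ['m3', 'm1', 'm2']

A coregion on a single lifeline is just a shorthand for this, with each directly contained fragment acting as its own operand.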
I'm writing a program that's manipulating polynomials. I'm defining polynomials recursively as either a term (base case) or a sum or product of polynomials (recursive cases).
Sums and products are completely identical as far as their contents are concerned. They just contain a sequence of polynomials. But they need to be processed very differently. So to distinguish them I have to somehow tag my sequences of polynomials.
Currently I have two records - Sum and Product - defined. But this is causing my code to be littered with the line (:polynomials sum-or-product) to extract the contents of polynomials. Also printing out even small polynomials in the REPL produces so much boilerplate that I have to run everything through a dedicated prettyprinting routine if I want to make sense of it.
Alternatives I have considered are tagging my sums and products using metadata instead, or putting a + or * symbol at the head of the sequence. But I'm not convinced that either of these approaches is good style, and I'm wondering if there's perhaps another option I haven't considered yet.
Putting a + or * symbol at the head of the sequence sounds like it would print out nicely. I would try implementing the processing of these two different "types" via multimethods, which keeps the calling convention neat and extensible. That document starts from an object-oriented programming view, but the "area of a shape" example is a very neat illustration of what this approach can accomplish.
In your case you'd dispatch on the first element of the seq to determine whether you are dealing with a sum or a product of polynomials, and the multimethod would automagically use the correct implementation.
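For what it's worth, here is a rough Python analogue of that head-symbol dispatch (in Clojure you would use defmulti/defmethod with first as the dispatch function); the tuple representation and the names evaluate, DISPATCH, and "term" are purely illustrative.

    import math

    # Polynomials as tagged sequences: ("+", p1, p2, ...), ("*", p1, p2, ...),
    # ("term", coeff, exponent), or a bare number as the base case.
    def evaluate(poly, x):
        """Evaluate a tagged polynomial at x, dispatching on the head symbol."""
        if isinstance(poly, (int, float)):
            return poly
        head, *args = poly
        return DISPATCH[head](args, x)

    DISPATCH = {
        "+": lambda args, x: sum(evaluate(p, x) for p in args),
        "*": lambda args, x: math.prod(evaluate(p, x) for p in args),  # Python 3.8+
        "term": lambda args, x: args[0] * x ** args[1],
    }

    # (3x^2 + 1) * (x + 2), evaluated at x = 2 -> (12 + 1) * 4 = 52
    p = ("*", ("+", ("term", 3, 2), 1), ("+", ("term", 1, 1), 2))
    print(evaluate(p, 2))

As noted above, the tagged-sequence form also prints out compactly, unlike records.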
I have seen two different kinds of CRC algorithms. One kind is called "direct", the other "non-direct" or "indirect". The code for the two is a bit different, but both are able to calculate the same checksum if the direct type is supplied with a converted initial value.
I can successfully run both algorithms and I know how to convert the initial value. So this is no problem.
What I couldn't find out: why do these two algorithms exist? Is there something that one can do that the other can't? Are they redundant from the user's point of view?
UPDATE: You can find a testable online implementation (and C implementations of both algorithms) here. These terms (or at least one of them) are also mentioned in several other places, such as here ("direct table algorithm"), in a microcontroller reference document, in forums, etc.
The "direct" is referring to how to avoid processing n zero bits at the end for an n-bit CRC.
The mathematical definition of the CRC is a division of the message with n zero bits appended to it. You can avoid the extra operations by exclusive-oring the message with the CRC before operating on it instead of after. This requires processing the initial value of the register in the normal version through the CRC, and having that be the new initial value.
Since it is not necessary, you will never see a real-world CRC algorithm doing the extra operations.
See the section "10. A Slightly Mangled Table-Driven Implementation" in the document you link for a more detailed explanation.
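To make the difference concrete, here is a minimal bit-at-a-time sketch of both variants (my own illustration; the function names are mine, the polynomial is the CCITT one, and the message and initial value are arbitrary). The "non-direct" version explicitly appends n zero bits; the "direct" version XORs each message bit into the top of the register instead, and its equivalent initial value is obtained by running the non-direct initial value through the register once.

    def crc_nondirect(bits, poly, width, init):
        """Classic shift register: the message is followed by `width` zero bits."""
        mask, top = (1 << width) - 1, width - 1
        reg = init
        for b in list(bits) + [0] * width:      # explicit zero augmentation
            carry = (reg >> top) & 1
            reg = ((reg << 1) | b) & mask
            if carry:
                reg ^= poly
        return reg

    def crc_direct(bits, poly, width, init):
        """'Direct' form: XOR the message bit into the top of the register,
        so no trailing zero bits are needed."""
        mask, top = (1 << width) - 1, width - 1
        reg = init
        for b in bits:
            carry = ((reg >> top) & 1) ^ b
            reg = (reg << 1) & mask
            if carry:
                reg ^= poly
        return reg

    # CRC-16/CCITT polynomial 0x1021, 16 bits wide, arbitrary non-direct init.
    poly, width, init_nondirect = 0x1021, 16, 0xFFFF
    msg = [1, 0, 1, 1, 0, 0, 1, 0]              # message bits, MSB first

    # Convert the init: run it through the register with no message bits.
    init_direct = crc_nondirect([], poly, width, init_nondirect)

    assert (crc_direct(msg, poly, width, init_direct)
            == crc_nondirect(msg, poly, width, init_nondirect))

Real implementations are usually table-driven and process a byte at a time, but the relationship between the two forms is the same.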
I have a strand-specific RNA-seq library to assemble (Illumina), and I would like to use TopHat/Cufflinks. The TopHat manual says:
"--library-type TopHat will treat the reads as strand specific. Every read alignment will have an XS attribute tag. Consider supplying library type options below to select the correct RNA-seq protocol."
Does this mean that TopHat only supports strand-specific protocols? If I use the option "--library-type fr-unstranded" to run it, does that mean it runs in a strand-specific way? I googled it and asked the developers, but got no answer...
I got some results:
Here the contig is assembled from two groups of reads: the left side is reverse reads, while the right side is forward reads. (For visualization, I have reverse-complemented the right mate.)
But some of the contigs are assembled purely from reverse or purely from forward reads. If the library is strand-specific, one gene should produce reads in only one direction; it should not report a result like the image above, am I right? Or is it possible that one gene is fragmented and then sequenced independently, so that by chance the left part produces reverse reads while the right part produces forward reads? From my understanding, strand specificity is preserved by 3'/5' ligation, so it should hold at the level of whole genes.
What is the problem here? Or did I misunderstand the concept of 'strand-specific'? Any help is appreciated.
TopHat/Cufflinks are not for assembly; they are for alignment to an already assembled genome or transcriptome. What are you aligning your reads to?
Also, if you have strand-specific data, you shouldn't choose an unstranded library type. You should choose the proper one based on your library preparation method. The XS tag will only be placed on split reads if you choose an unstranded library type.
If you want to do a de novo assembly of your transcriptome you should take a look at assemblers (not mappers) like
Trinity
SoapDeNovo
Oases....
TopHat can deal with both stranded and unstranded libraries. In your snapshot the center region does have both + and - strand reads. The biases at the two ends might be characteristics of your library prep or analysis methods. What is the direction of this gene? The coverage looks a little biased towards the left side. If the left-hand side corresponds to the 3' end, then it's likely that your library prep has 3'-bias features (e.g. dT-primed reverse transcription). The way you fragment your RNA may also affect the read distribution.
I guess we need more information to find the truth. But we should also keep in mind that TopHat/Cufflinks may have bugs, too.
Which one is the correct way to show the RE union 0+1? I have seen it done these two ways, but I think both are correct. If both are correct, why complicate things?
They are both correct, as you stated.
The first one looks like it was generated using a set of standard rules -- in this case, it's overkill (and just looks silly), but in more complicated cases it's easier to follow easy rules than to hold the whole thing in your head and write an equivalent NFA from scratch.
In general, an NFA can be rewritten such that it has a single final state (obviously there's already only one start state).
Then, two NFAs in this form can be combined in such a way that the language they accept when combined is the union of the languages they accept individually -- this corresponds to the or (+) in a regular expression. To combine the NFAs in this way, simply create a new node to act as the start state and connect it with ε-transitions to the start states of the two NFAs.
Then, in order to neatly end the NFA in a single final state (so that we can use this NFA recursively for other unions if we want), we create an extra node to serve as the unified final state and ε-connect the old final states (which lose their final status) to it.
Using the general rules above, it's easy to arrive at the first diagram (two NFAs unioned together, the first matching 0, the other 1) -- the second is easy to arrive at via common sense since it's such a simple regex ;-)
The first construct belongs to the class of NFAs with ε-moves, which is an extension of the general NFA class. ε-moves give you the ability to make transitions without consuming any input. For the transition function, it is important to compute the set of states reachable from a given state using ε-transitions only (the ε-closure). Obviously, adding ε-moves does not allow an NFA to accept non-regular languages, so it is equivalent to plain NFAs, and therefore to DFAs, in the end.
NFAs with ε-moves are used by Thompson's construction algorithm to build an automaton from any regular expression. It provides a standard way to construct an automaton from a regex, which is handy when you want to automate the construction.
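As a concrete illustration (a sketch of my own, not taken from any particular textbook), here is a small Python model of the ε-NFA that the union step produces for 0+1, with acceptance checked via ε-closures. The state names S, a0, a1, b0, b1, F are arbitrary.

    # epsilon-NFA for 0+1 built by the "standard rules": a new start state S with
    # epsilon-moves into the sub-NFA for 0 and the sub-NFA for 1, and a new final
    # state F that both old final states epsilon-move into.
    EPS = ""
    delta = {
        ("S", EPS): {"a0", "b0"},
        ("a0", "0"): {"a1"},     # sub-NFA accepting "0"
        ("b0", "1"): {"b1"},     # sub-NFA accepting "1"
        ("a1", EPS): {"F"},
        ("b1", EPS): {"F"},
    }
    start, finals = "S", {"F"}

    def eps_closure(states):
        """All states reachable from `states` using epsilon-transitions only."""
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, EPS), set()) - seen:
                seen.add(r)
                stack.append(r)
        return seen

    def accepts(word):
        current = eps_closure({start})
        for ch in word:
            moved = set().union(*(delta.get((q, ch), set()) for q in current))
            current = eps_closure(moved)
        return bool(current & finals)

    print(accepts("0"), accepts("1"), accepts("01"))   # True True False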
A C program's source code can be parsed according to the C grammar (described as a CFG) and eventually turned into an AST. I am wondering whether a tool exists that can do the reverse: first randomly generate many ASTs according to the CFG, containing tokens that have no concrete string values, only token types, and then generate the concrete tokens according to the tokens' regular-expression definitions.
I imagine the first step as an iterative replacement of non-terminals, done randomly and limited to a certain number of iterations. The second step is just randomly generating strings according to the regular expressions.
Is there any tool that can do this?
The "Data Generation Language" DGL does this, with the added ability to weight the probabilities of productions in the grammar being output.
In general, a recursive descent parser can be quite directly rewritten into a set of recursive procedures to generate, instead of parse / recognise, the language.
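As a tiny illustration of that inversion (a sketch for a made-up expression grammar, not any particular parser), each nonterminal becomes a procedure that randomly picks one of its productions and calls the procedures for the symbols on its right-hand side:

    import random

    # Toy grammar:  expr -> term | term "+" expr ;  term -> NUMBER | "(" expr ")"
    def gen_expr(depth=0):
        if random.random() < 0.5 or depth > 4:      # bias/limit keeps output finite
            return gen_term(depth + 1)
        return gen_term(depth + 1) + " + " + gen_expr(depth + 1)

    def gen_term(depth):
        if random.random() < 0.7 or depth > 4:
            return str(random.randint(0, 99))       # NUMBER token gets random content
        return "(" + gen_expr(depth + 1) + ")"

    print(gen_expr())   # e.g. "(7 + 42) + 13"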
Given a context-free grammar of a language, it is possible to generate a random string that matches the grammar.
For example, the nearley parser generator includes an implementation of an "unparser" that can generate strings from a grammar.
The same task can be accomplished using definite clause grammars in Prolog. An example of a sentence generator using definite clause grammars is given here.
If you have a model of the grammar in a normalized form (all rules like this):
LHS = RHS1 RHS2 ... RHSn ;
and a language prettyprinter (e.g., an AST-to-text conversion tool), you can build one of these pretty easily.
Simply start with the goal symbol as a unit tree.
Repeat until no nonterminals are left:
Pick a nonterminal N in the tree;
Expand it by adding children for the right-hand side of any rule whose left-hand side matches the nonterminal N.
For terminals that carry values (e.g., variable names, numbers, strings, ...) you'll have to generate random content.
A complication with the above algorithm is that it doesn't clearly terminate. What you actually want to do is pick some limit on the size of your tree, and run the algorithm until all the nonterminals are gone or you exceed the limit. In the latter case, backtrack, undo the last replacement, and try something else. This gets you a bounded depth-first search for an AST of your determined size.
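Here is a rough Python sketch of that loop for a grammar in the normalized LHS = RHS1 ... RHSn form (the grammar, the names GRAMMAR, TERMINALS, and generate, and the terminal generators are all made up; instead of undoing single replacements it simply throws the attempt away and retries when the size limit is exceeded):

    import random

    # Grammar in normalized form: nonterminal -> list of right-hand sides.
    GRAMMAR = {
        "stmt": [["id", "=", "expr", ";"]],
        "expr": [["num"], ["id"], ["expr", "+", "expr"]],
    }
    # Terminals that carry values get random content; fixed terminals print themselves.
    TERMINALS = {"num": lambda: str(random.randint(0, 9)),
                 "id": lambda: random.choice("xyz"),
                 "=": lambda: "=", "+": lambda: "+", ";": lambda: ";"}

    def generate(goal="stmt", limit=20):
        """Expand nonterminals in a sentential form until none are left or it
        gets too big; on overflow, give up and retry (a crude stand-in for
        step-by-step backtracking)."""
        while True:
            symbols = [goal]                            # the goal symbol as a unit tree
            for _ in range(limit):
                nts = [i for i, s in enumerate(symbols) if s in GRAMMAR]
                if not nts:
                    return " ".join(TERMINALS[s]() for s in symbols)
                i = random.choice(nts)                  # pick a nonterminal N
                rhs = random.choice(GRAMMAR[symbols[i]])  # pick a rule for N
                symbols[i:i + 1] = rhs                  # expand N in place
            # size limit exceeded: discard this attempt and start over

    print(generate())   # e.g. "y = 3 + z ;"

Restarting is cruder than true backtracking, but it keeps the sketch short.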
Then prettyprint the result. It's the prettyprinter part that is hard to get right.
[You can build all this stuff yourself including the prettyprinter, but it is a fair amount of work. I build tools that include all this machinery directly in a language-parameterized way; see my bio].
A nasty problem even with well formed ASTs is that they may be nonsensical; you might produce a declaration of an integer X, and assign a string literal value to it, for a language that doesn't allow that. You can probably eliminate some simple problems, but language semantics can be incredibly complex, consider C++ as an example. Ensuring that you end up with a semantically meaningful program is extremely hard; in essence, you have to parse the resulting text, and perform name and type resolution/checking on it. For C++, you need a complete C++ front end.
The problem with random generation is that, for many CFGs, the expected length of the output string is infinite. (There is an easy computation of the expected length using generating functions corresponding to the non-terminal symbols and equations corresponding to the rules of the grammar.) You have to control the relative probabilities of the productions in certain ways to guarantee convergence; for example, weighting each production rule for a non-terminal symbol inversely to the length of its RHS sometimes suffices.
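As a quick worked example (my own, using the standard branching-process argument): take the grammar S -> S S, chosen with probability p, and S -> a, chosen with probability 1 - p. The expected length E of a generated string satisfies E = p * 2E + (1 - p) * 1, so E = (1 - p) / (1 - 2p). This is finite only for p < 1/2; at p = 1/2 the expectation diverges, and for p > 1/2 there is even a positive probability that generation never terminates. Down-weighting the recursive production is exactly the kind of control described above.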
There is a lot more on this subject in:
Noam Chomsky and Marcel-Paul Schützenberger, "The Algebraic Theory of Context-Free Languages", pp. 118-161 in P. Braffort and D. Hirschberg (eds.), Computer Programming and Formal Systems, North-Holland (1963)
(see Wikipedia entry on Chomsky–Schützenberger enumeration theorem)