predicting output symbols in parallel in the seq-to-seq task - parallel-processing

Though I am not sure whether Stack Overflow is the right place to ask a theoretical question that is not directly related to a programming topic, I will just post my question.
One of the motivations for developing the Transformer was to achieve parallelism by replacing recurrent operations with fully connected feed-forward computation.
However, it still entails sequential behavior, since it generates one output symbol at each step conditioned on the previously generated symbols. That means the Transformer is an auto-regressive model when it comes to decoding internal representations into output symbols.
Is it possible to make the decoding process parallel by producing all output symbols simultaneously?
It would be very helpful if someone could point me to research related to this topic. Thanks in advance.
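To make the sequential bottleneck concrete, here is a minimal sketch of greedy auto-regressive decoding in Python (the decode_step function and token IDs are hypothetical stand-ins, not any particular framework's API): each step is conditioned on everything generated so far, so the loop cannot simply be replaced by one parallel call.

    BOS, EOS, MAX_LEN = 1, 2, 50

    def decode_step(encoder_states, prefix):
        # Stand-in for the real decoder: emits symbol 3 twice, then EOS.
        return 3 if len(prefix) < 3 else EOS

    def greedy_decode(encoder_states):
        output = [BOS]
        for _ in range(MAX_LEN):
            # Sequential dependency: step t needs the outputs of steps 1..t-1.
            next_symbol = decode_step(encoder_states, output)
            output.append(next_symbol)
            if next_symbol == EOS:
                break
        return output[1:]

    print(greedy_decode(encoder_states=None))  # [3, 3, 2]

(Work on removing this loop is usually published under the name "non-autoregressive" sequence generation, which may be a useful search term.)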

Related

Profiling and performance issue: high Cstats16 and Cstats17 in Simulink

We've been having issues with the compilation and run time of some of our Controls Simulink models.
We've done some diagnostic work with a MATLAB tool (https://www.mathworks.com/help/simulink/slref/sldiagnostics.html) and got the results attached below.
We see that two metrics are quite high compared to the others. I'm not exactly familiar with this kind of report, but I think they're related to Stateflow data exchanges and a high number of referenced models in normal mode?
We want to get these numbers down for performance reasons. I was wondering what good practices or common mistakes would reduce, or be the cause of, those two problematic numbers.
Note, we've already tried:
Refactoring some state machines with a lot of inter-dependencies and data exchanges into one.
Refactoring to reduce the number of referenced models.
We are considering changing some blocks to Accelerator mode, but are looking at the trade-off/impact on our models.
Any other ideas?

QA Algorithm for Q Processing

What Algorithm/method do I use for a Question Answering System's Question Processing?
I have been searching for possible algorithms for my Question Answering System. The only thing I can think of that might work is parsing, but I asked about parsing in my last question, and from the answers there I think it may not be usable (I'm not sure).
My idea for parsing is to cut the question into pieces, word by word, and run each word through a store of words that determines what kind of word it is (noun, adjective, verb, etc.). My purpose in using parsing is to determine the topic of the question.
My other idea is a chatterbot. A chatterbot uses a set of trigger words (correct me if I'm mistaken), and each of those words is mapped to other words; it randomly chooses a reply from that set.
Example: User's statement: Hello > Chatterbot's possible replies: Hi, Hello, Hey
I'm not quite sure what method/algorithm to use for question answering. I have read the Wikipedia article (http://en.wikipedia.org/wiki/Question_answering) but I still don't understand what algorithm to use for question processing.
Thank you.
PS: I'm developing in JavaScript. Q = Question
You could use a naive Bayes classifier to look at the questions and determine their subject. You'd need a lot of training data and a fairly narrow domain.
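As a rough illustration of that suggestion (in Python rather than the asker's JavaScript, and with made-up example questions), a bag-of-words naive Bayes classifier over question text might look like this; in practice you would need far more training data, as noted above.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Tiny made-up training set: question text -> topic label.
    questions = [
        "who wrote hamlet",
        "who painted the mona lisa",
        "what is the capital of france",
        "what is the population of tokyo",
    ]
    topics = ["people", "people", "places", "places"]

    vectorizer = CountVectorizer()            # bag-of-words features
    X = vectorizer.fit_transform(questions)
    classifier = MultinomialNB().fit(X, topics)

    new_q = vectorizer.transform(["who discovered penicillin"])
    print(classifier.predict(new_q))          # likely ['people']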
The sophisticated responses to this problem involve a lot of machine-inference techniques that are a bit beyond my ability to explain well. My idea is to use a Markov network in which each word has an edge to the one or two words next to it. A series of tests is applied to each word, indicating the likely membership of that word in one of its possible meanings (for example, "Mark" is more likely a name if it's capitalized, but if the next word is "a", it is probably being used as a verb). From there the machine can attempt to determine the actual meaning of the sentence, which will rely, again, on unimaginably large amounts of training data.
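A much lighter-weight stand-in for that idea is an off-the-shelf part-of-speech tagger, which already uses the surrounding words to disambiguate. Here is a small Python sketch with NLTK (it assumes the tokenizer and tagger data have been downloaded, and it is only a rough substitute for the Markov-network approach described above, not that approach itself):

    import nltk

    # One-time downloads (uncomment on first run):
    # nltk.download("punkt")
    # nltk.download("averaged_perceptron_tagger")

    question = "Who painted the Mona Lisa in the sixteenth century?"
    tokens = nltk.word_tokenize(question)
    tagged = nltk.pos_tag(tokens)        # [('Who', 'WP'), ('painted', 'VBD'), ...]

    # Keep nouns as crude "topic" keywords, along the lines the question suggests.
    keywords = [word for word, tag in tagged if tag.startswith("NN")]
    print(keywords)                      # e.g. ['Mona', 'Lisa', 'century']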
Coursera's Probabilistic Graphical Models class (Probably their NLP class too) would probably be the best resource if you're interested in becoming skilled in this area. (PGM is the only reason I know anything about this!)
Here's a great book you may want to read to learn a lot about NLP and question answering systems: http://www.amazon.com/Speech-Language-Processing-2nd-Edition/dp/0131873210
The book has a full section (V. Applications) that will help you a lot in developing a good system.
Note, though, that the book discusses theory and algorithms only (no code).
It's not only about parsing text; you'll need to understand the context to provide a better answer. In practice you need to extract some keywords and ignore everything else.
You may also want to read up on keyword extraction (bag of words) and algorithms like TF-IDF.
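For the bag-of-words / TF-IDF suggestion, a minimal Python sketch over a made-up toy corpus could look like this (the asker's JavaScript would need an equivalent library or a hand-rolled version):

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Toy corpus: pretend each document is text we might answer questions from.
    docs = [
        "the eiffel tower is in paris and was completed in 1889",
        "the great wall of china stretches across northern china",
        "paris is the capital of france",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(docs)
    terms = vectorizer.get_feature_names_out()

    # Highest-weighted terms in the first document are its candidate keywords.
    row = tfidf[0].toarray().ravel()
    top = sorted(zip(terms, row), key=lambda t: t[1], reverse=True)[:5]
    print(top)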

Understanding parallel usage of Fortran 90

y(1:n-1) = a*y(2:n) + x(1:n-1)
y(n) = c
In the above Fortran 90 code I want to know how it is executed in terms of synchronization, communication, and arithmetic.
What I understand is:
Communication is the need for different tasks to communicate with each other, e.g. when some variable has dependencies on some other variable. But the above code doesn't seem to involve any communication, as there appear to be no dependencies. Am I right?
Synchronization is somewhat related to communication, but it also involves whether barriers are used. In the above code there is no barrier, so the only synchronization involved comes from any data dependencies.
Arithmetic: I have no clue regarding this point, and would be glad if someone could explain it to me.
The rule in Fortran is fairly simple: the right hand side is completely evaluated before the result is assigned to the left.
Thus you could claim there is a communication upon assigning (sending the result to y), which is at the same time a synchronization point.
The actual evaluation of the right side could be vectorized/parallelized by the compiler, resulting in arbitrary orders of the evaluations for all entries in the array, except for the last one, which is only set after the first assignment.
However, except for pipelining, there is no real parallelism introduced here by common compilers.
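To make the "right-hand side is evaluated completely first" rule concrete, here is a rough NumPy analogue of the snippet (an illustration of the semantics only, not what the compiler actually emits): the whole right-hand side is built from the old values of y before anything is written back, so the per-element computations are independent of one another and could be done in any order.

    import numpy as np

    n = 8
    a, c = 2.0, -1.0
    x = np.arange(n, dtype=float)
    y = np.ones(n)

    # Fortran: y(1:n-1) = a*y(2:n) + x(1:n-1)
    # The right-hand side is evaluated into a temporary using the old y...
    rhs = a * y[1:n] + x[0:n - 1]
    # ...and only then assigned, so the element order never matters.
    y[0:n - 1] = rhs

    # Fortran: y(n) = c  -- a separate statement, executed after the one above.
    y[n - 1] = c
    print(y)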
Without dwelling too much on the given snippet, it looks like you could perhaps be interested (tell me if I'm wrong) in, for example, the Using OpenMP book (presentation here). It is a nice, gentle introduction to the world of shared-memory parallel computing. For larger systems you would do well to google "MPI" and its related subjects. There is really a plethora of material on the matter (a lot of it deals with Fortran+MPI / Fortran+OpenMP), so I'll skip giving examples here.
Is this what you were aiming for?

Assembly Analysis Tools

Does anyone have any suggestions for assembly file analysis tools? I'm attempting to analyze ARM/Thumb-2 ASM files generated by LLVM (or alternatively GCC) when passed the -S option. I'm particularly interested in instruction statistics at the basic block level, e.g. memory operation counts, etc. I may wind up rolling my own tool in Python, but was curious to see if there were any existing tools before I started.
Update: I've done a little searching, and found a good resource for disassembly tools / hex editors / etc here, but unfortunately it is mainly focused on x86 assembly, and also doesn't include any actual assembly file analyzers.
What you need is a tool for which you can define an assembly language syntax, and then build custom analyzers. Your analyzers might be simple ("How much space does an instruction take?") or complex ("How many cycles will this instruction take to execute?" [which depends on the preceding sequence of instructions and possibly a sophisticated model of the processor you care about]).
One designed specifically to do that is the New Jersey Machine Toolkit. It is really designed to build code generators and debuggers. I suspect it would be good at "instruction byte count". It isn't clear it is good at more sophisticated analyses. And I believe it insists you follow its syntax style, rather than yours.
One not designed specifically to do that, but good at parsing/analyzing languages in general, is our DMS Software Reengineering Toolkit.
DMS can be given a grammar description for virtually any context-free language (that covers most assembly language syntax) and can then parse a specific instance of that grammar (assembly code) into ASTs for further processing. We've done this with several assembly languages, including the IBM 370, Motorola's 8-bit CPU line, and a rather peculiar DSP, without trouble.
You can specify an attribute grammar (a computation over an AST) to DMS easily. These are a great way to encode analyses that need just local information, such as "How big is this instruction?". For more complex analyses, you'll need a processor model that is driven by a series of instructions; passing the ASTs for individual instructions to such a machine model would be an easy way to compute more complex things such as "How long does this instruction take?".
Other analyses, such as control flow and data flow, are provided in generic form by DMS. You can use an attribute evaluator to collect local facts ("the control-next for this instruction is...", "data from this instruction flows to...") and feed them to the flow analyzers to compute global flow facts ("if I execute this instruction, what other instructions might be executed downstream?").
You do have to configure DMS for your particular (assembly) language. It is designed to be configured for tasks like these.
Yes, you can likely code all this in Python; after all, it's a Turing machine. But likely not nearly as easily.
An additional benefit: DMS is willing to apply transformations to your code based on your analyses. So you could implement your optimizer with it, too. After all, you need to connect the analysis indicating that an optimization is safe to the actual optimization steps.
I have written many disassemblers, including ARM and Thumb. They are not production quality, but were written for the purpose of learning the assembler. For both ARM and Thumb, the ARM ARM (ARM Architecture Reference Manual) has a nice chart from which you can easily count up data operations, loads/stores, etc. Maybe an hour's worth of work, maybe two. At least up front, you would end up with data values being counted, though.
The other poster may be right: with the chart I am talking about, it should be very simple to write a program that examines the ASCII assembly looking for ldr, str, add, etc. There is no need to parse everything if you are interested in memory operation counts, etc. Of course the downside is that you are likely not going to be able to account for loops; one function may have a load and a store, another may have a load and a store wrapped in a loop, causing many more memory operations once executed.
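In that spirit, here is a small sketch of the kind of text scan being described, in Python (the mnemonic sets and the label-based notion of a "block" are simplifications I'm assuming, not a complete classification of ARM/Thumb-2 opcodes):

    import re
    import sys
    from collections import Counter, defaultdict

    # Very rough mnemonic buckets -- extend from the ARM ARM tables as needed.
    LOADS = {"ldr", "ldrb", "ldrh", "ldrd", "ldm", "pop"}
    STORES = {"str", "strb", "strh", "strd", "stm", "push"}

    label_re = re.compile(r"^([.\w$]+):")          # lines like "foo:" start a block
    insn_re = re.compile(r"^\s+([a-z][\w.]*)\b")   # indented lines start with a mnemonic

    def count_ops(path):
        stats = defaultdict(Counter)
        block = "<start>"
        with open(path) as f:
            for line in f:
                m = label_re.match(line)
                if m:
                    block = m.group(1)
                    continue
                m = insn_re.match(line)
                if not m:
                    continue                        # directives, comments, blank lines
                mnemonic = m.group(1).split(".")[0] # crudely strip width suffixes like .w
                stats[block]["insns"] += 1
                if mnemonic in LOADS:
                    stats[block]["loads"] += 1
                elif mnemonic in STORES:
                    stats[block]["stores"] += 1
        return stats

    if __name__ == "__main__":
        for block, counts in count_ops(sys.argv[1]).items():
            print(block, dict(counts))

As noted above, this only gives static counts per label; it says nothing about how often a loop body executes.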
Not knowing what you are really interested in, my guess is you might want to simulate the code and count these sorts of things. I wrote a Thumb simulator (thumbulator) that attempts to do just that (and I have used it to compare LLVM execution vs GCC execution in terms of number of instructions executed, fetches, memory operations, etc.). The problem may be that it is Thumb only, no ARM, no Thumb-2; Thumb-2 could be added more easily than ARM. There is also an armulator from ARM, which is in the gdb sources among other places. I can't remember now if it executes Thumb-2. My understanding is that back when ARM was using it, it would accurately report these sorts of statistics.
You can plug your statistics into the LLVM code generator; it's quite flexible, and it is already collecting some stats, which could be used as an example.

Is it possible to perform arbitrary data analysis in Erlang?

I want to answer questions about data in Erlang: count things, correlate messages, provide arbitrary statistics. I had thought about resorting to Hadoop for this, but is it possible to build a solution in raw Erlang to do rather arbitrary data analysis, not necessarily via map/reduce, but somehow? I have seen some hints of people doing this, but no explicit blog posts or examples of it being done. I know that Powerset's natural language capabilities are written in Erlang. I also know about CouchDB, but was looking for some other solutions.
Yes.
For general-purpose computation and statistics, Erlang works just fine. It isn't heavily optimized for such work, so it will have trouble keeping up with similar numeric code in, say, MATLAB, Fortran, or any of the major C packages for this work, but for most uses it will do just fine. And of course, if your code parallelizes neatly and you have multiple CPUs available, Erlang will catch up more easily.
(You also mentioned the map/reduce pattern; it is relatively trivial given the Erlang/OTP runtime and libraries.)
I and my colleagues have written plenty of "raw" Erlang to do counting, statistics, and so on. We have found it to be more than sufficient for most tasks.

Resources