How do for loops in Verilog execute?

Do for loops in Verilog execute in parallel? I need to call a module several times, but they have to execute at the same time. Instead of writing them out one by one, I was thinking of using a for loop. Will it work the same?

Verilog describes hardware, so it doesn't make sense to think in terms of executing loops or calling modules in this context. If I understand the intent of your question correctly, you'd like to have multiple instantiations of the same module with distinct inputs and outputs.
To accomplish this you can use Verilog's generate statements to generate the instantiations automatically.
You can also use the AUTO_TEMPLATE functionality in Emacs's excellent verilog-mode. I prefer this approach because each instantiation appears explicitly in my source code, and I find it easier to detect errors.

As jlf answered, you're looking for a generate statement. A plain for loop, by contrast, is what you would use to model combinational logic, such as going through all of the bits in a register and computing an output. Such a loop goes inside an always block, or even inside an initial block in your testbench.


When do you use a block statement in a VHDL design and when do you not?

I come from the software world and have recently started creating FPGA designs in VHDL. I've read about the concurrent block statement and its principal uses: organizing an architecture by grouping concurrent code, and guarded signals, which are not recommended.
But a block is only one of many ways to implement the same functionality. For instance, I have implemented a CRC frame checker with a VHDL function. It takes a one-bit input and returns a register holding the cumulative CRC value of all the input bits.
I think the same functionality could be implemented with a block. Which is the better option for resource utilization? When would you use a block and when would you not? What is the best use case for a block?
What is the best option for resource utilization?
There should be no difference in resource utilization between using a block and not using one, assuming that you're creating the same logic.
When would you use a block and when would not?
Similar to software, the only reason to use a block statement is to limit the scope of the signals (and other declarations) used within a portion of the code. This can significantly improve readability in a large design, because signals can be declared in the same region where they are used.
I would not recommend using a block statement in a small design, or where a component instantiation is more appropriate.
Which is the best case to implement a block?
When it improves code readability.

What do we need to define when using the parallel optimization flag?

I have a program with more than 100 subroutines. I am trying to make it run faster by compiling these subroutines with the parallel flag. I was wondering what variables or parameters I need to define in the program if I want to use the parallel flag. Just adding the parallel optimization flag increased my program's run time compared to building without it.
I can give you some general guidelines, but without knowing your specific compiler and platform/OS I won't be able to help you specifically. As far as I know, all of the autoparallelization schemes used in Fortran compilers end up using either OpenMP or MPI to split loops out into threads or processes. The issue is that there is a certain amount of overhead associated with those schemes. For instance, I once had a program that used an optimization library, which a vendor provided as a precompiled library with no optimization built into it. All of my subroutines and functions were either outside or inside the optimizer's large loop. Because only compiled object code was available for the library, the autoparallelizer couldn't perform interprocedural optimization (IPO), so it never used more than one core. In that case, because of the DLL that was loaded for OpenMP, /qparallel actually added about 10% to the run time.
As a note, autoparallelizers aren't magic. Essentially, they perform the same kind of analysis as autovectorization: they look for loops in which no iteration depends on data produced by a previous iteration. If the compiler detects that a variable carries a value from one iteration to the next, or if it can't tell, it will not attempt to parallelize the loop.
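To make the distinction concrete, here is a small sketch in Python, used only for illustration; the equivalent Fortran DO loops behave the same way. The function names are hypothetical. The first loop's iterations are independent and could in principle be split across threads. The second reads a value written by the previous iteration, so a dependency analysis would have to leave it serial.

```python
def scale(values, factor):
    """No loop-carried dependency: iteration i touches only index i,
    so the iterations could run in any order, or in parallel."""
    out = [None] * len(values)
    for i in range(len(values)):
        out[i] = values[i] * factor
    return out


def prefix_sum(values):
    """Loop-carried dependency: iteration i reads out[i - 1], which the
    previous iteration wrote, so the iterations must run in order."""
    out = list(values)
    for i in range(1, len(out)):
        out[i] = out[i - 1] + out[i]
    return out
```

Running the iterations of scale in reverse order gives the same result. Doing the same to prefix_sum does not, and that order-independence is the property the compiler's dependency check looks for.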
If you are using the Intel Fortran compiler, you can turn on a diagnostic switch "/qpar-report3" or "-par-report3" to give you information as to the dependency tree of loops to see why they failed to optimize. If you don't have access to large sections of the code you are using, in particular parts with major loops, there is a good chance that there won't be much opportunity in your code to use the auto-parallelizer.
In any case, you can always attempt to reduce dependencies and reformulate your code such that it is more friendly to autoparallelization.

VHDL multiplication of two numbers

I am new to VHDL, and I have to multiply two unsigned vectors the way we all did in high school.
So I wrote the program, and it does compile, but the result is not correct.
The logic looks OK, but it still does not work. Can anyone help?
I could not figure out how to place the code here, so please see the attached image.
When writing VHDL you'll first and foremost need to think hardware. Even though various statements may look similar to what you know from other languages, many of these behave differently, as they are mapped to hardware and evaluated in parallel rather than sequentially.
For instance, for loops in VHDL do not iterate through the loop, but rather replicate the loop contents and evaluate all of these in parallel. So your idea of accumulating temp will not work, as all values of temp1 would be available at the same time instead of one after another.
The easy way to handle multiplication is to just use the * operator, as many synthesizers will pick it up and automatically instantiate the necessary hardware. I assume this is some form of exercise, though, where you need to implement the functionality yourself. In that case, ditch the for loop, store each intermediate result in its own variable, and then add them all up at the end.
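Below is a Python reference model of that partial-product approach. It is not VHDL, but a model like this can be handy for generating expected values for a testbench. The function name and the default 8-bit width are assumptions. Each partial product is formed independently of the others, which mirrors how the hardware evaluates them in parallel, and they are summed only at the end.

```python
def shift_add_multiply(a, b, width=8):
    """Multiply two unsigned `width`-bit integers via partial products."""
    mask = (1 << width) - 1
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("operands must fit in `width` unsigned bits")

    # One partial product per multiplier bit: `a` shifted left by i if bit i
    # of `b` is set, otherwise zero. None of these depends on another, so in
    # hardware they are all formed at the same time.
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]

    # Add all partial products at the end (an adder tree in hardware).
    # The product of two width-bit values fits in 2 * width bits.
    return sum(partials)
```

For example, shift_add_multiply(13, 11) forms the partial products 13, 26, 0 and 104, whose sum is 143.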

"GLOBAL could be very inefficient"

I am using (in Matlab) a global statement inside an if command, so that I import the global variable into the local namespace only if it is really needed.
The code analyzer warns me that "global could be very inefficient unless it is a top-level statement in its function". Thinking about possible internal implementation, I find this restriction very strange and unusual. I am thinking about two possibilities:
1. What this warning really means is "global is very inefficient on its own, so don't use it in a loop". In particular, using it inside an if, as I'm doing, is perfectly safe, and the warning is issued wrongly (and is poorly worded).
2. The warning is correct: Matlab uses some really unusual variable-loading mechanism in the background, so it really is much slower to import global variables inside an if statement. In this case, I'd like a hint or a pointer to how this really works, because I am interested, and it seems important if I want to write efficient code in the future.
Which one of these two explanations is correct? (or maybe neither is?)
EDIT: to make it clearer, I know that global is slow (and apparently I can't avoid using it, as it is a design decision of an old library I am using). What I am asking is why the Matlab code analyzer complains about a global statement placed inside an if block, but not about the same statement at the top level of the function.
I find it difficult to imagine a reason why the first should be slower than the second.
To supplement eykanal's post, this technical note explains why global is slow.
... when a function call involves global variables, performance is even more inhibited. This is because to look for global variables, MATLAB has to expand its search space to the outside of the current workspace. Furthermore, the reason a function call involving global variables appears a lot slower than the others is that MATLAB Accelerator does not optimize such a function call.
I do not know the answer, but I strongly suspect this has to do with how memory is allocated and shared at runtime.
Be that as it may, I recommend reading the following two entries on the MathWorks blogs by Loren and Doug:
Writing deployable code (the very first thing he writes in that post)
Top 10 MATLAB code practices that make me cry (#2 on that list)
Long story short, global variables are almost never the way to go. There are many other ways to share variables (some of which she discusses) that are more efficient and less error-prone.
The answer from Walter Roberson here
[...] This is not necessarily more work if not done in a top-level command, but people would tend to put the construct in a loop, or in multiple non-exclusive places in conditional structures. It is a lot easier for a person writing mlint warnings to not have to add clarifications like, "Unless you can prove those "global" will only be executed once, in which case it isn't less efficient but it is still bad form"
supports my option (1).
Fact (from Matlab 2014 up to Matlab 2016a, without the Parallel Computing Toolbox): often, the fastest code you can achieve in Matlab uses nested functions, which share variables between functions without passing them.
The next closest option is using global variables and splitting your project into multiple files. This may reduce performance slightly for two reasons, neither of which I have seen verified. First, Matlab supposedly incurs overhead when retrieving from the global workspace. Second, there is supposedly some kind of problem with JIT acceleration.
In my own testing, passing very large data matrices (high-resolution images) between function calls performed almost identically with nested functions and with global variables.
The reason you can get superior performance with global variables or nested functions is that they let you avoid extra data copying. When you pass a variable to a function, Matlab passes it by reference. However, if you modify the variable inside the function, Matlab makes a copy on the fly (copy-on-write). The only ways I know of to avoid that in Matlab are nested functions and global variables. When working with larger data, avoiding this extra copying more than makes up for any small cost from hindering the JIT or from global fetch times.
This may have changed in newer versions of Matlab, but from what I hear from friends, I doubt it. I can't run any tests, as I don't have a Matlab license anymore.
As proof, look no further than this video-processing toolbox I made back when I was working with Matlab. It is horribly ugly under the hood, because I had no way of getting performance without globals.
This fact about Matlab (that global variables are the most optimized way to code when you need to modify large data in several functions) indicates that the language and/or interpreter needs to be updated.
Instead, Matlab could use a better, more dynamic notion of workspace. But nothing I have seen indicates this will ever happen, especially when the user community seemingly ignores the facts and pushes unfounded opinions, such as the claim that using globals in Matlab is slow.
They are not.
That said, you shouldn't use globals, ever. Suppose you are forced to do real-time video processing in pure Matlab and find that globals are your only way to reach the required performance. Then you should take the hint and change languages. It's time to move to a higher-performance language, and maybe also to write an occasional rant on Stack Overflow, in the hope of swaying Matlab users' opinions so that Matlab gets improved.

Why is determining if a function is pure difficult?

I was at the StackOverflow Dev Days convention yesterday, and one of the speakers was talking about Python. He showed a Memoize function, and I asked if there was any way to keep it from being used on a non-pure function. He said no, that's basically impossible, and if someone could figure out a way to do it, it would make a great PhD thesis.
That sort of confused me, because it doesn't seem all that difficult for a compiler/interpreter to solve recursively. In pseudocode:
function isPure(functionMetadata): boolean;
    result = true;
    for each variable in functionMetadata.variablesModified
        result = result and variable.isLocalToThisFunction;
    for each dependency in functionMetadata.functionsCalled
        result = result and isPure(dependency);
    return result;
That's the basic idea. Obviously you'd need some sort of check to prevent infinite recursion on mutually-dependent functions, but that's not too difficult to set up.
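Here is a runnable Python sketch of that pseudocode, including the recursion guard mentioned above. The Variable and FunctionMetadata classes are hypothetical stand-ins for whatever a compiler would actually record. The check keeps the pseudocode's definition of purity ("modifies no non-local variable") and is only as trustworthy as the metadata it is given. Collecting complete, reliable metadata is exactly the part the answers below describe as hard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variable:
    name: str
    is_local_to_this_function: bool


@dataclass
class FunctionMetadata:
    # Hypothetical per-function metadata a compiler might collect.
    name: str
    variables_modified: List[Variable] = field(default_factory=list)
    functions_called: List["FunctionMetadata"] = field(default_factory=list)


def is_pure(fn: FunctionMetadata) -> bool:
    """The pseudocode's check, plus a guard against infinite recursion."""
    in_progress = set()  # ids of the functions on the current call path

    def check(meta: FunctionMetadata) -> bool:
        if id(meta) in in_progress:
            # Cycle (mutual recursion): this function's own writes were
            # checked when we first entered it, and its remaining callees
            # are being checked by the frame that is still in progress.
            return True
        if not all(v.is_local_to_this_function for v in meta.variables_modified):
            return False
        in_progress.add(id(meta))
        result = all(check(dep) for dep in meta.functions_called)
        in_progress.discard(id(meta))
        return result

    return check(fn)
```

Tracking object identity (id) rather than names avoids assuming that function names are unique.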
Higher-order functions that take function pointers might be problematic, since they can't be verified statically, but my original question presupposes that the compiler has some sort of language constraint to designate that only a pure function pointer can be passed to a certain parameter. If one existed, that could be used to satisfy the condition.
Obviously this would be easier in a compiled language than an interpreted one, since all this number-crunching would be done before the program is executed and so not slow anything down, but I don't really see any fundamental problems that would make it impossible to evaluate.
Does anyone with a bit more knowledge in this area know what I'm missing?
You also need to annotate every system call, every FFI, ...
And furthermore the tiniest 'leak' tends to leak into the whole code base.
It is not a theoretically intractable problem, but in practice it is very very difficult to do in a fashion that the whole system does not feel brittle.
As an aside, I don't think this makes a good PhD thesis; Haskell effectively already has (a version of) this, with the IO monad.
And I am sure lots of people continue to look at this 'in practice'. (wild speculation) In 20 years we may have this.
It is particularly hard in Python. Since anObject.aFunc can be changed arbitrarily at runtime, you cannot determine at compile time which function anObject.aFunc() will call, or even whether it will be a function at all.
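A minimal illustration of that point. The class and function names are hypothetical. The call site in doubled never changes, yet what it calls is decided only when it runs: first a read-only method, then a method with a side effect, then something that is not callable at all.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def get_balance(self):
        return self.balance  # only reads state


audit_log = []


def logging_get_balance(self):
    audit_log.append(self.balance)  # side effect on module-level state
    return self.balance


def doubled(acct):
    # Whether this is pure depends on what acct.get_balance is at call time.
    return acct.get_balance() * 2


acct = Account(100)
before = doubled(acct)                     # no side effect
Account.get_balance = logging_get_balance  # rebind the method at runtime
after = doubled(acct)                      # same code, now appends to audit_log
acct.get_balance = 5                       # instance attribute: not even callable
```

After the last line, calling doubled(acct) raises TypeError, because the instance attribute shadows the method.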
In addition to the other excellent answers here: Your pseudocode looks only at whether a function modifies variables. But that's not really what "pure" means. "Pure" typically means something closer to "referentially transparent." In other words, the output is completely dependent on the input. So something as simple as reading the current time and making that a factor in the result (or reading from input, or reading the state of the machine, or...) makes the function non-pure without modifying any variables.
Conversely, you could write a "pure" function that does modify variables, as long as those modifications are not observable from outside the function.
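Both directions can be seen in a short Python sketch; the function names are hypothetical. The first function writes no variable at all but is not referentially transparent. The second reassigns a local variable but is.

```python
import time


def seconds_until(deadline):
    # Modifies no variable at all, yet is not pure: the result depends on
    # the system clock, not only on `deadline`.
    return deadline - time.time()


def sum_of_squares(values):
    # Reassigns a local accumulator, yet is pure: the same input always
    # gives the same output, and the mutation is invisible to callers.
    total = 0
    for v in values:
        total += v * v
    return total
```

The pseudocode's check would accept seconds_until and would also accept sum_of_squares, because its accumulator is local. It therefore gets the second case right but the first wrong.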
Here's the first thing that popped into my mind when I read your question.
Class Hierarchies
Determining whether a variable is modified includes digging through every single method that is called on the variable to determine whether it mutates state. This is ... somewhat straightforward for a sealed type with a non-virtual method.
But consider virtual methods. You must find every single derived type and verify that every override of that method does not mutate state. This is simply not possible (or, at best, extremely difficult) in any language or framework that allows dynamic code generation or is otherwise dynamic. The reason is that the set of derived types is not fixed, because new ones can be generated at runtime.
Take C# as an example. There is nothing stopping me from generating a derived class at runtime that overrides that virtual method and modifies state. A static verifier would not be able to detect this type of modification, and hence could not determine whether the method was pure.
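The same situation can be shown in Python, which stands in for C# here only to keep to one language. Shape, describe and counting_area are hypothetical names. The built-in type() creates a subclass at runtime whose override mutates state, so a checker that examined only the classes present in the source text would wrongly conclude that describe is pure.

```python
class Shape:
    def area(self):
        return 0.0  # reads nothing, writes nothing


def describe(shape):
    # Looking only at Shape.area, a static checker would call this pure.
    return shape.area()


area_calls = {"count": 0}


def counting_area(self):
    area_calls["count"] += 1  # mutates state outside the method
    return 1.0


# A subclass that exists nowhere in the source text; its name and methods
# could just as well be assembled from user input at runtime.
DynamicShape = type("DynamicShape", (Shape,), {"area": counting_area})

describe(Shape())
describe(DynamicShape())
```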
I think the main problem would be doing it efficiently.
The D language has pure functions, but you have to mark them yourself so that the compiler knows to check them. I think specifying them manually makes the problem easier.
Deciding whether a given function is pure is, in general, at least as hard as deciding whether an arbitrary program halts: the Halting Problem can be reduced to it. It is well known that the Halting Problem is undecidable, meaning no algorithm can solve it for all programs.
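Here is a Python sketch of the standard reduction. make_probe and side_effects are hypothetical names chosen for illustration. Given an arbitrary program that has no side effects of its own, we build a wrapper whose only side effect happens after that program finishes. A perfect purity checker applied to the wrapper would therefore tell us whether the program halts.

```python
side_effects = []


def make_probe(program):
    """Return a function that is impure exactly when `program()` halts,
    assuming `program()` itself has no side effects."""
    def probe(x):
        program()               # may never return
        side_effects.append(x)  # reached only if program() halts
        return x
    return probe
```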
Note that the complexity depends on the language, too. In the more dynamic languages, it's possible to redefine anything at any time. For example, consider this Tcl procedure:
proc myproc {a b} {
    if { $a > $b } {
        return $a
    } else {
        return $b
    }
}
Every single piece of that could be modified at any time. For example:
the "if" command could be rewritten to use and update global variables
the "return" command, along the same lines, could do the same thing
there could be an execution trace on the if command that, whenever "if" is used, redefines the return command based on the inputs to the if command
Admittedly, Tcl is an extreme case; one of the most dynamic languages there is. That being said, it highlights the problem that it can be difficult to determine the purity of a function even once you've entered it.
