Turn value into 2<sup>value</sup> - algorithm

What is the fastest way of turning some value (stored in register) into 2 to the power of that value in assembly language? I think that some bitwise operations can be used. For example:
Value: 8
Result: 256 (2<sup>8</sup>)

So, short answer: what you're looking for is a left shift.
In C and many other languages, your particular wish would be served by 1 << 8.
You could do it in x86 assembler with shl, but there's really no sane reason to do so, since pretty much any compiler you come across will compile that expression into the native shift instruction.
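As a minimal sketch in Go (the function name pow2 is mine, for illustration):

package main

import "fmt"

// pow2 returns 2 to the power of value by shifting 1 left.
// It assumes 0 <= value <= 63 so the result fits in a uint64.
func pow2(value uint) uint64 {
	return 1 << value
}

func main() {
	fmt.Println(pow2(8)) // 256
}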

Related

How to write asm code "bsrl" in golang

I need to write some asm code in Go. I read this question Is it possible to include inline assembly in Google Go code?, but I still don't see how to write it.
Could anyone help me? Thanks.
asm ("bsrl %1, %0;"
:"=r"(bits) /* output */
:"r"(value) ); /* input */
All the answers on the question you found say it's not possible to use inline-asm in Go, with any syntax. GNU C inline-asm syntax isn't going to help.
But fortunately, you don't need inline asm for bsr (which finds the bit-index of the highest set bit). Go 1.9 has intrinsic / built-in functions for bit operations that are close enough that they should compile efficiently.
Use bits.LeadingZeros32 from the math/bits package to get lzcnt(x), which is 31 - bsr(x) for non-zero x. This may cost extra instructions, especially on CPUs which only support bsr, not lzcnt (e.g. Intel pre-Haswell).
Or use bits.Len32(x) - 1.
bits.Len32(x) returns the number of bits required to represent x. It returns 0 for x = 0 and 1 for x = 1, so it's bsr(x) + 1 with defined behaviour for 0 (thus potentially costing extra instructions). Hopefully bits.Len32(x) - 1 can compile directly to a bsr.
Of course, if what you really wanted was lzcnt, then use LeadingZeros32 in the first place.
Note that bsr leaves the destination register unmodified for an input of 0. AMD documents this behaviour; Intel's docs only say the destination is undefined, so compilers probably won't take advantage of a guarantee that AMD documents and that Intel hardware does provide in practice.
At least in theory, though, Len32(x) - 1 could compile to a single bsr instruction if the compiler can prove that x is non-zero.
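A short Go sketch of both approaches (the wrapper names are mine; whether the compiler actually emits a single bsr or lzcnt depends on the Go version and target CPU):

package main

import (
	"fmt"
	"math/bits"
)

// bsr32 returns the bit index of the highest set bit of x.
// Like the bsr instruction it is only meaningful for non-zero x;
// here it returns -1 for x == 0, since bits.Len32(0) == 0.
func bsr32(x uint32) int {
	return bits.Len32(x) - 1
}

// lzcnt32 returns the number of leading zero bits,
// i.e. 31 - bsr32(x) for non-zero x.
func lzcnt32(x uint32) int {
	return bits.LeadingZeros32(x)
}

func main() {
	fmt.Println(bsr32(0x80))   // 7
	fmt.Println(lzcnt32(0x80)) // 24
}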

How to see the local variable in DDC-I debugger?

I am trying to see the index value of a for loop in the DDC-I debugger, and it always shows me ERROR.
In the assembly for the same code, it shows the following instruction:
cmp cr7,0,r20,r23
so it's comparing r20 and r23, but neither of these registers holds the index value. I am not sure what cr7 is.
In short, most embedded tool chains (including the ones you pay for) are terrible at reconstructing local/automatic variables in even lightly optimized code. A lot of them simply can't reconstruct variables that never have memory storage because they live in registers the whole time (loop index variables like the one you can't see are typical cases). Some even have issues with intermediate computation values, and with arguments (since they're almost always passed in registers).
Typical strategies might be:
Temporarily turning off optimizations around the code in question
Temporarily moving the variable in question to the global scope
Becoming proficient at reading disassembly.
This isn't a terribly practical answer, but it surprises a lot of people who are new to the embedded world or who never had the luxury of a source-level debugger on their embedded platform.
On PowerPC there are eight CR fields, cr0 to cr7. If you don't specify a CR field for a compare result, the default is cr0, but in this case cr7 is specified, so the flags in field cr7 will indicate the result of the compare operation. There are 4 condition code bits in each CR field: lt, gt, eq and so (summary overflow). Typically the compare will be followed by a conditional branch, bc.
There is some useful info in this IBM developerWorks article: Assembly language for Power Architecture, Part 3: Programming with the PowerPC branch processor.

Modify only the LSB of a memory cell

Is it possible to write a sequence of instructions that will place a 1 in the least significant bit of the memory cell at address B3 without disturbing the other bits in the memory cell?
The machine instructions I am referring to are STOP, ADD, SWITCH, LOAD, ROTATE, etc.
Clarification: this question was originally tagged C#; since it wasn't the OP that re-tagged it, I'll leave this here until the OP's intentions are clearer.
C# is a high-level programming language, which compiles down to IL, not machine code. As such: no, there is absolutely no supported mechanism for performing specific machine-code operations (and even if there were, it couldn't possibly port between languages).
You can do high level bit operations, using the operators on the integer-based types; and if you really want you can write IL, either building it manually (ilasm), or at runtime via DynamicMethod / ILGenerator - but these still only deal with CIL opcodes, not machine codes.
I think ORing it with 1 will do the job, won't it?
algo:
byte = [data at 0xB3]
byte = byte | 0x01
[data at 0xB3] = byte
This works fine for me when developing for 8051 MCUs.
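The same read-modify-write idea as a minimal Go sketch, with a byte array standing in for the machine's memory (the array and function name are illustrative assumptions):

// memory stands in for the machine's memory cells.
var memory [256]byte

// setLSB sets the least significant bit of the cell at addr,
// leaving the other seven bits untouched.
func setLSB(addr int) {
	b := memory[addr] // load the cell
	b |= 0x01         // OR the least significant bit in
	memory[addr] = b  // store the result back
}

Calling setLSB(0xB3) then performs the load / OR / store sequence on cell B3.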

Pseudo Random Number Generator Project

I am required to design and build an 8-bit pseudo-random number generator. I have looked at possible methods: using background noise, user input, etc. I was wondering if anyone could give me some advice on where to start, as this would be a great help to me.
random.org is perhaps the best place to start your investigation.
The links below should get you started with the basics:
howstuffworks.com
Construct your own random number generator
For a simple 8-bit PRNG you could try something like a linear feedback shift register (LFSR). This is very simple to implement in either software or hardware; see the sketch below.
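As a concrete illustration, here is a minimal Go sketch of an 8-bit Galois LFSR; the tap mask 0xB8 is one known maximal-length choice, giving a period of 255, and the seed must be non-zero:

// lfsrNext advances an 8-bit Galois LFSR by one step.
func lfsrNext(s uint8) uint8 {
	lsb := s & 1
	s >>= 1
	if lsb != 0 {
		s ^= 0xB8 // taps for x^8 + x^6 + x^5 + x^4 + 1
	}
	return s
}

Seed with any non-zero value (e.g. s := uint8(1)) and call lfsrNext repeatedly to get the pseudo-random sequence.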
My plan is to use a temperature sensor. While the readings are being processed in the ADC, I am going to amplify the noise generated. This will then give me the random 8-bit number I require, which will be used as the 'seed' for the PRNG in stdlib (C programming).
What do you think?
I've found that the following works very well. This is implemented in MSP430 assembly, but would be easy enough to port to another processor. I've used it to generate 'white' noise for a synthesizer project, and there were no audible patterns in the output. Depending on what your requirements are, this might be sufficient. It uses two state variables: the previous output (8 bits) and a 16-bit state register. I found it online at http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=95614&highlight=radbrad, where it's listed in AVR assembly, and ported it to MSP430.
Because it uses shifts and shifts the top bit out of one register into the bottom of another, it doesn't really lend itself to efficient implementation in C. Hence the assembly. I hope you find this as useful as I did.
mov.b &rand_out, r13    ; r13 = previous output
mov.b r13, r12          ; keep a working copy in r12
and.b #66, r13          ; isolate the tap bits 1 and 6 (0x42)
jz ClearCarry           ; neither tap set -> feedback bit is 0
cmp.b #66, r13          ; carry set only if both taps are set
xor.w #1, sr            ; invert carry flag: carry = tap1 XOR tap6
jmp SkipClearCarry
ClearCarry:
clrc                    ; feedback bit is 0
SkipClearCarry:
rlc.w &rand_state       ; shift feedback into the 16-bit state, bit 15 to carry
rlc.b r12               ; rotate the carried-out bit into the new output
mov.b r12,&rand_out     ; store the new output
ret
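Although the carry chaining is what makes the assembly so compact, the same update can be written out explicitly. Here is a rough Go port for illustration (variable names are mine, and this favours clarity over speed):

var randState uint16 // the 16-bit state register (rand_state)
var randOut uint8    // the previous 8-bit output (rand_out)

// randNext advances the generator and returns the next 8-bit output.
func randNext() uint8 {
	prev := randOut
	// The feedback bit is bit 1 XOR bit 6 of the previous output
	// (the #66 / 0x42 mask in the assembly).
	feedback := (prev>>1 ^ prev>>6) & 1
	// rlc.w &rand_state: shift the state left, feedback in at bit 0,
	// old bit 15 out into the carry.
	stateTop := uint8(randState >> 15)
	randState = randState<<1 | uint16(feedback)
	// rlc.b r12: rotate the old output left, pulling in the state's top bit.
	randOut = prev<<1 | stateTop
	return randOut
}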

Why differ(!=,<>) is faster than equal(=,==)?

I've seen comments on SO saying "<> is faster than =" or "!= is faster than ==" in an if() statement.
I'd like to know why that is. Could you show an example in asm?
Thanks! :)
EDIT:
Source
Here is what he did.
function Check(var MemoryData: Array of byte; MemorySignature: Array of byte; Position: integer): boolean;
var
  i: byte;
begin
  Result := True; //moved to top. Your function always returned 'True'. Is this what you wanted?
  for i := 0 to Length(MemorySignature) - 1 do //are you sure??? Perhaps you want High(MemorySignature) here...
  begin
    {!} if MemorySignature[i] <> $FF then //speedup - '<>' evaluates faster than '='
    begin
      Result := MemoryData[i + Position] <> MemorySignature[i]; //speedup.
      if not Result then
        Break; //added this! - speedup. We already know the result, so no need to scan to the end.
    end;
  end;
end;
I'd claim that this is flat out wrong except perhaps in very special circumstances. Compilers can refactor one into the other effortlessly (by just switching the if and else cases).
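For instance, these two Go functions are trivially interchangeable for a compiler (doX and doY are placeholders):

func doX() {}
func doY() {}

// The compiler can rewrite one form into the other by swapping the
// branches, so == and != lead to the same machine code.
func branchEq(a, b int) {
	if a == b {
		doX()
	} else {
		doY()
	}
}

func branchNeq(a, b int) {
	if a != b {
		doY()
	} else {
		doX()
	}
}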
It could have something to do with branch prediction on the CPU. Static branch prediction would predict that a branch simply wouldn't be taken and fetch the next instruction. However, hardly anybody uses that anymore. Other than that, I'd say it's bull because the comparisons should be identical.
I think there's some confusion in your previous question about what the algorithm was that you were trying to implement, and therefore in what the claimed "speedup" purports to do.
Here's some disassembly from Delphi 2007, optimization on. (Note: turning optimization off changed the code a little, but not in a relevant way.)
Unit70.pas.31: for I := 0 to 100 do
004552B5 33C0 xor eax,eax
Unit70.pas.33: if i = j then
004552B7 3B02 cmp eax,[edx]
004552B9 7506 jnz $004552c1
Unit70.pas.34: k := k+1;
004552BB FF05D0DC4500 inc dword ptr [$0045dcd0]
Unit70.pas.35: if i <> j then
004552C1 3B02 cmp eax,[edx]
004552C3 7406 jz $004552cb
Unit70.pas.36: l := l + 1;
004552C5 FF05D4DC4500 inc dword ptr [$0045dcd4]
Unit70.pas.37: end;
004552CB 40 inc eax
Unit70.pas.31: for I := 0 to 100 do
004552CC 83F865 cmp eax,$65
004552CF 75E6 jnz $004552b7
Unit70.pas.38: end;
004552D1 C3 ret
As you can see, the only difference between the two cases is a jz vs. a jnz instruction. These WILL run at the same speed. What's likely to affect things much more is how often the branch is taken, and whether the entire loop fits into cache.
For .NET languages
If you look at the IL from the string.op_Equality and string.op_Inequality methods, you will see that both internally call string.Equals.
But op_Inequality inverts the result, which is two IL statements more.
I would say the performance is the same, with maybe a small (very small, very very small) edge for the == statement. But I believe the optimizer & JIT compiler will remove this.
Spontaneous thought: most other things in your code will affect performance more than the choice between == and != (or = and <> depending on the language).
When I ran a test in C# over 1,000,000 iterations of comparing strings (containing the alphabet, a-z, with the last two letters reversed in one of them), the difference was between 0 and 1 milliseconds.
It has been said before: write code for readability; change into more performant code when it has been established that it will make a difference.
Edit: I repeated the same test with byte arrays; same thing; the performance difference is negligible.
It could also be a result of misinterpretation of an experiment.
Most compilers/optimizers assume a branch is taken by default. If you invert the operator and the if-then-else order, so that the branch now taken is the ELSE clause, that might cause an additional speed effect in heavily computational code (*).
(*) Obviously you need to do a lot of operations for that to matter. But it can matter for the tightest loops in e.g. codecs or image analysis/machine vision, where you have 50 MByte/s of data to trawl through.
... and even then I only stoop to this level for really heavily reused code. For ordinary business code it is not worth it.
I'd claim this is flat out wrong, full stop. The test for equality always costs the same as the test for inequality. With string (or complex structure) testing, you're always going to break at exactly the same point. Until that break point is reached, the answer for equality is unknown.
I strongly doubt there is any speed difference. For integral types for example you are getting a CMP instruction and either JZ (Jump if zero) or JNZ (Jump if not zero), depending on whether you used = or ≠. There is no speed difference here and I'd expect that to hold true at higher levels too.
If you can provide a small example that clearly shows a difference, then I'm sure the Stack Overflow community could explain why. However, I think you might have difficulty constructing a clear example. I don't think there will be any performance difference noticeable at any reasonable scale.
Well it could be or it couldn't be, that is the question :-)
The thing is, this depends heavily on the programming language you are using.
Since all your statements will eventually end up as instructions to the CPU, the one that uses the fewest instructions to achieve the result will be the fastest.
For example, to test whether bit pattern x equals bit pattern y, you could use the instruction that XORs the two together; if the result is anything but 0, they are not equal. And how would you know that the result is anything but 0? By using the instruction that tests whether the result is non-zero.
So that is already 2 instructions used to do it, but since most CPUs can compare in a single instruction, it is a bad example.
The point I am making is still the same: you can't make general statements like this without specifying the programming language and the CPU architecture.
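The XOR test from the example above, as a one-line Go sketch (the function name is mine):

// equalViaXor reports whether x == y by checking that
// x XOR y leaves no bits set.
func equalViaXor(x, y uint32) bool {
	return x^y == 0
}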
This list of ASM instructions (assuming we're on x86) might help:
Jump if greater
Jump on equality
Comparison between two registers
(Disclaimer, I have nothing more than very basic experience with writing assembler so I could be off the mark)
However, it obviously depends purely on what assembly instructions the Delphi compiler produces. Without seeing that output, it's guesswork. I'm going to keep my Donald Knuth quote in, as caring about this kind of thing for all but a niche set of applications (games, mobile devices, high-performance server apps, safety-critical software, missile launchers, etc.) is the thing you worry about last, in my view.
"We should forget about small
efficiencies, say about 97% of the
time: premature optimization is the
root of all evil."
If you're writing one of those or similar then obviously you do care, but you didn't specify it.
Just guessing, but given you want to preserve the logic, you cannot just replace
if A = B then
with
if A <> B then
To preserve the logic, the original code must have been something like
if not (A = B) then
or
if A <> B then
else
and that may truly be a little bit slower than the direct test for inequality.
