Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
2^(n/2+10 log n)
or
2^n?
I was doing an exercise from MIT OCW 6.006. One problem states that the latter grows faster than the former, but I can't agree with the proof. I say that the former grows faster than the latter. Could someone explain whether I am wrong, and why? Thanks!
You could frame that differently by pulling out the exponents and just asking which is bigger: (n/2 + 10 log n) or n.
Here it's clear the second will be bigger whenever 10 log n is less than n/2.
For log base 10, that becomes true when n reaches about 30, so from then on the second is bigger.
Let's look at log base 2 further: when is 10 log n less than n/2?
Well, that's the same as asking when log n becomes less than n/20.
Loosely speaking, log_2 is the number of bits needed to describe a number in base 2. So:
log_2(32) gives us 5.
log_2(64) gives us 6.
log_2(128) gives us 7. <-- look here 128:7 is about 18:1
log_2(256) gives us 8.
log_2(512) gives us 9.
log_2(1024) gives us 10.
log_2(64000) gives us ~16.
Now we are looking for the point where the first value (32, 64, 128, etc.) is more than 20 times the second. As you can see, this happens just past the 128/7 pair, and the two rapidly get much further apart.
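The crossover points mentioned above can be checked numerically. This is only a small sketch: the helper name `crossover` is made up for the illustration, and it starts scanning at n = 2 because the inequality holds trivially at n = 1 (where log n = 0).

```python
import math

def crossover(log):
    """Return the smallest integer n >= 2 with 10*log(n) < n/2."""
    n = 2
    while 10 * log(n) >= n / 2:
        n += 1
    return n

print(crossover(math.log10))  # crossover for log base 10
print(crossover(math.log2))   # crossover for log base 2
```

This reports 30 for base 10 and 144 for base 2. Past those points n/2 keeps pulling ahead of 10 log n, so 2^n stays the larger of the two.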
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
In my software engineering course, I encountered the following characteristic of a stack, condensed by me: What you push is what you pop. The fully axiomatic version I uploaded here.
Being a natural-born troll, I immediately invented the Troll Stack. If it has more than 1 element already on it, pushing results in a random permutation of those elements. Promptly I got into an argument with the lecturers whether this nonsense implementation actually violates the axioms. I said no, the top element stays where it is. They said yes, somehow you can recursively apply the push-pop-axiom to get "deeper". Which I don't see. Who is right?
The violated axiom is pop(push(s,x)) = s. Take a stack s with n > 1 distinct entries. If you implement push such that push(s,x) is s'x with s' being a random permutation of s, then since pop is a function, you have a problem: how do you reverse random_permutation() such that pop(push(s,x)) = s? The preimage of s' might have been any of the n! > 1 permutations of s, and no matter which one you map to, there are n! - 1 > 0 other original permutations s'' for which pop(push(s'',x)) != s''.
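The counting argument can be made concrete with a tiny model. This is a sketch under assumptions: stacks are modelled as tuples with the top at the end, and `troll_push_outcomes` and `pop` are hypothetical names that enumerate every result the Troll Stack's push could produce rather than picking one at random.

```python
from itertools import permutations

def troll_push_outcomes(s, x):
    """All stacks a troll push may yield: any permutation of s with x on top."""
    return {p + (x,) for p in permutations(s)}

def pop(s):
    """Ordinary pop: drop the top element."""
    return s[:-1]

s1 = (1, 2, 3)
s2 = (3, 2, 1)

# Two different stacks lead to exactly the same set of possible results,
# so no function pop can return s1 in one case and s2 in the other.
same_outcomes = troll_push_outcomes(s1, 9) == troll_push_outcomes(s2, 9)

# For the ordinary pop, count the outcomes that break pop(push(s, x)) = s.
violations = sum(1 for o in troll_push_outcomes(s1, 9) if pop(o) != s1)

print(same_outcomes, violations)
```

With three distinct entries, `violations` is n! - 1 = 5, matching the count in the argument above.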
In cases like this, which might be very easy to see for everybody but not for you (hence your usage of the "troll" word), it always helps to simply run the "program" on a piece of paper.
Write down what happens when you push and pop a few times, and you will see.
You should also be able to see how those axioms correspond very closely to the actual behaviour of your stack; they are not just there for fun, but they deeply (in multiple meanings of the word) specify the data structure with its methods. You could even view them as a "formal system" describing the ins and outs of stacks.
Note that it is still good for you to be sceptical; this leads to a) better insight and b) detection of errors your superiors make. In this case they are right, but there are cases where it can save you a lot of time (e.g. while searching the solution for the "MU" riddle in "Gödel, Escher, Bach", which would be an excellent read for you, I think).
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
Recently, I have been trying to understand how the Binary Extended Euclidean Algorithm works at the processor level. This question is about finding an inverse element in GF(2^m) with a polynomial basis.
Generally I came across the Extended Euclidean Algorithm for evaluating an inverse element but the fact is that it involves too many addition and multiplication operations. The Binary EEA algorithm requires just bit shifting operations (equivalent to division by 2--logical shift right). The algorithm is in this link, page number 8.
In step 3 and 5 of this algorithm, every iteration shifts the parameters u and b by 1 bit to the right adding zero to the MSB at the same time. The loop ends when u == 1 and returns b. My question is how many primitive operations does a processor (say a 32 bit processor for example) perform in step 3 or step 5 of every iteration?
I came across the barrel shifter and I am quite confused about how fast the shifting takes place. Should I really count these as primitive operations, or should I ignore them because the shifting may be faster?
It would really help me a lot if someone would show the primitive operations for the case where the size of u is 194 bits.
In case you are wondering about the denominator x in steps 3 and 5 of the algorithm: it's the polynomial representation, so x means nothing but 10 in binary, and the parameter u is an N-bit binary number.
There is no generic answer to this question: you can use portable code that will be tedious to optimize or highly machine specific code that will be even more complicated to optimize without breaking.
If you want real performance, you have to use MMX/AVX registers on the maximum width you can get your hands on. Intel provides lightweight wrappers on low-level instructions as macros and inline functions.
Always use unsigned types for your shifting operations to avoid unnecessary steps.
Usually there is a "right shift" assembly opcode that can shift a register right by a given number of bits. Such an operation typically takes one cycle.
This assumes that your value is already loaded into the register, however.
The best answer anyway: Implement this algorithm in a low level language (C, C++) and look at the assembly code produced by the compiler.
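To get a feel for the work involved before looking at real compiler output, here is a rough model of a one-bit right shift on a 194-bit value held in 32-bit words. It is a sketch only: the word layout (least significant word first) and the helper names are assumptions, and Python integers stand in for registers.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1

def to_words(value, nbits):
    """Split value into 32-bit words, least significant word first."""
    count = (nbits + WORD_BITS - 1) // WORD_BITS
    return [(value >> (WORD_BITS * i)) & WORD_MASK for i in range(count)]

def from_words(words):
    """Reassemble an integer from 32-bit words."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))

def shift_right_one(words):
    """Shift the whole multi-word value right by one bit, zero-filling the MSB."""
    result = []
    for i, word in enumerate(words):
        # The lowest bit of the next higher word becomes this word's top bit.
        carry_in = (words[i + 1] & 1) << (WORD_BITS - 1) if i + 1 < len(words) else 0
        result.append((word >> 1) | carry_in)
    return result

u = (1 << 193) | 0b1011          # an arbitrary 194-bit example value
words = to_words(u, 194)
print(len(words), from_words(shift_right_one(words)) == u >> 1)
```

A 194-bit operand needs 7 words, so in this naive model each shift of u costs roughly 7 word shifts plus the carry extraction and OR for each word boundary, not counting loads and stores. Many instruction sets offer a shift or rotate through the carry flag, which can bring that down to about one instruction per word, so the compiler output is still the thing to check.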
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I'm new to assembly language and I know many of the instructions.
However, I'm working with an 8086 emulator, which only works with 16-bit numbers.
This is a homework problem that I'm really stuck on:
How can I write assembly code that does the following:
1- Get 20 decimal numbers of at most 6 digits each and store them in an array.
2- Sort the array in ascending order.
It's really hard for me to understand how to manage registers and the stack for numbers this long.
Any help would be appreciated.
In order to sort 32-bit numbers (or wider) with 16-bit registers, you have to compare the upper part of each number separately.
Assume we have these two random 32-bit numbers (shown in hex): 4567afdf and 321abc09.
Now when you look at them as 16-bit values they look like this:
4567 afdf
321a bc09
As you can easily see, you can compare the upper 16 bits on their own.
If the upper 16 bits of one number are higher or lower than those of the other, the lower part doesn't matter anymore and you order the numbers accordingly.
If the upper 16 bits are equal, you compare the lower 16 bits. If those are also equal, both numbers are equal and no swap is needed; otherwise you swap accordingly. Since the upper 16 bits are equal in this case, you only need to swap the lower halves.
If the upper 16 bits are different, you have to swap the lower 16 bits along with the upper ones, as they might be different too.
The basics of this approach can be used for an arbitrary number of bits not just 32bits. Generally when you have a seemingly hard problem, you should try to think of the easy examples and how you can solve it. Then you can extend it to more complicated cases.
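The comparison logic above can be prototyped in a high-level language before translating it to 8086 code. This is a sketch under assumptions: the helper names are invented, both halves are treated as unsigned 16-bit values, and a simple bubble sort is swapped in as the sorting method since none is named above.

```python
def halves(value):
    """Split a 32-bit value into (upper 16 bits, lower 16 bits)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF

def less_than(a, b):
    """Compare two 32-bit values using only 16-bit comparisons."""
    a_hi, a_lo = halves(a)
    b_hi, b_lo = halves(b)
    if a_hi != b_hi:
        return a_hi < b_hi      # the upper halves decide
    return a_lo < b_lo          # upper halves equal: the lower halves decide

def bubble_sort(values):
    """Sort ascending with the half-wise comparison."""
    items = list(values)
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if less_than(items[i + 1], items[i]):
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

print([hex(v) for v in bubble_sort([0x4567AFDF, 0x321ABC09, 999999, 65535])])
```

In assembly, each `less_than` becomes a compare of the high words followed, only if they are equal, by a compare of the low words, using unsigned conditional jumps.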
EDIT:
An alternative approach would be, if you have strings of decimal numbers and you want to sort them based on the string representation instead of the numbers.
In this case, you can do it as follows
If the lengths of the two number strings are different, the shorter one is the lower number.
If the lengths are equal, you can look at each digit individually (starting with the first digit) until you hit a non-equal digit or the end of the string. If you reached the end of the string, the numbers are the same; otherwise you know which one is higher/lower.
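The string-based comparison could look like the following. It is a sketch that assumes non-negative numbers written without leading zeros (otherwise the length rule breaks down); `decimal_string_less` is a made-up name.

```python
def decimal_string_less(a, b):
    """Return True if decimal string a represents a smaller number than b."""
    if len(a) != len(b):
        return len(a) < len(b)   # the shorter string is the lower number
    for digit_a, digit_b in zip(a, b):
        if digit_a != digit_b:
            return digit_a < digit_b   # first differing digit decides
    return False                 # reached the end: the numbers are equal

print(decimal_string_less("999", "1000"))       # shorter string is smaller
print(decimal_string_less("123465", "123456"))  # decided at the fifth digit
```

Comparing the digit characters directly works because the characters '0' to '9' are ordered the same way as the digits they stand for.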
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
I am reading through "Computer Architecture: A Quantitative Approach, 5th ed" and am looking at an example from Chapter 5 on page 350. Attached is a scan of the example in question. I do not quite follow the logic of how they do things in this example.
My questions are, as follows:
Where is the 0.3ns cycle time coming from?
200/0.3 is roughly 666 cycles, I follow this. However, when plugged back into the CPI equation, it makes no sense: 0.2% (0.002) x 666 is equal to 1.332 and not 1.2. What is going on here?
When they say that "the multiprocessor with all local references is 1.7/0.5 = 3.4 times faster", where are they getting that from? Meaning: I see nowhere in the given information stating that local communication is twice as fast...
Any help would be appreciated.
Where is the 0.3ns cycle time coming from?
That comes from the clock rate of 3.3 GHz. 1 / 3.3 GHz = 0.3ns.
200/0.3 is roughly 666 cycles, I follow this. However, when plugged back into the CPI equation, it makes no sense: 0.2% (0.002) x 666 is equal to 1.332 and not 1.2. What is going on here?
I think you're right. That looks like a misprint. That should be
CPI = 0.5 + 1.33 = 1.83
When they say that "the multiprocessor with all local references is 1.7/0.5 = 3.4 times faster", where are they getting that from? Meaning: I see nowhere in the given information stating that local communication is twice as fast...
They don't say anywhere that local communication is twice as fast. They're dividing the effective CPI that they calculated for the multiprocessor with 0.2% remote references by the base CPI of 0.5. This tells you how many times faster the multiprocessor with all local references is. (Of course it should be about 1.83/0.5 = 3.66 times faster.)
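The arithmetic can be reproduced in a few lines. This is a sketch using only the figures quoted in this thread (base CPI 0.5, 0.2% remote references, 200 divided by a 0.3 ns cycle time); reading the 200 as nanoseconds is an assumption, and `effective_cpi` is a made-up helper name.

```python
def effective_cpi(base_cpi, remote_fraction, remote_ns, cycle_ns):
    """Base CPI plus the average stall cycles added by remote references."""
    remote_cycles = remote_ns / cycle_ns
    return base_cpi + remote_fraction * remote_cycles

cpi = effective_cpi(base_cpi=0.5, remote_fraction=0.002, remote_ns=200, cycle_ns=0.3)
speedup = cpi / 0.5   # all-local machine versus one with 0.2% remote references

print(round(cpi, 2), round(speedup, 2))
```

With the rounded 0.3 ns cycle time this gives a CPI of about 1.83 and a ratio of about 3.67. Using the unrounded cycle time of 1/3.3 ns instead gives 660 cycles per remote reference and a slightly lower CPI of 1.82.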
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
Please recommend an error-correcting algorithm for use with a very strange data channel.
The channel consists of two parts: Corrupter and Eraser.
Corrupter receives a word consisting of 10000 symbols in 3-symbol alphabet, say, {'a','b','c'}.
Corrupter changes each symbol with probability 10%.
Example:
Corrupter input: abcaccbacbbaaacbcacbcababacb...
Corrupter output: abcaacbacbbaabcbcacbcababccb...
Eraser receives corrupter output and erases each symbol with probability 94%.
Eraser produces word of the same length in 4-symbol alphabet {'a','b','c','*'}.
Example:
Eraser input: abcaacbacbbaabcbcacbcababccb...
Eraser output: *******a*****************c**...
So, on eraser output, approximately 6%*10000=600 symbols would not be erased, approximately 90%*600=540 of them would preserve their original values and approximately 60 would be corrupted.
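For experimenting with candidate codes, the channel itself is easy to simulate. This is a sketch under one assumption not stated above: a corrupted symbol is replaced by one of the other two symbols chosen uniformly. The function names are hypothetical.

```python
import random

ALPHABET = "abc"

def corrupt(word, p, rng):
    """Change each symbol to a different one with probability p."""
    out = []
    for ch in word:
        if rng.random() < p:
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        else:
            out.append(ch)
    return "".join(out)

def erase(word, p, rng):
    """Replace each symbol by '*' with probability p."""
    return "".join("*" if rng.random() < p else ch for ch in word)

rng = random.Random(0)   # fixed seed so runs are repeatable
sent = "".join(rng.choice(ALPHABET) for _ in range(10000))
received = erase(corrupt(sent, 0.10, rng), 0.94, rng)

survivors = [(s, r) for s, r in zip(sent, received) if r != "*"]
intact = sum(1 for s, r in survivors if s == r)
print(len(survivors), intact)   # expect roughly 600 and 540
```

The printed counts vary with the seed but should land near the 600 surviving and 540 intact symbols estimated above.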
What encoding-decoding algorithm with error correction is best suited for this channel?
What amount of useful data could be transmitted providing > 99.99% probability of successful decoding?
Is it possible to transmit 40 bytes of data through this channel? (256^40 ~ 3^200)
Here's something you can at least analyze:
Break your 40 bytes up into 13 25-bit chunks (with some wastage, so this bit can obviously be improved).
2^25 < 3^16 so you can encode the 25 bits into 16 a/b/c "trits" - again wastage means scope for improvement.
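The size checks and the bits-to-trits conversion are simple enough to write down. This is a sketch: it uses plain base-3 digits, least significant first, which is one possible mapping rather than a prescribed one, and the helper names are made up.

```python
def to_trits(value, count):
    """Write value as `count` base-3 digits, least significant first."""
    trits = []
    for _ in range(count):
        value, digit = divmod(value, 3)
        trits.append(digit)
    if value:
        raise ValueError("value does not fit in the given number of trits")
    return trits

def from_trits(trits):
    """Inverse of to_trits."""
    value = 0
    for digit in reversed(trits):
        value = value * 3 + digit
    return value

print(13 * 25 >= 40 * 8)   # 13 chunks of 25 bits cover 40 bytes
print(2**25 < 3**16)       # every 25-bit chunk fits in 16 trits
chunk = 2**25 - 1          # the largest 25-bit chunk
print(from_trits(to_trits(chunk, 16)) == chunk)
```

Mapping the digits 0, 1, 2 to 'a', 'b', 'c' then gives the symbols the channel expects.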
With 10,000 trits available you can give each of your 13 encoded chunks 769 output trits. Pick (probably at random) 769 different linear (mod 3) functions on 16 trits - each function is specified by 16 trits and you take a vector dot product between those trits and the 16 input trits. This gives you your 769 output trits.
Decode by considering all possible (2^25) chunks and picking the one which matches most of the surviving trits. You have some hope of getting the right answer as long as there are at least 16 surviving trits, which I think Excel is telling me via BINOMDIST() happens often enough that there is a pretty good chance that it will happen for all of the 13 25-bit chunks.
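The encode and brute-force decode steps can be tried out on a scaled-down toy. This is a sketch with made-up names; it searches every trit vector of the given width instead of the 2^25 valid chunks, and it marks erased positions with `None`.

```python
import random
from itertools import product

def random_functions(count, width, rng):
    """Pick `count` random linear functions, each given by `width` trits."""
    return [tuple(rng.randrange(3) for _ in range(width)) for _ in range(count)]

def encode(message, functions):
    """One output trit per function: the dot product with the message, mod 3."""
    return [sum(f * m for f, m in zip(func, message)) % 3 for func in functions]

def decode(received, functions, width):
    """Return the candidate message agreeing with the most surviving trits."""
    def score(candidate):
        codeword = encode(candidate, functions)
        return sum(1 for got, want in zip(received, codeword)
                   if got is not None and got == want)
    return max(product(range(3), repeat=width), key=score)

# Tiny hand-made example: 3 input trits, 4 linear functions, 1 erasure.
functions = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
message = (2, 0, 1)
codeword = encode(message, functions)
received = [codeword[0], None, codeword[2], codeword[3]]
print(decode(received, functions, 3) == message)
```

For the real parameters you would call `random_functions(769, 16, rng)` once per chunk and share the functions between sender and receiver; the decoding loop over 2^25 candidates is where the cost lies.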
I have no idea what error rate you get from garbling but random linear codes have a pretty good reputation, even if this one has a short blocksize because of my brain-dead decoding technique. At worst you could try simulating the encoding, transmission, and decoding of 25-bit chunks and work it out from there. You can get a slightly more accurate lower bound on error rate than above if you pretend that the garbling stage erases as well and so recalculate with a slightly higher probability of erasure.
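The BINOMDIST() calculation, and the recalculation with a higher erasure probability, can be reproduced without a spreadsheet. This is a sketch: it only covers the "fewer than 16 survivors" event, it treats corruption as extra erasure by lowering the survival probability from 0.06 to 0.06 * 0.9, and it says nothing about decoding errors when enough trits do survive.

```python
from math import comb

def prob_fewer_than(k, n, p):
    """P(X < k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

n = 10000 // 13                       # 769 output trits per chunk

fail = prob_fewer_than(16, n, 0.06)   # chunk has fewer than 16 surviving trits
all_ok = (1 - fail) ** 13             # all 13 chunks have at least 16 survivors

# Pessimistic variant: count corrupted trits as erased too.
fail_pess = prob_fewer_than(16, n, 0.06 * 0.90)
all_ok_pess = (1 - fail_pess) ** 13

print(fail, all_ok)
print(fail_pess, all_ok_pess)
```

Both failure probabilities come out very small, which agrees with the "pretty good chance" estimate above for all 13 chunks.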
I think this might actually work in practice if you can afford the 2^25 guesses per 25-bit block to decode. OTOH if this is a question in a class my guess is you need to demonstrate your knowledge of some less ad-hoc techniques already discussed in your class.