Pattern of decoding instruction - parallel-processing

I am analyzing Agner Fog's
"Optimizing subroutines in assembly language: An optimization guide for x86 platforms".
I am trying to understand chapter 12.7 in particular, and there is one point I cannot follow. The author writes:
Instruction decoding in the PM processor follows the 4-1-1 pattern. The pattern of (fused)
μops for each instruction in the loop in example 12.6b is 2-2-2-2-2-1-1-1. This is not optimal,
and it will take 6 clock cycles to decode. This is more than the retirement time, so we can
conclude that instruction decoding is the bottleneck in example 12.6b. The total execution
time is 6 clock cycles per iteration or 3 clock cycles per calculated Y[i] value.
What does it mean that instruction decoding follows the 4-1-1 pattern, and how would I know that?
The pattern for the loop is 2-2-2-2-2-1-1-1. Fine, but why does it take 6 clock cycles to decode?

The CPU's front end can decode multiple (macro) instructions in one clock cycle. Each macro-instruction decodes to one or more micro-ops (μops). The 4-1-1 pattern means that the first of the three parallel decoders can handle a complex instruction that decodes to up to 4 μops, while the second and third decoders can only handle instructions that decode to 1 μop each. If the next instruction does not fit a decoder, that decoder does not consume it, and the instruction waits for the next cycle.
Each of the 5 instructions that decode to 2 μops must therefore be consumed by the first decoder; only the tail of 1-μop instructions allows some parallelism.
2 2 2 2 2 1 1 1   (Macro-instruction stream, μops per instruction)
^ x x
4 1 1             (Decode cycle 0)
. 2 2 2 2 1 1 1
  ^ x x
  4 1 1           (Decode cycle 1)
. . 2 2 2 1 1 1
    ^ x x
    4 1 1         (Decode cycle 2)
. . . 2 2 1 1 1
      ^ x x
      4 1 1       (Decode cycle 3)
. . . . 2 1 1 1
        ^ ^ ^
        4 1 1     (Decode cycle 4)
. . . . . . . 1
              ^ x x
              4 1 1   (Decode cycle 5)
. . . . . . . .   (Instruction stream fully consumed)
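The walk-through above can be reproduced with a small simulation. This is only a sketch under a simplified model: decoders consume instructions strictly in order, a decoder that cannot take the next instruction idles for the rest of the cycle, and effects such as instruction length and fetch boundaries are ignored. The function name `decode_cycles` and its parameters are made up for illustration.

```python
def decode_cycles(uops_per_instr, pattern=(4, 1, 1)):
    """Count decode cycles for an in-order stream under a simplified decoder model.

    uops_per_instr: number of (fused) uops for each instruction, in program order.
    pattern: maximum uops each parallel decoder can produce per cycle.
    """
    if any(u > pattern[0] for u in uops_per_instr):
        # Instructions too large for the first decoder are not modeled here.
        raise ValueError("instruction exceeds capacity of the first decoder")

    i = 0          # index of the next instruction to decode
    cycles = 0
    n = len(uops_per_instr)
    while i < n:
        for capacity in pattern:
            if i < n and uops_per_instr[i] <= capacity:
                i += 1      # this decoder consumes the instruction
            else:
                break       # later decoders cannot skip ahead; wait for next cycle
        cycles += 1
    return cycles


print(decode_cycles([2, 2, 2, 2, 2, 1, 1, 1]))  # 6
```

Under this model, each 2-μop instruction has to wait for the first decoder, which is why the stream from example 12.6b needs 6 cycles.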

Constrained N-Rook Number of Solutions

The purpose of this post is mainly to find keywords for related research.
Unconstrained N-Rook Problem
Count all possible arrangements of N rooks on an N-by-M (N<=M) chessboard such that no two rooks attack each other.
The solution is trivial: C(M,N)N!
Constrained N-Rook Problem
Rooks cannot be placed on certain squares of the chessboard.
For example, represent the chessboard as a 0-1 matrix, where 0 marks a square on which a rook cannot be placed. Then the number of solutions for the matrix
1 1 1
1 1 1
0 1 1
is 4:
R . . | R . . | . R . | . . R
. R . | . . R | R . . | R . .
. . R | . R . | . . R | . R .
Related Problem
A backtracking algorithm can easily be adapted from the N-Queens problem. However, I now want to solve a problem with around N=28. The number of solutions is far too large to count one by one; even Wikipedia says:
The 27×27 board is the highest-order board that has been completely enumerated.
Chances to Speed Up
I have thought of a few opportunities so far to speed up the algorithm.
=====Factorial for Unconstrained Submatrix=====
This is a divide-and-conquer method. For example, the matrix above
1 1 1
1 1 1
0 1 1
can be divided as
A B
1 1 1 | 0 1 1
1 1 1 |
and the solution is equal to sol(A)*sol(B), where sol(A)=2! which can be calculated at once (factorial is much faster than backtracking).
=============Rearrangement=============
Sometimes rearranging rows and columns helps to split the problem into subproblems. For example, the matrix
1 1 1
1 0 1
1 1 1
is equivalent to
1 1 1
1 1 1
0 1 1
Question
What are the keywords for this kind of problem?
Are there any efficient, well-developed techniques for this kind of problem?

The keywords are rook polynomial, rook coefficient, restricted permutations, and permanent.
From Theorem 3.1 of Algorithm for Finding the Coefficients of Rook Polynomials:
The number of arrangements of n objects with restriction board B is equal to the permanent of B.
Here B is the matrix defined in the question: a 0-1 matrix where 1 means a rook is allowed and 0 means the square is restricted.
So now we need to calculate the permanent of a matrix efficiently.
Fortunately, in this code golf, Ton Hospel uses the Glynn formula with a Gray code and the Ryser formula, and reaches about 57 seconds on the tester's system for n=36, which is more than enough for the questioner's case.
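As an illustration of the permanent approach, here is a minimal sketch of Ryser's formula for a square matrix. It is the plain O(2^n * n^2) version without the Gray-code optimization, so it is not the code-golf entry mentioned above and will be far slower for large n. The function name `permanent_ryser` is hypothetical.

```python
def permanent_ryser(matrix):
    """Permanent of a square matrix via Ryser's formula (unoptimized sketch).

    perm(A) = sum over non-empty column subsets S of
              (-1)^(n - |S|) * prod over rows i of (sum of A[i][j] for j in S)
    """
    n = len(matrix)
    total = 0
    for mask in range(1, 1 << n):            # each non-empty subset of columns
        prod = 1
        for row in matrix:
            s = sum(row[j] for j in range(n) if mask >> j & 1)
            if s == 0:
                prod = 0                     # one zero row sum kills the product
                break
            prod *= s
        subset_size = bin(mask).count("1")
        sign = -1 if (n - subset_size) % 2 else 1
        total += sign * prod
    return total


board = [
    [1, 1, 1],
    [1, 1, 1],
    [0, 1, 1],
]
print(permanent_ryser(board))  # 4
```

The board from the question gives 4, matching the four arrangements listed there. An all-ones n-by-n board gives n!, the unconstrained count.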

ZIO 2012: Toy Set [closed]

A toy set contains blocks showing the numbers from 1 to 9. There are plenty of blocks
showing each number and blocks showing the same number are indistinguishable. We
want to examine the number of different ways of arranging the blocks in a sequence so
that the displayed numbers add up to a fixed sum.
For example, suppose the sum is 4. There are 8 different arrangements:
1 1 1 1
1 1 2
1 2 1
1 3
2 1 1
2 2
3 1
4
The rows are arranged in dictionary order (that is, as they would appear if they were
listed in dictionary).
In each of the cases below, you are given the desired sum S and a number K. You have
to write down the Kth line when all arrangements that add up to S are written down
as described above. For instance, if S is 4 and K is 5, the answer is 2 1 1. Remember
that S may be large, but the numbers on the blocks are only from 1 to 9.
(a) S = 9, K = 156 (b) S = 11, K = 881 (c) S = 14, K = 4583
Each arrangement (1111, 112, etc.) is what is known in maths as a partition of a number. In maths, 112 and 121 count as the same partition, but here I have to treat them as different. I tried brute force, looking for a common pattern. Consider an array par[] comprising all the partitions of 9 (the first part of the question), arranged in dictionary order: par[0] = 111111111, par[1] = 11111112, and par[2] and par[3] are 11111121 and 1111113. If we look carefully at the last digits of these entries, we notice that they are the partitions of 3. So the partitions starting with 1 follow the order 1 + 1 (partitions of 2) + 2 (partitions of 3) + 4 (partitions of 4) and so on, increasing in powers of 2, until par[127] = 18 (number of partitions of 8). Adding these counts up, we get powers of 2.
However, I am stuck on calculating position 156: par[128] = 21111111, and I am unable to move further with my method. A recurrence relation or pseudocode would be most welcome. The answer as an integer is available online, but not the algorithm. Please help me out.
Source: http://www.iarcs.org.in/inoi/2012/zio2012/zio2012-qpaper.pdf
Solution: http://www.iarcs.org.in/inoi/2012/zio2012/zio2012-solutions.pdf

A hint:
partitions of 1
1 the number itself
partitions of 2
11 1 followed by partitions of 1
2 the number itself
partitions of 3
111 1 followed by partitions of 2
12 .
21 2 followed by partitions of 1
3 the number itself
partitions of 4
1111 1 followed by partitions of 3
112 .
121 .
13 .
211 2 followed by partitions of 2
22 .
31 3 followed by partitions of 1
4 the number itself
partitions of 5
11111 1 followed by partitions of 4
1112 .
1121 .
113 .
1211 .
122 .
131 .
14 .
2111 2 followed by partitions of 3
212 .
221 .
23 .
311 3 followed by partitions of 2
32 .
41 4 followed by partitions of 1
5 the number itself
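The pattern in the hint (each block is "d followed by all arrangements of S - d", for d = 1, 2, ...) leads to a counting recurrence. The following is a rough sketch, assuming the arrangements are ordered partitions (also called compositions) with parts 1 to 9 listed in dictionary order. The names `count` and `kth_arrangement` are made up for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count(s):
    """Number of ordered arrangements of blocks 1..9 that sum to s."""
    if s == 0:
        return 1  # the empty arrangement
    return sum(count(s - d) for d in range(1, min(9, s) + 1))


def kth_arrangement(s, k):
    """Return the k-th (1-based) arrangement summing to s in dictionary order."""
    if not 1 <= k <= count(s):
        raise ValueError("k is out of range")
    result = []
    while s > 0:
        for d in range(1, min(9, s) + 1):
            block = count(s - d)      # arrangements that start with d
            if k <= block:
                result.append(d)      # the answer lies in this block
                s -= d
                break
            k -= block                # skip the whole block
    return result


print(kth_arrangement(4, 5))  # [2, 1, 1]
```

For S = 4 and K = 5 this skips the four arrangements starting with 1 and then takes the first arrangement starting with 2, which is 2 1 1, as in the problem statement.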

How can I understand the result of this testcase in this code challenge?

I am trying to understand the first test case of this challenge on Codeforces.
The description is:
Sergey is testing a next-generation processor. Instead of bytes the processor works with memory cells consisting of n bits. These bits are numbered from 1 to n. An integer is stored in the cell in the following way: the least significant bit is stored in the first bit of the cell, the next significant bit is stored in the second bit, and so on; the most significant bit is stored in the n-th bit.
Now Sergey wants to test the following instruction: "add 1 to the value of the cell". As a result of the instruction, the integer that is written in the cell must be increased by one; if some of the most significant bits of the resulting number do not fit into the cell, they must be discarded.
Sergey wrote certain values of the bits in the cell and is going to add one to its value. How many bits of the cell will change after the operation?
Summary
Given a binary number, add 1 to its value and count how many bits change after the operation.
Testcases
4
1100
= 3
4
1111
= 4
Note
In the first sample the cell ends up with value 0010, in the second sample — with 0000.
In the second test case, 1111 is 15, so 15 + 1 = 16 (10000 in binary), so all the 1s change and the answer is 4.
But in the first test case, 1100 is 12, so 12 + 1 = 13 (01101); here only the 1 at the end changes, yet the result is 3. Why?

You've missed the crucial part: the least significant bit is the first one (i.e. the leftmost one), not the last one as in the usual way of writing binary.
Thus, 1100 is not 12 but 3. And so 1100 + 1 = 3 + 1 = 4 = 0010, so 3 bits change.
The "least significant bit" is literally the bit that carries the least weight, i.e. the one representing the smallest value. In binary, that is the bit representing 2^0. So the binary code in your task is written as follows:
bit no.  0              1    2    3    4 (...)
value    2^0            2^1  2^2  2^3  2^4 (...)
         | least                       | most
         | significant                 | significant
         | bit                         | bit
that's why 1100 is:
1100 = 1 * 2^0 + 1 * 2^1 + 0*2^2 + 0*2^3 = 1 + 2 + 0 + 0 = 3
not the other way around (as we usually write it).
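To check the reasoning, here is a short sketch that treats the cell as a string with the least significant bit first. The function name `changed_bits` is hypothetical, and the approach (convert, add, compare) is just one straightforward way to do it.

```python
def changed_bits(cell):
    """Count bits that change when 1 is added to an LSB-first n-bit cell."""
    n = len(cell)
    value = int(cell[::-1], 2)               # reverse to the usual MSB-first form
    new_value = (value + 1) % (1 << n)       # overflow bits are discarded
    new_cell = format(new_value, f"0{n}b")[::-1]  # back to LSB-first
    return sum(old != new for old, new in zip(cell, new_cell))


print(changed_bits("1100"))  # 3  (1100 -> 0010)
print(changed_bits("1111"))  # 4  (1111 -> 0000)
```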

Trouble understanding what to do with output of shunting-yard algorithm

I've been looking at the wiki page: http://en.wikipedia.org/wiki/Shunting-yard_algorithm
I've used the code example to build the first part; I can currently turn
3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3 into 3 4 2 * 1 5 − 2 3 ^ ^ / +
But I don't know how to then use 3 4 2 * 1 5 − 2 3 ^ ^ / + to obtain 3.00012207.
The example code and explanation on the wiki aren't making any sense to me.
Could someone please explain how to evaluate 3 4 2 * 1 5 − 2 3 ^ ^ / + and produce the answer? Thanks in advance. I don't need a code example, just a good explanation or a breakdown of an example.
Not that it matters, but I am working in .NET C#.

The purpose of the shunting-yard algorithm is that its output is in Reverse Polish Notation, which is straightforward to evaluate:
create a stack to hold values
while there is Reverse Polish Notation input left:
    read an item of input
    if it is a value, push it onto the stack
    otherwise, it is an operation; pop values from the stack, perform the operation on those values, and push the result back
when there's no input left, if the expression was well formed, there should be exactly one value on the stack; this is the evaluated result.

Postfix notation is how you do the math on, say, an HP calculator.
Keep a stack. Whenever you get a number, add it to the top. Whenever you get an operator, consume inputs from the top and then add the result to the top:
token   stack
        *empty*
3       3             //push numbers...
4       3 4
2       3 4 2
*       3 8           //remove 4 and 2, add 4*2=8
1       3 8 1
5       3 8 1 5
-       3 8 -4
2       3 8 -4 2
3       3 8 -4 2 3
^       3 8 -4 8
...     ...

Process the elements 3 4 2 * 1 5 − 2 3 ^ ^ / + left to right as follows:
Initialize a stack to hold numbers.
If the element is a number, push it onto the stack.
If the element is an operator, remove the top two elements from the stack, apply the operator to those two elements (the element removed second is the left operand), and push the result onto the stack.
When you get to the end, the stack should have a single element, which will be the result.
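For reference, the steps above can be sketched in a few lines. This is an illustration in Python rather than C#, assuming space-separated tokens and only the binary operators used in the example. The name `eval_rpn` and the `OPS` table are made up for this sketch.

```python
import operator

# Binary operators used in the example expression.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "−": operator.sub,   # Unicode minus sign, as it appears in the Wikipedia output
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def eval_rpn(expression):
    """Evaluate a space-separated Reverse Polish Notation expression."""
    stack = []
    for token in expression.split():
        if token in OPS:
            right = stack.pop()   # popped first: right operand
            left = stack.pop()    # popped second: left operand
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


print(eval_rpn("3 4 2 * 1 5 - 2 3 ^ ^ / +"))  # 3.0001220703125
```

The order of the two pops matters for -, / and ^: with the stack at 3 8 1 5, the operator - computes 1 - 5, not 5 - 1.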

I see I am a bit late to the party.
I saw the question and went off on a tangent writing a couple of tasks for Rosetta Code. It just so happens that this task might be what you are after. It gives an annotated table of what happens when calculating the value of an RPN expression, token by token.
Here is a sample of its output:
For RPN expression: '3 4 2 * 1 5 - 2 3 ^ ^ / +'
TOKEN  ACTION                      STACK
3      Push num onto top of stack  3
4      Push num onto top of stack  3 4
2      Push num onto top of stack  3 4 2
*      Apply op to top of stack    3 8
1      Push num onto top of stack  3 8 1
5      Push num onto top of stack  3 8 1 5
-      Apply op to top of stack    3 8 -4
2      Push num onto top of stack  3 8 -4 2
3      Push num onto top of stack  3 8 -4 2 3
^      Apply op to top of stack    3 8 -4 8
^      Apply op to top of stack    3 8 65536
/      Apply op to top of stack    3 0.0001220703125
+      Apply op to top of stack    3.0001220703125
The final output value is: '3.0001220703125'

Laser Grid Puzzle

I saw the following puzzle on HN and thought I would repost it here. It can be solved using Simplex, but I was wondering if there is a more elegant solution, or if someone can prove NP-completeness.
Each dot below represents the position of a laser. Indicate the direction that the laser should fire by replacing the dot with ^, v, <, or >. Each grid position i,j should be hit by exactly grid[i][j] lasers. In the example below, grid position 0,0 should be hit by exactly grid[0][0] = 2 lasers.
A laser goes through everything in its path including other guns (without destroying those guns).
2 2 3 . 1 . 2 2 3
1 . 2 1 1 . 1 . 2
2 3 . 1 . 2 . 4 .
. 3 . 2 2 . 2 3 4
1 . 2 . 2 3 2 . .
2 3 . 3 . 3 2 2 .
3 . 2 4 2 . 2 . 2
1 1 . . 1 3 . 2 .
. 2 1 . 2 . 1 . 3

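If it can be solved with plain linear programming (the setting in which Simplex applies, with no integrality constraints), then it is solvable in polynomial time, so it isn't NP-complete unless P = NP.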
