Measurement of unentangled qubits - measurement

If I have two independent (unentangled) qubits, let's say one in state |1> and the other one in some superposition state with equal amplitudes and arbitrary phases. If I measure the qubit that is in the superposition state (and let's say get 0), it seems that the remaining phase is kicked back to the first (non-measured) qubit. So the state of the first (non-measured) qubit changes as a result of the measurement of the second qubit. But the qubits are independent and are not supposed to know about each other and impact each other. So why is the phase transferred to the first qubit?
here is the qiskit code to illustrate the question:
# some arbitrary state:
theta = [np.pi/3,np.pi/4]
a = 1/np.sqrt(2)*np.array([np.exp(1j*theta[0]),np.exp(1j*theta[1])])
qc = QuantumCircuit(2,1)
qc.initialize(a,1) # set q[1] to a
qc.measure(1,0)
execute(qc,svsim).result().get_statevector() # get the post-measurement state

This is due to the difference between global and relative phase. What you can see here is a difference in global phase only which is not observable. The difference between global and relative phase is well explained here: https://quantumcomputing.stackexchange.com/questions/5125/what-is-the-difference-between-a-relative-phase-and-a-global-phase-in-particula
In your case the result is the expected one as the state you get is indistinguishable from the state [0, 1].

Related

RiscV forwarding, why don't we need it?

can someone help me understand why between line 1 and 3 we don't need forwarding (there is no green arrow as between 1 and 2)
I think we need it because sub uses the value of t0 which add determines and both are doing read and write of that value at same time.(To be precise write for add happens more lately when the clock rises)
You are correct that in the third instruction (sub), has already read an incorrect (e.g. stale) value in decode stage, and thus requires mitigation such as forwarding.
In fact, that sub instruction has read two incorrect (stale) values, one for the first operand, t0, and one for the second operand, t3, as that register is updated by the immediately prior instruction.
The first actual register update (of t0 by add) is available in cycle 5 (1-based counting), yet the decode of the sub happens in cycle 4.  A forward is required: here it could be from the W stage of the add to the ALU stage of the sub -or- it could be done from the M stage of the add to the D stage of the sub.
Only in the next cycle after (4th instruction, not shown) could the decode obtain the proper up-to-date value from the earlier instruction's W stage — if the W stage overlaps with a subsequent instruction's D stage, no forward is necessary since the W stage finishes early in the cycle and the D stage is able to pick up that result.
There is also a straightforward ALU-ALU dependency, a (read-after-write) hazard, on t3 between instruction 2 (the writer) and instruction 3 (the reader) that the diagram does not call out, so that is good evidence that the diagram is incomplete with respect to showing all the hazards.
Sometimes educators only show the most clear example of the read-after-write hazard.  There are many other hazards that are often overlooked.
Another involve load hazards.  Normally, a load hazard is seen as requiring both a forward and a stall; this if there is a use of the load result done in the next instruction at the ALU.  However, if a load instruction is succeeded by a store instruction (storing the loaded data), a forward from M (of load) to M of store can mitigate this hazard without a stall (much the same way that X to X forward can mitigate and ALU dependency hazard).
So we might note that a store instruction has two register sources, but the register for the value being stored isn't actually needed until the M stage, whereas the register for the base address computation is needed in the X (ALU) stage.  (That makes store somewhat different from, say, add which also has two register sources, in that there both are needed for the X stage.)

Never ending 'for' loop prevents my RStudio notebook from being rendered into a .md file

I'm trying to calculate the Kolmogorov-Smirnov statistic in R. I have the following sample, which clearly comes from a random variable that follows a long-tailed distribution.
Download link
https://drive.google.com/file/d/1hIgqikX7p343zdyc-Goq34THUpsZA63n/view?usp=sharing
As you may know, the Kolmogorov-Smirnov statistic requires the calculation of the empirical cumulative distribution function and the presumed cumulative distribution function. For both calculations I take the following approach: first, I create a vector with the same length as the length of the sample, and then I modify each of the components of the vector so as for it to contain the empirical cdf (or presumed cdf) of the corresponding observation of the sample.
For the sake of illustration, I'll show you the code I wrote in order to calculate the empirical cdf.
I'm assuming that the data has been read and stored in a dataframe called data.
ecdf = vector("numeric", length(data$logueos))for (i in 1:length(data$logueos)) {ecdf[i] = sum (data$logueos <= data$logueos[i])/length(data$logueos)}
The code I wrote for the calculation of the presumed cdf is analogous to the preceding one; the only difference is that I set each component of the pcdf vector equal to the formula $P(X<=t)$ —where t is the corresponding observation of the sample— according to the distribution that I'm assuming.
The problem is that this 'for' loop never ends. If I force it to end by clicking RStudio's stop button it works: it makes the vector store what I want it to store. But, if I press Ctrl+Shift+k in order to render my notebook and preview it, the load gets stuck when trying to execute the first chunk encountered that contains one of those loops.
First of all, your loop is not endless. It will finish, eventually.
You start initializing a vector with as much elements as the number of observations (1.245.888, which is a lot of iterations). This vector is FULL OF ZEROS.
What your loop does is iterate while changing each zero with the calculus sum (data$logueos <= data$logueos[i])/length(data$logueos). Check that when you stop the execution, the first values of your vector will be values between 0 and 1 while the last values is going to be 0s (because the loop hasn't arrived there yet).
So, you will have to wait more time.
In order to make the execution faster, you could consider loop parallelization (because standard loops go sequentially, one by one, and if it's too much wait, parallelization makes it faster. For example, executing 4 by 4, depending of your computer capacities). Here you'll find some information about it: https://nceas.github.io/oss-lessons/parallel-computing-in-r/parallel-computing-in-r.html
Then, my proposal to you:
if(!require(foreach)){install.packages("foreach")}; require(foreach)
registerDoParallel(detectCores() - 1)
ecdf = vector("numeric", length(data$logueos))
foreach (i=1:length(data$logueos)) %do% {
print(i)
ecdf[i] = sum (data$logueos <= data$logueos[i])/length(data$logueos)
}
The first line will download and load foreach library, that you
need for parallelization.
detectCores() - 1 is going to use all the
processors that your computer has except one (to avoid freezing your
machine) for computing this loop. You'll see that is going to be
faster!
registerDoParallel function is what tells to foreach how many cores use.

A significant figure of division of time command's result

I use time command to measure execution time of the code A and B. I got results, and now, I want to get ratio between A and B.
Concretely, I have execution time as below.
A : 0m0.819s
B : 0m2.119s
I need x in B = x * A. So, I did the division 2.119 / 0.819, and the result is 2.5873015873. Now, I want to know the significant figure of 2.5873015873. Can I treat 2.5873015873 as the answer or must I decide the significant figure? And if I must decide the significant figure, is it 4 by A and B's execution time?

Viterbi algorithm for real-time applications

I know that given an HMM and an observation, Viterbi algorithm can guess the hidden states sequence that produce this observation. But what about the case you want to use it real-time? I mean finding the hidden states step by step. Every time an observation symbol is on the input, a hidden state is guessed, without knowing the whole observation sequence that's coming next.
I want to use that for an audio application that is running in real time so the observation will be a sequence of values of an audio feature at each time frame.
If you are interested in predicting what the hidden state is at time T, when you see the observation O_T, then you have data O_1, ..., O_{T-1}, O_T. Now the most likely state is found with forward backwards, where the backward variable is simply 1, because we can't see into the future. In summary, we have P(We are in hidden state i at time T) = \alpha_T(i) / P(O_1, ..., O_T | \lambda), where P(O_1, ..., O_T| \lambda) = \sum_{i=1}^n \alpha_T(i). Then the max index over all i's of P(We are in hidden state i at time T) will be your hidden state.
Please refer to http://courses.media.mit.edu/2010fall/mas622j/ProblemSets/ps4/tutorial.pdf for the formal notation.
Please let me know if this is what you were after, or if you had something else in mind. If you just wanted to find the best sequence of states in realtime, just compute the alpha variables, no need to look into the future for that.

Best way to calculate the result of a formula?

I currently have an application which can contain 100s of user defined formulae. Currently, I use reverse polish notation to perform the calculations (pushing values and variables on to a stack, then popping them off the stack and evaluating). What would be the best way to start parallelizing this process? Should I be looking at a functional language?
The calculations are performed on arrays of numbers so for example a simple A+B could actually mean 100s of additions. I'm currently using Delphi, but this is not a requirement going forward. I'll use the tool most suited to the job. Formulae may also be dependent on each other So we may have one formula C=A+B and a second one D=C+A for example.
Let's assume your formulae (equations) are not cyclic, as otherwise you cannot "just" evaluate them. If you have vectorized equations like A = B + C where A, B and C are arrays, let's conceptually split them into equations on the components, so that if the array size is 5, this equation is split into
a1 = b1 + c1
a2 = b2 + c2
...
a5 = b5 + c5
Now assuming this, you have a large set of equations on simple quantities (whether integer, rational or something else).
If you have two equations E and F, let's say that F depends_on E if the right-hand side of F mentions the left-hand side of E, for example
E: a = b + c
F: q = 2*a + y
Now to get towards how to calculate this, you could always use randomized iteration to solve this (this is just an intermediate step in the explanation), following this algorithm:
1 while (there is at least one equation which has not been computed yet)
2 select one such pending equation E so that:
3 for every equation D such that E depends_on D:
4 D has been already computed
5 calculate the left-hand side of E
This process terminates with the correct answer regardless on how you make your selections on line // 2. Now the cool thing is that it also parallelizes easily. You can run it in an arbitrary number of threads! What you need is a concurrency-safe queue which holds those equations whose prerequisites (those the equations depend on) have been computed but which have not been computed themselves yet. Every thread pops out (thread-safely) one equation from this queue at a time, calculates the answer, and then checks if there are now new equations so that all their prerequisites have been computed, and then adds those equations (thread-safely) to the work queue. Done.
Without knowing more, I would suggest taking a SIMD style approach if possible. That is, create threads to compute all formulas for a single data set. Trying to divide the computation of formulas to parallelise them wouldn't yield much speed improvement as the logic required to be able to split up the computations into discrete units suitable for threading would be hard to write and harder to get right, the overhead would cancel out any speed gains. It would also suffer quickly from diminishing returns.
Now, if you've got a set of formulas that are applied to many sets of data then the parallelisation becomes easier and would scale better. Each thread does all computations for one set of data. Create one thread per CPU core and set its affinity to each core. Each thread instantiates one instance of the formula evaluation code. Create a supervisor which loads a single data set and passes it an idle thread. If no threads are idle, wait for the first thread to finish processing its data. When all data sets are processed and all threads have finished, then exit. Using this method, there's no advantage to having more threads than there are cores on the CPU as thread switching is slow and will have a negative effect on overall speed.
If you've only got one data set then it is not a trivial task. It would require parsing the evaluation tree for branches without dependencies on other branches and farming those branches to separate threads running on each core and waiting for the results. You then get problems synchronizing the data and ensuring data coherency.

Resources