Big O confusion: (log2(n))^x = O(n^y) where x > 0, y > 0 are constants

Why is n^y an upper bound for (log2(n))^x? My teacher took the values x = 1000 and y = 0.1 as an example. But when I plotted them, the graph of (log2(n))^x was always above n^y. For any value of n I tried, (log2(n))^x is much greater than n^y, especially when x is much bigger than y.
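One way to see what is going on (a sketch; x = 1000 and y = 0.1 are the values from the question, and the crossover point checked below is only a rough illustration) is to compare the two sides on a log scale, so the huge numbers never have to be formed:
lhs(n) = 1000 * log(log2(n))       # log of (log2(n))^1000
rhs(n) = 0.1 * log(n)              # log of n^0.1
lhs(1e6) > rhs(1e6)                # true: for plottable n the log-power is far larger
m = 200_000                        # work with m = log2(n), i.e. n = 2^m
1000 * log(m) < 0.1 * m * log(2)   # true: n^0.1 has overtaken (log2(n))^1000 by here
The point is that with n = 2^m the comparison becomes m^1000 versus 2^(0.1*m), a polynomial in m against an exponential in m, so n^y eventually wins; but for x = 1000 and y = 0.1 the crossover lies astronomically far beyond anything you can plot.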

Related

How can I randomly choose a point from a set with Julia lang?

I have a set of points defined by inequalities,
e.g. 0 < x < 3, 0 < y < 3 and x^2 + y^2 > 1.
How can I randomly choose a point from this set?
You can use rejection sampling, e.g.:
function myrand(R)
    while true
        x, y = 3rand(), 3rand() # now x∈[0,3[ and y∈[0,3[
        x^2 + y^2 > R^2 && return (x, y)
    end
end
Of course you should make sure that R^2 < 18, as otherwise you will get an infinite loop. The function gets more expensive (takes more time to finish) the closer R is to this boundary.
If you want to improve its speed and R > 3 (e.g. when you are very close to the boundary), you can sample x and y from the interval [sqrt(R^2-9), 3] by rescaling the result of rand() appropriately. The reason is that if x or y is less than or equal to sqrt(R^2-9), such a sample will be rejected for sure (effectively you sample from a smaller square); see the sketch below.
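For example, a rescaled variant could look like this (a sketch, assuming 3 < R < sqrt(18); myrand_fast is just an illustrative name):
function myrand_fast(R)
    a = sqrt(R^2 - 9)              # a coordinate at or below a can never be part of an accepted sample
    while true
        x = a + (3 - a) * rand()   # x ∈ [a, 3[
        y = a + (3 - a) * rand()   # y ∈ [a, 3[
        x^2 + y^2 > R^2 && return (x, y)
    end
end
The accepted points have the same distribution as with myrand, because every point of the target region already lies inside the smaller square [a, 3[ × [a, 3[.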

how to define the probability distribution

I have a small question and I would be very happy if you could give me a solution, or any idea for a solution, for the probability distribution in the following setup:
I have a random variable x which follows an exponential distribution with parameter lambda1, and one more variable y which follows an exponential distribution with parameter lambda2. z is a discrete value. How can I define the probability distribution of k in the following formula?
k = z - x - y
Thank you so much.
Ok, let's start by rewriting the formula a bit:
k = z - x - y = -(x + y) + z = -(x + y + (-z))
The part in the parentheses looks manageable. Let's start with x + y. For random variables x and y, the PDF of their sum is the convolution of their PDFs.
q = x + y
PDF_q(s) = ∫ PDF_x(s - t) PDF_y(t) dt
where the integral runs over t. For x and y being exponential, the convolution integral is known: when the lambdas are different it equals lambda1*lambda2/(lambda2 - lambda1) * (exp(-lambda1*s) - exp(-lambda2*s)) for s >= 0 (the sum-of-exponentials expression on the Exponential distribution wiki), and when the lambdas are equal it is Gamma(2, lambda), Gamma being the Gamma distribution.
If z is some constant discrete value, then we can express the term -z as a continuous RV with PDF
PDF(t) = 𝛿(t + z)
where 𝛿 is the Dirac delta function; note that the peak is at -z, as expected. It is normalized, so its integral over t is equal to 1. This is easily extended to a discrete RV with several values, as a sum of 𝛿-functions at those values, multiplied by probabilities that sum to 1.
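For instance, if z took the values z1, ..., zn with probabilities p1, ..., pn (illustrative notation, not from the question), the PDF of -z would be the sketch
PDF(t) = p1 𝛿(t + z1) + ... + pn 𝛿(t + zn)
with p1 + ... + pn = 1.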
Again, we have a sum of two RVs with known PDFs, and the solution is again a convolution, which is easy to compute thanks to the sifting property of the 𝛿-function. So the final PDF of x + y + (-z) is
PDF_q(s + z)
where PDF_q is the sum-of-exponentials expression from the Exponential distribution wiki, or the Gamma distribution from the Gamma wiki when the lambdas are equal.
Finally, since k = -(x + y + (-z)), you just have to negate the argument: PDF_k(s) = PDF_q(z - s). And that's it.
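If you want to sanity-check the result numerically, a small Monte-Carlo sketch in Julia could look like this (lambda1, lambda2 and z below are made-up example values, not from the question):
lambda1, lambda2, z = 1.0, 2.0, 5.0
N = 1_000_000
x = -log.(rand(N)) ./ lambda1          # inverse-CDF sampling: -log(U)/lambda ~ Exp(lambda)
y = -log.(rand(N)) ./ lambda2
k = z .- x .- y
f_q(s) = s < 0 ? 0.0 : lambda1*lambda2/(lambda2 - lambda1) * (exp(-lambda1*s) - exp(-lambda2*s))
pdf_k(s) = f_q(z - s)                  # the derived density of k
a, b = 2.0, 2.1
empirical = count(s -> a <= s < b, k) / N
predicted = pdf_k((a + b)/2) * (b - a) # density times interval width
println((empirical, predicted))        # the two numbers should be close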

Doubts regarding this pseudocode for the perceptron algorithm

I am trying to implement the perceptron algorithm above, but I have two questions:
Why do we update the w (weight) variable just once? Shouldn't there be a separate w for each xi? Also, I am not sure what w = 0d means mathematically in the initialization.
What is the mathematical meaning of
yi(<xi, w> + b)
I kind of know what the part inside the brackets means, but I am not sure about the yi() part.
(2) You can think of 'yi' as a function that depends on w, xi and b.
Let's say, for a simple example, that y is a line that separates two different classes. In that case, y can be represented as y = wx + b. Now, if you use
w = 0, x = 1 and b = 0, then y = 0.
For your given algorithm, you need to update your weight w when the output y is less than or equal to 0.
So, if you look carefully, you are not updating w just once, as the update is inside an if statement which is inside a for loop.
For your algorithm, you will get n outputs y for the n inputs x in each iteration of t. Here 'i' is used for indexing both the input, as xi, and the output, as yi.
So, long story short: out of the n inputs x, you only update w when the output y for the corresponding input x is less than or equal to zero (in each iteration of t).
(1) As I mentioned above, w is not updated just once.
Let's say you know that any output value greater than 0 is the correct answer. So if you get an output which is less than or equal to zero, there is a mistake, and you need to fix it. This is what your algorithm is doing by updating w whenever the output does not match the desired one.
Here w is a vector, and it is initialized to the zero vector.
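To make the update rule concrete, here is a minimal sketch of the loop described above in Julia (an illustration, not the exact pseudocode from the question; it assumes labels yi ∈ {-1, +1} and the rows of X as inputs):
using LinearAlgebra

function perceptron(X, y; epochs = 10)
    n, d = size(X)
    w = zeros(d)                           # "w = 0": the zero vector, one entry per feature
    b = 0.0
    for t in 1:epochs
        for i in 1:n
            xi, yi = X[i, :], y[i]
            if yi * (dot(w, xi) + b) <= 0  # yi(<xi, w> + b) <= 0 means xi is misclassified
                w .+= yi .* xi             # update on mistakes only, possibly many times
                b += yi
            end
        end
    end
    return w, b
end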

If Random variable is a function

The basic definition of a random variable is that it is a function based on a random experiment. The question is: if it is a function, say f, then how can it take numerical values?
Suppose we toss two coins and let X be the random variable counting the number of heads, taking values in {0, 1, 2}. For the outcome of two heads, say w, we have X(w) = 2, which is the value of the function X at w, not of X itself.
But sometimes it is written that X is a r.v. taking the values 0, 1, 2.
Doesn't it sound wrong to say that a function "takes values"?
A random variable is a well-defined function X: E -> R, whose domain E is a probability space and whose codomain is (generally speaking) the set of real numbers.
Intuitively, X is some kind of metric or measurement on the elements of E.
Example 1
Let E be the set of users of Stack Overflow at a given point in time, say right now, and let X be the function that assigns their reputation to every SO user. For example, you could calculate P(X >= 5000), the probability (fraction) of SO users with a reputation of 5000 or more.
Notice that the event X >= 5000 is nothing but compact notation for the subset of E defined as:
{u in E | X(u) >= 5000}
meaning the subset of SO users u with a reputation of 5000 or more.
Example 2
Let E be the set of questions in SO and X the function that assigns the number of votes (at a certain point in time) to each question. If you pick one question q at random, X(q) would be its number of votes, and we could ask for the probability of, say, X < 0 (down-voted questions).
Here the subset of such questions is
{q in E | X(q) < 0}
i.e., the subset of questions q having a negative vote count.
Conclusion
There is nothing random in a Random variable. The randomness is in the way we pick elements (or subsets) from its domain.
Speaking of functions: yes, it is safe to say that a function can take certain values. Speaking of random variables and probability, the definition I know is:
A random variable assigns a numerical value to each possible outcome of a random experiment
This definition does indeed say that X (aka the random variable) is a function. In your case, where it is said that X (as in the function) can take the values 0, 1, 2, this basically means that the image of X (a subset of its codomain, or even the codomain itself) is the set {0, 1, 2}, i.e. the integers in [0, 2].
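To make the "function" point of view concrete, here is a toy sketch of the two-coin example in Julia (the names E and X below are just illustrative):
E = [(h1, h2) for h1 in (0, 1), h2 in (0, 1)]   # the four equally likely outcomes
X(w) = w[1] + w[2]                              # X is a function: number of heads for outcome w
event = [w for w in E if X(w) == 2]             # "X takes the value 2" means the subset {w in E | X(w) == 2}
P = length(event) / length(E)                   # probability of that event: 0.25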

Get X random points in a fixed grid without repetition

I'm looking for a way of getting X points in a fixed-size grid of, let's say, M by N, where no point is returned multiple times, all points have a similar chance of getting chosen, and the number of points returned is always X.
I had the idea of looping over all the grid points and giving each point a chance of X/(N*M) of being chosen, yet I felt that this would give more priority to the first points in the grid. It also doesn't meet the requirement of always returning exactly X points.
I could also use increments with a prime number to get a kind of shuffle-without-repeat functionality, but I'd rather have it behave more randomly than that.
Essentially, you need to keep track of the points you have already chosen, and use a random number generator to get a pseudo-uniformly distributed answer. Each "choice" should be independent of the previous ones.
With your first idea, you're right: the first points would have a higher chance of getting picked. Consider a one-dimensional array with two elements. With the strategy you mention, the chance of getting the first one is:
P[x=0] = 1/2 = 0.5
The chance of getting the second one is the chance of NOT getting the first one (0.5), times 1/2:
P[x=1] = 1/2 * 1/2 = 0.25
You don't mention which programming language you're using, so I'll assume you have at your disposal a random number generator rand() which returns a random float in the range [0, 1), a Hashmap (or similar) data structure, and a Point data structure. I'll further assume that a point in the grid can be any floating-point pair (x, y) with 0 <= x < M and 0 <= y < N. (If this is an N-by-M array, the same applies, but with integers up to (M-1, N-1).)
Hashmap points = new Hashmap();
Point p;
while (points.size() < X) {
    p = new Point(rand()*M, rand()*N);
    if (!points.containsKey(p)) {
        points.put(p, 1);
    }
}
Note: Two Point objects of equal x and y should be themselves considered equal and generate equal hash codes, etc.
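For the integer-grid case, the same rejection-of-duplicates idea can be sketched compactly in Julia (sample_grid is an illustrative name; it assumes X <= M*N):
function sample_grid(M, N, X)
    @assert X <= M * N "cannot pick more distinct points than the grid has"
    chosen = Set{Tuple{Int,Int}}()
    while length(chosen) < X
        push!(chosen, (rand(1:M), rand(1:N)))   # duplicates are simply ignored by the Set
    end
    return collect(chosen)
end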
