Doubts regarding this pseudocode for the perceptron algorithm - algorithm

I am trying to implement the perceptron algorithm above. But I have two questions:
Why do we just update w (weight) variable once? Shouldn't there be separate w variables for each Xi? Also, not sure what w = 0d means mathematically in the initialization.
What is the mathematical meaning of
yi(< xi,w >+b)
I kinda know what the meaning inside the bracket is but not sure about the yi() part.

(2) You can think of 'yi' as a function that depends on w, xi and b.
let's say for a simple example, y is a line that separates two different classes. In that case, y can be represented as y = wx+b. Now, if you use
w = 0,x = 1 and b = 0 then y = 0.
For your given algorithm, you need to update your weight w, when the output of y is less than or equal to 0.
So, if you look carefully, you are not updating w once, as it is inside an if statement which is inside a for loop.
For your algorithm, you will get n numbers of output y based on n numbers of input x for each iteration of t. Here 'i' is used for indexing both input as xi and output as yi.
So, long story short, out of n numbers of input x, you only need to update the w when the output y for the corresponding input x will be less than or equal to zero (for each iteration of t).
(1) I have already mentioned w is not updated once.
Let's say you know that any output value greater(<) than 0 is the correct answer. So if you get an output which is less than or equal to zero then there is a mistake in your algorithm and you need to fix it. This is what your algorithm is doing by updating the w when the output is not matching the desired one.
Here w is represented as a vector and it is initialized as zero.

Related

Function including random number that can be inverted without the random number

Given a number x and a random number n, I am looking for two functions F and G so that:
y = F(x, n) where y is different for different values of n
x = G(y)
all numbers are (large, e.g. 256 bit) integers
For instance given a list of numbers k1, k2, k3, f4 generated by applying multiple times F, it is possible to calculate k3 from k4 but not k4 from k3 (the random number prevents the inversion).
The problem is obvious if we allow to use n (or derived) in G (it is basically an asymmetric encryption) but this is not the target.
Any idea?
Update
I found a function that works with infinite precision F = x * pow(coprime(x), n)
x = 29
p = 5
n = 20
def f(x,n):
return x * pow(p,n)
f(x,n) => 2765655517578125
and G becomes
def g(y):
x = y
while x % p == 0:
x = x/p
return x
g(y) = 29
Unfortunately this fails with overflow as soon as numbers become big (limited precision)
Second update: the problem has no solution
In fact let's start from a situation where the problem has a solution, which is when the domain of G and F is R.
In that case choosing a random output from any function F' that has multiple output will work.
For instance if then F(x, n) = acos(x) + 2nπ, where n random is Integer
then G(y) = cos(y). From y is always possible to go back to x, but not the opposite without knowing n.
A similar example can be built with operation with module, which will work with Integer domains without the need of real numbers.
Anyway this will fail when the domain is the same finite set (like on physical memory) for F and G. It can be proved by contradiction.
Let's assume that for finite domains D1=D2 of size N, a function F:D1->D2 exists that produces M outputs where M > 1.
Assuming that the function produces at least one output for each x in D1,
1 either D2 > D1
2 or outputs from F are the same for different values of x (some overlapping must exists)
Now 1 is against the requirement that D1=D2, while 2 is against the requirement that G(y) has a single output value
If we relax 1 and we allow D2 > D1, then we can solve the problem. This can be done by adding n (or a derivation of it) like suggested in some comments. For my specific scenario probably it makes more sense to use a EC public/private key but that is another story.
Many Thanks
Based on your requirements, the following should work. If there is some other requirement that I did not understand from your question, please clarify, because this seems to suffice based on your definition. In that case, I will change or delete this answer.
f(x, n) = x | n;
g(y | n) = y;
where | means concatenation of bits. We can assign a fixed (maximum) number of bits for n and pad with zeros.
there can be no solution for this problem because:
for a constant x1 and variable r you would have an output set with all Integers in it.
for a constant x2 and variable r again you would have an output set with all Integers in it.
so at best you can have a function g which would take a number from the output set of function f and return all possible answers which are infinite.
this is similar to writing a reverse hashing function; which defies logic.

Pairing the weight of a protein sequence with the correct sequence

This piece of code is part of a larger function. I already created a list of molecular weights and I also defined a list of all the fragments in my data.
I'm trying to figure out how I can go through the list of fragments, calculate their molecular weight and check if it matches the number in the other list. If it matches, the sequence is appended into an empty list.
combs = [397.47, 2267.58, 475.63, 647.68]
fragments = ['SKEPFKTRIDKKPCDHNTEPYMSGGNY', 'KMITKARPGCMHQMGEY', 'AINV', 'QIQD', 'YAINVMQCL', 'IEEATHMTPCYELHGLRWV', 'MQCL', 'HMTPCYELHGLRWV', 'DHTAQPCRSWPMDYPLT', 'IEEATHM', 'MVGKMDMLEQYA', 'GWPDII', 'QIQDY', 'TPCYELHGLRWVQIQDYA', 'HGLRWVQIQDYAINV', 'KKKNARKW', 'TPCYELHGLRWV']
frags = []
for c in combs:
for f in fragments:
if c == SeqUtils.molecular_weight(f, 'protein', circular = True):
frags.append(f)
print(frags)
I'm guessing I don't fully know how the SeqUtils.molecular_weight command works in Python, but if there is another way that would also be great.
You are comparing floating point values for equality. That is bound to fail. You always have to account for some degree of error when dealing with floating point values. In this particular case you also have to take into account the error margin of the input values.
So do not compare floats like this
x == y
but instead like this
abs(x - y) < epsilon
where epsilon is some carefully selected arbitrary number.
I did two slight modifications to your code: I swapped the order of the f and the c loop to be able to store the calculated value of w. And I append the value of w to the list frags as well in order to better understand what is happening.
Your modified code now looks like this:
from Bio import SeqUtils
combs = [397.47, 2267.58, 475.63, 647.68]
fragments = ['SKEPFKTRIDKKPCDHNTEPYMSGGNY', 'KMITKARPGCMHQMGEY', 'AINV', 'QIQD', 'YAINVMQCL', 'IEEATHMTPCYELHGLRWV',
'MQCL', 'HMTPCYELHGLRWV', 'DHTAQPCRSWPMDYPLT', 'IEEATHM', 'MVGKMDMLEQYA', 'GWPDII', 'QIQDY',
'TPCYELHGLRWVQIQDYA', 'HGLRWVQIQDYAINV', 'KKKNARKW', 'TPCYELHGLRWV']
frags = []
threshold = 0.5
for f in fragments:
w = SeqUtils.molecular_weight(f, 'protein', circular=True)
for c in combs:
if abs(c - w) < threshold:
frags.append((f, w))
print(frags)
This prints the result
[('AINV', 397.46909999999997), ('IEEATHMTPCYELHGLRWV', 2267.5843), ('MQCL', 475.6257), ('QIQDY', 647.6766)]
As you can see, the first value for the weight differs from the reference value by about 0.0009. That's why you did not catch it with your approach.

Check whether A and B exists for the given values AorB and APlusB

Problem Statement:
You are given two non-negative integers: the longs pairOr and pairSum.
Determine whether it is possible that for two non-negative integers A and B we have both:
A or B = pairOr
A + B = pairSum
Above, "or" denotes the bitwise-or operator.
Return True if we can find such A and B, and False if not.
My Algorithm goes like this:
I've taken the equation: A | B = X and A + B =Y,
Now after substituting A's value from 2nd Equation, (Y-B) | B= X.
I'm going to traverse from 0 till Y (in place of B) to check if the above equation is true.
Code Snippet:
boolean isPossible(long orAandB,long plusAandB) {
for(long i=0;i<=plusAandB;i++) {
if(((plusAandB-i)|i)==orAandB ){
return true;
}
}
return false;
It will give TLE if the value of plusAndB is of number 10^18. Could you please help me optimize?
You don't need the full iteration, giving O(N). There's a way to do it in O(logN).
But completely solving the problem for you takes away most of the fun... ;-), so here's the main clue:
Your equation (Y-B) | B= X is one great observation, and the second is to have a look at this equation bit by bit, starting from the right (so you don't have to worry about borrow-bits in the first place). Which last-bit combinations of Y, X, and B can make your equation true? And if you found a B bit, how do you continue recursively with the higher bits (don't forget that subtraction may need a borrow)? I hope you remember the rules for subtracting binary numbers.
And keeping in mind that the problem only asks for true or false, not for any specific A or B value, can save you exponential complexity.

If Random variable is a function

The basic definition of random variable is that it is a function based on random experiment.the question is that if it is a function say f then how can it take numerical values..
Suppose if we toss two coins and X be random variable relating no. of heads with (0,1,2) .For event of two heads say w....we have X(w)=2 is value of function X at w. and not of X itself..
But sometimes it is written that x is a r .v taking values 0,1,2,....
Don't it sound wrong to say function and takes values?
A random variable is a well defined function X: E -> R, whose domain E is a probability space and its codomain is (generally speaking) the set of real numbers.
Intuitively, X is some kind of metric or measurement on the elements of E.
Example 1
Let E be the set of users of Stack Overflow at a given point in time, say right now. And let X be the function that assigns their reputation to every SO user. For example, you could calculate P(X >= 5000) which is the percent of SO users with a reputation of 5000 or more.
Notice that P(X >= 5000) is nothing but a compact notation for the subset of E defined as:
{u in E | X(u) >= 5000}
meaning the subset of SO users u with a reputation of 5000 or more.
Example 2
Let E be the set of questions in SO and X the function that assigns the number of votes (at certain point in time) to each question. If you pick one question q at random, X(q) would be its number of votes and we could ask for the probability of, say, X < 0 (down-voted questions.)
Here the subset of such questions is
{q in E | X(q) < 0}
i.e., the subset of questions q having a negative vote count.
Conclusion
There is nothing random in a Random variable. The randomness is in the way we pick elements (or subsets) from its domain.
Speaking of functions - Yes, it is safe to say that a function can take certain values. Speaking of random variables and probability, the definition I know is:
A random variable assigns a numerical value to each possible outcome of a random experiment
This definition does indeed say that X (aka random variable) is a function. In your case, where it is said that X (as in function) can take values 0,1,2 is basically saying that the subset of the codomain (or even the codomain or target set itself) of function X is the set {0,1,2}, or interval
[0,2] ⊂ ℕ.

Maximum xor of a range of numbers

I am grappling with this problem Codeforces 276D. Initially I used a brute force approach which obviously failed for large inputs(It started when inputs were 10000000000 20000000000). In the tutorials Fcdkbear(turtor for the contest) talks about a dp solution where a state is d[p][fl1][fr1][fl2][fr2].
Further in tutorial
We need to know, which bits we can place into binary representation of number а in p-th position. We can place 0 if the following condition is true: p-th bit of L is equal to 0, or p-th bit of L is equal to 1 and variable fl1 shows that current value of a is strictly greater then L. Similarly, we can place 1 if the following condition is true: p-th bit of R is equal to 1, or p-th bit of R is equal to 0 and variable fr1 shows that current value of a is strictly less then R. Similarly, we can obtain, which bits we can place into binary representation of number b in p-th position.
This is going over my head as when ith bit of L is 0 then how come we can place a zero in a's ith bit. If L and R both are in same bucket(2^i'th boundary like 16 and 24) we will eventually place a 0 at 4th whereas we can place a 1 if a = 20 because i-th bit of R is 0 and a > R. I am wondering what is the use of checking if a > L or not.
In essence I do not get the logic of
What states are
How do we recur
I know that might be an overkill but could someone explain it in descriptive manner as editorial is too short to explain anything.
I have already looked in here but suggested solution is different from one given in editorial. Also I know this can be solved with binary search but I am concerned with DP solution only
If I got the problem right: Start to compare the bits of l and r from left (MSB) to right(LSB). As long as these bits are equal there is no freedom of choice, the same bits must appear in a and b. the first bit differing must be 1 in r and 0 in l. they must appear also in a (0) and b(1). from here you can maximise the XOR result. simply use zeros for b an ones for a. that gives a+1==b and the xor result is a+b which is always 2^n-1.
I'm not following the logic as written above but the basic idea is to look bit by bit.
If L and R have different values in the same bit position then we have already found candidates that would maximize the xor'd value of that position (0 xor 1 = 1 xor 0 = 1). The other case to consider is whether the span of R-L is greater than the position value of that bit. If so then there must be two different values of A and B falling between L and R where that bit position has opposite values (as well as being able to generate any combinations of values in the lower bits.)

Resources