how to interpret log fold change (log2FC) on two cases - bioinformatics

Just a quick bioinformatics question, if possible.
I have read a few papers and could not understand what log fold change means.
To keep it simple: I have a log2 fold change (log2FC) value of 2 between conditions A and B.
Does it mean A is two times higher than B, or A is two times lower than B?
Thank you.

Let's say that for gene expression the log2FC of B relative to A is 2.
If log2(FC) = 2, the fold change from A to B is FC = 2^2 = 4, not 2. In other words, expression in A is one quarter of that in B, which is the same as saying that expression in B is 4 times higher than in A. A negative value points in the opposite direction: log2(FC) = -2 corresponds to FC = 1/4.
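The conversion between the two scales can be sketched in a few lines of Python. The function names below are illustrative only and do not come from any particular bioinformatics package; the convention assumed here is "B relative to A", i.e. FC = B / A.

```python
import math


def fold_change_from_log2(log2fc):
    """Convert a log2 fold change back to a plain ratio (B / A)."""
    return 2.0 ** log2fc


def log2_fold_change(expr_b, expr_a):
    """log2 fold change of B relative to A (assumes both values are > 0)."""
    return math.log2(expr_b / expr_a)


# log2FC = 2 means B is 4 times A; log2FC = -2 means B is a quarter of A.
ratio_up = fold_change_from_log2(2)
ratio_down = fold_change_from_log2(-2)
```

Which condition is the numerator depends on how the analysis was set up, so it is worth checking the tool's documentation or the paper's methods before reading the sign.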


Algorithm to compare two arrays of element and find most similar pairs efficiently

I have two arrays of strings, of length m and n respectively, where every string has the same length x. I want to find the best matching pairs, i.e. the pairs that have the largest possible number of letters in common:
In a simple case, just consider these two arrays
Sm = [AAAA, BBBB]
Sn = [ABBA, AAAA, AAAA, CCCC]
Expected results (2 pairs matched, 2 strings left alone):
Pair 1: AAAA -> AAAA because of score 4
Pair 2: BBBB -> ABBA because of score 2
Strings in Sn that are left alone:
AAAA because the same string in Sm has been matched already
CCCC because unable to match any
Score matrix (rows are Sm, columns are Sn):
       ABBA  AAAA  AAAA  CCCC
AAAA      2     4     4     0
BBBB      2     0     0     0
My current method (slow):
1. Get the string length x, which is the maximum score (the case where all letters are identical) - in this case it is 4.
2. Brute-force compare m*n times to generate the score matrix above - in this case it is 2*4 times.
3. Loop from x down to 1 (in this case from 4 to 1): walk through the score matrix and pop the string pairs with that score.
4. Mark the remaining unpaired strings, or strings with score 0, as alone.
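The method described above can be sketched roughly as follows. This is only an illustration: the question does not define the score precisely, so the sketch assumes it is the number of positions at which the two strings hold the same letter (which reproduces the scores 4 and 2 in the example). The function names are hypothetical.

```python
def match_score(a, b):
    """Assumed score: number of positions where both strings agree."""
    return sum(ca == cb for ca, cb in zip(a, b))


def greedy_pairs(sm, sn):
    """Pair strings greedily from the highest score down to 1."""
    x = len(sm[0]) if sm else 0
    # O(m * n) score matrix, as in the question.
    scores = [[match_score(a, b) for b in sn] for a in sm]
    used_m, used_n = set(), set()
    pairs = []
    for target in range(x, 0, -1):
        for i in range(len(sm)):
            if i in used_m:
                continue
            for j in range(len(sn)):
                if j not in used_n and scores[i][j] == target:
                    pairs.append((sm[i], sn[j], target))
                    used_m.add(i)
                    used_n.add(j)
                    break
    alone_m = [s for i, s in enumerate(sm) if i not in used_m]
    alone_n = [s for j, s in enumerate(sn) if j not in used_n]
    return pairs, alone_m, alone_n


pairs, alone_m, alone_n = greedy_pairs(
    ["AAAA", "BBBB"], ["ABBA", "AAAA", "AAAA", "CCCC"]
)
```

For the example arrays this pairs AAAA with the first AAAA in Sn and BBBB with ABBA, and leaves the second AAAA and CCCC alone.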
Question:
My current method is slow with O(mn) when producing the score matrix (x will not be large so I assume const here).
Is there any algorithm that can perform better than O(mn) complexity?
Sorry, I don't yet have enough rep to simply post a comment, but in a project I wrote a long time ago I leveraged the Levenshtein distance algorithm. Specifically see this project for some helpful insight.
As far as I can tell you are doing the most efficient thing. To be completely thorough you need to compare every string in Sm with every string in Sn, so at best the algorithm will be O(mn). Anything less would not be comparing every element to every element.
One optimization could be to remove all duplicates, but that would for the most part incur a performance hit that would likely cause more harm than good in almost all circumstances.

How to minimise sum of 4 different people's scores, each in distinct disciplines

In a swimming contest, there are 4 different strokes. Each contestant participates in all of them and their completion times are recorded. From all the contestants, four have to be chosen, one per stroke, such that their combined time is minimum (one contestant can be chosen for only one stroke). More than one such team may be possible, and all such teams must be printed in the output.
For example, say there are four contestants A, B, C and D. Their completion times are
A 50.5 52.9 51.8 52.7
B 50.7 52.7 51.4 52.7
C 50.7 52.7 51.4 52.8
D 50.8 52.9 51.6 52.6
Here, the minimum time would be (A - 50.5, C - 52.7, B - 51.4, D - 52.6) and (A - 50.5, B - 52.7, C - 51.4, D - 52.6).
I don't have any test cases for this. I can do it using brute force but that would take O(n^4). What would be a better approach?
There's no point in taking the fifth or worse best time in a stroke: one of the top four is available regardless of how we want to assign the other three strokes. Take the top four candidates in each stroke and then filter the 4*4*4*4 = 256 possibilities to make sure that the assignments are unique.
Finding the optimal total this way takes O(n) time (assuming that the number of races is constant).
Since you need all of the solutions, there's no way to get an O(n)-time solution, since there may be exponentially many of them. You can enumerate them efficiently using the algorithm above, however. In your brute-force algorithm, insert a test that uses the above logic to determine whether the partial assignment being considered can be extended to an optimal solution. The running time is O(n + s), where s is the number of optimal solutions.
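A minimal sketch of the "top candidates per stroke" idea, under a few assumptions of mine: times are given as a dict from contestant name to a list of per-stroke times, candidates tied with the fourth-best time are kept so that no optimal team is missed, and totals are rounded to sidestep floating-point noise. It sorts for brevity, so it is O(n log n) rather than O(n), and it enumerates all candidate combinations instead of pruning partial assignments.

```python
from itertools import product


def best_teams(times):
    """Return (best_total, list of teams); team[s] swims stroke s."""
    names = list(times)
    n_strokes = len(next(iter(times.values())))
    candidates = []
    for s in range(n_strokes):
        ranked = sorted(names, key=lambda name: times[name][s])
        # Keep everyone at least as fast as the 4th best (ties included).
        cutoff = times[ranked[min(n_strokes, len(ranked)) - 1]][s]
        candidates.append([name for name in ranked if times[name][s] <= cutoff])

    best_total, teams = None, []
    for team in product(*candidates):
        if len(set(team)) < n_strokes:
            continue  # a contestant may swim only one stroke
        total = round(sum(times[name][s] for s, name in enumerate(team)), 6)
        if best_total is None or total < best_total:
            best_total, teams = total, [team]
        elif total == best_total:
            teams.append(team)
    return best_total, teams


example = {
    "A": [50.5, 52.9, 51.8, 52.7],
    "B": [50.7, 52.7, 51.4, 52.7],
    "C": [50.7, 52.7, 51.4, 52.8],
    "D": [50.8, 52.9, 51.6, 52.6],
}
total, teams = best_teams(example)
```

On the example from the question this finds the two teams listed there, (A, B, C, D) and (A, C, B, D), each read as stroke 1 to stroke 4.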

Guidance on Algorithmic Thinking (4 fours equation)

I recently saw a logic/math problem called 4 Fours where you need to use 4 fours and a range of operators to create equations that equal to all the integers 0 to N.
How would you go about writing an elegant algorithm to come up with say the first 100...
I started by creating base calculations like 4-4, 4+4, 4x4, 4/4, 4!, Sqrt 4 and made these values integers.
However, I realized that this was going to be a brute force method testing the combinations to see if they equal, 0 then 1, then 2, then 3 etc...
I then thought of finding all possible combinations of the above values, checking that the result was less than 100 and filling an array and then sorting it...again inefficient because it may find 1000s of numbers over 100
Any help on how to approach a problem like this would be helpful...not actual code...but how to think through this problem
Thanks!!
This is an interesting problem. There are a couple of different things going on here. One issue is how to describe the sequence of operations and operands that go into an arithmetic expression. Using parentheses to establish order of operations is quite messy, so instead I suggest thinking of an expression as a stack of operations and operands, like - 4 4 for 4-4, + 4 * 4 4 for (4*4)+4, * 4 + 4 4 for (4+4)*4, etc. This is prefix (Polish) notation, the mirror image of the Reverse Polish Notation used on an HP calculator. Then you don't have to worry about parentheses, and having the data structure for expressions will help below when we build up larger and larger expressions.
Now we turn to the algorithm for building expressions. Dynamic programming doesn't work in this situation, in my opinion, because (for example) to construct some numbers in the range from 0 to 100 you might have to go outside of that range temporarily.
A better way to conceptualize the problem, I think, is as breadth first search (BFS) on a graph. Technically, the graph would be infinite (all positive integers, or all integers, or all rational numbers, depending on how elaborate you want to get) but at any time you'd only have a finite portion of the graph. A sparse graph data structure would be appropriate.
Each node (number) on the graph would have a weight associated with it, the minimum number of 4's needed to reach that node, and also the expression which achieves that result. Initially, you would start with just the node (4), with the number 1 associated with it (it takes one 4 to make 4) and the simple expression "4". You can also throw in (44) with weight 2, (444) with weight 3, and (4444) with weight 4.
To build up larger expressions, apply all the different operations you have to those initial nodes. For example, unary negation, factorial, square root; binary operations like * 4 at the bottom of your stack for multiply by 4, + 4, - 4, / 4, ^ 4 for exponentiation, and also + 44, etc. The weight of an operation is the number of 4s required for that operation; unary operations would have weight 0, + 4 would have weight 1, * 44 would have weight 2, etc. You would add the weight of the operation to the weight of the node on which it operates to get a new weight, so for example + 4 acting on node (44) with weight 2 and expression "44" would result in a new node (48) with weight 3 and expression "+ 4 44". If the result for 48 has better weight than the existing result for 48, substitute that new node for (48).
You will have to use some sense when applying functions. factorial(4444) would be a very large number; it would be wise to set a domain for your factorial function which would prevent the result from getting too big or going out of bounds. The same with functions like / 4; if you don't want to deal with fractions, say that non-multiples of 4 are outside of the domain of / 4 and don't apply the operator in that case.
The resulting algorithm is very much like Dijkstra's algorithm for calculating distance in a graph, though not exactly the same.
I think that the brute force solution here is the only way to go.
The reasoning behind this is that each number has a different way to get to it, and getting to a certain x might have nothing to do with getting to x+1.
Having said that, you might be able to make the brute force solution a bit quicker by using obvious moves where possible.
For instance, if I got to 20 using "4" three times (4*4+4), then 16, 24 and 80 follow immediately by applying the fourth 4 (20-4, 20+4, 20*4). Hold an array of 100 bits and mark the numbers reached.
Similar to subset sum problem, it can be solved using Dynamic Programming (DP) by following the recursive formulas:
D(0,0) = true
D(x,0) = false for x != 0
D(x,i) = D(x-4,i-1) OR D(x+4,i-1) OR D(x*4,i-1) OR D(x/4,i-1)
By computing the above using DP technique, it is easy to find out which numbers can be produced using these 4's, and by walking back the solution, you can find out how each number was built.
The advantage of this method (when implemented with DP) is you do not recalculate multiple values more than once. I am not sure it will actually be effective for 4 4's, but I believe theoretically it could be a significant improvement for a less restricted generalization of this problem.
This answer is just an extension of Amit's.
Essentially, your operations are:
Apply a unary operator to an existing expression to get a new expression (this does not use any additional 4s)
Apply a binary operator to two existing expressions to get a new expression (the new expression has number of 4s equal to the sum of the two input expressions)
For each n from 1..4, calculate Expressions(n) - a List of (Expression, Value) pairs as follows:
(For a fixed n, only store 1 expression in the list that evaluates to any given value)
1. Initialise the list with the concatenation of n 4s (i.e. 4, 44, 444, 4444)
2. For i from 1 to n-1, and each permitted binary operator op, add an expression (and value) e1 op e2 where e1 is in Expressions(i) and e2 is in Expressions(n-i)
3. Repeatedly apply unary operators to the expressions/values calculated so far in steps 1-3. When to stop (applying 3 recursively) is a little vague; certainly stop if an iteration produces no new values. Potentially limit the magnitude of the values you allow, or the size of the expressions.
Example unary operators are !, Sqrt, -, etc. Example binary operators are +-*/^ etc. You can easily extend this approach to operators with more arguments if permitted.
You could do something a bit cleverer in terms of step 3 never ending for any given n. The simple way (described above) does not start calculating Expressions(i) until Expressions(j) is complete for all j < i. This requires that we know when to stop. The alternative is to build Expressions of a certain maximum length for each n, then if you need to (because you haven't found certain values), extend the maximum length in an outer loop.
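The Expressions(n) construction described in this answer might look roughly like the sketch below. Several choices are my own assumptions rather than part of the answer: only +, -, *, / are used as binary operators, only factorial and exact square roots as unary operators, factorial is limited to arguments up to 7, and values are capped at 10,000 in magnitude so that step 3 is guaranteed to stop. Exact rationals (`Fraction`) avoid rounding problems with division.

```python
from fractions import Fraction
from math import factorial, isqrt

MAX_ABS = 10_000        # assumed cap on magnitudes so the search stays finite
MAX_FACTORIAL_ARG = 7   # assumed domain limit for "!" (7! = 5040)


def unary_results(value, expr):
    """Yield (value, expression) pairs from the permitted unary operators."""
    if value.denominator != 1 or value < 0:
        return
    n = value.numerator
    if n <= MAX_FACTORIAL_ARG:
        yield Fraction(factorial(n)), f"({expr})!"
    root = isqrt(n)
    if root * root == n:  # only exact square roots are allowed here
        yield Fraction(root), f"sqrt({expr})"


def close_under_unary(table):
    """Step 3: apply unary operators until no new value appears."""
    frontier = list(table.items())
    while frontier:
        value, expr = frontier.pop()
        for new_value, new_expr in unary_results(value, expr):
            if new_value not in table:
                table[new_value] = new_expr
                frontier.append((new_value, new_expr))


def expressions(max_fours=4):
    """Return {n: {value: expression}} for n = 1..max_fours."""
    tables = {}
    for n in range(1, max_fours + 1):
        # Step 1: the concatenation of n fours (4, 44, 444, 4444).
        table = {Fraction(int("4" * n)): "4" * n}
        # Step 2: combine Expressions(i) with Expressions(n - i).
        for i in range(1, n):
            for a, ea in tables[i].items():
                for b, eb in tables[n - i].items():
                    results = [(a + b, f"({ea}+{eb})"),
                               (a - b, f"({ea}-{eb})"),
                               (a * b, f"({ea}*{eb})")]
                    if b != 0:
                        results.append((a / b, f"({ea}/{eb})"))
                    for value, expr in results:
                        if abs(value) <= MAX_ABS and value not in table:
                            table[value] = expr
        close_under_unary(table)
        tables[n] = table
    return tables


tables = expressions()
reachable = sorted(int(v) for v in tables[4]
                   if v.denominator == 1 and 0 <= v <= 100)
```

`tables[4]` maps every value found with exactly four 4s to one expression that produces it, and `reachable` lists the integers from 0 to 100 among them. Adding more operators (negation, exponentiation, decimals) only requires extending `unary_results` or the `results` list.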

Archers and Pikemen (CODEMASTERS) (Codechef)

There was a recent contest on codechef named CODEMASTER (which has just ended a few minutes back, so I can put this question on a forum now, I believe :P ).
The question is Archers and Pikemen.
This is the problem statement::
You are being attacked by hostile enemy forces, with knights and
swordsmen charging at you. Being the commander of your unit, you have
been given the task of arranging your elite archers and pikemen in a
special formation.
The archers and pikemen must stand between two flag posts in a
straight line (no soldier can stand beyond the flags). Each archer
must have at least two pikemen by his side (one to his left and one to
his right) such that he is at equal distances from both of them. (A
pikeman may be shared between two archers).
The archers stand at given fixed positions and the separations between
them may not be equal. You need to position your troops in the given
formation using the minimum number of pikemen.
Assume that the minimum distance between a pikeman and an archer or a
pikeman and a flag is 1 unit. The minimum distance between two
archers is 2 units.
Input
The first line of the input contains an integer T denoting the number
of test cases. The description of T test cases follow.
The second line contains an integer N denoting the number of
separations.
The following N lines each contain an integer x, which is the
separation of the current archer from the previous one.
The first value of x is the separation of the first archer from the
first flag. The last value of x is the separation between the last
archer and the second flag.
Output
For each test case, output a single line containing the minimum number
of pikemen required.
Constraints
1 ≤ T ≤ 1000
1 ≤ N ≤ 1000
2 ≤ x ≤ 1000
Example
Input:
2
3
4
4
2
4
2
2
2
2
Output:
3
4
Explanation
Example case 1: A possible formation can be :
F---1---p---3---A---3---p---1---A---1---p---1--- F
Example case 2: A possible formation can be :
F---1---p---1---A---1---p---1---A---1---p---1---A---1---p---1---F
F = flag A = archer p = pikeman
---d--- = distance between pikeman and archer/flag
My first instinct was that the number of pikemen equals the number of gaps, but then I realized that there can be cases where we have to place two pikemen between two archers, since the distance to the next archer on the right might be less than the distance between the previous two archers.
Can someone please help explain the algorithm for this question?
The problem link:: http://www.codechef.com/CDMS2014/problems/CM1401
Link to one of the accepted solutions:: http://www.codechef.com/viewsolution/5166485
Please help me understand this problem, guys.
Thanks in advance.. ;)

Find if any permutation of a number is within a range

I need to find out whether any permutation of a number lies within a specified range. I just need to return Yes or No.
For example: Number = 122, and Range = [200, 250]. The answer would be Yes, as 221 lies within the range.
PS:
For the problem that I have in hand, the number to be searched
will only have two different digits (it will only contain 1 and 2,
e.g. 1112221121).
This is not a homework question. It was asked in an interview.
The approach I suggested was to find all permutations of the given number and check. Or loop through the range and check if we find any permutation of the number.
Checking every permutation is too expensive and unnecessary.
First, you need to look at them as strings, not numbers.
Consider each digit position as a separate variable.
Consider how the set of possible digits each variable can hold is restricted by the range. Each digit/variable pair will be either (a) always valid (b) always invalid; or (c) its validity is conditionally dependent on specific other variables.
Now model these dependencies and independencies as a graph. As case (c) is rare, it will be easy to search in time proportional to O(10N) = O(N)
Numbers have a great property which I think can help you here:
For a given number a of value KXXXX, where K is given, we can
deduce that K0000 <= a <= K9999.
Using this property, we can try to build a permutation which is within the range:
Let's take your example:
Range = [200, 250]
Number = 122
First, we can establish that the first digit must be 2. We have two 2's, so we are good so far.
The second digit must be between 0 and 5. We have two candidates, 1 and 2. Still not bad.
Let's check the first value, 1:
Any digit would be good in the last position, and we still have an unused 2. We have found our permutation (212) and therefore the answer is Yes.
If we did find a contradiction with the value 1, we need to backtrack and try the value 2 and so on.
If none of the solutions are valid, return No.
This Algorithm can be implemented using backtracking and should be very efficient since you only have 2 values to test on each position.
In the worst case the complexity of this algorithm is O(2^l), where l is the number of digits.
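The backtracking described above could be sketched like this. It is a generic version of my own: it works for any digits, not just 1 and 2, and it tracks whether the prefix built so far still touches the lower or upper bound. It assumes the number and both bounds have the same number of digits and that the lower bound does not exceed the upper bound.

```python
from collections import Counter


def exists_in_range(number, lo, hi):
    """True if some permutation of `number` (a digit string) is in [lo, hi]."""
    lo, hi = str(lo), str(hi)
    if not (len(number) == len(lo) == len(hi)):
        raise ValueError("sketch assumes equal digit counts")
    counts = Counter(number)
    n = len(number)

    def place(pos, tight_lo, tight_hi):
        if pos == n:
            return True
        # While the prefix equals the bound's prefix, the bound restricts us.
        low = lo[pos] if tight_lo else "0"
        high = hi[pos] if tight_hi else "9"
        for d in sorted(counts):
            if counts[d] == 0 or not (low <= d <= high):
                continue
            counts[d] -= 1
            found = place(pos + 1, tight_lo and d == low, tight_hi and d == high)
            counts[d] += 1  # backtrack
            if found:
                return True
        return False

    return place(0, True, True)


answer = exists_in_range("122", 200, 250)
```

For the example in the question, the search fixes 2 as the first digit, then tries 1 as the second digit and succeeds with 212.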
You could try to implement some kind of binary search:
If you have 6 ones and 4 twos in your number, then first you have the interval
[1111112222; 2222111111]
If your range does not overlap with this interval, you are finished. Now split this interval in the middle, you get
(1111112222 + 2222111111) / 2
Now find the largest number consisting of 1's and 2's of the respective number that is smaller than the split point. (Probably this step could be improved by calculating the split directly in some efficient way based on the 1 and 2 or by interpreting 1 and 2 as 0 and 1 of a binary number. One could also consider taking the geometric mean of the two numbers, as the candidates might then be more evenly distributed between left and right.)
[Edit: I think I've got it: Suppose the bounds have the form pq and pr (i.e. p is a common prefix), then build from q and r a symmetric string s with the 1's at the beginning and the end of the string and the 2's in the middle and take ps as the split point (so from 1111112222 and 1122221111 you would build 1111222211, prefix is p=11).]
If this number is contained in the range, you are finished.
If not, look whether the range is above or below and repeat with [old lower bound;split] or [split;old upper bound].
Suppose the range given to you is: ABC and DEF (each character is a digit).
Algorithm permutationExists(range_start, range_end, range_index, nos1, nos2)
    if (nos1 > 0 AND range_start[range_index] < 1 < range_end[range_index] AND
            permutationExists(range_start, range_end, range_index+1, nos1-1, nos2))
        return true
    elif (nos2 > 0 AND range_start[range_index] < 2 < range_end[range_index] AND
            permutationExists(range_start, range_end, range_index+1, nos1, nos2-1))
        return true
    else
        return false
I am assuming every single number to be a series of digits. The given number is represented as {numberOf1s, numberOf2s}. I am trying to fit the digits (first 1s and then 2s) within the range; if that is not possible, the procedure returns false.
PS: I might be really wrong. I don't know if this sort of thing can work. I haven't given it much thought, really.
UPDATE
I am wrong in the way I express the algorithm. There are a few changes that need to be done in it. Here is a working code (It worked for most of my test cases): http://ideone.com/1aOa4
You really only need to check at most TWO of the possible permutations.
Suppose your input number contains only the digits X and Y, with X<Y. In your example, X=1 and Y=2. I'll ignore all the special cases where you've run out of one digit or the other.
Phase 1: Handle the common prefix.
Let A be the first digit in the lower bound of the range, and let B be the first digit in the upper bound of the range. If A<B, then we are done with Phase 1 and move on to Phase 2.
Otherwise, A=B. If X=A=B, then use X as the first digit of the permutation and repeat Phase 1 on the next digit. If Y=A=B, then use Y as the first digit of the permutation and repeat Phase 1 on the next digit.
If neither X nor Y is equal to A and B, then stop. The answer is No.
Phase 2: Done with the common prefix.
At this point, A<B. If A<X<B, then use X as the first digit of the permutation and fill in the remaining digits however you want. The answer is Yes. (And similarly if A<Y<B.)
Otherwise, check the following four cases. At most two of the cases will require real work.
If A=X, then try using X as the first digit of the permutation, followed by all the Y's, followed by the rest of the X's. In other words, make the rest of the permutation as large as possible. If this permutation is in range, then the answer is Yes. If this permutation is not in range, then no permutation starting with X can succeed.
If B=X, then try using X as the first digit of the permutation, followed by the rest of the X's, followed by all the Y's. In other words, make the rest of the permutation as small as possible. If this permutation is in range, then the answer is Yes. If this permutation is not in range, then no permutation starting with X can succeed.
Similar cases if A=Y or B=Y.
If none of these four cases succeed, then the answer is No. Notice that at most one of the X cases and at most one of the Y cases can match.
In this solution, I've assumed that the input number and the two numbers in the range all contain the same number of digits. With a little extra work, the approach can be extended to cases where the numbers of digits differ.
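The two phases above can be sketched as follows. This is my own reading of the approach, written for arbitrary digits rather than only X and Y, and it keeps the same assumption that the number and both bounds have the same number of digits (and that the lower bound does not exceed the upper bound). The function name is hypothetical.

```python
from collections import Counter


def permutation_in_range(number, lo, hi):
    """True if some permutation of `number` (a digit string) is in [lo, hi]."""
    lo, hi = str(lo), str(hi)
    if not (len(number) == len(lo) == len(hi)):
        raise ValueError("sketch assumes equal digit counts")
    remaining = Counter(number)
    prefix = ""
    i = 0
    # Phase 1: the common prefix of the bounds is forced.
    while i < len(lo) and lo[i] == hi[i]:
        if remaining[lo[i]] == 0:
            return False
        remaining[lo[i]] -= 1
        prefix += lo[i]
        i += 1
    if i == len(lo):
        return True  # lo == hi and it is itself a permutation

    # Phase 2: first position where the bounds differ (a < b).
    a, b = lo[i], hi[i]
    for d in sorted(d for d, c in remaining.items() if c > 0):
        rest = remaining.copy()
        rest[d] -= 1
        tail = sorted(rest.elements())
        if a < d < b:
            return True  # any completion stays strictly inside the range
        if d == a:
            # Make the rest as large as possible and compare with lo.
            if prefix + d + "".join(reversed(tail)) >= lo:
                return True
        elif d == b:
            # Make the rest as small as possible and compare with hi.
            if prefix + d + "".join(tail) <= hi:
                return True
    return False


result = permutation_in_range("122", 200, 250)
```

Because all strings have the same length, plain string comparison orders them the same way as numeric comparison, which is why the candidates are compared as strings.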
