Game Theory: how to apply it in transcriptomics? - algorithm

Hi,
I have seen this paper (and also "Game Theory Applied to Gene Expression Analysis" and "Game Theory and Microarray Data Analysis"), where the authors use game theory for their microarray DEG analysis (the "microarray game").
Is there any simple guide from you (or other online resources) describing how to use the related formulas to apply game-theory concepts in the DEG analysis of RNA-seq experiments? (Basically, is it even practical?)
Maybe there is some software for doing such an investigation painlessly.
NOTE1: For example, please have a look at the "Game Theory Method" section in the first paper above:
"Let N = {1, . . . , n} be a set of genes. A microarray game is a coalitional game (N, w) where the function w assigns to each coalition S ⊆ N a frequency of associations, between a condition and an expression property, of genes realized in the coalition S."
Imagine we have 150 genes up-regulated in females and 80 genes up-regulated in males (using de novo assembly and the DESeq2 package). How can I use game theory to mine something new, or some extra connections, from these collections of genes?
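To make this concrete, here is my own naive attempt at turning the quoted definition into code. This is just my reading of it, not code from the paper, and the closed-form Shapley computation (averaging 1/|support| over the samples that contain the gene) is an assumption on my part:

import numpy as np

# Naive attempt at the microarray-game relevance (Shapley) index.
# X is a genes-by-samples binary matrix: X[i, j] = 1 if gene i shows the
# expression property (e.g. "up-regulated") in sample j. As I read the
# definition, w(S) is the fraction of samples whose support (set of genes
# with the property) is contained in S, and the Shapley value of gene i
# then reduces to averaging 1/|support of sample j| over the samples j
# in which gene i has the property.
def shapley_microarray_game(X):
    n_genes, n_samples = X.shape
    phi = np.zeros(n_genes)
    for j in range(n_samples):
        support = np.flatnonzero(X[:, j])    # genes with the property in sample j
        if support.size == 0:
            continue                         # a sample with empty support adds nothing
        phi[support] += 1.0 / support.size   # supporting genes share the credit equally
    return phi / n_samples

# Toy example: 4 genes (rows), 3 samples (columns)
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
print(shapley_microarray_game(X))

Is something along these lines what the authors mean, and does it make sense for RNA-seq DEG lists?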
NOTE2: I asked this question on BIOSTARS but have had no answer after 8 weeks.
Thanks

Related

What is "gate count" in synthesis result and how to calculate

I'm synthesizing my design with Design Compiler and making a comparison with another design (as an evaluation in my report). The Synopsys tool can easily report the area with a command, but all the papers I've read care about gate count.
My question is: what is gate count, and how do I calculate it?
I googled and heard that gate count is calculated as total_area/NAND2_area. Is that true?
Thanks for reading, and please don't blame me for a stupid question :(.
Synthesised area is often quoted as gate count in NAND2 equivalents. You are correct:
(total area) / (NAND2 area).
Older tools and libraries used to report this number; a few years ago I noticed a shift towards tools just providing areas in square microns. However, the gate count is a nicer number to get your head around, and it is portable between different geometry sizes.
40K gates for implementation A is clearly smaller than 50K gates for implementation B. It is much harder to compare 100,000 um^2 for implementation A on process X with 65,000 um^2 for implementation B on process Y.
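If you want to script the conversion, it really is a one-liner. The NAND2 area below is a made-up placeholder; look up the real value for your standard-cell library (e.g. in its .lib file or databook):

# Gate count in NAND2 equivalents. NAND2_AREA_UM2 is library-specific;
# the value here is only a placeholder, NOT from any real library.
NAND2_AREA_UM2 = 0.8

def gate_count(total_area_um2, nand2_area_um2=NAND2_AREA_UM2):
    return total_area_um2 / nand2_area_um2

print(round(gate_count(100000)))   # 100,000 um^2 -> ~125K NAND2 equivalents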

When to stop the looping in random number generators?

I'm not sure Stack Overflow is the right place to ask this question, because it is half programming and half mathematics. Also, I'm really sorry if my question is stupid ^_^
I'm studying Monte Carlo simulations via the "Monte Carlo Methods" book. One of the first things I must learn about is random number generators. The basic algorithm of an RNG is:
1. Initialize: Draw the seed S0 from the distribution µ on S. Set t = 1.
2. Transition: Set St = f(St−1).
3. Output: Set Ut = g(St).
4. Repeat: Set t = t+ 1 and return to Step 2.
(µ is a probability distribution on the finite set of states S, the input is the seed S0, and the random number we desire is the output Ut.)
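For reference, here is how I translated the four steps into code, using a toy linear congruential generator as the transition f and a simple scaling as g (my own choices, not from the book):

# Toy RNG following the four steps above. f is a linear congruential
# transition, g scales the state into [0, 1). The loop runs only as long
# as the caller keeps asking for numbers.
M, A = 2**31 - 1, 48271           # classic "MINSTD" parameters

def rng(seed, count):
    s = seed                      # Step 1: initialize with the seed S0
    for _ in range(count):
        s = (A * s) % M           # Step 2: transition S_t = f(S_{t-1})
        yield s / M               # Step 3: output U_t = g(S_t)
                                  # Step 4: repeat until 'count' numbers were drawn

print(list(rng(seed=1, count=5)))  # the same seed gives the same five numbers every run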
The algorithm itself is not hard to understand, but the problem is that I don't see the random factor, which I assumed would lie in the number of repetitions. How do we decide when to stop the RNG loop? All the examples I have read that implement an RNG loop 100 times, and they return the same values for a specific seed. That is not random at all >_<
Can someone explain what I'm missing here? Any help will be appreciated. Thanks, everyone.
You can't get a true sequence of random numbers on a computer, without specialized hardware. (Such specialized hardware performs the equivalent of an initial roll of the dice using physics to provide the randomness. Electronic ones often use the electronic noise of specialized diodes at constant temperatures; others use radioactive decay events.)
Without that specialized hardware, what you can generate are pseudorandom numbers which, as you've observed, always generate the same sequence of numbers for the same initial seed. For simple applications, you can often get away with generating an initial seed from the time of invocation, which is effectively random.
And when I say "simple applications," I am excluding cryptography. (Not just that, but especially that.)
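A quick way to do that in Python, for example (again: fine for simulations, not for cryptography):

import random
import time

random.seed(int(time.time()))   # seed from the clock: "effectively random" for simple uses
print(random.random())          # different run, different sequence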
Sometimes, when you are trying to debug a simulation, you actually want a reproducible stream of "random" numbers, so you might deliberately start the stream with a specific seed.
For instance, in the answer to "Creating a facet_wrap plot with ggplot2 with different annotations in each plot", rcs starts by creating a reproducible set of data using the R code
set.seed(1)
df <- data.frame(x=rnorm(300), y=rnorm(300), cl=gl(3,100)) # create test data
before going on to demonstrate how to answer the actual question.

How does language detection work?

I have been wondering for some time how Google Translate (or maybe a hypothetical translator) detects the language of the string entered in the "from" field. I have been thinking about this, and the only thing I can think of is looking for words that are unique to a language in the input string. Another way could be to check sentence formation or other semantics in addition to keywords. But this seems to be a very difficult task, considering the number of languages and their semantics. I did some research and found that there are approaches that use n-gram sequences and statistical models to detect language. I would appreciate a high-level answer too.
Take the Wikipedia in English. Check the probability that the letter 'a' is followed by a 'b' (for example), and do that for all combinations of letters; you will end up with a matrix of probabilities.
If you do the same for the Wikipedia in different languages, you will get a different matrix for each language.
To detect the language, just use all those matrices and use the probabilities as a score. Let's say that in English you'd get these probabilities:
t->h = 0.3 h->e = .2
and in the Spanish matrix you'd get
t->h = 0.01 h->e = .3
The word 'the', using the English matrix, would give you a score of 0.3 + 0.2 = 0.5,
and using the Spanish one: 0.01 + 0.3 = 0.31.
The English matrix wins, so the text has to be English.
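A minimal sketch of that scoring idea (the two tiny "matrices" below contain only the made-up numbers from the example; in practice you would estimate them from large corpora such as the Wikipedia dumps):

# Toy bigram scoring: each "matrix" maps a letter pair to a probability.
english = {('t', 'h'): 0.3, ('h', 'e'): 0.2}
spanish = {('t', 'h'): 0.01, ('h', 'e'): 0.3}

def score(text, matrix):
    pairs = zip(text, text[1:])               # consecutive letter pairs
    return sum(matrix.get(p, 0.0) for p in pairs)

word = "the"
print("English:", score(word, english))       # 0.3 + 0.2  = 0.5
print("Spanish:", score(word, spanish))       # 0.01 + 0.3 = 0.31

(In a real implementation you would multiply probabilities, i.e. sum log-probabilities, rather than add them, but the additive score mirrors the example above.)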
If you want to implement a lightweight language guesser in the programming language of your choice, you can use the method of Cavnar and Trenkle '94, "N-Gram-Based Text Categorization". You can find the paper on Google Scholar, and it is pretty straightforward.
Their method builds an N-gram statistic for every language it should later be able to guess, from some text in that language. Then the same statistic is built for the unknown text as well and compared to the previously trained statistics by a simple out-of-place measure.
If you use unigrams + bigrams (possibly + trigrams) and compare the 100-200 most frequent N-grams, your hit rate should be over 95% if the text to guess is not too short.
There was a demo available here but it doesn't seem to work at the moment.
There are other ways of language guessing, including computing the probability of N-grams and more advanced classifiers, but in most cases the approach of Cavnar and Trenkle should perform well enough.
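A rough sketch of that out-of-place comparison (building the language profiles from real training text is left out; big_english_text and friends are placeholders):

from collections import Counter

# Ranked profile of the most frequent character N-grams (unigrams..trigrams).
def profile(text, max_n=3, top_k=200):
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    return [gram for gram, _ in counts.most_common(top_k)]

# Out-of-place distance: sum of rank differences, with a maximum penalty
# for N-grams that do not occur in the language profile at all.
def out_of_place(doc_profile, lang_profile):
    ranks = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(abs(r - ranks[gram]) if gram in ranks else max_penalty
               for r, gram in enumerate(doc_profile))

# Usage sketch: pick the language whose trained profile is closest.
# trained = {"en": profile(big_english_text), "de": profile(big_german_text)}
# guess = min(trained, key=lambda lang: out_of_place(profile(unknown_text), trained[lang]))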
You don't have to do a deep analysis of the text to get an idea of what language it's in. Statistics tells us that every language has specific character patterns and frequencies. That's a pretty good first-order approximation. It gets worse when the text is in multiple languages, but it's still not something extremely complex.
Of course, if the text is too short (e.g. a single word or, worse, a single short word), statistics doesn't work and you need a dictionary.
An implementation example:
Mathematica is a good fit for implementing this. It recognizes (i.e. has dictionaries for) words in the following languages:
dicts = DictionaryLookup[All]
{"Arabic", "BrazilianPortuguese", "Breton", "BritishEnglish", \
"Catalan", "Croatian", "Danish", "Dutch", "English", "Esperanto", \
"Faroese", "Finnish", "French", "Galician", "German", "Hebrew", \
"Hindi", "Hungarian", "IrishGaelic", "Italian", "Latin", "Polish", \
"Portuguese", "Russian", "ScottishGaelic", "Spanish", "Swedish"}
I built a little, naive function to calculate the probability of a sentence being in each of those languages:
f[text_] :=
 SortBy[{#[[1]], #[[2]]/Length@k} & /@ (Tally@(First /@
       Flatten[DictionaryLookup[{All, #}] & /@ (k =
           StringSplit[text]), 1])), -#[[2]] &]
So, just by looking up words in dictionaries, you may get a good approximation, even for short sentences:
f["we the people"]
{{BritishEnglish,1},{English,1},{Polish,2/3},{Dutch,1/3},{Latin,1/3}}
f["sino yo triste y cuitado que vivo en esta prisión"]
{{Spanish,1},{Portuguese,7/10},{Galician,3/5},... }
f["wszyscy ludzie rodzą się wolni"]
{{"Polish", 3/5}}
f["deutsch lernen mit jetzt"]
{{"German", 1}, {"Croatian", 1/4}, {"Danish", 1/4}, ...}
You might be interested in The WiLI benchmark dataset for written language identification. The high-level answer, which you can also find in the paper, is the following:
Clean the text: remove things you don't want or need; make Unicode unambiguous by applying a normal form.
Feature extraction: count n-grams, create tf-idf features, something like that.
Train a classifier on the features: neural networks, SVMs, Naive Bayes, ... whatever you think could work. A small sketch of such a pipeline is shown below.
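A minimal scikit-learn sketch of that pipeline (assuming scikit-learn is available; the three training sentences are toy data, in practice you would train on something like WiLI):

# Character n-gram tf-idf features + a Naive Bayes classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["the quick brown fox jumps over the lazy dog",
         "der schnelle braune fuchs springt ueber den faulen hund",
         "el rapido zorro marron salta sobre el perro perezoso"]
labels = ["en", "de", "es"]

clf = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
                    MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["ein kleiner brauner hund"]))   # hopefully ['de']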

Standards for pseudo code? [closed]

I need to translate some Python and Java routines into pseudo code for my master's thesis, but I am having trouble coming up with a syntax/style that is:
consistent
easy to understand
not too verbose
not too close to natural language
not too close to some concrete programming language.
How do you write pseudo code? Are there any standard recommendations?
I recommend looking at the "Introduction to Algorithms" book (by Cormen, Leiserson and Rivest). I've always found its pseudo-code description of algorithms very clear and consistent.
An example:
DIJKSTRA(G, w, s)
1  INITIALIZE-SINGLE-SOURCE(G, s)
2  S ← Ø
3  Q ← V[G]
4  while Q ≠ Ø
5      do u ← EXTRACT-MIN(Q)
6         S ← S ∪ {u}
7         for each vertex v ∈ Adj[u]
8             do RELAX(u, v, w)
Answering my own question, I just wanted to draw attention to the TeX FAQ entry "Typesetting pseudocode in LaTeX". It describes a number of different styles, listing advantages and drawbacks. Incidentally, there happen to exist two stylesheets for writing pseudo code in the manner used in "Introduction to Algorithms" by Cormen, as recommended above: newalg and clrscode. The latter was written by Cormen himself.
I suggest you take a look at the Fortress Programming Language.
This is an actual programming language, and not pseudocode, but it was designed to be as close to executable pseudocode as possible. In particular, for designing the syntax, they read and analyzed hundreds of CS and math papers, courses, books and journals to find common usage patterns for pseudocode and other computational/mathematical notations.
You can leverage all that research by just looking at Fortress source code and abstracting out the things you don't need, since your target audience is human, whereas Fortress's is a compiler.
Here is an actual example of running Fortress code from the NAS (NASA Advanced Supercomputing) Conjugate Gradient Parallel Benchmark. For a fun experience, compare the specification of the benchmark with the implementation in Fortress and notice how there is almost a 1:1 correspondence. Also compare the implementation in a couple of other languages, like C or Fortran, and notice how they have absolutely nothing to do with the specification (and are also often an order of magnitude longer than the spec).
I must stress: this is not pseudocode, this is actual working Fortress code! From https://umbilicus.wordpress.com/2009/10/16/fortress-parallel-by-default/
Note that Fortress is written in ASCII characters; the special characters are rendered with a formatter.
If the code is procedural, normal pseudo-code is probably easy (Wikipedia has some examples).
Object-oriented pseudo-code might be more difficult. Consider:
using UML class diagrams to depict the classes/inheritance
using UML sequence diagrams to depict the sequence of code
I don't understand your requirement of "not too close to some concrete programming language".
Python is generally considered a good candidate for writing pseudo-code. Perhaps a slightly simplified version of Python would work for you.
Pascal has traditionally always been the closest to pseudocode when it comes to mathematical and technical fields. I don't know why; it was just always so.
I have some (oh, I don't know, maybe 10) books on a shelf which support this claim.
Python, as suggested, can be nice code, but it can also be so unreadable that it's a wonder in itself. Older languages are harder to make unreadable, being "simpler" (take that with caution) than today's ones. They may make it harder to express what's going on, but they are easier to read (fewer syntax and language features are needed to understand what the program does).
This post is old, but hopefully this will help others.
The "Introduction to Algorithms" book (by Cormen, Leiserson and Rivest) is a good book to read about algorithms, but the "pseudo-code" is terrible. Things like Q[1..n] are nonsense when one needs to understand what Q[1..n] is supposed to mean, which then has to be noted outside of the "pseudo-code." Moreover, books like "Introduction to Algorithms" like to use mathematical syntax, which violates one purpose of pseudo-code.
Pseudo-code should do two things: abstract away from syntax and be easy to read. If actual code is more descriptive than the pseudo-code, it is not pseudo-code.
Say you were writing a simple program.
Screen design:
Welcome to the Consumer Discount Program!
Please enter the customer's subtotal: 9999.99
The customer receives a 10 percent discount
The customer receives a 20 percent discount
The customer does not receive a discount
The customer's total is: 9999.99
Variable List:
TOTAL: double
SUB_TOTAL: double
DISCOUNT: double
Pseudo-code:
DISCOUNT_PROGRAM
Print "Welcome to the Consumer Discount Program!"
Print "Please enter the customer's subtotal:"
Input SUB_TOTAL
Select the case for SUB_TOTAL
SUB_TOTAL > 10000 AND SUB_TOTAL <= 50000
DISCOUNT = 0.1
Print "The customer receives a 10 percent discount"
SUB_TOTAL > 50000
DISCOUNT = 0.2
Print "The customer receives a 20 percent discount"
Otherwise
DISCOUNT = 0
Print "The customer does not receive a discount"
TOTAL = SUB_TOTAL - (SUB_TOTAL * DISCOUNT)
Print "The customer's total is:", TOTAL
Notice that this is very easy to read and does not reference any syntax. It supports all three of Böhm and Jacopini's control structures.
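For comparison, here is a direct (and deliberately plain) Python rendering of the same pseudo-code; how much extra syntax the real code needs is exactly the point being made:

# Direct Python translation of DISCOUNT_PROGRAM above; names follow the variable list.
def discount_program():
    print("Welcome to the Consumer Discount Program!")
    sub_total = float(input("Please enter the customer's subtotal: "))
    if 10000 < sub_total <= 50000:
        discount = 0.1
        print("The customer receives a 10 percent discount")
    elif sub_total > 50000:
        discount = 0.2
        print("The customer receives a 20 percent discount")
    else:
        discount = 0
        print("The customer does not receive a discount")
    total = sub_total - (sub_total * discount)
    print("The customer's total is:", total)

discount_program()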
Sequence:
Print "Some stuff"
VALUE = 2 + 1
SOME_FUNCTION(SOME_VARIABLE)
Selection:
if condition
Do one extra thing
if condition
do one extra thing
else
do one extra thing
if condition
do one extra thing
else if condition
do one extra thing
else
do one extra thing
Select the case for SYSTEM_NAME
condition 1
statement 1
condition 2
statement 2
condition 3
statement 3
otherwise
statement 4
Repetition:
while condition
do stuff
for SOME_VALUE TO ANOTHER_VALUE
do stuff
compare that to this N-Queens "pseudo-code" (https://en.wikipedia.org/wiki/Eight_queens_puzzle):
PlaceQueens(Q[1 .. n], r)
    if r = n + 1
        print Q
    else
        for j ← 1 to n
            legal ← True
            for i ← 1 to r − 1
                if (Q[i] = j) or (Q[i] = j + r − i) or (Q[i] = j − r + i)
                    legal ← False
            if legal
                Q[r] ← j
                PlaceQueens(Q[1 .. n], r + 1)
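For comparison, the same routine written out as plain, 1-indexed Python; whether this or the pseudo-code above is easier to follow is exactly the judgement call being discussed:

# Straight translation of the N-Queens pseudo-code above (1-indexed, Q[0] unused).
def place_queens(Q, r, n):
    if r == n + 1:
        print(Q[1:])                 # Q[i] is the column of the queen in row i
    else:
        for j in range(1, n + 1):
            legal = True
            for i in range(1, r):
                if Q[i] == j or Q[i] == j + r - i or Q[i] == j - r + i:
                    legal = False
            if legal:
                Q[r] = j
                place_queens(Q, r + 1, n)

n = 4
place_queens([0] * (n + 1), 1, n)    # prints [2, 4, 1, 3] and [3, 1, 4, 2]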
If you can't explain it simply, you don't understand it well enough.
- Albert Einstein

Eligibility trace algorithm, the update order

I am reading Silver et al. (2012), "Temporal-Difference Search in Computer Go", and trying to understand the update order in the eligibility trace algorithm.
In Algorithms 1 and 2 of the paper, the weights are updated before the eligibility trace is updated. I wonder if this order is correct (lines 11 and 12 of Algorithm 1, and lines 12 and 13 of Algorithm 2).
Thinking about an extreme case with lambda = 0, the parameters are never updated using the initial state-action pair (since e is still 0). So I suspect the order should be the opposite.
Can someone clarify the point?
I find the paper very instructive for learning about reinforcement learning, so I would like to understand it in detail.
If there is a more suitable platform to ask this question, please kindly let me know as well.
It looks to me like you're correct: e should be updated before theta. That's also what should happen according to the math in the paper. See, for example, Equations (7) and (8), where e_t is first computed using phi(s_t), and only then is theta updated using delta V_t (which would be delta Q in the control case).
Note that what you wrote about the extreme case with lambda = 0 is not entirely correct. The initial state-action pair will still be involved in an update (not in the first iteration, but it will be incorporated into e for the second iteration). However, it looks to me like the very first reward r will never be used in any update (because it only appears in the very first iteration, where e is still 0). Since this paper is about Go, I suspect it will not matter, though; unless they're doing something unconventional, they probably only use non-zero rewards for the terminal game state.
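For what it's worth, here is a generic linear Sarsa(lambda) step in Python (my own sketch of the textbook form, not the paper's exact Algorithm 2), with the trace updated before the weights as discussed above:

import numpy as np

# One step of linear Sarsa(lambda). phi / phi_next are the feature vectors of
# the current and next state-action pairs, theta the weights, e the trace.
def sarsa_lambda_step(theta, e, phi, phi_next, r, alpha, gamma, lam):
    delta = r + gamma * theta.dot(phi_next) - theta.dot(phi)  # TD error
    e = gamma * lam * e + phi          # update the trace FIRST, so it contains phi
    theta = theta + alpha * delta * e  # then update the weights with the new trace
    return theta, e
    # with lam == 0 this reduces to plain one-step TD: e == phi, so even the
    # very first state-action pair contributes to a weight update

With this ordering the lambda = 0 case behaves as you would expect, which is what suggests the trace-then-weights order is the intended one.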
