I am doing a few thing on Information Retrieval and have an exam coming up and I am absolutely clueless. First of, could anyone recommend me the shortest and best description possible for what PageRank actually is in Information Retrieval? Maybe even a good short video or your own description. I know Google use to, or did use it.
I know there are a lot of questions here but I could use as MUCH help as possible in a short length of time.
So my first question (taken from past papers, and making my own examples):
I am wanting to take a table such as:
A B C
A 0 1 0
B 1 0 1
C 0 0 0
And create a graph. I believe this is correct but unsure (I could use a "yes that is correct" or a "no":
And if I was given a graph such as:
The table would be:
A B C
A 0 1 0
B 0 0 1
C 0 0 0
Is that correct? If not, could I please get help and get it described somewhere? The lecture I am reading is not great at explaining and my lecturer isn't great at helping either.
Next I will probably be asked to use Teleportation Probability on the first table. This I desperately need help in. If the probability(the special a symbol)=1/2, does this mean multiply everything, including the 0's in the table such as 0x1/2? also 1x1/2? This is for the matrix of transition probabilities.
Next would be, how can I calculate PageRank from the above matrix. Using matrix multiplication. In words or in Pseudocode.
Another question I want to know is, will a user's page rank on twitter increase if they follow another user? I was assuming this would be a no because they are not following the user back?
Does a user's pagerank depend on how frequently you find said user if you start at a random user and click on another random persona and such till you find them? I assume this one is definitely not true. Because they might not be following said user.
I know this is a lot to ask. Does anyone have tutorials I can follow for either that are not complicated and I can look at and get it mastered today?
Thanks I really appreciate all your help. I know not one person can answer them all but can help provide assistance for some.
here's my stab at answering your questions:
good learning resource:
http://en.wikipedia.org/wiki/PageRank#Simplified_algorithm (no doubt you've see it already, but it's a pretty good one). Start there, understand the algorithm first, then do the implementation.
this might be a good simple method to implement?
http://pr.efactory.de/e-pagerank-algorithm.shtml
or this:
http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm
I'm guessing you can program in Python (common school language), in that case you might be interested in a package for handling graphs which has pagerank calculations: http://networkx.lanl.gov/reference/generated/networkx.algorithms.link_analysis.pagerank_alg.pagerank.html. If you have to write your own pagerank algorithm (very doable), you could use that to check the results.
For the matrix -> graph conversion question: your professor needs to specify how directionality is encoded in the matrix. Does a 1 at B,C specify a link from B to C or from C to B? My guess would be B to C. If that's true, your first graph is wrong there, but the second graph is ok. Directionality is very important in PageRank.
I believe the Teleportation probability is the probability that a random walker executing a new step will jump to a random node in the graph. It's in the wikipedia page under "damping factor". I don't know how it ties into multiplying numbers in your matrix.
For the Twitter question - yes, I think you have it right. Linking to (or presumably following) a second person does nothing directly to the the first person's pagerank, but it likely increases the second person's pagerank. In practice, there could be secondary effects, like the second person noticing that the first person is interesting and following them back.
second to last question - yes, one formulation of the pagerank algorithm is as a random walk along links with the frequency of encountering a node (page) going into the pagerank.
good luck!
I have only just started to learn Ruby hoping one of you have some time to explain something to a complete newbie. I have been learning from Treehouse and Lynda (any other suggestions) and want to start testing what i know to write a small sports tipping app for my friends.
basically my theory is to
1. parse some data from an official site
2. get this into the format of a hash
3. iterate over the hash to see which score (value) is higher and which key (team) that is associated with to get a winner.
this is the core of functionality, however there is obviously code required around;
signing up and manage users
getting tips
giving results (and winners)
My first question is around iterating over a hash to compare the scores and get the winner.
what would be the most beautiful code to achieve this.
Hope you can help and thank you for your help its really appreciated, hope one day to be here with some skills returning the favour !
No offense but this post has a lot of explanation for what is really a very simple question: how to find the largest value in a hash.
That being said, you could do scores.max_by{|key,val| val} on a hash called scores. This is assuming the scores are integers, not strings. If the scores are strings you could do scores.max_by{|key,val| val.to_i} instead. That will return an array with the key (team name) as the first item, and the value (the score) as the second item.
See the original question from StackOverflow I posted above for more info.
Good day, I'm a complete newbie and need to solve a problem. I'm not asking someone to do my assignment for me I'm just stuck and If I can see the code I will be able to understand how it was done and create a new problem to check if I understand the solution.
Enter an integer value for the variable called 'num' that contains a value between 35 and 74. Determine if the 'tens' digit is equal to, greater than or less than the 'ones' digit.
I must use pseudo code but cant find anything in my text book.
I dont know how to create an variable with a number between 35 and 75
I know what's a 'tens' and 'ones' digit but have no idea on how to do the calculation.
If someone can please help me with this it will be much appreciated. The book I have is not helpful and I need help.
Thanks in advance
G
am sorry for this question, but i was asking: when using MD5, we get a hash, so to get the password we hash all the words untill we find the same hash.
but in a key derivation algorithme such pbkdf2 or bcrypt or scrypt, what a hacker need to seek? or he will make the same algorithme to all words to get the same key derivation?
am sorry for this dumb question.
It’s the same general idea - try all the hashes - but several thousand (or million, or even billion) times slower.
I found the following question while preparing for an interview:
You are in a very huge library that
has no computer access, and you're
looking for one particular book.
You look up where the book suppose to
be from the card catalog, and went to
shelf X to find it.
However the book is not there.
There is only one person that can
answer questions, which is the
libarian, but he only answers yes/no
responses. Plus, his answers might not
be correct.
What is your strategy for finding this
book?
How would you answer this question? What methods of searching would you use?
Use Binary search type questions to narrow the location of the book.
Each question should narrow the search field by half.
"Is the book on this half of the library"? (Point to the right direction).
Would work as an initial question.
You can also use The Knight and the Knave as part of your method of questioning the person. Your first 5 questions (to establish a baseline) could be about things you 'know'. You could determine his error rate from there. After that, you can use Binary Search-esque questions to determine where the book is.
Ask the interviewer for more information about the librarian and go from there. In particular, find out if he's susceptible to bribery (I mean the librarian, but come to think of it this might go for the interviewer as well).
Double-check for dumb mistakes (wrong card, wrong shelf, "661-88" is reall "88-199" and so on).
Search the drawer of borrowed-book cards. If it's been borrowed, note the due date and come back later, or note the borrower's home address and go to plan B.
Look in the vicinity, a few books in either direction and the shelves above and below, in case it was incorrectly reshelved.
Check the tables, floors, photocopiers and return carts.
Look for a gap on the shelf. If there is a gap in the right spot then at least you know you're looking in the right place. If there's no gap then look for a book on that shelf that doesn't belong-- somebody may have swapped them by mistake. If there's no such misplaced book then maybe the book was never on this shelf, see below.
Look for dust on the shelf. It might indicate whether a book has been removed within the past month. Likewise check the index card for signs of age. The flowchart gets a little complicated, but the book may have been lost years ago.
Check the index system: if the book doesn't have the right number for its subject/title/author/whatever, then there is a typo on the index card and you must calculate the correct number yourself to find out where the book really is.
Just go out and buy the damned book, your time is more valuable than this.
Step A: Calibrate your Librarian.
Pick a random book in the library, walk to a random spot and then ask the Librarian if the book (whose location you know) is to your left. Keep testing the Librarian until you have a good estimate of the probability, p, that Librarian answers correctly. Note that if p < 0.5 then you are better off following the opposite of whatever Librarian tells you. If p=0.5 then give up on Librarian -- her responses are no better than a flip of a coin.
If you find that p depends on the question asked (for example, if the Librarian always answers certain questions correctly, but other questions always falsely), then go to Step B1.
Step B1:
If p==0.5 or p depends on the question asked, start thinking outside the box, like Beta suggests.
Step B2:
If p < 0.5, reverse the answer the Librarian gives, and proceed to Step B3.
Step B3:
If p > 0.5: Choose N. If p is close to 1, then N can be a low number like 10. If p is very close to 0.5, then choose N large, like 1000. The right value of N depends on p and how confident you wish to be.
Ask the Librarian the same question N times ("Is the book I'm looking for to my left").
Assume for the moment that whatever response is given more frequently is the "correct answer". Calculate the average response, assigning 1 for the "correct answer" and 0 for the wrong answer. Call this the "observed average".
The responses are like draws from a box with 2 tickets (the right answer and the wrong answer.) The standard deviation of a sample of N draws will be sqrt(pq), where q = 1-p.
The standard error of the average is sqrt(pq/N).
Take the null hypothesis to be that p=0.5 -- that the Librarian is simply giving random responses. The "expected average" (assuming the null hypthesis) is 1/2.
The z-statistic is the
(observed average - expected average)/(standard error of the average) =
(observed average - 0.5)*sqrt(N)/(sqrt(p*q))
The z-statistic follows a normal distribution. If the z-statistic is > 1.65 then you
have about a 95% chance the average response of the Librarian is statistically
significant. If after N questions z is less than 1.65, repeat Step B3 until you get statistically significant response. Note that the larger you choose N, the larger the z-statistic will be, and the easier it will be to obtain statistically significant results.
Step C:
Once you get a statistically significant response, you act upon it (using George Stocker's binary search idea) and hope you have not been statistically unlucky. :)
PS. Although the library might be 3-dimensional, you could play the Binary Search game along the x-axis, then the y-axis, then the z-axis. So the 3-dimensional problem can be reduced to solving 3 (1-dimensional problems).
here's a starting point: Assume the library uses the Dewey decimal system (but any classification system could be substituted).
Question 1: is the book in the 100s?
Question 2: is the book in the 200s?
..
is the book between 50 and 150?
is the book between 150 and 250?
Depends on who you are interviewing for:
Government (non-law enforcement/military) - hire infinite number of staff to check every location in library. Then hire an infinite number of junior managers to manage those staff, add an infinite number of middle managers etc.
Large corporation - same but use unpaid interns.
Government (law enforcement/military) - take librarian, apply tazer or waterboarding until location of book is revealed.
Small company (web 2.0 startup) - blog about location of book until somebody tells you.
Small company (real business) - try another library / bookstore.
Is it cheating to ask if the librarian takes commands? If he does, simply tell him to find the book and bring it back to you.
How would you answer this question?
"Thank you for your time." And I'd get up and walk out of the interview room. I'm not interested in working with people who think that asking silly riddles in an interrview is more useful than asking me to write some code or demonstrate how I would plan a project or lead a team.