How does one query the IMDb database to retrieve data for analysis?

For example, I want to find the list of actors/actresses that have collaborated with Nicolas Cage the most. DBpedia has extracted data from Wikipedia and put it into a semantic format; what about IMDb? Is there a similar service for such data analysis tasks? I would be interested to hear your answers. I have tried Google but haven't found anything substantial.

Use the OMDb API
<?php
// The i parameter is the IMDb ID. Note: current versions of the OMDb API also expect an apikey parameter.
$json = file_get_contents("http://www.omdbapi.com/?i=tt1285016");
$info=json_decode($json);
print_r($info);
?>
Sample Output
The documentation is very clear.

I thought I'd answer my own question, since I've discovered an awesome service that has IMDb data extracted into a semantic RDF format, with a SPARQL endpoint for querying:
http://linkedmdb.org/
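Whichever source supplies the cast lists (OMDb, LinkedMDB, or a local dump), the "who collaborated most with actor X" part of the question comes down to a small counting step. The following is a minimal sketch, assuming the data has already been fetched into a mapping from film title to cast list. The function name `top_collaborators` and the placeholder films and actors are hypothetical and only illustrate the shape of the data.

```python
from collections import Counter


def top_collaborators(casts, actor, limit=3):
    """Count how often each other cast member shares a film with `actor`."""
    counts = Counter()
    for cast in casts.values():
        if actor in cast:
            # Every other name in this cast is one collaboration.
            counts.update(name for name in cast if name != actor)
    return counts.most_common(limit)


# Placeholder data, not real filmographies.
casts = {
    "Film 1": ["Actor A", "Actor B", "Actor C"],
    "Film 2": ["Actor A", "Actor B"],
    "Film 3": ["Actor B", "Actor C"],
    "Film 4": ["Actor A", "Actor D"],
}

result = top_collaborators(casts, "Actor A")
print(result)
```

In this placeholder data "Actor B" ranks first, with two films shared with "Actor A".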

Related

google-cloud-vision API gives fewer results

I was using the Ruby client for Google Cloud Vision to extract vehicle information from automobile original titles.
Observations:
When I used the client API, I got 171 words.
But when I used Google's API demo here: https://cloud.google.com/vision/, I got 459 words, including much of the information I was looking for.
Can anyone please explain how to get the most out of the API?
I found the answer to my question,
thanks to @marlon-giona.
I was referring to the post: Google Vision API text detection strange behaviour - Javascript
When I used image.document to extract dense text, I got exactly the words I was looking for.

How does Market Samurai and Long Tail Pro handle retrieving the top 10 Google search results for a keyword?

I'm curious to know how Market Samurai, Long Tail Pro and other software retrieve the top 10 Google search results without running into limits. It appears that these software packages use the user's own Google account. Google Custom Search limits users to 100 queries per day (the free limit), but people tend to do keyword research on hundreds or even thousands of keywords per day and don't pay any additional amounts to Google.
Are they paying extra for this service, are they using a different API (perhaps the AdWords API?), or are they scraping the Google search results page (a violation of the TOS)? Really would like to know! Thanks.
I have done this in one of my projects (in Java).
It is quite simple: Java has a library called Jsoup, which you can use to send a GET request to Google, for example:
https://www.google.co.in/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=<your url encoded search term>
This returns the HTML of the Google search results page for your term.
Using Jsoup you can find specific HTML tags by class or id, which lets you extract the URL, title and description of each result.
For a working example check here; that example extracts Google search result links for a custom search term.
I hope this helps.
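The extraction step described in this answer can be sketched with the Python standard library instead of Jsoup. This is only an illustration under stated assumptions. The sample HTML and the `result` class name are hypothetical, since Google's real markup differs and changes often. The `/search?q=` form is used for the URL because the part of a URL after `#` is not sent to the server. As the question notes, scraping the results page may violate Google's terms of service.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def build_search_url(term):
    """URL-encode the search term into a query string."""
    return "https://www.google.com/search?" + urlencode({"q": term})


class LinkExtractor(HTMLParser):
    """Collects (href, text) pairs for <a> tags carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []
        self._href = None  # href of the matching <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("href"):
            self._href = attr_map["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical markup standing in for a downloaded results page.
SAMPLE_HTML = """
<div><a class="result" href="https://example.com/a">First <b>hit</b></a></div>
<div><a class="nav" href="/next">Next</a></div>
<div><a class="result" href="https://example.org/b">Second hit</a></div>
"""

parser = LinkExtractor("result")
parser.feed(SAMPLE_HTML)
print(build_search_url("long tail keywords"))
print(parser.links)
```

Only the anchors with the chosen class are kept, which mirrors selecting elements by class or id in Jsoup.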

MergeOption Usage Examples in EF

I am new to EF and am trying to understand the current concepts, as I mostly find outdated material (EFExtensions etc.) when I search on Google.
I am trying to execute a SQL query using ExecuteStoreQuery.
It accepts something called MergeOption. What exactly does it do?
I have read http://msdn.microsoft.com/en-us/library/system.data.objects.mergeoption.aspx
But I do not understand it clearly. Some examples would help me through.
Thanks,
Peru
Specifying the MergeOption with ExecuteStoreQuery allows you to determine how results will be tracked as entities. As you read in the article you referenced, there are four options:
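AppendOnly (default): entities not yet in the context are attached; entities that are already tracked are left untouched.
OverwriteChanges: tracked entities have their current and original values replaced with the values from the data source.
PreserveChanges: properties modified locally keep their current values; unmodified properties are refreshed from the data source.
NoTracking: results are returned detached and are not tracked by the context.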
Here are a couple of links (basic example, detailed example) that show in-depth examples of MergeOption in use and its impact on the objects being tracked.
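To make the four options concrete, here is a toy model in Python. It is not Entity Framework code and not how EF is implemented. `ToyContext` and `materialize` are hypothetical names, and the dictionaries only imitate the "current values" and "original values" that a tracking context keeps per entity. The behaviour follows the documented semantics as I understand them.

```python
from enum import Enum


class MergeOption(Enum):
    APPEND_ONLY = "AppendOnly"
    OVERWRITE_CHANGES = "OverwriteChanges"
    PRESERVE_CHANGES = "PreserveChanges"
    NO_TRACKING = "NoTracking"


class ToyContext:
    """Toy identity map keyed by entity key; an illustration only."""

    def __init__(self):
        self.current = {}   # key -> current property values
        self.original = {}  # key -> values as last read from the store

    def materialize(self, key, store_row, option):
        if option is MergeOption.NO_TRACKING:
            return dict(store_row)  # detached copy, never tracked
        if key not in self.current:
            # Unknown entities are attached under every tracking option.
            self.current[key] = dict(store_row)
            self.original[key] = dict(store_row)
            return self.current[key]
        entity, orig = self.current[key], self.original[key]
        if option is MergeOption.OVERWRITE_CHANGES:
            entity.clear()
            entity.update(store_row)
            orig.clear()
            orig.update(store_row)
        elif option is MergeOption.PRESERVE_CHANGES:
            for name, value in store_row.items():
                if entity.get(name) == orig.get(name):  # not modified locally
                    entity[name] = value
                orig[name] = value
        # APPEND_ONLY: an already tracked entity is left untouched.
        return entity


ctx = ToyContext()
person = ctx.materialize(1, {"name": "Ann", "city": "Oslo"}, MergeOption.APPEND_ONLY)
person["name"] = "Anna"                  # local, unsaved change
fresh = {"name": "Ann", "city": "Rome"}  # the row has changed in the store

after_append = dict(ctx.materialize(1, fresh, MergeOption.APPEND_ONLY))
after_preserve = dict(ctx.materialize(1, fresh, MergeOption.PRESERVE_CHANGES))
after_overwrite = dict(ctx.materialize(1, fresh, MergeOption.OVERWRITE_CHANGES))
detached = ctx.materialize(1, fresh, MergeOption.NO_TRACKING)
print(after_append, after_preserve, after_overwrite)
```

Running the same query result through each option shows the difference: the local edit to `name` survives AppendOnly and PreserveChanges, but is lost under OverwriteChanges.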

How do I see/debug the way Solr finds its results?

Let's say I search for "ABLS" and Solr returns a result that does not make any sense to me.
How can I debug why Solr picked this record to be returned?
debugQuery=true would help you get the detailed score calculation and the explanation for each score.
An overview of the scoring is available at link
For a detailed explanation of the debug information, you can refer to Link
You could add debugQuery=true&indent=true to the URL and examine the results. You could also use the analysis tool in Solr: go to the admin page and click analysis. You would need to read the wiki to understand either of these in more depth.
debugQuery will tell you why your scoring looks the way it does (and how relevant each field is).
I would take some results that you do not understand and play with them in Solr's analysis tool.
You should find it under:
/admin/analysis.jsp?highlight=on
Alternatively, turn on highlighting over your results to see what is actually matching.
Solr queries are full of short parameters that are hard to read and modify, especially when there are many of them.
Afterwards it is even harder to debug and understand why one document is more or less relevant than another. The debug explain output is usually a tree too big to fit on one page.
I found this Google Chrome extension useful to see Solr Query explain and debug in a clear manner.
For those who still use a very old version of Solr (3.x), "debugQuery=true" will not output the debug information; you should specify "debugQuery=on" instead.
There are two ways of doing that. The first is at the query level, which means adding debugQuery=on to your query. That will include a few things:
parsed query
debug timing information
detailed scoring information, which helps you analyze why a given document received its score.
In addition to that, you can use the [explain] transformer and add it to your fl parameter. For example ...&fl=*,[explain], which will result in your documents having the scoring information as another field.
The scoring information can be quite extensive and will include calculations done by the similarity algorithm. If you would like to learn more about the similarities and the scoring algorithm in Solr, have a look at this talk that my colleague Radu from Sematext and I gave at the Activate conference: https://www.youtube.com/watch?v=kKocQdYGVJM
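The query-level options mentioned in these answers (debugQuery, indent, and the [explain] transformer in fl) can be combined into one request URL. The sketch below only builds the URL. The host, port and core name (`localhost:8983`, `mycore`) are assumptions for illustration, and `build_debug_url` is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, query, explain_field=True):
    """Build a Solr select URL that asks for debug and scoring details."""
    params = {
        "q": query,
        "debugQuery": "true",  # very old 3.x versions may need "on" instead
        "indent": "true",
        "wt": "json",
    }
    if explain_field:
        # Adds the score explanation to each returned document as a field.
        params["fl"] = "*,[explain]"
    return base_url + "?" + urlencode(params)


url = build_debug_url("http://localhost:8983/solr/mycore/select", "ABLS")
print(url)
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, returns the normal results plus a debug section containing the parsed query, timing and per-document explanations.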

Searching a datastore for related topics by keyword

For example, how does StackOverflow decide other questions are similar?
When I typed in the question above and then tabbed to this memo control I saw a list of existing questions which might be the same as the one I am asking.
What technique is used to find similar questions?
I got an email from team@stackoverflow.com on Mar 20 that mentions how it works:
the "ask a question" search is
exclusively on title and will not
match anything in the body. It is a
mystery to me why people think it's
better.
The last sentence refers to the search bar, which I've found is less useful when I'm trying to find a specific question I've already seen.
I think it's plain old word matching. However, I might add that this feature does not work as well as I would like it to. It's much better to do a Google search with the site:stackoverflow.com prefix than to rely on SO to provide relevant suggestions.
Poorly -- using MS SQL Full Text Search, I believe. You'll have better luck using Lucene, IMO. For more background on the topic see the Wikipedia article on Lucene or the general topic of information retrieval.
The matching program would store an index of all questions. When you ask a question, all keywords in your question are matched against the index. This is similar to Google Search. Lucene open source search can be (and with high probability has been) used for this. Since the results are not quite accurate, I presume they index just the headlines of the questions, as an approximation.
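The index-and-match idea in this answer can be sketched as a small inverted index over question titles. This is a simplified illustration, not how Stack Overflow or Lucene actually works: there is no stemming or term weighting, the stopword list is an arbitrary assumption, and the sample titles are made up.

```python
import re
from collections import Counter, defaultdict

# A tiny, arbitrary stopword list chosen for this example.
STOPWORDS = {"a", "an", "the", "how", "do", "does", "i", "to", "in", "for", "of", "is", "by"}


def keywords(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def build_index(titles):
    """Map each keyword to the set of question ids whose title contains it."""
    index = defaultdict(set)
    for qid, title in titles.items():
        for word in keywords(title):
            index[word].add(qid)
    return index


def similar(index, new_title, limit=5):
    """Rank existing questions by the number of keywords shared with the new title."""
    hits = Counter()
    for word in keywords(new_title):
        for qid in index.get(word, ()):
            hits[qid] += 1
    return [qid for qid, _ in hits.most_common(limit)]


# Made-up titles standing in for the stored questions.
titles = {
    1: "How do I debug Solr scoring?",
    2: "Searching a datastore by keyword",
    3: "Query the IMDb database for analysis",
    4: "Find related questions by keyword",
}
index = build_index(titles)
matches = similar(index, "Searching a datastore for related topics by keyword")
print(matches)
```

Question 2 shares three keywords with the new title and question 4 shares two, so they are suggested in that order, while the other two titles share none.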
The other related keyword is collaborative filtering, the algorithm popularized by Amazon to recommend products based on behavior of other similar customers. In the current case, an alternative algorithm based on collaborative filtering is: keywords are extracted from the question, then tags associated (in the history) with the keywords are found. Questions which have those tags are returned. Well, experiments are needed to see whether it works well at all.
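The tag-based alternative described in this answer (keywords, then the tags historically associated with them, then questions carrying those tags) could look roughly like the sketch below. All names and data here are hypothetical. As the answer itself says, whether this works well in practice would need experiments.

```python
from collections import Counter


def tags_for_keywords(words, keyword_tags):
    """Weight each tag by how many of the given keywords it co-occurred with."""
    weights = Counter()
    for word in words:
        weights.update(keyword_tags.get(word, ()))
    return weights


def questions_for_tags(tag_weights, question_tags):
    """Rank questions by the summed weight of their tags, skipping zero scores."""
    scores = Counter()
    for qid, tags in question_tags.items():
        score = sum(tag_weights[tag] for tag in tags)  # missing tags count as 0
        if score:
            scores[qid] = score
    return [qid for qid, _ in scores.most_common()]


# Made-up history: which tags past questions containing a keyword carried.
keyword_tags = {"solr": ["search", "lucene"], "index": ["search", "database"]}
# Made-up existing questions and their tags.
question_tags = {10: {"search", "lucene"}, 11: {"database"}, 12: {"php"}}

weights = tags_for_keywords(["solr", "index"], keyword_tags)
ranked = questions_for_tags(weights, question_tags)
print(ranked)
```

Here the keywords map to the tags search (twice), lucene and database, so question 10 ranks ahead of question 11, and question 12 is not suggested at all.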

Resources