Creating a plagiarism detector using the Google Search Appliance API - google-api

I want to design a web-based application that detects plagiarism in documents of any format using the Google search engine API.
What resources would I need for such an application?
Basically, a user uploads a file, and that file is checked against content in various formats on the web. I have read that there are web crawlers, but how exactly do we use them?
Is the Google Search Appliance API the correct approach?
Also, I have not used a Google API before, so what would be the best way to start?
Thanks a lot

Maybe http://www.google.com/alerts will do? There are libraries that let you run a single search: in Ruby that would be (at first glance) the galerts gem, and in Python a galerts package may help you manage feeds. Other languages probably have similar libraries. After that, parse the result feeds yourself.
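To illustrate the "parse the result feeds yourself" step, here is a minimal sketch that assumes the alert results arrive as an Atom feed. The sample feed, the function name parse_alert_feed and the example.com URLs are made up for illustration; only the standard library is used, and fetching the feed is left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds (assumption: the alert feed is Atom).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content, standing in for a downloaded alert feed.
SAMPLE_FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title>Hypothetical alert</title>'
    '<entry><title>First match</title>'
    '<link href="http://example.com/a"/></entry>'
    '<entry><title>Second match</title>'
    '<link href="http://example.com/b"/></entry>'
    '</feed>'
)


def parse_alert_feed(feed_xml):
    """Return a list of {'title', 'link'} dicts, one per feed entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link_el = entry.find("atom:link", ATOM_NS)
        link = link_el.get("href", "") if link_el is not None else ""
        entries.append({"title": title, "link": link})
    return entries


if __name__ == "__main__":
    for item in parse_alert_feed(SAMPLE_FEED):
        print(item["title"], item["link"])
```

Each returned link is a candidate page that could then be compared against the uploaded document.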

Related

Can PHPCrawl be used for scraping websites, and how does it differ from Scrapy?

I want to scrape a few websites, and many people suggested Scrapy. It is Python-based, and since I am very familiar with PHP I looked for alternatives.
I found a crawler, PHPCrawl. I am not sure whether it is just a crawler or whether it provides scraping facilities as well. If it can be used for scraping, does it support XPath or regular expressions?
How does it compare with Scrapy, which is written in Python?
Please suggest which is best for scraping websites.
Thanks
PHPCrawl is a pure crawler: it delivers the pages it finds and their source code to users "as they are" (together with some context information). Therefore it's fast, it's able to use multiple processes, and it has tons of configuration options.
Can't say much about Scrapy since I haven't used it so far.
Yes, of course.
But as I said, PHPCrawl delivers the page sources, and you have to extract the data you want from them yourself.
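The split between crawling (getting page sources) and scraping (extracting data from them) can be sketched as follows. This is not PHPCrawl or Scrapy code; it is a small Python illustration using only the standard library, and the class name, function name and sample page are hypothetical. It assumes the crawler has already handed over the page source as a string.

```python
from html.parser import HTMLParser


class TitleAndLinkExtractor(HTMLParser):
    """Collects the <title> text and every <a href="..."> value."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href attribute
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(page_source):
    """Return (title, links) scraped from one crawled page source."""
    parser = TitleAndLinkExtractor()
    parser.feed(page_source)
    parser.close()
    return parser.title.strip(), parser.links


if __name__ == "__main__":
    # Hypothetical page source, as a crawler might deliver it.
    sample = (
        "<html><head><title> Book reviews </title></head><body>"
        '<a href="/a">A</a><a name="x">no href</a>'
        '<a href="http://example.com/b">B</a></body></html>'
    )
    print(extract(sample))
```

For anything beyond simple tags, an XPath-capable library is usually more convenient than a hand-written parser like this one.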

Direct URL to "I'm Feeling Lucky" for images

I have a website for book reviews. I offer a link to the Amazon entry of the books. I discovered after a bit of research that the direct URL for Google's "I'm Feeling Lucky" is:
http://www.google.com/search?hl=en&q=TITLE+AUTHOR+amazon&btnI=745
This works like magic because I don't have to manually store the Amazon link in my database; it links directly to the Amazon page (works 99.99% of the time).
I was wondering if there was an equivalent for images (whether Google or some alternative) to retrieve an image URL based on keywords only (for the purpose of getting the book cover image).
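For reference, the URL pattern quoted in the question can be built programmatically. This is a minimal sketch; the function name lucky_url is hypothetical, and the parameters (hl, q, btnI=745) are copied from the URL above rather than taken from any documented API.

```python
from urllib.parse import urlencode


def lucky_url(title, author):
    """Build the "I'm Feeling Lucky" URL from the question for one book."""
    query = f"{title} {author} amazon"
    # urlencode escapes the query and turns spaces into '+'.
    params = {"hl": "en", "q": query, "btnI": "745"}
    return "http://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(lucky_url("Dune", "Frank Herbert"))
```

Using urlencode keeps titles containing characters such as & or ? from breaking the query string.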
There's no such thing for Google Images, but you might be able to use another web service to do what you want. I noticed that when you search for a book, the first image result isn't always its cover. Sometimes it's a photo of the author, sometimes it's an image from a review of the book, so you can hardly rely on that.
It should not be hard to parse the Amazon page and get the image and link, but Google has a Google Books API that returns all the information about a book in JSON format. You can try it online in the API Explorer (the covers are in the results too). Click here to see an example (click "Execute" to run it).
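A rough sketch of the Google Books approach follows. It assumes the Books API v1 volumes endpoint and the items / volumeInfo / imageLinks / thumbnail fields in the JSON response; check these against the current API documentation before relying on them. The function names and the sample response are hypothetical, and the HTTP request itself is left out so the example runs offline.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Google Books API v1 volume search.
BOOKS_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"


def books_search_url(title, author):
    """Build a volume search URL restricted to a title and an author."""
    query = f"intitle:{title} inauthor:{author}"
    return BOOKS_ENDPOINT + "?" + urlencode({"q": query, "maxResults": 1})


def first_cover_url(response):
    """Return the first cover image URL in a parsed JSON response, or None."""
    for item in response.get("items", []):
        links = item.get("volumeInfo", {}).get("imageLinks", {})
        cover = links.get("thumbnail") or links.get("smallThumbnail")
        if cover:
            return cover
    return None


if __name__ == "__main__":
    print(books_search_url("Dune", "Frank Herbert"))
    # Hypothetical response fragment, shaped like the assumed JSON layout.
    sample = {
        "items": [
            {"volumeInfo": {"title": "Dune",
                            "imageLinks": {"thumbnail": "http://example.com/cover.jpg"}}}
        ]
    }
    print(first_cover_url(sample))
```

Returning None when no cover is present lets the calling page fall back to a placeholder image.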
Unfortunately, the public Google search engine doesn't support that. You should use the Custom Search API to implement such a feature in your application. Alternatively, use XGoogle (an unofficial Python wrapper for Google Search services; see the google_dl tool for an example).
Other suggestions:
YQL by Yahoo (see the yql-tables repo on GitHub for examples).
Alternative search engines.
E.g. in Wolfram Alpha you can type "show image of laptop" and it will give you the first popular picture; however, you need to use the Wolfram|Alpha APIs or some script (see this ChatBot for example) to pick up the direct link.

Accessing the Google Documents List API from Ruby

I'm trying to figure out how to access the Google Documents List API from Ruby.
I've looked at the google-api-ruby-client, but that doesn't seem to support that particular API. I've also looked at the gdata-ruby-util client, but that looks like it's out of date and no longer active.
It seems odd that there's no Ruby client for such a popular API, so can anyone help with a solution?
Here is a library that lets you read and write files. It also has methods to read and write spreadsheet cells.
https://github.com/gimite/google-drive-ruby
http://code.google.com/p/gdata-ruby-util/ is the correct library.
I would say it is more "stable" than "no longer active".

Writing a web app using Dropbox API

I would like to write a web app that uses Dropbox for cloud storage.
If I understand correctly, I should use the RESTful API to achieve that.
The documentation exists and is quite good, but being a newcomer to RESTful APIs I would love to see and play with a simple working example that uses this API.
My questions are:
Am I right to assume that the REST API is the way to go?
Is there a quick and easy example (maybe a live example) to get me going?
Thanks!
As you tagged your question with "ajax", I presume you want to do this entirely client-side (except for some proxy code to be able to make requests across domains)? I haven't tried it out myself, but there's dropbox-js on Google Code, which will at least give you some ideas (and if the Dropbox API hasn't changed too much since June 2010, it might even work out of the box).
Update: there's no "download", but you can browse the source code of trunk here.
Here's a lengthy article on the matter:
Some love for JavaScript Applications, with code samples, a demo, etc.

Extend iPhoto to post pictures to a different web service?

Currently iPhoto lets me upload pictures to Facebook and Flickr.
Is there any way (perhaps write a plugin) to extend this so that I can post photos to a different web service e.g. Picasa?
It's possible. If you are looking for a Picasa plugin, though, Google already has one (I haven't used it with the current version of iPhoto). If you are looking to develop your own plugins, this article might help.
