How do I scrape Google Reader with Mechanize (using cookies) - Ruby

I'm trying to scrape Google Reader, but I've run into problems. I want to log in to Google Reader, get a valid cookie, and then request this page:
'http://www.google.es/reader/atom/user/-/state/com.google/reading-list'
If my cookie works and I'm logged in, I only need to put "user/-/" in the path and it should return the XML version of my Google Reader reading list.
That's the theory. I log in to Google Reader and it redirects; then I copy my SID and create a cookie manually from it, using the details on the Google Reader API page:
http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI
name SID
domain .google.com
path /
expires 1600000000
With the cookie created, I try to request:
'http://www.google.es/reader/atom/user/-/state/com.google/reading-list'
but it doesn't work. I think I'm creating the cookie incorrectly. I've read the API docs for CookieJar and Mechanize::Cookie, but I can't find any example of how to use them. I've tried several approaches and none of them work. Can someone please show me how to use this cookie?

We do all our web scraping with iMacros (partly free/open source, partly commercial). That works well. No matter what you use, you need something that automates a real web browser. Other options are Selenium or Watir, although these are more geared towards web testing.
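For the cookie part of the question, here is a minimal sketch of attaching a manually copied SID to a request. It swaps in Ruby's standard Net::HTTP instead of Mechanize, the method name is hypothetical, and 'YOUR_SID_VALUE' is a placeholder for the value copied from the browser. Whether the server accepts a cookie replayed this way is an assumption the sketch cannot confirm.

```ruby
require 'net/http'
require 'uri'

# Build a GET request that carries the SID in a Cookie header.
# The domain, path and expires attributes only tell a browser or cookie jar
# when to send the cookie; the header itself carries just name=value.
def reading_list_request(sid_value,
                         url = 'http://www.google.es/reader/atom/user/-/state/com.google/reading-list')
  request = Net::HTTP::Get.new(URI(url))
  request['Cookie'] = "SID=#{sid_value}"
  request
end

request = reading_list_request('YOUR_SID_VALUE')
puts request['Cookie'] # prints SID=YOUR_SID_VALUE

# Sending it is a network call, so it is left commented out:
# response = Net::HTTP.start(request.uri.host, request.uri.port) { |http| http.request(request) }
# puts response.code
```

If the response is a redirect to a login page rather than XML, the cookie was not accepted.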

Related

Google Drive API v3 : there isn't any way to get a download url for a google document?

The Google Drive API v2 to v3 migration guide says:
The exportLinks field has been removed from files. To export Google Documents, use the files.export method instead.
I don't want to export (download) the file right away. "files.export" will actually download the file. I want a link to download the file, later. This was possible in v2 by means of the exportLinks.
How can I in v3 accomplish the same? If it is not possible, why was this useful feature removed?
Besides, (similar problem to above) downloadUrl was also removed, and the suggested alternative ("files.get with ?alt=media") downloads the file instead of providing a download link. This means there is no way in v3 to get a public short lived URL for a file?
EDIT:
there is no way in v3 to get a public short lived URL for a file?
For regular files, apparently yes.
This seems to work fine (a public short lived link to the file with its right name and contents):
https://www.googleapis.com/drive/v3/files/ID?alt=media&access_token=TOKEN
For google apps files, no (not even private, as v2 exportLinks used to be).
https://www.googleapis.com/drive/v3/files/ID/export?mimeType=TYPE&access_token=TOKEN
As with regular files, this URL is a short-lived link to the file contents, but it lacks the correct file name.
BTW, I see the API is not behaving consistently: /drive/v3/files/FILEID delivers the right file name, but /drive/v3/files/FILEID/export does not.
I think the API itself should be setting the right Content-Disposition, as it is apparently doing when issuing a /drive/v3/files/FILEID call.
This file naming problem invalidates the workaround to the lack of ExportLinks in v3.
The v2 ExportLinks allowed me to link to a file (which is not the same as getting its content right away). Anyone logged in and with the proper permissions was able to access it, and the link didn't need any access_token, and it wasn't short lived. It was good and useful.
Building a link with a raw API call like /drive/v3/files/FILEID/export (with mandatory access_token) would be a close enough workaround (it is temporary and public, so not the same as before, anyway). However, the naming problem invalidates it.
In v2, regular files have a WebContentLink and google apps files have exportLinks. In v3 exportLinks are gone, and I don't see any suitable alternative to them.
Once you query for your file by id you can use the function getWebContentLink() to get the download link of the file (eg. $file->getWebContentLink() ).
I think you're placing too much emphasis on the word "method".
There is still a link to export a file; it's https://www.googleapis.com/drive/v3/files/fileIdxxxxx/export?mimeType=xxxxx/xxxxx. Make sure you URL encode the mime type.
Eg
https://www.googleapis.com/drive/v3/files/1fGBQ81haNU_nEiC5GITZD3bxT0ppL2LHg-C0ubD4Q_s/export?mimeType=text/csv&access_token=ya29.Gmo0BMvO-pVEPKsiD9j4D-NZVGE91MChRvwOcBSg3cTHt5uAClf-jFxcovQScbO2QQhwHS95eSGW1eQQcK5G1UQ6oI4BFEJJkntEBkgriZ14GbHuvpDL7LT2pKA--WiPuNoDDIuZMm5lWtlr
These links form part of the API, so the expectation is that you've written a client that sends authenticated requests and deals with the response data. This explains why, if you simply paste the link into a browser without an access_token, it will fail. It also explains why the filename is export, i.e. it isn't intended that your client would ever use a filename; rather, it should receive the data as a stream. This SO answer discusses the situation in more detail: "How to set name of file downloaded from browser?"
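As a minimal sketch of building such an export link with the mime type URL encoded, assuming the endpoint shape shown in this answer: the method name is hypothetical, and FILE_ID and TOKEN are placeholders.

```ruby
require 'uri'

# Build a Drive v3 export URL. file_id, mime_type and access_token are
# placeholders supplied by the caller.
def drive_export_url(file_id, mime_type, access_token)
  uri = URI("https://www.googleapis.com/drive/v3/files/#{file_id}/export")
  # encode_www_form percent-encodes the values, so text/csv becomes text%2Fcsv
  uri.query = URI.encode_www_form(mimeType: mime_type, access_token: access_token)
  uri.to_s
end

puts drive_export_url('FILE_ID', 'text/csv', 'TOKEN')
# prints https://www.googleapis.com/drive/v3/files/FILE_ID/export?mimeType=text%2Fcsv&access_token=TOKEN
```

The resulting link is only valid for as long as the access token is.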

Reading a Google Spreadsheet into Ruby Objects (or settle for a file download)

I think it's probably simplest to start with my use case:
I'm trying to read the contents of a Google Spreadsheet into ruby, and then use that data for other purposes. This needs to happen server-to-server.
Here's what I've tried:
First I tried to use this google drive gem. Since my interaction needs to be server to server (i.e. a service account), I couldn't get this to work (someone please let me know if they've managed this!)
Next, I tried using the latest google-api-ruby gem, and, following its documentation, was successfully able to authenticate my service account, and have been able to get lists of files, etc. So basically the API is authenticated and working.
The latest issue is that Google Spreadsheets can't be downloaded using the normal download_dest parameter on the get_file method (sorry, I would post links to the relevant documentation here, but I don't have enough rep to do so). So I'm not really sure how to proceed with downloading the relevant file: I understand how to get an export_link, but have no clue how to then prepare the next request.
Any help would be greatly appreciated :)
Here's some basic code for reference -
require 'googleauth'
require 'google/apis/drive_v2'
scopes = ['https://www.googleapis.com/auth/drive']
ENV["GOOGLE_APPLICATION_CREDENTIALS"] = 'path/to/my/creds.json'
auth = Google::Auth.get_application_default(scopes)
drive = Google::Apis::DriveV2::DriveService.new
drive.authorization = auth
ss = drive.get_file('my_spreadsheet_id')
export_links = ss.export_links # Hash mapping MIME type => export URL
# what now?
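
One possible next step, sketched under assumptions: ss.export_links is a Hash of MIME type to URL, it contains a 'text/csv' entry for spreadsheets, and an OAuth2 access token string is already available from the service account credentials. The helper name, the sample link and 'TOKEN' are hypothetical, and only the standard library is used.

```ruby
require 'net/http'
require 'uri'

# Build an authenticated GET request for one of the export links.
# export_links: Hash of MIME type => URL (the shape assumed for ss.export_links)
# access_token: OAuth2 bearer token string, assumed to be obtained elsewhere
def build_export_request(export_links, mime_type, access_token)
  url = export_links.fetch(mime_type) do
    raise ArgumentError, "no export link for #{mime_type}"
  end
  request = Net::HTTP::Get.new(URI(url))
  request['Authorization'] = "Bearer #{access_token}"
  request
end

# Made-up data standing in for ss.export_links
links = { 'text/csv' => 'https://docs.google.com/spreadsheets/export?id=FILE_ID&exportFormat=csv' }
request = build_export_request(links, 'text/csv', 'TOKEN')
puts request['Authorization'] # prints Bearer TOKEN

# Sending it is a network call, so it is left commented out:
# response = Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) do |http|
#   http.request(request)
# end
```

The response body would then be CSV text, which Ruby's csv library can turn into rows.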

can you load external executable javascript from a firefox extension?

Does anyone know if there is a way to load any external executable javascript from a firefox add-on extension? I looked into scriptloader.loadSubScript, but it appears that it can only load from a local resource.
Any help would be appreciated.
You can always XHR for a file, save the contents to disk, then use scriptloader.loadSubScript from the add-on.
This would violate the AMO policies though, so you wouldn't be able to upload the add-on to http://addons.mozilla.org
As @erikvold already pointed out, doing so would be a security hazard AND it also violates AMO rules (because it is a security hazard).
Consider your server gets compromised, or there is a way to MITM the connection retrieving the remote script (TLS bugs anyone :p), or you sell your domain and the new owner decides to ship a script to collect credit card information straight from a user's hard disk...
However, it is possible to run a remote script in an unprivileged environment, much like it would run in a website.
Create a Sandbox. The Sandbox should be unprivileged, e.g. pass an URL in your domain into the constructor.
Retrieve your script, e.g. with XHR.
Evaluate your script in the Sandbox and pull out any data it might have generated for you.
This is essentially what tools like Greasemonkey (executing user scripts) do.
Creating and working with Sandboxes in a secure fashion is hard, and the Sandbox being unprivileged prohibits a lot of use cases, but maybe it will work for your stuff.
Try using Components.utils.import .
Example :
const {Cc,Ci,Cu} = require("chrome");
Cu.import("url/path of the file");
Note :
JS files that use DOM objects such as window or navigator will fail with an error saying "window/navigator is undefined". This is simply because the main.js code does not have access to the DOM.
Refer to this thread for more information.

Parse.com Mixed Content Error

I'm creating a web application using parse and have found that in order for a user to authenticate I need to make all requests using HTTPS. I'm able to switch this over and get it to work correctly, but when I do I get all kinds of mixed content errors because I'm retrieving PFFile objects which only return a non-secure URL.
This wouldn't even be a huge concern with Chrome or Safari, but of course IE needs to present a message to the user and block all this content. Are there any potential workarounds? Why can't Parse just put a setting in the app to enable files to be served from a secure URL? This seems completely ridiculous. How do people get around this? Are you completely avoiding the use of PFFile?
Replace http:// with https://s3.amazonaws.com/.
So if you start with this:
http://files.parsetfss.com/b05e3211-bf8b-.../tfss-fa825f28-e541-...-jpg
The final url will look something like this:
https://s3.amazonaws.com/files.parsetfss.com/b05e3211-bf8b-.../tfss-fa825f28-e541-...-jpg
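The replacement above can be sketched as a small helper. The method name and the sample URL are hypothetical; the rewrite rule is exactly the one described in this answer.

```ruby
# Rewrite a non-secure Parse file URL so it is served over HTTPS via s3.amazonaws.com.
# Only a leading http:// is replaced; anything else is returned unchanged.
def secure_parse_file_url(url)
  url.sub(%r{\Ahttp://}, 'https://s3.amazonaws.com/')
end

puts secure_parse_file_url('http://files.parsetfss.com/some-app-id/some-file.jpg')
# prints https://s3.amazonaws.com/files.parsetfss.com/some-app-id/some-file.jpg
```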

Google static map API getting 403 forbidden when loading from img tag

I have a Google map that shows the location of a property, but the dynamic maps don't print well, so I decided to use the Google Static Maps image API for the print view.
http://lpoc.co.uk/properties-for-sale/property/oldgate-dairy-st-james-road-long-sutton-cambridgeshire-pe12/?prop-print=1
^^ is an example of a property in print view. It should show a static map image, but the image fails to load, and looking at my inspector I'm getting a 403 Forbidden response for it.
But if I go to the URL directly the image loads...
What am I doing wrong?
Thanks
Scott
This has gotten quite a lot of views, so I'm adding my solution to the problem here:
When using the new API, make sure you generate a Key for browser apps (with referers) and also make sure the patterns match your URL.
E.g. when requesting from example.com your pattern should be
example.com/*
When you're requesting from www.example.com:
*.example.com/*
So make sure you check whether a subdomain is present and allow both patterns in the developer console.
Visit the Developer Console.
Under API Keys, click the pencil icon to edit.
Under "Key restrictions", ensure that you have an entry for example.com/*, *.example.com/*, and any local testing domains you might want.
There seems to be some confusion here, and since this thread is highly ranked on Google, it seems relevant to clarify.
Google has a couple of different APIs to use for their maps service:
Javascript API
The old version of this API was version 2, which required a key. This version is deprecated, and it is recommended to upgrade to the newer version 3. Note that the documentation still states that you need a key for this to function, except if you're using "Google Maps API for Business".
Static Maps API
This is a whole different story. Static maps is a service that does not require any JavaScript. You simply call a URL, and Google will return a map image, making it possible to insert the URL directly into your <img> tag.
The newest version is version 2, and this requires a key to function because a usage limit is applied.
A key can be requested here:
https://code.google.com/apis/console
And the key should be added to the request for the correct image to be generated:
http://maps.googleapis.com/maps/api/staticmap?center=New+York,NY&zoom=13&size=600x300&key=API_console_key
I hope this clears up some confusion.
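A minimal sketch of assembling such a request URL, using the parameters from the example in this answer. The method name is hypothetical and 'API_console_key' is a placeholder for a real key.

```ruby
require 'uri'

# Build a Static Maps image URL with the key included as a query parameter.
def static_map_url(center:, zoom:, size:, key:)
  uri = URI('http://maps.googleapis.com/maps/api/staticmap')
  # encode_www_form escapes the space and comma in the center value
  uri.query = URI.encode_www_form(center: center, zoom: zoom, size: size, key: key)
  uri.to_s
end

puts static_map_url(center: 'New York,NY', zoom: 13, size: '600x300', key: 'API_console_key')
# prints http://maps.googleapis.com/maps/api/staticmap?center=New+York%2CNY&zoom=13&size=600x300&key=API_console_key
```

The returned string is what goes into the src attribute of the <img> tag.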
I had this same problem but my solution was different. I had the V2 maps api enabled, but not the static maps api (I thought this was V2). I enabled the static maps api and it worked.
Oops I feel like such an idiot. I was using the old V2 maps API URL and not the new V3 API URL. I was getting a 403 because I was using the V2 URL without providing an API key :(
Be one hundred percent sure of these points (for static maps):
Enable the API for your project at this URL:
https://console.developers.google.com/apis/api/static_maps_backend/overview?project=
Your localhost, staging and production URLs are all listed, with wildcards, in the referrer section.
Google has changed its policy and you now need an API key to display maps. For more, see: Google Maps API without key?
Hope it helps.
Staticmaps V3 doesn't need the "Key" attribute and removing it seems to solve the <img> source problem.
Try a URL like this:
http://maps.googleapis.com/maps/api/staticmap?center=0.0000,0.0000&zoom=13&size=200x200&maptype=roadmap&markers=0.0000,0.0000&sensor=false
For more information read this.
Yes, version 3 of the Google Maps API was the JavaScript version; the latest "Google Static Maps" version was 2.0. I suspect there might be some restriction on use.
I could also not display static maps and could see 403 error in the browser's network console.
http response headers:
status:403
x-content-type-options:nosniff
I had an API key with a lot of Google Maps APIs enabled, but the Google Static Maps API was missing. Enabling it solved the issue.
You should now use the 'signature' parameter, which you add to the request; otherwise static maps won't work.
Here are a few useful links:
1) how to generate signature
2) how to make signature on BE side (code snippet)
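A minimal back-end sketch of generating the signature, assuming the HMAC-SHA1 scheme Google documents for URL signing: the path and query are signed with the URL-safe base64-decoded secret, and the result is appended as URL-safe base64. The method name, the secret and the key below are made-up placeholders.

```ruby
require 'openssl'
require 'uri'

# Append a signature parameter to a Static Maps URL.
# url must already contain every other parameter, including the key.
# secret is the URL-safe base64 signing secret from the console.
def sign_static_map_url(url, secret)
  uri = URI(url)
  # Decode the URL-safe base64 secret into raw key bytes
  key = secret.tr('-_', '+/').unpack1('m')
  # Sign only the path and query, not the scheme or host
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), key, uri.request_uri)
  # Re-encode the binary digest as URL-safe base64
  signature = [digest].pack('m0').tr('+/', '-_')
  "#{url}&signature=#{signature}"
end

unsigned = 'https://maps.googleapis.com/maps/api/staticmap?center=0.0000,0.0000&zoom=13&size=200x200&key=YOUR_API_KEY'
puts sign_static_map_url(unsigned, 'bm90LWEtcmVhbC1zZWNyZXQ=')
```

Keeping this on the server matters, because anyone who has the secret can sign requests on your behalf.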
I am using Wordpress 4.9.4 with ChurchThemes Exodus Theme. I had applied for & generated a New API_KEY.
I confirmed it was being used when calling the map:
Google Map Link
However the Js Console showed the following error:
Google Maps Error in Js Console
As Johnny White mentioned above I had to navigate to the API Library Screen via APIs & Services Menu:
You will be greeted by the API Library screen:
API Library Screen
Click on Maps (17) in the lower left-hand side.
Search for & click Google Static Maps API - Enable it if needed:
Google Static Maps API
You may also need to enable the Google Maps JavaScript API (same process as for Static Maps):
Google Maps Javascript API
Once that is done your maps should start appearing on your site or app.
If they don't appear on refresh you may need to:
clear your cache (WordPress or Drupal websites),
wait the recommended 5 minutes for the newly enabled APIs to register
Try enabling billing on this Google Cloud Project/Firebase Project.
I was experiencing this same issue and just received the 403 error in the console.
Copying and pasting the Static Maps URL in to the URL bar and loading it showed the following error message:
The Google Maps Platform server rejected your request. You must enable Billing on the Google Cloud Project at
https://console.cloud.google.com/project/_/billing/enable Learn more at https://developers.google.com/maps/gmp-get-started
Hope this helps!

Resources