I'm trying to pull data from Google Trends and got a "You have reached your daily limit" error after only 2 tries.
Is there any way to get around this? I know Google API projects have special quota limits, but Google Trends doesn't have an official API. I also read that we may need to pass it a cookie file so that it looks like I'm logged in. Has anyone faced this issue before?
I'm struggling with the same issue!
From your question I can't tell what stage you've reached...
But here is the solution that I've found:
You should emulate a browser with cookies.
I think the best way to do that is to use the Mechanize library.
First, your program should "log in" with a GET request to "https://accounts.google.com/Login?hl=en".
Immediately after that you can access some other personal resources, but not Google Trends!
Only after a significant amount of time can you successfully get Google Trends data as CSV.
I still have not discovered the exact time period, but it is more than 10 minutes and less than several hours :). That is why saving your cookies for later use is a good idea!
A few more tips:
If you are developing with Python / Ruby under Windows, do not forget to set up a CA root certificates bundle for the OpenSSL library. Otherwise the HTTPS connection will fail and you won't be able to log in! See "Getting the `certificate verify failed (OpenSSL::SSL::SSLError)` error with a Mechanize object".
I recommend saving your cookies to an external file at program shutdown and restoring them at startup.
Do not forget to allow redirects, because Google uses redirects all the time.
Ruby code example:
require 'mechanize'
require 'logger'

begin
  agent = Mechanize.new { |a|
    a.user_agent = 'Opera/9.80 (Windows NT 5.1) Presto/2.12.388 Version/12.16'
    cert_store = OpenSSL::X509::Store.new
    cert_store.add_file 'cacert.pem'
    a.cert_store = cert_store
    a.log = Logger.new('mech.log')
    if File.file?('mech.cookies')
      cookies = Mechanize::CookieJar.new
      cookies.load('mech.cookies')
      a.cookie_jar = cookies
    end
    a.open_timeout = 5
    a.read_timeout = 6
    a.keep_alive = true
    a.redirect_ok = true
  }

  LOGIN_URL = "https://accounts.google.com/Login?hl=en&continue=http://www.google.com/trends/"
  login_page = agent.get(LOGIN_URL)
  login_form = login_page.forms.first
  login_form.Email  = 'YOUR_EMAIL'    # placeholder - your Google account email
  login_form.Passwd = 'YOUR_PASSWORD' # placeholder - your Google account password
  login_response_page = agent.submit(login_form)

  # DO SOME TRENDS REQUESTS AFTER A SIGNIFICANT PERIOD OF TIME
  page = agent.get(url) # url = the Trends CSV export URL you want
ensure
  agent.cookie_jar.save('mech.cookies') if agent
end
You probably disabled your cookies, which makes Google Trends think you're a robot.
I think I have found a way to solve the problem: just make sure that you call the Google Trends API with the PREF cookie. That is, you don't need to log in to a Google account, and of course you don't need to emulate a browser. The PREF cookie alone is enough.
So where does the PREF cookie come from? It's very easy: just open the browser and log in to your Google account. Then look up the PREF cookie for the Google website; it sits under the domain www.google.com. Copy the value of the PREF cookie into your program or script. That's all.
I have called the Google Trends API hundreds of times within a few seconds this way. Good luck!
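For concreteness, here is a minimal Ruby sketch of that idea. The CSV export URL below is an assumption (it reflects the old trendsReport endpoint and may have changed since), and the PREF value is a placeholder you would copy from your own browser:

require 'net/http'
require 'uri'

# Placeholder: paste the PREF value from your own browser's cookies for www.google.com.
PREF_COOKIE = 'PREF=PASTE_YOUR_PREF_VALUE_HERE'

# Assumed CSV export endpoint; adjust the query to whatever report you need.
uri = URI('http://www.google.com/trends/trendsReport?q=hello&export=1')

request = Net::HTTP::Get.new(uri)
request['Cookie'] = PREF_COOKIE

response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
puts response.body # CSV on success, or the quota-limit page if the cookie is rejected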
I found this paper on whether Google Trends is ready for real-time suicide prevention or is just a Zeta-Jones effect; it was very useful:
G. Fond, A. Gaman, E. Haffen, P. Llorca. "Google Trends: ready for real-time suicide prevention or just a Zeta-Jones effect?" International Journal of Computer Networks and Communications Security 3, no. 1 (2015): 1-5.
Related
I'm looking to use Selenium with a username/password-authenticated proxy in Ruby. I realize that most people use ProxyChain when doing this in Chrome, but I'd like a solution without any additional gems since that doesn't play well on Heroku; plus, I'm using Firefox, so there seems to be another possible option judging by THIS question, though it's written in Python.
I used the Selenium docs to translate that code to Ruby, but Selenium is still not using my proxy when navigating to a webpage. Oddly enough, when I refresh the page manually it prompts me for the username/password, but it doesn't do that on the initial page load.
profile = Selenium::WebDriver::Firefox::Profile.new
profile["network.proxy.type"] = 1
# proxy ip and port are fake for this example
profile["network.proxy.http"] = "182.192.157.60"
profile["network.proxy.http_port"] = 12345
# set the username and password
profile["network.proxy.socks_username"] = "my_username"
profile["network.proxy.socks_password"] = "my_password"
options = Selenium::WebDriver::Firefox::Options.new(profile: profile)
driver = Selenium::WebDriver.for :firefox, options: options
If anyone has any ideas I would certainly appreciate the help. Thank you.
I'm trying to (ab)use the capybara web testing framework to automate some tasks on github that are not accessible via the github API and which require me to be logged in and click on buttons to send AJAX requests.
Since capybara/selenium is a testing framework it helpfully creates a temporary session which has no cookies in it. I'd like to either stop it from doing that, or else I'd like to know how to load my cookie store into the browser session that it creates.
All I'm trying to do is this:
#!/usr/bin/env ruby
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome
driver.navigate.to "https://github.com"
Or this:
#!/usr/bin/env ruby
require 'capybara'
Capybara.register_driver :selenium do |app|
  Capybara::Selenium::Driver.new(app, :browser => :chrome)
end
session = Capybara::Session.new(:selenium)
session.visit "https://www.github.com"
In both cases I get the github.com landing page you'd see as a logged-out user or incognito mode in the browser. I'd like to get my logged-in landing page like I just fired up a web browser myself and navigated to that URL.
Since I have 2FA setup on github that makes automating the login process from the github landing page somewhat annoying, so I'd like to avoid automating logging into github. The tasks that I want to automate do not require re-authenticating via 2FA.
ANSWER:
For MacOSX+Ruby+Selenium this works:
#!/usr/bin/env ruby
require 'selenium-webdriver'
caps = Selenium::WebDriver::Remote::Capabilities.chrome("chromeOptions" => {"debuggerAddress" => "127.0.0.1:20480"}, detach: false)
driver = Selenium::WebDriver.for :chrome, :desired_capabilities => caps
driver.navigate.to "https://github.com"
Then fire up chrome with this:
% /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --user-data-dir=/Users/lamont/Library/Application\ Support/Google/Chrome --profile-directory=Default --remote-debugging-port=20480
Obviously the paths will need to be adjusted because they're OSX-centric and have my homedir in them.
There is also a bug in the selenium-webdriver gem for Ruby where it inserts a 'detach' option that conflicts with 'debuggerAddress':
/Users/lamont/.rvm/gems/ruby-2.2.4/gems/selenium-webdriver-2.53.0/lib/selenium/webdriver/remote/response.rb:70:in `assert_ok': unknown error: cannot parse capability: chromeOptions (Selenium::WebDriver::Error::UnknownError)
from unknown error: unrecognized chrome option: detach
The lib/selenium/webdriver/chrome/bridge.rb file can be edited to take that out as a quick hack:
chrome_options['binary'] = Chrome.path if Chrome.path
chrome_options['nativeEvents'] = true if native_events
chrome_options['verbose'] = true if verbose
#chrome_options['detach'] = detach.nil? || !!detach
chrome_options['noWebsiteTestingDefaults'] = true if no_website_testing_defaults
chrome_options['prefs'] = prefs if prefs
To implement something similar in Ruby, check out this page that goes over that. Thanks to lamont for letting me know in the comments.
You can start Chrome using a specific Chrome profile. I am not sure what the Ruby implementation would look like, but in Python it looks something like this:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions
options = ChromeOptions()
# more on this line here later.
options.add_experimental_option('debuggerAddress', '127.0.0.1:7878')
driver = webdriver.Chrome(chrome_options=options)
In order for this to work you need to do a few things.
manually start chrome from terminal/command prompt with these command line arguments
--user-data-dir=/path/to/any/custom/directory (for example /home/user/Desktop/Chromedir) --profile-directory="Profile 1" --remote-debugging-port=7878
make sure "Profile 1" is already existing in the same --user-data-dir (make sure user Profile 1 has necessary chrome://components/
to run any apps that require those components)
you can use any free port in place of 7878
verify that http://localhost:7878 is running and returns a value (a quick scripted check is sketched below)
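If you'd rather script that last check than open the URL by hand, a couple of lines like this should print Chrome's version info (Ruby here, to match the rest of this thread; /json/version is the standard DevTools HTTP endpoint):

require 'net/http'
# Prints a JSON blob with the browser version and WebSocket debugger URL.
puts Net::HTTP.get(URI('http://localhost:7878/json/version'))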
Launching Chrome manually with the "Profile 1" profile like this means that, as long as that profile is logged into the site in question, it will stay logged in like a normal user whenever you follow these instructions to run the tests.
I used this to write a quick netflix bot that clicks the "continue playing" button when it pops up, and it's the only way to get DRM content to play as far as I have found. But it retains the cookies for the login, and also launches chrome with whatever components the profile is set up to have.
I have tried launching chrome with specific profiles before using different methodologies, but this was the only way to really force it to work how I wanted it to.
Edit: There are methods for saving cookie info as well, although I don't know how well they work. Check out this link for more info, as my solution is probably not the best one even if it works.
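As a rough illustration of that cookie save/restore idea in Ruby's selenium-webdriver (the file name and the Marshal-based dump are my own assumptions, not something from the linked answer):

require 'selenium-webdriver'

driver = Selenium::WebDriver.for :chrome
driver.navigate.to 'https://www.netflix.com'
# ... log in once (manually or via automation), then persist the cookies ...
File.binwrite('cookies.dump', Marshal.dump(driver.manage.all_cookies))

# In a later session, visit the domain first (cookies can only be added for
# the current domain), then restore the saved cookies and reload the page.
driver.navigate.to 'https://www.netflix.com'
Marshal.load(File.binread('cookies.dump')).each { |c| driver.manage.add_cookie(c) }
driver.navigate.refresh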
The show_me_the_cookies gem provides cross-driver cookie manipulation and lets you add new cookies. The one thing to be aware of when using selenium is that you need to visit the domain before you can create a cookie for it, so you'll need to do something like
visit "https://www.github.com"
create_cookie(...)
visit "https://www.github.com"
for it to work - the first visit just puts the browser/driver in a state where you can create the cookie, and the second visit actually loads the page with the cookie set.
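Spelled out a bit more, in a context where the show_me_the_cookies helpers are available (for example a Cucumber World or RSpec example group that includes ShowMeTheCookies); the cookie name and value here are placeholders, not GitHub's real session cookie:

require 'capybara'
require 'show_me_the_cookies'

Capybara.default_driver = :selenium
include Capybara::DSL
include ShowMeTheCookies

visit "https://www.github.com"                            # get onto the right domain first
create_cookie("user_session", "VALUE_FROM_YOUR_BROWSER")  # placeholder cookie name/value
visit "https://www.github.com"                            # reload with the cookie in place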
I had to tweak the OP's answer (from within her question) to get this going with Ruby in 2022.
Prerequisites
Chromedriver installed and allowed to run even though it's not signed:
> brew install chromedriver
> xattr -d com.apple.quarantine /usr/local/bin/chromedriver
Chrome launched and accepting commands on a specific port:
> /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --user-data-dir=~/Library/Application\ Support/Google/Chrome --profile-directory=Default --remote-debugging-port=20480
This created a new profile in Chrome so I signed in to my account and got the browser set up, ready to start interacting with the (legacy EdTech) site I'm trying to automate.
Actual use
require 'selenium-webdriver'
caps = Selenium::WebDriver::Remote::Capabilities.chrome("goog:chromeOptions" => {"debuggerAddress" => "127.0.0.1:20480"})
driver = Selenium::WebDriver.for :chrome, capabilities: caps
driver.navigate.to "https://www.google.com"
I've written some Ruby code (connected with Cucumber) that will go to a website and click a file that I'd like to download. The browser I'm using for this is Google Chrome.
Typically, when you go to download a file in Chrome, it doesn't ask for permission. However, when I run the code I made, it says:
"This type of file can harm your computer. Do you want to keep file_name.exe anyway?" It gives 2 options, "keep" or "discard". I have to click keep.
Obviously, you don't want all executables to just start downloading; however, this particular website/file should always be trustworthy.
Is there a command in Ruby or Cucumber that allows you to click the "keep" button automatically? This could just be a general "click at this pixel" command or something. Or is there a way to mark a particular website in Chrome as safe? You can't inspect the element because it's not part of the website but part of the browser. Preferably without having to download other software.
That said, if this is possible, it suggests it should also be possible to automate an installation (clicking Next -> Next -> etc.) for you. Hopefully that is correct?
Thanks in advance.
You can implement this in any browser, but for Google Chrome here is the solution:
profile = Selenium::WebDriver::Chrome::Profile.new
profile['download.prompt_for_download'] = false
profile['download.default_directory'] = "Absolute or relative path to your download directory"
browser = Selenium::WebDriver.for :chrome, :profile => profile
You haven't specified which gem you use to drive the browser, but even if you use watir-webdriver, you can reuse the same profile created above:
browser = Watir::Browser.new :chrome, :profile => profile
I actually switched to using Sikuli, which worked pretty well. Thanks for the help, though.
Do you really need or want the browser to download the file? Are you really testing the browser's download feature, or do you want to verify that the server can serve the file and that it is what you expect?
I found the idea of setting up a default directory and having to check for the file clumsy, fragile and error-prone, especially when setting up on a new host and for tests that run in multiple browsers.
My solution is to just use Ruby (or whatever language) features to download the file directly, and then validate that it is the file it's supposed to be. I'm not testing the browser, I'm testing the software. The only exception to that idea I can think of is if you use some javascript logic or something browser-dependent to redirect you to a link, but please don't ever do that.
However, you run into a problem if you have to log in to access your file; you either have to implement auth in your Ruby code, which isn't technically part of your Cucumber specification, or you need the cookies. I use this code to copy the cookies to avoid logging in again, and grab the file:
require 'open-uri'

def assert_file_link(uri, filename, content_type)
  f = open_uri_with_cookies(uri)
  # e.g. "Attachment;filename=Simple Flow - Simple Form.rtf"
  attachment_filename = f.meta["content-disposition"].sub("Attachment;filename=", "")
  content_length = Integer(f.meta["content-length"])
  assert(f.status == ["200", "OK"], "Response was not 200 OK")
  assert(f.content_type == content_type, "Expected content-type of '#{content_type}' but was '#{f.content_type}'")
  assert(attachment_filename == filename, "Expected filename of '#{filename}' but was '#{attachment_filename}'")
  assert(content_length > 0, "Expected content-length > 0 but was '#{content_length}'")
end

def open_uri_with_cookies(uri)
  # Hack the cookies from the existing Selenium session so we don't need to log in again.
  # `driver` is assumed to be the Selenium driver available in your test context.
  cookies = ""
  driver.manage.all_cookies.each { |cookie| cookies.concat("#{cookie[:name]}=#{cookie[:value]}; ") }
  if block_given?
    open(uri, "Cookie" => cookies, :proxy => nil) do |f|
      yield f
    end
  else
    open(uri, "Cookie" => cookies, :proxy => nil)
  end
end
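A hypothetical usage from a Cucumber step, just to show how the pieces fit together (the URI here is made up, and the filename reuses the example from the comment above):

Then(/^the exported form is served as an RTF attachment$/) do
  assert_file_link("https://example.com/exports/42",
                   "Simple Flow - Simple Form.rtf",
                   "application/rtf")
end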
Hope this helps.
I'm new to Windows Phone 7 application development. I'm currently developing an app in which I want to make an HTTP request and display the result in the web browser. For example, suppose I give the URI below
"http://m.imdb.com/find?q="+search_string (where search_string is a variable)
I want to take the result obtained from this and display it in the web browser. I've been searching for this for the past day and didn't get any fruitful results, so please point me either to a suitable tutorial page or give me a sample code snippet.
WebBrowserTask wbt = new WebBrowserTask();
wbt.URL = "http://m.imdb.com/find?q=" + search_string;
wbt.Show();
This will launch the URL in the browser.
WebBrowserTask
This is a weird bug, and I'm not even sure how to begin figuring out what's going on.
We are using Cake 1.3.8 with our sessions in the database. I am not using ACL or any other access control. If we navigate into the application and click around a bit, and then rapidly click the browser back button twice (I've tried in Firefox and Chrome) the user is logged out more often than not and receives the error message 'You are not authorized to access that location'.
All of my searches thus far have involved people wanting to make the page inaccessible if a user logged out and then used the back button. I'm not seeing anything reported with regards to the issue I'm seeing.
Does anybody know if this is a Cake issue or have any thoughts on debugging what is going wrong?
Update: I found where the problem is. I have the security set to high, because we need the session to be closed whenever somebody closes the browser. I also have the timeout set very high because we do large binary uploads to S3, and don't want the user logged out while it's uploading or downloading. The specific block of code in cake_sessions.php that's causing the problem is:
$time = $this->read('Config.time');
$this->write('Config.time', $this->sessionTime);

if (Configure::read('Security.level') === 'high') {
    $check = $this->read('Config.timeout');
    $check -= 1;
    $this->write('Config.timeout', $check);

    if (time() > ($time - (Security::inactiveMins() * Configure::read('Session.timeout')) + 2) || $check < 1) {
        $this->renew();
        $this->write('Config.timeout', 10);
    }
}
$this->valid = true;
I would guess this is because session IDs are regenerated between requests when security = high. Source:
http://book.cakephp.org/compare/44/CakePHP-Core-Configuration-Variables/cakephp/cakephp1x
You only need one out-of-sync request, say for a missing image, and you will lose the session. I've generally found it unworkable because it's not possible to prevent users from double-clicking on links and buttons and invalidating their session.
I would think about using medium security, setting the session timeout fairly short, and using an AJAX script to refresh the session at regular intervals (e.g. every 60s). That way the user will be logged out quickly if the tab/window is closed.
If security is a priority, I would suggest hacking the core to make sure the session cookies are set with http_only, to help guard against session hijacking via XSS attacks. CakePHP 1.x supports PHP 4, so it probably isn't setting this by default.
http://php.net/manual/en/function.setcookie.php
It's possible that the session is erased and, before it can be written again, the back button is clicked, removing the auth data from the session variables.
Page loads -> back button clicked -> session is erased (but before the session is rewritten) -> back button clicked again -> session check finds no existing session.
The only thing I can think is happening is that when you go back a page too quickly, your code can't validate the user fast enough (the round trip for checking credentials) and throws an error that gets displayed on the next page that loads (the second page you went back to).
Are you sure the person is actually logged out, or is it just the error being thrown?
Without seeing any code, it will be difficult to nail it down any further.