Totally stuck trying to get HTTPS data using Ruby on Windows

I'm using Ruby 1.9.3 and trying to write a Google Play scraper loosely based on this one. I am having a really hard time with the HTTPS part of it.
Basically, using Nokogiri::HTML(open("https://play.google.com/store/#{type}/details?id=#{id}")) (as in the original gem) failed on Windows, for reasons explained on this thread.
So, I tried implementing the solution from that same thread, but it is really not working at all. I've even stopped trying with HTTPS for now, because there must be something basic I am missing on even just HTTP.
Here's the code I currently have:
require 'net/http'
require 'openssl'
require 'uri'
url = URI.parse( "http://google.com/" )
http = Net::HTTP.new( url.host, url.port )
http.use_ssl = true if url.port == 443
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
res, data = http.get("http://google.com/")
puts data
In this case, I get nothing. Not even "nil", just no output at all.
However, when I just do a straight Net::HTTP.get_print URI('http://www.google.com'), I get the output, no problems.
Any help would be most appreciated. The real solution I am looking for is a simple way to scrape Google Play pages when using Windows -- this is just a step on the way there. So, if you know of a simpler way to accomplish this, I'd love to hear about it.

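The reason you are getting nil is that data never has anything assigned to it. Net::HTTP#get returns a single Net::HTTPResponse object, so this line only assigns to res (the page content is in res.body), and puts nil just prints an empty line: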
res, data = http.get("http://google.com/")
Also, Google must be accessed as http://www.google.com, with the www; otherwise all you get back is a 301 redirect and a Net::HTTPMovedPermanently object.
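A minimal sketch of the same request with those fixes, assuming a reasonably modern Ruby. The helper names build_client and fetch_body are hypothetical, and the optional ca_file argument is an assumption for Windows setups where Ruby cannot find CA certificates on its own (the question sidestepped that with VERIFY_NONE, which disables certificate checking entirely). TLS is switched on from the URL scheme rather than the port.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Build a Net::HTTP client from a URL. TLS is enabled when the scheme is
# https, so the same helper works for http:// and https:// pages.
# ca_file is a hypothetical path to a CA bundle you downloaded yourself.
def build_client(url, ca_file = nil)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  if http.use_ssl?
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.ca_file = ca_file if ca_file
  end
  [http, uri]
end

# Net::HTTP#get returns one Net::HTTPResponse, so keep the response and
# read the page from its body instead of writing `res, data = ...`.
def fetch_body(url, ca_file = nil)
  http, uri = build_client(url, ca_file)
  res = http.get(uri.request_uri) # pass the path and query, not the full URL
  res.body
end

# Usage (needs network access):
# puts fetch_body('http://www.google.com/')
```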

Related

Receiving no translated text from an https request to translate.google.com

I tried to make a translator in Roblox Studio using HttpService by sending a request to translate.google.com, but nothing I get back contains the translated text.
I pasted what I received into a Google Doc and searched it with Ctrl+F, but no luck: the only thing I could find was the text that was supposed to be translated. Here is the code in case you want to try it yourself, but be warned that running it might make Roblox unresponsive for a while, since it returns a lot of data.
I don't know whether I am doing something wrong; someone please help! I just want it to give me 'Hello world' in French. There are also no error messages.
local http = game:GetService("HttpService")
local Message = "Hello world"
http:UrlEncode(Message) -- 'Hello world' -> 'Hello%20world'
local response = http:RequestAsync(
{
Url = "https://translate.google.com/?sl=en&tl=fr&text=" .. Message .. "!&op=translate";
Method = "GET"
}
)
if response.Success then
print(response.StatusMessage)
print(response.StatusCode)
print(response.Body)
--print(response.Headers)
else
print("The request failed: ", response.StatusCode, response.StatusMessage)
end
When you visit a URL such as https://translate.google.com/?sl=en&tl=fr&text=Hello%20World!&op=translate in your browser, the translation you see is fetched by JavaScript code that the browser executes after loading the page.
The browser retrieves the HTML body of the page (as your code does) and then executes the JavaScript in it, which retrieves the translation and updates the page.
Unless you use a browser driver like Selenium, I don't see a simple way to do what you want.
Plus, I'm sure Google has some protection against automated bots, so after too many requests your program will probably be blocked by reCAPTCHA.
The correct way to translate text is to use the Google Cloud Translation API, which I think is free for up to 500,000 characters per month. Microsoft's Azure Translator also has a free tier.
Your issue is likely in how you are URL-encoding the string.
http:UrlEncode(Message)
HttpService.UrlEncode returns the encoded string as a new value. It doesn't mutate the existing value. So you just need to store the result of the function call.
Message = http:UrlEncode(Message)
EDIT: As @Mohamed AMAZIRH pointed out, hitting this URL will only return HTML.

Using Ruby Script to perform a login

My goal is to use a Ruby script to perform a login.
The website uses JavaScript to render the login form, so I cannot use Mechanize, and I want to avoid using Selenium.
If I log in with false data, I can see in the Network section of the browser's developer tools that a request is sent to this action URL:
Request URL: https://www.example.com/admin/bocontroller/bocontroller.cfm?action=dologin
Further down I can see the Form Data:
username: Sample
password: 12345678
Based on this, I tried to write several scripts (this one being the closest, I hope):
require "net/http"
require "uri"
uri = URI.parse("https://www.example.com/admin/bocontroller/bocontroller.cfm?action=dologin")
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Post.new(uri.request_uri)
request.set_form_data({'username' => 'Sample', 'password' => '12345678'})
request["Content-Type"] = "application/json"
response = http.request(request)
Unfortunately, my script just stops running, and I am kind of lost. Can anyone give me some hints to point me in the right direction? Is this the right approach?
As it seems to have gained some traction as a comment, I thought I'd move it to an answer.
There's a good chance this is timing out because of the site's CSRF protection. Here's a link to the Rails docs explaining it: https://guides.rubyonrails.org/security.html#csrf-countermeasures.
In a nutshell, sites send (and require) an authenticity token with any potentially state-changing request (POST, PUT, DELETE, etc.), to prevent people from sending such requests from outside the site, as you're doing.
I'm not suggesting you have ill intent; the point is that the token stops anyone from driving the site in ways it doesn't intend, such as scripting requests to gain access they shouldn't have.
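If the site does use a CSRF token, the usual workaround is to fetch the login page first, pick up the token and the session cookie, and send both back with the POST. This is a hypothetical sketch: extract_token, login and the hidden-field pattern are assumptions, it assumes the login page and the action URL share a host, and it only helps if the token is present in the page's HTML (if JavaScript injects it, you are back to needing a browser driver). It also turns on use_ssl, which Net::HTTP.new does not do by itself for an https URL, and it keeps the form-encoded Content-Type that set_form_data sets instead of overriding it with application/json.

```ruby
require 'net/http'
require 'uri'

# Hypothetical pattern: a hidden input whose name contains "token" and whose
# value attribute comes after the name attribute, e.g.
#   <input type="hidden" name="authenticity_token" value="...">
# Check the real page source; field names and attribute order vary.
TOKEN_INPUT = /<input[^>]*name="(?<name>[^"]*token[^"]*)"[^>]*value="(?<value>[^"]*)"/i

# Returns [field_name, value], or nil when no matching input is found.
def extract_token(html)
  m = TOKEN_INPUT.match(html)
  m && [m[:name], m[:value]]
end

# GET the login page (for the token and session cookie), then POST the form
# back over the same connection.
def login(page_url, action_url, username, password)
  page_uri = URI(page_url)
  action_uri = URI(action_url)
  Net::HTTP.start(page_uri.host, page_uri.port, use_ssl: page_uri.scheme == 'https') do |http|
    page = http.get(page_uri.request_uri)
    # Keep only the "name=value" part of each Set-Cookie header.
    cookies = page.get_fields('Set-Cookie')&.map { |c| c.split(';').first }

    form = { 'username' => username, 'password' => password }
    name, value = extract_token(page.body.to_s)
    form[name] = value if name

    post = Net::HTTP::Post.new(action_uri.request_uri)
    post.set_form_data(form) # application/x-www-form-urlencoded body
    post['Cookie'] = cookies.join('; ') if cookies
    http.request(post)
  end
end
```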

Ruby - How can I follow a .php link through a request and get the redirect link?

Firstly, I want to make clear that I am not familiar with Ruby at all.
I'm building a Discord bot in Go as an exercise; the bot fetches Urban Dictionary definitions and sends them to whoever asked in Discord.
However, UD doesn't have an official API, so I'm using this one. It's a Heroku app written in Ruby; from what I understood, it scrapes the UD page for the given search.
I want to add a random-definition command to my bot, but the API doesn't support it, so I want to add it myself.
As I see it, this shouldn't be hard, since http://www.urbandictionary.com/random.php just redirects you to a normal page on the site. If I can follow it to the "normal" link, I can pass that link to the existing scraper and it will return results like any other link.
I have no idea how to follow the redirect and was hoping for some pointers or samples.
Here's the "ruby" way using net/http and uri
require 'net/http'
require 'uri'
uri = URI('http://www.urbandictionary.com/random.php')
response = Net::HTTP.get_response(uri)
response['Location']
# => "http://www.urbandictionary.com/define.php?term=water+bong"
Urban Dictionary is using an HTTP redirect (a 302 status code, in this case), so the "new" URL is passed back in an HTTP header (Location). To get a better idea of what the above is doing, here's a way using just curl and a system call:
`curl -I 'http://www.urbandictionary.com/random.php'`. # Get the headers using curl -I
split("\r\n"). # Split on line breaks
find{|header| header =~ /^Location:/i}. # Get the 'Location' header (any case)
split(' '). # Split on spaces
last # Get the last element in the split array
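If you'd rather not shell out, the same idea can be extended in plain Ruby to follow a whole chain of redirects (for example an http-to-https hop followed by the random-page hop). A minimal sketch with hypothetical helper names final_url and next_uri; it resolves relative Location values with URI.join and gives up after a fixed number of hops (5 here, an arbitrary choice). Note that it does request the final page once to confirm it is not another redirect.

```ruby
require 'net/http'
require 'uri'

# Follow 3xx responses that carry a Location header, up to `limit` hops,
# and return the URL of the first non-redirect response.
def final_url(url, limit = 5)
  uri = URI(url)
  limit.times do
    response = Net::HTTP.get_response(uri)
    location = response['Location']
    return uri.to_s unless response.is_a?(Net::HTTPRedirection) && location
    uri = next_uri(uri, location)
  end
  raise "too many redirects starting from #{url}"
end

# Location may be relative ("/define.php?...") or absolute; resolve it
# against the URL that was just requested.
def next_uri(current, location)
  URI.join(current.to_s, location)
end

# Usage (needs network access):
# puts final_url('http://www.urbandictionary.com/random.php')
```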

Ruby basic syntax and Net::HTTP

I am completely new to ruby. I have the following code:
require 'net/http'
body = "hello"
site = "api.mysite.net"
port = 80
conn = Net::HTTP.new(site, port)
resp, data = conn.post("/v1/profile", body, {})
puts body
My questions are:
Where can I find documentation on how Net::HTTP.new, conn.post, etc. work?
What does the comma between resp and data mean?
Why does puts body give me nothing, even though I defined it as "hello" at the start? I figured passing it through post would assign it a value, but instead puts resp.body is what actually gives me the HTTP response.
This is all so new to me, just trying to get a handle on things.
Read the docs I guess, but you will need background knowledge on HTTP to really understand it.
That's shorthand for assigning two variables at the same time, assuming the right-hand side returns an array of 2 (or more) items.
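A few plain-Ruby lines show the rules. With a single non-Array value on the right, such as the one Net::HTTPResponse object that Net::HTTP#get and #post return in modern Ruby, the first variable gets the whole object and the second ends up nil.

```ruby
# An Array on the right is spread across the variables on the left.
resp, data = ['response', 'body']

# Extra elements are dropped; missing ones become nil.
first, second = [1, 2, 3]
x, y = [1]

# A single non-Array object goes entirely to the first variable.
response_like = Object.new
r, d = response_like

p [resp, data]     # prints ["response", "body"]
p [first, second]  # prints [1, 2]
p [x, y]           # prints [1, nil]
p d                # prints nil
```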
You posted body in your request; resp.body is the body of the response. I don't know why body would be empty, though. I would double-check that, but if anything it sounds like a side effect of conn.post.
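To see the two bodies separately, you can build the request object without sending it. A minimal sketch; build_profile_post and send_profile_post are hypothetical names, and the text/plain Content-Type is an assumption, since the original passed an empty header hash.

```ruby
require 'net/http'

# Build (but don't send) the POST: the *request* body is just the string
# you passed in, and nothing is written back into your variable.
def build_profile_post(body)
  req = Net::HTTP::Post.new('/v1/profile')
  req.body = body
  req['Content-Type'] = 'text/plain' # assumption: the original sent no headers
  req
end

# Sending it returns one Net::HTTPResponse; the server's reply is resp.body.
def send_profile_post(host, port, body)
  Net::HTTP.start(host, port) do |http|
    resp = http.request(build_profile_post(body))
    [resp.code, resp.body]
  end
end

# Usage (needs a reachable server):
# code, reply = send_profile_post('api.mysite.net', 80, 'hello')
```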
BTW, there are several nice third-party gems that make HTTP client development much easier than dealing with Net::HTTP directly, e.g. rest-client, Excon and HTTParty. Check these out. Or, if you want to stick with the standard library, also look at OpenURI (open-uri) as a higher-level API.

Grab Facebook signed_request with Sinatra

I'm trying to figure out whether or not a user likes our brand page. Based off of that, we want to show either a like button or some 'thank you' text.
I'm working with a Sinatra application hosted on Heroku.
I tried the code from this thread: Decoding Facebook's signed request in Ruby/Sinatra
However, it doesn't seem to grab the signed_request and I can't figure out why.
I have the following methods:
get "/tab" do
@encoded_request = params[:signed_request]
@json_request = decode_data(@encoded_request)
@signed_request = Crack::JSON.parse(@json_request)
erb :index
end
# used by Canvas apps - redirect the POST to be a regular GET
post "/tab" do
@encoded_request = params[:signed_request]
@json_request = decode_data(@encoded_request)
@signed_request = Crack::JSON.parse(@json_request)
redirect '/tab'
end
I also have the helper methods from that thread, as they seem to make sense to me:
require 'base64'
helpers do
def base64_url_decode(payload)
encoded_str = payload.gsub('-','+').gsub('_','/')
encoded_str += '=' while !(encoded_str.size % 4).zero?
Base64.decode64(encoded_str)
end
def decode_data(signed_request)
# signed_request is "<signature>.<payload>"; only the payload is base64-encoded JSON
_signature, payload = signed_request.split('.', 2)
base64_url_decode(payload)
end
end
However, when I just do
@encoded_request = params[:signed_request]
and read that out in my view with:
<%= @encoded_request %>
I get nothing at all.
Shouldn't this return at least something? My app seems to be crashing because, well, there's nothing to decode.
I can't seem to find a lot of information about this around the internet so I'd be glad if someone could help me out.
Are there better ways to know whether or not a user likes our page? Or, is this the way to go and am I just overlooking something obvious?
Thanks!
The hint should be in your app crashing because there's nothing to decode.
I suspect the parameters get lost when redirecting. Think about it at the HTTP level:
1. The client posts to /tab with the signed_request in the params.
2. The app parses the signed_request and stores the result in instance variables.
3. The app redirects to /tab, i.e. sends a response with code 302 (or similar) and a Location header pointing to /tab. This completes the request/response cycle and the instance variables get discarded.
4. The client makes a new request: a GET to /tab. Because of the way redirects work, this will no longer have the params that were sent with the original POST.
5. The app tries to parse the signed_request param but crashes because no such param was sent.
The simplest solution would be to just render the template in response to the POST instead of redirecting.
If you really need to redirect, you need to carefully pass along the signed_request as query parameters in the redirect path. At least that's a solution I've used in the past. There may be simpler ways to solve this, or libraries that handle some of this for you.
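A minimal sketch of the redirect approach in plain Ruby (no Sinatra, so it runs on its own). redirect_path_with and decode_signed_request are hypothetical names, app_secret stands in for your Facebook app secret, and the check assumes the usual signed_request format: "signature.payload", both base64url without padding, with the signature being HMAC-SHA256 of the payload segment. In the POST route you would call redirect redirect_path_with(params[:signed_request]), and the GET route would then find the value in params[:signed_request] again. Returning nil for a missing or invalid request avoids the crash described above.

```ruby
require 'base64'
require 'json'
require 'openssl'
require 'uri'

# Carry signed_request across the redirect as a query parameter, because
# the follow-up GET will not include the original POST params.
def redirect_path_with(signed_request)
  '/tab?' + URI.encode_www_form('signed_request' => signed_request)
end

# base64url decoding after restoring any stripped '=' padding.
def base64_url_decode(str)
  str += '=' * ((4 - str.length % 4) % 4)
  Base64.urlsafe_decode64(str)
end

# Returns the decoded payload Hash, or nil if the value is missing or the
# signature does not match. (A constant-time comparison would be safer.)
def decode_signed_request(signed_request, app_secret)
  return nil if signed_request.nil? || !signed_request.include?('.')
  encoded_sig, payload = signed_request.split('.', 2)
  expected = OpenSSL::HMAC.digest('SHA256', app_secret, payload)
  return nil unless base64_url_decode(encoded_sig) == expected
  JSON.parse(base64_url_decode(payload))
end
```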

Resources