I'm trying to use HtmlUnit to scrape a site, but the start page requires support for navigator.mediaDevices, which does not appear to be supported in HtmlUnit.
The returned page contains a piece of JavaScript like this:
if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia)
{
    ...
    window.stop();
}
Is there a way to get around this?
Or is there another programmable "browser" that supports this feature?
Yes, this is not supported at the moment. Please open an issue on GitHub; adding support is not that tricky.
lang and trying to use a framework called iris-web.
I want to know how to properly get the HTTP referrer and user agent.
I've been through the documentation PDF, but found no clear explanation.
Any help would be deeply appreciated.
As in regular Go net/http, where request is an *http.Request:
request.Header.Get("User-Agent")
request.Header.Get("Referer")
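import "github.com/kataras/iris"

// [...]
app := iris.New()
app.Get("/", func(ctx iris.Context) {
    uag := ctx.GetHeader("User-Agent")
    ref := ctx.GetHeader("Referer") // the HTTP header name is spelled "Referer"
    ctx.Writef("User-Agent: %s\nReferer: %s\n", uag, ref)
})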
Iris provides features that other frameworks don't have, and at the same time it's faster. As a GitHub project, Iris is more active than the others. There is no reason to suggest other, lesser frameworks when the user asks for an Iris answer.
@Sergey Lanzman What's the problem with Iris?
There is a Python library I love called Requests, an HTTP client built on urllib3 (see the Requests documentation).
I am looking for something similar in Ruby. Basically what I need is:
File upload support (multipart/form-data).
Easy GET/POST.
Cookies can be passed from a response object to a request object (to build a login script manually).
Stable and flexible.
Session support (so cookies don't have to be handled manually when we don't need to).
I've looked at Typhoeus, but the code example on the home page doesn't work; they have moved the code along and the get method is no longer directly accessible like that, so it's not starting well. Curb seems nice, and I like cURL. There is also rest-client, which seems popular, and em-http, which seems pretty fast according to benchmarks. There are also Patron and curb-fu, which I haven't had time to try. And, of course, Net::HTTP. But there doesn't seem to be a mainstream solution that everyone points to.
I think a lot of people have been in my situation, and I wonder what they have chosen and why.
The author of the comparison is the author of httpclient, but from the looks of it the comparison is fair.
For a more narrative style with some explanation of the matrix, see http://www.slideshare.net/HiroshiNakamura/rubyhttp-clients-comparison from the same author.
The comparison comes out partly in favor of httpclient, which I can also recommend. Simple, featureful, compatible with all Ruby platforms and performant. Better cookie support than anything else out there, but the presentation mentions that cookies may leak from one (malevolent) site to another if you use the same client object. Don't know if this is still true.
There is https://github.com/cyx/requests, which is exactly what the question asks for: a port of the Requests library from Python.
The built-in OpenURI is the first place to look. It's simple and handles the basics nicely.
Typhoeus, which I've used several times for parallel processes, works nicely. The documentation and codebase are available on GitHub.
irb(main):009:0> response = Typhoeus::Request.get("www.example.com")
=> #<Typhoeus::Response:0x007ffbcc067cf8 @code=302, @curl_return_code=0, @curl_error_message="No error", @status_message=nil, @http_version=nil, @headers="HTTP/1.0 302 Found\r\nLocation: http://www.iana.org/domains/example/\r\nServer: BigIP\r\nConnection: close\r\nContent-Length: 0\r\n\r\n", @body="", @time=0.035584, @requested_url=nil, @requested_http_method=nil, @start_time=nil, @start_transfer_time=0.035529, @app_connect_time=2.8e-05, @pretransfer_time=0.000429, @connect_time=2.8e-05, @name_lookup_time=2.8e-05, @request=:method => :get,
:url => www.example.com, @effective_url="HTTP://www.example.com", @primary_ip="192.0.43.10", @redirect_count=0, @mock=false>
irb(main):010:0> puts response.headers
HTTP/1.0 302 Found
Location: http://www.iana.org/domains/example/
Server: BigIP
Connection: close
Content-Length: 0
I use Net::HTTP occasionally too, but OpenURI and Typhoeus, with Hydra, have proven to be easy to use and integrate with my code.
I eventually found HTTPClient:
https://github.com/nahi/httpclient
I've started using it; it has the features I wanted, and moreover it's pretty fast according to some benchmarks. It also supports advanced features like streaming and chunked responses. It's a shame it isn't better known in the Ruby community, though. :)
Have you looked at the HTTParty gem?
If you need cookies and form handling, mechanize is the only way to go.
I'm sorry to hear that Typhoeus didn't work out for you. The reason is that the README shows how to work with Typhoeus v0.5.0.rc, which can be installed with
gem install typhoeus --pre
or, in your Gemfile (GitHub no longer serves the git:// protocol, so use https):
gem "typhoeus", git: "https://github.com/typhoeus/typhoeus.git"
Typhoeus has no session support, but other than that it could be a good fit. At least it's stable as hell, since it's built on top of libcurl.
File sending example:
Typhoeus.post("www.example.com/file", body: { file: File.open("testfile.txt","r") })
Unfortunately, there is no shortcut for dealing with cookies; you have to set them manually:
Typhoeus.get("www.example.com/needs_cookie", headers: { Cookie: "PRIVATE" })
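Setting cookies manually mostly means copying the name=value part of each Set-Cookie header from one response into the Cookie header of the next request. The helper below is a minimal sketch of that step; cookie_header_from is a hypothetical name (not part of Typhoeus or Net::HTTP), and it deliberately ignores Path, Domain, Expires and other attributes, so it is no substitute for a real cookie jar.

```ruby
# Hypothetical helper: builds a Cookie request header value from the
# Set-Cookie header values of a previous response. Only the leading
# name=value pair of each Set-Cookie is kept; attributes such as Path,
# Domain, Expires and HttpOnly are ignored.
def cookie_header_from(set_cookie_values)
  pairs = {}
  Array(set_cookie_values).each do |value|
    name_value = value.split(";", 2).first.to_s.strip
    name, val = name_value.split("=", 2)
    next if name.nil? || name.empty?
    pairs[name] = val.to_s # a later cookie with the same name overrides an earlier one
  end
  pairs.map { |name, val| "#{name}=#{val}" }.join("; ")
end

# Usage sketch (URL and form fields are placeholders; needs require "net/http"):
#   login  = Net::HTTP.post_form(URI("http://www.example.com/login"), "user" => "me")
#   cookie = cookie_header_from(login.get_fields("set-cookie"))
#   Typhoeus.get("www.example.com/needs_cookie", headers: { Cookie: cookie })
```

Keeping the helper independent of any client means the same string works for Typhoeus, Net::HTTP or any other library that accepts raw headers.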
TL;DR: I would choose Typhoeus for its speed and libcurl if you're willing to set things up yourself. Otherwise, I would look into Faraday and use it with the Typhoeus adapter.
Edit: I've added installation instructions to the README.
Edit: 0.5 is released.
This question seems to lack recent answers, so I'm filling the void.
Coming from Python myself, and having loved the Requests library for how easily it does things, I recently discovered a very nice Ruby equivalent in rest-client.
It supports all the features mentioned in the question and seems very nice from a usability perspective, which is what the Requests library aimed to achieve.
I'd like to enable GZIP compression for public assets and HTTP responses for performance. My site gets a lot of mobile traffic.
As far as I can tell, there is nothing built into Play Framework to support this, and Heroku doesn't seem to have a solution either.
What is the best way to start getting some compression on my app?
Check out: https://gist.github.com/1317626
OK, I've had a go at minifying and gzipping all STATIC resources; nobody seems to have attempted this yet.
Play provides a hook by means of a plugin, but it seems a little hacky, since you have to set all the caching headers and so on yourself.
Seems to work so far though:
https://gist.github.com/2882360
Check Minifymod module:
http://www.playmodules.net/module/7
Is there a Ruby HTTP client library where responses are automatically cached by ETag and the If-None-Match header is applied to requests for previously used URLs?
You might want to check the list of "Ruby HTTP clients features" (archived version from January 2015) for a complete overview.
Take a look at Faraday-HTTP-Cache.
rufus-jig supports conditional GET.
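To illustrate what such a library does under the hood, here is a minimal sketch of an ETag-based conditional GET using only the standard library's Net::HTTP. The ETagCache class and its injectable transport lambda are hypothetical names for illustration; the sketch assumes an in-memory, single-threaded cache and ignores Cache-Control, Vary and Last-Modified, which libraries like faraday-http-cache take into account.

```ruby
require "net/http"
require "uri"

# Hypothetical in-memory ETag cache. On a repeat request it sends the
# stored ETag in If-None-Match; a 304 Not Modified reply reuses the
# cached body instead of downloading it again.
class ETagCache
  Entry = Struct.new(:etag, :body)

  # transport: callable (uri, headers) -> [status_code, etag_or_nil, body].
  # Defaults to a plain Net::HTTP GET; injectable for testing.
  def initialize(transport = nil)
    @entries = {}
    @transport = transport || method(:net_http_get)
  end

  def get(url)
    uri = URI(url)
    cached = @entries[uri.to_s]
    headers = {}
    headers["If-None-Match"] = cached.etag if cached # revalidate the stored copy

    status, etag, body = @transport.call(uri, headers)
    return cached.body if status == 304 && cached # not modified: reuse cached body

    # Only successful responses that carry an ETag are worth caching.
    @entries[uri.to_s] = Entry.new(etag, body) if status == 200 && etag
    body
  end

  private

  def net_http_get(uri, headers)
    response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      request = Net::HTTP::Get.new(uri)
      headers.each { |name, value| request[name] = value }
      http.request(request)
    end
    [response.code.to_i, response["ETag"], response.body]
  end
end
```

The injectable transport is only a design choice to keep the caching logic testable without a network; in real use you would call ETagCache.new.get(url) and let it fall back to Net::HTTP.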