I'll soon be doing a presentation on the basics of HTTP for colleagues where I work.
I've done this sort of thing a number of times, and one thing I like to do is telnet directly to an http server and send the various headers that way. The idea is to show the simplicity of the protocol, and remove browsers from the discussion.
In the past, I've copied the headers from a text document to avoid typos and timeouts. So, it goes something like this:
telnet to somewebserver.com:80
For the first go, simply type in GET, etc. This emphasizes the fact that it's simple, just text, etc.
For later requests copy and paste the request from a text document.
Etc...
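For reference, a first pass over telnet might look roughly like this (the host name and the response are just placeholders; the blank line after the headers is what ends the request):

    $ telnet somewebserver.com 80
    GET / HTTP/1.1
    Host: somewebserver.com
    Connection: close

    HTTP/1.1 200 OK
    Content-Type: text/html
    ...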
It would be nice if there was a way to replay previous commands, similar to the way various shells' history works. However, searching for http [interactive] shell is a bleak wasteland of irrelevance.
Does such a thing exist? Or am I off base in my search terms? Any advice is welcome, including suggestions about other tools or tips for building my own.
I'll likely be doing the presentation on a Macintosh.
Thanks!
Greg
My answer is based on @BenjaminW's.
HTTPie appears to do what I want.
https://httpie.org/
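For the replay aspect specifically, HTTPie runs as an ordinary shell command, so previous requests come back via the shell's own history (up-arrow, Ctrl-R). A couple of illustrative invocations, with a placeholder URL:

    # GET is the default method; -v/--verbose also prints the outgoing request
    http -v http://example.org/

    # send a POST with a JSON body built from name=value items
    http POST http://example.org/api name=Greg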
I suggest you use Postman. I've used it to develop RESTful APIs for several years; it can send any kind of HTTP request and sync your browser's cookies. I think it's also a great tool for managing your HTTP requests.
Related
I am playing a Flash-only game that uses AJAX to communicate with the server. The problem is that all the data is "drawn" and most of it is not copy/pastable, so I end up retyping URLs and similar things from parts of it (e.g., from the chat).
I thought I'd make a simple page action extension for Chrome that would intercept all the AJAX communication between the game and the server, the way Developer tools can do it, and display only the data I'm interested in (parsing URLs and similar stuff is a no-brainer).
However, looking around the internet, I've found no info on how to do this. Many sites (including answers to some questions here) mention using Developer Tools (I'd prefer a page action extension, simple enough to share with other players, but any other automation is welcome as well), and some mention chrome.webRequest (which seems to be able to provide only the headers)...
I also thought of making a content script along the lines of this answer, but since I'm trying to read the data between a Flash applet (not a web page) and a server, I don't think injecting JavaScript code is possible.
So, my question is: can this be done and, if yes, how?
In case anyone got the wrong idea, the aim of this is only to monitor the communication and extract the parts I'd want to be able to copy/paste, not to change any data (i.e., the purpose is to simplify gameplay, not to cheat).
So I am attached to this rather annoying project where a client's client is all nitpicky about the little things and is giving my guy hell, who is gladly returning the favor by following the good old rule of shoving shi* down the chain of command.
Now my question. The application basically consists of three different mini-projects: the backend interface for the administrator, the backend interface for the client, and the frontend for everyone.
I was specifically asked to apply mod_rewrite rules to make things SEO-friendly. That was the ultimate aim, so this was basically an exercise in making things more search-friendly rather than making the links look nicer.
So I worked on the frontend, which is basically the landing page for everyone. It looks beautiful; the links are at worst followed by a single slash.
My client's issue: he wants to know why the backend interfaces for the admin and the user are still displaying those gigantic, ugly links. And these are very, very ugly links; I am talking three to four slashes followed by various GET parameter strings and whatnot, so you can probably understand the complexity of mod_rewriting something like this.
On the spur of the moment I said that I left it the way it was to make sure the backend interface wouldn't be sniffed up by any crawlers.
But I am not sure that's necessarily true. Where do crawlers stop? When do they give up on trying to parse links? I know I can use a robots.txt file to specify rules. But, as indigenous creatures, what are their instincts?
I know this is more of a rant than anything and I am running a very high risk of having my first question rejected :| But hey, it feels good to have this off my chest.
Cheers!
Where do crawlers stop? When do they give up on trying to parse links?
Robots.txt does not work for all bots; badly behaved crawlers simply ignore it.
You can use basic authentication or IP-restricted access to hide the back-end, provided the front-end doesn't need any of the back-end files.
If that's not practicable, try sending 404 or 401 status codes for back-end files. But this is just an idea, with no guarantee.
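For example, with Apache (which the mod_rewrite mention suggests you're on), the back-end directories could be protected with something like the following .htaccess; the realm name, password-file path and IP range are placeholders:

    # Apache 2.4-style: allow either a valid login or a whitelisted IP range
    AuthType Basic
    AuthName "Admin area"
    AuthUserFile /path/to/.htpasswd
    <RequireAny>
        Require valid-user
        Require ip 203.0.113.0/24
    </RequireAny>

And a robots.txt entry only keeps out the crawlers that choose to honor it (paths made up):

    User-agent: *
    Disallow: /admin/
    Disallow: /client/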
But, as indigenous creatures, what are their instincts?
They follow hyperlinks, and they also pick up URLs from browser toolbars and pre-activated browser-side features such as malware, spam and fraud warnings...
With regard to Google's AJAX crawling spec, if the server returns one thing (namely, a JavaScript-heavy file) for a #! URL and something else (namely, an "HTML snapshot" of the page) to Googlebot when the #! is replaced with ?_escaped_fragment_=, that feels like cloaking to me. After all, how is Googlebot sure that the server is returning good-faith equivalents for both the #! and ?_escaped_fragment_= URLs? Yet this is what the AJAX crawling spec actually tells webmasters to do. Am I missing something? How is Googlebot sure that the server is returning the same content in both cases?
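For concreteness, the rewriting the spec describes looks something like this (the URL is made up; certain special characters in the fragment get percent-encoded when it is moved into the _escaped_fragment_ parameter):

    What the browser shows:   http://example.com/page#!state=photos
    What Googlebot requests:  http://example.com/page?_escaped_fragment_=state=photos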
The crawler does not know. But it never knows for sure, even for sites that return plain ol' HTML - it is extremely easy to write code that cloaks the site based on the HTTP headers crawlers send or their known IP addresses.
See this related question: How does Google Know you are Cloaking?
Most of it seems like conjecture, but it seems likely there are various checks in place, ranging from spoofing normal browser headers to an actual person looking at the page.
Continuing the conjecture, it certainly wouldn't be beyond the capabilities of programmers at Google to write a crawler that actually retrieves what the user sees - after all, they have their own browser that does just that. It would be prohibitively CPU-expensive to do that all the time, but it probably makes sense for the occasional spot check.
I came across this:
https://github.com/archiloque/rest-client ...and it seems fairly simple and straightforward. But working with third-party APIs is new to me, so I'm not sure what's important in a library and, most of all, which is easiest to use.
Does rest-client offer any advantage over the standard Net::HTTP?
I also found https://github.com/jnunemaker/httparty, though it doesn't seem to be as well documented as rest-client, or even this one: https://github.com/dbalatero/typhoeus. Are they better than the standard library?
Any thoughts, suggestions?
Net::HTTP is meant to be a low-level library for accessing networked resources. The third-party libraries make up for some of the difficulties that you'd otherwise have to handle yourself. To name a few:
Handling redirect codes
Implementing multipart file uploads
Storing cookies between requests
HTTP exception handling
Parsing responses (HTML, JSON, etc.)
Managing authentication/SSL on secure sites
In general, the authors of those libraries have taken extra care to make their API easy to use compared to Net::HTTP.
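As a rough sketch of the difference (not tied to any particular gem version, and the URL is a placeholder), following redirects is a one-liner with rest-client but your own loop with Net::HTTP:

    # rest-client: redirects on GET are followed for you
    require 'rest-client'   # 'rest_client' in older versions of the gem
    body = RestClient.get('http://example.com/some/page').body

    # Net::HTTP: you chase the Location header yourself
    require 'net/http'
    require 'uri'

    def fetch(url, limit = 5)
      raise 'too many redirects' if limit.zero?
      response = Net::HTTP.get_response(URI(url))
      case response
      when Net::HTTPRedirection
        fetch(response['location'], limit - 1)
      else
        response.body
      end
    end

    body = fetch('http://example.com/some/page')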
Also, I've found Mechanize to be a more complete solution for my needs than rest-client. For example, with rest-client you will still have to implement storing cookies between requests and handling redirects on POST requests.
You may find this short article from Adam Wiggins, the original author of RestClient, useful:
http://adam.heroku.com/past/2008/8/8/ruby_libs_for_making_web/
I personally am using httparty in my project - this was the previous developer's choice, but it works pretty well for me.
I wish to programmatically download a webpage which requires a login to view. Is there any sane way of doing this? By looking at HTTP headers and such, I can see the username/password being passed as POST data, but requesting a page with this info attached isn't good enough. I think cookies are involved too, and it looks like they contain some kind of encrypted authorisation data.
Is there any way of faking this? Language isn't too important here, but something like Perl that can be run on Linux with relative ease would be nice. Or maybe a command line browser could be scripted?
Yes, you can do this via the curl command-line tool or the libcurl library. You need to figure out what's supposed to be in the cookies, and then pass them with curl's -b option or the equivalent libcurl option.
You can also perform HTTP Basic authentication via CURL.
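As a sketch (the URL, form field names and file names are made up; the real ones come from the POST data you already saw in the headers):

    # 1. Submit the login form and store whatever cookies come back
    curl -c cookies.txt -d 'username=me' -d 'password=secret' https://example.com/login

    # 2. Send those cookies back when fetching the protected page
    curl -b cookies.txt -L https://example.com/members/page.html

    # If the site actually uses HTTP Basic authentication, it's simply:
    curl -u me:secret https://example.com/members/page.html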
If the page is really sophisticated, you'll have to do HTML parsing or even JS interpretation to extract the cookie data beforehand. That's still doable, but not with CURL alone.
As a general note, anything a web browser can do can be scripted. Turing-completeness and all that. "Unscriptable" captive portals like BlueSocket sells are a load of bunk; they're basically just obfuscated web pages. They'll slow you down but can never, ever stop you - they have to give you the keys in order to work!
PHP's cURL extension would do it. Also check here if this solution is right for you.