Should I use hashbang/shebang? - ajax

I read here that the idea of the shebang (#!) was so Google knows that an alternative conventional URL exists providing the same page "state".
So, if I don't have conventional URLs corresponding to these hash-states, am I right to say that I should be using just a hash and not a shebang?
Background: The hashes are created based on a search form, and the search results are loaded on the same page. The hashes are there so that people can go back to the URL with the hashes and repeat the same search.
More broadly, is there a reason I should have real URLs corresponding to my hashes?

To be clear, is this a JavaScript-powered site? As far as I'm concerned, that would be the only reason one might need to use a #!. With that said, even the #! can be full of issues, as was seen for a while during the Gawker switch earlier this year.
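For the plain-hash approach the question describes, the usual pattern is to serialize the search parameters into the fragment when a search runs and read them back on page load. A minimal JavaScript sketch (the parameter names and the runSearch function are placeholders for your own search code):

    // Save the search state in the fragment when a search runs.
    function saveSearchState(query, category) {
      // encodeURIComponent keeps special characters from breaking the fragment
      location.hash = 'q=' + encodeURIComponent(query) +
                      '&cat=' + encodeURIComponent(category);
    }

    // On page load (and on back/forward), restore and repeat the search.
    function restoreSearchState() {
      var params = {};
      location.hash.replace(/^#/, '').split('&').forEach(function (pair) {
        var parts = pair.split('=');
        if (parts[0]) params[parts[0]] = decodeURIComponent(parts[1] || '');
      });
      if (params.q) runSearch(params.q, params.cat); // runSearch: your own code
    }

    window.addEventListener('hashchange', restoreSearchState);
    restoreSearchState(); // handles a bookmarked URL on first load

Note that nothing here requires the bang: a plain # works, because the fragment is never sent to the server either way.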

Related

Sphinxdoc/Readthedocs: redirect a "flat" structure to a hierarchical one

I have a Sphinx documentation project on Read the Docs with a large number (>1000) of hierarchically sorted pages: https://iraf.readthedocs.io/en/latest/tasks/index.html.
I now also need to access them in a "flat" structure, i.e.
https://iraf.readthedocs.io/en/latest/tasks/addstar.html
should redirect to
https://iraf.readthedocs.io/en/latest/tasks/noao/digiphot/daophot/addstar.html
How could one do this? The exact URL does not matter; what matters is that the pages can be reached just by name (like addstar in the example). There are a few conflicts (i.e. the same name appears at different places in the hierarchy), but they could be resolved with a pragmatic "take the first one" approach.
The Sphinx extension sphinx-reredirects might do what you want.
You could also configure redirects in Read the Docs itself through User-defined Redirects.
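With sphinx-reredirects, the mapping lives in conf.py as a redirects dict, and with over 1000 pages you would generate it rather than write it by hand. A sketch under the assumption that the deep pages are .rst sources under tasks/ next to conf.py (the layout is taken from the question; the glob is an assumption):

    # conf.py -- sketch using the sphinx-reredirects extension
    import pathlib

    extensions = ["sphinx_reredirects"]

    # Map each flat docname ("tasks/addstar") to its deep page, taking the
    # first match when a name occurs more than once in the hierarchy.
    _docs = pathlib.Path(__file__).parent
    redirects = {}
    for rst in sorted(_docs.glob("tasks/**/*.rst")):
        rel = rst.relative_to(_docs)      # e.g. tasks/noao/digiphot/daophot/addstar.rst
        if len(rel.parts) <= 2:
            continue                      # already at the flat level
        flat = "tasks/" + rel.stem
        if flat not in redirects:
            # the target is resolved relative to the generated flat redirect page
            redirects[flat] = "/".join(rel.parts[1:-1]) + "/" + rel.stem + ".html"

sorted() makes the "take the first one" conflict rule deterministic.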

How do I take each line of a text file and insert them into a web form? Specifically, for testing domain name availability

I wrote a Ruby script that appended "data" to the beginning of every word in the English dictionary and then filtered out various strings using different parameters. Now I want to use a site like Namecheap or gandi.net to take each of these strings and insert them into the domain name availability checker, to determine which ones are available.
It is my understanding that this will involve making an HTTP POST request of some kind, as well as grabbing the element in question, but I don't really know what to read about in order to learn how to do this kind of thing.
I imagine that after a few requests I will be limited, but as a learning exercise I am still curious as to how I would go about doing this.
I inspected the element (on Namecheap) to see what the tag looked like and to find any uniquely identifiable class/id names I could use to grab that specific part of the source, and found that inside a fieldset tag there was a line of HTML that I can't seem to paste here, so here is a picture: [screenshot not preserved]
Thanks in advance for any guidance in helping me learn about web scripting!
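The mechanics the question describes would look roughly like this in Ruby. A sketch only: the endpoint URL, form field, and result markup below are placeholders you would have to discover via the browser's network panel (or, better, replace with the registrar's official API):

    # Sketch: POST each candidate to an availability checker and scan the response.
    require 'net/http'
    require 'uri'

    uri = URI('https://www.example.com/domain-check')  # hypothetical endpoint

    File.readlines('candidates.txt', chomp: true).each do |word|
      response = Net::HTTP.post_form(uri, 'domain' => "#{word}.com")
      available = response.body.include?('is available')  # placeholder check
      puts "#{word}.com: #{available ? 'available' : 'taken'}"
      sleep 2  # be polite; rapid-fire requests will get rate-limited, as you suspect
    end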

Shorter link URLs in Wicket

I am creating an application with a lot of links. Because the links are contained in table cells, the URLs that Wicket generates tend to get long, making the page slower to load.
For example, a single cell link labelled 2011-06-09 00:00:00.0 ends up with a very long generated URL.
I tried to figure out where to start exploring the encoding/decoding of URLs, but it is rather complex material. My first approach was just to use 'short' names for components (like "t", "f", etc.). I can imagine there is a better approach.
I can also imagine it would be possible just to 'number' the links, since the page still exists; the same cell link would then end up with a much shorter URL.
Are there solutions for my problem already out there, or can anyone point me in the right direction?
If a JavaScript solution is acceptable, you can use a single event listener on the whole table instead of many links in the table.
See this example for an inspiration:
https://github.com/svenmeier/apachecon-wicket/tree/master/src/main/java/eu/apachecon/base/ui/performance
Notice how the Ajax behavior transports dynamic extra parameters to the server. It looks for rows only, though; if you need to distinguish between table cells being clicked, you'll have to expand on the idea.
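In plain JavaScript terms the idea is event delegation: one listener on the table, with the clicked cell identified from the event target. A sketch (the table id and the sendAjax function are placeholders):

    // One listener for the whole table instead of one URL per cell link.
    document.getElementById('results').addEventListener('click', function (e) {
      var cell = e.target.closest('td');
      if (!cell) return;
      // Send coordinates as dynamic extra parameters of a single Ajax
      // behavior, rather than rendering a full Wicket URL into every cell.
      sendAjax({ row: cell.parentNode.rowIndex, col: cell.cellIndex });
    });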
The solution suggested by Sven is the better one.
Here is a solution you might call fundamental: register your own root IRequestMapper that compresses/uncompresses the URLs generated by the real mappers. See CryptoMapper and HttpsMapper for examples of custom root mappers.
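A custom root mapper is installed in the application's init(). A minimal sketch using the built-in CryptoMapper as the wrapping example (imports omitted; a URL-compressing mapper of your own would be installed the same way):

    // In your WebApplication subclass
    @Override
    protected void init() {
        super.init();
        // CryptoMapper wraps the existing root mapper and rewrites every
        // URL the wrapped mappers produce.
        setRootRequestMapper(new CryptoMapper(getRootRequestMapper(), this));
    }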

How do I scrape a site, with multiple pages, and create one single html page with Ruby?

So what I would like to do is scrape this site: http://boxerbiography.blogspot.com/
and create one HTML page that I can either print or send to my Kindle.
I am thinking of using Hpricot, but am not too sure how to proceed.
How do I set it up so it recursively checks each link, gets the HTML, either stores it in a variable or dumps it to the main HTML page and then goes back to the table of contents and keeps doing that?
You don't have to tell me EXACTLY how to do it, but just the theory behind how I might want to approach it.
Do I literally have to look at the source of one of the articles (which is EXTREMELY ugly btw), e.g. view-source:http://boxerbiography.blogspot.com/2006/12/10-progamer-lim-yohwan-e-sports-icon.html and manually programme the script to extract text between certain tags (e.g. h3, p, etc.)?
If I do that approach, then I will have to look at each individual source for each chapter/article and then do that. Kinda defeats the purpose of writing a script to do it, no?
Ideally I would like a script that is able to tell the difference between JS and other code and the actual 'text', and dump the text (formatted with the proper headings and such).
Would really appreciate some guidance.
Thanks.
I'd recommend using Nokogiri instead of Hpricot. It's more robust, uses fewer resources, has fewer bugs, is easier to use, and is faster.
I did some extensive scraping for work at one time and had to switch to Nokogiri, because Hpricot would crash inexplicably on some pages.
Check this RailsCast:
http://railscasts.com/episodes/190-screen-scraping-with-nokogiri
and:
http://nokogiri.org/
http://www.rubyinside.com/nokogiri-ruby-html-parser-and-xml-parser-1288.html
http://www.engineyard.com/blog/2010/getting-started-with-nokogiri/
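As a sketch of the overall approach with Nokogiri (the selectors and the URL filter are guesses about Blogger's markup; inspect the real pages and adjust):

    # Sketch: collect every article linked from the table of contents
    # into one HTML file.
    require 'nokogiri'
    require 'open-uri'

    toc = Nokogiri::HTML(URI.open('http://boxerbiography.blogspot.com/'))
    output = +'<html><body>'

    article_urls = toc.css('a').map { |a| a['href'] }.compact.uniq
                      .select { |href| href =~ %r{blogspot\.com/\d{4}/} }  # post URLs only (heuristic)

    article_urls.each do |url|
      page  = Nokogiri::HTML(URI.open(url))
      title = page.at_css('h3')  # assumed: the post title is the first h3
      output << "<h3>#{title && title.text}</h3>"
      output << page.css('p').map(&:to_html).join  # paragraphs only
    end

    File.write('book.html', output << '</body></html>')

This parses the rendered HTML only, so scripts never execute; selecting just the h3/p nodes is what drops the JS and boilerplate the question worries about.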

What is the shebang/hashbang for?

Is there any other use for shebangs/hashbangs besides making AJAX content crawlable for Google? Or is that it?
The hash, when used in a URL, has existed since long before Ajax was invented.
It was originally intended as a reference to a sub-section within a page. In this context, you would, for example, have a table of contents at the top of a page, each of which would be a hash link to a section of the same page. When you click on these links, the page scrolls down (or up) to the relevant marker.
When the browser receives a URL with a hash in it, only the part of the address before the hash is sent to the server as a page request. The hash part is kept by the browser to deal with itself and scroll the page to the relevant position.
This is what the hash syntax was originally intended for, so this is the direct answer to your question. But I'll carry on a bit and explain how we got from there to where we are now...
When Ajax was invented, people started wanting to find ways to have a single page on their site, but still have links that people could click on externally to get directly to the relevant content.
Developers quickly realised that the existing hash syntax could do this for them, because it is possible to read the URL's hash value from within JavaScript. All you have to do then is stop the browser from scrolling when it sees a hash (which is easy enough), and you've got a bit of the URL which is effectively ignored by the browser but can be read and written by JavaScript; perfect for use with Ajax. The fact that Google includes the hash part of a URL in its searches was just a lucky bonus to begin with, but has become quite important as the technique has become more widespread.
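A minimal sketch of that pattern: the fragment is read by script and used to fetch content, and it is never sent to the server (loadContent is a placeholder for your own Ajax call):

    // React to fragment changes instead of full page loads.
    window.addEventListener('hashchange', route);

    function route() {
      var state = location.hash.slice(1);   // '#!/photos/42' -> '!/photos/42'
      if (!state) return;
      loadContent(state.replace(/^!\//, '')); // strip the optional 'bang'
    }

    route(); // also handle a deep link on the initial page load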
I note that people are calling this hash syntax a "shebang" or "hashbang", but technically that's incorrect; it's just a hash that is relevant -- the 'bang' part of the word "hashbang" refers to an exclamation mark ('bang' is a printing industry term for it). Some URLs may indeed add an exclamation mark after the hash, but only the hash is relevant to the browser; the string after it is entirely up to the site's authors; it may include an exclamation mark or not as they choose, but either way the browser won't do anything with it. Feel free to keep calling it a hashbang or shebang if you like, but understand that only the hash is of significance.
The actual term "shebang" or "hashbang" goes back a lot further, and does refer to a #! syntax, but not in the context of a URL.
The original meaning of the term was the use of these symbols at the beginning of a Unix script file, to tell the operating system which interpreter to run the script with (for example, a script starting with #!/usr/bin/env ruby is executed by Ruby).
So this is indeed an answer to your question the way you've worded it, but it is probably not what you meant, since it has nothing to do with URLs at all.
