Is there any other use for shebangs/hashbangs besides making AJAX content crawlable for Google? Or is that it?
The hash, when used in a URL, has existed since long before Ajax was invented.
It was originally intended as a reference to a sub-section within a page. In this context you would, for example, have a table of contents at the top of a page, each entry of which would be a hash link to a section of the same page. When you click one of these links, the page scrolls down (or up) to the relevant marker.
When the browser receives a URL with a hash in it, only the part of the address before the hash is sent to the server as a page request. The hash part is kept by the browser to handle itself: it scrolls the page to the relevant position.
This is what the hash syntax was originally intended for, so this is the direct answer to your question. But I'll carry on a bit and explain how we got from there to where we are now...
When Ajax was invented, people started wanting to find ways to have a single page on their site, but still have links that people could click on externally to get directly to the relevant content.
Developers quickly realised that the existing hash syntax could do this for them, because it is possible to read the URL's hash value from within JavaScript. All you have to do then is stop the browser from scrolling when it sees a hash (which is easy enough), and you've got a bit of the URL which is effectively ignored by the browser but can be read and written by JavaScript; perfect for use with Ajax. The fact that Google includes the hash part of a URL in its searches was just a lucky bonus to begin with, but it has become quite important as the technique has become more widespread.
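As a rough sketch of that idea (the fragment format, endpoint and element id below are made up for illustration, and fetch is just a convenient modern stand-in for whatever Ajax mechanism you use):
// Read the fragment ("#search/cats" -> "search/cats") and load matching content via Ajax.
function loadFromHash() {
  var state = window.location.hash.replace(/^#!?/, ''); // tolerate "#" or "#!"
  if (!state) return;
  fetch('/fragments/' + encodeURIComponent(state))      // hypothetical endpoint
    .then(function (response) { return response.text(); })
    .then(function (html) {
      document.getElementById('content').innerHTML = html; // hypothetical container element
    });
}
// Run once on page load, and again whenever the hash changes (the browser fires
// "hashchange" instead of requesting a new page from the server).
window.addEventListener('hashchange', loadFromHash);
loadFromHash();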
I note that people are calling this hash syntax a "shebang" or "hashbang", but technically that's not quite right: only the hash is relevant. The 'bang' part of the word "hashbang" refers to an exclamation mark ('bang' is a printing-industry term for it). Some URLs do indeed put an exclamation mark after the hash, but only the hash means anything to the browser; the string after it is entirely up to the site's authors, and may or may not include an exclamation mark as they choose; either way the browser won't do anything with it. Feel free to keep calling it a hashbang or shebang if you like, but understand that only the hash is of significance.
The actual term "shebang" or "hashbang" goes back a lot further, and does refer to a #! syntax, but not in the context of a URL.
The original meaning of the term comes from these symbols being used at the beginning of a Unix script file, to tell the system which interpreter should be used to run the script.
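For example, the first line of a Unix script file might look like this (Node is used here purely for illustration; the point is the #! line itself):
#!/usr/bin/env node
// The "#!" line above is the shebang: it tells the system which interpreter
// should run this file. It has nothing to do with URLs.
console.log('hello from a script');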
So this is indeed an answer to your question, the way you've worded it, but is probably not what you meant, since it has nothing to do with URLs at all.
I've got a rather large asciidoc document that I translate dynamically to PDF for our developer guide. Since the doc often refers to Java classes that are documented in our JavaDocs, we converted those references into links directly in the docs, e.g.:
In this block we create a new
https://www.codenameone.com/javadoc/com/codename1/ui/Form.html[Form]
named `hi`.
This works rather well for the most part and looks great in HTML, as every reference to a class leads directly to its JavaDoc, making the reference/guide process much simpler.
However, when we generate a PDF, some pages end up with the same link repeated in several footnotes.
Normally I wouldn't mind a lot of footnotes or even repeats from a previous page. However, in this case the link to Container appears 3 times.
I could remove some of the links, but I'd rather not since they make a lot of sense in the web version; and since I also have no idea where the page breaks will land, I'd rather not do it by hand.
This looks to me like a bug somewhere: if the link is the same, the footnote for it should only be generated once.
I'm fine with removing all link footnotes from the document if that is the price to pay, although I'd rather be able to do this on a case-by-case basis so some links would remain printable.
Adding these two parameters to fo-pdf.xsl removes the footnotes:
<xsl:param name="ulink.footnotes" select="0"></xsl:param>
<xsl:param name="ulink.show" select="0"></xsl:param>
The first parameter disables footnotes, which makes the URLs reappear inline.
The second parameter removes the URLs from the text; the links remain active and clickable.
A non-zero value turns the corresponding behaviour back on.
Source:
http://docbook.sourceforge.net/release/xsl/1.78.1/doc/fo/ulink.show.html
We were looking for something similar in a slightly different situation and didn't find a solution. We ended up writing a processor that just stripped away some of the links, e.g. every repeated link to the same URL within a section (one starting with '===').
Not an ideal solution, but as far as I know it's the only way.
I am getting data from a broken RSS feed that gives me the wrong link. I wanted to fix this link, so I wrote this regular expression:
<link.*>(.*)&.*tid(.*)</link>
The broken link looks something like this:
www.somedomain.com/?value=50&burrrdurrrr;tid=120
But the real working link is in this form:
www.somedomain.com/?value=50&tid=120
What I'm asking is: if my measure looks like this:
[FeedURL]
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=[Feed]
StringIndex=2 ;now I only get www.somedomain.com/?value=50
Substitute=#SubstituteFeed#
How am I supposed to concatenate the strings together to complete the URL?
I'm guessing that rather than &burrrdurrrr;, the link actually contains &amp;, which is how you have to write & in an HTML or XML file.
If that's the case, you just need to set the DecodeCharacterReference option, as described in this handy-looking tutorial. Another option mentioned there is Substitute, which would be able to strip it out even if it really was &burrrdurrrr;.
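Purely as an illustration of what those two fixes amount to (this is plain JavaScript, not Rainmeter syntax):
// The feed value, assuming the mystery token was really the entity "&amp;":
var broken = 'www.somedomain.com/?value=50&amp;tid=120';
// Roughly what DecodeCharacterReference does for this case: turn the entity back into "&".
var decoded = broken.replace(/&amp;/g, '&');
// Roughly what a Substitute rule could do: strip whatever sits between "&" and "tid=",
// which would also work if the feed really did contain "&burrrdurrrr;".
var stripped = broken.replace(/&[^&]*tid=/, '&tid=');
console.log(decoded);  // www.somedomain.com/?value=50&tid=120
console.log(stripped); // www.somedomain.com/?value=50&tid=120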
None of this is a particularly sensible way of dealing with HTML or XML - a much better approach would be a plugin which actually parsed the document structure and let you reference nodes using XPath or CSS rules - but you work with what you've got, I guess. (I've never heard of this "Rainmeter" before, despite its claim to be "the best known and most popular desktop customization program for Windows"; maybe because nobody else calls their program that, instead almost universally using the word "widget"?)
What's a good way to parse HTML in AppleScript?
I haven't dabbled in AppleScript in quite some time, and even when I did it was very minimal and uninvolved, so I don't really think naturally in the language quite yet. But I need to do some string manipulation and parse some HTML (basically some simple screen scraping).
Naturally, I'd like to avoid common pitfalls of HTML parsing. However, this is a temporary script and doesn't need to be particularly robust or supportable. I really just need to scrape specific substrings (from a known starting substring to the next known character) into a file.
I've done plenty of string manipulation in C# and similar languages, but AppleScript is an interesting change of pace to say the least. Can somebody point me to some good resources (Google searches on this subject seem to have a high noise-to-signal ratio), or help me out with some sample code snippets?
The ultimate goal of what I'm doing is to take a pre-determined list of pages, open each one in Safari (I'm doing everything through tell application "Safari"), parse out links which fit a certain pattern, and store all of those links in a file. Then go through that file, open each of those links, parse out more links which fit another pattern, and store all of those links in a file.
(The site is actually owned by someone we're working with, so don't worry about me violating any terms of service or anything like that. But for reasons outside the scope of this question, I'm doing some page scraping in AppleScript.)
I can't say enough good things about Matt Neuburg's AppleScript: The Definitive Guide. Without a doubt the most complete documentation of AppleScript ever done. Matt's also one of my favorite tech writers.
I would also check out this article. It contains a tutorial on how to do this; the example provided there parses HTML data from only one source, but I think it's worth looking at.
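If it helps, the link-extraction step itself can also be expressed as ordinary page-side JavaScript, which Safari will run for you via its 'do JavaScript' AppleScript command and hand back as the result (the URL pattern below is just a placeholder):
// Collect the href of every link on the page whose URL matches a pattern,
// returned as one newline-separated string that the AppleScript side can
// write to a file. The pattern is a placeholder; adjust it to your site.
var pattern = /\/articles\/\d+/;
var anchors = document.getElementsByTagName('a');
var matches = [];
for (var i = 0; i < anchors.length; i++) {
  if (pattern.test(anchors[i].href)) {
    matches.push(anchors[i].href);
  }
}
matches.join('\n');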
I read here that the idea of the shebang (#!) was so that Google "knows that an alternative conventional URL exists providing the same page 'state'".
So, if I don't have conventional URLs corresponding to these hash-states, am I right to say that I should be using just a hash and not a shebang?
Background: The hashes are created based on a search form, and the search results are loaded on the same page. The hashes are there so that people can go back to the URL with the hashes and repeat the same search.
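To make that concrete, this is roughly the pattern (names simplified, and the actual Ajax call is stubbed out):
// Stand-in for the Ajax call that actually fetches and renders the results.
function loadResults(query) {
  console.log('searching for: ' + query);
}
// When the form is submitted, store the search terms in the hash instead of reloading.
function runSearch(query) {
  window.location.hash = 'q=' + encodeURIComponent(query);
  loadResults(query);
}
// When someone returns to a saved URL, repeat the search described by the hash.
window.addEventListener('load', function () {
  var match = /q=([^&]*)/.exec(window.location.hash);
  if (match) {
    loadResults(decodeURIComponent(match[1]));
  }
});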
More broadly, is there a reason I should have real URLs corresponding to my hashes?
To be clear, is this a JavaScript-powered site? As far as I'm concerned, that would be the only reason one might need to use a #!. With that said, even the #! can be full of issues, as could be seen for a while during the Gawker switch earlier this year.
What do you think: are clean URLs a backend or a frontend 'discipline'?
The answer is BOTH.
For example:
https://stackoverflow.com/questions/203278/are-clean-urls-a-backend-or-a-frontend-thing
The number above is a database id, a back-end thing. Chop off the pretty part and the URL still goes to the same page. The "are-clean-urls-a-backend-or-a-frontend-thing" slug, therefore, is the front-end part.
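A minimal sketch of that split (using Express purely for illustration; this is obviously not what Stack Overflow itself runs):
const express = require('express');
const app = express();
// Stand-in for a real database lookup, keyed by the numeric id alone (the back-end part).
const questions = {
  '203278': { id: '203278', slug: 'are-clean-urls-a-backend-or-a-frontend-thing',
              title: 'Are clean URLs a backend or a frontend thing?' }
};
app.get('/questions/:id/:slug?', (req, res) => {
  const question = questions[req.params.id];
  if (!question) return res.sendStatus(404);
  // The slug is cosmetic (the front-end part): if it's missing or wrong, redirect to the
  // canonical pretty URL so people and search engines see the readable form.
  if (req.params.slug !== question.slug) {
    return res.redirect(301, '/questions/' + question.id + '/' + question.slug);
  }
  res.send('<h1>' + question.title + '</h1>');
});
app.listen(3000);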
If we're talking about URLs being 'clean' from an end-user-experience point of view, then I'm going to break the mould a bit and say that URLs in general are not intuitive and never will be; they are intended to be machine-readable.
There is no standard for the format of a URL, so when navigating from site to site humans will never remember how to reach a resource purely by recalling URLs and their 'friendly syntax'. We can argue the toss about whether a '?' and '&' or a '/' is the better way to identify a resource in a URL; it doesn't matter. At the end of the day a machine parses it and sends back the result.
We should stop deluding ourselves that people actually type these things in and realise that URIs are for machines, not people.
I have yet to use or remember a URI beyond the first few characters of the http://domain.com/ part of an address, and I've been using the web for a long time. That's what bookmarks are for. Nowhere on a website does it say 'change this part of our URL to view such-and-such a resource', because URLs are usually undocumented and opaque.
Yes, make your URIs SEO-friendly (hell, even those change periodically), but forget about the whole 'human/clean' resource-identifier thing; it's a mystical pipe dream.
I agree with Vlion that URLs should provide a unique mechanism to bookmark a resource and return to it (unlike some of these abominable web 2.0 ajax/silverlight/flash creations), but the bookmark will never be something humans comprehend and memorise. There seems to be quite a lot of preoccupation and energy spent dreaming up URL strategies that humans can remember and type in, and it's a waste of energy. Let's get on and solve real problems.
Sorry for the rant, but there's a lot of web 2.0 nonsense related to URLs going on in certain circles that is just a total waste of time.
Now that Firefox's Awesome Bar and Google Chrome's Omnibox can be used to search the browsing history, it is much easier for users to find previously visited sites, so having clean URLs may help them find sites in their history more easily.
Making sure the page has an appropriate title is important (as both browsers search the title as well as the URL), but making sure the URL has relevant keywords in it as well means that when those keywords are typed into the address bar the URL is more likely to show up higher in the suggestions, because the keyword is matched twice: in the URL and in the title.
Also, once a user has typed the name of a site they will be presented with example URLs from that site, which they can then use as a template for narrowing down their search. So using verbs and nouns in the URL for the different sections or actions of the site will help the user narrow their search to just the part of the site they are interested in, e.g. the /questions/ or /tag/ sections of Stack Overflow, or the "/doc" at the end of docs.google.com/doc that can be used to view just document pages on Google Docs*.
Since both Firefox and Chrome search for each space-separated word typed into the address bar, it could be argued that the URL doesn't need to be completely human-readable for searching to work; but to let the user actually read the keywords they are interested in from the URL, the amount of "noise" should be kept to a minimum.
* which are of the form http://docs.google.com/Doc?id=gibberish
My perspective is simple:
every place I visit with my browser (with various edge-case exceptions) should be bookmarkable, and Forward/Back should be usable without destroying any data entry.
Backend for sure. Your server is the one that has to take care of the routing to the resources requested by the URL.
I think the main reasons for using friendly URLs are:
Ease of linking / sharing
Presentation
SEO
So I think it's purely a client-side pleasure. While they're nice on the server as well, they're not mission-critical.