Why do sites like Twitter and Gawker use #! instead of a simple URL? [duplicate] - ajax

Possible Duplicate:
What's the shebang (#!) in Facebook and new Twitter URLs for?
Twitter profiles now have URLs of the form:
http://twitter.com/#!/username
instead of the simpler structure:
http://twitter.com/username
What does the #! do? What is the advantage of using #!? I read that it's related to Google's web crawler, but I don't understand exactly how that works.

There are two parts to this:
Why a fragment identifier instead of a real page?
Because they are overusing Ajax. Instead of linking to a new page, they link to a non-existent or dynamically generated fragment of the current page and then use JavaScript to change the content.
Why start the fragment identifier with !?
Because Google maps URLs containing #! onto a different URL (the #! part is replaced with the ?_escaped_fragment_= query parameter), so the site can serve a special crawler-friendly version of the page at that address. This allows the content to be indexed by search engines.

In a URL, the contents after the hash mark (#) are not sent to the server; they are only visible to JavaScript on the page. So using a # basically lets the page "http://twitter.com/" handle everything itself (for example, by opening background connections to load additional data). It also means that content which doesn't change from one page to another (think the general layout of the page) can be cached and served immediately, since the effective URL is still "http://twitter.com/", whereas putting the username in the path of the URL (without the hash) would require a full separate fetch to get that layout.
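As a rough illustration of how this works on the client, the script below reads the fragment and swaps in new content via AJAX. This is only a sketch: the /profile-data endpoint and the "content" element are made up for the example, not Twitter's real implementation.

// Sketch of client-side hashbang routing. The /profile-data endpoint
// and the "content" element are hypothetical.
function handleRoute() {
  // location.hash is something like "#!/username"; strip the "#!/" prefix.
  var username = window.location.hash.replace(/^#!\/?/, "");
  if (!username) return;
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/profile-data?user=" + encodeURIComponent(username));
  xhr.onload = function () {
    // Swap in the new content without a full page load.
    document.getElementById("content").innerHTML = xhr.responseText;
  };
  xhr.send();
}

// Run on initial load and again whenever the fragment changes.
window.addEventListener("hashchange", handleRoute);
handleRoute();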

Related

Google Custom Search API returning HTML documents instead of images

I started using the Google Custom Search API for a project. The idea is to search for images, and I wanted to use Custom Search because the Google Images API is deprecated.
I already enabled image search on the CSE console
My query is like this:
https://www.googleapis.com/customsearch/v1?key=APIKEY&cx=CSECX&q=flower&alt=json&searchType=image&num=1&start=NUMBER
Where NUMBER is a random value between 1 and 20
Sometimes, it returns results like this:
{
  "kind": "customsearch#result",
  "title": "Flower Wallpaper Tumblr #6790199",
  "displayLink": "7-themes.com",
  "htmlTitle": "<b>Flower</b> Wallpaper Tumblr #6790199",
  "snippet": "Flower Wallpaper Tumblr",
  "htmlSnippet": "<b>Flower</b> Wallpaper Tumblr",
  "link": "http://7-themes.com/data_images/out/7/6790199-flower-wallpaper-tumblr.jpg",
  "mime": "image/jpeg",
  "image": {
    "thumbnailWidth": 150,
    "byteSize": 808360,
    "height": 1200,
    "width": 1920,
    "contextLink": "http://7-themes.com/6790199-flower-wallpaper-tumblr.html",
    "thumbnailLink": "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSad0z_Wla0nRHAcQrjO5jLQkFjcoqnNHhejjuGmdA1AW2BqIVEpLARAk0s",
    "thumbnailHeight": 94
  }
}
Highlighting the interesting part:
"link": "http://7-themes.com/data_images/out/7/6790199-flower-wallpaper-tumblr.jpg", "mime": "image/jpeg"
So it seems that the URL is http://7-themes.com/data_images/out/7/6790199-flower-wallpaper-tumblr.jpg and the MIME type is image/jpeg, but if you go to the URL, you'll see it's not an image but an HTML document
Of course, I could catch this as an exception, but I don't want to waste daily API requests (out of a limit of 100 per day) just because the API didn't return an image when I explicitly asked for one.
So, the question is: Is this normal behaviour, or misconfiguration/misuse on my part? If so, how could I fix it?
Thanks for your attention
After a little bit of reading, my best guess is that some servers are doing a resource redirect to prevent external sources from hotlinking directly to a resource. The file in question is advertised as an image, but accessing it from an external server will provide an HTML document instead. This is not a URL redirect, so it isn't detected by clients (including the Google crawler) until the resource is downloaded.
This sort of resource redirect is done on Apache servers using the .htaccess file and the RewriteEngine, with a technique similar to the one described here, although that particular technique can't be used to bait-and-switch images for HTML documents.
In short, if a server is lying about what type of file it's hosting, Google can't do anything about that. You can confirm that this is not an issue with the custom search API by performing the same query on the normal web search interface -- notice that clicking the image loads an HTML document rather than the image itself.
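If you want to skip such results without downloading the full (fake) image, one possibility - just a sketch, not a feature of the Custom Search API - is to issue a lightweight HEAD request and check the Content-Type the server actually reports. Run it from a server-side environment (for example Node 18+, where fetch is built in) so browser CORS rules don't interfere; note that a server that switches content based on the Referer header may still fool this check.

// Sketch: verify that a result link really serves an image before using it.
function isProbablyImage(url) {
  return fetch(url, { method: "HEAD" }).then(function (response) {
    var type = response.headers.get("Content-Type") || "";
    // Trust the link only if the server itself says it is an image.
    return response.ok && type.indexOf("image/") === 0;
  });
}

isProbablyImage("http://7-themes.com/data_images/out/7/6790199-flower-wallpaper-tumblr.jpg")
  .then(function (ok) { console.log(ok ? "usable image" : "bait-and-switch"); });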

AJAX search - parsing and reading the URL parameters with hash tags

We've implemented a new AJAX-based search on our website. When a user refines the results by applying additional filters, we append the parameters and their values to the main URL after a # (hash).
This was done to let our users share the URL of what they are viewing. The way it currently works, the page is redirected and the content is generated for the base URL first; then a JavaScript function that runs onload reads the parameters after the # and makes another AJAX hit.
Questions:
Why don't browsers send the # part to the server? That is, the # part is not even received by the HTTP server. It's actually interesting that browsers don't send it at all.
What is the best way to read the # values? Mostly I'm looking to avoid the double hit we've implemented right now: the content is loaded first, and then another AJAX call applies the refinements.
The # value is an instruction to the browser to look for a named anchor in the document it is to load from the server. It is interpreted and actioned by the browser. The server can do nothing with it, so there's no point in sending it. If you're trying to use this for some other purpose then you'll run into difficulties - as you have found.
There is a mechanism for sending data to the server: the querystring. Append your parameters to the URL prefixed by a ?, in the form variablename=data, with successive variables separated by a &.
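That said, the fragment is still readable on the client via window.location.hash, which answers the second question. A sketch, assuming the fragment holds key=value pairs joined by & (applyFilters is hypothetical - it stands for whatever fires your refine AJAX call):

// Sketch: parse "#price_from=0&color=red" into an object.
function parseHashParams() {
  var hash = window.location.hash.replace(/^#/, "");
  var params = {};
  hash.split("&").forEach(function (pair) {
    if (!pair) return;
    var parts = pair.split("=");
    params[decodeURIComponent(parts[0])] = decodeURIComponent(parts[1] || "");
  });
  return params;
}

// Re-apply the filters whenever the fragment changes.
window.addEventListener("hashchange", function () {
  applyFilters(parseHashParams()); // applyFilters is hypothetical
});

As for the double hit: since the server never sees the fragment, it cannot be avoided while the filters live after the #. Moving them into the querystring, as described above, lets the server render the refined results in the very first response.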

Filtering KML in a static map

I'm developing a desktop application, not a web application.
The software environment is Windows and VB10.
In my user interface I have a browser control where I want to show a map, by issuing an address like http://maps.google.com/maps?q= followed by the URL of a KML file containing my data.
The problem: is it possible to filter the data in the KML file so that only a subset of it is shown?
Basically you have two options:
Pass parameters to a service which generates your filtered KML on the fly.
Do it in JavaScript in your browser interface.
Based on your question, I am going to assume option one is out. For option two there are tons of examples on the web, but basically you need to parse the KML yourself and write JavaScript code that applies whatever filtering you need; you cannot pass the KML URL to Google Maps directly and get any of this behaviour.
Possibly useful example: http://www.gpsvisualizer.com/examples/google_folders.html
UPDATE
Based on conversation in the comments:
The only other thing I can think of is to create your own map page with the JavaScript that does what you want (like the http://gpsvisualizer.com/examples/google_folders.html example linked above) and then embed that page in your app instead of the Google map, essentially encapsulating the features you need. So instead of maps.google.com/maps?q= your app would request myMapURL.com/MyMap?querystring, your own Google Maps wrapper with the desired filtering. Otherwise I think you are out of luck with your current setup.
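As a sketch of what the JavaScript in such a wrapper page could do for the filtering itself (option two above): fetch the KML, parse it, and only add the placemarks that pass the filter. The KML URL and the filter-by-name rule below are placeholders, and the code assumes the page already has a google.maps.Map instance.

// Sketch: load a KML file, keep only placemarks whose <name> contains
// the filter string, and plot them with the Google Maps JavaScript API.
function loadFilteredKml(map, kmlUrl, nameFilter) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", kmlUrl);
  xhr.onload = function () {
    var doc = new DOMParser().parseFromString(xhr.responseText, "text/xml");
    var placemarks = doc.getElementsByTagName("Placemark");
    for (var i = 0; i < placemarks.length; i++) {
      var name = placemarks[i].getElementsByTagName("name")[0].textContent;
      if (name.indexOf(nameFilter) === -1) continue; // the actual filter
      var coords = placemarks[i].getElementsByTagName("coordinates")[0]
        .textContent.trim().split(","); // KML order is lon,lat[,alt]
      new google.maps.Marker({
        position: new google.maps.LatLng(parseFloat(coords[1]),
                                         parseFloat(coords[0])),
        map: map,
        title: name
      });
    }
  };
  xhr.send();
}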

Ajax results filtering and URL parameters

I am building a results filtering page using AJAX requests. I would like to reflect the filters in the URL. For example: for price_from I want to add ?price_from=VAL to the URL.
I have a backend that is capable of rendering the page with URL parameters.
After some googling I went with a Backbone.Router solution, which has a hash fallback for IE versions that don't support the HTML5 History API.
I'm having trouble coming up with a good route scheme. I have a set of filtering parameters (price_from, price_to, color, ...) and I would like to attach each parameter to its own route.
Is it possible to chain the routes to match, for example, ?price_from=0&price_to=1&color=red? (The item order can change.)
That is: call all the routes at the same time, while keeping the IE backwards compatibility?
Your best bet would be to have a query portion of the URL rather than using GET parameters to denote the search criteria. For example:
Push state: /search/query/price_from=0&price_to=1&color=red
Hash based: #search/query/price_from=0&price_to=1&color=red
Your backend would of course need to change a bit to be able to parse the new URL structure.
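In Backbone this can be handled with a single catch-all route instead of one route per parameter: a splat captures the whole query portion and the handler parses it, so the parameter order doesn't matter and the hash fallback keeps working. A sketch (fetchResults is hypothetical - it stands for your AJAX results request):

// Sketch: one splat route for the query portion; order-independent.
var SearchRouter = Backbone.Router.extend({
  routes: {
    "search/query/*params": "search"
  },
  search: function (params) {
    // params is e.g. "price_from=0&price_to=1&color=red"
    var filters = {};
    (params || "").split("&").forEach(function (pair) {
      var parts = pair.split("=");
      if (parts[0]) filters[parts[0]] = parts[1];
    });
    fetchResults(filters); // hypothetical: fire the AJAX request
  }
});

new SearchRouter();
// pushState where supported, automatic hash fallback elsewhere.
Backbone.history.start({ pushState: true });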

How to differentiate, on the server side, between the browser's first request (the HTML file) and the following ones (images, CSS, scripts...)?

I'm programming a website with SEO-friendly links, i.e., with the page title or other descriptive text in the link, separated by slashes. For example: http://www.domain.com/section/page-title-bla-bla-bla/.
I redirect the request to the main script with mod_rewrite, but links in script, img and link tags are not resolved correctly. For example: assuming you are visiting the above link, the tag requests the file at the URL http://www.domain.com/section/page-title-bla-bla-bla/js/file.js, but the file is actually at http://www.domain.com/js/file.js
I do not want to use a variable or constant in all HTML file URLs.
I'm trying to redirect client requests to one directory or another on the server. Is it possible to distinguish the first request for a page from the ones that come after? Can it be done with mod_rewrite for Apache, or with PHP?
I hope I explained it well :)
Thanks in advance.
Using rewrite rules to fix the problem of relative paths is unwise and has numerous downsides.
Firstly, it makes things more difficult to maintain because there are hundreds of different links in your system.
Secondly and more seriously, you destroy cacheability. A resource requested from here:
http://www.domain.com/section/page-title-bla-bla-bla/js/file.js
will be regarded as a different resource from
http://www.domain.com/section/some-other-page-title/js/file.js
and loaded again, once per distinct page path, causing the number of requests to grow dozenfold.
What to do?
Fix the root cause of the problem instead: Use absolute paths
<script src="/js/file.js">
or a constant, or if all else fails the <base> tag.
This is an issue of resolving relative URIs. Judging by your description, it seems that you reference the other resources using relative URI paths: in /section/page-title-bla-bla-bla/, a URI reference like js/file.js or ./js/file.js is resolved to /section/page-title-bla-bla-bla/js/file.js.
To always reference /js/file.js independent of the actual base URI path, use the absolute path /js/file.js. Another solution would be to set the base URI explicitly to / using the BASE element, e.g. <base href="/"> in the document head (but note that this will affect all relative URIs in the document).
