Is there a way to disable or password-protect the search page hosted on a Google Mini appliance? I'm referring to the page described in option 1 here.
I basically want to prevent someone from stumbling onto this URL and searching there rather than through one of the actual sites using the appliance.
Try some of these:
delete the default front end
delete the default collection
set the Follow and Crawl URLs for the default collection to blank
edit the XSLT stylesheet under Serving > Front Ends > Output format.
My experience is with the Google Search Appliance, so I hope this works on the Mini too.
I was asked to perform some URL rewrites for a new site with numerous dynamic pages, and this has all worked fine.
However, when I look at the URLs that Google has indexed, it has indexed the 'non-rewritten' URLs, so all the '?', '&', etc. are still being used.
What do you have to do to force Google to index your re-written URLs?
I just assumed it would do this automatically and never expected it to be an issue.
All help is gratefully appreciated.
Thanks.
Steps
1) Make sure that expired pages are no longer publicly accessible
2) Anything you do not wish bots to crawl should be flagged with appropriate "nofollow" meta tags
3) Submit a new sitemap through your Google Webmaster Tools account
4) Make sure your website returns a 404 error when a page isn't found. It is always a good idea to make a splash page for a 404 error which links back to your home page (this is accomplished in different ways across different server-side languages; see the sketch below)
Google will automatically remove indexed pages if they no longer exist, so be patient.
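For steps 1 and 4, the exact mechanism depends on your server stack, which the thread doesn't specify. As a minimal sketch, assuming a Node/Express server and a made-up mapping from an old dynamic URL (/product.php?id=...) to its rewritten form (/product/...), it might look like this:

```javascript
// Hypothetical sketch: 301-redirect the old query-string URLs to the
// rewritten ones so Google swaps them in its index, and return a real 404
// with a splash page for anything that no longer exists.
const express = require('express');
const app = express();

app.get('/product.php', function (req, res) {
  if (req.query.id) {
    return res.redirect(301, '/product/' + encodeURIComponent(req.query.id));
  }
  res.redirect(301, '/products');
});

app.get('/product/:id', function (req, res) {
  res.send('Product page for ' + req.params.id); // real rendering goes here
});

// Catch-all: a 404 status plus a splash page linking back to the home page.
app.use(function (req, res) {
  res.status(404).send('<h1>Page not found</h1><p><a href="/">Back to the home page</a></p>');
});

app.listen(3000);
```

The 301 (rather than a 302) is what tells Google to replace the old URL with the rewritten one in its index.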
I found this problem all over the net but no answer yet, so maybe someone here has solved it?
I built a page relying heavily on jquery.address. It has one index page, and the rest loads dynamically via Ajax, following Google's /#!/ scheme for crawlable pages. Now I want to add Facebook's Like or Share button, but I can't get it to grab the actual page title or URL.
Whatever I do, it always falls back to the title and URL of the index page. I tried:
(obviously) changing the title and Open Graph meta tags on load of the new parts.
"linking" the crawler page (?_escaped_fragment_=xyz) but specifying the #! page in the meta tags.
"sharing" with a given title and URL.
I never get anything but a link to the index page, or a blank "share" pointing to the right URL with the title and thumbnail ignored.
Has anyone got a similar setup working?
Thanks for any hints,
thomas
Facebook is actually using #! now and it works! If you build your site so that http://site.de/?_escaped_fragment_=something is identical to http://site.de/#!/something, all you have to do is "share" the #! URL and it'll display the info from the escaped-fragment page.
Use this URL to check: http://developers.facebook.com/tools/debug
But: A much cleaner solution to the problem can be found here: http://github.com/browserstate/history.js/wiki/Intelligent-State-Handling
My guess would be that Facebook's crawler doesn't run JavaScript and will always display whatever's actually in the page it gets from the server.
Facebook share has a BRUTAL cache; last time I checked, it was impossible to change the title/description data once it was scraped :(
The issue I had was that the og:url and the actual URL of the page did not match. I also read a number of comments about the og tags needing to come just after the title element, but I don't think that solved anything.
With regard to issues of caching, it is true that Facebook's caching is "brutal", but it does not cache anything for the lint tool: http://developers.facebook.com/tools/debug.
I use non-hash-bang URLs when sharing links. I process the hard links and redirect them to a hash-bang URL client side using JavaScript (sketched below). That way, if a crawler goes to the hard-linked page, it will display the information just as it would if JavaScript were enabled.
Compare:
http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Flikeapage.com%2F%23!%2FChristmas%2Fvs%2FBacon
and
http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Flikeapage.com%2FChristmas%2Fvs%2FBacon
Hope this helps.
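A minimal sketch of that client-side redirect, assuming the hard-linked path (e.g. /Christmas/vs/Bacon, borrowed from the example URLs above) should map to the same path behind /#!:

```javascript
// On a hard-linked page, bounce the browser to the hash-bang version so the
// single-page app takes over; crawlers that don't run JavaScript just stay on
// the server-rendered page, whose own og: tags Facebook then scrapes.
(function () {
  var path = window.location.pathname;          // e.g. "/Christmas/vs/Bacon"
  if (path !== '/' && window.location.hash.indexOf('#!') !== 0) {
    window.location.replace('/#!' + path);      // e.g. "/#!/Christmas/vs/Bacon"
  }
})();
```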
I have an e-commerce website built with Ajax and JS. When the user types a search keyword, the list is pulled via Ajax, but the browser URL in my case doesn't change, so if the user reloads or simply bookmarks the address, he'll have to start from scratch, losing the keyword input.
I noticed that Google instead rewrites the URL with the complete query, with no hashtag or complex workaround... apparently.
How can I achieve that? Consider that I have complete control over my server, so I can set up Apache in any way I want.
Thanks!!
See this question; it's almost the same, except they used Facebook as an example.
How does facebook rewrite the source URL of a page in the browser address bar?
If you watch the URL in Google Instant, it doesn't change until you hit "Search" or pause for a set period of time (2 seconds, I think).
After this delay, Google refreshes the page with those search queries.
I'm not sure what browser you're using, but I get all the search terms after a hash in Chrome (e.g., http://www.google.com/#sclient=psy&hl=en&q=test+test+sibilance&aq=3&...). I don't think what you're describing is actually happening. It could be done on Chrome and other HTML5 browsers using history.pushState(), but I don't see Google Instant using that method.
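For reference, a minimal pushState sketch of the HTML5 approach mentioned above; fetchAndRenderResults is a placeholder for the site's own Ajax search call, not a real API:

```javascript
// Put the current query in the address bar without reloading, so a reload or
// bookmark can restore the same results.
function showResults(query) {
  fetchAndRenderResults(query); // hypothetical helper that does the Ajax search
  if (window.history && window.history.pushState) {
    history.pushState({ q: query }, '', '?q=' + encodeURIComponent(query));
  }
}

// Restore the results when the user navigates back/forward.
window.onpopstate = function (event) {
  if (event.state && event.state.q) {
    fetchAndRenderResults(event.state.q);
  }
};
```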
Then it is not instant. Without reloading the page you can only change the fragment identifier in the URL.
My experience is that after you change the search, the Google URL is no longer "correct", i.e. it does not represent the latest query.
I've created an AJAX-enabled web application. In my application, all content [that I want to appear in search results] is loaded using AJAX. However, I observed that despite a valid sitemap submitted to Google, my page ranking is very poor.
What do I need to do, and what should I avoid, in order to improve the page ranking?
Thanks in advance.
You probably want to make it bookmark- and history-enabled. There are many ways; one of them is the jQuery history plugin: https://github.com/tkyk/jquery-history-plugin
You probably also want to create a page for search engines to crawl your website with links like http://www.mysite.com/foobar.php#!fetch_content=xyz. The #! is a convention recognized by Google for crawling and indexing Ajax content (see the sketch after the reference link).
reference: http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html
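As a rough sketch of the #! idea, assuming jQuery on the client and treating fetch_content and load_content.php as made-up names for illustration:

```javascript
// Client side: whenever the "#!" hash changes (or on first load), pull the
// named content via Ajax, so #! URLs stay bookmarkable and shareable.
function loadFromHash() {
  var match = window.location.hash.match(/^#!fetch_content=(.+)$/);
  if (match) {
    $('#content').load('load_content.php?fetch_content=' + encodeURIComponent(match[1]));
  }
}
$(window).on('hashchange', loadFromHash);
$(loadFromHash); // also run once on DOM ready

// For the crawler, Google requests roughly
// /foobar.php?_escaped_fragment_=fetch_content=xyz (the value is URL-escaped),
// and the server should answer with the rendered HTML for that content.
```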
The don'ts would be interesting, but here's a do that applies to all of JS as well.
Make sure that all links degrade gracefully. This can easily be achieved by giving the links real URLs that lead to the same content that would otherwise be loaded via JS, so the content is still reachable when JS is not enabled. This makes crawling your website possible.
You would also have to prevent the default click behavior for all the affected links (see the sketch below).
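A minimal sketch of that pattern, assuming jQuery and a hypothetical .ajax-link class on the affected links:

```javascript
// Real hrefs keep the site crawlable; JS intercepts the click, prevents the
// default navigation, and loads the same content via Ajax instead.
$(document).on('click', 'a.ajax-link', function (event) {
  event.preventDefault();                     // disable the default navigation
  var url = $(this).attr('href');             // the real, crawlable URL
  $('#content').load(url + ' #content > *');  // pull the same content via Ajax
  window.location.hash = '#!' + url;          // keep the address bookmarkable
});
```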
Google Docs Viewer (http://docs.google.com/viewer) creates a cache of a document after the first viewing. To see what I mean, try the following:
Upload file.pdf to your server (e.g., http://example.com).
Visit http://docs.google.com/viewer?url=http://example.com/file.pdf
Upload a new file to replace file.pdf (but use the same name).
Revisit http://docs.google.com/viewer?url=http://example.com/file.pdf.
Google Docs Viewer still shows the old file.pdf.
Anyone know how to correct this?
(I have already tried clearing browser cache, switching browsers, and logging in with a different google account to view the link.)
It appears there is no way to clear the cache, although, in my experience, Google tends to do it automatically about once a day.
Maybe if you append a dynamic query-string parameter to the file URL, the cache will not kick in.
ex: http://docs.google.com/viewer?url=http://example.com/file.pdf?time=3454354
I added ?time=0
Seemed to work.
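A small sketch of that cache-busting idea in JavaScript. Note that, unlike the example URL above, the file URL (including its extra parameter) should really be URL-encoded before being passed to the viewer, and there is no guarantee the viewer will keep treating each new parameter value as a fresh document:

```javascript
// Build a Google Docs Viewer link with a throwaway query-string parameter so
// the viewer sees a "new" URL and re-fetches the document instead of using
// its cached copy.
function viewerUrl(fileUrl) {
  var sep = fileUrl.indexOf('?') === -1 ? '?' : '&';
  var busted = fileUrl + sep + 'time=' + Date.now();
  return 'http://docs.google.com/viewer?url=' + encodeURIComponent(busted);
}

// Example:
// viewerUrl('http://example.com/file.pdf')
// -> "http://docs.google.com/viewer?url=http%3A%2F%2Fexample.com%2Ffile.pdf%3Ftime%3D1672531200000"
```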