Using Heroku's Free Domain - heroku

I want to use the domain Heroku is hosting my Rails app on; it is something along the lines of http://quiet-waters-7769.herokuapp.com. Can I make it show up in search engines like Google? Even if I search for an exact string from my page, Google doesn't return my site. It seems like all Heroku domains are unlisted. Is this something I can change in my Heroku settings?

Check your robots.txt and all your meta tags, and make sure you're not telling Google NOT to index your site (examples of what to look for are below):
Using meta tags to block access to your site
Using robots to block google indexing
Also, you have to give Google time, and a reason, to index your site. Just because you have a website doesn't mean Google will index all of its pages.
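For reference, a robots.txt that blocks indexing entirely looks something like this (in a Rails app it is usually served from public/robots.txt):
User-agent: *
Disallow: /
and a page-level block is a meta tag in the page's <head>:
<meta name="robots" content="noindex">
If either of those is present, remove or narrow it before expecting Google to list your pages.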
herokuapp.com domains get indexed fine.
https://www.google.com/#q=site:herokuapp.com

Related

Google Search Console verification fails on single site while many others had no issue

Interestingly, this is apparently the official way to reach Google API support (akin to Microsoft/SO's documentation partnership?). But obviously this limits the private information that I can include in my "support request"...
I have added and then verified 400+ domains (with each of their http/https/www/no-www variations, for 800+ total) on Google Search Console via the related APIs, without issue.
One domain is giving me a problem with verification via 'HTML File Upload', even though I've triple-checked that it is set up the same as the other 825 that verified without issue.
I compared the WHOIS records and the intodns.com DNS health report, and I also cleared the DNS cache and waited a couple of days to see if it was a caching issue.
I've tried multiple verification methods, but this error persists on both the http:// and http://www. versions of the one site. The site itself works fine and I can't see any anomalies with it on my end.
I'm not sure if this could be related, but the webmaster's site list does include one strange property that is apparently verified (in addition to the two unverified versions of the problem domain). (I've masked the ID number since I have no idea what it represents.)
How can I get my ownership of this site verified on Google Search Console?
You can verify your site ownership with an alternative method: insert the HTML tag that Search Console gives you into your page's <head> and you can verify ownership easily. The other ways to verify ownership are through Google Tag Manager and Google Analytics.
A sample HTML tag: <meta name="google-site-verification" content="String_we_ask_for">

MEAN-SEO not working as expected

I have a project in meanjs.
It has html5mode disabled, so my URLs look like this:
http://localhost:3000/#!/products
I am trying to implement AJAX snapshots in order to allow Google's crawlers to see content generated by JavaScript on the client side.
I installed a module called MEAN-SEO:
http://blog.meanjs.org/post/78474995741/mean-seo
Now when I access the following URL:
http://localhost:3000/?_escaped_fragment_=
I am redirected to:
http://localhost:3000/?_escaped_fragment_=/#!/
And when I click on "products" or when I access directly, I am redirected to:
http://localhost:3000/?_escaped_fragment_=/#!/products
After reading the Google specification detailed here https://developers.google.com/webmasters/ajax-crawling/docs/getting-started , what I need to get is something without hashbangs, like the following:
http://localhost:3000/?_escaped_fragment_=/products
What am I doing wrong?
Kind Regards.
Any specific reasons why you want html5mode off?
Here is something a lot of people have missed: search engines (both Google and Bing) can now handle AJAX-based content.
Their crawlers now understand pushState, so if you just turn html5mode on you don't need any special handling to get your SEO working. You can load your content via AJAX, you can set title tags and meta tags with JavaScript, and so on, and the crawlers will understand your content the same as if you had rendered things server-side. There is no need to do HTML snapshotting or escaped_fragment handling for SEO anymore.
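For example, in an AngularJS/MEAN app the switch is roughly this (a minimal sketch; the module name is illustrative, and html5 mode also expects a <base href="/"> tag in your index page):
angular.module('myApp').config(['$locationProvider', function ($locationProvider) {
  // Serve real pushState URLs like /products instead of #!/products
  $locationProvider.html5Mode(true);
}]);
You will also need your server to return the app's index page for those deep URLs, so that a refresh on /products doesn't 404.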
This has been announced on their developer blogs but unfortunately most of the documentation hasn't been updated with this information, so it's gone under the radar for a lot of people.
One word of warning though: Facebook's crawler does not handle pushState, so if you want to support the Facebook crawler you still need to handle that separately.

URL Re-writes and Google Indexing

I was asked to perform some URL re-writes for a new site with numerous dynamic pages and this has all worked fine.
However, when I look at the URLs that Google has indexed, it has indexed the 'non-rewritten' URLs, so all the '?', '&', etc. are still being used.
What do you have to do to force Google to index your re-written URLs?
I just assumed it would do this automatically and never expected it to be an issue.
All help is gratefully appreciated.
Thanks.
Steps
1) Make sure that expired pages are no longer publicly accessible
2) Anything you do not wish bots to crawl should be flagged with appropriate "nofollow" meta tags
3) Submit a new sitemap through your Google Webmaster Tools account
4) Make sure your website returns a 404 error when a page isn't found. It is always a good idea to make a friendly 404 page that links back to your home page (this is accomplished in different ways across different server-side languages; a minimal sketch follows below).
Google will automatically remove indexed pages if they no longer exist, so be patient.
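For step 4, here is a minimal sketch of a friendly 404 page in Node/Express, purely as an illustration of one server-side option (the markup and paths are placeholders):
const express = require('express');
const app = express();

// ... your real routes go here ...

// Catch-all: anything that didn't match a route gets a 404 that links back home
app.use(function (req, res) {
  res.status(404).send('<h1>Page not found</h1><p><a href="/">Back to the home page</a></p>');
});

app.listen(3000);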

Hashbang URLs make the website difficult to crawl by Google?

Our agency built a dynamic website that uses a lot of AJAX interactions and #! (hashbang) URLs: http://www.gunlawsbystate.com/
It's a long book which you can scroll through, and the URL in the address bar changes dynamically. We have to support IE, so please don't advise using pushState; hashbang is the only option for us for now.
There's a navigation in the left sidebar which contains links to all chapters in the book.
An example of a link:
http://www.gunlawsbystate.com/#!/federal-properety/national-parks-and-wildlife-refuges/
We are expecting google to crawl this:
http:// www.gunlawsbystate.com/?_escaped_fragment_=/federal-properety/national-parks-and-wildlife-refuges/
which is a complete HTML snapshot of the section (plus there are links to the subsections, like www.gunlawsbystate.com/#!/federal-properety/national-parks-and-wildlife-refuges/ii-change-in-the-law/ => www.gunlawsbystate.com/?_escaped_fragment_=/federal-properety/national-parks-and-wildlife-refuges/ii-change-in-the-law/ ).
It all looks to be complete according to Google's specification ( developers.google.com/webmasters/ajax-crawling/docs/specification ).
The site has been running for about 3 months now. The homepage is getting re-indexed every 10-15 days.
The problem is that for some reason Google doesn't crawl hashbang URLs properly. It seems like Google just "doesn't like" those URLs.
www.google.ru/search?&q=site%3Agunlawsbystate.com :
Just 67 pages are indexed. Notice that most of the pages Google indexed have "normal" URLs (mostly WordPress blog posts, categories and tags) and just 5-10% of the result pages are hashbang URLs, although there are more than 400 book sections with unique content which Google should really like if it crawled them properly.
Could someone give me some advice on this: why does Google not crawl our book pages properly? Any help will be appreciated.
P.S. I'm sorry for not-clickable links — stackoverflow doesn't let me post more than 2.
UPD. The sitemap was submitted to Google a while ago. Google Webmaster Tools says that 518 URLs were submitted and just 62 URLs are indexed. Also, on the 'Index Status' page of Webmaster Tools I see that 1196 pages have ever been crawled and 1071 pages are "Not selected". It clearly points to the fact that for some reason Google doesn't index the #! pages that it visits frequently.
You are missing a few things.
First, you need a meta tag to tell Google that the hash URLs can be accessed via a different URL.
<meta name="fragment" content="!">
Next, you need to serve a mapped version of each of the URLs to Googlebot.
When google visits:
http://www.gunlawsbystate.com/#!/federal-regulation/airports-and-aircraft/ii-boarding-aircraft/
It will instead crawl:
http://www.gunlawsbystate.com/?_escaped_fragment_=federal-regulation/airports-and-aircraft/ii-boarding-aircraft/
For that to work you either need to use something like PHP or ASP to serve up the correct page. ASP.NET routing would also work if you can get the plumbing correct. There are also services which will create these "snapshot" versions for you, and then your meta tag will point to their servers.
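As an illustration of that mapping, here is a minimal sketch in Node/Express (the answer mentions PHP/ASP; this is the same idea in another stack, and the snapshots/ directory layout is an assumption):
const express = require('express');
const path = require('path');
const app = express();

// If Googlebot asks for /?_escaped_fragment_=federal-regulation/..., serve the
// pre-rendered HTML snapshot for that section instead of the JavaScript app.
app.use(function (req, res, next) {
  const fragment = req.query._escaped_fragment_;
  if (fragment === undefined) return next(); // normal visitors get the JS app
  // (In production you would sanitize `fragment` before using it in a file path.)
  const file = path.join(__dirname, 'snapshots', fragment || 'index') + '.html';
  res.sendFile(file, function (err) {
    if (err) res.status(404).end();
  });
});

app.listen(3000);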
Note that the AJAX crawling scheme has since been deprecated by Google, and Google no longer accesses content through hashbang/escaped_fragment URLs.
Based on my research, Google now avoids escaped-fragment URLs and suggests creating separate pages rather than using hashbangs.
So I think pushState is the other option which can be used in this case.
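If you do go that route, the underlying browser API is just history.pushState; here is a tiny framework-free sketch (data-chapter and loadChapter are hypothetical names for your own markup and content loader):
document.addEventListener('click', function (e) {
  const link = e.target.closest('a[data-chapter]'); // hypothetical marker attribute
  if (!link) return;
  e.preventDefault();
  history.pushState({}, '', link.getAttribute('href')); // real path, no #!
  loadChapter(link.getAttribute('href'));               // hypothetical content loader
});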

How to filter myself out of Google Analytics with a dynamic IP address?

Does anyone know how to setup Google Analytics to filter yourself out if you're visiting the site from a dynamic IP address? I don't want to include myself in my stats from home use where I have a dynamic IP address via Verizon FiOS.
Google currently has a browser add-on that will block any visits of yours from showing up in any Analytics. http://tools.google.com/dlpage/gaoptout
Pluses and minuses of this opt-out versus filters are discussed in this blog post.
There are a couple ways of doing this. If you know the range of IP addresses you're accessing your site from (and don't mind filtering them all out) you can set up an "Exclude" filter for that range of IP addresses. If that's too restrictive, you can set a cookie using the Google Analytics code and filter on that. Both techniques are documented at Google's help system.
Alternatively, if you're dynamically producing the pages on the server, you could simply not write the Google Analytics code into the pages in the first place, based on the currently logged in user. On my site, I'm choosing to write the code or not based on a few things, such as whether the website is running in debug mode or if an administrator is logged on.
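A minimal sketch of that idea in Node/Express (req.user, the admin check, and GA_SNIPPET are illustrative names; substitute whatever your own stack provides):
const express = require('express');
const app = express();

const isDebug = process.env.NODE_ENV !== 'production';
const GA_SNIPPET = '<!-- your usual Google Analytics snippet goes here -->';

app.get('/', function (req, res) {
  const isAdmin = req.user && req.user.isAdmin; // however your app identifies you
  const tracking = (isDebug || isAdmin) ? '' : GA_SNIPPET;
  res.send('<html><head>' + tracking + '</head><body>Hello</body></html>');
});

app.listen(3000);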
You can do this by creating a special page on your site that sets a Google Analytics segmentation cookie, using code something like:
<body onLoad="javascript:__utmSetVar('exclude_from_report')">
Then create a custom filter in Analytics to exclude visitors that match the 'exclude_from_report' segment pattern.
Consider using the NoScript plugin for Firefox. Just mark google-analytics.com as an untrusted site and you should be all set. A nice side-benefit: better security in your browser.
Just block the domain where google analytics lives via your system's hosts file:
127.0.0.1 www.google-analytics.com
This is less disruptive than the NoScript plugin mentioned by jdigital, but still makes you effectively invisible to google analytics.
Setting a cookie to prevent the analytics code from being sent to the browser is by far the best option.
If you're a developer and concerned that you're going to get a bazillion hits while you're developing the site, you can add the following line to your analytics tracking code:
pageTracker._setDomainName(".yourwebsitename.com");
Assuming you're hitting a URL that doesn't end in .yourwebsitename.com during testing (for example, localhost), the tracking code will see that the domain doesn't match 'yourwebsitename.com' and won't send any tracking data.
You can always setup a proxy to tunnel all of your traffic through. Then simply exclude the proxy's IP from the results.
I can't find a way to reply to answers, so I'll second the hosts file trick:
127.0.0.1 www.google-analytics.com
as it works in all browsers at the same time, which matters since designers often try a site in all of them.
I recommend using a 127.0.0.1 (localhost) redirect in the HOSTS file to block any kind of abusive site, domain, tracker, or analytics service. For a large list, take a look at the WinHelp website. I have used it, and still use it, on all my PCs. You also need to look over the list for domains you do want to keep and comment those lines out with a # in the list.
All instructions are on the site for different operating systems.
