Multilanguage website crawling issue (links like en/en/...) - sitemap

I've created a multilanguage website and tried to generate a sitemap with a common online sitemap generator.
Unfortunately it crawled links like "en/en/..." or "it/de/en/..." (which don't exist and of course aren't correct). I'm afraid that Google could do the same.
I read all about the base tag (maybe the problem is there) and tried lots of things, but the result is always the same: lots of redundant links (en/apartments/en/apartments/torre).
Any suggestions?

Solved!
I wrote to the webmaster of the sitemap generator. It turned out to be a bug in their tool (it doesn't handle single quotes around the base tag's attribute).
I submitted the sitemap to Google and everything was OK.
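For anyone hitting the same thing, this is the kind of markup involved; the URL below is just a placeholder, not the real site:
<!-- Both quote styles are valid HTML; browsers and Google resolve relative links against the base href either way. -->
<base href="https://example.com/">
<!-- The generator only recognised the double-quoted form. With a single-quoted href like the one below, -->
<!-- it ignored the base URL and resolved relative links against the current page path, producing en/en/... duplicates. -->
<base href='https://example.com/'>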

Related

Laravel Google Tag Manager

Good day to all.
I connected the tag manager to the site, but I want to be sure that it is on all pages.
This is an online home appliance store.
I want to understand how this works in Laravel: how can I find all the pages?
Help me figure out the code.
Thanks in advance.
For a quick check, I normally use http://www.gachecker.com/. It not only flags pages with missing libraries, it also flags duplicate ones. Pretty useful.
You can implement basic pageview tracking through GTM and in a few days see whether any pages are missing from the Analytics report.
You could use your sitemap or access log (grouped by page path) to check it manually or with a site crawling tool.
Finally, sure, you can go through Laravel's page templates and make sure GTM is referenced in each. I'm not an expert in PHP frameworks though.
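If you go that route, the usual Laravel pattern is to put the GTM snippet once in a shared Blade layout that every page extends, so it cannot be forgotten on individual pages. A rough sketch, with made-up layout and partial names:
<!-- resources/views/layouts/app.blade.php (hypothetical layout) -->
<!DOCTYPE html>
<html>
<head>
    {{-- GTM loader <script> with your GTM-XXXXXXX container ID lives in this partial --}}
    @include('partials.gtm-head')
    <title>@yield('title')</title>
</head>
<body>
    {{-- GTM <noscript> iframe fallback --}}
    @include('partials.gtm-body')
    @yield('content')
</body>
</html>
Every page template then starts with @extends('layouts.app'), and php artisan route:list prints every registered route, which gives you a checklist of pages to verify.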
If you want a Laravel-only solution, you should ask your question without any reference to GTM.

Implementing reCAPTCHA on GitHub Pages without PHP?

Absolute newb here, please forgive me for this basic question.
I have built my portfolio site using GitHub Pages, but I am experiencing spam via my contact form (hosted by GetSimpleForm). I am trying to implement Google reCAPTCHA, but I'm a bit stuck on the backend part. As I understand it, GitHub Pages doesn't support PHP, so I can't actually complete the form verification.
Google's documentation was unfortunately a bit overwhelming and cryptic to me as a beginner, since I just stared at my GitHub HTML/CSS/JS files and had no clue what to put where.
Am I trying to do the impossible? Is it possible to use reCAPTCHA on GitHub Pages? If so, is there a beginner-friendly tutorial somewhere or a straightforward "copy-paste" thing I could use? (So far, it's not been clear where to use the secret key from the API key pair, for example.)
Thanks a bunch for any leads or alternative spam-prevention solutions that would work on GitHub Pages!
The short answer is you cannot. GitHub Pages only supports static sites. You have to host the website yourself if you want to do more complex things like backend checks, and that mostly isn't free.
The only suggestion I can come up with is to change your contact form to a regular HTML form instead of one hosted by the third-party service you are using. I suspect the main reason you get spam is that you are using its service.
A really simple way to do it is to make the form with HTML (you can either copy the code from a pre-made HTML site with a form, or find a YouTube tutorial that shows you how to make an HTML form, it's pretty simple) and host it on something like Netlify. Netlify is free for static websites unless you are doing something really complicated, and it has built-in form handling that will send you an email automatically every time someone fills out the form. You don't need PHP or a third-party app or anything.
You still create and edit the code of the website through GitHub; you just need to connect it to Netlify for the forms. I'm a complete beginner and I figured it out. Netlify has some tutorials that explain it nicely and simply. There's no reason to pay or do a lot of complicated stuff, and you can make professional websites with just HTML and CSS.
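As a rough sketch of what such a form can look like (the form name and field names here are made up; Netlify's docs cover the details), the data-netlify attribute is what tells Netlify to handle submissions, and a honeypot field catches most simple spam bots:
<!-- contact.html on the site deployed to Netlify -->
<form name="contact" method="POST" data-netlify="true" data-netlify-honeypot="bot-field">
  <!-- Hidden honeypot field: bots tend to fill it in, and Netlify silently drops those submissions -->
  <p hidden><label>Don't fill this out: <input name="bot-field"></label></p>
  <label>Name <input type="text" name="name" required></label>
  <label>Email <input type="email" name="email" required></label>
  <label>Message <textarea name="message" required></textarea></label>
  <button type="submit">Send</button>
</form>
Last I checked, Netlify Forms also offers an optional built-in reCAPTCHA integration, so the original goal of adding a CAPTCHA is still reachable without any backend code.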

SEO with AngularJS and an ASP.NET RESTful service

I have developed a website using AngularJS and Web API.
The problem is that the AJAX-rendered content is not crawlable by Google, and no one can find the website through Google search.
I've read many articles about this issue, including one that links out to lots of further explanations, Google's AJAX crawling protocol, and a Stack Overflow question, but I couldn't find a proper solution. The articles that mention ASP.NET solutions talk about MVC, while I only need simple REST via Web API; the rest don't deal with ASP.NET at all.
Is there any simple explanation?
I asked this same question long ago, so I'll answer from my experience:
Firstly, if all your content is accessible via unique URIs (including the hashbang if you use it), modern search engines should index it just fine. In fact, Google can index JavaScript-generated content now. You can check that via Google Webmaster Tools and see how your site is indexed.
Secondly, there are libraries that help you serve pre-rendered content to search engines if you need to, but in my case I didn't bother much with them, since Google indexes the JS content nicely.
I've seen others ask this question, and maybe I'm missing something or this is outdated, but I don't see why AngularJS needs to be an issue for SEO.
Say you have a landing page and it has a bunch of links. Assuming you're using html5 mode in AngularJS (and I'm not sure that's 100% necessary) and something like ngRoute, then the links on the landing page can work both as "angular" (JavaScript) links and as "old school" (full page load) links.
If you're a human user you can click a link and it will do angular magic and adjust the content without loading the full page. Ok, all fine.
But if you instead copy the link and paste it in a new tab or new browser, it will still work - assuming you've set up routes correctly.
I'm not an SEO expert by any stretch of the imagination, but as I understand it, having links that load pages and having those pages have real and useful content is the core of SEO, and done this way, AngularJS should work fine. The key thing to check is if you copy and paste the link (not just click it) that it works.
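For what it's worth, here is a minimal sketch of the html5 mode plus ngRoute setup described above; the module, template, and path names are invented, and you also need angular-route.js loaded, a <base href="/"> tag, and server-side rewrites so deep links serve the app shell:
// app.js - hypothetical AngularJS routing setup
angular.module('demoApp', ['ngRoute'])
  .config(['$locationProvider', '$routeProvider', function ($locationProvider, $routeProvider) {
    // Clean URLs like /products/42 instead of #!/products/42
    $locationProvider.html5Mode(true);

    $routeProvider
      .when('/', { templateUrl: 'views/home.html' })
      .when('/products/:id', { templateUrl: 'views/product.html' })
      .otherwise({ redirectTo: '/' });
  }]);
With that in place, /products/42 is a URL you can paste into a new tab, which is exactly the copy-and-paste test suggested above.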

Do hashbang URLs make a website difficult for Google to crawl?

Our agency built a dynamic website that uses a lot of AJAX interactions and #! (hashbang) URLs: http://www.gunlawsbystate.com/
It's a long book which you can scroll through, and the URL in the address bar changes dynamically. We have to support IE, so please don't advise using pushState; hashbang is the only option for us for now.
There's a navigation in the left sidebar which contains links to all chapters in the book.
An example of a link:
http://www.gunlawsbystate.com/#!/federal-properety/national-parks-and-wildlife-refuges/
We are expecting Google to crawl this:
http://www.gunlawsbystate.com/?_escaped_fragment_=/federal-properety/national-parks-and-wildlife-refuges/
which is a complete HTML snapshot of the section (plus there are links to subsections, like www.gunlawsbystate.com/#!/federal-properety/national-parks-and-wildlife-refuges/ii-change-in-the-law/ => www.gunlawsbystate.com/?_escaped_fragment_=/federal-properety/national-parks-and-wildlife-refuges/ii-change-in-the-law/).
It all looks complete according to Google's specification (developers.google.com/webmasters/ajax-crawling/docs/specification).
The site has been running for about 3 months now. The homepage gets re-indexed every 10-15 days.
The problem is that for some reason Google doesn't crawl hashbang URLs properly. It seems like Google just "doesn't like" those URLs.
www.google.ru/search?&q=site%3Agunlawsbystate.com :
Just 67 pages are indexed. Notice that most of the pages Google has indexed have "normal" URLs (mostly WordPress blog posts, categories and tags) and just 5-10% of the result pages are hashbang URLs, although there are more than 400 book sections with unique content which Google should really like if it crawled them properly.
Could someone give me some advice on this: why doesn't Google crawl our book pages properly? Any help will be appreciated.
P.S. I'm sorry for not-clickable links — stackoverflow doesn't let me post more than 2.
UPD. The sitemap was submitted to Google a while ago. Google Webmaster Tools says that 518 URLs were submitted and just 62 URLs indexed. Also, on the 'Index Status' page of Webmaster Tools I see 1196 pages ever crawled and 1071 pages not selected. It clearly points to the fact that, for some reason, Google doesn't index the #! pages that it visits frequently.
You are missing a few things.
First, you need a meta tag to tell Google that the hash URLs can be accessed via a different URL.
<meta name="fragment" content="!">
Next, you need to serve a mapped version of each of these URLs to Googlebot.
When Google visits:
http://www.gunlawsbystate.com/#!/federal-regulation/airports-and-aircraft/ii-boarding-aircraft/
it will instead crawl:
http://www.gunlawsbystate.com/?_escaped_fragment_=federal-regulation/airports-and-aircraft/ii-boarding-aircraft/
For that to work, you either need to use something like PHP or ASP to serve up the correct page. ASP.NET routing would also work if you can get the plumbing right. There are services which will actually create these "snapshot" versions for you, and then your meta tag will point to their servers.
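As a very rough sketch of the PHP route (the snapshot directory and file naming are hypothetical, and a real implementation should validate the parameter more carefully):
<?php
// index.php: serve a pre-rendered snapshot when a crawler requests ?_escaped_fragment_=...
if (isset($_GET['_escaped_fragment_'])) {
    $fragment = trim($_GET['_escaped_fragment_'], '/');
    // Strip anything that is not a safe path character so the lookup stays inside the snapshots dir
    $fragment = preg_replace('~[^a-z0-9/_-]~i', '', $fragment);
    $file = __DIR__ . '/snapshots/' . str_replace('/', '__', $fragment) . '.html';

    if (is_file($file)) {
        readfile($file);   // full HTML snapshot of that book section
    } else {
        http_response_code(404);
    }
    exit;
}

// Normal visitors get the JavaScript-driven page.
readfile(__DIR__ . '/app.html');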
The AJAX crawling scheme has since been deprecated by Google, so Google no longer accesses content under hashbang URLs that way.
From what I've read, Google now avoids escaped-fragment URLs and suggests creating separate pages rather than using hashbangs.
So I think pushState is the other option that can be used in this case.

Why use a Google Sitemap?

I've played around with Google Sitemaps on a couple sites. The lastmod, changefreq, and priority parameters are pretty cool in theory. But in practice I haven't seen these parameters affect much.
And most of my sites don't have a Google Sitemap, and that has worked out fine. Google still crawls the site and finds all of my pages. The old meta robots and robots.txt mechanisms still work when you don't want a page (or directory) to be indexed. And I just leave every other page alone, and as long as there's a link to it, Google will find it.
So what reasons have you found to write a Google Sitemap? Is it worth it?
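(For reference, these are the per-URL fields I mean; the URL and values below are just an illustration of the sitemap protocol, not a real entry.)
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/articles/some-page/</loc>
    <lastmod>2012-06-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>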
From the FAQ:
Sitemaps are particularly helpful if:
- Your site has dynamic content.
- Your site has pages that aren't easily discovered by Googlebot during the crawl process, for example, pages featuring rich AJAX or images.
- Your site is new and has few links to it. (Googlebot crawls the web by following links from one page to another, so if your site isn't well linked, it may be hard for us to discover it.)
- Your site has a large archive of content pages that are not well linked to each other, or are not linked at all.
It also allows you to provide more granular information to Google about the relative importance of pages on your site and how often the spider should come back. And, as mentioned above, if Google deems your site important enough to show sitelinks under it in the search results, you can control what appears there via the sitemap.
I believe the "special links" in search results are generated from the google sitemap.
What do I mean by "special link"? Search for "apache", below the first result (Apache software foundation) there are two columns of links ("Apache Server", "Tomcat", "FAQ").
I guess it helps Google prioritize its crawl? In practice I was involved in a project where we used the large gzipped version of a sitemap, and it helped massively. And AFAIK there is nice integration with Webmaster Tools as well.
I am also curious about the topic, but does it cost anything to generate a sitemap?
In theory, anything that costs nothing and may have a potential gain, even if very small or very remote, can be defined as "worth it".
In addition, Google says: "Tell us about your pages with Sitemaps: which ones are the most important to you and how often they change. You can also let us know how you would like the URLs we index to appear." (Webmaster Tools)
I don't think that last part (letting Google know how you would like the indexed URLs to appear) is possible with the traditional mechanisms that search engines use to discover URLs.
