AMP: How can I create a sitemap?

My site is based on AMP (Accelerated Mobile Pages). How can I create sitemap files if my pages are AMP only? Thanks.

Sitemaps can point to any content; it is only by convention that they normally point to web pages. They can point to PDF files, Word documents, HTML pages running AMP, and so on.
Create your sitemap as normal and it will get picked up and indexed.
As abielita mentioned, if you also have standard HTML pages, the preferred way is to list them in a sitemap and point to the AMP versions with rel="amphtml" links.

You can check this tutorial. Note that AMP does not need a sitemap (HTML or XML) if the content is marked up correctly: search engines know where AMP content is from this piece of code present in your regular pages: <link rel="amphtml" href="https://www.example.com/url/to/amp/document.html">. You can also check this link for more information and explanation. The sitemap is added to your website by simply using the customizable shortcode: [amp-sitemap append='amp' heading='AMP Sitemap' max='5']
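For illustration, a minimal sketch of what a sitemap for AMP-only pages might look like (the URLs are hypothetical; AMP pages are listed exactly like any other URL):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- An AMP page needs no special markup in the sitemap -->
  <url>
    <loc>https://www.example.com/url/to/amp/document.html</loc>
  </url>
  <url>
    <loc>https://www.example.com/url/to/amp/other-document.html</loc>
  </url>
</urlset>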

Related

Will my pages be crawled by Google if they are Markdown files asynchronously interpreted and injected into the DOM?

I am planning to start a blog, so I created my own Laravel website. My posts are Markdown files with the .md extension. When a user visits a post, e.g. example.com/how-to-create-a-website, the Markdown file is fetched and parsed to generate HTML content, which is displayed on a view called post.
So I do not actually have any HTML files except post.blade.php. Will this prevent the crawler from crawling my website, given that I have no HTML pages and all my posts are Markdown files?
The answer is no: Google or any other search engine crawler will read the rendered HTML, not your Markdown files.
Google offers a tool to simulate crawling and even manually index your page; you will need to sign up for Search Console. Check it out here.
You have to be careful with dynamic content, though: if injecting it into the DOM takes too long, the robot may leave before the content shows up.
There are experiments that test this; one that suits your situation is asynchronous injection:
Experiment
After a timeout of 1000 milliseconds, the test writes a string into a DIV element.
Test content
For the test to be successful, the following content should get indexed.
Asynchronously injected content can be found in Google search, and this is proof: ngwzjcrnub
Result
Google definitely indexes this content.
Source of the experiment
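A minimal sketch of such a test page, mirroring the experiment's description (the injected string here is just a placeholder):

<!doctype html>
<html>
<head><meta charset="utf-8"><title>Async injection test</title></head>
<body>
<div id="target"></div>
<script>
// After a timeout of 1000 milliseconds, write a string into the DIV,
// mimicking content injected into the DOM after page load.
setTimeout(function () {
  document.getElementById('target').textContent = 'Asynchronously injected content';
}, 1000);
</script>
</body>
</html>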
Hope this helps you.

Is it possible to convert the current page to AMP?

I'm using this extension for creating AMP pages for our Magento e-commerce website.
In that extension, they create separate pages for AMP (home page, category page, product page). We felt it was extra work.
Is it possible to convert the current page to AMP (without any modification)?
I don't think it's possible without any modification unless you use some third-party apps. You may check this Convert HTML to AMP tutorial. Note that it is strongly recommended to use HTTPS in production environments; HTTPS has several benefits beyond security, including SEO. You can read more about this topic in this Google Webmaster blog post. Also, according to this page, if you use WordPress, all you have to do is download the official AMP WordPress plugin.
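To give an idea of the modification involved, here is a rough sketch of the minimal AMP HTML skeleton a converted page must follow (the canonical URL is hypothetical, and the mandatory amp-boilerplate style is omitted; copy it verbatim from the AMP documentation):

<!doctype html>
<html ⚡>
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width,minimum-scale=1,initial-scale=1">
  <!-- The AMP runtime script is required on every AMP page -->
  <script async src="https://cdn.ampproject.org/v0.js"></script>
  <!-- Points back to the regular (non-AMP) version of the page -->
  <link rel="canonical" href="https://www.example.com/product.html">
  <!-- The required amp-boilerplate <style> and <noscript> blocks go here -->
</head>
<body>
  <h1>Hello, AMP</h1>
</body>
</html>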

Serve a dedicated HTML page for dynamic content to the Google crawler without changing the URL

My website is built in JavaScript, with dynamically generated content on top of a fixed HTML frame. To make Google aware of the content, I use the _escaped_fragment_ trick and track on the server side when to serve static content instead of dynamic. It all works well for the subpages as long as they are linked with #!, which is the case for all pages but the homepage.
I obviously want to keep the homepage without an ugly #! at the end of the URL.
So far the only solution I can think of is to serve the homepage with static content, instead of the Ajax-generated version, to everyone.
I would rather keep the Google-dedicated branch separate from the common version, as I don't maintain it as much, especially in terms of CSS and navigation, which matter less there.
Is there a way to figure out that it is Google crawling the website and serve a static version instead?
The solution is to add the meta tag:
<meta name="fragment" content="!">
More details here.
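For context, a sketch of how the tag sits in the homepage (the page itself is hypothetical): with the meta tag in place, the crawler requests the same URL with ?_escaped_fragment_= appended, and your server can answer that request with the static version while regular visitors keep the clean URL.

<!doctype html>
<html>
<head>
  <meta charset="utf-8">
  <!-- Tells the crawler an HTML snapshot is available: it will fetch
       http://example.com/?_escaped_fragment_= instead of this page -->
  <meta name="fragment" content="!">
</head>
<body>
  <!-- Ajax-generated content for regular visitors -->
</body>
</html>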

How to avoid custom URL generation by the Joomla SH404SEF component?

I have been using the Joomla SH404SEF component on my site. My problem is that it generates two URLs for the same content, which causes a problem with the Google search engine, as both URLs point to the same content.
Here are examples of the URLs generated by the component:
http://www.mysite.com/page.html - automatic URL
http://www.mysite.com/page/ - custom URL
When I purge URLs from the component options, it eliminates the non-.html URLs from the database, but the URLs are created again when we post a new page, etc.
Has anybody come across this issue and could give a suggestion?
Thanks in advance.
I think you'll find this is Joomla generating the URL, not SH404SEF. It has a habit of generating extras, especially where you have blog-style views. The way I get around this problem is threefold:
1. Create a solid menu structure (now harder on Joomla 2.5, where aliases are created with date/time). This should take care of most issues. Make sure you mark unnecessary levels as noindex-nofollow.
2. Use a third-party tool to mark secondary URLs with a canonical tag, as shown below. Look at ITPMetaPro, but many others are available.
3. Work in Webmaster Tools to remove URLs from the index after following steps 1 and 2.
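For reference, the canonical tag such a tool outputs would look roughly like this (using the example URLs from the question):

<!-- Placed in the <head> of http://www.mysite.com/page/ to tell search
     engines that the .html version is the preferred URL -->
<link rel="canonical" href="http://www.mysite.com/page.html">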

In a sitemap, is it advisable to include links to every page on the site, or only ones that need it?

I'm in the process of creating a sitemap for my website. I'm doing this because I have a large number of pages that users can normally only reach via a search form.
I've created an automated method for pulling the links out of the database and compiling them into a sitemap. However, for all the pages that are regularly accessible and do not live in the database, I would have to go through manually and add them to the sitemap.
It strikes me that the regular pages get found anyway by ordinary crawlers, so it seems like a hassle to add those pages manually and then keep the sitemap up to date whenever they change.
Is it bad to just leave those out, since they're already being indexed, and have my sitemap contain only my dynamic pages?
Google will crawl any URLs it discovers (as allowed by robots.txt), even if they are not in the sitemap. As long as your static pages are all reachable from the other pages in your sitemap, it is fine to exclude them. However, other features of sitemap XML, such as modification dates and priorities, may give you an incentive to include static URLs in your sitemap.
If you're willing to write a script to automatically generate a sitemap for database entries, then take it one step further and make your script also generate entries for static pages. This could be as simple as searching through the webroot and looking for *.html files. Or if you are using a framework, iterate over your framework's static routes.
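As a sketch, entries for a database-driven page and a static page might look like this (URLs, dates, and priorities are all hypothetical):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- A page pulled from the database -->
  <url>
    <loc>https://www.example.com/items/blue-widget</loc>
    <lastmod>2016-05-01</lastmod>
    <priority>0.8</priority>
  </url>
  <!-- A static page found by scanning the webroot -->
  <url>
    <loc>https://www.example.com/about.html</loc>
    <lastmod>2016-01-15</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>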
Yes, I think it is bad to leave them out. I think it would also be advisable to look for a way for your search pages to be found by a crawler without a sitemap. For example, you could add some kind of advanced search page where a user can select the search term in a form; crawlers can also fill in those forms.
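A sketch of such a form (the action URL and options are hypothetical): because the select element offers a finite list of values and the form submits via GET, a crawler can enumerate the resulting URLs.

<form action="/search" method="get">
  <!-- A fixed list of options gives crawlers a finite set of result URLs -->
  <select name="term">
    <option value="widgets">Widgets</option>
    <option value="gadgets">Gadgets</option>
  </select>
  <button type="submit">Search</button>
</form>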
