Is there an automated way to validate all AMP pages in a site? - validation

What am I trying to accomplish?
I am trying to validate all AMP pages in a site automatically (the way the Google AMP Validator checks a single page) and store the results. Is there a bulk validator on NPM or something similar out there? I am trying to avoid having to go through my sitemaps manually and test each of thousands of URLs.

There is an NPM library and command line tool (https://npmjs.com/package/amphtml-validator), but you will still need to somehow generate the list of documents.
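One way to generate that list of documents is to read the URLs out of the sitemap. The following is only a minimal sketch: it assumes a standard sitemaps.org `urlset` file, the function name `urls_from_sitemap` is made up for illustration, and it does not follow sitemap index files.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> URL found in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Collect the text of each <loc> element, skipping empty ones.
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
```

The resulting URLs can then be written to a file or handed to the validator one by one, with the results stored however suits you.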

Another option for validating thousands of URLs in a short time is the website https://www.ampvalidator.com/
I found it secure, accurate, fast and easy to use. The most interesting part is that you get a nicely formatted email report.

Related

Implementing recaptcha on Github pages without php?

Absolute newb here, please forgive me for this basic question.
I have built my portfolio site using Github pages, but am experiencing spam via my contact form (hosted by GetSimpleForm). I am trying to implement Google reCAPTCHA, but I'm a bit stuck in the backend part. As I understand, Github pages don't support PHP, so I can not actually complete the form verification.
Google documentation here was unfortunately a bit overwhelming and cryptic to me as a beginner, since I just stared at my Github html/css/js files and had no clue what to put where.
Am I trying to do the impossible? Is it possible to use reCaptcha on Github pages? If so, is there a beginner friendly tutorial somewhere or a straightforward "copy-paste" thing I could use? (so far, it's not been clear where to use the secret key from the API key pair for example)
Thanks a bunch for any leads or alternative solutions for spam prevention that would work in Github pages!
The short answer is that you cannot. GitHub Pages only supports static sites. You have to host the website yourself if you want anything that needs a backend, such as a server-side check, and most such hosting is not free.
The only suggestion I can come up with is to change your contact form to a regular HTML form instead of having it hosted by the third-party service you are using. I suspect the main reason you get spam is that you are using its service.
A really simple way to do it is to make the form with HTML (you can either copy the code from a pre-made HTML site with a form, or find a YouTube tutorial that shows you how to make an HTML form, which is pretty simple), and host it on something like Netlify. Netlify is free for static websites unless you are doing something really complicated, and it has built-in form submission that will send you an email automatically every time someone fills out the form. You don't need PHP or a third-party app or anything.
You still create and edit the code of the website through GitHub; you just need to connect it to Netlify for the forms. I'm a complete beginner and I figured it out. Netlify has some tutorials that explain it nice and simple. There is no reason to pay or do a lot of complicated stuff, and you can make professional websites with just HTML and CSS.

Check on which pages an image is used?

Is there a certain way to check which pages on a website use a specific image?
Say I have some image which I don't use on a page anymore, so I'd like to delete it from my server. But I'm not entirely sure if it's being used on other pages, is there a way to check if it's still being shown on other pages?
You can hook your website up to Google Webmaster Tools and wait a little; after a while, 404 errors will appear there. This way you can track unused resources and dead ends.
This includes images.
There is a better way if you have direct access to the web server.
Visit every page in your website or let Google crawl it.
You can later sort the files by access time; the ones that have not been accessed lately are not used.
You have to make sure the images are actually fetched from the server when the pages load, so I would use a session with no history and no cache.
How to sort the files according to the time stamp in unix?
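As a rough illustration of the sorting step, the sketch below lists files by last access time, oldest first. It assumes the relevant timestamp is the access time (serving a file reads it rather than modifies it) and that the filesystem actually records access times, which mounts using `noatime` do not. The function name `files_by_last_access` is made up for this example.

```python
from pathlib import Path


def files_by_last_access(directory):
    """Return (access_time, path) pairs for all files under directory, oldest first."""
    entries = [
        (p.stat().st_atime, str(p))  # st_atime: last access, seconds since the epoch
        for p in Path(directory).rglob("*")
        if p.is_file()
    ]
    # Tuples sort by access time first, so the least recently used files come first.
    return sorted(entries)
```

Files at the top of the result are the candidates for deletion, provided every page was visited after the cutoff you have in mind.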

Encode/Decode SEO urls through platform API?

I am trying to decode and encode Joomla URLs, but Joomla doesn't seem to have a consistent API for that (or so it looks). The main problem arises when another SEO plugin is installed and the operation is performed as a background process (i.e. not while rendering in a browser through Joomla).
The other big problem is that users copy and paste SEO URLs of their own site directly into the content.
Does anyone know a solution for this? Supporting all sorts of SEO plugins individually is a total no-go and practically impossible.
I actually thought it was the job of the CMS to guarantee at the API level that SEO URLs can be decoded and encoded without knowing about the plugins, but no. I also had a look at some plugins and indeed, plugins do contain code to handle other plugins, although they shouldn't have to.
Well, thanks
You can't. JRoute won't work reliably in the administrator; I even tried hacking it, and it's a no-go.
Moreover, sh404 (one of the leading SEF extensions) does a curl call to the frontend in order to get the paths right. You can find in their code a commented-out attempt to route in the backend.
Are you trying to parse content when it's saved, find SEF URLs and replace them with their non-SEF equivalents? If you create a simple component to handle this in the frontend (just get what you need from xmap), then you can query the frontend from the backend with curl/wget and possibly achieve this with a decent rate of success. I wouldn't expect it to work 100% of the time, though: sometimes parameters are added by components, or the order of parameters differs from call to call, and the router.php in extensions can be very fragile or even plain wrong.

One-page AJAX-based WordPress site. How should I do it?

I am trying to create a one-page WordPress website, something like the ones you sometimes see in ThemeForest's WP section: the whole website is a long page that has everything in one place, from about us, to portfolio, to some blog posts, to contacts.
Placing all things on one page is not difficult. But when I started thinking about how to present individual posts and pages, I realised that I probably need a general way of getting posts' data via AJAX, and create new blocks with JS. How should I go about this? I suppose this was done before, but I struggle to find something this specific on Codex or a tutorial with best practices.
Any advice or link will be greatly appreciated.
You could use a plugin such as jQuery Easytabs (download it here), which has a built-in Ajax component.
I've found that the easiest way is to just get all content to load into the divs ahead of time, vs. trying to load all pages through Ajax. However, appending something like '?ajax/ajax' to the end of your urls through the Easytabs plugin is one option that I have successfully used in the past.
If you decide to use the easytabs functionality, there is ample documentation on the page that I linked to.

AngularJS / AJAX app and search engine crawlers

I've got a web app which heavily uses AngularJS / AJAX and I'd like it to be crawlable by Google and other search engines. My understanding is that I need to do something special to make it work, as described here: https://developers.google.com/webmasters/ajax-crawling
Unfortunately, that looks quite nasty and I'd rather not introduce the hash tags. What I'd like to do is to serve a static page to Googlebot (based on the User-Agent), either directly or by sending it a 302 redirect. That way, the web app can be the same, and the whole Googlebot workaround is nicely isolated until it is no longer necessary.
My worry is that Google may mistakenly assume that I'm trying to trick Googlebot, while my goal is to help it. What do you guys think about this approach, and what would you recommend?
I recently came upon this excellent post from yearofmoo, explaining in detail how to make your Angular app SEO friendly. In essence, when bots see a URI with a hash tag they will know it's an ajaxed page and will try to reach the same URI by replacing '#!' in your URI with '?_escaped_fragment_='. This alternative URI tells bots that they should expect to find a definitive static version of the page they were accessing.
Of course, to achieve this you'd have to introduce hash tags into your URIs. I don't see why you are trying to avoid them. Isn't Gmail using hash tags?
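To make the '#!' rewrite concrete, here is a small sketch of the mapping a crawler applies under this scheme. The function name `escaped_fragment_url` is made up for illustration, and the percent-encoding is a simplification: `quote` may encode more characters than the scheme strictly requires.

```python
from urllib.parse import quote


def escaped_fragment_url(url):
    """Map a '#!' URL to its '_escaped_fragment_' equivalent (illustrative only)."""
    if "#!" not in url:
        return url  # not an AJAX-crawlable URL, leave it untouched
    base, fragment = url.split("#!", 1)
    # Append to an existing query string with '&', otherwise start one with '?'.
    separator = "&" if "?" in base else "?"
    return base + separator + "_escaped_fragment_=" + quote(fragment, safe="/=")
```

Your server would then need to answer requests for the rewritten URL with the static snapshot of the page.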
Yeah, unfortunately, if you want to be indexed you have to adhere to the scheme :( If you're running a Ruby app, there's a gem that implements the crawling scheme for any Rack app:
gem install google_ajax_crawler
A writeup of how to use it is at http://thecodeabode.blogspot.com.au/2013/03/backbonejs-and-seo-google-ajax-crawling.html, and the source code is at https://github.com/benkitzelman/google-ajax-crawler
Have a look at these links; they should point you in a good direction:
Set up your own Prerender service using Prerender.io open source code:
https://prerender.io/
Use a different existing service such as BromBone, Seo.js or SEO4AJAX:
http://www.brombone.com/
http://getseojs.com/
http://www.seo4ajax.com/
Create your own service for rendering and serving snapshots to search engines. Read this article. It will give you the big picture:
http://scotch.io/tutorials/javascript/angularjs-seo-with-prerender-io
As of May 2014 GoogleBot now executes JavaScript. Check WebmasterTools to see how Google sees your site.
http://googlewebmastercentral.blogspot.no/2014/05/understanding-web-pages-better.html
Edit: Note that this does not mean other crawlers (Bing, Facebook, etc.) will execute Javascript. You may still need to take additional steps to ensure that these crawlers can see your site.
