W3C validation for complete site [closed] - w3c-validation

I am working on a project where I have to validate the complete site, which has around 150 pages, through W3C Markup Validation. Is there a way to run W3C Markup Validation on an entire website?

The W3C doesn't offer this on w3.org.
http://validator.w3.org/docs/help.html#faq-batchvalidation
But you can use the following tool and check "Validate entire site" (w3.org itself refers to this site):
http://www.htmlhelp.com/tools/validator/
However, it is limited to 100 URLs per run, and you will get this message once you reach that limit:
Batch validation is limited to 100 URLs at one time. The remaining URLs were not checked.
There is also a limit on the number of errors displayed for each URL.

The WDG offers two free solutions:
Validate entire site (select 'validate entire site')
Validate multiple URLs (batch)

You can run the validator yourself. As of 2018, the W3C uses the Nu Html Checker (v.Nu) for its validator; the code is at https://github.com/validator/validator/releases/latest and usage instructions are at https://validator.github.io/validator/#usage
For example, the following command will run it on all html files under the public_html directory:
java -jar vnu.jar --skip-non-html public_html
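If you need machine-readable output or want to keep the report small, the checker also has options such as --errors-only and --format json (check the usage page above for the full list); for example:
java -jar vnu.jar --skip-non-html --errors-only --format json public_html > report.json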

I use this bulk W3C HTML validator to validate my entire website:
http://www.bulkseotools.com/bulk-w3c-validator.php
It uses the W3C validator engine, and you can check 500 URLs at once.

I've used http://sitevalidator.com; I think it would be helpful to you.

I made this Java app (Windows installer) in my spare time because I needed it at work:
https://gsoft.no/validator. It's free.
It uses either https://validator.w3.org/ or v.Nu running locally to validate an entire site.
It crawls a website and produces a report with validator links to every page that has warnings or errors. Because it works by crawling, every page to be validated must be reachable via links.
By running v.Nu locally you can validate an internal site (e.g. an intranet) that is not available online and therefore cannot be checked by online validators (unless you post the entire content of each page).

Related

How to develop RSS Feeder [closed]

I need to build an RSS feeder in Go, and I suspect I have not understood some key concepts, so I am asking this question to clear them up.
Is there a standard for the number of recent items in the XML file?
Should the RSS document be generated on request? I mean, should the client always get the latest news?
Here is the Go part. I will use the https://github.com/gorilla/feeds library. It generates the RSS XML, but it does not provide a way to publish it.
Should I serve the RSS XML document from a REST endpoint? If I do, is that okay for RSS clients?
You may say that I should first search the internet, and I did, but most articles talk about parsing and fetching from an RSS feed, not publishing one.
Is there a standard for the number of recent items in the XML file?
No, and it also varies between feeds. That makes sense, since some sites produce lots of new content and others only a little.
Should the RSS document be generated on request? I mean, should the client always get the latest news?
That is completely up to the server. In many cases, though, it is more efficient for the server to regenerate a static file whenever new items are added, instead of dynamically producing the same output again and again for each client. This also makes it easy to provide caching information (e.g. an ETag or similar) and let the client retrieve the full content only when it has changed.
Should I serve the RSS XML document from a REST endpoint? If I do, is that okay for RSS clients?
This does not really matter. The URL for the RSS feed can be anything you want, but you have to publish it so that RSS readers know where to get it.
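For what it's worth, here is a minimal Go sketch of that approach, assuming the gorilla/feeds library from the question; the route, the rebuildFeed helper and the ETag handling are illustrative choices, not a prescribed design:

package main

import (
	"crypto/sha1"
	"fmt"
	"net/http"
	"sync"
	"time"

	"github.com/gorilla/feeds"
)

// Last rendered RSS document and its ETag, protected by a mutex.
var (
	mu     sync.RWMutex
	rssXML string
	etag   string
)

// rebuildFeed regenerates the RSS document; call it whenever new items are added.
func rebuildFeed(items []*feeds.Item) error {
	feed := &feeds.Feed{
		Title:       "Example news",
		Link:        &feeds.Link{Href: "https://example.com/"},
		Description: "Latest items",
		Created:     time.Now(),
		Items:       items,
	}
	xml, err := feed.ToRss()
	if err != nil {
		return err
	}
	mu.Lock()
	rssXML = xml
	etag = fmt.Sprintf(`"%x"`, sha1.Sum([]byte(xml)))
	mu.Unlock()
	return nil
}

// serveFeed returns the pre-rendered document and honours If-None-Match.
func serveFeed(w http.ResponseWriter, r *http.Request) {
	mu.RLock()
	body, tag := rssXML, etag
	mu.RUnlock()
	w.Header().Set("Content-Type", "application/rss+xml; charset=utf-8")
	w.Header().Set("ETag", tag)
	if r.Header.Get("If-None-Match") == tag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	fmt.Fprint(w, body)
}

func main() {
	// Seed the feed with one illustrative item.
	_ = rebuildFeed([]*feeds.Item{{
		Title:       "First post",
		Link:        &feeds.Link{Href: "https://example.com/posts/1"},
		Description: "Hello, world",
		Created:     time.Now(),
	}})
	http.HandleFunc("/feed.xml", serveFeed)
	http.ListenAndServe(":8080", nil)
}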

How to prepare half dynamic web page with best performance? [closed]

Imagine you have a web page with some static content and some dynamic content based on the user's session. For example, you may see a web page with a menu at the top that displays the username, while the remaining content is completely cacheable and static.
One simple solution to achieve this:
You can handle the dynamic part of the page on the client side with an AJAX request (which is not cacheable), e.g. as single-page applications do.
Another solution might be for the client to send a request to a middleware (e.g. an API gateway), which fetches the static part from a cache and the dynamic part from the backend, then returns the aggregated content to the client.
In my opinion, the worst solution is to disable caching.
Facebook, for example, loads the dynamic part with the first request and loads the remaining content with XHR requests.
Questions:
What is the best practice for this issue?
What would be the drawback of the second solution?
What do you think about the Stack Overflow top menu that displays your username?
An AJAX request (or fetch, or any other HTTP-based request) may well be cacheable when it is served by a RESTful service.
For more fine-grained control over what should be cached, you could use a service worker, for example by adding Workbox (https://developers.google.com/web/tools/workbox/) to your application.
If your dynamic data has to be updated live, you should also have a look at WebSockets. Depending on your stack, you could use a wrapper library like SignalR or socket.io, or simply follow one of the tutorials at http://websocketd.com/
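As a rough sketch of the first approach (cache the static shell, fetch the per-user part with a separate non-cacheable request), here is a minimal Go example; the routes, header values and the hard-coded user are assumptions for illustration only:

package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Static shell: identical for every user, so browsers and CDNs may cache it.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "public, max-age=300")
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		fmt.Fprint(w, `<html><body><div id="user"></div>
<script>
  // Dynamic part: fetched per user and never cached.
  fetch("/api/me").then(r => r.json()).then(u => {
    document.getElementById("user").textContent = u.name;
  });
</script>
<p>Static, cacheable content goes here.</p></body></html>`)
	})

	// Per-user data: marked as non-cacheable.
	http.HandleFunc("/api/me", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "no-store")
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"name": "alice"}`) // illustrative; look up the real session user here
	})

	http.ListenAndServe(":8080", nil)
}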

How to disable AMP caching from Google Search? [closed]

Some results on Google Search come with an AMP (Accelerated Mobile Pages) icon on their links, at least on mobile. As soon as you click such a link, instead of loading the site, Google shows you a cached version of it.
I want to disable this behaviour in my results; I see at least two good reasons for it:
When sharing the link, it is a pain in the neck to have the huge Google URL in place of the shorter original one.
Security: when you access a site and see a URL other than the one you wanted to load, you should distrust it, even if it looks like Google (remember, you can get phished or even caught in a trap hosted on Google Sites). Google should respect that instead of encouraging users to trust a URL just because it looks like Google's. It is even worse when combined with the first reason and you want to share the URL with a friend.
I have to remove the Google AMP prefix again and again; is there no advanced search option or cookie that makes Google give the clean URL?
According to the AMP project FAQ you cannot:
By using the AMP format, content producers are making the content in AMP files available to be cached by third parties.
As a content producer, I dislike Google adding its own URL and branding around my content. From the consumer's perspective, it looks like the content comes from Google. They say it is to improve speed, but you can see Google's intention behind this "free" technology.
A simple hack is to keep following the AMP guidelines for the speed they give the page, but violate one rule (like adding your own JavaScript that does nothing).
Once pages have a validation error, Google will not cache them.
By publishing AMP pages you let Google or any other AMP cache store and deliver your web page (which surprisingly seems to be legal):
Caching is a core part of the AMP ecosystem. Publishing a valid AMP document automatically opts it into cache delivery. (https://www.ampproject.org/docs/fundamentals/how_cached)
To stop AMP caches from storing your pages, the project recommends invalidating the format by removing the amp attribute from the <html> tag. I propose something else.
One thing I have always disliked about AMP is that it requires you to embed the JavaScript code directly from their server (https://cdn.ampproject.org/v0.js), effectively telling AMP about every single visitor to every AMP page. Embedding the code from your own server stops this privacy issue, disables caching, and still gives you the framework.
To do so you can build your own AMP framework using the source code:
https://github.com/ampproject/amphtml
But it's much simpler to just copy v0.js and all the scripts it fetches to your own server.
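For example, instead of the standard <script async src="https://cdn.ampproject.org/v0.js"></script>, the page would load something like <script async src="/amp/v0.js"></script>, assuming you have mirrored v0.js and the component scripts it loads under /amp/ on your own server.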
Odd, because Google says to remove the "amp" attribute from the tag to prevent caching.
It says nothing about loading the JS locally.
https://amp.dev/documentation/guides-and-tutorials/learn/amp-caches-and-cors/how_amp_pages_are_cached/
Is Google wrong?

Ajax / Deep linking and Google indexing / SEO - Is it a bad idea? [closed]

I'm about to embark on building a music-oriented website for a friend's band, and I want to build something like this template. It uses AJAX and deep linking.
My worry is that this site will not be crawlable by Google. Is there anything I can do, or code I can adjust, to make it crawlable?
Many thanks in advance!
That template doesn't look crawlable to me. Googlebot will never find your content. If I go to the page for the template and view source, then search for "Gigs schedule with filter", I can't find it in the page source. That is because that particular content is loaded with AJAX and not part of the page source.
That template does not use Google's crawlable AJAX scheme with #! in the URL (https://developers.google.com/webmasters/ajax-crawling/). Googlebot will not index the content on your site if you use that template.
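For reference, under that scheme (which Google has since deprecated) a URL such as http://example.com/#!/gigs was fetched by Googlebot as http://example.com/?_escaped_fragment_=/gigs, and the server was expected to return an HTML snapshot of the AJAX-rendered content at that address; example.com is a placeholder here.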
Furthermore, there appear to be some URL issues. I see these two very similar URLs: http://radykal.de/themeforest/stylico/features.html and http://radykal.de/themeforest/stylico/?page=features.html. As a user, if I visit the second URL, I get the content, but I don't see the navigation. It seems likely that if Googlebot were to find the content, it would index that second URL and use it as the landing page for your visitors. Missing navigation in that case would not be a good user experience, as users would not be able to navigate your site.

Hosting two sites within single Joomla cms [closed]

Is it possible to host multiple websites that all have one single/common CMS (Joomla)?
Thanks.
Joomla is a CMS for running a website. It uses MySQL databases that simply hold the information shown on the content pages at the front end. Used the way it is supposed to be used, it will not let you run multiple sites on a single CMS.
You can't run two websites with different content on that single CMS, but you can create multiple front ends on one CMS. You could, for example, store your data using Joomla and display it at the front using your own code. This way you can have two interfaces/websites on one CMS, both running on the same data.
So, from what I read in your question, I think the answer is NO, unless you just want to apply another presentation to your data.
My own experience: I have used Joomla just to hold news articles that my webmaster adds. I used PHP to get those news articles out of the MySQL database, so that I could apply my own presentation to the data displayed.
I actually beg to differ with those who were so quick to say "NO". As of Joomla 1.5.x there are components that let you do just that; most of them are commercial, but there is also http://www.janguo.de/lang-en/Downloads/func-finishdown/31/, which is free at the moment. As of Joomla 1.6.x, multi-site support will be integrated into Joomla itself.
If what you need is several domains that point to the same Joomla installation (and to the same content), the answer is YES (see S.Mark's answer).
If you want to use the same Joomla installation for two different websites (with different content), the answer is NO.
An alternative is to use some Joomla extension, such as:
http://extensions.joomla.org/extensions/core-enhancements/multiple-sites/5550
Yes, you can; we have done this before. You do need two databases, though. We have just written about running multiple Joomla websites on the same Joomla installation. Hope you'll find it useful...
With a CNAME record, you can mirror a website to two or more domains.
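For instance, a zone-file entry along these lines (hostnames are placeholders) points an additional name at the existing site:
www.second-domain.example. IN CNAME www.main-site.example.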
