Avoiding duplicate-content hit on Google for archive pages? - ajax

Each blog post on my site -- http://www.correlated.org -- is archived at its own permalinked URL.
On each of these archived pages, I'd like to display not only the archived post but also the 10 posts that were published before it, so that people can get a better sense of what sort of content the blog offers.
My concern is that Google and other search engines will consider those other posts to be duplicate content, since each post will appear on multiple pages.
On another blog of mine -- http://coding.pressbin.com -- I had tried to work around that by loading the earlier posts as an AJAX call, but I'm wondering if there's a simpler way.
Is there any way to signal to a search engine that a particular section of a page should not be indexed?
If not, is there an easier way than an AJAX call to do what I'm trying to do?

Caveat: this hasn't been tested in the wild, but should work based on my reading of the Google Webmaster Central blog and the schema.org docs. Anyway...
This seems like a good use case for structuring your content using microdata. This involves marking up your content as a Rich Snippet of the type Article, like so:
<div itemscope itemtype="http://schema.org/Article" class="item first">
<h3 itemprop="name">August 13's correlation</h3>
<p itemprop="description" class="stat">In general, 27 percent of people have never had any wisdom teeth extracted. But among those who describe themselves as pessimists, 38 percent haven't had wisdom teeth extracted.</p>
<p class="info">Based on a survey of 222 people who haven't had wisdom teeth extracted and 576 people in general.</p>
<p class="social"><a itemprop="url" href="http://www.correlated.org/153">Link to this statistic</a></p>
</div>
Note the use of itemscope, itemtype and itemprop to define each article on the page.
Now, according to schema.org, which is supported by Google, Yahoo and Bing, the search engines should respect the canonical url described by the itemprop="url" above:
Canonical references
Typically, links are specified using the <a> element. For example, the
following HTML links to the Wikipedia page for the book Catcher in the
Rye.
<div itemscope itemtype="http://schema.org/Book">
<span itemprop="name">The Catcher in the Rye</span>—
by <span itemprop="author">J.D. Salinger</span>
Here is the book's <a itemprop="url"
href="http://en.wikipedia.org/wiki/The_Catcher_in_the_Rye">Wikipedia
page</a>.
</div>
http://schema.org/docs/gs.html#advanced_enum
So when marked up in this way, Google should be able to correctly ascribe which piece of content belongs to which canonical URL and weight it in the SERPs accordingly.
Once you've finished marking up your content, you can test it using the Rich Snippets testing tool, which should give you a good indication of what Google thinks about your pages before you roll it into production.
p.s. the most important thing you can do to avoid a duplicate content penalty is to fix the titles on your permalink pages. Currently they all read 'Correlated - Discover surprising correlations' which will cause your ranking to take a massive hit.

I'm afraid it is not possible to tell a search engine that a specific area of your web page (for example, a div in your HTML source) should not be indexed. One workaround is to put the content you do not want search engines to index in an iframe, and then use a robots.txt file with an appropriate Disallow rule to deny access to the file that the iframe loads.

You can't tell Google to ignore portions of a web page but you can serve up that content in such a way that the search engines can't find it. You can either place that content in an <iframe> or serve it up via JavaScript.
I don't like those two approaches because they're hackish. Your best bet is to completely block those pages from the search engines since all of the content is duplicated anyway. You can accomplish that a few ways:
Block your archives using robots.txt. If your archives are in their own directory then you can block the entire directory easily. You can also block individual files and use wildcards to match patterns.
Use the <META NAME="ROBOTS" CONTENT="noindex"> tag to block each page from being indexed.
Use the X-Robots-Tag: noindex HTTP header to block each page from being indexed by the search engines. This is identical in effect to using the <META> tag, although this one can be easier to implement since you can use it in a .htaccess file and apply it to an entire directory.
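If the header is set from application code rather than .htaccess, the decision could be sketched roughly as follows. This is only an illustration: the /archive/ prefix and the robotsHeadersFor function are assumptions made up for the example, not anything taken from the site in question.

```javascript
// Hypothetical rule: every URL under /archive/ is treated as an archive page.
const ARCHIVE_PREFIX = "/archive/";

// Return the extra response headers to send for a given request path.
function robotsHeadersFor(path) {
  if (path.startsWith(ARCHIVE_PREFIX)) {
    // Ask search engines not to index this page.
    return { "X-Robots-Tag": "noindex" };
  }
  // Any other page gets no extra header and stays indexable.
  return {};
}

console.log(robotsHeadersFor("/archive/2011/08"));
console.log(robotsHeadersFor("/153"));
```

Whatever server you use would merge the returned object into the response headers before sending the page.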

Related

Custom sections Umbraco

I'm not even sure I labeled this correctly. I am in the process of converting a site to Umbraco, and there are sections of the site that need to be edited using the CMS tools in the back end. Basically, it is a grid with pictures and description text.
Here is a sample of the HTML:
<div class="hi-icon-effect-1 hi-icon-effect-1a">
<a class="hi-icon">
<img class="img-responsive " id="ImgSales" src="../../Images/sales_icon_circle_grey.png" alt="">
</a>
<p style="padding-left:5px;" id="lblSales" class="">Sales</p>
</div>
What I would like to be able to do is go to the content section of the admin and edit the list of items and configure the image and text for each item.
http://www2.strikemedia.co.za/
If you view the above link and scroll down there will be a grid of items (services) and it is this list that I want to be able to generate.
I am comfortable with all the technologies used in Umbraco; I just do not know the system well enough to make these kinds of modifications. Can someone please assist or point me to resources that will help me build this?
Thanks
You should take a look at the Archetype package: https://our.umbraco.org/projects/backoffice-extensions/archetype/
As far as I understand your question you are looking for a way to add X amount of similar items to the contents of a page - for this, Archetype is probably perfect :-)
Once you have your list of items added inside Umbraco, look here: https://github.com/kgiszewski/ArchetypeManual/blob/master/03%20-%20Template%20Usage.md
Use case #1 in this example will allow you to iterate through the items and output them with whatever "template" you want (aka the HTML sample you provided).

How Do I get my Search Bar to search within my website?

I have a search bar on my unpublished website, and I was hoping there's some kind of code that would make it return results from my own website. (As of now, using the search bar takes me to Google.)
<form id="tfnewsearch" method="get" action="http://www.google.com">
<input type="text" class="tftextinput" name="q" size="21" maxlength="120"><input type="submit" value="search" class="tfbutton">
</form>
<div class="tfclear"></div>
Any suggestions?
You have set the form's action to google.com, which is why you are being redirected to Google.
For your own search, you need a page on your site that handles the form and runs database queries to select the matching data and display the results. Replace google.com in the action attribute with the URL of that page.
It really does depend on how your site is built, and whether it's fully accessible to the public.
For example, if it's completely public, you can still use Google to search your site by using the Google Custom Search API.
Otherwise, there's no magic potion. You will likely have to write some code to index your documents etc. Many sites achieve this by storing the information in a database and creating a full text index of the site, and then querying the database. But this will require more than just CSS and HTML.
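To give a rough idea of what "creating a full text index and then querying it" means, here is a minimal in-memory sketch. It assumes a tiny hard-coded list of pages (the URLs and text are made up), whereas a real site would more likely use its database's full-text search.

```javascript
// Hypothetical pages; in practice these would come from your database or files.
const pages = [
  { url: "/about.html", text: "About our company and team" },
  { url: "/services.html", text: "Web design and hosting services" },
  { url: "/contact.html", text: "Contact our team by email" },
];

// Split text into lowercase words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build an inverted index: word -> Set of URLs containing that word.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const word of tokenize(doc.text)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc.url);
    }
  }
  return index;
}

// Return the URLs of pages that contain every word in the query.
function search(index, query) {
  const words = tokenize(query);
  if (words.length === 0) return [];
  let results = null;
  for (const word of words) {
    const urls = index.get(word) || new Set();
    results = results === null
      ? new Set(urls)
      : new Set([...results].filter((u) => urls.has(u)));
  }
  return [...results];
}

const siteIndex = buildIndex(pages);
console.log(search(siteIndex, "our team"));
```

The form's action would then point at a server-side page that runs something like search() on the q parameter and renders the matching URLs.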

aria-label, h-card, or both?

Do I need aria-label attributes when I'm using h-card (this is for company contact information in a page footer)?
<div class="h-card">
<a class="u-url" href="http://example.com">
<img src="http://example.com/static/logo.svg" alt="Example Logo">
<span class="p-name sr-only">Example Corp.</span>
</a>
<div aria-label="Address" class="p-adr h-adr">
<span class="p-locality">Eugene</span>
<span class="p-region">OR</span>
<span class="p-postal-code">97403</span>
</div>
<a aria-label="Telephone" class="p-tel" href="tel:12345678">(12) 345-678</a>
</div>
Are the aria-labels superfluous here or do they provide some value? Ought there be more detailed aria- attributes? (And if so, which?)
WAI-ARIA and Microformats don’t "compete":
WAI-ARIA is a framework to enhance the accessibility of your web content.
Microformats are a convention for marking up structured data on your HTML pages.
They have different goals, and consumers of WAI-ARIA don’t necessarily support Microformats, and consumers of Microformats don’t necessarily support WAI-ARIA.
So when deciding if you need the WAI-ARIA attribute aria-label in your example, ignore if or how you use the Microformat h-card, and vice-versa. They don’t interact with each other.
Best not to use aria-label here; at worst, a screenreader will end up reading out the aria-label instead of your content, making it less accessible.
As spec'd, the aria-label value, if present, is used instead of the element content (simplifying somewhat), but in practice, behavior varies quite a bit depending on element type and on the specific screenreader/browser used.
As it turns out, in the case of aria-label being used on a SPAN:
VoiceOver on Mac reads out the label instead of the content.
NVDA and JAWS on Windows ignore the aria-label outright and just read out the div/span content. (This behavior could change in some future update to these tools...)
So at best, it's ignored; at worst, it replaces your actual content. Best to not use it in your case then.
ARIA can be pretty useful when used carefully, but browser compat issues mean it's unfortunately full of pitfalls. If you're going to use it at all, I recommend checking out the specs and testing with real-world screenreaders, so that using ARIA doesn't have the unintended consequence of making your content less accessible!

PJAX Scroll Loading

I'm working on a news publishing site that needs to load in stories from an RSS feed below the current news page. I've been using InfiniteAjaxScroll (http://infiniteajaxscroll.com/) with some success; however, I've hit a brick wall. There is no way for me to dynamically change which story should load next as you scroll down the page.
Does anyone know of any other plugins, tutorials, or examples that replicate behavior like this? I've searched but come up with nothing that meets these requirements.
I'm trying to create something similar to what the Daily Beast has implemented on their site.
http://www.thedailybeast.com/articles/2014/11/05/inside-the-democrats-godawful-midterm-election-wipeout.html
How do they know what stories to load in?
Thanks!
If you're using the InfiniteAjaxScroll library, the "next story" is whatever link you define as the next URL which can be dynamic for each story you load.
Imagine your first story's HTML as something like this
<div class="stories">
<div class="story">
...
</div>
</div>
<div id="pagination">
<a href="storyC.html">next</a>
</div>
Then in the storyC.html you have
...
<div id="pagination">
next
</div>
Assuming you're using some sort of dynamic backend, you would use some sort of logic to grab a related story and just set that URL as the "next" URL.
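As a rough sketch of that "grab a related story" logic, assuming the backend knows each story's tags: the story list, the tag-overlap rule, and the function names below are all made up for illustration and are not part of InfiniteAjaxScroll.

```javascript
// Hypothetical story records; a real backend would load these from its data store.
const stories = [
  { url: "storyA.html", tags: ["politics", "election"] },
  { url: "storyB.html", tags: ["sports"] },
  { url: "storyC.html", tags: ["politics", "senate"] },
];

// Pick the not-yet-shown story that shares the most tags with the current one.
function nextStoryUrl(currentUrl, alreadyShown) {
  const current = stories.find((s) => s.url === currentUrl);
  if (!current) return null;
  let best = null;
  let bestScore = -1;
  for (const story of stories) {
    if (story.url === currentUrl || alreadyShown.includes(story.url)) continue;
    const score = story.tags.filter((t) => current.tags.includes(t)).length;
    if (score > bestScore) {
      best = story;
      bestScore = score;
    }
  }
  return best ? best.url : null;
}

// Render the pagination block that the scroll plugin would follow.
function paginationHtml(currentUrl, alreadyShown) {
  const next = nextStoryUrl(currentUrl, alreadyShown);
  return next ? `<div id="pagination"><a href="${next}">next</a></div>` : "";
}

console.log(paginationHtml("storyA.html", []));
```

Each story page would emit its own pagination block this way, so the chain of "next" links can differ depending on where the reader started.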

IE8 & FF XHTML error or badly formed span?

I recently have found a strange occurrence in IE8 & FF.
The designers were using JS to dynamically create some span tags for layout (they were placing rounded corner graphics on some tabs). Now the XHTML, in JS, looked like this: <span class="leftcorner" /><span class="rightcorner" /> and worked perfectly!
As we all know, dynamically rendering elements in JS can be quite processor intensive, so I moved the elements from JS into the page source, exactly as above.
... and it didn't work... not only did it not work, it crashed IE8. The fix was simple: put the closing span tag in, i.e. <span class="leftcorner"></span>
I am a bit confused by this.
Firstly, as far as I am aware, <span class="leftcorner" /> is perfectly valid XHTML!
Secondly it works dynamically, but not in XHTML?!?!?
Can anyone shed any light on this or is it simply another odd occurrence of browsers?
The major browsers only support a small subset of self-closing tags. (See this answer for a complete list.)
Depending on how you were creating the elements in JS, the JavaScript engine probably created a valid element to place in the DOM.
I had a similar problem with <a> tags in IE.
The problem was that my links looked like this (the icon was set with CSS, so I didn't need any text in them):
<a href="link" class="icon edit" />
Unfortunately, in IE these links were not displayed at all. They have to be in <a></a> format (leaving the text empty didn't work either, so I put &nbsp; there). So I added a few extra JS lines to fix it, as I didn't want to change all my HTML just for this browser (P.S. I'm using jQuery for my JS).
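// Note: $.browser was removed in jQuery 1.9, so this only works on older versions.
if ($.browser.msie) {
  // Give each empty icon link some content so IE renders it.
  $('a.icon').html('&nbsp;');
}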
IE in particular does not support XHTML. That is, it will never apply proper XML parsing rules to a document - it will treat it as HTML even with proper DOCTYPE and all. XHTML is not always valid SGML, however. In some cases (such as <br/>) IE can figure it out because it's prepared to parse tagsoup, and not just valid SGML. However, in other cases, the same "tagsoup" behavior means that it won't treat /> as self-closing tag terminator.
In general, my advice is to just use HTML 4.01 Strict. That way you know exactly what to expect. And there's little point in feeding XHTML to browsers when they're treating it as HTML anyway...
I think that one of the answers to "Is writing self closing tags for elements not traditionally empty bad practice?" will answer your question.
XHTML is only XHTML if it is served as application/xhtml+xml — otherwise, at least as far as browsers are concerned, it is HTML and treated as tag soup.
As a result, <span /> means "A span start tag" and not "A complete span element". (Technically it should mean "A span start tag and a greater than sign", but that is another story).
The XHTML spec tells you what you need to do to get your XHTML to parse as HTML.
One of the rules is "For non-empty elements, end tags are required". The list of elements includes a quick reference to which are empty and which are not.
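If you have a lot of existing markup to fix, the "end tags are required for non-empty elements" rule can be applied mechanically. The following is a regex-based sketch only, assuming well-behaved markup; the expandSelfClosing name is made up, and a real HTML parser would be more robust.

```javascript
// Elements that HTML defines as empty (void); these may keep the "/>" form.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "param", "source", "track", "wbr",
]);

// Rewrite <tag ... /> as <tag ...></tag> for every non-void element.
// Sketch only: it does not handle "<" or ">" inside attribute values.
function expandSelfClosing(markup) {
  return markup.replace(
    /<([a-zA-Z][a-zA-Z0-9]*)([^<>]*?)\s*\/>/g,
    (match, tag, attrs) =>
      VOID_ELEMENTS.has(tag.toLowerCase()) ? match : `<${tag}${attrs}></${tag}>`
  );
}

console.log(expandSelfClosing('<span class="leftcorner" /><br />'));
```

Void elements such as br are left untouched, while the span gets an explicit end tag, which is the form browsers parse correctly when the page is served as HTML.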

Resources