Image Alt SEO localisation - image

How do Google and other search engines treat images with the same source but different (translated) alt attributes?
<!--on English page: example.com/en/locations -->
<img src="http://example.com/img/london.jpg" alt="nice bridge" />
<!--on German page: example.com/de/stelle -->
<img src="http://example.com/img/london.jpg" alt="schöne Brücke" />
Which language is more relevant? Do search engines use both alt texts, or...
Thanks

There are no hard facts on this, but there is no reason to think that search engines would look at different language versions when analyzing a page. On each page, they take the alt attribute as the textual equivalent of the image, and it is natural that the texts differ between pages in different languages. So for each page, search engines use the alt text that the page itself has.
On the other hand, “nice bridge” and “schöne Brücke” are rather useless for the purposes of searching, and they are not appropriate textual equivalents of any image that I can imagine.
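To make the "one alt text per page" idea concrete, here is a minimal Python sketch. It is only an illustration of reading each page independently, not a description of how any search engine is implemented; the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collects the alt text of every <img> on a single page, keyed by src."""

    def __init__(self):
        super().__init__()
        self.alts = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closed tags such as <img ... />
        if tag == "img":
            attributes = dict(attrs)
            self.alts[attributes.get("src")] = attributes.get("alt", "")


def alt_texts(page_html):
    """Return {src: alt} for one page, looking at that page only."""
    collector = AltCollector()
    collector.feed(page_html)
    collector.close()
    return collector.alts


english_page = '<img src="http://example.com/img/london.jpg" alt="nice bridge" />'
german_page = '<img src="http://example.com/img/london.jpg" alt="schöne Brücke" />'

english_alts = alt_texts(english_page)
german_alts = alt_texts(german_page)
```

The same image URL ends up with a different textual equivalent on each page, which is the behaviour the answer expects.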

Related

When do I need to use x-default for hreflang?

I run a site in Belgium whose default language is Dutch. Using a selector, the user can switch the page to English or French.
When entering the site for the first time it's served in Dutch:
http://example.com/articles/my_article/
The language switcher gives you this English version (this places a language cookie for English):
http://example.com/my_article/?lang=en
The language switcher gives you this French version (this places a language cookie for French):
http://example.com/my_article/?lang=fr
The language switcher gives you this Dutch version (this places a language cookie for Dutch):
http://example.com/my_article/?lang=nl
Now I use the following canonical and alternate hreflang tags on this page:
<link rel='canonical' href='http://example.com/my_article/'/>
<link rel='alternate' hreflang='nl' href='http://example.com/my_article/?lang=nl'/>
<link rel='alternate' hreflang='en' href='http://example.com/my_article/?lang=en'/>
<link rel='alternate' hreflang='fr' href='http://example.com/my_article/?lang=fr'/>
The problem is, when you go back to the following URL after visiting a URL with lang=xy then it'll be served in the language based on the cookie that was previously set:
http://example.com/articles/my_article/
Does that mean I should add x-default for this page?
<link rel="alternate" href="http://example.com/my_article/" hreflang="x-default" />
From my understanding, that is the way it is supposed to work. Once users select a language, they see the content in that language.
X-default should point to a "language/region/country selection page".
In this case, it could be example.com/welcome that shows a menu to select a preferred language.
So x-default should not point to any particular language version of the page. For example, do not make the English version (example.com/my_article/?lang=en) the x-default. It should point to the language selection page, like a welcome page. That page should be written in whatever language you think is the safest "catch-all", with a design that's easy to navigate even if you don't speak it (country flags with the language name written in the language of that country, stating something like "English language site version" or whatever you think explains it best).
Google explains it here:
https://support.google.com/webmasters/answer/189077?hl=en
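A small Python sketch of generating the tags described in this answer. The helper name is hypothetical, the ?lang= URL scheme is taken from the question, and example.com/welcome is the selector page suggested above; treat it as one possible way to assemble the markup, not a required format.

```python
from html import escape


def hreflang_links(article_url, languages, selector_url=None):
    """Build <link rel="alternate"> tags for each language version.

    article_url  -- URL of the article without the lang parameter
    languages    -- language codes, e.g. ["nl", "en", "fr"]
    selector_url -- optional language selection page used as x-default
    """
    links = [
        '<link rel="alternate" hreflang="{0}" href="{1}?lang={0}" />'.format(
            lang, escape(article_url)
        )
        for lang in languages
    ]
    if selector_url is not None:
        links.append(
            '<link rel="alternate" hreflang="x-default" href="{}" />'.format(
                escape(selector_url)
            )
        )
    return links


tags = hreflang_links(
    "http://example.com/my_article/",
    ["nl", "en", "fr"],
    selector_url="http://example.com/welcome",
)
```

Printing the items of tags one per line gives the block that would go in the page head.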

How can I write the correct XPath for the HTML code below?

I have the following HTML code:
<a onmousedown="return rwt(this,'','','','1','AFQjCNGCu8Es2fdCh_-QSfscnnAaMVAngg','','0CB0QFjAA','','',event)"
href="http://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0CB0QFjAA&url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FBrazil&ei=XaviVIqPA8KqNq3-gpgO&usg=AFQjCNGCu8Es2fdCh_-QSfscnnAaMVAngg&bvm=bv.85970519,d.eXY">
Brazil - Wikipedia, the free encyclopedia</a>
Here is the XPath for the HTML above, as reported by Firebug in Firefox:
/html/body/div[1]/div[4]/div[3]/div[6]/div[2]/div[3]/div/div[2]/div[2]/div/div[1]/ol/li[1]/div/h3/a
How can I write the Xpath code?
For a "search-friendly" XPath, you should use only features that do not change between pages and, preferably, do not depend too much on internal details of the page that can change without notice. Here, these are:
The title of the search result
The fact it's a hyperlink
//a[text()='Brazil - Wikipedia, the free encyclopedia']
This worked for me for the page retrieved with Firefox 28 but not with Python requests. In the latter case, the word "Brazil" in the hyperlink text was wrapped in a bold element, so this had to be used instead:
//a[text()=' - Wikipedia, the free encyclopedia']/*[text()='Brazil']/..
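Both variants can be handled at once by comparing the full text of the link, including text inside nested elements, instead of only its direct text nodes. Below is a minimal Python sketch using the standard library's ElementTree. The two markup snippets are simplified, well-formed stand-ins for the real result page (which would need an HTML parser), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_links_by_text(root, wanted):
    """Return <a> elements whose complete text, nested tags included, equals wanted."""
    return [
        link
        for link in root.iter("a")
        if "".join(link.itertext()).strip() == wanted
    ]


TITLE = "Brazil - Wikipedia, the free encyclopedia"

# Simplified stand-ins for the two versions of the page described above
plain = ET.fromstring(
    '<div><h3><a href="http://en.wikipedia.org/wiki/Brazil">'
    "Brazil - Wikipedia, the free encyclopedia</a></h3></div>"
)
bolded = ET.fromstring(
    '<div><h3><a href="http://en.wikipedia.org/wiki/Brazil">'
    "<b>Brazil</b> - Wikipedia, the free encyclopedia</a></h3></div>"
)

plain_hits = find_links_by_text(plain, TITLE)
bolded_hits = find_links_by_text(bolded, TITLE)
```

In XPath 1.0 terms this corresponds to testing the string value of the element, for example //a[.='Brazil - Wikipedia, the free encyclopedia'], rather than text().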

Extending language definitions (for code highlighting) in notepad++

I've been doing development in TWIG lately. It is an html templating language that is very simple and robust.
I've set notepad++ to automatically treat .twig files as html. This is ok, but I don't get any syntax highlighting on my twig functions.
The twig syntax is incredibly simple (by design) and would be easy to add to notepad++. The problem is, everything I find on this subject is either about creating a new language definition (and I do not want to reinvent the html definition), or modifying the color for existing syntax bits in a language.
Is there any way to copy a language definition and then modify it in notepad++? If not, is there any way in notepad++ to add extra syntax bits to an existing language definition?
edit
TWIG is an HTML template language/engine. The syntax for it is the same as HTML, with the addition of a few open/close tags (specifically {% %}, {{ }}, and {# #}) for control statements. You can read more about it at the Twig website.
edit #2
Based on the answer from Brian Deragon, I have been investigating 3 files. Here's what I've figured out/done so far:
\plugins\APIs\html.xml - Seems to define keywords, for autocomplete. I made a copy of the file named twig.xml
langs.model.xml - Again, a list of keywords, with all the languages in 1 XML file. I copied the HTML object and replaced the name and ext parameters with twig.
stylers.model.xml - Has a list of different items, and style information (color, bg color, font, etc) for each. I copied the HTML section and changed the name and desc parameters to twig.
Those changes done, I opened up a twig file in notepad++, hoping to see it listed in the language options. Sadly, it has not appeared, leading me to believe that some of this is hard coded (and thus what I want might not be possible).
The stylers.model.xml is interesting, though. Each entry has a bunch of items, defined like this:
<LexerType name="twig" desc="TWIG" ext="">
<WordsStyle name="DEFAULT" styleID="0" fgColor="000000" bgColor="FFFFFF" fontName="" fontStyle="1" fontSize="" />
<WordsStyle name="COMMENT" styleID="9" fgColor="008000" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
<WordsStyle name="TAG" styleID="1" fgColor="0000FF" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
<WordsStyle name="TAGEND" styleID="11" fgColor="0000FF" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
...
</LexerType>
Those seem to be where the styles are defined for the different elements. I can't find where the elements themselves are defined, though. langs.model.xml has a definition for comment start/end, but not for any other delimiters. What I really need is a place to tell notepad++ to treat { } as a delimiter, much like it does for < > now.
edit #3
I am also looking at this list of user defined languages for notepad++ http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=User_Defined_Language_Files
User defined languages use a different engine, but I might be able to find one in there that is similar enough to HTML that I can adapt it.
You should be able to just copy and edit the XML definition file (html.xml), as long as you don't need anything beyond the basics, such as code folding, advanced coloring based on case-handling blocks or multiple conditionals, separate formatting for lead characters, label coloring, XML-based commenting, language mixing (coloring of embedded scripts), or support for coloring of duck types. If you need anything "advanced", you need to write your own lexer, in which case most of the below applies.
Even still, the templates I listed below should give you a head-start on your own language definition file.
As far as I'm aware, Notepad++ uses Scintilla Lexers for determining its code rules.
You'll have to create your own lexer, but...the HTML Scintilla Lexer is already included in the Scintilla source code.
Then you would insert your custom lexer using a plug-in, like Gary's Lua Highlighter Plugin.
Resources for building a custom lexer:
How to write a scintilla lexer
That being said, Geany is very similar to Notepad++ (based off the same engine, Scintilla), so you might want to see whether it's already been done for Geany, or whether there's an open-source project for it in the works. This would at least give you a head start.
If that doesn't help, there are IDEs and editors with Twig support built-in, like:
Eclipse
Netbeans
GEdit (which has a Windows binary, if needed)
JetBrains PhpStorm
GEdit has published their XML definition of the language here, which might help as a reference when creating your own definition file or lexer; there's also another template published by the guys from Twig here that might be of some help.
Here are the best Notepad++-specific tutorials for creating custom lexers/User Defined Languages I can find:
User Defined Languages
How to create a user-defined language in Notepad++ based on an existing language?
If you want to get brave and build your own Scintilla DLL, see this thread, where someone got it working and showing up in the language list (use the previous/next message links or the thread index to see the responses; it's a mailing list, so its UI isn't the best):
http://osdir.com/ml/editors.notepad++/2007-02/msg00021.html
Hope that helps or gets you at least more of a head start!
I made a Highlighter for it here:
https://github.com/Banane9/notepadplusplus-twig
Possible duplicate of this post: https://superuser.com/questions/40876/assigning-custom-extensions-to-a-languages-syntax-highlighting-in-notepad
All you need to do is add your custom extension in Settings->Style Configurator
Click on HTML and add your extension in the User Ext box.
EDIT: If you want to add more rules to your language, you might have to add another XML in notepad++->plugins->APIs
If you think it's like HTML, just copy over html.xml and save as twig.xml
Add more rules to this XML file

Avoiding duplicate-content hit on Google for archive pages?

Each blog post on my site -- http://www.correlated.org -- is archived at its own permalinked URL.
On each of these archived pages, I'd like to display not only the archived post but also the 10 posts that were published before it, so that people can get a better sense of what sort of content the blog offers.
My concern is that Google and other search engines will consider those other posts to be duplicate content, since each post will appear on multiple pages.
On another blog of mine -- http://coding.pressbin.com -- I had tried to work around that by loading the earlier posts as an AJAX call, but I'm wondering if there's a simpler way.
Is there any way to signal to a search engine that a particular section of a page should not be indexed?
If not, is there an easier way than an AJAX call to do what I'm trying to do?
Caveat: this hasn't been tested in the wild, but should work based on my reading of the Google Webmaster Central blog and the schema.org docs. Anyway...
This seems like a good use case for structuring your content using microdata. This involves marking up your content as a Rich Snippet of the type Article, like so:
<div itemscope itemtype="http://schema.org/Article" class="item first">
<h3 itemprop="name">August 13's correlation</h3>
<p itemprop="description" class="stat">In general, 27 percent of people have never had any wisdom teeth extracted. But among those who describe themselves as pessimists, 38 percent haven't had wisdom teeth extracted.</p>
<p class="info">Based on a survey of 222 people who haven't had wisdom teeth extracted and 576 people in general.</p>
<p class="social"><a itemprop="url" href="http://www.correlated.org/153">Link to this statistic</a></p>
</div>
Note the use of itemscope, itemtype and itemprop to define each article on the page.
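To show what a consumer of this markup could read from it, here is a minimal Python sketch that collects itemprop values from one marked-up article. It is a toy reader written for this example (the class name is made up, and it ignores nested items), not the algorithm any search engine uses.

```python
from html.parser import HTMLParser


class ItemPropCollector(HTMLParser):
    """Collects itemprop values: the href for links, the text content otherwise."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None  # itemprop whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("itemprop")
        if name is None:
            return
        if "href" in attributes:
            # For links, the property value is the target URL
            self.props[name] = attributes["href"]
        else:
            self._current = name
            self.props[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


ARTICLE = """
<div itemscope itemtype="http://schema.org/Article" class="item first">
<h3 itemprop="name">August 13's correlation</h3>
<p class="social"><a itemprop="url" href="http://www.correlated.org/153">Link to this statistic</a></p>
</div>
"""

collector = ItemPropCollector()
collector.feed(ARTICLE)
collector.close()
article_props = collector.props
```

Each article block on an archive page would yield its own name and url pair, which is what lets a reader of the markup tie a piece of content to its permalink.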
Now, according to schema.org, which is supported by Google, Yahoo and Bing, the search engines should respect the canonical url described by the itemprop="url" above:
Canonical references
Typically, links are specified using the <a> element. For example, the
following HTML links to the Wikipedia page for the book Catcher in the
Rye.
<div itemscope itemtype="http://schema.org/Book">
<span itemprop="name">The Catcher in the Rye</span>—
by <span itemprop="author">J.D. Salinger</span>
Here is the book's <a itemprop="url" href="http://en.wikipedia.org/wiki/The_Catcher_in_the_Rye">Wikipedia page</a>.
</div>
http://schema.org/docs/gs.html#advanced_enum
So when marked up in this way, Google should be able to correctly ascribe which piece of content belongs to which canonical URL and weight it in the SERPs accordingly.
Once you've finished marking up your content, you can test it using the Rich Snippets testing tool, which should give you a good indication of what Google thinks about your pages before you roll it into production.
p.s. the most important thing you can do to avoid a duplicate content penalty is to fix the titles on your permalink pages. Currently they all read 'Correlated - Discover surprising correlations' which will cause your ranking to take a massive hit.
I'm afraid it is not possible to tell a search engine that a specific area of your web page (for example, a div in your HTML source) should not be indexed. A workaround would be to put the content you do not want search engines to index in an iframe, and then use a robots.txt file with an appropriate Disallow rule to deny access to the file loaded in the iframe.
You can't tell Google to ignore portions of a web page but you can serve up that content in such a way that the search engines can't find it. You can either place that content in an <iframe> or serve it up via JavaScript.
I don't like those two approaches because they're hackish. Your best bet is to completely block those pages from the search engines since all of the content is duplicated anyway. You can accomplish that a few ways:
Block your archives using robots.txt. If your archives are in their own directory, you can block the entire directory easily. You can also block individual files and use wildcards to match patterns.
Use the <META NAME="ROBOTS" CONTENT="noindex"> tag to block each page from being indexed.
Use the X-Robots-Tag: noindex HTTP header to block each page from being indexed by the search engines. This is identical in effect to using the <META> tag, although it can be easier to implement since you can set it in a .htaccess file and apply it to an entire directory.
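For the robots.txt option, a rule can be checked before deploying it. The Python sketch below uses the standard library's urllib.robotparser. The /archive/ directory is an assumption made for illustration (the question does not say where the archive pages live), and this parser only does simple prefix matching, so it does not model the wildcard patterns some search engines accept.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: assumes the archive pages live under /archive/
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


archive_allowed = is_crawlable(ROBOTS_TXT, "http://www.correlated.org/archive/153")
permalink_allowed = is_crawlable(ROBOTS_TXT, "http://www.correlated.org/153")
```

With these rules the archive URL is refused while the permalink stays crawlable.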

IE8 & FF XHTML error or badly formed span?

I recently have found a strange occurrence in IE8 & FF.
The designers were using JS to dynamically create some span tags for layout (they were placing rounded corner graphics on some tabs). The XHTML, in the JS, looked like this: <span class="leftcorner" /><span class="rightcorner" /> and it worked perfectly!
As we all know, dynamically rendering elements in JS can be quite processor intensive, so I moved the elements from the JS into the page source, exactly as above.
... and it didn't work. Not only did it not work, it crashed IE8. The fix was simple: add the closing span tag, i.e. <span class="leftcorner"></span>
I am a bit confused by this.
Firstly, as far as I am aware, <span class="leftcorner" /> is perfectly valid XHTML!
Secondly, it works when created dynamically, but not in the XHTML source?!
Can anyone shed any light on this or is it simply another odd occurrence of browsers?
The major browsers only support a small subset of self-closing tags. (See this answer for a complete list.)
Depending on how you were creating the elements in JS, the JavaScript engine probably created a valid element to place in the DOM.
I had a similar problem with a tags in IE.
The problem was that my links looked like this (the icon was set with CSS, so I didn't need any text in them):
<a href="link" class="icon edit" />
Unfortunately, in IE these links were not displayed at all. They have to be in <a></a> format (leaving the text empty didn't work either, so I put &nbsp; there). So I added a few extra lines of JS to fix it, as I didn't want to change all my HTML just for this browser (P.S. I'm using jQuery for my JS).
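// $.browser is only available in jQuery versions before 1.9
if ($.browser.msie) {
  // Give each icon link non-empty content so IE renders it
  $('a.icon').html('&nbsp;');
}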
IE in particular does not support XHTML. That is, it will never apply proper XML parsing rules to a document - it will treat it as HTML even with proper DOCTYPE and all. XHTML is not always valid SGML, however. In some cases (such as <br/>) IE can figure it out because it's prepared to parse tagsoup, and not just valid SGML. However, in other cases, the same "tagsoup" behavior means that it won't treat /> as self-closing tag terminator.
In general, my advice is to just use HTML 4.01 Strict. That way you know exactly what to expect. And there's little point in feeding XHTML to browsers when they're treating it as HTML anyway...
I think one of the answers to "Is writing self closing tags for elements not traditionally empty bad practice?" will answer your question.
XHTML is only XHTML if it is served as application/xhtml+xml — otherwise, at least as far as browsers are concerned, it is HTML and treated as tag soup.
As a result, <span /> means "A span start tag" and not "A complete span element". (Technically it should mean "A span start tag and a greater than sign", but that is another story).
The XHTML spec tells you what you need to do to get your XHTML to parse as HTML.
One of the rules is "For non-empty elements, end tags are required". The list of elements includes a quick reference to which are empty and which are not.
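The "end tags are required for non-empty elements" rule can be applied mechanically. The Python sketch below rewrites self-closed tags into explicit start/end pairs and leaves the empty (void) elements alone. It is a regex-based illustration with a hypothetical function name, and it assumes attribute values do not contain < or >; a real page would be safer to process with a proper parser.

```python
import re

# Elements that are empty by definition and may keep the <br /> form
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "param", "source", "track", "wbr",
}

# A tag name, optional attributes, then "/>"
SELF_CLOSED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*?)\s*/>")


def expand_self_closed(markup):
    """Turn <span ... /> into <span ...></span>, keeping void elements as-is."""

    def replace(match):
        name, attributes = match.group(1), match.group(2)
        if name.lower() in VOID_ELEMENTS:
            return match.group(0)
        return "<{0}{1}></{0}>".format(name, attributes)

    return SELF_CLOSED.sub(replace, markup)


fixed = expand_self_closed('<span class="leftcorner" /><span class="rightcorner" />')
```

Applied to the markup from the question, this produces the explicit closing tags that fixed the problem in IE8, while something like <br/> passes through unchanged.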

Resources