Simple question about the Google Sitemap - sitemap

Can I code my sitemap this way and expect Google to crawl every link contained in the pages I have listed below? Thanks!
<url>
  <loc>http://www.domain.com/page/1</loc>
  <lastmod>2010-11-28</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.2</priority>
</url>
<url>
  <loc>http://www.domain.com/page/2</loc>
  <lastmod>2010-11-28</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.2</priority>
</url>

Yes.
But don't bother too much with changefreq or priority; they do not really do anything (according to my tests).
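One thing worth noting: a complete sitemap file wraps the <url> entries in a <urlset> root element that carries the sitemaps.org namespace. Here is a minimal sketch of generating such a file in Ruby. It assumes REXML is available (the question does not say how the file is produced), build_sitemap is a made-up helper name, and changefreq and priority are left out since both are optional.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Sitemap protocol namespace (sitemaps.org, version 0.9).
SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'

# Hypothetical helper: builds a sitemap document from a list of
# { loc:, lastmod: } hashes.
def build_sitemap(entries)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  urlset = doc.add_element('urlset', 'xmlns' => SITEMAP_NS)
  entries.each do |entry|
    url = urlset.add_element('url')
    url.add_element('loc').text = entry[:loc]
    url.add_element('lastmod').text = entry[:lastmod]
  end
  doc
end

sitemap = build_sitemap([
  { loc: 'http://www.domain.com/page/1', lastmod: '2010-11-28' },
  { loc: 'http://www.domain.com/page/2', lastmod: '2010-11-28' }
])

# Pretty-print with two-space indentation; compact mode keeps short
# text-only elements such as <loc> on a single line.
formatter = REXML::Formatters::Pretty.new(2)
formatter.compact = true
output = +''
formatter.write(sitemap, output)
puts output
```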

Related

Ruby generate XML documents with period in attribute

There are lots of great Ruby libraries out there to generate XML documents, but I can't find any that support generating XML element names with periods in them.
The end goal here is to build a Ruby library that auto-generates Jenkins templates.
Here is an example Jenkins job parameter definition, which, as you can see, uses element names containing periods:
<properties>
  <hudson.model.ParametersDefinitionProperty>
    <parameterDefinitions>
      <hudson.model.StringParameterDefinition>
        <name>MESSAGE</name>
        <description/>
        <defaultValue>Hello world!</defaultValue>
      </hudson.model.StringParameterDefinition>
    </parameterDefinitions>
  </hudson.model.ParametersDefinitionProperty>
</properties>
Does anyone know how I could do this? Any way to bend the libraries which already exist to support this?
The solution is to use dynamic dispatching:
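require 'nokogiri'

builder = Nokogiri::XML::Builder.new do |xml|
  xml.properties {
    # send passes the whole string as the method name, and so as the tag name
    xml.send('foo.bar', 'zaa')
  }
end
puts builder.to_xml
Output:
<?xml version="1.0"?>
<properties>
  <foo.bar>zaa</foo.bar>
</properties>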
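To see why send helps here: Nokogiri's builder turns unknown method calls into elements through method_missing, and send lets you pass a method name that would not parse as plain Ruby. The toy class below is only a sketch of that mechanism. TinyBuilder is a made-up name, not Nokogiri's implementation, and it does no escaping or indentation.

```ruby
# TinyBuilder is a hypothetical stand-in, not Nokogiri's implementation.
# Like many XML builders, it turns unknown method calls into elements.
class TinyBuilder
  attr_reader :out

  def initialize
    @out = +''
  end

  # Any undefined method name becomes a tag name.
  def method_missing(name, text = nil)
    @out << "<#{name}>"
    if block_given?
      yield
    else
      @out << text.to_s
    end
    @out << "</#{name}>"
    self
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

xml = TinyBuilder.new
xml.properties do
  # Written literally, xml.hudson.model... would parse as a chain of
  # separate calls. send passes the whole string as one method name.
  xml.send('hudson.model.StringParameterDefinition') do
    xml.name 'MESSAGE'
    xml.defaultValue 'Hello world!'
  end
end

puts xml.out
```

The same send call should carry over to the real builder for the Jenkins element names in the question.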

How to get first level children for XML using Nokogiri

I am trying to parse a POM file using Nokogiri, and want to get the first level child nodes.
My POM file looks something like this:
<project xmlns="some.maven.link">
  <parent>
    <groupId>parent.jar</groupId>
    <artifactId>parent-jar</artifactId>
  </parent>
  <groupId>child.jar</groupId>
  <artifactId>child-jar</artifactId>
</project>
I am trying to fetch the artifactId "child-jar", but the XPath that I am using is possibly incorrect, and it's fetching me "parent.jar" as the first occurrence.
This is my Ruby code:
@pom = Nokogiri::XML(File.open(file_path))
p @pom.xpath("/project/artifactId", "project"=>"http://maven.apache.org/POM/4.0.0")[0].text
I can access the second element but that just would be a hack.
Your XML sample does not appear to be correct. Simplifying it:
require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<project>
  <parent>
    <groupId>parent.jar</groupId>
    <artifactId>parent-jar</artifactId>
  </parent>
  <groupId>child.jar</groupId>
  <artifactId>child-jar</artifactId>
</project>
EOT
doc.at('project > artifactId').text # => "child-jar"
Using XPath I'd use:
doc.at('/project/artifactId').text # => "child-jar"
I'd suggest learning the difference between search, xpath, css and their at* cousins which are all documented in the "Searching a XML/HTML Document" and Node documentation.
In the above example I removed the XML namespace information to simplify things. XML namespaces are useful but can also be irritating, and in your example XML you broke the namespace by not supplying a valid URL. Fixing the example with:
<project xmlns="http://www.w3.org/1999/xhtml">
I can use:
namespaces = doc.collect_namespaces # => {"xmlns"=>"http://www.w3.org/1999/xhtml"}
doc.at('project > artifactId', namespaces).text # => "child-jar"
or:
doc.at('xmlns|project > xmlns|artifactId').text # => "child-jar"
I prefer and recommend the first because it's more readable and less noisy.
Nokogiri's implementation of CSS in selectors helps simplify most selectors. Passing in the collected namespaces in the document simplifies searches, whether you're using CSS or XPath.
These also work:
doc.at('/xmlns:project/xmlns:artifactId').text # => "child-jar"
doc.at('/foo:project/foo:artifactId', {'foo' => "http://www.w3.org/1999/xhtml"}).text # => "child-jar"
Note that the second uses a renamed namespace, which is useful if you're dealing with redundant xmlns declarations in the document and need to differentiate between them.
Nokogiri's "Namespaces" tutorial is helpful.
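If Nokogiri is not at hand, the same lookup can be sketched with REXML. Note that this swaps in a different library from the one used above. The sample reuses the POM from the question but with the Maven namespace URI from the question's Ruby code, and the prefix "m" is an arbitrary name chosen for this example.

```ruby
require 'rexml/document'

pom = REXML::Document.new(<<~XML)
  <project xmlns="http://maven.apache.org/POM/4.0.0">
    <parent>
      <groupId>parent.jar</groupId>
      <artifactId>parent-jar</artifactId>
    </parent>
    <groupId>child.jar</groupId>
    <artifactId>child-jar</artifactId>
  </project>
XML

# "m" is an arbitrary prefix bound to the document's default namespace.
namespaces = { 'm' => 'http://maven.apache.org/POM/4.0.0' }

# The absolute path only matches an artifactId directly under project,
# so the one nested inside <parent> is skipped.
artifact = REXML::XPath.first(pom, '/m:project/m:artifactId', namespaces)
puts artifact.text
```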

Should sitemap URLs include file extensions?

I generated a sitemap with this tool: http://www.angeldigital.marketing/image-sitemap/
It spat out this code:
...
<url>
<loc>http://example.com/page</loc>
</url>
<url>
<loc>http://example.com/page.html</loc>
</url>
<url>
...
In my .htaccess file, I'm hiding .html, so example.com/page.html displays as example.com/page.
My question is, should the sitemap include both locations? If not, which one is preferable?
It looks like I should select a canonical URL and include only that one in the sitemap.
https://support.google.com/webmasters/answer/139066
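A minimal sketch of cleaning up the generated list before writing the sitemap. It assumes the extensionless form is the one you pick as canonical (which seems natural here, since the .htaccess already hides .html), and canonical_urls is a made-up helper name.

```ruby
# Assumption: the extensionless form is the canonical one, because the
# question's .htaccess already serves /page.html as /page.
def canonical_urls(urls)
  # Strip a trailing ".html", then drop the duplicates this creates.
  urls.map { |url| url.sub(/\.html\z/, '') }.uniq
end

urls = [
  'http://example.com/page',
  'http://example.com/page.html'
]

puts canonical_urls(urls)
```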

Maven error: URI has an authority component

I'm pretty new to Maven and am trying to build code on Windows that builds fine on Linux.
I have 2 local repositories in my pom.xml:
<repositories>
  <repository>
    <id>p2-repo-equinox_3.8.1</id>
    <layout>p2</layout>
    <url>file:///${basedir}/../xyz/abc/repository/</url>
  </repository>
  <repository>
    <id>p2-repo-common</id>
    <layout>p2</layout>
    <url>file:///${basedir}/../xyz/def/repository/</url>
  </repository>
</repositories>
When building, I get the error:
Internal error: java.lang.RuntimeException: Failed to load p2 repository with ID 'p2-repo-equinox_3.8.1' from location file://D:/maven/myproject/../xyz/abc/repository/: URI has an authority component -> [Help 1]
I found this post and tried adding a third slash to pass an empty authority component (file:///), which made it work, but I'm not sure why the issue only happens on Windows and not on Linux.
Any advice appreciated.
The reason this doesn't cause an error on Linux is probably the difference in the value of ${basedir} on the two platforms:
On Windows, ${basedir} will be set to something like "c:/file/path":
file://c:/file/path <--- not happy ("c:" is read as the authority component)
On Linux, ${basedir} will be something like "/file/path":
file:///file/path <--- happy
On a Windows machine, the complete syntax is file://host/path.
If the host is your own machine (localhost), it can be omitted, resulting in file:///path.
See RFC 1738 – Uniform Resource Locators (URL)
A file URL takes the form:
file://<host>/<path>
[…]
As a special case, <host> can be the string "localhost" or the empty string; this is interpreted as 'the machine from which the URL is being interpreted'.
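To make the difference concrete, here is a small sketch in plain Ruby. Both helpers are hypothetical names written for this illustration and are not part of Maven. The first pulls out whatever sits in the host position of a file URL, and the second builds a URL that always has an empty authority.

```ruby
# Hypothetical helpers for illustration; neither is part of Maven.

# Returns the authority part of a file URL: the text between "file://"
# and the next "/". An empty string means "this machine".
def authority_of(url)
  url[%r{\Afile://([^/]*)}, 1]
end

# Builds a file URL with an empty authority from a local directory,
# adding the leading slash that Windows drive paths lack.
def file_url(basedir)
  path = basedir.gsub('\\', '/')
  path = "/#{path}" unless path.start_with?('/')
  "file://#{path}"
end

linux_dir   = '/file/path'
windows_dir = 'D:/maven/myproject'

# Naive concatenation, as in file://${basedir}
puts authority_of("file://#{linux_dir}").inspect    # ""
puts authority_of("file://#{windows_dir}").inspect  # "D:"

# With the leading slash added, the authority stays empty
puts file_url(linux_dir)    # file:///file/path
puts file_url(windows_dir)  # file:///D:/maven/myproject
```

On Linux the leading slash of the path happens to supply the third slash, which is why the two-slash form only fails on Windows.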

Asciidoctor: How to add Google Analytics code to all HTML pages with asciidoctor-maven-plugin

How do I add Google Analytics (or Google Tag Manager) code to all HTML pages generated by Asciidoctor? There is an extension, but that's not available from the maven repository. I am using the asciidoctor-maven-plugin.
If your document is index.adoc, create a file named index-docinfo-footer.html in the same directory and add :docinfo: to index.adoc.
Fill that footer file with:
<script type="text/javascript">
dataLayer = [{'channel' : '{html-googleTagManagerChannel}', 'additional_tracking_code' : '{html-googleAnalyticsId}'}];
(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','{html-googleTagManagerId}');</script>
<noscript><iframe src="//www.googletagmanager.com/ns.html?id={html-googleTagManagerId}" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
And then do something like this in your pom.xml:
<plugin>
  <groupId>org.asciidoctor</groupId>
  <artifactId>asciidoctor-maven-plugin</artifactId>
  <configuration>
    <attributes>
      <html-googleAnalyticsId>UA-123456789-1</html-googleAnalyticsId>
      <html-googleTagManagerId>GTM-ABCDE</html-googleTagManagerId>
      <html-googleTagManagerChannel>MyProject</html-googleTagManagerChannel>
    </attributes>
  </configuration>
</plugin>
If the extension is published on RubyGems, you can download the dependency with the TorqueBox RubyGems Maven Proxy Repository.
Have a look at the asciidoctor-pdf-with-theme-example in the asciidoctor maven examples.

Resources