Create sitemap from the content of the CommerceTools database - sitemap

I need to create the sitemap file of my CommerceTools based shop and it would be great if it could be done automatically from the contents of the CTP database.
Do you know if there is a module, tool or extension already developed that allows this task?
EDIT->
I am aware that each online store can be built with a different technology.
In our specific case, the front-end is based on Sunrise for JVM, so it would be convenient for this tool to be created for this technology, although it is not essential.
I also recognize that each project can have its specific features that make it different from any other (mainly static content or from an external CMS) so I understand that creating a universal tool is very complex.
Anyway, I think it would be great to have a tool that can create a "sitemap-products.xml" from the most dynamic content of CTP, using the slugs of categories and products.
This "sitemap-products.xml" could then be referenced from a sitemap index that links both to it and to other secondary sitemaps, whether generated by the CMS (if you have one) or more static ones created and maintained manually by the development team.
<-EDIT
Thanks in advance.

I will give you a simple rule for creating a perfect sitemap from the database.
Sitemap.php :
<?php
$site = "https://yourdomain.com/"; // your site URL, with a trailing slash "/"
$chfreqprod = "weekly"; // change frequency reported for each URL
$priority = "0.8"; // priority
$date = date("c"); // ISO 8601 timestamp, e.g. 2012-02-13T10:15:00+02:00
define ('DB_USER', 'changeWithYourUser');
define ('DB_PASSWORD', 'changeWithYourPassword');
define ('DB_HOST', 'localhost');
define ('DB_NAME', 'changeWithYourDatabase');
// Note: the mysql_* functions were removed in PHP 7; on current PHP use mysqli or PDO instead.
$conn = mysql_connect(DB_HOST, DB_USER, DB_PASSWORD) or die("Could not connect to the database.");
mysql_select_db(DB_NAME, $conn) or die("Could not select the database.");
header("Content-Type: text/xml;charset=utf-8");
echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<?xml-stylesheet type=\"text/xsl\" href=\"smap.xsl\"?>
<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">";
$query = @mysql_query("SELECT * FROM products LIMIT 0,25000");
while($row = @mysql_fetch_array($query)){
$product = $row['product_seo'];
echo "<url>
<loc>".$site.$product.".html</loc>
<lastmod>".$date."</lastmod>
<changefreq>".$chfreqprod."</changefreq>
<priority>".$priority."</priority>
</url>";
}
echo "</urlset>";
?>
smap.xsl :
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:html="http://www.w3.org/TR/REC-html40"
xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes" />
<xsl:template match="/">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>XML Sitemap</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="robots" content="noindex,follow" />
<style type="text/css">
body {
font-family:"Lucida Grande","Lucida Sans Unicode",Tahoma,Verdana;
font-size:13px;
}
#intro {
background-color:#CFEBF7;
border:1px #2580B2 solid;
padding:5px 13px 5px 13px;
margin:10px;
}
#intro p {
line-height:16.8667px;
}
#intro strong {
font-weight:normal;
}
table {
width:100%;
}
td {
font-size:11px;
}
th {
text-align:left;
padding-right:30px;
font-size:11px;
background-color:#E1E3EE;
}
tr.high {
background-color:whitesmoke;
}
tr:hover {
background-color:#E8EAF2;
}
#footer {
width:100%;
padding:2px;
margin-top:10px;
font-size:8pt;
color:gray;
text-align:center;
}
#footer a {
color:gray;
}
a {
color:#000;
text-decoration:none;
}
a:hover {
text-decoration:underline;
}
</style>
</head>
<body>
<xsl:apply-templates></xsl:apply-templates>
</body>
</html>
</xsl:template>
<xsl:template match="sitemap:urlset">
<h1 align="center">XML Sitemap</h1>
<div id="content">
<table cellpadding="5">
<tr style="border-bottom:1px black solid;">
<th width="70%">URL</th>
<th width="5%">Priority</th>
<th width="12%">Change frequency</th>
<th width="13%">Last modified</th>
</tr>
<xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:for-each select="./sitemap:url">
<tr>
<xsl:if test="position() mod 2 != 1">
<xsl:attribute name="class">high</xsl:attribute>
</xsl:if>
<td>
<xsl:variable name="itemURL">
<xsl:value-of select="sitemap:loc"/>
</xsl:variable>
<a href="{$itemURL}">
<xsl:value-of select="sitemap:loc"/>
</a>
</td>
<td>
<xsl:value-of select="concat(sitemap:priority*100,'%')"/>
</td>
<td>
<xsl:value-of select="concat(translate(substring(sitemap:changefreq, 1, 1),concat($lower, $upper),concat($upper, $lower)),substring(sitemap:changefreq, 2))"/>
</td>
<td>
<xsl:value-of select="concat(substring(sitemap:lastmod,0,11),concat(' ', substring(sitemap:lastmod,12,5)))"/>
</td>
</tr>
</xsl:for-each>
</table>
</div>
<div id="footer">Index Sitemap by www.adydev.com</div>
</xsl:template>
<xsl:template match="sitemap:sitemapindex">
<h1 align="center">XML Sitemap Index</h1>
<div id="content">
<table cellpadding="5">
<tr style="border-bottom:1px black solid;">
<th width="85%">URL of sub-sitemap</th>
<th width="15%">Last modified</th>
</tr>
<xsl:for-each select="./sitemap:sitemap">
<tr>
<xsl:if test="position() mod 2 != 1">
<xsl:attribute name="class">high</xsl:attribute>
</xsl:if>
<td>
<xsl:variable name="itemURL">
<xsl:value-of select="sitemap:loc"/>
</xsl:variable>
<a href="{$itemURL}">
<xsl:value-of select="sitemap:loc"/>
</a>
</td>
<td>
<xsl:value-of select="concat(substring(sitemap:lastmod,0,11),concat(' ', substring(sitemap:lastmod,12,5)))"/>
</td>
</tr>
</xsl:for-each>
</table>
</div>
<div id="footer">Index Sitemap by www.adydev.com</div>
</xsl:template>
</xsl:stylesheet>
.htaccess :
RewriteEngine On
RewriteRule ^sitemap\.xml$ sitemap.php [L]
For multilanguage sitemap, index sitemap and automate sitemap, please contact me. Thank you!

There is no standard module or extension available; the sitemap is frontend-specific since everybody has different URL patterns and non-commerce content on the site.
A sitemap needs to be built fitting to the frontend technology your project is developed in.

I have come back to this question to report that we finally solved our need with a module for Play Framework that generates sitemaps from the URLs you pass to it.
We downloaded the module from its creators' repository (https://github.com/edulify/play-sitemap-module.edulify.com) and configured separate providers for products, categories and static pages, since we wanted each type of link to have a different refresh frequency and priority for search engines. With that in place, our sitemap.xml is now generated automatically every 24 hours.
If anyone needs help implementing this functionality in a Sunrise-based store, contact me and I will try to help.
Thank you very much to all for trying to help us.
Greetings.
Miguel
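
For readers who are not on Play, the approach described above (one set of entries per link type, each with its own change frequency and priority, plus a sitemap index) can be sketched in a few lines of Python. This is only an illustration: the slugs, the https://shop.example.com base URL, the /p/ and /c/ path prefixes and the function names are assumptions, not part of commercetools, Sunrise or the Play module.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE = "https://shop.example.com"  # hypothetical shop URL


def build_urlset(entries):
    """Build a <urlset> from (path, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=NS)
    for path, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = BASE + path
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = priority
    return ET.tostring(urlset, encoding="unicode")


def build_index(sitemap_urls):
    """Build a <sitemapindex> that links to the individual sitemaps."""
    index = ET.Element("sitemapindex", xmlns=NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="unicode")


# Hypothetical slugs; in practice they would come from the product and category data
product_slugs = ["red-shoes", "blue-hat"]
category_slugs = ["shoes"]

entries = [("/p/" + slug, "daily", "0.8") for slug in product_slugs]
entries += [("/c/" + slug, "weekly", "0.6") for slug in category_slugs]

products_xml = build_urlset(entries)
index_xml = build_index([BASE + "/sitemap-products.xml", BASE + "/sitemap-static.xml"])
print(index_xml)
```

Keeping the dynamic entries in their own file means the CMS-generated and hand-maintained sitemaps can be regenerated on different schedules without touching each other.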

Related

xpath - how to find an embedded li with an input element inside it?

Given this HTML:
<li class="check_boxes input optional" id="activity_roles_input">
<fieldset class="choices">
<legend class="label"><label>Roles</label></legend>
<input id="activity_roles_none" name="activity[role_ids][]" type="hidden" value="" />
<ol class="choices-group">
<li class="choice">
<label for="activity_role_ids_104">
<input id="activity_role_ids_104" name="activity[role_ids][]" type="checkbox" value="104" />Language Therapist
</label>
</li>
<li class="choice">
<label for="activity_role_ids_103">
<input id="activity_role_ids_103" name="activity[role_ids][]" type="checkbox" value="103" />Speech Therapist
</label>
</li>
</ol>
</fieldset>
</li>
I am trying to use Selenium and xpath with it.
I am trying to select the first 'checkbox' input element link.
I am having problems selecting the element.
I cannot use the db ID (104) as this is for repeated tests with new ID's each time. I need to select the 'first' input checkbox, based on it having the text for Language Therapist.
I have tried:
xpath=(//li[contains(@id,'activity_roles_input')])//input
and
xpath=(//li[contains(@id,'activity_roles_input')])//contains('Language Therapist")
but it is not finding the element.
When I do:
xpath=(//li[contains(@id,'activity_roles_input')])
it gets to the input set. The problem I am having is selecting the first input checkbox control for 'Language Therapist'.
First, find any <li> containing the text, and then look among its descendants for the first checkbox.
xpath=(//li[contains(., "Language Therapist")]/descendant::input[@type="checkbox"][1])
(From Michael)
The above worked for me. In the end I actually used
xpath=(//li[contains(@id,'activity_roles_input')]/descendant::input[@type="checkbox"][1])
because I liked identifying it by CSS ID.
An interesting thing to notice when I run this small XSL against your XML.
XSL:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each select="//li[@id ='activity_roles_input']">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Output:
Roles
Language Therapist
Speech Therapist
You have
xpath=(//li[contains(@id,'activity_roles_input')])//input
Shouldn't that be
xpath=(//li[contains(@id,'activity_roles_input')]//input)
or rather
xpath=(//li[@id='activity_roles_input']//input)
?
xpath=(//li[@id='activity_roles_input']//input[1])
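
To check the selection outside Selenium, here is a small Python sketch using only the standard library. xml.etree.ElementTree supports just a subset of XPath (no contains() and no descendant:: axis), so the text test is done in Python; the variable names are illustrative assumptions. One thing to keep in mind: in full XPath, //input[1] means "every input that is the first input child of its parent", which is why the earlier answers use descendant::input[...][1] to get a single element.

```python
import xml.etree.ElementTree as ET

html = """
<li class="check_boxes input optional" id="activity_roles_input">
  <fieldset class="choices">
    <input id="activity_roles_none" name="activity[role_ids][]" type="hidden" value="" />
    <ol class="choices-group">
      <li class="choice">
        <label for="activity_role_ids_104">
          <input id="activity_role_ids_104" name="activity[role_ids][]" type="checkbox" value="104" />Language Therapist
        </label>
      </li>
      <li class="choice">
        <label for="activity_role_ids_103">
          <input id="activity_role_ids_103" name="activity[role_ids][]" type="checkbox" value="103" />Speech Therapist
        </label>
      </li>
    </ol>
  </fieldset>
</li>
"""

root = ET.fromstring(html)

# First checkbox in document order (the hidden input is filtered out by @type)
checkboxes = root.findall(".//input[@type='checkbox']")
first_checkbox = checkboxes[0]

# Checkbox whose enclosing label text mentions "Language Therapist"
by_label = next(
    label.find("input")
    for label in root.iter("label")
    if "Language Therapist" in "".join(label.itertext())
)

print(first_checkbox.get("id"), by_label.get("id"))
```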

How to get text between two strings with special characters in ruby?

I have a string (@description) that contains HTML code and I want to extract the content between two elements. It looks something like this:
<b>Content title</b><br/>
*All the content I want to extract*
<a href="javascript:print()">
I've managed to do something like this:
@want = @description.match(/Content title(.*?)javascript:print()/m)[1].strip
But obviously this solution is far from perfect, as I get some unwanted characters in my @want string.
Thanks for your help
Edit:
As requested in the comments, here is the full code:
I'm already parsing an HTML document, and the following code:
@description = @doc.at_css(".entry-content").to_s
puts @description
returns:
<div class="post-body entry-content">
<img alt="Photo title" height="333" src="http://photourl.com" width="500"><br><br><div style="text-align: justify;">
Some text</div>
<b>More text</b><br><b>More text</b><br><br><ul>
<li>Numered item</li>
<li>Numered item</li>
<li>Numered item</li>
</ul>
<br><b>Content Title</b><br>
Some text<br><br>
Some text(with links and images)<br>
Some text(with links and images)<br>
Some text(with links and images)<br>
<br><br><img src="http://url.com/photo.jpg">
<div style="clear: both;"></div>
</div>
The text can include more paragraphs, links, images, etc. but it always starts with the "Content Title" part and ends with the javascript reference.
This XPath expression selects all (sibling) nodes between the nodes $vStart and $vEnd:
$vStart/following-sibling::node()
[count(.|$vEnd/preceding-sibling::node())
=
count($vEnd/preceding-sibling::node())
]
To obtain the full XPath expression to use in your specific case, simply substitute $vStart with:
/*/b[. = 'Content Title']
and substitute $vEnd with:
/*/a[@href = 'javascript:print()']
The final XPath expression after the substitutions is:
/*/b[. = 'Content Title']/following-sibling::node()
[count(.|/*/a[@href = 'javascript:print()']/preceding-sibling::node())
=
count(/*/a[@href = 'javascript:print()']/preceding-sibling::node())
]
Explanation:
This is a simple corollary of the Kayessian formula for the intersection of two nodesets $ns1 and $ns2:
$ns1[count(.|$ns2) = count($ns2)]
In our case, the set of all nodes between the nodes $vStart and $vEnd is the intersection of two node-sets: all following siblings of $vStart and all preceding siblings of $vEnd.
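
The intersection idea can also be sketched in Python. This is an illustration under assumptions, not the XPath engine's behavior: ElementTree has no count() or union operator and models text as text/tail rather than as nodes, so the sketch intersects element siblings only, and the small sample document is made up.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<div>"
    "<b>More text</b><br/>"
    "<b>Content Title</b><br/><p>Some text</p><br/>"
    '<a href="javascript:print()">print</a>'
    "<div/>"
    "</div>"
)

children = list(doc)
start = next(c for c in children if c.tag == "b" and c.text == "Content Title")
end = next(c for c in children if c.tag == "a" and c.get("href") == "javascript:print()")

following = children[children.index(start) + 1:]  # like $vStart/following-sibling::*
preceding = children[:children.index(end)]        # like $vEnd/preceding-sibling::*

# $ns1[count(.|$ns2) = count($ns2)]: keep the nodes of ns1 that are also in ns2
between = [n for n in following if any(n is p for p in preceding)]

print([n.tag for n in between])
```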
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vStart" select="/*/b[. = 'Content Title']"/>
<xsl:variable name="vEnd" select="/*/a[@href = 'javascript:print()']"/>
<xsl:template match="/">
<xsl:copy-of select=
"$vStart/following-sibling::node()
[count(.|$vEnd/preceding-sibling::node())
=
count($vEnd/preceding-sibling::node())
]
"/>
==============
<xsl:copy-of select=
"/*/b[. = 'Content Title']/following-sibling::node()
[count(.|/*/a[@href = 'javascript:print()']/preceding-sibling::node())
=
count(/*/a[@href = 'javascript:print()']/preceding-sibling::node())
]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (converted to a well-formed XML document):
<div class="post-body entry-content">
<a href="http://www.photourl">
<img alt="Photo title" height="333" src="http://photourl.com" width="500"/>
</a>
<br />
<br />
<div style="text-align: justify;">
Some text</div>
<b>More text</b>
<br />
<b>More text</b>
<br />
<br />
<ul>
<li>Numered item</li>
<li>Numered item</li>
<li>Numered item</li>
</ul>
<br />
<b>Content Title</b>
<br />
Some text
<br />
<br />
Some text(with links and images)
<br />
Some text(with links and images)
<br />
Some text(with links and images)
<br />
<br />
<br />
<a href="javascript:print()">
<img src="http://url.com/photo.jpg"/>
</a>
<div style="clear: both;"></div>
</div>
the two XPath expressions (with and without variable references) are evaluated and the nodes selected in each case, conveniently delimited, are copied to the output:
<br/>
Some text
<br/>
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
<br/>
<br/>
==============
<br/>
Some text
<br/>
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
<br/>
<br/>
To test your HTML, I added tags around your code and pasted it into a file:
xmllint --html --xpath '/html/body/div/text()' /tmp/l.html
output :
Some text
Some text
Some text
Some text
Now you can use an XPath module in Ruby and reuse the XPath expression.
You will find many examples by searching Stack Overflow.
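
If the regex route from the question is still preferred, the unwanted characters most likely come from matching on bare text markers, so the capture keeps the markup around them (the closing tag after the title and the opening of the anchor). Anchoring on the full delimiters, with their metacharacters escaped, gives a cleaner capture. A Python sketch of that idea follows; the sample string and variable names are made up for illustration, and the same pattern carries over to Ruby with Regexp.escape.

```python
import re

description = (
    "<b>More text</b><br>"
    "<b>Content Title</b><br>\n"
    "Some text<br><br>\n"
    "Some text(with links and images)<br>\n"
    '<a href="javascript:print()"><img src="http://url.com/photo.jpg"></a>'
)

start = "<b>Content Title</b><br>"
end = '<a href="javascript:print()">'

# re.escape makes "(", ")" and other metacharacters match literally;
# re.DOTALL lets "." cross newlines, like Ruby's /m flag
pattern = re.escape(start) + r"(.*?)" + re.escape(end)
match = re.search(pattern, description, re.DOTALL)
wanted = match.group(1).strip() if match else None
print(wanted)
```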

xpath syntax in Scrapy

team = hxs.select ('//table[@class="tablehead"/tbody/tr[contains[.@class, "player"]')
The structure of the web site whose table I want to select is as follows:
<html>
<body>
<table>
<tbody>
<tr>
<td>...</td>
<td>...</td>
...
</tr>
</tbody>
</table>
</body>
</html>
Since there are multiple tables in the web site, I only want to select the one whose class is defined as "tablehead". Also, for that table, I only want to select the tags whose class attributes contain the string "player". My attempt above looks a bit spotty to begin with. I tried running the crawler, and it says that the line I produced above is an invalid xpath line. Any advice would be nice.
I've come across these problems before; try omitting tbody from the XPath expression.
//table[@class="tablehead"/tbody/tr[contains[.@class, "player"]
Correcting this results in:
//table[@class='tablehead']/tbody/tr[contains(@class, 'player')]
This selects every tr whose class attribute contains the string "player" and which is a child of a tbody that is itself a child of any table in the document whose class attribute equals "tablehead".
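
The same selection can be tried quickly in plain Python. This is a sketch under assumptions: xml.etree.ElementTree lacks contains(), so that predicate is emulated with a substring test, and the sample page (including the extra table and header row) is invented. In Scrapy itself, the corrected expression would simply be passed to the selector as a string.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring("""
<html><body>
  <table class="other"><tbody><tr class="player"><td>ignored</td></tr></tbody></table>
  <table class="tablehead"><tbody>
    <tr class="colhead"><td>Name</td></tr>
    <tr class="major-player"><td>player1</td></tr>
  </tbody></table>
</body></html>
""")

# //table[@class='tablehead']/tbody/tr ...
rows = page.findall(".//table[@class='tablehead']/tbody/tr")
# ... [contains(@class, 'player')], done in Python because ElementTree lacks contains()
players = [tr for tr in rows if "player" in tr.get("class", "")]

print([tr.get("class") for tr in players])
```

If the raw HTML has no tbody element (browsers often add it when rendering), the /tbody step would have to be dropped for the path to match.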
XSLT - based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"//table[@class='tablehead']
/tbody/tr[contains(@class, 'player')]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (made just a little bit more realistic):
<html>
<body>
<table class="tablehead">
<tbody>
<tr class="major-player">
<td>player1</td>
<td>player2</td>
</tr>
</tbody>
</table>
</body>
</html>
the Xpath expression is evaluated and the selected nodes (just one in this case) are copied to the output:
<tr class="major-player">
<td>player1</td>
<td>player2</td>
</tr>

Ruby Nokogiri - XPATH using URL

I have this table:
<tr>
<td><b>Amount</b></td>
<td><b>Due Date</b></td>
<td"><b>Link</b></td>
</tr>
<tr>
<td>02/13/2012</td>
<td>$81.66</td>
<td><a onclick="javascript:window.open('/cso/displaypdfbill?selectedBillkey=449409587','_blank');" href="javascript: void(0);">View Bill</a></td>
</tr>
<tr>
<td>01/13/2012</td>
<td>$181.66</td>
<td><a onclick="javascript:window.open('/cso/displaypdfbill?selectedBillkey=543409587','_blank');" href="javascript: void(0);">View Bill</a></td>
</tr>
I am looping through the table and extracting the Bill key in each row. I removed the Billkey and stored it into a variable.
BillKey = 449409587
What I want is to get the <tr> where that BillKey is located:
So I should have:
2/13/2012 81.86 View Bill
I am having trouble writing the XPATH to get the <tr>.
Use:
string(table/tr
[td/a/@onclick
[substring
(.,
string-length()
- 21
)
=
$vEnding
]
]
)
where $vEnding must be substituted by the string: "=449409587','_blank');"
So, the complete XPath expression after this substitution is:
string(table/tr
[td/a/@onclick
[substring
(.,
string-length()
- 21
)
=
"=449409587','_blank');"
]
]
)
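
The substring test above is an "ends with" check: the ending is 22 characters long, so substring(., string-length() - 21) returns the last 22 characters of the onclick value. A Python sketch of the same logic, using str.endswith in place of the XPath arithmetic (the helper name row_for is an assumption made for illustration):

```python
import xml.etree.ElementTree as ET

table = ET.fromstring("""
<table>
  <tr><td>02/13/2012</td><td>$81.66</td>
    <td><a onclick="javascript:window.open('/cso/displaypdfbill?selectedBillkey=449409587','_blank');" href="javascript: void(0);">View Bill</a></td></tr>
  <tr><td>01/13/2012</td><td>$181.66</td>
    <td><a onclick="javascript:window.open('/cso/displaypdfbill?selectedBillkey=543409587','_blank');" href="javascript: void(0);">View Bill</a></td></tr>
</table>
""")

bill_key = "449409587"
ending = "=" + bill_key + "','_blank');"  # 22 characters, hence "string-length() - 21"


def row_for(table, ending):
    """Return the first <tr> whose link's onclick value ends with `ending`."""
    for tr in table.findall("tr"):
        link = tr.find("td/a")
        if link is not None and link.get("onclick", "").endswith(ending):
            return tr
    return None


row = row_for(table, ending)
cells = ["".join(td.itertext()).strip() for td in row.findall("td")]
print(cells)
```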
XSLT - based verification:
This XSLT transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vEnding">=449409587','_blank');</xsl:variable>
<xsl:template match="/">
<xsl:copy-of select=
"string(table/tr
[td/a/@onclick
[substring
(.,
string-length()
- 21
)
=
$vEnding
]
]
)
"/>
</xsl:template>
</xsl:stylesheet>
when applied on the following XML document (the provided one wrapped in a single top element table):
<table>
<tr>
<td>
<b>Amount</b>
</td>
<td>
<b>Due Date</b>
</td>
<td>
<b>Link</b>
</td>
</tr>
<tr>
<td>02/13/2012</td>
<td>$81.66</td>
<td>
<a onclick=
"javascript:window.open('/cso/displaypdfbill?selectedBillkey=449409587','_blank');" href="javascript: void(0);">View Bill</a>
</td>
</tr>
<tr>
<td>01/13/2012</td>
<td>$181.66</td>
<td>
<a onclick=
"javascript:window.open('/cso/displaypdfbill?selectedBillkey=543409587','_blank');" href="javascript: void(0);">View Bill</a>
</td>
</tr>
</table>
evaluates the XPath expression and copies to the output the result of the evaluation:
02/13/2012
$81.66
View Bill

IE8 overflow:auto with max-height

I have an element which may contain very big amounts of data, but I don't want it to ruin the page layout, so I set max-height: 100px and overflow:auto, hoping for scrollbars to appear when the content does not fit.
It all works fine in Firefox and IE7, but IE8 behaves as if overflow:hidden was present instead of overflow:auto.
I tried overflow:scroll, still does not help, IE8 simply truncates the content without showing scrollbars. Changing max-height declaration to height makes overflow work OK, it's the combination of max-height and overflow:auto that breaks things.
This is also logged as an official bug in the final release version of IE8.
Is there a workaround? For now I resorted to using height instead of max-height, but it leaves plenty of empty space in case there isn't much data.
This is a really nasty bug as it affects us heavily on Stack Overflow with <pre> code blocks, which have max-height:600 and width:auto.
It is logged as a bug in the final version of IE8 with no fix.
http://connect.microsoft.com/IE/feedback/ViewFeedback.aspx?FeedbackID=408759
There is a really, really hacky CSS workaround:
http://my.opera.com/dbloom/blog/2009/03/11/css-hack-for-ie8-standards-mode
/*
SUPER nasty IE8 hack to deal with this bug
*/
pre
{
max-height: none\9
}
and of course conditional CSS as others have mentioned, but I dislike that because it means you're serving up extra HTML cruft in every page request.
{
overflow:auto
}
Try div overflow:auto
I saw this logged as a fixed bug in RC1. But I've found a variation that seems to cause a hard assert render failure. Involves these two styles in a nested table.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Test</title>
<style type="text/css">
.calendarBody
{
overflow: scroll;
max-height: 500px;
}
</style>
</head>
<body>
<table>
<tbody>
<tr>
<td>
This is a cell in the outer table.
<div class="calendarBody">
<table>
<tbody>
<tr>
<td>
This is a cell in the inner table.
</td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</tbody>
</table>
</body>
</html>
{max-height:200px; overflow:auto}
Thanks to Srinivas Tamada, the above code worked for me.
Similar situation, a pre element with maxHeight set by js to fit in allotted space, width 100%, overflow auto. If the content is shorter than maxHeight and also fits horizontally, we're good. If you resize the window so the content no longer fits horizontally, a horizontal scrollbar appears, but the height of element immediately jumps to the full maxHeight, regardless of the height of the content.
I tried various forms of the CSS hack mentioned by Jeff, but didn't find anything like it that wasn't a JS bad-parameter error.
The best I could find was to pick your poison for IE8: either drop the maxHeight limit, so the element can be any height (best for my case), or set height rather than maxHeight, so it's always that tall even if the content itself is much shorter. Far from ideal. The odd behavior is gone in IE9.
Set max-height only and don't set the overflow. This way it will show scroll bar if content is more than max-height and shrinks if content is less than the max-height.
To reproduce:
(This crashes the whole page.)
<HTML>
<HEAD>
<META content="IE=8" http-equiv="X-UA-Compatible"/>
</HEAD>
<BODY>
look:
<TABLE width="100%">
<TR>
<TD>
<TABLE width="100%">
<TR>
<TD>
<DIV style="overflow-y: scroll; max-height: 100px;">
X
</DIV>
</TD>
</TR>
</TABLE>
</TD>
</TR>
</TABLE>
</BODY>
</HTML>
(Whereas this works fine...)
<HTML>
<HEAD>
<META content="IE=8" http-equiv="X-UA-Compatible"/>
</HEAD>
<BODY>
look:
<TABLE width="100%">
<TR>
<TD>
<TABLE width="100%">
<TR>
<TD>
<DIV style="overflow-y: scroll; max-height: 100px;">
The quick brown fox
</DIV>
</TD>
</TR>
</TABLE>
</TD>
</TR>
</TABLE>
</BODY>
</HTML>
(And, madly, so does this. [No content in the div at all.])
<HTML>
<HEAD>
<META content="IE=8" http-equiv="X-UA-Compatible"/>
</HEAD>
<BODY>
look:
<TABLE width="100%">
<TR>
<TD>
<TABLE width="100%">
<TR>
<TD>
<DIV style="overflow-y: scroll; max-height: 100px;">
</DIV>
</TD>
</TR>
</TABLE>
</TD>
</TR>
</TABLE>
</BODY>
</HTML>
I found this :
https://perishablepress.com/maximum-and-minimum-height-and-width-in-internet-explorer/
This method has been verified in IE6 and should also work in IE5. Simply change the values to suit your needs (code commented with explanatory notes). In this example, we are setting the max-height at 333px for IE and all standards-compliant browsers:
* html div#division {
height: expression( this.scrollHeight > 332 ? "333px" : "auto" ); /* sets max-height for IE */
}
and this works for me perfectly so I decided to share this.

Resources