Is there a way to disable validation when using Saxon in the XSLT Ant task?

I am running some XSL transforms via Ant's XSLT task. I am using Saxon 9HE as the processing engine. I have a situation where the input XML files all use the same DTD but declare it to be in different places. Some declare it to be in the current directory, some in a folder and others reference a URL. Here is the Ant script:
<?xml version="1.0" encoding="UTF-8"?>
<project name="PubXML2EHeader" default="transform">
<property name="data.dir.input" value="./InputXML"/>
<property name="data.dir.output" value="./converted-xml"/>
<property name="xslt.processor.location" value="D:\\saxon9he.jar"/>
<property name="xslt.processor.factory" value="net.sf.saxon.TransformerFactoryImpl"/>
<path id="saxon9.classpath" location="${xslt.processor.location}"/>
<target name="clean">
<delete dir="${data.dir.output}" includes="*.xml" failonerror="no"/>
</target>
<target name="transform" depends="clean">
<xslt destdir="${data.dir.output}"
extension=".xml"
failOnTransformationError="false"
processor="trax"
style="Transform.xsl"
useImplicitFileset="false"
classpathref="saxon9.classpath"
>
<outputproperty name="method" value="xml"/>
<outputproperty name="indent" value="yes"/>
<fileset dir="${data.dir.input}" includes="**/*.xml" excludes="Transform.xml"/>
<factory name="${xslt.processor.factory}"/>
</xslt>
</target>
</project>
When I run this Ant script I get errors like this:
[xslt] : Fatal Error! I/O error reported by XML parser processing
file:/D:/annurev.biophys.093008.131228.xml:
http://www.atypon.com/DTD/nlm-dtd/archivearticle.dtd Cause:
java.io.FileNotFoundException:
http://www.atypon.com/DTD/nlm-dtd/archivearticle.dtd
I think these are caused by the fact that Saxon cannot get to the DTD (which is actually a firewall issue in this case). I don't think I care about validating the input, which is what I think is happening here, and I would like to skip it. Is there an attribute I can add to the XSLT Ant task to stop Saxon from trying to read in the DTD?

You are confusing "reading the DTD" with validating. An XSLT processor will always ask the parser to read the external DTD of a document whether it is validating or not. This is because a DTD is used for more than validation; it is also used for expansion of entity references.
The usual way to deal with this problem is to redirect the DTD reference to a copy that is somewhere it can be accessed, generally by use of catalogs. This involves setting an EntityResolver on the underlying XML parser.
There's lots of information on the web about how to set up a catalog resolver with Saxon, usually from the command line: see for example here: http://www.sagehill.net/docbookxsl/UseCatalog.html
The advice is generally to set the -x, -y, and -r options, but in fact only -x is relevant if you only need to redirect DTD references in source documents (-y affects stylesheets, -r affects the document() function). In Ant, the equivalent of setting the -x option is to use the attribute child of the factory element to set the configuration property:
<attribute name="http://saxon.sf.net/feature/sourceParserClass" value="org.apache.xml.resolver.tools.ResolvingXMLReader"/>
That still leaves the part I find tricky, which is actually creating your catalog file.
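As a starting point for the catalog file, here is a minimal sketch in the OASIS XML Catalog format. It maps the remote system identifier from the error message to a local copy of the DTD. The file name catalog.xml and the local path dtd/archivearticle.dtd are assumptions for illustration; adjust them to wherever your copy of the DTD lives.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- catalog.xml: hypothetical file name; the local DTD path below is an assumption -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Redirect the remote system identifier to a local copy of the DTD.
       The uri is resolved relative to the location of this catalog file. -->
  <system systemId="http://www.atypon.com/DTD/nlm-dtd/archivearticle.dtd"
          uri="dtd/archivearticle.dtd"/>
</catalog>
```

The resolver still has to be told where this catalog lives. With the Apache resolver this is typically done through the catalogs entry of a CatalogManager.properties file on the classpath, and the resolver jar itself needs to be on the classpath alongside Saxon.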

Related

Return a custom data type from Ant to Maven

I have an Ant script called from Maven to perform some tasks. One such task is to read a text file linewise and use its content to perform text replacements on a huge set of files. The text file will be in the following format on each line:
name#URL#Title
Ant task for this is as follows:
<target name="update-urls" depends="load-ant-contrib">
<loadfile property="file" srcfile="${mapping.file}"/>
<foreach param="file.entry" list="${file}" delimiter="${line.separator}" target="update-template"></foreach>
</target>
<!--Get the name,url and title of each line-->
<target name="update-template">
<propertyregex property="name"
input="${file.entry}"
regexp="(.*)#(.*)#(.*)$"
select="\1"/>
<propertyregex property="url"
input="${file.entry}"
regexp="(.*)#(.*)#(.*)$"
select="\2"/>
<propertyregex property="title"
input="${file.entry}"
regexp="(.*)#(.*)#(.*)$"
select="\3"/>
<echo>${template.name}</echo>
<!--Use title and url to match and replace the URL with the URL in text file-->
<replaceregexp byline="true"
match="(&lt;title&gt;${title} *&lt;/title&gt;.*)${url} *"
replace="\1../modified-path/${name}.zip"
flags="g"
encoding="utf-8"
>
<!--Following are the files on which replacement has to happen-->
<fileset dir="${huge.set.of.files}/items">
<include name="**/info.xml"/>
</fileset>
</replaceregexp>
</target>
Since this needs to work on a huge set of files, the above solution is not very efficient. I have coded a multithreaded solution for it in Java which takes multiple regular expressions (name, value pairs) at once. I've integrated it into my Maven project as a mojo.
I'm having trouble with two things:
1. Framing a data type to contain a list of name, url, title triplets in Ant.
2. If I'm successful with step 1, how to return this data structure to the POM from where the Ant task got called, and decode it so that I can access each element's name, url and title separately.
Can someone guide me on how to go about this?

Ant - Using external file to use in patternset

In Ant, we use patternset to include or exclude some set of file using a pattern such as
<unzip src="${tomcat_src}/tools-src.zip"
dest="${tools.home}">
<patternset>
<include name="**/*.java"/>
<exclude name="**/Test*.java"/>
</patternset>
</unzip>
Is Ant capable of taking this patternset from an external file say txt or xml?
Looking around, the Ant wiki does not mention such usage, but I suspect it is possible.
Consider using includesfile/excludesfile or includes/excludes attributes of patternset.
In case of includes/excludes, you can use values of properties stored in your normal property file.
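Both options can be sketched as follows. The file names include-patterns.txt, exclude-patterns.txt and build.properties, and the property names unzip.includes and unzip.excludes, are hypothetical; tomcat_src and tools.home are assumed to be defined in the property file.

```xml
<project name="patternset-from-file" default="extract">
    <!-- Assumed to define tomcat_src, tools.home, unzip.includes and unzip.excludes -->
    <property file="build.properties"/>

    <target name="extract">
        <!-- Option 1: patterns read from external text files, one pattern per line
             (for example **/*.java in include-patterns.txt) -->
        <unzip src="${tomcat_src}/tools-src.zip" dest="${tools.home}">
            <patternset includesfile="include-patterns.txt"
                        excludesfile="exclude-patterns.txt"/>
        </unzip>

        <!-- Option 2: comma- or space-separated patterns held in properties -->
        <unzip src="${tomcat_src}/tools-src.zip" dest="${tools.home}">
            <patternset includes="${unzip.includes}"
                        excludes="${unzip.excludes}"/>
        </unzip>
    </target>
</project>
```

In practice you would pick one of the two unzip calls; they are shown together only for comparison.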

Generate Checksum for directories using Ant build command

I tried to generate a checksum for a directory using Ant.
I tried the target below, but it recursively generates a checksum for each file inside each folder.
<target name="default" depends="">
<echo message="Generating Checksum for each environment" />
<checksum todir="${target.output.dir}" format="MD5SUM" >
<fileset dir="${target.output.dir}" />
</checksum>
</target>
I just want to generate one checksum for a particular directory using Ant.
How do I do that?
You want to use the totalproperty attribute. As per the documentation this property will hold a checksum of all the checksums and file paths.
e.g.
<target name="hash">
<checksum todir="x" format="MD5SUM" totalproperty="sum.of.all">
<fileset dir="x"/>
</checksum>
<echo>${sum.of.all}</echo>
</target>
Some other general notes:
This is not idempotent. Each time you run it you will get a new value, because it includes the previous hash file in the new hash (and then writes a new hash file). I suggest that you change the todir attribute to point elsewhere.
It's a good idea to name your targets meaningfully. See this great article by Martin Fowler for some naming ideas.
You don't need the depends attribute if there's no dependency.
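Putting those notes together, a minimal sketch might look like the build file below. The checksum.dir property and both directory values are assumptions for illustration; the point is only that the checksum files are written outside the directory being hashed, so earlier runs do not feed into later ones.

```xml
<project name="dir-checksum" default="hash">
    <!-- Directory to hash; the value is an assumption -->
    <property name="target.output.dir" value="build/output"/>
    <!-- Hypothetical separate directory for the generated checksum files -->
    <property name="checksum.dir" value="build/checksums"/>

    <target name="hash">
        <mkdir dir="${checksum.dir}"/>
        <!-- Per-file checksums go to checksum.dir; the combined value goes to sum.of.all -->
        <checksum todir="${checksum.dir}" format="MD5SUM" totalproperty="sum.of.all">
            <fileset dir="${target.output.dir}"/>
        </checksum>
        <echo>${sum.of.all}</echo>
    </target>
</project>
```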

Can MSBuild exclude "Hidden" Web Deploy parameters from the generated SetParameters.xml?

In my Parameters.xml file, I have a couple of parameters that use the Web Deploy "variable" syntax to refer to other parameters, like this one that refers to the IIS Web Application Name parameter:
<parameter name="MyParam"
defaultValue="{IIS Web Application Name}/Web.config"
tags="Hidden"/>
My problem is that VS automatically imports this parameter into my SetParameters.xml file when I build the deployment package in spite of it being tagged as hidden. When it is passed to msdeploy via setParamFile, Web Deploy literally interprets the value of the parameter as
{IIS Web Application Name}/Web.config
rather than substituting the IIS application name.
If I remove the parameter from the auto-generated SetParameters.xml file, the variable works as expected. Is there any way to prevent VS from including that parameter in the first place, either by name or by tag?
This was actually far easier than I thought, given the answer to my earlier question.
I just needed to add a Hidden tag in the target that follows AddIisAndContentDeclareParametersItems. This apparently sets the tag in the source manifest prior to the package being built. It ends up looking something like this:
<Target Name="DeclareCustomParameters"
AfterTargets="AddIisAndContentDeclareParametersItems">
<ItemGroup>
<MsDeployDeclareParameters Include="Foo">
<!-- <snip> -->
<!-- the following elements are the important ones: -->
<Tags>Hidden</Tags>
<ExcludeFromSetParameter>True</ExcludeFromSetParameter>
</MsDeployDeclareParameters>
</ItemGroup>
</Target>
That was it!
This answer is for anyone else looking for a more complete example of substitution via targets. This example shows substituting a variable "database server name" into a connection string.
The ExcludeFromSetParameter element appears to be the key to making substitution work as it keeps the param out of the SetParameters.xml file (as the OP mentioned he did manually). Unfortunately, I don't think that ExcludeFromSetParameter can be set from a parameters.xml file, so this is the only option...
<Target Name="DeclareCustomParameters" BeforeTargets="Package">
<ItemGroup>
<MsDeployDeclareParameters Include="DatabaseServer">
<Description>Location of the database server hosting the user database</Description>
<Value>localhost</Value>
<DefaultValue>localhost</DefaultValue>
<Tags>DBServer, SQL</Tags>
</MsDeployDeclareParameters>
<MsDeployDeclareParameters Include="DB Connection String">
<Kind>XmlFile</Kind>
<Scope>Web.config</Scope>
<Match>/configuration/connectionStrings/add[@name='Database']/@connectionString</Match>
<Description>The connection string to the Database</Description>
<DefaultValue>Data Source={DatabaseServer};Initial Catalog=MyDatabase;Integrated Security=true;MultipleActiveResultSets=true;</DefaultValue>
<Tags>Hidden</Tags>
<ExcludeFromSetParameter>True</ExcludeFromSetParameter>
</MsDeployDeclareParameters>
</ItemGroup>
</Target>

Configuring Cruise Control Net with sourcesafe - Unable to load array item 'executable'

I'm trying to create a continuous integration environment. To do so, I've used a guide that can be found at http://www.15seconds.com/issue/040621.htm.
In this step-by-step guide, the goal is to create a CI setup with CCNet, NAnt, NUnit, NDoc, FxCop and SourceSafe.
I've been able to create my build from the command prompt (despite the issues caused by differing versions). The problem came with the configuration of ccnet.config.
I've made some changes because of the new versions, but I'm still getting errors when starting the CCNet server.
Can anyone help me fix this issue, or point me to a guide that covers this scenario?
The error that I'm getting:
Unable to instantiate CruiseControl projects from configuration document.
Configuration document is likely missing Xml nodes required for properly populating CruiseControl configuration.
Unable to load array item 'executable' - Cannot convert from type System.String to ThoughtWorks.CruiseControl.Core.ITask for object with value: "\DevTools\nant\bin\NAnt.exe"
Xml: E:\DevTools\nant\bin\NAnt.exe
My CCNet config file below:
<cruisecontrol>
<project name="BuildingSolution">
<webURL>http://localhost/ccnet</webURL>
<modificationDelaySeconds>10</modificationDelaySeconds>
<triggers>
<intervaltrigger name="continuous" seconds="60" />
</triggers>
<sourcecontrol type="vss" autoGetSource="true">
<ssdir>E:\VSS\</ssdir>
<executable>C:\Program Files\Microsoft Visual SourceSafe\SS.EXE</executable>
<project>$/CCNet/slnCCNet.root/slnCCNet</project>
<username>Albert</username>
<password></password>
</sourcecontrol>
<prebuild type="nant">
<executable>E:\DevTools\nant\bin\NAnt.exe</executable>
<buildFile>E:\Builds\buildingsolution\WebForm.build</buildFile>
<logger>NAnt.Core.XmlLogger</logger>
<buildTimeoutSeconds>300</buildTimeoutSeconds>
</prebuild>
<tasks>
<nant>
<executable>E:\DevTools\nant\bin\nant.exe</executable>
<nologo>true</nologo>
<buildFile>E:\Builds\buildingsolution\WebForm.build</buildFile>
<logger>NAnt.Core.XmlLogger</logger>
<targetList>
<target>build</target>
</targetList>
<buildTimeoutSeconds>6000</buildTimeoutSeconds>
</nant>
</tasks>
<publishers>
<merge>
<files>
<file>E:\Builds\buildingsolution\latest\*-results.xml</file>
</files>
</merge>
<xmllogger />
</publishers>
</project>
</cruisecontrol>
This is only a first guess, but the configuration in the <prebuild> element might be broken. Try this:
<prebuild>
<nant>
<executable>E:\DevTools\nant\bin\NAnt.exe</executable>
<buildFile>E:\Builds\buildingsolution\WebForm.build</buildFile>
<logger>NAnt.Core.XmlLogger</logger>
<buildTimeoutSeconds>300</buildTimeoutSeconds>
</nant>
</prebuild>
Just like the <tasks> block, the <prebuild> block is a collection of task elements. In your case this is a single <nant> task.
I currently don't have access to the CCNET documentation because the ThoughtWorks server is down once again, so I'm not able to verify my advice at the moment.
BTW: Did you know that you don't have to start the server in order to verify your configuration? Check the configuration with CCValidator.exe from [installdir]\server prior to starting the CCNET server.

Resources