How to indent STX transformation output (type "XML")?

How can I indent the output XML generated by an STX transformation? I looked at the transform element in the official documentation, but it does not have an indent attribute like XSLT has. Is there a way to do it?

You can post-process the output with xmllint, for example:
xmllint --format file.xml > file-formatted.xml
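If xmllint is not available, the same post-processing step can be sketched in Python with the standard library. This is only an illustration of re-indenting the result after the transformation has run. The helper name pretty_xml is hypothetical and has nothing to do with STX itself.

```python
from xml.dom import minidom


def pretty_xml(xml_text: str, indent: str = "  ") -> str:
    """Re-indent an XML string (hypothetical helper, not part of any STX tool)."""
    document = minidom.parseString(xml_text)
    pretty = document.toprettyxml(indent=indent)
    # toprettyxml can emit whitespace-only lines when the input already
    # contains whitespace between elements, so drop those lines.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


if __name__ == "__main__":
    print(pretty_xml("<root><child>text</child></root>"))
```

Note that minidom adds an XML declaration as the first line of the output.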


How to extract values from xml in body using xpath in camel

I have an HTTP response that is set in my exchange body, and I have to extract some values from that XML. I found that the best way could be to use camel-xpath. I have to extract a value at the root tag level. For example, in the XML below, the value I want to extract is that of attribute1.
<rootTag attribute1="value1">
<child1/>
</rootTag>
I saw some examples that use namespaces, but I don't think namespaces apply here. If they do, how would I use them? Can I not extract the value directly from the body of the exchange?
You can extract your attribute to a message header:
.setHeader("MyHeader").xpath("/rootTag/@attribute1", String.class)
or put the attribute in the body:
.setBody().xpath("/rootTag/@attribute1", String.class)
You do not need namespaces here.
And @Gilles Quenot is certainly right about the XPath expression.
This is the XPath you need:
string(/rootTag/@attribute1)
Tests in a shell
xmlstarlet:
$ xmlstarlet sel -t -v '/rootTag/@attribute1' file
xmllint:
$ xmllint --xpath 'string(/rootTag/@attribute1)' file
xidel:
$ xidel -se '/rootTag/@attribute1' file
Output:
value1
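For a quick check outside Camel, the same lookup can be sketched in Python. This is not Camel code: the function name root_attribute is hypothetical, and it reads the attribute from the root element directly, because ElementTree's limited XPath support cannot select attribute nodes.

```python
import xml.etree.ElementTree as ET


def root_attribute(xml_text, name):
    """Return the named attribute of the root element, or None if absent."""
    root = ET.fromstring(xml_text)
    # Equivalent in effect to the XPath string(/rootTag/@attribute1),
    # except that a missing attribute yields None instead of "".
    return root.get(name)


if __name__ == "__main__":
    doc = '<rootTag attribute1="value1"><child1/></rootTag>'
    print(root_attribute(doc, "attribute1"))
```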

Add a figure element in pandoc with filters

I'm writing a filter for pandoc in python. I'm using pandocfilters.
I want to replace a Para[Image] with a Figure[InlineEl1, InlineEl2].
Figure is not supported by pandoc, so I'm using a RawBlock to write raw HTML. The problem is that I don't know the HTML for InlineEl1 and InlineEl2. I need to let pandoc process them.
Possible workaround: use a Div and then modify the resulting HTML file by hand.
Is there a better method?
Edit: Or maybe I can put inline elements in a RawBlock? I'm just using a simple string for now. I don't know if it's possible, as I don't have any documentation available. I'm just proceeding by trial and error.
As of pandoc 2.0, the figure representation in the AST is still somewhat ad hoc. It's simply a paragraph that contains nothing but an image, with the image's title attribute starting with fig:. The image's inline content becomes the caption, so pandoc renders those inlines itself.
$ echo '![caption](/url/of/image.png)' | pandoc -t native
[Para [Image ("",[],[]) [Str "caption"] ("/url/of/image.png","fig:")]]
$ echo '![caption](/url/of/image.png)' | pandoc -t html
<figure>
<img src="/url/of/image.png" alt="caption" />
<figcaption>caption</figcaption>
</figure>
See http://pandoc.org/MANUAL.html#extension-implicit_figures
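A minimal sketch of what that means for a filter, assuming the pandoc 2.x JSON layout that corresponds to the native output above (Image carrying an attribute triple, a list of inlines, and a [url, title] pair). The function names are hypothetical. The dictionaries are plain Python and do not import pandocfilters, though a pandocfilters action could plausibly return the same structure.

```python
def make_implicit_figure(caption_inlines, url):
    """Build a Para holding a single Image whose title starts with 'fig:'."""
    image = {
        "t": "Image",
        # [attr, caption inlines, [url, title]]; attr is (id, classes, key-values)
        "c": [["", [], []], caption_inlines, [url, "fig:"]],
    }
    return {"t": "Para", "c": [image]}


def is_implicit_figure(block):
    """True if the block is a paragraph containing only a 'fig:' image."""
    if block.get("t") != "Para":
        return False
    inlines = block["c"]
    if len(inlines) != 1 or inlines[0].get("t") != "Image":
        return False
    _attr, _caption, target = inlines[0]["c"]
    return target[1].startswith("fig:")


if __name__ == "__main__":
    figure = make_implicit_figure([{"t": "Str", "c": "caption"}], "/url/of/image.png")
    print(is_implicit_figure(figure))
```

Because the caption is an ordinary list of inlines, the filter does not need to know their HTML in advance.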

xmllint to parse a html file

I was trying to extract the text between specific tags in various HTML files on a Mac. I was looking for the first <H1> heading in the body. Example:
<BODY>
<H1>Dublin</H1>
I believe using regular expressions for this is an anti-pattern, so I used xmllint and XPath instead.
xmllint --nowarning --xpath '/HTML/BODY/H1[0]'
The problem is that some of the HTML files contain badly formed tags, so I get errors along the lines of
parser error : Opening and ending tag mismatch: UL line 261 and LI
</LI>
I can't just add 2>/dev/null, because then I lose those files altogether. Is there any way to use an XPath expression here and tell xmllint to relax if the XML isn't perfect, and just give me the value between the first H1 tags?
Try the --html option. Otherwise, xmllint parses your document as XML which is a lot stricter than HTML. Also note that XPath indices are 1-based and that HTML tags are converted to lowercase when parsing. The command
xmllint --html --xpath '/html/body/h1[1]' - <<EOF
<BODY>
<H1>Dublin</H1>
EOF
prints
<h1>Dublin</h1>
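If xmllint's HTML mode is not an option, a lenient parser from the Python standard library can serve the same purpose. The sketch below is an alternative to the command above, not a reimplementation of it. The class and function names are hypothetical. html.parser lowercases tag names and does not stop on mismatched tags.

```python
from html.parser import HTMLParser


class FirstH1(HTMLParser):
    """Collect the text of the first <h1> element and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <H1> and <h1> are treated alike.
        if tag == "h1" and not self.done:
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self.in_h1:
            self.in_h1 = False
            self.done = True

    def handle_data(self, data):
        if self.in_h1:
            self.parts.append(data)


def first_h1(html_text):
    """Return the first closed h1's text, or None if there is none."""
    parser = FirstH1()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts).strip() if parser.done else None


if __name__ == "__main__":
    print(first_h1("<BODY>\n<H1>Dublin</H1>\n<UL><LI>x</UL></LI>\n<H1>Cork</H1>"))
```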

Change markdown emphasis notation in pandoc?

Is it possible to modify the character used to denote start/end of emphasis and strong emphasis in pandoc's markdown?
In particular, I'd like to use /emphasis/ and *strong emphasis*.
There is no option in pandoc to customize individual pieces of markdown syntax; you would have to write another input format for that. I think the easiest way to achieve this is to use a preprocessor that converts your custom syntax into regular markdown_strict or markdown syntax.
Here is one example, using filepp (there are many other options, including a sed or awk script):
#regexp /\/\b/_/
#regexp /\b\//_/
#regexp /\*\b/\*\*/
#regexp /\b\*/\*\*/
Some *bold* and some /emphasis/
To add the preprocessing step to compilation:
filepp -m regexp.pm myfile.md | pandoc ...
For instance, compiling with pandoc -t html gives:
<p>Some <strong>bold</strong> and some <em>emphasis</em></p>
To make this durable, save the preprocessor commands in their own file, let's say ~/.pandoc-pp:
#regexp /\/\b/_/
#regexp /\b\//_/
#regexp /\*\b/\*\*/
#regexp /\b\*/\*\*/
And include it at the top of every markdown document:
#include ~/.pandoc-pp
/emphasis/ is not markdown for emphasis, only *foo* and _bar_ are, and the pandoc markdown writer currently only supports the former.
Either way, if you're asking about generating markdown, you could write a pandoc filter that replaces Emph x with [Str "/"] <> x <> [Str "/"]. If you're asking about taking markdown as input to pandoc, you should probably try a preprocessor as suggested by @scoa.
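The preprocessing idea can also be sketched as a small Python script instead of filepp. This is an assumption-laden illustration: the function name preprocess is hypothetical, and the two regular expressions are a paired variant of the filepp rules above rather than a literal translation. Like those rules, it will also rewrite slashes in URLs and paths, so it only suits documents where / and * are used purely as emphasis markers.

```python
import re


def preprocess(text: str) -> str:
    """Convert *strong* and /emphasis/ into standard markdown markers."""
    # *word* becomes **word** (strong emphasis); done first so the
    # asterisks are not confused with anything the second rule produces.
    text = re.sub(r"\*\b(.+?)\b\*", r"**\1**", text)
    # /word/ becomes _word_ (emphasis).
    text = re.sub(r"/\b(.+?)\b/", r"_\1_", text)
    return text


if __name__ == "__main__":
    print(preprocess("Some *bold* and some /emphasis/"))
```

The output could then be piped to pandoc in the same way as the filepp output.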

Find and replace in file with script

I want to find and replace the VALUE in an XML file:
<test name="NAME" value="VALUE"/>
I have to filter by name (because there are a lot of lines like that).
Is it possible?
Thanks for your help.
Since you tagged the question "bash", I assume that you're not trying to use an XML library (although I think an XML expert might be able to give you something like an XSLT processor command that solves this question very robustly), but that you're simply interested in doing search & replace from the commandline.
I am using perl for this:
perl -pi -e 's#VALUE#replacement#g' *.xml
See the perlrun man page. Very briefly: -p switches perl into text processing mode, -i stands for "in-place", and -e lets you specify an expression to apply to all lines of input.
Also note (if you are not too familiar with that already) that you may use other characters than # (common ones are %, a comma, etc.) that don't clash with your search & replacement strings.
There is one small caveat: perl will read & write all files given on the commandline, even those that did not change. Thus, the files' modification times will be updated even if they did not change. (I usually work around that with some more shell magic, e.g. using grep -l or grin -l to select files for perl to work on.)
EDIT: If I understand your comments correctly, you also need help with the regular expression to apply. Let me briefly suggest something like this then:
perl -pi -e 's,(name="NAME" value=)"[^"]*",$1"NEWVALUE",g' *.xml
Related: bash XHTML parsing using xpath
You can use sed:
sed 's/\(<test name="NAME"\) value="VALUE"/\1 value="YourValue"/' test.xml
where test.xml is the XML document containing the given node. The command prints the result to standard output. This is very fragile, and you can work to make it more flexible if you need to do this substitution multiple times. For instance, the current statement is case-sensitive, so it won't substitute the value on a node with name="name", but you can add a case-insensitivity flag (I, a GNU sed extension) to the end of the statement, like so:
sed 's/\(<test name="NAME"\) value="VALUE"/\1 value="YourValue"/I' test.xml
Another option would be to use XSLT, but it would require you to download an external library. It's pretty versatile, and could be a viable option for more complex modifications to an XML document.
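A middle ground between the regex one-liners and XSLT is a short script that uses an XML parser. The sketch below uses Python's standard library. The function name set_value and the wrapping config element in the demo are hypothetical, and it assumes the elements are called test as in the question. ElementTree drops comments and may change whitespace and quoting when it writes the document back.

```python
import xml.etree.ElementTree as ET


def set_value(xml_text, name, new_value):
    """Set value= on every <test> element whose name= matches exactly."""
    root = ET.fromstring(xml_text)
    for element in root.iter("test"):
        if element.get("name") == name:
            element.set("value", new_value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = (
        '<config>'
        '<test name="NAME" value="VALUE"/>'
        '<test name="OTHER" value="VALUE"/>'
        '</config>'
    )
    print(set_value(doc, "NAME", "NEWVALUE"))
```

Unlike the text substitutions, this does not depend on attribute order or spacing within the tag.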
