I'm trying to change an indl file. An indl file is created by Adobe InDesign to store the structure of a document, and it is basically XML. I want to use Nokogiri to find certain XML nodes, replace their text with my own, and then save the XML to another file.
The XML is, of course, unusual: I found some documentation on using Nokogiri to find HTML tags and change their text, but I don't know how to handle a piece of XML like this:
<cflo>
<txsr prst="o_u5084" crst="o_u5085" trak="D_10">
<pcnt>c_tEST</pcnt>
</txsr>
<txsr prst="o_u5086" crst="o_u5c" trak="D_20">
<pcnt>c_Titolo titolo titolo</pcnt>
</txsr>
</cflo>
Basically I need to look for a combination of the prst and crst attributes and replace the content inside the pcnt node.
I tried this:
@doc.xpath("//txsr[prst='o_u5086' and crst='o_u5085']")
but I don't know how to change the text inside the pcnt node.
That's not the correct XPath. The correct XPath will look like this:
@doc.xpath("//txsr[@prst='o_u5086'][@crst='o_u5085']")
Then take the first node from the resulting set, select its pcnt child, and use the inner_html= method (or content=, which escapes the text for you) to replace the text value.
Full code may be found here: https://gist.github.com/kaineer/7673698
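The gist is not reproduced here. As a rough illustration of the same idea (chained attribute predicates, then replacing the pcnt text), here is a sketch in Python using the standard-library xml.etree.ElementTree instead of Nokogiri. The helper name replace_pcnt and the new text are hypothetical, and keeping the c_ prefix is an assumption based on every pcnt value in the sample starting with it.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, with the closing </cflo> tag fixed.
SAMPLE = """<cflo>
<txsr prst="o_u5084" crst="o_u5085" trak="D_10">
<pcnt>c_tEST</pcnt>
</txsr>
<txsr prst="o_u5086" crst="o_u5c" trak="D_20">
<pcnt>c_Titolo titolo titolo</pcnt>
</txsr>
</cflo>"""


def replace_pcnt(root, prst, crst, new_text):
    """Set the text of the <pcnt> child of every <txsr> that has both
    the given prst and crst attributes. Returns how many were changed."""
    # Chained predicates: both attribute tests must hold.
    path = f".//txsr[@prst='{prst}'][@crst='{crst}']/pcnt"
    changed = 0
    for pcnt in root.iterfind(path):
        pcnt.text = new_text
        changed += 1
    return changed


root = ET.fromstring(SAMPLE)
# Keeps the c_ prefix seen on every pcnt value (assumed to be required).
count = replace_pcnt(root, "o_u5086", "o_u5c", "c_My new title")
# Writing to a different file leaves the original untouched, e.g.:
# ET.ElementTree(root).write("modified.xml", encoding="utf-8")
```

With this sample, the pair used in the question (o_u5086 with o_u5085) matches nothing, because those values sit on different txsr elements; that is why the sketch uses o_u5c.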
Related
How do I grab this text here?
I am trying to grab the link text based on whether the href contains "#faq-default".
I tried this first, but it doesn't grab the text, only the href value itself, which is useless:
//a/@href[contains(., '#faq-default-2')]
There will be many of these hrefs, such as default-2 and default-3, so I guess I need some kind of contains query?
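You are selecting the @href attribute value instead of the a element's value. Try this instead:
//a[contains(@href, '#faq-default-2')]
Using '#faq-default' as the search string matches all of them (default-2, default-3, and so on).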
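A sketch of the same selection in Python, assuming the page is well-formed enough for xml.etree.ElementTree to parse. ElementTree's XPath subset has no contains(), so the href test is done in Python; the snippet PAGE and the helper faq_link_texts are hypothetical. For real-world HTML, an HTML parser such as lxml.html (whose xpath() supports contains()) would likely be a better fit.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for the real page.
PAGE = """<div>
  <a href="#faq-default-1">First question</a>
  <a href="#faq-default-2">Second <b>question</b></a>
  <a href="/contact">Contact</a>
</div>"""


def faq_link_texts(root, fragment="#faq-default"):
    """Return the text of every <a> whose href contains `fragment`,
    mirroring //a[contains(@href, fragment)]."""
    texts = []
    for a in root.iterfind(".//a[@href]"):
        if fragment in a.get("href"):
            # itertext() also collects text inside child tags such as <b>.
            texts.append("".join(a.itertext()))
    return texts


print(faq_link_texts(ET.fromstring(PAGE)))
```

The default fragment "#faq-default" picks up every numbered FAQ link; passing "#faq-default-2" narrows it to one.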
I am trying to grab content from web pages that are not structured uniformly. I want XPath to grab any content within HTML tags, in the order it appears, and return the results without my having to specify div names and so on, since those differ from page to page.
So I need to know how to say 'return any HTML content in the order it's found within tags, regardless of whether they are divs with classes, em tags, strong tags, etc.' The only experience I have with XPath is specifying actual div names, for example:
//div[@id='tab_info']
This XPath,
string(/)
will return the string value of the entire XML or HTML document. That is, it'll return a single string of all of the text in document order, as requested.
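For comparison, a sketch of the same "all text in document order" result in Python with xml.etree.ElementTree, assuming the document parses as XML; the sample DOC and the helper document_text are hypothetical. itertext() walks the tree in document order and yields each text and tail string, which is what string(/) concatenates.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with mixed tags and no predictable div names.
DOC = """<html><body>
<div class="a">Intro <em>emphasis</em> text.</div>
<p>More <strong>content</strong>.</p>
</body></html>"""


def document_text(root):
    """Concatenate all text in document order, like XPath string(/)."""
    return "".join(root.itertext())


text = document_text(ET.fromstring(DOC))
# Collapse the newlines between blocks into single spaces for display.
print(" ".join(text.split()))
```

The raw result keeps the whitespace between elements, just as string(/) does; the split/join step only tidies it for display.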
Sorry, a newbie question: is it possible to hide a specific attribute throughout an XML document in Ace?
I need a way to synchronize the contents of the editor with non-Ace objects elsewhere on the DOM (unfortunately a SWF file that loads the XML separately...). My idea is to label each node throughout the document with an attribute, e.g. tag='1', so that if a node with a given tag is manipulated in Ace, I can use the tag to figure out exactly what was manipulated (and vice versa, update Ace when the XML is manipulated outside of Ace).
It's best that people do not manipulate these tags, hence I want to hide them from view.
Thanks :)
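You can create folds to hide text, but I think for tracking changes it is better to use anchors, which keep their position relative to the text:
var editor = ace.edit("editor"); // assumes the editor lives in an element with id "editor"
var anchor = editor.session.doc.createAnchor(row, col); // create at a given row/column
anchor.getPosition(); // current {row, column}, updated as the text is edited
anchor.detach(); // remove when not needed anymore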
I'm a complete rookie in XPath (I don't even know how to paste proper HTML into this post ;-p) and I need some help. I would like to retrieve the text that is in quotation marks and put it into a single cell in a Google Spreadsheet. Right now I can only retrieve this text into separate cells.
http://imm.io/oLYI
Does string(//tr[@class='darkGreen']/td[2]) give you what you want? Your XML fragment looks incomplete, and I'm not sure whether you only want the contents of the second cell, so it's a wild guess whether this fits your need.
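The question's markup was only linked as a screenshot, so the table below is invented. This Python sketch (xml.etree.ElementTree, hypothetical helper second_cell_text) mirrors what string(//tr[@class='darkGreen']/td[2]) does: XPath's string() of a node-set uses only the first node in document order, which is what find() returns, and the cell's nested text is joined into one string.

```python
import xml.etree.ElementTree as ET

# Invented table standing in for the markup in the screenshot.
TABLE = """<table>
  <tr class="header"><td>Name</td><td>Quote</td></tr>
  <tr class="darkGreen"><td>Row 1</td><td>"first <b>quoted</b> text"</td></tr>
  <tr class="darkGreen"><td>Row 2</td><td>"second"</td></tr>
</table>"""


def second_cell_text(root):
    """Mimic string(//tr[@class='darkGreen']/td[2]).

    find() returns the first matching td in document order; td[2] is
    a 1-based position, as in XPath.
    """
    td = root.find(".//tr[@class='darkGreen']/td[2]")
    # string() of an empty node-set is the empty string.
    return "" if td is None else "".join(td.itertext())


print(second_cell_text(ET.fromstring(TABLE)))
```

Because the text inside the cell's child tags is concatenated, the whole quotation comes back as a single value rather than one piece per tag.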
How can I manipulate specific tags inside a batch of HTML files? I have a collection of one thousand HTML files and need to trim them somewhat. I need to delete all the tags inside the <body></body> area except for one, <div.pg>, so that the pages are clean for printing. The excess is navigation links, which make the printouts messy and make the pages take up more paper. The contents differ from file to file, so I can't find and replace a fixed code excerpt, but the tags are the same: for example, there are 3 <table> tags to be deleted, each with a specific class.
Is there any batch processing technique or software that can do this job?
Is there an easy solution on Windows?
I would use an XSLT transform on each HTML page you have. Batch is not the right tool for manipulating HTML files, but you can use a batch file as a "manager" that passes each file to the XSL transform. Windows also has a rudimentary msxml utility, which you can download and install on your machine: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=21714
That's how I would do it. I am sure there are more options.
If it is XHTML, you could use XSLT to transform your HTML into "another" format. See, for example, http://www.w3schools.com/xsl/ or http://help.hannonhill.com/discussions/how-do-i/269-strip-specific-html-tag-in-xslt
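As an alternative to the XSLT route (not what either answer proposed), the same trimming can be sketched in Python with the standard-library xml.etree.ElementTree. It assumes the files are well-formed XHTML (plain HTML, or named entities such as &nbsp;, will make the parser fail), that <div.pg> means a div whose class attribute is exactly pg, and that writing trimmed copies into a separate output folder is acceptable. keep_only_pg and trim_files are hypothetical helper names, and ElementTree does not write any DOCTYPE back out.

```python
import glob
import os
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Write the XHTML namespace back out as the default (unprefixed) one.
ET.register_namespace("", XHTML_NS)


def keep_only_pg(root):
    """Remove everything inside <body> except the first <div class="pg">.

    Returns True if the body was trimmed, False if either element is missing.
    """
    # {*} matches the tag in any namespace or none (Python 3.8+).
    body = root.find(".//{*}body")
    if body is None:
        return False
    pg = body.find(".//{*}div[@class='pg']")
    if pg is None:
        return False
    pg.tail = None                 # drop text that followed the div
    attrs = dict(body.attrib)
    body.clear()                   # removes children, text and attributes,
    for key, value in attrs.items():
        body.set(key, value)       # so put the attributes back
    body.append(pg)
    return True


def trim_files(pattern, out_dir):
    """Trim every file matching `pattern`, writing copies into `out_dir`.

    Returns the list of output paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for path in glob.glob(pattern):
        tree = ET.parse(path)      # raises on files that are not well-formed
        if keep_only_pg(tree.getroot()):
            out_path = os.path.join(out_dir, os.path.basename(path))
            tree.write(out_path, encoding="utf-8", xml_declaration=True)
            written.append(out_path)
    return written
```

On Windows this could be run as, for example, trim_files(r"C:\pages\*.html", r"C:\pages\print") (hypothetical paths), which takes the place of the batch-file "manager" described in the XSLT answer.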