I want to read the CDATA "Testlabel" from the entry node in a structure containing this:
<dynamic-element fieldNamespace="ddm" indexType="keyword" localizable="true" name="Label_Tag" readOnly="false" repeatable="false" required="false" showLabel="true" type="ddm-separator" width="">
<meta-data locale="nl_NL">
<entry name="label">
<![CDATA[Testlabel]]>
</entry>
<entry name="predefinedValue">
<![CDATA[]]>
</entry>
<entry name="tip">
<![CDATA[]]>
</entry>
<entry name="style">
<![CDATA[]]>
</entry>
</meta-data>
</dynamic-element>
In the Application Display Template for an AssetPublisher, I used:
#foreach ( $entry in $entries )
#set ( $renderer = $entry.getAssetRenderer() )
#set ( $className = $renderer.getClassName() )
#if ( $className == "com.liferay.portlet.journal.model.JournalArticle" )
## read article properties
#set ( $article = $renderer.getArticle() )
## read webcontent as xml
#set ( $document = $saxReaderUtil.read($article.getContent()) )
#set ( $rootElement = $document.getRootElement() )
## read general elements from webcontent
#set ( $xPathSelector = $saxReaderUtil.createXPath("dynamic-element[@name='Label_Tag']/meta-data/entry[@name='label']") )
#set ( $strLabel = $xPathSelector.selectSingleNode($rootElement).getText() )
$ xPathSelector $xPathSelector<br>
$ strLabel $strLabel <br>
#end
#end
which does not work, it prints:
$ xPathSelector [XPath: dynamic-element[@name='Label']/meta-data/entry[@name='label']]
$ strLabel $strLabel
$strLabel is not filled. What is wrong?
Your XPath expression is correct; when I apply it to your sample using xmllint I get the following result:
> xmllint -xpath "dynamic-element[@name='Label_Tag']/meta-data/entry[@name='label']" ~/test.xml
<entry name="label">
<![CDATA[Testlabel]]>
</entry>
Maybe adding text() to your XPath query might help:
> xmllint -xpath "dynamic-element[@name='Label_Tag']/meta-data/entry[@name='label']/text()" ~/test.xml
<![CDATA[Testlabel]]>
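To check the expression outside of the template, here is a minimal Python sketch using only the standard library. It assumes the structure is wrapped in a root element named root (the question does not show the real root), and read_label is a hypothetical helper name. It illustrates that the CDATA content comes back as the entry's text, padded with the surrounding line breaks, so it needs trimming.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the structure from the question; the <root> wrapper is an assumption.
SAMPLE = """<root>
<dynamic-element name="Label_Tag" type="ddm-separator">
<meta-data locale="nl_NL">
<entry name="label">
<![CDATA[Testlabel]]>
</entry>
<entry name="tip">
<![CDATA[]]>
</entry>
</meta-data>
</dynamic-element>
</root>"""


def read_label(xml_text, element_name="Label_Tag"):
    """Return the trimmed label text, or None when the entry is missing."""
    root = ET.fromstring(xml_text)
    path = "dynamic-element[@name='%s']/meta-data/entry[@name='label']" % element_name
    node = root.find(path)
    if node is None:
        return None
    # CDATA is exposed as ordinary text, including the newlines around it.
    return (node.text or "").strip()


label = read_label(SAMPLE)
print(label)
```

A lookup that returns None here would correspond to selectSingleNode finding nothing in the template, which is worth checking before calling getText().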
I understand that other people have had similar questions, but none are quite like this. I made a ps1 script to convert a file of XML objects into a CSV file of rows representing some of that data. Last night I was able to run the batch file and convert files, but this morning it saves an empty CSV file when I run it from the batch file, although it works fine when I run it in PowerShell ISE.
I run it from a batch file with -STA mode to enable it to open the dialog windows:
powershell -sta C:\Users\*******\Downloads\JiraXMLtoCSV.ps1
And here is the script (it was tough to make this code block lol, excuse the '}'):
# This function will open a file-picker for the user to select their Jira XML Export
Function Get-JiraXMLFile(){
[System.Reflection.Assembly]::LoadWithPartialName("System.windows.forms") | Out-Null;
$OpenFileDialog = New-Object System.Windows.Forms.OpenFileDialog;
$OpenFileDialog.initialDirectory = Get-Location;
$OpenFileDialog.filter = "XML files (*.xml)|*.xml";
$OpenFileDialog.ShowDialog() | Out-Null;
$OpenFileDialog.filename;
$OpenFileDialog.ShowHelp = $true;
}
# This function will open the file save dialog to allow the user to choose the location and name of the converted XML-to-CSV file
Function Get-SaveFile(){
[System.Reflection.Assembly]::LoadWithPartialName("System.windows.forms") | Out-Null;
$SaveFileDialog = New-Object System.Windows.Forms.SaveFileDialog;
$SaveFileDialog.initialDirectory = Get-Location;
$SaveFileDialog.filter = "CSV files (*.csv)|*.csv";
$SaveFileDialog.ShowDialog() | Out-Null;
$SaveFileDialog.filename;
$SaveFileDialog.ShowHelp = $true;
}
# Invoke the file-picker function and obtain input file
$inputFile = Get-JiraXMLFile;
#initialize list for items that will be extracted from XML Input File
$list = @();
# Loop through all the items in Jira XML export file
foreach ( $item in $XMLFile.rss.channel.item ) {
# Create a new hash object
$issue = @{};
# Gather wanted attributes
$issue.Key = $item.key.InnerXML;
$issue.StatusColor = $item.statusCategory.colorName;
$issue.Status = $item.status.InnerXML;
# Check for comments
if ( $item.comments ) {
# Record the comments with column name/header format as follows: comment #0 | comment #2|...
# Change this value to 1 if you want to see it start at comment #1 instead of comment #0
$incrementalCounter = 0;
# Loop through all comments on the issue
foreach ( $comment in $item.comments.comment ) {
$issue.("comment #"+$incrementalCounter) = $comment.InnerXML;
$incrementalCounter += 1;
}
}
#Create an object to be added to the list
$object = New-Object -TypeName PSObject -Prop $issue;
Write-Output $object;
# add this issue to the list to convert/export to CSV
$list += $object;
}
# Open File Saving window to choose file name and location for the new
$OutputFile = Get-SaveFile;
$list | Export-Csv -Path ($OutputFile) -NoTypeInformation;
And if you want some sample XML to help me learn what I am doing wrong:
<rss version="0.92">
<channel>
<title>XML Export</title>
<link>...</link>
<description>An XML representation of a search request</description>
<language>en-us</language>
<issue start="0" end="7" total="7"/>
<build-info>...</build-info>
<item>
<title>[AJT-46] another new story</title>
<project id="1652" key="AJT">Advanced Training</project>
<description/>
<environment/>
<key id="220774">AJT-46</key>
<status id="16615" iconUrl="https://website.com/" description="Desc text">To Do</status>
<statusCategory id="2" key="new" colorName="gray"/>
<labels></labels>
<created>Tue, 5 Jun 2018 11:25:38 -0400</created>
<updated>Tue, 5 Jun 2018 11:29:00 -0400</updated>
<due/>
</item>
</channel>
</rss>
It was working last night and is not working this morning. Nothing changed that I know of, and I didn't reboot either. It still works in the PowerShell ISE, which is fine, but I need the batch file method for the person I am making it for. Any help, advice, etc. is appreciated! Thanks
Changes I made and it works now, double newline separated:
# Invoke the file-picker function and obtain input file
[Xml]$inputFile = Get-Content -Path (Get-JiraXMLFile);
# Grab all the items we exported, ignore the header info
if ( $inputFile ) {
#$XmlComments = Select-Xml "//comment()" -Xml $inputFile;
#$inputFile.RemoveChild($XmlComments);
$items = Select-Xml "//rss/channel/item" -Xml $inputFile;
}
# Iterate over items and grab important info to be put into CSV format
foreach ( $item in $items ){
# Create a new hash object
$issue = @{};
# Gather wanted attributes
if( $item.Node.key){
$issue.Key = $item.Node.key.InnerXML;
}
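For comparison, the same item-to-row conversion can be sketched in Python with the standard library. This is only an illustration under assumptions: the sample is a cut-down version of the export above, items_to_rows and rows_to_csv are hypothetical helper names, and the comment columns are left out. The point it shows is the one the fix relies on: the document is parsed explicitly before the items are selected, instead of depending on a variable that happens to exist in the session.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Cut-down version of the Jira export shown in the question.
SAMPLE = """<rss version="0.92"><channel>
<item>
<key id="220774">AJT-46</key>
<status id="16615">To Do</status>
<statusCategory id="2" key="new" colorName="gray"/>
</item>
</channel></rss>"""


def items_to_rows(xml_text):
    """Parse the export and return one dict per <item>."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("./channel/item"):
        category = item.find("statusCategory")
        rows.append({
            "Key": item.findtext("key", default=""),
            "Status": item.findtext("status", default=""),
            "StatusColor": category.get("colorName", "") if category is not None else "",
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Key", "Status", "StatusColor"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = items_to_rows(SAMPLE)
csv_text = rows_to_csv(rows)
print(csv_text)
```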
I need to get the value of the "html" key in the JavaScript source code below, which was extracted by xpath('.//script[34]') and is embedded in an HTML page.
<script>
FM.view({
"ns": "pl.content.homeFeed.index",
"domid": "Pl_Official_MyProfileFeed__24",
"css": ["style/css/module/list/comb_WB_feed_profile.css?version=73267f08bd52356e"],
"js": "page/js/pl/content/homeFeed/index.js?version=dad90e594db2c334",
"html": " <div class=\"WB_feed WB_feed_v3\" pageNum=\"\" node-type='feed_list' module-type=\"feed\">\r\n...."
})
</script>
In particular, I don't know how to process the "FM.view" text.
I would use .re() to extract the html key value from the script:
>>> response.xpath("//script[contains(., 'Pl_Official_MyProfileFeed__24')]/text()").re(r'"html": "(.*?)"\n')[0].strip()
u'<div class=\\"WB_feed WB_feed_v3\\" pageNum=\\"\\" node-type=\'feed_list\' module-type=\\"feed\\">\\r\\n..'
Or, you can extract the complete object from the script, load it with json and get the html value:
>>> import json
>>> data = response.xpath("//script[contains(., 'Pl_Official_MyProfileFeed__24')]/text()").re(r'(?ms)FM\.view\((\{.*?\})\)')[0]
>>> obj = json.loads(data)
>>> obj['html'].strip()
u'<div class="WB_feed WB_feed_v3" pageNum="" node-type=\'feed_list\' module-type="feed">\r\n....'
Note the (?ms) part in the regular expression - this is the way we set the flags - multiline and dotall - required for the pattern to work in this case.
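The regular expression and JSON steps can also be tried without Scrapy. The following is a minimal sketch using only the standard library; SCRIPT_TEXT is a shortened stand-in for the script body and extract_view_object is a hypothetical helper name. It assumes the argument to FM.view() is valid JSON, which holds for the sample but is not guaranteed for arbitrary JavaScript.

```python
import json
import re

# Shortened stand-in for the text of the <script> element.
# A raw string keeps the backslash escapes exactly as they appear in the page.
SCRIPT_TEXT = r'''
FM.view({
    "ns": "pl.content.homeFeed.index",
    "domid": "Pl_Official_MyProfileFeed__24",
    "html": " <div class=\"WB_feed\" node-type='feed_list'>\r\n...."
})
'''


def extract_view_object(script_text):
    """Return the object passed to FM.view() as a dict, or None if not found."""
    # The inline flags let '.' match newlines, so the object can span lines.
    match = re.search(r'(?ms)FM\.view\((\{.*?\})\)', script_text)
    if match is None:
        return None
    return json.loads(match.group(1))


obj = extract_view_object(SCRIPT_TEXT)
print(obj["html"].strip())
```

Because json.loads decodes the escapes, the value comes back with plain double quotes and a real carriage return and line feed, unlike the raw match from the first approach.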
Here's an alternative to regular expression + json using js2xml package.
First step is to get the JavaScript statements within <script> from HTML. You probably have that step already. Here I'm building a Scrapy selector from your input HTML. In your case you are probably working with a response within a callback:
>>> import scrapy
>>> import js2xml
>>> t = r''' <script>
... FM.view({
... "ns": "pl.content.homeFeed.index",
... "domid": "Pl_Official_MyProfileFeed__24",
... "css": ["style/css/module/list/comb_WB_feed_profile.css?version=73267f08bd52356e"],
... "js": "page/js/pl/content/homeFeed/index.js?version=dad90e594db2c334",
... "html": " <div class=\"WB_feed WB_feed_v3\" pageNum=\"\" node-type='feed_list' module-type=\"feed\">\r\n...."
... })
... </script>'''
>>> selector = scrapy.Selector(text=t, type='html')
Second step is to build a tree representation of the JavaScript program using js2xml.parse(). You get an lxml tree back:
>>> js = selector.xpath('//script/text()').extract_first()
>>> jstree = js2xml.parse(js)
>>> jstree
<Element program at 0x7ff19ec94ea8>
>>> type(jstree)
<type 'lxml.etree._Element'>
>>> print(js2xml.pretty_print(jstree))
<program>
<functioncall>
<function>
<dotaccessor>
<object>
<identifier name="FM"/>
</object>
<property>
<identifier name="view"/>
</property>
</dotaccessor>
</function>
<arguments>
<object>
<property name="ns">
<string>pl.content.homeFeed.index</string>
</property>
<property name="domid">
<string>Pl_Official_MyProfileFeed__24</string>
</property>
<property name="css">
<array>
<string>style/css/module/list/comb_WB_feed_profile.css?version=73267f08bd52356e</string>
</array>
</property>
<property name="js">
<string>page/js/pl/content/homeFeed/index.js?version=dad90e594db2c334</string>
</property>
<property name="html">
<string> <div class="WB_feed WB_feed_v3" pageNum="" node-type='feed_list' module-type="feed">
....</string>
</property>
</object>
</arguments>
</functioncall>
</program>
Third is to select the object you want from the tree.
Here, it's the 1st argument of the FM.view() call. Calling .xpath() on the lxml tree gives you a list even if you selected 1 node (XPath returns node-sets)
# select the function call for "FM.view"
# and get first argument
>>> jstree.xpath('''
//functioncall[
function[.//identifier/@name="FM"]
[.//identifier/@name="view"]]
/arguments
/*[1]''')
[<Element object at 0x7ff19ec94ef0>]
>>> args = jstree.xpath('//functioncall[function[.//identifier/@name="FM"][.//identifier/@name="view"]]/arguments/*[1]')
Fourth, convert the <object> into a Python dict using js2xml.jsonlike.make_dict():
# use js2xml.jsonlike.make_dict() on that argument
>>> js2xml.jsonlike.make_dict(args[0])
{'ns': 'pl.content.homeFeed.index', 'html': ' <div class="WB_feed WB_feed_v3" pageNum="" node-type=\'feed_list\' module-type="feed">\r\n....', 'css': ['style/css/module/list/comb_WB_feed_profile.css?version=73267f08bd52356e'], 'domid': 'Pl_Official_MyProfileFeed__24', 'js': 'page/js/pl/content/homeFeed/index.js?version=dad90e594db2c334'}
>>> from pprint import pprint
>>> pprint(js2xml.jsonlike.make_dict(args[0]))
{'css': ['style/css/module/list/comb_WB_feed_profile.css?version=73267f08bd52356e'],
'domid': 'Pl_Official_MyProfileFeed__24',
'html': ' <div class="WB_feed WB_feed_v3" pageNum="" node-type=\'feed_list\' module-type="feed">\r\n....',
'js': 'page/js/pl/content/homeFeed/index.js?version=dad90e594db2c334',
'ns': 'pl.content.homeFeed.index'}
>>>
And finally, you simply use the "html" key from that dict:
>>> jsdata = js2xml.jsonlike.make_dict(args[0])
>>> jsdata['html']
' <div class="WB_feed WB_feed_v3" pageNum="" node-type=\'feed_list\' module-type="feed">\r\n....'
>>>
I am new to Ruby and I have a school project where I am parsing an XML file and need to get data after certain tags. I can only use core Ruby, no gems.
pFile = File.open("myfile.mzML", "r")
regmsLvl = "ms level\" value=\""
pFile.each_line { |line|
scn = line.scan(/#{regmsLvl}(\d)/)
#what I want to do but doesn't work
if scn == 1
puts("Got it!")
end
#what I have to do to compare if == 1
if scn != nil
scn.each do |val|
if val[0].to_i == 1
puts("Got it!")
end
end
end
}
# a sample line that I am parsing is:
<cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1" />
This seems silly.
line.scan's output makes scn a 2D array. How can I just have it be a string that gets overwritten on each pass? Or how should I change this whole thing? Any suggestions are appreciated.
puts(scn) prints out the 1, but if I do scn == 1 or scn.to_i == 1 it never gets into the if. I have tried scn.pop and scn.pop.pop.
I have added a section to show what I am trying to do now.
I need to check the ms level: if 1 then get scan start time and then the binary. This is the code that I am now working with.
xmlfile = File.new("afile.mzML")
xmldoc = Document.new(xmlfile)
root = xmldoc.root
puts "Root element : " + root.attributes["xmlns"]
xmldoc.elements.each("mzML/run/spectrumList/spectrum/cvParam"){
|e| if e.attributes["value"].to_i ==1
# Now I need to get start time: #
["mzML/run/spectrumList/spectrum/cvParam/scanList/scan/value"]
# and then
["mzML/run/spectrumList/spectrum/cvParam/binaryDataArrayList/binaryDataArray/binary"]
end
}
<run id="ru_0" defaultInstrumentConfigurationRef="ic_0" sampleRef="sa_0" defaultSourceFileRef="sf_ru_0">
<spectrumList count="3310" defaultDataProcessingRef="dp_sp_0">
<spectrum id="scan=8839" index="0" defaultArrayLength="171" dataProcessingRef="dp_sp_0">
<cvParam cvRef="MS" accession="MS:1000525" name="spectrum representation" />
<cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1" />
<cvParam cvRef="MS" accession="MS:1000294" name="mass spectrum" />
<cvParam cvRef="MS" accession="MS:1000130" name="positive scan" />
<scanList count="1">
<cvParam cvRef="MS" accession="MS:1000795" name="no combination" />
<scan>
<cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="5429.47" unitAccession="UO:0000010" unitName="second" unitCvRef="UO" />
</scan>
</scanList>
<binaryDataArrayList count="2">
<binaryDataArray encodedLength="1824">
<cvParam cvRef="MS" accession="MS:1000514" name="m/z array" unitAccession="MS:1000040" unitName="m/z" unitCvRef="MS" />
<cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" />
<cvParam cvRef="MS" accession="MS:1000576" name="no compression" />
<binary>AAAAQBCdgkAAAACAP6KCQAAAAAA8pIJAAAAAYAWlgkAAAABgQ6aCQAAAAGCzp4JAAAAAQEaogkAAAACgDKqCQAAAAEAgqoJAAAAAwEOqgkAAAABAWKqCQAAAAGBErIJAAAAAIOetgkAAAABAMLCCQAAAAGDlsYJAAAAA4DeygkAAAACAw7SCQAAAACBauIJAAAAAwFC6gkAAAACAYb6CQAAAAIDnwYJAAAAAwDjHgkAAAAAATMyCQAAAAADnzIJAAAAAAArOgkAAAACgTc6CQAAAAKBqzoJAAAAAQJLPgkAAAACAVNCCQAAAAAAK0oJAAAAAIF7SgkAAAADABNSCQAAAAKAx1YJAAAAAYHXXgkAAAAAg3teCQAAAAOAf2oJAAAAAICbcgkAAAAAAx92CQAAAAKA03oJAAAAAIBXigkAAAABAO+KCQAAAAKCr5YJAAAAAYMnlgkAAAADgK+aCQAAAAKDq6YJAAAAAAC3qgkAAAACgNe6CQAAAAMCA74JAAAAAANL0gkAAAAAAUfiCQAAAAOCt+YJAAAAA4O75gkAAAACAPPqCQAAAAGBq/oJAAAAAwEQCg0AAAABAKAqDQAAAAAAoDoNAAAAA4G0Og0AAAADAZhKDQAAAACCBEoNAAAAAwIQWg0AAAABAjheDQAAAAMA+GoNAAAAAQIYag0AAAAAA7RyDQAAAAEB9HYNAAAAAwIseg0AAAADgbyKDQAAAAAAPJINAAAAAgEUlg0AAAACgYCaDQAAAAOBfKoNAAAAA4DAug0AAAADAZi+DQAAAAAA0MINAAAAAoFMwg0AAAAAgMjKDQAAAACA2NINAAAAAgDk2g0AAAAAg+DyDQAAAAOAfPoNAAAAAAKU/g0AAAAAgQUKDQAAAAKBVQoNAAAAAYNRHg0AAAAAgf0qDQAAAAICZSoNAAAAAIDFQg0AAAAAgM1KDQAAAAEBjUoNAAAAAoGNUg0AAAAAAZ1aDQAAAAABqWINAAAAAYHhZg0AAAACAfl2DQAAAAEAcXoNAAAAAICpfg0AAAADgw2GDQAAAAACmZ4NAAAAAQDRog0AAAABAiWqDQAAAAAAibYNAAAAAQHpug0AAAABAEnKDQAAAAABCcoNAAAAAoHxyg0AAAACgGXaDQAAAAMBDdoNAAAAAgJR2g0AAAAAgHHqDQAAAAEBGeoNAAAAAIHh6g0AAAABAl3qDQAAAAKCkfYNAAAAAYE5+g0AAAAAAm36DQAAAAEDigYNAAAAAQGWCg0AAAABAjYKDQAAAACClgoNAAAAA4ESGg0AAAABgYIaDQAAAAMDSh4NAAAAAYCqIg0AAAADAT4qDQAAAAACCioNAAAAAwJmOg0AAAABAnZKDQAAAAKDJlINAAAAAgHGWg0AAAABgl5eDQAAAAEB4mINAAAAA4B2eg0AAAADgKKCDQAAAAGAvooNAAAAAwJakg0AAAABAUaiDQAAAAGBgqoNAAAAAIBatg0AAAADAxa6DQAAAAKCosoNAAAAAICy6g0AAAAAAbrqDQAAAAACRuoNAAAAAAMa/g0AAAACgOsCDQAAAAABzwoNAAAAAIOTCg0AAAACADcWDQAAAAGB4xoNAAAAAQOfGg0AAAAAAvceDQAAAAEBZyoNAAAAA4OnKg0AAAAAgMs6DQAAAAOC/z4NAAAAAYInUg0AAAABgftaDQAAAAODC1oNAAAAAwJXXg0AAAAAAgdiDQAAAAKA/2oNAAAAAoILag0AAAABghtyDQAAAAGCm3INAAAAAAO7cg0AAAACgr9+DQAAAAGCY4oNAAAAAgDbkg0AAAABAN+WDQAAAAKBU5oNA</binary>
</binaryDataArray>
I think you were pretty close. Assuming you can use the REXML library (which looks like it's part of Ruby's standard library), you should be able to do this:
require 'rexml/document'
xmlfile = File.new("afile.mzML")
xmldoc = REXML::Document.new(xmlfile)
root = xmldoc.root
start_time = nil
binary = nil
# get the ms level
ms_level = root.elements["spectrumList/spectrum/cvParam[@name='ms level']"].attributes["value"].to_i
if ms_level == 1
# get the scan start time
start_time = root.elements["spectrumList/spectrum/scanList/scan/cvParam[@name='scan start time']"].attributes["value"]
# get the binary
binary = root.elements["spectrumList/spectrum/binaryDataArrayList/binaryDataArray/binary"].text
end
p start_time # => "5429.47"
p binary # => that crazy long binary
This REXML tutorial is helpful: http://www.germane-software.com/software/rexml/docs/tutorial.html
Note: I made a few assumptions, namely that the elements always exist, that the ms level is always an integer, and that the file structure is always the same. Those assumptions may not hold in your situation, but this should be a start.
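The snippet above reads only the first spectrum. As a sketch of how the same lookups extend to every spectrum, here is a Python version using the standard library. It is an illustration under assumptions: the sample is a reduced copy of the fragment in the question with a second, made-up spectrum at ms level 2, the binary value is a short placeholder, and ms1_spectra is a hypothetical helper name. Real mzML files usually declare a default namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Reduced sample; the second spectrum and the short binary value are placeholders.
SAMPLE = """<run id="ru_0">
<spectrumList count="2">
<spectrum id="scan=8839" index="0">
<cvParam name="ms level" value="1" />
<scanList count="1"><scan>
<cvParam name="scan start time" value="5429.47" />
</scan></scanList>
<binaryDataArrayList count="1"><binaryDataArray>
<binary>AAAAQBCdgkA=</binary>
</binaryDataArray></binaryDataArrayList>
</spectrum>
<spectrum id="scan=8840" index="1">
<cvParam name="ms level" value="2" />
</spectrum>
</spectrumList>
</run>"""


def ms1_spectra(xml_text):
    """Collect start time and first binary array for each ms level 1 spectrum."""
    root = ET.fromstring(xml_text)
    results = []
    for spectrum in root.iter("spectrum"):
        level = spectrum.find("cvParam[@name='ms level']")
        # Skip spectra without an ms level or with a level other than 1.
        if level is None or level.get("value") != "1":
            continue
        start = spectrum.find("scanList/scan/cvParam[@name='scan start time']")
        binary = spectrum.find("binaryDataArrayList/binaryDataArray/binary")
        results.append({
            "id": spectrum.get("id"),
            "start_time": start.get("value") if start is not None else None,
            "binary": binary.text if binary is not None else None,
        })
    return results


results = ms1_spectra(SAMPLE)
print(results)
```

Checking each lookup for None avoids the failure that occurs when an expected element is absent, which is the first of the assumptions listed above.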
I am trying (for testing purposes) to parse a Google Merchant XML feed, defined as:
<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="cs" xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0">
<link rel="alternate" type="text/html" href="http://www.example.com"/>
<link rel="self" type="application/atom+xml" href="http://www.example.com/cs/feed/google.xml"/>
<title>EasyOptic</title>
<updated>2014-08-01T16:31:11Z</updated>
<entry>
<title>Sluneční Brýle Producer 1 133a code_color_1 Color 1 133a RayBan</title>
<link href="http://www.example.com/cs/katalog/price-category-1-style-1-optical-glasses-producer-1-rayban-133a-code_color_1-color-1"/>
<summary>Moc krásný a velmi levný produkt</summary>
<updated>2014-08-01T16:31:11Z</updated>
<g:id>EO111</g:id>
<g:condition>new</g:condition>
<g:price>100 Kč</g:price>
<g:availability>in stock</g:availability>
<g:image_link>http://www.example.com/images/fallback/default.png</g:image_link>
<g:additional_image_link>http://www.example.com/images/fallback/default.png</g:additional_image_link>
<g:brand>Producer 1</g:brand>
<g:mpn>EO111</g:mpn>
<g:gender>female</g:gender>
<g:google_product_category>Apparel &amp; Accessories &gt; Clothing Accessories &gt; Sunglasses</g:google_product_category>
<g:product_type>Sluneční Brýle </g:product_type>
</entry>
<entry>
<title>Sluneční Brýle Producer 1 133a code_color_1 Color 1 133a RayBan</title>
<link href="http://www.example.com/cs/katalog/price-category-1-style-1-optical-glasses-producer-1-rayban-133a-code_color_1-color-1"/>
<summary>Moc krásný a velmi levný produkt</summary>
<updated>2014-08-01T16:31:10Z</updated>
<g:id>EO111</g:id>
<g:condition>new</g:condition>
<g:price>100 Kč</g:price>
<g:availability>in stock</g:availability>
<g:image_link>http://www.example.com/images/fallback/default.png</g:image_link>
<g:additional_image_link>http://www.example.com/images/fallback/default.png</g:additional_image_link>
<g:brand>Producer 1</g:brand>
<g:mpn>EO111</g:mpn>
<g:gender>female</g:gender>
<g:google_product_category>Apparel &amp; Accessories &gt; Clothing Accessories &gt; Sunglasses</g:google_product_category>
<g:product_type>Sluneční Brýle </g:product_type>
</entry>
</feed>
with this ruby script:
require 'nokogiri'
def have_node_with_children(body, path_type, path, children_names)
doc = Nokogiri::XML(body)
case path_type
when :xpath
nodes = doc.xpath(path)
when :css
nodes = doc.css(path)
else
nodes = doc.xpath(path)
end
nodes.each do |node|
nchildren_names=[]
for child in node.children
nchildren_names << child.name unless child.to_s.strip =="" #nokogiri treats formatting whitespace as a blank node named "text"
end
puts("demanded_nodes: #{children_names.sort.join(", ")} , nodes found: #{nchildren_names.sort.join(", ")} ")
missing = children_names - nchildren_names
over = nchildren_names - children_names
puts("Missing: #{missing.sort.join(", ")} , Over: #{over.sort.join(", ")} ")
end
end
EXPECTED_ENTRY_NODES=[
'title',
'link',
'summary',
'updated',
'g:id',
'g:condition',
'g:price',
'g:availability',
'g:image_link',
'g:additional_image_link',
'g:brand',
'g:mpn',
'g:gender',
'g:google_product_category',
'g:product_type'
]
file=File.open('google.xml')
have_node_with_children(file.read,:xpath,'//xmlns:entry',EXPECTED_ENTRY_NODES)
It finds the 'entry' node (thanks for this tip).
But when collecting its children, child.name returns the name without the namespace prefix (e.g. <g:brand>.name => 'brand'),
so the comparison with the demanded fields fails.
Does anybody have a tip on how to get the node name together with its namespace prefix?
If I delete the namespace definitions everything works fine, but I cannot change the original XML.
I use this test in an RSpec request test, so other namespaces with possibly identical base node names can appear.
xml_doc = Nokogiri::XML(xml)
xml_doc.xpath("//xmlns:entry").each do |entry|
entry.xpath("./*").each do |element| #Step through all Element nodes that are direct children of <entry>
prefix = element.namespace.prefix
puts prefix ? "#{element.namespace.prefix}:#{element.name}"
: element.name
end
break #only show output for the first <entry>
end
--output:--
title
link
summary
updated
g:id
g:condition
g:price
g:availability
g:image_link
g:additional_image_link
g:brand
g:mpn
g:gender
g:google_product_category
g:product_type
Now about this:
for child in node.children
A well grounded rubyist does not ever use a for-loop...because a for_loop just calls each(), so rubyists call each() directly:
node.children.each do |child|
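The idea of rebuilding prefixed names from each child's namespace can also be sketched in Python with the standard library, which reports names as {namespace-uri}local. This is a hedged illustration only: the feed is a shortened stand-in, the PREFIXES table and the helper names qualified_name and child_names are assumptions made for the example, and the expected list is cut down to show one missing node.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the feed in the question.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0">
<entry>
<title>Example</title>
<g:id>EO111</g:id>
<g:brand>Producer 1</g:brand>
</entry>
</feed>"""

ATOM_NS = "http://www.w3.org/2005/Atom"

# Namespace URI -> prefix used in the expected names (None means no prefix).
PREFIXES = {
    ATOM_NS: None,
    "http://base.google.com/ns/1.0": "g",
}


def qualified_name(element):
    """Turn '{uri}local' into 'prefix:local', or 'local' when no prefix is mapped."""
    tag = element.tag
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        prefix = PREFIXES.get(uri)
        return f"{prefix}:{local}" if prefix else local
    return tag


def child_names(xml_text):
    """Return the prefixed names of the children of the first <entry>."""
    root = ET.fromstring(xml_text)
    entry = root.find("{%s}entry" % ATOM_NS)
    return [qualified_name(child) for child in entry]


expected = ["title", "g:id", "g:brand", "g:price"]
names = child_names(FEED)
missing = sorted(set(expected) - set(names))
over = sorted(set(names) - set(expected))
print(names, missing, over)
```

Keying the comparison on the namespace URI rather than the prefix text also covers the case raised in the question where two namespaces share the same base node names.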
I'm sorting some XML and would like to sort elements based on a particular, non-unique attribute, which allows me to group like-elements. Within these groups, I need to keep the original ordering. The problem is that when sorting, the order inside these groups changes and when re-sorting then the order changes yet again. I need a sort that doesn't change things if the content of the XML hasn't changed (otherwise SVN diffs are ugly). Here is a simplified example:
$xml = [xml]@"
<root>
<element name="a" number="1" />
<element name="b" number="1" />
<element name="c" number="1" />
<element name="d" number="2" />
<element name="e" number="2" />
<element name="f" number="2" />
<element name="g" number="3" />
<element name="h" number="3" />
</root>
"@
Write-Host "`nFirst Sort produces this:"
$result1 = $xml.SelectNodes('//element') | Sort-Object -Property 'number'
Write-Host (($result1 | select -ExpandProperty 'number') + " <---Sorted on")
Write-Host (($result1 | select -ExpandProperty 'name') + " <---Order not maintained")
Write-Host "`nSorting the (already sorted) results of First Sort produces this:"
$result2 = $result1 | Sort-Object -Property 'number'
Write-Host (($result2 | select -ExpandProperty 'number') + " <---Sorted on")
Write-Host (($result2 | select -ExpandProperty 'name') + " <---Order changed again")
Here is the output:
First Sort produces this:
1 1 1 2 2 2 3 3 <---Sorted on
c b a f e d h g <---Order not maintained
Sorting the (already sorted) results of First Sort produces this:
1 1 1 2 2 2 3 3 <---Sorted on
a b c d e f g h <---Order changed again
In this example, I need to preserve the original "name" ordering within the "number groups".
Can anyone think of an easy way to make this maintain the original order and come out the same when sorting multiple times?
I'm trying to avoid adding dummy attributes representing the original ordering. Maybe there is a different .NET sorting function that's more deterministic? I searched but nobody seemed to be concerned with ordering within equivalent groups.
Thanks.
If you don't have something that already represents the natural sort order, you can just add an index property with Add-Member:
$nodes = @($xml.SelectNodes('//element'))
for ($i=0; $i -lt $nodes.Count; $i++)
{
$nodes[$i] = $nodes[$i] | Add-Member -MemberType NoteProperty "Index" $i -PassThru
}
$result1 = $nodes | Sort-Object number, Index
Then you are guaranteed to get the same ordering no matter how many times you re-sort.
Update: added syntax for PowerShell v2
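The index technique is independent of PowerShell. Here is a minimal Python sketch of it, with made-up data in which the numbers are interleaved so that the effect is visible; sort_keeping_original_order is a hypothetical helper name. Each item is paired with its original position, and the position is used as the tie-breaker.

```python
# Made-up data: names are in document order, numbers are interleaved.
elements = [
    {"name": "a", "number": 2},
    {"name": "b", "number": 1},
    {"name": "c", "number": 2},
    {"name": "d", "number": 1},
    {"name": "e", "number": 3},
    {"name": "f", "number": 1},
]


def sort_keeping_original_order(items, key):
    """Sort by items[key], breaking ties by each item's original position."""
    indexed = [(item[key], index, item) for index, item in enumerate(items)]
    # Explicit (value, position) key: ties never depend on the sort algorithm.
    indexed.sort(key=lambda entry: (entry[0], entry[1]))
    return [item for _, _, item in indexed]


first = sort_keeping_original_order(elements, "number")
second = sort_keeping_original_order(first, "number")
print([item["name"] for item in first])
print(first == second)
```

Sorting the already sorted list gives the same result, because the positions assigned on the second pass follow the order produced by the first.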
Sort with respect to both the properties:
$xml.SelectNodes('//element') | Sort-Object -Property number, name
This produces this output:
name number
---- ------
a 1
b 1
c 1
d 2
e 2
f 2
g 3
h 3