xpath for youtube entry search result - xpath

i need help with the xpath query to extract the title text and thumbnail url of each entry in the following rss results
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:media='http://search.yahoo.com/mrss/'>
<entry>
<title>LiveFit Augusta - "Gym Time"</title>
<media:group>
<media:thumbnail url='http://i.ytimg.com/vi/Ig7CcLCR2n4/default.jpg'/>
<media:thumbnail url='http://i.ytimg.com/vi/Ig7CcLCR2n4/mqdefault.jpg'/>
<media:thumbnail url='http://i.ytimg.com/vi/Ig7CcLCR2n4/hqdefault.jpg'/>
<media:thumbnail url='http://i.ytimg.com/vi/Ig7CcLCR2n4/1.jpg'/>
<media:thumbnail url='http://i.ytimg.com/vi/Ig7CcLCR2n4/2.jpg'/>
<media:thumbnail url='http://i.ytimg.com/vi/Ig7CcLCR2n4/3.jpg'/>
</media:group>
</entry>
<entry>
<title>LiveFit Augusta - "Everyday Joes & Janes"</title>
<media:group>
<media:thumbnail url='http://i.ytimg.com/vi/REhn9jjjV7Q/default.jpg'/>
<media:thumbnail url='http://i.ytimg.com/vi/REhn9jjjV7Q/mqdefault.jpg'/>
<media:thumbnail url='http://i.ytimg.com/vi/REhn9jjjV7Q/hqdefault.jpg'/>
<media:thumbnail url='http://i.ytimg.com/vi/REhn9jjjV7Q/1.jpg'/>
<media:thumbnail url='http://i.ytimg.com/vi/REhn9jjjV7Q/2.jpg'/>
<media:thumbnail url='http://i.ytimg.com/vi/REhn9jjjV7Q/3.jpg'/>
</media:group>
</entry>
</feed>
the title
< title type='text'>Evolution of Dance< /title>
or
< media:title type='plain'>Evolution of Dance< /media:title>
thumbnail
<media:thumbnail
url='http://img.youtube.com/vi/dMH0bHeiRNg/1.jpg' height='97' width='130'
time='00:01:30'/>
EDIT
here is the code am using
Dim xmlDoc As MSXML2.DOMDocument30
Dim xmlNode As MSXML2.IXMLDOMNode
Dim xmlEntryNodes As IXMLDOMNodeList
Dim ns As String
Set xmlDoc = New DOMDocument30
ns = "xmlns:x='http://www.w3.org/2005/Atom' xmlns:media='http://search.yahoo.com/mrss/'"
xmlDoc.setProperty "SelectionLanguage", "XPath"
xmlDoc.setProperty "SelectionNamespaces", ns
If xmlDoc.loadXML(xmlText) = False Then
Exit Function
End If
Set xmlEntryNodes = xmlDoc.documentElement.selectNodes("/feed/entry")
debug.print xmlEntryNodes.Length returns 0

Use:
ns = "xmlns:x='http://www.w3.org/2005/Atom' xmlns:media='http://search.yahoo.com/mrss/'"
doc.SetProperty("SelectionNamespaces", ns)
And then use an XPath expression like this:
/*/x:entry/x:title[. = 'Evolution of Dance']
and this:
/*/x:entry/media:group/media:thumbnail[#url='http://img.youtube.com/vi/dMH0bHeiRNg/1.jpg']
Learn more about the SetProperty() function here:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms756048(v=vs.85).aspx
Alternatively, if you don't want to register namespaces, you may use less-readable expressions like these:
/*/*[name()='entry']/*[name()='title'][. = 'Evolution of Dance']
and this:
/*/*[name()='entry']
/*[name()='media:group']
/*[name()='media:thumbnail']
[#url='http://img.youtube.com/vi/dMH0bHeiRNg/1.jpg']

Is this what you are looking for?
title
/feed/entry/title
media:title
/feed/entry/media:group/media:title
thumbnail
/feed/entry/media:group/media:thumbnail

Related

Can't index data in alphabetical order in spanish alphabet before to select it in a query

I have a set of assets which had a property "name".
I want to get a dynamic number of those assets and I should get it alphabetically sorted by that "name" property.
I query that with this query:
type=dam:Asset
path=/content/dam/en/foobar/contacts/
orderby=#jcr:content/data/master/#name
orderby.sort=asc
p.limit=3
and this is working, so in a set of names:
[Paloma, Abel, José, Eduardo]
it retrieves:
Abel, Eduardo, José.
The problem is with spanish alphabet, in which Á is the same letter as A.
So in a set of:
[Paloma, Abel, José, Álvaro, Eduardo]
it retrieves:
Abel, Eduardo, José.
Being Álvaro excluded because its not part of the first 3 elements after ordeby it, when in should be the second, it should retrieve:
Abel, Álvaro, Eduardo.
So, to fix that, I've created a custom oak lucene index like below:
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:oak="http://jackrabbit.apache.org/oak/ns/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0" xmlns:rep="internal"
jcr:mixinTypes="[rep:AccessControllable]"
jcr:primaryType="nt:unstructured">
<socialLucene/>
<workflowDataLucene/>
<slingeventJob/>
<jcrLanguage/>
<versionStoreIndex/>
<repMembers/>
<cqReportsLucene/>
<commerceLucene/>
<counter/>
<authorizables/>
<enablementResourceName/>
<externalPrincipalNames/>
<cmLucene/>
<foobarCFIndexFilter
jcr:primaryType="oak:QueryIndexDefinition"
async="[async,nrt]"
evaluatePathRestrictions="{Boolean}true"
includedPaths="[/content/dam/es/foobar,/content/dam/en/foobar]"
queryPaths="[/content/dam/es/foobar,/content/dam/en/foobar]"
reindex="{Boolean}false"
reindexCount="{Long}24"
seed="{Long}3850652403740003290"
type="lucene">
<analyzers jcr:primaryType="nt:unstructured">
<default jcr:primaryType="nt:unstructured">
<filters jcr:primaryType="nt:unstructured">
<Synonym
jcr:primaryType="nt:unstructured"
format="solr"
synonyms="synonyms.txt">
<synonyms.txt/>
</Synonym>
</filters>
<tokenizer
jcr:primaryType="nt:unstructured"
name="Classic"/>
</default>
</analyzers>
<indexRules jcr:primaryType="nt:unstructured">
<nt:base jcr:primaryType="nt:unstructured">
<properties jcr:primaryType="nt:unstructured">
<title
jcr:primaryType="nt:unstructured"
analyzed="{Boolean}true"
isRegexp="{Boolean}false"
name="jcr:content/data/master/title"
nodeScopeIndex="{Boolean}true"
ordered="{Boolean}true"
propertyIndex="{Boolean}true"
type="String"/>
<date
jcr:primaryType="nt:unstructured"
name="jcr:content/data/master/date"
ordered="{Boolean}true"
propertyIndex="{Boolean}true"/>
<sectors
jcr:primaryType="nt:unstructured"
name="jcr:content/data/master/sectors"
propertyIndex="{Boolean}true"/>
<contentFragment
jcr:primaryType="nt:unstructured"
name="jcr:content/contentFragment"
propertyIndex="{Boolean}true"/>
<model
jcr:primaryType="nt:unstructured"
name="cq:model"
propertyIndex="{Boolean}true"/>
<name
jcr:primaryType="nt:unstructured"
analyzed="{Boolean}true"
isRegexp="{Boolean}false"
name="jcr:content/data/master/name"
nodeScopeIndex="{Boolean}true"
ordered="{Boolean}true"
propertyIndex="{Boolean}true"
type="String"/>
</properties>
</nt:base>
</indexRules>
</foobarCFIndexFilter>
<cqProjectLucene/>
<ntFolderDamLucene/>
<acPrincipalName/>
<uuid/>
<damAssetLucene/>
<rep:policy/>
<cqPayloadPath/>
<nodetypeLucene/>
<nodetype/>
<ntBaseLucene/>
<reference/>
<principalName/>
<cqTagLucene/>
<lucene/>
<repTokenIndex/>
<externalId/>
<authorizableId/>
<cqPageLucene/>
</jcr:root>
Where in the synonyms.txt I had:
á, a
Á, A
and so on.
Also tried with a charFilter with Mapping equivalent chars.
I have made sure that my custom oak index is the one my query is using with Query Performance Diagnosis tool.
But nothing works, after reindex the query results are the same.
How to solve that?

How to search for some XML data and repleace it with a new value using Nokogiri Ruby gem

Base on below XML exemple file employees.xml and using Ruby Nokogiri gem I wan to open this file, change the building number to 320 and the room number to 99 for Sandra Defoe and save the changes. What is the recommended way to do it.
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be129">
<firstname>Jane</firstname>
<lastname>Doe</lastname>
<building>327</building>
<room>19</room>
</employee>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>327</building>
<room>22</room>
</employee>
<employee id="be133">
<firstname>Steve</firstname>
<lastname>Casey</lastname>
<building>327</building>
<room>24</room>
</employee>
</employees>
I'd use this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>327</building>
<room>22</room>
</employee>
</employees>
EOT
first_name = 'Sandra'
last_name = 'Defoe'
node = doc.at("//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name])
node.at('building').content = '320'
node.at('room').content = '99'
Which results in:
doc.to_xml
# => "\uFEFF<?xml version=\"1.0\" encoding=\"utf-16\"?>\n" +
# "<employees>\n" +
# " <employee id=\"be130\">\n" +
# " <firstname>William</firstname>\n" +
# " <lastname>Defoe</lastname>\n" +
# " <building>326</building>\n" +
# " <room>14a</room>\n" +
# " </employee>\n" +
# " <employee id=\"be132\">\n" +
# " <firstname>Sandra</firstname>\n" +
# " <lastname>Defoe</lastname>\n" +
# " <building>320</building>\n" +
# " <room>99</room>\n" +
# " </employee>\n" +
# "</employees>\n"
Normally I recommend using CSS selectors because they tend to result in less visual noise, however CSS doesn't let us peek into the text of nodes, and working around that, while possible, results in even more noise. XPath, on the other hand, can be very noisy, but for this sort of task, it's more usable.
XPath is very well documented and figuring out what this is doing should be pretty easy.
The Ruby side of it is using a "format string":
"//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name])
similar to
"%s %s" % [first_name, last_name] # => "Sandra Defoe"
"//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name]
# => "//employee[firstname/text()='Sandra' and lastname/text()='Defoe']"
Just for thoroughness, here's what I'd do if I wanted to use CSS exclusively:
node = doc.search('employee').find { |node|
node.at('firstname').text == first_name && node.at('lastname').text == last_name
}
This gets ugly though, because search tells Nokogiri to retrieve all employee nodes from libXML, then Ruby has to walk through them all telling Nokogiri to tell libXML to look in the child firstname and lastname nodes and return their text. That's slow, especially if there are many employee nodes and the one you want is at the bottom of the file.
The XPath selector tells Nokogiri to pass the search to libXML which parses it, finds the employee node with the child nodes containing the first and last names and returns only that node. It's much faster.
Note that at('employee') is equivalent to search('employee').first.
# File 'lib/nokogiri/xml/searchable.rb', line 70
def at(*args)
search(*args).first
end
Finally, mediate on the difference between a NodeSet#text and Node#text as the first will lead to insanity.
Assume your content is a string:
xml=%q(
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be129">
<firstname>Jane</firstname>
<lastname>Doe</lastname>
<building>327</building>
<room>19</room>
</employee>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>327</building>
<room>22</room>
</employee>
<employee id="be133">
<firstname>Steve</firstname>
<lastname>Casey</lastname>
<building>327</building>
<room>24</room>
</employee>
</employees>)
doc = Nokogiri.parse(xml)
This will work but assumes the first and last names are unique, otherwise it will modify the first match of first and last name.
target = doc.css('employee').find do |node|
node.search('firstname').text == 'Sandra' &&
node.search('lastname').text == 'Defoe'
end
target.at_css('building').content = '320'
target.at_css('room').content = '99'
doc # outputs the updated xml
=> <?xml version="1.0"?>
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be129">
<firstname>Jane</firstname>
<lastname>Doe</lastname>
<building>327</building>
<room>19</room>
</employee>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>320</building>
<room>99</room>
</employee>
<employee id="be133">
<firstname>Steve</firstname>
<lastname>Casey</lastname>
<building>327</building>
<room>24</room>
</employee>
</employees>

xpath selection of a title node within a resultset

I've an xml doc like below. I was trying to select a title node with a particular value in it say "![CDATA[ 1234 ]]". That Title node may be in any Type node. I was using this xpath query
/Results/ResultSet/Type[Title="![CDATA[ 1234 ]]"]
but didnt get anything selected. can someone pls help.
<Results>
<Info>...</Info>
<ResultSet num="4">
<Type type="A">
<Title>
<![CDATA[ 1234 ]]>
</Title>
<Description>
<![CDATA[ 1234 ]]>
</Description>
<Domain>
<![CDATA[1234 ]]>
</Domain>
<Target>
<![CDATA[]]>
</Target>
</Type>
<Type type="A">
<Title>
<![CDATA[ abcdef ]]>
</Title>
<Description>
<![CDATA[abcdef]]>
</Description>
<Domain>
<![CDATA[abcdef]]>
</Domain>
<Target>
<![CDATA[abcdef]]>
</Target>
</Type>
EDIT: included the ruby code that I am using
doc = Nokogiri::HTML(html)
Element = doc.xpath('/Results/ResultSet/Type/Title[text()=" 1234 "]')
if Element.empty?()
puts "not there "
else
Element.each do |node|
puts "Found Title: #{node.text}"
end
end
end
The XPath is wrong:
Use this:
/Results/ResultSet/Type/Title[text()=" 1234 "]
Based on the link OP posted for the XML, here is the working XPath:
/QuigoResults/ResultSet/Listing/Title[text()=" location in DYNAMICREGION "]

Get string between two strings?

I have a string:
<products type="array">
<product><brand>Rho2</brand>
<created-at type="datetime">2011-11-03T21:29:46Z</created-at><id type="integer">78013</id><name>Test2</name>
<price nil="true"/>
<quantity nil="true"/>
<sku nil="true"/>
<updated-at type="datetime">2011-11-03T21:29:46Z</updated-at>
</product>
<product>
<brand>Apple</brand>
<created-at type="datetime">2011-10-26T21:26:59Z</created-at>
<id type="integer">77678</id>
<name>iPhone</name>
<price>$199.99</price>
<quantity>5</quantity>
<sku>1234</sku>
<updated-at type="datetime">2011-10-26T21:27:00Z</updated-at>
</product>
I want to get the text between <brand> and </brand>.
I am trying to parse this XML, collecting data between tags.
XmlSimple should be easy.
require 'xmlsimple'
products = XmlSimple.xml_in('<YOUR WHOLE XML>', { 'KeyAttr' => 'product' })
You should use any XML parser available in you platform. Then you can use simple XPath expression:
//brand
It selects all brand elements in document.
The defacto standard for parsing XML and HTML in Ruby is Nokogiri these days:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<products type="array">
<product>
<brand>Rho2</brand>
<created-at type="datetime">2011-11-03T21:29:46Z</created-at>
</product>
<product>
<brand>Apple</brand>
<created-at type="datetime">2011-10-26T21:26:59Z</created-at>
</product>
</products>
EOT
puts doc.search('brand').map(&:text)
Which outputs:
Rho2
Apple
function getStringBetween(str , fromStr , toStr){
var fromStrIndex = str.indexOf(fromStr) == -1 ? 0 : str.indexOf(fromStr) + fromStr.length;
var toStrIndex = str.slice(fromStrIndex).indexOf(toStr) == -1 ? str.length-1 : str.slice(fromStrIndex).indexOf(toStr) + fromStrIndex;
var strBtween = str.substring(fromStrIndex,toStrIndex);
return strBtween;
}

I want to reprint the modified xml after deleting entire child node

<product>
<book>
<id>111</id>
<name>xxx</name>
</book>
<pen>
<id>222</id>
<name>yyy</name>
</pen>
<pencil>
<id>333</id>
<name>zzz</name>
</pencil>
I want to remove the "pencil" node and print the remaining xml using REXML (Ruby). Can anybody tell me how to do that ?
By using one of the delete methods http://rubydoc.info/stdlib/rexml/
require "rexml/document"
string = <<EOF
<product>
<book>
<id>111</id>
<name>xxx</name>
</book>
<pen>
<id>222</id>
<name>yyy</name>
</pen>
<pencil>
<id>333</id>
<name>zzz</name>
</pencil>
</product>
EOF
doc = REXML::Document.new(string)
doc.delete_element('//pencil')
puts doc
There is also nice tutorial to get you started: http://www.germane-software.com/software/rexml/docs/tutorial.html

Resources