Exclusive XPath test

I'm trying to write a Schematron rule to perform the following test...
If no li under root contains "abc" and root contains title, "def" then report.
The problem that I'm having is that I am getting many false positives. Here is my current XPath...
<rule context="li">
<assert test="not(contains(text(),'abc')) and ancestor::root/descendant::title = 'def'">Report this.</assert>
</rule>
My output ends up reporting on each li that does not contain "abc" which I understand since it is testing every li and reporting.
However, I don't know how to write the XPath so that I can test if any li contains "abc".
Thanks!

The problem, as you hinted at, is that you've expressed this in Schematron as a rule that applies to each li element; whereas the rule you've described in English is one that applies to each root element.
So you could express it as
<rule context="root">
<assert test=".//li[contains(text(),'abc')] or
not(.//title = 'def')">Report this.</assert>
</rule>
Note that I've flipped the sense of the test, so that the assertion fails (and the message is reported) in exactly the case your English description names:
If no li under root contains "abc" and root contains title, "def" then report.
Since you're using an <assert> element, you assert the opposite of what you want to report. It might make more sense to use a <report> element:
<rule context="root">
<report test="not(.//li[contains(text(),'abc')]) and
.//title = 'def'">Report this.</report>
</rule>
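To convince yourself that the two tests really are opposites, here is a small Ruby sketch (Ruby only because it is used elsewhere on this page; the method and variable names are made up for illustration). It walks through every combination of the two conditions and prints both results, so you can see that the assert test is false exactly when the report test is true:

```ruby
# has_abc      : some li under root contains "abc"   (hypothetical name)
# title_is_def : some title under root equals "def"  (hypothetical name)

# Mirrors the <assert> test: .//li[contains(text(),'abc')] or not(.//title = 'def')
def assert_test(has_abc, title_is_def)
  has_abc || !title_is_def
end

# Mirrors the <report> test: not(.//li[contains(text(),'abc')]) and .//title = 'def'
def report_test(has_abc, title_is_def)
  !has_abc && title_is_def
end

# Enumerate all four combinations of the two conditions.
[true, false].product([true, false]).each do |has_abc, title_is_def|
  a = assert_test(has_abc, title_is_def)
  r = report_test(has_abc, title_is_def)
  puts "abc=#{has_abc} def=#{title_is_def} assert=#{a} report=#{r}"
end
```

An assert fires when its test is false and a report fires when its test is true, so both rules produce the message for the same documents.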

IIS Rewrite with optional query string or adding defaults

I am using IIS 8 and just learning about rewrites as I haven't ever written anything that cared about SEO.
I have the following rule, which works as long as the URL looks like this: /survey/abc123/email
<rule name="Survey Rule" stopProcessing="true">
<match url="survey/([_0-9a-z-]+)/([_0-9a-z-]+)" />
<action type="Rewrite" url="survey.htm?var1={R:1}&amp;var2={R:2}" />
</rule>
On the survey.htm page I have code that checks for the existence of var1 and var2, but with this rewrite the URL /survey/abc123 obviously doesn't hit the Survey Rule. I have tried a couple of <conditions> but couldn't find the right one.
I feel there must be a way to say
If {R:1} exists then var1={R:1} else var1=''
If {R:2} exists then var2={R:2} else var2=''
Ideally some type of loop. Is there any way to do this in a rewrite so that no matter how many / there are after survey, whether 0 or 10, it will always hit the survey page?
I have looked at the rewrite map but I am not sure that solves this issue.
Edit
Possible urls that I would like to be rewritten:
/survey/abc123/
/survey.htm?var1=abc123
/survey/abc123/email/
/survey.htm?var1=abc123&var2=email
/survey/abc123/email/bob/
/survey.htm?var1=abc123&var2=email&var3=bob
/survey/abc123/email/bob/someOtherVar
/survey.htm?var1=abc123&var2=email&var3=bob&var4=someOtherVar
/result/1/
/result.htm?var1=1
/result/1/test@example.com
/result.htm?var1=1&var2=test@example.com
I would like the first item after the slash to be the page name and then each item after turned into the "query_string". I hope this makes a little more sense.
Short answer
You can't get exactly what you want with IIS rewrite alone. By "what you want", I mean handling it dynamically with a loop.
Long answer
(1) With IIS rewrite only, this is the closest possible solution to your problem:
<rule name="Survey/Result Loop Rule" stopProcessing="true">
<match url="^(survey|result)/([^/]+)/(.*)$" />
<action type="Rewrite" url="/{R:1}/{R:3}?{R:2}={R:2}" appendQueryString="true" />
</rule>
<rule name="Survey/Result Default Rule" stopProcessing="true">
<match url="^(survey|result)/$" />
<action type="Rewrite" url="/{R:1}.htm" appendQueryString="true" />
</rule>
It will simulate a loop as long as the URL contains parameters as subfolders, for both /survey/ and /result/. Then it finally rewrites to the .htm page with the query string appended. It is not possible to dynamically generate query names such as var1, var2, etc. by incrementing a number (or at least, if a solution exists, it would be very tricky and heavy, because the rewrite engine is not made for this). In this example, each query name is the same as its value, such as ?abc123=abc123&email=email.
(2) The cleanest way for this would be to delegate the job to the script:
<rule name="Survey/Result Default Rule" stopProcessing="true">
<match url="^(survey|result)/(.*)$" />
<action type="Rewrite" url="/{R:1}.htm?params={R:2}" />
</rule>
This rule rewrites, for instance, /survey/XXX/YYY/ZZZ/ to /survey.htm?params=XXX/YYY/ZZZ/. Since the job is delegated to the script, your htm files need to implement something like this (in pseudo code):
params = query_get('params');
// remove trailing slash in params if present
parameters = explode("/", query_get('params'))
for (i = 0; i < count(parameters); i++)
var{i+1} = parameters[i]
// var1 = parameters[0]
// var2 = parameters[1]
// var3 = parameters[2]
// and so on...
I think you get the idea.
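For what it's worth, here is the same idea as a minimal runnable sketch in Ruby. This is only an illustration: params_to_vars is a hypothetical helper name, and it assumes your page has already read the params query value into a string. How you read the query string depends on whatever runs behind your .htm pages.

```ruby
# Hypothetical helper: turn the rewritten "params" value (e.g. "abc123/email/bob/")
# into a hash keyed var1, var2, var3, ...
def params_to_vars(params)
  # Split on "/" and drop empty segments, so a trailing or doubled slash is ignored.
  segments = params.to_s.split("/").reject(&:empty?)

  # Number the segments starting at 1 to build the var names.
  segments.each_with_index.map { |value, i| ["var#{i + 1}", value] }.to_h
end

vars = params_to_vars("abc123/email/bob/")
vars.each { |name, value| puts "#{name} = #{value}" }
```

With the sample path this gives var1 = abc123, var2 = email and var3 = bob, and an empty or missing params value simply yields no variables.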

XACML rule check between resource and subject with XPath

I can't figure out how to write a rule that would satisfy this requirement.
Let's assume I have this request:
<Request>
<Attributes Category="urn:oasis:names:tc:xacml:1.0:subject-category:access-subject">
<Content>
<Categories>
<Category name="cat1">
<CategoryValue>A</CategoryValue>
<CategoryValue>B</CategoryValue>
<CategoryValue>C</CategoryValue>
</Category>
<Category name="cat2">
<CategoryValue>B</CategoryValue>
<CategoryValue>E</CategoryValue>
<CategoryValue>F</CategoryValue>
</Category>
</Categories>
</Content>
</Attributes>
<Attributes Category="urn:oasis:names:tc:xacml:3.0:attribute-category:resource">
<Content>
<Categories>
<Category name="cat1">
<CategoryValue>A</CategoryValue>
</Category>
<Category name="cat2">
<CategoryValue>A</CategoryValue>
<CategoryValue>E</CategoryValue>
<CategoryValue>F</CategoryValue>
<CategoryValue>G</CategoryValue>
</Category>
</Categories>
</Content>
</Attributes>
</Request>
I want to write a policy that contains a rule with a Permit effect when, for each Category element of the resource, the subject has a Category with the same @name and the two Category elements share at least one common CategoryValue.
In the example above:
Resource has "cat1" with "A" - Subject has "cat1" with one value that is A: Permit
Resource has "cat2" with "A", "E", "F", "G" - Subject has "cat2" with value E (or F): Permit
Final result of the rule: Permit
My question is not about which FunctionId I should use, but about how to combine these conditions so that the rule behaves the way I described. How do I compare the CategoryValue elements of nodes that have the same @name?
I think I will have to use the string-at-least-one-member-of function between the values of the subject and resource "cat1", then between the subject and resource "cat2", but the real difficulty is that the PDP has no knowledge of the @name of the Category elements, so I can't hardcode it directly in the rule, and I don't know how to select them individually to perform the check.
Any ideas on this?
First of all, your request is invalid. You are missing some required attributes on the <Request> element, e.g.
ReturnPolicyIdList="true"
CombinedDecision="true"
Secondly, I would recommend you do not use XPath in XACML. It makes your policies hard to write (hence your question), hard to maintain, and hard to read (audit). It defeats the purpose of XACML in a way. Let the PEP do the heavy XML processing and send in attributes with attribute values rather than XML content.
In addition, you cannot control the iteration over the different elements / attribute values in the XML in XACML. I can implement your use case with a specific @name value but I cannot manage to do it over an array of values.
Assuming a single value, you would have to implement a condition as follows:
<xacml3:Rule RuleId="axiomatics-example-xacml30" Effect="Permit" xmlns:xacml3="urn:oasis:names:tc:xacml:3.0:core:schema:wd-17">
<xacml3:Target/>
<xacml3:Condition >
<xacml3:Apply FunctionId="urn:oasis:names:tc:xacml:1.0:function:string-at-least-one-member-of">
<xacml3:AttributeSelector Path="/Categories/Category[@name='cat1']/CategoryValue/text()" DataType="http://www.w3.org/2001/XMLSchema#string" MustBePresent="false" Category="urn:oasis:names:tc:xacml:1.0:subject-category:access-subject"/>
<xacml3:AttributeSelector Path="/Categories/Category[@name='cat1']/CategoryValue/text()" DataType="http://www.w3.org/2001/XMLSchema#string" MustBePresent="false" Category="urn:oasis:names:tc:xacml:3.0:attribute-category:resource"/>
</xacml3:Apply>
</xacml3:Condition>
</xacml3:Rule>
But you cannot really iterate over the different values.
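If you follow the advice above and let the PEP do the heavy lifting, the check you describe is easy to express in ordinary code before the request is ever built. The following is only a sketch in Ruby, not XACML: it assumes the PEP has already parsed both Content blocks into hashes mapping each category name to its values, and categories_match? is a hypothetical helper name.

```ruby
# Hypothetical PEP-side check: every resource category must exist on the subject
# under the same name and share at least one value with it.
def categories_match?(subject, resource)
  resource.all? do |name, resource_values|
    # A category missing on the subject counts as having no values.
    subject_values = subject.fetch(name, [])
    # Array#& is set intersection: non-empty means at least one common value.
    !(subject_values & resource_values).empty?
  end
end

# The data from the sample request, written out by hand.
subject  = { "cat1" => %w[A B C], "cat2" => %w[B E F] }
resource = { "cat1" => %w[A],     "cat2" => %w[A E F G] }

puts categories_match?(subject, resource) ? "Permit" : "Deny"
```

The PEP could then send the outcome (or the matched values) to the PDP as a plain attribute, which keeps the policy itself free of XPath.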

How do I parse XML with Nokogiri css selectors, using loops?

I am trying to parse this sample XML file:
<Collection version="2.0" id="74j5hc4je3b9">
<Name>A Funfair in Bangkok</Name>
<PermaLink>Funfair in Bangkok</PermaLink>
<PermaLinkIsName>True</PermaLinkIsName>
<Description>A small funfair near On Nut in Bangkok.</Description>
<Date>2009-08-03T00:00:00</Date>
<IsHidden>False</IsHidden>
<Items>
<Item filename="AGC_1998.jpg">
<Title>Funfair in Bangkok</Title>
<Caption>A small funfair near On Nut in Bangkok.</Caption>
<Authors>Anthony Bouch</Authors>
<Copyright>Copyright © Anthony Bouch</Copyright>
<CreatedDate>2009-08-07T19:22:08</CreatedDate>
<Keywords>
<Keyword>Funfair</Keyword>
<Keyword>Bangkok</Keyword>
<Keyword>Thailand</Keyword>
</Keywords>
<ThumbnailSize width="133" height="200" />
<PreviewSize width="532" height="800" />
<OriginalSize width="2279" height="3425" />
</Item>
<Item filename="AGC_1164.jpg" iscover="True">
<Title>Bumper Cars at a Funfair in Bangkok</Title>
<Caption>Bumper cars at a small funfair near On Nut in Bangkok.</Caption>
<Authors>Anthony Bouch</Authors>
<Copyright>Copyright © Anthony Bouch</Copyright>
<CreatedDate>2009-08-03T22:08:24</CreatedDate>
<Keywords>
<Keyword>Bumper Cars</Keyword>
<Keyword>Funfair</Keyword>
<Keyword>Bangkok</Keyword>
<Keyword>Thailand</Keyword>
</Keywords>
<ThumbnailSize width="200" height="133" />
<PreviewSize width="800" height="532" />
<OriginalSize width="3725" height="2479" />
</Item>
</Items>
</Collection>
Here is my current code:
require 'nokogiri'
doc = Nokogiri::XML(File.open("sample.xml"))
somevar = doc.css("collection")
#create loop
somevar.each do |item|
puts "Item "
puts item['Title']
puts "\n"
end#items
Starting at the root of the XML document, I'm trying to go from the root "Collection" element down to each new level.
I start in the node sets, and get information from the nodes, and the nodes contain elements. How do I assign the node to a variable, and extract every single layer underneath that and the text?
I can do something like the code below, but I want to know how to systematically move through each nested element of XML using loops, and output the data for each line. When finished showing text, how do I move back up to the previous element/node, whatever it may be (traversing a node in the tree)?
puts somevar.css("Keywords Keyword").text
Nokogiri's NodeSet and Node support very similar APIs, with the key semantic difference that NodeSet's methods tend to operate on all the contained nodes in turn. For example, while a single node's children gets that node's children, a NodeSet's children gets all contained nodes' children (ordered as they occur in the document). So, to print all the titles and authors of all your items, you could do this:
require 'nokogiri'
doc = Nokogiri::XML(File.open("sample.xml"))
coll = doc.css("Collection")
coll.css("Items").children.each do |item|
title = item.css("Title")[0]
authors = item.css("Authors")[0]
puts title.content if title
puts authors.content if authors
end
You can get at any level of the tree in this way. Another example -- depth-first search printing every node in the tree (NB. the printed representation of a node includes the printed representations of its children, so the output will be quite long):
def rec(node)
puts node
node.children.each do |child|
rec child
end
end
Since you ask about this specifically, if you want to get at the parent of a given node, you can use the parent method. You may never need to though, if you can put your processing in blocks passed to each and the like on NodeSets containing subtrees of interest.

Searching for tags while parsing Wordpress XML with Nokogiri

I have an XML file of a Wordpress blog that consists of quotes:
<item>
<title>Brothers Karamazov</title>
<content:encoded><![CDATA["I think that if the Devil doesn't exist and, consequently, man has created him, he has created him in his own image and likeness."]]></content:encoded>
<category domain="post_tag" nicename="dostoyevsky"><![CDATA[Dostoyevsky]]></category>
<category domain="post_tag" nicename="humanity"><![CDATA[humanity]]></category>
<category domain="category" nicename="quotes"><![CDATA[quotes]]></category>
<category domain="post_tag" nicename="the-devil"><![CDATA[the Devil]]></category>
</item>
The things I'm trying to extract are title, author, content and tags. Here's my code so far:
require "rubygems"
require "nokogiri"
doc = Nokogiri::XML(File.open("/Users/charliekim/Downloads/quotesfromtheunderground.wordpress.2013-04-14.xml"))
doc.css("item").each do |item|
title = item.at_css("title").text
tag = item.at_xpath("category").text
content = item.at_xpath("content:encoded").text
#each post will later be pushed to an array, but I'm not worried about that yet, so for now....
puts "#{title} #{tag}"
end
I'm struggling to get all the tags from each item. I'm getting returns of something like Brothers Karamazov Dostoyevsky. I'm not worried about how it's formatted as it's only a test to see that it's picking things up correctly. Anyone know how I can go about this?
I also want to treat any capitalized tag as the author, so if you know how to do that it would help too, although I haven't even tried it yet.
EDIT: I changed the code to this:
doc.css("item").each do |item|
title = item.at_css("title").text
content = item.at_xpath("content:encoded").text
tag = item.at_xpath("category").each do |category|
category
end
puts "#{title}: #{tag}"
end
which returns:
Brothers Karamazov: [#<Nokogiri::XML::Attr:0x80878518 name="domain" value="post_tag">, #<Nokogiri::XML::Attr:0x80878504 name="nicename" value="dostoyevsky">]
and which seems a bit more manageable. It screws up my plans for taking the Author from a capitalized tag, but, well, it's not so big of a deal. How could I pull just the second value?
You're using at_xpath and expecting it to return more than one result, when the at_ methods only return the first result.
You want something like:
tags = item.xpath("category").map(&:text)
which will return an array.
As for identifying the author, you can use a regex to select the items that start with a capital letter:
author = tags.select{|w| w =~ /^[A-Z]/}
Which will choose any capitalized tags. This leaves the tags untouched. If you wanted instead to separate the authors from the tags, you can use partition:
author, tags = item.xpath("category").map(&:text).partition{|w| w =~ /^[A-Z]/}
Note that in the above examples, author is an array and will contain all matching items (i.e. more than one capitalized tag).
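Here is a small self-contained run of the partition approach. To keep it runnable without Nokogiri, the array below is a hand-written stand-in for what item.xpath("category").map(&:text) returns for the sample item; the variable names are just for illustration.

```ruby
# Stand-in for item.xpath("category").map(&:text) on the sample <item>.
category_texts = ["Dostoyevsky", "humanity", "quotes", "the Devil"]

# partition returns two arrays: elements for which the block is truthy, then the rest.
authors, tags = category_texts.partition { |w| w =~ /^[A-Z]/ }

# authors is an array; take the first entry if you expect exactly one author.
author = authors.first

puts "author: #{author}"
puts "tags: #{tags.join(', ')}"
```

Note that "the Devil" stays in tags because the pattern only looks at the first character of the string.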

Gsub and regular expression

I have a web page. The HTML source contains this text:
<meta property="og:title" content="John"/>
John is an example, the name may vary.
I am sure that og:title will appear only once in the text.
This is my code:
$browser.goto( url )
x = $browser.html.gsub( /^.*<meta property="og:title" content="(.+?)".>/m, '\1' )
I expected to find the name John in my variable x
The '\1' should give me the first part I put in the parenthesis, i.e. (.+?), i.e. John, right?
Also, I used a dot . to match a slash / , is there a better way?
Using Watir API:
x = browser.meta.attribute_value "content"
I was not able to access the meta element using either CSS or XPath.
If you only want the value of content:
html = '<meta property="og:title" content="John"/>'
=> "<meta property=\"og:title\" content=\"John\"/>"
html[/property="og:title" content="([^"]+)"/, 1]
=> "John"
If you're not familiar with regex, "([^"]+)" might throw you. It means: starting after the first ", grab everything up to the next ". In effect, it grabs everything inside the double quotes.
Your gsub call returns all of the HTML, with the matched portion (everything from the start of the string up to and including the />) replaced by 'John'. So the result is "John" followed by whatever HTML came after the /> of that meta tag.
If you only want to extract the name, and that tag occurs only once, you can use something like:
$browser.html =~ /<meta property="og:title" content="(.+?)"/
x = $1
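To see the difference side by side, here is a short sketch that runs the gsub from the question and the String#[] extraction against the same string. The html value is a made-up one-line page standing in for $browser.html.

```ruby
# Made-up sample standing in for $browser.html.
html = '<head><meta property="og:title" content="John"/><title>t</title></head>'

# gsub replaces only the matched part and keeps everything after it.
replaced = html.gsub(/^.*<meta property="og:title" content="(.+?)".>/m, '\1')

# String#[] with a capture index returns just the captured group (or nil if no match).
extracted = html[/<meta property="og:title" content="([^"]+)"/, 1]

puts replaced
puts extracted
```

Here replaced still carries the trailing <title>t</title></head> markup after "John", while extracted is just "John".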