Where should I be limiting my results? - sql-order-by

I have created an XML file with a list of several thousand search terms that I need to run against a document. As a test, I then wrote this query from a sample set of search terms and ran it against a test document containing some samples from the actual document:
let $keywords := ("best clients", "Very", "20")
for $keyword in $keywords
let $matches := doc('test')/set/entry[matches(comment, $keyword, 'i')]
return (<re>
{subsequence($matches/comment, 1, 1),
subsequence($matches/buyer, 1, 1)}</re>,
<re>
{subsequence($matches/comment, 2, 1),
subsequence($matches/buyer, 2, 1)}
</re>
)
I'm trying to get back <re><comment /><buyer /></re><re><comment /><buyer /></re>... as one continuous sequence, but the results come back only roughly in order.
This is a chunk from the document being parsed (I've removed the buyer names and some nests, to make it easier to read):
<set>
<entry>
<comment>The client is only 20 years old. Do not be surprised by his youth.</comment>
<buyer></buyer>
<id>1282</id>
<industry>International Trade; Fish and Game</industry>
</entry>
<entry>
<comment>!On leave in October.</comment>
<buyer></buyer>
<id>709</id>
<industry>Real Estate</industry>
</entry>
<entry>
<comment>Is often !out between 1 and 3 p.m.</comment>
<buyer></buyer>
<id>127</id>
<industry>Virus Software Marketting</industry>
</entry>
<entry>
<comment>Very personable. One of our best clients.</comment>
<buyer></buyer>
<id>14851</id>
<industry>Administrative support.</industry>
</entry>
<entry>
<comment>!Very difficult to reach, but one of our top buyers.</comment>
<buyer></buyer>
<id>1458</id>
<industry>Construction</industry>
</entry>
<entry>
<comment></comment>
<buyer></buyer>
<id>276470</id>
<industry>Bulk Furniture Sales</industry>
</entry>
<entry>
<comment>A bit of an eccentric. One of our best clients.</comment>
<buyer></buyer>
<id>1506</id>
<industry>Sports Analysis</industry>
</entry>
<entry>
<comment>Very gullible, so please !be sure she needs what you sell her. She's one of our best clients.</comment>
<buyer></buyer>
<id>1523</id>
<industry>International Trade</industry>
</entry>
<entry>
<comment>He wants to buy everything, but !he has a tight budget.</comment>
<buyer></buyer>
<id>1524</id>
<industry>Public Relations</industry>
</entry>
</set>
Some of the keywords I'm using: "Best client*," "Trade", "20", ....
I've been
The output is a long list of entries with comment and buyer children as siblings under the entry element. I'd like to limit the number of entries returned to 2 per keyword. I'm also trying to get comments that begin with an exclamation point (!) to be the priority.
Current output (getting close):
<re><comment>Very personable. One of our best clients.</comment>
<buyer/>
</re><re><comment>A bit of an eccentric. One of our best clients.</comment>
<buyer/>
</re><re><comment>Very personable. One of our best clients.</comment>
<buyer/>
</re><re><comment>!Very difficult to reach, but one of our top buyers.</comment>
<buyer/>
</re><re><comment>The client is only 20 years old. Do not be surprised by his youth.</comment>
<buyer/>
</re><re/>
Current output format:
<entry>
<comment>keyworda</comment>
<buyer></buyer>
</entry>
<entry>
<comment>keyworda</comment>
<buyer></buyer>
</entry>
<entry>
<comment>keywordb</comment>
<buyer></buyer>
</entry>
<entry>
<comment>!keywordb</comment> //Not prioritized.
<buyer></buyer>
</entry>
<entry>
<comment>keywordc</comment>
<buyer></buyer>
</entry>
Desired output:
<entry>
<comment>!keyworda</comment>
<buyer></buyer>
</entry>
<entry>
<comment>keyworda</comment>
<buyer></buyer>
</entry>
<entry>
<comment>!keywordb</comment>
<buyer></buyer>
</entry>
<entry>
<comment>!keywordb</comment>
<buyer></buyer>
</entry>
(Basically, prioritizing entries that contain an exclamation point and limiting the results to 2 per keyword.)

let $results :=
(
let $pKeywords := ('best clients', 'Very', '20')
return
for $kw in $pKeywords
return
(
/*/entry[contains(comment, concat('!', $kw))],
/*/entry[contains(comment, $kw)]
)
[not(position() gt 2)]
)
for $i in (1 to count($results))
return
(
subsequence($results/comment, $i, 1),
subsequence($results/buyer, $i, 1)
)
Returns the correct solution:
<comment>The client is only 20 years old. Do not be surprised by his youth.</comment>
<buyer/>
<comment>Very personable. One of our best clients.</comment>
<buyer/>
<comment>!Very difficult to reach, but one of our top buyers.</comment>
<buyer/>
<comment>A bit of an eccentric. One of our best clients.</comment>
<buyer/>
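For readers who want to check the selection logic outside an XQuery processor, here is a minimal Python sketch of the same idea: for each keyword, put comments containing "!" directly before the keyword first, then take at most two entries. The sample data is a shortened copy of the document above, and the function name pick_entries and the limit parameter are made up for this illustration. Unlike the XQuery above, it keeps the results grouped per keyword instead of merging them into one sequence.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document from the question.
SAMPLE = """<set>
  <entry><comment>The client is only 20 years old.</comment><id>1282</id></entry>
  <entry><comment>Very personable. One of our best clients.</comment><id>14851</id></entry>
  <entry><comment>!Very difficult to reach, but one of our top buyers.</comment><id>1458</id></entry>
  <entry><comment>A bit of an eccentric. One of our best clients.</comment><id>1506</id></entry>
  <entry><comment>Very gullible. She's one of our best clients.</comment><id>1523</id></entry>
</set>"""


def pick_entries(root, keywords, limit=2):
    """Return {keyword: [entry ids]} with '!keyword' matches first, at most `limit` each."""
    picked = {}
    for kw in keywords:
        # Case-sensitive substring match, like contains() in the XQuery answer.
        matching = [e for e in root.iter("entry") if kw in (e.findtext("comment") or "")]
        # False sorts before True, so '!keyword' comments come first.
        # sorted() is stable, so document order is kept inside each group.
        ranked = sorted(matching, key=lambda e: ("!" + kw) not in e.findtext("comment"))
        picked[kw] = [e.findtext("id") for e in ranked[:limit]]
    return picked


result = pick_entries(ET.fromstring(SAMPLE), ["best clients", "Very", "20"])
for keyword, ids in result.items():
    print(keyword, ids)
```

Sorting on a boolean key is just one way to express the priority; building two lists and concatenating them, as the XQuery does, works equally well.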

Related

Extract numeric and non-numeric part from a string with WSO2

Thanks in advance; I need some help. I have a simple process flow in WSO2. The goal is to validate a string and print its alphabetic and numeric parts separately. I can print both of them, but I think my formula takes too much effort, and I would like to find a simpler way. I tried regular expressions, but whenever I do, I get an error result.
My code:
<?xml version="1.0" encoding="UTF-8"?>
<api context="/split1" name="SplitAlphaNumber" xmlns="http://ws.apache.org/ns/synapse">
<resource methods="POST">
<inSequence>
<property expression="//OperationValueRegex/Value" name="Value" scope="default" type="STRING"/>
<payloadFactory media-type="xml">
<format>
<OperationValueRegex xmlns="">
<Result1>$1</Result1>
<Result2>$2</Result2>
</OperationValueRegex>
</format>
<args>
<arg evaluator="xml" expression="translate(., translate($ctx:Value,'0123456789',''), '')"/>
<arg evaluator="xml" expression="translate(., translate($ctx:Value,'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',''), '')"/>
</args>
</payloadFactory>
<respond/>
</inSequence>
<outSequence/>
<faultSequence/>
</resource>
</api>
The expected result looks like this:
<OperationValueRegex>
<Result1>1234</Result1>
<Result2>Mario Naga</Result2>
</OperationValueRegex>
Sample input from Postman:
<OperationValueRegex>
<Value>Mario Naga 1234</Value>
</OperationValueRegex>
and the actual output:
<OperationValueRegex>
<Result1>
1234
</Result1>
<Result2>
MarioNaga
</Result2>
</OperationValueRegex>
Any suggestions would be appreciated. Thanks.
Here is a simpler way to achieve what you need. Make sure XPath 2.0 is enabled in the WSO2 server.
Use the following two XPath expressions.
fn:tokenize($ctx:Value, ' ')[matches(., '\d+')] // Tokenize the string on spaces and keep the token that contains digits.
fn:replace($ctx:Value, ' \d+', '') // Remove the numeric part from the string.
PayloadFactory mediator:
<payloadFactory media-type="xml">
<format>
<OperationValueRegex xmlns="">
<Result1>$1</Result1>
<Result2>$2</Result2>
</OperationValueRegex>
</format>
<args>
<arg evaluator="xml" expression="fn:tokenize($ctx:Value, ' ')[matches(., '\d+')]" xmlns:fn="http://www.w3.org/2005/xpath-functions" />
<arg evaluator="xml" expression="fn:replace($ctx:Value, ' \d+', '')" xmlns:fn="http://www.w3.org/2005/xpath-functions" />
</args>
</payloadFactory>
It seems you do a second translate to get rid of leading/trailing spaces. This messes up your values. Instead, you can use the normalize-space() function as follows, which gives you the desired output:
<api xmlns="http://ws.apache.org/ns/synapse" name="SplitAlphaNumber" context="/split1">
<resource methods="POST">
<inSequence>
<property name="Value" expression="//OperationValueRegex/Value" scope="default" type="STRING"/>
<payloadFactory media-type="xml">
<format>
<OperationValueRegex xmlns="">
<Result1>$1</Result1>
<Result2>$2</Result2>
</OperationValueRegex>
</format>
<args>
<arg evaluator="xml" expression="normalize-space(translate($ctx:Value,'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',''))"/>
<arg evaluator="xml" expression="normalize-space(translate($ctx:Value,'0123456789',''))"/>
</args>
</payloadFactory>
<respond/>
</inSequence>
<outSequence/>
<faultSequence/>
</resource>
</api>
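To see why deleting characters and then normalizing whitespace is enough, here is a small Python sketch that imitates the two XPath calls. It is only an analogy, not WSO2 code: str.translate with a delete table stands in for translate(..., ''), and " ".join(s.split()) stands in for normalize-space(). The function name split_value is made up for this example.

```python
import string


def split_value(value):
    """Return (numeric_part, text_part) for input such as 'Mario Naga 1234'."""
    # Like translate(value, 'ABC...xyz', ''): delete every ASCII letter.
    without_letters = value.translate(str.maketrans("", "", string.ascii_letters))
    # Like translate(value, '0123456789', ''): delete every digit.
    without_digits = value.translate(str.maketrans("", "", string.digits))
    # Like normalize-space(): trim both ends and collapse inner whitespace runs.
    numeric_part = " ".join(without_letters.split())
    text_part = " ".join(without_digits.split())
    return numeric_part, text_part


result1, result2 = split_value("Mario Naga 1234")
print(repr(result1), repr(result2))
```

Note that the space between "Mario" and "Naga" survives because only letters or digits are deleted; the original double translate removed it because the space was part of the character set being stripped.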

XPath 2 expressions interfering with each other

I have an XML document like this:
<Values>
<Value AttributeID="asset.extension">pdf</Value>
<Value AttributeID="asset.size">10326</Value>
<Value AttributeID="ATTR_AssetPush_Webshop">1</Value>
<Value AttributeID="asset.format">PDF (Portable Document Format application)</Value>
<Value AttributeID="asset.mime-type">application/pdf</Value>
<Value AttributeID="asset.filename">filename.pdf</Value>
<Value AttributeID="asset.uploaded">2018-01-10 17:05:39</Value>
<Value AttributeID="ATTR_Verwendungsort" Derived="true">WebShop,</Value>
</Values>
I have 2 (or more) XPath-expressions like this:
<xsl:template match="/STEP-ProductInformation/Assets/Asset/Values/Value[not(@AttributeID='asset.mime-type')]" />
<xsl:template match="/STEP-ProductInformation/Assets/Asset/Values/Value[not(@AttributeID='asset.size')]" />
For some reason, though, if I use the two of them together, all of the information is stripped. If I use only one expression, I get my desired output. Can't I use two expressions like this?
I also tried combining them like this:
<xsl:template match="/STEP-ProductInformation/Assets/Asset/Values/Value[not(@AttributeID='asset.mime-type') and (@AttributeID='asset.size')]" />
But that didn't do it, either.
The desired output would be like this:
<Values>
<Value AttributeID="asset.size">10326</Value>
<Value AttributeID="asset.mime-type">application/pdf</Value>
</Values>
I think in XSLT 2/3 you could express it as
<xsl:template match="Values/Value[not(@AttributeID = ('asset.mime-type', 'asset.size'))]"/>
In XSLT/XPath 1.0 you would need Values/Value[not(@AttributeID = 'asset.mime-type' or @AttributeID = 'asset.size')].
<xsl:template match="/STEP-ProductInformation/Assets/Asset/Values
/Value[not(@AttributeID='asset.mime-type')]" />
<xsl:template match="/STEP-ProductInformation/Assets/Asset/Values
/Value[not(@AttributeID='asset.size')]" />
This is a logical error -- has nothing to do with XPath.
It is like saying:
From all days of the week I will work only on Mondays
From the days selected above I will work only on Tuesdays
The first statement above selects only Mondays. The 2nd statement selects all Tuesdays from these Mondays -- that is the empty set.
A correct statement:
From all days of the week I will work only on Mondays or Tuesdays
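The same logic can be checked with plain lists. The Python sketch below is only an illustration of the boolean reasoning, not of an XSLT processor: a Value is removed when any empty template matches it, so two separate "not equal" patterns together match every element, while one combined pattern leaves the two wanted elements alone. The list of ids is a shortened copy of the attribute values in the question.

```python
ids = ["asset.extension", "asset.size", "asset.mime-type", "asset.filename"]

# Two separate empty templates: a Value is removed if EITHER pattern matches it.
# Every id differs from at least one of the two names, so everything is removed.
kept_two_templates = [
    i for i in ids
    if not (i != "asset.mime-type" or i != "asset.size")
]

# One combined pattern, not(@AttributeID = 'a' or @AttributeID = 'b'):
# a Value is removed only when it is neither of the two names.
kept_combined = [
    i for i in ids
    if i == "asset.mime-type" or i == "asset.size"
]

print(kept_two_templates)
print(kept_combined)
```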

XPATH to get multiple values

<values>
<value index="1">
<token>
<secret>
<client_key>
</value>
<value index="2">
<token>
<secret>
<client_key>
</value>
<value index="3">
<token>
<secret>
<client_key>
</value>
</values>
Can anyone please help me with the XPath to get the token value from all of the indexes? The number of indexes is not constant and varies each time; there can be one or more.
This selects the token element inside every value element that has an index attribute:
//value[@index]/token
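As a quick way to try the expression, here is a minimal Python sketch using the standard library's ElementTree, which supports the [@attribute] predicate in its limited XPath subset. The element contents (t1, s1, k1, ...) are placeholder values invented for this example, since the question shows the elements without content.

```python
import xml.etree.ElementTree as ET

# Placeholder values; the question does not show the real element contents.
SAMPLE = """<values>
  <value index="1"><token>t1</token><secret>s1</secret><client_key>k1</client_key></value>
  <value index="2"><token>t2</token><secret>s2</secret><client_key>k2</client_key></value>
  <value index="3"><token>t3</token><secret>s3</secret><client_key>k3</client_key></value>
</values>"""

root = ET.fromstring(SAMPLE)

# ".//value[@index]/token" is the ElementTree spelling of //value[@index]/token.
tokens = [token.text for token in root.findall(".//value[@index]/token")]
print(tokens)
```

The result is a list with one item per value element, so it works unchanged whether there is one index or many.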

XSLT - Sort by a a custom string set

I have an xml as follows
<feed>
<entry>
<id>4</id>
<updated>2012-11-18T16:55:54Z</updated>
<title>ASSIGNED</title>
</entry>
<entry>
<id>3</id>
<updated>2011-01-16T16:55:54Z</updated>
<title>ASSIGNED</title>
</entry>
<entry>
<id>2</id>
<updated>2014-12-01T16:55:54Z</updated>
<title>EXPIRED</title>
</entry>
<entry>
<id>1</id>
<updated>2013-01-12T16:55:54Z</updated>
<title>COMPLETED</title>
</entry>
<entry>
<id>1</id>
<updated>2012-01-09T16:55:54Z</updated>
<title>ASSIGNED</title>
</entry>
<entry>
<id>1</id>
<updated>2011-04-18T16:55:54Z</updated>
<title>COMPLETED</title>
</entry>
</feed>
I want to sort with ASSIGNED first, followed by EXPIRED, and then COMPLETED.
If there is more than one entry in any of these categories, I would like to sort those entries by updated value, descending.
I can sort by updated descending using xsl:sort, but how do I sort based on a set of strings {ASSIGNED, EXPIRED, COMPLETED} in a given order?
I appreciate your response!
You can use translate() in the xsl:sort select to map the letters "A", "E", and "C" to "1", "2", and "3". translate() replaces every occurrence of those letters, not just the first one, but because string comparison starts at the first character and the first characters of "ASSIGNED", "EXPIRED", and "COMPLETED" are unique, that is all it takes; it would be harder if two of the strings started with an "A".
The following example emits a hardcoded <feed> (as the template match itself removes it) and uses an identity transform for all other elements.
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/feed">
<feed>
<xsl:apply-templates select="entry">
<xsl:sort select="translate (title, 'AaEeCc', '112233')" />
<xsl:sort select="updated" order="descending" />
</xsl:apply-templates>
</feed>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
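The ordering the question asks for can be sanity-checked with a short Python sketch. It uses an explicit rank table instead of the translate() trick, which is a different technique with the same intent, and it relies on the fact that these ISO 8601 timestamps sort correctly as plain strings. The names ORDER and entries are made up for this example; the data is copied from the feed above.

```python
# Custom rank for each status; lower numbers sort first.
ORDER = {"ASSIGNED": 0, "EXPIRED": 1, "COMPLETED": 2}

# (id, updated, title) tuples copied from the sample feed.
entries = [
    ("4", "2012-11-18T16:55:54Z", "ASSIGNED"),
    ("3", "2011-01-16T16:55:54Z", "ASSIGNED"),
    ("2", "2014-12-01T16:55:54Z", "EXPIRED"),
    ("1", "2013-01-12T16:55:54Z", "COMPLETED"),
    ("1", "2012-01-09T16:55:54Z", "ASSIGNED"),
    ("1", "2011-04-18T16:55:54Z", "COMPLETED"),
]

# Sort by the secondary key first (updated, newest first) ...
by_date = sorted(entries, key=lambda e: e[1], reverse=True)
# ... then by the primary key. sorted() is stable, so the date order
# is preserved inside each status group.
ordered = sorted(by_date, key=lambda e: ORDER[e[2]])

for entry in ordered:
    print(entry)
```

A rank table also handles statuses that share a first letter, which is the case the translate() approach cannot cover.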

Handling an XML file with Ruby and Nokogiri

I am new to programming so bear with me. I have many XML documents that look like this:
File name: PRIDE_Exp_Complete_Ac_10094.xml.gz
<ExperimentCollection version="2.1">
<Experiment>
<ExperimentAccession>1015</ExperimentAccession>
<Title>Protein complexes in Saccharomyces cerevisiae (GPM06600002310)</Title>
<ShortLabel>GPM06600002310</ShortLabel>
<Protocol>
<ProtocolName>None</ProtocolName>
</Protocol>
<mzData version="1.05" accessionNumber="1015">
<cvLookup cvLabel="RESID" fullName="RESID Database of Protein Modifications" version="0.0" address="http://www.ebi.ac.uk/RESID/" />
<cvLookup cvLabel="UNIMOD" fullName="UNIMOD Protein Modifications for Mass Spectrometry" version="0.0" address="http://www.unimod.org/" />
<description>
<admin>
<sampleName>GPM06600002310</sampleName>
<sampleDescription comment="Ho, Y., et al., Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002 Jan 10;415(6868):180-3.">
<cvParam cvLabel="NEWT" accession="4932" name="Saccharomyces cerevisiae (Baker's yeast)" value="Saccharomyces cerevisiae" />
</sampleDescription>
</admin>
</description>
<spectrumList count="0" />
</mzData>
</Experiment>
I want to extract the text inside the "Title", "ProtocolName", and "sampleName" elements and save it into a text file that has the same name as the .xml.gz file. I have the following code so far (based on posts I saw on this site), but it does not seem to work:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::XML(File.open("PRIDE_Exp_Complete_Ac_10094.xml.gz"))
@ExperimentCollection = doc.css("ExperimentCollection Title").map {|node| node.children.text }
Can someone help me?
Thanks
If you are happy with REXML, and there is only one <Experiment> per file, then something like the following should help. (By the way, the text above is invalid XML, since there is no closing </ExperimentCollection> tag.)
require "rexml/document"
include REXML
xml=<<EOD
<Experiment>
<ExperimentAccession>1015</ExperimentAccession>
<Title>Protein complexes in Saccharomyces cerevisiae (GPM06600002310)</Title>
<ShortLabel>GPM06600002310</ShortLabel>
<Protocol>
<ProtocolName>None</ProtocolName>
</Protocol>
<mzData version="1.05" accessionNumber="1015">
<cvLookup cvLabel="RESID" fullName="RESID Database of Protein Modifications" version="0.0" address="http://www.ebi.ac.uk/RESID/" />
<cvLookup cvLabel="UNIMOD" fullName="UNIMOD Protein Modifications for Mass Spectrometry" version="0.0" address="http://www.unimod.org/" />
<description>
<admin>
<sampleName>GPM06600002310</sampleName>
<sampleDescription comment="Ho, Y., et al., Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002 Jan 10;415(6868):180-3.">
<cvParam cvLabel="NEWT" accession="4932" name="Saccharomyces cerevisiae (Baker's yeast)" value="Saccharomyces cerevisiae" />
</sampleDescription>
</admin>
</description>
<spectrumList count="0" />
</mzData>
</Experiment>
EOD
doc = Document.new xml
doc.elements["Experiment/Title"].text
doc.elements["Experiment/Protocol/ProtocolName"].text
doc.elements["Experiment/mzData/description/admin/sampleName"].text
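Neither snippet above decompresses the .gz file or writes the text file, so here is a minimal end-to-end sketch of those steps. It is written in Python rather than Ruby so that it runs with the standard library alone, and it is only an outline under stated assumptions: one Experiment per file, a file name ending in .xml.gz, and a made-up helper name extract_fields. The sample XML is a shortened, well-formed version of the document in the question, written to a temporary directory so the example is self-contained.

```python
import gzip
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Shortened, well-formed version of the sample document.
SAMPLE = b"""<ExperimentCollection version="2.1">
  <Experiment>
    <Title>Protein complexes in Saccharomyces cerevisiae (GPM06600002310)</Title>
    <Protocol><ProtocolName>None</ProtocolName></Protocol>
    <mzData><description><admin>
      <sampleName>GPM06600002310</sampleName>
    </admin></description></mzData>
  </Experiment>
</ExperimentCollection>"""


def extract_fields(gz_path):
    """Write Title, ProtocolName and sampleName to a .txt file next to gz_path."""
    # The file is gzip-compressed, so it must be decompressed before parsing.
    with gzip.open(gz_path, "rb") as handle:
        root = ET.parse(handle).getroot()
    wanted = ("Title", "ProtocolName", "sampleName")
    # findtext returns the text of the first matching element (assumes one Experiment).
    fields = {tag: root.findtext(f".//{tag}") for tag in wanted}
    # Assumes the name ends in ".xml.gz"; swap that suffix for ".txt".
    out_path = Path(str(gz_path)[: -len(".xml.gz")] + ".txt")
    out_path.write_text("\n".join(f"{k}: {v}" for k, v in fields.items()) + "\n")
    return fields, out_path


with tempfile.TemporaryDirectory() as tmp:
    gz_file = Path(tmp) / "PRIDE_Exp_Complete_Ac_10094.xml.gz"
    gz_file.write_bytes(gzip.compress(SAMPLE))
    fields, out_path = extract_fields(gz_file)
    written = out_path.read_text()

print(written)
```

The same three steps (decompress, parse, write) carry over to Ruby; only the library calls differ.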

Resources