DokuWiki nested lists regexp - ruby

How can I convert a DokuWiki nested list string to HTML using one or two regexps in Ruby?
For example, if we have this string:
* one
* two
* three
* four
we should get this HTML:
one
two
three
four
I've made a regexp replacing the whole list. E.g.:
s.sub!(/(^\s+\*\s.+$)+/m, '<ul>\1</ul>')
And it works as it should. But how do I replace the individual list items?

The regex:
Here are some example lists:
* first item
* second item
No longer a list
* third item? no, it's the first item of the second list
* first item
* second item with linebreak\\ second line
* third item with code: <code>
some code
comes here
</code>
* fourth item
The regex for matching all lists:
(?<=^|\n)(?: {2,}\*([^\n]*?<code>.*?</code>[^\n]*|[^\n]*)\n?)+
View it in action: http://rubular.com/r/VMjwbyhJTm
The code:
Surround every list with <ul>...</ul> (gsub! so that each list in the string is wrapped, not only the first one):
s.gsub!(/(?<=^|\n)(?: {2,}\*(?:[^\n]*?<code>.*?<\/code>[^\n]*|[^\n]*)\n?)+/m, '<ul>\0</ul>')
Add the missing <li>s (s2 in the following code is the string with <ul>...</ul> added):
s2.gsub!(/ {2,}\*([^\n]*?<code>.*?<\/code>[^\n]*|[^\n]*)\n?/m, '<li>\1</li>')
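The two steps can be tried together in a minimal sketch. The helper name flat_lists_to_html is hypothetical, the lookbehind is simplified to ^ (which in Ruby already matches at the start of every line), and a " *" is added so the captured item text carries no leading spaces. It assumes list items are indented by at least two spaces, as in DokuWiki source.

```ruby
# Hypothetical helper (not from the answer): wraps flat DokuWiki lists in
# <ul>...</ul>, then turns each "  * item" line into <li>item</li>.
def flat_lists_to_html(text)
  # A run of consecutive list lines; an item may span lines inside <code>...</code>.
  list_re = /^(?: {2,}\*(?:[^\n]*?<code>.*?<\/code>[^\n]*|[^\n]*)\n?)+/m
  # A single item; " *" drops the spaces that follow the asterisk.
  item_re = / {2,}\* *([^\n]*?<code>.*?<\/code>[^\n]*|[^\n]*)\n?/m

  wrapped = text.gsub(list_re, '<ul>\0</ul>') # step 1: wrap every list
  wrapped.gsub(item_re, '<li>\1</li>')        # step 2: convert every item
end

puts flat_lists_to_html("  * one\n  * two\n  * three\nNo longer a list\n")
```

Using gsub with a returned copy keeps the input string untouched, which makes the two steps easier to inspect separately.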
Note:
Nested lists cannot be handled with this regex. If that is a requirement, a parser is a better fit.
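As a rough illustration of the parser approach, here is a minimal line-by-line sketch. The method name nested_list_to_html is hypothetical, only "*" (unordered) items are handled, and it assumes that deeper nesting always means more leading spaces and that indentation is consistent. It is not a full DokuWiki parser (no multi-line <code> items, no ordered lists).

```ruby
# Hypothetical helper: converts "*" list lines to nested <ul>/<li> HTML.
# Assumes items are indented with at least two spaces and that deeper
# nesting always means more leading spaces.
def nested_list_to_html(text)
  html = +""
  stack = [] # indent widths of the <ul> levels that are currently open

  # Closes every open item and list, e.g. when a non-list line is reached.
  close_all = lambda do
    next if stack.empty?
    html << "</li></ul>" * stack.size << "\n"
    stack.clear
  end

  text.each_line do |raw|
    line = raw.chomp
    m = line.match(/\A( {2,})\* ?(.*)\z/)
    if m
      indent = m[1].length
      if stack.empty? || indent > stack.last
        html << "<ul>"         # deeper (or first) level: open a new list
        stack.push(indent)
      else
        while stack.size > 1 && indent < stack.last
          html << "</li></ul>" # shallower: close the inner lists
          stack.pop
        end
        html << "</li>"        # close the previous item at this level
      end
      html << "<li>" << m[2]
    else
      close_all.call
      html << line << "\n"
    end
  end
  close_all.call
  html
end

puts nested_list_to_html("  * one\n    * two\n    * three\n  * four\n")
```

The stack of indent widths is what a single regex cannot express: each new line is compared against the levels that are still open, so inner lists are closed before the outer item continues.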

Related

Extract last word using Xpath 1.0

I need to select only the last word using xpath 1.0. I have something like this:
<Example>
<Ctry> Portugal PT </Ctry>
</Example>
I want to select only the word PT. The content before it varies, e.g. <Ctry> Portugal - Lisbon - PT </Ctry>, but the word I want to extract is always the last one.
I've already tried:
//*[name()='Example'][substring(., string-length(.) - string-length('PT')+1) = 'PT']/text()
but it always extracts the whole string.
Can anyone help me please?
You're selecting a node using the substring as a predicate to filter out other nodes. If you want the substring to be your output, it shouldn't go inside brackets.
substring(//*[name()='Example'], string-length(//*[name()='Example']) - string-length('PT')+1)
Note that /text() can be omitted when working with string functions.

How to use wildcard with a variable?

I need to evaluate the output to see if it starts with a specific sequence.
For example if Cat1 = (A)
I want to verify that the entry begins with the value of Cat1 and can contain any text after it. If so then to output that entry.
I don't exactly know how to use wildcards in conjunction with the variable to allow entries such as
(A) First assignment
(A) Second assignment
to be selected and then to be transferred.
The portion that is in question is the following in my code:
if(assign.title == ){
SpreadsheetApp.openByUrl(url).getSheetByName(shet).appendRow([assign.title, marks.assignedGrade,
assign.maxPoints]);}
}
Your issue can be solved by using Regular Expressions which essentially are special text strings used to describe a search pattern.
Therefore, if you want to search for the entries which begin with (A) and appendRow() like you mentioned above, you should use the following code snippet:
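function theFunction() {
  var ss = SpreadsheetApp.openByUrl("YOUR_URL").getSheetByName("YOUR_SHEET_NAME");
  // ^ anchors the match at the start of the title; the parentheses are escaped so they match literally
  var regEx = /^\(A\)/;
  // Getting the assign & marks variables
  if (assign.title.match(regEx))
    ss.appendRow([assign.title, marks.assignedGrade, assign.maxPoints]);
}
The regular expression here is var regEx = /^\(A\)/;, which checks whether the string starts with the literal text (A); whatever follows it does not matter. If the prefix is held in a variable such as Cat1, a plain prefix test like assign.title.indexOf(Cat1) === 0 avoids having to escape regex metacharacters.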
Furthermore, I suggest you take a look at these links since they might be of help:
Syntax for Regular Expressions;
Regular Expressions Tester.

last value of sublist in freemarker

I have a long list of dates with corresponding price values.
Out of this list I need the last price of every month.
I sliced the list and tried to get the last value, but I can't.
Please help.
Here is part of the list:
Here is my code:
<#assign new_list><#list reportData.daily as list><#if list?string('M')?number=6 && list?string('Y')?number=2017>${list?string("dd/MM/yyyy")}</#if></#list></#assign>${new_list?last}
If I remove the "?last", it shows me a sublist of all dates for June 2017, but once I add "?last" I get:
For "?last" left-hand operand: Expected a sequence, but this has evaluated to a string (wrapper: f.t.SimpleScalar):
You get that error because <#assign new_list>...</#assign> captures the raw printed output, which is a string, not a sequence. If you really have to do this in the template, put an #if inside the #list that prints the current item only if the next one is in a different month. (Inside <#list xs as x>...</#list> you can peek at the next item with xs[x?index + 1].) But you aren't supposed to do such data processing in a template...

xpath query omit results with parent tag

I'm fairly new to XPath, so I'm seeking some help with a pattern to match the following. My current attempt isn't matching what I would expect.
//text()[1][contains(.,'wordToMatch') and not(self::a)]
As I'm sure you can see from the pattern above, I'm a noob.
Sample payload 1:
<p>Sample 1 wordToMatch some
random text
to not be matched followed by wordToMatch, this should work.</p>
Expected Result 1:
wordToMatch (Not the one inside of the 'a' tags but the following one)
Sample payload 2:
<p>Sample 2 wordToMatch some
random text to not be matched followed by <b>wordToMatch</b> this
should work.</p>
Expected Result 2:
wordToMatch (The one inside of the 'b' tags)
Sample payload 3:
<p>Sample 3 wordToMatch some
random text to not be matched followed by wordToMatch followed by
further occurrences of wordToMatch which should not be matched.</p>
Expected Result 3:
wordToMatch (The second occurrence of the term)
Expected results for all 3 payloads is the first occurrence of the term wordToMatch which is NOT wrapped inside of an 'a' Tag.
The end language that will implement this pattern is Java.
Please help.
It's still not clear from the question what exactly you're after; adding the exact expected output for each sample would clear things up, I think. Anyway, based on the current information, consider the following XPath, which matches any element whose inner text is exactly equal to 'wordToMatch' and which is not itself an <a> element:
//*[.='wordToMatch'][not(self::a)]
This returns the b element in the 2nd case and nothing in the other cases. If you want to relax the matching and return the text node (instead of the parent element), this will do:
//*[not(self::a)]/text()[contains(.,'wordToMatch')]
UPDATE:
In XPath 2.0 or above you can use the for construct:
for $t in //*[not(self::a)]/text()[contains(.,'wordToMatch')]
return 'wordToMatch'

XPath 2.0:reference earlier context in another part of the XPath expression

In an XPath I would like to focus on certain elements and analyse them:
...
<field>aaa</field>
...
<field>bbb</field>
...
<field>aaa (1)</field>
...
<field>aaa (2)</field>
...
<field>ccc</field>
...
<field>ddd (7)</field>
I want to find the elements whose text content (apart from a possible enumeration) is unique. In the above example that would be bbb, ccc and ddd.
The following XPath gives me the distinct values:
distinct-values(//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '('))
Now I would like to extend that and perform another XPath on all the distinct values: count how many field elements start with each of them and retrieve the ones whose count is bigger than 1.
A match could be a field whose content is equal to that particular value, or one that starts with that value and is followed by " (". The problem is that in the second part of the XPath I would have to refer to the context of that part itself and to the former context at the same time.
In the following XPath I will, instead of using "." as the context, use c_outer and c_inner:
distinct-values(//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '('))[count(//field[(c_inner = c_outer) or starts-with(c_inner, concat(c_outer, ' ('))]) > 1]
I can't use "." for both for obvious reasons. But how could I reference a particular, or the current distinct value from the outer expression within the inner expression?
Would that even be possible?
XQuery can do it, e.g.:
for $s in distinct-values(
    //field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '('))
where count(//field[(. = $s) or starts-with(., concat($s, ' ('))]) > 1
return $s
