XPath: extract element nodes before/after substring - xpath

I have an XML file which basically is a list of entry elements:
...
<entry> Lorem ipsum <name n="1">dolor</name> sit amet, <name n="2">consectetur</name> adipiscing
elit, XXXXX sed do <name n="3">eiusmod</name> tempor incididunt ut labore et dolore magna
aliqua. YYYYY Ut enim ad minim veniam, quis <name n="3">nostrud</name> exercitation ullamco
laboris nisi ut aliquip ex ea commodo consequat.
</entry>
...
Each entry element has text with an arbitrary number of name elements. The text of each entry is split in three sections by the strings XXXXX and YYYYY. I'd like to access the name elements as three distinct lists:
1. all names of section I (from start of text to XXXXX)
2. all names of section II (from XXXXX to YYYYY)
3. all names of section III (from YYYYY to end of text)
My first instinct for 1. was to split the string at XXXXX and then search the results for the name nodes, but //entry/tokenize(string(), "XXXXX") returns a list of strings, without the name nodes.
Is there an XPath expression that solves this?
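One possible direction, offered as a sketch rather than a verified answer: in XPath 1.0 the markers can be looked up among the text-node siblings of each name, evaluated relative to one entry. For example, name[following-sibling::text()[contains(., 'XXXXX')]] for section I, name[preceding-sibling::text()[contains(., 'XXXXX')] and following-sibling::text()[contains(., 'YYYYY')]] for section II, and name[preceding-sibling::text()[contains(., 'YYYYY')]] for section III. This assumes each marker occurs exactly once per entry and sits directly in the entry's own text, not inside a name. The Python below illustrates the same sectioning logic; the standard library's xml.etree.ElementTree does not support sibling axes, so it walks the children instead. The function name split_names is made up for this illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<entry> Lorem ipsum <name n="1">dolor</name> sit amet, <name n="2">consectetur</name> adipiscing
elit, XXXXX sed do <name n="3">eiusmod</name> tempor incididunt ut labore et dolore magna
aliqua. YYYYY Ut enim ad minim veniam, quis <name n="3">nostrud</name> exercitation ullamco
laboris nisi ut aliquip ex ea commodo consequat.
</entry>"""


def split_names(entry, markers=("XXXXX", "YYYYY")):
    """Group the <name> children of one <entry> by marker-delimited section.

    Assumption: markers appear in order, once each, in the entry's own text.
    """
    sections = [[] for _ in range(len(markers) + 1)]
    current = 0

    def advance(text):
        # Step to the next section for each expected marker found in this text.
        nonlocal current
        text = text or ""
        while current < len(markers) and markers[current] in text:
            text = text.split(markers[current], 1)[1]
            current += 1

    advance(entry.text)          # text before the first child element
    for child in entry:
        if child.tag == "name":
            sections[current].append(child)
        advance(child.tail)      # text between this child and the next one
    return sections


entry = ET.fromstring(SAMPLE)
for number, names in enumerate(split_names(entry), start=1):
    print(number, [n.text for n in names])
```

For the sample entry this puts dolor and consectetur in section I, eiusmod in section II and nostrud in section III.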

Related

Paginating imported XML Data

All,
I'm having a hell of a time getting this to work. I have a very basic XML structure:
<root>
<item>
<header>NEW HEADER</header>
<body>NEW BODY - Sed auctor justo et erat rutrum, nec molestie neque placerat. Quisque efficitur condimentum velit nec volutpat. Nunc sed magna vel mauris convallis sodales</body>
<footer>NEW - Footer: Donec in nibh risus. Sed placerat felis non pellentesque placerat. In non risus a elit malesuada consectetur.</footer>
</item>
<item>
<header>NEW HEADER 2</header>
<body>NEW BODY - Sed auctor justo et erat rutrum, nec molestie neque placerat. Quisque efficitur condimentum velit nec volutpat. Nunc sed magna vel mauris convallis sodales</body>
<footer>NEW - Footer: Donec in nibh risus. Sed placerat felis non pellentesque placerat. In non risus a elit malesuada consectetur.</footer>
</item>
</root>
I've created an InDesign template with tagged text-area placeholders. What I want to achieve is to create a new page for each <item> tag and populate the data appropriately. When I load my XML, it loads each <item> but it doesn't generate a new page for each one.
Any help would be appreciated.
That's because you need to understand some basic rules. Number one is that, within InDesign, XML is just about text. In your case, your template has to contain a generic set of tags and a page break character. You then ask InDesign to duplicate that set and character at every occurrence of the repeated incoming node. I wrote a blog post that talks about all those peculiarities, especially for rookies ;) : http://www.ozalto.com/en/5-errors-you-will-do-with-indesign-xml/
You'll want to take a look at the "Merge Mode" section of Adobe's Importing XML documentation here:
https://helpx.adobe.com/indesign/using/importing-xml.html
From that page:
Merge mode not only makes automated layout possible, it provides more
advanced import options, including the ability to filter incoming text
and clone elements for repeating data.
It sounds like you need the "clone elements" feature.
To get a new page for each <item>, put a page break at the end of each <item>.
Then make sure to set a "Primary Text Frame" on your master page.
https://helpx.adobe.com/indesign/using/whats-new-cs6.html#id_16192
With this set, InDesign will simply create a new page as needed.

Getting the MoreLinq MaxBy function to return more than one element

I have a situation in which I have a list of objects with an int property and I need to retrieve the 3 objects with the highest value for that property. The MoreLinq MaxBy function is very convenient to find the single highest, but is there a way I can use that to find the 3 highest? (Not necessarily of the same value). The implementation I'm currently using is to find the single highest with MaxBy, remove that object from the list and call MaxBy again, etc. and then add the objects back into the list once I've found the 3 highest. Just thinking about this implementation makes me cringe and I'd really like to find a better way.
Update: In version 3, MaxBy (including MinBy) of MoreLINQ was changed to return a sequence as opposed to a single item.
Use MoreLINQ's PartialSort or PartialSortBy. The example below uses PartialSortBy to find and print the longest 5 words in a given text:
using System;
using System.Linq;
using System.Text.RegularExpressions;
using MoreLinq; // MoreLINQ provides PartialSortBy and ToDelimitedString

var text = @"
    Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    Etiam gravida nec mauris vitae sollicitudin. Suspendisse
    malesuada urna eu mi suscipit fringilla. Donec ut ipsum
    aliquet, tincidunt mi sed, efficitur magna. Nulla sit
    amet congue diam, at posuere lectus. Praesent sit amet
    libero vehicula dui commodo gravida eget a nisi. Sed
    imperdiet arcu eget erat feugiat gravida id non est.
    Nam malesuada nibh sit amet nisl sollicitudin vestibulum.";
var words = Regex.Matches(text, @"\w+");
// Group the words by length, then keep only the 5 longest lengths.
var top =
    from g in words.Cast<Match>()
                   .Select(m => m.Value)
                   .GroupBy(s => s.Length)
                   .PartialSortBy(5, m => m.Key, OrderByDirection.Descending)
    select g.Key + ": " + g.Distinct().ToDelimitedString(", ");
foreach (var e in top)
    Console.WriteLine(e);
It will print:
12: sollicitudin
11: consectetur, Suspendisse
10: adipiscing, vestibulum
9: malesuada, fringilla, tincidunt, efficitur, imperdiet
8: suscipit, Praesent, vehicula
In this case, you could simply do
yourResult.OrderByDescending(m => m.YourIntProperty)
          .Take(3);
This retrieves 3 objects.
So if 4 objects share the same value (which is the max), one of them will be skipped. Not sure if that's what you want, or if it's not a problem...
But MaxBy will also retrieve only one element if several elements share the same "max value".
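The tie behaviour described above can be seen in a small sketch, written here in Python rather than C# purely as an illustration. heapq.nlargest plays the role of OrderByDescending(...).Take(3); the Item class, its score field and the sample data are made up for the example.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    score: int  # stands in for the int property from the question


items = [Item("a", 7), Item("b", 9), Item("c", 9),
         Item("d", 3), Item("e", 9), Item("f", 9)]

# Exactly three items with the highest scores; a fourth item tied at 9 is dropped.
top3 = heapq.nlargest(3, items, key=lambda item: item.score)

# Variant that keeps every item tied with the third-highest score.
cutoff = top3[-1].score
top3_with_ties = [item for item in items if item.score >= cutoff]

print([item.name for item in top3])
print([item.name for item in top3_with_ties])
```

Neither version modifies the original list, so nothing has to be removed and added back.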

how to extract file names with .txt from a text?

I have a text like below,
Lorem ipsum dolor sit amet, consectetur sample1.txt adipiscing elit. Morbi nec urna non ante varius semper eget vitae ipsum. Pellentesque habitant sample2.txt morbi tristique senectus et netus et malesuada fames.
I have sample1.txt and sample2.txt in the above text. The names will vary; sample1 and sample2 are just examples. I just need to fetch the file names using C#.
Can anyone please help me with this?
Since you tagged it LINQ:
var filesnames = text.Split(new char[] { }) // split on whitespace into words
                     .Where(word => word.EndsWith(".txt"));
Try something like this
var filesnames = text.Split(' ')
                     .Where(o => o.EndsWith(".txt"))
                     // strips the extension (sample1.txt -> sample1); drop this Select to keep the full name
                     .Select(o => o.Substring(0, o.LastIndexOf('.')))
                     .ToList();
It may be possible with a regular expression if there's a good way to capture what your file names will look like. I'm assuming here it's always blah.txt with alphanumeric characters:
var matches = Regex.Matches(input, @"\b[a-zA-Z0-9]+\.txt\b");
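To see what that pattern picks up on the sample text, here is the same regular expression run in Python instead of C#, as an illustration only. The variable names are made up; the pattern is the one shown above.

```python
import re

text = ("Lorem ipsum dolor sit amet, consectetur sample1.txt adipiscing elit. "
        "Morbi nec urna non ante varius semper eget vitae ipsum. Pellentesque "
        "habitant sample2.txt morbi tristique senectus et netus et malesuada fames.")

# A run of ASCII letters/digits followed by a literal ".txt", on word boundaries.
TXT_NAME = re.compile(r"\b[a-zA-Z0-9]+\.txt\b")

file_names = TXT_NAME.findall(text)
print(file_names)  # ['sample1.txt', 'sample2.txt']
```

Note that the character class excludes hyphens and underscores, so a name like my-file.txt would only be matched as file.txt; widen the class if such names can occur.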

How to highlight multiple selections?

For example, I have some text in ace-editor and a list of ranges (rows and columns) in the text where highlighting should happen. Like this (they're bolded):
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Nam cursus.
Morbi ut mi. Nullam enim leo, egestas id, condimentum at, laoreet mattis,
massa. Sed eleifend nonummy diam. Praesent mauris ante, elementum et,
bibendum at, posuere sit amet, nibh.
How to highlight these words by using ace-editor API?
How to highlight multiple lines?
Finally I've got the answer.
Highlight the word:
var Range = ace.require("ace/range").Range; // Range is not a global; load it from Ace first
var range = new Range(rowStart, columnStart, rowEnd, columnEnd);
var marker = editor.getSession().addMarker(range, "ace_selected_word", "text");
Remove the highlighted word:
editor.getSession().removeMarker(marker);
Highlight the line:
editor.getSession().addMarker(range,"ace_active_line","background");

Why are different delimiters used in percent notation?

I have seen different people use different types of braces/brackets for this. I tried them out in script console, and they all work. Why do they all work and does it matter which is used?
%w|one two|
%w{one two}
%w[one two]
%w(one two)
Actually, a much wider variety of characters can be used. Any non-alphanumeric character except = can be used.
%w!a!
%w@b@
%w#c#
%w$d$
%w%e%
%w^f^
%w&g&
%w*h*
%w(i)
%w_j_
%w-k-
%w+l+
%w\m\
%w|n|
%w`o`
%w~p~
%w[q]
%w{r}
%w;s;
%w:t:
%w'u'
%w"v"
%w,w,
%w<x>
%w.y.
%w/z/
%w?aa?
No difference. The reason for the flexibility is so that you can pick delimiters that won't appear within your %w() string.
You get to choose your own delimiter. Pick the one that saves you from having to escape characters.
There is no difference, just a personal preference for multiline strings. Some people like to use...
<<eos
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud.
eos
As long as the beginning and end are the same then you are fine.

Resources