Iterate over Umbraco getAllTagsInGroup result - xpath

I'm trying to get a list of tags from a particular tag group in Umbraco (v4.0.2.1) using the following code:
var tags = umbraco.editorControls.tags.library.getAllTagsInGroup("document downloads");
What I want to do is just output a list of those tags. However, if I output the variable 'tags' it just outputs a list of all tags in a string. I want to split each tag onto a new line.
When I check the datatype of the 'tags' variable:
string tagType = tags.GetType().ToString();
...it outputs MS.Internal.Xml.XPath.XPathSelectionIterator.
So the question is: how do I get the individual tags out of the 'tags' variable? How do I work with a variable of this data type? I can find examples of how to do it by loading an actual XML file, but I don't have an actual XML file, just the 'tags' variable to work with.
Thanks very much for any help!
EDIT1: I guess what I'm asking is, how do I loop through the nodes returned by an XPathSelectionIterator data type?
EDIT2: I've found this code, which almost does what I need:
XPathDocument document = new XPathDocument("file.xml");
XPathNavigator navigator = document.CreateNavigator();
XPathNodeIterator nodes = navigator.Select("/tags/tag");
nodes.MoveNext();
XPathNavigator nodesNavigator = nodes.Current;
XPathNodeIterator nodesText = nodesNavigator.SelectDescendants(XPathNodeType.Text, false);
while (nodesText.MoveNext())
debugString += nodesText.Current.Value.ToString();
...but it expects the URL of an actual XML file to load into the first line. My XML file is essentially the 'tags' variable, not an actual XML file. So when I replace:
XPathDocument document = new XPathDocument("file.xml");
...with:
XPathDocument document = new XPathDocument(tags);
...it just errors.

Since it is an Iterator, I would suggest you iterate it. ;-)
var tags = umbraco.editorControls.tags.library.getAllTagsInGroup("document downloads");
foreach (XPathNavigator tag in tags) {
// handle current tag
}
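For comparison outside Umbraco, here is a minimal Java sketch (javax.xml.xpath, not the Umbraco or .NET API) of the same idea: select a node-set with XPath, then visit each node in turn and print one tag per line. The `<tags><tag>…</tag></tags>` shape and the tag values are hypothetical sample data; the real Umbraco output may carry extra attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagIteration {
    // Hypothetical XML shaped like the /tags/tag structure assumed in this thread.
    static final String SAMPLE =
        "<tags><tag>brochure</tag><tag>manual</tag><tag>price list</tag></tags>";

    // Selects the nodes matching an XPath path and returns each node's text, in document order.
    static List<String> tagValues(String xml, String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            // Step through the node-set one node at a time, like foreach over the iterator.
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + path, e);
        }
    }

    public static void main(String[] args) {
        // Print one tag per line.
        for (String tag : tagValues(SAMPLE, "/tags/tag")) {
            System.out.println(tag);
        }
    }
}
```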

I think this does the trick a little better.
The problem is that getAllTagsInGroup returns the container for all tags; you need to get its children.
foreach( var tag in umbraco.editorControls.tags.library.getAllTagsInGroup("category").Current.Select("/tags/tag") )
{
/// Your Code
}
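A small Java sketch (javax.xml.xpath, used only as an illustration) of why printing the result gave one long string: selecting the container yields a single node whose string value is all of its descendant text run together, while selecting its children yields one node per tag. The sample XML is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContainerVsChildren {
    // Hypothetical tag XML; the real Umbraco output may differ.
    static final String SAMPLE = "<tags><tag>brochure</tag><tag>manual</tag></tags>";

    private static final Document DOC = parse(SAMPLE);
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes an XPath path selects.
    static int count(String path) {
        try {
            return ((NodeList) XPATH.evaluate(path, DOC, XPathConstants.NODESET)).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // String value of the first selected node: all of its descendant text, concatenated.
    static String text(String path) {
        try {
            return XPATH.evaluate(path, DOC);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/tags"));     // 1: just the container
        System.out.println(count("/tags/tag")); // 2: one node per tag
        System.out.println(text("/tags"));      // brochuremanual: the tags run together
    }
}
```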

Related

how to get value of a tag that has no class or id in html agility pack?

I am trying to get the text value of this <a> tag:
<a href="item?id=22513425">67 comments</a>
So I'm trying to get '67' from this. However, there are no defining classes or IDs.
I've managed to get this far:
IEnumerable<HtmlNode> commentsNode = htmlDoc.DocumentNode.Descendants(0).Where(n => n.HasClass("subtext"));
var storyComments = commentsNode.Select(n =>
n.SelectSingleNode("//a[3]")).ToList();
Annoyingly, this only gives me "comments".
I can't use the href ID, as there are many of these items, so I can't hardcode the href.
How can I extract the number as well?
Just use the @href attribute and a dedicated string function:
substring-before(//a[@href="item?id=22513425"],"comments")
returns 67 (followed by the space before "comments"; wrap the call in normalize-space() to trim it).
EDIT: Since you can't hardcode the whole @href value, you can use starts-with() instead. XPath 1.0 solution.
Shortest form (the text has to contain "comments"):
substring-before(//a[starts-with(@href,"item?") and text()[contains(.,"comments")]],"c")
More restrictive (the text has to end with "comments"):
substring-before(//a[starts-with(@href,"item?")][substring(., string-length(.) - string-length('comments') + 1) = 'comments'],"c")
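To check the two starts-with() expressions outside a browser, here is a Java sketch using javax.xml.xpath on a hypothetical, well-formed stand-in for the scraped markup (the real page may need an HTML parser first). Both expressions return strings rather than nodes, so in .NET they would need an evaluate-style call rather than SelectSingleNode; normalize-space() is added here to trim the trailing space.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CommentCount {
    // Hypothetical, well-formed stand-in for the scraped markup.
    static final String SAMPLE =
        "<html><body><span class=\"subtext\">"
        + "<a href=\"user?id=someone\">someone</a> | "
        + "<a href=\"item?id=22513425\">hide</a> | "
        + "<a href=\"item?id=22513425\">67 comments</a>"
        + "</span></body></html>";

    // "Shortest form": the link text must contain "comments".
    static final String CONTAINS_EXPR =
        "normalize-space(substring-before("
        + "//a[starts-with(@href,'item?') and text()[contains(.,'comments')]], 'c'))";

    // "More restrictive": the link text must end with "comments".
    static final String ENDS_WITH_EXPR =
        "normalize-space(substring-before("
        + "//a[starts-with(@href,'item?')]"
        + "[substring(., string-length(.) - string-length('comments') + 1) = 'comments'], 'c'))";

    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // A string-valued XPath 1.0 expression comes back as a plain Java String.
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(SAMPLE, CONTAINS_EXPR));  // 67
        System.out.println(evaluate(SAMPLE, ENDS_WITH_EXPR)); // 67
    }
}
```

The "hide" link passes the starts-with() test but is rejected by the text condition, which is why the href prefix alone is not enough.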
I am using the ScrapySharp NuGet package, which adds the CssSelect() extension method used in my sample below. (It's possible HtmlAgilityPack offers the same functionality built in; I am just used to ScrapySharp from years ago.)
using System;
using System.Linq;
using HtmlAgilityPack;
using ScrapySharp.Extensions; // provides CssSelect()

var doc = new HtmlDocument();
doc.Load(@"C:\desktop\anchor.html"); // I created an HTML file with your <a> element as the body
var anchor = doc.DocumentNode.CssSelect("a").FirstOrDefault();
if (anchor == null) return;
// Keep only the digit characters of the anchor text, e.g. "67 comments" -> "67"
var digits = new string(anchor.InnerText.Where(char.IsDigit).ToArray());
Console.WriteLine($"anchor text: {anchor.InnerText} - digits only: {digits}");
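The digits-only idea does not depend on ScrapySharp; a Java sketch of just the filtering step, with a hypothetical anchor text, looks like this. Note the trade-off: separators are dropped too, so "1,234 comments" becomes "1234", and Character.isDigit also accepts non-ASCII digits.

```java
class DigitsOnly {
    // Keeps only the digit characters, e.g. "67 comments" -> "67".
    static String digits(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints()
            .filter(Character::isDigit)
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String innerText = "67 comments"; // hypothetical anchor text
        System.out.println("anchor text: " + innerText + " - digits only: " + digits(innerText));
    }
}
```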

JMeter Array of variables to text file

I am running a query via a JDBC Request and I am able to get the data and place it in a variable array. The problem is that I want the values of the variables saved to a text file. However, each variable gets a unique number appended to it, i.e. SCORED_1, SCORED_2, SCORED_3, etc. I am using a BeanShell PostProcessor to write to the text file, but it only works if I specify a single line number. How can I get all results from a SQL query and dump them into a single variable, without the values wrapped in brackets, with each value on its own row?
import org.apache.jmeter.services.FileServer;
// get variables from regular expression extractor
ClaimId = vars.get("SCORED_9"); // I want to just use the SCORED variable to contain all values from the array without "{[" characters.
// pass true if you want to append to the existing file
// if you want to overwrite, then don't pass the second argument
FileWriter fstream = new FileWriter("C:/JMeter/apache-jmeter-4.0/bin/FBCS_Verify_Final/Comp.txt", true);
BufferedWriter out = new BufferedWriter(fstream);
out.write(ClaimId);
out.write(System.getProperty("line.separator"));
out.close();
fstream.close();
We are not telepathic enough to come up with the solution without seeing your query output and the result file format.
However, I'm under the impression that you're going in the wrong direction. Given you're talking about {[ characters, it appears that you're using the Result Variable Name field,
which returns an ArrayList that has to be treated differently.
If you switch to the Variable Names field instead,
JMeter will generate a separate variable for each result set row, which should be much easier to work with and eventually concatenate.
More information:
JDBC Request
Debugging JDBC Sampler Results in JMeter
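With Variable Names set to SCORED, JMeter stores the row count in SCORED_# and the rows in SCORED_1 … SCORED_n. A plain-Java sketch of concatenating them into one value follows; the Map is a hypothetical stand-in for JMeter's `vars` object (in a PostProcessor you would call `vars.get(...)` and could store the result with `vars.put(...)`), and the row values are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

class NumberedVariables {
    // Joins PREFIX_1 .. PREFIX_n, where PREFIX_# holds the row count n.
    static String joinRows(Map<String, String> vars, String prefix, String separator) {
        String countValue = vars.get(prefix + "_#");
        if (countValue == null) {
            return ""; // nothing was stored under this variable name
        }
        int count = Integer.parseInt(countValue);
        StringJoiner joined = new StringJoiner(separator);
        for (int i = 1; i <= count; i++) {
            joined.add(vars.get(prefix + "_" + i));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>(); // hypothetical sample rows
        vars.put("SCORED_#", "3");
        vars.put("SCORED_1", "1001");
        vars.put("SCORED_2", "1002");
        vars.put("SCORED_3", "1003");
        // One value per row, no {[ ]} characters.
        System.out.println(joinRows(vars, "SCORED", System.lineSeparator()));
    }
}
```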
JDBC Request > enter a Variable Name > Store as String > add a BeanShell PostProcessor with the following script:
// Overwrite the output file (pass true instead of false to append)
FileWriter fstream = new FileWriter("C:/JMeter/apache-jmeter-4.0/bin/FBCS_Verify_Final/Comp.txt", false);
BufferedWriter out = new BufferedWriter(fstream);
// SCORED_# holds the number of rows returned by the JDBC Request
int counter = Integer.parseInt(vars.get("SCORED_#"));
for (int i = 1; i <= counter; i++)
{
    // SCORED_1 .. SCORED_n hold one value per row
    String claimId = vars.get("SCORED_" + i);
    out.write(claimId);
    out.write(System.getProperty("line.separator"));
}
out.flush();
out.close(); // closing the BufferedWriter also closes the underlying FileWriter
}
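The same write loop in plain Java, as a sketch: try-with-resources closes the file even if a write fails. Assumptions to note: this writes UTF-8 (FileWriter uses the platform default charset), and older BeanShell interpreters may not accept try-with-resources, so this form suits a JSR223 element or standalone code better. The path and rows in `main` are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class RowWriter {
    // Writes one row per line, overwriting the file: the default open options
    // (CREATE, TRUNCATE_EXISTING, WRITE) match new FileWriter(path, false).
    static void writeRows(Path file, List<String> rows) {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String row : rows) {
                out.write(row);
                out.newLine(); // platform line separator, like System.getProperty("line.separator")
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical output path and rows.
        writeRows(Paths.get("Comp.txt"), Arrays.asList("1001", "1002", "1003"));
    }
}
```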

Read image IPTC data

I'm having some trouble reading the IPTC data of some images. I want to do this because my client already has all the keywords in the IPTC data and doesn't want to re-enter them on the site.
So I created this simple script to read them out:
$size = getimagesize($image, $info);
if(isset($info['APP13'])) {
$iptc = iptcparse($info['APP13']);
print '<pre>';
var_dump($iptc['2#025']);
print '</pre>';
}
This works perfectly in most cases, but it has trouble with some images:
Notice: Undefined index: 2#025
even though I can clearly see the keywords in Photoshop.
Are there any decent small libraries that could read the keywords in every image? Or am I doing something wrong here?
I've seen a lot of weird IPTC problems. It could be that you have two APP13 segments: for some reason, some JPEGs have multiple IPTC blocks, possibly from using several photo-editing programs or from manual file manipulation.
PHP could be trying to read an empty APP13 segment, or even embedded "thumbnail metadata".
It could also be a problem with segment lengths: APP13 and 8BIM blocks have length marker bytes that might hold wrong values.
Try a hex editor and check the file "manually".
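Before reaching for a hex editor, a short program can list where the APP13 (0xFFED) segments are. This Java sketch is a simplified JPEG marker walker: it assumes well-formed segments (marker, then a big-endian length that counts its own two bytes) and stops at the start of the image data; it does not handle fill bytes or standalone markers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class JpegSegments {
    // Returns the byte offset of every APP13 (0xFFED) segment before the image data.
    static List<Integer> app13Offsets(byte[] jpeg) {
        if (jpeg.length < 2 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("not a JPEG: missing SOI marker");
        }
        List<Integer> offsets = new ArrayList<>();
        int pos = 2; // first segment follows the SOI marker
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                break; // not at a marker: give up rather than guess
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) {
                break; // start of scan or end of image: no more metadata segments
            }
            // Big-endian length that counts its own two bytes but not the marker.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            if (length < 2) {
                break; // corrupt length marker
            }
            if (marker == 0xED) {
                offsets.add(pos);
            }
            pos += 2 + length;
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        byte[] jpeg = Files.readAllBytes(Paths.get(args[0])); // path to a problem image
        System.out.println("APP13 segments at offsets " + app13Offsets(jpeg));
    }
}
```

More than one offset suggests the duplicate-block case described above.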
I have found that IPTC is almost always embedded as XML using the XMP format, and is often not in the APP13 slot. You can sometimes get the IPTC info by using iptcparse($info['APP1']), but the most reliable way to get it without a third-party library is to simply search through the image file for the relevant XML string (I got this from another answer, but I haven't been able to find it, otherwise I would link it!):
The xml for the keywords always has the form "<dc:subject>...<rdf:Seq><rdf:li>Keyword 1</rdf:li><rdf:li>Keyword 2</rdf:li>...<rdf:li>Keyword N</rdf:li></rdf:Seq>...</dc:subject>"
So you can just get the file as a string using file_get_contents(get_attached_file($attachment_id)), use strpos() to find each opening (<rdf:li>) and closing (</rdf:li>) XML tag, and grab the keyword between them using substr().
The following snippet works for all jpegs I have tested it on. It will fill the array $keys with IPTC tags taken from an image on wordpress with id $attachment_id:
$content = file_get_contents(get_attached_file($attachment_id));
// Initialize empty array to store keywords
$keys = array();
// Look for XMP data: the XML tag "dc:subject" is where keywords are stored
$subject_pos = strpos($content, '<dc:subject>');
$subject_end = ($subject_pos === false) ? false : strpos($content, '</dc:subject>', $subject_pos);
// Only proceed if able to find both dc:subject tags
// (strpos() returns false when not found, so compare strictly with !==)
if ($subject_end !== false) {
    $xmp_data_start = $subject_pos + strlen('<dc:subject>');
    $xmp_data = substr($content, $xmp_data_start, $subject_end - $xmp_data_start);
    // Look for tag "rdf:Seq" where individual keywords are listed
    $seq_pos = strpos($xmp_data, '<rdf:Seq>');
    $seq_end = ($seq_pos === false) ? false : strpos($xmp_data, '</rdf:Seq>', $seq_pos);
    // Only proceed if able to find both rdf:Seq tags
    if ($seq_end !== false) {
        $key_data_start = $seq_pos + strlen('<rdf:Seq>');
        $key_data = substr($xmp_data, $key_data_start, $seq_end - $key_data_start);
        // $ctr will track position of each <rdf:li> tag, starting with first
        $ctr = strpos($key_data, '<rdf:li>');
        // While loop stores each keyword and searches for next xml keyword tag
        // (!== false matters here: a tag at position 0 is valid)
        while ($ctr !== false) {
            // Skip past the tag to get the keyword itself
            $key_begin = $ctr + strlen('<rdf:li>');
            // Keyword ends where closing tag begins
            $key_end = strpos($key_data, '</rdf:li>', $key_begin);
            // Make sure keyword has a closing tag
            if ($key_end === false) break;
            // Make sure keyword is not too long (not sure what WP can handle)
            $key_length = min($key_end - $key_begin, 100);
            // Add keyword to keyword array
            $keys[] = substr($key_data, $key_begin, $key_length);
            // Find next keyword open tag
            $ctr = strpos($key_data, '<rdf:li>', $key_end);
        }
    }
}
I have this implemented in a plugin to put IPTC keywords into WP's "Description" field, which you can find here.
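The same search-between-tags technique, sketched in Java with String.indexOf/substring in place of strpos/substr. One deliberate difference: this version looks for <rdf:li> items directly inside dc:subject instead of requiring <rdf:Seq>, because the XMP specification declares dc:subject as an unordered bag, so some files may use <rdf:Bag>. XML entities such as &amp; are not decoded; the sample string in the test is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class XmpKeywords {
    private static final String LI_OPEN = "<rdf:li>";
    private static final String LI_CLOSE = "</rdf:li>";

    // Returns the text of every <rdf:li> inside the first <dc:subject> block.
    static List<String> keywords(String content) {
        List<String> keys = new ArrayList<>();
        int subjectStart = content.indexOf("<dc:subject>");
        if (subjectStart < 0) {
            return keys; // no XMP keywords in this file
        }
        int subjectEnd = content.indexOf("</dc:subject>", subjectStart);
        if (subjectEnd < 0) {
            return keys; // unterminated block
        }
        // Searching inside dc:subject directly works for both rdf:Bag and rdf:Seq.
        String subject = content.substring(subjectStart, subjectEnd);
        int open = subject.indexOf(LI_OPEN);
        while (open >= 0) {
            int begin = open + LI_OPEN.length();
            int end = subject.indexOf(LI_CLOSE, begin);
            if (end < 0) {
                break; // keyword without a closing tag
            }
            keys.add(subject.substring(begin, end));
            open = subject.indexOf(LI_OPEN, end);
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        // XMP is UTF-8; binary parts decode to junk, but the ASCII tags are still found.
        String content = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(keywords(content));
    }
}
```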
ExifTool is very robust, if you can shell out to it (from PHP, it looks like?).

How to get attribute names of element in xml by LINQ in C#

I have this XML element:
<.SECTIONS>
<.SECTION ID ="1" NAME="System Health" CONTROL-TYPE="Button" LINK="http://www.google.co.in/">
<.DATAITEMS>
<./DATAITEMS>
<./SECTION>
<./SECTIONS>
I want to get all the attribute names of the SECTION element (ID, NAME, CONTROL-TYPE, LINK) on the server side using LINQ to XML in C#. What query do I have to write?
As @Giu mentions, your XML is technically malformed because of the . preceding each element name.
To get the names of the attributes available in SECTION:
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

string xmlData = "<SECTIONS> <SECTION ID =\"1\" NAME=\"System Health\" CONTROL-TYPE=\"Button\" LINK=\"http://www.google.co.in/\"> <DATAITEMS> </DATAITEMS> </SECTION> </SECTIONS>";
XDocument doc = XDocument.Parse( xmlData );
//The above line could also be XDocument.Load( fileName ) if you wanted a file
IEnumerable<string> strings = doc.Descendants("SECTIONS")
.Descendants("SECTION")
.Attributes()
.Select( a => a.Name.LocalName );
This will give you an enumerable containing ID, NAME, CONTROL-TYPE, and LINK.
However, if you want the values contained in them, I would use @Giu's answer.
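For readers outside .NET, the same question (which attribute names does SECTION carry?) can be answered with the standard Java DOM API; this is an illustrative sketch, not LINQ to XML. XML does not treat attribute order as significant and the DOM does not promise to preserve it, so compare the result as a set.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeNames {
    // The sanitized XML from the question, without the leading dots.
    static final String SAMPLE = "<SECTIONS><SECTION ID=\"1\" NAME=\"System Health\""
        + " CONTROL-TYPE=\"Button\" LINK=\"http://www.google.co.in/\">"
        + "<DATAITEMS></DATAITEMS></SECTION></SECTIONS>";

    // Collects the attribute names of every element with the given tag name.
    static List<String> attributeNames(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList elements = doc.getElementsByTagName(elementName);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < elements.getLength(); i++) {
                NamedNodeMap attributes = elements.item(i).getAttributes();
                for (int j = 0; j < attributes.getLength(); j++) {
                    names.add(attributes.item(j).getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attributeNames(SAMPLE, "SECTION"));
    }
}
```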
Your XML looks a little bit malformed due to the . before each tag name; I therefore sanitized your XML code by removing the .s, and made a solution based on the following XML code:
<SECTIONS>
<SECTION ID ="1" NAME="System Health" CONTROL-TYPE="Button" LINK="http://www.google.co.in/">
<DATAITEMS> </DATAITEMS>
</SECTION>
</SECTIONS>
Thanks to the sanitized XML code, you now can use the following code snippet to achieve what you want (don't forget the using directive using System.Xml.Linq;):
XDocument doc = XDocument.Parse("<SECTIONS><SECTION ID =\"1\" NAME=\"System Health\" CONTROL-TYPE=\"Button\" LINK=\"http://www.google.co.in/\"><DATAITEMS></DATAITEMS></SECTION></SECTIONS>");
var query = from item in doc.Descendants("SECTIONS").Descendants("SECTION")
select new {
Name = (string)item.Attribute("NAME"),
Id = (string)item.Attribute("ID"),
ControlType = (string)item.Attribute("CONTROL-TYPE"),
Link = (string)item.Attribute("LINK")
};
(Sidenote: You can load your XML code directly from a file (e.g. file.xml), too, by defining the doc variable as follows:
XDocument doc = XDocument.Load(@"C:\Path\To\file.xml");
)
The following code will print the value of each attribute:
foreach (var elem in query)
System.Console.WriteLine(string.Format("{0}, {1}, {2}, {3}", elem.Id, elem.Name, elem.ControlType, elem.Link));
Console output:
1, System Health, Button, http://www.google.co.in/

Image tag not closing with HTMLAgilityPack

When I use HTMLAgilityPack to write out a new image node, it seems to remove the closing slash of the image tag: the node should come out as <img ... />, but when you check OuterHtml, it has <img ...>.
string strIMG = "<img src='" + imgPath + "' height='" + pubImg.Height + "px' width='" + pubImg.Width + "px' />";
HtmlNode newNode = HtmlNode.Create(strIMG);
This breaks XHTML.
Telling it to output XML as Micky suggests works, but if you have other reasons not to want XML, try this:
doc.OptionWriteEmptyNodes = true;
Edit 1: Here is how to fix an HTML Agility Pack document to correctly display image (img) tags:
if (HtmlNode.ElementsFlags.ContainsKey("img"))
{ HtmlNode.ElementsFlags["img"] = HtmlElementFlag.Closed;}
else
{ HtmlNode.ElementsFlags.Add("img", HtmlElementFlag.Closed);}
Replace "img" with any other tag to fix it as well (input, select, and option come up frequently). Repeat as needed. Keep in mind that this will produce <img></img> rather than <img />, because of the HAP bug preventing the "closed" and "empty" flags from being set simultaneously.
Source: Mike Bridge
Original answer:
Having just labored over solutions to this issue, and not finding any sufficient answers (doctype set properly, using Output as XML, Check Syntax, AutoCloseOnEnd, and Write Empty Node options), I was able to solve this with a dirty hack.
This will certainly not solve the issue outright for everyone, but for anyone returning their generated HTML/XML as a string (e.g. via a web service), the simple solution is to use fake tags that the agility pack doesn't know how to break.
Once you have finished doing everything you need to do on your document, call the following method once for each tag giving you a headache (notable examples being option, input, and img). Immediately after, render your final string, do a simple replace for each tag prefixed with some string (in this case "fix_"), and return your string.
This is only marginally better, in my opinion, than the regex solution proposed in another question I cannot locate at the moment.
private void fixHAPUnclosedTags(HtmlDocument doc, string tagName, bool hasInnerText = false)
{
    // SelectNodes returns null (not an empty collection) when nothing matches
    var tags = doc.DocumentNode.SelectNodes("//" + tagName);
    if (tags == null) return;
    foreach (var tag in tags)
    {
        // Prefixed element that HAP applies no special closing rules to
        HtmlNode tagReplacement = HtmlNode.CreateNode("<fix_" + tagName + "></fix_" + tagName + ">");
        foreach (var attr in tag.Attributes)
        {
            tagReplacement.SetAttributeValue(attr.Name, attr.Value);
        }
        if (hasInnerText && tag.NextSibling != null) // for option tags and other non-empty nodes, the next (text) node will be its inner HTML
        {
            tagReplacement.InnerHtml = tag.InnerHtml + tag.NextSibling.InnerHtml;
            tag.NextSibling.Remove();
        }
        tag.ParentNode.ReplaceChild(tagReplacement, tag);
    }
}
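The final "simple replace" step on the rendered string is not shown in code, so here is one possible version, sketched in Java for illustration (the hypothetical restore helper is not part of HAP). It assumes the lower-case fix_ prefix used in fixHAPUnclosedTags and renames each prefixed pair back, giving an explicit <img ...></img> pair, which is well-formed XHTML.

```java
class FixTagRestorer {
    // Must match the prefix used in fixHAPUnclosedTags.
    static final String PREFIX = "fix_";

    // Renames <fix_tag ...>...</fix_tag> back to <tag ...>...</tag> for each tag name given.
    static String restore(String html, String... tagNames) {
        String result = html;
        for (String tag : tagNames) {
            result = result
                .replace("<" + PREFIX + tag, "<" + tag)
                .replace("</" + PREFIX + tag + ">", "</" + tag + ">");
        }
        return result;
    }

    public static void main(String[] args) {
        String rendered = "<p><fix_img src=\"a.png\"></fix_img></p>"; // hypothetical rendered output
        System.out.println(restore(rendered, "img", "input", "option"));
    }
}
```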
As a note, if I were a betting man I would guess that Mike Bridge's answer above inadvertently identifies the source of this bug in the pack: something is causing the closed and empty flags to be mutually exclusive.
Additionally, after a bit more digging, I don't appear to be the only one who has taken this approach:
HtmlAgilityPack Drops Option End Tags
Furthermore, in cases where you ONLY need non-empty elements, there is a very simple fix listed in that same question, as well as in the HAP CodePlex discussion. It essentially sets the empty flag option listed in Mike Bridge's answer above permanently, everywhere.
There is an option to turn on XML output that makes this issue go away.
var htmlDoc = new HtmlDocument();
htmlDoc.OptionOutputAsXml = true;
htmlDoc.LoadHtml(rawHtml);
This seems to be a bug with HtmlAgilityPack. There are many ways to reproduce this, for example:
Debug.WriteLine(HtmlNode.CreateNode("<img id=\"bla\"></img>").OuterHtml);
Outputs malformed HTML. Using the suggested fixes in the other answers does nothing.
HtmlDocument doc = new HtmlDocument();
doc.OptionOutputAsXml = true;
HtmlNode node = doc.CreateElement("x");
node.InnerHtml = "<img id=\"bla\"></img>";
doc.DocumentNode.AppendChild(node);
Debug.WriteLine(doc.DocumentNode.OuterHtml);
Produces malformed XML / XHTML like <x><img id="bla"></x>
I have created an issue on CodePlex for this.
