How to print double quotes in Elisp

I can't seem to insert the XML declaration in my code:
(insert "<?xml version="1.0" encoding="UTF-8" standalone="no" ?>")
It's an easy question but I can't figure it out!

(insert "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\" ?>")
This should work

http://www.gnu.org/software/emacs/manual/html_node/elisp/Syntax-for-Strings.html
To include a double-quote in a string, precede it with a backslash; thus, "\"" is a string containing just a single double-quote character.
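The escaping rule above is mechanical, so it can be sketched as a small helper. This is only an illustration in Python: the helper name `elisp_string_literal` is hypothetical, and it assumes backslash and double quote are the only characters that need escaping inside an Elisp string literal. It escapes backslashes first, then double quotes, and wraps the result in quotes:

```python
def elisp_string_literal(text: str) -> str:
    """Return `text` as an Elisp string literal (hypothetical helper).

    Backslashes are escaped first so that the backslashes added in front
    of double quotes are not themselves doubled afterwards.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


decl = '<?xml version="1.0" encoding="UTF-8" standalone="no" ?>'
# Builds the same (insert ...) form as the corrected answer.
print("(insert " + elisp_string_literal(decl) + ")")
```

The printed form matches the corrected `(insert ...)` call from the answer, with every inner double quote preceded by a backslash.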

Related

How to find the parent node by matching text using XPath

I have some XML:
<sys>
<lang>
<employee>
<name>Employee 1</name>
<code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>
</employee>
<employee>
<name>Employee 2</name>
<code>1d960bdc-0853-49af-bb83-18cf92493897</code>
</employee>
</lang>
</sys>
How can I search for and get the employee node where name = "Employee 1"?
I tried this but it didn't work:
obj.xpath("//sys/lang[/employee/name = 'Employee 1']")
This XPath
/sys/lang/employee[name = 'Employee 1']
will select the employee element whose name is Employee 1.
Why might the OP be getting an "Invalid expression" error using the above XPath?
1. Transcription error.
   Resolution: Use copy and paste.
2. Single quotes around single quotes.
   Resolution: Use outer double quotes: "/sys/lang/employee[name = 'Employee 1']"
3. Smart quotes.
   Resolution: Replace ‘ and ’ with the plain single quote '.
4. Misinterpretation of the error message.
   Resolution: Carefully check any line number mentioned in the error, or carve away as much of the surrounding code as possible and see whether the error goes away.
If none of the above possibilities apply, post an MCVE (Minimal, Complete, and Verifiable Example) that produces the "Invalid expression" error, including both the provided XPath and the calling code (that is the "complete" in MCVE), and someone will likely spot the problem immediately.
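For comparison, the same predicate can be tried in Python's standard-library `xml.etree.ElementTree`. This is a swapped-in library, not the one the asker used, and it supports only a subset of XPath, including the `[tag='text']` predicate used here. Paths are relative to the element they are called on, so the leading `/sys` becomes `.`. The outer Python string uses double quotes, which is resolution 2 above:

```python
import xml.etree.ElementTree as ET

XML = """<sys>
  <lang>
    <employee>
      <name>Employee 1</name>
      <code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>
    </employee>
    <employee>
      <name>Employee 2</name>
      <code>1d960bdc-0853-49af-bb83-18cf92493897</code>
    </employee>
  </lang>
</sys>"""

root = ET.fromstring(XML)  # root is the <sys> element itself

# Select the <employee> that has a <name> child whose text is "Employee 1".
# Double quotes outside, single quotes inside: no escaping needed.
employee = root.find("./lang/employee[name='Employee 1']")

print(employee.findtext("code"))
```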
I'm a big fan of using CSS over XPath for readability reasons. Nokogiri implements a number of jQuery's extensions to make it easier to use CSS for things we'd usually use XPath for.
I'd do it this way:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<sys>
<lang>
<employee>
<name>Employee 1</name>
<code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>
</employee>
<employee>
<name>Employee 2</name>
<code>1d960bdc-0853-49af-bb83-18cf92493897</code>
</employee>
</lang>
</sys>
EOT
emp1 = doc.at('employee name:contains("Employee 1")') # => #<Nokogiri::XML::Element:0x3ffed05285b4 name="name" children=[#<Nokogiri::XML::Text:0x3ffed05283d4 "Employee 1">]>
emp1.to_xml # => "<name>Employee 1</name>"
emp1.parent.to_xml # => "<employee>\n <name>Employee 1</name>\n <code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>\n </employee>"
Also note that it's not good practice to specify the full path to a node in the selector. If the structure of the HTML or XML changes, that selector will break. Instead, find useful landmarks and hop from one to the next; that way your selector is more likely to survive changes in the markup. I only care about finding the appropriate <employee>...<name> combination, not that those two tags are embedded under <sys> and <lang>.
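The landmark advice can be demonstrated with a quick Python sketch using the standard-library ElementTree (a stand-in for Nokogiri here). The extra `<dept>` wrapper is hypothetical and only simulates a change in the markup: the full-path selector stops matching, while the descendant (`.//`) selector anchored on `<employee>` and `<name>` still finds the node:

```python
import xml.etree.ElementTree as ET

# Same data, wrapped in a hypothetical <dept> element to simulate a
# structural change in the markup.
XML = """<sys>
  <lang>
    <dept>
      <employee>
        <name>Employee 1</name>
        <code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>
      </employee>
    </dept>
  </lang>
</sys>"""

root = ET.fromstring(XML)

# Full path from the root: breaks once <dept> is inserted.
full_path = root.find("./lang/employee[name='Employee 1']")

# Landmark-based: any <employee> descendant with the right <name> child.
landmark = root.find(".//employee[name='Employee 1']")

print(full_path)  # None
print(landmark.findtext("code"))
```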
Sometimes an alternate way of getting to the information you want is to use search and look at a particular index:
doc.search('employee').first.to_xml # => "<employee>\n <name>Employee 1</name>\n <code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>\n </employee>"
Or:
doc.at('employee').to_xml # => "<employee>\n <name>Employee 1</name>\n <code>4fdaa994-7015-4ec1-b365-de4ee0279966</code>\n </employee>"
at('some selector') is equivalent to search('some selector').first.
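Python's ElementTree has an analogous pair, shown here only as a comparison: `find()` returns the first match, like `at()`, and `findall()` returns all matches, like `search()`. One difference worth knowing is that `find()` returns `None` when nothing matches, whereas `findall(...)[0]` raises `IndexError`:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<sys><lang>"
    "<employee><name>Employee 1</name></employee>"
    "<employee><name>Employee 2</name></employee>"
    "</lang></sys>"
)

first_via_find = root.find(".//employee")           # analogous to at()
first_via_findall = root.findall(".//employee")[0]  # analogous to search().first

# Both calls hand back the very same element object.
assert first_via_find is first_via_findall
print(first_via_find.findtext("name"))  # Employee 1
```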

Get nodes from xml string using regex

I have an XML string like the one below:
<Query>
<Code>USD</Code>
<Description>United States Dollars</Description>
<UpdateTime>2013-03-04 02:27:33</UpdateTime>
<toUSD>1</toUSD>
<USDto>1</USDto>
<toEUR>2</toEUR>
<EURto>3</EURto>
</Query>
All the text is on one line, without whitespace. I can't get the regex pattern right. I want to get the nodes whose names begin with "to", for example <toEUR> and <toUSD>.
How should I write this pattern?
With Nokogiri and the XPath function starts-with():
require 'nokogiri'
doc = Nokogiri::XML <<EOF
<Query>
<Code>USD</Code>
<Description>United States Dollars</Description>
<UpdateTime>2013-03-04 02:27:33</UpdateTime>
<toUSD>1</toUSD>
<USDto>1</USDto>
<toEUR>2</toEUR>
<EURto>3</EURto>
</Query>
EOF
doc.search('//*[starts-with(name(),"to")]').map(&:to_s)
#=> ["<toUSD>1</toUSD>", "<toEUR>2</toEUR>"]
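If Nokogiri isn't available, a rough equivalent can be sketched with Python's standard-library ElementTree. This is a substitute technique: ElementTree's path language has no `starts-with()`, so the sketch tests each child's tag name in Python instead. It assumes the `to...` elements are direct children of `<Query>`, as in the sample:

```python
import xml.etree.ElementTree as ET

# The input from the question, on one line without whitespace.
XML = ("<Query><Code>USD</Code><Description>United States Dollars</Description>"
       "<UpdateTime>2013-03-04 02:27:33</UpdateTime><toUSD>1</toUSD>"
       "<USDto>1</USDto><toEUR>2</toEUR><EURto>3</EURto></Query>")

root = ET.fromstring(XML)

# Iterating over root visits its direct children in document order;
# keep those whose tag name starts with "to".
to_nodes = [child for child in root if child.tag.startswith("to")]

print([(node.tag, node.text) for node in to_nodes])
```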
Although the general consensus is that parsing XML (and similar markup) with regular expressions is not the way to go, something like this should do the trick:
<\s*(to[^>\s]+)[^>]*>([^<]+)<\s*/\s*\1\s*>
In ruby format:
/<\s*(to[^>\s]+)[^>]*>([^<]+)<\s*\/\s*\1\s*>/
This matches <toWhatever>value</toWhatever>: capture group 1 returns the name (toWhatever) and capture group 2 returns the value.
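The same pattern can be checked in Python's `re` module, copied unchanged into a raw string (Python does not need the `\/` escape that the Ruby literal uses). `findall()` returns one `(name, value)` tuple per match, and the backreference `\1` ensures the closing tag carries the same name as the opening tag:

```python
import re

# The one-line input from the question.
XML = ("<Query><Code>USD</Code><Description>United States Dollars</Description>"
       "<UpdateTime>2013-03-04 02:27:33</UpdateTime><toUSD>1</toUSD>"
       "<USDto>1</USDto><toEUR>2</toEUR><EURto>3</EURto></Query>")

# Group 1: tag name starting with "to"; group 2: text value;
# \1 requires the closing tag to repeat the captured name.
TO_NODE = re.compile(r"<\s*(to[^>\s]+)[^>]*>([^<]+)<\s*/\s*\1\s*>")

print(TO_NODE.findall(XML))
```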

XPath matches with single quotes?

How can I assert an XPath match that contains single quotes within the string to be asserted?
This is my string with value '40' to be asserted.
I tried escaping the single quote characters with \', but that does not work.
matches( //faultstring[1]/text(), 'This is my string with value \'40\' to be asserted.' )
How is this done properly?
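Try this (in XPath 2.0, a doubled apostrophe inside an apostrophe-delimited literal stands for one apostrophe):
//faultstring[matches(text(),'''')]
or
//faultstring[matches(text(),'&apos;')]
or
//faultstring[matches(text(),"'")]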
For a more elegant solution see this post
My understanding of this question is that the problem is caused by the need to use nested quotes if the XPath expression is within an XML document.
If this is the case, one can use this XPath expression:
$yourString = "This is my string with value &apos;40&apos; to be asserted."
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:copy-of select=
"/* = &quot;This is my string with value &apos;40&apos; to be asserted.&quot;"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<t>This is my string with value '40' to be asserted.</t>
the XPath expression is evaluated and the result of this evaluation is copied to the output:
true
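When the XPath expression is built in host-language code rather than written inside an XML attribute, one common technique is to pick whichever quote character the string does not contain. If it contains both, assemble it with `concat()`, because XPath 1.0 string literals have no escape syntax. This is a swapped-in technique, not necessarily the one in the linked post, and the helper name `xpath_literal` is hypothetical:

```python
def xpath_literal(s: str) -> str:
    """Quote `s` as an XPath 1.0 string literal (hypothetical helper).

    XPath 1.0 literals cannot contain their own delimiter and have no escape
    syntax, so a string containing both quote characters is built with concat().
    """
    if "'" not in s:
        return "'" + s + "'"
    if '"' not in s:
        return '"' + s + '"'
    # Split on apostrophes and rejoin the pieces with "'" literals in between.
    pieces = ", \"'\", ".join("'" + part + "'" for part in s.split("'"))
    return "concat(" + pieces + ")"


expected = "This is my string with value '40' to be asserted."
print("//faultstring[. = " + xpath_literal(expected) + "]")
```

For the string in the question, the helper simply switches to double quotes; `concat()` is only needed when both quote characters appear.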

XPath expression for selecting all text in a given node, and the text of its children

Basically I need to scrape some text that has nested tags.
Something like this:
<div id='theNode'>
This is an <span style="color:red">example</span> <b>bolded</b> text
</div>
And I want an expression that will produce this:
This is an example bolded text
I have been struggling with this for an hour or more with no result.
Any help is appreciated.
The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order.
You want to call the XPath string() function on the div element.
string(//div[@id='theNode'])
You can also use the normalize-space() function to remove unwanted whitespace that may appear due to newlines and indentation in the source document. It strips leading and trailing whitespace and replaces each sequence of whitespace characters with a single space. When you pass a node-set to normalize-space(), the node-set is first converted to its string-value. If no argument is passed, normalize-space() uses the context node.
normalize-space(//div[@id='theNode'])
If theNode were the context node, you could use this instead:
normalize-space()
You might want to use a more efficient way of selecting the context node than the example XPath I have been using. For example, the following JavaScript can be run against this page in some browsers:
var el = document.getElementById('question');
var result = document.evaluate('normalize-space()', el, null ).stringValue;
The whitespace-only text node between the span and b elements might be a problem.
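The string-value and normalize-space() ideas can be mirrored in Python's standard-library ElementTree, offered only as an analogue, not as XPath itself. `itertext()` yields all descendant text in document order, like string(). Splitting on whitespace and rejoining with single spaces approximates normalize-space(). The approximation lies in `str.split()`, which also splits on some non-XML whitespace characters. The snippet parses as XML only because this particular sample is well-formed:

```python
import xml.etree.ElementTree as ET

HTML = """<div id='theNode'>
This is an <span style="color:red">example</span> <b>bolded</b> text
</div>"""

div = ET.fromstring(HTML)  # works because this snippet is well-formed XML

# Analogue of XPath string(): concatenate descendant text in document order,
# including the whitespace-only text between </span> and <b>.
string_value = "".join(div.itertext())

# Approximation of normalize-space(): trim the ends and collapse
# internal whitespace runs to single spaces.
normalized = " ".join(string_value.split())

print(repr(string_value))
print(normalized)
```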
Use:
string(//div[@id='theNode'])
When this expression is evaluated, the result is the string value of the first (and hopefully only) div element in the document.
As the string value of an element is defined in the XPath Specification as the concatenation in document order of all of its text-node descendants, this is exactly the wanted string.
Because this can include a number of all-white-space text nodes, you may want to eliminate contiguous leading and trailing white-space and replace any such intermediate white-space by a single space character:
Use:
normalize-space(string(//div[@id='theNode']))
XSLT - based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
"<xsl:copy-of select="string(//div[@id='theNode'])"/>"
===========
"<xsl:copy-of select="normalize-space(string(//div[@id='theNode']))"/>"
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<div id='theNode'> This is an
<span style="color:red">example</span>
<b>bolded</b> text
</div>
the two XPath expressions are evaluated and the results of these evaluations are copied to the output:
" This is an
example
bolded text
"
===========
"This is an example bolded text"
If you are using Scrapy in Python, you can use descendant-or-self::*/text(). Full example:
import scrapy

txt = """<div id='theNode'>
This is an <span style="color:red">example</span> <b>bolded</b> text
</div>"""
selector = scrapy.Selector(text=txt, type="html")  # Create an HTML document from the text
# All text nodes of the div and of its descendants, in document order
all_txt = selector.xpath('//div/descendant-or-self::*/text()').getall()
final_txt = ''.join(all_txt).strip()
print(final_txt)  # This is an example bolded text
How about this:
/div/text()[1] | /div/span/text() | /div/b/text() | /div/text()[2]
I'm not sure about the last part, though; you might have to experiment with it.
Normally, use
//div[@id='theNode']
to get all the text, but if the text is split across several nodes, use
//div[@id='theNode']/text()
I'm not sure, but if you provide the link I will try.

How can I use Ruby to parse through XML easily to query and find certain tag values?

I am working with an API and want to know how I can easily search and display/format the output based on the tags.
For example, here is the page with the API and examples of the XML output:
http://developer.linkedin.com/docs/DOC-1191
I want to be able to treat each record as an object, such as User.first-name and User.last-name, so that I can display and store information and do searches.
Is there perhaps a gem that makes this easier to do?
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<people-search>
<people total="108" count="10" start="0">
<person>
<id>tePXJ3SX1o</id>
<first-name>Bill</first-name>
<last-name>Doe</last-name>
<headline>Marketing Professional and Matchmaker</headline>
<picture-url>http://media.linkedin.com:/....</picture-url>
</person>
<person>
<id>pcfBxmL_Vv</id>
<first-name>Ed</first-name>
<last-name>Harris</last-name>
<headline>Chief Executive Officer</headline>
</person>
...
</people>
<num-results>108</num-results>
</people-search>
This might give you a jump start:
#!/usr/bin/env ruby
require 'nokogiri'
XML = %{<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<people-search>
<people total="108" count="10" start="0">
<person>
<id>tePXJ3SX1o</id>
<first-name>Bill</first-name>
<last-name>Doe</last-name>
<headline>Marketing Professional and Matchmaker</headline>
<picture-url>http://media.linkedin.com:/foo.png</picture-url>
</person>
<person>
<id>pcfBxmL_Vv</id>
<first-name>Ed</first-name>
<last-name>Harris</last-name>
<headline>Chief Executive Officer</headline>
</person>
</people>
<num-results>108</num-results>
</people-search>}
doc = Nokogiri::XML(XML)
doc.search('//person').each do |person|
firstname = person.at('first-name').text
puts "firstname: #{firstname}"
end
# >> firstname: Bill
# >> firstname: Ed
The idea is you're looping over the section that repeats, "person", in this case. Then you pick out the sections you want and extract the text. I'm using Nokogiri's .at() to get the first occurrence, but there are other ways to do it.
The Nokogiri site has good examples and well-written documentation, so be sure to spend a bit of time going over it. You should find it easy going.
Nokogiri is a really nice XML parser for Ruby that lets you use XPath or CSS3 selectors to access your XML, but it's not an XML-to-object mapper.
There is a project called xml-mapping that does exactly this by defining XPath expressions that are mapped to object properties, and vice versa.
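The mapping idea (a table of paths that fill object attributes) can be sketched without any mapper library. The following Python sketch uses the standard-library ElementTree and a dataclass. It is not xml-mapping's API. The `Person` class and `FIELD_PATHS` table are hypothetical names, and the input is an abridged copy of the sample response without the XML declaration:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

XML = """<people-search>
  <people total="108" count="10" start="0">
    <person>
      <id>tePXJ3SX1o</id>
      <first-name>Bill</first-name>
      <last-name>Doe</last-name>
      <headline>Marketing Professional and Matchmaker</headline>
    </person>
    <person>
      <id>pcfBxmL_Vv</id>
      <first-name>Ed</first-name>
      <last-name>Harris</last-name>
      <headline>Chief Executive Officer</headline>
    </person>
  </people>
  <num-results>108</num-results>
</people-search>"""


@dataclass
class Person:  # hypothetical record type
    id: Optional[str]
    first_name: Optional[str]
    last_name: Optional[str]
    headline: Optional[str]


# Attribute name -> path of the child element that holds its value.
FIELD_PATHS = {
    "id": "id",
    "first_name": "first-name",
    "last_name": "last-name",
    "headline": "headline",
}


def parse_people(xml_text: str) -> List[Person]:
    root = ET.fromstring(xml_text)
    # Loop over the repeating <person> section; findtext() returns None
    # for a missing child, so absent fields become None.
    return [
        Person(**{attr: el.findtext(path) for attr, path in FIELD_PATHS.items()})
        for el in root.iter("person")
    ]


people = parse_people(XML)
print([(p.first_name, p.last_name) for p in people])
```

Adding a field means adding one entry to the path table and one attribute to the class, which is the same trade-off a declarative mapper makes.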
This is how I did it for the Ruby Challenge using the built-in REXML.
This is basically the parsing code for the whole document:
doc = REXML::Document.new File.new cia_file
doc.elements.each('cia/continent') { |e| @continents.push Continent.new(e) }
doc.elements.each('cia/country') { |e| @countries.push Country.new(self, e) }
http://nokogiri.org/ is an option you should investigate

Resources