I am learning XPath, and I am trying to get some data from HTML using XPath.
I found that Google Chrome has an option to "Copy XPath", and it works nicely,
but it doesn't work for this example:
<div id="site-main">
<header class="main" role="banner">some divs </header>
</div>
I use this in the Google Chrome console:
$x("//*[@id="site-main"]/header")
and it returns "SyntaxError: Unexpected identifier".
I don't see anything wrong... do you?
$x("//*[@id="site-main"]/header")
            ^         ^
The marked quotes cause the error — in fact, the string is terminated right after =.
You have to escape the quotes inside the XPath expression. How you escape them depends on the language you are using. In JavaScript, use a backslash (\):
$x("//*[@id=\"site-main\"]/header")
Also have a look at this question: Escape quotes in JavaScript
You can use single quotes in the XPath query:
$x("//*[@id='site-main']/header")
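As a quick check of the quoting options above, here is a minimal sketch that writes the same XPath three ways in plain JavaScript. The variable names are made up for illustration, and since $x exists only in the browser's DevTools console, the sketch just compares the resulting strings.

```javascript
// Three ways to write an XPath that contains quotes as a JavaScript string.
const escaped = "//*[@id=\"site-main\"]/header";     // backslash-escaped double quotes
const singleInside = "//*[@id='site-main']/header";  // single quotes inside the XPath
const singleOutside = '//*[@id="site-main"]/header'; // single-quoted JavaScript string

// The first and third forms produce exactly the same characters.
console.log(escaped === singleOutside); // true
console.log(escaped);
console.log(singleInside);
```

Any of these strings can then be passed to $x(...) in the console.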
Related
I am trying to use the following link to trigger event tracking in GA
<a href="/wp-content/uploads/2017/07/2017-brochure-web.pdf" target="0" onClick=”ga(‘send’,‘event’,‘Online Brochure’,‘Download’,‘Product Brochure’,10);”>DOWNLOAD BROCHURE</a>
Console is giving me this error:
Uncaught SyntaxError: Invalid or unexpected token
any ideas what is causing that error?
This error is occurring because starting at onClick there are left and right single/double (‘ ’ ”) quotation characters rather than neutral quotation marks (") and apostrophes ('). This can happen when copying and pasting from various sources.
Your code will work if you replace the left/right single/double quotation with neutral versions.
Hopefully that helps!
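If retyping the quotes by hand is not practical, the replacement can be sketched in a few lines of JavaScript. The helper name straightenQuotes and the sample string are assumptions made for this example, not part of any library.

```javascript
// Replace typographic (curly) quotes with neutral ASCII quotes.
function straightenQuotes(text) {
  return text
    .replace(/[\u2018\u2019]/g, "'")   // left/right single quotes -> apostrophe
    .replace(/[\u201C\u201D]/g, '"');  // left/right double quotes -> neutral double quote
}

// Hypothetical pasted attribute containing curly quotes.
const pasted = "onClick=\u201Dga(\u2018send\u2019,\u2018event\u2019);\u201D";
console.log(straightenQuotes(pasted)); // onClick="ga('send','event');"
```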
I'm trying to take a code snippet from Litmus to use within my Assemble.io project (HTML emails). A typical code block looks like this:
<style>@media print{ #_t { background-image: url('https://0me4e2bg.emltrk.com/0me4e2bg?p&d=%%Email%%');}} div.OutlookMessageHeader {background-image:url('https://0me4e2bg.emltrk.com/0me4e2bg?f&d=%%Email%%')} table.moz-email-headers-table {background-image:url('https://0me4e2bg.emltrk.com/0me4e2bg?f&d=%%Email%%')} blockquote #_t {background-image:url('https://0me4e2bg.emltrk.com/0me4e2bg?f&d=%%Email%%')} #MailContainerBody #_t {background-image:url('https://0me4e2bg.emltrk.com/0me4e2bg?f&d=%%Email%%')}</style><div id="_t"></div>
<img src="https://0me4e2bg.emltrk.com/0me4e2bg?d=%%Email%%" width="1" height="1" border="0" />
Ideally, I would love to just stick this whole thing into a YFM variable, which I've tried unsuccessfully. I believe the parser is getting stuck on the #, the quotes, the curly braces, or any combination of the above. I've tried wrapping the code block in '', "", ``, and ```, none of which work. Right now I've taken the variable part of that block (in this case, 0me4e2bg) and used just that in my YFM, which works well enough but I'm sure using CSS and HTML blocks/snippets in YFM has happened to someone else and I'm curious if there is a solution? Is it just that I'm not escaping it properly? Thanks!
EDIT: After trying the answer suggested by Anthon, I get the following error
can not read an implicit mapping pair; a colon is missed
which looks like it's triggered by the @ in @media?
A scalar in YAML doesn't need quotes unless it has special characters, and in your case it does. Quoted scalars can use single quotes, in which case existing single quotes need to be doubled, or double quotes, in which case you can use backslash escapes.
If you want your string as is, a literal block scalar is usually the best approach; the only things it has trouble with are leading whitespace and trailing blanks at the end of a line (i.e. before the newline). You should be able to assign your code block to the variable code as follows:
---
title: YAML Front Matter
description: A very simple way to add structured data to a page.
code: |
  <style>@media print{ #_t { background-image: url('https://0me4e2bg.emltrk.com/0me4e2bg?p&d=%%Email%%');}} div.OutlookMessageHeader {background-image:url('https://0me4e2bg.emltrk.com/0me4e2bg?f&d=%%Email%%')} table.moz-email-headers-table {background-image:url('https://0me4e2bg.emltrk.com/0me4e2bg?f&d=%%Email%%')} blockquote #_t {background-image:url('https://0me4e2bg.emltrk.com/0me4e2bg?f&d=%%Email%%')} #MailContainerBody #_t {background-image:url('https://0me4e2bg.emltrk.com/0me4e2bg?f&d=%%Email%%')}</style><div id="_t"></div>
  <img src="https://0me4e2bg.emltrk.com/0me4e2bg?d=%%Email%%" width="1" height="1" border="0" />
---
<h1> {{ title }} </h1>
You have to make sure the indentation under code is consistent, which is normally easier than parsing the string for characters to escape.
You can e.g. check online that the first part is valid YAML.
<url>{substring-before(data($y/link[1]/@href),'&')}</url>
The error I get when trying to run this is
No closing ';' found for entity or character reference
Anybody have any idea what's causing this error?
In XQuery an ampersand within a string literal (and in certain other contexts) needs to be escaped as &amp;, just as it would be in XML.
Michael Kay is correct. The "&" is illegal by itself in XML. It must always be part of an entity or character reference. Examples include &amp; &lt; &gt;, etc.
If you think that your search won't work because you are searching for "&amp;" instead of "&", that is not the right way to look at it. As a human, try to translate in your head that "&amp;" really looks like "&" to the XML parser. Doing this will work:
<url>{substring-before(data($y/link[1]/@href),'&amp;')}</url>
I have a string like this.
<p class='link'>try</p>bla bla</p>
I want to get only <p class='link'>try</p>
I have tried this.
/<p class='link'>[^<\/p>]+<\/p>/
But it doesn't work.
How can I do this?
Thanks,
If that is your string, and you want the text between those p tags, then this should work...
/<p\sclass='link'>(.*?)<\/p>/
The reason yours is not working is that you put <\/p> inside a negated character class. The class does not match </p> literally; it excludes each of those characters (<, /, p and >) individually.
Of course, it is mandatory that I mention there are better tools for parsing HTML fragments (such as an HTML parser).
'/<p[^>]+>([^<]+)<\/p>/'
will get you "try"
It looks like you used this block: [^<\/p>]+ intending to match anything except for </p>. Unfortunately, that's not what it does. A negated [^...] block matches any single character that is not listed inside, so it excludes <, /, p and > one character at a time. It can get through "try", but it stops at the first p or / in the text (for example in "help" or "a/b"), and then the expected </p> does not follow, so there is no match.
Alex's solution, to use a non-greedy quantifier, is how I tend to approach this sort of problem.
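The difference between the two approaches can be sketched in JavaScript. The sample string is a hypothetical variation of the one in the question, chosen so that the text between the tags contains a p.

```javascript
// Negated character class: excludes '<', '/', 'p' and '>' one character at a time.
const negatedClass = /<p class='link'>[^<\/p>]+<\/p>/;
// Non-greedy quantifier: consumes as little as possible up to the first </p>.
const nonGreedy = /<p\sclass='link'>(.*?)<\/p>/;

// Hypothetical input whose text contains the letter 'p'.
const sample = "<p class='link'>help</p>bla bla</p>";

console.log(negatedClass.test(sample));    // false: the class stops at the 'p' in "help"
console.log(sample.match(nonGreedy)[0]);   // <p class='link'>help</p>
console.log(sample.match(nonGreedy)[1]);   // help
```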
I tried to make one less specific to any particular tag.
(<[^/]+?\s+[^>]*>[^>]*>)
this returns:
<p class='link'>try</p>
In my Ruby app, I've used the following method and regular expression to remove all HTML tags from a string:
str.gsub(/<\/?[^>]*>/,"")
This regular expression did just about all I was expecting it to, except it caused all quotation marks to be transformed into &#8220; and all single quotes to be changed to &#8221;.
What's the obvious thing I'm missing to convert the messy codes back into their proper characters?
Edit: The problem occurs with or without the Regular Expression, so it's clear my problem has nothing to do with it. My question now is how to deal with this formatting error and correct it. Thanks!
Use CGI::unescapeHTML (from the standard cgi library) after you perform your regular expression substitution:
require 'cgi'
CGI::unescapeHTML(str.gsub(/<\/?[^>]*>/,""))
See http://www.ruby-doc.org/core/classes/CGI.html#M000547
In the above code snippet, gsub removes all HTML tags. Then, unescapeHTML() converts all HTML entities (such as &lt;, &#8220;) back to their actual characters (<, quotes, etc.).
With respect to another post on this page, note that you will never ever be passed HTML such as
<tag attribute="<value>">2 + 3 < 6</tag>
(which is invalid HTML); what you may receive is, instead:
<tag attribute="&lt;value&gt;">2 + 3 &lt; 6</tag>
The call to gsub will transform the above to:
2 + 3 &lt; 6
And unescapeHTML will finish the job:
2 + 3 < 6
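For comparison, the same two steps (strip tags, then decode entities) can be sketched in JavaScript. This is only an illustration: stripTags and unescapeHtml are hypothetical helpers written for this example, and the decoder handles just a few named entities plus decimal character references, which is far less than CGI::unescapeHTML covers.

```javascript
// Minimal stand-in for an HTML-unescape helper (only a handful of entities).
const NAMED = { lt: "<", gt: ">", amp: "&", quot: '"' };

function unescapeHtml(text) {
  return text.replace(/&(#\d+|[a-z]+);/g, (whole, ref) => {
    // Decimal character reference such as &#8220;
    if (ref[0] === "#") return String.fromCodePoint(Number(ref.slice(1)));
    // Known named entity, otherwise leave the text untouched.
    return Object.prototype.hasOwnProperty.call(NAMED, ref) ? NAMED[ref] : whole;
  });
}

// Same pattern as the gsub call: remove anything that looks like a tag.
function stripTags(html) {
  return html.replace(/<\/?[^>]*>/g, "");
}

const input = '<tag attribute="&lt;value&gt;">2 + 3 &lt; 6</tag>';
const stripped = stripTags(input);
console.log(stripped);               // 2 + 3 &lt; 6
console.log(unescapeHtml(stripped)); // 2 + 3 < 6
console.log(unescapeHtml("&#8220;quoted&#8221;")); // curly double quotes around "quoted"
```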
You're going to run into more trouble when you see something like:
<doohickey name="<foobar>">
You'll want to apply something like:
gsub(/<[^<>]*>/, "")
...for as long as the pattern matches.
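The "for as long as the pattern matches" part can be sketched as a loop. The example is in JavaScript rather than Ruby, and stripNestedTags is a hypothetical helper name; the regular expression is the one given above.

```javascript
// Remove innermost <...> groups repeatedly until the text stops changing.
function stripNestedTags(text) {
  const innermost = /<[^<>]*>/g; // a '<', then no angle brackets, then '>'
  let previous;
  do {
    previous = text;
    text = text.replace(innermost, "");
  } while (text !== previous);
  return text;
}

// First pass removes <foobar>, second pass removes what is left of <doohickey ...>.
console.log(stripNestedTags('<doohickey name="<foobar>">text')); // text
```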
This regular expression did just about all I was expecting it to, except it caused all quotation marks to be transformed into &#8220; and all single quotes to be changed to &#8221;.
This doesn't sound like something the regular expression would do. Are you sure the text was different before the substitution?
See this question here for information about the problem, it has got an excellent answer:
Get non UTF-8 form fields as UTF-8 in php.
I've run into a similar problem with character changes, this happened when my code ran through another module that enforced UTF-8 encoding and then when it came back, I had a different file (slurped array of lines) on my hands.
You could use a multi-pass system to get the results you are looking for.
After running your regular expression, run an expression to convert &#8220; to quotes and another to convert &#8221; to single quotes.