XQuery looking for text with 'single' quote - xpath

I can't figure out how to search for text containing single quotes using XPath.
For example, I've added a quote to the title of this question. The following line
$x("//*[text()='XQuery looking for text with 'single' quote']")
Returns an empty array.
However, if I try the following
$x("//*[text()=\"XQuery looking for text with 'single' quote\"]")
It does return the link for the title of the page, but I would like to be able to accept both single and double quotes in there, so I can't just tailor it for the single/double quote.
You can try it in Chrome's or Firebug's console on this page.

Here's a hackaround (thanks, Dimitre Novatchev) that lets me search for any text in XPath expressions, whether it contains single or double quotes. It's implemented in JS, but could easily be translated to other languages:
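function cleanStringForXpath(str) {
    // Split into runs of ordinary characters and single quote characters.
    // match() returns null for an empty string, so fall back to [].
    var parts = str.match(/[^'"]+|['"]/g) || [];
    parts = parts.map(function (part) {
        if (part === "'") {
            return '"\'"'; // output "'"
        }
        if (part === '"') {
            return "'\"'"; // output '"'
        }
        return "'" + part + "'"; // everything else goes inside '...'
    });
    // XPath 1.0 concat() needs at least two arguments.
    if (parts.length === 0) return "''";
    if (parts.length === 1) return parts[0];
    return "concat(" + parts.join(",") + ")";
}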
If I'm looking for I'm reading "Harry Potter", I could do the following:
var xpathString = cleanStringForXpath( "I'm reading \"Harry Potter\"" );
$x("//*[text()="+ xpathString +"]");
// The xpath created becomes
// //*[text()=concat('I',"'",'m reading ','"','Harry Potter','"')]
Here's a (much shorter) Java version. It also works in JavaScript once you remove the type information and use replaceAll instead of replace, because a JavaScript replace with a string pattern only replaces the first occurrence. Thanks to https://stackoverflow.com/users/1850609/acdcjunior
String escapedText = "concat('" + originalText.replace("'", "', \"'\", '") + "', '')";

In XPath 2.0 and XQuery 1.0, the delimiter of a string literal can be included in the string literal by doubling it:
let $a := "He said ""I won't"""
or
let $a := 'He said "I can''t"'
The convention is borrowed from SQL.
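A minimal JavaScript sketch of that doubling rule, assuming the target engine is XPath 2.0+ or XQuery. It does not apply to XPath 1.0 engines such as the browser's $x / document.evaluate, which have no escape syntax at all. The helper name toXPath2Literal is hypothetical:

```javascript
// Hypothetical helper: builds an XPath 2.0 / XQuery 1.0 string literal.
// It always uses double quotes as the delimiter and doubles every embedded
// double quote, which is the escape rule described above.
// NOT valid for XPath 1.0 engines ($x, document.evaluate).
function toXPath2Literal(text) {
  return '"' + text.replace(/"/g, '""') + '"';
}

// Usage: a predicate for text containing both kinds of quote.
const literal = toXPath2Literal('He said "I won\'t"');
console.log(literal); // "He said ""I won't"""
console.log("//*[text()=" + literal + "]");
```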

This is an example:
/*/*[contains(., "'") and contains(., '"') ]/text()
When this XPath expression is applied on the following XML document:
<text>
<t>I'm reading "Harry Potter"</t>
<t>I am reading "Harry Potter"</t>
<t>I am reading 'Harry Potter'</t>
</text>
the wanted, correct result (a single text node) is selected:
I'm reading "Harry Potter"
I verified this using the XPath Visualizer (a free and open-source tool I created 12 years ago, which has taught XPath the fun way to thousands of people).
Your problem may be that you are not able to specify this XPath expression as a string in the programming language you are using. That isn't an XPath problem but a question of how to write string literals in your programming language.
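For example, in JavaScript a template literal lets the expression be written verbatim, since it contains no backticks. The second half of this sketch only mimics the predicate on the sample <t> values with plain string methods, to illustrate what contains(., "'") and contains(., '"') keeps; it does not evaluate XPath:

```javascript
// The XPath expression from above, written as a template literal so that
// neither ' nor " needs a JavaScript escape.
const expr = `/*/*[contains(., "'") and contains(., '"')]/text()`;

// The three <t> values from the sample document.
const values = [
  `I'm reading "Harry Potter"`,
  `I am reading "Harry Potter"`,
  `I am reading 'Harry Potter'`,
];

// Same test as the predicate: keep only values containing both quote kinds.
const selected = values.filter((v) => v.includes("'") && v.includes('"'));
console.log(expr);
console.log(selected); // only the first value is kept
```

In Chrome's console, $x(expr) would run the same expression against the current page.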

Additionally, if you were actually using XQuery rather than XPath, as the title says, you could also use the XML entity references:
&quot; for double quotes and &apos; for single quotes.
They also work within single-quoted literals.
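A small JavaScript sketch for generating such a literal, assuming the string is destined for an XQuery engine (plain XPath does not expand entity references). Because an XQuery string literal also treats & as the start of a reference, & is escaped first. The helper name toXQueryLiteral is hypothetical:

```javascript
// Hypothetical helper: builds an XQuery string literal using the predefined
// entity references instead of quote doubling. & is replaced first so the
// entities produced by the later replacements are not escaped again.
function toXQueryLiteral(text) {
  const escaped = text
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
  return '"' + escaped + '"';
}

console.log(toXQueryLiteral(`I'm reading "Harry Potter"`));
// "I&apos;m reading &quot;Harry Potter&quot;"
```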

You can do this using a regular expression. For example (as TypeScript code):
export function escapeXPathString(str: string): string {
str = str.replace(/'/g, `', "'", '`);
return `concat('${str}', '')`;
}
This replaces every ' in the input string with ', "'", '.
The final , '' is important because concat('string') is an error.
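A variation on the same idea (a sketch, not taken from the answer above): use a plain literal whenever one delimiter is absent from the string, and fall back to concat() only when both quote characters occur. The function name xpathLiteral is hypothetical; the output targets XPath 1.0:

```javascript
// Hypothetical helper producing an XPath 1.0 string literal for any text.
function xpathLiteral(text) {
  if (!text.includes("'")) return "'" + text + "'"; // no apostrophes: '...'
  if (!text.includes('"')) return '"' + text + '"'; // no double quotes: "..."
  // Both occur: wrap the apostrophe-free pieces in '...' and join them with
  // "'" (an apostrophe delimited by double quotes). Splitting on at least one
  // apostrophe gives two or more pieces, so concat() always gets 3+ arguments.
  const pieces = text.split("'").map((p) => "'" + p + "'");
  return "concat(" + pieces.join(", \"'\", ") + ")";
}

console.log(xpathLiteral(`I'm reading "Harry Potter"`));
// concat('I', "'", 'm reading "Harry Potter"')
```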

Well, I was on the same quest, and after a while I found that XPath 1.0 has no escape syntax for this, which is quite disappointing! But we can always work around it.
I wanted something simple and straightforward. What I came up with is to choose your own replacement for the apostrophe, some unique code that you will never encounter in your XML text; I chose //apos//, for example. Now you put that in both your XML text and your XPath query (if you didn't write the XML yourself, you can always use the replace function of any editor). Then you search normally, retrieve the result, and replace //apos// back with '.
Below are some samples from what I was doing (replace_special_char_xpath() is what you need to write):
function replace_special_char_xpath($str){
    $str = str_replace("//apos//", "'", (string) $str);
    /* add all other replacements here */
    return $str;
}
function xml_lang($xml_file, $category, $word, $language){ // path can be relative or absolute
    $language = str_replace("-", "_", $language); // replace - with _ to be able to use "en-us", ...
    $xml = simplexml_load_file($xml_file);
    $xpath_result = $xml->xpath("{$category}/def[en_us = '{$word}']/{$language}");
    $result = $xpath_result[0][0];
    return replace_special_char_xpath($result);
}
The text in the XML file:
<def>
<en_us>If you don//apos//t know which server, Click here for automatic connection</en_us> <fr_fr>Si vous ne savez pas quelle serveur, Cliquez ici pour une connexion automatique</fr_fr> <ar_sa>إذا لا تعرفوا أي سرفير, إضغطوا هنا من أجل إتصال تلقائي</ar_sa>
</def>
And the call in the PHP file (generated HTML):
<span><?php echo xml_lang_body("If you don//apos//t know which server, Click here for automatic connection"); ?></span>
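The same round trip, sketched in JavaScript for comparison. It assumes //apos// never occurs naturally in the data, which is the main weakness of this approach; the names encodeApos and decodeApos are hypothetical:

```javascript
// Placeholder chosen in the answer above; must never appear in real text.
const APOS_PLACEHOLDER = "//apos//";

// Apply before storing text in the XML file (or before building the query).
function encodeApos(text) {
  return text.split("'").join(APOS_PLACEHOLDER);
}

// Apply after reading a result back out of the XML.
function decodeApos(text) {
  return text.split(APOS_PLACEHOLDER).join("'");
}

const stored = encodeApos("If you don't know which server");
const query = `def[en_us = '${stored}']`; // the literal now has no apostrophe
console.log(query);
console.log(decodeApos(stored)); // If you don't know which server
```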

Related

XPATH remove extra spaces in concatenation of elements

In XPath, I am processing a source XML document that looks like the one below, where I want to concatenate the child elements of each editor with a space delimiter to create a full name, and then in turn concatenate the resulting full names with commas:
<biblStruct type="book" xml:id="Biller_2011a">
<monogr>
<title>Inquisitors and Heretics in Thirteenth-Century Languedoc: Edition and Translation
of Toulouse Inquisition Depositions, 1273-1282</title>
<editor>
<forename>Peter</forename><surname>Biller</surname>
</editor>
<editor>
<forename>Caterina</forename><surname>Bruschi</surname>
</editor>
<editor>
<forename>Shelagh</forename><surname>Sneddon</surname>
</editor>
<imprint>
<pubPlace>
<settlement>Leiden</settlement>
<country>NL</country>
</pubPlace>
<publisher>Brill</publisher>
<date type="pub_date">2011</date>
</imprint>
</monogr>
</biblStruct>
Currently the XPath (within XQuery) code looks like this, using the XPath simple map operator (!) to introduce delimiters:
let $bibref := $bib//tei:biblStruct[@xml:id="Biller_2011a"]
return <editors>{
(for $auth in $bibref//tei:editor
return normalize-space(string-join($auth//child::text()," ")))!(if (position() > 1) then ', ' else (), .)
}</editors>
But this outputs extra space before and after the commas:
<editors>Peter Biller , Caterina Bruschi , Shelagh Sneddon</editors>
Rather, I want to output:
<editors>Peter Biller, Caterina Bruschi, Shelagh Sneddon</editors>
Thanks in advance.
"where I want to concatenate the child elements of each editor" would translate into $auth/* and not into $auth//child::text().
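Somehow the whole mix of for/return, ! and string-join looks odd. The extra spaces come from the element constructor: adjacent atomic values in its content (here each name and each ', ' string) are joined with a single space. Building one complete string avoids that; it seems you can just use string-join($bibref//tei:editor/string-join(*, ' '), ', ').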
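To see what the nested string-join computes, here is the same two-level join over the sample editors in plain JavaScript. It is only an illustration; the array simply restates the forename/surname pairs from the XML above:

```javascript
// Each editor's child elements, in document order (forename, surname).
const editors = [
  ["Peter", "Biller"],
  ["Caterina", "Bruschi"],
  ["Shelagh", "Sneddon"],
];

// Inner string-join(*, ' ') per editor, then outer string-join(..., ', ').
const result = editors.map((names) => names.join(" ")).join(", ");
console.log(`<editors>${result}</editors>`);
// <editors>Peter Biller, Caterina Bruschi, Shelagh Sneddon</editors>
```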

findElement is not throwing NoSuchElementException

I want to write a test to check that a WebElement with a specified text is not present on a page. This is the code for the method doing the job:
public boolean checkOfAanvraagIsOpgevoerd (String titel)
{
String quote = "\"";
String titelMetQuotes = quote + titel +quote;
titelMetQuotes = "dierdieboeboe";
boolean isOpgevoerd=false;
try {
driver.findElement(By.xpath(".//*[@id='listRequests']//h4/a[contains(text(),"+titelMetQuotes+")]"));
isOpgevoerd=true;
} catch (NoSuchElementException NE) {
NE.printStackTrace();
}
return isOpgevoerd;
}
Although I'm absolutely sure that there is no a tag on the page which contains the text "dierdieboeboe", the catch block is still skipped. When I replace, for instance, h4 with h5 in the XPath expression, the NoSuchElementException is thrown as expected. It seems that the contains part of the expression is ignored.
Try this (note the single quotes around the actual text):
By.xpath("//*[@id='listRequests']//h4/a[contains(text(),'"+titelMetQuotes+"')]")
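contains is a function that takes two strings, hence the text of your variable titelMetQuotes needs to be quoted. Unquoted, dierdieboeboe is parsed as a location path selecting child elements named dierdieboeboe; it selects nothing, converts to the empty string, and contains(text(), '') is always true, which is why the contains part appears to be ignored. Obviously, in this case it is easiest to use single quotes.
Additionally, the variable name (titel with quotes) is quite misleading because, due to another flaw in the code, it actually contains no quotes: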
String titelMetQuotes = quote + titel +quote;
titelMetQuotes = "dierdieboeboe";
The second line simply overwrites the quoted title with a non quoted string.
Finally, you don't need the leading dot in your XPath expression in order to locate the first element of any kind with id listRequests.
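Because a title can itself contain an apostrophe, hard-coding single quotes around it only works for some titles. A hedged JavaScript sketch (the helper names quoteForXPath and aanvraagXPath are hypothetical; the same logic ports directly to Java) builds the predicate with a literal that is always valid XPath 1.0:

```javascript
// Hypothetical helper: quote text as an XPath 1.0 literal. Only an apostrophe
// can clash with '...' delimiters, so split on it and stitch the pieces
// back together with concat() when one is present.
function quoteForXPath(text) {
  if (!text.includes("'")) return "'" + text + "'";
  return "concat('" + text.split("'").join("', \"'\", '") + "')";
}

// Builds the locator from the question with a properly quoted title.
function aanvraagXPath(titel) {
  return "//*[@id='listRequests']//h4/a[contains(text()," + quoteForXPath(titel) + ")]";
}

console.log(aanvraagXPath("dierdieboeboe"));
// //*[@id='listRequests']//h4/a[contains(text(),'dierdieboeboe')]
console.log(aanvraagXPath("Jan's aanvraag"));
// //*[@id='listRequests']//h4/a[contains(text(),concat('Jan', "'", 's aanvraag'))]
```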

How to remove amp; from URL

I am getting a URL that contains amp;. Is there any way to remove it? I tried the URLDecode function, but it's not working. Do I need to remove it using a simple string replacement, or is there a better way to do this?
As @Lankymart pointed out, URLDecode only works on URL-encoded characters (%26), not on HTML entities (&amp;). Use a regular string replacement to change the HTML entity &amp; into a literal ampersand:
url = Replace(url, "&amp;", "&")
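The same replacement in JavaScript, followed by parsing with the standard URL API so each parameter comes back under its real name. The example URL is made up for illustration:

```javascript
// A URL whose query string was HTML-escaped somewhere upstream (made up).
const raw = "https://example.com/visit?user_id=1&amp;practice_id=2&amp;patient_id=3";

// Turn every &amp; entity back into a literal ampersand (global replace).
const fixed = raw.replace(/&amp;/g, "&");

// With the entities gone, the parameter names parse cleanly.
const params = new URL(fixed).searchParams;
console.log(params.get("practice_id")); // 2
console.log(params.get("amp;practice_id")); // null
```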
In Angular, I added amp; to the parameter names:
this.activatedRoute.queryParams.subscribe(params => {
this.user_id = params['user_id'];
this.practice_id = params['amp;practice_id'];
this.patient_id = params['amp;patient_id'];
});

How to prevent CKEditor replacing spaces with &nbsp;?

I'm facing an issue with CKEditor 4: I need output without any HTML entities, so I added config.entities = false; to my config, but some &nbsp; entities still appear when
an inline tag is inserted: the space before it is replaced with &nbsp;
text is pasted: every space is replaced with &nbsp;, even with config.forcePasteAsPlainText = true;
You can check this on any demo by typing
test test
Do you know how I can prevent this behaviour?
Thanks!
Based on Reinmar's accepted answer and the Entities plugin, I created a small plugin with an HTML filter which removes redundant &nbsp; entities. The regular expression could be improved to suit other situations, so please edit this answer.
/*
* Remove &nbsp; entities which were inserted, e.g. when removing a space and
* immediately inputting a space.
*
* NB: We could also set config.basicEntities to false, but this is strongly
* advised against since this also does not turn e.g. < into &lt;.
* @link http://stackoverflow.com/a/16468264/328272
*
* Based on StackOverflow answer.
* @link http://stackoverflow.com/a/14549010/328272
*/
CKEDITOR.plugins.add('removeRedundantNBSP', {
afterInit: function(editor) {
var config = editor.config,
dataProcessor = editor.dataProcessor,
htmlFilter = dataProcessor && dataProcessor.htmlFilter;
if (htmlFilter) {
htmlFilter.addRules({
text: function(text) {
return text.replace(/(\w)&nbsp;/g, '$1 '); // &nbsp; right after a word character becomes a space
}
}, {
applyToAll: true,
excludeNestedEditable: true
});
}
}
});
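The filter's text rule is ordinary string processing, so it can be tried outside CKEditor. A sketch that applies the same kind of rule (an &nbsp; directly after a word character becomes a plain space) to sample strings; the function name removeRedundantNbsp is hypothetical:

```javascript
// Same idea as the htmlFilter text rule: replace &nbsp; that directly
// follows a word character with a normal space. Other &nbsp; entities,
// e.g. the lone one in <p>&nbsp;</p>, are left alone.
function removeRedundantNbsp(text) {
  return text.replace(/(\w)&nbsp;/g, "$1 ");
}

console.log(removeRedundantNbsp("test&nbsp;test")); // test test
console.log(removeRedundantNbsp("<p>&nbsp;</p>")); // <p>&nbsp;</p>
```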
These entities:
// Base HTML entities.
var htmlbase = 'nbsp,gt,lt,amp';
are an exception. To get rid of them, you can set basicEntities: false, but as the docs mention, this is an insecure setting. So if you only want to remove &nbsp;, I would just use a regexp on the output data (e.g. by adding a listener for #getData) or, if you want to be more precise, add your own rule to htmlFilter, just like the entities plugin does.
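A sketch of the getData-listener route, assuming CKEditor 4's getData event, whose event data carries the serialized HTML in dataValue. To keep it runnable without CKEditor, the usage below drives it with a minimal stand-in object that only mimics editor.on and event dispatch; stripNbspOnGetData and makeFakeEditor are hypothetical names:

```javascript
// Strip every &nbsp; from the data CKEditor hands out via getData().
// Assumption: CKEditor 4's "getData" event, with the HTML in evt.data.dataValue.
function stripNbspOnGetData(editor) {
  editor.on("getData", function (evt) {
    evt.data.dataValue = evt.data.dataValue.replace(/&nbsp;/g, " ");
  });
}

// Minimal stand-in for an editor (NOT the CKEditor API), just enough to
// register a listener and fire the event when getData() is called.
function makeFakeEditor(html) {
  const listeners = [];
  return {
    on(name, fn) {
      if (name === "getData") listeners.push(fn);
    },
    getData() {
      const evt = { data: { dataValue: html } };
      listeners.forEach((fn) => fn(evt));
      return evt.data.dataValue;
    },
  };
}

const editor = makeFakeEditor("<p>test&nbsp;test</p>");
stripNbspOnGetData(editor);
console.log(editor.getData()); // <p>test test</p>
```

Unlike the htmlFilter rule, this removes every &nbsp;, including one that keeps an otherwise empty paragraph from collapsing.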
Remove all &nbsp; but not <tag>&nbsp;</tag> with JavaScript RegExp
This is especially helpful with CKEditor, as it creates lines like <p>&nbsp;</p>, which you might want to keep.
Background: I first tried to write a one-liner in JavaScript using lookaround assertions. It seems you can't chain them, at least not yet. My first approach was unsuccessful:
return text.replace(/(?<!\>)&nbsp;(?!<\/)/gi, " ")
// Removes &nbsp; but not <p>&nbsp;</p>
// It works, but does not remove `<p>&nbsp;blah&nbsp;</p>`.
Here is my updated working one-liner code:
return text.replace(/(?<!\>\s.)(&nbsp;(?!<\/)|(?<!\>)&nbsp;<\/p>)/gi, " ")
This works as intended.
However, this is a risky practice, as lookbehind assertions are not supported by some browsers.
Read more about Assertions.
What I ended up using in my production code:
I ended up taking a somewhat hacky approach with multiple replace() calls. This should work in all browsers.
return text
    .trim() // remove leading and trailing whitespace
    .replace(/\u00a0/g, " ") // replace Unicode non-breaking spaces with plain spaces
    .replace(/((<\w+>)\s*(&nbsp;)\s*(<\/\w+>))/gi, "$2<!--BOOM-->$4") // mark tags containing only &nbsp; with BOOM
    .replace(/&nbsp;/gi, " ") // remove all remaining &nbsp;
    .replace(/((<\w+>)\s*(<!--BOOM-->)\s*(<\/\w+>))/gi, "$2&nbsp;$4"); // turn BOOM back into &nbsp; inside those tags
If you have a better suggestion, I would be happy to hear it 😊.
I needed to change the regular expression Imeus posted: in my case, I use TYPO3 and needed to edit the backend editor, and the original one didn't work. Maybe this can help someone else with the same problem :)
return text.replace(/&nbsp;/g, ' ');

Replace with use of regular expression result

Let's say I've got some text with a couple tags like this:
[twitter:jpunt]
I want to replace those into something like this:
@Jpunt
How could I do this in Ruby? I've been researching regular expressions for a couple of hours, just with a lot of frustration as a result. Anyone?
This should do the job:
initial = "[twitter:jpunt]"
link = initial.gsub(/\[twitter:(\w+)\]/i, '@\1')
Alternatively, with a block:
output = input.gsub(/\[([^:]+):([^\]]+)\]/) {
'@' + $2.capitalize }
The above code works with any tag name. If you want just twitter to be allowed, then go with modification:
output = input.gsub(/\[twitter:([^\]]+)\]/) {
'@' + $1.capitalize }
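For comparison, String.prototype.replace in JavaScript accepts the same kind of callback, with the capture groups passed as arguments. A sketch of the any-tag version, assuming the desired output is an @-prefixed handle capitalized the way Ruby's capitalize does it (first letter upper case, the rest lower case); replaceTags is a hypothetical name:

```javascript
// Replace every [tag:value] with "@" + the capitalized value.
// The callback receives the full match followed by each capture group.
function replaceTags(input) {
  return input.replace(/\[([^:]+):([^\]]+)\]/g, (_match, _tag, value) =>
    "@" + value.charAt(0).toUpperCase() + value.slice(1).toLowerCase()
  );
}

console.log(replaceTags("Follow [twitter:jpunt] for updates"));
// Follow @Jpunt for updates
```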
