Replace with use of regular expression result - ruby

Let's say I've got some text with a couple tags like this:
[twitter:jpunt]
I want to replace those with something like this:
#Jpunt
How could I do this in Ruby? I've been researching regular expressions for a couple of hours, with nothing but frustration as a result. Anyone?

This should do the job:
initial = "[twitter:jpunt]"
link = initial.gsub(/\[twitter:(\w+)\]/i, '#\1')

It is a one-liner:
output = input.gsub(/\[([^:]+):([^\]]+)\]/) { '#' + $2.capitalize }
The above code works with any tag name. If you only want twitter tags to be matched, use this modification:
output = input.gsub(/\[twitter:([^\]]+)\]/) { '#' + $1.capitalize }
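To see how the two answers differ in practice, here is a small sketch. The helper name `replace_tags` and the sample sentence are made up for illustration; only the regular expressions come from the answers above.

```ruby
# Hypothetical helper wrapping the block form of gsub shown above.
def replace_tags(text)
  # $2 is the part after the colon; capitalize upcases its first letter
  text.gsub(/\[([^:]+):([^\]]+)\]/) { '#' + $2.capitalize }
end

sample = 'Follow [twitter:jpunt] and [github:ruby]'

# Block form: matches any tag name and capitalizes the value
puts replace_tags(sample)                      # Follow #Jpunt and #Ruby

# Backreference form: only twitter tags, value inserted unchanged via \1
puts sample.gsub(/\[twitter:(\w+)\]/i, '#\1')  # Follow #jpunt and [github:ruby]
```

The block form is needed whenever the captured text has to be transformed (here with `capitalize`); a plain replacement string with `\1` can only insert the capture as-is.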

Related

Proper gsub regular expression for this URL?

Say I have a string representing a URL:
http://www.mysite.com/somepage.aspx?id=33
..I'd like to escape the forward slashes and the question mark:
http:\/\/www.mysite.com\/somepage.aspx\?id=33
How can I do this via gsub? I've been playing with some regular expressions in there but haven't hit on the winning formula yet.
I suggest you use
url = url.gsub(/(?=[\/?])/, '\\')
As shown here
url = 'http://www.mysite.com/somepage.aspx?id=33'
url = url.gsub(/(?=[\/?])/, '\\')
puts url
Output:
http:\/\/www.mysite.com\/somepage.aspx\?id=33
How about this one: result = searchText.gsub(/(\/|\?)/, '\\\\\1')
I'd suggest using a block to make it more readable:
url.gsub(/[\/?]/) { |c| "\\#{c}" }
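As a rough comparison, the sketch below runs three ways of writing this substitution side by side: the lookahead form, the block form, and a backreference form. The variable names are illustrative only, and the backreference variant uses Ruby's `\1` syntax in a single-quoted replacement string.

```ruby
url = 'http://www.mysite.com/somepage.aspx?id=33'

# Zero-width lookahead: insert a backslash in front of every / or ?
with_lookahead = url.gsub(/(?=[\/?])/, '\\')

# Block form: the block's return value is used literally, no escaping rules apply
with_block = url.gsub(/[\/?]/) { |c| "\\#{c}" }

# Backreference form: the replacement string is \\ (a literal backslash) plus \1
with_backref = url.gsub(/([\/?])/, '\\\\\1')

puts with_lookahead
puts with_lookahead == with_block && with_block == with_backref
```

The block form avoids the question of how many backslashes a replacement string needs, which is why it tends to be the easiest of the three to read.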

How to prevent CKEditor replacing spaces with &nbsp;?

I'm facing an issue with CKEditor 4. I need output without any HTML entities, so I added config.entities = false; to my config, but some &nbsp; entities still appear when:
an inline tag is inserted: the space before it is replaced with &nbsp;
text is pasted: every space is replaced with &nbsp;, even with config.forcePasteAsPlainText = true;
You can check that on any demo by typing
test test
eg.
Do you know how I can prevent this behaviour?
Thanks!
Based on Reinmar's accepted answer and the Entities plugin, I created a small plugin with an HTML filter which removes redundant &nbsp; entities. The regular expression could be improved to suit other situations, so please edit this answer.
/*
* Remove &nbsp; entities which were inserted, e.g. when removing a space and
* immediately inputting a space.
*
* NB: We could also set config.basicEntities to false, but this is strongly
* advised against since this also does not turn e.g. < into &lt;.
* @link http://stackoverflow.com/a/16468264/328272
*
* Based on StackOverflow answer.
* @link http://stackoverflow.com/a/14549010/328272
*/
CKEDITOR.plugins.add('removeRedundantNBSP', {
afterInit: function(editor) {
var config = editor.config,
dataProcessor = editor.dataProcessor,
htmlFilter = dataProcessor && dataProcessor.htmlFilter;
if (htmlFilter) {
htmlFilter.addRules({
text: function(text) {
return text.replace(/(\w)&nbsp;/g, '$1 ');
}
}, {
applyToAll: true,
excludeNestedEditable: true
});
}
}
});
These entities:
// Base HTML entities.
var htmlbase = 'nbsp,gt,lt,amp';
Are an exception. To get rid of them you can set basicEntities: false. But as the docs mention, this is an insecure setting. So if you only want to remove &nbsp;, then I would just use a regexp on the output data (e.g. by adding a listener for #getData) or, if you want to be more precise, add your own rule to htmlFilter, just like the entities plugin does.
Remove all &nbsp; but not <tag>&nbsp;</tag> with JavaScript RegExp
This is especially helpful with CKEditor, as it creates lines like <p>&nbsp;</p>, which you might want to keep.
Background: I first tried to write a JavaScript one-liner using lookaround assertions. It seems you can't chain them, at least not yet. My first approach was unsuccessful:
return text.replace(/(?<!\>)&nbsp;(?!<\/)/gi, " ")
// Removes &nbsp; but not <p>&nbsp;</p>
// It works, but does not remove `<p>&nbsp;blah&nbsp;</p>`.
Here is my updated working one-liner code:
return text.replace(/(?<!\>\s.)(&nbsp;(?!<\/)|(?<!\>)&nbsp;<\/p>)/gi, " ")
This works as intended.
However, this is a risky practice, as lookarounds (lookbehind in particular) are not supported by all browsers.
What I ended up using in my production code:
I ended up with a slightly hacky approach that chains multiple replace() calls. This should work in all browsers.
.trim() // Remove leading and trailing whitespace
.replace(/\u00a0/g, " ") // Replace Unicode non-breaking spaces with regular spaces
.replace(/((<\w+>)\s*(&nbsp;)\s*(<\/\w+>))/gi, "$2<!--BOOM-->$4") // Replace empty nbsp tags with BOOM
.replace(/&nbsp;/gi, " ") // Remove all remaining &nbsp;
.replace(/((<\w+>)\s*(<!--BOOM-->)\s*(<\/\w+>))/gi, "$2&nbsp;$4") // Replace BOOM back to empty tags
If you have a better suggestion, I would be happy to hear 😊.
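The same protect, replace, restore idea can also be applied outside the browser. Below is a minimal sketch, assuming the editor output is cleaned up server-side in Ruby; the method name `strip_nbsp` and the `PLACEHOLDER` marker are hypothetical and not part of CKEditor.

```ruby
# Hypothetical marker; it must never occur in real content.
PLACEHOLDER = '<!--KEEP-NBSP-->'

def strip_nbsp(html)
  html.strip
      .gsub("\u00a0", ' ')                    # literal non-breaking space characters
      .gsub(/(<\w+>)\s*&nbsp;\s*(<\/\w+>)/i) { "#{$1}#{PLACEHOLDER}#{$2}" } # protect <tag>&nbsp;</tag>
      .gsub('&nbsp;', ' ')                    # drop every remaining entity
      .gsub(PLACEHOLDER, '&nbsp;')            # put the protected ones back
end

puts strip_nbsp('<p>one&nbsp;two</p><p>&nbsp;</p>')
# <p>one two</p><p>&nbsp;</p>
```

Like the JavaScript version, this only recognises tags without attributes (`<\w+>`), so it is a sketch of the technique rather than a general HTML cleaner.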
I needed to change the regular expression Imeus sent. In my case, I use TYPO3 and needed to edit the backend editor, and that one didn't work. Maybe this helps someone else with the same problem :)
return text.replace(/&nbsp;/g, ' ');

XQuery looking for text with 'single' quote

I can't figure out how to search for text containing single quotes using XPath.
For example, I've added a quote to the title of this question. The following line
$x("//*[text()='XQuery looking for text with 'single' quote']")
Returns an empty array.
However, if I try the following
$x("//*[text()=\"XQuery looking for text with 'single' quote\"]")
It does return the link for the title of the page, but I would like to be able to accept both single and double quotes in there, so I can't just tailor it for the single/double quote.
You can try it in chrome's or firebug's console on this page.
Here's a hackaround (Thanks Dimitre Novatchev) that will allow me to search for any text in xpaths, whether it contains single or double quotes. Implemented in JS, but could be easily translated to other languages
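function cleanStringForXpath(str) {
  // match() returns null for an empty string, so fall back to an empty list
  var parts = str.match(/[^'"]+|['"]/g) || [];
  parts = parts.map(function(part){
    if (part === "'") {
      return '"\'"'; // output "'"
    }
    if (part === '"') {
      return "'\"'"; // output '"'
    }
    return "'" + part + "'";
  });
  // concat() needs at least two arguments, so handle zero or one part directly
  if (parts.length === 0) { return "''"; }
  if (parts.length === 1) { return parts[0]; }
  return "concat(" + parts.join(",") + ")";
}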
If I'm looking for I'm reading "Harry Potter" I could do the following
var xpathString = cleanStringForXpath( "I'm reading \"Harry Potter\"" );
$x("//*[text()="+ xpathString +"]");
// The xpath created becomes
// //*[text()=concat('I',"'",'m reading ','"','Harry Potter','"')]
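Since the approach translates easily to other languages, here is a rough Ruby rendering of the same idea. The method name `xpath_literal` is made up for this sketch; it only builds the expression string and does not evaluate it against a document.

```ruby
# Builds an XPath 1.0 string expression for arbitrary text.
def xpath_literal(str)
  # Split into runs without quotes and single quote characters
  parts = str.scan(/[^'"]+|['"]/).map do |part|
    case part
    when "'" then %q("'")   # a lone apostrophe, wrapped in double quotes
    when '"' then %q('"')   # a lone double quote, wrapped in single quotes
    else "'#{part}'"        # quote-free text, wrapped in single quotes
    end
  end
  return "''" if parts.empty?
  return parts.first if parts.size == 1 # concat() needs at least two arguments
  "concat(#{parts.join(',')})"
end

puts xpath_literal(%q(I'm reading "Harry Potter"))
# concat('I',"'",'m reading ','"','Harry Potter','"')
```

The two early returns cover inputs that split into fewer than two parts, where a bare `concat(...)` call would not be valid XPath.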
Here's a (much shorter) Java version. It's exactly the same as JavaScript, if you remove type information. Thanks to https://stackoverflow.com/users/1850609/acdcjunior
String escapedText = "concat('" + originalText.replace("'", "', \"'\", '") + "', '')";
In XPath 2.0 and XQuery 1.0, the delimiter of a string literal can be included in the string literal by doubling it:
let $a := "He said ""I won't"""
or
let $a := 'He said "I can''t"'
The convention is borrowed from SQL.
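If the query is assembled in application code, the doubling rule is easy to apply mechanically. A minimal sketch, assuming an XPath 2.0 or XQuery processor on the other end; the helper name `xpath2_literal` is hypothetical.

```ruby
# Wraps text in single quotes and doubles any apostrophes inside it
# (valid for XPath 2.0 / XQuery 1.0 only, not for XPath 1.0).
def xpath2_literal(str)
  "'" + str.gsub("'", "''") + "'"
end

puts xpath2_literal(%q(He said "I can't"))
# 'He said "I can''t"'
```

Double quotes need no treatment here because the literal is delimited by single quotes.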
This is an example:
/*/*[contains(., "'") and contains(., '"') ]/text()
When this XPath expression is applied on the following XML document:
<text>
<t>I'm reading "Harry Potter"</t>
<t>I am reading "Harry Potter"</t>
<t>I am reading 'Harry Potter'</t>
</text>
the wanted, correct result (a single text node) is selected:
I'm reading "Harry Potter"
Here is verification using the XPath Visualizer (A free and open source tool I created 12 years ago, that has taught XPath the fun way to thousands of people):
Your problem may be that you are not able to specify this XPath expression as string in the programming language that you are using -- this isn't an XPath problem but a problem in your knowledge of your programming language.
Additionally, if you were using XQuery instead of XPath, as the title says, you could also use the XML entities:
&quot; for double and &apos; for single quotes
They also work within single quotes.
You can do this using a regular expression. For example (as TypeScript code):
export function escapeXPathString(str: string): string {
str = str.replace(/'/g, `', "'", '`);
return `concat('${str}', '')`;
}
This replaces all ' in the input string by ', "'", '.
The final , '' is important because concat('string') is an error.
Well, I was on the same quest, and after a while I found that there is no support in XPath for this. Quite disappointing! But we can always work around it.
I wanted something simple and straightforward. What I came up with is to choose your own replacement for the apostrophe, a unique code that you will not encounter in your XML text; I chose //apos//, for example. You put that in both your XML text and your XPath query (for XML you didn't write yourself, you can use the replace function of any editor). Then you search normally with it, retrieve the result, and replace //apos// back with '.
Below are some samples from what I was doing (replace_special_char_xpath() is the function you need to write):
function replace_special_char_xpath($str){
    $str = str_replace("//apos//","'",$str);
    /* add all replacements here */
    return $str;
}
function xml_lang($xml_file,$category,$word,$language){ // path can be relative or absolute
    $language = str_replace("-","_",$language); // replace - with _ to be able to use "en-us", ...
    $xml = simplexml_load_file($xml_file);
    $xpath_result = $xml->xpath("${category}/def[en_us = '${word}']/${language}");
    $result = $xpath_result[0][0];
    return replace_special_char_xpath($result);
}
the text in xml file:
<def>
<en_us>If you don//apos//t know which server, Click here for automatic connection</en_us>
<fr_fr>Si vous ne savez pas quelle serveur, Cliquez ici pour une connexion automatique</fr_fr>
<ar_sa>إذا لا تعرفوا أي سرفير, إضغطوا هنا من أجل إتصال تلقائي</ar_sa>
</def>
and the call in the php file (generated html):
<span><?php echo xml_lang_body("If you don//apos//t know which server, Click here for automatic connection")?></span>

Ruby Regular Expression: Setting $1 variable in a hash

Everything in this code works properly, except the contents of the $1 variable aren't being properly displayed. According to my tests, all the matching is being done properly, I am just having trouble figuring out how to actually output the contents of $1.
codeTags = {
/\[b\](.+?)\[\/b\]/m => "<strong>#{$1}</strong>",
/\[i\](.+?)\[\/i\]/m => "<em>#{$1}</em>"
}
regexp = Regexp.new(/(#{Regexp.union(codeTags.keys)})/)
message = (message).gsub(/#{regexp}/) do |match|
codeTags[codeTags.keys.select {|k| match =~ Regexp.new(k)}[0]]
end
return message.html_safe
Thank you!
As soon as you do this:
codeTags = {
/\[b\](.+?)\[\/b\]/m => "<strong>#{$1}</strong>",
/\[i\](.+?)\[\/i\]/m => "<em>#{$1}</em>"
}
The #{$1} bits in the values are interpolated using whatever happens to be in $1 at the time. The values will most likely be "<strong></strong>" and "<em></em>" and those aren't very useful.
And regexp is already a regular expression object so gsub(/#{regexp}/) should be just gsub(regexp). Similar things apply to the keys of codeTags, they're already regular expression objects so you don't need to Regexp.new(k).
I'd change the whole structure, you're overcomplicating things. Just something simple like this would be fine for only two replacements:
message = message.gsub(/\[b\](.*?)\[\/b\]/) { '<strong>' + $1 + '</strong>' }
message = message.gsub(/\[i\](.*?)\[\/i\]/) { '<em>' + $1 + '</em>' }
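If a lookup table is still preferred, one possible variation is to store single-quoted replacement templates containing `\1`, which gsub expands at substitution time rather than when the hash is built. This is only a sketch: the constant `CODE_TAGS` and the method `htmlify` are names chosen here, and the tags are applied one after another rather than in a single combined pattern.

```ruby
# Double quotes interpolate immediately, using whatever $1 holds right now
# (nil in a fresh script), so the capture is lost before gsub ever runs.
eager = { /\[b\](.+?)\[\/b\]/m => "<strong>#{$1}</strong>" }
puts eager.values.first

# Single-quoted '\1' stays as a template for gsub to fill in later.
CODE_TAGS = {
  /\[b\](.+?)\[\/b\]/m => '<strong>\1</strong>',
  /\[i\](.+?)\[\/i\]/m => '<em>\1</em>'
}

# Apply each pattern in turn, feeding the result into the next substitution.
def htmlify(message)
  CODE_TAGS.reduce(message) { |text, (pattern, html)| text.gsub(pattern, html) }
end

puts htmlify('Where [b]is[/b] pancakes [b]house [i]and[/i] more[/b] stuff?')
```

Because each tag type gets its own pass, an [i] pair nested inside a [b] pair is handled without any recursion; same-tag nesting is still not covered.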
If you try to do it all at once you'll have problems with nesting in something like this:
message = 'Where [b]is[/b] pancakes [b]house [i]and[/i] more[/b] stuff?'
You'd end up having to use a recursive gsub and possibly some lambdas if you wanted to properly handle things like that with a single expression.
There are better things to spend your time on than trying to be clever on something like this.
Response to comments: If you have more bb-tags and some smilies to worry about and several messages per page then you should HTMLify each message when you create it. You could store only the HTML version or both HTML and BB-Code versions if you want the BB-Code stuff around for some reason. This way you'd only pay for the HTMLification once per message and producing your big lists would be nearly free.

How can I search for a text and fill/click on a link with Selenium?

Here's the deal:
Is there a way to search for an input by a name or type that is not exact, and fill it?
For example, I want to fill any input named email with my email address, but I may have inputs named email-123, emailemail, emails, etc. Is there a way to do something like * email * ?
And how can I click on a link by checking for some text that could be in the link, above the link, close to it, in its class, etc.?
PS: I'm using Selenium IDE with Firefox.
You can use XPath to find it with something like //input[contains(@name,'email')]. If you have multiple instances like that on the page, it will be worth moving your test to your favourite programming language and then doing
emailInstances = sel.get_xpath_count("//input[contains(@name,'email')]")
for i in range(int(emailInstances)):
    # XPath positions are 1-based; the parentheses index into the whole result set
    sel.type("xpath=(//input[contains(@name,'email')])[" + str(i + 1) + "]", "email@address.tld")
XPath works well and the solution above is good. If you are trying to test old versions of IE, you could also use JavaScript injection. I find it is very fast, although it can be a bit trickier to debug. I didn't actually check whether the code below works, but hopefully it gives you an idea of what you can do:
String javaScript = "_sl_enterEmailStr = function(parentObj,str) { "+
" var allTags = parentObj.getElementsByTagName('input'); "+
" for (var i = 0; i < allTags.length; ++i) { "+
" var tag = allTags[i]; "+
" if (tag.name && tag.type && tag.type === 'text' "+
" && tag.name.match(/email/)) { "+
" tag.value = str; "+
" } "+
" } "+
"}; "+
"_sl_enterEmailStr(this.browserbot.getCurrentWindow().document "+
" ,'myemail@mydomain.org'); ";
mySelenium.getEval(javaScript);
I find JavaScript injection with regular expressions allows me to do great things to dynamic input fields. Note you can use findElement() to be more specific about where you look for tags.
Regarding clicking a link and getting text, those are simple click() and getText() operations that can be done given the proper locator. I would check out the Selenium API documentation, for example the Java one for 1.0b2.

Resources