How to rewrite a regexp (?<!...) in ruby1.8. (reimplement wpautop function) - ruby

I'm writing a blog archive converter in Ruby. In order to convert WordPress post content to HTML, I must reimplement WordPress's wpautop() function.
Original wpautop() function : http://pastebin.com/BzV8bXxQ
My Ruby implementation: https://github.com/chloerei/blog_converter/blob/master/lib/blog_converter/adaptor/wordpress.rb , see Wordpress#wpautop_filter
It works fine in Ruby 1.9.2, but in 1.8.7 it throws an error:
blog_converter/lib/blog_converter/adaptor/wordpress.rb:147: undefined (?...) sequence: /(?<!<br \/>)\s*\n/
The sources
// In php
$pee = preg_replace('|(?<!<br />)\s*\n|', "<br />\n", $pee); // optionally make line breaks
# In ruby
string.gsub!(%r|(?<!<br />)\s*\n|, "<br />\n") # optionally make line breaks
After some searching, I found that Ruby 1.8.7 doesn't ship the new regexp engine, Oniguruma, so it doesn't support the newer regexp syntax such as look-behind.
So I think I have two options:
Add a dependency on 'oniguruma' when running on Ruby < 1.9.0
Rewrite /(?<!<br \/>)\s*\n/ in the old syntax
Which option is better? And how should I rewrite this regexp?

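If look-behind assertions are not supported, you can do it like this (warning: not tested, I don't have 1.8). Making the <br /> an optional part of the match means an existing <br /> is consumed and then written back by the replacement, so it is not duplicated: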
string.gsub!(%r|(<br />)?\s*\n|, "<br />\n")

Try this .gsub!(%r|(<br />)?\s*\n|, "<br />\n")
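A minimal sketch of how the optional-group rewrite behaves, using only syntax that the Ruby 1.8 engine accepts. The helper name add_line_breaks and the sample text are made up for illustration; they are not part of the original converter.

```ruby
# Ruby 1.8-compatible pattern: the optional group swallows an existing <br />
# so that the replacement writes it back exactly once.
BR_PATTERN = %r|(<br />)?\s*\n|

# Hypothetical helper: append <br /> to every line end that lacks one.
def add_line_breaks(text)
  text.gsub(BR_PATTERN, "<br />\n")
end

sample = "first line\nsecond line<br />\nthird line"
result = add_line_breaks(sample)
puts result
```

One difference from the look-behind version is worth keeping in mind: here any whitespace between an existing <br /> and the newline is absorbed into the match, so it is dropped from the output.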

Related

Turkish character not rendering properly

I have a problem in Perl with a Turkish character. I have set the Turkish character set but my Turkish character isn't displaying properly.
This is my Perl script.
#!"c:\xampp\perl\bin\perl.exe"
use strict;
use warnings;
use CGI qw(:standard);
my $name = param('name');
my $surname = param('surname');
my $age = param('age');
my $gender = param('gender');
my $q = new CGI;
# We're outputting plain text, not HTML
print $q->header(-content_type => 'text/plain' , -charset => 'ISO-8859-9');
my $text = $name." ".$surname." ".$age." ".$gender." kaydı, sistemde yapılacak olan güncellemelerden sonra sisteme başarıyla eklenecektir.";
# printf("%s %s %d %d kaydı, sistemde yapılacak olan güncellemelerden sonra sisteme başarıyla eklenecektir.", $name , $surname , $age , $gender);
print $text;
How can I fix the problem?
First and foremost: Do not use the HTML generation functionality in CGI.pm. Using those leads to an unmaintainable mess of a presentation layer. Instead, use Template Toolkit based templates to separate presentation and functionality.
Second, do not use indirect object notation. That is, do not write:
my $cgi = new CGI;
Instead, write
my $cgi = CGI->new;
The use of $q or $query to refer to the CGI object is weird and has its roots in the early days of the WWW. There is no reason to perpetuate it when you are learning things from scratch.
In addition, given that you have just instantiated an object, don't use plain subs such as param and don't pollute the namespace of your script. Access parameter values using:
my $value = $cgi->param('surname');
Next, if you are going to use "interesting" characters in your source code such as Ş, save your source code as UTF-8 and specify
use utf8;
at the top of your script.
Finally, do also save all your HTML templates in UTF-8 and generate all output from your script encoded in UTF-8 and specify the document encoding as UTF-8. All other paths lead to insanity.
Also, don't use sexual as a parameter name. Use nouns as parameter and variable names. And, most infuriating to me as a Turk, Mr. (Bay) and Ms. (Bayan) are just titles and they are not appropriate choices for an input field asking about the sex of the respondent.
See also How can I deal with diverse gender identities in user profiles? That may not be on your radar in Turkey at the moment, but you will eventually encounter the issue.
Here is an untested script that might work for you:
#!"c:\xampp\perl\bin\perl.exe"
use utf8;
use strict;
use warnings;
use warnings qw(FATAL utf8);
# Provides only object oriented interface
# without HTML generation cruft
use CGI::Simple;
run( CGI::Simple->new );
sub run {
my $cgi = shift;
binmode STDOUT, ':encoding(UTF-8)';
my ($name, $surname, $age, $gender) = map $cgi->param($_), qw(name surname age gender);
print $cgi->header(-type => 'text/plain', -charset => 'utf-8');
printf("%s %s %d %d kaydı, sistemde yapılacak olan güncellemelerden sonra sisteme başarıyla eklenecektir.\n",
$name , $surname , $age , $gender);
return;
}
I think you are misunderstanding the purpose of the charset attribute on the Content-type header. Your program is emitting this header:
Content-Type: text/plain; charset=ISO-8859-9
This says to an HTTP client (a browser, for example) "I am going to send you some plain text which is encoded as ISO-8859-9". But it's important to note that the header is purely informational. It tells the world that your text is encoded as ISO-8859-9. It does not do the encoding for you. That is up to you.
This is why Borodin and others have been asking you questions which you have not been answering. Most editors will create text that is encoded either as ISO-8859-1 or UTF-8. Unless you have a special Turkish editor or you have changed the configuration of your editor, it seems very unlikely to me that you are producing text in ISO-8859-9.
If you are determined to emit ISO-8859-9 text, then you need to do that encoding yourself. You can use the encode() function from the Encode module to do that.
use Encode;
my $encoded_text = encode('ISO-8859-9', $text);
print $encoded_text;
But I wonder why you want to use a relatively obscure encoding like ISO-8859-9. UTF-8 covers all of the characters used in Turkish. Why not use that instead? Your life will become far easier if you embrace the same standards as the web.
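The point that the charset header only labels the bytes, and that the encoding step is a separate action, can be checked in any language. As a rough sketch in Ruby (not the Perl from the question), the same word produces different bytes depending on which encoding is actually applied; the variable names are illustrative only.

```ruby
# "kaydı" written with an escape so the example does not depend on how
# this file itself is saved; Ruby stores such a literal as UTF-8.
text = "kayd\u0131"

utf8_bytes   = text.bytes                       # bytes as held in memory (UTF-8)
latin5_bytes = text.encode("ISO-8859-9").bytes  # bytes after explicit transcoding

puts utf8_bytes.inspect
puts latin5_bytes.inspect
```

The dotless ı takes two bytes in UTF-8 and one byte in ISO-8859-9, so a client that is told one charset but receives the other will show the wrong characters.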
As an aside, you have introduced a small strangeness in your code. You use CGI.pm in "functions" mode and load it in a way which imports a number of its functions into your namespace.
use CGI qw(:standard);
And then you use the param() function a few times in this way. But after that you create a CGI query object in order to call the header() method on it.
my $q = new CGI;
print $q->header(...);
You probably don't realise it, but the header() function is included in the :standard set of imports, so you can call it without creating a CGI object.
print header(...);
I used it that way in my answer to your previous question. I'm not sure why you changed the code to make it more complicated.
I should also point out that if you do want to create a CGI query object, then you shouldn't use the indirect object notation:
my $q = new CGI;
This will cause you problems at some point. Far better to write:
my $q = CGI->new;
(As demonstrated in the CGI.pm documentation)

coffeescript syntax error "unexpected REGEX"

I'm trying to convert jQuery code into CoffeeScript, but I'm getting a syntax error:
SyntaxError: unexpected REGEX
This is my code:
container = document.querySelector('#style-container');
msnry = new Masonry( container, {
// options
columnWidth: 200
itemSelector: '.item'
});
What am I doing wrong?
Thanks!
That's not CoffeeScript. This is CoffeeScript:
container = document.querySelector "#style-container"
msnry = new Masonry(container,
columnWidth: 200
itemSelector: ".item"
)
You can convert JavaScript to CoffeeScript using this tool.
The specific error is referring to the comment tag. // doesn't mean a comment in CoffeeScript, so it falls back to an empty regular expression. A more useful regular expression would be /[0-9]+/, however the contents are optional in CoffeeScript.
// this is a JS comment
# this is a CS comment
The error is you are using // for a comment instead of #.
In addition to that, your example still looks more like JavaScript than CoffeeScript, but that's the specific error you are getting. See also http://js2coffee.org/
CoffeeScript comments start with #, instead of //. As noted above the // is used for a blank regex. When learning CoffeeScript, I recommend http://coffeescript.org/ and the Try CoffeeScript tool, so that you can see the JavaScript that your CoffeeScript would give rise to.

Converting Jsonp to Json in different methods

I have been trying to use JSONP data as JSON in a Ruby project.
From your experience, how did you address this?
JSONP is easy to handle. It's just JSON in a minor wrapper, and that wrapper is easy to strip off:
require 'open-uri'
require 'json'
URL = 'http://www.google.com/dictionary/json?callback=a&sl=en&tl=en&q=epitome'
jsonp = URI.open(URL).read
jsonp now contains the result in JSONP format:
jsonp[0, 3] # => "a({"
jsonp[-11 ... -1] # => "},200,null"
Those extraneous parts, a( and ,200,null), are the trouble spots when passing the data to JSON for parsing, so we strip them.
A simple, greedy, regex is all that's needed. /{.+}/ will find everything wrapped by the outermost curly-braces and return it, which is all the JSON needs:
data = JSON.parse(jsonp[/{.+}/])
data['query'] # => "epitome"
data['primaries'].size # => 1
From my experience, one way is to use this regex to filter out the function callback name:
/(\{.*\})/m
Or, the lazy way would be to find the index of the first occurrence of "(" and take the substring from there up to the last character, which would be a ")".
I have been looking for answers on here and didn't find a solid one, so I hope this helps.
Cheers
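Both approaches can be tried offline. The sketch below uses a hand-written JSONP string as a stand-in for a real response body (the payload is invented, shaped like the one described above) and shows the multiline regex next to the index-based variant.

```ruby
require 'json'

# Hypothetical JSONP payload standing in for a real HTTP response body.
jsonp = %Q|a({"query":"epitome",\n"primaries":[{"type":"headword"}]},200,null)|

# Option 1: greedy regex with /m so "." also crosses newlines;
# it grabs everything from the first "{" to the last "}".
data = JSON.parse(jsonp[/\{.*\}/m])

# Option 2: take everything between the first "(" and the last ")".
# The extra ",200,null" arguments are still there, so wrap the argument
# list in brackets and parse it as a JSON array.
inner = jsonp[(jsonp.index('(') + 1)...jsonp.rindex(')')]
args  = JSON.parse("[#{inner}]")

puts data['query']
puts args.length
```

Option 2 keeps the trailing callback arguments available, which may or may not matter depending on the API.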

XQuery looking for text with 'single' quote

I can't figure out how to search for text containing single quotes using XPath.
For example, I've added a quote to the title of this question. The following line
$x("//*[text()='XQuery looking for text with 'single' quote']")
Returns an empty array.
However, if I try the following
$x("//*[text()=\"XQuery looking for text with 'single' quote\"]")
It does return the link for the title of the page, but I would like to be able to accept both single and double quotes in there, so I can't just tailor it for the single/double quote.
You can try it in Chrome's or Firebug's console on this page.
Here's a workaround (thanks, Dimitre Novatchev) that allows me to search for any text in XPath expressions, whether it contains single or double quotes. It is implemented in JS, but could easily be translated to other languages.
function cleanStringForXpath(str) {
var parts = str.match(/[^'"]+|['"]/g);
parts = parts.map(function(part){
if (part === "'") {
return '"\'"'; // output "'"
}
if (part === '"') {
return "'\"'"; // output '"'
}
return "'" + part + "'";
});
return "concat(" + parts.join(",") + ")";
}
If I'm looking for I'm reading "Harry Potter" I could do the following
var xpathString = cleanStringForXpath( "I'm reading \"Harry Potter\"" );
$x("//*[text()="+ xpathString +"]");
// The xpath created becomes
// //*[text()=concat('I',"'",'m reading ','"','Harry Potter','"')]
Here's a (much shorter) Java version. It's exactly the same as JavaScript, if you remove type information. Thanks to https://stackoverflow.com/users/1850609/acdcjunior
String escapedText = "concat('"+originalText.replace("'", "', \"'\", '") + "', '')";
In XPath 2.0 and XQuery 1.0, the delimiter of a string literal can be included in the string literal by doubling it:
let $a := "He said ""I won't"""
or
let $a := 'He said "I can''t"'
The convention is borrowed from SQL.
This is an example:
/*/*[contains(., "'") and contains(., '"') ]/text()
When this XPath expression is applied on the following XML document:
<text>
<t>I'm reading "Harry Potter"</t>
<t>I am reading "Harry Potter"</t>
<t>I am reading 'Harry Potter'</t>
</text>
the wanted, correct result (a single text node) is selected:
I'm reading "Harry Potter"
Here is verification using the XPath Visualizer (A free and open source tool I created 12 years ago, that has taught XPath the fun way to thousands of people):
Your problem may be that you are not able to specify this XPath expression as string in the programming language that you are using -- this isn't an XPath problem but a problem in your knowledge of your programming language.
Additionally, if you were using XQuery, instead of XPath, as the title says, you could also use the xml entities:
&quot; for double and &apos; for single quotes
they also work within single quotes
You can do this using a regular expression. For example (as TypeScript, since the signature carries type annotations):
export function escapeXPathString(str: string): string {
str = str.replace(/'/g, `', "'", '`);
return `concat('${str}', '')`;
}
This replaces all ' in the input string by ', "'", '.
The final , '' is important because concat('string') is an error.
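The same concat() idea can be sketched in Ruby. The function name xpath_literal is hypothetical; it falls back to concat() only when the text contains both quote characters, and otherwise simply picks the delimiter that does not occur in the text.

```ruby
# Hypothetical helper: turn arbitrary text into an XPath 1.0 string expression.
def xpath_literal(str)
  # No single quote inside: a single-quoted literal is enough.
  return "'#{str}'" unless str.include?("'")
  # Single quotes but no double quotes: use a double-quoted literal.
  return "\"#{str}\"" unless str.include?('"')
  # Both kinds present: split on ' and glue the pieces back with "'".
  # The -1 limit keeps trailing empty pieces, so concat() always
  # receives at least three arguments.
  parts = str.split("'", -1).map { |piece| "'#{piece}'" }
  "concat(#{parts.join(%q{, "'", })})"
end

expr = xpath_literal(%q{I'm reading "Harry Potter"})
puts "//*[text()=#{expr}]"
```

Because concat() is only used when an apostrophe is present, it never ends up with a single argument, so the trailing '' from the version above is not needed here.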
Well, I was on the same quest, and after a while I found that there is no support in XPath for this, which is quite disappointing! But we can always work around it.
I wanted something simple and straightforward. What I came up with is to define your own replacement for the apostrophe: a unique code that you will not otherwise encounter in your XML text. I chose //apos//, for example. You put that in both your XML text and your XPath query. (If you didn't write the XML yourself, you can apply it with the replace function of any editor.) Then you search as usual, retrieve the result, and replace //apos// back with '.
Below are some samples from what I was doing (replace_special_char_xpath() is the function you need to write):
function replace_special_char_xpath($str){
$str = str_replace("//apos//","'",$str);
/*add all replacement here */
return $str;
}
function xml_lang($xml_file,$category,$word,$language){ //path can be relative or absolute
$language = str_replace("-","_",$language);// to replace - with _ to be able to use "en-us", .....
$xml = simplexml_load_file($xml_file);
$xpath_result = $xml->xpath("${category}/def[en_us = '${word}']/${language}");
$result = $xpath_result[0][0];
return replace_special_char_xpath($result);
}
the text in xml file:
<def>
<en_us>If you don//apos//t know which server, Click here for automatic connection</en_us> <fr_fr>Si vous ne savez pas quelle serveur, Cliquez ici pour une connexion automatique</fr_fr> <ar_sa>إذا لا تعرفوا أي سرفير, إضغطوا هنا من أجل إتصال تلقائي</ar_sa>
</def>
and the call in the php file (generated html):
<span><?php echo xml_lang_body("If you don//apos//t know which server, Click here for automatic connection")?>

is there a way to get codeigniter to create a plain text page where \n char works properly?

I am trying to use Vanilla forum with ProxyConnect, and it requires that I give it a plain text file of my user's authentication cookie values in the form below. However, the only way I can get it to display correctly is with <br /> tags, and it needs to use the \n character instead.
document should be:
UniqueID=5
Name=Kyle
Email=email#email.com
I can get it to display like that with br tags, but when I use \n characters they show up like this:
UniqueID=5\nName=Kyle\nEmail=email#email.com
here is the method in the controller
function get_user_info(){
if(!empty($this->user)){
printf('UniqueID=' . $this->user['userID'] . '\n');
printf('Name=' . $this->user['userFirst'] . '\n');
printf('Email=' . $this->user['userEmail'] . '\n');
}
}
Try using "\n" with double quotes instead. Escape sequences such as \n are not expanded when they occur in single-quoted strings.
Example
printf('Name=' . $this->user['userFirst'] . "\n");
Along with what rkj has suggested above, you need to output the page as plain text. To do this, add this to the beginning of your controller's function:
$this->output->set_header("Content-Type: text/plain");
