Currently in L4 you can't get a slug from a Cyrillic string. In L3 there was an ASCII array for that. Where and how can I add this array, or the ability to create a slug from a Cyrillic string?
EDIT
The library https://github.com/cocur/slugify is a good option, but in L4 I decided to use a custom Slug library built from the L3 methods and ASCII array. Now I have a working slug maker in L4, just like in L3.
You can install this library (https://github.com/cocur/slugify) via Composer and use it.
It's super easy to install and use.
I faced this problem when working with Arabic, so I wrote the following function, which solved it for me.
function make_slug($string = null, $separator = "-") {
    if (is_null($string)) {
        return "";
    }
    // Remove spaces from the beginning and from the end of the string
    $string = trim($string);
    // Lower case everything
    // using mb_strtolower() function is important for non-Latin UTF-8 string | more info: http://goo.gl/QL2tzK
    $string = mb_strtolower($string, "UTF-8");
    // Make alphanumeric (removes all other characters)
    // this makes the string safe especially when used as a part of a URL
    // this keeps Latin characters and Arabic characters as well
    // (the hyphen is escaped so it is not parsed as a range after \s)
    $string = preg_replace("/[^a-z0-9_\s\-ءاأإآؤئبتثجحخدذرزسشصضطظعغفقكلمنهويةى]/u", "", $string);
    // Collapse runs of dashes or whitespace into a single space
    $string = preg_replace("/[\s-]+/", " ", $string);
    // Convert whitespaces and underscore to the given separator
    $string = preg_replace("/[\s_]/", $separator, $string);
    return $string;
}
This function solves the problem only for Arabic. To handle Cyrillic or any other language, add that language's characters beside, or instead of, the existing Arabic characters ءاأإآؤئبتثجحخدذرزسشصضطظعغفقكلمنهويةى in the character class.
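For comparison, here is a minimal sketch of the same idea in Python rather than PHP. Instead of listing each alphabet by hand, it assumes that any Unicode letter or digit (whatever `\w` matches) is acceptable in a slug, which covers Cyrillic and Arabic alike. The name `make_slug` only mirrors the function above; this is an approximation of its behavior, not a port.

```python
import re


def make_slug(text, separator="-"):
    """Build a URL slug while keeping letters from any script."""
    if text is None:
        return ""
    # Trim and lower-case; str.lower() is Unicode-aware in Python 3.
    text = text.strip().lower()
    # Drop everything except word characters (Unicode letters, digits,
    # underscore), whitespace and hyphens.
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse runs of whitespace, underscores and hyphens into one separator.
    text = re.sub(r"[\s_-]+", separator, text)
    # Avoid leading or trailing separators left over from stripped punctuation.
    return text.strip(separator)
```

With these assumptions, a Cyrillic input such as "Привет, мир!" comes out as "привет-мир". If transliteration to ASCII is wanted instead, a library such as the one linked above is the better fit.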
When a text value contains a newline, does SQLite on a Windows machine save the newline as 0x0D0A or as 0x0A?
Edit: I asked this question because I would like to know if this user defined sqlite function will return the right value if the passed string has a newline in it.
#!/usr/bin/env perl
use DBI;
# ...
# ....
$dbh->sqlite_create_function( 'bit_length', 1, sub {
        use bytes;
        return length( $_[0] );
    }
);
SQLite does not change strings by itself; as long as you don't explicitly change them with some function, they are treated similarly to blobs.
If you have a string like "Hello, world!\n" in your code, it will keep that newline style.
If you read the text from a file, it depends on how your language handles newline conversions in text files, but there are no other places where Perl would implicitly convert newlines.
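The claim is easy to check with a small experiment. The sketch below uses Python's standard `sqlite3` module instead of Perl's DBI; the table name `t` and the function name `byte_length` are made up for illustration. It stores one string with "\r\n" and one with "\n", then reads back both the text and its byte length as computed by a user-defined SQL function.

```python
import sqlite3


def byte_length(value):
    """User-defined SQL function: length of a text value in UTF-8 bytes."""
    if value is None:
        return None
    return len(value.encode("utf-8"))


def stored_byte_lengths(texts):
    """Insert each text, then return (stored_text, byte_length) rows."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("byte_length", 1, byte_length)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO t (body) VALUES (?)", [(t,) for t in texts])
    rows = conn.execute(
        "SELECT body, byte_length(body) FROM t ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

Calling `stored_byte_lengths(["a\r\nb", "a\nb"])` should give lengths 4 and 3: each string comes back with exactly the newline bytes it was inserted with.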
I have a web service in PHP, and I encode the string as UTF-8 like this:
$str_output = mb_convert_encoding("MATEMATİK", "UTF-8");
$data_array = array('name' => $str_output);
echo json_encode($data_array);
I get this string from the web service in Xcode: MATEMAT\u00ddK
I couldn't convert this string to a Turkish string.
My json_dictionary is like this
2014-01-08 16:17:22.274 test_app[6432:70b] {
name = "MATEMAT\U00ddK";
}
I tried this encoding method, but it didn't work for me:
NSString * name = [json_dictionary objectForKey:@"name"];
NSString * correctString = [NSString stringWithCString:[name cStringUsingEncoding:NSUTF8StringEncoding] encoding:NSWindowsCP1254StringEncoding];
I got null.
If I use NSUTF8StringEncoding, I get:
MATEMATÝK
Also I tried NSISOLatin1StringEncoding, NSISOLatin2StringEncoding ...
Thanks...
iOS is correctly decoding the \u00dd when you use NSUTF8StringEncoding (which is what you should be using). That's LATIN CAPITAL LETTER Y WITH ACUTE. The letter you want is LATIN CAPITAL LETTER I WITH DOT ABOVE, which is \u0130.
That suggests the problem is on your PHP side. If I had to guess, I'd suspect that the İ in your source file is not itself in the encoding that PHP expects. You may need to pass the "from" encoding to mb_convert_encoding, depending on what encoding your editor is using.
I would strongly recommend that you stay in UTF-8 entirely if possible, and avoid creating a CP1254 (Turkish) string at all. UTF-8 is capable of encoding all the characters you need. In that case, you may be able to avoid the mb_convert_encoding entirely.
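One way to see where \u00dd could come from is to reproduce the suspected mix-up. The Python sketch below assumes, as a guess consistent with the symptoms, that the source text was saved as CP1254 but read as Latin-1 before being JSON-encoded. In CP1254 the letter İ is the single byte 0xDD, and in Latin-1 that same byte is Ý. The function names are hypothetical.

```python
import json


def simulate_misread(text, actual="cp1254", assumed="latin-1"):
    """Encode with the real encoding, then decode with the wrongly assumed one."""
    return text.encode(actual).decode(assumed)


def repair_misread(text, actual="cp1254", assumed="latin-1"):
    """Undo simulate_misread; only valid if the guess about encodings is right."""
    return text.encode(assumed).decode(actual)


broken = simulate_misread("MATEMATİK")
# JSON output as it would arrive on the client under the assumed mix-up.
broken_json = json.dumps(broken)
# JSON output when the string stays correctly encoded end to end.
correct_json = json.dumps("MATEMATİK")
```

Under that assumption `broken_json` contains `\u00dd` and `correct_json` contains `\u0130`, which matches the answer above. Repairing on the client is possible in this sketch, but fixing the encoding at the source remains the cleaner option.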
I have a list of mobile devices that I'm using to display content correctly. The deprecated function looks like this:
function detectPDA($query){
    $browserAgent = $_SERVER['HTTP_USER_AGENT'];
    $userAgents = $this->getBrowserAgentsToDetect(); // comma separated list of devices
    foreach ( $userAgents as $userAgent ) {
        if(eregi($userAgent,$browserAgent)){
            if(eregi("iphone",$browserAgent) || eregi("ipod",$browserAgent) ){
                $this->iphone = true;
            }else{
                $this->pda = true;
            }
        }
    }
}
What is the correct way to replace the eregi functions?
If all the pattern strings ($userAgent and iphone) can be trusted not to contain special regex chars (()[]!|.^${}?*+), then you just surround the eregi regex with slashes (/) and add an i after the last slash (which means "case insensitive").
So:
eregi($userAgent,$browserAgent) --> preg_match("/$userAgent/i",$browserAgent)
eregi("iphone",$browserAgent) --> preg_match('/iphone/i',$browserAgent)
However, are you just trying to match $userAgent as-is within $browserAgent? For example, if a particular $userAgent was foo.bar, would you want the . to match a literal period, or would you want to interpret it in its regex sense ("match any character")?
If the former, I'd suggest you forgo regex entirely and use stripos($haystack,$needle), which searches for the string $needle in $haystack (case-insensitive). Then you don't need to worry about (say) an asterisk in $userAgent being interpreted in the regex sense instead of the literal sense.
If you do use stripos, don't forget it can return 0 (a match at the very start of the string), which would evaluate to false, so you need to compare with === false or !== false (see the documentation I linked).
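The difference between the two approaches can be sketched in Python, which has the same distinction between a literal substring search and a regex search. The helper names below are hypothetical. Note that `str.find` has a pitfall analogous to stripos: it returns 0 for a match at the start and -1 for no match, so the result must be compared with -1 rather than tested for truthiness.

```python
import re


def contains_literal(haystack, needle):
    """Case-insensitive literal search; regex metacharacters have no meaning."""
    # find() returns 0 for a match at position 0, so compare against -1.
    return haystack.lower().find(needle.lower()) != -1


def contains_pattern(haystack, pattern):
    """Case-insensitive regex search; metacharacters are interpreted."""
    return re.search(pattern, haystack, re.IGNORECASE) is not None


def contains_escaped(haystack, needle):
    """Regex search with the needle escaped, so it matches literally."""
    return contains_pattern(haystack, re.escape(needle))
```

With a needle of "foo.bar", the regex version matches "fooXbar" while the literal and escaped versions do not, which is exactly the ambiguity raised above.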
in the sphinx changelog it says for 0.9.8:
"added query escaping support to query language, and EscapeString() API call"
Can I assume that there should be support for escaping special Sphinx characters (@, !, -, ...) in SphinxQL, too? If so, maybe someone could point me to an example of this. I'm unable to find anything about it in the documentation or elsewhere on the net.
How do you do a fulltext search (using SphinxQL) if the search phrase contains one of the special characters? I don't much like the idea of "masking" them during indexing.
thanks!
The PHP version of the sphinxapi escape function did not work for me in tests. Also, it provides no protection against SQL-injection sorts of characters (e.g. single quote).
I needed this function:
function EscapeSphinxQL ( $string )
{
    $from = array ( '\\', '(',')','|','-','!','@','~','"','&', '/', '^', '$', '=', "'", "\x00", "\n", "\r", "\x1a" );
    $to = array ( '\\\\', '\\\(','\\\)','\\\|','\\\-','\\\!','\\\@','\\\~','\\\"', '\\\&', '\\\/', '\\\^', '\\\$', '\\\=', "\\'", "\\x00", "\\n", "\\r", "\\x1a" );
    return str_replace ( $from, $to, $string );
}
Note the extra backslashes on the Sphinx-specific characters. I think what happens is that they put your whole query through an SQL parser, which removes escape backslashes 'extraneous' for SQL purposes (i.e. '\&' -> '&'). Then, it puts the MATCH clause through the fulltext parser, and suddenly '&' is a special character. So, you need the extra backslashes in the beginning.
There is a corresponding EscapeString function in each API (PHP/Python/Java/Ruby), but to make escaping work with SphinxQL you have to write something similar in your application, because SphinxQL has no such function.
The function itself is a one-liner:
def EscapeString(self, string):
    return re.sub(r"([=\(\)|\-!@~\"&/\\\^\$\=])", r"\\\1", string)
You could easily translate it into the language of your application.
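As one possible translation, here is a Python sketch of the two-layer escaping described in the earlier answer: characters special to the full-text parser get two backslashes, and characters special to the SQL layer get one. The character lists are assumptions copied from the PHP function above (with @ as the field operator), and `escape_sphinxql` is a hypothetical name, not part of any Sphinx API.

```python
# Characters assumed to be special to the Sphinx full-text parser.
SPHINX_SPECIAL = '()|-!@~"&/^$='

# Characters assumed to be special to the SQL layer, with their
# single-backslash escapes.
SQL_SPECIAL = {
    "\\": "\\\\",
    "'": "\\'",
    "\x00": "\\x00",
    "\n": "\\n",
    "\r": "\\r",
    "\x1a": "\\x1a",
}

# One translation table, applied in a single pass so that backslashes
# added for one character are never escaped again.
_TABLE = {ord(ch): "\\\\" + ch for ch in SPHINX_SPECIAL}
_TABLE.update({ord(ch): rep for ch, rep in SQL_SPECIAL.items()})


def escape_sphinxql(text):
    """Escape text for use inside a quoted MATCH('...') argument."""
    return text.translate(_TABLE)
```

Where the client library supports bound parameters, passing the query text as a parameter is usually preferable for the SQL layer; the full-text escaping would still be needed.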
I want to prevent users from writing an empty comment (whitespace, &nbsp;, etc.), so I apply the following:
var.gsub(/^\s+|\s+\z|\s*&nbsp;\s*/, '')
However, a smart user then found a hole by using the \302 and \240 bytes (the UTF-8 encoding of a non-breaking space), so I filtered out these characters too.
Then I ran into a problem when I introduced support for several languages: a word like Déjà vu now causes an error, because part of the à character contains \240. Is there any way to remove the whitespace but leave the Latin characters untouched?
A way around this is to use iconv to discard the invalid Unicode sequences (such as \240 on its own) before using your regexp to remove the whitespace:
require 'iconv'
var1 = "Déjà vu"
var2 = "\240"
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid1 = ic.iconv(var1) # => "D\303\251j\303\240 vu"
valid2 = ic.iconv(var2) # => ""
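The same two steps (drop invalid byte sequences, then strip whitespace at the character level rather than the byte level) can be sketched in Python. This is an illustration under assumptions, not a port of the Ruby code: it assumes comments arrive as raw UTF-8 bytes and that HTML entities such as &nbsp; should count as whitespace. The function names are hypothetical.

```python
import html


def clean_comment(raw):
    """Return the comment text with invalid bytes dropped and edges trimmed."""
    # Discard byte sequences that are not valid UTF-8 (e.g. a lone 0xA0).
    text = raw.decode("utf-8", errors="ignore")
    # Turn entities such as &nbsp; into the characters they stand for.
    text = html.unescape(text)
    # str.strip() works on characters, so it removes U+00A0 (non-breaking
    # space) without touching the bytes inside letters such as "à".
    return text.strip()


def is_blank_comment(raw):
    """True if nothing but whitespace remains after cleaning."""
    return clean_comment(raw) == ""
```

Because the stripping happens after decoding, the 0xA0 byte inside "à" (0xC3 0xA0) is never seen on its own, so "Déjà vu" passes through unchanged.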