I can do a MariaDB fulltext query that searches for the beginning of a word, like this:
select * from mytable
where match(mycol) against ('+test*' in boolean mode)>0.0;
This finds words like "test", "tester", "testing".
If my search string contains special characters, I can put the search string in quotes:
select * from mytable
where match(mycol) against ('+"test-server"' in boolean mode)>0.0;
This will find all rows which contain the string test-server.
But it seems I cannot combine both:
select * from mytable
where match(mycol) against ('+"test-serv"*' in boolean mode)>0.0;
This results in an error:
Error: (conn:7) syntax error, unexpected $end, expecting FTS_TERM or FTS_NUMB or '*'
SQLState: 42000
ErrorCode: 1064
Placing the '*' inside the quoted string will return no results (as expected):
select * from mytable
where match(mycol) against ('+"test-serv*"' in boolean mode)>0.0;
Does anybody know whether this is a limitation of MariaDB? Or a bug?
My MariaDB version is 10.0.31
WHERE MATCH(mycol) AGAINST('+test +serv*' IN BOOLEAN MODE)
AND mycol LIKE '%test_serv%'
The MATCH will find the desired rows plus some that are not desired. Then the LIKE will filter out the duds. Since the LIKE is being applied to only some rows, its slowness is masked.
(Granted, this does not work in all cases. And it requires some manual manipulation.)
For example, for d'Artagnan, use:
WHERE MATCH(mycol) AGAINST("+Arta*" IN BOOLEAN MODE)
AND mycol LIKE '%d\'Artagnan%'
Note that I used the suitable escaping for getting the apostrophe into the LIKE string.
So, the algorithm for your code goes something like this (a rough sketch in code follows the list):
Break the string into "words" the same way FULLTEXT would.
Toss any strings that are too short.
If no words are left, then you cannot use FULLTEXT and are stuck with a slow LIKE.
Stick * after the last word (or each word?).
Build the AGAINST with those word(s).
Add on AND LIKE '%...%' with the original phrase, suitably escaped.
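Here is a rough sketch of that algorithm in Ruby. The helper name (build_fulltext_where), the column name mycol, and the minimum word length of 4 are my own assumptions (check ft_min_word_len / innodb_ft_min_token_size on your server), and a real application should build SQL through its driver's quoting rather than string interpolation:
def build_fulltext_where(phrase, min_word_len = 4)
  words = phrase.scan(/\w+/)                            # break the phrase into "words"
  words = words.reject { |w| w.length < min_word_len }  # toss words that are too short
  like  = "mycol LIKE '%#{phrase.gsub("'", "''")}%'"    # LIKE on the original phrase, apostrophes doubled
  return like if words.empty?                           # no usable words left: slow LIKE only
  against = words.map { |w| "+#{w}*" }.join(' ')        # +word* for each remaining word
  "MATCH(mycol) AGAINST('#{against}' IN BOOLEAN MODE) AND #{like}"
end
puts build_fulltext_where("test-serv")
# => MATCH(mycol) AGAINST('+test* +serv*' IN BOOLEAN MODE) AND mycol LIKE '%test-serv%'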
I have a string field "title" (not analyzed) in Elasticsearch. A document has the title "Garfield 2: A Tail Of Two Kitties (2006)".
When I use the following json to query, no result returns.
{"query":{"term":{"title":"Garfield 2: A Tail Of Two Kitties (2006)"}}}
I tried to escape the colon character and the parentheses, like:
{"query":{"term":{"title":"Garfield 2\\: A Tail Of Two Kitties \\(2006\\)"}}}
Still not working.
The term query won't tokenize or apply analyzers to the search text. Instead, it looks for an exact match, which won't work here because string fields are analyzed/tokenized by default.
To give this a better explanation -
Let's say there is a string value such as "I am in summer:camp".
When this is indexed, it is broken into tokens as below:
"I am in summer:camp" => [ I , am , in , summer , camp ]
Hence even if you do a term search for "I am in summer:camp", it still won't work, as the token "I am in summer:camp" is not present in the index.
Something like a phrase query might work better here.
Or you can set the field's "index" option to "not_analyzed" to make sure the string is not tokenized; both options are sketched below.
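A minimal sketch of both options, assuming an older Elasticsearch release (1.x/2.x, where "string" mappings and "not_analyzed" still exist); the bodies are built as Ruby hashes here only so they can be printed as JSON:
require 'json'
# Option 1: keep the analyzed field and use a phrase query instead of a term query.
phrase_query = { query: { match_phrase: { title: "Garfield 2: A Tail Of Two Kitties (2006)" } } }
puts JSON.generate(phrase_query)
# Option 2: map the field as not_analyzed so the whole title is kept as a single token;
# the original term query will then match it exactly.
mapping = { properties: { title: { type: "string", index: "not_analyzed" } } }
puts JSON.generate(mapping)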
I needed to use Oracle 11g's CONTAINS() function to search a field for some exact text typed by the user. I was asked not to use the 'like' operator.
According to the Oracle documentation, for everything to work you need to:
Double } characters
Put the whole input between {}
This works in most cases, but not all. Below is a test case:
create table theme
(name varchar2(300 char) not null);
insert into theme (name)
values ('a');
insert into theme (name)
values ('b');
insert into theme (name)
values ('a or b');
insert into theme (name)
values ('Pdz344_1_b');
create index name_index on theme(name) indextype is ctxsys.context;
If the 'or' operator were interpreted, I would get all four results, which is hopefully not the case. Now if I run the following, I would expect it to find only 'a or b'.
select * from theme
where contains(name, '{a or b}')>0;
However, I also get 'Pdz344_1_b'. But there's no 'a', 'o' nor 'r' in it, and I find it very surprising that this text is matched. Is there something I don't get about CONTAINS()'s syntax?
CONTAINS is not like the LIKE operator at all: it uses the Oracle Text search engine (something like a Google search) rather than plain string matching.
{} is an escape marker. It means that everything you put inside should be treated as text to escape.
Therefore your query searches for text that looks like the literal phrase a or b, not for a OR b.
So your query matches Pdz344_1_b because it has b in it.
The row with only the a character isn't matched because a is in the default stop list.
Why isn't plain b matched? Because your match sequence actually looks like a\ or\ b.
So we have 3 tokens: a, _or, _b (underscores represent spaces). a is in the stop list, and there is no string _b in the b row, because that row holds only a single character. But we do have this combination in the Pdz344_1_b row, because non-alphabetic characters are treated as whitespace. If you remove the {} or query for {b or a}, then you'll get matches against b as well.
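To make the token argument more concrete, here is a toy tokenizer in Ruby. It is emphatically not Oracle's actual lexer or stop-list handling, just an illustration of how splitting on non-alphanumeric characters leaves a b token inside Pdz344_1_b, while the row b is a single lone token:
def toy_tokens(text)
  text.downcase.split(/[^[:alnum:]]+/).reject(&:empty?)  # split on anything non-alphanumeric
end
toy_tokens('Pdz344_1_b')  #=> ["pdz344", "1", "b"]
toy_tokens('a or b')      #=> ["a", "or", "b"]
toy_tokens('b')           #=> ["b"]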
I have a string as below:
str1='"{\"#Network\":{\"command\":\"Connect\",\"data\":
{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'
I want to extract the somename string from the above. The values of xx:xx:xx:xx:xx:xx, somename and 123456789 can change, but the syntax will remain the same as above.
I saw similar posts on this site but don't know how to use regex in the above case.
Any ideas on how to extract the above value?
Parse the string as JSON and get the values that way.
require 'json'
str = "{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"
json = JSON.parse(str.strip)
name = json["#Network"]["data"]["Name"]
pwd = json["#Network"]["data"]["Pwd"]
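A quick check of what this yields; the .strip matters because Ruby's String#strip also trims NUL characters, so the trailing \0 does not break JSON.parse:
puts name  #=> somename
puts pwd   #=> 123456789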
Since you don't know regexes, let's leave them out for now and try manual parsing, which is a bit easier to understand.
Your original input, without the outer apostrophes and the variable name, is:
"{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"
You say that you need to get the 'somename' value and that the 'grammar will not change'. Cool!
First, look at what delimits that value: it has quotes, then there's a colon to the left and a comma to the right. However, looking at other parts, such a layout is also used near the command and near the pwd. So colon-quote-data-quote-comma is not enough. Looking further to the sides, there's a \"Name\". It never occurs anywhere in the input data except this place. This is just great! It means that we can quickly find the whereabouts of the data just by searching for the \"Name\" text:
inputdata = .....
estposition = inputdata.index('\"Name\"')
raise "well-known marker wa not found in the input" unless estposition
Now, we know:
where the part starts
and that after the "Name" text there's always a colon, a backslash and a quote, and then the interesting data
and that there's always a backslash-quote after the interesting data
let's find all of them:
colonquote = inputdata.index(':\"', estposition)
datastart = colonquote+3
lastquote = inputdata.index('\"', datastart)
dataend = lastquote-1
The index returns the start position of the match, so it would return the position of the : and the position of the \. Since we want to get the text between them, we must add/subtract a few positions to move past the :\" at the beginning or move back from the \" at the end.
Then, fetch the data from between them:
value = inputdata[datastart..dataend]
And that's it.
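Put together as one runnable snippet (the input is the exact string from the question, written on one line, with the outer apostrophes being Ruby's own quoting):
inputdata = '"{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'
estposition = inputdata.index('\"Name\"')
raise "well-known marker was not found in the input" unless estposition
colonquote = inputdata.index(':\"', estposition)  # the :\" right after \"Name\"
datastart  = colonquote + 3                       # step past the three characters of :\"
lastquote  = inputdata.index('\"', datastart)     # the \" that closes the value
dataend    = lastquote - 1
puts inputdata[datastart..dataend]                #=> somename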
Now, step back and look at the input data once again. You say that the grammar is always the same. The various bits are obviously separated by colons and commas. Let's try using that directly:
parts = inputdata.split(/[:,]/)
=> ["\"{\\\"#Network\\\"",
"{\\\"command\\\"",
"\\\"Connect\\\"",
"\\\"data\\\"",
"\n{\\\"Id\\\"",
"\\\"xx",
"xx",
"xx",
"xx",
"xx",
"xx\\\"",
"\\\"Name\\\"",
"\\\"somename\\\"",
"\\\"Pwd\\\"",
"\\\"123456789\\\"}}}\\0\""]
Please ignore the regex for now. Just assume it says "a colon or a comma". Now, in parts you will get all the, well, parts that were detected by cutting the inputdata to pieces at every colon or comma.
If the layout never changes and is always the same, then your interesting data will always be at the 13th place (index 12):
almostvalue = parts[12]
=> "\\\"somename\\\""
Now, just strip the spurious characters. Since the grammar is constant, there are 2 chars to be cut from each side:
value = almostvalue[2..-3]
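Or, as a single line (same assumption: the layout never changes, so the value is always the 13th part):
value = inputdata.split(/[:,]/)[12][2..-3]  #=> "somename"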
OK, another way. Since regexes have already shown up, let's try them. We know:
the data is prefixed with \"Name\", then a colon and a backslash-quote
the data consists of some text without quotes inside (well, at least I guess so)
the data ends with a backslash-quote
the parts in regex syntax would be, respectively:
\"Name\":\"
[^\"]*
\"
together:
inputdata =~ /\\"Name\\":\\"([^\"]*)\\"/
value = $1
Note that I surrounded the interesting part with (), hence after a successful match that part is available in the $1 special variable.
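If you'd rather not rely on the $1 global, the same regex also works with String#match; this is only a stylistic variant of the line above:
if (m = inputdata.match(/\\"Name\\":\\"([^\"]*)\\"/))
  value = m[1]  #=> "somename"
end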
Yet another way:
If you look at the grammar carefully, it really resembles a set of nested hashes:
\"
{ \"#Network\" :
{ \"command\" : \"Connect\",
\"data\" :
{ \"Id\" : \"xx:xx:xx:xx:xx:xx\",
\"Name\" : \"somename\",
\"Pwd\" : \"123456789\"
}
}
}
\0\"
If we'd write something similar as Ruby hashes:
{ "#Network" =>
{ "command" => "Connect",
"data" =>
{ "Id" => "xx:xx:xx:xx:xx:xx",
"Name" => "somename",
"Pwd" => "123456789"
}
}
}
What's the difference? The colon was replaced with =>, and the backslashes before the quotes are gone. Oh, and the opening " and the trailing \0" are gone too. Let's play:
tmp = inputdata[1..-4] # remove the opening " and the trailing \0"
tmp.gsub!('\"', '"') # replace every \" with just "
Now, what about the colons? We cannot just replace : with =>, because that would damage the internal colons of the xx:xx:xx:xx:xx:xx part. But look: all the other colons always have a quote before them!
tmp.gsub!('":', '"=>') # replace every quote-colon with quote-arrow
Now our tmp is:
{"#Network"=>{"command"=>"Connect","data"=>{"Id"=>"xx:xx:xx:xx:xx:xx","Name"=>"somename","Pwd"=>"123456789"}}}
formatted a little:
{ "#Network"=>
{ "command"=>"Connect",
"data"=>
{ "Id"=>"xx:xx:xx:xx:xx:xx","Name"=>"somename","Pwd"=>"123456789" }
}
}
So, it looks just like a Ruby hash. Let's try 'destringizing' it:
packeddata = eval(tmp)
value = packeddata['#Network']['data']['Name']
Done.
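For reference, the whole 'destringizing' route as one runnable snippet (same inputdata as before; note that eval executes arbitrary code, so it is fine for a toy but not for untrusted input):
tmp = inputdata[1..-4]       # drop the opening " and the trailing \0"
tmp = tmp.gsub('\"', '"')    # turn every \" into a plain "
tmp = tmp.gsub('":', '"=>')  # turn quote-colon into quote-arrow
packeddata = eval(tmp)
puts packeddata['#Network']['data']['Name']  #=> somename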
Well, this has grown a bit, and Jonas was obviously faster, so I'll leave the JSON part to him since he wrote it already ;) The data was so similar to a Ruby hash because it was obviously formatted as JSON, which is a hash-like structure too. Using the proper format-reading tools is usually the best idea, but mind that the JSON library, when asked to read the data, will read all of it, and then you can ask it "what was inside at the key xx/yy/zz", just like I showed you with the read-it-as-a-Hash attempt. Sometimes, when your program is very short on its deadline, you cannot afford to read it all. Then, scanning with a regex or scanning manually for "known markers" may (not must) be much faster and thus preferable. But, still, it is much less convenient. Have fun.
I have:
BEFORE Gsub sql ::::
SELECT record_type.* FROM record_type WHERE (name = 'Registrars')
sql = sql.gsub(/SELECT\s+[^\(][A-Z]+\./mi,"SELECT ")
AFTER GSUB SQL ::::
SELECT record_type.* FROM record_type WHERE (name = 'Registrars')
The desired result is to remove the "record_type." prefix from the statement, so after the regex is run it should be:
SELECT * FROM record_type WHERE (name = 'Registrars')
I didn't write this; it's in the asf-soap-adaptor gem. Can someone tell me why it doesn't work, and how to fix it?
I suppose it should be written like this...
sql = sql.gsub(/SELECT\s+[^\(][A-Z_]+\./mi,"SELECT ")
... as the code in the question won't match if the field name contains an _ (underscore) symbol. I suppose that's how this code ended up in the gem: it can work under some conditions (i.e., with underscore-free field names).
Still, I admit I don't understand why exactly this replacement should be done - and shouldn't it include a 0-9 check as well? For example, a 'record_id1' field still won't be matched - and replaced - by the character class in the regular expression; you may have to either expand it, like [0-9A-Z_], or just replace it completely with \w.
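A quick check of the suggested pattern against the statement from the question (plain Ruby, nothing else assumed):
sql = "SELECT record_type.* FROM record_type WHERE (name = 'Registrars')"
puts sql.gsub(/SELECT\s+[^\(][A-Z_]+\./mi, "SELECT ")
#=> SELECT * FROM record_type WHERE (name = 'Registrars')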
So your before and after gsubs are the same? I can't tell you why it doesn't work if you don't tell me your expected result. Also, for help with interpreting Ruby regular expressions, check out rubular.com.