searching "-" in websolr - websolr

websolr is returning
RSolr::Error::Http - 400 Bad Request
Error: Apache Tomcat/6.0.28 - Error report: HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '----': Encountered " "-" "- "" at line 1, column 1.
Was expecting one of:
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
"[" ...
"{" ...
<NUMBER> ...
whenever I try to search for the "-" character.
Other special characters such as ":" work fine. I have tried CGI.escape, but it does not escape these characters.

Have you tried escaping it with backslash?
Normally when you index your documents, the tokenizer will remove dash characters on their own, so you may want to just strip the dash anyway, unless you mean for it to be a negative query.
The full Solr query syntax is here:

As Chris correctly notes, you need to escape the dash with a backslash.
Depending on which query parser you're using, there are some special characters that have meaning. As of this writing, the Lucene (and thus Solr) query parser assigns special meaning to these characters:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
You should refer to the docs for Lucene query parser syntax for their full meaning. The default Solr query parser offers a superset of the Lucene query parser syntax, as described by the SolrQueryParser wiki page.
If you don't want to worry about escaping things, the DisMax Query Parser is designed to accept input that's closer to what a user might type into a search box. I haven't tested the various special characters against it recently, but as a rule it's probably more graceful in the input that it accepts.
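As a minimal sketch of the backslash escaping described above, the following Ruby helper prefixes each of the listed special characters with a backslash before the term is sent to Solr. The names LUCENE_SPECIAL and escape_lucene are hypothetical and not part of RSolr or websolr; check the list against the query parser version you actually run.

```ruby
# Hypothetical helper: escape the Lucene query parser special characters
# listed above by prefixing each one with a backslash.
LUCENE_SPECIAL = Regexp.union(
  ['&&', '||', '+', '-', '!', '(', ')', '{', '}', '[', ']',
   '^', '"', '~', '*', '?', ':', '\\']
)

def escape_lucene(term)
  # "\\#{match}" is a single backslash followed by the matched text.
  term.gsub(LUCENE_SPECIAL) { |match| "\\#{match}" }
end

puts escape_lucene('----')          # prints \-\-\-\-
puts escape_lucene('title:foo-bar') # prints title\:foo\-bar
```

Only escape user-supplied terms this way; a query that is meant to use the operators (for example a negative query with -) should be left alone.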


Replace pattern in string with value from a ruby array

I have a string like this
"base: [_0x3e63[241], _0x3e63[242]],
gray: [_0x3e63[243], _0x3e63[244], _0x3e63[245], _0x3e63[246], _0x3e63[247], _0x3e63[248], _0x3e63[249], _0x3e63[250], _0x3e63[251], _0x3e63[252]],
red: [_0x3e63[253], _0x3e63[254], _0x3e63[255], _0x3e63[256], _0x3e63[257], _0x3e63[258], _0x3e63[259], _0x3e63[260], _0x3e63[261], _0x3e63[262]],
pink: [_0x3e63[263], _0x3e63[264], _0x3e63[265], _0x3e63[266], _0x3e63[267], _0x3e63[268], _0x3e63[269], _0x3e63[270], _0x3e63[271], _0x3e63[272]],
grape: [_0x3e63[273], _0x3e63[274], _0x3e63[275], _0x3e63[276], _0x3e63[277], _0x3e63[278], _0x3e63[279], _0x3e63[280], _0x3e63[281], _0x3e63[282]],
violet: [_0x3e63[283], _0x3e63[284], _0x3e63[285], _0x3e63[286], _0x3e63[287], _0x3e63[288], _0x3e63[289], _0x3e63[290], _0x3e63[291], _0x3e63[292]],
indigo: [_0x3e63[293], _0x3e63[294], _0x3e63[295], _0x3e63[296], _0x3e63[297], _0x3e63[298], _0x3e63[299], _0x3e63[300], _0x3e63[301], _0x3e63[302]],
blue: [_0x3e63[303], _0x3e63[304], _0x3e63[305], _0x3e63[306], _0x3e63[307], _0x3e63[308], _0x3e63[309], _0x3e63[310], _0x3e63[311], _0x3e63[312]],
cyan: [_0x3e63[313], _0x3e63[314], _0x3e63[315], _0x3e63[316], _0x3e63[317], _0x3e63[318], _0x3e63[319], _0x3e63[320], _0x3e63[321], _0x3e63[322]],
teal: [_0x3e63[323], _0x3e63[324], _0x3e63[325], _0x3e63[326], _0x3e63[327], _0x3e63[328], _0x3e63[329], _0x3e63[330], _0x3e63[331], _0x3e63[332]],
green: [_0x3e63[333], _0x3e63[334], _0x3e63[335], _0x3e63[336], _0x3e63[337], _0x3e63[338], _0x3e63[339], _0x3e63[340], _0x3e63[341], _0x3e63[342]],
lime: [_0x3e63[343], _0x3e63[344], _0x3e63[345], _0x3e63[346], _0x3e63[347], _0x3e63[348], _0x3e63[349], _0x3e63[350], _0x3e63[351], _0x3e63[352]],
yellow: [_0x3e63[353], _0x3e63[354], _0x3e63[355], _0x3e63[356], _0x3e63[357], _0x3e63[358], _0x3e63[359], _0x3e63[360], _0x3e63[361], _0x3e63[362]],
orange: [_0x3e63[363], _0x3e63[364], _0x3e63[365], _0x3e63[366], _0x3e63[367], _0x3e63[368], _0x3e63[369], _0x3e63[370], _0x3e63[371], _0x3e63[372]]"
_0x3e63 is a ruby array with the values.
_0x3e63 = ["#f783ac", "#faa2c1", "#fcc2d7", "#ffdeeb", "#fff0f6", "#862e9c", "#9c36b5", "#ae3ec9", "#be4bdb", "#cc5de8", "#da77f2", "#e599f7", "#eebefa", "#f3d9fa", "#f8f0fc", "#5f3dc4", "#6741d9", "#7048e8", "#7950f2", "#845ef7", "#9775fa", "#b197fc", "#d0bfff", "#e5dbff", "#f3f0ff", "#364fc7", "#3b5bdb", "#4263eb", "#4c6ef5", "#5c7cfa", "#748ffc", "#91a7ff", "#bac8ff", "#dbe4ff", "#edf2ff", "#1864ab", "#1971c2", "#1c7ed6", "#228be6", "#339af0", "#4dabf7", "#74c0fc", "#a5d8ff", "#d0ebff", "#e7f5ff", "#0b7285", "#0c8599", "#1098ad", "#15aabf", "#22b8cf", "#3bc9db", "#66d9e8", "#99e9f2", "#c5f6fa", "#e3fafc", "#087f5b", "#099268", "#0ca678", "#12b886", "#20c997", "#38d9a9", "#63e6be", "#96f2d7", "#c3fae8", "#e6fcf5", "#2b8a3e", "#2f9e44", "#37b24d", "#40c057", "#51cf66", "#69db7c", "#8ce99a", "#b2f2bb", "#d3f9d8", "#ebfbee", "#5c940d", "#66a80f", "#74b816", "#82c91e", "#94d82d", "#a9e34b", "#c0eb75", "#d8f5a2", "#e9fac8", "#f4fce3", "#e67700", "#f08c00", "#f59f00", "#fab005", "#fcc419", "#ffd43b", "#ffe066", "#ffec99", "#fff3bf", "#fff9db", "#d9480f", "#e8590c"]
I cannot find a way to find each _0x3e63[xxxxxxx] in the string and replace it with the corresponding value from the array.
Use String#gsub with a block.
Assuming your input string is stored in the variable input, the following code does the replacement and displays the result:
puts input.gsub(/_0x3e63\[(\d+)\]/){|s| _0x3e63[$1.to_i]}
(The array _0x3e63 you posted in the question does not contain enough values to have indices like 247 or 251 but the code works nevertheless.)
The code is very simple. The regular expression /_0x3e63\[(\d+)\]/ matches any string that starts with _0x3e63[, continues with one or more digits (\d+) and ends with ].
For each match the block is executed and the value returned by the block is used to replace the matched piece of the original string.
The replacement uses $1 (that contains the sub-string that matches the first capturing group) as an index into the array _0x3e63. Because the value of $1 is a string, .to_i is used to convert it to a number (required to be used as index in the array).
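A self-contained sketch of the same technique follows. The array name palette and the shortened sample data are assumptions made for illustration. It uses Array#fetch with a default so that an index missing from the array leaves the original token in place instead of replacing it with an empty string.

```ruby
# Shortened, made-up sample data; the real array and string are much longer.
palette = ["#f783ac", "#faa2c1", "#fcc2d7"]
input   = "base: [_0x3e63[0], _0x3e63[2]], gray: [_0x3e63[7]]"

result = input.gsub(/_0x3e63\[(\d+)\]/) do |match|
  # $1 holds the digits captured by (\d+); fall back to the matched text
  # itself when the index is outside the array.
  palette.fetch($1.to_i, match)
end

puts result
# prints base: [#f783ac, #fcc2d7], gray: [_0x3e63[7]]
```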
We are given:
str = <<~END
  base: [arr[6], arr[3]],
  gray: [arr[0], arr[4], arr[1], arr[5]],
  red: [arr[2]]
END
#=> "base: [arr[6], arr[3]],\ngray: [arr[0], arr[4], arr[1], arr[5]],\nred: [arr[2]]\n"
arr = ["#f783ac", "#faa2c1", "#fcc2d7", "#ffdeeb", "#fff0f6", "#862e9c",
       "#9c36b5"]
We can perform the required replacements by using String#gsub with a regular expression and Kernel#eval:
puts str.gsub(/\barr\[\d+\]/) { |s| eval s }
base: [#9c36b5, #ffdeeb],
gray: [#f783ac, #fff0f6, #faa2c1, #862e9c],
red: [#fcc2d7]
The regular expression performs the following operations:
\b # match a word break (to avoid matching 'gnarr')
arr\[ # match string 'arr['
\d+ # match 1+ digits
\] # match ']'
One must be cautious about using eval (to avoid launching missiles inadvertently, for example), but as long as the matches of the string can be trusted it's a perfectly safe and useful method.

How to parse username, ID or whole part using Ruby Regex in this sentence?

I have a sentence like this:
Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?
What I want is to get #[Pratha](user:1) and #[John](user:3), either as their names and ids or just as the text I quoted, so that I can split it and parse the name and id myself.
But there is an issue here: the names Pratha and John may include non-alphabetic characters like ', ,, -, +, etc., but not [] and ().
What I tried so far:
c = ''
f = c.match(/(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s(\[)|$)))(\w+)(?=\s|$)/i)
But no success.
You may use #\[([^\]\[]*)\]\([^()]*:(\d+)\)
# - a # char
\[ - a [
([^\]\[]*) - Group 1: 0+ chars other than [ and ]
\] - a ] char
\( - a ( char
[^()]* - 0+ chars other than ( and )
: - a colon
(\d+) - Group 2: 1 or more digits
\) - a ) char.
Sample Ruby code:
s = "Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?"
rx = /#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
res = s.scan(rx)
p res
# => [["Pratha", "1"], ["John", "3"]]
"Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?".scan(/#.*?\)/)
#⇒ ["#[Pratha](user:1)", "#[John](user:3)"]
Since the line does not come from user input, you can rely on the part you are interested in starting with # and ending with ).
You could use 2 capturing groups to get the names and the id's:
That will match
# Match # literally
\[ Match [
([^]]+) 1st capturing group, which matches any character except ] 1+ times using a negated character class
\] Match ]
\( Match ( literally
[^:]+: Match any character except : 1+ times, then match :
([^)]+) 2nd capturing group, which matches any character except ) 1+ times
\) Match )
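Putting the capturing-group idea together, here is a small sketch that turns each mention into a hash. The pattern captures three parts (name, kind, id), and the hash keys are naming choices made for this example, not something the question specifies.

```ruby
s = "Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?"

# Group 1: the name (anything except brackets)
# Group 2: the part before the colon (e.g. "user")
# Group 3: the numeric id
mention = /#\[([^\[\]]+)\]\(([^():]+):(\d+)\)/

mentions = s.scan(mention).map do |name, kind, id|
  { name: name, kind: kind, id: id.to_i }
end

p mentions
```

Because the name group only excludes [ and ], names containing ', -, + or commas are still matched.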

Ruby Regex find numbers not surrounded by alphabetical characters [duplicate]

I have a regular expression that I'm using to find all the words in a given block of content, case insensitive, that are contained in a glossary stored in a database. Here's my pattern:
The problem is, if I use /(Foo)/i then words like Food get matched. There needs to be whitespace or a word boundary on both sides of the word.
How can I modify my expression to match only the word Foo when it is a word at the beginning, middle, or end of a sentence?
Use word boundaries:
Or if you're searching for "S.P.E.C.T.R.E." like in Sinan Ünür's example:
To match any whole word you would use the pattern (\w+)
Assuming you are using PCRE or something similar:
Matching any whole word on the commandline with (\w+)
I'll be using the phpsh interactive shell on Ubuntu 12.10 to demonstrate the PCRE regex engine through the preg_match function.
Start phpsh, put some content into a variable, match on word.
el@apollo:~/foo$ phpsh
php> $content1 = 'badger';
php> $content2 = '1234';
php> $content3 = '$%^&';
php> echo preg_match('(\w+)', $content1);
php> echo preg_match('(\w+)', $content2);
php> echo preg_match('(\w+)', $content3);
The preg_match function uses the PCRE engine within the PHP language to test the variables $content1, $content2 and $content3 against the (\w+) pattern.
$content1 and $content2 contain at least one word, $content3 does not.
Match a number of literal words on the commandline with (dart|fart)
el@apollo:~/foo$ phpsh
php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'farty gun';
php> $gun4 = 'unicorn gun';
php> echo preg_match('(dart|fart)', $gun1);
php> echo preg_match('(dart|fart)', $gun2);
php> echo preg_match('(dart|fart)', $gun3);
php> echo preg_match('(dart|fart)', $gun4);
The variables $gun1 and $gun2 contain the string dart or fart; $gun4 does not. However, it may be a problem that looking for the word fart also matches farty in $gun3. To fix this, enforce word boundaries in the regex.
Match literal words on the commandline with word boundaries.
el@apollo:~/foo$ phpsh
php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'farty gun';
php> $gun4 = 'unicorn gun';
php> echo preg_match('(\bdart\b|\bfart\b)', $gun1);
php> echo preg_match('(\bdart\b|\bfart\b)', $gun2);
php> echo preg_match('(\bdart\b|\bfart\b)', $gun3);
php> echo preg_match('(\bdart\b|\bfart\b)', $gun4);
So it's the same as the previous example, except that the word fart with \b word boundaries no longer matches the content farty.
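The same word-boundary idea can be sketched in Ruby for the glossary case from the question. The glossary contents below are made up for illustration; Regexp.union escapes each entry, so entries containing regex metacharacters stay literal.

```ruby
# Hypothetical glossary; in the question the words come from a database.
glossary = %w[Foo dart fart]

# \b on both sides keeps "Foo" from matching inside "Food";
# the i flag makes the match case insensitive.
pattern = /\b(?:#{Regexp.union(glossary).source})\b/i

p "Food is not foo".scan(pattern)           # => ["foo"]
p "farty gun".match?(pattern)               # => false
p "A fart gun and a DART gun".scan(pattern) # => ["fart", "DART"]
```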
Using \b can yield surprising results. You would be better off figuring out what separates a word from its definition and incorporating that information into your pattern.
use strict; use warnings;
use re 'debug';
my $str = 'S.P.E.C.T.R.E. (Special Executive for Counter-intelligence,
Terrorism, Revenge and Extortion) is a fictional global terrorist
my $word = 'S.P.E.C.T.R.E.';
if ( $str =~ /\b(\Q$word\E)\b/ ) {
print $1, "\n";
Compiling REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b"
Final program:
1: BOUND (2)
2: OPEN1 (4)
4: EXACT (9)
9: CLOSE1 (11)
11: BOUND (12)
12: END (0)
anchored "S.P.E.C.T.R.E." at 0 (checking anchored) stclass BOUND minlen 14
Guessing start of match in sv for REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b" against "S.P
.E.C.T.R.E. (Special Executive for Counter-intelligence,"...
Found anchored substr "S.P.E.C.T.R.E." at offset 0...
start_shift: 0 check_at: 0 s: 0 endpos: 1
Does not contradict STCLASS...
Guessed: match at offset 0
Matching REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b" against "S.P.E.C.T.R.E. (Special Exec
utive for Counter-intelligence,"...
0 | 1:BOUND(2)
0 | 2:OPEN1(4)
0 | 4:EXACT (9)
14 | 9:CLOSE1(11)
14 | 11:BOUND(12)
Match failed
Freeing REx: "\b(S\.P\.E\.C\.T\.R\.E\.)\b"
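The trace above ends in "Match failed" because the final \b sits between "." and a space, and neither is a word character, so there is no word boundary at that position. The Ruby sketch below reproduces the failure and shows one possible workaround, lookarounds that only require that no word character is adjacent. The lookaround variant is an assumption about what "separates a word" in this text, not the approach prescribed by the answer above.

```ruby
str  = "S.P.E.C.T.R.E. (Special Executive) is a fictional organisation"
word = "S.P.E.C.T.R.E."

# \b after the trailing "." fails: "." and " " are both non-word characters.
with_boundaries = /\b#{Regexp.escape(word)}\b/

# Lookarounds only check that the neighbours are not word characters.
with_lookarounds = /(?<!\w)#{Regexp.escape(word)}(?!\w)/

p str.match?(with_boundaries)   # => false
p str.match?(with_lookarounds)  # => true
```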
For those who want to validate an enum in their code, you can follow this guide.
In regex you can use ^ to anchor the start of a string and $ to anchor the end. Using them in combination with | could be what you want:
It will match only the exact strings Male or Female.
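A minimal Ruby sketch of this kind of enum check is shown here. The constant and method names are hypothetical. Note that in Ruby ^ and $ match at line breaks, so the sketch uses \A and \z to anchor the whole string.

```ruby
# Accept exactly "Male" or "Female", nothing before or after.
GENDER_PATTERN = /\A(?:Male|Female)\z/

def valid_gender?(value)
  GENDER_PATTERN.match?(value)
end

p valid_gender?("Male")          # => true
p valid_gender?("Female")        # => true
p valid_gender?("Females")       # => false
p valid_gender?("Male\nFemale")  # => false
```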
If you are doing it in Notepad++
That would give you the entire word, and you can add parentheses to get it as a group. Example: conv1 = Conv2D(64, (3, 3), activation=LeakyReLU(alpha=a), padding='valid', kernel_initializer='he_normal')(inputs). I would like to move LeakyReLU into its own line as a comment, and replace the current activation. In Notepad++ this can be done using the following find command:
([\w]+)( = .+)(LeakyReLU.alpha=a.)(.+)
and the replace command becomes:
\1\2'relu'\4 \n # \1 = LeakyReLU\(alpha=a\)\(\1\)
The spaces are there to keep the right formatting in my code. :)
use word boundaries \b,
The following (using four escapes) works in my environment: Mac, Safari Version 10.0.3 (12602.4.8). In a plain JavaScript string literal, two backslashes ('\\b') are normally enough to produce \b.
var myReg = new RegExp('\\\\b' + variable + '\\\\b', 'g')
Get all "words" in a string
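Basically, [^\s] means any non-whitespace character, so the pattern matches groups of non-spaces (in effect, it breaks on spaces).
Don't forget the g flag, which stands for global (find all matches), not greedy.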
Try it:
"Not the answer you're looking for? Browse other questions tagged regex word-boundary or ask your own question.".match(/([^\s]+)/g)
→ (17) ['Not', 'the', 'answer', "you're", 'looking', 'for?', 'Browse', 'other', 'questions', 'tagged', 'regex', 'word-boundary', 'or', 'ask', 'your', 'own', 'question.']

Talend Convert string to Float

I am using Talend to make an ETL project.
To convert my string to a Double, I use Float.parseFloat(row4.Exportation2.trim())
This is the error that it gives me.
This is what my data looks like in the exportation2 column: "766,9997474" "1 345,43"
Does anyone have an idea why?
Starting job ConvertString at 14:03 27/01/2017.
Exception in component tMap_1
java.lang.NumberFormatException: For input string: "23,4897452"
at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
at sun.misc.FloatingDecimal.parseDouble(Unknown Source)
at java.lang.Double.parseDouble(Unknown Source)
at last.convertstring_0_1.ConvertString.tFileInputDelimited_1Process(
at last.convertstring_0_1.ConvertString.runJobInTOS(
at last.convertstring_0_1.ConvertString.main(
[statistics] connecting to socket on port 3464
[statistics] connected
[statistics] disconnected
Job ConvertString ended at 14:03 27/01/2017. [exit code=1]
There are multiple problems here:
If you want to convert from String to Double, you should use Double.parseDouble().
"," is not the expected decimal separator; it should be ".".
You will have to convert the "," char to a "." char: if your input comes from an Excel or delimited file, you can set this option in the advanced settings of the tFileInput component ("advanced separator"). Otherwise, you should use yourString.replaceAll(",", ".").
There is a non-standard space in the String, which you should remove with yourString.replaceAll(" ", "").
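The normalisation steps above (drop the grouping space, turn the decimal comma into a dot, then parse) can be sketched as follows. The sketch is written in Ruby purely for illustration and the helper name parse_decimal is hypothetical; in Talend itself the equivalent would be a Java expression in tMap. It assumes the grouping separator is some kind of whitespace, possibly a non-breaking space.

```ruby
# Hypothetical helper mirroring the steps described above.
def parse_decimal(text)
  normalized = text
    .gsub(/[[:space:]]/, '') # remove regular and non-breaking spaces
    .tr(',', '.')            # decimal comma -> decimal point
  Float(normalized)          # raises ArgumentError on anything non-numeric
end

p parse_decimal("766,9997474")     # => 766.9997474
p parse_decimal("1\u00A0345,43")   # => 1345.43
```

Using a strict parse (Float here, Double.parseDouble in Java) keeps malformed values from being silently converted.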
To do so you can use many functions in tMap:
columnValue = columnValue.replaceAll("\\W","");
\w = Anything that is a word character
\W = Anything that isn't a word character (including punctuation etc)
\s = Anything that is a space character (including space, tab characters etc)
\S = Anything that isn't a space character (including both letters and numbers, as well as punctuation etc)
Or, if you want to remove anything other than a letter or a number, you can simply use:
.replaceAll("[^a-zA-Z0-9]", "")

Setting a string of ASCII-art to a variable while escaping special characters

I'm working in Ruby and I'd like to set a huge string of ASCII-art characters to a variable, however I am running into some problems dealing with special characters. Most ASCII-art contains octothorps and quotes and all sorts of problematic special characters that require escaping. Is there any easy way to escape mass bulks of special characters without having to go into each and every single character one by one?
Use a heredoc. That is what it is for. With a heredoc, you can set the terminating string to any string that does not appear in your ASCII art. Combine that with single quotes around the terminator, which disable interpolation and escape processing, and you will not need to escape anything.
.':ok0KXXXNXK0kxolc:;;,,,,,,,,,,,;;,,,''''''',,''.. .'lOXKd'
.,lx00Oxl:,'............''''''................... ...,;;'. .oKXd.
.ckKKkc'...'',:::;,'.........'',;;::::;,'..........'',;;;,'.. .';;'. 'kNKc.
.:kXXk:. .. .................. .............,:c:'...;:'. .dNNx.
:0NKd, .....''',,,,''.. ',...........',,,'',,::,...,,. .dNNx.
.xXd. .:;'.. ..,' .;,. ...,,'';;'. ... .oNNo
.0K. .;. ;' '; .'...'. .oXX:
.oNO. . ,. . ..',::ccc:;,.. .. lXX:
.dNX: ...... ;. 'cxOKK0OXWWWWWWWNX0kc. :KXd.
.l0N0; ;d0KKKKKXK0ko:... .l0X0xc,...lXWWWWWWWWKO0Kx' ,ONKo.
.lKNKl...'......'. .dXWN0kkk0NWWWWWN0o. :KN0;. .,cokXWWNNNNWNKkxONK: .,:c:. .';;;;:lk0XXx;
:KN0l';ll:'. .,:lodxxkO00KXNWWWX000k. oXNx;:okKX0kdl:::;'',;coxkkd, ...'. ...'''.......',:lxKO:.
oNNk,;c,'',. ...;xNNOc,. ,d0X0xc,. .dOd, ..;dOKXK00000Ox:. ..''dKO,
'KW0,:,.,:..,oxkkkdl;'. 'KK' .. .dXX0o:'....,:oOXNN0d;.'. ..,lOKd. .. ;KXl.
;XNd,; ;. l00kxoooxKXKx:..ld: ;KK' .:dkO000000Okxl;. c0; :KK; . ;XXc
'XXdc. :. .. '' 'kNNNKKKk, .,dKNO. .... .'c0NO' :X0. ,. xN0.
.kNOc' ,. .00. ..''... .l0X0d;. 'dOkxo;... .;okKXK0KNXx;. .0X: ,. lNX'
,KKdl .c, .dNK, .;xXWKc. .;:coOXO,,'....... .,lx0XXOo;...oNWNXKk:.'KX; ' dNX.
:XXkc'.... .dNWXl .';l0NXNKl. ,lxkkkxo' .cK0. ..;lx0XNX0xc. ,0Nx'.','.kXo ., ,KNx.
cXXd,,;:, .oXWNNKo' .'.. .'.'dKk; .cooollox;.xXXl ..,cdOKXXX00NXc. 'oKWK' ;k: .l. ,0Nk.
cXNx. . ,KWX0NNNXOl'. .o0Ooldk; .:c;.':lxOKKK0xo:,.. ;XX: .,lOXWWXd. . .':,.lKXd.
lXNo cXWWWXooNWNXKko;'.. .lk0x; ...,:ldk0KXNNOo:,.. ,OWNOxO0KXXNWNO, ....'l0Xk,
.dNK. oNWWNo.cXK;;oOXNNXK0kxdolllllooooddxk00KKKK0kdoc:c0No .'ckXWWWNXkc,;kNKl. .,kXXk,
'KXc .dNWWX;.xNk. .kNO::lodxkOXWN0OkxdlcxNKl,.. oN0'..,:ox0XNWWNNWXo. ,ONO' .o0Xk;
.ONo oNWWN0xXWK, .oNKc .ONx. ;X0. .:XNKKNNWWWWNKkl;kNk. .cKXo. .ON0;
.xNd cNWWWWWWWWKOkKNXxl:,'...;0Xo'.....'lXK;...',:lxk0KNWWWWNNKOd:.. lXKclON0: .xNk.
.KX; .dNKoON0;lXNkcld0NXo::cd0NNO:;,,'.. .0Xc lXXo..'l0NNKd,. .c0Nk,
:XK. .xNX0NKc.cXXl ;KXl .dN0. .0No .xNXOKNXOo,. .l0Xk;.
.dXk. .lKWN0d::OWK; lXXc .OX: .ONx. . .,cdk0XNXOd;. .'''....;c:'..;xKXx,
.0No .:dOKNNNWNKOxkXWXo:,,;ONk;,,,,,;c0NXOxxkO0XXNXKOdc,. ..;::,...;lol;..:xKXOl.
,XX: ..';cldxkOO0KKKXXXXXXXXXXKKKKK00Okxdol:;'.. .';::,..':llc,..'lkKXkc.
:NX' . '' .................. .,;:;,',;ccc;'..'lkKX0d;.
lNK. .; ,lc,. ................ ..,,;;;;;;:::,....,lkKX0d:.
.oN0. .'. .;ccc;,'.... ....'',;;;;;;;;;;'.. .;oOXX0d:.
.dN0. .;;,.. .... ..''''''''.... .:dOKKko;.
lNK' ..,;::;;,'......................... .;d0X0kc'.
.xXO' .;oOK0x:.
.cKKo. .,:oxkkkxk0K0xc'.
.oKKkc,. .';cok0XNNNX0Oxoc,.
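As a small sketch of the single-quoted heredoc described in the answer, the example below uses a short made-up drawing and the terminator ART (any identifier that does not appear on a line by itself inside the art would do). Quotes, #, #{...} and backslashes are all stored literally.

```ruby
# The single quotes around ART disable interpolation and escape sequences.
art = <<'ART'
 /\_/\   "#{name}"
( o.o )  \n # '
 > ^ <
ART

puts art
```

The plain `<<'ART'` form keeps leading whitespace exactly as typed, which matters for ASCII art; the squiggly `<<~` form would strip common indentation.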
