NER is overwriting the custom NER in Stanford NLP - stanford-nlp

In Stanford NLP, I used a pattern in RegexNER to match phone numbers, but the statistical NER is overwriting it as NUMBER.
If I remove the ner annotator, then it shows up as PHONE_NUMBER.
Can anyone help me with this?
Thanks in advance.
Here is my regexner line:
^(?:(?:\+|0{0,2})91(\s*[\-]\s*)?|[0]?)?[789]\d{9}$ PHONENUMBER

java command:
java -Xmx10g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file phone-number-example.txt -outputFormat text -ner.fine.regexner.mapping phone-number-regex.rules
example text:
I will call him at 555-555-5555
format of rules file:
555-555-5555 PHONE_NUMBER NUMBER 1
(note: the columns are tab-delimited)
The fine-grained NER runs after the statistical NER. You can also build a custom RegexNER annotator and run it after the statistical model. The key is telling it to overwrite the NUMBER tag (which is indicated in the third column).

^(?:(?:\+|0{0,2})91(\s*[\-]\s*)?|[0]?)?[789]\d{9}$ PHONENUMBER NUMBER
This worked: the column after the custom NER label column tells it which existing tag to overwrite.
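For reference, here is a minimal sketch of driving the same pipeline from Java instead of the command line, reusing the rules file and example sentence from above (the class name is hypothetical):

import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreEntityMention;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import java.util.Properties;

public class PhoneNerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        // same mapping file as in the java command above
        props.setProperty("ner.fine.regexner.mapping", "phone-number-regex.rules");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        CoreDocument doc = new CoreDocument("I will call him at 555-555-5555");
        pipeline.annotate(doc);
        // should print "555-555-5555  PHONE_NUMBER" once the rule overwrites NUMBER
        for (CoreEntityMention em : doc.entityMentions()) {
            System.out.println(em.text() + "\t" + em.entityType());
        }
    }
}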

Related

Can I use stopwords, case insensitivity, and punctuation removal with the Elasticsearch keyword data type?

As we know, for the keyword data type we have to use a normalizer, but I am getting an error when I add a stopword filter to the normalizer. Is there any other way to add stopwords in a normalizer for the keyword data type?
It works for lowercase but not for stopwords.
Can anyone help me out?
I just want a pointer or any sample code...
Or do I have to convert the keyword data type into text?
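For context, a minimal sketch of the setup being described, using the Elasticsearch low-level Java REST client (the index, field, and normalizer names are hypothetical). The lowercase filter is accepted in a normalizer; adding a stop filter fails because normalizers only allow filters that work character by character:

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class NormalizerExample {
    public static void main(String[] args) throws Exception {
        RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();
        Request request = new Request("PUT", "/my-index");
        // lowercase works in a normalizer; a "stop" filter here is rejected,
        // which matches the error described above
        request.setJsonEntity(
            "{\"settings\": {\"analysis\": {\"normalizer\": {"
          + "\"my_normalizer\": {\"type\": \"custom\", \"filter\": [\"lowercase\"]}}}},"
          + "\"mappings\": {\"properties\": {"
          + "\"name\": {\"type\": \"keyword\", \"normalizer\": \"my_normalizer\"}}}}");
        client.performRequest(request);
        client.close();
    }
}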

How to parse a CSV file in which some fields contain the separator (comma) in their values

sample message - 111,222,333,444,555,val1in6th,val2in6th,777
The sixth column contains a value that itself includes commas (val1in6th,val2in6th is a sample value of the 6th column).
When I use a simple csv filter, this message gets converted into 8 fields. I want to be able to tell the filter that val1in6th,val2in6th should be treated as a single value and placed in the 6th column (it's okay not to have a comma between val1in6th and val2in6th when it is placed in the output as the 6th column).
Change your plugin: instead of the csv filter, use the grok filter (see its documentation).
Then use a debugger to build a parser for your lines, like this one: https://grokdebug.herokuapp.com/
For your lines you could use this grok expression:
%{WORD:FIELD1},%{WORD:FIELD2},%{WORD:FIELD3},%{WORD:FIELD4},%{WORD:FIELD5},%{GREEDYDATA:FIELD6}
or :
%{INT:FIELD1},%{INT:FIELD2},%{INT:FIELD3},%{INT:FIELD4},%{INT:FIELD5},%{GREEDYDATA:FIELD6}
The latter changes the data types of the first 5 fields in Elasticsearch.
To learn more about parsing CSV with the grok filter in Elasticsearch, see the official Elastic blog guide; it explains how to use grok with an ingest pipeline, but the same applies to Logstash.
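To see what the grok expression actually captures, here is a rough Java-regex equivalent of the second pattern (INT approximated as digits, GREEDYDATA as the rest of the line); note that with this pattern FIELD6 holds everything after the fifth comma, including the trailing 777:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrokEquivalent {
    public static void main(String[] args) {
        String line = "111,222,333,444,555,val1in6th,val2in6th,777";
        // rough equivalent of five %{INT:...} captures followed by %{GREEDYDATA:FIELD6}
        Pattern p = Pattern.compile("^(\\d+),(\\d+),(\\d+),(\\d+),(\\d+),(.*)$");
        Matcher m = p.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= 6; i++) {
                System.out.println("FIELD" + i + " = " + m.group(i));
            }
            // FIELD6 = val1in6th,val2in6th,777
        }
    }
}

If the trailing 777 should be its own field, you could add another %{INT:...} after the greedy part; the greedy match then backs off to the last comma.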

Search with a hyphen, without one, and with a space

How can I tokenize a hyphenated term such that I can search using the following acceptance criteria:
with a hyphen (co-trimoxazole)
without a hyphen (cotrimoxazole)
with a space (co trimoxazole)
I managed to use the standard analyzer, which tokenizes on hyphens on both the index side and the query side, and which allows me to search on:
cotrimoxazole
co-trimoxazole
but not
co trimoxazole
I would suggest using a combination of analyzers.
Create two analyzers here: one that tokenizes based on the standard analyzer and another based on whitespace.
This should work fine for you.
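A minimal sketch of that suggestion with the Elasticsearch low-level Java REST client: index the field as a multi-field with one sub-field per analyzer, then query both sub-fields (all index, field, and sub-field names are hypothetical):

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class ComboAnalyzerSketch {
    public static void main(String[] args) throws Exception {
        RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();

        // one sub-field per analyzer, as suggested above
        Request create = new Request("PUT", "/drugs");
        create.setJsonEntity(
            "{\"mappings\": {\"properties\": {\"name\": {"
          + "\"type\": \"text\", \"analyzer\": \"standard\","
          + "\"fields\": {\"ws\": {\"type\": \"text\", \"analyzer\": \"whitespace\"}}}}}}");
        client.performRequest(create);

        // search both sub-fields so either tokenization can match
        Request search = new Request("GET", "/drugs/_search");
        search.setJsonEntity(
            "{\"query\": {\"multi_match\": {"
          + "\"query\": \"co trimoxazole\", \"fields\": [\"name\", \"name.ws\"]}}}");
        client.performRequest(search);
        client.close();
    }
}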

Indexing a file contents in ElasticSearch

I have a text file which contains some names, like below:
Tom
Harry
Robert
Harry
Matt
Tremp
I want to index those names in Elasticsearch using the Java APIs, so that all the names are indexed automatically.
Can anybody suggest a solution, as I am new to Elasticsearch?
Thanks in advance.
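A minimal sketch of one way to do this with the Elasticsearch low-level Java REST client: read the file line by line and index each non-empty line as its own document (the file name, index name, and field name are assumptions):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class IndexNames {
    public static void main(String[] args) throws Exception {
        RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();
        List<String> lines = Files.readAllLines(Paths.get("names.txt"));
        for (String line : lines) {
            String name = line.trim();
            if (name.isEmpty()) continue;
            // one document per name; Elasticsearch assigns the document id
            Request request = new Request("POST", "/names/_doc");
            request.setJsonEntity("{\"name\": \"" + name + "\"}");
            client.performRequest(request);
        }
        client.close();
    }
}

For a large file, the _bulk API would be more efficient than one request per line, and a JSON library should be used instead of string concatenation if names can contain quotes.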

Multiple Field Search Using the Lucene Parser with Solr and Sunspot

I'm using Solr with Sunspot (Ruby), and due to other constraints I have to use the Lucene parser instead of the DisMax parser. I need to be able to search on the username as well as first_name fields at the same time.
If I were using DisMax I could specify qf="username+first_name", but using only the Lucene parser I am only able to set df (default field), and it will not allow me to specify more than one field.
How can I search multiple fields using the Lucene parser?
Update: the answer is to just build the q parameter directly:
adjust_solr_params do |params|
  params[:defType] = "lucene"
  params[:q] = "username:\"#{params[:q]}\" OR first_name:\"#{params[:q]}\""
end
You can use copyField instructions in your schema to create a "catch-all" field from all the fields you want to search on. You then set df to that field.
To expand on Karussell's comment, the default field is just that: the default. You can explicitly specify however many fields you want; the default only comes into play if you don't specify one.
So a query like username:foo first_name:bar will find documents with a username of "foo" or a first_name of "bar" (OR being the Lucene parser's default operator).
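For illustration, the same explicit-field query from plain Java via SolrJ, outside of Sunspot (the core URL and field values are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LuceneParserMultiField {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/users").build();
        SolrQuery query = new SolrQuery();
        query.set("defType", "lucene");
        // name each field explicitly instead of relying on df
        query.setQuery("username:\"foo\" OR first_name:\"foo\"");
        QueryResponse response = solr.query(query);
        System.out.println(response.getResults().getNumFound() + " matches");
        solr.close();
    }
}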
