I would like to implement fuzzy phrase search for my website using Hibernate Search. I've read that ComplexPhraseQueryParser works well for this.
However, there are a few things about it that I don't understand or that cause me problems.
For example, consider a title like this one (I have many titles like it, and users need to be able to search for whatever they like):
"Do you know how to be in shape?"
If someone types
"do you cnow how to be in shape?", my ComplexPhraseQueryParser finds nothing. But if I put a "~" after the misspelled word, as in "cnow~", it works and returns a result.
What is the reason for this? Doesn't ComplexPhraseQueryParser support this situation?
I would like my search to tolerate a few misspelled words and still return results (similar to Google search).
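// Lucene 3.0 parser over the "title" field, analyzed with StandardAnalyzer
ComplexPhraseQueryParser quizTitlePhraseQuery = new ComplexPhraseQueryParser(
        Version.LUCENE_30, "title", new StandardAnalyzer(Version.LUCENE_30));
// Default slop for phrases: terms may be up to 100 positions apart
quizTitlePhraseQuery.setPhraseSlop(100);
// Require all terms unless the query says otherwise
quizTitlePhraseQuery.setDefaultOperator(Operator.AND);
// Minimum similarity used for fuzzy (term~) queries
quizTitlePhraseQuery.setFuzzyMinSim(0.1f);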
Thank you in advance!
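In Lucene's query syntax a term is only matched fuzzily when it carries a trailing "~"; setFuzzyMinSim() sets the default minimum similarity for such terms but does not make plain terms fuzzy, which is consistent with "cnow~" matching while "cnow" does not. One workaround is to rewrite the user's input so that every term carries the suffix before it reaches the parser. The sketch below shows that string rewriting in JavaScript for brevity (the same logic ports directly to the Java code above). The helper name toFuzzyPhrase is hypothetical, and it assumes plain ASCII input: every other character is dropped, so Lucene syntax such as ? or * in the user's text is not interpreted as a wildcard.

```javascript
// Hypothetical helper: turns raw user input into a quoted phrase in which
// every term is marked fuzzy (term~), for ComplexPhraseQueryParser.
function toFuzzyPhrase(input) {
  const terms = input
    .toLowerCase()
    // Replace anything that is not an ASCII letter, digit or whitespace,
    // so characters like ? * ~ " are not treated as query syntax.
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter(function (t) { return t.length > 0; });
  // "~" makes each term fuzzy; the quotes keep the terms together as a phrase.
  return '"' + terms.map(function (t) { return t + "~"; }).join(" ") + '"';
}

console.log(toFuzzyPhrase("do you cnow how to be in shape?"));
// "do~ you~ cnow~ how~ to~ be~ in~ shape~"
```

The resulting string would then be passed to quizTitlePhraseQuery.parse(...) in place of the raw input.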
First off, I'm a total novice at JavaScript, so please go gently. I know how people feel about now having to pay for IFTTT, but it's perfect for what I need.
I am using a longer version of the code below to capture certain keywords from tweets and then generate emails when the search returns a positive result. The search works very nicely, except that it is case-sensitive, which is a problem.
Yes, I know you can tune the Twitter search itself to pick up specific words or phrases, and I'm very proficient at building searches that way. However, I am casting a wide net of approximately 120 search words or phrases, which is too many to express with "OR" Twitter search parameters alone; that's why I'm using this approach.
Q1 - I have tried adding item.toLowerCase() and just .toLowerCase() in various parts of the code, so that it wouldn't matter whether the case of a search term differs from the case of the original tweet text, but I just can't get it to work. I've seen various posts on here, but I can't get any of them to work in IFTTT. I believe IFTTT doesn't accept regular expressions either, which is annoying.
Any advice on how to make this code case-insensitive for text within IFTTT?
Q2 - I have approximately 120 search terms that make the tweet text return a positive result, and a lot of junk comes through with them. Does anyone know how to add a second layer of "exclude" search terms?
I have something like 300-400 words and specific phrases that should stop the email from being triggered, i.e. something like "IF the tweet text contains a, b, c BUT ALSO contains x, y, z, do not send the email".
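let str = Twitter.newTweetFromSearch.Text;
let searchTerms = [
  "Northbound",
  "Westbound",
  "Southbound",
  "Eastbound"
];
let foundOne = 0;
// indexOf() is case-sensitive, so "northbound" in a tweet does not match "Northbound"
if (searchTerms.some(function (v) { return str.indexOf(v) >= 0; })) {
  foundOne = 1;
}
// Skip the email when none of the search terms was found
if (foundOne == 0) {
  Email.sendMeEmail.skip();
}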
I have looked at the Twitter API, but that is a step too far for my coding ability which is why I'm using IFTTT.
Any help is very much appreciated.
Thank you.
I'm playing with IFTTT filter code myself at the moment, so here are some thoughts on solving your problem.
If you want a case-insensitive search of the original text, convert the original text to lowercase and keep all your search terms in lowercase.
I also think you want to iterate over the searchTerms array and use the includes() method. OK, I just realised that .some() does the iteration for you, but I still prefer includes() over indexOf().
let str = Twitter.newTweetFromSearch.Text.toLowerCase();
let searchTerms = [
  "northbound",
  "westbound",
  "southbound",
  "eastbound"
];
let foundOne = 0;
if (searchTerms.some(function (term) { return str.includes(term); })) {
  foundOne = 1;
}
if (foundOne == 0) {
  Email.sendMeEmail.skip();
}
Or you could skip the foundOne variable entirely and do the search in the if() statement.
let str = Twitter.newTweetFromSearch.Text.toLowerCase();
let searchTerms = [
  "northbound",
  "westbound",
  "southbound",
  "eastbound"
];
if (!searchTerms.some(function (term) { return str.includes(term); })) {
  Email.sendMeEmail.skip();
}
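The question's second part (an exclude list) can be handled with the same pattern. A minimal sketch, assuming the include and exclude lists are plain arrays of words or phrases: the hypothetical helper shouldSendEmail lowercases both the tweet text and the terms, so matching is case-insensitive, and returns true only when at least one include term is present and no exclude term is. The exclude terms shown are placeholders, not taken from the question.

```javascript
// Hypothetical helper: true when the text contains at least one include
// term and none of the exclude terms. Both the text and the terms are
// lowercased first, so the comparison is case-insensitive.
function shouldSendEmail(text, includeTerms, excludeTerms) {
  const lower = text.toLowerCase();
  const contains = function (term) {
    return lower.includes(term.toLowerCase());
  };
  return includeTerms.some(contains) && !excludeTerms.some(contains);
}

const includeTerms = ["Northbound", "Westbound", "Southbound", "Eastbound"];
// Placeholder exclude terms, for illustration only.
const excludeTerms = ["delay cleared", "road reopened"];

console.log(shouldSendEmail("Heavy traffic EASTBOUND on the M4", includeTerms, excludeTerms)); // true
console.log(shouldSendEmail("Eastbound delay cleared", includeTerms, excludeTerms)); // false
```

Inside the IFTTT filter, the helper would be called with Twitter.newTweetFromSearch.Text, followed by Email.sendMeEmail.skip() whenever it returns false. Because the terms are lowercased inside the helper, the include and exclude lists can be pasted in whatever case they already have.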
I'm trying to use the PubMed API to search for articles with an exact title. As an example, I want to search for the title: The cost-effectiveness of mirtazapine versus paroxetine in treating people with depression in primary care.
I want up to 1000 results in JSON format, so I know that the first part of my URL should look like this:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=1000&term=
How do I add a title search as a GET parameter?
I've been using the PubMed advanced search constructor, and it suggests that the query should look like The cost-effectiveness of mirtazapine versus paroxetine in treating people with depression in primary care[Title].
But if I just add that to the term= parameter in the URL, PubMed breaks the title down into all kinds of peculiar queries:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=1000&term=The%20cost-effectiveness%20of%20mirtazapine%20versus%20paroxetine%20in%20treating%20people%20with%20depression%20in%20primary%20care[Title]
How can I specify an exact title as a GET param?
Use field=title:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=1000&term=The%20cost-effectiveness%20of%20mirtazapine%20versus%20paroxetine%20in%20treating%20people%20with%20depression%20in%20primary%20care&field=title
See the ESearch API documentation for more information:
http://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_ESearch_
Use + instead of %20 to encode spaces.
For example:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=1000&term=cost+effectiveness+of+mirtazapine[title]
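Instead of encoding the query by hand, the URL can be assembled programmatically so that spaces and brackets are escaped consistently. A minimal sketch in JavaScript using the standard URLSearchParams class, which serializes spaces as + (the encoding suggested in the second answer); the helper name esearchUrl is hypothetical, the base URL is the one from the question but over https, and field=title follows the first answer.

```javascript
// Hypothetical helper: builds a PubMed ESearch URL that restricts the
// search to the title field and asks for up to 1000 JSON results.
function esearchUrl(title) {
  const params = new URLSearchParams({
    db: "pubmed",
    retmode: "json",
    retmax: "1000",
    term: title,
    field: "title", // search only the title field, as in the first answer
  });
  // URLSearchParams turns spaces into "+" and percent-encodes
  // characters such as "[" and "]".
  return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + params.toString();
}

console.log(esearchUrl("cost-effectiveness of mirtazapine"));
// https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=1000&term=cost-effectiveness+of+mirtazapine&field=title
```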
I am exploring RethinkDB for my next project. My backend is in Haskell, and the RethinkDB Haskell driver looks a bit better than the MongoDB one, so I want to try it.
My question is: how do you do a simple text search with RethinkDB?
Nothing too complex, just finding documents where a field's value contains these words.
I assume this should be built in, as even the smallest blog app needs some kind of search facility, right?
So I am looking for a RethinkDB equivalent of this MongoDB query:
var search = { "$text": { "$search": "some text" } };
Thank you.
EDIT
I am not looking for regular expressions or the match function:
It is extremely slow for moderately large data sets.
It does not have any notion of indexes.
It does not have any notion of stemming.
With the RethinkDB Haskell driver documented here, you can write:
run h $ table "table" # R.filter (\row -> match "some text" (row ! "field"))
I have the following:
tmpArray[cTerms++] = "[sclenka] CONTAINS \"*" + sessionScope.sclenka +"*\"";
(With the help of Per Henrik Lausten)
This should result in "*term*",
but instead I get "term".
So my question is: how do I perform a wildcard full-text search?
Thank you!
If you want to use a wildcard search, then generate the following query string:
tmpArray[cTerms++] = "[sclenka] = \"*" + sessionScope.sclenka +"*\"";
This should generate a search on "*search query*".
In general, this is a good way to perform a search, since users will probably expect it to work like that.
Source: http://www-10.lotus.com/ldd/ddwiki.nsf/dx/Searching_for_Documents#Full-text+Search
If your string is correct and you are still getting no results, test the same string with the full-text index (FTI) search in the Notes client.
You can also enable the following debug setting on the server:
DEBUG_FTV_SEARCH=1
Then check the output on the Domino console when you run a search.
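Since the asterisks apparently disappear somewhere between building the string and running the search, it can help to assemble the whole query in one place and print it before it is passed on. A minimal sketch of the tmpArray[cTerms++] pattern from the question; the helper name buildFtQuery, the fields object, and joining clauses with " AND " are assumptions for illustration, and it uses the = form from the answer above. Modern JavaScript syntax is used for brevity and may need adapting to the server-side JavaScript engine.

```javascript
// Hypothetical sketch of the tmpArray[cTerms++] pattern: one clause per
// non-empty field, joined into a single full-text query string.
function buildFtQuery(fields) {
  const tmpArray = [];
  let cTerms = 0;
  for (const [name, value] of Object.entries(fields)) {
    if (!value) continue; // skip fields the user left empty
    // Surround the value with asterisks for a wildcard match.
    tmpArray[cTerms++] = "[" + name + "] = \"*" + value + "*\"";
  }
  return tmpArray.join(" AND ");
}

// Print the query before running it to confirm the asterisks are present.
console.log(buildFtQuery({ sclenka: "term", other: "" }));
// [sclenka] = "*term*"
```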
So if I understand you, the result is an escaped form of the search term in which the asterisks have been removed?
Could you use the construct:
tmpArray[cTerms++] = "[sclenka] CONTAINS \"" + String.fromCharCode(42) + sessionScope.sclenka + String.fromCharCode(42) + "\"";
That should at least avoid the escaping, shouldn't it?
I think you are missing some escape characters in the string you are generating.
tmpArray[cTerms++] = "[sclenka] CONTAINS \"" + sessionScope.sclenka +"\"";
leyrer, is it possible, just possible, that you're doing this in a browser and your session is not authenticated? If so, you may be searching the database as "Anonymous", whereas when you test it yourself you're searching as "leyrer".
It's just a thought, but I used to see that all the time when people started using my NCT Search tools. They'd swear they were getting no results, and when I dug in I'd always find that they were using the browser anonymously rather than in a logged-in session.
I just tested this on my own site, where I have NCTSearch set up. It accepts the search term from the web and runs database.ftsearch() from within LotusScript as part of its job.
I searched on "data*" and got at least as many results as when I searched on "database".
Based on that, I think something else is going on.
Following up on my earlier comment on the other answer, try this: create another agent that does JUST the search. Have it get the search term from the agent context as if it were a document ID, call it from the first agent using "agent.runonserver(searchterm)", and see if you can fool it.
Andrew, I do get results as the Anonymous user, just not with the wildcard. The first line of goo.gl/YVtXm says that CONTAINS, contains, or = do not work when searching from the web.
I need to OR two SqlMethods.Like statements in LINQ, and I'm not sure how to accomplish it (or if it's the right way to go about it).
I've got vendor ID and vendor name fields, but I've only got a generic vendor search that allows a user to search for a vendor based on their name or ID. I also allow wildcards in the search, so I need to find vendors whose ID or name is like the user's input.
I want to do something like the following, but obviously it's not correct. (EDIT: It does work as written.)
results = results.Where(p => SqlMethods.Like(p.VendorId, inputVendor.Replace("*", "%")) ||
    SqlMethods.Like(p.VendorName, inputVendor.Replace("*", "%")));
Background: I add Where clauses depending on the search parameters entered by the user, hence the results = results.Where part.
Any help would be appreciated!
It's not clear to me why this is "obviously" not correct. Presumably it's not working, otherwise you wouldn't have posted, but it's not obvious how it's not working.
I would suggest performing the replacement before the query, like this:
string vendorPattern = inputVendor.Replace("*", "%");
Then I'd expect this to work:
results = results.Where(p => SqlMethods.Like(p.VendorId, vendorPattern) ||
SqlMethods.Like(p.VendorName, vendorPattern));
Of course you're limited to where wildcards can appear in a SQL LIKE query, but that's a separate problem. (I'm not sure of the behaviour offhand if it's not at the start or end.)
If that doesn't help, please update the question with what happens when you try this.
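On the wildcard question: in a SQL LIKE pattern, % matches any run of characters wherever it appears, so a pattern such as %corp% works as well as corp%. A stricter conversion than Replace("*", "%") also escapes the characters that LIKE treats specially. A minimal sketch in JavaScript, assuming SQL Server (the usual LINQ to SQL target), where a literal %, _ or [ is written inside square brackets; the helper name toLikePattern is hypothetical.

```javascript
// Hypothetical helper: converts user input that uses "*" as a wildcard
// into a SQL Server LIKE pattern.
function toLikePattern(input) {
  // "*" becomes "%"; the LIKE special characters [, % and _ are wrapped
  // in brackets so they match literally.
  return input.replace(/[\[%_*]/g, function (ch) {
    return ch === "*" ? "%" : "[" + ch + "]";
  });
}

console.log(toLikePattern("*corp*"));   // %corp%
console.log(toLikePattern("50%_off*")); // 50[%][_]off%
```

In the C# code, the same conversion would produce vendorPattern before the query, so both SqlMethods.Like calls share one pattern.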