How to do an SQL IN-like query in Hibernate Search - filter

An example scenario:
Search for books whose content contains "success" AND whose author is in a list of passed-in names (there could be thousands of them).
I looked into filters:
http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#query-filter
It seems like Hibernate Search has no native support for this.
What is the recommended approach for this problem? I don't think I am alone here.
Thanks for any input.

Let me post my current solution.
I get the search results with a minimal projection for the keyword query, then loop through the results and keep only the ones whose author is in the IN list.
I am not using a filter.
I'm open to other alternatives once convinced.
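For reference, here is a minimal sketch of that workaround, assuming Hibernate Search 5's FullTextSession API; the entity class, the "content" and "author" field names, and the Long id type are all illustrative, and the author field has to be stored (Store.YES) to be projectable:

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.hibernate.Session;
import org.hibernate.search.FullTextQuery;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;

public class InListWorkaround {

    // Keyword search goes through the index; the IN-list filtering happens in memory.
    @SuppressWarnings("unchecked")
    public static List<Long> search(Session session, Class<?> bookEntity,
                                    String keyword, Set<String> authorNames) {
        FullTextSession fullTextSession = Search.getFullTextSession(session);

        // Full-text part: "content" must contain the keyword (illustrative field name).
        FullTextQuery query = fullTextSession.createFullTextQuery(
                new TermQuery(new Term("content", keyword)), bookEntity);

        // Minimal projection: only the entity id and the stored "author" field.
        query.setProjection(FullTextQuery.ID, "author");

        // "IN list" part: keep only the rows whose author is in the passed-in set.
        List<Object[]> rows = query.list();
        List<Long> matchingIds = new ArrayList<>();
        for (Object[] row : rows) {
            if (authorNames.contains((String) row[1])) {
                matchingIds.add((Long) row[0]);
            }
        }
        return matchingIds;
    }
}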

If you look here http://lucene.apache.org/java/2_4_1/queryparsersyntax.html (at the end, under "Field Grouping"), you can write a query with something like:
content:success AND author:("firstname" "secondname" "thirdname" ...)
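The same thing can also be built programmatically; a minimal sketch using Lucene's BooleanQuery (Lucene 5+ style API, field names illustrative). Note that TermQuery does not analyze its input, so the names must match the tokens as they were indexed, and with thousands of names you may hit the default clause limit:

import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class AuthorInListQuery {

    public static Query build(String keyword, List<String> authorNames) {
        // One SHOULD clause per author name behaves like SQL IN.
        // Depending on the Lucene version, the maximum clause count may need to be raised.
        BooleanQuery.Builder authors = new BooleanQuery.Builder();
        for (String name : authorNames) {
            authors.add(new TermQuery(new Term("author", name)), Occur.SHOULD);
        }

        // Equivalent of: content:keyword AND author:("name1" "name2" ...)
        return new BooleanQuery.Builder()
                .add(new TermQuery(new Term("content", keyword)), Occur.MUST)
                .add(authors.build(), Occur.MUST)
                .build();
    }
}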

Related

autocomplete and search in Elasticsearch

Is there any possibility to search on two incomplete words in the same field using Elasticsearch in Rails? I mean the situation where I could successfully search for, say, the phrase "victorian buildings" by typing only the beginnings of the words into the search input, for example "vict bui" (ideally with fuzziness as well).
Partial matching (word_start, text_start, etc., available in Searchkick) doesn't work in this project. I've also tried wildcard queries, but they failed too. Maybe writing some custom mappings/settings would be a good idea?
Can I ask for any suggestions on what to search for or read in order to do this task?
Try this example:
"%#{params[:place]}%"
Since % is a wildcard, doing a LIKE on '%%' matches everything, and you get all the records in the result.

How to search on FHIR using complex nested queries

I haven't really found examples or instructions on what a complex nested query should look like when searching a FHIR resource.
Some examples (pseudo-code):
(name=Mary AND gender=female) OR (address-city=Springfield AND address-state=NY)
((name=Mary AND gender=female) OR (address-city=Springfield AND address-state=NY)) AND active=true
Is that even possible? If yes, how?
FHIR supports quite an elaborate search syntax, but it isn't a query language. The searches you want cannot be done in one go with it, unless you have access to the server and can implement the queries on it yourself.
If you have access/influence server side, you can implement a named query, and then use the _query search parameter to execute that (see http://www.hl7.org/fhir/search.html#query).
If you don't have that access, you can perform your queries in a couple of steps. For example, your first one would take two queries:
GET [fhir endpoint]/Patient?name=Mary&gender=female
GET [fhir endpoint]/Patient?address-city=Springfield&address-state=NY
Both would give you a Bundle of results. The two Bundles together make up the complete list of matching resources you were looking for.
For the second example query, you would need to add &active=true to both GETs.
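A minimal sketch of that two-step approach, using Java 11's HttpClient; the base URL is a placeholder, and merging/de-duplicating the two Bundles (by resource id) is left to the caller:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FhirTwoStepSearch {

    public static void main(String[] args) throws Exception {
        String base = "https://example.org/fhir"; // placeholder FHIR endpoint
        HttpClient client = HttpClient.newHttpClient();

        // First half of the OR: name=Mary AND gender=female
        String bundleA = get(client, base + "/Patient?name=Mary&gender=female");
        // Second half of the OR: address-city=Springfield AND address-state=NY
        String bundleB = get(client, base + "/Patient?address-city=Springfield&address-state=NY");

        // Each response body is a FHIR Bundle; the union of their entries,
        // de-duplicated by resource id, is the result of the OR.
        System.out.println(bundleA);
        System.out.println(bundleB);
    }

    private static String get(HttpClient client, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/fhir+json")
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}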

Search algorithm options for ontology querying?

I have developed a tool that enables searching of an ontology I authored. It submits the searches as SPARQL queries.
I have received some feedback that my search implementation is all-or-none, or "binary". In other words, if a user's input doesn't exactly match a term in the ontology, they won't get any hits at all.
I have been asked to add some more flexible, or "advanced" search algorithms. Indexing and bag-of-words searching were suggested.
Can anyone give some examples of implementing search methods on an ontology that don't require a literal match?
First of all, what kind of entities are you trying to match (literals, or string casts of URIs?), and what kind of SPARQL queries are you running now? Something like this?
?term ?predicate "user input" .
If you are searching across literals, you can make the search more flexible right off the bat by using case-insensitive regular-expression filtering, although this will probably make your searches slower, and it won't catch cases where some of the word tokens are present but in a different order. In the following example, you should probably constrain the types of ?term and ?predicate first, or even filter on a string datatype on ?userInput:
?term ?predicate ?someLiteral .
FILTER(regex(?someLiteral, "user input", "i"))
Several triplestores offer support for full-text searching and result scoring. These are often extensions to the SPARQL language.
For example, Virtuoso and some others offer a bif:contains predicate. Virtuoso also offers a faceted search web interface (plus a service, I think). I have been pleased with the web-based full-text search in Blazegraph and Stardog, but I can't say anything at this point about using them with a SPARQL query to get a score on a search pattern. Some (GraphDB) even support explicit integration with Lucene or Solr*, so you may be able to take advantage of their search languages.
Finally... are you using a library like the OWL API or RDF4J to access your ontology? If so, you could certainly save the relationships between your terms and any literals in a Java native data structure, and then directly use a fuzzy search component like Lucene to index each literal as a "document" and then search the user input across the index.
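A minimal sketch of that last idea with plain Lucene (roughly the Lucene 8/9 API; the URIs, labels, and field names are made up): each label is indexed as a small document, and the user's input is matched with a FuzzyQuery so small typos still hit.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LabelIndex {

    public static void main(String[] args) throws IOException {
        // term URI -> label, e.g. extracted from the ontology with RDF4J or the OWL API
        Map<String, String> labels = Map.of(
                "http://example.org/onto#Neuron", "neuron",
                "http://example.org/onto#Astrocyte", "astrocyte");

        Path dir = Files.createTempDirectory("label-index");

        // Index each label as a one-field "document".
        try (Directory index = FSDirectory.open(dir);
             IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (Map.Entry<String, String> e : labels.entrySet()) {
                Document doc = new Document();
                doc.add(new StringField("uri", e.getKey(), Store.YES));   // stored, not tokenized
                doc.add(new TextField("label", e.getValue(), Store.YES)); // tokenized for search
                writer.addDocument(doc);
            }
        }

        // Search the user input across the index; fuzzy matching tolerates small typos.
        try (Directory index = FSDirectory.open(dir);
             DirectoryReader reader = DirectoryReader.open(index)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            ScoreDoc[] hits = searcher.search(new FuzzyQuery(new Term("label", "neuorn")), 10).scoreDocs;
            for (ScoreDoc hit : hits) {
                System.out.println(searcher.doc(hit.doc).get("uri"));
            }
        }
    }
}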
Why don't you post your ontology and give an example of a search you would like to perform in a non-binary way? I (or someone else) can try to show you a minimal implementation.
*Solr integration only appears to be offered in the commercially-licensed version of GraphDB

How to search for a substring (WHERE column LIKE '%foo%')

I'm reading the Parse API documentation at https://parse.com/docs/rest/guide#queries and can't find how to search by a substring. The SQL equivalent would be:
... WHERE column_name LIKE "%foo%"
There are a bunch of options such as $gt, $lt, $in, and similar, but there's no option for LIKE. It's a pretty common use case... What am I missing?
We've found something that looks like an undocumented feature.
There's a $regex constraint that isn't mentioned in the official API reference. It allows matching by regular expressions, which solved the problem for us.
We believe it should be documented here:
https://parse.com/docs/rest/guide/#queries-query-constraints
But apparently it isn't.
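For the record, a hedged sketch of what such a request looks like over the REST API (the class name, column name, and credentials are placeholders, and the hosted api.parse.com endpoint itself has since been retired):

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ParseRegexSearch {

    public static void main(String[] args) throws Exception {
        // SQL:   ... WHERE column_name LIKE '%foo%'
        // Parse: where={"column_name":{"$regex":"foo"}}
        String where = URLEncoder.encode(
                "{\"column_name\":{\"$regex\":\"foo\"}}", StandardCharsets.UTF_8);

        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://api.parse.com/1/classes/MyClass?where=" + where))
                .header("X-Parse-Application-Id", "APP_ID")      // placeholder
                .header("X-Parse-REST-API-Key", "REST_API_KEY")  // placeholder
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}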
In your case, I think you should follow Parse's tutorial here: http://blog.parse.com/learn/engineering/implementing-scalable-search-on-a-nosql-backend/
It is very helpful in terms of search. The main ideas are:
Separate the text into single words and store them as an array.
Then perform the search on the new field using a containedIn (or $in in the REST API) query, as in the sketch below.
There's also a question about this: How to make a "like" query in Parse.com
EDIT: New blog link: https://web.archive.org/web/20150416171914/http://blog.parse.com/2013/03/19/implementing-scalable-search-on-a-nosql-backend/
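A minimal sketch of that word-array idea, assuming the Parse Android/Java SDK; "Book", "title", and "searchWords" are illustrative names, and the array is filled in at save time from the text you want to be searchable:

import java.util.Arrays;
import java.util.List;

import com.parse.ParseException;
import com.parse.ParseObject;
import com.parse.ParseQuery;

public class ParseWordSearch {

    // At save time: split the searchable text into lowercase words
    // and store them in an array column.
    public static void index(ParseObject book, String title) {
        book.put("title", title);
        book.put("searchWords", Arrays.asList(title.toLowerCase().split("\\W+")));
    }

    // At query time: containedIn on an array column matches rows whose array
    // contains any of the given words (the $in constraint in the REST API).
    public static List<ParseObject> search(String term) throws ParseException {
        ParseQuery<ParseObject> query = ParseQuery.getQuery("Book");
        query.whereContainedIn("searchWords", Arrays.asList(term.toLowerCase()));
        return query.find();
    }
}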

How do I see/debug the way Solr finds its results?

Let's say I search for "ABLS" and the SOLR returns a result that to me does not make any sense.
How can I debug why SOLR picked this record to be returned?
debugQuery=true will give you the detailed score calculation and the explanation for each score.
An overview of the scoring is available at the link.
For a detailed explanation of the debug information, you can refer to the link.
You could add debugQuery=true&indent=true to the URL and examine the results. You could also use the analysis tool in Solr: go to the admin UI and click Analysis. You would need to read the wiki to understand either of these in more depth.
debugQuery will tell you why your scoring looks the way it does (and how relevant every field is).
Take some results that you don't understand and play with them in Solr's analysis tool.
You should find it under:
/admin/analysis.jsp?highlight=on
Alternatively, turn on highlighting for your results to see what is actually matching in them.
Solr queries are full of short parameters that are hard to read and modify, especially when there are many of them.
After that, it is even harder to debug and understand why one document is more or less relevant than another. The debug explain output is usually a tree too big to fit on one page.
I found this Google Chrome extension useful for viewing the Solr query explain and debug output in a clear manner.
For those who still use the very old Solr 3.x: "debugQuery=true" will not produce the debug information; you should specify "debugQuery=on" instead.
There are two ways of doing that. The first is at the query level, which means adding debugQuery=on to your query. That will include a few things:
parsed query
debug timing information
detailed scoring information, which helps you analyze why a given document was given its score
In addition to that, you can use the [explain] transformer and add it to your fl parameter, for example ...&fl=*,[explain], which will result in your documents having the scoring information as another field.
The scoring information can be quite extensive and will include the calculations done by the similarity algorithm. If you would like to learn more about similarities and the scoring algorithm in Solr, have a look at this talk that my colleague Radu from Sematext and I gave at the Activate conference: https://www.youtube.com/watch?v=kKocQdYGVJM
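As a small illustration of the query-level option, a hedged SolrJ sketch (the Solr URL and collection name are placeholders) that turns on debugQuery and prints the per-document explain text:

import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SolrDebugExample {

    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build()) { // placeholder core

            SolrQuery query = new SolrQuery("ABLS");
            query.set("debugQuery", "true"); // use "on" for very old 3.x versions
            query.set("indent", "true");

            QueryResponse response = solr.query(query);
            // Per-document scoring explanation: the same text you would read
            // in the "explain" section of the debug output.
            for (Map.Entry<String, String> e : response.getExplainMap().entrySet()) {
                System.out.println(e.getKey() + " => " + e.getValue());
            }
        }
    }
}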
