Searching "and" and ampersand - full-text-search

I have a PHP/MySQL directory. If someone searches for a company name like "Johnson & Johnson" and it's in the DB as "Johnson and Johnson", it doesn't match.
I'm doing a NAME LIKE '% var %' kind of search currently. Is there an easy way to get this to work? I'm not sure if it's just a matter of setting up the table as INNODB with full text on the column or if there's more involved.
Thanks,
Don

Yeah, you need a more sophisticated search capable of tokenising the search terms and searching through a tokenised index. You could probably get some of the way there with a full text search in the InnoDB table engine, but you could also look at other options. Some that you could consider:
Sphinx
Lucene
Solr
Nutch
All of these are more sophisticated full-text indexers and searchers than what is built into a database engine, but they also take more work to set up than a MySQL full-text search, so it depends on the features you need.

Replacement of "&" by "and" is not really a trivial task in my book.
You may fare better by doing that kind of replacement beforehand, using a set of pre-defined rules (e.g. "&" and "+" become "and").
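A minimal sketch of that kind of rule-based replacement, assuming the same normalization is applied both to the stored names and to the search term before they are compared. The function name normalize_name and the exact rule set are hypothetical, not something from the thread.

```python
import re

# Hypothetical rule set: "&" and "+" (with any surrounding spaces)
# are rewritten to the word "and". Extend the character class as needed.
_AND_SYMBOLS = re.compile(r"\s*[&+]\s*")
_WHITESPACE = re.compile(r"\s+")


def normalize_name(text: str) -> str:
    """Return a lowercase, whitespace-collapsed form with '&'/'+' spelled out."""
    text = _AND_SYMBOLS.sub(" and ", text)
    text = _WHITESPACE.sub(" ", text).strip()
    return text.lower()


if __name__ == "__main__":
    stored = "Johnson and Johnson"
    query = "Johnson & Johnson"
    # Both sides are normalized, so the two spellings compare equal.
    print(normalize_name(stored) == normalize_name(query))
```

One way to use this is to store the normalized form in an extra column and run the LIKE (or full-text) search against that column with a normalized search term. Note that a blanket rule also rewrites names such as "AT&T" to "at and t", which is part of why the replacement is less trivial than it looks.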

Related

Does Couchbase 5 make ElasticSearch useless for Full Text Search?

Couchbase FTS is now an official feature in version 5. Why would one still use ElasticSearch along with Couchbase?
Quoting from the documentation:
Couchbase FTS is similar in purpose to other search software such as
ElasticSearch or Solr. Couchbase FTS is not intended as a replacement
for third party search software if search is at the core of your
application. It is a simple and lightweight way to add search to your
Couchbase data without deploying additional software and servers. If
you have many queries which look like SELECT ... field1 LIKE %pattern% OR field2 LIKE %pattern, then full-text search may be right for you.
It will depend on your specific use case, but there is a reason search is considered a complicated problem: some products have spent years working on it (and still are).
Full-text search is not the same as a search engine. Full-text search does not support a lot of the functions that ElasticSearch provides. For example, in ElasticSearch you can set the weight of fields in the result set, do geo search, etc. Couchbase full-text search is just a full-text search implementation, i.e. a basic string-matching function on a specially indexed field only.
So, if your task is to do a basic substring search as part of a query, then you don't need ElasticSearch anymore. That makes development quicker and infrastructure cheaper. However, if you are building a system that needs a proper search engine, then you need ElasticSearch as much as before.

Elasticsearch - Autocomplete return word/term/token suggestions instead of whole documents

I am trying to implement simple autocompletion for query terms.
There are many different approaches, but most of them return documents instead of terms, or the authors simply stop explaining at that point and I am not able to adapt them.
A user is typing in a query, e.g. phil
What I want is to provide a list of term completion suggestions like philipp, philius, philadelphia, ...
I am able to get document matches via (edge)ngrams, phrase_prefix and so on, but I am stuck at retrieving matching terms (completion suggestions).
Can someone give me a hint?
I have documents like this {"title":"...", "description":"...", "content":"..."}
All fields hold longer string values, and the content field in particular contains full-text content.
I do not want to suggest the whole title of a document containing e.g. Philadelphia. Just the word "Philadelphia".
Looking for something like that, myself.
In SOLR it was relatively simple to configure (although a pain to build and keep up-to-date) using solr.SpellCheckComponent. Somehow the same underlying Lucene functionality is used differently between SOLR and ElasticSearch, and in ElasticSearch it is geared towards finding whole documents (or whole field values, if you will) or so it seems...
Despite the profusion of "elasticsearch autocomplete" articles, none appears to deal with this particular issue. Like it doesn't exist. Maybe their use case is different and ElasticSearch works for them just fine, who knows?
At this point I think that preparing the exact field values to use with ElasticSearch autocomplete (yes, that's the input field values, not analyzer tokens) may be the only way to solve the problem. Which is terrible, because the performance is going to be very low.
Try term suggester:
The term suggester suggests terms based on edit distance. The provided
suggest text is analyzed before terms are suggested. The suggested
terms are provided per analyzed suggest text token. The term suggester
doesn’t take the query into account that is part of request.
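As a rough sketch, a term suggester request body can be built as below and sent to the _search endpoint of the index. The suggestion label "term-suggestions", the default field name content (taken from the question's documents), and the helper names are assumptions for illustration. The sample response in the usage note is hand-written, not real output.

```python
import json

# Arbitrary label for the suggestion block; any name works.
SUGGESTION_NAME = "term-suggestions"


def term_suggest_body(text: str, field: str = "content", size: int = 5) -> dict:
    """Build a request body that asks the term suggester for terms near `text`."""
    return {
        "suggest": {
            SUGGESTION_NAME: {
                "text": text,
                "term": {
                    "field": field,            # field whose indexed terms are consulted
                    "size": size,              # maximum options per input token
                    "suggest_mode": "always",  # suggest even if the token exists
                },
            }
        }
    }


def extract_terms(response: dict, name: str = SUGGESTION_NAME) -> list:
    """Collect the distinct suggested terms from a search response."""
    terms = []
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            if option["text"] not in terms:
                terms.append(option["text"])
    return terms


if __name__ == "__main__":
    print(json.dumps(term_suggest_body("philadelpia"), indent=2))
```

Because the suggester works on edit distance, it is geared toward near-miss spellings such as "philadelpia" rather than short prefixes, so whether it helps with completing an input like phil is worth testing against real data.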

How to search for multiple strings in very large database

I want to search for multiple strings in a very large database. These strings are part of different attributes of a database table. I have tried string search using LIKE in an SQL query, but it takes a lot of time to get results. I am using an Oracle database.
Should I index the database? I found that Lucene can be used for it.
I also got some suggestions of using big data concepts. Which approach should I use?
The easiest way is:
1.) adding an index to the columns you want to search through
2.) using Oracle Text, as #lalitKumarB wrote
The most powerful way is:
3.) using a separate search engine (Solr, Elasticsearch).
But you will probably have to change your application so that it explicitly uses the search index for searching through the data.
I was in the same situation some years ago, trying to search text in a big database. After a while I found out that database-based search will never reach the performance of a dedicated search engine. And you get many more search features out of the box if you use Solr (for example), like spelling correction, "More like this", ...
One option is to keep the data in Oracle, search in Solr, and return the ID of the document so that you only load the one row from Oracle that is referenced by the ID.
A second option is to keep Oracle as the base data pool for your search engine and search in Solr (or Elasticsearch), returning the whole document/row from Solr, not only the ID. That way you no longer need to load the data from the database.
The best option depends on your needs.
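The first option (search in the engine, then load rows by ID) can be sketched as follows. This is only an illustration under loud assumptions: an in-memory dictionary stands in for the Solr/Elasticsearch index, SQLite stands in for Oracle, and the table and function names are made up.

```python
import sqlite3

# Hypothetical stand-in for a Solr/Elasticsearch index: document ID -> indexed text.
SEARCH_INDEX = {
    1: "acme corporation annual report",
    2: "globex quarterly report",
    3: "acme product catalogue",
}


def search_ids(query):
    """Return the IDs whose indexed text contains every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    return sorted(
        doc_id
        for doc_id, text in SEARCH_INDEX.items()
        if all(word in text.split() for word in words)
    )


def load_rows(conn, ids):
    """Fetch only the referenced rows from the database by primary key."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, title FROM documents WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()


def make_database():
    """Create an in-memory SQLite database standing in for Oracle."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO documents (id, title) VALUES (?, ?)",
        [
            (1, "Acme Corporation Annual Report"),
            (2, "Globex Quarterly Report"),
            (3, "Acme Product Catalogue"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = make_database()
    ids = search_ids("acme report")  # step 1: the search engine returns IDs only
    print(load_rows(conn, ids))      # step 2: the database returns the full rows
```

With the second option, load_rows disappears and the search engine itself returns the stored fields, at the cost of keeping a full copy of the data in the index.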
You have the choice between Elasticsearch, Solr or Lucene.

Keyword search over a collection of OWL ontologies

I have a collection of OWL ontologies. Each ontology is stored in a dataset of a triple store database (e.g. OWLIM, Stardog, AllegroGraph). Now I need to develop an application which supports searching these ontologies based on keywords, i.e., given a keyword, the application should return the ontologies that contain this keyword.
I have checked OWLIM-SE and Stardog; they only provide full-text search over one dataset, not the whole database. I have also considered Solr (Lucene), but in that case the ontologies would be indexed twice (once by Lucene, and once by the triple store database).
Is there any other solution for this problem?
Thanks in advance.
Stardog's full text indexing works over an entire database and can be done transparently with SPARQL which will allow you to easily access other properties of the concepts matching your search criteria in a single query. This will get you precisely what you're describing.
For some information on administering the search indexes, and Stardog in general, check out these docs

apache cassandra query/full text search

I've been playing around with Apache's Cassandra project. I've done a fair bit of reading and have built some fairly complex examples, including inserting single and batch sets of data and retrieving single and multiple data sets based on keys.
Some of the articles I've looked at include:
http://www.rackspacecloud.com/blog/2010/05/12/cassandra-by-example
http://github.com/digg/lazyboy
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
http://www.sodeso.nl/?p=80
I've got a fairly good grasp of the concepts explained and have even implemented a simple app.
None of the articles describe how one would go about performing a query where, e.g., the query is a search term a user has typed in.
Does anyone know how, or can anyone suggest how, I'd go about performing such a query?
Or perhaps a way to create a searchable index, full text search or anything even remotely close?
You will probably split the text into words, and then use these words as keys to your "index". Each word will contain a timestamp-ordered column family with a list of IDs of your articles, messages, etc. So you can only perform simple searches over keys (words).
When searching for more than one word, use the intersection of these column families.
This is a very simple approach; if you need more complex queries, look at Lucandra - http://github.com/tjake/Lucandra - Lucandra is a full-text search engine with Cassandra as backend storage.
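The word-keyed index and the intersection step can be sketched in plain Python. This uses in-memory dictionaries in place of Cassandra rows and columns, and the class and method names are hypothetical; it only illustrates the data layout, not a Cassandra client.

```python
from collections import defaultdict


class WordIndex:
    """Toy inverted index: one 'row' per word, one 'column' per article."""

    def __init__(self):
        # word -> {article_id: timestamp}, mimicking a row keyed by the word
        # whose columns reference the articles that contain it.
        self._rows = defaultdict(dict)

    def add(self, article_id, text, timestamp):
        """Split the text into words and register the article under each word."""
        for word in set(text.lower().split()):
            self._rows[word][article_id] = timestamp

    def search(self, query):
        """Return IDs of articles containing every query word, newest first."""
        words = query.lower().split()
        if not words:
            return []
        postings = [self._rows.get(word, {}) for word in words]
        # Intersect the article IDs found under each word.
        common = set(postings[0]).intersection(*postings[1:])
        return sorted(common, key=lambda a: postings[0][a], reverse=True)


if __name__ == "__main__":
    index = WordIndex()
    index.add("a1", "cassandra data model", timestamp=1)
    index.add("a2", "cassandra full text search", timestamp=2)
    index.add("a3", "full text search with lucene", timestamp=3)
    print(index.search("cassandra search"))
```

In an actual Cassandra setup the lookup per word would be a read by key and the intersection would happen in the application, which is why only simple word-level searches are practical with this layout.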

Resources