I'm thinking about copying my text-searchable content to Google's BigQuery and then performing full-text search using the BigQuery API.
Does Google BigQuery support that scenario?
I could not find a "search" command in the Google BigQuery API:
https://developers.google.com/bigquery/docs/reference/v2/
BigQuery supports a collection of RegEx and string query functions, which makes it suitable for text search queries across STRING fields. However, there is a 64k limit per row (and per field) for each BigQuery record, so it may not be possible to support a fully unstructured document text search of unlimited size.
https://developers.google.com/bigquery/docs/query-reference#stringfunctions
https://developers.google.com/bigquery/docs/query-reference#regularexpressionfunctions
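A minimal sketch of that approach, assuming today's Standard SQL dialect (where the regex function is REGEXP_CONTAINS; the legacy-SQL reference linked above calls it REGEXP_MATCH) and the google-cloud-bigquery Python client. The table and column names are hypothetical. The pattern is passed as a query parameter rather than pasted into the SQL.

```python
import re


def keyword_pattern(words):
    """Build an RE2-compatible, case-insensitive pattern matching any of the whole words."""
    alternatives = "|".join(re.escape(w) for w in words)
    return rf"(?i)\b(?:{alternatives})\b"


def keyword_sql(table, column):
    """Standard SQL that keeps rows whose STRING column matches the @pattern parameter."""
    return f"SELECT {column} FROM `{table}` WHERE REGEXP_CONTAINS({column}, @pattern)"


def run_keyword_search(table, column, words):
    """Run the query; needs the google-cloud-bigquery package and valid credentials."""
    from google.cloud import bigquery  # imported lazily so the helpers above work without it

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("pattern", "STRING", keyword_pattern(words))
        ]
    )
    return list(client.query(keyword_sql(table, column), job_config=job_config).result())
```

For example, run_keyword_search("my_dataset.docs", "body", ["smartphone", "iphone"]) would return matching rows. Every such query scans the whole column, so cost grows with table size.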
For full-text search capabilities in an App Engine application, I would suggest looking at the new Search API:
https://developers.google.com/appengine/docs/python/search/overview
Ten years later, here we are: today (07/04/22) BigQuery launched its equivalent of full-text search. Here is the announcement:
https://cloud.google.com/blog/products/data-analytics/pinpoint-unique-elements-with-bigquery-search-features/
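As a hedged illustration of the two pieces the announcement covers, the SEARCH function and search indexes, the helpers below only build the SQL text. The index and table names are hypothetical. As I understand the docs, SEARCH also works without an index; the index makes it cheaper. ALL COLUMNS indexes the table's supported (e.g. STRING) columns.

```python
def bq_string_literal(text):
    """Quote text as a BigQuery single-quoted string literal (escape backslashes, then quotes)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def search_index_ddl(index_name, table):
    """DDL for a search index over all supported columns of the table."""
    return f"CREATE SEARCH INDEX {index_name} ON `{table}`(ALL COLUMNS)"


def search_sql(table, terms):
    """Query using SEARCH over every column of each row (the row is aliased as t)."""
    return f"SELECT * FROM `{table}` AS t WHERE SEARCH(t, {bq_string_literal(terms)})"
```

The resulting strings can be run with any BigQuery client, e.g. the google-cloud-bigquery query method.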
The litecene library provides full-text search support for BigQuery using a "lucene light" syntax, for example:
(smartphone OR "smart phone"~8 OR iphone OR "i phone" OR "apple phone" OR android OR "google phone" OR "windows phone") AND app*
It compiles the boolean query language down to regular expression matches. It also makes use of new BigQuery search features -- namely the SEARCH function and search indexes -- when possible, although at the time of this writing the searches supported by those features are fairly limited. Using litecene, full-text search can also be deployed against existing production datasets without any ETL changes or re-indexing using non-aggregate materialized views. The searches can target one or multiple columns.
Disclaimer: I am the author of the library.
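To show the general idea of compiling a boolean query down to regular expression matches, here is a small sketch. It is not litecene's code. It assumes the query has already been parsed into conjunctive normal form (a list of AND-ed clauses, each a list of OR-ed terms), supports only single-word terms with an optional trailing *, and skips phrases and proximity such as "smart phone"~8. The column name is hypothetical.

```python
import re

# Only plain words with an optional trailing wildcard are supported in this sketch.
TERM = re.compile(r"\w+\*?")


def term_regex(term):
    """Regex for one term: 'app*' matches any word starting with app, 'iphone' the whole word."""
    if not TERM.fullmatch(term):
        raise ValueError(f"unsupported term: {term!r}")
    if term.endswith("*"):
        return re.escape(term[:-1]) + r"\w*"
    return re.escape(term) + r"\b"


def clause_regex(terms):
    """Case-insensitive regex for an OR-group of terms, anchored at a word start."""
    return r"(?i)\b(?:" + "|".join(term_regex(t) for t in terms) + ")"


def compile_cnf(column, clauses):
    """One REGEXP_CONTAINS predicate per AND-ed clause, using BigQuery raw string literals."""
    return " AND ".join(
        f"REGEXP_CONTAINS({column}, r'{clause_regex(c)}')" for c in clauses
    )
```

For instance, compile_cnf("body", [["smartphone", "iphone"], ["app*"]]) produces a WHERE-clause fragment with two REGEXP_CONTAINS predicates joined by AND. Restricting terms to word characters keeps the regex safe to embed in the raw SQL literal.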
I am evaluating search technologies, and one of my requirements is the ability to match translated text as well.
For example, there are text documents written in English and French, and Lucene will index them.
If I am searching for the string "apple", it should search for both "apple" and "pomme" and show documents containing either.
Do any technologies provide automatic translation of tokens?
Or is the only way to translate the text using a Google API and then feed it to Lucene for indexing?
Lucene, Solr and Elasticsearch do not translate automatically, but they have a related feature called synonyms. You can build a synonym list with a Google translation API and apply it at search time rather than at index time.
With this approach, you can search for "apple", the search engine will treat "apple" and "pomme" as synonyms, and you will get the expected results.
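A minimal sketch of the synonym idea, assuming a hypothetical translation table that you would fill from a translation API offline. It renders the table in the comma-separated equivalence format used by Solr synonym files (Elasticsearch's synonym token filter accepts the same format), and also shows the equivalent search-time expansion done by hand.

```python
# Hypothetical translation table; in practice it might be filled from a translation API.
TRANSLATIONS = {
    "apple": ["pomme"],
    "house": ["maison"],
}


def synonym_lines(translations):
    """Render each entry as a comma-separated line of equivalent terms."""
    return [", ".join([term, *others]) for term, others in sorted(translations.items())]


def expand_query(query, translations):
    """Search-time expansion: each term becomes an OR group of itself and its translations."""
    groups = []
    for term in query.lower().split():
        alternatives = [term, *translations.get(term, [])]
        if len(alternatives) == 1:
            groups.append(term)
        else:
            groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)
```

For example, expand_query("Apple pie", TRANSLATIONS) gives "(apple OR pomme) AND pie". Because expansion happens at query time, updating the table does not require re-indexing.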
Couchbase FTS is now an official feature in version 5. Why would one still use Elasticsearch along with Couchbase?
Quoting from the documentation:
Couchbase FTS is similar in purpose to other search software such as
ElasticSearch or Solr. Couchbase FTS is not intended as a replacement
for third party search software if search is at the core of your
application. It is a simple and lightweight way to add search to your
Couchbase data without deploying additional software and servers. If
you have many queries which look like SELECT ... field1 LIKE %pattern% OR field2 LIKE %pattern%, then full-text search may be right for you.
It depends on your specific use case, but search is a complicated problem, which is why some products have spent years working on it (and continue to).
Full-text search is not the same as a search engine. Couchbase FTS does not support many of the functions that Elasticsearch provides: for example, in Elasticsearch you can set field weights for the result set, do geo search, etc. Couchbase full-text search is just a full-text search implementation, i.e. a basic string-matching function over a specially indexed field.
So, if your task is a basic substring search as part of a query, you don't need Elasticsearch anymore. That makes development quicker and infrastructure cheaper. However, if you are building a system that needs a proper search engine, you need Elasticsearch as much as before.
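To make the difference concrete, here is a sketch of an Elasticsearch request body using two of the features mentioned above: per-field weights (the ^ boost syntax in a multi_match query) and a geo_distance filter. The field names title, body, name and location are hypothetical, and location is assumed to be mapped as a geo_point. The dict would be sent as the JSON body of a _search request.

```python
def build_es_query(text, field_boosts, geo=None):
    """Build a search body with weighted fields and an optional (lat, lon, distance) filter."""
    fields = [f"{name}^{boost}" for name, boost in field_boosts.items()]
    match = {"multi_match": {"query": text, "fields": fields}}
    if geo is None:
        return {"query": match}
    lat, lon, distance = geo
    return {
        "query": {
            "bool": {
                "must": match,  # scored full-text part
                "filter": {  # unscored geo restriction on the hypothetical location field
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }
```

For example, build_es_query("cafe", {"name": 2, "body": 1}, geo=(48.85, 2.35, "5km")) ranks title-like name matches twice as high as body matches and keeps only hits within 5 km.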
I want to search for multiple strings in a very large database. These strings are spread across different attributes (columns) of a database table. I have tried string search using LIKE in a SQL query, but it takes a long time to get results. I am using an Oracle database.
Should I use database indexing? I found that Lucene can be used for this.
I have also received suggestions to use big data concepts. Which approach should I use?
The easiest way is:
1.) adding an index to the columns you want to search through
2.) using Oracle Text as #lalitKumarB wrote
The most powerful way is:
3.) using a separate search engine (Solr, Elasticsearch).
But you will probably have to change your application so that it explicitly uses the search index when searching through the data.
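For the Oracle Text route, here is a sketch of the two statements involved, a CONTEXT index and a CONTAINS query, wrapped in Python helpers. A plain B-tree index generally cannot serve LIKE '%term%' with a leading wildcard, which is why Oracle Text uses its own index type. The table and column names are hypothetical, the runner assumes an open python-oracledb connection, and by default a CONTEXT index must be synchronized (e.g. with CTX_DDL.SYNC_INDEX) before new rows become searchable.

```python
def context_index_ddl(index_name, table, column):
    """Oracle Text CONTEXT index on a text column."""
    return f"CREATE INDEX {index_name} ON {table}({column}) INDEXTYPE IS CTXSYS.CONTEXT"


def contains_sql(table, id_column, text_column):
    """CONTAINS query with a bind variable; label 1 ties CONTAINS to SCORE(1)."""
    return (
        f"SELECT {id_column}, SCORE(1) FROM {table} "
        f"WHERE CONTAINS({text_column}, :q, 1) > 0 ORDER BY SCORE(1) DESC"
    )


def search(conn, table, id_column, text_column, expression):
    """Run the query on an open python-oracledb connection and return (id, score) rows."""
    with conn.cursor() as cur:
        cur.execute(contains_sql(table, id_column, text_column), q=expression)
        return cur.fetchall()
```

The expression can use Oracle Text operators, e.g. search(conn, "docs", "id", "body", "apple AND pie").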
I was in the same situation some years ago, trying to search text in a big database. After a while I found that database-based search will never reach the performance of a dedicated search engine. Also, you get many more search features out of the box if you use Solr (for example), like spelling correction, "More like this", ...
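Solr's spellchecker is its own component with its own algorithms; purely as a rough illustration of the "did you mean" idea, Python's standard difflib can suggest close matches from a vocabulary of indexed terms. The vocabulary below is hypothetical.

```python
import difflib

# Hypothetical vocabulary of terms that exist in the index.
VOCABULARY = ["search", "select", "oracle", "solr", "lucene"]


def did_you_mean(word, vocabulary=VOCABULARY):
    """Return up to three indexed terms similar to a possibly misspelled word."""
    return difflib.get_close_matches(word.lower(), vocabulary, n=3, cutoff=0.7)
```

Here did_you_mean("serch") suggests "search" first; a real search engine would also weigh how often each term occurs.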
One option is to keep the data in Oracle, search in Solr, and return only the ID of each matching document, then load just the row referenced by that ID from Oracle.
A second option is to keep Oracle as the base data pool for your search engine and search in Solr (or Elasticsearch) so that the whole document/row is returned from Solr, not only the ID. That way you don't need to load the data from the database at all.
The best option depends on your needs.
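The first option can be sketched with a toy inverted index standing in for Solr and a dict standing in for the Oracle table (both hypothetical). The index only stores row IDs; the full rows are fetched afterwards by primary key. In the second option the index would store the whole row, so the final lookup would disappear.

```python
import re
from collections import defaultdict

# Stand-in for the Oracle table: primary key -> row (hypothetical data).
DATABASE = {
    1: {"title": "Apple pie recipe", "body": "Bake the apples first."},
    2: {"title": "Phone review", "body": "A smart phone with a good camera."},
}


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_index(rows, fields):
    """Inverted index: token -> set of row IDs (the search engine's job in option 1)."""
    index = defaultdict(set)
    for row_id, row in rows.items():
        for field in fields:
            for token in tokenize(row[field]):
                index[token].add(row_id)
    return index


def search_ids(index, query):
    """IDs of rows containing every query token."""
    sets = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*sets) if sets else set()


def load_rows(rows, ids):
    """Second step of option 1: fetch only the referenced rows from the database."""
    return [rows[i] for i in sorted(ids)]
```

Lookups touch only the posting sets for the query tokens, instead of scanning every row as LIKE '%...%' does.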
You have the choice between Elasticsearch, Solr, or Lucene.
I have a collection of OWL ontologies. Each ontology is stored in a dataset of a triple store database (e.g. OWLIM, Stardog, AllegroGraph). Now I need to develop an application that searches these ontologies by keyword, i.e., given a keyword, the application should return the ontologies that contain it.
I have checked OWLIM-SE and Stardog; they only provide full-text search over one dataset, not the whole database. I have also considered Solr (Lucene), but in that case the ontologies would be indexed twice (once by Lucene and once by the triple store database).
Is there any other solution for this problem?
Thanks in advance.
Stardog's full text indexing works over an entire database and can be done transparently with SPARQL which will allow you to easily access other properties of the concepts matching your search criteria in a single query. This will get you precisely what you're describing.
For information on administering the search indexes, and on Stardog in general, check out the Stardog documentation.
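For comparison, here is a portable SPARQL 1.1 query that returns every named graph containing a literal that includes the keyword, assuming (hypothetically) that each ontology is loaded as its own named graph in one database. It uses only the standard FILTER/CONTAINS functions, so it scans literals rather than using Stardog's full-text index; Stardog's documentation describes the search-specific syntax that replaces the FILTER. The endpoint URL is a placeholder, and running the query requires the SPARQLWrapper package.

```python
def sparql_literal(text):
    """Quote text as a SPARQL double-quoted string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def graphs_with_keyword_query(keyword):
    """SPARQL 1.1: named graphs holding a literal that contains the keyword (case-insensitive)."""
    return (
        "SELECT DISTINCT ?g WHERE {\n"
        "  GRAPH ?g { ?s ?p ?o .\n"
        f"    FILTER(isLiteral(?o) && CONTAINS(LCASE(STR(?o)), {sparql_literal(keyword.lower())})) }}\n"
        "}"
    )


def find_ontologies(endpoint, keyword):
    """Run the query against a SPARQL endpoint and return the matching graph IRIs."""
    from SPARQLWrapper import SPARQLWrapper, JSON  # third-party; imported lazily

    client = SPARQLWrapper(endpoint)
    client.setQuery(graphs_with_keyword_query(keyword))
    client.setReturnFormat(JSON)
    results = client.query().convert()
    return [b["g"]["value"] for b in results["results"]["bindings"]]
```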
I'm planning to implement free-text search using Lucene.NET, and I'm new to Lucene. In our project we use ASP.NET MVC 3.0 and Entity Framework 4.1.
Is it a good decision to use Lucene over the free-text search in MS SQL Server?
What are the implications I need to take care of?
Is it possible to use MS SQL Server instead of the file system to store the Lucene index?
Is it a good decision to use Lucene over the free-text search in MS SQL Server?
It depends on the amount of data and the query flexibility you want. If you have a large amount of data and you want very flexible queries, yes it is.
What are the implications I need to take care of?
You will need to manually keep your Lucene indexes up to date with the database, and you will need to handle the free-text search yourself.
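The bookkeeping this implies is: on every insert, update or delete in the database, apply the same change to the index. Below is a toy in-memory sketch of that, not Lucene.NET code; in Lucene.NET the equivalent calls live on IndexWriter (UpdateDocument and DeleteDocuments). Where the change events come from (e.g. your data-access layer after saving) is an assumption left to the application.

```python
import re
from collections import defaultdict


class SyncedIndex:
    """In-memory inverted index kept in step with database change events."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> doc IDs
        self.doc_tokens = {}  # doc ID -> tokens currently indexed for it

    @staticmethod
    def _tokens(text):
        return set(re.findall(r"\w+", text.lower()))

    def delete(self, doc_id):
        """Remove a document's postings (call when the row is deleted)."""
        for token in self.doc_tokens.pop(doc_id, set()):
            self.postings[token].discard(doc_id)
            if not self.postings[token]:
                del self.postings[token]

    def upsert(self, doc_id, text):
        """Delete-then-add, the same idea as Lucene's updateDocument."""
        self.delete(doc_id)
        tokens = self._tokens(text)
        self.doc_tokens[doc_id] = tokens
        for token in tokens:
            self.postings[token].add(doc_id)

    def search(self, word):
        return set(self.postings.get(word.lower(), set()))
```

Removing the old postings before adding new ones is what keeps stale words from matching after an update.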
Is it possible to use MS SQL Server instead of the file system to store the Lucene index?
See http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_store_the_Lucene_index_in_a_relational_database.3F
I'd recommend taking a look at the Lucene Java FAQ; pretty much everything there applies to Lucene.NET as well, and it addresses lots of other questions you may have.