I am evaluating search technologies, and one of my requirements is that a search should also match translated text.
For example, there are text documents written in English and French, and Lucene will index them.
If I search for the string "apple", it should search for both "apple" and "pomme" and show documents containing either.
Do any technologies provide automatic translation of tokens?
Or is the only way to translate the text using the Google API and then feed it to Lucene for indexing?
There is no automatic translation in Lucene/Solr/Elasticsearch, but they have a similar feature called synonyms. You can build a list of synonyms with the Google API and use it to translate the terms at search time rather than at index time.
With this approach, you can search for "apple" and the search engine will treat "apple" and "pomme" as synonyms, so you will get the expected results.
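As a minimal sketch of the search-time synonym idea, the snippet below builds an Elasticsearch index-creation body in Python. It assumes the typeless mapping format of Elasticsearch 7 or later, and a hand-written translation table stands in for whatever a translation API would return. The names `translations`, `translated_search` and `body` are made up for illustration.

```python
def build_index_body(translations, field="body"):
    """Return an index-creation body whose search analyzer expands translations."""
    # One synonym rule per source term: a comma-separated group of equivalents.
    rules = [
        ", ".join([term] + list(others))
        for term, others in sorted(translations.items())
    ]
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "translations": {"type": "synonym_graph", "synonyms": rules}
                },
                "analyzer": {
                    "translated_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "translations"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    # Documents are indexed as written ...
                    "analyzer": "standard",
                    # ... and only the query text is expanded with synonyms.
                    "search_analyzer": "translated_search",
                }
            }
        },
    }


# Hypothetical translation table; in practice it would come from a translation service.
index_body = build_index_body({"apple": ["pomme"]})
```

The resulting dictionary would be sent as the body of the index-creation request. Because the synonyms sit in the search analyzer only, changing the list should not require re-indexing the documents.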
I have been using ES to handle regular text/keyword search. Is there a way to use Elasticsearch to handle context-based search, i.e. when a user enters a search text such as "articles between 10 august and 24 September" or something similar, ES should be able to identify what the user is asking for and present results? I suppose ML is needed to handle such scenarios. If any NLP or ML integration has to be done, where should I start in order to improve the search experience?
Any insight on this is much appreciated.
This is called semantic parsing. What you need to do is map the sentence to a logical form. This is a challenging task, since the computer needs to understand your sentence. You may create your own semantic parser (e.g., using SEMPRE) to do the translation, or use existing methods that translate human language into Elasticsearch queries.
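To make "map the sentence to a logical form" concrete, here is a deliberately tiny, rule-based sketch in Python. It is not a real semantic parser: it recognizes only the single pattern "between <day> <month> and <day> <month>" and turns it into an Elasticsearch range query. The field name `published_at` and the explicit `year` argument are assumptions, since the example sentence names no field and no year.

```python
import re
from datetime import date

MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Matches e.g. "between 10 august and 24 September" (case-insensitive).
PATTERN = re.compile(
    r"between\s+(\d{1,2})\s+([a-z]+)\s+and\s+(\d{1,2})\s+([a-z]+)",
    re.IGNORECASE,
)


def parse_date_range(text, year, field="published_at"):
    """Return a range query for the date span in `text`, or None if none is found."""
    match = PATTERN.search(text)
    if match is None:
        return None
    day1, month1, day2, month2 = match.groups()
    try:
        start = date(year, MONTHS[month1.lower()], int(day1))
        end = date(year, MONTHS[month2.lower()], int(day2))
    except (KeyError, ValueError):
        # Unknown month name or impossible calendar date.
        return None
    return {
        "query": {
            "range": {field: {"gte": start.isoformat(), "lte": end.isoformat()}}
        }
    }


query = parse_date_range("articles between 10 august and 24 September", 2023)
```

A trained parser would replace the regular expression, but the output side stays the same: the logical form is simply the query body the search engine already understands.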
Couchbase FTS is now an official feature in version 5. Why would one still use ElasticSearch along with Couchbase?
Quoting from the documentation:
Couchbase FTS is similar in purpose to other search software such as
ElasticSearch or Solr. Couchbase FTS is not intended as a replacement
for third party search software if search is at the core of your
application. It is a simple and lightweight way to add search to your
Couchbase data without deploying additional software and servers. If
you have many queries which look like SELECT ... field1 LIKE %pattern% OR field2 LIKE %pattern, then full-text search may be right for you.
It will depend on your specific use case, but there is a reason why search is a complicated problem and why some products have spent years working on it (and continue to do so).
Full-text search is not the same as a search engine. Couchbase Full Text Search does not support a lot of the functions that ElasticSearch provides. For example, in ElasticSearch you can set the weight of fields in the result set, do geo search, etc. Couchbase full-text search is just a full-text search implementation, i.e. a basic string-matching function on specially indexed fields only.
So, if your task is to do a basic substring search as part of a query, then you don't need ElasticSearch anymore. That makes development quicker and infrastructure cheaper. However, if you are building a system that needs a proper search engine, then you need ElasticSearch as much as before.
I have a scenario where I need to map each article to an entity. To do so, we maintain a set of keywords / search phrases (e.g. (icici OR hdfc) AND bank) that may appear in each article. We want to use the power of Elasticsearch to find all the search phrases that match the article being processed.
What I have come across so far is forward search (full-text search and so on), but what I need here is a reverse search of search phrases against an article.
I was digging for a solution and hoped some genius had already discovered one and could help with it.
In Elasticsearch it's called percolator.
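A rough sketch of how the percolator fits the question, written as Python functions that build the three request bodies involved: a mapping with a `percolator` field, a stored search phrase, and a `percolate` query that runs an article against the stored phrases. The field names `query`, `body` and `entity` and the entity value are assumptions for illustration; the mapping format assumes Elasticsearch 7 or later.

```python
def percolator_mapping(text_field="body"):
    """Index body: one field holds stored queries, one describes article text."""
    return {
        "mappings": {
            "properties": {
                "query": {"type": "percolator"},
                # Fields used by the stored queries must exist in the mapping.
                text_field: {"type": "text"},
            }
        }
    }


def stored_phrase(entity, phrase, text_field="body"):
    """Document to index: a search phrase registered for one entity."""
    return {
        "entity": entity,
        "query": {"query_string": {"query": phrase, "default_field": text_field}},
    }


def percolate_request(article_text, text_field="body"):
    """Search body: find every stored phrase that matches this article."""
    return {
        "query": {
            "percolate": {"field": "query", "document": {text_field: article_text}}
        }
    }


# Hypothetical entity name; the phrase is the one from the question.
phrase_doc = stored_phrase("banking", "(icici OR hdfc) AND bank")
search_body = percolate_request("HDFC Bank reported its quarterly results today.")
```

Each hit returned for `search_body` is one of the stored phrase documents, so the `entity` value on the hit tells you which entity the article maps to.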
I'm thinking about copying my text-searchable content to Google's BigQuery and then performing full-text search using the BigQuery API.
Does Google BigQuery support that scenario?
I could not find a "search" command in the Google BigQuery API:
https://developers.google.com/bigquery/docs/reference/v2/
BigQuery supports a collection of regex and string query functions, making it suitable for text search queries across STRING fields. However, there is a 64k per row (and field) limit for each BigQuery record, so it may not be possible to support a totally unstructured, unlimited-size document text search case.
https://developers.google.com/bigquery/docs/query-reference#stringfunctions
https://developers.google.com/bigquery/docs/query-reference#regularexpressionfunctions
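As a small illustration of regex-based search over a STRING column, the Python sketch below only builds the SQL text and a parameter dictionary; it does not talk to BigQuery. It assumes GoogleSQL (standard SQL), where the matching function is `REGEXP_CONTAINS`; the legacy SQL dialect documented in the links above used `REGEXP_MATCH` instead. The table and column names are hypothetical.

```python
def regex_search_sql(table, column):
    """Build a GoogleSQL query that keeps rows whose column matches @pattern.

    `table` and `column` are interpolated directly into the SQL text,
    so they must come from trusted code, never from user input.
    """
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE REGEXP_CONTAINS({column}, @pattern)"
    )


# Hypothetical table and column names.
sql = regex_search_sql("my_project.my_dataset.documents", "body")

# The pattern travels as a query parameter instead of being pasted into the SQL.
params = {"pattern": r"(?i)\b(apple|pomme)\b"}
```

Passing `sql` and `params` to a BigQuery client is left out of the sketch on purpose, since that part depends on the client library and credentials in use.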
For full-text search capabilities in an App Engine application, I would suggest looking at the new Search API:
https://developers.google.com/appengine/docs/python/search/overview
10 years later and here we are. Today (07/04/22) BigQuery launched its equivalent of full-text search. Here is the doc:
https://cloud.google.com/blog/products/data-analytics/pinpoint-unique-elements-with-bigquery-search-features/
The litecene library provides full-text search support for BigQuery using a "lucene light" syntax.
(smartphone OR "smart phone"~8 OR iphone OR "i phone" OR "apple phone" OR android OR "google phone" OR "windows phone") AND app*
It compiles the boolean query language down to regular expression matches. It also makes use of new BigQuery search features -- namely the SEARCH function and search indexes -- when possible, although at the time of this writing the searches supported by those features are fairly limited. Using litecene, full-text search can also be deployed against existing production datasets without any ETL changes or re-indexing using non-aggregate materialized views. The searches can target one or multiple columns.
Disclaimer: I am the author of the library.
Is there a way to do faceted searches using the Elasticsearch Search API while maintaining case (as opposed to having the results converted to lowercase)?
Thanks in advance, Chuck
Assuming you are using the "terms" facet, the facet entries are exactly the terms in the index. Briefly, analysis is the process of converting a field value into a sequence of terms, and lowercasing is a step in the default analyzer; that's why you're seeing lowercased terms. So you will want to change your analysis configuration (and perhaps introduce a multi_field if you want to run several different analyzers.)
There's a great explanation in Lucene in Action (2nd Ed.); it's applicable to ElasticSearch, too.
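For a concrete picture of the analysis change, here is a hedged sketch in Python that builds two request bodies. It assumes a recent Elasticsearch version, where, as far as I know, the `multi_field` type has been replaced by the `fields` mapping parameter and facets by aggregations. The field name `tag`, the sub-field name `raw` and the aggregation name `by_value` are made up for illustration.

```python
def case_preserving_mapping(field="tag"):
    """Index body: analyzed text for searching, plus an unanalyzed sub-field."""
    return {
        "mappings": {
            "properties": {
                field: {
                    # Lowercased by the default analyzer, good for matching.
                    "type": "text",
                    # Stored as a single exact term, so the original case survives.
                    "fields": {"raw": {"type": "keyword"}},
                }
            }
        }
    }


def terms_request(field="tag", size=10):
    """Search body: count documents per exact value of the raw sub-field."""
    return {
        "size": 0,  # only the buckets are wanted, not the hits
        "aggs": {"by_value": {"terms": {"field": f"{field}.raw", "size": size}}},
    }


mapping_body = case_preserving_mapping()
facet_body = terms_request()
```

Searching still goes against `tag`, while the counts are computed on `tag.raw`, so the bucket keys come back exactly as they were indexed.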