I'm using the Websolr addon in Heroku. What does it mean by "250,000 documents"? What number of DB records or size is that?
Nick from Websolr here.
In this case, 'documents' would be all the distinct 'things' that you want to search.
A Solr index is comprised of many documents. Each document has many fields. Typically each document is analogous to a row in a table, or an instance of a model for your particular ORM.
Typically, a Solr client for your preferred language will help you integrate that concept into your own application and the tools you have used to create it.
In Solr a document is an indexed 'item'.
Related
I am trying to apply enterprise search for our e-commerce webapp.
Actually we have some fields which doesn't need to be indexed / searched, but only stored.
The schema documentation does only allow 4 types for a field:
text
number
date
geolocation
Before I try to come around indexing by falsely setting fields to geolocation or date, I wanted to ask if there are any other options to only store, but not index data into App Search?
Documents indexed in AppSearch are visible in a dedicated indice in Elasticsearch.
As such, you can modify its mapping and reindex it or update all docs by query.
The only think I cannot tell you is the side effects incurred by such changes at appsearch level.
Best, Dan
Hi I'm trying out Elastic Enterprise Search with Elasticsearch. I have a couple of questions on data indexing.
When referring to Elasticsearch documentation, I read that there is a limit to the number of fields that an Elasticsearch index could have. Since Elasticsearch is used with Elastic Enterprise Search I believe there is no arguing that the same applies here. In that case lets say I have multiple document types with various fields. For an example Person.json and Dog.json, they both have different properties. So when indexing I use one search engine in Elastic Enterprise Search to index both Person and Dog so that when I query using the Elastic Enterprise Search API I'll get results which are both Person and Dog depending on the search term.
Is this the way to go,or should I specify a seperate search engine for each schema type?
I am assuming that your person.json and dog.json contains different fields as your heading suggest and weather to create a separate index for these entities or have them in a single index, depends on the various use-cases you have in your application and you will not find elasticsearch marking one approach better than other and mainly will explain the pros/cons based on a particular context(like relevance, performance, management etc).
Please refer to my this SO answer, where I talked about various pros/cons of both the approach and discussion in chat to get more context why OP chose an approach based on his use-case, after knowing the pros/cons.
Mostly what I do is to assemble the mapping by hand. Choosing the correct types myself.
Is there any tool which facilitates this?
For example which will read a class (c#,java..etc) and choosing the closest ES types accordingly.
I've never seen such a tool, however I know that ElasticSearch has a REST API over HTTP.
So you can create a simple HTTP query with JSON body that will depict your object with your fields: field names + types (Strings, numbers, booleans) - pretty much like a Java/C# class that you've described in the question.
Then you can ask the ES to store the data in the non-existing index (to "index" your document in ES terms). It will index the document, but it will also create an index, and the most importantly for your question, will create a mapping for you "dynamically", so that later you will be able to query the mapping structure (again via REST).
Here is the link to the relevant chapter about dynamically created mappings in the ES documentation
And Here you can find the API for querying the mapping structure
At the end of the day you'd still want to retain some control over how your mapping is generated. I'd recommend:
syncing some sample documents w/o a mapping
investigating what mapping was auto generated and
dropping the index & using dynamic_templates to pseudo-auto-generate / update the mapping as new documents come in.
This GUI could help too.
Currently, there is no such tool available to generate the mapping for elastic.
It is a kind of similar thing as we have to design a database in MySQL.
But if we want such kind of thing then we use Mongo DB which requires no predefined schema.
But Elastic comes with its very dynamic feature, which allows us to play around it. One of the most important features of Elasticsearch is that it tries to get out of your way and let you start exploring your data as quickly as possible like the mongo schema which can be manipulated dynamically.
To index a document, you don’t need to first define a mapping or schema and define your fields along with their data type .
You can just index a document and the index, type, and fields will be created automatically.
For further details you can go through the below documentation:
Elastic Dynamic Mapping
I'm creating a microservice to handle the contacts that are created in the software. I'll need to create contacts and also search if a contact exists based on some information (name, last name, email, phone number). The idea is the following:
A customer calls, if it doesn't exist we create the contact asking all his personal information. The second time he calls, we will search coincidences by name, last name, email, to detect that the contact already exists in our DB.
What I thought is to use a MongoDB as primary storage and use ElasticSearch to perform the query, but I don't know if there is really a big difference between this and querying in a common relational database.
EDIT: Imagine a call center that is getting calls all the time from mostly different people, and we want to search fast (by name, email, last name) if that person it's in our DB, wouldn't ElasticSearch be good for this?
A relational database can store data and also index it.
A search engine can index data but also store it.
Relational databases are better in read-what-was-just-written performance. Search engines are better at really quick search with additional tricks like all kinds of normalization: lowercase, ä->a or ae, prefix matches, ngram matches (if indexed respectively). Whether its 1 million or 10 million entries in the store is not the big deal nowadays, but what is your query load? Well, there are only this many service center workers, so your query load is likely far less than 1qps. No problem for a relational DB at all. The search engine would start to make sense if you want some normalization, as described above, or you start indexing free text comments, descriptions of customers.
If you don't have a problem with performance, then keep it simple and use 1 single datastore (maybe with some caching in your application).
Elasticsearch is not meant to be a primary datastore so my advice is to use a simple relational database like Postgres and use simple SQL queries / a ORM mapper. If the dataset is not really large it should be fast enough.
When you have performance issues on searches you can use a combination of relation db and Elasticsearch. You can use Elasticsearch feeders to update ES with your data in you relational db.
Indexed RDBMS works well for search
If your data is structured i.e. columns are clearly defined, searching 1 million records will also not be a problem in RDBMS.
When to use Elastic
Text Search: Searching words across multiple properties (e.g. description, name etc.)
JSON Store and search: If data being stored is in json format and later needs to be searched
Auto Suggestions: Elastic is better at providing autocomplete suggestions
Elastic as an application data provider
Elastic should not be seen as data store, even if you storing data in it. It is about how you perceive elastic. Elastic should be used to store and setup data for the application. It is the application which decides how and when to use elastic (search and suggestions). Elastic is not a nosql storage alternative if compared to RDBMS, you should use a nosql database instead.
This perception puts elastic in line with redis and kafka. These tools are key components of an application design and they are used to serve as events stores, search engines and cache etc. to the applications.
Database with Elastic
Your design should use both. For storing the contacts use the database, index the contacts for querying. Also make the data available in elastic for searching, autocomplete and related matches.
As always, it depends on your specific use case. You briefly described it, but how are you acually going to use the data?
If it's just something simple like checking if a customer exists and then creating a new customer, then use the RDMS option. Moreover, if you don't expect a large dataset, so that scaling isn't an issue (hence the designation that Elasticsearch is for BigData), but you have transactions and data integrity is important, then a RDMS will be the right fit. Some examples could be for tax, leasing, or financial reporting systems.
However, if you have a large dataset, you need a wide range of query capabilities, such as a fuzzy search or searches where the user
can select multiple filters on the data or you want to do some predictive analysis on the data, then Elasticsearch is the clear choice.
For example, I worked on an web based app with a large customer base: 11 million, with 200+ hits per second at peak time for a find a doctor application. The customer could check some checkboxes to determine, specialty, spoken languages, ratings, hospitals, etc. all sorted by the distance from the users location with a 2 second or less response time. It would be very difficult for a RDMS to match that.
I want to search for multiple strings in a very large database. These strings are part of different attributes of database table. I have tried string search using LIKE in sql query. But it is taking a lot of time to get results. I have used Oracle database.
Should I use indexing of database? I found that Lucene can be used for it.
I also got some suggestions of using big data concepts. Which approach should I use?
The easiest way is:
1.) adding an index to the columns you like to search trough
2.) using oracle text as #lalitKumarB wrote
The most powerful way is:
3.) use an separate search engine (solr, elaticsearch).
But, probably you have to change you application in order to explicit use the search index for searching trough the data,...
I had the same situation some years before. Trying to search text in an big database. After a wile I found out, that database based search will never reach the performance of an dedicate search engine. And: you will have much more search features working out of the box, if you use solr (for example), like spelling correction, "More like this", ...
One option is to hold the data on orcale, searching in solr and return the ID of the document in order to only load the one row form oracle, the is referenced by the ID.
2nd option is to keep oracle as base datapool for your search engine and search in solr (or elasticsearch) in order to return the whole document/row from solr, not only the ID. So you don't need to load the data from the database any more.
The best option depends on your needs.
You have the choice between elasticsearch, solr or lucene