Performance issue in Drill view

We have a Drill view in which we flatten a field. The field in question is a JSON object with more than 160 attributes, and flattening the table on this field causes a performance issue. Is there any optimization configuration we can apply in Apache Drill?
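For reference, here is a minimal sketch of the kind of flattening view being described, submitted through Drill's REST API. The host, file path, column names and view name are all hypothetical, and this only illustrates the pattern; it is not a tuning recommendation.

```python
# Minimal sketch of the FLATTEN(KVGEN(...)) pattern described above, submitted
# through Drill's REST API. Host, file path, column and view names are hypothetical.
import requests

DRILL_URL = "http://localhost:8047/query.json"  # default Drill web port assumed

def run_sql(sql):
    """POST a SQL statement to Drill and return the parsed JSON response."""
    resp = requests.post(DRILL_URL, json={"queryType": "SQL", "query": sql})
    resp.raise_for_status()
    return resp.json()

# Hypothetical view that turns the wide JSON column (payload) into key/value rows.
run_sql("""
CREATE OR REPLACE VIEW dfs.tmp.flattened_view AS
SELECT f.id, f.kvpair.`key` AS attr_name, f.kvpair.`value` AS attr_value
FROM (
  SELECT id, FLATTEN(KVGEN(payload)) AS kvpair
  FROM dfs.tmp.`events.json`
) f
""")
```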

Related

Is there any performance benefit to creating an index mapping for Elasticsearch

For those who have used Elasticsearch at scale: is there a performance benefit when searching if I create an index mapping and then put documents in it, compared to not creating a mapping and just indexing documents directly?
It is usually preferable to create the explicit mapping for an index, where possible.
For a search case, this is crucial in order to index data with the analysis chains needed to service the search strategy.
For a log use case, it may not be possible to know ahead of time what the explicit mapping should be, as the records may contain dynamic fields that are not known in advance. Dynamic templates can help here, as can adopting a unified logging structure like Elastic Common Schema (ECS), either by converting data to ECS format while logging or by converting it during ingest into Elasticsearch with ingest pipelines.
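As a rough illustration of the dynamic-template idea, here is a sketch (the index name, URL and field pattern are made up) that maps any previously unseen string field to keyword:

```python
# Sketch: create an index whose dynamic templates map any unknown string
# field to "keyword". Index name and host are hypothetical.
import requests

ES = "http://localhost:9200"

mapping = {
    "mappings": {
        "dynamic_templates": [
            {
                "strings_as_keyword": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"}
                }
            }
        ]
    }
}

resp = requests.put(f"{ES}/logs-demo", json=mapping)
print(resp.json())
```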
Yes, it is generally better to define an explicit mapping before putting in documents rather than depending on dynamic mapping. If you rely on dynamic mapping, you may not be able to visualize some data types, such as text, and maintaining the mapping yourself ensures your index always holds the same kind of data. Please refer to this blog:
https://qbox.io/blog/maximize-guide-elasticsearch-indexing-performance-part-1/
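For completeness, a minimal sketch of creating an explicit mapping before indexing, assuming a local cluster and made-up index and field names:

```python
# Sketch: define an explicit mapping first, then index a document against it,
# instead of relying on dynamic mapping. Index and field names are hypothetical.
import requests

ES = "http://localhost:9200"

explicit_mapping = {
    "mappings": {
        "properties": {
            "title":      {"type": "text", "analyzer": "english"},
            "created_at": {"type": "date"},
            "price":      {"type": "float"},
            "category":   {"type": "keyword"}
        }
    }
}

requests.put(f"{ES}/products-demo", json=explicit_mapping)
requests.post(f"{ES}/products-demo/_doc",
              json={"title": "Red shoes", "created_at": "2024-01-01",
                    "price": 49.99, "category": "footwear"})
```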

What are the best ways to do a one-time data load from Oracle to Elasticsearch

We are trying to do a one-time data load from Oracle to Elasticsearch.
We have evaluated Logstash, but the indexing is taking a lot of time.
We have tried Apache NiFi but are facing difficulty loading nested objects and computed results in it.
We are trying to maintain one-to-many relations in nested objects (we have an Oracle query to fetch these results), and we also maintain the result of a hierarchical query as a field in the index.
We are looking for an open-source alternative and an efficient approach to load around 10 tables with 3 million records each from Oracle to Elasticsearch.
Please suggest.
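Not an endorsement of any particular tool, but here is a rough sketch of what a plain script-based load can look like. The connection details, queries, index name and the python-oracledb driver are all assumptions; it reads parent rows plus their one-to-many children from Oracle and pushes them to Elasticsearch with the _bulk API.

```python
# Rough sketch of a one-time load: read parent rows with a nested one-to-many
# collection from Oracle and push them to Elasticsearch via the _bulk API.
# Connection strings, queries and index name are hypothetical.
import json
import requests
import oracledb  # assumes the python-oracledb driver is installed

ES = "http://localhost:9200"
INDEX = "orders-demo"

def fetch_rows():
    """Yield parent rows with their child rows folded into a nested list."""
    conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/XEPDB1")
    cur = conn.cursor()
    cur.execute("SELECT order_id, customer_name FROM orders")
    for order_id, customer in cur.fetchall():
        child = conn.cursor()
        child.execute(
            "SELECT item_name, qty FROM order_items WHERE order_id = :1", [order_id])
        items = [{"item": name, "qty": qty} for name, qty in child.fetchall()]
        yield {"order_id": order_id, "customer": customer, "items": items}

def bulk_index(docs, batch_size=1000):
    """Send documents to Elasticsearch in newline-delimited _bulk batches."""
    headers = {"Content-Type": "application/x-ndjson"}
    batch = []
    for doc in docs:
        batch.append(json.dumps({"index": {"_index": INDEX, "_id": doc["order_id"]}}))
        batch.append(json.dumps(doc))
        if len(batch) >= 2 * batch_size:
            requests.post(f"{ES}/_bulk", data="\n".join(batch) + "\n", headers=headers)
            batch = []
    if batch:
        requests.post(f"{ES}/_bulk", data="\n".join(batch) + "\n", headers=headers)

bulk_index(fetch_rows())
```

For 3 million rows per table you would also batch the child lookups (for example one joined query grouped in memory) rather than issuing one query per parent row as this sketch does.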

fieldValueCache not being populated in Solr 6.3 even though we are faceting on several fields?

We have a large index of 200 GB. Our queries require faceting on 5-6 fields (whitespace tokenized). The Solr documentation I have read says that faceting on a tokenized field will populate the fieldValueCache, but for some reason all the facets are cached in the fieldCache rather than the fieldValueCache. Can someone explain why this is happening?
I guess this is because Solr favors docValues over the fieldValueCache.
https://issues.apache.org/jira/browse/LUCENE-5666
If you want to use the fieldValueCache, you can do so via JSON faceting:
https://issues.apache.org/jira/browse/SOLR-8466
Here is some more discussion regarding the changes:
https://issues.apache.org/jira/browse/SOLR-7190
Here is some related discussion on Stack Overflow:
lucene Fields vs. DocValues
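For reference, a sketch of what those facets look like through the JSON Facet API; the collection, field names and the "uif" method hint are assumptions to verify against your Solr version.

```python
# Sketch: terms facets expressed with Solr's JSON Facet API (the route
# SOLR-8466 discusses). Collection and field names are made up.
import requests

SOLR = "http://localhost:8983/solr/mycollection"

body = {
    "query": "*:*",
    "limit": 0,  # only facets are needed, no result rows
    "facet": {
        "brands": {"type": "terms", "field": "brand_tokens", "limit": 20,
                   # "uif" requests the UnInvertedField implementation;
                   # verify the behaviour on your Solr version.
                   "method": "uif"},
        "colors": {"type": "terms", "field": "color_tokens", "limit": 20}
    }
}

resp = requests.post(f"{SOLR}/query", json=body)
print(resp.json()["facets"])
```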

ElasticSearch multiple types with same mapping in single index

I am designing an e-commerce site with multiple warehouses. All the warehouses carry the same set of products.
I am using Elasticsearch for my search engine.
There are 40 fields in each ES document. 20 of them differ in value per warehouse; the remaining 20 contain the same values for all warehouses.
I want to use multiple types (one type per warehouse) in one index. All of the types will have the same mapping. Please advise whether my approach is correct for this scenario.
A few things are not clear to me:
Will the inverted index be created only once for all types in the same index?
If a new type (new warehouse) is added in the future, how will it be merged with the previously stored data?
How will query time be impacted compared to using only one type in one index?
Since all types are assigned to the same index, the inverted index will only be created once.
If a new type is added, its information is added to the existing inverted index as well: new terms are added, pointers are added to existing terms, and doc values are added for each newly inserted document.
I honestly can't answer that one, though it is simple to test this in a proof of concept.
In a previous project I faced the same situation while implementing a search engine with Elasticsearch on a multi-shop platform. In that case we kept all shops in one type and applied shop-specific filters when searching. That said, the approach of separating shop data by "_type" seems pretty clean to me; we only went the other way because my implementation could already cover the feature request with filters at the time.
Cheers, Dominik
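A minimal sketch of the filter-per-warehouse approach described above, with hypothetical index, field names and host:

```python
# Sketch: one shared mapping with a warehouse_id field, and a term filter
# applied per warehouse at query time. Index and field names are hypothetical.
import requests

ES = "http://localhost:9200"

query = {
    "query": {
        "bool": {
            "must": [{"match": {"product_name": "laptop"}}],
            "filter": [{"term": {"warehouse_id": "wh-7"}}]
        }
    }
}

resp = requests.post(f"{ES}/products/_search", json=query)
print(resp.json()["hits"]["total"])
```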

In ElasticSearch, should I use multiple indexes for separate but related entities?

The overhead of adding indexes is well-documented, but I have not been able to find good information on when to use multiple indexes with regards to the various document types being indexed.
Here is a generic example to illustrate the question:
Say we have the following entities
Products (Name, ProductID, ProductCategoryID, List-of-Stores)
Product Categories (Name, ProductCategoryID)
Stores (Name, StoreID)
Should I dump these three different types of documents into a single index, each with the appropriate elasticsearch type?
I am having difficulty establishing where to draw the line on one vs. multiple indexes.
What if we add an unrelated entity, "Webpages". Definitely a separate index?
A very interesting video explaining Elasticsearch "Data Design Patterns" by Shay Banon:
http://vimeo.com/44716955
This exact question is answered at 13:40, where different data flows are examined by looking at the concepts of type, filter and routing.
Regards
I was recently modeling an Elasticsearch backend from scratch and, from my point of view, the best option is putting all related document types in the same index.
I have read that some people had problems with too many concurrent indexes (one index per type). It is better for performance and robustness to unify related types in the same index.
Besides, if the types are in the same index you can use the "_parent" field to create hierarchical models, which gives you interesting search features such as "has_child" and "has_parent", and of course you do not have to duplicate data in your model.
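The "_parent" field has since been replaced by the join field type in newer Elasticsearch versions; as a rough sketch under that assumption (index, relation and field names are made up), the parent/child idea looks like this:

```python
# Sketch of the parent/child idea above, using the join field that newer
# Elasticsearch versions use in place of "_parent". All names are hypothetical.
import requests

ES = "http://localhost:9200"
INDEX = "catalog-demo"

# One index holding both categories (parents) and products (children).
requests.put(f"{ES}/{INDEX}", json={
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "relation": {"type": "join",
                         "relations": {"category": "product"}}
        }
    }
})

# Parent document, then a child routed to the same shard as its parent.
requests.put(f"{ES}/{INDEX}/_doc/cat-1",
             json={"name": "Shoes", "relation": "category"})
requests.put(f"{ES}/{INDEX}/_doc/prod-1?routing=cat-1",
             json={"name": "Red sneakers",
                   "relation": {"name": "product", "parent": "cat-1"}})

# has_child: find categories that contain a matching product.
resp = requests.post(f"{ES}/{INDEX}/_search", json={
    "query": {"has_child": {"type": "product",
                            "query": {"match": {"name": "sneakers"}}}}
})
print(resp.json()["hits"]["hits"])
```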
