Cross querying distinct engines on AppSearch - elasticsearch

Does AppSearch support searching across distinct engines with the same query (where, e.g., two engines have a one-to-many relationship), such that the result set combines both engines and filters apply to both datasets at the same time?
If this is supported, how would I write a query to do this, and are there special requirements regarding the data structure in the engines?
Or is there perhaps another way to structure the data such that the second engine is not necessary but the additional data is still queryable?

Related

Is it OK to have multiple merge steps in an Excel Power query?

I have data from multiple sources - a combination of Excel (table and non table), csv and, sometimes, even a tsv.
I create queries for each data source and then I am bringing them together one step at a time or, actually, it's two steps: merge and then expand to bring in the fields I want for each data source.
This doesn't feel very efficient and I think that maybe I should be just joining everything together in the Data Model. The problem when I did that was that I couldn't then find a way to write a single query to access all the different fields spread across the different data sources.
If it were Access, I'd have no trouble creating a single query once I'd created all the relationships between my tables.
I feel as though I'm missing something: How can I build a single query out of the data model?
Hoping my question is clear. It feels like something that should be easy to do but I can't home in on it with a Google search.
It is never a good idea to push the heavy lifting downstream in Power Query. If you can, work with database views, not full tables, use a modular approach (several smaller queries that you then connect in the data model), filter early, remove unneeded columns etc.
The more work that has to be performed on data you don't really need, the slower the query will be. Please take a look at this article and this one, the latter one having a comprehensive list for Best Practices (you can also just do a search for that term, there are plenty).
In terms of creating a query from the data model, conceptually that makes little sense, as you could conceivably create circular references galore.

Elasticsearch - Modelling video catalogue information into one index vs multiple indexes

I need to model a video catalogue composed of movies, tv shows, episodes, TV channels and live programs information into elasticsearch. Some of these entities are correlated, some not.
The attributes of these entities are quite different, even if there are some common ones.
Since I may need to query across entities (imagine a customer searching for something that could be a movie, a TV channel or a live event program), is it better to have a single index containing a generic entity marked with a logical type attribute, or multiple indexes, one for each entity (movie, show episode, channel, program)?
In addition, some of these entities, like movies, can have metadata attributes in multiple languages.
Coming from a relational data model DB, I would create different indexes, one for every entity and have a language variant index for every language. Any suggestion or better approach in order to have great search performance and usability?
Whether to use several indexes or not very much depends on the application, so I cannot provide a definite answer, rather a few thoughts.
From my experience, indexes are rather a means to help maintenance and operations than for data modeling. It is, for example, much easier to delete an index than delete all documents from one source from a bigger index. Or if you support totally separate search applications which do not query across each others data, different indexes are the way to go.
But when you want to query documents across data sources, as you do, it makes sense to keep them in one index, if only to have comparable ranking across all items in your index. Make sure to re-use fields across your data that have similar meaning (title, year of production, artists, etc.). For fields unique to a source we usually use prefix-marked field names, e.g. movie_... for movie-only metadata.
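The single-index layout described above could be sketched roughly as follows in Python. The `type` field, the list of shared fields, and the `to_index_doc` helper are hypothetical names chosen for illustration; they are not part of any Elasticsearch API.

```python
# Fields assumed (for this sketch) to carry the same meaning across entity types.
SHARED_FIELDS = {"title", "year", "artists"}


def to_index_doc(entity_type, record):
    """Flatten one catalogue record into a document for a single shared index.

    Shared fields keep their names; every other field gets an
    entity-specific prefix such as "movie_" so names cannot collide.
    """
    doc = {"type": entity_type}
    for key, value in record.items():
        if key in SHARED_FIELDS:
            doc[key] = value
        else:
            doc[f"{entity_type}_{key}"] = value
    return doc


# Hypothetical sample records.
movie = to_index_doc("movie", {"title": "Alien", "year": 1979, "runtime_min": 117})
channel = to_index_doc("channel", {"title": "Film One", "number": 42})
```

Both documents can then be searched through the shared `title` field, while `type` remains available as a filter when a query should be limited to one kind of entity.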
As for the language, you need to use language-specific fields, like title_en, title_es, title_de. Ideally, at query time, you know your user's language (from the browser, because they selected it explicitly, ...) and then search in the language-specific fields where available. Be sure to use the language-specific analyzers for these fields, at query and at index time.
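As a rough sketch of the language-specific fields, the mapping and query bodies could be built as plain Python dictionaries like this. The helper names, the fallback language, and the `^2` boost are assumptions made for illustration; `english`, `spanish` and `german` are built-in Elasticsearch language analyzers.

```python
# Built-in Elasticsearch language analyzers, keyed by language code.
ANALYZERS = {"en": "english", "es": "spanish", "de": "german"}


def title_mapping():
    """Index mapping with one analyzed title field per language."""
    return {
        "mappings": {
            "properties": {
                f"title_{lang}": {"type": "text", "analyzer": analyzer}
                for lang, analyzer in ANALYZERS.items()
            }
        }
    }


def title_query(text, user_lang, fallback_lang="en"):
    """Search the user's language field first, then a fallback language."""
    fields = [f"title_{user_lang}^2"] if user_lang in ANALYZERS else []
    if fallback_lang != user_lang:
        fields.append(f"title_{fallback_lang}")
    return {"query": {"multi_match": {"query": text, "fields": fields}}}
```

Because a text field's analyzer is also applied to query text by default, declaring it once in the mapping covers both index time and query time.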
I see a search engine a bit as the dual of a database: A database stores data but can also index it. A search engine indexes data but can also store it. A database tends to normalize the schema to remove redundancy, a search engine works best with denormalized data for query performance.

Caching? Large Query performance for multiple, optional filters

I'm trying to figure out my options for a large query which is taking a somewhat long but sensible amount of time considering what it does. It has many joins and has to be searched against for up to a predefined number of parameters. Some of these parameter values are predefined (select box) while some are a free-form text box (unfortunately LIKE with prefixed and suffixed wildcards). The data sets returned are large and the filter options are very likely to be changed frequently. The order of the result sets is also controlled by the user. Additionally, user access must be restricted to only the results the user is authorized to see. This authorization is handled as part of a baseline WHERE clause which is applied regardless of the chosen filters.
I'm not really looking for query optimization advice as I've already reviewed the query and examined/optimized the query plan as much as I can given my requirements. I'm more interested in alternative solutions intended for after the query has been optimized. Outside of trying to break up the query into separate smaller bits (which unfortunately is not an acceptable solution), I can only think of two options. But, I don't think they are a good fit for this situation.
Caching first came to my mind, but I don't think it is viable based on how likely the filters will vary and the large datasets returned.
From my research, options such as ElasticSearch and Solr would not be the right fit either, as the data sets can be manipulated by multiple programs and these data stores would quickly become outdated.
Are there other options to improve the perceived performance of a search feature with these requirements?
You don't provide enough information about your tables and queries for a concrete solution.
As mentioned in a comment by #jmarkmurphy, DB2 and IBM i do their own "caching". I agree that it's unlikely you'd be able to improve upon it when dealing with large and varied result sets. But you need to make sure you're using what's provided by IBM. For example, if using SQL embedded in RPGLE, make sure you don't have set option CLOSQLCSR=*ENDMOD. Also check the settings in QAQQINI you're using.
You've mentioned using Visual Explain and building some of the requested indexes. That's a good start. But as the queries are run in production, keep an eye on the plan cache, index usage and the advised indexes.
Lastly, you mentioned that you're seeing full table scans due to the use of LIKE '%SOMETHING%'. Again, without details of the columns and data involved, it's a guess as to what may be useful. As suggested in my comment, Omnifind for IBM i may be an improvement.
However, Omnifind is NOT an improved LIKE. Omnifind is designed to handle linguistic searches. From the article i Can … Find a Needle in a Haystack using OmniFind Text Search Server for DB2 for i:
SELECT story_id FROM story_library.story_table
WHERE CONTAINS(story_doc, 'blind mouse') = 1;
This query result will include matches that we’d expect from a typical search engine. The search is case insensitive, and linguistic variations on the search words will be matched. In other words, the previous query will indicate a match for documents that contain “Blind Mice.” In a similar manner, a search for “bad wolves” would return documents that contained “the Big Bad Wolf.”

What is the most efficient way to filter a search?

I am working with node.js and mongodb.
I am going to have a database setup and use socket.io to have real-time updates that will have the db queried again as well or push the new update to the client.
I am trying to figure out what is the best way to filter the database?
Some more information in regards to what is being queried and what the real time updates are:
A document in the database will include information such as an address, city, time, number of packages, name, price.
Filters include city/price/name/time (meaning only to see addresses within the same city, or within the same time period)
Real-time info: includes adding a new document to the database which will essentially update the admin on the website with a notification of a new address added.
Method 1: Query the db with the filters being searched?
Method 2: Query the db for all searches and then filter it on the client side (Javascript)?
Method 3: Query the db for all searches then store it in localStorage then query localStorage for what the filters are?
Trying to figure out what is the fastest way for the user to filter it?
Also, if it is different than what is the most cost effective way, then the most cost effective as well (which I am assuming is less db queries)...
It's hard to say because we don't see exact conditions of the filter, but in general:
Mongo typically uses only one index per query condition (index intersection and $or clauses are exceptions), so whatever fields are covered by that index can be filtered efficiently. Otherwise it might do a full collection scan, which is slow. If you are using an index then you are probably doing the most efficient query. (Mongo can still use another index for sorting though.)
Sometimes you will be forced to do processing on client side because Mongo can't do what you want or it takes too many queries.
The least efficient option is to store results somewhere, simply because IO is slow. This would only benefit you if you use them as a cache and do not recalculate.
Also consider the overhead and latency of networking. If you have to send lots of data back to the client it will be slower. In general Mongo will do a better job of filtering than you would on the client.
From what you describe, if you can filter by addresses within a time period, then an index on those fields could cut down the number of documents considerably. You most likely need a compound index - multiple fields.
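A minimal sketch of Method 1 (letting the database apply the filters), assuming the field names `city`, `name`, `price` and `time` from the question. The `build_filter` helper, its parameter names, and the choice of a city-plus-time index are hypothetical; the commented lines show roughly how it would be used with pymongo.

```python
def build_filter(city=None, name=None, max_price=None, time_from=None, time_to=None):
    """Build a MongoDB query document from whichever filters the user set."""
    query = {}
    if city is not None:
        query["city"] = city
    if name is not None:
        query["name"] = name
    if max_price is not None:
        query["price"] = {"$lte": max_price}
    time_range = {}
    if time_from is not None:
        time_range["$gte"] = time_from
    if time_to is not None:
        time_range["$lt"] = time_to
    if time_range:
        query["time"] = time_range
    return query


# Compound index specification: the equality field first, then the range field.
CITY_TIME_INDEX = [("city", 1), ("time", 1)]

# With pymongo (not imported here) this would be used roughly as:
#   collection.create_index(CITY_TIME_INDEX)
#   results = collection.find(build_filter(city="Austin", time_from=start, time_to=end))
```

Only the filters the user actually sets end up in the query document, so the same function serves every combination of filters while the server does the filtering.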

Adding Advanced Search in ASP.NET MVC 3 / .NET

In a website I am working on, there is an advanced search form with several fields, some of them dynamic that show up / hide depending on what is being selected on the search form.
The data is expected to be large, and records are spread over several tables in a very normalized fashion.
Is there a recommendation on using a third-party search engine, SQL Server full-text search, Lucene.NET, etc., rather than SELECT / JOIN queries?
Thank you
Thinking a little outside the box here -
Check out CSLA.NET; Using this framework you can create business objects and "denormalise" your search algorithm.
Either way, be sure the database has proper indexes in place for better performance.
On the frontend you're going to need some JavaScript to map which top-level fields show which sub-level fields. It's pretty straightforward.
For the actual search, I would recommend some flavor of Lucene.
You have the option of Lucene.NET, the .NET flavor, which Stack Overflow uses; Solr, which is arguably easier to set up and get running than Lucene; or the newest kid on the block, ElasticSearch, which aims to be schema-free and infinitely scalable simply by dropping more instances into the cluster.
I have only used Solr myself, and it has a nice .NET client (SolrNet).
First, index the database fields that are important and frequently used.
For searching, full-text search is the better choice.
I tried it, and the results are very different from what I got without full-text search.
It is also better to put the SELECT and JOIN queries in a stored procedure and call that procedure from your program.
