One Solr core, multiple unrelated tables - java-8

I need to search over multiple tables and get a single result set, but these tables don't have any relationship.
Can I do it with Solr?

Solr's collection aliases might fit your needs (aliases are a SolrCloud feature, managed through the Collections API).
Create your alias with:
http://[solr.host:port]/solr/admin/collections?action=CREATEALIAS&name=testalias&collections=[collection1],[collection2]
and then query it as usual, as if "testalias" were a single collection. For example:
http://[solr.host:port]/solr/testalias/select?q=*:*&rows=100&wt=json&indent=true
This query matches all documents in both collections (rows=100 limits the response to the first 100).
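A minimal Java 8 sketch, using only the standard library (no Solr client), that builds the two request URLs from this answer. The host, port, and collection names are placeholders; the comma in the collections list and the colon in *:* get percent-encoded, which Solr decodes as usual.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Builds the CREATEALIAS call and a plain /select query against the alias.
public class SolrAliasUrls {

    // Percent-encodes a query-string value (Java 8 URLEncoder takes a charset name).
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // The collections parameter is a comma-separated list of collection names.
    static String createAliasUrl(String base, String alias, List<String> collections) {
        return base + "/admin/collections?action=CREATEALIAS"
                + "&name=" + enc(alias)
                + "&collections=" + enc(String.join(",", collections));
    }

    // q=*:* matches every document; rows caps how many are returned.
    static String selectAllUrl(String base, String alias, int rows) {
        return base + "/" + alias + "/select?q=" + enc("*:*")
                + "&rows=" + rows + "&wt=json&indent=true";
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed host and port
        System.out.println(createAliasUrl(base, "testalias", Arrays.asList("collection1", "collection2")));
        System.out.println(selectAllUrl(base, "testalias", 100));
    }
}
```

The URLs can then be sent with any HTTP client; building them in one place keeps the alias name consistent between creation and querying.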

Related

Is it possible to create a document C by merging documents A and B with Elasticsearch?

My problem is that I need to sort data coming from two different data sources: a MySQL database that contains information about some products, and a PostgreSQL database that contains some metrics linked to these products.
Because the data resides in two different data sources, I cannot, out of the box, come up with a single performant query that would do the ordering (pagination) at the database level.
I need to make two different queries, then manually merge the data and perform sorting and pagination in application code.
I would like to avoid, as much as possible, creating a custom pagination system and merging the data manually, and instead delegate this job to the underlying database.
This is where I thought a system such as Elasticsearch (or Solr, but ES seems easier to use) could help.
1) Does ES provide tools or mechanisms to merge 2 data sources into 1 document? Or does this job need to be done by a 3rd-party tool that periodically pulls the data from both data sources and creates/updates the documents?
2) Am I correct to assume that having 2 indices (or 2 different doc types) is pointless in my case, since ES cannot perform join queries?
3) Apart from creating one single document, what other solutions do I have that ES can help with? Is it possible, 'somehow', with datasource1 data in index1 and datasource2 data in index2, to perform search queries using both indices at the same time (since joins are a no-go)?
Regarding 1) (merging 2 data sources into 1 document):
There are two approaches to accomplish this:
An ETL (Extract, Transform, Load) process that loads data from both sources into one single document. In the Elastic world, you can use Logstash to accomplish this.
Data virtualization, which is supposed to do this without the need to copy the data.
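To make the "Transform" step of the ETL approach concrete, here is a hypothetical Java sketch (not Logstash itself) that merges product rows from MySQL with metric rows from PostgreSQL into one document per product, keyed by product id. The Product and Metric classes and the views field are illustrative assumptions; once such documents are indexed, sorting and pagination happen inside Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Merges rows from two sources into one denormalized document per product.
public class ProductDocMerger {

    // Hypothetical row shapes: one from MySQL (products), one from PostgreSQL (metrics).
    static final class Product {
        final long id;
        final String name;
        Product(long id, String name) { this.id = id; this.name = name; }
    }

    static final class Metric {
        final long productId;
        final double views;
        Metric(long productId, double views) { this.productId = productId; this.views = views; }
    }

    // Returns one map per product, ready to be serialized to JSON and indexed.
    static List<Map<String, Object>> merge(List<Product> products, List<Metric> metrics) {
        // Index metrics by product id; assumes at most one metrics row per product.
        Map<Long, Metric> byProduct = new HashMap<>();
        for (Metric m : metrics) {
            byProduct.put(m.productId, m);
        }
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Product p : products) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", p.id);
            doc.put("name", p.name);
            Metric m = byProduct.get(p.id);
            // Products without metrics still get a document; the field is simply absent.
            if (m != null) {
                doc.put("views", m.views);
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(new Product(1L, "Widget"), new Product(2L, "Gadget"));
        List<Metric> metrics = Arrays.asList(new Metric(1L, 42.0));
        System.out.println(merge(products, metrics));
    }
}
```

In a real pipeline the two lists would come from periodic queries against each database (Logstash's jdbc input is one way to pull them), and the merged documents would be re-indexed when either side changes.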
Regarding 3) (querying index1 and index2 at the same time):
It's very easy to perform a single query through multiple indices. Answers here
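A hypothetical Java sketch of such a request: Elasticsearch accepts a comma-separated list of indices in the search path, and the from/size parameters paginate over the combined, sorted hits. The index names and the views sort field are placeholders. Each hit still comes from a single index, so this does not join documents; and if the sort field exists in only one of the indices, the sort may be rejected unless the sort's unmapped_type option is set.

```java
// Builds a single search request (path + JSON body) spanning several indices.
public class MultiIndexSearch {

    // e.g. /index1,index2/_search
    static String path(String... indices) {
        return "/" + String.join(",", indices) + "/_search";
    }

    // match_all query, descending sort on one field, from/size pagination.
    // Assumes sortField is a plain field name needing no JSON escaping.
    static String body(String sortField, int from, int size) {
        return "{\"query\":{\"match_all\":{}},"
                + "\"sort\":[{\"" + sortField + "\":{\"order\":\"desc\"}}],"
                + "\"from\":" + from + ",\"size\":" + size + "}";
    }

    public static void main(String[] args) {
        System.out.println("POST " + path("index1", "index2"));
        System.out.println(body("views", 0, 10));
    }
}
```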

Joining Tables in Kibana

Suppose I have a huge database (table a) about employees in a certain department, which includes the employee name in addition to many other fields. In a different database (or a different table, say table b), I have only two fields: the employee name and their ID. But table b contains entries not only for one department but for the whole company. Both tables are stored as raw text files, so I parse them with Logstash into Elasticsearch and then visualize the results with Kibana.
Now, after creating several visualizations from table a in Kibana where the x-axis shows the employee name, I realize it would be nice to have the employee IDs instead. Since I know this information is in table b, I am looking for some way to tell Kibana to translate the employee names in the graphs generated from table a into employee IDs based on table b. My questions are as follows:
1) Is there a way to do this directly in Kibana? If so, can we do it if each table is saved in a separate index, or do we have to save them both in the same index?
2) If this cannot be done directly in Kibana and has to be done when indexing the data, is there a way to still parse both text files separately with Logstash?
I know Elasticsearch is a non-relational database and therefore is not designed for SQL-like functionality (joins). However, there should be an equivalent or a workaround. This is just a simple use case, but of course the generic question is how to correlate data from different sources. Otherwise, honestly, Elasticsearch would not be that powerful.
Similar questions have been asked and answered.
Basically, the answer is no: you can't do joins in Kibana; you have to do them at indexing time. Space is cheap and Elasticsearch handles duplicate data well, so just create any fields you need to display at indexing time.
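A minimal Java sketch of that index-time approach, standing in for what a Logstash pipeline would do (Logstash's translate filter can perform a similar dictionary lookup). Both files are still parsed separately: table b becomes a name-to-ID map, and each table-a record gets an employee_id field before it is indexed. The "name,id" line format and the field names employee_name and employee_id are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Index-time "join": enrich table-a records with IDs looked up from table b.
public class EmployeeEnricher {

    // Parses table-b lines assumed to look like "name,id" (hypothetical format).
    static Map<String, String> loadIds(Iterable<String> tableBLines) {
        Map<String, String> idByName = new HashMap<>();
        for (String line : tableBLines) {
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                idByName.put(parts[0].trim(), parts[1].trim());
            }
        }
        return idByName;
    }

    // Returns a copy of the record with employee_id added when the name is known.
    static Map<String, Object> enrich(Map<String, Object> record, Map<String, String> idByName) {
        Map<String, Object> out = new HashMap<>(record);
        Object name = record.get("employee_name");
        String id = (name == null) ? null : idByName.get(name.toString());
        if (id != null) {
            out.put("employee_id", id);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> ids = loadIds(Arrays.asList("Alice, 17", "Bob,42"));
        Map<String, Object> record = new HashMap<>();
        record.put("employee_name", "Bob");
        System.out.println(enrich(record, ids));
    }
}
```

Because the lookup happens before indexing, Kibana can then use employee_id directly as an x-axis field, with table a and table b kept in separate source files.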
You might want to give Kibi a try.
Unfortunately, the only answers I know of are to write your own plugin or, as we had to do, to downgrade to ES 2.4.1 and install Kibi
(https://siren.solutions/new-release-siren-join-2-4-1-compatible-with-es-2-4-1/)
and then install the Siren Join plugin
(http://siren.solutions/relational-joins-for-elasticsearch-the-siren-join-plugin/)
This gives you the kind of joins you would expect from a relational DB.

Cassandra schema, query

I'm designing a new application that will use Cassandra (I'm new to Cassandra). This database will contain only 2-4 column families. The problem is that I have to allow filtering on almost every column. Could you give me some suggestions to keep in mind during planning? What about data redundancy?
Cassandra isn't optimized for this use case. The preferred way to query data is by primary key.
Filtering by arbitrary columns is possible by:
using the ALLOW FILTERING query modifier
creating a secondary index for each column, which could not be combined in a single query
creating lookup tables with different primary key variants based on the column you want to filter
All of those options have their limitations.
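To illustrate the lookup-table option (and the data redundancy question), here is a hypothetical Java helper that emits one CQL table per filterable column. Each table repeats all columns, with the filter column as partition key and the entity id as clustering column so rows stay unique; in Cassandra this duplication is expected, and every write then has to go to each table. The table and column names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Generates CQL DDL for "one table per query pattern" lookup tables.
public class LookupTables {

    // columns: name -> CQL type (must include idColumn and every filterable column).
    static List<String> ddl(String base, String idColumn,
                            Map<String, String> columns, List<String> filterable) {
        List<String> out = new ArrayList<>();
        for (String key : filterable) {
            if (key.equals(idColumn)) {
                continue; // the base table is already keyed by id
            }
            StringBuilder sb = new StringBuilder();
            sb.append("CREATE TABLE IF NOT EXISTS ")
              .append(base).append("_by_").append(key).append(" (");
            // Deliberate redundancy: every column is copied into each lookup table.
            for (Map.Entry<String, String> c : columns.entrySet()) {
                sb.append(c.getKey()).append(' ').append(c.getValue()).append(", ");
            }
            // Partition by the filter column, cluster by id to keep rows unique.
            sb.append("PRIMARY KEY ((").append(key).append("), ").append(idColumn).append("));");
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("id", "uuid");
        cols.put("name", "text");
        cols.put("city", "text");
        ddl("users", "id", cols, Arrays.asList("city", "name")).forEach(System.out::println);
    }
}
```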

Deleting exact duplicate records in an MS Access DB?

I have a database that imports linked data tables. Obviously, with linked tables I cannot change the design of the data tables. However, there are many duplicates in the data table that I want to use, and my aim is to run a query that deletes all but one of each set of duplicates in the table. Is there a way of doing this?
Any support would be appreciated.
Chris
To omit the duplicates you could conceivably create a SELECT DISTINCT query to return all the fields from the linked table, and then use that query (instead of the linked table) as a basis for your other queries.
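The key property of SELECT DISTINCT over all fields is that a row is dropped only when every field matches an earlier row. A small Java sketch of the same semantics (an in-memory analogy, not Access SQL), with rows modeled as lists of field values:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Keeps the first occurrence of each row, mirroring SELECT DISTINCT over all fields.
public class DistinctRows {

    // List equality is element-wise, so only exact duplicates collapse.
    static List<List<String>> distinct(List<List<String>> rows) {
        return rows.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Smith", "London"),
                Arrays.asList("Smith", "London"),  // exact duplicate: removed
                Arrays.asList("Smith", "Paris"));  // differs in one field: kept
        System.out.println(distinct(rows));
    }
}
```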

NHibernate Criteria query on in-memory collection of entities

I would like to apply a Criteria query to an in-memory collection of entities instead of to the database. Is this possible? Can the Criteria API work like LINQ? Or, alternatively, can a Criteria query be converted to a LINQ query?
Thanks!
I don't believe you can use Criteria to query against an in-memory collection, and, come to think of it, it doesn't seem to make much sense. If I'm understanding everything correctly, you've already queried against your database. I'd suggest either tuning your original query (whichever method you choose) to include all of your filters, or using LINQ (as you suggested) to refine your results.
Also, what's your reasoning for wanting to query from memory?
It sounds like you're rolling your own caching mechanism. I would highly recommend checking out NHibernate's 2nd level cache. It handles many complex scenarios gracefully such as invalidating query results on updates to the underlying tables.
http://ayende.com/Blog/archive/2009/04/24/nhibernate-2nd-level-cache.aspx
