Building an index with multiple tables in ElasticSearch/Logstash 7.0 - elasticsearch

I have 20 tables in Oracle, all of them contain (among others) the following columns: id, name, description and notes. I would like the user to enter a text, the text to be searched in either name, description and/or notes of all the tables, and the result to return which table(s) and id(s) has the text.
In the Logstash 7.0 configuration file, do I need to define one jdbc input for each table? Or should the input be a single select with an union of all the tables?

So my answer for the above mentioned question is combine all tables info in single json.Then index it you can solve the problem in an easier way.

Related

HBase : Confusion between COLUMN and FILTER (SingleColumnValueFilter)

I have installed hbase and I have access to command's shell.
I have a table with 2 familly column like this:
create 'arbres', 'emplacement', 'propriete'
This request works fine :
scan 'arbres',{FILTER=>"SingleColumnValueFilter('emplacement', 'lieu_adresse', =,'binary:VOIE INCONNUE')", COLUMNS=>['emplacement'], COLUMN=>15}
But this second one, list all rows, without filter
scan 'arbres',{FILTER=>"SingleColumnValueFilter('emplacement', 'lieu_adresse', =,'binary:VOIE INCONNUE')", COLUMNS=>['propriete'], COLUMN=>15}
I don't understand why and I don't find the reason in the documentation.
Please can you explain a little the reason.
regards
The second command has a filter on different column family and column that you are not accessing.
The push down requires the columns to be accessed, meaning you should have the column family and column mentioned in the COLUMNS=>[]
The reason one would have two different column families is to make access easier and light weight, since each column family will have its own file.

Get data's source in kettle

When I use kettle , I was wandering how to get a table column's source column. Just for an example , after I have merged two tables into one table based on primary key already , Given any column in output table , I could judge whether table it belongs to and get the original column name in original table. Thank you for helping and sorry for my poor English...
http://i.stack.imgur.com/xoR0s.png
When I was given any field in table3 (suppose a field named A in table3) , I could know where it comes from without the graphical view (from java code or other ways) , like the original table name (here are input1 or input2) and the original column name(maybe B in input1 , but represents A in table3). Besides I use mysql.
There are a couple of ways to do this:
1) Manually. If you right-click on the output step and choose Show Output fields (or whatever it's called), you will see the "origin step" for each of the outgoing fields. You can do the same for input fields. Then you can trace them back to those origin steps, and repeat the process of viewing the input fields at those steps, and seeing those fields' origins, and so on. This is probably not what you're looking for.
2) With code. Prior to 6.0, you'd need to programmatically perform the same operations as are listed in option 1 above. In 6.0 there is the Data Lineage capability, which offers the LineageClient API that can find the origin fields for the specified output fields. For more information see my blog post describing the Data Lineage capability. Also I put a Gremlin Console in the PDI Marketplace, to make the use of LineageClient easier (and you can visually see the lineage graph too).

Hive: How to have a derived column that has stores the sentiment value from the sentiment analysis API

Here's the scenario:
Say you have a Hive Table that stores twitter data.
Say it has 5 columns. One column being the Text Data.
Now How do you add a 6th column that stores the sentiment value from the Sentiment Analysis of the twitter Text data. I plan to use the Sentiment Analysis API like Sentiment140 or viralheat.
I would appreciate any tips on how to implement the "derived" column in Hive.
Thanks.
Unfortunately, while the Hive API lets you add a new column to your table (using ALTER TABLE foo ADD COLUMNS (bar binary)), those new columns will be NULL and cannot be populated. The only way to add data to these columns is to clear the table's rows and load data from a new file, this new file having that new column's data.
To answer your question: You can't, in Hive. To do what you propose, you would have to have a file with 6 columns, the 6th already containing the sentiment analysis data. This could then be loaded into your HDFS, and queried using Hive.
EDIT: Just tried an example where I exported the table as a .csv after adding the new column (see above), and popped that into M$ Excel where I was able to perform functions on the table values. After adding functions, I just saved and uploaded the .csv, and rebuilt the table from it. Not sure if this is helpful to you specifically (since it's not likely that sentiment analysis can be done in Excel), but may be of use to anyone else just wanting to have computed columns in Hive.
References:
https://cwiki.apache.org/Hive/gettingstarted.html#GettingStarted-DDLOperations
http://comments.gmane.org/gmane.comp.java.hadoop.hive.user/6665
You can do this in two steps without a separate table. Steps:
Alter the original table to add the required column
Do an "overwrite table select" of all columns + your computed column from the original table into the original table.
Caveat: This has not been tested on a clustered installation.

iReports Grouping Bug - Multiple Occurrences?

I'm am getting irritated with iReports. Problem is that I have a data set returning data for multiple customers and I want to use the "Group Expression" against the customer ID and have the report lay out the Detail Tabs per customer.
I'm finding that, seeming randomly, where there is more that one data row for a customer iReports will generate two or more groupings (Sometimes it does what I expect and group all the customer data together), the field IDing the customer is the same and doesn't change.
Has anyone seen this before? To be honest I can't believe it is actually a bug, but something I've missed. Just much searching as yet to find a suitable result.
I think this is a data sorting problem.
The quote from iReport Ultimate Guide:
JasperReports groups records by evaluating the group expression. Every
time the expression's value changes, a new group instance is created.
The engine does not perform any record sorting (if not explicitly
requested), so when we define groups we should always take care of the
records sorting. That is, if we want to group a set of addresses by
country, the records we select for the report should already by
ordered by country. It is simple to sort data when using an SQL query
by using the ORDER BY clause. When this is not possible (that is, when
obtaining the records from an XML document), we can request that
JasperReports sort the data for us. This can be done using the sort
options available in the query window
You can sort data in these ways:
in case using of Database jdbc connection datasource type you can add ORDER BY customerId clause to the report's query, where customerId - column name of field with customer id
in case using of File csv connection or something like this you can organize data sorting by adding sortField property for field to the report's template (jrxml file):
<jasperReport ...>
...
<field name="customerId" class="java.lang.String"/>
<sortField name="customerId"/>
SQL Statement has ORDER BY?
iReport group is grouped by customer_id?

Altova Mapforce: Joining XML Input and conditional SQL Join using two tables

I'm trying to get the following done: Using Altova Mapforce, I use an XML file with schema as a source. I want to map it to exactly the same output, but only add data to one field.
The value of the field (it's Tax) is determined using a two table SQL join with a WHERE clause over both tables. The tables are joined using foreign keys, the relation is recognized by Mapforce.
The first field of the WHERE clause comes from the first table (header type table), the second and third field from the second tables (lines type tables).
However, I cannot seem to create the logical and correct equivalent of what I am describing here. I've tried it using complex AND constructions where it then inserts the one field I would need multiple times. I've tried WHERE clauses but they fail as they never supply both tables at the same time and there seems to be no way to use a pre-specified JOINing of two tables as a source. The WHERE clause then recognizes only the fields from the first table, not the second one.
Is there an example for this? Joining two (or more) tables, using WHERE to determine the exact row, then using a value from that row?
Best wishes.

Resources