Kafka Connect - JdbcSinkConnector - mapping fields to columns

I set up a connector to load data from topics into a Postgres database.
Is it possible to map a field of the record to a column with a different name?
For example: field_x > column_field_x
"fields.whitelist":"field_x"
Currently this field is written to a column named "field_x", but I would like it to go to "column_field_x".
Thanks

You can use the "ReplaceField" transformation: https://docs.confluent.io/platform/current/connect/transforms/replacefield.html#replacefield
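For example, a minimal sketch of the connector config lines this would need, assuming the field names from the question (the transform alias "rename" is arbitrary):
"transforms": "rename",
"transforms.rename.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.rename.renames": "field_x:column_field_x"
Since SMTs run before the sink task sees the record, fields.whitelist would then likely need to reference the renamed field (column_field_x) rather than field_x.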

Related

Laravel Datatable shows different data from the database

Hi guys, I want to ask something.
I have a column in a MySQL database like:
199910192022032006
but when the Laravel Datatable shows that column, it changes to:
199910192022032000
Why does this happen?
What data type are you using? Integer? Try changing it to float or double, or maybe varchar.
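If the value is really an identifier rather than a number you do arithmetic on, one hedged sketch is to store it as a varchar on the MySQL side (the table and column names here are made up for illustration):
ALTER TABLE members MODIFY COLUMN member_number VARCHAR(20) NOT NULL;
Keeping it as a string also avoids the precision loss that large integers can suffer when handled as JavaScript numbers on the client side.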

Why does Kafka Connect treat timestamp columns differently?

I have a Kafka Connect configuration set up to pull data from DB2. I'm not using Avro, just the out-of-the-box JSON. Among the columns in the DB are several timestamp columns, and when they are streamed, they come out like this:
"Process_start_ts": 1578600031762,
"Process_end_ts": 1579268248183,
"created_ts": 1579268247984,
"updated_ts": {
"long": 1579268248182
}
}
The last column is rendered with this sub-element, though the other 3 are not. (This will present problems for the consumer.)
The only thing I can see is that in the DB, that column alone has a default value of null.
Is there some way I can force this column to render in the message as the prior 3?
Try to flatten your message using Kafka Connect Transformations.
The configuration snippet below shows how to use Flatten to concatenate field names with the period . delimiter character (you have to add these lines to the connector config):
"transforms": "flatten",
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.flatten.delimiter": "."
As a result, your JSON message should look like this:
{
  "Process_start_ts": 1578600031762,
  "Process_end_ts": 1579268248183,
  "created_ts": 1579268247984,
  "updated_ts.long": 1579268248182
}
See the Flatten SMT example for JSON.
I'm not sure created_ts is any different from the first two. The value is still a long; only the key is all lower case. It's not clear how it would know what the default should be. Are you sure you're not using the AvroConverter? If not, it's not clear what fields would have defaults.
The updated time is nested like that because the Avro / structured-JSON Kafka Connect specifications say the type name is included as part of the record to explicitly denote the type of a nullable field.
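For illustration, a nullable long field in an Avro-style schema is declared as a union, roughly like this (only the field name comes from the question; the rest is an assumption):
{ "name": "updated_ts", "type": ["null", "long"], "default": null }
In the Avro JSON encoding, a non-null value of such a union is wrapped in an object keyed by the branch type, which is why the value appears as "updated_ts": {"long": 1579268248182} instead of a bare number.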

Migrating a table with PutDatabaseRecord with a different column name in the target table

I need to migrate data from a DB2 table to an MSSQL table, but one column has a different name (with the same data type).
Db2 table:
NROCTA,NUMRUT,DIASMORA2
MSSQL table:
NROCTA,NUMRUT,DIAMORAS
As you can see, DIAMORAS is different.
I'm using the following flow:
ExecuteSQL -> SplitAvro -> PutDatabaseRecord
In PutDatabaseRecord I have an AvroReader as the RecordReader, configured in this way:
Schema Access Strategy: Use Embedded Avro Schema.
Schema Text: ${avro.schema}
The flow only inserts the first two columns. How can I do the mapping between the DIASMORA2 and DIAMORAS columns?
Thanks in advance!
First thing, you probably don't need SplitAvro in your flow at all, unless there's some logical subset of rows that you are trying to send as individual transactions.
For the column name change, use UpdateRecord and set the field /DIAMORAS to the record path /DIASMORA2, and change the name of the field in the AvroRecordSetWriter's schema from DIASMORA2 to DIAMORAS.
That last part is a little trickier since you are using the embedded schema in your AvroReader. If the schema will always be the same, you can stop the UpdateRecord processor and put in an ExtractAvroMetadata processor to extract the avro.schema attribute. That will put the embedded schema in the flowfile's avro.schema attribute.
Then before you start UpdateRecord, start the ExecuteSQL and ExtractAvroMetadata processors, then inspect a flow file in the queue to copy the schema out of the avro.schema attribute. Then in your AvroRecordSetWriter (used by UpdateRecord), instead of inheriting the schema, you can choose to Use Schema Text and paste in the schema from the attribute, changing DIASMORA2 to DIAMORAS. This approach puts values from the DIASMORA2 field into the DIAMORAS field, but since DIASMORA2 is not in the output schema, it is ignored, thereby effectively renaming the field (although under the hood it is a copy-and-remove).
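For reference, a minimal sketch of what the pasted schema text for the AvroRecordSetWriter could look like after the rename (the record name and field types are assumptions; only the field names come from the tables above):
{
  "type": "record",
  "name": "cuenta",
  "fields": [
    { "name": "NROCTA", "type": ["null", "string"] },
    { "name": "NUMRUT", "type": ["null", "string"] },
    { "name": "DIAMORAS", "type": ["null", "int"] }
  ]
}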

Building an index with multiple tables in ElasticSearch/Logstash 7.0

I have 20 tables in Oracle, all of which contain (among others) the following columns: id, name, description and notes. I would like the user to enter some text, the text to be searched in the name, description and/or notes of all the tables, and the result to return which table(s) and id(s) contain the text.
In the Logstash 7.0 configuration file, do I need to define one jdbc input for each table? Or should the input be a single select with an union of all the tables?
My answer to the question above is to combine the info from all the tables into a single JSON document per row (e.g. via one SELECT with a UNION) and then index that; this solves the problem in an easier way.
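If you go with the single-select approach, a rough sketch of the Logstash pipeline could look like this (the driver path, connection details, index name and table names are all placeholders):
input {
  jdbc {
    jdbc_driver_library => "/path/to/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@//dbhost:1521/ORCL"
    jdbc_user => "user"
    jdbc_password => "password"
    # one SELECT per table, tagged with its table name, combined via UNION ALL
    statement => "SELECT 'table_a' AS source_table, id, name, description, notes FROM table_a UNION ALL SELECT 'table_b' AS source_table, id, name, description, notes FROM table_b"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "all_tables"
  }
}
A full-text query against name, description and notes then returns source_table and id for every matching row, which answers the "which table(s) and id(s)" part directly.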

Hive: How to have a derived column that stores the sentiment value from a sentiment analysis API

Here's the scenario:
Say you have a Hive table that stores Twitter data.
Say it has 5 columns, one of them being the text data.
Now, how do you add a 6th column that stores the sentiment value from sentiment analysis of the Twitter text data? I plan to use a sentiment analysis API like Sentiment140 or Viralheat.
I would appreciate any tips on how to implement the "derived" column in Hive.
Thanks.
Unfortunately, while the Hive API lets you add a new column to your table (using ALTER TABLE foo ADD COLUMNS (bar binary)), those new columns will be NULL and cannot be populated. The only way to add data to these columns is to clear the table's rows and load data from a new file, this new file having that new column's data.
To answer your question: You can't, in Hive. To do what you propose, you would have to have a file with 6 columns, the 6th already containing the sentiment analysis data. This could then be loaded into your HDFS, and queried using Hive.
EDIT: Just tried an example where I exported the table as a .csv after adding the new column (see above), and popped that into M$ Excel where I was able to perform functions on the table values. After adding functions, I just saved and uploaded the .csv, and rebuilt the table from it. Not sure if this is helpful to you specifically (since it's not likely that sentiment analysis can be done in Excel), but may be of use to anyone else just wanting to have computed columns in Hive.
References:
https://cwiki.apache.org/Hive/gettingstarted.html#GettingStarted-DDLOperations
http://comments.gmane.org/gmane.comp.java.hadoop.hive.user/6665
You can do this in two steps without a separate table. Steps:
Alter the original table to add the required column
Do an "overwrite table select" of all columns + your computed column from the original table into the original table.
Caveat: This has not been tested on a clustered installation.
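For what it's worth, a rough HiveQL sketch of those two steps (the table name, the existing columns, and sentiment_udf are all placeholders; you would still need a UDF or a pre-computed lookup to actually produce the sentiment value):
-- Step 1: add the derived column to the existing table
ALTER TABLE tweets ADD COLUMNS (sentiment STRING);
-- Step 2: rewrite the table, filling the new column from a computed expression
INSERT OVERWRITE TABLE tweets
SELECT id, user_name, created_at, lang, tweet_text, sentiment_udf(tweet_text) AS sentiment
FROM tweets;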
