Migrating a table with PutDatabaseRecord when a column name differs at the target table - apache-nifi

I need to migrate data from a DB2 table to an MSSQL table. One column has a different name but the same datatype.
DB2 table:
NROCTA,NUMRUT,DIASMORA2
MSSQL table:
NROCTA,NUMRUT,DIAMORAS
As you see DIAMORAS is different.
I'm using the following flow:
ExecuteSQL -> SplitAvro -> PutDatabaseRecord
In PutDatabaseRecord I have an AvroReader as the Record Reader, configured this way:
Schema Access Strategy: Use Embedded Avro Schema.
Schema Text: ${avro.schema}
The flow only inserts the first two columns. How can I map the DIASMORA2 column to the DIAMORAS column?
Thanks in advance!

First thing, you probably don't need SplitAvro in your flow at all, unless there's some logical subset of rows that you are trying to send as individual transactions.
For the column name change, use UpdateRecord to set the field /DIAMORAS to the record path /DIASMORA2, and rename the field in the AvroRecordSetWriter's schema from DIASMORA2 to DIAMORAS.
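For illustration, the UpdateRecord settings might look something like this (the user-defined property key is the target record path):
Record Reader: AvroReader (Use Embedded Avro Schema)
Record Writer: AvroRecordSetWriter (schema edited as described below)
Replacement Value Strategy: Record Path Value
/DIAMORAS: /DIASMORA2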
That last part is a little trickier since you are using the embedded schema in your AvroReader. If the schema will always be the same, you can leave UpdateRecord stopped and add an ExtractAvroMetadata processor to extract the schema; that puts the embedded schema into the flowfile's avro.schema attribute.
Then, before you start UpdateRecord, start the ExecuteSQL and ExtractAvroMetadata processors and inspect a flowfile in the queue to copy the schema out of the avro.schema attribute. In the AvroRecordSetWriter used by UpdateRecord, instead of inheriting the record schema, choose Use Schema Text and paste in the schema from the attribute, changing DIASMORA2 to DIAMORAS. This approach puts values from the DIASMORA2 field into the DIAMORAS field; since DIASMORA2 is not in the output schema, it is ignored, effectively renaming the field (although under the hood it is a copy-and-remove).
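For example, the pasted schema text might end up looking something like this; the record name and field types here are guesses, only the column names come from the question:
{
  "type": "record",
  "name": "migration",
  "fields": [
    {"name": "NROCTA", "type": ["null", "long"]},
    {"name": "NUMRUT", "type": ["null", "string"]},
    {"name": "DIAMORAS", "type": ["null", "int"]}
  ]
}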

Related

Inject a list into Spring Data to be used as a kind of virtual database table

I have a database table on which a sorted query needs to be done.
To do the sorting, a join on another table is required. The problem is that this other table does not exist in the database: we read the required data from a CSV file at the service's startup and keep it as an in-memory list.
Is it possible to somehow inject this list as a kind of virtual database table into Spring Data, so that it could use the list for the required join and sorting?
As far as I know, the only other options I have would be to create a real database table from this in-memory list or load the whole table and do the sorting in the service itself.
You can add a special order by expression through e.g. Spring Data Specification, but that is going to be very ugly. In HQL it looks like this:
case rootAlias.attribute when 'value1' then 1 when 'value2' then 2 ... else null end
which will return some integer value by which you can sort ascending or descending, based on the mapping you have.
Even if you have lots of values, I would rather recommend that you don't do a join at all, and instead try to make this attribute of your main table sortable, so that you don't need the mapping. You could create a trigger that maintains a column based on the mapping, which can then be used for sorting directly. If you do all your changes through JPA/Hibernate, you could also use a @PreUpdate/@PrePersist listener to handle the maintenance of this column.
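For example, a minimal sketch of that listener idea; the entity, the sortKey column, and the hard-coded stand-in for the CSV mapping are all made up:

import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.PrePersist;
import jakarta.persistence.PreUpdate;
import java.util.Map;

// Sketch: keep a numeric sort key in sync with the CSV mapping,
// so queries can simply "order by sortKey".
@Entity
public class MainRow {

    @Id
    private Long id;

    // The attribute whose ordering is defined by the CSV data.
    private String attribute;

    // Maintained column used directly in ORDER BY.
    private Integer sortKey;

    // Stand-in for the in-memory list loaded from the CSV at startup.
    private static final Map<String, Integer> CSV_ORDER =
            Map.of("value1", 1, "value2", 2);

    @PrePersist
    @PreUpdate
    void refreshSortKey() {
        // Unmapped values sort as NULL, like the "else null" in the HQL case.
        sortKey = CSV_ORDER.get(attribute);
    }
}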

Passing a parameter from a different source into an insert statement using NiFi

I'm still new to NiFi. What I want to achieve is to pass a parameter from a different source.
Scenario:
I have two data sources: JSON data and a record id (from an Oracle function). I declared the record id using ExtractText as "${recid}", and the JSON string default is "$1".
How can I insert into the table using the SQL statement insert into table1 (json, recid) values ('$1', '${recid}')?
After I run the processor, I'm not able to get both attributes into one insert statement.
Please help.
(screenshot: NiFi flowfile)
(screenshot: flowfile after MergeContent)
You should merge these two flowfiles into one.
Use the MergeContent processor with Attribute Strategy set to Keep All Unique Attributes:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.MergeContent/index.html
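A sketch of the relevant settings, assuming exactly two flowfiles per merged result:
Merge Strategy: Bin-Packing Algorithm
Attribute Strategy: Keep All Unique Attributes
Minimum Number of Entries: 2
Maximum Number of Entries: 2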
Take a look at LookupAttribute with a SimpleDatabaseLookupService. You can pass your JSON flowfile into that, look up the recid into an attribute, then use ExtractText -> ReplaceText to get it into SQL form.
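For that last step, the ReplaceText settings could look like this; the regex is an assumption that simply captures the whole JSON body as $1, and the table name comes from the question:
Replacement Strategy: Regex Replace
Search Value: (?s)(^.*$)
Replacement Value: insert into table1 (json, recid) values ('$1', '${recid}')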

Why does Kafka Connect treat timestamp columns differently?

I have a Kafka Connect configuration set up to pull data from DB2. I'm not using Avro, just the out-of-the-box JSON. Among the columns in the DB are several timestamp columns, and when they are streamed, they come out like this:
"Process_start_ts": 1578600031762,
"Process_end_ts": 1579268248183,
"created_ts": 1579268247984,
"updated_ts": {
"long": 1579268248182
}
}
The last column is rendered with this sub-element, though the other 3 are not. (This will present problems for the consumer.)
The only thing I can see is that in the DB, that column alone has a default value of null.
Is there some way I can force this column to render in the message like the prior three?
Try to flatten your message using Kafka Connect Transformations.
The configuration snippet below shows how to use Flatten to concatenate field names with the period (.) delimiter character (you have to add these lines to the connector config):
"transforms": "flatten",
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.flatten.delimiter": "."
As a result, your JSON message should look like this:
{
"Process_start_ts": 1578600031762,
"Process_end_ts": 1579268248183,
"created_ts": 1579268247984,
"updated_ts.long": 1579268248182
}
See example Flatten SMT for JSON.
I'm not sure created_ts is any different from the first two. The value is still a long; only the key is all lower case. It's not clear how it would know what the default should be. Are you sure you're not using AvroConverter? If not, it's not clear which fields would have defaults.
The updated time is nested like that based on the Avro or structured-JSON Kafka Connect specifications, which say the type name is included as part of the record to explicitly denote the type of a nullable field.
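For reference, that nullable column maps to a union in the schema, and the standard Avro JSON encoding wraps a non-null union value in an object keyed by the branch type, which is exactly where the {"long": ...} wrapper comes from. The field declaration looks something like this (the field name is taken from the question):
{"name": "updated_ts", "type": ["null", "long"], "default": null}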

Difference between ResourceSchema and Schema in Pig

What's the difference between ResourceSchema and Schema in Pig?
There is already a Schema class provided, so why does Pig bother to add another Schema-like class called ResourceSchema for storage functions? It is almost like the Schema API: you set its ResourceFieldSchema's name and type, and it can also have a child ResourceSchema.
The API docs back up @zsxwing's comment:
Schema - The Schema class encapsulates the notion of a schema for a relational operator. A schema is a list of columns that describe the output of a relational operator.
Each column in the relation is represented as a FieldSchema, a static class inside the Schema. A column by definition has an alias, a type and a possible schema (if the column is a bag or a tuple).
In addition, each column in the schema has a unique auto generated name used for tracking the lineage of the column in a sequence of statements. The lineage of the column is tracked using a map of the predecessors' columns to the operators that generate the predecessor columns.
The predecessor columns are the columns required in order to generate the column under consideration. Similarly, a reverse lookup of operators that generate the predecessor column to the predecessor column is maintained.
ResourceSchema - A representation of a schema used to communicate with load and store functions. This is separate from Schema, which is an internal Pig representation of a schema.
So one of the main differences I can see from the API docs is that a Schema is able to track the input columns required to build it, whereas a ResourceSchema is just the schema definition: the field name, type, and optional sub-schema.
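To make that concrete, here is a minimal sketch of building a ResourceSchema the way a LoadMetadata.getSchema() implementation would hand one back to Pig; the field names are made up:

import org.apache.pig.ResourceSchema;
import org.apache.pig.ResourceSchema.ResourceFieldSchema;
import org.apache.pig.data.DataType;

public class ResourceSchemaSketch {
    public static void main(String[] args) {
        // One ResourceFieldSchema per column: just a name, a type,
        // and (for bags/tuples) an optional child ResourceSchema.
        ResourceFieldSchema name = new ResourceFieldSchema();
        name.setName("name");
        name.setType(DataType.CHARARRAY);

        ResourceFieldSchema age = new ResourceFieldSchema();
        age.setName("age");
        age.setType(DataType.INTEGER);

        // The ResourceSchema itself is only this list of fields --
        // no lineage tracking, unlike the internal Schema class.
        ResourceSchema schema = new ResourceSchema();
        schema.setFields(new ResourceFieldSchema[] { name, age });
        System.out.println(schema);
    }
}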

Altova MapForce: joining XML input and a conditional SQL join using two tables

I'm trying to get the following done: using Altova MapForce, I use an XML file with a schema as a source. I want to map it to exactly the same output, but only add data to one field.
The value of that field (it's Tax) is determined using a two-table SQL join with a WHERE clause over both tables. The tables are joined using foreign keys, and the relation is recognized by MapForce.
The first field of the WHERE clause comes from the first table (a header-type table); the second and third fields come from the second table (a lines-type table).
However, I cannot seem to create the logical and correct equivalent of what I am describing here. I've tried complex AND constructions, but then it inserts the one field I need multiple times. I've tried WHERE clauses, but they fail because they never supply both tables at the same time, and there seems to be no way to use a pre-specified join of two tables as a source; the WHERE clause then recognizes only the fields from the first table, not the second.
Is there an example for this? Joining two (or more) tables, using WHERE to determine the exact row, then using a value from that row?
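Expressed in plain SQL, what I'm trying to achieve is something like this (all table and column names here are placeholders):

SELECT l.Tax
FROM Header h
JOIN Lines l ON l.HeaderId = h.Id
WHERE h.FieldA = ?   -- value taken from the XML header element
  AND l.FieldB = ?   -- values taken from the XML line elements
  AND l.FieldC = ?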
Best wishes.
