How to read sqoop options in PutTransformer class - sqoop

We have implemented a custom PutTransformer class for our use case. We want to apply a bucketing scheme to the row key ID, so we need to pass the number of buckets in from outside, configured in a config file. Is it possible to read Sqoop options in the PutTransformer class?
I am passing the customized Sqoop PutTransformer class using the option "sqoop.hbase.insert.put.transformer.class".
Any ideas on this?

The PutTransformer class currently does not expose Hadoop's configuration object. I would suggest filing a new feature request on the Sqoop JIRA to add such a capability!
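Until such a hook exists, one workaround is to let the custom transformer load the bucket count itself, for example from a properties file on the classpath. Below is a minimal sketch; BucketedPutTransformer and bucket-config.properties are hypothetical names, and the getPutCommand signature follows Sqoop's PutTransformer contract but may differ between Sqoop versions:

import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.hbase.client.Put;
import org.apache.sqoop.hbase.PutTransformer;

// Hypothetical custom transformer that reads the bucket count from a
// classpath properties file, since the Hadoop Configuration is not exposed.
public class BucketedPutTransformer extends PutTransformer {

    private final int numBuckets;

    public BucketedPutTransformer() {
        Properties props = new Properties();
        try (InputStream in = getClass().getResourceAsStream("/bucket-config.properties")) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Could not load bucket-config.properties", e);
        }
        this.numBuckets = Integer.parseInt(props.getProperty("hbase.rowkey.buckets", "1"));
    }

    @Override
    public List<Put> getPutCommand(Map<String, Object> fields) throws IOException {
        // Compute a bucket prefix from the row key value (e.g. hash % numBuckets)
        // and prepend it to the row key before building the Put objects.
        // Row-key construction is omitted; this sketch only shows reading the config.
        return Collections.emptyList();
    }
}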

Related

How to list resolvedDataSources from AbstractRoutingDataSource?

I implemented dynamic DataSource routing using Spring Boot (JavaConfig) to add and switch between new DataSources at runtime.
I extended AbstractRoutingDataSource, and I need access to resolvedDataSources, which is a private field. How can I do that?
I actually don't know why that field wasn't made protected so that subclasses can access the set of data sources. Regarding your question, two options come to mind.
Option 1:
Copy the code of AbstractRoutingDataSource into a class of your own. Then you can expose resolvedDataSources simply through a getter. This should work as long as the rest of the configuration depends on the base class AbstractDataSource (or the DataSource interface) and not on AbstractRoutingDataSource itself.
Option 2:
Take the brute-force route and access the field via the Reflection API, as sketched below.
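A minimal sketch of the reflection approach; the field name resolvedDataSources matches the private field in Spring's AbstractRoutingDataSource, but this is obviously fragile against changes in future Spring versions:

import java.lang.reflect.Field;
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public final class RoutingDataSourceInspector {

    // Reads the private resolvedDataSources map via reflection.
    @SuppressWarnings("unchecked")
    public static Map<Object, DataSource> resolvedDataSources(AbstractRoutingDataSource routingDataSource) {
        try {
            Field field = AbstractRoutingDataSource.class.getDeclaredField("resolvedDataSources");
            field.setAccessible(true);
            return (Map<Object, DataSource>) field.get(routingDataSource);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalStateException("Could not read resolvedDataSources via reflection", e);
        }
    }
}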

Configurable Component Scan

Is there a way to make the component scan configurable externally, or through an intermediate resolver class? My requirement is that a common library should include one or more of several smaller facilities (each having their own controllers, services, etc.) depending on whether those are "configured" or needed - e.g. in application properties.
The closest I can see to a workable design is to declare a @Configuration class in the common library and keep it on the component scan class path (always). In this class I need some way to declare the allowed scan paths (based on how downstream projects have configured their application properties).
It seems like a custom TypeFilter implementation should do it. But how do I read application properties from inside the type filter implementation (the annotation takes only the .class, so Spring must be instantiating it)?
Any other ways? Thanks!
The Spring Boot reference documentation describes how to create your own auto-configuration. It allows you to read properties and use several variations of the @Conditional annotation.
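A minimal sketch of such an auto-configuration; the package, class and property names are made up for illustration, while @ConditionalOnProperty, @ComponentScan and the auto-configuration registration are standard Spring Boot mechanics:

import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;

// Lives in the common library; it only activates the sub-facility's beans
// when the downstream application opts in via its application properties.
@Configuration
@ConditionalOnProperty(name = "commonlib.feature-x.enabled", havingValue = "true")
@ComponentScan("com.example.commonlib.featurex")
public class FeatureXAutoConfiguration {
}

Register the class in the library's META-INF/spring.factories under org.springframework.boot.autoconfigure.EnableAutoConfiguration (or in META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports on newer Spring Boot versions), and downstream projects can then switch the facility on with commonlib.feature-x.enabled=true.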

Spring xd Gemfire sink key-class and value-class parameters

Is there any way to use the key-class and value-class parameters for the Gemfire sink in Spring XD?
According to the documentation I can only use keyExpression; there is nothing about its class type, and the same goes for key-class.
I have the following command for GemFire:
put --key-class = java.lang.String --value-class = Employee --key = ('id': '998') --value = ('id': 186, 'firstName': 'James', 'lastName': 'Goslinga') --region = replicated2
So I use the --key-class and --value-class parameters in GemFire.
But I cannot use them from Spring XD, since the Gemfire sink only has a keyExpression parameter.
Any idea how to solve this?
As far as I know, the syntax above is not supported by native GemFire, so you can't do it out of the box with Spring XD. The syntax looks vaguely SQL-like. Are you using GemFire XD? Is this something you wrote yourself?
The gemfire sink uses spring-integration-gemfire, allowing you to declare the keyExpression using SpEL. The value, using the gemfire sink, is always the payload. The SI gemfire outbound adapter wraps Region.put(key, value). The GemFire API supports typing via generics, i.e. Region<K,V> but this is not enforced in this case. GemFire RegionFactory allows keyConstraint and valueConstraint attributes to constrain types but this is part of the Region configuration which is external to Spring XD. Furthermore, none of this addresses the data binding in your example, e.g.,
Person p = ('id': 186, 'firstName': 'James', 'lastName': 'Goslinga')
This capability would require a custom sink module. If your command can be executed as a shell script, you might be able to use a shell sink to invoke it.
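For reference, the typed put that the adapter ultimately performs looks roughly like this in the GemFire Java API (a sketch only; the Employee class and region name are taken from the question, and the com.gemstone.gemfire package may be org.apache.geode in newer releases):

import java.io.Serializable;

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.cache.Region;

public class TypedPutExample {

    // Minimal value type standing in for the Employee class from the question.
    public static class Employee implements Serializable {
        public int id;
        public String firstName;
        public String lastName;

        public Employee(int id, String firstName, String lastName) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    public static void main(String[] args) {
        Cache cache = new CacheFactory().create();
        // Typed view of the region; the generics constrain types at compile time only.
        Region<String, Employee> region = cache.getRegion("replicated2");
        region.put("998", new Employee(186, "James", "Goslinga"));
        cache.close();
    }
}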
Thank you for your answer.
Maybe I can explain my problem this way:
If I write the following command in the GemFire console, I can create a new entry in the region containing an object of the Employee class.
put --key-class=java.lang.String --value-class=Employee --key=('id':'998') --value=('id':186,'firstName':'James','lastName':'Goslinga') --region=replicated2
What I want to do is send data from Spring XD and end up with a new object of the Employee class in GemFire.
I create the following stream, which gets data from RabbitMQ and sends it to GemFire:
stream create --name reference-data-import --definition "rabbit --outputType=text/plain | gemfire-json-server --host=MC0WJ1BC --regionName=region10 --keyExpression=payload.getField('id')" --deploy
I can see that the data arrives as "com.gemstone.gemfire.pdx.internal.PdxInstanceImpl".
According to the spring-xd documentation I can use a parameter such as outputType=application/x-java-object;type=com.bar.Foo, but I never managed to get it to work, even though I deployed my class.
If I could see a simple working example, that would be great.

Assign ArrayList from the data in properties file

This is my properties file:
REDCA_IF_00001=com.sds.redca.biz.svc.RedCAIF00001SVC
REDCA_IF_00002=com.sds.redca.biz.svc.RedCAIF00002SVC
REDCA_IF_00003=com.sds.redca.biz.svc.RedCAIF00003SVC
REDCA_IF_00004=com.sds.redca.biz.svc.RedCAIF00004SVC
and I want to put these values into a HashMap in my Spring context file.
How can I achieve this?
Does it have to be a HashMap, or would any kind of Map be fine?
Because you can define that as a java.util.Properties instance (Spring has great support for loading properties), which already implements Map (it actually extends Hashtable).
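A minimal Java-config sketch, assuming the file is called redca-services.properties and sits on the classpath (the <util:properties> element does the same in an XML context file):

import java.io.IOException;
import java.util.Properties;

import org.springframework.beans.factory.config.PropertiesFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class RedcaServiceConfig {

    // Loads the key=class-name pairs as a java.util.Properties bean,
    // which can be injected wherever a Map is expected.
    @Bean
    public Properties redcaServiceMappings() throws IOException {
        PropertiesFactoryBean factory = new PropertiesFactoryBean();
        factory.setLocation(new ClassPathResource("redca-services.properties"));
        factory.afterPropertiesSet();
        return factory.getObject();
    }
}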

Hadoop - How to switch from implementing the writable interface to use an Avro object?

I’m using Hadoop to convert JSONs into CSV files to access them with Hive.
At the moment the Mapper fills a data structure of its own, parsing the JSONs with JSON-Smart. The reducer then reads that object out and writes it to a file, separated by commas.
To make this faster I have already implemented the Writable interface in the data structure...
Now I want to use Avro for the data structure object to gain more flexibility and performance. How could I change my classes so that they exchange an Avro object instead of a Writable?
Hadoop offers a pluggable serialization mechanism via the SerializationFactory.
By default, Hadoop uses the WritableSerialization class to handle the deserialization of classes which implement the Writable interface, but you can register custom serializers that implement the Serialization interface by setting the Hadoop configuration property io.serializations (a CSV list of classes that implement the Serialization interface).
Avro has an implementation of the Serialization interface in the AvroSerialization class - so this would be the class you configure in the io.serializations property.
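A minimal sketch of registering the Avro serialization alongside the default Writable one; the exact package of AvroSerialization (org.apache.avro.hadoop.io here) depends on your Avro version, so verify the class name against your classpath:

import org.apache.hadoop.conf.Configuration;

public class AvroSerializationConfig {

    // Keeps WritableSerialization so existing Writable types still work,
    // and registers Avro's Serialization implementation alongside it.
    public static Configuration withAvroSerialization() {
        Configuration conf = new Configuration();
        conf.setStrings("io.serializations",
                "org.apache.hadoop.io.serializer.WritableSerialization",
                "org.apache.avro.hadoop.io.AvroSerialization");
        return conf;
    }
}

In practice the AvroJob helpers mentioned below typically take care of this wiring (plus the schema properties AvroSerialization needs), so setting it by hand is mainly useful for understanding what happens under the hood.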
Avro actually has a whole bunch of helper classes which help you write MapReduce jobs that use Avro as input/output - there are some examples in the source (Git copy).
I can't seem to find any good documentation for Avro & MapReduce at the moment, but I'm sure there are some other good examples out there.
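As a hedged sketch of those helpers (using the org.apache.avro.mapreduce package from the avro-mapred artifact; method names may vary between Avro versions), configuring a job so the mapper and reducer exchange Avro records as intermediate values could look like this:

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JsonToCsvJob {

    public static void main(String[] args) throws IOException {
        Job job = Job.getInstance(new Configuration(), "json-to-csv");
        job.setJarByClass(JsonToCsvJob.class);

        // employee.avsc is a placeholder for your own Avro schema describing
        // the record the mapper emits and the reducer consumes.
        Schema schema = new Schema.Parser()
                .parse(JsonToCsvJob.class.getResourceAsStream("/employee.avsc"));

        // Registers Avro serialization and the schema for the intermediate
        // map-output values, so Avro records can flow between mapper and
        // reducer instead of a custom Writable.
        AvroJob.setMapOutputValueSchema(job, schema);

        // Mapper/reducer classes, input/output formats and paths are omitted here.
    }
}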
