How to incorporate Storm component-specific configuration data? - apache-storm

I have a Storm topology containing spouts and bolts.
There is some configuration data specific to a particular spout and to a particular bolt that I would like to read from a config file rather than hard-code. Examples of such config data are the filename the spout reads from and the filename a bolt writes to.
I believe config data is passed into the open and prepare methods.
How can I incorporate the component-specific data from a configuration file?

There are at least two ways to do this:
1) Include application-specific configuration in the Storm config, which will be available during the IBolt.prepare() and ISpout.open() method calls. One strategy you could use is to give the configuration keys an application prefix, avoiding potential conflicts.
Config conf = new backtype.storm.Config();
// Storm-specific configuration
// ...
// ..
// .
conf.put("my.application.configuration.foo", "foo");
conf.put("my.application.configuration.bar", "foo");
StormSubmitter.submitTopology(topologyName, conf, topology);
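For completeness, here is a minimal, hypothetical sketch of how a bolt could read such a key back out of the topology configuration in prepare(); the class name MyConfiguredBolt is made up, and the key is the placeholder used above:
import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class MyConfiguredBolt extends BaseRichBolt {
    private String foo;
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        // The topology config submitted above is handed to every component here.
        this.foo = (String) stormConf.get("my.application.configuration.foo");
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // use this.foo (e.g. as an output filename) when processing tuples
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output streams in this sketch
    }
}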
2) Pass component-specific configuration to the Spout/Bolt constructor.
Properties properties = new java.util.Properties();
properties.load(new java.io.FileReader("config-file"));
BaseComponent bolt = new MyBoltImpl(properties);
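Note that Storm serializes spouts and bolts when the topology is submitted, so anything passed through the constructor must be Serializable; java.util.Properties is, so under that assumption MyBoltImpl could look roughly like this (imports as in the sketch above, plus java.util.Properties; the "output.file" key is hypothetical):
public class MyBoltImpl extends BaseRichBolt {
    // Properties is Serializable, so it survives topology serialization.
    private final Properties properties;
    private String outputFile;
    private OutputCollector collector;

    public MyBoltImpl(Properties properties) {
        this.properties = properties;
    }

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        // hypothetical key: the filename this bolt should write to
        this.outputFile = properties.getProperty("output.file");
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // write to this.outputFile here ...
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output streams in this sketch
    }
}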

Related

Spring Integration SFTP - issue with filters and number of messages emitted

I started using Spring Integration SFTP and I have some questions.
Filters are not working. I have this example configuration:
Sftp.inboundAdapter(ftpFileSessionFactory())
.preserveTimestamp(true)
.deleteRemoteFiles(false)
.remoteDirectory(integrationProperties.getRemoteDirectory())
.filter(sftpFileListFilter()) // doesn't work
.patternFilter("*.xlsx") // doesn't work
And my ChainFileListFilter:
private ChainFileListFilter<ChannelSftp.LsEntry> sftpFileListFilter() {
    ChainFileListFilter<ChannelSftp.LsEntry> chainFileListFilter = new ChainFileListFilter<>();
    chainFileListFilter.addFilter(new SftpPersistentAcceptOnceFileListFilter(metadataStore(), "INT"));
    chainFileListFilter.addFilter(new SftpSimplePatternFileListFilter("*.xlsx"));
    return chainFileListFilter;
}
If I understand correctly, only XLSX files should be saved in the local directory. If so, it doesn't work with this configuration. Am I doing something wrong, or have I misunderstood this?
How can I configure SFTP so that each downloaded file emits a message? I see two parameters in the docs, max-messages-per-poll and max-fetch-size, but I don't know how to set them up so that every file emits a message. I would like to sync files once every 24 hours and produce a batch job queue. Maybe there is a workaround?
Is there a built-in filter which would allow me to fetch only files whose content has changed? The best solution would be to check the files' checksums.
I will be grateful for your help and explanations.
You cannot combine filter() and patternFilter(). Only one of them can be used: the last one overrides whatever you used before. In other words, use either filter() or patternFilter(), not both. By default the logic is like this:
public SftpInboundChannelAdapterSpec patternFilter(String pattern) {
    return filter(composeFilters(new SftpSimplePatternFileListFilter(pattern)));
}

private CompositeFileListFilter<ChannelSftp.LsEntry> composeFilters(FileListFilter<ChannelSftp.LsEntry> fileListFilter) {
    CompositeFileListFilter<ChannelSftp.LsEntry> compositeFileListFilter = new CompositeFileListFilter<>();
    compositeFileListFilter.addFilters(fileListFilter,
            new SftpPersistentAcceptOnceFileListFilter(new SimpleMetadataStore(), "sftpMessageSource"));
    return compositeFileListFilter;
}
So, technically you don't need your custom one if you don't use an external persistent MetadataStore. But if you do, consider swapping SftpSimplePatternFileListFilter and SftpPersistentAcceptOnceFileListFilter, since it is better to check the pattern before recording the file in the MetadataStore. A short sketch of that order follows.
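For illustration, a sketch of the flipped order, reusing the metadataStore() bean and the "INT" prefix from the question:
private ChainFileListFilter<ChannelSftp.LsEntry> sftpFileListFilter() {
    ChainFileListFilter<ChannelSftp.LsEntry> chain = new ChainFileListFilter<>();
    // check the pattern first ...
    chain.addFilter(new SftpSimplePatternFileListFilter("*.xlsx"));
    // ... and only record matching files in the persistent MetadataStore
    chain.addFilter(new SftpPersistentAcceptOnceFileListFilter(metadataStore(), "INT"));
    return chain;
}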
It is a fact that every synced remote file that passes those filters is stored in the local directory, and the message for that local file is emitted immediately when the poller makes a request.
The maxFetchSize plays its role when remote files are loaded into the local directory. The maxMessagesPerPoll is used by the poller, but those messages are already built from the local files. A message is emitted per local file, not as a batch for all of them; that's not what messaging is designed for.
Please share more info about what does not work with the files. The SftpPersistentAcceptOnceFileListFilter checks not only the file name but also the mtime of the file. So it is not about any checksum, but rather the last-modified timestamp of the file.

How to read multiple files, process and write separately using spring batch

I want to read multiple files, named name*.txt, and process them.
For that I am using a MultiResourceItemReader.
It is reading all files and processing and writing them in one go. I want to read, process, and write each file separately.
The code:
@Bean
public MultiResourceItemReader<POJO> multiResourceItemReader() throws IOException {
    MultiResourceItemReader<POJO> resourceItemReader = new MultiResourceItemReader<POJO>();
    ClassLoader cl = this.getClass().getClassLoader();
    ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver(cl);
    Resource[] resources = resolver.getResources("file:" + filePath);
    resourceItemReader.setResources(resources);
    resourceItemReader.setDelegate(reader());
    return resourceItemReader;
}
That's how the MultiResourceItemReader is designed to work. In your case, you can create a job instance per file.
There are many advantages to making one thing do one thing and do it well; one of them in your use case is restartability: if one of the jobs fails, you only restart the failed one. A minimal sketch of launching one job instance per file follows.
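A sketch under assumptions (the jobLauncher, job, and resources names are placeholders for beans and values from your own configuration): each file path is passed as an identifying JobParameter, so every file gets its own job instance.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.core.io.Resource;

public void launchJobPerFile(JobLauncher jobLauncher, Job job, Resource[] resources) throws Exception {
    for (Resource resource : resources) {
        // The file path is an identifying parameter, so each file yields a distinct job instance.
        JobParameters params = new JobParametersBuilder()
                .addString("inputFile", resource.getURI().toString())
                .toJobParameters();
        jobLauncher.run(job, params);
    }
}
The single-file reader would then be step-scoped and pick up its resource from #{jobParameters['inputFile']} instead of using a MultiResourceItemReader.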

Consuming properties file from Spring Config Server

Is it possible to consume the values of a properties file being served by Spring Config Server?
Here's my current implementation to consume the values inside a file
Parameters params = new Parameters();
FileBasedConfigurationBuilder<PropertiesConfiguration> builder =
        new FileBasedConfigurationBuilder<PropertiesConfiguration>(PropertiesConfiguration.class)
                .configure(params.fileBased().setFile(new File("client-config.properties")));
PropertiesConfiguration config = builder.getConfiguration();
// Perform interpolation on all variables
PropertiesConfiguration extConfig =
        (PropertiesConfiguration) config.interpolatedConfiguration();
So I was wondering if I can use this kind of implementation, but instead of reading a local file, it retrieves the values from the Spring Config Server.
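One possible direction, sketched under assumptions rather than taken from this thread: if the config server's plain-text endpoint (/{application}/{profile}/{label}/{path}) is enabled, the same PropertiesConfiguration could be loaded from an HTTP response instead of a local file. The URL below is only a placeholder:
import java.io.StringReader;
import org.apache.commons.configuration2.PropertiesConfiguration;
import org.springframework.web.client.RestTemplate;

// Placeholder URL for the config server's plain-text endpoint.
String url = "http://localhost:8888/myapp/default/master/client-config.properties";
String body = new RestTemplate().getForObject(url, String.class);

PropertiesConfiguration config = new PropertiesConfiguration();
config.read(new StringReader(body));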

How to simulate hdfs operations using spring data

I'm new to spring data-hadoop and would like to ask one general question. I have files in different formats and would like to extract the useful content with Apache Tika and store it as text files in HDFS. I've gone through the reference documentation of spring data-hadoop (http://docs.spring.io/spring-hadoop/docs/2.0.0.RELEASE/reference/html/store.html) but didn't understand how to do it, and I didn't find any other useful resources for this.
Are there any sample projects or sources for writing data to HDFS using spring data-hadoop?
From Risberg's comment, one useful example:
https://github.com/trisberg/springone-2015/tree/master/boot-ingest
Another code snippet, using the TextFileWriter implementation of the DataWriter interface:
// build the naming strategy
ChainedFileNamingStrategy namingStrategy =
        new ChainedFileNamingStrategy(
                Arrays.asList(new FileNamingStrategy[] {
                        new StaticFileNamingStrategy("document"),
                        new UuidFileNamingStrategy(someUUID),
                        new StaticFileNamingStrategy("txt", ".") }));
// set the naming strategy
textFileWriter.setFileNamingStrategy(namingStrategy);
textFileWriter.write("this is a test content");
// flush and close the writer
textFileWriter.flush();
textFileWriter.close();
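The snippet above assumes textFileWriter already exists. A minimal sketch of constructing one, based on the store documentation linked in the question (the target path is a placeholder, and null means no compression codec):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.springframework.data.hadoop.store.output.TextFileWriter;

// Placeholder HDFS directory for the extracted text.
Configuration hadoopConfiguration = new Configuration();
TextFileWriter textFileWriter =
        new TextFileWriter(hadoopConfiguration, new Path("/tmp/tika-output"), null);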

How do I get Lazybones to create multi-module Java EE 7 Gradle projects?

This is my repository in github: https://github.com/joedayz/lazybones-templates/
I used processTemplates according to the documentation:
processTemplates 'build.gradle', props
processTemplates 'gradle.properties', props
processTemplates 'src/main/java/*.java', props
processTemplates 'settings.gradle', props
I ask the user for this information:
props.project_megaproceso = ask("Define value for 'megaproceso' [megaproceso]: ", "megaproceso", "megaproceso")
props.project_macroproceso = ask("Define value for 'macroproceso' [macroproceso]: ", "macroproceso", "macroproceso")
props.project_proceso = ask("Define value for 'proceso' [proceso]: ", "proceso", "proceso")
megaproceso2, macroproceso, and proceso are directories or parts of file names in my template.
How do I change the names of the unpacked directories and files? The code is in my GitHub repository.
The post-install scripts for Lazybones currently have full access to both the standard JDK classes and the Apache Commons IO library, specifically to aid with file manipulation.
In this specific case, you can either use File.renameTo() or FileUtils.moveFile/Directory(). For example:
def prevPath = new File(projectDir, "megaproceso2-macroproceso-proceso.ear")
prevPath.renameTo(new File(
        projectDir,
        "${props.megaproceso}-${props.macroproceso}-${props.proceso}.ear"))
The projectDir variable is one of several properties injected into the post-install script. You can find a list of them in the Template Developers Guide.
I think the main advantage of FileUtils.moveFile() is that it works even if you're moving files across devices, but that's not necessary here. Also note that you have to explicitly import the classes from Commons IO if you want to use them, as in the short sketch below.
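For example, the Commons IO variant inside the post-install script might look like this (the source directory name and the props key are taken from the question; treat it as a sketch, not the template's actual code):
import org.apache.commons.io.FileUtils

// Move/rename a template directory with Commons IO instead of renameTo().
FileUtils.moveDirectory(
        new File(projectDir, "megaproceso2"),
        new File(projectDir, props.project_megaproceso))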
