How to import data into Apache Solr index from LDAP - jdbc

I have an LDAP-Server that contains a large set of user data and would like to import this into an Apache Solr index. The question is not about whether this is a good idea or not (as discussed here). I need this kind of architecture as one of our production systems depends on a Solr index of our ldap data.
I'm considering different options to do so, but I'm not sure which one should be preferred:
Option 1: Use the Apache Solr DataImportHandler:
This seems to be the most straight forward Solr way of doing so. Unfortunately there does not seem to be DataSource available that would work with LDAP.
I tried to combine the JdbcDataSource with the JDBC-LDAP-Bridge. In theory that might probably work but the driver looks quite dated (latest Version from 2007).
Another Option might be to write a custom LdapDataSource using some of the LDAP-Libraries for Java (probably Spring LDAP, directly via JNDI or something similar?).
Option 2: Build a custom Feeder:
Another option might be to write a standalone service/script that bridges between the two services. However that feels a bit like reinventing the wheel.
Option 3: Something I haven't thought of yet:
Maybe there are additional options here that I simply haven't discovered yet.

Solved it by writing a custom LDAP DataSource for the Solr DataImportHandler.
It's not as hard as it sounds. The JdbcDataSource can be used as a template for writing your custom DataSource, so basically you just have to rewrite that one Java-Class for the LDAP protocol.
For accessing the LDAP-Client there are numerous options, such as plain JNDI, UnboundID LDAP SDK, Apache LDAP API, OpenDJ LDAP SDK or OpenLDAP JLDAP (there are probably more but I only had a look at those).
I went for UnboundID LDAP due to its well documented API and full support for LDAPv3.
Afterwards it is just a matter of referencing the datasource from the data-config.xml.
A nice side-effect of this setup is, that you can use all the goodies that the Solr DataImportHandler provides while indexing the LDAP server (Entity Processors and Transformers). This makes it easy to map the data structure between LDAP and the Solr Index.

Related

Dynamic Camel route configuration at deployment time: Java DSL or XML DSL?

Let me preface this with the fact that I am still very new to Apache Camel. I'm still trying to understand how it all works, and what needs to be done (and HOW to do it) to achieve a particular effect.
I am trying to develop a Spring Boot application that will use Apache Camel to handle the transmission (and possibly also receipt) of data to/from a number of possible sources and destinations. The purpose of the application is to provide a means to produce/generate network traffic, at the network application level, that will be fed into another Spring Boot application - let's call this the target. We are trying to observe and measure the effects various network loads have on the target.
We would like to be able to transmit data via a number of protocols, including: ftp, http/s, file systems (nfs), various mail protocols (smtp, pop) and data streaming protocols for voice and video. There may be other protocols added at a later time. The data itself is irrelevant, we just need to be able to transmit data via various protocols with various loads.
These applications/services will be running in a containerized environment (Docker) that will be run within our local development and test environment, as well as possibly in a cloud environment, such as AWS. We have used Docker, Ansible, Terraform and are currently working towards using Kubernetes and Istio to manage the configuration, deployment, and operation of these applications.
We need to be able to provide specific configurations of Camel routes for particular deployments.
It would appear that the preferred method to configure Camel routes is via Java DSL, rather than XML DSL. The Camel documentation and nearly every other source of information I've found have a strong bias towards using Java DSL. Examples of XML DSL route configuration are far and few.
My initial impression is that going the Java DSL route (excuse the pun), would not work well with our need to be able to deploy a Camel application with a specific route configuration. It seems like you are required to have Java DSL defined route configurations hardwired into the code.
We think that it will be easier to provide a specific route configuration via an XML file that can be included in a deployment, hence why I've been trying to investigate and experiment with XML DSL. Perhaps we are mistaken in this regard.
My question to the community is: Considering what I've described above, can the Java DSL approach be used to meet the requirements as I've described them? Can we use Java DSL in a way that allows for dynamic route configuration? Keep in mind we would not be attempting to change configuration during operation, just in the course of performing a deployment.
If Java DSL could be used for this purpose, it would be very much appreciated if pointers to documentation, examples, etc. could be provided.
For your use cases you could use XML DSL also. Anyhow below book covers most aspects Camel development with examples. In this book authors describes XML DSL use for most of java DSL examples.
https://www.manning.com/books/camel-in-action-second-edition
In below github repository you can find the source code for all the examples listed in above book.
https://github.com/camelinaction/camelinaction2
Simple tutorial and github repository for Apache Camel using Spring boot.
https://www.baeldung.com/apache-camel-spring-boot
https://github.com/eugenp/tutorials/tree/master/spring-boot-modules/spring-boot-camel
Maven Plugin for build and deployment of spring boot container application into Kubernetes cluster
https://maven.fabric8.io/
In case if your company can afford some funding for your effort look at below link which provides commercial offerings around Camel.
https://camel.apache.org/manual/latest/commercial-camel-offerings.html
Thanks
Madhu Gupta
Our team has a few projects which use the Java DSL for building routes. In order to make them dynamic, there are control structures for iterating and setting endpoints based off configurations. That works for us because the routes are basically all the same, just with different sources and sinks.
If you could dynamically add/change the XML DSL files in a way that doesn't involve redeploying your application, that might be a viable route to follow. One might, for example, change the camel.springboot.xml-routes property to point to a folder which changes as needed.

Can ElasticSearch be used as a persistent store for Apache Ignite?

I want to know if there's a way to configure the datasource for Ignite as Elastic Search. I was browsing the web. But I did not find a solution.
I want to implement this integration for a Java application.
If I understand your idea correctly there's a way to do it. As far as I can see Elasticsearch supports SQL table-like data access and it's available through jdbc connection. From the Ignite's side we have 3rd party persistance, it uses jdbc to connect to an underlying store system. To be honest I haven't tested it but I suppose it should work.
Also I need mention that you can use GridGain WebConsole to generate simple Ignite project from existing jdbc connection. This functionality could be found on Configuration tab -> Create Cluster Configuration.

Are there any good alternatives to Apache Ignite as an In-Memory Data Grid used together with Spring as a distributed cache?

We have a solution which uses the Apache Ignite-provided In-Memory Data Grid as a distributed cache. For newer projects, we ended up using Spring, and as such we wished homogenize our software ecosystem and using Spring for the first solution as well. In addition, we do not use all the features of Ignite to excuse its use (discovery, caching).
Since we currently only use a limited subset of features from Ignite, we are basically looking for a self-managed application-level distributed cache solution (similar to what Ignite provides). This means that dedicated caching infrastructure like Redis, Memcached, etc. is not what we want.
I've researched the topic somewhat and found that there are some possible alternatives like:
Tayzgrid - Last update seems to be quite some time ago, not sure if still actively maintained
Druid - Still incubating, and I have also read that new releases being somewhat broken was not that uncommon
Hazelcast - Seems like the best choice given its maturity and the existence of Spring Data Hazelcast, though I am unsure what the level of support is here.
Has anyone has experience with integrating one of the above IMDGs (aside from Ignite) with Spring Cache? Any pointers in the right direction would be greatly appreciated.
You can use Redisson - Redis Java client with features of
In-Memory Data Grid. It also implements Spring Data support. Here is the documentation.
Hazelcast has official support for Spring Data Hazelcast and also this module has many users as now. I can also suggest you to have a look at the resources below:
Using Hazelcast with Spring Data
Getting Started with Microservices Using Hazelcast IMDG and Spring Boot

Store Spring security ACL to a nosql structure

I work on a project which uses Spring Security. I would like to use the ACL mechanism to manage security on domain object.
The problem is that my project uses Cassandra and elastic search, and so no sql database.
Is there a way to store spring security ACL into a nosql structure (cassandra) or indexer (elastic search) to avoid creating a relational database specifically for ACL ?
I don't think there is an implementation of nosql data structures for spring security domain ACL's. It should be simple if you want to implement one on your own. Start looking at MutableAclService and LookupStrategy interfaces.
I am currently looking for something that might already exist and came across this project on GitHub spring-acl-cassandra. (I've not tried this project yet.)
Yes, there is such a project now, specifically to connect Spring ACL to Cassandra:
https://github.com/RigasGrigoropoulos/spring-security-acl-cassandra

Spring Integration as embedded alternative to standalone ESB

Does anybody has an experience with Spring Integration project as embedded ESB?
I'm highly interesting in such use cases as:
Reading files from directory on schedule basis
Getting data from JDBC data source
Modularity and possibility to start/stop/redeploy module on the fly (e.g. one module can scan directory on schedule basis, another call query from jdbc data source etc.)
repeat/retry policy
UPDATE:
I found answers on all my questions except "Getting data from JDBC data source". Is it technically possible?
Remember, "ESB" is just a marketing term designed to sell more expensive software, it's not a magic bullet. You need to consider the specific jobs you need your software to do, and pick accordingly. If Spring Integration seems to fit the bill, I wouldn't be too concerned if it doesn't look much like an uber-expensive server installation.
The Spring Integration JDBC adapters are available in 2.0, and we just released GA last week. Here's the relevant section from the reference manual: http://static.springsource.org/spring-integration/docs/latest-ga/reference/htmlsingle/#jdbc
This link describes the FileSucker with Spring Integration. Read up on your Enterprise Integration patterns for more info I think.
I kinda think you need to do a bit more investigation your self, or do a couple of tries on some of your usecases. Then we can discuss whats good and bad
JDBC Adapters appear to be a work in progress.
Even if there is no specific adapter available, remember that Spring Integration is a thin wrapper around POJOs. You'll be able to access JDBC in any component e.g. your service activators.
See here for a solution based on a polling inbound channel adapter too.

Resources