How to send updated data from a Java program to NiFi? - apache-nifi

I have microservices running, and when a web user updates data in the DB through a microservice endpoint, I want to send the updated data to NiFi as well. This data contains the updated list of names, deleted names, edited names, etc. How can I do this, and which processor should I use on the NiFi side?
I am new to NiFi and have not tried anything yet; I am reading whatever documentation I can find to guide me.
No source code has been written yet. I want to get started, and I will share the code here once I write it.
The expected result is that NiFi receives the updated list of names and refers to that list when generating the required alerts/triggers.

You can actually do this in lots of ways: MQ, Kafka, HTTP (using ListenHTTP), or even by listening to a directory (using ListFile and FetchFile). Just deploy the option that is relevant to you and configure it.
You can connect NiFi to pretty much anything, so just choose how you want to connect your microservices to NiFi.
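For example, here is a minimal Java sketch of the HTTP option, assuming a ListenHTTP processor configured with listening port 8081 and the default contentListener base path; the hostname and the JSON payload shape are placeholders for whatever your microservice actually produces.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NiFiUpdatePublisher {

    // Assumed ListenHTTP endpoint: adjust host, port and base path to match the
    // "Listening Port" and "Base Path" configured on the processor.
    private static final String LISTEN_HTTP_URL = "http://nifi-host:8081/contentListener";

    public static void main(String[] args) throws Exception {
        // Example payload: the updated/deleted names as JSON produced by the microservice.
        String updatedNamesJson = "{\"updated\":[\"alice\",\"bob\"],\"deleted\":[\"carol\"]}";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(LISTEN_HTTP_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(updatedNamesJson))
                .build();

        // Each POST becomes one FlowFile on the ListenHTTP processor's success relationship.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("NiFi responded with HTTP " + response.statusCode());
    }
}
```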

Related

Apache NiFi deployment in production

I am new to Apache NiFi. From the documentation, I could understand that NiFi is a framework with a drag and drop UI that helps to build data pipelines. This NiFi flow can be exported into a template which is then saved in Git.
Are we supposed to import this template into the production NiFi server? Once imported, are we supposed to manually start all the processors via the UI? Please help.
Templates are just example flows to share with people and are not really meant for deployment. Please take a look at NiFi Registry and the concept of versioned flows.
https://nifi.apache.org/registry.html
https://www.youtube.com/watch?v=X_qhRVChjZY&feature=youtu.be
https://bryanbende.com/development/2018/01/19/apache-nifi-how-do-i-deploy-my-flow
A template is an XML representation of your flow structure (processors, process groups, controllers, relationships, etc.). You can upload it to another NiFi server for deployment. You can also start all the processors via the NiFi REST API.
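As a rough illustration of starting components through the REST API, the sketch below issues the PUT /nifi-api/flow/process-groups/{id} call that schedules everything inside a process group; the host, port and process group id are assumptions to replace with your own values, and a secured NiFi would additionally require authentication.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StartProcessGroup {

    public static void main(String[] args) throws Exception {
        // Assumed values: replace with your NiFi host and the target process group id
        // ("root" or the UUID shown in the NiFi UI / returned by the API).
        String nifiApi = "http://nifi-host:8080/nifi-api";
        String processGroupId = "root";

        // Requests RUNNING state for every runnable component inside the process group.
        String body = "{\"id\":\"" + processGroupId + "\",\"state\":\"RUNNING\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(nifiApi + "/flow/process-groups/" + processGroupId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode() + ": " + response.body());
    }
}
```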

Store external data into NiFi Registry

Is it possible to store external data (not a NiFi flow) in NiFi Registry using the REST API?
https://nifi.apache.org/docs/nifi-registry-docs/index.html
As far as I know, NiFi Registry is designed for versioning NiFi flows, but I want to know whether it is capable of storing other data and retrieving it based on versions.
As of today, it is not currently possible to store data/objects in NiFi Registry other than a NiFi Flow and its configuration (component properties, default variable values, controller services, etc).
There have been discussions about extending NiFi Registry's storage capabilities to include other items. Most often discussed are NiFi extensions, such as NAR bundles, which are the archive format for components such as custom processors. This would allow custom components to be versioned in the same place as a flow and downloaded at runtime based on a flow definition, rather than pre-installed on NiFi/MiNiFi instances.
Today though, only flows are supported. Other data or components have to be stored/versioned somewhere else.
If you have data you want to associate with a specific flow version snapshot, here is a suggestion: You could store that data externally in another service and use the flow version snapshot comment field to store a URI/link to where the associated data resides. If you use a machine parsable format such as JSON in the snapshot comment to store this URI metadata, an automated process could retrieve this data from an external system by reading this field when doing an operation involving a specific flow snapshot version.
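As a rough sketch of that automated process, the Java snippet below fetches a flow version snapshot from the Registry REST API and pulls the comments field out of the response. The registry URL, bucket id, flow id and version number are placeholder assumptions, and the JSON handling is deliberately naive for brevity.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SnapshotCommentReader {

    public static void main(String[] args) throws Exception {
        // Assumed registry URL, bucket id, flow id and version; replace with real values.
        String url = "http://registry-host:18080/nifi-registry-api/buckets/"
                + "my-bucket-id/flows/my-flow-id/versions/3";

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder().uri(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // Naive extraction of the "comments" field for illustration only; a real client
        // would parse the JSON response properly (e.g. with Jackson).
        Matcher m = Pattern.compile("\"comments\"\\s*:\\s*\"([^\"]*)\"").matcher(response.body());
        if (m.find()) {
            // If the comment itself is machine-parsable JSON such as {"dataUri":"https://..."},
            // the automated process can follow that link to fetch the associated data.
            System.out.println("Snapshot comment: " + m.group(1));
        }
    }
}
```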

How to implement search functionalities?

We currently have an in-memory cache mechanism implemented for our search functionality.
The data is now getting very large and we can no longer handle it in memory. We are also getting more input sources from different systems (Oracle, flat files, and Git).
Can you please share how we can achieve this?
We thought Elasticsearch would help with this, but how can we feed it input whenever a change happens in the source systems? (Batch processing will NOT help.)
Hadoop: we are NOT handling that level of data.
Please also share your thoughts.
we are getting more input sources from different systems (Oracle, flat files, and Git)
I assume that's why you tagged Kafka? It will work, but you bring up a valid point:
But how can we provide input if any changes happen...?
For plain text, or Git events, you'll obviously need to alter some parser engine and restart the job to get extra data in the message schema.
For Oracle, the GoldenGate product will publish table column changes, and Kafka Connect can recognize those events and update the payload accordingly.
If all you care about is searching things, plenty of tools exist. Since you mention Elasticsearch: Filebeat works for plain text, and Logstash can work with various other types of input sources. If you have Kafka, feed events to Kafka and let Logstash or Kafka Connect update Elasticsearch.
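For the Kafka route, a minimal Java producer sketch might look like the following; the broker address, topic name, and event payload are assumptions to adapt to your own change-capture logic.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ChangeEventProducer {

    public static void main(String[] args) {
        // Assumed broker address; replace with your own cluster.
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-host:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One event per change from Oracle / flat files / Git; the key identifies the entity
            // so a downstream consumer (Logstash, Kafka Connect) can upsert the ES document.
            String changeEvent = "{\"source\":\"oracle\",\"id\":\"42\",\"name\":\"example\"}";
            producer.send(new ProducerRecord<>("search-updates", "42", changeEvent));
        }
    }
}
```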

Delete data in source once data has been pushed to Kafka server

I'm using Confluent Platform 3.3 to pull data from an Oracle database. Once the data has been pushed to the Kafka server, the retrieved data should be deleted from the database.
Is there any way to do this? Please suggest.
There is no default way of doing this with Kafka.
How are you reading your data from the database, using Kafka Connect, or with custom code that you wrote?
If the latter is the case, I'd suggest implementing the delete in your code: collect the IDs once Kafka has confirmed the send, and batch-delete them regularly.
Alternatively you could write a small job that reads your Kafka topic with a different consumer group than your actual target system and deletes based on the records it pulls from the topic. If you run this job every few minutes, hours,... you can keep up with the sent data as well.
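A minimal Java sketch of the first suggestion (delete only after Kafka has confirmed the send) could look like this; the broker address, topic, and row id are assumptions, and the actual batched JDBC delete is left as a comment.

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SendThenDelete {

    // IDs whose records Kafka has acknowledged and that are therefore safe to delete in Oracle.
    private static final ConcurrentLinkedQueue<Long> confirmedIds = new ConcurrentLinkedQueue<>();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-host:9092"); // assumed broker address
        props.put("acks", "all");                          // only delete after a durable ack
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            long rowId = 42L;                               // id of the row read from Oracle
            String payload = "{\"id\":42,\"name\":\"example\"}";
            producer.send(new ProducerRecord<>("oracle-export", String.valueOf(rowId), payload),
                    (metadata, exception) -> {
                        // The callback fires once the broker confirms (or rejects) the send.
                        if (exception == null) {
                            confirmedIds.add(rowId);
                        }
                    });
        }
        // A scheduled job would then drain confirmedIds and issue a batched
        // "DELETE FROM source_table WHERE id IN (...)" against the database.
        System.out.println("Ready to delete: " + confirmedIds);
    }
}
```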

Can log data exposed as a web service be input to Elasticsearch?

I have a number of applications that are running in different data centers, developed and maintained by different vendors. Each application has a web service that exposes relevant log data (audit data, security data, data related to cost calculations, performance data, ...) consolidated for the application.
My task is to get data from each system into a setup of Elasticsearch, Kibana and Logstash so I can create business reports or just view data the way I want to.
Assume I have a JBoss application server for integration with these "expose log" services. What is the best way to feed Elasticsearch? Some Logstash plugin that calls each service? Should JBoss use some Logstash plugin? Or some other way?
The best way is to set up the logstash shipper on the server where the logs are created.
This will then ship them to a Redis server.
Another logstash instance will then pull the data from Redis, and index it, and ship it to Elasticsearch.
Kibana will then provide an interface to Elasticsearch, which is where the goodness happens.
I wrote a post on how to install Logstash a little while ago. Versions may have been updated since, but it's still valid:
http://www.nightbluefruit.com/blog/2013/09/how-to-install-and-setup-logstash/
Does your JBoss application server write logs to a file?
In my experience, my JBoss applications (on multiple servers) write their logs to files. I then use Logstash to read the log files and ship all the logs to a central server. You can refer to here.
So what you can do is set up a Logstash shipper in each data center.
If you do not have permission to do this, you may want to write a program that gets the logs from the different web services and saves them to a file, then set up Logstash to read that log file. So far, Logstash does not have any plugin that can call web services.
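If you go the route of writing such a program, a minimal Java sketch could look like the following; the log endpoint URLs and the output file path are assumptions, and in practice you would run this on a schedule and handle incremental fetching.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class LogServicePoller {

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed log endpoints exposed by the applications; replace with the real URLs.
        List<String> logEndpoints = List.of(
                "https://app1.example.com/logs/audit",
                "https://app2.example.com/logs/audit");

        Path output = Path.of("/var/log/collected/audit.log"); // file Logstash tails
        HttpClient client = HttpClient.newHttpClient();

        for (String endpoint : logEndpoints) {
            HttpRequest request = HttpRequest.newBuilder().uri(URI.create(endpoint)).GET().build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

            // Append each response so the Logstash file input picks up the new lines.
            Files.writeString(output, response.body() + System.lineSeparator(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```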
