KStream disable local state store - apache-kafka-streams

I am using Kafka Streams with Spring Cloud Stream. Our application is stateful as it does some aggregation. When I run the app, I see the ERROR message below on the console.
I am running this app on a Remote Desktop Windows machine.
Failed to change permissions for the directory C:\Users\andy\project\tmp
Failed to change permissions for the directory C:\Users\andy\project\tmp\my-local-local
But when the same code is deployed on a Linux box, I don't see the error, so I assume it is an access issue.
As per our company policy, we do not have the rights to change a folder's permissions, hence chmod 777 did not work either.
My question is: is there a way to disable creating the state store locally and instead use the Kafka changelog topic to maintain the state? I understand this is not ideal, but it is only for my local development. TIA.

You could try to use in-memory state stores instead of the default persistent state stores.
You can do that by providing a state store supplier for in-memory state stores to your stateful operations:
KeyValueBytesStoreSupplier storeSupplier = Stores.inMemoryKeyValueStore("in-mem");
StreamsBuilder builder = new StreamsBuilder();
builder.stream("input-topic")
       .groupByKey()
       .count(Materialized.as(storeSupplier));
From Apache Kafka 3.2 onwards, you can set the store type in the stateful operation without the need for a state store supplier:
builder.stream("input-topic")
       .groupByKey()
       .count(Materialized.as(Materialized.StoreType.IN_MEMORY));
Or you can set the state store type globally with:
props.put(StreamsConfig.DEFAULT_DSL_STORE_CONFIG, StreamsConfig.IN_MEMORY);
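For context, here is a minimal, self-contained sketch of how the global setting fits into a plain Kafka Streams application; the application id, bootstrap servers, and topic name are placeholder assumptions, and with Spring Cloud Stream you would pass the same property through the binder's streams configuration instead:
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class InMemoryStoreApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-aggregation-app");  // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        // Kafka 3.2+: every DSL state store becomes in-memory, so no persistent store files are written locally
        props.put(StreamsConfig.DEFAULT_DSL_STORE_CONFIG, StreamsConfig.IN_MEMORY);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic")   // placeholder topic
               .groupByKey()
               .count();

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
Note that the changelog topic is still created for in-memory stores, so the aggregation state remains fault tolerant and can be rebuilt on restart; only the local persistent store files go away.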

Related

How to update Abp Permission Cache for each application

I have multiple services (Administration.Api, Project.Api). The Administration service manages permissions (create, update).
But I have a problem with caching: when I update permissions through Administration.Api, Project.Api's cached permission grants don't change immediately (they only change after 20 minutes, when the cache entry is removed automatically).
I want to invalidate all permission caches under their different cache prefixes immediately. How can I fix this?
You really need a true distributed cache service (like Redis) to do this properly. That way a cache invalidation for one service affects all services.
There are other solutions you could try, but they are really just band-aids, and more work with potential side effects:
use a message bus to notify all services of the permission change so they dump their in-memory caches
use a new shared DB table with a "LastUpdated" row. The permission service would write the updated time when permissions change, and each service would query this table (on each request) to check for a newer updated time and dump its in-memory cache if one exists.
You can use AbpDistributedCacheOptions to change the default cache settings and add a prefix to your application's cache keys.
Configure<AbpDistributedCacheOptions>(options =>
{
    options.GlobalCacheEntryOptions = new DistributedCacheEntryOptions()
    {
        AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(20) // 20 minutes is the default
    };
    options.KeyPrefix = "MyApp1";
});
You can also extend and override the permission management providers, such as RolePermissionManagementProvider, and handle cache invalidation there.
Docs about permission management providers: https://docs.abp.io/en/abp/latest/Modules/Permission-Management#permission-management-providers
One application has ONE ABP default cache (we are not talking about global caches like Redis here). So to gain central control over the caches of different applications, you can use RabbitMQ: create a RabbitMQ queue in each application, named something like "abp-cache[appName]". From the application that changes permissions, you send a message to EACH of these queues, and in the RabbitMQ receiver of the specific app you handle the received message. I've already implemented this mechanism to update the ABP permission cache for all my apps, and everything is easily wrapped inside an Extensions NuGet package.

Ways to Trigger a Databricks Notebook

Can someone let me know the possible ways to trigger a Databricks notebook? My preferred method is via Azure Data Factory, but my company is sadly reluctant to deploy ADF at this present moment in time.
Basically, I would like my Databricks notebook to be triggered when a blob is uploaded to Blob store. Is that possible?
You can try Auto Loader: Auto Loader supports two modes for detecting new files: directory listing and file notification.
Directory listing: Auto Loader identifies new files by listing the input directory. Directory listing mode allows you to quickly start Auto Loader streams without any permission configurations other than access to your data on cloud storage. In Databricks Runtime 9.1 and above, Auto Loader can automatically detect whether files are arriving with lexical ordering to your cloud storage and significantly reduce the amount of API calls it needs to make to detect new files.
File notification: Auto Loader can automatically set up a notification service and queue service that subscribe to file events from the input directory. File notification mode is more performant and scalable for large input directories or a high volume of files but requires additional cloud permissions for set up.
Refer - https://learn.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/auto-loader

Store infrequently changing info in Spring App

I am working on a microservice (Spring Boot) that needs to store some static information that changes infrequently (once per quarter). The data (below) is about company reports and looks like:
reportId#1: "frequency"="daily","to":"some email ids"
reportId#2: "frequency"="weekly", "to":"some emailids"
As you can see, an entry in the data is basically a report ID, and the associated attributes are the report frequency and the recipients' email IDs.
My question is: what is the best place to store this information? I have some thoughts, and here are my views.
a) A NoSQL DB like MongoDB seems to be a good option. I can create a collection, store the data there, and retrieve it once during app startup. But then I wondered whether creating a collection just to store this static info is a good choice.
b) Redis seems to be another good option. I can create a template for the above dataset and store it there, then query Redis by reportId to retrieve the frequency and recipient list.
c) Store it in a file on the classpath and load it at app startup. The downside is that I will have to redeploy the app with the updated file whenever the report listing changes. I believe externalizing this information to either Mongo or Redis is a better option.
d) The app runs in AWS, so I could even store this in a file in an S3 bucket.
I would like to know your views.
Since the config will only change once a quarter, the overhead of a database is not required. You should consider Apache Commons Configuration. It allows you to load config changes from files without the need for an application restart.
http://commons.apache.org/proper/commons-configuration/userguide/howto_reloading.html
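A minimal sketch of that reloading setup with Commons Configuration 2, assuming a reports.properties file on disk; the file name and property keys (report1.frequency, report1.to) are hypothetical:
import java.io.File;
import java.util.concurrent.TimeUnit;
import org.apache.commons.configuration2.Configuration;
import org.apache.commons.configuration2.FileBasedConfiguration;
import org.apache.commons.configuration2.PropertiesConfiguration;
import org.apache.commons.configuration2.builder.ReloadingFileBasedConfigurationBuilder;
import org.apache.commons.configuration2.builder.fluent.Parameters;
import org.apache.commons.configuration2.reloading.PeriodicReloadingTrigger;

public class ReportConfig {
    public static void main(String[] args) throws Exception {
        Parameters params = new Parameters();
        ReloadingFileBasedConfigurationBuilder<FileBasedConfiguration> builder =
                new ReloadingFileBasedConfigurationBuilder<FileBasedConfiguration>(PropertiesConfiguration.class)
                        .configure(params.fileBased().setFile(new File("reports.properties"))); // assumed file
        // Re-check the file for changes once a minute; no application restart is needed
        PeriodicReloadingTrigger trigger = new PeriodicReloadingTrigger(
                builder.getReloadingController(), null, 1, TimeUnit.MINUTES);
        trigger.start();

        // Always obtain the configuration from the builder so a reloaded copy is picked up
        Configuration config = builder.getConfiguration();
        System.out.println(config.getString("report1.frequency")); // hypothetical keys
        System.out.println(config.getString("report1.to"));
    }
}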

Disabling/Pausing database replication using ML-Gradle

I want to disable Database Replication on the replica cluster in MarkLogic 8 using ml-gradle. After updating the configuration, I also want to re-enable it.
There are tasks for enabling and disabling flexrep in ml-gradle, but I couldn't find any such thing for Database Replication. How can this be done?
ml-gradle uses the Management API to handle configuration changes. Database Replication is controlled by sending a PUT command to /manage/v2/databases/[id-or-name]/properties. Update your ml-config/databases/content-database.json file (the sample file does not include that property) so that it includes a database-replication object, with replication-enabled: true.
To see what that object should look like, you can send a GET request to the properties endpoint.
You can create your own command to set replication-enabled - see https://github.com/rjrudin/ml-gradle/wiki/Writing-your-own-management-task
I'll also add a ticket for making official commands - e.g. mlEnableReplication and mlDisableReplication, with those defaulting to the content database, and allowing for any database to be specified.

Storing data in Couchbase Server (without metadata)

I have created data in a Couchbase Lite DB and replicated it to Couchbase Server, but during replication unused data (metadata) also gets created on the server. Is there any method to store pure data (without metadata)?
Is bucket shadowing useful for this problem?
You can use Couchbase Server 5.1, where the extra metadata is stored in extended attributes (XAttrs), so the document will not have the metadata inside its body. If required, the metadata can still be found in the extended attributes.
For that you will need to set up Sync Gateway so that one Sync Gateway node in the cluster has import_docs: "continuous" and all Sync Gateway nodes have enable_shared_bucket_access: true.
With this change to the Sync Gateway configuration, using Sync Gateway 1.5 or 2.0, you will be able to implement this functionality.
Another good thing is that if the data is changed directly on the server, it will also flow down to the devices.
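For illustration, a minimal Sync Gateway config sketch with the two settings mentioned above; the database name, bucket, server address, and credentials are placeholder assumptions:
{
  "databases": {
    "mydb": {
      "server": "http://localhost:8091",
      "bucket": "my-bucket",
      "username": "sync_gateway_user",
      "password": "password",
      "enable_shared_bucket_access": true,
      "import_docs": "continuous"
    }
  }
}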
