Memory usage and number of open files in Kafka Streams application grow continuously - apache-kafka-streams

I'm using Kafka Streams to aggregate events produced by a high-throughput service. The problem is that memory consumption keeps growing, and so does the number of open files.
The following graphs show that the number of open files and the memory used by the Kubernetes pod never drop, although the JVM heap and off-heap memory are stable: memory and open files
I'm using Kafka version 3.1.0, and there are no iterators left open on the state stores.
In the RocksDbMemoryConfig class, the maximum number of open files is set to 100:
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.CompactionStyle;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class RocksDbMemoryConfig implements RocksDBConfigSetter {

    private static final Cache LRUCache;
    private static final WriteBufferManager writeBufferManager;

    static {
        LRUCache = new org.rocksdb.LRUCache(2500000000L); // 2.5 GB
        writeBufferManager = new WriteBufferManager(100000000, LRUCache); // 100 MB
    }

    @Override
    public void setConfig(String storeName, Options options, Map<String, Object> configs) {
        BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig
            .setBlockSize(4000)
            .setBlockCache(LRUCache); // ensure that all state stores use the same LRU cache
        options
            .setTableFormatConfig(tableConfig)
            .setWriteBufferManager(writeBufferManager) // ensure that all state stores use the same write buffer manager
            .setWriteBufferSize(1000000) // 1 MB
            .setMaxWriteBufferNumber(1)
            .setMaxBackgroundJobs(20)
            .setMaxOpenFiles(100)
            .setCompactionStyle(CompactionStyle.FIFO);
    }
}
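For completeness, a config setter like this is normally registered with the Streams configuration via the rocksdb.config.setter property; a minimal sketch of that wiring follows (the application id and bootstrap servers are placeholders, not values from the post):
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

// Sketch only: shows how the custom RocksDB config setter above is registered.
public final class StreamsPropertiesFactory {

    public static Properties create() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "article-views-aggregator"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");        // placeholder
        props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, RocksDbMemoryConfig.class);
        return props;
    }
}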
In the topology, a hopping window of 3 days with a one-hour advance interval is used (so each event falls into 72 overlapping windows). The aggregated results are suppressed, so final results are emitted roughly every hour, as windows close:
val activityEventsByArticle: KGroupedStream[ArticleViewedKey, ArticleViewedValue] = builder
  .stream(SOURCE_TOPIC)(sourceSerDe)
  .groupByKey(groupSerDe)

val viewsByArticle: KTable[Windowed[ArticleViewedKey], aggregator.AggregatedViews] = activityEventsByArticle
  .windowedBy(
    TimeWindows.ofSizeAndGrace(Duration.ofDays(3), Duration.ofMinutes(15)).advanceBy(Duration.ofHours(1))
  )
  .aggregate(aggregator.initializer())(aggregator.aggregationFn)(
    Materialized
      .as(WINDOWED_STATE_STORE)(serdes.ArticleViewedKeySerde, serdes.AggregateViewsSerde)
      .withRetention(Duration.ofMinutes(3 * 24 * 60 + 15)) // window size + grace
  )
  .suppress(
    Suppressed
      .untilWindowCloses(Suppressed.BufferConfig.unbounded())
      .withName(SUPPRESSED_STATE_STORE)
  )

viewsByArticle.toStream
  .map((window, aggregateViews) => {
    logAggregatedEvents(window, aggregateViews)
    recordMapper(window, aggregateViews)
  })
  .to(SINK_TOPIC)(sinkSerDe)
Any suggestions would be highly appreciated.

Related

JCache Hazelcast embedded does not scale

Hello, Stack Overflow community.
I have a Spring Boot application that uses JCache with the Hazelcast implementation as a caching framework.
Each Hazelcast node has 5 caches with 50,000 elements each. There are 4 Hazelcast instances that form a cluster.
The problem I face is the following:
I have a very heavy call that reads data from all of the caches. On the initial start, when all caches are still empty, this call takes up to 600 seconds.
When there is one Hazelcast instance running and all 5 caches are filled with data, the call is relatively fast: it takes on average only 4 seconds.
When I start 2 Hazelcast instances and they form a cluster, the response time gets worse: the same call already takes 25 seconds on average.
And the more Hazelcast instances I add to the cluster, the longer the response time gets. Of course, I was expecting somewhat worse response times once the data is partitioned among Hazelcast nodes in a cluster, but I did not expect that just adding one more Hazelcast instance would make the response time 6-7 times slower.
Please note that, for simplicity and testing purposes, I start all four Spring Boot instances, each with an embedded Hazelcast node, on one machine. Therefore, such poor performance cannot be explained by network delays. I assume this API call is so slow even with Hazelcast because a lot of data needs to be serialized/deserialized when it is sent between Hazelcast cluster nodes. Please correct me if I am wrong.
The cache data is partitioned evenly among all nodes. I was thinking about adding a Near Cache in order to reduce latency; however, according to the Hazelcast documentation, Near Cache is not available for JCache members. Because of project requirements, I am not able to switch to JCache clients to make use of Near Cache. Is there any advice on how to reduce latency in such a scenario?
Thank you in advance.
DUMMY CODE SAMPLES TO DEMONSTRATE THE PROBLEM:
Hazelcast Config: stays default, nothing is changed
Caches:
private void createCaches() {
    CacheConfiguration<?, ?> cacheConfig = new CacheConfig<>()
        .setEvictionConfig(
            new EvictionConfig()
                .setEvictionPolicy(EvictionPolicy.LRU)
                .setSize(150000)
                .setMaxSizePolicy(MaxSizePolicy.ENTRY_COUNT)
        )
        .setBackupCount(5)
        .setInMemoryFormat(InMemoryFormat.OBJECT)
        .setManagementEnabled(true)
        .setStatisticsEnabled(true);
    cacheManager.createCache("books", cacheConfig);
    cacheManager.createCache("bottles", cacheConfig);
    cacheManager.createCache("chairs", cacheConfig);
    cacheManager.createCache("tables", cacheConfig);
    cacheManager.createCache("windows", cacheConfig);
}
Dummy Controller:
@GetMapping("/dummy_call")
public String getExampleObjects() { // simulates a situation where one call needs to fetch data from multiple cached sources
    Instant start = Instant.now();
    int i = 0;
    while (i != 50000) {
        exampleService.getBook(i);
        exampleService.getBottle(i);
        exampleService.getChair(i);
        exampleService.getTable(i);
        exampleService.getWindow(i);
        i++;
    }
    Instant end = Instant.now();
    return String.format("The heavy call took: %d seconds", Duration.between(start, end).getSeconds());
}
Dummy service:
@Service
public class ExampleService {

    @CacheResult(cacheName = "books")
    public ExampleBooks getBook(int i) {
        try {
            Thread.sleep(1); // just to simulate a slow service here!
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Book(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "bottles")
    public ExampleMooks getBottle(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Bottle(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "chairs")
    public ExamplePooks getChair(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Chair(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "tables")
    public ExampleRooks getTable(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Table(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "windows")
    public ExampleTooks getWindow(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Window(Integer.toString(i), Integer.toString(i));
    }
}
If you do the math:
4 s / 250,000 lookups is 0.016 ms per local lookup. That seems rather high, but let's go with it.
When you add a second node, the data gets partitioned and half of the requests will be served from the other node. If you add 2 more nodes (4 total), then 25% of the requests will be served locally and 75% will be served over the network. This explains why the response time grows as you add more nodes.
Even a simple ping on localhost takes twice that long or more. On a real network the read latency we see in benchmarks is 0.3-0.4 ms per read call. That makes:
0.25 * 250,000 * 0.016 ms + 0.75 * 250,000 * 0.3 ms ≈ 57 s
You simply won't be able to make that many calls serially over the network (even a local one). You need to either:
- parallelize the calls, for example with javax.cache.Cache#getAll to reduce the number of calls (see the sketch at the end of this answer), or
- enable reading local backups via com.hazelcast.config.MapConfig#setReadBackupData so that fewer requests go over the network.
The read-backup-data feature is only available for IMap, so you would need to use Spring caching with the hazelcast-spring module and its com.hazelcast.spring.cache.HazelcastCacheManager:
@Bean
HazelcastCacheManager cacheManager(HazelcastInstance hazelcastInstance) {
    return new HazelcastCacheManager(hazelcastInstance);
}
See documentation for more details.
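As a rough sketch of the first option, batching lookups with javax.cache.Cache#getAll could look like the following; it reuses the dummy types from the example above, and the batch size is an arbitrary assumption:
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.cache.Cache;

public class BatchedCacheReader {

    // Rough sketch: fetch 50,000 entries from one cache in key batches instead of
    // issuing 50,000 serial get() calls. The batch size of 1,000 is an arbitrary assumption.
    public Map<Integer, ExampleBooks> loadBooks(Cache<Integer, ExampleBooks> books) {
        Map<Integer, ExampleBooks> result = new HashMap<>();
        int batchSize = 1000;
        for (int from = 0; from < 50000; from += batchSize) {
            Set<Integer> keys = new HashSet<>();
            for (int i = from; i < from + batchSize; i++) {
                keys.add(i);
            }
            result.putAll(books.getAll(keys)); // one getAll call per batch instead of one get per key
        }
        return result;
    }
}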

Spring data Page/Pageable returns duplicates on large data sets?

When operating on large data sets, Spring Data presents two abstractions: Stream and Page. We've been using Stream for a while and had no issues, but recently I wanted to try a paginated approach and ran into a reliability issue.
Consider the following:
@Entity
public class MyData {
}

public interface MyDataRepository extends JpaRepository<MyData, UUID> {
}

@Component
public class MyDataService {
    private MyDataRepository repository;

    // Bridge between a reactive service and a transactional / non-reactive database call
    @Transactional
    public void getAllMyData(final FluxSink<MyData> sink) {
        final Pageable firstPage = PageRequest.of(0, 500);
        Page<MyData> page = repository.findAll(firstPage);
        while (page != null && page.hasContent()) {
            page.getContent().forEach(sink::next);
            if (page.hasNext()) {
                page = repository.findAll(page.nextPageable());
            } else {
                page = null;
            }
        }
        sink.complete();
    }
}
Using two Postgres 9.5 databases, the source database had close to 100,000 rows while the destination was empty. The example code was then used to copy from the source to the destination. At the end I would find that my destination database had a far smaller row count than the source.
- Run as a Spring Boot app
- The flux doing the copy used 4-6 threads in parallel (for speed)
- Total run time of at least an hour (max was 2 hours)
As it turns out, I was processing the same rows multiple times (and missing other rows as a result). This led me to discover a fix that others had already run into: you should provide a Sort.by(...) argument.
After changing the service to use:
// Make our pages sorted by the PKEY
final Pageable firstPage = PageRequest.of(0, 500, Sort.by("id"));
I found that while it greatly helped, I would still process some rows multiple times (going from losing about half the rows to only seeing ~12 duplicates). When I use a Stream instead, I have no issues; a rough sketch of that variant is below.
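For reference, a Stream-based read in Spring Data JPA looks roughly like the following; the type names, query method, and the read-only transaction are illustrative assumptions, not the original code:
import java.util.UUID;
import java.util.stream.Stream;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

import reactor.core.publisher.FluxSink;

public interface MyDataStreamingRepository extends JpaRepository<MyData, UUID> {
    // Illustrative method: stream all rows ordered by the primary key.
    @Query("select d from MyData d order by d.id")
    Stream<MyData> streamAllOrderedById();
}

@Component
class MyDataStreamingService {
    private MyDataStreamingRepository repository;

    // The stream must be consumed inside an open transaction and closed when done.
    @Transactional(readOnly = true)
    public void getAllMyData(final FluxSink<MyData> sink) {
        try (Stream<MyData> rows = repository.streamAllOrderedById()) {
            rows.forEach(sink::next);
        }
        sink.complete();
    }
}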
Does anyone have any explanation for what is going on? I don't see any duplicates until the test has been running for at least 10-15 minutes, which almost leads me to believe that there is some kind of session or other timeout (either in the client or on the database) causing the hiccups. But I'm really far outside my area of knowledge to troubleshoot it further.

Kafka Streams - The state store may have migrated to another instance

I'm writing a basic application to test the Interactive Queries feature of Kafka Streams. Here is the code:
public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();

    KeyValueBytesStoreSupplier waypointsStoreSupplier = Stores.persistentKeyValueStore("test-store");
    StoreBuilder waypointsStoreBuilder = Stores.keyValueStoreBuilder(waypointsStoreSupplier, Serdes.ByteArray(), Serdes.Integer());

    final KStream<byte[], byte[]> waypointsStream = builder.stream("sample1");
    final KStream<byte[], TruckDriverWaypoint> waypointsDeserialized = waypointsStream
        .mapValues(CustomSerdes::deserializeTruckDriverWaypoint)
        .filter((k, v) -> v.isPresent())
        .mapValues(Optional::get);

    waypointsDeserialized.groupByKey().aggregate(
        () -> 1,
        (aggKey, newWaypoint, aggValue) -> {
            aggValue = aggValue + 1;
            return aggValue;
        },
        Materialized.<byte[], Integer, KeyValueStore<Bytes, byte[]>>as("test-store").withKeySerde(Serdes.ByteArray()).withValueSerde(Serdes.Integer())
    );

    final KafkaStreams streams = new KafkaStreams(builder.build(), new StreamsConfig(createStreamsProperties()));
    streams.cleanUp();
    streams.start();

    ReadOnlyKeyValueStore<byte[], Integer> keyValueStore = streams.store("test-store", QueryableStoreTypes.keyValueStore());
    KeyValueIterator<byte[], Integer> range = keyValueStore.all();
    while (range.hasNext()) {
        KeyValue<byte[], Integer> next = range.next();
        System.out.println(next.value);
    }

    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}

protected static Properties createStreamsProperties() {
    final Properties streamsConfiguration = new Properties();
    streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "random167");
    streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "client-id");
    streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    streamsConfiguration.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, Serdes.Integer().getClass().getName());
    //streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10000);
    return streamsConfiguration;
}
So my problem is that every time I run this I get the same error:
Exception in thread "main" org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, test-store, may have migrated to another instance.
I'm running only 1 instance of the application, and the topic I'm consuming from has only 1 partition.
Any idea what I'm doing wrong?
Looks like you have a race condition. The Kafka Streams javadoc for KafkaStreams::start() says:
Start the KafkaStreams instance by starting all its threads. This function is expected to be called only once during the life cycle of the client.
Because threads are started in the background, this method does not block.
https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/streams/KafkaStreams.html
You're calling streams.store() immediately after streams.start(), but I'd wager that you're in a state where it hasn't initialized fully yet.
Since this code appears to be just for testing, add a Thread.sleep(5000) or something in there and give it a go (this is not a solution for production). Depending on your input rate into the topic, that will probably give the store a bit of time to start filling up with events so that your KeyValueIterator actually has something to process/print.
Probably not applicable to OP but might help others:
When trying to retrieve a KTable's store, make sure the KTable's topic exists first or you'll get this exception.
I failed to call the StoreBuilder before consuming the store.
Typically this happens for two reasons:
1. The local KafkaStreams instance is not yet ready (i.e., not yet in runtime state RUNNING, see Run-time Status Information) and thus its local state stores cannot be queried yet.
2. The local KafkaStreams instance is ready (e.g. in runtime state RUNNING), but the particular state store was just migrated to another instance behind the scenes. This may notably happen during the startup phase of a distributed application or when you are adding/removing application instances.
https://docs.confluent.io/platform/current/streams/faq.html#handling-invalidstatestoreexception-the-state-store-may-have-migrated-to-another-instance
The simplest approach is to guard against InvalidStateStoreException when calling KafkaStreams#store():
// Example: Wait until the store of type T is queryable. When it is, return a reference to the store.
public static <T> T waitUntilStoreIsQueryable(final String storeName,
                                              final QueryableStoreType<T> queryableStoreType,
                                              final KafkaStreams streams) throws InterruptedException {
    while (true) {
        try {
            return streams.store(storeName, queryableStoreType);
        } catch (InvalidStateStoreException ignored) {
            // store not yet ready for querying
            Thread.sleep(100);
        }
    }
}
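Applied to the code in the question, the direct streams.store(...) call right after streams.start() would then become something along these lines (a sketch, not the original code):
// Sketch: wait for the store to become queryable instead of reading it immediately after start().
// Note that the calling method would need to declare or handle InterruptedException.
ReadOnlyKeyValueStore<byte[], Integer> keyValueStore =
    waitUntilStoreIsQueryable("test-store", QueryableStoreTypes.<byte[], Integer>keyValueStore(), streams);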

Why are connections to Azure Redis Cache so high?

I am using Azure Redis Cache in a high-load scenario with a single machine querying the cache. This machine gets and sets roughly 20 items per second. During the daytime this increases; during the nighttime it is less.
So far, things have been working fine. Today, however, I realized that the "Connected Clients" metric is extremely high, although I only have one client that just constantly gets and sets items. Here is a screenshot of the metric I mean:
My code looks like this:
public class RedisCache<TValue> : ICache<TValue>
{
    private IDatabase cache;
    private ConnectionMultiplexer connectionMultiplexer;

    public RedisCache()
    {
        ConfigurationOptions config = new ConfigurationOptions();
        config.EndPoints.Add(GlobalConfig.Instance.GetConfig("RedisCacheUrl"));
        config.Password = GlobalConfig.Instance.GetConfig("RedisCachePassword");
        config.ConnectRetry = int.MaxValue; // retry connection if broken
        config.KeepAlive = 60;              // keep connection alive (ping every minute)
        config.Ssl = true;
        config.SyncTimeout = 8000;          // 8 seconds timeout for each get/set/remove operation
        config.ConnectTimeout = 20000;      // 20 seconds to connect to the cache
        connectionMultiplexer = ConnectionMultiplexer.Connect(config);
        cache = connectionMultiplexer.GetDatabase();
    }

    public virtual bool Add(string key, TValue item)
    {
        return cache.StringSet(key, RawSerializationHelper.Serialize(item));
    }
}
I am not creating more than one instance of this class, so that is not the problem. Maybe I misunderstand the connections metric and what it really means is the number of times I access the cache; however, that would not really make sense in my opinion. Any ideas, or anyone with a similar problem?
StackExchange.Redis had a race condition that could lead to leaked connections under some conditions. This has been fixed in build 1.0.333 or newer.
If you want to confirm this is the issue you are hitting, get a crash dump of your client application and look at the objects on the heap in a debugger. Look for a large number of StackExchange.Redis.ServerEndPoint objects.
Also, several users have had bugs in their code that resulted in leaked connection objects. This is often because their code tries to re-create the ConnectionMultiplexer object when it sees failures or a disconnected state. There is really no need to recreate the ConnectionMultiplexer, as it has logic internally to recreate the connection as necessary. Just make sure to set abortConnect to false in your connection string.
If you do decide to re-create the connection object, make sure to dispose the old object before releasing all references to it.
The following is the pattern we are recommending:
private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() => {
    return ConnectionMultiplexer.Connect("contoso5.redis.cache.windows.net,abortConnect=false,ssl=true,password=...");
});

public static ConnectionMultiplexer Connection {
    get {
        return lazyConnection.Value;
    }
}

Twitter crawler: why does the memory grow?

I have been trying to crawl Twitter via the Streaming API, filtering the retrieved tweets by keywords/hashtags/users.
Here is my example using HBC (although the same problem happens with Twitter4J):
// After connection:
final BlockingQueue<String> queue = new LinkedBlockingQueue<String>(10000);
StatusesFilterEndpoint filterQuery = new StatusesFilterEndpoint();
filterQuery.followings(myListOfUserIDs);
filterQuery.trackTerms(myListOfKeywordsAndHashtags);

final ExecutorService executor = Executors.newFixedThreadPool(4);

Runnable tweetAnalyzer = defineRunnable(queue);
for (int i = 0; i < NUM_THREADS; i++) {
    executor.execute(tweetAnalyzer);
}
where the analyzer tweetAnalyzer is returned by:
private Runnable defineRunnable(final BlockingQueue<String> queue) {
    return new Runnable() {
        @Override
        public void run() {
            while (true) {
                try {
                    System.out.println(queue.take());
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    };
}
However, the process's memory footprint keeps growing.
Two questions:
How do I design this crawler properly, so that it does not grow in memory and does not saturate the RAM?
How do I select the best queue length (here set to 10000) so that it does not saturate? With this length the queue stays full of tweets (it never goes empty), and I am able to crawl 700 tweets/min, which is huge.
Thank you in advance.
It's a bit hard to tell from the snippets you provide. Do you register the StatusesFilterEndpoint correctly?
I would recommend writing a separate thread to monitor the size of the queue; a rough sketch of such a monitor is below.
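For example, a minimal queue monitor could look like this (the logging interval and the use of System.out are assumptions):
import java.util.concurrent.BlockingQueue;

// Rough sketch: periodically log the queue size so you can see whether the
// consumers keep up with the stream. Interval and output target are assumptions.
private Thread startQueueMonitor(final BlockingQueue<String> queue) {
    Thread monitor = new Thread(() -> {
        try {
            while (true) {
                System.out.println("Queue size: " + queue.size());
                Thread.sleep(5000); // log every 5 seconds
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop monitoring when interrupted
        }
    });
    monitor.setDaemon(true);
    monitor.start();
    return monitor;
}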
Obviously you are not able to process all the Twitter messages you download. So you can only:
- reduce the number of tweets you download by filtering more aggressively,
- sample the input by throwing away every n-th message (see the sketch after this list),
- use a faster machine, although for the tweetAnalyzer you show in the question this might not help, or
- deploy on a cluster.
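As a sketch of the sampling option only (the sampling factor n is an arbitrary assumption), the consumer could drop all but every n-th message it takes from the queue:
// Rough sketch: process only every n-th message taken from the queue; n = 10 would mean keeping 10% of tweets.
private Runnable defineSamplingRunnable(final BlockingQueue<String> queue, final int n) {
    return new Runnable() {
        @Override
        public void run() {
            long counter = 0;
            while (true) {
                try {
                    String tweet = queue.take();
                    if (counter++ % n == 0) { // keep 1 out of every n tweets
                        System.out.println(tweet);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    };
}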
