I have an application that writes entries to a Chronicle Queue (V3) and also retains excerpt index values in other (Chronicle)Maps, by way of providing indexed access into the queue. Sometimes we fail to find a given entry that we earlier saved, and I believe it may be related to data-block roll-over.
Below is a stand-alone test program that reproduces such use-cases at small scale. It repeatedly writes an entry and immediately attempts to look the resulting index value up using a separate ExcerptTailer. All is well for a while, until the first data block is used up and a second data file is assigned; then the retrieval failures start. If the data-block size is increased to avoid roll-overs, no entries are lost. Using a small index-block size, which causes multiple index files to be created, does not cause a problem either.
The test program also runs a listener thread in parallel, using another ExcerptTailer, to see whether the entries apparently 'lost' by the writer are ever received by the reader thread - they're not. It also re-reads the resulting queue from start to end, which confirms that they really are lost.
Stepping through the code, I see that when looking up a 'missing' entry in AbstractVanillaExcerpt#index, it appears to successfully locate the correct VanillaMappedBytes object from the dataCache, but determines that there is no entry at the data offset because the length == 0. In addition to the entries not being found, at some point after the problems start occurring post-roll-over, an NPE is thrown from within the VanillaMappedFile#fileChannel method because it has been passed a null File path. The code path assumes that when an entry has been successfully resolved in the index, a file will always be found, but in this case it isn't.
Is it possible to reliably use Chronicle Queue across data-block roll-overs, and if so, what might I be doing that causes the problem I'm experiencing?
import java.io.IOException;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Set;
import org.junit.Before;
import org.junit.Test;
import net.openhft.affinity.AffinitySupport;
import net.openhft.chronicle.Chronicle;
import net.openhft.chronicle.ChronicleQueueBuilder;
import net.openhft.chronicle.ExcerptAppender;
import net.openhft.chronicle.ExcerptCommon;
import net.openhft.chronicle.ExcerptTailer;
import net.openhft.chronicle.VanillaChronicle;
public class ChronicleTests {
private static final int CQ_LEN = VanillaChronicle.Cycle.DAYS.length();
private static final long CQ_ENT = VanillaChronicle.Cycle.DAYS.entries();
private static final String ROOT_DIR = System.getProperty(ChronicleTests.class.getName() + ".ROOT_DIR",
"C:/Temp/chronicle/");
private static final String QDIR = System.getProperty(ChronicleTests.class.getName() + ".QDIR", "chronicleTests");
private static final int DATA_SIZE = Integer
.parseInt(System.getProperty(ChronicleTests.class.getName() + ".DATA_SIZE", "100000"));
// Chunk file size of CQ index
private static final int INDX_SIZE = Integer
.parseInt(System.getProperty(ChronicleTests.class.getName() + ".INDX_SIZE", "10000"));
private static final int Q_ENTRIES = Integer
.parseInt(System.getProperty(ChronicleTests.class.getName() + ".Q_ENTRIES", "5000"));
// Data type id
protected static final byte FSYNC_DATA = 1;
protected static final byte NORMAL_DATA = 0;
protected static final byte TH_START_DATA = -1;
protected static final byte TH_END_DATA = -2;
protected static final byte CQ_START_DATA = -3;
private static final long MAX_RUNTIME_MILLISECONDS = 30000;
private static String PAYLOAD_STRING = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
private static byte PAYLOAD_BYTES[] = PAYLOAD_STRING.getBytes();
private Chronicle _chronicle;
private String _cqPath = ROOT_DIR + QDIR;
@Before
public void init() {
buildCQ();
}
@Test
public void test() throws IOException, InterruptedException {
boolean passed = true;
Collection<Long> missingEntries = new LinkedList<Long>();
long sent = 0;
Thread listener = listen();
try {
listener.start();
// Write entries to CQ,
for (int i = 0; i < Q_ENTRIES; i++) {
long entry = writeQEntry(PAYLOAD_BYTES, (i % 100) == 0);
sent++;
// check each entry can be looked up
boolean found = checkEntry(i, entry);
if (!found)
missingEntries.add(entry);
passed &= found;
}
// Wait awhile for the listener
listener.join(MAX_RUNTIME_MILLISECONDS);
if (listener.isAlive())
listener.interrupt();
} finally {
if (listener.isAlive()) { // => exception raised so wait for listener
log("Give listener a chance....");
sleep(MAX_RUNTIME_MILLISECONDS);
listener.interrupt();
}
log("Sent: " + sent + " Received: " + _receivedEntries.size());
// Look for missing entries in receivedEntries
missingEntries.forEach(me -> checkMissingEntry(me));
log("All passed? " + passed);
// Try to find missing entries by searching from the start...
searchFromStartFor(missingEntries);
_chronicle.close();
_chronicle = null;
// Re-initialise CQ and look for missing entries again...
log("Re-initialise");
init();
searchFromStartFor(missingEntries);
}
}
private void buildCQ() {
try {
// build chronicle queue
_chronicle = ChronicleQueueBuilder.vanilla(_cqPath).cycleLength(CQ_LEN).entriesPerCycle(CQ_ENT)
.indexBlockSize(INDX_SIZE).dataBlockSize(DATA_SIZE).build();
} catch (IOException e) {
throw new InitializationException("Failed to initialize Active Trade Store.", e);
}
}
private long writeQEntry(byte dataArray[], boolean fsync) throws IOException {
ExcerptAppender appender = _chronicle.createAppender();
return writeData(appender, dataArray, fsync);
}
private boolean checkEntry(int seqNo, long entry) throws IOException {
ExcerptTailer tailer = _chronicle.createTailer();
if (!tailer.index(entry)) {
log("SeqNo: " + seqNo + " for entry + " + entry + " not found");
return false;
}
boolean isMarker = isMarker(tailer);
boolean isFsyncData = isFsyncData(tailer);
boolean isNormalData = isNormalData(tailer);
String type = isMarker ? "MARKER" : isFsyncData ? "FSYNC" : isNormalData ? "NORMALDATA" : "UNKNOWN";
log("Entry: " + entry + "(" + seqNo + ") is " + type);
return true;
}
private void log(String string) {
System.out.println(string);
}
private void searchFromStartFor(Collection<Long> missingEntries) throws IOException {
Set<Long> foundEntries = new HashSet<Long>(Q_ENTRIES);
ExcerptTailer tailer = _chronicle.createTailer();
tailer.toStart();
while (tailer.nextIndex())
foundEntries.add(tailer.index());
Iterator<Long> iter = missingEntries.iterator();
long foundCount = 0;
while (iter.hasNext()) {
long me = iter.next();
if (foundEntries.contains(me)) {
log("Found missing entry: " + me);
foundCount++;
}
}
log("searchFromStartFor Found: " + foundCount + " of: " + missingEntries.size() + " missing entries");
}
private void checkMissingEntry(long missingEntry) {
if (_receivedEntries.contains(missingEntry))
log("Received missing entry:" + missingEntry);
}
Set<Long> _receivedEntries = new HashSet<Long>(Q_ENTRIES);
private Thread listen() {
Thread returnVal = new Thread("Listener") {
public void run() {
try {
int receivedCount = 0;
ExcerptTailer tailer = _chronicle.createTailer();
tailer.toStart();
while (receivedCount < Q_ENTRIES) {
if (tailer.nextIndex()) {
_receivedEntries.add(tailer.index());
} else {
ChronicleTests.this.sleep(1);
}
}
log("listener complete");
} catch (IOException e) {
log("Interupted before receiving all entries");
}
}
};
return returnVal;
}
private void sleep(long interval) {
try {
Thread.sleep(interval);
} catch (InterruptedException e) {
// No action required
}
}
protected static final int THREAD_ID_LEN = Integer.SIZE / Byte.SIZE;
protected static final int DATA_TYPE_LEN = Byte.SIZE / Byte.SIZE;
protected static final int TIMESTAMP_LEN = Long.SIZE / Byte.SIZE;
protected static final int CRC_LEN = Long.SIZE / Byte.SIZE;
protected static long writeData(ExcerptAppender appender, byte dataArray[],
boolean fsync) {
appender.startExcerpt(DATA_TYPE_LEN + THREAD_ID_LEN + dataArray.length
+ CRC_LEN);
appender.nextSynchronous(fsync);
if (fsync) {
appender.writeByte(FSYNC_DATA);
} else {
appender.writeByte(NORMAL_DATA);
}
appender.writeInt(AffinitySupport.getThreadId());
appender.write(dataArray);
appender.writeLong(CRCCalculator.calcDataAreaCRC(appender));
appender.finish();
return appender.lastWrittenIndex();
}
protected static boolean isMarker(ExcerptCommon excerpt) {
if (isCqStartMarker(excerpt) || isStartMarker(excerpt) || isEndMarker(excerpt)) {
return true;
}
return false;
}
protected static boolean isCqStartMarker(ExcerptCommon excerpt) {
return isDataTypeMatched(excerpt, CQ_START_DATA);
}
protected static boolean isStartMarker(ExcerptCommon excerpt) {
return isDataTypeMatched(excerpt, TH_START_DATA);
}
protected static boolean isEndMarker(ExcerptCommon excerpt) {
return isDataTypeMatched(excerpt, TH_END_DATA);
}
protected static boolean isData(ExcerptTailer tailer, long index) {
if (!tailer.index(index)) {
return false;
}
return isData(tailer);
}
private static void movePosition(ExcerptCommon excerpt, long position) {
if (excerpt.position() != position)
excerpt.position(position);
}
private static void moveToFsyncFlagPos(ExcerptCommon excerpt) {
movePosition(excerpt, 0);
}
private static boolean isDataTypeMatched(ExcerptCommon excerpt, byte type) {
moveToFsyncFlagPos(excerpt);
byte b = excerpt.readByte();
if (b == type) {
return true;
}
return false;
}
protected static boolean isNormalData(ExcerptCommon excerpt) {
return isDataTypeMatched(excerpt, NORMAL_DATA);
}
protected static boolean isFsyncData(ExcerptCommon excerpt) {
return isDataTypeMatched(excerpt, FSYNC_DATA);
}
/**
* Check if this entry is Data
*
* @param excerpt
* @return true if the entry is data
*/
protected static boolean isData(ExcerptCommon excerpt) {
if (isNormalData(excerpt) || isFsyncData(excerpt)) {
return true;
}
return false;
}
}
The problem only occurs when initialising the data-block size with a value that is not a power of two. The built-in configurations on IndexedChronicleQueueBuilder (small(), medium(), large()) take care to initialise using powers of two, which provided the clue to the appropriate usage.
Notwithstanding the above response regarding support, which I totally appreciate, it would be useful if a knowledgeable Chronicle user could confirm that the integrity of Chronicle Queue depends on using a data-block size that is a power of two.
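For illustration, here is a minimal sketch of a buildCQ() variant using power-of-two block sizes; the concrete values below (128 KiB data blocks, 16 KiB index blocks) are arbitrary examples I chose, not recommendations from the Chronicle project.
// Sketch only: identical to buildCQ() above except for the power-of-two block sizes.
private static final int DATA_SIZE_POW2 = 128 * 1024; // 131072 = 2^17
private static final int INDX_SIZE_POW2 = 16 * 1024;  // 16384 = 2^14
private void buildCQPowerOfTwo() {
try {
_chronicle = ChronicleQueueBuilder.vanilla(_cqPath).cycleLength(CQ_LEN).entriesPerCycle(CQ_ENT)
.indexBlockSize(INDX_SIZE_POW2).dataBlockSize(DATA_SIZE_POW2).build();
} catch (IOException e) {
throw new InitializationException("Failed to initialize Active Trade Store.", e);
}
}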
I built a stream that does a windowed join; when deployed to production, all was fine in terms of memory and performance.
However, I needed deduplication, so I implemented a Transformer that does it with the help of a WindowStore.
After deploying it, we are getting the expected data results, but memory keeps growing until the pod crashes with OOM.
After doing research I implemented many tricks to reduce memory usage, but they didn't help; the code is below.
It's clear to me that using the WindowStore is causing this issue, but how can I limit it?
The Store:
var storeBuilder = Stores.windowStoreBuilder(
Stores.persistentWindowStore(
storeName,
Duration.ofSeconds(6),
Duration.ofSeconds(5),
false
),
Serdes.String(),
SerdeFactory.JsonSerde(valueDataClass)
);
The stream:
var leftStream = builder.stream("leftTopic").filter(...);
var rightStream = builder.stream("rightTopic").filter(...);
leftStream.join(rightStream, joiner, JoinWindows
.of(Duration.ofSeconds(5))
.grace(Duration.ofSeconds(1))
.until(Duration.ofSeconds(6).toMillis())) // close join(...) before chaining
.transformValues(
() ->
new DeduplicationTransformer<>(
storeName,
Duration.ofSeconds(6).toMillis(),
(key, value) -> value.id
),
storeName
)
.filter((k, v) -> v != null)
.to("targetTopic");
Deduplication Transformer:
public class DeduplicationTransformer<K, V extends StreamModel>
implements ValueTransformerWithKey<K, V, V> {
private ProcessorContext context;
private String storeName;
private WindowStore<K, V> eventIdStore;
private final long leftDurationMs;
private final KeyValueMapper<K, V, K> idExtractor;
public DeduplicationTransformer(
String storeName,
long maintainDurationPerEventInMs,
final KeyValueMapper<K, V, K> idExtractor
) {
if (maintainDurationPerEventInMs < 2) {
throw new IllegalArgumentException(
"maintain duration per event must be > 1"
);
}
leftDurationMs = maintainDurationPerEventInMs;
this.idExtractor = idExtractor;
this.storeName = storeName;
}
@Override
public void init(final ProcessorContext context) {
this.context = context;
eventIdStore = (WindowStore<K, V>) context.getStateStore(storeName);
Duration interval = Duration.ofMillis(leftDurationMs);
this.context.schedule(
interval,
PunctuationType.WALL_CLOCK_TIME,
timestamp -> {
Instant from = Instant.ofEpochMilli(
System.currentTimeMillis() - leftDurationMs * 2
);
Instant to = Instant.ofEpochMilli(
System.currentTimeMillis() - leftDurationMs
);
KeyValueIterator<Windowed<K>, V> iterator = eventIdStore.fetchAll(
from,
to
);
while (iterator.hasNext()) {
KeyValue<Windowed<K>, V> entry = iterator.next();
eventIdStore.put(entry.key.key(), null, entry.key.window().start());
}
iterator.close();
context.commit();
}
);
}
@Override
public V transform(final K key, final V value) {
try {
final K eventId = idExtractor.apply(key, value);
if (eventId == null) {
return value;
} else {
final V output;
if (isDuplicate(eventId)) {
output = null;
} else {
output = value;
rememberNewEvent(eventId, value, context.timestamp());
}
return output;
}
} catch (Exception e) {
return null;
}
}
private boolean isDuplicate(final K eventId) {
final long eventTime = context.timestamp();
final WindowStoreIterator<V> timeIterator = eventIdStore.fetch(
eventId,
eventTime - leftDurationMs,
eventTime
);
final boolean isDuplicate = timeIterator.hasNext();
timeIterator.close();
return isDuplicate;
}
private void rememberNewEvent(final K eventId, V v, final long timestamp) {
eventIdStore.put(eventId, v, timestamp);
}
@Override
public void close() {}
}
RocksDB config:
public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {
private Cache cache = new LRUCache(5 * 1024 * 1024L);
private Filter filter = new BloomFilter();
private WriteBufferManager writeBufferManager = new WriteBufferManager(
4 * 1024 * 1024L,
cache
);
@Override
public void setConfig(
final String storeName,
final Options options,
final Map<String, Object> configs
) {
BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
tableConfig.setBlockCache(cache);
tableConfig.setCacheIndexAndFilterBlocks(true);
options.setWriteBufferManager(writeBufferManager);
tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(false);
tableConfig.setPinTopLevelIndexAndFilter(true);
tableConfig.setBlockSize(4 * 1024L);
options.setMaxWriteBufferNumber(1);
options.setWriteBufferSize(1024 * 1024L);
options.setTableFormatConfig(tableConfig);
}
@Override
public void close(final String storeName, final Options options) {
cache.close();
filter.close();
}
}
Config:
props.put(
StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG,
0
);
props.put(
StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG,
BoundedMemoryRocksDBConfig.class
);
Things I've tried so far:
Using a bounded RocksDB config setter
Using jemalloc instead of malloc
Reducing the retention period to 5 seconds
Reducing the number of partitions of the topics (this only slowed the rate of the memory leak)
Used in-memory stores instead of persistent ones; memory was very stable, but app startup then takes around 10 minutes on each deployment (see the sketch below).
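For reference, a minimal sketch of that in-memory variant, assuming the same store name, sizes and serdes as the persistent builder above (SerdeFactory and valueDataClass are from the original code); Stores.inMemoryWindowStore takes the same parameters as Stores.persistentWindowStore:
// Sketch only: in-memory equivalent of the persistent window store shown earlier.
var inMemoryStoreBuilder = Stores.windowStoreBuilder(
Stores.inMemoryWindowStore(
storeName,
Duration.ofSeconds(6), // retention
Duration.ofSeconds(5), // window size
false // retainDuplicates
),
Serdes.String(),
SerdeFactory.JsonSerde(valueDataClass)
);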
I have the below code as a traditional Java loop and would like to use a Java 8 Stream instead.
I have a list of files sorted by file size. I group these files together such that the total size of each group does not exceed a given maximum size, and put the groups in a Map with the keys 1, 2, 3, ... and so on. Here is the code.
List<File> allFilesSortedBySize = getListOfFiles();
Map<Integer, List<File>> filesGroupedByMaxSizeMap = new HashMap<Integer, List<File>>();
double totalLength = 0L;
int count = 0;
List<File> filesWithSizeTotalMaxSize = Lists.newArrayList();
//group the files to be zipped together as per maximum allowable size in a map
for (File file : allFilesSortedBySize) {
long sizeInBytes = file.length();
double sizeInMb = (double)sizeInBytes / (1024 * 1024);
totalLength = totalLength + sizeInMb;
if(totalLength <= maxSize) {
filesWithSizeTotalMaxSize.add(file);
} else {
count = count + 1;
filesGroupedByMaxSizeMap.put(count, filesWithSizeTotalMaxSize);
filesWithSizeTotalMaxSize = Lists.newArrayList();
filesWithSizeTotalMaxSize.add(file);
totalLength = sizeInMb;
}
}
filesGroupedByMaxSizeMap.put(count+1, filesWithSizeTotalMaxSize);
return filesGroupedByMaxSizeMap;
After reading, I found a solution using Collectors.groupingBy instead.
Code using a Java 8 lambda expression:
private final long MB = 1024 * 1024;
private Map<Integer, List<File>> grouping(List<File> files, long maxSize) {
AtomicInteger group = new AtomicInteger(0);
AtomicLong groupSize = new AtomicLong();
return files.stream().collect(groupingBy((file) -> {
if (groupSize.addAndGet(file.length()) <= maxSize * MB) {
return group.get() == 0 ? group.incrementAndGet() : group.get();
}
groupSize.set(file.length());
return group.incrementAndGet();
}));
}
Code provided by @Holger, which frees you from checking whether group equals 0:
private static final long MB = 1024 * 1024;
private Map<Integer, List<File>> grouping(List<File> files, long maxSize) {
AtomicInteger group = new AtomicInteger(0);
// force the group numbering to start at 1 even if the first file is empty
AtomicLong groupSize = new AtomicLong(maxSize * MB + 1);
return files.stream().collect(groupingBy((file) -> {
if (groupSize.addAndGet(file.length()) <= maxSize * MB) {
return group.get();
}
groupSize.set(file.length());
return group.incrementAndGet();
}));
}
Code using an anonymous class
Inspired by @Holger's remark that all "solutions" using a grouping function that modifies external state are hacks abusing the API, you can use an anonymous class to manage the grouping state inside the class:
private static final long MB = 1024 * 1024;
private Map<Integer, List<File>> grouping(List<File> files, long maxSize) {
return files.stream().collect(groupingBy(groupSize(maxSize)));
}
private Function<File, Integer> groupSize(final long maxSize) {
long maxBytesSize = maxSize * MB;
return new Function<File, Integer>() {
private int group;
private long groupSize = maxBytesSize + 1;
@Override
public Integer apply(File file) {
return hasRemainingFor(file) ? current(file) : next(file);
}
private boolean hasRemainingFor(File file) {
return (groupSize += file.length()) <= maxBytesSize;
}
private int next(File file) {
groupSize = file.length();
return ++group;
}
private int current(File file) {
return group;
}
};
}
Test
import org.junit.jupiter.api.Test;
import java.io.File;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import static java.util.Arrays.asList;
import static java.util.Collections.singletonList;
import static java.util.stream.Collectors.groupingBy;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;
/**
* Created by holi on 3/24/17.
*/
public class StreamGroupingTest {
private final File FILE_1MB = file(1);
private final File FILE_2MB = file(2);
private final File FILE_3MB = file(3);
@Test
void eachFileInIndividualGroupIfEachFileSizeGreaterThanMaxSize() {
Map<Integer, List<File>> groups = grouping(asList(FILE_2MB, FILE_3MB), 1);
assertThat(groups.size(), equalTo(2));
assertThat(groups.get(1), equalTo(singletonList(FILE_2MB)));
assertThat(groups.get(2), equalTo(singletonList(FILE_3MB)));
}
@Test
void allFilesInAGroupIfTotalSizeOfFilesLessThanOrEqualMaxSize() {
Map<Integer, List<File>> groups = grouping(asList(FILE_2MB, FILE_3MB), 5);
assertThat(groups.size(), equalTo(1));
assertThat(groups.get(1), equalTo(asList(FILE_2MB, FILE_3MB)));
}
@Test
void allNeighboringFilesInAGroupThatTotalOfTheirSizeLessThanOrEqualMaxSize() {
Map<Integer, List<File>> groups = grouping(asList(FILE_1MB, FILE_2MB, FILE_3MB), 3);
assertThat(groups.size(), equalTo(2));
assertThat(groups.get(1), equalTo(asList(FILE_1MB, FILE_2MB)));
assertThat(groups.get(2), equalTo(singletonList(FILE_3MB)));
}
@Test
void eachFileInIndividualGroupIfTheFirstFileAndTotalOfEachNeighboringFilesSizeGreaterThanMaxSize() {
Map<Integer, List<File>> groups = grouping(asList(FILE_2MB, FILE_1MB, FILE_3MB), 2);
assertThat(groups.size(), equalTo(3));
assertThat(groups.get(1), equalTo(singletonList(FILE_2MB)));
assertThat(groups.get(2), equalTo(singletonList(FILE_1MB)));
assertThat(groups.get(3), equalTo(singletonList(FILE_3MB)));
}
@Test
void theFirstEmptyFileInGroup1() throws Throwable {
File emptyFile = file(0);
Map<Integer, List<File>> groups = grouping(singletonList(emptyFile), 2);
assertThat(groups.get(1), equalTo(singletonList(emptyFile)));
}
private static final long MB = 1024 * 1024;
private Map<Integer, List<File>> grouping(List<File> files, long maxSize) {
AtomicInteger group = new AtomicInteger(0);
AtomicLong groupSize = new AtomicLong(maxSize * MB + 1);
return files.stream().collect(groupingBy((file) -> {
if (groupSize.addAndGet(file.length()) <= maxSize * MB) {
return group.get();
}
groupSize.set(file.length());
return group.incrementAndGet();
}));
}
private Function<File, Integer> groupSize(final long maxSize) {
long maxBytesSize = maxSize * MB;
return new Function<File, Integer>() {
private int group;
private long groupSize = maxBytesSize + 1;
@Override
public Integer apply(File file) {
return hasRemainingFor(file) ? current(file) : next(file);
}
private boolean hasRemainingFor(File file) {
return (groupSize += file.length()) <= maxBytesSize;
}
private int next(File file) {
groupSize = file.length();
return ++group;
}
private int current(File file) {
return group;
}
};
}
private File file(int sizeOfMB) {
return new File(String.format("%dMB file", sizeOfMB)) {
@Override
public long length() {
return sizeOfMB * MB;
}
@Override
public boolean equals(Object obj) {
File that = (File) obj;
return length() == that.length();
}
};
}
}
Since the processing of each element depends heavily on the processing of the previous one, this task is not suitable for streams. You can still achieve it using a custom collector, but the implementation would be much more complicated than the loop solution.
In other words, there is no improvement when you rewrite this as a stream operation. Stay with the loop.
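For completeness, here is a rough sketch of my own (under the assumptions of the code above) of what such a custom collector could look like; it only works for sequential streams and is arguably no clearer than the loop:
// Sketch of a custom collector; needs java.util.stream.Collector plus the usual java.util imports.
// The combiner deliberately rejects parallel use, since the grouping is inherently sequential.
private static Map<Integer, List<File>> groupingWithCollector(List<File> files, long maxSizeBytes) {
class Acc { // mutable accumulation state
final Map<Integer, List<File>> groups = new HashMap<>();
long currentSize;
}
return files.stream().collect(Collector.of(
Acc::new,
(acc, file) -> {
long length = file.length();
if (acc.groups.isEmpty() || acc.currentSize + length > maxSizeBytes) {
acc.groups.put(acc.groups.size() + 1, new ArrayList<>()); // start a new group
acc.currentSize = 0;
}
acc.currentSize += length;
acc.groups.get(acc.groups.size()).add(file);
},
(left, right) -> { throw new UnsupportedOperationException("sequential streams only"); },
acc -> acc.groups
));
}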
However, there are still some things you can improve.
List<File> allFilesSortedBySize = getListOfFiles();
// get maxSize in bytes ONCE, instead of converting EACH size to MiB
long maxSizeBytes = (long)(maxSize * 1024 * 1024);
// use "diamond operator"
Map<Integer, List<File>> filesGroupedByMaxSizeMap = new HashMap<>();
// start with "create new list" condition to avoid code duplication
long totalLength = maxSizeBytes;
// count is obsolete, the map maintains a size
// the initial "totalLength = maxSizeBytes" forces creating a new list within the loop
List<File> filesWithSizeTotalMaxSize = null;
for(File file: allFilesSortedBySize) {
long length = file.length();
if(maxSizeBytes-totalLength <= length) {
filesWithSizeTotalMaxSize = new ArrayList<>(); // no utility method needed
// store each list immediately, so no action after the loop needed
filesGroupedByMaxSizeMap.put(filesGroupedByMaxSizeMap.size()+1,
filesWithSizeTotalMaxSize);
totalLength = 0;
}
totalLength += length;
filesWithSizeTotalMaxSize.add(file);
}
return filesGroupedByMaxSizeMap;
You may further replace
filesWithSizeTotalMaxSize = new ArrayList<>();
filesGroupedByMaxSizeMap.put(filesGroupedByMaxSizeMap.size()+1,
filesWithSizeTotalMaxSize);
with
filesWithSizeTotalMaxSize = filesGroupedByMaxSizeMap.computeIfAbsent(
filesGroupedByMaxSizeMap.size()+1, x -> new ArrayList<>());
but there might be different opinions whether this is an improvement.
The simplest solution to the problem I could think of is to use an AtomicLong wrapper for the accumulated size and an AtomicInteger wrapper for the group index. These have some useful methods for performing basic arithmetic operations, which are very handy in this particular case.
List<File> files = getListOfFiles();
AtomicLong length = new AtomicLong();
AtomicInteger index = new AtomicInteger(1);
long maxLength = SOME_ARBITRARY_NUMBER;
Map<Integer, List<File>> collect = files.stream().collect(Collectors.groupingBy(
file -> {
if (length.addAndGet(file.length()) <= maxLength) {
return index.get();
}
length.set(file.length());
return index.incrementAndGet();
}
));
return collect;
Basically, Collectors.groupingBy does the work you intended.
I don't know how to display CLOB data of more than 66k characters in Oracle Forms.
A text item will only take a LONG data type, and even then no more than 66k. I have CLOB data and want to display it.
The easiest way to display this is to make a Forms bean (PJC).
Then you can display it in a JTextPane, which is big enough.
You can then write a function that returns pieces of 32,000 characters from the CLOB and passes them to the bean.
package be.axi.oracle.forms.jpc;
import java.awt.event.FocusEvent;
import java.awt.event.FocusListener;
import java.awt.event.KeyEvent;
import java.awt.event.KeyListener;
import javax.swing.JScrollPane;
import javax.swing.JTextPane;
import javax.swing.SwingUtilities;
import javax.swing.UIManager;
import javax.swing.text.BadLocationException;
import oracle.forms.handler.IHandler;
import oracle.forms.properties.ID;
import oracle.forms.ui.CustomEvent;
import oracle.forms.ui.VBean;
/**
* A TextArea to get more than 64k texts
* @author Francois Degrelle
* @version 1.1
*/
public class BigTextArea extends VBean implements FocusListener, KeyListener
{
private static final long serialVersionUID = 1L;
public final static ID ADDTEXT = ID.registerProperty("ADD_TEXT");
public final static ID VALUE = ID.registerProperty("VALUE");
public final static ID SHOW = ID.registerProperty("SHOW");
public final static ID CLEAR = ID.registerProperty("CLEAR");
public final static ID GETTEXT = ID.registerProperty("GET_TEXT");
public final static ID GETLENGTH = ID.registerProperty("GET_LENGTH");
public final static ID pLostFocus = ID.registerProperty("BEAN_QUITTED");
private IHandler m_handler;
private int iStart = 0 ;
private int iChunk = 8192 ;
private StringBuffer sb = new StringBuffer();
protected JTextPane jtp = new JTextPane();
public BigTextArea()
{
super();
try
{
UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
SwingUtilities.updateComponentTreeUI(this);
}
catch (Exception ex)
{
ex.printStackTrace();
}
JScrollPane ps = new JScrollPane(jtp);
ps.setBorder(null);
add(ps);
ps.setVisible(true);
}
public void init(IHandler handler)
{
m_handler = handler;
super.init(handler);
addFocusListener(this);
jtp.addFocusListener(this);
jtp.addKeyListener(this);
}
public boolean setProperty(ID property, Object value)
{
//
// add text to the TextArea
//
if (property == ADDTEXT)
{
sb.append(value.toString()) ;
printMemory();
return true;
}
//
// display the whole text
//
else if(property == SHOW)
{
jtp.setText(sb.toString());
jtp.setCaretPosition(0);
sb = new StringBuffer();
System.gc();
printMemory();
return true ;
}
//
// clear the TextArea
//
else if(property == CLEAR) {
jtp.setText("");
return true ;
}
else
{
return super.setProperty(property, value);
}
}
/*-----------------------------------*
* Get the result string from Forms *
*-----------------------------------*/
public Object getProperty(ID pId)
{
//
// returns the text length
//
if (pId == GETLENGTH)
{
return "" + jtp.getText().length();
}
//
// returns the chunks
//
else if (pId == GETTEXT) {
String s = "" ;
int iLen = jtp.getText().length() ;
while (iStart < iLen)
{
try{
if(iStart+iChunk <= iLen) s = jtp.getText(iStart,iChunk);
else s = jtp.getText(iStart,iLen-iStart);
iStart += iChunk ;
return s ;
}
catch (BadLocationException ble) { ble.printStackTrace(); return ""; }
}
iStart = 0 ;
return "" ;
}
else
{
return super.getProperty(pId);
}
} // getProperty()
/*--------------------------*
* handle the focus events *
*--------------------------*/
public void focusGained(FocusEvent e)
{
if (e.getComponent() == this)
{
// put the focus on the component
jtp.requestFocus();
}
try
{
m_handler.setProperty(FOCUS_EVENT, e);
}
catch ( Exception ex )
{
;
}
}
public void focusLost(FocusEvent e)
{
CustomEvent ce = new CustomEvent(m_handler, pLostFocus);
dispatchCustomEvent(ce);
}
/*--------------------------*
* Handle the Key listener *
*--------------------------*/
public void keyPressed(KeyEvent e)
{
/*
** Allows TAB key to exit the item
** and continue the standard Forms navigation
*/
if ( (e.getKeyCode() == KeyEvent.VK_TAB) )
{
try
{
m_handler.setProperty(KEY_EVENT, e);
}
catch ( Exception ex )
{
}
}
}
public void keyTyped(KeyEvent e)
{
}
public void keyReleased(KeyEvent e)
{
}
// utility to output the memory available
private void printMemory() {
System.out.println("Java memory in use = "
+ (Runtime.getRuntime().totalMemory()
- Runtime.getRuntime().freeMemory()));
}
}
Netty 4.1 (on OpenJDK 1.6.0_32 and CentOS 6.4) message sending is strangely slow. According to the profiler, DefaultChannelHandlerContext.writeAndFlush accounts for the biggest share (60%) of the running time, while the decoding process is not prominent in the profiler. Small messages are being processed, so maybe the bootstrap options are not set correctly (TCP_NODELAY is true and nothing improved)? A DefaultEventExecutorGroup is used in both the server and the client to avoid blocking Netty's main event loop; the 'DataServer' and 'DataClient' classes with the business logic run there, and messages are sent from there through context.writeAndFlush(...). Is there a more proper/faster way? Using straight ByteBuf.writeBytes(...) serialization in the encoder and a ReplayingDecoder in the decoder made no difference in speed. Sorry for the lengthy code; neither the 'Netty in Action' book nor the documentation helped.
JProfiler's call tree of the client side: http://i62.tinypic.com/dw4e43.jpg
The server class is:
public class NettyServer
{
EventLoopGroup incomingLoopGroup = null;
EventLoopGroup workerLoopGroup = null;
ServerBootstrap serverBootstrap = null;
int port;
DataServer dataServer = null;
DefaultEventExecutorGroup dataEventExecutorGroup = null;
DefaultEventExecutorGroup dataEventExecutorGroup2 = null;
public ChannelFuture serverChannelFuture = null;
public NettyServer(int port)
{
this.port = port;
dataServer = new DataServer(this);
}
public void run() throws Exception
{
incomingLoopGroup = new NioEventLoopGroup();
workerLoopGroup = new NioEventLoopGroup();
dataEventExecutorGroup = new DefaultEventExecutorGroup(5);
dataEventExecutorGroup2 = new DefaultEventExecutorGroup(5);
try
{
ChannelInitializer<SocketChannel> channelInitializer =
new ChannelInitializer<SocketChannel>()
{
@Override
protected void initChannel(SocketChannel ch)
throws Exception {
ch.pipeline().addLast(new MessageByteDecoder());
ch.pipeline().addLast(new MessageByteEncoder());
ch.pipeline().addLast(dataEventExecutorGroup, new DataServerInboundHandler(dataServer, NettyServer.this));
ch.pipeline().addLast(dataEventExecutorGroup2, new DataServerDataHandler(dataServer));
}
};
// bootstrap the server
serverBootstrap = new ServerBootstrap();
serverBootstrap.group(incomingLoopGroup, workerLoopGroup)
.channel(NioServerSocketChannel.class)
.childHandler(channelInitializer)
.option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
.option(ChannelOption.TCP_NODELAY, true)
.childOption(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, 32 * 1024)
.childOption(ChannelOption.WRITE_BUFFER_LOW_WATER_MARK, 8 * 1024)
.childOption(ChannelOption.SO_KEEPALIVE, true);
serverChannelFuture = serverBootstrap.bind(port).sync();
serverChannelFuture.channel().closeFuture().sync();
}
finally
{
incomingLoopGroup.shutdownGracefully();
workerLoopGroup.shutdownGracefully();
}
}
}
The client class:
public class NettyClient
{
Bootstrap clientBootstrap = null;
EventLoopGroup workerLoopGroup = null;
String serverHost = null;
int serverPort = -1;
ChannelFuture clientFutureChannel = null;
DataClient dataClient = null;
DefaultEventExecutorGroup dataEventExecutorGroup = new DefaultEventExecutorGroup(5);
DefaultEventExecutorGroup dataEventExecutorGroup2 = new DefaultEventExecutorGroup(5);
public NettyClient(String serverHost, int serverPort)
{
this.serverHost = serverHost;
this.serverPort = serverPort;
}
public void run() throws Exception
{
workerLoopGroup = new NioEventLoopGroup();
try
{
this.dataClient = new DataClient();
ChannelInitializer<SocketChannel> channelInitializer =
new ChannelInitializer<SocketChannel>()
{
@Override
protected void initChannel(SocketChannel ch)
throws Exception {
ch.pipeline().addLast(new MessageByteDecoder());
ch.pipeline().addLast(new MessageByteEncoder());
ch.pipeline().addLast(dataEventExecutorGroup, new ClientInboundHandler(dataClient, NettyClient.this));
ch.pipeline().addLast(dataEventExecutorGroup2, new ClientDataHandler(dataClient));
}
};
clientBootstrap = new Bootstrap();
clientBootstrap.group(workerLoopGroup);
clientBootstrap.channel(NioSocketChannel.class);
clientBootstrap.option(ChannelOption.SO_KEEPALIVE, true);
clientBootstrap.option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
clientBootstrap.option(ChannelOption.TCP_NODELAY, true);
clientBootstrap.option(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, 32 * 1024);
clientBootstrap.option(ChannelOption.WRITE_BUFFER_LOW_WATER_MARK, 8 * 1024);
clientBootstrap.handler(channelInitializer);
clientFutureChannel = clientBootstrap.connect(serverHost, serverPort).sync();
clientFutureChannel.channel().closeFuture().sync();
}
finally
{
workerLoopGroup.shutdownGracefully();
}
}
}
The message class:
public class Message implements Serializable
{
public static final byte MSG_FIELD = 0;
public static final byte MSG_HELLO = 1;
public static final byte MSG_LOG = 2;
public static final byte MSG_FIELD_RESPONSE = 3;
public static final byte MSG_MAP_KEY_VALUE = 4;
public static final byte MSG_STATS_FILE = 5;
public static final byte MSG_SHUTDOWN = 6;
public byte msgID;
public byte msgType;
public String key;
public String value;
public byte method;
public byte id;
}
The decoder:
public class MessageByteDecoder extends ByteToMessageDecoder
{
private Kryo kryoCodec = new Kryo();
private int contentSize = 0;
@Override
protected void decode(ChannelHandlerContext ctx, ByteBuf buffer, List<Object> out) //throws Exception
{
if (!buffer.isReadable() || buffer.readableBytes() < 4) // we need at least integer
return;
// read header
if (contentSize == 0) {
contentSize = buffer.readInt();
}
if (buffer.readableBytes() < contentSize)
return;
// read content
byte [] buf = new byte[contentSize];
buffer.readBytes(buf);
Input in = new Input(buf, 0, buf.length);
out.add(kryoCodec.readObject(in, Message.class));
contentSize = 0;
}
}
The encoder:
public class MessageByteEncoder extends MessageToByteEncoder<Message>
{
Kryo kryoCodec = new Kryo();
public MessageByteEncoder()
{
super(false);
}
@Override
protected void encode(ChannelHandlerContext ctx, Message msg, ByteBuf out) throws Exception
{
int offset = out.arrayOffset() + out.writerIndex();
byte [] inArray = out.array();
Output kryoOutput = new OutputWithOffset(inArray, inArray.length, offset + 4);
// serialize message content
kryoCodec.writeObject(kryoOutput, msg);
// write length of the message content at the beginning of the array
out.writeInt(kryoOutput.position());
out.writerIndex(out.writerIndex() + kryoOutput.position());
}
}
Client's business logic run in DefaultEventExecutorGroup:
public class DataClient
{
ChannelHandlerContext ctx;
// ...
public void processData()
{
// ...
while ((line = br.readLine()) != null)
{
// ...
process = new CountDownLatch(columns.size());
for(Column c : columns)
{
// sending column data to the server for processing
ctx.channel().eventLoop().execute(new Runnable() {
@Override
public void run() {
ctx.writeAndFlush(Message.createMessage(msgID, processID, c.key, c.value));
}});
}
// block until all the processed column fields of this row are returned from the server
process.await();
// write processed line to file ...
}
// ...
}
// ...
}
Client's message handling:
public class ClientInboundHandler extends ChannelInboundHandlerAdapter
{
DataClient dataClient = null;
// ...
@Override
public void channelRead(ChannelHandlerContext ctx, Object msg)
{
// dispatch the message to the listeners
Message m = (Message) msg;
switch(m.msgType)
{
case Message.MSG_FIELD_RESPONSE: // message with processed data is received from the server
// decreases the 'process' CountDownLatch in the processData() method
dataClient.setProcessingResult(m.msgID, m.value);
break;
// ...
}
// forward the message to the pipeline
ctx.fireChannelRead(msg);
}
// ...
}
Server's message handling:
public class ServerInboundHandler extends ChannelInboundHandlerAdapter
{
private DataServer dataServer = null;
// ...
@Override
public void channelRead(ChannelHandlerContext ctx, Object obj) throws Exception
{
Message msg = (Message) obj;
switch(msg.msgType)
{
case Message.MSG_FIELD:
dataServer.processField(msg, ctx);
break;
// ...
}
ctx.fireChannelRead(msg);
}
//...
}
Server's business logic run in DefaultEventExecutorGroup:
public class DataServer
{
// ...
public void processField(final Message msg, final ChannelHandlerContext context)
{
context.executor().submit(new Runnable()
{
@Override
public void run()
{
String processedValue = (String) processField(msg.key, msg.value);
final Message responseToClient = Message.createResponseFieldMessage(msg.msgID, processedValue);
// send processed data to the client
context.channel().eventLoop().submit(new Runnable(){
@Override
public void run() {
context.writeAndFlush(responseToClient);
}
});
}
});
}
// ...
}
Please try using CentOS 7.0.
I've had a similar problem:
The same Netty 4 program runs very fast on CentOS 7.0 (about 40k msg/s), but can't write more than about 8k msg/s on CentOS 6.3 and 6.5 (I haven't tried 6.4).
There is no need to submit stuff to the EventLoop. Just call Channel.writeAndFlush(...) directly in your DataClient and DataServer.
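For illustration, a sketch of my own (not from the original post) of what that looks like for the processField(...) method in the question; writeAndFlush(...) is safe to call from any thread, and Netty hands the write over to the channel's event loop itself.
// Sketch only: same business logic as before, but the response is written directly
// from the DefaultEventExecutorGroup thread instead of re-submitting a Runnable
// to the channel's event loop.
public void processField(final Message msg, final ChannelHandlerContext context)
{
context.executor().submit(new Runnable()
{
@Override
public void run()
{
String processedValue = (String) processField(msg.key, msg.value);
Message responseToClient = Message.createResponseFieldMessage(msg.msgID, processedValue);
context.writeAndFlush(responseToClient); // no eventLoop().submit(...) wrapper needed
}
});
}
The same applies on the client side: ctx.writeAndFlush(Message.createMessage(...)) can be called directly inside processData() without the eventLoop().execute(...) wrapper.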
I have a set of node IDs (Set<Long>) and want to restrict or filter the results of a query to only the nodes in this set. Is there a performant way to do this?
Set<Node> query(final GraphDatabaseService graphDb, final Set<Long> nodeSet) {
final Index<Node> searchIndex = graphDb.index().forNodes("search");
final IndexHits<Node> hits = searchIndex.query(new QueryContext("value*"));
// what now to return only index hits that are in the given Set of Node's?
}
Wouldn't it be faster the other way round? Get the nodes from your set and compare the property to the value you are looking for:
for (Iterator it=nodeSet.iterator();it.hasNext();) {
Node n=db.getNodeById(it.next());
if (!n.getProperty("value","").equals("foo")) it.remove();
}
or for your suggestion
Set<Node> query(final GraphDatabaseService graphDb, final Set<Long> nodeSet) {
final Index<Node> searchIndex = graphDb.index().forNodes("search");
final IndexHits<Node> hits = searchIndex.query(new QueryContext("value*"));
Set<Node> result=new HashSet<>();
for (Node n : hits) {
if (nodeSet.contains(n.getId())) result.add(n);
}
return result;
}
So the fastest solution I found was to use Lucene's IndexSearcher directly on the index created by Neo4j and restrict the search to specific nodes with a custom Filter.
Just open the Neo4j index folder "{neo4j-database-folder}/index/lucene/node/{index-name}" with the Lucene IndexReader. Make sure not to add a Lucene dependency to your project in a version other than the one Neo4j uses, which currently is Lucene 3.6.2!
Here's my Lucene Filter implementation that filters all query results by the given Set of document IDs. (Lucene document IDs (Integer) ARE NOT Neo4j node IDs (Long)!)
import java.io.IOException;
import java.util.PriorityQueue;
import java.util.Set;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Filter;
public class DocIdFilter extends Filter {
public class FilteredDocIdSetIterator extends DocIdSetIterator {
private final PriorityQueue<Integer> filterQueue;
private int docId;
public FilteredDocIdSetIterator(final Set<Integer> filterSet) {
this(new PriorityQueue<Integer>(filterSet));
}
public FilteredDocIdSetIterator(final PriorityQueue<Integer> filterQueue) {
this.filterQueue = filterQueue;
}
@Override
public int docID() {
return this.docId;
}
@Override
public int nextDoc() throws IOException {
if (this.filterQueue.isEmpty()) {
this.docId = NO_MORE_DOCS;
} else {
this.docId = this.filterQueue.poll();
}
return this.docId;
}
@Override
public int advance(final int target) throws IOException {
while ((this.docId = this.nextDoc()) < target)
;
return this.docId;
}
}
private final PriorityQueue<Integer> filterQueue;
public DocIdFilter(final Set<Integer> filterSet) {
super();
this.filterQueue = new PriorityQueue<Integer>(filterSet);
}
private static final long serialVersionUID = -865683019349988312L;
@Override
public DocIdSet getDocIdSet(final IndexReader reader) throws IOException {
return new DocIdSet() {
@Override
public DocIdSetIterator iterator() throws IOException {
return new FilteredDocIdSetIterator(DocIdFilter.this.filterQueue);
}
};
}
}
To map the set of Neo4j node IDs (which the query result should be filtered by) to the corresponding Lucene document IDs, I created an in-memory bidirectional map:
public static HashBiMap<Integer, Long> generateDocIdToNodeIdMap(final IndexReader indexReader)
throws LuceneIndexException {
final HashBiMap<Integer, Long> result = HashBiMap.create(indexReader.numDocs());
for (int i = 0; i < indexReader.maxDoc(); i++) {
if (indexReader.isDeleted(i)) {
continue;
}
final Document doc;
try {
doc = indexReader.document(i, new FieldSelector() {
private static final long serialVersionUID = 5853247619312916012L;
@Override
public FieldSelectorResult accept(final String fieldName) {
if ("_id_".equals(fieldName)) {
return FieldSelectorResult.LOAD_AND_BREAK;
} else {
return FieldSelectorResult.NO_LOAD;
}
}
});
} catch (final IOException e) {
throw new LuceneIndexException(indexReader.directory(), "could not read document with ID: '" + i
+ "' from index.", e);
}
final Long nodeId;
try {
nodeId = Long.valueOf(doc.get("_id_"));
} catch (final NumberFormatException e) {
throw new LuceneIndexException(indexReader.directory(),
"could not parse node ID value from document ID: '" + i + "'", e);
}
result.put(i, nodeId);
}
return result;
}
I'm using the Google Guava library, which provides a bidirectional map and initialization of collections with a specific size.
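To tie the pieces together, here is a hedged wiring sketch of my own (not part of the original answer). It assumes Lucene 3.6.2 on the classpath, the DocIdFilter and generateDocIdToNodeIdMap shown above, and a placeholder field/query ("someProperty", "value*") standing in for whatever your "search" index actually contains; the usual org.apache.lucene.* and com.google.common.collect.HashBiMap imports are required.
// Wiring sketch under the assumptions stated above.
Set<Long> filteredQuery(final File indexFolder, final Set<Long> nodeSet) throws Exception {
final IndexReader reader = IndexReader.open(FSDirectory.open(indexFolder));
try {
// Lucene docId <-> Neo4j nodeId mapping, built by generateDocIdToNodeIdMap(...) above
final HashBiMap<Integer, Long> docIdToNodeId = generateDocIdToNodeIdMap(reader);
// translate the Set<Long> node-id restriction into Lucene doc IDs
final Set<Integer> allowedDocIds = new HashSet<Integer>();
for (final Long nodeId : nodeSet) {
final Integer docId = docIdToNodeId.inverse().get(nodeId);
if (docId != null) {
allowedDocIds.add(docId);
}
}
final IndexSearcher searcher = new IndexSearcher(reader);
final TopDocs hits = searcher.search(
new WildcardQuery(new Term("someProperty", "value*")), // placeholder query
new DocIdFilter(allowedDocIds),
Math.max(1, allowedDocIds.size()));
final Set<Long> result = new HashSet<Long>();
for (final ScoreDoc scoreDoc : hits.scoreDocs) {
result.add(docIdToNodeId.get(scoreDoc.doc));
}
return result; // only node IDs that are both in nodeSet and matched by the query
} finally {
reader.close();
}
}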