Ehcache is a hugely configurable beast, and the examples are fairly complex, often involving many layers of interfaces.
Has anyone come across a simple example that just caches something like a single number in memory (not distributed, no XML, just as few lines of Java as possible)? The number would be cached for, say, 60 seconds, after which the next read request causes it to fetch a new value (e.g. by calling Random.nextInt() or similar).
Is it quicker/easier to write our own cache for something like this with a singleton and a bit of synchronization?
No Spring please.
EhCache comes with a failsafe configuration that has some reasonable expiration time (120 seconds). This is sufficient to get it up and running.
Imports:
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
Then, creating a cache is pretty simple:
CacheManager.getInstance().addCache("test");
This creates a cache called test. You can have many different, separate caches all managed by the same CacheManager. Adding (key, value) pairs to this cache is as simple as:
CacheManager.getInstance().getCache("test").put(new Element(key, value));
Retrieving a value for a given key is as simple as:
Element elt = CacheManager.getInstance().getCache("test").get(key);
return (elt == null ? null : elt.getObjectValue());
If you attempt to access an element after the default 120-second expiration period, the cache will return null (hence the check whether elt is null). You can adjust the expiration period by creating your own ehcache.xml file - the documentation for that on the Ehcache site is decent.
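Since the question explicitly asked for no XML: in Ehcache 2.x you can also set the expiration programmatically when creating the cache. A minimal sketch, assuming the 2.x CacheConfiguration fluent API, matching the 60-second requirement from the question:
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;

// A cache named "test" holding at most 1000 in-memory entries,
// each entry expiring 60 seconds after it was created
Cache cache = new Cache(new CacheConfiguration("test", 1000).timeToLiveSeconds(60));
CacheManager.getInstance().addCache(cache);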
A working implementation of jbrookover's answer:
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EHCacheDemo {
    public static void main(String[] args) {
        CacheManager cchm = CacheManager.getInstance();

        // Create a cache
        cchm.addCache("test");

        // Add key-value pairs
        Cache cch = cchm.getCache("test");
        cch.put(new Element("tarzan", "Jane"));
        cch.put(new Element("kermit", "Piggy"));

        // Retrieve a value for a given key
        Element elt = cch.get("tarzan");
        String sPartner = (elt == null ? null : elt.getObjectValue().toString());
        System.out.println(sPartner); // Outputs "Jane"

        // Required or the application will hang
        cchm.removeAllCaches(); // alternatively: cchm.shutdown();
    }
}
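For comparison, the hand-rolled alternative the question mentions (a singleton plus a bit of synchronization) is only a few lines as well. A minimal sketch, assuming a 60-second TTL and Random.nextInt() as the stand-in for the expensive value:
import java.util.Random;

public final class ExpiringValue {

    private static final ExpiringValue INSTANCE = new ExpiringValue();
    private static final long TTL_MILLIS = 60_000;

    private final Random random = new Random();
    private int value;
    private long loadedAt = 0; // epoch start, so the first get() always refreshes

    private ExpiringValue() {
    }

    public static ExpiringValue getInstance() {
        return INSTANCE;
    }

    public synchronized int get() {
        long now = System.currentTimeMillis();
        if (now - loadedAt > TTL_MILLIS) {
            value = random.nextInt(); // stand-in for the real, expensive computation
            loadedAt = now;
        }
        return value;
    }
}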
I have the following issue: in a client/server environment with Spring Boot and Kotlin, the client wants to create objects of type A, and therefore posts the data through a RESTful endpoint to the server.
Entity A is realized as a data class in Kotlin like this:
data class A(val mandatoryProperty: String)
Business-wise that property (which is a primary key, too) must never be null. However, it is not known by the client, as it gets generated quite expensively by a Spring @Service bean on the server.
Now, at the endpoint, Spring tries to deserialize the client's payload into an object of type A; however, mandatoryProperty is unknown at that point in time, which results in a mapping exception.
There are several ways to circumvent that problem, none of which really appeals to me:
1. Don't expect an object of type A at the endpoint, but a bunch of parameters describing A that are passed along until the entity has actually been created and mandatoryProperty is present. Quite cumbersome, actually, since there are a lot more properties than just that single one.
2. Quite similar to 1, but create a DTO. One of my favorites; however, since data classes can't be extended, it would mean duplicating the properties of type A into the DTO (except for the mandatory property) and copying them over. Furthermore, when A grows, the DTO has to grow, too.
3. Make mandatoryProperty nullable and work with the !! operator throughout the code. Probably the worst solution, as it defeats the purpose of nullable and non-nullable variables.
4. Have the client set a dummy value for mandatoryProperty, which is replaced as soon as the property has been generated. However, A is validated by the endpoint, so the dummy value must obey its @Pattern constraint. Every dummy value would therefore be a valid primary key, which gives me a bad feeling.
Are there any other ways I might have overlooked that are more feasible?
I don't think there is a general-purpose answer to this... So I will just give you my 2 cents regarding your variants...
Your first variant has a benefit no other really has, i.e. that you will not use the given objects for anything other than what they were designed for (i.e. endpoint or backend purposes only), but it will probably lead to cumbersome development.
The second variant is nice, but could lead to other development errors, e.g. when you think you are using the actual A but are operating on the DTO instead.
Variants 3 and 4 are in that regard similar to 2... you may use the object as an A even though it only has the properties of a DTO.
So... if you want to go the safe route, i.e. no one should ever use this object for anything other than its specific purpose, you should probably use the first variant. 4 sounds rather like a hack. 2 & 3 are probably OK; 3 because you actually have no mandatoryProperty when you use it as a DTO...
Still, as you have your favorite (2) and I have one too, I will concentrate on 2 & 3, starting with 2 using a subclass approach with a sealed class as supertype:
sealed class AbstractA {
    // just some properties for demo purposes
    lateinit var sharedResettable: String
    abstract val sharedReadonly: String
}

data class A(
    val mandatoryProperty: Long = 0,
    override val sharedReadonly: String
    // we deliberately do not override the sharedResettable here... also for demo purposes only
) : AbstractA()

data class ADTO(
    // this has no mandatoryProperty
    override val sharedReadonly: String
) : AbstractA()
Some demo code, demonstrating the usage:
// just some random setup:
val a = A(123, "from backend").apply { sharedResettable = "i am from backend" }
val dto = ADTO("from dto").apply { sharedResettable = "i am dto" }

listOf(a, dto).forEach { anA ->
    // somewhere receiving an A... we do not know what it is exactly... it's just an AbstractA
    val param: AbstractA = anA
    println("Starting with: $param sharedResettable=${param.sharedResettable}")

    // set something on it... we do not mind yet, what it is exactly...
    param.sharedResettable = UUID.randomUUID().toString()

    // now we want to store it... but wait... did we have an A here? or a newly created DTO?
    // lets check: (demo purpose again)
    when (param) {
        is ADTO -> store(param) // which now returns an A
        is A -> update(param)   // maybe updated also our A so a current A is returned
    }.also { certainlyA ->
        println("After saving/updating: $certainlyA sharedResettable=${certainlyA.sharedResettable /* this was deliberately not part of the data class toString() */}")
    }
}
// assume the following signatures for store & update:
fun <T> update(param: T): T
fun store(a: AbstractA): A
Sample output:
Starting with: A(mandatoryProperty=123, sharedReadonly=from backend) sharedResettable=i am from backend
After saving/updating: A(mandatoryProperty=123, sharedReadonly=from backend) sharedResettable=ef7a3dc0-a4ac-47f0-8a73-0ca0ef5069fa
Starting with: ADTO(sharedReadonly=from dto) sharedResettable=i am dto
After saving/updating: A(mandatoryProperty=127, sharedReadonly=from dto) sharedResettable=57b8b3a7-fe03-4b16-9ec7-742f292b5786
I did not yet show you the ugly part, but you already mentioned it yourself... How do you transform your ADTO to A and vice versa? I will leave that up to you. There are several approaches here (manually, using reflection or mapping utilities, etc.).
This variant cleanly separates the DTO-specific from the non-DTO-specific properties. However, it also leads to redundant code (all the overrides, etc.). But at least you know which object type you are operating on and can set up signatures accordingly.
Something like 3 is probably easier to set up and maintain (regarding the data class itself ;-)), and if you set the boundaries correctly it may even be clear when there is a null in there and when not... So I'll show that example too, starting with a rather annoying variant (annoying in the sense that it throws an exception if you access the variable before it has been set), but at least you are spared the !! and null checks here:
data class B(
    val sharedOnly: String,
    var sharedResettable: String
) {
    // why nullable? Let it hurt ;-)
    lateinit var mandatoryProperty: ID // ok... Long is not usable with lateinit... that's why there is this ID instead
}

data class ID(val id: Long)
Demo:
val b = B("backend", "resettable")
// println(b.mandatoryProperty) // uh oh... this hurts now... UninitializedPropertyAccessException on the way
val newB = store(b)
println(newB.mandatoryProperty) // that's fine now...
But: even though accessing mandatoryProperty throws an exception, the property is not visible in the toString, nor does it look nice if you need to check whether it has already been initialized (i.e. by using b::mandatoryProperty.isInitialized).
So I show you another variant (meanwhile my favorite, but... uses null):
data class C(
    val mandatoryProperty: Long?,
    val sharedOnly: String,
    var sharedResettable: String
) {
    // this is our DTO constructor:
    constructor(sharedOnly: String, sharedResettable: String) : this(null, sharedOnly, sharedResettable)

    fun hasID() = mandatoryProperty != null // or isDTO, etc. what you like/need
}

// note: you could extract the val and the method also in its own interface... then you would use an override on the mandatoryProperty above instead
// here is what such an interface may look like:
interface HasID {
    val mandatoryProperty: Long?
    fun hasID() = mandatoryProperty != null // or isDTO, etc. what you like/need
}
Usage:
val c = C("dto", "resettable") // C(mandatoryProperty=null, sharedOnly=dto, sharedResettable=resettable)
when {
    c.hasID() -> update(c)
    else -> store(c)
}.also { newC ->
    // from now on you should know that you are actually dealing with an object that has everything in place...
    println("$newC") // prints: C(mandatoryProperty=123, sharedOnly=dto, sharedResettable=resettable)
}
The last one has the benefit that you can use the copy method again, e.g.:
val myNewObj = c.copy(mandatoryProperty = 123) // well, you probably don't do that yourself...
// but the following might rather be a valid case:
val myNewDTO = c.copy(mandatoryProperty = null)
The last one is my favorite, as it needs the least code and uses a val (so no accidental overwrite is possible; or rather, you operate on a copy instead). You could also just add an accessor for mandatoryProperty if you do not like using ? or !!, e.g.:
fun getMandatoryProperty() = mandatoryProperty ?: throw Exception("You didn't set it!")
Finally, if you have some helper methods like hasID (or isDTO, or whatever) in place, it might also be clear from the context what exactly you are doing. The most important thing is probably to set up a convention that everyone understands, so they know when to apply what, and when to expect something specific.
What is the cleaner way of extracting predicates that will have multiple uses: methods or class fields?
The two examples:
1. Class field
void someMethod() {
    IntStream.range(1, 100)
            .filter(isOverFifty)
            .forEach(System.out::println);
}

private IntPredicate isOverFifty = number -> number > 50;
2. Method
void someMethod() {
    IntStream.range(1, 100)
            .filter(isOverFifty())
            .forEach(System.out::println);
}

private IntPredicate isOverFifty() {
    return number -> number > 50;
}
For me, the field way looks a little bit nicer, but is this the right way? I have my doubts.
Generally you cache things that are expensive to create and these stateless lambdas are not. A stateless lambda will have a single instance created for the entire pipeline (under the current implementation). The first invocation is the most expensive one - the underlying Predicate implementation class will be created and linked; but this happens only once for both stateless and stateful lambdas.
A stateful lambda will use a different instance for each element and it might make sense to cache those, but your example is stateless, so I would not.
If you still want that (for readability purposes, I assume), I would put it in a class called Predicates, say. It would be reusable across different classes as well, something like this:
public final class Predicates {

    private Predicates() {
    }

    public static IntPredicate isOverFifty() {
        return number -> number > 50;
    }
}
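Usage then looks the same wherever the predicate is needed, e.g.:
import java.util.stream.IntStream;

class PredicatesDemo {
    void someMethod() {
        IntStream.range(1, 100)
                .filter(Predicates.isOverFifty())
                .forEach(System.out::println);
    }
}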
You should also note that using Predicates.isOverFifty() inside a Stream and writing x -> x > 50 inline, while semantically the same, have different memory usage.
In the first case, only a single instance (and class) will be created and served to all clients; the second (x -> x > 50) will create not only a different instance, but also a different class for each of its clients (think of the same expression used in different places inside your application). This happens because linkage happens per CallSite - and in the second case the CallSite is always different.
But that is something you should not rely on (and probably should not even consider) - these objects and classes are fast to build and fast for the GC to remove. Whatever fits your needs - use that.
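To make the instance behaviour above concrete, here is a small sketch; note that the result of the first comparison is an implementation detail of the current HotSpot lambda translation, not something guaranteed by the specification:
import java.util.function.IntPredicate;

public class LambdaInstanceDemo {

    // Non-capturing (stateless) lambda: the single call site below
    // typically hands out one cached instance on every evaluation
    static IntPredicate stateless() {
        return number -> number > 50;
    }

    // Capturing (stateful) lambda: every evaluation captures its
    // argument, so a fresh instance is created each time
    static IntPredicate overThreshold(int threshold) {
        return number -> number > threshold;
    }

    public static void main(String[] args) {
        System.out.println(stateless() == stateless());             // true on current HotSpot
        System.out.println(overThreshold(50) == overThreshold(50)); // false
    }
}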
To answer, it helps to expand those lambda expressions into old-fashioned Java. You can then see that these are the two ways we used to write such code. So the answer is: it all depends on how you write a particular code segment.
private IntPredicate isOverFifty = new IntPredicate() {
    @Override
    public boolean test(int number) {
        return number > 50;
    }
};

private IntPredicate isOverFifty() {
    return new IntPredicate() {
        @Override
        public boolean test(int number) {
            return number > 50;
        }
    };
}
1) In the field case, the predicate is allocated anew for every instance of your object. Not a big deal if you have only a few instances, e.g. services. But if this is a value object, of which there can be N, this is not a good solution. Also keep in mind that someMethod() may never be called at all. One possible solution is to make the predicate a static field, as sketched below.
2) In the method case, the predicate is created once per someMethod() call, and the GC discards it afterwards.
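A minimal sketch of that static-field variant:
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

class Numbers {

    // shared across all instances and created once, when the class is initialized
    private static final IntPredicate IS_OVER_FIFTY = number -> number > 50;

    void someMethod() {
        IntStream.range(1, 100)
                .filter(IS_OVER_FIFTY)
                .forEach(System.out::println);
    }
}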
I have been learning Phoenix CSV bulk load recently, and I found that the source code of org.apache.phoenix.mapreduce.CsvToKeyValueReducer can cause an OOM (java heap out of memory) when a row has many large columns (in my case, 44 columns per row with an average row size of 4 KB).
What's more, this class is similar to the HBase bulk load reducer class, KeyValueSortReducer. That means an OOM may also happen when using KeyValueSortReducer in my case.
So, I have a question about KeyValueSortReducer: why does it need to sort all KeyValues in a TreeSet first and then write them all to the context? If I remove the TreeSet sorting code and write all KeyValues directly to the context, will the result be different or wrong?
I am looking forward to your reply. Best wishes to you!
Here is the source code of KeyValueSortReducer:
public class KeyValueSortReducer extends Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, KeyValue> {
    protected void reduce(ImmutableBytesWritable row, java.lang.Iterable<KeyValue> kvs,
            org.apache.hadoop.mapreduce.Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, KeyValue>.Context context)
            throws java.io.IOException, InterruptedException {
        TreeSet<KeyValue> map = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
        for (KeyValue kv : kvs) {
            try {
                map.add(kv.clone());
            } catch (CloneNotSupportedException e) {
                throw new java.io.IOException(e);
            }
        }
        context.setStatus("Read " + map.getClass());
        int index = 0;
        for (KeyValue kv : map) {
            context.write(row, kv);
            if (++index % 100 == 0) context.setStatus("Wrote " + index);
        }
    }
}
Please have a look at this case study. There are some requirements where you need to order KeyValue pairs within the same row in the HFile.
1. The main question: why does HBase's KeyValueSortReducer need to sort all KeyValues?
Thanks to RamPrasad G's reply, we can look into the case study: http://www.deerwalk.com/blog/bulk-importing-data/
This case study tells us more about HBase bulk import and the reducer class, KeyValueSortReducer.
The reason for sorting all KeyValues in the KeyValueSortReducer reduce method is that the HFile needs this ordering. You can focus on this section:
A frequently occurring problem while reducing is lexical ordering. It happens when keyvalue list to be outputted from reducer is not sorted. One example is when qualifier names for a single row are not written in lexically increasing order. Another being when multiple rows are written in same reduce method and row id’s are not written in lexically increasing order. It happens because reducer output is never sorted. All sorting occurs on keyvalue outputted by mapper and before it enters reduce method. So, it tries to add keyvalue’s outputted from reduce method in incremental fashion assuming that it is presorted. So, before keyvalue’s are written into context, they must be added into sorting list like TreeSet or HashSet with KeyValue.COMPARATOR as comparator and then writing them in order specified by sorted list.
So, when a row has very many columns, sorting them will use a lot of memory.
As the source code of KeyValueSortReducer mentions:
/**
 * Emits sorted KeyValues.
 * Reads in all KeyValues from passed Iterator, sorts them, then emits
 * KeyValues in sorted order. If lots of columns per row, it will use lots of
 * memory sorting.
 * @see HFileOutputFormat
 */
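To see why unsorted output would be wrong rather than merely different: the HFile writer rejects cells that arrive out of order. A conceptual sketch (not the actual HBase source) of the check it performs:
import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;

// Conceptual sketch only; the real check lives inside HBase's HFile writer
class OrderedWriterSketch {

    private KeyValue previous;

    void append(KeyValue kv) throws IOException {
        if (previous != null && KeyValue.COMPARATOR.compare(previous, kv) > 0) {
            // roughly the error real HFile writing reports for unsorted input
            throw new IOException("Added a key not lexically larger than previous key");
        }
        previous = kv;
        // ... append kv to the current HFile block here
    }
}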
2. The referenced question: why does the Phoenix CSV BulkLoad reducer cause an OOM?
The reason the Phoenix CSV BulkLoad reducer causes an OOM is the issue referred to in PHOENIX-2649.
Because the Comparator inside CsvTableRowKeyPair compared two CsvTableRowKeyPair objects incorrectly, making all rows go through one single reducer in one single reduce call,
it caused an OOM quickly in my case.
Fortunately, the Phoenix team fixed this issue as of version 4.7. If your Phoenix version is below 4.7, please take note of it and try to update your version,
or you can apply a patch to your version.
I hope this answer helps you!
I am using SOLR 4.4.0, and I found a (possible) issue related to the internal caching mechanism.
JVM: -Xmx15g, but 12 GB was never freed.
I created a heap dump and analyzed it using MemoryAnalyzer - I found 2 x 6 GB used as cache data.
The second time I did the same with -Xmx12g - I found 1 x 3.5 GB.
It was always the same cache.
I checked the source code and found:
/** Expert: The cache used internally by sorting and range query classes. */
public static FieldCache DEFAULT = new FieldCacheImpl();
see http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/4.4.0/org/apache/lucene/search/FieldCache.java#FieldCache.0DEFAULT
This is very bad news, because it is a public static field and it is used in about 160 places in the source code.
MemoryAnalyzer says:
One instance of "org.apache.lucene.search.FieldCacheImpl" loaded by
"org.apache.catalina.loader.WebappClassLoader @ 0x58c3a9848" occupies
4,103,248,240 (80.37%) bytes. The memory is accumulated in one
instance of "java.util.HashMap$Entry[]" loaded by "<system class loader>".
Keywords java.util.HashMap$Entry[]
org.apache.catalina.loader.WebappClassLoader @ 0x58c3a9848
org.apache.lucene.search.FieldCacheImpl
I do not know how to manage this kind of cache - any advice?
In the end I got an OutOfMemoryError, and 12 GB of memory stayed blocked.
I implemented a kind of workaround:
I created this class:
public class InternalApplicationCacheManager implements InternalApplicationCacheManagerMBean {

    public synchronized int getInternalCacheSize() {
        return FieldCache.DEFAULT.getCacheEntries().length;
    }

    public synchronized void purgeInternalCaches() {
        FieldCache.DEFAULT.purgeAllCaches();
    }
}
and registered it in JMX via org.apache.lucene.search.FieldCacheImpl:
...
private synchronized void init() {
    ...
    initBeans();
}

private void initBeans() {
    try {
        InternalApplicationCacheManager cacheManagerMBean = new InternalApplicationCacheManager();
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.apache.lucene.search.jmx:type=InternalApplicationCacheManager");
        mbs.registerMBean(cacheManagerMBean, name);
    } catch (InstanceAlreadyExistsException e) {
        ...
    }
}
...
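The matching MBean interface is not shown above; per the standard JMX naming convention it would presumably look like this:
// Assumed counterpart of the class above: standard JMX requires an
// interface named <ClassName>MBean declaring the exposed operations
public interface InternalApplicationCacheManagerMBean {

    int getInternalCacheSize();

    void purgeInternalCaches();
}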
This solution lets you invalidate the internal caches - which partially solves the issue.
Unfortunately, there are other places (mostly caches) where data is stored and not removed as fast as I expect.
If you use FieldCacheRangeFilter, you may want to try range filters that work without the field cache. If sorting is an issue, you may try using fewer sort fields, or fields with a data type that uses less memory.
The field cache for each reader/atomic reader is thrown away when the reader is garbage collected, so re-initializing the reader should clear the cache - which also means that the first operation using the cache will be a lot slower.
The fact is: FieldCache-based range filters and sorting rely on the cache. There is no getting around that when you really need them. You can only adapt your usage to minimize memory consumption.
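For example, a minimal sketch of a range filter that stays off the FieldCache in Lucene 4.x, assuming a numeric trie field named "price":
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.NumericRangeFilter;

class RangeFilterExample {

    // NumericRangeFilter evaluates against the indexed trie terms rather
    // than the FieldCache (unlike FieldCacheRangeFilter)
    static Filter priceBetween(long min, long max) {
        return NumericRangeFilter.newLongRange("price", min, max, true, true);
    }
}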
I was under the impression that the most costly method in Jsoup's API is parse().
But I just discovered that Document.html() could be even slower.
Given that the Document is the output of parse() (i.e. this is after parsing), I find this surprising.
Why is Document.html() so slow?
Answering myself. The Element.html() method is implemented as:
public String html() {
    StringBuilder accum = new StringBuilder();
    html(accum);
    return accum.toString().trim();
}
Using StringBuilder instead of String is already a good thing, and the use of StringBuilder.toString() and String.trim() may not explain the slowness of Document.html(), even for a relatively large document.
But in the middle, our method calls an overloaded version, Element.html(StringBuilder), which loops through all child nodes in the document:
private void html(StringBuilder accum) {
    for (Node node : childNodes)
        node.outerHtml(accum);
}
Thus if the document contains lots of child nodes, it will be slow.
It would be interesting to see whether there could be a faster implementation of this.
For example, Jsoup could store a cached version of the raw HTML that was provided to it via Jsoup.parse() - as an option, of course, to maintain backward compatibility and a small memory footprint.
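Pending such a feature, a caller can keep the raw input around itself. A hypothetical wrapper (not part of Jsoup; the names are made up) sketching the idea:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper, not part of Jsoup: keeps the raw input so callers
// can skip the expensive Document.html() when the document was never modified
public final class ParsedDocument {

    private final String rawHtml;
    private final Document document;

    public ParsedDocument(String rawHtml) {
        this.rawHtml = rawHtml;
        this.document = Jsoup.parse(rawHtml);
    }

    public Document document() {
        return document;
    }

    // cheap, but only valid while the document has not been mutated since parsing
    public String rawHtml() {
        return rawHtml;
    }
}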