Performance problems with Redis/Spring Data Redis

I have a performance problem in my Spring Boot application when it's communicating with Redis that I was hoping someone with expertise on the topic could shed some light on.
Explanation of what I'm trying to do
In short, in my application I have 2 nested maps and 3 maps of lists which I want to save to Redis and load back into the application when the data is needed. The data in the first nested map is fairly big, with several levels of non-primitive data types (and lists of these). At the moment I have structured the data in Redis using repositories and Redis hashes, with repositories A, B, and C, and two different ways of looking up the primary data type (MyClass) in A by id. B and C hold data that is referenced from values in A (via the @Reference annotation).
Performance analysis
Using JProfiler, I have found that the bottleneck is somewhere between my call to a.findOne() and the end of reading the response from Redis (before any conversion from byte[] to MyClass has taken place). I have looked at the slowlog on my Redis server to check for slow or blocking actions and found none. Each HGETALL command in Redis takes 400μs on average (for a complete hash in A, including finding the referenced hashes in B and C). What strikes me as weird is that timing the a.findOne() call shows 5-20ms for one single instance of MyClass, depending on how big the hashes in B and C are. A single instance has on average ~2500 hash fields in total when references to B and C are included. When this is done ~900 times for the first nested map, I have to wait 10s to get all my data, which is way too long. In comparison, the other nested map, which has no references to C (the biggest part of the data), is timed at ~10μs in Redis and <1ms in Java.
Does this analysis seem like normal behavior when the Redis instance is run locally on the same 2015 MacBook Pro as the Spring Boot application? I understand that it will take longer for the complete findOne() method to finish than the actual HGETALL command in Redis, but I don't get why the difference is this big. If anyone could shed some light on the performance of the stuff going on under the hood in the Jedis connection code, I'd appreciate it.
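For concreteness, the 5-20ms timing of a.findOne() mentioned above can be captured with a simple wall-clock probe like this (a minimal sketch; the variable names are illustrative):
long start = System.nanoTime();
MyClass result = a.findOne(id);  // Spring Data Redis repository lookup, including @Reference resolution
long elapsedMs = (System.nanoTime() - start) / 1_000_000;
System.out.println("findOne took " + elapsedMs + " ms");  // 5-20ms here vs ~400μs for the HGETALL in Redis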
Examples of my data structure in Java
#RedisHash("myClass")
public class MyClass {
#Id
private String id;
private Date date;
private Integer someValue;
#Reference
private Set<C> cs;
private someClass someObject;
private int somePrimitive;
private anotherClass anotherObject;
#Reference
private B b;
Excerpt of class C (a few primitives removed for clarity):
#RedisHash("c")
public class C implements Comparable<BasketValue>, Serializable {
#Id
private String id;
private EnumClass someEnum;
private aClass anObject;
private int aNumber;
private int anotherNumber;
private Date someDate;
private List<CounterClass> counterObjects;
Excerpt of class B:
#RedisHash("b")
public class B implements Serializable {
#Id
private int code;
private String productCodes;
private List<ListClass> listObject;
private yetAnotherClass yetAnotherObject;
private Integer someInteger;

Related

Is Guava's HashFunction threadsafe?

Is HashFunction in the Guava library thread-safe?
import com.google.common.base.Charsets;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;

static HashFunction hashFunction = Hashing.sha256();

private static String getHashCredentials(String credentials) {
    return hashFunction.newHasher()
            .putString(credentials, Charsets.UTF_8)
            .hash()
            .toString();
}
Yes, if you're using built-in HashFunctions, they're pure functions -- see the documentation page for HashFunction:
A hash function is a collision-averse pure function that maps an arbitrary block of data to a number called a hash code.
Unpacking this definition:
(...)
pure function: the value produced must depend only on the input bytes, in the order they appear. Input data is never modified.
HashFunction instances should always be stateless, and therefore thread-safe.
Bear in mind that because HashFunction is an interface, you could create a stateful and non-thread-safe implementation, but that would break the contract.
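Since the built-in implementations are stateless, one shared instance can safely be used from any number of threads. A minimal sketch (the class name and string values are illustrative only):
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
import java.util.stream.IntStream;

public class SharedHashDemo {
    // One shared, stateless HashFunction for the whole application.
    private static final HashFunction HASH = Hashing.sha256();

    public static void main(String[] args) {
        // Hash concurrently from many threads; each call is independent.
        IntStream.range(0, 100).parallel()
                .mapToObj(i -> HASH.hashString("credential-" + i, StandardCharsets.UTF_8).toString())
                .forEach(System.out::println);
    }
}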

Java 8: Extract predicates as fields or methods?

What is the cleaner way to extract predicates that will be used multiple times: methods or class fields?
The two examples:
1. Class field
void someMethod() {
    IntStream.range(1, 100)
            .filter(isOverFifty)
            .forEach(System.out::println);
}

private IntPredicate isOverFifty = number -> number > 50;
2. Method
void someMethod() {
    IntStream.range(1, 100)
            .filter(isOverFifty())
            .forEach(System.out::println);
}

private IntPredicate isOverFifty() {
    return number -> number > 50;
}
For me, the field way looks a little bit nicer, but is this the right way? I have my doubts.
Generally you cache things that are expensive to create and these stateless lambdas are not. A stateless lambda will have a single instance created for the entire pipeline (under the current implementation). The first invocation is the most expensive one - the underlying Predicate implementation class will be created and linked; but this happens only once for both stateless and stateful lambdas.
A stateful lambda will use a different instance for each element and it might make sense to cache those, but your example is stateless, so I would not.
If you still want that (for readability purposes, I assume), I would put it in a class called, say, Predicates. It would then be re-usable across different classes as well, something like this:
public final class Predicates {

    private Predicates() {
    }

    public static IntPredicate isOverFifty() {
        return number -> number > 50;
    }
}
You should also note that Predicates.isOverFifty() used inside a Stream and x -> x > 50, while semantically the same, have different memory usage.
In the first case, only a single instance (and class) will be created and served to all clients; the second (x -> x > 50) will create not only a different instance, but also a different class for each of its clients (think of the same expression used in different places inside your application). This happens because linkage happens per CallSite - and in the second case each CallSite is different.
But that is something you should not rely on (or probably even consider) - these objects and classes are fast to build and fast to reclaim by the GC - whatever fits your needs, use that.
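The per-CallSite behavior can be made visible with a small sketch using the Predicates class from above (this demonstrates the current implementation's behavior and, as said, should not be relied upon):
import java.util.function.IntPredicate;

public class CallSiteDemo {
    public static void main(String[] args) {
        IntPredicate a = x -> x > 50;  // one CallSite
        IntPredicate b = x -> x > 50;  // textually identical, but a different CallSite
        System.out.println(a.getClass() == b.getClass());  // false: two generated classes

        // The factory method contains a single CallSite, so the same stateless
        // instance is handed out on every call (under the current implementation).
        System.out.println(Predicates.isOverFifty() == Predicates.isOverFifty());  // true
    }
}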
To see why, it helps to expand those lambda expressions into old-fashioned Java. Written out as anonymous classes, these are the two forms used in the code above; the answer is that it all depends on how a particular code segment is used.
private IntPredicate isOverFifty = new IntPredicate() {
    @Override
    public boolean test(int number) {
        return number > 50;
    }
};

private IntPredicate isOverFifty() {
    return new IntPredicate() {
        @Override
        public boolean test(int number) {
            return number > 50;
        }
    };
}
1) In the field case, a predicate is allocated for every new instance of your object. Not a big deal if you have a few instances, like a service. But if this is a value object of which there can be N instances, this is not a good solution. Also keep in mind that someMethod() may never be called at all. One possible solution is to make the predicate a static field.
2) In the method case, a new predicate is created on every someMethod() call; afterwards the GC will discard it.
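As noted in 1), the per-instance allocation can be avoided by making the predicate a static field, e.g.:
// Allocated once at class initialization and shared by all instances.
private static final IntPredicate IS_OVER_FIFTY = number -> number > 50;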

Creating composite key class for Secondary Sort

I am trying to create a composite key class of a String uniqueCarrier and an int month for secondary sort. Can anyone tell me the steps to do this?
Looks like you have an equality problem, since you're not using uniqueCarrier in your compareTo method. You need to use uniqueCarrier in your compareTo and equals methods (also define an equals method). From the java.lang.Comparable documentation:
The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null) should throw a NullPointerException even though e.equals(null) returns false.
You can also implement a RawComparator so that keys can be compared without deserializing them, for somewhat faster performance.
However, I recommend (as I always do) not writing things like secondary sort yourself. These have been implemented (along with dozens of other optimizations) in projects like Pig and Hive. E.g., if you were using Hive, all you would need to write is:
SELECT ...
FROM my_table
ORDER BY month, carrier;
The above is a lot simpler to write than figuring out how to write secondary sorts (and, when you eventually need them again, how to do it in a generic fashion). MapReduce should be considered a low-level programming paradigm and should only be used (IMHO) when you need high-performance optimizations that you don't get from higher-level projects like Pig or Hive.
EDIT: Forgot to mention grouping comparators; see Matt's answer.
Your compareTo() implementation is incorrect. You need to sort first on uniqueCarrier, then on month to break equality:
@Override
public int compareTo(CompositeKey other) {
    if (this.getUniqueCarrier().equals(other.getUniqueCarrier())) {
        return this.getMonth().compareTo(other.getMonth());
    } else {
        return this.getUniqueCarrier().compareTo(other.getUniqueCarrier());
    }
}
One suggestion though: I typically choose to implement my attributes directly as Writable types if possible (for example, IntWritable month and Text uniqueCarrier). This allows me to call write and readFields directly on them, and also to use their compareTo. Less code to write is always good...
Speaking of less code, you don't have to call the parent constructor for your composite key.
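To make the Writable-attributes suggestion concrete, here is a sketch of such a CompositeKey (field names are taken from the question; error handling omitted):
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class CompositeKey implements WritableComparable<CompositeKey> {
    // Writable attributes serialize and compare themselves.
    private final Text uniqueCarrier = new Text();
    private final IntWritable month = new IntWritable();

    public Text getUniqueCarrier() { return uniqueCarrier; }
    public IntWritable getMonth() { return month; }

    @Override
    public void write(DataOutput out) throws IOException {
        uniqueCarrier.write(out);
        month.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        uniqueCarrier.readFields(in);
        month.readFields(in);
    }

    @Override
    public int compareTo(CompositeKey other) {
        // Sort on uniqueCarrier first, then on month to break ties.
        int cmp = uniqueCarrier.compareTo(other.uniqueCarrier);
        return cmp != 0 ? cmp : month.compareTo(other.month);
    }
}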
Now for what is left to be done:
My guess is you are still missing a hashCode() method, which should only return the hash of the attribute you want to group on, in this case uniqueCarrier. This method is called by the default Hadoop partitioner to distribute work across reducers.
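A minimal sketch of that hashCode() (again assuming the field names from the question):
@Override
public int hashCode() {
    // Hash only on uniqueCarrier so the default HashPartitioner sends all
    // records for a given carrier to the same reducer.
    return getUniqueCarrier().hashCode();
}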
I would also write a custom GroupingComparator and SortingComparator to make sure grouping happens only on uniqueCarrier, and that sorting behaves according to CompositeKey's compareTo():
public class CompositeGroupingComparator extends WritableComparator {

    public CompositeGroupingComparator() {
        super(CompositeKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        CompositeKey first = (CompositeKey) a;
        CompositeKey second = (CompositeKey) b;
        return first.getUniqueCarrier().compareTo(second.getUniqueCarrier());
    }
}

public class CompositeSortingComparator extends WritableComparator {

    public CompositeSortingComparator() {
        super(CompositeKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        CompositeKey first = (CompositeKey) a;
        CompositeKey second = (CompositeKey) b;
        return first.compareTo(second);
    }
}
Then, tell your Driver to use those two:
job.setSortComparatorClass(CompositeSortingComparator.class);
job.setGroupingComparatorClass(CompositeGroupingComparator.class);
Edit: Also see Pradeep's suggestion of implementing a RawComparator to avoid having to unmarshal to an object each time, if you want to optimize further.

Performance analysis of using protobuff builders as general data object

Since protocol buffers are a wonderful alternative to Java serialization, we have used them extensively. We have also used the generated builders as general data objects. Examining the speed of constructing an object via the message builder versus a plain Java object with primitive fields, we found that for an object containing 6 primitive fields, construction via the builder took 1.1ms whereas plain Java primitives took only 0.3ms - and that was for a list of 50 such objects! Are builders so heavy that using them as general data objects affects construction speed to this extent?
Below is the sample design I used for the analysis:
message PersonList
{
    repeated Person person = 1;

    message Person
    {
        optional string name = 1;
        optional int32 age = 2;
        optional string place = 3;
        optional bool alive = 4;
        optional string profession = 5;
    }
}
The Java equivalent:
class PersonList {
    List<Person> personList;

    class Person {
        String name;
        int age;
        String place;
        boolean alive;
        String profession;
    }

    /* getters and setters */
}
I have a hard time imagining anything that contains only "6 primitive values" could take 7ms to construct. That's perhaps 100,000 times as long as it should take. So I'm not sure I understand what you're doing.
That said, protobuf builders are indeed more complicated than a typical POJO for a number of reasons. For example, protobuf objects keep track of which fields are currently set. Also, repeated primitives are boxed, which makes them pretty inefficient compared to a Java primitive array. So if you measure construction time alone you may see a significant difference. However, these effects are typically irrelevant compared to the time spent in the rest of your app's code.
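For reference, the two construction paths being compared look roughly like this (a sketch; the generated PersonList.Person class comes from the .proto above, and the setter names on the plain class are assumed):
// Builder path: the generated builder tracks which optional fields are set
// and validates on build() - bookkeeping a plain POJO never does.
PersonList.Person viaBuilder = PersonList.Person.newBuilder()
        .setName("Alice")
        .setAge(30)
        .setPlace("Oslo")
        .setAlive(true)
        .setProfession("Engineer")
        .build();

// POJO path: plain field assignment, no bookkeeping.
Person pojo = new Person();
pojo.setName("Alice");
pojo.setAge(30);
pojo.setPlace("Oslo");
pojo.setAlive(true);
pojo.setProfession("Engineer");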

Class: Immutability vs Not Extensible

I was reading in SO threads and also in an article that there are many reasons for making a class final.
Two of these were:
1. to remove extensibility
2. to make the class immutable.
Does making a class immutable carry with it the characteristic of being final (its methods)? I don't see the difference between the two.
An immutable object does not allow its state to be changed. A final class cannot be inherited from. For example, class Foo (see below) is immutable (the state, i.e. _name, is never changed) and class Bar is mutable (the rename method allows the state to change):
final class Foo
{
    private String _name;

    public Foo(String name)
    {
        _name = name;
    }

    public String getName()
    {
        return _name;
    }
}

final class Bar
{
    private String _name;

    public Bar(String name)
    {
        _name = name;
    }

    public String getName()
    {
        return _name;
    }

    public void rename(String newName)
    {
        _name = newName;
    }
}
It can sometimes be useful to recognize types as "verifiably deeply immutable", meaning that static analysis can demonstrate that (1) once an instance is constructed, none of its properties will ever change, and (2) every object instance to which it holds a reference is verifiably deeply immutable. Classes which are open to extension cannot be verifiably deeply immutable, because a static analyzer would have no way of knowing whether a mutable subclass might be created, and a reference to that mutable subclass stored within what's supposed to be a verifiably deeply immutable object.
On the other hand, it can sometimes be useful to have abstract (and thus extensible) classes which are specified to be deeply immutable. The abstract class would have no way of forcing derived classes to be immutable, but any mutable derived class should be considered "broken". The situation is somewhat analogous to the requirement that two object instances which report themselves as "equal" to each other should report the same hash code. It's possible to design classes which violate that requirement, but any errant hash-table behavior that results is the fault of the broken hash-code function, rather than of the hash table.
For example, one might have an abstract ImmutableMatrix class with a method to read the element at a given (row, column) location. One possible implementation would be to back an NxM ImmutableMatrix with an array of N*M elements. On the other hand, it may also be useful to define subclasses like ImmutableDiagonalMatrix, backed by an array of N elements, where Value(R,C) yields 0 for R!=C and Arr[R] for R==C. If a significant fraction of the matrices one uses are diagonal, one could save a lot of memory for each such instance. While leaving the class extensible would leave open the possibility that someone might extend it in a fashion which is open to mutation, it would also leave open the possibility that a programmer who knew that many of the matrices a program used would fit some particular form could design a class to optimally store that form.
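A minimal sketch of that design (the names follow the paragraph above; bounds checks omitted):
// Specified to be deeply immutable; a mutable subclass would break the contract.
public abstract class ImmutableMatrix {
    public abstract int rows();
    public abstract int columns();
    public abstract double value(int row, int column);
}

// General NxM implementation backed by an array of N*M elements.
final class ImmutableArrayMatrix extends ImmutableMatrix {
    private final int rows, columns;
    private final double[] elements;

    ImmutableArrayMatrix(int rows, int columns, double[] elements) {
        this.rows = rows;
        this.columns = columns;
        this.elements = elements.clone();  // defensive copy keeps the state frozen
    }

    @Override public int rows() { return rows; }
    @Override public int columns() { return columns; }
    @Override public double value(int row, int column) {
        return elements[row * columns + column];
    }
}

// Diagonal specialization that stores only the N diagonal elements.
final class ImmutableDiagonalMatrix extends ImmutableMatrix {
    private final double[] diagonal;

    ImmutableDiagonalMatrix(double[] diagonal) {
        this.diagonal = diagonal.clone();
    }

    @Override public int rows() { return diagonal.length; }
    @Override public int columns() { return diagonal.length; }
    @Override public double value(int row, int column) {
        return row == column ? diagonal[row] : 0.0;  // zero off the diagonal
    }
}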
