Store EhCache Object as value not reference? - caching

I'm using EhCache all over the application and I have stumbled upon a problem.
I need to cache "raw" data (a tree of maps and some lists). The cached value, after being retrieved from the cache, is meant to be processed further (some elements filtered out, reordered, etc.).
My problem is that I want to keep the original cached value intact, as it is meant to be used for some "post processing". So ultimately I want to store the object's "value" (a deep clone), not its reference.
Example code:
//create a List of Maps
List<Map<String, String>> list = new ArrayList<Map<String, String>>();
Map<String, String> map = new HashMap<String, String>();
map.put("key1", "v1");
map.put("key2", "v2");
list.add(map);
//add to cache
cache.put("cacheRegion", "list", list);
//now add a new element to the list (a 2nd map)
list.add(new TreeMap<String, String>());
//now remove 1 entry from the 1st map
map = list.get(0);
map.remove("key1");
//retrieve from the cache - the cached value should not see the changes above
list = (List<Map<String, String>>) cache.get("cacheRegion", "list");
assertEquals("list should still have 1 element, despite adding new map after cache put", 1, list.size());
//check map
map = list.get(0);
assertEquals("map should still contain 2 entries, as it was added to the cache", 2, map.size());
Does EhCache support that?
M

There is an attribute in Ehcache, copyOnRead (and its counterpart copyOnWrite), which can be set to true for this.
The cache configuration will look something like:
<cache name="copyCache"
       maxElementsInMemory="10"
       eternal="false"
       timeToIdleSeconds="5"
       timeToLiveSeconds="10"
       overflowToDisk="false"
       copyOnRead="true"
       copyOnWrite="true">
</cache>
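For illustration, here is a minimal sketch of the question's test against the plain Ehcache 2.x API (the region-style cache.put/get calls above look like a wrapper around it), assuming the copyCache configuration shown is in an ehcache.xml on the classpath:
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CopyCacheExample {
    public static void main(String[] args) {
        // Loads ehcache.xml (containing the copyCache definition above) from the classpath.
        CacheManager cacheManager = CacheManager.newInstance();
        Cache cache = cacheManager.getCache("copyCache");

        List<Map<String, String>> list = new ArrayList<>();
        Map<String, String> map = new HashMap<>();
        map.put("key1", "v1");
        map.put("key2", "v2");
        list.add(map);

        // copyOnWrite stores a copy, so mutating 'list' and 'map' afterwards does not touch the cached value.
        cache.put(new Element("list", list));
        list.add(new HashMap<>());
        map.remove("key1");

        // copyOnRead hands back a fresh copy on every get.
        @SuppressWarnings("unchecked")
        List<Map<String, String>> cached = (List<Map<String, String>>) cache.get("list").getObjectValue();
        System.out.println(cached.size());        // 1
        System.out.println(cached.get(0).size()); // 2

        cacheManager.shutdown();
    }
}
Note that the default copy strategy copies by serialization, so the cached value and everything it references must be Serializable (the JDK collections used here are).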

Related

Get a list of entities by a list of attributes in Spring Boot, e.g. findAllByIdIn(List<Integer> ids)

I need help getting a list of entities by a list of attributes, say "id", "name", etc. Below is what I have so far; I'm expecting three entities but the result shows only one. The logic of the code below is basically to grab all entries with LOGIN as the eventType and the given customer id from the event repo. We then collect the eventDetailsIds from the result into a separate collection and search for them (findByIdIn) in the LoginEventRepo. EventDetailsId is more or less a foreign key that joins the event repo and the LoginEvent repo. P.S. I have implemented this manually, but I just need a more concise approach. Thanks.
public List<CustomerEventResponse> getAllCustomerLogins(String customerId) {
    List<CustomerEventResponse> customerEventResponses;
    List<EventEntity> eventEntityList = eventRepository.findByCustomerIdAndEventType(customerId, LOGIN_EVENT);
    List<LoginEventDetailsEntity> loginEventDetailsEntityList = new ArrayList<>();
    List<Integer> eventDetailsIds = new ArrayList<>();
    for (EventEntity e : eventEntityList) {
        eventDetailsIds.add(e.getEventDetailsId());
        Optional<LoginEventDetailsEntity> loginEventDetails = loginEventDetailsRepository.findById(e.getEventDetailsId());
        loginEventDetails.ifPresent(loginEventDetailsEntityList::add);
    }
    List<LoginEventDetailsEntity> loginDetails = loginEventDetailsRepository.findAllById(eventDetailsIds);
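A minimal sketch of the derived-query approach from the title (findByIdIn), assuming Spring Data JPA and the entity/repository names used in the snippet above; the mapping to CustomerEventResponse is left out:
import java.util.List;
import java.util.stream.Collectors;
import org.springframework.data.jpa.repository.JpaRepository;

public interface LoginEventDetailsRepository extends JpaRepository<LoginEventDetailsEntity, Integer> {
    // Derived query: ... WHERE id IN (:ids), equivalent to the built-in findAllById for this case.
    List<LoginEventDetailsEntity> findByIdIn(List<Integer> ids);
}

public List<LoginEventDetailsEntity> getAllLoginDetails(String customerId) {
    List<Integer> eventDetailsIds = eventRepository
            .findByCustomerIdAndEventType(customerId, LOGIN_EVENT)
            .stream()
            .map(EventEntity::getEventDetailsId)
            .collect(Collectors.toList());
    return loginEventDetailsRepository.findByIdIn(eventDetailsIds);
}
Keep in mind that findByIdIn (like findAllById) only returns rows that actually exist, so if some eventDetailsId values have no matching row in the login details table, the result list will be shorter than the id list.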

Spring Data Elasticsearch without entity fields

I'm using Spring Data Elasticsearch. My documents do not have any static fields, and data is accumulated per quarter; I will be getting ~6 GB per quarter (we call them versions). Let's say we get 5 GB of data in Jan 2021 with 140 columns; in the next version I may get 130 or 120 columns, which we do not know in advance. The end-user requirement is to get the information from the database, show it in a tabular format, and let the user filter the data. In MongoDB we have BasicDBObject; do we have anything similar in Spring Boot with Elasticsearch?
I can provide, let's say, 4-5 columns which are common in every version record; apart from that, I need to retrieve the data without mentioning the column names in the POJO, and I need to use filters on them, just like I can do in MongoDB:
List<BaseClass> getMultiSearch(@RequestBody Map<String, Object>[] attributes) {
    Query orQuery = new Query();
    Criteria orCriteria = new Criteria();
    List<Criteria> orExpression = new ArrayList<>();
    for (Map<String, Object> accounts : attributes) {
        Criteria expression = new Criteria();
        accounts.forEach((key, value) -> expression.and(key).is(value));
        orExpression.add(expression);
    }
    orQuery.addCriteria(orCriteria.orOperator(orExpression.toArray(new Criteria[orExpression.size()])));
    return mongoOperations.find(orQuery, BaseClass.class);
}
You can define an entity class for example like this:
public class GenericEntity extends LinkedHashMap<String, Object> {
}
Then have it returned at your calling site:
public SearchHits<GenericEntity> allGeneric() {
    var criteria = Criteria.where("fieldname").is("value");
    Query query = new CriteriaQuery(criteria);
    return operations.search(query, GenericEntity.class, IndexCoordinates.of("indexname"));
}
But notice: when writing data into Elasticsearch, the mapping for new fields/properties in that index will be dynamically updated, and there is a limit on how many entries a mapping can have (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-settings-limit.html), so take care not to run into that limit.
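To turn the result into the tabular view the question asks for, the hits can simply be iterated; a small sketch using the allGeneric() method above (SearchHit/SearchHits come from org.springframework.data.elasticsearch.core, and the row handling is just a placeholder):
SearchHits<GenericEntity> hits = allGeneric();
for (SearchHit<GenericEntity> hit : hits.getSearchHits()) {
    // Each GenericEntity is a LinkedHashMap of whatever fields the document happens to have.
    GenericEntity row = hit.getContent();
    row.forEach((column, value) -> System.out.print(column + "=" + value + "  "));
    System.out.println();
}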

Spark RDD to update

I am loading a file from HDFS into a JavaRDD and want to update that RDD. For that I am converting it to an IndexedRDD (https://github.com/amplab/spark-indexedrdd), but I am not able to, as I am getting a ClassCastException.
Basically I will make key-value pairs and update by key. IndexedRDD supports updates. Is there any way to convert?
JavaPairRDD<String, String> mappedRDD = lines.flatMapToPair(new PairFlatMapFunction<String, String, String>() {
    @Override
    public Iterable<Tuple2<String, String>> call(String arg0) throws Exception {
        String[] arr = arg0.split(" ", 2);
        System.out.println("length " + arr.length);
        List<Tuple2<String, String>> results = new ArrayList<Tuple2<String, String>>();
        // emit the (key, value) pair from the split line
        results.add(new Tuple2<String, String>(arr[0], arr[1]));
        return results;
    }
});
IndexedRDD<String, String> test = (IndexedRDD<String, String>) mappedRDD.collectAsMap();
collectAsMap() returns a java.util.Map containing all the entries from your JavaPairRDD, but nothing related to Spark: that function collects the values onto one node so you can work with them in plain Java. Therefore, you cannot cast it to IndexedRDD or any other RDD type, as it's just a normal Map.
I haven't used IndexedRDD, but from the examples you can see that you need to create it by passing a PairRDD to its constructor:
// Create an RDD of key-value pairs with Long keys.
val rdd = sc.parallelize((1 to 1000000).map(x => (x.toLong, 0)))
// Construct an IndexedRDD from the pairs, hash-partitioning and indexing
// the entries.
val indexed = IndexedRDD(rdd).cache()
So in your code it should be:
IndexedRDD<String,String> test = new IndexedRDD<String,String>(mappedRDD.rdd());

Elasticsearch and Spark: Updating existing entities

What is the correct way, when using Elasticsearch with Spark, to update existing entities?
I wanted to do something like the following:
Get existing data as a map.
Create a new map, and populate it with the updated fields.
Persist the new map.
However, there are several issues:
The list of returned fields cannot contain the _id, as it is not part of the source.
If, for testing, I hardcode an existing _id in the map of new values, the following exception is thrown:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest
How should the _id be retrieved, and how should it be passed back to Spark?
I include the following code below to better illustrate what I was trying to do:
JavaRDD<Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, INDEX_NAME + "/" + TYPE_NAME,
        "?source=field1,field2").values();
Iterator<Map<String, Object>> iter = esRDD.toLocalIterator();
List<Map<String, Object>> listToPersist = new ArrayList<Map<String, Object>>();
while (iter.hasNext()) {
    Map<String, Object> map = iter.next();
    // Get existing values, and do transformation logic
    Map<String, Object> newMap = new HashMap<String, Object>();
    newMap.put("_id", ??????);
    newMap.put("field1", new_value);
    listToPersist.add(newMap);
}
JavaRDD javaRDD = jsc.parallelize(ImmutableList.copyOf(listToPersist));
JavaEsSpark.saveToEs(javaRDD, INDEX_NAME + "/" + TYPE_NAME);
Ideally, I would want to update the existing map in place, rather than create a new one.
Does anyone have any example code to show, when using Spark, the correct way to update existing entities in elasticsearch?
Thanks
This is how I've done it (Scala/Spark 2.3/Elastic-Hadoop v6.5).
To read (id or other metadata):
spark
  .read
  .format("org.elasticsearch.spark.sql")
  .option("es.read.metadata", true) // allow reading metadata
  .load("yourindex/yourtype")
  .select(col("_metadata._id").as("myId"), ...)
To update particular columns in ES:
myDataFrame
  .select("myId", "columnToUpdate")
  .saveToEs(
    "yourindex/yourtype",
    Map(
      "es.mapping.id" -> "myId",
      "es.write.operation" -> "update", // important: change the operation to a partial update
      "es.mapping.exclude" -> "myId"
    )
  )
Try adding this upsert setting to your Spark config:
.config("es.write.operation", "upsert")
That will let you add new fields to existing documents.
According to the Elasticsearch configuration documentation, you can get document metadata like _id by setting the read metadata option to true:
.config("es.read.metadata", "true")
And I think you cannot use '_id' as a field name.
But you can create a new field with a different name, like:
newMap.put("idfield", yourId);
then set the name of the new field as the value of the mapping id option, to inform Elasticsearch that this field holds the document id:
.config("es.mapping.id", "idfield")
BTW, don't forget to set the write operation to update:
.config("es.write.operation", "update")
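Putting these pieces together, a minimal Java sketch of the whole round trip might look like the following; it assumes the elasticsearch-hadoop JavaEsSpark API used in the question, and idfield and the field1 transformation are placeholders:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import scala.Tuple2;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;
import com.google.common.collect.ImmutableMap;

// Sketch only: read the documents, carry the id in a normal field, write back as a partial update.
SparkConf conf = new SparkConf()
        .setAppName("es-partial-update")
        .set("es.read.metadata", "true");              // also exposes a _metadata entry in each document map (per the answer above)
JavaSparkContext jsc = new JavaSparkContext(conf);

List<Map<String, Object>> listToPersist = new ArrayList<>();
// esRDD returns pairs of (document _id, source map)
for (Tuple2<String, Map<String, Object>> hit :
        JavaEsSpark.esRDD(jsc, INDEX_NAME + "/" + TYPE_NAME).collect()) {
    Map<String, Object> newMap = new HashMap<>();
    newMap.put("idfield", hit._1());                   // the pair key from esRDD is the document _id
    newMap.put("field1", "new_value");                 // transformed value (placeholder)
    listToPersist.add(newMap);
}

JavaEsSpark.saveToEs(
        jsc.parallelize(listToPersist),
        INDEX_NAME + "/" + TYPE_NAME,
        ImmutableMap.of(
                "es.mapping.id", "idfield",            // this field carries the document id
                "es.mapping.exclude", "idfield",       // don't write the id field into _source
                "es.write.operation", "update"));      // partial update instead of a full index
This keeps the collect-to-the-driver shape of the original snippet; for larger data sets the same es.* options can be applied to an RDD built with a map transformation instead of collecting.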

How do I pass multiple "bq" arguments to LocalParams in SolrNet?

LocalParams is really just a Dictionary<string, string> behind the scenes.
However, I want to pass multiple Boost Queries, which use the key "bq". Obviously, any attempt to add my second "bq" key will fail with "An item with the same key has already been added."
var lp = new LocalParams();
lp.Add("bq", "ContentType:Update^3.0");
lp.Add("bq", "ContentType:Comment^0.5"); // Error occurs here...
What's the trick to passing multiple Boost Queries (or multiple anything, really)?
The comment above set me onto ExtraParams.
I thought it wouldn't work since that was a Dictionary<string, string> (thus leaving me in the same situation), but the actual property definition is IEnumerable<KeyValuePair<string, string>>. It's just set to a Dictionary<string,string> in the constructor.
So I did this:
var extraParams = new List<KeyValuePair<string, string>>();
extraParams.Add(new KeyValuePair<string, string>("bq", "SomeQuery^10"));
extraParams.Add(new KeyValuePair<string, string>("bq", "SomeOtherQuery^10"));

var options = new QueryOptions();
options.ExtraParams = extraParams; // since my List implements the right interface
solr.Query(myQuery, options);
My testing shows that it works as intended.
