Ignite 2.4.0 - SqlQuery results do not match with results of query from H2 console - spring

We implemented a caching solution using Ignite 2.0.0 for a data structure that looks like this:
public class EntityPO {
@QuerySqlField(index = true)
private Integer accessZone;
@QuerySqlField(index = true)
private Integer appArea;
@QuerySqlField(index = true)
private Integer parentNodeId;
private Integer dbId;
}
List<EntityPO> nodes = new ArrayList<>();
SqlQuery<String, EntityPO> sql =
new SqlQuery<>(EntityPO.class, "accessZone = ? and appArea = ? and parentNodeId is not null");
sql.setArgs(accessZoneId, appArea);
CacheConfiguration<String, EntityPO> cacheconfig = new
CacheConfiguration<>(cacheName);
cacheconfig.setCacheMode(CacheMode.PARTITIONED);
cacheconfig.setAtomicityMode(CacheAtomicityMode.ATOMIC);
cacheconfig.setIndexedTypes(String.class, EntityPO.class);
cacheconfig.setOnheapCacheEnabled(true);
cacheconfig.setBackups(numberOfBackUpCopies);
cacheconfig.setName(cacheName);
cacheconfig.setQueryParallelism(1);
cache = ignite.getOrCreateCache(cacheconfig);
We have a method that looks for nodes in a particular accessZone and appArea. This method works fine in 2.0.0, but after we upgraded to the latest version, 2.4.0, it no longer returns anything (zero records). We enabled the H2 debug console and ran the same query there, and it returns at least 3k records. Downgrading the library back to 2.0.0 makes the code work again. Please let me know if you need more information to help with this question.
Results from H2 console.

If you use persistence, please check the baseline topology of your cluster.
Baseline topology is a major feature introduced in version 2.4.
Briefly, the baseline topology is the set of server nodes that can store data. Most probably, the cause of your issue is that one or more server nodes are not in the baseline.

Related

Elastic-Cloud Not Receiving Data from Serilog Sink

I set up an Elastic Cloud to offload my local elasticsearch config (as one does), but for reasons unknown to me, I can't get it to show any logs in Elastic Cloud, despite it working fine locally.
The code I have (modified for privacy reasons):
//var uri = new Uri("http://localhost:9200"); // old one
var uri = new Uri("https://my-server.kb.eastus2.azure.elastic-cloud.com:9243");
var sinkOptions = new ElasticsearchSinkOptions(uri)
{
AutoRegisterTemplate = true,
ModifyConnectionSettings = x => x.BasicAuthentication("elastic", "the password I was given"),
IndexFormat = $"test-logs-{env.EnvironmentName?.ToLower().Replace('.', '-')}-{DateTime.Now:yyyy-MM}",
};
Log.Logger = new LoggerConfiguration()
.ReadFrom.Configuration(config)
.Enrich.FromLogContext()
.Enrich.WithMachineName()
.WriteTo.Console()
.WriteTo.Elasticsearch(sinkOptions)
.Enrich.WithProperty("Environment", env.EnvironmentName)
.CreateLogger();
There are two possible reasons I can think of that might be the cause of this not working:
The credentials are wrong
The Uri is wrong
Every solution I've been given so far has provided the data in this fashion, and nowhere does it say what the URI I'm supposed to use looks like.
I get no errors.
I get no warnings.
I get no logs.
What am I doing wrong here?
The issue was an incorrect URI. I wrote
my-server.kb.eastus2.azure.elastic-cloud.com:9243 rather than
my-server.es.eastus2.azure.elastic-cloud.com:9243.
Note the very small difference in the URL: kb instead of es.
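A mix-up like this can be caught before the logger is configured. The following is a minimal Java sketch (the question's code is C#, so it only illustrates the check). It assumes the host layout seen in the two URLs above, where the second DNS label is es for the Elasticsearch endpoint and kb for the other one. ElasticCloudEndpointCheck and its methods are hypothetical names, not part of Serilog or any Elastic library.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of any Elastic or Serilog API.
final class ElasticCloudEndpointCheck {

    // Assumption: hosts look like "<deployment>.<service>.<region>...",
    // as in the URLs from the post, so the service is the second DNS label.
    static String serviceLabel(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        String[] labels = host.split("\\.");
        return labels.length > 1 ? labels[1] : "";
    }

    // True only when the second label is "es", the label used by the working URL.
    static boolean isElasticsearchEndpoint(String url) {
        return "es".equals(serviceLabel(url));
    }
}
```

Calling isElasticsearchEndpoint on the kb URL returns false, so a startup check could fail fast with a clear message instead of silently shipping no logs.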

couchbase upsert/insert silently failing with ttl

I am trying to upsert 10 documents using Spring Boot. A few of the documents fail to upsert when a TTL is set. There is no error or exception. If I do not provide a TTL, it works as expected.
In addition, if I increase the TTL, all of the documents are created.
On the other hand, if I reduce the TTL, a few more documents fail to insert.
When I insert one of the failed documents (a single document out of the 10) from another proof of concept with the same TTL, the document is created.
public Flux<JsonDocument> upsertAll(final List<JsonDocument> jsonDocuments) {
return Flux
.from(keys())
.flatMap(key -> Flux
.fromIterable(jsonDocuments)
.parallel()
.runOn(Schedulers.parallel())
.flatMap(jsonDocument -> {
final String arg = String.format("upsertAll-%s", jsonDocument);
return Mono
.just(asyncBucket
.upsert(jsonDocument, 1000, TimeUnit.MILLISECONDS)
.doOnError(error -> log.error(jsonDocument.content(), error, "failed to upsert")))
.map(obs -> Tuples.of(obs, jsonDocument.content()))
.map(tuple2 -> log.observableHandler(tuple2))
.map(observable1 -> Tuples.of(observable1, jsonDocument.content()))
.flatMap(tuple2 -> log.monoHandler(tuple2))
;
})
.sequential())
;
}
List<JsonDocument> jsonDocuments = new LinkedList<>();
dbService.upsertAll(jsonDocuments)
.subscribe();
Could someone please suggest how to resolve this issue?
Due to an oddity in the Couchbase server API, TTL values less than 30 days are treated differently than values greater than 30 days.
In order to get consistent behavior with Couchbase Java SDK 2.x, you'll need to adjust the TTL value before passing it to the SDK:
// adjust TTL for Couchbase Java SDK 2.x
public static int adjustTtl(int ttlSeconds) {
return ttlSeconds < TimeUnit.DAYS.toSeconds(30)
? ttlSeconds
: (int) (ttlSeconds + (System.currentTimeMillis() / 1000));
}
In Couchbase Java SDK 3.0.6 this is no longer required; just pass a Duration and the SDK will adjust the value behind the scenes if necessary.
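To make the 30-day rule concrete, here is a small self-contained sketch. It models the server-side interpretation as I understand it: an expiry of up to 30 days (in seconds) is read as an offset from now, and anything larger is read as an absolute Unix timestamp. The class CouchbaseExpirySketch and the method effectiveExpiryEpochSeconds are hypothetical and exist only for this illustration. The exact treatment of a value of exactly 30 days is an assumption, and 0 (no expiry) is not modelled. The adjustTtl variant takes the current time as a parameter so the arithmetic is reproducible.

```java
import java.util.concurrent.TimeUnit;

// Illustration only: a model of the rule, not Couchbase SDK code.
final class CouchbaseExpirySketch {

    static final long THIRTY_DAYS_SECONDS = TimeUnit.DAYS.toSeconds(30);

    // Assumed server rule: small values are relative offsets,
    // large values are absolute epoch seconds.
    static long effectiveExpiryEpochSeconds(long expiry, long nowEpochSeconds) {
        if (expiry <= 0) {
            throw new IllegalArgumentException("0 means no expiry and is not modelled here");
        }
        return expiry <= THIRTY_DAYS_SECONDS ? nowEpochSeconds + expiry : expiry;
    }

    // Same adjustment as the answer's adjustTtl, with the clock passed in.
    static long adjustTtl(long ttlSeconds, long nowEpochSeconds) {
        return ttlSeconds < THIRTY_DAYS_SECONDS ? ttlSeconds : ttlSeconds + nowEpochSeconds;
    }
}
```

Under this model, an unadjusted 40-day TTL (3,456,000 seconds) is read as a timestamp in early 1970, so the document is already expired when it is written, while the adjusted value lands 40 days in the future.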

How to set the starting point when using the Redis scan command in spring boot

I want to migrate 70 million records from Redis (sentinel mode) to Redis (cluster mode).
ScanOptions options = ScanOptions.scanOptions().build();
Cursor<byte[]> c = sentinelTemplate.getConnectionFactory().getConnection().scan(options);
while(c.hasNext()){
count++;
String key = new String(c.next());
key = key.trim();
String value = (String)sentinelTemplate.opsForHash().get(key,"tc");
//Thread.sleep(1);
clusterTemplate.opsForHash().put(key, "tc", value);
}
I want to resume the scan from a certain point because the Redis connection was dropped partway through.
How do I set the starting point when using the Redis SCAN command in Spring Boot?
Moreover, whenever the program is run with the code above, the connection breaks after almost 20 million records have been moved.

How to do _cat/indices/<index_name_with_reg_ex> with JAVA API?

I have some indices named test-1-in, test-2-in, and test-3-in. I want to do _cat/indices/test-*-in from the Java API. How can I do this?
I tried using the IndexAdminClient, but with no luck.
Given an ElasticSearch Client object:
client.admin().indices()
.getIndex(new GetIndexRequest().indices("regex-*"))
.actionGet().getIndices();
In addition to Mario's answer, use the following to retrieve the indices with the Elasticsearch 6.4.0 high level REST client:
GetIndexRequest request = new GetIndexRequest().indices("*");
GetIndexResponse response = client.indices().get(request, RequestOptions.DEFAULT);
String[] indices = response.getIndices();
I have a solution:
final ClusterStateRequest clusterStateRequest = new ClusterStateRequest();
clusterStateRequest.clear().metaData(true);
final IndicesOptions strictExpandIndicesOptions = IndicesOptions.strictExpand();
clusterStateRequest.indicesOptions(strictExpandIndicesOptions);
ClusterStateResponse clusterStateResponse = client.admin().cluster().state(clusterStateRequest).get();
clusterStateResponse.getState().getMetadata().getIndices();
This returns all indices. After that, the regex matching has to be done manually. This is what the _cat implementation does in the Elasticsearch source code.
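The manual matching step could look something like the sketch below. It is plain Java with no Elasticsearch dependency and takes the index names as a list of strings. IndexNameFilter is a hypothetical helper, and it handles only the * wildcard, not comma-separated patterns or exclusions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: filters index names with a simple "*" wildcard.
final class IndexNameFilter {

    // Turns e.g. "test-*-in" into a regex, quoting everything except "*".
    static Pattern wildcardToPattern(String wildcard) {
        String[] parts = wildcard.split("\\*", -1); // -1 keeps trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each "*" matches any run of characters
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the names that match the wildcard, in their original order.
    static List<String> filter(List<String> indexNames, String wildcard) {
        Pattern pattern = wildcardToPattern(wildcard);
        List<String> matches = new ArrayList<>();
        for (String name : indexNames) {
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

The index names from the cluster state response would first be collected into a List<String> and then passed to filter together with a pattern such as test-*-in.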
In case you want to cat indices with ?v option:
IndicesStatsRequestBuilder indicesStatsRequestBuilder = new
IndicesStatsRequestBuilder(client, IndicesStatsAction.INSTANCE);
IndicesStatsResponse response = indicesStatsRequestBuilder.execute().actionGet();
for (Map.Entry<String, IndexStats> m : response.getIndices().entrySet()) {
System.out.println(m);
}
Each entry contains the document count, storage usage, and so on. You can run this for all indices or filter for specific ones.
P.S. Tested with version 5.6.0.

How to stabilize spark streaming application with a handful of super big sessions?

I am running a Spark Streaming application based on the mapWithState DStream function. The application transforms input records into sessions based on a session ID field inside the records.
A session is simply all of the records with the same ID. I then perform some analytics at the session level to compute an anomaly score.
I couldn't stabilize my application because a handful of sessions keep growing with each batch for an extended period (more than 1h). My understanding is that a single session (key-value pair) is always processed by a single core in Spark. I want to know whether I am mistaken, and whether there is a way to mitigate this issue and make the streaming application stable.
I am using Hadoop 2.7.2 and Spark 1.6.1 on YARN. Changing the batch time, block interval, number of partitions, number of executors, and executor resources didn't solve the issue, as one single task always makes the application choke. However, filtering out those super long sessions solved the issue.
Below is the updateState function I am using:
val updateState = (batchTime: Time, key: String, value: Option[scala.collection.Map[String,Any]], state: State[Seq[scala.collection.Map[String,Any]]]) => {
val session = Seq(value.getOrElse(scala.collection.Map[String,Any]())) ++ state.getOption.getOrElse(Seq[scala.collection.Map[String,Any]]())
if (state.isTimingOut()) {
Option(null)
} else {
state.update(session)
Some((key,value,session))
}
}
and the mapWithState call:
def updateStreamingState(inputDstream:DStream[scala.collection.Map[String,Any]]): DStream[(String,Option[scala.collection.Map[String,Any]], Seq[scala.collection.Map[String,Any]])] ={//MapWithStateDStream[(String,Option[scala.collection.Map[String,Any]], Seq[scala.collection.Map[String,Any]])] = {
val spec = StateSpec.function(updateState)
spec.timeout(Duration(sessionTimeout))
spec.numPartitions(192)
inputDstream.map(ds => (ds(sessionizationFieldName).toString, ds)).mapWithState(spec)
}
Finally, I compute features for each session in the DStream, as defined below:
def computeSessionFeatures(sessionId:String,sessionRecords: Seq[scala.collection.Map[String,Any]]): Session = {
val features = Functions.getSessionFeatures(sessionizationFeatures,recordFeatures,sessionRecords)
val resultSession = new Session(sessionId,sessionizationFieldName,sessionRecords)
resultSession.features = features
return resultSession
}
