Can I query some value in a Hazelcast map? (filter)

I have a map like ("id1","34"; "id2","45"; "id3","55"...). Can I retrieve the keys whose values are bigger than 50, and also get the keys with the 3 biggest values? Does Hazelcast have such a filter? Please tell me where I can read up on this. Thanks a lot.

Hazelcast has distributed queries. You can find a whole chapter about it in the documentation: http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#distributed-query
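For illustration, here is a rough sketch against the Hazelcast 3.x API (the map name and values are made up, and it assumes the values are stored as integers rather than strings so they can be compared numerically): a Predicates.greaterThan query covers the "bigger than 50" part, and a PagingPredicate with a comparator covers the "3 biggest" part.

import java.io.Serializable;
import java.util.Comparator;
import java.util.Map;
import java.util.Set;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.query.PagingPredicate;
import com.hazelcast.query.Predicates;

public class MapQueryExample {

    // The comparator is sent to the cluster members, so it must be Serializable.
    // Raw Comparator<Map.Entry> matches the style used in the 3.7 reference manual.
    static class ValueDescComparator implements Comparator<Map.Entry>, Serializable {
        @Override
        public int compare(Map.Entry a, Map.Entry b) {
            return ((Integer) b.getValue()).compareTo((Integer) a.getValue());
        }
    }

    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, Integer> map = hz.getMap("numbers");
        map.put("id1", 34);
        map.put("id2", 45);
        map.put("id3", 55);
        map.put("id4", 70);

        // Keys whose value is greater than 50; "this" refers to the value itself.
        Set<String> over50 = map.keySet(Predicates.greaterThan("this", 50));
        System.out.println("values > 50: " + over50);

        // Keys of the 3 biggest values: sort entries by value descending, page size 3.
        PagingPredicate top3 = new PagingPredicate(new ValueDescComparator(), 3);
        Set<String> top3Keys = map.keySet(top3);
        System.out.println("top 3: " + top3Keys);

        hz.shutdown();
    }
}

The PagingPredicate returns results one page at a time, so only the requested page (here, the 3 entries with the biggest values) comes back to the caller.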

Related

Elasticsearch index lifecycle policy

I'm new to Elasticsearch.
I would like to set an index lifecycle policy (from hot to warm), time based.
I'm using Java and Spring Boot to store the data.
So my questions are:
1. Can I set the lifecycle policy to read from my own custom key (a date field)? If so, how do I do it? Does the key need to be in some specific format?
2. If 1 is not possible, is there a way to set the #timestamp field manually? If we set a key in that format, will it do the trick?
3. If 1 and 2 are not possible, that means all rollovers would have to be done programmatically. Does anyone have a good example, or should I just use simple select, insert and delete?
Thanks!
I'm not exactly sure what your question is. Anyway, I will try to answer as I understood it.
The lifecycle policy can only be based on the date the index was created, that's all.
It works off the index creation time only.
You can configure rollover to happen automatically in the hot phase, based on the age, size, or doc count of the index.
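For illustration, here is a minimal sketch that creates such a policy over the REST API using the Elasticsearch low-level Java client (the policy name, thresholds and host are placeholders, and it assumes a cluster version that ships ILM):

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class IlmPolicyExample {
    public static void main(String[] args) throws Exception {
        // Local single-node cluster assumed; adjust host/port for your environment.
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {

            // Hot phase rolls the write index over by age, size, or doc count;
            // the warm phase starts 1 day after rollover. All values are placeholders.
            String policy = "{"
                + "\"policy\": {"
                + "  \"phases\": {"
                + "    \"hot\": {"
                + "      \"actions\": {"
                + "        \"rollover\": { \"max_age\": \"1d\", \"max_size\": \"20gb\", \"max_docs\": 5000000 }"
                + "      }"
                + "    },"
                + "    \"warm\": {"
                + "      \"min_age\": \"1d\","
                + "      \"actions\": { \"shrink\": { \"number_of_shards\": 1 } }"
                + "    }"
                + "  }"
                + "}}";

            Request request = new Request("PUT", "/_ilm/policy/my_timeseries_policy");
            request.setJsonEntity(policy);
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}

The same JSON body can of course be sent with curl or Kibana Dev Tools; the Java client is shown only because the question mentions a Java/Spring Boot application.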

How to retrieve all existing indices in Painless

I want to retrieve the number of indices in my ES cluster from within a scripted field of an aggregation.
I know you can access some context values with ctx._source, but does anyone know how to get the total number of indices in my cluster?
Thanks!
That's not possible. The ctx context has no idea about the state of your cluster; it only has access to the currently iterated doc.

Using elasticsearch generated ID's in kafka elasticsearch connector

I noticed that documents indexed in Elasticsearch using the Kafka Elasticsearch connector have their IDs in the following format: topic+partition+offset.
I would prefer to use IDs generated by Elasticsearch. It seems topic+partition+offset is not always unique, so I am losing data.
How can I change that?
As Phil says in the comments -- topic-partition-offset should be unique, so I don't see how this is causing data loss for you.
Regardless - you can either let the connector generate the key (as you are doing), or you can define the key yourself (key.ignore=false). There is no other option.
You can use Single Message Transformations with Kafka Connect to derive a key from the fields in your data. Based on your message in the Elasticsearch forum it looks like there is an id in your data - if that's going to be unique you could set that as your key, and thus as your Elasticsearch document ID too. Here's an example of defining a key with SMT:
# Add the `id` field as the key using Single Message Transformations
transforms=InsertKey, ExtractId
# `ValueToKey`: push an object of one of the column fields (`id`) into the key
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=id
# `ExtractField`: convert key from an object to a plain field
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=id
(via https://www.confluent.io/blog/building-real-time-streaming-etl-pipeline-20-minutes/)
@Robin Moffatt, as far as I can see, topic-partition-offset can cause duplicates if you upgrade your Kafka cluster not in a rolling-upgrade fashion but by replacing the cluster with a new one (which is sometimes easier to do). In that case you will experience data loss because of overwritten documents.
Regarding your excellent example, this can be the solution for many cases, but I'd add another option. Maybe you can add an epoch timestamp element to the topic-partition-offset, so the key becomes topic-partition-offset-current_timestamp.
What do you think?

Can I use binary to store UUID?

I want to use a UUID as the primary key for one of my very high volume tables in Apache Derby. Per the Derby docs, I should be using CHAR(16) FOR BIT DATA. My question is: since this is a binary column and it does not support sorting, how are indexes ordered and managed? I have read explanations of how fragmentation occurs when out-of-order entries are added, so with a binary PK, since there is no sorting, won't entries just get added to the index page one after another? And in that case, how are indexes managed/ordered by the DB engine? What am I missing here? Can I use and index a binary column storing a UUID?
I referred to these links: http://kccoder.com/mysql/uuid-vs-int-insert-performance/ and How should I use UUID with JavaDB/Derby and JDBC? and http://www.informit.com/articles/article.aspx?p=25862&seqNum=7
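For reference, a minimal sketch of the setup being asked about, using plain JDBC against embedded Derby (the database, table and column names are made up): the CHAR(16) FOR BIT DATA column simply holds the 16 raw bytes of the UUID.

import java.nio.ByteBuffer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.UUID;

public class DerbyUuidExample {
    public static void main(String[] args) throws Exception {
        // Embedded Derby database; the database name is a placeholder.
        try (Connection conn = DriverManager.getConnection("jdbc:derby:demoDB;create=true")) {

            try (Statement ddl = conn.createStatement()) {
                ddl.executeUpdate(
                    "CREATE TABLE events (id CHAR(16) FOR BIT DATA PRIMARY KEY, payload VARCHAR(255))");
            }

            // Convert the 128-bit UUID into the 16 bytes the column expects.
            UUID uuid = UUID.randomUUID();
            ByteBuffer buf = ByteBuffer.allocate(16);
            buf.putLong(uuid.getMostSignificantBits());
            buf.putLong(uuid.getLeastSignificantBits());

            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO events (id, payload) VALUES (?, ?)")) {
                insert.setBytes(1, buf.array());
                insert.setString(2, "example row");
                insert.executeUpdate();
            }
        }
    }
}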

What does the document cap mean in Websolr?

I'm using the Websolr add-on in Heroku. What does it mean by "250,000 documents"? How many DB records, or how much data, is that?
Nick from Websolr here.
In this case, 'documents' would be all the distinct 'things' that you want to search.
A Solr index is made up of many documents. Each document has many fields. Typically each document is analogous to a row in a table, or an instance of a model in your particular ORM.
Typically, a Solr client for your preferred language will help you integrate that concept into your own application and the tools you have used to create it.
In Solr a document is an indexed 'item'.
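For illustration, a small SolrJ sketch (the index URL and field names are placeholders): each SolrInputDocument sent counts as one document toward the plan's cap, and each addField call is roughly one column of the corresponding database row.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SolrDocumentExample {
    public static void main(String[] args) throws Exception {
        // The index URL is a placeholder; Websolr gives you the real one for your index.
        try (SolrClient solr = new HttpSolrClient.Builder(
                "https://index.websolr.com/solr/your-index").build()) {

            // One document per "thing" you want to search.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "42");
            doc.addField("title", "Blue widget");
            doc.addField("price", 9.99);

            solr.add(doc);     // this is one document against the 250,000 cap
            solr.commit();
        }
    }
}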
