I am quite new to Spring Data Elasticsearch.
I would like to know whether there is a way to group an index's records by the value of one field and run aggregations at the level of each group.
Any suggestions would be helpful.
Thanks & Regards
Sumanth K P
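One common way to group documents by a field and compute a metric per group is a terms aggregation with a nested sub-aggregation. The sketch below only builds the raw search request body; the class name and the field names `category` and `price` are hypothetical placeholders, and how the body is sent (Spring Data Elasticsearch or a plain HTTP client) depends on the version in use.

```java
// Sketch: builds the JSON body for a terms aggregation with one metric per group.
// Field names are placeholders, not taken from the question; inputs are not JSON-escaped.
final class GroupedAggregationQuery {

    /** Returns a search body that buckets documents by groupField and averages metricField per bucket. */
    static String build(String groupField, String metricField) {
        return String.join("\n",
            "{",
            "  \"size\": 0,",
            "  \"aggs\": {",
            "    \"by_group\": {",
            "      \"terms\": { \"field\": \"" + groupField + "\" },",
            "      \"aggs\": {",
            "        \"avg_metric\": { \"avg\": { \"field\": \"" + metricField + "\" } }",
            "      }",
            "    }",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        // Body for: POST /<your-index>/_search
        System.out.println(build("category", "price"));
    }
}
```

The grouping field should be a keyword, numeric or date field; terms aggregations on analyzed text fields are not enabled by default.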
Related
I created an index from a Storm topology in Elasticsearch (ES). The index mapping is basically:
index: btc-block
miner: text
reward: double
datetime: date
From those documents I would like to create a histogram of the richest miner, on a daily scale.
I am wondering whether I should aggregate first in Storm and use ES and Kibana only to store, query and display the data, or whether ES and Kibana can handle such requests themselves.
I have been looking at Transforms, in the index management section, which allow creating new indices from queries and aggregations in continuous mode, but I have not managed to get the expected result.
Any help will be appreciated.
Sometimes we need to ask a question to find the answer...
I kept looking at the documentation and eventually I could solve the issue by using a sibling pipeline aggregation, in the visualization. In my case, a max bucket aggregation of the sum of reward on Y-axis.
In my case I get about 6 records per hour, so I guess it is fine to let Kibana and ES do the work. What if I had a lot more data? Would it not be wiser to aggregate in Storm?
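To make concrete what a max bucket over per-miner reward sums computes, here is a plain-Java sketch of the same logic on in-memory data. It is also roughly what a pre-aggregation step upstream would have to do. The class and method names are hypothetical, and days are cut in UTC as an assumption.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class DailyTopMiner {

    /** One document of the btc-block index: miner, reward, datetime. */
    static final class Block {
        final String miner;
        final double reward;
        final Instant datetime;

        Block(String miner, double reward, Instant datetime) {
            this.miner = miner;
            this.reward = reward;
            this.datetime = datetime;
        }
    }

    /** For each UTC day, sums the reward per miner and keeps the miner with the largest sum. */
    static Map<LocalDate, Map.Entry<String, Double>> topMinerPerDay(List<Block> blocks) {
        // Step 1: day -> (miner -> summed reward), comparable to date_histogram > terms > sum.
        Map<LocalDate, Map<String, Double>> sums = blocks.stream().collect(
            Collectors.groupingBy(
                b -> b.datetime.atZone(ZoneOffset.UTC).toLocalDate(),
                TreeMap::new,
                Collectors.groupingBy(b -> b.miner, Collectors.summingDouble(b -> b.reward))));

        // Step 2: per day, keep the bucket with the largest sum, comparable to max_bucket.
        Map<LocalDate, Map.Entry<String, Double>> result = new TreeMap<>();
        sums.forEach((day, perMiner) -> result.put(day,
            perMiner.entrySet().stream().max(Map.Entry.comparingByValue()).get()));
        return result;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block("alice", 6.25, Instant.parse("2021-01-01T01:00:00Z")),
            new Block("bob", 6.25, Instant.parse("2021-01-01T02:00:00Z")),
            new Block("alice", 6.25, Instant.parse("2021-01-01T03:00:00Z")));
        System.out.println(topMinerPerDay(blocks));
    }
}
```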
I want to retrieve the number of indices in my ES cluster from within a scripted field of an aggregation.
I know you can access some context values with ctx._source, but does anyone know how to get the total number of indices in my cluster?
Thanks!
That's not possible. The ctx context knows nothing about the state of your cluster. It only has access to the document currently being iterated.
I am using a Spring Boot application to load messages into Elasticsearch. I have a use case where I need to query the Elasticsearch data, get an id value from the result, and populate it in the JSON document before inserting that document into Elasticsearch.
Will querying Elasticsearch before every insertion be expensive? If so, is there another way to approach this?
You can use update_by_query to do it in one step.
But otherwise it shouldn't be slow if you do it in two steps (get + update). It depends on many things - how often you do it, how much data is transferred, etc.
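For reference, an update_by_query request combines a query that selects documents with a script that modifies them. The sketch below only builds such a body; the class name, the field names and the values are hypothetical, and whether this fits depends on the documents already being in the index.

```java
// Sketch: builds the JSON body for POST /<your-index>/_update_by_query.
// Field names and values are hypothetical; inputs are not JSON-escaped here.
final class UpdateByQueryBody {

    /** Sets targetField to newValue on every document whose matchField equals matchValue. */
    static String build(String matchField, String matchValue, String targetField, String newValue) {
        return String.join("\n",
            "{",
            "  \"query\": { \"term\": { \"" + matchField + "\": \"" + matchValue + "\" } },",
            "  \"script\": {",
            "    \"lang\": \"painless\",",
            "    \"source\": \"ctx._source[params.field] = params.value\",",
            "    \"params\": { \"field\": \"" + targetField + "\", \"value\": \"" + newValue + "\" }",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        System.out.println(build("messageType", "order", "refId", "abc-123"));
    }
}
```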
Our current setup is MySQL as the main data source through Spring Data JPA, with Hibernate Search to index and search data. We have now decided to move to Elasticsearch for searching, to align better with other features; besides, we need multiple servers to share the indexing and searching.
I was able to set up Elasticsearch with Spring Data Elasticsearch for indexing and searching easily, through ElasticsearchRepository. The challenge now is how to index all the existing MySQL records into Elasticsearch. Hibernate Search provides an API for this, org.hibernate.search.jpa.FullTextEntityManager#createIndexer, which we use all the time, but I cannot find a handy equivalent in Spring Data Elasticsearch. I hope somebody can help me out here or provide some pointers.
There is a similar question here; however, the solution proposed there doesn't fit my needs very well, as I'd prefer to be able to index a whole object whose fields are mapped to multiple DB tables.
So far I haven't found a better solution than writing my own code to index all JPA entries to ES inside my application, and this one worked out for me fine
Pageable page = new PageRequest(0, 100); // On Spring Data 2.x and later, use PageRequest.of(0, 100).
Page<Instance> curPage = instanceManager.listInstancesByPage(page); // Get data by page from the JPA repo.
while (true) {
    for (Instance instance : curPage.getContent()) {
        instanceElasticSearchRepository.index(instance); // Index one by one into the ES repo.
    }
    if (!curPage.hasNext()) {
        break; // Stop only after the last page has been indexed as well.
    }
    page = curPage.nextPageable();
    curPage = instanceManager.listInstancesByPage(page);
}
The logic is very straightforward. Depending on the amount of data it might take a while, so breaking the work down into batches and logging some progress messages can be helpful.
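The batching idea can be sketched independently of Spring as a small helper that reads one page at a time and hands each page to a sink as a single batch. Everything here is hypothetical naming: `pageReader` stands in for the JPA repository call, and `batchSink` stands in for a bulk save (for example the repository's `saveAll`, if your Spring Data version provides it).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

final class BatchReindexer {

    /**
     * Reads pages until an empty or short page comes back and passes each page to the sink as one batch.
     * Returns the total number of records handed to the sink.
     */
    static <T> long reindex(IntFunction<List<T>> pageReader, int pageSize, Consumer<List<T>> batchSink) {
        long total = 0;
        for (int pageNo = 0; ; pageNo++) {
            List<T> rows = pageReader.apply(pageNo);
            if (rows.isEmpty()) {
                break; // nothing left to index
            }
            batchSink.accept(rows); // one bulk call per page instead of one call per record
            total += rows.size();
            System.out.printf("Indexed %d records so far%n", total);
            if (rows.size() < pageSize) {
                break; // a short page is the last page
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the database: 250 rows read in pages of 100.
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            data.add(i);
        }
        List<List<Integer>> batches = new ArrayList<>();
        long total = BatchReindexer.<Integer>reindex(
            p -> data.subList(Math.min(p * 100, data.size()), Math.min((p + 1) * 100, data.size())),
            100,
            batches::add);
        System.out.println(total + " records in " + batches.size() + " batches");
    }
}
```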
I have a question about how indexed properties work in Alfresco 4.1.6 with SOLR 1.4.
I use something like this for my queries:
SearchParameters sp = new SearchParameters();
sp.addStore(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);
sp.setLanguage(SearchService.LANGUAGE_FTS_ALFRESCO);
sp.setQuery(query);
ResultSet results = getSearchService().query(sp);
where the query variable is something like this:
PATH:" /app:company_home/app:user_homes/cm:_x0030_123//*" AND
((#cm\:title:food) OR (#cm\:name:abcde) OR (TEXT:valles) OR
(#doc\:custom_property:"report") OR (#doc\:custom_property2:"report")
AND (#doc\:custom_property3:"report") AND TYPE:"{my.model}voc_document"
In my model.xml I specify which custom properties are indexed:
<index enabled="true">
My question is: how does SOLR 1.4 use the indexes if the search query contains two or more indexed properties? Does it behave like Oracle, which picks the best index and uses only that one? Or does SOLR combine all the indexed properties and use all of their indexes for the query?
I need this answer to decide how many properties to index in my model.xml. Indexing a lot of properties may not give the most efficient result, and it might be better to index only a few.
And finally, one more question. I use LANGUAGE_FTS_ALFRESCO, but I can see that a LANGUAGE_SOLR_FTS_ALFRESCO also exists. Are they the same? Do I need to use the second one if I use SOLR?
Thanks a lot!
Best regards
There is only one "index". Every field you mark as indexable (which is enabled by default) ends up in your solr index. Alfresco takes your query and sends it to SOLR for processing.
If you don't have a lot of documents, you can go ahead and index every field. By far the biggest impact on indexing and search is the full text index of the content field, which is enabled by default also.
LANGUAGE_FTS_ALFRESCO will use whatever index subsystem you have enabled. In later versions it may use SOLR or the database, depending on your configuration. LANGUAGE_SOLR_FTS_ALFRESCO forces SOLR, so if you use it without SOLR enabled you will get an error.
Regards!