Solr multiple search and group by date range - time

I have the problem of executing multiple Solr queries at once with the same phrase but for different time ranges.
Example:
search for "atom" at:
2011-04-01T10:20:22.0Z TO 2011-04-01T12:20:22.0Z
2011-03-08T10:20:22.0Z TO 2011-03-08T12:20:22.0Z
2011-02-05T10:20:22.0Z TO 2011-02-05T12:20:22.0Z
So I need a few messages from each 2-hour interval.
First of all, I thought about facet search, but I don't think that's the way, is it?
My second idea was to fire one Solr request for every time range, but there is probably too much (network) overhead for that, because this example is only a simplified version.
Does anybody have an idea how I could handle this? Which Solr feature is the best fit for this?
Thank you.

Use FieldCollapsing (Result Grouping) with the group.query option.
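As a minimal sketch of that suggestion: Solr's Result Grouping accepts several group.query parameters in one request, and returns a separate group of top hits for each. The core URL, the field name `timestamp`, and the helper function below are assumptions for illustration, not the asker's actual schema.

```python
def build_group_params(phrase, intervals, per_group=5):
    """Build Solr Result Grouping params: one group.query per time range.

    `timestamp` is an assumed date field name; adapt it to the real schema.
    Returns a list of (key, value) pairs because group.query repeats.
    """
    params = [
        ("q", phrase),
        ("group", "true"),
        ("group.limit", str(per_group)),  # how many docs per interval
        ("wt", "json"),
    ]
    for start, end in intervals:
        params.append(("group.query", f"timestamp:[{start} TO {end}]"))
    return params


params = build_group_params(
    "atom",
    [
        ("2011-04-01T10:20:22.0Z", "2011-04-01T12:20:22.0Z"),
        ("2011-03-08T10:20:22.0Z", "2011-03-08T12:20:22.0Z"),
        ("2011-02-05T10:20:22.0Z", "2011-02-05T12:20:22.0Z"),
    ],
)
# These params go into a single GET against /solr/<core>/select,
# e.g. requests.get(url, params=params), so only one round trip is needed.
```

The point of the design is that all three windows travel in one HTTP request, which addresses the network-overhead worry from the question.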

Related

Kibana composite query pagination

I have a composite aggregation query doing exactly what I want (the details of said query should not matter). I would like very much to visualise the results in Vega as a nice time-based chart, but I've hit a very stupid roadblock: I cannot find how to ask Vega to fetch all results. Composite aggregation results are paged (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html#_pagination) and therefore, in order to get all results, multiple queries should be done. So I can display one page of data, which is not enough in my case.
Is there a way to fetch all pages with Vega or Vega-Lite? If not, perhaps in another graph module of Kibana? A quick search gave no definitive answers… And finally, I have the latest version of everything.
Thanks!
Yup, no. Dynamic Elasticsearch URLs are not doable (basing a query off another query). I think I put through a feature request for this a while back, but unfortunately Vega and Kibana integrations get pushed to the wayside in favour of improvements in Lens.
Hopefully this is something they do in the future, because it would greatly improve the Vega-Kibana capabilities. I guess it depends on what you are actually trying to do, and whether you can find a way to get the data through in one search - this would be my advice.
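If a small client-side script outside Kibana is an option, the pagination loop itself is simple: feed each response's after_key back in as the next request's composite "after" value until no after_key comes back. This is a hedged sketch; the function name, the aggregation name, and the injected run_search callable (anything that POSTs the body to _search and returns parsed JSON) are illustrative assumptions.

```python
def fetch_all_composite_buckets(run_search, agg_name, body, page_size=100):
    """Drain a composite aggregation by following after_key.

    run_search: callable taking a request body and returning the parsed
    JSON response (e.g. a thin wrapper around POST /<index>/_search).
    """
    body["aggs"][agg_name]["composite"]["size"] = page_size
    buckets = []
    while True:
        resp = run_search(body)
        agg = resp["aggregations"][agg_name]
        buckets.extend(agg["buckets"])
        after = agg.get("after_key")
        if not after:  # no after_key means this was the last page
            break
        # Resume exactly where the previous page stopped.
        body["aggs"][agg_name]["composite"]["after"] = after
    return buckets
```

The collected buckets could then be written to a file or a secondary index that a Kibana visualisation reads in one go.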

Can multiple add/delete of document to an index make it inconsistent?

For a use case, I'll need to add and remove multiple documents in an Elasticsearch index. My understanding is that the tf-idf or BM25 scores are affected by frequencies that are calculated from the postings list (?)... But if I add and remove many documents in a day, will that affect the document/word statistics?
I've already gone through a lot of APIs, but my untrained eyes could not tell whether this is the case, or whether there's a way for me to force Elasticsearch to update/recompute the index every day or so...
Any help would be appreciated
Thanks
"The IDF portion of the score can be affected by deletions and modifications" the rest should be fine... (Igor Motov)
Link to discussion:
https://discuss.elastic.co/t/can-multiple-add-delete-of-document-to-an-index-make-it-inconsistent/137030
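Background to that quote: deleted documents are only marked as deleted and keep contributing to segment-level doc counts (and hence the IDF component) until their segments are merged away. The Force Merge API with only_expunge_deletes can be run periodically to physically drop them. A small sketch building that request (the index name is hypothetical; in very old Elasticsearch versions this endpoint was called _optimize instead):

```python
def forcemerge_request(index, expunge_only=True):
    """Build the Force Merge API call that rewrites segments so deleted
    docs no longer inflate index statistics used for IDF.

    Returns (method, path, query_params) for whatever HTTP client is in use.
    """
    params = {"only_expunge_deletes": "true"} if expunge_only else {}
    return ("POST", f"/{index}/_forcemerge", params)


method, path, params = forcemerge_request("myindex")
# e.g. requests.post(f"http://localhost:9200{path}", params=params)
```

Running this during a quiet period (force merge is I/O heavy) roughly matches the "recompute the index every day or so" idea from the question.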

Elasticsearch slow performance for huge data retrieval with source field

I'm using Elasticsearch to search across more than 10 million records; most records contain 1 to 25 words. I want to retrieve data from it, but the method I'm using now is drastically slow for big retrievals, as I'm trying to get data from the _source field. I want a method that can make this process faster. I'm free to use another database or anything alongside Elasticsearch. Can anyone suggest some good ideas and examples for this?
I've tried searching for a solution on Google, and one solution I found was pagination. I've already applied it wherever possible, but pagination is not an option when I want to retrieve many (5000+) hits in one query.
Thanks in advance.
Try using the scroll API:
"While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database."
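The cursor-like loop the quote describes can be sketched generically: the first search opens the scroll context, then each follow-up call passes the returned scroll_id back until a page comes back empty. The two injected callables are assumptions standing in for a real client (with the official Python client they would be es.search(..., scroll=keep_alive) and es.scroll(scroll_id=..., scroll=keep_alive)):

```python
def scroll_all(initial_search, continue_scroll, keep_alive="2m"):
    """Drain a scroll context page by page.

    initial_search: callable(keep_alive) -> first response (opens the scroll).
    continue_scroll: callable(scroll_id, keep_alive) -> next response.
    keep_alive tells Elasticsearch how long to keep the context between calls.
    """
    resp = initial_search(keep_alive)
    hits = list(resp["hits"]["hits"])
    while True:
        resp = continue_scroll(resp["_scroll_id"], keep_alive)
        page = resp["hits"]["hits"]
        if not page:  # empty page signals the scroll is exhausted
            break
        hits.extend(page)
    return hits
```

In production code the scroll context should also be cleared afterwards (the Clear Scroll API) so it doesn't hold resources until the keep-alive expires.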

How to delete data from ElasticSearch through JavaAPI

EDITED
I'm trying to find out how to delete data from Elasticsearch according to a criterion. I know that older versions of Elasticsearch had a Delete By Query feature, but it had really serious performance issues, so it was removed. I also know that there is a Java plugin for delete by query:
org.elasticsearch.plugin:delete-by-query:2.2.0
But I don't know whether its implementation performs better or is the same as the old one.
Also, someone suggested using scroll to remove data, but I only know how to retrieve data with scroll, not how to use it to remove!
Does anyone have an idea? (The number of documents to remove in one call would be huge, over 50k.)
Thanks in advance!
I finally used this answer's third option.
You are correct that you want to use scroll/scan. Here are the steps:
1. Begin a new scroll/scan.
2. Get the next N records.
3. Take the IDs from each record and do a bulk delete of those IDs.
4. Go back to step 2.
So you don't delete using the scroll/scan directly; you just use it as a tool to get all the IDs of the records that you want to delete. This way you're only deleting N records at a time, not all 50,000 in one chunk (which would cause you all kinds of problems).
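The steps above can be sketched as follows. The scroll side is abstracted into a next_id_page callable (any function returning the next batch of IDs from an open scroll), and send_bulk stands in for POSTing to the _bulk endpoint; both names, and the index name in the test, are illustrative assumptions:

```python
import json


def bulk_delete_body(index, ids):
    """NDJSON body for a _bulk request deleting the given document IDs.

    Each delete action is one standalone JSON object followed by a newline,
    which is the line format the _bulk endpoint expects.
    """
    return "".join(
        json.dumps({"delete": {"_index": index, "_id": doc_id}}) + "\n"
        for doc_id in ids
    )


def delete_by_scroll(next_id_page, send_bulk, index):
    """Steps 2-4 above: pull N IDs at a time from an open scroll (step 2)
    and bulk-delete each batch (step 3), looping until the scroll is
    exhausted (step 4) instead of deleting all 50k in one request."""
    while True:
        ids = next_id_page()
        if not ids:
            break
        send_bulk(bulk_delete_body(index, ids))
```

Keeping N moderate (a few thousand) bounds both the bulk request size and the pressure each deletion round puts on the cluster.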

How to do a time range search in Kibana

We are using the ELK stack for log aggregation. Is it possible to search for events that occurred during a particular time range? Let's say I want to see all exceptions that occurred between 10am and 11am in the last month.
Is it possible to extract the time part from @timestamp and do a range search on it somehow (similar to date() in SQL)?
Thanks to Magnus, who pointed me to scripted fields. Take a look at:
https://www.elastic.co/blog/kibana-4-beta-3-now-more-filtery
or
https://www.elastic.co/guide/en/elasticsearch/reference/1.3/search-request-script-fields.html
Unfortunately you cannot use these scripted fields in queries, only in visualisations.
So I resorted to a workaround and used Logstash's drop filter to remove the events I don't want to show up in Kibana in the first place. That is not perfect for obvious reasons, but it does the job.
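A rough sketch of that workaround, assuming a ruby filter to read the hour and the drop filter to discard everything outside the 10:00-10:59 window. The tag name and the exact timestamp accessor (`event.get('@timestamp').time.hour`) are assumptions to verify against the Logstash version in use:

```
filter {
  # Hypothetical: tag events whose hour-of-day is outside 10am-11am.
  ruby {
    code => "
      hour = event.get('@timestamp').time.hour
      event.tag('outside_window') unless hour == 10
    "
  }
  if "outside_window" in [tags] {
    drop { }
  }
}
```

The obvious downside, as noted above, is that dropped events never reach Elasticsearch at all, so the window cannot be changed after the fact.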
