Query regarding Statsd and Collectd - elasticsearch

I have a query regarding the usage of statsd and collectd.
Everywhere I look on the internet, I only find examples where statsd/collectd is used to collect metrics about the application or system.
My question is: can statsd/collectd be used to collect statistical information about other datasets that are not system-performance related, e.g. in e-commerce?
Can we use it to get the top 10 or top 15 users/URLs hitting the website, as a time-series analysis (say over the last 15 minutes or the last 15 days)?
Any relevant links or documents in this regard are most welcome.
Also, I wanted to know whether we can store this data in Elasticsearch as well. Any documentation on this is also most welcome.
Thanks
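For what it's worth, statsd counters are not tied to system metrics; anything the application can name can be counted. Below is a minimal sketch using the statsd Python client, assuming a statsd daemon on localhost:8125; the metric names are made up.

import statsd

# Assumes a statsd daemon listening on localhost:8125 (hypothetical setup).
client = statsd.StatsClient("localhost", 8125)

def record_hit(user_id: str, url: str) -> None:
    # One counter per user and one per URL; the receiving backend can then
    # answer "top N users/URLs in the last 15 minutes" from these series.
    client.incr(f"shop.hits.by_user.{user_id}")
    client.incr(f"shop.hits.by_url.{url.strip('/').replace('/', '_')}")

record_hit("user42", "/products/red-shoes")

Turning counters like these into a "top 10 over the last 15 minutes" view is then a job for whatever backend receives them (Graphite, or Elasticsearch if you ship them there), not statsd itself.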

Related

Can I put the result from Kibana into Elasticsearch again?

Can I put the result of a query I run in Kibana Dev Tools back into Elasticsearch directly?
Or must I write a script to achieve this?
Any recommendations?
OK, so here is one basic way to understand it.
If you have the head plugin installed for ES, search for the .kibana index.
Open the .kibana index and you will find all the dashboards you have designed listed there, along with their saved definitions.
Think of ES as just another store from which you can read data and write it into another ES index.
Refer to this link:
https://www.elastic.co/blog/kibana-under-the-hood-object-persistence
A tool you can use for the reading and writing is Logstash; learning grok patterns will give you a good lead on that.
Tell me if you need some screenshots for this.
Happy learning.
It is like cooking in a kitchen and then asking to put the cooked food back into the kitchen. If you cooked the food, better to consume it :)
The visualizations and processed data you see on the Kibana side are just for Kibana. The aggregations and processing techniques are applied, on the fly, to the data set residing in Elasticsearch as new data arrives.
So of course you can put your data back into Elasticsearch again; it depends on what sort of requirement you are facing.
Note: the data in Elasticsearch (the inverted index) is not changed by Kibana's processing, which is why you can apply other processing techniques from Kibana over the same index, assuming the data is still in its earlier state.
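As an alternative to Logstash, the _reindex API (wrapped by the Python client) can copy one index into another. A minimal sketch, assuming a recent elasticsearch-py client and a made-up destination index name:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Server-side _reindex: read every document from the source index and
# write it into the destination index (a made-up name here).
es.reindex(
    source={"index": ".kibana"},
    dest={"index": "kibana-objects-copy"},
    wait_for_completion=True,
)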

Per store indexing in Solr

We have a requirement where we have, say, 500 stores, and the SKUs in each store have different prices that change every day. The inventory status for each also changes every day. We want to index data from all these stores in both Solr and Elasticsearch. What is the most effective way to achieve this? I also need help with querying when I want to display this on the website.
Your question is a bit unclear, but if you are asking how to index a different price/inventory per store, there is a recent Lucene/Solr Revolution presentation by Erik Hatcher showing how to do this using payloads (Solr recently got support for payloads, contributed by Erik himself). He actually uses the same example in his presentation.
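For reference, a rough sketch of that payload approach, assuming Solr 6.6+ with its default managed-schema (where the *_dpf suffix maps to the delimited-payloads-float field type); the core name, field name, and store IDs below are made up:

import requests

# Hypothetical core name; assumes the default managed-schema so that the
# *_dpf suffix maps to the delimited-payloads-float field type.
SOLR = "http://localhost:8983/solr/products"

# One document per SKU; the per-store prices are packed into a single
# "store|price" delimited-payload field instead of one field per store.
doc = {
    "id": "sku-1001",
    "name": "red shoes",
    "store_prices_dpf": "store_001|49.99 store_002|54.99 store_003|47.50",
}
requests.post(f"{SOLR}/update?commit=true", json=[doc])

# At query time the payload() function extracts the price for one store,
# so it can be returned as a pseudo-field and sorted on.
params = {
    "q": "name:shoes",
    "fl": "id,name,price:payload(store_prices_dpf,store_002)",
    "sort": "payload(store_prices_dpf,store_002) asc",
}
print(requests.get(f"{SOLR}/select", params=params).json())

Since payloads are stored with the indexed terms, a daily price change still means re-indexing the affected SKU document, but it stays one document per SKU rather than one per SKU-store combination.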

How to merge old data to save space in Elasticsearch

I tried to find information about this, but I have not found what I was looking for.
I am storing metrics every minute in an Elasticsearch database. My thinking is that this per-minute granularity only matters for a short, recent period.
For example, I want to keep per-minute metrics for the last week, but then I would like to merge these metrics so that I only have one document of metrics for each older week.
I could achieve this with a stream-processing framework such as Spark Streaming or Flink, but my question is: is there a native way / tool / trick to make it happen in Elasticsearch?
Thank you. I hope my question is clear enough; otherwise leave a comment and I will add more details.
One idea would be to have a weekly index in which you store all your metrics every minute. Once the week has passed, you could run an aggregation query on the past week's index and aggregate all the info at the day or week level. You'd then store that aggregated information as new documents in another historical index that you can query later on. I don't think it's necessary to bring in Spark Streaming for this; ES aggregations can do the job pretty easily.
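A minimal sketch of that rollup with the Elasticsearch Python client, assuming Elasticsearch 7+ and per-minute documents that carry a @timestamp date field and a numeric value field; all index and field names here are placeholders:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")

def rollup_week(source_index: str, history_index: str) -> None:
    # Aggregate the past week's per-minute metrics down to one bucket per day.
    resp = es.search(
        index=source_index,
        size=0,
        aggs={
            "per_day": {
                "date_histogram": {"field": "@timestamp", "calendar_interval": "day"},
                "aggs": {
                    "avg_value": {"avg": {"field": "value"}},
                    "max_value": {"max": {"field": "value"}},
                },
            }
        },
    )
    # One summary document per day goes into a long-term history index;
    # the original weekly index can then be deleted to reclaim space.
    actions = (
        {
            "_index": history_index,
            "_id": bucket["key_as_string"],
            "_source": {
                "@timestamp": bucket["key_as_string"],
                "avg_value": bucket["avg_value"]["value"],
                "max_value": bucket["max_value"]["value"],
                "samples": bucket["doc_count"],
            },
        }
        for bucket in resp["aggregations"]["per_day"]["buckets"]
    )
    bulk(es, actions)

rollup_week("metrics-week-45", "metrics-history")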

ElasticSearch Kibana match_all

I have the following problem: I made a Python program that indexes a lot of domains (8000 per hour). Now I have 16000 domains (more or less). In the Kibana Discover window I can see my data, but if I go to Dev Tools and run a "match_all" query, I can only see 10 domains. Where is the problem?
I need to show all data in one query.
This is my actual query:
GET /project/_search
{"query": {"match_all": {}}}
Thanks in advance!
You get 10 results because that is the default size for a query - you can see that information here.
As stated in the link, you can add the size parameter with another value to see more results, but you will be limited by index.max_result_window, which is 10000 by default.
What is the purpose of retrieving all information in one go?
The Python modules available for interacting with Elasticsearch let you retrieve all the information easily; see the documentation for the elasticsearch.helpers.scan function.
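For illustration, a minimal sketch using scan against the index from the question, assuming a local cluster and a recent elasticsearch-py client:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")

# scan() wraps the scroll API, so it streams every matching document without
# hitting the default size of 10 or the index.max_result_window limit.
for hit in scan(es, index="project", query={"query": {"match_all": {}}}):
    print(hit["_id"], hit["_source"])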

ElasticSearch - Operational Insights and Trends

Given an Elasticsearch installation, I want to know current trends and insights. I am not sure whether aggregations would help here.
What are the top queries for the last 24 hours?
What are the most frequently searched terms in the last 24 hours? etc.
What are the most accessed documents in the last 24 hours?
Is there any way to collect and get hold of these metrics from Elasticsearch?
A typical use case: as a user visits the homepage, I want to show trending searches and top content.
One alternative, if you do not want to load your existing Elasticsearch installation with additional metrics, is to send this data to a cloud log management solution such as Loggly, Logentries, etc.
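If the search logs do end up in Elasticsearch (wherever they are shipped from), the "most frequently searched terms in the last 24 hours" part is a plain terms aggregation over a query-log index. A sketch assuming each executed search is indexed as a document with a query_text keyword field and a @timestamp field, and a recent elasticsearch-py client; every name here is made up:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Top 10 most frequent search terms logged in the last 24 hours.
resp = es.search(
    index="search-log",
    size=0,
    query={"range": {"@timestamp": {"gte": "now-24h"}}},
    aggs={"top_queries": {"terms": {"field": "query_text", "size": 10}}},
)

for bucket in resp["aggregations"]["top_queries"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])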
