We're using Elasticsearch to store data about the features our customers use. The index, say feature-usage, is updated every time a customer activates or deactivates a feature.
Sample data:
Customer ID, Uses feature A, Uses feature B
1 , true , false
2 , true , true
3 , false , true
4 , true , false
This data reflects the "right now". There's no timestamp attached.
One of the views I can currently provide based on that is:
feature A is used by 3 customers right now.
feature B is used by 2 customers right now.
I would like to be able to show a history for this data:
feature A was used by 2 customers yesterday
feature A was used by 1 customer 2 days ago
Essentially, I want to create a graph showing the evolution of feature usages. For that, I need to store historical data, which I imagine would look something like this:
Day , customers using feature A, customers using feature B
2021-05-17, 2 , 1
2021-05-18, 3 , 1
2021-05-19, 2 , 2
On a SQL database, I would probably run a nightly cron job to generate this data. I tried playing around with Elasticsearch's transforms and rollups, but I couldn't figure out a good solution.
Is there a way to transform feature-usage into the historical data as shown here, using only elasticsearch and no external code/cron jobs?
You can attach an ingest pipeline to your index that adds the current timestamp when each document is ingested. That gives you the historical information.
You can define such a pipeline as shown in the official documentation:
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "set": {
        "description": "Index the ingest timestamp as 'event.ingested'",
        "field": "event.ingested",
        "value": "{{{_ingest.timestamp}}}"
      }
    }
  ]
}
Then, you can set the default pipeline for your existing index via the settings endpoint:
PUT feature-usage/_settings
{
  "index.default_pipeline": "my-pipeline"
}
Now each document gets a timestamp (event.ingested), and you should be able to work with date_histogram aggregations to find the answers you seek.
The idea is to first aggregate the data by date and then compute counts based on conditions.
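For example, here is a minimal sketch of such a query, assuming the feature flags are indexed as boolean fields named usesFeatureA and usesFeatureB (names inferred from your sample data; adjust to your actual mapping):
POST feature-usage/_search
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "event.ingested",
        "calendar_interval": "day"
      },
      "aggs": {
        "feature_a_users": {
          "filter": { "term": { "usesFeatureA": true } }
        },
        "feature_b_users": {
          "filter": { "term": { "usesFeatureB": true } }
        }
      }
    }
  }
}
The doc_count of each filter sub-aggregation is the number of customers using that feature on that day.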
Kind regards,
Mirko
I'm trying to extract data from Google Analytics, using KingswaySoft - SSIS Integration Toolkit, in Visual Studio.
I've set the metrics and dimensions, but I get this error message:
Please remove transactions to make the request compatible. The request's dimensions & metrics are incompatible. To learn more, see https://ga-dev-tools.web.app/ga4/dimensions-metrics-explorer/
I've tried removing the transactions metric and it works, but this metric is really necessary.
Metrics: sessionConversionRate, sessions, totalUsers, transactions
Dimensions: campaignName, country, dateHour, deviceCategory, sourceMedium
Any idea on how to solve it?
I'm not sure how helpful this suggestion is, but could a possible workaround involve having two queries?
Query 1: Existing query without transactions
Query 2: The same dimensions with transactionId included
The idea would be to use the SSIS Aggregate component to group by the original dimensions and count the transactions. You could then merge the queries together via a merge join.
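As a rough sketch of that idea expressed in SQL (table names are hypothetical; in SSIS this would be the Aggregate component followed by a Merge Join):
SELECT q1.*, q2.transactions
FROM Stage.Query1 AS q1
INNER JOIN
(
    SELECT campaignName, country, dateHour, deviceCategory, sourceMedium,
           COUNT(transactionId) AS transactions
    FROM Stage.Query2
    GROUP BY campaignName, country, dateHour, deviceCategory, sourceMedium
) AS q2
    ON  q2.campaignName = q1.campaignName
    AND q2.country = q1.country
    AND q2.dateHour = q1.dateHour
    AND q2.deviceCategory = q1.deviceCategory
    AND q2.sourceMedium = q1.sourceMedium;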
Would that work?
The API supports what it supports. So if you've attempted to pair things that are incompatible, you won't get any data back. Things that seem like they should totally work go together like orange juice and milk.
While I worked on the GA stuff through Python, an approach we found helpful for working through incompatible metrics and totals was to make multiple pulls using the same dimensions. Since the data sets are at the same level of grain, as long as you match up each dimension in the set, you can have all the metrics you want.
In your case, I'd have 2 data flows, followed by an Execute SQL Task that brings the data together for the final table
DFT1: Query1 -> Derived Column -> Stage.Table1
DFT2: Query2 -> Derived Column -> Stage.Table2
Execute SQL Task
SELECT
    T1.*, T2.Metric_A, T2.Metric_B, ... T2.Metric_Z
INTO
    #T
FROM
    Stage.Table1 AS T1
    INNER JOIN
    Stage.Table2 AS T2
    ON T2.Dim1 = T1.Dim1 /* etc */ AND T2.Dim7 = T1.Dim7
-- Update rows where you now have solid data, i.e.
-- isDataGolden exists in the "data" section of the response
-- (usually within 7? days but possibly sooner)
UPDATE
    X
SET
    metric1 = T.metric1 /* etc */
FROM
    dbo.X AS X
    INNER JOIN #T AS T
    ON T.Dim1 = X.Dim1
WHERE
    X.isDataGolden IS NULL
    AND T.isDataGolden IS NOT NULL;
-- Add new data, but be aware that not all nodes might have
-- reported in.
INSERT INTO
    dbo.X
SELECT
    *
FROM
    #T AS T
WHERE
    NOT EXISTS (SELECT * FROM dbo.X AS X WHERE X.Dim1 = T.Dim1 /* etc */);
Here's the situation I have. Suppose my index documents look like this:
{
  "user": 1,
  "started": "2021-06-05",
  "finished": -1,
  "status": "ONGOING"
}
{
  "user": 2,
  "started": "2021-06-05",
  "finished": "2021-06-06",
  "status": "DONE"
}
Like this I have 100 docs indexed. The ongoing documents have -1 as the finished time, and completed ones have a valid timestamp. I want to visualize a graph of the number of ongoing applications, with the "started" field on the X-axis.
With a date histogram, I'm only able to get the ongoing processes filtered for that specific interval. But I want each ongoing application to be counted in every interval until the document is updated with the finish time.
Is there any way I can visualize this in Kibana? Even an Elasticsearch query that can give me this output will do.
This is really similar to a problem I had and have now solved. I spent ages trying to create a query that does this to no avail, but luckily this can be achieved using Vega's transforms.
If you want to bin it evenly, rather than using start times as your x-values, here is the posted solution (look for my answer). The one thing I would add: for the documents where you have -1 as the finished time, a formula transform lets you round these to the end bin times.
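As a small illustration, such a formula transform in Vega might look like this, assuming a signal or constant maxTime that holds the end of the binned range (both names are illustrative):
{"type": "formula", "as": "finished",
 "expr": "datum.finished == -1 ? maxTime : datum.finished"}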
However, if you still want the "started"/"finished" fields to be the points of summation/evaluation, this is also possible. I'll give you a quick rundown on how to do this...
Method:
First, create two copies of your data with a common field referring to the "timestamp". The first dataset will have the "started" value assigned to the field "timestamp" (the started dataset), and the second will have "finished" (the finished dataset). You can achieve this using the formula transform.
You will then need to create a column in each dataset named "operation", referring to what that data entry does: add a user or remove a user. For the finished dataset you want to assign a column of -1s, and for the started dataset, 1s. Again, use formula transforms.
Then join these datasets back up, order by "timestamp", and cumulatively sum the "operation" column. This can be achieved using the window transform. A sketch of the whole pipeline follows.
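Here is a minimal sketch of that pipeline as Vega data transforms (the source dataset name source_docs and the field names are assumptions; in Kibana, source_docs would be fed by your Elasticsearch query):
"data": [
  {"name": "started_set", "source": "source_docs", "transform": [
    {"type": "formula", "as": "timestamp", "expr": "datum.started"},
    {"type": "formula", "as": "operation", "expr": "1"}
  ]},
  {"name": "finished_set", "source": "source_docs", "transform": [
    {"type": "filter", "expr": "datum.finished != -1"},
    {"type": "formula", "as": "timestamp", "expr": "datum.finished"},
    {"type": "formula", "as": "operation", "expr": "-1"}
  ]},
  {"name": "events", "source": ["started_set", "finished_set"], "transform": [
    {"type": "collect", "sort": {"field": "timestamp"}},
    {"type": "window", "sort": {"field": "timestamp"},
     "ops": ["sum"], "fields": ["operation"], "as": ["ongoing"],
     "frame": [null, 0]}
  ]}
]
Plotting "ongoing" against "timestamp" then gives the running count of ongoing applications.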
This should give you the data needed to plot it. Arguably this is much more accurate than binning, but if your data set is large it can yield quite messy results; binning in this case is much cleaner.
Good luck, there is obviously a lot to fill in, but a working example would have taken me quite a while to draw up; plus, where is the fun in copying?
I'm trying to set up a sweep job that moves documents from one class to a different class, but I only want to test right now -- not move ALL documents.
I was trying to add a filter to pull over only certain documents to test this before I pull the trigger, but it isn't working (ALL documents get listed in the results when I run this as a preview).
The current filter I have is:
[DocumentTitle] like '%Z*%'
Any ideas how I need to change the filter so it only runs on the subset of documents I want?
Please clarify the following to help resolve your issue:
1) Is your sweep job based on the Java API / .NET API? or
2) Is it based on the FEM (Enterprise Manager) tool?
Regarding the filter: [DocumentTitle] LIKE '%Z%' will match all documents whose title contains 'Z'. Try filtering by ID to fetch one record first; once that is successful, test with multiple records.
Thanks,
Habi
Sweep jobs typically take a condition similar to the part after the WHERE clause in a search. The easiest way, therefore, is to go to the search view, create your search, switch to the SQL view tab, take whatever comes after the WHERE clause, and add it to your sweep search filter.
Here are examples of filter conditions:
VersionStatus = 4 //All superseded documents
DateCreated < NOW() - TimeSpan(365, 'Days') //All documents that were created at least a year ago
StorageArea = OBJECT('{5E2BE09A-F4B1-49E2-A229-77FE32E5FEF1}') //All content in a specific storage area
VersionStatus = 4 AND DateCreated < NOW() - TimeSpan(365, 'Days') AND ContentSize > (1024 * 1024 * 500) //Complex logical expression
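Applied to your title filter, that would likely be the following (assuming SQL-style wildcards, where % rather than * matches any character sequence):
DocumentTitle LIKE '%Z%' //All documents whose title contains 'Z'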
A final point in regard to your question:
I only want to test right now -- not move ALL documents.
Sweeps have a Sweep Mode setting that defines how the sweep will execute; in your case you need to set it to Preview.
I'm testing Kibana 4 for a project.
I have created an index from my database table, which is composed of 3 fields:
Date
User
Action
I would like to display my index as a simple table (3 column, N rows) in my dashboard.
I tried to use the "Data table" visualization, but I can't find a way to display my results without any metrics (Count, Sum, etc.).
Maybe it's pretty simple and I missed something... is there a way to do this?
Regards,
On the Discover tab, create a view that has just the fields you want and then save that as a search.
On the Dashboard tab, click on Edit then hit the + Create new button to add a widget, but if you look at the top, there's a Searches tab. Select that and add your saved search in.
[Elastic 7.x / 2019 Update]
I was a bit confused when I read @Alcanzar's answer, so I am sharing a little more beginner-friendly step-by-step how-to here:
STEP 1: Create the Index Pattern.
STEP 2: Go to the Discover view and create a view on your index. Select each column you want to include in your view by clicking "add" on it. (The confusing part is that until you do that, you will have a "scrambled" view listing everything in a jumbled way.)
STEP 3: Go to the Dashboard view and add your saved view to it. The trick is to select the specific columns you want to include... and voila!
Don't forget to save your view, this will help a lot in the process.
In Kibana 7.5.0 you can do it as follows:
Go to Discover section
Select fields you are interested in
Click on Save to save your discover search so you can use it in visualizations and dashboards
Click on Dashboard and create a new dashboard
Click on Add and select the panel
There is no step 6
The accepted solution has its pros (if, for simplicity, you see your index as a table, this is the only way to deal with rows naturally) but also cons (it lets users see too much information by expanding the records that appear in the table, and users cannot get an export of the values).
So if you plan to build tables for reports seen by users who should not see everything and may want exports of the data, I recommend a different (hacky) approach using Table visualizations:
Say you have three columns A, B and C:
If there are no duplicates considering the combined values of A and B, you can use these two values as aggregation fields and then set a Max or Top hit metric for C (see the sketch after this list).
If even A, B and C together have duplicates, you can use all three as aggregation fields and add a Count metric, which gives you the number of repeated rows. This solution makes some sense: instead of repeating the same row 'n' times, it just tells you that the row should be repeated 'n' times.
If A and B have duplicates but A, B and C together are unique, then there is, afaik, no elegant solution. You have to use all three as aggregation fields, but then you end up with a dummy metric (e.g. Count, always equal to 1).
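For the first case, here is a sketch of the Elasticsearch aggregation such a Data table effectively runs, assuming keyword fields named A, B and C on an index called my-index (all names are illustrative):
POST my-index/_search
{
  "size": 0,
  "aggs": {
    "rows_a": {
      "terms": { "field": "A" },
      "aggs": {
        "rows_b": {
          "terms": { "field": "B" },
          "aggs": {
            "value_c": { "top_hits": { "size": 1, "_source": ["C"] } }
          }
        }
      }
    }
  }
}
Each A/B bucket then corresponds to a row, with the top hit supplying the value of C.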
Why do we have to go through all of this? That is another question...