Crate.io: Facets for search? - faceted-search

Does https://crate.io support facets (for faceted search)?
I didn't find anything in the docs. ElasticSearch replaced facets with aggregations in 2014, but the aggregation section in the crate docs only talks about SQL aggregation functions.
My use case:
I've got a list of web sites; each record has a domain and a language field. When displaying the search results, I want to get a list of all domains that the search results appear in, as well as a list of all languages, ordered by number of occurrences, so search results can be narrowed down. The number of results for each facet value should also be given.

There is no way to get the facets I want from crate itself.
Instead we're enabling the ElasticSearch REST API in crate.yml now
es.api.enabled: true
.. and can use the ElasticSearch aggregation API.

Crate doesn't support facets or Elasticsearch aggregations directly. As you suggested, you can always turn on the Elasticsearch API. However, there are other ways to get these aggregations.
1) Have you considered issuing multiple queries to the cluster? For example, if you load your page dynamically with JavaScript, you can first return the search results and load the facets later. This should also decrease the perceived response time of the application.
2) In CrateDB 2.1.x, there will be support for subqueries, which allow you to include the facets within your query:
select q1.id, q1.domain, q1.tag, q2.d_count, q3.t_count from websites q1,
(select domain, count(*) as d_count from websites where text like '%query%' group by domain) q2,
(select tag, count(*) as t_count from websites where text like '%query%' group by tag) q3
where q1.domain = q2.domain and q1.tag = q3.tag and q1.text like '%query%'
order by q1.id
limit 5;
This gives you a result table like the following, with the search results alongside the domain and tag counts for the query:
+----+-------------+-------------+---------+---------+
| id | domain      | tag         | d_count | t_count |
+----+-------------+-------------+---------+---------+
|  1 | example.com | example     |       2 |       3 |
| 14 | crate.io    | software    |       1 |       4 |
| 17 | google.com  | search      |       5 |       2 |
| 29 | github.com  | open-source |       3 |       3 |
| 47 | linux.org   | software    |       2 |       4 |
+----+-------------+-------------+---------+---------+
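As a rough sketch of the multiple-query approach from option 1: the search results come from one query and each facet list from its own GROUP BY query. This is only an illustration under assumptions: an in-memory SQLite database stands in for the cluster, the rows are made up, and the `language` column name is taken from the question (the subquery example uses `tag` instead).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real cluster; rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE websites (id INTEGER, domain TEXT, language TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO websites VALUES (?, ?, ?, ?)",
    [
        (1, "example.com", "en", "a query about crates"),
        (2, "example.com", "de", "noch eine query"),
        (3, "crate.io", "en", "query the database"),
        (4, "crate.io", "en", "unrelated page"),
    ],
)


def facet_counts(conn, column, term):
    """Return (value, count) pairs for one facet column, most frequent first."""
    # Column names cannot be bound as parameters, so check against a whitelist.
    if column not in {"domain", "language"}:
        raise ValueError(f"unsupported facet column: {column}")
    sql = (
        f"SELECT {column}, COUNT(*) AS n FROM websites "
        f"WHERE text LIKE ? GROUP BY {column} ORDER BY n DESC, {column}"
    )
    return conn.execute(sql, (f"%{term}%",)).fetchall()


# Query 1: the search results themselves.
results = conn.execute(
    "SELECT id, domain, language FROM websites WHERE text LIKE ? ORDER BY id",
    ("%query%",),
).fetchall()

# Queries 2 and 3: one facet list per field; these could be loaded later.
domain_facets = facet_counts(conn, "domain", "query")
language_facets = facet_counts(conn, "language", "query")
```

Each facet query repeats the same WHERE clause as the search, so the counts always describe the current result set.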
Disclaimer: I'm new to Crate :)

Related

Alert on Absent Data for Combined Metric in GCP Monitoring

I have created an alert policy in GCP Monitoring which will notify me when a certain kind of log message stops appearing (a dead man's switch). I have created a logs-based metric with a label, "client", which I use to group the metric and get a timeseries per client. I have been using "absence of data" as the trigger for the alert. This has all been working well, until...
After a recent change, the logs now also come from different resources, so there is a need to combine the metric across those resources. I can achieve this using MQL:
{ fetch gce_instance::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
| every 30m
; fetch global::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
| every 30m }
| union
Notice that I need to align the two series with the same bucket size (30m) to be able to join them, which makes sense. I notice that the value for a timeseries is "undefined" in those buckets where the metric data was absent (by downloading a CSV of the query).
To create an alert using this query, I tried something like this:
{ fetch gce_instance::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
| every 30m
; fetch global::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
| every 30m }
| union
| absent_for 1h
If I look at the CSV output for this query it doesn't reflect the absence of metric data for a timeseries, and this is presumably because a value of "undefined" doesn't qualify as absent data.
Is there a way to detect the absence of data for a "unioned" (and therefore aligned) metric across multiple resources?
Update 1
I have tried this, which seems to get me some of the way there. I'd really appreciate comments on this approach.
{
fetch gce_instance::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
;
fetch global::logging.googleapis.com/user/ping
| group_by [metric.client], sum(val())
}
| union
| absent_for 1h
I have settled on a solution as follows,
{
fetch gce_instance::logging.googleapis.com/user/ping
| group_by [metric.client]
;
fetch global::logging.googleapis.com/user/ping
| group_by [metric.client]
}
| union
| absent_for 1h
| every 30m
Note:
group_by [metric.client] conforms the tables from different resources, which allows the union to work
absent_for does align input timeseries using the default period or one specified by a following every
I found it really hard to debug these MQL queries, in particular to confirm that absent_for was going to trigger an alert. I realised that I could use value [active] to show a plot of the active column (which absent_for produces) and that gave me confidence that my alert was actually going to work.
{
fetch gce_instance::logging.googleapis.com/user/ping
| group_by [metric.client]
;
fetch global::logging.googleapis.com/user/ping
| group_by [metric.client]
}
| union
| absent_for 1h
| value [active]
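To reason about what the unioned condition should report, the intended logic can be mimicked offline. The sketch below is an assumption about the desired behaviour (a client counts as absent when no resource has reported it within the window), not a description of how Cloud Monitoring evaluates absent_for; the function name and sample data are made up.

```python
from datetime import datetime, timedelta


def absent_clients(points_by_resource, now, window=timedelta(hours=1)):
    """Return clients whose latest point across all resources is at least `window` old."""
    last_seen = {}
    # "Union" step: merge the points of every resource, keyed by client only.
    for points in points_by_resource.values():
        for client, ts in points:
            if client not in last_seen or ts > last_seen[client]:
                last_seen[client] = ts
    # "Absent" step: flag clients with no recent point from any resource.
    return sorted(c for c, ts in last_seen.items() if now - ts >= window)


now = datetime(2024, 1, 1, 12, 0)
sample = {
    "gce_instance": [("acme", datetime(2024, 1, 1, 11, 50)),
                     ("globex", datetime(2024, 1, 1, 10, 0))],
    "global": [("globex", datetime(2024, 1, 1, 11, 30)),
               ("initech", datetime(2024, 1, 1, 10, 30))],
}
absent = absent_clients(sample, now)
```

Here "globex" is stale on gce_instance but fresh on global, so only the combined view gives the right answer for it.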

Using header names as a filter parameter in a dashboard

I have a data table that resembles the structure here:
| Prof | PI | Class |
|:----:|:------:|:-----:|
| Dr.K | Louisa | A |
| Dr.L | Jenny | B |
| Dr.X | Liu | C |
Filter 1: I'd like to create two single-selection dropdown parameter filters, the first of which contains the headers of the columns. So, filter one would offer the options Prof, PI, or Class.
Filter 2: The second filter would then dynamically change to represent values of the selected column. If a user chose "Prof" in Filter 1, Filter 2 would show: Dr. K, Dr. L, and Dr. X. The table in the dashboard would then reflect the chosen filters.
I believe choosing "only relevant values" on Filter 2 would take care of some of the issues, but I still don't understand how I can turn column headers into a list, and those values still retain the integrity of the original columns. Thank you for any help you can provide!
IF [Parameter 1] = "Prof" THEN [Prof] ELSEIF [Parameter 1] = "PI" THEN [PI] ELSEIF [Parameter 1] = "Class" THEN [Class] END
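The calculated field above switches columns based on the parameter value. The same idea can be sketched outside the dashboard tool: the first selection names a column, and the second list is built from that column's distinct values. The function names here are hypothetical and the rows are the ones from the question.

```python
rows = [
    {"Prof": "Dr.K", "PI": "Louisa", "Class": "A"},
    {"Prof": "Dr.L", "PI": "Jenny", "Class": "B"},
    {"Prof": "Dr.X", "PI": "Liu", "Class": "C"},
]


def filter_options(rows, column):
    """Values to offer in the second dropdown once `column` is chosen in the first."""
    return sorted({row[column] for row in rows})


def apply_filter(rows, column, value):
    """Rows the table should show for the chosen column/value pair."""
    return [row for row in rows if row[column] == value]


headers = list(rows[0].keys())            # options for the first dropdown
options = filter_options(rows, "Prof")    # options for the second dropdown
selected = apply_filter(rows, "Prof", "Dr.L")
```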

max() Aggregation in Kibana

I'm creating a dashboard in Kibana with an index.
Date | Place | Value
------|-------|-------
12/16|LUXURY | 5
12/16|LUXURY | 3
12/16|LUXURY | 5
12/16|LUXURY | 6
From the above records, I want to fetch only the record with the maximum value (6).
Put simply, the logic would be max(Value) per Date and Place in many languages, but I'm not sure how to express it in Kibana. Please help me to create a column (a new scripted field) for my visualization.
Thanks
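For reference, the grouping logic being asked for, the maximum Value per (Date, Place) pair, looks like this in plain Python. This only illustrates the intended result using the rows from the question; it is not a Kibana scripted field.

```python
records = [
    ("12/16", "LUXURY", 5),
    ("12/16", "LUXURY", 3),
    ("12/16", "LUXURY", 5),
    ("12/16", "LUXURY", 6),
]


def max_per_group(records):
    """Map each (date, place) pair to the largest value seen for it."""
    best = {}
    for date, place, value in records:
        key = (date, place)
        if key not in best or value > best[key]:
            best[key] = value
    return best


maxima = max_per_group(records)
```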

How to Get Distinct Value of DataTable and Append New Column with the Count of each distinct value returned [UIPath][VB.Net]?

I am a newbie in UIPath.
I have a DataTable with these headers:
1.) Date
2.) Error
I want to extract a Distinct Date for every error, and use this code:
dtQuery = ExtractDataTable.DefaultView.ToTable(True,{"Date","Error"})
With this, I get the distinct rows I want. My problem is: how can I append a new column, "Count", holding the number of occurrences of each distinct Date/Error combination? For example:
DATE | ERROR | COUNT
2/27/2019 | Admin Query String |
2/27/2019 | 404 Shield |
2/26/2019 | 404 Shield |
2/25/2019 | 404 Shield |
2/25/2019 | Admin Query String |
I tried to use ADD DATA COLUMN ACTIVITY with these properties:
Column Name = "COUNT"
Data Table = dtQuery
DefaultValue = ExtractDataTable.DefaultView.ToTable(True,{"Date","Error"}).Rows.Count
But by using this, it gives me this:
DATE | ERROR | COUNT
2/27/2019 | Admin Query String | 5
2/27/2019 | 404 Shield | 5
2/26/2019 | 404 Shield | 5
2/25/2019 | 404 Shield | 5
2/25/2019 | Admin Query String | 5
Thanks in advance! Happy coding!
After hours of research, here is what I learned.
I can iterate on each item of the datatable by using FOR EACH ROW Activity.
So for every row item of my dtQuery, I add ASSIGN Activity that looks like this:
row(2) = [item i want to add]
But that doesn't answer my question. I want to know the count of each unique item with 2 criteria - They are same DATE and ERROR.
Maybe I can code directly on the Excel File?
So I looked for an Excel formula that behaves like "Select Distinct Col1....etc."
I found a video tutorial on COUNTIF, which might help.
But COUNTIF only handles a single criterion, so I kept looking and found COUNTIFS.
So to wrap it up,
1.) I loop inside dtQuery using For Each Row Activity
2.) Inside loop, I add Assign Activity with this code
row(2) = "=COUNTIFS('LookUp Sheet'!B:B,'Result Sheet'!A" & indexerRow + 2 & ",'LookUp Sheet'!D:D,'Result Sheet'!B" & indexerRow + 2 & ")"
I hope this helps others who stumble upon the same problem. Happy Automating! ^_^
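The counting that COUNTIFS performs here, one count per distinct Date/Error pair, can also be sketched in a few lines of Python for comparison. The raw rows below are made up for illustration; this is not UiPath or VB.Net code.

```python
from collections import Counter

# Assumption: made-up raw rows; each tuple is (date, error).
raw_rows = [
    ("2/27/2019", "Admin Query String"),
    ("2/27/2019", "404 Shield"),
    ("2/27/2019", "404 Shield"),
    ("2/26/2019", "404 Shield"),
    ("2/25/2019", "404 Shield"),
    ("2/25/2019", "Admin Query String"),
    ("2/25/2019", "Admin Query String"),
]

# Counter keeps keys in first-seen order, so the output order matches the input.
pair_counts = Counter(raw_rows)

# One output row per distinct (date, error) pair, with its count appended.
distinct_with_count = [(date, error, n) for (date, error), n in pair_counts.items()]
```

The key point is that the count is computed per pair, whereas Rows.Count on the distinct table yields the same total for every row.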

SSRS Reporting - count number of group rows

I am trying to get the number of groups in my report. I know I could do it in SQL, but I am trying to avoid adding redundant data to my dataset if I can.
I have a MainDataSet that could have multiple entries per distinct group item. All I want is the no. of groups not the count of items within the group.
For example, take words grouped by their first letter. Let's say I have only 2 groups, A and B (NB: the number of groups can change dynamically as I filter the MainDataSet based on user parameter selection):
Group | Data
------|-----
A | Apple
A | Ant
B | Balloon
B | Book
B | Bowl
Final Result:
Group | Index | NGroups
A | 1 | 2
B | 2 | 2
I know I can get the Index using an aggregate function as follows:
RunningValue(Fields!Group.Value, CountDistinct, "TablixName")
But how do I get the NGroups value?
I guess I could also create another dataset based on the MainDataSet (make use of a sql function) and do:
SELECT 'X' AS GroupCount, COUNT(DISTINCT [Group]) AS NGroups
FROM dbo.udf_MainDataSet()
WHERE FieldX = @Parameter1
Then use a LookUp:
Lookup("X", Fields!GroupCount.Value, Fields!NGroups.Value, "NewDataSet")
But is there a simple solution that I am not seeing?
CountDistinct(Fields!Group.Value, "TablixName")
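To make the difference between the two expressions concrete: the running distinct count yields the Index, while the distinct count over the whole scope yields NGroups. A small Python sketch of that logic, using the sample rows from the question (this is not SSRS expression syntax):

```python
rows = [
    ("A", "Apple"),
    ("A", "Ant"),
    ("B", "Balloon"),
    ("B", "Book"),
    ("B", "Bowl"),
]

# Distinct groups in first-seen order.
groups = list(dict.fromkeys(group for group, _ in rows))

# Distinct count over the whole dataset: same value on every group row.
n_groups = len(groups)

# Running distinct count: position of each group, starting at 1.
final = [(group, index, n_groups) for index, group in enumerate(groups, start=1)]
```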
