Randomly take two different objects for one predicate

I have an RDF dataset like this:
<subject1> <some_predicate> "Value 1" .
<subject1> <some_predicate> "Value 2" .
<subject1> <some_predicate> "Value 3" .
<subject1> <some_predicate> "Value 4" .
<subject1> <some_predicate> "Value 5" .
<subject2> <some_predicate> "Value 6" .
<subject2> <some_predicate> "Value 7" .
<subject2> <some_predicate> "Value 8" .
<subject2> <some_predicate> "Value 9" .
<subject2> <some_predicate> "Value 10" .
Now, for each subject I want to have two random values of "some_predicate". They ought to be two different ones. So, the expected result would be something like:
----------------------------------------------
| subject  | random_value_1 | random_value_2 |
==============================================
| subject1 | "Value 2"      | "Value 5"      |
| subject2 | "Value 6"      | "Value 7"      |
----------------------------------------------
I found this question: sparql: randomly select one connection for each node. However, the problem there is getting just one value; I need two different values.

You can do just about the same thing, but it's a bit more complicated. First select one random value for each subject. Then, in an outer query, select one more random value in the same way, but one that's different from the first (you could allow the same value twice by removing the filter):
select ?subject (sample(?v1) as ?value1) (sample(?v2) as ?value2) {
  { select ?subject ?v1 ?v2 {
      # innermost query: pull each subject's values in a random order
      { select ?subject ?v1 {
          ?subject <some_predicate> ?v1
        }
        order by rand() }
      # pair each ?v1 with the subject's other values, excluding ?v1 itself
      ?subject <some_predicate> ?v2
      filter(!sameTerm(?v1,?v2))
    }
    # shuffle again so that sample(?v2) is random as well
    order by rand()
  }
}
group by ?subject
Note that the same caveats apply as in the linked question, sparql: randomly select one connection for each node: since the implementation of sample isn't specified, it could conceivably give you non-random results. Here are some sample outputs using Jena's ARQ:
---------------------------------------
| subject    | value1     | value2    |
=======================================
| <subject1> | "Value 1"  | "Value 2" |
| <subject2> | "Value 10" | "Value 6" |
---------------------------------------
---------------------------------------
| subject    | value1     | value2    |
=======================================
| <subject1> | "Value 4"  | "Value 1" |
| <subject2> | "Value 8"  | "Value 6" |
---------------------------------------
---------------------------------------
| subject    | value1     | value2    |
=======================================
| <subject1> | "Value 4"  | "Value 3" |
| <subject2> | "Value 6"  | "Value 8" |
---------------------------------------
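If you want to reproduce this locally without a SPARQL server, here is a minimal sketch using Python's rdflib (assumed installed, and assumed to cover the SPARQL 1.1 features the query relies on; the file name is a placeholder):

from rdflib import Graph

g = Graph()
g.parse("data.ttl", format="turtle")  # the triples shown in the question

QUERY = """
select ?subject (sample(?v1) as ?value1) (sample(?v2) as ?value2) {
  { select ?subject ?v1 ?v2 {
      { select ?subject ?v1 {
          ?subject <some_predicate> ?v1
        }
        order by rand() }
      ?subject <some_predicate> ?v2
      filter(!sameTerm(?v1,?v2))
    }
    order by rand()
  }
}
group by ?subject
"""

for row in g.query(QUERY):
    print(row.subject, row.value1, row.value2)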

Related

Convert DAX if statement to Power Query (or M)?

I need to create a calculated column in order to filter a Tabular model table with the following structure:
Table1
| ID | Attr A | Attr B | Value |
|-----|-----------|--------|-------|
| 123 | text here | blah | 130 |
| 123 | blah | blah | 70 |
| 456 | blah | blah | 90 |
| 456 | blah | blah | 110 |
And I want the following new column to be created:
| ID | Attr A | Attr B | Value | MaxValue |
|-----|-----------|--------|-------|----------|
| 123 | text here | blah | 130 | TRUE |
| 123 | blah | blah | 70 | FALSE |
| 456 | blah | blah | 90 | FALSE |
| 456 | blah | blah | 110 | TRUE |
I would like to create a calculated column using Power Query equivalent to the following DAX statement which returns TRUE if the Values column is the largest for a given ID, FALSE otherwise.
= IF(CALCULATE(MAX('Table1'[Value]),ALLEXCEPT('Table1','Table1'[ID])) = 'Table1'[Value], TRUE(), FALSE())
P.S. I used the default M language editor to generate an if shell statement, so this is similar to what I'm looking for:
= Table.AddColumn(#"Changed Type", "MaxValue", each if [#"[Value]"] = 'some logic here' then true else false)
If your source table is set up as shown in your question and is called Table1, then this M code should do what you're asking:
let
    Source = Table1,
    #"Grouped Rows" = Table.Group(Source, {"ID"}, {{"ValueMax", each List.Max([Value]), type number}, {"AllData", each _, type table [ID=text, Attr A=text, Attr B=text, Value=number]}}),
    #"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Attr A", "Attr B", "Value"}, {"Attr A", "Attr B", "Value"}),
    #"Added Custom" = Table.AddColumn(#"Expanded AllData", "MaxValue", each [ValueMax]=[Value]),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"ID", "Attr A", "Attr B", "Value", "MaxValue"})
in
    #"Removed Other Columns"
It should give you the result you described, with MaxValue TRUE on each ID's largest Value.
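For intuition only (not part of the original answer), here is the same group-wise maximum test in Python with pandas, mirroring what the Table.Group / expand / compare steps do:

import pandas as pd

# the sample rows from the question
df = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "Attr A": ["text here", "blah", "blah", "blah"],
    "Attr B": ["blah", "blah", "blah", "blah"],
    "Value": [130, 70, 90, 110],
})
# TRUE where the row's Value equals the max Value within its ID group
df["MaxValue"] = df["Value"] == df.groupby("ID")["Value"].transform("max")
print(df)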

Grafana & Elastic - How to count sub array length

So I have a document that has two nested arrays, i.e.
foo.bars[].baz[]
I am trying to figure out how I can use Grafana to group by bars.id and get a count of baz entries for each bar. So it would look something like:
| bars.id | count |
| 1       | 10    |
| 2       | 15    |
| 3       | 20    |
What I have tried is the following:
Group by bars.id
Add a Sum metric for bars.baz.id
Override the script value to return 1
While this does give me a count, it is the total across all bars in the document rather than grouped by bars.id, i.e.:
| bars.id | count |
| 1       | 45    |
| 2       | 45    |
| 3       | 45    |
Any help achieving this would be much appreciated.
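For reference, here is a rough sketch of the raw Elasticsearch aggregation that the desired table corresponds to, assuming foo.bars and foo.bars.baz are mapped as nested fields and using a hypothetical index named docs; the key point is that the count metric has to sit inside the per-bar terms bucket rather than at the document root:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
body = {
    "size": 0,
    "aggs": {
        "bars": {
            "nested": {"path": "foo.bars"},             # step into the bars array
            "aggs": {
                "by_bar": {
                    "terms": {"field": "foo.bars.id"},  # one bucket per bars.id
                    "aggs": {
                        "baz": {
                            "nested": {"path": "foo.bars.baz"},
                            "aggs": {
                                "count": {"value_count": {"field": "foo.bars.baz.id"}}
                            },
                        }
                    },
                }
            },
        }
    },
}
print(es.search(index="docs", body=body)["aggregations"])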
Now, if this can be done, I have another, more complex problem. There is another collection, call it bobs, that is a child of the root document. bobs isn't nested under the bars array, but each entry has a bar_id field. I would also like to count these per bar, i.e.:
{
  "bobs": [
    {"bar_id": 1},
    {"bar_id": 2}
  ],
  "bars": [
    {"id": 1, "bazes": []},
    {"id": 2, "bazes": []}
  ]
}
In this case I would also like the table to include:
| bars.id | bobs.count |
| 1       | 1          |
| 2       | 1          |
| 3       | 0          |
Is this possible?
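Under the same assumptions as the sketch above, counting bobs per bar would be a separate aggregation over the bobs array; note that a bar with no bobs (like id 3) simply gets no bucket, so the zero row would have to be filled in on the Grafana side:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
body = {
    "size": 0,
    "aggs": {
        "bobs": {
            "nested": {"path": "bobs"},                       # step into the bobs array
            "aggs": {
                "by_bar": {"terms": {"field": "bobs.bar_id"}} # one bucket per bar_id
            },
        }
    },
}
print(es.search(index="docs", body=body)["aggregations"])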

Elasticsearch index with jdbc driver

Sorry, my English is bad.
I am using Elasticsearch and the JDBC river. I have two tables with a many-to-many relation. For example:
product
+---+---------------+
| id| title |
+---+---------------+
| 1 | Product One |
| 2 | Product Two |
| 3 | Product Three |
| 4 | Product Four |
| 5 | Product Five |
+---+---------------+
product_category
+------------+-------------+
| product_id | category_id |
+------------+-------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 4 |
| 2 | 5 |
+------------+-------------+
category
+---+---------------+
| id| name |
+---+---------------+
| 1 | Category One |
| 2 | Category Two |
| 3 | Category Three|
| 4 | Category Four |
| 5 | Category Five |
+---+---------------+
I want the categories to be indexed as an array:
{
  "id": 1,
  "name": "Product One",
  "categories": ["Category One", "Category Two", "Category Three"]
}
How should I write the SQL?
Use elasticsearch-jdbc's structured objects with SQL; there is no need for group_concat:
SELECT
    product.id AS _id,
    product.id,
    title,
    name AS categories
FROM product
LEFT JOIN (
    SELECT *
    FROM product_category
    LEFT JOIN category
        ON product_category.category_id = category.id
) t
    ON product.id = t.product_id
Since rivers have been deprecated since ES v1.5, running a standalone importer is probably the better option.
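For reference, here is a rough sketch of registering that SQL as a JDBC river the pre-1.5 way, in Python; the connection details, index name, and river name are placeholders. As I understand the elasticsearch-jdbc importer, rows sharing an _id are merged, so the repeated name AS categories values become a JSON array, which is what produces the structured categories field:

import json
import requests

river = {
    "type": "jdbc",
    "jdbc": {
        "url": "jdbc:mysql://localhost:3306/shop",  # placeholder connection
        "user": "user",
        "password": "pass",
        "sql": """
            SELECT product.id AS _id, product.id, title, name AS categories
            FROM product
            LEFT JOIN (
                SELECT * FROM product_category
                LEFT JOIN category ON product_category.category_id = category.id
            ) t ON product.id = t.product_id
        """,
        "index": "products",  # hypothetical target index
    },
}
# pre-ES-1.5 river registration endpoint
requests.put(
    "http://localhost:9200/_river/product_river/_meta",
    data=json.dumps(river),
)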

RethinkDB - Query with count and join

I have two "tables" such as:
PEOPLE (ID / NAME)
1 | JOHN
2 | MARY
3 | PETER
MESSAGES (ID / PERSON_ID / TEXT)
1 | 1 | 'Text'
2 | 1 | 'Text 2'
3 | 2 | 'Text 3'
4 | 3 | 'Text 4'
How can I get the number of messages of each person? Just like:
(PERSON_ID / NAME / MESSAGES)
1 | JOHN | 2
2 | MARY | 1
3 | PETER | 1
This should do the trick:
r.db("so").table("messages")
  // count messages per person_id, then flatten the groups
  .group(r.row('person_id')).count().ungroup()
  // merge each group with its person document to pick up the name
  .map((result) => {
    return result.merge(r.db("so").table("users").get(result('group')));
  })
Result looks like this:
[
  {"group": 1, "id": 1, "name": "Dalan", "reduction": 2},
  {"group": 2, "id": 2, "name": "Rodger", "reduction": 3}
]
You can further rename the fields as you like with the .merge method but this gets you the join and grouping that you wanted!
Let me know if you have any questions.
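If you happen to use the Python driver, here is a sketch of the same query with the fields renamed to match the expected output (database and table names as in the answer above; the import style varies between driver versions):

import rethinkdb as r  # pre-2.4 style; newer drivers use: from rethinkdb import RethinkDB

conn = r.connect("localhost", 28015)
result = (
    r.db("so").table("messages")
    # count messages per person_id, then flatten the groups
    .group(r.row["person_id"]).count().ungroup()
    # look up each person's name and rename the fields
    .map(lambda res: {
        "person_id": res["group"],
        "name": r.db("so").table("users").get(res["group"])["name"],
        "messages": res["reduction"],
    })
    .run(conn)
)
print(result)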

Apache Drill - Using Multiple Delimiters in File Storage Plugin?

I have logs that resemble the following:
value1 value2 "value 3 with spaces" value4
using:
"formats": {
"csv": {
"type": "text",
"delimiter": " "
}
}
for the storage plugin, delimiting by " " gives me the following columns:
columns[0] | columns[1] | columns[2] | columns[3] | columns[4] | columns[5] | columns[6]
value1     | value2     | "value     | 3          | with       | spaces"    | value4
what I'd like is:
columns[0] | columns[1] | columns[2]          | columns[3]
value1     | value2     | value 3 with spaces | value4
To my knowledge, there is no way to skip delimiters in Drill. However, if the third field is the only one that can contain those " " characters, a workaround I can think of is:
structure your first query so that the multi-token field always comes last, e.g.
select columns[0], columns[1], columns[2], columns[4], columns[3] from dfs.default.`/path/to/your/file`;
use the CONCAT() function to rebuild that field in a separate column (a rough sketch follows below).
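Here is a rough sketch of that workaround issued through Drill's REST API from Python; the file path and the assumption of exactly seven tokens per line (as in the example above) are placeholders rather than anything from the thread:

import requests

# rebuild the quoted field from the tokens it was split into; with exactly
# seven tokens, value4 lands in columns[6]
query = """
SELECT columns[0], columns[1],
       CONCAT(columns[2], ' ', columns[3], ' ', columns[4], ' ', columns[5]) AS col3,
       columns[6]
FROM dfs.`/path/to/your/file`
"""
resp = requests.post(
    "http://localhost:8047/query.json",  # Drill's REST query endpoint
    json={"queryType": "SQL", "query": query},
)
print(resp.json()["rows"])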
Another way around it would be to change the default delimiter in the file before Drill reads it. Depending on where you are ingesting your data from, this may or may not be feasible.
Good luck and if you are looking for more things on Drill, be sure to check out MapR's Community page on Drill, which has code examples that might be helpful: https://community.mapr.com/community/products/apache-drill
