Finding k nearest neighbors with SPARQL query - algorithm

I would like to write a single SPARQL query that finds the k nearest neighbors for a set of vectors. To find the average label of the 100 nearest neighbors of a single vector, I can use the following query:
PREFIX : <ml://>
PREFIX vector: <ml://vector/>
PREFIX feature: <ml://feature/>
SELECT (AVG(?label) as ?prediction)
WHERE {
{
SELECT ?other_vector (COUNT(?common_feature) as ?similarity)
WHERE { vector:0 :has ?common_feature .
?other_vector :has ?common_feature .
} GROUP BY ?other_vector ORDER BY DESC(?similarity) LIMIT 100
}
?other_vector :hasLabel ?label .
}
Is there a way to do this for multiple vectors in a single query?

Unless I'm overlooking something, you can do this by replacing the URI vector:0 with a variable, like so:
SELECT ?vector (AVG(?label) as ?prediction)
WHERE {
{
SELECT ?vector ?other_vector (COUNT(?common_feature) as ?similarity)
WHERE { ?vector :has ?common_feature .
?other_vector :has ?common_feature .
FILTER(?vector != ?other_vector)
} GROUP BY ?vector ?other_vector ORDER BY DESC(?similarity) LIMIT 100
}
?other_vector :hasLabel ?label .
}
GROUP BY ?vector
I added a filter condition to check that ?vector and ?other_vector are not equal; whether that is necessary is up to you, of course :)
If you need to restrict the list of vectors for which you want to find a match, you can use a VALUES clause to restrict possible bindings for ?vector:
VALUES ?vector { vector:0 vector:1 ... }
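Putting the two together, the complete query could look roughly like this (an untested sketch; note that the LIMIT 100 applies to the subquery result as a whole, i.e. the 100 most similar pairs across all listed vectors, not 100 neighbors per vector):
SELECT ?vector (AVG(?label) as ?prediction)
WHERE {
{
SELECT ?vector ?other_vector (COUNT(?common_feature) as ?similarity)
WHERE {
VALUES ?vector { vector:0 vector:1 }   # list the vectors you want predictions for
?vector :has ?common_feature .
?other_vector :has ?common_feature .
FILTER(?vector != ?other_vector)
} GROUP BY ?vector ?other_vector ORDER BY DESC(?similarity) LIMIT 100
}
?other_vector :hasLabel ?label .
}
GROUP BY ?vector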

Related

Google Sheet Query SUM exclude all SUM equal zero

I'm cracking my brain over this.
I have a very simple query with group and sum, and I want to exclude all sum results that are zero.
My actual query is:
=query(A:J;"SELECT D, SUM(I), SUM(H) WHERE C<>'S' GROUP BY D ORDER BY D DESC")
So... I know I can't do something like this:
=query(A:J;"SELECT D, SUM(I), SUM(H) WHERE C<>'S' AND SUM(I)>0 GROUP BY D ORDER BY D DESC")
I've tried QUERY inside FILTER and QUERY inside QUERY, but I can't figure out how to solve it.
try:
=QUERY(QUERY(A:J;
"select D,sum(I),sum(H)
where C<>'S'
group by D
order by D desc");
"where Col2>0"; 1)

How to use (opaque) cursors in GraphQL / Relay when using filter arguments and order by

Imagine the following GraphQL request:
{
books(
first:10,
filter: [{field: TITLE, contains: "Potter"}],
orderBy: [{sort: PRICE, direction: DESC}, {sort: TITLE}]
)
}
The result will be a connection with the Relay cursor information.
Should the cursor contain the filter and orderBy details?
That is, would querying the next set of data only require:
{
books(first:10, after:"opaque-cursor")
}
Or should the filter and orderBy be repeated?
In the latter case the user can specify different filter and/or orderBy details which would make the opaque cursor invalid.
I can't find anything in the Relay spec about this.
I've seen this done multiple ways, but I've found that with cursor-based pagination, your cursor exists only within your dataset, and to change the filters would change the dataset, making it invalid.
If you're using SQL (or something else without native cursor-based pagination), then you would need to include enough information in your cursor to be able to recover the dataset: all of your filter / order information would have to go into the cursor, and you would need to disallow any additional filtering.
You'd have to throw an error if they sent "after" along with "filter / orderBy". You could optionally check whether the arguments are the same as the ones in your cursor, in case of user error, but there is simply no use case for getting "page 2" of a DIFFERENT set of data.
I came across the same question / problem, and came to the same conclusion as @Dan Crews. The cursor must contain everything you need to execute the database query, except for LIMIT.
When your initial query is something like
SELECT *
FROM DataTable
WHERE filterField = 42
ORDER BY sortingField ASC
LIMIT 10
-- with implicit OFFSET 0
then you could basically (don't do this in a real app, because of SQL Injections!) use exactly this query as your cursor. You just have to remove LIMIT x and append OFFSET y for every node.
Response:
{
edges: [
{
cursor: "SELECT ... WHERE ... ORDER BY ... OFFSET 0",
node: { ... }
},
{
cursor: "SELECT ... WHERE ... ORDER BY ... OFFSET 1",
node: { ... }
},
...,
{
cursor: "SELECT ... WHERE ... ORDER BY ... OFFSET 9",
node: { ... }
}
]
pageInfo: {
startCursor: "SELECT ... WHERE ... ORDER BY ... OFFSET 0"
endCursor: "SELECT ... WHERE ... ORDER BY ... OFFSET 9"
}
}
The next request will then use after: CURSOR, first: 10. Then you'll take the after argument and set the LIMIT and OFFSET:
LIMIT = first
OFFSET = (OFFSET stored in the cursor) + 1
Then the resulting database query would be this when using after = endCursor:
SELECT *
FROM DataTable
WHERE filterField = 42
ORDER BY sortingField ASC
LIMIT 10
OFFSET 10
As already mentioned above: This is only an example, and it's highly vulnerable to SQL Injections!
In a real world app, you could simply encode the provided filter and orderBy arguments within the cursor, and add offset as well:
function handleGraphQLRequest(first, after, filter, orderBy) {
let offset = 0; // initial offset, if after isn't provided
if(after != null) {
// combination of after + filter/orderBy is not allowed!
if(filter != null || orderBy != null) {
throw new Error("You can't combine after with filter and/or orderBy");
}
// parse filter, orderBy, offset from after cursor
const cursorData = fromBase64String(after);
filter = cursorData.filter;
orderBy = cursorData.orderBy;
offset = cursorData.offset;
}
const databaseResult = executeDatabaseQuery(
filter, // = WHERE ...
orderBy, // = ORDER BY ...
first, // = LIMIT ...
offset // = OFFSET ...
);
const edges = []; // this is the resulting edges array
let currentOffset = offset; // this is used to calc the offset for each node
for(let node of databaseResult.nodes) { // iterate over the database results
currentOffset++;
const currentCursor = createCursorForNode(filter, orderBy, currentOffset);
edges.push({
cursor: currentCursor,
node: node
});
}
return {
edges: edges,
pageInfo: buildPageInfo(edges, databaseResult.totalCount, offset) // instead of
// providing totalCount (here assumed to come from the database layer), you could
// also fetch (limit+1) from the database to check if there is a next page available
}
}
// this function returns the cursor string
function createCursorForNode(filter, orderBy, offset) {
return toBase64String({
filter: filter,
orderBy: orderBy,
offset: offset
});
}
// function to build pageInfo object
function buildPageInfo(edges, totalCount, offset) {
return {
startCursor: edges.length ? edges[0].cursor : null,
endCursor: edges.length ? edges[edges.length - 1].cursor : null,
hasPreviousPage: offset > 0 && totalCount > 0,
hasNextPage: offset + edges.length < totalCount
}
}
The content of the cursor depends mainly on your database and your database layout.
The code above emulates a simple pagination with limit and offset. But you could (if supported by your database) of course use something else.
In the meantime I came to another conclusion: I think it doesn't really matter whether you use an all-in-one cursor, or if you repeat filter and orderBy with each request.
There are basically two types of cursors:
(1.) You can treat a cursor as a "pointer to a specific item". This way the filter and sorting can change, but your cursor can stay the same. Kinda like the pivot element in quicksort, where the pivot element stays in place and everything around it can move.
Elasticsearch's Search After works like this. Here the cursor is just a pointer to a specific item in the dataset. But filter and orderBy can change independently.
The implementation for this style of cursor is dead simple: Just concat every sortable field. Done. Example: If your entity can be sorted by price and title (plus of course id, because you need some unique field as tie breaker), your cursor always consists of { id, price, title }.
(2.) The "all-in-one cursor" on the other hand acts like a "pointer to an item within a filtered and sorted result set". It has the benefit, that you can encode whatever you want. The server could for example change the filter and orderBy data (for whatever reason) without the client noticing it.
For example you could use Elasticsearch's Scroll API, which caches the result set on the server and though doesn't need filter and orderBy after the initial search request.
But aside from Elasticsearch's Scroll API, you always need filter, orderBy, limit, pointer in every request. Though I think it's an implementation detail and a matter of taste, whether you include everything within your cursor, or if you send it as separate arguments. The outcome is the same.
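(With a pointer-style cursor, the follow-up query is usually implemented as a keyset / "seek" condition rather than an OFFSET: you keep the same filter and ORDER BY and ask only for rows whose sort-field values come after the ones stored in the cursor.)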

Filtering in linq and assigning to list

I have a list of names in a NamesList list.
I want to filter another collection by it and build a list of objects, NameObject. I am able to achieve this with the following, but I want to avoid the foreach loop. Is there a better way of achieving this?
foreach (string name in NamesList)
{
var find = context.Names.Single(x => x.PersonName == name);
NameObject.Add(find);
}
You can use this query:
NameObject = context.Names.Where(n => NamesList.Contains(n.PersonName)).ToList();
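One thing to be aware of (assuming context is an Entity Framework or LINQ to SQL context): NamesList.Contains(...) is typically translated into a single SQL IN clause, so the filter runs as one database query instead of one Single() lookup per name. Unlike the loop, though, it silently skips names that have no match (where Single() would throw), and the result order comes from the database rather than from NamesList.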

Sparql multi lang data compression to one row

I'd like to select data property values using sparql with some restrictions on their languages:
I have an ordered set of preferred languages ("ru", "en", ... etc )
If an item has more than one language for a value, I'd like to get only one value, chosen according to my set of languages (if ru is available, I want to see the ru value; else if en is available, I want to see en; and so on; if no language tag is available, the untagged value).
Current query is:
select distinct ?dataProperty ?dpropertyValue where {
<http://dbpedia.org/resource/Blackmore's_Night> ?dataProperty ?dpropertyValue.
?dataProperty a owl:DatatypeProperty.
FILTER ( langmatches(lang(?dpropertyValue),"ru") || langmatches(lang(?dpropertyValue),"en") || lang(?dpropertyValue)="" )
}
The problem with it: the results contain two rows for the abstract (ru + en). I want only one row, which should contain ru. In cases where ru is not available, I'd like to get en, and so on.
How?
Suppose you have data like this:
@prefix : <http://stackoverflow.com/q/21531063/1281433/> .
:a a :resource ;
  :p "a in english"@en, "a in russian"@ru .
:b a :resource ;
  :p "b in english"@en .
Then you're hoping to get results like this:
--------------------------------
| resource | label             |
================================
| :b       | "b in english"@en |
| :a       | "a in russian"@ru |
--------------------------------
Here are two ways of doing this.
Associate language tags with ranks, find the rank of the best label, then find the label with that rank
This approach uses SPARQL 1.1 subqueries, aggregates, and data provided with values. The idea is to use values to associate each language tag with a rank. Then you use a subquery to pull out the optimal rank over all the labels that the resource has. Then in the outer query, you have access to the optimal rank, and you just retrieve the label with the language corresponding to that rank.
prefix : <http://stackoverflow.com/q/21531063/1281433/>
select ?resource ?label where {
# for each resource, find the rank of the
# language of the most preferred label.
{
select ?resource (min(?rank) as ?langRank) where {
values (?lang ?rank) { ("ru" 1) ("en" 2) }
?resource :p ?label .
filter(langMatches(lang(?label),?lang))
}
group by ?resource
}
# ?langRank from the subquery is, for each
# resource, the best preference. With the
# values clause, we get just the language
# that we want.
values (?lang ?langRank) { ("ru" 1) ("en" 2) }
?resource a :resource ; :p ?label .
filter(langMatches(lang(?label),?lang))
}
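If your preference list is longer, it should be enough to extend both values blocks in the same way, e.g. with a hypothetical third choice:
values (?lang ?rank) { ("ru" 1) ("en" 2) ("de" 3) }
and, correspondingly, values (?lang ?langRank) { ("ru" 1) ("en" 2) ("de" 3) } in the outer query.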
Select the labels separately and coalesce in the order that you want
You can select an optional label for each of the languages you're considering, and then coalesce them (so you get the first one that's bound) in the order of your preference. This is kind of verbose, but if you need to do anything else with the labels in the various languages other than the most preferred one, you'll have access to them.
prefix : <http://stackoverflow.com/q/21531063/1281433/>
select ?resource ?label where {
# find resources
?resource a :resource .
# grab a russian label, if available
optional {
?resource :p ?rulabel .
filter( langMatches(lang(?rulabel),"ru") )
}
# grab an english label, if available
optional {
?resource :p ?enlabel .
filter( langMatches(lang(?enlabel),"en") )
}
# take either as the label, but russian over english
bind( coalesce( ?rulabel, ?enlabel ) as ?label )
}

cannot convert from 'string' to 'System.Collections.Generic.IEqualityComparer<LightSwitchApplication.LettersSentItem>'

The table PatientsMaster has a relationship with LettersSentItem. In the LettersSentItem table I have a field called LetterType. I have three different types of letters. How can I allow only one letter of each type per patient?
partial void DateSent_Validate(EntityValidationResultsBuilder results)
{
if (this.PatientsMasterItem.LettersSentItem.Count() > 3 || this.PatientsMasterItem.LettersSentItem.Distinct(LetterType))
{
results.AddPropertyError("Can't Print More than 3 letters per patient");
}
}
You need to check whether there is more than one letter of the same type. One way to do this is by grouping on LetterType and checking .Count() for each group:
partial void DateSent_Validate(EntityValidationResultsBuilder results)
{
if (this.PatientsMasterItem.LettersSentItem.GroupBy(i => i.LetterType).Any(l => l.Count() > 1))
{
results.AddPropertyError("Can't print more than one letter of the same type per patient");
}
}
.Distinct(LetterType) isn't going to work: the overload of Distinct() that takes an argument expects an IEqualityComparer<LettersSentItem>, not a property, which is where the compiler error in the title comes from. And even letters.Distinct(someComparer) would only select the distinct letters (letting the comparer decide which ones are equal); once you drop letters with Distinct(), you no longer have the information you need to count.
