Does Weaviate support fielded search?

I'm working with a dataset that contains multiple fields, and I need to search several fields at the same time. Does Weaviate support fielded search? If so, how do I combine multiple search conditions?
This is my schema:
schema = {
    "classes": [{
        "class": "Post",
        "vectorizer": "none",  # explicitly tell Weaviate not to vectorize anything; we provide the vectors ourselves through our BERT model
        "properties": [
            {
                "name": "pmid",
                "dataType": ["int"],
            },
            {
                "name": "title",
                "dataType": ["text"],
            },
            {
                "name": "body",
                "dataType": ["text"],
            },
            {
                "name": "summary",
                "dataType": ["text"],
            },
        ],
    }]
}
I want to search the body and summary simultaneously, for instance to find publications that contain the term "HIV" in both their body and their summary.

This is certainly possible. Check out the where-filter in the Weaviate docs :-)
Here is an example based on your schema:
{
  Get {
    Post(
      nearVector: {
        vector: [0, 0, 0] # <== your custom vector
      }
      where: { # <== searching for a pmid > 12
        operator: GreaterThan
        valueInt: 12
        path: ["pmid"]
      }
    ) {
      pmid
      title
    }
  }
}
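The example above filters on a single property. To require the term in both body and summary, Weaviate's where filter can nest conditions under operator: And with a list of operands (operator: Or would match either field instead). The Python sketch below only builds the GraphQL string for the schema in the question; the helper names are hypothetical, and it assumes the Like operator with valueText suits these text properties, which may depend on your Weaviate version and the properties' tokenization, so check the where-filter docs for your version.

```python
import json


def where_all(fields, term, operator="Like"):
    """Return a Weaviate GraphQL `where` argument that requires `term` in every field.

    Each field becomes one operand; `operator: And` means every operand must match.
    """
    operands = ", ".join(
        f"{{path: [{json.dumps(field)}], operator: {operator}, valueText: {json.dumps(term)}}}"
        for field in fields
    )
    return f"where: {{operator: And, operands: [{operands}]}}"


def post_query(vector, fields, term):
    """Combine the custom vector search with the multi-field filter."""
    return (
        "{ Get { Post(\n"
        f"  nearVector: {{vector: {json.dumps(vector)}}}\n"
        f"  {where_all(fields, term)}\n"
        ") { pmid title } } }"
    )


print(post_query([0, 0, 0], ["body", "summary"], "HIV"))
```

Swapping And for Or in where_all would return posts where either field matches.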


Incorrectly selected data in the query

I only need the articles that contain the EmailMarketing tag.
I'm probably searching on the tag incorrectly, since tags is an array of values rather than a single object, but I don't know how to do it right because I'm just learning GraphQL. Any help would be appreciated.
query:
query {
  enArticles {
    title
    previewText
    tags(where: {name: "EmailMarketing"}) {
      name
    }
  }
}
result:
{
  "data": {
    "enArticles": [
      {
        "title": "title1",
        "previewText": "previewText1",
        "tags": [
          {
            "name": "EmailMarketing"
          },
          {
            "name": "Personalization"
          },
          {
            "name": "Advertising_campaign"
          }
        ]
      },
      {
        "title": "title2",
        "previewText": "previewText2",
        "tags": [
          {
            "name": "Marketing_strategy"
          },
          {
            "name": "Marketing"
          },
          {
            "name": "Marketing_campaign"
          }
        ]
      },
      {
        "title": "article 12",
        "previewText": "article12",
        "tags": []
      }
    ]
  }
}
I believe you first need to implement an equality operator in your GraphQL schema. There's a good explanation of that here.
Once you add an equality operator, say _eq, you can use it like this:
query {
  enArticles {
    title
    previewText
    tags(where: {name: {_eq: "EmailMarketing"}}) {
      name
    }
  }
}
Specifically, you would need to create a filter and resolver.
The example here may help.
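The filtering such a resolver would perform can be sketched in plain Python, independent of any GraphQL server library. This assumes the goal from the question (return only articles with at least one tag named EmailMarketing) and the hypothetical _eq operator from the answer; filter_articles_by_tag is an invented name. In a real schema this logic would typically sit in the resolver for enArticles, because a where argument on the nested tags field can at most narrow each article's tag list, not drop the article itself.

```python
def filter_articles_by_tag(articles, where):
    """Keep only articles that have at least one tag matching `where`.

    Supports the shape {"name": {"_eq": value}}; `_eq` is the hypothetical
    equality operator from the answer, not a built-in GraphQL feature.
    """
    wanted = where["name"]["_eq"]
    return [
        article
        for article in articles
        if any(tag["name"] == wanted for tag in article["tags"])
    ]


# Sample data copied from the result in the question.
ARTICLES = [
    {"title": "title1", "previewText": "previewText1",
     "tags": [{"name": "EmailMarketing"}, {"name": "Personalization"}, {"name": "Advertising_campaign"}]},
    {"title": "title2", "previewText": "previewText2",
     "tags": [{"name": "Marketing_strategy"}, {"name": "Marketing"}, {"name": "Marketing_campaign"}]},
    {"title": "article 12", "previewText": "article12", "tags": []},
]

result = filter_articles_by_tag(ARTICLES, {"name": {"_eq": "EmailMarketing"}})
print([article["title"] for article in result])  # ['title1']
```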

Cursor Based Pagination Naming Convention in GraphQL

In GraphQL APIs, I often see values such as NQ and MQ used as cursors. Here is an example:
{
  "data": {
    "items": {
      "totalCount": 351,
      "pageInfo": {
        "hasNextPage": true,
        "hasPreviousPage": false,
        "endCursor": "Mw",
        "startCursor": "MQ"
      },
      "edges": [
        {
          "cursor": "MQ",
          "node": {
            "id": "UGxhY2UtMzUy",
            "displayName": "Redbeard"
          }
        },
        {
          "cursor": "Mg",
          "node": {
            "id": "UGxhY2UtMzUx",
            "displayName": "Frey of Riverrun"
          }
        },
        {
          "cursor": "Mw",
          "node": {
            "id": "QmlsbGVyLTI=",
            "displayName": "Something Else"
          }
        }
      ]
    }
  }
}
Source:
https://dev.to/tymate/first-dive-into-graphql-ruby-nak
Other examples include this Rails example: https://www.2n.pl/blog/graphql-pagination-in-rails
What do these values mean, and how would you, for example, paginate?
The Relay Server Specification defines how pagination should be done in order to be compatible with the Relay GraphQL client. While it is not the only way to paginate, it has evolved into a de facto standard, at least in examples, since it is easy to reference.
The section on connections gives more info about how cursors work:
Each edge gets a cursor value. This value is what the spec calls an opaque value, meaning the client should not interpret it: it is a reference, a pointer that only the server can interpret. So, if you have a query that returns a set of edges:
edges: [
  { cursor: "abc", node: {...} },
  { cursor: "def", node: {...} },
  { cursor: "ghi", node: {...} },
  { cursor: "jkl", node: {...} },
  { cursor: "mno", node: {...} }
]
You can request the next page by taking the cursor of the last element, mno, and passing it into the query as the after argument:
query {
  manyQuery(first: 5, after: "mno") {
    edges {
      cursor
      node {...}
    }
  }
}
This will give you the next 5 nodes. See also this section on graphql.org.
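On the server side, first/after pagination comes down to finding the position of the after cursor and returning the next first edges. A minimal in-memory sketch, assuming the edges are kept in a stable order and every cursor is unique; paginate and the sample cursors (the five from the example plus two extra) are illustrative, not part of any library.

```python
EDGES = [
    {"cursor": cursor, "node": {"id": index}}
    for index, cursor in enumerate(["abc", "def", "ghi", "jkl", "mno", "pqr", "stu"])
]


def paginate(edges, first, after=None):
    """Return up to `first` edges that come after the `after` cursor.

    Cursors are treated as opaque: they are only compared, never parsed.
    """
    start = 0
    if after is not None:
        positions = [edge["cursor"] for edge in edges]
        start = positions.index(after) + 1  # raises ValueError for an unknown cursor
    page = edges[start:start + first]
    return {
        "edges": page,
        "pageInfo": {
            "hasNextPage": start + first < len(edges),
            "endCursor": page[-1]["cursor"] if page else None,
        },
    }


first_page = paginate(EDGES, first=5)
next_page = paginate(EDGES, first=5, after=first_page["pageInfo"]["endCursor"])
print([edge["cursor"] for edge in next_page["edges"]])  # ['pqr', 'stu']
```

The client never builds a cursor itself; it only echoes back endCursor from the previous page.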
So to answer your question: the string can contain anything the server can use to reference one of your nodes, e.g. an id in the database. To discourage API users from passing in arbitrary values of their own, this string is often base64-encoded. The value should be meaningless to the client and only be passed back to the server.
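The cursor values in the question can be checked directly: MQ, Mg, Mw and NQ are the base64 encodings of the strings 1, 2, 3 and 5 with the trailing = padding stripped (presumably positions in the list), and the node ids decode the same way (UGxhY2UtMzUy is Place-352). A small Python sketch of such an encoding; the helper names are hypothetical, and a real server may encode something else entirely, such as a database id.

```python
import base64


def encode_cursor(value: str) -> str:
    """Base64-encode a server-side key and strip the '=' padding."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii").rstrip("=")


def decode_cursor(cursor: str) -> str:
    """Restore the stripped padding and decode a cursor back to the server-side key."""
    padded = cursor + "=" * (-len(cursor) % 4)
    return base64.b64decode(padded).decode("utf-8")


for cursor in ["MQ", "Mg", "Mw", "NQ"]:
    print(cursor, "->", decode_cursor(cursor))
```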

GraphQL filter: array containing ALL values

I am quite new to GraphQL, and after searching the whole afternoon, I haven't found an answer to a relatively simple problem.
I have two objects in my Strapi backend:
"travels": [
  {
    "id": "1",
    "title": "Bolivia: La Paz y Salar de Uyuni",
    "travel_types": [
      {
        "name": "Culturales"
      },
      {
        "name": "Aventura"
      },
      {
        "name": "Ecoturismo"
      }
    ]
  },
  {
    "id": "2",
    "title": "Europa clásica 2020",
    "travel_types": [
      {
        "name": "Clasicas"
      },
      {
        "name": "Culturales"
      }
    ]
  }
]
I am trying to get a filter where I search for travels containing ALL the user-selected travel_types.
I then wrote a query like this:
query($where: JSON) {
  travels(where: $where) {
    id # Or _id if you are using MongoDB
    title
    travel_types { name }
  }
}
And the variables I pass in for testing:
{
  "where": {
    "travel_types.name_contains": ["Aventura"],
    "travel_types.name_contains": ["Clasicas"]
  }
}
This should return an empty array, because none of the travels has both the Aventura and Clasicas travel types.
But instead it returns the travel with id=2. It seems that only the second filter is applied.
I looked for a query that behaves like Array.every() in JavaScript, but I wasn't able to find one.
Does anyone have an idea how to achieve this type of filtering?
Thank you very much,
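The most likely reason only the second filter takes effect is that the variables object repeats the key travel_types.name_contains, and JSON parsers generally keep only the last occurrence of a repeated key (JavaScript's JSON.parse and Python's json module both do), so the Aventura condition is discarded before the query runs. The Python sketch below demonstrates this and, as a client-side fallback rather than a Strapi feature, applies an Array.every()-style "contains ALL" check with all(). The helper has_all_types is hypothetical; a server-side solution depends on which filter syntax your Strapi version supports.

```python
import json

# The variables from the question: the same key appears twice in one object.
RAW_VARIABLES = """
{
  "where": {
    "travel_types.name_contains": ["Aventura"],
    "travel_types.name_contains": ["Clasicas"]
  }
}
"""

# json.loads keeps only the last value for a repeated key,
# so the "Aventura" condition never reaches the server.
variables = json.loads(RAW_VARIABLES)
print(variables)  # {'where': {'travel_types.name_contains': ['Clasicas']}}

TRAVELS = [
    {"id": "1", "title": "Bolivia: La Paz y Salar de Uyuni",
     "travel_types": [{"name": "Culturales"}, {"name": "Aventura"}, {"name": "Ecoturismo"}]},
    {"id": "2", "title": "Europa clásica 2020",
     "travel_types": [{"name": "Clasicas"}, {"name": "Culturales"}]},
]


def has_all_types(travel, selected):
    """True if the travel has every selected type, like Array.every() in JavaScript."""
    names = {travel_type["name"] for travel_type in travel["travel_types"]}
    return all(name in names for name in selected)


print([t["id"] for t in TRAVELS if has_all_types(t, ["Aventura", "Clasicas"])])  # []
```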

Is this expected query performance from Cosmos DB for "between" queries on an integer property?

I have a Cosmos DB collection (SQL API) that I've populated with documents representing CIDR network ranges.
The relevant part of each document is:
{
  "Network": "31.216.102.0/23",
  "IPRangeStart": 534275584,
  "IPRangeEnd": 534276095
}
Each CIDR block has its start and end IP addresses converted to uint and stored in the IPRangeStart and IPRangeEnd properties.
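The conversion described here can be reproduced with Python's standard ipaddress module, which also confirms the sample document: 31.216.102.0/23 spans 512 addresses, from 534275584 to 534276095. The helper names are illustrative.

```python
import ipaddress


def cidr_to_range(cidr: str) -> tuple:
    """Return (first, last) IPv4 address of a CIDR block as unsigned integers."""
    network = ipaddress.IPv4Network(cidr)
    return int(network.network_address), int(network.broadcast_address)


def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 address to the integer stored in the documents."""
    return int(ipaddress.IPv4Address(ip))


start, end = cidr_to_range("31.216.102.0/23")
print(start, end, end - start + 1)  # 534275584 534276095 512
```

The lookup value 534275590 used in the range query corresponds to ip_to_int("31.216.102.6").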
When I run a query that searches for a specific entry by its start value, it works as expected and is quite fast.
SELECT top 1 * FROM c WHERE c.IPRangeStart = 532361216
Request Charge: 3.02 RUs
However, when I introduce a between-style query using the <= / >= operators, it gets VERY expensive.
SELECT top 1 * FROM c WHERE c.IPRangeStart <= 534275590 AND c.IPRangeEnd >= 534275590
Request Charge: 1647.99 RUs
I've reviewed the indexing policy on the collection.
I've also added 2 integer range indexes on the collection for the two properties in question, though there doesn't appear to be a way to check the progress of these indexes being built in the background.
Is there something obvious that I might be missing?
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Hash",
          "dataType": "String",
          "precision": 3
        }
      ]
    },
    {
      "path": "/IPRangeStart/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Hash",
          "dataType": "String",
          "precision": 3
        }
      ]
    },
    {
      "path": "/IPRangeEnd/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Hash",
          "dataType": "String",
          "precision": 3
        }
      ]
    }
  ],
  "excludedPaths": []
}
I think I solved it. The problem stemmed from the fact that I had a less-than condition on one property and a greater-than condition on a different property.
It appears that Cosmos DB was merging the full sets of documents that satisfied each filter clause independently.
Since the largest CIDR range in the set was a /18 (a 16k address block), I was able to get it working by bounding both properties:
SELECT TOP 1 * FROM c
WHERE c.IPRangeStart <= @value
AND c.IPRangeStart >= @value - 32768
AND c.IPRangeEnd >= @value
AND c.IPRangeEnd <= @value + 32768
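Why the window is safe: a document can only contain the looked-up value if value - IPRangeStart is smaller than that document's block size, so the extra bounds lose no matches as long as the window is at least the size of the largest stored block (2^(32-18) = 16384 for a /18); 32768 leaves a factor-of-two margin. A Python sketch that builds the parameterized query and mirrors the WHERE clause for testing; the parameter names @value and @window and the helper names are assumptions, and the {"query", "parameters"} shape follows the Cosmos DB SQL API's parameterized-query format.

```python
# Must be at least the size of the largest CIDR block stored in the collection.
WINDOW = 32768

QUERY = (
    "SELECT TOP 1 * FROM c "
    "WHERE c.IPRangeStart <= @value AND c.IPRangeStart >= @value - @window "
    "AND c.IPRangeEnd >= @value AND c.IPRangeEnd <= @value + @window"
)


def block_size(prefix_len: int) -> int:
    """Number of IPv4 addresses in a block with the given prefix length."""
    return 2 ** (32 - prefix_len)


def query_spec(value: int, window: int = WINDOW) -> dict:
    """Parameterized query spec: query text plus named parameters."""
    return {
        "query": QUERY,
        "parameters": [
            {"name": "@value", "value": value},
            {"name": "@window", "value": window},
        ],
    }


def matches(doc: dict, value: int, window: int = WINDOW) -> bool:
    """Pure-Python mirror of the WHERE clause, useful for checking the bounds."""
    return (
        value - window <= doc["IPRangeStart"] <= value
        and value <= doc["IPRangeEnd"] <= value + window
    )


doc = {"IPRangeStart": 534275584, "IPRangeEnd": 534276095}
print(block_size(18), matches(doc, 534275590))  # 16384 True
```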

Parse Query by subfield/dot notation

tl;dr
Can ParseCloud/MongoDB filter by Pointer<class>.field? By Pointer<class>.Pointer<class>? By the existence of data in that field?
Long question:
Round is an object that is played automatically when its time comes.
Payment is an object indicating that a user made a payment. When a payment is spent, we set its round field.
Player links an online User with a Payment.
I need to query Player for a few conditions:
1. The Player is online and has a valid payment (no round, and valid equal to 'valid').
2. The Player's user equals a specific user, and the Player has no payment.
3. The Player's user equals a specific user, and the Player has a valid payment (no round, and valid equal to 'valid').
I got everything working except validating the Payment inside the Player query.
Here is condition 1 from the list:
var query = new Parse.Query(keys.Player);
query.skip(0);
query.limit(oneRoundMaxPlayers);
query.greaterThanOrEqualTo(keys.last_online_date, lastAllowedOnline);
// no filter appears to be applied here
query.doesNotExist("payment.round");
query.exists(keys.payment);
// This line will make query return 0 elements
// query.equalTo("payment.valid", "valid");
query.include(keys.user);
query.include(keys.payment);
Here are conditions 2 OR 3:
var queryPaymentExists = new Parse.Query(keys.Player);
queryPaymentExists.skip(0);
queryPaymentExists.limit(1);
queryPaymentExists.exists(keys.payment);
// This line does not filter
queryPaymentExists.doesNotExist(keys.payment + "." + keys.round);
queryPaymentExists.equalTo(keys.user, user);
// This line makes query always return 0 elements
// queryPaymentExists.equalTo(keys.payment + "." + keys.valid, keys.payment_valid);
var queryPaymentDoesNotExist = new Parse.Query(keys.Player);
queryPaymentDoesNotExist.skip(0);
queryPaymentDoesNotExist.limit(1);
queryPaymentDoesNotExist.doesNotExist(keys.payment);
queryPaymentDoesNotExist.equalTo(keys.user, user);
var compoundQuery = Parse.Query.or(queryPaymentExists, queryPaymentDoesNotExist);
compoundQuery.include(keys.user);
compoundQuery.include(keys.payment);
compoundQuery.include(keys.payment + "." + keys.round);
I've checked the logs, and they look like this:
verbose: REQUEST for [GET] /classes/Player: {
  "include": "user,payment,payment.round",
  "where": {
    "$or": [
      {
        "payment": {
          "$exists": true
        },
        "payment.round": {
          "$exists": false
        },
        "user": {
          "__type": "Pointer",
          "className": "_User",
          "objectId": "ASPKs6UVwb"
        }
      },
      {
        "payment": {
          "$exists": false
        },
        "user": {
          "__type": "Pointer",
          "className": "_User",
          "objectId": "ASPKs6UVwb"
        }
      }
    ]
  }
}
Here is the response:
verbose: RESPONSE from [GET] /classes/Player: {
  "response": {
    "results": [
      {
        "objectId": "VHU9uwmLA7",
        "last_online_date": {
          "__type": "Date",
          "iso": "2017-10-28T15:15:23.547Z"
        },
        "user": {
          "objectId": "ASPKs6UVwb",
          "username": "cn92Ekv5WPJcuHjkmTajmZMDW",
          "createdAt": "2017-10-22T11:43:16.804Z",
          "updatedAt": "2017-10-25T09:23:20.035Z",
          "ACL": {
            "*": {
              "read": true
            },
            "ASPKs6UVwb": {
              "read": true,
              "write": true
            }
          },
          "__type": "Object",
          "className": "_User"
        },
        "createdAt": "2017-10-27T21:03:35.442Z",
        "updatedAt": "2017-10-28T15:15:23.556Z",
        "payment": {
          "objectId": "nr7ln7U3eJ",
          "payment_date": {
            "__type": "Date",
            "iso": "2017-10-27T23:42:50.614Z"
          },
          "user": {
            "__type": "Pointer",
            "className": "_User",
            "objectId": "ASPKs6UVwb"
          },
          "createdAt": "2017-10-27T23:42:50.624Z",
          "updatedAt": "2017-10-28T15:12:30.131Z",
          "valid": "valid",
          "round": {
            "objectId": "jF9gqG4ndh",
            "round_date": {
              "__type": "Date",
              "iso": "2017-10-28T15:12:00.027Z"
            },
            "createdAt": "2017-10-28T15:11:00.036Z",
            "updatedAt": "2017-10-28T15:12:30.108Z",
            "ACL": {
              "*": {
                "read": true
              }
            },
            "__type": "Object",
            "className": "Round"
          },
          "ACL": {
            "ASPKs6UVwb": {
              "read": true
            }
          },
          "__type": "Object",
          "className": "Payment"
        },
        "ACL": {
          "ASPKs6UVwb": {
            "read": true
          }
        }
      }
    ]
  }
}
You can see that the response contains payment.round, even though the query asked for payment.round not to exist.
My questions are:
Can ParseCloud/MongoDB filter by Pointer<class>.field? By Pointer<class>.Pointer<class>? By the existence of data in that field?
How can I work around this when I need to check whether a field is present, given that a User can have many Players and many Payments?
Update:
As far as I can tell, MongoDB supports filtering by dot notation:
mongodb query by sub-field
So what am I doing wrong?
Short answer:
No
Simplify your data structure
Long answer:
Dot notation can be used to:
include the documents of pointers, as you already did in your code, e.g. include(keys.user)
filter on properties of fields whose value is an object, e.g. {propertyA: 1, propertyB: 2}. All the data is in the field itself, not in another document in another collection that is referenced by a Parse pointer.
Dot notation cannot be used as a filter parameter on referenced pointers in a Parse query. MongoDB does not support such filtering either, because the pointer is a Parse concept, not a MongoDB one. MongoDB is a NoSQL database, not a relational SQL database, so its query language has no relations between tables to query across. However, Parse provides some SQL-like comfort for simple queries through its concepts of pointers, compoundQuery and matchesKeyInQuery.
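For this particular case Parse offers a subquery instead of dot notation: in the REST where format shown in the logs above, $inQuery matches objects whose pointer refers to an object satisfying an inner query (the JavaScript SDK builds it with query.matchesQuery(key, innerQuery)). A sketch of condition 1 as such a where object in Python; the helper name and the cutoff timestamp are made-up examples, and this has not been tested against the question's data.

```python
import json


def valid_payment_where(last_allowed_online_iso: str) -> dict:
    """REST `where` for condition 1: online Players with a valid, unspent Payment.

    `$inQuery` keeps Players whose `payment` pointer refers to a Payment
    matching the inner query, so no dot notation on the pointer is needed.
    """
    return {
        "last_online_date": {"$gte": {"__type": "Date", "iso": last_allowed_online_iso}},
        "payment": {
            "$inQuery": {
                "className": "Payment",
                "where": {"round": {"$exists": False}, "valid": "valid"},
            }
        },
    }


# Hypothetical cutoff timestamp, for illustration only.
print(json.dumps(valid_payment_where("2017-10-28T15:00:00.000Z"), indent=2))
```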
If that is not sufficient in your case, simply add the fields to the collection you query. The cost is that the same fields and data may end up in multiple collections; the benefit is faster query execution.
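As a hypothetical illustration of this approach, Player could carry copies of the Payment state it is filtered on, kept up to date whenever a Payment changes (for example from a Cloud Code afterSave trigger). The field names has_payment, payment_valid and payment_spent, and the in-memory matches check standing in for a Parse query, are invented for this sketch; with such fields the payment part of each condition becomes a plain equality filter on Player.

```python
from typing import Optional


def sync_payment_state(player: dict, payment: Optional[dict]) -> dict:
    """Return a copy of `player` with hypothetical denormalized Payment fields."""
    updated = dict(player)
    updated["has_payment"] = payment is not None
    updated["payment_valid"] = payment is not None and payment.get("valid") == "valid"
    # A payment counts as spent once its `round` pointer has been set.
    updated["payment_spent"] = payment is not None and payment.get("round") is not None
    return updated


# Payment part of conditions 1 and 3: a valid payment without a round.
VALID_UNSPENT = {"payment_valid": True, "payment_spent": False}
# Payment part of condition 2: no payment at all.
NO_PAYMENT = {"has_payment": False}


def matches(player: dict, where: dict) -> bool:
    """In-memory equality check standing in for a Parse query in this sketch."""
    return all(player.get(key) == value for key, value in where.items())


fresh = sync_payment_state({"objectId": "p1"}, {"valid": "valid"})
print(matches(fresh, VALID_UNSPENT))  # True
```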
Finding the right data structure is one of the big topics in NoSQL, as there is no generally right structure. The collections and document structures are basically designed as a trade-off between:
execution performance
query necessity / frequency
security (access level)
and data storage size
These choices are not fixed and can change over time: as your app and its queries evolve, you would change the data structure whenever the long-term gain outweighs the one-time effort.
