How to use an auto-increment index in Tarantool?

I created an auto-increment index:
box.space.metric:create_index('primary', {
    parts = {{'id', 'unsigned'}},
    sequence = true,
})
Then I try to pass nil in the id field:
metric.id = nil
When I try to insert this value, I get an error:
Tuple field 1 type does not match one required by operation: expected unsigned
What value do I have to pass for the auto-increment field?
Second question: if I use a Tarantool cluster with several instances (for example, a Cartridge-based application), is it safe to use auto-increment indexes? Could there be cases where duplicate keys appear on different instances?

It is not possible to pass nil: assigning nil to a Lua table field erases the field entirely. Use box.NULL instead.
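For example, a minimal sketch, assuming the space has a two-field {id, name} layout (the map form via frommap() requires a space format to be defined):

-- box.NULL keeps the field present, so the attached sequence
-- fills in the next id; plain nil would remove the field entirely
box.space.metric:insert{box.NULL, 'cpu_load'}

-- the same with a map converted to a tuple
metric.id = box.NULL
box.space.metric:insert(box.space.metric:frommap(metric))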
Better still, instead of auto-increment, which only works within a single node, use some kind of cluster-wide id that behaves well across the whole cluster.
For cluster-wide ids I would suggest UUIDs or something like ULIDs (for example, from https://github.com/moonlibs/id).
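A minimal sketch of the UUID approach, assuming the key field is redefined with the 'uuid' (or 'string') type instead of 'unsigned':

local uuid = require('uuid')

-- uuid.str() yields a 36-character id that is unique across
-- instances, so nodes need no coordination to avoid duplicates
box.space.metric:insert{uuid.str(), 'cpu_load'}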

Related

Elasticsearch query not returning expected results for multiple should filters

I am performing an Elasticsearch query using the High Level REST Client for Java and expect to see records that are either active or have no reference id. I'm querying for the records by name, and if I hit the index directly with /_search?q=, I see the results I want.
Is my logic correct (pseudo-code):
postFilters.MUST {
    Should {
        MustNotExist {referenceId}
        Must {status = Active}
    }
    Should {
        MustNotExist {referenceId}
        Must {type = Person}
    }
}
What I get are records that are active with a reference id. But I also want to include records that do not have a referenceId, hence the MustNotExist {referenceId}.
For simplicity, the second Should clause can be dropped (for testing), as the first one does not work as expected even by itself.
In my case, I had to use a match query instead of a term query because the value I was querying for was not a primitive or a String. For example, in the Must {type = Person} part, Person was an enum, so looking for the exact term "Person" was not quite right, whereas match allowed it to "match".
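For reference, a sketch of the "active OR no referenceId" condition with the Java High Level REST Client (the field names referenceId and status are taken from the question; minimumShouldMatch makes the OR explicit):

import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

// at least one should-branch must match: either the document has
// no referenceId at all, or its status matches "Active"
BoolQueryBuilder activeOrUnreferenced = QueryBuilders.boolQuery()
        .should(QueryBuilders.boolQuery()
                .mustNot(QueryBuilders.existsQuery("referenceId")))
        .should(QueryBuilders.matchQuery("status", "Active"))
        .minimumShouldMatch(1);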

Slow AQL and data type conversion, how can I improve my AQL performance?

Hello ArangoDB community,
I have imported two collections from SQLite into ArangoDB with arangoimport (via a CSV).
Next, I am trying to run a simple AQL query to cross-reference these collections (with the end goal of connecting them via edges).
Collection1 has 1,682,642 documents
Collection2 has 3,290 documents
The following AQL takes a whopping 30 seconds to complete:
FOR c1 IN Collection1
  FOR c2 IN Collection2
    FILTER c2._key == TO_STRING(c1.someField)
    RETURN { "C2": c2._id, "C1": c1._id }
If I switch the conversion around, it takes forever (I gave up after 5 minutes):
FOR c1 IN Collection1
  FOR c2 IN Collection2
    FILTER TO_NUMBER(c2._key) == c1.someField
    RETURN { "C2": c2._id, "C1": c1._id }
Adding an index on "someField" didn't help.
The same JOIN query in SQLite (from which the data was imported) takes less than 1 second to complete.
A few thoughts and questions:
1) How can I know the data types of the fields in a document?
2) _key is a string. I think "someField" is a number (because without the TO_STRING, no results are returned).
3) Does adding TO_STRING on "someField" effectively make the index on that field unusable?
4) Is there a way to make _key a number (preferably an integer)? I assume number comparison is faster, is it not?
5) Alternatively, can I tell arangoimport to force "someField" to be a string?
6) Is there anything else I can do to make the AQL run faster?
Any input appreciated,
Elad
The supported data types follow the JSON specification. You can determine the data types by looking at a document, e.g. using the Web UI. Use the Code view mode in the document editor to see the document as JSON:
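(a sketch of such a document; the attribute names and song titles are illustrative)

{
  "name": "Aerosmith",
  "year": 1973,
  "genres": [ "hard rock", "blues rock" ],
  "songs": [ { "title": "Dream On" }, { "title": "Mama Kin" } ]
}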
"Aerosmith" is a string, 1973 is a number, the genres are string in an [ ... ] array and each song is an { ... } object. There are also null, true and false literals.
For a programmatic way to determine the data type of an attribute, there are type check functions, e.g. TYPENAME(), which returns the data type name as a string. An example query that counts how often the attribute someField has which data type:
RETURN MERGE(
  FOR c1 IN Collection1
    COLLECT type = TYPENAME(c1.someField) WITH COUNT INTO count
    RETURN { [type]: count }
)
_key is always a string indeed. You can use the above query if you are unsure what someField is. Please share this information.
If you cast a value which is only known at run time (here: a document attribute) to a different type, then yes, no index can be utilized. An index lookup is only possible if you query for a value as-is. However, you may type-cast bind variables and other constant values, as they are known at query compile time.
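For instance, something along these lines should still be able to use an index on _key, since the cast of the bind variable is evaluated at compile time (a sketch; @someValue is a hypothetical bind variable):

FOR c2 IN Collection2
  FILTER c2._key == TO_STRING(@someValue)
  RETURN c2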
No, the document key is always a string. There is an index on the _key attribute (the primary index), so there is no performance penalty from it being a string rather than a number.
arangoimport has an option to convert numeric strings to numbers, "null" to null and "true" / "false" to Boolean values (--convert), but there is no option to force an attribute to become a string. There is a feature request to add the ability to prescribe the desired data types.
In case you want numeric strings to stay strings, use --convert false to turn the auto-conversion off. If the values are numbers in the source file (not in quote marks), then you can adjust the file before you import it. You can also use a one-off AQL query to convert an attribute to a certain data type:
FOR c1 IN Collection1
  UPDATE c1 WITH { someField: TO_STRING(c1.someField) } IN Collection1
I assume that in SQLite the primary key was an integer, and therefore the references to it (foreign keys) are integers as well. Because the primary key must be a string in ArangoDB, the references need to be of type string too. Change the documents to store foreign keys as strings, and add a hash index to Collection1 on someField (the field you use for the join). Then this query should be fast and return the expected result:
FOR c1 IN Collection1
  FOR c2 IN Collection2
    FILTER c2._key == c1.someField
    RETURN { C2: c2._id, C1: c1._id }
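For completeness, the hash index could be created in arangosh like this (a sketch; in recent ArangoDB versions a persistent index serves the same purpose):

db.Collection1.ensureIndex({ type: "hash", fields: ["someField"] });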

How do you explicitly find a record where a field is NULL?

From the documentation:
When query with struct, GORM will only query with those fields has non-zero value, that means if your field’s value is 0, '', false or other zero values, it won’t be used to build query conditions.
Here is an example of what I am trying to do :
type Dog struct {
    ID      uuid.UUID
    OwnerID *uuid.UUID
}
The owner ID pointer may be nil.
db.Where("owner_id", nil).Find(&dogs)
But that returns all dogs (I expected this from the doc). I tried doing:
db.Where("owner_id", "NULL").Find(&dogs)
But that returns an empty list.
Is there a way to explicitly search for a NULL value field?
According to the docs, this should work:
db.Where("owner_id IS NULL").Find(&dogs)
Yeah, if you want to fetch the rows where the value is NULL, you only have a few choices:
use a raw condition string (shown below)
use the map interface (see the sketch after the code)
db.Where("owner_id IS NULL").Find(&dogs)
// or you can also try different way of writing
db.Find(&dogs, "owner_id IS NULL")
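The map variant would look like the sketch below; to the best of my knowledge, GORM v2 renders a nil map value as IS NULL instead of treating it as a zero value:

// a nil value in a map condition builds "owner_id IS NULL"
db.Where(map[string]interface{}{"owner_id": nil}).Find(&dogs)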
Note that your original attempts cannot work: db.Where("owner_id = ?", "NULL") compares owner_id against the literal string 'NULL', and owner_id = NULL is never true in SQL either, so the IS NULL forms above are the way to go.

How do I query an optional column with a secondary index using phantom?

I have a secondary index on an optional column:
class Sessions extends CassandraTable[ConcreteSessions, Session] {
  object matchId extends LongColumn(this) with PartitionKey[Long]
  object userId extends OptionalLongColumn(this) with Index[Option[Long]]
  ...
}
However, the indexedToQueryColumn implicit conversion is not available for optional columns, so this does not compile:
def getByUserId(userId: Long): Future[Seq[Session]] = {
  select.where(_.userId eqs userId).fetch()
}
Neither does this:
select.where(_.userId eqs Some(userId)).fetch()
Or changing the type of the index:
object userId extends OptionalLongColumn(this) with Index[Long]
Is there a way to perform such a query using phantom?
I know that I could denormalize, but it would involve some very messy housekeeping and triple our (substantial) data size. The query usually returns only a handful of results, so I'd be willing to use a secondary index in this case.
Short answer: you cannot query by optional fields in phantom.
Long, detailed answer:
However, if you really want to query by a secondary optional column, declare the field as an Option in your entity, but make the phantom column non-optional so that it can be queried:
object userId extends LongColumn(this) with Index[Long]
In fromRow(r: Row) you can then create your object like this:
Sessions(matchId(r), Some(userId(r)))
Then, in the service part, you can do the following:
.value(_.userId, t.userId.getOrElse(0))
A better way, though, is to duplicate the table: create a new query table such as sessions_by_user_id, where user_id is the partition key and match_id the clustering key (see the sketch below).
Since user_id is optional, you would end up with a table that contains only valid user ids, which is easy and fast to look up.
Cassandra is designed around queries, so use that in your favor.
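A sketch of such a table in the same phantom DSL style as above (the names are illustrative):

class SessionsByUserId extends CassandraTable[ConcreteSessionsByUserId, Session] {
  // user_id as partition key: direct lookups by user, no secondary index
  object userId extends LongColumn(this) with PartitionKey[Long]
  // match_id as clustering key: all of a user's sessions live in one partition
  object matchId extends LongColumn(this) with ClusteringOrder[Long] with Ascending
  ...
}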
Take a look at my GitHub project, which shows how to work with multiple queries over the same table:
https://github.com/iamthiago/cassandra-phantom

RethinkDB: Get only one record from cursor/selection

If I have a query that returns multiple results, how do I get a single element from the selection?
e.g.
r.db("test").table("things") // returns an array of things. I want one of them
Using limit(1) is not what I want because that returns an array.
RethinkDB supports getting the nth element, so the query should be:
r.db("test").table("things").nth(0)
In the event that there are no elements, the above will fail with:
Index out of bounds: 0
The solution is to return a default value (null in my case) if no element exists:
r.db("test").table("things").nth(0).default(null)
