How to remove a field in a Tarantool space?

I have a field in a Tarantool space that I no longer need:
local space = box.schema.space.create('my_space', {if_not_exists = true})
space:format({
    {'field_1', 'unsigned'},
    {'field_2', 'unsigned'},
    {'field_3', 'string'},
})
How do I remove field_2, both when it is indexed and when it is not?

There is no convenient way to do it.
The first way: declare the field as nullable and set its value to NULL. The field will still be stored physically, but you can hide it from users.
It's simple and cheap.
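A minimal sketch of this approach, assuming the format above, a primary key on the first field, and that field_2 is not indexed (drop its index first otherwise); box.NULL is Tarantool's MsgPack NULL:
space:format({
    {'field_1', 'unsigned'},
    {'field_2', 'unsigned', is_nullable = true},
    {'field_3', 'string'},
})
for _, t in ipairs(space:select()) do
    -- field_2 is still stored physically, but now holds nothing
    space:update(t[1], {{'=', 2, box.NULL}})
end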
The second way: write an in-place migration. This is not possible if any field after the one you want to drop is indexed (in your example, field_3), and it is dangerous if you have a huge amount of data in this space.
local space = box.schema.space.create('my_space', {if_not_exists = true})
space:create_index('id', {parts = {{field = 1, type = 'unsigned'}}})
space:format({
    {'field_1', 'unsigned'},
    {'field_2', 'unsigned'},
    {'field_3', 'string'},
})
-- Create key_def instance to simplify primary key extraction
local key_def = require('key_def').new(space.index[0].parts)
-- Drop the previous format
space:format({})
-- Migrate your data
for _, tuple in space:pairs() do
    space:delete(key_def:extract_key(tuple))
    space:replace({tuple[1], tuple[3]})
end
-- Set up the new format
space:format({
    {'field_1', 'unsigned'},
    {'field_3', 'string'},
})
The third way is to create a new space, migrate the data into it and drop the old one.
Still, it's quite dangerous.
local space = box.schema.space.create('new_my_space', {if_not_exists = true})
space:create_index('id', {parts = {{field = 1, type = 'unsigned'}}})
space:format({
    {'field_1', 'unsigned'},
    {'field_3', 'string'},
})
-- Migrate your data
for _, tuple in box.space['my_space']:pairs() do
    space:replace({tuple[1], tuple[3]})
end
-- Drop the old space
box.space['my_space']:drop()
-- Rename new space
local space_id = box.space._space.index.name:get({'new_my_space'}).id
-- In newer versions of Tarantool (2.6+) the space:alter() method is available,
-- but in older versions you can update the name via the system "_space" space
box.space._space:update({space_id}, {{'=', 'name', 'my_space'}})
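On Tarantool 2.6+ the rename is a one-liner instead (a sketch, assuming the new space created above):
box.space.new_my_space:alter({name = 'my_space'})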

Related

CrudRepository save takes too long to update an entity

I've got a controller that first selects all data with status = 'CREATED', transferType = 'SOME_TYPE' and DATE_TIME between x and y, and puts it all into a List<TransferEntity>.
Then I go through each element in the list and update its status to 'CHECKED':
if (listOfTransfers.isNotEmpty()) {
    for (element in listOfTransfers) {
        element.status = "CHECKED"
        repos.transfers.save(element)
    }
}
The entity itself is pretty straightforward, with no relations to other tables:
@Entity
@Table(name = "TRANSFERS")
class TransferEntity(
    @Id
    @Column(name = "Identifier", nullable = false)
    var Identifier: String? = null,
    @Column(name = "TRANS_DATE_TIME")
    var transDateTime: LocalDateTime? = null,
    @Column(name = "TRANS_TYPE", nullable = true, length = 255)
    var transType: String? = null,
    @Column(name = "STATUS")
    var status: String = ""
)
I tried to experiment with indexes (Oracle):
CREATE INDEX TRANS_INDEX_1 ON TRANSFERS(STATUS)
CREATE INDEX TRANS_INDEX_2 ON TRANSFERS(TRANS_DATE_TIME)
CREATE INDEX TRANS_INDEX_3 ON TRANSFERS(TRANS_TYPE)
or created them as one composite index:
CREATE INDEX TRANS_INDEX_4 ON TRANSFERS(STATUS, TRANS_DATE_TIME, TRANS_TYPE)
but it didn't make a big difference.
UPDATE
With TRANS_INDEX_1, 2 and 3: 3192 elements were updated in 5 minutes 30 seconds.
With TRANS_INDEX_4: 3192 elements were updated in 5 minutes 30 seconds.
Maybe there are different approaches to mass-updating the elements inside the list, or perhaps the indexes are completely wrong and I don't understand them as well as I'd like to.
UPDATE 2
Technically the saveAll() method works much faster, but I still think there should be room for improvement:
saveAll(): 3192 elements were saved in 3 minutes 21 seconds
save(): 3192 elements were saved in 5 minutes 30 seconds
You call save() each time you update an element, so 1000 elements create 1000 query calls to the database; you make far too many round trips to your DB, and that's why your function is slow.
Instead, you could use saveAll() after you have updated all the elements.
As suggested below, you also have to configure the batch_size properly to really make this work.
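For reference, in a Spring Boot/Hibernate setup, JDBC batching for saveAll() is usually enabled with properties along these lines (the size of 50 is an arbitrary example, not a recommendation):
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_updates=true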
Indexes won't help in this situation, since they benefit select operations more than update or insert.
Since you set the same value on all the elements of your list, you can make a single batch update query:
Query q = entityManager.createQuery("update TransferEntity t set t.status = :value where t in (:list)");
q.setParameter("value", "CHECKED");
q.setParameter("list", listOfTransfers);
q.executeUpdate();
If you use Oracle as the backend, be aware that the in clause is limited to 1000 elements. Therefore you might have to split your list into buckets of 1000 elements and loop over this query for each bucket, as in the sketch below.
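A hedged Kotlin sketch of that bucketing, reusing entityManager and listOfTransfers from above (chunked() is from the Kotlin standard library):
listOfTransfers.chunked(1000).forEach { bucket ->
    // One batched update per bucket of at most 1000 entities
    entityManager.createQuery("update TransferEntity t set t.status = :value where t in (:list)")
        .setParameter("value", "CHECKED")
        .setParameter("list", bucket)
        .executeUpdate()
}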

PostgreSQL's JSONB-like indexable column in Tarantool?

In PostgreSQL we can create a JSONB column that can be indexed and accessed something like this:
CREATE TABLE foo (
    id BIGSERIAL PRIMARY KEY,
    -- createdAt, updatedAt, deletedAt, createdBy, updatedBy, restoredBy, deletedBy
    data JSONB
);
CREATE INDEX ON foo((data->>'email'));
INSERT INTO foo(data) VALUES('{"name":"yay","email":"a@1.com"}');
SELECT data->>'name' FROM foo WHERE id = 1;
SELECT data->>'name' FROM foo WHERE data->>'email' = 'a@1.com';
This is very beneficial in the prototyping phase (no need for migrations at all, or locking when adding a column).
Can we do a similar thing in Tarantool?
Sure, Tarantool supports JSON path indices. An example:
-- Initialize / load the database.
tarantool> box.cfg{}
-- Create a space with two columns: id and obj.
-- The obj column is supposed to contain dictionaries with nested data.
tarantool> box.schema.create_space('s',
> {format = {[1] = {'id', 'unsigned'}, [2] = {'obj', 'any'}}})
-- Create primary and secondary indices.
-- The secondary index looks at the nested field obj.timestamp.
tarantool> box.space.s:create_index('pk',
> {parts = {[1] = {field = 1, type = 'unsigned'}}})
tarantool> box.space.s:create_index('sk',
> {parts = {[1] = {field = 2, path = 'timestamp', type = 'number'}}})
-- Insert three tuples: first, third and second.
tarantool> clock = require('clock')
tarantool> box.space.s:insert({1, {text = 'first', timestamp = clock.time()}})
tarantool> box.space.s:insert({3, {text = 'third', timestamp = clock.time()}})
tarantool> box.space.s:insert({2, {text = 'second', timestamp = clock.time()}})
-- Select tuples with a timestamp within the last hour, 1000 at max.
-- They come back sorted by timestamp.
tarantool> box.space.s.index.sk:select(
> clock.time() - 3600, {iterator = box.index.GT, limit = 1000})
---
- - [1, {'timestamp': 1620820764.1213, 'text': 'first'}]
- [3, {'timestamp': 1620820780.4971, 'text': 'third'}]
- [2, {'timestamp': 1620820789.5737, 'text': 'second'}]
...
JSON path indices are available since Tarantool 2.1.2.

Dash plotly live update datatable

I would like to implement a live-update feature for the add-column function, based on the documentation on the editable DataTable (https://dash.plotly.com/datatable/editable), so that a new column is added to the DataTable when the CSV file is updated. I've got the callback working and I can add new columns as new CSV data arrives through live update, but I've run into some problems. In my first attempt (labelled "first code" below), I declared a global variable (which I know is bad in Dash) to keep track of the current CSV contents and compare them to an updated CSV file to check for new data, but this version fails to load existing CSV data on initial start-up and only adds new data. The second code loads existing CSV data into columns, but on live update it simply adds duplicate columns. I cannot simply remake the whole DataTable from the new CSV file, because any user input in the cells would be lost on update.
My question is therefore: how do I store the state of something (i.e. the number of lines of CSV data) without using global variables in Dash, so that I can compare the existing state of a CSV file to a new state? I can't see how this is accomplished, since assigned variables are reset on live update. I've read about Hidden Divs and storing data in a user's browser session, but I can't get it to work. Is that the direction I should be going, or is there a more elegant solution?
html.Div([dash_table.DataTable(
    id='editing-columns',
    columns=[{
        'name': 'Parameter',
        'id': 'column1',
        'deletable': True,
        'renamable': True
    }],
    style_cell={
        'whiteSpace': 'normal',
        'height': 'auto',
        'width': '20',
        # 'minWidth': '50px',
        'textAlign': 'left',
    },
    data=[
        {'column1': j}
        for j in table_contents
    ],
    editable=True,
)], className="drug_input"),
# First code
@app.callback(Output('editing-columns', 'columns'),
              [Input('graph-update', 'n_intervals')],
              [State('editing-columns', 'columns')])
def update_columns(n, existing_columns):
    check_length = []
    global sierra
    with open('assets' + '/' + 'record ' + str(current) + '.csv', 'r') as rf:
        reader = csv.reader(rf)
        for a in reader:
            check_length.append(a[3])
    if len(check_length) == 0:
        return existing_columns
    elif len(check_length) > sierra:
        existing_columns.append({
            'id': check_length[-1], 'name': check_length[-1],
            'renamable': True, 'deletable': True
        })
        sierra = len(check_length)
        return existing_columns
    else:
        return existing_columns
# Second code
@app.callback(Output('editing-columns', 'columns'),
              Input('graph-update', 'n_intervals'),
              State('editing-columns', 'columns'))
def update_columns(n, existing_columns):
    counter = 0
    with open('assets' + '/' + 'record ' + str(current) + '.csv', 'r') as rf:
        reader = csv.reader(rf)
        for a in reader:
            counter = counter + 1
            existing_columns.append({
                'id': 'counter', 'name': a[3],
                'renamable': True, 'deletable': True
            })
    return existing_columns
I've read about Hidden Divs and storing data in a user's browser session but I can't get it to work. Is that the direction I should be going or is there a more elegant solution?
Yes, you should use the store component (dcc.Store) for this.
You can store everything in a dict, with one key for your original file and another key containing the modified version with user inputs. Make sure each callback reads the store in as a State; one callback should output to the table, and the other should output to the store. A sketch follows.
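A minimal sketch of that wiring, keeping the 'graph-update' interval and 'editing-columns' table from the question; the 'csv-store' id and the read_csv_rows() helper are hypothetical:
from dash import Dash, dcc, html, dash_table, Input, Output, State

app = Dash(__name__)
app.layout = html.Div([
    dcc.Interval(id='graph-update', interval=1000),
    # The store replaces the global variable and lives in the browser session
    dcc.Store(id='csv-store', data={'n_lines': 0}),
    dash_table.DataTable(id='editing-columns', columns=[], data=[], editable=True),
])

@app.callback(Output('editing-columns', 'columns'),
              Output('csv-store', 'data'),
              Input('graph-update', 'n_intervals'),
              State('editing-columns', 'columns'),
              State('csv-store', 'data'))
def update_columns(n, existing_columns, store):
    rows = read_csv_rows()  # hypothetical helper: parse the CSV into a list of rows
    # Compare the file's current state against the state kept in the store
    for row in rows[store['n_lines']:]:
        existing_columns.append({'id': row[3], 'name': row[3],
                                 'renamable': True, 'deletable': True})
    store['n_lines'] = len(rows)
    return existing_columns, store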

Slicing in PyTables

What is the fastest way to slice arrays that are saved in h5 using PyTables?
The scenario is the following:
The data was already saved (no need to optimize here):
filters = tables.Filters(complib='blosc', complevel=5)
h5file = tables.open_file(hd5_filename, mode='w',
                          title='My Data',
                          filters=filters)
group = h5file.create_group(h5file.root, 'Data', 'Data')
X_atom = tables.Float32Atom(shape=[50, 50, 50])
X = h5file.create_carray(group, 'X', atom=X_atom, title='XData',
                         shape=(1000,), filters=filters)
The data is opened:
h5file = tables.open_file(hd5_filename, mode="r")
node = h5file.get_node('/', data_node)
X = getattr(node, X_str)
This is where I need optimization. I need to perform a lot of the following kind of array slicing, which cannot be sorted, for many, many indexes and different min/max locations:
for index, min_x, min_y, min_z, max_x, max_y, max_z in my_very_long_list:
    current_item = X[index][min_x:max_x, min_y:max_y, min_z:max_z]
    do_something(current_item)
The question is:
Is this the fastest way to do the task?

Can I override a Lua table's return value for itself?

Is it possible for a table, when referenced without a key, to return a particular value rather than a reference to itself?
Let's say I have the following table:
local person = {
    name = "Kapulani",
    level = 100,
    age = 30,
}
In Lua, I can quite easily refer to "person.name", "person.level", or "person.age" and get the values as expected. However, I have certain cases where I may want to just reference "person" and, instead of getting the usual "table: …" representation, I'd like to get the value of "person.name" instead.
In other words, I'd like person.x (or person[x]) to return the appropriate entry from the table, but person without a key to return the value of person.name (or person["name"]). Is there a mechanism for this that I haven't been able to find?
I have had no success with metatables, since __index will only apply to cases where the key does not exist. If I put "person" into a separate table, I can come up with:
local true_person = {
    ... -- as above
}
local env_mt = {
    __index = function(t, k)
        if k == 'person' then
            return true_person
        end
    end
}
local env = setmetatable( {}, env_mt )
This lets me use __index to do some special handling, except there's no discernible way for me to tell, from __index(), whether I'm getting a request for env.person (where I'd want to return true_person.name) or env.person[key] (where I'd want to return true_person as a table, so that 'key' can be accessed appropriately).
Any thoughts? I can approach this differently, but I'm hoping I can approach it along these lines.
You can do it when the table is being used as a string by setting the __tostring metatable entry:
$ cat st.lua
local person = {
    name = "Kapulani",
    level = 100,
    age = 30,
}
print(person)
print(person.name)
print(person.age)
setmetatable(person, {__tostring = function(t) return t.name end})
print(person)
$ lua st.lua
table: 0x1e8478e0
Kapulani
30
Kapulani
I am not sure that what you are asking for is a good idea, because it flies in the face of compositionality. Usually one would expect the following two programs to do the same thing, but you want them to behave differently:
print(person.name)
local p = person
print( p.name )
It's also not very clear how assignment would work: person.age = 10 should change the age, but person = otherPerson should change the reference to the person, not the age.
If you don't care about compositionality and are only reading data, then a more direct way to solve the problem is to have a query function that receives the fields encoded in a string:
query("person.age") -- 17
query("person.name") -- "hugomg"
query("person") -- 17; query gets to default to whatever it wants.
To keep the syntax more lightweight, you can omit the optional parentheses:
q"person.age"
q"person"
Or you can extend the __index metamethod on the global table, _G:
setmetatable(_G, { __index = function(self, key) return query(key) end })
print(person_age) -- you will need to use "_" instead of "." for the
                  -- query to be a valid identifier
