Why does Tarantool's sophia engine make selects slow?

Tarantool version: Tarantool 1.6.8-586-g504e151, installed from EPEL.
I use Tarantool with the sophia engine:
log_space = box.schema.space.create('logs', {
    engine = 'sophia',
    if_not_exists = true
})
log_space:create_index('primary', {
    parts = {1, 'STR'}
})
I have 500,000 records and run this select request:
box.space.logs:select({'log_data'})
It takes about 1 minute. Why is it so slow?
unix/:/var/run/tarantool/g_sofia.control> box.stat()
---
- DELETE:
    total: 0
    rps: 0
  SELECT:
    total: 587575
    rps: 25
  INSERT:
    total: 815315
    rps: 34
  EVAL:
    total: 0
    rps: 0
  CALL:
    total: 0
    rps: 0
  REPLACE:
    total: 1
    rps: 0
  UPSERT:
    total: 0
    rps: 0
  AUTH:
    total: 0
    rps: 0
  ERROR:
    total: 23
    rps: 0
  UPDATE:
    total: 359279
    rps: 17

The sophia engine has been deprecated since 1.7.x. Please use the vinyl engine instead.
For more details please take a look at https://www.tarantool.io/en/doc/1.10/book/box/engines/vinyl/
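A minimal sketch of the same space on vinyl (assuming Tarantool 1.7+, where the lower-case field type name 'string' replaces the old 'STR'):
log_space = box.schema.space.create('logs', {
    engine = 'vinyl',        -- vinyl is the supported on-disk engine in 1.7+
    if_not_exists = true
})
log_space:create_index('primary', {
    parts = {1, 'string'},   -- same primary key layout as the sophia space
    if_not_exists = true
})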

After direct on-site help and debugging with agent-0007, we found several issues.
Most of them were related to a slow virtual environment (OpenVZ was used), which showed inadequate pread() stalls and I/O timings.
Additionally, we found two integration issues:
https://github.com/tarantool/tarantool/issues/1411 (SIGSEGV in eio_finish)
https://github.com/tarantool/tarantool/issues/1401 (Bug in upsert applier callback function using sophia)
Thanks.

Related

Edit YAML objects in an array with yq. Speed up Terminalizer's terminal cast (record)

The goal: speed up playback of a Terminalizer terminal cast (recording).
I have a recording of a terminal session created with Terminalizer, cast.yaml:
# The configurations that used for the recording, feel free to edit them
config:
  # do not touch it
# Records, feel free to edit them
records:
  - delay: 841
    content: "\e]1337;RemoteHost=kyb#kyb-nuc\a\e]1337;CurrentDir=/home/kyb/devel/git-rev-label\a\e]1337;ShellIntegrationVersion=7;shell=fish\a"
  - delay: 19
    content: "\e]1337;RemoteHost=kyb#kyb-nuc\a\e]1337;CurrentDir=/home/kyb/devel/git-rev-label\a\e]0;fish /home/kyb/devel/git-rev-label\a\e[30m\e(B\e[m"
  - delay: 6
    content: "\e[?2004h"
  - delay: 28
    content: "\e]0;fish /home/kyb/devel/git-rev-label\a\e[30m\e(B\e[m\e[2m⏎\e(B\e[m \r⏎ \r\e[K\e]133;D;0\a\e]133;A\a\e[44m\e[30m ~/d/git-rev-label \e[42m\e[34m \e[42m\e[30m demo \e[30m\e(B\e[m\e[32m \e[30m\e(B\e[m\e]133;B\a\e[K"
  - delay: 1202
    content: "#\b\e[38;2;231;197;71m#\e[30m\e(B\e[m"
  - delay: 134
    content: "\e[38;2;231;197;71m#\e[30m\e(B\e[m"
  - delay: 489
    content: "\e[38;2;231;197;71m \e[30m\e(B\e[m"
  - delay: 318
I want to speed up playback without passing --speed-factor to terminalizer play. To do so, the delays should be decreased.
So I need a yq expression to lower the delays, something like
.records.delay=.records.delay/3
but this expression doesn't work. Please help me write a proper one.
.records is an array, so you could use this filter:
.records |= map(.delay /= 3)
Or you might prefer:
.records[].delay |= (. /= 3)
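On the command line that might look like this (a sketch assuming the jq-based Python yq wrapper; cast_fast.yaml is just an example output name):
# divide every record's delay by 3 and emit YAML again
yq -y '.records |= map(.delay /= 3)' cast.yaml > cast_fast.yaml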

Replication queue got stuck on error

I have a 3-node cluster with replication factor 2 and a replicated table, stats.
Recently I saw that there is a delay on the replica DB when checking /replicas_status:
db.stats: Absolute delay: 0. Relative delay: 0.
db2.stats: Absolute delay: 912916. Relative delay: 912916.
Here is the data from system.replication_queue:
Row 1:
──────
database: db2
table: stats
replica_name: replica_2
position: 3
node_name: queue-0001743101
type: GET_PART
create_time: 2018-06-19 20:57:42
required_quorum: 0
source_replica: replica_1
new_part_name: 20180619_20180619_823572_823572_0
parts_to_merge: []
is_detach: 0
is_currently_executing: 0
num_tries: 917943
last_exception:
last_attempt_time: 2018-06-29 15:32:50
num_postponed: 118617
postpone_reason:
last_postpone_time: 2018-06-29 15:32:23
Row 2:
──────
database: db2
table: stats
replica_name: replica_2
position: 4
node_name: queue-0001743103
type: MERGE_PARTS
create_time: 2018-06-19 20:57:48
required_quorum: 0
source_replica: replica_1
new_part_name: 20180619_20180619_823568_823573_1
parts_to_merge: ['20180619_20180619_823568_823568_0','20180619_20180619_823569_823569_0','20180619_20180619_823570_823570_0','20180619_20180619_823571_823571_0','20180619_20180619_823572_823572_0','20180619_20180619_823573_823573_0']
is_detach: 0
is_currently_executing: 0
num_tries: 917943
last_exception: Code: 234, e.displayText() = DB::Exception: No active replica has part 20180619_20180619_823568_823573_1 or covering part, e.what() = DB::Exception
last_attempt_time: 2018-06-29 15:32:50
num_postponed: 199384
postpone_reason: Not merging into part 20180619_20180619_823568_823573_1 because part 20180619_20180619_823572_823572_0 is not ready yet (log entry for that part is being processed).
last_postpone_time: 2018-06-29 15:32:35
Any clue how to deal with it?
Should I detach the broken replica partition and attach it again?
Stop all inserts to this cluster; the replication queue should then clear itself automatically.
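To watch the queue drain you can poll system.replication_queue on the replica, roughly like this (a sketch; the database and table names are taken from the question):
-- count the remaining queue entries for the stuck table
SELECT count() AS entries, max(num_tries) AS max_tries
FROM system.replication_queue
WHERE database = 'db2' AND table = 'stats';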

Using JMeter in non-GUI mode for a test plan

I have faced a weird problem.
I run 300 users simultaneously to log in to a website and read a file, and I use non-GUI mode to run this test plan.
My problem is that the test plan passed only once; when I ran it again it got errors. I then reduced the number of users to 200 and it passed, but again, after a while, it did not.
Here is what I get:
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=64m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
Creating summariser <summary>
Created the tree successfully using C:\Users\samo\Dropbox\Jmeter\jmetet\Reading_script.jmx
Starting the test @ Mon Jul 07 13:24:11 GMT+03:00 2014 (1404728651964)
Waiting for possible shutdown message on port 4445
summary + 1980 in 48s = 41.6/s Avg: 5536 Min: 6 Max: 21171 Err: 772 (38.99%) Active: 300 Started: 300 Finished: 0
summary + 1272 in 40.1s = 31.7/s Avg: 3257 Min: 3 Max: 39796 Err: 31 (2.44%) Active: 192 Started: 300 Finished: 108
summary = 3252 in 77.4s = 42.0/s Avg: 4644 Min: 3 Max: 39796 Err: 803 (24.69%)
summary + 1203 in 70s = 17.2/s Avg: 6020 Min: 3 Max: 69837 Err: 58 (4.82%) Active: 84 Started: 300 Finished: 216
summary = 4455 in 107s = 41.5/s Avg: 5016 Min: 3 Max: 69837 Err: 861 (19.33%)
summary + 608 in 100s = 6.1/s Avg: 6753 Min: 3 Max: 78722 Err: 42 (6.91%) Active: 7 Started: 300 Finished: 293
summary = 5063 in 137s = 36.9/s Avg: 5224 Min: 3 Max: 78722 Err: 903 (17.84%)
summary + 37 in 41s = 0.9/s Avg: 4880 Min: 4 Max: 37736 Err: 17 (45.95%) Active: 0 Started: 300 Finished: 300
summary = 5100 in 142s = 35.9/s Avg: 5222 Min: 3 Max: 78722 Err: 920 (18.04%)
Tidying up ... @ Mon Jul 07 13:26:34 GMT+03:00 2014 (1404728794704)
... end of run
What did I miss that causes this problem?
And how can I tell whether the problem is out of memory or something else?
Hi guys, I have figured out the problem.
1. First of all, I changed the heap size in the JMeter startup script (jmeter.bat) in the bin folder:
BEFORE: set HEAP=-Xms512m -Xmx512m
AFTER: set HEAP=-Xms2048m -Xmx2048m
2. Removed all the listeners I used before.
3. Set the ramp-up time in the Thread Group to 180. The ramp-up before making these changes was set to 1, which is not realistic because JMeter cannot start all 300 users in 1 second.
4. Set the loop count in the Thread Group to 2.
The error I got before making these changes was
java.net.SocketException, Non HTTP response message: Connection reset
which means that the server closed the connection.
Hope this can help someone out there.
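For reference, the non-GUI run itself can be started like this (a sketch; results.jtl is just an example name for the results file, and the .jmx path is the one from the question):
jmeter -n -t Reading_script.jmx -l results.jtl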

Cassandra read latency high even with row caching, why?

I am testing Cassandra performance with a simple model.
CREATE TABLE "NoCache" (
  key ascii,
  column1 ascii,
  value ascii,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.010000 AND
  caching='ALL' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
I am fetching 100 columns of a row key using pycassa's get/xget functions, but I am getting a read latency of about 15 ms on the server.
columns = COL_FAM.get(row_key, column_count=100)
nodetool cfstats
Column Family: NoCache
SSTable count: 1
Space used (live): 103756053
Space used (total): 103756053
Number of Keys (estimate): 128
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 20
Read Latency: 15.717 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Bloom Filter False Positives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 976
Compacted row minimum size: 4769
Compacted row maximum size: 557074610
Compacted row mean size: 87979499
Latency like this is astonishing when nodetool info shows that the reads hit the row cache directly:
Row Cache : size 4834713 (bytes), capacity 67108864 (bytes), 35 hits, 38 requests, 1.000 recent hit rate, 0 save period in seconds
Can anyone tell me why Cassandra is taking so much time while reading from the row cache?
Enable tracing and see what it's doing. http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
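In cqlsh that could look roughly like this (a sketch; 'some_row_key' is a placeholder for one of your actual keys):
TRACING ON;
SELECT column1, value FROM "NoCache" WHERE key = 'some_row_key' LIMIT 100;
-- the trace printed after the result shows where the time is spent
TRACING OFF;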

Caching not Working in Cassandra

I don't seem to have any caching enabled when checking in OpsCenter or cfstats. I'm running Cassandra 1.1.7 with Solandra on Debian. I have set the required global options in cassandra.yaml:
key_cache_size_in_mb: 800
key_cache_save_period: 14400
row_cache_size_in_mb: 800
row_cache_save_period: 15400
row_cache_provider: SerializingCacheProvider
Column Families were created as follows:
create column family example
with column_type = 'Standard'
and comparator = 'BytesType'
and default_validation_class = 'BytesType'
and key_validation_class = 'BytesType'
and read_repair_chance = 1.0
and dclocal_read_repair_chance = 0.0
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = true
and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'ALL';
OpsCenter shows no data available on the caching graphs, and cfstats doesn't show any cache-related fields:
Column Family: charsets
SSTable count: 1
Space used (live): 5558
Space used (total): 5558
Number of Keys (estimate): 128
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 61381
Read Latency: 0.123 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 16
Compacted row minimum size: 1917
Compacted row maximum size: 2299
Compacted row mean size: 2299
Any help or suggestions are appreciated.
Sam
The caching stats have been moved from cfstats to info in Cassandra 1.1. If you run nodetool info you should see something like:
Key Cache : size 5552 (bytes), capacity 838860800 (bytes), 38 hits, 47 requests, 0.809 recent hit rate, 14400 save period in seconds
Row Cache : size 0 (bytes), capacity 838860800 (bytes), 0 hits, 0 requests, NaN recent hit rate, 15400 save period in seconds
This is because the caches are now global rather than per column family. It seems that OpsCenter needs updating for this change; maybe there is a later version available that will work.
