We have a problem with our Cassandra 2.0.6 cluster. Our setup is the following:
2 data centers, named: DC1, DC2
Two nodes in each DC
Using the NetworkTopologyStrategy for replication
Client is connecting with the Datastax Java Driver v. 1.0.3
First, I created the keyspace, with one table in it.
CREATE KEYSPACE test
WITH replication = {
'class': 'NetworkTopologyStrategy',
'DC1': '1',
'DC2': '1'
};
CREATE TABLE account (
id text,
code text,
alias text,
PRIMARY KEY (id, code)
);
Then I shut down DC2 before running this statement:
INSERT INTO test.account (id, code, alias) VALUES ('1', '2', '3') IF NOT EXISTS;
which resulted in the error message:
>>>> Unable to complete request: one or more nodes were unavailable.
Using the same environment, running this statement was OK:
INSERT INTO test.account (id, code, alias) VALUES ('1', '2', '3')
I found the Cassandra ticket for DC-local CAS (CASSANDRA-5797), so I thought the statement in this situation would be processed only in the local data center, but it wasn't.
What's wrong with my understanding of light-weight transactions?
The DataStax documentation explains (emphasis added):
Cassandra 2.0 uses the Paxos consensus protocol, which resembles 2-phase commit, to support linearizable consistency. All operations are quorum-based and updates will incur a performance hit ...
By default, your CAS operation (IF NOT EXISTS) uses the SERIAL consistency level for the Paxos phase and needs to contact a quorum of all replicas. When you have only two replicas, as in your case (one in each data center), a quorum requires both of them. If you try your non-CAS insert with the QUORUM consistency level, it too will fail.
CASSANDRA-5797 introduces the LOCAL_SERIAL serial consistency level, but it is not the default and must be explicitly specified in order to be used. How to do that depends on how you are interfacing with Cassandra, e.g. the cqlsh client vs the DataStax Java Driver.
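For example, in cqlsh (a sketch; this assumes a cqlsh version that supports the SERIAL CONSISTENCY command, which older 2.0-era clients may not expose):
-- route the Paxos (CAS) phase to replicas in the local data center only
SERIAL CONSISTENCY LOCAL_SERIAL;
INSERT INTO test.account (id, code, alias) VALUES ('1', '2', '3') IF NOT EXISTS;
The QUORUM point above can be demonstrated the same way with the cqlsh CONSISTENCY QUORUM command before the plain INSERT. With the DataStax Java Driver you would instead set the serial consistency level on the statement; note that LOCAL_SERIAL support may require a newer driver than the 1.0.3 version mentioned above.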
We have an Oracle 19c database (19.0.0.0.ru-2021-04.rur-2021-04.r1) on AWS RDS, hosted on a 4 CPU / 32 GB RAM instance. The database is not big (35 GB); the PGA Aggregate Limit is 8 GB and the Target is 4 GB. Whenever the scheduled internal Oracle Auto Optimizer Stats Collection job (ORA$AT_OS_OPT_SY_nnn) runs, it consumes substantially high PGA memory (approx. 7 GB), and sometimes this makes the database unstable; AWS loses communication with the RDS instance and restarts the database.
We thought this might be linked to the existing Oracle bug 30846782 (19C+: Fast/Excessive PGA growth when using DBMS_STATS.GATHER_TABLE_STATS), but according to Oracle & AWS it is already fixed in the 19c version we are using. There are no application-level operations that consume this much PGA, and the database restarts have always happened while the Auto Optimizer Stats Collection job was running. There are a couple more databases on the same version where the same pattern was observed and the database was restarted by AWS. We have disabled the job on those databases for now to avoid further occurrences, but we do want to run it, since disabling it can leave stale statistics in the database.
Any pointers on how to tackle this issue?
I ran into the same issue on my AWS RDS Oracle 18c and 19c instances, even though I am not on the same patch level as you.
In my case, I applied this workaround and it worked.
SQL> alter system set "_fix_control"='20424684:OFF' scope=both;
However, before applying this change, I strongly suggest that you test it on your non production environments, and if you can, try to consult with Oracle Support. Dealing with hidden parameters might lead to unexpected side effects, so apply it at your own risk.
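For reference, you can verify that the fix control has been picked up with a query like this (a sketch; V$SYSTEM_FIX_CONTROL is a standard dynamic view, and a VALUE of 0 means the fix is disabled):
-- check the current state of fix control 20424684 (0 = OFF)
select bugno, value, description
from v$system_fix_control
where bugno = 20424684;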
Instead of completely abandoning automatic statistics gathering, try to find any specific objects that are causing the problem. If only a small number of tables are responsible for a large amount of statistics gathering, you can manually analyze those tables or change their preferences.
First, use the below SQL to see which objects are causing the most statistics gathering. According to the test case in bug 30846782, the problem seems to be only related to the number of times DBMS_STATS is called.
select *
from dba_optstat_operations
order by start_time desc;
In addition, you may be able to find specific SQL statements or sessions that generate a lot of PGA memory with the below query. (However, if the database restarts, it's possible that AWR won't save the recorded values.)
select username, event, sql_id, pga_allocated/1024/1024/1024 pga_allocated_gb, gv$active_session_history.*
from gv$active_session_history
join dba_users on gv$active_session_history.user_id = dba_users.user_id
where pga_allocated/1024/1024/1024 >= 1
order by sample_time desc;
If the problem is only related to a small number of tables with a large number of partitions, you can manually gather the stats on just that table in a separate session. Once the stats are gathered, the table won't be analyzed again until about 10% of the data is changed.
begin
    dbms_stats.gather_table_stats(user, 'PGA_STATS_TEST');
end;
/
It's not uncommon for a database to spend a long time gathering statistics, but it is uncommon for a database to constantly analyze thousands of objects. Running into this bug implies there is something unusual about your database - are you constantly dropping and creating objects, or do you have a large number of objects that have 10% of their data modified every day? You may need to add a manual gather step to a few of your processes.
Turning off the automatic statistics job entirely will eventually cause many performance problems. Even if you can't add manual gathering steps, you may still want to keep the job enabled. For example, if tables are being analyzed too frequently, you may want to increase the table preference for the "STALE_PERCENT" threshold from 10% to 20%:
begin
    dbms_stats.set_table_prefs
    (
        ownname => user,
        tabname => 'PGA_STATS_TEST',
        pname   => 'STALE_PERCENT',
        pvalue  => '20'
    );
end;
/
We are working with 2 AIX 7 servers and 2 Oracle 12.1.0.2 databases.
One database (called DB1 in this topic) is our central PROD db.
The second database (called DB2 in this topic) is a production DB too, but used for a non-critical application.
We want to isolate the processing executed on DB2 (with joins) from the central production database DB1, impacting DB1 as little as possible.
This processing uses a DBLINK to read DB1 data.
So the question is:
If we perform a query like
select col1, col2 from table1@dblink_DB1, table2@dblink_DB1 where <join condition>
On which server is the JOIN executed?
Do only the reads occur on DB1 (i.e. a low impact on DB1), with the join processed using DB2's SGA/CPU?
Or is everything executed on DB1?
Such queries (which can be executed fully remotely, without access to the local database) usually run on the remote db link site, and that is much better than running on the local database: in that case Oracle would read the leading table and run (select * from table2@dblink_DB1 where col = :a) as many times as the number of rows returned from table1@dblink_DB1. Of course, you can force the query to run locally using the driving_site hint, but then it would be far less efficient for both databases. Read more about the driving_site hint. Also, you should know that DML statements (update/delete/merge/insert) always run on the database where you change the data.
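To illustrate the driving_site hint mentioned above, here is a sketch with hypothetical names (local_tab lives in the local database, remote_tab is reached over the db link): the hint names the alias whose site should execute the statement, so this join runs on the DB behind dblink_DB1 and only the result set comes back.
-- hypothetical tables; DRIVING_SITE makes the named alias's site the execution site
select /*+ driving_site(r) */ l.col1, r.col2
from local_tab l
join remote_tab@dblink_DB1 r on r.id = l.id;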
We have a replication with GoldenGate from a prod environment.
The tables were initially loaded with a dump from prod, and afterwards we started the replication with GoldenGate. Now we want to migrate the data to another database, but the query plans are different from the prod environment. We think this is because the statistics in the replication database are broken/wrong.
The number of rows reported in dba_tables is null, 0, or differs by 50-80%.
We tried running dbms_stats.gather_table_stats on all relevant tables.
It's still broken. We ran this call for all tables that had wrong statistics:
dbms_stats.GATHER_TABLE_STATS(OWNNAME => 'SCHEMA', TABNAME => 'TABLE_NAME', CASCADE => true);
We can't migrate with the bad query plans.
We are using Oracle Release 12.2.0.1.0 - Production
EDIT: After the answer from Jon Heller we saw that some indexes are partitioned in the prod environment but not in the replication. Additionally, the global preference DEGREE is 32768 on the replication and NULL on prod.
Are the tables built exactly the same way? Maybe a different table structure is causing the statistics to break, like if one table is partitioned and another is not. Try comparing the DDL:
select dbms_metadata.get_ddl('TABLE', 'TABLE1') from dual;
I'm surprised to hear that statistics are wrong even after gathering stats. Especially the number of rows - since 10g, that number should always be 100% accurate with the default settings.
Can you list the exact commands you are using to gather stats? Also, this is a stretch, but possibly the global preferences were changed on one database. It would be pretty evil, but you could set a database default to only look at 0.00001% of the data, which would create terrible statistics. Compare the global preferences between both databases.
--Thanks to Tim Hall for this query: https://oracle-base.com/dba/script?category=monitoring&file=statistics_prefs.sql
SELECT DBMS_STATS.GET_PREFS('AUTOSTATS_TARGET') AS autostats_target,
DBMS_STATS.GET_PREFS('CASCADE') AS cascade,
DBMS_STATS.GET_PREFS('DEGREE') AS degree,
DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT') AS estimate_percent,
DBMS_STATS.GET_PREFS('METHOD_OPT') AS method_opt,
DBMS_STATS.GET_PREFS('NO_INVALIDATE') AS no_invalidate,
DBMS_STATS.GET_PREFS('GRANULARITY') AS granularity,
DBMS_STATS.GET_PREFS('PUBLISH') AS publish,
DBMS_STATS.GET_PREFS('INCREMENTAL') AS incremental,
DBMS_STATS.GET_PREFS('STALE_PERCENT') AS stale_percent
FROM dual;
If gathering statistics still leads to different results, the only thing I can think of is corruption. It may be time to create an Oracle service request.
(This is more of an extended comment than an answer, but it might take a lot of code to diagnose this problem. Please update the original question with more information as you find it.)
We are using HDF to fetch large amounts of data from Oracle. We have GenerateTableFetch creating partitions of 8000 records, which generates queries like the one below:
Select * from ( Select a.*, ROWNUM rnum FROM (SELECT * FROM OPUSER.DEPENDENCY_TYPES WHERE (1=1))a WHERE ROWNUM <= 368000) WHERE rnum > 361000
Now this query is taking almost 20-25 minutes to return from Oracle.
Is there anything we are doing wrong, or any configuration changes we can make?
NiFi uses a JDBC connection, so is there any Oracle-side configuration for that?
Also, would it help if we somehow added a parallelism hint to the query, for example /*+ parallel(c,2) */?
I'm guessing you're using Oracle 11 (or less) and have selected Oracle as the database type. Since LIMIT/OFFSET wasn't introduced until Oracle 12, NiFi uses the nested SELECT with ROWNUM approach to ensure each "page" of data contains unique values. If you are using Oracle 12+, make sure to use the Oracle 12+ database adapter instead, as it can leverage the LIMIT/OFFSET capabilities resulting in a faster query. Also make sure you have the appropriate index(es) in place to help with query execution.
As of NiFi 1.7.0, you might also consider setting the Column for Value Partitioning property. If you have a column (perhaps a suitable column in your DEPENDENCY_TYPES table) that is fairly uniformly distributed, and is not "too sparse" in relation to your Partition Size property value, GenerateTableFetch can use the column's values rather than the ROWNUM approach, resulting in faster queries. See NIFI-5143 and the GenerateTableFetch documentation for more details.
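If you go that route, an index on the partitioning column helps both the partition-boundary queries and the generated WHERE clauses. A sketch, assuming a hypothetical numeric ID column (substitute whatever column you actually partition on):
-- hypothetical index on the column used for value partitioning / paging
create index dependency_types_id_ix on OPUSER.DEPENDENCY_TYPES (id);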
If you need to add hints to the JDBC session, then as of NiFi 1.9.0 (see NIFI-5780 for more details) you can add pre- and post-query statements to ExecuteSQL.
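For example, a pre-query statement along these lines could enable parallelism for the session before the fetch runs (a sketch; whether a degree of 2 is appropriate depends on your Oracle setup):
-- set as an ExecuteSQL pre-query so subsequent queries in the session can run in parallel
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 2;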
I'm having trouble setting up a new Cassandra cluster. I've set up a 3 node cluster in EC2 (Zone: eu-west-1b). When I try to insert a record into a new table I receive this error message:
cqlsh:test> insert into mytest (id, value) values(1,100);
Unable to complete request: one or more nodes were unavailable.
I've confirmed that the 3 nodes are up and running:
nodetool status
UN ***.***.***.*** 68.1 KB 256 33.2% bbf1c5e9-ac68-41a1-81a8-00c7877c4eac rack1
UN ***.***.***.*** 81.95 KB 256 34.1% e118e3a7-2486-4c08-8ba1-d337888ff59c rack1
UN ***.***.***.*** 68.12 KB 256 32.7% 041cb88e-df21-4640-b7ac-7a87fd38dae6 rack1
The commands I used to create the keyspace and table are:
create keyspace test with replication = {'class':'NetworkTopologyStrategy', 'eu-west-1b': 2};
use test;
create table mytest (id int primary key, value int);
insert into mytest (id, value) values(1,100);
Each node can see the keyspace - I used cqlsh and ran describe keyspace and got this output from each node:
CREATE KEYSPACE test WITH replication = {
'class': 'NetworkTopologyStrategy',
'eu-west-1b': '2'
};
USE test;
CREATE TABLE mytest (
id int PRIMARY KEY,
value int
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
I finally tracked down the problem - I had set endpoint_snitch to Ec2Snitch, but the default DataStax snitch setting was still present further down beneath the comments (which I hadn't noticed). I commented out the default snitch, restarted the dse service on all nodes, ran nodetool repair on each node, and the problem went away.
As per Mark's response, first check whether your Cassandra cluster is in AWS. If it is, change the configuration in cassandra.yaml: set endpoint_snitch to Ec2Snitch. One more thing that could be the issue: with this snitch your data center is actually the "region" of the EC2 instance, so it should be something like 'us-east' or 'us-west'. In your case it should be 'eu-west' only.
Here is what the DataStax documentation says about it:
EC2Snitch
Use the EC2Snitch for simple cluster deployments on Amazon EC2 where all nodes in the cluster are within a single region. The region is treated as the data center and the availability zones are treated as racks within the data center. For example, if a node is in us-east-1a, us-east is the data center name and 1a is the rack location. Because private IPs are used, this snitch does not work across multiple regions.
When defining your keyspace strategy_options, use the EC2 region name (for example, us-east) as your data center name.
link - http://www.datastax.com/docs/1.0/cluster_architecture/replication
http://www.datastax.com/documentation/cql/3.1/cql/cql_using/update_ks_rf_t.html
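Putting this together for the cluster above, a sketch of a corrected keyspace definition (assuming Ec2Snitch reports the data center as eu-west and keeping the replication factor of 2):
-- the data center name is the EC2 region (eu-west), not the availability zone (eu-west-1b)
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'eu-west': 2};
After changing the replication settings, run nodetool repair on each node so the data is redistributed accordingly.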