What are known reliable tools for syncing huge amounts of data between Oracle DB instances in a live environment?
The requirements are that the host with the live data keeps running in the live environment, i.e. the database continues to be updated, while the receiving host is offline and will go online only once the data sync is complete.
Most of the data is stored in BLOB columns, and the amount of data to sync reaches ~100 GB. Only part of the data in a table needs to move, while the actual size of the table is around 50 TB.
This is a clustered system in which each live machine is a clone of the others, and each machine runs its own Oracle DB instance. Sometimes a machine has to go down for maintenance and loses live data; when it comes back up, the data needs to be synchronized. A machine is usually offline for maintenance for no longer than 6 hours. Without the clone machines we would not be able to keep the system up while one of the machines is under maintenance.
The sync should not severely affect CPU usage on the live machine.
The first things to look at are Oracle Advanced Replication and Oracle Streams. You might want to consider getting a good book on Streams.
We are using GoldenGate in production to replicate from an Oracle Database into Postgres. In addition, GoldenGate also replicates into another Oracle Database instance.
The source Oracle Database is located in our company's internal network.
The target Oracle Database is also located in our company's internal network.
Postgres is hosted in the AWS cloud.
The Oracle->Oracle replication works without problems; there is no delay.
The Oracle->Postgres replication can have an incredibly large delay; sometimes it grows to as much as a 1-day delay. Also, no error is reported.
We have been investigating the problem and cannot find the cause: the network bandwidth is large enough for the data we transfer, there is enough RAM, and CPU usage is only about 20%.
The only difference seems to be the ping between the internal network and AWS. Within the internal network the ping is approximately 2 ms, while to AWS it is almost 20 ms.
What could be the cause, and how can we resolve it?
You really should contact Oracle Support on this topic; however, Oracle GoldenGate 12.2 supports Postgres as a target (only).
As for the latency within your replication process: it sounds like Oracle-to-Oracle is working fine within your internal network, and the problem only appears when going Oracle-to-Postgres (AWS cloud).
Do you have your lag monitoring configured? LAGINFO (https://docs.oracle.com/goldengate/c1221/gg-winux/GWURF/laginfo.htm#GWURF532) should be configured within your MGR processes. This will provide some baseline lag information for determining how to proceed; a minimal example of the relevant MGR parameters is sketched at the end of this answer.
Are you compressing the trail files?
How much data are you sending? DML stats?
This should get you started on the right path.
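For reference, and only as a sketch (the thresholds below are placeholders you would tune for your environment), lag reporting is driven by a few Manager parameters in MGR.prm:

```
-- MGR.prm (placeholder thresholds)
PORT 7809
LAGREPORTMINUTES 5      -- how often Manager checks Extract/Replicat lag
LAGINFOMINUTES 5        -- log an informational message when lag exceeds 5 minutes
LAGCRITICALMINUTES 15   -- log a warning when lag exceeds 15 minutes
```

With something like that in place, the lag messages in the error log (plus LAG EXTRACT / LAG REPLICAT in GGSCI) give you baseline figures to compare the Oracle target against the Postgres target.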
We are building a reporting app in Laravel that needs to fetch user data from a third-party server that allows 1 request per second.
We need to fetch 100K to 1,000K rows per user, and we can fetch at most 250 rows per request.
So the restrictions are:
1. We can send 1 request per second
2. We can fetch 250 rows per request
So it requires 400-4,000 requests/jobs (roughly 7 to 67 minutes at 1 request per second) to fetch one user's data, which makes loading data for multiple users very time-consuming, and the server gets slow.
So now we are planning to load the data using multiple servers, say 4-10 servers, so that we can send up to 10 requests per second (one from each of 10 servers).
How can we design the system and process jobs from multiple servers?
Is it possible to host Redis on a dedicated server, connect to that Redis server from multiple application servers, and execute jobs? Can any conflicts/race conditions happen?
Any hint or prior experience related to this would be really helpful.
The short answer is yes, this is absolutely possible and is something I've implemented in production apps many times before.
Redis is just like any other service and can run anywhere, with clients connecting to it from anywhere. It's all up to your configuration of the server to dictate how exactly that happens (adding passwords, configuring spiped, limiting access via the firewall, etc.). I'd recommend reading up on the documentation they have in the Administration section here: https://redis.io/documentation
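As a rough illustration only (the values are placeholders, not a recommendation for your network), a dedicated Redis host that accepts remote clients usually needs at least something like this in redis.conf, on top of firewall rules:

```
# redis.conf (placeholder values; tighten to your own network and firewall rules)
bind 0.0.0.0                       # listen on an interface the app servers can reach
requirepass change-this-password   # clients must AUTH before running commands
```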
Also, when you do make the move to a dedicated Redis host, with multiple clients accessing it, you'll likely want to look into having more than just one Redis server running for reliability, high availability, etc. Redis has efficient and easy replication available with a few simple configuration commands, which you can read more about here: https://redis.io/topics/replication
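For example, pointing a replica at its master takes a single directive (the address and password below are placeholders; on Redis versions older than 5.0 the directive is SLAVEOF instead of REPLICAOF):

```
# redis.conf on each replica (placeholder values)
replicaof 10.0.0.5 6379
masterauth change-this-password   # only needed if the master sets requirepass
```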
Last thing on Redis, if you do end up implementing a master-slave set up, you may want to look into high availability and auto-failover if your Master instance were to go down. Redis has a really great utility built into the application that can monitor your Master and Slaves, detect when the Master is down, and automatically re-configure your servers to promote one of the slaves to the new master. The utility is called Redis Sentinel, and you can read about that here: https://redis.io/topics/sentinel
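A minimal sentinel.conf sketch (the name, addresses, and timeouts are placeholders), run on at least three hosts so a quorum can be reached, looks roughly like this:

```
# sentinel.conf (placeholder values)
sentinel monitor mymaster 10.0.0.5 6379 2        # watch this master; 2 sentinels must agree it is down
sentinel down-after-milliseconds mymaster 5000   # consider the master down after 5 seconds of silence
sentinel failover-timeout mymaster 60000         # allow 60 seconds for a failover to complete
sentinel parallel-syncs mymaster 1               # re-sync replicas to the new master one at a time
```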
For your question about race conditions, it depends on how exactly you write the jobs that are pushed onto the queue. For your use case it doesn't sound like this would be too much of an issue, but it really depends on the constraints of the third-party system. Either way, if you are subject to a race condition, you can still implement a solution for it, but you would likely need to use something like a Redis lock (https://redis.io/topics/distlock). Taylor recently added a feature to the upcoming Laravel 5.6 that I believe implements a version of the Redis lock in the scheduler (https://medium.com/@taylorotwell/laravel-5-6-preview-single-server-scheduling-54df8e0e139b). You can look into how that was implemented and adapt it for your use case if you end up needing it; a rough sketch of the shared-Redis coordination idea is below.
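Just to make the coordination idea concrete, here is a minimal sketch (written in Python with redis-py purely for illustration rather than Laravel/PHP; the key names, host, and limits are all made up) of workers on several servers sharing one Redis instance so that, across all of them, the third-party rate limit is respected:

```python
# pip install redis
# Sketch: workers on several servers share one Redis instance so that, in total,
# they send at most MAX_PER_SECOND requests per second to the third-party API.
import time
import redis

r = redis.Redis(host="redis.internal.example", port=6379,
                password="change-this-password")  # placeholder connection details

MAX_PER_SECOND = 10  # assumed total budget across all servers

def acquire_slot() -> bool:
    """Atomically count this second's requests; proceed only if under the budget."""
    bucket = f"api-rate:{int(time.time())}"  # one counter key per wall-clock second
    pipe = r.pipeline()
    pipe.incr(bucket)                        # INCR is atomic across all clients
    pipe.expire(bucket, 5)                   # old per-second buckets expire on their own
    count, _ = pipe.execute()
    return count <= MAX_PER_SECOND

def fetch_page(user_id: int, page: int) -> None:
    # Wait until we win a slot, then call the third-party API (call omitted here).
    while not acquire_slot():
        time.sleep(0.1)
    print(f"fetching rows {page * 250}-{page * 250 + 249} for user {user_id}")

if __name__ == "__main__":
    for page in range(5):
        fetch_page(user_id=42, page=page)
```

Because the counter lives in the shared Redis and INCR is atomic, it does not matter which server a job runs on; the same idea carries over to the Laravel/Redis setup you described.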
Here is the problem: I have to use a remote DB for a few hours a day, and the VPN we use drops the Oracle connection several times an hour (for an unknown reason), which is really annoying and time-consuming.
The sysadmin who manages both the Sonic VPN and the DB can't help.
So I am thinking of placing a DB copy locally.
What I need/don't need:
all changes on the remote DB (the master) should propagate fairly easily to the copy (automatically or manually, I don't mind, as long as it is a one-button push); changes are rare, once a day at most
my changes to the local DB should not be propagated back to the master (but I am flexible here)
it should not take me more than 5 minutes a day to maintain this
it would be nice to replicate only the DDL from the master (I don't need the actual data changes, only table changes)
Is there some sort of replication, or any other solution, I can use to achieve this?
Database replication isn't cheap. Your company will pay more to build a replication environment, starting with the Oracle edition and licensing plus many extras.
Replication will also increase the complexity of database administration.
Finally, and most importantly, the replication would have to work over your VPN environment :) (which is disconnected all the time), so the replication itself would fail all the time.
What you can do with the network team:
Review the VPN service level agreement (SLA) with the service provider to learn the guaranteed downtime percentage and quality of service.
Have the network administrator monitor the network to spot where the problem is: it may be the line, the router, the network configuration, or the network card.
Take some measurements: determine the size of your transactions per minute (in bytes) so you can choose the right speed from the network service provider.
Measure the network bandwidth using iperf (see the sample commands after this list); for reference: https://blogs.oracle.com/mandalika/entry/measuring_network_bandwidth_using_iperf
Perform a network performance test.
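For the iperf measurement mentioned in the list above, a minimal run looks like this (the hostname is a placeholder; run it over the VPN link you are troubleshooting):

```
# On a machine at the database end of the VPN:
iperf -s

# On your local machine, measure throughput to that host for 30 seconds:
iperf -c db-side-host.example.com -t 30
```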
If the changes happen only once a day, your best and easiest solution would be to do a full backup of the master DB, zip it, transfer it via FTP/email, and unzip and restore it on your end. But this won't be feasible if the DB size is too large.
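Since you said you mainly need the table definitions rather than the data, a lighter-weight variant of this idea is a Data Pump metadata-only export/import; a rough sketch (the schema name, credentials, and directory are placeholders) would be:

```
# On the remote (master) side: export DDL only, no table data
expdp system/password@remote_db schemas=APP_SCHEMA content=metadata_only \
      directory=DATA_PUMP_DIR dumpfile=app_ddl.dmp logfile=app_ddl_exp.log

# Copy app_ddl.dmp over, then import into your local DB
impdp system/password@local_db schemas=APP_SCHEMA \
      directory=DATA_PUMP_DIR dumpfile=app_ddl.dmp logfile=app_ddl_imp.log
```

The dump file for DDL alone is small, so pulling it over the flaky VPN (or by email/FTP, as above) once a day should stay within the 5-minutes-a-day budget.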
There is a new Redis cluster setup that a team in my company is working on in order to improve application data caching with Redis. The setup is as follows: a Redis cluster with one Redis master and many slaves, say 40-50 (which can grow as the application scales), with one Redis instance per virtual machine. I was told this setup helps the applications deployed on each virtual machine query the data in the local Redis instance rather than querying an instance over the network, in order to avoid network latency.

Periodically, say every 5 seconds or so, the Redis master is updated with whatever data has been modified, newly created, or deleted (the data is backed by a relational database). This initiates a data sync operation with all the Redis slave instances. The data consumers (the applications deployed on all the virtual machines) read the updated values from the Redis slaves for processing.

Is this a correct approach to the network latency problem the applications face when querying a Redis instance elsewhere in the data center network? Will this setup not create a lot of network traffic when the Redis master syncs the data to all of its slave nodes?
I couldn't find many answers on this on the internet. Your opinions on this are much appreciated.
The relevance of this kind of architecture depends a lot on the workload. Here are the important criteria:
the ratio between write and read operations. Obviously, the more read operations, the more relevant the architecture. The main benefit, IMO, is not necessarily the latency gain, but the scalability, the extra reliability it brings, and the reduced network resource consumption.
the ratio between the cost of a local Redis access and the cost of a remote Redis access. Do not assume that the only cost of a remote Redis access is the network latency; it is not. On my systems, a local Redis access costs about 50 us (on average, at very low workload), while a remote access costs about 120 us (on average, at very low workload); the network latency itself is about 60 us. Measure the same kind of figures on your own system/network, with your own data.
Here is some advice:
do not use a single Redis master with many slave instances; it will limit the scalability of the system. If you want to scale, you need to build a hierarchy of slaves. For instance, have the master replicate to 8 slaves, and have each of those slaves replicate to 8 further slaves running locally on your 64 application servers. If you need to add more nodes, you can tune the replication factor at the master or slave level, or add one more layer to this tree for extreme scalability. It gives you flexibility.
consider using a Unix domain socket between the application and the local slave, rather than a TCP socket; it is good for both latency and throughput (a configuration sketch follows this list).
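A minimal sketch of the Unix-socket idea (the socket path and permissions are placeholders): on each application server, the local slave exposes a socket in redis.conf, and the application connects to that path instead of a TCP port.

```
# redis.conf on the local slave (placeholder path/permissions)
unixsocket /var/run/redis/redis.sock
unixsocketperm 770
```

Most client libraries can then be pointed at the socket path rather than host/port (redis-py, for example, accepts a unix_socket_path argument).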
Regarding your last questions, you really need to evaluate the average local and remote latencies to decide whether this is worth it. Note that the protocol used by Redis to synchronize master and slaves is close to the normal client-server traffic: every SET command applied on the master will also be applied on the slaves, so the network bandwidth consumption is similar. In the end, it is really a matter of how many reads and how many writes you expect.
I have created a site that uses MongoDB as the database engine, and at the moment it is still under construction, so it is not getting much traffic. This means that there are periods with no requests and, therefore, no queries to the database.
When I do eventually hit the site pages that use the database, MongoDB seems to take 4 or 5 seconds to come back, but from that request onward it is very fast.
I can't find any information about there being a timeout or anything like that. Is it just that the in-memory data is being paged out and it takes a few seconds to page it back in? It is running on a Windows Server 2008 VM, and I run it as a Windows service.
Any help would be appreciated.
MongoDB lets the OS kernel decide what is kept in memory (the current "working set"). Even if nothing is happening, the system may still page data out of RAM into the page/swap file, even when RAM capacity is not being taxed.
One way around this is to monitor for idleness and send queries in the background, or even have a background process read (cat) the data files on disk. This is especially helpful for pre-warming the database after startup, and likewise if your usage follows cyclical patterns; a small keep-warm sketch is below.
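A minimal keep-warm sketch (in Python with pymongo; the connection string, database, and collection names are placeholders) that periodically touches a few collections so their pages stay recently used:

```python
# pip install pymongo
# Sketch: touch a few collections on a schedule so the OS keeps their pages warm.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["mysite"]                              # placeholder database name

COLLECTIONS = ["users", "articles"]                # placeholder collection names

while True:
    for name in COLLECTIONS:
        # A cheap query per collection keeps its index and data pages in memory.
        db[name].find_one()
    time.sleep(60)  # run once a minute while the site is otherwise idle
```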
As with most kinds of databases, recently used data can end up cached, and some databases also store execution plans, but MongoDB does not appear to keep a query result cache.
Also, to improve performance, make sure you design your indexes well, so that queries use an index rather than accidentally performing full collection scans. Use the explain command to see your query execution plan (http://docs.mongodb.org/manual/reference/method/cursor.explain/); a short example follows below.
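For illustration, here is roughly how that looks from pymongo (the collection and field names are placeholders):

```python
# pip install pymongo
# Sketch: create an index and check with explain() that queries actually use it.
from pymongo import MongoClient, ASCENDING

db = MongoClient("mongodb://localhost:27017")["mysite"]  # placeholder names

# Index the field you filter on, so lookups do not scan the whole collection.
db.users.create_index([("email", ASCENDING)])

# explain() returns the query plan; look for an index scan, not a collection scan.
plan = db.users.find({"email": "someone@example.com"}).explain()
print(plan)
```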
http://docs.mongodb.org/manual/faq/fundamentals/
Does MongoDB handle caching?
Yes. MongoDB keeps all of the most recently used data in RAM. If you have created indexes for your queries and your working data set fits in RAM, MongoDB serves all queries from memory.
MongoDB does not implement a query cache: MongoDB serves all queries directly from the indexes and/or data files.