How do I decide where I should locate my TimesTen database files? - performance

I am setting up a TimesTen In-Memory database and I am looking for guidance on the storage and location that I should use for the database's persistence files.

A TimesTen database consists of two types of files: checkpoint files (always two) and transaction log files (at least one, usually many).
There are three criteria to consider:
a) Data safety and availability (regular storage versus RAID). The database files are critical to the operation of the database; if they become inaccessible or are lost or damaged, your database will become inoperable and you will likely lose data. One way to protect against this is to use TimesTen's built-in replication to implement high availability, but even then you may also want to protect the database files using some form of RAID storage. For performance reasons RAID-1 is preferred over RAID-5 or RAID-6. Use of NFS storage is not recommended for database files.
b) Capacity. Both checkpoint files are located in the same directory (DataStore attribute) and hence in the same filesystem. Each file can grow to a maximum size of PermSize + ~64 MB. Normally the space for these files is pre-allocated when they are created, so it is less likely you will run out of space for them. By default the transaction log files are also located in the same directory as the checkpoint files, though you can (and almost always should) place them in a different location by means of the LogDir attribute (see the example DSN definition after this list). The filesystem holding the transaction logs should have enough space that you never run out; if the database is unable to write to the transaction logs it will stop processing transactions and operations will start to receive errors.
c) Performance. If you are using traditional spinning magnetic media, then I/O contention is a significant concern. The checkpoint files and the transaction log files should be stored on separate devices and separate from any other files that are subject to high levels of I/O activity. I/O loading and contention is less of a consideration for SSD storage and usually irrelevant for PCIe/NVMe flash storage.
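To make this concrete, here is a minimal sketch of a DSN definition (sys.odbc.ini style) with the checkpoint and log directories on separate filesystems; the DSN name, paths, and sizes are illustrative only and should be adapted to your own layout:

[mydb]
DataStore=/disk1/ttdata/mydb
LogDir=/disk2/ttlogs
DatabaseCharacterSet=AL32UTF8
PermSize=4096
TempSize=512
LogFileSize=256
LogBufMB=256

With this layout the two checkpoint files live under /disk1 while the transaction log files are written to /disk2, so checkpoint I/O and log I/O do not compete for the same device.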

Related

Ordered mode behavior in journaling file system

In the following article, it says "ordered mode is metadata journaling only but writes the data before journaling the metadata." Does this mean the data is physically written to disk before the metadata is written? From what I understand, data written to a disk is first placed in a cache and then flushed to disk. Or is whether the data was actually written to disk irrelevant to the journaling service?
Is the metadata that is placed in the journal written directly to disk without first being written to a cache?

Offloading unstructured data saved in RDBMS to Hadoop

My organization is thinking about offloading unstructured data (text, images, etc.) saved as part of tables in an Oracle Database into Hadoop. The size of the DB is around 10 TB and growing. The size of the CLOB/BLOB columns is around 3 TB. Right now these columns are queried for certain kinds of reports through a web application. They are also written to, but not very frequently.
What kind of approach can we take to achieve proper offloading of the data while ensuring that the offloaded data remains available for reads through the existing web application?
You can find part of the answer in an Oracle blog post (link).
If the data needs to be pulled into the HDFS environment via Sqoop, then you should first read the following from the Sqoop documentation.
Sqoop handles large objects (BLOB and CLOB columns) in particular ways. If this data is truly large, then these columns should not be fully materialized in memory for manipulation, as most columns are. Instead, their data is handled in a streaming fashion. Large objects can be stored inline with the rest of the data, in which case they are fully materialized in memory on every access, or they can be stored in a secondary storage file linked to the primary data storage. By default, large objects less than 16 MB in size are stored inline with the rest of the data. At a larger size, they are stored in files in the _lobs subdirectory of the import target directory. These files are stored in a separate format optimized for large record storage, which can accommodate records of up to 2^63 bytes each. The size at which lobs spill into separate files is controlled by the --inline-lob-limit argument, which takes a parameter specifying the largest lob size to keep inline, in bytes. If you set the inline LOB limit to 0, all large objects will be placed in external storage.
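As a rough illustration, an import of such a table might look like the following; the connection URL, credentials, table name, and target directory are placeholders, and --inline-lob-limit is left at the 16 MB default here (set it to 0 to push every LOB into external storage under _lobs):

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCLPDB \
  --username app_user \
  --password-file /user/app_user/.oracle.pw \
  --table DOCUMENTS \
  --target-dir /data/offload/documents \
  --inline-lob-limit 16777216 \
  --num-mappers 4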
Reading via the web application is possible if you are using an MPP query engine like Impala; it works pretty well and is production-ready technology. We heavily use complex Impala queries to render content for a Spring Boot application. Since Impala runs everything in memory, there is a chance of slowness or failure on a multi-tenant Cloudera cluster. For smaller user groups (a 1000-2000 user base) it works perfectly fine.
Do let me know if you need more input.
My recommendations would be:
Use the Cloudera distribution (read here)
Give enough memory to the Impala daemons
Make sure your YARN scheduler (fair share or priority share) is configured correctly to balance ETL load against web application load
If required, keep the Impala daemons away from YARN
Define a memory quota for Impala so that it allows concurrent queries
Flatten your queries so Impala runs faster without joins and shuffles
If you are reading just a few columns, store the data in Parquet; it works very fast (see the example below)
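As a sketch of that last point, a flattened Parquet reporting table in Impala could look like this (the table and column names are invented for illustration); because Parquet is columnar, a report that touches only a couple of columns reads only a fraction of the data:

CREATE TABLE docs_report (
  doc_id     BIGINT,
  doc_type   STRING,
  created_ts TIMESTAMP,
  doc_text   STRING
)
STORED AS PARQUET;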

HBase - What's the difference between WAL and MemStore?

I am trying to understand the HBase architecture. I can see two different terms used for what seems like the same purpose.
Write Ahead Logs and MemStores are both used to store new data that hasn't yet been persisted to permanent storage.
What's the difference between WAL and MemStore?
Update:
WAL - used to recover not-yet-persisted data in case a server crashes.
MemStore - stores updates in memory as sorted KeyValues.
It seems like a lot of duplication of data before the data is written to disk.
The WAL is for recovery, NOT for data duplication (see my answer here for more).
Please go through the following to understand more.
An HBase Store hosts a MemStore and zero or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.
The Write Ahead Log (WAL) records all changes to data in HBase to file-based storage. If a RegionServer crashes or becomes unavailable before the MemStore is flushed, the WAL ensures that the changes to the data can be replayed.
With a single WAL per RegionServer, the RegionServer must write to the WAL serially, because HDFS files must be written sequentially. This can make the WAL a performance bottleneck.
The WAL can be disabled to remove this bottleneck and improve write performance.
This is done by calling the HBase client method
Mutation.setWriteToWAL(false)
in older client versions, or Mutation.setDurability(Durability.SKIP_WAL) in current ones.
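A minimal sketch using the Durability API (the table, column family, and qualifier names here are invented for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SkipWalExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("my_table"))) {
            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            // Skip the WAL for this mutation only: faster writes, but the edit
            // is lost if the RegionServer crashes before the MemStore is flushed.
            put.setDurability(Durability.SKIP_WAL);
            table.put(put);
        }
    }
}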
General note: it is common practice to disable the WAL while bulk loading data to gain speed. The side effect is that if you disable the WAL, there is nothing to replay, so you cannot get the data back if a RegionServer crashes before the MemStore is flushed.
Moreover, if you use Solr + HBase + Lily (i.e. Lily Morphline NRT indexing with HBase), it depends on the WAL: if you disable the WAL for performance reasons, Solr NRT indexing will not work, since Lily reads from the WAL.
Please have a look at the HBase architecture section.

How to get the hdfs usage report in details

We have an HDFS cluster with 900 TB of capacity. As the amount of stored data grows, it is difficult to keep track of what is useful and what could be deleted.
I want to analyze HDFS usage for the following patterns so that the capacity can be used optimally.
Frequently accessed data.
Data not touched/accessed for a long time (possible candidates for deletion).
Data usage distribution by users.
Active users.
You can derive that data from:
(1) HDFS audit log (access patterns per user/ip)
(2) fsimage (access times per file, data not accessed)
(1) Do you have the HDFS audit log enabled? Read more here.
(2) To get started with the fsimage, read this - there is an example of finding data that has not been touched/accessed for a long time.
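A sketch of the fsimage route (the local paths and fsimage file name are placeholders): fetch the latest fsimage from the NameNode, dump it with the offline image viewer's Delimited processor, which emits one line per file including its access time, and then filter the output for paths whose access time is older than your threshold:

hdfs dfsadmin -fetchImage /tmp
hdfs oiv -p Delimited -delimiter ',' -i /tmp/fsimage_0000000000123456789 -o /tmp/fsimage.csv

Note that access times are only recorded if dfs.namenode.accesstime.precision is non-zero (the default precision is one hour).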
You may also want to consider HAR (Hadoop Archives) to archive the data instead of deleting it, thus reducing both storage usage and precious memory on the NameNode.

Backup COPY vs BACKUPSET

Oracle has two options for backing up a database, and the documentation on them is very brief.
To back up to disk as image copies, use BACKUP AS COPY, as shown in the following example:
BACKUP AS COPY
DEVICE TYPE DISK
DATABASE;
To back up your data into backup sets, use the AS BACKUPSET clause. You can allow backup sets to be created on the configured default device, or direct them specifically to disk or tape:
BACKUP AS BACKUPSET
DATABASE;
BACKUP AS BACKUPSET
DEVICE TYPE DISK
DATABASE;
What is the difference between the two, and why are there these multiple options?
To put it simply, BACKUP AS COPY makes a straight copy of the database files (the same way the Linux cp command does), whereas a backup set is a logical entity made up of backup pieces, much as a tablespace is made up of data files. Backup pieces are in an RMAN-specific binary format.
Why are there these multiple options?
To give you the opportunity to perform backup and recovery more effectively and efficiently. For example, you can simply switch to an image copy of a data file, avoiding a possibly time-consuming restore process (see the example below). But you cannot take incremental backups as image copies the way you can with backup sets, etc.
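For instance, if a datafile is lost and you have a current image copy of it, you can skip the restore entirely and just repoint the control file at the copy; the datafile number here is only an example:

SQL 'ALTER DATABASE DATAFILE 4 OFFLINE';
SWITCH DATAFILE 4 TO COPY;
RECOVER DATAFILE 4;
SQL 'ALTER DATABASE DATAFILE 4 ONLINE';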
The choice of options, of course, depends on your backup and recovery strategy.
Find out more
