Why only one active partition in MBR?

I was reading about how operating systems are bootstrapped and read in detail about how the MBR is used. Every source I consulted mentioned that only one of the four primary partitions can be active. When the code in the boot sector of that active partition is executed, the user is given a menu for selecting one of the operating systems (in the multiboot case).
I have the following questions about the above:
What are the disadvantages of having more than one active partition in the MBR?
And why only four primary partitions? (Is it because of the limited size of the MBR?)
What is the use of primary partitions other than the active partition?

Wikipedia has a nice article on MBR with a lot of useful links. "Only one active partition" seems to be a design choice from the early IBM/DOS bootloader, and it has remained that way since. Basically, they defined multiple active partitions as an error and checked for this error at boot. It kind of makes sense, because you can only boot one operating system at a time anyway, and a forced single active partition prevents ambiguity. If I recall correctly, LILO and possibly GRUB (Linux bootloaders) don't mind if there are multiple active partitions, so I think this is mostly a DOS/Windows issue.
As for your questions.
An "active" partition only means that the first byte of its partition table entry (the boot flag, 0x80 for active and 0x00 for inactive) differs from that of an "inactive" partition. There's no advantage or disadvantage, it's just a flag.
Partition information is stored in a fixed-size record, and it only has room for 4 partitions. However, "extended" and "logical" partitions (which are stored separately) can be used to enable more than 4 partitions. With the standard DOS/Windows boot code, only primary partitions can be used for booting.
DOS/Windows has no particular use for more than a single partition, but users can decide to partition their disk for convenience. A partition is a logical volume that shows up as a drive letter and "disk" in "My Computer". One advantage of having a partition separate from the operating system is that you can store files there and later reinstall the operating system without losing all your files. Installing an operating system usually involves formatting (erasing) a partition.
Linux (or rather Un*x) has a tradition of using partitions to improve system resilience, by keeping essential tools and boot images on a single small partition. That way the essential parts of the system are less likely to suffer from disk errors. This can be done more elaborately by segmenting different parts of the system into different partitions with the intention of isolating any disk error that might occur. A major advantage of this is that you can use the essential system to recover from many errors that otherwise would be impossible to recover from.
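To make the "fixed size record" and the "it's just a flag" points concrete, here is a minimal Python sketch that reads the four 16-byte entries from a 512-byte MBR sector. It assumes the classic layout (446 bytes of boot code, four entries, then the 0x55 0xAA signature). The helper names and the demo sector built in `build_demo_mbr` are made up for illustration and are not taken from any real disk.

```python
import struct

PARTITION_TABLE_OFFSET = 446  # the table follows 446 bytes of boot code
ENTRY_SIZE = 16               # every partition record is 16 bytes
NUM_ENTRIES = 4               # 446 + 4 * 16 + 2 signature bytes = 512
ACTIVE_FLAG = 0x80            # boot flag value for an "active" entry
ENTRY_FORMAT = "<B3sB3sII"    # flag, CHS start, type, CHS end, LBA start, sector count


def parse_mbr(sector: bytes):
    """Return the four primary partition entries of a classic MBR sector."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid 512-byte MBR sector")
    entries = []
    for index in range(NUM_ENTRIES):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        flag, _chs_first, ptype, _chs_last, lba_start, sectors = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE]
        )
        entries.append({
            "index": index,
            "active": flag == ACTIVE_FLAG,  # the only thing "active" means
            "used": ptype != 0x00,          # type 0x00 marks an empty slot
            "type": ptype,
            "lba_start": lba_start,
            "sectors": sectors,
        })
    return entries


def active_partitions(entries):
    """Indexes of entries whose boot flag is set (DOS boot code expects exactly one)."""
    return [e["index"] for e in entries if e["active"]]


def build_demo_mbr() -> bytes:
    """Hypothetical sector: one active and one inactive entry, two empty slots."""
    sector = bytearray(512)
    sector[446:462] = struct.pack(ENTRY_FORMAT, 0x80, b"\x00" * 3, 0x07, b"\x00" * 3, 2048, 204800)
    sector[462:478] = struct.pack(ENTRY_FORMAT, 0x00, b"\x00" * 3, 0x83, b"\x00" * 3, 206848, 409600)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    for entry in parse_mbr(build_demo_mbr()):
        print(entry)
```

Because the table has exactly four slots, anything beyond four partitions has to be described elsewhere, which is what extended and logical partitions do.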

Efficiently store daily dumps in Hadoop HDFS

I believe a common usage pattern for Hadoop is to build a "data lake" by loading regular (e.g. daily) snapshots of data from operational systems. For many systems, the rate of change from day to day is typically less than 5% of rows (and even when a row is updated, only a few fields may change).
Q: How can such historical data be structured on HDFS so that it is both economical in space consumption and efficient to access?
Of course, the answer will depend on how the data is commonly accessed. On our Hadoop cluster:
Most jobs only read and process the most recent version of the data
A few jobs process a period of historical data (e.g. 1 - 3 months)
A few jobs process all available historical data
This implies that, while keeping historical data is important, it shouldn't come at the cost of severely slowing down those jobs that only want to know what the data looked like at close-of-business yesterday.
I know of a few options, none of which seem quite satisfactory:
1. Store each full dump independently as a new subdirectory. This is the most obvious design, simple, and very compatible with the MapReduce paradigm. I'm sure some people use this approach, but I have to wonder how they justify the cost of storage. Supposing 1 TB is loaded each day, that's 365 TB added to the cluster per year of mostly duplicated data. I know disks are cheap these days, but most budget-makers are accustomed to infrastructure expanding in proportion to business growth, as opposed to growing linearly over time.
2. Store only the differences (delta) from the previous day. This is a natural choice when the source systems prefer to send updates in the form of deltas (a mindset which seems to date from the time when data was passed between systems in the form of CD-ROMs). It is more space efficient, but harder to get right (for example, how do you represent deletion?), and even worse, it implies the need for consumers to scan the whole of history, "event sourcing"-style, in order to arrive at the current state of the system.
3. Store each version of a row once, with a start and end date. Known by terms such as "time variant data", this pattern pops up very frequently in data warehousing, and more generally in relational database design when there is a need to store historical values. When a row changes, update the previous version to set the "end date", then insert the new version with today as the "start date". Unfortunately, this doesn't translate well to the Hadoop paradigm, where append-only datasets are favoured and there is no native concept of updating a row (although that effect can be achieved by overwriting the existing data files). This approach requires quite complicated logic to load the data, but admittedly it can be quite convenient to consume data with this structure.
(It's worth noting that all it takes is one particularly volatile field changing every day to make the latter options degrade to the same space efficiency as option 1).
So...is there another option that combines space efficiency with ease of use?
I'd suggest a variant of option 3 that respects the append-only nature of HDFS.
Instead of one data set, we keep two with different kinds of information, stored separately:
The history of expired rows, most likely partitioned by the end date (perhaps monthly). This only has rows added to it when their end dates become known.
A collection of snapshots for particular days, including at least the most recent day, most likely partitioned by the snapshot date. New snapshots can be added each day, and old snapshots can be deleted after a couple of days since they can be reconstructed from the current snapshot and the history of expired records.
The difference from option 3 is just that we consider the unexpired rows to be a different kind of information from the expired ones.
Pro: Consistent with the append only nature of HDFS.
Pro: Queries using the current snapshot can run safely while a new day is added as long as we retain snapshots for a few days (longer than the longest query takes to run).
Pro: Queries using history can similarly run safely as long as they explicitly give a bound on the latest "end-date" that excludes any subsequent additions of expired rows while they are running.
Con: It is not just a simple "update" or "overwrite" each day. In practice in HDFS this generally needs to be implemented via copying and filtering anyway so this isn't really a con.
Con: Many queries need to combine the two data sets. To ease this we can create views or similar that appropriately union the two to produce something that looks exactly like option 3.
Con: Finding the latest snapshot requires finding the right partition. This can be eased by having a view that "rolls over" to the latest snapshot each time a new one is available.
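The two-dataset idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: in-memory dicts and lists stand in for HDFS partitions, rows are compared as whole values, and the names `load_day` and `as_of` are made up here, not part of any Hadoop API.

```python
from datetime import date


def load_day(current_snapshot, new_dump, history, load_date):
    """Build the next snapshot and append newly expired rows to history.

    current_snapshot: dict key -> (start_date, row)
    new_dump:         dict key -> row (today's full dump from the source)
    history:          list of (key, row, start_date, end_date), append only
    """
    new_snapshot = {}
    for key, (start, row) in current_snapshot.items():
        if key in new_dump and new_dump[key] == row:
            # Unchanged row: carry it forward with its original start date.
            new_snapshot[key] = (start, row)
        else:
            # Changed or deleted row: its end date is now known, so expire it.
            history.append((key, row, start, load_date))
    for key, row in new_dump.items():
        if key not in new_snapshot:
            # New key or new version of a changed key starts today.
            new_snapshot[key] = (load_date, row)
    return new_snapshot


def as_of(snapshot, history, day):
    """Reconstruct the state on `day` from the latest snapshot plus history."""
    state = {key: row for key, (start, row) in snapshot.items() if start <= day}
    for key, row, start, end in history:
        if start <= day < end:
            state[key] = row
    return state


if __name__ == "__main__":
    d1, d2, d3 = date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3)
    history = []
    snap = load_day({}, {"a": "x", "b": "y"}, history, d1)
    snap = load_day(snap, {"a": "x", "b": "z"}, history, d2)  # b changes
    snap = load_day(snap, {"b": "z"}, history, d3)            # a is deleted
    print(as_of(snap, history, d1))
    print(history)
```

Nothing already written is ever modified: each load only appends to the history and produces a new snapshot, and `as_of` shows why old snapshots can be dropped, since any past day can be rebuilt from the latest snapshot and the expired rows.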

Setting workarea_size_policy to manual vs automatic

I am working on a data warehouse system which was upgraded about a year ago to Oracle 10g (now 10.2.0.5).
The database is set up with workarea_size_policy=auto and pga_aggregate_target=1G. Most of the ETL process is written in PL/SQL and this code generally sets workarea_size_policy=manual and sets the SORT_AREA_SIZE and HASH_AREA_SIZE for particular sessions when building specific parts of the warehouse.
The values chosen for the SORT_AREA_SIZE and HASH_AREA_SIZE are different for different parts of the build. These sizes are probably based on the expected amount of data that will be processed in each area.
The problem I am having is that this code is starting to cause a number of ORA-600 errors to occur. It is making me wonder if we should even be overriding the automatic settings at all.
The code that sets the manual settings was written many years ago by a developer who is no longer here. It was probably originally written for Oracle 8 with an amendment for Oracle 9 to set the workarea_size_policy to manual. No one really knows how the values used for HASH_AREA_SIZE and SORT_AREA_SIZE were found. They could be completely inappropriate for all I know.
After that long preamble, I've got a few questions.
How do I know when (if ever) I should be overriding the automatic settings with workarea_size_policy=manual?
How do I find appropriate values for HASH_AREA_SIZE, SORT_AREA_SIZE, etc?
How do I benchmark that particular settings are actually providing any sort of benefit?
I'm aware that this is a pretty broad question but help would be appreciated.
I suggest you comment out the manual settings and do a test run only with automatic (dynamic) settings, like PGA_AGGREGATE_TARGET.
Management of Sort and Hash memory areas has improved a lot since Oracle 8!
It's hard to predetermine the memory requirements of your procedures, so the best approach is to test them with representative volumes of data and see how it goes.
You can then create an AWR report covering the timeframe of the execution of the procedures. There's a section in the report named PGA Memory Advisory. That will tell you if you need more memory assigned to PGA_AGGREGATE_TARGET, based on your current data volumes.
See sample here:
In this case you can clearly see that there's no need to go over the current 103 MB assigned, and you could actually stay at 52 MB without impacting the application.
Depending on the volumes we're talking about, if you can't assign more memory, some Sort or Hash operations might spill to a TEMPORARY tablespace, so make sure you have a properly sized one, possibly spread across as many disks / volumes as possible (see the SAME, "Stripe And Mirror Everything", configuration).

Berkeley DB Java Edition - tuning for large amount of data

I need to load over 1 billion keys into Berkeley DB, and therefore I want to tune it in advance to get better performance. With the standard configuration it currently takes me about 15 min to load 1'000'000 keys, which is too slow.
Is there a proper way to tune, for example, the B+Tree of Berkeley DB (node size etc...)?
(As a comparison, after tuning Tokyo Cabinet, it loads 1 billion keys in 25 min.)
P.S.
I'm looking for tuning tips in code and not parameters to set for a running system (like JVM size etc...)
I'm curious: when TokyoCabinet loads 1B keys in 25 minutes, what are the sizes of the keys/values being stored? What I/O system and storage system are you using? Are you using the term "load" to mean 1B transactional commits to permanent stable storage? That would be ~666,666 inserts/second, which is physically impossible given any I/O system I'm aware of. Multiply that number by the key and value size and now you're hopelessly beyond physical limits.
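The arithmetic behind that figure is easy to check. The short Python sketch below uses only the numbers quoted in the question, plus one loudly hypothetical value: the 100-byte record size is an assumption for illustration, since the question never states the key and value sizes.

```python
def inserts_per_second(keys: int, minutes: float) -> float:
    """Average insert rate for a load of `keys` records taking `minutes`."""
    return keys / (minutes * 60)


def bytes_per_second(rate: float, record_bytes: int) -> float:
    """Raw payload throughput implied by an insert rate and a record size."""
    return rate * record_bytes


# Figures quoted in the question.
tokyo_rate = inserts_per_second(1_000_000_000, 25)  # 1 billion keys in 25 min
bdb_rate = inserts_per_second(1_000_000, 15)        # 1 million keys in 15 min

# At the observed Berkeley DB rate, how long would 1 billion keys take?
bdb_hours_for_1b = 1_000_000_000 / bdb_rate / 3600

# ASSUMPTION: 100 bytes per key/value pair, purely illustrative.
ASSUMED_RECORD_BYTES = 100
tokyo_payload = bytes_per_second(tokyo_rate, ASSUMED_RECORD_BYTES)

if __name__ == "__main__":
    print(f"Tokyo Cabinet claim: {tokyo_rate:,.0f} inserts/s")
    print(f"Berkeley DB observed: {bdb_rate:,.0f} inserts/s")
    print(f"1B keys at the observed rate: {bdb_hours_for_1b:,.0f} hours")
    print(f"Payload at the assumed record size: {tokyo_payload / 1e6:,.1f} MB/s")
```

Whether that rate is plausible depends on whether each insert is individually made durable, which is the point the rest of this answer goes on to make.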
Please take a look at Gustavo Duarte's blog, read a bit about I/O systems and how things work in hardware, and then review your statement. I'm very interested in finding out what exactly TokyoCabinet is doing and what it isn't doing. If I had to guess, I'd say that it's committing to the file-system cache in the operating system but not flushing (fsync()-ing) those buffers to disk.
Full Disclosure: I'm a product manager at Oracle for Oracle Berkeley DB (a direct competitor of TokyoCabinet), I've been playing with these databases and the best hardware around for them for about ten years so I'm both biased and skeptical.
Berkeley DB has flags you can set on the transaction handle which mimic this and other similar methods of trading off durability (the "D" in ACID) for speed.
As far as how to make Berkeley DB Java Edition (BDB-JE) faster you can try the following:
Deferred writes: this delays writing to the transaction log for as long as possible (when buffers are full, it flushes the data).
Sort your keys in advance: most B-Trees (ours included) do much better with in-order insertions for fast load times.
Increase the size of the log files from the default of 10MiB to something larger, like 100MiB; this reduces I/O cost.
It's very important to be clear about claims of performance with databases. They seem simple, but it turns out to be very, very tricky to get them right so that they don't ever corrupt data or lose committed transactions.
I hope this helps you a bit.
Bulk inserts on BDB-JE are an order of magnitude faster if you group them into a single transaction. The reason is that each individual commit causes (by default) a synchronous write to disk, whereas a transaction is synced only once, on commit. In my application, writing 100'000 small keys as single commits took more than a minute, while in a single transaction it takes just a few seconds.

Storage for Write Once Read Many

I have a list of 1 million digits. Every time the user submits an input, I need to match the input against the list.
As such, the list would have the Write Once Read Many (WORM) characteristics?
What would be the best way to implement storage for this data?
I am thinking of several options:
A SQL Database but is it suitable for WORM (UPDATE: using VARCHAR field type instead of INT)
One file with the list
A directory structure like /1/2/3/4/5/6/7/8/9/0 (but this one would be taking too much space)
A bucket system like /12345/67890/
What do you think?
UPDATE: The application would be a web application.
To answer this question you'll need to think about two things:
Are you trying to minimize storage space, or are you trying to minimize processing time?
Storing the data in memory will give you the fastest processing time, especially if you can optimize the data structure for your most common operations (in this case a lookup) at the cost of memory space. For persistence, you could store the data in a flat file and read it during startup.
SQL databases are great for storing and reading relational data. For instance, names, addresses, and orders can be normalized and stored efficiently. Does a flat list of digits make sense to store in a relational database? Each access carries a lot of overhead for looking up the data: constructing the query, building the query plan, executing the query plan, and so on. Since the data is a flat list, you wouldn't be able to create an effective index (your index would essentially be the values you are storing, which means you would do a table scan for each data access).
Using a directory structure might work, but then your application is no longer portable.
If I were writing the application, I would either load the data during startup from a file and store it in memory in a hash table (which offers constant-time lookups), or write a simple indexed file accessor class that stores the data in a search-optimized order (worst case a flat file).
Maybe you are interested in how The Pi Searcher did it. They have 200 million digits to search through, and have published a description on how their indexed searches work.
If you're concerned about speed and don't want to care about file system storage, SQL is probably your best shot. You can optimize your table indexes, but you will also add another external dependency to your project.
EDIT: It seems MySQL has an ARCHIVE storage engine:
MySQL supports on-the-fly compression since version 5.0 with the ARCHIVE storage engine. Archive is a write-once, read-many storage engine, designed for historical data. It compresses data up to 90%. It does not support indexes. In version 5.1 Archive engine can be used with partitioning.
Two options I would consider:
Serialization - when the memory footprint of your lookup list is acceptable for your application, and the application is persistent (a daemon or server app), then create it and store it as a binary file, read the binary file on application startup. Upside - fast lookups. Downside - memory footprint, application initialization time.
SQL storage - when the lookup is amenable to index-based lookup, and you don't want to hold the entire list in memory. Upside - reduced init time, reduced memory footprint. Downside - requires a DBMS (extra app dependency, design expertise); fast, but not as fast as holding the whole list in memory.
If you're concerned about tampering, buy a writable DVD (or a CD if you can find a store which still carries them ...), write the list on it and then put it into a server with only a DVD drive (not a DVD writer/burner). This way, the list can't be modified. Another option would be to buy a USB stick which has a "write protect" switch, but they are hard to come by and the security isn't as good as with a CD/DVD.
Next, write each digit into a file on that disk with one entry per line. When you need to match the numbers, just open the file, read each line and stop when you find a match. With today's computer speeds and amounts of RAM (and therefore file system cache), this should be fast enough for a once-per-day access pattern.
Given that 1M numbers is not a huge amount for today's computers, why not just do pretty much the simplest thing that could work? Just store the numbers in a text file and read them into a hash set on application startup. On my computer, reading in 1M numbers from a text file takes under a second, and after that I can do about 13M lookups per second.
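A minimal Python sketch of that text-file-plus-hash-set approach might look like the following. The file name, the helper names, and the choice to keep entries as strings (so leading zeros survive, in the spirit of the VARCHAR remark in the question) are all assumptions made for illustration.

```python
import os
import tempfile


def write_numbers(path, numbers):
    """Write the list once, one entry per line."""
    with open(path, "w") as f:
        for number in numbers:
            f.write(f"{number}\n")


def load_numbers(path):
    """Read the file at startup into a set for constant-time membership tests."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def is_known(lookup, user_input):
    """Match one user input against the in-memory set."""
    return user_input.strip() in lookup


if __name__ == "__main__":
    # Hypothetical demo data; a real list would hold about 1M entries.
    path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
    write_numbers(path, ["12345", "67890", "00042"])
    lookup = load_numbers(path)
    print(is_known(lookup, "67890"))
    print(is_known(lookup, "11111"))
```

In a web application the set would be loaded once when the process starts and then shared by all requests, so the file is only read again on restart.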

Is there any logical reason of having different tablespace for indexes?

Hi, can someone let me know why we create different tablespaces for indexes and data?
It is a widespread belief that keeping indexes and tables in separate tablespaces improves performance. This is now considered a myth by many respectable experts (see this Ask Tom thread - search for "myth"), but is still a common practice because old habits die hard!
Third party edit
Extract from the Ask Tom thread "Index Tablespace" (2001, Oracle version 8.1.6). The question:
Is it still a good idea to keep indexes in their own tablespace?
Does this inhance performance or is it more of a recovery issue?
Does the answer differ from one platform to another?
First part of the Reply
Yes, no, maybe.
The idea, born in the 1980s when systems were tiny and user counts were in the single
digits, was that you separated indexes from data into separate tablespaces on different
disks.
In that fashion, you positioned the head of the disk in the index tablespace and the head
of the disk in the data tablespace and that would be better then seeking 2 times on the
same disk.
Drives back then were really slow at seeking and typically measured in the 10's to 100's
of megabytes (if you were lucky)
Today, with logical volumes, raid, NN gigabyte (nn is rapidly becoming NNN gigabytes)
drives, hundreds/thousands of concurrent users, thousands of tables, 10's of thousands of
indexes - this sort of "optimization" is sort of impossible.
What you strive for today is to be able to manage things, to spread IO out evenly
avoiding hot spots.
Since I believe all things should be in locally managed tablespaces with UNIFORM extent
sizes, I would say that yes, indexes would be in a different tablespace from the data but
only because they are a different SIZE then the data. My table with 50 columns and an
average row size of 4k might belong in a tablespace that has 5meg extents whereas the
index on a single number column might belong in a tablespace with 512k or 1m extents.
I tend to keep my indexes separate from the data but for the above sizing reason. The
tablespaces frequently end up on the same exact mount points. You strive for even io
across your disks and you may end up with indexes and data on the same devices.
It made sense in the 80s, when there were not too many users and databases were not too big. At that time it was useful to store indexes and tables on different physical volumes.
Now there are logical volumes, RAID and so on, and it is not necessary to store indexes and tables in different tablespaces for that reason.
But all tablespaces should be locally managed with uniform extent sizes. From this point of view, indexes should be stored in a different tablespace: a table with 50 columns could be stored in a tablespace with a 5 MB extent size, whereas a 512 KB extent size would be enough for the index tablespace.
Performance. It should be analyzed case by case. I think that keeping everything together in one tablespace is becoming another myth too! There should be enough spindles and enough LUNs, and you need to take care of queuing in the operating system. If someone thinks that one tablespace is enough and is the same as many tablespaces, without taking all the other factors into consideration, that is yet another myth. It depends!
High availability. Using separate tablespaces can improve the availability of the system in the event of file corruption, file system corruption, or block corruption. If the problem occurs only in the index tablespace, there is a chance to do the recovery online while the application remains available to the customer. See also: http://richardfoote.wordpress.com/2008/05/02/indexes-in-their-own-tablespace-recoverability-advantages-get-back/
Using separate tablespaces for indexes, data, BLOBs, CLOBs, and possibly some individual tables can be important for manageability and cost. We can use our storage system to place BLOBs, CLOBs, and possibly archive data on a different storage tier with a different quality of service.
