In the documentation (11g, 12c) we read:
This specifies a positive integer representing how often the recurrence repeats. The default is 1, which means every second for secondly, every day for daily, and so on. The maximum value is 99.
In the documentation (10g) we read:
... same ... The maximum value is 999.
But in Oracle 10g and 12c the real maximum is 7999 for the SECONDLY frequency. Which is correct? I can't find any errata documents.
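One way to probe what a given version accepts is DBMS_SCHEDULER.EVALUATE_CALENDAR_STRING, which raises an error if the calendaring expression is rejected. This is just a sketch (enable SERVEROUTPUT to see the result):

DECLARE
  next_run TIMESTAMP WITH TIME ZONE;
BEGIN
  -- Raises an error if this version rejects the calendaring expression
  DBMS_SCHEDULER.EVALUATE_CALENDAR_STRING(
    calendar_string   => 'FREQ=SECONDLY;INTERVAL=7999',
    start_date        => SYSTIMESTAMP,
    return_date_after => SYSTIMESTAMP,
    next_run_date     => next_run);
  DBMS_OUTPUT.PUT_LINE('Next run: ' || TO_CHAR(next_run));
END;
/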
A larger INTERVAL value for SECONDLY is less of a problem than a smaller one: the same job running very frequently uses resources far more often than one that runs only a few times per day.
Monitoring is an important part of using Oracle Scheduler.
Did job creations succeed?
Did job runs succeed?
How many ran concurrently?
Did other users run into problems because of the load from the jobs?
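One place to start on the run-related questions is the scheduler's own log views; a sketch (assumes SELECT access to the DBA_ views, otherwise use the ALL_/USER_ variants):

SELECT job_name,
       status,
       COUNT(*) AS runs
  FROM dba_scheduler_job_run_details
 WHERE actual_start_date > SYSTIMESTAMP - INTERVAL '1' DAY
 GROUP BY job_name, status
 ORDER BY job_name, status;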
Why are there no WEEKLY RollCycles defined? Is there some reason they should not be used?
When Chronicle Queue appends a message to a queue, it gives each message a unique index. This index is a 64-bit number: the high bits define the cycle (in the case of daily rolling, which day) and the low bits define the message sequence within that day. It is possible to create a weekly roll cycle; however, with a weekly roll cycle the maximum number of messages you could write in any week would be approximately the same as the number you can write in a day with daily rolling [this depends, of course, on how many of the higher bits you devote to the cycle number]. I believe you could create your own custom roll cycle that rolls weekly, but at the time the RollCycles were created, daily rolling was deemed sufficient.
I suggest you don't allocate just 1 bit to the cycle if you are doing weekly rolling, because after a couple of weeks it will stop working. But yes, if you did this you could write 2^63 messages per week. Your second option of "6 bits and up to 2^58 entries per week" makes more sense, I feel. Having said this, you are still going to have to work out what you will do at the end of the year when you have used up all the cycles.
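As a rough sketch of the arithmetic behind this (a generic bit split, not the exact constants of any particular RollCycle): if the low $k$ bits of the 64-bit index hold the within-cycle sequence and the remaining high bits hold the cycle number, then

\[ \text{index} = \text{cycle} \cdot 2^{k} + \text{seq}, \qquad 0 \le \text{seq} < 2^{k}. \]

With 6 bits for the cycle you get $k = 58$, i.e. roughly $2^{58} \approx 2.9 \times 10^{17}$ entries per cycle, but only $2^{6} = 64$ cycles, which with weekly rolling is a little over a year before the cycle number wraps.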
Let's say I have data of 25 blocks and the replication factor is 1. A mapper requires about 5 minutes to read and process a single block of the data. How can I calculate the time for one worker node? And what about 15 nodes? Will the time change if we change the replication factor to 3?
I really need help.
First of all, I would advise reading some scientific papers on the issue (Google Scholar is a good starting point).
Now a bit of discussion. From my latest experiments I have concluded that processing time is strongly related to the amount of data you want to process (which makes sense). On our cluster it takes, on average, around 7-8 seconds for a Mapper to read a 128 MB block. There are several factors you need to consider in order to predict the overall execution time:
How much data the Mapper produces, which will more or less determine the time Hadoop requires for the shuffle phase
What the Reducer is doing: does it do some iterative processing? (This might be slow!)
What is the configuration of the resources? (how many Mappers and Reducers are allowed to run on the same machine)
Finally, are there other jobs running simultaneously? (This can slow your jobs down significantly, since Reducer slots can be occupied waiting for data instead of doing useful work.)
So already for one machine you can see the complexity of predicting job execution time. During my study I concluded that, on average, one machine is capable of processing 20-50 MB/second (the rate is calculated as total input size / total job running time). The processing rate includes the staging time (when your application is starting and uploading required files to the cluster, for example). The processing rate differs between use cases and is greatly influenced by the input size and, more importantly, by the amount of data produced by the Mappers (once again, these values are for our infrastructure; on a different machine configuration you will see completely different execution times).
When you start scaling your experiments you will, on average, see improved performance, but again from my study I concluded that the scaling is not linear: you would need to fit, for your own infrastructure, a model with the relevant variables to approximate job execution time.
Just to give you an idea, I will share part of my results. The rate when executing a particular use case on 1 node was ~46 MB/second, for 2 nodes it was ~73 MB/second, and for 3 nodes it was ~85 MB/second (in my case the replication factor was equal to the number of nodes).
The problem is complex and requires time, patience, and some analytical skill to solve. Have fun!
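For the concrete numbers in the question, a back-of-the-envelope bound that ignores everything discussed above (shuffle, reduce, scheduling overhead, stragglers) and assumes one map task per node at a time would be

\[ T_{1\ \text{node}} \approx 25 \times 5\ \text{min} = 125\ \text{min}, \qquad T_{15\ \text{nodes}} \approx \lceil 25 / 15 \rceil \times 5\ \text{min} = 10\ \text{min}. \]

Raising the replication factor from 1 to 3 does not change this bound by itself; it mainly increases the chance that a map task finds a local copy of its block, so in practice it tends to reduce the map time rather than increase it.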
There are several partitions on the cluster I work on. With sinfo I can see the time limit for each partition. I submitted my code to the mid1 partition, which has a time limit of 8-00:00:00, which I understand to mean 8 days. I had to wait in the queue for 1-15:23:41, i.e. nearly 1 day and 15 hours. However, my code ran for only 00:02:24, roughly 2.5 minutes (and the solution was still converging). Also, I did not set a time limit in the file submitted with sbatch. The reason my code was stopped was given as:
JOB 3216125 CANCELLED AT 2015-12-19T04:22:04 DUE TO TIME LIMIT
So why was my code stopped if I did not exceed the time limit? I asked the people responsible for the cluster, but they did not get back to me.
Look at the value of DefaultTime in the output of scontrol show partitions. This is the maximum time allocated to your job when you do not specify one yourself with --time.
Most probably this value is set to 2 minutes to force you to specify a sensible time limit (within the limits of the partition).
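If that is the case, specifying the limit explicitly should prevent it: for example, passing --time=01:00:00 to sbatch (or putting an #SBATCH --time=01:00:00 line at the top of the submission script) requests one hour instead of the 2-minute DefaultTime, as long as the value stays within the partition's maximum.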
We are using a timestamp to ensure that entries in a log table are recorded sequentially, but we have found a potential flaw. Say, for example, we have two nodes in our RAC and the node clocks are 1000 ms apart. Our app server inserts two log entries within 30 ms of each other. The first insert is serviced by Node1 and the second by Node2. With a 1000 ms difference between the two nodes, the timestamps could show the log entries occurring in the wrong order! (I would just use a sequence, but our sequences are cached for performance reasons...)
NTP sync doesn't help this situation because NTP has a fault tolerance of 128ms -- which leaves the door open for records to be recorded out of order when they occur more frequently than that.
I have a feeling I'm looking at this problem the wrong way. My ultimate goal is to be able to retrieve the actual sequence that log entries are recorded. It doesn't have to be by a timestamp column.
An Oracle sequence with ORDER specified is guaranteed to return numbers in order across a RAC cluster. So
create sequence my_seq
start with 1
increment by 1
order;
Now, in order to do this, Oracle has to do a fair amount of inter-node communication to ensure that access to the sequence is serialized appropriately. That makes it significantly more expensive than a normal sequence. If you need to guarantee order, though, it's probably the most efficient approach you're going to have.
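Then, assuming a log table roughly like the one below (table and column names are placeholders, not from the question), each entry takes its position from the ordered sequence instead of from a timestamp:

insert into app_log (log_id, log_time, message)
values (my_seq.nextval, systimestamp, :message);

Ordering the retrieval by log_id then reflects the order in which the sequence handed out values across both nodes.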
Bear in mind that a timestamp attached to a row is generated at the time of the insert or update, but the actual change to the database takes effect when the commit happens; so, depending on the complexity of the transactions, row 1 might get inserted before row 2 but committed after it.
The only thing I am aware of in Oracle across the nodes that guarantees the order is the SCN that Oracle attaches to the transaction, and by which transactions in a RAC environment can be ordered for things like Streams replication.
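If you want to experiment with that, one hedged sketch: the ORA_ROWSCN pseudocolumn exposes the SCN of the last change to a row (block-level granularity by default; create the table with ROWDEPENDENCIES for per-row SCNs), so something like the following orders rows by commit SCN (app_log is the placeholder table from above):

select log_id, ora_rowscn
  from app_log
 order by ora_rowscn;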
1000 ms? That's one second, isn't it? IMHO that is a lot. If you really need precise time, then simply give up the idea of global time. Generate timestamps on the log server and assume that each log server has its own local time. Read up on Lamport timestamps if you need some theory. But maybe the source of your problem is somewhere else: RAC synchronises time between nodes and would log a discrepancy as big as that.
If two consecutive events are logged by two different connections, is the same thread using both connections? Or are those events passed to background threads which then write into the database? I.e., is it logged sequentially or in parallel?
I have seen the DBA team advise setting the sequence cache to a higher value when doing performance optimization, increasing it from 20 to 1000 or 5000. The Oracle docs say of the cache value:
Specify how many values of the sequence the database preallocates and keeps in memory for faster access.
Somewhere in the AWR report I can see:
select SEQ_MY_SEQU_EMP_ID.nextval from dual
Can any performance improvement be seen if I increase the cache value of SEQ_MY_SEQU_EMP_ID?
My question is:
Does the sequence cache play any significant role in performance? If so, how do I know what cache value is sufficient for a sequence?
Sequence values are served from the Oracle cache until they are used up; when all of them have been used, Oracle allocates a new batch of values and updates the data dictionary.
If you have 100,000 records to insert and the cache size is 20, Oracle will update the data dictionary 5,000 times, but only 20 times if you set the cache size to 5,000.
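For example, to raise the cache on the sequence from the question (1000 is just an illustrative value; pick one based on your insert rate):

ALTER SEQUENCE SEQ_MY_SEQU_EMP_ID CACHE 1000;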
More information that may help you: http://support.esri.com/en/knowledgebase/techarticles/detail/20498
If you omit both CACHE and NOCACHE, then the database caches 20 sequence numbers by default. Oracle recommends using the CACHE setting to enhance performance if you are using sequences in an Oracle Real Application Clusters environment.
Using the CACHE and NOORDER options together results in the best performance for a sequence. When the CACHE option is used without the ORDER option, each instance caches a separate range of numbers, and sequence numbers may be assigned out of order by the different instances. So the larger the CACHE value, the fewer writes to the data dictionary, but the more sequence numbers that might be lost. There is no point worrying about losing numbers, though, since a rollback or a shutdown will definitely "lose" a number.
The CACHE option causes each instance to cache its own range of numbers, thus reducing I/O to the Oracle data dictionary, and the NOORDER option eliminates message traffic over the interconnect to coordinate the sequential allocation of numbers across all instances of the database. NOCACHE will be SLOW...
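A minimal sketch of such a definition (the sequence name is hypothetical):

CREATE SEQUENCE seq_fast_id
  CACHE 1000
  NOORDER;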
Read this
By default, an Oracle sequence caches 20 values. We can change this with the CACHE clause in the sequence definition. The CACHE clause mainly pays off when we generate values in large volumes, where it takes less time than the default; otherwise there is no drastic performance increase from declaring a CACHE clause in the sequence definition.
I have done some research and found some relevant information in this regard:
We need to check the database for sequences which are high-usage but defined with the default cache size of 20; the performance benefits of altering the cache size of such a sequence can be noticeable.

Increasing the cache size of a sequence does not waste space; the cache is still defined by just two numbers, the last used value and the high-water mark. It is just that the high-water mark is jumped by a much larger value every time it is reached.

A cached sequence will return values exactly the same as a non-cached one. However, a sequence cache is kept in the shared pool just as other cached information is. This means it can age out of the shared pool in the same way as a procedure if it is not accessed frequently enough. Everything in the cache is also lost when the instance is shut down.
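A simple way to find such candidates (a sketch; DBA_SEQUENCES needs DBA privileges, otherwise use USER_SEQUENCES for your own schema):

SELECT sequence_owner, sequence_name, cache_size
  FROM dba_sequences
 WHERE cache_size <= 20
 ORDER BY sequence_owner, sequence_name;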
Besides spending more time updating the Oracle data dictionary, small sequence caches can have other negative effects if you work with a clustered Oracle installation.
In Oracle 10g RAC Grid, Services and Clustering 1st Edition by Murali Vallath it is stated that if you happen to have
an Oracle Cluster (RAC)
a non-partitioned index on a column populated with an increasing sequence value
concurrent multi instance inserts
you can incur high contention on the rightmost index block and experience a lot of Cluster Waits (up to 90% of total insert time).
If you increase the size of the relevant sequence cache you can reduce the impact of Cluster Waits on your index.