What's the meaning of "make -j" without a number

I am using the "make" command to compile. I know that if I use "make -jN", N indicates the number of jobs. But if I don't put any number after -j, what does it mean?

No number means no limit.
From the GNU Make manual:
If the ‘-j’ option is followed by an integer, this is the number of recipes to execute at once; this is called the number of job slots. If there is nothing looking like an integer after the ‘-j’ option, there is no limit on the number of job slots. The default number of job slots is one, which means serial execution (one thing at a time).
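For example:

    make -j4   # run at most 4 recipes in parallel
    make -j    # no limit: start every recipe that is ready to run

Be aware that on a large project an unlimited "make -j" can spawn enough parallel compile jobs to exhaust memory, so an explicit limit is usually the safer choice.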


How to understand the progress bar for take() in spark-shell

I called the take() method on an RDD[LabeledPoint] from spark-shell, and it turned out to be a laborious job for Spark.
The spark-shell shows a progress bar with a stage number and task counts.
The progress bar fills again and again, and I don't know how to produce a reasonable estimate of the needed time (or total progress) from those numbers.
Does anyone know what those numbers mean?
Thanks in advance.
The numbers show the Spark stage that is running and the number of completed, in-progress, and total tasks in that stage. (See "What do the numbers on the progress bar mean in spark-shell?" for more on the progress bar.)
Spark stages run tasks in parallel. In your case 5 tasks are running in parallel at the moment. If each task takes roughly the same time, this should give you an idea of how much longer you have to wait for this stage to finish.
But RDD.take can take more than one stage. take(1) will first get the first element of the first partition. If the first partition is empty, it will take the first elements from the second, third, fourth, and fifth partitions. The number of partitions it looks at in each stage is 4× the number of partitions already checked. So if you have a whole lot of empty partitions, take(1) can take many iterations. This can be the case for example if you have a large amount of data, then do filter(_.name == "John").take(1).
If you know your result will be small, you can save time by using collect instead of take(1). This will always gather all the data in a single stage. The main advantage is that in this case all the partitions will be processed in parallel, instead of the somewhat sequential manner of take.
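As a sketch of the difference (hypothetical data; the Person type and the 100 partitions are made up for illustration):

    case class Person(name: String, age: Int)
    val events = sc.parallelize(Seq(Person("Jane", 30), Person("John", 25)), 100)
    val johns  = events.filter(_.name == "John")
    johns.take(1)    // may run several stages: 1 partition first, then 4x more each time
    johns.collect()  // a single stage; all 100 partitions are scanned in parallel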

Dbms_scheduler repeat interval maximum value

In the documentation (11g, 12c) we read:
This specifies a positive integer representing how often the recurrence repeats. The default is 1, which means every second for secondly, every day for daily, and so on. The maximum value is 99.
In the documentation (10g) we read:
... same ... The maximum value is 999.
But in Oracle 10g and 12c the real maximum is 7999 for the SECONDLY frequency. Which is true? I can't find any errata documents.
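For reference, an interval in this range is set at job creation with something like this (hypothetical job name; this is the kind of call used to test the 7999 limit):

    BEGIN
      DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'LONG_INTERVAL_TEST',           -- hypothetical name
        job_type        => 'PLSQL_BLOCK',
        job_action      => 'BEGIN NULL; END;',
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=SECONDLY;INTERVAL=7999',  -- documented max is 99
        enabled         => TRUE);
    END;
    /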
A larger value for SECONDLY is less of a problem than a smaller one: a job that runs very frequently consumes resources more often than the same job running only a few times per day.
Monitoring is an important part of using Oracle Scheduler (see the query sketched after this list):
Did job creations succeed?
Did job runs succeed?
How many ran concurrently?
Did other users run into problems because of the job load?
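A minimal starting point for that monitoring, via the standard run-details view (the owner filter is a hypothetical schema name):

    SELECT job_name, status, actual_start_date, run_duration
      FROM dba_scheduler_job_run_details
     WHERE owner = 'MYUSER'
     ORDER BY actual_start_date DESC;

Counting rows whose start and end windows overlap gives a rough idea of how many jobs ran concurrently.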

What's the best task-scheduling algorithm for some given tasks?

We have a list of tasks with different lengths, a number of CPU cores, and a context switch time.
We want to find the best scheduling of tasks among the cores to maximize processor utilization.
How could we find this?
Isn't it best to choose the biggest available task from the list and give it to the next ready core, or do you think we must try all orders to find out which one is best?
I should add that all cores are ready at time unit 0 and the tasks are supposed to run concurrently.
There's no silver bullet here: you must consider what types of tasks are being executed and try to schedule them as well as possible.
CPU-bound tasks don't do much communication (I/O), and thus need to run continuously, interrupted only when necessary, according to the policy being used;
I/O-bound tasks can repeatedly be put aside, letting other processes run, since they sleep for long periods while waiting for data to be brought into main memory;
interactive tasks must respond quickly, but need not run without interruption, since they constantly block waiting for user input; they do need a high priority, so that the user does not notice delays.
Considering this, and the context-switch costs, you must evaluate what types of tasks you have and then choose one or more policies for your scheduler.
Edit:
I thought this was simply a conceptual question. Given that you have to implement a solution, you must analyze the requirements.
Since you have the task lengths and the context switch times, and you have to keep the cores busy, this becomes an optimization problem: keep as few cores as possible idle toward the end of the run, while also keeping the number of context switches low so that the overall execution time does not grow too much.
As pointed out by svick, this sounds like the partition problem, which is NP-complete, and in which you need to divide a sequence of numbers into a given number of lists so that the sums of the lists are equal.
In your problem the objective is relaxed: the cores no longer all need the same total execution time, but you want the difference between any two cores' execution times to be as small as possible.
In the reference given by svick, you can see a dynamic programming approach that you may be able to map onto your problem.
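As a concrete illustration, here is a minimal sketch (in Scala, with made-up task lengths) of the greedy rule the question proposes, biggest task first to the least-loaded core, known as the LPT heuristic; it is fast and usually good, but not guaranteed optimal:

    // Assign each task (longest first) to the currently least-loaded core.
    def lptSchedule(taskLengths: Seq[Int], cores: Int): Array[Long] = {
      val load = Array.fill(cores)(0L)
      for (t <- taskLengths.sortBy(-_)) {
        val i = load.indexOf(load.min)  // least-loaded core so far
        load(i) += t
      }
      load                              // load.max is the resulting makespan
    }

    lptSchedule(Seq(7, 5, 4, 3, 3, 2), cores = 3)  // => Array(9, 8, 7)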

Controlling number of lines to be written to the output file

I am new to Hadoop programming.
I have a situation in which I want to stop writing <k3,v3> pairs to my output file after n lines.
In my program, I am sure the output file will be sorted by k3, but I don't want the entire list; I only want the first n.
Is there a mechanism in Hadoop to do this?
I couldn't find a class/API for this.
But you could increment a counter each time OutputCollector.collect() is called in the reduce function. When the counter reaches a certain value, stop calling OutputCollector.collect().
This wastes CPU cycles, because the reduce task keeps running even after n lines have been written to the output. There might be a better approach to the problem.
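A minimal sketch of that counter approach, using the old org.apache.hadoop.mapred API the answer refers to (the cap N and the key/value types are hypothetical):

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class TopNReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

      private static final int N = 100;  // hypothetical cap on output lines
      private int emitted = 0;           // counts records written by this task

      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> output,
                         Reporter reporter) throws IOException {
        while (values.hasNext() && emitted < N) {
          output.collect(key, values.next());  // stop collecting once N is reached
          emitted++;
        }
      }
    }

Note the counter is per reduce task, so with more than one reducer this caps each output file at N lines, not the job's total output.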

What's the best value for make -j

What's the best value for the -j switch?
I usually set it to the number of CPUs/cores available.
Thanks.
I've always seen the number of cores available plus one as the recommended value.
Just measure.
Start with the number of cores, then add one at a time until you see diminishing returns.
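For example, on Linux (nproc prints the number of available cores and is part of GNU coreutils):

    make -j"$(nproc)"           # one job per core
    make -j"$(($(nproc) + 1))"  # the cores-plus-one rule of thumb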
