Conditional Entropy - entropy

I need help computing the conditional entropy for the questions listed below, using the data set that follows. Would the formula for the first question be 0.67 * 1 + 0.33 * 1, or 0.43 * 1 + 0.57 * 1?
1. What is the conditional entropy H(Repurposed | Veteran = Yes)?
2. What is H(Repurposed | Veteran)?
Droid (Index)  Base        New  Veteran  Type      Output (Repurposed)
U-2FG          Scarif      No   Yes      Security  Yes
K-919          Scarif      No   Yes      Security  Yes
X-23L          Wobani      Yes  Yes      Security  No
X-24L          Death Star  Yes  No       Security  No
K-2SO          Wobani      No   No       Security  ?
C-PM1          Jedha       No   No       Protocol  No
B-NC5          Death Star  No   Yes      Security  Yes
K-OS4          Death Star  No   No       Security  Yes
K-OS5          Jedha       Yes  No       Security  No
K-OS3          Scarif      No   Yes      Security  Yes
T-101          Wobani      No   No       Protocol  No
T-HX1          Death Star  Yes  Yes      Protocol  No
T-HX3          Wobani      Yes  No       Protocol  No
T-HX8          Jedha       Yes  No       Protocol  No
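As a rough check on both questions, here is a minimal Python sketch that computes the two quantities directly from the table. It assumes base-2 logarithms and assumes that K-2SO (whose output is "?") is the instance to be predicted and is therefore left out of the counts; both are assumptions, not something the question states. H(Repurposed | Veteran = Yes) is the entropy of the Repurposed column restricted to the Veteran = Yes rows, and H(Repurposed | Veteran) is the average of the per-value entropies weighted by how often each Veteran value occurs.

```python
from collections import Counter
from math import log2

# (Veteran, Repurposed) pairs read off the table.
# Assumption: K-2SO is excluded because its output is unknown ("?").
data = [
    ("Yes", "Yes"),  # U-2FG
    ("Yes", "Yes"),  # K-919
    ("Yes", "No"),   # X-23L
    ("No", "No"),    # X-24L
    ("No", "No"),    # C-PM1
    ("Yes", "Yes"),  # B-NC5
    ("No", "Yes"),   # K-OS4
    ("No", "No"),    # K-OS5
    ("Yes", "Yes"),  # K-OS3
    ("No", "No"),    # T-101
    ("Yes", "No"),   # T-HX1
    ("No", "No"),    # T-HX3
    ("No", "No"),    # T-HX8
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(pairs):
    """H(Y | X) for a list of (x, y) pairs: weighted average of H(Y | X = x)."""
    n = len(pairs)
    total = 0.0
    for x_value in {x for x, _ in pairs}:
        subset = [y for x, y in pairs if x == x_value]
        total += (len(subset) / n) * entropy(subset)
    return total


h_veteran_yes = entropy([y for x, y in data if x == "Yes"])
h_given_veteran = conditional_entropy(data)

print(round(h_veteran_yes, 4))    # 0.9183
print(round(h_given_veteran, 4))  # 0.7424
```

Under these assumptions, the Veteran = Yes rows split 4 Yes / 2 No and the Veteran = No rows split 1 Yes / 6 No, so neither per-group entropy is 1. That suggests neither of the two formulas in the question is quite right.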


How to format number in PL/SQL?

I need to convert some numbers to chars according to the following logic :
Input => Expected Output | Current Output
0 => 0 | 0.00 << Wrong
.1111 => 0.11 | 0.11
.1 => 0.1 | 0.10 << Wrong
1.111 => 1.11 | 1.11
Basically, my logic is to output the minimum number of characters: only the user-friendly characters needed to describe the number.
Here is my current function
to_char(Value,'9999999999999990D99');
As you can see for 0 for example, it returns 0.00
Does anyone know how to solve that please ?
Thanks.
Looks like you want this one:
rtrim(to_char(Value,'fm99999999999990D99'),'.')
I.e., you need to add 'fm' to the format mask and then trim the trailing '.':
Example:
select
    to_char(Value,'9999999999999990D99') xx
   ,to_char(Value,'fm9999999999999990D99') x_fm -- just FM
   ,rtrim(to_char(Value,'fm99999999999990D99'),'.') x_fm_trim -- FM + rtrim
from xmltable('0, 0.1111, 0.1, 1.111' columns value number path '.');

XX                   X_FM                 X_FM_TRIM
-------------------- -------------------- ------------------
0.00                 0.                   0
0.11                 0.11                 0.11
0.10                 0.1                  0.1
1.11                 1.11                 1.11
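For checking expected outputs outside the database, the same trimming rule can be sketched in Python. This is only an illustration of the logic (round to two decimals, then drop trailing zeros and a dangling decimal point); the helper name `compact` is made up for this sketch and it is not a substitute for the `to_char` expression above.

```python
def compact(value, places=2):
    """Format a number with at most `places` decimals and no trailing zeros."""
    text = f"{value:.{places}f}"
    if "." in text:
        # Drop trailing zeros first, then a decimal point left with nothing after it.
        text = text.rstrip("0").rstrip(".")
    return text


samples = [0, 0.1111, 0.1, 1.111]
results = [compact(v) for v in samples]
print(results)  # ['0', '0.11', '0.1', '1.11']
```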

Job scheduling with minimization by parallel grouping

I have a job scheduling problem with a twist: a minimization constraint. I have many jobs, each with various dependencies on other jobs, without cycles. These jobs also have categories, and jobs can be run together in parallel for free if they belong to the same category. So, I want to order the jobs so that each job comes after its dependencies, but arranged in such a way that they are grouped by category (to run many in parallel) to minimize the number of serial jobs I run. That is, adjacent jobs of the same category count as a single serial job.
I know I can sort topologically to handle dependency ordering. I’ve tried using graph coloring on the subgraphs containing each category of jobs, but I run into problems with inter-category dependency conflicts. More specifically, when I have to make a decision of which of two or more pairs of jobs to group. I can brute force this, and I can try random walks over the search space, but I’m hoping for something smarter. The former blows up exponentially in the worst case, the latter is not guaranteed to be optimal.
To put things into scale- there can be as many as a couple hundred thousand jobs to schedule at once, with maybe a couple hundred categories of jobs.
I’ve stumbled upon several optimizations, such as creating a graph of dependencies, splitting it into connected components, solving each subproblem independently, and merging the results. I also realize that the number of colors needed to color each category gives a lower bound, but I'm not sure how to use that beyond an early exit condition.
Is there a better way to find an ordering of jobs that maximizes this “grouping” of jobs of a category, in order to minimize the total number of serial jobs?
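For reference, the topological-sort baseline mentioned in the question can be made category-aware with a simple greedy rule. The sketch below is not from any of the answers and is not guaranteed to be optimal: it is Kahn's algorithm that keeps draining the current category for as long as jobs of that category are ready, and then switches to the category with the most ready jobs. The function name `greedy_batches` and the toy data are made up for illustration.

```python
from collections import defaultdict


def greedy_batches(category, deps):
    """Greedy, non-optimal batching.

    category: dict mapping job -> category
    deps: iterable of (before, after) pairs (assumed acyclic)
    Returns a list of (category, [jobs]) batches that respects every dependency.
    """
    successors = defaultdict(list)
    indegree = {job: 0 for job in category}
    for before, after in deps:
        successors[before].append(after)
        indegree[after] += 1

    # Jobs whose dependencies are all done, grouped by category.
    ready = defaultdict(list)
    for job, degree in indegree.items():
        if degree == 0:
            ready[category[job]].append(job)

    batches = []
    while ready:
        # Heuristic choice: the category with the most ready jobs.
        cat = max(ready, key=lambda c: len(ready[c]))
        stack = ready.pop(cat)
        batch = []
        while stack:
            job = stack.pop()
            batch.append(job)
            for nxt in successors[job]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    if category[nxt] == cat:
                        stack.append(nxt)  # joins the current batch
                    else:
                        ready[category[nxt]].append(nxt)
        batches.append((cat, batch))
    return batches


# Toy example: two categories, two cross-category dependencies.
category = {"a": "X", "b": "Y", "c": "X", "d": "Y"}
deps = [("a", "b"), ("c", "d")]
batches = greedy_batches(category, deps)
print(batches)
```

A result like this is mainly useful as a quick upper bound on the number of serial batches, for example to seed or sanity-check an exact method.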
Not sure if this is helpful, but instead of aiming for an algorithm, it is also possible to develop an optimization model and let a solver do the work.
A Mixed Integer Programming model can look like:
The idea is that we minimize the total makespan, or the finish time of the latest job. This will automatically try to group together jobs of the same category (to allow parallel processing).
I created some random data for 50 jobs and 5 categories. The data set includes some due dates and some precedence constraints.
---- 28 SET j jobs
job1 , job2 , job3 , job4 , job5 , job6 , job7 , job8 , job9 , job10, job11, job12
job13, job14, job15, job16, job17, job18, job19, job20, job21, job22, job23, job24
job25, job26, job27, job28, job29, job30, job31, job32, job33, job34, job35, job36
job37, job38, job39, job40, job41, job42, job43, job44, job45, job46, job47, job48
job49, job50
---- 28 SET c category
cat1, cat2, cat3, cat4, cat5
---- 28 SET jc job-category mapping
        cat1   cat2   cat3   cat4   cat5
job1    YES
job2                                YES
job3                  YES
job4           YES
job5           YES
job6           YES
job7           YES
job8                                YES
job9    YES
job10                 YES
job11                               YES
job12                 YES
job13                               YES
job14                        YES
job15   YES
job16                        YES
job17   YES
job18          YES
job19                        YES
job20                 YES
job21          YES
job22          YES
job23   YES
job24   YES
job25                 YES
job26                               YES
job27          YES
job28                        YES
job29                        YES
job30          YES
job31   YES
job32                 YES
job33   YES
job34                               YES
job35          YES
job36          YES
job37                 YES
job38                        YES
job39                        YES
job40                 YES
job41                 YES
job42   YES
job43          YES
job44   YES
job45          YES
job46   YES
job47                        YES
job48                 YES
job49                        YES
job50          YES
---- 28 PARAMETER length job duration
job1 11.611, job2 12.558, job3 11.274, job4 7.839, job5 5.864, job6 6.025, job7 11.413
job8 10.453, job9 5.315, job10 12.924, job11 5.728, job12 6.757, job13 10.256, job14 12.502
job15 6.781, job16 5.341, job17 10.851, job18 11.212, job19 8.894, job20 8.587, job21 7.430
job22 7.464, job23 6.305, job24 14.334, job25 8.799, job26 12.834, job27 8.000, job28 6.255
job29 12.489, job30 5.692, job31 7.020, job32 5.051, job33 7.696, job34 9.999, job35 6.513
job36 6.742, job37 8.306, job38 8.169, job39 8.221, job40 14.640, job41 14.936, job42 8.699
job43 8.729, job44 12.720, job45 8.967, job46 14.131, job47 6.196, job48 12.355, job49 5.554
job50 10.763
---- 28 SET before dependencies
        job3   job9   job13  job21  job23  job27  job32  job41  job42
job1    YES
job3           YES
job4                  YES
job8                                       YES
job9                         YES    YES
job12                               YES
job14                                                    YES
job21                                      YES
job26                                                           YES
job31                                             YES
+       job43  job46  job48
job10          YES    YES
job11   YES
---- 28 PARAMETER due some jobs have a due date
job16 50.756, job19 57.757, job20 58.797, job25 74.443, job29 65.605, job32 55.928, job50 58.012
The solution can look like:
This model (with this particular data set) solved in about 30 seconds (using Cplex). Note that, in general, these models can be difficult to solve to optimality.
Here is a CP Optimizer model which solves very quickly using the most recent 12.10 version (a couple of seconds).
The model is quite natural using precedence constraints and a "state function" to model the batching constraints (no two tasks from different categories can execute concurrently).
DURATION = [
11611, 12558, 11274, 7839, 5864, 6025, 11413, 10453, 5315, 12924,
5728, 6757, 10256, 12502, 6781, 5341, 10851, 11212, 8894, 8587,
7430, 7464, 6305, 14334, 8799, 12834, 8000, 6255, 12489, 5692,
7020, 5051, 7696, 9999, 6513, 6742, 8306, 8169, 8221, 14640,
14936, 8699, 8729, 12720, 8967, 14131, 6196, 12355, 5554, 10763
]
CATEGORY = [
1, 5, 3, 2, 2, 2, 2, 5, 1, 3,
5, 3, 5, 4, 1, 4, 1, 2, 4, 3,
2, 2, 1, 1, 3, 5, 2, 4, 4, 2,
1, 3, 1, 5, 2, 2, 3, 4, 4, 3,
3, 1, 2, 1, 2, 1, 4, 3, 4, 2
]
PREC = [
(0, 2), (2, 8), (3, 12), (7, 26), (8, 20), (8, 22), (11, 22),
(13, 40), (20, 26), (25, 41), (30, 31), (9, 45), (9, 47), (10, 42)
]
DEADLINE = [ (15, 50756), (18, 57757), (19, 58797),
(24, 74443), (28, 65605), (31, 55928), (49, 58012) ]
assert(len(CATEGORY) == len(DURATION))
# ===========================================================================
from docplex.cp.model import CpoModel
mdl = CpoModel()
TASKS = range(len(DURATION))
# Decision variables - interval variables with duration (length) and name
itv = [
    mdl.interval_var(length=DURATION[j], name="ITV_{}".format(j+1))
    for j in TASKS
]
# Deadlines - constrain the end of the interval.
for j,d in DEADLINE :
    mdl.add(mdl.end_of(itv[j]) <= d)
# Precedences - use end_before_start
for b, a in PREC :
    mdl.add(mdl.end_before_start(itv[b], itv[a]))
# Batching. This uses a "state function" which is an unknown function of
# time which needs to be decided by CP Optimizer. We say that this function
# must take the value of the category of the interval during the interval
# (using always_equal meaning the state function is always equal to a value
# over the extent of the interval). This means that only tasks of a particular
# category can execute at the same time.
r = mdl.state_function()
for j in TASKS :
    mdl.add(mdl.always_equal(r, itv[j], CATEGORY[j]))
# Objective. Minimize the latest task end.
makespan = mdl.max(mdl.end_of(itv[j]) for j in TASKS)
mdl.add(mdl.minimize(makespan))
# Solve it, making sure we get the absolute optimal (0 tolerance)
# and limiting the log a bit. 's' contains the solution.
s = mdl.solve(OptimalityTolerance=0, LogVerbosity="Terse")
# Get the final makespan
sol_makespan = s.get_objective_values()[0]
# Print the solution by zone
# s[X] gets the value of unknown X in the solution s
# s[r] gets the value of the state function in the solution
# this is a list of triples (start, end, value) representing
# the full extent of the state function over the whole time line.
zones = s[r]
# Iterate over the zones, ignoring the first and last ones, which
# are the zones before the first and after the last task.
for (start, end, value) in zones[1:-1] :
    print("Category is {} in window [{},{})".format(value, start, end))
    for j in TASKS:
        (istart, iend, ilength) = s[itv[j]] # intervals are start/end/length
        if istart >= start and iend <= end:
            print("\t{} # {} -- {} --> {}".format(
                  itv[j].get_name(), istart, ilength, iend))
For job scheduling, I encourage you to have a look at CP Optimizer within CPLEX (see the introduction to CP Optimizer).
A basic jobshop model will look like:
using CP;
int nbJobs = ...;
int nbMchs = ...;
range Jobs = 0..nbJobs-1;
range Mchs = 0..nbMchs-1;
// Mchs is used both to index machines and operation position in job
tuple Operation {
  int mch; // Machine
  int pt;  // Processing time
};
Operation Ops[j in Jobs][m in Mchs] = ...;
dvar interval itvs[j in Jobs][o in Mchs] size Ops[j][o].pt;
dvar sequence mchs[m in Mchs] in all(j in Jobs, o in Mchs : Ops[j][o].mch == m) itvs[j][o];
minimize max(j in Jobs) endOf(itvs[j][nbMchs-1]);
subject to {
  forall (m in Mchs)
    noOverlap(mchs[m]);
  forall (j in Jobs, o in 0..nbMchs-2)
    endBeforeStart(itvs[j][o], itvs[j][o+1]);
}
as can be seen in the sched_jobshop example

How to extract all rows, for which row a particular criteria is met? Details in description

I am trying to load a set of policy numbers in my Target based on below criteria using Informatica PowerCenter.
I want to select all rows of every policy number that has at least one row with Rider = 0.
This is my source: -
Policy Rider Plan
1234 0 1000
1234 1 1010
1234 2 3000
9090 0 2000
9090 2 2545
4321 3 2000
4321 1 2000
Target should look like this: -
Policy Rider Plan
1234 0 1000
1234 1 1010
1234 2 3000
9090 0 2000
9090 2 2545
The policy number 4321 would not be loaded.
If I use a filter with Rider = 0, then I miss out on the rows below: -
1234 1 1010
1234 2 3000
9090 2 2545
What would be ideal way to load this kind of data using PowerCenter Designer?
Bring the same source into the mapping a second time with its own Source Qualifier, use a filter with Rider=0 to get the list of unique policy numbers that have Rider=0, then use a Joiner to join that list with your regular source on policy number. This should work.
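To make the intended result concrete, here is the filter-then-join idea sketched in plain Python on the sample rows from the question. It only illustrates the logic (first collect the policies that have a Rider = 0 row, then keep every row of those policies); it is not PowerCenter code.

```python
# (Policy, Rider, Plan) rows from the question.
rows = [
    (1234, 0, 1000),
    (1234, 1, 1010),
    (1234, 2, 3000),
    (9090, 0, 2000),
    (9090, 2, 2545),
    (4321, 3, 2000),
    (4321, 1, 2000),
]

# "Filter" branch: unique policy numbers that have a Rider = 0 row.
policies_with_rider_zero = {policy for policy, rider, _ in rows if rider == 0}

# "Joiner" step: keep every source row whose policy is in that set.
target = [row for row in rows if row[0] in policies_with_rider_zero]

for row in target:
    print(row)
```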
Another method, sort your data based on policy and Rider, and use variable ports with condition similar to below.
v_validflag=IIF(v_policy_prev!=policy, IIF(Rider=0, 'valid','invalid'), v_validflag)
v_policy_prev=policy
Then filter valid records.
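The variable-port logic above amounts to a single pass over sorted rows with a flag that is re-evaluated whenever the policy changes. A minimal Python sketch of that scan, using the sample rows from the question (the variable names mirror the ports above but are otherwise illustrative):

```python
# (Policy, Rider, Plan) rows from the question.
rows = [
    (1234, 0, 1000),
    (1234, 1, 1010),
    (1234, 2, 3000),
    (9090, 0, 2000),
    (9090, 2, 2545),
    (4321, 3, 2000),
    (4321, 1, 2000),
]

# Sort by policy, then rider, so a Rider = 0 row (if any) comes first per policy.
sorted_rows = sorted(rows, key=lambda r: (r[0], r[1]))

policy_prev = None
valid_flag = False
valid_rows = []
for policy, rider, plan in sorted_rows:
    if policy != policy_prev:
        # First row of a new policy decides the flag for the whole group.
        valid_flag = (rider == 0)
    policy_prev = policy
    if valid_flag:
        valid_rows.append((policy, rider, plan))

print(valid_rows)
```

Note that the output follows the sorted order, so policies appear in ascending order rather than in source order.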
There are many options. Here are two...
First:
It'll look like:
                                     // AGGREGATOR \\
SOURCE >> SOURCE QUALIFIER >> SORTER <<            >> JOINER >> TARGET
                                     \\============//
Connect all ports from the Source Qualifier (SQ) to a SORTER transformation (or sort in the SQ itself) and define the sort keys as ‘Policy’ and ‘Rider’. After that, split the stream into two pipelines:
1.- First stream:
- Connect ‘Policy’ and ‘Rider’ to a FILTER transformation and filter records by ‘Rider’ = 0.
- After that, link ‘Policy’ (only) to an AGGREGATOR and set Group By to ‘YES’ for ‘Policy’.
- Add a new port with the FIRST or MAX function for the ‘Policy’ port. This removes duplicate ‘Policy’ values.
- Indicate ‘Sorted Input’ in the AGGREGATOR properties.
- After that, link ‘Policy’ from the AGGREGATOR to the JOINER as Master in the Ports tab.
2.- Second stream: link directly from the SORTER to the above JOINER (with the aggregated ‘Policy’) as Detail.
- Indicate ‘Sorted Input’ in the JOINER properties.
- Set Join Type to ‘Normal Join’ and Join Condition to POLICY(master)=POLICY(detail) in the JOINER properties.
... Target
Second option:
Just Override SQL in Source Qualifier...
WITH PLC as (
    select distinct POLICY  -- distinct: one row per policy, even if several of its rows have RIDER=0
    from SRC_TBL
    where RIDER=0)
select s.POLICY, s.RIDER, s.PLAN
from PLC p left JOIN SRC_TBL s on s.POLICY = p.POLICY;
This may vary depending on how your source table is structured...

Magento table rate shipping - exclude some regions from shipping

I have set up table rate shipping in Magento 1.9. I need to exclude some regions from shipping.
For example, the CSV has 2 rows in this format:
Country code - Region - postal - threshold - shipping cost
1. FR - Corsica - * - 0 - 18
2. * - * - * - 0 - 50
Currently, if I select France - Corsica in the shipping calculator, it returns a shipping cost of 18, which is correct. If I select France - any other region, it shows a shipping cost of 50, which is not what I need. Is it possible to disable shipping when I select France with any other region?
According to your CSV, the condition on the 2nd line applies to every location except FR - Corsica. Remove the second line and it should work as expected.
I think I've got it. I need to remove the regions from the tables directory_country_region and directory_country_region_name; then there won't be an option to select the other regions of France, and shipping can be restricted for those regions.
Just change the order of the rows.
Example:
Country code - Region - postal - threshold - shipping cost
1. * - * - * - 0 - 50
2. FR - Corsica - * - 0 - 18

price filter problem in magento

How does the price filter option in the Magento sidebar work? I went through all the template and block files under my custom design.
I am getting this ranges by default.
1. $0.00 - $10,000.00 (1027)
2. $10,000.00 - $20,000.00 (3)
3. $20,000.00 - $30,000.00 (1)
These limits are chosen automatically, but I want to give my own ranges. Only one template file, filter.phtml, is used, and if I touch it, all the other filter options break. How can I customize the price filter to use my own set of ranges?
I need something like this
# $40.00 - $60.00 (155)
# $60.00 - $80.00 (150)
# $80.00 - $100.00 (153)
# $100.00 - $200.00 (248)
# $200.00 - $300.00 (100)
# $300.00 - $400.00 (43)
# $400.00 - $500.00 (20)
# $500.00 - $600.00 (6)
# $600.00 - $700.00 (6)
# $700.00 - $800.00 (2)
If you look in filter.phtml, you will see that it is using the block Mage_Catalog_Block_Layer_Filter_xxx where xxx is the attribute type. Which in turn leads you to the model: Mage_Catalog_Model_Layer_Filter_Price.
Inside app/code/core/Mage/Catalog/Model/Layer/Filter/Price.php, you will see the method getPriceRange() which calculates the price breaks.
You can override that model by copying it into app/code/local/Mage/Catalog/Model/Layer/Filter and adjusting that method so that it calculates the ranges per your requirements.
Good luck.
JD
