How to add up variables of different hosts in Ansible
I have a situation similar to the following:
all:
  hosts:
    host1:
      num: 3
    host2:
      num: 4
    host3:
      num: 2
I want to template a file onto the hosts that aggregates the value num step by step across the hosts, starting at 1.
So, for example, the value of a new variable from should be 1 for host1, 4 for host2 (1+3), and 8 for host3 (4+4). The order of execution does not really matter - it could also be: host3 1, host1 3 (1+2), host2 6 (3+3). The variable num basically states how many items a host will handle, and the goal of my question is to give each host a dedicated number range [from, from+num-1].
EDIT: I have thought about it further, and this could also be precalculated. Basically I want to get from [3,4,2] to [0,3,7] (or [1,4,8]). Unfortunately, I cannot find a Jinja2 filter that does this.
Q: "reduce/map an array [a,b,c] to [1, a+1, a+b+1]"
A: map doesn't apply filters additively. An iteration is needed, I'm afraid. For example, given the list
l: [10, 20, 30]
create the items and concatenate the new list
- set_fact:
    l2: "{{ l2|d([]) + [l[0:(item)]|sum + 1] }}"
  loop: "{{ range(0, l|length) }}"
gives
l2:
- 1
- 11
- 31
Of course, you can write a filter if you want to reduce the list in one step.
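As a minimal sketch of such a custom filter (the file name filter_plugins/cumulative.py and the filter name cumulative_offsets are my own invention, not from the original answer), it only needs to build the running offsets in plain Python:

# filter_plugins/cumulative.py  (hypothetical path and names)
# Turns a list of counts such as [3, 4, 2] into running offsets [1, 4, 8]:
# each element is `start` plus the sum of all preceding counts.

def cumulative_offsets(values, start=1):
    offsets = []
    total = start
    for v in values:
        offsets.append(total)
        total += v
    return offsets


class FilterModule(object):
    """Register the filter with Ansible."""
    def filters(self):
        return {'cumulative_offsets': cumulative_offsets}

With the plugin placed in a filter_plugins/ directory next to the playbook, a template could then use "{{ [3, 4, 2] | cumulative_offsets }}" to get [1, 4, 8], or "{{ [3, 4, 2] | cumulative_offsets(0) }}" to get [0, 3, 7].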
Related
YTT overlays: modify arrays using data from those arrays
This question is about YTT. Is it possible to modify a YAML list of items using data from those items via overlays? For example, we have a template:

---
vlans:
- vlan-id: 10
- vlan-id: 20
- vlan-id: 30
some_other_configuration: #! some other config here

And using overlays we need to transform the template above into this:

---
vlans:
- vlan-id: 10
  vlan-name: vlan10
- vlan-id: 20
  vlan-name: vlan20
- vlan-id: 30
  vlan-name: vlan30
some_other_configuration: #! some other config here
Yes. One can use an overlay within an overlay.

#@ load("@ytt:overlay", "overlay")

#@ def with_name(vlan):
#@overlay/match missing_ok=True
vlan-name: #@ "vlan{}".format(vlan["vlan-id"])
#@ end

#@overlay/match by=overlay.all
---
vlans:
#@overlay/match by=lambda idx, left, right: "vlan-id" in left, expects="1+"
#@overlay/replace via=lambda left, right: overlay.apply(left, with_name(left))
-

which can be read: for all documents, in the vlans: map item... for every array item it contains that is a map including the key "vlan-id"... replace that map with one that has been overlay'ed with the vlan name.

Playground: https://carvel.dev/ytt/#gist:https://gist.github.com/pivotaljohn/33cbc52e808422e68c5ec1dc2ca38354

See also:
- the #@overlay/replace action: https://carvel.dev/ytt/docs/v0.40.0/lang-ref-ytt-overlay/#overlayreplace
- Overlays, programmatically: https://carvel.dev/ytt/docs/v0.40.0/lang-ref-ytt-overlay/#programmatic-access
Hierarchical list of name value pairs in YAML
What's the best way to represent a hierarchical list of name value pairs like the following in YAML:

name_1: value_1
  subName1_1: subValue1_1
  subName1_2: subValue1_2
name_2: value_2
  subName2_1: subValue2_1
  subName2_2: subValue2_2
name_3: value_3
  subName3_1: subValue3_1
  subName3_2: subValue3_2
name_4: value_4
  subName4_1: subValue4_1
  subName4_2: subValue4_2

I am thinking of the following but not sure if this is the best way or not:

- name_1:
    ID: 1
    subNames:
    - subName1_1:
        ID: 1
    - subName1_2:
        ID: 2
- name_2:
    ID: 2
    subNames:
    - subName2_1:
        ID: 1
    - subName2_2:
        ID: 2

or I could also do:

- Name: Name_1
  ID: 1
  SubNames:
  - SubName: subName1_1
    ID: 1
  - SubName: subName1_2
    ID: 2
- Name: Name_2
  ID: 2
  SubNames:
  - SubName: subName2_1
    ID: 1
  - SubName: subName2_2
    ID: 2

I need the name_* to be unique, as well as their corresponding values, so I'd prefer something which Python can easily consume to validate there are no duplicates.
Well, there's the value key type. It's not part of the core standard and was defined for YAML 1.1, but it has been designed to solve exactly this problem. It suggests you basically have a key in your mapping named = which contains the default value:

name_1:
  =: value_1
  subName1_1: subValue1_1
  subName1_2: subValue1_2
name_2:
  =: value_2
  subName2_1: subValue2_1
  subName2_2: subValue2_2
name_3:
  =: value_3
  subName3_1: subValue3_1
  subName3_2: subValue3_2
name_4:
  =: value_4
  subName4_1: subValue4_1
  subName4_2: subValue4_2

Alternatively, you could make the values a list with single key-value pairs:

name_1:
- value_1
- subName1_1: subValue1_1
- subName1_2: subValue1_2
name_2:
- value_2
- subName2_1: subValue2_1
- subName2_2: subValue2_2
name_3:
- value_3
- subName3_1: subValue3_1
- subName3_2: subValue3_2
name_4:
- value_4
- subName4_1: subValue4_1
- subName4_2: subValue4_2

You can write this with flow sequences, since YAML allows flow sequences to contain single key-value pairs which will be interpreted as implicit mappings:

name_1: [value_1, subName1_1: subValue1_1, subName1_2: subValue1_2]
name_2: [value_2, subName2_1: subValue2_1, subName2_2: subValue2_2]
name_3: [value_3, subName3_1: subValue3_1, subName3_2: subValue3_2]
name_4: [value_4, subName4_1: subValue4_1, subName4_2: subValue4_2]

Be aware that when you do this, you can't have any kind of block-style nodes in the subnames, but other flow nodes will be fine.
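Not part of the original answer, but since the question mentions validating uniqueness from Python: a minimal sketch, assuming the "value key" layout above is stored in a file (the file name names.yaml is made up). Recent versions of ruamel.yaml reject duplicate mapping keys by default, and the default values can be checked after loading:

import sys
from ruamel.yaml import YAML

yaml = YAML()                      # round-trip loader; recent versions raise on duplicate keys
with open("names.yaml") as f:      # hypothetical file containing the mapping above
    doc = yaml.load(f)

# Collect the default value of every name_* entry (stored under the '=' key)
values = [entry["="] for entry in doc.values()]
if len(values) != len(set(values)):
    sys.exit("duplicate values found")
print("all names and values are unique")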
Job scheduling with minimization by parallel grouping
I have a job scheduling problem with a twist - a minimization constraint. The task: I have many jobs, each with various dependencies on other jobs, without cycles. These jobs have categories as well, and jobs of the same category can be run together in parallel for free. So, I want to order the jobs so that each job comes after its dependencies, but arranged in such a way that they are grouped by category (to run many in parallel) to minimize the number of serial jobs I run. That is, adjacent jobs of the same category count as a single serial job.

I know I can sort topologically to handle dependency ordering. I've tried using graph coloring on the subgraphs containing each category of jobs, but I run into problems with inter-category dependency conflicts - more specifically, when I have to decide which of two or more pairs of jobs to group. I can brute force this, and I can try random walks over the search space, but I'm hoping for something smarter: the former blows up exponentially in the worst case, and the latter is not guaranteed to be optimal.

To put things into scale: there can be as many as a couple hundred thousand jobs to schedule at once, with maybe a couple hundred categories of jobs. I've stumbled upon many optimizations, such as creating a dependency graph, splitting it into connected components, solving each subproblem independently, and merging. I also realize there is a lower bound given by the number of colors needed to color each category, but I'm not sure how to use that beyond an early exit condition.

Is there a better way to find an ordering of jobs that maximizes this "grouping" of jobs of a category, in order to minimize the total number of serial jobs?
Not sure if this is helpful, but instead of aiming for an algorithm, it is also possible to develop an optimization model and let a solver do the work. The idea of the Mixed Integer Programming model is to minimize the total makespan, i.e. the finish time of the latest job. This will automatically try to group jobs of the same category together (to allow parallel processing).

I created some random data for 50 jobs and 5 categories. The data set includes some due dates and some precedence constraints:

---- 28 SET j jobs
job1 , job2 , job3 , job4 , job5 , job6 , job7 , job8 , job9 , job10, job11, job12
job13, job14, job15, job16, job17, job18, job19, job20, job21, job22, job23, job24
job25, job26, job27, job28, job29, job30, job31, job32, job33, job34, job35, job36
job37, job38, job39, job40, job41, job42, job43, job44, job45, job46, job47, job48
job49, job50

---- 28 SET c category
cat1, cat2, cat3, cat4, cat5

---- 28 SET jc job-category mapping
cat1 cat2 cat3 cat4 cat5
job1 YES job2 YES job3 YES job4 YES job5 YES job6 YES job7 YES job8 YES job9 YES job10 YES
job11 YES job12 YES job13 YES job14 YES job15 YES job16 YES job17 YES job18 YES job19 YES job20 YES
job21 YES job22 YES job23 YES job24 YES job25 YES job26 YES job27 YES job28 YES job29 YES job30 YES
job31 YES job32 YES job33 YES job34 YES job35 YES job36 YES job37 YES job38 YES job39 YES job40 YES
job41 YES job42 YES job43 YES job44 YES job45 YES job46 YES job47 YES job48 YES job49 YES job50 YES

---- 28 PARAMETER length job duration
job1 11.611, job2 12.558, job3 11.274, job4 7.839, job5 5.864, job6 6.025, job7 11.413
job8 10.453, job9 5.315, job10 12.924, job11 5.728, job12 6.757, job13 10.256, job14 12.502
job15 6.781, job16 5.341, job17 10.851, job18 11.212, job19 8.894, job20 8.587, job21 7.430
job22 7.464, job23 6.305, job24 14.334, job25 8.799, job26 12.834, job27 8.000, job28 6.255
job29 12.489, job30 5.692, job31 7.020, job32 5.051, job33 7.696, job34 9.999, job35 6.513
job36 6.742, job37 8.306, job38 8.169, job39 8.221, job40 14.640, job41 14.936, job42 8.699
job43 8.729, job44 12.720, job45 8.967, job46 14.131, job47 6.196, job48 12.355, job49 5.554
job50 10.763

---- 28 SET before dependencies
job3 job9 job13 job21 job23 job27 job32 job41 job42
job1 YES   job3 YES   job4 YES   job8 YES   job9 YES YES   job12 YES   job14 YES   job21 YES   job26 YES   job31 YES
+ job43 job46 job48
job10 YES YES   job11 YES

---- 28 PARAMETER due some jobs have a due date
job16 50.756, job19 57.757, job20 58.797, job25 74.443, job29 65.605, job32 55.928, job50 58.012

This model (with this particular data set) solved in about 30 seconds (using Cplex). Of course it should be noted that, in general, these models can be difficult to solve to optimality.
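As a rough sketch of this kind of formulation (my own illustration, not the GAMS/Cplex model from this answer, using PuLP and a tiny made-up instance): precedences and due dates are ordinary linear constraints, jobs of different categories are kept from overlapping with big-M disjunctive constraints, and the makespan is minimized, so same-category jobs end up running side by side.

import itertools
import pulp

# Tiny made-up instance (not the 50-job data set above)
duration = {"j1": 3, "j2": 4, "j3": 2, "j4": 5}         # processing times
category = {"j1": "A", "j2": "A", "j3": "B", "j4": "B"}  # job categories
prec     = [("j1", "j3")]                                # j1 must finish before j3 starts
due      = {"j4": 12}                                    # optional due dates

M = sum(duration.values())  # big-M: no schedule needs to be longer than this

prob = pulp.LpProblem("category_batching", pulp.LpMinimize)
start = {j: pulp.LpVariable("start_" + j, lowBound=0) for j in duration}
makespan = pulp.LpVariable("makespan", lowBound=0)
prob += makespan  # objective: finish time of the latest job

for j in duration:
    prob += start[j] + duration[j] <= makespan
for b, a in prec:
    prob += start[b] + duration[b] <= start[a]
for j, d in due.items():
    prob += start[j] + duration[j] <= d

# Jobs of different categories may not overlap; same-category jobs may run in
# parallel, so grouping by category falls out of minimizing the makespan.
for i, j in itertools.combinations(duration, 2):
    if category[i] != category[j]:
        y = pulp.LpVariable("y_{}_{}".format(i, j), cat="Binary")
        prob += start[i] + duration[i] <= start[j] + M * (1 - y)
        prob += start[j] + duration[j] <= start[i] + M * y

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for j in sorted(duration):
    print(j, category[j], pulp.value(start[j]))
print("makespan:", pulp.value(makespan))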
Here is a CP Optimizer model which solves very quickly using the most recent 12.10 version (a couple of seconds). The model is quite natural, using precedence constraints and a "state function" to model the batching constraints (no two tasks from different categories can execute concurrently).

DURATION = [
    11611, 12558, 11274,  7839,  5864,  6025, 11413, 10453,  5315, 12924,
     5728,  6757, 10256, 12502,  6781,  5341, 10851, 11212,  8894,  8587,
     7430,  7464,  6305, 14334,  8799, 12834,  8000,  6255, 12489,  5692,
     7020,  5051,  7696,  9999,  6513,  6742,  8306,  8169,  8221, 14640,
    14936,  8699,  8729, 12720,  8967, 14131,  6196, 12355,  5554, 10763
]
CATEGORY = [
    1, 5, 3, 2, 2, 2, 2, 5, 1, 3,
    5, 3, 5, 4, 1, 4, 1, 2, 4, 3,
    2, 2, 1, 1, 3, 5, 2, 4, 4, 2,
    1, 3, 1, 5, 2, 2, 3, 4, 4, 3,
    3, 1, 2, 1, 2, 1, 4, 3, 4, 2
]
PREC = [
    (0, 2), (2, 8), (3, 12), (7, 26), (8, 20), (8, 22), (11, 22),
    (13, 40), (20, 26), (25, 41), (30, 31), (9, 45), (9, 47), (10, 42)
]
DEADLINE = [
    (15, 50756), (18, 57757), (19, 58797), (24, 74443),
    (28, 65605), (31, 55928), (49, 58012)
]
assert(len(CATEGORY) == len(DURATION))

# ===========================================================================

from docplex.cp.model import CpoModel

mdl = CpoModel()
TASKS = range(len(DURATION))

# Decision variables - interval variables with duration (length) and name
itv = [mdl.interval_var(length=DURATION[j], name="ITV_{}".format(j+1)) for j in TASKS]

# Deadlines - constrain the end of the interval.
for j, d in DEADLINE:
    mdl.add(mdl.end_of(itv[j]) <= d)

# Precedences - use end_before_start
for b, a in PREC:
    mdl.add(mdl.end_before_start(itv[b], itv[a]))

# Batching.  This uses a "state function", which is an unknown function of
# time which needs to be decided by CP Optimizer.  We say that this function
# must take the value of the category of the interval during the interval
# (using always_equal, meaning the state function is always equal to a value
# over the extent of the interval).  This means that only tasks of a particular
# category can execute at the same time.
r = mdl.state_function()
for j in TASKS:
    mdl.add(mdl.always_equal(r, itv[j], CATEGORY[j]))

# Objective.  Minimize the latest task end.
makespan = mdl.max(mdl.end_of(itv[j]) for j in TASKS)
mdl.add(mdl.minimize(makespan))

# Solve it, making sure we get the absolute optimum (0 tolerance)
# and limiting the log a bit.  's' contains the solution.
s = mdl.solve(OptimalityTolerance=0, LogVerbosity="Terse")

# Get the final makespan
sol_makespan = s.get_objective_values()[0]

# Print the solution by zone.
# s[X] gets the value of unknown X in the solution s.
# s[r] gets the value of the state function in the solution:
# a list of triples (start, end, value) representing the full extent
# of the state function over the whole time line.
zones = s[r]

# Iterate over the zones, ignoring the first and last ones, which
# are the zones before the first and after the last task.
for (start, end, value) in zones[1:-1]:
    print("Category is {} in window [{},{})".format(value, start, end))
    for j in TASKS:
        (istart, iend, ilength) = s[itv[j]]  # intervals are start/end/length
        if istart >= start and iend <= end:
            print("\t{} # {} -- {} --> {}".format(
                itv[j].get_name(), istart, ilength, iend))
For job scheduling I encourage you to have a look at CP Optimizer within CPLEX (introduction to CP Optimizer). A basic jobshop model will look like:

using CP;

int nbJobs = ...;
int nbMchs = ...;

range Jobs = 0..nbJobs-1;
range Mchs = 0..nbMchs-1; // Mchs is used both to index machines and operation position in job

tuple Operation {
  int mch; // Machine
  int pt;  // Processing time
};

Operation Ops[j in Jobs][m in Mchs] = ...;

dvar interval itvs[j in Jobs][o in Mchs] size Ops[j][o].pt;
dvar sequence mchs[m in Mchs] in
    all(j in Jobs, o in Mchs : Ops[j][o].mch == m) itvs[j][o];

minimize max(j in Jobs) endOf(itvs[j][nbMchs-1]);
subject to {
  forall (m in Mchs)
    noOverlap(mchs[m]);
  forall (j in Jobs, o in 0..nbMchs-2)
    endBeforeStart(itvs[j][o], itvs[j][o+1]);
}

as can be seen in the sched_jobshop example.
Edit yaml objects in array with yq. Speed up Terminalizer's terminal cast (record)
The goal: speed up a Terminalizer terminal cast (record).

I have a record of a terminal session created with Terminalizer. cast.yaml:

# The configurations that used for the recording, feel free to edit them
config:
  # do not touch it

# Records, feel free to edit them
records:
  - delay: 841
    content: "\e]1337;RemoteHost=kyb#kyb-nuc\a\e]1337;CurrentDir=/home/kyb/devel/git-rev-label\a\e]1337;ShellIntegrationVersion=7;shell=fish\a"
  - delay: 19
    content: "\e]1337;RemoteHost=kyb#kyb-nuc\a\e]1337;CurrentDir=/home/kyb/devel/git-rev-label\a\e]0;fish /home/kyb/devel/git-rev-label\a\e[30m\e(B\e[m"
  - delay: 6
    content: "\e[?2004h"
  - delay: 28
    content: "\e]0;fish /home/kyb/devel/git-rev-label\a\e[30m\e(B\e[m\e[2mā\e(B\e[m \rā \r\e[K\e]133;D;0\a\e]133;A\a\e[44m\e[30m ~/d/git-rev-label \e[42m\e[34mī° \e[42m\e[30mī demo \e[30m\e(B\e[m\e[32mī° \e[30m\e(B\e[m\e]133;B\a\e[K"
  - delay: 1202
    content: "#\b\e[38;2;231;197;71m#\e[30m\e(B\e[m"
  - delay: 134
    content: "\e[38;2;231;197;71m#\e[30m\e(B\e[m"
  - delay: 489
    content: "\e[38;2;231;197;71m \e[30m\e(B\e[m"
  - delay: 318

I want to speed up playback without passing --speed-factor to terminalizer play. To do so, the delays should be decreased, so I need a yq expression that lowers them, something like

.records.delay = .records.delay / 3

but this expression won't work. Please help me write a proper one.
.records is an array, so you could use this filter:

.records |= map(.delay /= 3)

Or you might prefer:

.records[].delay |= (. /= 3)
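Not part of the original answer, but if a small Python script is an option instead of yq, the same edit can be sketched with ruamel.yaml (file name cast.yaml as in the question; preserve_quotes keeps the escaped content strings quoted):

from ruamel.yaml import YAML

yaml = YAML()
yaml.preserve_quotes = True            # keep the quoted escape-sequence strings as they are

with open("cast.yaml") as f:           # file name from the question
    cast = yaml.load(f)

for record in cast["records"]:
    record["delay"] = record["delay"] // 3   # integer delays; use / 3 for float delays

with open("cast.yaml", "w") as f:
    yaml.dump(cast, f)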
ruamel - block_seq_indent and indent
I have been struggling to get my YAML file correctly indented after using yaml.round_trip_dump. I am trying to figure out what the difference is between block_seq_indent and indent, and I couldn't really find anything useful in the documentation either.
indent is the normal indent that ruamel.yaml inherited from PyYAML. It affects both mapping keys and sequence elements. For sequences, that means it doesn't affect the dash ('-') before a sequence element. So if you run:

import sys
import ruamel.yaml

d = dict(a=1, b=[1, 2, {3: [3.1, 3.2, 3.3]}], c=dict(d=1, e=2))
ruamel.yaml.safe_dump(d, sys.stdout, default_flow_style=False, explicit_start=True)
ruamel.yaml.safe_dump(d, sys.stdout, default_flow_style=False, indent=4, explicit_start=True)

the output will be:

---
a: 1
b:
- 1
- 2
- 3:
  - 3.1
  - 3.2
  - 3.3
c:
  d: 1
  e: 2
---
a: 1
b:
- 1
- 2
- 3:
    - 3.1
    - 3.2
    - 3.3
c:
    d: 1
    e: 2

If you also provide block_seq_indent you can do:

ruamel.yaml.safe_dump(d, sys.stdout, default_flow_style=False, indent=4, block_seq_indent=3, explicit_start=True)

to get:

---
a: 1
b:
   - 1
   - 2
   - 3:
       - 3.1
       - 3.2
       - 3.3
c:
    d: 1
    e: 2

To have even more control you should use the new ruamel.yaml API, where you can do:

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=3, sequence=5, offset=2)
yaml.explicit_start = True
yaml.dump(d, sys.stdout)

to get:

---
a: 1
b:
  -  1
  -  2
  -  3:
       -  3.1
       -  3.2
       -  3.3
c:
   d: 1
   e: 2

i.e. you can use offset to position the dash within the spaces that form the indent of the sequence elements. This is documented here.