Consider the following device-tree overlay example.
The fragments are numbered 0, 1, 2.
Do the numbers matter?
Do they have to be in ascending order?
Or would 0, 2, 1 also work?
Where is it specified?
/dts-v1/;
/plugin/;
/ {
fragment@0 {
target = <&foo>;
__overlay__ {
...
};
};
fragment@1 {
target = <&bar>;
__overlay__ {
...
};
};
fragment@2 {
target = <&baz>;
__overlay__ {
...
};
};
};
Those numbers (and names) don't matter. Take a look at the following functions in drivers/of/overlay.c:
of_overlay_create() -> of_build_overlay_info() -> of_fill_overlay_info() -> find_target_node()
As you can see, the code just iterates over the tree (using for_each_child_of_node()) and obtains the node of interest by its "__overlay__" name, like this:
ovinfo->overlay = of_get_child_by_name(info_node, "__overlay__");
So those fragments are just nodes, and their names don't matter. The only thing actually used is the content of those nodes.
I would even suppose that you can completely omit those @0, @1, @2 suffixes. Take a look at the Device Tree specification (section 2.2.1, Node Names):
Each node in the device tree is named according to the following convention:
node-name@unit-address
The unit-address component of the name is specific to the bus type on which the node sits. It consists
of one or more ASCII characters from the set of characters in Table 2-1. The unit-address must
match the first address specified in the reg property of the node. If the node has no reg property, the
@ and unit-address must be omitted and the node-name alone differentiates the node from other nodes
at the same level in the tree. The binding for a particular bus may specify additional, more specific
requirements for the format of reg and the unit-address.
Of course, there can be some tricks in the code that parses the device tree file, like this one in drivers/of/fdt.c, unflatten_dt_node():
if ((*p1) == '@')
But I really doubt that the number after '@' means anything (in your case).
I have some extremely old legacy procedural code which takes 10 or so enumerated inputs [ i0, i1, i2, ... i9 ] and generates 170-odd enumerated outputs [ r0, r1, ... r168, r169 ]. By enumerated, I mean that each individual input and output has its own distinct set of values, e.g. [ red, green, yellow ] or [ yes, no ] etc.
I’m putting together the entire state table using the existing code, and instead of puzzling through them by hand, I was wondering if there was an algorithmic way of determining an appropriate function to get to each result from the 10 inputs. Note, not all input columns may be required to determine an individual output column, i.e. r124 might only be dependent on i5, i6 and i9.
These are not continuous functions, and I expect I might end up with some sort of hashing function approach, but I wondered if anyone knew of a more repeatable process I should be using instead? (If only there was some Karnaugh map like approach for multiple value non-binary functions ;-) )
If you are willing to actually enumerate all possible input/output sequences, here is a theoretical approach to tackle this that should be fairly effective.
First, consider the entropy of the output. Suppose that you have n possible input sequences, and x[i] is the number of ways to get i as an output. Let p[i] = float(x[i])/float(n), and then the entropy is -sum(p[i] * log(p[i]) for i in outputs). (Note: since p[i] < 1, log(p[i]) is a negative number, so the entropy is positive. Also note: if p[i] = 0, then we take p[i] * log(p[i]) to be zero.)
The amount of entropy can be thought of as the amount of information needed to predict the outcome.
Now here is the key question. What variable gives us the most information about the output per information about the input?
If a particular variable v has in[v] possible values, the amount of information in specifying v is log(float(in[v])). I already described how to calculate the entropy of the entire set of outputs. For each possible value of v we can calculate the entropy of the entire set of outputs for that value of v. The amount of information given by knowing v is the entropy of the total set minus the average of the entropies for the individual values of v.
Pick the variable v which gives you the best ratio of information_gained_from_v/information_to_specify_v. Your algorithm will start with a switch on the set of values of that variable.
Then for each value, you repeat this process to get cascading nested if conditions.
This will generally lead to a fairly compact set of cascading nested if conditions that will focus on the input variables that tell you as much as possible, as quickly as possible, with as few branches as you can manage.
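The selection step can be sketched in Python. This is a minimal, hypothetical illustration, not the answerer's code: the names `rows` and `best_variable` are my own, and I use the size-weighted average of the per-value entropies, which the text leaves implicit:

```python
import math
from collections import Counter, defaultdict

def entropy(outputs):
    # outputs: a list of output values; H = -sum p * log2(p)
    n = len(outputs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(outputs).values())

def best_variable(rows, variables):
    # rows: list of (inputs: dict, output) pairs, assumed fully enumerated
    base = entropy([out for _, out in rows])
    best, best_ratio = None, -1.0
    for v in variables:
        groups = defaultdict(list)
        for inputs, out in rows:
            groups[inputs[v]].append(out)
        # entropy left after learning v: average over its values, weighted by group size
        avg = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        cost = math.log2(len(groups))  # information needed to specify v
        if cost == 0:
            continue  # v is constant in this sample: nothing to switch on
        ratio = (base - avg) / cost
        if ratio > best_ratio:
            best, best_ratio = v, ratio
    return best
```

Switching on the returned variable and recursing on each branch gives the cascading nested conditions described above.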
Now this assumed that you had a comprehensive enumeration. But what if you don't?
The answer is that the analysis I described can be done on a random sample of your possible set of inputs. So if you run your code with, say, 10,000 random inputs, you'll come up with fairly good entropies for your first level. Repeat with 10,000 inputs for each of your branches at the second level, and the same will happen. Continue as long as it is computationally feasible.
If there are good patterns to find, you will quickly find a lot of patterns of the form, "If you put in this that and the other, here is the output you always get." If there is a reasonably short set of nested ifs that give the right output, you're probably going to find it. After that, you have the question of deciding whether to actually verify by hand that each bucket is reliable, or to trust that if you couldn't find any exceptions with 10,000 random inputs, then there are none to be found.
Tricky approach for the validation. If you can find fuzzing software written for your language, run the fuzzing software with the goal of trying to tease out every possible internal execution path for each bucket you find. If the fuzzing software decides that you can't get different answers than the one you think is best from the above approach, then you can probably trust it.
The algorithm is pretty straightforward. Given the possible values for each input, we can generate all possible input vectors. Then, for each output, we can eliminate the inputs that do not matter for that output. As a result, for each output we get a matrix showing the output values for all input combinations, excluding the inputs that do not matter for the given output.
Sample input format (for the code snippet below):
var schema = new ConvertionSchema()
{
InputPossibleValues = new object[][]
{
new object[] { 1, 2, 3, }, // input #0
new object[] { 'a', 'b', 'c' }, // input #1
new object[] { "foo", "bar" }, // input #2
},
Converters = new System.Func<object[], object>[]
{
input => input[0], // output #0
input => (int)input[0] + (int)(char)input[1], // output #1
input => (string)input[2] == "foo" ? 1 : 42, // output #2
input => input[2].ToString() + input[1].ToString(), // output #3
input => (int)input[0] % 2, // output #4
}
};
The heart of the backward conversion is below. Full code, in the form of a LINQPad snippet, is here: http://share.linqpad.net/cknrte.linq.
public void Reverse(ConvertionSchema schema)
{
// generate all possible input vectors and record the result for each case
// then for each output we can figure out which inputs matter
object[][] inputs = schema.GenerateInputVectors();
// reversal path
for (int outputIdx = 0; outputIdx < schema.OutputsCount; outputIdx++)
{
List<int> inputsThatDoNotMatter = new List<int>();
for (int inputIdx = 0; inputIdx < schema.InputsCount; inputIdx++)
{
// group input vectors that agree on all inputs except the current one;
// if the outputs within every such group are identical, the current input
// does not matter for the given output
bool inputMatters = inputs.GroupBy(input => ExcudeByIndexes(input, new[] { inputIdx }), input => schema.Convert(input)[outputIdx], ObjectsByValuesComparer.Instance)
.Where(x => x.Distinct().Count() > 1)
.Any();
if (!inputMatters)
{
inputsThatDoNotMatter.Add(inputIdx);
Util.Metatext($"Input #{inputIdx} does not matter for output #{outputIdx}").Dump();
}
}
// mapping table (only the inputs that matter)
var mapping = new List<dynamic>();
foreach (var inputGroup in inputs.GroupBy(input => ExcudeByIndexes(input, inputsThatDoNotMatter), ObjectsByValuesComparer.Instance))
{
dynamic record = new ExpandoObject();
object[] sampleInput = inputGroup.First();
object output = schema.Convert(sampleInput)[outputIdx];
for (int inputIdx = 0; inputIdx < schema.InputsCount; inputIdx++)
{
if (inputsThatDoNotMatter.Contains(inputIdx))
continue;
AddProperty(record, $"Input #{inputIdx}", sampleInput[inputIdx]);
}
AddProperty(record, $"Output #{outputIdx}", output);
mapping.Add(record);
}
// records of the form (input x, ..., input y, output z) are needed
mapping.Dump();
}
}
I need to be able to output all the ranges of IP addresses that are not in a given list of IP address ranges.
Is there some sort of algorithm I can use for this kind of task that I can turn into working code?
Basically I will be using Salesforce Apex code, so an example in any Java-like language will do.
I think the key to an easy solution is to remember that IP addresses can be treated as numbers of type long, and so they can be sorted.
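To illustrate the observation, the dotted-quad-to-long mapping is just base-256 positional notation. A quick sketch (the function names are my own):

```python
def ip_to_long(ip):
    # "10.0.1.255" -> 167772671; each octet is one base-256 digit
    a, b, c, d = (int(t) for t in ip.split("."))
    return ((a * 256 + b) * 256 + c) * 256 + d

def long_to_ip(n):
    # inverse mapping: shift out each octet, most significant first
    return ".".join(str((n >> s) & 255) for s in (24, 16, 8, 0))
```

Once addresses are longs, range arithmetic like "the address just after this range" is simply +1.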
I assumed the excluded ranges are given in a "nice" way, meaning no overlaps, no partial overlaps with the global range, and so on. You can of course add such input checks later on.
In this example I'll treat all network ranges (global, included, excluded) as instances of a NetworkRange class.
Following is the implementation of NetworkRange. Pay attention to the methods splitByExcludedRange and includes.
public class NetworkRange {
private long startAddress;
private long endAddress;
public NetworkRange(String start, String end) {
startAddress = addressRepresentationToAddress(start);
endAddress = addressRepresentationToAddress(end);
}
public NetworkRange(long start, long end) {
startAddress = start;
endAddress = end;
}
public String getStartAddress() {
return addressToAddressRepresentation(startAddress);
}
public String getEndAddress() {
return addressToAddressRepresentation(endAddress);
}
static String addressToAddressRepresentation(long address) {
String result = String.valueOf(address % 256);
for (int i = 1; i < 4; i++) {
address = address / 256;
result = String.valueOf(address % 256) + "." + result;
}
return result;
}
static long addressRepresentationToAddress(String addressRep) {
long result = 0L;
String[] tokens = addressRep.split("\\.");
for (int i = 0; i < 4; i++) {
result += Math.pow(256, i) * Long.parseLong(tokens[3-i]);
}
return result;
}
public List<NetworkRange> splitByExcludedRange(NetworkRange excludedRange) {
if (this.startAddress == excludedRange.startAddress && this.endAddress == excludedRange.endAddress)
return Arrays.asList();
if (this.startAddress == excludedRange.startAddress)
return Arrays.asList(new NetworkRange(excludedRange.endAddress+1, this.endAddress));
if (this.endAddress == excludedRange.endAddress)
return Arrays.asList(new NetworkRange(this.startAddress, excludedRange.startAddress-1));
return Arrays.asList(new NetworkRange(this.startAddress, excludedRange.startAddress-1),
new NetworkRange(excludedRange.endAddress+1, this.endAddress));
}
public boolean includes(NetworkRange excludedRange) {
return this.startAddress <= excludedRange.startAddress && this.endAddress >= excludedRange.endAddress;
}
public String toString() {
return "[" + getStartAddress() + "-" + getEndAddress() + "]";
}
}
Now comes the class that computes the network ranges that remain included. It accepts the global range in its constructor.
public class RangeProducer {
private NetworkRange global;
public RangeProducer(NetworkRange global) {
this.global = global;
}
public List<NetworkRange> computeEffectiveRanges(List<NetworkRange> excludedRanges) {
List<NetworkRange> effectiveRanges = new ArrayList<>();
effectiveRanges.add(global);
List<NetworkRange> effectiveRangesSplitted = new ArrayList<>();
for (NetworkRange excludedRange : excludedRanges) {
for (NetworkRange effectiveRange : effectiveRanges) {
if (effectiveRange.includes(excludedRange)) {
effectiveRangesSplitted.addAll(effectiveRange.splitByExcludedRange(excludedRange));
} else {
effectiveRangesSplitted.add(effectiveRange);
}
}
effectiveRanges = effectiveRangesSplitted;
effectiveRangesSplitted = new ArrayList<>();
}
return effectiveRanges;
}
}
You can run the following example:
public static void main(String[] args) {
NetworkRange global = new NetworkRange("10.0.0.0", "10.255.255.255");
NetworkRange ex1 = new NetworkRange("10.0.0.0", "10.0.1.255");
NetworkRange ex2 = new NetworkRange("10.1.0.0", "10.1.1.255");
NetworkRange ex3 = new NetworkRange("10.6.1.0", "10.6.2.255");
List<NetworkRange> excluded = Arrays.asList(ex1, ex2, ex3);
RangeProducer producer = new RangeProducer(global);
for (NetworkRange effective : producer.computeEffectiveRanges(excluded)) {
System.out.println(effective);
}
}
Output should be:
[10.0.2.0-10.0.255.255]
[10.1.2.0-10.6.0.255]
[10.6.3.0-10.255.255.255]
First, I assume you mean that you get one or more disjoint CIDR ranges as input, and need to produce the list of all CIDR ranges not including any of the ones given as input. For convenience, let's further assume that the input does not include the entire IP address space: i.e. 0.0.0.0/0. (That can be accommodated with a single special case but is not of much interest.)
I've written code analogous to this before and, though I'm not at liberty to share the code, I can describe the methodology. It's essentially a binary search algorithm wherein you bisect the full address space repeatedly until you've isolated the one range you're interested in.
Think of the IP address space as a binary tree: At the root is the full IPv4 address space 0.0.0.0/0. Its children each represent half of the address space: 0.0.0.0/1 and 128.0.0.0/1. Those, in turn, can be sub-divided to create children 0.0.0.0/2 / 64.0.0.0/2 and 128.0.0.0/2 / 192.0.0.0/2, respectively. Continue this all the way down and you end up with 2**32 leaves, each of which represents a single /32 (i.e. a single address).
Now, consider this tree to be the parts of the address space that are excluded from your input list. So your task is to traverse this tree, find each range from your input list in the tree, and cut out all parts of the tree that are in your input, leaving the remaining parts of the address space.
Fortunately, you needn't actually create all 2**32 leaves. Each node at CIDR N can be assumed to include all nodes at CIDR N+1 and deeper if no children have been created for it (you'll need a flag to remember that it has already been subdivided -- i.e. is no longer a leaf -- see below for why).
So, to start, the entire address space is present in the tree, but can all be represented by a single leaf node. Call the tree excluded, and initialize it with the single node 0.0.0.0/0.
Now, take the first input range to consider -- we'll call this trial (I'll use 14.27.34.0/24 as the initial trial value just to provide a concrete value for demonstration). The task is to remove trial from excluded leaving the rest of the address space.
Start with current node pointer set to the excluded root node.
Start:
Compare the trial CIDR with current. If it is identical, you're done (but this should never happen if your input ranges are disjoint and you've excluded 0.0.0.0/0 from input).
Otherwise, if current is a leaf node (has not been subdivided, meaning it represents the entire address space at this CIDR level and below), set its sub-divided flag, and create two children for it: a left pointer to the first half of its address space, and a right pointer to the latter half. Label each of these appropriately (for the root node's children, that will be 0.0.0.0/1 and 128.0.0.0/1).
Determine whether the trial CIDR falls within the left side or the right side of current. For our initial trial value, it's to the left. Now, if the pointer on that side is already NULL, again you're done (though again that "can't happen" if your input ranges are disjoint).
If the trial CIDR is exactly equivalent to the CIDR in the node on that side, then simply free the node (and any children it might have, which again should be none if you have only disjoint inputs), set the pointer to that side NULL and you're done. You've just excluded that entire range by cutting that leaf out of the tree.
If the trial value is not exactly equivalent to the CIDR in the node on that side, set current to that side and start over (i.e. jump to Start label above).
So, with the initial input range of 14.27.34.0/24, you will first split 0.0.0.0/0 into 0.0.0.0/1 and 128.0.0.0/1. You will then drop down on the left side and split 0.0.0.0/1 into 0.0.0.0/2 and 64.0.0.0/2. You will then drop down to the left again to create 0.0.0.0/3 and 32.0.0.0/3. And so forth, until after 23 splits, you will then split 14.27.34.0/23 into 14.27.34.0/24 and 14.27.35.0/24. You will then delete the left-hand 14.27.34.0/24 child node and set its pointer to NULL, leaving the other.
That will leave you with a sparse tree containing 24 leaf nodes (after you dropped the target one). The remaining leaf nodes are marked with *:
(ROOT)
0.0.0.0/0
/ \
0.0.0.0/1 128.0.0.0/1*
/ \
0.0.0.0/2 64.0.0.0/2*
/ \
0.0.0.0/3 32.0.0.0/3*
/ \
0.0.0.0/4 16.0.0.0/4*
/ \
*0.0.0.0/5 8.0.0.0/5
/ \
*8.0.0.0/6 12.0.0.0/6
/ \
*12.0.0.0/7 14.0.0.0/7
/ \
14.0.0.0/8 15.0.0.0/8*
/ \
...
/ \
*14.27.32.0/23 14.27.34.0/23
/ \
(null) 14.27.35.0/24*
(14.27.34.0/24)
For each remaining input range, you will run through the tree again, bisecting leaf nodes when necessary, often resulting in more leaves, but always cutting out some part of the address space.
At the end, you simply traverse the resulting tree in whatever order is convenient, collecting the CIDRs of the remaining leaves. Note that in this phase you must exclude those that have previously been subdivided. Consider for example, in the above tree, if you next processed input range 14.27.35.0/24, you would leave 14.27.34.0/23 with no children, but both its halves have been separately cut out and it should not be included in the output. (With some additional complication, you could of course collapse nodes above it to accommodate that scenario as well, but it's easier to just keep a flag in each node.)
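For what it's worth, Python's standard ipaddress module implements exactly this bisection in ip_network.address_exclude(), so the whole procedure can be sketched in a few lines. This is a rough illustration of the algorithm, not the answerer's code, and it assumes the excluded ranges are disjoint CIDR blocks:

```python
import ipaddress

def complement(excluded, universe="0.0.0.0/0"):
    # start with the whole address space as a single "leaf"
    remaining = [ipaddress.ip_network(universe)]
    for ex in map(ipaddress.ip_network, excluded):
        nxt = []
        for net in remaining:
            if ex.subnet_of(net):
                # bisect net repeatedly and drop the half containing ex
                nxt.extend(net.address_exclude(ex))
            else:
                nxt.append(net)
        remaining = nxt
    return sorted(remaining)
```

Excluding the 14.27.34.0/24 example from the full space yields the 24 sibling networks marked with * in the tree above.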
First, what you describe can be simplified to:
you have intervals of the form x.x.x.x - y.y.y.y
you want to output the intervals that are not yet "taken" in this range.
you want to be able to add or remove intervals efficiently
I would suggest the use of an interval tree, where each node stores an interval, and you can efficiently insert and remove nodes; and query for overlaps at a given point (= IP address).
If you can guarantee that there will be no overlaps, you can instead use a simple TreeSet<String>, where you must however guarantee (for correct sorting) that all strings use the xxx.xxx.xxx.xxx-yyy.yyy.yyy.yyy zero-padded format.
Once your intervals are in a tree, you can then generate your desired output, assuming that no intervals overlap, by performing a depth-first pre-order traversal of your tree, and storing the starts and ends of each visited node in a list. Given this list,
pre-pend 0.0.0.0 at the start
append 255.255.255.255 at the end
remove all duplicate IPs (which will necessarily be right next to each other in the list)
take them by pairs (the number will always be even), and there you have the intervals of free IPs, perfectly sorted.
Note that 0.0.0.0 and 255.255.255.255 are not actually valid, routable IPs. You should read the relevant RFCs if you really need to output real-world-aware IPs.
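Under the same no-overlap assumption, the endpoint walk above can be sketched directly on integer addresses. Note this version emits e+1 and s-1 explicitly, since the inclusive boundaries of the taken ranges should not reappear in the free ranges (the pairing trick above glosses over that off-by-one):

```python
def free_ranges(taken, lo=0, hi=2**32 - 1):
    # taken: disjoint inclusive (start, end) pairs as integers (e.g. IPs as longs)
    free = []
    cur = lo
    for s, e in sorted(taken):
        if s > cur:
            free.append((cur, s - 1))  # the gap before this taken range
        cur = e + 1
    if cur <= hi:
        free.append((cur, hi))         # the tail after the last taken range
    return free
```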
Recently I encountered an interview question. I was required to write code for expression evaluation. The expression format looks like this:
B=10;
A={
A=100;
B=BDE;
C=C;
D={
A=windows;
B=mac;
C={
A=redhat;
B=ubuntu;
};
};
A+={
A=200;
E=1000;
};
To represent the key of the expression, a period-delimited notation is used. For example, A.B represents the element B in map A, and the value of A.B is BDE; similarly, the value of A.D.C.A is redhat. Such a string is called a 'path expression'.
The configuration also supports append and override operations. For the above example, we use the += operation to append to or override the values in map A. Now the value of A.A is 200, and the value of A.E is 1000.
Now, given a configuration string and a key path, I was required to return the corresponding configuration value.
Rules
1) the key names and values contain only letters (A-Z, a-z) and digits (0-9), no other characters;
2) if the value cannot be found, or the expression points to a map, output "N/A";
3) if the value is found, output it, with no spaces in the output.
Input and Output
There are three parts in the input. The first line contains two integers indicating the number of configuration lines (M) and the number of expressions (N), with M<=100 and N<=100. The following M lines are the configuration and the last N lines are the expressions. Every configuration line contains one or more configurations. Every line is less than 1000 characters long.
Input :
2 2
A={A=1;B=2;C=3;E={A=100;};};
A+={D=4;E={B=10;C=D;};};
A.E.B
B.D.E
Output
A.E.B=10
B.D.E=N/A
My thoughts
I was thinking about using an N-ary tree to represent the expression. For example, the expression A = {A = 1;D = 1;B = {C = 1;D = {D = 1;F = 2;};};}; can be represented as:
(A,-)
/ | \
(A,1) (D,1) (B,-)
/ \
(C,1) (D,-)
/ \
(D,1) (F,2)
Since an N-ary tree can be represented as a binary tree, all append or search operations become insert or search operations on a binary tree. It seems that this approach works, but I am wondering if there is a better way to approach this problem?
I am thinking about putting all children in a hash map (since that's what interviewers like):
class Node {
    String val;
    HashMap<String, Node> children;

    Node(String val) {
        this.val = val;
        children = new HashMap<String, Node>();
    }
}
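To make the nested-hash-map idea concrete, here is a rough sketch of the whole exercise in Python, using nested dicts as the nodes. It assumes += performs a shallow merge at the level where it appears (the problem's examples don't pin down deeper merge semantics), and the function names are mine:

```python
def parse_block(s, i=0):
    # parse "key=value;" / "key={...};" / "key+={...};" entries until '}' or end
    d = {}
    while i < len(s) and s[i] != '}':
        j = s.index('=', i)
        key, append = s[i:j], False
        if key.endswith('+'):               # "key+=" means append/override
            key, append = key[:-1], True
        i = j + 1
        if s[i] == '{':
            sub, i = parse_block(s, i + 1)
            i += 1                          # skip '}'
            if append and isinstance(d.get(key), dict):
                d[key].update(sub)          # shallow merge into the existing map
            else:
                d[key] = sub
        else:
            j = s.index(';', i)
            d[key] = s[i:j]
            i = j
        i += 1                              # skip ';'
    return d, i

def lookup(config, path):
    # config: the raw configuration text; path: a path expression like "A.E.B"
    root, _ = parse_block("".join(config.split()))
    node = root
    for part in path.split('.'):
        if not isinstance(node, dict) or part not in node:
            return "N/A"
        node = node[part]
    return node if isinstance(node, str) else "N/A"
```

On the sample input this yields A.E.B=10 and B.D.E=N/A, matching the expected output.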
I've implemented a basic prefix tree or "trie". The trie consists of nodes like this:
// pseudo-code
struct node {
char c;
collection<node> childnodes;
};
Say I add the following words to my trie: "Apple", "Ark" and "Cat". Now when I look-up prefixes like "Ap" and "Ca" my trie's "bool containsPrefix(string prefix)" method will correctly return true.
Now I'm implementing the method "bool containsWholeWord(string word)" that will return true for "Cat" and "Ark" but false for "App" (in the above example).
Is it common for nodes in a trie to have some sort of "endOfWord" flag? This would help determine if the string being looked-up was actually a whole word entered into the trie and not just a prefix.
Cheers!
The end of the key is usually indicated via a leaf node. Either:
the child nodes are empty; or
you have a branch, with one prefix of the key, and some child nodes.
Your design doesn't have a leaf/empty node. Try indicating it with e.g. a null.
If you need to store both "App" and "Apple", but not "Appl", then yes, you need something like an endOfWord flag.
Alternatively, you could fit it into your design by (sometimes) having two nodes with the same character. So "Ap" would have two child nodes: the leaf node "p" and an internal node "p" with a child "l".
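A minimal sketch of the flag-based variant (the names are my own; this is the common textbook layout rather than your exact node struct):

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # char -> TrieNode
        self.end_of_word = False  # True iff a whole word ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.end_of_word = True

    def _walk(self, s):
        # follow s character by character; None if the path doesn't exist
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains_prefix(self, prefix):
        return self._walk(prefix) is not None

    def contains_whole_word(self, word):
        node = self._walk(word)
        return node is not None and node.end_of_word
```

With "Apple", "Ark", and "Cat" inserted, "Ap" is a prefix but not a whole word, while "Cat" is both.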
What is the algorithm - seemingly in use on domain parking pages - that takes a spaceless bunch of words (eg "thecarrotofcuriosity") and more-or-less correctly breaks it down into the constituent words (eg "the carrot of curiosity") ?
Start with a basic Trie data structure representing your dictionary. As you iterate through the characters of the string, search your way through the trie with a set of pointers rather than a single pointer; the set is seeded with the root of the trie. For each letter, the whole set is advanced at once via the pointer indicated by the letter, and if a set element cannot be advanced by the letter, it is removed from the set. Whenever you reach a possible end-of-word, add a new root-of-trie to the set (keeping track of the list of words seen associated with that set element). Finally, once all characters have been processed, return an arbitrary list of words which is at the root-of-trie. If there's more than one, that means the string could be broken up in multiple ways (such as "therapistforum", which can be parsed as ["therapist", "forum"] or ["the", "rapist", "forum"]) and it's undefined which we'll return.
Or, in a wacked up pseudocode (Java foreach, tuple indicated with parens, set indicated with braces, cons using head :: tail, [] is the empty list):
List<String> breakUp(String str, Trie root) {
Set<(List<String>, Trie)> set = {([], root)};
for (char c : str) {
Set<(List<String>, Trie)> newSet = {};
for (List<String> ls, Trie t : set) {
Trie tNext = t.follow(c);
if (tNext != null) {
newSet.add((ls, tNext));
if (tNext.isWord()) {
newSet.add((tNext.getWord() :: ls, root));
}
}
}
set = newSet;
}
for (List<String> ls, Trie t : set) {
if (t == root) return ls;
}
return null;
}
Let me know if I need to clarify or I missed something...
I would imagine they take a dictionary word list like /usr/share/dict/words on your common or garden variety Unix system and try to find sets of word matches (starting from the left?) that result in the largest amount of original text being covered by a match. A simple breadth-first-search implementation would probably work fine, since it obviously doesn't have to run fast.
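A breadth-first search over split positions might look like the following sketch. This is my own illustration under the stated assumptions (a plain word list, whole-string coverage required); note BFS returns a segmentation with the fewest words, which is one reasonable reading of maximizing coverage:

```python
from collections import deque

def break_up(s, words):
    # BFS over positions in s: state = index, edges = dictionary words
    prev = {0: (None, None)}          # position -> (previous position, word used)
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == len(s):
            # walk back through prev to recover the segmentation
            out = []
            while i:
                i, w = prev[i]
                out.append(w)
            return out[::-1]
        for w in words:
            j = i + len(w)
            if s.startswith(w, i) and j not in prev:
                prev[j] = (i, w)
                queue.append(j)
    return None                       # no full segmentation exists
```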
I'd imagine these sites do it something like this:
Get a list of words for your target language
Remove "useless" words like "a", "the", ...
Run through the list and check which of the words are substrings of the domain name
Take the most common words of the remaining list (Or the ones with the highest adsense rating,...)
Of course that leads to nonsense for expertsexchange, but what else would you expect there...
(disclaimer: I did not try it myself, so take it merely as food for experimentation. The 4-grams are taken mostly out of the blue sky, just from my experience that 3-grams won't work all too well; 5-grams and more might work better, even though you will have to deal with a pretty large table). It's also simplistic in the sense that it does not take into account the end of the string; if it works for you otherwise, you'd probably need to think about fixing the endings.
This algorithm would run in a predictable time proportional to the length of the string that you are trying to split.
So, first: take a lot of human-readable text. For each text, supposing it is in a single string str, run the following algorithm (pseudocode-ish notation; assume [] is hashtable-like indexing, and that nonexistent indexes return 0):
for(i=0;i<length(str)-5;i++) {
// take 4-character substring starting at position i
subs2 = substring(str, i, 4);
if(has_space(subs2)) {
subs = substring(str, i, 5);
delete_space(subs);
yes_space[subs][position(space, subs2)]++;
} else {
subs = subs2;
no_space[subs]++;
}
}
This will build you the tables which will help to decide whether a given 4-gram would need to have a space in it inserted or not.
Then take the string you want to split, which I'll denote xstr, and do:
for(i=0;i<length(xstr)-5;i++) {
subs = substring(xstr, i, 4);
for(j=0;j<4;j++) {
do_insert_space_here[i+j] -= no_space[subs];
}
for(j=0;j<4;j++) {
do_insert_space_here[i+j] += yes_space[subs][j];
}
}
Then you can walk the "do_insert_space_here[]" array - if an element at a given position is bigger than 0, then you should insert a space in that position in the original string. If it's less than zero, then you shouldn't.
Please drop a note here if you try it (or something of this sort) and it works (or does not work) for you :-)