What is the correct mapping of inverse bind matrices? - animation

A glTF file defines, for each skin, an array of inverse bind matrices and a list of joints.
Let's assume the list of joints is:
{2,5,7,4,3,6}
If we shift it so that the indices start at 0 we get
{0,3,5,2,1,4}
In this scenario, inverse_matrix[2] can refer to one of two things: either the joint at index 2 of the joints array (i.e. node 7), or node 2 itself.
This exact same question applies to the weights array.
Put a different way: if one takes the data as-is from the glTF file and loads the buffers into a shader, I need to figure out how to map a vertex index in the shader to its corresponding four skin matrices.
So if joints[2] maps to (3,1,0,0)
I need to know whether I am supposed to be fetching the inverse bind matrices at ibm[3] and ibm[1] or ibm[2] and ibm[3] (Since the value at joints[3] is 2).
I hope this is not confusing.

glTF 2.0 specification § Skins says (emphasis mine)
Each skin is defined by the inverseBindMatrices property (which points to an accessor with IBM data), used to bring coordinates being skinned into the same space as each joint; and a joints array property that lists the nodes indices used as joints to animate the skin. The order of joints is defined in the skin.joints array and it must match the order of inverseBindMatrices data.
That directly answers your question. I'll give a more detailed explanation here:
There are two IDs: the node ID and the joint ID.
Node ID: the index of a node in the nodes array.
Joint ID: the index of a joint in the joints array.
A joint in the joints array is specified by its node ID; the joint ID is a different number.
"nodes" : [
    { // 0
        "name" : "Cave"
    },
    { // 1
        "name" : "TailBone"
    },
    // snipped for brevity
    { // 29
        "name" : "Root"
    }
],
"skins" : [
    {
        "inverseBindMatrices" : 6,
        "joints" : [
            29,
            1
        ]
    }
]
For this glTF, inverseBindMatrix[0] would be for joints[0], the joint given by node 29; inverseBindMatrix[1] would be for joints[1], the joint given by node 1.
If we shift it so that the indices start at 0 we get
{0,3,5,2,1,4}
I don't think this is correct. Why shift at all? You just need a mapping from node ID to bone ID until you have digested and loaded the glTF data into your own structures (after which node IDs have no further use, unlike bone IDs):
Node Bone
29 0
1 1
When you calculate the skin matrix, make sure you follow the order of the joints array. In the above example, skin[0] should be for joints[0] i.e. for node 29.
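In other words, ibm[k] always pairs with joints[k]. A minimal sketch in Python/NumPy (the node world transforms here are hypothetical placeholders):

```python
import numpy as np

# ibms[i] belongs to joint_nodes[i] by position in skin.joints,
# never by node index.
joint_nodes = [29, 1]              # skin.joints: node indices, in joint-ID order
ibms = [np.eye(4), np.eye(4)]      # inverseBindMatrices accessor, same order

# World transforms of each joint node, keyed by node index (placeholders)
node_world = {29: np.eye(4), 1: np.eye(4)}

# Skin matrix for joint ID i = worldTransform(joint_nodes[i]) @ ibms[i]
skin_matrices = [node_world[n] @ ibm for n, ibm in zip(joint_nodes, ibms)]

# A vertex's JOINTS_0 attribute, e.g. (1, 0, 0, 0), holds joint IDs:
# it indexes skin_matrices (and hence ibms) directly, not the nodes array.
```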

Related

How do I add noise/variability to a dataset in Python, given the CV?

Given a dataset of blood results, say cholesterol level, and knowing that the instrument that produced those results is subject to a known degree of variability, how would I add that variability back into the dataset? i.e. I want to assume the result in the original dataset is the true/mean value, and then produce new results that are subject to the known variability of the instrument.
In Excel you use =NORM.INV(RAND(), mean, std_dev), where RAND() provides a random value between 0 and 1, "mean" will be the original value and I have the CV so I can calculate the SD. NORM.INV then provides the inverse of the cumulative normal distribution function.
I've done the following to create a new column with my new values, but would like to know if it is valid (i.e., will each row have a different random number between 0 and 1 as the probability, and is this formula equivalent to NORM.INV?)
df8000['HDL_1'] = norm.ppf(random(), loc = df8000['HDL_0'], scale = TAE_df.loc[0,'HDL'])
Thanks in advance!
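A hedged sketch of the row-wise equivalent in Python (column and variable names are assumptions based on the question). The key point is one random draw per row: a single random() call yields one scalar, which would reuse the same quantile for every row.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 'HDL_0' holds the original (assumed true) values;
# cv is the instrument's known coefficient of variation.
df = pd.DataFrame({"HDL_0": [1.2, 1.5, 0.9, 1.8]})
cv = 0.05

# One normal draw per row: mean = original value, sd = cv * mean.
# This matches NORM.INV(RAND(), mean, sd) applied independently per row.
df["HDL_1"] = rng.normal(loc=df["HDL_0"], scale=cv * df["HDL_0"])
```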

Find objects based on multiple criteria 10k+ times a second

I have ~10-15 categories (cat1, cat2, etc.) that are fixed enums; they change maybe once every couple of weeks, so we can treat them as constant.
For example cat1 enum could have values like that:
cat1: [c1a,c1b,c1c,c1d,c1e]
I have objects (around 10 000 of them) like these:
id: 1, cat1: [c1a, c1b, c1c, c1d], cat2: [ c2a , c2d, c2z], cat3: [c3d] ...
id: 2, cat1: [c1b, c1d], cat2: [ c2a , c2b], cat3: [c3a, c3b, c3c] ...
id: 3, cat1: [c1b, c1d, c1e], cat2: [ c2a], cat3: [c3a, c3d] ...
...
id: n, cat1: [c1a, c1c, c1d], cat2: [ c2e], cat3: [c3a, c3b, c3c, c3d] ...
Now I have incoming request looking like these, with one value for every category:
cat1: c1b, cat2: c2a, cat3: c3d ...
I need to get all ids for objects that match that request, so all objects that include every cat value from that request. Request and objects always have the same number of categories.
To get a better understanding of the problem, a naive way of solving this in SQL would be something like
SELECT id FROM objects WHERE 'c1b' IN cat1 AND 'c2a' IN cat2 AND 'c3d' IN cat3 ...
Result for our example request and example objects would be: id: [1,3]
I've tried using sets: one set per category/value pair (for example cat1-c1a, cat1-c1b, cat2-c2a, etc.), each holding the ids of the matching objects. On each request I intersect the sets matching the request's values, but at five digits of requests/s this doesn't scale well. Maybe I could trade more space for time, or even precompute a hash table of all possible requests to get O(1) lookups, but the space needed would be enormous. I'm looking for any other viable solutions to this problem. Objects do not change often and new ones are rarely added, so the workload is read-heavy. Has anyone solved a similar problem? Are there databases/key-value stores that handle this use case well? Any white papers?
I store your ids in a Python list ids: ids[id_num] is a list of categories, and ids[id_num][cat_num] is a set of integers (instead of your letters) for the enum values; all that matters is that they are distinct.
From that list you can generate a reverse mapping so that, given a (cat_num, enum_num) pair, you get the set of all id_nums whose cat_num'th category contains that enum_num.
#%% create reverse map from (cat, val) pairs to sets of possible id's
cat_entry_2_ids = dict()
for id_num, this_ids_cats in enumerate(ids):
    for cat_num, cat_vals in enumerate(this_ids_cats):
        for val in cat_vals:
            cat_num_val = (cat_num, val)
            cat_entry_2_ids.setdefault(cat_num_val, set()).add(id_num)
The above mapping could be saved+reloaded until enums/id's change.
Given a particular request, shown here as a list where the position is the category number and the value is the requested enum, the mapping is used to return all ids that have the requested enum in every category.
def get_id(request):
    idset = cat_entry_2_ids[(0, request[0])].copy()
    for cat_num_req in enumerate(request):   # yields (cat_num, val) keys
        idset.intersection_update(cat_entry_2_ids.get(cat_num_req, set()))
        if not idset:
            break
    return sorted(idset)
Timings depend on 10 to 15 dict lookups and set intersections. In Python I get a speed of around 2_500 lookups per second. Maybe a change of language and/or parallel lookup in the mapping (one thread for each of your 10-15 categories) might get you over that 10_000 lookups/second barrier?
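A self-contained sketch of the approach with made-up data (three objects, three categories, integer enum values; the structure mirrors the snippets above):

```python
# Three hypothetical objects; ids[id_num][cat_num] is a set of enum values.
ids = [
    [{0, 1, 2, 3}, {0, 3, 9}, {3}],        # id 0
    [{1, 3},       {0, 1},   {0, 1, 2}],   # id 1
    [{1, 3, 4},    {0},      {0, 3}],      # id 2
]

# Reverse map: (cat_num, val) -> set of id_nums containing val in that category
cat_entry_2_ids = {}
for id_num, cats in enumerate(ids):
    for cat_num, vals in enumerate(cats):
        for v in vals:
            cat_entry_2_ids.setdefault((cat_num, v), set()).add(id_num)

def get_ids(request):
    # Start from the candidates of category 0, then narrow by intersection
    idset = cat_entry_2_ids.get((0, request[0]), set()).copy()
    for key in enumerate(request):          # (cat_num, val) keys
        idset &= cat_entry_2_ids.get(key, set())
        if not idset:
            break
    return sorted(idset)

matches = get_ids([1, 0, 3])  # value 1 in cat0, 0 in cat1, 3 in cat2
```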

EasyPredictModelWrapper giving wrong prediction

public BinomialModelPrediction predictBinomial(RowData data) throws PredictException {
    double[] preds = this.preamble(ModelCategory.Binomial, data);
    BinomialModelPrediction p = new BinomialModelPrediction();
    double d = preds[0];
    p.labelIndex = (int) d;
    String[] domainValues = this.m.getDomainValues(this.m.getResponseIdx());
    p.label = domainValues[p.labelIndex];
    p.classProbabilities = new double[this.m.getNumResponseClasses()];
    System.arraycopy(preds, 1, p.classProbabilities, 0, p.classProbabilities.length);
    if (this.m.calibrateClassProbabilities(preds)) {
        p.calibratedClassProbabilities = new double[this.m.getNumResponseClasses()];
        System.arraycopy(preds, 1, p.calibratedClassProbabilities, 0, p.calibratedClassProbabilities.length);
    }
    return p;
}
Eg: classProbabilities = [0.82333, 0.27666]
labelIndex = 1
label = true
domainValues = [false,true]
What does this labelIndex signify, and is the order of classProbabilities the same as the order of domainValues? If the order is the same, then the probability of false is 0.82333 and the probability of true is 0.27666, so why is labelIndex 1 and label true?
Please help me figure out this issue.
Like Tom commented, the prediction is not "wrong". You can infer from this that the threshold H2O has chosen is less than 0.27666. You probably have imbalanced training data; otherwise H2O would not have picked a threshold low enough to classify a predicted value of 0.27666 as a 1. Does your training set include fewer examples of the positive class than the negative class?
If you don't like that threshold for whatever reason, then you can manually create your own. Just make sure you know how to properly evaluate the effect of using different thresholds on the performance of your model, otherwise I'd recommend just using the default threshold.
The name "classProbabilities" is a misnomer. These are not actual probabilities; they are predicted values, though people often use the terms interchangeably. Binary classification algorithms produce predicted values that look like probabilities when they're between 0 and 1, but unless a calibration process is performed, they do not represent true probabilities. Calibration is not necessarily a straightforward process and there are many techniques. Here's some more info about calibration methods for imbalanced data. In H2O, you can perform calibration via Platt scaling using the calibrate_model option. But this is probably not necessary for what you're trying to do.
The proper way to use the raw output from a binary classification model is to only look at the predicted value for the positive class (you can simply ignore the predicted value for the negative class). Then you choose a threshold which suits your needs, or you can use the default threshold in H2O, which is chosen to maximize the F1 score. Some other software will use a hardcoded threshold of 0.5, but that will be a terrible choice if you don't have an even number of positive and negative examples in your training data. If you have only a few positive examples in your training data, then the best threshold will be something much lower than 0.5.
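To make the thresholding concrete, a small hedged sketch in Python (the numbers are the ones from the question; classify is an illustrative helper, not part of the H2O API):

```python
# Pick a label from the positive-class predicted value with an explicit
# threshold, rather than a hardcoded 0.5 or an argmax over the classes.
def classify(p_positive, threshold):
    """Return label index 1 (positive) iff the score meets the threshold."""
    return 1 if p_positive >= threshold else 0

# Values from the question: domainValues = [false, true],
# classProbabilities = [0.82333, 0.27666].
p_true = 0.27666
label_with_half = classify(p_true, 0.5)          # a 0.5 threshold yields 0 (false)
label_with_low_threshold = classify(p_true, 0.25)  # a threshold below 0.27666 yields 1 (true)
```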

Boost Confidence of Overlapping Observations In Apache Spark

I'm fairly new to scala/spark, so forgive me if my question is elementary but I've searched everywhere and can't find the answer.
Problem
I'm trying to boost the confidence scores of a bunch of network router observations (observations of probable router types at different network junctions).
I have a type NetblockObservation that combines device types seen on a network with an associated netblock and a confidence. The confidence is the confidence that we accurately identified the device we saw.
case class NetblockObservation(
  device_type: String,
  ip_start: Long,
  ip_end: Long,
  confidence_score: Double
)
If the confidence is above some threshold thresh, then I want that observation to be in the returned dataset. If it's below thresh, it should not be.
In addition, if I have two observations with the same device_type and one contains the other, the containee should have its confidence increased by the confidence of the container.
Example
Let's say I have 3 Netblock Observations
// 0.0.0.0/28
NetblockObservation(device_type: "x", ip_start: 0, ip_end: 15, confidence_score: .4)
// 0.0.0.0/29
NetblockObservation(device_type: "x", ip_start: 0, ip_end: 7, confidence_score: .4)
// 0.0.0.0/30
NetblockObservation(device_type: "x", ip_start: 0, ip_end: 3, confidence_score: .4)
With a confidence threshold of 1, I would expect a single output of NetblockObservation(device_type: "x", ip_start: 0, ip_end: 3, confidence_score: 1.2)
Explanation: I am allowed to add the confidence scores of NetblockObservations together when one is contained in the other and they share the same device_type.
I was allowed to add the confidence score of 0.0.0.0/29 to that of 0.0.0.0/30 because /30 is contained within /29.
I was not allowed to add the confidence score of 0.0.0.0/30 to 0.0.0.0/29 because /29 is not contained within /30.
My (pitiful) Attempt
Failure reason: Too slow / never completed
I attempted to implement this while simultaneously learning scala/spark so I'm not sure if it's the idea or the implementation which is wrong. I think it would eventually work but after an hour, it hadn't completed on a dataset of size 300,000 (small compared to production scale) so I gave up on it.
The idea is to find the largest netblock and separate the data into netblocks that are contained in it and netblocks that are not. The netblocks that are not contained are recursively passed back into the same function. If the largest netblock has a confidence_score of at least 1, the entire contained dataset is disregarded and the largest is added to the return dataset. If the confidence_score is less than 1, its confidence_score is added to everything in the contained dataset and that group is recursively passed back to the same function. Eventually, you should only be left with the data whose confidence_score is greater than 1. This algorithm also has the issue of not taking device_type into account.
def handleDataset(largestInNetData: Option[NetblockObservation], netData: RDD[NetblockObservation]): RDD[NetblockObservation] = {
  if (netData.isEmpty) spark.sparkContext.emptyRDD else largestInNetData match {
    case Some(largest) =>
      val grouped = netData.groupBy(item =>
        if (item.ip_start >= largest.ip_start && item.ip_end <= largest.ip_end) largestInNetData
        else None)
      def lookup(k: Option[NetblockObservation]) = grouped.filter(_._1 == k).flatMap(_._2)
      val nos = handleDataset(None, lookup(None))
      // Threshold is assumed to be 1
      val next = if (largest.confidence_score >= 1) spark.sparkContext.parallelize(Seq(largest)) else
        handleDataset(None, lookup(largestInNetData)
          .filter(x => x != largest)
          .map(x => x.copy(confidence_score = x.confidence_score + largest.confidence_score)))
      nos ++ next
    case None =>
      val largest = netData.reduce((a: NetblockObservation, b: NetblockObservation) =>
        if ((a.ip_end - a.ip_start) > (b.ip_end - b.ip_start)) a else b)
      handleDataset(Option(largest), netData)
  }
}
It is a fairly involved bit of code, so here is a general algorithm that I hope will help:
1. Forget about Spark for a moment and write a Scala function, probably in the companion object for NetblockObservation, that takes a collection of them and returns the subset of that collection that is contained. You should unit test the heck out of this function, and again this is pure Scala.
2. Moving now to Spark: do a groupBy on your RDD[NetblockObservation] with device_type as the key, producing essentially a map of String to Iterable[NetblockObservation].
3. Filter out all the entries in the map that have a value of size 1 and a confidence below thresh.
4. For the entries that remain, apply your function from the first step to the collections of NetblockObservations with a mapValues.
5. Do a reduceByKey or similar to simply add up the confidence_scores of the contained values.
6. Enjoy a refreshing beverage.
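As a reference for testing the first step, the containment-plus-sum semantics can be sketched in plain Python (field names follow the question; this is a quadratic baseline per device type, not the Spark solution):

```python
from collections import defaultdict

# Group by device_type, add to each observation the confidences of all
# observations that contain it, and keep those at or above the threshold.
def boost(observations, thresh):
    by_type = defaultdict(list)
    for o in observations:
        by_type[o["device_type"]].append(o)
    out = []
    for obs in by_type.values():
        for o in obs:
            # sum over every container of o (o contains itself)
            total = sum(
                c["confidence"]
                for c in obs
                if c["ip_start"] <= o["ip_start"] and o["ip_end"] <= c["ip_end"]
            )
            if total >= thresh:
                out.append({**o, "confidence": total})
    return out

# The three netblocks from the question: 0.0.0.0/28, /29, /30
data = [
    {"device_type": "x", "ip_start": 0, "ip_end": 15, "confidence": 0.4},
    {"device_type": "x", "ip_start": 0, "ip_end": 7,  "confidence": 0.4},
    {"device_type": "x", "ip_start": 0, "ip_end": 3,  "confidence": 0.4},
]
result = boost(data, 1.0)  # only the /30 survives, with confidence 1.2
```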

Create groups from sets of nodes

I have a list of sets (a, b, c, d, e in the example below). Each set contains a list of nodes (1-6 below). There is probably a well-known algorithm for achieving what I describe below; I just do not know about it.
sets[
a[1,2,5,6],
b[1,4,5],
c[1,2,5],
d[2,5],
e[1,6],
]
I would like to generate a new structure, a list of groups, with each group having
all the (sub)sets of nodes that appear in multiple sets
references to the original sets those nodes belong to
So the above data would become (order of groups irrelevant).
group1{nodes[2,5],sets[a,c,d]}
group2{nodes[1,2,5],sets[a,c]}
group3{nodes[1,6],sets[a,e]}
group4{nodes[1,5],sets[a,b,c]}
I am assuming I can get the data in as an array/object structure and manipulate that, and then spit the resulting structure out in whatever format needed.
It would be a plus if:
all groups had a minimum of 2 nodes and 2 sets.
when a subset of nodes is contained in a bigger set that forms a group, then only the bigger set gets a group: in this example, nodes 1,2 do not have a group of their own since all the sets they have in common already appear in group2.
(The sets are stored in XML, which I have also managed to convert to JSON so far, but this is irrelevant. I can understand procedural (pseudo)code but also something like a skeleton in XSLT or Scala could help to get started, I guess.)
Go through the list of sets. For each set S:
  Go through the list of groups. For each group G:
    If S can be a member of G (i.e. if G's set is a subset of S), add S to G.
    If S cannot be a member of G but the intersection of S and G's set contains more than one node, make a new group for that intersection and add it to the list.
  Give S a group of its own and add it to the list.
Then combine any groups that have the same set, and delete any group with only one member set.
For example, with your example sets, after reading a and b the list of groups is
[1,2,5,6] [a]
[1,5] [a,b]
[1,4,5] [b]
And after reading c it's
[1,2,5,6] [a]
[1,5] [a,b,c]
[1,4,5] [b]
[1,2,5] [a,c]
There are slightly more efficient algorithms, if speed is a problem.
/*
Pseudocode algorithm for creating groups data from a set dataset, further explained in the project documentation. This is based on
http://stackoverflow.com/questions/1644387/create-groups-from-sets-of-nodes
I am assuming
- Group is a structure (class) the objects of which contain two lists: a list of sets and a list of nodes (group.nodes). Its constructor accepts a list of nodes and a reference to a Set object
- Set is a list structure (class), the objects (set) of which contain the nodes of the list in set.nodes
- groups and sets are both list structures that can contain arbitrary objects which can be iterated with foreach().
- you can get the objects two lists have in common as a new list with intersection()
- you can count the number of objects in a list with length()
*/
//Create groups, going through the original sets
foreach(sets as set){
    if(groups.length() == 0){
        groups.addGroup(new Group(set.nodes, set));
    }
    else{
        foreach(groups as group){
            if(group.nodes.length() == intersection(group.nodes, set.nodes).length()){
                // the group is a subset of the set, so just add the set as a member of the group
                group.addset(set);
                if(group.nodes.length() < set.nodes.length()){
                    // if the set has more nodes than the group that already exists,
                    // create a new group for the nodes of the set, with set as a member of that group
                    groups.addGroup(new Group(set.nodes, set));
                }
            }
            // If group is not a subset of set, and the intersection of the nodes of the group
            // and the nodes of the set is greater than one (they have more than one node in common),
            // create a new group with those common nodes, with set as a member of that group
            else if(group.nodes.length() > intersection(group.nodes, set.nodes).length()
                    && intersection(group.nodes, set.nodes).length() > 1){
                groups.addGroup(new Group(intersection(group.nodes, set.nodes), set));
            }
        }
    }
}
// Cleanup time!
foreach(groups as group){
    // delete any group with only one member set (for it is not really a group then)
    if(group.sets.length() < 2){
        groups.remove(group);
    }
    // combine any groups that have the same set of nodes
    foreach(groups as group2){
        // if the intersection of the two groups is the same size as either group,
        // then the groups have the same nodes
        if(group != group2 && intersection(group.nodes, group2.nodes).length() == group.nodes.length()){
            foreach(group2.sets as set2){
                if(!group.hasset(set2)){
                    group.addset(set2);
                }
            }
            groups.remove(group2);
        }
    }
}
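For completeness, here is a runnable Python version of the same algorithm, a sketch under the same assumptions as the pseudocode. Keying the groups by frozensets of nodes makes the "combine groups with the same nodes" cleanup step implicit.

```python
def make_groups(sets, min_nodes=2, min_sets=2):
    groups = {}  # frozenset of nodes -> set of member set names
    for name, nodes in sets.items():
        s = frozenset(nodes)
        new = {}  # intersections discovered this pass
        for gnodes, members in groups.items():
            if gnodes <= s:                      # G is a subset of S: S joins G
                members.add(name)
            else:
                common = gnodes & s
                if len(common) >= min_nodes:     # new group for the intersection
                    new.setdefault(common, set()).update(members | {name})
        for gnodes, members in new.items():
            groups.setdefault(gnodes, set()).update(members)
        groups.setdefault(s, set()).add(name)    # S also gets a group of its own
    # drop groups with too few member sets or too few nodes
    return {g: m for g, m in groups.items()
            if len(m) >= min_sets and len(g) >= min_nodes}

sets = {"a": [1, 2, 5, 6], "b": [1, 4, 5], "c": [1, 2, 5],
        "d": [2, 5], "e": [1, 6]}
result = make_groups(sets)  # four groups
```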

Resources