How to simplify with topojson API? - topojson

So I have no problem simplifying using topojson from the command line using the -s flag, however, I can't figure out how to do it from the node module.
I see a topojson.simplify() method, but I can't figure out how it works as there is no documentation.
Does anyone have any insight?

By looking at the simplification tests for topojson, I was able to figure out how to use topojson.simplify(), but I can't claim to fully understand what's going on. You can see the tests in the topojson GitHub repository.
Basically, topojson.simplify() takes a topology as input and has two possible options for simplification, "retain-proportion" and "minimum-area". You can also pass the coordinate system, either "cartesian" or "spherical", although it can be inferred under most circumstances.
Examples:
output = topojson.simplify(topology, {"minimum-area": 2, "coordinate-system": "spherical"});
output = topojson.simplify(topology, {"retain-proportion": 2, "coordinate-system": "spherical"});
I am not really sure exactly what the values passed to these options mean; however, higher values tend to produce more simplification. As a note, "retain-proportion" often returns invalid topologies when passed LineStrings, which may be as intended.
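For some intuition about what "minimum-area" might be measuring, here is a minimal sketch that assumes the option is an area threshold in the style of Visvalingam's algorithm, which is my understanding of the approach topojson's simplification is based on. It is not topojson's code: triangleArea and simplifyLine are made-up names, it works on plain planar [x, y] points, and it ignores topology entirely.

```javascript
// Illustrative only: a tiny Visvalingam-style simplifier for one open line.
// This is NOT topojson's implementation; all names are made up for the sketch.

// Area of the triangle formed by three planar [x, y] points.
function triangleArea(a, b, c) {
  return Math.abs(
    (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1])) / 2
  );
}

// Repeatedly drop the interior point whose triangle (with its two current
// neighbours) has the smallest area, until every remaining interior point
// has an area of at least minArea. The two endpoints are always kept.
function simplifyLine(points, minArea) {
  const kept = points.slice();
  while (kept.length > 2) {
    let smallest = Infinity;
    let smallestIndex = -1;
    for (let i = 1; i < kept.length - 1; i++) {
      const area = triangleArea(kept[i - 1], kept[i], kept[i + 1]);
      if (area < smallest) {
        smallest = area;
        smallestIndex = i;
      }
    }
    if (smallest >= minArea) break;
    kept.splice(smallestIndex, 1);
  }
  return kept;
}

const line = [[0, 0], [1, 0.1], [2, 0], [3, 2], [4, 0]];
const lightlySimplified = simplifyLine(line, 0.5);
const heavilySimplified = simplifyLine(line, 10);
console.log(lightlySimplified.length, heavilySimplified.length);
```

With this sample line, a threshold of 0.5 removes only the nearly collinear point, while a threshold of 10 leaves just the two endpoints, which matches the observation that larger "minimum-area" values simplify more.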
Additionally, the quantization option of topojson.topology() can be used to create a smaller, simpler output, and it may be the best solution for some similar use cases. It also doesn't have any clearly documented server-side API examples anywhere, so:
//very simplified, small output
topojson.topology({routes: routesCollection},{"quantization":100});
//very unfiltered, large output
topojson.topology({routes: routesCollection},{"quantization":1e8});
Note: the default quantization is 10000 (1e4), so values below 10000 produce a smaller output and values above it produce a larger one.
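To see why a lower quantization value shrinks the output, here is a rough sketch of the general idea of quantization: coordinates are snapped to an n-by-n integer grid, and consecutive points that land in the same cell collapse into one. The quantize function below is hypothetical and is not how topojson implements it internally; it only illustrates the effect.

```javascript
// Illustrative only: snap coordinates onto an n-by-n integer grid.
// This is NOT topojson's code; quantize is a made-up name for the sketch.
function quantize(points, n) {
  const xs = points.map((p) => p[0]);
  const ys = points.map((p) => p[1]);
  const x0 = Math.min(...xs), x1 = Math.max(...xs);
  const y0 = Math.min(...ys), y1 = Math.max(...ys);
  // Scale factors mapping the bounding box onto [0, n - 1];
  // fall back to 1 if the box has zero width or height.
  const kx = x1 > x0 ? (n - 1) / (x1 - x0) : 1;
  const ky = y1 > y0 ? (n - 1) / (y1 - y0) : 1;
  const out = [];
  for (const [x, y] of points) {
    const q = [Math.round((x - x0) * kx), Math.round((y - y0) * ky)];
    const last = out[out.length - 1];
    // Consecutive points that fall on the same grid cell collapse into one.
    if (!last || last[0] !== q[0] || last[1] !== q[1]) out.push(q);
  }
  return out;
}

const route = [[0, 0], [0.001, 0.001], [0.002, 0.001], [5, 5], [10, 10]];
const coarse = quantize(route, 100); // nearby points merge
const fine = quantize(route, 1e8);   // every point stays distinct
console.log(coarse.length, fine.length);
```

On this sample route the coarse grid merges the three points near the origin, while the fine grid keeps all five, so the coarse result has fewer coordinates to store.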

Related

Do Synthea names generally end with a number

I'm using data from Synthea and it looks like most (all?) of the given and family names I'm getting back end with a three-digit number (e.g. Gregg522). Is this part of the design of Synthea, or am I parsing the data incorrectly? A snippet of the JSON I'm getting back is shown below. If this is part of the design, what is the motivation for ending the name with a number? (I would think this would make the data less realistic.)
Yes, they generally do. It is sometimes nice to be able to see that the patients are fake/synthetic ones. However, this is a setting you can change: In the synthea.properties file, look for the setting "append_numbers_to_person_names" and set it to false.
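If the data has already been generated and regenerating it is not convenient, the numbers can also be removed after the fact. This is only a sketch: stripTrailingDigits is a hypothetical helper, and it assumes the appended number is always a run of digits at the very end of the name.

```javascript
// Hypothetical helper: remove a trailing run of digits from a name part.
// Assumes real names never legitimately end in digits.
function stripTrailingDigits(name) {
  return name.replace(/\d+$/, "");
}

const cleaned = ["Gregg522", "Smith", "O'Hara12"].map(stripTrailingDigits);
console.log(cleaned);
```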

When cola.js fails, what's a fast way to find out what went wrong?

When cola.js fails, how do you find out what went wrong?
I don't have a minimal reproducible example, because that will likely take hours of debugging to make, and if I had it, then it wouldn't be an example of what I'm asking about. I'm hoping to hear of a way to quickly find out what's gone wrong. Presumably it's some bad data in my graph. But how can I tell what the bad data is?
The screenshot below illustrates a typical situation where something has gone wrong. Various NaNs are reported, there are a bunch of Uncaught TypeErrors, and an Assertion fails. There are a whole lot of local variables, all named with a single letter. Clicking and looking at them shows that a NaN got into a coordinate somewhere, as suggested by the previous error messages. It looks like sometime earlier, something got a null when it needed some sort of list-like data structure. How can I quickly find out what was the bad data that got into the graph?
I'm using cola.v3.min.js.
Details
I'm looking for a general approach to finding this kind of error, but here's the specific data that produced the errors in this example. Each line is the argument passed to cy.add() (as suggested by Stepan T.—and printing out everything passed to cy.add() might turn out to be the first step of the answer!).
{"group":"nodes","data":{"id":1,"label":"Workspace","parent":[]},"position":{"x":45,"y":0}}
{"group":"nodes","data":{"id":2,"label":"Target(121)","parent":[1]},"position":{"x":5,"y":0}}
{"group":"nodes","data":{"id":3,"label":"WantFullySourced","parent":[]},"position":{"x":50,"y":0}}
{"group":"nodes","data":{"id":4,"label":"Brick(120)","parent":[1]},"position":{"x":10,"y":0}}
{"group":"nodes","data":{"id":5,"label":"Avail","parent":[]},"position":{"x":55,"y":0}}
{"group":"nodes","data":{"id":6,"label":"Brick(1)","parent":[1]},"position":{"x":15,"y":0}}
{"group":"nodes","data":{"id":7,"label":"Avail","parent":[]},"position":{"x":60,"y":0}}
{"group":"nodes","data":{"id":8,"label":"Brick(2)","parent":[1]},"position":{"x":20,"y":0}}
{"group":"nodes","data":{"id":9,"label":"Avail","parent":[]},"position":{"x":65,"y":0}}
{"group":"nodes","data":{"id":10,"label":"Brick(3)","parent":[1]},"position":{"x":25,"y":0}}
{"group":"nodes","data":{"id":11,"label":"Avail","parent":[]},"position":{"x":70,"y":0}}
{"group":"nodes","data":{"id":12,"label":"Brick(4)","parent":[1]},"position":{"x":30,"y":0}}
{"group":"nodes","data":{"id":13,"label":"Avail","parent":[]},"position":{"x":75,"y":0}}
{"group":"nodes","data":{"id":14,"label":"Brick(5)","parent":[1]},"position":{"x":35,"y":0}}
{"group":"nodes","data":{"id":15,"label":"Avail","parent":[]},"position":{"x":80,"y":0}}
{"group":"nodes","data":{"id":16,"label":"NumericalRelationScout","parent":[1]},"position":{"x":40,"y":0}}
{"group":"nodes","data":{"id":17,"label":"OperandView","parent":[1]},"position":{"x":0,"y":0}}
{"group":"edges","data":{"id":"2.member_of.1.members","source":2,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"4.member_of.1.members","source":4,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"6.member_of.1.members","source":6,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"8.member_of.1.members","source":8,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"10.member_of.1.members","source":10,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"12.member_of.1.members","source":12,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"14.member_of.1.members","source":14,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"16.member_of.1.members","source":16,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"17.member_of.1.members","source":17,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"3.taggees.2.tags","source":3,"source_port_label":"taggees","target":2,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"2.support_from.3.support_to","source":2,"source_port_label":"support_from","target":3,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"3.support_from.2.support_to","source":3,"source_port_label":"support_from","target":2,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"5.taggees.4.tags","source":5,"source_port_label":"taggees","target":4,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"4.support_from.5.support_to","source":4,"source_port_label":"support_from","target":5,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"7.taggees.6.tags","source":7,"source_port_label":"taggees","target":6,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"6.support_from.7.support_to","source":6,"source_port_label":"support_from","target":7,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"9.taggees.8.tags","source":9,"source_port_label":"taggees","target":8,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"8.support_from.9.support_to","source":8,"source_port_label":"support_from","target":9,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"11.taggees.10.tags","source":11,"source_port_label":"taggees","target":10,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"10.support_from.11.support_to","source":10,"source_port_label":"support_from","target":11,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"13.taggees.12.tags","source":13,"source_port_label":"taggees","target":12,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"12.support_from.13.support_to","source":12,"source_port_label":"support_from","target":13,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"15.taggees.14.tags","source":15,"source_port_label":"taggees","target":14,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"14.support_from.15.support_to","source":14,"source_port_label":"support_from","target":15,"target_port_label":"support_to","weight":2}}
The object passed to cy.layout() had a circular structure, so JSON.stringify() wouldn't print it. Adding the getCircularReplacer() function recommended here seemed to crash the browser. But manually removing elements from the layout object exposed the error: in a relative alignment constraint (documented under API here), I had an offset of '0' where I needed a 0, i.e. a string where a number was needed.
OK, happily that problem is now fixed. That still leaves the original question: is there a faster way to find errors like that? (The static-typing advocates are surely cackling by now.)
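One possible approach, sketched here under assumptions: since the bug turned out to be a string where a number was needed, a small validator can walk whatever is about to be passed to cy.add() or cy.layout() and report suspicious values before the layout runs. findBadNumbers is a hypothetical helper, not part of cola.js or Cytoscape, and the list of keys to check (offset, x, y) is a guess that would need to be adapted to the actual options in use. It tracks visited objects, so circular structures do not break it the way they break JSON.stringify().

```javascript
// Hypothetical helper: walk any object (cycles allowed) and report every
// property named in numericKeys whose value is not a finite number.
function findBadNumbers(root, numericKeys) {
  const wanted = new Set(numericKeys);
  const seen = new WeakSet(); // objects already visited, to survive cycles
  const problems = [];
  const stack = [{ value: root, path: "$" }];
  while (stack.length > 0) {
    const { value, path } = stack.pop();
    if (value === null || typeof value !== "object" || seen.has(value)) continue;
    seen.add(value);
    for (const key of Object.keys(value)) {
      const child = value[key];
      const childPath = Array.isArray(value) ? `${path}[${key}]` : `${path}.${key}`;
      if (wanted.has(key) && !(typeof child === "number" && Number.isFinite(child))) {
        problems.push({ path: childPath, value: child });
      }
      stack.push({ value: child, path: childPath });
    }
  }
  return problems;
}

// Example data: the field names are illustrative, not taken from the cola.js docs.
const layoutOptions = {
  name: "cola",
  alignment: [{ node: "a", offset: "0" }, { node: "b", offset: 0 }],
};
layoutOptions.self = layoutOptions; // circular reference, like the real layout object
const problems = findBadNumbers(layoutOptions, ["offset", "x", "y"]);
console.log(problems);
```

Here the only reported problem is the string offset, together with the path where it lives, which is the piece of information that was hard to dig out of the minified stack trace.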

Parsing STDF Files to Compare results

I am new to this site and would like some input on parsing STDF files. Generally speaking, I am trying to parse an STDF file to gather only the results (numbers) and not the rest of the line. If I am able to achieve this, I would then like to compare all the numbers through a bubble sort or insertion sort and see if any of them are equal to each other. I am capable of doing this in C/C++ and Java, but I have no experience parsing documents using scripts.
Could anyone push me in the right direction? What should I be reading to learn my way around this?
Are you already using an STDF library?
You did not mention one, so I assume not.
You should find a library you are comfortable with (the list changes over time, but you can find some by Googling or looking at the STDF page on Wikipedia) rather than attempting to parse STDF yourself, unless you have a good reason to recreate the STDF parser wheel.
An STDF file contains many tests. It generally does not make sense to compare the results for different tests, so I assume you are looking for matching values within the set of results for each test.
I would use your chosen STDF parser to read the value of each test for each part. Keep a set of the results for each test. As you read each new result, check the set to see if it already exists. If it does, you have found the case you were looking for; otherwise, add the result to the set.
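The set-based check can be sketched as follows. The input format is an assumption: it pretends the STDF parser has already produced one plain record per part and test, with hypothetical testNumber and result fields, so no real STDF library API is shown here.

```javascript
// Hypothetical input: one record per (part, test), already extracted by
// whatever STDF parser is in use. Field names are made up for this sketch.
function findRepeatedResults(records) {
  const seenByTest = new Map(); // test number -> Set of results seen so far
  const repeats = [];
  for (const { testNumber, result } of records) {
    if (!seenByTest.has(testNumber)) seenByTest.set(testNumber, new Set());
    const seen = seenByTest.get(testNumber);
    if (seen.has(result)) {
      repeats.push({ testNumber, result }); // same value seen before for this test
    } else {
      seen.add(result);
    }
  }
  return repeats;
}

const records = [
  { testNumber: 100, result: 1.25 },
  { testNumber: 200, result: 1.25 }, // same value, different test: not a repeat
  { testNumber: 100, result: 3.5 },
  { testNumber: 100, result: 1.25 }, // repeat within test 100
];
const repeats = findRepeatedResults(records);
console.log(repeats);
```

This avoids sorting altogether, since each lookup in the set takes roughly constant time. Note that it compares values for exact equality, so floating-point results that differ only by rounding would not be reported as matches.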

With CRF++, MIRA works for me but CRF-L1 and CRF-L2 do not

It may not matter, but I am using the windows distribution of CRF++ 0.58.
So I have successfully used mallet to train a model with a CRF and then test it. When I try to use the same train and test files with CRF++ (and after creating a template file), I get a
The line search routine mcsrch failed: error code:0
error when I use either
-a CRF-L1
or the default
-a CRF-L2
When I use
-a MIRA
though, training works without error, and so does testing.
The format of the test and training data can be the same for both mallet and crf++, so that is not the issue. My template file is as simple as
#Mixed
M00:%x[0,0]
M01:%x[0,1]
M02:%x[0,2]
......
M12:%x[0,12]
My last column in the training data is either 0 or 1, which is the value to classify with. There is no whitespace in any of my features; I use underscores when necessary. Am I missing something simple here? What would cause the L1 and L2 regularizations to fail like that?
I knew it was something silly ...
To use features like the ones I am using, you need the U prefix (as in Unigram), so something like U00:%x[0,0] is fine. You can't just call your features anything you want.
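A quick sanity check for this mistake could look like the sketch below. It assumes that every non-comment template line must begin with U (unigram) or B (bigram) and that lines starting with # are comments; findBadTemplateLines is a hypothetical helper, not something shipped with CRF++.

```javascript
// Hypothetical checker: flag template lines that do not start with U or B.
// Blank lines and lines starting with # are treated as ignorable.
function findBadTemplateLines(templateText) {
  const bad = [];
  templateText.split(/\r?\n/).forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) return;
    if (!/^[UB]/.test(trimmed)) {
      bad.push({ lineNumber: index + 1, line: trimmed });
    }
  });
  return bad;
}

const template = ["#Mixed", "M00:%x[0,0]", "U01:%x[0,1]", "B"].join("\n");
const badLines = findBadTemplateLines(template);
console.log(badLines);
```

For this sample template, only the M00 line is flagged, which is the kind of line that caused the error above.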
I also discovered that if I stripped down my test data to a single sentence, I would get the same error message. When I restored my test data to its original size of around 2600 sentences, the regularization algorithms ran. Overfitting is a common cause of this error message across various NLP and ML applications, but that was not the true problem in my case.
It can also happen in the extreme case of a dataset with just one CLASS (due to a bug in the training set generation procedure).

Alternative solution for <fr:currency>

We have a performance issue using <fr:currency> for controls that take a number in dollar format as input. In order to overcome this performance issue, we used another control instead of <fr:currency> and implemented the following calculate functionality in the bind of the control.
<xforms:bind id="Amount"
nodeset="instance('sample_form')/Amounts/Amount"
calculate="if (. !=0)
then format-number(xs:double(.),'$#,##0.000')
else ."/>
But the problem with the above code is that it converts the control's value into a string, which leads to errors in the controls whose values depend on it. Kindly provide a solution for this problem, or a better recommendation for handling this situation.
If you only want to show the formatted value (which I assume to be the case since you are thinking of using a calculate), then you could put that expression you have on the calculate inside an:
<xforms:output value="..."/>
If you need both input and output, and you'd like the value stored in your instance to be just the unformatted number, then I don't think there is an easy way around using <fr:currency> other than somehow reimplementing the functionality it provides. In that case, I would recommend investigating further to find, and then solve, the source of the performance issue.

Resources