Parsing STDF Files to Compare results - shell

I am new to this site and would like some input on parsing STDF files. I am trying to parse an STDF file to gather only the results (the numbers) and not the rest of each line. Once I have those, I would like to compare all the numbers with each other, using something like a bubble sort or insertion sort, and see whether any of them are equal. I can do this in C/C++ and Java, but I have no experience parsing documents with scripts.
Could anyone push me in the right direction? What should I be reading to learn my way around this?

Are you already using an STDF library?
You did not mention one, so I assume not.
You should find a library you are comfortable with (the list changes over time, but you can find some by Googling or looking at the STDF page on Wikipedia) rather than attempting to parse STDF yourself, unless you have a good reason to recreate the STDF parser wheel.
An STDF file contains many tests. It generally does not make sense to compare the results for different tests, so I assume you are looking for matching values within the set of results for each test.
I would use your chosen STDF parser to read the value of each test for each part. Keep a set of the results for each test. As you read each new result, check the set to see if it already exists. If it does, you have found the case you were looking for; otherwise, add the result to the set.
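A minimal sketch of that set-per-test idea in Go, assuming the results have already been read by whatever STDF library you choose. The Result and Duplicate types and the FindDuplicates function are hypothetical names made up for illustration; they are not part of any STDF library.

```go
package main

import "fmt"

// Result is a hypothetical record: one measured value for one test on one part.
// A real STDF library would supply its own type for this.
type Result struct {
	TestNum uint32
	PartID  string
	Value   float64
}

// Duplicate reports a value that was already seen for the same test.
type Duplicate struct {
	TestNum uint32
	Value   float64
	PartID  string // part whose result repeated an earlier value
}

// FindDuplicates keeps one set of seen values per test and returns every
// result whose value was already present in its test's set.
func FindDuplicates(results []Result) []Duplicate {
	seen := make(map[uint32]map[float64]struct{})
	var dups []Duplicate
	for _, r := range results {
		set, ok := seen[r.TestNum]
		if !ok {
			set = make(map[float64]struct{})
			seen[r.TestNum] = set
		}
		if _, exists := set[r.Value]; exists {
			dups = append(dups, Duplicate{r.TestNum, r.Value, r.PartID})
			continue
		}
		set[r.Value] = struct{}{}
	}
	return dups
}

func main() {
	results := []Result{
		{1000, "part1", 1.25},
		{1000, "part2", 1.30},
		{2000, "part1", 1.25}, // same value, different test: not a duplicate
		{1000, "part3", 1.25}, // repeats part1's value for test 1000
	}
	for _, d := range FindDuplicates(results) {
		fmt.Printf("test %d: value %g repeated by %s\n", d.TestNum, d.Value, d.PartID)
	}
}
```

Values are compared with exact floating-point equality here. If nearly equal results should also count as matches, you would need to round or bucket the values before putting them in the set.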

Related

Simple arithmetic functions in Elasticsearch

I am starting to get acquainted with ELK for work purposes, but I am struggling to find a way to do simple mathematical operations on my data.
As shown in the picture, my DB contains 16 available fields, but I would like to create others without doing it in Excel first and then converting my file to CSV again.
For example, I would like to create a variable #Bugs/Release. I've heard that this is quite easy to do with no need for scripting, but I can't find the way to do it... Does anybody have a solution to this problem?
Huge thanks

How to split a large csv file into multiple files in GO lang?

I am a novice Go programmer trying to learn the language's features. I want to split a large CSV file into multiple files in Go, with each file containing the header. How do I do this? I have searched everywhere but couldn't find the right solution. Any help in this regard will be greatly appreciated.
Also, please suggest a good book for reference.
Thanking You
Depending on your shell-fu, this problem might be better suited to common shell utilities, but you specifically mentioned Go.
Let's think through the problem.
How big is this CSV file? Are we talking 100 lines, or is it 5 GB?
If it's smallish I typically use this:
http://golang.org/pkg/io/ioutil/#ReadFile
However, this package also exists:
http://golang.org/pkg/encoding/csv/
Regardless - let's return to the abstraction of the problem. You have a header (which is the first line) and then the rest of the document.
So what we probably want to do (if ignoring csv for the moment) is to read in our file.
Then we want to split the file body by all the newlines in it.
You can use this to do so:
http://golang.org/pkg/strings/#Split
You didn't mention it, but do you know how many files you want to split into, or would you rather split by line count or byte count? What's the actual limitation here?
Generally it's not going to be file count, but if we pretend it is, we simply divide our line count by our expected file count to get lines per file.
Now we can take slices of the appropriate size and write the file back out via:
http://golang.org/pkg/io/ioutil/#WriteFile
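Put together, those steps might look something like the sketch below. It assumes a fixed number of data rows per output file (1000 here), invents output names of the form part_001.csv, and takes the input path from the command line; splitWithHeader is a made-up helper name. It uses os.ReadFile and os.WriteFile, which replaced the ioutil functions as of Go 1.16. Because it splits on raw newlines, it only works if no quoted field contains a line break, and it reads the whole file into memory, so for a multi-gigabyte file you would want to stream rows with encoding/csv instead.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// splitWithHeader splits CSV text into chunks of at most rowsPerFile data
// rows, repeating the header line at the top of each chunk.
// It splits on raw newlines, so it assumes no quoted field contains one.
func splitWithHeader(data string, rowsPerFile int) []string {
	data = strings.ReplaceAll(data, "\r\n", "\n") // normalize Windows line endings
	lines := strings.Split(strings.TrimRight(data, "\n"), "\n")
	if rowsPerFile < 1 || len(lines) < 2 {
		return nil // nothing to split: bad chunk size or no data rows
	}
	header, rows := lines[0], lines[1:]
	var chunks []string
	for start := 0; start < len(rows); start += rowsPerFile {
		end := start + rowsPerFile
		if end > len(rows) {
			end = len(rows)
		}
		part := append([]string{header}, rows[start:end]...)
		chunks = append(chunks, strings.Join(part, "\n")+"\n")
	}
	return chunks
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: split <file.csv>")
		return
	}
	data, err := os.ReadFile(os.Args[1]) // whole file in memory: fine for smallish inputs
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for i, chunk := range splitWithHeader(string(data), 1000) {
		name := fmt.Sprintf("part_%03d.csv", i+1) // hypothetical naming scheme
		if err := os.WriteFile(name, []byte(chunk), 0o644); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```

Keeping the splitting logic in a pure function that works on strings makes it easy to try out on a few lines of sample data before pointing it at the real file.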
A trick I sometimes use to help me think through these things is to write down the mission statement.
"I want to split a large csv file into multiple files in go"
Then I start breaking that up into pieces but take the divide/conquer approach - don't try to solve the entire problem in one go - just break it up to where you can think about it.
Also - make gratuitous use of pseudo-code until you can comfortably write the real code itself. Sometimes it helps to just write a short inline comment describing how you think the code should flow, then get it down to the smallest portion that you can code and work from there.
By the way - many of the golang.org packages have example links where you can run the example code right in your browser and cut/paste it into your own local environment.
Also, I know I'll catch some haters with this - but as for books - imo - you are going to learn a lot faster just by trying to get things working rather than reading. Action trumps passivity always. Don't be afraid to fail.
Here is a package that might help. You can set the chunk size you need in bytes, and the file will be split into the appropriate number of chunks.

Rainmeter: How to concatenate strings

I am getting data from a broken RSS feed that gives me wrong link. I wanted to fix this link so I made this code:
<link.*>(.*)&.*tid(.*)</link>
and the link could be like:
www.somedomain.com/?value=50&burrrdurrrr;tid=120
But the real working link is in this form:
www.somedomain.com/?value=50&tid=120
The thing that I'm asking is if my measure thing looks like this:
[FeedURL]
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=[Feed]
StringIndex=2 ;now I only get www.somedomain.com/?value=50
Substitute=#SubstituteFeed#
How am I supposed to concatenate the strings together to complete the url?
I'm guessing that rather than &burrrdurrrr;, the link has &amp;, which is how you have to write & in an HTML or XML file.
If that's the case, you just need to set the DecodeCharacterReference option, as described in this handy-looking tutorial. Another option mentioned there is Substitute, which would be able to strip it out even if it really was &burrrdurrrr;.
None of this is a particularly sensible way of dealing with HTML or XML - a much better approach would be a plugin which actually parsed the document structure and let you reference nodes using XPath or CSS rules - but you work with what you've got, I guess. (I've never heard of this "Rainmeter" before, despite its claim to be "the best known and most popular desktop customization program for Windows"; maybe because nobody else calls their program that, instead almost universally using the word "widget"?)

How to simplify with topojson API?

So I have no problem simplifying using topojson from the command line using the -s flag, however, I can't figure out how to do it from the node module.
I see a topojson.simplify() method, but I can't figure out how it works as there is no documentation.
Does anyone have any insight?
By looking at the simplification tests for topojson, I was able to figure out how to use topojson.simplify(), but I can't fully claim to know what's going on. You can see the tests on the topojson GitHub.
Basically, topojson.simplify takes a topology as input and has two possible options for simplification, "retain-proportion" and "minimum-area". You can also pass the coordinate system, either "cartesian" or "spherical", although it can be inferred under most circumstances.
examples:
output = topojson.simplify(topology, {"minimum-area": 2, "coordinate-system": "spherical"});
output = topojson.simplify(topology, {"retain-proportion": 2, "coordinate-system": "spherical"});
I am not really sure exactly what the values you pass into these options mean; however, higher values tend to produce more simplification. As a note, "retain-proportion" often returns invalid topologies when passed LineStrings, though that may be as intended.
Additionally, the quantization option of topojson.topology can be used to create a smaller, simpler output, and may be the best solution for some similar use cases. It also doesn't have any clearly documented server API examples anywhere, so:
//very simplified, small output
topojson.topology({routes: routesCollection},{"quantization":100});
//very unfiltered, large output
topojson.topology({routes: routesCollection},{"quantization":1e8});
note: the default quantization is 10000 (1e4), so anything less than 10000 will create a smaller output and vice versa.

Eliminating code duplication in a single file

Sadly, a project that I have been working on lately has a large amount of copy-and-paste code, even within single files. Are there any tools or techniques that can detect duplication or near-duplication within a single file? I have Beyond Compare 3 and it works well for comparing separate files, but I am at a loss for comparing single files.
Thanks in advance.
Edit:
Thanks for all the great tools! I'll definitely check them out.
This project is an ASP.NET/C# project, but I work with a variety of languages including Java; I'm interested in what tools are best (for any language) to remove duplication.
Check out Atomiq. It finds duplicated code that is ripe for extracting into one location.
http://www.getatomiq.com/
If you're using Eclipse, you can use the copy paste detector (CPD) https://olex.openlogic.com/packages/cpd.
You don't say what language you are using, which is going to affect what tools you can use.
For Python there is CloneDigger. It also supports Java but I have not tried that. It can find code duplication both with a single file and between files, and gives you the result as a diff-like report in HTML.
See SD CloneDR, a tool for detecting copy-paste-edit code within and across multiple files. It detects exact copies, copies that have been reformatted, and near-miss copies with different identifiers, literals, and even different sequences of statements.
The CloneDR handles many languages, including Java (1.4,1.5,1.6) and C# especially up to C#4.0. You can see sample clone detection reports at the website, also including one for C#.
Resharper does this automagically - it suggests when it thinks code should be extracted into a method, and will do the extraction for you.
Check out PMD. Once you have configured it (which is fairly simple), you can run its copy-paste detector to find duplicate code.
Someone with some Office skills can do the following sequence in about a minute:
1. use an ordinary formatter to unify the code style, preferably without line wrapping
2. feed the code text into Microsoft Excel as a single column
3. search and replace all double spaces with single ones, and do any other replacements
4. sort the column
At this point the key candidates for duplication will already be easy to spot. But to go further:
5. add a comparison formula to the 2nd column and a counter to the 3rd
6. copy and paste the values again, sort, and see the most repetitive lines
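The spreadsheet steps above amount to normalizing whitespace, counting identical lines, and sorting by frequency, which can also be sketched in a few lines of Go. This is only a rough line-level heuristic under the same assumptions as the Excel approach (the code has already been run through a formatter); RepeatedLines and LineCount are names invented for this example, and it does not detect near-duplicates the way the dedicated tools do.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// LineCount pairs a normalized line with how many times it occurs.
type LineCount struct {
	Line  string
	Count int
}

// RepeatedLines collapses whitespace on every line, counts identical
// lines, and returns those occurring more than once, most frequent first.
func RepeatedLines(lines []string) []LineCount {
	counts := make(map[string]int)
	for _, l := range lines {
		norm := strings.Join(strings.Fields(l), " ") // collapse runs of whitespace
		if norm != "" {
			counts[norm]++
		}
	}
	var out []LineCount
	for line, n := range counts {
		if n > 1 {
			out = append(out, LineCount{line, n})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Line < out[j].Line // deterministic order for ties
	})
	return out
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: repeated <source-file>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, lc := range RepeatedLines(strings.Split(string(data), "\n")) {
		fmt.Printf("%4d  %s\n", lc.Count, lc.Line)
	}
}
```

Trivial lines such as lone braces will dominate the top of the report, so in practice you would skim past those or filter out very short lines.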
There is an analysis tool, called Simian, which I haven't yet tried. Supposedly it can be run on any kind of text and point out duplicated items. It can be used via a command line interface.
Another option similar to those above, but with a different tool chain: https://www.npmjs.com/package/jscpd
