Are there any good validation tools for Intel HEX files? I'm looking for something that will validate checksums, as well as identify and extract things like start address records.
This one is tricky... Well, in any case, I'm sorry to say this, but the best way to do it is to use a hex editor and cross-reference against things you know exist at the start or end addresses. So, in other words, the best way is to do it manually, by hand.
UPDATE
I may have something that you would find interesting: http://en.wikipedia.org/wiki/Intel_HEX. At the bottom of the page is exactly what I believe you are looking for, but I stand by my position that validating it by hand is still best.
You can use the following .NET library for this: IntelHexFormatReader.
(Disclaimer: I wrote the library).
Validating a single line in the HEX file could look like this:
string hexRecordLine = ":100130003F0156702B5E712B722B732146013421C7";
IntelHexRecord record = HexFileLineParser.ParseLine(hexRecordLine);
Console.WriteLine(record.ByteCount);               // number of data bytes in the record
Console.WriteLine(record.Address.ToString("X4"));  // load address, formatted as 4 hex digits
Console.WriteLine(record.RecordType.ToString());   // record type (Data, EndOfFile, ...)
Console.WriteLine(record.Bytes[5]);                // individual data bytes, printed as integers
Console.WriteLine(record.Bytes[9]);
Console.WriteLine(record.CheckSum);                // the record's checksum byte
Resulting in the following output:
16
0130
Data
94
43
199
The tool does exhaustive validation (check the unit tests to get an idea of which validations are in place) and also lets you build a full "memory representation" of an entire Intel HEX file.
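If you would rather check a record's checksum yourself, here is a minimal, self-contained C# sketch (not part of the library; it only assumes the standard Intel HEX rule that the sum of all record bytes, including the trailing checksum byte, must be 0 modulo 256):

using System;
using System.Linq;

class HexChecksum
{
    // An Intel HEX record is valid when the sum of ALL of its bytes,
    // including the trailing checksum byte, is 0 modulo 256.
    static bool IsRecordValid(string line)
    {
        // A record is ':' followed by an even number of hex digits.
        if (string.IsNullOrEmpty(line) || line[0] != ':' || line.Length % 2 == 0)
            return false;

        // Convert the hex pairs after the leading ':' into bytes.
        var bytes = Enumerable.Range(0, (line.Length - 1) / 2)
                              .Select(i => Convert.ToByte(line.Substring(1 + 2 * i, 2), 16));

        return bytes.Sum(b => (int)b) % 256 == 0;
    }

    static void Main()
    {
        // The record from the example above; prints True.
        Console.WriteLine(IsRecordValid(":100130003F0156702B5E712B722B732146013421C7"));
    }
}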
I am getting data from a broken RSS feed that gives me the wrong link. I wanted to fix this link, so I made this regex:
<link.*>(.*)&.*tid(.*)</link>
and the link could be like:
www.somedomain.com/?value=50&burrrdurrrr;tid=120
But the real working link is in this form:
www.somedomain.com/?value=50&tid=120
The thing that I'm asking is: if my measure looks like this:
[FeedURL]
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=[Feed]
StringIndex=2 ;now I only get www.somedomain.com/?value=50
Substitute=#SubstituteFeed#
How am I supposed to concatenate the strings together to complete the URL?
I'm guessing that rather than &burrrdurrrr;, the link actually has &amp;, which is how you have to write & in an HTML or XML file.
If that's the case, you just need to set the DecodeCharacterReference option, as described in this handy-looking tutorial. Another option mentioned there is Substitute, which would be able to strip it out even if it really was &burrrdurrrr;.
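For illustration, the measure from the question might then look something like this (assuming the feed really does contain &amp;, and that your WebParser version supports these options):

[FeedURL]
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=[Feed]
StringIndex=2
DecodeCharacterReference=1
; or, if the entity really is &burrrdurrrr;, strip it out instead:
; Substitute="&burrrdurrrr;":"&"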
None of this is a particularly sensible way of dealing with HTML or XML - a much better approach would be a plugin which actually parses the document structure and lets you reference nodes using XPath or CSS rules - but you work with what you've got, I guess. (I'd never heard of this "Rainmeter" before, despite its claim to be "the best known and most popular desktop customization program for Windows"; maybe that's because nobody else calls their program that, instead almost universally using the word "widget"?)
I am processing some CDRs (call detail records). I don't know exactly what the file format is, but I believe these are ASN.1 BER-encoded files. My problem is that I want to modify some data in these files, but I don't know which editor or decoder I can use to do that. I have searched a lot and found many ASN.1 decoders as well as ASN.1 BER viewers/editors, but none of them allows what I want to do.
The CDR is supposed to contain customer details, phone numbers, telecom services (telephony, SMS, MMS), etc.
One of the CDR file names is GGSN01_20120105000102_56641-09-12-01-09%3A30
and the file type is simply "File".
No other information is available. When I open this file in a text editor it shows some rectangles and some text data.
Anyone from the telecom domain can definitely help me; I am new to this domain.
Please ask if you need more information. Thanks.
You would need to know something about ASN.1 and BER to be able to correctly edit your file. BER is a binary format, not ASCII text, hence the rectangles you see in your text editor. Even modifying any embedded plain text is only safe if you are not changing the length of the string: BER uses nested structures that encode lengths, so a change in the length of a string value requires adjustments to the encoded lengths of all the enclosing structures. Additionally, in order to really know what your data is, you would need the ASN.1 that describes it (the definitions of the types that describe your encoded data).
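To make the length problem concrete, here is a small illustrative C# sketch (not tied to any particular tool, and not a full BER parser) that decodes the tag and length of a single BER TLV element. Note how a long-form length occupies a variable number of bytes, which is exactly what must be recomputed up the whole nesting chain when you grow or shrink a value:

using System;

class BerTlv
{
    // Reads the tag and length of one BER TLV element starting at 'offset'.
    // Returns the offset of the value bytes; 'length' is -1 for indefinite length.
    static int ReadHeader(byte[] data, int offset, out int tag, out int length)
    {
        tag = data[offset++];
        if ((tag & 0x1F) == 0x1F)                // high-tag-number form: more tag bytes follow
        {
            while ((data[offset++] & 0x80) != 0) { }
        }

        int first = data[offset++];
        if (first < 0x80)                        // short form: length fits in 7 bits
        {
            length = first;
        }
        else if (first == 0x80)                  // indefinite form: value ends with 00 00
        {
            length = -1;
        }
        else                                     // long form: next (first & 0x7F) bytes hold the length
        {
            length = 0;
            for (int i = 0; i < (first & 0x7F); i++)
                length = (length << 8) | data[offset++];
        }
        return offset;                           // value bytes start here
    }

    static void Main()
    {
        // 0x30 = SEQUENCE of length 3, containing an INTEGER (0x02) of length 1, value 5.
        byte[] sample = { 0x30, 0x03, 0x02, 0x01, 0x05 };
        int tag, length;
        int valueOffset = ReadHeader(sample, 0, out tag, out length);
        Console.WriteLine("tag=0x{0:X2} length={1} valueOffset={2}", tag, length, valueOffset);
    }
}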
You could use a tool such as ASN.1 editor, but without the requisite background knowledge, I think it will not be very helpful to you. You can follow various links on this resources page to get more information about ASN.1. (full disclosure: I am currently an Obj-Sys employee).
Look for tools like enber and unber; they come as debugging tools with Lev Walkin's free ASN.1 compiler (asn1c). At least you get a text format out of them.
The systematic solution is, of course, to write a program that reads the BER file, applies the changes, and then writes out the altered BER file. To do so you need the ASN.1 specification of your CDR format (usually found in the specification of the standard you are using, e.g. IMS), an ASN.1 compiler such as Lev's, and some programming skills.
I have two barcodes that I am working with. They are clearly different, but both scan as Code 128. One is weird and one is normal. I have tried to reprint the data for the barcode in every way I can think of, so I can see which subset (A, B or C) is being used.
For the normal one, I know it is A for the first 10 characters, and then it changes the encoding to B.
I cannot seem to find out how to see what the encoding is on the other (weird) one.
I am using a Symbol scanner. (I turned on the prefix character, but that only told me "D", i.e. Code 128.)
Is there any tool to allow me to dig into the barcode symbologies?
I know very little about barcodes and zero about non-European ones, but for weird implementations of Code 128, there is also GS1-128.
This online barcode generator looks quite nice and can generate a lot of formats you might want to check against.
Chiming in late here, but the ZXing library (I'm a developer) reads Code 128. You could have it scan the barcode while you attach a debugger to the code. It would show you exactly what's happening, step by step, in the decoding, including subset changes.
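As a starting point, here is a minimal sketch using the .NET port, ZXing.Net (this assumes the System.Drawing bitmap binding and a local image file named barcode.png, both of which are assumptions for illustration); the subset switches themselves only become visible if you then step into Code128Reader under the debugger:

using System;
using System.Collections.Generic;
using System.Drawing;
using ZXing;

class DecodeBarcode
{
    static void Main()
    {
        // Restrict decoding to Code 128 so nothing else matches by accident.
        var reader = new BarcodeReader();
        reader.Options.PossibleFormats = new List<BarcodeFormat> { BarcodeFormat.CODE_128 };

        using (var bitmap = (Bitmap)Image.FromFile("barcode.png"))
        {
            var result = reader.Decode(bitmap);
            Console.WriteLine(result == null
                ? "No barcode found"
                : result.BarcodeFormat + ": " + result.Text);
        }
    }
}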
I know the problem is fixed, but here are some more resources in case someone needs them :)
As Pekka mentioned, Code 128 has subsets: Code 128A, Code 128B, Code 128C, and GS1-128 (UCC/EAN-128). Here is more information on the Code 128 barcode, with an encoding pattern illustration.
Thanks to those who answered and commented.
Turns out the company that made our barcodes had a printing error. That caused the barcode to look different.
How it ever decoded successfully, I do not know. Anyway, I am going to award the answer to Pekka because he gave me a workable solution.
Sadly, a project that I have been working on lately has a large amount of copy-and-paste code, even within single files. Are there any tools or techniques that can detect duplication or near-duplication within a single file? I have Beyond Compare 3 and it works well for comparing separate files, but I am at a loss for comparing single files.
Thanks in advance.
Edit:
Thanks for all the great tools! I'll definitely check them out.
This project is an ASP.NET/C# project, but I work with a variety of languages including Java; I'm interested in what tools are best (for any language) to remove duplication.
Check out Atomiq. It finds duplicate code that is prime for extraction to one location.
http://www.getatomiq.com/
If you're using Eclipse, you can use the copy-paste detector (CPD): https://olex.openlogic.com/packages/cpd.
You don't say what language you are using, which is going to affect what tools you can use.
For Python there is CloneDigger. It also supports Java, but I have not tried that. It can find code duplication both within a single file and between files, and gives you the result as a diff-like report in HTML.
See SD CloneDR, a tool for detecting copy-paste-edit code within and across multiple files. It detects exact copies, copies that have been reformatted, and near-miss copies with different identifiers, literals, and even different sequences of statements.
The CloneDR handles many languages, including Java (1.4, 1.5, 1.6) and C# up to C# 4.0. You can see sample clone detection reports at the website, including one for C#.
ReSharper does this automagically - it suggests when it thinks code should be extracted into a method, and will do the extraction for you.
Check out PMD. Once you have configured it (which is fairly simple), you can run its copy-paste detector to find duplicate code.
Anyone with some Office skills can do the following sequence in a minute:
use an ordinary formatter to unify the code style, preferably without line wrapping
feed the code text into Microsoft Excel as a single column
search and replace all double spaces with single ones, and do other normalizing replacements
sort the column
At this point the duplicated lines will already stand out well. To go further:
add a comparator formula to a 2nd column and a counter to a 3rd
copy and paste the values again, sort, and see the most repetitive lines (a small script can do the same; see the sketch below)
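For instance, a rough, hypothetical C# sketch of the same idea (normalize each line, group identical ones, count the repeats), reading the file named on the command line:

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

class DuplicateLines
{
    static void Main(string[] args)
    {
        // Normalize each line (trim, collapse runs of whitespace), then count repeats.
        var groups = File.ReadLines(args[0])
            .Select(line => Regex.Replace(line.Trim(), @"\s+", " "))
            .Where(line => line.Length > 0)
            .GroupBy(line => line)
            .Where(g => g.Count() > 1)
            .OrderByDescending(g => g.Count());

        foreach (var g in groups)
            Console.WriteLine("{0,4}x  {1}", g.Count(), g.Key);
    }
}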
There is an analysis tool called Simian which I haven't tried yet. Supposedly it can be run on any kind of text and point out duplicated items. It can be used via a command line interface.
Another option similar to those above, but with a different tool chain: https://www.npmjs.com/package/jscpd
What are the best algorithms for recognizing structured data on an HTML page?
For example, Google will recognize a home or company address in an email and offer a map of that address.
A named-entity extraction framework such as GATE has at least tackled the information extraction problem for locations, assisted by a gazetteer of known places to help resolve common issues. Unless the pages were machine generated from a common source, you're going to find regular expressions a bit weak for the job.
If you have the markup proper - and not just the text from the page - I second the Beautiful Soup suggestion above. In particular, the address tag should provide the lowest of low-hanging fruit. Also look into the adr microformat. I'd only fall back to regexes if the first two didn't pull enough info or I didn't have the necessary data to look for the first two.
If you also have to handle international addresses, you're in for a world of headaches; international address formats are amazingly varied.
I'd guess that Google takes a two-step approach to the problem (at least that's what I would do). First they use some fairly general search pattern to pick out everything that could be an address, and then they use their map database to look up that string and see if they get any matches. If they do, it's probably an address; if they don't, it probably isn't. If you can use a map database in your code, that will probably make your life easier.
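A toy C# sketch of that two-step idea; the candidate pattern is deliberately naive and US-centric, and VerifyWithMapService is a hypothetical stub standing in for a real geocoding lookup:

using System;
using System.Text.RegularExpressions;

class AddressGuesser
{
    // Step 1: a deliberately naive, US-centric candidate pattern.
    static readonly Regex Candidate = new Regex(
        @"\d{1,5}\s+\w+(\s\w+)*\s+(Street|St|Avenue|Ave|Boulevard|Blvd|Road|Rd)\b",
        RegexOptions.IgnoreCase);

    // Step 2: hypothetical stub; a real system would query a map/geocoding service.
    static bool VerifyWithMapService(string candidate)
    {
        return true; // assume everything checks out for this sketch
    }

    static void Main()
    {
        string text = "Visit us at 221 Baker Street or email info@example.com.";
        foreach (Match m in Candidate.Matches(text))
            if (VerifyWithMapService(m.Value))
                Console.WriteLine("Possible address: " + m.Value);
    }
}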
Unless you can limit the geographic location of the addresses, I'm guessing that it's pretty much impossible to identify a string as an address just by parsing it, simply due to the huge variation of address formats used around the world.
Do not use regular expressions to parse the page itself. Use an existing HTML parser; in Python, for example, I strongly recommend BeautifulSoup. Regular expressions are still fine for picking data out of the text of the elements BeautifulSoup grabs.
If you do it all with your own regexes, you not only have to worry about finding the data you require, you also have to worry about things like invalid HTML and lots of other very non-obvious problems you'll stumble over.
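For a C# equivalent of that parser-first approach (using the HtmlAgilityPack NuGet package as a stand-in for BeautifulSoup; the inline HTML is a made-up example), something like this would grab address elements before any regex work:

using System;
using HtmlAgilityPack;

class ParseFirst
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><address>221 Baker Street, London</address></body></html>");

        // Let the parser find the elements; only then inspect their text.
        var nodes = doc.DocumentNode.SelectNodes("//address");
        if (nodes != null)
            foreach (var node in nodes)
                Console.WriteLine(node.InnerText.Trim());
    }
}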
What you're asking is really quite a hard problem if you want to get it perfect. While a simple regexp will get it mostly right most of the time, writing one that will get it exactly right every time is fiendishly hard. There are plenty of strange corner cases, and in several cases there is no single unambiguous answer. Most web sites that I've seen do a pretty bad job handling all but the simplest URLs.
If you want to go down the regexp route, your best bet is probably to check out the source code of
http://metacpan.org/pod/Regexp::Common::URI::http
Again, regular expressions should do the trick.
Because of the wide variety of address formats, you can only guess whether a string is an address using an expression like "(number), (name) Street|Boulevard|Main", etc.
You could consider looking into some Firefox extensions that aim to map addresses found in text, to see how they work.
You can check this USA address extraction example: http://code.google.com/p/graph-expression/wiki/USAAddressExtraction
It depends upon your requirements.
For emails and contact details, a regex is more than enough.
For addresses, a regex alone will not help; think about NLP, i.e. named-entity recognition (NER) and POS tagging.
For finding people-related information, you can't do anything without NER.
If you need information like paragraphs, get the contents by using tags.