What is the best output format / platform to display different sorts of extracted data? - shell

I am writing a script that extracts different types of data from different kinds of custom log files.
But before I continue writing, I want to determine what output format / platform it should use, so that the results are displayed and read properly.
examples:
sometimes it is certain lines of text with an important word in them;
sometimes it is a block of text between a start and an end phrase;
sometimes it is data points, which I then want to visualize in a line chart;
....
or it is a combination of those.
At first I thought I would write the output as a Markdown file, so that I could, for instance, create foldable blocks and just unfold the part I want to read.
But Markdown is not versatile enough, meaning I can't create line charts or other kinds of visualizations (thinking about the future).
So now I put the different types of data into different output formats and visualize them together in one HTML file:
the blocks of text go into a Markdown file, which I then import through a JavaScript Markdown viewer;
for the data points, I create a line chart through a JavaScript charting library;
.....
However, I am not sure that this is the best/correct way to go.
What is your advice?
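For concreteness, here is a minimal sketch of that combined-HTML idea in Python; every file name, field, and value below is invented, and Chart.js (loaded from a CDN) merely stands in for whatever charting library the script ends up using. The HTML <details> element gives foldable blocks without needing Markdown at all.

import json

# All names and values below are invented, for illustration only.
matches = ["ERROR at 12:01 ...", "ERROR at 12:07 ..."]                 # extracted lines
points = {"labels": ["12:00", "12:05", "12:10"], "values": [3, 7, 5]}  # extracted data points

html = f"""<!doctype html>
<html><body>
<details><summary>Matched lines</summary>
<pre>{chr(10).join(matches)}</pre>
</details>
<canvas id="chart"></canvas>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<script>
new Chart(document.getElementById("chart"), {{
  type: "line",
  data: {{ labels: {json.dumps(points['labels'])},
           datasets: [{{ label: "events", data: {json.dumps(points['values'])} }}] }}
}});
</script>
</body></html>"""

with open("report.html", "w", encoding="utf-8") as f:
    f.write(html)

A single self-contained report like this stays portable: no server needed, and it opens in any browser.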

Related

Using Multiple Variables to Create a Single Data Matrix in ZPL

I am trying to set up a label template for a Zebra printer that has 4 variables containing different pieces of information: FN1, FN2, FN3, and FN4. These 4 variables are printed to text and barcode fields on the label; however, I also want to concatenate them to create a single Data Matrix containing the data from all 4 variables.
I have tried inserting line breaks using \& as suggested on page 144 of the documentation, but this does not seem to have any effect. The code for the field data I'm trying to use is shown below.
^FH\^FN1\&^FN2\&^FN3\&^FN4^FS
Only the content of the last variable in the list (FN4, in this case) is encoded into the Data Matrix; the rest are ignored. I suspect I'm missing something fairly straightforward, but I have not been able to find any articles relating to this exact problem.
For anyone else looking for this kind of solution: I have received confirmation from Zebra that this is unfortunately not currently (as of 01/02/2023) possible in ZPL. You should be able to implement it with ZBI, but that is quite a different approach and requires compatible hardware.
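If the label data is assembled host-side anyway, a common workaround (distinct from both ^FN recall and ZBI) is to concatenate the values in the sending application and emit them as a single ^FD field. A rough Python sketch; the values, coordinates, and barcode parameters are invented:

# Host-side workaround sketch: build the Data Matrix payload before sending,
# instead of recalling ^FN variables on the printer.
fields = ["ABC123", "LOT42", "2023-02-01", "10"]   # example values for FN1..FN4
payload = "_0D_0A".join(fields)                    # hex-escaped CR/LF separators

zpl = (
    "^XA"
    "^FO50,50"
    "^BXN,10,200"        # Data Matrix, 10-dot module, ECC 200
    "^FH_"               # enable hex escapes, using '_' as the indicator
    f"^FD{payload}^FS"
    "^XZ"
)
print(zpl)  # send this string to the printer instead of recalling variables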

How do I find formatting settings for CSV on Mac?

I have a Python program that extracts data from an API, applies transformations, and converts it to a CSV to be used in Tableau. When I view the file in Excel or Google Sheets, it looks fine: no formatting or read errors, as it is standard UTF-8.
When I read it in Tableau, it is a different story: the columns lose shape and get parsed incorrectly.
I am thinking it has to do with the fact that my data set is text-heavy and contains punctuation, but I have been able to work with data in this format just fine without any custom formatting.
It looks like your CSV has multiline fields (which are quoted).
You'll somehow have to tell the Tableau reader/parser to treat your data as quoted (and multiline).
Also check how quotes are escaped when they appear inside a field: usually this is done with another quote, but it could also be done with a backslash.
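On the writing side, a minimal sketch of producing a CSV whose text-heavy fields are quoted the way most parsers expect, using Python's standard csv module (the column names and rows are invented):

import csv

rows = [
    {"id": 1, "comment": 'He said "hi",\nthen left.'},  # comma, quote, and newline in one field
    {"id": 2, "comment": "plain text"},
]

with open("out.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["id", "comment"],
        quoting=csv.QUOTE_ALL,   # quote every field, not just the risky ones
        doublequote=True,        # escape " inside a field as "" (RFC 4180 style)
    )
    writer.writeheader()
    writer.writerows(rows)

If Tableau still misparses the result, the reader settings (quote character, field separator) are the next thing to check.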

How do file converters work in general (Word to PDF, XML to JSON, Word to TXT, etc.)?

I've used many types of file converters: Word to PDF, XML to JSON, Word to TXT, etc.
How do they work in the backend? Are there specific guidelines each of them follows? Are there similarities in the way they are implemented?
I tried searching, but most of the articles take me to a web app that can convert the document; none of them gives clarity on how it's done.
All of them work by parsing the first document into a data structure, then generating a document in the other format from that data structure, typically using recursion.
Parsing itself is a giant topic that people take courses on in computer science. But long story short, it proceeds by breaking the document into tokens, and then fitting the tokens into a parse tree using one of a standard set of methods. These have all sorts of fancy names, like Recursive Descent and LALR(1). That's where most of the theory you'd want to learn is.
For example, if you're writing a JSON to XML converter, you'd first need to parse the JSON. A JSON Parser shows how you could write that, from scratch, using recursive descent. Once it is written, you just need a recursive function that takes each data type and does something appropriate with it to generate text in the format you want.
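A minimal sketch of that second half, the recursive generation step, leaning on Python's built-in json module for the parsing; the tag names and the <root>/<item> wrapping are my own convention, not part of any standard:

import json
from xml.sax.saxutils import escape

def to_xml(value, tag="root"):
    # Walk the parsed data structure recursively, emitting one element per value.
    if isinstance(value, dict):
        inner = "".join(to_xml(v, k) for k, v in value.items())
        return f"<{tag}>{inner}</{tag}>"
    if isinstance(value, list):
        inner = "".join(to_xml(v, "item") for v in value)
        return f"<{tag}>{inner}</{tag}>"
    return f"<{tag}>{escape(str(value))}</{tag}>"

doc = json.loads('{"user": {"name": "Ada", "tags": ["a", "b"]}}')
print(to_xml(doc))
# <root><user><name>Ada</name><tags><item>a</item><item>b</item></tags></user></root>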
Incidentally, you can also write a "document converter" that converts from a document format to the same document format. Why would someone want to do that? The two most common use cases are to prettify or minify code. Even though only one format is being dealt with, the principles of how you do it are exactly the same.

Snapshot testing PDFs [duplicate]

I am generating and storing PDFs in a database.
The PDF data is stored in a text field using Convert.ToBase64String(pdf.ByteArray).
If I generate the exact same PDF that already exists in the database and compare the two base64 strings, they are not the same. A big portion is the same, but about 5-10% of the text is different each time.
What would make two PDFs different if both were generated using the same method?
This is a problem because I can't tell whether the PDF was modified since it was last saved to the db.
Edit: the two PDFs look exactly the same when viewing the actual PDF, but the base64 strings of the bytes are different.
Two PDFs that look 100% the same visually can be completely different under the covers. PDF producing programs are free to write the word "hello" as a single word or as five individual letters written in any order. They are also free to draw the lines of a table first followed by the cell contents, or the cell contents first, or any combination of these such as one cell at a time.
If you are actually programmatically creating the PDFs and you create two PDFs using completely identical code, you still won't get files that are 100% identical. There are a couple of reasons for this; the most obvious is that PDFs support creation and modification dates, which will obviously change depending on when the files are created. You can override these (and confuse everyone else, so I don't recommend it) using something like this:
// Pin the document's creation and modification dates to a fixed value
// so that repeated runs do not differ on these metadata fields (iText).
var info = writer.Info;
info.Put(PdfName.CREATIONDATE, new PdfDate(new DateTime(2001, 01, 01)));
info.Put(PdfName.MODDATE, new PdfDate(new DateTime(2001, 01, 01)));
However, PDFs also support a unique identifier in the trailer's /ID entry. To the best of my knowledge, iText has no support for overriding this parameter. You could duplicate your PDF, change this manually, and then calculate your differences; that might get you closer to a comparison.
Then there are fonts. When subsetting fonts, producers create a unique internal name based on the original name and an arbitrary selection of six uppercase ASCII letters. So for the font Calibri, the subset name could be JLXWHD+Calibri one time and SDGDJT+Calibri another time. iText doesn't support overriding this, because you'd probably do more harm than good: these internal names exist to avoid font subset collisions.
So the short answer is that unless you are comparing two files that are physical duplicates of each other, you can't perform a direct comparison on their binary contents. The long answer is that you can tweak some of the PDF entries to remove the unique parts for comparison only, but you'd probably be doing more work than it would take to just re-store the file in the database.
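For what it's worth, here is a rough sketch of that "remove the unique parts, then compare" idea in Python. pikepdf is just one library with Info and trailer access, not what the question's code uses, and as noted above, differing font-subset names can still defeat this:

import hashlib
import pikepdf

def normalized_digest(src, dst):
    # Strip the volatile dates, rewrite with a content-derived /ID, then hash.
    with pikepdf.open(src) as pdf:
        for key in ("/CreationDate", "/ModDate"):
            if key in pdf.docinfo:
                del pdf.docinfo[key]
        pdf.save(dst, deterministic_id=True)  # /ID derived from file contents
    with open(dst, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

same = normalized_digest("a.pdf", "a_norm.pdf") == normalized_digest("b.pdf", "b_norm.pdf")
print(same)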

How do I plot some data (using xmgrace in the terminal) using dots, not lines, without explicitly changing it in the GUI?

I'm using xmgrace in the terminal and want the data to be displayed directly as dots instead of lines. Achieving this in the GUI is simple, but I have to read in multiple files and do not want to change it every time I start xmgrace. Can I add a command to the files that are read in? Or can I use an option in the terminal when I start xmgrace?
The correct way to set the appearance of a plot from the command line is to use an existing parameter file, specified using the flag
-param settings.par
The parameter file can be stored beforehand, using the GUI to modify the appearance of an existing, similar plot. Modify the plot as you like, then save the appearance settings in a parameter file (convention is to use the extension .par) using Plot > Save Parameters.
A typical example command would then be
xmgrace -block data2.dat -bxy 1:4 -block data2.dat -bxy 1:6 -param settings.par
In my experience, passing the -param flag last in your command works best.
There really is no need to be manually text-editing your grace plot files (.agr) to achieve this.
xmgrace has a full and complex language for expressing the configuration of a graph's look and feel. There are two ways to go about what you described. The simple way is to load the dataset into xmgrace, change everything to make it look the way you want, then save the dataset. You will see that the saved file now has tons of lines describing the configuration ("@g0 on", "@ s0 linestyle 1", etc.) with your dataset at the end, terminated by an &.
To replicate that graph, spit out the saved header, insert your data, and then insert the trailing &. Feed the result into xmgrace and everything will be all set up. Once you get comfortable, you can start doing dynamic substitutions to rename the graph, change the symbol, or whatever. See /usr/share/grace/examples for examples of what grace can do (and the config files which generate them).
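A minimal sketch of that header-reuse recipe in Python; header.agr (the saved configuration lines) and data.dat are assumed file names:

# Prepend the saved configuration header, append the raw data, and
# terminate the set with '&' so xmgrace reads it as a styled project.
with open("header.agr") as f:
    header = f.read()
with open("data.dat") as f:
    data = f.read()
with open("plot.agr", "w") as out:
    out.write(header)
    out.write(data)
    out.write("&\n")
# then run: xmgrace plot.agr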
The more complex method is to load the dataset, save it immediately, change it to look the way you want, and then save it again under a different name. Run diff on the two files and you will get a set of changes. You might need at most a handful of other lines from the non-changing portion, but that is somewhat rare. This produces the minimal set of fixed headers you need to prepend to the dataset. It usually isn't worth the effort to reduce the prefix size.
