Flat file data validation - validation

I need to load data that arrives in flat files (CSV). The problem is that the supplier generates a lot of junk data.
Before I start developing anything of my own, I would like to ask whether there is something that could automate the validation.
I have found an open source tool called Flat File Checker. It accepts a variety of validation rules, including regex, and it is exactly what I need, but it does not work: it does not actually validate anything.
Does anyone have a suggestion for something similar that actually works?
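For reference, the kind of rule-based check described above can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a replacement for a full tool: the column names (id, email, amount) and their regex rules are hypothetical and would have to be replaced with the real layout of the supplier's files.

```python
import csv
import io
import re

# Hypothetical rules: column name -> regex that the whole value must match.
RULES = {
    "id": re.compile(r"\d+"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "amount": re.compile(r"-?\d+(\.\d{1,2})?"),
}

def validate_csv(stream, rules):
    """Return (good_rows, errors); errors are (line_number, column, value) tuples."""
    good_rows, errors = [], []
    reader = csv.DictReader(stream)
    for row in reader:
        row_errors = []
        for column, pattern in rules.items():
            value = row.get(column)
            # A missing column or a value that fails the regex counts as an error.
            if value is None or not pattern.fullmatch(value.strip()):
                row_errors.append((reader.line_num, column, value))
        if row_errors:
            errors.extend(row_errors)
        else:
            good_rows.append(row)
    return good_rows, errors

# Made-up sample data, for illustration only.
sample = io.StringIO(
    "id,email,amount\n"
    "1,a@example.com,10.50\n"
    "x2,not-an-email,7\n"
)
good, bad = validate_csv(sample, RULES)
print(len(good), "valid rows")
for line_no, column, value in bad:
    print(f"line {line_no}: bad {column!r} value {value!r}")
```

Rows that pass every rule can be loaded, and the error list can be sent back to the supplier with line numbers.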

Related

What is the required file format for Google AutoML Datasets?

Whenever I try to upload my dataset to the AutoML Natural Language Web UI, I get the error
Something is wrong, please try again.
The documentation is not very insightful about how my CSV file is supposed to look, so I made a simple sample file just to make sure it works at all. It looks like this:
text,label
asdf,cat
asodlkao,dog
asdkasdsadksafask,cat
waewq23,cat
dads,cat
saiodjas,cat
skdoaskdoas,dog
hgfkgizk,dog
fzdrgbfd,cat
otiujrhzgf,cat
vchztzr,dog
aksodkasodks,dog
sderftz,dog
dsoakd,dog
qweqweqw,cat
asdqweqe,cat
dkawosdkaodk,dog
ewqeweq,cat
fdsffds,dog
bvcghh,cat
rthnghtd,dog
sdkosadkasodk,cat
sdjidghdfig,cat
kfodskdsof,dog
saodsadok,dog
ksaodksaod,dog
vncvb,cat
I chose this formatting according to Google's suggested syntax.
But even with this formatting I still get the same error.
I've seen the question "Format of the input dataset for Google AutoML Natural Language multi-label text classification", but according to the answers there my formatting should work, so I do not know why I get the error.
I've just copied the CSV file and uploaded it to my own project, and the dataset creation worked. One problem is that an extra label, "label", was created. This is because a header row is not expected in the CSV file (this should probably get fixed).
Based on that, it seems the problem isn't the CSV file format. I would recommend checking whether your project is set up correctly. You can also open a bug to get someone's help, either in the public issue tracker or by sending feedback through the UI (there is a 'Feedback' option in the menu at the top right of the page).
I have found the problem! As Michal K said, there was nothing wrong with the formatting. The real problem was that I had not been assigned the Storage Object Creator role, which is necessary because the data is uploaded to Cloud Storage first.
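To avoid the stray "label" label mentioned above, the header row can be dropped before uploading. The following is a minimal sketch using Python's standard csv module; the function name strip_header is made up for illustration, and the assumption is that the first row is exactly text,label as in the sample file.

```python
import csv
import io

def strip_header(src, dst, header=("text", "label")):
    """Copy CSV rows from src to dst, skipping the first row if it equals header."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for index, row in enumerate(reader):
        # Only the very first row is ever treated as a header.
        if index == 0 and tuple(cell.strip().lower() for cell in row) == header:
            continue
        writer.writerow(row)

# In-memory demo with the first rows of the sample file.
src = io.StringIO("text,label\nasdf,cat\nasodlkao,dog\n")
dst = io.StringIO()
strip_header(src, dst)
print(dst.getvalue(), end="")
```

With real files, src and dst would be file objects opened with newline="" as the csv module documentation recommends.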

Could not load document /LoW.xml. Maybe it is not valid TEI or not in the TEI namespace?

I am sure this is quite basic, but I'm putting together a training database with some files on the loss of a particular ship.
When I upload it to eXist-db (I created the DB with TEI Publisher ...) and try to display the file, I get the above error.
I am quite sure that my file is okay; however, I am quite irritated.
Please find the tei-code in the following paste.
https://pastebin.com/0H3jCfe8
The header is included below as code, as required (and irritating):
<teiHeader>
<fileDesc>
<titleStmt>
<title>TITLE</title>
</titleStmt>
<editionStmt/>
<extent/>
<publicationStmt/>
<seriesStmt/>
<notesStmt/>
<sourceDesc>Description</sourceDesc>
</fileDesc>
<revisionDesc/>
</teiHeader>
Oxygen XML reports an IllegalStateException, but I cannot find the error.

Create Multiple Slides from a List with Common Template

I have created a certificate design in PowerPoint.
Now I have to create 100+ copies of it, each with a different name (the recipient).
I was wondering if there is an easy way to do it.
I can have the list of names in Excel or TXT.
I am open to other ideas as well, like converting the slide into an image and batch processing it in a simple way.
You may also try out SlideMight, a tool for merging hierarchical data with PowerPoint templates. SlideMight supports iteration over data, to generate slides or to populate tables. There is more functionality, but you don't seem to need that. SlideMight is in fact a coding system, like mail merge for Word is.
Input data format is at this time just JSON; you would need to convert your Excel sheets first, e.g. using this Excel to JSON add-in for Excel.
There are versions for Windows and Mac OS X.
More information is at www.SlideMight.com
Disclaimer:
I am the owner of Delftware Technology, the company that developed SlideMight.
And I am one of the developers.
This is a question that really belongs in SuperUser, not StackOverflow (which is intended for coding questions, not software how-to-use questions).
But ...
Save your names to a plain notepad TXT file, one name per line.
Start PowerPoint, choose File, Open and point to your TXT file (you may force the matter by choosing . in Files of type).
Apply whatever template you like to the result.
I have a commercial add-in that'll do this and quite a bit more, but from your description, you don't need it.
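If the names start out in Excel rather than in a text file, the one-name-per-line file described above can be produced from a CSV export. This is only a sketch: it assumes the sheet was saved as CSV and that the names sit in a column headed name, which is a made-up header for illustration.

```python
import csv
import io

def names_to_lines(csv_stream, column="name"):
    """Return the non-empty values of one CSV column, one per line."""
    reader = csv.DictReader(csv_stream)
    names = []
    for row in reader:
        name = (row.get(column) or "").strip()
        if name:  # skip blank cells so no empty slide is created
            names.append(name)
    return "\n".join(names) + "\n"

# Made-up sample export, for illustration only.
export = io.StringIO("name,course\n Ada Lovelace ,Math\n,Physics\nAlan Turing,Logic\n")
print(names_to_lines(export), end="")
```

The returned text can be written to a .txt file and opened in PowerPoint as described.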

How to make Compass/Sass compilation command generate parsable output

I’d like to automate the compilation of Compass projects and be able to get output that I can parse so I can take only what I need (the errors) and further format them how I want.
The issue is that Compass output is not in a format that can be easily parsed (it has error messages on multiple lines).
Is there any reliable way to parse this output? Or… any idea what would need to be changed and where in Compass’s code to allow a new param that would allow you to specify the output format (e.g. JSON, XML)?
I’m asking this because I don’t know Ruby, so I would need a starting point. Their current code is not easy to understand (due to the fact that I don’t know Ruby), but if I at least have a starting point I would try to see what I can do and hopefully create a pull request with this if I get it working.
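One way to approach the parsing without touching Compass is to post-process its output and fold multi-line messages into single records. The sketch below is in Python and rests on assumptions: that each record starts with a status word followed by a file path, and that continuation lines do not start that way. The status words and the sample text are illustrative, not a specification of Compass's output.

```python
import json
import re

# Assumed record start: optional indentation, a status word, then a path.
RECORD_START = re.compile(
    r"^\s*(error|warning|create|overwrite|identical|unchanged)\s+(\S+)\s*(.*)$"
)

def parse_records(text):
    """Group output lines into records, folding continuation lines together."""
    records = []
    for line in text.splitlines():
        match = RECORD_START.match(line)
        if match:
            status, path, rest = match.groups()
            records.append({"status": status, "file": path, "message": rest.strip()})
        elif records and line.strip():
            # Continuation line: append it to the previous record's message.
            previous = records[-1]
            previous["message"] = (previous["message"] + " " + line.strip()).strip()
    return records

def errors_as_json(text):
    """Keep only the error records and serialize them as JSON."""
    errors = [r for r in parse_records(text) if r["status"] == "error"]
    return json.dumps(errors, indent=2)

# Hypothetical sample output with an error message split over two lines.
sample = (
    "    error sass/screen.scss (Line 12: Undefined variable:\n"
    '          "$brand".)\n'
    "   create stylesheets/print.css\n"
)
print(errors_as_json(sample))
```

A continuation line that happens to begin with one of the status words would be misread as a new record, so the pattern would need tightening against real output.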
I think there is another way to solve this problem: what do you think about parsing the output CSS and not touching Compass at all?
There is a good framework for creating CSS postprocessors:
https://github.com/postcss/postcss
You can do whatever you want with the output CSS, such as printing a message to the console or sending an email.

CSV import with user correction

I'm looking for general UI advice on importing a CSV file. The UI is done in ASP.NET MVC3.
When the user uploads the file I need to validate it and allow them to manually correct any errors within the browser before I store it in the database. There are so many potential errors to check for, and I'm really not sure what the best way to achieve this is. Another thing is that I only have a few days to implement this, so it can't be too complicated. I'm fine with regular expressions and programming, and I already have the posted file stream available, but I just can't think of a good and practical way to present this functionality to the user.
Hope someone can inspire me. Many thanks.
There are some suggestions here:
Reading a CSV file in .NET?
Of these, we chose to use Linq2CSV in our MVC projects.
http://www.codeproject.com/KB/linq/LINQtoCSV.aspx
It is fairly easy to use, and validation is nice. You define a simple class that lays out the structure (columns) of the csv file. It will do basic validation, and if that passed, we sent it through a Validator that used DataAnnotation attributes to validate against more complex rules. We found it reliable, and we were able to add some features to it that we wanted.
If the file was pathologically bad, we'd fail the whole thing and present a single error message. If the file was reasonably sound, we would display the rows in error along with the error messages for the row so they could see the problem in context. In our case, this was a display grid only - we did not allow editing through the website - because the CSVs were being generated out of their data system, and we needed them to edit the source data in their system and regenerate the CSV. To do in place editing, you would need to stage all the column values as strings so they can fix numbers that don't parse, etc.
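The staging idea described above can be sketched independently of LINQtoCSV. The example below is in Python rather than C#, purely to illustrate the shape of the data: every cell is kept as a raw string, each row carries its own per-column error messages, and a file with missing columns is rejected with a single message. The schema (name, quantity, shipped) is hypothetical.

```python
import csv
import io
from datetime import datetime

def required(text):
    """Reject empty or whitespace-only values."""
    if not text.strip():
        raise ValueError("value is required")
    return text.strip()

def iso_date(text):
    """Parse a YYYY-MM-DD date; raises ValueError otherwise."""
    return datetime.strptime(text.strip(), "%Y-%m-%d").date()

# Hypothetical schema: column name -> parser that raises ValueError on bad input.
SCHEMA = {"name": required, "quantity": int, "shipped": iso_date}

def stage(stream, schema):
    """Return (rows, file_error). Rows keep raw strings plus per-column errors."""
    reader = csv.DictReader(stream)
    missing = [c for c in schema if c not in (reader.fieldnames or [])]
    if missing:
        # Pathologically bad file: fail the whole thing with one message.
        return [], "missing columns: " + ", ".join(missing)
    rows = []
    for row in reader:
        raw = {c: (row.get(c) or "") for c in schema}
        errors = {}
        for column, parse in schema.items():
            try:
                parse(raw[column])
            except ValueError as exc:
                errors[column] = str(exc)
        rows.append({"line": reader.line_num, "raw": raw, "errors": errors})
    return rows, None

# Made-up upload, for illustration only.
upload = io.StringIO(
    "name,quantity,shipped\n"
    "Widget,5,2012-03-01\n"
    ",five,01/03/2012\n"
)
rows, file_error = stage(upload, SCHEMA)
for row in rows:
    print(row["line"], row["raw"], sorted(row["errors"]))
```

Because the raw strings are preserved, a grid can show the bad value next to its message, and the user's corrected text can be run through the same parsers again on postback.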
