Performing ETL on Non-ODBC-Compliant V2$ Files

I am reposting as my last question was closed due to format issues...
I am trying to extract data from files in the V2$ format, and the usual ETL approaches, such as Talend and custom scripts, have not worked. These files are not ODBC compliant, and I have been unable to find a script or tool that specifically handles ETL for them. The format is used by Avimark, a veterinary practice management system. I am looking for recommendations on how to do the ETL. Thanks & Happy Holidays!

You have a custom file format that your ETL tools don't support, so you're going to have to either:
add support for this file format to the ETL tool (some ETL tools are open source and you can contribute to them)
or
convert the files into a format that is supported by your ETL tool
As you probably don't have the time and resources to extend the ETL tool to support the V2$ file format (an assumption on my part, but based on your question you don't seem to be a developer yourself), you will have to find some way to convert the file(s) before handing them over to the ETL tool. You can ask Avimark's software vendor whether they have a tool available; if not, you're more or less forced to build something yourself (or find someone to do it for you).
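If you do build the converter yourself, the usual shape is to decode the binary records and write them out as CSV, which any ETL tool can read. The sketch below assumes the file is a flat sequence of fixed-length records. The record layout (`RECORD`, the four field names, the Latin-1 text encoding) is entirely hypothetical and is not Avimark's actual V2$ structure, which you would have to work out or obtain from the vendor.

```python
import csv
import io
import struct

# HYPOTHETICAL record layout, for illustration only (not the real V2$ layout):
# little-endian uint32 id, 20-byte name, 8-byte YYYYMMDD date, float64 amount.
RECORD = struct.Struct("<I20s8sd")
FIELDS = ["id", "name", "date", "amount"]


def decode_record(raw):
    """Turn one fixed-length binary record into a dict of Python values."""
    rec_id, name, date, amount = RECORD.unpack(raw)
    return {
        "id": rec_id,
        "name": name.rstrip(b"\x00 ").decode("latin-1"),  # strip NUL/space padding
        "date": date.decode("ascii"),
        "amount": amount,
    }


def convert(binary_stream, csv_stream):
    """Read records until EOF and write them as CSV; returns the row count."""
    writer = csv.DictWriter(csv_stream, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    while True:
        raw = binary_stream.read(RECORD.size)
        if len(raw) < RECORD.size:
            break  # EOF; a trailing partial record is ignored
        writer.writerow(decode_record(raw))
        count += 1
    return count


def convert_bytes(data):
    """Convenience wrapper: bytes in, CSV text out."""
    out = io.StringIO()
    convert(io.BytesIO(data), out)
    return out.getvalue()
```

For real files, open the source in binary mode and the target with `newline=""`, e.g. `with open("export.v2$", "rb") as src, open("export.csv", "w", newline="") as dst: convert(src, dst)` (both file names hypothetical). The CSV output can then be handed to Talend or any other ETL tool.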

Related

Why are so many file formats disguised ZIP files? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
Through the years I've had many opportunities to "reverse engineer" proprietary files, and I noticed that many times these are "disguised" ZIP files that just pack standard XML, HTML, config and raw text files. However, I don't understand why developers would do that.
A few examples off the top of my head of these "disguised" file formats are:
PPTX, XLSX, DOCX and probably all of Microsoft's file formats
EPUB
JAR and WAR, though these I can understand, as they are meant to be archives
There are many other file formats of this sort, and sometimes even companies that really don't want their data files to be publicly readable rely on this disguised ZIP to store data (like game saves).
What are the technological advantages of ZIP files over custom file types?
Is there a name for the practice of building a (sometimes proprietary) new file format on top of ZIP ?
If you want your new file format to be interoperable with other applications, you'll need to define your format completely. Building on top of other standards, such as ZIP, XML, and HTML, cuts down a large part of the documentation and maintenance effort.
The format designer is usually also the first implementor. Using existing standards means they can use existing tools, known to be correct and working, to create and read files. This means the Microsoft Office file format designers, for example, don't need to debug serializing and deserializing logic, since they're already using industry-proven XML.
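To make the "existing tools" point concrete, here is a small sketch that builds a ZIP-based document in memory and reads it back using only Python's standard `zipfile` and `xml.etree.ElementTree` modules. The container layout is a made-up, simplified OOXML-like one; a real `.docx` uses namespaced element names (e.g. `w:p`), so the parsing step would need the namespace added.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def make_demo_container():
    """Build a tiny ZIP-based 'document' in memory (simplified, made-up layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document><p>Hello</p><p>World</p></document>")
    return buf.getvalue()


def read_paragraphs(container_bytes, member="word/document.xml"):
    """Open the container with stock zipfile and parse one member with ElementTree."""
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as zf:
        root = ET.fromstring(zf.read(member))
    return [p.text for p in root.iter("p")]
```

No custom parser or decompressor is involved: the archive handling and the XML parsing are both off-the-shelf, which is exactly the saving the format designer gets.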
Using a compressed archive instead of plain archiving such as TAR means your format automatically reduces the required storage when possible. ZIP is an ISO standard and patent-free (as long as it's not encrypted with a strong algorithm), so the designer and implementor don't need to pay for a license, unlike, say, RAR.
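A quick way to see the storage effect is to pack the same repetitive XML once with `ZIP_STORED` (plain archiving, no compression) and once with `ZIP_DEFLATED`. The sample content below is invented; actual savings depend entirely on the data, and already-compressed content such as JPEG images gains little.

```python
import io
import zipfile

# Invented, highly repetitive XML, similar in spirit to spreadsheet cell data.
SAMPLE = {"content.xml": "<row><cell>42</cell></row>\n" * 2000}


def archive_size(files, compression):
    """Size in bytes of a ZIP containing `files` ({name: text}) with the given method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        for name, text in files.items():
            zf.writestr(name, text)  # uses the archive's default compression
    return len(buf.getvalue())


if __name__ == "__main__":
    print("stored:  ", archive_size(SAMPLE, zipfile.ZIP_STORED))
    print("deflated:", archive_size(SAMPLE, zipfile.ZIP_DEFLATED))
```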
Implementing the consuming application on different hardware or platforms may require rewriting a large part of the code unless it's built on top of already popular standards. An EPUB reader, for example, can be patched together from a ZIP reader library (which is usually built into various frameworks) and an HTML viewer. That's near-zero effort on the developer's side, who can then focus on other features. Since those built-in ZIP libraries are mature and well optimized, they usually perform much better than a custom compression format. Another rarely considered factor is security and reliability. A custom archiving format may seemingly work faster or compress more efficiently, but on real-world data it might crash, or worse, return wrong data, which can result in a security breach or incorrect results.
As for companies not wanting their files to be read, there are plenty of solutions that can be built on top of ZIP. AES encryption is available for ZIP as an openly published extension, AE-x (AE-1 and AE-2). If they don't need to hide the entire structure, just the values, they can encrypt individual entries in the XML/JSON, or individual files. EPUB DRM can be broken easily, but that would happen regardless, even if the ebook used a non-ZIP format.
I don't think there's a specific name for building a new format on top of ZIP. When you want to store a string, you pick one of the available text encoding standards; if you want to keep the value secret, you encrypt it with yet another encryption standard rather than inventing a new encoding scheme. What those designers are doing is simply taking the existing standards, and they're not just using ZIP; they're also using XML, Unicode, various image formats, etc.
About Microsoft formats being ZIP: well, not all of them are. Pre-2007 Office files aren't, which is partly the reason behind the difficulties of implementing and improving the format (another reason is that Microsoft deliberately prevented people from doing so in the first place by not documenting them). XLSB is ZIP, but instead of XML it uses binary serialization, which speeds up saving and opening; once opened, though, it operates as fast and as memory-efficiently as an XLSX file. ACCDB, like its precursor MDB, isn't a ZIP file; databases, in general, are allergic to being compressed. Visio transitioned more slowly: Visio 2010 uses the XML-based VDX (not compressed), then in 2013 it added VSDX (XML and ZIP based), while Project and Publisher don't seem to be moving to a new format soon. XPS, NuGet, and APPX packages are ZIP, but csproj, vbproj, etc. aren't. MSI installers are archives, but they're not ZIP files.
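When reverse engineering an unknown file, the first few bytes usually tell you which of these camps it falls into. The sketch below checks for the ZIP signatures (`PK\x03\x04` for a local file header, `PK\x05\x06` for an empty archive) and for the OLE2 Compound File Binary signature used by pre-2007 Office files and MSI installers. It is a heuristic only: spanned archives or self-extracting executables, for instance, start differently.

```python
ZIP_MAGIC = (b"PK\x03\x04", b"PK\x05\x06")  # local file header / empty archive
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 Compound File Binary header


def container_kind(header: bytes) -> str:
    """Classify a file by its leading bytes: 'zip', 'ole2-cfb' or 'unknown'."""
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(CFB_MAGIC):
        return "ole2-cfb"
    return "unknown"


def sniff_file(path):
    """Read just the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return container_kind(f.read(8))
```

A .docx, .xlsx, .epub or .apk should report `zip`; a pre-2007 .doc or .xls, or an .msi, should typically report `ole2-cfb`.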
It's interesting you stopped at JAR & WAR, because, continuing on, Android APK files are ZIP files (which may themselves contain the contents of the JARs they reference), and so is the overarching AAB. On iOS, IPA files are ZIP too. The LibreOffice default formats, ODT, ODS, and ODP, are all ZIP and XML based, designed around the same time as Microsoft Office's new format.

How to download data using script?

I want to download a large amount of data from the ZINC database. The ZINC identifiers look like this:
ZINC18923487, ZINC45780921, ZINC45670936, etc. I aim to download almost 10,000 identifiers. Can someone guide me on how to write a script, or point me to an already developed script, to download my data?
The ZINC database is a free online database: http://zinc.docking.org.
I am sorry if my question is too general or inappropriate.
Thanks
That website provides a huge repository of information in a variety of formats.
There are some .xls (MS Excel) files that you can click and view in any Excel viewer (like MS Excel or open source alternatives).
There are a lot of databases in gzip format (.gz files). To download these, you need a Unix-like operating system (or Cygwin, MinGW, or a similar alternative). That environment specifically needs:
csh (C shell) to run their scripts
Wget to fetch and parse the links on their site
curl to fetch the database files
gzip to uncompress the .gz files
If you are a programmer with shell scripting skills, you can mash your way into the databases in many different ways, using alternative tools.
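As one alternative to the csh/Wget/curl toolchain, here is a minimal Python sketch for a list of roughly 10,000 IDs. It assumes each compound can be fetched from a per-ID URL; the `URL_TEMPLATE` below is a hypothetical placeholder, not a documented ZINC endpoint, so check the site for the real download scheme first. The script skips files it already has (so it can be rerun after interruptions), transparently gunzips `.gz`-style payloads, and pauses between requests.

```python
import gzip
import time
import urllib.request
from pathlib import Path

# HYPOTHETICAL URL scheme: look up the real per-compound download URL on the
# ZINC site and adjust this template before running.
URL_TEMPLATE = "http://zinc.docking.org/substance/{zinc_id}.sdf"


def parse_ids(text):
    """Extract unique ZINC IDs (e.g. 'ZINC18923487') from free text, keeping order."""
    ids, seen = [], set()
    for token in text.replace(",", " ").split():
        token = token.upper()
        if token.startswith("ZINC") and token[4:].isdigit() and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


def build_url(zinc_id):
    return URL_TEMPLATE.format(zinc_id=zinc_id)


def maybe_gunzip(data):
    """Decompress gzip payloads (magic bytes 1f 8b); return anything else unchanged."""
    return gzip.decompress(data) if data[:2] == b"\x1f\x8b" else data


def fetch(url, timeout=30):
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(ids, out_dir, delay=1.0):
    """Save each ID to out_dir/<ID>.sdf, skipping files that already exist.

    Returns (id, error message) pairs for failures; rerunning the function
    retries only those, because successful downloads are skipped.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for zinc_id in ids:
        target = out / f"{zinc_id}.sdf"
        if target.exists():
            continue
        try:
            target.write_bytes(maybe_gunzip(fetch(build_url(zinc_id))))
        except OSError as exc:  # URLError/HTTPError and timeouts are OSError subclasses
            failed.append((zinc_id, str(exc)))
        time.sleep(delay)  # be polite to a free public server
    return failed
```

Usage, with hypothetical file and folder names: `failed = download_all(parse_ids(Path("ids.txt").read_text()), "zinc_sdf")`.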

How to use Subversion with HelpNDoc

I am writing documentation for a project that involves multiple developers. We use Subversion (SVN) to work on our code base.
I wrote the first draft of the documentation using HelpNDoc, which I like for its nice tree view and ease of use; the problem is that everything is stored in a single file, so I don't know how to use SVN to let other developers contribute to the documentation and update it.
Do you know if it's possible? If not, can you recommend good, easy-to-use software with a tree view of the documentation that can be used with SVN or otherwise lets multiple users update it? We use Windows.
HelpNDoc projects are binary files based on the SQLite open source database engine. The advantage is that the whole documentation is stored in a single file, so it can easily be copied, moved, shared, backed up...
However, one drawback is that it has to be checked in as binary content in any version control system, including Subversion: diff and merge are not possible on those files.
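Since the project file is a SQLite database, one way to at least see what changed between two revisions is to dump both to plain SQL text and diff the dumps. This is a sketch using Python's built-in `sqlite3` module; HelpNDoc's internal schema isn't documented here, so treat the dump as read-only review material. It does not make merging possible, and the merged text cannot simply be loaded back into HelpNDoc.

```python
import sqlite3
import sys
from pathlib import Path


def dump_connection(conn):
    """Return the whole database as SQL text (schema + INSERTs), one statement per line."""
    return "\n".join(conn.iterdump())


def dump_file(path):
    # Open read-only via a file: URI so the project file itself is never modified.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return dump_connection(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    # Usage: python dump_project.py <project file> > project.sql
    print(dump_file(sys.argv[1]))
```

Dumping two checked-out revisions and running `svn diff` or any text diff tool on the resulting .sql files shows which rows changed.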
One possible solution would be to use external documents in HelpNDoc's library: each user works on her own document (which can be a Word document, an HTML web page...) and a master HelpNDoc project is created to include those documents at generation time. See "Include a file at generation time" in the following step-by-step guide: How to add an item to the library
The number of files doesn't matter; the real format (text/* or binary) does. If SVN (or any VCS) can merge two HelpNDoc files with diverged history (just try it by hand), you'll be happy.
I once used Helpinator for software documentation; it's pretty close to HelpNDoc, but its storage format is more suitable for version control.

StarTeam -- How do I move files programmatically?

Any StarTeam users out there?
I'm using StarTeam 2009, which comes with some client tools, including a command line tool, stcmd.exe.
I'm reorganizing a somewhat large Java project containing thousands of .java files and resource files. I would like to know if there is a way using stcmd or some other scripting tool to move files from one folder in the StarTeam repository to another folder.
Of course I can use the StarTeam client interface to drag and drop files, but I'm hoping to avoid that.
Any help would be appreciated.
Thanks
stcmd gives you the ability to add, modify, and delete files and folders. It doesn't provide a mechanism for sharing or moving.
It may be possible to do some of the moves using VCMUtility, but that's not really what it was designed for and I would guess it would end up being more of a pain than doing it by hand.
The StarTeam SDK does provide all the tools necessary to do moving and sharing. It may be worthwhile to look into writing a small utility to do what you're looking for.

MS Excel automation without macros in the generated reports. Any thoughts?

I know that the web is full of questions like this one, but I still haven't been able to apply the answers I can find to my situation.
I realize there is VBA, but I always disliked having the program/macro living inside the Excel file, with the resulting bloat, security warnings, etc. I'm thinking along the lines of a VBScript that works on a set of Excel files while leaving them macro-free. Now, I've been able to "paint the first column blue" for all files in a directory following this approach, but I need to do more complex operations (charts, pivot tables, etc.), which would be much harder (impossible?) with VBScript than with VBA.
For this specific case, knowing how to remove all macros from all files after processing would be enough, but all suggestions are welcome. Any good references? Any advice on how best to approach external batch processing of Excel files will be appreciated.
Thanks!
PS: I eagerly tried Mark Hammond's great PyWin32 package, but the lack of documentation and interpreter feedback discouraged me.
You could put your macros in a separate Excel file.
Almost anything you can do in VBA to automate excel you can do in VBScript (or any other script/language that supports COM).
Once you have created an instance of Excel.Application you can pretty much drop your VBA into a VBS and go from there.
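The same COM route works from Python via PyWin32 (mentioned in the question). This sketch repeats the "paint the first column blue" task on every .xlsx file in a folder; the code lives outside the workbooks, and .xlsx files cannot hold VBA, so they stay macro-free. It requires Windows, Excel and PyWin32; `win32com` is imported inside the function only so the colour helper can be used elsewhere. The folder name is up to you.

```python
from pathlib import Path


def excel_rgb(red, green, blue):
    """Pack an RGB triple the way VBA's RGB() does: R + G*256 + B*65536."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("colour components must be in 0..255")
    return red + green * 256 + blue * 65536


def paint_first_column_blue(folder, pattern="*.xlsx"):
    import win32com.client  # PyWin32; Windows only

    excel = win32com.client.DispatchEx("Excel.Application")  # separate Excel instance
    excel.Visible = False
    excel.DisplayAlerts = False
    try:
        for path in sorted(Path(folder).glob(pattern)):
            if path.name.startswith("~$"):
                continue  # skip Excel's temporary lock files
            wb = excel.Workbooks.Open(str(path.resolve()))  # Excel wants absolute paths
            try:
                ws = wb.Worksheets(1)
                ws.Range("A:A").Interior.Color = excel_rgb(0, 0, 255)
                wb.Save()
            finally:
                wb.Close(False)  # SaveChanges=False: already saved above
    finally:
        excel.Quit()
```

For macro-enabled inputs, saving a copy with `wb.SaveAs(new_path, 51)` (51 is `xlOpenXMLWorkbook`, the .xlsx format) writes a workbook without the VBA project, which covers the "remove all macros after processing" case.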
If it's the Excel/VBA capability that you're looking to use, then you could always start by creating all of the code that will interact with the Excel files you want to work on within an Excel file: a kind of master file that is separate from the regular files, as suggested by Karsten W.
This gives you the freedom to write Excel/VBA.
Then you can call your master workbook (which can be configured to run your code when the book is opened, for example) from a VB script, batch file, Task Scheduler, etc.
If you want to get fancy, you can even use VBA in your master file to create/modify/delete custom macros/VBA modules in any of the target files that you're processing.
The info for just about all of the techniques I'm describing came from the Excel VBA built-in reference docs, but it certainly helps to be familiar with the specific programming tasks you're tackling. I'd advise that the best approach is to put together your tasks (e.g., make a column blue, update/sort data, etc.) one by one and then worry about the automation at the end.
