Although I have been using Doxygen for some years, there are lots of other methods used to create documentation: PowerPoint files, Word files, UML files. Although every file type has its advantages (e.g. PowerPoint is easy to create and ideal for presentations), this quickly leads to lots of different files, in different locations, and no overview.
I now want to merge all these files into one entry point. My idea is to use Doxygen to generate HTML files that contain the source code documentation, and to write overview documentation in Doxygen (just as Qt does), and include images based on the other files. I am also trying to automate the whole system.
What I am now looking for is a way to convert these other files to images, so my automated system can include these images in the generated HTML (and finally in the generated QCH (Qt Help) file).
For PowerPoint I found some tools, but I don't seem to get the free ones working correctly, and a trial version of a commercial one fails on some of my PPTX files.
For Enterprise Architect I didn't find a way to automate the generation of images.
Does anybody know how to automatically generate images from Enterprise Architect files and from PowerPoint files?
Found it.
Enterprise architect has a Java Archive (eaapi.jar) which you can use to access the EAP files from Java.
What you need to do is get the repository, open the file and get all top-level packages from it (I've omitted all error handling code from this sample):
org.sparx.Repository repo = new org.sparx.Repository();
org.sparx.Project project = repo.GetProjectInterface();
repo.OpenFile(args[0]);
org.sparx.Collection packages = repo.GetModels();
Then loop over all the packages and call your own method (because you will need to call yourself recursively):
for (short i = 0; i < packages.GetCount(); i++)
{
    org.sparx.Package pack = (org.sparx.Package) packages.GetAt(i);
    handlePackage(project, pack, currentdir);
}
Then in the method, loop over all diagrams and generate the images, and loop over all sub-packages recursively:
public static void handlePackage(org.sparx.Project project, org.sparx.Package pack, String output)
{
    for (org.sparx.Diagram diagram : pack.GetDiagrams())
    {
        project.PutDiagramImageToFile(diagram.GetDiagramGUID(), output + diagram.GetName() + ".png", 1);
    }
    for (org.sparx.Package subpack : pack.GetPackages())
    {
        handlePackage(project, subpack, output);
    }
}
That's it.
Some pitfalls:
You must pass a fully qualified path to the OpenFile method. Even if the file is in your local folder, you still have to pass the full path.
The same applies to the output files: pass fully qualified paths for them as well.
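Both pitfalls can be dealt with before calling into the API. Here is a minimal sketch, assuming a hypothetical helper class EaPaths (it is not part of eaapi.jar). It turns whatever path was passed in into an absolute one, and builds the image file name with an explicit separator so the folder and the diagram name cannot run together. The resulting strings would then be handed to OpenFile and PutDiagramImageToFile.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Enterprise Architect API.
final class EaPaths {

    // Resolve a possibly relative path against the current working directory.
    static String toAbsolute(String path) {
        return Paths.get(path).toAbsolutePath().normalize().toString();
    }

    // Build "<absolute output dir><separator><diagram name>.png".
    static String imageFile(String outputDir, String diagramName) {
        Path dir = Paths.get(outputDir).toAbsolutePath().normalize();
        return dir.resolve(diagramName + ".png").toString();
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("model.eap"));
        System.out.println(imageFile("images", "Overview"));
    }
}
```

Diagram names containing characters that are not valid in file names are not handled here; they would need to be replaced before building the path.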
My background:
I am a newbie when it comes to HTML scrubbing. It has been about four years since my only work coding with C# for HTML. My other C# coding, equally long ago, was for forms that manipulate data in SQL Server databases.
What I have done to try to get started with HTML Agility Pack (HAP):
I have spent several days trying to make sense of instructions found from various online sources about how to get started with HTML Agility Pack. Some of what I have found so far is listed below:
www.4guysfromrolla.com/articles/011211-1.aspx
olussier.net/2010/03/30/easily-parse-html-documents-in-csharp/
stackoverflow.com/questions/846994/how-to-use-html-agility-pack
shatalov.su/en/articles/web/parser_1.php
still more referred to below...
My Results so far:
I have found the material to be quite confusing with each source seeming to tell me something different. All my attempts have come to dead ends.
So that you can efficiently sort out my confusion and reply to my specific situation, I will describe my project, my environment and my questions in the three sections below.
My Project
I am tasked with creating a process to scrub data from html files. I know the files well. The files will reside on the local file system of the machine. The html file(s) will be created elsewhere by a process we do not own and will be placed in the local folder I just referred to. (FYI - Though it is not a part of my question, I expect to create a project or app that will be run on a schedule to perform the scrubbing task and then input the collected data into a database table.)
My Environment
As stated above the html file(s) to be processed will reside on the local machine.
I have newly installed Visual Studio 2010 Professional on this machine to code for this project.
The HTML Agility Pack is now accessible to this machine on a file share.
Under REGEDIT: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP the following are listed, indicating the versions of the .NET framework installed on this machine:
CDF
V2.0.50727
V3.0
V3.5
V4
V4.0
My Questions
1.) I am told by some sites to download HTML Agility Pack and to use the file "HtmlAgilityPack.dll," however the zip file contains nine folders, each with a different copy of this file. Which one do I want?
Here are the names of the folders:
Net20
Net40
Net40-client
Net45
sl3-wp
sl4
sl4-windowsphone71
sl5
winrt45
2.) An answer to a forum question “How to I use the HTML Agility Pack” at stackoverflow.com/questions/846994/how-to-use-html-agility-pack instructs the questioner to “Download and build the HTML Agility Pack Solution”, and directs the questioner to the site htmlagilitypack.codeplex.com which then has a link to nuget.org/packages/HtmlAgilityPack which says to ‘install’ the HTMLAgilityPack by running the command “PM> Install-Package HtmlAgilityPack” in the “Package Manager Console”
What does all this mean? Other sites say to put the dll in the bin folder. What is that telling me to do?
Please explain in more detail to get me started.
3.) Assuming I am using C# what kind of project should I create?
4.) Please direct me to any other resources that you believe are applicable to my project.
Looks like you can create a .NET 4.0 project, given the .NET framework versions you have installed on your machine. What type of project depends on how you'd like your application to run. I'd personally opt for creating a C# Class Library project that contains the load html and scrub code and then host that in whatever mechanism you want to use to actually open the files.
To open a file from FileSystem, either use File.OpenRead or File.ReadAllText from System.IO.File. You can pass the stream or the file contents to the HtmlDocument.Load/LoadHtml methods.
HtmlDocument doc = new HtmlDocument();
// Option 1: read the whole file into a string with File.ReadAllText
string contents = File.ReadAllText("PathToFileName");
doc.LoadHtml(contents);
// Option 2: load from a stream instead (use one option or the other)
using (var stream = File.OpenRead("PathToFileName"))
{
    doc.Load(stream);
}
Possibilities for hosting are plentiful: a Console Application (can be invoked from the command line or through the Task Scheduler), a Windows Service (can be loaded in Windows, runs in the background even when nobody is logged on to the machine and can potentially use the FileSystemWatcher to automatically pick up the files), or a Windows Forms/WPF application which will let the user select the files to process and then show the results somehow.
As for how to use it, this is one of the primary issues with the Html Agility Pack. New ways of using it have been added over time, so the library now offers several APIs. You could take the old-fashioned XPath query route (which was the original API) or the Linq-to-HTML/XML route (which is the newer way). Neither is better than the other; each has distinct advantages. The XPath solution allows you to store the queries in a text file easily, so it's great for a configurable system, while the Linq-to-HTML version is a little easier on the eyes from a developer perspective.
As for how to download it, there are a number of options here as well.
You can indeed download the sources from the CodePlex website. Regardless of how you proceed, you might want to do that anyway: it allows you to look under the hood and figure out why something works the way it does, even if you don't compile the library yourself.
You can download the binaries from CodePlex and store them with your project. Before the creation of services such as NuGet, this was the only easy way for developers to distribute their libraries.
I'd personally choose to go the NuGet route. If you're using Visual Studio 2012, NuGet is already integrated with Visual Studio. If you're using Visual Studio 2010, you'll have to install the NuGet extension to get the same functionality. Once it is installed, you can open the NuGet Package Manager Console from within Visual Studio. With a Visual Studio Solution open and your freshly created Class Library selected in the Solution Explorer, enter the Install-Package HtmlAgilityPack command to let Visual Studio download and install the proper version of the HTML Agility Pack for your project. No worries about which library to select; Visual Studio will do that for you.
How to use it now that you've installed the library completely depends on what type of HTML scrubbing you're after and whether you choose the XPath or the Linq-to-HTML route. But it generally comes down to loading the HTML Document:
HtmlDocument doc = new HtmlDocument();
doc.Load(/* path to file or stream */);
// or: doc.LoadHtml(/* string */);
And after loading the file and catching any parsing errors that might occur, proceed to query the HTML using XPath like the contents are actually XML (the XML/XPath documentation from MSDN actually applies here):
var nodes = doc.DocumentNode.SelectNodes("//table/tr/td");
Or the same query using Linq-to-HTML:
var nodes = doc.DocumentNode.Descendants("table")
    .SelectMany(table => table.Elements("tr"))  // flatten rows of all tables
    .SelectMany(tr => tr.Elements("td"));       // flatten cells of all rows
Or use the Linq-to-Html with Linq query syntax:
var tds = from tables in doc.DocumentNode.Descendants("table")
from tr in tables.Elements("tr")
from td in tr.Elements("td")
select td;
You can make the queries as wild as you want. The syntax is either similar to the standard XPathNavigator syntax in the .NET Framework (using SelectNodes/SelectSingleNode/Children etc) or the Linq-to-XML syntax (using .Descendants/.Ancestors/.Element(s) and standard Linq).
See also:
Linq to XML documentation
XPathNavigator/IXPathNavigable documentation
I did some basic research, and it seems that there is no one standard to create these. I saw a number of code project pages that had code for building such an exe, but no reference on how to identify one, nor whether it is possible to extract files without actually running the application.
So the questions are:
Is there a method I can use to identify if a particular file is a self extracting exe?
Are there a few formats that are extremely common in the computing industry?
If so, do these formats allow for extraction without running the packaging exe?
If not, is there a method to run such a file in some sort of sandbox environment, allowing us to see the resulting files (on Windows)?
Note that I'm open to both programmatic solutions and those using third party utilities/libraries.
I need to split a PowerPoint presentation file (pptx and, if possible, ppt) into a set of files in the original format (pptx or ppt), each containing one slide from the original. I need to do this programmatically on a Linux Ubuntu server using free tools or an external free API. When a file gets uploaded to a directory, the program will be called from my main program (written in PHP) and do the split.
I am looking for suggestions about language or set of tools to use. I looked at several options listed below. It will take some time to try all of them but if anyone could exclude or add to the list and/or provide code examples it would help.
Thanks!
(1) Apache POI project (POI-XSLF)
(2) OpenOffice unoconv command line utility
(3) C# (with the Mono compiler for Linux). This may include the indirect option of deleting slides with powerPoint.Slides(x).Delete
(4) JODConverter (Java OpenDocument Converter)
(5) PyODConverter (Python OpenDocument Converter)
(6) Google Documents API
(7) Aspose.Slides for .NET is out because of cost
When I had the same needs I ended up shelling out and using "UNOCONV" to convert the files to PDF, and then used "PDFTK" to split the file by pages. Once that is done you should be able to take the extra step and convert the new split PDF files back to PPTX with one more UNOCONV call per file.
While it seems rather complicated, PPTX seems to be "that one ooxml file no one wants to touch". Libraries seem to be few and incomplete mostly.
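The round trip described above can be sketched as a list of shell commands. This is only an illustration, written in Java rather than PHP: the class name SlideSplitCommands, the file names and the page_%02d.pdf pattern are assumptions, and it presumes unoconv and pdftk are on the PATH. Whether unoconv turns each single-page PDF back into a usable PPTX depends on the import filters of the installed office suite, so check the output before relying on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that only builds the command lines; nothing is run here.
final class SlideSplitCommands {

    // Step 1: presentation -> PDF.
    static List<String> toPdf(String presentation) {
        return Arrays.asList("unoconv", "-f", "pdf", presentation);
    }

    // Step 2: one PDF per page; pdftk expands %02d to the page number.
    static List<String> burst(String pdf) {
        return Arrays.asList("pdftk", pdf, "burst", "output", "page_%02d.pdf");
    }

    // Step 3: each single-page PDF -> PPTX.
    static List<String> toPptx(String pagePdf) {
        return Arrays.asList("unoconv", "-f", "pptx", pagePdf);
    }

    // All commands, in order, for a deck with a known number of slides.
    static List<List<String>> plan(String presentation, String pdf, int slides) {
        List<List<String>> commands = new ArrayList<>();
        commands.add(toPdf(presentation));
        commands.add(burst(pdf));
        for (int page = 1; page <= slides; page++) {
            commands.add(toPptx(String.format("page_%02d.pdf", page)));
        }
        return commands;
    }

    public static void main(String[] args) {
        for (List<String> command : plan("deck.pptx", "deck.pdf", 3)) {
            System.out.println(String.join(" ", command));
            // To execute: new ProcessBuilder(command).inheritIO().start().waitFor();
        }
    }
}
```

Keeping the command construction separate from execution makes it easy to log or inspect the exact calls before wiring them into the upload handler.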
I've developed a GUI for some build scripts, and am now in the process of deploying it. As the script will be deployed to a number of different machines at various points, I need to use the standard format of directories that the team use.
The GUI consists of a ".fig" file that contains the visual definition of the UI, and an m-script that defines the functionality. I need to place these two in the "fig/" and "m/" folders respectively, but I can't figure out how. When I run the m-script on its own, the error message in the command window states that the ".fig" file can't be found, so I first searched the m-script for an include statement of some kind. There doesn't seem to be a reference to the ".fig" file anywhere, though; I assume that it's inferred, as both files have the same name but a different extension.
I fear that Matlab's GUI system requires that both ".m" and ".fig" files are in the same location, but this will be an inelegant solution that I'd rather not go for if I can avoid it.
The next thing I'm going to try is to call a script that copies the fig file from the other directory to the same location as the m-script, when it is executed, then deletes that copy once the script exits, which again seems a clunky solution, but will allow me to adhere to the team's organisation conventions.
Does anyone else know of an undocumented means of specifying the relative location of a GUI ".fig" file?
You can export the GUIDE-generated GUI as a single .m file. Check out this blog post: GUIDE GUIs in All One File.
I'm not sure if this is a new feature, or one of those things that has always been there...
This question is kinda similar to this one, but not exactly. I have a game engine in C#, and I'm working with some people who want to use my engine. Originally I designed the engine so that all the assets are external: non-programmers can create art, music, xml settings, etc., and anyone can modify an existing game and share the results with others. Basically the whole thing, including the engine itself, is open source.
The group I'm working with (one of only two projects using my engine currently) wants to close their assets so they can't be modified. Although it's against my principles, I don't want to turn them away, both because I've already been working with them a while and because the market is very small (both for engines like mine, and for users of those engines).
The Actual Question
Is there a way, maybe some available software, that can take an exe and a bunch of other arbitrary files, and smash them into a single exe, that isn't just an archive? I would like the final exe to behave like it runs the first exe with some command line parameters that refer to the bundled files. For example, running bundle.exe would be just like running original.exe --project_path=/project but the project files are inside the bundle, and cannot be retrieved from it.
My original exe is written in C#. I doubt that matters.
You could pack these files as embedded resources.