I want to dynamically load (AJAX) the text from some Microsoft Word files into a webpage. So I might have a link to essays I've written and upon mouseover have it load the first few sentences in a tooltip.
Only if you have a parser. The new format (.docx) is a ZIP archive of XML files, but the old one (.doc) is a binary format.
There are some parsers out there.
I know of wvWare but it seems it's outdated. (http://wvware.sourceforge.net/)
This is maybe something worth looking at: http://poi.apache.org/hwpf/index.html
And yeah, forgot to mention how to do this. :-)
First, the JavaScript needs to request the data through AJAX. The server side has to take care of the parsing and return the text to the JavaScript. This will be a pain in the ass. I haven't done this myself and have never tried the parsers I linked, so I'm not sure whether they suit you. I'm also not sure whether images, stylesheets, etc. will be usable.
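For the newer .docx format, the server-side parsing step can be sketched with nothing but the Python standard library, since a .docx file is a ZIP archive whose main text lives in word/document.xml. This is only a minimal sketch, not a full parser: the function names are made up for illustration, the sentence splitting is naive, and it ignores images, styles, headers and footnotes. It does not handle the old binary .doc format at all.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path_or_file):
    """Return the non-empty paragraphs of a .docx file as plain strings."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each paragraph keeps its text in one or more <w:t> elements.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text.strip())
    return paragraphs


def preview(path_or_file, max_sentences=3):
    """Return roughly the first few sentences, e.g. for a tooltip."""
    text = " ".join(docx_paragraphs(path_or_file))
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])
```

The server would call something like preview("essay.docx") and send the resulting string back to the JavaScript that made the AJAX request.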
At least, good luck.
For security reasons, it is not possible to directly load a local file (such as a Word document) into the page using simply Javascript. The user will need to upload the file to the server, which you will want to parse on the server and then you can load whatever result you like into the page using Ajax.
It sounds like you mean to upload your files (e.g. essays) to your server to allow users to download them, and want to create a server-side page that will parse the files and print the first few lines (so it can be called by an AJAX method that displays a preview on hover).
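As a rough illustration of such a server-side preview page, here is a small Python sketch using only the standard library. Everything specific in it is an assumption for the example: the ESSAYS dictionary stands in for text you have already extracted from the uploaded documents, and the "id" query parameter, the JSON shape and the port are arbitrary choices.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical store of text already extracted from the uploaded documents.
ESSAYS = {
    "essay-1": "This is the opening. It sets the scene. Then it continues. And more.",
}


def first_sentences(text, count=2):
    """Naively cut the text after `count` sentences."""
    return " ".join(re.split(r"(?<=[.!?])\s+", text.strip())[:count])


def preview_response(query_string):
    """Map a query string such as 'id=essay-1' to (status, JSON body)."""
    essay_id = parse_qs(query_string).get("id", [""])[0]
    if essay_id not in ESSAYS:
        return 404, json.dumps({"error": "unknown essay"})
    return 200, json.dumps({"preview": first_sentences(ESSAYS[essay_id])})


class PreviewHandler(BaseHTTPRequestHandler):
    """Answers GET requests with the JSON preview for the requested essay."""

    def do_GET(self):
        status, body = preview_response(urlparse(self.path).query)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Run the preview server until interrupted (the port is arbitrary)."""
    HTTPServer(("localhost", port), PreviewHandler).serve_forever()
```

On hover, the page's JavaScript would request a URL such as /preview?id=essay-1 and put the returned "preview" string into the tooltip.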
To suggest a tool for this, we'll need to know whether these are "old" Word format (Office 2003 - extension is .doc) or "new" Word format (Office 2007 - extension is .docx).
It will also be good to know what you're using to create your pages server-side, since different document-reading tools support different programming languages. If you're using Java to read .doc files, you can use the tool we use at my place of work, which is POI (http://poi.apache.org/). If you're using something else, try searching Google for {read [format] in [language]}, e.g. {read .docx in ruby}.
If all of this is Greek to you and you have no prior experience with developing custom server-side web code, this is probably going to be unnecessarily painful and you should consider an alternative (like manually creating a 3-line text "preview" page for each regular page, and then just showing that).
So I have an Excel file that functions as a report folks fill out daily. I'd like to have it as its own tab, but when they're done filling it out, they FILE > SAVE AS to the Team files. Basically a quick access tab, where they would then save each file to a folder that functions as an archive.
Is this possible? If so, what am I missing during setup?
Thanks!
It's kind of possible to do what you want, maybe by hosting the spreadsheet somewhere else (not in the general "Shared" folder in the Team's SharePoint site) and making it read-only so users have to save it elsewhere. But this really sounds like more of a "form" scenario, and I'd suggest looking at a more standard form solution. Maybe look into Microsoft Forms, Power Apps, or even Forms for Excel (also part of Office 365, but it stores the final result in a spreadsheet in your OneDrive). This assumes it's a reasonably simple form; it depends on the Excel file and what it's doing, of course. If not, post a comment here with more information and I can update this answer or suggest alternatives.
We've got a system that takes in a large variety of PDFs from unknown sources, and then uses them as templates for new PDFs generated by Prawn.
Occasionally some PDFs don't work as templates for Prawn- they either trigger a generic Prawn error ("Prawn::Errors::TemplateError => Error reading template file. If you are sure it's a valid PDF, it may be a bug.") or the resulting PDF comes out malformed.
(It's a known issue that some PDFs don't work as templates in Prawn, so I'm not trying to address that here: [1], [2].)
If I take any of the problematic PDFs, and manually re-save them on my Mac using Preview > Save As [new PDF], I can then always use them as Prawn templates without any problem.
My question is, is there some (open source) server-side utility I can use that might be able to do the same thing- i.e. process problematic PDFs into something Prawn can use?
Yarin, it at least partially depends on why the PDFs don't work in the first place. If you can use them after re-saving with Apple's (quite bad) Preview PDF code, you should be able to get the same result using a number of different tactics:
-) Use an actual PDF library to open and save the PDF files (libraries from Adobe and Global Graphics come to mind). These are typically commercial products but (I know the Adobe library the best) they do allow you to open a file and save it, performing a number of optimisations in the process. The Adobe libraries are currently licensed through a company called DataLogics (http://www.datalogics.com).
-) Use a commercial product that embeds these libraries. callas pdfToolbox comes to mind (warning, I'm affiliated with this product). This basically gives you the same possibilities as the previous point, but in a somewhat easier to use package (command-line use for example).
-) Use an open source product. I'm not very well positioned to provide useful links for that.
There is another approach that may work depending on your workflow and files. In graphic arts, bad files are sometimes "made better" by a process called re-distilling: you convert the PDF file to PostScript and re-distill the PostScript into PDF again. Because this rewrites the whole file structure, it often fixes fundamental problems. However, it also comes with risks, as you're going through a different file format. Tools such as Ghostscript (watch the licensing conditions) may allow you to do this.
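To make the re-distilling round trip concrete, here is a hedged Python sketch that drives Ghostscript from a script. It assumes the gs executable is installed and on the PATH; the function names and file names are invented for the example, and whether the output actually works as a Prawn template would have to be tested on your own files.

```python
import subprocess


def redistill_commands(src_pdf, tmp_ps, dst_pdf):
    """Build the two Ghostscript calls for a PDF -> PostScript -> PDF round trip."""
    return [
        # Step 1: write the PDF out as PostScript with the ps2write device.
        ["gs", "-q", "-o", tmp_ps, "-sDEVICE=ps2write", src_pdf],
        # Step 2: turn the PostScript back into a PDF with the pdfwrite device.
        ["gs", "-q", "-o", dst_pdf, "-sDEVICE=pdfwrite", tmp_ps],
    ]


def redistill(src_pdf, tmp_ps, dst_pdf):
    """Run both steps; raises CalledProcessError if Ghostscript reports a failure."""
    for command in redistill_commands(src_pdf, tmp_ps, dst_pdf):
        subprocess.run(command, check=True)
```

A call would look like redistill("broken.pdf", "tmp.ps", "fixed.pdf"). Running only the pdfwrite device directly on the source PDF skips the PostScript detour and is closer to a plain open-and-save.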
Given that your files seem to be fixed simply by using Preview, I would think a re-distilling approach would be overly dangerous and overkill. I would look into finding a good PDF library that can automatically open and save your files.
I recently automated the creation of PowerPoint presentations in a site I'm making. I found the Office Interop libraries extremely simple to use.
Office isn't built for this kind of thing in a webserver environment, so I'm looking at creating the PowerPoints using Office Open XML, only it's so extremely complex. For example, I downloaded some code to create a blank presentation with some text. This code was around 300 lines! Using the Office Interop libraries I could do the same thing in just a couple of lines of code.
I don't have time, nor do I want to attempt to learn how to interact with the Office Open XML libraries, so I'm hoping someone has made a wrapper for them. So far all my searching has only given me one result, Aspose Slides for .NET. This looks really hopeful, but it also looks rather expensive.
Has anyone ever used a decent wrapper or alternative before?
If you are looking at automating the creation of PowerPoint presentation files, I'd say continue with Open XML; there's nothing better. Everything else is either paid or doesn't offer the entire gamut of functionality that Open XML can provide.
If you find creating a blank file tedious, you could save an empty file somewhere and use that as a template for performing further operations on it.
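The template idea is language-independent, because a .pptx file is just a ZIP archive of XML parts. As a rough sketch only (in Python rather than .NET, using the standard library instead of the Open XML SDK), the following copies a saved template and swaps {{name}} markers in the slide XML. The marker syntax and the function name are assumptions for the example, and it only works if PowerPoint has stored each marker in a single text run rather than splitting it across several.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_template(template, output, values):
    """Copy a .pptx template, replacing {{name}} markers in the slide XML."""
    with zipfile.ZipFile(template) as src, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            # Only slide parts are touched; every other part is copied as-is.
            if item.filename.startswith("ppt/slides/") and item.filename.endswith(".xml"):
                text = data.decode("utf-8")
                for name, value in values.items():
                    # Escape the value so characters like & and < keep the XML valid.
                    text = text.replace("{{" + name + "}}", escape(str(value)))
                data = text.encode("utf-8")
            dst.writestr(item, data)
```

For example, fill_template("template.pptx", "report.pptx", {"title": "Q3 results"}) would produce a new file with the marker on the template slide filled in.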
The only thing close to a wrapper for PowerPoint I've found is the Open XML PowerTools. It includes a PresentationBuilder class which can be used for some specific tasks like combining slides from multiple PowerPoint documents into a new document. Although it's pretty limited in its functionality, you could extend the class.
However, I've come to the conclusion that there just is not a good wrapper out there so I've had to do what everybody pretty much recommends and that is using the Open XML SDK Productivity Tool and the Reflect code button.
I put together a basic presentation, then used Reflect Code and put the result into a class. Yes, it's a lot of lines of code and it's not the most elegant solution, but it does work. Then from there I can extend or modify that class to do the specific things I need to do with each slide. The Productivity Tool is a big help for figuring out the code needed to do specific things. I try to keep it simple and just do one or two things at a time, Reflect Code, then look at the code to see what it does.
You could try SoftArtisans PowerPointWriter. It has a template mode that allows you to start with an existing PowerPoint file with a few placeholders, and merge your data with your presentation with as little as 5 lines of code.
Disclaimer: I work for SoftArtisans
I need to create PDF reports in my app. I'm using ASP.NET MVC 3. What's the best way to do this? I don't really want to use a reporting framework if I can avoid it. It's just a few reports: table layout, groupings, possibly pagination, totals, and the ability to merge PDFs into one PDF. Any ideas? Ideally I could simply convert my HTML view into a PDF.
There is nothing built into .NET for creating PDF files. So you have two possibilities: write a PDF generator yourself from scratch or use an existing library.
In case you decide to go with the second you may take a look at flying-saucer which along with ikvmc.exe could be used to convert XHTML files into PDF. I have blogged about some of the required steps in order to get this working.
Some possibilities:
I think you can do this with SQL Server Reporting Services (part of SQL Server rather than a 3rd-party reporting framework).
Low-level PDF libraries that can be used: PDFSharp, iTextSharp.
You could print an HTML file to a PostScript driver using Word automation, then convert the PS to PDF via Ghostscript.
I would like to start working with parsing large numbers of raw HTML pages into semantic data structures.
Just interested in the community opinion on various available tools for such a task, particularly various useful libraries in any language.
So far, planning on using Hadoop to manage a lot of the processing, but curious about alternatives.
First you need to download your page source and then create a DOM tree.
If you are coding in C#, you can use the following tools to create your DOM tree.
1) http://htmlagilitypack.codeplex.com/
2) http://www.majestic12.co.uk/projects/html_parser.php
The first one is easy to use, but the second one is much faster and more memory-friendly, so I suggest using the second one if you want to create a robust application.
Then you can extract useful content from the web page using:
http://www.chrisspen.com/blog/how-to-extract-a-webpages-main-article-content.html
You can find many other articles on extracting content from web pages by Googling (extract main content from web page).
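To give a feel for the parse-then-extract steps, here is a small Python sketch using only the standard library's html.parser. It is not the algorithm from the linked article and not what the C# libraries above do internally. It simply collects the text of each <p> element and keeps the ones with at least a few words, which is a crude stand-in for real main-content extraction. The class name, function name and word threshold are made up for the example.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> element, ignoring <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._current = None   # text pieces of the <p> being read, if any

    def _flush(self):
        # Store the paragraph read so far with whitespace collapsed.
        if self._current is not None:
            text = " ".join("".join(self._current).split())
            if text:
                self.paragraphs.append(text)
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._flush()          # an unclosed <p> ends where the next one starts
            self._current = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._current is not None and not self._skip_depth:
            self._current.append(data)


def main_text(html, min_words=5):
    """Crude heuristic: keep paragraphs with at least `min_words` words."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser._flush()
    return [p for p in parser.paragraphs if len(p.split()) >= min_words]
```

Short paragraphs such as navigation labels fall below the threshold and are dropped, while longer body paragraphs are kept.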
Hope it helps