When we save an MS Word .doc file as HTML, we get "wmz" files for the math equation objects.
I tried decompressing the wmz file and saving the content as a jpg.
I can open this jpg file in "Microsoft Picture Manager" without problems, but opening it in a browser displays the error message "The image cannot be displayed, because it contains errors".
What is the procedure to decompress this wmz file and convert it to jpg?
What will be the extension of the decompressed file?
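A .WMZ file seems to be a gzip-compressed .WMF (Windows Metafile) file, so the decompressed file should get the .wmf extension. Renaming it to .jpg does not convert it, which is presumably why the browser rejects it.
You can open the decompressed file with a picture viewer/editor (I just tried IrfanView) and save it as .jpg.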
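If you want to script the decompression step, here is a minimal Python sketch. It assumes the .wmz file is a plain gzip stream wrapping the WMF bytes; the function name wmz_to_wmf is just for illustration. Converting the resulting .wmf to .jpg still needs an image tool such as IrfanView.

```python
import gzip
from pathlib import Path

def wmz_to_wmf(wmz_path):
    """Decompress a .wmz file and write the result next to it as .wmf."""
    wmz_path = Path(wmz_path)
    wmf_path = wmz_path.with_suffix(".wmf")
    # Assumption: the WMZ container is nothing more than gzip around WMF data.
    wmf_path.write_bytes(gzip.decompress(wmz_path.read_bytes()))
    return wmf_path
```

Calling wmz_to_wmf("image001.wmz") would write image001.wmf into the same folder.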
When you save your Word documents as "Web Page, filtered" you won't get these WMZ files but just PNG files.
Set the "Web Options" to target a low version of IE (e.g. 4.0) and check "allow PNG files" and "disable features not supported by these browsers".
An added advantage is that the web page will display better in different browsers.
However, you should do all of this only after making a copy of your document (and associated files) in another location using Explorer. Open this copy with Word and then save it as "Web Page, filtered". Keep the original for editing. (Don't save the original as "Web Page, filtered" or you will lose the ability to edit the equation objects.)
Thanks for the help.
In the end I could not remove the black background from the image file.
So I am using this roundabout approach for now:
1) Decompress the wmz file to a byte array (wmf).
2) Open a new Word document.
3) Paste the byte array into the Word document. (This document should contain only this data and no other information.)
4) Save the doc as an HTML file (WdSaveFormat.wdFormatFilteredHTML).
5) Open the "_files" directory created for the HTML output.
6) Find the only "gif" file created inside the directory.
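Steps 5 and 6 can be automated once Word has written the HTML. The following is only a sketch: it assumes the folder sits next to the HTML file and is named after it with a "_files" suffix, and the helper name find_single_gif is made up for this example.

```python
from pathlib import Path

def find_single_gif(html_path):
    """Return the one .gif inside the "<name>_files" folder of an HTML export."""
    html_path = Path(html_path)
    # Assumption: the export folder is "<html name without extension>_files".
    files_dir = html_path.with_name(html_path.stem + "_files")
    gifs = sorted(files_dir.glob("*.gif"))
    if len(gifs) != 1:
        raise ValueError("expected exactly one gif, found %d" % len(gifs))
    return gifs[0]
```

Raising an error when the count is not exactly one guards against the case where the document accidentally contained more than the pasted equation.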
I am trying to do exactly what is described in the following thread:
AppleScript/Automator: renaming PDF with extracted text content of this PDF
So I am using Chino22's version, and there are two issues with it:
First, instead of the contents of the PDF, theFileContentsText gets some metadata stuff.
Second, although the script runs to the end, I get the following error for the last step:
error "The variable thisFile is not defined." number -2753 from "thisFile"
So, how do I get the text contents instead, and how do I define thisFile to the current pdf that is being processed in the loop?
Thanks in advance!
I would not expect the linked script to work.
Except for document metadata, extracting text content from PDF is notoriously difficult and unreliable, and not a road you want to go down if you can possibly avoid it. Adobe’s PDF file format is designed for printing, not for data processing. PDF files contain blocks of Postscript-like page drawing instructions, typically compressed, and while it’s possible for PDFs also to include the original plain text for accessibility use, most PDF generators do not do this so the only way to get the original text is by reconstructing it from those low-level drawing instructions—not a trivial job.
AppleScript’s read command only reads that raw file data; it does not parse it into drawing instructions, never mind translating those drawing instructions back into plain text. Change a PDF file’s extension to .txt and open it in a plain text editor, and you’ll see what I mean. Nasty.
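To illustrate why reading the raw bytes does not give you the text, here is a small Python sketch. The drawing instructions and the sample string are invented for the example; the only thing it relies on is that PDF content streams are commonly compressed with Flate (zlib/deflate).

```python
import zlib

# Hypothetical page-drawing instructions of the kind a PDF content stream holds.
drawing_ops = b"BT /F1 12 Tf 72 720 Td (Quarterly report) Tj ET\n" * 3

# Generators typically store such streams compressed (Flate is zlib/deflate).
stored_stream = zlib.compress(drawing_ops)

def text_visible(data, needle=b"Quarterly report"):
    """Report whether the needle appears literally in the given bytes."""
    return needle in data

raw_hit = text_visible(stored_stream)                       # what a raw read sees
decoded_hit = text_visible(zlib.decompress(stored_stream))  # after decoding
```

Even after decompression you only have drawing operators, not a clean string, which is the harder half of the problem described above.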
If you need to work with the PDF’s original content (text, images, whatever), your best solution is to get those files before they were converted into a PDF.
If you must extract content from a PDF file, use an existing tool that knows how to do it.
For instance, if you’re lucky enough to have PDFs that contain XFDF (XML form) or accessibility data, there are 3rd-party apps and libraries to extract that content in readable form. I can’t think offhand of any that are AppleScriptable (Adobe Acrobat has only minimal AS support) so you’ll probably need to find one you can run from command line (do shell script in AS).
Or, if the PDFs have a consistent visual structure, a 3rd-party library such as Python’s PDFMiner (which I’ve used in the past) can identify blocks of characters by position and convert those back into strings with varying degrees of reliability (it has to convert font glyphs back into Unicode characters, guess at which characters are close enough to constitute a word, and where to insert space and return characters between those words). You’ll have to write some Python code to extract the bits you want, so look for tutorials to get started (or pay someone to write it for you).
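As a rough starting point for the renaming task, something along these lines might work. It is a sketch, not a tested solution for your PDFs: it assumes the pdfminer.six package (the maintained PDFMiner fork) is installed and uses its extract_text helper; filename_from_text and rename_pdf_by_content are names I made up, and taking the first non-empty line as the new name is only one possible rule.

```python
import re
from pathlib import Path

def filename_from_text(text, max_len=60):
    """Turn the first usable line of extracted text into a safe file stem."""
    for line in text.splitlines():
        # Drop everything except word characters, hyphens and spaces.
        stem = re.sub(r"[^\w\- ]+", "", line).strip()
        if stem:
            return re.sub(r"\s+", " ", stem)[:max_len].rstrip()
    return None

def rename_pdf_by_content(pdf_path):
    # Assumption: pdfminer.six is installed (pip install pdfminer.six).
    from pdfminer.high_level import extract_text
    pdf_path = Path(pdf_path)
    stem = filename_from_text(extract_text(str(pdf_path)))
    if stem is None:
        return pdf_path  # nothing usable was extracted; leave the file alone
    target = pdf_path.with_name(stem + ".pdf")
    pdf_path.rename(target)
    return target
```

You could call rename_pdf_by_content once per file from AppleScript via do shell script, which also sidesteps the undefined thisFile variable because the path is passed in explicitly.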
But again, if you can possibly avoid having to extract text from PDF, you should. You will save yourself a lot of trouble.
I'm trying to convert my existing asciidoc documentation into pdf. Asciidoctor-pdf seems quite easy and I'm able to convert single files into pdf.
asciidoctor-pdf -a pdf-theme='./theme/styles.yml' -a pdf-fontsdir='GEM_FONTS_DIR, theme/fonts/' 01-intro.adoc
But my docs are spread across many files. I want to create a single PDF from all those files. Does anyone know how to do this?
Secondly, I don't want the generated PDF to be located next to the adoc file. I want to specify a target path.
I'd appreciate every hint. Thanks and best regards. Sebastian
(Dec 26, 2021)
The easiest and most convenient way is to use the VSCode editor with the AsciiDoc extension installed. This extension is developed by the same team that develops the Asciidoctor text processor. It is a GUI-based approach that addresses both of your problems.
(Step 1) After the extension is installed, use the keyboard shortcut Cmd + , to open the settings, enter asciidoc.use_asciidoctorpdf in the search bar and tick the check box.
(Step 2) To create a single PDF from multiple .adoc files, pull them all into a single .adoc file with include::path-to-the-adoc-file.adoc[], one line per file.
(Step 3) Press F1, type in as pdf and hit Enter to export this single .adoc file as a single PDF. This lets you specify the target export directory for the PDF. The export takes a few seconds, and the editor informs you as soon as it is complete.
Have you considered working with includes?
Just add this line to your document "01-intro.adoc" at any position:
include::02-next-file.adoc[]
When you build 01-intro.adoc with your regular command, the contents of 02-next-file.adoc are inserted at the position of the include line. Using this method, we create a file with many includes and just build that file. We're very happy with that.
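If there are many files, the master document can be generated instead of typed. Below is a minimal Python sketch under a few assumptions: build_master and render_pdf are hypothetical helper names, asciidoctor-pdf is on your PATH, and the -D (--destination-dir) option is used to send the PDF to a target directory, which covers the second part of the question.

```python
import subprocess
from pathlib import Path

def build_master(adoc_files):
    """Return the text of a master document that includes each file in order."""
    # A blank line between includes keeps the end of one file from
    # running into the first line of the next.
    lines = ["include::%s[]" % Path(f).as_posix() for f in adoc_files]
    return "\n\n".join(lines) + "\n"

def render_pdf(master_path, out_dir):
    """Run asciidoctor-pdf on the master file, writing the PDF into out_dir."""
    subprocess.run(
        ["asciidoctor-pdf", "-D", str(out_dir), str(master_path)],
        check=True,
    )
```

For example, write build_master(sorted(str(p) for p in Path(".").glob("0*.adoc"))) to all.adoc and then call render_pdf("all.adoc", "build/pdf"). Your existing -a pdf-theme and -a pdf-fontsdir attributes can be added to the command list in the same way.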
I don't have much knowledge of imaging tools but I need to extract images contained within the layers of a psd file. I tried using GIMP with a "save all layers" plugin but that is just saving the root layers so I am ending up with just two .pngs. I need every image in a separate file with the correct sizes.
The reason I need the files is that I have been asked to create an animation with CSS using the images. An example animation is at http://srv1.contobox.com/frontend/ads/preview.html?id=981
The psd document I am trying to extract is
https://www.dropbox.com/sh/ud2eaesej08o0g3/AAAi-_pPHGESOFOBpA0uQfjta
The problem is that these files are structured with Layer Groups (I opened just one of them).
While GIMP can open the file, the "save all layers" plug-in you are using is probably not aware of layer groups.
(BTW, GIMP unstable, the 2.9 development version, is likely currently broken for opening PSDs: the image opens garbled there. It opens in GIMP 2.8.10, though.)
It is possible to save all layers, including sublayers, as separate images by interacting with the Python console.
With your PSD being the only image open in GIMP, go to Filters->Python-Fu->Console and type something along these lines:
img = gimp.image_list()[0]  # retrieves a reference to the image
folder = "/tmp/"  # folder of your choice for saving the files
counter = 0

def save_recurse(item):
    global counter
    if hasattr(item, "layers"):
        # the image itself or a layer group: descend into its children
        for layer in reversed(item.layers):
            save_recurse(layer)
    else:
        # a plain layer: save it as its own numbered PNG
        counter += 1
        name = folder + "layer_%03d.png" % counter
        pdb.gimp_file_save(img, item, name, name)

save_recurse(img)
(btw, I typed it here in a way you can just copy and paste the listing above in GIMP's Python console)
I am saving the active document as an HTML file, which automatically produces a subfolder containing all of the document's inline shapes (pictures). I used this code for that:
ActiveDocument.SaveAs FileName:=HTMLPath, _
FileFormat:=wdFormatHTML, AddToRecentFiles:=True
This is exactly what I want. However, for each image in the document it saves either one or two files: one if the image was untouched in Word, but any manipulation (resize, coloring, crop, etc.) causes the HTML save to produce both an original and an edited version of the image. I want to delete the originals. The images are just numbered incrementally, like image001.png, image002.png, etc., so I can't compare file names, and the file sizes might differ.
How can I determine whether an image in the active document is original or edited? With that information, I assume I can delete every other image (if all are edited) or track which ones are edited and which ones aren't.
If I have a loop like this, can I store an array or something to figure out which ones are original and which are edited?
For Each oILShp In ActiveDocument.InlineShapes
'if oILShp is not edited, add current index to array
'loop through array and delete images that have an original and edited version
Next
I think this is not possible, because the original image is shown in Internet Explorer (with filters applied, like cropping), while the second one is shown in other browsers.
Solution for non-IE browsers
If you delete the original file, the image will be visible only in non-IE browsers. If you want to go this way, open the HTML file as a string and search for every place where an image file is referenced. If the file name is preceded by a "v:imagedata" tag, delete that file. For example:
<v:imagedata src="x_files/image001.jpg"
If it is preceded by an "img" tag, don't delete it:
<img width=181 height=241 src="x_files/image002.jpg"
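The search can be sketched in Python as follows. This is only an illustration of the idea, not something I ran against real Word output: originals_to_delete is a made-up name, and the regular expressions assume the src attribute looks like the two examples above.

```python
import re

# Assumption: src values are quoted or unquoted as in the two examples above.
_SRC = r'\b[^>]*?\bsrc\s*=\s*["\']?([^"\'\s>]+)'
VML_RE = re.compile(r"<v:imagedata" + _SRC, re.IGNORECASE)
IMG_RE = re.compile(r"<img" + _SRC, re.IGNORECASE)

def originals_to_delete(html):
    """List image paths referenced only by v:imagedata tags, never by img."""
    vml_sources = set(VML_RE.findall(html))
    img_sources = set(IMG_RE.findall(html))
    # Keep anything an img tag still points at; the rest are originals.
    return sorted(vml_sources - img_sources)
```

An untouched picture may be referenced by both tags with the same file name, which is why the img references are subtracted before anything is deleted.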
Solution for Internet Explorer only: change a Word setting:
On the Tools menu, click Options.
Click the General tab, and then click Web Options.
Click the Browsers tab.
Under Options, select the Rely on VML for displaying graphics in browsers check box.
or, more easily, in VBA:
ActiveDocument.WebOptions.RelyOnVML = True
...and save the document. The big disadvantage is that the images will be visible only in Internet Explorer.
Solution for all browsers
When saving the document, use
FileFormat:=WdSaveFormat.wdFormatFilteredHTML
There will be only one image file, but the original is lost (for later editing in Word), and some formatting will be lost. Note that in non-IE browsers the document will look the same as with full formatting; minor differences will be visible only in IE and in Word.
For example, say I have a business card design done in InDesign and now need to provide a print-ready PDF for printers containing multiple copies of the business card. How would you do that? Are there any specific tools?
InDesign doesn't do imposition (placing of pages on one output page in a particular order).
You have to buy or find a tool or plugin, such as croptima dot com.
Or on this page, there's some interesting stuff:
http://www.adobe.com/cfusion/exchange/index.cfm?l=6&s=5&o=desc&exc=19&cat=223&event=producthome
Alternatively do it by hand, or use a pdf imposition tool.
Success!
Do an export to PDF (with any marks you need). Get the file path. Open a text file and type in:
file
/myFile.pdf
/myFile.pdf
/myFile.pdf
/myFile.pdf
/myFile.pdf
/myFile.pdf
/myFile.pdf
…
Once that is done, go to InDesign, draw a frame that will host the PDF and run a data merge. You will get your imposition more or less for free ;)
Loic
My bad, you need to specify that you are placing image files by starting the field name with an arobase (@):
@pdfs
"/myFile.pdf"
"/myFile.pdf"
"/myFile.pdf"
…
And specify the absolute path to the file.
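Typing the same line dozens of times gets tedious, so the data file can be generated. Here is a small Python sketch; data_merge_rows is a made-up helper, and it assumes the header is an image field written with a leading @ (which is what InDesign's Data Merge expects for image fields) followed by one quoted absolute path per copy.

```python
def data_merge_rows(pdf_path, copies, field="@pdfs"):
    """Build the text of a one-column data merge file with `copies` rows."""
    if not field.startswith("@"):
        # Data Merge treats a column as an image path only if its name starts with @.
        raise ValueError("image field names must start with @")
    rows = ['"%s"' % pdf_path] * copies
    return "\n".join([field] + rows) + "\n"
```

Save the returned text as a plain .txt file and select it as the data source; the number of rows is the number of cards you get.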
How many cards do you need to lay out? If only a few, you could just place the indd file into another document and duplicate the frames.
I didn't test this, but maybe you could draw a grid of frames and place the InDesign file into it. In the best case, if the grid is selected, the file is placed in every frame.
Loic