How to convert text from one format to another?

I have a DOCX document that was deleted and then restored, but the text in the file now looks like this:
$ÄjŸÕ˚ˆw‹~µ2ÑCpW'ø¥:©°»xa"º¥ ∫ÓŒV!‰áOc‘Nü·è?ÒQºrΩg¬~í¬;Æzã\k˝E…$ën"‡Íâ
Is there anything I can do with it?
I guess I need to convert it from one format to another, but I don't know how or where.
I would be very grateful for any advice.
I tried searching, but I can't even find a name for this problem.
And how do I find out what format this is?

DOCX documents are not plain-text files!
You cannot just open them in a text editor and read the content. You have to open them with a document editor that supports DOCX files, such as Microsoft Word or LibreOffice Writer.
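If a document editor refuses the file as well, it may help to check whether the restored file is still structurally a DOCX. A DOCX file is a ZIP archive whose main text lives in a member named word/document.xml, so the garbled characters above are what compressed data looks like in a text editor. The following is a minimal sketch, assuming Python is available; the function name describe_docx is made up for this example, and it only diagnoses the file, it does not repair it.

```python
import zipfile


def describe_docx(path):
    """Return a short diagnosis of whether `path` looks like a valid DOCX."""
    # A DOCX is a ZIP container; a file that is not a ZIP cannot be opened by Word.
    if not zipfile.is_zipfile(path):
        return "not a ZIP archive: the file is probably damaged or not a DOCX"

    with zipfile.ZipFile(path) as archive:
        # The main document body is stored in this member.
        if "word/document.xml" not in archive.namelist():
            return "a ZIP archive, but not a Word document"
        # testzip() re-reads every member and returns the first corrupt one, if any.
        bad_member = archive.testzip()

    if bad_member is not None:
        return f"a DOCX with a corrupt member: {bad_member}"
    return "a structurally valid DOCX"
```

If the result says the file is not a ZIP archive, the restore most likely did not recover the original bytes, and no format conversion will bring the text back.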

Related

Change pptx language encoding

I came across the neat little R package slidex() for converting pptx to rmd. However, it only supports the "en-US" language encoding.
How do I change the language encoding of an existing .pptx file?
Language tags are scattered all over a PowerPoint file. There are add-ins that do a pretty thorough job, such as PPTools LanguageSelector.
Personally, I prefer to change the file extension to .zip, extract the archive, and use a text editor such as Notepad++ to find and replace all language tags (you're looking for attributes like lang="en-US"), then rezip. The default Windows Zip utility is not the best for this: it adds a top-level folder that PowerPoint can't parse. WinZip and 7-Zip are better.
If you're using PowerPoint for Windows, save the file as a PowerPoint XML Presentation (*.xml). Do the find and replace on that, then resave as a normal presentation. That avoids the unzip/rezip issue.
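The unzip, find-and-replace, rezip approach can also be scripted. Below is a rough sketch in Python, not a tested tool: the function name replace_language_tags and the default "de-DE" source language are assumptions for illustration. It copies every archive member to a new file at its original path, so no extra top-level folder is introduced, and rewrites the lang attribute only in XML parts.

```python
import zipfile


def replace_language_tags(src_path, dst_path, old_lang="de-DE", new_lang="en-US"):
    """Copy a .pptx, replacing lang="old_lang" with lang="new_lang" in its XML parts."""
    old = f'lang="{old_lang}"'.encode("utf-8")
    new = f'lang="{new_lang}"'.encode("utf-8")

    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            # Only touch XML parts; images and other binary members are copied as-is.
            if item.filename.endswith(".xml"):
                data = data.replace(old, new)
            # Reusing the original ZipInfo keeps each member's name and position
            # in the archive, so the package layout stays what PowerPoint expects.
            dst.writestr(item, data)
```

A call such as replace_language_tags("slides.pptx", "slides-en.pptx") writes a new file and leaves the original untouched, which makes it easy to compare the two if PowerPoint complains.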

Replace text in PDF using Cocoa

I am looking for a way to replace text in a PDF document in my Mac application, but I don't know how. I am thinking of converting the PDF to an HTML file so that I can use stringByReplacingOccurrencesOfString: and then convert it back to a PDF, but I cannot find out how to do that.
I also tried to replace the text using CGPDFDocumentRef, but I couldn't find a suitable method.
Can anyone please help me solve this issue?
Thanks, David
It is not possible to replace text in a PDF using the CGPDF* API. PDF -> HTML -> PDF will not work because the double conversion will lose content (the PDF and HTML formats are not fully compatible).
The only solution is to find a third-party toolkit that supports this functionality.

How can Microsoft OneNote show a link to where content was copied from?

When I copy some useful information from a web page and paste it into Microsoft OneNote, it adds a line saying
Pasted from [website_address]
I have done some searching, but it is hard to find out how they did this. How can I add this feature to my own software?
According to the HTML clipboard format, there is an optional SourceURL property in the description section of the HTML clipboard data. You can also extract it from the BASE element in the HTML fragment.
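As an illustration of the first approach, here is a minimal sketch in Python that pulls SourceURL out of the description section of an HTML clipboard payload. It assumes the payload has already been read from the clipboard and decoded to a string (reading the clipboard itself is platform-specific and not shown), and the function name extract_source_url is made up for this example.

```python
def extract_source_url(cf_html):
    """Return the SourceURL from an HTML clipboard payload, or None if absent."""
    for line in cf_html.splitlines():
        # The description section is a run of "Key:Value" lines before the markup.
        if line.lstrip().startswith("<"):
            break
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sourceurl":
            return value.strip()
    return None
```

Because SourceURL is optional, a caller has to handle the None case, for example by simply omitting the "Pasted from" line.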
Sometimes the web page itself has JavaScript code that appends that text to anything copied from it. If this is the case, you will see it even when pasting into notepad.exe.
The main provider of such code is tynt.com and a lot of people find it annoying.

Converting Word to PDF Using SharePoint 2010 Word Automation Services

I have been trying to find a way to lock the PDF file, or disable copy and paste on it, after the conversion. I looked at the ConversionJobSettings properties, but I wasn't able to accomplish this.
Based on what I have read, the SharePoint 2010 Word Automation Services API provides very limited control over the conversion logic, but is there any way I can lock down the content so that it cannot be copied?
Thanks for your help.
You will either need to code something up yourself or get a third party product such as this one, which allows conversion as well as PDF manipulation including security and watermarking.
Note that I worked on this product, so I am obviously biased. Having said that, it works brilliantly.
The only way to prevent copy and paste (as text) is to create image versions of the pages and save those as a PDF.
A possible solution:
1) Use Word automation to print to a PostScript (PS) printer driver to get a .ps file
2) Use GhostScript to convert the PS to TIFF files
3) Create a PDF using the TIFF files (possibly with GhostScript too)
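Step 2 of the list above could be sketched as follows. This is only an illustration, assuming Ghostscript is installed and reachable as gs (on Windows the console executable is usually named gswin64c); the function names, the output file pattern, and the 300 dpi resolution are arbitrary choices for this example. Steps 1 and 3 are not covered.

```python
import subprocess


def ps_to_tiff_command(ps_path, output_pattern="page-%03d.tif", dpi=300, gs="gs"):
    """Build a Ghostscript command that renders each PS page to a TIFF file."""
    return [
        gs,
        "-dSAFER", "-dBATCH", "-dNOPAUSE",   # non-interactive batch run
        "-sDEVICE=tiff24nc",                 # 24-bit colour TIFF output device
        f"-r{dpi}",                          # rendering resolution in dpi
        f"-sOutputFile={output_pattern}",    # %03d expands to the page number
        ps_path,
    ]


def convert_ps_to_tiff(ps_path, **options):
    """Run Ghostscript; raises CalledProcessError if the conversion fails."""
    subprocess.run(ps_to_tiff_command(ps_path, **options), check=True)
```

Keeping the command construction separate from the subprocess call makes it easy to inspect the exact arguments before running them on a server.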

How to convert PDF and DOC files to HTML using Cocoa

I would like to convert PDF and DOC files to HTML files using Cocoa.
Please help me with this.
Thanks in advance.
You can convert Word files to HTML using NSAttributedString. You can't do this in pure Cocoa for PDF files; you'll have to use a conversion tool, such as the ones stigi suggested. To do that, use NSTask.
Cocoa's PDFKit framework can convert a PDF file to text, for example through PDFDocument's -string method. Of course, this won't copy images or formatting, and it depends on PDFKit being able to recognize text in the file.
There are a couple of tools for the Unix command line that do this kind of conversion.
Check out http://pdftohtml.sourceforge.net/ and http://rtf2html.sourceforge.net/
You may want to see if there are other tools like these.
But to get back to your question: these command-line tools can be called from within your Cocoa app (this won't work on the iPhone) and produce the HTML result.
Check out this link for a guide on how to embed such command-line tools within your app.
