How is formatting retained on the clipboard?

When copying and pasting formatted text, where does the separation between content and formatting occur? Take copying from Word as an example. The copied content can be pasted into something like TinyMCE, which can retain the formatting, or into a regular browser textarea, which strips it.
I've been told that the formatting is stripped by the application the content is pasted into, and that pasting formatted content is unreliable because of this.
However, it seems to me that either the clipboard holds two versions of the content and pastes the appropriate one, or one version is copied and the formatting is somehow flagged so that an app can request the content with or without it.
How and where does this separation occur?
Thanks

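Since you're talking about Word, I'm assuming you're asking about Windows. See Clipboard Formats: the application you copy from can place the same data on the clipboard in several formats at once (for example plain text, Rich Text Format, and HTML), and the application you paste into picks the format it understands. A plain textarea asks only for the plain-text format, which is why the formatting appears to be stripped.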
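To make the idea concrete, here is a minimal sketch of a clipboard that holds several renderings of the same copy. It is a toy model in Python, not the Windows API: the `Clipboard` class and its `copy`/`paste` methods are hypothetical, and only the format names ("HTML Format", "Rich Text Format", `CF_UNICODETEXT`) correspond to real Windows clipboard formats.

```python
from typing import Dict, List, Optional

class Clipboard:
    """Toy model (an assumption, not the real API): one copy, many formats."""

    def __init__(self) -> None:
        self._formats: Dict[str, str] = {}

    def copy(self, renderings: Dict[str, str]) -> None:
        # The source application replaces the clipboard contents with
        # every rendering of the selection that it is able to offer.
        self._formats = dict(renderings)

    def paste(self, accepted: List[str]) -> Optional[str]:
        # The target application asks for formats in its own order of
        # preference and takes the first one that is available.
        for fmt in accepted:
            if fmt in self._formats:
                return self._formats[fmt]
        return None

clipboard = Clipboard()
clipboard.copy({
    "HTML Format": "<b>Hello</b> world",
    "Rich Text Format": r"{\rtf1 {\b Hello} world}",
    "CF_UNICODETEXT": "Hello world",
})

# A rich editor prefers HTML; a plain textarea only understands text.
rich_editor_paste = clipboard.paste(["HTML Format", "CF_UNICODETEXT"])
plain_textarea_paste = clipboard.paste(["CF_UNICODETEXT"])
```

In this model nothing is stripped at paste time: the plain-text rendering was already on the clipboard, and the textarea simply never asks for the formatted ones.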

Related

AppleScript: renaming PDF with content of PDF

I am trying to do exactly what is described in the following thread:
AppleScript/Automator: renaming PDF with extracted text content of this PDF
I am using Chino22's version, and there are two issues with it:
First, instead of the contents of the PDF, theFileContentsText gets some metadata.
Second, although the script runs to the end, I get the following error for the last step:
error "The variable thisFile is not defined." number -2753 from "thisFile"
So, how do I get the text contents instead, and how do I define thisFile as the PDF currently being processed in the loop?
Thanks in advance!
I would not expect the linked script to work.
Except for document metadata, extracting text content from PDF is notoriously difficult and unreliable, and not a road you want to go down if you can possibly avoid it. Adobe’s PDF file format is designed for printing, not for data processing. PDF files contain blocks of PostScript-like page drawing instructions, typically compressed. While it’s possible for PDFs to also include the original plain text for accessibility use, most PDF generators do not do this, so the only way to get the original text is to reconstruct it from those low-level drawing instructions, which is not a trivial job.
AppleScript’s read command only reads that raw file data; it does not parse it into drawing instructions, never mind translating those drawing instructions back into plain text. Change a PDF file’s extension to .txt and open it in a plain text editor, and you’ll see what I mean. Nasty.
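A small Python sketch of why a raw read shows no readable text. It builds a made-up page content stream (the `Invoice 2024-001` string and the operators around it are illustrative, not taken from any real file) and compresses it with zlib, which is the compression PDFs commonly use for streams. The `literal_strings` helper is hypothetical and far too naive for real PDFs, which also involve font encodings and several other text operators.

```python
import re
import zlib

# Hypothetical page content: PDF text-drawing operators, not plain text.
content_stream = b"BT /F1 12 Tf 72 720 Td (Invoice 2024-001) Tj ET\n" * 20

# Roughly what ends up stored in the file after Flate (zlib) compression.
stored_bytes = zlib.compress(content_stream)

def literal_strings(stream: bytes) -> list:
    """Return the (...) operands of Tj operators in an uncompressed stream."""
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", stream)]

# A naive read of the file sees only the compressed bytes.
text_visible_in_raw_bytes = b"Invoice" in stored_bytes

# The strings only show up once the stream has been decompressed and
# the drawing operators have been interpreted.
decoded_view = literal_strings(zlib.decompress(stored_bytes))
```

Even after decompression, what comes back is a list of drawn strings per operator, not the document's original paragraphs.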
If you need to work with the PDF’s original content (text, images, whatever), your best solution is to get those files before they were converted into a PDF.
If you must extract content from a PDF file, use an existing tool that knows how to do it.
For instance, if you’re lucky enough to have PDFs that contain XFDF (XML form) or accessibility data, there are third-party apps and libraries that extract that content in readable form. I can’t think offhand of any that are AppleScriptable (Adobe Acrobat has only minimal AppleScript support), so you’ll probably need to find one you can run from the command line (do shell script in AppleScript).
Or, if the PDFs have a consistent visual structure, a 3rd-party library such as Python’s PDFMiner (which I’ve used in the past) can identify blocks of characters by position and convert those back into strings with varying degrees of reliability (it has to convert font glyphs back into Unicode characters, guess at which characters are close enough to constitute a word, and where to insert space and return characters between those words). You’ll have to write some Python code to extract the bits you want, so look for tutorials to get started (or pay someone to write it for you).
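The word-guessing step can be sketched roughly as follows. This is not PDFMiner's actual algorithm, just a minimal illustration of the idea under simplifying assumptions: a single line of text, glyphs already mapped to Unicode characters, and a hypothetical `gap_ratio` threshold for deciding where a space belongs.

```python
from typing import List, NamedTuple

class Glyph(NamedTuple):
    char: str     # character the glyph was mapped back to
    x: float      # left edge on the page
    width: float  # advance width

def glyphs_to_text(glyphs: List[Glyph], gap_ratio: float = 0.3) -> str:
    """Join glyphs left to right, inserting a space at large gaps."""
    pieces = []
    previous = None
    for glyph in sorted(glyphs, key=lambda g: g.x):
        if previous is not None:
            gap = glyph.x - (previous.x + previous.width)
            # Heuristic (an assumption): a gap wider than a fraction of
            # the previous glyph's width is treated as a word break.
            if gap > gap_ratio * previous.width:
                pieces.append(" ")
        pieces.append(glyph.char)
        previous = glyph
    return "".join(pieces)

line = [
    Glyph("H", 0.0, 6.0), Glyph("i", 6.2, 3.0),
    Glyph("y", 13.0, 6.0), Glyph("o", 19.1, 6.0), Glyph("u", 25.2, 6.0),
]
text = glyphs_to_text(line)
```

The threshold is exactly the kind of guess that makes the results unreliable: tight kerning or justified text can merge words or split them in the wrong place.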
But again, if you can possibly avoid having to extract text from PDF, you should. You will save yourself a lot of trouble.

macOS using html tags in nstextfield

I'm wondering whether there is a way to make this possible:
I have an NSTextField (or NSTextView) and a button. Clicking the button should activate bold mode for the selected text, or for any text typed afterwards.
My first idea was to use attributes for the characters typed afterwards, but that does not seem ideal because I need to save the string to a file later. I can save the attributed string, but that does not give me the format I want. What I would like to get is HTML-style tags or something similar.
If I understand correctly, your first idea is the right one. Within your program you use NSAttributedString to add bold and other styles to your text. When you want to save the text, you can convert it to HTML or a number of other formats; reading those formats and converting them back to NSAttributedString is also supported. A good place to start is Formatted Documents and Attributed Strings.

angular-translate and escaped characters

I have an app that uses AngularJS along with angular-translate to provide localization. The app currently uses only English and German.
On the login page is a required field, an email. If there is an error, the app displays "A valid email is required" in English.
In German (and forgive me if this is mangled, this is Google Translate, I don't know any German) the phrase is "Eine gültige E-Mail erforderlich".
In the second word, you'll notice an international character, it looks like a "u" with two little dots over it. When the app is set to display in German, that character gets escaped and much weirdness happens on the screen.
Looking at the docs, it seems like using $translateProvider.useSanitizeValueStrategy() is supposed to handle this, but it doesn't. If I use $translateProvider.useSanitizeValueStrategy('escaped') then it looks like this onscreen:
If I use $translateProvider.useSanitizeValueStrategy('sanitize') (which I'd really prefer, of course) then it looks like this:
I also came across an article which states that my *.js translation file needs to be UTF-8 encoded. I opened the file in Notepad++, changed the encoding to UTF-8 without BOM and saved it, but I'm still seeing the error. And VS really hates the file now.
I know it's a little late, but maybe others have similar problems.
Addressing the UI:
Are you using the attribute style, e.g.
<span translate="key"></span>
or inline style
<span>{{key | translate}}</span>
in your view?
I am working with the second style without issues.
Addressing your problem with UTF-8:
I am not using Visual Studio or Notepad++, so I don't know how Notepad++ handles the conversion. Possibly it does not convert the characters at all and only marks the file as UTF-8.
Sublime Text 2 (1), on the other hand, offers 'Save with Encoding', which converts all characters accordingly. I have tested this conversion fairly thoroughly, so I can recommend this approach with a clean conscience.
(1) I have no relation to Sublime Text; this is not meant to be any form of commercial advertisement.
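The difference between relabelling a file and actually converting its bytes can be illustrated with a short Python sketch. The choice of Latin-1 as the "other" encoding is an assumption made for the example; the original file's real encoding is not known.

```python
phrase = "Eine gültige E-Mail erforderlich"

# The same text stored with two different encodings.
utf8_bytes = phrase.encode("utf-8")      # ü becomes two bytes: C3 BC
latin1_bytes = phrase.encode("latin-1")  # ü becomes one byte: FC

# UTF-8 bytes read by something that assumes Latin-1: each of the two
# bytes is shown as its own character.
utf8_read_as_latin1 = utf8_bytes.decode("latin-1")

# Latin-1 bytes that were only relabelled as UTF-8: the lone FC byte is
# invalid UTF-8 and is replaced by U+FFFD.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Bytes that were really converted decode cleanly.
round_trip = utf8_bytes.decode("utf-8")
```

If the page shows two odd characters or a replacement symbol where the ü should be, a mismatch of this kind between the file's bytes and its declared encoding is a likely cause.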

Is there a way to copy text from emacs with faces in Windows?

Every now and then I run into a situation when I need to email a piece of code from emacs. When I paste text into my email program (not emacs), all the color highlighting is lost. This is especially disappointing when pasting from org-mode, which relies heavily on colors for readability. It would be good to preserve font faces.
Is there a way to do this? I am looking for output similar to that of ps-print-buffer-with-faces.
If your email program can handle HTML, try M-x htmlfontify-buffer, which converts the contents of the current buffer (with faces) to CSS-styled HTML.

Html editor - Text to HTML convertor

I want to convert my text into HTML. Ideally I would copy and paste text from Word or a PDF (with formatting and colors) into an editor that converts it into HTML tags, so that when the HTML is rendered again it shows the same formatting I pasted.
I am mostly happy with PageBreeze but sometimes it destroys the formatting.
Are there any other editor suggestions?
Though I think it's a crude solution, you can try using the on-the-fly generated comment below: highlight it, view the source and copy it. Alternatively, use pretty much any of the rich text editor JavaScript plugins out there, such as RTE, the simplest I could find. (I'm not sure whether those preserve copy-pasted formatting.)
However, you won't be assured that any formatting (font/color) you get from here will be carried over to your website. In addition to HTML, CSS plays a huge part in styling, especially text-color, highlighting, spacing, etc.
In Word, I think you can use File > Save As > HTML.
However, the result is going to be junky and nasty.
Your best option is to learn basic HTML (it really is super easy) and manually do it yourself.
