Add text layer to PDF of scanned handwritten notes in OSX


While in class I like to take handwritten notes; afterwards I scan them and then type them up (it helps me remember them and also makes them easily searchable). The main issue I have is that I use a lot of drawings and complex math: converting the math formulas into LaTeX (or Word) is very time-consuming, and the drawings mean I have to keep both the PDF and the text document. What I would like to do is take the basic text that I have typed myself (no OCR) and add it as a text layer to the PDFs. That way the PDFs will be searchable, and I can save a lot of time by not converting the math or drawings.
I've looked into Preview, PDFpenPro, Acrobat and a couple of Linux programs, but so far I haven't found anything that will do this.
Any idea how I could do this, or a program to use?

I also scan my notes. Sometimes I go back and add some text to them using this technique:
Open the scanned PDF in Preview, click the "Edit" button in the top right corner, then the "Text tools" button on the left side (it's a little box with Aa in it). From there you can drag open a text box and type into it.
Now the catch: if you save it as it is and open it on your iPad in PDFExpert or some other program, the text might not be there. Here's how to get around that hiccup: after you've annotated your notes how you want, instead of just saving it as a PDF, use the Print option (File->Print or Command+P). Then click the PDF button on the left and choose "Save as PDF". Once it has been printed this way, you can open it and search it in any program that reads PDFs.
One other thing: it seems like maybe you want to write over your existing handwritten text with typed text? I'm not sure this is the best way, but if that's what I were trying to do, I would:
1. Scan my notes.
2. Read through them, typing them up as you said.
3. Open the scanned notes in Photoshop or some other image editor.
4. Draw a large rectangle with a white fill and a white stroke over the handwritten text.
5. Save it as a PDF.
6. Use the technique above and copy and paste the typed text from step 2.
I hope this helps, and good luck! I'm still working out the kinks myself for scanned notes, but the possibilities have me pretty excited.
EDIT: I just checked out PDFpenPro, which I highly recommend because you don't have to go through that printing trick: you can just save the PDF after annotating, and other programs will recognize the annotations.
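If you are comfortable with a little scripting, a different technique (not the one described in the answer above) gets the same result without annotations: draw the typed text in PDF text render mode 3, which is invisible but still searchable, on top of the scanned image. The sketch below uses only the Python standard library and is a minimal illustration under stated assumptions: each page scan is already a JPEG whose pixel size and scan DPI you know, the typed text goes in one block at the top-left instead of being aligned with the handwriting, and `build_searchable_page` and its parameters are hypothetical names.

```python
def _pdf_literal(text: str) -> str:
    # Backslash and parentheses are special inside PDF literal strings.
    return "(" + text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def _stream(dictionary: str, data: bytes) -> bytes:
    # A PDF stream object: dictionary (including the byte length), then the raw bytes.
    return (f"<< {dictionary} /Length {len(data)} >>\nstream\n".encode("ascii")
            + data + b"\nendstream")


def build_searchable_page(jpeg: bytes, width_px: int, height_px: int,
                          typed_text: str, dpi: int = 300,
                          color_space: str = "DeviceRGB") -> bytes:
    """Return a one-page PDF: the scan as the visible layer, typed text invisible on top."""
    # Page size in PDF points (1/72 inch), derived from the scan resolution.
    w = width_px * 72 / dpi
    h = height_px * 72 / dpi
    ops = [
        f"q {w:.2f} 0 0 {h:.2f} 0 0 cm /Im0 Do Q",  # paint the scan over the full page
        "BT /F1 12 Tf 3 Tr 14 TL",                  # 3 Tr = render mode 3: invisible text
        f"36 {h - 36:.2f} Td",                      # start half an inch from the top-left
    ]
    for line in typed_text.splitlines():
        ops.append(f"{_pdf_literal(line)} Tj T*")  # show the line, then move down one line
    ops.append("ET")
    # WinAnsiEncoding roughly matches cp1252; characters outside it become "?".
    content = "\n".join(ops).encode("cp1252", errors="replace")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {w:.2f} {h:.2f}] "
         "/Resources << /XObject << /Im0 4 0 R >> /Font << /F1 5 0 R >> >> "
         "/Contents 6 0 R >>").encode("ascii"),
        _stream(f"/Type /XObject /Subtype /Image /Width {width_px} /Height {height_px} "
                f"/ColorSpace /{color_space} /BitsPerComponent 8 /Filter /DCTDecode", jpeg),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica /Encoding /WinAnsiEncoding >>",
        _stream("", content),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"
    xref_at = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_at}\n%%EOF\n").encode("ascii")
    return bytes(out)
```

For a letter-size page scanned at 300 DPI you would call `build_searchable_page(jpeg_bytes, 2550, 3300, typed_text)` and write the bytes to a `.pdf` file; for grayscale scans pass `color_space="DeviceGray"`. Combining the resulting pages into one document would still need a PDF tool.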

Related

Replacing fonts in Powerpoint view does not replace font

I have a PowerPoint template. When this template was handed off, it included some special fonts that I needed to remove because they were throwing warnings when users opened it.
When I use the "Replace Fonts" feature, it does not remove the font. I deal a lot with the XML of these templates because some of the content is generated dynamically when a report is run, and I can still see the fonts in the slide XML. One is
<a:buFont typeface="Poppins"/> and the other is <a:buFont typeface="Noto Sans Symbols"/>
Are these both bullet list fonts? There are no lists in the view, though...
Removing it from the XML itself is not an option, because when I update the template again it will override that, and given that doesn't happen often, I will have forgotten all about this. I need to fix this in the template so I can then export it.
I have changed all the text I can see to either Arial or Calibri, but this Poppins font is still in there and I have no idea how to get it out.
Specifics:
PowerPoint version is 16.36
The program is actually PowerPoint for Mac (if that matters)
If anyone solved a similar issue and can give me some direction it would be much appreciated.

The buFont tag means that font is being used for a bullet rather than for actual text. Probably a text level somewhere uses a custom bullet specified with this font. Each content or text placeholder can have up to 9 text levels, so you may have to create all 9 levels using Home>Indent More to find the right one.
Start with the Slide Master (View>Slide Master>the larger thumbnail at the top). Then check each placeholder on each Layout (smaller thumbnails below the Master). Finally, check each multilevel placeholder on each slide, in case this was added with local formatting.
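Instead of indenting through all nine levels by hand, you can also ask the XML which level carries the bullet font. The following is a minimal Python sketch using only the standard library: given the bytes of one part of the package (for example the slide master, read with `zipfile`), it returns the name of every element whose `a:buFont` child uses the font. In a slide master these are typically `a:lvl1pPr` through `a:lvl9pPr` inside the text styles, so a result of `lvl4pPr` would point at the fourth text level. The function name is hypothetical, and the part path in the usage note assumes the usual layout of a PowerPoint package.

```python
import xml.etree.ElementTree as ET

# DrawingML namespace used for a:buFont, a:lvl1pPr, a:pPr, etc.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def bullet_levels_using(part_xml: bytes, font: str) -> list[str]:
    """Return the local names of elements whose a:buFont child uses `font`."""
    root = ET.fromstring(part_xml)
    found = []
    # ElementTree has no parent links, so check each element's direct children.
    for parent in root.iter():
        for child in parent:
            if child.tag == A_NS + "buFont" and child.get("typeface") == font:
                found.append(parent.tag.split("}")[-1])  # e.g. "lvl4pPr"
    return found
```

For example, `bullet_levels_using(zipfile.ZipFile("template.potx").read("ppt/slideMasters/slideMaster1.xml"), "Poppins")`; repeat for the parts under `ppt/slideLayouts/` and `ppt/slides/` to catch local formatting.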
My go-to technique is to unzip the presentation into the XML files and do a find and replace on them. That's the quickest way to replace fonts, which can be tucked away in all kinds of obscure places in a presentation. On a Mac, this takes a bit of preparation to avoid problems caused by the OS. If you regularly create PowerPoint files, it may be worth it to set this up. Here's my article on this: OOXML Hacking: Editing in macOS. Look for the part about using a USB or network drive that is set to not create hidden .DS_Store files. Then use a text editor like BBEdit to do multi-file find and replace operations on the font name.
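The same multi-file find and replace can be done without unpacking the file to disk at all, which also avoids the .DS_Store problem mentioned above. This is a minimal Python sketch using the standard `zipfile` module: one function counts `typeface="..."` occurrences per XML part, the other copies every part into a new file, replacing `typeface="Old"` with `typeface="New"` in the XML parts and keeping the parts in their original order. The function and file names are hypothetical; work on a copy, and note that related attributes on the same elements (such as `panose` or `pitchFamily`) are left untouched.

```python
import zipfile


def find_font_refs(path: str, font: str) -> dict[str, int]:
    """Count typeface="<font>" attributes in each XML part of a .pptx/.potx package."""
    attr = f'typeface="{font}"'.encode("utf-8")
    hits = {}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith(".xml"):
                count = zf.read(name).count(attr)
                if count:
                    hits[name] = count
    return hits


def replace_font(src: str, dst: str, old: str, new: str) -> int:
    """Copy src to dst, changing typeface="old" to typeface="new"; return the number replaced."""
    old_attr = f'typeface="{old}"'.encode("utf-8")
    new_attr = f'typeface="{new}"'.encode("utf-8")
    replaced = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():  # infolist() preserves the original part order
            data = zin.read(info)
            if info.filename.endswith(".xml"):
                replaced += data.count(old_attr)
                data = data.replace(old_attr, new_attr)
            zout.writestr(info, data)  # reuses the original name and compression method
    return replaced
```

Running `find_font_refs` first shows which parts (master, layouts, slides, theme) actually contain the font before anything is changed.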

I have PowerPoint 16.39 on my MacBook Pro. Click PowerPoint in the menu bar at the upper left, then Preferences, then the Save icon. At the bottom you'll find Font Embedding. If you uncheck this option, it should no longer save fonts into the template.

Extract text from rectangle on Windows screen without using OCR

Given a rectangle that represents an area on a Windows screen that contains text, what is the best way to extract the text?
I know that it is possible using OCR, but even after significant preprocessing, the quality is really poor.
Getting the window text using the Win32 API does not always work either.
Assuming that the text was rendered using a font, is it possible to get it from there?
Any directions would be extremely helpful. Thanks!

Given a rectangle that represents an area on a Windows screen, the best way to extract the text is indeed OCR. Use a better OCR library, like this one from Microsoft.
The reason getting the window text using the Win32 API does not work well is that there may be multiple windows in that rectangle. You will have to find out which windows the rectangle contains and send a message to each one to get its text. It is not impossible, but it is difficult, and even if you manage it, you will run into issues with text alignment, etc. OCR is your best option.
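If you do try the Win32 route, the alignment part can at least be handled mechanically once you have each control's screen rectangle and text (collected, for example, with EnumChildWindows, GetWindowRect and WM_GETTEXT; that collection step is not shown). A minimal Python sketch of the post-processing: keep the windows that overlap the target rectangle and join their text in reading order. `Rect`, `text_in_rect` and the 5-pixel line tolerance are illustrative assumptions.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Screen rectangle in pixels, as (left, top, right, bottom) like a Win32 RECT."""
    left: int
    top: int
    right: int
    bottom: int

    def intersects(self, other: "Rect") -> bool:
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


def text_in_rect(target: Rect, windows: list[tuple[Rect, str]],
                 line_tolerance: int = 5) -> str:
    """Join the text of windows overlapping `target`, in reading order."""
    hits = [(rect, text) for rect, text in windows if text and rect.intersects(target)]
    hits.sort(key=lambda hit: hit[0].top)
    rows: list[list[tuple[Rect, str]]] = []
    for rect, text in hits:
        # Same visual line if the top edge is within line_tolerance px of the row's first window.
        if rows and rect.top - rows[-1][0][0].top <= line_tolerance:
            rows[-1].append((rect, text))
        else:
            rows.append([(rect, text)])
    # Within a row, order left to right; rows are joined with newlines.
    return "\n".join(" ".join(text for _, text in sorted(row, key=lambda hit: hit[0].left))
                     for row in rows)
```

Grouping rows before sorting left to right keeps two labels whose top edges differ by a pixel or two on the same line.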

It does seem possible without using OCR, as NirSoft SysExporter can do this:
https://www.nirsoft.net/utils/sysexp.html
This may be suitable for programmatic use as it can be run from a command line:
Starting from version 1.70, you can export the content of Windows control from command-line, without displaying any user interface.
You may not be able to target it at a specific rectangle on the screen, but maybe the same result could be achieved by first scraping everything followed by some post-processing.
Further basic info:
SysExporter utility allows you to grab the data stored in standard list-views, tree-views, list boxes, combo boxes, text-boxes, and WebBrowser/HTML controls from almost any application running on your system, and export it to text, HTML or XML file.
...
Known Limitations
SysExporter can export data from most combo boxes, list boxes, tree-view, and list-view controls, but not from all of them. There are some applications that use these controls to display data, but the data itself is not actually stored in the control, but in another location in the computer's memory. In such cases, SysExporter won't be able to export the data.
Personally I've used it to grab text from what look like label controls.

exist-db: how to access a PDF

I am sure it is very simple... I just cannot get my head around this.
The exist-db documentation is a bit fuzzy on content extraction:
http://exist-db.org/exist/apps/doc/contentextraction.
I have a PDF file containing about 162 high-res images (the PDF is quite big...), and I do not know how to access any of the images that are presumably created...
Please do not destroy me! I am just starting to build a database (for an edition at uni). I'd love to have a facsimile edition (one tab with the image file and one tab with the transcribed text).
I am aiming for something similar to what Heidelberg University did with the "Welsche Gast Digital": http://digi.ub.uni-heidelberg.de/diglit/cpg389/0190/image
(the chosen image is just an example!)
When you click Faksimile the scan opens, and when you click Transkription the transcribed text opens!
I am quite new to XQuery, XPath and most X-related stuff. I have a "working design" put together in exist-db and am looking at TEI for marking up the transcription etc. I fear I'll have to spend quite some time on this issue...
(it is not about doing my job for me, it's just about pointing me in the right direction)

I'm afraid the short answer is simply: don't.
Storing a pdf in your db, and then trying to extract images from it, is kind of a recipe for disaster. Instead you should use the source images (not necessarily extracted from the pdf), and store these individually in a collection (e.g. resources/img). Those image files are then the binary resources that the documentation is actually talking about.
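Storing the page images one by one can be scripted against eXist-db's REST interface, which accepts an HTTP PUT to the path where the resource should be stored. The following minimal Python sketch only builds the request with the standard library; the server URL, the collection path and the admin credentials are placeholders (assumptions) to adjust to your installation, and `build_put_request` is a hypothetical name.

```python
import base64
import mimetypes
import urllib.parse
import urllib.request
from pathlib import Path


def build_put_request(image_path: str, collection: str,
                      base_url: str = "http://localhost:8080/exist/rest",
                      user: str = "admin", password: str = "") -> urllib.request.Request:
    """Build (but do not send) a PUT that stores one image in an eXist-db collection."""
    path = Path(image_path)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    # The request path is the collection path plus the file name, e.g. /db/.../page001.jpg
    url = f"{base_url}{collection.rstrip('/')}/{urllib.parse.quote(path.name)}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=path.read_bytes(),
        method="PUT",
        headers={"Content-Type": mime, "Authorization": f"Basic {token}"},
    )
```

Looping over the image files and sending each request with `urllib.request.urlopen(req)` stores them as individual binary resources.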
You might want to take a look at tei-publisher for creating a digital edition in exist-db, especially this demo app for how to present high-res facsimiles with transcribed portions of text. I'm afraid it's all a bit more involved than just opening a PDF in a browser, but so is the Welsche Gast Digital.
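For the image/transcription pairing itself, TEI has a standard mechanism: a `<facsimile>` element with one `<surface>` per page image, which the transcription points to through `facs` attributes (for example `<pb facs="#f001"/>`). A minimal Python sketch that generates that block from a list of image paths; the `f001`-style IDs and the image paths are assumptions, and how tei-publisher then displays them depends on its configuration.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id
ET.register_namespace("", TEI)  # write TEI as the default namespace


def facsimile(image_urls: list[str]) -> ET.Element:
    """Build <facsimile> with one <surface>/<graphic> per page image."""
    facs = ET.Element(f"{{{TEI}}}facsimile")
    for n, url in enumerate(image_urls, start=1):
        surface = ET.SubElement(facs, f"{{{TEI}}}surface", {XML_ID: f"f{n:03d}"})
        ET.SubElement(surface, f"{{{TEI}}}graphic", {"url": url})
    return facs
```

`ET.tostring(facsimile(["img/p001.jpg", "img/p002.jpg"]), encoding="unicode")` gives the block to paste into the TEI document next to `<text>`.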

Generating Powerpoint or Keynote from XML (or via a Ruby gem?)

I'm looking for a nice way to generate either a Keynote file from XML or a Powerpoint file that I can then import to Keynote. Basically, I'm looking for a simple human-writable markup format (for easy scripting) that can be exported into slides.
I volunteer with a local nonprofit, where anything remotely technical falls to me. On a fairly regular basis, I'm sent information for events and produce a nice looking printed program in Word, though much of the same material also goes into slides in Keynote. (Keynote is used rather than PowerPoint so that Keynote Remote can be used.)
Anyway, there's a large volume of text I work with that I'm sent via email, and it has to go in both a Keynote presentation and a Word document, and requires all sorts of odd manual formatting to not break pages or slides at odd times, also requiring a good deal of manual restyling, since I'm not going to allow something I do to come out looking like something sloppy from the 1990s.
My hope is to write up a Ruby script that I can feed the source text to, and it'll go do all the processing for me, at least for Powerpoint or Keynote. I've normally had fantastic luck finding a gem for just about any format or service I've wanted to work with, but I haven't found anything that works with Powerpoint or Keynote.
My next thought was to have the Ruby code generate appropriate XML, since both Office and iWork allegedly open the Office XML format, but I couldn't find any friendly documentation for human-writable XML.
Is it wishful thinking to want to be able to do something like the following?
<SLIDE FORMAT="Title & Bullets">
<SLIDE_TITLE>
Lorem Ipsum
</SLIDE_TITLE>
<PARAGRAPH>
[etc.]
All the converter scripts I can find are related to charts and tables and such (which is of zero use here), usually revolve around opening or converting FROM PowerPoint or Keynote rather than creating, and generally seem to be for Windows, using OLE or VBScript. This needs to run on the Macs they have there, so no Visual Studio stuff, Windows-related scripting, etc. will work. I don't HAVE to do it in Ruby, but that's what I'd be most comfortable with on the Mac end of things.
So is there documentation out there on a marginally friendly XML format for Powerpoint or Keynote, or even better, a Ruby gem for either?

If all you need is title + bullet point slides, you simply need to create an ASCII text file. Each line of text will become the title of a new slide. But if the first character in a line of text is a tab, the line will become a first-level bullet point on the same slide as the previous title. Two tabs indent the text to a second-level bullet point, and so on.
This becomes the title on slide one
This becomes the title on slide two
<tab>This is a bullet point, first level
<tab><tab>And this is a bullet point, second level
<tab>Back to first level bullet point
And another new slide
Once you have the text file, you can use File > Open in PowerPoint, set the file type to All Files (*.*) and select your .TXT file. Or you can use Insert Slide From File to bring the .TXT file into an existing presentation.
There's a limit to the number of slides you can create at one go like this; 100 perhaps?
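Since the goal is to script this from the emailed text, the outline file is easy to generate. The question asks about Ruby; this sketch is in Python, and `outline_text` and its (title, bullets) input shape are hypothetical. It emits one line per slide title and one tab per bullet level, and collapses stray line breaks and tabs inside each item so they cannot accidentally start a new slide or change a level.

```python
def _one_line(text: str) -> str:
    # Collapse tabs/newlines so an item can't start a new slide or change its level.
    return " ".join(text.split())


def outline_text(slides: list[tuple[str, list[tuple[int, str]]]]) -> str:
    """Build the tab-indented outline: titles at column 0, bullets after N tabs."""
    lines = []
    for title, bullets in slides:
        lines.append(_one_line(title))
        for level, text in bullets:
            if level < 1:
                raise ValueError("bullet levels start at 1 (one tab)")
            lines.append("\t" * level + _one_line(text))
    return "\n".join(lines) + "\n"
```

Write the returned string to a .txt file (plain ASCII is the safest assumption) and open it in PowerPoint as described above.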
Note also that VBA disappeared in Mac Office 2008 but is back in Mac Office 2011, so if you can find examples of VB/VBA code that do what you want, you can use them on the Mac, as long as it doesn't have to happen in Office 2008.

Why does the alt attribute show for a split second in Firefox?

I'm working with the course management system Moodle, and in the admin area the folder tree (which uses folder icons) displays the given alt attribute (in this case "Open Folder") for about a second, then hides it and shows the image when the image is ready.
The system is kind of slow so I assume Firefox thinks at first that the images don't exist.
This is a problem because during that split second the layout stretches to fit the wider words making it look unprofessional in my opinion.
Is there a way I can hide this text without having to remove the alt attributes (which would be labor-intensive)? Maybe using jQuery or CSS?

displays for about a second the alt attribute given (In this case "Open Folder") then it hides and shows the image when the image is ready.
Yes, that's what alt text is for: it provides a textual alternative for when the image isn't available — whether that's because there's an error, or images are turned off in the browser settings, or, in this case, the file just hasn't arrived yet.
Is alt text really what you want? Unless the image in question actually contains the words “Open Folder”, the above is inappropriate alt text. If we're talking about one of those little plus/minus icons that opens a tree, a better alt text would be ‘+’. “Open folder”, as a description of what the image does (as opposed to what it contains), would be better applied to the ‘title’ attribute used for tooltips.
Note that if you're using Quirks Mode and the image has a fixed size specified, Firefox will use a ‘broken image’ icon with the alt text overlaid and cropped inside, instead of the plain alt text on its own. This is to match IE's old behaviour. But you don't really want to use Quirks Mode, and in the common case where the fixed size is small, the cropping makes the alt text unreadable and useless.
This is a problem because during that split second the layout stretches to fit the wider words making it look unprofessional in my opinion.
I'd recommend getting over it. That's how the web rolls: any page can move about a bit as it renders progressively. For images you should only ever see it happen once; after that the image will be cached and will appear straight away. If it doesn't, there's something wrong with the caching setup.
Depending on what kind of layout you are talking about, you can perhaps fix that to not respond to the changing image size, too. For example if using a table, setting “table-layout: fixed” on the table and “width: (some number of)px” on the top row's image cell will make it stick to that width even if the text inside is smaller. Possibly causing the alt text to run over into the next cell though, mind.

If the images are part of the layout, I'd recommend moving them to CSS. You should also optimize your images wherever possible whether they are CSS or otherwise. You could also move your JavaScript files to the bottom of the page where possible as they block parallel downloads. In general, applying a lot of the techniques here would probably help.

If the images have to be a certain width, give them an explicit width.
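If the icons are static files, their width and height can be read from the image itself and written into the markup, so the browser reserves the right box before the image arrives and the layout does not jump. A minimal Python sketch for PNG files (width and height are the first two fields of the IHDR chunk, right after the 8-byte signature); `png_size` and `img_tag` are hypothetical helpers, and wiring the result into Moodle's output is not shown. The usage follows the alt/title advice given earlier in this thread.

```python
import html
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Bytes 8-11 are the chunk length, 12-15 the type, then two big-endian 32-bit ints.
    return struct.unpack(">II", data[16:24])


def img_tag(src: str, data: bytes, alt: str, title: str = "") -> str:
    """Build an <img> tag with explicit width/height taken from the PNG bytes."""
    width, height = png_size(data)
    attrs = [f'src="{html.escape(src)}"', f'width="{width}"', f'height="{height}"',
             f'alt="{html.escape(alt)}"']
    if title:
        attrs.append(f'title="{html.escape(title)}"')
    return "<img " + " ".join(attrs) + ">"
```

For a 16x16 folder icon, `img_tag("icons/folder.png", data, "+", "Open folder")` yields a tag with `width="16" height="16"`, so the short alt text sits inside a fixed 16x16 box instead of stretching the tree.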
