I have a PDF file that was scanned from a hard copy, so the PDF contains only an image of the page. When I try to convert the PDF to Word, I don't get an editable document; I get an image sitting in the Word document. Is there any way to make an editable Word document out of it? Is there any software that will help me do that?
It's called optical character recognition (OCR).
There are lots of software packages that do this. To do it from your own program, try Tesseract: http://code.google.com/p/tesseract-ocr/
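As a rough sketch of how this could look from a script: the Tesseract command-line tool works on images, so a scanned PDF is usually rendered to page images first. The example below assumes that both `pdftoppm` (from poppler-utils) and `tesseract` are installed and on the PATH; the function names, the 300 dpi default, and the `page` file prefix are illustrative choices, not anything from the answer above.

```python
import subprocess
from pathlib import Path


def pdf_to_png_command(pdf_path, out_prefix, dpi=300):
    """Build the pdftoppm command that renders each PDF page to a PNG."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def ocr_command(image_path, out_base, lang="eng"):
    """Build the tesseract command; tesseract writes its text to out_base + '.txt'."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def ocr_pdf(pdf_path, work_dir):
    """Render a scanned PDF to images, OCR each page, and return the joined text."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    prefix = work_dir / "page"  # hypothetical prefix for the page images
    subprocess.run(pdf_to_png_command(pdf_path, prefix), check=True)

    texts = []
    for image in sorted(work_dir.glob("page-*.png")):
        out_base = image.with_suffix("")  # e.g. page-1.png -> page-1
        subprocess.run(ocr_command(image, out_base), check=True)
        texts.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)


if __name__ == "__main__":
    # Hypothetical file names; the recognized text can then be pasted into Word.
    print(ocr_pdf("scan.pdf", "ocr_work"))
```

The recognized text is plain text, so the original layout and fonts are not preserved; it will usually need proofreading.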
I have an image that I'm trying to embed in the markdown text of the document. When I click on the "Insert image" icon and navigate to my image, it inserts an ENORMOUS block of text. I copy/pasted into a text file and it's over 287,000 characters long. It makes the notebook obnoxious to work with. My questions are:
Is there a way to redact this somehow?
Why on earth would the path be so long?
I've searched for workarounds (like this one), but I can't find any that work for me.
As in the title: imagine a GIMP .xcf file containing many layers, some of which contain text. Is there any format I can export the .xcf file to that preserves the text in human-readable form?
The final goal is to process that text and put it back into the file. I am aware that this sounds unusual, but maybe some of you have an idea how to achieve a scenario like that.
I did some research and saw that I can export the image to .psd format and then use an NPM package to process that image and extract the text. This only partially solves the problem, because I would not know how to put the processed text back into the .psd file (unless I decompile this NPM package and try to write some implementation myself...).
Any solutions and alternatives are highly appreciated.
You can script GIMP (using Scheme or Python). Technically you cannot change the text in a layer (there is no API for that), but you can recover the characteristics of a text layer (original text, font type, font size...) and create a new layer with the new text. Here is some Python code to recover the text information:
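from gimpfu import pdb  # already available in the Python-Fu console

def text_info(img, layer):
    # Collect the names of the parasites attached to this layer, if any.
    parasites = None
    try:
        parasites = layer.parasite_list()
    except Exception:
        pass
    # The text settings are kept in the 'gimp-text-layer' parasite.
    if parasites and 'gimp-text-layer' in parasites:
        data = layer.parasite_find('gimp-text-layer').data
        pdb.gimp_message('Text layer "%s": %s' % (layer.name, data))
    else:
        pdb.gimp_message('No text information found for layer "%s"' % layer.name)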
(This information is only present if the file has been saved; it is not available on a newly created layer, but this shouldn't be a problem in your case.)
Of course, if the text is in a plain bitmap layer of its own, this cannot be done, and you have to guess the font type and size (though sometimes the code above can still recover the text information).
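To process the recovered string, it has to be split into its individual settings. The sketch below assumes the parasite data is a series of one-line parenthesized entries such as `(text "Hello")` and `(font-size 18.000000)`; that layout is an assumption about what GIMP writes, and the helper name `parse_text_parasite` is made up for illustration. Multi-line text and entries with markup are not handled.

```python
import re

# One entry per line: "(key value)"; the value may be a quoted string.
_ENTRY = re.compile(r'^\((?P<key>[\w-]+)[ \t]+(?P<value>.*)\)[ \t]*$', re.MULTILINE)


def parse_text_parasite(data):
    """Return a dict of the simple one-line entries found in the parasite data."""
    info = {}
    for match in _ENTRY.finditer(data):
        value = match.group("value").strip()
        # Strip surrounding quotes and unescape embedded double quotes.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1].replace('\\"', '"')
        info[match.group("key")] = value
    return info


if __name__ == "__main__":
    # Assumed sample of the data string reported by text_info above.
    sample = '(text "Hello")\n(font "Sans")\n(font-size 18.000000)\n'
    print(parse_text_parasite(sample))
```

The resulting values (text, font, size) are what you would feed into the new text layer after processing the text.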
But if your XCF has a simple structure, it can be a lot simpler to decompose it into individual images, and build a new image with ImageMagick, using some of these layers plus new text images (or directly rendered text).
I created an RTF template in MS Word. My problem is with text wrapping in the output Excel cells.
The data gets wrapped in the output cell, but the full data is not visible when I open the .xls file.
I tried:
- Unchecking Wrap text.
- Resizing the column width.
- Checking Fit text.
- Checking Automatically resize to fit content.
but it didn't work. Can anyone help me find what the problem is?
Regards,
Mint
If you don't have any formatting requirements and you just need to export data into something Excel can easily work with, try e-text templates. Use the comma-separated version, not the fixed-width one.
I would like to know if there is any way to extract just the relevant data from a PDF file. Suppose we have something like Name:John; can we somehow automate taking just this field value in order to store it somewhere, such as a predefined database or file? Thanks.
Use pdftotext to extract text content from your pdf file. Then parse the text file with your favorite programming language.
If your PDF doesn't contain real text, just images of text, you will need to use optical character recognition (OCR) software to extract the text.
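A minimal sketch of that two-step approach in Python, assuming `pdftotext` (from poppler-utils) is installed and that each field sits on its own line in the form `Label:Value`, as in the Name:John example. The function names and the `form.pdf` file name are illustrative.

```python
import re
import subprocess


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text ('-' sends output to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def extract_field(text, label):
    """Return the value after 'label:' on the first matching line, or None."""
    pattern = re.compile(
        r"^[ \t]*" + re.escape(label) + r"[ \t]*:[ \t]*(.+?)[ \t]*$",
        re.MULTILINE,
    )
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical input file; store the value wherever you need it afterwards.
    text = pdf_to_text("form.pdf")
    print(extract_field(text, "Name"))
```

If the fields in your PDF are laid out differently (for example, label and value in separate columns), the pattern would need to be adapted to the text that `pdftotext` actually produces.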
I have a question about creating a binary / raw image file.
I've made an image in Photoshop and now I want to load it in a C program.
I followed this tutorial http://www.nullterminator.net/gltexture.html but I don't know how to convert my own image to a .RAW file.
Can anyone help me out with this question?
Download ImageMagick. It is a command-line utility that can convert images between many formats, and it supports many platforms.
So you could save your file in Photoshop as, for example, a PNG file, then run the following command to convert it to raw 8-bit grayscale:
convert MyImage.png -depth 8 gray:MyRawImage.raw
Try Save As and selecting RAW from the format drop-down. Humorously, it is listed under "P" for Photoshop RAW.
I don't know if it is the right kind of RAW which GLUT requires...
This worked for me, with that tutorial code:
1. Open your image in GIMP.
2. Go to Layer/Transparency/Remove Alpha Channel. If it's already removed the option will be greyed out, which is fine. If you have multiple layers, do this for all of them. (You MUST remove the alpha channel or else GIMP will write RGBA instead of RGB, and you'll just see a repeating lattice pattern instead of your image.)
3. File/Save As... and at the very bottom of the save popup, there's an option "Select File Type (By Extension)" with a (+). Expand it.
4. Select Raw image data.
5. At the top of the save popup, manually give your file a .raw extension and save. Click OK to accept the default options.
Then you should be able to move the saved file to your program's directory and read it in with the code from that tutorial.
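One quick way to check whether the alpha channel really was removed is to compare the file size with the image dimensions: a headerless raw file should be exactly width × height × channels bytes. Here is a small sketch of that check; the helper names, the file name, and the 256×256 dimensions are assumptions, so substitute the size of your own image.

```python
import os


def classify_raw_size(size_in_bytes, width, height):
    """Guess the pixel layout of a headerless raw file from its size."""
    pixels = width * height
    if size_in_bytes == pixels * 3:
        return "RGB"        # what the steps above should produce
    if size_in_bytes == pixels * 4:
        return "RGBA"       # alpha channel was not removed
    if size_in_bytes == pixels:
        return "grayscale"  # one byte per pixel
    return "unknown"        # wrong dimensions, or the file has a header


def check_raw_file(path, width, height):
    """Classify an existing raw file on disk."""
    return classify_raw_size(os.path.getsize(path), width, height)


if __name__ == "__main__":
    # Hypothetical file name and dimensions.
    print(check_raw_file("texture.raw", 256, 256))
```

If this reports RGBA, go back and remove the alpha channel before exporting again.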
Also, to save yourself another source of headaches, I suggest adding an error message if the file isn't found; for example, replace the line
if ( file == NULL ) return 0;
with
if ( file == NULL ) {
    printf("texture file not found.");
    return 0;
}