Text Extraction with bold and italics identification - vb6

I want to extract text from a PDF and identify bold and italic text. For example, bold letters need to be extracted like this: <b>TEST</b>, and italics must be enclosed like <i> test </i>.
Currently I am using texttopdf.exe to extract the text. The accuracy is good, but it cannot identify bold or italic text.
Does anyone have another idea, or does the same tool have this feature?
Thanks in Advance
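PDFs usually do not store "bold" or "italic" as text attributes; the style is normally implied by the font used for each run of text (for example a font named Times-Bold). The sketch below shows only the tagging step, in Python, and it assumes that some extractor has already produced (text, font name) pairs in reading order. That input shape, the function name `wrap_spans`, and the name-matching heuristic are all assumptions for illustration, not features of any particular tool.

```python
def wrap_spans(spans):
    """Wrap text runs in <b>/<i> tags based on their font names.

    spans: list of (text, font_name) pairs in reading order.
    This input shape is hypothetical; a real extractor must supply it.
    """
    out = []
    for text, font in spans:
        name = font.lower()
        # Heuristic: style is guessed from the font name, which can fail
        # for fonts whose names do not mention their weight or slant.
        bold = "bold" in name
        italic = "italic" in name or "oblique" in name
        if italic:
            text = f"<i>{text}</i>"
        if bold:
            text = f"<b>{text}</b>"
        out.append(text)
    return "".join(out)


sample = [
    ("A ", "Times-Roman"),
    ("TEST", "Times-Bold"),
    (" and ", "Times-Roman"),
    ("test", "Times-Italic"),
]
print(wrap_spans(sample))  # A <b>TEST</b> and <i>test</i>
```

Adjacent runs with the same style are tagged separately here; merging them would be a small extension.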

Related

RUBY plain text to Docx with specific formatting

I regularly have to produce word documents that are pretty standard. The content changes regarding certain parameters, but it's always a mix of pre-written stuff. So I decided to write some ruby code to do this more easily and it works pretty well on creating the txt file with the final text I need.
The problem is that I need this text converted to .docx and with specific formatting. So, I'm trying to find a way to indicate in the text file which text should be bold, italic, have different indentation, or be a footnote, to make it easy to interpret (like html does). For example:
<b>this text should be bold</b>
\t indentation works with the tabs
<i>hopefully this could be italic</i>
<f>and I wish this could be a footnote of the previous phrase</f>
However, I haven't been able to do this.
Does anybody know how this can be achieved? I've read about macros and pandoc, but haven't had any luck achieving this. Seems too complicated for macros. Maybe what I'm trying is not the best way. Perhaps with LaTeX or creating html and then converting to word? Can html create footnotes? (that seems to be the most complicated)
I have no idea, I just learned Ruby with a video tutorial, so my knowledge is very limited.
Thanks everybody!
EDIT: Arjun's answer solved almost the whole issue, but the gem he pointed out doesn't include functionality for footnotes, which unfortunately make up a big part of my documents. So if anybody knows a gem that does, it would be greatly appreciated. Thanks!
Ahh Ruby got gems for that ;)
https://github.com/trade-informatics/caracal
This would help you to write docs from Ruby code itself.
From the README:
docx.p 'this text should be bold' do
  style 'custom_style' # sets the paragraph style. generally used at the exclusion of other attributes.
  align :left # sets the alignment. accepts :left, :center, :right, and :both.
  color '333333' # sets the font color.
  size 32 # sets the font size. units in 1/2 points.
  bold true # sets whether or not to render the text with a bold weight.
  italic false # sets whether or not to render the text in italic style.
  underline false # sets whether or not to underline the text.
  bgcolor 'cccccc' # sets the background color.
  vertical_align 'superscript' # sets the vertical alignment.
end
There is also this gem, https://github.com/nickfrandsen/htmltoword, which converts plain html to doc files. I haven't tried it though.
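Whichever library writes the final document, the tagged text file from the question first has to be split into styled runs. Below is a minimal sketch of that step, written in Python rather than Ruby. The names `TAG_RE`, `STYLES` and `split_runs` are made up for illustration, `<f>` is the question's own footnote marker rather than a real HTML tag, and nested tags are not handled.

```python
import re

# <b>, <i> and the question's own <f> marker; nesting is not supported.
TAG_RE = re.compile(r"<(b|i|f)>(.*?)</\1>", re.DOTALL)
STYLES = {"b": "bold", "i": "italic", "f": "footnote"}


def split_runs(line):
    """Split one line into (style, text) pairs; untagged text is 'plain'."""
    runs = []
    pos = 0
    for match in TAG_RE.finditer(line):
        if match.start() > pos:
            # Text between the previous tag and this one.
            runs.append(("plain", line[pos:match.start()]))
        runs.append((STYLES[match.group(1)], match.group(2)))
        pos = match.end()
    if pos < len(line):
        runs.append(("plain", line[pos:]))
    return runs


for run in split_runs("Intro <b>bold</b> and <i>italic</i><f>a note</f>."):
    print(run)
```

Each (style, text) pair can then be passed to the document writer in use; footnote runs still need a writer that supports footnotes, which is the gap mentioned in the question's edit.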

Graphviz bold font attribute

I would like to change the font attribute as in the example below.
Is there any way to change the font of just a few words inside a label, instead of changing the global attribute fontname="times bold italic"? I need to convert to PNG.
You may use HTML-like labels in graphviz and define labels with partially bold text:
mynode [label=<<FONT FACE="boldfontname">bold text</FONT>>]
Or use the <B> tag:
mynode [label=< <B>bold text</B> regular text >]
If you are really desperate, you could also copy and paste bold Unicode strings into your graph description, for example using the following website:
https://lingojam.com/BoldTextGenerator
Apparently, this too does not work in all setups.
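If the graph description is generated by a script, the HTML-like label with <B> can be built programmatically. The sketch below is a hedged Python example; `html_label` and `node_line` are hypothetical helper names. It escapes &, < and >, because those characters are markup inside HTML-like labels.

```python
from html import escape


def html_label(parts):
    """Build a Graphviz HTML-like label from (text, is_bold) pairs."""
    chunks = []
    for text, is_bold in parts:
        # &, < and > are markup inside HTML-like labels, so escape them.
        safe = escape(text, quote=False)
        chunks.append(f"<B>{safe}</B>" if is_bold else safe)
    # HTML-like labels are delimited by <...> instead of "...".
    return "<" + "".join(chunks) + ">"


def node_line(name, parts):
    """Return one DOT node statement with a partially bold label."""
    return f"{name} [label={html_label(parts)}]"


line = node_line("mynode", [("bold text", True), (" regular text", False)])
dot_source = "digraph G {\n  " + line + "\n}\n"
print(dot_source)
```

Saving the printed text as graph.dot and running `dot -Tpng graph.dot -o graph.png` should produce the PNG, provided the installed Graphviz build supports the <B> tag for that output format.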

text highlight in markdown

Within a Markdown editor I want to support text highlight, not in the sense of code highlighting, but the type of highlighting people do on books.
In code oriented sites people can use backquotes for a grey background, normally inline code within a paragraph. However on books there is the marker pen for normal text within a paragraph. That is the classical black text on yellow background.
Is there any syntax within Markdown (or its variants) to specify that the user wants that type of highlight? I want to preserve the backquote syntax for code-related marking, but also want a way to enable highlighted user text.
My first thought is just using double backquotes, since triple backquotes are reserved for code blocks. I am just wondering if other implementations have already decided a syntax for it... I would also appreciate if someone could justify if this is a very bad idea.
As the markdown documentation states, it is fine to use HTML if you need a feature that is not part of Markdown.
HTML5 supports
<mark>Marked text</mark>
Else you can use span as suggested by Rad Lexus
<span style="background-color: #FFFF00">Marked text</span>
I'm late to the party but it seems like a couple of markdown platforms (Quilt & iA Writer) are using a double equal to show highlighting.
==highlight==
Typora is also using double equal for highlighting. It would be nice if that became a CommonMark standard, as mentioned by DirtyF. It would be convenient for those who use it frequently, since it is only 4 repeated chars: ==highlight==
If you want the option to use multiple editors, it may be best to stick with <mark>highlight</mark> for now, as answered by Matthias.
Here is the latest spec from CommonMark, "which attempts to specify Markdown syntax unambiguously". Currently "highlighting" is not included.
Editors using ==highlight== from comments mentioned previously:
Typora
Obsidian
Quilt
iA Writer
Feel free to add to this list.
You can use the grave accent (backtick) ` to highlight text in Markdown:
`Highlighted text`
Also works with VS Code extension markdownlint
Grey-colored Highlighting Solution
A possible solution is to use the <code> element:
This solution works really well on git/github, because git/github doesn't allow css styling.
Note:
Using the code-element for highlighting is not semantic.
However, it is a possible solution for adding grey-colored highlighting to text in markdown.
Markdown/HTML
<code> <i>This text will be italic</i> <b>this text will be bold</b> </code>
Output
This text will be italic this text will be bold
Roam markdown uses double-caret: ^^highlight^^. Andrew Shell's answer mentions double-equals.
The accepted and clearly correct answer is <mark> from Matthias above, but I thought I had seen carets in some other flavor of markdown. Maybe not. I want to transform my ^^highlights^^ to <mark>highlights</mark> in pandoc conversion to html, and somehow ended up here...
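One way to get from ^^highlights^^ to <mark>highlights</mark> is to pre-process the Markdown text before handing it to the converter. The following is a minimal Python sketch of that idea, not a pandoc filter. The function name `highlight_to_mark` is hypothetical, the delimiter is a parameter so the same code covers ==highlight==, and only single-backtick inline code spans are protected (fenced code blocks are not).

```python
import re

# Single-backtick inline code spans; the capturing group makes re.split
# keep them in the result so they can be passed through unchanged.
CODE_SPAN = re.compile(r"(`[^`]*`)")


def highlight_to_mark(text, delim="^^"):
    """Replace delim...delim with <mark>...</mark> outside inline code."""
    d = re.escape(delim)
    pattern = re.compile(d + r"(.+?)" + d)
    pieces = CODE_SPAN.split(text)
    out = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            # Odd indices are the captured code spans: leave them as-is.
            out.append(piece)
        else:
            out.append(pattern.sub(r"<mark>\1</mark>", piece))
    return "".join(out)


print(highlight_to_mark("I want ^^this^^ but not `^^code^^`."))
print(highlight_to_mark("Some ==marked== text.", delim="=="))
```

The output of this step is ordinary Markdown with inline HTML, which fits the earlier advice to fall back on <mark> for portability.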
Probably the best bet is to just use HTML, e.g.
<pre><b>Hello</b> is highlighted</pre>
Hello is highlighted
Remember nearly all html is valid in markdown too.

Convert markdown backticks to asciidoc

I'm switching over from Markdown to AsciiDoc and have a question. In my Markdown file, I use backticks to indicate code font (`foo.bar()`). When this is converted to HTML, the text gets placed inside code blocks (<code>foo.bar()</code>).
How should I format a text fragment in asciidoc if I want it to appear within code blocks when the document is converted to html?
Late and short answer:
You can use backticks just like in Markdown.
From the AsciiDoc User manual:
Monospaced text
Word phrases +enclosed in plus characters+ are rendered in a monospaced font. Word phrases `enclosed in backtick characters` (grave accents) are also rendered in a monospaced font but in this case the enclosed text is rendered literally and is not subject to further expansion
As you can see here, you can use `foo.bar()` the same way in asciidoc.
Here's an example of that:
* Use the `fgetc` or `getc` functions to read characters (...);

Syntax Highlighting for plain text (Sublime Text)

I am a great fan of syntax highlighting in any form, but I am missing something similar for plain text files. Imagine different colors for indented lines or lines preceded by special chars. Does anything like that already exist? I'd especially appreciate a plugin for Sublime Text.
The closest thing I know of is the PlainTasks plugin:
It's a plugin to make styled TODO lists, but what you see in the screenshot is basically it.
You could modify the Markdown or reStructuredText files to actually color the text.

Resources