How can I generate a rich text link for pbcopy - ruby

I've been playing with a script that takes the selected text in Chrome, looks it up in Google, offers the four top choices, and then pastes the relevant link. The link is pasted in a different format depending on which page is currently open in Chrome: DokuWiki markup when a DokuWiki page is open, HTML for normal websites, and rich text (what I want) for my WordPress WYSIWYG editor.
I tried to use pbpaste -Prefer rtf to see what a rich-text link with no other styling looks like on the pasteboard, but it still outputs plain text. After saving a file in TextEdit and experimenting, I came up with the following:
text = %q|{\rtf1{\field{\*\fldinst{HYPERLINK "URL"}}{\fldrslt TEXT}}}|
text.gsub!("URL", url)
text.gsub!("TEXT", stext)
(I had to use the gsub, because the string didn't work when I used %Q and #{} to insert the variables: in a %Q double-quoted literal, backslash sequences in the RTF markup such as \r and \f are interpreted as string escapes, so %q is needed to keep the backslashes literal.)
This works; however, when I paste it there is an extra line break before and after the link. What would the string look like to avoid this?

From the shell the clean solution is this:
URL="http://www.google.com/"
NAME="Click here for Google"
echo "<a href='$URL'>$NAME</a>" | textutil -stdin -format html -convert rtf -stdout | pbcopy
So, use the textutil command to convert well-formed HTML into RTF.
Ruby variant:
url = 'http://www.google.com'
name = 'click here'
system("echo '#{name}' | textutil -stdin -format html -convert rtf -stdout | pbcopy")
So, when you run the above without the pbcopy part, you'll get:
{\rtf1\ansi\ansicpg1250\cocoartf1038\cocoasubrtf350
{\fonttbl\f0\froman\fcharset0 Times-Roman;}
{\colortbl;\red255\green255\blue255;\red0\green0\blue238;}
\deftab720
\pard\pardeftab720\ql\qnatural
{\field{\*\fldinst{HYPERLINK "http://www.google.com/"}}{\fldrslt
\f0\fs24 \cf2 \ul \ulc2 click here}}}
EDIT: Just tested this on Big Sur and it works as it should. Any HTML gets converted to RTF. Another demo (without variables):
echo '<b>BOLD TEXT</b><br>stackoverflow link<br><h1>big title</h1>' | textutil -stdin -format html -convert rtf -stdout | pbcopy
After pasting into TextEdit, this yields the bold text, the link, and the large heading as formatted rich text.

One way of doing this is with MacRuby, which can access the pasteboard directly through the Cocoa framework rather than going through the command-line tool, which gives you more options.
For example, you can use this function to put HTML code, including hyperlinks, on the pasteboard; it will paste correctly into TextEdit or a WordPress editing box:
framework 'Cocoa'

def pbcopy(string)
  pasteBoard = NSPasteboard.generalPasteboard
  pasteBoard.declareTypes([NSHTMLPboardType], owner: nil)
  pasteBoard.setString(string, forType: NSHTMLPboardType)
end
This works much better than the command-line pbcopy: it definitively avoids adding whitespace, and it avoids having to produce RTF for rich text, since HTML is much easier to generate programmatically.
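For example, to put the question's link on the pasteboard as rich text (a hypothetical call; the URL and link text are placeholders):
pbcopy(%q|<a href="http://www.google.com/">Click here for Google</a>|)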

macOS's pbcopy command can detect RTF. The following example (using pandoc to convert markdown to RTF), places a rich text snippet in your paste buffer:
echo '**foo**' | pandoc -t rtf -s | pbcopy
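The same pipeline also covers the original use case of a rich-text link; a sketch, assuming pandoc is installed (the URL and link text are placeholders):
echo '[Click here for Google](http://www.google.com/)' | pandoc -t rtf -s | pbcopy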

Related

Specifying metadata for input formats other than Markdown

Pandoc allows you to include metadata at the beginning of a Markdown document using a header like
---
title: The Song That Never Ends
subtitle: It Goes On and On My Friends
author: Abraham Lincoln
lang: en_US
---
Is there any way to convey this information to Pandoc when the input format is not Markdown? I’m specifically interested in HTML input. I tried calling Pandoc with --from=html+yaml_metadata_block, but this didn’t seem to change the behavior at all—the YAML block is just interpreted as HTML.
(It is possible to include some metadata in the “percent format” shown in the “pandoc_title_block” section of the manual, but there doesn’t seem to be a way to give a separate title and subtitle with that syntax. It’s also possible to include the YAML header before the HTML and to force Pandoc to interpret the input as Markdown, but this seems hacky, and if you try to convert that to “real” Markdown then the output is full of HTML tags instead of Markdown formatting characters.)
You can use the --metadata (short -M) or --metadata-file options to supply metadata on the command line, for example:
pandoc -M title="The Song That Never Ends"
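-M can be repeated for each key, and --metadata-file reads a whole YAML file. A sketch with placeholder file names, assuming pandoc 2.3+ and a PDF engine for the pdf output:
pandoc -M title="The Song That Never Ends" \
  -M subtitle="It Goes On and On My Friends" \
  --from=html input.html -o output.pdf
pandoc --metadata-file=meta.yaml --from=html input.html -o output.pdf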
A simple solution would be to use Lua filters to augment the metadata read from the HTML file as described in the Lua filters doc. Below is an updated version:
-- file: additional-metadata.lua
function read_file_as_markdown_yaml (filename)
  -- read metadata file into string
  local metafile = io.open(filename, 'r')
  local content = metafile:read('*a')
  metafile:close()
  -- get metadata
  return pandoc.read(content, 'markdown').meta
end

function Meta (meta)
  -- read YAML file and add its content to the metadata
  local yaml_meta = read_file_as_markdown_yaml(meta.default_meta_file)
  for k, v in pairs(yaml_meta) do
    -- use YAML metadata as fallback
    meta[k] = meta[k] or v
  end
  return meta
end
Use with
pandoc --lua-filter additional-metadata.lua \
--metadata default_meta_file:YOUR-FILE-HERE.yaml \
your-input-file.html

How do I highlight the output of grep / ag / ack tools in asciidoc

I'd like the output from asciidoc to look something like it does in the terminal. I could attach a screenshot as an image, but I think highlighting the text is preferable.
// asciidoc source file
[source,XXX]
----
/home/neale/.zcompdump:670:4:'mhpath' '_mh'
/home/neale/.zcompdump:855:4:'pmpath' '_perl_modules'
/home/neale/.zcompdump:858:5:'podpath' '_perl_modules'
/home/neale/.zcompdump:1151:7:'tracepath' '_tracepath'
/home/neale/.zcompdump:1152:7:'tracepath6' '_tracepath'
/home/neale/.zcompdump:1482:11:'-value-,*path,-default-' '_directories'
/home/neale/.zcompdump:1483:11:'-value-,*PATH,-default-' '_dir_list'
/home/neale/.zcompdump:1484:23:'-value-,RUBY(LIB|OPT|PATH),-default-' '_ruby'
----
Is there some value of XXX that looks something like this? I have highlighting working for other languages (with local styles) when generating PDFs (via a2x).

How to get HTML data out of the OS X pasteboard / clipboard?

I have to send a report about pasting clipboard content into a rich web editor, and I need a way to dump/restore the clipboard content as (probably) HTML.
How can I do this?
It seems that pbcopy / pbpaste always give me plain text, even if I use pbpaste -Prefer rtf or pbpaste -Prefer HTML.
Three years later, in more civilized times, we have Swift. You can write a short Swift script to pull exactly what you need off of OS X's pasteboard.
Put the following Swift 4 snippet into a new text file. I named mine pbpaste.swift:
import Cocoa

let type = NSPasteboard.PasteboardType.html
if let string = NSPasteboard.general.string(forType: type) {
    print(string)
} else {
    print("Could not find string data of type '\(type)' on the system pasteboard")
    exit(1)
}
Then, copy some html, and run swift pbpaste.swift from the directory where you put that file.
Yay, html! Uggh, OS X added a ton of custom markup (and a <meta> tag?!) — but hey, at least it's not plain text!
Notes:
NSPasteboard.PasteboardType.html is a special global that evaluates to the string "public.html"
Obviously this is html specific, so you'd probably want to either:
Name it pbpaste-html.swift, or
Read the desired type from the command line arguments (see the sketch after these notes)
It's kind of slow, because it's being interpreted on the fly, not compiled and executed. Compilation gives me a 10x speed-up:
xcrun -sdk macosx swiftc pbpaste.swift -o pbpaste-html
Then just call ./pbpaste-html instead of swift pbpaste.swift.
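A sketch of that second option, reading the pasteboard type from the first command-line argument and defaulting to public.html (a hypothetical variant of the script above):
import Cocoa

// Use the first argument as the pasteboard type, defaulting to HTML.
let rawType = CommandLine.arguments.count > 1 ? CommandLine.arguments[1] : "public.html"
let type = NSPasteboard.PasteboardType(rawValue: rawType)

if let string = NSPasteboard.general.string(forType: type) {
    print(string)
} else {
    print("Could not find string data of type '\(rawType)' on the system pasteboard")
    exit(1)
}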
I realise you've already found this, but for the benefit of people who turn up here from Google, the solution given for RTF data at Getting RTF data out of Mac OS X pasteboard (clipboard) works fine for getting HTML out of the clipboard, too.
That is, the command
osascript -e 'the clipboard as «class HTML»' | perl -ne 'print chr foreach unpack("C*",pack("H*",substr($_,11,-3)))'

dblatex ignores --texstyle or -s option

I want to write an asciidoc document and convert it into a pdf document. However, I want to use a format style different from the default ones. To do so, I convert the txt file to DocBook using asciidoc and then try to convert the resulting DocBook XML to a pdf file using dblatex.
The idea is to set a particular TeX style for dblatex to obtain the desired pdf result. I've copied the existing docbook.sty style, as recommended here for making a small style modification. The only change made to the ./docbook.sty file is \setlength{\textwidth}{18cm} to \setlength{\textwidth}{12cm}. However, when I run the command
dblatex --texstyle=./docbook.sty test.txt
Or the command
dblatex -s ./docbook.sty test.txt
Both produce the same result in the style change: none. I mean, no matter which modification I make to the ./docbook.sty file, the modifications are not applied to the output. I always obtain the same result, a pdf with the default formatting. Do you guys have any idea where the problem is?
Thanks in advance.
I would recommend:
1. Copy the Dblatex docbook.sty to a new filename in your working directory which is "obviously yours" (e.g., mydbstyle.sty).
2. Continue to supply a full or relative path argument to the --texstyle option (e.g., /path/to/mydbstyle.sty or ./mydbstyle.sty). Failing to do so requires that mydbstyle.sty be in a directory enumerated by the TEXINPUTS environment variable (which you likely have not explicitly set).
3. Within mydbstyle.sty, use the following directives to initialize your style:
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mydbstyle}[2013/02/15 DocBook Style]
\RequirePackageWithOptions{docbook}
% ...
% your LaTeX commands here
4. Pass a DocBook 4.5 XML file as an argument to Dblatex (in your example you are passing test.txt, which makes me uncertain whether you're passing an AsciiDoc source file).
dblatex --texstyle=./mydbstyle.sty mybook.xml
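If you are starting from AsciiDoc, convert to DocBook first; a sketch using the asker's test.txt (asciidoc writes test.xml next to the source by default):
asciidoc -b docbook test.txt
dblatex --texstyle=./mydbstyle.sty test.xml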

Get encoding of a file in Windows

This isn't really a programming question: is there a command-line or Windows tool (Windows 7) to get the current encoding of a text file? Sure, I could write a little C# app, but I wanted to know if there is something already built in.
Open up your file using regular old vanilla Notepad that comes with Windows.
It will show you the encoding of the file when you click "Save As...".
In the Save As dialog, an encoding dropdown appears next to the Save button; whatever encoding is selected there by default is the file's current encoding.
If it is UTF-8, you can change it to ANSI and click Save to change the encoding (or vice versa).
I realize there are many different types of encoding, but this was all I needed when I was informed our export files were in UTF-8 and they required ANSI. It was a one-time export, so Notepad fit the bill for me.
FYI: From my understanding, "Unicode" (as listed in Notepad) is a misnomer for UTF-16.
More here on Notepad's "Unicode" option: Windows 7 - UTF-8 and Unicode
If you have "git" or "Cygwin" on your Windows Machine, then go to the folder where your file is present and execute the command:
file *
This will give you the encoding details of all the files in that folder.
The (Linux) command-line tool 'file' is available on Windows via GnuWin32:
http://gnuwin32.sourceforge.net/packages/file.htm
If you have git installed, it's located in C:\Program Files\git\usr\bin.
Example:
C:\Users\SH\Downloads\SquareRoot>file *
_UpgradeReport_Files; directory
Debug; directory
duration.h; ASCII C++ program text, with CRLF line terminators
ipch; directory
main.cpp; ASCII C program text, with CRLF line terminators
Precision.txt; ASCII text, with CRLF line terminators
Release; directory
Speed.txt; ASCII text, with CRLF line terminators
SquareRoot.sdf; data
SquareRoot.sln; UTF-8 Unicode (with BOM) text, with CRLF line terminators
SquareRoot.sln.docstates.suo; PCX ver. 2.5 image data
SquareRoot.suo; CDF V2 Document, corrupt: Cannot read summary info
SquareRoot.vcproj; XML document text
SquareRoot.vcxproj; XML document text
SquareRoot.vcxproj.filters; XML document text
SquareRoot.vcxproj.user; XML document text
squarerootmethods.h; ASCII C program text, with CRLF line terminators
UpgradeLog.XML; XML document text
C:\Users\SH\Downloads\SquareRoot>file --mime-encoding *
_UpgradeReport_Files; binary
Debug; binary
duration.h; us-ascii
ipch; binary
main.cpp; us-ascii
Precision.txt; us-ascii
Release; binary
Speed.txt; us-ascii
SquareRoot.sdf; binary
SquareRoot.sln; utf-8
SquareRoot.sln.docstates.suo; binary
SquareRoot.suo; CDF V2 Document, corrupt: Cannot read summary infobinary
SquareRoot.vcproj; us-ascii
SquareRoot.vcxproj; utf-8
SquareRoot.vcxproj.filters; utf-8
SquareRoot.vcxproj.user; utf-8
squarerootmethods.h; us-ascii
UpgradeLog.XML; us-ascii
Another tool that I found useful: https://archive.codeplex.com/?p=encodingchecker
EXE can be found here
Install git (on Windows you have to use the git bash console). Type:
file --mime-encoding *
for all files in the current directory, or
file --mime-encoding */*
for the files in all subdirectories
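Note that */* only goes one level deep. For a fully recursive scan, a sketch using find from the same git bash / *nix environment:
find . -type f -exec file --mime-encoding {} +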
Here's my take on how to detect the Unicode family of text encodings via BOM. The accuracy of this method is low, as it only works on text files (specifically Unicode files), and it defaults to ascii when no BOM is present (like most text editors; the default would be UTF8 if you want to match the HTTP/web ecosystem).
Update 2018: I no longer recommend this method. I recommend using file.exe from GIT or *nix tools, as recommended by @Sybren, and I show how to do that via PowerShell in a later answer.
# from https://gist.github.com/zommarin/1480974
function Get-FileEncoding($Path) {
    # read the first four bytes of the file
    $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)

    if (!$bytes) { return 'utf8' }

    # match the leading bytes against known BOM signatures
    switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
        '^efbbbf'   { return 'utf8' }
        '^2b2f76'   { return 'utf7' }
        '^fffe'     { return 'unicode' }
        '^feff'     { return 'bigendianunicode' }
        '^0000feff' { return 'utf32' }
        default     { return 'ascii' }
    }
}

dir ~\Documents\WindowsPowershell -File |
    select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} |
    ft -AutoSize
Recommendation: This can work reasonably well if the dir, ls, or Get-ChildItem only checks known text files, and when you're only looking for "bad encodings" from a known list of tools. (E.g., SQL Server Management Studio defaults to UTF-16, which broke git's auto-crlf handling on Windows, which was the default for many years.)
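For example, limiting the scan to known text extensions (a sketch; the extension list is a placeholder, and Get-FileEncoding is the function above):
dir -Recurse -Include *.sql,*.txt,*.csv -File |
    select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} |
    ft -AutoSize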
A simple solution might be opening the file in Firefox.
Drag and drop the file into Firefox.
Press Ctrl+I to open the page info.
The text encoding will appear in the "Page Info" window.
Note: If the file is not in txt format, just rename it to txt and try again.
P.S. For more info see this article.
I wrote the #4 answer (at the time of writing). But lately I have git installed on all my computers, so now I use @Sybren's solution. Here is a new answer that makes that solution handy from PowerShell (without putting all of git/usr/bin in the PATH, which is too much clutter for me).
Add this to your profile.ps1:
$global:gitbin = 'C:\Program Files\Git\usr\bin'
Set-Alias file.exe $gitbin\file.exe
And use it like: file.exe --mime-encoding *. You must include .exe in the command for the PS alias to work.
But if you don't customize your PowerShell profile.ps1 I suggest you start with mine: https://gist.github.com/yzorg/8215221/8e38fd722a3dfc526bbe4668d1f3b08eb7c08be0
and save it to ~\Documents\WindowsPowerShell. It's safe to use on a computer without git, but will write warnings when git is not found.
The .exe in the command is also how I use C:\WINDOWS\system32\where.exe from PowerShell, and many other OS CLI commands that are "hidden by default" by PowerShell, *shrug*.
You can simply check this by opening git bash at the file's location and running the command file -i file_name.
Example:
$ file -i data.csv
data.csv: text/csv; charset=utf-8
Some C code for reliable ASCII, BOM, and UTF-8 detection: https://unicodebook.readthedocs.io/guess_encoding.html
Only ASCII, UTF-8 and encodings using a BOM (UTF-7 with BOM, UTF-8 with BOM,
UTF-16, and UTF-32) have reliable algorithms to get the encoding of a document.
For all other encodings, you have to trust heuristics based on statistics.
EDIT:
A PowerShell version of a C# answer from Effective way to find any file's Encoding. It only works with signatures (BOMs).
# get-encoding.ps1
param([Parameter(ValueFromPipeline=$True)] $filename)

begin {
    # set .net current directory to match powershell's
    [Environment]::CurrentDirectory = (pwd).path
}

process {
    # the $true argument tells StreamReader to detect a BOM
    $reader = [System.IO.StreamReader]::new($filename,
        [System.Text.Encoding]::Default, $true)
    $peek = $reader.Peek()   # force a read so the encoding is detected
    $encoding = $reader.CurrentEncoding
    $reader.Close()

    [pscustomobject]@{Name = Split-Path $filename -Leaf
                      BodyName = $encoding.BodyName
                      EncodingName = $encoding.EncodingName}
}
.\get-encoding chinese8.txt
Name BodyName EncodingName
---- -------- ------------
chinese8.txt utf-8 Unicode (UTF-8)
get-childitem -file | .\get-encoding
Looking for a Node.js/npm solution? Try encoding-checker:
npm install -g encoding-checker
Usage
Usage: encoding-checker [-p pattern] [-i encoding] [-v]
Options:
--help Show help [boolean]
--version Show version number [boolean]
--pattern, -p, -d [default: "*"]
--ignore-encoding, -i [default: ""]
--verbose, -v [default: false]
Examples
Get encoding of all files in current directory:
encoding-checker
Return encoding of all md files in current directory:
encoding-checker -p "*.md"
Get encoding of all files in the current directory and its subfolders (this can take quite some time for huge folders and may appear unresponsive):
encoding-checker -p "**"
For more examples refer to the npm docs or the official repository.
Similar to the solution listed above with Notepad, you can also open the file in Visual Studio, if you're using that. In Visual Studio, you can select "File > Advanced Save Options..."
The "Encoding:" combo box will tell you specifically which encoding is currently being used for the file. It has a lot more text encodings listed in there than Notepad does, so it's useful when dealing with various files from around the world and whatever else.
Just like Notepad, you can also change the encoding from the list of options there, and then saving the file after hitting "OK". You can also select the encoding you want through the "Save with Encoding..." option in the Save As dialog (by clicking the arrow next to the Save button).
The only way that I have found to do this is Vim or Notepad++.
EncodingChecker
File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify.
File Encoding Checker requires .NET 4 or above to run.
