I need to extract all the data from the Character Viewer on Mac. How can I do it, or where is it stored?
I need this format:
☀︎
BLACK SUN WITH RAYS
Unicode: U+2600 U+FE0E, UTF-8: E2 98 80 EF B8 8E
☼
WHITE SUN WITH RAYS
Unicode: U+263C, UTF-8: E2 98 BC
and so on.
Thanks!
In OS X El Capitan (Version 10.11.6), the "Character Viewer" data can be found inside the package of the "Character Palette" system application located at /System/Library/Input Methods/CharacterPalette.app, in the SQLite database file: /System/Library/Input Methods/CharacterPalette.app/Contents/Resources/CharacterDB.sqlite3.
You can use an appropriate application (such as DB Browser for SQLite) to open the database file and export its main table to a text file (CSV format, with a tab as the field separator), then extract the data yourself.
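If you would rather script the export than use a GUI tool, Python's built-in sqlite3 module can do it. This is a sketch: it assumes the El Capitan path given above and the table name unihan_dict from the notes below, and because the column layout isn't documented here, it copies every column of every row as-is into a tab-separated UTF-8 file. The database is opened read-only through an SQLite URI (mode=ro) so the system file is never modified.

```python
import csv
import sqlite3
from pathlib import Path

# Location given above for OS X El Capitan 10.11.6; other macOS versions may differ.
CHARACTER_DB = Path(
    "/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/CharacterDB.sqlite3"
)


def open_character_db(path: Path = CHARACTER_DB) -> sqlite3.Connection:
    """Open the database read-only."""
    # as_uri() percent-encodes the space in "Input Methods"
    return sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)


def export_table(conn: sqlite3.Connection, table: str, out_path: str) -> int:
    """Copy every row of `table` into a tab-separated UTF-8 file; return the row count."""
    # Table names cannot be bound as SQL parameters, so quote the identifier by hand.
    quoted = '"' + table.replace('"', '""') + '"'
    count = 0
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for row in conn.execute(f"SELECT * FROM {quoted}"):
            writer.writerow(row)
            count += 1
    return count
```

For example: export_table(open_character_db(), "unihan_dict", "characters.tsv"), where characters.tsv is a placeholder output name.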
In JavaScript (Node.js), provided you already know how to read the file lines, that would be something like:
let lines =
[
    "☼\tWHITE SUN WITH RAYS|||||||||||||||",
    "☀︎\tBLACK SUN WITH RAYS|||||||||||||||",
    "☀️\tBLACK SUN WITH RAYS|||||||||||||||",
    "☀\tBLACK SUN WITH RAYS|||||||||||||||"
];
for (let line of lines)
{
    // Field 1: the character sequence; field 2: the name, followed by "|"-separated data
    let fields = line.split ('\t');
    let characterSequence = fields[0];
    let name = fields[1].split ('|')[0];
    // Array.from iterates by code point, so variation selectors (U+FE0E, U+FE0F) are listed separately
    let codePoints = Array.from (characterSequence).map (char => "U+" + char.codePointAt (0).toString (16).toUpperCase ().padStart (4, "0")).join (" ");
    console.log (characterSequence, name, "Unicode:", codePoints);
}
Notes:
The name of the main table (unihan_dict) is somewhat misleading: it also contains data for all non-Unihan characters, though with minimal information.
The Unicode character codes are not stored in the database file, since this would be redundant, but they can be easily computed.
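As a minimal Python sketch, assuming each exported line holds the character sequence, a tab, then the name followed by "|"-separated fields (the layout the JavaScript above expects), this computes both the code points and the UTF-8 bytes and produces the three-line format requested in the question:

```python
def format_entry(line: str) -> str:
    """Format 'sequence<TAB>NAME|...' as character, name, code points and UTF-8 bytes."""
    character_sequence, rest = line.rstrip("\n").split("\t", 1)
    name = rest.split("|", 1)[0]
    # Iterating over a Python str yields code points, so U+FE0E etc. appear separately.
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in character_sequence)
    utf8_bytes = " ".join(f"{b:02X}" for b in character_sequence.encode("utf-8"))
    return f"{character_sequence}\n{name}\nUnicode: {code_points}, UTF-8: {utf8_bytes}"


print(format_entry("\u2600\ufe0e\tBLACK SUN WITH RAYS|||||||||||||||"))
```

For this input the last printed line is "Unicode: U+2600 U+FE0E, UTF-8: E2 98 80 EF B8 8E", matching the example in the question.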
I have an Epson CW-C6000 that I'm trying to control with ESC commands. I've gotten text to print, so I know the IP address, port, etc. are correct, but I cannot for the life of me get an image to print.
Here is my code (running from a Ruby on Rails server, with most of the image truncated):
streamSock = TCPSocket.new( "X.X.X.X", 9100 )
str = "~DYR:PRODIMG,B,P,183208,0,89504E470D...4AE426082" + "^XA" + "^FO150,150^IMR:PRODIMG.PNG^FS" + "^XZ"
streamSock.send( str , 0)
streamSock.close
The image is a .png I converted to hexadecimal with this site:
http://tomeko.net/online_tools/file_to_hex.php?lang=en
I'm mostly using page 10 of this PDF for reference:
https://files.support.epson.com/pdf/pos/bulk/esclabel_apg_en_forcw-c6000series_reve.pdf
Does anyone have a hint? Epson support staff was spectacularly unhelpful.
Also I'm sorry if my formatting is bad; I'm new here and will happily edit my post if something is wrong.
Alright, I finally got it working. The command for printing a color .PNG is this:
~DYE:[Image Name].PNG,p,p,[Image Size],0,:B64:[Base64 String]:[CRC]
Things that tripped me up:
- You seem to need the .PNG extension on the file name, even though the Epson manual doesn't show it.
- [Image Size] is the number of characters in the Base64 string, even though the Epson manual says it should be the size of the original .PNG file. If this is wrong, the printer will hang and stop accepting input of any kind until it is restarted.
- There may be other options, but I could only get it working with a CRC-16/XMODEM checksum written in hexadecimal.
Thanks to K J for his/her suggestions and coming along with me!
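For scripting, the working command can be assembled in Python. This is a sketch based on the points above: [Image Size] is the length of the Base64 text, and the CRC uses binascii.crc_hqx(data, 0), which computes CRC-16/XMODEM (polynomial 0x1021, initial value 0). Which bytes the CRC must cover is not stated here; the sketch assumes the Base64 text, so if the printer rejects the upload, try the raw PNG bytes instead. Port 9100 comes from the Ruby example in the question, and the file name passed in should include the .PNG extension.

```python
import base64
import binascii
import socket


def build_dye_b64(file_name: str, png_bytes: bytes) -> str:
    """Build '~DYE:<name>,p,p,<size>,0,:B64:<data>:<crc>' for a PNG image."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    # CRC-16/XMODEM. ASSUMPTION: computed over the Base64 text, not the raw PNG bytes.
    crc = binascii.crc_hqx(b64.encode("ascii"), 0)
    # [Image Size] is the number of Base64 characters, not the PNG file size.
    return f"~DYE:{file_name},p,p,{len(b64)},0,:B64:{b64}:{crc:04X}"


def send_to_printer(host: str, payload: bytes, port: int = 9100) -> None:
    """Send raw bytes to the printer over a TCP connection."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload)
```

For example, send_to_printer("X.X.X.X", build_dye_b64("LOGO.PNG", png_bytes).encode("ascii")), where LOGO.PNG is a placeholder name and png_bytes holds the contents of the PNG file.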
Perhaps this material can be used as an additional reference.
They seem to have a completely different command/data format than ESC/POS.
ESC/Label Command Reference Guide
Page 12
1.3.4 About Saving the Graphics and Label Formats in the Printer
With ESC/Label command, you can save graphics and label formats in the printer. The printer has a file system. Data saved in the printer is handled as files and is managed in the following way.
The file system does not have a hierarchy.
The printer has a non-volatile saving device, such as Flash ROM, and a volatile saving device, such as RAM, and different drive letters are allocated for each device.
Files are designated as
"<drive letter> colon <:> <file name> dot <.> <extension>".
Page 40-41
2.8 Printing Graphics
...Details have been omitted. Please refer to the actual document...
2.8.1 Registering a Graphic in a Printer and Printing It
...Only some excerpts are picked up here. Please refer to the actual document...
1. Delete the files that remain in the printer (^ID command).
2. Register the graphic in the printer (~DY command).
When registering a color graphic, you can use the PNG format. When registering a monochrome graphic, you can register the PNG format or the GRF format.
PNG format: monochrome and color graphics
GRF format: monochrome graphics
The reason for executing step 1:
To ensure the storage memory capacity that the application needs for printing.
2.8.2 Embedding a Graphic in the Field and Printing It
...Details have been omitted. Please refer to the actual document...
In Addition:
Page 104-106
~DY
[Name]
Save File
[Format]
~DYd:o,f,x,t,w,data
...A table detailing the parameters is due, but omitted...
[Function]
...Further detailed explanations and figures of functions and parameters are due, but omitted...
Graphic data is handled as follows.
If the data format is binary, you can use any binary data as Parameter data. In this case, the size of Parameter data must match the size specified in Parameter t.
If the data format is a hexadecimal character string, the characters described in 1. to 3. below are used as Parameter data. In this case, the size of Parameter data written in binary must match the size specified in Parameter t.
1. 0 to 9, A to F, and a to f in ASCII can be used as hexadecimal graphic data.
2. ASCII comma <,>, the parameter separator character, is used to separate lines. If a comma is input, processing is carried out as if ASCII 0 was input for the remainder of the line.
3. G to Y and g to z in ASCII can be used as repetition characters. For example, if I9 is input, processing is carried out as if 999 were input. The following table indicates the number of repetitions.
...Characters and repeat specified number of times table omitted...
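The hexadecimal rules above can be illustrated with a small decoder. The repetition table is omitted here, so this Python sketch assumes it follows the ZPL convention (G to Y = 1 to 19 repetitions, g to z = 20 to 400 in steps of 20, consecutive repeat characters added together), which agrees with the I9 → 999 example; check it against the actual Epson guide before relying on it. The line width in hex characters (two per byte) is an input the caller must supply.

```python
HEX_DIGITS = frozenset("0123456789ABCDEFabcdef")

# ASSUMPTION (ZPL convention, Epson table omitted above):
# G..Y -> 1..19, g..z -> 20, 40, ..., 400; consecutive repeat characters add up.
ASSUMED_REPEAT_COUNTS = {chr(ord("G") + i): i + 1 for i in range(19)}
ASSUMED_REPEAT_COUNTS.update({chr(ord("g") + i): 20 * (i + 1) for i in range(20)})


def expand_hex_graphic(data: str, row_hex_chars: int) -> str:
    """Expand compressed hex graphic data into plain hex digits.

    row_hex_chars is the width of one graphic line in hex characters;
    a comma pads the rest of the current line with "0".
    """
    out = []
    length = 0   # hex characters produced so far
    pending = 0  # repeat count waiting for the next hex digit
    for ch in data:
        if ch in ASSUMED_REPEAT_COUNTS:
            pending += ASSUMED_REPEAT_COUNTS[ch]
        elif ch in HEX_DIGITS:
            n = pending or 1
            out.append(ch * n)
            length += n
            pending = 0
        elif ch == ",":
            # This sketch's interpretation: at a line boundary, a comma emits a full line of zeros.
            n = row_hex_chars - length % row_hex_chars
            out.append("0" * n)
            length += n
        elif not ch.isspace():
            raise ValueError(f"unexpected character {ch!r}")
    return "".join(out)


print(expand_hex_graphic("I9", 6))  # 999
```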
Looking at the contents of this Technical Reference Guide, it seems that you can register images with tools instead of commands.
CW-C6000/C6500 Series Technical Reference Guide
Page 173-174
And page 288 outlines the Epson Inkjet Label Printer SDK and also describes the existence of sample programs.
@Farmbot26, I have been attempting the same thing using VB.NET and, as you noted, Epson support is not helpful. I'm not sure whether it's the actual image data, the CRC, or the ZPL code that is wrong, as nothing helps. Here are two examples that have not worked.
`Dim binaryData As Byte() = System.IO.File.ReadAllBytes(txtPNGFile.Text)
zplImageData = Convert.ToBase64String(binaryData)
crc = calcrc(binaryData, binaryData.Length).ToString("X4")
Dim zplToSend As String = "~DYE:" & Path.GetFileName(txtPNGFile.Text).ToUpper & ",P,P," & zplImageData.Length & ",0,:B64:" & zplImageData & ":" & crc & "^XZ"`
`Dim binaryData As Byte() = System.IO.File.ReadAllBytes(txtPNGFile.Text)
crc = calcrc(binaryData, binaryData.Length).ToString("X4") 'Calculate CRC
zplImageData = BitConverter.ToString(binaryData).Replace("-", "")
Dim zplToSend As String = "~DYE:" & Path.GetFileName(txtPNGFile.Text).ToUpper & ",A,P," & zplImageData.Length & ",0,:B64:" & zplImageData & ":" & crc & "^XZ"`
This is the CRC example I have.
`Function calcrc(ByVal data() As Byte, ByVal count As Integer) As Integer
Dim crc As Integer = 0
For Each b As Byte In data
Dim d As Integer = CInt(b)
crc = crc Xor (d << 8)
For j = 0 To 7
If ((crc And &H8000) <> 0) Then
crc = (crc << 1) Xor &H1021
Else
crc = (crc << 1)
End If
Next
Next
Return crc And &HFFFF
End Function`
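For comparison, here is the calcrc function ported to Python. It is the bitwise form of CRC-16/XMODEM (polynomial 0x1021, initial value 0, no final XOR), so it agrees with Python's binascii.crc_hqx(data, 0) and with the standard check value 0x31C3 for the ASCII string "123456789". That suggests the CRC routine itself is fine; if the CRC is the problem, the more likely culprit is which bytes it is computed over (the raw PNG bytes in both attempts). Note also that the second attempt sends hexadecimal text but still labels it :B64:.

```python
import binascii


def calcrc(data: bytes) -> int:
    """Bit-by-bit CRC-16/XMODEM, equivalent to the VB.NET calcrc above."""
    crc = 0
    for b in data:
        crc ^= b << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Both implementations agree on the CRC-16/XMODEM check value.
assert calcrc(b"123456789") == binascii.crc_hqx(b"123456789", 0) == 0x31C3
```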
I have found another solution: save the PNG image using binary data. I found this when reading the image-data backup file saved by the Epson Settings Utility.
~DYE:FILENAME.PNG,B,P,BINARYFILESIZE,0,BINARYIMGDATA
` Try
Dim binaryData As Byte() = System.IO.File.ReadAllBytes(txtPNGFile.Text)
Dim client As System.Net.Sockets.TcpClient = New System.Net.Sockets.TcpClient()
client.Connect(IP_TextBox1.Text.Replace(" ", ""), txtPort.Text)
Dim writer As System.IO.StreamWriter = New System.IO.StreamWriter(client.GetStream(), Encoding.UTF8)
Using mStream As New MemoryStream(binaryData)
Dim zplToSend As String = "~DYE:" & Path.GetFileName(txtPNGFile.Text).ToUpper & ",B,P," & mStream.Length & ",0,"
writer.Write(zplToSend)
writer.Flush()
mStream.WriteTo(client.GetStream())
writer.Flush()
End Using
writer.Close()
client.Close()
MsgBox("Send Complete", MsgBoxStyle.OkOnly, "Complete")
Catch ex As Exception
MsgBox(ex.Message.ToString, MsgBoxStyle.OkOnly, "ERROR")
End Try`
You can also load the image file into an Image object and resize it as needed; I had to do this to fit the printer's label size.
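The binary variant is simpler to script, because nothing has to be encoded. A Python sketch of the same upload: the ~DYE header is sent as plain ASCII, BINARYFILESIZE is the PNG size in bytes, and the PNG bytes follow unchanged. Encoding the header as ASCII also ensures that nothing (such as the UTF-8 byte order mark a .NET StreamWriter may emit) precedes ~DYE. The host, port 9100 and file names are placeholders.

```python
import socket


def build_dye_binary(file_name: str, png_bytes: bytes) -> bytes:
    """Return '~DYE:<NAME>,B,P,<size>,0,' followed by the raw PNG bytes."""
    # Upper-cased like the VB.NET code; the size is the PNG length in bytes.
    header = f"~DYE:{file_name.upper()},B,P,{len(png_bytes)},0,"
    return header.encode("ascii") + png_bytes


def upload_png(host: str, png_path: str, file_name: str, port: int = 9100) -> None:
    """Read a PNG file and send the binary ~DYE command to the printer over TCP."""
    with open(png_path, "rb") as f:
        payload = build_dye_binary(file_name, f.read())
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload)
```

For example: upload_png("X.X.X.X", "logo.png", "LOGO.PNG").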
Further to my question here, I'm writing a list of hex colours to a binary file from within Photoshop using ExtendScript. So far so good.
The problem is that the binary file written with the code below is 119 bytes, whereas the same data cut and pasted and saved using Sublime Text 3 is only 48 bytes, which then causes complications later on.
This is my first time in binary land, so I may be a little lost. I suspect it's either an encoding issue (which could explain the file being about 2.5 times larger), or that I'm doing something very wrong by trying to recreate the file in a literal, character-for-character sense. *
// Initially, my data is an array of strings
var myArray = [
"1a2b3c",
"4d5e6f",
"a10000",
"700000",
"d10101",
"dc0202",
"c30202",
"de0b0b",
"d91515",
"f06060",
"fbbaba",
"ffeeee",
"303030",
"000000",
"000000",
"000000"
]
// I then separate them to four character chunks
// in groups of 8
var data = "1a2b 3c4d 5e6f a100 0070 0000 d101 01dc\n" +
"0202 c302 02de 0b0b d915 15f0 6060 fbba\n" +
"baff eeee 3030 3000 0000 0000 0000 0000";
var afile = "D:\\temp\\bin.act"
var f = new File(afile);
f.encoding = "BINARY";
f.open ("w");
// f.write(data);
// amended code
for (var i = 0; i < data.length; i++)
{
var bytes = String.fromCharCode(data.charCodeAt(i));
f.write(bytes);
}
f.close();
alert("Written " + afile);
* ...or it's the tracking on my VHS.
I'm rubbish at JavaScript but I have hacked something together that will show you how to write 3 bytes of hex to a file in binary. I hope it is enough for you to work out how to do the rest!
I saved this file as /Users/mark/StackOverflow/AdobeJavascript.jsx
alert("Starting");
// Open binary file
var afile = "/Users/mark/StackOverflow/data.bin"
var f = new File(afile);
f.encoding = "BINARY";
f.open ("w");
// Define hex string
str = "1a2b3c"
for(offset=0;offset<str.length;offset+=2) {
i = parseInt(str.substring(offset, offset+2), 16)
f.write(String.fromCharCode(i));
}
f.close();
alert("Done");
If you dump the data.bin you'll see 3 bytes:
xxd data.bin
00000000: 1a2b 3c
You can write more of your values by simply changing the string to:
str = "1a2b3c"+ "4d5e6f"+ "a10000";
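The 119-byte file is simply the text of data: each of its 119 characters (hex digits, spaces and newlines) became one byte, whereas the 16 colours are really 16 × 3 = 48 bytes. Decoding each pair of hex digits into one byte, as the loop above does, gives the expected size. The same conversion in Python, handy for checking the result outside Photoshop:

```python
def hex_colours_to_bytes(colours):
    """Join strings like "1a2b3c" and turn every two hex digits into one byte."""
    return bytes.fromhex("".join(colours))


def write_colours(path, colours):
    # "wb" writes the bytes unchanged: no text encoding, no newline translation
    with open(path, "wb") as f:
        f.write(hex_colours_to_bytes(colours))


my_array = ["1a2b3c", "4d5e6f", "a10000", "700000", "d10101", "dc0202",
            "c30202", "de0b0b", "d91515", "f06060", "fbbaba", "ffeeee",
            "303030", "000000", "000000", "000000"]
data = hex_colours_to_bytes(my_array)
print(len(data))  # 48
```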
I also discovered how to run ExtendScript from a shell script in Terminal, which is my "happy place", so I'll add it here for my own reference:
#!/bin/bash
osascript << EOF
tell application "Adobe Photoshop CC 2019"
do javascript "#include /Users/mark/StackOverflow/AdobeJavascript.jsx"
end tell
EOF
The corresponding reading part of this answer is here.
I am working with Graphchi's pagerank example: https://github.com/GraphChi/graphchi-cpp/wiki/Example-Apps#pagerank-easy
The example app writes a binary file with vertex information that I would like to read/convert to a plain text file (to later load into R or some other language).
The documentation states that:
"GraphChi will write the values of the edges in a binary file, which is easy to handle in other programs. Name of the file containing vertex values is GRAPH-NAME.4B.vout. Here "4B" refers to the vertex-value being a 4-byte type (float)."
The 'easy to handle' part is what I'm struggling with: I have experience with high-level languages but not with C++ or binary files. I have found a few things by searching Stack Overflow but have had no luck reading this file yet. Ideally this would be done in bash or Python.
Thanks very much for your help on this.
Update: hexdump graph-name.4B.vout | head -5 gives:
0000000 999a 3e19 7468 3e7f 7d2a 3e93 d8e0 3ec4
0000010 cec6 3fe4 d551 3f08 eff2 3e54 999a 3e19
0000020 999a 3e19 3690 3e8c 0080 3f38 9ea3 3ef5
0000030 b7d6 3f66 999a 3e19 10e3 3ee1 400c 400d
0000040 a3df 3e7c 999a 3e19 979c 3e91 5230 3f18
Here is example code showing how you can use GraphChi to write the output as a string:
https://github.com/GraphChi/graphchi-cpp/wiki/Vertex-Aggregators
But the file is simply a byte array. Here is an example of how to read it in Python:
import struct
import sys

inputfile = sys.argv[1]
# Read the raw bytes (binary mode is required in Python 3)
with open(inputfile, "rb") as f:
    data = f.read()
# One 4-byte float per vertex; "<f" assumes little-endian (x86) byte order
s = struct.Struct("<f")
l = len(data)
print("%d bytes" % l)
n = l // 4
for i in range(n):
    x = s.unpack_from(data, i * 4)[0]
    print("%d %f" % (i, x))
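To get the plain-text file the question asks for, the same idea can write one "index value" line per vertex. In the hexdump above, the first four bytes are displayed as "999a 3e19" because hexdump groups bytes into 16-bit words in host (little-endian) order; on disk they are 9a 99 19 3e, which as a little-endian 4-byte float is about 0.15. A Python 3 sketch; the default "<f" format assumes little-endian floats, and "<I" can be passed instead for unsigned-integer outputs such as the component IDs mentioned below.

```python
import struct


def vout_to_text(vout_path: str, txt_path: str, fmt: str = "<f") -> int:
    """Write 'index<TAB>value' for each vertex; return the number of vertices."""
    with open(vout_path, "rb") as f:
        data = f.read()
    size = struct.calcsize(fmt)
    if len(data) % size:
        raise ValueError(f"{len(data)} bytes is not a multiple of {size}")
    with open(txt_path, "w") as out:
        for i, (value,) in enumerate(struct.iter_unpack(fmt, data)):
            out.write(f"{i}\t{value}\n")
    return len(data) // size


# First four bytes from the hexdump above ("999a 3e19" = bytes 9a 99 19 3e):
print(struct.unpack("<f", bytes.fromhex("9a99193e"))[0])  # about 0.15
```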
I was having the same trouble. Luckily I work with a bunch of network engineers who helped me out! On Mac and Linux, the following command prints the 4B.vout data one line per node, with the integer values the same as those given in the summary file. If your file is called, e.g., filename.4B.vout, then some command-line Perl gets you:
cat filename.4B.vout | LANG= perl -0777 -e '$,="\n"; print unpack("L*",<>),"";'
Edited to add: this is for the connected-component ID and community ID assignments, which are written implicitly: the 1st line is the ID of the node labeled 0, the 2nd line is the node labeled 1, etc. I copy-pasted this, so I'm not sure how it would need to change for floats (presumably the "f*" unpack template instead of "L*"). It works great for the integer values per node.
IDENTIFICATION DIVISION.
PROGRAM-ID. HENSEM as "Test1.Program1".
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT CUSTOMER-FILE
ASSIGN TO "CUSTOMER.DAT"
ORGANIZATION IS SEQUENTIAL.
SELECT PRINTER-FILE
ASSIGN TO PRINTER
ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD CUSTOMER-FILE
LABEL RECORDS ARE STANDARD.
01 CUSTOMER-RECORD.
05 CUSTOMER-NAME PIC X(30).
05 CUSTOMER-PRODUCT PIC X(20).
05 CUSTOMER-QUANTITY PIC 9(2).
05 CUSTOMER-DATE PIC X(10).
FD PRINTER-FILE
LABEL RECORDS ARE OMITTED.
01 PRINTER-RECORD PIC X(80).
WORKING-STORAGE SECTION.
*VARIABLES FOR SCREEN ENTRY
01 Y-N PIC X.
01 ENTRY-STATUS PIC X.
PROCEDURE DIVISION.
OPEN EXTEND CUSTOMER-FILE.
OPEN OUTPUT PRINTER-FILE.
MOVE "Y" TO Y-N.
PERFORM ADD-RECORDS
UNTIL Y-N = "N".
PERFORM CLOSING-PROCEDURE.
GOBACK.
* OPENING AND CLOSING
OPENING-PROCEDURE.
CLOSING-PROCEDURE.
CLOSE CUSTOMER-FILE.
MOVE SPACE TO PRINTER-RECORD.
WRITE PRINTER-RECORD BEFORE ADVANCING PAGE.
CLOSE PRINTER-FILE.
ADD-RECORDS.
MOVE "N" TO ENTRY-STATUS.
PERFORM GET-FIELDS
UNTIL ENTRY-STATUS = "Y".
PERFORM ADD-THIS-RECORD.
PERFORM ANY-MORE.
GET-FIELDS.
MOVE SPACE TO CUSTOMER-RECORD.
DISPLAY "ENTER CUSTOMER NAME: ".
ACCEPT CUSTOMER-NAME.
DISPLAY "ENTER WHAT THE CUSTOMER BOUGHT: ".
ACCEPT CUSTOMER-PRODUCT.
DISPLAY "ENTER HOW MANY THE CUSTOMER BOUGHT: ".
ACCEPT CUSTOMER-QUANTITY.
DISPLAY "ENTER WHEN THE CUSTOMER BOUGHT IT: ".
ACCEPT CUSTOMER-DATE.
PERFORM VALIDATE-FIELDS.
VALIDATE-FIELDS.
MOVE "Y" TO ENTRY-STATUS.
IF CUSTOMER-NAME = SPACE
DISPLAY "CUSTOMER NAME MUST BE ENTERED"
MOVE "N" TO ENTRY-STATUS.
ADD-THIS-RECORD.
MOVE CUSTOMER-RECORD TO PRINTER-RECORD.
WRITE CUSTOMER-RECORD.
WRITE PRINTER-RECORD BEFORE ADVANCING 1.
ANY-MORE.
DISPLAY "IS THERE ANY MORE INPUT?".
ACCEPT Y-N.
IF Y-N = "Y"
MOVE "Y" TO Y-N.
IF Y-N NOT = "Y"
MOVE "N" TO Y-N.
END PROGRAM HENSEM.
My problem is the OPEN OUTPUT PRINTER-FILE line. The program does not run, and I get an illegal file-name error. If I delete that line, the program runs but later fails at WRITE PRINTER-RECORD BEFORE ADVANCING 1. Thank you.
You can assign a device name to your file using one of the standard Windows symbolic names, instead of the COBOL keyword PRINTER:
SELECT PRINTER-FILE
ASSIGN TO "lpt1"
ORGANIZATION IS LINE SEQUENTIAL.
R seems to handle Unicode characters well internally, but I'm not able to write a data frame containing such UTF-8 characters to a file. Is there any way to force this?
data.frame(c("hīersumian","ǣmettigan"))->test
write.table(test,"test.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")
The output text file reads:
hiersumian
<U+01E3>mettigan
I am using R version 3.0.2 in a Windows environment (Windows 7).
EDIT
It's been suggested in the answers that R is writing the file correctly in UTF-8, and that the problem lies with the software I'm using to view the file. Here's some code where I'm doing everything in R. I'm reading in a text file encoded in UTF-8, and R reads it correctly. Then R writes the file out in UTF-8 and reads it back in again, and now the correct Unicode characters are gone.
read.table("myinputfile.txt",encoding="UTF-8")->myinputfile
myinputfile[1,1]
write.table(myinputfile,"myoutputfile.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")
read.table("myoutputfile.txt",encoding="UTF-8")->myoutputfile
myoutputfile[1,1]
Console output:
> read.table("myinputfile.txt",encoding="UTF-8")->myinputfile
> myinputfile[1,1]
[1] hīersumian
Levels: hīersumian ǣmettigan
> write.table(myinputfile,"myoutputfile.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")
> read.table("myoutputfile.txt",encoding="UTF-8")->myoutputfile
> myoutputfile[1,1]
[1] <U+FEFF>hiersumian
Levels: <U+01E3>mettigan <U+FEFF>hiersumian
>
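The <U+FEFF> at the start of the re-read value is U+FEFF, the byte order mark (BOM) that is sometimes written at the beginning of UTF-8 files. A plain UTF-8 decoder keeps it as an ordinary character in the first field, while a BOM-aware decoder drops it. A Python illustration of the difference (not an R fix):

```python
BOM = "\ufeff"


def read_utf8_lines(raw: bytes, strip_bom: bool = True) -> list:
    """Decode UTF-8 bytes into lines; "utf-8-sig" removes a leading BOM if present."""
    text = raw.decode("utf-8-sig" if strip_bom else "utf-8")
    return text.splitlines()


raw = (BOM + "h\u012bersumian\n\u01e3mettigan\n").encode("utf-8")
print(read_utf8_lines(raw, strip_bom=False)[0].startswith(BOM))  # True
print(read_utf8_lines(raw))
```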
This "answer" is mainly meant to show that something odd is going on behind the scenes:
It seems that "hīersumian" doesn't even make it into the data frame intact: the "ī" is converted to "i" in all cases.
options("encoding" = "native.enc")
t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
t1
# a
# 1 hiersumian
options("encoding" = "UTF-8")
t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
t1
# a
# 1 hiersumian
options("encoding" = "UTF-16")
t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
t1
# a
# 1 hiersumian
The following sequence successfully writes "ǣmettigan" to the text file:
t2 <- data.frame(a = c("ǣmettigan"), stringsAsFactors=F)
getOption("encoding")
# [1] "native.enc"
Encoding(t2[,"a"]) <- "UTF-16"
write.table(t2,"test.txt",row.names=F,col.names=F,quote=F)
This does not work with "encoding" set to "UTF-8" or "UTF-16", and additionally specifying "fileEncoding" leads either to corrupted output or to no output at all.
Somewhat disappointing, as so far I have managed to fix all my Unicode issues one way or another.
I may be missing something OS-specific, but data.table appears to have no problem with this (or perhaps more likely it's an update to R internals since this question was originally posed):
library(data.table)
t1 = data.table(a = c("hīersumian", "ǣmettigan"))
tmp = tempfile()
fwrite(t1, tmp)
system(paste('cat', tmp))
# a
# hīersumian
# ǣmettigan
fread(tmp)
# a
# 1: hīersumian
# 2: ǣmettigan
I found a blog post that basically says this is due to the way Windows encodes text; there is a lot more detail in the post. The suggestion is to write the file in binary using
writeBin(charToRaw(x), con, endian="little")
https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows/
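The point of writeBin(charToRaw(x), ...) is to bypass the text-mode connection, which may re-encode the strings, and write the UTF-8 bytes exactly as they are. The same idea sketched in Python (the file name is a placeholder):

```python
def write_utf8_binary(path: str, lines) -> bytes:
    """Encode the lines as UTF-8 ourselves and write the bytes in binary mode."""
    # Binary mode means no re-encoding and no "\n" -> "\r\n" translation on Windows.
    data = "".join(line + "\n" for line in lines).encode("utf-8")
    with open(path, "wb") as f:
        f.write(data)
    return data


write_utf8_binary("test.txt", ["h\u012bersumian", "\u01e3mettigan"])
```

Here "ī" (U+012B) is written as the two bytes C4 AB and "ǣ" (U+01E3) as C7 A3.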