How to convert a cairo-pdf to eps without converting fonts to outlines - ghostscript

We use cairo to write pdf-files. The results are great, the files are editable so we can extract text via copy & paste or even open and edit the files in Adobe Illustrator and Inkscape to change the font properties.
But as soon as we convert the PDF to EPS all fonts are converted to outlines.
My favourite tools are pdftops and gs and this is the way I tried it:
gs -sDEVICE=eps2write -dLanguageLevel=3 -dEmbedAllFonts=true -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.eps input.pdf
and
pdftops -eps -level3 input.pdf output.eps
In addition I tried ps2eps, ps2epsi, epspdf and Inkscape via the command line, but the result was always the same: all fonts were converted to outlines.
We are using the DejaVu fonts, and the font embedding seems to be OK:
$ pdffonts input.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
JTFVDF+DejaVuSans-Bold TrueType WinAnsi yes yes yes 5 0
BTWYHK+DejaVuSansCondensed-Bold TrueType WinAnsi yes yes yes 6 0
VIBPBS+DejaVuSans-Oblique TrueType WinAnsi yes yes yes 7 0
TKGUZX+DejaVuSansCondensed TrueType WinAnsi yes yes yes 8 0
Any idea how to produce EPS-files with editable Fonts?
Here is my file: https://www.dropbox.com/s/11afckra7i8trdq/input.pdf?dl=0

Ghostscript's eps2write device doesn't convert fonts to outlines. BTW, how do you know the fonts are being converted to outlines?
I'll grab the example file you supplied (kudos! a load of people don't do that) and report back shortly. I can think of 2 possibilities offhand:
The file contains transparency. Cairo has something of a habit of creating PDF files which contain transparency operations that don't actually do anything (like setting alpha to 100%). You can't represent PDF transparency in PostScript, so the whole page gets rendered to an image.
The file is an image (or similar) with text on top in text rendering mode 3 (neither stroke nor fill). Although the actual text is invisible, Acrobat and other applications will often allow you to cut/paste it. However, PostScript doesn't have a mode for doing this, and since the text doesn't make any marks, it usually just gets dropped.
[Later]
Hmm, complex file. Decompressed this is > 11 MB....
Anyway, the page is in a transparency group:
9 0 obj
<<
/Type /Page
/Parent 1 0 R
/MediaBox [ 0 0 720 720 ]
/Contents 3 0 R
/Group <<
/Type /Group
/S /Transparency
/I true
/CS /DeviceRGB
>>
/Resources 2 0 R
>>
endobj
However it looks like Ghostscript decided the transparency could be dropped as the page is not a complete bitmap.
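A quick way to check whether a PDF declares a transparency group like the one above is to scan its bytes for the `/S /Transparency` entry. This is only a rough sketch: the function name and the sample bytes are made up for illustration, and a plain byte scan only sees objects that are stored uncompressed in the file.

```python
import re

# Matches a group subtype entry such as "/S /Transparency" or "/S/Transparency".
TRANSPARENCY_GROUP = re.compile(rb"/S\s*/Transparency\b")

def has_transparency_group(pdf_bytes):
    """Return True if an uncompressed object declares a transparency group."""
    return TRANSPARENCY_GROUP.search(pdf_bytes) is not None

# Hypothetical sample, modelled on the page object quoted in the answer.
page_object = b"""9 0 obj
<< /Type /Page /MediaBox [ 0 0 720 720 ]
   /Group << /Type /Group /S /Transparency /I true /CS /DeviceRGB >>
>>
endobj"""

print(has_transparency_group(page_object))
```

For a real file, something like `has_transparency_group(open("input.pdf", "rb").read())` would be the equivalent call. A negative result is not conclusive if the file keeps its objects in compressed streams.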
The eps file I get out does not have the fonts converted to outlines, it embeds complete fonts, and it uses them, eg:
8 0 obj
<</BaseFont/ENTCOM+DejaVuSansCondensed-Bold/FontDescriptor 9 0 R/Type/Font
/FirstChar 32/LastChar 220/Widths[
313 0 0 0 0 0 0 0 0 0 0 0 0 374 0 0
0 0 0 0 0 626 626 626 0 0 0 0 0 0 0 0
0 696 686 660 747 615 615 738 753 334 0 697 573 896 753 765
659 765 693 648 614 730 696 993 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 730]
/Encoding 20 0 R/Subtype/TrueType>>
endobj
%%EndResource
9 0 obj
<</Type/FontDescriptor/FontName/ENTCOM+DejaVuSansCondensed-Bold/FontBBox[-362 -176 964 927]/Flags 4
/Ascent 745
/CapHeight 745
/Descent -176
/ItalicAngle 0
/StemV 144
/MissingWidth 540
/FontFile2 17 0 R>>
endobj
%%EndResource
%%BeginResource: file (PDF FontFile obj_17)
17 0 obj
<</Filter/ASCII85Decode
/Length1 6088/Length 7019>>stream
!!*'"!"ApY!!<3t:K&o%z!!!e-!!!""#r5Xnz!!!\J!!!)-#s2r:5.NBR!!!#S!!!("AoMC">68U0
!!!*T!!!##B5Dj*z!!!DJ!!!86BOtU_%6ag,!!!gs!!!!WBP:sc%JC""!!!,V!!!!EBPhj9z!!!f0
.....
Lots of data omitted
.....
!!)s8!!<3$zzzzzzz!!*'"!!6K:Z*:FC?Oo9l!$;IHze&!X4ze&!X4peC[h%QOi,!!*'*zz~>
endstream
endobj
So that's a TrueType font, which is later used:
10 0 0 10 0 0 cm BT
/R8 12.96 Tf
1 0 0 1 262.795 318.916 Tm
[(N)1(E)1(US)0.998415(T)79.0063(ADT)1.00218]TJ
126.609 339.675 Td
[(F)1(IN)1.00218(DO)0.998415(R)1.00218(F)0.998415(F)1.00218]TJ
-338.998 -203.387 Td
[(W)1(O)1(L)166.005(T)1(M)1(E)1(R)1(S)1(H)1(A)29.9863(US)1(E)1(N)1]TJ
373.499 -12.6809 Td
[(M)0.998415(IT)-21.9915(T)0.998415(E)1.00218]TJ
ET
It could be that you are using an old version; I used the current version, 9.21. Failing that, the obvious question would be: why do you think the fonts are outlines?

How to merge rows if values in one column contain consecutive numbers and all other columns match

I have a very large file (~700M rows) and I would like to reduce the size by grouping mostly matching rows. Specifically, the file is sorted by fields 1 and 2 and I would like to group rows where field 2 contains consecutive numbers but all other fields match. If there is a gap in field 2 or if any other fields do not match the previous row then I would like to start a new interval. Ideally, I would like the output to return the interval range for the grouped rows and would prefer a solution that works in bash with awk and/or sed. I'm open to other solutions as well as long as they don't require re-sorting or other operations that might crash with such a long file.
The input file looks something like this.
NW_005179401.1 100 1 0 0 0 0 0 0 0 0
NW_005179401.1 101 1 0 0 0 0 0 0 0 0
NW_005179401.1 102 1 0 0 0 0 0 0 0 0
NW_005179401.1 103 1 0 0 0 0 0 1 0 0
NW_005179401.1 104 1 0 0 0 0 0 1 0 0
NW_005179401.1 105 1 0 0 0 0 0 1 0 0
NW_005179401.1 106 1 0 0 0 0 0 1 0 0
NW_005179401.1 108 1 0 0 0 0 0 1 0 0
NW_005179401.1 109 1 0 0 0 0 0 1 0 0
NW_005179401.1 110 1 0 0 0 0 0 1 0 0
NW_005179401.1 111 1 0 0 0 0 0 1 0 0
NW_005179401.1 112 1 0 0 0 0 0 1 0 0
NW_005179401.1 992 0 0 1 1 0 0 0 0 2
NW_005179401.1 993 0 0 1 1 0 0 0 0 2
NW_005179401.1 994 0 0 1 1 0 0 0 0 2
NW_005179401.1 995 0 0 1 1 0 0 0 0 2
NW_005179401.1 996 0 0 1 1 0 0 0 0 0
NW_005179401.1 997 0 0 1 1 0 0 0 0 0
NW_005179401.1 998 0 0 1 1 0 0 0 0 0
NW_005179401.1 999 0 0 1 1 0 0 0 0 0
In reality the file has more fields but all contain integers like fields 3 and beyond in the example. The ideal output will look like this, with first and last values from consecutive field 2 interval printed in output fields 2 and 3.
NW_005179401.1 100 102 1 0 0 0 0 0 0 0 0
NW_005179401.1 103 106 1 0 0 0 0 0 1 0 0
NW_005179401.1 108 112 1 0 0 0 0 0 1 0 0
NW_005179401.1 992 995 0 0 1 1 0 0 0 0 2
NW_005179401.1 996 999 0 0 1 1 0 0 0 0 0
I found solutions that group consecutive rows with matches in specific fields, but none that also check for consecutive integers in one field, and none that return the range. One thought was using uniq with the -c flag while skipping the first 2 fields, then adding the counts to the value in field 2, but given the additional condition of requiring consecutive numbers in field 2 I'm not too sure where to start with this one. Thanks in advance.
EDIT: I apologize for not originally adding my attempted code but my pipeline used the bioinformatics program bedtools and it kept getting killed for lack of memory, which wasn't something I expected to be troubleshot due to lack of pre-programmed functionality. I am an awk novice and didn't know where to start for an alternative pipeline for reformatting this type of file.
I doubt there is a standard tool like uniq -c for this. But you can use this custom awk script:
awk '{$1=$1} $0!=n {s=$2; printf "%s", g}
{$2=$2+1; n=$0; $2=s" "$2-1; g=$0 ORS}
END {printf "%s", g}' yourFile
n is the next anticipated record,
e.g. if the current line is abc 100 x y z then n=abc 101 x y z.
g is the group of records to be printed in case the next anticipated line n does not occur and the group ends.
s is the start number of group g, i.e. the lower bound of the interval.
{$1=$1} is only there to ensure that the field separators in the current line $0 and the generated line n are consistent, so that we can check equality using ==, or rather != in this case.
For your example, this prints
NW_005179401.1 100 102 1 0 0 0 0 0 0 0 0
NW_005179401.1 103 106 1 0 0 0 0 0 1 0 0
NW_005179401.1 108 112 1 0 0 0 0 0 1 0 0
NW_005179401.1 992 995 0 0 1 1 0 0 0 0 2
NW_005179401.1 996 999 0 0 1 1 0 0 0 0 0
$ cat tst.awk
{
    prevVals = currVals
    origRec = $0
    $2 = ""
    currVals = $0
    $0 = origRec
}
($2 != endKey+1) || (currVals != prevVals) {
    if ( NR>1 ) {
        prt()
    }
    begKey = $2
}
{ endKey = $2; prevRec = $0 }    # remember the last record of the current group
END { prt() }
function prt(   origRec) {
    origRec = $0
    $0 = prevRec    # print the finished group, not the record that ended it
    $2 = begKey OFS endKey
    print
    $0 = origRec
}
$ awk -f tst.awk file
NW_005179401.1 100 102 1 0 0 0 0 0 0 0 0
NW_005179401.1 103 106 1 0 0 0 0 0 1 0 0
NW_005179401.1 108 112 1 0 0 0 0 0 1 0 0
NW_005179401.1 992 995 0 0 1 1 0 0 0 0 2
NW_005179401.1 996 999 0 0 1 1 0 0 0 0 0
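For anyone open to a non-awk route, the same single-pass idea can be sketched in Python. It keeps only the current group in memory, so it should cope with a very long, already sorted file. The function name `merge_consecutive` and the shortened sample rows are made up for illustration.

```python
def merge_consecutive(lines):
    """Yield 'key start end rest...' for runs of rows whose second field is
    consecutive and whose other fields are identical. Input must be sorted."""
    key = rest = None
    start = end = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        k, pos, r = fields[0], int(fields[1]), fields[2:]
        if key is not None and k == key and r == rest and pos == end + 1:
            end = pos  # row extends the current interval
            continue
        if key is not None:
            yield " ".join([key, str(start), str(end)] + rest)
        key, start, end, rest = k, pos, pos, r  # start a new interval
    if key is not None:
        yield " ".join([key, str(start), str(end)] + rest)

# Shortened, hypothetical sample in the same layout as the question's file.
sample = """\
NW_005179401.1 100 1 0 0
NW_005179401.1 101 1 0 0
NW_005179401.1 102 1 0 1
NW_005179401.1 104 1 0 1
NW_005179401.1 105 1 0 1
"""

for row in merge_consecutive(sample.splitlines()):
    print(row)
```

Because the function takes any iterable of lines, passing an open file object instead of `sample.splitlines()` streams the file row by row.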

AWK Formatting Using First Row as a Header and Iterating by column

I'm struggling to format a collectd plotted file so I can later import it into an InfluxDB instance.
This is what the file looks like:
#Date Time [CPU]User% [CPU]Nice% [CPU]Sys% [CPU]Wait% [CPU]Irq% [CPU]Soft% [CPU]Steal% [CPU]Idle% [CPU]Totl% [CPU]Intrpt/sec [CPU]Ctx/sec [CPU]Proc/sec [CPU]ProcQue [CPU]ProcRun [CPU]L-Avg1 [CPU]L-Avg5 [CPU]L-Avg15 [CPU]RunTot [CPU]BlkTot [MEM]Tot [MEM]Used [MEM]Free [MEM]Shared [MEM]Buf [MEM]Cached [MEM]Slab [MEM]Map [MEM]Anon [MEM]Commit [MEM]Locked [MEM]SwapTot [MEM]SwapUsed [MEM]SwapFree [MEM]SwapIn [MEM]SwapOut [MEM]Dirty [MEM]Clean [MEM]Laundry [MEM]Inactive [MEM]PageIn [MEM]PageOut [MEM]PageFaults [MEM]PageMajFaults [MEM]HugeTotal [MEM]HugeFree [MEM]HugeRsvd [MEM]SUnreclaim [SOCK]Used [SOCK]Tcp [SOCK]Orph [SOCK]Tw [SOCK]Alloc [SOCK]Mem [SOCK]Udp [SOCK]Raw [SOCK]Frag [SOCK]FragMem [NET]RxPktTot [NET]TxPktTot [NET]RxKBTot [NET]TxKBTot [NET]RxCmpTot [NET]RxMltTot [NET]TxCmpTot [NET]RxErrsTot [NET]TxErrsTot [DSK]ReadTot [DSK]WriteTot [DSK]OpsTot [DSK]ReadKBTot [DSK]WriteKBTot [DSK]KbTot [DSK]ReadMrgTot [DSK]WriteMrgTot [DSK]MrgTot [INODE]NumDentry [INODE]openFiles [INODE]MaxFile% [INODE]used [NFS]ReadsS [NFS]WritesS [NFS]MetaS [NFS]CommitS [NFS]Udp [NFS]Tcp [NFS]TcpConn [NFS]BadAuth [NFS]BadClient [NFS]ReadsC [NFS]WritesC [NFS]MetaC [NFS]CommitC [NFS]Retrans [NFS]AuthRef [TCP]IpErr [TCP]TcpErr [TCP]UdpErr [TCP]IcmpErr [TCP]Loss [TCP]FTrans [BUD]1Page [BUD]2Pages [BUD]4Pages [BUD]8Pages [BUD]16Pages [BUD]32Pages [BUD]64Pages [BUD]128Pages [BUD]256Pages [BUD]512Pages [BUD]1024Pages
20190228 00:01:00 12 0 3 0 0 1 0 84 16 26957 20219 14 2991 3 0.05 0.18 0.13 1 0 198339428 197144012 1195416 0 817844 34053472 1960600 76668 158641184 201414800 0 17825788 0 17825788 0 0 224 0 0 19111168 3 110 4088 0 0 0 0 94716 2885 44 0 5 1982 1808 0 0 0 0 9739 9767 30385 17320 0 0 0 0 0 0 12 13 3 110 113 0 16 16 635592 7488 0 476716 0 0 0 0 0 0 0 0 0 0 0 8 0 0 22 0 1 0 0 0 0 48963 10707 10980 1226 496 282 142 43 19 6 132
20190228 00:02:00 11 0 3 0 0 1 0 85 15 26062 18226 5 2988 3 0.02 0.14 0.12 2 0 198339428 197138128 1201300 0 817856 34054692 1960244 75468 158636064 201398036 0 17825788 0 17825788 0 0 220 0 0 19111524 0 81 960 0 0 0 0 94420 2867 42 0 7 1973 1842 0 0 0 0 9391 9405 28934 16605 0 0 0 0 0 0 9 9 0 81 81 0 11 11 635446 7232 0 476576 0 0 0 0 0 0 0 0 0 0 0 3 0 0 8 0 1 0 0 0 0 49798 10849 10995 1241 499 282 142 43 19 6 132
20190228 00:03:00 11 0 3 0 0 1 0 85 15 25750 17963 4 2980 0 0.00 0.11 0.10 2 0 198339428 197137468 1201960 0 817856 34056400 1960312 75468 158633880 201397832 0 17825788 0 17825788 0 0 320 0 0 19111712 0 75 668 0 0 0 0 94488 2869 42 0 5 1975 1916 0 0 0 0 9230 9242 28411 16243 0 0 0 0 0 0 9 9 0 75 75 0 10 10 635434 7232 0 476564 0 0 0 0 0 0 0 0 0 0 0 2 0 0 6 0 1 0 0 0 0 50029 10817 10998 1243 501 282 142 43 19 6 132
20190228 00:04:00 11 0 3 0 0 1 0 84 16 25755 17871 10 2981 5 0.08 0.11 0.10 3 0 198339428 197140864 1198564 0 817856 34058072 1960320 75468 158634508 201398088 0 17825788 0 17825788 0 0 232 0 0 19111980 0 79 2740 0 0 0 0 94488 2867 4 0 2 1973 1899 0 0 0 0 9191 9197 28247 16183 0 0 0 0 0 0 9 9 0 79 79 0 10 10 635433 7264 0 476563 0 0 0 0 0 0 0 0 0 0 0 5 0 0 12 0 1 0 0 0 0 49243 10842 10985 1245 501 282 142 43 19 6 132
20190228 00:05:00 12 0 4 0 0 1 0 83 17 26243 18319 76 2985 3 0.06 0.10 0.09 2 0 198339428 197148040 1191388 0 817856 34059808 1961420 75492 158637636 201405208 0 17825788 0 17825788 0 0 252 0 0 19112012 0 85 18686 0 0 0 0 95556 2884 43 0 6 1984 1945 0 0 0 0 9176 9173 28153 16029 0 0 0 0 0 0 10 10 0 85 85 0 12 12 635473 7328 0 476603 0 0 0 0 0 0 0 0 0 0 0 3 0 0 7 0 1 0 0 0 0 47625 10801 10979 1253 505 282 142 43 19 6 132
What I'm trying to do, is to get it in a format that looks like this:
cpu_value,host=mxspacr1,instance=5,type=cpu,type_instance=softirq value=180599 1551128614916131663
cpu_value,host=mxspacr1,instance=2,type=cpu,type_instance=interrupt value=752 1551128614916112943
cpu_value,host=mxspacr1,instance=4,type=cpu,type_instance=softirq value=205697 1551128614916128446
cpu_value,host=mxspacr1,instance=7,type=cpu,type_instance=nice value=19250943 1551128614916111618
cpu_value,host=mxspacr1,instance=2,type=cpu,type_instance=softirq value=160513 1551128614916127690
cpu_value,host=mxspacr1,instance=1,type=cpu,type_instance=softirq value=178677 1551128614916127265
cpu_value,host=mxspacr1,instance=0,type=cpu,type_instance=softirq value=212274 1551128614916126586
cpu_value,host=mxspacr1,instance=6,type=cpu,type_instance=interrupt value=673 1551128614916116661
cpu_value,host=mxspacr1,instance=4,type=cpu,type_instance=interrupt value=701 1551128614916115893
cpu_value,host=mxspacr1,instance=3,type=cpu,type_instance=interrupt value=723 1551128614916115492
cpu_value,host=mxspacr1,instance=1,type=cpu,type_instance=interrupt value=756 1551128614916112550
cpu_value,host=mxspacr1,instance=6,type=cpu,type_instance=nice value=21661921 1551128614916111032
cpu_value,host=mxspacr1,instance=3,type=cpu,type_instance=nice value=18494760 1551128614916098304
cpu_value,host=mxspacr1,instance=0,type=cpu,type_instance=interrupt value=552 1551
What I have managed to do so far is just to convert the date string into EPOCH format.
I was thinking of somehow using the first part, "[CPU]", as the measurement and "User%" as the type; the host I can take from the system where the script will run.
I would really appreciate your help, because I have only very basic knowledge of text editing.
Thanks.
EDIT: this is what I would expect to get from the information in the second line, using the first row as a header:
cpu_value,host=mxspacr1,type=cpu,type_instance=user% value=0 1551128614916131663
EDIT: This is what I have so far, and I'm stuck here.
awk -v HOSTNAME="$HOSTNAME" 'BEGIN { FS="[][]"; getline; NR==1; f1=$2; f2=$3 } { RS=" "; printf f1"_measurement,host="HOSTNAME",type="f2"value="$3" ", system("date +%s -d \""$1" "$2"\"") }' mxmcaim01-20190228.tab
And this is what I get, but this is only for 1 column, now I don't know how to process the remaining columns such as Nice, Sys, Wait and so on.
CPU_measurement,host=mxmcamon05,type=User% value= 1552014000
CPU_measurement,host=mxmcamon05,type=User% value= 1551960000
CPU_measurement,host=mxmcamon05,type=User% value= 1551343500
CPU_measurement,host=mxmcamon05,type=User% value= 1551997620
CPU_measurement,host=mxmcamon05,type=User% value= 1551985200
CPU_measurement,host=mxmcamon05,type=User% value= 1551938400
CPU_measurement,host=mxmcamon05,type=User% value= 1551949200
CPU_measurement,host=mxmcamon05,type=User% value= 1551938400
CPU_measurement,host=mxmcamon05,type=User% value= 1551938400
CPU_measurement,host=mxmcamon05,type=User% value= 1551945600
CPU_measurement,host=mxmcamon05,type=User% value= 1551938400
Please help.
EDIT: First of all, thanks for your help.
Taking advantage of your knowledge of text editing: I was expecting to use this for 3 separate files, but unfortunately, and I don't know why, the format is different, like this:
#Date Time SlabName ObjInUse ObjInUseB ObjAll ObjAllB SlabInUse SlabInUseB SlabAll SlabAllB SlabChg SlabPct
20190228 00:01:00 nfsd_drc 0 0 0 0 0 0 0 0 0 0
20190228 00:01:00 nfsd4_delegations 0 0 0 0 0 0 0 0 0 0
20190228 00:01:00 nfsd4_stateids 0 0 0 0 0 0 0 0 0 0
20190228 00:01:00 nfsd4_files 0 0 0 0 0 0 0 0 0 0
20190228 00:01:00 nfsd4_stateowners 0 0 0 0 0 0 0 0 0 0
20190228 00:01:00 nfs_direct_cache 0 0 0 0 0 0 0 0 0 0
So I don't know how to handle the arrays in a way that lets me use nfsd_drc as the type, then iterate through ObjInUse ObjInUseB ObjAll ObjAllB SlabInUse SlabInUseB SlabAll SlabAllB SlabChg SlabPct and use them as the type_instance, and finally take the value (in this case ObjInUse = 0, ObjInUseB = 0, ObjAll = 0, and so on), making something like this:
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=ObjectInUse value=0 1551128614916131663
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=ObjInuseB value=0 1551128614916131663
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=ObjAll value=0 1551128614916112943
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=ObjAllB value=0 1551128614916128446
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabInUse value=0 1551128614916111618
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabInUseB value=0 1551128614916127690
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabAll value=0 1551128614916127265
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabAllB value=0 1551128614916126586
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabChg value=0 1551128614916116661
slab_value,host=mxspacr1,type=nfsd_drc,type_instance=SlabPct value=0 1551128614916115893
slab_value is a hard-coded value.
Thanks.
It is not clear where instance and type_instance=interrupt come from in your final desired format. Otherwise the awk code below should work.
Note: it doesn't strip % from tag values and prints timestamp at end of line in seconds (append extra zeros if you want nanoseconds).
gawk -v HOSTNAME="$HOSTNAME" '
# Header: split "[CPU]User%" into "cpu" and "user%"; h[1] and h[2] are "#date" and "time".
NR==1 { n = split($0, h, /[ \t\[\]]+/); for (i=1; i<=n; i++) h[i] = tolower(h[i]); next }
{
  ts = mktime(substr($1,1,4)" "substr($1,5,2)" "substr($1,7,2)" "substr($2,1,2)" "substr($2,4,2)" "substr($2,7,2))
  # Data field j (j>=3) maps to category h[2*j-3] and name h[2*j-2].
  for (j=3; j<=NF; j++) { k = 2*j-3; printf("%s_value,host=%s,type=%s,type_instance=%s value=%s %s\n", h[k], HOSTNAME, h[k], h[k+1], $j, ts) }
}' mxmcaim01-20190228.tab
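The same header-driven mapping can also be sketched in Python, including a variant for the slab file from the last edit of the question. Treat this as an illustration only: the function names are made up, the timestamps are interpreted as UTC (an assumption; the awk code uses local time), and the output keeps seconds rather than nanoseconds.

```python
import calendar
import re
import time

def epoch_seconds(date, clock):
    """Parse 'YYYYMMDD' and 'HH:MM:SS' as UTC (an assumption) into epoch seconds."""
    return calendar.timegm(time.strptime(date + " " + clock, "%Y%m%d %H:%M:%S"))

def bracketed_rows(header, row, host):
    """Header columns look like '[CPU]User%'; emit one line per data column."""
    names = re.findall(r"\[([^\]]+)\](\S+)", header)  # [(category, name), ...]
    fields = row.split()
    ts = epoch_seconds(fields[0], fields[1])
    lines = []
    for (category, name), value in zip(names, fields[2:]):
        cat = category.lower()
        lines.append(f"{cat}_value,host={host},type={cat},"
                     f"type_instance={name.lower()} value={value} {ts}")
    return lines

def slab_rows(header, row, host):
    """Header holds plain names; the third data column is the type (e.g. nfsd_drc)."""
    names = header.lstrip("#").split()[3:]  # drop Date, Time, SlabName
    fields = row.split()
    ts = epoch_seconds(fields[0], fields[1])
    return [f"slab_value,host={host},type={fields[2]},type_instance={name} value={value} {ts}"
            for name, value in zip(names, fields[3:])]

# Shortened, hypothetical header and row in the layout shown in the question.
for line in bracketed_rows("#Date Time [CPU]User% [CPU]Nice%",
                           "20190228 00:01:00 12 0", "mxspacr1"):
    print(line)
```

Reading the header once and zipping it against each data row avoids hard-coding any column positions, which is why the two file layouts differ only in how the names are extracted.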

Can't plot matrix in Gnuplot

I have a matrix of 1s and 0s saved in file. It looks like this:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0
0 0 0 0 0 1 1 0 0 0
1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
I am trying to plot in gnuplot using command:
plot 'data.rtf' matrix with image
but when I do that I get an error:
warning: matrix contains missing or undefined values
Matrix does not represent a grid
I think I should get an image where 0 is white space and 1 is black space. I am new to gnuplot, so I have no idea what might be wrong or whether I am going about this the correct way. I will be grateful for any help. Thanks.
Your file is an RTF (rich text format) file. RTF is a markup format, which gnuplot will not understand. You will need to create the file in a text editor (not a word processor) in order to be able to use it.
The file that you provided looks like:
{\rtf1\ansi\ansicpg1250\cocoartf1404\cocoasubrtf340
{\fonttbl\f0\fnil\fcharset0 Menlo-Regular;}
{\colortbl;\red255\green255\blue255;}
\paperw11900\paperh16840\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\pardirnatural\partightenfactor0
\f0\fs22 \cf0 \CocoaLigature0 0 0 0 0 0 0 0 0 0 0\
0 0 0 0 0 0 0 0 0 0\
0 0 0 0 0 0 0 0 0 1\
0 0 0 0 0 0 0 0 0 0\
0 0 0 0 0 0 0 1 1 0\
0 0 0 0 0 1 1 0 0 0\
1 1 1 1 1 0 0 0 0 0\
0 0 0 0 0 0 0 0 0 0\
0 0 0 0 0 0 0 0 0 0\
0 0 0 0 0 0 0 0 0 0
Notice that it starts with a bunch of markup text. Gnuplot is designed to work with text files and not formatted text or binary files (with some limited exceptions).
Creating a text file containing your designed matrix will work just fine.
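If you are unsure whether a data file is really plain text, a small check like the following can help before handing it to gnuplot. It is only a sketch and does not reproduce gnuplot's own parser; the function name and both sample strings are made up for illustration.

```python
def matrix_problems(text):
    """Return a list of reasons why text is not a plain rectangular numeric grid."""
    problems = []
    widths = set()
    for number, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        try:
            for token in tokens:
                float(token)
        except ValueError:
            problems.append(f"line {number}: not purely numeric")
            continue
        widths.add(len(tokens))
    if len(widths) > 1:
        problems.append("numeric rows have different lengths")
    return problems

plain = "0 0 1\n0 1 0\n1 0 0\n"
# Hypothetical miniature of the RTF file: markup first, rows ending in a backslash.
rtf = "{\\rtf1\\ansi\n\\f0\\fs22 \\cf0 0 0 1\\\n0 1 0\\\n1 0 0}\n"

print(matrix_problems(plain))
print(matrix_problems(rtf))
```

An empty list means every non-blank line is numeric and all rows have the same number of columns.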
Color plots are surface-like plots, thus you have to use splot not plot
set pm3d map
set palette gray
splot 'test.txt' matrix w image

Using Rstudio to create Graph

I want to make a bar graph from the data below in RStudio. I want to show which IP used which protocol, and how many times.
Protocol
Source DNS FTP HTTP IMF LLC SMTP TCP TELNET
172.16.112.100 306 0 0 0 0 0 0 0
172.16.112.50 0 0 0 0 0 0 0 24
172.16.113.168 0 0 0 0 0 0 0 15
172.16.113.204 1 0 0 0 0 0 0 0
172.16.114.50 1 0 0 0 0 0 0 0
172.16.115.20 158 0 0 0 0 0 0 2
192.168.1.20 3 0 0 0 0 0 0 0
194.7.248.153 0 0 0 0 0 0 0 2
197.218.177.69 0 0 0 0 0 0 0 0
HP_ed:9b:2d 0 0 0 0 0 0 0 0
A simple way to build one plot for each IP, after loading the data as a data.frame with colnames and rownames:
par(mfrow=c(4,3))
for (i in 1:nrow(data)) {
barplot(as.numeric(data[i,]), main=rownames(data)[i], names.arg=colnames(data))
}
Your data is very sparse, so most of the graphs have only one bar or none at all. If you want stacked or grouped bars, have a look at the packages ggplot2 or lattice.
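Since most cells in the table are zero, it may be easier to first list only the non-zero combinations, one (source, protocol, count) entry per row, which is also the "long" layout that ggplot2 works with. A minimal sketch of that reshaping, written in Python rather than R, with a made-up function name and a shortened copy of the table:

```python
def nonzero_counts(table_text):
    """Turn a wide 'Source x Protocol' table into (source, protocol, count) triples."""
    rows = [line.split() for line in table_text.strip().splitlines()]
    protocols = rows[0][1:]  # header row without the 'Source' label
    triples = []
    for row in rows[1:]:
        source, counts = row[0], row[1:]
        for protocol, count in zip(protocols, counts):
            if int(count) > 0:
                triples.append((source, protocol, int(count)))
    return triples

# Shortened copy of the table from the question.
table = """\
Source DNS FTP HTTP IMF LLC SMTP TCP TELNET
172.16.112.100 306 0 0 0 0 0 0 0
172.16.112.50 0 0 0 0 0 0 0 24
172.16.115.20 158 0 0 0 0 0 0 2
"""

for source, protocol, count in nonzero_counts(table):
    print(source, protocol, count)
```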

GhostScript issue with extracting text, and -dProvideUnicode usage

I use Ghostscript with the DjVu driver, as in this example:
gs %gs_args% -dProvideUnicode -dExtractText -sDEVICE=djvusep -o out.sep in.ps
and noticed that with some files the text is not extracted correctly (I get question marks in the clipboard when copying text from the generated file).
I thought it was an encoding issue and removed the -dProvideUnicode switch, but then no text is extracted at all, so I assume that the -dExtractText flag, which is specific to this driver, needs it to function properly.
Running ps2ascii on the single-page PDF file (the one from which in.ps was generated through the ps2write device) also gives no text. But other tools like pdfminer and xpdf extract the correct text. PDF viewers like SumatraPDF (which uses mupdf) or Acrobat also extract the text as expected.
Does anyone know something about these undocumented switches, and what the problem could be here?
Update: This only happens if I go through ps2write route. If instead I use PDF directly there is no issue.
Here is encoding info from the PDF file:
c:\temp>pdf-parser -s encoding sample.pdf
obj 11 0
Type: /Font
Referencing: 12 0 R, 20 0 R
<<
/BaseFont /XQKNMY+TT14112O00
/FontDescriptor 12 0 R
/Type /Font
/FirstChar 32
/LastChar 144
/Widths [
253 0 0 0 0 0 0 0 293 293 0 0 220 313 220 0
0 467 467 0 0 0 467 0 467 467 0 0 0 0 0 0
0 680 0 0 0 653 0 773 760 0 0 740 0 833 0 0
0 0 0 480 613 0 680 0 0 0 0 0 0 0 0 0
0 407 513 414 500 414 320 447 513 227 0 467 227 773 513 513
513 0 333 367 293 487 467 667 460 414 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
220]
/Encoding 20 0 R
/Subtype /Type1
>>
obj 20 0
Type: /Encoding
Referencing:
<<
/Type /Encoding
/BaseEncoding /WinAnsiEncoding
/Differences [
144/quoteright]
>>
This isn't really a Ghostscript question. The djvu device is decidedly non-standard in a Ghostscript build. I can't tell you anything about the switches, because they are specific to the DjVu device.
If all you want to do is extract the text from a file you could use the txtwrite device (with recent versions of Ghostscript).
If you use a PDF file then a ToUnicode CMap may be present in the file and can be used to get Unicode information about the text. PostScript does not contain ToUnicode CMaps and so the Unicode information is NOT available from the PostScript file. I would imagine this is why the ps2write output can't have text extracted from it by the device.
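If you want to script the txtwrite route mentioned above, a thin wrapper could look like the sketch below. The function names are made up, and the executable name `gs` is an assumption (Windows builds typically ship it under a different name, so adjust as needed). Feeding it the original PDF rather than the ps2write output keeps any ToUnicode information available.

```python
import subprocess

def txtwrite_command(source, destination, executable="gs"):
    """Build a Ghostscript command line that writes extracted text to destination."""
    return [
        executable,
        "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=txtwrite",
        f"-sOutputFile={destination}",
        source,
    ]

def extract_text(source, destination):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(txtwrite_command(source, destination), check=True)

# Only prints the command; call extract_text("sample.pdf", "out.txt") to run it.
print(" ".join(txtwrite_command("sample.pdf", "out.txt")))
```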
