Some installed font within a font family does not take effect - windows

I generated a font family (Ricty) using the script on this site. It consists of four font files, one per style:
Ricty-Regular.ttf
Ricty-Bold.ttf
Ricty-Oblique.ttf
Ricty-BoldOblique.ttf
I installed these files on Windows 10.
(Situation 1) If I install Ricty-Regular.ttf and Ricty-Bold.ttf first, and then Ricty-Oblique.ttf and Ricty-BoldOblique.ttf, only Ricty-Regular.ttf and Ricty-Bold.ttf show up in Control Panel > Fonts. (Sorry, the screenshots are in Japanese.)
(Situation 2) If I install Ricty-Oblique.ttf and Ricty-BoldOblique.ttf first, and then the other two fonts, only the two oblique fonts show up in Control Panel > Fonts.
(Other situations) Of Ricty-Regular.ttf and Ricty-Oblique.ttf, whichever is installed first wins (shows up in Control Panel). Similarly, of Ricty-Bold.ttf and Ricty-BoldOblique.ttf, whichever is installed first wins.
In every situation, all four fonts are shown in Settings > Fonts.
I set Emacs (Spacemacs) to use this font family. In situation 1, all characters are rendered in the normal style; in situation 2, all characters are rendered in the oblique style. (FYI, in situation 2, VSCode can still show the normal style.)
UPDATE
I checked the values of the font tables using OT Master Light 3.70.
| Font table | Field | Ricty-Regular.ttf | Ricty-Oblique.ttf | Ricty-Bold.ttf | Ricty-BoldOblique.ttf |
|---|---|---|---|---|---|
| head | macStyle | 0x0000 | 0x0000 | 0x0000 | 0x0000 |
| name | name ID 0 | <long copyright string> | <long copyright string> | <long copyright string> | <long copyright string> |
| name | name ID 1 | Ricty | Ricty | Ricty | Ricty |
| name | name ID 2 | Regular | Oblique | Bold | Bold Oblique |
| name | name ID 3 | FontForge 2.0 : Ricty Regular : 27-11-2021 | FontForge 2.0 : Ricty Oblique : 27-11-2021 | FontForge 2.0 : Ricty Bold : 27-11-2021 | FontForge 2.0 : Ricty Bold Oblique : 27-11-2021 |
| name | name ID 4 | Ricty Regular | Ricty Oblique | Ricty Bold | Ricty Bold Oblique |
| name | name ID 5 | Version 4.1.1 | Version 4.1.1 | Version 4.1.1 | Version 4.1.1 |
| name | name ID 6 | Ricty-Regular | Ricty-Oblique | Ricty-Bold | Ricty-BoldOblique |
| name | name ID 16 | not present | not present | not present | not present |
| name | name ID 17 | not present | not present | not present | not present |
| name | name ID 21 | not present | not present | not present | not present |
| name | name ID 22 | not present | not present | not present | not present |
| OS/2 | version | 0x0001 | 0x0001 | 0x0001 | 0x0001 |
| OS/2 | usWeightClass | 400 | 400 | 700 | 700 |
| OS/2 | fsSelection | 0x0080 | 0x0280 | 0x0080 | 0x0280 |
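As a reading aid for the fsSelection values listed above, here is a minimal Python sketch that decodes them into flag names. The bit names follow the OS/2 table description in the OpenType specification; the dictionary and function names are illustrative only. Decoded this way, none of the four files sets ITALIC, BOLD or REGULAR, and macStyle is 0 everywhere. The OBLIQUE and USE_TYPO_METRICS bits were only defined in OS/2 version 4, while these files report version 1. Whether this is what makes Windows treat Regular and Oblique (and Bold and Bold Oblique) as the same face is an assumption, not something confirmed in this thread.

```python
# Bit positions of OS/2 fsSelection, as named in the OpenType specification.
FS_SELECTION_BITS = {
    0: "ITALIC",
    1: "UNDERSCORE",
    2: "NEGATIVE",
    3: "OUTLINED",
    4: "STRIKEOUT",
    5: "BOLD",
    6: "REGULAR",
    7: "USE_TYPO_METRICS",
    8: "WWS",
    9: "OBLIQUE",
}


def decode_fs_selection(value: int) -> list[str]:
    """Return the names of all fsSelection bits set in `value`, lowest bit first."""
    return [
        name
        for bit, name in sorted(FS_SELECTION_BITS.items())
        if value & (1 << bit)
    ]


# fsSelection values copied from the table in the question.
observed = {
    "Ricty-Regular.ttf": 0x0080,
    "Ricty-Oblique.ttf": 0x0280,
    "Ricty-Bold.ttf": 0x0080,
    "Ricty-BoldOblique.ttf": 0x0280,
}

for font, value in observed.items():
    print(f"{font}: {value:#06x} -> {decode_fs_selection(value)}")
```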


Latex not opening png and windows not being able to generate bb

I'm using TexMaker on Windows 10 and compiling with pdflatex (F6), yet I can't get it to open the PNG file that sits in the same folder as my .tex file:
\usepackage{graphicx}
\begin{document}
\begin{figure}[h!]
\includegraphics[width=\linewidth]{File.png}
\end{figure}
\end{document}
So I tried to create a .bb file from the PNG. I opened cmd in that folder and typed:
ebb File.png
ebb: file not writable for security reasons: File.bb
ebb: fatal: Unable to open output file File.bb
When I open the properties of File.png and go to the Security tab, I see that my user is the owner of the folder and has all permissions set (even though, weirdly, I cannot uncheck any of the permissions I have).
The folder I'm working in has the black square on the "Read-only" attribute (in Properties), and I can't keep it unchecked even though I'm the owner. What is wrong?
EDIT: Here's what happens when I click on show permissions (Properties > Security > Advanced > Show permissions): my user is the owner.
I can't click on anything even though I'm the owner.
EDIT: the log file:
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (MiKTeX 20.10) (preloaded format=pdflatex 2020.10.24) 25 OCT 2020 10:47
entering extended mode
**./test.tex
(test.tex
LaTeX2e <2020-10-01> patch level 1
L3 programming layer <2020-10-05> xparse <2020-03-03>
("C:\Program Files\MiKTeX\tex/latex/base\article.cls"
Document Class: article 2020/04/10 v1.4m Standard LaTeX document class
("C:\Program Files\MiKTeX\tex/latex/base\size12.clo"
File: size12.clo 2020/04/10 v1.4m Standard LaTeX file (size option)
)
\c@part=\count175
\c@section=\count176
\c@subsection=\count177
\c@subsubsection=\count178
\c@paragraph=\count179
\c@subparagraph=\count180
\c@figure=\count181
\c@table=\count182
\abovecaptionskip=\skip47
\belowcaptionskip=\skip48
\bibindent=\dimen138
)
("C:\Program Files\MiKTeX\tex/latex/graphics\graphicx.sty"
Package: graphicx 2020/09/09 v1.2b Enhanced LaTeX Graphics (DPC,SPQR)
("C:\Program Files\MiKTeX\tex/latex/graphics\keyval.sty"
Package: keyval 2014/10/28 v1.15 key=value parser (DPC)
\KV@toks@=\toks15
)
("C:\Program Files\MiKTeX\tex/latex/graphics\graphics.sty"
Package: graphics 2020/08/30 v1.4c Standard LaTeX Graphics (DPC,SPQR)
("C:\Program Files\MiKTeX\tex/latex/graphics\trig.sty"
Package: trig 2016/01/03 v1.10 sin cos tan (DPC)
)
("C:\Program Files\MiKTeX\tex/latex/graphics-cfg\graphics.cfg"
File: graphics.cfg 2016/06/04 v1.11 sample graphics configuration
)
Package graphics Info: Driver file: pdftex.def on input line 105.
("C:\Program Files\MiKTeX\tex/latex/graphics-def\pdftex.def"
File: pdftex.def 2020/10/05 v1.2a Graphics/color driver for pdftex
))
\Gin@req@height=\dimen139
\Gin@req@width=\dimen140
)
("C:\Program Files\MiKTeX\tex/latex/l3backend\l3backend-pdftex.def"
File: l3backend-pdftex.def 2020-09-24 L3 backend support: PDF output (pdfTeX)
\l__kernel_color_stack_int=\count183
\l__pdf_internal_box=\box47
) (test.aux)
\openout1 = `test.aux'.
LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 3.
LaTeX Font Info: ... okay on input line 3.
LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 3.
LaTeX Font Info: ... okay on input line 3.
LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 3.
LaTeX Font Info: ... okay on input line 3.
LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 3.
LaTeX Font Info: ... okay on input line 3.
LaTeX Font Info: Checking defaults for TS1/cmr/m/n on input line 3.
LaTeX Font Info: ... okay on input line 3.
LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 3.
LaTeX Font Info: ... okay on input line 3.
LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 3.
LaTeX Font Info: ... okay on input line 3.
("C:\Program Files\MiKTeX\tex/context/base/mkii\supp-pdf.mkii"
[Loading MPS to PDF converter (version 2006.09.02).]
\scratchcounter=\count184
\scratchdimen=\dimen141
\scratchbox=\box48
\nofMPsegments=\count185
\nofMParguments=\count186
\everyMPshowfont=\toks16
\MPscratchCnt=\count187
\MPscratchDim=\dimen142
\MPnumerator=\count188
\makeMPintoPDFobject=\count189
\everyMPtoPDFconversion=\toks17
) ("C:\Program Files\MiKTeX\tex/latex/epstopdf-pkg\epstopdf-base.sty"
Package: epstopdf-base 2020-01-24 v2.11 Base part for package epstopdf
Package epstopdf-base Info: Redefining graphics rule for `.eps' on input line 4
85.
)
<semirreta.png, id=1, 368.1253pt x 99.37125pt>
[1{C:/Users/JoaoV/AppData/Local/MiKTeX/pdftex/config/pdftex.map}] (test.aux) )
Here is how much of TeX's memory you used:
1167 strings out of 480236
17436 string characters out of 2890433
280939 words of memory out of 3000000
17769 multiletter control sequences out of 15000+200000
535555 words of font info for 31 fonts, out of 3000000 for 9000
1141 hyphenation exceptions out of 8191
60i,4n,66p,199b,236s stack positions out of 5000i,500n,10000p,200000b,50000s
<C:/Program Files/MiKTeX/fonts/type1/public/amsfonts/cm/cmr12.pfb><C:/Program F
iles/MiKTeX/fonts/type1/public/amsfonts/cm/cmtt12.pfb>
Output written on test.pdf (1 page, 13448 bytes).
PDF statistics:
15 PDF objects out of 1000 (max. 8388607)
0 named destinations out of 1000 (max. 500000)
6 words of extra memory for PDF output out of 10000 (max. 10000000)
I made a little test:
the real image is supposed to be these two lines: https://ibb.co/yYQCfnd
Remove the draft option; it prevents images from showing up (in draft mode, graphicx draws only a frame containing the file name).

How can I convert a PDF to image with no font losses?

I have read tons of Stack Overflow questions about font problems when converting a PDF to an image with Ghostscript.
The usual explanation: because the fonts are not embedded, Ghostscript tries to find alternatives on your system and renders as well as it can.
But I cannot understand why Mac OS X Preview renders a PDF perfectly and Ghostscript can't.
gs -sFONTPATH=/Library/Fonts -sDEVICE=pngalpha -o file-%03d.png -r144 my.pdf
I'm even telling gs where the fonts are.
This is the output.
$ pdffonts cv18.pdf
Fontconfig warning: "/usr/local/etc/fonts/fonts.conf", line 86: unknown element "blank"
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
CenturyGothic,Bold TrueType WinAnsi no no no 13 0
CenturyGothic TrueType WinAnsi no no no 14 0
CourierNew TrueType WinAnsi no no no 15 0
Arial TrueType WinAnsi no no no 16 0
AYTOPC+Wingdings TrueType WinAnsi yes yes no 17 0
TimesNewRoman TrueType WinAnsi no no no 18 0
CenturyGothic,Italic TrueType WinAnsi no no no 24 0
$ gs -sDEVICE=pngalpha -o file-%03d.png -r300 cv18.pdf
GPL Ghostscript 9.26 (2018-11-20)
Copyright (C) 2018 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 2.
Page 1
Querying operating system for font files...
Substituting font NewCenturySchlbk-Bold for CenturyGothic,Bold.
Loading C059-Bold font from /usr/local/Cellar/ghostscript/9.26/share/ghostscript/9.26/Resource/Font/C059-Bold... 4357236 2918185 2219472 869812 4 done.
Substituting font NewCenturySchlbk-Roman for CenturyGothic.
Loading C059-Roman font from /usr/local/Cellar/ghostscript/9.26/share/ghostscript/9.26/Resource/Font/C059-Roman... 4504148 3145674 2313416 951449 4 done.
Substituting font Courier for CourierNew.
Loading NimbusMonoPS-Regular font from /usr/local/Cellar/ghostscript/9.26/share/ghostscript/9.26/Resource/Font/NimbusMonoPS-Regular... 4731860 3382093 2548760 1140366 4 done.
Can't find (or can't open) font file /usr/local/Cellar/ghostscript/9.26/share/ghostscript/9.26/Resource/Font/ArialMT.
Can't find (or can't open) font file ArialMT.
Can't find (or can't open) font file /usr/local/Cellar/ghostscript/9.26/share/ghostscript/9.26/Resource/Font/ArialMT.
Can't find (or can't open) font file ArialMT.
Didn't find this font on the system!
Substituting font Helvetica for ArialMT.
Loading NimbusSans-Regular font from /usr/local/Cellar/ghostscript/9.26/share/ghostscript/9.26/Resource/Font/NimbusSans-Regular... 4939372 3576328 2589160 1183936 4 done.
Can't find (or can't open) font file /usr/local/Cellar/ghostscript/9.26/share/ghostscript/9.26/Resource/Font/TimesNewRomanPSMT.
Can't find (or can't open) font file TimesNewRomanPSMT.
Can't find (or can't open) font file /usr/local/Cellar/ghostscript/9.26/share/ghostscript/9.26/Resource/Font/TimesNewRomanPSMT.
Can't find (or can't open) font file TimesNewRomanPSMT.
Didn't find this font on the system!
Substituting font Times-Roman for TimesNewRomanPSMT.
Loading NimbusRoman-Regular font from /usr/local/Cellar/ghostscript/9.26/share/ghostscript/9.26/Resource/Font/NimbusRoman-Regular... 5196116 3849418 3003064 1540393 4 done.
Page 2
Substituting font NewCenturySchlbk-Roman for CenturyGothic.
Substituting font NewCenturySchlbk-Bold for CenturyGothic,Bold.
Substituting font NewCenturySchlbk-Italic for CenturyGothic,Italic.
Loading C059-Italic font from /usr/local/Cellar/ghostscript/9.26/share/ghostscript/9.26/Resource/Font/C059-Italic... 5504628 4138508 2467960 1088660 4 done.
The question is: why doesn't Mac OS X Preview have these problems? Where does it take the fonts from to render perfectly?
I would need to see your PDF file to be able to comment, but where you say "I'm even telling gs where the fonts are", I'm afraid that does not appear to be correct.
The Ghostscript back channel says:
Can't find (or can't open) font file /usr/local/Cellar/ghostscript/9.26/share/ghostscript/9.26/Resource/Font/ArialMT.
So either the file does not exist, Ghostscript can't open it, or just possibly it's corrupted in some way.
Is fontconfig the only thing you are using to 'tell gs where the fonts are'? If so, you aren't really telling Ghostscript anything. Note also the fontconfig warning; I'm not an expert with fontconfig, but I suspect you may want to sort that out too.
Ghostscript may, or may not, use fontconfig, depending on how it was built, which obviously I don't know. If you were to actually tell Ghostscript about the fonts (instead of fontconfig) then it might work better. You would need to edit or create a fontmap.GS file and tell Ghostscript where to find it, using the -I (Include) switch.
You can find an example fontmap.GS in /ghostscript/Resource/Init
In order to help more I'd need to know the exact path where the font file is located, the method you are using to tell Ghostscript where the font file is located, and the exact configuration of the fontmap.GS (or whatever means you are using). The PDF file and font file would also be helpful, in case the font file is corrupted in some way. I'd also be curious about the format of the ArialMT and C059-Roman fonts. PostScript or TrueType fonts?
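The pdffonts listing in the question already shows which fonts Ghostscript has to substitute: every row whose emb column is "no". A minimal Python sketch for pulling those names out of pdffonts output follows. It assumes font names contain no spaces and that each data row ends with the encoding, the three yes/no flags and the two-part object ID; the function name parse_pdffonts is illustrative, not part of any tool.

```python
def parse_pdffonts(output: str) -> list[dict]:
    """Parse the text printed by `pdffonts` into one dict per font row."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # A data row is: name, type (one or more words), encoding,
        # emb, sub, uni, object number, generation number.
        if len(tokens) < 8:
            continue
        if not (tokens[-1].isdigit() and tokens[-2].isdigit()):
            continue  # header, separator or warning line
        if tokens[-5] not in ("yes", "no"):
            continue
        rows.append(
            {
                "name": tokens[0],
                "type": " ".join(tokens[1:-6]),
                "encoding": tokens[-6],
                "emb": tokens[-5],
                "sub": tokens[-4],
                "uni": tokens[-3],
            }
        )
    return rows


# Rows copied from the pdffonts output in the question.
sample = """\
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
CenturyGothic,Bold TrueType WinAnsi no no no 13 0
CenturyGothic TrueType WinAnsi no no no 14 0
CourierNew TrueType WinAnsi no no no 15 0
Arial TrueType WinAnsi no no no 16 0
AYTOPC+Wingdings TrueType WinAnsi yes yes no 17 0
TimesNewRoman TrueType WinAnsi no no no 18 0
CenturyGothic,Italic TrueType WinAnsi no no no 24 0
"""

not_embedded = [row["name"] for row in parse_pdffonts(sample) if row["emb"] == "no"]
print(not_embedded)
```

Each name in the resulting list is a font that the renderer must locate on the system or replace, which is where the "Substituting font ..." lines in the Ghostscript log come from.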

Unable to start the startxwin cygwin

I was using a Cygwin installation downloaded by someone in our organization, and xterm worked fine. I recently updated xterm, and since then I have lost the ability to start the X server. The bash console where I ran startxwin shows the following and does not open an xterm window. I opened an xterm window manually, but X windows still do not open when I ssh into the remote machine.
$ startxwin
:0" in "list" command display name "6175
xauth: (stdin):1: bad "add" command line
Welcome to the XWin X Server
Vendor: The Cygwin/X Project
Release: 1.19.3.0
OS: CYGWIN_NT-6.1 6175 2.8.0(0.309/5/3) 2017-04-01 20:47 x86_64
OS: Windows 7 Service Pack 1 [Windows NT 6.1 build 7601] (Win64)
Package: version 1.19.3-2 built 2017-04-23
XWin was started with the following command line:
/usr/bin/XWin :0 -multiwindow -auth
/home/usrpao/.serverauth.13204
(II) xorg.conf is not supported
(II) See http://x.cygwin.com/docs/faq/cygwin-x-faq.html for more information
LoadPreferences: /home/usrpao/.XWinrc not found
LoadPreferences: Loading /etc/X11/system.XWinrc
LoadPreferences: Done parsing the configuration file...
winDetectSupportedEngines - RemoteSession: no
winDetectSupportedEngines - DirectDraw4 installed, allowing ShadowDDNL
winDetectSupportedEngines - Returning, supported engines 00000005
winSetEngine - Multi Window or Rootless => ShadowGDI
winScreenInit - Using Windows display depth of 32 bits per pixel
winAllocateFBShadowGDI - Creating DIB with width: 2560 height: 1024 depth: 32
winFinishScreenInitFB - Masks: 00ff0000 0000ff00 000000ff
winInitVisualsShadowGDI - Masks 00ff0000 0000ff00 000000ff BPRGB 8 d 24 bpp 32
MIT-SHM extension disabled due to lack of kernel support
XFree86-Bigfont extension local-client optimization disabled due to lack of shared memory support in the kernel
glWinSelectGLimplementation: Loaded 'cygnativeGLthunk.dll'
(II) AIGLX: Testing pixelFormatIndex 1
GL_VERSION: 4.3.0 - Build 10.18.14.4280
GL_VENDOR: Intel
GL_RENDERER: Intel(R) HD Graphics 4600
(II) GLX: enabled GLX_SGI_make_current_read
(II) GLX: enabled GLX_SGI_swap_control
(II) GLX: enabled GLX_MESA_swap_control
(II) GLX: enabled GLX_SGIX_pbuffer
(II) GLX: enabled GLX_ARB_multisample
(II) GLX: enabled GLX_SGIS_multisample
(II) GLX: enabled GLX_ARB_fbconfig_float
(II) GLX: enabled GLX_EXT_fbconfig_packed_float
(II) GLX: enabled GLX_ARB_create_context
(II) GLX: enabled GLX_ARB_create_context_profile
(II) GLX: enabled GLX_ARB_create_context_robustness
(II) GLX: enabled GLX_EXT_create_context_es2_profile
(II) GLX: enabled GLX_ARB_framebuffer_sRGB
(II) AIGLX: enabled GLX_MESA_copy_sub_buffer
(II) 80 pixel formats reported by wglGetPixelFormatAttribivARB
(II) 44 fbConfigs
(II) ignored pixel formats: 0 not OpenGL, 0 unknown pixel type, 36 unaccelerated
(II) GLX: Initialized Win32 native WGL GL provider for screen 0
winPointerWarpCursor - Discarding first warp: 1280 512
(--) 8 mouse buttons found
(--) Setting autorepeat to delay=500, rate=31
(--) Windows keyboard layout: "00000409" (00000409) "US", type 4
(--) Found matching XKB configuration "English (USA)"
(--) Model = "pc105" Layout = "us" Variant = "none" Options = "none"
Rules = "base" Model = "pc105" Layout = "us" Variant = "none" Options = "none"
winInitMultiWindowWM - DISPLAY=:0.0
winMultiWindowXMsgProc - DISPLAY=:0.0
winInitMultiWindowWM - xcb_connect () returned and successfully opened the display.
winProcEstablishConnection - winInitClipboard returned.
winClipboardThreadProc - DISPLAY=:0.0
winMultiWindowXMsgProc - xcb_connect() returned and successfully opened the display.
OS maintains clipboard viewer chain: yes
winClipboardProc - XOpenDisplay () returned and successfully opened the display.
Using Composite redirection
I had a similar issue and tracked it down to startx misbehaving. This is what I saw:
$ ./startx -- :1
:1" in "list" command display name "hostname
xauth: (stdin):1: bad "add" command line
The problem is with the way startx sets its hostname variable: it does not strip out the carriage return. Later, when startx uses the hostname variable, the embedded carriage return causes the xauth "add" command to terminate prematurely and fail.
The fix for me was to strip out the carriage return.
Find where the variable hostname gets set in /usr/bin/startx, and just below there add this line:
hostname=$(echo $hostname | sed 's/\x0D//g')
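To illustrate why a stray carriage return produces such oddly ordered messages, here is a minimal Python sketch. It approximates how a terminal displays a line containing "\r" (the cursor jumps back to column 0 and later text overwrites earlier text) and shows the same stripping step as the sed line above. The message text and the value "6175\r" are assumptions made up for the demonstration, not the exact strings used by startx or xauth.

```python
def render_on_terminal(text: str) -> str:
    """Approximate what a terminal shows for one line that contains carriage returns."""
    cells: list[str] = []
    col = 0
    for ch in text:
        if ch == "\r":
            col = 0  # carriage return: move the cursor back to the first column
            continue
        if col < len(cells):
            cells[col] = ch  # overwrite what was printed earlier
        else:
            cells.append(ch)
        col += 1
    return "".join(cells)


def strip_carriage_returns(value: str) -> str:
    """Same effect as the sed expression s/\\x0D//g on a shell variable."""
    return value.replace("\r", "")


# Hypothetical hostname value that still carries a trailing carriage return.
raw_hostname = "6175\r"

broken = f'bad display name "{raw_hostname}:0"'
fixed = f'bad display name "{strip_carriage_returns(raw_hostname)}:0"'

print(render_on_terminal(broken))  # text after the CR overwrites the line start
print(render_on_terminal(fixed))
```

The first printed line starts with `:0"` because everything after the carriage return is drawn over the beginning of the message, which resembles the scrambled first line of the startxwin output.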
It's a PATH issue. Cygwin\bin should come before Windows\System32 in the PATH.
Otherwise system32\hostname.exe is invoked instead of /bin/hostname, and its output ends with a Windows line ending, which leaves a trailing carriage return in the hostname.
A possible workaround is to invoke /bin/hostname instead of just hostname:
sed -i s/\`hostname/\`\\/bin\\/hostname/ /usr/bin/startx /usr/bin/startxwin
... but you'll face other issues until you put Cygwin at the beginning of the PATH (though then you might face yet other issues, so there's no ideal solution).

gnuplot pdfcairo unnamed Type 3 font in output on macos

Gnuplot with the pdfcairo terminal seems to give strange behavior in terms of fonts, where the generated pdf has unnamed, Type 3 fonts. Here's output from pdffonts on the output pdf file:
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
[none] Type 3 Custom yes no yes 5 0
HTVMTJ+Helvetica CID TrueType Identity-H yes yes yes 6 0
ITRAZO+Helvetica CID TrueType Identity-H yes yes yes 7 0
This is a problem because the publisher refuses to accept Type 3 fonts in documents. How do I get around this? Below is a small gnuplot file that reproduces the problem on OS X El Capitan 10.11.3, gnuplot 5.0 patchlevel 5:
set terminal pdfcairo font "Helvetica, 16"
set output "../plots/test.pdf"
set xlabel "x-axis"
set ylabel "y-axis"
set xrange [0:1]
set yrange [0:1]
plot 0.5 title "y=0.5" w l lw 3
For now, my workaround is to use the postscript terminal and then epstopdf, but this requires a lot of rework on many many scripts. Any ideas what's wrong here?
I've hit this issue too and narrowed it down to being spaces. Any time you add whitespace to an axis label or set the tic format to include space, you get an unnamed Type 3 font added. I can't even begin to understand why.

strange characters: interaction of R and Windows locale?

WinXP-x32, R-2.13.0
Dear list,
I have a problem that (I think) relates to the interaction between Windows and R.
I am trying to scrape a table with data on the Hawai'ian Islands. This is my R code:
library(XML)
u <- "http://en.wikipedia.org/wiki/Hawaii"
tables <- readHTMLTable(u)
Islands <- tables[[5]]
The output is (first set of columns):
Island Nickname > > Islands
Island Nickname > > Location 1 Hawaiʻi[7] The Big
Island 19°34′N 155°30′W /
19.567°N 155.5°W / 19.567;
-155.5 2 Maui[8] The Valley Isle 20°48′N 156°20′W /
20.8°N 156.333°W / 20.8;
-156.333 3 Kahoʻolawe[9] The Target Isle 20°33′N
156°36′W / 20.55°N
156.6°W / 20.55; -156.6 4 LÄnaÊ»i[10] The Pineapple Isle
20°50′N 156°56′W /
20.833°N 156.933°W / 20.833;
-156.933 5 Molokaʻi[11] The Friendly Isle 21°08′N
157°02′W / 21.133°N
157.033°W / 21.133; -157.033 6 Oʻahu[12] The Gathering Place
21°28′N 157°59′W /
21.467°N 157.983°W / 21.467;
-157.983 7 Kauaʻi[13] The Garden Isle 22°05′N
159°30′W / 22.083°N
159.5°W / 22.083; -159.5 8 Niʻihau[14] The Forbidden Isle
21°54′N 160°10′W / 21.9°N
160.167°W / 21.9; -160.167
As you can see, there are "weird" characters in there. I have also tried readHTMLTable(u, encoding = "UTF-16") and readHTMLTable(u, encoding = "UTF-8")
but that didn't help.
It seems to me that there may be an issue with the interaction of the Windows settings of the character set and R.
sessionInfo() gives
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C LC_TIME=Dutch_Netherlands.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_3.2-0.2
I have also attempted to let R use another setting by entering: Sys.setlocale("LC_ALL", "en_US.UTF-8"), but this yields the response:
> Sys.setlocale("LC_ALL", "en_US.UTF-8")
[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
OS reports request to set locale to "en_US.UTF-8" cannot be honored
In addition, I have attempted to make the change directly from the windows command prompt, using: chcp 65001 and variations of that, but that didn't change anything.
I noticed from searching the web that others have the issue as well, but I have not been able to find a solution. It looks like this is an issue of how Windows and R interact. Unfortunately, all three computers at my disposal have this problem. It occurs both under WinXP-x32 and under Win7-x86.
Is there a way to make R override the windows settings or can the issue be solved otherwise?
I have also tried other websites, and the issue occurs every time when there is an é, ü, ä, î, et cetera in the text-to-be-scraped.
Thank you,
Roger
Not quite an answer:
If you look at the Wikipedia page and change the encoding in your browser (in IE, View -> Encoding; in Firefox, View -> Character Encoding) to Western (ISO-8859-1) or Western (Windows-1252), then you see the silly characters. That ought to mean that you can use iconv to change the encoding and fix your problems.
#Convert factors to character
Islands <- as.data.frame(lapply(Islands, as.character), stringsAsFactors = FALSE)
iconv(Islands$Island, "windows-1252", "UTF-8")
Unfortunately, it doesn't work. It may be possible to get the correct text by using a different conversion (iconvlist() shows all the possibilities).
It is also possible to simply strip out the offending characters, though this isn't ideal.
iconv(Islands$Island, "windows-1252", "ASCII", "")
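The garbling in the question is consistent with UTF-8 bytes being interpreted as Windows-1252. A minimal Python sketch of that mechanism is shown below (Python rather than R, purely for illustration; the function names are made up). It reproduces the "HawaiÊ»i" form seen in the output and shows the reverse step. Note that not every string round-trips: for example, the second UTF-8 byte of "ā" (0x81) has no Windows-1252 character, so the sketch is limited to a string where both directions are defined.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def undo_misread(garbled: str) -> str:
    """Reverse the step above: recover the original bytes and decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


original = "Hawaiʻi"  # contains U+02BB, which is two bytes (0xCA 0xBB) in UTF-8
garbled = misread_utf8_as_cp1252(original)

print(garbled)                # the two bytes show up as two separate characters
print(undo_misread(garbled))  # the original string again
```

This suggests that the bytes returned by readHTMLTable may already be valid UTF-8 and only their declared or displayed encoding is off, which would also explain why converting from windows-1252 to UTF-8 with iconv does not help. That reading is an assumption based on the output shown, not a confirmed diagnosis.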
I was unable to replicate the error; however, the examples in the help file for Sys.setlocale are useful:
Sys.setlocale("LC_TIME", "de") # Solaris: details are OS-dependent
Sys.setlocale("LC_TIME", "de_DE.utf8") # Modern Linux etc.
Sys.setlocale("LC_TIME", "de_DE.UTF-8") # ditto
Sys.setlocale("LC_TIME", "de_DE") # OS X, in UTF-8
Sys.setlocale("LC_TIME", "German") # Windows
On Windows you should use names like "English" or "Dutch_Netherlands.1252" to change these settings.
I tried to replicate your state
> Sys.setlocale("LC_ALL","Dutch_Netherlands.1252")
[1] "LC_COLLATE=Dutch_Netherlands.1252;LC_CTYPE=Dutch_Netherlands.1252;LC_MONETARY=Dutch_Netherlands.1252;LC_NUMERIC=C;LC_TIME=Dutch_Netherlands.1252"
> Sys.getlocale()
[1] "LC_COLLATE=Dutch_Netherlands.1252;LC_CTYPE=Dutch_Netherlands.1252;LC_MONETARY=Dutch_Netherlands.1252;LC_NUMERIC=C;LC_TIME=Dutch_Netherlands.1252"
library(XML)
u <- "http://en.wikipedia.org/wiki/Hawaii"
tables <- readHTMLTable(u)
Islands <- tables[[5]]
However I do not get the funny characters in console, in my own locale the ʻ was marked as , but still all functionality remained.
> Islands[1,1]
[1] Hawaiʻi[27]
8 Levels: Hawaiʻi[27] Kahoʻolawe[34] Kauaʻi[30] Lānaʻi[32] Maui[28] ... Oʻahu[29]
And these funny characters can be read easily, and found from the table.
> Encoding(as.character("Hawaiʻi"))
[1] "UTF-8"
> Encoding(as.character(Islands[1,1]))
[1] "UTF-8"
> grep("Hawaiʻi", as.character(Islands[1,1]))
[1] 1
If you still have problems, the cause must lie elsewhere. However, to change the locale under Windows you have to use different names than on Linux or OS X (see your own locale info for an example). On Windows, "Dutch" is probably enough.
