The challenge: The shortest code, by character count, that detects and removes duplicate characters in a String. Removal includes ALL instances of the duplicated character (so if you find 3 n's, all three have to go), and original character order needs to be preserved.
Example Input 1:
Example Output 1:
Example Input 2:
Example Output 2:
(the second example removes letters that occur three times; some solutions have failed to account for this)
(This is based on my other question where I needed the fastest way to do this in C#, but I think it makes good Code Golf across languages.)
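For reference alongside the golfed answers, here is an ungolfed sketch of the required behaviour in Python 3. The function name `remove_duplicated` is made up for illustration and is not from the question. It counts every character once, then keeps only the characters whose count is exactly one, in their original order.

```python
from collections import Counter


def remove_duplicated(s: str) -> str:
    """Drop every character that occurs more than once, keeping order."""
    counts = Counter(s)  # one pass to count occurrences
    # second pass keeps only the characters seen exactly once
    return "".join(c for c in s if counts[c] == 1)


if __name__ == "__main__":
    # The sample string used by several answers in this thread
    print(remove_duplicated("nbHHkRvrXbvkn"))  # prints RrX
```

Most of the short answers use the quadratic "count this character in the whole string" form instead, which costs fewer keystrokes.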

LabVIEW 7.1
ONE character and that is the blue constant '1' in the block diagram.
I swear, the input was copy and paste ;-)

21 characters of perl, 31 to invoke, 36 total keystrokes (counting shift and final return):
perl -pe's/$1//gwhile/(.).*\1/'

Ruby — 61 53 51 56 35
61 chars, the ruler says. (Gives me an idea for another code golf...)
puts ((i=gets.split('')){|c|i.to_s.count(c)<2}).join
... 35 by Nakilon

23 characters:
I'm an APL newbie (learned it yesterday), so be kind -- this is certainly not the most efficient way to do it. I'm ashamed I didn't beat Perl by very much.
Then again, maybe it says something when the most natural way for a newbie to solve this problem in APL was still more concise than any other solution in any language so far.

print filter(lambda c:s.count(c)<2,s)
This is a complete working program, reading from and writing to the console. The one-liner version can be used directly from the command line:
python -c 's=raw_input();print filter(lambda c:s.count(c)<2,s)'

J (16 12 characters)
(~.{~[:I.1=#/.~) 'nbHHkRvrXbvkn'
It only needs the parentheses to be executed tacitly. If put in a verb, the actual code itself would be 14 characters.
There certainly are smarter ways to do this.
EDIT: The smarter way in question:
(~.#~1=#/.~) 'nbHHkRvrXbvkn'
12 characters, only 10 if set in a verb. I still hate the fact that it's going through the list twice, once to count (#/.) and another to return uniques (nub or ~.), but even nubcount, a standard verb in the 'misc' library does it twice.

There's surely shorter ways to do this in Haskell, but:
Prelude Data.List> let h y=[x|x<-y,(<2).length$filter(==x)y]
Prelude Data.List> h "nbHHkRvrXbvkn"
Ignoring the let, since it's only required for function declarations in GHCi, we have h y=[x|x<-y,(<2).length$filter(==x)y], which is 37 characters (this ties the current "core" Python of "".join(c for c in s if s.count(c)<2), and it's virtually the same code anyway).
If you want to make a whole program out of it,
h y=[x|x<-y,(<2).length$filter(==x)y]
main=interact h
$ echo "nbHHkRvrXbvkn" | runghc tmp.hs
$ wc -c tmp.hs
54 tmp.hs
Or we can knock off one character this way:
$ echo "nbHHkRvrXbvkn" | runghc tmp2.hs
$ wc -c tmp2.hs
53 tmp2.hs
It operates on all of stdin, not line-by-line, but that seems acceptable IMO.

C89 (106 characters)
This one uses a completely different method than my original answer. Interestingly, after writing it and then looking at another answer, I saw the methods were very similar. Credits to caf for coming up with this method before me.
On one line, it's 58+48 = 106 bytes.
C89 (173 characters)
This was my original answer. As said in the comments, it doesn't work too well...
On two lines, it's 17+1+78+77 = 173 bytes.

65 Characters:
new String(h.Where(x=>h.IndexOf(x)==h.LastIndexOf(x)).ToArray());
67 Characters with reassignment:
h=new String(h.Where(x=>h.IndexOf(x)==h.LastIndexOf(x)).ToArray());

new string(input.GroupBy(c => c).Where(g => g.Count() == 1).ToArray());
71 characters

PHP (136 characters)
function q($x){return $x<2;}echo implode(array_keys(array_filter(
On one line, it's 5+1+65+65 = 136 bytes. Using PHP 5.3 you could save a few bytes making the function anonymous, but I can't test that now. Perhaps something like:
echo implode(array_keys(array_filter(array_count_values(str_split(
stream_get_contents(STDIN))),function($x){return $x<2;})));
That's 5+1+66+59 = 131 bytes.

another APL solution
As a dynamic function (18 characters)
line assuming that input is in variable x (16 characters):

For Each c In s : s = IIf(s.LastIndexOf(c) <> s.IndexOf(c), s.Replace(CStr(c), Nothing), s) : Next
Granted, VB is not the optimal language to try to save characters, but the line comes out to 98 characters.

61 characters. Where $s="nbHHkRvrXbvkn" and $a is the result.
Fully functioning parameterized script:

C: 83 89 93 99 101 characters
O(n^2) time.
Limited to 999 characters.
Only works in 32-bit mode (because <stdio.h> is not #include-d, which would cost 18 chars, the return type of gets is interpreted as an int, chopping off half of the address bits).
Shows a friendly "warning: this program uses gets(), which is unsafe." on Macs.
main(){char s[999],*c=gets(s);for(;*c;c++)strchr(s,*c)-strrchr(s,*c)||putchar(*c);}
(and this similar 82-chars version takes input via the command line:

Golfscript(sym) - 15
(just knocking a few characters off Mark Rushakoff's effort, I'd rather it was posted as a comment on his)
h y=[x|x<-y,[_]<-[filter(==x)y]]
which is better Haskell idiom but maybe harder to follow for non-Haskellers than this:
h y=[z|x<-y,[z]<-[filter(==x)y]]
Edit to add an explanation for hiena and others:
I'll assume you understand Mark's version, so I'll just cover the change. Mark's expression:
(<2).length $ filter (==x) y
filters y to get the list of elements that == x, finds the length of that list and makes sure it's less than two. (in fact it must be length one, but ==1 is longer than <2 ) My version:
[z] <- [filter(==x)y]
does the same filter, then puts the resulting list into a list as the only element. Now the arrow (meant to look like set inclusion!) says "for every element of the RHS list in turn, call that element [z]". [z] is the list containing the single element z, so the element "filter(==x)y" can only be called "[z]" if it contains exactly one element. Otherwise it gets discarded and is never used as a value of z. So the z's (which are returned on the left of the | in the list comprehension) are exactly the x's that make the filter return a list of length one.
That was my second version, my first version returns x instead of z - because they're the same anyway - and renames z to _ which is the Haskell symbol for "this value isn't going to be used so I'm not going to complicate my code by giving it a name".

Javascript 1.8
s.split('').filter(function (o,i,a) a.filter(function(p) o===p).length <2 ).join('');
or alternately- similar to the python example:
[s[c] for (c in s) if (s.split("").filter(function(p) s[c]===p).length <2)].join('');

123 chars. It might be possible to get it shorter, but this is good enough for me.
proc h {i {r {}}} {foreach c [split $i {}] {if {[llength [split $i $c]]==2} {set r $r$c}}
return $r}
puts [h [gets stdin]]

Full program in C, 141 bytes (counting newlines).

54 chars for the method body only, 66 with (statically typed) method declaration:
def s(s:String)=(""/:s)((a,b)=>if(s.filter(c=>c==b).size>1)a else a+b)

63 chars.
puts (t=gets.split(//)).map{|i|t.count(i)>1?nil:i}.compact.join

96 characters for complete working statement
Dim p=New String((From c In"nbHHkRvrXbvkn"Group c By c Into i=Count Where i=1 Select c).ToArray)
Complete working statement, with original string and the VB-specific "Pretty listing (reformatting of code)" option turned off, at 96 characters; the non-working statement without the original string is 84 characters.
(Please make sure your code works before answering. Thank you.)

(1st version: 112 characters; 2nd version: 107 characters)
/* #include <stdio.h> */
/* int */ k[256], o[100000], p, c;
/* int */ main(/* void */) {
  while((c=getchar()) != -1/*EOF*/) {
    ++k[o[p++] = /*(unsigned char)*/c];
  }
  for(c=0; c<p; c++) {
    if(k[o[c]] == 1) {
      putchar(o[c]);
    }
  }
  /* return 0; */
}
Because getchar() returns int and putchar accepts int, the #include can 'safely' be removed.
Without the include, EOF is not defined, so I used -1 instead (and gained a char).
This program only works as intended for inputs with less than 100000 characters!
Version 2, with thanks to strager
107 characters
#include <stdio.h>
/* global variables are initialized to 0 */
int char_count[256]; /* k in the other layout */
int char_order[999999]; /* o ... */
int char_index; /* p */
int main(int ch_n_loop, char **dummy) /* c */
/* variable with 2 uses */
{
  (void)dummy; /* make warning about unused variable go away */
  while ((ch_n_loop = getchar()) >= 0) /* EOF is, by definition, negative */
  {
    ++char_count[ ( char_order[char_index++] = ch_n_loop ) ];
    /* assignment, and increment, inside the array index */
  }
  /* reuse ch_n_loop */
  for (ch_n_loop = 0; ch_n_loop < char_index; ch_n_loop++) {
    (char_count[char_order[ch_n_loop]] - 1) ? 0 : putchar(char_order[ch_n_loop]);
  }
  return 0;
}

Javascript 1.6
Shorter than the previously posted Javascript 1.8 solution (71 chars vs 85)

Tested with WinXP DOS box (cmd.exe):
xchg cx,bp
mov al,2
rep stosb
inc cl
l0: ; to save a byte, I've encoded the instruction to exit the program into the
; low byte of the offset in the following instruction:
lea si,[di+01c3h]
push si
l1: mov dx,bp
mov ah,6
int 21h
jz l2
mov bl,al
shr byte ptr [di+bx],cl
jz l1
inc si
mov [si],bx
jmp l1
l2: pop si
l3: inc si
mov bl,[si]
cmp bl,bh
je l0+2
cmp [di+bx],cl
jne l3
mov dl,bl
mov ah,2
int 21h
jmp l3
Assembles to 53 bytes. Reads standard input and writes results to standard output, eg:
programname < input > output

118 characters actual code (plus 6 characters for the PHP block tag):

C# (53 Characters)
Where s is your input string:
new string(s.Where(c=>s.Count(h=>h==c)<2).ToArray());
Or 59 with re-assignment:
var a=new string(s.Where(c=>s.Count(h=>h==c)<2).ToArray());

Haskell Pointfree
import Data.List
import Control.Monad
import Control.Arrow
The whole program is 97 characters, but the real meat is just 23 characters. The rest is just imports and bringing the function into the IO monad. In ghci with the modules loaded it's just
(liftM2(\\)nub$ap(\\)nub) "nbHHkRvrXbvkn"
In even more ridiculous pointfree style (pointless style?):
main=interact$liftM2 ap liftM2 ap(\\)nub
It's a bit longer though at 26 chars for the function itself.

Shell/Coreutils, 37 Characters
fold -w1|sort|uniq -u|paste -s -d ''


For a simple starter project I was putting together a 00 to 99 counter on two 7-segment displays, coded as a sketch.
//The line below is the array containing all the binary numbers for the digits on a SSD from 0 to 9
const int number[11] = {0b1000000, 0b1111001, 0b0100100, 0b0110000, 0b0011001, 0b0010010, 0b0000010, 0b1111000, 0b0000000, 0b0010000};
I believe that my solution is either to change this part of the code or add another line, I'm just unsure.
Any advice?
I have tried adding another line to set one of the displays to stop at 6 but it didn't compile with the rest of the code.

Why is this if statement forcing zenity's --auto-close to close immediately?

Got a Debian package set up for a little application I was working on, this isn't really relevant for the question but for some context, the application is a simple bash script that deploys some docker containers on the local machine. But I wanted to add a dependency check to make sure the system had docker before it attempted to do anything. If it doesn't, download it, if it does, ignore it. Figured it be nice to have a little zenity dialog alongside it to show what was going on.
In that process, I check for internet before starting for obvious reasons and for some reason, the way I check if there is internet if zenity has the --auto-close flag, will instantly close the entire progress block.
Here is a little dummy example, that if statement is a straight copy-paste from my code, everything else is filler. :
if [[ $condition ]]; then
  (
    echo "0"
    # Check for internet
    if ping -c 3 -W 3; then
      echo "# Internet detected, starting updates..."; sleep 1
      echo "10"
    else
      err_msg="# No internet detected. You may be missing some dependencies.
Services may not function as expected until they are installed."
      echo $err_msg
      zenity --error --text="$err_msg"
      echo "100"
      exit 1
    fi
    echo "15"
    echo "# Downloading a thing" ; sleep 1
    echo "50"
    if zenity --question --text="Do you want to download a special thing?"; then
      echo "# Downloading special thing" ; sleep 1
    else
      echo "# Not downloading special thing" ; sleep 1
    fi
    echo "75"
    echo "# downloading big thing" ; sleep 3
    echo "90"
    echo "# Downloading last thing" ; sleep 1
    echo "100"
  ) |
  zenity --progress --title="Dependency Management" --text="downloading dependencies, please wait..." \
    --percentage=0 --auto-close
fi
So I'm really just wondering why this is making zenity freak out. If you comment out that if statement, everything works as you expect and the zenity progress screen closes once it hits 100.
If you keep the if statement but remove the auto-close flag, it will execute as expected. It's like it's initializing at 100 and then going to 0 to progress normally. But if that were the case, --auto-close would never work, yet in the little example they give you in the help section, it works just fine.
Thank you for a fun puzzle! Spoiler is at the end, but I thought it might be helpful to look over my shoulder while I poked at the problem. 😀️ If you're more interested in the answer than the journey, feel free to scroll. I'll never know, anyway.
Following my own advice (see 1st comment beneath the question), I set out to create a small, self-contained, complete example. But, as they say in tech support: Before you can debug the problem, you need to debug the customer. (No offense; I'm a terrible witness myself unless I know ahead of time that someone's going to need to reproduce a problem I've found.)
I interpreted your comment about checking for Internet to mean "it worked before I added the ping and failed afterward," so the most sensible course of action seemed to be commenting out that part of the code... and then it worked! So what happens differently when the ping is added?
Changes in timing wouldn't make sense, so the problem must be that ping generates output that gets piped to zenity. So I changed the command to redirect its output to the bit bucket:
ping -c 3 -W 3 &>/dev/null;
...and that worked, too! Interesting!
I explored what turned out to be a few ratholes:
I ran ping from the command line and piped its output through od -xa to check for weird control characters, but nope.
Instead of enclosing the contents of the if block in parentheses (()), which executes the commands in a sub-shell, I tried braces ({}) to execute them in the same shell. Nope, again.
I tried a bunch of other embarrassingly useless and time-consuming ideas. Nope, nope, and nope.
Then I realized I could just do
ping -c 3 -W 3 | zenity --progress --auto-close
directly from the command line. That failed with the --auto-close flag but worked normally without it. Boy, did that simplify things! That's about as "smallest" as you can get. But it's not, actually: I used up all of my remaining intelligence points for the day by redirecting the output from ping into a file, so I could just
(cat output; sleep 1) | zenity --progress --auto-close
and not keep poking at the poor server until I finally figured this thing out. (The sleep gave me enough time to see the pop-up when it worked, because zenity exits when the pipe closes at the end of the input.) So, what's in that output file?
PING ( 56(84) bytes of data.
64 bytes from ( icmp_seq=1 ttl=59 time=18.5 ms
64 bytes from ( icmp_seq=2 ttl=59 time=21.8 ms
64 bytes from ( icmp_seq=3 ttl=59 time=21.4 ms
--- ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 18.537/20.572/21.799/1.449 ms
The magic zenity-killer must be in there somewhere! All that was left (ha, "all"!) was to make my "smallest" example even smaller by deleting pieces of the file until it stopped breaking. Then I'd put back whatever I'd deleted last, and I deleted something else, da capo, ad nauseam, or at least ad minimus. (Or whatever; I don't speak Latin.) Eventually the file dwindled to
64 bytes from ( icmp_seq=1 ttl=59 time=18.5 ms
and I started deleting stuff from the beginning. Eventually I found that it would break regardless of the length of the line, as long as it started with a number that wasn't 0 and had at least 3 digits somewhere within it. Huh. It'd also break if it did start with a 0 and had at least 4 digits within... unless the second digit was also 0! What's more, a period would make it even weirder: none of the digits anywhere after the period would make it break, no matter what they were.
And then, then came the ah-ha! moment. The zenity documentation says:
Zenity reads data from standard input line by line. If a line is
prefixed with #, the text is updated with the text on that line. If a
line contains only a number, the percentage is updated with that number.
Wow, really? It can't be that ridiculous, can it?
I found the source for zenity, downloaded it, extracted it (with tar -xf zenity-3.42.1.tar.xz), opened progress.c, and found the function that checks to see if "a line contains only a number." The function is called only if the first character in the line is a number.
static float
stof(const char* s) {
  float rez = 0, fact = 1;
  if (*s == '-') {
    s++;
    fact = -1;
  }
  for (int point_seen = 0; *s; s++) {
    if (*s == '.' || *s == ',') {
      point_seen = 1;
      continue;
    }
    int d = *s - '0';
    if (d >= 0 && d <= 9) {
      if (point_seen) fact /= 10.0f;
      rez = rez * 10.0f + (float)d;
    }
  }
  return rez * fact;
}
Do you see it yet? Here, I'll give you a sscce, with comments:
// Clear the "found a decimal point" flag and iterate
// through the input in `s`.
for (int point_seen = 0; *s; s++) {
  // If the next char is a decimal point (or a comma,
  // for Europeans), set the "found it" flag and check
  // the next character.
  if (*s == '.' || *s == ',') {
    point_seen = 1;
    continue;
  }
  // Sneaky C trick that converts a numeric character
  // to its integer value. Ex: char '1' becomes int 1.
  int d = *s - '0';
  // We only care if it's actually a digit; skip anything else.
  if (d >= 0 && d <= 9) {
    // If we saw a decimal point, we're looking at tenths,
    // hundredths, thousandths, etc., so we'll need to adjust
    // the final result. (Note from the peanut gallery: this is
    // just ridiculous. A progress bar doesn't need to be this
    // accurate. Just quit at the first decimal point instead
    // of trying to be "clever.")
    if (point_seen) fact /= 10.0f;
    // Tack the new digit onto the end of the "rez"ult.
    // Ex: if rez = 12.0 and d = 5, this is 12.0 * 10.0 + 5.0 = 125.0.
    rez = rez * 10.0f + (float)d;
  }
}
// We've scanned the entire line, so adjust the result to account
// for the decimal point and return the number.
return rez * fact;
Now do you see it?
The author decides "[i]f a line contains only a number" by checking (only!) that the first character is a number. If it is, then it plucks out all the digits (and the first decimal, if there is one), mashes them all together, and returns whatever it found, ignoring anything else it may have seen.
So of course it failed if there were 3 digits and the first wasn't 0, or if there were 4 digits and the first 2 weren't 0... because a 3-digit number is always at least 100, and zenity will --auto-close as soon as the progress is 100 or higher.
The ping statement generates output that confuses zenity into thinking the progress has reached 100%, so it closes the dialog.
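To make that concrete, here is a rough Python transliteration of the stof() routine quoted above. It is my own port for illustration, not zenity code, and Python floats are doubles while the C code uses float, so exact values may differ slightly. It shows that a ping reply line parses to a number far above 100, while a genuine progress line such as "50" parses as intended.

```python
def stof(s: str) -> float:
    """Illustrative port of the C stof(): collects every digit in the line."""
    rez, fact = 0.0, 1.0
    if s.startswith("-"):
        s = s[1:]
        fact = -1.0
    point_seen = False
    for ch in s:
        if ch in ".,":
            point_seen = True
            continue
        if "0" <= ch <= "9":
            if point_seen:
                fact /= 10.0  # one more decimal place
            rez = rez * 10.0 + float(ord(ch) - ord("0"))
        # any other character is silently skipped, which is the bug
    return rez * fact


if __name__ == "__main__":
    ping_line = "64 bytes from ( icmp_seq=1 ttl=59 time=18.5 ms"
    print(stof("50"))              # 50.0
    print(stof(ping_line) >= 100)  # True: digits 6,4,1,5,9,1,8,5 get mashed together
```

Redirecting ping's output away from the pipe, as in the &>/dev/null version earlier, keeps such lines from ever reaching zenity.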
By the way, congratulations: you found one of the rookiest kinds of rookie mistakes a programmer can make... and it's not your bug! For whatever reason, the author of zenity decided to roll their own function to convert a line of text to a floating-point number, and it doesn't do at all what the doc says, or what any normal person would expect it to do. (Protip: libraries will do this for you, and they'll actually work most of the time.)
You can score a bunch of karma points if you can figure out how to report the bug, and you'll get a bonus if you submit your report in the form of a fix. 😀️

What is the meaning of numbers in inline assembly

Does anyone know what the following code does?
I'm not sure what the 1, 2, 3 refer to or how they are used here. :-(
asm volatile("2: wrmsr ; xor %[err],%[err]\n"
             "1:\n\t"
             ".section .fixup,\"ax\"\n\t"
             "3: mov %[fault],%[err] ; jmp 1b\n\t"
             ".previous\n\t"
             _ASM_EXTABLE(2b, 3b)
             : [err] "=a" (err)
             : "c" (msr), "0" (low), "d" (high),
               [fault] "i" (-EIO)
             : "memory");
return err;
The code is from Linux:
I would really appreciate it if anyone could give me some keywords to google.
Thank you very much in advance!
Those are local labels (numbers followed by a colon).
When they are later referenced, the b (as in jmp 1b) means to refer to the nearest local label of that number going backwards. An f would look for a matching local label later (forwards) in the code.
That code declares an exception table entry. When an exception occurs while executing the wrmsr instruction, the fault handler (usually in arch/<your_CPU_arch>/mm/fault.c) searches the exception table for the corresponding entry and jumps there.
As you can see, the entry for that exception moves -EIO into err and jumps back to label 1, the instruction following the xor (which clears err when there was no error).

Getting GCC to optimize hand assembly

In an attempt to make GCC not generate a load-modify-store operation every time I do |= or &=, I have defined the following macros:
#define bset(base, offset, mask) bmanip(set, base, offset, mask)
#define bclr(base, offset, mask) bmanip(clr, base, offset, mask)
#define bmanip(op, base, offset, mask) \
asm("ldx " #base);\
asm("b" #op " " #offset ",x " #mask);\
And they work great; the disassembled binary is perfect.
The problem comes when I use more than one in sequence:
inline void spi_init()
{
    bset(_io_ports, M6811_DDRD, 0x38);
    bset(_io_ports, M6811_PORTD, 0x20);
    bset(_io_ports, M6811_SPCR, (M6811_SPE | M6811_DWOM | M6811_MSTR));
}
This results in:
00002227 <spi_init>:
2227: 3c pshx
2228: fe 10 00 ldx 0x1000 <_io_ports>
222b: 1c 09 38 bset 0x9,x, #0x38
222e: 38 pulx
222f: 3c pshx
2230: fe 10 00 ldx 0x1000 <_io_ports>
2233: 1c 08 20 bset 0x8,x, #0x20
2236: 38 pulx
2237: 3c pshx
2238: fe 10 00 ldx 0x1000 <_io_ports>
223b: 1c 28 70 bset 0x28,x, #0x70
223e: 38 pulx
223f: 39 rts
Is there any way to get GCC (3.3.6-m68hc1x-20060122) to automatically optimize out the redundant stack operations?
gcc will always emit the assembly instructions you tell it to emit. So instead of explicitly writing code to load registers with the value you want to manipulate, you instead want to tell gcc to do this on your behalf. You can do this with register constraints.
Unfortunately the 6811 code generator doesn't seem to be a standard part of gcc --- I don't spot the documentation in the manual. So I can't point you at platform-specific bit of the docs. But the generic bit you need to read is here:
The syntax is freaky, but the summary is:
asm("instructions" : outputs : inputs);
...where inputs and outputs are lists of constraints, which tell gcc what value to put where. The classic example is:
asm("fsinx %1,%0" : "=f" (result) : "f" (angle));
f indicates that the named value needs to go into a floating point register; = indicates it's an output; then the names of the registers are substituted into the instruction.
So, you'll probably want something like this:
asm("b" #op " " #offset ",%0 " #mask : "=Z" (i) : "0" (i));
...where i is a variable containing the value you want to modify. Z you'll need to look up in the 6811 gcc docs --- it's a constraint which represents a register which is valid for the asm instruction which is being generated. The 0 indicates that the input shares a register with output 0, and is used for read/write values.
Because you've told gcc what register you want i to be, it can integrate this knowledge into its register allocator and find the least-cost way to get i where you need it with the least amount of code. (Sometimes no additional code.)
gcc inline assembly is deeply contorted and weird, but pretty powerful. It's worth spending some time to thoroughly understand the constraint system to get the best use out of it.
(Incidentally, I don't know 6811 code, but have you forgotten to put the result of the op somewhere? I'd expect to see an stx to match the ldx.)
Update: Oh, I see what bset is doing now --- it's writing the result back to a memory location, right? That's still doable but it's a bit more painful. You need to tell gcc that you're modifying that memory location, so that it knows not to rely on any cached value. You'll need to have an output parameter with constraint m which represents that location. Check the docs.

reading in a text file with a SUB (1a) (Control-Z) character in R on Windows

Following on from my query last week reading badly formed csv in R - mismatched quotes, these same CSV files also have embedded control characters such as the ASCII Substitute Character which is decimal 26 or 0x1A. Unfortunately readLines() seems to truncate the line at this character, so I am having difficulty in matching quotes - apart from losing the later fields in these lines!
I have tried to readBin() but I can't get it to read this file. I'm afraid I can't cleanly read this into R to give you an example and I'm having difficulty in creating these in R. Sorry not to be able to demonstrate with a clean example. Thoughts?
Now I'm confused - when I use the code
k1 <- 72:75  # ASCII codes for "HIJK", matching the output shown below
h3 <- paste('1,34,44.4,"', rawToChar(as.raw(c(as.integer(k1), 26, 65))), '",99')
identical(readLines(textConnection(h3)), h3)
I get TRUE which I find quite surprising!
Update 2
[1] "1,34,44.4,\" HIJK\032A \",99"
> writeLines(h3, 'h3.txt')
> h3a <- readLines('h3.txt')
Warning message:
In readLines("h3.txt") : incomplete final line found on 'h3.txt'
> h3a
[1] "1,34,44.4,\" HIJK"
So readLines() reacts differently when coming from a textConnection() and it silently truncates at the SUB character.
I would be surprised if it makes a difference but I'm on 2.15.2 on Windows-64.
Update 3
Some vague success in solving this...
zb <- file('h3.txt', "rb")
tmp <- readBin(zb, raw(), size=1, n=400) # raw is always of size =1
# [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
# [1] 31 2c 33 34 2c 34 34 2e 34 2c 22 20 48 49 4a 4b 1a 41 20 22 2c 39 39 0d 0a
# [1] "1,34,44.4,\" HIJK\032A \",99\r\n"
i.e. if I read in the file as binary and convert to character() afterwards it seems to work... this will be tedious for large CSV files...
Could there be a bug in R in incorrectly detecting a Control-Z as end of file on windows??
I think I've figured out a solution - because there appears to be a problem reading a Control-Z in the middle of a file on Windows, we need to read the file in binary / raw mode.
fnam <- 'h3.txt'
tmp.bin <- readBin(fnam, raw(), size=1, n=max(2*file.info(fnam)$size, 100))
tmp.char <- rawToChar(tmp.bin)
txt <- unlist(strsplit(tmp.char, '\r\n', fixed=TRUE))
[1] "1,34,44.4,\" HIJK\032A \",99"
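The same read-as-bytes idea, sketched in Python rather than R purely for illustration. The helper name `read_lines_binary` is hypothetical, the sample bytes mirror the h3.txt example above, and none of this is part of the original R solution. The file is opened in binary mode so that no byte is treated as an end-of-file marker, and the text is then decoded and split on CRLF.

```python
import os
import tempfile


def read_lines_binary(path: str) -> list:
    """Read a CRLF-terminated text file as raw bytes, then split into lines."""
    with open(path, "rb") as f:  # binary mode: 0x1A is just another byte
        data = f.read()
    text = data.decode("latin-1")  # one byte maps to one character
    lines = text.split("\r\n")
    if lines and lines[-1] == "":  # drop the empty piece after the final CRLF
        lines.pop()
    return lines


if __name__ == "__main__":
    sample = b'1,34,44.4," HIJK\x1aA ",99\r\n'
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(sample)
    try:
        print(read_lines_binary(tmp.name))
    finally:
        os.remove(tmp.name)
```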
The following better answer was posted by Duncan Murdoch to R-Devel. Converting it into a function I get:
sReadLines <- function(fnam) {
  f <- file(fnam, "rb")   # binary mode, so Control-Z is not treated as end of file
  res <- readLines(f)
  close(f)
  res
}
I also ran into this problem when I used read.csv with a csv file that contained the SUB or CTRL-Z in the middle of the file.
Solved it with the readr package (if your file is comma separated)
If you have a ; as a separator, then use:
