How to use VowpalWabbit python framework learn for multi line example? - vowpalwabbit

from vowpalwabbit import pyvw
vw = pyvw.vw("--cb 3 --epsilon 0.2 --quiet")
input = "2:-5:0.2 | Anna"
vw.learn(input)
input = "3:-20:0.2 | Anna \n 2:-20:0.2 | Anna \n 1:-20:0.2 | Anna"
vw.learn([vw.example(string) for string in input.split('\n')])
print(vw.predict(" | Anna"))
This piece of code throws an error:
RuntimeError Traceback (most recent call last)
<ipython-input-7-e8693ac0708c> in <module>()
4 vw.learn(input)
5 input = "3:-20:0.2 | Anna \n 2:-20:0.2 | Anna \n 1:-20:0.2 | Anna"
----> 6 vw.learn([vw.example(string) for string in input.split('\n')])
7
8 vw.learn(input)
/usr/local/lib/python3.6/dist-packages/vowpalwabbit/pyvw.py in learn(self, ec)
168 pylibvw.vw.learn(self, ec)
169 elif isinstance(ec, list):
--> 170 pylibvw.vw.learn_multi(self,ec)
171 else:
172 raise TypeError('expecting string or example object as ec argument for learn, got %s' % type(ec))
RuntimeError: This reduction does not support multi-line example.
Why am I getting this error? What is the correct syntax for learning from multi-line example?

The issue is that the reduction you are using, --cb, is a single-line reduction, so passing it a multi-line example does not make sense. The error message says exactly this:
RuntimeError: This reduction does not support multi-line example.
You can read more about --cb here: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Logged-Contextual-Bandit-Example
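Since --cb works on one example per line, one way to feed the second batch from the question is to call learn once per line instead of passing a list. This is a minimal sketch, assuming each line is a complete single-line --cb example. The helper name learn_each_line is made up for illustration, and the learner is passed in as a plain callable (vw.learn in the question's setup) so the helper itself does not depend on VowpalWabbit:

```python
def learn_each_line(learn, text):
    """Call `learn` once for every non-empty line of `text`.

    `learn` is any callable that takes one example string, for instance
    `vw.learn` from the question. Returns the examples that were passed on.
    """
    examples = [line.strip() for line in text.split("\n") if line.strip()]
    for example in examples:
        learn(example)
    return examples


batch = "3:-20:0.2 | Anna \n 2:-20:0.2 | Anna \n 1:-20:0.2 | Anna"
# Stand-in learner that only prints each example. With VowpalWabbit
# installed you would pass vw.learn here instead (assumed usage).
learn_each_line(print, batch)
```

Whether three separately logged examples match what you intended to model is a separate question; the sketch only shows how to avoid the multi-line call.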

Related

Could ruamel.yaml support type descriptor like "num: !!float 4"?

I am learning to use ruamel.yaml, and I am wondering whether it supports type descriptors from the YAML specification, such as "num: !!float 4".
The file is like:
num: !!float 4
I tried to load a file like this, but got an error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [22], in <cell line: 2>()
1 from ruamel import yaml
2 with open("net.yaml", "r", encoding="utf-8") as yaml_file:
----> 3 yaml_dict = yaml.round_trip_load(yaml_file)
4 yaml_dict
...
File ~/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/ruamel/yaml/constructor.py:1469, in RoundTripConstructor.construct_mapping(self, node, maptyp, deep)
1462 if not isinstance(key, Hashable):
1463 raise ConstructorError(
1464 'while constructing a mapping',
1465 node.start_mark,
1466 'found unhashable key',
1467 key_node.start_mark,
1468 )
-> 1469 value = self.construct_object(value_node, deep=deep)
1470 if self.check_mapping_key(node, key_node, maptyp, key, value):
1471 if key_node.comment and len(key_node.comment) > 4 and key_node.comment[4]:
File ~/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/ruamel/yaml/constructor.py:146, in BaseConstructor.construct_object(self, node, deep)
142 # raise ConstructorError(
143 # None, None, 'found unconstructable recursive node', node.start_mark
144 # )
145 self.recursive_objects[node] = None
--> 146 data = self.construct_non_recursive_object(node)
148 self.constructed_objects[node] = data
149 del self.recursive_objects[node]
File ~/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/ruamel/yaml/constructor.py:181, in BaseConstructor.construct_non_recursive_object(self, node, tag)
179 constructor = self.__class__.construct_mapping
180 if tag_suffix is None:
--> 181 data = constructor(self, node)
182 else:
183 data = constructor(self, tag_suffix, node)
File ~/software/python/anaconda/anaconda3/envs/conda-general/lib/python3.10/site-packages/ruamel/yaml/constructor.py:1271, in RoundTripConstructor.construct_yaml_float(self, node)
1259 return ScalarFloat(
1260 sign * float(value_s),
1261 width=width,
(...)
1268 anchor=node.anchor,
1269 )
1270 width = len(value_so)
-> 1271 prec = value_so.index('.') # you can use index, this would not be float without dot
1272 lead0 = leading_zeros(value_so)
1273 return ScalarFloat(
1274 sign * float(value_s),
1275 width=width,
(...)
1279 anchor=node.anchor,
1280 )
ValueError: substring not found
Why do I get this error, and how do I get rid of it?
That is a bug in ruamel.yaml<=0.17.21. The comment on the offending line (1271) says
# you can use index, this would not be float without dot
Obviously the author of that comment didn't know what he was talking about, as in your case, when using !!float 4 you have a float without a dot...
It is trivial to "fix" that by replacing index with find in line 1271. With that change your document loads and you can dump the data.
But the corresponding representer for dumping doesn't cope with that: it outputs the float as 4.0, dropping the tag.
You could temporarily fix this by registering a simpler float constructor (e.g. the simple one from the SafeLoader), although this will affect all floats:
import sys
import ruamel.yaml
yaml_str = """\
num: !!float 4
"""
yaml = ruamel.yaml.YAML()
yaml.constructor.add_constructor(
'tag:yaml.org,2002:float', ruamel.yaml.constructor.SafeConstructor.construct_yaml_float
)
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
which gives:
num: 4.0
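The difference between index and find that the answer relies on can be seen on a plain string, without ruamel.yaml. A small sketch, where value stands in for the scalar text of !!float 4:

```python
value = "4"  # scalar text of `!!float 4`, which contains no dot

# str.find reports a missing substring by returning -1 ...
print(value.find("."))  # -1

# ... while str.index raises ValueError for the same input,
# which is the exception type seen in the traceback above.
try:
    value.index(".")
except ValueError as exc:
    print("index raised:", exc)
```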

Missing ',' in line when Biopython reads a nexus tree

I want to edit a tree that I got from BEAST2 treeannotator in nexus-format.
Usually I use the Phylo module from Biopython for such work, but Phylo.read(r"filename.tree", "nexus") raised the following exception:
---------------------------------------------------------------------------
NexusError Traceback (most recent call last)
Input In [29], in <cell line: 1>()
----> 1 Phylo.read(r"filename.tree", "nexus")
File ~\miniconda3\lib\site-packages\Bio\Phylo\_io.py:60, in read(file, format, **kwargs)
58 try:
59 tree_gen = parse(file, format, **kwargs)
---> 60 tree = next(tree_gen)
61 except StopIteration:
62 raise ValueError("There are no trees in this file.") from None
File ~\miniconda3\lib\site-packages\Bio\Phylo\_io.py:49, in parse(file, format, **kwargs)
34 """Parse a file iteratively, and yield each of the trees it contains.
35
36 If a file only contains one tree, this still returns an iterable object that
(...)
46
47 """
48 with File.as_handle(file) as fp:
---> 49 yield from getattr(supported_formats[format], "parse")(fp, **kwargs)
File ~\miniconda3\lib\site-packages\Bio\Phylo\NexusIO.py:40, in parse(handle)
32 def parse(handle):
33 """Parse the trees in a Nexus file.
34
35 Uses the old Nexus.Trees parser to extract the trees, converts them back to
(...)
38 eventually change Nexus to use the new NewickIO parser directly.)
39 """
---> 40 nex = Nexus.Nexus(handle)
42 # NB: Once Nexus.Trees is modified to use Tree.Newick objects, do this:
43 # return iter(nex.trees)
44 # Until then, convert the Nexus.Trees.Tree object hierarchy:
45 def node2clade(nxtree, node):
File ~\miniconda3\lib\site-packages\Bio\Nexus\Nexus.py:668, in Nexus.__init__(self, input)
665 self.options["gapmode"] = "missing"
667 if input:
--> 668 self.read(input)
669 else:
670 self.read(DEFAULTNEXUS)
File ~\miniconda3\lib\site-packages\Bio\Nexus\Nexus.py:718, in Nexus.read(self, input)
716 break
717 if title in KNOWN_NEXUS_BLOCKS:
--> 718 self._parse_nexus_block(title, contents)
719 else:
720 self._unknown_nexus_block(title, contents)
File ~\miniconda3\lib\site-packages\Bio\Nexus\Nexus.py:759, in Nexus._parse_nexus_block(self, title, contents)
757 for line in block.commandlines:
758 try:
--> 759 getattr(self, "_" + line.command)(line.options)
760 except AttributeError:
761 raise NexusError("Unknown command: %s " % line.command) from None
File ~\miniconda3\lib\site-packages\Bio\Nexus\Nexus.py:1144, in Nexus._translate(self, options)
1142 break
1143 elif c != ",":
-> 1144 raise NexusError("Missing ',' in line %s." % options)
1145 except NexusError:
1146 raise
NexusError: Missing ',' in line 1 AB298157.1_2015_-7.9133750332192605_114.8086828279248, 2 AB298158.1_2007_-8.41698974207…
Using Nexus.read(Nexus(), input=r"filename.tree") gave the same result. Could anyone help with this? I cannot understand the reason for this error because the nexus file looks correct.
The reason is that Biopython cannot read nexus trees that use links, that is, a translate table plus a newick tree that refers to the taxa through it. The file first has to be converted so that the full names appear in the tree itself (as below).
Begin
tree TREE1 = (((your,tree),(in,(the, newick))),format);
End;
P.S. The newick format allows labels to be surrounded with quotes, and some programs or scripts add them to names that contain ambiguous characters. This can lead to exceptions in later phylogenetic analysis, for instance in BEAST, so be careful with it.
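The conversion described above can be sketched with a regular expression. This is an illustrative helper, not a Biopython function: the name expand_translated_labels and the sample mapping are made up, and it assumes the tree string uses plain integer IDs for leaves and has no bracketed annotations that contain bare integers between commas.

```python
import re

# A leaf ID directly follows '(' or ',' and is followed by ':', ',', ')' or '['.
LEAF_ID = re.compile(r"(?<=[(,])(\d+)(?=[:,)\[])")


def expand_translated_labels(newick, translate):
    """Replace numeric taxon IDs in a newick string with their full names.

    `translate` maps ID strings to names; unknown IDs are left unchanged.
    """
    return LEAF_ID.sub(lambda m: translate.get(m.group(1), m.group(1)), newick)


translate = {"1": "taxon_A", "2": "taxon_B", "3": "taxon_C"}
tree = "((1:0.1,2:0.2):0.05,3:0.3);"
print(expand_translated_labels(tree, translate))
# ((taxon_A:0.1,taxon_B:0.2):0.05,taxon_C:0.3);
```

Branch lengths are left alone because they follow a colon rather than an opening parenthesis or comma.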

Problem using spacy tokenizer for count vectorizer

I'm trying to do sentiment analysis on Amazon product reviews using the Spacy module for preprocessing the text data. The code I'm using is exactly this. I modified the dataset that I'm using according to what's shown in the link. I'm getting the error:
TypeError Traceback (most recent call last)
<ipython-input-139-bcbf2d3c9cce> in <module>
4 ('classifier', classifier)])
5 # Fit our data
----> 6 pipe_countvect.fit(X_train,y_train)
7 # Predicting with a test dataset
8 sample_prediction = pipe_countvect.predict(X_test)
~\.conda\envs\py36\lib\site-packages\sklearn\pipeline.py in fit(self, X, y, **fit_params)
328 """
329 fit_params_steps = self._check_fit_params(**fit_params)
--> 330 Xt = self._fit(X, y, **fit_params_steps)
331 with _print_elapsed_time('Pipeline',
332 self._log_message(len(self.steps) - 1)):
~\.conda\envs\py36\lib\site-packages\sklearn\pipeline.py in _fit(self, X, y, **fit_params_steps)
294 message_clsname='Pipeline',
295 message=self._log_message(step_idx),
--> 296 **fit_params_steps[name])
297 # Replace the transformer of the step with the fitted
298 # transformer. This is necessary when loading the transformer
~\.conda\envs\py36\lib\site-packages\joblib\memory.py in __call__(self, *args, **kwargs)
350
351 def __call__(self, *args, **kwargs):
--> 352 return self.func(*args, **kwargs)
353
354 def call_and_shelve(self, *args, **kwargs):
~\.conda\envs\py36\lib\site-packages\sklearn\pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
738 with _print_elapsed_time(message_clsname, message):
739 if hasattr(transformer, 'fit_transform'):
--> 740 res = transformer.fit_transform(X, y, **fit_params)
741 else:
742 res = transformer.fit(X, y, **fit_params).transform(X)
~\.conda\envs\py36\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
1197
1198 vocabulary, X = self._count_vocab(raw_documents,
-> 1199 self.fixed_vocabulary_)
1200
1201 if self.binary:
~\.conda\envs\py36\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
1108 for doc in raw_documents:
1109 feature_counter = {}
-> 1110 for feature in analyze(doc):
1111 try:
1112 feature_idx = vocabulary[feature]
~\.conda\envs\py36\lib\site-packages\sklearn\feature_extraction\text.py in _analyze(doc, analyzer, tokenizer, ngrams, preprocessor, decoder, stop_words)
104 doc = preprocessor(doc)
105 if tokenizer is not None:
--> 106 doc = tokenizer(doc)
107 if ngrams is not None:
108 if stop_words is not None:
TypeError: 'str' object is not callable
I'm not sure what's causing this error and how to get rid of it. I'm pretty sure the count vectorizer produces a sparse matrix and not a string one. One thing that I've considered is that I'm using the spacy tokenizer, which was used in the link as vectorizer = CountVectorizer(tokenizer = spacy_tokenizer, ngram_range=(1,1)) but when I ran the program it was saying that spacy_tokenizer was undefined. So I used vectorizer = CountVectorizer(tokenizer = 'spacy', ngram_range=(1,1)) instead. But if I remove this then I don't know how to use the spacy tokenizer, and either way I am not certain that this was indeed the cause of the problem. Please help me out!
The error comes at this line:
doc = tokenizer(doc)
Since it says 'str' is not callable and the only thing being called here is the tokenizer object, it looks like your tokenizer is a string for some reason.
Based on the code you linked it looks like the spacy_tokenizer object is being configured incorrectly. But that variable isn't defined anywhere in the code despite being passed as an option, so the code you linked to looks like it can't possibly run.
It would help if you could make a minimal example that you could actually paste in the question here.
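The point about the tokenizer being a string can be checked directly. In this sketch whitespace_tokenizer is a hypothetical stand-in, not the spacy_tokenizer from the linked code; it only shows the shape of object that can be called on a document:

```python
def whitespace_tokenizer(text):
    """Hypothetical stand-in tokenizer: lower-case and split on whitespace."""
    return text.lower().split()


# A string cannot be called, a function can.
for candidate in ("spacy", whitespace_tokenizer):
    print(repr(candidate), "callable:", callable(candidate))

print(whitespace_tokenizer("Great product works well"))
# ['great', 'product', 'works', 'well']
```

A function of this kind, taking a string and returning a list of tokens, is what the tokenizer argument of CountVectorizer expects; passing 'spacy' hands it a string instead.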

JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0) ---While Tuning gpt2.finetune

Hope you are all doing well.
I am fine-tuning a GPT-2 model to generate a title based on content. I created a simple CSV file containing only the titles to train the model, but when I pass this file to GPT-2 for fine-tuning I get the following error:
JSONDecodeError Traceback (most recent call last)
in ()
10 steps=1000,
11 save_every=200,
---> 12 sample_every=25) # steps is max number of training steps
13
14 # gpt2.generate(sess)
3 frames
/usr/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
336 if s.startswith('\ufeff'):
337 s = s.encode('utf8')[3:].decode('utf8')
--> 338 # raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
339 # s, 0)
340 else:
JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
Below is my code for the above :
import gpt_2_simple as gpt2
model_name = "120M" # "355M" for larger model (it's 1.4 GB)
gpt2.download_gpt2(model_name=model_name) # model is saved into current directory under /models/117M/
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
'titles.csv',
model_name=model_name,
steps=1000,
save_every=200,
sample_every=25) # steps is max number of training steps
I have tried all the basic ways of handling a UTF-8 BOM without any luck, so I am asking for your help.
Try changing the model name: you passed 120M, but the GPT-2 model is called 124M.
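For reference, this is what the error message itself is describing, independent of whether the model name is the actual cause here. A minimal sketch with a made-up payload, showing that JSON text decoded with utf-8 keeps the byte order mark while utf-8-sig drops it:

```python
import json

# A small JSON payload saved with a UTF-8 byte order mark in front.
payload = b"\xef\xbb\xbf" + b'{"title": "example"}'


def parse_json_bytes(data):
    """Decode with utf-8-sig, which drops a leading BOM if there is one."""
    return json.loads(data.decode("utf-8-sig"))


try:
    # Decoding as plain utf-8 leaves the BOM in the string as '\ufeff'.
    json.loads(payload.decode("utf-8"))
except json.JSONDecodeError as exc:
    print("utf-8:", exc.msg)

print("utf-8-sig:", parse_json_bytes(payload))
```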

Code Golf: Duplicate Character Removal in String

The challenge: The shortest code, by character count, that detects and removes duplicate characters in a String. Removal includes ALL instances of the duplicated character (so if you find 3 n's, all three have to go), and original character order needs to be preserved.
Example Input 1:
nbHHkRvrXbvkn
Example Output 1:
RrX
Example Input 2:
nbHHkRbvnrXbvkn
Example Output 2:
RrX
(the second example removes letters that occur three times; some solutions have failed to account for this)
(This is based on my other question where I needed the fastest way to do this in C#, but I think it makes good Code Golf across languages.)
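For checking golfed answers against the two examples, an ungolfed reference version is handy. This is a sketch in Python 3 and not one of the submitted answers; the function name is made up:

```python
from collections import Counter


def remove_duplicated_chars(text):
    """Keep only characters that occur exactly once, in original order."""
    counts = Counter(text)
    return "".join(ch for ch in text if counts[ch] == 1)


print(remove_duplicated_chars("nbHHkRvrXbvkn"))    # RrX
print(remove_duplicated_chars("nbHHkRbvnrXbvkn"))  # RrX
```

Counting once up front keeps it linear in the input length, unlike the count-per-character approach most of the short answers use.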
LabVIEW 7.1
ONE character and that is the blue constant '1' in the block diagram.
I swear, the input was copy and paste ;-)
http://i25.tinypic.com/hvc4mp.png
http://i26.tinypic.com/5pnas.png
Perl
21 characters of perl, 31 to invoke, 36 total keystrokes (counting shift and final return):
perl -pe's/$1//gwhile/(.).*\1/'
Ruby — 61 53 51 56 35
61 chars, the ruler says. (Gives me an idea for another code golf...)
puts ((i=gets.split(''))-i.select{|c|i.to_s.count(c)<2}).join
+-------------------------------------------------------------------------+
|| | | | | | | | | | | | | | | |
|0 10 20 30 40 50 60 70 |
| |
+-------------------------------------------------------------------------+
gets.chars{|c|$><<c[$_.count(c)-1]}
... 35 by Nakilon
APL
23 characters:
(((1+ρx)-(ϕx)ιx)=xιx)/x
I'm an APL newbie (learned it yesterday), so be kind -- this is certainly not the most efficient way to do it. I'm ashamed I didn't beat Perl by very much.
Then again, maybe it says something when the most natural way for a newbie to solve this problem in APL was still more concise than any other solution in any language so far.
Python:
s=raw_input()
print filter(lambda c:s.count(c)<2,s)
This is a complete working program, reading from and writing to the console. The one-liner version can be directly used from the command line
python -c 's=raw_input();print filter(lambda c:s.count(c)<2,s)'
J (16 12 characters)
(~.{~[:I.1=#/.~)
Example:
(~.{~[:I.1=#/.~) 'nbHHkRvrXbvkn'
RrX
It only needs the parenthesis to be executed tacitly. If put in a verb, the actual code itself would be 14 characters.
There certainly are smarter ways to do this.
EDIT: The smarter way in question:
(~.#~1=#/.~) 'nbHHkRvrXbvkn'
RrX
12 characters, only 10 if set in a verb. I still hate the fact that it's going through the list twice, once to count (#/.) and another to return uniques (nub or ~.), but even nubcount, a standard verb in the 'misc' library does it twice.
Haskell
There's surely shorter ways to do this in Haskell, but:
Prelude Data.List> let h y=[x|x<-y,(<2).length$filter(==x)y]
Prelude Data.List> h "nbHHkRvrXbvkn"
"RrX"
Ignoring the let, since it's only required for function declarations in GHCi, we have h y=[x|x<-y,(<2).length$filter(==x)y], which is 37 characters (this ties the current "core" Python of "".join(c for c in s if s.count(c)<2), and it's virtually the same code anyway).
If you want to make a whole program out of it,
h y=[x|x<-y,(<2).length$filter(==x)y]
main=interact h
$ echo "nbHHkRvrXbvkn" | runghc tmp.hs
RrX
$ wc -c tmp.hs
54 tmp.hs
Or we can knock off one character this way:
main=interact(\y->[x|x<-y,(<2).length$filter(==x)y])
$ echo "nbHHkRvrXbvkn" | runghc tmp2.hs
RrX
$ wc -c tmp2.hs
53 tmp2.hs
It operates on all of stdin, not line-by-line, but that seems acceptable IMO.
C89 (106 characters)
This one uses a completely different method than my original answer. Interestingly, after writing it and then looking at another answer, I saw the methods were very similar. Credits to caf for coming up with this method before me.
b[256];l;x;main(c){while((c=getchar())>=0)b[c]=b[c]?1:--l;
for(;x-->l;)for(c=256;c;)b[--c]-x?0:putchar(c);}
On one line, it's 58+48 = 106 bytes.
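The idea behind this version, as far as it can be read from the code, is to remember the order in which each character first appears and to flag it once it repeats. The same idea can be sketched in Python 3; this is an illustration, not a translation of the C code, and it relies on dictionaries keeping insertion order (Python 3.7+):

```python
def unique_chars_single_pass(text):
    """Return characters that occur exactly once, in first-seen order."""
    seen_once = {}  # char -> True while it has been seen exactly once
    for ch in text:
        # First sighting stores True; any later sighting stores False.
        seen_once[ch] = ch not in seen_once
    return "".join(ch for ch, once in seen_once.items() if once)


print(unique_chars_single_pass("nbHHkRbvnrXbvkn"))  # RrX
```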
C89 (173 characters)
This was my original answer. As said in the comments, it doesn't work too well...
#include<stdio.h>
main(l,s){char*b,*d;for(b=l=s=0;l==s;s+=fread(b+s,1,9,stdin))b=realloc(b,l+=9)
;d=b;for(l=0;l<s;++d)if(!memchr(b,*d,l)&!memchr(d+1,*d,s-l++-1))putchar(*d);}
On two lines, it's 17+1+78+77 = 173 bytes.
C#
65 Characters:
new String(h.Where(x=>h.IndexOf(x)==h.LastIndexOf(x)).ToArray());
67 Characters with reassignment:
h=new String(h.Where(x=>h.IndexOf(x)==h.LastIndexOf(x)).ToArray());
C#
new string(input.GroupBy(c => c).Where(g => g.Count() == 1).ToArray());
71 characters
PHP (136 characters)
<?PHP
function q($x){return $x<2;}echo implode(array_keys(array_filter(
array_count_values(str_split(stream_get_contents(STDIN))),'q')));
On one line, it's 5+1+65+65 = 136 bytes. Using PHP 5.3 you could save a few bytes making the function anonymous, but I can't test that now. Perhaps something like:
<?PHP
echo implode(array_keys(array_filter(array_count_values(str_split(
stream_get_contents(STDIN))),function($x){return $x<2;})));
That's 5+1+66+59 = 131 bytes.
Another APL solution
As a dynamic function (18 characters)
{(1+=/¨(ω∘∊¨ω))/ω}
Inline, assuming that the input is in variable x (16 characters):
(1+=/¨(x∘∊¨x))/x
VB.NET
For Each c In s : s = IIf(s.LastIndexOf(c) <> s.IndexOf(c), s.Replace(CStr(c), Nothing), s) : Next
Granted, VB is not the optimal language to try to save characters, but the line comes out to 98 characters.
PowerShell
61 characters. Where $s="nbHHkRvrXbvkn" and $a is the result.
$h=@{}
($c=[char[]]$s)|%{$h[$_]++}
$c|%{if($h[$_]-eq1){$a+=$_}}
Fully functioning parameterized script:
param($s)
$h=@{}
($c=[char[]]$s)|%{$h[$_]++}
$c|%{if($h[$_]-eq1){$a+=$_}}
$a
C: 83 89 93 99 101 characters
O(n²) time.
Limited to 999 characters.
Only works in 32-bit mode (because <stdio.h> is not #include-d, which would cost 18 chars, the return type of gets is interpreted as int, chopping off half of the address bits).
Shows a friendly "warning: this program uses gets(), which is unsafe." on Macs.
.
main(){char s[999],*c=gets(s);for(;*c;c++)strchr(s,*c)-strrchr(s,*c)||putchar(*c);}
(and this similar 82-chars version takes input via the command line:
main(char*c,char**S){for(c=*++S;*c;c++)strchr(*S,*c)-strrchr(*S,*c)||putchar(*c);}
)
Golfscript(sym) - 15
.`{\{=}+,,(!}+,
+-------------------------------------------------------------------------+
|| | | | | | | | | | | | | | | |
|0 10 20 30 40 50 60 70 |
| |
+-------------------------------------------------------------------------+
Haskell
(just knocking a few characters off Mark Rushakoff's effort, I'd rather it was posted as a comment on his)
h y=[x|x<-y,[_]<-[filter(==x)y]]
which is better Haskell idiom but maybe harder to follow for non-Haskellers than this:
h y=[z|x<-y,[z]<-[filter(==x)y]]
Edit to add an explanation for hiena and others:
I'll assume you understand Mark's version, so I'll just cover the change. Mark's expression:
(<2).length $ filter (==x) y
filters y to get the list of elements that == x, finds the length of that list and makes sure it's less than two. (in fact it must be length one, but ==1 is longer than <2 ) My version:
[z] <- [filter(==x)y]
does the same filter, then puts the resulting list into a list as the only element. Now the arrow (meant to look like set inclusion!) says "for every element of the RHS list in turn, call that element [z]". [z] is the list containing the single element z, so the element "filter(==x)y" can only be called "[z]" if it contains exactly one element. Otherwise it gets discarded and is never used as a value of z. So the z's (which are returned on the left of the | in the list comprehension) are exactly the x's that make the filter return a list of length one.
That was my second version, my first version returns x instead of z - because they're the same anyway - and renames z to _ which is the Haskell symbol for "this value isn't going to be used so I'm not going to complicate my code by giving it a name".
Javascript 1.8
s.split('').filter(function (o,i,a) a.filter(function(p) o===p).length <2 ).join('');
or alternately- similar to the python example:
[s[c] for (c in s) if (s.split("").filter(function(p) s[c]===p).length <2)].join('');
TCL
123 chars. It might be possible to get it shorter, but this is good enough for me.
proc h {i {r {}}} {foreach c [split $i {}] {if {[llength [split $i $c]]==2} {set r $r$c}}
return $r}
puts [h [gets stdin]]
C
Full program in C, 141 bytes (counting newlines).
#include<stdio.h>
c,n[256],o,i=1;main(){for(;c-EOF;c=getchar())c-EOF?n[c]=n[c]?-1:o++:0;for(;i<o;i++)for(c=0;c<256;c++)n[c]-i?0:putchar(c);}
Scala
54 chars for the method body only, 66 with (statically typed) method declaration:
def s(s:String)=(""/:s)((a,b)=>if(s.filter(c=>c==b).size>1)a else a+b)
Ruby
63 chars.
puts (t=gets.split(//)).map{|i|t.count(i)>1?nil:i}.compact.join
VB.NET / LINQ
96 characters for complete working statement
Dim p=New String((From c In"nbHHkRvrXbvkn"Group c By c Into i=Count Where i=1 Select c).ToArray)
Complete working statement, with the original string and the VB-specific "Pretty listing (reformatting of code)" option turned off, at 96 characters; the non-working statement without the original string is 84 characters.
(Please make sure your code works before answering. Thank you.)
C
(1st version: 112 characters; 2nd version: 107 characters)
k[256],o[100000],p,c;main(){while((c=getchar())!=-1)++k[o[p++]=c];for(c=0;c<p;c++)if(k[o[c]]==1)putchar(o[c]);}
That's
/* #include <stdio.h> */
/* int */ k[256], o[100000], p, c;
/* int */ main(/* void */) {
while((c=getchar()) != -1/*EOF*/) {
++k[o[p++] = /*(unsigned char)*/c];
}
for(c=0; c<p; c++) {
if(k[o[c]] == 1) {
putchar(o[c]);
}
}
/* return 0; */
}
Because getchar() returns int and putchar accepts int, the #include can 'safely' be removed.
Without the include, EOF is not defined, so I used -1 instead (and gained a char).
This program only works as intended for inputs with less than 100000 characters!
Version 2, with thanks to strager
107 characters
#ifdef NICE_LAYOUT
#include <stdio.h>
/* global variables are initialized to 0 */
int char_count[256]; /* k in the other layout */
int char_order[999999]; /* o ... */
int char_index; /* p */
int main(int ch_n_loop, char **dummy) /* c */
/* variable with 2 uses */
{
(void)dummy; /* make warning about unused variable go away */
while ((ch_n_loop = getchar()) >= 0) /* EOF is, by definition, negative */
{
++char_count[ ( char_order[char_index++] = ch_n_loop ) ];
/* assignment, and increment, inside the array index */
}
/* reuse ch_n_loop */
for (ch_n_loop = 0; ch_n_loop < char_index; ch_n_loop++) {
(char_count[char_order[ch_n_loop]] - 1) ? 0 : putchar(char_order[ch_n_loop]);
}
return 0;
}
#else
k[256],o[999999],p;main(c){while((c=getchar())>=0)++k[o[p++]=c];for(c=0;c<p;c++)k[o[c]]-1?0:putchar(o[c]);}
#endif
Javascript 1.6
s.match(/(.)(?=.*\1)/g).map(function(m){s=s.replace(RegExp(m,'g'),'')})
Shorter than the previously posted Javascript 1.8 solution (71 chars vs 85)
Assembler
Tested with WinXP DOS box (cmd.exe):
xchg cx,bp
std
mov al,2
rep stosb
inc cl
l0: ; to save a byte, I've encoded the instruction to exit the program into the
; low byte of the offset in the following instruction:
lea si,[di+01c3h]
push si
l1: mov dx,bp
mov ah,6
int 21h
jz l2
mov bl,al
shr byte ptr [di+bx],cl
jz l1
inc si
mov [si],bx
jmp l1
l2: pop si
l3: inc si
mov bl,[si]
cmp bl,bh
je l0+2
cmp [di+bx],cl
jne l3
mov dl,bl
mov ah,2
int 21h
jmp l3
Assembles to 53 bytes. Reads standard input and writes results to standard output, eg:
programname < input > output
PHP
118 characters actual code (plus 6 characters for the PHP block tag):
<?php
$s=trim(fgets(STDIN));$x='';while(strlen($s)){$t=str_replace($s[0],'',substr($s,1),$c);$x.=$c?'':$s[0];$s=$t;}echo$x;
C# (53 Characters)
Where s is your input string:
new string(s.Where(c=>s.Count(h=>h==c)<2).ToArray());
Or 59 with re-assignment:
var a=new string(s.Where(c=>s.Count(h=>h==c)<2).ToArray());
Haskell Pointfree
import Data.List
import Control.Monad
import Control.Arrow
main=interact$liftM2(\\)nub$ap(\\)nub
The whole program is 97 characters, but the real meat is just 23 characters. The rest is just imports and bringing the function into the IO monad. In ghci with the modules loaded it's just
(liftM2(\\)nub$ap(\\)nub) "nbHHkRvrXbvkn"
In even more ridiculous pointfree style (pointless style?):
main=interact$liftM2 ap liftM2 ap(\\)nub
It's a bit longer though at 26 chars for the function itself.
Shell/Coreutils, 37 Characters
fold -w1|sort|uniq -u|paste -s -d ''
