What are the numbers on the end of lines in a stacktrace - windows

What do the the numbers after the "+" at the end of the lines in a stack trace represent?
Function Source
ntdll!KiFastSystemCallRet
ntdll!ZwRemoveIoCompletion+c
kernel32!GetQueuedCompletionStatus+29
w3tp!THREAD_POOL_DATA::ThreadPoolThread+33
w3tp!THREAD_POOL_DATA::ThreadPoolThread+24
w3tp!THREAD_MANAGER::ThreadManagerThread+39
kernel32!BaseThreadStart+34
here they are
+c
+29
+33
+24
+39 +34

They are offsets, in hexadecimal, from the start of the named subroutine. For example
kernel32!BaseThreadStart+34
is 52 (34 hex) bytes into the routine BaseThreadStart in the kernel32 module.

Offset inside the function. Eg. on frame 3 the return address is: the address of the kernel32!GetQueuedCompletionStatus symbol + 29 bytes.

Related

addiu instruction encoding (MIPS,GCC)

Here is addiu instruction opcode (16-bit instructions, GCC option -mmicromips):
full instruction: addiu sp,sp,-280
opcode, hexa: 4F75
opcode, binary: 1001(instruction) 11101(sp is $29) 110101
My purpose is to detect all instruction of this kind (addiu sp,sp,)
and then to decode the immediate, in the above case (-280) (to follow the sp).
What I don't understand is the encoding of (-280).
Linked to: How to get a call stack backtrace?(GCC,MIPS,no frame pointer)
microMips has a specialized ADDIUSP instruction which the assembler chose to use. The first 6 bits are the opcode 010011, the next 9 bits are the encoded immediate 110111010 = 0x1BA and the LSB is reserved at 1.
The encoding for the immediate uses scaling by 4 and sign extension. Given that 0x1BA = -70 (using 9 bits) the value is -70 * 4 = -280.

java8 -XX:+UseCompressedOops -XX:ObjectAlignmentInBytes=16

So, I'm trying to run some simple code, jdk-8, output via jol
System.out.println(VMSupport.vmDetails());
Integer i = new Integer(23);
System.out.println(ClassLayout.parseInstance(i)
.toPrintable());
The first attempt is to run it with compressed oops disabled and compressed klass also on 64-bit JVM.
-XX:-UseCompressedOops -XX:-UseCompressedClassPointers
The output, pretty much expected is :
Running 64-bit HotSpot VM.
Objects are 8 bytes aligned.
java.lang.Integer object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 48 33 36 97 (01001000 00110011 00110110 10010111) (-1758055608)
12 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
16 4 int Integer.value 23
20 4 (loss due to the next object alignment)
Instance size: 24 bytes (reported by Instrumentation API)
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
That makes sense : 8 bytes klass word + 8 bytes mark word + 4 bytes for the actual value and 4 for padding (to align on 8 bytes) = 24 bytes.
The second attempt it to run it with compressed oops enabled compressed klass also on 64-bit JVM.
Again, the output is pretty much understandable :
Running 64-bit HotSpot VM.
Using compressed oop with 3-bit shift.
Using compressed klass with 3-bit shift.
Objects are 8 bytes aligned.
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) f9 33 01 f8 (11111001 00110011 00000001 11111000) (-134138887)
12 4 int Dummy.i 42
Instance size: 16 bytes (reported by Instrumentation API).
4 bytes compressed oop (klass word) + 8 bytes mark word + 4 bytes for the value + no space loss = 16 bytes.
The thing that does NOT make sense to me is this use-case:
-XX:+UseCompressedOops -XX:+UseCompressedClassPointers -XX:ObjectAlignmentInBytes=16
The output is this:
Running 64-bit HotSpot VM.
Using compressed oop with 4-bit shift.
Using compressed klass with 0x0000001000000000 base address and 0-bit shift.
I was really expecting to both be "4-bit shift". Why they are not?
EDIT
The second example is run with :
XX:+UseCompressedOops -XX:+UseCompressedClassPointers
And the third one with :
-XX:+UseCompressedOops -XX:+UseCompressedClassPointers -XX:ObjectAlignmentInBytes=16
Answers to these questions are mostly easy to figure out when looking into OpenJDK code.
For example, grep for "UseCompressedClassPointers", this will get you to arguments.cpp:
// Check the CompressedClassSpaceSize to make sure we use compressed klass ptrs.
if (UseCompressedClassPointers) {
if (CompressedClassSpaceSize > KlassEncodingMetaspaceMax) {
warning("CompressedClassSpaceSize is too large for UseCompressedClassPointers");
FLAG_SET_DEFAULT(UseCompressedClassPointers, false);
}
}
Okay, interesting, there is "CompressedClassSpaceSize"? Grep for its definition, it's in globals.hpp:
product(size_t, CompressedClassSpaceSize, 1*G, \
"Maximum size of class area in Metaspace when compressed " \
"class pointers are used") \
range(1*M, 3*G) \
Aha, so the class area is in Metaspace, and it takes somewhere between 1 Mb and 3 Gb of space. Let's grep for "CompressedClassSpaceSize" usages, because that will take us to actual code that handles it, say in metaspace.cpp:
// For UseCompressedClassPointers the class space is reserved above
// the top of the Java heap. The argument passed in is at the base of
// the compressed space.
void Metaspace::initialize_class_space(ReservedSpace rs) {
So, compressed classes are allocated in a smaller class space outside the Java heap, which does not require shifting -- even 3 gigabytes is small enough to use only the lowest 32 bits.
I will try to extend a little bit on the answer provided by Alexey as some things might not be obvious.
Following Alexey suggestion, if we search the source code of OpenJDK for where compressed klass bit shift value is assigned, we will find the following code in metaspace.cpp:
void Metaspace::set_narrow_klass_base_and_shift(address metaspace_base, address cds_base) {
// some code removed
if ((uint64_t)(higher_address - lower_base) <= UnscaledClassSpaceMax) {
Universe::set_narrow_klass_shift(0);
} else {
assert(!UseSharedSpaces, "Cannot shift with UseSharedSpaces");
Universe::set_narrow_klass_shift(LogKlassAlignmentInBytes);
}
As we can see, the class shift can either be 0(or basically no shifting) or 3 bits, because LogKlassAlignmentInBytes is a constant defined in globalDefinitions.hpp:
const int LogKlassAlignmentInBytes = 3;
So, the answer to your quetion:
I was really expecting to both be "4-bit shift". Why they are not?
is that ObjectAlignmentInBytes does not have any effect on compressed class pointers alignment in the metaspace which is always 8bytes.
Of course this conclusion does not answer the question:
"Why when using -XX:ObjectAlignmentInBytes=16 with -XX:+UseCompressedClassPointers the narrow klass shift becomes zero? Also, without shifting how can the JVM reference the class space with 32-bit references, if the heap is 4GBytes or more?"
We already know that the class space is allocated on top of the java heap and can be up to 3G in size. With that in mind let's make a few tests. -XX:+UseCompressedOops -XX:+UseCompressedClassPointers are enabled by default, so we can eliminate these for conciseness.
Test 1: Defaults - 8 Bytes aligned
$ java -XX:ObjectAlignmentInBytes=8 -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version
heap address: 0x00000006c0000000, size: 4096 MB, zero based Compressed Oops
Narrow klass base: 0x0000000000000000, Narrow klass shift: 3
Compressed class space size: 1073741824 Address: 0x00000007c0000000 Req Addr: 0x00000007c0000000
Notice that the heap starts at address 0x00000006c0000000 in the virtual space and has a size of 4GBytes. Let's jump by 4Gbytes from where the heap starts and we land just where class space begins.
0x00000006c0000000 + 0x0000000100000000 = 0x00000007c0000000
The class space size is 1Gbyte, so let's jump by another 1Gbyte:
0x00000007c0000000 + 0x0000000040000000 = 0x0000000800000000
and we land just below 32Gbytes. With a 3 bits class space shifting the JVM is able to reference the entire class space, although it's at the limit (intentionally).
Test 2: 16 bytes aligned
java -XX:ObjectAlignmentInBytes=16 -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version
heap address: 0x0000000f00000000, size: 4096 MB, zero based Compressed Oops
Narrow klass base: 0x0000001000000000, Narrow klass shift: 0
Compressed class space size: 1073741824 Address: 0x0000001000000000 Req Addr: 0x0000001000000000
This time we can observe that the heap address is different, but let's try the same steps:
0x0000000f00000000 + 0x0000000100000000 = 0x0000001000000000
This time around heap space ends just below 64GBytes virtual space boundary and the class space is allocated above 64Gbyte boundary. Since class space can use only 3 bits shifting, how can the JVM reference the class space located above 64Gbyte? The key is:
Narrow klass base: 0x0000001000000000
The JVM still uses 32 bit compressed pointers for the class space, but when encoding and decoding these, it will always add 0x0000001000000000 base to the compressed reference instead of using shifting. Note, that this approach works as long as the referenced chunk of memory is lower than 4Gbytes (the limit for 32 bits references). Considering that the class space can have a maximum of 3Gbytes we are comfortably within the limits.
3: 16 bytes aligned, pin heap base at 8g
$ java -XX:ObjectAlignmentInBytes=16 -XX:HeapBaseMinAddress=8g -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version
heap address: 0x0000000200000000, size: 4096 MB, zero based Compressed Oops
Narrow klass base: 0x0000000000000000, Narrow klass shift: 3
Compressed class space size: 1073741824 Address: 0x0000000300000000 Req Addr: 0x0000000300000000
In this test we are still keeping the -XX:ObjectAlignmentInBytes=16, but also asking the JVM to allocate the heap at the 8th GByte in the virtual address space using -XX:HeapBaseMinAddress=8g JVM argument. The class space will begin at 12th GByte in the virtual address space and 3 bits shifting is more than enough to reference it.
Hopefully, these tests and their results answer the question:
"Why when using -XX:ObjectAlignmentInBytes=16 with -XX:+UseCompressedClassPointers the narrow klass shift becomes zero? Also, without shifting how can the JVM reference the class space with 32-bit references, if the heap is 4GBytes or more?"

What is the meaning of numbers in inline assemble

Do anyone know what the following code does?
I'm not sure what the 1, 2, 3 is refered and how they are used here. :-(
95 asm volatile("2: wrmsr ; xor %[err],%[err]\n"
96 "1:\n\t"
97 ".section .fixup,\"ax\"\n\t"
98 "3: mov %[fault],%[err] ; jmp 1b\n\t"
99 ".previous\n\t"
100 _ASM_EXTABLE(2b, 3b)
101 : [err] "=a" (err)
102 : "c" (msr), "" (low), "d" (high),
103 [fault] "i" (-EIO)
104 : "memory");
105 return err;​
The code is from Linux: http://lxr.free-electrons.com/source/arch/x86/include/asm/msr.h#L91
I really appreciate it ​if anyone could give me some key word to google it.
Thank you very much in advance!
Those are local labels (numbers followed by a colon).
When they are later referenced, the b (as in jmp 1b) means to refer to the nearest local label of that number going backwards. An f would look for a matching local label later (forwards) in the code.
That code declares an exception table, when an exception occurs executing the wrmsr instruction, the fault handler (usually in arch/<your_CPU_arch>/mm/fault.c) searches the exception table for the corresponding entry, and jumps there.
As you can see, the entry for that exception moves EIO into err, and jumps back to the instruction following the xor (which would clear err in case there was no error).

Replace the n-th byte in a file with another byte

In Ruby, how do I replace, say, the 7th byte of a file with another byte?
Use binwrite method from IO class
IO.binwrite("testfile", [0x0D].pack("C"), 7) # => 1
# File could contain: "This is0two\nThis is line three\nAnd so on...\n"
0x0D is 13
Also you may need to know about pack method

Code Golf: Duplicate Character Removal in String

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
The challenge: The shortest code, by character count, that detects and removes duplicate characters in a String. Removal includes ALL instances of the duplicated character (so if you find 3 n's, all three have to go), and original character order needs to be preserved.
Example Input 1:
nbHHkRvrXbvkn
Example Output 1:
RrX
Example Input 2:
nbHHkRbvnrXbvkn
Example Output 2:
RrX
(the second example removes letters that occur three times; some solutions have failed to account for this)
(This is based on my other question where I needed the fastest way to do this in C#, but I think it makes good Code Golf across languages.)
LabVIEW 7.1
ONE character and that is the blue constant '1' in the block diagram.
I swear, the input was copy and paste ;-)
http://i25.tinypic.com/hvc4mp.png
http://i26.tinypic.com/5pnas.png
Perl
21 characters of perl, 31 to invoke, 36 total keystrokes (counting shift and final return):
perl -pe's/$1//gwhile/(.).*\1/'
Ruby — 61 53 51 56 35
61 chars, the ruler says. (Gives me an idea for another code golf...)
puts ((i=gets.split(''))-i.select{|c|i.to_s.count(c)<2}).join
+-------------------------------------------------------------------------+
|| | | | | | | | | | | | | | | |
|0 10 20 30 40 50 60 70 |
| |
+-------------------------------------------------------------------------+
gets.chars{|c|$><<c[$_.count(c)-1]}
... 35 by Nakilon
APL
23 characters:
(((1+ρx)-(ϕx)ιx)=xιx)/x
I'm an APL newbie (learned it yesterday), so be kind -- this is certainly not the most efficient way to do it. I'm ashamed I didn't beat Perl by very much.
Then again, maybe it says something when the most natural way for a newbie to solve this problem in APL was still more concise than any other solution in any language so far.
Python:
s=raw_input()
print filter(lambda c:s.count(c)<2,s)
This is a complete working program, reading from and writing to the console. The one-liner version can be directly used from the command line
python -c 's=raw_input();print filter(lambda c:s.count(c)<2,s)'
J (16 12 characters)
(~.{~[:I.1=#/.~)
Example:
(~.{~[:I.1=#/.~) 'nbHHkRvrXbvkn'
RrX
It only needs the parenthesis to be executed tacitly. If put in a verb, the actual code itself would be 14 characters.
There certainly are smarter ways to do this.
EDIT: The smarter way in question:
(~.#~1=#/.~) 'nbHHkRvrXbvkn'
RrX
12 characters, only 10 if set in a verb. I still hate the fact that it's going through the list twice, once to count (#/.) and another to return uniques (nub or ~.), but even nubcount, a standard verb in the 'misc' library does it twice.
Haskell
There's surely shorter ways to do this in Haskell, but:
Prelude Data.List> let h y=[x|x<-y,(<2).length$filter(==x)y]
Prelude Data.List> h "nbHHkRvrXbvkn"
"RrX"
Ignoring the let, since it's only required for function declarations in GHCi, we have h y=[x|x<-y,(<2).length$filter(==x)y], which is 37 characters (this ties the current "core" Python of "".join(c for c in s if s.count(c)<2), and it's virtually the same code anyway).
If you want to make a whole program out of it,
h y=[x|x<-y,(<2).length$filter(==x)y]
main=interact h
$ echo "nbHHkRvrXbvkn" | runghc tmp.hs
RrX
$ wc -c tmp.hs
54 tmp.hs
Or we can knock off one character this way:
main=interact(\y->[x|x<-y,(<2).length$filter(==x)y])
$ echo "nbHHkRvrXbvkn" | runghc tmp2.hs
RrX
$ wc -c tmp2.hs
53 tmp2.hs
It operates on all of stdin, not line-by-line, but that seems acceptable IMO.
C89 (106 characters)
This one uses a completely different method than my original answer. Interestingly, after writing it and then looking at another answer, I saw the methods were very similar. Credits to caf for coming up with this method before me.
b[256];l;x;main(c){while((c=getchar())>=0)b[c]=b[c]?1:--l;
for(;x-->l;)for(c=256;c;)b[--c]-x?0:putchar(c);}
On one line, it's 58+48 = 106 bytes.
C89 (173 characters)
This was my original answer. As said in the comments, it doesn't work too well...
#include<stdio.h>
main(l,s){char*b,*d;for(b=l=s=0;l==s;s+=fread(b+s,1,9,stdin))b=realloc(b,l+=9)
;d=b;for(l=0;l<s;++d)if(!memchr(b,*d,l)&!memchr(d+1,*d,s-l++-1))putchar(*d);}
On two lines, it's 17+1+78+77 = 173 bytes.
C#
65 Characters:
new String(h.Where(x=>h.IndexOf(x)==h.LastIndexOf(x)).ToArray());
67 Characters with reassignment:
h=new String(h.Where(x=>h.IndexOf(x)==h.LastIndexOf(x)).ToArray());
C#
new string(input.GroupBy(c => c).Where(g => g.Count() == 1).ToArray());
71 characters
PHP (136 characters)
<?PHP
function q($x){return $x<2;}echo implode(array_keys(array_filter(
array_count_values(str_split(stream_get_contents(STDIN))),'q')));
On one line, it's 5+1+65+65 = 136 bytes. Using PHP 5.3 you could save a few bytes making the function anonymous, but I can't test that now. Perhaps something like:
<?PHP
echo implode(array_keys(array_filter(array_count_values(str_split(
stream_get_contents(STDIN))),function($x){return $x<2;})));
That's 5+1+66+59 = 131 bytes.
another APL solution
As a dynamic function (18 charachters)
{(1+=/¨(ω∘∊¨ω))/ω}
line assuming that input is in variable x (16 characters):
(1+=/¨(x∘∊¨x))/x
VB.NET
For Each c In s : s = IIf(s.LastIndexOf(c) <> s.IndexOf(c), s.Replace(CStr(c), Nothing), s) : Next
Granted, VB is not the optimal language to try to save characters, but the line comes out to 98 characters.
PowerShell
61 characters. Where $s="nbHHkRvrXbvkn" and $a is the result.
$h=#{}
($c=[char[]]$s)|%{$h[$_]++}
$c|%{if($h[$_]-eq1){$a+=$_}}
Fully functioning parameterized script:
param($s)
$h=#{}
($c=[char[]]$s)|%{$h[$_]++}
$c|%{if($h[$_]-eq1){$a+=$_}}
$a
C: 83 89 93 99 101 characters
O(n2) time.
Limited to 999 characters.
Only works in 32-bit mode (due to not #include-ing <stdio.h> (costs 18 chars) making the return type of gets being interpreted as an int and chopping off half of the address bits).
Shows a friendly "warning: this program uses gets(), which is unsafe." on Macs.
.
main(){char s[999],*c=gets(s);for(;*c;c++)strchr(s,*c)-strrchr(s,*c)||putchar(*c);}
(and this similar 82-chars version takes input via the command line:
main(char*c,char**S){for(c=*++S;*c;c++)strchr(*S,*c)-strrchr(*S,*c)||putchar(*c);}
)
Golfscript(sym) - 15
.`{\{=}+,,(!}+,
+-------------------------------------------------------------------------+
|| | | | | | | | | | | | | | | |
|0 10 20 30 40 50 60 70 |
| |
+-------------------------------------------------------------------------+
Haskell
(just knocking a few characters off Mark Rushakoff's effort, I'd rather it was posted as a comment on his)
h y=[x|x<-y,[_]<-[filter(==x)y]]
which is better Haskell idiom but maybe harder to follow for non-Haskellers than this:
h y=[z|x<-y,[z]<-[filter(==x)y]]
Edit to add an explanation for hiena and others:
I'll assume you understand Mark's version, so I'll just cover the change. Mark's expression:
(<2).length $ filter (==x) y
filters y to get the list of elements that == x, finds the length of that list and makes sure it's less than two. (in fact it must be length one, but ==1 is longer than <2 ) My version:
[z] <- [filter(==x)y]
does the same filter, then puts the resulting list into a list as the only element. Now the arrow (meant to look like set inclusion!) says "for every element of the RHS list in turn, call that element [z]". [z] is the list containing the single element z, so the element "filter(==x)y" can only be called "[z]" if it contains exactly one element. Otherwise it gets discarded and is never used as a value of z. So the z's (which are returned on the left of the | in the list comprehension) are exactly the x's that make the filter return a list of length one.
That was my second version, my first version returns x instead of z - because they're the same anyway - and renames z to _ which is the Haskell symbol for "this value isn't going to be used so I'm not going to complicate my code by giving it a name".
Javascript 1.8
s.split('').filter(function (o,i,a) a.filter(function(p) o===p).length <2 ).join('');
or alternately- similar to the python example:
[s[c] for (c in s) if (s.split("").filter(function(p) s[c]===p).length <2)].join('');
TCL
123 chars. It might be possible to get it shorter, but this is good enough for me.
proc h {i {r {}}} {foreach c [split $i {}] {if {[llength [split $i $c]]==2} {set r $r$c}}
return $r}
puts [h [gets stdin]]
C
Full program in C, 141 bytes (counting newlines).
#include<stdio.h>
c,n[256],o,i=1;main(){for(;c-EOF;c=getchar())c-EOF?n[c]=n[c]?-1:o++:0;for(;i<o;i++)for(c=0;c<256;c++)n[c]-i?0:putchar(c);}
Scala
54 chars for the method body only, 66 with (statically typed) method declaration:
def s(s:String)=(""/:s)((a,b)=>if(s.filter(c=>c==b).size>1)a else a+b)
Ruby
63 chars.
puts (t=gets.split(//)).map{|i|t.count(i)>1?nil:i}.compact.join
VB.NET / LINQ
96 characters for complete working statement
Dim p=New String((From c In"nbHHkRvrXbvkn"Group c By c Into i=Count Where i=1 Select c).ToArray)
Complete working statement, with original string and the VB Specific "Pretty listing (reformatting of code" turned off, at 96 characters, non-working statement without original string at 84 characters.
(Please make sure your code works before answering. Thank you.)
C
(1st version: 112 characters; 2nd version: 107 characters)
k[256],o[100000],p,c;main(){while((c=getchar())!=-1)++k[o[p++]=c];for(c=0;c<p;c++)if(k[o[c]]==1)putchar(o[c]);}
That's
/* #include <stdio.h> */
/* int */ k[256], o[100000], p, c;
/* int */ main(/* void */) {
while((c=getchar()) != -1/*EOF*/) {
++k[o[p++] = /*(unsigned char)*/c];
}
for(c=0; c<p; c++) {
if(k[o[c]] == 1) {
putchar(o[c]);
}
}
/* return 0; */
}
Because getchar() returns int and putchar accepts int, the #include can 'safely' be removed.
Without the include, EOF is not defined, so I used -1 instead (and gained a char).
This program only works as intended for inputs with less than 100000 characters!
Version 2, with thanks to strager
107 characters
#ifdef NICE_LAYOUT
#include <stdio.h>
/* global variables are initialized to 0 */
int char_count[256]; /* k in the other layout */
int char_order[999999]; /* o ... */
int char_index; /* p */
int main(int ch_n_loop, char **dummy) /* c */
/* variable with 2 uses */
{
(void)dummy; /* make warning about unused variable go away */
while ((ch_n_loop = getchar()) >= 0) /* EOF is, by definition, negative */
{
++char_count[ ( char_order[char_index++] = ch_n_loop ) ];
/* assignment, and increment, inside the array index */
}
/* reuse ch_n_loop */
for (ch_n_loop = 0; ch_n_loop < char_index; ch_n_loop++) {
(char_count[char_order[ch_n_loop]] - 1) ? 0 : putchar(char_order[ch_n_loop]);
}
return 0;
}
#else
k[256],o[999999],p;main(c){while((c=getchar())>=0)++k[o[p++]=c];for(c=0;c<p;c++)k[o[c]]-1?0:putchar(o[c]);}
#endif
Javascript 1.6
s.match(/(.)(?=.*\1)/g).map(function(m){s=s.replace(RegExp(m,'g'),'')})
Shorter than the previously posted Javascript 1.8 solution (71 chars vs 85)
Assembler
Tested with WinXP DOS box (cmd.exe):
xchg cx,bp
std
mov al,2
rep stosb
inc cl
l0: ; to save a byte, I've encoded the instruction to exit the program into the
; low byte of the offset in the following instruction:
lea si,[di+01c3h]
push si
l1: mov dx,bp
mov ah,6
int 21h
jz l2
mov bl,al
shr byte ptr [di+bx],cl
jz l1
inc si
mov [si],bx
jmp l1
l2: pop si
l3: inc si
mov bl,[si]
cmp bl,bh
je l0+2
cmp [di+bx],cl
jne l3
mov dl,bl
mov ah,2
int 21h
jmp l3
Assembles to 53 bytes. Reads standard input and writes results to standard output, eg:
programname < input > output
PHP
118 characters actual code (plus 6 characters for the PHP block tag):
<?php
$s=trim(fgets(STDIN));$x='';while(strlen($s)){$t=str_replace($s[0],'',substr($s,1),$c);$x.=$c?'':$s[0];$s=$t;}echo$x;
C# (53 Characters)
Where s is your input string:
new string(s.Where(c=>s.Count(h=>h==c)<2).ToArray());
Or 59 with re-assignment:
var a=new string(s.Where(c=>s.Count(h=>h==c)<2).ToArray());
Haskell Pointfree
import Data.List
import Control.Monad
import Control.Arrow
main=interact$liftM2(\\)nub$ap(\\)nub
The whole program is 97 characters, but the real meat is just 23 characters. The rest is just imports and bringing the function into the IO monad. In ghci with the modules loaded it's just
(liftM2(\\)nub$ap(\\)nub) "nbHHkRvrXbvkn"
In even more ridiculous pointfree style (pointless style?):
main=interact$liftM2 ap liftM2 ap(\\)nub
It's a bit longer though at 26 chars for the function itself.
Shell/Coreutils, 37 Characters
fold -w1|sort|uniq -u|paste -s -d ''

Resources