String of bits to byte. Value out of range - converting error java - byte

I need to convert the set of bits that are represented via String. My String is multiple of 8, so I can divide it by 8 and get sub-strings with 8 bits inside. Then I have to convert these sub-strings to byte and print it in HEX. For example:
String seq = "0100000010000110";
The seq is much longer, but this is not the topic. Below you can see two sub-strings from the seq. And with one of them I have trouble, why?
String s_ok = "01000000"; //this value is OK to convert
String s_error = "10000110"; //this is not OK to convert but in HEX it is 86 in DEC 134
byte nByte = Byte.parseByte(s_ok, 2);
System.out.println(nByte);
try {
byte bByte = Byte.parseByte(s_error, 2);
System.out.println(bByte);
} catch (Exception e) {
System.out.println(e); //Value out of range. Value:"10000110" Radix:2
}
int in=Integer.parseInt(s_error, 2);
System.out.println("s_error set of bits in DEC - "+in + " and now in HEX - "+Integer.toHexString((byte)in)); //s_error set of bits in DEC - 134 and now in HEX - ffffff86
I can't understand why there is an error, for calculator it is not a problem to convert 10000110. So, I tried Integer and there are ffffff86 instead of simple 86.
Please help with: Why? and how to avoid the issue.

Well, I found how to avoid ffffff:
System.out.println("s_error set of bits in DEC - "+in + " and now in HEX - "+Integer.toHexString((byte)in & 0xFF));
0xFF was added. Bad thing is - I still don't know from where those ffffff came and I it is not clear for, what I've done. Is it some kind of byte multiplication or is it masking? I'm lost.

Related

UTF-8 value of a character in ColdFusion?

In ColdFusion I can determine the ASCII value of character by using asc()
How do I determine the UTF-8 value of a character?
<cfscript>
x = "漢"; // 3 bytes
// bytes of unicode character, a.k.a. String.getBytes("UTF-8")
bytes = charsetDecode(x, "UTF-8");
writeDump(bytes); // -26-68-94
// convert the 3 bytes to Hex
hex = binaryEncode(bytes, "HEX");
writeDump(hex); // E6BCA2
// convert the Hex to Dec
dec = inputBaseN(hex, 16);
writeDump(dec); // 15121570
// asc() uses the UCS-2 representation: 漢 = Hex 6F22 = Dec 28450
asc = asc(x);
writeDump(asc); // 28450
</cfscript>
USC-2 is fixed to 2 bytes, so it cannot support all unicode characters (as there can be as much as 4 bytes per character). But what are you actually trying to achieve here?
Note: If you run this example and get more than 3 bytes returned, make sure CF picks up the file as UTF-8 (with BOM).

Decrypting REG_NONE value in Registry

http://i.stack.imgur.com/xaP9s.jpg
Referring to the screenshot above as I'm not able to attach screenshot,
I want to convert the Filesize value which is in Hex to a String i.e. human readable format
The actual decimal value is 5.85 MB
While converting, I am not getting the actual value i.e. 5.85
Can any one suggest how do I convert the values.
I have a set of these hex values and want to convert them into a human readable format.
Each pair of hexadecimal numbers represents a byte, while the lowest value bytes are placed to the left:
0x00 -> 0
0xbb -> 187
0x5d -> 93
0*256^0 + 187*256^1 + 93*256^2 + 0*256^3 + 0*256^4 + 0*256^5 + 0*256^6 + 0*256^7
= 6142720
6142720 / 1024^2
= 5.85815
This storage format is called little-endian: https://en.wikipedia.org/wiki/Little-endian#Little-endian

ruby YAML parse bug with number

I have encountered what appears to be a bug with the YAML parser. Take this simple yaml file for example:
new account:
- FLEETBOSTON
- 011001742
If you parse it using this ruby line of code:
INPUT_DATA = YAML.load_file("test.yml")
Then I get this back:
{"new account"=>["FLEETBOSTON", 2360290]}
Am I doing something wrong? Because I'm pretty sure this is never supposed to happen.
It is supposed to happen. Numbers starting with 0 are in octal notation. Unless the next character is x, in which case they're hexadecimal.
07 == 7
010 == 8
011 == 9
0x9 == 9
0xA == 10
0xF == 15
0x10 == 16
0x11 == 17
Go into irb and just type in 011001742.
1.9.2-p290 :001 > 011001742
=> 2360290
PEBKAC. :)
Your number is a number, so it's treated as a number. If you want to make it explictly a string, enclose it into quotes, so YAML will not try to make it a number.
new account:
- FLEETBOSTON
- '011001742'

MATLAB: how to display UTF-8-encoded text read from file?

The gist of my question is this:
How can I display Unicode characters in Matlab's GUI (OS X) so that they are properly rendered?
Details:
I have a table of strings stored in a file, and some of these strings contain UTF-8-encoded Unicode characters. I have tried many different ways (too many to list here) to display the contents of this file in the MATLAB GUI, without success. For example:
>> fid = fopen('/Users/kj/mytable.txt', 'r', 'n', 'UTF-8');
>> [x, x, x, enc] = fopen(fid); enc
enc =
UTF-8
>> tbl = textscan(fid, '%s', 35, 'delimiter', ',');
>> tbl{1}{1}
ans =
ÎÎÎÎÎΠΣΦΩαβγδεζηθικλμνξÏÏÏÏÏÏÏÏÏÏ
>>
As it happens, if I paste the string directly into the MATLAB GUI, the pasted string is displayed properly, which shows that the GUI is not fundamentally incapable of displaying these characters, but once MATLAB reads it in, it longer displays it correctly. For example:
>> pasted = 'ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω'
pasted =
>>
Thanks!
I present below my findings after doing some digging... Consider these test files:
a.txt
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
b.txt
தமிழ்
First, we read files:
%# open file in binary mode, and read a list of bytes
fid = fopen('a.txt', 'rb');
b = fread(fid, '*uint8')'; %'# read bytes
fclose(fid);
%# decode as unicode string
str = native2unicode(b,'UTF-8');
If you try to print the string, you get a bunch of nonsense:
>> str
str =
Nonetheless, str does hold the correct string. We can check the Unicode code of each character, which are as you can see outside the ASCII range (last two are the non-printable CR-LF line endings):
>> double(str)
ans =
Columns 1 through 13
915 916 920 923 926 928 931 934 937 945 946 947 948
Columns 14 through 26
949 950 951 952 953 954 955 956 957 958 960 961 962
Columns 27 through 35
963 964 965 966 967 968 969 13 10
Unfortunately, MATLAB seems unable to display this Unicode string in a GUI on its own. For example, all these fail:
figure
text(0.1, 0.5, str, 'FontName','Arial Unicode MS')
title(str)
xlabel(str)
One trick I found is to use the embedded Java capability:
%# Java Swing
label = javax.swing.JLabel();
label.setFont( java.awt.Font('Arial Unicode MS',java.awt.Font.PLAIN, 30) );
label.setText(str);
f = javax.swing.JFrame('frame');
f.getContentPane().add(label);
f.pack();
f.setVisible(true);
As I was preparing to write the above, I found an alternative solution. We can use the DefaultCharacterSet undocumented feature and set the charset to UTF-8 (on my machine, it is ISO-8859-1 by default):
feature('DefaultCharacterSet','UTF-8');
Now with a proper font (you can change the font used in the Command Window from Preferences > Font), we can print the string in the prompt (note that DISP is still incapable of printing Unicode):
>> str
str =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
>> disp(str)
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπÏςστυφχψω
And to display it in a GUI, UICONTROL should work (under the hood, I think it is really a Java Swing component):
uicontrol('Style','text', 'String',str, ...
'Units','normalized', 'Position',[0 0 1 1], ...
'FontName','Arial Unicode MS', 'FontSize',30)
Unfortunately, TEXT, TITLE, XLABEL, etc.. are still showing garbage:
As a side note: It is difficult to work with m-file sources containing Unicode characters in the MATLAB editor. I was using Notepad++, with files encoded as UTF-8 without BOM.

Code Golf: Duplicate Character Removal in String

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
The challenge: The shortest code, by character count, that detects and removes duplicate characters in a String. Removal includes ALL instances of the duplicated character (so if you find 3 n's, all three have to go), and original character order needs to be preserved.
Example Input 1:
nbHHkRvrXbvkn
Example Output 1:
RrX
Example Input 2:
nbHHkRbvnrXbvkn
Example Output 2:
RrX
(the second example removes letters that occur three times; some solutions have failed to account for this)
(This is based on my other question where I needed the fastest way to do this in C#, but I think it makes good Code Golf across languages.)
LabVIEW 7.1
ONE character and that is the blue constant '1' in the block diagram.
I swear, the input was copy and paste ;-)
http://i25.tinypic.com/hvc4mp.png
http://i26.tinypic.com/5pnas.png
Perl
21 characters of perl, 31 to invoke, 36 total keystrokes (counting shift and final return):
perl -pe's/$1//gwhile/(.).*\1/'
Ruby — 61 53 51 56 35
61 chars, the ruler says. (Gives me an idea for another code golf...)
puts ((i=gets.split(''))-i.select{|c|i.to_s.count(c)<2}).join
+-------------------------------------------------------------------------+
|| | | | | | | | | | | | | | | |
|0 10 20 30 40 50 60 70 |
| |
+-------------------------------------------------------------------------+
gets.chars{|c|$><<c[$_.count(c)-1]}
... 35 by Nakilon
APL
23 characters:
(((1+ρx)-(ϕx)ιx)=xιx)/x
I'm an APL newbie (learned it yesterday), so be kind -- this is certainly not the most efficient way to do it. I'm ashamed I didn't beat Perl by very much.
Then again, maybe it says something when the most natural way for a newbie to solve this problem in APL was still more concise than any other solution in any language so far.
Python:
s=raw_input()
print filter(lambda c:s.count(c)<2,s)
This is a complete working program, reading from and writing to the console. The one-liner version can be directly used from the command line
python -c 's=raw_input();print filter(lambda c:s.count(c)<2,s)'
J (16 12 characters)
(~.{~[:I.1=#/.~)
Example:
(~.{~[:I.1=#/.~) 'nbHHkRvrXbvkn'
RrX
It only needs the parenthesis to be executed tacitly. If put in a verb, the actual code itself would be 14 characters.
There certainly are smarter ways to do this.
EDIT: The smarter way in question:
(~.#~1=#/.~) 'nbHHkRvrXbvkn'
RrX
12 characters, only 10 if set in a verb. I still hate the fact that it's going through the list twice, once to count (#/.) and another to return uniques (nub or ~.), but even nubcount, a standard verb in the 'misc' library does it twice.
Haskell
There's surely shorter ways to do this in Haskell, but:
Prelude Data.List> let h y=[x|x<-y,(<2).length$filter(==x)y]
Prelude Data.List> h "nbHHkRvrXbvkn"
"RrX"
Ignoring the let, since it's only required for function declarations in GHCi, we have h y=[x|x<-y,(<2).length$filter(==x)y], which is 37 characters (this ties the current "core" Python of "".join(c for c in s if s.count(c)<2), and it's virtually the same code anyway).
If you want to make a whole program out of it,
h y=[x|x<-y,(<2).length$filter(==x)y]
main=interact h
$ echo "nbHHkRvrXbvkn" | runghc tmp.hs
RrX
$ wc -c tmp.hs
54 tmp.hs
Or we can knock off one character this way:
main=interact(\y->[x|x<-y,(<2).length$filter(==x)y])
$ echo "nbHHkRvrXbvkn" | runghc tmp2.hs
RrX
$ wc -c tmp2.hs
53 tmp2.hs
It operates on all of stdin, not line-by-line, but that seems acceptable IMO.
C89 (106 characters)
This one uses a completely different method than my original answer. Interestingly, after writing it and then looking at another answer, I saw the methods were very similar. Credits to caf for coming up with this method before me.
b[256];l;x;main(c){while((c=getchar())>=0)b[c]=b[c]?1:--l;
for(;x-->l;)for(c=256;c;)b[--c]-x?0:putchar(c);}
On one line, it's 58+48 = 106 bytes.
C89 (173 characters)
This was my original answer. As said in the comments, it doesn't work too well...
#include<stdio.h>
main(l,s){char*b,*d;for(b=l=s=0;l==s;s+=fread(b+s,1,9,stdin))b=realloc(b,l+=9)
;d=b;for(l=0;l<s;++d)if(!memchr(b,*d,l)&!memchr(d+1,*d,s-l++-1))putchar(*d);}
On two lines, it's 17+1+78+77 = 173 bytes.
C#
65 Characters:
new String(h.Where(x=>h.IndexOf(x)==h.LastIndexOf(x)).ToArray());
67 Characters with reassignment:
h=new String(h.Where(x=>h.IndexOf(x)==h.LastIndexOf(x)).ToArray());
C#
new string(input.GroupBy(c => c).Where(g => g.Count() == 1).ToArray());
71 characters
PHP (136 characters)
<?PHP
function q($x){return $x<2;}echo implode(array_keys(array_filter(
array_count_values(str_split(stream_get_contents(STDIN))),'q')));
On one line, it's 5+1+65+65 = 136 bytes. Using PHP 5.3 you could save a few bytes making the function anonymous, but I can't test that now. Perhaps something like:
<?PHP
echo implode(array_keys(array_filter(array_count_values(str_split(
stream_get_contents(STDIN))),function($x){return $x<2;})));
That's 5+1+66+59 = 131 bytes.
another APL solution
As a dynamic function (18 charachters)
{(1+=/¨(ω∘∊¨ω))/ω}
line assuming that input is in variable x (16 characters):
(1+=/¨(x∘∊¨x))/x
VB.NET
For Each c In s : s = IIf(s.LastIndexOf(c) <> s.IndexOf(c), s.Replace(CStr(c), Nothing), s) : Next
Granted, VB is not the optimal language to try to save characters, but the line comes out to 98 characters.
PowerShell
61 characters. Where $s="nbHHkRvrXbvkn" and $a is the result.
$h=#{}
($c=[char[]]$s)|%{$h[$_]++}
$c|%{if($h[$_]-eq1){$a+=$_}}
Fully functioning parameterized script:
param($s)
$h=#{}
($c=[char[]]$s)|%{$h[$_]++}
$c|%{if($h[$_]-eq1){$a+=$_}}
$a
C: 83 89 93 99 101 characters
O(n2) time.
Limited to 999 characters.
Only works in 32-bit mode (due to not #include-ing <stdio.h> (costs 18 chars) making the return type of gets being interpreted as an int and chopping off half of the address bits).
Shows a friendly "warning: this program uses gets(), which is unsafe." on Macs.
.
main(){char s[999],*c=gets(s);for(;*c;c++)strchr(s,*c)-strrchr(s,*c)||putchar(*c);}
(and this similar 82-chars version takes input via the command line:
main(char*c,char**S){for(c=*++S;*c;c++)strchr(*S,*c)-strrchr(*S,*c)||putchar(*c);}
)
Golfscript(sym) - 15
.`{\{=}+,,(!}+,
+-------------------------------------------------------------------------+
|| | | | | | | | | | | | | | | |
|0 10 20 30 40 50 60 70 |
| |
+-------------------------------------------------------------------------+
Haskell
(just knocking a few characters off Mark Rushakoff's effort, I'd rather it was posted as a comment on his)
h y=[x|x<-y,[_]<-[filter(==x)y]]
which is better Haskell idiom but maybe harder to follow for non-Haskellers than this:
h y=[z|x<-y,[z]<-[filter(==x)y]]
Edit to add an explanation for hiena and others:
I'll assume you understand Mark's version, so I'll just cover the change. Mark's expression:
(<2).length $ filter (==x) y
filters y to get the list of elements that == x, finds the length of that list and makes sure it's less than two. (in fact it must be length one, but ==1 is longer than <2 ) My version:
[z] <- [filter(==x)y]
does the same filter, then puts the resulting list into a list as the only element. Now the arrow (meant to look like set inclusion!) says "for every element of the RHS list in turn, call that element [z]". [z] is the list containing the single element z, so the element "filter(==x)y" can only be called "[z]" if it contains exactly one element. Otherwise it gets discarded and is never used as a value of z. So the z's (which are returned on the left of the | in the list comprehension) are exactly the x's that make the filter return a list of length one.
That was my second version, my first version returns x instead of z - because they're the same anyway - and renames z to _ which is the Haskell symbol for "this value isn't going to be used so I'm not going to complicate my code by giving it a name".
Javascript 1.8
s.split('').filter(function (o,i,a) a.filter(function(p) o===p).length <2 ).join('');
or alternately- similar to the python example:
[s[c] for (c in s) if (s.split("").filter(function(p) s[c]===p).length <2)].join('');
TCL
123 chars. It might be possible to get it shorter, but this is good enough for me.
proc h {i {r {}}} {foreach c [split $i {}] {if {[llength [split $i $c]]==2} {set r $r$c}}
return $r}
puts [h [gets stdin]]
C
Full program in C, 141 bytes (counting newlines).
#include<stdio.h>
c,n[256],o,i=1;main(){for(;c-EOF;c=getchar())c-EOF?n[c]=n[c]?-1:o++:0;for(;i<o;i++)for(c=0;c<256;c++)n[c]-i?0:putchar(c);}
Scala
54 chars for the method body only, 66 with (statically typed) method declaration:
def s(s:String)=(""/:s)((a,b)=>if(s.filter(c=>c==b).size>1)a else a+b)
Ruby
63 chars.
puts (t=gets.split(//)).map{|i|t.count(i)>1?nil:i}.compact.join
VB.NET / LINQ
96 characters for complete working statement
Dim p=New String((From c In"nbHHkRvrXbvkn"Group c By c Into i=Count Where i=1 Select c).ToArray)
Complete working statement, with original string and the VB Specific "Pretty listing (reformatting of code" turned off, at 96 characters, non-working statement without original string at 84 characters.
(Please make sure your code works before answering. Thank you.)
C
(1st version: 112 characters; 2nd version: 107 characters)
k[256],o[100000],p,c;main(){while((c=getchar())!=-1)++k[o[p++]=c];for(c=0;c<p;c++)if(k[o[c]]==1)putchar(o[c]);}
That's
/* #include <stdio.h> */
/* int */ k[256], o[100000], p, c;
/* int */ main(/* void */) {
while((c=getchar()) != -1/*EOF*/) {
++k[o[p++] = /*(unsigned char)*/c];
}
for(c=0; c<p; c++) {
if(k[o[c]] == 1) {
putchar(o[c]);
}
}
/* return 0; */
}
Because getchar() returns int and putchar accepts int, the #include can 'safely' be removed.
Without the include, EOF is not defined, so I used -1 instead (and gained a char).
This program only works as intended for inputs with less than 100000 characters!
Version 2, with thanks to strager
107 characters
#ifdef NICE_LAYOUT
#include <stdio.h>
/* global variables are initialized to 0 */
int char_count[256]; /* k in the other layout */
int char_order[999999]; /* o ... */
int char_index; /* p */
int main(int ch_n_loop, char **dummy) /* c */
/* variable with 2 uses */
{
(void)dummy; /* make warning about unused variable go away */
while ((ch_n_loop = getchar()) >= 0) /* EOF is, by definition, negative */
{
++char_count[ ( char_order[char_index++] = ch_n_loop ) ];
/* assignment, and increment, inside the array index */
}
/* reuse ch_n_loop */
for (ch_n_loop = 0; ch_n_loop < char_index; ch_n_loop++) {
(char_count[char_order[ch_n_loop]] - 1) ? 0 : putchar(char_order[ch_n_loop]);
}
return 0;
}
#else
k[256],o[999999],p;main(c){while((c=getchar())>=0)++k[o[p++]=c];for(c=0;c<p;c++)k[o[c]]-1?0:putchar(o[c]);}
#endif
Javascript 1.6
s.match(/(.)(?=.*\1)/g).map(function(m){s=s.replace(RegExp(m,'g'),'')})
Shorter than the previously posted Javascript 1.8 solution (71 chars vs 85)
Assembler
Tested with WinXP DOS box (cmd.exe):
xchg cx,bp
std
mov al,2
rep stosb
inc cl
l0: ; to save a byte, I've encoded the instruction to exit the program into the
; low byte of the offset in the following instruction:
lea si,[di+01c3h]
push si
l1: mov dx,bp
mov ah,6
int 21h
jz l2
mov bl,al
shr byte ptr [di+bx],cl
jz l1
inc si
mov [si],bx
jmp l1
l2: pop si
l3: inc si
mov bl,[si]
cmp bl,bh
je l0+2
cmp [di+bx],cl
jne l3
mov dl,bl
mov ah,2
int 21h
jmp l3
Assembles to 53 bytes. Reads standard input and writes results to standard output, eg:
programname < input > output
PHP
118 characters actual code (plus 6 characters for the PHP block tag):
<?php
$s=trim(fgets(STDIN));$x='';while(strlen($s)){$t=str_replace($s[0],'',substr($s,1),$c);$x.=$c?'':$s[0];$s=$t;}echo$x;
C# (53 Characters)
Where s is your input string:
new string(s.Where(c=>s.Count(h=>h==c)<2).ToArray());
Or 59 with re-assignment:
var a=new string(s.Where(c=>s.Count(h=>h==c)<2).ToArray());
Haskell Pointfree
import Data.List
import Control.Monad
import Control.Arrow
main=interact$liftM2(\\)nub$ap(\\)nub
The whole program is 97 characters, but the real meat is just 23 characters. The rest is just imports and bringing the function into the IO monad. In ghci with the modules loaded it's just
(liftM2(\\)nub$ap(\\)nub) "nbHHkRvrXbvkn"
In even more ridiculous pointfree style (pointless style?):
main=interact$liftM2 ap liftM2 ap(\\)nub
It's a bit longer though at 26 chars for the function itself.
Shell/Coreutils, 37 Characters
fold -w1|sort|uniq -u|paste -s -d ''

Resources