Julia writing to binary error - binaryfiles

I'm trying to write binary data from a partitioned data frame. Generally this process works fine, but occasionally I get errors. I have written a basic conditional to address the error (I have also used try/catch blocks, but I'm working with a relatively large data set, so I think the Boolean check might be faster; if that assumption is false, feel free to make fun of me and/or my friends). Here is some code:
for x in RICT["$i"]["Numbers"]
    if typeof(x) == "NAtype"
        write(f3, convert(ASCIIString, "$x" ))
    else
        write(f3, convert(Int32, x ) )
    end
end
here is the error which my diminutive understanding of life and Julia tells me I shouldn't be seeing:
no method convert(Type{Int32},NAtype)
Thanks so much.

The output of typeof(x) is not a string so it will never match "NAtype". Remove the quotation marks from around NAtype and then it should work.
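The same pitfall exists outside Julia. As a rough analogy only (this is Python, not Julia, and `is_missing` is a hypothetical helper name), comparing a type object with a string holding its name is always false, while comparing against the type itself behaves as intended:

```python
def is_missing(value):
    # Compare against the type object itself, not a string holding its name.
    return type(value) is type(None)

x = None
wrong = type(x) == "NoneType"  # a type object never equals a string
right = is_missing(x)
```

Here `wrong` ends up false and `right` ends up true, which mirrors why the `"NAtype"` branch in the question is never taken.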

Related

'Halide::Expr' is not contextually convertible to 'bool' -- Storing values of functions in variables

I am new to using Halide and I am playing around with implementing algorithms first. I am trying to write a function which, depending on the value of the 8 pixels around it, either skips to the next pixel or does some processing and then moves on to the next pixel. When trying to write this I get the following compiler error:
84:5: error: value of type 'Halide::Expr' is not contextually convertible to 'bool'
if(input(x,y) > 0)
I have done all the tutorials and have seen that the select function is an option, but is there a way to either compare the values of a function or store them somewhere?
I also may be thinking about this problem wrong or might not be implementing it with the right "Halide mindset", so any suggestions would be great. Thank you in advance for everything!
The underlying issue here is that, although they are syntactically interleaved, and Halide code is constructed by running C++ code, Halide code is not C++ code and vice versa. Halide code is entirely defined by the Halide::* data structures you build up inside Funcs. if is a C control flow construct; you can use it to conditionally build different Halide programs, but you can't use it inside the logic of the Halide program (inside an Expr/Func). select is to Halide (an Expr which conditionally evaluates to one of two values) as if/else is to C (a statement which conditionally executes one of two sub-statements).
Rest assured, you're hardly alone in having this confusion early on. I want to write a tutorial specifically addressing how to think about staged programming inside Halide.
Until then, the short, "how do I do what I want" answer is as you suspected and as Khouri pointed out: use a select.
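To make the staged-programming idea above concrete, here is a toy model in Python. It is not Halide's API: the `Expr` class, `var`, `select` and `evaluate` below are hypothetical stand-ins written only for illustration. The host language builds an expression graph, the graph has no truth value while it is being built, and the conditional has to be a node in the graph.

```python
class Expr:
    """A node in a tiny expression graph (toy stand-in, not Halide's Expr)."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args

    def __gt__(self, other):
        # Builds a comparison node; nothing is compared yet.
        return Expr("gt", self, wrap(other))

    def __add__(self, other):
        return Expr("add", self, wrap(other))

    def __bool__(self):
        # Plays the role of the "not contextually convertible to 'bool'" error.
        raise TypeError("Expr has no value while the program is being built; use select()")


def wrap(value):
    """Turn plain numbers into constant nodes; leave Expr nodes alone."""
    return value if isinstance(value, Expr) else Expr("const", value)


def var(name):
    return Expr("var", name)


def select(cond, if_true, if_false):
    """The conditional as a graph node rather than host-language control flow."""
    return Expr("select", cond, wrap(if_true), wrap(if_false))


def evaluate(expr, env):
    """Run the built graph later, once concrete values are known."""
    if expr.op == "const":
        return expr.args[0]
    if expr.op == "var":
        return env[expr.args[0]]
    if expr.op == "gt":
        return evaluate(expr.args[0], env) > evaluate(expr.args[1], env)
    if expr.op == "add":
        return evaluate(expr.args[0], env) + evaluate(expr.args[1], env)
    if expr.op == "select":
        cond, if_true, if_false = expr.args
        return evaluate(if_true if evaluate(cond, env) else if_false, env)
    raise ValueError("unknown op: " + expr.op)


pixel = var("pixel")
out = select(pixel > 0, pixel + 1, 0)
```

Writing `if pixel > 0:` in this toy raises a `TypeError`, for the same reason the C++ compiler rejects the `if` in the question: the comparison yields a description of a computation, not a boolean.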
Since you've provided no code other than the one line, I'm assuming input is a Func and both x and y are Vars. If so, the result of input(x,y) is an Expr that you cannot evaluate with an if, as the error message indicates.
For the scenario that you describe, you might have something like this:
Var x, y;
Func input; input(x,y) = ...;
Func output; output(x,y) = select
// examine surrounding values
( input(x-1,y-1) > 0
&& input(x+0,y-1) > 0
&& ...
&& input(x+1,y+1) > 0
// true case
, ( input(x-1,y-1)
+ input(x+0,y-1)
+ ...
+ input(x+1,y+1)
) / 8
// false case
, input(x,y)
);
Working in Halide definitely requires a different mindset. You have to think in a more mathematical form. That is, a statement of a(x,y) = b(x,y) will be enforced for all cases of x and y.
Algorithm and scheduling should be separate, although the algorithm may need to be tweaked to allow for better scheduling.

'OR' Syntax in (g)Lua

I have limited knowledge of lua and would like to make an or statement.
However, I don't know the exact syntax.
Would the code below work correctly?
if text == "/teamspeak" or text == "/ts" then
If not please let me know on the correct syntax of the statement.
Yes, the statements are correct. You do not have any syntactical errors there, though you might want to check whether text contains only the command or the whole string (as is the case with ptokax). You might also want to check that the command is uppercase/lowercase or mixed-casing.
local sCmd = text:lower()
if sCmd == "/ts" or sCmd == "/teamspeak" then
    ...
end
Lua uses the keyword or for or statements.
I recommend reading the Lua language reference.
Your code would work correctly if you terminate the if then statement with end.
Best way is to try it yourself. If you do not have Lua installed you can use http://www.lua.org/demo.html
And please note that nil is not the same as false! Many Lua beginners have problems here.
That statement should work, though I suggest converting the string to lowercase first, as jhpotter92 already suggested.
A typical problem in cases like this is when the order the language deals with operands is not the one you'd expect; if, for example, lua were to evaluate the or before the == operator (which it doesn't, see reference) that code would not work. Therefore it is never a bad idea to write your code like this
if (text == "/teamspeak") or (text == "/ts") then <...> end
just to be sure lua does things in the correct order.
If you ever find yourself in this kind of situation again, and you don't want to wait for someone to respond to your question, you can just start lua in interactive mode (assuming you have lua installed on your system, which is very helpful for everyone who wants to learn/code in lua) and type something like
> text = "/teamspeak"
> if text == "/teamspeak" or text == "/ts" then print "true ♥" end
In this example, the console will output "true ♥". Repeat this with text="/ts" and text="some other string" and see if the line of code behaves as it should.
This shouldn't take you longer than 5 minutes (maybe +5 minutes to install lua first)

try catch or type conversion performance in julia - (Julia 73 seconds, Python 0.5 seconds)

I have been playing with Julia because it seems syntactically similar to Python (which I like) but claims to be faster. However, I tried making a script similar to one I have in Python for testing where numerical values are within a text file, which uses this function:
function isFloat(s)
    try:
        float64(s)
        return true
    catch:
        return false
    end
end
For some reason, this takes a great deal of time for a text file with a reasonable amount of rows of text (~500000).
Why would this be? Is there a better way to do this? What general feature of the language can I understand from this to apply to other languages?
Here are the two exact scripts I ran, with the times for reference:
python: ~0.5 seconds
import time
import numpy as np

def is_number(s):
    try:
        np.float64(s)
        return True
    except ValueError:
        return False

start = time.time()
file_data = open('SMW100.asc').readlines()
file_data = map(lambda line: line.rstrip('\n').replace(',',' ').split(), file_data)
bools = [(all(map(is_number, x)), x) for x in file_data]
print(time.time() - start)
julia: ~73.5 seconds
start = time()
function isFloat(s)
    try:
        float64(s)
        return true
    catch:
        return false
    end
end
x = map(x-> split(replace(x, ",", " ")), open(readlines, "SMW100.asc"))
u = [(all(map(isFloat, i)), i) for i in x]
print(start - time())
Note that you can use the float64_isvalid function in the standard library to (a) check whether a string is a valid floating-point value and (b) return the value.
Note also that the colons (:) after try and catch in your isFloat code are wrong in Julia (this is a Pythonism).
A much faster version of your code should be:
const isFloat2_out = [1.0]
isFloat2(s::String) = float64_isvalid(s, isFloat2_out)
function foo(L)
    x = split(L, ",")
    (all(isFloat2, x), x)
end
u = map(foo, open(readlines, "SMW100.asc"))
On my machine, for a sample file with 100,000 rows and 10 columns of data, 50% of which are valid numbers, your Python code takes 4.21 seconds and my Julia code takes 2.45 seconds.
This is an interesting performance problem that might be worth submitting to julia-users to get more focused feedback than SO will probably provide. At a first glance, I think you're hitting problems because (1) try/catch is just slightly slow to begin with and then (2) you're using try/catch in a context where there's a very considerable amount of type uncertainty because of lots of function calls that don't return stable types. As a result, the Julia interpreter spends its time trying to figure out the types of objects rather than doing your computation. It's a bit hard to tell exactly where the big bottlenecks are because you're doing a lot of things that are not very idiomatic in Julia. Also, you seem to be doing your computations in the global scope, where Julia's compiler can't perform many meaningful optimizations due to additional type uncertainty.
Python is oddly ambiguous on the subject of whether using exceptions for control flow is good or bad. See Python using exceptions for control flow considered bad?. But even in Python, the consensus is that user code shouldn't use exceptions for control flow (although for some reason generators are allowed to do this). So basically, the simple answer is that you should not be doing that – exceptions are for exceptional situations, not for control flow. That is why almost zero effort has been put into making Julia's try/catch construct faster – you shouldn't be using it like that in the first place. Of course, we will probably get around to making it faster at some point.
That said, the onus is on us as the designers of Julia's standard library to make sure that we provide APIs that never force you to use exceptions for control flow. In this case, you need a function that allows you to try to parse something as a floating-point value and indicate whether that was possible or not – not by throwing an exception, but rather by returning normal values. We don't provide such an API, so this is ultimately a shortcoming of Julia's standard library – as it exists right now. I've opened an issue to discuss this API design question: https://github.com/JuliaLang/julia/issues/5704. We'll see how it pans out.
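The API shape described above (report failure through a normal return value instead of an exception) can be sketched in Python. This shows only the calling convention, not Julia's eventual design; `try_parse_float` and `all_numeric` are hypothetical names. Python's own `float()` signals failure by raising, so the sketch has to catch that once, inside the helper, so that callers never use exceptions for control flow.

```python
from typing import List, Optional


def try_parse_float(text: str) -> Optional[float]:
    """Return the parsed value, or None when text is not a valid float."""
    try:
        return float(text)
    except ValueError:
        # The only place an exception is handled; callers just check for None.
        return None


def all_numeric(fields: List[str]) -> bool:
    """True when every field of a split line parses as a float."""
    return all(try_parse_float(field) is not None for field in fields)
```

A caller can then write `all_numeric(line.replace(",", " ").split())` for each line, much like the scripts in the question, without any try block at the call site.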

Unicode - the right thing to do

I'm working on something which processes UTF-8 encoding, and I found myself asking the question:
What should I do when I encounter a byte which never occurs inside a UTF-8 encoded string?
i.e. 0b1111111X
For example, I'm writing a small snippet of code which looks at the current place in the stream of bytes, and tells you how many bytes are used to represent the code point at that place in the stream.
0b0XXXXXXX just 1
0b10XXXXXX oops, we are in a continuation byte, search back upstream to find the leading byte
0b11XXXXXX count the number of leading 1s, that's the answer
0b1111111X err, this is not possible in UTF-8!!! what to do!?!?
I'm thinking of returning an error value, but wondering if I should, as a side effect, replace it with some more predictable error glyph (I mean the code point representing said glyph). And later when I do something more complicated, like jumping through the string and find that the leading byte does not have the correct number of continuation bytes after it... I'm thinking I should "fix" that up too.
Is it standard practice to leave wrongly encoded strings broken, or to change them and make them be wrong but at least play nice?
The most common way is to just throw a meaningful error if the input is not correct and stop.
There are a lot of good reasons to do so:
speed: if you try to fix errors, this often causes your function to be slower even on correct inputs
simplicity: your code can become really complicated if you try to fix every error
maintainability and correctness: it's easier to ensure the function works correctly when you stop whenever the input does not match the specification you are working with, since you only have to check the input against the specification
purpose: any time you get to a point like this you have to think about: what is the purpose of my function? Why did I come up with the idea to write it?
Also: a function fixcode which fixes the utf8 could also be used elsewhere, so it makes total sense to separate out the fixing (purpose, simplicity, maintainability and correctness arguments again).
Even if you expect an error, I would prefer to separate encode and fixcode, since you can reuse fixcode in other contexts.
If you are really thinking about fixing the utf8 code while encoding I would use a pattern like this:
try {
    q = encode(s);
} catch(encodingerror) {
    log(encodingerror);
    t = fixcode(s);
    q = encode(t);
}
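As a rough Python sketch of the separation argued for above: one function reports the sequence length and raises a meaningful error on malformed input, and a separate function does the repair. `sequence_length` and `fix_utf8` are hypothetical names. The check covers only the leading byte and the number of continuation bytes, so it is not a full validator (it does not reject every overlong or surrogate encoding). The repair step relies on Python's built-in `bytes.decode(..., errors="replace")`, which substitutes U+FFFD for undecodable bytes.

```python
def sequence_length(data: bytes, pos: int) -> int:
    """Length in bytes of the UTF-8 sequence that covers data[pos]."""
    start = pos
    # Step back over at most three continuation bytes (10xxxxxx).
    while start > 0 and pos - start < 3 and (data[start] & 0xC0) == 0x80:
        start -= 1
    lead = data[start]
    if lead < 0x80:
        length = 1
    elif 0xC2 <= lead <= 0xDF:
        length = 2
    elif 0xE0 <= lead <= 0xEF:
        length = 3
    elif 0xF0 <= lead <= 0xF4:
        length = 4
    else:
        # Covers stray continuation bytes, 0xC0, 0xC1 and 0xF5 to 0xFF.
        raise ValueError(f"byte 0x{lead:02X} at offset {start} cannot start a UTF-8 sequence")
    if pos - start >= length:
        raise ValueError(f"stray continuation byte at offset {pos}")
    tail = data[start + 1:start + length]
    if len(tail) != length - 1 or any((b & 0xC0) != 0x80 for b in tail):
        raise ValueError(f"truncated sequence starting at offset {start}")
    return length


def fix_utf8(data: bytes) -> bytes:
    """Separate repair step: replace undecodable bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace").encode("utf-8")
```

Keeping `fix_utf8` apart means the strict function stays simple, and callers that want the lenient behaviour can opt in explicitly, as in the try/catch pattern above.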

Clone detection algorithm

I'm writing an algorithm that detects clones in source code. E.g. if there is a block like:
for(int i = o; i <5; i++){
    doSomething(abc);
}
...and if this block is repeated somewhere else in the source code it will be detected as a clone. The method I am using at the moment is to create hashes for lines/blocks and compare them with hashes of other lines/blocks in the same source to see if there are any matches.
Now, if the same block as above was to be repeated somewhere with only the argument of doSomething different, it would not be detected as a clone even though it would appear very much like a clone to you and me. My algorithm detects exact matches but doesn't detect matching blocks where only the argument is different.
Could anyone suggest any ways of getting around this issue? Thanks!
Here's a super-simple way, which might go too far in erasing information (i.e., might produce too many false positives): replace every identifier that isn't a keyword with some fixed name. So you'd get
for (int DUMMY = DUMMY; DUMMY<5; DUMMY++) {
    DUMMY(DUMMY);
}
(assuming you really meant o rather than 0 in the initialization part of the for-loop).
If you get a huge number of false positives with this, you could then post-process them by, for instance, looking to see what fraction of the DUMMYs actually correspond to the same identifier in both halves of the match, or at least to identifiers that are consistent between the two.
To do much better you'll probably need to parse the code to some extent. That would be a lot more work.
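A minimal Python sketch of the identifier-erasing idea above, under some loud assumptions: the keyword set is a tiny illustrative subset, the tokenizer is a crude regular expression that knows nothing about strings or comments, and `normalize` and `fingerprint` are hypothetical names. It replaces every non-keyword identifier with DUMMY and hashes the result, so blocks that differ only in names get the same hash.

```python
import hashlib
import re

# Illustrative subset only; a real tool would use the full keyword list of its language.
KEYWORDS = {"for", "int", "if", "else", "while", "return"}

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")
# Identifiers, integer literals, or any other single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(block: str) -> str:
    """Replace every identifier that is not a keyword with DUMMY."""
    tokens = TOKEN.findall(block)
    return " ".join(
        "DUMMY" if IDENTIFIER.fullmatch(tok) and tok not in KEYWORDS else tok
        for tok in tokens
    )


def fingerprint(block: str) -> str:
    """Hash of the normalized block; equal hashes mark clone candidates."""
    return hashlib.sha1(normalize(block).encode("utf-8")).hexdigest()
```

Because whitespace is discarded during tokenizing, layout differences do not affect the hash either. Literals such as the 5 in the loop condition are kept, so a changed constant still produces a different fingerprint.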
Well, if you're going to do something else then you're going to have to parse the code at least a bit. For example, you could detect methods and then ignore the method arguments in your hash. Anyway, I think it's always true that you need your program to understand the code better than 'just text blocks', and that might get awfully complicated.
