Position information in validation errors - validation

The problem
I'll start with a simplified parsing problem. Suppose I've got a list of strings that I want to parse into a list of integers, and that I want to accumulate errors. This is pretty easy in Scalaz 7:
import scalaz._, Scalaz._
val lines = List("12", "13", "13a", "14", "foo")
def parseLines(lines: List[String]) = lines.traverseU(_.parseInt.toValidationNel)
We can confirm that it works as expected:
scala> parseLines(lines).fold(_.foreach(t => println(t.getMessage)), println)
For input string: "13a"
For input string: "foo"
This is nice, but suppose the list is pretty long, and that I decide I want to capture more information about the context of the errors in order to make clean-up easier. For the sake of simplicity I'll just use (zero-indexed) line numbers here to represent the position, but the context could also include a file name or other information.
Passing around position
One simple approach is to pass the position to my line parser:
type Position = Int
case class InvalidLine(pos: Position, message: String) extends Throwable(
  f"At $pos%d: $message%s"
)
def parseLine(line: String, pos: Position) = line.parseInt.leftMap(
  _ => InvalidLine(pos, f"$line%s is not an integer!")
)
def parseLines(lines: List[String]) = lines.zipWithIndex.traverseU(
  (parseLine _).tupled andThen (_.toValidationNel)
)
This also works:
scala> parseLines(lines).fold(_.foreach(t => println(t.getMessage)), println)
At 2: 13a is not an integer!
At 4: foo is not an integer!
But in more complex situations passing the position around like this gets unpleasant.
Wrapping errors
Another option is to wrap the error produced by the line parser:
case class InvalidLine(pos: Position, underlying: Throwable) extends Throwable(
  f"At $pos%d: ${underlying.getMessage}%s",
  underlying
)
def parseLines(lines: List[String]) = lines.zipWithIndex.traverseU {
  case (line, pos) => line.parseInt.leftMap(InvalidLine(pos, _)).toValidationNel
}
And again, it works just fine:
scala> parseLines(lines).fold(_.foreach(t => println(t.getMessage)), println)
At 2: For input string: "13a"
At 4: For input string: "foo"
But sometimes I have a nice error ADT and this kind of wrapping doesn't feel particularly elegant.
Returning "partial" errors
A third approach would be to have my line parser return a partial error that needs to be combined with some additional information (the position, in this case). I'll use Reader here, but we could just as well represent the failure type as something like Position => Throwable. We can reuse our first (non-wrapping) InvalidLine above.
def parseLine(line: String) = line.parseInt.leftMap(
  error => Reader(InvalidLine((_: Position), error.getMessage))
)
def parseLines(lines: List[String]) = lines.zipWithIndex.traverseU {
  case (line, pos) => parseLine(line).leftMap(_.run(pos)).toValidationNel
}
Once again this produces the desired output, but also feels kind of verbose and clunky.
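For readers who find the Reader version hard to follow, the same idea can be sketched outside Scalaz. The following is a minimal Python illustration, not the code from the question: parse_line returns either an integer or a function that still needs a position to become a full error, and parse_lines supplies the position and accumulates the errors. The names mirror the question, but the plain tuple used as the result type is an assumption made for brevity, and unlike a Validation it also keeps the successfully parsed values when errors occur.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass(frozen=True)
class InvalidLine:
    pos: int
    message: str


# A "partial" error: it becomes an InvalidLine once a position is supplied.
PartialError = Callable[[int], InvalidLine]


def parse_line(line: str) -> Union[int, PartialError]:
    """Return the parsed integer, or an error still waiting for its position."""
    try:
        return int(line)
    except ValueError:
        return lambda pos: InvalidLine(pos, f"{line} is not an integer!")


def parse_lines(lines: List[str]) -> Tuple[List[InvalidLine], List[int]]:
    """Parse every line, accumulating all errors instead of stopping at the first."""
    errors: List[InvalidLine] = []
    values: List[int] = []
    for pos, line in enumerate(lines):
        result = parse_line(line)
        if callable(result):
            errors.append(result(pos))  # the position is injected only here
        else:
            values.append(result)
    return errors, values


errors, values = parse_lines(["12", "13", "13a", "14", "foo"])
for e in errors:
    print(f"At {e.pos}: {e.message}")
```

The line parser never sees a position; only the caller that iterates over the input knows it.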
Question
I come across this kind of problem all the time—I'm parsing some messy data and want nice helpful error messages, but I also don't want to thread a bunch of position information through all of my parsing logic.
Is there some reason to prefer one of the approaches above? Are there better approaches?

I use a combination of your first and second options with locally-requested stackless exceptions for control flow. This is the best thing I've found to keep error handling both completely bulletproof and mostly out of the way. The basic form looks like this:
Ok.or[InvalidLine]{ bad =>
  if (somethingWentWrong) bad(InvalidLine(x))
  else y.parse(bad) // Parsers should know about sending back info!
}
where bad throws an exception when called that returns the data passed to it, and the output is a custom Either-like type. If it becomes important to inject additional context from the outer scope, adding an extra transformer step is all it takes to add context:
Ok.or[Invalid].explain((i: InvalidLine) => Invalid(i, myFile)) { bad =>
  // Parsing logic
}
Actually creating the classes to make this work is a little more fiddly than I want to post here (especially since there are additional considerations in all of my actual working code which obscures the details), but this is the logic.
Oh, and since this ends up just being an apply method on a class, you can always
val validate = Ok.or[Invalid].explain(/* blah */)
validate { bad => parseA }
validate { bad => parseB }
and all the usual tricks.
(I suppose it's not fully obvious that the type signature of bad is bad: InvalidLine => Nothing, and the type signature of the apply is (InvalidLine => Nothing) => T.)
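Since the classes behind Ok.or are not shown, here is a minimal Python sketch of the general pattern described: a block receives a bad function, calling bad aborts the block by raising a private exception, and the wrapper converts that into an ordinary return value. Everything here (ok_or, _Bail, the ("ok", ...) and ("bad", ...) tuples, the explain parameter, the file name input.txt) is hypothetical and only illustrates the control flow. It is not the answerer's implementation, and Python exceptions are not stackless.

```python
from typing import Any, Callable, Optional, Tuple


class _Bail(Exception):
    """Private exception used only to jump out of the block."""

    def __init__(self, token: object, payload: Any) -> None:
        super().__init__()
        self.token = token
        self.payload = payload


def ok_or(block: Callable[[Callable[[Any], Any]], Any],
          explain: Optional[Callable[[Any], Any]] = None) -> Tuple[str, Any]:
    token = object()  # identifies this call, so nested calls do not interfere

    def bad(payload: Any) -> Any:
        raise _Bail(token, payload)  # never returns normally

    try:
        return ("ok", block(bad))
    except _Bail as bail:
        if bail.token is not token:
            raise  # belongs to an enclosing ok_or call
        error = bail.payload
        return ("bad", explain(error) if explain else error)


def parse_int(text: str, bad: Callable[[Any], Any]) -> int:
    """A parser that reports failure through `bad` instead of returning it."""
    try:
        return int(text)
    except ValueError:
        bad(f"{text} is not an integer!")


good = ok_or(lambda bad: parse_int("12", bad) + parse_int("30", bad))
# `explain` adds outer context (a made-up file name) to the inner error.
failed = ok_or(lambda bad: parse_int("13a", bad),
               explain=lambda msg: ("input.txt", msg))
print(good)    # ('ok', 42)
print(failed)  # ('bad', ('input.txt', '13a is not an integer!'))
```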

An overly-simplified solution could be:
import scala.util.{Try, Success, Failure}
def parseLines(lines: List[String]): List[Try[Int]] =
  lines map { l => Try (l.toInt) }
val lines = List("12", "13", "13a", "14", "foo")
println("LINES: " + lines)
val parsedLines = parseLines(lines)
println("PARSED: " + parsedLines)
val anyFailed: Boolean = parsedLines.exists(_.isFailure)
println("FAILURES EXIST?: " + anyFailed)
val failures: List[Throwable] = parsedLines.collect { case Failure(e) => e }
println("FAILURES: " + failures)
val parsedWithIndex = parsedLines.zipWithIndex
println("PARSED LINES WITH INDEX: " + parsedWithIndex)
val failuresWithIndex = parsedWithIndex.filter{ case (v, i) => v.isFailure }
println("FAILURES WITH INDEX: " + failuresWithIndex)
Prints:
LINES: List(12, 13, 13a, 14, foo)
PARSED: List(Success(12), Success(13), Failure(java.lang.NumberFormatException: For input string: "13a"), Success(14), Failure(java.lang.NumberFormatException: For input string: "foo"))
FAILURES EXIST?: true
FAILURES: List(java.lang.NumberFormatException: For input string: "13a", java.lang.NumberFormatException: For input string: "foo")
PARSED LINES WITH INDEX: List((Success(12),0), (Success(13),1), (Failure(java.lang.NumberFormatException: For input string: "13a"),2), (Success(14),3), (Failure(java.lang.NumberFormatException: For input string: "foo"),4))
FAILURES WITH INDEX: List((Failure(java.lang.NumberFormatException: For input string: "13a"),2), (Failure(java.lang.NumberFormatException: For input string: "foo"),4))
Given that, you could wrap all of this in a helper class, abstract over the parsing function, generalize the input and output types, and even choose the error type, whether it's an exception or something else.
What I'm suggesting is a simple map-based approach; the exact types can be defined to suit the task.
The annoying thing is that you have to keep a reference to parsedWithIndex in order to get at the indexes and exceptions, unless your exceptions themselves hold indexes and other context information.
Example of an implementation:
case class Transformer[From, To](input: List[From], f: From => To) {
  import scala.util.{Try, Success, Failure}
  lazy val transformedWithIndex: List[(Try[To], Int)] =
    input.map(l => Try(f(l))).zipWithIndex
  def failuresWithIndex =
    transformedWithIndex.filter { case (v, i) => v.isFailure }
  lazy val failuresExist: Boolean =
    failuresWithIndex.nonEmpty
  def successfulOnly: List[To] =
    for {
      (e, _) <- transformedWithIndex
      value <- e.toOption
    } yield value
}
val lines = List("12", "13", "13a", "14", "foo")
val res = Transformer(lines, (l: String) => l.toInt)
println("FAILURES EXIST?: " + res.failuresExist)
println("PARSED LINES WITH INDEX: " + res.transformedWithIndex)
println("SUCCESSFUL ONLY: " + res.successfulOnly)
Prints:
FAILURES EXIST?: true
PARSED LINES WITH INDEX: List((Success(12),0), (Success(13),1), (Failure(java.lang.NumberFormatException: For input string: "13a"),2), (Success(14),3), (Failure(java.lang.NumberFormatException: For input string: "foo"),4))
SUCCESSFUL ONLY: List(12, 13, 14)
Try can be replaced with Either or your own custom Failure.
This does feel a bit more object oriented rather than functional.

Related

Performance of Table Search vs. string.find in Lua

Let's say that I want to store some alphanumeric data. I can either use a table:
t = { "a", "b", "c" }
or a string:
s = "abc"
When I want to test if 'b' is in my data set, I can test the table by saying:
function findInTable(table, element)
  for i, v in ipairs(table) do
    if v == element then return true end
  end
  return false
end
if findInTable(t, "b") then
  --do stuff
end
or I can test the string by saying:
if s:find("b") then
  --do stuff
end
Which of these methods is faster? I imagine that string.find is essentially doing the same thing as my findInTable function. I need to check data in this way on every draw for a game, so performance is critical. A lot of my data is being extracted from text files and it's easier to keep it in string format rather than using commas or some such as delimiters to organize it into table values.
Consider doing this:
t = { ["a"]=true, ["b"]=true, ["c"]=true }
Then to test if a string s is in t, simply do
if t[s] then ...
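The difference between the two lookups can be illustrated with a short sketch. It is written in Python rather than Lua purely for illustration; find_in_list corresponds to findInTable from the question and the dictionary plays the role of the keyed table. The names are made up for this example.

```python
def find_in_list(items, element):
    """Linear scan: may inspect every item before answering."""
    for value in items:
        if value == element:
            return True
    return False


items = ["a", "b", "c"]
# Keyed lookup table, the analogue of { ["a"]=true, ["b"]=true, ["c"]=true }.
lookup = {value: True for value in items}

print(find_in_list(items, "b"))  # True
print(lookup.get("b", False))    # True
print(lookup.get("d", False))    # False
```

The scan does work proportional to the number of items, while the keyed lookup is a single hash access regardless of how many keys are stored.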
I do similar lookups between two frames in LÖVE (love2d). Without having benchmarked it, I can say it is fast enough; the frame rate only drops when there are many drawable objects.
I would also add the table functions to the table t as methods:
t = setmetatable({"a", "b", "c"}, {__index = table})
print(t[t:concat():find("b")])
-- Output: b
-- (The position returned by find matches the table index only
-- because every element is a single character.)
-- If nothing is found, the output is nil.
-- nil can be avoided by falling back to a default with "or":
print(t[t:concat():find("d")] or t[#t])
-- Output: c
You can try this out, or modify it, at
https://www.lua.org/demo.html
to figure out how it works.

What is the most efficient way to replace a list of words without touching html attributes?

I absolutely disagree that this question is a duplicate! I am asking for an efficient way to replace hundreds of words at once. This is an algorithm question! All the linked questions are about replacing one word. Should I repeat that expensive operation hundreds of times? I'm sure there are better ways, such as a suffix tree, where I sort out the HTML while building the tree. I removed the regex tag since, for no good reason, you are focusing on that part.
I want to translate a given set of words (more than 100). My first idea was to use a simple regular expression, which works better than expected. As a sample:
const input = "I like strawberry cheese cake with apple juice"
const en2de = {
  apple: "Apfel",
  strawberry: "Erdbeere",
  banana: "Banane",
  /* ... */
}
input.replace(new RegExp(Object.keys(en2de).join("|"), "gi"), match => en2de[match.toLowerCase()])
This works fine at first glance. However, it becomes strange if you have words that contain each other, like "pineapple", which would return "pineApfel", which is total nonsense. So I was thinking about checking word boundaries and similar things. While playing around I created this test case:
Apple is a company
That created the output:
Apfel is a company.
The translation is wrong, which is somehow tolerable, but the link is broken. That must not happen.
So I was thinking about extending the regex to check whether there is a quote before the match. I know well that parsing HTML with regex is a bad idea, but I thought this should work anyway. In the end I gave up and looked for other developers' solutions, and found a couple of questions on Stack Overflow, all without answers, so it seems to be a hard problem (or my search skills are bad).
So I went two steps back and thought about implementing it myself with a parser or something similar. However, since I have multiple inputs and need to ignore case, I was wondering what the best way is.
Right now I am thinking of building a dictionary with pointers to the positions of the words. I would store the dictionary in lower case, so lookups should be fast; I could also skip all words with the wrong prefix, etc., to get my matches. In the end I would replace the words from the end to the beginning to avoid breaking the indices. But is that approach efficient? Is there a better way to achieve this?
While my sample is in JavaScript, the solution need not be in JS, as long as it doesn't include dozens of dependencies that cannot easily be translated to JS.
TL;DR:
I want to replace multiple words by other words in a case insensitive way without breaking html.
You may try a TreeWalker and replace the text in place.
To find words, you may tokenize your text, lowercase the words and map them.
const mapText = (dic, s) => {
  return s.replace(/[a-zA-Z-_]+/g, w => {
    return dic[w.toLowerCase()] || w
  })
}
const dic = {
  'grodzi': 'grodzila',
  'laaaa': 'forever',
}
const treeWalker = document.createTreeWalker(
  document.body,
  NodeFilter.SHOW_TEXT
)
// skip body node
let currentNode = treeWalker.nextNode()
while(currentNode) {
  const newS = mapText(dic, currentNode.data)
  currentNode.data = newS
  currentNode = treeWalker.nextNode()
}
p {background:#eeeeee;}
<p>
grodzi
LAAAA
</p>
The link stays untouched.
However, mapping each word into another language is bound to fail (be it a missing representation of some word, humour/irony, or simply a grammatical construct). For this matter (which is a hard problem on its own) you may rely on tools that translate the data for you (neural networks, APIs, ...).
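Outside a browser, where no TreeWalker is available, a similar effect can be approximated with a single regular expression pass. The sketch below is in Python and is an illustration under stated assumptions, not part of either answer: the pattern matches either a whole tag (which is copied through unchanged) or one of the dictionary words at word boundaries, so "pineapple" and attribute values are left alone. It does not handle comments, script contents or a ">" inside an attribute value, so a real HTML parser is the safer choice. The sample markup and the translate name are made up.

```python
import re

en2de = {"apple": "Apfel", "strawberry": "Erdbeere", "banana": "Banane"}

# Alternative 1 matches a complete tag; alternative 2 (group 1) matches a
# dictionary word. Longer words are listed first so they win over prefixes.
words = sorted(en2de, key=len, reverse=True)
pattern = re.compile(
    r"<[^>]*>|\b(" + "|".join(map(re.escape, words)) + r")\b",
    re.IGNORECASE,
)


def translate(html):
    def swap(match):
        word = match.group(1)
        if word is None:  # a tag matched: return it untouched
            return match.group(0)
        return en2de[word.lower()]

    return pattern.sub(swap, html)


sample = '<a href="https://apple.example">Apple</a> sells pineapple and strawberry juice'
print(translate(sample))
```

All words are handled in one pass over the input, so the cost does not grow by repeating a full replace per dictionary entry.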
Here is my current work-in-progress solution using a suffix tree (or at least how I interpreted it). I'm building a dictionary of all words that are not inside a tag, together with their positions. After sorting the dictionary I replace them all. This works for me without handling HTML at all.
function suffixTree(input) {
  const dict = new Map()
  let start = 0
  let insideTag = false
  // define word borders
  const borders = ' .,<"\'(){}\r\n\t'.split('')
  // build dictionary
  for (let end = 0; end <= input.length; end++) {
    const c = input[end]
    if (c === '<') insideTag = true
    if (c === '>') {
      insideTag = false
      start = end + 1
      continue
    }
    if (insideTag && c !== '<') continue
    if (borders.indexOf(c) >= 0) {
      if(start !== end) {
        const word = input.substring(start, end).toLowerCase()
        const entry = dict.get(word) || []
        // save the word and its position in an array so when the word is
        // used multiple times that we can use this list
        entry.push([start, end])
        dict.set(word, entry)
      }
      start = end + 1
    }
  }
  // last word handling
  const word = input.substring(start).toLowerCase()
  const entry = dict.get(word) || []
  entry.push([start, input.length])
  dict.set(word, entry)
  // create a list of replace operations, we would break the
  // indices if we do that directly
  const words = Object.keys(en2de)
  const replacements = []
  words.forEach(word => {
    (dict.get(word) || []).forEach(match => {
      // [0] is start, [1] is end, [2] is the replacement
      replacements.push([match[0], match[1], en2de[word]])
    })
  })
  // converting the input to a char array and replacing the found ranges.
  // beginning from the end and replace the ranges with the replacement
  let output = [...input]
  replacements.sort((a, b) => b[0] - a[0])
  replacements.forEach(match => {
    output.splice(match[0], match[1] - match[0], match[2])
  })
  return output.join('')
}
Feel free to leave a comment on how this can be improved.

How to convert global enum values to string in Godot?

The "GlobalScope" class defines many fundamental enums like the Error enum.
I'm trying to produce meaningful logs when an error occurs. However printing a value of type Error only prints the integer, which is not very helpful.
The Godot documentation on enums indicates that looking up the value should work in a dictionary like fashion. However, trying to access Error[error_value] errors with:
The identifier "Error" isn't declared in the current scope.
How can I convert such enum values to string?
In the documentation you referenced, it explains that enums basically just create a bunch of constants:
enum {TILE_BRICK, TILE_FLOOR, TILE_SPIKE, TILE_TELEPORT}
# Is the same as:
const TILE_BRICK = 0
const TILE_FLOOR = 1
const TILE_SPIKE = 2
const TILE_TELEPORT = 3
However, the names of the identifiers of these constants only exist to make it easier for humans to read the code. They are replaced at runtime with something the machine can use, and are inaccessible later. If I want to print an identifier's name, I have to do so manually:
# Manually print TILE_FLOOR's name as a string, then its value.
print("The value of TILE_FLOOR is ", TILE_FLOOR)
So if your goal is to have descriptive error output, you should do so in a similar way, perhaps like so:
if unexpected_bug_found:
    # Manually print the error description, then actually return the value.
    print("ERR_BUG: There was an unexpected bug!")
    return ERR_BUG
Now the relationship with dictionaries is that dictionaries can be made to act like enumerations, not the other way around. Enumerations are limited to be a list of identifiers with integer assignments, which dictionaries can do too. But they can also do other cool things, like have identifiers that are strings, which I believe you may have been thinking of:
const MyDict = {
    NORMAL_KEY = 0,
    'STRING_KEY' : 1, # uses a colon instead of equals sign
}
func _ready():
    print("MyDict.NORMAL_KEY is ", MyDict.NORMAL_KEY) # valid
    print("MyDict.STRING_KEY is ", MyDict.STRING_KEY) # valid
    print("MyDict[NORMAL_KEY] is ", MyDict[NORMAL_KEY]) # INVALID
    print("MyDict['STRING_KEY'] is ", MyDict['STRING_KEY']) # valid
    # Dictionary['KEY'] only works if the key is a string.
This is useful in its own way, but even in this scenario, we assume to already have the string matching the identifier name explicitly in hand, meaning we may as well print that string manually as in the first example.
The naive approach I use lives in a singleton (in fact, in a file that contains a lot of static funcs, referenced by a class_name):
static func get_error(global_error_constant:int) -> String:
    var info := Engine.get_version_info()
    var version := "%s.%s" % [info.major, info.minor]
    var default := ["OK","FAILED","ERR_UNAVAILABLE","ERR_UNCONFIGURED","ERR_UNAUTHORIZED","ERR_PARAMETER_RANGE_ERROR","ERR_OUT_OF_MEMORY","ERR_FILE_NOT_FOUND","ERR_FILE_BAD_DRIVE","ERR_FILE_BAD_PATH","ERR_FILE_NO_PERMISSION","ERR_FILE_ALREADY_IN_USE","ERR_FILE_CANT_OPEN","ERR_FILE_CANT_WRITE","ERR_FILE_CANT_READ","ERR_FILE_UNRECOGNIZED","ERR_FILE_CORRUPT","ERR_FILE_MISSING_DEPENDENCIES","ERR_FILE_EOF","ERR_CANT_OPEN","ERR_CANT_CREATE","ERR_QUERY_FAILED","ERR_ALREADY_IN_USE","ERR_LOCKED","ERR_TIMEOUT","ERR_CANT_CONNECT","ERR_CANT_RESOLVE","ERR_CONNECTION_ERROR","ERR_CANT_ACQUIRE_RESOURCE","ERR_CANT_FORK","ERR_INVALID_DATA","ERR_INVALID_PARAMETER","ERR_ALREADY_EXISTS","ERR_DOES_NOT_EXIST","ERR_DATABASE_CANT_READ","ERR_DATABASE_CANT_WRITE","ERR_COMPILATION_FAILED","ERR_METHOD_NOT_FOUND","ERR_LINK_FAILED","ERR_SCRIPT_FAILED","ERR_CYCLIC_LINK","ERR_INVALID_DECLARATION","ERR_DUPLICATE_SYMBOL","ERR_PARSE_ERROR","ERR_BUSY","ERR_SKIP","ERR_HELP","ERR_BUG","ERR_PRINTER_ON_FIRE"]
    match version:
        "3.4":
            return default[global_error_constant]
    # Regexp to use on @GlobalScope documentation
    # \s+=\s+.+ replace by nothing
    # (\w+)\s+ replace by "$1", (with quotes and comma)
    printerr("you must check and add %s version in get_error()" % version)
    return default[global_error_constant]
So print(MyClass.get_error(err)), or assert(!err, MyClass.get_error(err)) is handy
For non-global enums I made the following; it was not your question, but it is closely related.
It would be useful to be able to access @GlobalScope and @GDScript the same way; maybe that is not possible because of a memory cost?
static func get_enum_flags(_class:String, _enum:String, flags:int) -> PoolStringArray:
    var ret := PoolStringArray()
    var enum_flags := ClassDB.class_get_enum_constants(_class, _enum)
    for i in enum_flags.size():
        if (1 << i) & flags:
            ret.append(enum_flags[i])
    return ret
static func get_constant_or_enum(_class:String, number:int, _enum:="") -> String:
    if _enum:
        return ClassDB.class_get_enum_constants(_class, _enum)[number]
    return ClassDB.class_get_integer_constant_list(_class)[number]
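The bit test in get_enum_flags can be tried in isolation. The sketch below uses Python, and the flag names are invented for the example rather than taken from any Godot class; it assumes, as the GDScript above does, that the i-th name corresponds to bit i.

```python
def decode_flags(names, flags):
    """Return the names whose bit is set, assuming names[i] labels bit i."""
    return [name for i, name in enumerate(names) if (1 << i) & flags]


# Hypothetical flag names, ordered from bit 0 upwards.
FLAG_NAMES = ["FLAG_READ", "FLAG_WRITE", "FLAG_EXECUTE"]

print(decode_flags(FLAG_NAMES, 0b101))  # ['FLAG_READ', 'FLAG_EXECUTE']
```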

Ruby implicit conversion of String into Integer (typeError)

I am trying to use a YAML file, reading from it and writing to it a list of values. On the first run of this script, the YAML file is correctly created, but then on the second it throws a conversion TypeError which I don't know how to fix.
db_yml = 'store.yml'
require 'psych'
begin
  if File.exist?(db_yml)
    yml = Psych.load_file(db_yml)
    puts "done load"
    yml['reminders']['reminder_a'] = [123,456]
    yml['reminders']['reminder_b'] = [457,635,123]
    File.write(db_yml, Psych.dump(yml) )
  else
    #the file does not exist yet, create an empty one.
    File.write(db_yml, Psych.dump(
      {'reminders' => [
        {'reminder_a'=> [nil]},
        {'reminder_b'=> [nil]}
      ]}
    )) #Store
  end
rescue IOError => msg
  # display the system generated error message
  puts msg
end
produces the file store.yml on first run:
---
reminders:
- reminder_a:
  -
- reminder_b:
  -
So far so good. But then on the second run it fails with
done load
yamlstore.rb:23:in `[]=': no implicit conversion of String into Integer (TypeError)
from yamlstore.rb:23:in `<main>'
Could you tell me where I am going wrong?
The error message says that you were passing a String where Ruby expects something that is implicitly convertible to an Integer. The number one place where Ruby expects something that is implicitly convertible to an Integer is when indexing into an Array. So, whenever you see this error message, you can be 99% sure that you are either indexing an Array with something you thought was an Integer but isn't, or that you are indexing an Array that you thought was something else (most likely a Hash). (The other possibility is that you are trying to do arithmetic with a mix of Integers and Strings.)
Just because Ruby is a dynamically-typed programming language does not mean that you don't need to care about types. In particular, YAML is a (somewhat) typed serialization format.
The type of the file you are creating looks something like this:
Map<String, Sequence<Map<String, Sequence<Int | null>>>>
However, you are accessing it, as if it were typed like this:
Map<String, Map<String, Sequence<Int | null>>>
To put it more concretely, you are creating the value corresponding to the key 'reminders' as a sequence (in YAML terms, an Array in Ruby terms) of maps (Hashes). Arrays are indexed by Integers.
You, however, are indexing it with a String, as if it were a Hash.
So, you either need to change how you access the values like this:
yml['reminders'][0]['reminder_a'] = [123, 456]
#               ↑↑↑
yml['reminders'][1]['reminder_b'] = [457,635,123]
#               ↑↑↑
Or change the way you initialize the file like this:
File.write(db_yml, Psych.dump(
  { 'reminders' => {
#                  ↑
      'reminder_a' => [nil],
#    ↑                     ↑
      'reminder_b' => [nil]
#    ↑                     ↑
    }
  }
))
so that the resulting YAML document looks like this:
---
reminders:
  reminder_a:
  -
  reminder_b:
  -
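The same mismatch is easy to reproduce in other dynamically typed languages. As an illustration only (Python instead of Ruby, with plain literals standing in for the loaded YAML), indexing a list with a string fails, while both fixes described above work:

```python
# Shape written by the original script: a list of single-key maps.
as_list = {"reminders": [{"reminder_a": [None]}, {"reminder_b": [None]}]}

try:
    as_list["reminders"]["reminder_a"] = [123, 456]
    failed = False
except TypeError as error:
    failed = True
    print("TypeError:", error)

# Fix 1: keep the list and index it with an integer first.
as_list["reminders"][0]["reminder_a"] = [123, 456]

# Fix 2: store a map of maps, so string keys work directly.
as_map = {"reminders": {"reminder_a": [None], "reminder_b": [None]}}
as_map["reminders"]["reminder_a"] = [123, 456]
```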
There is nothing wrong with the YAML file. However, you create the file with the following structure:
yaml = {
  'reminders' => [
    {'reminder_a'=> [nil]},
    {'reminder_b'=> [nil]}
  ]
}
Notice that the contents of yaml['reminders'] is an array. Where it goes wrong is here:
reminders = yaml['reminders']
reminder_a = reminders['reminder_a'] # <= error
# in the question example:
# yml['reminders']['reminder_a'] = [123,456]
Since reminders is an array, you can't access it by passing a string as the index. You have two options:
1. In my opinion the best option (if you want to access the reminders by key) is to change the structure to use a hash instead of an array:
yaml = {
  'reminders' => {
    'reminder_a'=> [nil],
    'reminder_b'=> [nil]
  }
}
With the above structure you can access your reminder through:
yaml['reminders']['reminder_a']
2. Somewhat clumsier: find the array element with the correct key:
yaml['reminders'].each do |reminder|
  reminder['reminder_a'] = [123,456] if reminder.key? 'reminder_a'
  reminder['reminder_b'] = [457,635,123] if reminder.key? 'reminder_b'
end

Processing a list of Scalaz6 Validation

Is there an idiomatic way to handle a collection of Validation in Scalaz6?
val results:Seq[Validation[A,B]]
val exceptions = results.collect{case Failure(exception)=>exception}
exceptions.foreach{logger.error("Error when starting up ccxy gottware",_)}
val success = results.collect{case Success(data)=>data}
success.foreach {data => data.push}
if (exceptions.isEmpty)
  containers.foreach( _.start())
I could think of using a fold when looping on results, but what about the final test?
The usual way to work with a list of validations is to use sequence to turn the list into a Validation[A, List[B]], which will be empty (i.e., a Failure) if there were any errors along the way.
Sequencing a Validation accumulates errors (as opposed to Either, which fails immediately) in the semigroup of the left-hand type. This is why you often see ValidationNEL (where the NEL stands for NonEmptyList) used instead of simply Validation. So for example if you have this result type:
import scalaz._, Scalaz._
type ExceptionsOr[A] = ValidationNEL[Exception, A]
And some results:
val results: Seq[ExceptionsOr[Int]] = Seq(
  "13".parseInt.liftFailNel, "42".parseInt.liftFailNel
)
Sequencing will give you the following:
scala> results.sequence
res0: ExceptionsOr[Seq[Int]] = Success(List(13, 42))
If we had some errors like this, on the other hand:
val results: Seq[ExceptionsOr[Int]] = Seq(
  "13".parseInt.liftFailNel, "a".parseInt.liftFailNel, "b".parseInt.liftFailNel
)
We'd end up with a Failure (note that I've reformatted the output to make it legible here):
scala> results.sequence
res1: ExceptionsOr[Seq[Int]] = Failure(
  NonEmptyList(
    java.lang.NumberFormatException: For input string: "a",
    java.lang.NumberFormatException: For input string: "b"
  )
)
So in your case you'd write something like this:
val results: Seq[ValidationNEL[A, B]]
results.sequence match {
  case Success(xs) => xs.foreach(_.push); containers.foreach(_.start())
  case Failure(exceptions) => exceptions.foreach(
    logger.error("Error when starting up ccxy gottware", _)
  )
}
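To make the behaviour of sequence concrete, here is a small sketch of the same idea in Python. It is not Scalaz: Success, Failure, parse_int and sequence are stand-ins defined here for illustration. The point is only that every failure is collected, and the combined result is a success only if no element failed.

```python
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass(frozen=True)
class Success:
    value: Any


@dataclass(frozen=True)
class Failure:
    errors: List[str]  # kept non-empty by construction


def parse_int(text: str) -> Union[Success, Failure]:
    try:
        return Success(int(text))
    except ValueError:
        return Failure([f'For input string: "{text}"'])


def sequence(results: List[Union[Success, Failure]]) -> Union[Success, Failure]:
    """Combine many results into one, accumulating every error."""
    errors: List[str] = []
    values: List[Any] = []
    for result in results:
        if isinstance(result, Failure):
            errors.extend(result.errors)  # accumulate, do not stop early
        else:
            values.append(result.value)
    return Failure(errors) if errors else Success(values)


print(sequence([parse_int("13"), parse_int("42")]))
print(sequence([parse_int("13"), parse_int("a"), parse_int("b")]))
```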
See my answers here and here for more detail about sequence and about Validation more generally.
