How to avoid inserting the wrong data type in SQLite tables? - debugging

SQLite has this "feature" whereas even when you create a column of type INTEGER or REAL, it allows you to insert a string into it, even a string without numbers in it, like "the quick fox jumped over the lazy dog".
How do you prevent this kind of insertions to happen in your projects?
I mean, when my code has an error that leads to that kind of insertions or updates, I want the program to give an error, so I can debug it, not simply insert garbage in my database silently.

You can implement this using the CHECK constraint (see previous answer here). This would look like
CREATE TABLE T (
N INTEGER CHECK(TYPEOF(N) = 'integer'),
Str TEXT CHECK(TYPEOF(Str) = 'text'),
Dt DATETIME CHECK(JULIANDAY(Dt) IS NOT NULL)
);

The better and safer way is write a function (isNumeric, isString, etc) that validates the user input...

Related

(var)char as the type of the column for performance?

I have a column called "status" in PostgreSQL. First it used to be "status_id" of type integer. The values were kept on client, so there was no table on the server called statuses where I'd keep those statuses and then do inner join with the first table.
I used to send the ids of the statuses from the client (they had the names on the client). However, at some point I understood I'd better make the server hold those statuses. Not in a separate table but in the first one and I want to make them strings. So the initial table will have a status column of type string (varchar, to be more specific). I read it wouldn't be that slow.
In general, is it a good idea? I suppose it is because doing inner join (in case I'd keep statuses in the separate table) each time is expensive as well as sending ids from the client.
1) The only concern I have is that the column status should be of type char, not varchar. It should make it more effective I suppose. Is that so?
2) If the first case is correct then I'm not sure I'll be able to name all the statuses using exactly the same amount of characters, let's say, 5 characters. Some of them might be longer, some shorter. How can I solve this?
UPDATE:
It's not denationalization because I'm talking about 1 single table. There is no and has never been the second table called Statuses with the fields (id, status_name).
What I'm trying to convey is that I could use char(n) for status_name and also add index on it. Then it should be fast enough. However, it might be or not possible to name all the statuses with the certain (n) amount of characters and that's the only concern.
I don't think so using char or varchar instead integer is good idea. It is hard to expect how much slower it will be than integer PK, but this design will be slower - impact will be more terrible when you will join larger tables. If you can, use ENUM types instead.
http://www.postgresql.org/docs/9.2/static/datatype-enum.html
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TABLE person (
name text,
current_mood mood
);
INSERT INTO person VALUES ('Moe', 'happy');
SELECT * FROM person WHERE current_mood = 'happy';
name | current_mood
------+--------------
Moe | happy
(1 row)
PostgreSQL varchar and char types are very similar. Internal implementation is same - char can be (it is paradox) little bit slower due addition by spaces.
I'd go one step further. Never use the outdated data type char(n), unless you know you have to (for compatibility or some rare exotic reason). The type is utterly useless in a modern database. Padding strings with blank characters is nonsense, and if you have to do it, you can do it in a cheaper fashion with rpad() on data retrieval.
SELECT rpad('short', 10) AS char_10_string;
varchar is basically the same as text and allows a length specifier: varchar(n). I generally use just text. If I need to limit the length, I use a CHECK constraint. Here's one example, why.
Whenever you can use a simple integer (or enum) instead, that's a bit smaller and faster in every respect. Consider #Pavel's answer for enum.
As for:
because doing inner join (...) each time is expensive
Well, it carries a small cost, but it's generally cheaper than redundantly saving text representation of the status instead of a much cheaper integer in the main table. That kind of rumor is spread by people having problems understanding the concept of database normalization. The enum type is a compromise here - for relatively static sets of values.

SQLITE3 strings in where clauses seem confused

I'm wondering if anyone has any clarification on the difference between the following statements using sqlite3 gem with ruby 1.9.x:
#db.execute("INSERT INTO table(a,b,c) VALUES (?,?,?)",
some_int, other_int, some_string)
and
#db.execute("INSERT INTO table(a,b,c) VALUES (#{some_int},"+
+"#{some_int}, #{some_string})")
My problem is: When I use the first method for insertion, I can't query for the "c" column using the following statement:
SELECT * FROM table WHERE c='some magic value'
I can use this:
"SELECT * FROM table WHERE c=?", "some magic value"
but what I really want to use is
"SELECT * FROM table WHERE c IN ('#{options.join("','")}')"
And this doesn't work with the type of inserts.
Does anyone know what the difference is at the database level that is preventing the IN from working properly?
I figured this out quite a while ago, but forgot to come back and point it out, in case someone finds this question at another time.
The difference turns out to be blobs. Apparently when you use the first form above (the substitution method using (?,?)) SQLite3 uses blogs to enter the data. However, if you construct an ordinary SQL statement, it's inserted as a regular string and the two aren't equivalent.
Insert is not possible to row query but row query used in get data that time this one working.
SQLite in you used in mobile app that time not work bat this row query you write in SQLite Browse in that work

Reversing a string using an index in Oracle

I have a table that has IDs and Strings and I need to be able to properly index for searching for the end of the strings. How we are currently handling it is copying the information into another table and reversing each string and indexing it normally. What I would like to do is use some kind of index that allows to search in reverse.
Example
Data:
F7421kFSD1234
d7421kFSD1235
F7541kFSD1236
d7421kFSD1234
F7421kFSD1235
b8765kFSD1235
d7421kFSD1234
The way our users usually input thier search is something along the lines of...
*1234
By reversing the strings (and the search string: 4321*) I could find what I am looking for without completely scanning the whole table. My question is: Is making a second table the best way of doing this?
Is there a way to reverse index?
Ive tried an index like this...
create index REVERSE_STR_IDX on TABLE(STRING) REVERSE;
but oracle doesn't seem to be using it according to the Explain Plan.
Thanks in advance for the help.
Update:
I did have a problem with unicode characters not being reversed correctly. The solution to this was casting them.
Example:
select REVERSE(cast(string AS varchar2(2000)))
from tbl
where id = 1
There is the myth that a reverse key index can be used for that, however, I've never seen that in action.
I would try a "manual" function based index.
CREATE INDEX REVERSE_STR_IDX on TBL(reverse(string));
SELECT *
FROM TBL
WHERE reverse(string) LIKE '4321%';

Insert VS (Select and Insert)

I am writing a simple program to insert rows into a table.But when i started writing the program i got a doubt. In my program i will get duplicate input some times. That time i have to notify the user that this already exists.
Which of the Following Approaches is good to Use to achieve this
Directly Perform Insert statement will get the primary key violation error if it is duplicate notify otherwise it will be inserted. One Query to Perform
First make a search for the primary key values. If found a Value Prompt User. Otherwise perform insert operation.For a non-duplicate row this approach takes 2 queries.
Please let me know trade-offs between these approaches. Which one is best to follow ?
Regards,
Sunny.
I would choose the 2nd approach.
The first one would cause an exception to be thrown which is known to be very expensive...
The 2nd approach would use a SELECT count(*) FROM mytable WHERE key = userinput which will be very fast and the INSERT statement for which you can use the same DB connection object (assuming OO ;) ).
Using prepared statements will pre-optimize the queries and I think that will make the 2nd approach much better and mre flexible than the first one.
EDIT: depending on your DBMS you can also use a if not exists clause
EDIT2: I think Java would throw a SQLExcpetion no matter what went wrong, i.e. using the 1st approach you wouldn't be able to differ between a duplicate entry or an unavailable database without having to parse the error message - which is again a point for using SELECT+INSERT (or if not exists)

iterating over linq entity column

i need to insert a record with linq
i have a namevaluecollection with the data from a form post..
so started in the name=value&name2=value2 etc.. type format
thing is i need to inset all these values into the table, but of course the table fields are typed, and i need to type up the data before inserting it
i could of course explicitly do
linqtableobj.columnproperty = convert.toWhatever(value);
but i have many columns in the table, and the data coming back from the form, doesnt always contain all fields in the table
thought i could iterate over the linq objects columns, getting their datatype - to use to convert the appropriate value from the form data
fine all good, but then im still stuck with doing
linqtableobj.columnproterty = converted value
...if there is one for every column in the table
foreach(col in newlinqrowobj)
{
newlinqobj[col] = convert.changetype(namevaluecollection[col.name],col.datatype)
}
clearly i cant do that, but anything like that possible.. or
is it possible to loop around the columns for the new 'record' setting the values as i go.. and i guess grabbing the types at that point to do the conversion
stumped i am
thanks
nat
If you have some data type with a hundred different properties, and you want to copy those into a completely different data type with a hundred different properties, then somehow somewhere in your code you are going to have to define a hundred different "mapping" instructions. It doesn't matter what framework you are using, or whether the "mapping" instructions are lines of C# code, XML elements, lambda functions, proprietary "stuff", or whatever. There's no getting away from it.
Bearing that in mind, having one line of code per property looks to me like the fastest, simplest, most readable and maintainable solution.
If I understood your problem correctly, you could use reflection (or dynamic code generation if it is performance sensitive) to circumvent your typing problems
There is a preety good description of how to do something like this at codeproject.
Basically you get a PropertyInfo for the property you want to set (if it's not a property I think you would need dynamic code generation) and use it's setValue method (after calling the appropriate Convert.ChangeType of course). This will basicall circumvent the whole static typing, so there you are.

Resources