Do I need to set length for every POCO property in Entity Framework Code First?

Do I need to set a length for every POCO property in Entity Framework Code First? If I don't
set StringLength or MaxLength/MinLength for a property, it becomes nvarchar(max). How bad is nvarchar(max)? Should I just leave it alone during development and tighten it up before production?

You should define a max length for each property where you want to restrict the length. Note that the nvarchar(max) data type is different from the nvarchar(n) data type, where n is a number from 1 to 4000. The max version, which is what you get when you define no max length, is meant for large blocks of text, like paragraphs and the like. It can handle extremely large values, so the data is stored separately from the rest of the fields of the record. nvarchar(n), on the other hand, is stored inline with the rest of the row.
It's probably best to go ahead and set those values as you want them now, rather than waiting to do so later. Choose values that are as large as you will ever need, so you never have to increase them. nvarchar(n) stores its data efficiently; for example, an nvarchar(200) column does not necessarily take up 200 characters of space; it only uses enough space to store what is actually put into it, plus a couple of extra bytes for storing its length.
So whenever possible, you should set a limit on your entity's text fields.
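For illustration, this is roughly the kind of SQL Server DDL Code First produces for a hypothetical Customers table (the table and column names here are just examples), depending on whether a max length is configured via the [MaxLength]/[StringLength] attributes or HasMaxLength in the fluent API:
-- Hypothetical Customers table as EF Code First would generate it:
CREATE TABLE Customers (
    Id    INT IDENTITY PRIMARY KEY,
    Notes NVARCHAR(MAX) NULL,       -- no max length configured: large-object type, pushed off-row when big
    Name  NVARCHAR(100) NOT NULL,   -- [MaxLength(100)] / HasMaxLength(100): stored in-row
    Email NVARCHAR(256) NULL        -- [MaxLength(256)]
);
Note that nvarchar(max) columns cannot be used as index key columns, which is another reason to set a sensible limit on anything you plan to search or join on.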

NVARCHAR is a variable-length type, so it consumes only the space the stored value actually needs. NCHAR, on the other hand, always allocates its full declared width rather than allocating on demand the way NVARCHAR does.
MSDN advises using nvarchar when the sizes of the column data entries are likely to vary considerably.
For me, that's the way to go in the early stages of a project. You can tune it later when needed.
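A quick way to see the difference on SQL Server (assumed here, since the question is about nvarchar):
-- NVARCHAR stores only the characters you put in (plus 2 bytes of length overhead);
-- NCHAR always pads to its full declared width.
SELECT DATALENGTH(CAST(N'abc' AS NVARCHAR(50))) AS nvarchar_bytes,  -- 6   (3 chars * 2 bytes each)
       DATALENGTH(CAST(N'abc' AS NCHAR(50)))    AS nchar_bytes;     -- 100 (50 chars * 2 bytes each)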

According to a blog post on the subject, nvarchar(max) does not behave like ntext until the actual value reaches roughly 4000 characters (the row-size limit is 8 KB, and wide characters take two bytes each). Once it exceeds that size, it behaves pretty much the same as ntext. So as far as I'm concerned, I don't see any good reason to avoid the nvarchar(max) data type.

Related

Storing long text in Datastore

Is Datastore suitable to store really long text, e.g. profile descriptions and articles?
If not, what's the Google Cloud alternative?
If yes, what would be the ideal way to store it in order to maintain formatting such as line breaks and Markdown-supported keywords? Simply store it as a string, or convert it to bytes? And should I be worried about dirty user input?
I need it for a Go project (I don't think the language is relevant, but maybe Go has some useful features for this).
Yes, it's suitable if you're OK with certain limitations.
These limitations are:
the overall entity size (properties + indices) must not exceed 1 MB (this should be OK for profiles and most articles)
texts longer than a certain limit (currently 1500 bytes) cannot be indexed, so the entity may store a longer string, but you won't be able to search in it / include it in query filters; don't forget to tag these fields with "noindex"
As for the type, you may simply use string, e.g.:
type Post struct {
    UserID  int64  `datastore:"uid"`
    Content string `datastore:"content,noindex"`
}
The string type preserves all formatting, including newlines, HTML and Markdown markup, and whatever else you put in it.
"Dirty user input?" That's an issue of rendering / presenting the data. The Datastore will not try to interpret the content or perform any action based on it, nor will it transform it. So from the Datastore's point of view you have nothing to worry about (you don't build GQL queries by concatenating raw text, right?!).
Also note that if you store large texts in your entities, those large texts will be fetched whenever you load / query such entities, and you must also send them whenever you modify and (re)save such an entity.
Tip #1: Use projection queries if you don't need the whole texts in certain queries to avoid "big" data movement (and so to ultimately speed up queries).
Tip #2: To "ease" the burden of not being able to index large texts, you may add duplicate properties like a short summary or title of the large text, because string values shorter than 1500 bytes can be indexed.
Tip #3: If you want to go over the 1 MB entity size limit, or you just want to generally decrease your datastore size usage, you may opt to store large texts compressed inside entities. Since they are long, you can't search / filter them anyway, but they are very well compressed (often below 40% of the original). So if you have many long texts, you can shrink your datastore size to like 1 third just by storing all texts compressed. Of course this will add to the entity save / load time (as you have to compress / decompress the texts), but often it is still worth it.

SQLite DB Size Column Data Type Considerations

I'm working with an SQLite DB where all columns are of NVARCHAR data type.
Coming from an MS SQL background, I know that NVARCHAR has additional baggage associated with it, so my first impulse is to refactor most column types to have concrete string lengths enforced (most are under 50 chars long).
But at the same time I know that SQLite treats things a bit differently.
So my question is should I change/refactor the column types? And by doing so is there anything to gain in terms of disk space or performance in SQLite?
DB runs on Android/iOS devices.
Thanks!
You should read https://www.sqlite.org/datatype3.html.
CHARACTER(20), VARCHAR(255), VARYING CHARACTER(255), NCHAR(55), NATIVE CHARACTER(70), NVARCHAR(100), TEXT and CLOB are all treated as TEXT.
Also, SQLite does not enforce declared lengths.
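A quick experiment (hypothetical table name) shows both points, the TEXT affinity and the fact that declared lengths are ignored:
CREATE TABLE t (name NVARCHAR(5));
INSERT INTO t VALUES ('much longer than five characters');
SELECT typeof(name), length(name) FROM t;
-- returns: text | 32   (the full value is stored; the (5) is not enforced)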
So instead of digging through cryptic documentation, I did a bit of experimenting with column types.
The database I'm working with has well over 1 million records, with most columns as NVARCHAR, so any change in column data types was easily visible in the file-size deltas.
Here are the results I found in effort to reduce DB size:
NVARCHAR:
Biggest savings came from switching column types, where possible, from NVARCHAR to plain INT or FLOAT. On a DB file of 80 MB the savings were in megabytes, very noticeable. With some additional refactoring I dropped the size down to 47 MB.
NVARCHAR vs. VARCHAR:
Made very little difference, perhaps a few KB on a DB of around 80 MB.
NVARCHAR vs. OTHER String Types:
Switching between the various string-based types made almost no difference; as the documentation points out, all string types are stored the same way in SQLite, as TEXT.
INT vs. other numeric types:
No difference here; SQLite stores them all under its numeric storage classes in the end.
Indexes based on NVARCHAR columns also took up more space; once re-indexed on INT columns I shed a few MB.
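For reference, a sketch of the kind of refactor that produced the biggest savings (the table and column names are hypothetical; SQLite cannot change a column's type in place, so the table has to be rebuilt):
BEGIN;
CREATE TABLE readings_new (
    id     INTEGER PRIMARY KEY,
    sensor TEXT,
    value  INTEGER            -- was NVARCHAR(50)
);
INSERT INTO readings_new (id, sensor, value)
    SELECT id, sensor, CAST(value AS INTEGER) FROM readings;
DROP TABLE readings;
ALTER TABLE readings_new RENAME TO readings;
COMMIT;
VACUUM;  -- rebuild the file so the freed pages actually shrink it on disk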

Oracle Performance terrible after changing Varchar2 fields to NVarchar2

I've been developing a DotNet project on Oracle (version 10.2) for the last couple of months and was using VARCHAR2 for my string data fields. This was fine, and when navigating the project, page refreshes were never more than half a second, if even that (it's quite a data-intensive project). The data is referenced from two different schemas: one a centralised store of data, and one my own. Now the centralised schema will be changing to be Unicode compliant (but hasn't yet), so all VARCHAR2 fields will become NVARCHAR2. In preparation for this I changed all the fields in my schema to NVARCHAR2, and since then performance has been horrible: up to 30/40 second page refreshes.
Could this be because VARCHAR2 fields in the centralised schema are joined against NVARCHAR2 fields in my schema in some stored procedures? I know NVARCHAR2 is twice the size of VARCHAR2, but that wouldn't explain such a sudden, massive change. Any tips on what to look for would be great; if I haven't explained the scenario well enough, do ask for more information.
Firstly, do a
select * from v$nls_parameters where parameter like '%SET%';
Character sets can be complicated. You can have single-byte character sets, fixed-size multi-byte character sets and variable-size multi-byte character sets. See the Unicode descriptions in the Oracle documentation.
Secondly, if you are joining a string in a single-byte character set to a string in a two-byte character set, you have a choice. You can do a binary/byte comparison (which generally won't match anything when you compare a single-byte character set with a two-byte character set). Or you can do a linguistic comparison, which will generally mean some CPU cost, as one value is converted into the other, and often the failure to use an index.
Indexes are ordered: A, B, C and so on. But a character like Ä may fall in different places depending on the linguistic order. Say the index structure puts Ä between A and B, but then you do a linguistic comparison whose language puts Ä after Z; in that case the index can't be used. (Remember, your condition could be a BETWEEN rather than an =.)
In short, you'll need a lot of preparation, both in your schema and the central store, to enable efficient joins between different charactersets.
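As a concrete illustration (the schema, table and column names below are made up), a join between a VARCHAR2 column in the central schema and an NVARCHAR2 column in your own schema forces an implicit conversion of the VARCHAR2 side to the national character set, which typically stops the optimizer from using a normal index on that column:
-- central.customers.cust_ref is VARCHAR2(20); myschema.notes.cust_ref is NVARCHAR2(20)
EXPLAIN PLAN FOR
SELECT n.note_text
FROM   central.customers c
JOIN   myschema.notes n ON c.cust_ref = n.cust_ref;

SELECT * FROM TABLE(dbms_xplan.display);
-- In the predicate section the VARCHAR2 side often shows up wrapped in an internal
-- conversion (e.g. SYS_OP_C2C), and the plan can fall back to a full table scan
-- instead of the index range scan you had when both sides were VARCHAR2.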
It is difficult to say anything based on what you have provided. Did you manage to check if the estimated cardinalities and/or explain plan changed when you changed the datatype to NVARCHAR2? You may want to read the following blog post to see if you can find a lead
http://joze-senegacnik.blogspot.com/2009/12/cbo-oddities-in-determing-selectivity.html
It is likely that Oracle is no longer able to use indexes that it previously could. As Narendra suggests, check the explain plan to see what changed. It is possible that once the centralised store is changed the indexes will again be usable. I suggest testing that path.
Setting NLS_LANG properly on the client is essential to proper data conversion. The character set specified by NLS_LANG should reflect the setting of the client operating system. Setting NLS_LANG correctly enables proper conversion from the client operating system code page to the database character set. When these settings are the same, Oracle assumes that the data being sent or received is encoded in the same character set as the database character set, so no validation or conversion is performed. This can lead to corrupt data if conversion was actually necessary.

Is it reasonable to use small blobs in Oracle?

In Oracle, LONG RAW and VARCHAR2 have a maximum length of 4 KB, but I need to store objects of 8 KB and 16 KB, so I'm wondering what a good solution is. I know I could use a BLOB, but a BLOB has variable length and, if I understand correctly, is basically an extra file behind the scenes, a feature and a price I'm not interested in paying for my objects.
Are there any other solutions or datatypes that are more suited to this kind of need?
Thanks
A BLOB is not a file behind the scenes. It is stored in the database. Why does it matter that it has variable length? You can just use a BLOB column (or a CLOB if your data is text), and it gets stored in its own segment.
You should use a BLOB.
A BLOB is not stored as an extra file, it's stored as a block in one of your datafiles (just like other data). If the BLOB becomes too large for a single block (which may not happen in your case) then it will continue in another block.
If your BLOB data is really small, you can get Oracle to store it inline with other data in your row (like a varchar2).
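A sketch of what that looks like (the table name is made up; ENABLE STORAGE IN ROW is actually the default and keeps LOB values up to roughly 4000 bytes in the row itself, while larger values automatically move to the LOB segment):
CREATE TABLE documents (
    id      NUMBER PRIMARY KEY,
    payload BLOB
)
LOB (payload) STORE AS (
    ENABLE STORAGE IN ROW
    CACHE
);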
Internally, Oracle is doing something similar to what Pax suggests below: the chunks are as big as a DB block minus some overhead. If you try to re-invent Oracle features on top of Oracle, it's only going to be slower than the native feature.
You will also have to re-implement a whole heap of functionality that is already provided in DBMS_LOB (length, comparisons, etc).
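For example, getting a length is a one-liner with the built-in package (shown against the hypothetical documents table from the earlier sketch):
SELECT id, DBMS_LOB.GETLENGTH(payload) AS payload_bytes
FROM   documents;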
Why don't you segment the binary data and store it in 4K chunks? You could either have four different columns for these chunks (and a length column for rebuilding them into your big structure) or the more normalized way of another table with the chunks in it tied back to the original table record.
This would provide for expansion should you need it in future.
For example:
-- Primary table:
CREATE TABLE primary_table (
    -- normal columns --
    chunk_id       INTEGER,
    chunk_last_len INTEGER
);
-- Chunk table:
CREATE TABLE chunk_table (
    chunk_id       INTEGER,
    chunk_sequence INTEGER,
    chunk          VARCHAR2(4000),  -- "whatever" size suits your chunking scheme
    PRIMARY KEY (chunk_id, chunk_sequence)
);
Of course, you may find that your DBMS does exactly that sort of thing under the covers for BLOBs, and it may be more efficient to let Oracle handle it, relieving you of the need to manually reconstruct your data from individual chunks. I'd measure the performance of each to figure out the best approach.
Don't store binary data in varchar2 columns, unless you are willing to encode them (base64 or similar). Character set issues might corrupt your data otherwise!
Try the following statement to see the effect:
select * from (select rownum-1 original, ascii(chr(rownum-1)) data from user_tab_columns where rownum<=256) where original<>data;
VARCHAR2 is variable-length just as well. If you need to store binary data of anything more than a small size in your database, you'll have to look in the BLOB's direction. Another solution is of course storing the binary data somewhere on the file system and storing the path to the file as a varchar in the DB.

How does SCN_TO_TIMESTAMP work?

Does the SCN itself encode a timestamp, or is it a lookup from some table?
In an AskTom post, Tom explains that the timestamp (accurate to +/- 3 seconds) is stored in a raw field in smon_scn_time. Is that where the function gets the value?
If so, when is that table purged, if ever? And what triggers that purge?
If it is purged, does that make it impossible to translate old SCNs to timestamps?
If it's impossible, then that rules out any long-term uses of that field (read: auditing).
If I put that function in a query, would joining to that table be faster?
If so, does anyone know how to convert that raw column?
The SCN does not encode a time value. I believe it is an autoincrementing number.
I would guess that SMON is inserting a row into SMON_SCN_TIME (or whatever table underlies it) every time it increments the SCN, including the current timestamp.
I queried for the minimum recorded timestamp in several databases and they all go back about 5 days and have a little under 1500 rows in the table. So it is less than the instance lifetime.
I imagine the lower bound on how long the data is kept might be determined by the DB_FLASHBACK_RETENTION_TARGET parameter, which defaults to 1 day.
I would recommend using the function, they've probably provided it so they can change the internals at will.
No idea what the raw column TIM_SCN_MAP contains, but the TIME_DP and SCN columns would appear to give you the mapping.
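For reference, a quick sketch of both approaches (the second query needs access to the SYS-owned table; column names are as seen on 10g/11g):
-- Map the database's current SCN to an approximate timestamp
SELECT current_scn,
       SCN_TO_TIMESTAMP(current_scn) AS approx_time
FROM   v$database;

-- Peek at the SCN-to-time mapping that SMON maintains
SELECT scn, time_dp
FROM   sys.smon_scn_time
ORDER  BY scn;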
