SQL Azure and Membership Provider Tenant ID - asp.net-membership

What might be a good way to introduce a BIGINT into the ASP.NET Membership functionality to reference users uniquely and to use that BIGINT field as a tenant_id? Ideally I would keep the existing functionality that generates UserIds as GUIDs and not implement a membership provider from scratch. Since the application will be running on multiple servers, the BIGINT tenant_id must be unique without depending on some central authority to generate the IDs. It will then be easy to use these tenant_ids with a SPLIT AT command down the road, which will allow bucketing users into new federation members. Any thoughts on this?
Thanks

You can use bigint, but you may have to modify all stored procedures that rely on the user ID. Making the ID globally unique is usually not a problem: as long as the ID is the primary key, the database will enforce uniqueness, and you will get an error when inserting a duplicate (in which case you can generate a new ID and retry).
So the most important difference is that you may need to modify stored procedures. You have a choice here. If you use GUIDs, you don't need to do anything, but it may be difficult to predict how to split the federation to balance queries. As pointed out in another thread (http://stackoverflow.com/questions/10885768/sql-azure-split-on-uniqueidentifier-guid/10890552#comment14211028_10890552), you can split the existing data at its mid point, but you cannot know which federation member future data will land in. There is a risk that the federations will become unbalanced, and you may need to merge and split them at regular intervals to keep them in shape.
By using bigint, you have better control over the key. For example, suppose you have two federations: the first covers IDs 1 to 10000 and the second covers IDs 10001 to 20000. When creating a new user, you first check how many records are in each federation. If federation 1 has 500 records and federation 2 has 1000, then to balance the load you insert into federation 1, so you choose an ID between 1 and 10000. The trade-off is that with bigint you may need to do more work to modify the stored procedures.
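A rough sketch of what this could look like with SQL Azure Federations DDL, assuming a federation named UserFederation and a bigint distribution key tenant_id (all names here are illustrative, not from the original question):

-- Create a federation distributed on a bigint key.
CREATE FEDERATION UserFederation (tenant_id BIGINT RANGE);
GO
-- Connect to the federation member covering a given tenant before creating
-- federated tables or inserting rows for that tenant.
USE FEDERATION UserFederation (tenant_id = 1) WITH RESET, FILTERING = OFF;
GO
-- A federated table must contain the distribution key.
CREATE TABLE app_users
(
    tenant_id BIGINT NOT NULL,
    user_id   UNIQUEIDENTIFIER NOT NULL,  -- keep the membership GUID as-is
    user_name NVARCHAR(256) NOT NULL,
    PRIMARY KEY (tenant_id, user_id)
) FEDERATED ON (tenant_id = tenant_id);
GO
-- Later, split a member at a chosen boundary to rebalance tenants.
ALTER FEDERATION UserFederation SPLIT AT (tenant_id = 10000);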

Related

DynamoDB Throughput vs Search time

I've just figured out a big mistake I made while creating my DynamoDB structure.
I've created 11 tables, where one of them is the main table most queries refer to and the others are complementary tables.
For example, I have a table called "Names" where I hold names (together with other info) and another table called "NamesMappings" holding all the names added to the "Names" table. Each time a user wants to add a name to the "Names" table, he first tries to put the name into "NamesMappings", and only if that succeeds (meaning the name doesn't already exist) does he add the name to the "Names" table. This helps because the name is not unique and is not the primary key in the "Names" table; with this technique I don't have to search inside the "Names" table to see whether the name exists, but instead I can try to add it to the "NamesMappings" table, and only if that succeeds do I know it is a unique name.
First of all, I would like to ask whether this is a common approach or there is a better one?
Next, I realized that with this design I quickly ended up with 11 tables, each with 5 provisioned read and 5 provisioned write capacity units, which adds up to 55 provisioned reads and writes under the free tier. Then I understood why I get all these charges each month: as the number of tables grows and I leave the provisioned capacity at its default (both read and write capacity at 5), I accumulate more and more provisioned capacity.
So, what should my conclusion be? Should I try to reduce the number of tables even if it takes more effort to perform scanning and querying inside a table? Or should I keep splitting the tables as I do now but reduce the capacity of the mapping tables that are used only to indicate whether an item exists in another table?
If I understand your problem correctly, you're missing the whole concept of NoSQL databases.
Your Names table should have a Hash key (which is similar to a primary key) holding a uniformly generated identifier (a UUID is a great candidate). This automatically makes the table queryable by this unique identifier. You said, however, that you don't know the ID, only the Name. This leads me to think you could create a Global Secondary Index (GSI) on the Name attribute inside the Names table so you can also query by Name. Up to this point, your table structure should look like this:
id | name
Both of them are independently queryable, which gives you a lot of flexibility already.
Now, let's say you want to add the NameMapping attribute (whose exact shape I don't know): you can simply add it to the Names table, getting rid of the NamesMappings table and greatly reducing the number of WCUs and RCUs across your account. Your table structure should now look like this:
id | name | mappings
where mappings is, let's say, a JSON object.
Since you can only query on top level attributes in DynamoDB, you can now perform a query against the name attribute which has a GSI configured. If the query returns nothing, then name is unique. But let's say you still need some data inside the mappings object, then you could query by name and, in your code, you could apply a map/filter/reduce operation on the mappings attribute and decide what to do next.
Remember that duplication is just fine in a NoSQL world. This may look scary if you come from a purely SQL background, but in a NoSQL database data should be stored in such a way that you can fetch all the needed information in one go, therefore avoiding "joins" (joins are still possible, but since there are no strong relationships between entities, you need to perform them manually at the code level). To give you some real context, imagine you have an Orders table where you keep track of the ordered Products and the Store that the Order belongs to: you'd store the Product and Store objects themselves (and not just their IDs, as you would the SQL way) inside the Order object, so if you query for a given OrderId in the future you don't need extra calls (aka "joins") to the Product/Store tables, since everything is already stored inside the Order object.
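As a small illustration of the name-uniqueness check described above, DynamoDB's PartiQL interface lets you express the GSI lookup in SQL-like form (the table name "Names" and index name "name-index" are assumptions for this sketch, as is the sample value):

-- Query the GSI on "name"; an empty result means the name is not in use yet.
SELECT *
FROM "Names"."name-index"
WHERE "name" = 'alice'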

What might be the purpose of this column in eTRM (Oracle eBusiness suite)

I realize this is quite a specialized question (about Oracle's eTRM + eBusiness Suite). I'm trying to figure out the meaning of this column:
REMIT_TO_ADDRESS_ID NUMBER (15)
which comes from the AR.RA_CUSTOMER_TRX_ALL table. The reason is that a query I have contains a bug in a clause like this:
LEFT OUTER JOIN ra_customer_trx_all
ON rct.REMIT_TO_ADDRESS_ID = acct.REMIT_TO_ADDRESS_ID
(acct is an alias for the table hz_cust_acct_sites_all, by the way)
My guess is that REMIT_TO_ADDRESS_ID is some kind of meta-data?
I really appreciate any pointers/tips. Thanks.
Little bit rusty, but did Oracle Apps for 10 years. From your question I understand that you are new to Oracle Apps technology. ra_customer_trx_all stands for:
"RA" => "Accounts Receivables" also known as "AR" (something you sell and want money for),
"customer" says it,
"trx" => "transactions",
"_all" => all records across all organisations (multi-org).
It is a nice table with lots of features :-)
When a column in Oracle Apps has a name ending in '_id' and a data type of NUMBER(15, 0), it is generally a reference to a row in another table. Depending on the Oracle Apps module, you will sometimes also find a foreign key constraint, but generally most Oracle Apps modules rely on the front end to enforce referential integrity.
So remit_to_address_id refers to a row in another table, in this case address information. The naming of the column also tells us that the referenced row is used in a special way (role), namely as the "remit to" address.
You might want to join it to the address table of Apps. When you do so, please check the columns listed in the indexes. The multi-org column org_id may be listed first (probably not in AR). If you leave it out, you will still get correct results, since the IDs are unique across the system, but the index might not be used.
For end-user queries, I generally recommend using the multi-org view instead of the _all table. This ensures that users only see their current organisation. Remember that you need to set up the client_identifier session variable (if I recall correctly) to store the current organisation ID.
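If it helps, here is a rough sketch of setting the org context before querying the org-striped views; the exact call depends on the EBS release, and the org_id value 204 is just an example:

-- Release 12: initialise the multi-org access control context for one operating unit.
BEGIN
  mo_global.set_policy_context('S', 204);
END;
/
-- Release 11i: set the org context for the session.
BEGIN
  fnd_client_info.set_org_context('204');
END;
/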
I hope this helps you.
I have no knowledge of eTRM, or any other Oracle business application.
That said, as a complete wild guess, I would say that the REMIT_TO_ADDRESS_ID is the ID of an address that a payment of some kind is sent to, and that the address is optional (thus the outer join). So, in an Accounts Payable system, you may have a vendor, who has a normal business address. But when you send actual monies, they have an optional Remit To Address, and the payment is sent there instead of the normal business address.

Surrogate key in 'User' / 'Role' tables for desktop app? What's the purpose?

I have to add some security for a C#/.NET WinForms/Desktop application. I am using Oracle DB back-end.
The tables are simple: User (ID,Name), Role(ID,Role), UserRole(UserID,RoleID).
I am using the Windows account name to populate the User table. The Role table will for now simply contain 'Admin', 'SuperUser', 'BasicUser'...
No two people could ever possibly have the same Windows account name, even though I don't control name management (netops does, which is why I want to use Windows accounts so I don't have to manage it ;)). For the Role table I should again never have a duplicate value: I control the input, and there will only be three roles (it's a tactical app going away within a year). UserRole is a join table representing the many-to-many relationship between users and roles, so no surrogate key is justified there.
Simple question: why bother with the 'ID' (int) columns in the User and Role tables? Any point or advantage here? Is this one of those 'I've always done it this way' things? Or have I just not done this in a while and forgotten the reason?
Names change - primary key values must not. Abigail Smith becomes Abigail Jones and the username changes but a surrogate key protects against having to cascade those changes everywhere.
If you are using a surrogate key but there is a column or combination of columns which should be unique, then enforce that using a unique index. There's a good chance you'll want indexes on your user.name and role.role columns anyway, and a unique index is more space efficient and supplies useful metadata to the optimizer. If you have a surrogate key but don't have another combination of columns that uniquely identify a row then think again whether you have your entity definition right.
One caution: for very narrow tables with few access paths, you may want to use an index-organized table. Oracle will only organize such a table on its primary key, but it does allow foreign keys against a unique set of columns (if it is enforced by a unique constraint, not simply a unique index).
So it is possible to end up with a table where a unique ID is enforced through a unique index, treated as the primary key by an ORM, and used as the parent for foreign key relationships, while the primary key (as defined in the DB) is the role name/user name/whatever, because you want that as the driver for an index-organised table.
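As a rough Oracle sketch of the points above (table and column names are made up for illustration): a surrogate primary key plus a unique constraint on the natural key, an index-organized variant for the narrow Role table, and an intersection table keyed on the pair:

-- Surrogate primary key plus a unique constraint on the natural key.
CREATE TABLE app_user (
    id   NUMBER(15)    NOT NULL PRIMARY KEY,
    name VARCHAR2(100) NOT NULL,
    CONSTRAINT app_user_name_uk UNIQUE (name)
);
-- Narrow lookup table stored as an index-organized table;
-- Oracle organises an IOT on its primary key.
CREATE TABLE app_role (
    id   NUMBER(15)   NOT NULL,
    role VARCHAR2(30) NOT NULL,
    CONSTRAINT app_role_pk PRIMARY KEY (id),
    CONSTRAINT app_role_uk UNIQUE (role)
) ORGANIZATION INDEX;
-- Intersection table: the composite key is enough, no surrogate needed.
CREATE TABLE app_user_role (
    user_id NUMBER(15) NOT NULL REFERENCES app_user (id),
    role_id NUMBER(15) NOT NULL REFERENCES app_role (id),
    CONSTRAINT app_user_role_pk PRIMARY KEY (user_id, role_id)
);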
A surrogate key is not required on intersection tables, but here are a few reasons to do so:
Consistency: If every table has a single artificial key, you always know the key name when you know the table name.
Ease Of Use: Less typing — one key means ON and WHERE clauses are shorter and thus less error-prone.
Interoperability: Some ORMs only work well with tables with a single primary key column.

Random ID generation on Sign Up - Database Performance

I am making a site where each account will have an ID.
But I don't want to make it incremental, meaning:
id=1
id=2
...
id=1000
What I want is to have random IDs:
id=2355
id=5647734
id=23532
...
(The reason is to prevent robots from checking all account profiles by just incrementing an ID in the URL - and maybe other reasons, but that is not the question.)
But I am worried about performance on registration.
It will be something like this:
generate RANDOM_ID; while (RANDOM_ID is taken): generate new RANDOM_ID
Each time I generate a new ID for a new account, I will query the database (MySQL) to check whether the ID already exists.
Is there any better solution for this?
Is there any disadvantage of using random IDs?
Thanks in advance.
There are many, many reasons not to do this:
Your solution, as written, is not transactionally-safe; two transactions at the same time could both generate the same "random" ID.
If you serialize the transaction in order to make it safe, you will slaughter performance because the query will keep every single collision row locked until it finds a spare ID.
Using a random ID as the primary key will fragment the hell out of your clustered index. This is bad enough with uuids - the whole point of an auto-generated identity column is so you can generate a safe sequence out of it.
Why not use a regular primary key, but just don't use that in any of your URLs? Generate a secondary non-sequential ID along with it - such as a uuid - index it, and use this column in any public-facing segments of your application instead of the primary key if you are really worried about security.
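A rough MySQL sketch of that suggestion (table and column names are made up): keep the sequential auto-increment key as the primary key and expose only an indexed, non-sequential public ID in URLs:

-- The internal key stays sequential; the public ID is random and protected
-- by a unique index, so a collision would simply fail the insert.
CREATE TABLE account (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    public_id CHAR(36)        NOT NULL,
    email     VARCHAR(255)    NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uk_account_public_id (public_id)
);
-- Generate the public ID at registration time; use public_id in URLs only.
INSERT INTO account (public_id, email)
VALUES (UUID(), 'user@example.com');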
You can use UUIDs. A UUID is a unique identifier generated partly from a timestamp (depending on the version). It is, for practical purposes, guaranteed to be unique, so you don't have to run a query to check.
I do not know what language you're using, but there should be a library or sample code for this in most languages.
Yes, you can use a UUID, but keep your auto_increment field. Just add a new field and set it to something like md5(microtime(true).rand()), or whatever other method you like, and use that unique key throughout the site to build links instead of exposing the primary key in URLs.

ASP.NET Membership Provider, User ID GUID, and disk space

I'm currently using the SQL Membership provider for ASP.NET, which uses GUIDs for the User ID. My application has several custom tables that have foreign key relations back to the User table and I'm concerned about the disk space and performance implications of the standard provider's use of GUIDs for user ID.
Has anyone run into space / performance issues related to this and if so are there custom approaches that people have implemented to address this?
Any insight or suggestions would be most appreciated.
Thanks
I doubt you'll have any space issues as a result of using GUIDs rather than INT types for example. One thing I will warn you about is that you might be tempted to create clustered indexes on the GUID columns in the database. DO NOT DO THIS. By default, GUIDs are random, and inserting random data into a column that has a clustered index causes a few issues. Clustered, as you might know, means IN PHYSICAL STORAGE SEQUENCE. So when you insert a new random value (GUID) that row usually has to be inserted into the middle of the table. This can lead to massively fragmented indexes.
My advice would be to create a table that links the GUIDs to INT values (BIGINT if you expect that many users) and then use the INT everywhere else. Like Fermin just said.
Could you not have a custom table which maps the GUID to an integer value, and then use the integer in your custom tables?
UserId guid
FriendlyUserId int //use this as FK in other tables?
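A possible shape for that mapping table in SQL Server (the table and constraint names are illustrative; aspnet_Users is the standard membership table):

-- Maps the membership GUID to a small surrogate key. Cluster on the INT,
-- not on the GUID, to avoid fragmentation on insert.
CREATE TABLE dbo.UserMap (
    FriendlyUserId INT IDENTITY(1,1) NOT NULL,
    UserId         UNIQUEIDENTIFIER  NOT NULL,
    CONSTRAINT PK_UserMap PRIMARY KEY CLUSTERED (FriendlyUserId),
    CONSTRAINT UQ_UserMap_UserId UNIQUE NONCLUSTERED (UserId),
    CONSTRAINT FK_UserMap_aspnet_Users
        FOREIGN KEY (UserId) REFERENCES dbo.aspnet_Users (UserId)
);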
If you are using SQL Server 2005, you may want to look at the NewSequentialId() method. Eric Swann provides a good overview of its use with the Membership provider. There is also a nice article on benefits of using sequential GUIDs over the default random ones. Here is a performance comparison excerpt from the article...
Method              Reads   Writes   Leaf Pages   Avg Page Used   Avg Fragmentation   Record Count
IDENTITY(1,1)       0       1,683    1,667        98.9%           0.7%                50,000
NEWID()             0       5,386    2,486        69.3%           99.2%               50,000
NEWSEQUENTIALID()   0       1,746    1,725        99.9%           1.0%                50,000
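For illustration, this is roughly how a sequential GUID default looks on a custom table in SQL Server 2005 and later (NEWSEQUENTIALID() can only be used in a DEFAULT constraint; the table here is made up):

-- Sequential GUIDs keep new rows appending to the end of the clustered index,
-- avoiding the fragmentation that NEWID() causes.
CREATE TABLE dbo.Profile (
    ProfileId UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Profile_ProfileId DEFAULT NEWSEQUENTIALID(),
    UserName  NVARCHAR(256)    NOT NULL,
    CONSTRAINT PK_Profile PRIMARY KEY CLUSTERED (ProfileId)
);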
