Create unique IDs in VFP - uniqueidentifier

I need to create unique record IDs in VFP based on mailing information: zip5, address, lastname, firstname. Once created, relational tables will be loaded into SQL Server 7 with the unique IDs. Any suggestions?

You can use a GUID: see the GUID entry at the FoxPro Wiki.
And here are some examples.
Easiest one using WSH...
* VFP 7 and later
oGUID = CREATEOBJECT("Scriptlet.TypeLib")
cGUID = STREXTRACT(oGUID.GUID, "{", "}")
* Earlier VFP versions (no STREXTRACT)
oGUID = CREATEOBJECT("Scriptlet.TypeLib")
cGUID = SUBSTR(oGUID.GUID, 2, 36)

VFP does have support for unique IDs, in that it has Primary Indexes, which can be based on multiple fields; just make sure the key length is fixed, so if your VFP table uses varchar fields you will need to pad them. A table can also have Candidate Indexes, where the indexed field(s) must be unique just like a primary key, but you can have multiple candidate indexes per table.
Either of these will enforce uniqueness in your fields, but generating a primary key based on zip5, address, lastname and firstname will be inefficient. The suggestion of GUIDs will work nicely, or if you have VFP8 or later you can use an Autoinc column, which is analogous to an Identity column in SQL Server.
Incidentally, VFP's unique indexes do not actually enforce uniqueness (they only store the first occurrence of each key value) and are retained only for backward compatibility.
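On the SQL Server side, a minimal sketch of storing such a GUID in a uniqueidentifier key (the table and column names here are hypothetical; NEWID() has been available since SQL Server 7):
-- Hypothetical target table: RecID can take the GUID generated in VFP,
-- or default to a server-generated value via NEWID()
CREATE TABLE Mailing (
    RecID     uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY,
    Zip5      char(5)     NOT NULL,
    Address   varchar(60) NOT NULL,
    LastName  varchar(30) NOT NULL,
    FirstName varchar(30) NOT NULL
)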

I created my own function for that purpose that returns a unique identifier. When I have a new record I just scan the table and have the function replace the unique-identifier field (I call mine UID) with a new UID if the record does not have one.
FUNCTION UIDgenerator()
LOCAL c_UID, c_dump
* SYS(2015) returns a unique 10-character procedure name; strip the leading "_"
c_UID = STRTRAN(SYS(2015), "_", "") + [-]
* Burn a few SYS(2015) values so consecutive UIDs land further apart
c_dump = STRTRAN(SYS(2015), "_", "")
c_dump = STRTRAN(SYS(2015), "_", "")
c_dump = STRTRAN(SYS(2015), "_", "")
c_UID = c_UID + STRTRAN(SYS(2015), "_", "")
RETURN c_UID
ENDFUNC && UIDgenerator
You don't have to do the c_dump calls three times, but I wanted consecutive identifiers to be a little further apart.

Related

Oracle SQL Query Performance, Function-Based Indexes

I have been trying to fine-tune a SQL query that takes 1.5 hours to process approx. 4,000 error records; the run time increases along with the number of rows.
I figured out there is one condition in my SQL that is actually causing the issue:
AND (DECODE(aia.doc_sequence_value,
            NULL, DECODE(aia.voucher_num,
                         NULL, SUBSTR(aia.invoice_num, 1, 10),
                         aia.voucher_num),
            aia.doc_sequence_value) || '_' ||
     aila.line_number || '_' ||
     aida.distribution_line_number || '_' ||
     DECODE(aca.doc_sequence_value,
            NULL, DECODE(aca.check_voucher_num,
                         NULL, SUBSTR(aca.check_number, 1, 10),
                         aca.check_voucher_num),
            aca.doc_sequence_value)) = P_ID
(P_ID is a value from the first cursor SQL.)
(Note that these are standard Oracle Applications (ERP) invoice tables.)
The P_ID column in the staging table is derived the same way as the expression above, and it is compared again in this second SQL to get the latest data for that record (basically reprocessing the error records). The value of P_ID is something like "999703_1_1_9995248".
Q1) Can I create a function-based index on the whole left-side derivation? If so, what is the syntax?
Q2) Would it be okay, or against Oracle standard rules, to create a function-based index on standard Oracle tables? (Not creating it directly on the table itself.)
Q3) If not, what is the best approach to solve this issue?
Briefly: no, you can't place a function-based index on that expression, because the input values are derived from four different tables (or table aliases).
What you might look into is a materialised view, but that's a big hammer for what is a single query-optimisation problem.
You might investigate decomposing that string "999703_1_1_9995248" and applying the relevant parts to the separate expressions:
DECODE(aia.doc_sequence_value,
       NULL, DECODE(aia.voucher_num,
                    NULL, SUBSTR(aia.invoice_num, 1, 10),
                    aia.voucher_num),
       aia.doc_sequence_value) = '999703' and
aila.line_number = '1' and
aida.distribution_line_number = '1' and
DECODE(aca.doc_sequence_value,
       NULL, DECODE(aca.check_voucher_num,
                    NULL, SUBSTR(aca.check_number, 1, 10),
                    aca.check_voucher_num),
       aca.doc_sequence_value) = '9995248'
Then you can use indexes on the expressions and columns.
You could separate the four components of the P_ID value using regular expressions, or a combination of InStr() and SubStr().
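For instance, a minimal sketch of the regular-expression route, with the sample literal standing in for the P_ID variable:
-- Each [^_]+ match picks out one underscore-delimited component
SELECT REGEXP_SUBSTR('999703_1_1_9995248', '[^_]+', 1, 1) AS invoice_part,
       REGEXP_SUBSTR('999703_1_1_9995248', '[^_]+', 1, 2) AS line_part,
       REGEXP_SUBSTR('999703_1_1_9995248', '[^_]+', 1, 3) AS dist_part,
       REGEXP_SUBSTR('999703_1_1_9995248', '[^_]+', 1, 4) AS check_part
FROM dual;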
Ad 1) Based on the SQL you've posted, you cannot create a function-based index on that expression. The reason is that a function-based index must be:
Deterministic - i.e. the function used in the index definition has to always return the same result for given input arguments - and
defined only on columns from the table the index is created for. In your case - based on the aliases you're using - you have four tables (aia, aila, aida, aca).
Requirement #2 makes it impossible to build a function-based index for that expression.
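What remains possible is an index per table on the single-table pieces of the expression; a hedged sketch, assuming aia is the standard AP_INVOICES_ALL table and using an illustrative index name:
-- DECODE and SUBSTR are deterministic, so this single-table expression qualifies
CREATE INDEX xxap_invoices_derived_n1 ON ap_invoices_all (
    DECODE(doc_sequence_value,
           NULL, DECODE(voucher_num,
                        NULL, SUBSTR(invoice_num, 1, 10),
                        voucher_num),
           doc_sequence_value)
);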

Include variable as field name in LINQ statement

Hi, I have a LINQ statement against a database table I did not create... The data structure is
Tbl_BankHols
    BHDate    datetime
    E         bit
    S         bit
    W         bit
    I         bit
Basically it has a list of dates and then a value of 0 or 1 in E, S, W and I, which indicates whether that date is a bank holiday in England, Scotland, Wales and/or Ireland.
If I want to find out if a date is a bank holiday in any of the countries my Linq statement is
Dim BHQ = From d in db.Tbl_BankHols _
Where d.BHDate = chkDate _
Select d.BHDate
Where chkDate is the date I am checking. If a result is returned then the date is a bank holiday in one of the countries.
I now need to find out if chkDate is a bank holiday in a particular country; how do I introduce that into the Where clause?
I'm asking if this is possible before I think about changing the structure of the database. I was thinking of just having a single country field as a string, which would contain values like E, EW, EWS, EWSI, I and other similar combinations; then I would just use WHERE BCountry LIKE '%X%' (where X is the country I'm interested in). Or is there a better way?
Erm, stop.
Your suggestion of extra denormalisation and using LIKE is a really "wrong" idea.
You need one table for countries, let's call it Country, and another table for holidays, let's call it Holiday. The Country table should contain a row for each country in your system/model. The Holiday table should have two columns: one for the Date and a foreign key to Country, let's call it CountryId.
Then your LINQ could look something like:
db.Holiday.Any(Function(h) h.Country.Name =
"SomeCountry" AndAlso h.Date = someDate)
The reasons why you shouldn't use LIKE for this are manifold, but two major objections are:
LIKE doesn't perform well; it's hard for an index to support it. And,
Let's imagine a situation where you need to store holidays for these countries:
Ecuador
El Salvador
Estonia
Ethiopia
England
Now, you have already assigned the code "E" to England; what code will you give to the others? No problem, you say: "EL", "ET" ... but already your LIKE "%E%" condition is broken.
Here are the scripts for the schema I would go with.
CREATE TABLE [Country](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](100) NOT NULL,
CONSTRAINT [PK_Country] PRIMARY KEY CLUSTERED
(
[Id] ASC
));
CREATE TABLE [Holiday](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Date] [date] NOT NULL,
[CountryId] [int] NOT NULL CONSTRAINT [FK_Holiday_Country] FOREIGN KEY REFERENCES [Country]([Id]),
CONSTRAINT [PK_Holiday] PRIMARY KEY CLUSTERED
(
[Id] ASC
));
Instead of changing the structure of your table the way you wrote, you could introduce a new Region (Code, Description) table and add a foreign key to your table pointing to the Regions table. Your bank-holidays table would then contain one record per date/region combination.
And your linq statement:
Dim BHQ = From d in db.Tbl_BankHols _
Where d.BHDate = chkDate And d.Region = "England" _
Select d.BHDate
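A minimal sketch of that schema change (the column and constraint names here are assumed, sized to hold values like "England" to match the query above):
CREATE TABLE [Region](
[Code] [nvarchar](20) NOT NULL,
[Description] [nvarchar](100) NULL,
CONSTRAINT [PK_Region] PRIMARY KEY CLUSTERED ([Code] ASC));
-- Added as NULLable so it can be applied to the existing table;
-- backfill the column, then alter it to NOT NULL.
ALTER TABLE [Tbl_BankHols]
ADD [Region] [nvarchar](20) NULL
CONSTRAINT [FK_BankHols_Region] REFERENCES [Region]([Code]);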
"before I think about changing the structure of the database"
You can do it by composing a query with OR predicates using LINQKit.
Dim BHQ As IQueryable(Of BankHoliday) = ... ' (your query)
Dim pred = PredicateBuilder.False(Of BankHoliday)()
If countryString.Contains("E") Then
    pred = pred.Or(Function(h) h.E)
End If
If countryString.Contains("S") Then
    pred = pred.Or(Function(h) h.S)
End If
...
Return BHQ.Where(pred.Expand())
(I'm not fluent in VB so there may be some error in there)
See this answer for a similar example.
While the DB structure is not optimal, if you are working with legacy code and a full refactor isn't on the table, you're gonna have to make do (I feel for you).
The simplest option I think you have is to select not just the date, but also the attributes.
Dim BHQ = From d in db.Tbl_BankHols _
Where d.BHDate = chkDate _
Select d
This code will give you d.S, d.E, d.W, and d.I, which you can use programmatically to determine if the holiday applies to whichever country you are currently working on. That means that, outside the query, you will have a separate If statement to qualify whether the holiday applies to the country you are processing.

Efficiently query table with conditions including array column in PostgreSQL

Need to come up with a way to efficiently execute a query with an array column and an integer column in the WHERE clause, ordered by a timestamp column. Using PostgreSQL 9.2.
The query we need to execute is:
SELECT id
from table
where integer = <int_value>
and <text_value> = any (array_col)
order by timestamp
limit 1;
int_value is an integer value, and text_value is a 1-3 letter text value.
The table structure is like this:
Column | Type | Modifiers
---------------+-----------------------------+------------------------
id | text | not null
timestamp | timestamp without time zone |
array_col | text[] |
integer | integer |
How should I design indexes / modify the query to make it as efficient as possible?
Thanks so much! Let me know if more information is needed and I'll update ASAP.
PG can use indexes on arrays, but you have to use array operators for that: instead of <text_value> = any (array_col), use ARRAY[<text_value>] <@ array_col (https://stackoverflow.com/a/4059785/2115135). You can use the command SET enable_seqscan=false; to force PG to use indexes where possible, to check whether the ones you created are valid. Unfortunately a GIN index can't be created on an integer column, so you will have to create two different indexes for those two columns.
See the execution plans here: http://sqlfiddle.com/#!12/66a71/2
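If you go the two-index route, a hedged sketch (placeholder names, since "table", "integer" and "timestamp" in the question are reserved words and presumably stand in for real identifiers):
-- Plain btree index for the integer equality, GIN for the array operator
CREATE INDEX idx_t_int ON t (int_col);
CREATE INDEX idx_t_arr ON t USING gin (array_col);

-- Query rewritten with an array operator so the GIN index is usable
SELECT id
FROM t
WHERE int_col = 42
  AND array_col @> ARRAY['abc']
ORDER BY ts
LIMIT 1;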
"Unfortunately a GIN index can't be created on an integer column so you will have to create two different indexes for those two columns."
That's not entirely true: you can use btree_gin or btree_gist.
-- feel free to use GIN
CREATE EXTENSION btree_gist;
CREATE INDEX ON table USING gist(id, array_col, timestamp);
VACUUM FULL ANALYZE table;
Now you can run the query against the index itself:
SELECT *
FROM table
WHERE id = ? AND array_col @> ?
ORDER BY timestamp;

Synchronizing two tables using a stored procedure, only updating and adding rows where values do not match

The scenario
I've got two tables with identical structure.
TABLE [INFORMATION], [SYNC_INFORMATION]
[ITEM] [nvarchar](255) NOT NULL
[DESCRIPTION] [nvarchar](255) NULL
[EXTRA] [nvarchar](255) NULL
[UNIT] [nvarchar](2) NULL
[COST] [float] NULL
[STOCK] [nvarchar](1) NULL
[CURRENCY] [nvarchar](255) NULL
[LASTUPDATE] [nvarchar](50) NULL
[IN] [nvarchar](4) NULL
[CLIENT] [nvarchar](255) NULL
I'm trying to create a synchronize procedure that will be triggered by a scheduled event at a given time every day.
CREATE PROCEDURE [dbo].[usp_SynchronizeInformation]
AS
BEGIN
SET NOCOUNT ON;

--Update all rows
UPDATE TARGET_TABLE
SET TARGET_TABLE.[DESCRIPTION] = SOURCE_TABLE.[DESCRIPTION],
TARGET_TABLE.[EXTRA] = SOURCE_TABLE.[EXTRA],
TARGET_TABLE.[UNIT] = SOURCE_TABLE.[UNIT],
TARGET_TABLE.[COST] = SOURCE_TABLE.[COST],
TARGET_TABLE.[STOCK] = SOURCE_TABLE.[STOCK],
TARGET_TABLE.[CURRENCY] = SOURCE_TABLE.[CURRENCY],
TARGET_TABLE.[LASTUPDATE] = SOURCE_TABLE.[LASTUPDATE],
TARGET_TABLE.[IN] = SOURCE_TABLE.[IN],
TARGET_TABLE.[CLIENT] = SOURCE_TABLE.[CLIENT]
FROM SYNC_INFORMATION TARGET_TABLE
JOIN LSERVER.dbo.INFORMATION SOURCE_TABLE ON TARGET_TABLE.[ITEM] = SOURCE_TABLE.[ITEM]

--Add new rows
INSERT INTO SYNC_INFORMATION ([ITEM], [DESCRIPTION], [EXTRA], [UNIT], [COST], [STOCK], [CURRENCY], [LASTUPDATE], [IN], [CLIENT])
SELECT
src.[ITEM],
src.[DESCRIPTION],
src.[EXTRA],
src.[UNIT],
src.[COST],
src.[STOCK],
src.[CURRENCY],
src.[LASTUPDATE],
src.[IN],
src.[CLIENT]
FROM LSERVER.dbo.INFORMATION src
LEFT JOIN SYNC_INFORMATION targ ON src.[ITEM] = targ.[ITEM]
WHERE targ.[ITEM] IS NULL
END
Currently, this procedure (including some others that are also executed at the same time) takes about 15 seconds to execute.
I'm planning on adding a "Synchronize" button in my work interface so that users can manually synchronize when, for instance, a new item is added and needs to be used the same day.
But in order for me to do that, I need to trim those 15 seconds as much as possible.
Instead of updating every single row, as in my procedure, is it possible to update only the rows whose values do not match?
This would greatly increase the execution speed, since it wouldn't have to update all 4,000 rows when maybe only 20 actually need it.
Can this be done in a better way, or optimized?
Does it need improvements, if yes, where?
How would you solve this?
Would also appreciate some time differences between the solutions so I can compare them.
UPDATE
Using marc_s's CHECKSUM is really brilliant. The problem is that in some instances the information produces the same checksum. Here's an example: due to the classified content I can only show 2 of the columns, but I can say that all columns hold identical information except these 2. To clarify: the screenshot showed all the rows that had duplicate CHECKSUMs. These are also the only rows with a hyphen in the ITEM column - I've looked.
The query was simply
SELECT *, CHECKSUM(*) FROM SYNC_INFORMATION
If you can change the table structure ever so slightly, you could add a computed CHECKSUM column to your two tables; in cases where the ITEM is identical, you can then check that checksum column to see if there are any differences at all in the other columns of the row.
If you can do this - try something like this here:
ALTER TABLE dbo.[INFORMATION]
ADD CheckSumColumn AS CHECKSUM([DESCRIPTION], [EXTRA], [UNIT],
[COST], [STOCK], [CURRENCY],
[LASTUPDATE], [IN], [CLIENT]) PERSISTED
Of course: only include those columns that should be considered when making sure whether a source and a target row are identical ! (this depends on your needs and requirements)
This persists a new column to your table, which is calculated as the checksum over the columns specified in the list of arguments to the CHECKSUM function.
This value is persisted, i.e. it could be indexed, too! :-O
Now, you could simplify your UPDATE to
UPDATE TARGET_TABLE
SET ......
FROM SYNC_INFORMATION TARGET_TABLE
JOIN LSERVER.dbo.INFORMATION SOURCE_TABLE ON TARGET_TABLE.[ITEM] = SOURCE_TABLE.[ITEM]
WHERE TARGET_TABLE.CheckSumColumn <> SOURCE_TABLE.CheckSumColumn
Read more about the CHECKSUM T-SQL function on MSDN!
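Regarding the duplicate checksums reported in the update above: CHECKSUM produces only a 32-bit value, so occasional collisions are expected. A hedged alternative sketch (the column name is assumed, and CONCAT requires SQL Server 2012 or later) uses HASHBYTES, which collides far less readily:
-- CONCAT treats NULLs as empty strings; the '|' delimiters stop values from
-- shifting between columns and producing identical hash inputs.
-- The explicit style (2) on the float conversion keeps the expression
-- deterministic, which PERSISTED requires.
ALTER TABLE dbo.[INFORMATION]
ADD HashColumn AS HASHBYTES('SHA1',
        CONCAT([DESCRIPTION], N'|', [EXTRA], N'|', [UNIT], N'|',
               CONVERT(nvarchar(32), [COST], 2), N'|', [STOCK], N'|',
               [CURRENCY], N'|', [LASTUPDATE], N'|', [IN], N'|', [CLIENT])) PERSISTED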

Performance tuning on reading huge table

I have a huge table with more than one hundred million rows and I have to query this table to return a set of data in a minimum of time.
So I have created a test environment with this table definition:
CREATE TABLE [dbo].[Test](
[Dim1ID] [nvarchar](20) NOT NULL,
[Dim2ID] [nvarchar](20) NOT NULL,
[Dim3ID] [nvarchar](4) NOT NULL,
[Dim4ID] [smalldatetime] NOT NULL,
[Dim5ID] [nvarchar](20) NOT NULL,
[Dim6ID] [nvarchar](4) NOT NULL,
[Dim7ID] [nvarchar](4) NOT NULL,
[Dim8ID] [nvarchar](4) NOT NULL,
[Dim9ID] [nvarchar](4) NOT NULL,
[Dim10ID] [nvarchar](4) NOT NULL,
[Dim11ID] [nvarchar](20) NOT NULL,
[Value] [decimal](21, 6) NOT NULL,
CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
(
[Dim1ID] ASC,
[Dim2ID] ASC,
[Dim3ID] ASC,
[Dim4ID] ASC,
[Dim5ID] ASC,
[Dim6ID] ASC,
[Dim7ID] ASC,
[Dim8ID] ASC,
[Dim9ID] ASC,
[Dim10ID] ASC,
[Dim11ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
This table is the fact table of Star schema architecture (fact/dimensions). As you can see I have a clustered index on all the columns except for the “Value” column.
I have filled this table with approx. 10,000,000 rows for testing purposes. The fragmentation is currently at 0.01%.
I would like to improve the performance when reading a set of rows from this table using this query:
DECLARE @Dim1ID nvarchar(20) = 'C1'
DECLARE @Dim9ID nvarchar(4) = 'VRT1'
DECLARE @Dim10ID nvarchar(4) = 'S1'
DECLARE @Dim6ID nvarchar(4) = 'FRA'
DECLARE @Dim7ID nvarchar(4) = '' -- empty = all
DECLARE @Dim8ID nvarchar(4) = '' -- empty = all
DECLARE @Dim2 TABLE ( Dim2ID nvarchar(20) NOT NULL )
INSERT INTO @Dim2 VALUES ('A1'), ('A2'), ('A3'), ('A4');
DECLARE @Dim3 TABLE ( Dim3ID nvarchar(4) NOT NULL )
INSERT INTO @Dim3 VALUES ('P1');
DECLARE @Dim4ID TABLE ( Dim4ID smalldatetime NOT NULL )
INSERT INTO @Dim4ID VALUES ('2009-01-01'), ('2009-01-02'), ('2009-01-03');
DECLARE @Dim11 TABLE ( Dim11ID nvarchar(20) NOT NULL )
INSERT INTO @Dim11 VALUES ('Var0001'), ('Var0040'), ('Var0060'), ('Var0099')
SELECT RD.Dim2ID,
RD.Dim3ID,
RD.Dim4ID,
RD.Dim5ID,
RD.Dim6ID,
RD.Dim7ID,
RD.Dim8ID,
RD.Dim9ID,
RD.Dim10ID,
RD.Dim11ID,
RD.Value
FROM dbo.Test RD
INNER JOIN @Dim2 R
ON RD.Dim2ID = R.Dim2ID
INNER JOIN @Dim3 C
ON RD.Dim3ID = C.Dim3ID
INNER JOIN @Dim4ID P
ON RD.Dim4ID = P.Dim4ID
INNER JOIN @Dim11 V
ON RD.Dim11ID = V.Dim11ID
WHERE RD.Dim1ID = @Dim1ID
AND RD.Dim9ID = @Dim9ID
AND ((@Dim6ID <> '' AND RD.Dim6ID = @Dim6ID) OR @Dim6ID = '')
AND ((@Dim7ID <> '' AND RD.Dim7ID = @Dim7ID) OR @Dim7ID = '')
AND ((@Dim8ID <> '' AND RD.Dim8ID = @Dim8ID) OR @Dim8ID = '')
I have tested this query and it returned 180 rows with these times:
1st execution: 1 min 32 sec; 2nd execution: 1 min.
I would like to return the data in a few seconds if it’s possible.
I think I can add non-clustered indexes, but I am not sure of the best way to set them up.
Would keeping the data in this table in sorted order improve performance?
Or are there solutions other than indexes?
Thanks.
Consider your datatypes as one problem. Do you need nvarchar? It's measurably slower.
Second problem: the PK is wrong for your query. It should be Dim1ID, Dim9ID first (or vice versa, based on selectivity), or some flavour with the JOIN columns in it.
Third problem: use of OR. This construct usually works, despite what nay-sayers who don't try it will post:
RD.Dim7ID = ISNULL(@Dim7ID, RD.Dim7ID)
This assumes @Dim7ID is NULL (rather than empty), though. The optimiser will short-circuit it in most cases.
I'm with gbn on this. Typically in star-schema data warehouses, the dimension IDs are int, which is 4 bytes. Not only are all your dimension keys larger than that, the nvarchar columns are both varying-length and use wide (two-byte) characters.
As far as indexing goes, just one clustered index may be fine, since in the case of your fact table you really don't have many facts. As gbn says, with your particular example, your index needs to lead with the columns you are going to be providing so that the index can actually be used.
In a real-world case of a fact table with a number of facts, your clustered index is simply for data organization - you'll probably be expecting some non-clustered indexes for specific usages.
But I'm worried that your query specifies ID parameters. Typically in a DW environment you don't know the IDs; for selective queries you select based on the dimension attributes, and the IDs are meaningless surrogates:
SELECT *
FROM fact
INNER JOIN dim1
ON fact.dim1id = dim1.id
WHERE dim1.attribute = ''
Have you looked at Kimball's books on dimensional modeling? I think if you are going to a star schema, you should probably be familiar with his design techniques, as well as the various pitfalls he discusses with the too many and too few dimensions.
See this: Dynamic Search Conditions in T-SQL, Version for SQL 2008 (SP1 CU5 and later).
The quick answer, if you are on the right service pack of SQL Server 2008, is to try adding this to the end of the query:
OPTION(RECOMPILE)
On the proper service pack of SQL Server 2008, OPTION(RECOMPILE) will build the execution plan based on the runtime values of the local variables.
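A minimal sketch of where the hint goes, using an abbreviated form of the question's query:
SELECT RD.Value
FROM dbo.Test RD
WHERE RD.Dim1ID = @Dim1ID
  AND ((@Dim7ID <> '' AND RD.Dim7ID = @Dim7ID) OR @Dim7ID = '')
OPTION (RECOMPILE)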
For people still using SQL Server 2008 without the proper service packs, or still on 2005, see: Dynamic Search Conditions in T-SQL, Version for SQL 2005 and Earlier.
I'd be a little concerned about having all the non-value columns in your clustered index. That makes for a large key in the non-leaf levels, and that key is also carried into every nonclustered index. And it will only provide a benefit when [Dim1ID] is included in the query, so even if you're only optimizing this query, you're probably getting a full scan.
I would consider a clustered index on the most commonly used key, and if you have a lot of date-related queries (e.g., date between a and b), go with the date key. Then create non-clustered indexes on the other key values.
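For example, a hedged sketch of one such non-clustered index for the query above (the split between key columns and INCLUDE columns is an assumption, to be adjusted against real selectivity):
CREATE NONCLUSTERED INDEX IX_Test_Dim1_Dim9
ON dbo.Test (Dim1ID, Dim9ID, Dim2ID, Dim3ID, Dim4ID, Dim11ID)
INCLUDE (Dim5ID, Dim6ID, Dim7ID, Dim8ID, Dim10ID, Value)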
