Remove duplicates from custom entities in Microsoft Dynamics CRM

Has anyone found a good way to either merge or remove duplicates that are in custom entities? In our case we have two custom entities, literature history and subscriptions which relate contacts back to a custom entity named literature.
I can run a duplicate detection job, but this returns thousands of records and deleting them one at a time is impractical at best. We would like to either be able to merge them or just delete the duplicates. However, much Google searching has not turned up any good suggestions other than "you can write something."
Okay, but where to even get started? Should I be bulk deleting from the duplicate detection job? Should I try just writing a quick and dirty C# program with the SDK? Is there a way to merge custom entities that just requires some magical workflow voodoo?
EDIT: FYI, what I eventually did was set the deletion state code, using some fun SQL to quickly find the duplicates:
UPDATE T1 SET DeletionStateCode = 2
FROM New_subscriptionhistory T1
INNER JOIN New_subscriptionhistory T2
    ON T1.New_LiteratureId = T2.New_LiteratureId
    AND T1.New_ContactId = T2.New_ContactId
    AND T1.CreatedOn > T2.CreatedOn
    AND T1.statecode = 0 AND T2.statecode = 0

You should look into creating a Bulk Delete Job using the SDK.
Here's a short tutorial.
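If it helps to see the shape of it, here is a rough sketch of submitting a bulk delete job through the SDK (written against the Microsoft.Xrm.Sdk organization service; the entity name comes from the question's edit, while the filter criteria and everything else are placeholder assumptions, since how you identify the duplicates is up to you):

// Hedged sketch only: submits a bulk delete job via the CRM SDK.
// The filter below is an assumption; a plain QueryExpression can't express
// "newer than its twin", so in practice you'd first flag or deactivate the
// duplicates and then delete everything matching that marker.
using System;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Query;
using Microsoft.Crm.Sdk.Messages;

public static class DuplicateCleanup
{
    public static void SubmitBulkDelete(IOrganizationService service)
    {
        var query = new QueryExpression("new_subscriptionhistory")
        {
            Criteria = new FilterExpression()
        };
        // Placeholder criterion: delete records already deactivated as duplicates.
        query.Criteria.AddCondition("statecode", ConditionOperator.Equal, 1);

        var request = new BulkDeleteRequest
        {
            JobName = "Remove duplicate subscription history",
            QuerySet = new[] { query },
            StartDateTime = DateTime.Now,
            RecurrencePattern = string.Empty,   // run once, no recurrence
            SendEmailNotification = false,
            ToRecipients = new Guid[0],
            CCRecipients = new Guid[0]
        };

        service.Execute(request);   // returns a BulkDeleteResponse carrying the async job id
    }
}

The job runs asynchronously on the server, so the records disappear when the system job completes rather than when Execute returns.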

I won't say with certainty that this is the only or the best way, but we've used SQL queries in the _MSCRM database, setting the DeletionStateCode of any duplicated entity to 2.

Related

Recursive database viewing

I have this situation: starting from a table, I have to check all the records that match a key. If records are found, I have to check another table using a key from the first table, and so on, more or less five levels deep. Is there a way to do this recursively, or do I have to write all the code "by hand"? The language I am using is Visual FoxPro. If this is not possible, is it at least possible to use recursion to populate a treeview?
You can set a relation between tables. For example:
USE table_1.dbf IN 0 SHARED
USE table_2.dbf IN 0 SHARED
SET ORDER TO TAG key_field OF table_2.cdx IN table_2
SET RELATION TO key_field INTO table_2 ADDITIVE IN table_1
The first two commands open table_1 and table_2. Then you have to set the order/index of table_2; if you don't have an index for the key field, this will not work. The final command sets the relation between the two tables on the key field.
From here you can browse both tables and table_2's records will be filtered based on table_1's key field. Hope this helps.
If the tables have similar structure or you only need to look at a few fields, you could write a recursive routine that receives the name of the table, the key to check, and perhaps the fields you need to check as parameters. The tricky part, I guess, is knowing what to pass down to the next call.
I don't think I can offer any more advice without at least seeing some table structures.
Sorry for answering so late, but the problem was of course that recursion wasn't a viable solution since I had to search inside multiple tables. So I resolved it by doing a simple two-level search in the tables that I needed.
Thank you very much for the help, and sorry again for answering so late.

Linq left-joining like tables

I have two DataTables of identical structure, and I need to find all records that appear in the first but not in the second. What makes it more complicated is that the matching needs to be on three columns instead of one.
Background - I'm writing a replication process where rows of data arrive in an XML transaction and they need to be matched against the 'host' database to find out if there are any items that need to be added. The basic algorithm is as follows:
Load in transaction dataset containing several datatables
Create a new datatable and populate with the 'host' entries from the local database
Run a match between them to find out which are the 'new' records
Iterate through said 'new' records and create the objects in the database.
I've seen many examples of LEFT JOIN in LINQ but I can't seem to find anything that specifically meets my needs. It would be easy if the tables were joined on one column, but unfortunately this is not the case....
Any help would be appreciated.
Thanks,
Tim
See Microsoft's 101 LINQ Samples. There is a LEFT OUTER JOIN example that should help you out.
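In case a concrete example helps, here's a sketch of the composite-key version: a left outer join in LINQ to DataSets that keeps only the rows from the first DataTable with no match in the second. The three column names and their types are assumptions for illustration, and it needs a reference to System.Data.DataSetExtensions for AsEnumerable() and Field<T>().

// Hedged sketch: rows present in 'incoming' but absent from 'host',
// matched on three columns (column names/types are assumed, not from the question).
using System.Data;
using System.Linq;

static class ReplicationDiff
{
    public static DataRow[] FindNewRows(DataTable incoming, DataTable host)
    {
        var newRows =
            from a in incoming.AsEnumerable()
            join b in host.AsEnumerable()
                on new
                {
                    K1 = a.Field<string>("Code"),
                    K2 = a.Field<string>("Region"),
                    K3 = a.Field<int>("Year")
                }
                equals new
                {
                    K1 = b.Field<string>("Code"),
                    K2 = b.Field<string>("Region"),
                    K3 = b.Field<int>("Year")
                }
                into matches
            from m in matches.DefaultIfEmpty()   // left outer join
            where m == null                      // keep only the unmatched rows
            select a;

        return newRows.ToArray();
    }
}

The trick is that the two anonymous types used in the join must have identical property names and types, so the composite key compares as a single value.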

How do I implement threaded comments?

I am developing a web application that can support threaded comments. I need the ability to rearrange the comments based on the number of votes received. (Identical to how threaded comments work in reddit)
I would love to hear the inputs from the SO community on how to do it.
How should I design the comments table?
Here is the structure I am using now:
Comment
id
parent_post
parent_comment
author
points
What changes should be done to this structure?
How should I get the details from this table to display them in the correct manner?
(Implementation in any language is welcome. I just want to know how to do it in the best possible manner)
What do I need to take care of while implementing this feature so that there is less load on the CPU/database?
Thanks in advance.
Storing trees in a database is a subject which has many different solutions. It depends on if you want to retrieve a subhierarchy as well (so all children of item X) or if you just want to grab the entire set of hierarchies and build the tree in an O(n) way in memory using a dictionary.
Your table has the advantage that you can fetch all comments on a post in 1 go, by filtering on the parentpost. As you've defined the comment's parent in the textbook/naive way, you have to build the tree in memory (see below). If you want to obtain the tree from the DB, you need a different way to store a tree:
See my description of a pre-calc based approach here:
http://www.llblgen.com/tinyforum/GotoMessage.aspx?MessageID=17746&ThreadID=3208
or by using balanced trees described by CELKO here:
or yet another approach:
http://www.sqlteam.com/article/more-trees-hierarchies-in-sql
If you fetch everything in a hierarchy in memory and build the tree there, it can be more efficient due to the fact that the query is pretty simple: select .. from Comment where ParentPost = #id ORDER BY ParentComment ASC
After that query, you build the tree in memory with just one dictionary which keeps track of the tuple CommentID - Comment. You now walk through the result set and build the tree on the fly: for every comment you run into, you look up its parent comment in the dictionary and then also store the comment currently being processed in that dictionary.
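A minimal sketch of that in-memory pass, assuming a simple Comment class with an Id, a nullable ParentCommentId, and a Children list (the names are illustrative, not taken from the question's schema):

// Hedged sketch: rebuild the comment tree from the flat result set in O(n)
// using one dictionary keyed by comment id.
using System.Collections.Generic;

class Comment
{
    public int Id;
    public int? ParentCommentId;               // null for top-level comments
    public List<Comment> Children = new List<Comment>();
}

static class CommentTree
{
    // 'rows' is the flat result of: SELECT ... FROM Comment WHERE ParentPost = @id
    public static List<Comment> Build(IEnumerable<Comment> rows)
    {
        var byId = new Dictionary<int, Comment>();
        var roots = new List<Comment>();

        // First pass: index every comment by its id.
        foreach (var c in rows)
            byId[c.Id] = c;

        // Second pass: attach each comment to its parent, or to the root list.
        foreach (var c in byId.Values)
        {
            if (c.ParentCommentId.HasValue && byId.TryGetValue(c.ParentCommentId.Value, out var parent))
                parent.Children.Add(c);
            else
                roots.Add(c);
        }

        return roots;
    }
}

Doing it in two passes avoids depending on the ORDER BY to deliver parents before children.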
Couple things to also consider...
1) When you say "sort like reddit" based on rank or date, do you mean the top-level or the whole thing?
2) When you delete a node, what happens to the branches? Do you re-parent them? In my implementation, I'm thinking that the editors will decide--either hide the node and display it as "comment hidden" along with the visible children, hide the comment and its children, or nuke the whole tree. Re-parenting should be easy (just set the children's parent to the deleted's parent), but anything involving the whole tree seems to be tricky to implement in the database.
I've been looking at the ltree module for PostgreSQL. It should make database operations involving parts of the tree a bit faster. It basically lets you set up a field in the table that looks like:
ltreetest=# select path from test where path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
However, it doesn't ensure any kind of referential integrity on its own. In other words, you can have a record for "Top.Science.Astronomy" without having a record for "Top.Science" or "Top". But what it does let you do is stuff like:
-- hide the children of Top.Science
UPDATE test SET hide_me = true WHERE path <@ 'Top.Science';
or
-- nuke the cosmology branch
DELETE FROM test WHERE path <@ 'Top.Science.Cosmology';
If combined with the traditional "comment_id"/"parent_id" approach using stored procedures, I'm thinking you can get the best of both worlds. You can quickly traverse the comment tree in the database using your "path" and still ensure referential integrity via "comment_id"/"parent_id". I'm envisioning something like:
CREATE TABLE comments (
    comment_id SERIAL PRIMARY KEY,
    parent_comment_id int REFERENCES comments(comment_id) ON UPDATE CASCADE ON DELETE CASCADE,
    thread_id int NOT NULL REFERENCES threads(thread_id) ON UPDATE CASCADE ON DELETE CASCADE,
    path ltree NOT NULL,
    comment_body text NOT NULL,
    hide boolean not null default false
);
The path string for a comment would look like:
<thread_id>.<parent_id_#1>.<parent_id_#2>.<parent_id_#3>.<my_comment_id>
Thus a root comment of thread "102" with a comment_id of "1" would have a path of:
102.1
And a child whose comment_id is "3" would be:
102.1.3
Some children of "3", with ids of "31" and "54", would be:
102.1.3.31
102.1.3.54
To hide the node "3" and its kids, you'd issue this:
UPDATE comments SET hide = true WHERE path <@ '102.1.3';
I dunno though--it might add needless overhead. Plus I don't know how well maintained ltree is.
Your current design is basically fine for small hierarchies (fewer than a thousand items).
If you want to fetch at a certain level or depth, add a 'level' item to your structure and compute it as part of the save.
If performance is an issue use a decent cache
I'd add the following new fields to the above table:
thread_id: identifier for all comments attached to a specific object
date: the comment date (allows fetching the comments in order)
rank: the comment rank (allows fetching the comment order by ranking)
Using these fields you'll be able to:
fetch all comments in a thread in a single op
order comments in a thread either by date or rank
Unfortunately, if you want to keep your queries close to the SQL standard, you'll have to recreate the tree in memory. Some DBs offer special queries for hierarchical data (e.g. Oracle).
./alex

How to Sort Data Table like FogBugz Cases Table

Anyone ever see how fogbugz sorts their tables? When you click to sort the column, they actually break the table up into many small tables that have each category of info.
Wondering if anyone knows how they do this?
Looking to implement this feature.
If you take a look at the cases page and sort, you can see what I mean.
Any help would be AWESOME!
Still haven't figured this one out.
EDIT: #Peter, I don't want to postback and recreate a table every time the header title is clicked for a sort. If I click on the header to sort, it separates the "one" table into many (by way of JavaScript), and I want to know if there is a generic solution for this, because it's just a MUCH better way of viewing a sorted table.
EDIT: I do need a JavaScript sorter, but if you look right down at the implementation in FogBugz, it produces a different result...
Yup, Rich got it (I coded this feature into FogBugz a long while back).
If you have to do this on the client you have no choice but to sort the data, iterate through it generating table row after table row, and every time you hit a new sort value you create a new thead w/ the appropriate information.
To be honest it would be a pretty cool modification to this jQuery plugin: http://tablesorter.com/docs/ and you'd be able to leverage a lot of their work. If you're going to put in the time and create a general solution, might as well make it accessible to the community.
Without knowing specifically how Fog Creek accomplishes this, the way that I would do it is to output a table header, then iterate through the list, outputting a footer and a new header each time the group value changed.
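As a sketch of that loop (the Case class, its properties, and the markup are all invented for illustration), the whole trick is watching for a change in the group value while walking rows that are already sorted by it:

// Hedged sketch: emit one small <table> per group of equal Priority values,
// starting a new header whenever the group value changes.
using System.Collections.Generic;
using System.Linq;
using System.Text;

class Case
{
    public string Priority;
    public int Id;
    public string Title;
}

static class GroupedTableWriter
{
    public static string Render(IEnumerable<Case> cases)
    {
        var sb = new StringBuilder();
        string currentGroup = null;
        bool tableOpen = false;

        foreach (var c in cases.OrderBy(x => x.Priority).ThenBy(x => x.Id))
        {
            if (!tableOpen || c.Priority != currentGroup)
            {
                if (tableOpen)
                    sb.AppendLine("</table>");   // close the previous group's table

                currentGroup = c.Priority;
                tableOpen = true;
                sb.AppendLine("<table>");
                sb.AppendLine("<thead><tr><th colspan=\"2\">" + currentGroup + "</th></tr></thead>");
            }

            sb.AppendLine("<tr><td>" + c.Id + "</td><td>" + c.Title + "</td></tr>");
        }

        if (tableOpen)
            sb.AppendLine("</table>");           // close the last open table

        return sb.ToString();
    }
}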
Not sure what answer you expect. The SQL query for this would simply order by the selected column, and the UI would start a new table each time this value changes.
Here is a screenshot of FogBugz with this sorting, after clicking on the Priority column.
http://img297.imageshack.us/img297/6974/76755363ee3.png
Of course, starting a new table doesn't make sense for every column (title, case #).
Edit: If I understand correctly, you're looking for a way to do this in a browser without loading a new page. If this is the case, I would suggest at least some server-side support, which would return your data in the correct order and properly structured for subtables (in XML/JSON/whatever you use). Your JavaScript will use this data to recreate the tables. I am sure others with more web UI experience will provide you with better answers.
I've used the Sortable Tables script from Kryogenix with some good results.
I don't know if it is relevant, but we store the results of a query in a temporary table in SQL, and then reference current-row-less-one to see if a Category has changed, and indicate this in the result set.
In some instances we "indicate" this with a column containing
<tr><td colspan=999>Category Heading</td></tr>
so that the web page can just "inject" that into the table it is building.
SELECT Col1, Col2, ...,
[CATEGORY] = CASE WHEN T1.CategoryCol <> COALESCE(T2.CategoryCol, '')
THEN '<tr><td colspan=999>' + T1.CategoryCol + '</td></tr>'
ELSE ''
END
FROM #MyTempTable AS T1
LEFT OUTER JOIN #MyTempTable AS T2
ON T2.ID = T1.ID - 1

Using LINQ to query flat text files with fixed-length records?

I've got a file filled with records like this:
NCNSCF1124557200811UPPY19871230
The codes are all fixed-length, and some of them link to other flat files (sort of like a relational database). What's the best way of querying this data using LINQ?
This is what I came up with intuitively, but I was wondering if there's a more elegant way:
var records = File.ReadAllLines("data.txt");
var table = from record in records
            select new { FirstCode = record.Substring(0, 2),
                         OtherCode = record.Substring(18, 4) };
For one thing I wouldn't read it all into memory to start with. It's very easy to write a LineReader class which iterates over a file a line at a time. I've got a version in MiscUtil which you can use.
Unless you only want to read the results once, however, you might want to call ToList() at the end to avoid reading the file multiple times. (This is still nicer than reading all the lines and keeping that in memory - you only want to do the splitting once.)
Once you've basically got in-memory collections of all the tables, you can use normal LINQ to Objects to join them together etc. You might want to go to a more sophisticated data model to get indexes though.
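Here's a rough sketch of that overall shape, assuming a hand-rolled line iterator in place of MiscUtil's LineReader; the second file, its offsets, and the join column are invented purely for illustration:

// Hedged sketch: stream each file a line at a time, project the fixed-width
// fields once, materialize with ToList(), then join the two sets in memory.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class FlatFileQuery
{
    // Lazily yields lines so the whole file is never held in memory at once.
    static IEnumerable<string> ReadLines(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    public static void Run()
    {
        var records = (from line in ReadLines("data.txt")
                       select new
                       {
                           FirstCode = line.Substring(0, 2),
                           OtherCode = line.Substring(18, 4)
                       }).ToList();   // split each line only once

        // Hypothetical lookup file that OtherCode points into.
        var lookup = (from line in ReadLines("codes.txt")
                      select new
                      {
                          Code = line.Substring(0, 4),
                          Description = line.Substring(4).Trim()
                      }).ToList();

        var joined = from r in records
                     join c in lookup on r.OtherCode equals c.Code
                     select new { r.FirstCode, c.Description };

        foreach (var row in joined)
            Console.WriteLine(row.FirstCode + ": " + row.Description);
    }
}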
I don't think there's a better way out of the box.
One could define a flat-file LINQ provider which would make the whole thing much simpler, but as far as I know, no one has yet.
