Database Duplicates -- Your Ideas and Methods 
Author Message
 Database Duplicates -- Your Ideas and Methods

What is your, as in the reader of this, ideas on the best way to check for,
and eliminate duplicate records in a database.

Basically I was thinking that I shouldnt INSERT a record where all rows
except ID and Timestamp info are the same as a current row.

The issue then comes on UPDATES, where if the current row is changed to a
row that already exists, then I should just use the existing ID, and Delete
the Row.  To solve the issue of deleting rows with multiple dependancies, I
would run a scheduled "Clean Database" program that would delete rows
without any external references.

The simple fact is that, for all of my knowledge, Normalization is a Nice
Idea, however deploying it is a pain in the ass.  In fact I dont know any
database that totally eliminates duplicates.

I was also thinking of running a scheduled Duplicate eliminator once a day,
or runing all of my inserts and updates through a SQL Function.  What are
your Ideas on these and other methods.  Does anyone know of Information
Sources for these Ideas?  Ill keep researching, just wanted to find out what
you think.



Wed, 31 Aug 2005 22:48:02 GMT  
 Database Duplicates -- Your Ideas and Methods
Mike,

If you know what column(s) have to be unique, use Primary Key and Unique
constraints.

--
Dejan Sarka, SQL Server MVP
FAQ from Neil & others at: http://www.sqlserverfaq.com
Please reply only to the newsgroups.
PASS - the definitive, global community
for SQL Server professionals - http://www.sqlpass.org


Quote:
> What is your, as in the reader of this, ideas on the best way to check
for,
> and eliminate duplicate records in a database.

> Basically I was thinking that I shouldnt INSERT a record where all rows
> except ID and Timestamp info are the same as a current row.

> The issue then comes on UPDATES, where if the current row is changed to a
> row that already exists, then I should just use the existing ID, and
Delete
> the Row.  To solve the issue of deleting rows with multiple dependancies,
I
> would run a scheduled "Clean Database" program that would delete rows
> without any external references.

> The simple fact is that, for all of my knowledge, Normalization is a Nice
> Idea, however deploying it is a pain in the ass.  In fact I dont know any
> database that totally eliminates duplicates.

> I was also thinking of running a scheduled Duplicate eliminator once a
day,
> or runing all of my inserts and updates through a SQL Function.  What are
> your Ideas on these and other methods.  Does anyone know of Information
> Sources for these Ideas?  Ill keep researching, just wanted to find out
what
> you think.



Thu, 01 Sep 2005 02:01:03 GMT  
 Database Duplicates -- Your Ideas and Methods
a row that is the same as another row in the database, with the obvous
exception of a primary key, still violates the rules of normalization.
Right?!?  For example: 1 table with 3 columns (Primary Key, Description,
Keywords) and where 2 rows with the same descriptions and keywords.  The
primary key only serves for a reference pointer, and is not really a
important value (in most cases)



Quote:
> Mike,

> If you know what column(s) have to be unique, use Primary Key and Unique
> constraints.

> --
> Dejan Sarka, SQL Server MVP
> FAQ from Neil & others at: http://www.sqlserverfaq.com
> Please reply only to the newsgroups.
> PASS - the definitive, global community
> for SQL Server professionals - http://www.sqlpass.org



> > What is your, as in the reader of this, ideas on the best way to check
> for,
> > and eliminate duplicate records in a database.

> > Basically I was thinking that I shouldnt INSERT a record where all rows
> > except ID and Timestamp info are the same as a current row.

> > The issue then comes on UPDATES, where if the current row is changed to
a
> > row that already exists, then I should just use the existing ID, and
> Delete
> > the Row.  To solve the issue of deleting rows with multiple
dependancies,
> I
> > would run a scheduled "Clean Database" program that would delete rows
> > without any external references.

> > The simple fact is that, for all of my knowledge, Normalization is a
Nice
> > Idea, however deploying it is a pain in the ass.  In fact I dont know
any
> > database that totally eliminates duplicates.

> > I was also thinking of running a scheduled Duplicate eliminator once a
> day,
> > or runing all of my inserts and updates through a SQL Function.  What
are
> > your Ideas on these and other methods.  Does anyone know of Information
> > Sources for these Ideas?  Ill keep researching, just wanted to find out
> what
> > you think.



Thu, 01 Sep 2005 03:01:44 GMT  
 Database Duplicates -- Your Ideas and Methods
Mike,

What rule of normalization do you think duplicate values breaks? That is not
the same as duplicate data. The distinction is terribly important. Sam Smith
lives at 123 Green Street. So does Sam Smith and Sam Smith. It is a
multi-generationsl household that has an elder, middle and child. They are
distinct individuals so are separate data, as much as if they were Fred,
Harold, and John.

Now, there certainly may be good business reasons for avoiding duplicates,
although I can't imagine blindly removing them. I guess I'd be a lot more
worried about how they are getting in there. The user has two pieces of data
and is making them look identical. What are they really trying to do and how
are you going to make it easier for them to do that.

Kathleen


Quote:
> a row that is the same as another row in the database, with the obvous
> exception of a primary key, still violates the rules of normalization.
> Right?!?  For example: 1 table with 3 columns (Primary Key, Description,
> Keywords) and where 2 rows with the same descriptions and keywords.  The
> primary key only serves for a reference pointer, and is not really a
> important value (in most cases)



> > Mike,

> > If you know what column(s) have to be unique, use Primary Key and Unique
> > constraints.

> > --
> > Dejan Sarka, SQL Server MVP
> > FAQ from Neil & others at: http://www.sqlserverfaq.com
> > Please reply only to the newsgroups.
> > PASS - the definitive, global community
> > for SQL Server professionals - http://www.sqlpass.org



> > > What is your, as in the reader of this, ideas on the best way to check
> > for,
> > > and eliminate duplicate records in a database.

> > > Basically I was thinking that I shouldnt INSERT a record where all
rows
> > > except ID and Timestamp info are the same as a current row.

> > > The issue then comes on UPDATES, where if the current row is changed
to
> a
> > > row that already exists, then I should just use the existing ID, and
> > Delete
> > > the Row.  To solve the issue of deleting rows with multiple
> dependancies,
> > I
> > > would run a scheduled "Clean Database" program that would delete rows
> > > without any external references.

> > > The simple fact is that, for all of my knowledge, Normalization is a
> Nice
> > > Idea, however deploying it is a pain in the ass.  In fact I dont know
> any
> > > database that totally eliminates duplicates.

> > > I was also thinking of running a scheduled Duplicate eliminator once a
> > day,
> > > or runing all of my inserts and updates through a SQL Function.  What
> are
> > > your Ideas on these and other methods.  Does anyone know of
Information
> > > Sources for these Ideas?  Ill keep researching, just wanted to find
out
> > what
> > > you think.



Thu, 01 Sep 2005 08:23:34 GMT  
 
 [ 4 post ] 

 Relevant Pages 

1. Duplicates New ideas?

2. Best Method to detect duplicate record on .ADDNEW?

3. I am getting duplicate records, I mean everything is duplicate even Access record number

4. Holy holy duplicates duplicates Batman Batman!

5. Ideas to integrate Outlook and newsgroup into my Access database

6. Ideas to integrate Outlook,PowerPoint with Access database??

7. Dispatch Database Duplicate Fields

8. Database - duplicate records

9. VB error trapping of duplicates in indexed Access database

10. HELP: Duplicate Keys in Access Database

11. How do I exclude duplicate database records

12. Looking for database manipulation ideas. (long)

 

 
Powered by phpBB® Forum Software