Hi

I need to generate a CSV of names and address from a table and I would like only one result (it doesn't matter which) per distinct email address. If possible, they also need to be ordered by DateAdded descending so I can select just the most recent 1000 results.

My table contains a number of fields such as FirstName, LastName, Address1, Address2, Postcode, Email etc. A plain distinct doesn't work as they may have entered a slightly different name/address combination each time they place an order. So I thought doing it on a distinct email address would be the way.

I have tried populating a temporary table with all distinct email addresses and using a left join but still get multiple results.

For example if the fields were FirstName, LastName, Email and the data looked like

John Williams john@somewhere.com
Jonathan Williams john@somewhere.com
Mary Thomas mary@blah.com

I would want to return just
John Williams john@somewhere.com
Mary Thomas mary@blah.com

as the first two records have the same email address so only one needs to be returned.

I've tried a variety of JOIN, DISTINCT and GROUP BY statements but can't get what I need. I'm sure I'm missing something but can't see what!

Any help would be gratefully appreciated.

Recommended Answers

All 6 Replies

Select Email,
(
  Select Top 1 FirstName
  From Table As x
  Where x.Email = Table.Email
  Order By x.CreateDate Desc
)
From Table
Group By Email
Order By Email

You could modify the logic about to form a join syntax if you wanted.

Hi Scott

That's excellent thanks. That does bring back just the two rows.

But how would I expand that to include other fields such as LastName, Address1, Postcode etc. I've tried amending your code to something like

Select ShopperEmail,
(
  Select Top 1 BillingFirstName, BillingLastName
  From TempOrders As x
  Where x.ShopperEmail = TempOrders.ShopperEmail
  Order By x.OrderDate Desc
)
From TempOrders
Group By ShopperEmail
Order By ShopperEmail

but get the error "Only one expression can be specified in the select list when the subquery is not introduced with EXISTS." which doesn't mean much to me I'm afraid. Sorry - I might need a bit more hand holding on this one

When you say "You could modify the logic about to form a join syntax if you wanted." Do you mean that I could re-write the code you provided using a JOIN instead of a subquery? I think that may be a bit beyond me unfortunately!

Thanks

IF OBJECT_ID('Emails', 'U') IS NOT NULL DROP TABLE Emails
IF OBJECT_ID('tempdb..#TempTable', 'U') IS NOT NULL DROP TABLE #TempTable
GO
Create Table Emails
(
  Email varchar(50),
  FirstName varchar(50),
  LastName varchar(50),
  CreateDate DateTime
)

Insert Into Emails (Email, FirstName, LastName, CreateDate) Values ('sknake@daniweb.com', 'Scott', 'Knake', GetDate())
Insert Into Emails (Email, FirstName, LastName, CreateDate) Values ('sknake@daniweb.com', 'Scott1', 'Knake1', GetDate()+1)
Insert Into Emails (Email, FirstName, LastName, CreateDate) Values ('sknake@daniweb.com', 'Scott2', 'Knake2', GetDate()+2)
Insert Into Emails (Email, FirstName, LastName, CreateDate) Values ('sknake@daniweb.com', 'Scott3', 'Knake3', GetDate()+3)
GO
Insert Into Emails (Email, FirstName, LastName, CreateDate) Values ('sSmith@daniweb.com', 'penfold33', 'Smith', GetDate())
Insert Into Emails (Email, FirstName, LastName, CreateDate) Values ('sSmith@daniweb.com', 'penfold331', 'Smith1', GetDate()+1)
Insert Into Emails (Email, FirstName, LastName, CreateDate) Values ('sSmith@daniweb.com', 'penfold332', 'Smith2', GetDate()+2)
Insert Into Emails (Email, FirstName, LastName, CreateDate) Values ('sSmith@daniweb.com', 'penfold333', 'Smith3', GetDate()+3)
GO

Select *
From Emails

Select Email, Min(CreateDate) As CreateDate
INTO #TempTable
From Emails
Group By Email

Select Emails.*
From #TempTable Inner Join Emails On (#TempTable.Email = Emails.Email and #TempTable.CreateDate = Emails.CreateDate)

Hi Scott

That's great, thanks for all your help on this. Your solution works well but someone else came up with a different solution:

select * from yourtable y1 
where y1.yourprimarykey in  
(select yourprimarykey = (select top 1 y2.yourprimarykey from yourtable y2 where y2.email = emails.email) 
from 
(select distinct Email from yourtable) emails)

I was just wondering if you knew whether either solution would be better regarding performance (or any other reason) or doesn't it matter which I go with?

Thanks again

There might be a difference and I imagine mine has more of a penalty since it is creating a temp table which causes disk I/O but for readability and maintainability I prefer the way I chose regardless.

It should be a negligible difference unless these tables are huge so its up to you :)

Please mark this thread as solved since you have found two answers to your question and good luck!

Great. Thanks - I'll have a play with both.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.