0

Hi thought i'd try here see if anyone can help solve this quicker.

First off we have a email database.

The main table is the email data, all unique rows per email address with a unique id.

eg.
emailid,email
1,email@example.com
2,email@example.co.uk

Next we have another table which logs the data of each time an email has been attempted to send such as

incid,emailid,sent,campaign_id,datesent
1,1,1,send1,2012-02-20
2,2,0,send1,2012-02-20
3,1,1,send2,2012-02-27
4,2,1,send2,2012-02-27

The part i'm having trouble with is a query that pulls the next emails a campaign is up to

(
	SELECT *
	FROM `the_main_table` as `mt` LEFT JOIN `sentdata` AS `sd` ON mt.emailid = sd.emailid 
	WHERE ##......##
sd.campaign_id = 'send2'
	GROUP BY mt.emailid  
	ORDER BY MAX(sd.`datesent`) ASC 
	HAVING MAX(sd.`datesent`)
	LIMIT 0,500
)
UNION
(
	SELECT *
	FROM `the_main_table` as `mt` 
	WHERE ##......##
	mt.emailid NOT IN(SELECT `emailid` FROM `sentdata` WHERE `campaign_id` = 'send2') 
	GROUP BY mt.emailid 
	LIMIT 0,500
)

The needed results for the query is an ordered list of a set quantity (eg. 500) of emails that haven't been sent to OR if everything has been sent the emails that haven't been sent to for the longest time.

The problem with the current one I worked out above is the quantity, the first union pulls an ordered list of the 500 oldest it has already sent to(if there are any), then the union below pulls 500 it hasn't sent to(if there are any). For new campaigns it will pull 500, which is good afterwards it will pull the same 500 plus 500 new ones - which we can't have.

Can anyone think of a different approach to get the desired results?

2
Contributors
20
Replies
21
Views
5 Years
Discussion Span
Last Post by smantscheff
Featured Replies
  • You can move the WHERE conditions into the left join clause to exclude the undesired rows. Why don't you prepare a nice little test case with complete table structures, test data and your query for us? It might clear things up. Read More

  • Maybe this could be an approach: Make a left join from the_master_table to sentdata. For any missing rows on the right insert a date way back in the past and sort by this virtual date as well as by actual dates. That way you get all the records which are … Read More

0

Order by the fields sent ASC and datesent DESC and use the first 500 rows.

Edited by smantscheff: n/a

0

Order by the fields sent ASC and datesent DESC and use the first 500 rows.

Thanks for reply, the `sent` column contains a 0 if the email failed to send directly and a 1 if the email was sent to the server successfully - ordering by this asc will just put the fails at the top.

We don't want to keep retrying an email if the server doesn't want it, regardless if it failed or not the order is just "when it was last attempted" - fails are handled from another subquery.

I've been talking with another tech guy here and we tried:

#in the WHERE: 
AND (`sd`.`campaign_id` = 'send2' OR `sd`.`campaign_id` IS NULL) 
#Having:
HAVING MAX(sd.`datetime`) OR MAX(sd.`datetime`) IS NULL

Alas the join is not leaving them fields null - they just contain a random different campaign_id when the current campaign_id hasn't sent to it before.

It left an idea open wondering if there is a way to LEFT JOIN a table with WHERE criteria ran on it before actually being joint on such as:

LEFT JOIN (SELECT * FROM `sentdata` WHERE campaign_id = 'send2') AS `sd` ON `mt`.`emailid` = `sd`.`emailid`

This way the field would be NULL and we could identify them as not being sent before, anyone familiar with that sort of thing?

0

That's correct. In a left join all values in the right table for which there is no matching record are NULL.

0

That's correct. In a left join all values in the right table for which there is no matching record are NULL.

re-read, there are matching values but are excluded because of the where clause

1

You can move the WHERE conditions into the left join clause to exclude the undesired rows. Why don't you prepare a nice little test case with complete table structures, test data and your query for us? It might clear things up.

0

You can move the WHERE conditions into the left join clause to exclude the undesired rows. Why don't you prepare a nice little test case with complete table structures, test data and your query for us? It might clear things up.

I'm exploring down that road atm - when i actually typed the "LEFT JOIN (SELECT * FROM `sentdata` WHERE campaign_id = 'send2') AS `sd` ON `mt`.`emailid` = `sd`.`emailid`" i was actually guessing and had no idea that would work!

The current problem now is just speed and the current query i am running on is this:

SELECT mt.emailid,sd.`datesent`,MAX(sd.`datesent`),sd.cusid
FROM (SELECT recid FROM `the_main_table` WHERE bademail = 0 AND source IN ('source1','source2') AND oksend = 0) AS `mt` 
LEFT JOIN (SELECT * FROM `sentdata` WHERE campaign_id = 'send2') AS `sd` ON `mt`.`emailid` = `sd`.`emailid`
GROUP BY mt.emailid  
HAVING MAX(sd.`datesent`) = sd.`datesent`
ORDER BY MAX(sd.`datesent`) ASC 
LIMIT 50000

A LEFT JOIN will take over 4/5 mins which is too long, i think a left join would do exactly what i want it to.

An INNER JOIN takes only 1.5 seconds but wont pull records there are no match for - which i need it to pull the ones with no match.

I'm still working on it myself atm - if i don't work out a solution by tonight ill write test tables

0

Have a look at EXPLAIN <yourQuery> for both cases. Maybe you can add indexes to speed up your LEFT JOIN query.

0

Have a look at EXPLAIN <yourQuery> for both cases. Maybe you can add indexes to speed up your LEFT JOIN query.

Heres the explain of the INNER JOIN

id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
"1","PRIMARY","<derived3>","ALL",\N,\N,\N,\N,"12311","Using temporary; Using filesort"
"1","PRIMARY","<derived2>","ALL",\N,\N,\N,\N,"702698","Using where; Using join buffer"
"3","DERIVED","mainsent","ref","NewIndex3","NewIndex3","8","","7701","Using where"
"2","DERIVED","maintable","ref","ownedby,addunk","addunk","1","","2747012","Using where"

and heres the LEFT JOIN

id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
"1","PRIMARY","<derived2>","ALL",\N,\N,\N,\N,"702692","Using temporary; Using filesort"
"1","PRIMARY","<derived3>","ALL",\N,\N,\N,\N,"12300",""
"3","DERIVED","mainsent","ref","NewIndex3","NewIndex3","8","","7697","Using where"
"2","DERIVED","maintable","ref","ownedby,addunk","addunk","1","","2747012","Using where"

as far as i can see the time issue is just the volume of rows, 8k vs 702k.

the 2 indexes "ownedby and addunk" im not sure how to index the 2 fields together to make it more efficient

0

Which indexes do you have?
You don't need the doubling of select clauses.
And do not select all columns (*) - it will slow things additionally.

Along those lines:

SELECT mt.emailid, sd.datesent,MAX(sd.datesent),sd.cusid
FROM the_main_table mt
LEFT JOIN sentdata sd
ON mt.emailid=sd.emailid
AND campaign_id = 'send2'
AND bademail = 0 
AND source IN ('source1','source2') 
AND oksend = 0

GROUP BY mt.emailid  
HAVING MAX(sd.datesent) = sd.datesent
ORDER BY MAX(sd.datesent) ASC 
LIMIT 50000
0

Alright getting pretty stuck on this now I've put together this script, if you run it all you'll get the same table setup and it hit's the same problem. The data is pretty random but creates the same problem.

DROP TABLE IF EXISTS `sentdata`;
DROP TABLE IF EXISTS `the_main_table`;

CREATE TABLE `the_main_table` (
  `RECID` INT(7) NOT NULL AUTO_INCREMENT,
  `emref` VARCHAR(20) DEFAULT NULL,
  `company` VARCHAR(60) DEFAULT NULL,
  `town` VARCHAR(40) DEFAULT NULL,
  `postcode` VARCHAR(8) DEFAULT NULL,
  `subpostcode` VARCHAR(4) DEFAULT NULL,
  `county` VARCHAR(30) DEFAULT NULL,
  `telephone` VARCHAR(25) DEFAULT NULL,
  `prefix` VARCHAR(10) DEFAULT NULL,
  `fname` VARCHAR(40) DEFAULT NULL,
  `sname` VARCHAR(40) DEFAULT NULL,
  `fullname` VARCHAR(80) DEFAULT NULL,
  `jobrole` VARCHAR(60) DEFAULT NULL,
  `responsibility` VARCHAR(20) DEFAULT NULL,
  `email` VARCHAR(200) NOT NULL DEFAULT '',
  `domain` VARCHAR(100) NOT NULL,
  `employees` VARCHAR(40) DEFAULT NULL,
  `fltband` VARCHAR(40) DEFAULT NULL,
  `sic4code` VARCHAR(4) DEFAULT NULL,
  `sic4desc` VARCHAR(120) DEFAULT NULL,
  `sic2code` VARCHAR(2) DEFAULT NULL,
  `sic2desc` VARCHAR(120) DEFAULT NULL,
  `sicsector` VARCHAR(40) DEFAULT NULL,
  `businesscode` VARCHAR(30) DEFAULT NULL,
  `busdesc` VARCHAR(50) DEFAULT NULL,
  `turnover` VARCHAR(40) DEFAULT NULL,
  `postcodearea` VARCHAR(5) DEFAULT NULL,
  `region` VARCHAR(20) DEFAULT NULL,
  `status` VARCHAR(40) DEFAULT NULL,
  `tps` VARCHAR(10) NOT NULL DEFAULT '0',
  `ADDUNK` TINYINT(1) NOT NULL DEFAULT '0',
  `ASKOFF` TINYINT(1) NOT NULL DEFAULT '0',
  `contbounce` TINYINT(1) NOT NULL DEFAULT '0',
  `REJECT` TINYINT(1) DEFAULT '0',
  `listmonth` VARCHAR(40) NOT NULL DEFAULT 'Dec 2010',
  `ownedby` ENUM('MH','CUSDATA','MHNEW','MHDEAD','ID','5MINS') NOT NULL DEFAULT 'MHNEW',
  `source` TEXT,
  `softbounce` INT(5) DEFAULT NULL,
  `fax` VARCHAR(30) DEFAULT NULL,
  `addr1` VARCHAR(40) DEFAULT NULL,
  `addr2` VARCHAR(40) DEFAULT NULL,
  `addr3` VARCHAR(30) DEFAULT NULL,
  `addr4` VARCHAR(25) DEFAULT NULL,
  `numbounces` INT(4) NOT NULL DEFAULT '0',
  `datefail` VARCHAR(15) DEFAULT NULL,
  `totfail` TINYINT(1) DEFAULT '0',
  `website` VARCHAR(150) DEFAULT NULL,
  `addunkmonth` VARCHAR(2) NOT NULL DEFAULT '0',
  `datelast` VARCHAR(25) DEFAULT NULL,
  `numsent` INT(5) DEFAULT '0',
  `nation` VARCHAR(15) DEFAULT NULL,
  `trapscore` INT(1) DEFAULT '0',
  `readcount` INT(3) DEFAULT NULL,
  `generic` TINYINT(1) DEFAULT '0',
  `contactid` VARCHAR(20) DEFAULT NULL,
  `LASTCUSID` VARCHAR(6) DEFAULT NULL,
  `SENDINC` INT(5) NOT NULL DEFAULT '0',
  `spareint1` INT(3) DEFAULT NULL,
  `spareint2` INT(3) DEFAULT NULL,
  `spareint3` INT(3) DEFAULT NULL,
  `sparevarchar2` VARCHAR(10) DEFAULT NULL,
  `sparevarchar3` VARCHAR(10) DEFAULT NULL,
  `sparevarchar4` VARCHAR(20) DEFAULT NULL,
  `spareint4` INT(3) DEFAULT NULL,
  PRIMARY KEY (`RECID`),
  UNIQUE KEY `email` (`email`),
  KEY `ownedby` (`ownedby`),
  KEY `RecidIN` (`RECID`),
  KEY `addunk` (`ADDUNK`),
  KEY `postcodesrea` (`postcodearea`),
  KEY `sic4code` (`sic4code`),
  KEY `postcode` (`postcode`),
  KEY `county` (`county`),
  KEY `website` (`website`)
) ENGINE=INNODB AUTO_INCREMENT=11108021 DEFAULT CHARSET=latin1
;
CREATE TABLE `sentdata` (
  `RECID` INT(8) NOT NULL DEFAULT '0',
  `SID` INT(4) NOT NULL DEFAULT '0',
  `datetime` DATETIME NOT NULL DEFAULT '2000-01-01 00:00:00',
  `SENT` TINYINT(1) NOT NULL DEFAULT '0',
  `CUSID` VARCHAR(6) NOT NULL DEFAULT 'BIM00',
  `sendid` VARCHAR(7) NOT NULL DEFAULT '0',
  KEY `NewIndex1` (`RECID`),
  KEY `NewIndex2` (`SID`),
  KEY `NewIndex3` (`CUSID`),
  KEY `NewIndex4` (`datetime`),
  KEY `NewIndex6` (`SID`,`RECID`),
  KEY `NewIndex5` (`SENT`)
) ENGINE=MYISAM DEFAULT CHARSET=latin1
;

DROP PROCEDURE IF EXISTS load_test_data;

DELIMITER #

CREATE PROCEDURE load_test_data()
BEGIN

DECLARE v_max_customers INT UNSIGNED DEFAULT 0;
DECLARE v_max_orders INT UNSIGNED DEFAULT 0 ;
DECLARE v_counter INT UNSIGNED DEFAULT 0 ;
DECLARE v_rnd_cust_id INT UNSIGNED DEFAULT 0;
DECLARE v_rnd_send_id INT UNSIGNED DEFAULT 0;
DECLARE v_base_date DATETIME;

  SET FOREIGN_KEY_CHECKS = 0;

  TRUNCATE TABLE `sentdata`;
  TRUNCATE TABLE `the_main_table`;

  SET FOREIGN_KEY_CHECKS = 1;

  SET v_base_date = "2012-02-01 00:00:00";

  SET v_max_customers = 1000000;
  SET v_max_orders = 5000000; 

  START TRANSACTION;

  SET v_counter = 0;
  WHILE v_counter < v_max_customers DO
        INSERT INTO `the_main_table` (`email`) VALUES (CONCAT('email', v_counter+1,'@example.com'));
    SET v_counter=v_counter+1;
  END WHILE;

  COMMIT;

  START TRANSACTION;

  SET v_counter = 0;
  WHILE v_counter < v_max_orders DO

    SET v_rnd_cust_id = FLOOR(1 + (RAND() * v_max_customers));
    SET v_rnd_send_id = FLOOR(10 + (RAND() * 90));
    

        INSERT INTO `sentdata` (`RECID`, `datetime`,`CUSID`) VALUES (v_rnd_cust_id, v_base_date + INTERVAL v_counter SECOND,CONCAT('BIM',v_rnd_send_id));
    SET v_counter=v_counter+1;
  END WHILE;

  COMMIT;

END #

DELIMITER ;

CALL load_test_data();

##runs ok
SELECT mt.recid,sd.`datetime`,MAX(sd.`datetime`),sd.cusid
FROM 
	#(SELECT recid FROM `maintable` WHERE addunk = 0 AND ownedby IN ('mh','5mins','id') AND askoff = 0) as `mt` 
	`the_main_table` AS `mt`
INNER JOIN (SELECT * FROM `sentdata` WHERE cusid = 'BIM24') AS `sd` ON `mt`.`recid` = `sd`.`recid`
GROUP BY mt.recid  
HAVING MAX(sd.`datetime`) = sd.`datetime`
ORDER BY MAX(sd.`datetime`) ASC 
LIMIT 50000
;
##times out severly over 11 minutes and gave up waiting
SELECT mt.recid,sd.`datetime`,MAX(sd.`datetime`),sd.cusid
FROM 
	#(SELECT recid FROM `maintable` WHERE addunk = 0 AND ownedby IN ('mh','5mins','id') AND askoff = 0) AS `mt` 
	`the_main_table` AS `mt`
LEFT JOIN (SELECT * FROM `sentdata` WHERE cusid = 'BIM24') AS `sd` ON `mt`.`recid` = `sd`.`recid`
#WHERE addunk = 0 AND ownedby IN ('mh','5mins','id') AND askoff = 0
GROUP BY mt.recid  
ORDER BY MAX(sd.`datetime`) ASC 
LIMIT 50000;
1

Maybe this could be an approach:
Make a left join from the_master_table to sentdata.
For any missing rows on the right insert a date way back in the past and sort by this virtual date as well as by actual dates. That way you get all the records which are missing from the right side as well as the oldest ones from the right side.
Like in:

select mt.recid, mt.email, ifnull(sd.datetime,'1970-01-01') as theDate, cusid
from the_main_table mt
left join sentdata sd
on mt.RECID = sd.RECID 
where cusid is null or cusid='BIM24'
order by theDate 
limit 50000;
Votes + Comments
helpful
0

Maybe this could be an approach:
Make a left join from the_master_table to sentdata.
For any missing rows on the right insert a date way back in the past and sort by this virtual date as well as by actual dates. That way you get all the records which are missing from the right side as well as the oldest ones from the right side.
Like in:

select mt.recid, mt.email, ifnull(sd.datetime,'1970-01-01') as theDate, cusid
from the_main_table mt
left join sentdata sd
on mt.RECID = sd.RECID 
where cusid is null or cusid='BIM24'
order by theDate 
limit 50000;

Thanks for that, i can see that ifnull function being very useful to know for future, it's getting close. For the query to pull the correct data it needs to be ran like this:

SELECT mt.recid, mt.email, sd.datetime, cusid
FROM the_main_table mt
LEFT JOIN (SELECT recid,`datetime`,cusid FROM sentdata WHERE `cusid` = 'BIM24') sd
ON mt.RECID = sd.RECID 
ORDER BY sd.datetime ASC
LIMIT 5000000;

The fact the datetime field is null isn't an issue, it's when an email hasn't been sent to by 'BIM24' but it has been sent to by someone else such as 'BIM30'. When someone else has sent to it the WHERE clause excludes it from the result set. This is quite important we have brand new entries that haven't sent to anything up to longtime ones that have already sent to everything - but at the same time new data will come in they havn't sent to as well

Only joining on emails that 'BIM24' has sent to would leave the rest NULL and null emails won't get excluded by the where clause and NULL's are ordered at the top. Soon as i do this however the query just times out i've not actually tried letting it run through - i'm trying it now and its currently at 14 mins and still going.

The query is stuck in this state: Copying to tmp table on disk

We have full access to the server, its a dedicated linux mysql server - it does nothing else we can edit the mysql ini file if thats not setup well. I'm thinking it's some index not set right cause such a small change shouldn't make such a huge difference - i'd of thought it would run much faster excluding records before the join.

0

I do not understand any more what you're trying to get at. Could you prepare a test case with just 2 or 3 rows which shows which ones you would like to include and which not?
I also do not understand the subselect. Why not like this:

SELECT mt.recid, mt.email, sd.datetime, cusid
FROM the_main_table mt
LEFT JOIN sentdata  sd
ON mt.RECID = sd.RECID 
WHERE `cusid` = 'BIM24'
ORDER BY sd.datetime ASC
LIMIT 5000000;

This runs on my test system in 3 seconds.

0

Hi Heres the 2 query examples

This one pulls full list 1million rows took 28 mins and pulled all 1million rows:

/*[11:32:39][1699412 ms]*/ SELECT mt.recid, mt.email, sd.datetime, cusid  FROM the_main_table mt  LEFT JOIN (SELECT recid,`datetime`,cusid FROM sentdata WHERE `cusid` = 'BIM24') sd  ON mt.RECID = sd.RECID ORDER BY sd.datetime ASC  LIMIT 5000000;

Your one, the one that runs fast took 6.8s but pulled 254k

/*[12:58:26][6880 ms]*/ SELECT mt.recid, mt.email, sd.datetime, cusid  FROM the_main_table mt  LEFT JOIN sentdata sd  ON mt.RECID = sd.RECID   WHERE cusid IS NULL OR cusid='BIM24'  ORDER BY sd.datetime ASC  LIMIT 5000000;

In the second the other 746k records have been sent to before by a different cusid.

so cusid != 'BIM24' AND cusid is not null and they are excluded - i don't want them to be excluded. The first one they are returned as NULL and are included in the result set.

eg.

the_main_table
email1
email2
email3

sentdata
email1,bim24,2012-02-01
email2,bim56,2012-02-01

SELECT mt.recid, mt.email, sd.datetime, cusid FROM the_main_table mt LEFT JOIN sentdata sd ON mt.RECID = sd.RECID WHERE cusid IS NULL OR cusid='BIM24' ORDER BY sd.datetime ASC LIMIT 5000000

result
email1,bim24,2012-02-01
email3,null,null


The only purpose of the left join onto sentdata is to get 20,000 records that havn't been sent to then if there arn't 20,000 get the oldest send date to make up the number to 20,000

eg. for a list of 5

email1,null,null
email2,null,null
email3,null,null
email4,bim24,2012-01-04
email5,bim24,2012-01-06
*LIMIT REACHED*
email6,bim24,2012-01-08
email7,bim24,2012-01-09

On the next pull after they have been sent the top 5 move down to the bottom as they have the most recent datesent now.

I just need a query to be able to pull them records that havn't been sent/oldest datesent within 2 minutes preferably

0

But this is exactly what my query where I introduced the ifnull function does: It selects first all records from the_main_table which do not have a matching record in sentdata, and second and additionally all records in sentdata.
Test case:

drop table if exists main_table;
create table main_table (email char(10));
insert into main_table values ('email1'),('email2'),('email3');
drop table if exists sent_table;
create table sent_table (email char(10),cusid char(10),date date );
insert into sent_table values ('email1','bim24','2012-02-01'), ('email2','bim56','2012-02-01');
select mt.email, ifnull(sd.date,'1970-01-01') as theDate, cusid
from main_table mt
left join sent_table sd
on mt.email=sd.email
where cusid is null or cusid='BIM24'
order by theDate;

Result:

+--------+------------+-------+
| email  | theDate    | cusid |
+--------+------------+-------+
| email3 | 1970-01-01 | NULL  |
| email1 | 2012-02-01 | bim24 |
+--------+------------+-------+
0

But this is exactly what my query where I introduced the ifnull function does: It selects first all records from the_main_table which do not have a matching record in sentdata, and second and additionally all records in sentdata.
Test case:

drop table if exists main_table;
create table main_table (email char(10));
insert into main_table values ('email1'),('email2'),('email3');
drop table if exists sent_table;
create table sent_table (email char(10),cusid char(10),date date );
insert into sent_table values ('email1','bim24','2012-02-01'), ('email2','bim56','2012-02-01');
select mt.email, ifnull(sd.date,'1970-01-01') as theDate, cusid
from main_table mt
left join sent_table sd
on mt.email=sd.email
where cusid is null or cusid='BIM24'
order by theDate;

Result:

+--------+------------+-------+
| email  | theDate    | cusid |
+--------+------------+-------+
| email3 | 1970-01-01 | NULL  |
| email1 | 2012-02-01 | bim24 |
+--------+------------+-------+

Thats not the result set i want, i want this one:

email,theDate,cusid
email2,1970-01-01,\N
email3,1970-01-01,\N
email1,2012-02-01,bim24

using this query on that setup gives the right result set:

SELECT mt.email, IFNULL(sd.DATE,'1970-01-01') AS theDate, cusid
FROM main_table mt
LEFT JOIN (SELECT email,`DATE`,cusid FROM sent_table WHERE `cusid` = 'BIM24') sd
ON mt.email=sd.email
WHERE cusid IS NULL OR cusid='BIM24'
ORDER BY theDate;

The problem is when scaled up in volume the time is way too long, i think it could even go to 5 minutes but that is pushing it ideally the query should be complete within 2 minutes.

I get the feeling some index is missing once "LEFT JOIN (SELECT email,`DATE`,cusid FROM sent_table WHERE `cusid` = 'BIM24') sd" is there a way to add an index to that subquery?

0

Going around in circles.
In this test setup, you want email1 because it links to bim24. You want email3 because there has no mail yet been sent to email3. Why do you want email2?
Also, the subselect in your query still does not make sense. It leads to a tautology.
The subselect selects all from sent_table where cusid='bim24' . Because this select is right table in the left join, all right table rows have either cusid='bim24' or isnull(cusid) . So the subsequent Where clause WHERE cusid IS NULL OR cusid='BIM24' in effect selects all rows in the right table, and, because of the left join, all in the left table too. So you are just selecting all rows from main_table in a quite complicated way.

0

Going around in circles.
In this test setup, you want email1 because it links to bim24. You want email3 because there has no mail yet been sent to email3. Why do you want email2?
Also, the subselect in your query still does not make sense. It leads to a tautology.
The subselect selects all from sent_table where cusid='bim24' . Because this select is right table in the left join, all right table rows have either cusid='bim24' or isnull(cusid) . So the subsequent Where clause WHERE cusid IS NULL OR cusid='BIM24' in effect selects all rows in the right table, and, because of the left join, all in the left table too. So you are just selecting all rows from main_table in a quite complicated way.

Yes in the actual queries it has a few subqueries and about 4 or 5 where clauses that filter out the actual list in maintable.

the full data is about 5.8million and the full list with no critera drops it to around 800k, if its done with sic code or postcode etc - people want to target a specific area this full list changes even just 2000 emails are pulled from maintable.

This is the list pulled from maintable, for a brand new person its simple. It can just pick any cause it has never sent to any before. After the first send however, say 20k, the list needs to know which have been sent and when.

mainsent stores this data of every email sent

For the next send the data pulled needs to take into account what has already been sent and pick the next 20k that are either the oldest or have never been sent.

so in the test setup:
I want email1 because it is in bim24's selected list, last in the list because its the only one been sent to

I want email2 because it is in bim24's selected list and he has never sent it ordered at top

And I want email3 because it is in bim24's selected list and he has never sent it ordered at top

and yeah the where clause makes no difference as they are already either bim24 or null

This cannot be done in a field in maintable. The only purpose of mainsent is to order the selected emails from maintable. If there was a way to pre-calculate which 20k to send and do a email IN from maintable that would work also.

0

Worked out a solution now with temp tables.

If anyone is interested:

DROP TEMPORARY TABLE IF EXISTS temp_maintable_bim24;
DROP TEMPORARY TABLE IF EXISTS temp_mainsent_bim24;
CREATE TEMPORARY TABLE temp_mainsent_bim24 SELECT recid,`datetime`,`cusid` FROM mainsent WHERE `cusid` = 'bim24';
ALTER TABLE temp_mainsent_bim24 ADD INDEX (recid),ADD INDEX (`datetime`);
CREATE TEMPORARY TABLE temp_maintable_bim24 SELECT recid, email,domain, postcode, postcodearea, subpostcode,town, employees, turnover, sic4code, sic4desc, sic2code, sic2desc, trapscore, company, prefix, fname, sname, askoff FROM maintable WHERE `ownedby` IN ('MH','ID') AND addunk = 0 AND askoff = 0;
ALTER TABLE temp_maintable_bim24 ADD UNIQUE INDEX (recid);
;
SELECT maintable.RECID, email,ms.`datetime`
FROM temp_maintable_bim24 AS `maintable` 
LEFT JOIN `temp_mainsent_bim24` AS `ms` ON maintable.recid = ms.recid 
GROUP BY maintable.RECID 
ORDER BY MAX(ms.`datetime`) ASC 
LIMIT 0,20000

Thanks for helping smantscheff!

0

Thanks for the update. I believe that your database design has some flaws which lead to your complex queries. But, as they say, never change a running system. And never run a changing one.

This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.