Greetings, coders from a Canadian near Toronto.
I have been working on an online botanical tree of as many woods that I can identify as I can. To be sure, the following link is absolutely not any sales or promotional gimmick. I just want to make it easier for anyone helping in our forum to be able to see first hand what I am doing. Its a lot easier to relate then since this is not a typical personal, commercial or none profit site. Its purpose is unique in the world of websites as a knowledge base for wood around the world. This is becoming the new repository for over 8 years of research into wood that before was totally in one text file approaching 5.9 meg.
http://www.prowebcanada.com/taxa (taxa - short for taxonomy)

Except for a couple of pages (such as the intro page), the entire site is dynamically generated with PHP and data stored in MySQL tables. Even most of the HTML is not static but generated on the fly for each page request.

Although there is a lot of information that goes along with all the wood names, the emphasis on the research I am doing has been especially to identify and record as many woods in the world as I can ----> the more the better.

Until recently, I was reporting about 6,550 woods all by botanical names. Using common names instead just doesn't work, trust me. I also have a list of all the woods in the world's second largest wood collection in the Netherlands, the Tervuren wood collection. Even after
cleaning out any duplicates (from over 17,000) there are still over 11,300 wood names in my Tervuren MySQL table. When I can finally append all the NEW species names into my main (or master) 'species' table, my project will take a huge jump forward by better than doubling the number of woods in the world I can report.

To preserve the original files I am working on, I copied them over as 'tervuren_target' (where the new records would come from) and 'species' as species_master (the table into which the new Tervuren data should be appended without duplicates.

Once that is done, I can later rename the file or copy the 'species_master' over again to just 'species' that is crucial to the operation of the species level.

At one point, I was able to append all the files in tervuren_target'into 'species_master (over 17,000 records) but in excess of 3,000 duplicates came over even though I thought I had added code to prevent this.

I sought advice from someone else and it was suggested that the following script should do the job:

INSERT INTO species_master (genera, name, species_name, authorities, species_source)

SELECT DISTINCT genera, name, species_name, authorities, species_source FROM 'tervuren_target'
LEFT JOIN species_master
ON
tervuren_target.species_name! = species_master.species_name
WHERE species_master.species_name IS NULL

---- but that did NOT work. I am a junior PHP and SQL programmer, slow at coding. When things don't work, I often end up trying all kinds of variations and that is what I did --- still stuck on not finding the script that will work.

Since both tables can be cleaned of duplicates within each separate file (such as with a SELECT DISTINCT statement) and because to all I have ever seen, DISTINCT can be used ONLY internal to one file at a time and not across files (am I right?), using a DISTINCT command is useless as long as the two files are cleaned of duplicates before the merge.

So ...... please help me craft an SQL statement that will append all names that are in 'tervuren_target' that are NOT in the 'species_master' file --- into the 'species_master'. The comparison should be on species_name. The end result should have no duplicates of 'species-name'.

Fingers-crossed, legs crossed, eyes even crossed, I hope someone can come up with a statement that will actually work.

With thanks in advance,

Bill Mudry
Mississauga, Ontario Canada
(Canada's 6th largest city).

Recommended Answers

All 4 Replies

I think you are making one minor mistake intead of != you must use = sign in join condition.

INSERT INTO species_master (genera, name, species_name, authorities, species_source)

SELECT DISTINCT genera, name, species_name, authorities, species_source FROM tervuren_target
LEFT JOIN species_master
ON
tervuren_target.species_name = species_master.species_name
WHERE species_master.species_name IS NULL

I think you are making one minor mistake intead of != you must use = sign in join condition.

INSERT INTO species_master (genera, name, species_name, authorities, species_source)

SELECT DISTINCT genera, name, species_name, authorities, species_source FROM tervuren_target
LEFT JOIN species_master
ON
tervuren_target.species_name = species_master.species_name
WHERE species_master.species_name IS NULL

I noticed that some of the field names had changed since the suggested code had been run.
Instead I ran the following code:

INSERT INTO species_master (genus_name, species_name, authorities_species, source)
SELECT DISTINCT genus_name2, species_name2, authorities_species2, source2 FROM tervuren_target
LEFT JOIN species_master
ON
tervuren_target.species_name2 = species_master.species_name
WHERE species_master.species_name IS NULL


----------------------------------------------------------------------------------

However I got the most unusual error come up:

Error
SQL query:

SHOW TABLE STATUS LIKE 'tervuren_target';


MySQL said:

#2006 - MySQL server has gone away

I have never seen an error like that! Whatever could be happening to make 'the
MySQL server go away'?? Weird!

I ran this twice and got the same error both times. Where do I go from here, now?

Bill Mudry
Mississauga, Ontario

You Just remove DISTINCT keyword, and add " limit 10 " at the end of query, lets add only 10 records first.
I think your datasize is too big.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.