•
•
•
•
What is DaniWeb IT Discussion Community?
You're currently browsing the MySQL section within the Web Development category of DaniWeb, a massive community of 391,774 software developers, web developers, Internet marketers, and tech gurus who are all enthusiastic about making contacts, networking, and learning from each other. In fact, there are 3,428 IT professionals currently interacting right now! Registration is free, only takes a minute and lets you enjoy all of the interactive features of the site.
Please support our MySQL advertiser:
Views: 416 | Replies: 1
![]() |
•
•
Join Date: Jul 2008
Posts: 1
Reputation:
Rep Power: 0
Solved Threads: 0
Hello to all,
I am quite new to sql, and so the time has come that I don't understand why a query has a certain output.I apologise for the long post but I'd rather describe exactly what I am doing in, hoping that someone can spot the problem that I fail to.
I have created and populated two tables in my database as:
And after loading the data it looks like:
id start_bindsite end_bindsite
1 10212250 10212257
2 10105334 10105368
3 101556063 101556073
And
After population it looks like:
gene_start gene_end gene_name
10212248 10312253 gene1
10212258 10322250 gene2
101556070 102556060 gene3
What I want is to find which gene from table genome is closer to each of the sites 1, 2, 3 in table binding_sites.
I have created a nested query that calculates the linear distances between the gene_start and both the start_bindsite and end_bindsite and then with the outer select I wish to choose the minimum distance. The problem is that when I group by the start_bindsite field the output is indeed for each binding site the minimum distance but the gene_name is always gene1 , whereas for the last two binding sites it is gene2 and gene3 respectively that should be presented. And when grouping by other fields or combination of other fields doesn't seem to help.
The query:
And the output
+-----------+-----------+-----------+------------------+
| start | end | gene_name | minimum_distance |
+------------+-----------+-----------+------------------+
| 10105334 | 10105368 | gene1 | 106880 |
| 10212250 | 10212257 | gene1 | 1 |
| 101556063 | 101556073 | gene1 | 3 |
+-----------+-----------+------------+------------------+
3 rows in set (0.00 sec)
Does anyone have any idea what am I doing wrong here? Any hint or suggestion would be hugely appreciated!!
Thanks for your time in advance,
Eva
I am quite new to sql, and so the time has come that I don't understand why a query has a certain output.I apologise for the long post but I'd rather describe exactly what I am doing in, hoping that someone can spot the problem that I fail to.
I have created and populated two tables in my database as:
create table binding_sites (id integer, start_bindsite integer, end_bindisite integer);
id start_bindsite end_bindsite
1 10212250 10212257
2 10105334 10105368
3 101556063 101556073
And
create table genome (gene_name varchar(50), gene_start integer, gene_end integer);
gene_start gene_end gene_name
10212248 10312253 gene1
10212258 10322250 gene2
101556070 102556060 gene3
What I want is to find which gene from table genome is closer to each of the sites 1, 2, 3 in table binding_sites.
I have created a nested query that calculates the linear distances between the gene_start and both the start_bindsite and end_bindsite and then with the outer select I wish to choose the minimum distance. The problem is that when I group by the start_bindsite field the output is indeed for each binding site the minimum distance but the gene_name is always gene1 , whereas for the last two binding sites it is gene2 and gene3 respectively that should be presented. And when grouping by other fields or combination of other fields doesn't seem to help.
The query:
select temp.start, temp.end, temp.gene_name, min(temp.distance) as minimum_distance
from (select start_bindsite, end_bindsite, gene_name, abs(gene_start-start_bindsite) as distance
from genome, binding_sites
group by gene_name, start_bindsite
union
select start_bindsite, end_bindsite, gene_name, abs(gene_start-end_bindsite) as distance
from genome, binding_sites
group by gene_name, start_bindsite) as temp
group by temp.startAnd the output
+-----------+-----------+-----------+------------------+
| start | end | gene_name | minimum_distance |
+------------+-----------+-----------+------------------+
| 10105334 | 10105368 | gene1 | 106880 |
| 10212250 | 10212257 | gene1 | 1 |
| 101556063 | 101556073 | gene1 | 3 |
+-----------+-----------+------------+------------------+
3 rows in set (0.00 sec)
Does anyone have any idea what am I doing wrong here? Any hint or suggestion would be hugely appreciated!!
Thanks for your time in advance,
Eva
•
•
Join Date: Apr 2008
Posts: 295
Reputation:
Rep Power: 1
Solved Threads: 41
Hello,
1. how come that in
>>> select temp.start, temp.end, temp.gene_name, min(temp.distance) as
>>> minimum_distance from (select start_bindsite, end_bindsite, gene_name,
>>> abs(gene_start-start_bindsite) as distance from genome, binding_sites group by
>>> gene_name, start_bindsite
group-by clause contains 2 columns only whereas you select 3 columns? this should produce a true error, if not, something is wrong with your database.
2. Because of
>>> from genome, binding_sites
you are doing inner join. Therefore you need a where clause with at least one join constraints. Because this is missing an undesirable cross join would be computed.
(I advice you a more modern join construct: from genome g INNER JOIN binding_sites b ON g.column = b.column. Here predicate g.column = b.column is the join constraint.)
krs,
tesu
1. how come that in
>>> select temp.start, temp.end, temp.gene_name, min(temp.distance) as
>>> minimum_distance from (select start_bindsite, end_bindsite, gene_name,
>>> abs(gene_start-start_bindsite) as distance from genome, binding_sites group by
>>> gene_name, start_bindsite
group-by clause contains 2 columns only whereas you select 3 columns? this should produce a true error, if not, something is wrong with your database.
2. Because of
>>> from genome, binding_sites
you are doing inner join. Therefore you need a where clause with at least one join constraints. Because this is missing an undesirable cross join would be computed.
(I advice you a more modern join construct: from genome g INNER JOIN binding_sites b ON g.column = b.column. Here predicate g.column = b.column is the join constraint.)
krs,
tesu
Information is moving—you know, nightly news is one way, of course, but it's also moving through the blogosphere and through the Internets. I promise you I will listen to what has been said here, even though I wasn't here. Ann and I will carry out this equivocal message to the world. I'm the master of low expectations.
![]() |
•
•
•
•
Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
•
•
•
•
•
•
•
•
DaniWeb MySQL Marketplace
- Need another set of eyes to my gutted code (PHP)
- Nested loop (ASP)
- SQL SERVER 2000 Selecting a record based on an aggregate function (MS SQL)
- Problem with query. (MS SQL)
- SQL query problem with WHERE clause (ASP)
- RunTime 3075, cant see the problem? And Values are as they should be... (Visual Basic 4 / 5 / 6)
- Cold Fusion Loop problem (ColdFusion)
Other Threads in the MySQL Forum
- Previous Thread: Selecting customers that follow a particular trend
- Next Thread: IF THEN statement would not work


Linear Mode