Search query excluding "The" from values

Question

bleedi 3 Junior Poster in Training

14 Years Ago

Hey!

I've got a SELECT query that fetches rows based on a search string using the LIKE operator. How can I make it skip a leading "The " in the value?

For example, I have TV series listed in the database, like "Futurama", "The Office", "The Friends" and "The Simpsons". If I search with the string "F", how can I make the query ignore the leading "The " so that the result set is "Futurama" and "The Friends"?

My current query is like this:
"SELECT name FROM series WHERE name LIKE 'word%' ORDER BY name"

I know it could be done by changing 'word%' to '%word%', but then I would get other kinds of results too, like, from the example, "The Office". :)

mysql

Borzoi 24 Posting Whiz

14 Years Ago

Why would you not want to use %word% ? With your way, if you have something like "Harry Potter and the Chamber of Secrets" and the person searches for "Chamber of Secrets" then no result would be returned.

If you have code to ignore "The" every time, then if someone searches for "The Simpsons" nothing would be returned as you are ignoring "The" from the database.

Also, if I'm not mistaken, smantscheff's code will only replace "The " and not "the " and since small words such as "the", "a", "and" etc. are supposed to be in lowercase when in titles, it shouldn't replace those.

d5e5 109 Master Poster

14 Years Ago

CREATE TABLE IF NOT EXISTS `test` (
  `name` varchar(255) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `test` (`name`) VALUES
('Futurama'),
('The Office'),
('The Friends'),
('The Simpsons'),
('Harry Potter and the Chamber of Secrets');
SELECT name
FROM `test`
WHERE name RLIKE '[[:<:]]F'
   OR name RLIKE 'Chamber of Secrets';

This gives the following output:

+-----------------------------------------+
| name                                    |
+-----------------------------------------+
| Futurama                                | 
| The Friends                             | 
| Harry Potter and the Chamber of Secrets | 
+-----------------------------------------+

smantscheff 265 Veteran Poster · Answer 1 · 2010-11-19T15:09:24+00:00

You could use the replace function:

SELECT name FROM series WHERE replace(name,'The ','') LIKE 'word%' ORDER BY replace(name,'The ','')

This would also remove "The " wherever else it occurs in the string. To avoid that (and with no pattern-matching replace function in MySQL), you could either write a function that does the replacement only at the start, or insert a marker for the start of the string:

SELECT name FROM series WHERE replace(concat('~',name),'~The ','') LIKE 'word%' ORDER BY replace(name,'The ','')

(Not tested)

bleedi 3 Junior Poster in Training · Answer 2 · 2010-11-19T16:04:02+00:00

Thanks for the answers. The first one isn't that helpful, since "the" is quite a nice word to have in the middle of strings.

The second one is nice, but doesn't it mess up the rows that don't have "The " at the beginning?
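
To answer the question above: as written, yes. Rows that do not start with "The " keep the '~' marker, so '~Futurama' never matches 'F%'. Here is a minimal sketch of one way around that, assuming the series table and name column from the question, with 'F' standing in for the search string. It strips "The " only when the value starts with it:

```sql
-- Strip "The " only when it starts the value; other rows pass through unchanged.
-- SUBSTRING is 1-based, so position 5 is the first character after 'The '.
SELECT name
FROM series
WHERE (CASE WHEN name LIKE 'The %' THEN SUBSTRING(name, 5) ELSE name END) LIKE 'F%'
ORDER BY (CASE WHEN name LIKE 'The %' THEN SUBSTRING(name, 5) ELSE name END);
```

With the sample titles from the question this should return "The Friends" and "Futurama", in that order, since the sort also ignores the leading "The ". On MySQL 8.0.4 and later, REGEXP_REPLACE(name, '^The ', '') could probably stand in for the CASE expression.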

smantscheff 265 Veteran Poster · Answer 3 · 2010-11-19T16:27:13+00:00

The solution could use a little refining. How about this one:

create table test (textfield varchar(255));
insert into test values ('The Opera'), ('Opera, The'), ('My Opera' );
select * from test where textfield rlike "^(The )?opera";

+------------+
| textfield  |
+------------+
| The Opera  |
| Opera, The |
+------------+
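
The same pattern can be applied to the series table from the original question. This is only a sketch: 'F' stands in for the user's search string, which would need its regex metacharacters escaped before being spliced into the pattern.

```sql
-- '^' anchors the match at the start of the value;
-- '(The )?' makes a leading "The " optional instead of removing it.
SELECT name
FROM series
WHERE name RLIKE '^(The )?F'
ORDER BY name;
```

Because "The " is optional rather than stripped, a search for "The" (pattern '^(The )?The') still finds "The Office" and "The Simpsons".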

bleedi 3 Junior Poster in Training · Answer 4 · 2010-11-19T16:30:38+00:00

That seems to be OK, thanks for your effort. :)

I ended up writing a Java snippet that decides whether to run only the plain-name query or the "The " + name query. The snippet picks the right query based on a boolean value. :)

smantscheff 265 Veteran Poster · Answer 5 · 2010-11-19T22:43:26+00:00

I don't find this regex character class [:<:] in the docs - what does it mean?

d5e5 109 Master Poster · Answer 6 · 2010-11-20T03:37:37+00:00

It means 'start-of-word boundary'. I wanted to search on upper-case 'F', but RLIKE is not case-sensitive (except on binary strings), so I looked for a way to match 'F' only when it starts a word. It's kind of buried in the docs at http://dev.mysql.com/doc/refman/5.1/en/regexp.html. Scroll down to where it says:

[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
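
Note for newer servers: the quoted text describes the regex library used by MySQL 5.1. From MySQL 8.0.4 on, MySQL uses the ICU library, which does not support [[:<:]] and [[:>:]]; \b is the word-boundary marker there. A rough equivalent of the earlier query, assuming the same test table:

```sql
-- MySQL 8.0.4 and later (ICU regex): \b matches a word boundary.
-- The backslash is doubled because MySQL treats \ as an escape inside string literals.
SELECT name
FROM test
WHERE name REGEXP '\\bF'
   OR name REGEXP 'Chamber of Secrets';

-- If a case-sensitive match on upper-case 'F' is wanted,
-- REGEXP_LIKE accepts 'c' as its match type.
SELECT name
FROM test
WHERE REGEXP_LIKE(name, '\\bF', 'c');
```

On the sample data the first query should give the same three rows as the output shown earlier in the thread.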