Hey!

I've got a SELECT query that fetches rows based on a search string using the LIKE operator. How could I make it skip "The " at the beginning of the value?

For example, I have TV series listed in the database, like "Futurama", "The Office", "The Friends" and "The Simpsons". If I search with the string "F", how can I make the query ignore the leading "The " so that the result set is "Futurama" and "The Friends"?

My current query is like this:
"SELECT name FROM series WHERE name LIKE 'word%' ORDER BY name"

I know it could be done by changing 'word%' to '%word%', but then I would get other kinds of results too, like, from the example, "The Office". :)

You could use the replace function:

SELECT name FROM series WHERE replace(name,'The ','') LIKE 'word%' ORDER BY replace(name,'The ','')

This would also remove "The " wherever it appears in the string, not only at the start. To avoid that (and with no pattern matching replace function in MySQL) you could either write a function which does the replacement only at the start, or you can insert a marker for the start of the string:

SELECT name FROM series WHERE replace(concat('~',name),'~The ','') LIKE 'word%' ORDER BY replace(name,'The ','')

(Not tested)
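
A quick way to check the marker idea is to run it against a few sample rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. That is an assumption for illustration only: SQLite spells concat('~', name) as '~' || name, and the "Over the Fence" row is made up for the test. As posted, the query leaves the '~' on names without a leading "The ", so those rows no longer match 'word%'. The variant keeps the marker on both sides of the LIKE instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (name TEXT)")
conn.executemany(
    "INSERT INTO series (name) VALUES (?)",
    [("Futurama",), ("The Office",), ("The Friends",),
     ("The Simpsons",), ("Over the Fence",)],  # last row is a made-up test case
)

# The query as posted: '~The ' is removed, but a bare '~' stays on every
# name that does not start with 'The ', so 'F%' cannot match '~Futurama'.
as_posted = """
    SELECT name FROM series
    WHERE replace('~' || name, '~The ', '') LIKE ?
    ORDER BY name
"""

# Variant: replace '~The ' with '~' so every name keeps the marker,
# then put the same marker in front of the search pattern.
with_marker = """
    SELECT name FROM series
    WHERE replace('~' || name, '~The ', '~') LIKE ('~' || ?)
    ORDER BY replace('~' || name, '~The ', '~')
"""

def search(sql, prefix):
    """Run one of the queries above with 'prefix%' as the LIKE pattern."""
    return [row[0] for row in conn.execute(sql, (prefix + "%",))]

posted_result = search(as_posted, "F")
marker_result = search(with_marker, "F")
print(posted_result)
print(marker_result)
```

With the variant, the ORDER BY also sorts on the name without its leading "The ", so "The Friends" sorts before "Futurama".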

Thanks for the answers. The first one isn't that helpful, since "the" is quite a common word to have in the middle of a string.

The second one is nice, but doesn't that mess up the rows without "the" at the beginning?

The solution could use a little refining. How about this one:

create table test (textfield varchar(255));
insert into test values ('The Opera'), ('Opera, The'), ('My Opera' );
select * from test where textfield rlike "^(The )?opera";
+------------+
| textfield  |
+------------+
| The Opera  |
| Opera, The |
+------------+

That seems to be OK, thanks for your effort. :)

I ended up writing a Java snippet that decides whether to run the plain "name" query or the "The " + name query. It picks the query based on a boolean value. :)
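
For comparison, both cases can also be covered in a single statement by OR-ing the plain pattern with a "The "-prefixed one. This is a minimal sketch, again using Python's sqlite3 as a stand-in for MySQL. The helper name search_series is hypothetical, and this is not the Java snippet mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (name TEXT)")
conn.executemany(
    "INSERT INTO series (name) VALUES (?)",
    [("Futurama",), ("The Office",), ("The Friends",), ("The Simpsons",)],
)

def search_series(prefix):
    """Return names starting with prefix, with or without a leading 'The '."""
    sql = """
        SELECT name FROM series
        WHERE name LIKE ? OR name LIKE ?
        ORDER BY name
    """
    # First pattern: 'prefix%'; second pattern: 'The prefix%'.
    params = (prefix + "%", "The " + prefix + "%")
    return [row[0] for row in conn.execute(sql, params)]

print(search_series("F"))
print(search_series("The S"))
```

Because the plain pattern is still part of the query, a search string that itself starts with "The " keeps working.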

Why would you not want to use %word% ? With your way, if you have something like "Harry Potter and the Chamber of Secrets" and the person searches for "Chamber of Secrets" then no result would be returned.

If you have code to ignore "The" every time, then if someone searches for "The Simpsons" nothing would be returned as you are ignoring "The" from the database.

Also, if I'm not mistaken, smantscheff's code will only replace "The " and not "the " and since small words such as "the", "a", "and" etc. are supposed to be in lowercase when in titles, it shouldn't replace those.

CREATE TABLE IF NOT EXISTS `test` (
  `name` varchar(255) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `test` (`name`) VALUES
('Futurama'),
('The Office'),
('The Friends'),
('The Simpsons'),
('Harry Potter and the Chamber of Secrets');
SELECT name
FROM `test`
WHERE name RLIKE '[[:<:]]F'
   OR name RLIKE 'Chamber of Secrets';

This gives the following output:

+-----------------------------------------+
| name                                    |
+-----------------------------------------+
| Futurama                                | 
| The Friends                             | 
| Harry Potter and the Chamber of Secrets | 
+-----------------------------------------+

I can't find this regex character class [:<:] in the docs - what does it mean?

It means 'start-of-word boundary'. I wanted to search on upper-case 'F', but RLIKE is not case-sensitive unless it is used with binary strings, so I looked for a way to match 'F' only where it starts a word. It's kind of buried in the docs at http://dev.mysql.com/doc/refman/5.1/en/regexp.html - scroll down to where it says:

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
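
The word-boundary rule quoted above can be tried outside MySQL as well. The sketch below uses Python's re module, which is an assumption for illustration and not MySQL's regex engine. Python has no separate start-of-word marker, but \b in front of a word character can only match at the start of a word, so r"\bF" plays the role of '[[:<:]]F'. The IGNORECASE flag mimics RLIKE's default case-insensitive matching.

```python
import re

names = [
    "Futurama",
    "The Office",
    "The Friends",
    "The Simpsons",
    "Harry Potter and the Chamber of Secrets",
]

# \b followed by a word character matches only where a word begins,
# so this finds an 'F' (or 'f') that starts a word, not one inside a word.
starts_word_f = re.compile(r"\bF", re.IGNORECASE)

matches = [name for name in names if starts_word_f.search(name)]
print(matches)
```

"The Office" and "... Chamber of Secrets" are skipped because their "f" characters sit inside or at the end of a word. As far as I know, MySQL 8.0.4 and later (which use the ICU regex library) no longer accept [[:<:]] and [[:>:]]; \b is the replacement there.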