What is FIND_IN_SET() performance like? It it something that should be avoided at all costs? I want to search a column that consists of comma-delimited numbers for rows that contain a specific number.

Recommended Answers

All 6 Replies

FIND_IN_SET() does not use indexes, so it can become slow. You can try with a regular expression and LIKE:

SELECT `column` FROM `table` WHERE `column` LIKE '123%' AND `column` RLIKE '123[0-9]{0,3}';

See this post for more information:

Thank you but the article you linked to does not mention RLIKE. Do you mean REGEXP?

Thanks! Sorry though, I'm not really experienced in regular expressions and I'm unsure of how the code you provided mimics the behavior of FIND_IN_SET(123, column_name).

I would need it to match on rows where the column value is:
123,456,789
2,4,123
987,123,23

Something like LIKE '123%' would only match on the first.

Thinking I might just stick to find in set because it will only ever need to seek through fewer than 100 rows.

Something like LIKE '123%' would only match on the first.
Yes, you can try this one: LIKE '%123%' to find all rows instead of LIKE '123%'

It sounds like you aren't working to the strengths of the database. MySQL, unlike Oracle, PostgreSQL etc doesn't support arrays, and searching on text is unlikely to be optimal.

I don't know the context of the original query, so it's difficult to suggest a solution, but it could be something like this:

 Tags:
 - id: 1
   name: Programming
 - id: 2
   name: Databases
 - id: 3
   name: Web Development
 - id: 4
   name: Floristry
 - id: 5
   name: Kids TV

 Posts:
 - id: 1
   name: Making pages faster
   tags: [1, 2, 3]
 - id: 2
   name: CBeebies top ten
   tags: [5]

So, to find all posts about Web Development you'd look through all of the posts, filtering on the tags field for the inclusion of 5. In PostgreSQL, using the array type, we'd do something like this:

select * from posts;
┌────┬──────────────┬─────────┐
│ id │     name     │  tags   │
├────┼──────────────┼─────────┤
│  1 │ CBeebies     │ {5}     │
│  2 │ Fast Website │ {1,2,4} │
│  3 │ Programming  │ {4,6}   │
└────┴──────────────┴─────────┘
(3 rows)

Time: 0.282 ms
peter=# select * from posts where tags @> '{5}';
┌────┬──────────┬──────┐
│ id │   name   │ tags │
├────┼──────────┼──────┤
│  1 │ CBeebies │ {5}  │
└────┴──────────┴──────┘
(1 row)

That's fine, but that doesn't give you the benefits of using a relational database. What if someone deletes the Kids TV tag from the tags table? Well, our arrays (or strings full of comma separated values) will point to nothing. That's precisely what we wouldn't want.

What should really be going on here is that we should, rather than an 'array', be storing that data in a link table.

create table post_tags (
  id serial primary key, 
  tag_id integer references tags(id),
  post_id integer references posts(id)
);

The references clauses for the post_id and tag_id fields indicate that a foreign key will be created. A foreign key is a field in one table that uniquely identifies a row of another table. They can be configured to cascade or restrict operations done to either side of the relationship.

So, you could configure it to, when you delete your Programming tag, it could cascade that deletion to all Programming posts or (more sensibly) restrict the deletion!

Now that we have a link table in place, our query is actually quite simple and fast (once we've added indexes, at least)

select
   p.name as "post name",
   t.name as "tag name"
from
  posts p
  inner join post_tags pt on p.id = pt.post_id
  inner join tags t on pt.tag_id = t.id
where
  t.name = 'Kids TV';

┌───────────┬──────────┐
│ post name │ tag name │
├───────────┼──────────┤
│ CBeebies  │ Kids TV  │
└───────────┴──────────┘
(1 row)

Time: 0.732 ms

Now, we have a extremely fast query, safe data with enforced integrity and we're not relying on any text manipulation, regular expressions or vendor-specific functionality.

commented: nice explanation, thanks for sharing! +14
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.