FIND_IN_SET() performance

Question

Dani 4,675 The Queen of DaniWeb

8 Years Ago

What is FIND_IN_SET() performance like? It it something that should be avoided at all costs? I want to search a column that consists of comma-delimited numbers for rows that contain a specific number.

mysql

4 Contributors
6 Replies
9K Views
4 Days Discussion Span
Latest Post 8 Years Ago Latest Post by pty

All 6 Replies

cereal 1,524 Nearly a Senior Poster

8 Years Ago

FIND_IN_SET() does not use indexes, so it can become slow. You can try with a regular expression and LIKE:

SELECT `column` FROM `table` WHERE `column` LIKE '123%' AND `column` RLIKE '123[0-9]{0,3}';

See this post for more information:

https://www.psce.com/blog/2012/03/31/mysql-queries-with-regexp/

pty 882 Posting Pro

8 Years Ago

It sounds like you aren't working to the strengths of the database. MySQL, unlike Oracle, PostgreSQL etc doesn't support arrays, and searching on text is unlikely to be optimal.

I don't know the context of the original query, so it's difficult to suggest a solution, but it could be something like this:

 Tags:
 - id: 1
   name: Programming
 - id: 2
   name: Databases
 - id: 3
   name: Web Development
 - id: 4
   name: Floristry
 - id: 5
   name: Kids TV

 Posts:
 - id: 1
   name: Making pages faster
   tags: [1, 2, 3]
 - id: 2
   name: CBeebies top ten
   tags: [5]

So, to find all posts about Web Development you'd look through all of the posts, filtering on the tags field for the inclusion of 5. In PostgreSQL, using the array type, we'd do something like this:

select * from posts;
┌────┬──────────────┬─────────┐
│ id │     name     │  tags   │
├────┼──────────────┼─────────┤
│  1 │ CBeebies     │ {5}     │
│  2 │ Fast Website │ {1,2,4} │
│  3 │ Programming  │ {4,6}   │
└────┴──────────────┴─────────┘
(3 rows)

Time: 0.282 ms
peter=# select * from posts where tags @> '{5}';
┌────┬──────────┬──────┐
│ id │   name   │ tags │
├────┼──────────┼──────┤
│  1 │ CBeebies │ {5}  │
└────┴──────────┴──────┘
(1 row)

That's fine, but that doesn't give you the benefits of using a relational database. What if someone deletes the Kids TV tag from the tags table? Well, our arrays (or strings full of comma separated values) will point to nothing. That's precisely what we wouldn't want.

What should really be going on here is that we should, rather than an 'array', be storing that data in a link table.

create table post_tags (
  id serial primary key, 
  tag_id integer references tags(id),
  post_id integer references posts(id)
);

The references clauses for the post_id and tag_id fields indicate that a foreign key will be created. A foreign key is a field in one table that uniquely identifies a row of another table. They can be configured to cascade or restrict operations done to either side of the relationship.

So, you could configure it to, when you delete your Programming tag, it could cascade that deletion to all Programming posts or (more sensibly) restrict the deletion!

Now that we have a link table in place, our query is actually quite simple and fast (once we've added indexes, at least)

select
   p.name as "post name",
   t.name as "tag name"
from
  posts p
  inner join post_tags pt on p.id = pt.post_id
  inner join tags t on pt.tag_id = t.id
where
  t.name = 'Kids TV';

┌───────────┬──────────┐
│ post name │ tag name │
├───────────┼──────────┤
│ CBeebies  │ Kids TV  │
└───────────┴──────────┘
(1 row)

Time: 0.732 ms

Now, we have a extremely fast query, safe data with enforced integrity and we're not relying on any text manipulation, regular expressions or vendor-specific functionality.

Edited 8 Years Ago by pty because: All content after an emoji disappeared!

cereal commented: nice explanation, thanks for sharing! +14

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

Dani 4,675 The Queen of DaniWeb Administrator Featured Poster Premium Member · Answer 1 · 2016-12-26T21:45:00+00:00

Thank you but the article you linked to does not mention RLIKE. Do you mean REGEXP?

cereal 1,524 Nearly a Senior Poster Featured Poster · Answer 2 · 2016-12-27T00:23:40+00:00

Yes, in MySQL, RLIKE is a synonym of REGEXP:

http://dev.mysql.com/doc/refman/5.7/en/regexp.html

Dani 4,675 The Queen of DaniWeb Administrator Featured Poster Premium Member · Answer 3 · 2016-12-27T10:08:17+00:00

Thanks! Sorry though, I'm not really experienced in regular expressions and I'm unsure of how the code you provided mimics the behavior of FIND_IN_SET(123, column_name).

I would need it to match on rows where the column value is:
123,456,789
2,4,123
987,123,23

Something like LIKE '123%' would only match on the first.

Thinking I might just stick to find in set because it will only ever need to seek through fewer than 100 rows.

Васил 0 Newbie Poster · Answer 4 · 2016-12-30T10:54:59+00:00

Something like LIKE '123%' would only match on the first.
Yes, you can try this one: LIKE '%123%' to find all rows instead of LIKE '123%'

FIND_IN_SET() performance

Recommended Answers Collapse Answers

All 6 Replies

Recommended Answers