Hi all,
I using PHP & MySql. Here's the situation:

1. I have a website which is a meeting place for employees & employers. Employees can sign up for an account & select the skills they have. These skills list will be shown on their resume. I have an array of about 100 skills & I am storing the values directly in DB as comma separated values.
For example, I have skills like PHP, ASP, .NET etc. & they have values of 1, 2, 3 etc. Say employee checks 1,2,4,6,10 skills (out of 100), this will be stored under 'skills' field in 'Resume' table as 1, 2, 4, 6, 10.
When the Employee visits his resume, the corresponding skills will be shown instead of the skill numbers (which is right). i.e on his resume, Employee sees PHP, ASP, C#, Java, Cobol (These are the respective skill names for the skills selected). So I have no problem so far.

2. Now Employers can search for users with specific skill set. Employer can go to a search page which shows all the 100 skills. Employers can tick the appropriate checkboxes & press the submit button to get relevant results. For example, now say an Employer wants to search for all Employees who has PHP, C# and Cobol skills in their resume (Employees may have more skills but PHP & Java skills are mandatory & must exist in Employee resume). So the Employer is going to ticks the corresponding checkboxes that refer to each of those mentioned skills & press the submit button. Now this should show all results of Employees who MUST have those ticked skills in their resume. Here is where the problem starts.

I have written SQL query to get the results but it produces results only for perfect matches i.e if Employer selected PHP, C# and Cobol skills for search, then I get the array values & store them as comma separated values in a string (example: 1, 2, 5) and send it to query. Then only Employees who has selected only 1, 2 and 5 as their skills are shown. But I want to display all users who have even selected 1, 2, 5,10 and 1, 2, 3, 4, 5, 10, 13, 100 and 1, 2, 4, 5, 22, 98. because all these Employees have the skills (1, 2 and 5) that the Employer has originally searched for.

I tried different versions of the same query using AND and OR but of no use. I did come across another way of storing the Employee skills in database & that would be as:

EMPLOYEE ID               EMPLOYEE SKILL ID              EMPLOYEE SKILL NAME
1                                 1                                PHP
1                                 2                                ASP
1                                 5                                C#

But the above method would mean that if I have 100 skills & if an Employee selects at least 50 skills, this would create 50 rows of information, all for the same person. Additionally, I have the Employee resume as one of the fields in the same table which I would like to procure in 1 shot. So creating multiple rows for the same Employee is useless. And talk about the number of rows for, say, 100 Employees who have selected at least 50 skills, that would be 100 (Employees) x 50 (skills) = 5000 rows of information just for 100 people! A very inefficient way, I feel. So I thought comma separated values might help. So in this way, I can get skills, resume & other fields & I have to just maintain only 1 row for every user ( that would be only 100 rows in case of 100 Employees, think about the time save !)

I hope that I have been clear with my question. I just need to get all the Employees that match at least the skills selected by the Employer & not just Exact matches. I would consider Exact match search at a later stage but right now, the former is top priority.

Any help is highly appreciated.

Thanks very much & look forward to your replies.

Recommended Answers

All 14 Replies

Hey,

Why don't you have an employee table, a skills table and an employee_skill table. Hence:

Employee:
Id   |   Name   |   Email   |   etc...

Skills:
Id   |   Skill
 1        PHP
 2        MySQL

Employee_skill:
Id   |   EmployeeId   |   SkillId

Then when you list all the entries in the skills table for an employee to select, when entering their choices in the DB, you'd just need to create an entry in the employee_skill table. And as it'll be simply 3 numerical values, you could have millions of rows in this table without any issues, so long as you add the correct indexes. E.g. an id index, employee id index and a skill id index, and perhaps an employee id/skill id index.


Then when an employer is searching for people with specific skills, find the skill id values and do a query such as:

SELECT *
FROM `employee`
WHERE `id` IN (
[INDENT]SELECT DISTINCT `employeeId`
FROM `employee_skill`
WHERE `skillId` IN (
[INDENT]SELECT `skillId` 
FROM `employer_skills`
WHERE `employerId` = '1221'
[/INDENT]
)[/INDENT]
)

I think the above query will work. And to add more complexity, you could count the number of matches each employee gets to the employers skills and order them by relevancy, hence the most suitable employee is listed first.

R.

Thanks to both of you guys pritaeas & robothy for taking time to post your views/findings.

To pritaeas:
I tried using FIND_IN_SET syntax & it did not produce me the right result. I selected 1, 3, 5 options from the skills checkboxes & performed searched but the query returned 0 values, while an Employee exists whose skills are stored as 1, 3, 5, 7, 9. So I think this method did not work. Additionally, it only worked if I selected only the 1st checkbox (value 1) & it returned me the row in DB. But for multiple checkboxes, it did not return me any values. It's not even working if I select the checkbox with value 3 & perform the search nor any other value as to that matter.

I am hoping that there might be some other way to get around this. I will try digging into this more, but please do let me know if you have any more ideas. I will be waiting.

To robothy:
I haven't tried your method yet, but will give it a shot. Additionally, don't you think that since there will be numerous number of rows, the query speed will be much slower? I am thinking about 1000 Employees with about 50 skills roughly, i.e 50,000 rows of information! Is there anything at all which can help us reduce the time taken to execute & retrieve?
Look forward to your reply as well.

Thank you all once again. I will be awaiting further replies.

Using indexes will speed your queries up massively. At my last job, we used tables with over a million entries, and this query took at most 10 seconds to run.

So if you apply an index to the employee table on the ID, and likewise the skills table on the ID and to the fields in the employee_skill table as I said in my last post, that should be enough.

Best,
R.

Hi robothy,
Thank you once again for a fast reply. I will follow likewise for the index. But I am confused here:

SELECT `skillId` FROM `employer_skills`WHERE `employerId` = '1221'

Where is it coming from. I do not have a table called Employer skills (at least for now..maybe later). Right now, all the Employers would just use a simple form, which will have Skills checkboxes & other criteria. When the information is POST'ed from the form, say we capture the info in an array called $skills, how do I search using this array?

Please let me know. Thank you.

Right, sorry. I thought it may be that an employer has registered and saved the skills they desire in an employee.

Selecting the results from a form would work too. Just have the skills id as the value that is submitted if a checkbox is checked. Therefore, you'd have a list of skill id values to pass straight into your query.

R.

Hi,
But I think I might have some problems trying to pass the array values. Could you please give me the query to do that? Much appreciate if you could please do so. (I had much trouble trying to do it in a for loop previously.)
Thank you.

If you do as I said and have the value of each checkbox the skill id value from the skills table, then when you submit the form you'd get an array of skill values.

Implode the array into a comma separated string, and pass that to an SQL query such as that below:

$strSkillIds = implode(',', $arrSkillIds);

SELECT *
FROM `employee`
WHERE `id` IN (
[INDENT]SELECT DISTINCT `employeeId` FROM `employee_skill` WHERE `skillId` IN (
[INDENT]$strSkillsIds[/INDENT]
    )[/INDENT]
)

R.

Thanks for the query, robothy. I will try that & let you know how it goes.

Thanks much.

Hi robothy,
I tried out that query. I now have a Skills table & an Employee Skills table as well. In Employee Skills table, I have an Employee with skills 1, 3, 5, 7 and 9 (all odd numbers chosen for easy understanding). I created the query just like you have shown me in recent post. From Employer's front-end, I selected 1, 3 and 5 & hit
"Search". Voila! I got the Employee as a result. Yayyy...!

But....
This time I selected 1, 2, 3 and 5 & hit "Search". I still got the same Employee as a result. I tried this with different number combinations & what I have observed is that as long as the Employee selected skill matches with at least one record in Employee Skills table, it will show the Employee. But this would not serve the purpose much because if there's only 1 matching skill out of 15 skills selected by an Employer, I want to show only results that match 100% as that would be the most relevant ones. I tried HAVING clause in combination with IN, but that gave me errors. Looks like HAVING is used mostly when we deal with numbers, counts & aggregates....

So I feel that the query is more or less working like an OR statement right now. Can we make it so that we get 100% matching results?

Look forward to your reply. Thank you.

In my previous post I believe I mentioned that the query would return any result. If you were to count how many times a particular employee matched with the skills requested, and then correlate that back to how many skills were requested. This would then allow you to sort the results by relevancy.

R

Hi robothy,
I tried different combination of the query. It is

$count = count($skills_arr);

$query .= " HAVING count(employee.username) >= $count";

I appended this at the end of the query but it produced the same results as would an OR command do. I do not get 100% matching results. Obviously something is not right ? Please let me know.

Look forward to your reply. Thank you.

SQL really isn't one of my strong skills, so the following may be totally wrong. Anyway, that was what I was getting at:

SELECT *
FROM `employee` AS `t1`
JOIN (
	SELECT COUNT(*) AS `skillsMatched`, `employeeId`
	FROM 
		SELECT `employeeId` FROM `employee_skill` WHERE `skillId` IN (
	        $strSkillsIds
	    )
	) GROUP BY `employeeId`
) AS `t2`
ON `t1`.`id` = `t2`.`employeeId`
WHERE `t2`.`skillsMatched` = 'SKILLS_SELECTED'

replacing SKILLS_SELECTED with the value from

count($arrSkills);
commented: This post paved the way to solving the problem that I had with my own query. +1

R -> Robothy
R -> Rocks
Robothy Rocks! :icon_wink:

That did the trick for me, robothy. I am able to do the search as needed & it's producing me the correct results. It's showing me the 100% matching results. I have tested it about 5 times so far & it's working pretty good!

One concern about the employee_skills table. As you have specified, I coded it so that for each of the selected skills by the Employee in his resume, a unique row is being created for every skill. So for 5 skills checked, it creates 5 rows with the same Emp. ID but with different skill ID.

Say if the Employee wants to uncheck some of the existing skills & selects new skills, to update the information in the employee_skills table, I am first deleting all the entries corresponding to this Employee_ID & then inserting new rows in this. Because of this, the old indexes are lost & the indexes are increasing at a really fast rate & I think that I will run out of indexes soon because there will be many Employees & more than 100 skills that can be checked. So for even 1 change, all previous records are destroyed (& the table index is destroyed as well with Employee_ID & Skill_Id). So is there a way that I can revert back to a lesser count on the Table index? Maybe some code that I can run weekly to get the Table index to start back from 1?

And yes, thanks a ton for the help with the skill search. :)

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.