string matching with controlled vocabulary

Question

doomas10 0 Newbie Poster

14 Years Ago

Hello,

I have the following question. I have a txt file with data like this:

1 observational study
1.1 cohort study
1.1.1 retrospective cohort study
1.1.2 prospective cohort study
1.2 cross-sectional study

And another file with data like this:

cross-sectional survey 12345.txt
retrospective study 2345.txt
...

I want to do some appropriate string matching. To be more specific, I want to read each line of the second file and find the line in the first file that is most similar to it. So for the first line of the second file, "cross-sectional survey" would be assigned to "1.2 cross-sectional study".

Is there any way of doing this? :-/

python


All 4 Replies

woooee 814 Nearly a Posting Maven

14 Years Ago

split() each record. In the first file, don't use the first element of the list (the section number); join the rest, i.e. " ".join(fields[1:]). Similarly, don't use the last element (the file name) in the second file.
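A minimal sketch of the split-and-join idea above, assuming the fields are separated by whitespace, that every line of the first file starts with its section number, and that every line of the second file ends with a file name. The function names are made up for illustration.

```python
def split_vocabulary_line(line):
    """Return (section number, term) for a line of the first file."""
    fields = line.split()
    # The first field is the section number; the rest is the term.
    return fields[0], " ".join(fields[1:])


def split_data_line(line):
    """Return (phrase, file name) for a line of the second file."""
    fields = line.split()
    # The last field is the file name; everything before it is the phrase.
    return " ".join(fields[:-1]), fields[-1]


print(split_vocabulary_line("1.1.2 prospective cohort study"))
print(split_data_line("cross-sectional survey 12345.txt"))
```

This only isolates the text to be compared; it does not do any matching by itself.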



TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 1 · 2011-03-23T03:02:26+00:00

You might find this old function of mine useful: Longest common subsequence
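The linked function is not reproduced in this thread, so the following is only a generic textbook sketch of the longest common subsequence idea, not the function referred to above. It scores each vocabulary term by the length of its longest common subsequence with the query and picks the highest. The names lcs_length and best_lcs_match are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    # prev[j] holds the LCS length of the processed part of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def best_lcs_match(query, terms):
    """Return the term sharing the longest common subsequence with query."""
    return max(terms, key=lambda term: lcs_length(query, term))


terms = [
    "observational study",
    "cohort study",
    "retrospective cohort study",
    "prospective cohort study",
    "cross-sectional study",
]
print(best_lcs_match("retrospective study", terms))
```

A raw subsequence length tends to favour longer terms, so you may want to divide the score by the length of the term or of the query before comparing.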

doomas10 0 Newbie Poster · Answer 2 · 2011-03-23T16:46:08+00:00

I'm not sure how this is going to work with the split command. I want to compare the strings in the second file with those in the first, and from the comparison I want to get the lines of the first file that look similar to each line of the second file.

For example "retrospective study" is similar to "retrospective cohort study" thus the comparison will return "retrospective cohort study" as the most similar string for the line of the second file "retrospective study" :/

I will try your function and will let you know how it goes. Is there any module that can do this?

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 3 · 2011-03-23T23:31:06+00:00

If you read the messages in that thread, you will see that there is a standard module, difflib, which has related functionality.
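As a rough sketch of how difflib could be applied to the data in the question: difflib.get_close_matches returns the candidates most similar to a word, best first. The vocabulary list and the helper name closest_term are assumptions for illustration, and the default cutoff of 0.6 may need tuning for real data.

```python
import difflib

# Terms from the first file, with the section numbers already stripped.
vocabulary = {
    "observational study": "1",
    "cohort study": "1.1",
    "retrospective cohort study": "1.1.1",
    "prospective cohort study": "1.1.2",
    "cross-sectional study": "1.2",
}


def closest_term(phrase, terms, cutoff=0.6):
    """Return the most similar term, or None if nothing reaches cutoff."""
    matches = difflib.get_close_matches(phrase, list(terms), n=1, cutoff=cutoff)
    return matches[0] if matches else None


for phrase in ("cross-sectional survey", "retrospective study"):
    term = closest_term(phrase, vocabulary)
    if term is None:
        print(phrase, "-> no match")
    else:
        print(phrase, "->", vocabulary[term], term)
```

Returning None when nothing reaches the cutoff lets you flag phrases that have no reasonable counterpart in the vocabulary instead of forcing a bad match.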
