0

I have table something like [ID, ITEM_ID, URI, IMAGE_FILE_NAME, UPDATED_AT etc.].

I need to compare URI that can be something like /system/images/777/medium/Debrecen_-_University.jpg?1279547675 with IMAGE_FILE_NAME that is as Debrecen_-_Protestant_Great_Church.jpg

Is there a way so I can provide IMAGE_FILE_NAME as sort of substring of URI and check if that substring is part of the string?

3
Contributors
4
Replies
6
Views
7 Years
Discussion Span
Last Post by tesuji
1

Well, I think what he is looking for is to find the longest common substring (LCS) of both strings

/system/images/777/medium/Debrecen_-_University.jpg?1279547675

Debrecen_-_Protestant_Great_Church.jpg

where the green marked substrings seems to be such LCS. There is also a longest common subsequence problem where the sequence of chars need not to be contiguously stored.

LCS is a famous problem usually solved by dynamic programming, its time complexity is O(mn). Some theory can be found here. There are also implementations on wikibooks in various languages. Unfortunately, I haven't seen any implementation in PSM (pl/sql) for SQL user defined function (UDF). I am sure one cannot do that in SQL queries directly. So an UDF written in PSM or even a C/C++ implementation what can be called from SQL, what is possible as for example in postgresql, seems to be a solution. Yet what I have seen are some PHP implementations having poor O(n^4).

-- tesu

Edited by tesuji: n/a

1

Hi Peter

I couldnĀ“t stop thinking about the longest-common-substring problem for SQL language (user defined function). So I took and old version of LCS which I once implemented in plain C. Its time complexity is O(m*n), storage O(m+n). I translated this C version line by line into a user defined function (PSM of SQL 1999) for Microsoft SQL Server and Sybase database. Finally I also compiled a Java version from my old C LCS function.

All three functions implement the very same LCS algorithm and I myself can also affirm that every function is based on the same programming methods and approaches what guaranties that they have same O(m*n).

Then I did some benchmarks using your sample strings containing Hungarian Debrecen. I executed all three functions/ methods 100000 times on these strings on an "medium upper class" win XP computer.

[I]Here are the results for 100000 function calls:

Programming Language          Execution time
-------------------------------------------- 
plain C (no classes)             2.2 seconds       
Java, static member              2.8 seconds
SQL user defined function     2658.0 seconds [/I]

I am very surprised at the poor SQL-UDF result. I had never believed that SQL-UDF would consume a thousand times execution time of C/Java function. So it is rather out of the question to implement LCS algorithm in SQL user defined function for practical usage.

This benchmark pleasantly surprised one by the computing power of Java. It put up a great fight by contrast with fast C for this non-numerical algorithm.

-- tesu

Edited by tesuji: n/a

Votes + Comments
Nice test ;)
This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.