943,634 Members | Top Members by Rank

Ad:
You are currently viewing page 1 of this multi-page discussion thread
Mar 6th, 2008
0

Data model for storing boolean expressions

Expand Post »
Hi all,

(this is a repost from here)

I'm working on a relational data model in which I'd like to store boolean expressions. I'm a little rusty so I'm humbly asking for a little push.

For example, let's say I have the following boolean expression:

((A = 8 OR A >= 9) AND B < 3) OR NOT (A <> 8 AND B = 0)

What would be a simple, elegant data model to store this kind (or any kind) of expression? Also important, how would you query the database in order to rebuild the logical expression from the stored data?

BTW, any kind of help would be appreciated. Reference to books, papers, web sites are perfectly ok.

Hey, thanks!
Reputation Points: 10
Solved Threads: 1
Light Poster
LeBurt is offline Offline
38 posts
since Mar 2008
Mar 9th, 2008
0

Re: Data model for storing boolean expressions

At first glance the answer is 'store it as is in a varchar field'. But it seems you want to do something else. Perhaps if you gave us the context of this problem, it would suggest a design. What do you really want to do?

Are these answers to questions? Are they questions to be answered?
Reputation Points: 18
Solved Threads: 20
Junior Poster
trudge is offline Offline
176 posts
since Sep 2007
Mar 10th, 2008
0

Re: Data model for storing boolean expressions

Fair enough, here's a little more information.

I'm writing a data mining application where the database has about a hundred tables sharing close to a thousand fields among them. The application lets users -- university researchers mainly -- create ad-hoc queries on any table, any field to "water down" the data. That is to say, the user input, all done into web forms, is made into a valid SQL query that filters the data. From there they perform statistical analysis and what have you on the resulting data.

The current ad-hoc querying module I wrote is fairly simple but also quite limited. Users can select a table and a field (operand A), an operator, and a value (operand A'). In its simplest form, you get a boolean expression like this:

A > A'

For the same operand A, users can create an OR rule as such:

(A > A' OR A = A'' OR A <= A''')

Furthermore, users can create AND rules for different operands B and C like this:

(A > A' OR A = A'' OR A <= A''')
AND (B <= B' OR B = B'')
AND (C <> C')

...and so on. There is no limit to the amount of OR rules you can put on one line nor is there any limit to the number of AND rule lines.

Like I said, the above model is fairly simple, I store all rules in three tables:

Filter
---------
* id
name

AndRule
-----------
* id
Filter_id (FK)
OperandA
Operator
OperandA'


OrRule
----------
* id
AndRule_id (FK)
Operator
OperandA'

Of course my model breaks when trying to store more complex expressions. For example, I can't store this:

(A = A' OR B = B')
OR (A <> A'' AND B <> B'')

Hence me asking: is there a known way to store expressions (any kind really, they don't have to be boolean I suppose) is a relational model supporting any combination of operators and parenthesis?

Storing the expression (or the resulting SQL query) in a varchar field is of course feasible but I was hoping for something more flexible, i.e. that could be displayed/modified by the user in the same UI where they built the expression in the first place without having to parse it. In other words, I by far prefer a broken down expression that I can assemble fairly easily, rather than a text expression that I must break down anyway if I want to edit it or build a valid SQL query from it.
Reputation Points: 10
Solved Threads: 1
Light Poster
LeBurt is offline Offline
38 posts
since Mar 2008
Mar 10th, 2008
0

Re: Data model for storing boolean expressions

This is indeed a very interesting problem. Thank you

Questions:

Are you allowing the logical expressions XOR and NOT to be included in your model?

Will there ever be more than 3 variables (A,B,C)?

I'm not clear on your use of A=A' (A prime). Can you explain a little more?

If I understand correctly, you want to map the combinations of logical operators with an unknown but finite number of variables. For argument's sake let's say there are 4 logical operators (AND, OR, NOT, XOR) and 3 variables (A, B, C). As well, there are the equivalancy operators (=, <, >, <=, >=, <>).

So, there are 3 things to track (logical operators, variables, equivalency operators). In my mind this indicates 3 tables. Since these are fairly atomic values, they can all be stored as boolean values themselves in the tables (0,1). This is going to get very confusing very quickly

Is this the right interpretation of your problem?

For some further insight try to get a copy of 'Data & databases: Concepts in practice' by Joe Celko.
Last edited by trudge; Mar 10th, 2008 at 6:51 am.
Reputation Points: 18
Solved Threads: 20
Junior Poster
trudge is offline Offline
176 posts
since Sep 2007
Mar 10th, 2008
0

Re: Data model for storing boolean expressions

Quote ...
Are you allowing the logical expressions XOR and NOT to be included in your model?
No, since the expressions translate to SQL statements (the WHERE clause to be more specific), I'm sticking to the usual AND and OR operators for the moment.

Quote ...
Will there ever be more than 3 variables (A,B,C)?
Yes, in the current model there is no limit to the number of AND rules, therefore no limit to the number of variables.

Quote ...
I'm not clear on your use of A=A' (A prime). Can you explain a little more?
I use this notation to illustrate in a generic form the two operands of an equivalence test. In practice, I could have said Field = Value, since...

A, B, C... are field names in a table out of the thousand fields we are mining into. For example, if I have a table X in my database with fields X1, X2 and X3, variable A would represent one of those fields (X.X1), B another field (X.X3) and C could be Y.Y65 in a different table. I'm mining into about 1000 fields spread across 100 tables as it is right now.

A', B', C'... are values we are testing against the selected fields. For example, if X1's type is numerical, the expression A = A' could translate to X.X1 = 10. This is how it would be stored in the AndRule table:

A | Operator | A'
-------------------------
X.X1 | = | 10
X.X3 | >= | 10 Feb 1971
Y.Y65 | = | 'apple'

You'll notice that I can test against numerical values, dates and strings. Of course to do this I must track each and every field in the database and know its data type.

Quote ...
So, there are 3 things to track (logical operators, variables, equivalency operators). In my mind this indicates 3 tables.
I would add a fourth dimension to track: the precedence set by the use of parentheses. These two expressions are not logically identical:

(A AND B) OR C
A AND (B OR C)

To me, that is the most difficult part and the main reason why I'm looking for a shortcut
Thanks for the help and the book reference, I'll look it up.
Reputation Points: 10
Solved Threads: 1
Light Poster
LeBurt is offline Offline
38 posts
since Mar 2008
Mar 11th, 2008
0

Re: Data model for storing boolean expressions

So you are trying to create ad-hoc queries on the fly? Is this correct?

I'm not sure why you need to store the elements of a query in a database. If you want users to query a database directly, then provide a controlled environment where they pick ONLY the elements you want them to access. For example drop down lists, radio buttons, check boxes, etc. Build the query from that and execute it. If this is the case you will also want to code against SQL injection attacks. You will also have to allow for an unknown number of variables using some client-side programming.

Or am I completely missing something here? What kind of data are you querying?
Reputation Points: 18
Solved Threads: 20
Junior Poster
trudge is offline Offline
176 posts
since Sep 2007
Mar 11th, 2008
0

Re: Data model for storing boolean expressions

Correct.

I feel we're going a little off topic here but if you really must know, the data is medical and I already have a controlled ad-hoc environment with everything you suggest.

The purpose of this application is scientific. We're mining data and looking for statistical meaning where none was before. If a user works hard at creating a filter that will produce a list of brain scans with such and such characteristics belonging to people that show this and that symptom and are smokers or have a genotype like this or like that, you can probably guess that there are many reasons to save that query. I can think of two big ones: 1) perhaps the user will want to come back later, change a single parameter out of the 30 he already set and get a new filtered dataset from there without having to start over; 2) if that researcher publishes a science paper based on the data, I would imagine he'd want a little bit of traceability, i.e. be able to tell exactly where the data is coming from.

I've said enough about the what and the why, any idea how?
Reputation Points: 10
Solved Threads: 1
Light Poster
LeBurt is offline Offline
38 posts
since Mar 2008
Mar 11th, 2008
0

Re: Data model for storing boolean expressions

Thank you for the further information. You're right, it isn't pertinent, I was just trying to make sure I had the right interpretation to work with.

Saving a 'view' of the query along with the exact query isn't acceptable?
Reputation Points: 18
Solved Threads: 20
Junior Poster
trudge is offline Offline
176 posts
since Sep 2007
Mar 11th, 2008
0

Re: Data model for storing boolean expressions

There's certainly merit to the idea as it solves reason #2 in my previous post. It makes it somewhat harder however when wanting to edit the query, I would have to:

a) parse the WHERE clause
b) rebuild the web form
c) accept user input
d) rebuild the WHERE clause
e) rerun the query

Following Occam's Razor logic, if I store it in pure boolean form, I save step a) which in my mind is always a royal pain (maybe because I'm no better at writing parsers than at designing an "expression data model" - sigh).

But thanks for the suggestion, I'll certainly consider it if I come to an absolute stop with my current efforts.
Last edited by LeBurt; Mar 11th, 2008 at 6:26 pm.
Reputation Points: 10
Solved Threads: 1
Light Poster
LeBurt is offline Offline
38 posts
since Mar 2008
Mar 11th, 2008
0

Re: Data model for storing boolean expressions

I'm about at the limit of any help I might be able to provide you. This is a very good exercise.

If you were to save the query verbatim, could you not then present it back to the user for editing? Obviously they must know the structure of the database or they wouldn't have been able to create the query in the first place. Your task would be to accept an edited query (properly sanitized) before executing it. An option to save the new query results might be available.

If you haven't yet done so, you may want to check the newsgroups - there are some good ones for database design, and that is where I first ran into Joe Celko. He actually responded to a query I had a few years ago.
Last edited by trudge; Mar 11th, 2008 at 6:39 pm.
Reputation Points: 18
Solved Threads: 20
Junior Poster
trudge is offline Offline
176 posts
since Sep 2007

This thread is more than three months old

No one has posted to this discussion for at least three months. Please let old threads die and do not reply to them unless you feel you have something new and valuable to contribute that absolutely must be added to make the discussion complete. Otherwise, please start a new thread in this forum instead.
Message:
Previous Thread in Database Design Forum Timeline: Populating a foreign key table with variable user input
Next Thread in Database Design Forum Timeline: Database Migration





About Us | Contact Us | Advertise | Acceptable Use Policy
Forum Index | Build Custom RSS Feed


Follow us on Twitter


© 2011 DaniWeb® LLC