Im curious if anyone knows if its possible to write a program to generate a regular expression given a finite automation.

To make things less complicated I want to limit the number of states to about 4, assume the FA is in minimal form and that the FA has only one FinalState and only one StartState.

Ive been thinking about it for a while now and I think the first obvious thing to do would be to create a transition table for the FA.

So an FA could look like this:

NumberOfStates 4 
StartState   1 
FinalState   4 
StateNumber  NextStateA   NextStateB
1            2            4
2            3            2
3            4            4

And would generate the regular expression: b + (ab*a(a + b))

Ive been racking my brain for hours but am stumped on how to go about this. Any ideas is greatly appreciated.

Definately possible but quite challenging to find an optimal solution. Going to write down a quick first thought..

To make your example a little bit more complex say there's the following lines as well:

StateNumber  NextStateC
2            1
3            1

Given have the table filled already every branch of paths would resulting in building up it's own operand of an '+' (or).

From the start state you'd be able to deduct the following paths:

1 -> 2 : a
1 -> 4 : b

then you continue with 2 and 4. Information known about 2:

2 can reach 2 with b*.
2 can reach 1 with c.
2 can reach 3 with a.

This means an extension of our paths. First the directly described paths:

2 -> 2 : b*
2 -> 1 : c
2 -> 3 : a

After adding new paths you should always try to deduce new paths from them. in this case it would add the following. (you should deduct new paths after every deduction too, basically any path addition):

So adding 2->2: b* would result in: All paths that start with 2 get preceded by b* and all paths that end with 2 would be extended with b*. I reference paths in that's how you'll likely want your program to deal with it and it makes it easier for me to explain. The resulting paths after this addition are:

2 -> 2 : b*
1 -> 2 : a<ref:2->2>
1 -> 4 : b

Now we add the 2->1: c rule. we have a 2->2 path so we should prefix it with it's definition so it becomes b*c. The paths so far would be like this:

2 -> 2 : b*
1 -> 2 : a<ref:2->2>
1 -> 4 : b
2 -> 1 : <ref:2->2>c

Because we added a path we should check for possible deductions. (Check existing paths ending in two and see if their starting point going to 1 would result in a new path or path starting with 1) In this case we can deduct a few new paths:

1 -> 1 
2 -> 2 
2 -> 4

Adding the first rule results in the same procedure as done earlier so it becomes this:

2 -> 2 : b*
1 -> 2 : <ref:1->1>a<ref:2->2>
1 -> 4 : <ref:1->1>b
2 -> 1 : <ref:2->2>c<ref:1->1>
1 -> 1 : <ref:1->2><ref:2->1>

Then adding the rule 2 -> 2 the transformation is again following the same procedure. Note that we did already have a 2->2 definition and this should be OR'd. You'll want to replace the old definition in existing paths. So it becomes:

2 -> 2 : ((b*) + (<ref:2->1><ref:1->2>))*
1 -> 2 : <ref:1->1>a<ref:2->2>
1 -> 4 : <ref:1->1>b
2 -> 1 : <ref:2->2>c<ref:1->1>
1 -> 1 : <ref:1->2><ref:2->1>

The next rule we'd add would be 2 -> 4 : <ref:2->1><ref:1->4>. This addition results in the following deductions:

1->4 is extended with a new option:

1 -> 4 : ((<ref:1->1>b) + (<ref:1->2><ref:2->4>))

What's interesting for this case as that deductions could be endless. (unless you can detect the expressions are symantically correct) because 1->4 would deduce to 2->4 because a 2->1 path existing. But because a 2->4 path being part of the 1->4 path it should not be added.

The next line is the last non-deducted rule of node 2, which was 2 -> 3 : a. We prefix it with the 2->2 rule and then add it to our paths:

2 -> 2 : ((b*) + (<ref:2->1><ref:1->2>))*
1 -> 2 : <ref:1->1>a<ref:2->2>
1 -> 4 : ((<ref:1->1>b) + (<ref:1->2><ref:2->4>))
2 -> 1 : <ref:2->2>c<ref:1->1>
2 -> 3 : <ref:2->2>a
1 -> 1 : <ref:1->2><ref:2->1>
2 -> 4 : <ref:2->1><ref:1->4>

This leads to the deduction of 1->3:

1 -> 3 : <ref:1->2><ref:2->3>

which could result in 2->3 but it's not added because 2->3 is part of 1->3.

The last paths to consider are those from node 3. (as node 4 has none) rules for this node:

3 can reach 1 with c.
3 can reach 4 with (a + b).

Adding 3->1 : c would first add a suffix (because we have 1->1):
The paths would look like this:

2 -> 2 : ((b*) + (<ref:2->1><ref:1->2>))*
1 -> 2 : <ref:1->1>a<ref:2->2>
1 -> 4 : ((<ref:1->1>b) + (<ref:1->2><ref:2->4>))
2 -> 1 : <ref:2->2>c<ref:1->1>
2 -> 3 : <ref:2->2>a
3 -> 1 : c<ref:1->1>
1 -> 1 : <ref:1->2><ref:2->1>
2 -> 4 : <ref:2->1><ref:1->4>
1 -> 3 : <ref:1->2><ref:2->3>

Then we check for deductable paths again:
2->1 gets extended:

2 -> 1 : ((<ref:2->2>c<ref:1->1>) + (<ref:2->3><ref:3->1>))

1->1 gets extended:

1 -> 1 : ((<ref:1->2><ref:2->1>) + (<ref:1->3><ref:3->1>))*

and the other way around is an option too:

3 -> 3 : (<ref: 3->1><ref: 1->3>)*

The last two deductions for this addition are the following:

3 -> 2 : <ref:3->1><ref:1->2>
3 -> 4 : <ref:3->1><ref:1->4>

This first one would result in an extension of 2->2, 1->2, 3->1:

2 -> 2 : ((b*) + (<ref:2->1><ref:1->2>) + (<ref:2->3><ref:3-2>) )*
1 -> 2 : ((<ref:1->1>a<ref:2->2>) + (<ref:1-3><ref:3->2>))
3 -> 1 : ((c<ref:1->1>) + (<ref:3->2><ref:2->1>))

3->3 is not modified because the <ref:3->2><ref:2->3> deduction already exist in in a different format. The same goes for 3->4.

The second one would result in the following extensions:

1 -> 4 : ((<ref:1->1>b) + (<ref:1->2><ref:2->4>) + (<ref:1->3><ref:3->4>))
2 -> 4 : ((<ref:2->1><ref:1->4>) + (<ref:2->3><ref:3->4>))

Then the final addition would be

3 -> 4 : (a + b)

This would result in the following:

2 -> 2 : ((b*) + (<ref:2->1><ref:1->2>) + (<ref:2->3><ref:3-2>))*
1 -> 2 : ((<ref:1->1>a<ref:2->2>) + (<ref:1-3><ref:3->2>))
1 -> 4 : ((<ref:1->1>b) + (<ref:1->2><ref:2->4>) + (<ref:1->3><ref:3->4>))
2 -> 1 : ((<ref:2->2>c<ref:1->1>) + (<ref:2->3><ref:3->1>))
2 -> 3 : (<ref:2->2>a)
3 -> 1 : ((c<ref:1->1>) + (<ref:3->2><ref:2->1>))
1 -> 1 : ((<ref:1->2><ref:2->1>) + (<ref:1->3><ref:3->1>))*
2 -> 4 : ((<ref:2->1><ref:1->4>) + (<ref:2->3><ref:3->4>))
1 -> 3 : (<ref:1->2><ref:2->3>)
3 -> 3 : (<ref: 3->1><ref: 1->3>)*
3 -> 2 : (<ref:3->1><ref:1->2>)
3 -> 4 : ((<ref:3->1><ref:1->4>) + (a + b))

You could then extract the expression by rewriting 1->4 until all references are gone. I might have made a mistake when writing this up (limited time) but the principle should be clear. (The proces above is done in a systematic way even though some steps could have been done faster. An example would be rewriting section to something smaller that is semantically identical) It's also easier to program than to write in text..

It would be more challenging to aim for optimal performance though. The hardest part about this is probably determining when to add deducted paths or when they should be skipped as they offer nothing new.

Using the above approach with your initial example (which is a lot simpler) you'd get the following pathlist:

1->2: a<ref:2->2>
1->4: (b + (<ref:1->3><ref:3->4>))
2->2: b*
2->3: <ref:2->2>a
1->3: <ref:1->2><ref:2->3>
3->4: (a + b)

Then writing 1->4 out results in:

1->4: (b + (<ref:1->3><ref:3->4>))
1->4: (b + (<ref:1->2><ref:2->3>(a + b)))
1->4: (b + (a<ref:2->2><ref:2->2>a(a + b)))
1->4: (b + (ab*b*a(a + b)))

Which is the same as

b + (ab*a(a + b))
This article has been dead for over six months. Start a new discussion instead.