I'm new here...but
The things I'll talk about came up as I was doing independent research and it's not homework related, however I'm not sure where else I should post it except in a computer science subforum. Since it's not homework, you might want to give it less priority.
I have a state graph (directed acyclic) with a beginning state (lets call it S) and an end state (N). Each transition happens in the following way:
When I'm currently at the state s, I choose an "action" from a set of actions (lets call it A). Then there is a probability distribution P(a), a E A to move from the current state to a set of states according to the action taken.
The problem is that at each step, I want to choose the action which minimizes the expected number of remaining hops to the end state.
Is there a name for this model or similar/equivalent one?
Has this algorithm been thought of before?