Disclaimer

Full disclosure, this is a problem I've recently encountered at work. I couldn't find a reasonable solution and ended up recommending a logistics approach rather than a full code approach to solve it. The code performs a subset of required functionality to give users more options, but doesn't account for all reported formats. If one of you manages to solve it then I'll likely contact you for permission to use the solution in production code. However, after spending quite a bit of time on this, I'm somewhat confident that it's an unsolvable problem due to ambiguity. Anyone who solves it will earn a lot of l33t points in my mind.

On to the challenge!

Challenge

Users have a text box for date entry, and you have no control over this text box. The string must be converted to a DateTime object so that it's a valid date later in the process. However, users are allowed to type strings that are not parsable by the DateTime class. For example (using 01/01/2014):

  • 1114
  • 112014
  • 01114
  • 0112014
  • 10114
  • 1012014
  • 010114
  • 01012014

There is no consistent range for valid years (noted because one of the cases practically requires verification of the year part).

The challenge is to write an interpreter that will take a valid date with any of the above formats and normalize it into a "MM/dd/yyyy" or "MM/dd/yy" format so that DateTime will parse the string correctly. For strictly ambiguous cases, note the assumption you've made, if any. For the purposes of education in this challenge, also note which cases you found to be ambiguous and why.

Extra credit for allowing strings that are already parsable and produce the correct interpretation after removing part separators. Don't assume that users won't type something like "1/1/14" or "01/01/2014".

There's no need to ensure the date itself is valid; DateTime will handle that when parsing. However, you may find that you need to do some validation to ensure a correct interpretation (I certainly did).

Be sure to show your work. And by that I mean test scaffolding that ensures all cases are tested and produce the desired result.

Edited 2 Years Ago by deceptikon

One problem is that some inputs are ambiguous, like: "10114". That can be interpreted as 10/01/2014, or 01/01/2014.

One simple solution would be backtracking.

Assuming day, month, year order.

Input: "1114"
    You can read the day as 1 or 11.
    Try day as 11:
        You can try to read the month as 1 or 14.
        Try month as 1:
            You cannot read the year (only "4" is left).
        Try month as 14:
            You cannot read the year (nothing is left).
    Try day as 1:
        You can read the month as 1 or 11.
        Try month as 1:
            You can read year as 2014.
            Try 2014:
                No more characters left. Date is 01/01/2014.

This should work for all variations, and it is easily modified to find all possible meanings. The code should be short and easy to implement in any language. (I don't have a C# compiler on my computer, nor am I familiar enough with C# to write one out blindly. I could give you a C/C++ solution, though, if it helps.)

EDIT: In case it isn't obvious: read the day as 1 or 2 digits, the month as 1 or 2 digits, and the year as 2 or 4 digits.
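The backtracking walk described above could be sketched in C# roughly as follows. This is only an illustration, not anyone's production code: the names `DateBacktracker`, `Interpretations`, and `Walk` are made up for the example, it assumes the input is already digits only, and it reads the fields month first (the "MM/dd/yyyy" order from the challenge) with the field widths from the edit above. It does no range checking, so a reading such as month "00" is left for DateTime to reject.

```csharp
using System;
using System.Collections.Generic;

public static class DateBacktracker
{
    // Allowed digit counts per field, in month, day, year order (an assumption).
    private static readonly int[][] Widths =
    {
        new[] { 1, 2 },  // month
        new[] { 1, 2 },  // day
        new[] { 2, 4 },  // year
    };

    // Returns every way to split the digits into month/day/year.
    public static List<string> Interpretations(string digits)
    {
        var results = new List<string>();
        Walk(digits, 0, new List<string>(), results);
        return results;
    }

    private static void Walk(string rest, int field, List<string> parts, List<string> results)
    {
        if (field == Widths.Length)
        {
            // Accept a reading only when no characters are left over.
            if (rest.Length == 0)
                results.Add($"{parts[0].PadLeft(2, '0')}/{parts[1].PadLeft(2, '0')}/{parts[2]}");
            return;
        }

        foreach (int width in Widths[field])
        {
            if (width > rest.Length) continue;   // cannot read this field
            parts.Add(rest.Substring(0, width));
            Walk(rest.Substring(width), field + 1, parts, results);
            parts.RemoveAt(parts.Count - 1);     // backtrack and try the next width
        }
    }
}
```

With this sketch, "1114" yields a single reading ("01/01/14"), while "10114" yields two ("01/01/14" and "10/01/14"), which is the ambiguity mentioned at the top of this post.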

Edited 2 Years Ago by Hiroshe

Awesome indeed!
I guess strings like "1 jan 14" or "01 1 2014" are allowed also?
I'm going to try to sink my teeth into it.
Just had a completely new set installed BTW. Which is ahem also rather awesome ... :)

Edited 2 Years Ago by ddanbe: addition

I guess strings like "1 jan 14" or "01 1 2014" are allowed also?

The use cases I received at least were consistent in the ordering of the month and day, thankfully. If 1/1/2014 could be interpreted as MM/dd/yyyy or dd/MM/yyyy, I wouldn't have even bothered with a code solution and instead said that the users have to be consistent. There's simply too much ambiguity to resolve in that case. ;)

However, it's safe to assume a short date en-US formatted string with optional leading zeros for single digit month and year.

The big issue here is the year, especially since you could also have a year of, say, "4" (for 2004).

That's why I asked about its purpose, since you could have given it logic to assume a few things. You could also add pattern recognition if you could link entered values to a user.

Both of those, however, still require validation.

One idea I had was to have the code make multiple assumptions about the date. For instance, all the examples above could equal 01/01/2014. Once you have determined all the possible valid dates, you ask the user which one is correct.
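A rough C# sketch of that idea, under some assumptions: the input is digits only, the order is month first, and the year has 2 or 4 digits. `DateCandidates.Find` is a hypothetical helper name. Each possible split gets separators inserted and is handed to `DateTime.TryParseExact`, so invalid readings drop out and only distinct valid dates remain.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class DateCandidates
{
    // Hypothetical helper: every distinct valid date that a digits-only,
    // month-first string could represent.
    public static List<DateTime> Find(string digits)
    {
        var found = new List<DateTime>();
        foreach (int m in new[] { 1, 2 })       // month width
        foreach (int d in new[] { 1, 2 })       // day width
        {
            int y = digits.Length - m - d;      // whatever is left is the year
            if (y != 2 && y != 4) continue;

            string text = digits.Substring(0, m) + "/" +
                          digits.Substring(m, d) + "/" +
                          digits.Substring(m + d);
            string format = y == 2 ? "M/d/yy" : "M/d/yyyy";

            // DateTime does the validity check; duplicates are skipped.
            if (DateTime.TryParseExact(text, format, CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out DateTime date)
                && !found.Contains(date))
            {
                found.Add(date);
            }
        }
        return found;
    }
}
```

If the returned list has exactly one entry, it can be used directly; if it has more than one (as it does for "10114"), that is the point where the user would be asked to pick.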

(With what you have given us, I don't blame you for thinking this might be impossible)

This is exactly why we try to enforce some consistent formats on dates, such as MMDDYYYY, DDMMYYYY, MMDDYY, DDMMYY, etc. As long as the format is agreed upon, then this isn't a major issue, but some systems use YYYYMMDD, or YYMMDD so the dates are properly sortable.

...then there is the question: do we have to take into account that users can enter separators like "/" or "-"? It wouldn't make it much more complex, but if that is not in the picture (because the input field is numeric only), then there is no need for additional logic.

do we have to take into account that the users can enter separators

I have thought of this, and if the user types separators, that could remove some ambiguity. Compare "1012014" and "1/01/2014": the second can only be "Jan 1 2014", while the first could also be read as "Oct 1 2014". Should we give an error here? I know the C# compiler does when it can't resolve an ambiguity.
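One way to take advantage of separators, sketched in C#: check for them first, and only fall back to guessing when there are none. The class name `SeparatedInput`, the method `TrySplit`, and the particular set of separator characters are assumptions made for this example.

```csharp
using System;
using System.Linq;

public static class SeparatedInput
{
    // Assumed separator set; adjust to whatever users actually type.
    private static readonly char[] Separators = { '/', '-', '.', ' ' };

    // Returns true when the input splits cleanly into three numeric parts,
    // in which case there is nothing to guess about field boundaries.
    public static bool TrySplit(string input, out string normalized)
    {
        normalized = string.Empty;
        string[] parts = input.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 3 || !parts.All(p => p.All(char.IsDigit)))
            return false;   // no usable separators: fall back to digit guessing

        normalized = string.Join("/", parts);
        return true;
    }
}
```

For "1/01/2014" this returns true with "1/01/2014" ready for DateTime, while "1012014" returns false and still needs the ambiguous digit handling.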

The actual approach would be fairly easy (although long-winded) if it weren't for the following case:

112014

112014
20th Nov 2014 or;
112014
1st Jan 2014 or;
112014
12th Jan 2014

All three are valid according to your interpretation rules, and the code can make no distinction between any of them. Your only option here is to ask the user which one they mean. It's a problem that I don't think any heuristics would really be able to solve without an input/result history.

So you could gradually make the system better at guessing. But it won't ever really be 100% perfect.

Extra credit for allowing strings that are already parsable and produce the correct interpretation after removing part separators. Don't assume that users won't type something like "1/1/14" or "01/01/2014".

My LINQ is improving, but I'm still undecided between the next two statements:

//Allow only digit characters, theText = text from TextBox
char[] theDigits = theText.Where(ch => Char.IsDigit(ch)).ToArray();
//or
IEnumerable<char> digitCollection = theText.Where(ch => Char.IsDigit(ch));
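For what it's worth, the practical difference between the two is that `ToArray()` runs the filter immediately, while the bare `IEnumerable<char>` is deferred and re-runs the filter every time it is enumerated. Since the later steps need a length and substrings, a third option is to go straight to a string. A small sketch (the `DigitFilter` and `DigitsOnly` names are made up for illustration):

```csharp
using System;
using System.Linq;

public static class DigitFilter
{
    // Keeps only the digit characters and materializes them as a string,
    // so Length and Substring are available afterwards.
    // Note: char.IsDigit also accepts non-ASCII Unicode decimal digits.
    public static string DigitsOnly(string text) =>
        new string(text.Where(char.IsDigit).ToArray());
}
```

For example, `DigitFilter.DigitsOnly("01/01/2014")` gives "01012014".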

Next I wanted to tackle the year. Looking at what happens when the length (or Count) of the digit string is 4, 5, and so on, I came up with the following table:

[Table image not available]

So how to solve this?
I have no idea. :(

Edited 2 Years Ago by ddanbe: correction

Seeing all of you go through essentially the same process I did makes me feel all warm and squishy. Group hug! ;D
