I have a database table that is a catalog of each day of business (DOB) for my 3 'stores' ranging over a year. Other than the Store ID and DOB, all other data here is irrelevant for this discussion.

The data entry of this table is not necessarily in chronological order; I may have entered data moving forward BUT may also have gone backwards in an effort to catch up historical info. Hence, there is no integrity or relationship between the Record ID and the DOB throughout this table.

Example: I entered all 31 days of January 2013 then entered all previous November 2012 data before entering Feb 2013 and then jumping back to Dec 2012.....so on and so forth.

What I want to do is find the earliest DOB for a particular Store ID and move forward to the most recent DOB for that Store ID and check whether I'm missing any dates between. Obviously I could do that visually for a week or a month but it becomes a bit harder when going thru a year or more.

I can't imagine repeatedly querying the DB table would be effective or efficient but I'm not sure on the logic in looping thru a datatable.

Can anybody give me a hand here? Appreciate your help.

Recommended Answers

All 16 Replies

Hey maj.

SELECT * FROM myTable
 WHERE StoreID = someid
 ORDER BY DOB

will return a recordset for a given store with DOB sorted from earliest to most recent. Don't forget to use parameterized queries.

It would help a lot to tell us what db you are using.
A couple things come to mind, which would work in MS SQL or MySQL, but not access and can't even tell if it would work on oracle.

Also, is this a one time thing or is it going to be run every x days (when you insert new month) ?

Edit: Does DOB contain only working days (Mon-Fri) or Saturday and Sunday are included? I'm guessing you are OK with checking the national holidays manually.

I'll give you the basic logic, which should work on all dbs, but the implementation will probably require some tweaking depending on what you write it for.

What you need is a list of integers from 0 to the number of days you are looking to check (if it's a year 365 days or whatever, the sky is the limit). After that you join to this list and dateadd your minimum date. if the date created in the dateadd can be joined to your table then you've got that day in your table. If you outer join then you can filter out the existing dates by where table1.date is null

MS SQL Syntax, but will work for MySQL with changes in the dateadd:

declare @min_DOB datetime
select @min_DOB = min(DOB) from table where StoreID = 1

SELECT x.date
FROM table
right join 
(select dateadd(m, num, @min_DOB) AS date from 
(SELECT a.id + b.id AS num FROM (SELECT 1 AS id UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6
UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 0) AS a
CROSS JOIN (SELECT 0 AS id UNION SELECT 10 UNION SELECT 20 UNION SELECT 30 UNION SELECT 40 UNION SELECT 50
UNION SELECT 60 UNION SELECT 70 UNION SELECT 80 UNION SELECT 90 ) AS b 
CROSS JOIN (SELECT 0 AS id UNION SELECT 100 UNION SELECT 200 UNION SELECT 300 UNION SELECT 400 UNION SELECT 500
UNION SELECT 600 UNION SELECT 700 UNION SELECT 800 UNION SELECT 900 ) AS c ) AS x ) as y
on DOB = date
WHERE StoreID = 1
    and DOB is null 

You can eliminate the weekends with and DATENAME(dw, date) not in ('Saturday','Sunday') and you are left to check the national holidays or any other days your stores didn't work.

Please note that the query won't return the StoreID as we are only looking for the missing records (so StoreID is NULL) and that I haven't tested this so it might contain errors.

This is the basic logic to do it on the fly, without looping through the records. It can be performed in access as well, but it won't be able to calculate a set of integers like that. You'll need a table with a sequence of IDs. It's not that hard to create, fill in a list in Excel and copy/paste it into a new table.

If it's a one time thing and you can't get it to work, then you can do it in Excel. Use Reverend's ordered select to get a list of DOBs, paste it into excel and -assuming you've got the dates in column A- write in a blank column =A2=A1+1 and fill it down. Filter the False values and you've got the dates that are missing the previous one(s).

I doubt you want to learn how to do it with a cursor in MS SQL, but if you do, let me know.

@adam_k - Dude! I've shot people for writing queries like that. It is not only unreadable, but it is also impossible to modify.

What I want to do is find the earliest DOB for a particular Store ID and move forward to the most recent DOB for that Store ID

The query I provided does just that.

Without knowing the complete structure of the table, my first suggestion would be to make the primary key compound, consisting of StoreID and DOB. That way the records would always be in the desired (at least for this) order. I'm going to assume you have one record per day. Please correct me if this is not the case.

get the recordset for a store

set currdate to DOB from first record

for each record

    do while currdate < DOB
        output "missing data for ",currdate
        currdate += 1
    loop

    currdate += 1

next

An example of this using ADO is

Dim con As New ADODB.Connection
Dim rec As New ADODB.Recordset

con.Open("Driver={SQL Server};Server=.\SQLEXPRESS;Database=mydb;Trusted_Connection=yes;")
rec.Open("SELECT StoreID, DOB FROM TheMaj WHERE StoreID = 1 ORDER BY DOB", con, CursorTypeEnum.adOpenStatic)

Dim currdate As Date = rec("DOB").Value

Do Until rec.EOF
    Dim nextdate As Date = CDate(rec("DOB").Value)
    Do While currdate < nextdate
        Debug.WriteLine("missing " & currdate)
        currdate = currdate.AddDays(1)
    Loop
    currdate = currdate.AddDays(1)
    rec.MoveNext()
Loop

rec.Close()
con.Close()

First, just to clarify....

This is Access and I'm using VB.NET with DataSets & DataTables.
Yes, there is a record for every day of the calendar year; no issues with WeekDays vs WeekEnds.

Jim, your first SELECT makes perfect sense and is quite easy to understand and employ. Your second entry starting with the "get the recordset for a store" is also easy to understand but I have one problem with it; the "currdate += 1" part.

The DOB in the table is a Long Integer with a format like: 20130115. So I think I need to use a function to convert it and then use something like the "currdate = currdate.AddDays(1)" from your ADO snippet to ensure I have a valid date.

Does that make sense?

Finally, the comment you made 'Don't forget to use parameterized queries.'.....I'm a little foggy on that. Could you give me a simple example? I need to work on those more.

Thanks

I'm curious as to why you would store a date as a long integer when Access supports a date field. The currdate += 1 was just pseudo code which is why I elaborated with some actual code (after I got around to writing it). What the loop is doing is stepping through each record in the returned recordset. At each step we do a catch-up to keep currdate and nextdate in sync. As long as there are no missing dates there is no catch-up necessary. If nextdate skips over days then we do have to do a catch-up and the inner loop will flag the missing dates. Because you are using Access, if you want to use parameterized queries you will have to use OleDb. Example here. But I wouldn't worry about that until you have the other logic worked out.

  • Dude! I've shot people for writing queries like that. It is not only unreadable, but it is also impossible to modif

You understand of course that the post box isn't exactly query analyzer - or what they call it nowadays.
This was about the logic more than anything else.
My query will pick out the missing dates, without having the user go through a billion records and have them do the job or getting a billion records to another layer and then loop through them.
This way of doing it is more efficient and faster. Looping throught records just isn't the way dbs work efficiently.

It's not about efficiency. If you have to hit the database a hundred time a minute repeatedly throughout the day then efficiency becomes more important. The requested query is something that will likely be run once a month at the most. In this case (and in almost all others) clarity and maintainability of code outweighs everything else. Can you honestly tell me that the efficiency of your code (and we haven't determined that it IS more efficient) is such that it outweighs clarity and maintainability?

My query will pick out the missing dates, without having the user go through a billion records

Does this look like the kind of application that will have a billion records per store?

If my comments seemd a bit heavy-handed it's because I spent 29 years as a maintenance programmer. It's especially important to know that I spent a lot of that time on 24x7 standby which means when something stopped working at 3 in the morning, I was the guy who had to come in to fix it. A big part of fixing something is knowing how it works and that comes back to clarity. That long query has a place, but only as an intellectual (or academic) exercise. I once saw Conway's Game of Life completely implemented in one line of APL. It was a demonstration of the power of APL but nobody would advocate writing production code like that.

If it makes you feel better, put the derived tables in a view and use that. BUT:

  1. since it's producing it's own values, it will only fail when UNION or CROSS JOIN are no longer supported.
  2. It is a maintenance free query
  3. It has only 1 step. It is way easier to trouble shoot than to get the records into a dataset and then sort the dataset and then determine the missing dates and then present them to the user.
  4. If it's usage in this application is one per month, it's not going to stop working at 3 am.
  5. Efficient programming and respecting resources are what makes a good programmer.
  6. People reading what we post here aren't going to have the same requirements as OP. "Verba volant, scripta manent".

I totally respect your knowledge, age and experiences.
I've written my fair share of code in the last decade and almost always had problems with poorly commented code or code that didn't follow coding standards (naming a table or a global variable "a" or "aaa").
Long queries were never a problem - and trust me I've had sprocs quite big, undocumented, uncommented, written by a person in the States (I'm in Greece) and I was supporting it by myself. I've carried my laptop to the beach more times than my sun lotion, so I know the feeling of waking up and connecting to see why is everybody sitting instead of working, but I preffer this over the sproc calling sproc calling sproc calling dll calling sproc presenting results to IIS being consumed by ......

Anyway this got to be huge and we are way off topic.

My solution

  1. is clear at a glance
  2. took me less than five munutes to code up
  3. took no time to debug
  4. runs regardless of whether UNION or CROSS JOIN are supported
  5. is maintenance free

I presume that determining which dates are missing would be a prelude to entering the missing data in which case user intervention is required.

My point was not that it might fail at 3 AM. My point was that if it should fail for any reason it would be easy to troubleshoot.

I agree on your point about Efficient programming and respecting resources, however, I could argue that your solution is less efficient in that it puts all of the processing requirements on the database server where resources are more critical because they are shared with whatever other processes are running on the server. My solution only requires a simple selection of a contiguous (if the primary key is set as I suggested) set of records. The remainder of the processing is done on the client machine. There is an inverse relationship between the cleverness of the code and the number of comments required to document it.

I totally respect your knowledge, age and experiences.

Likewise. I consider this a rollicking debate on a difference of viewpoint (something that has been sorely lacking in the Geeks Lounge of late).

almost always had problems with poorly commented code or code that didn't follow coding standards

Oh, the stories I could tell. How about the FORTRAN statement

INTEGER NINE /4/

Wanna know how many hours of wasted time that resulted in?

I thought, for the sake of argument, that I'd run the queries through the query analyzer. Yours came up with

The multi-part identifier "x.date" could not be bound.

I'm running MS SQLExpress

WOW....this is getting entertaining!

I'm first going to answer to Jim's curiousity regarding 'storing a date as a long integer'; hopefully my logic is legitimate although there may be a better way.

The data is retreived from a series of 'dated' folders in the format 20130115 and I've found over the years that I can easily recognize dates that way (must be a dyslexia thing). I also thought that a LngInt was as efficient in memory for Access as a Date. I do have a function that quickly converts my format to a more standard Date format for use in the VB.NET code. I guess this could present some extra processing requirements but most of my work is not that process intensive. Please let me know if this is wrong. I'll post another thread in the near future because I do have that concern on this same project.

In regards to the growing discussion of sprocs, CROSS JOINS, efficient programming and gun control; my attitude is that a Junior Poster like me is looking for a logical, correct, easily understood code that I can expand on as I learn more and more. Jim has seen enough of my postings and recognizes that I'm not an 'professional' software engineer (never took a computer class in my 50+ years). Maybe because of that I prefer getting something to work and then 'dialing it in' as I gain knowledge and confidence.

In this particular project I am the half-assed 'developer' [chuckle] and only user but I have already saved myself and my staff many hours of time by replacing time consuming, error prone manual chores with automated processes that give them the data they need without the pencil, abacus, calculator and reams of paper. I'm currently in Shanghai China and they love & worship creating paperwork...arg!

But I do love the discussions and I do learn a bit along the way.

Thank you both,

Major

It may take less storage to use an int but you lose out on the builtin date math functions. I recommend converting the fields to actual date values. For example, adding one day to 20120131 gets you 20120132 which is probably not what you want. And that's not even considering the problems with leap year calculations.

easily understood code

Ah. A kindred spirit ;-)

I'll take a look at all the tables to see how widespread it is, thankfully it's only in the small catalog/library tables and not my table with over 2 million records! And thankfully this whole discussion is about those smaller tables so it's a good time to implement your suggestion. I'll write a little routine to make this transition and then get back on track to work out the loop you suggested about.

Thanks

@themaj: We are all looking for a solution that we can understand. To be honest, the 1st time I saw this query it was waaaaay out of my league.

@Reverend: Bandwidth is also a resource. In the system I've been supporting/developing until recently, bandwidth was way more valuable than server IO (Plus RF Guns don't have that many resources to use). Plus SQL is better at processing bulk data and it is faster than looping through it. We won't agree, but I enjoy this conversation and the different approach.

Sorry themaj for the continuous out of topic.

I don't learn anything when I have a discussion with someone who agrees with me. I'd like to point out that the original specs consisted of three stores with one record per store per day. I don't imagine bandwidth will ever be a concern here. But I do agree that your approach may be more efficient in some cases. For example, I wrote an app that added data for 8000+ analog points and 8000+ status points per hour to two tables. At each insert there was a possibility of new points (each new point had to be given a unique numeric id on first insert) so the first step was to determine which (if any points were new and add them to the point def table. This was done much more quickly and efficiently in a single query than with a fetch and a loop.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.