How to extract data between two dates in a CSV file

Question

stompper33 0 Newbie Poster

14 Years Ago

I'm a newbie to python and I have a problem. My problem is that I have a CSV file with hundreds and hundreds of rows of data. Each row contains temperature data that has been recorded daily. What I want to do is try to extract certain rows of data from the csv file. For example I want to be able to give a function two different days.(mm/dd/yyyy) I'll call one 'initialDay' and the other 'finalDay'. Now I would like to extract the data in between these two days. My data is constructed so that's DATE is a column. Does Python offer any way of extracting just the data between my two dates?? Another quick example: let say I entered:
initialDay = 1/1/2000 and finalDay = 1/4/2000
I would want the daily temp recorded for january 1st 2000 through january 4th 2000.
So if initialDay = finalDay it should only return the temp for that single day.

python

4 Contributors
9 Replies
3K Views
6 Days Discussion Span
Latest Post 14 Years Ago Latest Post by Beat_Slayer

All 9 Replies

snippsat 661 Master Poster

14 Years Ago

Does Python offer any way of extracting just the data between my two dates??

Yes more than one way depends how data are organized.

post an short example off that cvs file.
Mark where 'initialDay' and 'finalDay' are.

snippsat 661 Master Poster

14 Years Ago

If you change Bent Slayer code,to this.
It will give you a list.

initial_date = '9/14/1990'
initial_month, initial_day, initial_year = initial_date.split('/')
final_date = '9/17/1990'
final_month, final_day, final_year = final_date.split('/')

f_in = open('test.csv').readlines()
#f_out = open('filtered_stations_temp.csv', 'w')

my_list = []
for i in range(1, len(f_in)):
    station, date, max_temp, min_temp = f_in[i].split(',')
    month, day, year = date.split('/')
    if initial_month <= month <= final_month and \
    initial_day <= day <= final_day and \
    initial_year <= year <= final_year:
        my_list.append(f_in[i].strip())

#f_out.close()
print my_list

Another way.

start_date = '9/14/1990'
end_date = '9/18/1990'

flag = 1
my_list = []
linelist = open('test.csv')
for line in linelist:
    if start_date in line: # or line.startswith('STATION') ## to get first line
        flag = 0
    if end_date in line:
        flag = 1
    if not flag and not end_date in line:
       #print line,
       my_list.append(line.strip())

print my_list

TrustyTony 888 ex-Moderator

14 Years Ago

The first offered solution is not going to work if statistics goes over month boundary.

The second solution of snippsat is quite usable if each stations data are always sorted in date order and if there is always some statistics from both given days or second day is after last day of statistics.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

stompper33 0 Newbie Poster · Answer 1 · 2010-08-15T23:38:02+00:00

STATION,DATE,MAX,MIN
PRIMET,9/14/1990,30.6,6.3
PRIMET,9/15/1990,33.9,7.7
PRIMET,9/16/1990,32.8,8.9
PRIMET,9/17/1990,31.3,9.3
PRIMET,9/18/1990,31.7,9.7
RS20,9/14/1990,12.2,5.7
RS20,9/15/1990,18.5,3.7
RS20,9/16/1990,24.9,8
RS20,9/17/1990,23.7,9.3
RS20,9/18/1990,26.6,9.4
VANMET,9/14/1990,18.5,6
VANMET,9/15/1990,9.3,6.5
VANMET,9/16/1990,11.8,3.7
VANMET,9/17/1990,11.5,2.3
VANMET,9/18/1990,12.2,1.9

****Above is a short example of my CSV file. Now the first column is different temp Stations where data was recorded. Now lets say I want to extract the rows of data for each station between 9/14/1990 and 9/17/1990. Meaning:

initialDay = 9/14/1990
finalDay = 9/17/1990

The end result should be 4 rows of data for each station being that each station recorded data on those days. So a total of 12 rows. The 3rd and 4th columns represent max and min temps for the day.

thanks

Beat_Slayer 17 Posting Pro in Training · Answer 2 · 2010-08-16T02:10:31+00:00

If i understood correctly:

initial_date = '9/14/1990'
initial_month, initial_day, initial_year = initial_date.split('/')
final_date = '9/17/1990'
final_month, final_day, final_year = final_date.split('/')

f_in = open('stations_temp.csv').readlines()
f_out = open('filtered_stations_temp.csv', 'w')

f_out.write(f_in[0])

for i in range(1, len(f_in)):
    station, date, max_temp, min_temp = f_in[i].split(',')
    month, day, year = date.split('/')
    if initial_month <= month <= final_month and \
    initial_day <= day <= final_day and \
    initial_year <= year <= final_year:
        f_out.write(f_in[i])

f_out.close()

Cheers and Happy coding

stompper33 0 Newbie Poster · Answer 3 · 2010-08-16T03:30:53+00:00

Hey thanks this is very helpful. My only problem is what if I didn't want to create a new file but rather store the filtered data in a list can this be done easily? I was able to write some code that will do this for one specific day of the year. Below is the code:

import csv

def filtered_stations_temps(YearlyData, initial_date):
        stationTemps = []
        for row in YearlyData:
                if row[1] in initial_date:
                        stationTemps.append(row[0:4])
        return list(stationTemps)

if __name__ == '__main__':

        YearlyData = list(csv.reader(open('stations_temps.csv'), delimiter=','))
        print filtered_stations_temps(YearlyData, ['9/14/1990'])

I will need to somehow give 'filtered_stations_temps' another argument for 'final_date' and include it in some statement.

Beat_Slayer 17 Posting Pro in Training · Answer 4 · 2010-08-18T02:15:45+00:00

I know the thread is solved and closed, but I got a question.

How does my solution don't work?

I don't get it.

It will copy all lines in wich the date is between that range, for sure.

Cheers and Happy coding

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 5 · 2010-08-18T11:18:49+00:00

I know the thread is solved and closed, but I got a question.
How does my solution don't work?
I don't get it.
It will copy all lines in wich the date is between that range, for sure.
Cheers and Happy coding

Use:

start_date = '06/12/2007'
end_date = '01/27/2008'

and attached modified csv to see what I meant

Beat_Slayer 17 Posting Pro in Training · Answer 6 · 2010-08-21T17:07:47+00:00

Thanks mate by the clearing up.

I wasn't really seeing it.

import time

#initial_date = '9/14/1990'
initial_date = '6/12/2007'
initial = time.mktime(time.strptime(initial_date, "%m/%d/%Y"))

#final_date = '9/17/1990'
final_date = '1/27/2008'
final = time.mktime(time.strptime(final_date, "%m/%d/%Y"))

f_in = open('stations_temp_tonyjv.csv').readlines()
f_out = open('filtered_stations_temp.csv', 'w')

f_out.write(f_in[0])

for i in range(1, len(f_in)):
    station, date, max_temp, min_temp = f_in[i].split(',')
    date = time.mktime(time.strptime(date, "%m/%d/%Y"))
    if initial <= date <= final:
        f_out.write(f_in[i])

f_out.close()

Cheers and Happy coding.

How to extract data between two dates in a CSV file

Recommended Answers Collapse Answers

All 9 Replies

Recommended Answers