I have a long list of data structured in the following way:

Date, Time, Temperature, Moisture, Accumulated precipitation

1/01/2011, 00:00, 23, 50, 2,
1/01/2011, 00:15, 22, 45, 1,
1/01/2011, 00:30, 20, 39, 0,
1/01/2011, 01:00, 25, 34, 0,
1/01/2011, 01:15, 23, 50, 0,
...
1/01/2011, 23:45, 22, 40, 0,
...
31/01/2011, 00:00, 23, 45, 0,

How can I get the daily averages of the variables Temperature and Moisture for the 31 days of the month? Does anyone have an idea how I can do this?

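Let's say d is a date string in the format you posted (day/month/year). Then
day, month, year = map(int, d.split("/"))
m = datetime.date(year, month, day).month
returns the month as an integer (datetime.date expects year, month, day, so the parts must be reordered). You could construct a dictionary, for example a collections.defaultdict(list), that collects the data for each month:
dctTemp[m].append(newTemp)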

Then you can average the values once you have collected all the days.
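A minimal sketch of the dictionary-of-lists idea, assuming lines shaped exactly like the sample in the question (day/month/year dates, comma-separated, trailing comma). It keys on the full date rather than the month, because the question asks for daily averages; using .month as the key instead gives the per-month grouping described above. group_by_day and SAMPLE are hypothetical names.

```python
import datetime
from collections import defaultdict

# Hypothetical sample lines in the format from the question.
SAMPLE = [
    "1/01/2011, 00:00, 23, 50, 2,",
    "1/01/2011, 00:15, 22, 45, 1,",
    "1/01/2011, 00:30, 20, 39, 0,",
    "31/01/2011, 00:00, 23, 45, 0,",
]

def group_by_day(lines):
    """Collect temperature readings per calendar date."""
    dct_temp = defaultdict(list)  # a missing key starts as an empty list
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        day, month, year = map(int, fields[0].split("/"))  # day/month/year order
        d = datetime.date(year, month, day)
        dct_temp[d].append(int(fields[2]))  # third column is Temperature
    return dct_temp

temps = group_by_day(SAMPLE)
for d in sorted(temps):
    values = temps[d]
    print(d, sum(values) / len(values))
```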

You define four dictionaries:
sum_temperature: day -> sum of the temperatures on the given day
count_temperature: day -> count of temperature readings on the given day
and likewise sum_moisture and count_moisture for moisture.

You read in the file line by line.
You parse each line into day, temperature and moisture. Time and precipitation are not needed for this task (?).
sum_temperature[day] += temperature
count_temperature[day] += 1
and likewise for moisture.

After the file is parsed, you loop over the sum_temperature keys and values. For each day you get the average temperature:
sum_temperature[day] / count_temperature[day]

Problems left for you:
Incrementing sum_temperature[day] the first time a day appears: use a key check or a defaultdict.
Averaging a day that has no data.

Now you can ask yourself, the docs, Google, or us how to program it.
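One possible sketch of the four-dictionary approach, assuming the comma-separated layout from the question. The daily_averages name is hypothetical, and defaultdict is used to solve the "first time" problem. Header, "." and blank lines are skipped because their fields do not convert to numbers. Days with no readings never become keys, which sidesteps (but does not answer) the empty-day question.

```python
from collections import defaultdict

def daily_averages(lines):
    """Average Temperature and Moisture per day using sum and count dictionaries."""
    sum_temperature = defaultdict(float)   # day -> sum of temperatures
    count_temperature = defaultdict(int)   # day -> number of temperature readings
    sum_moisture = defaultdict(float)      # day -> sum of moisture values
    count_moisture = defaultdict(int)      # day -> number of moisture readings

    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        try:
            day = fields[0]                # e.g. "1/01/2011"; the time column is ignored
            temperature = float(fields[2])
            moisture = float(fields[3])
        except (IndexError, ValueError):
            continue                       # header, "." or blank lines
        sum_temperature[day] += temperature
        count_temperature[day] += 1
        sum_moisture[day] += moisture
        count_moisture[day] += 1

    # Only days with at least one reading are keys, so there is no division by zero.
    return {
        day: (sum_temperature[day] / count_temperature[day],
              sum_moisture[day] / count_moisture[day])
        for day in sum_temperature
    }

result = daily_averages([
    "Date , Time, Temperature, Moisture, Accumulated precipitation",
    "1/01/2011, 00:00, 23, 50, 2,",
    "1/01/2011, 00:15, 22, 45, 1,",
    "31/01/2011, 00:00, 23, 45, 0,",
])
print(result)
```

Passing an open file object instead of the list works too, since iterating a file yields its lines.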

Quoting the question: "and Moisture for the 31 day of the month"

Split the record to isolate the date, then split the date into day, month, and year (the posted data is in day/month/year order).

rec = "1/01/2011, 00:00, 23, 50, 2,"
fields = rec.split(",")                 # split on commas, not on spaces
print(fields[0])
dd, mm, ccyy = fields[0].split("/")     # day/month/year
if 31 == int(dd):
    print("Found the 31st")
else:
    print("%s is not the 31st" % dd)
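If "the 31 day" means only the 31st, the same split can filter the records before averaging. A sketch under that reading; averages_for_day and the second 31st-of-month record in the demo are hypothetical.

```python
def averages_for_day(lines, wanted_day=31):
    """Average Temperature and Moisture over records whose day of month is wanted_day."""
    temps, moist = [], []
    for rec in lines:
        fields = [f.strip() for f in rec.split(",")]
        try:
            dd, mm, ccyy = fields[0].split("/")   # day/month/year
            if int(dd) != wanted_day:
                continue
            temps.append(float(fields[2]))
            moist.append(float(fields[3]))
        except (ValueError, IndexError):
            continue                              # header or malformed line
    if not temps:
        return None                               # no records for that day
    return sum(temps) / len(temps), sum(moist) / len(moist)

print(averages_for_day([
    "1/01/2011, 00:00, 23, 50, 2,",
    "31/01/2011, 00:00, 23, 45, 0,",
    "31/01/2011, 00:15, 21, 41, 0,",   # hypothetical extra record
]))
```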

How do I program it?

If you cannot come up with at least a partially working solution, then you should go back to tutorials or easier tasks.

If you don't know how to open a file and read it line by line, then I recommend googling it.
http://rosettacode.org/wiki/Read_a_file_line_by_line
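For a comma-separated file, Python's csv module handles the splitting while still reading one line at a time. A sketch assuming the column order from the question; read_records is a hypothetical helper, and the demo writes a throwaway temporary file just to have something to read.

```python
import csv
import os
import tempfile

def read_records(path):
    """Yield (date, time, temperature, moisture) tuples, one line at a time."""
    with open(path, newline="") as f:
        reader = csv.reader(f, skipinitialspace=True)  # drop the space after each comma
        next(reader, None)                             # skip the header row
        for row in reader:
            if len(row) < 4:
                continue                               # skip blank or short lines
            yield row[0], row[1], float(row[2]), float(row[3])

# Demo with a temporary file (the file name is arbitrary).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("Date, Time, Temperature, Moisture, Accumulated precipitation\n")
    tmp.write("1/01/2011, 00:00, 23, 50, 2,\n")
    tmp.write("1/01/2011, 00:15, 22, 45, 1,\n")
records = list(read_records(tmp.name))
os.remove(tmp.name)
print(records)
```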

I have a file 'path-tracks.csv' containing date, lat, lon, id, humidity, and rainfall data recorded every 15 minutes. I need to obtain daily averages from a monthly database. How do I get these daily averages? One more thing: my monthly database is not continuous; on some days there is no data at certain times. How do I handle this so that I can still get daily averages? I am doing some tests on a file with only a little data, and it has the following form:

data=np.loadtxt('path-tracks.csv',delimiter=',',skiprows=1,dtype=str)
data
array([['02-05-2004 06:15', '19.7', '-95.2', '1', '45', '-38', 'CCM', '1'],
       ['02-05-2004 06:30', '19.7', '-94.7', '1', '34', '-48', 'CCM', '2'],
       ['02-05-2004 06:45', '19.3', '-93.9', '1', '57', '-60', 'CCM', '3'],
       ['02-05-2004 07:15', '19', '-93.5', '1', '89', '-58', 'CCM', '4'],
       ['02-05-2004 07:30', '19', '-92.8', '1', '34', '-50', 'CCM', '2'],
       ['02-05-2004 07:45', '19.2', '-92.6', '1', '23', '-40', 'CCM', '1'],
       ['02-05-2004 08:15', '19.9', '-93', '1', '10', '-43', 'CCM', '1'],
       ['02-05-2004 08:45', '20', '-92.8', '1', '50', '-32', 'CCM', '2'],
       ['30-05-2004 09:15', '23.1', '-100.2', '2', '45', '-45', 'SCME', '3'],
       ['30-05-2004 09:45', '23.2', '-100', '2', '68', '-56', 'SCME', '1'],
       ['30-05-2004 10:15', '23.3', '-100', '2', '90', '-48', 'SCME', '2'],
       ['30-05-2004 10:45', '23.3', '-100.2', '2', '100', '-32', 'SCME',
        '1'],
       ['31-05-2004 03:15', '23.4', '-99', '3', '12', '-36', 'SCM', '1'],
       ['31-05-2004 03:45', '23.5', '-98.9', '3', '34', '-46', 'SCM', '2'],
       ['31-05-2004 04:15', '23.6', '-98.7', '3', '56', '-68', 'SCM', '1'],
       ['31-05-2004 04:45', '23.7', '-98.8', '3', '78', '-30', 'SCM', '1']], 
      dtype='|S16')

So far I have written a script that gets the average over all days. My question is: how can I get the average for each day?

import numpy as np
import datetime as dt

temp = []
hume = []
preci = []
date = []
with open('path-tracks.csv') as f:
    next(f)                      # skip the header row
    for line in f:
        data = line.split(',')
        temp.append(int(data[4]))
        hume.append(int(data[5]))
        preci.append(int(data[7]))
        date.append(dt.datetime.strptime(data[0], '%d-%m-%Y %H:%M'))

# array() was never imported; use the np. prefix
temp = np.array(temp)
hume = np.array(hume)
preci = np.array(preci)
date = np.array(date)
print('max:', temp.max(), hume.max())
print('sum:', preci.sum())
print('mean:', temp.mean(), hume.mean())

If you do not have source data for a daily average, then you must decide what to do. This is not a programming question. If you intend to draw a graph of the data, then reusing the previous day's data comes to mind.

from collections import defaultdict
import datetime 

dataset=[['02-05-2004 06:15', '19.7', '-95.2', '1', '45', '-38', 'CCM', '1'],
       ['02-05-2004 06:30', '19.7', '-94.7', '1', '34', '-48', 'CCM', '2'],
       ['02-05-2004 06:45', '19.3', '-93.9', '1', '57', '-60', 'CCM', '3'],
       ['02-05-2004 07:15', '19', '-93.5', '1', '89', '-58', 'CCM', '4'],
       ['02-05-2004 07:30', '19', '-92.8', '1', '34', '-50', 'CCM', '2'],
       ['02-05-2004 07:45', '19.2', '-92.6', '1', '23', '-40', 'CCM', '1'],
       ['02-05-2004 08:15', '19.9', '-93', '1', '10', '-43', 'CCM', '1'],
       ['02-05-2004 08:45', '20', '-92.8', '1', '50', '-32', 'CCM', '2'],
       ['30-05-2004 09:15', '23.1', '-100.2', '2', '45', '-45', 'SCME', '3'],
       ['30-05-2004 09:45', '23.2', '-100', '2', '68', '-56', 'SCME', '1'],
       ['30-05-2004 10:15', '23.3', '-100', '2', '90', '-48', 'SCME', '2'],
       ['30-05-2004 10:45', '23.3', '-100.2', '2', '100', '-32', 'SCME',
        '1'],
       ['31-05-2004 03:15', '23.4', '-99', '3', '12', '-36', 'SCM', '1'],
       ['31-05-2004 03:45', '23.5', '-98.9', '3', '34', '-46', 'SCM', '2'],
       ['31-05-2004 04:15', '23.6', '-98.7', '3', '56', '-68', 'SCM', '1'],
       ['31-05-2004 04:45', '23.7', '-98.8', '3', '78', '-30', 'SCM', '1']]


daily_data = defaultdict(list)          # date -> list of (temp, hume, preci) tuples
for data in dataset:
    temp = int(data[4])
    hume = int(data[5])
    preci = int(data[7])
    time = datetime.datetime.strptime(data[0], '%d-%m-%Y %H:%M')
    daily_data[time.date()].append((temp, hume, preci))

minday, maxday = min(daily_data), max(daily_data)

# Walk every calendar day in the range, including days with no readings.
day = minday
previous_temp_average = None
while day <= maxday:
    readings = daily_data.get(day, [])
    if readings:
        temp_average = sum(r[0] for r in readings) / len(readings)
    else:
        # No data for this day: reuse the previous day's average.
        temp_average = previous_temp_average
    print("Average temperature for day: %s is %s" % (day, temp_average))
    previous_temp_average = temp_average
    day += datetime.timedelta(days=1)
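As an alternative to the loop above (a different technique, not what the reply used), pandas can compute per-day means with resample and then fill empty days forward with ffill, which is the same "previous day" policy. A sketch using a few of the sample rows; column 4 is treated as "temp" only because the script above does so.

```python
import pandas as pd

# A few rows from the sample above: timestamp and the column the script calls temp.
rows = [
    ["02-05-2004 06:15", "45"],
    ["02-05-2004 06:30", "34"],
    ["30-05-2004 09:15", "45"],
    ["30-05-2004 09:45", "68"],
    ["31-05-2004 03:15", "12"],
]
df = pd.DataFrame(rows, columns=["datetime", "temp"])
df["datetime"] = pd.to_datetime(df["datetime"], format="%d-%m-%Y %H:%M")
df["temp"] = df["temp"].astype(int)

# One row per calendar day; days without readings become NaN.
daily = df.set_index("datetime")["temp"].resample("D").mean()

# Reuse the previous day's average for the empty days.
daily_filled = daily.ffill()
print(daily_filled)
```

Dropping the ffill call leaves the gaps as NaN, which is often the safer choice for statistics as opposed to plotting.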

Thanks for your comment, it helped me a lot. I found another way to compute daily averages of many variables, for example for a database with the following structure:

Fecha (date), Hora (time), DirViento (wind direction), MagViento (wind speed), Temperatura (temperature), Humedad (humidity), PreciAcu (accumulated precipitation).

Each column has daily data recorded every 15 minutes. Here is the code that calculates the sum of the PreciAcu column and the averages of the temperature and humidity columns:

import pandas as pd

data = pd.read_csv('tancoyol.csv')
# set_index(..., inplace=True) returns None, so keep the returned DataFrame instead.
index5 = data.set_index(['Fecha', 'Hora'])

# Group on index level 0 (Fecha): one group per day.
grouped = index5.groupby(level=0)

stat_cea = grouped.agg({'Temperatura': 'mean', 'Humedad': 'mean', 'PreciAcu': 'sum'})

print(stat_cea)
print('Done............')
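To check what the grouping produces without the real tancoyol.csv, here is a sketch on a small in-memory frame. The column names come from the post; the rows are hypothetical (loosely based on the sample shown later in the thread).

```python
import pandas as pd

# Hypothetical rows standing in for tancoyol.csv.
data = pd.DataFrame({
    "Fecha": ["2011/07/01", "2011/07/01", "2011/07/02", "2011/07/02"],
    "Hora": ["00:00", "00:15", "00:00", "00:15"],
    "Temperatura": [21.22, 21.20, 21.47, 21.52],
    "Humedad": [100, 100, 100, 98],
    "PreciAcu": [1.7, 1.7, 3.2, 2.5],
})

index5 = data.set_index(["Fecha", "Hora"])   # returns a new DataFrame
grouped = index5.groupby(level=0)            # level 0 is Fecha: one group per day
stat_cea = grouped.agg({"Temperatura": "mean", "Humedad": "mean", "PreciAcu": "sum"})
print(stat_cea)
```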

Now I have one more question: I need to convert the DirViento column to radians. How can I do this and add the new column to my data file? In summary, I need to derive three new columns from the DirViento column.

Specifically:
First, convert each value in the DirViento column to radians (new column called Rad).
Second, take the sine of the Rad column (new column called Sin).
Third, take the cosine of the Rad column (new column called Cos).

How can I achieve this?

You can use math.sin, math.cos and math.pi; the conversion is radians = degrees * math.pi / 180.
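The math functions work on one value at a time. On a pandas column, the numpy equivalents (np.radians, np.sin, np.cos) apply element-wise, and the results can be assigned straight to new columns. A minimal sketch, assuming DirViento holds wind direction in degrees; the sample values and the output file name are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; DirViento is assumed to be wind direction in degrees.
data = pd.DataFrame({"DirViento": [318, 342, 90, 180, 0]})

data["Rad"] = np.radians(data["DirViento"])   # same as DirViento * math.pi / 180
data["Sin"] = np.sin(data["Rad"])
data["Cos"] = np.cos(data["Rad"])

print(data)
# To save the new columns, something like: data.to_csv("tancoyol_rad.csv", index=False)
```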

You didn't tell us what was in the DirViento column.
I sincerely hope you are not trying to convert a temperature value in degrees to a temperature value in radians...

I'm guessing it's "wind direction"

BearofNH:
Thanks for your comments, but I'm not doing what you think.

rrashkin:
Exactly. I need to calculate the average wind, temperature and moisture, and the sum of precipitation, from a monthly database with data recorded every 15 minutes. A sample of the dataframe is as follows:

    Fecha_Hora         DirV  MagV   Temp  HumR  PreciAcu
0   2011/07/01 00:00    318   6.6  21.22   100       1.7
1   2011/07/01 00:15    342   5.5  21.20   100       1.7
2   2011/07/01 00:30    329   6.6  21.15   100       4.8
3   2011/07/01 00:45    279   7.5  21.11   100       4.2
4   2011/07/01 01:00    318   6.0  21.16   100       2.5
5   2011/07/01 01:15    329   7.1  21.13   100       4.0
6   2011/07/01 01:30    300   4.7  21.15   100       1.3
7   2011/07/01 01:45    291   3.1  21.23   100       2.2
8   2011/07/01 02:00    284   7.6  21.29   100       1.3
10  2011/07/02 00:00    281   3.6  21.47   100       3.2
11  2011/07/02 00:15    360   2.7  21.52   100       2.5
12  2011/07/02 00:30     57   1.2  21.53   100       0.0
13  2011/07/02 00:45    300   3.4  21.69   100       0.0
14  2011/07/02 01:00    359   5.9  21.67   100       0.0
15  2011/07/01 01:15    309   1.8  21.65   100       0.0

I'm using pandas to read the data, and I do the calculations with numpy and scipy. First I apply a few rules: if a direction value is 0, replace it with 360; compute the components u and v; if v is 0, the resulting wind direction is 0, otherwise it is atan(u/v); and the magnitude is (u**2 + v**2)**0.5.
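A note on the direction average: taking np.mean of directions directly fails near north (the arithmetic mean of 350 and 10 degrees is 180, while the wind is blowing from about 0). A common alternative is to average the u and v components and convert back with atan2, which keeps the correct quadrant and copes with v == 0, unlike atan(u/v). Because sin and cos treat 0 and 360 identically, the 0-to-360 replacement is also unnecessary. A sketch with a hypothetical mean_wind helper; u and v here are plain east/north components, and the usual meteorological sign flip cancels out when converting back to a direction. The returned speed is the magnitude of the mean vector, which is smaller than the plain mean of MagV when directions vary.

```python
import math

def mean_wind(directions_deg, speeds):
    """Vector-average wind: returns (mean direction in degrees, mean vector speed)."""
    n = len(directions_deg)
    if n == 0:
        raise ValueError("no observations")
    u_sum = 0.0
    v_sum = 0.0
    for d, s in zip(directions_deg, speeds):
        rad = math.radians(d)
        u_sum += s * math.sin(rad)   # east-west component
        v_sum += s * math.cos(rad)   # north-south component
    u_mean = u_sum / n
    v_mean = v_sum / n
    # atan2 picks the right quadrant; % 360 maps the result into [0, 360).
    direction = math.degrees(math.atan2(u_mean, v_mean)) % 360.0
    speed = math.hypot(u_mean, v_mean)
    return direction, speed

print(mean_wind([350, 10], [5, 5]))   # direction close to 0 (or 360), not 180
```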

To get the daily means and sums I do the following:

index5 = data.set_index(['Fecha', 'Hora'])   # with inplace=True this would be None
grouped = index5.groupby(level=0)
stat_cea = grouped.agg({'MagRes': 'mean', 'DirRes': 'mean', 'Temperatura': 'mean', 'Humedad': 'mean', 'PreciAcu': 'sum'})

Fecha         DirRes  Humedad  MagRes  PreciAcu  Temperatura
2011/07/01  141.0588      100  4.6475      30.4  21.35
2011/07/02  153.8235       99  3.3455      18.0  21.841765
2011/07/03  133.8824       99  4.6470       4.0  21.347059

Now I want to save the output to an Excel file, with the columns rounded to one decimal place, e.g. 153.8235 becomes 153.8. After that, I want to convert the DirRes column to integers, while MagRes, PreciAcu and Temperatura remain floats.

How can I achieve this in pandas, or is there another way to do it? I'd appreciate your help.
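In pandas, DataFrame.round(1) rounds every numeric column to one decimal, and astype(int) changes a single column's type. astype(int) truncates (153.8 would become 153), so the sketch rounds to the nearest whole number first. The frame below is rebuilt by hand from the table above; the Excel file name is made up, and to_excel needs an engine such as openpyxl installed, so that call is left as a comment.

```python
import pandas as pd

# The daily table from the post, rebuilt by hand for illustration.
stat_cea = pd.DataFrame(
    {
        "DirRes": [141.0588, 153.8235, 133.8824],
        "Humedad": [100, 99, 99],
        "MagRes": [4.6475, 3.3455, 4.6470],
        "PreciAcu": [30.4, 18.0, 4.0],
        "Temperatura": [21.35, 21.841765, 21.347059],
    },
    index=pd.Index(["2011/07/01", "2011/07/02", "2011/07/03"], name="Fecha"),
)

rounded = stat_cea.round(1)                                 # 153.8235 -> 153.8
rounded["DirRes"] = rounded["DirRes"].round().astype(int)   # whole degrees as integers
print(rounded)
print(rounded.dtypes)

# rounded.to_excel("daily_stats.xlsx")   # requires openpyxl (or another Excel engine)
```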

Be a part of the DaniWeb community
