## glez_b

I have a long list of data structured in the following way:

Date, Time, Temperature, Moisture, Accumulated precipitation

1/01/2011, 00:00, 23, 50, 2,
1/01/2011, 00:15, 22, 45, 1,
1/01/2011, 00:30, 20, 39, 0,
1/01/2011, 01:00, 25, 34, 0,
1/01/2011, 01:15, 23, 50, 0,
.
.
.
.
1/01/2011, 23:45, 22, 40, 0,
.
.
.
.
31/01/2011, 00:00, 23, 45, 0,

How can I get the daily averages of the Temperature and Moisture variables for the 31 days of the month? Does anyone have an idea how I can do this?

## rrashkin 41

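Let's say d is a date string in the format you posted (day/month/year). Since `datetime.date` expects year, month, day, reverse the parts first. Then
`m=datetime.date(*map(int,reversed(d.split("/")))).month`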
returns the month as an integer. You could construct a dictionary that collects the data for each month:
`dctTemp[m].append(newTemp)`

Then you can average the values when you collect all the days.
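
A minimal sketch of that idea, assuming the dates are day/month/year as in the sample rows. It keys on the full date rather than only the month, so each day gets its own average; keying on `.month` instead would give one group per month. The names `rows`, `temps_by_day` and `daily_mean_temp` are illustrative, not from the post.

```python
import datetime
from collections import defaultdict

# Illustrative rows in the question's layout:
# date, time, temperature, moisture, accumulated precipitation
rows = [
    "1/01/2011, 00:00, 23, 50, 2,",
    "1/01/2011, 00:15, 22, 45, 1,",
    "1/01/2011, 00:30, 20, 39, 0,",
    "31/01/2011, 00:00, 23, 45, 0,",
]

temps_by_day = defaultdict(list)
for row in rows:
    fields = [field.strip() for field in row.split(",")]
    # Assumption: the date column is day/month/year
    day = datetime.datetime.strptime(fields[0], "%d/%m/%Y").date()
    temps_by_day[day].append(float(fields[2]))

# Average each day's list once everything has been collected
daily_mean_temp = {
    day: sum(values) / len(values) for day, values in temps_by_day.items()
}

for day in sorted(daily_mean_temp):
    print(day, daily_mean_temp[day])
```

The same pattern works for moisture by collecting `fields[3]` into a second dictionary.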

## slate 241

You define four dictionaries:
sum_temperature: day -> sum of the temperatures on the given day
count_temperature: day -> count of temperature readings on the given day
and likewise for moisture.

You read the file line by line.
You parse each line into day, temperature and moisture. Time and precipitation are not needed for this task. (?)
sum_temperature[day]+=temperature
count_temperature[day]+=1
and you do likewise with moisture.

After the file is parsed, you loop over the sum_temperature keys and values. For each day you get the average temperature:
sum_temperature[day]/count_temperature[day]

Problems left for you:
Increasing sum_temperature the first time. Keycheck or defaultdict.
Empty day average.

Now you can ask yourself, the docs, google, or us how do you program it.
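
One possible way to put the steps above together, as a sketch rather than a finished solution. It assumes the header line has already been skipped and that columns 3 and 4 hold temperature and moisture; the function name `daily_averages` and the sample lines are made up for illustration. `defaultdict` takes care of the first increment, and a day with no rows simply never gets a key, so there is no division by zero.

```python
from collections import defaultdict

def daily_averages(lines):
    """Return {day: (mean temperature, mean moisture)} from comma-separated lines."""
    sum_temperature = defaultdict(float)
    count_temperature = defaultdict(int)
    sum_moisture = defaultdict(float)
    count_moisture = defaultdict(int)

    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 4 or not fields[0]:
            continue  # skip blank or malformed lines
        day = fields[0]  # the date string itself is the dictionary key
        sum_temperature[day] += float(fields[2])
        count_temperature[day] += 1
        sum_moisture[day] += float(fields[3])
        count_moisture[day] += 1

    return {
        day: (sum_temperature[day] / count_temperature[day],
              sum_moisture[day] / count_moisture[day])
        for day in sum_temperature
    }

# Hypothetical sample lines in the question's layout
sample_lines = [
    "1/01/2011, 00:00, 23, 50, 2,",
    "1/01/2011, 00:15, 22, 45, 1,",
    "",
    "31/01/2011, 00:00, 23, 45, 0,",
]
averages = daily_averages(sample_lines)
for day, (mean_temp, mean_moist) in averages.items():
    print(day, mean_temp, mean_moist)
```

To read from a file, the open file object can be passed in place of `sample_lines`. Looking up a missing day with `averages.get(day)` returns `None` instead of raising.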

## woooee 814

and Moisture for the 31 day of the month,

Split the record to isolate the date, then split the date into day, month, and year.

``````
rec="1/01/2011, 00:00, 23, 50, 2,"
# split on commas and strip the surrounding spaces
fields = [field.strip() for field in rec.split(",")]
print(fields[0])
# the sample dates are day/month/year
dd, mm, ccyy = fields[0].split("/")
if 31 == int(dd):
    print("Found the 31st")
else:
    print("%s is Not the 31st" % (dd))
``````

## glez_b

How do I program it?

## slate 241

If you cannot come up with an at least partially working solution, then you should go back to tutorials or easier tasks.

If you cannot work out how to open a file and read it line by line, then I recommend googling it.

## glez_b

I have a file 'path-tracks.csv' containing date, lat, lon, id, humidity and rainfall data recorded every 15 minutes. I need to obtain daily averages from this monthly database. How do I get these daily averages? One more thing: my monthly database is not continuous; there are days with no data at certain times. How can I deal with this and still get daily averages? I am running some tests on a file with only a few rows, which looks like this:

``````
data=np.loadtxt('path-tracks.csv',delimiter=',',skiprows=1,dtype=str)
data
array([['02-05-2004 06:15', '19.7', '-95.2', '1', '45', '-38', 'CCM', '1'],
['02-05-2004 06:30', '19.7', '-94.7', '1', '34', '-48', 'CCM', '2'],
['02-05-2004 06:45', '19.3', '-93.9', '1', '57', '-60', 'CCM', '3'],
['02-05-2004 07:15', '19', '-93.5', '1', '89', '-58', 'CCM', '4'],
['02-05-2004 07:30', '19', '-92.8', '1', '34', '-50', 'CCM', '2'],
['02-05-2004 07:45', '19.2', '-92.6', '1', '23', '-40', 'CCM', '1'],
['02-05-2004 08:15', '19.9', '-93', '1', '10', '-43', 'CCM', '1'],
['02-05-2004 08:45', '20', '-92.8', '1', '50', '-32', 'CCM', '2'],
['30-05-2004 09:15', '23.1', '-100.2', '2', '45', '-45', 'SCME', '3'],
['30-05-2004 09:45', '23.2', '-100', '2', '68', '-56', 'SCME', '1'],
['30-05-2004 10:15', '23.3', '-100', '2', '90', '-48', 'SCME', '2'],
['30-05-2004 10:45', '23.3', '-100.2', '2', '100', '-32', 'SCME',
'1'],
['31-05-2004 03:15', '23.4', '-99', '3', '12', '-36', 'SCM', '1'],
['31-05-2004 03:45', '23.5', '-98.9', '3', '34', '-46', 'SCM', '2'],
['31-05-2004 04:15', '23.6', '-98.7', '3', '56', '-68', 'SCM', '1'],
['31-05-2004 04:45', '23.7', '-98.8', '3', '78', '-30', 'SCM', '1']],
dtype='|S16')
``````

So far I have written a script that gets the average over all the days. My question is: how can I get the average for each day?

``````
import numpy as np
import datetime as dt

temp = []
hume = []
preci = []
date = []
with open('path-tracks.csv','r') as f:
    next(f)  # skip the header row, as skiprows=1 does in the loadtxt call
    for line in f:
        data = line.strip().split(',')
        temp.append(int(data[4]))
        hume.append(int(data[5]))
        preci.append(int(data[7]))
        date.append(dt.datetime.strptime(data[0],'%d-%m-%Y %H:%M'))

# convert the lists to arrays so that max(), sum() and mean() are available
temp = np.array(temp)
hume = np.array(hume)
preci = np.array(preci)
date = np.array(date)
print('max: %s %s' % (temp.max(), hume.max()))
print('suma: %s' % preci.sum())
print('mean: %s %s' % (temp.mean(), hume.mean()))
``````

## slate 241

If you do not have source data for a daily average, then you must decide what to do. This is not a programming question. If you intend to draw a graph of the data, then using the previous day's data comes to mind.

``````
from collections import defaultdict
import datetime

dataset=[['02-05-2004 06:15', '19.7', '-95.2', '1', '45', '-38', 'CCM', '1'],
['02-05-2004 06:30', '19.7', '-94.7', '1', '34', '-48', 'CCM', '2'],
['02-05-2004 06:45', '19.3', '-93.9', '1', '57', '-60', 'CCM', '3'],
['02-05-2004 07:15', '19', '-93.5', '1', '89', '-58', 'CCM', '4'],
['02-05-2004 07:30', '19', '-92.8', '1', '34', '-50', 'CCM', '2'],
['02-05-2004 07:45', '19.2', '-92.6', '1', '23', '-40', 'CCM', '1'],
['02-05-2004 08:15', '19.9', '-93', '1', '10', '-43', 'CCM', '1'],
['02-05-2004 08:45', '20', '-92.8', '1', '50', '-32', 'CCM', '2'],
['30-05-2004 09:15', '23.1', '-100.2', '2', '45', '-45', 'SCME', '3'],
['30-05-2004 09:45', '23.2', '-100', '2', '68', '-56', 'SCME', '1'],
['30-05-2004 10:15', '23.3', '-100', '2', '90', '-48', 'SCME', '2'],
['30-05-2004 10:45', '23.3', '-100.2', '2', '100', '-32', 'SCME',
'1'],
['31-05-2004 03:15', '23.4', '-99', '3', '12', '-36', 'SCM', '1'],
['31-05-2004 03:45', '23.5', '-98.9', '3', '34', '-46', 'SCM', '2'],
['31-05-2004 04:15', '23.6', '-98.7', '3', '56', '-68', 'SCM', '1'],
['31-05-2004 04:45', '23.7', '-98.8', '3', '78', '-30', 'SCM', '1']]

daily_data=defaultdict(list)
for data in dataset:
    temp=int(data[4])
    hume=int(data[5])
    preci=int(data[7])
    time=datetime.datetime.strptime(data[0],'%d-%m-%Y %H:%M')
    day=time.date()
    daily_data[day].append((temp,hume,preci))

minday,maxday=min(daily_data.keys()), max(daily_data.keys())

day=minday
previous_temp_average=None
while day <= maxday:
    if daily_data[day]!=[]:
        # float() avoids integer division under Python 2
        temp_average=sum(data[0] for data in daily_data[day])/float(len(daily_data[day]))
    else:
        # no readings for this day: reuse the previous day's average
        temp_average=previous_temp_average
    print("Average temperature for day: %s is %s" % (day,temp_average))
    previous_temp_average=temp_average
    day=day+datetime.timedelta(days=1)
``````

## glez_b

Thanks for your comment, it was a big help to me. I found another way to compute daily averages of many variables, for example for a database with the following structure:

Fecha, Hora, DirViento, MagViento, Temperatura, Humedad, PreciAcu.

Each column holds daily data recorded every 15 minutes. Here is the code that calculates the sum of the PreciAcu column and the averages of the temperature and humidity columns:

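``````
import numpy as np
import pandas as pd

# `data` is assumed to be a DataFrame already read from the file (loading is not shown)
# set_index(..., inplace=True) returns None, so keep the returned copy instead
index5=data.set_index(['Fecha','Hora'])

# level=0 is 'Fecha', so each group is one day
grouped = index5.groupby(level=0)

# daily mean of temperature and humidity, daily sum of precipitation
print(grouped[['Temperatura','Humedad']].mean())
print(grouped['PreciAcu'].sum())

print('Done............')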
``````

Now I have one more question: I need to convert the DirViento column to radians. How can I do this and add the new column to my data file? In summary, I need to derive 3 columns from the DirViento column.

Specifically:
First, I need to convert each value in the DirViento column to radians (new column called Rad).
Second, I need to get the sine of the Rad column (new column called Sin).
Third, I need to get the cosine of the Rad column (new column called Cos).

How can I achieve this?

## slate 241

You use math.sin, math.cos and math.pi.
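
A short sketch of how those functions fit together, assuming DirViento holds angles in degrees. The helper `direction_columns` and the sample angles are hypothetical; `math.radians(d)` is equivalent to `d * math.pi / 180`.

```python
import math

def direction_columns(degrees):
    """Return three lists (Rad, Sin, Cos) for a list of angles in degrees."""
    rad = [math.radians(d) for d in degrees]
    sin = [math.sin(r) for r in rad]
    cos = [math.cos(r) for r in rad]
    return rad, sin, cos

# Hypothetical sample angles in degrees
dir_viento = [0, 90, 180, 270]
rad_col, sin_col, cos_col = direction_columns(dir_viento)

for d, r, s, c in zip(dir_viento, rad_col, sin_col, cos_col):
    print(d, r, s, c)
```

The three returned lists have the same length and order as the input, so they can be attached to the data as new columns.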

## BearofNH 104

You didn't tell us what was in the DirViento column.
I sincerely hope you are not trying to convert a temperature value in degrees to a temperature value in radians...

## rrashkin 41

I'm guessing it's "wind direction"

## glez_b

BearofNH:

rrashkin:
Indeed, I need to calculate the average wind, Temperature and Moisture and the sum of Precipitation from a monthly database that has daily data recorded every 15 minutes. A sample of the dataframe is as follows:

``````
    Fecha_Hora        DirV  MagV  Temp   HumR  PreciAcu
0   2011/07/01 00:00  318   6.6   21.22  100   1.7
1   2011/07/01 00:15  342   5.5   21.20  100   1.7
2   2011/07/01 00:30  329   6.6   21.15  100   4.8
3   2011/07/01 00:45  279   7.5   21.11  100   4.2
4   2011/07/01 01:00  318   6.0   21.16  100   2.5
5   2011/07/01 01:15  329   7.1   21.13  100   4.0
6   2011/07/01 01:30  300   4.7   21.15  100   1.3
7   2011/07/01 01:45  291   3.1   21.23  100   2.2
8   2011/07/01 02:00  284   7.6   21.29  100   1.3
10  2011/07/02 00:00  281   3.6   21.47  100   3.2
11  2011/07/02 00:15  360   2.7   21.52  100   2.5
12  2011/07/02 00:30  57    1.2   21.53  100   0.0
13  2011/07/02 00:45  300   3.4   21.69  100   0.0
14  2011/07/02 01:00  359   5.9   21.67  100   0.0
15  2011/07/01 01:15  309   1.8   21.65  100   0.0
``````

I'm using pandas to read the data and doing the calculations with numpy and scipy, with a few considerations: if a direction value is 0, replace it with 360; obtain the components u and v; if the component v is 0, the resulting wind direction is 0, otherwise it is atan(u/v); and the magnitude is sqrt(u**2 + v**2).
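
As a rough illustration of the component approach described above, here is a sketch using only the standard library. It assumes u = speed * sin(direction) and v = speed * cos(direction) with the direction in degrees; the meteorological "blowing from" convention may need the signs flipped, which is not covered here. It uses `atan2(u, v)` in place of atan(u/v), which handles v = 0 and picks the correct quadrant. The function name `mean_wind` is hypothetical, and the input lists are assumed to be non-empty and of equal length.

```python
import math

def mean_wind(directions_deg, speeds):
    """Vector-average the wind; return (direction in degrees, magnitude)."""
    n = len(speeds)
    # Mean of the u and v components over all observations
    u = sum(s * math.sin(math.radians(d)) for d, s in zip(directions_deg, speeds)) / n
    v = sum(s * math.cos(math.radians(d)) for d, s in zip(directions_deg, speeds)) / n
    magnitude = math.hypot(u, v)  # sqrt(u**2 + v**2)
    # atan2 returns an angle in (-180, 180]; % 360 maps negatives to positive angles
    direction = math.degrees(math.atan2(u, v)) % 360.0
    return direction, magnitude

# Two observations either side of north average to (approximately) north,
# whereas a plain arithmetic mean of 350 and 10 would give 180.
direction, magnitude = mean_wind([350, 10], [1.0, 1.0])
print(direction, magnitude)
```

Calling this once per day's group of rows would give a daily resultant direction and magnitude.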

To get the daily mean and sum I do the following:

``````
# set_index(..., inplace=True) returns None, so keep the returned copy instead
index5=data.set_index(['Fecha','Hora'])
grouped = index5.groupby(level=0)