Hi guys, I'm using Python 2.7.3.

This is my txt file:

TimeStamp,Irradiance
21/7/2014 0:00,0.66
21/7/2014 0:00,0.71
21/7/2014 0:00,0.65
21/7/2014 0:00,0.67
21/7/2014 0:01,0.58
21/7/2014 0:01,0.54
21/7/2014 0:01,0.63
21/7/2014 0:01,0.65
21/7/2014 0:02,0.64
21/7/2014 0:02,0.63
21/7/2014 0:02,0.63
21/7/2014 0:02,0.64
.
.
. 
.
22/7/2014 23:57,0.53
22/7/2014 23:58,0.69
22/7/2014 23:58,0.61
22/7/2014 23:58,0.65
22/7/2014 23:58,0.59
22/7/2014 23:59,0.63
22/7/2014 23:59,0.67
22/7/2014 23:59,0.68
22/7/2014 23:59,0.58

How can I find the average, mode and maximum of the data in my txt file and display them?

My program reads the data above, lets the user choose a start time and an end time, and then plots the selected range (this part is already done).

Now I would like the program to also display the maximum (highest) value, the average and the mode of the data within the time range selected by the user.

Edited 2 Years Ago by dumicom

I suggest using pandas DataFrames and their methods for statistics.

# -*- coding: utf-8 -*-

import pandas as pd

df = pd.read_table(
    'data.csv',
    sep=',',
    header=0,
    parse_dates = [0]
)

print(df)
m = df.mean()
print(type(m))
print(m)
print(float(m['Irradiance']))

""" my output -->
0  2014-07-21 00:00:00        0.66
1  2014-07-21 00:00:00        0.71
2  2014-07-21 00:00:00        0.65
3  2014-07-21 00:00:00        0.67
4  2014-07-21 00:01:00        0.58
5  2014-07-21 00:01:00        0.54
6  2014-07-21 00:01:00        0.63
7  2014-07-21 00:01:00        0.65
8  2014-07-21 00:02:00        0.64
9  2014-07-21 00:02:00        0.63
10 2014-07-21 00:02:00        0.63
11 2014-07-21 00:02:00        0.64
12 2014-07-22 23:57:00        0.53
13 2014-07-22 23:58:00        0.69
14 2014-07-22 23:58:00        0.61
15 2014-07-22 23:58:00        0.65
16 2014-07-22 23:58:00        0.59
17 2014-07-22 23:59:00        0.63
18 2014-07-22 23:59:00        0.67
19 2014-07-22 23:59:00        0.68
20 2014-07-22 23:59:00        0.58

[21 rows x 2 columns]
<class 'pandas.core.series.Series'>
Irradiance    0.631429
dtype: float64
0.631428571429
"""

Hi Gribouillis, thanks for the reply. What I want is for the program to display the mode/average of the selected data only after the user has entered their desired times.

I tried to put it into my code and would like to know if this is the right way to do it:

from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

x = []
y = []
t = []

fig = plt.figure()
rect = fig.patch
rect.set_facecolor('#31312e')
readFile = open('data.txt', 'r')
sepFile = readFile.read().split('\n')
readFile.close()

startTime = raw_input('please enter start time in format like 21/7/2014 0:00 :')
endTime   = raw_input('please enter end time in format like 22/7/2014 23:57 :') 

# startTime = '21/7/2014 0:02'
# endTime = '22/7/2014 23:58'
startTime = datetime.strptime(startTime, '%d/%m/%Y %H:%M')
endTime = datetime.strptime(endTime, '%d/%m/%Y %H:%M')

df = pd.read_table( 
# how can i make df = startTime & endTime since we want to know the user input before being able to calculate the average/mode/max/min #
'data.csv',
sep=',',
header=0,
parse_dates = [0]
)

print(df) 
m = df.mean()
print(type(m))
print(m)
print float(m['Irradiance'])

for idx, plotPair in enumerate(sepFile):
    if plotPair in '. ':
        # skip elision lines ('.', '. ') and empty lines
        continue
    if idx > 0:  # skip the header line (index 0)
        xAndY = plotPair.split(',')
        time_string = xAndY[0]
        time_string1 = datetime.strptime(time_string, '%d/%m/%Y %H:%M')
        if startTime<=time_string1 <=endTime:
            t.append(time_string1)
            y.append(float(xAndY[1]))

ax1 = fig.add_subplot(1, 1, 1, axisbg='white')
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%Y %H:%M'))
ax1.plot(t, y, 'c', linewidth=3.3)

plt.title('IRRADIANCE')
plt.xlabel('TIME')
fig.autofmt_xdate(rotation=45)
fig.tight_layout()
plt.show()


Here is what I found

# -*- coding: utf-8 -*-

import datetime as dt
import pandas as pd

df = pd.read_table(
    'data.csv',
    sep=',',
    header=0,
    parse_dates = [0],
    index_col = 0,
)

print(df)

startTime = '21/7/2014 0:02'
endTime = '22/7/2014 23:58'

startTime = dt.datetime.strptime(startTime, '%d/%m/%Y %H:%M')
endTime = dt.datetime.strptime(endTime, '%d/%m/%Y %H:%M')
df2 = df[startTime:endTime]
print(df2)
print(df2.describe())


""" my output -->
                     Irradiance
TimeStamp                      
2014-07-21 00:00:00        0.66
2014-07-21 00:00:00        0.71
2014-07-21 00:00:00        0.65
2014-07-21 00:00:00        0.67
2014-07-21 00:01:00        0.58
2014-07-21 00:01:00        0.54
2014-07-21 00:01:00        0.63
2014-07-21 00:01:00        0.65
2014-07-21 00:02:00        0.64
2014-07-21 00:02:00        0.63
2014-07-21 00:02:00        0.63
2014-07-21 00:02:00        0.64
2014-07-22 23:57:00        0.53
2014-07-22 23:58:00        0.69
2014-07-22 23:58:00        0.61
2014-07-22 23:58:00        0.65
2014-07-22 23:58:00        0.59
2014-07-22 23:59:00        0.63
2014-07-22 23:59:00        0.67
2014-07-22 23:59:00        0.68
2014-07-22 23:59:00        0.58

[21 rows x 1 columns]
                     Irradiance
TimeStamp                      
2014-07-21 00:02:00        0.64
2014-07-21 00:02:00        0.63
2014-07-21 00:02:00        0.63
2014-07-21 00:02:00        0.64
2014-07-22 23:57:00        0.53
2014-07-22 23:58:00        0.69
2014-07-22 23:58:00        0.61
2014-07-22 23:58:00        0.65
2014-07-22 23:58:00        0.59

[9 rows x 1 columns]
       Irradiance
count    9.000000
mean     0.623333
std      0.044441
min      0.530000
25%      0.610000
50%      0.630000
75%      0.640000
max      0.690000

[8 rows x 1 columns]
"""

You should explore the capabilities of the pandas DataFrame. It offers many tools for your problem.
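
As a rough sketch of how the maximum, mean and mode could be read off the selected range: the block below uses an inline sample (rows copied from the question) in place of data.csv, and the names SAMPLE, FMT and selected are only illustrative. It parses the timestamps with an explicit day-first format, which is my assumption about the file, so that a date like 1/7/2014 cannot be read as January 7. It also selects rows with a boolean mask rather than a slice, so it does not depend on the index being sorted.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import io
from datetime import datetime

import pandas as pd

# Inline stand-in for data.csv; rows copied from the question.
SAMPLE = u"""TimeStamp,Irradiance
21/7/2014 0:01,0.65
21/7/2014 0:02,0.64
21/7/2014 0:02,0.63
21/7/2014 0:02,0.63
21/7/2014 0:02,0.64
22/7/2014 23:57,0.53
22/7/2014 23:58,0.69
22/7/2014 23:58,0.61
22/7/2014 23:58,0.65
22/7/2014 23:58,0.59
22/7/2014 23:59,0.63
"""

FMT = '%d/%m/%Y %H:%M'  # assumed layout: day first, then month

df = pd.read_csv(io.StringIO(SAMPLE))
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'], format=FMT)
df = df.set_index('TimeStamp')

# In the real program these two strings would come from raw_input().
startTime = datetime.strptime('21/7/2014 0:02', FMT)
endTime = datetime.strptime('22/7/2014 23:58', FMT)

# Keep only the rows whose timestamp lies in [startTime, endTime].
mask = (df.index >= startTime) & (df.index <= endTime)
selected = df.loc[mask, 'Irradiance']

print('count:', len(selected))
print('max  :', selected.max())
print('mean :', round(selected.mean(), 4))
# mode() returns a Series because several values can tie for most frequent.
print('mode :', list(selected.mode()))
```

With this sample, two values are tied as the most frequent, which is why the mode is printed as a list rather than a single number.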

Hi Gribouillis, thanks a lot for your help, but I can't seem to make my program work when I merge the above code with mine. Can you help me take a look?

The problem I have is with these few lines of code:

df = pd.read_table(
    'data.csv',
    sep=',',
    header=0,
    parse_dates = [0],
    index_col = 0,
)

I already have the readFile code to read the comma-delimited txt file, so I never put these lines in. How can I do it without the code above?

from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
x = []
y = []
t = []
fig = plt.figure()
rect = fig.patch
rect.set_facecolor('#31312e')
readFile = open('data.txt', 'r')
sepFile = readFile.read().split('\n')
readFile.close()

startTime = raw_input('please enter start time in format like 21/7/2014 0:00 :')
endTime   = raw_input('please enter end time in format like 22/7/2014 23:57 :') 

# startTime = '21/7/2014 0:02'
# endTime = '22/7/2014 23:58'
startTime = datetime.strptime(startTime, '%d/%m/%Y %H:%M')
endTime = datetime.strptime(endTime, '%d/%m/%Y %H:%M')


for idx, plotPair in enumerate(sepFile):
    if plotPair in '. ':
        # skip elision lines ('.', '. ') and empty lines
        continue
    if idx > 0:  # skip the header line (index 0)
        xAndY = plotPair.split(',')
        time_string = xAndY[0]
        time_string1 = datetime.strptime(time_string, '%d/%m/%Y %H:%M')
        if startTime<=time_string1 <=endTime:
            t.append(time_string1)
            y.append(float(xAndY[1]))
ax1 = fig.add_subplot(1, 1, 1, axisbg='white')
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%Y %H:%M'))
ax1.plot(t, y, 'c', linewidth=3.3)
plt.title('IRRADIANCE')
plt.xlabel('TIME')
fig.autofmt_xdate(rotation=45)
fig.tight_layout()
plt.show()


You must not copy and paste code verbatim without understanding it. If your data file is named data.txt instead of data.csv, for example, you must adapt the code. The question here is whether you want to use a pandas DataFrame, which already has built-in methods for statistics, or not (it is very similar to a data frame in R, if you know R). If you don't, you'll have to write your own functions.
For example, you have two lists t and y, holding the timestamps and the data respectively. Let's hope the t's are ordered. You can start by writing code to extract the sublists of t and y for which the timestamp is between startTime and endTime.
Then we could write code to calculate the mean and other statistical functions on these sublists.
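
A minimal sketch of that approach without pandas could look like the following. The helper names select_range, mean and modes are made up for illustration, and the rows list (copied from the question) only stands in for the t and y lists that your loop already builds. The filter compares each timestamp on its own, so it works even if the t's turn out not to be ordered.

```python
from __future__ import division, print_function

from collections import Counter
from datetime import datetime

FMT = '%d/%m/%Y %H:%M'


def select_range(times, values, start, end):
    """Return the values whose timestamp lies in [start, end]."""
    return [v for t, v in zip(times, values) if start <= t <= end]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def modes(values):
    """Return every value tied for the highest count, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


# Stand-in for the t and y lists built by the reading loop.
rows = [
    ('21/7/2014 0:01', 0.65), ('21/7/2014 0:02', 0.64),
    ('21/7/2014 0:02', 0.63), ('21/7/2014 0:02', 0.63),
    ('21/7/2014 0:02', 0.64), ('22/7/2014 23:57', 0.53),
    ('22/7/2014 23:58', 0.69), ('22/7/2014 23:58', 0.61),
    ('22/7/2014 23:58', 0.65), ('22/7/2014 23:58', 0.59),
    ('22/7/2014 23:59', 0.63),
]
t = [datetime.strptime(s, FMT) for s, _ in rows]
y = [v for _, v in rows]

startTime = datetime.strptime('21/7/2014 0:02', FMT)
endTime = datetime.strptime('22/7/2014 23:58', FMT)
chosen = select_range(t, y, startTime, endTime)

if chosen:
    print('max :', max(chosen))
    print('mean:', round(mean(chosen), 4))
    print('mode:', modes(chosen))
else:
    # Guard against an empty selection, where max() and mean() would fail.
    print('no readings in that range')
```

Since the program in this thread already filters t and y inside its reading loop, the select_range step may be unnecessary there, and max(y), mean(y) and modes(y) could be called on the filtered list directly. If pandas is acceptable after all, wrapping the same lists as pd.Series(y, index=t) and calling describe() should give similar figures.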

Edited 2 Years Ago by Gribouillis
