Python

I have been given a project in my Python class that reads in a file, and in the file, 32 attributes are given to determine whether a lump is a benign or a malignant tumor. In my trainClassifier function, I have to find each attribute's total for both malignant and benign records, and then I have to find the averages of the two. I am having trouble working out how to start this function. I can't figure out how to index the list to choose whether a record is malignant or benign. The attribute names are on the first line of the file, and I have to find the averages of the first 10 attributes. However, I also have to use the last attribute, which determines whether the record is benign or malignant.

2. Train a simple classifier.

A classifier is a model of the problem such that when we’re given a new record we can compare the new record to the model in order to predict the class of the new record. We use the training set to build up this model. Our model is very simple. For all malignant records, for each attribute, we calculate the average value of each attribute. For all benign records, for each attribute, we calculate the average value of each attribute. To create the model, we then calculate the midpoint of these averages for each attribute. Then to classify new records, if the majority (5 or more) of the new record’s attributes are above their respective midpoints, then the new record is predicted to be malignant, otherwise (4 or less), benign.

There are many different methods in the areas of Artificial Intelligence and Machine Learning that have been used by computer programmers to make predictions. Most of these methods rely heavily on statistics-based methods that use computers to crunch a lot of numbers. We’re more interested in developing our programming skills than delving deep into statistics so we are going to use a very simple method to make predictions.
That is to say, our classifier is probably not statistically sound, but it serves as a good programming exercise as well as a good introduction to the problem of predicting classes. Furthermore, in the real world, we commonly face lots of issues that crop up with missing data, noisy data, or other problems. We don’t face any of these issues in this assignment. It is safe to assume that all of the data is there and correct.

I guess I don't know how to use the first function to get the averages in the second. I need to tell whether a record is benign or malignant from the 32nd attribute of the actual file. The file looks like this:

    radius length ..... class
    1.2242 .45 M
    .24252 .34 B
    .242556 .353 M

I don't know how to grab the information needed from the 32nd attribute (class) to add up all the attributes. This is confusing, I know. I'm sorry, but if anyone could help me that'd be great. Ask if you need something explained better. (I'm sure you might.)

    # Tasks
    # 1 - Create a training set
    # 2 - Train a 'dumb' rule-based classifier
    # 3 - Create a test set
    # 4 - Apply rule-based classifier to test set
    # 5 - Report accuracy of classifier

    attributeList = []
    attributeList.append("ID")
    attributeList.append("radius")
    attributeList.append("texture")
    attributeList.append("perimeter")
    attributeList.append("area")
    attributeList.append("smoothness")
    attributeList.append("compactness")
    attributeList.append("concavity")
    attributeList.append("concave")
    attributeList.append("symmetry")
    attributeList.append("fractal")
    attributeList.append("class")

    ##########################################################
    # 1. Create a training set
    #  - Read in file
    #  - Create a dictionary for each line
    #  - Add this dictionary to a list
    #
    # makeTrainingSet
    # parameters:
    #  - filename: name of the data file containing the training data records
    #
    # returns: trainingSet: a list of training records (each record is a dict,
    #          that contains attribute values for that record.)
    ##########################################################
    def makeTrainingSet(filename):
        trainingSet = []
        # Read in file
        for line in open(filename,'r'):
            if '#' in line:
                continue
            line = line.strip('\n')
            linelist = line.split(',')
            # Create a dictionary for the line
            # ( assigns each attribute of the record (each item in the linelist)
            #   to an element of the dictionary, using the constant keys )
            record = {}
            for i in range(len(attributeList)):
                if(i==11): #class label is a character, not a float
                    record[attributeList[i]] = linelist[31].strip()
                else:
                    record[attributeList[i]] = float(linelist[i])
            # Add the dictionary to a list
            trainingSet.append(record)
        return trainingSet

    ##########################################################
    # 2. Train 'Dumb' Classifier
    # trainClassifier
    # parameters:
    #  - trainingSet: a list of training records (each record is a dict,
    #    that contains attribute values for that record.)
    #
    # returns: a dictionary of midpoints between the averages of each attribute's
    #          values for benign and malignant tumors
    ##########################################################
    def trainClassifier(trainingSet):
        # A. initialize dictionaries for sums of attribute values
        #    and initialize record counts
        return classifier
        # B. process each record in the training set
        #    calculating sums and counts as we go
        # C. calculate averages
        # D. calculate midpoints for our classifier
        # return classifier
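Steps A to D in the trainClassifier stub could be filled in roughly as follows. This is only a sketch, not the assignment's reference solution. It assumes the records are the dictionaries built by makeTrainingSet, that the class labels are exactly "M" and "B" as in the sample file, and that the training set contains at least one record of each class. The ATTRIBUTES list is a hypothetical helper that repeats the ten names from attributeList without "ID" and "class".

```python
# Hypothetical helper: the ten measured attributes from attributeList,
# leaving out "ID" and "class".
ATTRIBUTES = ["radius", "texture", "perimeter", "area", "smoothness",
              "compactness", "concavity", "concave", "symmetry", "fractal"]


def trainClassifier(trainingSet):
    # A. one running sum per attribute for each class, plus record counts
    #    (assumes the only labels are "M" and "B")
    sums = {"M": dict.fromkeys(ATTRIBUTES, 0.0),
            "B": dict.fromkeys(ATTRIBUTES, 0.0)}
    counts = {"M": 0, "B": 0}

    # B. the "class" entry of each record selects which sums to update
    for record in trainingSet:
        label = record["class"]
        counts[label] += 1
        for attr in ATTRIBUTES:
            sums[label][attr] += record[attr]

    # C. average per class, D. midpoint of the two averages
    #    (assumes at least one record of each class, else division by zero)
    classifier = {}
    for attr in ATTRIBUTES:
        malignantAvg = sums["M"][attr] / counts["M"]
        benignAvg = sums["B"][attr] / counts["B"]
        classifier[attr] = (malignantAvg + benignAvg) / 2.0
    return classifier
```

Because makeTrainingSet already stores the 32nd column under the "class" key, there is no need to index back into the file: record["class"] says which set of sums a record belongs to.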
Re: Python

Wrap code in code tags:

[code=python]
# Code here
[/code]

Also, read the forum rules about homework, and about asking questions in general.
Re: Python

So, did you ever get started on this? Show us some code and maybe we can help you out.
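For tasks 4 and 5, the rule quoted in the first post (a record is predicted malignant when 5 or more of its attributes are above their midpoints, otherwise benign) might be sketched like this. The function names classifyRecord and reportAccuracy are hypothetical and not part of the assignment. The sketch assumes the classifier is a dictionary mapping each attribute name to its midpoint, and that every test record carries its true label under "class" as "M" or "B".

```python
def classifyRecord(record, classifier):
    # Count the attributes whose value is strictly above the midpoint.
    above = 0
    for attr, midpoint in classifier.items():
        if record[attr] > midpoint:
            above += 1
    # Threshold taken from the assignment text: 5 or more means malignant.
    if above >= 5:
        return "M"
    return "B"


def reportAccuracy(testSet, classifier):
    # Fraction of test records whose predicted class matches the stored label.
    # Assumes testSet is not empty.
    correct = 0
    for record in testSet:
        if classifyRecord(record, classifier) == record["class"]:
            correct += 1
    return correct / float(len(testSet))
```

The test set can be read with the same makeTrainingSet function, since a test file has the same layout as the training file.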
©2003 - 2009 DaniWeb® LLC