Hi,

I think I have a pretty simple question but my google searches are giving me more information than I need. I think my terminology is not clear.

Let's say I'm making a dictionary to hold a datatype very specific to some filetype or data of interest. Let's say I have the following data in a file:

'keith'    100,  50, 'puppy'
'bill'     300,  32, 'cat'

This data is arbitrary, but I will create a dictionary keyed by the first column, with the other entries stored in either a list or a tuple. The first two entries of the list/tuple will be integers and the third will be a string. I want my program to allow only this ordering, both to prevent bad input and to make clear to the user exactly how the data is stored. Basically, I want to tell Python: "Hey, make a dictionary, with each value being a list/tuple of three objects, and reserve space for two ints and a string." How do I do this?

Something like newtype=type(dict(str:list(int,int,str) )

I have no idea how to start this, although I imagine it's very routine. Thanks.

Well, here is how you could use the abstract base classes in the collections module to implement such custom types. Here is a list type whose item types are given. Notice that this class uses 'converters' instead of types, which means that complex conditions on items can be implemented; for example, one could require that an item be a floating-point number between 0 and 1, etc.

#!/usr/bin/env python
# -*-coding: utf8-*-
# Title: customtp.py
# Author: Gribouillis
# Created: 2012-07-06 09:17:00.649267 (isoformat date)
# License: Public Domain
# Use this code freely.

import collections

def custom_list_type(name, converters):

    class tp(collections.MutableSequence):

        def __init__(self, iterable):
            L = list(iterable)
            if len(L) != len(self.converters):
                raise TypeError(("Invalid number of items", len(L)))
            self.data = list(c(x) for c, x in zip(self.converters, L))

        # sized methods
        def __len__(self):
            return len(self.data)

        # iterable methods
        def __iter__(self):
            return iter(self.data)

        # container methods
        def __contains__(self, value):
            return value in self.data

        # sequence methods
        def __getitem__(self, index):
            return self.data[index]

        # mutable sequence methods
        def __setitem__(self, index, value):
            self.data[index] = self.converters[index](value)
        def __delitem__(self, index):
            raise TypeError("Can not remove item")
        def insert(self, index, item):
            raise TypeError("Can not insert item")

        # repr
        def __repr__(self):
            return "{name}({items})".format(name = self.__class__.__name__, items = repr(list(self)))

    tp.converters = tuple(converters)
    tp.__name__ = name
    return tp

if __name__ == "__main__":

    # the first custom type attempts to convert invalid values (for example str to int)

    cuslist = custom_list_type("Mylist", (int, int, str))

    L = cuslist([3, "4", "hello"])
    L[1] = 39.14
    try:
        L[0] = "joe"
    except ValueError:
        print "test joe passed"
    L[2] = 10
    print L

    # the second custom type allows only value of the specified types
    # and raises TypeError on invalid values.

    def check_int(value):
        if isinstance(value, (int, long)):
            return value
        else:
            raise TypeError("int or long value expected")

    def check_str(value):
        if isinstance(value, basestring):
            return value
        else:
            raise TypeError("basestring value expected")

    cuslist = custom_list_type("Mylist", (check_int, check_int, check_str))

    try:
        L = cuslist([3, "4", "hello"])
    except TypeError:
        print "test ctor passed"
        L = cuslist([3, 4, "hello"])
    try:
        L[1] = 39.14
    except TypeError:
        print "test float passed"
    try:
        L[0] = "joe"
    except TypeError:
        print "test joe passed"
    try:
        L[2] = 10
    except TypeError:
        print "test basestring passed"
    L[2] = "hello"
    print L
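
As a hedged illustration of the "floating number between 0 and 1" idea mentioned above, here is a minimal sketch. It is written for Python 3 (where the base classes live in `collections.abc`), and the names `unit_float` and `FixedRecord` are hypothetical; they are not part of the code above.

```python
from collections.abc import Sequence


def unit_float(value):
    """Converter: accept anything float() accepts, but only within [0, 1]."""
    result = float(value)
    if not 0.0 <= result <= 1.0:
        raise ValueError("expected a number between 0 and 1, got %r" % (value,))
    return result


class FixedRecord(Sequence):
    """Hypothetical read-only record whose items pass through converters."""

    converters = (str, unit_float)

    def __init__(self, iterable):
        items = list(iterable)
        if len(items) != len(self.converters):
            raise TypeError("expected %d items, got %d"
                            % (len(self.converters), len(items)))
        # Each item is converted (and validated) by the converter in its slot.
        self._data = tuple(c(x) for c, x in zip(self.converters, items))

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]


record = FixedRecord(["probability", "0.25"])
print(list(record))

try:
    FixedRecord(["probability", 1.5])
except ValueError as exc:
    print("rejected:", exc)
```

Because the converter is just a callable, any condition that can be written as a function can be attached to a slot this way.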

Edited 4 Years Ago by Gribouillis

Comments
Neat!

Damn. That's a great example, thanks!

I am trying to learn it; however, it is a bit involved for me. This is a lot more effort than I had expected. I know that in the Enthought Tool Suite it's fairly straightforward to make custom types, but I guess that's because the classes there already inherit from the HasTraits base class. For example:

List(Int,Int,Str)  

Declared in the constructor, this would create a custom class for such a type. I guess I'll stop taking this for granted.

Also, I guess I should be aware that a custom class and a custom type aren't quite the same thing. Do you think it would be a waste of memory for me to create a class for handling this simple data rather than a new type? Like something with a quick initializer?

Or maybe just a method to validate input, rather than building a new type? What is the most Pythonic way?

It seems to me that

MyList = custom_list_type("Mylist", (check_int, check_int, check_str))

is not more difficult to write than

Mylist = List(Int,Int,Str)

This feature may be part of the Enthought Tool Suite, but it's not included in Python. The pythonic way of thinking is to avoid type checking.

Comments
The pythonic way of thinking is to avoid type checking. +1

What is the most Pythonic way?

Not to typecheck.
Need to know the type of an object?
Let me make a brief argument: No, you don't.
Just use the object as if it was whatever you expect it to be, and handle any errors that result.
http://www.siafoo.net/article/56

@S.Lott
Actually, a manual type check in Python is almost always a waste of time and code.
It's simply a bad practice to write type checking code in Python.
If an inappropriate type was used by some malicious sociopath, Python's ordinary methods will raise an ordinary exception when the type fails to be appropriate.
You write no code, your program still fails with a TypeError.
There are very rare cases when you must determine type at run-time.
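
The "just use it and handle the errors" advice quoted above might look like the following sketch in Python 3. The function `total_score` and the sample records are made up for illustration.

```python
def total_score(record):
    """Add the two numeric fields of an (int, int, str) record.

    No isinstance() checks: the arithmetic itself fails if the data is wrong.
    """
    first, second, _label = record
    return first + second


records = {
    "keith": (100, 50, "puppy"),
    "bill": (300, "32", "cat"),   # deliberately wrong: "32" is a string
}

for name, record in records.items():
    try:
        print(name, total_score(record))
    except TypeError as exc:
        # int + str raises TypeError without any checking code of our own
        print(name, "has bad data:", exc)
```

One caveat: this only catches misuse that actually fails. Two strings in the numeric slots would be concatenated silently, so the approach relies on the operations being strict enough for the data at hand.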

Grib,

I noticed that in plain Python I can do something like:

mylist=[int,int,str]

Is there a way to force inputs to adhere to this without defining an entirely new class? For example, if I say:

a=[1,2,'hi']
b=[1,2,5]

And then have the program recognize that 'a' is valid but 'b' is invalid, without going into the detail of your example? This is what I'm after at the end of the day.

In regard to type checking, I see what you guys are saying and am not trying to validate my types as much as I just want an explicit declaration like

mylist=[int,int,str]

So it's clear exactly how the data fields are stored.

Am I making sense or is this not clear?
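
For what it's worth, the check described here (accept `a`, reject `b`, no new class) can be sketched with a plain function in Python 3. The names `matches` and `record_schema` are hypothetical.

```python
def matches(schema, values):
    """Return True if values has one item per schema entry, each of that type."""
    return len(values) == len(schema) and all(
        isinstance(value, expected) for value, expected in zip(values, schema)
    )


# The explicit declaration of how the fields are stored.
record_schema = [int, int, str]

a = [1, 2, 'hi']
b = [1, 2, 5]

print(matches(record_schema, a))
print(matches(record_schema, b))
```

Note that `isinstance` accepts subclasses, so a `bool` would pass as an `int` here; a stricter check would have to compare `type(value)` directly.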

You assume the correct form and use try...except to check the input. Snippsat gave a good quote about it. Gribouillis's class is neat code, but it is not the Pythonic way. Have you read about the concept of duck typing?

Hey pyTony, do you mean something like this:

mylist=[150, 15, 'billy']

try:
    new=[int(mylist[0]), int(mylist[1]), str(mylist[2])]
except ...

I was thinking maybe this is stringent enough.

Almost, but without performing any otherwise unnecessary operations; instead, protect the first use of the values, for example at the beginning of the inner loop. But if you get the values from a file, they all start out as strings, and then you would only guard the conversion functions, as you wrote in your post (except that all the items would be strings, or one unsplit string).

You can also make your own custom dict.
Custom types are made with classes.

    class custom_dict(dict):    
        # mydict[key]   
        def __getitem__(self,key):
            return self.__dict__[key]

        # mydict[key] = value   
        def __setitem__(self,key,value):
            if len(value) != 3:
                raise TypeError("Wrong numbers of arguments")
            elif isinstance(value[0],int) is False:
                raise TypeError("First argument not an integer")
            elif isinstance(value[1],int) is False:
                raise TypeError("Second argument not an integer")
            elif isinstance(value[2],str) is False:
                raise TypeError("Third argument not an string")
            else:
                self.__dict__[key] = value

        # del mydict[key]       
        def __delitem__(self,key):
            del self.__dict__[key]

        # how it appears on print outs  
        def __repr__(self):
            return self.__dict__.__repr__()

        # normal dict operations
        def keys(self):
            return self.__dict__.keys()

        def values(self):
            return self.__dict__.values()

        def items(self):
            return self.__dict__.items()

    mydict = custom_dict() # instead of mydict = {}
    mydict['kieth'] = (100, 50, 'puppy')
    mydict['bill'] = (300, 32, 'cat')

    print mydict
    print
    mydict['wrong'] = (50,'4','snail')

Edited 4 Years Ago by DrakeMagi: typo

Not bad code, DrakeMagi, but inheriting from dict is not nice here, as you are actually delegating the dictionary operations to the object's __dict__. So either replace dict with object as the base type or, if you do inherit from dict, you do not need so much code:

class CustomDict(dict):    
    # mydict[key] = value   
    def __setitem__(self, key, value):
        if len(value) != 3:
            raise TypeError("Wrong numbers of arguments")
        for argno, argtype in enumerate((int, int, str)):
            if not isinstance(value[argno], argtype):
                raise TypeError("Argument %i not an %s" % (argno, argtype.__name__))
        dict.__setitem__(self, key, value)


mydict = CustomDict() # instead of mydict = {}
mydict['kieth'] = (100, 50, 'puppy')
mydict['bill'] = (300, 32, 'cat')

print mydict
print
mydict['wrong'] = (50,'4','snail')

Also, class names should be in CamelCase.
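
One caveat worth knowing, shown here as a Python 3 sketch: when subclassing the built-in `dict`, the constructor and methods such as `update()` do not call an overridden `__setitem__`, so unchecked values can still get in. Subclassing `collections.UserDict` is one way to route those paths through the check. The class name `CheckedDict` is hypothetical.

```python
from collections import UserDict


class CheckedDict(UserDict):
    """Mapping whose values must be (int, int, str) sequences."""

    field_types = (int, int, str)

    def __setitem__(self, key, value):
        if len(value) != len(self.field_types):
            raise TypeError("Wrong number of arguments")
        for argno, argtype in enumerate(self.field_types):
            if not isinstance(value[argno], argtype):
                raise TypeError("Argument %i not of type %s"
                                % (argno, argtype.__name__))
        super().__setitem__(key, value)


mydict = CheckedDict({'keith': (100, 50, 'puppy')})  # constructor is checked
mydict['bill'] = (300, 32, 'cat')
print(mydict)

try:
    mydict.update({'wrong': (50, '4', 'snail')})      # update() is checked too
except TypeError as exc:
    print("rejected:", exc)
```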

Edited 4 Years Ago by pyTony

Nice! Learned something myself. Didn't know you could do that with types (int, int, str).

I'm not very Pythonic myself.
I guess my class should have been called
class ObjectDict(object):
since I forgot to not inherit from dict.
But your way is better, showing how to make a custom type from an inherited type.

I just like being able to use my . with dicts.

mydict = CustomDict() # instead of mydict = {}
mydict.kieth = (100, 50, 'puppy')
mydict.bill = (300, 32, 'cat')
print mydict

One thing: I thought Python was moving away from % for string formatting in favor of .format(...):
raise TypeError("Argument {0} not an {1}".format(argno, argtype.__name__))

I know I like format better because I don't have to remember the type.

Edited 4 Years Ago by DrakeMagi

To enable accessing the values with '.', you should add this to the code of the object:

def __setattr__(self, key, value):
    self.__dict__[key] = value

__getattr__ = dict.__getitem__

Yes, it is good to prefer the format method, which is newer and more powerful. But with % you can use %s for every type if you do not want to restrict the type.
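
A hedged Python 3 sketch of how dotted access could be wired so that attribute reads and writes both go through the dictionary items themselves. The class name `AttrDict` is hypothetical, and the value validation is omitted to keep it short.

```python
class AttrDict(dict):
    """dict whose items can also be read and written as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            # hasattr() and similar tools expect AttributeError, not KeyError.
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store in the mapping itself rather than in the instance __dict__.
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


mydict = AttrDict()
mydict.keith = (100, 50, 'puppy')
mydict['bill'] = (300, 32, 'cat')

print(mydict.bill)
print(mydict['keith'])
print(mydict)
```

A limitation of this design: keys that collide with existing dict attributes (for example `items` or `keys`) cannot be read with '.', because normal attribute lookup finds the method first.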

def __setattr__(self, key, value):
    self.__dict__[key] = value

__getattr__ = dict.__getitem__

Hey pyTony, this post makes a lot more sense to me now after the recent discussions we've had. Does enabling dict value access with '.' using the __setattr__ method add a lot of extra overhead to the code, or not really? It seems a nice convenience for the users, but is there a cost to it?
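
One way to answer the overhead question is to measure it. The sketch below (Python 3, using the standard `timeit` module) compares item access with attribute access on a hypothetical `AttrDict` class; it makes no claim about what the numbers will be on a given machine.

```python
import timeit


class AttrDict(dict):
    """Hypothetical dict subclass with '.' reads routed to the items."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


d = AttrDict(keith=(100, 50, 'puppy'))

# Time the same number of lookups of each kind; lower is faster.
item_time = timeit.timeit("d['keith']", globals={"d": d}, number=100000)
attr_time = timeit.timeit("d.keith", globals={"d": d}, number=100000)

print("item access:      %.4f s" % item_time)
print("attribute access: %.4f s" % attr_time)
```

Since `__getattr__` is only invoked after normal attribute lookup has failed, the dotted form does some extra work per read; whether that matters depends on how often the access happens.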

Here is a related example using the Python dictionary:

# to simplify things create a comma separated value (CSV) file
data = """\
keith,100,50,puppy
bill,300,32,cat
frank,200,9x9,mouse
"""

fname = "mydata.csv"
# write the test data file
with open(fname, "w") as fout:
    fout.write(data)

# read the data in line by line
data_dict = {}
for ix, line in enumerate(open(fname)):
    # remove trailing newline char
    line = line.rstrip()
    # create a list by splitting at the commas
    line_list = line.split(",")
    #print(line_list)  # test
    try:
        # create dictionary items
        key = line_list[0]
        int1 = int(line_list[1])
        int2 = int(line_list[2])
        str1 = line_list[3]
        item_list = [int1, int2, str1]
        # form the dictionary elements
        data_dict[key] = item_list
    except ValueError:
        print(line)
        print("data has a value error in line %d" % (ix+1))
        print("this data not added to dictionary")

print('-'*40)
print(data_dict)  # test

Thanks for sharing HiHe.

Your example is the usual way I go about doing things. What motivated me to stray from that is really the idea that I am going to be sharing this code with a larger community (biopython I hope). So the question becomes, what is the most general and robust way to represent all of this data? If I was guaranteed to read it in from the same file type for eternity, then I wouldn't even bother making a custom class. But what happens when you also want the users to be able to enter data manually? Or enter sparse fields? And if the file format changes in the future, will your code still work without writing a new file I/O handler?

With all the help I've gotten on the forum as well as some other tips, I think I've come up with a preferred way to do it reliably and robustly. At the end of the day, what you need is a class to manage each record (or row in the data) and a dictionary class that is made to handle multiple records. Depending on if you want mutability and a certain level of type checking, you need to add custom behaviors at both levels.

I will turn this into a code snippet based on namedtuples later.
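
Pending that snippet, here is one hedged guess at what the two-level design could look like in Python 3: a `namedtuple` subclass for each record and a dictionary class holding many records. All names (`Record`, `RecordDict`, and the field names `first`, `second`, `label`) are invented for illustration; the thread does not say what the columns mean.

```python
from collections import UserDict, namedtuple


class Record(namedtuple("Record", ["first", "second", "label"])):
    """One immutable row: two ints and a string (field names are made up)."""

    __slots__ = ()

    def __new__(cls, first, second, label):
        # Convert on the way in; bad input raises ValueError or TypeError.
        return super().__new__(cls, int(first), int(second), str(label))


class RecordDict(UserDict):
    """Maps a key to a Record, converting plain sequences on assignment."""

    def __setitem__(self, key, value):
        if not isinstance(value, Record):
            value = Record(*value)
        super().__setitem__(key, value)


records = RecordDict()
records['keith'] = (100, 50, 'puppy')
records['bill'] = ('300', '32', 'cat')   # strings from a file are converted

print(records['bill'].first + records['keith'].first)
print(records['keith'].label)

try:
    records['frank'] = (200, '9x9', 'mouse')
except ValueError as exc:
    print("rejected:", exc)
```

This version converts rather than strictly type-checks, so `int()` will, for example, silently truncate a float; swapping in stricter checking functions is a design choice left open here.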
