I'm trying to create a vertical histogram using only built-in modules. I understand the current histogram function isn't truly a histogram (and the code is probably very ugly), but I'm totally lost on how to create a vertical histogram.

import itertools

def histogram(s):
	print("Histogram:")
	print("%s %7s %12s" % ( "No.", "Value", "Histogram" ))
	for num, count in (s):
		print("%3d %7d %s" % (num, count,  "*" * count))

text = "Sample text for this example."
word_list = []
word_seq = []

text = text.strip()

for punc in ".,;:!?'-&[]()" + '"' + '/':
    text = text.replace(punc, "")

words = text.lower().split()

for word in words:
    word_count = len(word)
    word_list.append(word_count)

word_list.sort()

for key, iter in itertools.groupby(word_list):
    word_seq.append((key, len(list(iter))))
    
histogram(word_seq)

Any help is greatly appreciated.

Hint: think of a two-dimensional array of asterisks. There is nothing sacred about traversing such an array in row-major order.
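One way to read the hint, as a rough sketch only: build the horizontal bars as the rows of a grid, then walk that grid column by column. The names `vertical_lines` and `bars` are made up for illustration, and `itertools.zip_longest` assumes Python 3.

```python
from itertools import zip_longest

def vertical_lines(counts):
    """Return the lines of a vertical bar chart for a list of counts."""
    # Each row of the grid is one horizontal bar, e.g. 3 -> ['*', '*', '*'].
    bars = [['*'] * count for count in counts]
    # zip_longest walks the grid column by column, padding short bars with spaces.
    columns = list(zip_longest(*bars, fillvalue=' '))
    # Column 0 holds the bottom of every bar, so emit the columns in reverse order.
    return [''.join(column) for column in reversed(columns)]

for line in vertical_lines([1, 3, 2]):
    print(line)
```

The grid is still stored row by row; only the order in which it is read changes.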

Here is my latest with some help I received earlier...

One problem is that I cannot figure out why I receive a TypeError no matter how I define tuple_dict. Also, am I even on the right track? I added printing the length and count before the histogram.

import itertools

def histogram(s):
	tuple_dict = {}
	for i in sorted(s.values()):
		t = ()
		for n in s.keys():
			if s[n] >= i:
				t = t + tuple('*')
			else:
				t = t + tuple(' ')
			tuple_dict[i] = t
	return tuple_dict

text = "This is a sample text for this example to work."
word_list = []
word_seq = []

text = text.strip()

for punc in ".,;:!?'-&[]()" + '"' + '/':
	text = text.replace(punc, "")

words = text.lower().split()

for word in words:
    word_count = len(word)
    word_list.append(word_count)

word_list.sort()

for key, iter in itertools.groupby(word_list):
    word_seq.append((key, len(list(iter))))
    
print("%s %7s" % ( "Length", "Count"))

for num, count in word_seq:
	print("%5d %10d" % (num, count))

d = {}

for a, b in word_seq:
	d.setdefault(a, []).append(b)
	
histogram(d)

You are pretty much going around in circles, it seems to me, so: no, not on the right track (in my opinion). The problem you have on line 12 is that i is a list holding a single integer. Lists are mutable, so they cannot be hashed and cannot be used as dictionary keys. Convert i to a tuple, or take its data out of the list, and the code is acceptable to Python, but it still doesn't do anything useful. If you post the entire error message, including the line number and the full text, it is easier to help. Better yet, if you actually think about the error message "with fresh eyes" (that's the hardest part), you are likely to be able to figure it out for yourself: the people who write these messages are doing their best to be helpful.
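To make the point about hashing concrete, here is a small sketch (the helper name `try_as_key` is invented for this example). A list is rejected as a dictionary key, while a tuple built from the same data is accepted.

```python
def try_as_key(key):
    """Try to use key as a dictionary key and report what happened."""
    table = {}
    try:
        table[key] = 'ok'
    except TypeError as err:
        # Lists are mutable, so Python refuses to hash them.
        return 'TypeError: %s' % err
    return 'accepted'

print(try_as_key([2]))         # a list holding one integer
print(try_as_key(tuple([2])))  # the same data as a tuple
```

The exact wording of the error message differs between Python versions, but the exception type is TypeError in each case.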

Thank you sir. I'm guessing it's obvious that I'm stabbing in the dark here. I'm honestly not trying to ask for the answer, but do you know of any code I could look at to get a better understanding?

The error messages came from my attempts to define tuple_dict as a dict, a list, or a tuple. Each traceback pointed at line 45 (the function call) and line 12. The line 12 errors were as follows when changing line 4 to each:

tuple_dict = () receives "TypeError: 'tuple' object does not support item assignment"

tuple_dict = [] receives "TypeError: list indices must be integers, not list"

tuple_dict = {} receives "TypeError: unhashable type: 'list'"

I would suggest that you print the values, and note also that a list cannot be used as a dictionary key. If that is indeed what you want to do, convert it to a tuple first.

def histogram(s):
    tuple_dict = {}
    ## don't use "i", "l", or "O" as variable names as they can look like numbers
    for val in sorted(s.values()):
        print(type(val), val)
        ## comment the rest of the code until you get this problem solved

Also there is a problem with the last line of your code.

Consider using the class collections.Counter to count how many words there are of each length. Something like

from collections import Counter

#... get clean word list in words
counter = Counter()
for word in words:
    counter[len(word)] += 1

Now that you have the data, you need to display it. Think of the display as a two-dimensional array. The code you have that works builds that array like this: the first column holds the word lengths, and each subsequent column holds an asterisk in a length's row if there were at least that many words of that length. What you want this time is exactly the same with 'column' and 'row' swapped. To do that, you can either build the array that way, or you can just access the elements in the order you need them.
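The second option, reading the counts in the order needed without building an array first, could look roughly like this. It is a sketch under my own assumptions: the function name `vertical_histogram` is invented, and the columns are labelled with only the last digit of each length to keep them one character wide.

```python
from collections import Counter

def vertical_histogram(words):
    """Return the lines of a vertical histogram of word lengths."""
    counter = Counter(len(word) for word in words)
    if not counter:
        return []
    lengths = range(1, max(counter) + 1)
    lines = []
    # One output row per count level, tallest level first.
    for level in range(max(counter.values()), 0, -1):
        # A length gets an asterisk on this row if it occurred at least `level` times.
        lines.append(''.join('*' if counter[n] >= level else ' ' for n in lengths))
    # Label the columns with the last digit of each word length.
    lines.append(''.join(str(n % 10) for n in lengths))
    return lines

words = "this is a sample text for this example to work".split()
for line in vertical_histogram(words):
    print(line)
```

A Counter returns 0 for a length that never occurred, so gaps such as length 5 in this sample need no special handling.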

>>> histogram(([1,56],[2,67]))
Histogram:
No.   Value    Histogram
  1      56 ********************************************************
  2      67 *******************************************************************
>>>

It looks like you have quite a good start. You only need to get the scaling right by finding the maximum count value before printing.
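As a sketch of that scaling idea (the function name `scaled_bars` and the `width` parameter are my own, not from the code above): find the largest count first, then draw every bar relative to it so the longest bar is exactly `width` characters.

```python
def scaled_bars(pairs, width=40):
    """Return (number, count, bar) rows with bars scaled to at most `width` stars."""
    # The extra 1 avoids dividing by zero when every count is 0 or pairs is empty.
    largest = max([count for _, count in pairs] + [1])
    rows = []
    for num, count in pairs:
        stars = count * width // largest
        # Keep at least one star so small nonzero counts stay visible.
        if count > 0 and stars == 0:
            stars = 1
        rows.append((num, count, '*' * stars))
    return rows

for num, count, bar in scaled_bars([(1, 56), (2, 67)], width=20):
    print("%3d %7d %s" % (num, count, bar))
```

Integer division rounds bars down, so two counts that are close together can end up with the same bar length.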

Here's what I have so far. Unfortunately, it's not producing the desired results. The final output should look like this: http://dev.collabshot.com/show/723400/# NOTE: I have changed the text variable here because the actual version reads the text from a file.

import itertools

def histo(dict_words):
	x_max = max(dict_words.keys())
	y_max = max(dict_words.values())
	
	for j in range(400, 0, -20 ):
		print(j, '%3s' % '-|')
		s = '%9s' % '|'
		for i in range(0, x_max):
			if i in dict_words.keys() and dict_words[i] >= j:
				s += '*'
			else:
				s += ' '
		print(s)
		
	for i in range(1, x_max):
		s += '-+-'
	print(s)
	
	for i in range(1, x_max):
		s += ' %d ' % i
	print(s)


text = "This is a sample text for this example to work."
word_list = []
word_seq = []
dicta = {}

text = text.strip()

for punc in ".,;:!?'-&[]()" + '"' + '/':
	text = text.replace(punc, "")

words = text.lower().split()

for word in words:
    word_count = len(word)
    word_list.append(word_count)

word_list.sort()

for key, iter in itertools.groupby(word_list):
    word_seq.append((key, len(list(iter))))
    
print("%s %7s" % ( "Length", "Count"))

for num, count in word_seq:
	print("%5d %10d" % (num, count))

dicta = dict(word_seq)

print(histo(dicta))

I think you are trying to make this too difficult. Format a string to print the line as you want it, and then print the string.

import itertools

def histo(dict_words):
	
    ## length of words from 10 letters to one letter
    for length in range(10, 0, -1 ):
        this_line = '%3d-|%9s' % (length, '-|')  
 
        ## allow for the possibility that the dictionary may 
        ## not contain all lengths from 1-10
        if length in dict_words:
            ## multiply '*' by the number of times this length was found
            this_line += '*' * dict_words[length]

        print(this_line)

    print('%3d-|       -|----0----1----1----2' % (0))
    print('              1-3-5----0----5----0')

text = "This is a sample text for this example to work."
text += 'and the quick brown fox jumped over the lazy, lazydogdog'
word_list = []
word_seq = []
dicta = {}

text = text.strip()

for punc in ".,;:!?'-&[]()" + '"' + '/':
	text = text.replace(punc, "")

words = text.lower().split()

for word in words:
    word_count = len(word)
    word_list.append(word_count)

word_list.sort()

for key, iter in itertools.groupby(word_list):
    word_seq.append((key, len(list(iter))))
    
print("%s %7s" % ( "Length", "Count"))

for num, count in word_seq:
	print("%5d %10d" % (num, count))

dicta = dict(word_seq)

histo(dicta)  # histo() prints the chart itself and returns None

Could someone show me how to write the numbers on the left side of the histogram shown here? http://dev.collabshot.com/show/723400/#

Here's my code:

import itertools

def histo(dict_words):
	x_max = max(dict_words.keys())
	y_max = max(dict_words.values())
	s = ""
	
	for j in range(y_max, 0, -20 ):
		
		s = '%9s' % '|'
		
		for i in range(1, x_max):
			if i in dict_words.keys() and dict_words[i] >= j:
				s += '***'
			else:
				s += '   '
		print(s)
		
	s = '\t-+'
	for i in range(1, x_max):
		s += '-+-'
	s += '>'
	print(s)

	s = '\t |'
	for i in range(1, x_max):
		if i > 9:
			s += '%d ' % i
		else:
			s += ' %d ' % i 
	print(s)


if __name__ == "__main__":
	f = open('declaration.txt', 'r')
	f.close()
	text = ""
	word_list = []
	word_seq = []
	dicta = {}
	
	open_file = open('declaration.txt', 'r').readlines()
	text = text.join(open_file).strip()
	
	for punc in ".,;:!?'-&[]()" + '"' + '/':
		text = text.replace(punc, "")

	words = text.lower().split()
	
	for word in words:
		word_count = len(word)
		word_list.append(word_count)
	
	word_list.sort()
	
	for key, iter in itertools.groupby(word_list):
		word_seq.append((key, len(list(iter))))
	
	print("%s %7s" % ( "Length", "Count"))
	
	for num, count in word_seq:
		print("%5d %10d" % (num, count))
		
	print(" ")
	
	dicta = dict(word_seq)
	
	histo(dicta)

You could replace line 10 with:

s = ('%9s-|' % (j+1)) if (not (j+1) % 100) else ('%11s' % '|')

However, this is not the final solution, because it depends on y_max, which here seems to end in 99 but could vary with the text. You should debug the code to fix this.

Now the plot has no connection to the file you read. You should debug your file I/O code. It is quite wrong, especially lines 35 and 36, which do nothing except raise an error if the file does not exist. Also, iter is a built-in function and should not be shadowed.
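A sketch of what tidier file handling might look like, with the usual caveats: `read_text` and `length_counts` are names invented here, the punctuation handling is copied unchanged from the post, and only the file name 'declaration.txt' comes from the original code.

```python
import itertools

def read_text(path):
    """Return the contents of the file at path; the with block closes it."""
    with open(path, 'r') as handle:
        return handle.read()

def length_counts(text):
    """Return sorted (length, count) pairs for the words in text."""
    for punc in ".,;:!?'-&[]()\"/":
        text = text.replace(punc, "")
    lengths = sorted(len(word) for word in text.lower().split())
    # Named `group` rather than `iter`, so the built-in iter() is not shadowed.
    return [(key, len(list(group))) for key, group in itertools.groupby(lengths)]

if __name__ == "__main__":
    try:
        pairs = length_counts(read_text('declaration.txt'))
    except IOError as err:  # IOError is an alias of OSError on Python 3
        print("Could not read the file: %s" % err)
    else:
        for num, count in pairs:
            print("%5d %10d" % (num, count))
```

Opening the file once inside a with block replaces the separate open/close pair and the second unclosed open() in the posted code.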

Also, your histogram loop does not reach x_max, and the histogram does not adjust to y_max.
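Both points can be sketched as follows. This is not the fixed version whose output appears in this thread, only an illustration: `histo_lines` and `step` are invented names, and the dictionary is assumed to be non-empty.

```python
def histo_lines(dict_words, step=10):
    """Return the lines of a vertical histogram with a labelled y axis."""
    x_max = max(dict_words)
    y_max = max(dict_words.values())
    # Round the top of the axis up to the next multiple of step.
    top = -(-y_max // step) * step
    lines = []
    for level in range(top, 0, -step):
        row = '%9d-|' % level
        # range() excludes its end point, hence x_max + 1.
        for length in range(1, x_max + 1):
            row += ':::' if dict_words.get(length, 0) >= level else '   '
        lines.append(row)
    lines.append(' ' * 9 + '-+' + '-+-' * x_max + '>')
    lines.append(' ' * 9 + ' |' + ''.join('%2d ' % n for n in range(1, x_max + 1)))
    return lines

for line in histo_lines({1: 20, 2: 272, 3: 280, 4: 188}, step=100):
    print(line)
```

Counts smaller than the lowest level (20 in this sample) leave an empty column, so a real version may want a separate marker for them.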

Here is the output of my fixed version of your code (leaving your length counting alone, even though it is not correct because it joins words together). A '.' on the last line marks an existing value greater than 0.

Length   Count
    1         19
    2        272
    3        279
    4        186
    5        161
    6        145
    7        117
    8         82
    9         67
   10         61
   11         37
   12         15
   13         10
   14          9
   15          2
   16          2
   17          4
   18          2
   19          1
   21          1
 
      300-|                                                               
      290-|                                                               
      280-|                                                               
      270-|   ::::::                                                      
      260-|   ::::::                                                      
      250-|   ::::::                                                      
      240-|   ::::::                                                      
      230-|   ::::::                                                      
      220-|   ::::::                                                      
      210-|   ::::::                                                      
      200-|   ::::::                                                      
      190-|   ::::::                                                      
      180-|   :::::::::                                                   
      170-|   :::::::::                                                   
      160-|   ::::::::::::                                                
      150-|   ::::::::::::                                                
      140-|   :::::::::::::::                                             
      130-|   :::::::::::::::                                             
      120-|   :::::::::::::::                                             
      110-|   ::::::::::::::::::                                          
      100-|   ::::::::::::::::::                                          
       90-|   ::::::::::::::::::                                          
       80-|   :::::::::::::::::::::                                       
       70-|   :::::::::::::::::::::                                       
       60-|   :::::::::::::::::::::::::::                                 
       50-|   :::::::::::::::::::::::::::                                 
       40-|   :::::::::::::::::::::::::::                                 
       30-|   ::::::::::::::::::::::::::::::                              
       20-|   ::::::::::::::::::::::::::::::                              
       10-|:::::::::::::::::::::::::::::::::::::::..................   ...
           -+-+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+->
           | 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21

With corrected counts (using both your roundabout method and defaultdict(int)), I get this result from the attached text:

Length   Count
    1         20
    2        272
    3        280
    4        188
    5        161
    6        147
    7        119
    8         89
    9         76
   10         65
   11         36
   12         15
   13         10
   14          7
   15          2

      300-|                                             
      290-|                                             
      280-|      :::                                    
      270-|   ::::::                                    
      260-|   ::::::                                    
      250-|   ::::::                                    
      240-|   ::::::                                    
      230-|   ::::::                                    
      220-|   ::::::                                    
      210-|   ::::::                                    
      200-|   ::::::                                    
      190-|   ::::::                                    
      180-|   :::::::::                                 
      170-|   :::::::::                                 
      160-|   ::::::::::::                              
      150-|   ::::::::::::                              
      140-|   :::::::::::::::                           
      130-|   :::::::::::::::                           
      120-|   :::::::::::::::                           
      110-|   ::::::::::::::::::                        
      100-|   ::::::::::::::::::                        
       90-|   ::::::::::::::::::                        
       80-|   :::::::::::::::::::::                     
       70-|   ::::::::::::::::::::::::                  
       60-|   :::::::::::::::::::::::::::               
       50-|   :::::::::::::::::::::::::::               
       40-|   :::::::::::::::::::::::::::               
       30-|   ::::::::::::::::::::::::::::::            
       20-|:::::::::::::::::::::::::::::::::            
       10-|:::::::::::::::::::::::::::::::::::::::......
           -+-+--+--+--+--+--+--+--+--+--+--+--+--+--+--+->
           | 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
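The code behind the corrected counts was not posted, so the following is only a guess at what counting with defaultdict(int) could look like. It assumes that "joins words together" refers to punctuation being deleted outright, and replaces each punctuation mark with a space instead. The name `count_lengths` is hypothetical.

```python
from collections import defaultdict

def count_lengths(text):
    """Return a dict mapping word length to the number of words of that length."""
    # Assumption: punctuation becomes a space, so "one-two" counts as two words.
    for punc in ".,;:!?'-&[]()\"/":
        text = text.replace(punc, " ")
    counts = defaultdict(int)
    for word in text.lower().split():
        counts[len(word)] += 1  # missing keys start at 0
    return dict(counts)

counts = count_lengths("Sample text for this example.")
for length in sorted(counts):
    print("%5d %10d" % (length, counts[length]))
```

One side effect of this choice is that an apostrophe also splits a word, so "don't" is counted as a three-letter word and a one-letter word.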