CodeCounter.py for Python

chriswelborn

I started using Python a few weeks ago. Something finally clicked, and I started writing tools for my personal use. When I began writing a program with a GUI using GTK, the lines of code started multiplying rapidly, so I asked myself, "Am I commenting my source to death? How many comments do I have?" Since I'm new to the language, I wasn't sure I was going about everything the right way. At the time I had never heard of pycount or CLOC; I only found out about those two today after a Google search. I think the authors of those tools had the same idea, before me of course.

My script counts code lines, blank lines, and comments, but it also counts inline comments, imports, variables, variable assignments, classes, functions, and if/try/print statements, and reports the count or percentage of each. It can also list all imports in a script (pass the -i flag) and all variables in a script (pass the -v flag), in alphabetical order by default or in the order of discovery (add the -d flag). There are a few shortcomings, explained on my site and in the comments, such as docstrings and multiple variable assignments on the same line, but for most projects it's dead on. I don't want to say too much; you can visit my site for more info. Here is what the default basic output looks like when codecount.py is run on itself:

Finished analyzing: codecount.py...

       Total Lines: 447

        Code Lines: 245 (54.8098%)
       Blank Lines: 63 (14.0939%)

     Comment Lines: 139 (31.0961%)
   Inline Comments: 22
    Total Comments: 161

 Longest Code Line: 113
Shortest Code Line: 5

   Longest Comment: 75
  Shortest Comment: 3

           Imports: 3
         Functions: 5
           Classes: 0
     If Statements: 52
    Try Statements: 1
  Print Statements: 65
       Assignments: 89
         Variables: 38
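
The blank/comment/code split in the report above comes down to classifying each line and dividing each count by the total (for example, 245 code lines out of 447 gives 245 / 447 * 100, about 54.81%). Here is a minimal Python 3 sketch of that idea, not the script itself: classify_lines is a hypothetical name, and, like doWork(), it treats a line starting with # (but not a #! shebang) as a comment line.

```python
def classify_lines(source):
    """Count blank, comment and code lines in Python source text.

    Hypothetical helper: a line is blank if it holds only whitespace,
    a comment line if it starts with '#' (a '#!' shebang counts as code,
    as in the original script), and code otherwise.
    """
    counts = {"blank": 0, "comment": 0, "code": 0}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith("#") and not stripped.startswith("#!"):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    total = sum(counts.values())
    # Percentage of each category; an empty file yields 0.0 instead of
    # a ZeroDivisionError.
    percents = {k: (v / total * 100 if total else 0.0) for k, v in counts.items()}
    return total, counts, percents


sample = "#!/usr/bin/env python\n\n# a comment\nx = 1  # inline\n"
total, counts, percents = classify_lines(sample)
print(total, counts)  # 4 {'blank': 1, 'comment': 1, 'code': 2}
```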

I hope this helps some people out, or teaches someone something about Python. It's certainly done that for me. If anyone has any suggestions or ideas, feel free to send me a message. Thanks for reading,
-Cj

Update: There is a very small bug in this script. The first line after if __name__ == "__main__": should test for "-h" rather than "h", i.e. it should read if ("-h" in sys.argv[1]): . Sorry.
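
Why the "h" test misfires: when no flags are given, sys.argv[1] is the file name, so any path containing an h (a file under /home/, for instance) shows the help screen instead of being analyzed. The "-h" test is still a substring check, so a name like my-helper.py would match too. A stricter check treats the first argument as flags only when it starts with "-". This is a sketch assuming the same [-hdvi] script_to_analyze.py usage; split_args is a hypothetical helper, not part of codecount.py, and the standard library's argparse module is the fuller alternative.

```python
def split_args(argv):
    """Split argv into (flags, filename) for the usage '[-hdvi] file.py'.

    Hypothetical helper: only a first argument that starts with '-' is
    treated as a flag group, so a file name is never mistaken for flags.
    """
    args = argv[1:]
    flags = set()
    if args and args[0].startswith("-"):
        flags = set(args[0][1:])  # "-dvi" -> {"d", "v", "i"}
        args = args[1:]
    filename = args[0] if args else None
    return flags, filename


print(split_args(["codecount.py", "/home/cj/hash.py"]))     # no flags
print(split_args(["codecount.py", "-dv", "my-helper.py"]))  # flags d and v
```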

MODERATOR EDIT: Code edited and the double post removed.

TrustyTony commented: Reasonable effort +12
#!/usr/bin/env python
	
# Count code lines, comment lines, blank lines ( show percentage of each )
# Count imports, classes, functions, try statements, print statements,
#       if statements, variables, variable assignments
# Reports count/percentage of each item on the console

# This program cannot handle Doc Strings as of yet, and multiple
# variables on a single line confuses it a bit (so it ignores them).
# I am working on fixing these and other issues.
#---------------------------------------------------------------------

#(c) Copyright by Christopher J. Welborn, 2012, cjwelborn@gmail.com 

# Permission to use, copy, modify, and distribute this software and 
# its documentation without fee and for any purpose, except direct 
# commercial advantage, is hereby granted, provided that the above 
# copyright notice appear in all copies and that both that copyright
# notice and this permission notice appear in supporting documentation.
#
# THE AUTHOR CHRISTOPHER J. WELBORN DISCLAIMS ALL WARRANTIES WITH REGARD TO
# THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
# FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL,
# INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING
# FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
# NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
# WITH THE USE OR PERFORMANCE OF THIS SOFTWARE!

# -Christopher J. Welborn

# 7-17-2012


# Note to developers:
# I realize that there are probably better ways to do what I am
# doing here, I have only been writing Python for a couple weeks now
# and I am just getting used to it. If you can do this better, or plan
# on taking this script and modifying it, please drop a line in the 
# comments that says "inspired by Christopher Welborn's codecount.py"
# or something like that. If you have any pointers for me I welcome
# them, just send me an email and I'll try to implement your idea.
# 

# Imports
import sys          # for argument parsing
from os import path # for file stuff
import re           # for trimming spaces from strings

# Set App Name
sAppName = "Code Counter"
# Set Version
sVersion = "1.0"
# Get script file name
sAppExe = sys.argv[0]
# Set spacing for reports
sSpacing = "        "
# Trim ./ from script file name
if sAppExe[:2] == "./": 
	sAppExe = sAppExe[2:] 

sUsage = "Usage: ./" + sAppExe + " [-hdvi] script_to_analyze.py"


# No command-line arguments?
if len(sys.argv) < 2:
	print sAppName + " v." + sVersion + " - Not enough arguments, expecting:"
	print sUsage
	exit()
# Too many command-line arguments?
if len(sys.argv) > 3:
	print sAppName + " v. " + sVersion + " - Too many arguments, expecting:"
	print sUsage
	exit()
	
# Get count of arguments
iArgs = len(sys.argv)
# Get last argument (File to analyze)
sFile = sys.argv[iArgs - 1]

# Initialize Count: Counters
iTotal = 0          # Total Lines
iBlanks = 0         # Blank Lines
iCode = 0           # Actual Code
iCommentLine = 0    # Full Comment Line
iComments = 0       # Comments
iFunction = 0       # Function Definitions
iClass = 0          # Class Definitions
iIf = 0             # If/ElseIf Statements
iTry = 0            # Try/Except Blocks
iPrint = 0          # Print Statements
iAssignments = 0    # Assignment Statements
iCodePer = 0        # Percent of Code
iBlankPer = 0       # Percent of Blank Lines
iCommPer = 0        # Percent of Comments
lVariables = list() # Variables
lImports = list()   # Imports

# Initialize Count: Length of code
iLongest = 0
iLongestComment = 0
# For calculating the shortest we have to start high
# The only way this wouldn't work is if ALL lines were greater than 999 chars.
iShortest = 999 
iShortestComment = 999

			
# DO WORK (analyze)
def doWork():	
	global iTotal, iBlanks, iCode, iCommentLine, iComments
	global iFunction, iClass, iIf, iTry, iPrint, iAssignments
	global iLongest, iLongestComment, iShortest, iShortestComment
	global iCodePer, iCommPer, iBlankPer
	global lVariables, lImports
	
	try:
		# Open script file for reading
		fFile = open(sFile, 'r')
	except IOError as exMsg:
		# Caught Error (missing file, no permission, etc.)
		print sAppName + " Error Opening File: " + sFile
		print exMsg
		exit()
		
	# Cycle thru lines in file
	for sLine in fFile:
		# Increment Total Line Counter
		iTotal += 1

		# Strip leading/trailing tabs (indentation)
		sLine = sLine.strip("\t")
		
		# Replace Spaces With Nothing (Trim Spaces)
		sTrim = re.sub(r' ', '', sLine)
		# Trim NewLine Chars
		sTrim = re.sub(r'\n', '', sTrim)
		# Trim TabSpace Chars
		sTrim = re.sub(r'\t', '', sTrim)
	

		# Empty Line?
		if len(sTrim) == 0:
			# Increment Blank Line Counter
			iBlanks += 1
		# Not Empty:
		else:
		

			# Just a regular comment?:
			if sTrim[:1] == "#" and sTrim[:2] != "#!":
				# Increment Comment Count
				iCommentLine += 1
				# Longest Comment Yet? (compare and store the same length)
				if len(sLine) > iLongestComment:
					iLongestComment = len(sLine)
				# Shortest Comment Yet?
				if len(sLine) < iShortestComment:
					iShortestComment = len(sLine)
			# Actual Code?:
			else:
				# Increment Code Line Counter
				iCode += 1
				# Longest Line Yet? (compare and store the same length)
				if len(sLine) > iLongest:
					iLongest = len(sLine)
				# Shortest Line Yet?
				if len(sLine) < iShortest:
					if len(sTrim) > 0:
						iShortest = len(sLine)
				# Function?
				if sTrim[:3] == "def":
					iFunction += 1
				# Class?
				if sTrim[:5] == "class":
					iClass += 1
				# If/ElseIf?
				if sTrim[:2] == "if" or sTrim[:4] == "elif":
					iIf += 1
				# Try?
				if sTrim[:3] == "try":
					iTry += 1
				# Import?
				if sTrim[:6] == "import" or ((sTrim[:4] == "from") and ("import" in sTrim)):
					# Initialize String
					sImport = ""
					# Grab import name using "import [module]"
					if sTrim[:6] == "import":
						sImport = sLine[7:]
						sSplit = sImport.split(" ")
						sImport = sSplit[0].strip("#").strip("\t")
					# Grab import name using "from [module] import"
					else:
						sImport = sLine[5:]
						sSplit = sImport.split(" ")
						sImport = sSplit[0] + "." + sSplit[2]
					# Add to list of imports
					if not sImport in lImports:
						lImports.append(sImport)
					 
				# Print?
				if sTrim[:5] == "print":
					iPrint += 1
				# Inline Comments (no quotes on the line, so any # starts a comment)
				if "#" in sTrim and (not '"' in sTrim) and (not "'" in sTrim):
					iComments += 1
				# Inline Comments vs. # Characters in "string"
				elif "#" in sTrim and '"' in sTrim:
					# Split string by " Character
					sSplit = sTrim.split('"') # strings with "
								
					# Simple Inline Comment?
					if len(sSplit) == 1:
						iComments += 1
					# Tricky Inline Comment?: (# in "string" or outside?)
					else:
						# Get index for last section of string
						iLast = len(sSplit) - 1
					
						# Get last section of string
						sLast = sSplit[iLast]
					
						# Comment in last section?
						if "#" in sLast:
							iComments += 1
				# Inline Comments vs. # Characters in 'string'
				elif "#" in sTrim and "'" in sTrim:
					# Split string by ' Character
					sSplit = sTrim.split("'") # strings with '
				
					# Simple Inline Comment?
					if len(sSplit) == 1:
						iComments += 1
					# Tricky Inline Comment?: (# in 'string' or outside?)
					else:
						# Get index for last section of string
						iLast = len(sSplit) - 1
					
						# Get last section of string
						sLast = sSplit[iLast]
					
						# Comment in last section?
						if "#" in sLast:
							iComments += 1
				# Assignment Statements 
				if "=" in sLine:
					# We Found the = sign, now we have to weed out the non-assignment stuff
					sTrimA = sLine
					if "==" in sTrimA: 
						sTrimA = re.sub('==','', sTrimA)
					if "<=" in sTrimA: 
						sTrimA = re.sub('<=','', sTrimA)
					if ">=" in sTrimA: 
						sTrimA = re.sub('>=','', sTrimA)
					if "!=" in sTrimA: 
						sTrimA = re.sub('!=','', sTrimA)
					# Do we still have a = sign after that?
					if "=" in sTrimA:
						# Increment assignment count (because we found some kind of assignment)
						iAssignments += 1

						# Grab Variable Name, so we can count all the variables when we're done.
					
						# Get everything left of = character
						sLeft = sTrimA[:sTrimA.index("=")]
						# Split by spaces
						sSplit = sLeft.split(" ")
						

						# Get index of last string section (to get variable name)
						iLast = len(sSplit) - 1
						# If last item is blank or ",',+,-,*,/,// char, use the next to last item
						if sSplit[iLast] == "" or sSplit[iLast] == '"' or sSplit[iLast] == '+': 
							iLast -= 1
						if sSplit[iLast] == "-" or sSplit[iLast] == '*' or sSplit[iLast] == "/": 
							iLast -= 1
						if sSplit[iLast] == "'" or sSplit[iLast] == "//":
							iLast -= 1
							
						# "=" is definitely inside string if " or ' is first or last char
						# ...so it doesn't count as a variable. (it's just a "#1 String")
						if (not '"' in sSplit[iLast]) and (not "'" in sSplit[iLast]) and (sSplit[0] != "print") and (sSplit[0] != "if"):
							# functions like "def myfunc(self, data=None)" don't count...
							if (not sSplit[0] == "def"):
								
								# Get Variable Name
								sVariable = sSplit[iLast].strip('\t')
									
								# Add Variable to list (If Not Already There)
								if (not sVariable in lVariables):
				
									# Catch split variable assignments?
									if ")" in sVariable:
										# Remove )'s
										sVariable = sVariable.strip(")")
										if ("(" in sSplit[iLast -1]) and ("," in sSplit[iLast - 1]):
											sExtraVar = sSplit[iLast - 1]
											sExtraVar = sExtraVar.strip(",")
											sExtraVar = sExtraVar.strip("(")
											# add our extra variable we caught, if not already there
											if (not sExtraVar in lVariables): 
												lVariables.append(sExtraVar)
									
									# variable passed all of our tests, append it
									lVariables.append(sVariable)
									
												
									
			
	# Close File
	fFile.close()


	# Percentages (skipped for an empty file to avoid dividing by zero)
	if iTotal > 0:
		# Find Percentage of Code Lines vs. Total
		iCodePer = (iCode / float(iTotal)) * 100
		# Find Percentage of Comment Lines vs. Total
		iCommPer = (iCommentLine / float(iTotal)) * 100
		# Find Percentage of Blank Lines vs. Total
		iBlankPer = (iBlanks / float(iTotal)) * 100

# REPORT
def printReport():
	# Report Information
	print "                        "
	print "    Finished analyzing: " + sFile + "..."
	print "                        "
	print "           Total Lines: " + str(iTotal)
	print "                        "
	print "            Code Lines: " + str(iCode) + " (" + str(iCodePer)[:7] + "%)"
	print "           Blank Lines: " + str(iBlanks) + " (" + str(iBlankPer)[:7] + "%)"
	print "                        "
	print "         Comment Lines: " + str(iCommentLine) + " (" + str(iCommPer)[:7] + "%)"
	print "       Inline Comments: " + str(iComments)
	print "        Total Comments: " + str(iCommentLine + iComments)
	print "                        "
	print "     Longest Code Line: " + str(iLongest)
	print "    Shortest Code Line: " + str(iShortest)
	print "                        "
	print "       Longest Comment: " + str(iLongestComment)
	print "      Shortest Comment: " + str(iShortestComment)
	print "                        "
	print "               Imports: " + str(len(lImports))
	print "             Functions: " + str(iFunction)
	print "               Classes: " + str(iClass)
	print "         If Statements: " + str(iIf)
	print "        Try Statements: " + str(iTry)
	print "      Print Statements: " + str(iPrint)
	print "           Assignments: " + str(iAssignments)
	print "             Variables: " + str(len(lVariables))

# IMPORTS
def printImports():
	global lImports
	
	# Print Header
	print sAppName + ":"
	if len(lImports) > 0:
		# sort imports?
		if not "d" in sys.argv[1]:
			lImports = sorted(lImports)
		# print imports
		print " " + str(len(lImports)) + " imports found in " + sFile + ":"
		for sImp in lImports:
			print sSpacing + sImp
			
	# No Imports Found!
	else:
		print " No Imports Found In " + sFile + "!"
		
# VARIABLES
def printVariables():
	global lVariables
	
	# Print header, if printImports() didn't already...
	if (not "i" in sys.argv[1]):
		print sAppName + ":"
	# Print Variables	
	if len(lVariables) > 0:
		# sort variables?
		if not "d" in sys.argv[1]:
			lVariables = sorted(lVariables)
		# print variables
		print " " + str(len(lVariables)) + " variables (" + str(iAssignments) + " assignments) found in " + sFile + ":"
		for sVar in lVariables:
			print sSpacing + sVar
	# No Variables Found!
	else:
		print " No Variables Found In " + sFile + "!"

# HELP		
def printHelp():
	print " "
	print sAppName +  " Help:"
	print sUsage
	print " "
	print " h : Show this help message"
	print " v : Print all variables found"
	print " i : Print all imports found"
	print " d : Sorts variable/import lists in the order they were discovered,"
	print "      ...default is alphabetically."
	print " "
	print "Note:"
	print "      As of right now the variable finder is not 100% accurate,"
	print "      ...it works for simple variable assignments but if it finds"
	print "      ...a line like this: (variable1, variable2) = myFunc(args)"
	print "      ...it will grab variable2, but not always variable1."
	print "      This program does not handle doc strings yet, they"
	print "      ...will be counted as code."
	exit()
		
# ! --- Start Of Script --- ! #

# -- Script Shelled not Imported, so do some work
if __name__ == "__main__":
		
	# Show Help? (Do this before we waste our doWork on nothing)
	if ("-h" in sys.argv[1]):
		printHelp()
		exit()

	# Is valid file? (Don't waste our time on bad filenames)
	if path.isfile(sFile) == False:
		print sUsage
		print "Expecting valid filename!"
		exit()

	# Do All Analyzing
	doWork()
	
	# Extra arguments?
	if len(sys.argv) > 2:
		# Is Import List?
		if "i" in sys.argv[1]:
			printImports()
	
		# Is variable list?
		if "v" in sys.argv[1]:
			printVariables()
		
		# only -d flag passed? why would you do that?
		if ("-" in sys.argv[1]) and ("d" in sys.argv[1]) and (not "i" in sys.argv[1]) and (not "v" in sys.argv[1]):
			print sUsage
			print "'d' flag doesn't do anything without the 'v' or 'i'!"
		# Don't go to report since argument options passed
		exit()
	# No flags passed? Print Regular Report
	printReport()
	# EXIT SCRIPT
	exit()
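
The inline-comment logic above guesses by splitting each line on quote characters, which is why a # inside strings and docstrings can trip it up. As an alternative technique (not what codecount.py does), Python's standard tokenize module reports every real comment as a COMMENT token, so a # inside a string is never counted. A minimal Python 3 sketch; count_comments is a hypothetical name, and it assumes the input is valid Python (tokenize raises an error on unterminated strings or brackets).

```python
import io
import tokenize

# Token types that carry no code of their own.
NON_CODE = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def count_comments(source):
    """Return (full_line_comments, inline_comments) for Python source.

    Hypothetical helper built on tokenize: '#' characters inside strings
    or docstrings are part of STRING tokens, so they are never counted.
    """
    code_rows = set()   # line numbers touched by a real code token
    comment_rows = []   # line number of each COMMENT token
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_rows.append(tok.start[0])
        elif tok.type not in NON_CODE:
            # A triple-quoted string can span rows; mark all of them.
            code_rows.update(range(tok.start[0], tok.end[0] + 1))
    inline = sum(1 for row in comment_rows if row in code_rows)
    return len(comment_rows) - inline, inline


sample = '''x = "# not a comment"  # real inline comment
# full-line comment
s = """docstring with # inside"""
'''
print(count_comments(sample))  # (1, 1)
```

One difference from the script: tokenize reports a #! shebang line as an ordinary comment, while codecount.py does not count it, so a faithful port would skip a comment token starting with #! on line 1.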
TrustyTony 888 ex-Moderator Team Colleague Featured Poster

Looks like quite a tidy effort. I would like it if you changed to PEP 8 style naming, like print_report(), and indented with 4 spaces. I also personally find the startswith method clearer than comparing a slice with ==.

Another suboptimal thing is that you use separate if statements where the conditions are clearly mutually exclusive, like

# Function?
if sTrim[:3] == "def":
    iFunction += 1
# Class?
if sTrim[:5] == "class":
    iClass += 1

Which could be

# Function?
if s_trim.startswith("def"):
    i_function += 1
# Class?
elif s_trim.startswith("class"):
    i_class += 1

Of course, there could be a more efficient way, such as keeping the counts in a list or dictionary instead of separate variables, but that is another story.
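
The dictionary idea could look roughly like this in Python 3. This is a minimal sketch, not code from either post: the counts live in a collections.Counter instead of separate globals, and a dictionary lookup on the line's leading word replaces the chain of if tests, so each line lands in at most one category. STATEMENT_KEYWORDS and count_statements are hypothetical names. Matching the whole leading word (rather than a prefix test like sTrim[:3] == "def") also keeps a line such as definition = 1 from being counted as a function.

```python
import re
from collections import Counter

# Hypothetical mapping from a line's leading keyword to a report category.
STATEMENT_KEYWORDS = {
    "def": "functions",
    "class": "classes",
    "if": "if_statements",
    "elif": "if_statements",
    "try": "try_statements",
    "print": "print_statements",
}

LEADING_WORD = re.compile(r"[A-Za-z_]\w*")


def count_statements(lines):
    """Tally statement kinds in one Counter instead of many globals."""
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank and comment lines are not statements
        match = LEADING_WORD.match(stripped)
        # Compare the whole leading word, so "definition = 1" is not a def.
        if match and match.group() in STATEMENT_KEYWORDS:
            counts[STATEMENT_KEYWORDS[match.group()]] += 1
    return counts


code = ["def f(x):", "    if x:", "        print(x)",
        "    elif not x:", "definition = 1", "class A:"]
counts = count_statements(code)
print(dict(counts))  # {'functions': 1, 'if_statements': 2, 'print_statements': 1, 'classes': 1}
```

A Counter returns 0 for categories it never saw (counts["try_statements"] here), so the report code can read every category without checking for missing keys.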

I see. I'm very new to Python, a couple of weeks in, and I'm still learning the ways. My code still reads like VB.Net, and I'm still learning some of the builtins. Thanks for your input :)

Thank you, moderator. I was having trouble with "edit post": it wouldn't let me scroll past about the 221st line for some reason. I realize this is kind of big to be a "snippet", so maybe that's my fault. Thanks for fixing it, though...

TrustyTony 888 ex-Moderator Team Colleague Featured Poster

I have reported the bug. It could be solved by replacing the whole snippet with a new snippet, but it was not possible to edit the end of the post in place.

Rashakil Fol 978 Super Senior Demiposter Team Colleague

Hello Christopher Welborn. Thanks for posting your code. By posting your code, you have granted DaniWeb an exclusive copyright license to your code according to DaniWeb's terms of service. You may no longer use it and have no rights to your code. Please delete your code from your computer. As the Terms of Service say:

Any and all information posted on DaniWeb may not be copied or used elsewhere, in any way, shape, or form, either on the Internet or in print, without prior written permission from Dani Horowitz.

Further transmission of your source code material, such as in a personal project or in handing in an assignment, may be prosecutable as criminal copyright infringement.

happygeek commented: yawn +0

That is a copy/paste copyright notice and warning that I put into any whole project I release publicly. I realize it doesn't do much for me, but I do like the warning it gives about damage or harm. Anything I post on DaniWeb is for the cause: for education, entertainment, utility, or other purposes, for everyone. If I want to keep something for myself or protect the source, I take other measures, and I definitely would not post it in a forum such as this, with the code wide open.

rubik-pypol 0 Newbie Poster

Nice work! How can I copy and paste the code to try it locally? It seems it's not possible to disable line numbers, and they get included in the selection.

TrustyTony 888 ex-Moderator Team Colleague Featured Poster

Double-click the code, then copy/paste.

I updated this code for better performance. It can now handle docstrings and "quote comments", it lists the exceptions caught with except, and variable recognition is much better. It can now recognize multiple variables on the same line, like x1, x2, x3 = ["test1", "test2", "test3"], and in general it recognizes variables better than it used to (it doesn't break as easily). The updated code is at welbornproductions.net. I would have uploaded it here, but I didn't want to duplicate my post.
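
For comparison, the multiple-assignment case (and the (variable1, variable2) = myFunc(args) limitation in the original help text) can be handled without splitting strings by walking the syntax tree with Python's standard ast module. This is a different technique from the string parsing in the script above and not necessarily what the updated version does; it is a Python 3 sketch, find_variables is a hypothetical name, and the file must parse without syntax errors.

```python
import ast


def find_variables(source):
    """Return assigned variable names in source order, without duplicates.

    Hypothetical helper: collects every Name node in a Store context,
    which covers plain assignments, tuple unpacking, augmented
    assignments and for-loop targets. Function parameters are ast.arg
    nodes, not Name nodes, so "def f(data=None)" adds nothing.
    """
    names = [
        node for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    ]
    # ast.walk is breadth-first, so sort by position to get source order.
    names.sort(key=lambda n: (n.lineno, n.col_offset))
    ordered = []
    for node in names:
        if node.id not in ordered:
            ordered.append(node.id)
    return ordered


sample = '''x1, x2, x3 = ["test1", "test2", "test3"]
(a, b) = divmod(7, 2)
total = 0
for i in range(3):
    total += i
'''
print(find_variables(sample))  # ['x1', 'x2', 'x3', 'a', 'b', 'total', 'i']
```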
