Hi,

I am a little newbie with python, more use of C++
The goal is to search a specifed filename e trought multiple directories.
After finding the file, search trough it's content some description on some strings and write out the description on a xml file.
So far i manage to search trough a file a single line that is in the same directory with script, but this isn't the goal, just a starting point.

The content of one control.ini - the searched file file looks like this:

[INFO]
    name = ping
    description = '''This test runs the ping utility to test the connectivity
                     between target and server hosts with the folowing IP   addres: 192.169.0.4.
                 '''
timeout = 300
version1

search file trought directories, search content of file, copy content to a file
and my code so far is:

import os , os.path
r = open('/home/test/control.ini', 'r')
w = open('/home/test/result.txt' , 'w')
for line in r.readlines():
if "description" in line :
x= str(line)
print x
r.closed
w.write(x)
w.closed

Maybe i wold like to put the name, version and timeout later out.
The field description, version, name and timeout is the same in all controls.ini.

Any suggestion is much aprecieted.

Thanks!

Edited 6 Years Ago by playonlcd: n/a

why did you do c++ before python. c++ is a low level language and python is the one for beginners

Well, that was the begining for me..
Now i have to shift to python..

Anyway, i manage to search the file trought directories with a UNIX command.
Tha main problem is that i can't get the hole text out of apostrofes, i down't have any idea...
Temporaly i am putting it into a text file.

Till now looks like this:

import string, os
search = 'find . -name "control.ini" -print'        
w = open('/home/test/result.txt' , 'w')
    for file in os.popen(search).readlines():    
    name = file[:-1]                      
    for line in open(name).readlines():    #file scan
        pos1 = string.find(line, 'description')
        if  pos1 >= 0:
            print line      #report
            w.writelines(line)              
w.close

Edited 6 Years Ago by playonlcd: n/a

Okay, I think I have the whole thing working for you.

I have left the file paths as I had them on my system (working on Ubuntu Linux). Just change them for your system.

I have spent a couple hours figuring this out and so I haven't commented very well. I did test it on my system, exactly as I put it here, so it should work.

If you have questions, I will answer them later.

#! /usr/bin/python

# playonlcd.py

import re
import os
from xml.dom import minidom

class CntIniFileDetails:
    """Go through the specified directories and find all the control.ini files
    
    extract the:
        description string
        timout value
        version number
    and put it all together as xml and print to designated file"""

    def __init__(self, root_dir):
        self.root = root_dir
        self.CntList = []
        self.block = "<control_details>"
    def findCtrIni(self):
        """Go through the specified portion of specified directories

        return a list of 'control.ini' files with their full paths"""
        for dirPath, dirNames, fileNames in os.walk(self.root):
            if 'control.ini' in fileNames:
                self.CntList.append("%s/control.ini" % dirPath)
        return self.CntList
    def readEach(self):
        """Read each file and extract content
        
        return an xml format string to join into an xml file"""
        for CtrFiles in self.findCtrIni():
            openTags = "<file><filename>CtrFiles</filename>"
            endTag = "</file>"
            r = open(CtrFiles, 'r')
            fileCont = r.read()
            r.close()
            des = re.split("'''", fileCont)
            description = "<description>%s</description>" % des[1]
            con = re.split('\n', fileCont)
            for lines in con:
                if lines.startswith("timeout ="):
                    timeout = "<timeout>%s</timeout>" % lines[10:]
                if lines.startswith("version"):
                    version = "<version>%s</version>" % lines[7:]
            chunk = openTags + description + timeout + version + endTag
            self.block = self.block + chunk
        self.block = self.block + "</control_details>"
        return self.block

    def buildXml(self):
        """Takes the xml like string (self.block)
        and builds it into a xml file
        and then writes to a designated file"""

        content = self.readEach()
        xmlOut = minidom.parseString(content)        
        conIni = open('/home/vernon/Python/trying/controlDetails.xml', 'w')
        conIni.write(xmlOut.toxml())
        print "One file written to:\n/home/vernon/Python/trying/controlDetails.xml"


if __name__ == "__main__":
    test = CntIniFileDetails('/home/vernon/Python/')
    test.buildXml()

Many many thanks!
Long way till there :d

The one last thing is that XML should be created from program.
I think i can handle with this.

Take a look in the final method in the class - it creates an xml file with each of the details you wanted.

Take a look in the final method in the class - it creates an xml file with each of the details you wanted.

I run the program but in my case it creates a xml with no data; only with
"<?xml version="1.0" ?><control_details/>"

Did you correct all the file paths throughout the program? It is going to search a directory and all the files within it. I guess that you need to build a thing in it that stops it building the xml file when it didn't find a control.ini file.

Just put a series of prints it is to see what each variable is doing.

Does the code make sense? Is there anything that you don't understand?

I see that I left line 62 in there from when I was testing it. Just delete that, but if you create a series of print and try catch tests, you should have it all working soon.

I basically built it all up one step at a time in the python ide. You can de-construct it in the same manner.

Are you on Linux. If not, there may be a few things that work different for you?

Yes, i correct the path.
The code seems far to different form C but is intuitive.
I understand it but because i don't have any experience with object oriented i don't have much ideas to write it and the lack of exeperience with python.

THe XML has only 40 bytes, asuming that i have almost 2000 files with control.ini i am sure that xml is empty....

The text that i generated with my code has 430 KB...

I am on Ubuntu.

Edited 6 Years Ago by playonlcd: n/a

I am sorry that it is not working for you. I have just been testing it again, running the tests further down the file path (I had only let it search a small directory) and created more control.ini files, and it worked each time.

Yes, i correct the path.

Okay, try the test

#
test = CntIniFileDetails('/path/to/try/')

changing line 66 (or whatever it turns into on your text editor) to just one directory down from one control.ini file and see if it finds just that one.

I don't know if you are familiar with the __name__ == "__main__" trick, but it means you can run the thing from command line:

vernon@sandcurve3:~/Python/trying$ python playonlcd.py
One file written to:
/home/vernon/Python/trying/controlDetails.xml
vernon@sandcurve3:~/Python/trying$

Here is an example running it from the command line.

The code seems far to different form C but is intuitive.
I understand it but because i don't have any experience with object oriented i don't have much ideas to write it and the lack of experience with python.

If you have done C++ you are going to find pythons object orientation really easy to get the hang of. Read 'dive into python' and just ignore the fact that some of the stuff is out of data - you can figure all that out later.

Basically with the test:

#
if __name__ == "__main__":
#
test = CntIniFileDetails('/file/path/to/test/')
#
test.buildXml()

the "test = CntIniFileDetails('...." intanciates and 'test.buildXml()' calls a method.

I assume that you know enough Python to understand the __init__ method - otherwise perhaps a Python project of this nature is really a step beyond what you are capable of at this point in Python. I don't mean to be rude at all with that statement - I would be totally lost trying to create a similar file in C/C++. It is going to involve regular expression - horrid messy stuff that isn't even a builtin in Python, xml handling (perhaps my xml handling here is not that good either - if it was my program I would have used a database (sqlite3) instead, and searching the file path isn't so simple either. Basically, everything about what you are trying to do is rather advanced and unless you can get up to speed with Python rather fast, you may find that your project is a little tricky for where you are at, at this point.

Anyway, first method def __init__. That just sets you up, don't call it directly, self does that for you. From there on, in this class, the methods are calling each other backwards. So when you write: test.buildXml(), the buildXml() automatically calls the previous method - readEach(self), and it in turn calls findCtrIni(self). You can test with a call to readEach(self) or findCtrIni().

Try this. Change the findCtrIni() method as follows:

def findCtrIni(self, printThis=0):
        """Go through the specified portion of specified directories

        return a list of 'control.ini' files with their full paths"""
        for dirPath, dirNames, fileNames in os.walk(self.root):
            if 'control.ini' in fileNames:
                self.CntList.append("%s/control.ini" % dirPath)
        if printThis == 1:
            for eachFilePath in self.CntList:
                print eachFilePath
        return self.CntList

Remember that indentation must be spot on.

Then change your test script at the bottom like this:

if __name__ == "__main__":
    test = CntIniFileDetails('/a/file/location/')
    test.findCtrIni(1)

what you have done now is add an additional parameter to the method. It is a rather cool Python trick, that you can have optional parameters in a method call (or function.) Call it with test.findCtrIni(1) and it runs the little 'for' loop at the bottom of the method, and shows you if it found any directories with control.ini files in. If you want to get my code working, let me know what that gives you can we can work from there. Do something like this:

vernon@sandcurve3:~/Python/trying$ python playonlcd.py
/home/vernon/Public/Programming_Info/control.ini
/home/vernon/Python/trying/control.ini
vernon@sandcurve3:~/Python/trying$

THe XML has only 40 bytes, assuming that i have almost 2000 files with control.ini i am sure that xml is empty....

The text that i generated with my code has 430 KB...

I am on Ubuntu.

Yes, clearly something is not working. I am also on Ubuntu, using both 9.04 and 9.10 on my desktop/laptop respectively.

Well, running the program again, without any modification from upper post the result is that it's finds the description but the index for description is to small:

With a print descriptionprints the description field.

Traceback (most recent call last):
  File "test3.py", line 72, in <module>
    test.buildXml()
  File "test3.py", line 63, in buildXml
    content = self.readEach()
  File "test3.py", line 43, in readEach
    description = "<description>%s</description>" % des[1]
IndexError: list index out of range

Yes, i am alittle overhead with python, but i think that practice make more than reading. :d

Okay, you will just need to refine how it searches for files and how it actually finds the bits you are looking for. It worked with the example file you gave, but if you had interfering text in the files, there could be issues with the simple search stuff I did.

At a close look i don't understand how you manage to search "description" field as is on multiple lines.

What is used for con = re.split('\n', fileCont) ?

I read some of regular expression on Python bu i can't manage to locate the field description as is on multiple lines in some files and in other on one line.

One condition would be a switch on " ' ' ' " condition. Write lines till next " ' ' ' "
But python don't have "switch case" like in C.....

Here is how it searches the description:

des = re.split("'''", fileCont)
description = "<description>%s</description>" % des[1]

This assumes that you only use the ''' around the description, and it splits the whole file into three and puts it in a list. the first bit is the part before the first encounter of '''. the second is the description, and the third is the parts after the second '''

Then I just called the second part des[1] You know that lists start with 0 as the index for the first bit, so the [1] refers to the second item in the list des...your description. Of course if there are other occurrences of ''' in your file, it will not work. It doesn't use regular expression.

I am stuck with this piece of cod, still getting the error:

IndexError: list index out of range

It seems that i am trying to read something that is not on the list of the list is keeping indexing and overflows?

There must be another way to repeatedly read the split portion from list...

With a few files works fine, but from hundred above is no use.

Also, i have another problem, i have fields of description and name that aren't market by quotes, it seems a little bit difficult to identify them.

Manage to get rid of error with a try exept error around the code:

try:                 
       des = re.split("'''", fileCont)
       description = "<description>%s</description>" % des[1]
except IndexError:
        pass

Edited 6 Years Ago by playonlcd: n/a

Hi,
Happy new year!!

But back to work :)

I manage to select the text, but i am unable to write it to an XML since i donw't have much experience with XML....
I selected the text using dictionaries instead regular expression, and is much elegand than processing lines.
The code till now looks like this:

#/usr/lib/python

import re, os
from xml.dom import minidom
from configobj import ConfigObj


class FileDescription:    
  
    def __init__(self, root_dir):
        self.root = root_dir
        self.CntList = []
        self.block = "<control_details>"
        self.description_arg=""

               
        
    def FindControl(self):       
        for dirPath, dirNames, fileNames in os.walk(self.root):
            if 'control.ini' in fileNames:
                self.CntList.append("%s/control.ini" % dirPath)                
        return self.CntList
        
        
        
    def readEach(self): 
        conIni=open('/home/playonlcd/Details.xml', 'w')
        for CtrFiles in self.FindControl():
            openTag = "<file><filename>%s</filename>" %CtrFiles
            endTag = "</file>"    
            config=ConfigObj(CtrFiles)
            INFO=config.get('INFO')       
            name="<name>%s</name>" %INFO.get('name')
            line_description=INFO.get('description')
            descript=re.sub('\n'," ", line_description)
            description = "<description>%s</description>" %descript
            author="<author>%s</author>" %INFO.get('author')
            packages_tested="<packages_tested>%s</packages_tested>" %INFO.get('packages_tested')
            descript_info = name + '\n' + description + '\n' + author + '\n' + packages_tested + '\n'
       

                
            if config.has_key('ARG'):
               descript_arg=""
               ARG=config.get('ARG')
               for i in ARG.iterkeys():
                   x=ARG.get(i)
                   arg_name="<arg_name>%s</arg_name>" %x.get('name')    
                   line_descript = x.get('description')
                   arg_descript = re.sub('\n'," ", line_descript)
                   arg_description = "<arg_description>%s</arg_description>"  %arg_descript
                   descript_arg = descript_arg + '\n' + i  + '\n' + arg_name  + '\n' + arg_description  + '\n' 
                   self.description_arg=descript_arg                                     
            self.block = openTag + descript_info + self.description_arg + endTag
            
            content=self.block           
            xmlOut = minidom.parseString(content)            
            conIni.write(xmlOut.toxml())          

     


         
if __name__ == "__main__":
    test = FileDescription('/home/playonlcd/')
    test.readEach()

And the error is this:

Traceback (most recent call last):
File "test1.py", line 69, in <module>
test.readEach()
File "test1.py", line 60, in readEach
xmlOut = minidom.parseString(content)
File "/usr/lib/python2.5/xml/dom/minidom.py", line 1925, in parseString
return expatbuilder.parseString(string)
File "/usr/lib/python2.5/xml/dom/expatbuilder.py", line 940, in parseString
return builder.parseString(string)
File "/usr/lib/python2.5/xml/dom/expatbuilder.py", line 223, in parseString
parser.Parse(string, True)
xml.parsers.expat.ExpatError: mismatched tag: line 2, column 140

What is writes till the error still i can't view it in XML, in a text editor i can see the fields. Deleting <file> and </file> from inside text and let just <file> at beginning of file and at the end </file> seems to load the XML file ok. But i don't have any idea how to write it from program, since the block is constructed inside "for" and it must be process by parseString from minidom.

Any idea is much appreciated.
Thanks in advance.

Edited 6 Years Ago by playonlcd: n/a

Sorry for late delay.
I manage to finish the script.
I studied more how the python handles xml and i have been able to create a xml document with a header and some child's.

This question has already been answered. Start a new discussion instead.