Find, Match and replace..?

Question

turnerca902 0 Newbie Poster

17 Years Ago

Hi all,

I have a little project that I think could be accomplished using Python. I'm a beginner at this sort of thing, so I am posting the basic information in the hope that someone out there can help me get started or point me in the right direction.

I have a .dbf file which contains three fields: an id (1), a Mapsheet # (455626) and finally a Mapsheet Name that corresponds to the Mapsheet# ( Big Lake ). There are more than 500 records in the table.

I have 179 metadata files, each with a Mapsheet# that corresponds to a record in the .dbf table. What I want to do is read in each .xml file, search for this line:

<title Sync="TRUE">4555626.tif</title>

And replace the 4555626.tif text with the matching record from the .dbf table - (Big Lake)

Is this even possible? (I assume it is) Or too difficult for a beginner?

Thanks for your help out there.

python xml


All 8 Replies

woooee 814 Nearly a Posting Maven

17 Years Ago

I have a .dbf file which contains three fields: an id (1), a Mapsheet # (455626) and finally a Mapsheet Name that corresponds to the Mapsheet# ( Big Lake ).

I assume you are using something like dbfpy to read the file, so populating the dictionary would be fairly simple. The following is just pseudo code.

import os

class test_class:
   def __init__(self, dbf_dir):
      self.map_dic = {}     ## mapsheet number --> mapsheet name
      for file_nm in os.listdir(dbf_dir):
         if file_nm.endswith(".dbf"):     ## assumes no dirs end with '.dbf'
            ## listdir() returns bare names, so add the directory back
            self.read_dbf(os.path.join(dbf_dir, file_nm))
      print(self.map_dic)

   def read_dbf(self, file_nm):
      ## placeholder for whatever routine you use to read the dbf
      recs = read_file_recs_using_dbfpy(file_nm)
      for rec in recs:
         mapsheet_num = rec[1]     ## or whatever the form
         mapsheet_name = rec[2]
         self.map_dic[mapsheet_num] = mapsheet_name

TC = test_class("dbf_dir_name")

What I want to do is read in each .xml file, search for this line:
<title Sync="TRUE">4555626.tif</title>
And replace the 4555626.tif text with the matching record from the .dbf table - (Big Lake)

So you want to use rec.find('<title Sync="TRUE">') and rec.find("</title>"). If both are found then this is the correct record and you can then calculate the beginning and end positions for a rec.split(). You can then look the number up in self.map_dic and use the corresponding name.

woooee 814 Nearly a Posting Maven

17 Years Ago

Sorry, I meant slice, rec[begin:end], not rec.split(). Also, you may want to put 'read the dbf' in a separate program file and import it into 'read the XML', so that things don't get too confusing, as can happen when everything is in one file.
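A minimal sketch of the find-and-slice idea, assuming the dictionary keys are stored as strings and that the whole title element sits on one line. The function name replace_title and the sample dictionary are made up for illustration.

```python
def replace_title(line, map_dic):
    """Return line with the mapsheet file name in the title swapped for its name."""
    open_tag = '<title Sync="TRUE">'
    close_tag = '</title>'
    begin = line.find(open_tag)
    end = line.find(close_tag)
    if begin == -1 or end == -1:
        return line                      # not the title line; leave it unchanged
    begin += len(open_tag)               # first character after the opening tag
    file_name = line[begin:end]          # e.g. "4555626.tif"
    number = file_name.split(".")[0]     # e.g. "4555626"
    name = map_dic.get(number)
    if name is None:
        return line                      # number not in the table; leave it unchanged
    return line[:begin] + name + line[end:]


map_dic = {"4555626": "Big Lake"}        # assumption: keys are strings
print(replace_title('<title Sync="TRUE">4555626.tif</title>', map_dic))
```

Lines without both tags, or with a number that is not in the dictionary, are returned untouched, so the function can be applied to every line of a file.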

slate 241 Posting Whiz in Training · Answer 1 · 2008-06-02T06:21:32+00:00

First build a dictionary with mapsheet numbers as keys and mapsheet names as values.

High hack value, unstable code:
Filter the xml line by line with a regular expression like:
pattern=re.compile(r'<title Sync="TRUE">(([0-9]+)\.tif).*?')
Replace the text matched by group(X) with the appropriate value from the dictionary
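A rough sketch of the regular-expression route, using re.sub with a function as the replacement. The pattern is a trimmed variant of the one above (the opening tag is captured so it can be written back), and replace_titles and the sample dictionary are illustrative names, with keys assumed to be strings.

```python
import re

# Group 1 is the opening tag, group 2 the mapsheet number.
pattern = re.compile(r'(<title Sync="TRUE">)([0-9]+)\.tif')


def replace_titles(text, map_dic):
    """Swap every '<number>.tif' title in text for the name found in map_dic."""
    def lookup(match):
        name = map_dic.get(match.group(2))
        if name is None:
            return match.group(0)        # unknown number: keep the original text
        return match.group(1) + name
    return pattern.sub(lookup, text)


map_dic = {"4555626": "Big Lake"}        # assumption: keys are strings
print(replace_titles('<title Sync="TRUE">4555626.tif</title>', map_dic))
```

Because sub() works on any string, the same call can be used on a single line or on the contents of a whole file read in one go.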

More reading, more learning, elegant code, maybe overkill:
Parse the xml with expat, write everything out unchanged except the title tag. On the title tag's value perform split(".") and dictionary lookup.
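For the parser route, here is a sketch that swaps expat for xml.etree.ElementTree from the standard library, which is a different and higher-level tool than the one named above. The element names metadata and idinfo in the sample are assumptions, not taken from the real files, and the rewritten output is not guaranteed to match the original file's formatting exactly.

```python
import xml.etree.ElementTree as ET


def rename_titles(xml_text, map_dic):
    """Parse xml_text, replace '<number>.tif' title text with the mapped name."""
    root = ET.fromstring(xml_text)
    for title in root.iter("title"):
        text = title.text or ""
        number = text.split(".")[0]      # "4555626.tif" -> "4555626"
        if number in map_dic:
            title.text = map_dic[number]
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample; the real metadata layout may differ.
sample = ('<metadata><idinfo><title Sync="TRUE">4555626.tif</title>'
          '</idinfo></metadata>')
map_dic = {"4555626": "Big Lake"}        # assumption: keys are strings
result = rename_titles(sample, map_dic)
```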

turnerca902 0 Newbie Poster · Answer 2 · 2008-06-02T19:04:47+00:00

Hey slate,

Thanks for your response. What you are saying makes sense, but I have been searching around for a command that will allow me to read a file into a dictionary, and can't find such a thing. I can't imagine having to manually enter each of the records like this:

dic1 = {([('sape', 4139), ('guido', 4127), ('jack', 4098)])

How do I automate this process?
Thanks Again.

-Cat

turnerca902 0 Newbie Poster · Answer 3 · 2008-06-02T19:41:08+00:00

Hey slate,

dic1 = {([('sape', 4139), ('guido', 4127), ('jack', 4098)])

Sorry...the example I used is obviously not showing quite what I meant. Revised below:

tel = {'sape': 4139, 'guido': 4127, 'jack': 4098}
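One way to automate this, sketched under the assumption that whatever reads the .dbf hands back rows of (id, mapsheet number, mapsheet name). The rows list below is a hard-coded stand-in for that reader, and build_lookup is a made-up name.

```python
def build_lookup(rows):
    """Build {mapsheet number: mapsheet name} from (id, number, name) rows."""
    lookup = {}
    for _id, number, name in rows:
        # Store the number as a string so it compares equal to text cut
        # out of a file name such as "4555630.tif".
        lookup[str(number)] = name.strip()
    return lookup


# Stand-in for the records read from the .dbf table.
rows = [
    (1, 4555630, "Marshy Hope"),
    (2, 4555631, "Almost Home"),
    (3, 4555632, "Shady Grove"),
]
map_dic = build_lookup(rows)
print(map_dic["4555631"])
```

The loop does the same job as typing the dictionary by hand, so it scales to the full table without changes.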

turnerca902 0 Newbie Poster · Answer 4 · 2008-06-03T01:11:38+00:00

Ok...for the moment, this is what I have. I constructed my own dictionary as you suggested. I am working in a test folder with only 3 files, so building a dictionary manually is not difficult at this point...

So I am wondering how to take the next step: read through each .xml file and search for my String variable, then match it with the appropriate dictionary record and substitute the alternate name.

import os, fileinput
# list just the files in a given folder
folder = 'E:/Manifold/4555630/Test_meta'
String = ('<title Sync="TRUE">" + (filename) + ".tif</title>')
dict = {'Marshy Hope' : 4555630, 'Almost Home' : 4555631, 'Shady Grove' : 4555632}
for (paths, dirs, files) in os.walk(folder):
    # testing, this shows a list of filenames
    print files
    # now loop through the list of filenames
    for filename in files:
        # testing
        print filename
        #now select only the xml's
        for filename in files:
            filesplit = filename.split('.')
            filenameroot = filesplit[0]
            try:
                fileext = filesplit[2]
            except:
                print 'no xml here'
            else:
                if fileext == 'xml':
                   print 'Metadata Found!'

(Yeah, it's still very basic, but I appreciate suggestions from anyone who is able to give them!)
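A possible next step, sketched with a single loop over os.walk. It assumes the metadata files are named like 4555630.tif.xml, so the mapsheet number is the part before the first dot; find_xml_files is an illustrative name. Note that the lookup itself needs the number as the dictionary key and the name as the value, the reverse of the dictionary above.

```python
import os


def find_xml_files(folder):
    """Return (full path, mapsheet number) for every .xml file under folder."""
    found = []
    for path, dirs, files in os.walk(folder):
        for filename in files:
            if filename.lower().endswith(".xml"):
                number = filename.split(".")[0]   # "4555630.tif.xml" -> "4555630"
                found.append((os.path.join(path, filename), number))
    return found


# Usage, with the folder from the post:
# for xml_path, number in find_xml_files('E:/Manifold/4555630/Test_meta'):
#     print(xml_path, number)
```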

turnerca902 0 Newbie Poster · Answer 5 · 2008-06-03T19:53:18+00:00

Thanks woooee,

Very helpful...but I am afraid I must ask another really beginner-y question (Or two).

import os

dbf_dir = 'E:\\Manifold\\4555630\\Test_meta\\'
file_nm = 'E:\\Manifold\4555630\\Test_meta\\TEST_LUT.dbf'
TC=test_class(dbf_dir)

class test_class:
   def __init__(self, dbf_dir):
      self.map_dic={}
      file_names = os.listdir(dbf_dir)
      for file_nm in file_names:
         if file_nm.endswith(".dbf")     # assumes no dirs end with dbf
            self.read_dbf(file_nm)
      print self.map_dic()

   def read_dbf(self, file_nm):
       recs = read_file_recs_using_dbfpy(file_nm)
       recs = openFile(file_nm , readOnly=1)
       for rec in recs:
          mapsheet_name = rec[1]     # or whatever the form
          mapsheet_num = rec[0] 
          self.map_dic[mapsheet_num] = mapsheet_name

I have a syntax error on this line, and I cannot figure out what the problem is. Do you see it?:

if file_nm.endswith(".dbf")     # assumes no dirs end with dbf

I'm also not quite sure what to make of the line:

recs = read_file_recs_using_dbfpy(file_nm)

Does the line below it that I added make sense as a substitute?

Thanks for your time and assistance!

woooee 814 Nearly a Posting Maven · Answer 6 · 2008-06-03T21:14:12+00:00

The line
if file_nm.endswith(".dbf") # assumes no dirs end with dbf
requires a colon at the end:
if file_nm.endswith(".dbf"):

The line
recs = read_file_recs_using_dbfpy(file_nm)
is just a pseudo-code statement for any routine that will read the dbf file and return the data in some form that can be used by the program.
