Hi, Guys,

I use RE (regular expressions) in Python to process some text files and rearrange their format. The text files look like this:

MATERIAL
NAME=STEEL IDES=S W=76819.55
T=0 E=1.99948E+11 U=.3 A=.0000117 FY=3.447379E+08
NAME=CONC2 IDES=C W=23560
T=0 E=2.48211E+10 U=.2 A=.0000099
NAME=OTHER IDES=N W=76819.55
T=0 E=1.99948E+11 U=.3 A=.0000117
NAME=CON3 IDES=C W=23560
T=0 E=2.48211E+10 U=.2 A=.0000099

FRAME SECTION
NAME=CON450X450 MAT=CONC2 SH=R T=.45,.45
NAME=FSEC1 MAT=CON3 SH=P T=.3048
NAME=B200 MAT=CONC2 SH=R T=.16,.2
NAME=B300 MAT=CONC2 SH=R T=.16,.3
NAME=B400 MAT=CONC2 SH=R T=.16,.4
NAME=B400H MAT=CONC2 SH=R T=.16,.4

SHELL SECTION
NAME=WALL1 MAT=CONC2 TYPE=Shell,Thin TH=.25
NAME=SLAB1 MAT=CON3 TYPE=Shell,Thin TH=.25
NAME=DECK1 MAT=CONC2 TYPE=Membr TH=.0889
NAME=PLANK1 MAT=CONC2 TYPE=Membr TH=.25
NAME=SLAB2 MAT=CONC2 TYPE=Shell,Thin TH=1

I am using a simple RE to identify the location of the useful information:

import re

for line1 in alllines:
    shelltype = re.search('TYPE=Shell', line1)
    if shelltype is not None:
        ...
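Since the loop above already uses re.search, note that a capture group can return the value itself rather than just a match/no-match answer. The following is a minimal sketch; the helper name find_value is hypothetical, and it assumes values never contain spaces, which holds for the sample data above.

```python
import re


def find_value(line, key):
    """Return the text after KEY= up to the next whitespace, or None if absent.

    Hypothetical helper: assumes values contain no spaces (true for the sample data).
    """
    # (?<!\S) requires KEY to start a field, so key "T" does not match inside "MAT=..."
    match = re.search(r"(?<!\S)" + re.escape(key) + r"=(\S+)", line)
    return match.group(1) if match else None


line = "NAME=WALL1 MAT=CONC2 TYPE=Shell,Thin TH=.25"
print(find_value(line, "TYPE"))  # Shell,Thin
print(find_value(line, "TH"))    # .25
print(find_value(line, "T"))     # None: no field named exactly T on this line
```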

However, as I am a new user, I do not know
1) how to locate the range of this useful information, for instance from which line to which line in a text file.
For example, can I find the number of the line that contains only "FRAME SECTION"?

2) how to extract values, for example .45,.45 or 1.99948E+11

3) how to assign these extracted values to variables so that I can use them later on

Could you please give me some help? Thank you so much.

Ning
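For the three questions above, one possible approach (a sketch, not the only way) is: scan the lines with enumerate() and treat any non-blank line without an "=" as a section header; convert value strings with float(), splitting comma-separated pairs first; and keep the results in a dictionary instead of separate variables. The function names find_sections and to_number, and the header rule itself, are assumptions made for this example.

```python
def find_sections(lines):
    """Map each section header to the (start, end) index range of its data lines.

    Assumption: a header is any non-blank line without '=' (e.g. 'MATERIAL',
    'FRAME SECTION'); a section runs until the next header or the end of the file.
    Indices are 0-based, so the data lines are lines[start:end].
    """
    headers = [(i, line.strip()) for i, line in enumerate(lines)
               if line.strip() and "=" not in line]
    sections = {}
    for n, (index, title) in enumerate(headers):
        end = headers[n + 1][0] if n + 1 < len(headers) else len(lines)
        sections[title] = (index + 1, end)
    return sections


def to_number(text):
    """Convert '.3' or '1.99948E+11' to a float and '.45,.45' to a tuple of floats.

    Text that is not numeric (e.g. 'S', 'CONC2', 'Shell,Thin') is returned unchanged.
    """
    try:
        if "," in text:
            return tuple(float(part) for part in text.split(","))
        return float(text)
    except ValueError:
        return text


sample = """MATERIAL
NAME=STEEL IDES=S W=76819.55
T=0 E=1.99948E+11 U=.3 A=.0000117 FY=3.447379E+08

FRAME SECTION
NAME=CON450X450 MAT=CONC2 SH=R T=.45,.45
""".splitlines()

sections = find_sections(sample)
print(sections)  # {'MATERIAL': (1, 4), 'FRAME SECTION': (5, 6)}

# Question 3: store the values in a dictionary rather than in separate variables
start, end = sections["MATERIAL"]
steel = {}
for line in sample[start:end]:
    for field in line.split():            # blank lines produce no fields
        key, value = field.split("=", 1)
        steel[key] = to_number(value)
print(steel["E"], steel["IDES"])          # 199948000000.0 S
```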

Do you just want to extract the values for "MATERIAL" and "STEEL", or do you want all the values under "MATERIAL"? Ditto for "FRAME" and "SHELL". You would test for "MATERIAL" or "FRAME", extract the values depending on which category it is, and use a dictionary to store them, with "CON450X450", etc. as the key.

You don't even need regexes: your file is 'object oriented' ;)
In fact, your file is a list of 'blocks'. Each block has a headline (its first line), and the other lines are lists of key/value pairs that can readily be turned into dictionaries (pairs are separated by a space, and the '=' character separates the key from the value). All you need to parse your file is 'strip()' and 'split()'.
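The block idea above can be sketched with nothing more than strip() and split(). This assumes blocks are separated by blank lines and that every line after the headline consists only of KEY=VALUE fields; the element lines posted later in the thread (which start with a bare number) would need extra handling. The name parse_blocks is illustrative.

```python
def parse_blocks(lines):
    """Return a list of (headline, [dict, ...]) tuples, one per blank-line-separated block."""
    blocks = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:              # a blank line closes the current block
            current = None
        elif current is None:     # the first line of a block is its headline
            current = (line, [])
            blocks.append(current)
        else:                     # other lines: KEY=VALUE pairs separated by spaces
            current[1].append(dict(f.split("=", 1) for f in line.split()))
    return blocks


sample = """MATERIAL
NAME=STEEL IDES=S W=76819.55
T=0 E=1.99948E+11 U=.3 A=.0000117 FY=3.447379E+08

FRAME SECTION
NAME=CON450X450 MAT=CONC2 SH=R T=.45,.45
NAME=FSEC1 MAT=CON3 SH=P T=.3048
"""

for headline, records in parse_blocks(sample.splitlines()):
    print(headline, records)
```

Note that in the MATERIAL block each material spans two dictionaries (one per line), so they still need to be merged afterwards.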

Hi, woooee

Actually, I need to capture all the information following NAME, MATERIAL, TYPE and so on. I will look into dictionaries and have a go. Thank you very much.

Best Wishes

Ning

Hi, Gribouillis,

Thank you for your reply. Actually, those blocks are only a small part of my file; the rest is a large amount of data. Here are some examples. That is why I use RE. Is RE still the right choice in this situation?

21 J=12,7198 SEC=CON450X450 NSEG=2 ANG=0
22 J=7198,7199 SEC=CON450X450 NSEG=2 ANG=0
23 J=7199,7200 SEC=CON450X450 NSEG=2 ANG=0
24 J=7200,156 SEC=CON450X450 NSEG=2 ANG=0
25 J=14,7201 SEC=CON450X450 NSEG=2 ANG=0
26 J=7201,7202 SEC=CON450X450 NSEG=2 ANG=0
27 J=7202,7203 SEC=CON450X450 NSEG=2 ANG=0
28 J=7203,157 SEC=CON450X450 NSEG=2 ANG=0
29 J=16,7204 SEC=CON450X450 NSEG=2 ANG=0
30 J=7204,7205 SEC=CON450X450 NSEG=2 ANG=0
31 J=7205,7206 SEC=CON450X450 NSEG=2 ANG=0
32 J=7206,158 SEC=CON450X450 NSEG=2 ANG=0
33 J=18,7207 SEC=CON450X450 NSEG=2 ANG=0
34 J=7207,7208 SEC=CON450X450 NSEG=2 ANG=0
35 J=7208,7209 SEC=CON450X450 NSEG=2 ANG=0
36 J=7209,159 SEC=CON450X450 NSEG=2 ANG=0
37 J=20,7210 SEC=CON450X450 NSEG=2 ANG=0
38 J=7210,7211 SEC=CON450X450 NSEG=2 ANG=0
39 J=7211,7212 SEC=CON450X450 NSEG=2 ANG=0
40 J=7212,160 SEC=CON450X450 NSEG=2 ANG=0
41 J=22,7213 SEC=CON450X450 NSEG=2 ANG=0
42 J=7213,7214 SEC=CON450X450 NSEG=2 ANG=0
43 J=7214,7215 SEC=CON450X450 NSEG=2 ANG=0
44 J=7215,161 SEC=CON450X450 NSEG=2 ANG=0
45 J=24,7216 SEC=CON450X450 NSEG=2 ANG=0
46 J=7216,7217 SEC=CON450X450 NSEG=2 ANG=0
47 J=7217,7218 SEC=CON450X450 NSEG=2 ANG=0
48 J=7218,162 SEC=CON450X450 NSEG=2 ANG=0
49 J=26,7219 SEC=CON450X450 NSEG=2 ANG=0
50 J=7219,7220 SEC=CON450X450 NSEG=2 ANG=0
51 J=7220,7221 SEC=CON450X450 NSEG=2 ANG=0
52 J=7221,163 SEC=CON450X450 NSEG=2 ANG=0
53 J=28,7222 SEC=CON450X450 NSEG=2 ANG=0
54 J=7222,7223 SEC=CON450X450 NSEG=2 ANG=0
55 J=7223,7224 SEC=CON450X450 NSEG=2 ANG=0
56 J=7224,164 SEC=CON450X450 NSEG=2 ANG=0
57 J=30,7225 SEC=CON450X450 NSEG=2 ANG=0
58 J=7225,7226 SEC=CON450X450 NSEG=2 ANG=0
59 J=7226,7227 SEC=CON450X450 NSEG=2 ANG=0
60 J=7227,165 SEC=CON450X450 NSEG=2 ANG=0
61 J=32,7228 SEC=CON450X450 NSEG=2 ANG=0

5011 J=548,5406,4833,6103 SEC=SLAB1
5012 J=4833,6103,4832,6104 SEC=SLAB1
5013 J=4832,6104,1167,6105 SEC=SLAB1
5014 J=1167,6105,4831,6106 SEC=SLAB1
5015 J=5406,2625,6103,2607 SEC=SLAB1
5016 J=6103,2607,6104,2589 SEC=SLAB1
5017 J=6104,2589,6105,2571 SEC=SLAB1
5018 J=6105,2571,6106,2553 SEC=SLAB1
5019 J=4830,6107,1166,6108 SEC=SLAB1
5020 J=1166,6108,4829,6109 SEC=SLAB1
5021 J=4829,6109,4828,6110 SEC=SLAB1
5022 J=4828,6110,547,6111 SEC=SLAB1
5023 J=547,6111,6099,6112 SEC=SLAB1
5024 J=6099,6112,541,6113 SEC=SLAB1
5025 J=541,6113,6096,6114 SEC=SLAB1
5026 J=6096,6114,540,6115 SEC=SLAB1
5027 J=540,6115,4821,6116 SEC=SLAB1
5028 J=4821,6116,4820,6117 SEC=SLAB1
5029 J=4820,6117,1165,6118 SEC=SLAB1
5030 J=1165,6118,4819,6119 SEC=SLAB1

I was thinking of split(), but unfortunately I cannot find a way to locate these blocks (i.e., identify their line numbers), because their locations differ from file to file. Thank you so much; I will look into strip().

Best Wishes

Ning
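Rather than locating blocks by line number, each line can be classified by its shape as it is read, so the position of a block in the file no longer matters and no RE is needed. A minimal sketch, assuming element lines are exactly those that start with a digit and that J holds comma-separated joint numbers (both assumptions based only on the sample posted above); the function names are hypothetical.

```python
def parse_element(line):
    """Parse '21 J=12,7198 SEC=CON450X450 NSEG=2 ANG=0' into (21, {...})."""
    first, *fields = line.split()
    data = dict(field.split("=", 1) for field in fields)
    if "J" in data:
        # joint numbers become a list of ints: '12,7198' -> [12, 7198]
        data["J"] = [int(j) for j in data["J"].split(",")]
    return int(first), data


def parse_elements(lines):
    """Collect every line that starts with a digit into {element_id: data}.

    Lines of any other shape (headers, NAME= records, blank lines) are skipped.
    """
    elements = {}
    for line in lines:
        line = line.strip()
        if line[:1].isdigit():
            element_id, data = parse_element(line)
            elements[element_id] = data
    return elements


sample = [
    "FRAME SECTION",
    "NAME=CON450X450 MAT=CONC2 SH=R T=.45,.45",
    "",
    "21 J=12,7198 SEC=CON450X450 NSEG=2 ANG=0",
    "5011 J=548,5406,4833,6103 SEC=SLAB1",
]
elements = parse_elements(sample)
print(elements[21]["J"], elements[21]["SEC"])  # [12, 7198] CON450X450
print(elements[5011])  # {'J': [548, 5406, 4833, 6103], 'SEC': 'SLAB1'}
```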

Hi, Guys

I guess it is better to ask two simple questions first.

Assuming that there are two lines in a text file as follows:
NAME=STEEL IDES=S W=76819.55
T=0 E=1.99948E+11 U=.3 A=.0000117 FY=3.447379E+08

I want to generate a dictionary called STEEL whose keys would include IDES, W, T, E, U, A and FY, with the values extracted from the two lines. My concern is how to achieve this when the data spans two lines rather than one. So my two questions are:

1) how do I extract S from IDES=S, or 76819.55 from W=76819.55? I am trying split() but cannot work it out.
2) when I use match() or search() to identify the first of the two lines, how do I extract data from the second line when I only have the index of the first line?

all the best

ning
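For question 1, str.split with "=" as the separator already does the job: "IDES=S".split("=", 1) gives ['IDES', 'S']. For question 2, if you loop with enumerate() you have the index i of the NAME= line, and the following line is simply lines[i + 1]. A minimal sketch, assuming the file has been read into a list (e.g. with readlines()) and that each material record spans exactly two lines, as in the MATERIAL block; two_line_records is a hypothetical name. It stores each record in a dictionary keyed by name instead of creating a variable called STEEL.

```python
def two_line_records(lines):
    """Yield (NAME= line, following line) pairs using list indices.

    Assumes every NAME= record is followed by exactly one continuation line,
    as in the MATERIAL block; the FRAME SECTION records do not work this way.
    """
    for i, line in enumerate(lines):
        if line.startswith("NAME=") and i + 1 < len(lines):
            yield line, lines[i + 1]


lines = [
    "NAME=STEEL IDES=S W=76819.55",
    "T=0 E=1.99948E+11 U=.3 A=.0000117 FY=3.447379E+08",
    "NAME=CONC2 IDES=C W=23560",
    "T=0 E=2.48211E+10 U=.2 A=.0000099",
]

materials = {}
for first, second in two_line_records(lines):
    record = {}
    for field in first.split() + second.split():
        key, value = field.split("=", 1)   # question 1: 'IDES=S' -> 'IDES', 'S'
        record[key] = value
    materials[record.pop("NAME")] = record

print(materials["STEEL"]["IDES"], materials["STEEL"]["W"])  # S 76819.55
```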

You can obtain a dictionary for each line like this:

def lineTodic(line):
    # split on whitespace, then split each "KEY=VALUE" field on its first "="
    return dict(item.split("=", 1) for item in line.split() if "=" in item)

This should work for every line in your file made of key=value fields; fields without an "=" (such as the leading element number in "21 J=12,7198 ...") are skipped. This is not exactly the dictionary that you described, but you can then update the dictionary of the first line with the dictionary of the second line, and so on.
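The update step described above might look like this. line_to_dict is the same idea as lineTodic, renamed to PEP 8 style for this sketch, and dict.update merges the second line's pairs into the first line's dictionary; if both lines used the same key, the second line's value would win.

```python
def line_to_dict(line):
    """Turn 'KEY=VALUE KEY=VALUE ...' into a dict, skipping fields without '='."""
    return dict(item.split("=", 1) for item in line.split() if "=" in item)


first = "NAME=STEEL IDES=S W=76819.55"
second = "T=0 E=1.99948E+11 U=.3 A=.0000117 FY=3.447379E+08"

steel = line_to_dict(first)
steel.update(line_to_dict(second))   # merge the continuation line into the first
print(steel["IDES"], steel["E"])     # S 1.99948E+11
```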

The code below also picks up the second line of a record where there is one. How you process the data depends on what you want to do with it. Is this just to rearrange it and write it to a file? If so, the following code, with a few modifications to handle the lines it does not process, should work. If not, post back with specifics.

This code uses a dictionary of lists and should be self-explanatory. There are two functions: one processes the "NAME=" record, and the second appends the additional data. The trick is to pass the existing dictionary and key to the append function. The code also creates an empty dictionary before processing the next "NAME=" record. If you are comfortable using a class, this code would be a good candidate for one; keeping track of the variables is easier that way IMHO.

"""-----------------------------------------------------------
test file contains this data subset

MATERIAL
NAME=STEEL IDES=S W=76819.55
T=0 E=1.99948E+11 U=.3 A=.0000117 FY=3.447379E+08
NAME=CONC2 IDES=C W=23560
T=0 E=2.48211E+10 U=.2 A=.0000099
NAME=OTHER IDES=N W=76819.55
T=0 E=1.99948E+11 U=.3 A=.0000117
NAME=CON3 IDES=C W=23560
T=0 E=2.48211E+10 U=.2 A=.0000099

FRAME SECTION
NAME=CON450X450 MAT=CONC2 SH=R T=.45,.45
NAME=FSEC1 MAT=CON3 SH=P T=.3048
NAME=B200 MAT=CONC2 SH=R T=.16,.2
NAME=B300 MAT=CONC2 SH=R T=.16,.3
NAME=B400 MAT=CONC2 SH=R T=.16,.4
NAME=B400H MAT=CONC2 SH=R T=.16,.4
"""

def append_rec_list(rec_in, return_dic, key):
    conditions = rec_in.split()
    print("     appending", conditions)
    for each_c in conditions:
        sub_conditions = each_c.split("=", 1)  ## a list with 2 elements
        return_dic[key].append(sub_conditions)
    return return_dic


def new_rec_list(rec_in, dic_return):
    conditions = rec_in.split()
    print("     record was split into", conditions)
    _, key = conditions[0].split("=", 1)   ## split first one only: "NAME=STEEL" -> key "STEEL"
    dic_return[key] = []                   ## empty list

    ##   process element #2 through the end of the list
    for condition in conditions[1:]:
        sub_conditions = condition.split("=", 1)  ## a list with 2 elements
        dic_return[key].append(sub_conditions)
    return dic_return, key


def test_file_read():
    rec_dic = {}
    key = None
    previous_rec = False   ##  indicates if previous rec starts with "NAME="
    with open("test1.txt", "r") as fp:   ## the file is closed automatically
        for one_rec in fp:
            if one_rec.startswith("NAME="):
                if rec_dic:
                    print("dictionary contains", rec_dic, "\n")
                rec_dic = {}
                rec_dic, key = new_rec_list(one_rec, rec_dic)
                previous_rec = True
            else:
                if previous_rec:   ## only if previous rec was a "NAME=" rec
                    rec_dic = append_rec_list(one_rec, rec_dic, key)
                previous_rec = False
    print("dictionary contains", rec_dic, "\n")


if __name__ == "__main__":
    test_file_read()
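Following the suggestion to use a class, here is one way the same logic could be wrapped up. This is a sketch under assumptions: the class name RecordCollector and the helper collect_from_file are hypothetical, records are stored as flat {key: value} dictionaries rather than as a dictionary of lists, and record names are assumed to be unique across sections.

```python
class RecordCollector:
    """Collect NAME= records plus the single continuation line that may follow each."""

    def __init__(self):
        self.records = {}     # record name -> {key: value}
        self.current = None   # record that may still receive a continuation line

    def feed(self, line):
        pairs = dict(f.split("=", 1) for f in line.split() if "=" in f)
        if line.startswith("NAME="):
            self.current = pairs.pop("NAME")
            self.records[self.current] = pairs
        else:
            # merge a continuation line into the record just before it
            if self.current is not None and pairs:
                self.records[self.current].update(pairs)
            # like previous_rec = False: only the line right after NAME= is merged
            self.current = None


def collect_from_file(path):
    """Hypothetical convenience wrapper: feed every line of a file to a collector."""
    collector = RecordCollector()
    with open(path) as fp:
        for line in fp:
            collector.feed(line)
    return collector.records


collector = RecordCollector()
for line in ["MATERIAL",
             "NAME=STEEL IDES=S W=76819.55",
             "T=0 E=1.99948E+11 U=.3 A=.0000117 FY=3.447379E+08",
             "",
             "FRAME SECTION",
             "NAME=B200 MAT=CONC2 SH=R T=.16,.2"]:
    collector.feed(line)
print(collector.records["B200"])  # {'MAT': 'CONC2', 'SH': 'R', 'T': '.16,.2'}
```

The state that the function version passes around (rec_dic, key, previous_rec) lives on the object here, which is what makes the variables easier to keep track of.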

Hi, woooee

Thank you so much for your help. Sorry for not replying sooner; I have just come back from holiday.

Your code works very well after I deleted one line. I used it to successfully create the dictionary I needed. However, I feel I still have a long way to go to reach my target: rearranging the data and writing it to a file. I might bother you again if I cannot sort out the remaining problems after some hard work.

all the best

ning

Hi, woooee

I am modifying your code to generate a dict in the different format shown below:
Frame_Section = {'B400': {'MAT': 'CONC2', 'SH': 'R', 'T': '1.16,0.5'},
                 'B200': {'MAT': 'CONc3', 'SH': 'R', 'T': '0.16,0.5'}}
rather than the one your code produces. Could you please show me how to do that? I read through "Learning Python" over Easter and still have not worked out a way. Thank you so much.

Regards

ning
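One way to get the nested layout asked for above is to collect only the records under the FRAME SECTION header and key each one by its NAME. A minimal sketch, assuming a section starts at a line equal to its header and ends at the next blank line, and that each frame section record is a single NAME= line; read_section is a hypothetical name. Values stay as the strings found in the file (e.g. '.16,.4'); converting them to numbers would be a separate step.

```python
def read_section(lines, header):
    """Return {name: {key: value}} for the one-line NAME= records under `header`."""
    section = {}
    inside = False
    for raw in lines:
        line = raw.strip()
        if line == header:
            inside = True
        elif inside and not line:
            break                                   # a blank line ends the section
        elif inside and line.startswith("NAME="):
            fields = dict(f.split("=", 1) for f in line.split())
            section[fields.pop("NAME")] = fields    # NAME becomes the outer key
    return section


text = """MATERIAL
NAME=CONC2 IDES=C W=23560
T=0 E=2.48211E+10 U=.2 A=.0000099

FRAME SECTION
NAME=B200 MAT=CONC2 SH=R T=.16,.2
NAME=B400 MAT=CONC2 SH=R T=.16,.4

SHELL SECTION
NAME=WALL1 MAT=CONC2 TYPE=Shell,Thin TH=.25
"""

Frame_Section = read_section(text.splitlines(), "FRAME SECTION")
print(Frame_Section["B400"])  # {'MAT': 'CONC2', 'SH': 'R', 'T': '.16,.4'}
```

The same function can be called with "SHELL SECTION" to build a Shell_Section dictionary in the same shape.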
