I need to parse an assembly program. I have no problems parsing a group of instructions or a group of instructions with one loop. For example:

    LD    R2,     0(R1)
    DADD  R4,     R2,     R3
    SD    0(R1),  R4
    DADDI R1,     R1,     #-8
    DADD  R2,     R2,     R4
or
    Loop: LD    R2,     0(R1)
          DADD  R4,     R2,     R3
          SD    0(R1),  R4
          DADDI R1,     R1,     #-8
          BNEZ  R1,     Loop
          DADD  R2,     R2,     R4

My problem is when there are nested loops. For example:

    Loop: LD    R2,     0(R1)
          Loop: LD    R2,     0(R1)
                DADD  R4,     R2,     R3
                SD    0(R1),  R4
                DADDI R1,     R1,     #-8
                BNEZ  R1,     Loop
                DADD  R2,     R2,     R4
          DADD  R4,     R2,     R3
          SD    0(R1),  R4
          DADDI R1,     R1,     #-8
          BNEZ  R1,     Loop
          DADD  R2,     R2,     R4

I was hoping someone could offer some insight into how to parse the nested-loop example. I would like to split the nested loops into two separate lists, innerLoop and outerLoop. The only way I can see to do this is to count the leading whitespace (indentation) on each line and break the program up that way. If anyone has a better idea, please let me know. Thanks!
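One alternative to counting whitespace, offered here as a sketch under stated assumptions rather than a complete answer, is to match loop labels to branches structurally, the same way brackets are matched with a stack. Both loops in the example use the label `Loop`, so the branch target name alone cannot tell them apart; the sketch therefore never compares names. It assumes each line has already been tokenized into a list like the `op` list later in this thread, that every `Label:` token opens a loop, and that every `BNEZ` closes the most recently opened loop (i.e. loops are properly nested). The names `split_loops`, `nested` and `outer_only` are hypothetical.

```python
def split_loops(op):
    """Return (start, end) index pairs for every loop in op.

    start is the index of the labelled instruction that opens the loop and
    end is the index of the BNEZ that closes it. Loops are matched like
    parentheses: each BNEZ closes the most recently opened loop, so label
    names are never compared. Pairs appear in the order their BNEZ occurs,
    which puts an inner loop before the loop that encloses it.
    Labels that are never closed by a BNEZ are simply left on the stack.
    """
    open_loops = []  # stack of indices of labels whose loop is still open
    loops = []
    for i, instr in enumerate(op):
        has_label = instr[0].endswith(':')
        if has_label:
            open_loops.append(i)
        body = instr[1:] if has_label else instr  # drop the label, if any
        if body and body[0] == 'BNEZ':
            if not open_loops:
                raise ValueError(f'BNEZ at index {i} has no open loop label')
            loops.append((open_loops.pop(), i))
    return loops


# The nested example from the question, already split into tokens.
nested = [
    ['Loop:', 'LD', 'R2', '0(R1)'],   # outer loop starts
    ['Loop:', 'LD', 'R2', '0(R1)'],   # inner loop starts
    ['DADD', 'R4', 'R2', 'R3'],
    ['SD', '0(R1)', 'R4'],
    ['DADDI', 'R1', 'R1', '#-8'],
    ['BNEZ', 'R1', 'Loop'],           # closes the inner loop
    ['DADD', 'R2', 'R2', 'R4'],
    ['DADD', 'R4', 'R2', 'R3'],
    ['SD', '0(R1)', 'R4'],
    ['DADDI', 'R1', 'R1', '#-8'],
    ['BNEZ', 'R1', 'Loop'],           # closes the outer loop
    ['DADD', 'R2', 'R2', 'R4'],
]

(inner_start, inner_end), (outer_start, outer_end) = split_loops(nested)
innerLoop = nested[inner_start:inner_end + 1]
outerLoop = nested[outer_start:outer_end + 1]  # still contains innerLoop
# Outer-loop instructions that are not part of the inner loop.
outer_only = nested[outer_start:inner_start] + nested[inner_end + 1:outer_end + 1]
```

Because an assembler typically treats indentation as formatting only, a structural match like this should keep working even if the listing is re-indented or mixes tabs and spaces, which is where whitespace counting tends to break.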


When it comes to parsing, I tend not to write parsers by hand at all; I rely on parser generators like Wisent.

Anyway, care to show us the code for what you're already doing?

I should have been clearer about what I mean by parsing. I'm not using a parser generator or writing a real parser. I'm parsing in the most basic sense, i.e. pulling out what I want based on conditionals.
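For this kind of basic, conditional parsing, a minimal sketch of how the raw listing could be turned into per-instruction token lists (the same shape as the `op` list in the code further down) might look like this. The function name `tokenize` is hypothetical; it assumes commas only ever separate operands and it does not handle comments.

```python
def tokenize(source):
    """Split raw assembly text into token lists, one list per instruction line.

    Assumes commas only ever separate operands, so they can be treated as
    whitespace. str.split() with no argument splits on any run of spaces or
    tabs, so indentation is discarded entirely. Blank lines are skipped.
    """
    op = []
    for line in source.splitlines():
        tokens = line.replace(',', ' ').split()
        if tokens:  # ignore blank lines
            op.append(tokens)
    return op


# The single-loop example from the question (indentation mixes spaces and tabs).
program = """
          Loop: LD    R2,     0(R1)
                DADD  R4,     R2,     R3
\t        SD    0(R1),  R4
\t        DADDI R1,     R1,     #-8
\t        BNEZ  R1,     Loop
\t        DADD  R2,     R2,     R4
"""

op = tokenize(program)
print(op[0])  # ['Loop:', 'LD', 'R2', '0(R1)']
```

One consequence of this design: because `split()` throws away leading whitespace, indentation no longer exists after this step, so any indentation-based splitting would have to happen on the raw lines, before tokenizing.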

This is extremely hacky code, and it only handles the two working examples above. It does not account for code before the loop or code after the loop. I stopped writing that check because I realized I need to handle nested loops, and I was not sure how to extract each nested loop individually.

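op = [['Loop:', 'LD', 'R2', '0(R1)'], ['DADD', 'R4', 'R2', 'R3'], ['SD', '0(R1)', 'R4'], ['DADDI', 'R1', 'R1', '#-8'], ['BNEZ', 'R1', 'Loop'], ['DADD', 'R2', 'R2', 'R4']]

# Split the instruction list at the BNEZ that closes the loop.
# enumerate() yields each position directly; op.index(item) would return the
# position of the FIRST equal item, which is wrong when a BNEZ line repeats.
for place, item in enumerate(op):
    if item[0] == 'BNEZ':
        beforeBNEZ = op[:place + 1]  # loop body, including the BNEZ itself
        afterBNEZ = op[place + 1:]   # code after the loop ([] if there is none)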

I have to correct my statement above: the slicing for afterBNEZ does account for code after the BNEZ. If there is none, the slice simply returns an empty list.
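The same slicing idea can be extended to cover code before the loop as well. This is a minimal sketch, assuming the program contains exactly one loop that starts at the first labelled instruction and ends at the first `BNEZ` after that label; the names `mnemonic`, `split_single_loop` and `prog` are hypothetical, and the instruction before the loop in `prog` is purely illustrative.

```python
def mnemonic(instr):
    """Opcode of a tokenized instruction, skipping a leading 'Label:' token."""
    body = instr[1:] if instr[0].endswith(':') else instr
    return body[0] if body else None


def split_single_loop(op):
    """Split a single-loop program into (before, loop, after) lists.

    Assumes one loop: it starts at the first labelled instruction and ends
    at the first BNEZ at or after that label. before and after may be empty,
    exactly like afterBNEZ when nothing follows the branch.
    """
    start = next((i for i, instr in enumerate(op) if instr[0].endswith(':')), None)
    if start is None:
        raise ValueError('no loop label found')
    end = next((i for i in range(start, len(op)) if mnemonic(op[i]) == 'BNEZ'), None)
    if end is None:
        raise ValueError('no BNEZ after the loop label')
    return op[:start], op[start:end + 1], op[end + 1:]


prog = [
    ['DADD', 'R4', 'R2', 'R3'],        # code before the loop (illustrative)
    ['Loop:', 'LD', 'R2', '0(R1)'],
    ['DADDI', 'R1', 'R1', '#-8'],
    ['BNEZ', 'R1', 'Loop'],            # nothing after the loop
]

before, loop, after = split_single_loop(prog)
```

Here `before` holds the one instruction ahead of the label, `loop` holds the last three, and `after` comes back as an empty list. This version still handles only one loop; for nested loops, the label/branch matching has to pair each `BNEZ` with the right label rather than just the first one.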
