Hello, I am currently writing an assembler for the theoretical SIC/XE machine, and I decided to do it in Python.
I am very new to Python, but I thought it would be fun to do it in and I am willing to learn. So far I know what to do and have the idea, except I cannot implement it in Python code yet. I could do it in Java or C++.
Basically, I want to parse through the source.txt file, and since the source is made up of 3 columns, I decided to split each line by tab and match there.
Here is my sample code:

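def main():
    comment = "."
    SYMTAB = {"START":' ', "BYTE":" ","WORD":" ", "RESB":" ","RESW":" ","END":" ","BASE":" ","NOBASE":" "}
    OPTAB = {"ADDR":" ", "COMPR":" ", "SUBR":" ", "ADD":" ", "SUB":" ", "MUL":" ", "DIV":" ", "COMP":" ", "J":" ", "JEQ":" ", "JGT":" ",
             "JLT":" ", "JSUB":" ", "LDA":" ", "LDB":" ", "LDCH":" ", "LDL":" ", "LDT":" ", "LDX":" ", "RSUB":" ", "TIX":" ", "TIXR":" ", "RD":" ", "TD":" ", "WD":" ",
             "STA":" ", "STB":" ", "STCH":" ", "STL":" ", "STX":" ", "CLEAR":" "}
    # 'rU' is deprecated; plain 'r' already handles any newline style in Python 3
    # outf is the intermediate file for a later pass (not written to yet)
    with open("source.txt", 'r') as inf, open("temp.txt", 'w') as outf:
        lines = inf.readlines()
    LOCCTR = 0
    for line in lines:
        about = line.split()
        if "START" in about:
            # the label column may be empty, so take the word right after START;
            # SIC addresses are hexadecimal (START 1000 -> 0x1000)
            LOCCTR = int(about[about.index("START") + 1], 16)
    for line in lines:
        print(line.rstrip("\n"))
        # stop after printing the END line (the old "< -1" test was never true)
        if "END" in line.split():
            break

    return 0

if __name__ == '__main__':
    main()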

I would like to know how to iterate through lines better in Python.
Please ask me more questions if necessary.
I have attached a sample source.txt file as well so you can see what it looks like. As you can see, it's separated into 3 columns, but sometimes the first column is empty.
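A minimal sketch of the tab-splitting idea, assuming each column is separated by exactly one tab and an empty label column shows up as a leading tab (parse_line and read_source are made-up names for this example). Splitting on "\t" rather than on any whitespace keeps the empty first field, so the opcode always lands in the same position:

```python
def parse_line(line):
    """Split one tab-separated source line into (label, opcode, operand).

    Assumes one tab between columns and a leading tab when the label
    column is empty. Missing trailing fields come back as "".
    """
    fields = line.rstrip("\n").split("\t")
    # pad to three fields so lines such as "\tRSUB" still unpack cleanly
    fields += [""] * (3 - len(fields))
    label, opcode, operand = (f.strip() for f in fields[:3])
    return label, opcode, operand


def read_source(path):
    """Yield parsed lines one at a time, skipping blank and '.' comment lines."""
    with open(path) as infile:
        for line in infile:
            if not line.strip() or line.lstrip().startswith("."):
                continue
            yield parse_line(line)
```

Used as `for label, opcode, operand in read_source("source.txt"):`, this reads the file line by line instead of loading it all with readlines(). If the columns turn out to be padded with spaces rather than tabs, slicing at fixed column positions would be the safer choice.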

I like to use readlines to get the data. Some other useful methods include strip and append. Here is some ugly code to demonstrate these:

FileHandle = open("source.txt", "r")
Pairs = []
batch_data = FileHandle.readlines()
for line in batch_data:
    #remove the endline and surrounding white space from the line
    CleanLine = line.strip()
    SplitArray = CleanLine.split("=")
    #remove white space around the first field
    SplitArray[0] = SplitArray[0].strip()
    if len(SplitArray) == 2:
        #remove white space around the second field
        SplitArray[1] = SplitArray[1].strip()
        #append the cleaned pair to a list
        Pairs.append(SplitArray)
FileHandle.close()

Hope that helps.

OK, I see that is useful. But how do I go about matching strings in that split array against strings in a database or another array?

In Python, you can compare strings easily. Just loop through the array and do a comparison like this:

match = False
for item in ParameterArray:
    if item == databaseItem:
        match = True
        break
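For a plain yes/no match, Python's `in` operator does the loop for you. On a list it still scans element by element, but on a dict or set it is a hash lookup, which is one reason to keep OPTAB and the directives in dicts or sets. A small sketch; the opcode values are the standard SIC ones for those four mnemonics, and the classify helper is hypothetical:

```python
# Example tables: a dict mapping mnemonic -> opcode, and a set of directives.
OPTAB = {"LDA": 0x00, "STA": 0x0C, "ADD": 0x18, "J": 0x3C}
DIRECTIVES = {"START", "END", "BYTE", "WORD", "RESB", "RESW", "BASE", "NOBASE"}


def classify(mnemonic):
    """Say whether a mnemonic is a machine instruction, a directive, or unknown."""
    if mnemonic in OPTAB:  # membership on a dict tests its keys
        return "instruction"
    if mnemonic in DIRECTIVES:
        return "directive"
    return "unknown"


# The same operator also works on a split line (a list of strings):
split_line = ["LOOP", "ADD", "NUMB2"]
has_add = "ADD" in split_line
```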

You can also use a regular-expression search (the re module) to find partial text in strings. Here is an example:

import re
re.compile("some text").search(ParameterArray[j])
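A regular expression is also handy for pulling apart the operand column, where SIC/XE marks immediate operands with `#`, indirect ones with `@`, and indexed ones with a trailing `,X`. A sketch under those assumptions (OPERAND_RE and parse_operand are illustrative names; register pairs such as `A,S` for COMPR are deliberately not handled and raise ValueError):

```python
import re

# Optional # (immediate) or @ (indirect) prefix, the value itself,
# then an optional ",X" suffix for indexed addressing.
OPERAND_RE = re.compile(r"^([#@]?)([^,\s]*)(,\s*X)?$")


def parse_operand(operand):
    """Return (mode, value, indexed) for one operand string."""
    m = OPERAND_RE.match(operand.strip())
    if m is None:
        raise ValueError("cannot parse operand: %r" % operand)
    prefix, value, index = m.groups()  # index is None when ",X" is absent
    mode = {"#": "immediate", "@": "indirect", "": "simple"}[prefix]
    return mode, value, index is not None
```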

I would start with something like this. In the end, I would also produce a simulator for the code.

L == LOCCTR ??

generate=False ## code generation on?

## adr parameter is the first column value, op is the third column (list?)
def start(adr,op):
    global generate

def byte(adr,op):
    pass

def word(adr,op):
    pass

def resb(adr,op):
    pass

def resw(adr,op):
    pass

def end(adr,op):
    global generate

def base(adr,op):
    pass

def nobase(adr,op):
    pass

def addr(adr,op):
    pass

def compr(adr,op):
    pass

def subr(adr,op):
    pass

def add(adr,op):
    pass

def sub(adr,op):
    pass

def mul(adr,op):
    pass

def div(adr,op):
    pass

def comp(adr,op):
    pass

def j(adr,op):
    pass

def jeq(adr,op):
    pass

def jgt(adr,op):
    pass

def jlt(adr,op):
    pass

def jsub(adr,op):
    pass

def lda(adr,op):
    pass

def ldb(adr,op):
    pass

def ldch(adr,op):
    pass

def ldl(adr,op):
    pass

def ldt(adr,op):
    pass

def ldx(adr,op):
    pass

def rsub(adr,op):
    pass

def tix(adr,op):
    pass

def rd(adr,op):
    pass

def td(adr,op):
    pass

def wd(adr,op):
    pass

def sta(adr,op):
    pass

def stb(adr,op):
    pass

def stch(adr,op):
    pass

def stl(adr,op):
    pass

def stx(adr,op):
    pass

def clear(adr,op):
    pass


## values table: LOCCTR, flags and the symbols found in the source
VALUES = {
    "LOCCTR": 0,
    "FLAGS": 0x0,
}

if __name__ == '__main__':
    with open("sic.txt", 'r') as inf:
        ## fixed-width columns: label (13 chars), opcode (8 chars), operand (rest)
        lines = [(code[:13].strip(), code[13:21].strip(), code[21:].strip()) for code in inf]

    for i,line in enumerate(lines):  ## generate index,value pairs from lines
        if line[0]:
            ## record the label; the line index stands in for a real byte address
            VALUES[line[0]] = i

    print('** SYMBOLS **')
    for k in sorted(VALUES.keys()): print(k, ' = ', VALUES[k])

    print('-'*40)
    print('** AS LIST **')
    for i in lines: print(i)
    print('-'*40)
    for li,line in enumerate(lines):
        print(li, ':\t', '\t'.join(line))

## process the replacement of symbols by values, splitting of third-field parameters
## run the functions in a loop for each line for the assembler
## run as a program, updating the program counter and registers in the functions
## according to the meaning of the instructions
## to do a simulation run of the program
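The loop described in those comments can be sketched with a dict that maps each mnemonic to its handler function, so no long if/elif chain is needed. This sketch assumes every handler keeps the `(adr, op)` signature of the stubs, returns `(value, next_adr)`, and that instructions are 3 bytes long; HANDLERS and run_pass are hypothetical names:

```python
# Illustrative handlers: each takes the current address and the operand
# and returns (value to emit or None, address of the next line).
def lda(adr, op):
    return ("LDA", op), adr + 3


def rsub(adr, op):
    return ("RSUB", None), adr + 3


def resw(adr, op):
    # RESW reserves op words of 3 bytes each and emits no code
    return None, adr + 3 * int(op)


HANDLERS = {"LDA": lda, "RSUB": rsub, "RESW": resw}


def run_pass(lines, start=0):
    """Call the handler for each (label, opcode, operand) line in order."""
    adr = start
    output = []
    for label, opcode, operand in lines:
        handler = HANDLERS.get(opcode)
        if handler is None:
            raise KeyError("no handler for %s" % opcode)
        value, adr = handler(adr, operand)
        if value is not None:
            output.append(value)
    return output, adr
```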


Hey tonyjv, I understand that those are functions for each symbol, but what parameters do I give? Can you give me an example?

 def subr(2,94):


def subr(adr,op):
    adr = 2
    op = 94

Also, how can I access the opcode stored there?
Like subr.value()?

Just one thing. For iterating over lines, I find it a bad idea to load the whole file into a list (using readlines()) unless you can't process it sequentially. This consumes memory for nothing really useful...
Python allows iterating directly over the lines of a file:

for line in open(myfile,'r'):
    print(line)

or:

with open(myfile,'r') as infile:
    for line in infile:
        print(line)

This seems to be the preferred way for some obscure reasons I don't know (if someone can explain, I would gladly learn).
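The usual explanation is that `with` closes the file as soon as the block is left, even if an exception is raised inside it, while `for line in open(...)` leaves the closing to the garbage collector. The sketch below (count_lines and demo are illustrative names) iterates the file object directly, so only one line is held in memory at a time, and then checks that the file is already closed:

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating the file object; returns (count, closed_flag)."""
    with open(path) as infile:
        n = sum(1 for _ in infile)
    # leaving the with block has already closed the file
    return n, infile.closed


def demo():
    """Write a small temporary source file, count its lines, then clean up."""
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as tmp:
        tmp.write("FIRST\tLDB\tNUMB1\n\tSTL\tRETAD\n")
    try:
        return count_lines(path)
    finally:
        os.remove(path)
```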

I thought that you would iterate over the lines of code and call subroutines looked up through a dict. Maybe having SYMTAB and OPTAB as one table would make life easier, provided they cannot have the same names (like an address called SUBR).
I had this kind of use in mind. My example loop for symbols must be replaced with a loop that is aware of the length of each instruction and uses byte addressing (as most processors do).

## example: the opcode lives in the high byte of a 32-bit number
## and the operand is in the remaining 24 bits
## op is meant to hold the value calculated from the third column
## (the operand)
def subr(adr,op):
    ## subr = 0x94
    ## in case op is already a plain number
    VALUES["LOCCTR"] = op
    ## return the 4 bytes and advance the location adr by 4 for the code scan,
    ## i.e. the next instruction is written 4 bytes after this one
    ## (parentheses needed: + binds tighter than << in Python)
    return ((0x94 << 24) + op, adr + 4)

In the loop we would then do the following (to work for real variable-length instructions, this would need a function that saves multiple bytes to memory instead of a single list assignment):

memory[adr], adr =  this_instructions_operation(adr,op)

This uses double (tuple) assignment:

>>> a = [0,1,2]
>>> i=1
>>> a[i],i = (99,i+1)
>>> i
2
>>> a
[0, 99, 2]

A better alternative to returning ready-made bytes of assembly may be to return a list value and make a second pass over these values to write the real values. This is probably better, as we have to make a second pass anyway if we have variable-length instructions (we do not know the length of each instruction in advance, and so not the offset of each address from the START address).
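That two-pass idea can be sketched as follows: the first pass only records addresses, the second pass fills in operand values, which also handles forward references such as a jump to a label defined later. Assumptions: every instruction is 3 bytes, there are no directives, and the object value is simply the opcode in the high byte plus the symbol's address (assemble is a made-up name):

```python
def assemble(lines, optab, start=0):
    """Two-pass sketch over (label, opcode, operand) lines.

    Pass 1 assigns an address to each line and records labels in symtab.
    Pass 2 replaces each operand symbol by its address, so a symbol may be
    used before the line that defines it.
    """
    symtab = {}
    pending = []  # (address, opcode value, operand symbol) kept for pass 2
    locctr = start
    for label, opcode, operand in lines:
        if label:
            symtab[label] = locctr
        pending.append((locctr, optab[opcode], operand))
        locctr += 3  # assumption: every instruction is 3 bytes
    code = []
    for address, op_value, symbol in pending:
        target = symtab[symbol] if symbol else 0
        code.append((address, (op_value << 16) | target))
    return symtab, code
```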

However, this direct assembly was easier for me to make up as an example.
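For comparison, in the real SIC/XE machine SUBR is a format-2 instruction: 2 bytes holding the 8-bit opcode followed by two 4-bit register numbers, so its parameters are register names rather than an address. A sketch of that encoding; the register numbers and opcodes are the standard SIC/XE values, while the table and function names are illustrative:

```python
# SIC/XE register numbers used in format-2 instructions
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5, "F": 6, "PC": 8, "SW": 9}

# Opcodes of a few format-2 instructions
FORMAT2_OPCODES = {"ADDR": 0x90, "SUBR": 0x94, "COMPR": 0xA0, "CLEAR": 0xB4, "TIXR": 0xB8}


def encode_format2(mnemonic, operand):
    """Assemble opcode (8 bits), r1 (4 bits), r2 (4 bits) as 4 hex digits."""
    regs = [r.strip() for r in operand.split(",")] if operand else []
    r1 = REGISTERS[regs[0]] if len(regs) > 0 else 0
    r2 = REGISTERS[regs[1]] if len(regs) > 1 else 0
    # explicit parentheses keep the shifts grouped as intended
    word = (FORMAT2_OPCODES[mnemonic] << 8) | (r1 << 4) | r2
    return "%04X" % word
```

So `subr(2, 94)` style calls are not needed: `encode_format2("SUBR", "A,S")` gives the 2-byte object code directly.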

Could you explain the meaning of the code? My attempts are written as Python-style comments.

PGRM1        START   0      ## PGRM1==?
FIRST        LDB     NUMB1  ## this will put word address of NUMB1 to B? or contents of that location?
             STL     RETAD  ## STore LOCCTR? The value of next LDA instructions location
             LDA     NUMB1
LOOP         ADD     NUMB2
             STA     ARRAY,X ## put current value of A to memory location starting from ARRAY?
             TIX     LIMIT  ## ??
             ADD     #1     ## add to A constant 1?
             STA     NUMB3
             J       @RETAD  ## jump to location stored in location RETAD?
RETAD        RESW    1       ## reserve word=2 bytes for value (uninitialized) and call location RETAD
LIMIT        WORD    10      ## put two bytes word value for constant 10 or reserve 10 words array?
ARRAY        RESB    1024    ## reserve 1024B=kByte for array?
NUMB1        WORD    5
NUMB2        WORD    10
NUMB3        RESW    1
             END     FIRST  ## NOT PGRM1

I didn't explain it clearly. This is not a linker, just an assembler. I search the file for symbols: if a symbol is not in the symbol table, I add it to SYMTAB; if it is, I continue. Basically, I just assemble the opcodes with their values and LOCCTR and produce object code.
Here is some pseudocode: http://pastebin.com/Af21Dk7Y
I do not want or need someone to write it out for me in Python. I just want to know the syntax for dealing with the strings: after finding them, assign them a LOCCTR value and write out their value. So basically, if I FIND START in a line, I initialize LOCCTR to whatever value comes after START. So copy START 1000 would give a LOCCTR value of 1000. Then I add the opcode in front of it, so it's xx1000,
and so on for the rest of the code, for each label in there. So I don't think all those functions are necessary, unless I make them return the opcode value, but that would still be too much.
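The first pass described above (take the hex value after START as LOCCTR, put each new label into SYMTAB with the current LOCCTR, then advance LOCCTR by the size of the line) could look roughly like this. It assumes lines already split into (label, opcode, operand), treats every entry in optab as a 3-byte instruction (formats 1, 2 and 4 are ignored), and uses the standard SIC sizes for WORD, RESW, RESB and BYTE; pass1 is a made-up name:

```python
WORD_SIZE = 3  # a SIC word is 3 bytes


def pass1(lines, optab):
    """Return (symtab, start_address, program_length) for parsed source lines."""
    symtab = {}
    locctr = start = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            # START operand is hexadecimal: START 1000 -> LOCCTR = 0x1000
            # (the program name label is kept out of symtab in this sketch)
            locctr = start = int(operand, 16)
            continue
        if opcode == "END":
            break
        # add new labels; a real assembler would usually flag a duplicate as an error
        if label and label not in symtab:
            symtab[label] = locctr
        if opcode in optab:
            locctr += 3
        elif opcode == "WORD":
            locctr += WORD_SIZE
        elif opcode == "RESW":
            locctr += WORD_SIZE * int(operand)  # RESW/RESB counts are decimal
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            # C'EOF' takes one byte per character, X'F1' one byte per two hex digits
            body = operand[2:-1]
            locctr += len(body) if operand.startswith("C") else len(body) // 2
        elif opcode in ("BASE", "NOBASE"):
            pass  # assembler hints only, no storage
        else:
            raise ValueError("unknown opcode: " + opcode)
    return symtab, start, locctr - start
```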

You are on the right track with the dictionary, but I am not sure exactly what you want to do. Some input and output data might help. This code is an example of what I think you want.

SYMTAB = {"START":' ', "BYTE":" ","WORD":" ", "RESB":" ","RESW":" ","END":" ","BASE":" ","NOBASE":" "}
test_list = [ "START 1000\n", "JUNK 2000\n", "END 3000\n" ]
for rec in test_list:
    rec = rec.strip()
    substrs = rec.split()
    ## assume the keyword is always the first word
    print(substrs[0], end=" ")
    if substrs[0] in SYMTAB:
        print("FOUND  = xx %s" % (" ".join(substrs[1:])))
    else:
        print("NOT FOUND")
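To turn a lookup like that into the "xx1000" form, the opcode goes in front of the address. In the standard SIC instruction format that is 8 bits of opcode, 1 index bit (set for ,X operands) and a 15-bit address, printed as six hex digits. A sketch; the opcode values are standard SIC, but here SYMTAB maps labels to addresses and those addresses are made up for illustration:

```python
OPTAB = {"LDA": 0x00, "STL": 0x14, "STCH": 0x54}
SYMTAB = {"RETADR": 0x1033, "BUFFER": 0x1039}  # hypothetical label addresses


def sic_object_code(opcode, address, indexed=False):
    """Pack opcode (8 bits), x flag (1 bit) and address (15 bits) into 6 hex digits."""
    if not 0 <= address < 0x8000:
        raise ValueError("address does not fit in 15 bits: %X" % address)
    x_bit = 0x8000 if indexed else 0
    return "%06X" % ((opcode << 16) | x_bit | address)


def assemble_line(opcode, operand):
    """Look up the mnemonic and the symbol, then build the object code string."""
    indexed = operand.endswith(",X")
    symbol = operand[:-2] if indexed else operand
    return sic_object_code(OPTAB[opcode], SYMTAB[symbol], indexed)
```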