Generally, my professor's assignments are vague, with few constraints. While this gives us room to be resourceful and creative, some of us, like me, are lost without a push in the right direction. This is my assignment; any insight is much appreciated.

We are going to work on strings. They represent an area where you
can achieve significant performance improvements in conjunction
with high level languages. You will have practice doing that with this
assignment. My goal is for you to explore how your string design
trade-offs influence how you process the strings you created.
Assignment Details
We spoke about several types of strings you can construct with ASM
in class. You are to come up with a design for a string and implement
3 operations with your design (concatenation, length, etc., etc.).
You don’t need to do anything fancy to create the strings - just go
ahead and allocate them in memory. I want you to focus your energy
on how you will manipulate them.


It sounds like your professor wants you to play with different string representations, specifically how each one determines the string's length at any given time, since that's the usual difference between string types. There are three common ones:

  • Terminated strings: An extra slot is allocated for storage of a terminating character (usually 0). Any manipulation of the string stops at this character, and modifications ensure that the terminating character is present to avoid corrupting memory. This is how C-style strings work, and the drawback is determining the length is an O(n) process because you need to traverse the string to find the terminating character.

  • Bookkeeping strings: Extra memory is allocated for storage of the string's current length. Usually it's a byte or two at the beginning. Storage of actual character data starts immediately after this bookkeeping block. Determining the length is quick and easy, but the size of the length field caps the maximum size of the string (a one-byte field allows at most 255 characters), and a wide field wastes memory if the string is always small. Accessing the bookkeeping block can also be unintuitive.

  • Aggregate types: The string is just a block of memory and the size is stored in a separate variable. This is the most intuitive design, but it's risky in that the size variable must be updated religiously and it's easy to forget. Most string classes in OO languages define string types using this design.
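
To make the trade-offs concrete, here is a minimal sketch of the three layouts. It is written in C rather than assembly only so the memory layout is easy to see; each function is a handful of loads and compares in ASM. All names (`term_length`, `prefixed_length`, `prefixed_concat`, `agg_string`, `agg_length`) are made up for illustration, and the one-byte length prefix is an assumption, not a requirement of the design.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Terminated string: the length is found by scanning for the 0 byte, O(n). */
static size_t term_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Bookkeeping string (assumed layout): byte 0 holds the length,
   the characters follow, so at most 255 characters fit. O(1) length. */
static size_t prefixed_length(const uint8_t *s) {
    return s[0];
}

/* Concatenate src onto dst for the length-prefixed layout.
   cap is the number of character bytes available in dst after the
   length byte. Returns 0 on success, -1 if the result would not fit. */
static int prefixed_concat(uint8_t *dst, size_t cap, const uint8_t *src) {
    size_t total = (size_t)dst[0] + src[0];
    if (total > 255 || total > cap) {
        return -1;
    }
    memcpy(dst + 1 + dst[0], src + 1, src[0]);
    dst[0] = (uint8_t)total;  /* the bookkeeping byte must be kept in sync */
    return 0;
}

/* Aggregate type: the length lives in a separate field beside the data. */
struct agg_string {
    char  *data;
    size_t len;
};

static size_t agg_length(const struct agg_string *s) {
    return s->len;
}
```

Notice that concatenation on the length-prefixed layout never has to search for the end of `dst`, while the terminated layout would need a full scan first. That kind of difference is probably what the assignment wants you to compare.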


I think that everyone will agree that strings can be a real pain. The days of simple ASCII strings have evaporated into the mists of time. Now we have to deal with Unicode in all its variants. However, in terms of assembly language string processing, pretty much everything has already been said about the various means of stashing a string in memory.

There are, however, a few useful items that you should consider when dealing with strings. There is really no substitute for the native x86 string instructions. So, you need to find the length of an arbitrary string? Well, you have to assume some maximum length for your string. This must be less than the segment size minus the start address of the string - and it's not safe to assume some large number.

On the other hand, if you can rely on the string being properly terminated, then things get simpler. Now, you can get the length of the string in about 4 instructions. Simply load (E)DI with the address of the string, (E)CX with the maximum length to allow, AL with the terminating character and execute "RepNE Scasb" (with the direction flag clear so the scan moves forward). (E)DI will be left pointing 1 past the terminating character. The length is then a simple matter of subtraction: (E)DI minus the start address, minus 1 for the terminator.
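
For comparison, here is roughly the same bounded scan expressed in C, as a sketch only. The helper name `bounded_length` is hypothetical. The standard `memchr` plays the role of "RepNE Scasb": it takes a start address, the byte to look for, and a maximum count. One difference is that `memchr` returns a pointer to the terminator itself rather than one past it, so no extra 1 is subtracted.

```c
#include <stddef.h>
#include <string.h>

/* Returns the number of bytes before the first `terminator` in s,
   looking at no more than `max` bytes. If no terminator is found
   within that limit, returns `max` (the caller should treat that
   as "unterminated"). */
static size_t bounded_length(const char *s, char terminator, size_t max) {
    const char *end = memchr(s, terminator, max);
    return end ? (size_t)(end - s) : max;
}
```

Passing the terminator as a parameter is a design choice that fits the point about non-standard terminators: the same routine works whether the string ends in 0 or in some other character.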
By the way, watch out for odd terminating characters. Yes, C strings use a null byte and so do most assemblers, but I have seen cases where there is no default terminator, or where some other character is used. Admittedly, this is rare - perhaps even nonexistent nowadays. As a historical note, I've seen '*', '?', and various control characters. In some special cases - for example, when null is a legal character in a string - you will see 2 nulls in a row as a termination. The point is to remind you that string handling in general is somewhat (if not totally) application specific: never assume anything - test all your assumptions and take precautions. Strings can kill!!
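
As one illustration of the 2-nulls case, here is a small C sketch. It assumes one particular convention, which is only one of several possible: a block of 0-terminated strings packed back to back, with an extra 0 marking the end of the whole block. The helper name `count_double_null` is made up, and the explicit `size` bound is there so a malformed block cannot send the scan off the end of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Counts the strings in a block where each string ends with a 0 byte
   and one additional 0 byte ends the whole block.
   `size` is the total number of bytes that may safely be read. */
static size_t count_double_null(const char *block, size_t size) {
    size_t count = 0;
    size_t pos = 0;
    while (pos < size && block[pos] != '\0') {
        const char *end = memchr(block + pos, '\0', size - pos);
        if (end == NULL) {
            break;  /* unterminated entry: stop instead of overrunning */
        }
        count++;
        pos = (size_t)(end - block) + 1;  /* skip past this entry's 0 */
    }
    return count;
}
```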

Edited by sbesch: Clarification
