Don't worry too much about efficiency yet. Just work on getting it functional. :) As for the functional part, pulling HTML tags out of a file can be tricky in the general case. Probably the safest way is to walk through the file one character at a time and build each tag individually before storing it. That way you can easily check whether you're inside a quoted string (an attribute value, say), where a '>' doesn't actually end the tag.
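Here's a rough sketch of that character-by-character idea in Python, in case it helps. The function name `extract_tags` and the two state variables are just made up for illustration, and it deliberately ignores comments, script blocks, and other special cases, so treat it as a starting point rather than a real parser.

```python
def extract_tags(text):
    """Collect each <...> tag by scanning the text one character at a time."""
    tags = []
    current = None  # characters of the tag being built, or None when outside a tag
    quote = None    # the quote character we are currently inside, or None

    for ch in text:
        if current is None:
            # Outside a tag: only a '<' is interesting.
            if ch == "<":
                current = [ch]
            continue

        current.append(ch)
        if quote is not None:
            # Inside a quoted string: only the matching quote ends it.
            if ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif ch == ">":
            # A '>' outside any quoted string closes the tag.
            tags.append("".join(current))
            current = None

    # A tag that is never closed is simply dropped in this sketch.
    return tags
```

For example, calling `extract_tags('<a href="x>y">link</a>')` keeps the '>' inside the quoted attribute as part of the first tag instead of cutting the tag short there.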