I have a question about programming/scripting in general, which I'm still driving over rocky terrain trying to learn.

The compiler, whatever it is, reads through code from top to bottom. Does it read through all the lines of code first, or does it stop at functions etc. and continue depending on the user input?


Any implementation will definitely read everything. (Some languages do have some kind of "stop here" token, which causes the implementation to stop reading at that token, but in those cases the text after it is simply not considered part of the program at all.) Even if it did decide to skip, say, the body of an if-statement, it would still need to actually read it in order to know how far to skip.

A compiler will also never skip anything based on user input, because it can't possibly know what the user input will be (the compiler has long finished its job by the time the program actually runs). It might skip generating code if it can tell that a piece of code will never get executed (say, if the programmer wrote something impossible like if(1==2)), but generally the code would still be parsed, just not translated to machine code.
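To make the "parsed, but not translated" point concrete, here is a minimal sketch in Python using the standard ast module. The DropDeadBranches class is a made-up name for illustration, not how any particular compiler does it; it only recognises literal constants and simple constant equality tests like 1 == 2. It assumes Python 3.9 or later for ast.unparse.

```python
import ast


class DropDeadBranches(ast.NodeTransformer):
    """Hypothetical optimizer pass: drop `if` bodies whose test is provably false."""

    def visit_If(self, node):
        self.generic_visit(node)  # optimize nested statements first
        if self._is_always_false(node.test):
            # Keep only the else branch; returning None removes the statement.
            return node.orelse or None
        return node

    @staticmethod
    def _is_always_false(test):
        # A literal falsy constant, e.g. `if 0:` or `if False:`
        if isinstance(test, ast.Constant):
            return not test.value
        # Equality between two literal constants, e.g. `1 == 2`
        if (isinstance(test, ast.Compare) and len(test.ops) == 1
                and isinstance(test.ops[0], ast.Eq)
                and isinstance(test.left, ast.Constant)
                and isinstance(test.comparators[0], ast.Constant)):
            return test.left.value != test.comparators[0].value
        return False


source = "x = 1\nif 1 == 2:\n    x = 99\ny = x + 1\n"

# The whole source is parsed first, including the branch that can never run.
tree = ast.parse(source)

# Only afterwards is the dead branch removed from the tree.
optimized = ast.fix_missing_locations(DropDeadBranches().visit(tree))
optimized_source = ast.unparse(optimized)
print(optimized_source)
```

The printed program no longer contains the x = 99 assignment, yet the parser had to read that line to build the tree in the first place.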

An interpreter could skip based on user input, as interpreters actually run at the same time as the program, but generally the only thing it would skip is execution. That is, if you have an if whose condition is false, the body obviously won't be executed. In most interpreters it would still be parsed, though. However, there are some interpreters, most notably shells, that actually do skip syntax analysis of any lines that do not get executed: after seeing an if with a false condition, such an interpreter discards all lines until one is equal to fi (but, as mentioned in the first paragraph, each line still needs to actually be read to determine whether it's equal to fi). That's the exception, though.
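You can see the "still parsed, just not executed" behaviour in Python with the built-in compile() function. This is only a small sketch; the source string and variable names are made up for the example. The broken line sits inside a branch that would never run, but the program is rejected anyway, before a single statement executes.

```python
# The body of the `if` can never run because flag is False,
# but it still has to be valid syntax.
source = (
    "flag = False\n"
    "if flag:\n"
    "    this line is not valid Python\n"
)

error_line = None
try:
    compile(source, "<example>", "exec")  # parse and compile only, nothing runs
    parsed_ok = True
except SyntaxError as err:
    parsed_ok = False
    error_line = err.lineno
    print("rejected before running anything, error on line", error_line)
```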

A common way compilers and interpreters work is this: The parser reads the whole file word-by-word (using a so-called tokenizer, which can take a stream of characters and read one word, or token, at a time from it) and while doing so generates an abstract syntax tree (AST). Once the whole file has been read into such an AST, the whole AST gets translated into some sort of intermediate code. Now various optimizations and analyses might happen (generally more in the case of compilers than interpreters), and after that a compiler will translate the intermediate code into machine code (or whatever else the target language is), while an interpreter will execute the intermediate code. Code might be skipped during optimization or execution, but the whole file has already been read and processed in various ways by then.
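Here is a toy version of that pipeline in Python, for arithmetic expressions only. It is a sketch under my own assumptions, not how any real compiler is organised: the function names (tokenize, parse, to_stack_code, run), the tuple-based AST and the little stack-machine instruction set are all invented for the example.

```python
import operator
import re

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Read the character stream and yield one token ("word") at a time."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        yield match.group(1)
        pos = match.end()


def parse(tokens):
    """Build a nested-tuple AST from all tokens (recursive descent)."""
    tokens = list(tokens)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expression():  # expression := term (("+" | "-") term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():  # term := factor (("*" | "/") factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():  # factor := number | "(" expression ")"
        token = advance()
        if token == "(":
            node = expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return ("num", int(token))
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def to_stack_code(node):
    """Translate the AST into intermediate code for a simple stack machine."""
    if node[0] == "num":
        return [("PUSH", node[1])]
    op, left, right = node
    return to_stack_code(left) + to_stack_code(right) + [(op, None)]


def run(code):
    """The interpreter half: execute the intermediate code."""
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(ops[instr](left, right))
    return stack.pop()


tree = parse(tokenize("2 + 3 * (4 - 1)"))
code = to_stack_code(tree)
result = run(code)
print(tree)
print(code)
print(result)
```

Note that run() only ever sees the finished intermediate code. By the time anything executes, the whole input has already gone through the tokenizer and the parser. A compiler would replace run() with a step that writes out machine code instead.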

Interesting & helpful, thanks :)

If you are interested in an actual example of a parser that could be used to build a programming language, take a look at parser.moonyweb.com. You can try it online to get some insight into the "abstract syntax tree" it creates for a text from some grammar rules.

Some programming languages are designed so that the parser and compiler can work from top down -- that is, the meaning and compilation of a particular function depends only on the code that comes before it. C, C++, and probably Pascal work this way. Thus, the compiler doesn't need to parse or read the whole file, before generating output. A long time ago you could take advantage of this to compile files on low-memory systems. It would be a bad idea nowadays, though, for all sorts of reasons related to compilation speed, optimizations, and user friendliness.

Thus, the compiler doesn't need to parse or read the whole file, before generating output

But here you're just saying that the compiler can start writing the .exe (or whatever) file before it's done parsing, right? By the time the compiler is done, it will still have parsed the whole file. It won't skip anything (barring preprocessor directives, which still need to at least read everything they're skipping and of course don't depend on user input in any way).

Of course it will eventually parse the whole file, unless an error happens, in which case it could stop, or a user-friendly compiler might try to find some more errors.

I remember some C compilers that created assembly code first and then went on to object code and then added an executable stub. That allowed you to add assembly code directly to the C code.

Python compiles to byte code (which looks somewhat like assembly code) that is then interpreted. The compiler separates code into main code and functions/classes/methods. Imports can be handled as precompiled byte code.
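If you want to look at this yourself, the standard dis module will print the byte code. The snippet below is just a sketch with a made-up source string; the exact instructions printed differ between Python versions, so I'm not showing the output. It also shows that the function body ends up as its own code object, stored among the constants of the main (module-level) code.

```python
import dis
import types

source = "def double(n):\n    return n * 2\n\nresult = double(21)\n"

# Compile the whole source to a code object without running it.
module_code = compile(source, "<example>", "exec")

# The function body is compiled separately and stored as a constant
# of the module-level code object.
function_codes = [c for c in module_code.co_consts
                  if isinstance(c, types.CodeType)]
print([c.co_name for c in function_codes])

# Show the assembly-like byte code for the main code and the function.
dis.dis(module_code)

# Only now does anything actually execute.
namespace = {}
exec(module_code, namespace)
print(namespace["result"])
```

For imported modules, CPython normally caches the compiled byte code as .pyc files in a __pycache__ directory, so the source doesn't have to be compiled again on the next import.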

Google's Go compiler is optimized for compile speed and is several orders of magnitude faster than the commonly used C++ compiler. The syntax of Go caters to that.

Go's syntax doesn't really cater to compilation times in any particularly special way. Its compilation time wins come from other aspects of the language design, like its lack of generics and of cyclic module imports.

Actually Go has done away with the convoluted overlapping header files that C and C++ use, in an effort to speed things up. Also, you can't import a package you are not using. One could consider this part of the syntax.

One could consider this part of the syntax.

Then one'd be wrong, because it's not part of the syntax.
