Hi! I'm trying to write a C program that extracts the variable definitions and variable uses on each line of an input C program.
A variable is said to be 'defined' when its value changes (declarations included) and 'used' when it is referenced. I need to handle this in all cases, including calls to built-in and user-defined functions.
For example, consider the following input program:

int main()
{
int a,b,c;
a=10;
b=20;
c=a+b;
printf("c=%d",c);
}

On line 3, a, b and c are defined and nothing is referenced, so for line 3 the 'defined' column must list a,b,c and the 'used' column should show just a '-'.
Similarly, on line 6, 'c' is defined and both 'a' and 'b' are used, so the 'defined' column should list c and the 'used' column should list a,b.
I really don't know how to start. Can anyone please give me an idea?
Thanks in advance.

All 3 Replies

This is a non-trivial program. The first step is to parse the C code into tokens and recognize identifiers, which is a difficulty in and of itself. Once you have a list of tokens, you need to determine the purpose of each identifier in the list (declaration, value context, or object context). Declarations and object contexts are where the value may change. Value contexts are where the value is referenced.
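To make the token idea concrete, here is a rough sketch for the simplest case only. It assumes each input line is a single expression statement such as c=a+b; or a function call, with no declarations, keywords, pointers, or compound assignments like +=. The names classify_line, add_name and LIST_LEN are made up for this illustration. An identifier directly followed by a lone = is treated as defined, an identifier followed by ( is treated as a function name and skipped, and every other identifier is treated as used.

```c
#include <ctype.h>
#include <string.h>

#define LIST_LEN 128  /* hypothetical size of each output list */

/* Appends name (len chars) to a comma-separated list if it still fits. */
static void add_name(char *list, const char *name, size_t len)
{
    size_t used = strlen(list);

    if (used + len + 2 > LIST_LEN)
        return;
    if (used > 0)
        list[used++] = ',';
    memcpy(list + used, name, len);
    list[used + len] = '\0';
}

/* Fills 'defined' and 'used' (each LIST_LEN bytes) for one simple statement.
   Assumption: no declarations or keywords appear on the line. */
static void classify_line(const char *line, char *defined, char *used)
{
    const char *p = line;

    defined[0] = used[0] = '\0';
    while (*p != '\0') {
        if (*p == '"') {
            /* Skip a string literal so its letters are not seen as names. */
            p++;
            while (*p != '\0' && *p != '"') {
                if (*p == '\\' && p[1] != '\0')
                    p++;
                p++;
            }
            if (*p == '"')
                p++;
        } else if (isdigit((unsigned char)*p)) {
            /* Skip a number, including suffixes or hex digits such as 0x1F. */
            while (isalnum((unsigned char)*p) || *p == '.')
                p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            const char *q;

            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            q = p;
            while (isspace((unsigned char)*q))
                q++;
            if (*q == '(') {
                /* Function name, not a variable: ignore it. */
            } else if (*q == '=' && q[1] != '=') {
                add_name(defined, start, (size_t)(p - start));
            } else {
                add_name(used, start, (size_t)(p - start));
            }
        } else {
            p++;
        }
    }
}
```

Calling classify_line on c=a+b; from the example program should leave "c" in the defined list and "a,b" in the used list. Anything involving pointers or the preprocessor needs far more than this.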

The problem is that you can reference the object without directly changing the value even though the value eventually gets changed. This is a somewhat difficult case to determine. For example:

void foo(int *p)
{
    *p = 10;
}

int main(void)
{
    int x;

    foo(&x);

    return 0;
}

x is modified through p, so *p = 10 would count as a "definition" for x in your requirements.

Another interesting difficulty is the preprocessor, which can create synonymous tokens and further hide modifications to a variable:

#define y x

int main(void)
{
    int x;

    y = 10;

    return 0;
}

This brings up the question of whether to assume the input source code has already been preprocessed, or to preprocess it yourself. Working with a preprocessed translation unit would be vastly simpler because then you wouldn't need to worry about headers, though you'd still have potential issues with external definitions and other translation units managed by the linker.

I'll go out on a limb and assume this is a school assignment, in which case you can likely cover the simplest of cases (direct variable usage with no aliasing) and meet the requirements. I'll warn you, though, that this kind of feature is very often on the list of desirables for a compiler (to aid in compile-time bug finding), yet it is regularly rejected due to its complexity and cost.


What Narue said is all very true. However, to get you on your way, first you need to write the code that can identify variable declarations/definitions, such as: int a,b,c; Next, you need to break out each variable in this list, which requires that you recognize the comma delimiters and the terminating semicolon. Note that sometimes these declarations may span lines. Also note that you may see something like this: int a=3,b,c; or this: int a = 3, b, c; I could go on, but you should be starting to see why Narue said that this gets complex quickly. As a result, my guess is that the purpose of this exercise is to get you thinking about parsing and parsing problems, a very important part of software engineering.
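As a starting point, breaking out the names might look something like the sketch below. It assumes the declaration starts with the keyword int and that every declarator is a plain name with an optional initializer (no pointers, arrays, or function declarators). The names declared_names, name_char, MAX_NAMES and MAX_NAME_LEN are invented for this example. Because newlines count as whitespace, a declaration that spans lines is handled as long as the whole text is passed in as one string.

```c
#include <ctype.h>
#include <string.h>

#define MAX_NAMES 16     /* hypothetical limits for this sketch */
#define MAX_NAME_LEN 32

/* True if c may appear inside an identifier. */
static int name_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copies the names declared in text such as "int a = 3, b, c;" into names[]
   and returns how many were found. Assumption: type is int, plain names only. */
static int declared_names(const char *decl, char names[][MAX_NAME_LEN],
                          int max_names)
{
    const char *p = decl;
    int count = 0;

    while (isspace((unsigned char)*p))
        p++;
    if (strncmp(p, "int", 3) != 0 || name_char(p[3]))
        return 0;                       /* not an int declaration */
    p += 3;

    while (*p != '\0' && *p != ';' && count < max_names) {
        size_t len = 0;
        int depth = 0;

        while (isspace((unsigned char)*p))
            p++;
        while (name_char(p[len]))
            len++;
        if (len == 0 || len >= MAX_NAME_LEN || isdigit((unsigned char)p[0]))
            break;                      /* not a plain name: give up */
        memcpy(names[count], p, len);
        names[count][len] = '\0';
        count++;
        p += len;

        /* Skip an optional initializer up to the next top-level ',' or ';'.
           Commas inside parentheses belong to the initializer. */
        while (*p != '\0' && *p != ';' && !(*p == ',' && depth == 0)) {
            if (*p == '(')
                depth++;
            else if (*p == ')')
                depth--;
            p++;
        }
        if (*p == ',')
            p++;
    }
    return count;
}
```

For int a = 3, b, c; this should report three names: a, b and c. Note that the names used inside an initializer (for example y in int x = y;) are skipped here, so a full solution would still have to record them as 'used'.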

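Probably the most useful function available in C for this task is isalnum(), declared in <ctype.h>. It will help you a great deal in skipping characters that cannot be part of a variable name, although watch for '_', which is valid in an identifier but is rejected by isalnum(). Also remember that an identifier cannot start with a digit.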
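A small sketch of that idea, with the underscore and leading-digit cases handled. The helper names is_ident_start, is_ident_char and ident_length are made up for this example and are not standard functions.

```c
#include <ctype.h>
#include <stddef.h>

/* True if c may begin an identifier: a letter or '_'. */
static int is_ident_start(int c)
{
    return isalpha((unsigned char)c) || c == '_';
}

/* True if c may continue an identifier: a letter, a digit, or '_'. */
static int is_ident_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Returns the length of the identifier that starts at s, or 0 if s does
   not start with one (for example, when it starts with a digit). */
static size_t ident_length(const char *s)
{
    size_t n = 0;

    if (!is_ident_start(s[0]))
        return 0;
    while (is_ident_char(s[n]))
        n++;
    return n;
}
```

The cast to unsigned char is there because passing a negative char value to the <ctype.h> functions is undefined behavior.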
