Hi,

So far I have been concentrating on the programming side of C.
Now it's time to learn some of the internals of a C compiler:

Lexical analyzer
Parser
Hash tables
Syntax checkers
Expression evaluators
Symbol table
etc.

Please share some information about the symbol table:

What are the contents of the symbol table,
and how can I view them?

I am using the GCC compiler on Linux.

Thanks

Answering only this part of the question:

and how to view the contents of the symbol table.

If you are using MinGW, GCC, DJGPP, or another GCC-based toolchain,
you can use the nm utility to view the symbol table of
any binary or object file:

nm mybinary.o

Read the man page for nm:
http://unixhelp.ed.ac.uk/CGI/man-cgi?nm

The Microsoft toolchain has tools such as dumpbin.exe, and
Borland has its own tools.
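
To see how source-level declarations map to the letters nm prints, here is a small sketch. The file name symkinds.c and all identifiers in it are made up for illustration. The letters in the comments are what GCC on an ELF target typically produces when compiling without optimization; with optimization, unused or inlined static items may disappear from the listing.

```c
/* symkinds.c (hypothetical file): one definition per common nm symbol class.
   Try: gcc -c symkinds.c && nm symkinds.o */
#include <stdio.h>

int        init_global = 10;  /* 'D': initialized data, external linkage */
int        zero_global;       /* 'B' or 'C' (common), depending on GCC version and flags */
const int  ro_global = 7;     /* 'R': read-only data */
static int init_static = 3;   /* 'd': initialized data, internal linkage */
static int zero_static;       /* 'b': zero-initialized data (.bss), internal linkage */

/* 't': code with internal linkage */
static int helper(int x)
{
    zero_static += x;
    return init_static + zero_static;
}

/* 'T': code with external linkage; the call to printf is listed as 'U' (undefined) */
int report(int x)
{
    int local = helper(x);    /* automatic variable: no symbol-table entry */
    printf("%d %d %d\n", init_global, ro_global, local);
    return local;
}
```

Upper-case letters mark symbols visible to other object files, lower-case letters mark symbols local to this one, and U marks a name that some other object file or library has to supply at link time.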

I just tried checking the symbol table using the following files.

Head_Symb.h

#define LV 20
extern int Glb_Var;
extern int fun( char *p , char a[]);

Lib_Symb.c

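#include <stdio.h>      /* needed for printf */
#include "Head_Symb.h"

int fun( char *p , char a[])
{
        static int Stc_Var;     /* static local: listed by nm as Stc_Var.0 */
        int Lcl_Var2;           /* automatic local: no symbol-table entry */

        printf("Ptr: = %s\n", p);
        printf("Arr: = %s\n", a);
        printf(" Glb_Var = %d \n", Glb_Var);
        return 0;
}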

Main_Symb.c

#include "Head_Symb.h"
#include<stdio.h>
int Glb_Var = 10;
int main( void )
{
        int Local_Var = LV;
        char *Ptr = "pointer";
        char Arr[]= "array";
        fun ( Ptr , Arr);
        return 0;
}

I am using the following commands:

gcc -c *.c

This generates Lib_Symb.o and Main_Symb.o.

When I run

nm Lib_Symb.o

the following is listed:

00000000 T fun
                 U Glb_Var
                 U printf
00000000 b Stc_Var.0

and when I run

nm Main_Symb.o

I get:

U fun
00000000 D Glb_Var
00000000 T main

Then I build the executable with

gcc *.c

and when I run

nm a.out

(or just nm, which reads a.out by default) I get the following list:

08049628 A __bss_start
080482e4 t call_gmon_start
08049628 b completed.1
08049528 d __CTOR_END__
08049524 d __CTOR_LIST__
08049618 D __data_start
08049618 W data_start
080484a4 t __do_global_ctors_aux
08048308 t __do_global_dtors_aux
0804961c D __dso_handle
08049530 d __DTOR_END__
0804952c d __DTOR_LIST__
08049538 D _DYNAMIC
08049628 A _edata
08049630 A _end
080484c8 T _fini
08049524 A __fini_array_end
08049524 A __fini_array_start
080484e4 R _fp_hw
0804833c t frame_dummy
08048520 r __FRAME_END__
08048368 T fun
08049624 D Glb_Var
08049604 D _GLOBAL_OFFSET_TABLE_
         w __gmon_start__
08048278 T _init
08049524 A __init_array_end
08049524 A __init_array_start
080484e8 R _IO_stdin_used
08049534 d __JCR_END__
08049534 d __JCR_LIST__
         w _Jv_RegisterClasses
08048460 T __libc_csu_fini
0804840c T __libc_csu_init
         U __libc_start_main@@GLIBC_2.0
080483b4 T main
08049620 d p.0
08049524 A __preinit_array_end
08049524 A __preinit_array_start
         U printf@@GLIBC_2.0
080482c0 T _start
0804962c b Stc_Var.0

The things I don't understand here are:

1) Why don't the nm listings for Lib_Symb.o and Main_Symb.o show any real addresses? All the addresses there are zero.

2) Is there any relationship between the addresses in the nm a.out listing? Several different symbols are reported at the same address, for example:

08049628 A __bss_start
08049628 b completed.1
08049628 A _edata

What do all of these mean?
I understood a little from the link you provided.

Thanks for the link.
But overall I did not completely understand what the symbol table of the .o files (Lib_Symb.o and Main_Symb.o) consists of, and what the symbol table of a.out consists of.

Please give a brief explanation or some links.

Thanks

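Glb_Var is placed in the data section (the "D" denotes that) and
main is placed in the text (code) section (the "T"). Each section of an
object file starts at offset zero, which is why both show the same address.

You can also see many symbols starting with an underscore.
That is why C and C++ recommend against starting your own identifiers
with an underscore, even where it is valid: such names are reserved for
the implementation. Most of these symbols are internal startup and helper
routines or section markers.

I assume you already know what the .text (code), .bss and .data sections are
and how an a.out file is laid out.
If not: http://en.wikipedia.org/wiki/A.out
(Note that on a current Linux system the file named a.out is actually in ELF format; only the default file name is historical.)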
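
The marker symbols _edata and _end from the nm a.out listing can also be inspected from inside a program. The following is only a sketch: it assumes a Linux toolchain whose linker defines these symbols (GNU ld does; the end(3) man page documents the unprefixed names etext, edata and end). The variable and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: the linker defines these marker symbols, as GNU ld does on Linux. */
extern char _edata[];   /* first address past the initialized data */
extern char _end[];     /* first address past the .bss section */

int data_var = 10;      /* initialized, so it is expected below _edata */
static int bss_var;     /* zero-initialized, so it is expected in [_edata, _end) */

/* Returning the address keeps bss_var from being optimized away. */
int *bss_var_addr(void)
{
    return &bss_var;
}

/* True if p lies below the end of the initialized data. */
int below_edata(const void *p)
{
    return (uintptr_t)p < (uintptr_t)_edata;
}

/* True if p lies between the end of initialized data and the end of .bss. */
int in_bss(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)_edata && a < (uintptr_t)_end;
}
```

With these helpers, below_edata(&data_var) and in_bss(bss_var_addr()) should both be true. That matches the listing in the question, where Glb_Var (08049624) sits just below _edata (08049628) and Stc_Var.0 (0804962c) sits between _edata and _end (08049630).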

1. To answer your first question: these are object files, meaning they still have to be linked into the final executable. Assigning the final memory addresses is part of the linking process, so in an object file each symbol's address is just an offset from the start (zero) of its section.

Note: another binary utility you should try is objdump:

objdump -D a.out > testfile

To demonstrate my answer, take this simple piece of code and compile it with

gcc -c getsetx.c

unsigned long x = 0;
unsigned long y = 0;

unsigned long gety()
{
	return y;
}

void sety(unsigned long val)
{
	y =  val;
}

unsigned long getx()
{
	return x;
}

void setx(unsigned long val)
{
	x =  val;
}

The output of nm getsetx.o is:

0000000000000022 T getx
0000000000000000 T gety
000000000000002f T setx
000000000000000d T sety
0000000000000000 B x
0000000000000008 B y

and the output of objdump -d getsetx.o is:


getsetx.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <gety>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # b <gety+0xb>
   b:   c9                      leaveq
   c:   c3                      retq

000000000000000d <sety>:
   d:   55                      push   %rbp
   e:   48 89 e5                mov    %rsp,%rbp
  11:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
  15:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  19:   48 89 05 00 00 00 00    mov    %rax,0x0(%rip)        # 20 <sety+0x13>
  20:   c9                      leaveq
  21:   c3                      retq

0000000000000022 <getx>:
  22:   55                      push   %rbp
  23:   48 89 e5                mov    %rsp,%rbp
  26:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 2d <getx+0xb>
  2d:   c9                      leaveq
  2e:   c3                      retq

000000000000002f <setx>:
  2f:   55                      push   %rbp
  30:   48 89 e5                mov    %rsp,%rbp
  33:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
  37:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  3b:   48 89 05 00 00 00 00    mov    %rax,0x0(%rip)        # 42 <setx+0x13>
  42:   c9                      leaveq
  43:   c3                      retq

Disassembly of section .bss:

0000000000000000 <x>:
...

0000000000000008 <y>:
...

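You can see that each section starts its addressing at zero. The zeroed displacement bytes (00 00 00 00) in the mov instructions are placeholders for the addresses of x and y. This is simply an easy scheme for the linker: once it knows where each section starts in the final executable, it adds that start address to every offset and fills in the placeholders.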
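
The arithmetic behind that can be sketched in a few lines of C. The offsets are the ones nm reported for getsetx.o; the function name link_address and the base address used in the note below are assumptions made up for illustration, not what any particular linker will choose.

```c
#include <stdint.h>

/* Section-relative offsets that nm reported for the functions in getsetx.o. */
enum { GETY_OFF = 0x00, SETY_OFF = 0x0d, GETX_OFF = 0x22, SETX_OFF = 0x2f };

/* Final address = address where the linker placed this object's section
   + the symbol's offset inside that section. */
uint64_t link_address(uint64_t section_base, uint64_t offset)
{
    return section_base + offset;
}
```

For example, if the linker happened to place this object's .text at 0x401000 (a hypothetical value), link_address(0x401000, SETX_OFF) gives 0x40102f for setx. In a real link the .text sections of several object files are concatenated, so each object file gets its own base.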

Thanks, gerard and NicAx64, for your answers.
