Friday, June 8, 2007

Moving from VC++ to GCC ..

I have been using Visual Studio for doing my code development for the past three years and am new to the Unix environment. I wanted to learn this so that it would help me to work on both platforms without any worries. So in this article I am planning to talk about the transition from VC to GCC.

Step 1: Cygwin
Cygwin is a collection of software that provides a Unix-like environment on the Windows platform. (This is too generic an explanation; check out the link for more details.) To install it, go to the following link and download the setup program. Choose the packages (the equivalent of software in Windows) you need in your Unix environment and install them.
Basic things that are required are:
  • VI Editor - for file editing
  • GCC and G++ - Compilers
  • GDB - Debugger for debugging your code
Step 2: G++ and GCC
The next thing to understand is how compilation takes place on Unix systems.

Compiling and Running a Single File:
Let's start by compiling a single file.

$ gcc sample.c

This compiles the code in sample.c and, if there are no compilation or linking errors, creates an output file a.out (named so for historic reasons; under Cygwin the default name is a.exe). If the output file name needs to be changed, it can be done by

$ gcc sample.c -o sample

which produces the output file "sample".
To run the file we need to specify the path and the file name as shown below

$ ./sample

Sometimes the executable permission for the file needs to be set, in which case it can be modified by the following command

$ chmod 777 sample

This gives read, write and execute permission to the owner, the group and all other users. If only the execute permission is missing, "chmod +x sample" is enough.

Creating Debug Ready Code:
To debug code, the generated executable must contain debug information, so that when we run it in the debugger we can break at a certain point, view variables, etc.

$ g++ -g sample.c -o sample

The output file size is bigger than the normal compiled file. This extra information can be removed by using the command "strip"

$ strip sample

This basically throws out the symbol information. For more details read "man strip"

Creating Optimized Code:
When we ask for optimized code (say, for code size and speed), the compiler applies various algorithms and tends to be a little slower in producing the output file.

$ g++ -O sample.c -o sample

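For higher optimization levels we can use a number along with the '-O' option. It is commonly recommended not to go beyond 2: higher levels such as -O3 tend to increase code size and compile time, and aggressive optimization is more likely to expose latent bugs in the code (for example, reliance on undefined behavior)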

$ g++ -O2 sample.c -o sample

For other options it is better to read the man pages.

Compiling Multiple Files:
As we know, projects are rarely made of a single file; the code is split across files to keep it organized and understandable. The easiest way to build such a project is to compile all the source files in a single command

$ g++ sample.cpp sample1.cpp sample2.cpp -o sample

But the problem with this is that we need to recompile all the files even if only one of them has changed, so we separate the process into two steps
  • Compiling - To create .o (object files)
  • Linking - To link the object files to form the executable (or output)
$ g++ -c sample.cpp
$ g++ -c sample1.cpp
$ g++ -c sample2.cpp
$ g++ sample.o sample1.o sample2.o -o sample

The first three commands use the "-c" option to tell the compiler to stop after creating the object files, and the final line links all the object files to form the output. The intention behind this split-up is that after a change only the modified files need to be recompiled, and linking takes relatively less time than compilation.

Steps Involved In Compilation:
We just saw the various compilation methods and options, but it is better to also understand what is done when we invoke the g++ (or cc or gcc or acc) command.
  • Driver - It is what is invoked when we call "g++". It is an "engine" which drives the various tools of the compiler. When invoked, it runs the tools involved, passing the output of one as the input of the next
  • C Pre-Processor - normally called "cpp". It handles all the pre-processor directives (#define, #include, #ifdef etc). The '-E' option stops after this stage
    • $ g++ -E sample.cpp
  • Compiler - normally called "cc1" ("cc1plus" for C++). Translates the pre-processed source into assembly code. The '-S' option stops after this stage
    • $ g++ -S sample.cpp
  • Assembler - normally called "as". Takes the assembly code and creates the object file ('-c' option)
    • $ g++ -c sample.cpp
  • Linker-Loader - normally called "ld". Takes all the object files and creates an executable in a format which the operating system supports. The executable records its internal structure: location of the data segment, location of the code segment, location of debug information and so on.
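To watch each stage at work, a small file like the one below can be pushed through the stages one at a time. This is only a sketch: the file name stages.c, the macros SQUARE and LOG, and the function area_of_square are made up for illustration.

```c
/* stages.c - hypothetical file for watching each compilation stage */
#include <stdio.h>

/* Expanded by the pre-processor: SQUARE(side) becomes ((side) * (side)) */
#define SQUARE(x) ((x) * (x))

/* Selected by the pre-processor: compile with -DVERBOSE to keep the message */
#ifdef VERBOSE
#define LOG(msg) puts(msg)
#else
#define LOG(msg) ((void)0)
#endif

/* After pre-processing, no macro names remain in this function body */
int area_of_square(int side)
{
    LOG("computing area");
    return SQUARE(side);
}
```

Running "gcc -E stages.c" should print the source with the macros expanded, "gcc -S stages.c" should leave the assembly code in stages.s, and "gcc -c stages.c" should leave the object file stages.o. Since the file has no main function it can be compiled and assembled, but not linked into a program on its own.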
Step 3: Debugging Using GDB
gdb stands for "GNU Debugger". It is a debugging tool in the Unix environment. It helps you to step through the running code, break at chosen points and print values. It is a command line debugger and, unlike the VC debugger, does not have separate windows for viewing variables, the stack etc., but it is very powerful.

To Invoke gdb:

To run the executable in the debugger we first need to compile the code with debug information.

$ g++ -g sample.c -o sample.exe
$ gdb sample.exe

gdb needs to be able to find the source files, or else it cannot show us where we are currently executing. The simplest way is to run gdb in the directory in which the source file is present. Other source directories can be made known to gdb with its "directory" command

Running a program inside gdb:
To start running the code inside gdb we use the run command. To pass command line arguments to the program, we append them to run. Since a space is treated as an argument separator, strings containing spaces must be placed inside quotes.

$ run "hi how r u" "hope this is useful"
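A program that echoes its arguments makes it easy to check that each quoted string arrives as a single argument. The sketch below assumes a helper named print_args (a made-up name, not part of any library) that main would call with its own argc and argv.

```c
/* Hypothetical helper for sample.c: lists the run-time arguments */
#include <stdio.h>

/* Prints every argument after the program name and returns how many it printed */
int print_args(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; i++) {
        printf("argument %d: %s\n", i, argv[i]);
    }
    return argc > 1 ? argc - 1 : 0;
}
```

If main simply calls print_args(argc, argv), the run command above should print two lines, one per quoted string. Without the quotes each word would show up as a separate argument.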


Setting Breakpoints:
Breakpoints are places where you want the program to pause so that you can view the state of the execution (variable values, the stack etc). They can be set up by
  • specifying line number
    • $ break sample.cpp:9
  • specifying the function name
    • $ break main
Stepping through the commands:
Once you have paused inside the code you can "step" through it one line at a time as follows
  • To step into a function (if it is a function call)
    • $ step
  • To execute the present line and go to the next one (stepping over any function call)
    • $ next
To learn more about setting breakpoints and using conditions to break, type "help break" or "help breakpoints" inside gdb

Printing variables and expressions:
The values at the position where you stopped can be checked by using

$ print i

and the output would be

$ $1 = 10

meaning "i" is 10 (this is printed only if "i" is in scope). If that is not the case, an error message is printed

$ No symbol "i" in current context.


Viewing the stack:
To get the stack trace information or to know "where" you are in the code use

$ where

and the following output is obtained

$ #0 print_string (num=1, string=0xbffffc9a "hello") at debug_me.c:7
$ #1 0x80484e3 in main (argc=1, argv=0xbffffba4) at debug_me.c:23

The #0 and #1 are the stack frames, so any local variable inside one of these stack frames can be viewed by first switching to that frame and then printing the variables in it

$ frame 1
$ print i

Now the variable i in frame 1 (in function main) would be printed.
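The frame switching can be tried on a pair of functions like the ones below. This is a hypothetical reconstruction in the spirit of debug_me.c, whose real source is not shown in this article; print_all is a made-up caller that plays the role main has in the trace above.

```c
/* Hypothetical code for trying "where", "frame" and "print" */
#include <stdio.h>

/* Innermost frame (#0) when a breakpoint inside this function is hit */
int print_string(int num, const char *string)
{
    return printf("String '%d' - '%s'\n", num, string);
}

/* Caller: becomes frame #1, so its local "i" is visible only after "frame 1" */
int print_all(int count, const char *strings[])
{
    int i;
    int printed = 0;

    for (i = 0; i < count; i++) {
        if (print_string(i + 1, strings[i]) > 0) {
            printed++;
        }
    }
    return printed;
}
```

With a breakpoint set on print_string, "where" should list print_string as #0 and print_all as #1. Typing "print i" in frame #0 should fail because print_string has no variable of that name, while the same command after "frame 1" should show the loop counter of print_all.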

Attaching to already running process:
There can be scenarios in which we want to attach to an already running process, such as a process running in the background (to run a process in the background append a '&' to the command, as in ./sample&). In that case we can attach gdb to that process by giving the program name and the process id (if we give only the program name, a new instance of the program is loaded into gdb instead)

$ gdb sample 9261

Here the assumption is that sample is the program name and 9261 is its process ID. With this command gdb attaches itself to the program and pauses it. From here we can use "where" to find out the position in the program and start our debugging.

Debugging a crashed code:
There is always a possibility that code can crash. When a program crashes, a "core" file is dumped (provided core dumps are enabled, for example with "ulimit -c unlimited" in the shell), which holds the state of the memory during the crash, stack information and so on. This can be viewed by loading it into gdb.

$ gdb sample core

Usually this file is created in the directory in which the executable is running (the dump is triggered by fatal signals such as SIGSEGV, whose details I do not fully understand). There can be cases where the crash is due to memory corruption; in that case the stack information itself may be invalid and gdb may not help.
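A function with a missing NULL check is a simple way to produce a crash to practise on. The sketch below is an assumption for illustration only; unsafe_length is a made-up name.

```c
/* Hypothetical crash candidate: works for real strings, faults on NULL */
#include <stddef.h>

/* Returns the length of a string; deliberately has no NULL check */
size_t unsafe_length(const char *text)
{
    size_t n = 0;

    while (text[n] != '\0') {   /* reading through a NULL pointer faults here */
        n++;
    }
    return n;
}
```

Calling unsafe_length(NULL) from main typically ends with a segmentation fault. If the program was compiled with -g and core dumps are enabled, loading the core file and typing "where" should show unsafe_length as the innermost frame.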

Step 4: MAKE Utility - Automating Compilation
Compiling more than one source file by hand quickly becomes annoying, and it helps if this can be automated by some simple scripts. This is where the make utility (or makefiles) comes into play.
A makefile is a collection of instructions used to compile our program. Whenever some files of our program are modified we type "make", which recompiles only what is affected by the modified files and creates the output file. However, this is not done by magic but by the set of rules and dependencies we supply in the makefile.

Makefile Components:
The make file consists of

Variable definitions - Lines defining values for variables using '='
CFLAGS = -g -Wall
SRCS = main.c sample.c

Dependency Rules - under what conditions should a file be re-compiled
main.o: main.c
	g++ -c -Wall main.c -o main.o

Here the main.o file is to be updated whenever the main.c file is modified (the dependency shown in line one). The method used to update main.o is specified by the command on the next line. A command line always starts with a tab character followed by the necessary command

Comment - A line starting with # is considered as comment
# this line is commented

Order of Updating Files:
The make command runs the commands specified in the makefile, checking dependencies recursively. Every makefile contains at least one target. A target can be a file which make needs to update or just a name marking a starting point (in which case it is referred to as a 'phony target'). make checks the dependencies of the target, then recursively checks the dependencies of each of those files. The recursion ends at a dependency that has no rule of its own but exists as a file (such a file is assumed to be up to date). The commands are then executed to update every out-of-date dependency until the current target is updated. This appears a little complex and becomes clear from the example below.

Single File Compilation:
Let's consider an example and then look at the features of make.

# top level rule to compile all
all: main

# Linking the object file
main: main.o
	gcc -g main.o -o main

# compiling the source
main.o: main.c
	gcc -g -Wall -c main.c

#cleaning everything after creation
clean:
	/bin/rm -f main.o main

1) Not every rule needs a command, e.g. all: main
2) Not every rule needs a dependency, as in 'clean', in which case the command is executed without checking any condition
3) Not every rule is used each time make is invoked. For example, clean is not used when the program is compiled; it is run separately ("make clean") to remove the generated files

Compiling more than one file:
Moving to the next stage, when there are multiple files to compile in a project (as in real world scenarios), we try to make the makefile as flexible as possible so that it can grow along with the project.

#the top level rule
all: main

#the program is made of several source files
main: main.o file1.o file2.o
	g++ -g main.o file1.o file2.o -o main

#dependencies of main
main.o: main.c
	g++ -g -Wall -c main.c

#dependencies of file1
file1.o: file1.c file1.h
	g++ -g -Wall -c file1.c

#dependencies of file2
file2.o: file2.c file2.h
	g++ -g -Wall -c file2.c

#rule for cleaning
clean:
	/bin/rm -f file1.o file2.o main.o main

1) As seen each file has its own rule.
2) Each file has dependency over its header file because if the dependency covers only the source file, there can be cases in which the header file is modified but since the source file is not modified the re-compilation is not invoked thus leading to a wrong result.
As seen there are many redundancies in the above way of representation and these can be removed by utilizing the features of makefile

Using Compiler and Linker Flags:
In the above example we compile all the files in debug mode, and once debugging is done, generating optimized code means going to every rule and changing the options. This process is cumbersome and error prone. To avoid it, it is better to assign the flag options and the compiler name (since this can also change) to variables and use the variables when writing the makefile.

# Use "gcc" to compile
CC=gcc
#Sometimes the linker can be different from the compiler
LD=gcc
# the compiler flags
CFLAGS = -g -Wall -c
#Linker Flags
LFLAGS= -g
#Command used to remove files
RM=/bin/rm
#List of generated object files
OBJ=main.o file1.o file2.o
#program executable filename
PROG=main

#the top level rule
all: $(PROG)

#the program is made of several source files
$(PROG) : $(OBJ)
	$(LD) $(LFLAGS) $(OBJ) -o $(PROG)

#dependencies of file1
file1.o: file1.c file1.h
	$(CC) $(CFLAGS) file1.c

#dependencies of file2
file2.o: file2.c file2.h
	$(CC) $(CFLAGS) file2.c

#rule for cleaning
clean:
	$(RM) -f $(OBJ) $(PROG)

Even after this modification (where even the remotest possibility of a change in settings was assigned to a variable) there is still the issue of writing a new rule for every file included in the project, which is taken care of by "File Type" rules

# each source file dependency can be replaced by
%.o: %.c
	$(CC) $(CFLAGS) $<

Filetypes:
1) '%' is a wildcard: the rule says that any file ending in .o depends on the file with the same stem ending in .c, i.e. file1.o depends on file1.c (and not on file2.c)
2) '$<' expands to the first dependency matched by the rule. For example, for file2.o the rule behaves like "file2.o: file2.c" with the command "$(CC) $(CFLAGS) $<", where $< is replaced by file2.c

The pattern rule no longer lists the header files that each object file depends on. These dependencies can be generated automatically with the makedepend utility:

#define source file
SRCS=main.c file1.c file2.c
# All others remain the same
#........
# Depend rule
depend:
	$(RM) -f .depend
	makedepend -f- -- $(CFLAGS) -- $(SRCS) > .depend
#Now add a line to include the .depend file
include .depend

So in the above example the makedepend utility scans all the .c source files, creates a dependency list of the header files each one includes, and writes it to the .depend file. Since this file is included in the makefile, the rules are updated with the header dependencies.
