Wednesday, June 13, 2007

Verilog Basics (II)

Issues with Verilog:

Incomplete specification of conditions (if-else or case) leads to 'latches' (described later)

module mux_3to1(a,b,c,sel,out);
input [0:1] sel;
input a,b,c;
output out;
reg out;
always @(a or b or c or sel) begin
case (sel)
2'b00: out = a;
2'b01: out = b;
2'b10: out = c; // what happens when sel is 2'b11????
endcase
end

endmodule
  • Thus, to handle the case of sel being 'three', a latch is introduced (which is unnecessary) to store the previous value.
  • The same is the case with an 'if' condition without an 'else'.
  • To avoid this, either precede the condition with a default value for the output or handle all the cases:
    • Each if with a else condition
    • Each case with a default condition
CASE1:
always @(a or b or c or sel) begin
out = 1'b1; // default assignment covers the unhandled case
case (sel)
2'b00: out = a;
2'b01: out = b;
2'b10: out = c;
endcase
end

CASE2:
always @(a or b or c or sel) begin
case (sel)
2'b00: out = a;
2'b01: out = b;
2'b10: out = c;
default: out = 1'b1;
endcase
end

Priority Logic:
Consider the case of a 4-to-2 encoder; the truth table looks like
(I3,I2,I1,I0 | E1,E0):
(0,0,0,1 | 0,0), (0,0,1,0 | 0,1), (0,1,0,0 | 1,0), (1,0,0,0 | 1,1),
for all others (x,x,x,x | x,x).
The code for such a module may look like

module may_encode_4to2(i,e);
input [0:3] i;
output [0:1] e;
reg [0:1] e;

always @ (i) begin
if(i[0]==1) e=2'b00;
else if(i[1]==1) e=2'b01;
else if(i[2]==1) e=2'b10;
else if(i[3]==1) e=2'b11;
else e=2'bxx;
end

endmodule
  • Intention: Whenever two inputs are '1' the output must be don't care.
  • Consider the input i[0:3] = (1,0,0,1): the resultant output must be a don't care, but unfortunately the output is (0,0).
  • Reason: (i[0]==1) takes higher priority than the later conditions, so the final else is never reached for this input. If-else and case statements are interpreted literally, as a priority chain.
  • Resulting Circuit: A serial bit comparison circuit.
  • Prevention: Give mutually exclusive inputs to the comparison functions, as demonstrated below, so that the logic evaluated is simpler and the circuit implemented is parallel.
module encoder_4to2(i,e);
input [0:3] i;
output [0:1] e;
reg [0:1] e;

always @ (i) begin
if (i==4'b0001) e=2'b00;
else if (i==4'b0010) e=2'b01;
else if (i==4'b0100) e=2'b10;
else if (i==4'b1000) e=2'b11;
else e=2'bxx;
end

endmodule

Interconnecting Modules:
To maintain large designs, modularity plays an important role. Verilog supports higher-level modules which contain sub-modules interconnected with each other. Let's consider the example of an ALU which performs the basic functions of addition, subtraction and multiplication on two 32-bit input values (in the case of multiplication, assume the inputs are 16 bits) and outputs the requested function.
The functionality table may look like this (assume F0 and F1 are the selectors and A and B the inputs):

(F0,F1 | Func): (0,0 | A+B), (0,1 | A-B), (1,1 | A*B)

Below is an example of the ALU with its test setup. The test setup consists of initializing the registers with the required input and modifying the input after delays.

// Individual Components
module mux32three(i1,i2,i3,sel,out); // 32 bit 3 to 1 mux
input [31:0] i1,i2,i3;
input [1:0] sel;
output [31:0] out;
reg [31:0] out;

always @ (i1 or i2 or i3 or sel) begin
case (sel)
2'b00: out=i1;
2'b01: out=i2;
2'b10: out=i3;
default: out=32'bx;
endcase
end
endmodule

module add32(i1,i2,o); // 32 bit adder
input [31:0] i1,i2;
output [31:0] o;
assign o = i1+i2;
endmodule

module sub32(i1,i2,o); // 32 bit subtraction module
input [31:0] i1,i2;
output [31:0] o;
assign o = i1-i2;
endmodule

module mul16(i1,i2,o); // 16 bit multiplier
input [15:0] i1,i2;
output [31:0] o;
assign o = i1*i2;
endmodule

// ALU Module formed by inter-connecting modules
module alu(a,b,sel,alu_out); // ALU Module
input [31:0] a,b;
input [1:0] sel;
output [31:0] alu_out;

wire [31:0] add_out, sub_out, mul_out; // Intermediate wires to connect sub-modules

add32 adder(a,b,add_out);
sub32 subtract(a,b,sub_out);
mul16 multiply(a[15:0],b[15:0],mul_out);
mux32three out_mux(add_out,sub_out,mul_out,sel,alu_out);

endmodule

// Test Workspace which instantiates the alu and gives test input
module test_alu;
reg [31:0] a,b;
reg [1:0] func;

wire [31:0] our_alu_out;

initial // This block is executed only once
begin
a = 32'd10;
b = 32'd10;
func = 2'd0;
#2 // Delay element (2 time unit delay)
func = 2'd1;
#2
func = 2'd2;
end

alu our_alu(a,b,func,our_alu_out);

endmodule

Modules can be interconnected either by listing the ports in order or by associating each port with the signal connected to it by name.

alu our_alu(a,b,func,our_alu_out); // Can also be instantiated as
alu our_alu(.a(a),.alu_out(our_alu_out),.b(b),.sel(func));
// in which case the order of inputs need not be maintained

Verilog also has primitive gates which do not need an instance name, and whose ports are positional with the output first.
eg: and(out,in1,in2,in3)


Useful Boolean Functions:
  • Bitwise Operator: Operates on individual bits of a vector
    • ~a(NOT), a&b(AND), a|b(OR), a^b(XOR), a~^b(XNOR)
  • Logical Operator: Acts on the vector as a whole
    • !a(NOT), a&&b(AND), a||b(OR)
  • Reduction Operator: Acts on each bit of a single input
    • &a (AND), ~&a(NAND), |a(OR), ~|a(NOR), ^a(XOR)
    • &(4'b0110) = 0&1&1&0=1'b0
  • Comparison Operators:
    • a<b, a>b, a<=b, a>=b - relational operators
    • a==b, a!=b - logical equality (whenever a or b contains x or z the result is unknown, else returns 0 or 1 based on the comparison)
    • a===b, a!==b - case equality (does a bit-by-bit comparison even when x and z are present and returns 1 or 0 based on the comparison)
  • Note the difference between ! and ~
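A small sketch (ours, not from the original notes) contrasting these operator families on the same vectors; it can be run in any Verilog simulator:

module ops_demo; // hypothetical demonstration module
reg [3:0] a, b;
initial begin
a = 4'b0110; b = 4'b0011;
$display("a & b  = %b", a & b);   // bitwise AND  -> 0010
$display("a && b = %b", a && b);  // logical AND  -> 1 (both vectors non-zero)
$display("&a     = %b", &a);      // reduction AND of 0110 -> 0
$display("!a     = %b", !a);      // logical NOT  -> 0 (a is non-zero)
$display("~a     = %b", ~a);      // bitwise NOT  -> 1001
end
endmodule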

Verilog - Basics (I)

It is a HARDWARE DESCRIPTION LANGUAGE used to describe digital systems. It allows you to specify the digital systems at various levels of abstraction - Behavior Level, Register Transfer Level (RTL), Gate Level, Switch Level.

Design Styles:
A system, traditionally, can be designed either Bottom Up or Top Down. Bottom up is where we start with the lowest-level modules and build complex modules from them. This is usually done for smaller electronic circuits, but for huge systems the Top Down approach is preferred, since it allows the flexibility of early testing and easy change of technologies.
The basic Top Down Approach followed is as follows:
Specification ->
High Level Design ->
Low Level Design ->
RTL Coding ->
Functionality Verification (failures loop back to RTL Coding) ->
Logic Synthesis ->
Gate Level Simulation (failures loop back to RTL Coding) ->
Place and Route ->
Fabrication ->
Post Si Validation

Abstractions:
Behavioral:
  • System described by Concurrent Algorithms.
  • Each Algorithm consists of sequential instructions executed one after the other
  • Functions, Tasks and always blocks are major components
Register Transfer Level:
  • Circuit specified by operations involving transfer of data between registers
  • A clock is always used and RTL has strict time bounds
Gate Level:
  • Not advisable to code by hand, since there are tools (synthesis tools) that do this. Gate level simulation is done using the output from these tools
  • System described by logical links and timing information
  • Logic Values used : 0,1,x,z
  • Operations: AND, OR, NOT etc
Verilog Semantics:
Module:
  • Verilog design consists of inter-connection of modules
  • Modules may be an element or collection of lower level design blocks
An example of a simple module would be,

module mux_2_to_1(a,b,out,outbar,sel);
// 'module' keyword starts a module; the port list ends in a semicolon
// Comments start with a double slash
input a,b,sel; // ports are defined at the beginning of module
output out, outbar; // direction is given by input, output, inout (bi-directional)

assign out = sel?a:b; // Module behavior expressed. Each assign executes in parallel; order does not matter
assign outbar = ~out;

endmodule

Continuous (Data) Assignment:
  • The 'assign' keyword is used for continuous assignments, i.e. the right hand side is applied to the left hand side continuously. A simple way to express Combinatorial Logic
  • The target of a Continuous Assignment is a net driven by Combinatorial Logic
  • Left Side:
    • Net (Vector or Scalar / Concatenated or not)
  • Right Side:
    • Net or Register / Vector or Scalar / Concatenated or not
  • Data flow Operators
    • Conditional Assignment: (condition)?(value-if-true):(value-if-false)
    • Boolean Logic: ~,&,|
    • Arithmetic: +,-,*
    • Nested Condition:
      • assign out = s1?(s0?i3:i2):(s0?i1:i0); //4:1 Mux
Gate Level Description:
The same mux if implemented at gate level would look like,

module muxgate(a,b,sel,out,outbar);
input a,b,sel;
output out,outbar;
wire out1,out2,selb;
and a1(out1,sel,a); // primitive gates: output first, then inputs
and b1(out2,selb,b);
not n1(selb,sel);
or o1(out,out1,out2);
assign outbar=~out;
endmodule
  • Nets represent the connections between hardware elements. 'wire' is the keyword for Nets
  • Each module can be instantiated by giving the module name followed by an instance name and a port list; details discussed later
  • Verilog Library supports basic logic gates as primitives
    • and or nand nor xor xnor not buf
    • can be extended to multiple inputs.eg: nand n4(out,in1,in2,in3,in4)
    • bufif0 and bufif1 are tri-state buffers
    • Primitives do not require instance variable eg and(out,in1,in2,in3)
Procedural Assignments:
As said earlier, each statement describing the module executes in parallel, so order is not important (which is very useful for representing Combinatorial Logic), but for implementing Sequential Logic, where statements execute one after the other, Procedural Assignments help.
  • Two structured procedural statements: initial and always
  • Supports richer C-like constructs like if-else, case, for, while
module mux_2_to_1(a,b,out,outbar,sel);
input a,b,sel;
output out,outbar;
reg out,outbar;

always @(a or b or sel) begin // always @(sensitivity list) begin execution
if(sel==0) out=a; // the region between begin and end executes serially; order matters
else out = b;
outbar=~out;
end

endmodule

  • Conceptually, the always block executes once whenever any signal (since 'or' is used) in the sensitivity list changes.
Verilog Registers:
  • Registers (reg) in Verilog correspond to variables which can store a value
  • Unlike registers in digital systems (which are memory elements requiring a clock, and should not be confused with these), Verilog registers do not require a clock and their value can be changed at any time
Difference Between Procedural and Continuous Assignments:
  • Both can co-exist in the same module
  • Procedural statements update a register and this register is not modified unless another procedural statement updates it.
  • In continuous assignment, the right hand side is constantly placed on the left hand side.
Case Statement:
Case and if may be used interchangeably; case statements help readability in place of long if-else chains.

module mux_2_to_1(a,b,out,outbar,sel);
input a,b,sel;
output out,outbar;
reg out,outbar;

always @(a or b or sel) begin
case(sel)
1'b1: out = a; // number representation: <size>'<base><value>
1'b0: out = b; // e.g. 16'h4fff - 16 bit hexadecimal value
endcase
outbar=~out;
end

endmodule

N-bit Signals (Vectors) - Power of Verilog:
  • Multi-bit signals and buses are easy in Verilog
  • 2-to-1 multiplexer with 8-bit operands
module mux_2_to_1(a,b,sel,out,outbar);
input [0:7] a,b;
input sel;
output [0:7] out,outbar;
reg [0:7] out,outbar;

always @(a or b or sel) begin
case(sel)
1'b1: out = a;
1'b0: out = b;
endcase
outbar=~out;
end

endmodule
  • Here 'sel' is a scalar, while a, b, out and outbar are called Vectors
  • Signals can be concatenated using the concatenation operator {}
    • assign {b[0:7],b[8:15]} = {a[8:15],a[0:7]};
    • The above is a byte swap operation (a standalone sketch follows)
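As a standalone sketch (the module name is ours), the byte swap can be written as:

module byteswap(a,b); // hypothetical example module
input [0:15] a;
output [0:15] b;
// concatenation on both sides swaps the two bytes of a into b
assign {b[0:7],b[8:15]} = {a[8:15],a[0:7]};
endmodule
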
Integer Arithmetic:
32 bit adder with carry:

module add32(a,b,Cin,out,Cout);
input [0:31] a,b;
input Cin;
output [0:31] out;
output Cout;
assign {Cout,out} = a+b+Cin;
endmodule

Tuesday, June 12, 2007

Digital Systems (II)

This post deals with the reduction of boolean functions. Let's start with,
One Bit Binary Adder:
Input: A,B,Cin(carry)
Output: S,Cout
Truth Table:
(A,B,Cin | S,Cout): (0,0,0 | 0,0) (0,0,1 | 1,0) (0,1,0 | 1,0) (0,1,1 | 0,1) (1,0,0 | 1,0) (1,0,1 | 0,1) (1,1,0 | 0,1) (1,1,1 | 1,1)

Sum of Products representation:
S=A'B'Cin+A'BCin'+AB'Cin'+ABCin = (After Reduction) A XOR B XOR Cin
Cout=ABCin'+AB'Cin+A'BCin+ABCin = (After Reduction) Cin(A+B)+AB
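The reduction of S can be worked out step by step (a sketch; it uses factoring and the identity A'X + AX' = A XOR X):

S = A'B'Cin + A'BCin' + AB'Cin' + ABCin
  = A'(B'Cin + BCin') + A(B'Cin' + BCin)
  = A'(B XOR Cin) + A NOT(B XOR Cin)
  = A XOR B XOR Cin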

Canonical Form Representation:
  • Sum Of Products(Min) Terms
    • Min Terms - ANDed product of literals (input combination for which output is true)
    • Representation: [(0,0,0)(a'b'c') m0],[(0,0,1)(a'b'c) m1],[(0,1,0)(a'bc') m2] .....
  • Product of Sum (Max) Terms
    • Max Terms - ORed sum of literals (input combination for which output is false)
    • Representation: [(0,0,0)(a+b+c) M0],[(0,0,1)(a+b+c') M1],[(0,1,0)(a+b'+c) M2], ...
Thus expressing 'S' in canonical form (not the same as the reduced or minimal form):
SOP: S = SUM(1,2,4,7) = m1+m2+m4+m7
POS: S = PROD(0,3,5,6) = M0.M3.M5.M6 = (A+B+Cin)(A+B'+Cin')(A'+B+Cin')(A'+B'+Cin)

Conversions:
  • SOP to POS:
    • F(a,b,c) = SUM(1,2,4,7) = PROD(0,3,5,6)
  • POS to SOP:
    • F(a,b,c) = PROD(1,4,7) = SUM(0,2,3,5,6)
  • F SOP to F' POS (switch SUM and PROD)
    • F(a,b,c) = SUM(1,2,3,4) = PROD(0,5,6,7)
    • F'(a,b,c) = PROD(1,2,3,4) = SUM(0,5,6,7)
  • F POS to F' SOP (switch SUM and PROD)
    • F(a,b,c) = PROD(1,2,3,4) = SUM(0,5,6,7)
    • F'(a,b,c) = SUM(1,2,3,4) = PROD(0,5,6,7)
    • Observe that it is same as previous just switch F and F'
Uniting Theorem: This theorem is the basis of Boolean logic reduction.
  • Basis: A(B+B') = A
  • Find two element sub-sets of the ON-set where only one variable changes its value - this single varying variable can be eliminated and a single product term used to represent both elements
  • Usually visual techniques are used to identify these element sub-sets
Boolean Cubes:
One way to visualize boolean logic is to treat the logic as a cube: 'one' input is a line, 'two' inputs a square, 'three' inputs a cube ... and 'n' inputs correspond to an n-dimensional cube.
Every corner of the cube corresponds to an input combination, i.e. a product term. Corners for which the output is ON are marked (made bold). ON-set adjacencies are combined to form ON-sets, and these sets form sub-cubes.
Implications of a sub-cube in a 3-D cube:
  • 0-cube i.e a single node represents a product term in three literals
  • 1-cube i.e a line of two nodes represents a product term in two literals
  • 2-cube i.e a square of four nodes represents a product term in one literal
  • 3-cube i.e a cube of eight nodes represents a CONSTANT output
  • In general, in a 'n' dimensional cube, a 'm' sub-cube yields a term with n-m literals

K (or Karnaugh) Maps:
Representing the logic in rectangular boxes, where each box in the rectangle represents a min term. The boxes are arranged in such a way that neighboring boxes vary in one variable, so the numbering system used for these boxes is based on Gray codes.

Certain systems have don't care conditions, in which case they can be used as either '1' or '0' to advantage, to reduce the logic.

Certain systems have fewer literals in POS form than in SOP form, like
SOP: F(A,B,C,D) = C'D+A'D
POS: F(A,B,C,D) = D(A'+C')

Digital Systems (I)

Basic Gate:
  • Inverter: out = !in
  • MOS gates are like switches. The 'source' is connected to the 'drain' based upon the 'gate' voltage (Vgs).
  • PMOS: If Vgs < Vt (the threshold voltage, which is negative for PMOS), then source and drain are connected, else not connected
  • NMOS: If Vgs > Vt (a threshold voltage), then source and drain are connected, else not connected
  • CMOS: Complementary MOS, i.e. a combination of a P-type and an N-type MOS. The PMOS drain is connected to the NMOS drain (and acts as the output). The PMOS source is connected to the driver voltage (Vd). The NMOS source is connected to ground. The 'gate' of both is connected to the 'input'
  • CMOS: If the input is high (Vgs > Vt for the NMOS), the PMOS is open and the NMOS is closed, so the output is connected to ground, i.e. In = High, Out = Low
  • CMOS: If the input is low (Vgs < Vt for the PMOS), the NMOS is open and the PMOS is closed, so the output is connected to Vd, i.e. In = Low, Out = High
Possible functions with two variables:
From the above we can derive the number of functions possible with one variable: 2^(2^1) = 4, namely Out = 0, Out = 1, Out = In and Out = NOT(In).
Going a step further, let's take two variables. Let 'x' and 'y' be two input variables for a digital system. What are the possible number of functions with x and y?
  • x and y each have two states '0' and '1'
  • So the possible number of states with both x and y is 2^2 = 4 states.
  • So with four possible input states, the possible number of output assignments is 2^4 (since each state can map to 1 or 0) = 16, and thus there are 16 functions
  • In general, for an 'n' input system there are 2^(2^n) functions.
(x y)=> (0,0) (0,1) (1,0) (1,1)<=> (s1,s2,s3,s4)
f(s1,s2,s3,s4)=>(0,0,0,0) (0,0,0,1) (0,0,1,0) (0,0,1,1) ... and so on

Common Gates:
OR(x,y,out)=>(0,0,0)(0,1,1)(1,0,1)(1,1,1) <=> Z = X+Y
AND(x,y,out)=>(0,0,0)(0,1,0)(1,0,0)(1,1,1) <=> Z = X.Y
NOR(x,y,out)=>(0,0,1)(0,1,0)(1,0,0)(1,1,0) <=> Z = NOT(X+Y)
NAND(x,y,out)=>(0,0,1)(0,1,1)(1,0,1)(1,1,0) <=> Z = NOT(X.Y)
XOR(x,y,out)=>(0,0,0)(0,1,1)(1,0,1)(1,1,0) <=> Z = NOT(X).Y + X.NOT(Y)
XNOR(x,y,out)=>(0,0,1)(0,1,0)(1,0,0)(1,1,1) <=> Z = NOT(X).NOT(Y)+X.Y

Generic Gates Design:
CMOS results in inverting functions, so it is usually easier to construct NAND and NOR than AND and OR.
As a generalization of the single input case, a gate can be assumed to be designed as PULL UP LOGIC (=> PMOS) and PULL DOWN LOGIC (=> NMOS)
PULL UP => connects the output to Driver Voltage to get high output for required input combination.
PULL DOWN => connects the output to the ground to get Low output for required input combination.
For NAND,
x:   0   1   0   1
y:   0   0   1   1
Pup: on  on  on  off
Pdn: off off off on
out: 1   1   1   0

For NOR,
x:   0   1   0   1
y:   0   0   1   1
Pup: on  off off off
Pdn: off on  on  on
out: 1   0   0   0

Boolean Algebra Theorems:
  • Elementary
    • x+0=x <=> x.1=x
    • x.0=0 <=> x+1=1
    • x+x=x <=> x.x=x
    • NOT(NOT(x)) = x
    • x+NOT(x)=1 <=> x.NOT(x)=0
  • Commutativity
    • x+y = y+x <=> x.y=y.x
  • Associativity
    • x+(y+z) = (x+y)+z <=> x.(y.z) = (x.y).z
  • Distributivity
    • x.(y+z) = x.y+x.z <=> x+(y.z) = (x+y).(x+z)
  • Uniting
    • x.y+x.NOT(y) = x <=> (x+y).(x+NOT(y)) = x
  • Absorption
    • x+x.y=x <=> x.(x+y) = x
    • (x+NOT(y)).y = x.y <=> x.NOT(y)+y = (x+y)
  • Factoring
    • x.y+x.z=x.(y+z) <=> (x+y).(x+z) = x+(yz)
  • Consensus
    • (x.y)+(y.z)+(NOT(x).z) = (x.y)+(NOT(x).z) <=> (x+y).(y+z).(NOT(x)+z) = (x+y)(NOT(x)+z)
  • De Morgan's Law
    • NOT(x+y+z+...) = NOT(x).NOT(y).NOT(z)... <=> NOT(x.y.z. ...) = NOT(x)+NOT(y)+NOT(z)+ ...
  • Generalised De Morgan's Law
    • NOT(f(x1,x2,x3,...,1,0,.,+)) = f(NOT(x1),NOT(x2),NOT(x3),...,0,1,+,.)
  • Duality:
    • Obtained by exchanging '+' with '.' and '1' with '0', leaving the variables unchanged
    • f(x1,x2,x3..,0,1,.,+) <=> f(x1,x2,x3,..,1,0,+,.)
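A small worked example (ours, not from the original list) applying these theorems:

F = x.y + x.NOT(y) + x.z
  = x + x.z        (Uniting on the first two terms)
  = x              (Absorption)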

Friday, June 8, 2007

Moving from VC++ to GCC ..

I have been using Visual Studio for my code development for the past three years and am new to the Unix environment. I wanted to learn this so that I can work on both platforms without any worries. So in this article I am going to talk about the transition from VC to GCC.

Step 1: CygWin
It is a collection of software which helps run a Unix-like environment on the Windows platform. (Too generic an explanation; check out the link for more details.) To install it, go to the following link, download the setup, choose what packages (the equivalent of software installers in Windows) you need in your Unix environment and install them.
Basic things that are required are:
  • VI Editor - for file editing
  • GCC and G++ - Compilers
  • GDB - Debugger for debugging your code
Step 2: G++ and GCC
The next thing to understand is how compilation takes place on Unix systems.

Compiling and Running a Single File:
Let's start by compiling a single file.

$ gcc sample.c

This compiles the code in sample.c and, if it does not have any compilation (or linking) errors, creates an output file a.out (named so for historic reasons). If the output file name needs to be changed, it can be done by

$ gcc sample.c -o sample

which produces the output file "sample".
To run the file we need to specify the path and the file name as shown below

$ ./sample

Sometimes the executable permission for the file needs to be set in which case it can be modified by the following command

$ chmod 777 sample

This gives read, write and execute permission to the owner, the group and all other users.

Creating a Debug Ready Code:
For debugging code, the executable generated must contain debug information so that, while we run the code under the debugger, we are able to break at certain points, view variables, etc.

$ g++ -g sample.c -o sample

The output file is bigger than the normally compiled file. This extra information can be removed by using the command "strip":

$ strip sample

This basically throws out the symbol information. For more details read "man strip"

Creating a Optimized Code:
When we ask the compiler to generate optimized code (say for code size or speed), it runs various extra algorithms and tends to be a little slower in producing the output file.

$ g++ -O sample.c -o sample

For higher optimizations we can use a number along with the '-O' option. It is recommended that we do not go above 2 (since higher optimizations can result in code which may modify the functionality).

$ g++ -O2 sample.c -o sample

For other options it is better to read the man pages.

Compiling Multiple Files:
As we know, projects are not made of single files; sources are split across files to keep them organized and understandable. The easiest way to build them would be to compile all the source files in a single command:

$ g++ sample.cpp sample1.cpp sample2.cpp -o sample

But the problem with this is we need to compile all the files even if one of the files is changed and so we separate the process into two steps
  • Compiling - To create .o (object files)
  • Linking - To link the object files to form the executable (or output)
$ g++ -c sample.cpp
$ g++ -c sample1.cpp
$ g++ -c sample2.cpp
$ g++ sample.o sample1.o sample2.o -o sample

The first three commands use the "-c" option to tell the compiler to stop after creating the object files, and the final line links all the object files to form the output. The intention behind this split-up is that linking takes relatively less time than compilation, so after a change only the modified files need to be recompiled.

Steps Involved In Compilation:
We just saw the various compilation methods and options, but it is better to also understand what happens when we invoke the g++ (or cc or gcc or acc) command.
  • Driver - It is what is invoked when we call "g++". It is an "engine" which drives the various tools of the compiler: when invoked, it runs the tools involved, passing the output of one into the input of the next
  • C Pre-Processor - normally called "cpp". It handles all the pre-processor functionality (#define, #include, #ifdef etc)
    • $ g++ -E sample.cpp
  • Compiler - normally called "cc1". Translates the pre-processed source into assembly code
    • $ g++ -S sample.cpp
  • Assembler - normally called "as". Takes the assembly code and produces the object file ('-c' option stops here)
  • Linker-Loader - takes all the object files and creates an executable which the operating system supports. It contains the internal structure of the executable: location of the data segment, location of the code segment, location of debug information and so on
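These stages can also be invoked one at a time; a sketch (file names are illustrative):

$ g++ -E sample.cpp -o sample.ii   # pre-process only
$ g++ -S sample.ii -o sample.s     # compile to assembly
$ g++ -c sample.s -o sample.o      # assemble into an object file
$ g++ sample.o -o sample           # link into an executable
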
Step 3: Debugging Using GDB
Gdb stands for "GNU Debugger". It is a debugging tool in the Unix environment. It helps you step through, break in, and print values of the running code. It is a command line debugger and, unlike the VC debugger, does not have separate windows for viewing variables, the stack etc. But it is very powerful.

To Invoke a "gdb":

For running the executable in the debugger we first require to compile the code with debug information.

$ g++ -g sample.c -o sample.exe
$ gdb sample.exe

gdb needs to be run in the directory in which the source file is present, or else gdb will not show us where we are currently executing. It is possible to point gdb at other source directories as well.

Running a program inside gdb:
To start running the code inside gdb we use the run command. For giving runtime inputs we attach the arguments along with run. Since space is treated as a parameter separator, we need to specify strings inside quotes.

$ run "hi how r u" "hope this is useful"


Setting Breakpoints:
Breakpoints are places where you want the program to pause; they help you view the state of the execution (various variable values, the stack etc). They can be set up by
  • specifying line number
    • $ break sample.cpp:9
  • specifying the function name
    • $ break main
Stepping through the commands:
Once you have paused inside the code you can "step" through the code one line at a time using the following:
  • To step into a function (if it is a function call)
    • $ step
  • To execute and go to the next line after executing the present
    • $ next
To learn more on how to set up breakpoints and use conditions to break, use gdb's built-in help: "help break" or "help breakpoints"
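For example, to pause at a line only when a condition holds (the file, line number and variable here are illustrative):

$ break sample.cpp:9 if i == 5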

Printing variables and expressions:
The values at the position where you are stopped can be checked by using

$ print i

and the output would be

$ $1=10

meaning "i" is 10 (this would be printed only if "i" is inside the scope). If it is not the case the a error message would be printed

$ No symbol "i" inside context.


Viewing the stack:
To get the stack trace information or to know "where" you are in the code use

$ where

and the following output is obtained

$ #0 print_string (num=1, string=0xbffffc9a "hello") at debug_me.c:7
$ #1 0x80484e3 in main (argc=1, argv=0xbffffba4) at debug_me.c:23

The #1 and #0 are the stack frame numbers, so any local variable inside these stack frames can be viewed by first shifting to that stack frame and then printing the variables in it:

$ frame 1
$ print i

Now the variable i in frame 1 (in function main) would be printed.

Attaching to already running process:
There can be scenarios in which we want to attach to an already running process: some process running in the background (to run a process in the background attach a '&' to the command, like ./sample&), or some process on a remote machine. In that case we can attach gdb to that process using the program name and the process id (if we use only the program name then another instance of the program starts running inside gdb).

$ gdb sample 9261

Here the assumption is that sample is the program name and 9261 is its Process ID. With this command gdb attaches itself to the program and then pauses it. From here we can use "where" to find out the position in the program and start our debugging.

Debugging a crashed code:
There is always the possibility that code can crash. Whenever code crashes, a "core" file is dumped which has information about the state of the memory at the crash, stack information and so on. This can be examined by loading it back using GDB.

$ gdb sample core

Usually this file gets created in the directory in which the executable was running (it is dumped when the process dies of a signal such as SIGSEGV). There can be cases where the crash is due to memory corruption, in which case the stack information itself may be invalid and gdb does not help.

Step 4: MAKE Utility - Automating Compilation
Compiling more than one source file by hand gets pretty annoying, and it would be helpful if this could be automated by some simple scripts. This is where the make utility (or makefiles) comes into play.
A makefile is a collection of instructions used to compile our program. Whenever some files of our program are modified, we type "make", which recompiles only the files that were modified, doing the minimum compilation needed to create the output file. However, this is not done by magic but by the set of rules and dependencies supplied by us in the makefile.

Makefile Components:
The make file consists of

Variable definitions - Lines defining values for variables using '='
CFLAGS = -g -Wall
SRCS = main.c sample.c

Dependency Rules - under what conditions should a file be re-compiled
main.o: main.c
	g++ -c -Wall main.c -o main.o

Here the main.o file is to be updated whenever the main.c file is modified (the dependency is given on the first line). The method used to update main.o is specified in the command on the next line. A command line always starts with a tab character (not spaces), followed by the necessary command.

Comment - A line starting with # is considered as comment
# this line is commented

Order of Updating Files:
make invokes the commands specified in the makefile, checking dependencies recursively. Any makefile contains a target. A target can be a file which make needs to update, or just a name to mark a starting point (in which case it is referred to as a 'phony target'). Make checks the dependencies of the target, then recursively checks the dependencies of the files in the target's dependency list. The recursion terminates whenever a dependency has no dependencies of its own but a file by that name exists (in which case the file is assumed to be up to date). Then the commands are executed to update every dependent file until the current target is updated. This appears a little complex but should become clear from the example below.

Single File Compilation:
Let's consider an example and then look at the features of make.

# top level rule to compile all
all: main

# linking the object file
main: main.o
	gcc -g main.o -o main

# compiling the source
main.o: main.c
	gcc -g -Wall -c main.c

# cleaning everything after creation
clean:
	/bin/rm -f main.o main

1) A rule need not have a command, e.g. all: main
2) A rule need not have a dependency, as with 'clean', in which case the command is executed without checking any condition
3) Not every rule is used whenever make is invoked; e.g. clean is not used when programs are compiled, only when cleaning the build

Compiling more than one file:
Moving to the next stage, if there are multiple files to compile in a project (as in real-world scenarios), we try to make the makefile as flexible as possible so that it can grow as the project grows.

#the top level rule
all: main

#the program is made of several source files
main: main.o file1.o file2.o
	g++ -g main.o file1.o file2.o -o main

#dependencies of file1
file1.o: file1.c file1.h
	g++ -g -Wall -c file1.c

#dependencies of file2
file2.o: file2.c file2.h
	g++ -g -Wall -c file2.c

#rule for cleaning
clean:
	/bin/rm -f file1.o file2.o main.o main

1) As seen, each file has its own rule.
2) Each file also depends on its header file, because if the dependency covered only the source file, there would be cases in which the header file is modified but, since the source file is unmodified, recompilation is not triggered, leading to a wrong result.
As seen, there are many redundancies in the above representation, and these can be removed by using further features of make.

Using Compiler and Linker Flags:
In the above example we compile all the files in debug mode; once debugged, if we want to generate optimized code, we need to go to every rule and change the options. This process is cumbersome and error prone. To avoid it, it is always better to assign the flag options and compiler name (since this can also change) to variables and use the variables in creating the makefile.

# Use "gcc" to compile
CC=gcc
#Sometimes the linker can be different from the compiler
LD=gcc
# the compiler flags
CFLAGS = -g -Wall -c
#Linker Flags
LFLAGS= -g
#Command used to remove files
RM=/bin/rm
#List of generated object files
OBJ=main.o file1.o file2.o
#program executable filename
PROG=main

#the top level rule
all: $(PROG)

#the program is made of several source files
$(PROG): $(OBJ)
	$(LD) $(LFLAGS) $(OBJ) -o $(PROG)

#dependencies of file1
file1.o: file1.c file1.h
	$(CC) $(CFLAGS) file1.c

#dependencies of file2
file2.o: file2.c file2.h
	$(CC) $(CFLAGS) file2.c

#rule for cleaning
clean:
	$(RM) -f $(OBJ) $(PROG)

Even after this modification (where even the remotest possibility of a change in settings was assigned to a variable), there is still the issue of introducing a new rule for every file included in the project. This is taken care of by "file type" (pattern) rules.

# each source file dependency can be replaced by
%.o: %.c
	$(CC) $(CFLAGS) $<

Filetypes:
1) '%' is a wildcard: it makes any file ending in .o depend on the file with the same stem ending in .c, i.e. file1.o depends on file1.c (and not on file2.c)
2) '$<' corresponds to the first dependency matched by the rule; e.g. for the rule file2.o: file2.c, '$<' expands to file2.c

There is still the question of header dependencies, which can be generated automatically with the makedepend utility:

#define source file
SRCS=main.c file1.c file2.c
# All others remain the same
#........
# Depend rule
depend:
	$(RM) -f .depend
	makedepend -f- -- $(CFLAGS) -- $(SRCS) > .depend
#Now add a line to include the .depend file
include .depend

So in the above example the makedepend utility browses through all the .c source files, creates a dependency list of all header files, and pushes it into a .depend file. Since this is included in the makefile, the dependency list is updated with the header dependencies.

Thursday, June 7, 2007

Math softwares

Reading through Digg, I came across this really good article which talked about the various math tools available free online. It starts off with three free programs, and the readers have added many new and good ones in the comments. So I thought of collecting them into a single post.
Courtesy: 3 awesome free math programs by antonio
Well Known Softwares, like the
  • Mathematica - a complete package for mathematics from a small calculator to a system simulator
  • Matlab - personally the only software I have used and benefited from. Similar, but more on the engineering side than math
  • Maple - I used this for fun. To try and visualize various plots and formulas. Can be used as we do math on a notebook.
The catch is that these are not free, and for research and educational purposes they are pretty costly. So here is a list of alternatives which are free (Long Live The Open Source):
  • Maxima - It is a Computer Algebra System (CAS) similar to Maple. A CAS is a system which is able to perform symbolic manipulation for the resolution of common problems.
  • SciLab - The free replacement for Matlab. It is a good plotting and visualization tool
  • R - Yes, that's it. That is the name of the software. When there is a 'C' why can't there be an 'R'? Anyway, for statistical computing and analysis in the Open Source world nothing gets better than R. It is similar to the commercial S-Plus
The above were the ones discussed by the author, and the following are the ones I picked from the comments. Where there is a community there is a solution ;)
  • Scipy - Scientific Programming using Python. Provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization.
  • newt - this is read as (Eta Epsilon Omega Tau). It claims to be a demo version used for plotting functions
  • Geogebra - GeoGebra is a free and multi-platform dynamic mathematics software for schools that joins geometry, algebra and calculus.
  • FreeMat - free environment for rapid engineering and scientific prototyping and data processing. It is similar to commercial systems such as MATLAB from Mathworks, and IDL from Research Systems, but is Open Source
  • GAP - does not seem impressive. A user comment "for real mathematics, there’s GAP, which allows you to manipulate groups, permutations, graphs, rewriting systems, Lie algebras, etc, using a Python-like language. Lots of standard algorithms for computational algebra are implemented in the standard library."
  • Math Code Snippets - Codes that can be re-used. User comment "For several–specific–numerical mathematical routines. The routines posted off this page are based on good-quality code, mostly from the NETLIB repository of algorithms. Very handy (quick, easy-to-use, and free!) for when numbers are required, but a user might not want to install and start up a whole software package."
  • Grace - Plotting tool. Stands for "GRaphing, Advanced Computation and Exploration of data."
  • ChartAll - The name says it all. A web utility to chart and plot information
  • TEECHART OFFICE - Another charting tool.
Hope these are useful.

Free online technical journals

Some of the newsletter I have subscribed to.

Tuesday, June 5, 2007

Some Good Firefox Add-Ons and Extensions

Basic:
  • Adblock and Adblock Filterset.G Update - Makes your browsing experience really good by hiding the ads.
  • Download Statusbar - Minimizes the huge download status window into a small bar in your browser window
  • DownThemAll, FlashGet, Free Download Manager Plugin - good download managers and accelerators
  • Google Notebook - allows you to make notes of things you read in Google Notebook
  • TamilKey - Allows you to type in Tamil
  • Videodownloader - helps you download the FLV videos (flash files) embedded in the HTML pages of Google Video, YouTube, etc

Tab Effects and Manipulations:
  • Tab Mix Plus - Allows you to store and restore sessions (integrated with recent Mozilla browsers)
  • Tab Effect - Gives you the Apple-style cube rotation effect when you switch between tabs
  • Tab Scope - Gives a preview of a tab when you move the mouse over it. You can scroll, and move forward and backward, in the preview window
  • Mouse Gestures - Removes the need for the keyboard while browsing. You move your mouse to make certain drawings, and you can open, close, switch, minimize etc the tabs and windows. Funky add-on
  • Firefox Showcase - Shows a preview of all opened tabs and helps you find the required tab
Theme:
  • Noia 2.0 - This is the current theme I love to have

Friday, June 1, 2007

20 Free Application To Increase Productivity

  1. Launchy - free program launcher
  2. AutoHotKey - automate just about anything by capturing your keystrokes and mouse clicks
  3. AVG Anti Virus
  4. Spybot Search And Destroy - anti spyware
  5. Ad-Aware
  6. Free Download Manager
  7. BK ReplaceEm - powerful search and replace utility, allowing you to operate on multiple files at once.
  8. Google Web Accelerator - allows you to enjoy faster web browsing in seconds
  9. CamStudio - recording software that will allow you to create demonstration videos, online tutorials, or even video-based information products
  10. Audacity - software for recording and editing sounds. You can use Audacity to record live audio, edit sound files, mix sounds together, and much more
  11. Foxit PDF Reader - fast PDF reader convertor etc
  12. 7-zip - file compressor
  13. CCleaner - removes unused files from your system, freeing up valuable hard disk space.
  14. Openoffice.org - a replacement for Microsoft's Office Suite
  15. Better Gmail - To enhance your Gmail for optimum productivity, download the Better Gmail Firefox extension.
  16. FileZilla - fast reliable FTP client
  17. RoboForm - password manager and web form filler that will allow you to browse the Web faster than ever
I found only 17 to be useful. :) Courtesy: Top 20 Free Applications to Increase Your Productivity

Thursday, May 31, 2007

Anti Aliasing Filter

Aliasing: "signals" that are essentially continuous in space or time must be sampled, and the set of samples is never unique to the original signal. The other signals that could (or did) produce the same samples are called aliases of the original signal.

Nyquist Criterion: The minimum sampling rate required to allow unambiguous reconstruction of a bandlimited continuous signal from its samples (without any alias) is more than twice the bandwidth of the signal.

Generally, a real-world signal (be it sound, image or video) is not band limited. It is the perceiver (the ear, the eye) that is band limited, i.e. responsive only to certain frequencies. Thus, when a real-world signal is converted to digital, to avoid aliasing the signal is first filtered to make it band limited and then, per the Nyquist criterion, it is sampled.

In the case of voice, the maximum frequency of interest is around 4 kHz, so an anti-aliasing filter of 4 kHz is first applied. Similarly, for music the audible range is around 22 kHz, and so a 22 kHz anti-aliasing filter is used. (An anti-aliasing filter is a low pass filter.)

The concept of aliasing is clearer in the case of sound or audio, but when it comes to video or images it is not so obvious, and it is rarely mapped onto the concepts we use for sound. This article aims at clarifying this.

The example I am going to take here is from an article I read in "Extreme Tech" named "Anti-aliasing and Anisotropy filtering explained". Consider the following image that needs to be sampled based on the pixel grid shown behind it.

[Image: The image you're trying to represent - a diagonal line on a set of pixels.]
A rough way to do this would be to make each pixel black if it is mostly covered by the black area, and white if it is mostly covered by white. Doing this we get the following image.

[Image: Aliasing at work - each pixel displays one color or the other.]
From the above figure it is clear that the image has a jagged effect and does not represent the diagonal line well. To see this better, let's consider a more finely sampled grid.

[Image: Aliasing at higher resolution - as resolution goes up, the ugliness of aliasing becomes less apparent.]
Here, even though the diagonal line is rendered better (by going for a higher number of samples we have increased the sampling rate and thus reduced the aliasing effect), we still observe the jagged edges.
An intuitive way to reduce this effect is to fill pixels that are partially covered by both black and white with shades of gray.
So if a pixel is 25% covered by the black edge, you make that pixel only 25% black, a light gray color. If it's 80% covered by the black line, you make that pixel 80% black, dark gray. The result is an antialiased edge.

[Image: Antialiased edge.]

What we have really done here is filter the input image (apply an anti-aliasing filter) before sampling it. If the diagonal edge is passed through a low pass filter before sampling, we observe gray shades along the edges. A clearer picture of the same is found below.

[Image: Antialiased edge on an 80x60 pixel image.]

In 3D graphics, this aliasing effect is approached from a different angle, and the method used is described below.
Super-sampling:
"The simple brute-force approach is called supersampling. For every pixel on your screen, you determine the color for multiple pixels, and then combine them. Let's say your screen is running at a resolution of 800x600 pixels. If you want to perform 4x supersampling, you effectively draw the screen at 1600x1200 pixels (four times as large) and then scale that down to 800x600 by combining the colors of four neighboring pixels into one. Most modern 3D cards don't exactly draw a higher-resolution screen. Rather, they calculate, store, and combine "subpixel samples" for each pixel."
[Image: Sub-pixel samples.]
In other words (since we do not have analog filters while rendering, or rather since in graphics we cannot have a perfect analog signal), we oversample the input signal, then apply a linear filter across it, and then downsample the oversampled input.
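A minimal sketch of this resolve step in C, assuming a single 8-bit channel and a plain 2x2 box filter (the function name and buffer layout are ours, not from the article):

/* Average each 2x2 block of the high-resolution buffer into one
   output pixel. hi is (2*w) x (2*h); lo is w x h. */
void downsample4x(const unsigned char *hi, unsigned char *lo, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int s = hi[(2*y)   * (2*w) + 2*x]
                  + hi[(2*y)   * (2*w) + 2*x + 1]
                  + hi[(2*y+1) * (2*w) + 2*x]
                  + hi[(2*y+1) * (2*w) + 2*x + 1];
            lo[y*w + x] = (unsigned char)(s / 4); /* average of 4 subpixel samples */
        }
}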

Effects of Super Sampling In Real World Images:
[Image: No AA vs. 2xAA - even 2xAA makes a huge difference, and this is the fastest, lowest-quality mode available.]

[Image: No AA vs. 4xAA - stepping up to 4xAA makes a big difference.]

[Image: Full quality comparison - even higher levels of AA, together with AA modes that supersample objects with transparent textures (like the trees), make the game look a lot better.]

The comparison pictures above show the effect of anti-aliasing clearly.

Wednesday, May 30, 2007

Six Ways To Write More Comprehensible Code

Courtesy: Jeff Vogel
I read this article and thought I would just jot down the major points. I liked it.

Cache

A place to hide or conceal something (dict)

A cache is a place to store a block of memory which would be used again and again.

REASON:
The reason for having a cache is that faster memories are costly, yet to keep up with the speed of the CPU we require faster memory. So instead of making the entire memory fast (which would be costly), we have a small fast memory and bring data into it from the slower memory whenever required, thus reducing the average access time for data.

ASSUMPTIONS:
Based on Principle of Locality
  1. Spatial Locality - Data that is near the currently accessed data has a high probability of being accessed
  2. Temporal Locality - The same data is accessed again and again (temporary variables, instructions being executed etc)
BASIC BLOCK DIAGRAM:

CPU <=> Level 1 <=> Level 2 <=> Level 3
-------> Memory size increases ------->
<------- Speed increases <-------------

MECHANISM:

Initially all the data is in the Level 3 cache. Whenever data needs to be accessed by the CPU, it checks for availability in Level 1 (if found, a cache hit). If the data is not available (a cache miss), it goes to the next level (Level 2), and so on until it reaches the data. Once obtained, the data, along with some of its neighbors, is brought into all the lower level caches.
So during the first access there is latency. Subsequent accesses of the same data are faster (since it is available in the cache) - Temporal Locality. Further, accesses to data near the first accessed data are faster - Spatial Locality. Thus on average the cache reduces the access time.
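This can be quantified with the usual estimate (an illustrative formula, not from the original post):

average access time = hit time + miss rate x miss penalty

e.g. with a 1-cycle hit, a 5% miss rate and a 20-cycle miss penalty, the average access time is 1 + 0.05 x 20 = 2 cycles (hypothetical numbers).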

TERMS:
Cache Hit:
Data requested by the CPU is present in that particular level
Cache Miss:
Data requested is not present in the particular level
Associativity:
The way in which the cache is organized - e.g. Direct Mapping, Set Associativity
Cache Line: Size of data read from higher level in one read into the current level

Associativity:
How main memory maps to the cache memory location.

Direct Mapping:
Caches can be either "direct-mapped" or "set-associative." To explain these terms, let's look at the L1P cache of the C64x shown in Figure 4. This cache consists of 512 32-byte lines. Each line always maps to the same fixed addresses in memory. For instance:
  • Addresses 0000h to 001Fh will always be cached in line 0
  • Addresses 0020h to 003Fh will always be cached in line 1
  • Addresses 3FE0h to 3FFFh will always be cached in line 511.
Once we get to address 4000h, the capacity of the cache is exhausted, so addresses 4000h to 401Fh start back over at line 0.

In addition to holding copies of data, each line of the L1P cache contains:
  • A valid bit, which indicates whether or not the cache line contains useful data.
  • A tag, which is equivalent to the upper 18 bits of the address. This is needed because a given line can contain data from different addresses. For example, line 0 can hold data from addresses 0000h to 001Fh, addresses 4000h to 401Fh, etc.
  • An (implied) set number. The set number is equivalent to bits 5 to 13 of the address. For a direct-mapped cache, the set number is the same as the line number. Things get more complicated with a set-associative cache, as we will see later in this article.
Now let's see what happens if the CPU accesses address location 0020h. Assume that the cache has been completely invalidated, meaning that no line contains cached data. When the CPU makes a request to read address 0020h, the cache controller starts by looking at the set portion of the address (bits 5 to 13) to see which cache line it should inspect. For address 0020h the set portion is one. The controller then checks the tag bits for line one to see if the line holds addresses 0020h to 003Fh, which it does. Finally, it checks the valid bit to see if the line holds useful data. Since the valid bit is assumed to be 0, the controller registers a miss.

The miss causes the controller to fetch the line (0020h-003Fh) from memory and store the data in set one. The tag portion of the address is stored in the tag RAM, and the valid bit is set to one. The fetched data is also forwarded to the CPU, and the access is complete.

If address 0020h is accessed again, the cache controller once again takes the address, inspects its set and tag portions, and compares the tag portion of the address against the value stored in tag RAM. In this case the tag comparison is positive. The valid bit is now one so the controller will register a hit and forward the data in the cache line to the CPU, completing the access.
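As a sketch, the controller's view of an address for this particular geometry (32-byte lines, 512 sets, 18-bit tag) can be written out in C; the function and variable names are ours:

#include <stdio.h>

/* Illustrative decomposition of a 32-bit address for the direct-mapped
   L1P described above. */
void decompose(unsigned addr)
{
    unsigned offset = addr & 0x1F;         /* bits 4:0  - byte within the line */
    unsigned set    = (addr >> 5) & 0x1FF; /* bits 13:5 - line/set number      */
    unsigned tag    = addr >> 14;          /* bits 31:14 - the 18-bit tag      */
    printf("addr %04xh -> tag %x, set %u, offset %u\n", addr, tag, set, offset);
}
/* decompose(0x0020) reports set 1, matching the walk-through above. */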



Set Associative Mapping:
Set-associative caches are an extension of direct-mapped caches. In a direct-mapped cache, each set contains only one line. In a set-associative cache, each set consists of multiple lines, which are referred to as "ways." Figure 5 illustrates a set-associative cache, the C64x DSP's L1D. This is a two-way set-associative cache with 64-byte lines and a total capacity of 16 KBytes.



In addition to holding copies of data, each line of the L1D cache contains:
  • A LRU bit that indicates which way was the least recently used. (This is not found in L1P.)
  • A dirty bit, which indicates whether or not the cache line matches the contents of main memory. (This is not found in L1P.)
  • A valid bit, which indicates whether or not the cache line contains useful data.
  • A tag, which is equivalent to the upper 18 bits of the address.
  • A set number, which is equivalent to bits 5 to 13 of the address.
Hits and misses are determined the same way as in a direct-mapped cache except that now two tag comparisons are necessary—one for each way—to determine which way holds the requested data. If there is a hit in way 0, the data of the line in way 0 is accessed, and if there is a hit in way 1, the data of the line in way 1 is accessed.

If both ways miss, the data needs to be allocated from memory. The LRU bit determines how the data is allocated and can be thought of as a switch. If the LRU bit is set to 0 the line is allocated in way 0, and if it is 1 the line gets allocated in way 1. The state of the LRU bit changes whenever an access (read or write) is made to a cache line. For instance, if the line in way 0 is read, the LRU bit switches to 1. Note that the LRU bit is only consulted on a miss, but its status is updated every time a line is accessed regardless whether it was a hit, miss, read or write.

Like the L1P, the L1D is a read-allocated cache: new data is allocated from memory on a read miss only. On a write miss, the data goes through a write buffer to memory, bypassing the L1D cache. On a write hit, the data is written to the cache but not immediately to memory. This type of cache is referred to as a write-back cache, since data that was modified by a CPU write access is written back to memory at a later time.

The dirty bit indicates that a cache line has been modified by a write but the new data has not been written out to main memory. Initially the dirty bit will be zero. As soon as the CPU writes to a line, the corresponding dirty bit will be set to 1. The data is then written back to main memory when the line is evicted from the cache. This will happen when a read miss causes new data to be allocated to the dirty line. A write-back can also be initiated in software by sending a write back command to the cache controller. However, this is not usually required.
Tag, cache line, cache line size and cache size should be clear from the above description.

Optimizing Cache Performance
There are three types of misses:
  • Compulsory miss (also called first reference miss) - a miss that occurs when the data is brought into the cache for the first time. In contrast to the other two kinds of misses, compulsory misses usually cannot be avoided.
  • Conflict miss - a miss that occurs because the line is replaced before it is re-used.
  • Capacity miss - a miss that occurs when the capacity is exhausted. Capacity misses are a subset of conflict misses.
For each miss, there will be stalls while the controller pulls data from memory into cache. To get the highest performance, a line must be reused as often as possible before it is replaced with another line. Reusing the line but accessing different locations within that line improves the spatial locality of accesses, while reusing the same locations of a line improves the temporal locality of accesses. This is the fundamental strategy of optimizing memory accesses for cache performance.

For example, cache performance is high when a program accesses memory sequentially. This access pattern makes many accesses to the same line before moving on to the next line. For example, code along the lines of the following has a very high hit rate:
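/* Representative sketch in C (the array name and size are illustrative;
   the original snippet is not preserved in this post). */
#define N (1 << 20)   /* 1M ints */
int a[N];

int sum_sequential(void)
{
    int sum = 0;
    /* stride-1: every element of each fetched line is used before
       the next line is brought in - good spatial locality */
    for (int i = 0; i < N; i++)
        sum += a[i];
    return sum;
}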


The performance of code like the following is not good, because accesses to memory are in large strides, which means it makes fewer accesses to a given line before moving to the next line:
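/* Again a representative sketch, reusing a[N] from the example above. */
#define STRIDE 16     /* 16 ints = 64 bytes, one full cache line */

int sum_strided(void)
{
    int sum = 0;
    /* large stride: only one int per cache line is touched per pass,
       so almost every access fetches a new line */
    for (int j = 0; j < STRIDE; j++)
        for (int i = j; i < N; i += STRIDE)
            sum += a[i];
    return sum;
}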


If a line is evicted from the cache and then accessed again, the line must be brought into the cache again. Therefore, it is important to avoid eviction for as long as possible. Identifying the cause of a miss helps prevent future misses.

As noted above, capacity misses occur because the cache is smaller than main memory. If there are capacity misses, the simplest solution is to increase the size of the cache. For example, the L2 cache on the C64x DSP can be configured as a mix of SRAM and cache. If there are excessive capacity misses, the programmer can allocate more L2 memory to cache. Another solution is to reduce the amount of data that is needed at any given time.

If there are conflict misses, the key is to reorganize data so that temporally proximate data sets map to different sets. (With a direct-mapped cache, this is the same as making sure the data sets map to different lines.) Changing the memory layout allows the data accessed to be located at addresses in memory that do not conflict (i.e. map to the same set) in cache. Alternatively, from a hardware design point of view, sets that can hold two or more lines can be created. Thus, two lines from the memory that map to the same set can both be allocated in cache without having one evict the other.