Thursday, May 31, 2007

Anti-Aliasing Filter

Aliasing: "signals" that are essentially continuous in space or time must be sampled, and the set of samples is never unique to the original signal. The other signals that could (or did) produce the same samples are called aliases of the original signal.

Nyquist criterion: The minimum sampling rate required to allow unambiguous reconstruction of a bandlimited continuous signal from its samples (without any aliases) is more than twice the bandwidth of the signal.
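
As a quick illustration (my own sketch, not from the original post): a 3 kHz cosine sampled at only 4 kHz, below the Nyquist rate of 6 kHz, produces the same samples as a 1 kHz cosine. In other words, the 1 kHz tone is an alias of the 3 kHz tone at this sampling rate.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double PI = 3.14159265358979323846;
        const double fs = 4000.0; /* sampling rate: less than 2 x 3 kHz */
        for (int n = 0; n < 8; n++) {
            double t = n / fs;
            /* The two columns agree (up to floating-point rounding): the
             * samples cannot tell the 3 kHz tone from its 1 kHz alias. */
            printf("n=%d  3 kHz: %+.3f   1 kHz: %+.3f\n",
                   n, cos(2 * PI * 3000 * t), cos(2 * PI * 1000 * t));
        }
        return 0;
    }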

Generally, real-world signals (be it sound, image, or video) are not band limited. It is the perceiver (the ear, the eye) that is band limited, i.e., responsive only to certain frequencies. Thus, when a real-world signal is digitized, it is first filtered to make it band limited (to avoid aliasing) and then sampled at a rate satisfying the Nyquist criterion.

In the case of voice, the useful frequency content extends to around 4 kHz, so an anti-aliasing filter with a 4 kHz cutoff is applied first. Similarly, for music the audible range extends to around 22 kHz, so an anti-aliasing filter with that cutoff is used. (An anti-aliasing filter is a low-pass filter.)

The concept of aliasing is fairly intuitive for sound and audio, but when it comes to video or images it is less obvious, and the terminology does not map directly to what we use for sound. This article aims to clarify that.

The example I am going to take here is from an article I read on ExtremeTech named "Anti-aliasing and Anisotropy filtering explained". Consider the following image that needs to be sampled based on the pixel grid shown behind it.

The image you're trying to represent: a diagonal line on a pixel grid.
A rough way to do this would be to make a pixel black if it is mostly covered by the black area and white if it is mostly covered by white. Doing this we get the following image.

Aliasing at work: each pixel displays one color or the other.
From the above figure it is clear that the image has a jagged effect and does not represent the diagonal line well. To make this clearer, let's consider a more finely sampled grid.

As resolution goes up, the ugliness of aliasing becomes less apparent.
Here, even though the diagonal line is better represented (by going for a higher number of samples we have increased the sampling rate and thus reduced the aliasing effect), we still observe the jagged edges.
To reduce this effect, the intuitive solution is to fill pixels that are partially covered by both black and white with shades of gray.
So if a pixel is 25% covered by the black edge, you make that pixel only 25% black, a light gray color. If it is 80% covered by the black line, you make that pixel 80% black, a dark gray. The result is an antialiased edge.

Antialiased edge
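
A minimal sketch of this coverage-to-gray mapping (my own illustration, assuming an 8-bit grayscale pixel where 255 is white and 0 is black):

    #include <stdio.h>

    /* Map a pixel's coverage fraction (0.0 = all white, 1.0 = all black)
     * to an 8-bit gray level, as described above. */
    static unsigned char coverage_to_gray(double coverage)
    {
        return (unsigned char)(255.0 * (1.0 - coverage) + 0.5);
    }

    int main(void)
    {
        /* 25% covered -> light gray (191); 80% covered -> dark gray (51) */
        printf("%d %d\n", coverage_to_gray(0.25), coverage_to_gray(0.80));
        return 0;
    }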

What we have really done here is filter the input image (apply an anti-aliasing filter) before sampling it. That is, if the diagonal edge is low-pass filtered before sampling, we observe gray shades along the edges. A much clearer picture of the same is shown below.

Antialiased edge on an 80x60 pixel image

In 3D graphics, this aliasing effect is viewed from a different angle, and the methods used to deal with it are described below.
Super-sampling:
"The simple brute-force approach is called supersampling. For every pixel on your screen, you determine the color for multiple pixels, and then combine them. Let's say your screen is running at a resolution of 800x600 pixels. If you want to perform 4x supersampling, you effectively draw the screen at 1600x1200 pixels (four times as large) and then scale that down to 800x600 by combining the colors of four neighboring pixels into one. Most modern 3D cards don't exactly draw a higher-resolution screen. Rather, they calculate, store, and combine "subpixel samples" for each pixel."
Sub-pixel samples
In other words (since we cannot apply analog filters while capturing, or since in graphics we never have a perfect analog signal to begin with), we oversample the input signal, apply a linear filter across it, and then downsample the oversampled input.
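
A minimal sketch of that pipeline for 4x supersampling (my own illustration, assuming a grayscale framebuffer): the scene is drawn at twice the width and height, and each output pixel is the average of the corresponding 2x2 block of subpixel samples (a box filter).

    #include <stddef.h>

    /* src: the 2w x 2h oversampled image; dst: the w x h output image. */
    void downsample_4x(const unsigned char *src, unsigned char *dst,
                       size_t w, size_t h)
    {
        for (size_t y = 0; y < h; y++) {
            for (size_t x = 0; x < w; x++) {
                /* Average the four subpixel samples covering this pixel. */
                unsigned sum = src[(2*y)     * (2*w) + (2*x)]
                             + src[(2*y)     * (2*w) + (2*x + 1)]
                             + src[(2*y + 1) * (2*w) + (2*x)]
                             + src[(2*y + 1) * (2*w) + (2*x + 1)];
                dst[y * w + x] = (unsigned char)(sum / 4);
            }
        }
    }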

Effects of Super-Sampling in Real-World Images:
No AA vs. 2xAA: even 2xAA makes a huge difference, and this is the fastest, lowest-quality mode available.

No AA vs. 4xAA: stepping up to 4xAA makes a big difference.

Full quality comparison: even higher levels of AA, together with AA modes that supersample objects with transparent textures (like the trees), make the game look a lot better.

The comparison pictures above show the effect of anti-aliasing clearly.

Wednesday, May 30, 2007

Six Ways To Write More Comprehensible Code

Courtesy: Jeff Vogel
I read this article and thought I would just jot down the major points. I liked it.

Cache

A place to hide or conceal something (dict)

A cache is a small, fast memory used to store blocks of data that will be used again and again.

REASON:
The reason for having a cache is that fast memory is expensive, yet to keep up with the speed of the CPU we need fast memory. So instead of building the entire memory out of fast (and therefore costly) memory, we use a small, fast memory and bring data into it from the slower memory whenever required. This reduces the average access time for data.
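
As a back-of-the-envelope example (the numbers here are assumed for illustration, not taken from any real part): if a Level 1 hit takes 1 cycle, 5% of accesses miss, and a miss costs 20 extra cycles, the average access time is 1 + 0.05 x 20 = 2 cycles, far better than paying 20 cycles on every access.

    #include <stdio.h>

    int main(void)
    {
        double hit_time     = 1.0;   /* cycles for a Level 1 hit           */
        double miss_rate    = 0.05;  /* fraction of accesses that miss     */
        double miss_penalty = 20.0;  /* extra cycles to fetch from Level 2 */

        /* average access time = hit_time + miss_rate * miss_penalty */
        printf("average access time = %.1f cycles\n",
               hit_time + miss_rate * miss_penalty);
        return 0;
    }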

ASSUMPTIONS:
Based on the principle of locality:
  1. Spatial locality - data near the currently accessed data has a high probability of being accessed
  2. Temporal locality - the same data will be accessed again and again (temporary variables, instructions being executed, etc.)
BASIC BLOCK DIAGRAM:

CPU <=> Level 1 <=> Level 2 <=> Level 3
-------> Memory size increases ------->
<------- Speed increases <-------

MECHANISM:

Initially, all the data is in the Level 3 cache. Whenever data needs to be accessed, the CPU first checks for its availability in Level 1 (if found, it is a cache hit). If the data is not available (a cache miss), it checks the next higher level (Level 2), and so on until the data is found. Once found, the data, along with some of its neighbors, is brought into all the lower-level caches.
So the first access incurs latency. Subsequent accesses to the same data are faster, since it is now available in the cache - temporal locality. Further, accesses to data near the first accessed data are also faster - spatial locality. Thus, on average, the access time is reduced.
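
A toy model of this mechanism (my own sketch, nothing like real hardware), where the last level is assumed to always hold the data:

    #include <stdbool.h>

    #define LEVELS 3

    /* level_has_data[i] models whether level i+1 currently holds the line. */
    static bool level_has_data[LEVELS];

    /* Returns the level (1-based) at which the data was found, and fills
     * every lower level on the way back, so the next access to the same
     * line hits in Level 1. */
    int access_line(void)
    {
        int lvl = 0;
        while (lvl < LEVELS - 1 && !level_has_data[lvl])
            lvl++;                    /* miss: try the next higher level */
        for (int i = 0; i <= lvl; i++)
            level_has_data[i] = true; /* bring the line into lower levels */
        return lvl + 1;
    }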

TERMS:
Cache Hit:
Data requested by the CPU is present in that particular level
Cache Miss:
Data requested is not present in the particular level
Associativity:
The way in which the cache is organized - e.g. Direct Mapping, Set Associativity
Cache Line: the block of data read from the higher level into the current level in a single read

Associativity:
How main memory addresses map to cache memory locations.

Direct Mapping:
Caches can be either "direct-mapped" or "set-associative." To explain these terms, let's look at the L1P cache of the C64x shown in Figure 4. This cache consists of 512 32-byte lines. Each line always maps to the same fixed addresses in memory. For instance:
  • Addresses 0000h to 001Fh will always be cached in line 0
  • Addresses 0020h to 003Fh will always be cached in line 1
  • Addresses 3FE0h to 3FFFh will always be cached in line 511.
Once we get to address 4000h, the capacity of the cache is exhausted, so addresses 4000h to 401Fh start back over at line 0.

In addition to holding copies of data, each line of the L1P cache contains:
  • A valid bit, which indicates whether or not the cache line contains useful data.
  • A tag, which is equivalent to the upper 18 bits of the address. This is needed because a given line can contain data from different addresses. For example, line 0 can hold data from addresses 0000h to 001Fh, addresses 4000h to 401Fh, etc.
  • An (implied) set number. The set number is equivalent to bits 5 to 13 of the address. For a direct-mapped cache, the set number is the same as the line number. Things get more complicated with a set-associative cache, as we will see later in this article.
Now let's see what happens if the CPU accesses address location 0020h. Assume that the cache has been completely invalidated, meaning that no line contains cached data. When the CPU makes a request to read address 0020h, the cache controller starts by looking at the set portion of the address (i.e., bits 5 to 13) to see which cache line it should inspect. For address 0020h the set portion is one. The controller then checks the tag for line one to see if the line holds addresses 0020h to 003Fh. Finally, it checks the valid bit to see if the line holds useful data. Since the valid bit is 0 (the cache was just invalidated), the controller registers a miss.

The miss causes the controller to fetch the line (0020h-003Fh) from memory and store the data in set one. The tag portion of the address is stored in the tag RAM, and the valid bit is set to one. The fetched data is also forwarded to the CPU, and the access is complete.

If address 0020h is accessed again, the cache controller once again takes the address, inspects its set and tag portions, and compares the tag portion of the address against the value stored in tag RAM. In this case the tag comparison is positive. The valid bit is now one so the controller will register a hit and forward the data in the cache line to the CPU, completing the access.
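
A small sketch of how such a controller splits an address (my own illustration, using the field widths given above: 32-byte lines, 512 sets):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t addr   = 0x0020;
        uint32_t offset = addr & 0x1F;         /* bits 0-4: byte within line */
        uint32_t set    = (addr >> 5) & 0x1FF; /* bits 5-13: set number      */
        uint32_t tag    = addr >> 14;          /* bits 14-31: upper 18 bits  */

        /* For 0020h this prints tag 0, set 1, offset 0, matching the text. */
        printf("addr 0x%04x -> tag 0x%x, set %u, offset %u\n",
               (unsigned)addr, (unsigned)tag, (unsigned)set, (unsigned)offset);
        return 0;
    }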

Set-Associative Mapping:
Set-associative caches are an extension of direct-mapped caches. In a direct-mapped cache, each set contains only one line. In a set-associative cache, each set consists of multiple lines, which are referred to as "ways." Figure 5 illustrates a set-associative cache, the C64x DSP's L1D. This is a two-way set-associative cache with 64-byte lines and a total capacity of 16 KBytes.

In addition to holding copies of data, each line of the L1D cache contains:
  • An LRU bit that indicates which way was least recently used. (This is not found in L1P.)
  • A dirty bit, which indicates whether or not the cache line matches the contents of main memory. (This is not found in L1P.)
  • A valid bit, which indicates whether or not the cache line contains useful data.
  • A tag, which is equivalent to the upper 19 bits of the address.
  • A set number, which is equivalent to bits 6 to 12 of the address.
Hits and misses are determined the same way as in a direct-mapped cache except that now two tag comparisons are necessary—one for each way—to determine which way holds the requested data. If there is a hit in way 0, the data of the line in way 0 is accessed, and if there is a hit in way 1, the data of the line in way 1 is accessed.

If both ways miss, the data needs to be allocated from memory. The LRU bit determines how the data is allocated and can be thought of as a switch. If the LRU bit is set to 0 the line is allocated in way 0, and if it is 1 the line gets allocated in way 1. The state of the LRU bit changes whenever an access (read or write) is made to a cache line. For instance, if the line in way 0 is read, the LRU bit switches to 1. Note that the LRU bit is only consulted on a miss, but its status is updated every time a line is accessed, regardless of whether it was a hit, miss, read, or write.
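
A simplified model of that lookup (my own sketch, not the actual C64x hardware), with two ways and one LRU bit per set:

    #include <stdbool.h>
    #include <stdint.h>

    struct way { uint32_t tag; bool valid; };
    struct set { struct way way[2]; int lru; /* least recently used way */ };

    /* Returns true on a hit; on a miss, allocates the line into the LRU
     * way. Either way, the LRU bit flips to point at the other way. */
    bool lookup(struct set *s, uint32_t tag)
    {
        for (int w = 0; w < 2; w++) {
            if (s->way[w].valid && s->way[w].tag == tag) {
                s->lru = 1 - w;  /* hit: the other way is now LRU */
                return true;
            }
        }
        int w = s->lru;          /* miss: allocate into the LRU way */
        s->way[w].tag = tag;
        s->way[w].valid = true;
        s->lru = 1 - w;
        return false;
    }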

Like the L1P, the L1D is a read-allocated cache: new data is allocated from memory on a read miss only. On a write miss, the data goes through a write buffer to memory, bypassing the L1D cache. On a write hit, the data is written to the cache but not immediately to memory. This type of cache is referred to as a write-back cache, since data that was modified by a CPU write access is written back to memory at a later time.

The dirty bit indicates that a cache line has been modified by a write but the new data has not been written out to main memory. Initially the dirty bit will be zero. As soon as the CPU writes to a line, the corresponding dirty bit will be set to 1. The data is then written back to main memory when the line is evicted from the cache. This will happen when a read miss causes new data to be allocated to the dirty line. A write-back can also be initiated in software by sending a write back command to the cache controller. However, this is not usually required.
Tag, cache line, cache line size, and cache size should be clear from the above diagram.

Optimizing Cache Performance
There are three types of misses:
  • Compulsory miss (also called first-reference miss) - a miss that occurs when the data is brought into the cache for the first time. In contrast to the other two types of misses below, compulsory misses usually cannot be avoided.
  • Conflict miss - a miss that occurs because the line is replaced before it is re-used.
  • Capacity miss - a miss that occurs when the capacity is exhausted. Capacity misses are a subset of conflict misses.
For each miss, there will be stalls while the controller pulls data from memory into cache. To get the highest performance, a line must be reused as often as possible before it is replaced with another line. Reusing the line but accessing different locations within that line improves the spatial locality of accesses, while reusing the same locations of a line improves the temporal locality of accesses. This is the fundamental strategy of optimizing memory accesses for cache performance.

For example, cache performance is high when a program accesses memory sequentially. This access pattern makes many accesses to the same line before moving on to the next line, so the following pseudo-code has a very high hit rate:
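
A minimal C sketch of such a sequential pattern (my own illustration), assuming a simple array sum:

    #include <stddef.h>

    /* Sequential, stride-1 access: every element of a cache line is used
     * before moving to the next line, so most accesses are hits. */
    int sum_sequential(const int *a, size_t n)
    {
        int sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }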


The performance of the following code is not good because accesses to memory are in large strides, which means it makes fewer accesses to a given line before moving to the next line:
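
A sketch of such a strided pattern (again my own illustration), assuming a row-major matrix traversed column by column:

    #include <stddef.h>

    #define NROWS 1024
    #define NCOLS 1024

    /* Column-major traversal of a row-major matrix: consecutive accesses
     * are NCOLS elements apart, so each access may touch a new cache line,
     * and a line is rarely reused before it is evicted. */
    int sum_strided(int a[NROWS][NCOLS])
    {
        int sum = 0;
        for (size_t j = 0; j < NCOLS; j++)
            for (size_t i = 0; i < NROWS; i++)
                sum += a[i][j];
        return sum;
    }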


If a line is evicted from cache and then accessed again, it must be brought into cache again. Therefore, it is important to avoid eviction for as long as possible. Identifying the cause of a miss helps prevent future misses.

As noted above, capacity misses occur because the cache is smaller than main memory. If there are capacity misses, the simplest solution is to increase the size of the cache. For example, the L2 cache on the C64x DSP can be configured as a mix of SRAM and cache. If there are excessive capacity misses, the programmer can allocate more L2 memory to cache. Another solution is to reduce the amount of data that is needed at any given time.

If there are conflict misses, the key is to reorganize data so that temporally proximate data sets map to different sets. (With a direct-mapped cache, this is the same as making sure the data sets map to different lines.) Changing the memory layout allows the data accessed to be located at addresses in memory that do not conflict (i.e. map to the same set) in cache. Alternatively, from a hardware design point of view, sets that can hold two or more lines can be created. Thus, two lines from the memory that map to the same set can both be allocated in cache without having one evict the other.
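
A classic illustration of such a layout change (my own assumed example, for a 16 KB direct-mapped cache with 32-byte lines): two arrays that are exactly one cache size apart map to the same sets, so alternating accesses evict each other; padding one of them by a cache line or two moves it to different sets.

    #include <stdio.h>

    #define N 4096                /* 4096 floats = 16 KB = one full cache */

    static float a[N];
    static float pad[16];         /* 64 bytes of padding shifts b to
                                     different sets (assuming the linker
                                     places these arrays contiguously) */
    static float b[N];

    int main(void)
    {
        (void)pad;                /* only the padding's placement matters */
        float sum = 0.0f;
        for (int i = 0; i < N; i++)
            sum += a[i] * b[i];   /* without pad, a[i] and b[i] would map
                                     to the same set and thrash */
        printf("%f\n", (double)sum);
        return 0;
    }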