DE2 Verilog examples
ECE 576 Cornell University

I used these examples to help teach myself Verilog. They are in the order I did them. The later examples are more fluent.
List of projects included on this page:

  1. Simple FPGA i/o
  2. External SRAM interface
  3. Kraken 16-bit cpu
  4. VGA interface: color generation, diffusion-limited aggregation
  5. Audio Codec interface: DAC output and ADC input
  6. M4K memory block timing test

  1. FPGA I/O
    This simple example was built mostly to understand the FPGA I/O pin assignments and the compilation/synthesis procedure. The whole QuartusII project is zipped here.
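    As a rough illustration of the kind of design that is enough to check pin assignments and the compile flow, the sketch below wires switches straight to LEDs. The module name and port names follow the usual DE2 naming convention but are assumptions, not taken from the original project. The actual pin locations still have to be assigned in the Quartus project.

    ```verilog
    // Minimal I/O sketch (hypothetical module, not the original example).
    // Port names and widths assume the usual DE2 conventions.
    module simple_io_sketch (
        input  wire [17:0] SW,    // toggle switches
        input  wire [3:0]  KEY,   // pushbuttons, assumed active low
        output wire [17:0] LEDR,  // red LEDs
        output wire [3:0]  LEDG   // four of the green LEDs
    );
        // Each red LED mirrors the switch below it.
        assign LEDR = SW;
        // A green LED lights while its pushbutton is held down.
        assign LEDG = ~KEY;
    endmodule
    ```

    If an LED responds to the wrong switch after programming, the pin assignment rather than the Verilog is the first thing to check.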
  2. External SRAM interface
    This example exercises the external 61LV25616 SRAM.
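    As a rough illustration of what driving an asynchronous SRAM of this kind involves, the sketch below shows the tri-state data bus and the active-low control lines. The module name, the user-side ports, and the SRAM_* pin names are assumptions for illustration. Setup and hold timing is not modeled, so this is a starting point rather than the original interface.

    ```verilog
    // Hypothetical single-port wrapper around an asynchronous 256K x 16 SRAM.
    // All names are illustrative assumptions; timing margins are not handled.
    module sram_port_sketch (
        input  wire        we,         // 1 = write, 0 = read
        input  wire [17:0] addr,
        input  wire [15:0] wdata,
        output wire [15:0] rdata,
        output wire [17:0] SRAM_ADDR,
        inout  wire [15:0] SRAM_DQ,    // bidirectional data bus
        output wire        SRAM_WE_N,
        output wire        SRAM_OE_N,
        output wire        SRAM_CE_N,
        output wire        SRAM_UB_N,
        output wire        SRAM_LB_N
    );
        assign SRAM_ADDR = addr;
        // Drive the bus only while writing; otherwise release it to the SRAM.
        assign SRAM_DQ   = we ? wdata : 16'hzzzz;
        assign rdata     = SRAM_DQ;
        assign SRAM_WE_N = ~we;   // write enable, active low
        assign SRAM_OE_N = we;    // output enable, active low, off during writes
        assign SRAM_CE_N = 1'b0;  // chip always selected
        assign SRAM_UB_N = 1'b0;  // upper byte enabled
        assign SRAM_LB_N = 1'b0;  // lower byte enabled
    endmodule
    ```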

  3. Kraken 16-bit cpu
    This example is a simple 16-instruction ISA CPU with LED and switch I/O. The implemented datapath and timing diagram are useful for understanding the Verilog. There is a picture of the board displaying instruction address PC=02, which contains 16'h8104, the instruction LI r1,4 (load-immediate register 1 with value 4). This CPU is mostly intended for me to teach myself Verilog in an Altera context.
    Features include:
    Program:
       assembler        instruction memory
       LI r0, 1         8001 ;need to NOP first inst out of reset
       LI r0, 1         8001
       LI r1, 4         8104
       SUB r1, r1, r0   1110
       BNZ r1, -1       C1FF ;PC update timing implies that this jump is to the SUB
       JMP -3           E0FD ;This jump is to the second LI
    A short MPEG shows this program executing. The finger entering the frame from the lower right is running the clock. The blinking green LED is illuminated during the FETCH state. The left-most 2-digit 7-segment display shows the PC. The 4-digit 7-segment display shows the instruction being fetched/executed. The program loops through the subtract 4 times, then jumps back, reloads the counters and counts down again.
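    The hex words in the program listing suggest a simple field layout: the top nibble is the opcode, the next nibble is the destination register, and the low byte is either an 8-bit signed immediate or two 4-bit source register numbers. The sketch below splits an instruction along those lines. The layout is inferred only from the six words in the listing, and the module and signal names are hypothetical, so treat it as a reading aid rather than the CPU's actual decoder.

    ```verilog
    // Field split inferred from the program listing (an assumption):
    //   8104 = LI  r1, 4       -> opcode 8, rd 1, imm 04
    //   1110 = SUB r1, r1, r0  -> opcode 1, rd 1, rs1 1, rs2 0
    //   C1FF = BNZ r1, -1      -> opcode C, rd 1, imm FF (-1)
    module kraken_decode_sketch (
        input  wire [15:0] inst,
        output wire [3:0]  opcode,
        output wire [3:0]  rd,
        output wire [3:0]  rs1,
        output wire [3:0]  rs2,
        output wire [15:0] imm_sx   // low byte, sign-extended to 16 bits
    );
        assign opcode = inst[15:12];
        assign rd     = inst[11:8];
        assign rs1    = inst[7:4];
        assign rs2    = inst[3:0];
        // Replicate bit 7 so that FF reads as -1 and FD as -3.
        assign imm_sx = {{8{inst[7]}}, inst[7:0]};
    endmodule
    ```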

  4. VGA examples
    1. The first example displays a 320x240 image on a 640x480 raster using external SRAM. The color depth is 12 bits/pixel (4 bits/primary/pixel). When first powered up, SRAM contains random bits. Pressing KEY1 writes a 20x15 grid of colors to memory. Holding KEY2 while pressing KEY1 writes a single color to SRAM, as determined by the upper 12 bits of the switches. SW[15:12] is red intensity, SW[11:8] is green, SW[7:4] is blue. The VGA driver, PLL, and reset controller from the DE2 CDROM are necessary to compile this example. All the files are in this zip. An image is below. There is a dim scan bar across the middle of the frame caused by the digital camera.
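    Showing a 320x240 image on a 640x480 raster means each stored pixel covers a 2x2 block of screen pixels. A minimal sketch of that mapping, assuming one pixel per 16-bit SRAM word with the color in the upper 12 bits, is given below. The module name, port names, and word layout are assumptions for illustration and may not match the original design.

    ```verilog
    // Hypothetical address/color helper for a 320x240 image in 16-bit SRAM.
    // Assumes one pixel per word, color in bits [15:4], low 4 bits unused.
    module vga_addr_sketch (
        input  wire [9:0]  vga_x,      // raster column, 0..639
        input  wire [9:0]  vga_y,      // raster row, 0..479
        input  wire [17:0] SW,
        output wire [17:0] sram_addr,
        output wire [15:0] fill_word
    );
        // Dropping the low bit halves each coordinate, giving 2x2 pixel blocks.
        wire [8:0] px = vga_x[9:1];    // 0..319
        wire [7:0] py = vga_y[8:1];    // 0..239
        // addr = py*320 + px, with 320 = 256 + 64 so no multiplier is needed.
        assign sram_addr = ({10'b0, py} << 8) + ({10'b0, py} << 6) + {9'b0, px};
        // SW[15:12] = red, SW[11:8] = green, SW[7:4] = blue.
        assign fill_word = {SW[15:4], 4'b0000};
    endmodule
    ```

    The largest address produced is 239*320 + 319 = 76799, which fits easily in the 18-bit address of a 256K-word SRAM.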
    2. The second example displays a 640x480 image on a 640x480 raster with a color depth of 8 bits/pixel. The color map is shown below. There are 16 levels of green (vertically), 4 levels of red and 4 levels of blue. Both red and blue increase in intensity from left to right (except for the last four columns on the right where there is no blue). In the first 4 columns, blue intensity is zero, becoming full intensity in columns 13-16. You could build a color lookup table to produce more distinct colors than the linear set displayed here.
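    To make the linear color set concrete, here is a minimal sketch of expanding one 8-bit pixel into three DAC channels by bit replication. The order of the fields within the byte, the 10-bit channel width, and all names are assumptions for illustration; the original design may pack the pixel differently.

    ```verilog
    // Hypothetical 8-bit pixel expansion: 4 bits green, 2 bits red, 2 bits blue.
    // The field order {green, red, blue} is an assumption, not from the source.
    module color8_expand_sketch (
        input  wire [7:0] pixel,
        output wire [9:0] vga_r,
        output wire [9:0] vga_g,
        output wire [9:0] vga_b
    );
        wire [3:0] g = pixel[7:4];   // 16 green levels
        wire [1:0] r = pixel[3:2];   // 4 red levels
        wire [1:0] b = pixel[1:0];   // 4 blue levels
        // Repeating the bit pattern maps the maximum field value to full scale.
        assign vga_g = {g, g, g[3:2]};
        assign vga_r = {r, r, r, r, r};
        assign vga_b = {b, b, b, b, b};
    endmodule
    ```

    A color lookup table would replace the three replication expressions with a 256-entry ROM indexed by the pixel value.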
    3. The third VGA example generates a 320x240 diffusion-limited-aggregation (DLA). A DLA is a clump formed by sticky particles adhering to an existing structure. In this design, we start with one pixel at the center of the screen and allow a random walker to bounce around the screen until it hits the pixel at the center. It then sticks and a new walker is started randomly at one of the 4 corners of the screen. The random number generators for x and y steps are XOR feedback shift registers (see also Hamblen, Appendix A). The VGA driver, PLL, and reset controller from the DE2 CDROM are necessary to compile this example. All the files are in this zip. A short mpg (1.3 Mbyte) shows the initial stages of the simulation. The first image below is at an earlier time in the simulation than the second image. When the structure is larger, there is a bias to grow toward the corners because that is where new walkers were released. The program is structured as a state machine which processes walker updates only during vertical or horizontal sync intervals when external SRAM is not needed to update the screen. Note that you must push KEY0 to start the state machine.
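    For readers who have not met an XOR feedback shift register, the sketch below shows a 16-bit one. The tap positions are a commonly used maximal-length choice and the seed is arbitrary; neither is taken from the original design, and the module and port names are hypothetical.

    ```verilog
    // 16-bit XOR feedback shift register (LFSR) sketch.
    // Taps correspond to x^16 + x^14 + x^13 + x^11 + 1, a common
    // maximal-length choice (an assumption, not the author's taps).
    module lfsr16_sketch (
        input  wire        clk,
        input  wire        reset,    // synchronous, loads a non-zero seed
        input  wire        enable,   // advance one step when high
        output reg  [15:0] state
    );
        wire feedback = state[15] ^ state[13] ^ state[12] ^ state[10];
        always @(posedge clk) begin
            if (reset)
                state <= 16'hACE1;   // any non-zero seed works; all zeros locks up
            else if (enable)
                state <= {state[14:0], feedback};
        end
    endmodule
    ```

    A walker step can then be taken from a couple of state bits, for example using one bit to choose the direction along x and a bit from a second, independently seeded register for y.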


      A minor change in the design results in colors. The 12-bit color vector (see VGA example 1 above) was just decremented for every new walker. Since the ordering is high bits for red, mid bits for green, low bits for blue, the color starts at white, fades to yellow, then red, then cycles in a complicated fashion.



      Another minor modification of the design results in a "forest". In this case, the entire bottom of the screen was the starting seed and the walker release sites were uniformly distributed across most of the top of the screen. The second image below shows the simulation after it has reached the top of the screen. At this point, every walker is immediately immobilized and growth stops, as shown in the second image following.

      This design version has several states merged to take advantage of hardware parallelism. Memory access is, of course, serial. The walker release sites were modified to cover approximately the top two-thirds of the screen.
  5. Using the Audio Codec
    DAC only:
    The audio codec supports ADC conversion of microphone or line inputs into the FPGA, and DAC conversion of digital audio from the FPGA to line out. This example shows how to set up playback from the FPGA to line out. Playback is supported from flashRAM, SDRAM, SRAM or by direct on-the-fly synthesis. This example uses direct digital synthesis (DDS) of a sine wave. The DDS sine wave ROM table is computed by a MATLAB program. In this mode, the codec requires 16-bit, 2's complement samples for each channel. The MATLAB program actually generates 15-bit samples because the gain of the headphone output was turned up too high and caused clipping.
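    The core of a DDS sine generator is a phase accumulator whose top bits index a sine table. The sketch below assumes a 32-bit accumulator, a 256-entry table, and a clock that ticks once per audio sample. The file name sine_table.hex is hypothetical and stands in for the MATLAB-generated table; none of the names or sizes are taken from the original project.

    ```verilog
    // DDS sketch: phase accumulator plus sine lookup table.
    // Accumulator width, table size, and all names are assumptions.
    module dds_sine_sketch (
        input  wire               clk,        // assumed: one tick per audio sample
        input  wire               reset,
        input  wire        [31:0] increment,  // f_out = increment * f_sample / 2^32
        output reg  signed [15:0] sample      // 16-bit, 2's complement
    );
        reg        [31:0] phase;
        reg signed [15:0] sine_rom [0:255];

        // Hypothetical file holding one period of a sine, one hex word per line.
        initial $readmemh("sine_table.hex", sine_rom);

        always @(posedge clk) begin
            if (reset)
                phase <= 32'd0;
            else
                phase <= phase + increment;
            // The top 8 phase bits select the table entry.
            sample <= sine_rom[phase[31:24]];
        end
    endmodule
    ```

    Because only the increment changes the output frequency, the same table serves every pitch; a larger increment simply steps through the table faster.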

    The logical structure of the hardware driver:

    ADC and DAC:
    This example uses a modified interface to loop the stereo ADC input through a digital filter and back to the output DAC. The whole project is zipped here. The project includes a SignalTap debug interface to show ADC data input. In the SignalTap data capture shown below, the ADCLRCK signals right-channel when low and left-channel when high.

    The top_level module was modified slightly to simplify the interface to the new Audio_DAC_ADC module. The I2C_AV_config module LUT was modified to turn on the ADC with line-input. Note that one word of dummy data must be sent across the I2C interface before the first actual setup parameter. The central part of the Audio_DAC_ADC module is shown below. Data is clocked in/out on the negative edge of AUD_BCLK. The 16-bit, serial data is 2's complement, MSB first. The LRCK_1X signal represents the locally generated ADCLRCK.
    // Shift one ADC bit in per bit clock. The index ~SEL_Cont counts from
    // 15 down to 0, so the MSB-first serial stream fills bit 15 first.
    always@(negedge oAUD_BCK or negedge iRST_N)
    begin
        if(!iRST_N) SEL_Cont <= 0;
        else
        begin
            SEL_Cont <= SEL_Cont+1; // bit selector: 4-bit counter, so it wraps at 16
            if (LRCK_1X) // ADCLRCK high: left channel
                AUD_inL[~(SEL_Cont)] <= iAUD_ADCDAT;
            else         // ADCLRCK low: right channel
                AUD_inR[~(SEL_Cont)] <= iAUD_ADCDAT;
        end
    end
    // Output the DAC bit-stream, MSB first
    assign oAUD_DATA = (LRCK_1X)? AUD_outL[~SEL_Cont]: AUD_outR[~SEL_Cont] ;
    // Filter the input sample and register the output once per sample period
    always@(negedge LRCK_1X ) //oAUD_LRCK
    begin
        // 1-pole lowpass at about 200 Hz. The >>> operator is a SIGNED shift right
        // only when its operand is declared signed (declarations are not shown here).
        AUD_outR <= (AUD_inR>>>6) + AUD_outR - (AUD_outR>>>6);
        // just pass through left channel
        AUD_outL <= AUD_inL ;
    end
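    The right-channel update in the code above is a first-order recursive (exponential averaging) filter. Written as a difference equation, with x the input sample, y the output sample, and f_s the sample rate, and ignoring the truncation in the two shifts, it reads as follows. The sample rate is not stated here, so the cutoff is left in symbolic form.

    ```latex
    \[
    \begin{aligned}
    y[n] &= y[n-1] + 2^{-6}\,\bigl(x[n] - y[n-1]\bigr)
          = \tfrac{63}{64}\,y[n-1] + \tfrac{1}{64}\,x[n], \\
    f_c  &\approx \frac{2^{-6}}{2\pi}\, f_s
          \qquad \text{(valid because } 2^{-6} \ll 1\text{)}.
    \end{aligned}
    \]
    ```

    Changing the shift amount from 6 to k moves the cutoff by a factor of two per step, since the coefficient becomes 2^{-k}.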


  6. M4K memory block timing test
    The CycloneII FPGA has dedicated memory blocks. Timing reads/writes seemed to be an issue in some designs, so I decided to write some Verilog to test various clocking methods, write through versus no write through, and pure Verilog (from the Altera HDL style manual) versus Megafunction memory definitions (which generate Verilog). The six combinations that I tried were:
    1. Memory clocked on negative edge, with write through
    2. Memory clocked on negative edge, without write through
    3. Memory clocked on positive edge, with write through
    4. Memory clocked on positive edge, without write through
    5. Memory block megafunction without registered output, clocked on negative edge
    6. Memory block megafunction without registered output, clocked on positive edge
    The first four were from the Altera HDL style manual, the last two from the MegaWizard Plugin Manager Verilog generator. In all cases the state machine which reads/writes memory is clocked on the positive edge of the same clock. When changing versions of QuartusII, you may need to open the asyncRAM megafunction (using the appropriate wizard) and regenerate it. The top level module latches the data from memory one cycle after the read/write address is asserted and displays a pass/fail on two LEDs. LEDR[15] lights if read-after-write is correct in one cycle, while LEDR[17] lights if the read is correct in one cycle. Only configurations 1 and 5 pass both tests. The whole project is zipped here.
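    The difference between "with write through" and "without write through" comes down to what a read returns when the same address is written on the same clock edge. The two sketches below follow the general pattern of vendor-recommended inferred-RAM templates. Whether they match the exact code tested here is an assumption, and the module names and parameters are hypothetical. Replacing posedge with negedge gives the negative-edge variants.

    ```verilog
    // Without write through: a read during a write returns the OLD contents.
    module ram_old_data_sketch #(parameter AW = 8, DW = 16) (
        input  wire          clk,
        input  wire          we,
        input  wire [AW-1:0] addr,
        input  wire [DW-1:0] d,
        output reg  [DW-1:0] q
    );
        reg [DW-1:0] mem [0:(1<<AW)-1];
        always @(posedge clk) begin
            if (we) mem[addr] <= d;
            q <= mem[addr];          // samples the value from before this edge
        end
    endmodule

    // With write through: a read during a write returns the NEW data.
    module ram_new_data_sketch #(parameter AW = 8, DW = 16) (
        input  wire          clk,
        input  wire          we,
        input  wire [AW-1:0] addr,
        input  wire [DW-1:0] d,
        output wire [DW-1:0] q
    );
        reg [DW-1:0] mem [0:(1<<AW)-1];
        reg [AW-1:0] addr_reg;
        always @(posedge clk) begin
            if (we) mem[addr] <= d;
            addr_reg <= addr;        // register the address, not the data
        end
        assign q = mem[addr_reg];    // reads the array after the write lands
    endmodule
    ```

    A state machine that writes a location and expects to see the new value on the very next cycle will only work with the second form.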

References

JO Hamblen, TS Hall and MD Furman, Rapid Prototyping of Digital Systems, Springer 2005