# An 800Mbps Multi-Channel CMOS Serial Link with 3x Oversampling

Sungjoon Kim, Kyeongho Lee, Deog-Kyoon Jeong, David D. Lee\*, and Andreas G. Nowatzyk\*

Inter-University Semiconductor Research Center, Seoul National University, Seoul, 151-742, Korea

\*SUN Microsystems, Mountain View, CA

#### Abstract

A CMOS serial link is described that uses a digital PLL with 3x oversampling to recover both clock and data. An implementation in 0.6μm CMOS technology exhibits 800Mbps operation with a BER of less than $10^{-12}$ for a pseudo-random number sequence. Chip area and power dissipation per channel at 800Mbps are 2.1mm × 1.1mm and 0.75W, respectively.

#### 1. Introduction

There has been growing demand for high speed communication channels for ATM networks, processor-to-processor communications, and peripheral I/Os. To overcome the difficulty of integrating analog PLL-based links with digital circuits in CMOS technology, several digital clock and data recovery schemes have been developed [1][2]. Digital methods using CMOS technologies are cost effective and allow integration with other digital logic functions in a single chip. However, digital approaches offer less bandwidth than analog approaches because of their excessive computing requirements and high sampling ratios.

#### 2. Chip Architecture

The approach described in this paper uses a uniform sampling technique in which the sampling clocks are generated from a local system clock. It requires only 3 times oversampling, and a multibit averaging digital PLL allows increased bandwidth. The clock and data recovery circuit is mostly digital, except for a charge pump PLL that is shared by several channels. Because clock and data recovery is performed in the digital domain, there is no interference between channels, and multiple channels are therefore easily integrated in a single chip. The test chip integrates 8 channels. The block diagram of the 4 channel serial links is shown in Figure 1.
A charge pump PLL generates 30 multiphase clocks for the transmitters and receivers of all channels. The clocks run at twice the frequency of the local system clock, with phases equally spaced across 360 degrees. The transmitter uses 10 clocks to convert 10 bits of parallel data into a high speed serial stream. Each serial input data bit is oversampled three times using the 30 multiphase clocks. The 30 bits of oversampled data are processed in parallel by the DPLL to extract the phase information from the incoming data patterns. 10 bits of data are recovered in each cycle by selecting the center bits among the 30 oversampled bits. Clock recovery is done by dynamic selection of one of the 30 clocks. The external interface, with a channel encoder/decoder and self-test logic, is designed to be 20 bits wide rather than 10 bits to allow the use of a low frequency system clock.

#### 3. Circuit Design

#### A. Transmitter and Sampler

An impedance matching and swing adjustment circuit, shown in Figure 2, allows adjustment of the termination resistance and voltage swing on a Twinax cable. Serialized outgoing data are driven by an ECL-like open-drain differential pair. A pair of voltage controlled resistors controlled by the impedance matching circuit terminates the transmission cable with 50 $\Omega$. We adopted a two-stage sense amplifier as a fast settling sampler with low metastability [3].

Figure 1. Block diagram of 4 channel serial link

Figure 2. Impedance matching and swing adjust circuit

#### B. Digital PLL

A block diagram of the multibit averaging DPLL used in the receiver is shown in Figure 3. The DPLL receives 30 bits of oversampled data provided by the data oversampler. There are two pointers inside the DPLL, the phase pointer and the word pointer. The phase pointer, P, which indicates the center bit among 3 oversampled bits, is a three-bit ring counter that circulates a single bit according to the Pup and Pdown signals.
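The center-bit selection described in Section 2 can be illustrated with a minimal Python sketch. It assumes an idealized sampler (each bit yields three identical samples, with no noise or jitter) and models a phase offset between data and sampling clocks as a rotation of the 30-sample window. The names `oversample`, `recover_word`, and `skew` are hypothetical and are not taken from the chip design.

```python
OVERSAMPLE = 3   # samples per bit, as stated in the text
WORD_BITS = 10   # bits recovered per cycle, as stated in the text


def oversample(bits, skew=0):
    """Idealized 3x sampler: every bit produces three identical samples.

    `skew` (-1, 0, or +1 samples) rotates the sample window to mimic a
    phase offset between the incoming data and the sampling clocks.
    This rotation is a modeling shortcut, not the hardware behavior.
    """
    samples = [b for b in bits for _ in range(OVERSAMPLE)]
    return samples[-skew:] + samples[:-skew] if skew else samples


def recover_word(samples, phase_pointer):
    """Select one sample per bit, at the position given by the phase pointer P."""
    return [samples[OVERSAMPLE * i + phase_pointer] for i in range(WORD_BITS)]
```

In this model, `recover_word(oversample(word, skew=1), phase_pointer=2)` returns the original 10-bit `word`, and likewise for the pairs `(skew=0, P=1)` and `(skew=-1, P=0)`. With a mismatched pointer, the selected samples sit next to a bit boundary, which is where jitter would corrupt them in a real link.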
The word pointer, W, is a unary 10 bit ring counter that points to the start of a word frame. The word pointer changes its state when the phase pointer crosses its bit boundary by moving its value from 2 to 0 or from 0 to 2.

To bring the timing of all the sampled data into the single domain of the recovered clock, a data rotator and two-stage retimer DFFs are used. The data rotator rotates the 30 bit data according to the word pointer value in such a way that the 30 sequentially sampled data are retimed by the 15 and 30 DFFs with a maximum timing margin, as shown in Figure 5. This configuration always guarantees maximum timing margins for the DFFs when the word pointer changes the internal recovered clock, IntRck.

The digital phase detector and loop filter shown in Figure 6 extract transitions in the 30 bits of parallel sampled data and perform averaging and low pass filtering. The count block counts the number of transitions at each of the 3 possible edges and encodes each number in four bits. After voting for the most likely edge, the comparison block compares it with the current phase and generates an up or down signal for phase adjustment. Three successive ups or downs move the current phase pointer. The state diagram of the low pass filter is shown in Figure 7. This simple low pass filtering effectively removes phase variations caused by jitter, and at the same time tracks the gradual phase drift caused by frequency differences between the remote and local stations.

#### C. Clock Recovery

An internal recovered clock is synthesized under the control of the word pointer and the phase pointer by dynamically selecting one clock out of the 30 possible clocks to recover the sender's clock. When the frequency of the remote station is higher than that of the local station, the word pointer successively selects the clock that leads the current clock, and vice versa. Figure 4 illustrates the clock synthesis process when the frequency of the remote station is higher than that of the local station.
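The phase detector, loop filter, and pointer update of Sections 3.B and 3.C can be sketched in Python as follows. This is a behavioral sketch under stated assumptions, not the circuit of Figures 6 and 7: the mapping from the winning edge to an up or down decision, the reset of the filter on a hold or direction change, and the treatment of the two pointers as a single clock index 3W+P modulo 30 are all assumptions, and every function and class name is hypothetical.

```python
PHASES = 3            # samples per bit
WORD = 10             # bits per cycle
N = PHASES * WORD     # 30 samples and 30 clock phases


def count_edges(samples):
    """Count transitions at each of the 3 possible edge positions.

    counts[k] is the number of indices i with i % 3 == k where
    samples[i] differs from samples[i + 1].
    """
    counts = [0] * PHASES
    for i in range(len(samples) - 1):
        if samples[i] != samples[i + 1]:
            counts[i % PHASES] += 1
    return counts


def phase_decision(samples, p):
    """Return +1 (up), -1 (down), or 0 (hold) for the current phase pointer p."""
    counts = count_edges(samples)
    if max(counts) == 0:
        return 0                        # no transitions, no phase information
    edge = counts.index(max(counts))    # vote for the most likely edge
    target = (edge + 2) % PHASES        # assumed: center is 2 samples past the edge
    return {0: 0, 1: +1, 2: -1}[(target - p) % PHASES]


class LoopFilter:
    """Passes a move only after three successive identical decisions (assumed form)."""

    def __init__(self):
        self.run = 0                    # signed length of the current run

    def step(self, decision):
        if decision == 0:
            self.run = 0                # assumed: a hold resets the run
            return 0
        self.run = decision if self.run * decision < 0 else self.run + decision
        if abs(self.run) == 3:
            self.run = 0
            return decision
        return 0


def advance(clock_index, move):
    """Step the clock index 3*W + P by move, modulo 30; return (index, W, P)."""
    c = (clock_index + move) % N
    return c, c // PHASES, c % PHASES


def track(samples, clock_index, cycles):
    """Run detector and filter for several cycles on a fixed sample pattern."""
    lf = LoopFilter()
    p = clock_index % PHASES
    for _ in range(cycles):
        move = lf.step(phase_decision(samples, p))
        clock_index, _w, p = advance(clock_index, move)
    return clock_index
```

Treating the pointers as one index makes the word pointer change fall out of the arithmetic: stepping P from 2 to 0 increments W, and stepping P from 0 to 2 decrements it. Which direction corresponds to a leading clock depends on how the clock phases are numbered, which this sketch leaves open.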
The recovered clock is a buffered version of ck[3×W+P].

Figure 3. Block diagram of multibit averaging DPLL

Figure 4. Clock synthesis process when transmitter is faster

Figure 5. Conversion of sampled data into recovered clock domain by data rotator

Figure 6. Digital phase detector and loop filter

#### D. Data Recovery

The data can be recovered simply by selecting the 10 center bits among the 30 bits according to the phase pointer value. The 10 bit recovered data are converted to 20 bit data synchronized to the divided recovered clock and sent to other circuits. Character synchronization is done by a character synchronizer integrated in the same chip. The character synchronizer increases the word pointer until correct byte alignment is reached.

Figure 7. State diagram of digital loop filter

Figure 8. Chip microphotograph

#### 4. Experimental Results

The circuit has been fabricated using a $0.6\,\mu\text{m}$, double-metal CMOS technology. A microphotograph of the chip, including 8 channels of serial links, encoders, and decoders, is shown in Figure 8. The serial link core was implemented with full custom layout, and the encoders/decoders and test logic were implemented with standard cells. The total chip size is $9 \times 9\ \text{mm}^2$. One serial link channel occupies $2.1\text{mm} \times 1.1\text{mm}$. The circuit draws 1.2A from a 5V power supply when all 8 channels are active at 800Mbps. Each serial link consumes 0.75W. The PLL clock histogram is shown in Figure 9.

Figure 9. Measured jitter histogram of PLL clock

Figure 10. Measured eye-diagram at 1Gbps

The measured peak-to-peak PLL jitter is 130ps and the RMS jitter is 20.3ps when the PLL is running at 120MHz. The eye-diagram at 1Gbps is shown in Figure 10. With a 500mV swing on a 2 meter Twinax cable, the maximum bandwidth is measured at 840Mbps with a BER of $10^{-9}$ when the frequency of the local station differs by 0.1% from that of the remote station. The BER at 800Mbps is less than $10^{-12}$ under the same conditions.
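As a rough consistency check on the figures reported in this section, the following back-of-the-envelope arithmetic may help. It is a sketch rather than a measurement, and it assumes that the supply current is shared evenly among the 8 channels and that the 0.1% frequency difference appears directly as a 0.1% bit-rate difference. The symbols $P_{\text{total}}$, $P_{\text{channel}}$, $T_{\text{bit}}$, $T_{\text{sample}}$, and $N_{\text{slip}}$ are introduced here for illustration only.

$$
\begin{aligned}
P_{\text{total}} &= 5\,\text{V} \times 1.2\,\text{A} = 6\,\text{W}, &
P_{\text{channel}} &= \frac{6\,\text{W}}{8} = 0.75\,\text{W}, \\
T_{\text{bit}} &= \frac{1}{800\,\text{Mbps}} = 1.25\,\text{ns}, &
T_{\text{sample}} &= \frac{T_{\text{bit}}}{3} \approx 417\,\text{ps}, \\
N_{\text{slip}} &= \frac{1}{3 \times 0.001} \approx 333\ \text{bits}.
\end{aligned}
$$

Under these assumptions, the per-channel power agrees with the reported 0.75W, and the phase pointer has to step by one sample position roughly every 333 received bits to track a 0.1% offset at 800Mbps.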
The recovery circuit operates error-free up to 700Mbps. The chip characteristics are summarized in Table 1.

Table 1. Chip characteristics and performance summary

| Parameter | Value |
|---|---|
| Technology | 0.6μm double metal CMOS |
| Total chip size with 8 channels | 9mm × 9mm |
| Area per channel | 2.1mm × 1.1mm |
| Power per channel (800Mbps) | 0.75W |
| RMS jitter of PLL at 1.2Gbps | 20.3ps |
| Bit error rate (BER) | No errors at 700Mbps |
| | $10^{-12}$ at 800Mbps |
| | $10^{-9}$ at 840Mbps |

#### 5. Conclusion

Using the 3 times oversampling architecture based on a multiphase-generating PLL and a multibit averaging DPLL, an 800Mbps 8 channel CMOS serial link has been fabricated and its operation has been experimentally verified. This new architecture minimizes analog components and allows separate multi-channel operation without interference. It also enables cost effective integration of serial links with CMOS ASICs.

#### Acknowledgments

The authors wish to thank Joongseok Moon of Seoul National University and Rong Pan and David Lai of LSI Logic for their technical assistance and testing efforts.

#### References

- [1] Mel Bazes, Roni Ashuri, "A Novel CMOS Digital Clock and Data Decoder," *IEEE Journal of Solid-State Circuits*, Vol.27, No.2, pp. 1934-1940, Dec., 1992.
- [2] Bin Guo, Arthur, Yun-che Wang, and James Kubinec, "A 125Mbs CMOS All-Digital Data Transceiver using Synchronous Uniform Sampling," *ISSCC Digest of Technical Papers*, pp. 112-113, Feb., 1994.
- [3] Kyeongho Lee, Sungjoon Kim, Gijung Ahn, and Deog-Kyoon Jeong, "A CMOS Serial Link for 1 Gbaud Fully Duplexed Data Communication," *Symposium on VLSI Circuits Digest of Technical Papers*, pp. 125-126, June, 1994.