## 2.2 A 90nm CMOS 16Gb/s Transceiver for Optical Interconnects

Samuel Palermo<sup>1</sup>, Azita Emami-Neyestanak<sup>2</sup>, Mark Horowitz<sup>1</sup>

<sup>1</sup>Stanford University, Stanford, CA <sup>2</sup>Columbia University, New York, NY

As I/O bit rates have increased in order to accommodate growing on-chip aggregate bandwidth, the disparity between optical and electrical channels at the board and box level has risen. This increases electrical link equalization complexity and leads designers to consider optical interconnects in order to meet I/O power-budget and density requirements. A dense low-power full optical transceiver cell capable of 16Gb/s operation is developed to explore the potential of optical interconnects to meet growing chip-to-chip bandwidth requirements.

The transceiver architecture is shown in Fig. 2.2.1. In order to enable short bit periods without consuming excessive area and power in clock generation and distribution, a multiple clock phase multiplexing architecture is used at both the transmitter and the receiver. In the frequency-synthesis PLL of the transmitter, a 5-stage coupled pseudo-differential ring oscillator provides 5 sets of complementary clock phases spaced a bit-period apart. These phases are used to switch a 5-to-1 multiplexer to produce a serial data stream. The multiplexer serial output is buffered by the VCSEL driver output stage, described in detail in [1]; it consists of a 4-tap current-mode FIR filter that equalizes the VCSEL response at high data rates. At the receiver side, a low-voltage integrating and double-sampling front-end performs demultiplexing directly at the input node using 5 uniform clock phases from the dual-loop clock-recovery system.
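The multiplexing arithmetic above can be illustrated with a small behavioral sketch. It assumes ideal, uniformly spaced phases; the function names (`phase_clock_ghz`, `serialize`, `deserialize`) are hypothetical and model only bit ordering, not the circuit. With phases one bit period apart and 5-to-1 multiplexing, each clock phase would run at one fifth of the bit rate.

```python
BIT_RATE_GBPS = 16.0  # serial data rate quoted in the text
NUM_PHASES = 5        # 5-to-1 multiplexing quoted in the text


def phase_clock_ghz(bit_rate_gbps: float = BIT_RATE_GBPS,
                    num_phases: int = NUM_PHASES) -> float:
    """Per-phase clock frequency when phases are one bit period apart."""
    return bit_rate_gbps / num_phases


def serialize(lanes):
    """Interleave parallel lanes: phase k places lane k's bit in slot k."""
    n_words = len(lanes[0])
    return [lanes[k][w] for w in range(n_words) for k in range(len(lanes))]


def deserialize(stream, num_phases: int = NUM_PHASES):
    """Receiver-side demultiplexing: segment k keeps every num_phases-th bit."""
    return [stream[k::num_phases] for k in range(num_phases)]


if __name__ == "__main__":
    lanes = [[1, 0], [0, 0], [1, 1], [0, 1], [1, 0]]  # illustrative data only
    stream = serialize(lanes)
    print(phase_clock_ghz(), stream, deserialize(stream) == lanes)
```

Under these assumptions, 16Gb/s operation corresponds to a 3.2GHz clock per phase, which is why no full-rate clock needs to be generated or distributed.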

The integrating and double-sampling receiver front-end, shown in Fig. 2.2.2, demultiplexes the incoming data stream with 5 parallel segments that include a pair of input samplers, a buffer, and a sense-amplifier. Two current sources at the receiver input node, the photodiode current and a current source that is feedback biased to the average photodiode current, supply and deplete charge from the receiver input capacitance respectively. For data encoded to ensure DC balance, the input voltage will integrate up or down due to the mismatch in these currents. A differential voltage, $\Delta v_b$, is developed in each receiver segment by sampling at the beginning and end of a bit period defined by the rising edges of the recovered clocks $\Phi[n]$ and $\Phi[n+1]$, respectively. While in a previous implementation [2] $\Delta v_b$ was applied directly to an offset-corrected StrongArm latch used as a sense-amplifier for data regeneration, the reduced supply voltage that comes with scaling technologies causes the integrating input to exceed the sense-amp input range. In order to fix the sense-amp common-mode input level and buffer the sensitive sample nodes from kickback charge, a differential buffer is inserted between the samplers and the sense-amp. The power penalty of the additional buffer is quite small (250µW per segment), as buffer gain is low to avoid sense-amp offset saturation and bandwidth requirements are relaxed due to input demultiplexing. The use of PMOS samplers provides a receiver input range from 0.6 to 1.1V. Demultiplexing directly at the input gives the sense-amp sufficient time (5 times the bit period) for data regeneration and precharging, thus eliminating the requirement for a TIA operating at the bit rate.
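The integrate-and-double-sample operation can be sketched behaviorally as follows. This is a minimal model under stated assumptions, not the authors' circuit: the currents are ideal, the feedback-biased source is taken as exactly the mean of the one and zero photocurrents (which presumes DC-balanced data), and the function name `double_sample` and the example current values are hypothetical.

```python
def double_sample(bits, i_one_ua, i_zero_ua, bit_period_ps, c_in_ff,
                  v_start_mv=0.0):
    """Return (delta_v_mv, decisions) for an ideal integrating front-end.

    Units: uA * ps / fF = mV, so no extra scaling is needed.
    """
    # Assumption: the feedback-biased source equals the average photocurrent.
    i_avg_ua = 0.5 * (i_one_ua + i_zero_ua)
    v_mv = v_start_mv
    deltas, decisions = [], []
    for bit in bits:
        i_pd_ua = i_one_ua if bit else i_zero_ua
        v_begin_mv = v_mv                      # sample at start of bit period
        v_mv += (i_pd_ua - i_avg_ua) * bit_period_ps / c_in_ff
        delta_mv = v_mv - v_begin_mv           # sample at end, take difference
        deltas.append(delta_mv)
        decisions.append(1 if delta_mv > 0 else 0)
    return deltas, decisions


if __name__ == "__main__":
    # Hypothetical currents; 100 ps bit period and 440 fF input capacitance.
    deltas, decisions = double_sample([1, 0, 0, 1, 1, 0], 110.0, 0.0,
                                      100.0, 440.0)
    print(deltas, decisions)
```

Because each decision depends only on the voltage change across one bit period, the absolute input level can drift within the sampler input range without affecting the result in this idealized model.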

Clock recovery is implemented with the dual-loop architecture shown in Fig. 2.2.3. It expands the work presented in [3] to multiple clock phase operation. An additional set of 5 receiver segments provides binary phase information to a baud-rate phase detector [2] which, when compared to a common 2× oversampling scheme, saves power and area by reducing the number of distributed clock phases by a factor of 2. The clock phases for the data and phase samplers are generated by an adaptive-bandwidth frequency-synthesis PLL [4], while a secondary phase loop selects 2 of the PLL phases to interpolate between in order to provide the optimal receiver sampling position. The interpolated phase, $\Psi$, is used as the feedback clock for the frequency-synthesis PLL, allowing for the simultaneous shifting of all VCO phases with only one interpolator, instead of the 5 normally required in a 1-to-5 demultiplexing receiver. Because the interpolator is in the feedback path, any glitches due to interpolator switching are filtered by the PLL. Also, the delay from the VCO to the samplers is minimized, resulting in reduced jitter accumulation. While the frequency-synthesis PLL and secondary phase loops are coupled, the implementation can be treated as an effective dual-loop system if the loop bandwidths are set appropriately. The frequency-synthesis loop bandwidth is set relatively high (40MHz for 16Gb/s operation) to filter phase noise from the ring oscillator and allow the PLL to track the CDR updates, while the secondary phase loop bandwidth is set low (<4MHz) to filter out phase errors caused by low input SNR for the phase samplers.
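The phase-selection step of the secondary loop can be sketched as below. This is only an illustration under assumptions that the text does not confirm: the 10 VCO outputs (5 stages with complementary phases) are taken as uniformly spaced over one clock cycle, the interpolator is treated as ideal and linear, and `select_and_interpolate` is a hypothetical name.

```python
def select_and_interpolate(target_deg: float, num_phases: int = 10):
    """Pick the two adjacent phases bracketing target_deg and a mixing weight.

    Returns (lower, upper, weight); the interpolated phase is modeled as
    (1 - weight) * phase[lower] + weight * phase[upper], modulo 360 degrees.
    """
    spacing = 360.0 / num_phases            # assumed uniform phase spacing
    target = target_deg % 360.0             # wrap into one clock cycle
    lower = int(target // spacing) % num_phases
    upper = (lower + 1) % num_phases        # wraps from last phase to phase 0
    weight = (target - lower * spacing) / spacing
    return lower, upper, weight


if __name__ == "__main__":
    print(select_and_interpolate(90.0))
    print(select_and_interpolate(350.0))
```

In this model a single (lower, upper, weight) setting defines the feedback clock, so changing it shifts every sampling phase together once the PLL settles, consistent with the single-interpolator argument in the text.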

Precise phase spacing of the recovered clocks is essential for good sensitivity, because any phase error will result in a reduced double-sampled differential voltage at the receiver. Clock buffers with digitally adjustable capacitive loads are used to tune out mismatches in the VCO and clock distribution network, such that initial phase errors in the range of $\pm 12\%$ of a bit period are reduced to <2%. Since there is a static path from the VCO to the receiver samplers (due to the phase interpolator being in the PLL feedback path), these phase errors can be tuned with a low-bandwidth control loop.
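A rough first-order estimate of how a phase-spacing error reduces the double-sampled voltage is sketched below. The model is an assumption, not a result from the paper: a window that is too short integrates less of the intended bit, and a window that is too long is assumed to pick up a neighboring bit of opposite polarity, so both cases lose a fraction equal to the error magnitude. The function names and the equivalent-power-penalty conversion are hypothetical.

```python
import math


def delta_v_fraction(spacing_error: float) -> float:
    """Fraction of the ideal double-sampled voltage that is retained.

    spacing_error = (actual window - bit period) / bit period.
    Assumed worst case: the adjacent bit has opposite polarity.
    """
    return 1.0 - abs(spacing_error)


def penalty_db(spacing_error: float) -> float:
    """Extra signal needed to restore the ideal voltage, in dB (assumed model)."""
    return 10.0 * math.log10(1.0 / delta_v_fraction(spacing_error))


if __name__ == "__main__":
    for err in (0.12, 0.02):  # before and after tuning, from the text
        print(err, delta_v_fraction(err), round(penalty_db(err), 3))
```

Under this simplified model, tuning the spacing error from 12% to 2% of a bit period raises the retained voltage from 88% to 98% of the ideal value.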

The optical transceiver is fabricated in a 90nm standard CMOS process. Both the 850nm VCSEL and photodetector are attached with short wirebonds, as shown in Fig. 2.2.7. The optical eye diagrams of Fig. 2.2.4 show how the equalizing transmitter provides a 45% increase in vertical eye opening and enables the 10Gb/s class VCSEL to operate at 16Gb/s with 6.2mA average current and 3dB extinction ratio. The transceiver performance is summarized in Fig. 2.2.5. Recovered clock jitter is $1.9\,\mathrm{ps_{rms}}$ (Fig. 2.2.6), while the optical receiver sensitivity at a BER of 10<sup>-10</sup> is -9.6dBm average optical power at 10Gb/s and -5.4dBm at 16Gb/s. Using the photodiode responsivity of 0.5mA/mW at 850nm and the measured 440fF input capacitance, this converts to an input voltage swing of 12.5mV at 10Gb/s and 20.2mV at 16Gb/s. It is worth noting that the wirebond connection to the photodetector adds extra parasitics, and superior sensitivity numbers (less optical power for a given input voltage swing) could be achieved with a more integrated approach, such as flip-chip bonding. The transceiver consumes 104mW from the core 1V supply and 25mW from the 2.8V VCSEL output stage, for a total power consumption of 129mW or 8.1mW/(Gb/s). The total transceiver area is 0.105mm<sup>2</sup>.
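The conversion from optical sensitivity to input voltage swing, and the power-efficiency figure, can be checked with the short calculation below. It is a sketch under one assumption not stated in the text: the signal current that integrates on the input node during a bit is taken equal to the average photocurrent (as for a high extinction ratio). The function names are hypothetical; the numeric inputs are the values quoted above.

```python
def dbm_to_mw(power_dbm: float) -> float:
    """Convert optical power from dBm to mW."""
    return 10.0 ** (power_dbm / 10.0)


def input_swing_mv(p_avg_dbm: float, bit_rate_gbps: float,
                   responsivity_ma_per_mw: float = 0.5,
                   c_in_ff: float = 440.0) -> float:
    """Voltage integrated on the input capacitance over one bit period."""
    # Assumption: signal current deviation equals the average photocurrent.
    i_sig_ua = responsivity_ma_per_mw * dbm_to_mw(p_avg_dbm) * 1e3
    bit_period_ps = 1e3 / bit_rate_gbps
    return i_sig_ua * bit_period_ps / c_in_ff  # uA * ps / fF = mV


def power_per_gbps(core_mw: float = 104.0, vcsel_mw: float = 25.0,
                   bit_rate_gbps: float = 16.0) -> float:
    """Total power divided by data rate, in mW/(Gb/s)."""
    return (core_mw + vcsel_mw) / bit_rate_gbps


if __name__ == "__main__":
    print(round(input_swing_mv(-9.6, 10.0), 2))
    print(round(input_swing_mv(-5.4, 16.0), 2))
    print(round(power_per_gbps(), 2))
```

With this assumption the calculation gives about 12.5mV at 10Gb/s and roughly 20.5mV at 16Gb/s, close to the quoted 12.5mV and 20.2mV; the small gap at 16Gb/s presumably comes from rounding or a detail not captured in this simple model. The efficiency evaluates to 129mW / 16Gb/s ≈ 8.06mW/(Gb/s), which rounds to the quoted 8.1.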

## Acknowledgments:

The authors would like to acknowledge the help and support of D. Patil, B. Nezamfar, P. Chiang, and B. Gupta, CMP and STMicroelectronics for chip fabrication, ULM photonics for VCSELs, Albis Optoelectronics for photodiodes, and MARCO-IFC for funding. S. Palermo thanks Sh. Palermo for constant help and support.

## References:

[1] S. Palermo and M. Horowitz, "High-Speed Transmitters in 90nm CMOS for High-Density Optical Interconnects," *Proc. ESSCIRC*, pp. 508-511, Sep. 2006.

[2] A. Emami-Neyestanak et al., "CMOS Transceiver with Baud Rate Clock Recovery for Optical Interconnects," *Symp. VLSI Circuits*, pp. 410-413, June 2004.

[3] P. Larsson, "A 2-1600MHz CMOS Clock Recovery PLL with Low-Vdd Capability," *IEEE J. Solid-State Circuits*, vol. 34, no. 12, pp. 1951-1960, Dec. 1999.

[4] S. Sidiropoulos et al., "Adaptive Bandwidth DLLs and PLLs using Regulated Supply CMOS Buffers," *Symp. VLSI Circuits*, pp. 124-127, June 2000.





