Doing More With the Minimum
I wonder if AI will graduate from self-confidence to obsessive improvement
A couple of weeks ago I dropped the post on the very small 3-port ribbon-based SRAM using complex lithography. Then I went back to calculating GNSS orbits, realized there was another, simpler way to make a fully dual-port SRAM at that size, and got a new job. It has been a busy time of new directions. Now that new SRAM is waiting to go, and it just insisted on going first! Ideas can crowd in out of order.
Apologies to readers not obsessed with SRAM, but this one is pretty and deserves its place in the blog. Not only does it have the classic dual R/W port functionality, but the lithography is much simpler than the design from 2 weeks ago. Much of this can probably be implemented with DUV using pitch multiplication and Inverse Lithography Transforms (ILT), with no electron beams needed at all. Simplicity is nice!
Unclassically Classical
While the new design has dual R/W ports, it does not implement that functionality in the usual way.
The classic 2R/2W cell has 2 enable wires and 4 DQ wires. The enables travel vertically to select the words of data. The DQs travel horizontally, moving data in or out of both sides of the cell. The problem is that the cell is wider than it is tall (180nm x 80nm in Festive 3 rules), so it does not have space to easily move 4 DQs sideways at the M2 level (M0 is full of internal cell connections).
The Unclassical design has just 2 DQ wires and 4 enables. This is a better match to the geometry of the M1 and M2 metals. But how does that work? Every SRAM cell we have discussed needed 2 DQs, a signal and its inverse, connected to opposite sides of the cell to overpower the cell's internal feedback during the brief write event.
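As a quick sanity check on the geometry argument, here is a minimal Python sketch of the track arithmetic. It assumes, purely for illustration, that each set of wires is spread evenly across a single cell with no sharing between neighbors; the function name `track_pitch` is hypothetical.

```python
# Cell dimensions quoted in the post (Festive 3 rules), in nanometres.
CELL_WIDTH_NM = 180
CELL_HEIGHT_NM = 80


def track_pitch(span_nm, wires):
    """Pitch available if `wires` parallel tracks share `span_nm` evenly."""
    return span_nm / wires


# Horizontal DQ wires stack across the cell height;
# vertical enable wires stack across the cell width.
classic = {
    "dq_pitch_nm": track_pitch(CELL_HEIGHT_NM, 4),      # 4 DQs in 80nm
    "enable_pitch_nm": track_pitch(CELL_WIDTH_NM, 2),   # 2 enables in 180nm
}
unclassical = {
    "dq_pitch_nm": track_pitch(CELL_HEIGHT_NM, 2),      # 2 DQs in 80nm
    "enable_pitch_nm": track_pitch(CELL_WIDTH_NM, 4),   # 4 enables in 180nm
}

print("classic:    ", classic)
print("unclassical:", unclassical)
```

Under that even-spreading assumption the classic arrangement squeezes its DQs to a 20nm pitch, while the unclassical arrangement lands on 40nm for the DQs and 45nm for the enables.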
A simple LTspice simulation of this circuit is at:
https://github.com/TanjIsGray/Unclassical-SRAM
In the unclassical approach these two connections, which are weak pass gates in the classical design, are replaced by a strong pass gate (normally just called a pass gate, since it is the usual and more versatile design, or a transmission gate). This is still a dual-transistor drive, but now both transistors attach on either the left side or the right side of the feedback loop. These left and right sides are the two ports (one is inverted, but that can be corrected with an inverter in the periphery logic). The dual-transistor pass gate on one side competes with, and can overpower, the single active transistor on that side of the feedback loop.
We also need to ensure that reading does not flip that same feedback transistor. The reading operation must be made weaker than the central inverter. One approach is to equalize the ~DQ0 or DQ1 wire to Vdd/2 and then float it before enabling the pass gate for reading. An alternative approach is to equalize the DQ wire to Vss and then open only the P-transistor enable, implementing a one-sided read which is weaker both because the output is floating and because only a weak pass is enabled. We could equally use Vdd and enable the N-transistor weak pass; either choice is equivalent.
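The port behavior described above can be sketched as a logic-only Python model. This is not the LTspice circuit and it ignores all analog effects (drive strength, read disturb, timing). It also assumes, for illustration only, that port 0 sits on the left node carrying the inverted value and port 1 on the right node carrying the true value; the class and function names are hypothetical.

```python
class UnclassicalCell:
    """Logic-only model: two cross-coupled inverters, one port per side."""

    def __init__(self, q=0):
        # Right-hand node; the left-hand node settles to its inverse.
        self.q = q

    def node(self, port):
        """Level on the node that the given port's pass gate attaches to."""
        if port == 0:
            return 1 - self.q   # left side, seen on the ~DQ0 wire (assumed)
        if port == 1:
            return self.q       # right side, seen on the DQ1 wire (assumed)
        raise ValueError("port must be 0 or 1")

    def force(self, port, level):
        """Strong pass gate overpowers its node; the loop then settles."""
        if port == 0:
            self.q = 1 - level
        elif port == 1:
            self.q = level
        else:
            raise ValueError("port must be 0 or 1")


def write_bit(cell, port, bit):
    # The periphery inverts data for port 0 so both ports agree on the bit.
    cell.force(port, 1 - bit if port == 0 else bit)


def read_bit(cell, port):
    # The same periphery inversion is undone on the way out of port 0.
    level = cell.node(port)
    return 1 - level if port == 0 else level


cell = UnclassicalCell()
write_bit(cell, 0, 1)
print(read_bit(cell, 1))  # the bit written through port 0, read via port 1
```

The point of the sketch is only that a single strong connection per side is enough for each port to both set and observe the stored bit, provided the periphery handles the inversion on one port.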
Building the Fundamentals
The changes from the 2R/1W cell at the channel, gate, and source-drain level are subtle. The ribbons are still perfectly regular at a 40nm pitch with no cuts. The only change is in the source-drain, where there are no longer any split contacts; instead, all contacts come in pairs just as all gates come in pairs.
The power contacts, Vss shown with green and the Vdd with red, are assumed to be contacted from below in a Back-Side Power Delivery (BSPD) process.
The pattern is regular, which should help with the lithography when placing the horizontal cuts that split the gates and contacts.
Build on That Foundation
On that base we will build internal feedback with 2 M0 links, and a third row of M0 is occupied by a couple of links that will connect to M1 as we will see below. Three M0 tracks are used, which may be pitched evenly at 32nm.
The cell will have 8 transistors in an 8T space, 100% use of space, with all edges shared by mirrored versions of the same cell. Sharing is permitted in the vertical direction because all the cells in the word are going to be enabled the same way.
A set of vias builds up from the 4 gates to prepare for the 4 word-enables.
All the gates are dual and shared with neighboring cells above or below. This keeps the via density low allowing for good ILT mask optimizations for imaging the vias.
Now the 4 word enables come through at a regular 45nm pitch over the gates of the transistors. There are 4 enables since each pass gate needs an inverse pair of enables for the P and N enabling gates.
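To make the four enables concrete, here is a small Python sketch of the logic levels each port's pair might take in the modes mentioned earlier (idle, full pass gate, and the two one-sided read options). The mode names and the wire ordering are assumptions for illustration, not taken from the layout or the LTspice deck.

```python
from enum import Enum


class Mode(Enum):
    IDLE = "idle"                # both transistors off
    FULL = "full"                # both on: write, or read from a Vdd/2 precharge
    READ_P_ONLY = "read_p_only"  # one-sided read, DQ equalized to Vss
    READ_N_ONLY = "read_n_only"  # one-sided read, DQ equalized to Vdd


def enable_levels(mode):
    """Return (n_gate, p_gate) levels; N conducts at 1, P conducts at 0."""
    n_on = mode in (Mode.FULL, Mode.READ_N_ONLY)
    p_on = mode in (Mode.FULL, Mode.READ_P_ONLY)
    return (1 if n_on else 0, 0 if p_on else 1)


def word_enables(port0_mode, port1_mode):
    """Levels on the 4 enable wires of one word: (N0, P0, N1, P1)."""
    return enable_levels(port0_mode) + enable_levels(port1_mode)


for mode in Mode:
    print(mode.value, enable_levels(mode))
```

In the idle and full modes the pair is a true inverse pair, as the text says; only the one-sided read options drive both wires to the same level.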
Bring The Data
The final stage is to build vias up to M2 level from the source/drain contacts.
The gates are all dual within a cell, reflecting the shared signals controlling the pass gates down the word line. The data contacts are shared with cells at right or left. The result is a sparse pattern for vias which will help with the lithography. Sharing horizontally is safe because each port of the array uses one word at a time, so the cells on the sides are isolated from the activity as their pass gate will not be enabled.
The ~DQ0 and DQ1 run horizontally with no breaks or quirks at a 40nm pitch.
Let’s See That Again
And... that’s it. It is all over before you know it! Two R/W ports in an SRAM cell of 14,400 nm² in this notional Festive 3 process. That is less than half the size of the same functionality explained in the Mock 4 process in May. Maybe SRAM can scale.
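The area figure follows directly from the 180nm x 80nm cell quoted earlier. A minimal Python check is below; the density number is a raw cell-array figure that assumes perfect tiling and ignores periphery, decoders, and sense amplifiers, so treat it as an upper bound rather than a product estimate.

```python
NM2_PER_MM2 = 1e12  # 1 mm = 1e6 nm, so 1 mm^2 = 1e12 nm^2


def cell_area_nm2(width_nm, height_nm):
    """Footprint of one rectangular cell in square nanometres."""
    return width_nm * height_nm


def raw_bits_per_mm2(area_nm2):
    """Bits per mm^2 if cells tiled perfectly with no periphery (assumption)."""
    return NM2_PER_MM2 / area_nm2


area = cell_area_nm2(180, 80)
density = raw_bits_per_mm2(area)
print(f"cell area: {area} nm^2")
print(f"raw density: {density / 1e6:.1f} Mbit/mm^2")
```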
It was so nice to get such a clean layout. The ribbons and backside power are going to be a wonderful combination. I am impatient to see the circuits coming out of 3nm-class ribbon (aka Gate All Around or GAA) processes.
Next Week
As I mentioned at the start, I have started a new job - more details when I have settled into it. I will be able to continue this blog but my pace will slow a little. I have enjoyed attempting practical projects, so you can expect to see more explorations of semiconductor circuits. But first I will go back and finish my project of using GNSS satellites as a sort of telescope to locate the sun by gravity.
Then I have the project to show how to build a low energy MatMul for Int8/4 and FP6 types in a micro-scaled data stream.
After that I will look at the problem of building an incrementor for the Panopticon mechanism that tracks disturbance in DRAM. A Panopticon variation, “Per Row Activation Counting” (PRAC), is included in the DDR5 spec and expected in LPDDR6, but it seems they do not embed the counters adjacent to the cells, so I am curious to do the detailed modelling and see why.
I am encouraged to see over 1,200 readers of this rather technical blog! Don’t be shy with the comments and messages. I am also open to requests to explore “very small” technologies you may suggest! I will write soon!