SRAM With Double the Fun - with a Twist

Sharing a bit makes it even better!

May 20, 2024

This week was indeed fun! The 3D geometry is pretty intense and finding a good arrangement took a few tries, but the result looks nice and clean.

The basic 6T SRAM with a single port is not often used. SRAM is expensive and tends to be needed in places where there is a lot of parallel activity. 20 years ago you could find separate chips dedicated to basic SRAM, and they were useful then because external interfaces ran at similar speeds to the logic inside the chip. Since then, internal signal throughput improvements have outpaced signals to other packages. Having high-speed, high-throughput memory bottlenecked and delayed behind the wiring between packages made little sense; you might as well use DRAM and have more capacity.

Perhaps the rise of chiplets with much faster interconnects will make high density baseline SRAM interesting in future, but for now SOCs have moved high speed scratchpad and cache SRAMs inside the chip. An SOC has multiple cores and accelerators, so most of that SRAM will have at least two uses. For example, L2 cache usually has one side working for the local core or accelerator while the other side interfaces with the system fabric to share with other parts of the SOC.

This leads to adding ports to the basic SRAM cell: additional independent paths to read and write the SRAM array. There needs to be some collision avoidance, but so long as the two clients of the array are working with different rows they can run in parallel. In principle this can eliminate queues of unpredictable size, since collisions are generally resolved by a rule that lets both sides share a single operation (with some further rule deciding the winner if both are writing).
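To make the collision idea concrete, here is a minimal behavioral sketch in Python. It is a toy model, not a circuit model, and every name in it (Request, DualPortArray, cycle) is hypothetical. It also assumes two rules that the post leaves open: port 0 wins when both ports write the same row, and a reader that collides with a writer sees the newly written value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Request:
    """One port's request for a single cycle (hypothetical structure)."""
    row: int
    write_data: Optional[int] = None  # None means the port is reading


class DualPortArray:
    """Toy behavioral model of a two-port array, one value per row."""

    def __init__(self, rows: int) -> None:
        self.cells = [0] * rows

    def cycle(
        self, p0: Optional[Request], p1: Optional[Request]
    ) -> Tuple[Optional[int], Optional[int]]:
        """Apply both requests in the same cycle and return what each port sees."""
        # Writes land first. Port 1 is applied before port 0, so port 0
        # wins a write/write collision on the same row (assumed rule).
        for req in (p1, p0):
            if req is not None and req.write_data is not None:
                self.cells[req.row] = req.write_data
        # Every active port then observes the settled value of its row, so two
        # ports on the same row share one outcome instead of queueing.
        return tuple(
            None if req is None else self.cells[req.row] for req in (p0, p1)
        )


if __name__ == "__main__":
    mem = DualPortArray(rows=4)
    print(mem.cycle(Request(0, 5), Request(1, 7)))  # different rows, fully parallel
    print(mem.cycle(Request(2, 1), Request(2, 9)))  # same row, both writing
```

In this sketch neither port ever waits: different rows proceed independently, and a same-row collision collapses into one shared result. A real design might pick a different winner or stall one side instead.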

The simplest approach to a dual port cell is to add another pair of access transistors, controlled by a different enable system and connected out to a different DQ. So long as there is no collision at the row, only one pair is in use at a time, and the operations behave just the same way provided the new access pair is the same size as the original access pair. SRAM really is a marvelously flexible circuit, although adding more transistors will add leakage and reduce the signal margins, which will need some clever assists for reading and writing.

The layout cartoon looks complex so we can expect the physical construction to be crowded. A challenge!

The cell gets an extra N fin at top and bottom to support the new access pair and the layout area of the cell grows by 50%. The central section remains the same P-fin cut trick and M0 feedback wiring as the basic cell, but we have new gates and new access paths to connect, so there are many new wires.

Now the twist. Looking at it from a new direction. The first draft of SRAM assumed that words of data, the set of bits accessed together, were organized in horizontal rows (following the power tracks, like the logic cells). That causes two problems. One is that sharing cell edges requires the cells to be enabled in even/odd sets. This doubles the count of word enable wires to be routed horizontally. A similar problem arises with routing the DQs vertically when we want two ports. I succeeded in making that work in a first version of this post. Then I noticed that if the word enables run down the column, and the DQs run along the row, there is no requirement for even/odd alternation of the cells.

This reduces the vertical signal flow to two enables, much more suitable for the narrow vertical aspect of the cell. The two N-fins at the bottom and at the top are carrying the dual access transistors for the DQ0 and ~DQ0, and the DQ1 and ~DQ1 wires, which run easily along the horizontal. The N-fins use tied down stubs, not physical cuts, to maintain the fin stress. The P-fins have physical cuts and partial stubs, which may reduce their stress and weaken them. I do not show the stubs here, but you can see them drawn in beginning steps in the end-of-post video recap.

If you look closely you will see that all the DQs are solved, in the sense that they are all connected to the edges. That used only half of the M0 lines. So, all we need to do now is solve the word enables.

The M1 wires completely solve that, too. There are two enables (port 0 and port 1) and the cell has the space for two M1 wires. All done! Much simpler than the first way of looking at an SRAM cell.

The first time I wrote this post I ran the signals the wrong way. It took me 3 tries and went all the way up to the M4 level, bursting at its limit. Now, same functionality, same base level fins and gates, but finished at M1 in the first try because the wiring is turned 90 degrees. Circuit layout is very sensitive to crowding and flow.

Hmm, but what are those stubs on the right side? Well, let’s think about distance. The M0 wires are probably on the order of 400 ohms per micron of length. They do not carry a lot of current but they must be capable of overpowering the feedback loop inside a cell when writing a new value. This will put some limit on their length, so an extra space is used to allow stubs to bring in M2 lines to help. These spaces add a small fraction to the size of the array.

This might be overkill, especially on smaller SRAM arrays. The M2 wires have 3 to 4 times better conductivity and are more decoupled from the capacitance loads at the cell, so they help deliver the signals faster and drive current better. This “drop zone” where they reach down to support the M0 wires probably would be used every 16 cells or so.
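A quick back-of-envelope calculation shows what the drop zones buy. The sketch below uses only figures from this post: roughly 400 ohms per micron for M0, M2 at 3 to 4 times better conductivity, a drop zone every 16 cells, and the 100nm cell width quoted later in the post. The function names are hypothetical, and it assumes the DQ wire crosses one cell width per cell and that the M2 straps on either side are ideal sources.

```python
M0_OHMS_PER_UM = 400.0     # from the post: "on the order of 400 ohms per micron"
CELL_WIDTH_NM = 100        # from the post: each cell is 100nm wide
CELLS_PER_DROP_ZONE = 16   # from the post: "every 16 cells or so"


def segment_ohms(cells: int, ohms_per_um: float) -> float:
    """Resistance of a wire spanning `cells` cell widths."""
    return cells * CELL_WIDTH_NM * ohms_per_um / 1000


def midpoint_ohms(cells_between_straps: int) -> float:
    """Resistance seen by the cell halfway between two strapped points.

    Assumes both straps are ideal, so the two half-segments act in parallel.
    """
    half = segment_ohms(cells_between_straps, M0_OHMS_PER_UM) / 2
    return half / 2  # two equal resistances in parallel


def m2_ohms_per_um(conductivity_ratio: float) -> float:
    """M2 resistance per micron, given how many times better it conducts than M0."""
    return M0_OHMS_PER_UM / conductivity_ratio


if __name__ == "__main__":
    print("M0 between drop zones:", segment_ohms(CELLS_PER_DROP_ZONE, M0_OHMS_PER_UM))
    print("Worst-case cell, strapped both sides:", midpoint_ohms(CELLS_PER_DROP_ZONE))
    for ratio in (3.0, 4.0):
        print("Same span on M2 at", ratio, "x:",
              segment_ohms(CELLS_PER_DROP_ZONE, m2_ohms_per_um(ratio)))
```

Under these assumptions a 16-cell span of M0 is about 640 ohms end to end, and the worst-placed cell sees about 160 ohms back to the straps. These are rough numbers that ignore via resistance and the resistance of the M2 run itself.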

You may wonder about the need for power, which also runs on M0. I did not draw them, but the power lines can be raised up to M4 in a similar way.

Neither the power lines nor the DQ lines are highly stressed. The DQ lines each serve one port, which means they only read or write one cell at a time. All the other cells are quiet and indeed provide some reserve capacitance, which will help supply current to overpower the one cell which is being written. In a similar way the power wires serve at most 4 cells (because power is shared on neighboring edges, and then there can be cells from 2 ports), but again the other cells on the wire act as reserve capacitance. The wires may look long but the active loads are limited and do not change with longer wires.

If you noticed that there are 8 stubs, you will not be surprised to learn that they serve a similar purpose: extending the word enable lines.

The M3 layer supports the M1 wires, carrying the word enable signals in from the array edge.

The 90 degree rotation has revealed a remarkably flat and simple design for SRAM, even if we choose to use M2 and M3 layers for fly-over support of M0 and M1.


Let's see that again

Here is the build-up again, in a video:

Getting this right took multiple tries and a new angle. I like the result!

Planning for the Edges

Even with flyover, the size of an array will reach practical horizontal and vertical limits. Each cell is 360nm high and 100nm wide. An array will need row selection circuits, word line drivers, and the column sense amps. Can we implement an array with 64 cache lines of 136 logical cells (128 bits plus ECC)? What is the size of the overhead compared to the size of the array?
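As a starting point for the overhead question, the bare cell footprint can be worked out from the numbers above. This is a rough sketch with hypothetical names. It assumes each logical cell maps to one physical cell, and that a cache line is one vertical column of cells (word enables running down the column, as described earlier), so the 136 bits stack vertically and the 64 lines sit side by side. It leaves out drop zones, re-drivers and edge circuits, whose sizes are not yet known.

```python
CELL_HEIGHT_NM = 360   # from the post
CELL_WIDTH_NM = 100    # from the post
CACHE_LINES = 64       # from the post
BITS_PER_LINE = 136    # from the post: 128 data bits plus ECC


def raw_array_size_um(lines: int, bits: int) -> tuple:
    """Return (width, height) of the bare cell array in microns.

    Assumes one cache line per column, so width scales with the number of
    lines and height with the bits per line. No drop zones or edge circuits.
    """
    width_um = lines * CELL_WIDTH_NM / 1000
    height_um = bits * CELL_HEIGHT_NM / 1000
    return width_um, height_um


def raw_array_area_um2(lines: int, bits: int) -> float:
    """Bare cell area in square microns (cell count times cell area)."""
    return lines * bits * CELL_WIDTH_NM * CELL_HEIGHT_NM / 1_000_000


if __name__ == "__main__":
    width, height = raw_array_size_um(CACHE_LINES, BITS_PER_LINE)
    print("bare array:", width, "um wide by", height, "um tall")
    print("bare area:", raw_array_area_um2(CACHE_LINES, BITS_PER_LINE), "um^2")
```

With these assumptions the bare array is 6.4 microns wide and 48.96 microns tall, a little over 313 square microns for 8,704 cells. The strongly elongated shape is a consequence of the assumed one-line-per-column arrangement; folding the lines differently would change the aspect ratio but not the cell area.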

The image shows a general approach to a complete array. Signals will need to be re-driven to cross the whole array. The DQ lines pose an interesting problem for redrive, since one wire provides both input and output to the cells. I will show one solution which has data written from one side and read on the other, allowing the long lines to be partitioned for direction and then rewritten.

Addressing for port selection could be done from both top and bottom, but the signals only flow in one direction, so I will put them all on one side. I assume that solving conflicting addresses will require both addresses to come from one source where the ordering and coincidences are resolved.

Next Week

Next week we will look at the support circuits needed at the edges and at interior re-drivers of the array, and get some trial layouts to see the size of those elements. In the week after that my goal is to round it all off with some size formulae for total array plus edges.

Do please put questions or notes in a comment. You, dear audience, have been getting livelier which is helpful for me! I am happy to correct mistakes, add clarifications, and also to know what readers do or do not find interesting.
