Programmable Chipsets: RMT (text)

Welcome back.
We are continuing the module on programmable data planes and we are also
continuing our discussion of how to make programmable data-planes more scalable.
In this part of the lesson we'll focus on
techniques that can be used to make hardware more programmable.
Before we jump into the capabilities of current hardware and what we might
do to make data-plane hardware more programmable, let's first explore
what we really want from SDN and examine whether or not current
hardware really provides the desirable features that we would like from SDN.
One of the main goals of SDN is to support protocol-independent processing.
In other words, we should be able to process traffic
traveling through the network independent of any particular control protocols.
We should be able to control network
behavior and repurpose our network devices in the
field without redeploying hardware and we'd like these
functions to be implemented with fast low-power chips.
Unfortunately, the hardware that is deployed in today's
networks still constrains what we're capable of doing.
OpenFlow is protocol dependent because of
the constraints of conventional switching chips.
The OpenFlow protocol has had to map
its functions onto the capabilities of existing chips.
Now, that mapping has enabled quick adoption, but at the same time, it's
constrained what we might think about
putting into a control protocol like OpenFlow.
So it's worth asking what we would
do differently if we could completely redesign the data-plane.
We'll explore this question in the context of two different projects.
Both of the projects that we'll look at are rooted in the following insight:
there are relatively few data plane primitives
that a network device needs to perform.
In other words, the set of functions that we
want to perform on packets is actually pretty limited.
We might need to do some bit shifting or
parsing or rewriting of different header fields, various types
of manipulations, traffic shaping, forwarding decisions and so forth,
but it's fairly easy to enumerate that list of functions.
Now we might compose those functions in different ways, but ultimately
the building blocks that we need are in fact pretty limited.
So this insight leads us to the conclusion that we can in fact build a flexible data
plane by developing a fixed set of modules
and providing methods for composing them.
In other words, we design hardware that
provides the building blocks and then allows
us to plumb those building blocks together and we get a fast programmable data-plane.
We will look at that approach in the context of two different architectures.
One is an OpenFlow chip that
provides generalizable, programmable match-action primitives.
The other is a programmable, modularizable FPGA-based data plane called SwitchBlade.
Let's first take a look at the OpenFlow chip.
The OpenFlow chip design is a recent design exercise
to see whether a chip could parse existing and custom
packet headers and perform a number of sequential stages
of match-action to build a much more flexible hardware data-plane.
Let's take a quick look at the design of this chip.
The chip is laid out with a RISC-like architecture, in other words a reduced
instruction set, which allows processing to effectively ride Moore's law.
In other words, as the chips get faster and faster, we can
process packets at higher rates yet we can still compose these
instructions to perform fairly complex forwarding operations.
The chip has as many as 32 stages of match and action.
Let's take a look at what happens for
matching and actions at each stage of the pipeline.
Match tables need to be flexible.
The match tables are laid out in two types of memory, TCAM and SRAM.
The table structure requires some creative memory management because
often processing does not require 32 stages of match and action.
And yet, there may be tables that need to be fairly big, often
bigger than the memory that has been laid out for one particular match stage.
Therefore, the memory management of the chip needs to be such
that we can create logical tables that span multiple physical stages.
Each action processor performs actions on one or more fields in
the packet header using the VLIW instruction set provided by the chip.
Because each action processor takes less than a square
millimeter of area on the chip, the processing pipeline
can afford many action processors for each stage potentially
resulting in hundreds of action processors across the chip pipeline.
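To make the pipeline structure concrete, here is a minimal software sketch of the idea, with invented field names, table sizes, and actions (the real chip implements this directly in silicon): each physical stage holds a match table and an action processor, and a logical table that is too big for one stage spills into the next.

```python
# Toy model of an RMT-style chip: a sequence of physical match-action
# stages. All names, sizes, and fields here are illustrative only.

NUM_STAGES = 32                  # the chip supports up to 32 match-action stages

class Stage:
    def __init__(self, capacity=1024):
        self.table = {}          # match entries: header value -> action function
        self.capacity = capacity # illustrative per-stage table size

    def process(self, headers):
        action = self.table.get(headers.get("dst_ip"))
        if action:
            action(headers)      # VLIW-style action: edit one or more header fields
        return headers

def add_logical_table(stages, entries, start=0):
    """Spread one logical table across consecutive physical stages when it
    is larger than the memory laid out for a single stage."""
    i = start
    for key, action in entries:
        while len(stages[i].table) >= stages[i].capacity:
            i += 1               # spill into the next physical stage
        stages[i].table[key] = action

pipeline = [Stage() for _ in range(NUM_STAGES)]
add_logical_table(pipeline, [("10.0.0.1", lambda h: h.update(out_port=3))])

headers = {"dst_ip": "10.0.0.1", "ttl": 64}
for stage in pipeline:
    headers = stage.process(headers)
print(headers)                   # {'dst_ip': '10.0.0.1', 'ttl': 64, 'out_port': 3}
```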
This architecture permits a very flexible match-action based programmable
data-plane for only about a 15% overhead in chip area.
However, the data plane is still based on performing sequences of match and action.
In practice, network operators may wish
to perform increasingly complex and sophisticated
operations on streams of packets such as on the fly transcoding or encryption.
To perform these more complex operations we need
to place more sophisticated packet processing in the data-plane.
Of course, customized hardware can do this, but what
if we wanted to support this with a programmable data-plane?
That's the idea behind SwitchBlade, which
is a programmable, modularizable FPGA-based data plane.
The main idea behind SwitchBlade is to identify modular hardware building
blocks that can implement a variety of data-plane functions and then
allow a developer to enable or disable these building blocks and connect them
in a hardware pipeline using high-level software.
Potentially, we may also want to allow custom
data-planes to operate in parallel on the same hardware.
For example, if we had specific traffic flows that
needed to be transcoded or encrypted, we might only
want to apply that on a subset of the
traffic or traffic coming in on specific virtual interfaces.
So we'd like the hardware to support that type of virtualization as well.
In other words, we'd like to get the
advantages of both hardware and software with minimal overhead.
SwitchBlade pushes custom forwarding planes into programmable hardware.
You might have a programmable software router like Click, running
in one or more virtual environments on a hardware platform.
Instead of having all of that forwarding take place in software, we'd like to
push that forwarding function, some of which
may be custom, down into the hardware.
We'd also like to have multiple virtual data planes,
each of which supports different custom packet processing pipelines.
The first stage in the SwitchBlade
pipeline is the virtual data plane selection.
That is when traffic arrives we need to determine which virtual
data plane or packet processing pipeline the traffic should be directed to.
The idea is that SwitchBlade
should support separate packet processing pipelines,
lookup tables and forwarding modules for each of these virtual data planes.
A table in memory maps the source MAC address of the incoming packet to a
virtual data plane identifier. Based on the virtual data plane that the packet
is mapped to, SwitchBlade then attaches a 64-bit platform header that controls
the functions that may be performed in later stages.
The header can also be controlled from
high level software programs using a register interface.
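As a rough sketch of this first stage, with illustrative field names and bit layout (only the 64-bit platform header width comes from the lesson), the selection step amounts to a lookup on the source MAC address followed by prepending a platform header:

```python
# Sketch of SwitchBlade's virtual data plane (VDP) selection stage.
# The MAC-to-VDP table contents and the platform-header bit layout are
# assumptions for illustration; only the 64-bit header width is from the lesson.

vdp_table = {
    "aa:bb:cc:00:00:01": 1,      # packets from this source MAC go to VDP 1
    "aa:bb:cc:00:00:02": 2,      # packets from this source MAC go to VDP 2
}

def select_vdp(packet):
    vdp_id = vdp_table.get(packet["src_mac"], 0)       # default VDP 0
    # Build a 64-bit platform header; placing the VDP id in the top byte
    # is an assumption, as is leaving the rest for later stages to fill in.
    packet["platform_header"] = (vdp_id & 0xFF) << 56
    packet["vdp_id"] = vdp_id
    return packet

pkt = select_vdp({"src_mac": "aa:bb:cc:00:00:01", "payload": b"..."})
print(hex(pkt["platform_header"]))                     # 0x100000000000000
```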
SwitchBlade does have a traffic shaping step, but we
will not talk about that step in this lesson.
Let's proceed to the step where
SwitchBlade performs preprocessing on the traffic.
SwitchBlade selects processing functions at this step from a library of reusable
modules that have already been synthesized in the programmable hardware.
The preprocessor thus allows a programmer to quickly customize the
packet processing pipeline without needing to re-synthesize or
re-program functions using a hardware description language.
Rather, the programmer can control everything
from a high level programming language.
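A hedged sketch of what that high-level control could look like: the register addresses and module names below are invented, but the point is that software only flips enable bits for blocks that are already synthesized on the FPGA, rather than regenerating a bitstream.

```python
# Illustrative only: enabling or disabling pre-synthesized SwitchBlade
# modules from software through a register interface. The module names,
# bit assignments, and register addresses are made up for this sketch.

MODULE_BITS = {"ipv6_lookup": 0x1, "path_splicing": 0x2, "openflow_hash": 0x4}

def write_register(addr, value):
    # Stand-in for the real hardware register write.
    print(f"write reg 0x{addr:08x} <= 0x{value:08x}")

def enable_modules(vdp_id, modules):
    mask = 0
    for m in modules:
        mask |= MODULE_BITS[m]
    write_register(0x1000 + vdp_id, mask)    # one hypothetical register per VDP

enable_modules(1, ["openflow_hash"])         # VDP 1: OpenFlow-style pipeline
enable_modules(2, ["ipv6_lookup"])           # VDP 2: IPv6 forwarding
```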
We've shown how this library of reusable modules can be used to implement a variety
of custom data planes including a multi-path
routing protocol called Path Splicing, IPv6 and OpenFlow.
The preprocessor hashes custom bits in the packet header and then
inserts the value of that hash into the SwitchBlade platform header.
The ability to select custom bits from the
packet header to create that hash is
what allows SwitchBlade to make custom processing and forwarding
decisions based on arbitrary bits in the packet header.
One example of a protocol that can
be implemented using SwitchBlade is OpenFlow. We built
a limited implementation of OpenFlow with no
matching on VLANs and no wildcards.
The preprocessing steps are quite simple.
The preprocessor essentially parses the packet and extracts
the relevant tuples corresponding to the OpenFlow flowspace.
It then passes those bits to the hashing module in the SwitchBlade preprocessor,
which outputs a 32-bit hash value that
controls both packet processing and forwarding decisions.
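A minimal sketch of that preprocessing, assuming a particular set of flowspace fields and CRC32 as the hash function (the lesson only says that the relevant tuple is extracted and hashed to a 32-bit value that is written into the platform header):

```python
import zlib

# Sketch of the limited OpenFlow preprocessor: extract a flow tuple
# (no VLAN matching, no wildcards) and hash it to 32 bits. The exact
# fields and the use of CRC32 are assumptions for illustration.

FLOW_FIELDS = ("in_port", "src_mac", "dst_mac", "src_ip", "dst_ip",
               "proto", "src_port", "dst_port")

def openflow_preprocess(packet):
    tuple_bytes = "|".join(str(packet[f]) for f in FLOW_FIELDS).encode()
    flow_hash = zlib.crc32(tuple_bytes) & 0xFFFFFFFF        # 32-bit hash value
    # Fold the hash into the (hypothetically) low 32 bits of the platform header.
    packet["platform_header"] = (packet["platform_header"] & ~0xFFFFFFFF) | flow_hash
    return packet
```

Hashing selected bits into a fixed-width key is what lets the later forwarding stages stay independent of any particular protocol's header layout.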
Adding new modules in SwitchBlade, of course,
requires Verilog programming but it is possible.
The idea is that synthesizing or adding new modules
should not be needed very often, based on our intuition
that many custom data plane operations can be performed
with relatively few hardware primitives in the data plane.
Forwarding consists of three steps.
There's an output port lookup
process, which performs custom forwarding, depending
on the bits that have been set in the platform header.
Wrapper modules allow matching to be performed on custom bit offsets.
And, custom post processors allow other functions
to be enabled or disabled on the fly.
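A minimal sketch of the output port lookup step, assuming (purely for illustration) that the forwarding table is chosen per virtual data plane and that the 32-bit flow hash computed by the preprocessor is the lookup key:

```python
# Illustrative output-port lookup. Which forwarding table is consulted is
# chosen by the packet's virtual data plane; the 32-bit flow hash in the
# platform header stands in for the real per-VDP lookup key.

forwarding_tables = {
    1: {0x9A3F12C4: 2},          # VDP 1 (OpenFlow-style): flow hash -> output port
    2: {0x1B22E7A0: 5},          # VDP 2 (IPv6): flow hash -> output port
}

def output_port_lookup(packet):
    flow_hash = packet["platform_header"] & 0xFFFFFFFF
    table = forwarding_tables.get(packet["vdp_id"], {})
    packet["out_port"] = table.get(flow_hash)    # None means no match (punt or drop)
    return packet
```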
SwitchBlade also provides the capability to throw software exceptions.
So if a programmer wants a particular packet
operation to be performed, but the hardware modules
do not support it, the programmer can specify
that some packets be redirected to the CPU.
Those packets are passed to the CPU with
the virtual data plane identifier and the SwitchBlade platform
header, which allows for software exceptions to be executed
that are specific to that packet's virtual data plane.
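A sketch of how such an exception path might be expressed in software terms; the decision logic and handler names are invented, and the lesson specifies only that the packet reaches the CPU together with its virtual data plane identifier and platform header:

```python
# Illustrative software-exception path: packets that the hardware modules
# cannot handle are punted to the CPU along with their VDP identifier and
# platform header, so a per-VDP software handler can finish the processing.

def ipv6_exception_handler(packet, platform_header):
    packet["ttl"] = packet.get("ttl", 64) - 1    # hypothetical software-only step
    return packet

cpu_handlers = {2: ipv6_exception_handler}       # VDP 2 -> IPv6 software handler

def maybe_raise_exception(packet, hardware_supported):
    if hardware_supported:
        return packet                            # stays in the hardware pipeline
    handler = cpu_handlers[packet["vdp_id"]]
    return handler(packet, packet["platform_header"])
```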
This combination of virtual data planes and custom postprocessing
allows SwitchBlade to perform different packet processing
operations depending on the type of packet that arrives.
So an IPv6 packet might be subject to
a number of operations such as a TTL decrement
whereas a layer two OpenFlow packet might be simply
directed straight from forwarding logic to the output queues.
Other custom protocols, like path splicing, might also be passed through a
custom set of post processing modules
that are selected from the pre-synthesized modules.
In summary, another way to make programmable data
planes scale is to make hardware more programmable.
We've built on the insight that if we optimize a few primitives and
provide the ability to compose those primitives, then the hardware
data plane can in fact be quite simple and yet extremely flexible.
We've seen two such examples of programmable
hardware data planes that build on this insight.
One is the OpenFlow Chip and the other is SwitchBlade.