FCoE: Uncovering the CNA - Deep Dive
Continuing with my current theme of Fibre Channel over Ethernet, and at the request of several people, this post will take a close look at Converged Network Adapters (CNA).
Now then, there is no easy way to put this, but doing justice to a topic like this will require a lot of words. I’ll do my best to keep it succinct, but if you are looking for a high level overview in under 1,000 words then this is probably not for you, but thanks for stopping by... However, if you want a deep dive and an opportunity for technical discussion, then this might be what you’re looking for.
Still here? Magic, let’s go…..
First things first, it’s best to have a quick look at what things look like without CNAs, in order to more fully appreciate some of the problems CNAs are resolving.
In a traditional server with no CNA cards and not connected to a unified fabric it is not uncommon to see the following “network” related sprawl (sorry about the terrible diagram but I wrote most of this post while on an aeroplane) –
The FC HBAs tend to run at either 2, 4 or 8Gbps with the NICs usually running at either 100Mbps or 1Gbps.
Multiple NICs are common in order to meet the demands of different traffic types such as production, backup and management.
It is also worth noting that most servers are built with two physical HBA cards installed to provide both redundancy and increased aggregate bandwidth/performance. This is commonly referred to as I/O multi-pathing and is a must in 99.99% of FC SANs.
But that was then and this is now….. or nearly now
Say hello to the Converged Network Adapter, or CNA for short.
The Converged Network Adapter is exactly what it says it is – an HBA and a NIC converged on to a single PCIe adapter –
Diagram 2 - Picture courtesy of Emulex
Digging into the technical detail, some CNAs have a single ASIC that performs both the HBA and the NIC functionality, whereas others have a separate ASIC for each of the distinct functions. This really depends on the vendor and model. The important point is that CNAs provide HBA and NIC functionality in hardware, making them fast and reducing CPU overhead. Implementing functionality without thieving CPU cycles is a big plus-point in server virtualisation environments.
Considering the fact that stealing CPU cycles is undesirable at the best of times and a federal offence punishable by death in virtual server environments, providing hardware offloads is vital. To help out in this area, the latest raft of Generation 2 Emulex OneConnect Universal Converged Network Adapters, such as the LP2100x, provide full CPU offload for all protocols on a single chip design and are FIP compliant! That’s not just FCoE offload, we’re talking about TOE and iSCSI as well – the iSCSI one is interesting and may be a chat for another day!
NOTE: I will point out that I had a chat recently with Emulex and was very impressed. They are laser focussed (pun intended) on FCoE and really understand the market and value position of FCoE. A real pleasure to chat with passionate like minded people, oh and they didn't hang up on me when I mentioned "Broadcom" ;-) VERY unprofessional of me but I was unbelievably jetlagged when we spoke, so I thank them for overlooking my lack of tact and poor humour.
So with the fact that CNAs provide both HBA and NIC functionality, as well as connecting to 10Gbps Enhanced Ethernet fabrics, it shouldn’t take a rocket scientist to figure out that we could swap out our 6 PCI adapters from Diagram 1 and replace them with just 2 CNAs (2 for redundancy). If you think about it, this has the potential to reduce PCI adapter count and cable sprawl by crazy%
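To put a rough number on that "crazy%", here is a quick back-of-the-envelope sketch in Python. It only uses the two figures already mentioned above (6 adapters before, 2 CNAs after); the helper name `reduction_percent` is just something made up for the illustration, and cable count is assumed to track adapter count one-for-one, which will not hold for multi-port cards.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage reduction when going from `before` items to `after` items."""
    if before <= 0:
        raise ValueError("before must be a positive count")
    return (before - after) / before * 100


adapters_before = 6  # PCI adapters in the traditional server (Diagram 1)
adapters_after = 2   # two CNAs, kept as a pair for redundancy

saving = reduction_percent(adapters_before, adapters_after)
print(f"Adapter count reduced by roughly {saving:.0f}%")
```

So for this particular example "crazy%" works out at about two thirds.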
Early generation CNAs are installed as PCIe adapters in servers and as PCIe mezzanine cards in blade servers. However, the way forward is to eventually have them embedded on the motherboard, à la LOM, and vendors currently have this on their roadmaps. But this is no biggy right? Well actually this has the potential to hugely reduce the amount of space consumed by PCI adapters inside of servers, paving the way for higher density in blade servers where real-estate is at a premium!
Now this brings up another interesting benefit. Not only are FCoE and its associated technologies resolving today’s problems, they are actually enabling a better future. How good would it be to have 6 or more CNAs per blade server! That’s 60Gbps+ of flexible bandwidth based on today’s 10Gbps Enhanced Ethernet!
Now this becomes a real winner in light of the current raft of virtualisation optimised CPUs on the market. Think Intel Nehalem and associated virtualisation technologies like Intel VT-x, VT-c, VT-d and VMDq, and expect the ratio of VMs per core to rocket skyward! In order to keep pace with the advancements in chip technology, the I/O subsystem needs to move in step. CNAs and Lossless Enhanced Ethernet are vital to this.
This point aside, I recently had a hand in a design involving HP BladeSystem c7000 technology. I remember during the design wishing that the system had CNAs instead of the Flex10 and Virtual Connect technologies. It could have saved space and cabling as well as potentially given more flexibility. However, this was before FCoE had even been ratified by INCITS.
Obvious benefits summarised
So some obvious benefits that CNAs bring include - reduced number of PCI adapters, cables, space, power and cooling. Nice!
Secondly, by supporting 10Gbps Enhanced Ethernet they provide greater bandwidth than existing adapters, allowing more traffic types and volume to travel down the same stretch of cable. Suddenly the "wire once" nirvana is now a reality (actually wire twice if you want true I/O multi-pathing).
NOTE: A quick heads up on throughput. While CNAs provide access to 10Gbps CEE networks, not all services, including FCoE, are supported at full bandwidth. For example, CNAs normally support Ethernet at 10Gbps but FCoE only runs at a max of either 4Gbps or 8Gbps. This is in line with 4Gbps and 8Gbps FC – there is currently no standard for 10Gbps FC N_Port to F_Port connections.
<OK so this is about the half way point, so you might want to come up for some air here before discussing the interesting stuff>
Clever things with CNAs
Now to the deeper technical stuff………..
With 10Gbps Enhanced Ethernet, FCoE, CNAs and other associated technologies being relatively new, the standards bodies (such as INCITS T11 FC-BB-5 for FCoE) as well as the vendors are able to design with virtualisation in mind. This is great!
Let’s mention a couple of technologies that bring a lot to the table in Hypervisor environments.
The first technology worth mentioning is N_Port ID Virtualisation, otherwise known as NPIV. NPIV is a T11 INCITS and ANSI standard initially developed by IBM and Emulex, and yes I know it’s not exactly new.
Because NPIV is not a new technology I won’t spend much time on it here, other than to say NPIV makes it possible for a single HBA to have several N_Port IDs and WWPNs, allowing virtual machines to be uniquely addressable on the SAN and therefore able to be managed on the SAN just like normal physical servers.
If it would be useful I can put something up on NPIV. Just leave a comment at the bottom asking and if enough people ask I’ll post on it.
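In the meantime, here is a toy Python model of the idea, purely to make the "one physical port, many SAN identities" point concrete. Everything in it is an assumption for illustration: the class names `Fabric` and `HbaPort`, the WWPNs, the VM names and the starting N_Port ID are all made up, and a real fabric assigns addresses through its own login process rather than a simple counter.

```python
from itertools import count


class Fabric:
    """Toy stand-in for the fabric service that hands out N_Port IDs."""

    def __init__(self, first_id: int = 0x010100):  # made-up starting address
        self._ids = count(first_id)

    def login(self) -> int:
        return next(self._ids)


class HbaPort:
    """One physical HBA port carrying several virtual ports, NPIV style."""

    def __init__(self, wwpn: str, fabric: Fabric):
        self.fabric = fabric
        # The physical port logs in first and gets its own N_Port ID.
        self.logins = {wwpn: ("physical host", fabric.login())}

    def add_virtual_port(self, wwpn: str, owner: str) -> int:
        """Log an extra WWPN in over the same physical port."""
        if wwpn in self.logins:
            raise ValueError(f"WWPN {wwpn} is already logged in")
        n_port_id = self.fabric.login()
        self.logins[wwpn] = (owner, n_port_id)
        return n_port_id


# All WWPNs and VM names below are invented for the example.
port = HbaPort("10:00:00:00:c9:00:00:01", Fabric())
port.add_virtual_port("20:00:00:00:c9:00:00:02", owner="vm-sql01")
port.add_virtual_port("20:00:00:00:c9:00:00:03", owner="vm-web01")

for wwpn, (owner, n_port_id) in port.logins.items():
    print(f"{wwpn}  N_Port ID 0x{n_port_id:06x}  {owner}")
```

The point to take away is that the SAN sees three separately addressable initiators, so zoning and LUN masking can be done per VM, even though there is only one physical port and one cable.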
I/O Virtualisation (SRIOV)
So while NPIV allows multiple VMs to be uniquely addressable on the SAN side of an HBA, on the other side, the side of the server’s PCI tree, there is still a 1:1 mapping between physical ports and addressable PCI devices.
Early virtual server implementations see the hypervisor act as middleman, between the I/O adapter and the Virtual Machine (VM). The VM never actually talks directly with the I/O adapter, always through a middle-man - the hypervisor. But like any middleman scenario, it has its advantages and disadvantages. The middleman would argue that he adds “value”, but is rarely so keen to point out that he also always takes a generous percentage of the spoils (in our case, I/O performance and CPU cycles).
Some of the advantages and disadvantages of having the hypervisor as the middleman might include –
Advantages:
- Hypervisors often provide snapshot capabilities
- Hypervisors often provide thin provisioning capabilities
- Many existing Hypervisor technologies are currently designed around this model
Disadvantages:
- Lower I/O performance. Because the hypervisor handles all I/O to and from a VM, as well as the associated interrupt handling etc, this injects server-side latency in to the I/O path as well as stealing CPU cycles from the main role of the Hypervisor.
- Security concerns. The fact that all I/O, to and from VMs, is seen and touched by the hypervisor may be a concern to some people :-S
- Limited feature set. You are restricted to the features and functions provided by the Hypervisor and not exposed to the full feature set available from the manufacturer's native driver….
A common example where having the Hypervisor act as the middle man is not seen as desirable is a high throughput OLTP type system. These are rarely virtualised due to, among other things, the I/O performance impact associated with having the Hypervisor as the middle-man.
So what does SRIOV do to help?
NOTE: Single Root I/O Virtualisation (SRIOV) and Multi Root I/O Virtualisation (MRIOV) are extensions to PCIe brought to us courtesy of the electronics industry consortium known as the PCI Special Interests Group (PCI-SIG).
SRIOV implementations allow an I/O adapter, in co-operation with the Hypervisor, to be sliced up in to multiple virtual adapters, each of them uniquely addressable on the server’s PCI tree. Each virtual adapter can then be mapped directly to a virtual machine and in turn is directly addressable by that virtual machine – no more middle-man. SRIOV based adapters may have dedicated I/O paths in silicon and work alongside other related technologies such as Intel VT-c, VT-d, VMDq…. to provide a huge variety of offloads and assists aimed at reducing the load on main CPU and memory systems and thus increasing system performance. This mode of operation is sometimes referred to as “hypervisor bypass” mode (we should be thinking VMDirectPath by now) and offers close to full line rate, making virtualisation a more realistic option for transaction based and other I/O intensive systems like our example OLTP system.
This also opens the door to things like VMs running drivers provided by the manufacturers making new functionality immediately available without waiting for the Hypervisor middle-man to support it. Obviously it all needs buy in from your Hypervisor …..
Of course SRIOV is a semi-open standard (I say semi-open because several months ago when I researched it you had to pay to get access to the standards docs) and like most standards each vendor is free to implement the specifics in their own unique and value-add way as well as to add more features and requirements around it.
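If you want to see whether an adapter exposes this kind of slicing, one place it shows up is Linux, where the kernel publishes two sysfs attributes per SR-IOV capable PCI device: `sriov_totalvfs` (how many virtual functions the device supports) and `sriov_numvfs` (how many are currently enabled). The Python below is a minimal read-only sketch under that assumption. It is Linux-specific and says nothing about how any particular hypervisor or vendor tool does it, and the function name `sriov_capable_devices` is my own.

```python
from pathlib import Path


def sriov_capable_devices(sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Map PCI address -> (VFs enabled, VFs supported) for SR-IOV capable devices."""
    root = Path(sysfs_root)
    found = {}
    if not root.is_dir():
        return found  # not Linux, or sysfs is not mounted
    for dev in sorted(root.iterdir()):
        total = dev / "sriov_totalvfs"
        enabled = dev / "sriov_numvfs"
        if total.is_file() and enabled.is_file():
            try:
                found[dev.name] = (int(enabled.read_text()), int(total.read_text()))
            except (OSError, ValueError):
                continue  # unreadable or malformed attribute, skip this device
    return found


for address, (enabled, supported) in sriov_capable_devices().items():
    print(f"{address}: {enabled} of {supported} virtual functions enabled")
```

On a machine with no SR-IOV capable adapters (or a non-Linux one) this simply prints nothing. Each enabled virtual function then appears as its own PCI device, which is what makes it possible to hand one directly to a VM.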
So in a nutshell, NPIV enables a single port to have multiple discrete addresses on the SAN side, whereas SRIOV allows a single I/O adapter to have multiple discrete addresses internally on the server’s PCI bus/tree. The diagram below from a long time ago on the Cisco website shows an I/O card that can be partitioned in to 128 (0-127) virtual interfaces.
Diagram from the Cisco website
Say goodbye to hardware rip and replace
By implementing the above mentioned technologies it becomes possible to run a single cable to a server and then use management software to configure the virtual interfaces according to current requirements. For example, removing a NIC function and adding an HBA function becomes just a software change, with no requirement to physically swap out a PCI card or lift the floor and run new cables. Simply make the change in software and the new device will be presented to the server’s PCI tree, job done! Sound good to anyone else?
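To make that "just a software change" idea a bit more tangible, here is a small Python sketch of an adapter with 128 virtual interface slots (0-127, as in the Cisco diagram) where each slot can be presented as either a NIC or an HBA. This is a conceptual model only. The class `ConvergedAdapter` and its methods are hypothetical and do not correspond to any vendor's management API.

```python
class ConvergedAdapter:
    """Toy model of an adapter whose virtual interfaces are defined in software."""

    KINDS = ("nic", "hba")

    def __init__(self, max_interfaces: int = 128):
        self.max_interfaces = max_interfaces
        self._functions = {}  # slot index -> "nic" or "hba"

    def present(self, index: int, kind: str) -> None:
        """Present (or re-present) a virtual interface of the given kind."""
        if not 0 <= index < self.max_interfaces:
            raise IndexError(f"slot {index} is outside 0-{self.max_interfaces - 1}")
        if kind not in self.KINDS:
            raise ValueError(f"unknown function type: {kind}")
        self._functions[index] = kind

    def remove(self, index: int) -> None:
        """Withdraw a virtual interface; a no-op if the slot is already empty."""
        self._functions.pop(index, None)

    def pci_view(self) -> list:
        """What the server would see on its PCI tree, in slot order."""
        return [(i, self._functions[i]) for i in sorted(self._functions)]


adapter = ConvergedAdapter()
adapter.present(0, "nic")
adapter.present(1, "nic")
adapter.present(2, "hba")

# Requirements change: swap one NIC for a second HBA. No card pulled, no new cable.
adapter.remove(1)
adapter.present(1, "hba")

print(adapter.pci_view())
```

The cable and the card never change here, only the table of functions the adapter presents to the server.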
There is more, but I doubt anyone would read more than this in one post. Other topics can always be discussed in the comments section below....
So if you made it all the way to here, thanks and I hope it was useful. Please feel free to join the discussion either here on the blog site via the comments section below, or by following me on Twitter @nigelpoulton. I only talk about storage and related technologies.
PS. I am an independent consultant and can be contacted at nigelATrupturedmonkeyDOTcom