Yellow Dog Solutions

TOPIC: Gigabit Ethernet Support on G4s



Introduction
The following has been extracted from an exchange of emails between two of our customers: Bill with NASA and Greg with JPL. This discussion concerns YDL support for the gigabit ethernet cards that ship with the Apple G4 towers. Note that the updated SUNGEM driver described in Exchange #6 below outperforms the GMAC driver by about 4%.


Exchange #1: Bill wrote
This is your lucky day. I work at NASA Goddard, and we just got in some new 867 MHz G4s, which I'm in the process of installing YDL on. I initially installed YDL 2.0, which defaults to a 2.2.19 kernel, which doesn't support GigE. I then tried a 2.4.6 kernel, but this didn't handle GigE either. Just last night, I updated to YDL 2.1 using yup update, and this installs a 2.4.10 kernel. I tested this earlier today, and the GigE worked fine with the default GMAC driver. If you want to try even newer kernels, I would recommend getting the latest BenH kernels via rsync (I plan on trying out one of his 2.4.14 kernels later).

Here is the result of my initial performance tests from earlier today:

clifford:	Apple-PowerMac_G4/867   running YDL 2.1   kernel 2.4.10-12a
wiz:		Apple-PowerMac_G4/2x500 running YDL 1.2.1 kernel 2.4.6 (UP)

MTU:		1500

Transferring 16384 64-KB buffers for a total of 1 GB of data:

clifford GMAC/Extreme-5i -> wiz Alteon/cisco-3508:

	clifford:	transmitted 495.7805 Mb/s using 44% of the CPU
	wiz:		received    494.7032 Mb/s using 93% of the CPU

clifford GMAC/Extreme-5i <- wiz Alteon/cisco-3508:

	clifford:	received    622.6924 Mb/s using 52% of the CPU
	wiz:		transmitted 621.5185 Mb/s using 99% of the CPU

clifford -> clifford:

	clifford:	transmitted 2456.1632 Mb/s using 57% of the CPU
	clifford:	received    2456.0635 Mb/s using 42% of the CPU
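
As a quick sanity check on the figures above, the following sketch works out the size of the test transfer and how long it would take at each reported rate. It assumes that "64-KB" means 64 * 1024 bytes and that "Mb/s" means 10**6 bits per second; neither is stated in the original message.

```python
# Size of the test transfer described above: 16384 buffers of 64 KB each.
BUFFERS = 16384
BUFFER_BYTES = 64 * 1024  # assumption: "64-KB" means 64 * 1024 bytes

total_bytes = BUFFERS * BUFFER_BYTES
total_bits = total_bytes * 8


def elapsed_seconds(rate_mbps):
    """Time to move the whole transfer at the given rate.

    Assumption: the reported "Mb/s" means 10**6 bits per second.
    """
    return total_bits / (rate_mbps * 1e6)


# Transmit rates copied from the results above.
reported = {
    "clifford -> wiz": 495.7805,
    "wiz -> clifford": 621.5185,
    "clifford -> clifford": 2456.1632,
}

print(f"total: {total_bytes} bytes ({total_bytes / 2**30:.0f} GiB)")
for label, rate in reported.items():
    print(f"{label}: {elapsed_seconds(rate):.1f} s at {rate} Mb/s")
```

Under those assumptions each run lasts only a few seconds to a few tens of seconds, which is worth keeping in mind when comparing these numbers with the 60 second sustained tests reported later.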


We're interested in evaluating the major performance benefit (and lower CPU overhead) of using jumbo frame (9K MTU) GigE, although the above is just with the normal 1500 byte Ethernet MTU, since we don't have all the necessary equipment in or installed yet. The Extreme 5i GigE switch supports jumbo frames but the cisco 3508 does not. I am not sure if the built-in Broadcom BCM5411 Ethernet chip that comes with the new G4s has jumbo frame support, but I don't think that it does. The Alteon NIC in wiz uses the acenic driver and does support jumbo frames, but I don't know if you can still buy these NICs (you'd have to check with 3Com which bought out the Alteon NIC business). We also have some 3Com 3C996 NICs, which are supposed to support jumbo frames, but the Linux bcm5700 driver hasn't been ported to PPC yet to the best of my knowledge. The best bet for jumbo frame support may be the NetGear GA-620T (I hope that's the right part number as I'm doing this from memory at the moment) NIC, which is supposed to support jumbo frames using the acenic driver. We have a couple of these on loan to try out so I should know more about them shortly. I don't know if you're concerned about jumbo frame support or not, but you might want to take it into consideration.
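
To put rough numbers on the jumbo frame question, here is a small sketch comparing a 1500 byte MTU with a 9000 byte MTU at gigabit line rate. It uses the standard Ethernet per-frame overhead (preamble, header, FCS, inter-frame gap) and assumes IPv4 and TCP headers without options, which is a simplification; real connections often carry TCP options.

```python
# Per-frame overhead on the wire, in bytes (standard Ethernet framing).
PREAMBLE_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12
WIRE_OVERHEAD = PREAMBLE_SFD + ETH_HEADER + FCS + INTERFRAME_GAP  # 38 bytes

# Assumption: IPv4 and TCP headers with no options (20 bytes each).
IP_TCP_HEADERS = 40

LINK_BPS = 1_000_000_000  # gigabit Ethernet line rate


def frame_stats(mtu):
    """Return (max TCP payload rate in Mb/s, frames per second at line rate)."""
    payload = mtu - IP_TCP_HEADERS
    wire_bytes = mtu + WIRE_OVERHEAD
    frames_per_sec = LINK_BPS / (wire_bytes * 8)
    payload_mbps = frames_per_sec * payload * 8 / 1e6
    return payload_mbps, frames_per_sec


for mtu in (1500, 9000):
    mbps, fps = frame_stats(mtu)
    print(f"MTU {mtu}: up to {mbps:.0f} Mb/s payload, {fps:.0f} frames/s")
```

The payload ceiling only rises by a few percent with jumbo frames. The larger effect is that the number of frames per second at full rate drops by roughly a factor of six, which is where any reduction in per-packet CPU work would come from.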

I hope this helps you out. The 2.4.10 kernel that comes with YDL 2.1 (or one of the more recent BenH kernels) should work fine with the built-in GigE on the newer G4s using the GMAC driver. BenH is also working on an alternate sungem driver, but it wasn't quite ready just yet, and in any case, the regular GMAC driver should be fine for your purposes in running a cluster system.


Exchange #2: Bill wrote
Here's the dmesg from our new 867 G4 (running YDL 2.1 kernel 2.4.10-12a):
clifford# dmesg | grep eth0
eth0: GMAC at 00:03:93:1b:37:b8, driver v1.5k4
eth0: PHY ID: 0x00206071
eth0: Found Broadcom BCM5411 PHY (Gigabit)
eth0: Link state change, phy_status: 0x796d
eth0:    Link up ! BCM54xx aux_stat: 0xff3c (link mode: 7)
eth0:    Full Duplex: 1, Speed: 1000


And here's the dmesg from our older dual 500 G4 (running BenH kernel 2.4.6):
wiz# dmesg | grep eth0
eth0: GMAC at 00:30:65:c3:83:2e, driver v1.4k4
eth0 PHY ID: 0x00206047
eth0 Found Broadcom BCM5400 PHY (Gigabit)
eth0: Link state change, phy_status: 0x616d

Note that the testing I reported was with eth1 on wiz, which is the Alteon NIC. The built-in GMAC Ethernet is currently only connected to a Fast Ethernet switch, because at the time we initially installed that system, we didn't have a GigE switch that had copper ports (that's why my much earlier testing was done with two such systems directly connected back-to-back with an 8-wire GigE cable). Now that we have a copper GigE switch to test with, if I get some time today, I'll try moving wiz's GMAC Ethernet connection from our cisco 5500 to the Extreme 5i, and see if I can get it to work as a GigE connection.

Some other possible differences are the type of GigE switch being used (Extreme 5i in our case) and how the ports are configured (we just have them set to auto-negotiate at the moment IIRC).
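
When comparing several machines, it can help to pull the driver version, PHY, and negotiated speed out of the dmesg output automatically. The sketch below does this for the two listings quoted in this exchange. The regular expressions are written only against those sample lines, so other driver versions may need different patterns; the function and field names are illustrative.

```python
import re

# Sample lines copied from the two dmesg listings above.
CLIFFORD = """\
eth0: GMAC at 00:03:93:1b:37:b8, driver v1.5k4
eth0: PHY ID: 0x00206071
eth0: Found Broadcom BCM5411 PHY (Gigabit)
eth0: Link state change, phy_status: 0x796d
eth0:    Link up ! BCM54xx aux_stat: 0xff3c (link mode: 7)
eth0:    Full Duplex: 1, Speed: 1000
"""

WIZ = """\
eth0: GMAC at 00:30:65:c3:83:2e, driver v1.4k4
eth0 PHY ID: 0x00206047
eth0 Found Broadcom BCM5400 PHY (Gigabit)
eth0: Link state change, phy_status: 0x616d
"""

# The colon after the interface name is optional because the v1.4k4
# driver output above omits it on some lines.
PATTERNS = {
    "driver": re.compile(r"^eth\d+:? GMAC at \S+, driver (\S+)", re.M),
    "phy_id": re.compile(r"^eth\d+:? PHY ID: (0x[0-9a-fA-F]+)", re.M),
    "phy": re.compile(r"^eth\d+:? Found Broadcom (\S+) PHY", re.M),
    "speed": re.compile(r"^eth\d+:?\s+Full Duplex: \d+, Speed: (\d+)", re.M),
}


def summarize(dmesg_text):
    """Return a dict of the fields found; missing fields map to None."""
    summary = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(dmesg_text)
        summary[name] = match.group(1) if match else None
    return summary


print(summarize(CLIFFORD))
print(summarize(WIZ))
```

For wiz the speed field comes back as None, since that listing contains no "Speed:" line.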


Exchange #3: Bill wrote
OK, we moved wiz's (our older dual 500 G4) built-in GMAC GigE connection to the Extreme 5i GigE switch, and it does work fine. Here's the dmesg output:

wiz% dmesg | grep eth0
eth0: GMAC at 00:30:65:c3:83:2e, driver v1.4k4
eth0 PHY ID: 0x00206047
eth0 Found Broadcom BCM5400 PHY (Gigabit)
eth0: Link state change, phy_status: 0x616d


Note this is still with the BenH 2.4.6 kernel. The performance wasn't as good as with the Alteon NIC on wiz, but was still fairly decent:

clifford GMAC/Extreme-5i -> wiz GMAC/Extreme-5i:

	clifford:	transmitted 363.4113 Mb/s using 31% of the CPU
	wiz:		received    362.6118 Mb/s using 66% of the CPU

clifford GMAC/Extreme-5i <- wiz GMAC/Extreme-5i:

	clifford:	received    358.7618 Mb/s using 32% of the CPU
	wiz:		transmitted 358.2726 Mb/s using 46% of the CPU


I haven't had a chance to install YDL 2.1 on our other new 867 G4 yet, but when I do, I'll do some further network performance tests between the two 867 G4s, and post them back to everyone. I hope to achieve nearly full GigE performance.


Exchange #4: Bill wrote
I thought you guys would be interested in the following message I just posted to the linuxppc-dev e-mail list, where I compared the GigE performance of the GMAC and SUNGEM drivers on the latest 2.4.15-pre4-ben0 kernel.

Although its performance wasn't as good as the GMAC driver (and used a lot more CPU), you might want to try the SUNGEM driver to see if it would at least support the hardware that's being used at JPL. One other idea I had that you might want to try is hooking two of your systems back-to-back, just to eliminate any possible switch problems. I suspect, however, that your problem was the GMAC driver just not properly recognizing the specific Broadcom GigE chip built into your hardware, in which case the SUNGEM driver might possibly recognize it (lower performance is still lots better than not functional and the lower performance still exceeded 500 Mbps).

I just did a GigE performance comparison of the GMAC and SUNGEM drivers using the latest 2.4.15-pre4-ben0 kernel. The two test systems were both 867 MHz G4s connected to two ports on the same Extreme 5i GigE switch. The test was simply to measure the sustained TCP network throughput for a 60 second period (using memory-to-memory transfers of 64 KB buffers with a 768 KB window size). The test was run shortly after both systems had been rebooted, and there was nothing else of significance running on either system (not even X windows).

The GMAC driver had significantly better performance. It sustained 663 Mbps for the 60 second test period, and used 63 % of the CPU on the transmitter and 64 % of the CPU on the receiver. By comparison, the SUNGEM driver only achieved 588 Mbps, and utilized 100 % of the CPU on the transmitter and 86 % of the CPU on the receiver. Thus, the SUNGEM driver had 11.3 % lower network performance while using 58.7 % more CPU on the transmitter (which was in fact totally CPU saturated).
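
The percentages in the paragraph above can be reproduced from the raw figures. The short check below shows that the 58.7 % figure corresponds to the transmitter-side CPU increase; the receiver-side increase, computed the same way, is smaller.

```python
# Figures copied from the comparison above.
gmac = {"mbps": 663, "tx_cpu": 63, "rx_cpu": 64}
sungem = {"mbps": 588, "tx_cpu": 100, "rx_cpu": 86}


def pct_change(old, new):
    """Percentage change from old to new (negative means a decrease)."""
    return (new - old) / old * 100


throughput = pct_change(gmac["mbps"], sungem["mbps"])
tx_cpu = pct_change(gmac["tx_cpu"], sungem["tx_cpu"])
rx_cpu = pct_change(gmac["rx_cpu"], sungem["rx_cpu"])

print(f"throughput:   {throughput:+.1f} %")  # -11.3 %
print(f"transmit CPU: {tx_cpu:+.1f} %")      # +58.7 %
print(f"receive CPU:  {rx_cpu:+.1f} %")      # +34.4 %
```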

I was actually somewhat disappointed even by the GMAC GigE performance. I was expecting to achieve nearly full GigE performance, and since there was still about 1/3 of the CPU available, the bottleneck was obviously elsewhere. Perhaps it is just a limitation of the actual Broadcom BCM5411 GigE chip built into the 867 MHz G4. I am hoping that this is in fact the case. I will be trying more tests later using a NetGear GA620T PCI NIC using the ACENIC driver to see if it has better performance. This NetGear NIC is also supposed to support jumbo frames (9K MTU), and I am very interested in determining the presumably significant performance benefits and/or reduced CPU usage associated with using jumbo frames. -Bill
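
For readers who want to try a comparable measurement, here is a minimal sketch of a memory-to-memory TCP throughput test using 64 KB buffers and a requested 768 KB socket buffer, as in the test described above. It is not the tool used for the reported numbers (the messages do not name one), it runs over loopback on a single machine, and it does not measure CPU usage. The function names are illustrative, and the operating system may cap the socket buffer size below the requested value.

```python
import socket
import threading
import time

BUFFER_BYTES = 64 * 1024     # 64 KB buffers, as in the test above
WINDOW_BYTES = 768 * 1024    # requested socket buffer ("window") size


def _receive(server, result):
    """Accept one connection and count bytes until the sender closes."""
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(BUFFER_BYTES)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total


def measure_loopback(seconds=1.0):
    """Send 64 KB buffers over loopback TCP and return Mb/s received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, WINDOW_BYTES)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.listen(1)

    result = {}
    receiver = threading.Thread(target=_receive, args=(server, result))
    receiver.start()

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WINDOW_BYTES)
    sender.connect(server.getsockname())

    payload = bytes(BUFFER_BYTES)   # zero-filled buffer held in memory
    start = time.monotonic()
    while time.monotonic() - start < seconds:
        sender.sendall(payload)
    sender.close()                  # signals end of stream to the receiver
    receiver.join()
    elapsed = time.monotonic() - start
    server.close()

    return result["bytes"] * 8 / elapsed / 1e6


if __name__ == "__main__":
    print(f"{measure_loopback():.0f} Mb/s over loopback")
```

To test between two machines, the receiver and sender halves would run on separate hosts with the sender connecting to the receiver's address. A Python loop adds its own CPU overhead, so the result should be read as a rough indication only.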


Exchange #5: Bill wrote
I'm in the process of working with BenH to do some additional testing of the SUNGEM driver. He felt, and tests I did confirmed, that the SUNGEM driver was generating an abnormally high number of interrupts. He's looking into a possible bug fix for this problem which may correct the performance problems with the SUNGEM driver (he said he would send me a new driver shortly for testing). I'll keep you all posted as to the (hopefully good) results. -Bill
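
One way to check a driver's interrupt rate on Linux is to take two snapshots of /proc/interrupts a known time apart and compare the counts for the interface. The sketch below shows the arithmetic. The snapshots in it are made-up sample values for illustration (including the IRQ number and controller name) and are not measurements from the systems discussed here.

```python
def irq_count(interrupts_text, device):
    """Sum the per-CPU interrupt counts for the line that names `device`."""
    lines = interrupts_text.strip().splitlines()
    cpus = len(lines[0].split())              # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        names = [f.rstrip(",") for f in fields[cpus + 1:]]
        if device in names:
            return sum(int(n) for n in fields[1:cpus + 1])
    raise ValueError(f"no interrupt line found for {device}")


def irq_rate(before, after, seconds, device="eth0"):
    """Interrupts per second between two snapshots taken `seconds` apart."""
    return (irq_count(after, device) - irq_count(before, device)) / seconds


# Made-up snapshots for illustration only; the IRQ number, controller
# name, and counts are NOT taken from the systems discussed here.
SNAPSHOT_T0 = """
           CPU0
 41:     100000   OpenPIC   Level     eth0
"""
SNAPSHOT_T1 = """
           CPU0
 41:     150000   OpenPIC   Level     eth0
"""

print(irq_rate(SNAPSHOT_T0, SNAPSHOT_T1, seconds=10))  # 5000.0
```

On a live system the two snapshots would come from reading /proc/interrupts before and after a test run, and the resulting rate can then be compared between drivers at similar throughput.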


Exchange #6: Bill wrote
I have a further update. I have been working with David Miller of Red Hat and Benjamin Herrenschmidt to optimize the performance of the SUNGEM driver. David supplied several performance patches, Ben incorporated them into his kernel tree together with some of his own patches, and I did the testing. I can now report great success, in that the SUNGEM driver now actually outperforms the GMAC driver by about 4 % on my two 867 MHz G4 systems (each with 640 MB of memory) with basically the same amount of CPU usage on the two systems. The new SUNGEM driver is available in Ben's latest 2.4.17-pre1-ben0 rsync kernel tree. Ben has also stated that he intends to mark the GMAC driver as obsolete and that it will no longer be available in 2.5 and later kernels.

It would be interesting to find out whether the new SUNGEM driver works on the JPL G4 systems. It might be a good idea to start a table on the GigE web page that was just created, listing which GigE systems are known to work and which ones don't, something like the following:

System             GigE Chip*  Kernel                  Driver    Status
------             ----------  ------                  ------    ------
Dual 500 MHz G4    BCM5400     2.4.6                   GMAC**    Works
                               2.4.17-pre1-ben0        SUNGEM    Works
     867 MHz G4    BCM5411     2.4.10-12a (YDL 2.1)    GMAC      Works
                               2.4.17-pre1-ben0***     SUNGEM    Works
     533 MHz G4    BCM5401     2.4.10-12a (YDL 2.1)    GMAC      Doesn't Work
                               2.4.17-pre1-ben0        SUNGEM    ???

* from "dmesg | grep eth"
** GMAC driver being phased out after 2.4 kernels
*** Actually, this is the 2.4.17-pre1-ben0 SUNGEM driver running on a 2.4.15-pre4-ben0 kernel, because of a current problem booting newer kernels on this system

Visit www.penguinppc.org/dev/kernel.shtml to download the latest kernels.



 
Copyright © 1999-2010 Fixstars Corporation. All rights reserved.