Catalyst 3750 – Are they really that bad?

Over the past few years I have seen many comments in the blogosphere questioning the reliability of the Catalyst 3750 platform. I believe Ethan Banks (@ecbanks) has posted on this in the past, and more recently so has Aaron Conaway (@aconaway). I thought I’d share my experience of managing roughly forty 3750 switch stacks, each made up of 2-5 switches, over the past 4-5 years without a single stack failure in production (we did have one stack member die from a lightning strike, and the rest of the stack continued to function).

I should qualify that all of these stacks are in relatively static physical installations, but many of them have been subjected to temperatures ranging from 22°C to 30°C (72°F to 86°F for the American folk). They weren’t configured with any advanced features, only routing, switching and access lists, and they weren’t running bleeding-edge IOS (mostly c3750-i5-mz.122-20.SE). Most of these stacks were at the edge, delivering Ethernet services in a residential environment, rather than in the core or data center.

I will also add that I think both Ethan and Aaron are awesome engineers whom I respect greatly, and I’m an avid reader of their blogs; my experience with the 3750 just hasn’t matched what they have described. Perhaps some of my anally retentive deployment practices have minimised the occurrence of failure.

PUT THE STACK CABLES IN PROPERLY

The stack cable is heavy and has a relatively shallow connector, so install it the same way you would tighten the wheel nuts when changing a tire. I seat the connector and hand-screw the cable in until it is firm, then use a flat-head screwdriver to tighten each side a few turns at a time until the cable is fully screwed in.

When provisioning a new stack I always screw all the stack cables in first and then power on the switches.
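Once the switches are powered on, it is worth confirming that the stack has formed cleanly before putting it into service. A minimal check, assuming your IOS release supports these standard show commands (confirm in the command reference for your version), is:

```
! List every member with its role, priority and state; all members should show Ready
Switch#show switch
! Show the status of both stack ports on each member
Switch#show switch stack-ports
```

A stack port that reports as down on a cable you have connected is a good hint that the connector is not fully seated.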

SYNCHRONISE YOUR IOS

Before adding a new member to the stack, use the archive command to install the same IOS image that the rest of the stack is running:

Switch#archive download-sw tftp://x.x.x.x/c3750-i5-mz.122-20.SE.tar
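A slightly fuller sketch of this step, run on the new switch while it is still standalone, is below. It assumes the switch can reach the same TFTP server (x.x.x.x) and that you want the image in flash replaced and the switch reloaded onto the new image; /overwrite and /reload are archive download-sw options, but confirm them in the command reference for your release.

```
! Replace the existing image in flash and reload once the download succeeds
Switch#archive download-sw /overwrite /reload tftp://x.x.x.x/c3750-i5-mz.122-20.SE.tar
! After the reload, check that the image matches what the stack is running
Switch#show version
```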

PROVISION NEW SWITCHES BEFOREHAND

Provision new switches on the stack before physically installing them. I also configure each new switch with the appropriate stack member number:

Switch(config)#switch x renumber y
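As a sketch of both steps, assume the new switch will become member 3 and is a WS-C3750-48TS (both are illustrative; substitute your own member number and model). The provisioning is done on the existing stack and the renumbering on the new switch before it is cabled in; a standalone 3750 is member 1 by default, and the new number only takes effect after that switch reloads.

```
! On the existing stack: pre-provision member 3 (example model string)
Switch(config)#switch 3 provision ws-c3750-48ts
! On the new, still-standalone switch: change its member number from 1 to 3
Switch(config)#switch 1 renumber 3
Switch(config)#end
! Reload so the new member number takes effect
Switch#reload
```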

RIG THE ELECTION

Use the switch x priority command to rig the election on all switches. Priorities range from 1 to 15, and the highest priority wins the stack master election.

I usually want switch 1 to be the master, so I give it a priority of 15 and lower that number for each subsequent switch.

Switch(config)#switch 1 priority 15
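For a three-member stack, following that pattern might look like the sketch below (member numbers and values are illustrative). Note that changing priorities does not force a new election; the current master stays in place, and the new values only matter the next time a master is elected, for example after the stack reloads or the master fails.

```
! Switch 1 should win the election, switch 2 is next in line, and so on
Switch(config)#switch 1 priority 15
Switch(config)#switch 2 priority 14
Switch(config)#switch 3 priority 13
Switch(config)#end
! The Priority column should reflect the new values
Switch#show switch
```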

SYNCHRONISE CONFIG

Manually load the configuration onto new stack members before adding them to the stack. I know you don’t technically have to do this; I just like removing every chance of error.
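One way to do this is sketched below. It assumes you save the stack's configuration to the same TFTP server used for the IOS image; the file name stack-confg is hypothetical. The configuration is loaded as the new switch's startup configuration while it is still standalone, before it is powered off, cabled in and powered back on.

```
! On the existing stack: save the current configuration to the TFTP server
Switch#copy running-config tftp://x.x.x.x/stack-confg
! On the new, still-standalone switch: load that file as its startup configuration
Switch#copy tftp://x.x.x.x/stack-confg startup-config
```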

FINAL VERDICT

I think the 3750 is a great switch, and in my experience it has been a rock-solid performer. I would recommend it at the edge of anyone’s network; the ability to grow capacity as needed and the distributed fault tolerance are big pluses. Would I use them in a data center? Probably not. I would also suggest that if you need a stack of 4-5 switches from day one, a 4500 would probably be a better option in terms of cost versus features and performance.

11 thoughts on “Catalyst 3750 – Are they really that bad?”

  1. Pingback: Aaron's Worthless Words » Blog Archive » Catalyst 3750s – Bad Luck with a Cisco Logo

  2. I agree that, generally speaking, 3750 stacks are fine. I’ve also heard that most of the issues we ran into with the stackwise ports on our 3750s (manufactured in early 2006), which caused flapping stack adjacencies, were worked out in later production runs.

    My big remaining complaint is that it’s usually impossible to upgrade the IOS on stack members without ultimately having to reload the entire stack to get it back in order. The stackwise tolerance for revision mismatches is ridiculously low.

    In my world, these stacks were serving as the access layer for high-uptime financial services data centers…the inability to upgrade the stack without disrupting access to the connected hosts led to a lot of stupid workarounds for bugs, which management opted for instead of scheduling downtime to fix the issues properly. That’s what was best for the business, so that’s what we did.

    Once you get a bad taste in your mouth for a certain technology, it’s hard to rinse it out. I can’t say I’d never deploy 3750s again (I just built a 2-switch stack for a co-worker managing a QA/Dev environment yesterday, in fact), but I would be careful before deploying them in certain environments; it would all depend on the data center.

  3. Ethan,

    Thanks for taking the time to comment.

    Completely agree with you on different horses for different courses.

    Are you meant to be able to upgrade IOS on 3750 stacks without reloading them? I’ve never tried this.

    I have never had issues joining new stack members to production stacks though.

    Thanks

    Fletch

  4. In a stack, if you upgrade the IOS on one stack member, then reboot that one stack member to load the new image, most of the time that upgraded stack member cannot rejoin the stack because of a stackwise version mismatch. So the only way to get the whole stack upgraded is to get the new IOS loaded on all of the stack members, then reload the stack completely.

    There was some tolerance for very minor stackwise version differences as I recall, but if you were going from say 12.2(25) to something far away like 12.2(54), forget it. Maybe that’s gotten better lately (I haven’t been doing much upgrading of 3750 stacks over the last year), but my bet is it’s still a concern.

    This is a problem if you’re in a high-uptime environment where you spread your physical uplinks from a host (maybe with LACP or network teaming) across multiple stack members with the expectation that you’d never have all the stack members down at the same time. That’s an effective strategy for power and other sorts of failures, but then blows up when an IOS upgrade becomes necessary. You actually need to spread your host uplinks across multiple *stacks* to maximize your host uptime, and I don’t know too many environments that are going to run multiple 3750 stacks for top of rack or even end of row…too expensive, and probably too much port density. Makes better sense just to have 2 physically separate switches in that specific use case…and that’s what you call a 3560. 😉

    I never have had issues joining new stack members to production stacks either, assuming the stack member IOS versions were compatible.

  5. Ethan,

    Have you got any documents claiming that upgrading say the master switch and then reloading that individual switch in the stack is supported?

    While I’ll admit it would be a cool feature to have, everything I have ever read about Stackwise indicates that you have to treat the stack as a single entity and all IOS versions must be kept in an identical state. It is a bit rough to criticise a product for a feature it doesn’t claim to have. How I wish my 3750 could toast bread also 😉

    I think we are just coming at this from different angles. I think 3750’s make a great Access/Distribution layer switch outside of the DC. I probably wouldn’t use one in a server edge or high-uptime environment for the exact reasons you have described.

    Do 3560s support multi-chassis link aggregation? You would be OK, though, if you were just using NIC teaming with some kind of beacon.

    Fletch

    I don’t know offhand whether Cisco documentation claims you can upgrade one stack member at a time, although at one time there was a matrix listing which IOS and stackwise versions were interoperable within a stack. I do know that Cisco sales told the guys buying the gear that you could definitely upgrade one at a time; it was a critical design element of the data center. When I was hired to lead the route/switch deployment in the data center build, sometime after the equipment purchase, I was tasked with developing a process to make hitless IOS upgrades (assuming multiple physical server uplinks) happen within the stack…only to discover that Cisco sales had overpromised. (Surprise!) When it was sold back in the day, I believe the solution was billed as a Cisco reference architecture, although I have to guess it was never any such thing. Sales guys sometimes say what they have to say to close a deal, which is too bad.

    I do not believe 3560s support MEC (although maybe; I haven’t read the release notes of late). You’d be relying on something other than LACP for a dual-switch top-of-rack situation there (network teaming, like you said), or moving to Nexus gear if MEC with LACP was a critical design element. Or doing something altogether different to provide application redundancy, such as using hardware load balancers with pool members uplinked to different physical switches to maintain application availability during network maintenance windows or outages. Or swinging application traffic to a different data center…often easier in concept than in actuality.

    The 3750s are what they are. In the beginning, Cisco didn’t get stackwise right, period. Historically, I think it’s reasonable to say that Cisco *rarely* gets new paradigms 100% right out of the gate. VSS users have had dramatically mixed experiences. NX-OS deployments have run into challenges. In 2010, stackwise seems fully baked from all reports I hear, but it just so happens that back in 2006 I got burned by a bad run of hardware potentially impacting hundreds of switches (in reality far fewer, but still painful, requiring a team to forklift out a ton of 3750s), a difficult upgrade process, and data-center-impacting problems with the few that did break. I’ve spent far more time than I care to think about working around 3750 stack issues.

    I’m not really arguing that no one should ever deploy 3750 stacks…more just stating my experience since your article contends that maybe Aaron and I were underconfiguring the stacks: a very polite way of saying that maybe our 3750 failures were because we didn’t know what we were doing. Not unfair to think that, since I think the context of my 3750 comments was a podcast rant where there wasn’t time to develop in detail. Or maybe I blogged about it?

    Every technology, protocol, piece of hardware, and configuration detail adds or reduces an element of risk. Building a network infrastructure for high availability is a balancing act. During the sales process, the promise of stackwise was reduced management headaches and increased resiliency. In my experience, fail. I readily acknowledge that most other people haven’t had that experience. But I’ve found that I’ve often had to design a network to survive when technology does not work as advertised…to answer the question, “So what do we do when it doesn’t work like it’s supposed to?”

    I’m getting old and grumpy, I’m afraid. Have we beat this topic to death now? I think we have…

  7. Pingback: I Admit It: It’s Hard For Me To Forgive 3750 Stacks « PACKETattack

  8. We’re finding out ourselves, along with the general IT community, that there are upgrade issues when 3750 switches are stacked. The switch will boot after a software upgrade, but not with the new image, because the wrong command was entered. Examples of these wrong commands are “/leave-old-sw” and “archive download-sw /leave-old-sw”, which will lead the switches to boot with the old software. There is a simple solution for upgrading the software on all stacked WS-C3750s: the “archive copy-sw” command. For more information about this issue, please refer to our blog. http://www.ccnytech.com/blog/Cisco3750/

  9. We had a large number of failed 3750 switches where I work, so I investigated why. It appears that 3750 switches dating up to 2005 suffered from bad capacitors made in Taiwan (search for “capacitor plague”). Sometime in 2005, Cisco changed the motherboard design for the 3750 to one that does not have the two banks of capacitors.
    They do not always fully fail. One of the symptoms of these switches has been intermittent stack ports… as well as other odd behavior.

  10. Also, they have a limited lifetime warranty (EOS announcement plus 5 years) so Cisco is still replacing them.

  11. Hi, can you stack 24-port and 48-port 3750s together, assuming the IOS is the same?
