I've been encountering a strange and annoying issue in my office at work. If anyone could shed any light on it or has any ideas as to what the cause could be, that would be great.
We have 30 Dell PowerEdge blade servers with two NICs each, and each server runs about 3-4 virtual machines at a time. Most of these 60 interfaces are connected to a Netgear GS748T(v3) 48 port managed switch. Last week a colleague complained that accessing his server was extremely slow. I noticed the NIC activity LED was flashing very rapidly, and restarting the interface didn't help. Moving it to a different port on the Netgear did resolve the issue, and moving a server from a "good" port to a "bad" port showed that the problem stayed with the port on the Netgear.
More people are complaining and, right enough, this issue is now affecting 7 or 8 ports at a time, seemingly at random. Hard or soft rebooting the Netgear switch shifts the problem to a different, apparently random set of ports but never resolves it.
Today, I port mirrored one of the "bad" ports to see if there was a lot of traffic on it, but there was nothing out of the ordinary - just regular network traffic, 90% of which was going to the Dell server rather than from it. I then isolated the port to a VLAN so that only my laptop and the Dell NIC were on it - the activity LED continued to blink as if it were transferring at 100Mbps, but Wireshark showed practically no traffic.
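For anyone wanting to put a number on the direction split from a mirrored capture, here is a minimal sketch in Python. It assumes the packets have already been exported as (source MAC, destination MAC, length in bytes) tuples; the function name and the MAC addresses are made up for illustration, and this is not necessarily how the 90% figure above was arrived at.

```python
def direction_share(packets, server_mac):
    """Return (inbound, outbound) byte fractions relative to server_mac.

    packets: iterable of (src_mac, dst_mac, length_in_bytes) tuples.
    Packets that neither come from nor go to server_mac are ignored.
    """
    server = server_mac.lower()
    inbound = 0
    outbound = 0
    for src, dst, length in packets:
        if dst.lower() == server:
            inbound += length
        elif src.lower() == server:
            outbound += length
    total = inbound + outbound
    if total == 0:
        # Nothing addressed to or from the server in this capture.
        return (0.0, 0.0)
    return (inbound / total, outbound / total)


# Hypothetical sample data, not taken from the real capture.
sample = [
    ("00:11:22:33:44:55", "AA:BB:CC:DD:EE:FF", 900),  # towards the server
    ("aa:bb:cc:dd:ee:ff", "00:11:22:33:44:55", 100),  # from the server
    ("00:11:22:33:44:55", "66:77:88:99:00:11", 500),  # unrelated traffic
]
inbound_share, outbound_share = direction_share(sample, "aa:bb:cc:dd:ee:ff")
```

A heavily inbound split on a port whose server is mostly idle might point at flooded or broadcast traffic rather than real sessions, though that is only a guess.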
I did a factory reset on the Netgear but it booted back up with 7 servers suffering the same issue (and the web interface not loading at all :( ). It's a bit of a mystery because our traffic isn't heavy at all, certainly well below the quoted maximum throughput of 40Gbps that Netgear state this model can cope with. I've also noticed for some time that some ports don't negotiate 1Gbps despite all the Dells' NICs supporting this speed.
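To list which interfaces have negotiated below 1Gbps from the server side, something like the following Python sketch could be used. It assumes the hosts run Linux, where the kernel exposes the negotiated speed in Mbps under /sys/class/net/<interface>/speed; the function names are illustrative only, and on other operating systems this path does not exist.

```python
import os


def read_link_speeds(base="/sys/class/net"):
    """Map each interface under `base` to its negotiated speed in Mbps.

    The value is None when the speed cannot be read or is unknown
    (for example when the link is down or the interface is virtual).
    """
    speeds = {}
    for iface in sorted(os.listdir(base)):
        path = os.path.join(base, iface, "speed")
        try:
            with open(path) as fh:
                value = int(fh.read().strip())
        except (OSError, ValueError):
            # No speed file, unreadable file, or non-numeric content.
            value = None
        # The kernel reports -1 when the speed is unknown.
        speeds[iface] = value if value is not None and value > 0 else None
    return speeds


def below_gigabit(speeds, expected=1000):
    """Return the names of interfaces whose known speed is under `expected` Mbps."""
    return [name for name, mbps in speeds.items()
            if mbps is not None and mbps < expected]
```

Calling `below_gigabit(read_link_speeds())` on each server would give the interfaces to compare against the switch's own port status, which could help show whether the slow negotiation follows the switch port or the NIC.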
It would be great to hear any suggestions, ideas or clues as to what the cause could be or ways to pinpoint it.
Thanks in advance! :)