We have a very basic but large network. Each node consists of a CPX L32e and a Magelis connected to an unmanaged switch, with a wire running from that switch to an N-Tron managed switch. This is duplicated around 30 more times, with the N-Trons set up as a ring. The wire between the N-Tron and the unmanaged switch is very susceptible to pinching and breaking. The network is supervisory only; just the Magelis and the CPX need a connection to each other to operate.
The problem arises when the wire becomes pinched and shorts. Under the right conditions the wire can short and bring the whole network down. All CPXs and HMIs lose their CIP connections. I have been able to duplicate the issue on the bench and snoop it with Wireshark.
When the wire shorts, it essentially loops the Tx and Rx pairs (10/100) and creates a ghost device. Snooping between the PLC and HMI, I see the CIP connection fail when the PLC sends an ARP request asking who has “0.0.0.0”. Once this happens, the TCP connection will not reestablish until the faulted wire is disconnected. I even tried just the unmanaged switch by itself with the PLC and HMI, and the same thing happens. I will upload the captures tomorrow when I get into the office; for now I'm just looking to see if anyone has seen this issue.
What I believe is happening is that the ARP table is getting corrupted by the wire fault, but I’m not certain of this, nor do I know how to prove it. I do know that no matter where in the network I create a wire fault, eventually all CPXs will send an ARP request for 0.0.0.0, and at that point they stop communicating until the faulted wire is removed. Under normal operation, this never occurs. Also, packet count and traffic stay low and normal with the wire faulted, so no storm or loopback is occurring.
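For anyone who wants to check their own captures for the same symptom, here is a minimal Python sketch that flags ARP requests whose target address is 0.0.0.0. It assumes each frame is already available as raw, untagged Ethernet bytes (for example, exported from a capture); the function names `parse_arp`, `is_who_has_zero`, and `build_arp_request` are made up for illustration, and VLAN-tagged frames are not handled.

```python
import socket
import struct

ARP_ETHERTYPE = 0x0806
ARP_REQUEST = 1


def parse_arp(frame: bytes):
    """Return the ARP fields of an untagged Ethernet frame, or None if it is not IPv4 ARP."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ARP_ETHERTYPE:
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    sha, spa, _tha, tpa = struct.unpack("!6s4s6s4s", frame[22:42])
    return {
        "oper": oper,
        "sender_mac": sha.hex(":"),
        "sender_ip": socket.inet_ntoa(spa),
        "target_ip": socket.inet_ntoa(tpa),
    }


def is_who_has_zero(frame: bytes) -> bool:
    """True if the frame is an ARP request asking who has 0.0.0.0."""
    arp = parse_arp(frame)
    return arp is not None and arp["oper"] == ARP_REQUEST and arp["target_ip"] == "0.0.0.0"


def build_arp_request(src_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build a broadcast ARP request frame; used here only to exercise the checker."""
    eth = b"\xff" * 6 + src_mac + struct.pack("!H", ARP_ETHERTYPE)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_REQUEST)
    arp += src_mac + socket.inet_aton(sender_ip) + b"\x00" * 6 + socket.inet_aton(target_ip)
    return eth + arp


if __name__ == "__main__":
    # Hypothetical MAC and IP values, chosen only for the demonstration.
    mac = bytes.fromhex("001122334455")
    suspect = build_arp_request(mac, "192.168.1.10", "0.0.0.0")
    normal = build_arp_request(mac, "192.168.1.10", "192.168.1.20")
    print(is_who_has_zero(suspect), is_who_has_zero(normal))
```

Running every frame of a capture through `is_who_has_zero` and recording the sender MAC would show which devices issue the request and in what order after the fault is introduced. This only detects the symptom; it does not by itself prove that an ARP table was corrupted.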
Tomorrow I’ll have some diagrams and captures to help out.