Mitsubishi Modbus quits responding

oceanwanderlust · Jun 1, 2020

I have about a hundred Mitsubishi Q Series PLCs with Modbus/TCP cards which I talk to from a custom SCADA system.

Lately, occasionally, the Modbus card in ~30% of them will just lock up. The PLC continues to function and the Modbus card still answers ICMP ping, but when I try to open a TCP connection to the Modbus port, it just sends back a RST ACK, as shown in the attached Wireshark screenshot. Hard cycling the PLC power seems to clear up the issue.

Has anyone else experienced this? Is there anything else I can look at in the Mitsubishi Studio to find out why the Modbus card is not communicating? Could this be caused by a large amount of network traffic (mostly broadcast)?

thanks

joe

parky · Jun 1, 2020

This is what Wireshark says about it:
Spurious retransmissions are packets that are sent again even though their content has already been ACKed. That usually means that you're capturing on the receiving end of the connection, seeing data arrive twice, and most likely means that the acknowledge packets for those data packets have been lost on the way to the sender.

I doubt it is the PLCs; it sounds more like some problem on your network, but with about a hundred nodes it may be tricky to find. Is 237 the SCADA?

lfe · Jun 1, 2020

In some cases this happens because successive TCP connections are opened without closing the previous ones, until the limit is reached.

See if the card has a configurable timeout to close the unused connections.
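
To illustrate the point about unclosed connections, here is a minimal Python sketch of a single Modbus/TCP poll that always closes its socket, even when the read fails. It is not the SCADA's actual code; the function names, the unit ID of 1, and the register address 0 are assumptions chosen for illustration.

```python
import socket
import struct


def build_read_holding_request(transaction_id, unit_id, start_addr, quantity):
    """Build a Modbus/TCP 'Read Holding Registers' (function 0x03) request."""
    # PDU: function code, starting address, number of registers (big-endian).
    pdu = struct.pack(">BHH", 0x03, start_addr, quantity)
    # MBAP header: transaction id, protocol id (always 0 for Modbus),
    # length of the remaining bytes (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def poll_once(host, port=502, timeout=3.0):
    """Open a connection, send one request, and always close the socket."""
    # Hypothetical request: transaction 1, unit 1, one register at address 0.
    request = build_read_holding_request(1, 1, 0, 1)
    # The 'with' block closes the socket on normal exit and on any exception,
    # so a timed-out poll does not leave a connection open on the client side.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        return sock.recv(260)  # 260 bytes is the maximum Modbus/TCP frame size
```

Calling `poll_once("172.17.233.236")` would send one request to the PLC address mentioned in this thread. Note that a clean close on the client only helps if the closing packets actually reach the card.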

oceanwanderlust · Jun 1, 2020

172.17.233.5 is the SCADA
172.17.233.236 is the Q-series PLC

The network capture is done from a Meraki managed switch. There is a direct connection from this capture switch to the PLC, but a small unknown topology from the capture switch to the SCADA.

parky · Jun 1, 2020

I am inclined to agree about open sessions. There are many options, but I have never used Modbus on Mitsubishi; I consider it an old (although well used) protocol.

parky · Jun 1, 2020

Are you implying that only one PLC has "locked" comms at that time? You state that about 30% of a hundred are causing problems: is it just one that seems to lock up each time, rather than many at the same time?

Ken Roach · Jun 1, 2020

Thanks for the IP address clarifications and the location of the mirror/tap.

What OS is the SCADA computer running ?

I see an ordinary 4 seconds between TCP connection attempts (SYN) which are properly rejected (RST,ACK) but about 0.5 seconds between SYN packets that are "retransmitted".

My guess is that a lot of packets are being lost between the PLCs and the SCADA computer, and when there's an ordinary timeout the SCADA computer can't "tear down" the TCP connection and the PLCs end up with an abandoned TCP connection on Port 502. They're rejecting additional connections from that SCADA computer.

It would be interesting to see if a different host (a diagnostic PC or engineering workstation) can make a TCP connection to Port 502 during the time when the PLCs are rejecting connections from the SCADA computer.

Are there any wireless or "redundant" links in this network ? Obviously, the classic question of "what changed ?" is critical.

If there are no filters on this Wireshark traffic from the PLC's port, then it doesn't look like the PLC is being exposed to large amounts of broadcast traffic, but maybe the switches or routing equipment is.

A Wireshark listening port at the "master" side is the next diagnostic step I would take.
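
The test from a different host can be done with a few lines of Python. This is a minimal sketch, assuming only the standard library; the function name `probe_tcp` and the result strings are made up for illustration. It distinguishes a connection that is actively refused (the RST ACK case) from one that simply gets no answer.

```python
import socket


def probe_tcp(host, port=502, timeout=3.0):
    """Try one TCP connection and classify the outcome as a short string."""
    try:
        # Completing the three-way handshake is all we need; close right away.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered the SYN with a RST: the port is actively rejecting.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (packet loss, filtering, etc.).
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return f"error: {exc}"
```

Running `probe_tcp("172.17.233.236")` from both the SCADA server and a laptop on another switch port would show whether the rejection depends on the source host.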

oceanwanderlust · Jun 1, 2020

Approximately 30 of my approximately 100 Modbus PLC cards are currently locked up in this state.

The SCADA OS is Windows Server 2003.

There is a MAC filter on my capture which is removing ~120 broadcast packets per second.

The first time this issue happened last year, there was a physical routing loop and broadcast packet storm. I'm not sure what's different now since shutdown; the traffic is not quiet but not egregious.

I have another simpler client server application in this same factory and have been able to clearly prove that random occasional packets are lost between the machines and floor. However, I'd expect the PLC to recover, not lock up.

When the PLC is seemingly locked up, I have tried using 2 other Modbus implementations (Qmod and Schneider Electric) from my laptop instead of the server, and still get the same RST ACK response.

I just tried disabling the switch port for one hour. I had hoped any TCP ports would be recycled after 10 min keep alive and/or 10 sec timeout. However when I re-enabled the switch port, the PLC immediately started responding with RST ACKs again.

I can do a capture at the server and switch in the morning, but I have not had a lot of luck following the actual TCP stream myself.

Thank you all for being a sounding board and contributing ideas...

joe

oceanwanderlust · Jun 1, 2020

The top screen shot is a capture from my server and the bottom is from the network switch (which is directly connected to the PLC). Time is apparently not synchronized, but the visual capture windows should line up.

I don't know why the server sends the same SYN again after it clearly receives the RST ACK.

I'm wary of the checksum error, but that's supposedly normal if its calculation is offloaded.

Ken Roach · Jun 2, 2020

If you can, ZIP and attach (or post to another file storage site) the Wireshark captures themselves. It will make it a little easier to see the details.

On the SCADA client side, I see the computer trying three times to make a TCP connection to Port 502, then waiting four seconds to try again. It's denied every time, which is consistent with your characterization of the PLC being "locked up" to Modbus/TCP.

You can tell these sequences of events apart not just by the timestamps, but by the "ephemeral source port" number: 56886, then 56946, then 57011.
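
Grouping by ephemeral source port can also be done outside Wireshark once the SYN packets are exported. The sketch below assumes a list of (timestamp, source port) pairs; the three port numbers are the ones from the capture, but the timestamps are invented for illustration and are not taken from it.

```python
from collections import defaultdict

# Hypothetical export of SYN packets as (seconds, source_port).
# Ports are from the capture; the timestamps are made up.
SAMPLE_SYNS = [
    (0.0, 56886), (0.5, 56886), (1.0, 56886),
    (5.0, 56946), (5.5, 56946), (6.0, 56946),
    (10.0, 57011), (10.5, 57011), (11.0, 57011),
]


def group_syn_attempts(packets):
    """Group SYN timestamps by source port, one entry per connection attempt."""
    attempts = defaultdict(list)
    for timestamp, src_port in packets:
        # SYNs sharing a source port are retries of the same connection attempt;
        # a new source port means the client started a fresh attempt.
        attempts[src_port].append(timestamp)
    return dict(attempts)
```

With this sample, `group_syn_attempts(SAMPLE_SYNS)` yields three attempts of three SYNs each, which matches the "three tries, then wait" pattern described above.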

I can't explain why the SYN packets are normal on the SCADA side, but marked as "spurious" on the PLC side. If these are running on different versions of Wireshark the default settings and parsers might be different.

One thing that's notable is that the SCADA computer's TCP Window size is large; 65535. But the Mitsubishi PLC is sending its RST/ACK with a TCP Window size of zero. That generally indicates that the PLC's TCP receive buffer is full and it wants the other device to wait. But the other device is "impatient" and tries again quickly.

This doesn't explain why the PLCs are "locking up", or how to get them out of that state without rebooting them entirely.

I think you're going to have to set up Wireshark sessions (on both ends if you can !) and reboot one of these PLCs.

And, if you get a chance, disconnect one of the PLCs that is working correctly, and watch how the SCADA computer tries to reconnect to it.

It would also be interesting to set up a "honeypot" that has a Modbus/TCP server running on it at an unused IP address, and see if anything on the network tries to connect to it. What you're seeing might be a simple denial-of-service attack against Port 502.
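
A very small version of that honeypot idea can be sketched in Python. This is only a sketch under stated assumptions: it does not speak Modbus at all, it only accepts TCP connections and records who connected. The function names are hypothetical, and binding to port 502 normally needs administrator rights.

```python
import socket
import time


def make_listener(bind_addr="0.0.0.0", port=502, backlog=16):
    """Create a listening TCP socket on the given address and port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((bind_addr, port))
    srv.listen(backlog)
    return srv


def log_attempts(srv, max_events=None, on_event=print):
    """Accept connections, record (time, peer_ip, peer_port), and hang up."""
    events = []
    while max_events is None or len(events) < max_events:
        conn, (peer_ip, peer_port) = srv.accept()
        conn.close()  # we only want to know who is knocking, not to talk
        event = (time.time(), peer_ip, peer_port)
        events.append(event)
        on_event(event)
    return events
```

Running `log_attempts(make_listener())` on a machine at an unused IP address would print one line per connection attempt until interrupted. Any source other than the SCADA server showing up there would be worth investigating.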

oceanwanderlust · Jun 2, 2020

Ken Roach said:
One thing that's notable is that the SCADA computer's TCP Window size is large; 65535. But the Mitsubishi PLC is sending its RST/ACK with a TCP Window size of zero. That generally indicates that the PLC's TCP receive buffer is full and it wants the other device to wait. But the other device is "impatient" and tries again quickly.

I'm puzzled, because if the receive buffer is full or it is out of connections, disabling its network link for an hour *should* have fixed it.

BTW, I did netstat the SCADA yesterday and there were not any extra connections open to the PLC.

Ken Roach · Jun 2, 2020

I agree that if this was an ordinary buffer overflow that disconnecting the PLC would allow it to reset. But it could be a condition that "crashed" the Modbus/TCP Server feature.

You've verified the Melsec Q series module is responding to ICMP PING, and that it's rejecting connections from your SCADA computer on TCP Port 502.

Have you tried to connect to it with other tools and ports ? I don't know if these modules have a diagnostic webserver.

What about GX Developer ? I think it uses UDP port 5000.

I've seen some embedded devices "crash" when exposed to broadcast or multicast traffic, though a few hundred packets/second shouldn't be too great a challenge. Can your switch filter out that traffic ?

oceanwanderlust · Jun 14, 2020

Here are the screenshots of the error log.

7461 - TCP forcibly disconnected
730A - Default parameter mismatch
73DA - TCP keep alive

I'm not sure which one to focus on, or which are cause or effect.

joe

parky · Jun 15, 2020

I suggest you read section 6, page 12 of the MODBUS/TCP Interface Module User's Manual, here:
https://eu3a.mitsubishielectric.com/fa/en/dl/1694/158847.pdf

It talks about user settings versus defaults.

oceanwanderlust · Jun 16, 2020

Severity of Default parameter mismatch?

What are the chances that "730A - Default parameter mismatch" is more than a benign warning?

We don't do anything fancy with these Modbus cards (just polled TCP slaves), so any custom settings in the PLC are probably there by accident.

I don't have access to the PLC, so I have to convince someone else to fix it, and they are already a little short with me.

joe
