modbus TCP error invalid data received by M340 io scanning

V0N_hydro · Jul 17, 2014

I have two PLCs connected by gigabit Ethernet over fibre. PLC0 has an NOE0100 Ethernet module IO scanning PLC4 with a 100 ms repetitive rate. Both are BMX P34 2020 CPUs programmed with Unity 6.0.

PLC4 detects conditions which are then latched in PLC4. The outputs of the latches are packed into a word using BIT_TO_WORD, the PLC0 IO scanner reads the word, and in PLC0 the bits are written back to BOOLs using WORD_TO_BIT so PLC0 can take the appropriate action based on which bits are true.
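For readers who don't use Unity, the pack/unpack round trip described above can be sketched in Python. This is only an illustration, not the PLC code: the function names are made up, and it assumes bit 0 maps to the least significant bit of the word.

```python
def bits_to_word(bits):
    """Pack up to 16 booleans into one 16-bit word (bits[0] is the LSB, by assumption)."""
    if len(bits) > 16:
        raise ValueError("a Modbus register holds at most 16 bits")
    word = 0
    for position, bit in enumerate(bits):
        if bit:
            word |= 1 << position
    return word


def word_to_bits(word):
    """Unpack a 16-bit word into a list of 16 booleans (index 0 is the LSB)."""
    return [bool((word >> position) & 1) for position in range(16)]


# Round trip: what PLC4 packs is what PLC0 should unpack.
latches = [False] * 16
latches[3] = True
received = word_to_bits(bits_to_word(latches))
print(received == latches)

# A single flipped bit anywhere along the way shows up as exactly one
# spurious BOOL on the receiving side.
corrupted = word_to_bits(bits_to_word(latches) ^ (1 << 7))
print([i for i, bit in enumerate(corrupted) if bit])
```

The last two lines show why a one-bit error in the register would look exactly like a latch that was never set in PLC4.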

I have been treating this process as if it were a wire - what goes in one end comes out the other - but I have found that bits have been set in PLC0 without being latched in PLC4. Nothing writes to the bits in PLC0 except the WORD_TO_BIT block.

I have verified that bits have been received in PLC0 without being sent from PLC4 by creating a latch for each bit in PLC0: if a PLC0 latch is ever set, then the corresponding latch in PLC4 should have been set as well. I have found latches set in PLC0 without being set in PLC4.

I know this error has happened at least twice in the past 2 months as it has caused nuisance shutdowns.

As a temporary remedy I have added a delay before taking any action based on the received bits in PLC0, so that a bit set for only a single 100 ms IO scan cycle doesn't kick off an undesired sequence of events.
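The delay described here behaves like an on-delay (debounce) filter. A minimal Python sketch of the idea follows; the class name and the three-scan threshold are assumptions for illustration, and in the PLC this would more likely be a standard on-delay timer block on each received bit.

```python
class OnDelayFilter:
    """Output is true only after the input has been true for
    `scans_required` consecutive scans (hypothetical helper)."""

    def __init__(self, scans_required):
        self.scans_required = scans_required
        self.count = 0

    def update(self, value):
        # Any false sample resets the count, so a one-scan glitch never passes.
        if value:
            self.count = min(self.count + 1, self.scans_required)
        else:
            self.count = 0
        return self.count >= self.scans_required


# A bit that is true for a single scan is rejected...
glitch_filter = OnDelayFilter(scans_required=3)
print([glitch_filter.update(v) for v in [False, True, False, False]])

# ...while a bit that stays true is accepted after three scans.
steady_filter = OnDelayFilter(scans_required=3)
print([steady_filter.update(v) for v in [True, True, True, True]])
```

The trade-off is added latency: with a 100 ms scan and a three-scan threshold, a genuine trip is acted on roughly 300 ms later.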

In the past three months the IO scanner has reported "no comms" status only a few times, for less than a second each time, and these usually coincide with building online changes to the PLC program, so it isn't an intermittent Ethernet connection.

Has anyone ever experienced invalid or erroneous data being received on Modbus TCP? The only time I have seen anything like that was on multi-drop Modbus serial with multiple masters (I know this is a terrible design, and they moved away from it eventually), where messages can sometimes get mixed up and a master receives a response to the wrong request.

I suppose any Modbus master on the Ethernet network could also connect to PLC0 and set the bit, either by issuing a write coil command or by writing to the word itself with a write holding register command.

Thanks for any ideas on similar problems encountered, or for further troubleshooting to confirm how these bits are getting set.

danw · Jul 17, 2014

No, I don't run Modbus TCP at that rate or volume and I don't claim to be a statistics guy, so I just went and Googled "probability of undetected bit error in ethernet packet".

In the page at http://noahdavids.org/self_published/CRC_and_checksum.html
there's a statement, "In Performance of Checksums and CRCs over Real Data Stone and Partridge estimated that between 1 in 16 million and 1 in 10 billion TCP segments will have corrupt data and a correct TCP checksum."

I don't know whether the guy's right or not, but at the 10 writes/second rate that your traffic is running, it only takes 18.5 days to run up 16 million packets.

10 Hz writes = 600/min * 60 min/hour * 24 hours/day = 864k writes/day
At that rate, Ethernet traffic reaches 16 million packets in 18.5 days.
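The arithmetic above can be reproduced with a few lines of Python. This is a rough sketch under the same simplifying assumption as the post, namely that each 100 ms scan counts as one TCP segment; the function names are illustrative, and the two bounds are the quoted "1 in 16 million" and "1 in 10 billion" estimates.

```python
MESSAGES_PER_SECOND = 10                      # 100 ms repetitive rate
SECONDS_PER_DAY = 60 * 60 * 24
MESSAGES_PER_DAY = MESSAGES_PER_SECOND * SECONDS_PER_DAY


def days_to_accumulate(n_messages):
    """Days of traffic needed to send n_messages at the assumed rate."""
    return n_messages / MESSAGES_PER_DAY


def expected_errors(days, one_in_n):
    """Expected undetected errors in `days` if 1 in `one_in_n` segments is bad."""
    return days * MESSAGES_PER_DAY / one_in_n


print(MESSAGES_PER_DAY)                       # messages per day
print(days_to_accumulate(16_000_000))         # pessimistic bound, in days
print(days_to_accumulate(10_000_000_000))     # optimistic bound, in days
print(expected_errors(60, 16_000_000))        # expected events in a 60-day window
```

At the pessimistic bound this gives a little over three expected events in 60 days, while at the optimistic bound one event would take decades, so the quoted range spans both "plausible" and "practically never" for this traffic level.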

"Twice in the past 2 months", 60 days, is less often than once in 18.5 days.

robertkjonesjr · Jul 18, 2014

What is the health timeout on the scanner, PLC0? Do you manage the health bits in any way, especially with regard to fallback? Are you reading and writing, or just writing?

What is the scan time of the server, specifically the MAST task? The RepRate is valid only if the server is fast enough: IOScanner issues a query, waits for the response, and issues the next query 100 ms (the RepRate) later only if a response came back, so actual performance is tied to server behavior too. If you were scanning a hot standby Schneider PLC, the scan time would likely be over 100 ms, but it is likely less with a BMX, as it does not support hot standby features.

When you have one of the general down events you describe, do you know how long it lasts? I used to implement a small array of debug info for IOScanner with a number of rows, each populated with the date/time of an event and its duration. The trigger for a new entry was the health bit dropping, so I would know when and for how long we were down. The length of the outage can sometimes (but not always) help lead to a root cause, and the timestamp can help tie the issue to external events, or index into a packet trace to find the issue if it is not readily apparent.
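As an illustration of the logging idea, here is a minimal Python sketch of such a table. The class name, the 16-row size, and the timestamps are assumptions; on the PLC this would be an array of structures filled in by the scan logic.

```python
from collections import deque
from datetime import datetime, timedelta


class HealthDropLog:
    """Fixed-size log of (start time, duration) for each health-bit dropout."""

    def __init__(self, rows=16):
        self.events = deque(maxlen=rows)   # oldest rows are overwritten when full
        self._down_since = None

    def update(self, healthy, now):
        """Call once per scan with the current health bit and timestamp."""
        if not healthy and self._down_since is None:
            self._down_since = now                      # falling edge: outage starts
        elif healthy and self._down_since is not None:
            self.events.append((self._down_since, now - self._down_since))
            self._down_since = None                     # rising edge: outage ends


# Simulated scans: healthy, then down for 200 ms, then healthy again.
log = HealthDropLog()
start = datetime(2014, 7, 18, 12, 0, 0)
for offset_ms, healthy in [(0, True), (100, False), (200, False), (300, True)]:
    log.update(healthy, start + timedelta(milliseconds=offset_ms))

for began, duration in log.events:
    print(began, duration)
```

Recording the start time as well as the duration is what lets an outage be matched against shutdown records or a packet capture afterwards.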

I did observe a bug in several Schneider devices where wrong Modbus data would be returned over Modbus TCP when acting as a server. However, it was very rare and required a very specific set of conditions to trigger, and IOScanner cannot trigger this particular issue. That was some time ago and it is likely fixed anyway.

V0N_hydro · Jul 18, 2014

danw, interesting, thanks for pulling that up. What I take away from that is that any time I program an action that results from inputs received over communications, I should put a time delay on the action side of the link so that transient values don't trigger the action.

robertkjonesjr,
I assume that for Modbus the client is the master and the server is the slave, so PLC4 is the server and PLC0 with the IO scanner is the client.

For PLC4 in the PLC0 IO scanner, the repetitive rate is 100 ms, the health timeout is 500 ms, the read is 98 registers and the write is 100 registers.

The scan time of the MAST task on PLC0 is min/current/max 9(4)/11(5)/20(15); the help file says the number in brackets is the overhead time. The CPU Ethernet port is handling 2-5 messages per second. The NOE0100 module is handling min/avg/max 448/457/484 messages per second and has 9 open connections.

The PLC4 (the Modbus slave) MAST scan time is min/current/max 11(2)/13(5)/22(13). Its Ethernet port has 6 open connections, and min/avg/max messages per second is 166/167/183.

I have set up a block in PLC0 to record the date/time and duration of each period where the health bit is false. Outages usually correspond with building online changes to the program. For PLC4 there are 2 outages recorded, one of 84 ms and another of 123 ms, and the dates don't correspond with any nuisance shutdowns. So the health bit has only dropped out twice in the last 2 months.

One thing I have noted with the health bit outage recording block is that one PLC is in the middle of a "ring", so data must traverse 3 Ethernet switches to reach it. It has far more health bit dropouts than the other 3 PLCs, which only go through 2 switches. I am using a star topology going forward.
