[Logix] Intermittent comms issue (ethernet/ip) between Control and Compact Logix PLCs

defcon.klaxon

Hi guys,

I'm having a comms failure but I can't figure out the root cause. I've narrowed down the issue a good bit, and I'm hoping someone with more experience with Control/CompactLogix PLCs might point me in the right direction.

I'm testing code, HMI connectivity, and radios here (basically a proof of concept) at my office. I have one ControlLogix PLC, and one CompactLogix PLC. The CompactLogix PLC is a remote site and the ControlLogix PLC is the local PLC that connects to the HMI. I'm using MSG instructions at the ControlLogix PLC to both send data to and receive data from the CompactLogix PLC.

I'm monitoring the connections at the ControlLogix PLC; there are several sites set up in the project (water wells), but I only have one CompactLogix PLC as a spare so I should have one good connection, and two bad. And this is true; I'm testing the connections with a GSV instruction getting EntryStatus from each site, then masking to isolate the connection status bits. The masked value is 4 if the connection is good, 3 if it's attempting to connect, and 1 if it's failed. I am also using a GSV to get FaultCode.
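A minimal sketch of the masking step described above, written in Python rather than ladder logic. It assumes the state code sits in bits 12-15 of the EntryStatus word (so the lower 12 bits are masked off), and it names only the three codes mentioned in the post. The helper name `decode_entry_status` is hypothetical.

```python
from typing import Tuple

# Assumption: the connection state lives in the upper four bits (12-15)
# of the raw EntryStatus word; the lower 12 bits are ignored.
STATE_MASK = 0xF000

# Only the codes mentioned in the post are named here.
STATE_NAMES = {4: "good", 3: "attempting to connect", 1: "failed"}


def decode_entry_status(raw: int) -> Tuple[int, str]:
    """Mask a raw EntryStatus value and return (state code, description)."""
    code = (raw & STATE_MASK) >> 12
    return code, STATE_NAMES.get(code, f"other ({code})")


print(decode_entry_status(0x4000))  # (4, 'good')
print(decode_entry_status(0x3123))  # (3, 'attempting to connect')
print(decode_entry_status(0x1000))  # (1, 'failed')
```

Because the low bits are discarded, a raw value can change without the masked state changing, which may be part of why the raw data looks busier than the masked data.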

Here's the weird thing: if I watch the EntryStatus data, I can see that the data is changing very quickly; the raw data (before masking) regularly indicates a lost and re-established connection, but it's so fast that the masked data never changes. Same with FaultCode; it's 0 most of the time but it'll very quickly flicker to 516, then back to 0. My MSG instructions get DN outputs, which is even more strange.
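One way to confirm a flicker that is too fast to watch is to latch it. The sketch below is only an illustration in Python: it assumes the FaultCode readings have been collected into a list once per scan, and the function name `latch_faults` is hypothetical. In the PLC itself the same idea would be a latch and a counter driven by the GSV result.

```python
def latch_faults(samples):
    """Return (last non-zero code seen, number of non-zero samples)."""
    last_code = 0
    nonzero_count = 0
    for code in samples:
        if code != 0:
            last_code = code       # hold the most recent fault code
            nonzero_count += 1     # count how often a fault was sampled
    return last_code, nonzero_count


# Example readings: mostly 0 with brief flickers to 516, as in the post.
readings = [0, 0, 516, 0, 0, 0, 516, 0]
print(latch_faults(readings))  # (516, 2)
```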

Originally I thought it was the radios, so I eliminated them from the system entirely, and plugged the PLCs both into my desk's switch. Comms issues persisted, so it's not a radio issue. I then thought maybe there was some odd reason why the switch might be the issue, so I plugged the CompactLogix straight into the ControlLogix, and monitored from my laptop through both PLCs' USB ports to eliminate any other ethernet issues; comms issues STILL persist! When I change a value in either the remote or local PLC, it takes MINUTES for the data to get to the other PLC, if it ever does. However, changing data in the HMI (before I went to no switch) instantly showed up on the local PLC, so the issue is definitely the comms between the PLCs.

At this point I have to imagine that there is a comms/scan/something setting somewhere that I'm not properly setting. When I first began this project I did a quick check of comms through these radios and it worked great, but I was only passing one piece of data at a time, and the rest of the code wasn't written. Now that I'm trying the code I'm going to deploy in a couple weeks, everything is broken. I did come across a document that said a 516 code means that the connection timed out after 4xRPI, but two things are odd about that: first, the RPI setting seems to be greyed out every time I see it, so I wouldn't even know how to adjust it; second, a timeout doesn't necessarily explain why the code flickers instead of being solid, like it's making connections sometimes, but not all the time.

Any thoughts on what I should look into for these comms between the two PLCs would be greatly appreciated. I'm not sure what else to look for, since the issue seems to be quite transient, but consistent. Thanks a million.
 
Hi guys,

Just wanted to provide a quick update for what I've found so far:

First off, the faultcode I'm getting is very rarely mentioned in decimal form (516) but is VERY common in hex form (16#0204). This is providing a lot more info. Sounds like it's a timeout error, which is defined as 4xRPI. The RPI for my ethernet/ip module is greyed out; I have found one person who said you have to delete the module and then create it anew, as that is the only time you can adjust the RPI. I'm hoping there's another way around that, but if I can't find anything I'm going to give it a shot. If anyone can confirm or dis-confirm what I've found so far I'd really appreciate hearing it. I'll keep this thread going as I figure things out regardless in case anyone stumbles across it.
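For anyone searching on the code, a short Python sketch of the decimal/hex conversion mentioned above. The `16#` prefix follows the notation used in the post; the helper names `to_logix_hex` and `from_logix_hex` are hypothetical.

```python
def to_logix_hex(value: int, digits: int = 4) -> str:
    """Format an integer in the 16#xxxx style used in the post."""
    return f"16#{value:0{digits}X}"


def from_logix_hex(text: str) -> int:
    """Parse a 16#xxxx string back into an integer."""
    prefix, _, hex_digits = text.partition("#")
    if prefix != "16":
        raise ValueError(f"expected a 16# prefix, got {text!r}")
    return int(hex_digits, 16)


print(to_logix_hex(516))          # 16#0204
print(from_logix_hex("16#0204"))  # 516
```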
 
I remember you posting early on about the differences between Produced/Consumed and MSG instructions, and that you were designing a radio system.

Are you using both Produced/Consumed Tags and MSG instructions ?

It sounds to me like you've got something crossed up and you're Setting something you should be Getting.

Do you have managed switches that will let you mirror a port and capture some traffic ?

I agree that I'd have to see the programs to understand what you're trying to do.
 
I assume you are using produce/consume tags in v18 or later.

I remember you posting early on about the differences between Produced/Consumed and MSG instructions, and that you were designing a radio system. Are you using both Produced/Consumed Tags and MSG instructions?

Hi guys, thanks for responding. Ken, you remember correctly; I was asking about this a while back. To answer you both, I was planning to use both produce/consume tags and MSG instructions: MSG for remote sites on radio, p/c between local panels. However, I haven't implemented p/c tags for this test, so I'm only doing MSG instructions. On a related note, we have a consultant who reviewed my code and has more experience than me, and he told me that his company (a pretty large one) avoids p/c altogether and solely uses MSG instructions. He didn't outright say MSG would be "better", but he did say he's never used p/c, so now I'm not sure which would be better. P/c tags seem a lot easier to implement, but I'm worried that his word will now carry weight about how we should avoid them, even though they seem pretty straightforward so long as you get your RPI times sufficiently adjusted.

Anyway, sorry to get off topic. That all to say, the code I'm testing right now is only MSG instructions.

Do you have managed switches that will let you mirror a port and capture some traffic?

That's a great question, and I'm not sure of the answer. I'll look into it.

I agree that I'd have to see the programs to understand what you're trying to do.

I'm currently attempting to upload my code, I'll post a link once I get it all sorted.

As far as how I've set up things, I followed/copied the code from 1756-PM012F-EN-P, the Logix5000 Controllers Messages Manual so in theory, it should be pretty straightforward but there's nothing like getting your eyes on actual code. I'll link ASAP.

Thanks!
 
I have a system running 8 L32Es and 1 5555 ControlLogix. All L32Es are connected via Prosoft RadioLinx. For the first 2 years I ran P/C tags and we had HORRIBLE luck with comms. I switched to CIP reads and writes via MSG instructions and we've had almost no issues for the last 4 years. BUT....

After a recent radio upgrade I had 2 systems that the CLX was having trouble MSG'ing (all MSG instructions are done in the 5555). About 60-70% of the time the MSG instructions were in ER state for those 2 systems. I replaced radios, antennas, switches and cables at the L32E side and did everything I could think of to solve the issue. In the end I discovered that the firmware on those two units was an older rev than the other 6 units. I updated the firmware and the problem vanished.

I haven't had time to research the WHYs of that situation, but I have a pretty good guess. One of the L32Es in question was running 13.x and the other was running 15.x; all the others were at 18.x or above.
 
That's a good thing to check; all my code is running in 24.x so I don't think that's the problem here. This is just two PLCs on my desk at the moment (I'm heading down to integrate in a few weeks).
 
Another update:

I've stripped the local/comms "master" code down to just have the one remote site; no other local PLCs in the project, and no other remote sites. This has solved the problem it seems; I have a solid connection with no flickering faultcode. Thus, the problem has to be somewhere in the portion of the project where I've added other sites. I'm going to work through and slowly add things back, see if I can further isolate the problem.

If anyone has suggestions on changing the RPI for the ethernet/ip module to avoid timeouts just for reference, I would certainly appreciate that.

Thanks guys.
 
Sorry if I’m missing something but I’m not clear on whether or not you are using produce/consume. RPI is a P/C setting and won’t have any impact on message instructions except that if you have a P/C communications session running it will occupy bandwidth.

Having said that, your approach of backing up and taking things one step at a time is a good one. Keep in mind that if you’re going to have multiple panels communicating with each other, and each panel has multiple devices in it that are communicating with each other, you will need a good quality managed switch in each panel. You want to limit the communications between panels to only the data that is meant to cross between them.

As to your message instructions, I’d recommend staggering them so that only one is enabled at a time. There are several ways to do this but I use the method in the image most of the time. You can easily add code to skip a message if the need arises, and if you’re feeling really skilled you can do things like writing routines that eliminate a message instruction that errors a certain number of times, etc.

Lastly, what radios are you using? Not all wireless is the same and some radios will perform better than others. On the surface that sounds obvious; however, the point is that when dealing with P/C some radios can be made to work reasonably well while others simply won’t get the job done, and some P/C setups will work “OK” on most wireless connections while other setups won’t work at all on most (if not all) wireless connections. The person who said that they won’t ever use P/C is a little far to the extreme, but when working with wireless networks, my position is that message instructions should always be the first consideration and P/C should only be looked at if absolutely necessary and the conditions are compatible with the wireless network and vice versa.


[Image: cplg_stagger copy_zpsigc3q5pa.jpg]
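The staggering idea in the post above can be sketched roughly as follows. This is not the ladder logic from the image, which is not reproduced here; it is a Python illustration under stated assumptions: each message reports either done (DN) or error (ER), only one message is enabled at a time, and a message is skipped once it has errored a set number of times in a row. The class name `MessageStagger`, the `max_errors` parameter, and the message names are all hypothetical.

```python
class MessageStagger:
    """Round-robin sequencer that allows one message to run at a time."""

    def __init__(self, names, max_errors=3):
        self.names = list(names)
        self.max_errors = max_errors
        self.errors = {name: 0 for name in self.names}  # consecutive ER count
        self.index = 0

    def active(self):
        """Return the one message allowed to run now, or None if all are skipped."""
        for offset in range(len(self.names)):
            i = (self.index + offset) % len(self.names)
            if self.errors[self.names[i]] < self.max_errors:
                self.index = i
                return self.names[i]
        return None

    def complete(self, ok):
        """Record the result of the active message (DN: ok=True, ER: ok=False)
        and hand off to the next one. Call only after active() returned a name."""
        name = self.names[self.index]
        self.errors[name] = 0 if ok else self.errors[name] + 1
        self.index = (self.index + 1) % len(self.names)


stagger = MessageStagger(["read_well_1", "write_well_1", "read_well_2"], max_errors=2)
for ok in [True, True, False, True, True, False, True, True]:
    print(stagger.active(), "DN" if ok else "ER")
    stagger.complete(ok)

# read_well_2 has now errored twice in a row, so it is skipped.
print(stagger.active())  # read_well_1
```

A skipped message stays skipped in this sketch; a real routine would probably want a retry timer or a manual reset so a site that comes back online is polled again.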
 
Sorry if I’m missing something but I’m not clear on whether or not you are using produce/consume. RPI is a P/C setting and won’t have any impact on message instructions except that if you have a P/C communications session running it will occupy bandwidth.

Sorry for the confusion. Let me lay out what I'm talking about, let me know if this clears things up. And of course let me know if something I've said is incorrect, this is still a pretty big learning curve:

1. I am not currently using p/c tags, no.
2. According to documentation about the faultcode I was getting (16#0204), the faultcode is for connection timeout. The timeout is defined as 4 x RPI for the ethernet/ip connection.
3. I understand that RPI is for p/c tags, but from what I've read (and I could very well be incorrect) that RPI also is involved with ethernet/ip connections. When I click on the Properties for the ethernet/ip module in the I/O Connections section of the project (and click on the Connection tab in the Properties pop up), there is a field for the module's RPI but it's greyed out. I can't imagine that it would be there if it wasn't some part of the ethernet/ip process. Here's a link that frequently mentions RPI when talking about setting up ethernet/ip:

http://www.deltamotion.com/support/...ls/EtherNetIP/EtherNet_IP_I_O_Performance.htm

Maybe I'm misunderstanding what this article is talking about; maybe the reference of RPI is for the I/O cards in the same backplane? Doesn't *seem* like it, but again, this is all pretty new to me.

This link (see page 70) seems to make it more clear that RPI is only in reference to "I/O"; further, it states that explicit data (which includes MSG) is not bound by RPI.

http://literature.rockwellautomation.com/idc/groups/literature/documents/rm/enet-rm002_-en-p.pdf

The amount of "google-fu" it took to get to that answer was pretty enormous! But it seems like I'm confusing RPI requirements.

All that being said, it sounds like there was something set incorrectly in the I/O Connections and/or MSG manager. Let me investigate those specifically and see if I can isolate the issue I'm having.

Thanks again for the time everyone has taken for addressing this problem, it's been very helpful on this end.
 
RPI stands for “Requested Packet Interval” and is something set when a device is a “producer” (sorry if that is remedial). The Ethernet card local to the processor isn’t a producer so the RPI setting is grayed out (there’s nothing to set). On page 70 of the ENET-RM002C… document: “For example, a local EtherNet/IP communications module does not require an RPI because it does not produce data for the system but acts as a bridge to remote modules”. The only reason I can think of why there is an RPI box (grayed out) is that there must be certain applications where an ENET module acts as a producer rather than a bridge.
 
RPI stands for “Requested Packet Interval” and is something set when a device is a “producer” (sorry if that is remedial).

Not at all, I appreciate a full explanation so that I can check what I think I know and make sure it's accurate. In this case I am aware of this, but it's good to double check.

The Ethernet card local to the processor isn’t a producer so the RPI setting is grayed out (there’s nothing to set). On page 70 of the ENET-RM002C… document “For example, a local EtherNet/IP communications module does not require an RPI because it does not produce data for the system but acts as a bridge to remote modules”.

This makes sense, yeah.

The only reason I can think of why there is the RPI box (grayed out) is that there must be certain applications where an ENET module acts as a producer rather than a bridge.

I *think* this is explained in the same document on pages 70 and 71; it seems like certain data (called "implicit") does rely on RPI, and implicit data includes I/O and also motion control and safety data. But yes, for my MSG (which are explicit) then RPI doesn't apply.

I'm fairly certain that with this issue figured out more, I will be able to track down the exact reason things aren't working. Will report back with my findings, and thanks again for your time.
 
For the record, "Implicit" is Produce/Consume (technically known as "Ethernet/IP UDP") and "Explicit" is a message-based connection (technically known as "Ethernet/IP TCP"). As the technical names imply, Produce/Consume is a UDP connection and Explicit is a TCP connection. One of the big differences is that a TCP connection is opened, data is transferred, and then the connection is closed. A UDP connection, once opened, remains that way continuously for "real time" data exchange. Produce/Consume was created for "real time I/O" and it is much faster at reading/writing I/O. The problem with it with regards to wireless is that it uses packet count information to determine the health of the connection, specifically the number of packets lost in a given time frame (calculated via "4xRPI"; there's more to it than that but it's over my head). If you drop more than 4 packets in a specific time it will decide that the connection is broken and reset it. No matter what kind of wireless you're using you will always lose packets, which is why Rockwell says not to use Produce/Consume wirelessly. However, if you can keep the number of tags low and tolerate high RPI times, then depending on the radio's speed, bandwidth and quality of connection you can make it work.
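To put numbers on the "4xRPI" rule of thumb from the thread, here is a small Python sketch. It assumes the timeout is simply the RPI multiplied by 4; as noted above there is more to it than that, and a real controller may apply a different multiplier or a minimum value. The function names are hypothetical.

```python
def connection_timeout_ms(rpi_ms, multiplier=4):
    """Timeout window for an implicit connection, assuming timeout = multiplier * RPI."""
    return rpi_ms * multiplier


def has_timed_out(ms_since_last_packet, rpi_ms, multiplier=4):
    """True if the gap since the last packet exceeds the timeout window."""
    return ms_since_last_packet > connection_timeout_ms(rpi_ms, multiplier)


for rpi in (20, 100, 500):
    print(f"RPI {rpi} ms -> timeout {connection_timeout_ms(rpi)} ms")

# A 100 ms radio dropout breaks a 20 ms RPI connection but not a 500 ms one.
print(has_timed_out(100, rpi_ms=20))   # True
print(has_timed_out(100, rpi_ms=500))  # False
```

This is the arithmetic behind the advice to tolerate high RPI times on a radio link: the longer the RPI, the longer a dropout the connection can ride through.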
It sounds like you’ve got a good handle on where to go from here. Please do let us know what you come up with and good luck.
 
