1769-L36ERM Communication Issues

dmroeder · May 20, 2021

bradal said:
As part of troubleshooting, a while ago I started switching devices from auto-negotiate to forced 100 Mbps full duplex. At one point they were all on auto.

How would the PLC react to being forced to 100/Full if the VFD, etc. was on auto? Does anything even really talk in half duplex unless set that way?

I have seen nothing but problems when devices are forced to 100 Mbps full duplex. Problems like what you are experiencing. Two exceptions:

Devices in a device level ring (DLR) all need to be set to 100 Mbps full duplex.
We had a specific brand of switch that would goof up the negotiation to the PLC only, so we had to set those to 100 Mbps full duplex.

Other than that, we set everything to auto, and we have 0 packet errors.

ronny_resistor · May 23, 2021

We used a bunch of AutomationDirect Stride managed switches: L35 to Stride to two Parker drives. Every time you unplugged the drives, they would not restart their comms. I changed the cable and went direct from port two to one of the drives and had no problems. I then got an unmanaged Allen-Bradley switch and this resolved my issues. No more hang-ups. I had never used managed switches before in any of our systems, but these 400 Stride switches have been a real pain.

bradal · May 31, 2021

Help!

Hey guys, some more information for you. I managed to set up some better logging in the PLC. I am monitoring the status register in the PLC for that specific I/O link. I have set all devices back to auto-negotiate.

100-Remote I/O AENT Module
101-Remote CPU 1769-L24ER
101-Remote CPU 1769-L24ER

I now see the status register showing status 5, "Shutting down", before the connection cycles. See the list of statuses below.

0 = Standby
1 = Faulted
2 = Validating
3 = Connecting
4 = Running
5 = Shutting down
6 = Inhibited
7 = Waiting
9 = Firmware Updating
10 = Configuring
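
For anyone post-processing a log of these values outside the PLC, the table above can be captured as an enumeration. This is only a minimal sketch: the names `EntryStatus` and `decode_entry_status` are made up for illustration, and the decoder assumes the raw value is a 16-bit word whose top four bits carry the state.

```python
from enum import IntEnum


class EntryStatus(IntEnum):
    """Connection states as listed in the post (there is no value 8 in the list)."""
    STANDBY = 0
    FAULTED = 1
    VALIDATING = 2
    CONNECTING = 3
    RUNNING = 4
    SHUTTING_DOWN = 5
    INHIBITED = 6
    WAITING = 7
    FIRMWARE_UPDATING = 9
    CONFIGURING = 10


def decode_entry_status(raw: int) -> EntryStatus:
    """Assumption: raw is a 16-bit word with the state in its top four bits.

    Masking with 0xFFFF also handles a signed 16-bit value read from a log.
    Raises ValueError for a code that is not in the list above.
    """
    return EntryStatus((raw & 0xFFFF) >> 12)


if __name__ == "__main__":
    for raw in (0x4000, 0x5000, 0x0000):
        print(hex(raw), decode_entry_status(raw).name)
```

If the logged value has already been shifted down to 0 through 10, `EntryStatus(value)` can be used directly instead of the decoder.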

Keep in mind these statuses are read in the main master PLC, and the devices are configured in its I/O tree. I have attached two screenshots. One shows just the main remote I/O connection failing, which was the only device down. The other shows a previous occasion where all the devices I am monitoring went down.

bradal · May 31, 2021

Updated Ethernet page screenshots

Ken Roach · Jun 1, 2021

It took me a bit to figure out that your first two screenshots are your datalog of timestamps and the EntryStatus attribute of the Module class (highest 4 bits), in descending order of time.

It's normal for a connection to go through a sequence of logical status states when it breaks and re-makes. Unfortunately they're not in numerical order of their normal operation.

The sequence you see for node "100" when its connection fails and then restarts is 5, 0, 1, 2, 7, 3, 4.

Under nominal operating conditions, all connection Status values are 4. That's how most folks write trap/detection logic to tell if a connection is broken: anything other than 4 means trouble. So the state was probably "4" prior to 2021-05-31 03:12:04.

[a timeout or other failure occurs]
5 Shutting Down
[...] six seconds elapse
0 Standby
[...] three seconds elapse
1 Faulted
2 Validating
7 Waiting
3 Connecting
4 Running
[and it runs for a while]
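
The "anything other than 4" trap logic and the sequence above can be sketched offline against a change-of-state log. This is a hedged illustration, not the logic in the PLC: `find_outages` is a made-up helper, and the timestamps in the example are illustrative, built only from the six-second and three-second gaps described above.

```python
from datetime import datetime

RUNNING = 4  # nominal connection status


def find_outages(log):
    """Group a change-of-state log into outages.

    log: iterable of (timestamp, status) entries, oldest first.
    Returns a list of (left_running_at, back_running_at, states_seen).
    """
    outages = []
    start = None
    seen = []
    for ts, status in log:
        if status != RUNNING:
            if start is None:
                start = ts          # first non-Running state opens an outage
            seen.append(status)
        elif start is not None:
            outages.append((start, ts, seen))  # back to Running closes it
            start, seen = None, []
    return outages


if __name__ == "__main__":
    t = lambda s: datetime.fromisoformat(f"2021-05-31 {s}")
    # Illustrative entries only; a log sorted newest-first must be reversed.
    log = [
        (t("03:12:04"), 5), (t("03:12:10"), 0), (t("03:12:13"), 1),
        (t("03:12:13"), 2), (t("03:12:13"), 7), (t("03:12:13"), 3),
        (t("03:12:13"), 4),
    ]
    for began, ended, states in find_outages(log):
        print(began, "down for", (ended - began).total_seconds(), "s via", states)
```

Running the same function over the logs for every monitored node makes it easier to see whether the outages line up in time across devices.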

Your screenshot Capture-Comm3.png shows that the problem isn't a physical problem with the cables from the CompactLogix ethernet ports to a switch, because all the media error counters are zero on both ports.

The screenshot Capture-Comm4.png does show that there have been some missed packets in the connection to 10.10.57.86.

But those 24 "Missed Rx Packets" aren't much in the scheme of things, with a connection that's been producing data every 100 ms for 7 hours. 10 packets/second x 60 seconds x 60 minutes x 7 hours = 252,000 packets with just 24 missed Rx packets.
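
The same arithmetic, worked as a short script to put a number on the loss rate. The function name is made up for illustration; the 100 ms interval, 7 hours, and 24 missed packets are the figures quoted above.

```python
def expected_packets(rpi_ms: int, hours: int) -> int:
    """Packets expected from a connection producing one packet every rpi_ms."""
    return hours * 3600 * 1000 // rpi_ms


if __name__ == "__main__":
    total = expected_packets(rpi_ms=100, hours=7)
    missed = 24
    print(total)                      # 252000
    print(f"{missed / total:.4%}")    # 0.0095%
```

That is roughly one missed packet in every 10,500.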

It might indicate a short burst, or an occasional loss. You can't tell from that diagnostic screen because it doesn't timestamp each lost Rx Packet. But because the connection is still up, it wasn't enough to cause it to fail (generally 4x in a row).
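
To illustrate why the distribution of the losses matters more than the count, here is a small sketch. It assumes you have a per-packet received/missed record from some other capture (the diagnostic screen does not provide one), and it takes the "4 in a row" figure above as the default threshold. The function names are hypothetical.

```python
def longest_miss_run(received):
    """Length of the longest run of consecutive missed packets.

    received: iterable of booleans, True = packet arrived, False = missed.
    """
    longest = current = 0
    for ok in received:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def would_time_out(received, multiplier=4):
    """True if at least `multiplier` packets in a row were missed (assumed threshold)."""
    return longest_miss_run(received) >= multiplier


if __name__ == "__main__":
    # 24 isolated misses, each followed by good packets: connection survives.
    scattered = ([False] + [True] * 99) * 24
    # The same 24 misses arriving as one burst: connection would drop.
    burst = [True] * 100 + [False] * 24 + [True] * 100
    print(would_time_out(scattered), would_time_out(burst))
```

Both sequences have the same total of 24 missed packets, but only the burst crosses the consecutive-miss threshold.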

bradal · Jun 1, 2021

Yes! Thanks for the reply.

In essence you could think of the '4' value in the status as actually the first in line. It holds that 4 for a long period of time, then changes to 5, 0, 1, 2, 7, 3 and back to 4 again. I am just logging change of state with a timestamp.

Just as an update, I was having packet losses on two of my secondary CPUs, 1769-L24***. After some digging I noticed the I/O setup had a 1769-L33E in the config instead of the new 1769-L36ERM that we put in to accommodate more connections. After correcting that, we no longer have packet loss. I was not expecting this, as I assumed the controller type would be somewhat transparent and communicate the same. Communications were otherwise working normally before the change as well.

Also, I inhibited the 10.10.57.86 device in the I/O for now as we do not require that status feedback at this time.

After all of this, we had another failure logged, like the one shown in the screenshot.

Ken Roach · Jun 1, 2021

1769-L33E in the config instead of the new 1769-L36ERM [...] We no longer have packet loss

I would bet you a box of donuts that is a coincidence. Produced/Consumed Tag connections have been independent of firmware revision and controller type since they were introduced in the late 1990s.

I will have to go back and read some of this thread history. You've got a VLAN and apparently are experiencing loss of connections to all the devices in the I/O tree at once, which suggests a reset, purge, or breaking of something that connects two networks.

Go ahead and re-post (or send to me via PM) the filesharing link for the network architecture.

A CompactLogix with 48 of its 64 CIP connections taken up by I/O that suddenly runs out of connections makes the INGEAR drivers a prime suspect. I presume the ERP system monitors its connection to the CompactLogix as well?

bradal · Jun 1, 2021

See link below. We have all PLC and process equipment, including flow meters and VFDs, on the 57 VLAN. The INGEAR driver is run from our main server on 10.10.60.10, if I recall correctly.

http://www.filedropper.com/baytownnetworkdiagramarchitecturerev2b-shareable

Ken Roach · Jun 1, 2021

Thank you for that file.

That's a pretty big network, with some fairly sophisticated switches and routers. The extensive cameras on the network are also the sort of traffic I would want to examine carefully.

Do you have any access to network support engineers or technicians, or are you pretty much it because the disruption is mostly affecting the control systems?

Do you know if there have been interruptions in the video system, or any other parts of the network?

bradal · Jun 1, 2021

We do have a dedicated IT person that handles all of our network infrastructure but his background is more in true IT not control systems. My background is control system programming, configuring and commissioning but on this project and for a number of years we have been growing a team of people (roughly 10) dedicated to PLC and software development. This project was poorly put together originally and quickly executed within a sister company and I have been left figuring out issues and getting back to my AB roots here.

I have been working with our IT guy and lead software developer (all "SCADA" development is custom C# programming) to install some diagnostics. In reality, I am left researching my own tools, hence coming back to you guys who know your stuff, for help.

Add the fact that I am in Alberta, Canada and the site is in Houston, Texas, throw in a little COVID, and it becomes a struggle. My next step, during the next idle phase, is to literally disconnect every single device except the PLC backbone for a couple of days to see if the PLC still logs that failure. Then I will add devices back, likely by entire VLANs initially, to narrow it down.

Any simple network diagnostics you can suggest?

drbitboy · Jun 2, 2021

Wow, that is some network. I think your approach using a binary search of disconnected VLANs is your simplest diagnostic. Do any of the losses go over wireless connections (I see some T-Mobile in there)? Also, unmanaged vs. managed switches may make a difference, though I have never understood why.
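
The binary-search idea can be written down as a short planning sketch. It assumes a single culprit VLAN and a fault that reliably shows up within each trial window, neither of which is guaranteed here. The `fails_with` callback stands in for "reconnect only these VLANs to the PLC backbone, wait, and check the PLC log", and the VLAN names in the example are hypothetical.

```python
def bisect_culprit(vlans, fails_with):
    """Narrow a list of VLANs down to the one that triggers the fault.

    vlans: candidate VLANs, all assumed disconnected at the start.
    fails_with(connected): hypothetical callback; returns True if the PLC
        logged the failure while only `connected` was attached.
    Assumes exactly one culprit and a reproducible fault.
    """
    candidates = list(vlans)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if fails_with(half):
            candidates = half                      # culprit is in the tested half
        else:
            candidates = candidates[len(half):]    # culprit is in the other half
    return candidates[0]


if __name__ == "__main__":
    vlans = ["VLAN-A", "VLAN-B", "VLAN-C", "VLAN-D", "VLAN-E"]  # hypothetical names
    trials = []

    def fails_with(connected):
        trials.append(list(connected))
        return "VLAN-D" in connected   # pretend VLAN-D is the culprit

    print(bisect_culprit(vlans, fails_with), "found in", len(trials), "trials")
```

With N VLANs this takes about log2(N) trial periods instead of N, which matters when each trial runs for a couple of days.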

P.S. The names bring back some memories: I worked on the Cold Lake Flexicoker design at ER&E when it was in Florham Park, a lifetime ago.

P.P.S. typo in the office page: The Meraki MS120-24P SEC-03 is labeled a 48-port switch.
