Further clarification/background
Guys,
Thanks for the responses so far.
It seems I have ruffled a few feathers - sorry to those I've frustrated; I'm not deliberately trying to be coy, though you will have to trust me when I say that I cannot reveal the third-party library I've been working with.
The driver is purely Ethernet based - no serial comms. During coronavirus, I'm working from home, whilst the four PLCs that we have for testing are in the office 100 miles away. All PLCs are cabled to a hub off a router connected to the office intranet. All my live testing is done over a VPN. This obviously slows down response time a little because of the extra routing layers, but not massively so.
The four PLCs available to me are two SLC-family ones, and two CLX-family ones as follows:
ControlLogix 1756-L55/A
SLC 5/05 1747-L551
MicroLogix 1400 1766-L32BWA
CompactLogix 5370 1769-L30ER
I realise these models are somewhat long in the tooth, but there's a requirement to support them (I've had some forum respondents say to me "Why not get the customer to upgrade?" -- unhelpful!).
The driver is designed essentially just to read tag values from and write tag values to any of these PLC types. Tags that are read get their values displayed on my UI and can be interfaced to other applications. Tags that are written get their values input by the user from that same UI. Yes, I realise the risk there (the risk of bad data, particularly the risk of destroying a running PLC) but that's the customer requirement, and great care has been taken to validate all written values, so please don't shoot the developer. The risks are understood but accepted, and are outside the scope of this discussion.
Of course, some of these PLCs have their tags on the backplane, whilst others are on an I/O card. That intelligence is already built into the driver, which has the ability to distinguish between SLC and CLX via a List Identity request sent at the very start of proceedings. This List Identity only needs configuring once, hence isn't observed in the attached Wireshark.
Once we know the model of PLC, we know what type of messaging we need to use. In all cases, we start off sending a Register Session and then a Forward Open so as to establish session information (id, handle, connection serial number, etc.). This is done regardless of whether there are any tag requests waiting to be processed in my driver, but we keep the session alive by simply reading a predetermined tag every 8 seconds (the connection timeout is set to 16 seconds on the Forward Open). In the Wireshark capture, you will see that tag called "ping"; it doesn't actually exist, hence the error status, but the fact is that it still functions OK as a keepalive.
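For what it's worth, the keepalive loop described above can be sketched with a plain Java scheduler. This is only illustrative -- the class name, the Runnable standing in for the driver's real tag-read path, and the parameterised period are my own inventions, not the driver's actual code:

```java
import java.util.concurrent.*;

// Sketch of the keepalive loop: run the "read a dummy tag" task on a fixed
// period that stays comfortably inside the Forward Open timeout (8 s vs 16 s
// in the setup above). The Runnable stands in for the driver's real read path.
final class Keepalive {
    static ScheduledFuture<?> start(ScheduledExecutorService scheduler,
                                    Runnable readPingTag,
                                    long period, TimeUnit unit) {
        // scheduleAtFixedRate keeps firing even if one read is slow or errors
        // are swallowed inside the task, which suits a keepalive.
        return scheduler.scheduleAtFixedRate(readPingTag, period, period, unit);
    }
}
```

In the setup above this would be started once per session with a period of 8 seconds.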
There's only ever one session created/alive at a time (which I can verify from browsing the PLC and checking the Encapsulation Session stats). Should this session die (because the physical connection gets broken and thus our keepalive no longer gets through), the driver keeps track of successive keepalive failures and, if this exceeds a certain limit, the session is sent a Forward Close and an Unregister Session to make sure that everything is tidied up before we then attempt a new session. This approach has been tested and seems to work well; we never lose the session in normal operations, but if we test the theory by deliberately disconnecting the VPN, the PLC diagnostics shows the new Encapsulation Session being created followed briefly by the old one disappearing. So, in terms of session management, things seem very stable.
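The failure-counting part of that recovery logic is simple to sketch. The class name and the shape of the API are my own, not the driver's; the caller is expected to do the Forward Close / Unregister Session / reconnect work when `record` returns true:

```java
// Tracks consecutive keepalive failures; once the limit is hit, the caller
// should send Forward Close + Unregister Session and open a fresh session.
final class KeepaliveMonitor {
    private final int maxFailures;
    private int consecutiveFailures = 0;

    KeepaliveMonitor(int maxFailures) { this.maxFailures = maxFailures; }

    /** Record one keepalive outcome; returns true when the session should be recycled. */
    synchronized boolean record(boolean success) {
        if (success) {
            consecutiveFailures = 0;  // any success resets the run of failures
            return false;
        }
        return ++consecutiveFailures >= maxFailures;
    }
}
```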
Once the session is up and running, we know we can read/write tags when the user is ready (I use the term 'tag' here to mean both a CLX tag and an element/subelement of a PCCC data file), and of course we know which style of services/commands we need (PCCC/DF1 for SLC, or CIP commands for CLX). All tag reads/writes are sent as connected SendUnitData packets and these commands are working successfully for both families. The one connected session is maintained throughout and used by all tag read/write requests until the driver is stopped.
I need to make it clear here that the needs of our application are such that reading a tag isn't done cyclically. The user decides ad hoc which tags they want to read and when they want to read them. Thus, whilst these read requests COULD be buffered up for a short while and turned into single requests that fetch several tag values at once, that's not the way the driver works. As soon as a tag value is required by the user, one SendUnitData request is sent, and it only ever contains a service to read one atomic item of data (which might be a primitive type or an element of a structure, array or UDT). So, granted, it isn't making vast use of the bandwidth, but that's the way it is, and it is working. Given that we are operating in this style (which doesn't receive periodic, unsolicited tag values from the PLC), I'm assuming that the "requested RPI" value set in our Forward Open request isn't relevant? I don't know for sure, as I can't seem to find out much about its significance. Our requested RPI value is currently 2,064,960 microseconds (so about 2 seconds). Given that the timeout multiplier equates to a value of 8, that explains why we see the connection time out after roughly 16 seconds if we don't send a keepalive.
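As a sanity check on those numbers -- and on the assumption (which I believe is how the Forward Open fields interact, but happy to be corrected) that the inactivity timeout is simply requested RPI multiplied by the timeout multiplier:

```java
// Sanity check on the Forward Open numbers quoted above: if the inactivity
// timeout is requested RPI x timeout multiplier, it comes out at about
// 16.5 s -- matching the ~16 s drop-out observed without a keepalive.
final class ConnectionTiming {
    static final long RPI_MICROS = 2_064_960L; // requested RPI (microseconds)
    static final int TIMEOUT_MULTIPLIER = 8;   // from the Forward Open

    static long timeoutMicros() {
        return RPI_MICROS * TIMEOUT_MULTIPLIER; // 16,519,680 us, i.e. ~16.5 s
    }
}
```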
The decision was taken early on that all such SendUnitData requests to read and write tags should set the CIP sequence count field with a unique id so that a response can be matched with its original request.
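A minimal sketch of that matching logic, assuming a future-per-request shape -- the class name, the CompletableFuture-based API and the 16-bit wraparound are illustrative choices of mine, not necessarily how the driver implements it:

```java
import java.util.concurrent.*;

// Sketch of matching SendUnitData responses to their originating requests
// by CIP sequence count.
final class PendingRequests {
    private final ConcurrentMap<Integer, CompletableFuture<byte[]>> pending =
            new ConcurrentHashMap<>();
    private int nextSeq = 0;  // wraps at 16 bits, like the CIP sequence count field

    /** Allocate a sequence id and register a future that will hold the response. */
    synchronized int register(CompletableFuture<byte[]> future) {
        int seq = nextSeq;
        nextSeq = (nextSeq + 1) & 0xFFFF;
        pending.put(seq, future);
        return seq;
    }

    /** Called when a response arrives; returns false if the id is unknown (e.g. already timed out). */
    boolean complete(int seq, byte[] payload) {
        CompletableFuture<byte[]> f = pending.remove(seq);
        return f != null && f.complete(payload);
    }
}
```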
Timings have been analysed for the above and we know that our Java is written pretty efficiently. What I mean by this is that, over the VPN, if a read or write tag request takes say 30 millis to complete, typically only 1 millisecond of that is spent in our driver code, and the other 29 millis is shown by Wireshark as the interval between the two timestamps of sending the request and receiving the response (which is therefore either time spent 'on the wire' or inside the PLC). Whilst testing, our PLCs are pretty much idle, in the sense that they aren't running any specific ladder logic. I have tested whilst in the office rather than over a VPN, and the Wireshark round trip comes down from 30 ms to about 20 ms. The 20 to 30 ms range varies not just on network topology but also slightly on the model of PLC we are testing against.
So given all of the above, and given there are no functional issues, what's the problem ?
Rightly or wrongly, we've been asked to stress-test the driver by using a long-running loop to send nothing other than write requests to a series of tags known to exist in one of the PLCs (in this case, the SLC 1747). The aim is to establish a benchmark for how many writes per second we can achieve between our driver and the controller. The desire/target (and, I agree, not necessarily achievable) is for that benchmark to come close to what has been measured with a similar driver for Modbus, which has been observed as taking 4 ms to complete a round trip for one request. So we set up that test by creating a random number generator which updates every second and, on every update, sends that value to a series of 50 tags. I'll attach a Wireshark capture that shows what happens, but basically the outcome is:
Request 1 sent
Response 1 received
Very small latency
Request 2 sent
Response 2 received
Very small latency
etc ...
... and the round-trip interval for each of these request/response pairs is, as I say, about 30 ms (so roughly 33 requests per second). This is a long way from the Modbus target of 4 ms (about 250 requests per second). Incidentally, a look at the browser tells me that the PLC's CPU sits at about 10% during this test, so it's not as if it lacks the power to do more.
How am I processing these outgoing requests? As the request for each tag write gets its payload populated, I create a task to send the request and submit it to a plain Java FixedThreadPool executor, which processes the queue of tasks ASAP. But it's the actual "what goes on under the covers" when the request is sent out on the TCP link layer that I'm not in control of right now -- because of the third-party library I'm using to do link-layer management for me. The only other thing I can tell you is this: if the pool size of my FixedThreadPool is set to 1, this all works stably albeit slowly, because each TCP packet sent contains only one SendUnitData and the PLC can cope with that. However, if I ramp up the number of threads managed by the pool (to say 4), what seems to happen is that the third-party library code bundles several SendUnitData PDUs into one TCP packet and the PLC cannot deal with this: it ignores all SendUnitData PDUs except the very last one, and so most of our requests get timed out!
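If I do end up taking over the socket myself, the shape I have in mind is a single writer thread that sends exactly one encapsulated frame per write()/flush(), with Nagle disabled. This is only a sketch under that assumption -- the class name is invented and the frame encoding is assumed to happen elsewhere -- and note that the TCP stack may still legally coalesce segments, so a robust receiver on the PLC side (or in any driver) should really parse PDU boundaries rather than assume one PDU per packet:

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.*;

// Sketch: a single writer thread serialises all outgoing frames so that each
// write()/flush() pair carries exactly one encapsulated SendUnitData PDU,
// and TCP_NODELAY stops Nagle from batching small frames together.
final class SerialSender implements Closeable {
    private final Socket socket;
    private final OutputStream out;
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    SerialSender(String host, int port) throws IOException {
        socket = new Socket(host, port);
        socket.setTcpNoDelay(true);  // disable Nagle so small PDUs go out immediately
        out = socket.getOutputStream();
    }

    /** Queue one pre-encoded frame; frames are written in order, one at a time. */
    Future<?> send(byte[] pdu) {
        return writer.submit(() -> {
            out.write(pdu);
            out.flush();
            return null;
        });
    }

    @Override public void close() throws IOException {
        writer.shutdown();
        socket.close();
    }
}
```

Callers can still be multi-threaded (each tag write submits its frame and waits on the returned Future); the single writer just guarantees the on-the-wire ordering and framing.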
Hence the questions I posed were/are:
- Is an idle PLC likely to give me substantially more throughput than 33 requests per second?
- If so, what are the best mechanisms for achieving this?
- Purely from the PLC's point of view, should the outgoing stream of requests and the incoming stream of responses be able to operate asynchronously and independently of each other (i.e. full duplex), or is it limited to the half-duplex mode shown above? I would have thought 'full duplex'? If you guys think this is possible, then I'm willing to ditch the third-party library and take control of the TCP link layer myself to see if I can make it happen. The PLC under test certainly shows as having an ENBT in full-duplex mode operating at up to 100 Mbps.
- ODVA seems to suggest my driver can only maintain one Forward Open session to the PLC at a time. Is this correct, or could I look at maintaining multiple sessions to see if this is another approach that might improve throughput?
- Any other bright ideas?
Hope I've been clear now and included enough background but feel free to ask for further clarification.