Further clarification/background
Guys,
Thanks for the responses so far.
It seems I have ruffled a few feathers - sorry to those I've frustrated; I'm not deliberately trying to be coy, though you will have to trust me when I say that I cannot reveal the third-party library I've been working with.
The driver is purely Ethernet based - no serial comms. During coronavirus, I'm working from home, whilst the four PLCs that we have for testing are in the office 100 miles away. All PLCs are cabled to a hub off a router connected to the office intranet. All my live testing is done over a VPN. This obviously slows down response time a little because of the extra routing layers, but not massively so.
The four PLCs available to me are two SLC-family ones, and two CLX-family ones as follows:
ControlLogix 1756-L55/A
SLC 5/05 1747-L551
MicroLogix 1400 1766-L32BWA
CompactLogix 5370 1769-L30ER
I realise these models are somewhat long in the tooth, but there's a requirement to support them (I've had some forum respondents say to me "Why not get the customer to upgrade?" -- unhelpful!).
The driver is designed essentially just to read tag values from and write tag values to any of these PLC types. Tags that are read get their values displayed on my UI and can be interfaced to other applications. Tags that are written get their values input by the user from that same UI. Yes, I realise the risk there (the risk of bad data, particularly the risk of destroying a running PLC) but that's the customer requirement, and great care has been taken to validate all written values, so please don't shoot the developer. The risks are understood but accepted, and are outside the scope of this discussion.
Of course, some of these PLCs have their tags on the backplane, whilst others are on an I/O card. That intelligence is already built into the driver, which has the ability to distinguish between SLC and CLX via a List Identity request sent at the very start of proceedings. This List Identity only needs configuring once, hence isn't observed in the attached Wireshark.
Once we know the model of PLC, we know what type of messaging we need to use. In all cases, we start off sending a Register Session and then a Forward Open so as to establish session information (id, handle, connection serial number, etc.). This is done regardless of whether there are any tag requests waiting to be processed in my driver, but we keep the session alive by simply reading a predetermined tag every 8 seconds (the connection timeout is set to 16 seconds on the Forward Open). In the Wireshark capture, you will see that tag called "ping"; it doesn't actually exist, hence the error status, but the fact is that it still functions OK as a keepalive.
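For what it's worth, the keepalive loop described above can be sketched with a plain Java scheduler. This is only illustrative -- the class name, the Runnable standing in for the driver's real tag-read path, and the parameterised period are my own inventions, not the driver's actual code:

```java
import java.util.concurrent.*;

// Sketch of the keepalive loop: run the "read a dummy tag" task on a fixed
// period that stays comfortably inside the Forward Open timeout (8 s vs 16 s
// in the setup above). The Runnable stands in for the driver's real read path.
final class Keepalive {
    static ScheduledFuture<?> start(ScheduledExecutorService scheduler,
                                    Runnable readPingTag,
                                    long period, TimeUnit unit) {
        // scheduleAtFixedRate keeps firing even if one read is slow or errors
        // are swallowed inside the task, which suits a keepalive.
        return scheduler.scheduleAtFixedRate(readPingTag, period, period, unit);
    }
}
```

In the setup above this would be started once per session with a period of 8 seconds.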
There's only ever one session created/alive at a time (which I can verify from browsing the PLC and checking the Encapsulation Session stats). Should this session die (because the physical connection gets broken and thus our keepalive no longer gets through), the driver keeps track of successive keepalive failures and, if this exceeds a certain limit, the session is sent a Forward Close and an Unregister Session to make sure that everything is tidied up before we then attempt a new session. This approach has been tested and seems to work well; we never lose the session in normal operations, but if we test the theory by deliberately disconnecting the VPN, the PLC diagnostics shows the new Encapsulation Session being created followed briefly by the old one disappearing. So, in terms of session management, things seem very stable.
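The failure-counting part of that recovery logic is simple to sketch. The class name and the shape of the API are my own, not the driver's; the caller is expected to do the Forward Close / Unregister Session / reconnect work when `record` returns true:

```java
// Tracks consecutive keepalive failures; once the limit is hit, the caller
// should send Forward Close + Unregister Session and open a fresh session.
final class KeepaliveMonitor {
    private final int maxFailures;
    private int consecutiveFailures = 0;

    KeepaliveMonitor(int maxFailures) { this.maxFailures = maxFailures; }

    /** Record one keepalive outcome; returns true when the session should be recycled. */
    synchronized boolean record(boolean success) {
        if (success) {
            consecutiveFailures = 0;  // any success resets the run of failures
            return false;
        }
        return ++consecutiveFailures >= maxFailures;
    }
}
```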
Once the session is up and running, we know we can read/write tags when the user is ready (I use the term 'tag' here to mean both a CLX tag and an element/subelement of a PCCC data file), and of course we know which style of services/commands we need (PCCC/DF1 for SLC, or CIP commands for CLX). All tag reads/writes are sent as connected SendUnitData packets and these commands are working successfully for both families. The one connected session is maintained throughout and used by all tag read/write requests until the driver is stopped.
I need to make it clear here that the needs of our application are such that reading a tag isn't done cyclically. The user decides ad hoc which tags they want to read and when they want to read them. Thus, whilst these read requests COULD be buffered up for a short while and turned into single requests that fetch several tag values at once, that's not the way the driver works. As soon as a tag value is required by the user, one SendUnitData request is sent, and it only ever contains a service to read one atomic item of data (which might be a primitive type or an element of a structure, array or UDT). So, granted, it isn't making vast use of the bandwidth, but that's the way it is, and it is working. Given that we are operating in this style (which doesn't receive periodic, unsolicited tag values from the PLC), I'm assuming that the "requested RPI" value set in our Forward Open request isn't relevant? I don't know for sure, as I can't seem to find out much about its significance. Our requested RPI value is currently 2,064,960 microseconds (so about 2 seconds). Given that the timeout multiplier equates to a value of 8, that explains why we see the connection time out after roughly 16 seconds if we don't send a keepalive.
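As a sanity check on those numbers -- and on the assumption (which I believe is how the Forward Open fields interact, but happy to be corrected) that the inactivity timeout is simply requested RPI multiplied by the timeout multiplier:

```java
// Sanity check on the Forward Open numbers quoted above: if the inactivity
// timeout is requested RPI x timeout multiplier, it comes out at about
// 16.5 s -- matching the ~16 s drop-out observed without a keepalive.
final class ConnectionTiming {
    static final long RPI_MICROS = 2_064_960L; // requested RPI (microseconds)
    static final int TIMEOUT_MULTIPLIER = 8;   // from the Forward Open

    static long timeoutMicros() {
        return RPI_MICROS * TIMEOUT_MULTIPLIER; // 16,519,680 us, i.e. ~16.5 s
    }
}
```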
The decision was taken early on that all such SendUnitData requests to read and write tags should set the CIP sequence count field with a unique id so that a response can be matched with its original request.
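A minimal sketch of that matching logic, assuming a future-per-request shape -- the class name, the CompletableFuture-based API and the 16-bit wraparound are illustrative choices of mine, not necessarily how the driver implements it:

```java
import java.util.concurrent.*;

// Sketch of matching SendUnitData responses to their originating requests
// by CIP sequence count.
final class PendingRequests {
    private final ConcurrentMap<Integer, CompletableFuture<byte[]>> pending =
            new ConcurrentHashMap<>();
    private int nextSeq = 0;  // wraps at 16 bits, like the CIP sequence count field

    /** Allocate a sequence id and register a future that will hold the response. */
    synchronized int register(CompletableFuture<byte[]> future) {
        int seq = nextSeq;
        nextSeq = (nextSeq + 1) & 0xFFFF;
        pending.put(seq, future);
        return seq;
    }

    /** Called when a response arrives; returns false if the id is unknown (e.g. already timed out). */
    boolean complete(int seq, byte[] payload) {
        CompletableFuture<byte[]> f = pending.remove(seq);
        return f != null && f.complete(payload);
    }
}
```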
Timings have been analysed for the above and we know that our Java is written pretty efficiently. What I mean by this is that, over the VPN, if a read or write tag request takes say 30 millis to complete, typically only 1 millisecond of that is spent in our driver code, and the other 29 millis is shown by Wireshark as the interval between the two timestamps of sending the request and receiving the response (which is therefore either time spent 'on the wire' or inside the PLC). Whilst testing, our PLCs are pretty much idle, in the sense that they aren't running any specific ladder logic. I have tested whilst in the office rather than over a VPN, and the Wireshark round trip comes down from 30 ms to about 20 ms. The 20 to 30 ms range varies not just on network topology but also slightly on the model of PLC we are testing against.
So given all of the above, and given there are no functional issues, what's the problem ?
Rightly or wrongly, we've been asked to stress-test the driver by using a long-running loop to send nothing other than write requests to a series of tags known to exist in one of the PLCs (in this case, the SLC 1747). The aim is to establish a benchmark for how many writes per second we can achieve between our driver and the controller. The desire/target (and, I agree, not necessarily achievable) is for that benchmark to come close to what has been measured with a similar driver for Modbus, which has been observed as taking 4 ms to complete a round trip for one request. So we set up that test by creating a random number generator which updates every second and, on every update, sends that value to a series of 50 tags. I'll attach a Wireshark capture that shows what happens, but basically the outcome is:
Request 1 sent
Response 1 received
Very small latency
Request 2 sent
Response 2 received
Very small latency
etc ...
... and the round-trip interval for each of these request/response pairs is, as I say, about 30 ms (so roughly 33 requests per second). This is a long way from the Modbus target of 4 ms (about 250 requests per second). Incidentally, a look at the browser tells me that the PLC's CPU sits at about 10% during this test, so it's not as if it lacks the power to do more.
How am I processing these outgoing requests? As the request for each tag write gets its payload populated, I create a task to send the request and submit it to a plain Java FixedThreadPool executor, which processes the queue of tasks ASAP. But it's the actual "what goes on under the covers" when the request is sent out on the TCP link layer that I'm not in control of right now -- because of the third-party library I'm using to do link-layer management for me. The only other thing I can tell you is this: if the pool size of my FixedThreadPool is set to 1, this all works stably albeit slowly, because each TCP packet sent contains only one SendUnitData and the PLC can cope with that. However, if I ramp up the number of threads managed by the pool (to say 4), what seems to happen is that the third-party library code bundles several SendUnitData PDUs into one TCP packet and the PLC cannot deal with this: it ignores all SendUnitData PDUs except the very last one, and so most of our requests get timed out!
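If I do end up taking over the socket myself, the shape I have in mind is a single writer thread that sends exactly one encapsulated frame per write()/flush(), with Nagle disabled. This is only a sketch under that assumption -- the class name is invented and the frame encoding is assumed to happen elsewhere -- and note that the TCP stack may still legally coalesce segments, so a robust receiver on the PLC side (or in any driver) should really parse PDU boundaries rather than assume one PDU per packet:

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.*;

// Sketch: a single writer thread serialises all outgoing frames so that each
// write()/flush() pair carries exactly one encapsulated SendUnitData PDU,
// and TCP_NODELAY stops Nagle from batching small frames together.
final class SerialSender implements Closeable {
    private final Socket socket;
    private final OutputStream out;
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    SerialSender(String host, int port) throws IOException {
        socket = new Socket(host, port);
        socket.setTcpNoDelay(true);  // disable Nagle so small PDUs go out immediately
        out = socket.getOutputStream();
    }

    /** Queue one pre-encoded frame; frames are written in order, one at a time. */
    Future<?> send(byte[] pdu) {
        return writer.submit(() -> {
            out.write(pdu);
            out.flush();
            return null;
        });
    }

    @Override public void close() throws IOException {
        writer.shutdown();
        socket.close();
    }
}
```

Callers can still be multi-threaded (each tag write submits its frame and waits on the returned Future); the single writer just guarantees the on-the-wire ordering and framing.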
Hence the questions I posed were/are:
- Is an idle PLC likely to give me substantially more throughput than 33 requests per second?
- If so, what are the best mechanisms for achieving this?
- Purely from the PLC's point of view, should the outgoing stream of requests and the incoming stream of responses be able to operate asynchronously and independently of each other (i.e. full duplex), or is it limited to the half-duplex mode shown above? I would have thought 'full duplex'? If you guys think this is possible, then I'm willing to ditch the third-party library and take control of the TCP link layer myself to see if I can make it happen. The PLC under test certainly shows as having an ENBT in full-duplex mode operating at up to 100 Mbps.
- ODVA seems to suggest my driver can only maintain one Forward Open session to the PLC at a time. Is this correct, or could I look at maintaining multiple sessions to see if this is another approach that might improve throughput?
- Any other bright ideas?
Hope I've been clear now and included enough background but feel free to ask for further clarification.