Please don't take offense: my criticism is of the software, not the people who wrote it!
I read that code in some detail and concluded that it represents several rounds of attempts to work around, in software, problems that actually arise from wiring and signal quality on this large, heavily loaded network.
If it were my system, I would try writing a simpler MSG sequencer from scratch, and then "wrap" the diagnostic logic that measures performance and counts errors around it.
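To illustrate the split between sequencing and diagnostics, here is a rough model of the idea in Python. It is only a sketch: in the controller this would be ladder or Structured Text around MSG instructions. It assumes each message can be polled once per scan and reports busy, done, or error, loosely like the MSG .DN and .ER bits. The sequencer runs one message at a time, gives up after a timeout, and always moves on to the next one; the diagnostics layer only watches the results and counts them. All names, the timeout value, and the simulated messages are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List, Optional, Tuple


class Status(Enum):
    BUSY = auto()   # message still in progress
    DONE = auto()   # completed successfully (like .DN)
    ERROR = auto()  # completed with an error (like .ER)


# Hypothetical model of a message: called once per scan with the number of
# scans it has been in flight, and returns its current status.
Message = Callable[[int], Status]


@dataclass
class Sequencer:
    """Runs one message at a time, in a fixed order, and always moves on."""
    messages: List[Message]
    timeout_scans: int = 50  # hypothetical timeout, counted in polls
    index: int = 0
    scans_in_flight: int = 0

    def scan(self) -> Optional[Tuple[int, str]]:
        """Poll the current message; return (index, outcome) when it finishes."""
        status = self.messages[self.index](self.scans_in_flight)
        self.scans_in_flight += 1
        if status is Status.BUSY and self.scans_in_flight < self.timeout_scans:
            return None  # still waiting, nothing to report this scan
        outcome = "timeout" if status is Status.BUSY else status.name.lower()
        finished = (self.index, outcome)
        # Advance to the next message whatever the outcome was
        self.index = (self.index + 1) % len(self.messages)
        self.scans_in_flight = 0
        return finished


@dataclass
class Diagnostics:
    """Wraps the sequencer and only observes its results; never alters the sequence."""
    sequencer: Sequencer
    counts: Dict[str, int] = field(
        default_factory=lambda: {"done": 0, "error": 0, "timeout": 0})
    per_message_errors: Dict[int, int] = field(default_factory=dict)

    def scan(self) -> None:
        result = self.sequencer.scan()
        if result is None:
            return
        index, outcome = result
        self.counts[outcome] += 1
        if outcome != "done":
            self.per_message_errors[index] = self.per_message_errors.get(index, 0) + 1


def finishes_after(n: int, status: Status = Status.DONE) -> Message:
    """Simulated message that stays busy for n polls, then reports status."""
    return lambda scans: status if scans >= n else Status.BUSY


# Three simulated messages: one succeeds, one errors, one never answers
diag = Diagnostics(Sequencer(
    [finishes_after(2), finishes_after(1, Status.ERROR), finishes_after(1000)],
    timeout_scans=5))
for _ in range(30):
    diag.scan()
print(diag.counts, diag.per_message_errors)
```

The point of the split is that the counters can be changed or removed without touching the sequencing logic, so workarounds in one part cannot hide problems in the other.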
I think the focus on "latency" is probably misplaced. The I/O scan is probably a hundred milliseconds at most, so a loss of control over a PowerFlex drive that lasts several seconds cannot be explained by network performance.
It is possible that the drive is losing its network connection entirely and then re-establishing it, which would explain why the start and stop commands do not work as expected. That problem should be addressed first.
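One way to test the lost-connection theory is to log the drive's connection status once per scan and look for drop-and-recover cycles. The Python sketch below is purely illustrative. It assumes you can capture a True/False "connected" value per sample (for example from a per-node failure bit in the scanner's status data; the exact source is an assumption) and an assumed sample period of 100 ms. It counts drops and reconnects and reports the longest outage in seconds. An outage of several seconds would point at the connection rather than at latency.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LinkReport:
    drops: int               # connected -> not connected transitions
    reconnects: int          # not connected -> connected transitions
    longest_outage_s: float  # longest run of "not connected", in seconds


def analyze_link(samples: Iterable[bool], sample_period_s: float) -> LinkReport:
    """Count connection drops and recoveries in a series of logged samples.

    samples: one boolean per sample, True while the drive's connection is
    healthy (hypothetically taken from a status bit logged once per scan).
    """
    drops = reconnects = 0
    run = longest = 0
    previous = True  # assume the link starts out healthy
    for connected in samples:
        if previous and not connected:
            drops += 1
        elif not previous and connected:
            reconnects += 1
        # Length of the current outage, in samples
        run = 0 if connected else run + 1
        longest = max(longest, run)
        previous = connected
    return LinkReport(drops, reconnects, longest * sample_period_s)


# Simulated log at an assumed 100 ms sample period: a short blip,
# then a 3-second outage that recovers on its own
log = [True] * 10 + [False] * 2 + [True] * 10 + [False] * 30 + [True] * 5
report = analyze_link(log, sample_period_s=0.1)
print(report)
```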
In my opinion, getting the system to the point where no misleading error codes mask the actual problems is also important. For example, if devices have been permanently removed from the network, their scanlist entries should be disabled so that error messages for those nodes no longer appear on the 1756-DNB display.