SLC's "Overflow Trap" fault and STI

Ron Beaufort

Greetings to all,

this is one of those “yeah, it makes sense now that I think about it” type situations that most people never think about ... until it creates a problem ... the first time that I ran into it was a few days ago ... and I’ve never seen it mentioned anywhere before ... luckily it came up in the comfort and privacy of the lab ... not out there in a “let’s make money” production environment ... anyway, here it is for what it’s worth ...

most of us who work with Allen-Bradley SLC systems know that an “out-of-limits” math operation can generate a processor fault condition ... for a quick refresher, if the “overflow trap” bit (S:5/0) is set to a 1 at the end of the program scan, then the processor will fault ... if you need more detail than that, read this old post ...

old post about the Overflow Trap bit

once a programmer finds out about that S:5/0 “overflow trap” problem, then he often writes all of his future SLC programs to include something like this at the very end of Ladder File 2 ...

[attachment]

this rung has the effect of making sure that the “overflow trap” bit is ALWAYS turned off just before the very end of the processor’s scan ... thus, there is NO possibility for any “out-of-limits” math operation anywhere in the program to cause an “overflow fault” condition ... problem solved, right? ... well, not so fast ...

suppose that the “out-of-limits” math operation happens to take place within an STI (Selectable Timed Interrupt) file ... now the nice thing about an STI is that we (the programmers) get to decide just how often the STI file gets executed ... we do this by specifying a “setpoint time” when we set up the STI ... the trick is that we don’t usually know (or care) exactly WHERE in the program that the STI gets scanned ... and therein lies the problem ...

now suppose that the STI contains an “out-of-limits” math operation ... and that the STI just happens to execute that “out-of-limits” math operation right in BETWEEN the “let’s unlatch the overflow trap” rung and the “end” rung of Ladder File 2 ... bummer ... the processor IS going to fault after all ...

and consider how tricky this could be to troubleshoot ... the STI could execute that “out-of-limits” math operation many, many, many times and never give us a fault ... just as long as the STI’s execution happens to take place somewhere - anywhere - in the program scan ABOVE that “unlatch the overflow trap” rung ... so this thing gives us the potential of a program which runs without faulting for a very long time ... and then all of a sudden the processor faults out due to an “overflow trap” situation which we just KNEW that we had absolutely positively prevented with our handy little “unlatch” rung ... and incidentally, the longer the program, the less chance that you’ll see this type of problem shut your system down ... and that just makes the darned thing even harder to troubleshoot ... it only faults if the STI just happens to hit that very tiny slice of scan-time ... right in between those two consecutive rungs ... and statistically speaking, this is going to happen at 3:00 o’clock in the morning ... on a weekend ...

now for a quick simple solution ... how about putting a second “unlatch the overflow trap” rung right at the end of the STI file ... especially if there is ANY possibility of an “out-of-limits” math operation taking place anywhere inside that STI file ...
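The timing window described above can be modelled with a small Python sketch. It is a simplification, not how the processor schedules anything: the scan is treated as a row of rungs, the STI is assumed to run its out-of-limits math just before one chosen rung, and the names `scan_faults` and `faulting_positions` are made up for illustration.

```python
def scan_faults(n_rungs, sti_fires_before, unlatch_in_sti=False):
    """Model one scan of Ladder File 2.

    Rungs 0 .. n_rungs-1 are the program; the last one (n_rungs-1) is the
    "unlatch the overflow trap" rung. Index n_rungs stands for the END rung.
    The STI is assumed to run just before rung `sti_fires_before`.
    Returns True if the trap bit is still set at END (processor faults).
    """
    trap = False
    for rung in range(n_rungs + 1):
        if rung == sti_fires_before:
            trap = True               # STI performs the out-of-limits math
            if unlatch_in_sti:
                trap = False          # second unlatch rung at the end of the STI file
        if rung == n_rungs - 1:
            trap = False              # unlatch rung at the end of File 2
    return trap


def faulting_positions(n_rungs, unlatch_in_sti=False):
    """List every STI landing position that leaves the trap bit set at END."""
    return [k for k in range(n_rungs + 1)
            if scan_faults(n_rungs, k, unlatch_in_sti)]


print(faulting_positions(10))                        # only the slot just before END
print(faulting_positions(10, unlatch_in_sti=True))   # no faulting slot left
```

In this model exactly one landing position out of `n_rungs + 1` faults, which matches the point that a longer program makes the fault rarer, and the second unlatch inside the STI file removes that last position.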

and finally, thanks to Phil for this chance to share ...

trapwithsti.jpg
 
Another solution

Instead of putting a default unlatch overflow trap in two places, how about doing it "the right way" from the start?

Create a "Fault Routine". Set up some LAD file (I like to use 13 for esthetic reasons) that has no JSR call (unless you really want it to run every scan). Put that number in S:29.

Now when the processor reaches the end of the program, it will run that LAD file. Put the Unlatch of S:5/0 there, unlatch S:1/13, and the SLC is back in business (until the next time, which with an STI problem, could be a long time).

But, since you've got ladder logic which is functioning AFTER the fault, you can get creative.

One thing I do is capture the date/time registers into a rolling FIFO, along with the fault number, and rung/ladder of failure. That way, if I'm told that the processor faulted during 3rd shift, and they cycled power, switched the keys back and forth, and generally beat on it until it worked, I've got SOME evidence as to what went on.

If there are calculations that you suspect could be causing problems, you can have the fault routine move some "default" data into those registers.
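As a rough illustration of the rolling FIFO idea, here is a Python sketch of a fixed-depth fault log. It only shows the data structure; on the SLC this would be done with ladder instructions and status-file words, and the field names, the depth of 10 and the sample values are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultRecord:
    timestamp: str    # stand-in for a copy of the date/time registers
    fault_code: int   # stand-in for the major fault code word
    file_no: int      # ladder file where the fault was reported
    rung_no: int      # rung where the fault was reported


class FaultLog:
    """Rolling log: once full, the oldest record drops off the far end."""

    def __init__(self, depth=10):
        self._records = deque(maxlen=depth)

    def capture(self, timestamp, fault_code, file_no, rung_no):
        self._records.append(FaultRecord(timestamp, fault_code, file_no, rung_no))

    def newest_first(self):
        return list(reversed(self._records))


log = FaultLog(depth=3)
for n in range(4):                      # four faults into a three-deep log
    log.capture(f"shift-3 event {n}", 0x0020, 2, 40 + n)
print([r.rung_no for r in log.newest_first()])
```

The point of the fixed depth is that the log never grows: after a night of power cycling and key switching, the last few faults are still there in order.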
 
Greetings, Allen,

You said:

Put the Unlatch of S:5/0 there, unlatch S:1/13,

basically I like your "fault routine" approach ... but ... if you were to do it just the way you wrote it, then we'd automatically reset ANY and ALL of the processor's "user" faults ... my approach would only kill the "math overflow" variety of faults ... I'll bet that if we really tried, that we could come up with SOME types of faults that we would NOT want to just automatically reset ... of course you could always add some more logic to your fault routine and make the unlatch of S:1/13 conditional ... and base the conditions on what type of error had been generated ... but that's getting a little beyond the "quick simple solution" that I had in mind in my earlier post ...

anyway, I don't know about you, but I really enjoy these nit-picky little details ...
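The "make the unlatch conditional" idea can be sketched in Python as a decision on the fault code. This is only the logic, not ladder code. The value 0x0020 for "minor error bit set at end of scan" is an assumption taken from memory of the SLC fault code tables and should be checked against the manual; 0x0034 is the code mentioned elsewhere in this thread.

```python
# Fault codes that the routine is allowed to clear on its own.
# 0x0020 is ASSUMED here to be the overflow-trap-at-end-of-scan code.
AUTO_RESET_CODES = {0x0020}


def fault_routine(fault_code, trap_bit, major_fault_bit):
    """Return (trap_bit, major_fault_bit) as left by the fault routine.

    Only faults on the allow-list are cleared; anything else is left
    latched so the processor still faults and someone has to look at it.
    """
    if fault_code in AUTO_RESET_CODES:
        return False, False
    return trap_bit, major_fault_bit


print(fault_routine(0x0020, True, True))    # math overflow: cleared
print(fault_routine(0x0034, False, True))   # some other fault: left alone
```

An allow-list keeps the default safe: a fault type nobody thought about stays a fault.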
 
Hi Ron

how about doing it "the right way" from the start

Hmm... your approach of saving error data in a log file by means of the fault routine seems like a great idea to me. But I don't think that you can catch the ladder+rung number for a math overflow. The ladder+rung number error location is only updated when a major error has just occurred, and this is avoided by means of the fault routine (please correct me if I am wrong).

To find an offending math operation can be a tricky little exercise, so I do unlatch the S:5/0 bit in every file, and save the file number to an address to help locate the error.

If the ladder+rung error location IS updated upon execution of the fault routine, then I think I will adopt your approach :)
 
Re: Another solution

Allen Nelson said:

One thing I do is capture the date/time registers into a rolling FIFO, along with the fault number, and rung/ladder of failure. That way, if I'm told that the processor faulted during 3rd shift, and they cycled power, switched the keys back and forth, and generally beat on it until it worked, I've got SOME evidence as to what went on.

Allen:

Would you mind sharing that bit of code, or pointing me to an example?
 
Since I happen to have it handy

Here's a sample of a Fault Routine I have:

[attachment]

There's all sorts of code that could be added here (at the top)

For example, here's a loop check to find and fix any negative timer presets:


+--------- MOV ---+
---------------------------| -- -1 ==> N7:0 |
+-----------------+


Q5.0 +--------- ADD ---+
-----|LBL|-----------------| N7:0 + 1 = N7:0 |
+-----------------+


+----------- LES ---+ +------- CLR ---+
-----| T4:[N7:0].PRE < 0 |---| T4:[N7:0].PRE |
+-------------------+ +---------------+

+---- LES ---+ Q5.0
-----| N7:0 < 100 |-------------------(JMP)
+------------+



You could even condition the above code so that it only executes on S:6 = 0034H.
This one's handy when you've got an operator interface at which time setpoints are entered. Every now and again, someone enters a negative value on the ONE tag that you don't have range checking on.
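For readers who find the LBL/JMP loop hard to follow in ASCII, here is the same control flow traced in Python. It is a reading of the rungs as drawn, with a plain list standing in for the `T4:n.PRE` words; the function name is made up.

```python
def clear_negative_presets(presets, last_index=100):
    """Mirror the ladder loop: walk the timer presets and zero any negative one.

    `presets` stands in for T4:0.PRE, T4:1.PRE, ... and `index` for N7:0.
    """
    index = -1                          # MOV -1 into N7:0
    while True:                         # LBL
        index += 1                      # ADD N7:0 + 1
        if presets[index] < 0:          # LES T4:[N7:0].PRE < 0
            presets[index] = 0          # CLR T4:[N7:0].PRE
        if not index < last_index:      # LES N7:0 < 100 false: fall out of the loop
            break                       # (true would take the JMP back to LBL)
    return presets


timers = [5, -3] + [0] * 98 + [-1]      # 101 presets, elements 0 through 100
clear_negative_presets(timers)
print(timers[1], timers[100])
```

Tracing it this way shows that the loop, as drawn, visits elements 0 through 100 inclusive, so the timer file needs at least 101 elements or the compare limit should be adjusted to match the real file length.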

plcnet.jpg
 
Will this work with micrologix 1100?

Hi Allen,

Thanks for this bit of logic. Will it work on a MicroLogix 1100? I tried to copy the status bits of rung and file and it wouldn't verify. I got the error that the status word, S:21 for instance, couldn't be copied at the word level.

Thank you,
Craig

 
I am surprised that so few have run into this before, but it never hurts to review and refresh.
This has been a known problem from the beginning; I ran into it over 25 years ago.
The recommended fix is to check and clear the overflow bit after every math function, but that's a lot of extra code, so the second choice is to always reset the overflow bit on the last rung of prog 2. The fault only happens when prog 2 rolls over to rung 0: if the overflow bit is set at that time, it generates a major fault.
I have learned to include the unlatch in the last rung if I use any math functions in the SLC.
A program can run without a problem for years, then all of a sudden start faulting; it all depends on the result of the math functions.
 
It's not a 100% "fix"; check out Allen's post #2. The processor can fault after an STI, if the STI sets the trap bit.

Fault routine is the way to go. Clear the "trap" bit if it is set.

It's worth noting that only the "500" series of processors have this trap bit, and it is IMHO totally worthless, it doesn't tell you where, it just says you had a math over/under flow SOMEWHERE in your code, pretty pointless really, and it tells you too late, your code will continue to the end of the scan....
 
I put the clear* close to where I expect the overflow, and where I don't expect the overflow I might allow or prefer the fault, because it means summat happened I did not plan for.


In my RNG code I can safely ignore the overflow cases, and actually want the lower 16-bit (or 15- or 31-bit) result, so I have to set the [Math Overflow Selected; S:2/14 in ML1100] bit to begin with.
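The difference between the two overflow behaviours can be shown in Python on plain integers. This is a sketch of 16-bit signed arithmetic in general, under the assumption that the selection bit switches between a clamped result and a truncated low-16-bit result; exactly which instructions are affected on a given controller should be checked in its manual.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(value):
    """Clamp to the signed 16-bit range; also report whether it overflowed."""
    overflow = not INT16_MIN <= value <= INT16_MAX
    return max(INT16_MIN, min(INT16_MAX, value)), overflow


def wrap16(value):
    """Keep only the low 16 bits, reinterpreted as a signed word."""
    overflow = not INT16_MIN <= value <= INT16_MAX
    low = value & 0xFFFF
    return (low - 0x10000 if low & 0x8000 else low), overflow


total = 30000 + 10000
print(saturate16(total))   # clamped at the top of the range
print(wrap16(total))       # low word, wraps around to a negative number
```

For a random number generator the wrapped result is the useful one, which is why the overflow flag has to be checked and cleared right there rather than left for the end of the scan.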



* I should have said check and clear.




The problem described in the OP for this thread is quite interesting; it says we probably need to deal with any traps in the LAD file in which they occur.
 
it looks like you missed the first part
the first, best and preferred option is to check the overflow bit when any math function is used
it is intended that the overflow bit be used together with the function result. If the result value is too large to fit in a 16-bit word, you use the bit at that point and then clear it.
most math functions don't use it so we don't check for it.
 
If I have a complex program with a lot of math or scaling blocks, I will trap the overflow bit at the end of each ladder file (and I always break up my code into multiple files, even if they're simple). At least then I can check my internal math error bits to see which file contained the offending calculation.

I will latch an internal address when I find the math overflow bit set, and then unlatch it. I reset those bits with the alarm reset command (usually an HMI button).
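A Python sketch of that per-file bookkeeping, for illustration only. The class and method names are invented; `trap` stands in for S:5/0 and `error_bits` for the internal latch bits, one per ladder file.

```python
class OverflowTracker:
    """Per-file math error latches, checked on the last rung of each file."""

    def __init__(self):
        self.trap = False        # stand-in for the overflow trap bit
        self.error_bits = set()  # ladder file numbers with a latched math error

    def end_of_file(self, file_no):
        """Last rung of a ladder file: latch our own bit, then clear the trap."""
        if self.trap:
            self.error_bits.add(file_no)
            self.trap = False

    def alarm_reset(self):
        """HMI alarm reset: clear all the latched math error bits."""
        self.error_bits.clear()


tracker = OverflowTracker()
tracker.end_of_file(3)      # file 3 ran clean
tracker.trap = True         # some calculation in file 4 overflowed
tracker.end_of_file(4)
tracker.end_of_file(5)      # file 5 ran clean
print(sorted(tracker.error_bits), tracker.trap)
```

One caveat follows from the STI discussion earlier in the thread: if an interrupt sets the trap bit while some other file is executing, this scheme would blame the interrupted file, so an STI file probably wants its own check on its last rung.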

Sidenote: I once had a complicated carton tracking program that used a lot of indirect addressing that worked great for 4 months and then started faulting for indirect address out of range. I thought I was very careful about how I managed the indirection and had a hard time figuring out what was going on. I puzzled over it for a long time (probably several hours which felt like all day) before I finally realized that my STI routine was happening in the middle of some other code in the main program and I had used the same PLC address in both routines as the index address inside the square brackets.

About a week before the problems showed up, I had added some pneumatic controls logic to the machine that impacted the scan time just enough so the execution of the STI began to align itself with the execution of the indirection in the other routine. All I had to do was pick another unused address for the STI, but I was a bit panicked hunting for this while conveyors were stalled and product was stacking up on us.

I could clear the fault and it would come back within five minutes. Why did it run a week after my PLC changes like this? There were other changes made to some remote HMI code made about that same time is my best guess, and those added comms loads also affected the scan time.

Now I am even more cautious with STI routines and indirection.
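The shared-index collision can be reproduced in a few lines of Python. This is a toy model: the addresses `N7:0` and `N7:1`, the table size and the value 250 are all made up, and an `IndexError` stands in for the "indirect address out of range" fault.

```python
def sti_routine(memory, index_address):
    """The interrupt loads ITS pointer value, meant for a different table."""
    memory[index_address] = 250


def main_routine(memory, table, sti_index_address=None):
    """Main code: load the index word, then use it for an indirect read.

    If `sti_index_address` is given, the STI is assumed to fire between
    the load and the use, which is the unlucky alignment in the story.
    """
    memory["N7:0"] = 5                        # main code sets its index
    if sti_index_address is not None:
        sti_routine(memory, sti_index_address)
    return table[memory["N7:0"]]              # indirect read via the index word


table = list(range(10))
print(main_routine({}, table))                # no interrupt: fine
print(main_routine({}, table, "N7:1"))        # STI uses its own word: fine
try:
    main_routine({}, table, "N7:0")           # STI shares the index word
except IndexError:
    print("indirect address out of range")
```

Giving the STI its own index word is the one-line fix described above; the hard part is that the failure only shows up when the interrupt happens to land between the load and the use.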
 
Hi Mickey,

I found that S:21 does exist in the ML1100, but RSLogix is telling me that the selected word is not readable at the word level. I am literally trying to MOV S:21 to N34:1. If it cannot be moved, can I just access S:21 directly for my query?

Thank you again,
Craig

S:21 does not exist in the ML1100.

The sample above was written for an SLC5/??. The status words for the ML1100 and the SLC5/?? are slightly different. Check the manual for your ML1100.

https://literature.rockwellautomation.com/idc/groups/literature/documents/rm/1763-rm001_-en-p.pdf

Appendix B
 
Math Overflow time warp ... the story continues ...

dudes – I can NOT believe that it's been seventeen years since I started this thread ... but it looks like the old "math overflow trap" bit never goes completely out of style ...

just to move the discussion along – here's a little exercise that I used to run my boot camp students through when we got around to the basics of using analog signals with the SLC-5/04 systems ...

I'd have each student wire up a simple potentiometer to provide a 4 to 20 mA signal to a 1746-NI4 analog input module (a very commonly used piece of hardware) ... then it was easy to demonstrate the value of the input signal changing in the data table when the potentiometer was changed up and down ...

then once we had the inputs working, each student would wire up a 1746-NO4I analog output module (also very commonly used) to provide a 4 to 20 mA signal to a panel meter ...

it was easy to manually type in various values to the output module's address and watch the meter move up or down ...

next I'd have each student program a simple rung with an MOV (Move) instruction to logically tie the input device (the potentiometer) to the output device (the meter) ...

but - this would present a "problem" to be solved ...

the 1746-NI4 analog input module always uses a range of 3277 to 16384 to represent a 4 to 20 mA current range ... (sort of "weird" - but really no problem yet) ...

on the other hand ...

the 1746-NO4I analog output module produces an output range of 4 to 20 mA when its data value ranges from 6242 to 31208 ... (also "weird" - but that's just the way they built them way back when) ...

so ... simply MOVing the input signal over to the output device would only make the meter move about halfway up ... (in case you haven't noticed - the two data ranges are DIFFERENT) ...

now if we knew how to do the y=mx+b scaling operation, we could simply replace the MOV with an MUL (multiply) instruction – and multiply the input by about 1.90479 and put the result into the output location ... to you "math junkies" out there – (you know who you are) – this is quite "close enough" for what we're doing today ...

or ... for those of us who really don't like math – we could use the beloved SCP (Scale with Parameters) instruction to magically do all of the math for us ... so each student enters their very own SCP rung – and sets up the input and the output addresses - and of course the proper values for the ranges ...

Input Min. = 3277
Input Max. = 16384
Scaled Min. = 6242
Scaled Max. = 31208

BINGO! – now it works fine ... twiddling the potentiometer back and forth throughout its full 4 to 20 mA range now makes the milliamp meter move through a suitable 4 to 20 range ...
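The arithmetic behind those four numbers can be checked with a short Python sketch. It uses ordinary straight-line scaling in floating point, which is an assumption about what the scaling amounts to, not a description of the instruction's internals; the helper names are invented.

```python
def scp(value, in_min, in_max, out_min, out_max):
    """Straight-line scaling, y = m*x + b, through the two given end points."""
    slope = (out_max - out_min) / (in_max - in_min)
    offset = out_min - in_min * slope
    return value * slope + offset


def no4i_counts_to_ma(counts):
    """Output current for a data value, using the 6242..31208 = 4..20 mA range."""
    return 4 + (counts - 6242) * 16 / (31208 - 6242)


slope = (31208 - 6242) / (16384 - 3277)
print(round(slope, 4))                                 # close to the 1.90479 quoted above
print(round(scp(3277, 3277, 16384, 6242, 31208)))      # bottom of range
print(round(scp(16384, 3277, 16384, 6242, 31208)))     # top of range
print(round(no4i_counts_to_ma(16384), 1))              # plain MOV at full input
```

The last line is the "meter only goes about halfway" effect: a full-scale input value of 16384 moved straight to the output lands at roughly 10.5 mA. The offset term works out to nearly zero for these two ranges, which is why a bare multiply is close enough.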

the sun is shining ... the birds are singing ... the pope gets his Wheaties on time ... life is lovely ...

but suddenly the student on Bench Number One has a "problem" to be solved ... his processor has mysteriously FAULTED ... his "plant" has come to a screeching halt ... his Bo$$ man is extremely agitated ...

keep in mind that my boot camp classes were designed to teach "troubleshooting" and "problem solving" skills to maintenance technicians ... they were NOT intended to be "programming" classes ...

specifically – I was never really focused on how PLC systems "work" ... instead I concentrated mostly on the skills needed whenever things did NOT "work" ... (the old saying about building a "better mousetrap" comes to mind) ...

QUESTION: how can you expect technicians to learn "troubleshooting" and "problem solving" skills – if the class fails to present any problems to be solved – and never presents any troubles to be shot? ...

anyway – back to our story ...

here's the diabolical trap that has been set up on Bench Number One ... this particular potentiometer has been (intentionally) sized so that the analog input signal can go substantially HIGHER than the "correct" range of 4 to 20 mA ... in fact, this one can get all the way up to about 22 mA ... and here's the unexpected "problem" with that "higher than expected" range ...

the "Scaled Max." entry on the SCP instruction does NOT (I repeat: it does NOT) place any sort of "limit" on how high the "Output" address can be driven ... regardless of what many (as yet – unburned) programmers may truly believe in their own little hearts-of-hearts ...

in simple terms ... if you just happen to (somehow or other) feed in more than the "expected" Input Max. value – then you can indeed get more than the "expected" scaled Output value ...

but there's more to the story ...

the tricky part is that when the SCP (bless its little heart) is doing its mysterious INTERNAL y=mx+b mathematical scaling operations – if it happens to ever cipher up an INTERNAL value greater than (ready for this?) 32,767 - then the SLC-type processor freaks out – and sets the S:5/0 Math Overflow Trap bit ... (most programmers don't expect this INTERNAL calculation effect) ...
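To put a number on "how far is too far", here is a Python sketch that scales an input current through the two ranges from this exercise and compares the result with 32,767. It assumes the input module reads 0 to 20 mA as 0 to 16384, and it only looks at the final scaled value; whatever intermediate values the instruction computes internally are not modelled, so the real limit could be tighter.

```python
INT16_MAX = 32767


def ni4_counts(milliamps):
    """Input data value, assuming 0..20 mA maps linearly onto 0..16384."""
    return milliamps * 16384 / 20


def scaled(counts):
    """Straight-line scaling from 3277..16384 onto 6242..31208, with no limit."""
    slope = (31208 - 6242) / (16384 - 3277)
    return 6242 + (counts - 3277) * slope


def overflows(milliamps):
    """True if the scaled result no longer fits in a signed 16-bit word."""
    return scaled(ni4_counts(milliamps)) > INT16_MAX


for ma in (20, 20.5, 22, 24):
    print(ma, round(scaled(ni4_counts(ma))), overflows(ma))
```

With these numbers the scaled result crosses 32,767 at roughly 21 mA, so both the 22 mA potentiometer on Bench Number One and a transmitter driving about 24 mA on a broken thermocouple would be past the limit.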

so ... did someone happen to program in one of those handy-dandy little "trap unsetter" rungs to keep from faulting the processor? ...

but – this is just a lab experiment – right? ... how (you well might wonder) could something possibly cause that "higher than normal" input signal out there in the real world? ... well, consider this as one possible scenario ...

suppose that Mutt the Maintenance Tech is out there in the field doing some of his "preventive maintenance" chores today ... suppose he's calibrating a thermocouple – one which serves as an input to an SLC-type analog module ... suppose he's taking a tiny little screwdriver – and he's gently tweaking the two tiny little "calibration" screws on the thermocouple transmitter ... suppose that he happens to turn one of those tiny little screws just a tiny little bit too far ...

with the students all gathered around – Ron plays the part of Mutt the Maintenance Tech – slowly twiddles the potentiometer just a little bit too far – faults the processor – and shuts down the plant ...

I always enjoyed teaching this particular lesson in my SLC classes – and I missed doing it when I taught the PLC-5 and ControlLogix systems ... since those platforms don't fault when they have a math overflow - I'd have to come up with other ways to torture (sorry ... I meant to say "teach") the students ...

and incidentally – many (most?) thermocouple transmitters are designed to go "high fire" whenever their thermocouple signal is broken ... in simple terms – if the thermocouple ever breaks – then the transmitter intentionally sends a "fault" type signal of about 24 mA to let the PLC know that there's something definitely wrong out there in the real world ... (see a potential plant-wide shut-down situation in the context of our present discussion?) ...

going further ... an overview ... now each student has a functioning analog INPUT signal (coming in) – and an analog OUTPUT signal (going out) ... well, shucks ... now we're just a hop-skip-and-a-jump from replacing the SCP instruction with something much more powerful ... how about a PID instruction which can use its input signal (Process Variable) from some type of field device – to automatically regulate its output signal (Control Variable) to some other field device ...

so now we can start thinking about having the PLC mathematically control a system for us – based on a Proportional action – an Integral action – and maybe even a Derivative action ...

now that we've got the foundation – we can keep right on building on it ...

oh, well – back to retirement ... it's time to go out and cut the grass ... I think maybe I'll spice things up this week by going around the yard clockwise for a change ... or then again - I'll probably just stick with my standard tried-and-true counter-clockwise direction ... why do anything so radical this late in life? ...

be safe ... be well ...
.

full_workstation_slc_full.jpg
 