machine faults

Jieve

Hello Guys,

Question: what are common fault checks and timeouts that you program into your machinery for different components? Does anyone have a list they might be willing to share, for things such as pneumatic cylinders not extending/retracting within a certain amount of time, motor contactors not switching, circuit breakers tripping, etc.? I was told that it is the PLC programmer's job to consider all of the possible fault scenarios, and a former coworker told me he once programmed part of an assembly line with over 1800 faults. As somewhat of a newbie, I'm wondering if there are standard faults for component/sensor combinations that are very routine. I can't imagine myself sitting there and coming up with 1800 possible faults. Any answers would be appreciated.

Thanks!
 
There is no real definitive answer I can give to this. The field devices will respond as quickly as they are able to, and all of them will be different.

Most contactors will provide feedback very quickly, so I would say a reasonable "grace" time would be in the order of 0.5 seconds before you trip the alarm. This allows for a reasonable PLC scan time if the I/O is Synchronous.

Pneumatically operated valves will obviously take longer, and depend on several factors - solenoid valve size, manifold port capacity, air tubing size, distance from the solenoid valve, tightness of valve seals etc. etc. Even two identical valves will take different times to open and close, for many reasons.

What we always did was start with (pre-commissioning) default grace times of 5 seconds to open, 5 seconds to close, and adjust those that took longer during commissioning.

When we started using low-power solenoids/multi-way manifolds we got caught out by exhaust capacity in the event of an "emergency stop" - i.e. so many valves trying to close simultaneously through the manifolds and the panel's exhaust ports that they all tripped the alarm "Failed to Close" (in time). We had to overcome this in software, by inhibiting the alarm during emergency shutdowns. Of course a restart was inhibited unless the valves were all in the correct state and not "in alarm".

In summary, define what is reasonable for each actuated plant item to go to its commanded state, and program your alarm grace times accordingly. Override these times in "abnormal" conditions, such as an emergency shutdown.
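As a rough illustration of the grace-time idea above, here is a minimal Python sketch (not ladder logic, and not the poster's actual code). The class name `FeedbackMonitor`, the `inhibit` flag, and the `restart_allowed` helper are hypothetical names chosen for this example; the 5-second default is the pre-commissioning figure mentioned above.

```python
from dataclasses import dataclass


@dataclass
class FeedbackMonitor:
    """Alarm when position feedback disagrees with the command for too long."""
    grace_ms: int = 5000   # 5 s pre-commissioning default, tuned per device later
    elapsed_ms: int = 0
    alarm: bool = False    # latched until reset() is called

    def scan(self, commanded: bool, feedback: bool, dt_ms: int,
             inhibit: bool = False) -> bool:
        """Call once per scan; dt_ms is the time since the previous scan."""
        if commanded == feedback or inhibit:
            # In the commanded state, or the alarm is suppressed
            # (for example during an emergency shutdown).
            self.elapsed_ms = 0
        else:
            self.elapsed_ms += dt_ms
            if self.elapsed_ms >= self.grace_ms:
                self.alarm = True
        return self.alarm

    def reset(self) -> None:
        self.alarm = False
        self.elapsed_ms = 0


def restart_allowed(monitors, commanded, feedback) -> bool:
    """Permit a restart only if nothing is in alarm and every device
    is in its commanded state."""
    no_alarms = not any(m.alarm for m in monitors)
    return no_alarms and list(commanded) == list(feedback)
```

Time is passed in as integer milliseconds so the sketch does not depend on a wall clock; in a real controller the same job would normally be done by an on-delay timer per device.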

HTH
 
You have identified the most common failure detection mechanisms already. Usually, every machine element that moves has a failure mode you will want to detect and display to the operator.

Every motor has a starter or a drive. Most have a coupling, too.

Most discrete sensors should be made true or false within a certain period of time after the machine element is started or stopped.

Most sensors also have times when they shouldn't be going true or false; if they are, that means they might be damaged or defective.

You'll find that controller-detected fault conditions multiply quickly if you consider them thoroughly; a Motor Starter might have a simple "did the aux contact turn on in 250 ms" check to be sure the motor contactor solenoid is working, and it might have the same check for the aux contact turning off, and both of those checks for both forward and reverse.

I always number the fault on the HMI, so the user can find it by number rather than only by description.
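To show how quickly the checks multiply for even one reversing starter, here is a small Python sketch that generates a numbered fault table. The function name `build_starter_faults`, the motor tag, and the numbering scheme are assumptions for illustration only; the 250 ms figure is the one quoted above.

```python
from itertools import product

AUX_TIMEOUT_MS = 250  # the "did the aux contact turn on in 250 ms" check


def build_starter_faults(motor: str, first_number: int) -> dict:
    """Return {fault_number: description} for one reversing starter.

    Two directions times two transitions gives four faults per starter.
    The numbering is hypothetical; use whatever scheme your HMI expects.
    """
    faults = {}
    number = first_number
    for direction, transition in product(("forward", "reverse"),
                                         ("close", "open")):
        faults[number] = (
            f"{motor}: {direction} aux contact failed to {transition} "
            f"within {AUX_TIMEOUT_MS} ms"
        )
        number += 1
    return faults


if __name__ == "__main__":
    for num, text in build_starter_faults("M1", 100).items():
        print(num, text)
```

Generating the table from the device list, rather than typing each entry, is one way to keep the fault numbers and descriptions consistent between the controller and the HMI.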

I find my thinking on fault detection and alarming is directed by my thinking about functional safety. As an automation engineer, by the time you've been through the risk assessment you are already thinking closely about the consequences of single failures and dependent failures.

I just did a simple machine with just six limit switches, three valves, and two motors. I ended up with thirty-two different faults or warnings based on the things I thought could go wrong.

Then my co-worker did what we call the "evil monkey" test: he started throwing switches out of sequence and pressing limit switches when there were no machine elements nearby, and turning down the air supply to half pressure.

It took me another couple of days to improve the robustness of my sequencing logic, so that it ignored out-of-sequence triggers of the limit switches, but showed an alarm notifying the operator that they'd happened. And I added another dozen warnings and a half-dozen alarms.

All of these are a pain to keep track of and do the paperwork on verification and validation, and they chew up time and memory in the controller and HMI system.

But they're worth it; they tend to make your machine faster to diagnose and more reliable, even if 99% of the faults never happen in practice.
 
Some other considerations:

I find the vast majority of my sequential projects can very easily accommodate most fault conditions with a single watchdog timer. If the sequence gets stuck on one step too long, the watchdog times out and the current step # is loaded into the AutoFault register. This way, I have fault text allocated for every step, tagged with the step #. In these cases I only put a quicker fault detection on things where there might be a greater risk of damage or safety (these tend to be infrequent). During the steps of a sequence that wait indefinitely for some action, I include an inhibit on the watchdog. In these cases I like to make sure there is some messaging on the screen to indicate what the machine is waiting for in order to continue.
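The single-watchdog approach could look something like the following Python sketch. It assumes step numbers start at 1 so that 0 can mean "no fault"; the class name, the `auto_fault` attribute (standing in for the AutoFault register), and the step texts are hypothetical.

```python
class SequenceWatchdog:
    """One timer for the whole sequence: restarts on every step change."""

    def __init__(self, timeout_ms: int):
        self.timeout_ms = timeout_ms
        self.elapsed_ms = 0
        self.last_step = None
        self.auto_fault = 0  # 0 = no fault, otherwise the step that stalled

    def scan(self, step: int, dt_ms: int, inhibit: bool = False) -> int:
        if step != self.last_step or inhibit:
            # New step, or a step that is allowed to wait indefinitely.
            self.last_step = step
            self.elapsed_ms = 0
        else:
            self.elapsed_ms += dt_ms
            if self.elapsed_ms >= self.timeout_ms and self.auto_fault == 0:
                self.auto_fault = step  # capture the step number once
        return self.auto_fault


# Hypothetical fault text, one entry per step number.
STEP_FAULT_TEXT = {
    10: "Step 10: clamp did not close",
    20: "Step 20: slide did not reach load position",
}


def fault_text(step: int) -> str:
    return STEP_FAULT_TEXT.get(step, f"Step {step}: sequence timed out")
```

Because the fault value is just the step number, the HMI only needs a lookup table keyed by step to show a meaningful message.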

If both retract and extend switches are available, I make separate steps to detect the OFF transition of the one I'm leaving and the ON transition of the one I'm approaching. This gives an extra level of information to diagnose with: if the cylinder didn't leave at all, it more likely indicates a faulty valve, loss of air, or the like, whereas if it did leave but didn't make it to where it was going, it is more likely a mechanical jam or a faulty or misadjusted switch.
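The two-stage diagnosis can be sketched as a simple classification of the switch states at the moment the step times out. This is an illustrative Python function, not the poster's logic; the function name and message wording are made up, and the "both switches on" case is an extra check added here as an assumption.

```python
def diagnose_extend(retracted_sw: bool, extended_sw: bool) -> str:
    """Classify an extend move from the two end-of-stroke switches."""
    if extended_sw and not retracted_sw:
        return "ok: cylinder extended"
    if retracted_sw and not extended_sw:
        # Never left home: suspect the valve, the air supply, or the output.
        return "did not leave retracted: check valve / air supply"
    if not retracted_sw and not extended_sw:
        # Left home but never arrived: suspect a jam or the far switch.
        return "left retracted but did not arrive: check for jam / extend switch"
    # Both switches on at once should be physically impossible.
    return "both switches on: check for damaged or bridged switch"


if __name__ == "__main__":
    print(diagnose_extend(retracted_sw=True, extended_sw=False))
```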


Nuisance fault detection is sometimes just as painful as nonexistent fault detection. Good examples of this are things like flow switches where every now and then the signal might drop for a second or two. I often find myself regretting not giving more liberal times to faults, because they end up causing the machines to go down on spurious events that have little or no consequence. So let "consequence" be a large driving factor in how liberal you are with your times.

Consider whether some faults might deserve a warning period to give an operator an opportunity to fix an issue before it actually becomes a real problem. For example, if you are controlling a VFD over fieldbus, then it is easy to measure the current. It is common to monitor this current to generate faults, but even better, let the operators know when things are approaching that threshold. Give them a chance to oil that chain or grease those bearings before you stop production. Some of these things could possibly wait for the next maintenance period.
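A warning band below the fault threshold can be as simple as the following Python sketch. The function name and the threshold values in the usage lines are hypothetical; real limits would come from the drive and motor data.

```python
from enum import Enum


class Level(Enum):
    OK = "ok"
    WARNING = "warning"   # tell the operator, keep running
    FAULT = "fault"       # stop the machine


def classify_current(amps: float, warn_amps: float, fault_amps: float) -> Level:
    """Compare a measured drive current against a warning and a fault limit."""
    if warn_amps >= fault_amps:
        raise ValueError("warning limit must be below the fault limit")
    if amps >= fault_amps:
        return Level.FAULT
    if amps >= warn_amps:
        return Level.WARNING
    return Level.OK


if __name__ == "__main__":
    # Hypothetical limits: warn at 8 A, fault at 10 A.
    for amps in (5.0, 8.5, 10.2):
        print(amps, classify_current(amps, warn_amps=8.0, fault_amps=10.0).value)
```

In practice each level would usually also sit behind a time delay, so that a short current spike on start-up does not raise a nuisance warning.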

I try to minimize the number of faults that are output based (timers based purely on an output being on and an input condition expected to be made). Only in rare instances will I enable them in Manual mode.
 
For sequences that have multiple dependencies per state and generally follow a sequential order, I like to fall back on the drum sequencer with sequenced inputs. It gives you built-in diagnostics, step tracing, and timing histograms, and you can literally "re-peg" it at run-time to make the machine almost infinitely re-programmable.

I prefer using MEQ for each group of sixteen inputs as opposed to the SQI, and I use a MVM instead of SQO.

Both of these use indirection to do the same thing, but lack the built in step control word and its associated limitations.
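For readers who have not used these instructions, here is a Python sketch of the general idea: a masked compare decides when a step is complete, and a masked move writes that step's outputs. The `meq` and `mvm` functions below follow the masked-equal and masked-move behaviour as I understand it from Logix-style controllers (both values passed through the mask, and destination bits outside the mask left alone); treat that as an assumption and check your controller's instruction reference. The step table, masks, and `scan` function are hypothetical.

```python
WORD = 0xFFFF  # work with 16-bit words, one group of sixteen I/O points


def meq(source: int, mask: int, compare: int) -> bool:
    """Masked equal: compare only the bits selected by the mask."""
    return (source & mask) == (compare & mask)


def mvm(source: int, mask: int, dest: int) -> int:
    """Masked move: copy masked source bits into dest, keep the other bits."""
    return ((dest & ~mask) | (source & mask)) & WORD


# Hypothetical step table: (input mask, required input pattern, output word).
STEPS = [
    (0b0011, 0b0001, 0b0001),
    (0b0011, 0b0010, 0b0010),
]
OUTPUT_MASK = 0b0011  # the sequencer only owns the two lowest output bits


def scan(step: int, inputs: int, outputs: int):
    """Advance when the current step's inputs match, then write outputs."""
    mask, pattern, _ = STEPS[step]
    if meq(inputs, mask, pattern):
        step = (step + 1) % len(STEPS)
    outputs = mvm(STEPS[step][2], OUTPUT_MASK, outputs)
    return step, outputs
```

Because the step conditions live in a data table rather than in rung logic, changing what a step waits for is a matter of editing the table, which is the "re-peg" idea mentioned above.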

I just rewrote a program for a bag inserter; I will test it next week. No more JMP/LBL/OTL logic from hell...

I took the basic generic sequencer posted at MrPLC and added step time control with HMI/recipe control of the times, along with masked inputs and step time recording with limit checking. The core sequencing logic is only about 12 rungs, even with the frills of the HMI showing the remaining states for the step to advance on the PanelView Plus.

That example I posted is not filled out enough to really demonstrate how to apply it. Perhaps I can fluff it up a bit with parts of my work using it on a real machine like the OK220 servo.
 
Whenever you do alarming or software for a new plant, you have to go through a risk assessment to determine which alarms are necessary and which are just flooding your operator. You can allow for every possible combination of alarms, but only a detailed risk assessment of your process will separate the critical from the non-critical.

The idea is to keep it simple and mask out unnecessary alarms that do not improve the process you are controlling and that take your operator's attention away from the important operations he should be watching.

Each process might have different alarms depending on the operation but there are common attributes for pieces of equipment. It will all depend on what sensors you have available to pass information on to the control system.

Rheinhardt
 
As Rheinhardt suggests, it is better to limit the alarms visible to the operator.
You may consider grouping alarms or, assuming you have an HMI display, you might sequence the display to direct the operator to the actual fault.
I have seen this setup in a machine with about 25 overloads: very easy to find and, in this case, to reset from the HMI.
I also recently had to rework a fault design because the previous programmer allowed more than one fault to display instead of issuing only the first (main) fault; this gave the operator the wrong information.
In some cases this can cause the repairman to replace perfectly good apparatus, which is not good and means excessive lost time.
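One common way to point the operator at the root cause is a "first-out" capture: remember the first fault that appeared and show that one, even if others pile up behind it. The Python sketch below is a hypothetical illustration; the class name and the tie-break rule (lowest fault number wins when several appear in the same scan) are assumptions.

```python
class FirstOutAlarm:
    """Remember the first fault seen until it is reset."""

    def __init__(self):
        self.first_fault = None

    def scan(self, active_faults):
        """active_faults: the fault numbers that are active this scan."""
        if self.first_fault is None:
            active = sorted(active_faults)
            if active:
                # Several faults in the same scan: lowest number wins.
                self.first_fault = active[0]
        return self.first_fault

    def reset(self, active_faults) -> bool:
        """Clear the capture only when no fault is still active."""
        if len(set(active_faults)) > 0:
            return False
        self.first_fault = None
        return True
```

With this in place the HMI can show the captured first fault prominently and list any later faults as secondary information.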
 
All very true. The more alarms you can program into the system to signal that "something went wrong", the better and more robust the system, and the more easily it is diagnosed and corrected by the relevant people.

Pumps and valves usually mean a flowrate or a pressure is required (not always), so incorporate this degree of cross-checking in addition to the pumps and valves functioning correctly. It is not unknown for "construction debris" to find its way to a place that causes problems. (Example : On one project I worked on, a vessel was not emptying, we checked pump rotations, valves actually open, etc., and eventually discovered a discarded grinding disc had settled perfectly over the tank outlet, effectively reducing the outlet diameter from 4 inches to about half an inch !! Fishing with about 20 feet of welding wire is "difficult" to say the least, and took over an hour to land that fish!)

Vessels with level probes should have alarming if the level probe signals are "illogical", but take account of the times the vessel is washed (CIP), which can lead to illogical probe states, but in this case it's OK.
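As a small illustration, an "illogical probe" check for a vessel with a low and a high probe might look like this in Python. The function and parameter names are hypothetical, and a real check would normally sit behind a short delay.

```python
def probes_illogical(low_wet: bool, high_wet: bool,
                     cip_active: bool = False) -> bool:
    """True when the high probe is covered but the low probe is not.

    During CIP (washing) the spray can wet the probes in any order,
    so the check is suppressed while cip_active is set.
    """
    if cip_active:
        return False
    return high_wet and not low_wet


if __name__ == "__main__":
    print(probes_illogical(low_wet=False, high_wet=True))
    print(probes_illogical(low_wet=False, high_wet=True, cip_active=True))
```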

Panel selector switches should be checked for one, and only one, input on.
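The selector check is a "one-hot" test, sketched here in Python with a hypothetical function name:

```python
def selector_fault(inputs) -> bool:
    """True unless exactly one selector position input is on."""
    return sum(1 for state in inputs if state) != 1


if __name__ == "__main__":
    print(selector_fault([False, True, False]))   # one position made
    print(selector_fault([True, True, False]))    # two positions made
    print(selector_fault([False, False, False]))  # no position made
```

A selector passes through "no input on" while it is being turned, so in practice this check would usually be delayed slightly before it raises an alarm.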

Panel pushbuttons should be tested for "Stuck On", but be aware that some operators tend to hold a pushbutton down way longer than is needed. It largely depends on the feedback he gets from the plant; his comfort zone will let him release it only when he is 100% convinced the button has done its job.

Playing "evil monkey" is an exercise that was mostly done on projects I got involved with, and certainly proved its worth.
 
When it comes to alarms, then the more the merrier is our philosophy.
In my last big project I had approx 1000 alarms. Normal for us is around 500 alarms.

You can create alarms for all possible standard conditions for an object, and then program a standard function block for that object with all these alarms already included.

I have an "enable" input to my function blocks so that I can mute the alarms when the cause is something other than a problem with the object itself. For example, if a circuit breaker trips, then all objects that are dependent on the power from that circuit breaker will be disabled. Then only the alarm for the circuit breaker will be triggered, and not 20 alarms that are indirectly triggered.
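The effect of the "enable" input can be sketched in Python as follows. This is only an illustration of the suppression idea, not the poster's function block; the function name, the breaker tag `CB1`, and the device names are hypothetical.

```python
def active_alarms(breaker_tripped: bool, device_bad: dict) -> list:
    """Return the alarms to show, suppressing consequences of a tripped breaker.

    device_bad maps a device name to True when its own check has failed.
    Every device here is assumed to be fed from the same breaker.
    """
    alarms = []
    if breaker_tripped:
        alarms.append("CB1 tripped")
    enable = not breaker_tripped  # the "enable" input of each device block
    for name, bad in device_bad.items():
        if enable and bad:
            alarms.append(f"{name} fault")
    return alarms


if __name__ == "__main__":
    devices = {f"Valve{i}": True for i in range(1, 21)}
    print(active_alarms(breaker_tripped=True, device_bad=devices))
```

With the breaker tripped, the twenty dependent valve alarms are held back and only the breaker alarm is reported.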

Also, certain non-critical alarms can be muted by the operator, so that, for example, a defective sensor can be dealt with by maintenance at a later time. If a muting is active, then there will be a warning on the HMI.

In addition to the basic lower-level alarms, we have alarms on the higher level that check for "impossible" activities or a process that has run amok (extremely high temperature, for example).
And we also have a couple of warnings that check that the operators start and stop the process in a certain sequence. The reason we don't just block this automatically is to give the operators some freedom to handle an abnormal situation on their own initiative.
 
Guys, thanks so much for the detailed and informative responses!

Regarding these faults, under what circumstances would you actually stop the process and wait for a repair vs. simply mute the sensor? If a sensor signal that is required to go to the next step in a sequence fails, do you just program in a time-delay override that can be activated via the HMI, or just have the operator acknowledge the fault on the HMI and then mute the sensor from that point on?

I'm not really sure I understand how you can write the program to mute the sensors in advance in case one consistently times out.

Thanks!
 
If the sensor is critically required for a transition from one sequence step to another, then it cannot be muted. If it is only an additional check then it can be muted.
If a sensor is muted, then the operator must visually check that the process runs OK anyway.
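One way to build that rule into the program in advance is to mark each sensor as critical or not, and refuse mute requests for the critical ones. The Python sketch below is a hypothetical illustration; the class names, the `critical` flag, and the `hmi_warning` property are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensor:
    name: str
    critical: bool  # needed for a step transition, so it may never be muted


class MuteManager:
    def __init__(self):
        self.muted = set()

    def request_mute(self, sensor: Sensor) -> bool:
        """Accept the mute only for non-critical sensors."""
        if sensor.critical:
            return False
        self.muted.add(sensor.name)
        return True

    def alarm(self, sensor: Sensor, faulted: bool) -> bool:
        """A muted sensor no longer raises its alarm."""
        return faulted and sensor.name not in self.muted

    @property
    def hmi_warning(self) -> bool:
        """Show a standing warning while anything is muted."""
        return bool(self.muted)
```

The standing warning reminds the operator that a visual check is needed for as long as any mute is active.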
 
Unfortunately, it's really not possible to generically answer your question exactly because we don't know your process. Those are items that need to be discussed during the project definition stage together with those who know the process.
 
Just my 2 cents for what it's worth. It might be more or less depending on exchange rates.

When you monitor a fault, it's a good idea to have it fall into one of three categories.

Critical Fault: This is a fault that causes the machine to immediately stop. E-Stop, guard door opened, jammed components or product, loss of air pressure, etc.

Cycle Stop Fault: This is a fault that will cause the machine to stop at the end of a cycle and in a controlled manner. No components, cycle-stop pressed, etc.

General Fault: This is a fault used to provide information to the operator, warning them of things like low component levels, etc. These faults do not stop the machine unless they are not addressed. If they are not addressed, they become Cycle Stop Faults, which stop the machine in a controlled manner.
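The three categories, including the escalation of an unaddressed General Fault, can be sketched in Python like this. The escalation delay `ESCALATE_AFTER_MS` and the function names are hypothetical; the post does not say how long a General Fault may stand before it escalates.

```python
from enum import Enum


class Category(Enum):
    CRITICAL = "critical"      # stop immediately
    CYCLE_STOP = "cycle_stop"  # stop in a controlled manner at end of cycle
    GENERAL = "general"        # inform the operator only

ESCALATE_AFTER_MS = 60_000  # hypothetical: unaddressed for one minute


def effective_category(category: Category, active_ms: int) -> Category:
    """A General Fault left unaddressed becomes a Cycle Stop Fault."""
    if category is Category.GENERAL and active_ms >= ESCALATE_AFTER_MS:
        return Category.CYCLE_STOP
    return category


def machine_action(categories) -> str:
    """Pick the most severe response among the active fault categories."""
    active = set(categories)
    if Category.CRITICAL in active:
        return "stop_now"
    if Category.CYCLE_STOP in active:
        return "stop_at_cycle_end"
    return "run"
```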

There are other naming conventions used for these scenarios depending on the OEM (Critical, Major, Minor) but for the most part this is what I have had to deal with when programming data collection. And having them placed into categories like this has made my data collection life a lot easier.

There ya go. My 2 cents spent. Hope this helps.

Dave
 
There will be timers for minimum start and stop times for pumps, belts, and even solenoid valves. You may want your sequencer to tick on to the next step, but most solenoid valves need a dwell time of at least 20 ms, and 120 ms is a typical working value. This is a minimum: a dedicated timer whose preset is not changed once it has been adjusted to the system.

For faults, I like the top-down approach: define any all-encompassing faults first, like E-Stop.

I don't care if forty-seven other sensors fail if the e-stop is not enabled; I can filter those subsequent faults.

My fault logic typically has a debounce timer that seals in when done to hold a bit true. If I need to remember through a power cycle, I will use OTL, but only allow the OTU or the seal to be broken if the condition generating the fault no longer exists. I don't need to add a fault to a queue as fast as the operator can try to reset it before calling it in.
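The debounce-then-seal-in pattern can be sketched in Python as below. It is an illustration only: the class name is hypothetical, and the sketch does not model retention through a power cycle, which the OTL approach above provides.

```python
class LatchedFault:
    """Debounce a fault condition, latch it, and gate the reset."""

    def __init__(self, debounce_ms: int):
        self.debounce_ms = debounce_ms
        self.elapsed_ms = 0
        self.latched = False

    def scan(self, condition: bool, dt_ms: int, reset: bool = False) -> bool:
        if condition:
            # Condition present: run the debounce timer, ignore any reset.
            self.elapsed_ms = min(self.elapsed_ms + dt_ms, self.debounce_ms)
            if self.elapsed_ms >= self.debounce_ms:
                self.latched = True
        else:
            self.elapsed_ms = 0
            if reset:
                # The latch may only be broken once the cause has gone.
                self.latched = False
        return self.latched
```

Because a reset is ignored while the condition is still present, an operator hammering the reset button cannot generate a stream of repeat faults.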

For classes:
Warning: Information, e.g. bin low level, machine excess idle minutes.
Alarm: Requires acknowledgement, e.g. deviation of cycle times or process setpoints; may require alteration to the cycle.
Fault: Anything that requires orderly shutdown, e.g. process HighHigh and LowLow limits, e-stop monitoring inputs, overtravel limit switches.

I think it's easier to keep up with meanings with those words at least in central USA. And, those three classes cover 99.9% of the situations I need as a programmer.
 
Safety-related device alarms should NEVER be muted. For your machine operation/process, the only way to determine whether you can do this is to carry out a risk assessment and go over all the possible scenarios that may occur.

This is the only sure way to determine if you should allow a mute or not.

When it comes to actually muting, there are various ways. I have seen a simulate button for devices that came in handy for this functionality; of course, it had engineering-level security, so the decision has to come from above. It can be a bad thing if not managed properly.

Rheinhardt
 