Error detection and Fault handling philosophies

RMA · Nov 23, 2004

Up until now, I've only been involved with relatively small projects with S7 (on my own, anyway) and I've been in the habit of putting all my error checking in a single FB or FC which I then called right at the start of OB1, or right at the end, depending on whether simple blocking of further operation or immediate shutdown to a safe state was required.

For my current project, I've been doing most of the error handling on a per-module basis. The project involves 21 functionally identical capacitor bank power supplies (modules), with different energy capacities, for a system to develop a 100 T magnetic field by dumping the stored energy (max. 49 MJ) through up to four concentric magnetic coils. There are several reasons I chose to do this, but not least among them was the fact that, although I'm working on the basis that ANY fault ANYWHERE shuts the whole system down, given the past history of the customer it's an odds-on certainty (despite the original specification, which is now unrecognisable) that somewhere along the line they're going to request the option to carry on with the experiment if one module faults, on the basis that "OK, we only have 47.6 MJ instead of 49 MJ, but we don't want to waste them, do we!", or whatever!

However, I'm finding that because there are also some common components which need to be included in the error checking, I'm starting to run into contradictions, in particular with regard to acknowledgements.

I'd be very interested to know how you guys with lots of PLC experience plan your fault supervision and error handling.

Not unconnected with this question, as I suspect applies to many people coming from an interrupt-driven background, I'm still not (after about three years working with Step7) entirely comfortable with the cyclical programme characteristics of PLCs. This means that on the one hand, I tend to try and "bend" things to a sort of quasi-interrupt-driven situation, using flags to mark "first time through" or "already done" etc. On the other hand, I suspect it also leads to me feeling more comfortable using SET / RESET commands rather than ASSIGN (=). I followed the discussion in the "To Latch or not to Latch" thread with considerable interest, but got the impression that some of the points raised were very AB specific. I would be very interested if somebody could expand on how far some of those points are relevant to the Step7 world and whether there are any peculiarities in Step7 (analogous to the unused blocks being run during Init, for example) which one should be aware of.

My whole period of working with Step7 has been with companies who because of their history and background would have preferred to have nothing to do with it (beneath their dignity, in most cases), so I got flung in and told to get on with it - it's only a PLC, it can't be that difficult! As a result, while I get by not too badly, I'm well aware that my knowledge is full of some pretty big holes!

By the way, just a few points about the project: this is not a machine which can seriously damage itself or people in the event of something going wrong. In most cases the worst thing that can happen is a few blown fuses on the individual module capacitors. The whole system is encased in a blast-proof building which nobody is allowed in during the experiment, and the modules are built so that in the event of a power down (i.e. emergency stop) the capacitors will be drained to ground in < 5 seconds. As a result, the customer is not going to be too happy with what he regards as "unnecessary" aborts. (Though this will probably only become a priority once they manage to get the magnetic coils to hold together for more than one or two shots!)

Just to complete the picture, when the system is complete and assuming they ever get their coils to stay in one piece (probably the biggest question mark!), the 100 T should be achieved by dumping the capacitors at 24 kV with a peak current (for 10 ms) of 340 kA - should make quite a bang!

JesperMP · Nov 23, 2004

Hi (again) Roy,

my error handling is quite simple.

Everywhere in the program an error condition will generate a "raw" alarm - a bit will be set (with =, not with S). Symbol ALR_xxx

An alarm handling routine picks up the raw alarm bits and sets another alarm bit with memory. Symbol ALM_xxx
The HMI watches the ALM bits and displays a message for each alarm.
When the operator acknowledges the message on the HMI an ACK_xxx bit is sent to the PLC.
The alarm handling routine picks up the ACK bits and resets the ALM bits (provided ALR is no longer on). If ALR is still on, then the acknowledge is memorised and ALM will be reset later, once ALR has gone off.

In the program, I monitor the ALR bits or the ALM bits depending on which suits the object best.
Some objects only need to be stopped while the fault is present (inhibit via the ALR bits), but most objects must only be restarted after the operator has acknowledged (inhibit via the ALM bits).

There are between 200 and 1000 alarms in a project.

Some of the alarms can be considered "less critical". For these I have "ignore" buttons on the HMI. The ALR bit is never set when a particular "ignore" button is activated.

How the process reacts to a certain alarm condition cannot be stated as a simple rule. It must be considered carefully in each case.

For the particular problem you describe, I suggest that you make several alarm levels to suit your requirements.
Example:
HI: Only warning. The process continues.
HIHI: Inhibit. The process is interrupted, but can continue if the operator acknowledges within a time.
HIHIHI: Shutdown.
In addition let the HI, HIHI, and HIHIHI levels be adjustable on the HMI.
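
A rough sketch of the three-level idea, written in Python rather than STL purely to show the logic. The names `Level` and `classify` and the threshold values used in the example are invented for illustration; in a real project the thresholds would be the values adjusted on the HMI.

```python
from enum import IntEnum


class Level(IntEnum):
    OK = 0
    HI = 1       # warning only, the process continues
    HIHI = 2     # inhibit, the operator may acknowledge and continue
    HIHIHI = 3   # shutdown


def classify(value: float, hi: float, hihi: float, hihihi: float) -> Level:
    """Return the highest alarm level whose threshold the value has reached."""
    if not (hi <= hihi <= hihihi):
        raise ValueError("thresholds must satisfy hi <= hihi <= hihihi")
    if value >= hihihi:
        return Level.HIHIHI
    if value >= hihi:
        return Level.HIHI
    if value >= hi:
        return Level.HI
    return Level.OK


# Example with made-up thresholds of 10, 20 and 30:
print(classify(25.0, 10.0, 20.0, 30.0).name)
```

The time-limited acknowledge for the HIHI level is not shown here; it would need a timer on top of this classification.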

JesperMP · Nov 23, 2004

About interrupt or continuous cycle based programming.

If the sequence is even slightly complicated, you should consider making your programming state based.
I do that, and it's a super way to define exactly how a machine or process should work.
I write my own state based code in S7. It is not at all complicated. You just have to understand the concept of "states".
As the state changes instantly from one to another via conditions, you could even consider the conditions to be "interrupts".
Search for "state based" or "state machine".
There is also a shrinkwrapped add-on to STEP7 called HiGraph, but I have never tried it.
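
To make the "states" idea concrete, here is a minimal sketch in Python (not S7 code). The states `IDLE`, `CHARGING`, `READY` and `FAULT` and the input names are hypothetical, chosen only to resemble a charging module. The function is meant to be called once per scan, which is how it fits the cyclic PLC model.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CHARGING = auto()
    READY = auto()
    FAULT = auto()


def next_state(state: State, start: bool, charged: bool,
               fault: bool, reset: bool) -> State:
    """Evaluate the transition conditions once per scan and return the new state."""
    if fault:
        return State.FAULT                 # a fault wins from any state
    if state is State.IDLE and start:
        return State.CHARGING
    if state is State.CHARGING and charged:
        return State.READY
    if state is State.FAULT and reset:
        return State.IDLE                  # only leave FAULT once the fault has gone
    return state                           # no condition met: stay where we are


# Example: a start request moves the module from IDLE to CHARGING.
print(next_state(State.IDLE, start=True, charged=False, fault=False, reset=False))
```

Because only the conditions belonging to the current state are looked at, there is no need for separate "first time through" or "already done" flags; the state itself carries that information.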

RMA · Nov 23, 2004

Thanks Jesper,

that looks like an interesting basis to work with. As I said, where PLCs are concerned, I'm entirely self-taught, so some things just sort of grow, which means that sooner or later you get to a point where the organisation is no longer optimal!

Somewhere, a few months back, I noticed that things were getting unmanageable in STL and persuaded the company to invest in S7Graph. That has turned out to be ideal - lets me turn flow charts into programs with no trouble. The only problems are that 21 parallel paths take up a lot of room on the screen and, more particularly, that it seems to consume enormous PC resources when you go online to observe the program. The HMI PC (2.8 GHz, 512 MB) really struggles, with regular MS error messages, especially if ProTool Runtime is running concurrently.

By the way, I found the button I created about six months ago to run ProTool in a window, so at the moment I'm doing everything on the PC, working quite well so far.

Cheers

Roy

JesperMP · Nov 23, 2004

Just a small note.

GRAPH and HiGRAPH should be a bit different.
As I understand it, GRAPH is more like a sequencer whereas HiGRAPH should be the equivalent of statebased programming.
But I have never worked with either.

dandrade · Nov 23, 2004

Do a search for the words "fail" and "fault" under the names Hester and dandrade.

We started these debates, so you are welcome. Congratulations on recognising the importance of, and the need for, confidence in the results!!!

Jesper, you post correct method it is approval any (any recommend) norms. Jesper, is routine common for set/reset bits?

Have a look at the documents and posts; I will come back to this later, in good time.

JesperMP · Nov 23, 2004

dandrade said:
Jesper, you post correct method it is approval any (any recommend) norms.

I don't have any norms to work to when it comes to alarm handling. Maybe there are some - I don't know.
I figured out my way of doing it over a couple of years. Now I don't want to change it as it is simple and works well for me.

dandrade said:
Jesper, is routine common for set/reset bits?

If you are asking about the alarm handling, then the function is like this:

ALR:    _______--_________ ... ____--------------____

ALM:    _______-------____ ... ____--------------____

ACK:    _____________-____ ... __________-___________

ACKMEM: __________________ ... __________--------____
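
The behaviour in the diagram above can be sketched roughly as follows, in Python rather than STL. The class name `Alarm` and the method name `scan` are made up for the illustration; the four signals are the ALR, ALM, ACK and ACKMEM bits described earlier, and `scan` stands for one pass of the alarm handling routine.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    alm: bool = False      # ALM: alarm bit with memory, watched by the HMI
    ackmem: bool = False   # ACKMEM: acknowledge received while ALR was still on

    def scan(self, alr: bool, ack: bool) -> bool:
        """Process one PLC cycle and return the resulting ALM bit."""
        if alr:
            self.alm = True            # raw alarm latches ALM
        if ack and self.alm:
            self.ackmem = True         # remember the operator's acknowledge
        if self.ackmem and not alr:
            self.alm = False           # acknowledged and fault gone: clear both
            self.ackmem = False
        return self.alm


# Left-hand case of the diagram: short fault, acknowledged after it has gone.
a = Alarm()
print(a.scan(alr=True, ack=False), a.scan(alr=False, ack=False), a.scan(alr=False, ack=True))
```

In the right-hand case the acknowledge arrives while ALR is still on, so ACKMEM holds it and ALM drops in the same scan in which ALR goes off.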

RMA · Nov 23, 2004

GRAPH is a Sequential Function Chart (SFC) language as defined in IEC 61131-3, rather than the state based system of HiGRAPH. It is ideal for my requirements, because the program flow in the system (apart from fault conditions) is linear and I can more or less program straight from my design flow charts. I had never used GRAPH before, but got used to it very quickly. I've never done any genuine state-based programming nor have I used HiGRAPH, so I can't comment on them.

One thing: although you can run GRAPH on a 314, I think you might run out of resources pretty quickly. My biggest GRAPH block, running near the maximum allowed transitions (ca. 250, I believe, I can't find the details just now) uses 23700+ bytes of work memory - that would be a big chunk out of the 314's 48 kB!

By the way, I had to read your flow diagram about three times before I understood it, but I got there in the end!

Ken M · Nov 23, 2004

Roy

Your issue of 'unnecessary aborts' is very important operationally. I remember a similar situation at a power station when the site were using PLCs to monitor the steam being fed to the turbines.

When the turbines were spinning and generating power for the grid, one of the key things to be monitored was the steam temperature. If it got too cool then there was the possibility of wet steam (i.e. steam carrying water droplets) entering the turbines, which could cause significant damage to the blades at 3000 rpm. The key response to this was to dump the wet steam through a divert valve and stop the turbine. This was vital. On the other hand, once you started the shut-down sequence of a running turbine it could be anything up to 30 minutes before it was back up to generating speed again. So a false trip had just cost you 30 mins of power you could have sold. Say each turbine was rated at 350 MW and 6 of them were tripped for no good reason - that's a lot of juice!

The big approach to this was triplication and voting. No one PLC could shut down the turbines by itself, and each PLC had multiple sensors and multiple connections to the others.
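
As a minimal illustration of the voting idea, here is a two-out-of-three vote in Python. The exact voting scheme used at the power station is not stated above, so the 2-out-of-3 majority and the function name `vote_2oo3` are assumptions made for the sketch.

```python
def vote_2oo3(a: bool, b: bool, c: bool) -> bool:
    """Return True only if at least two of the three trip requests agree."""
    return (a and b) or (a and c) or (b and c)


# A single channel asking for a trip is not enough; two are.
print(vote_2oo3(True, False, False), vote_2oo3(True, True, False))
```

With this arrangement one failed sensor or PLC can neither cause a false trip nor block a genuine one.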

In your case, the modularity of your system means that a single 'fault' is probably going to be OK-ish. But how many modules need to fault before you decide on a system shutdown? And how certain can you be of fault signals, especially if they're based on analog values like steam temperature in the case above? If a sensor drifts out of calibration will you know before it's too late?

Instinctively I just feel that fault handling and recovery must start with sound detection of the error. What you choose to do thereafter is the easy bit!

Regards

Ken

RMA · Nov 23, 2004

Hi Ken,

yes, I agree with the comparison. The only "big" project which I've worked on with S7 before this one (and I was part of a team, in that case) was for a 7MW Gas-Turbine generator plant, so I was exposed to some of the problems there.

In my case, most of the possible fault conditions are relatively easy to handle most of the time, because if they are things like missing feedback signals, then in most cases there is plenty of time to recognise the problem and deal with it.

The one place where it gets critical is that up to 15 of the modules can be connected together on a so-called collector, so that all the current can be directed towards one winding of the multi-part magnetic coil. The charging phase will last about 90 seconds (you see what I mean about having plenty of time!) so if anything happens in the first, say, 80 seconds, there is still more than enough time to separate a faulty module from the collector.

Where it gets a bit hairy, is if something goes wrong in the last few seconds of the charging phase. When the Modules are triggered, the voltages on each module must be within 1% of one another, otherwise there is the danger that the thyristor in one of the modules might be retriggered by the oscillating voltage on the collector. This will screw up the pulse form and probably ruin the experiment, but in most cases it shouldn't do any serious damage.
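
A rough sketch of the pre-trigger check, in Python rather than STL. How "within 1% of one another" is defined is an assumption here: the spread between the highest and lowest module voltage is compared with 1% of the highest. The function name `voltages_matched` and the example values are invented for the illustration.

```python
from typing import Sequence


def voltages_matched(voltages: Sequence[float], tolerance: float = 0.01) -> bool:
    """Return True if all module voltages lie within `tolerance` of the highest one."""
    if not voltages:
        raise ValueError("at least one module voltage is required")
    highest = max(voltages)
    lowest = min(voltages)
    if highest <= 0:
        return False                      # nothing charged, so nothing to trigger
    return (highest - lowest) <= tolerance * highest


# Example with made-up readings in volts: a 200 V spread at 24 kV is inside 1%.
print(voltages_matched([24000.0, 23900.0, 23800.0]))
```

The same comparison with a wider tolerance could serve as the "is one module lagging too far behind" check during charging.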

The worst case is if one of the modules for some reason fails to reach its set-point; then the other (up to 14) modules will try to discharge into the lower voltage module through the freewheel diode across the thyristor. If the voltage difference is great enough (what "enough" is we've yet to find out) then the potential for damage to the module is quite high. One of the problems is that the equipment is all one-off custom-built specials, so we have no idea what the charging speed is going to be at different parts of the charging cycle, or even whether all chargers will charge at the same speed, so at the moment I have no idea how long I'm going to have to make a decision. We've just started testing the first module and later this afternoon we'll be switching on the charger for the first time - so if you don't hear from me again, there may have been a loud bang!

Hopefully, by the end of next week we should have some idea what the charging curve looks like, so I'll know what my times are looking like.

Ken M said:
If a sensor drifts out of calibration will you know before it's too late?

That's one of the problems that we've still got to look at.

We tried out the set-point output to the charger yesterday and the SM332 is giving out more than 1% too much. Fortunately it looks to be fairly linear rather than the more usual "S"-curve errors, so it will be relatively easy to compensate. However, it still remains an open question as to whether they will all be the same! On top of that we've got the errors in the charger, both for set-point conversion and for peak charge voltage. The manufacturer is promising 0.1%, but to be on the safe side we've insisted on the possibility to adjust things. They didn't want to say anything about stability though! Another problem could be the condenser hall itself: so far I've not seen any heating or air-conditioning. I assume there will need to be something, if only to avoid condensation problems, but how accurately it will hold temperature remains to be seen.
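
If the output error really is linear, a two-point gain and offset correction is one simple way to compensate. The sketch below is in Python and assumes the measured output follows measured = gain * commanded + offset; the function name `make_compensation` and the calibration figures are invented for the illustration, not measurements from this system.

```python
from typing import Callable


def make_compensation(cmd_lo: float, meas_lo: float,
                      cmd_hi: float, meas_hi: float) -> Callable[[float], float]:
    """Build a function that maps a wanted output to the command that produces it."""
    if cmd_hi == cmd_lo:
        raise ValueError("the two calibration points must differ")
    gain = (meas_hi - meas_lo) / (cmd_hi - cmd_lo)
    offset = meas_lo - gain * cmd_lo
    if gain == 0:
        raise ValueError("the output does not respond to the command")

    def compensate(wanted: float) -> float:
        # Invert measured = gain * commanded + offset.
        return (wanted - offset) / gain

    return compensate


# Made-up calibration: commanding 0 V gives 0 V, commanding 10 V gives 10.2 V.
compensate = make_compensation(0.0, 0.0, 10.0, 10.2)
print(compensate(5.1))   # command needed for a true 5.1 V output
```

One calibration pair per output channel would be needed if the modules turn out not to behave identically.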
