TROUBLESHOOTING METHODS? Interview question

rook

I would like to hear from some of you pros out there about your methods, philosophies, or procedures when troubleshooting a control system. Is it OK to make a quick fix to a problem to get the process back up and running, then implement a more permanent fix? As far as troubleshooting methods and philosophies are concerned, it seems to me that it all depends on the process. Any input is gratefully appreciated.
 
You're right! It largely depends on the system (logic, mechanical, and electrical) and on the symptoms of the problem.

Generally I try to get my guys to start at the output they want to be on or off, then work their way back through the program from there.
 
It just depends...period. If you have a process/machine down with 10 people sitting idle and lost revenue running to thousands of dollars an hour...you do what can be done, adhering to safety.

There will be times you do workarounds that allow the process/machine to run...NOTE: I am not talking about bypassing ANY safety devices.

There have been times when I have refused to make a machine run because I thought the process was unsafe, like when a boiler's safeties were bypassed and the roof was blown off. There have been other times when nothing happened.

You do not want to live with the fact that you have caused harm to another or yourself; at least I don't.

As far as troubleshooting goes, I learned a long time ago to "go to the source". That is a broad statement, but apply it to the symptoms and see what happens.
 
First rule when troubleshooting an electrical control system - check the fuses and circuit breakers, and check that power is available where it should be, including control circuit power.
First rule when troubleshooting a machine controlled by a PLC - the same rule applies.
Second rule - check that the inputs that should be on are on.
 
The other messages indicate a number of excellent issues when debugging. I have some personal opinions about debugging that may be of interest.

1. Don't assume anything is correct. Normally it is the thing you assume to be correct that will make you search for days/weeks. Use a careful process of elimination to determine where the problem exists, such as checking lights on sensors/inputs/programming software for electrical problems, expected operation modes, etc.
2. If you understand the program you are debugging, follow the sequence of operation and determine where it deviates from the expected. Trace this problem to where it deviates and fix the problem.
3. Remember that what you see on the screen in PLC programming software is not instantaneous, and events with a short period are rarely displayed. Find ways to track fast events, such as latches.
4. The worst approach to fixing problems is trial and error. If you don't know what has caused the problem, semi-random program changes are very inefficient and often result in unreliable patches.
5. The best method to reduce debugging is to carefully design the program before writing any code. A corollary is that the best debuggers are those who get a lot of practice because they start programming without planning.
6. Learn what a kludge is and only apply it when necessary.
7. Design and build programs in sections/modules if possible so that they may be added/tested in sections. This confines errors to smaller sections of the program, which makes them easier to locate.
8. Finally, a good programmer may be able to debug well, but they know enough to minimize it through careful program design.
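
To illustrate point 3, here is a minimal sketch in Python (not PLC code) of how a latch catches an event that a slow monitor screen would miss. The `PulseLatch` class and the sample data are made up for illustration; on a real PLC this would be a set/latch instruction driven by the fast signal.

```python
class PulseLatch:
    """Set-dominant latch: once the input is seen on, stay on until reset."""

    def __init__(self):
        self.latched = False

    def scan(self, signal: bool) -> bool:
        # Called once per simulated scan; any True sample sets the latch.
        if signal:
            self.latched = True
        return self.latched

    def reset(self) -> None:
        # Manual reset, e.g. from a maintenance pushbutton.
        self.latched = False


# Simulated scans: the signal is on for a single scan only.
samples = [False, False, True, False, False, False]
latch = PulseLatch()
for s in samples:
    latch.scan(s)

# Looking at the raw signal now shows it off, but the latch
# still records that the pulse occurred.
print("raw signal now:", samples[-1], "latched:", latch.latched)
```

The latch stays set until someone resets it, so it can be checked at leisure after the event.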
 
Yes, excellent list, Hugh...

Also, keep in mind that when something goes wrong, it's quite often only ONE thing. Therefore, if you make a change that doesn't cure the problem, UNDO THAT CHANGE before proceeding!

IOW, don't add any NEW problems to the mix!...


-Eric
 
My approach to troubleshooting a PLC operated production system is simple. Begin with the component where the symptoms appear and then work your way back, keeping in mind that 99% of all problems are I/O related, not the program. Quick fixes are essential when production is down, as long as no safety devices are overridden. Finally, remember that there is a root cause of the failure. The root cause analysis can take hours, days, weeks or even months, depending on the source, but there IS a root cause. It is essential to determine this root cause.
 
Excellent points Hugh.

Point 3 - latches - yes with most PLCs. One thing I like about Omron CX-Programmer is that one can place a differential monitor on a point. The selection is rising or falling. It then counts the number of times the differential event happens. Great idea. Helps commissioning enormously.
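
For PLCs without such a monitor, the same idea can be built in logic. The sketch below is a Python simulation of an edge counter, not CX-Programmer's actual implementation; the `EdgeCounter` name and the sample data are assumptions made for illustration.

```python
class EdgeCounter:
    """Counts rising or falling edges of a boolean signal, one sample per scan."""

    def __init__(self, rising: bool = True):
        self.rising = rising      # True: count off->on, False: count on->off
        self.previous = False     # signal state seen on the previous scan
        self.count = 0

    def scan(self, signal: bool) -> int:
        if self.rising:
            edge = signal and not self.previous
        else:
            edge = self.previous and not signal
        if edge:
            self.count += 1
        self.previous = signal
        return self.count


# Simulated input, one value per scan.
samples = [False, True, True, False, True, False, False, True]
rising = EdgeCounter(rising=True)
falling = EdgeCounter(rising=False)
for s in samples:
    rising.scan(s)
    falling.scan(s)

print("rising edges:", rising.count, "falling edges:", falling.count)
```

A count that is higher than expected is a quick hint of contact bounce or a chattering sensor.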

Design and careful programming are definitely most essential. Properly performed, these 2 points probably contribute most to fast, efficient commissioning and troubleshooting, particularly when the PLC person is the last one in. It is expected that the program should be downloaded and work straight away. We have all copped the lot from time to time, and liquidated damages get waved around like lollies because, although others may have held the PLC person up for months, the job is finished and everyone wants to get practical completion and payment.

May I say that my first point of checking is to turn the PLC into programme mode and check all wiring/inputs by turning them on and checking the lights on the input card and the input in the programme. Essential first point. If the inputs are incorrectly wired it will never work.

The other thing that is obvious is that it is always the programme. When one is called in to troubleshoot a system, everything has been checked by everyone for correct operation. The first thing one cops on arrival is the big stare. As we all know, it is usually an input or output device that causes the problem, but that has to be proved by the programmer.

 
There is an excellent distinction emerging in this thread. There are two primary debugging modes:

Preproduction - While the control system is being developed, there are flaws that are exposed through testing. Hopefully these are (as much as possible) eliminated before production. Fixing these after a machine has entered production is (stressfully) costly and time consuming. During this stage, all system components software/hardware are suspect in problem solving.

Production - These errors are normally related to component failures or to unexpected/untested operation modes. Normally these errors should be assumed to be basic wiring/electrical issues first, before assuming software flaws.

In both cases the system validation/error checking should start at the lowest (electrical) level, and then work to the most abstract (software) level.
 
I always try to follow one rule when debugging a program or a system fault. Start at one end of the event chain, and follow cause and effect back one link at a time until you find a broken one. This technique applies to logic, where you are tracking status and value, as well as to hardware, where the diagnostics include voltage, current, continuity, etc.
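
As a rough sketch of that rule, the Python below walks a made-up chain from the symptom back toward the source and reports where the signal is lost. The link names and readings are hypothetical; in practice each reading would be a meter measurement or a status bit you check by hand.

```python
def find_broken_link(chain, readings):
    """Walk from the symptom back toward the source.

    chain    -- link names ordered from the symptom back to the source
    readings -- dict mapping each link name to True (healthy) or False (dead)

    Returns the dead link closest to the source, i.e. the point where the
    signal is lost, or None if the first link checked is healthy.
    """
    last_bad = None
    for name in chain:
        if readings[name]:
            return last_bad   # first healthy link: the fault is just downstream
        last_bad = name
    return last_bad           # whole chain dead: fault at or before the source


# Hypothetical chain for a motor that will not start.
chain = [
    "motor running",
    "contactor coil energized",
    "PLC output on",
    "PLC input on",
    "sensor output on",
    "24 V at sensor",
]
readings = {
    "motor running": False,
    "contactor coil energized": False,
    "PLC output on": False,
    "PLC input on": False,
    "sensor output on": True,
    "24 V at sensor": True,
}

print("signal lost at:", find_broken_link(chain, readings))
```

Here the sensor is switching but the PLC input is off, which points at the wiring between them or the input card.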
 
Design the debugging into the program

One can always add rungs that check for invalid conditions. These rungs can be activated by a debugging contact. In C programming, we use assert statements that check whether a condition is true. If it is not, the assert calls a subroutine to log the error.
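
A minimal sketch of such checks, written in Python as a stand-in for ladder rungs. The input names (`extended_ls`, `retracted_ls`, `motor_run_cmd`, `motor_running_fb`) and the `debug_enabled` flag are invented for illustration and are not from any particular program.

```python
def check_invalid_conditions(inputs: dict, debug_enabled: bool) -> list:
    """Return a list of fault messages for states that should never occur."""
    faults = []
    if not debug_enabled:
        # Equivalent of the debugging contact being off: checks are skipped.
        return faults

    # A cylinder cannot be fully extended and fully retracted at once.
    if inputs["extended_ls"] and inputs["retracted_ls"]:
        faults.append("Both cylinder limit switches are on")

    # Running feedback without a run command suggests a welded contactor
    # or a wiring fault.
    if inputs["motor_running_fb"] and not inputs["motor_run_cmd"]:
        faults.append("Motor running feedback without run command")

    return faults


inputs = {
    "extended_ls": True,
    "retracted_ls": True,
    "motor_run_cmd": False,
    "motor_running_fb": False,
}
for message in check_invalid_conditions(inputs, debug_enabled=True):
    print("FAULT:", message)
```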

State machines should have timeouts for most states. If the state times out, an error message for that state can tell the operator exactly which input or condition that state was waiting for.
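
One way this might look, sketched in Python with timeouts counted in scans to keep the example deterministic. The `Step` and `Sequencer` names, the step list, and the input names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str            # step name shown to the operator
    waiting_for: str     # input that must come on to leave this step
    timeout_scans: int   # scans allowed before the step is declared stuck


class Sequencer:
    def __init__(self, steps):
        self.steps = steps
        self.index = 0
        self.scans_in_step = 0
        self.fault = None

    def scan(self, inputs: dict) -> None:
        # Do nothing once faulted or once the sequence is complete.
        if self.fault or self.index >= len(self.steps):
            return
        step = self.steps[self.index]
        if inputs.get(step.waiting_for, False):
            self.index += 1
            self.scans_in_step = 0
            return
        self.scans_in_step += 1
        if self.scans_in_step >= step.timeout_scans:
            self.fault = (
                f"Step '{step.name}' timed out waiting for "
                f"input '{step.waiting_for}'"
            )


seq = Sequencer([
    Step("Clamp", "clamp_closed_ls", 5),
    Step("Drill down", "drill_down_ls", 5),
])
# The clamp closes, but the drill-down limit switch never makes.
for _ in range(10):
    seq.scan({"clamp_closed_ls": True, "drill_down_ls": False})

print(seq.fault)
```

The fault text names the stuck step and the missing input, so the operator knows where to look before calling anyone.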

On PID loops one can check that the PV is following the SP. One can also check that the output is not saturating although some systems run this way after a step jump.
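
A rough Python sketch of both checks. The `LoopMonitor` name, the tolerance, and the scan counts are assumptions; the delay of `max_scans` is there so that a brief deviation or saturation after a step change does not raise an alarm.

```python
class LoopMonitor:
    """Flags a loop whose PV stays away from SP or whose output stays saturated."""

    def __init__(self, tolerance, max_scans, out_min=0.0, out_max=100.0):
        self.tolerance = tolerance    # allowed |SP - PV| in engineering units
        self.max_scans = max_scans    # consecutive scans before alarming
        self.out_min = out_min
        self.out_max = out_max
        self.deviation_scans = 0
        self.saturated_scans = 0

    def scan(self, sp, pv, output) -> list:
        # Count consecutive scans outside tolerance; reset when back inside.
        if abs(sp - pv) > self.tolerance:
            self.deviation_scans += 1
        else:
            self.deviation_scans = 0

        # Count consecutive scans with the output pinned at a limit.
        if output <= self.out_min or output >= self.out_max:
            self.saturated_scans += 1
        else:
            self.saturated_scans = 0

        alarms = []
        if self.deviation_scans >= self.max_scans:
            alarms.append("PV not following SP")
        if self.saturated_scans >= self.max_scans:
            alarms.append("Output saturated")
        return alarms


monitor = LoopMonitor(tolerance=2.0, max_scans=3)
results = [monitor.scan(sp=50.0, pv=40.0, output=100.0) for _ in range(3)]
print(results[-1])
```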

For fast contact checking, one can log the time a contact goes on or off. Some PLCs have system clocks that allow one to store the time in milliseconds. A simple FIFO can be used. Sometimes two items, the event and the time, must be stored. This can help with a sequence-of-events problem.
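
A small Python sketch of such a log, storing the time, the contact, and its new state in a fixed-depth FIFO. The `EventLog` name, the contact names, and the timestamps are invented; on a PLC the time would come from the system clock.

```python
from collections import deque


class EventLog:
    """Fixed-depth FIFO of (time_ms, contact, new_state) change records."""

    def __init__(self, depth=16):
        # deque with maxlen drops the oldest record when full.
        self.fifo = deque(maxlen=depth)
        self.previous = {}

    def scan(self, time_ms: int, contacts: dict) -> None:
        for name, state in contacts.items():
            # Log only changes, not the first observation of a contact.
            if name in self.previous and self.previous[name] != state:
                self.fifo.append((time_ms, name, state))
            self.previous[name] = state


log = EventLog(depth=16)
log.scan(0, {"guard_door": True, "estop_ok": True})
log.scan(10, {"guard_door": False, "estop_ok": True})
log.scan(20, {"guard_door": False, "estop_ok": False})
log.scan(30, {"guard_door": True, "estop_ok": False})

for time_ms, name, state in log.fifo:
    print(f"{time_ms:>5} ms  {name} -> {'on' if state else 'off'}")
```

Reading the records in order shows which contact moved first, which is usually the question in a sequence-of-events problem.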

Unfortunately these tricks are rarely used.
 
Lots of good information and techniques here. My own method just depends on the problem. For the usual problems that appear out of nowhere, I generally use the method described by Tom Jenkins: start at one end of the system, verify everything, and continue until you find the root cause. As Ron Doran states, never compromise safety; it's just not worth the risk.

The fun problems are the intermittent ones that only happen sometimes and really test your skill and knowledge. If the system has failed and won't run until you find the cause...well, you usually find those rather easily. It's the intermittent problems that become a personal challenge and give me satisfaction when finally solved.

You need to take an analytical approach and separate what it can be from what it can't be, and why. Gather as much information as you can. When did the problem first appear? What changed before then? Is it a component beginning to fail or a design flaw? Your total knowledge of the system and all its components and interactions comes into play. Not everyone can do it, no matter how intelligent they are. It's boring to some people and not always appreciated.

It's OK to make a quick fix as long as you go back and make a permanent fix later. For example, a failed input could be re-routed to a spare input but make sure you go back and replace the faulty component. If man made it, man can fix it.
 
There have been some good points thus far...

Something that I try to do also is find what else doesn't work. Often there are related problems and knowing that another part of the process does not work will help lead the way to what is common between the two problems.

Another mistake I see people make is wanting to change something to make something else work. If the process has been working in the past, it should still work with the same values, program, plumbing, etc.

In the same way, has something changed? Often I've found that a machine may have been working fine before going down for a PM and then will not start a few days later. What was done while it was down?

And as someone mentioned, the trial and error method is usually the worst method. It wastes time, effort, materials and often fails to get anything fixed.
 
Getting back to the poster's original question:

Is it OK to make a quick fix to a problem to get the process back up and running, then implement a more permanent fix?

Ron was right when he said 'it depends'; it most certainly does! Some of you will know the environment that I work in, and we can't afford to have a million £'s or so worth of plant sitting idle when there are ships at the berth (delays to the ships cost many £1000's a minute!). If this is the case we will 'frig' the fault to get the plant running and only do a full repair when we can, as workload allows. But having said that, as Ron stated, safety comes first! The frig will only be applied if it is safe to do so and it does not bypass any of the safety systems in place.

As for all the other answers... In my opinion, Tom Jenkins gave the best one, short and to the point. You can't go far wrong if you follow that way of thinking.

Paul
 
