Step7 - funny problems causing CPU crashes

RMA · Aug 21, 2007

Would anybody like to have a look at the attached source file and see if they can come up with a suggestion for what's wrong with it?

Initially I had the situation where everything was working fine and then I started to add and modify things and after one modification I set my test trigger Bit M1999.0 and the CPU went into STOP because of cycle time > 150 ms. I restarted the CPU and without changing anything tried again, but now everything worked fine - several hundred times. Then I made another change and, hey presto, first trigger - STOP due to cycle time. Restart, try again - runs fine, several hundred times!

This happened four or five times before I landed at the point I'm now at where the CPU always goes into halt when I set the trigger Bit.

From the way the fault developed I reckon it feels like a TEMP or AR1 problem, but I'm blowed if I can see where!

Thanks in advance (says he, blithely expecting the problem to be solved when he comes back into the office tomorrow morning!).

Cheers

Roy

L D[AR2P#0.0] · Aug 21, 2007

Roy, I've followed through the FC you posted and there are no backwards jumps present in the code so I don't think there is anything wrong here. The only backwards flow control is with the LOOP instruction and this is coded correctly (see end ^Note).
How many times is this FC being called?
Are you monitoring any other blocks or VAT tables at the same time - maybe the 315 is just running out of puff. You could try temporarily extending the cycle time monitoring to say 1000 msec and see what happens.

^Note - I spotted a coding error just after the loop - see image below:

parky · Aug 21, 2007

Not really looked too closely at the code but if it's going out on cycle time then there are a number of possibilities.

1. Too many loops: remember that every time you loop back it increases your cycle time. Check how many loops it can run, i.e. in some instances it could be infinite.
2. If the whole code is very large with many jumps back, then the original code could be so large that it is close to the max scan time watchdog set in the processor (increase the scan time to see if that cures it).

Nick B · Aug 21, 2007

Hi Roy,

Are you possibly using m1900.0 somewhere else in your project? Also what version of 315 and firmware are you using?

Nick

RMA · Aug 22, 2007

@Simon, I don't know what's happened there, I just checked the original program and it's correct - L [AR1,P#0.0] - as it more or less had to be, since I cut and pasted it from the loop after noticing that my average was incorrect, because I wasn't including Word 0 and had therefore only summed 255 Words, not 256.

@Nick, I'm pretty sure I'm not, but because I was a bit worried that there might be some OEM original code that was addressing memory indirectly and so not visible in X-Ref, I changed my original Marker addresses from 999/1000, etc., so I don't think that's the problem.

@Parky, the only (intentional!) loop in the program runs 256 times. According to the basic 315 specs a single pass ought to be somewhere around 1.5 µs, so that shouldn't be a problem. The program is called every cycle, but the Trigger Bit which causes full execution of the program - #Update_TZ - only occurs for a single cycle, about once a minute.
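As a back-of-envelope check of that estimate, here is a small sketch in Python (not STL, just so the arithmetic can be run anywhere). The 1.5 µs per pass and the 150 ms watchdog are the figures quoted in this thread; the variable names are made up for illustration, and the real per-pass time depends on the instructions in the loop and the CPU version.

```python
# Rough cost of the intentional loop compared with the cycle time watchdog.
# The figures are the ones quoted in the thread; treat them as estimates.
LOOP_PASSES = 256       # intentional loop count
PASS_TIME_US = 1.5      # quoted estimate for one pass, in microseconds
WATCHDOG_MS = 150       # cycle time monitoring that puts the CPU into STOP

# Time spent in the loop on a cycle where the trigger bit is set.
loop_time_ms = LOOP_PASSES * PASS_TIME_US / 1000.0

# Number of passes at that speed needed to use up the whole watchdog time.
passes_to_trip = WATCHDOG_MS * 1000 / PASS_TIME_US

print(f"loop time: {loop_time_ms:.3f} ms")
print(f"passes needed to reach the watchdog: {passes_to_trip:.0f}")
```

On these numbers the loop costs well under half a millisecond, so a correctly terminating 256-pass loop cannot account for a 150 ms overrun by itself; something would have to keep the CPU busy for on the order of a hundred thousand passes.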

One thing I didn't mention in the original post, to keep it from getting overloaded: when the CPU is in STOP, the details in the Stack folder of the diagnostic buffer are rubbish. Initially, where I had the situation that following a program change the CPU would crash once then run normally, everything looked okay, at least as far as the sequence of program calls was concerned - OB1 -> FC123 (the so-called supervisor FC) -> FC14 the "work" SFC and finally FC2. The only thing that was a bit odd was the position of the cursor when FC2 was opened - the cursor was always at the first TAK instruction at the start of NW1, however the loop counter in the TEMP stack (Byte 1) was somewhere between 213 and 219, which seemed a bit odd.

Now, however, the B-Stack shows the weirdest things, OB1 - FC14 are the same as usual, but the final block which theoretically should have caused the crash (or am I on the wrong track there, when the problem is cycle time?), has been a weird selection of FCs - including FC14 a second time - FC14 is not re-entrant! The funny thing is that the Stack correctly shows that DB2 is open at this time (always, regardless which FC is allegedly active), which is correct, and DB2 didn't exist before I created this program, so there is no possibility that it is being called from elsewhere. Frankly, I reckon that whatever is being screwed up to cause the crash is also, or perhaps as a result, screwing up the data which lands in the B-Stack.

Still looking,

Roy

L D[AR2P#0.0] · Aug 22, 2007

Sorry not to be explicit, but the code circled in red is what you should be using and not the L [AR1,P#0.0]. This is because AR1 will still point to DBW2 when the loop exits, but your comment stated that it should be loading DBW0.
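The FC itself was only attached as a file, so the following is a guess at the shape of the loop rather than Roy's actual code. It is a toy model in Python, assuming the pointer is rebuilt from the loop counter on every pass (counter × 2 bytes) and that the counter runs from 255 down to 1; the data block contents and all names are hypothetical.

```python
# Toy model: the DB is a list of 256 words, "AR1" is a byte offset into it.
# ASSUMPTION: the loop builds the pointer from the loop counter each pass.
WORD_BYTES = 2
db = list(range(1000, 1256))      # hypothetical contents of DBW0 .. DBW510

def load_word(byte_offset):
    """Model of loading the word that the pointer currently addresses."""
    return db[byte_offset // WORD_BYTES]

total = 0
words_summed = 0
ar1 = None
counter = 255                     # a count-down loop ends when this hits 0
while counter != 0:
    ar1 = counter * WORD_BYTES    # pointer derived from the counter
    total += load_word(ar1)
    words_summed += 1
    counter -= 1

reused_pointer = load_word(ar1)   # what a pasted indirect load reads afterwards
explicit_dbw0 = load_word(0)      # what an explicit DBW0 access reads

print(words_summed, ar1, reused_pointer, explicit_dbw0)
```

In this model the last pass runs with the counter at 1, so the pointer is left at byte 2. Reusing the indirect load after the loop therefore reads DBW2 a second time, while the missing word is DBW0, which has to be addressed explicitly.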

Are you going to extend the cycle time monitoring?

RMA · Aug 22, 2007

Oops, sorry, obviously didn't check that closely enough!

I've just discovered that the line's down at the moment while some other mods are made so I can try extending the cycle time. I'll bang it up to 1 sec, because I'm pretty sure that whatever's wrong is causing a permanent loop, which never ends.

Should be back soon.

Cheers

Roy

RMA · Aug 22, 2007

I thought I was going to be back in about five minutes - that would have been too easy I guess!

I modified the Timeout to 1500 ms and tried to download the SDB, but that didn't work because - according to the error message - some test applications were running. Checked everything, nothing to be found, even the networked PC was clean. That was when I discovered I couldn't get the PLC back into run because of an inconsistent SDB! I finally finished up having to pull the MMC to force a complete reset, to get up and running again!

Anyway, I then tried FC2 again in test mode and needless to say, it now worked fine, so I modified it back to use the correct Trigger Bits and watched a dozen or so WTs sail through the station with everything working fine, except that the Cognex camera was failing every second time. I came to the conclusion that the complete reset + reloading the program may have cleared some spurious fault, set the Timeout back to 150 ms and went chasing the Cognex problem. It took the best part of half an hour to discover I'd been bitten by the blasted Initial Values again - the receive length was back down to 4 Bytes instead of the 25 I changed it to a month or so back - aaargh! RAM to ROM wouldn't help here, or to be more precise I don't dare use it, because I did it out of habit fairly early in my tenure here and some time later when the CPU was stopped and restarted for some reason, the station wouldn't initialise, because some of the Initial Values were rubbish - that cost a few hours as well!

Anyway by this time we had about 40 WTs through the station and I was more or less on my way back here to report, when the CPU went into STOP again - with 150 ms of course!

So I've now set the Timeout back to 1500 ms and so far after about another 20-odd WTs we've got typical cycle times of 25 ms and a max of 36 ms - and now I'm waiting for the next fault.

I'll let you know when it comes - I'm pretty sure it'll be "when", not "if"!

Cheers

Roy

L D[AR2P#0.0] · Aug 22, 2007

Whilst this won't fix your problem, please see below another implementation of your summation loop. I generally preload AR1 to the start of the area I am accessing and then increment it by the size of the variable being accessed, rather than using the loop counter and manipulating it. Note also that with two accumulators you don't need a temp variable.
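The posted implementation was an image and is not reproduced here. As a stand-in, this is a minimal Python sketch of the idea described above (start the pointer at the beginning of the area, then advance it by the element size on each pass). It is not the STL that was posted; the function name and the data are hypothetical.

```python
# Pointer-increment summation, modelled with a byte offset into a word list.
WORD_BYTES = 2
db = list(range(1000, 1256))       # hypothetical 256-word data area

def sum_words(data, count):
    """Sum `count` words, walking a pointer forward from the start."""
    pointer = 0                    # preloaded to the start of the area
    total = 0
    for _ in range(count):
        total += data[pointer // WORD_BYTES]
        pointer += WORD_BYTES      # step by the size of one word
    return total, pointer

total, end_pointer = sum_words(db, 256)
average = total / 256

print(total, end_pointer, average)
```

Because the pointer starts at offset 0 and is never derived from the loop counter, word 0 is included automatically, and the count only controls how many passes are made.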

RMA · Aug 22, 2007

Thanks Simon, that's neat with the "+AR1 P#2.0". That belongs in the category of "obvious once you've seen it"!

So far we're up to 170 program executions and counting; in an hour or so we'll be getting to the point where we wrap round in the ring-buffer.

One thing slowly simmering in the back of my mind is the question of whether I may have wakened a slumbering uninitialised TEMP in some other FC. You may remember that that happened to me in Rossendorf when a program that had run reliably for months suddenly started giving trouble, because I hadn't initialised the Temps. I hadn't changed anything in that FC, but I had added a load of other programs in between times.
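For anyone who hasn't been bitten by this yet, here is a toy Python model of the effect. It assumes only that TEMP variables live in a shared scratch area that is not cleared between block calls; the function names and values are invented for illustration.

```python
# Toy model of a shared local-data (TEMP) area reused by successive blocks.
# ASSUMPTION: nothing clears this memory between one block call and the next.
local_data = [0] * 4

def fc_new_block(value):
    """A later-added block that uses TEMP word 0 and leaves its value behind."""
    local_data[0] = value

def fc_careless():
    """Reads TEMP word 0 without writing it first."""
    return local_data[0]

def fc_careful():
    """Writes TEMP word 0 before reading it."""
    local_data[0] = 0
    return local_data[0]

before = fc_careless()    # leftover happens to be 0, so the block "works"
fc_new_block(1234)        # new code now runs earlier in the same cycle
after = fc_careless()     # same unchanged block now sees 1234
safe = fc_careful()       # initialising first gives 0 regardless

print(before, after, safe)
```

The careless block's own code never changes, yet its result changes as soon as another block leaves something different in the same scratch location, which matches the "ran for months, then suddenly didn't" pattern.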

Cheers

Roy

RMA · Aug 23, 2007

Just a sudden thought.

We're still up and running without any problems and, having had a look at my VAT, I noticed that we haven't had any fault conditions (Marker Bits M1999.2 & M1999.3 are still FALSE).

I haven't finished the mods for the OP, a TP170B, yet, so I intended to block the OP calls for the time being. Unfortunately, I didn't do this correctly, so that when testing we were issuing calls with Job 51 to jump to screens that didn't yet exist in the TP170B. Is it possible that these calls could have caused the CPU to hang up, so causing the cycle time fault and putting the CPU into STOP? At the moment I've got the Time-out back up at 1500 ms, but so far the longest cycle is 42 ms.

Cheers

Roy
