SLC and pointers

cardosocea · Dec 22, 2021

Hello,

I've just come out of a call where a program that was modified two weeks ago threw the processor into fault.

The program runs through the valves and motors of a plant using pointers. We copy the I/O related to each item into 3 B files (Output, Feedback On/Running, Feedback Off/Stopped). A pointer file then holds, at the address for each item's number, the bit number for that item's output and inputs.

Two weeks ago I modified it to add two valves. I tested the modification at the time and had it working, so the logic was being processed.

Today it faulted the processor with an out of bounds error (and the counter number was at the index of the first item I added).

Would anyone have an idea of what could have caused logic that ran before to fault today? The device index runs in a loop, so it's not a case of this logic being reached for the first time today.

L D[AR2P#0.0] · Dec 22, 2021

Processor?

James Mcquade · Dec 22, 2021

your array needs to be increased by 1.
since you said you are running in a loop, let's say the index is 20 words.
at the very end of the cycle, the index counter went to 21, which is not there.
so i always have an extra word in there just for this purpose, to keep it from faulting.
so i increased the array to 21 words
you might try to reset the counter when the index is >= 20 .
it's been several years since i did this, so you may have to work with it a little.
james
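
A minimal sketch of that idea, in Python rather than ladder, assuming a hypothetical 20-entry table indexed 0 to 19 (the names `TABLE_SIZE` and `next_index` are made up for illustration). The index is reset before it can be used one past the end:

```python
TABLE_SIZE = 20  # assumed table length, entries 0..19

def next_index(index, size=TABLE_SIZE):
    """Advance the loop index, resetting to 0 instead of stepping past the last entry."""
    index += 1
    if index >= size:
        index = 0
    return index

table = [0] * TABLE_SIZE
index = 0
for _ in range(3 * TABLE_SIZE):  # three full passes through the table
    table[index] += 1            # always a valid access
    index = next_index(index)
```

Padding the table with a spare entry, as described above, guards against the same off-by-one from the other side.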

cardosocea · Dec 23, 2021

L D[AR2P#0.0] said:
Processor?

It's a 1747-L553C.

The pointer instruction refers to B251:0/[N245:3], in this format.
where B251 is the file that holds a copy of the Digital Inputs and N245:3 holds the bit address.
B251 has 30 words (0-29), so 480 bits (0-479). The indexes were 448 and 449, which sit within the number of bits in that file.
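
As a quick check of that arithmetic, here is a minimal Python sketch (not SLC code; the helper name `split_bit_index` is made up for illustration) that converts a flat bit index into a word and bit position, assuming 16-bit words as in a B file:

```python
WORD_BITS = 16  # B-file words are 16 bits wide

def split_bit_index(bit_index, file_words):
    """Return (word, bit) for a flat bit index, or raise if it falls outside the file."""
    if not 0 <= bit_index < file_words * WORD_BITS:
        raise IndexError(f"bit {bit_index} is outside a {file_words}-word file")
    return divmod(bit_index, WORD_BITS)

print(split_bit_index(448, 30))  # (28, 0)
print(split_bit_index(449, 30))  # (28, 1)
```

So both pointers land in word 28 of a 30-word file, which is inside the file on paper.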

What bugs me is that the rung that it faulted on would have been run before since it's the one checking for faults from the valve feedback. It's gone around since two weeks ago and yesterday threw a fit. :/

James Mcquade said:
your array needs to be increased by 1.
since you said you are running in a loop, let's say the index is 20 words.
at the very end of the cycle, the index counter went to 21, which is not there.
so i always have an extra word in there just for this purpose, to keep it from faulting.
so i increased the array to 21 words
you might try to reset the counter when the index is >= 20 .
it's been several years since i did this, so you may have to work with it a little.
james

Thanks James. The array has 480 bits and I was pointing at bit 448. Why would the logic work for two weeks and fail yesterday? The loop runs every cycle, so any out of bounds should have faulted the processor straight away when the change was made (I only had to change the pointer addresses in the N files).

drbitboy · Dec 23, 2021

So 448 and 449 are the bit addresses of the new valves?

How about the other (Bnnn?) files? Are they all (re-?) sized to 30 (Bnnn:0 to Bnnn:29)?

Would it make sense, from a diagnostic standpoint, to add a rung that detects when the bit address N245:3 is outside the valid range, and when that happens assign it to a non-existent valve, but for an existing bit in the B-files, and also count the events? E.g.



LIM 450 N245:3 -1 BST MOV 479 N245:3 NXB LES ADD F?:0 1 F?:0 BND

Where

F?:0 is a count of how many out-of-range events occur, which will count up to around 16 million, and not overflow after it gets there
479 is the last bit in the B-files, and presumably unused
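
To spell out the two behaviours that rung relies on, here is a rough Python model (an illustration only, not SLC code; `lim` and `f32` are made-up helper names). It assumes LIM is true outside the limits when the low limit is greater than the high limit, and that the F-file counter is a 32-bit float:

```python
import struct

def lim(low, test, high):
    """Model of LIM: when low > high, it is true for values at or outside the limits."""
    if low <= high:
        return low <= test <= high
    return test >= low or test <= high

def f32(x):
    """Round a Python float to 32-bit single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(lim(450, 448, -1))            # False: 448 is a valid bit address
print(lim(450, 9999, -1))           # True: out of range
print(f32(f32(16777216.0) + 1.0))   # 16777216.0, the count stops growing
```

The last line is why the counter tops out around 16 million: past 2^24, adding 1 to a single-precision float no longer changes it.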

Also, a PDF of the program would be useful to the forum participants in diagnosing the problem, if that is possible/allowed.

cardosocea · Dec 23, 2021

drbitboy said:
So 448 and 449 are the bit addresses of the new valves?

How about the other (Bnnn?) files? Are they all (re-?) sized to 30 (Bnnn:0 to Bnnn:29)?

Yes, 448 and 449 are the bit addresses for the valve closed signal.

The output file, for example, only goes up to 19 (20 positions), but the output bit location is 222 and 223, which covers it.

drbitboy said:
Would it make sense, from a diagnostic standpoint, to add a rung that detects when the bit address N245:3 is outside the valid range, and when that happens assign it to a non-existent valve, but for an existing bit in the B-files, and also count the events? E.g.

LIM 450 N245:3 -1 BST MOV 479 N245:3 NXB LES ADD F?:0 1 F?:0 BND

Where

F?:0 is a count of how many out-of-range events occur, which will count up to around 16 million, and not overflow after it gets there

479 is the last bit in the B-files, and presumably unused

The logic already does something similar: if the location isn't used, the value is set to 9999, and the logic checks for NEQ 9999 before checking the signals from the field.
The problem isn't the range check though... the numbers I have set up are inside the space of that file.

drbitboy said:
Also, a PDF of the program would be useful to the forum participants in diagnosing the problem, if that is possible/allowed.

I'll try to get one next year now as I'm off until the 3rd.

drbitboy · Dec 23, 2021

cardosocea said:
...if the location isn't used, the value will be set to 9999. the logic checks for a NEQ to 9999 before checking the signals from the field.

Good, but my suggestion was more about getting some idea of how often that happens.

If the code is something like



LIM 0 N245:3 479 XIC B251:[N245:3] OTE B252[N245:3]

is it possible that the code attempts to evaluate B251:[N245:3] for the XIC anyway, even if its incoming rung is False?

Also, if there is an OTE on the end, it will have to evaluate the indirect address to write a 0 or 1.
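
A rough Python model of that concern (illustration only; the lists stand in for B251 and B252, and whether the processor really skips the XIC on a false rung is an assumption here, not something I have confirmed):

```python
FILE_BITS = 480                 # 30 words * 16 bits
inputs = [False] * FILE_BITS    # stands in for B251
outputs = [False] * FILE_BITS   # stands in for B252

def rung(pointer):
    """LIM -> XIC -> OTE, where the OTE executes whether the rung is true or false."""
    in_range = 0 <= pointer < FILE_BITS
    # Assumption: the XIC is not evaluated when the LIM is already false.
    rung_state = in_range and inputs[pointer]
    # The OTE writes a 0 or a 1 either way, so its address must be resolved.
    outputs[pointer] = rung_state

rung(448)       # fine, writes False to bit 448
# rung(9999)    # would raise IndexError: the write happens even though LIM was false
```

In this model the range check protects the read but not the write.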

I am sure you have already considered all of this, but wanted to bring it up anyway.

cardosocea said:
The problem is [not?] checking the range though... the numbers I have set up are inside the space inside that file.

When it faults, can you go online and see what the current values are?

James Mcquade · Dec 23, 2021

since you said the loop runs every cycle, i am wondering if on this one scan you had a watchdog fault? i will try to find my logic for something similar, it was a message display with 20 words.
i also suspect that your fault may be at bit 447 which is bit 15 of word 27, which is a sign bit. SOMETIMES, the index does not like bit 15 because it is a sign bit. i have run into this before.
regards,
james

OkiePC · Dec 23, 2021

drbitboy said:
is it possible that the code attempts to evaluate B251:[N245:3] for the XIC anyway, even if its incoming rung is False?

This is worth investigating. If the illegal indirect address is in an instruction that requires action on a rung-in-false condition, then it would be a problem. You could get around this with a JMP/LBL pair, but a better plan would be to never stick an illegal value into the indirect address source.

If you have been stuffing it with "9999" and it never faulted before, that makes me think your code is already able to avoid processing the illegal reference.

I had a stumper of a program on a sorting conveyor that ran for two weeks before faulting. It boiled down to a process in an STI file where I was using a register that was also the indirect reference in many places in the main program. It took me a long time to figure out that the STI was updating the value to a "bad" number. I had LIMit checks everywhere I used indirection, so the STI had to run the interrupt during the main scan AFTER the LIMit check and BEFORE the indirect reference in order to cause the fault.

Other changes to add some I/O caused a tiny bump in the overall scan time after which the timing of the STI interrupting the main code at just the right instant became more likely. It faulted about a dozen times in four hours while I puzzled over what could be happening before I remembered the STI file.

So if the program has interrupts that can write to those addresses, it's another place to look.
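
Here is a small Python model of that race (illustration only, not SLC code; the `fire_interrupt` flag simply forces the "STI" to land between the check and the use, and all names are made up):

```python
FILE_BITS = 480            # 30 words * 16 bits
bits = [False] * FILE_BITS
pointer = {"value": 448}   # shared register, also written by the "interrupt"

def interrupt():
    """Stand-in for an STI routine that reuses the same register."""
    pointer["value"] = 9999

def scan_unsafe(fire_interrupt):
    if 0 <= pointer["value"] < FILE_BITS:   # LIM-style check
        if fire_interrupt:
            interrupt()                     # lands between check and use
        return bits[pointer["value"]]       # IndexError if the pointer changed

def scan_safe(fire_interrupt):
    local = pointer["value"]                # take one private copy first
    if 0 <= local < FILE_BITS:
        if fire_interrupt:
            interrupt()
        return bits[local]                  # uses the value that was checked

print(scan_safe(True))  # False: reads bit 448 even though the interrupt fired
```

Copying the shared register into a scratch address that the interrupt never touches, then checking and using only the copy, closes the window in this model.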

drbitboy · Dec 23, 2021

cardosocea said:
I've just come out of a call where a program that was modified two weeks ago threw the processor into fault.

Two weeks ago, I modified it to add two valves, at the time tested this modification and managed to have it working, so the logic was being processed.

Today it faulted the processor with an out of bounds error (and the counter number was at the index of the first item I added).

cardosocea said:
What bugs me is that the rung that it faulted on would have been run before since it's the one checking for faults from the valve feedback. It's gone around since two weeks ago and yesterday threw a fit. :/

K-T (Kepner-Tregoe - yes, I am that old) analysis, and common sense perhaps, would identify these statements as the most relevant data. Tested two weeks ago or not, it sounds like the program was running without faults before the modifications, so something related to the modifications, even tangentially, has to be the prime suspect for causing the recent faults after the modifications.

As OkiePC notes, it could be something innocuous and not directly in the modifications themselves e.g. something that changes how long a scan takes and when an STI fires. It could also be a very rare event, which is why it took two weeks to find.

Comparing the new code, and data files, to the old will be instructive when those PDFs become available.

Other ideas

The new indirect addresses, 448 and 449, are the first two bits in a word - e.g. B251:28 - that had perhaps not been referenced before two weeks ago.

Are there any edge detections/one-shots? Are all arrays containing the memory bits for that logic big enough?
