Reverse bit order in a word? Controllogix

Peter Nachtwey · Mar 28, 2008

PLCDontUQuitOnMe said:
What I found surprised me. The new, elegant, two rung solution was 2 1/2 times slower than the old, brute force, 17 rung method. Why is this?

My scan time is 200 us with the old code and 500 us with the new (your implementation).

Do the CLR, ADD, LES, and JMP instructions take THAT much more time to execute than simple XIC, OTE, and a single MOV?

I know it doesn't seem right, but I bet the RSLogix programmers have the XIC, XIO, and OTE instructions optimized so that the bit numbers and masks can be hard-coded. That can all be done at compile time in RSLogix. It is hard to make a general-case indirect XIC with a variable bit number nearly as efficient, because the work must be done at run time. Also, for each bit you are doing an extra compare and jump that isn't required with the simple 16x XIC/OTE method. Personally, I would use the XIC/OTE method for only 16 bits, or one of the methods in the bit twiddling hacks link above.
A modern C or C++ compiler will often unroll loops and put the code inline to avoid the compares and jumps.

Note, I would be embarrassed if it took an extra 300 microseconds on our product, but to be honest we don't allow small loops that could possibly cause the PLC or motion controller to get stuck in an infinite loop and fault. I like these questions because I implement these problems on our product just to see how we measure up. I would recommend the following, because it can be done in-line and is very deterministic; the same code is executed regardless of the bit pattern:

Code:

Bits:=(SHR(Bits,1) AND 16#55555555) OR SHL(Bits AND 16#55555555,1);
Bits:=(SHR(Bits,2) AND 16#33333333) OR SHL(Bits AND 16#33333333,2);
Bits:=(SHR(Bits,4) AND 16#0F0F0F0F) OR SHL(Bits AND 16#0F0F0F0F,4);
Bits:=(SHR(Bits,8) AND 16#00FF00FF) OR SHL(Bits AND 16#00FF00FF,8);
Bits:=SHR(Bits,16) OR SHL(Bits,16);
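For anyone who wants to check the masks away from the controller, here is a minimal Python sketch of the same five swap steps. The names `reverse32` and `MASK32` are made up for illustration. Python integers are unbounded, so the explicit `& MASK32` is an assumption standing in for the 32-bit DWORD truncation the controller gets for free.

```python
MASK32 = 0xFFFFFFFF  # assumption: emulate a 32-bit DWORD in Python


def reverse32(bits: int) -> int:
    """Reverse the bit order of a 32-bit value by swapping ever-larger groups."""
    bits &= MASK32
    # Swap adjacent single bits.
    bits = ((bits >> 1) & 0x55555555) | ((bits & 0x55555555) << 1)
    # Swap adjacent pairs of bits.
    bits = ((bits >> 2) & 0x33333333) | ((bits & 0x33333333) << 2)
    # Swap adjacent nibbles.
    bits = ((bits >> 4) & 0x0F0F0F0F) | ((bits & 0x0F0F0F0F) << 4)
    # Swap adjacent bytes.
    bits = ((bits >> 8) & 0x00FF00FF) | ((bits & 0x00FF00FF) << 8)
    # Swap the two 16-bit halves; mask off what the left shift pushed past bit 31.
    bits = (bits >> 16) | ((bits << 16) & MASK32)
    return bits
```

There is no loop and no branch, so the same five lines run whatever the input pattern is, which is the deterministic behaviour described above.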

For those interested in the generated code and the instruction timing, click on the link below and download the .pdf:
http://forum.deltamotion.com/viewtopic.php?f=12&t=30
You can see execution is fast even though our compiler still isn't optimized. There are too many times when Bits is stored at the end of an expression and then reloaded at the beginning of the next one. The compiler does no optimization across expressions.

This takes 12 microseconds on our product, and it is swapping 32 bits. The alternative is swapping 32 bits one bit at a time into another word:

Code:

BitsA.31:=BitsB.0;
BitsA.30:=BitsB.1;
...
BitsB:=BitsA;  // copy all 32 bits back to the original DWORD

This takes about 22 microseconds.

I wonder how an S7 would do using hand-optimized STL.

TConnolly · Mar 28, 2008

PLCDontUQuitOnMe said:
Thanks for the responses!

Alaric: I implemented your simple but brilliant code and it works perfectly. Now I know how to use indirect addressing with a CLX.
The old code I had was just like you suggested, 16 rungs with ACC bits going to temp bits, which are then all moved into the output module.

When I put your new code in I put it above the old and made two jumps. One to go around the new code, and one to go around the old. Then I put a toggle bit with opposite examinations in front of each jump so I could toggle between the two methods.

What I found surprised me. The new, elegant, two rung solution was 2 1/2 times slower than the old, brute force, 17 rung method. Why is this?

My scan time is 200 us with the old code and 500 us with the new (your implementation).

Do the CLR, ADD, LES, and JMP instructions take THAT much more time to execute than simple XIC, OTE, and a single MOV?

That's why I first recommended the brute force method. Loops, comparisons, and branching all take time. I posted the second method mainly to show you how CLX indirect addressing is set up and how you can embed an expression in the indirect address, not as a recommended solution.

Let's take a look at this:

LBL WHILE_LOOP BST XIC COUNTER.ACC.[POINTER] OTE REVERSED.[15-POINTER] NXB ADD POINTER 1 POINTER LES POINTER 16 JMP WHILE_LOOP

When broken down into the steps the processor will execute, it will look something like this:
1) Fetch pointer from memory
2) Build bit mask on pointer value**
3) Fetch Counter.ACC from memory
4) Bitwise AND Counter.ACC with mask
5) Test non-zero
6) If non-zero bit, jump to 16
7) Load 15 into processor register
8) Fetch pointer from memory
9) Subtract pointer from 15
10) Construct mask on result value**
11) NOT mask
12) Fetch reverse word from memory
13) AND with NOTted mask
14) Store reverse word in memory
15) Jump to 23
16) Load 15 into processor register
17) Fetch pointer from memory
18) Subtract pointer from 15
19) Construct mask on result value**
20) Fetch reverse word from memory
21) OR with mask
22) Store reverse word in memory
23) Fetch pointer from memory
24) Load 1
25) Add pointer and 1 together
26) Store pointer
27) Fetch pointer from memory
28) Load 16
29) Compare
30) If less than, jump to 1

**For brevity, I listed building the mask as one step. It will actually take the processor several steps, which it has to do four different times, but you get the idea.

That's just for one time through the loop.
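As a rough illustration of why each pass is expensive, here is a Python sketch of the looped rung. It models only the logic of the steps listed above, not what the Logix firmware actually executes. `reverse16_loop` is a hypothetical name, and the variables simply mirror the tags in the rung.

```python
def reverse16_loop(counter_acc: int) -> int:
    """Emulate the looped rung: one indirectly addressed XIC/OTE per pass."""
    reversed_word = 0
    pointer = 0                                   # CLR POINTER
    while True:                                   # LBL WHILE_LOOP
        src_mask = 1 << pointer                   # mask built at run time
        dst_mask = 1 << (15 - pointer)            # second mask, also built at run time
        if counter_acc & src_mask:                # XIC COUNTER.ACC.[POINTER] is true
            reversed_word |= dst_mask             # OTE sets REVERSED.[15-POINTER]
        else:
            reversed_word &= ~dst_mask & 0xFFFF   # OTE clears REVERSED.[15-POINTER]
        pointer += 1                              # ADD POINTER 1 POINTER
        if not pointer < 16:                      # LES POINTER 16 JMP WHILE_LOOP
            break
    return reversed_word
```

Every pass recomputes two shifts, does the add, and does the compare and jump, and that is repeated 16 times per scan.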

Now look at what the processor will do for
XIC COUNTER.ACC.0 OTE REVERSE.15

1) Load 0001h
2) Fetch Counter.ACC from memory
3) Bitwise AND
4) Test result for non-zero
5) If non-zero, jump to 11
6) Load 7FFFh
7) Load REVERSE
8) Bitwise AND
9) Store result in memory location for REVERSE
10) Jump to 15
11) Load 8000h
12) Fetch REVERSE from memory
13) Bitwise OR
14) Store result in memory location for REVERSE
15) End of rung
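The fixed-address rung can be sketched the same way in Python. The masks are literals, so nothing has to be computed per bit. This models the logic only, not the code the Logix compiler emits, and `rung_bit0_to_bit15`, `RUNG_MASKS`, and `reverse16_unrolled` are made-up names. The list of mask pairs is built once and stands in for the 16 written-out rungs.

```python
def rung_bit0_to_bit15(counter_acc: int, reverse: int) -> int:
    """XIC COUNTER.ACC.0 OTE REVERSE.15 with hard-coded masks."""
    if counter_acc & 0x0001:        # source mask is a constant
        return reverse | 0x8000     # set bit 15
    return reverse & 0x7FFF         # clear bit 15


# Built once, like constants fixed at compile time: (source mask, destination mask).
RUNG_MASKS = [(1 << i, 1 << (15 - i)) for i in range(16)]


def reverse16_unrolled(counter_acc: int) -> int:
    """Apply all 16 fixed rungs in order, with no pointer arithmetic or compare."""
    reverse = 0
    for src_mask, dst_mask in RUNG_MASKS:
        if counter_acc & src_mask:
            reverse |= dst_mask
        else:
            reverse &= ~dst_mask & 0xFFFF
    return reverse
```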

daba · Mar 29, 2008

I came across this problem many years ago.

In the early days of microprocessors, when "techies" were bread-boarding their own "computers", a UK-based computing magazine ran a competition to "... reverse the bits in a byte, using the least number of (machine code) instructions ...".

I think it was based around the Zilog Z80 microprocessor chip, can't quite remember.

I read the results of the competition with interest, and the winner was announced, again I can't remember the winning solution, but what I can vividly recall was a "special mention" for a solution that used just two instructions.

He didn't win, because his solution involved some hardware as well, a 2-port PIO (Parallel Input/Output) chip, in which he suggested wiring bit 0 of the Output port to bit 7 of the input port, bit 1 Out to bit 6 In, etc., etc.

The solution (in general syntax) was :-

OUT (Output Port), byte
IN (Input Port), byte

He got the special mention for his "lateral thinking".
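To make the wiring trick concrete, here is a small Python model of the crossed ports. It is only a sketch: `WIRING` and `through_crossed_port` are hypothetical names, and the dictionary stands in for the physical wires between the output and input pins.

```python
# Output bit i is wired to input bit 7 - i (bit 0 -> bit 7, bit 1 -> bit 6, ...).
WIRING = {out_bit: 7 - out_bit for out_bit in range(8)}


def through_crossed_port(byte: int) -> int:
    """Write a byte to the output port and read it back through the crossed wires."""
    result = 0
    for out_bit, in_bit in WIRING.items():
        if byte & (1 << out_bit):      # this output pin is high
            result |= 1 << in_bit      # so the input pin it is wired to reads high
    return result
```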

This got me wondering if the original problem is based on physical input data to the PLC - if it is, then simply re-wire it ...

Peter Nachtwey · Mar 29, 2008

If I needed this function to swap the bits for I/O, I would have my FPGA designer swap the bits.

PLCDontUQuitOnMe · Mar 31, 2008

Thanks for the detailed info. I will go over this when I get some more time. Also thanks to everyone else for the insightful posts.

