Cube Root

L D[AR2P#0.0] · Mar 11, 2010

The S7 coding for the initial guess is as follows: (from Kahan, see interesting ref http://www.cims.nyu.edu/~dbindel/class/cs279/qbrt.pdf)

The initial guess for the cubed root of 15625 is 25.57 (as in Peter's previous post)

JesperMP · Mar 11, 2010

I think there is no multiply by 0.33333. It is an integer divide by 3.
I cannot digest Peters c code. I get lost when an integer is multiplied by a 0.33333 (a real number) and it has the same effect as dividing by an integer 3.

In the lecture by Kahan, he writes: "to compute a quick approximation q to y^1/3; compute Q:=C+Y/3 in fixed-point and reinterpret it as floating-point number q."

Peter Nachtwey · Mar 11, 2010

Excellent, L D[AR2,p#0.0] found the source of the trick.

JesperMP said:
I think there is no multiply by 0.33333. It is an integer divide by 3.

Yes but an integer divide by 3 is soooo slooooow.
C converts integers to reals automatically when multiplying by a real. The result get converted back to an unsigned int ( 32 bits ) when it is stored back to the location pointed to by *p. *p is a pointer but C types pointers where STL does not.

I cannot digest Peters c code. I get lost when an integer is multiplied by a 0.33333 (a real number) and it has the same effect as dividing by an integer 3.

That is a result of PLC constrained thinking.

In the lecture by Kahan, he writes: "to compute a quick approximation q to y^1/3; compute Q:=C+Y/3 in fixed-point and reinterpret it as floating-point number q."

Yes, Kahan is the man. Do a search for Kahan cube root. There is a lot to read.

Peter Nachtwey · Mar 11, 2010

OT, optimizing x**3

Alaric said:
Manufacturers that create speedy efficient controllers like the RMC are the exception rather than the rule [/shameless plug]. We shouldn't assume that the PLC will optimize our programs for us, unlike any half way decent C compiler does.

That is true and it is shameful. I would harp on it more but our own compiler is far from being done but those things we need to do quickly we can do quickly.

For example, since I'm not privy to the inner working of XPY except what AB publishes about it (2^Y*log2(X)) then I'll write MUL N7:0 N7:0 instead of XPY N7:0 2 because I know how it will process and have no reason to think that the XPY will do anything except 2^2*log2(N7:0) . Likewise, I convert polynomials from Ax³ + Bx² + Cx + D to ((Ax +B)X +C)X + D because I don't depend on the PLC to due anything at all to optimize it.

Good for you. We gave up on most people converting their polynomial to horner form
http://en.wikipedia.org/wiki/Horner_scheme
so we added a poly() function that allows poly(x,A,B,C,D,E,F,G,H). It is very fast.

Now here is shameless. I have uploaded some pictures showing how our compiler compiles some code. You can see that y=x**3.01 must use the POW function POW(x,3.01) would work too. ST use ** as the power function. For next line, y=x^3.0, the compiler can see 3.0 is an integer so it does a FLDD or floating point load direct from address x and makes two more duplicates on the stack. It then executes two floating point multiplies and does a floating point store direct in y. The next line of code shows there is only a minor difference between y=x**3.0 and y=x**4.0. Notice that only a DUP and FMUL are swapped. Finally a divide by a constant is converted to a multiply by 1/constant in the second compiler pass. FLDC is floating point load constant. The second compiler pass shows the time for each opcode. One can see the POW function is very expensive. 70 is the address of X and 71 is the address of Y.

If you use HP calculators then output of our compiler looks familiar because it is RPN at the point or almost like a Forth interpreter.

Notice that our compiler could remove the pseudo code for the first three lines of ST code because only he last line last line of code has any lasting affect on Y.

There are some aspect of optimizing that I think are quite difficult to do in a PLC. If the variables can be changed during a rung then each time a variable is used in a rung the variable must be read into the CPU. There are also problems with read modify write. This occurs when an interrupt and normal rung try to modify the same variable at the same time. Eventually we will get to the point where we can optimize over the whole step instead of just each line individually. We can do this because our steps are not interruptible.

TConnolly · Mar 11, 2010

Peter Nachtwey said:

Code:

float cuberoot(float a)
{
    float r = a;
    unsigned int* p = (unsigned int *) &r;
    *p = *p*0.33333333333 + 709921077; // 709921077 is the magic number
    float inv3a = 1.0/(3.0*a);
    printf("\na=%10.7f\n",a);
    printf("\nr=%10.7f\n",r);
    r = r*(4*a-r*r*r)*inv3a;
    printf("\nr=%10.7f\n",r);
    r = r*(4*a-r*r*r)*inv3a;
    printf("\nr=%10.7f\n",r);
    r = r*(4*a-r*r*r)*inv3a;
    printf("\nr=%10.7f\n",r);
    r = r*(4*a-r*r*r)*inv3a;
    printf("\nr=%10.7f\n",r);
    r = r*(4*a-r*r*r)*inv3a;
    return r;
}

int main (int argc, const char * argv[])
{
    float a= 15625.0;
    float r0=pow(a,(1.0/3.0));
    float r = cuberoot(a);
    printf("\npow(a,1/3)=%10.7f,  cuberoot(a)=%10.7f\n",r0,r);
    return 0;
}

That is a nice trick for forcing unsigned integer addition on the bit pattern of the IEEE float as though it were a 32 bit integer.

For the AB PLC guys it accomplishes almost the same thing as
BST COP float_tag dint_tag 1 NXB ADD dint_tag 709921077 dint_tag NXB COP dint_tag float_tag 1 BND
except the PLC won't do an unsigned ADD and its a lot more efficient. Instead of moving information around, the pointer P is assigned the memory address of r, and the compiler is told that the word at that address is to be treated as an unsigned integer. I had to look at it for a minute before I realized what it was doing because my initial thought was that someone just deliberately obfuscated the code. Its a work of art.

I'm going to have to remember that one, not that I am ever very likely to use it - I rarely do any C programming these days.

kamenges · Mar 11, 2010

Now we're talking about the right tool for the job

Originally posted by Peter Nachtwey:

That is true and it is shameful.

So what would your standard PLC user do if he went online and what he was looking at looked nothing like what he developed? He would go straight into brainlock. PLCs display active logic the way they do because of the audience looking at the logic. Even something as simple and logical as X^3 being changed to X*X*X would make alot of guys lose focus.

I said before that a tool is just a tool, which is true. But some tools are better than others for a given job. Now we are getting into tool selection.

Keith

Peter Nachtwey · Mar 11, 2010

kamenges said:
So what would your standard PLC user do if he went online and what he was looking at looked nothing like what he developed?

We hide all of that. The user can't see the generated code. They can only see the ST code they wrote. We listened to the forum. We save the source code with comments to flash as well as the binary image. When the user plugs in and goes online they can see the source code and comments stored in flash. The user doesn't care about machine code or pseudo code as long as it does what he wants and is fast enough.

ndzied1 · Mar 12, 2010

Peter Nachtwey said:
If you use HP calculators then output of our compiler looks familiar because it is RPN

I believe Horner is one of the classic examples of why RPN is so much more efficient with keystrokes than algebraic entry calculators.

Peter Nachtwey · Mar 12, 2010

"Someone" asked about the magic number.

I don't know how Kahan came up with the magic number. I know that there can only be (2^32)-1 of them and that isn't many, really. What if I simply tried all 4294967296 of them. Obviously many of the numbers will not even be close to working. "Someone" correctly pointed out that the exponent has an offset of +127 so 2/3s of that would be needed to added back in. So what is the number for (2/3)*127*2^23? It is very close to the magic number. Good insight there "Someone". I am surprised this "Someone" didn't try it. (2/3)*2^23 would be a good place to start looking for a magic number. It would be easy to try finding if the magic number worked for finding the cube root of 2 but what it work for 3? I bet the best magic number for 2 would work for 3 but there may be a slightly better number for 3. So what magic number would work best for all numbers we are trying to take the cuberoot of. Wow, the problem just got much bigger but did it? But think about it. The resulting mantissa of the cube root repeats every 2^3 or every mulitple of 8 so we can do a thorough search searching from 1 to 8. The mantissa will repeat again as the number we are taking the cuberoot of goes from 8 to 64. Now one can simply check for what number provides the minumum worst case estimate for the cube root for the number in the range of 1 through 8.
I mentioned above that the TI DSP we use doesn't use the IEEE format. I went through the same procedure to find a similar magic number for that format.
Perhaps there is some magic way of finding the magic number but I can find the magic number quick enough using little intuition and a lot of brute force.

JesperMP · Mar 12, 2010

Peter Nachtwey said:
Yes but an integer divide by 3 is soooo slooooow.
C converts integers to reals automatically when multiplying by a real. The result get converted back to an unsigned int ( 32 bits ) when it is stored back to the location pointed to by *p. *p is a pointer but C types pointers where STL does not.

It was the type conversion behind the scenes that wasnt clear to me.

Now I have coded cube root in 3 ways.
The "Regular" way, Kahans way with a divide, and Kahans way with conversions and a multiply.

Code:

      L     MD6             // 1st, the classical approach. MD10 = e^(Ln(MD6)/3)
      LN    
      L     3.333333e-001
      *R    
      EXP   
      T     MD10            // this should be an (MD6)^1/3 (as REAL)
      NOP   0               // In theory it will be accurate, but in praxis rounding error will be quite significant
 
      L     MD6             // Kahans method.
      L     3               // load 3
      /D                    // divide
      L     L#709921077     // load magic number
      +D                    // add as DINT (?)
      T     MD14            // MD14 should be an approximated (MD6)^1/3 (as REAL)
 
      L     MD6             // Kahans method. With a multiply in stead of a divide.
      DTR                   // convert REAL to REAL (!? - dont think, just do it)
      L     3.333333e-001   // load 0.33333333333
      *R                    // multiply by 0.333333333 - same as divide by 3
      RND                   // convert back to DINT (but it is a REAL, just close your eyes and do it)
      L     L#709921077     // load magic number
      +D                    // add as DINT
      T     MD18            // MD18 should be an approximated (MD6)^1/3 (as REAL)

Result of cube root of 16525:
Regular way: 25.47106
Kahans way w. divide: 26.11289
Kahans way w. multiply: 26.11283

Not sure why I get a slightly different result as LD. Maybe because I am using PLCSIM, and he is using a real CPU ?

I have not tested the execution times of the above. Could be interesting, but I would need to do it on a real CPU for it to make sense.

Peter Nachtwey · Mar 12, 2010

The "regular' way is not right

MD10 = e^(Ln(MD6)/3)
should produce the exact results not
Regular way: 25.47106
There is a huge rounding errors somewhere in the S7.
Using Scilab:

Code:

-->r=exp(log(a)*0.3333333)
 r  =
 
    24.999992

Scilab using log() instead of ln(). If I multiply by 0.33333333333333 I get the exact answer. Scilab uses 64 bit floats.

JesperMP · Mar 12, 2010

The problem may be with PLCSIM.
I notice that the rounding error for kahans method is different to LD's result in post #31. I think he uses a real CPU.

bernie_carlton · Mar 12, 2010

Mr "somebody" appears after uncrossing eyes staring at a IEEE 754 calculator and the Windows calculator moving numbers back and forth in decimal and hex representations.

My 'guess' sheepishly PM'd to Peter was that the division by 3 of the floating point representation of the original number would be kind of like doing a log/3 (which would give a cube root) but the exponent in the IEEE 754 format is offset by 127 so it would be dividing that 127 offset by 3 also. That had to be corrected by adding 2/3 of that offset back in.

If that were true then the 'magic number' * 1.5, then converted to hex and applied as a bit pattern to an floating point register should give a result of close to - what? I thought it would be '1' but it isn't quite.

I entered '1' into the IEEE 754 calculator (which by the way I found here), taking the hex representation, copying that to the Windows calculator, converting to decimal, multiply by 2/3 gives a number close to but not the same as the 'magic number'.

I'm thinking the unexpressed '1' in the mantissa is messing me up but my eyes (and mind) had crossed by that point. I think this is kind of like what is going on but I don't how to run it further.

TConnolly · Mar 12, 2010

JesperMP said:
[/code]Result of cube root of 16525:
Regular way: 25.47106
Kahans way w. divide: 26.11289
Kahans way w. multiply: 26.11283

Not sure why I get a slightly different result as LD. Maybe because I am using PLCSIM, and he is using a real CPU ?

I have not tested the execution times of the above. Could be interesting, but I would need to do it on a real CPU for it to make sense.

Try 15625.

bernie_carlton · Mar 12, 2010

So0, back to the 'magic number'. The representation of 1.0 times 2/3 seems to make a "fairly good though not the same as Kahan's" magic number.

To test this I went to square root and figured the same estimate method ((FP representation of number taken as a integer)/2 + 'Square Root Magic Number') with Square Root Magic Number being the FP representation of 1.0 times 1/2. This gives me:

Square Root 100 - initial guess = 10.25
Square Root 10000 - initial guess = 103.625

I may try at some time doing a more exhaustive test with other roots and other numbers for tested roots.

So, the general derivation of this simplistic MN for a integral root is (FP representation of 1.0 times ((root - 1)/root))

As a side note the non-rigid typing of memory in the AutomationDirect core line of PLCs makes this dual typed use of a memory space somewhat easier than those with strongly typed memory.

Cube Root

L D[AR2P#0.0]

Lifetime Supporting Member

JesperMP

Lifetime Supporting Member + Moderator

Peter Nachtwey

Member

Peter Nachtwey

Member

TConnolly

Lifetime Supporting Member

kamenges

Member

Peter Nachtwey

Member

ndzied1

Lifetime Supporting Member

Peter Nachtwey

Member

JesperMP

Lifetime Supporting Member + Moderator

Peter Nachtwey

Member

JesperMP

Lifetime Supporting Member + Moderator

bernie_carlton

Lifetime Supporting Member + Moderator

TConnolly

Lifetime Supporting Member

bernie_carlton

Lifetime Supporting Member + Moderator

Similar Topics