The bit hack challenge

TConnolly

Lifetime Supporting Member
Join Date
Apr 2005
Location
Salt Lake City
Posts
6,152
Time for another bit hack puzzle.

The following code is how a recycling counter (such as one might use in indexing through an array) is usually done:
I := 1+1
If I > Max_I then I:=0

It is implemented in a for/next or do/while loop. This necessitates a comparison and branching.

However, if for example you want to index a 4 item array, 0 thru 3, then this works really well.
I := (I+1) & 3
This code counts 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2 ....

The general form works for any 2n counts where counting begins at 0 and goes to to 2n-1
I:= (I+1) & (2n-1)
The advantage is that no branching or comparison is involved, as a result it will execute very fast when 2n-1 is a constant.

Now suppose that someone wanted to use some other number. For example, what bit hack will cycle 0 through 5?

The challenge: Pick any integer not equal to (2n-1) and develop a bit hack that can be expressed as a single equation that will count up from 0 to that number and then recyle to 0, or count down from that number to 0 and then reset to that number.
 
Last edited:
Well, it is supposed to be a challenge... 🍺

Besides, MOD involves division, (unless there is binary trick I don't know) and we know how you feel about division.
 
Last edited:
> I would frown on methods other than the
> AND and IF THEN methods you have already shown.

Why would you frown upon a=(a+1)%x? That's extremely common in, say, FIFO implementations in 'real' languages. Performance is not an issue on most modern processors, and will exceed the IF...THEN case in many situations.
 
Look at your Icon and then mine. That says it all.

> I would frown on methods other than the
> AND and IF THEN methods you have already shown.

Why would you frown upon a=(a+1)%x?
I count clock cycles because I/we make motion controllers.

That's extremely common in, say, FIFO implementations in 'real' languages.
Yes, I will use mod instructions if the variable 'a' could be many multiples larger than variable 'x' and the application is not speed critical. If not then I use Alaric's methods. I purposely will make queues or fifos 2^x entries long to make use of Alaric's a=(a+1)&(x-1) trick.

Performance is not an issue on most modern processors,
a=(a+1)&(x-1) works well on any processor.

and will exceed the IF...THEN case in many situations.
[/quote]
Show me!!! As Alaric pointed out, the MOD operator uses a divide instruction or must go through the divide process. This usually takes a one clock cycle for every bit in the divisor let alone the check for divide by 0 and to take the absolute value if required even if done in hardware. The if-then method usually takes about 5 clock cycles, one for the compare and 4 to make the jump.

Many modern risc machines, like ARMs, do not have a divide instruction. However, ARMs have a compare with conditional execution that makes the compare method the fastest method.

BTW, I like your G3xx because they can easily communicate with our controller and it is unique in that it can read/write blocks of data from our plot buffers as opposed to other HMIs, that can only attempt to read one tag at regular intervals to make a plot with a lot of sample jitter.
 
I didn't realize that the if-then takes just five clock cycles - thought it would be more with the jump. I might depend on how efficient the compiler is. I had a teacher back in college that used to have contests to see who could write the fastest code, getting above a certain benchmark could mean getting to skip an exam, so I had incentive to avoid branching anywhere it was possible. That may not have necessarily instilled the best programming habits in the students however it did get us to think creatively.

Normally I don't worry about those things, the time simply spent isn't worth the extra CPU time - but I do enjoy a good bit hack challenge to tinker with from time to time.

I've got a couple of machine right now each with a ControlLogix, Delta RMC and a G306 on it. Right now the G306 talks only to the CLX, but I've been itching to play with it talking to the RMC. I haven't had the time on the machine since, production wanted it, I guess the factory has to make money. Imagine that.
 
It obviously depends on the processor, but if it has a hardware divider the cost of the pipeline stall from the branch can be higher than the cost of the division, especially since the branch is taken most time. (The ARM can use its conditional execution features to avoid the jump, which is even better.) Of course, the real answer is to code using mod and use 2^n buffer sizes, knowing that the compile will reduce the mod to an and, and that you won't have to worry about breaking anything if your buffer sizes change.
 
I didn't realize that the if-then takes just five clock cycles - thought it would be more with the jump.
On our DSP it takes about 1 cycle per instruction, but calls and jumps take 4 cycles. This is because the pipeline is 4 instructions long and when the jump is made the pipeline instructions are 'lost'

I might depend on how efficient the compiler is.
This is a function of the CPU, of course the compiler must be able to take advantage of any tricky things. The DSP we use had delayed jumps and calls, I will not explain those but they essentially reduce the branches, jumps and calls to just 1 clock cycle because the instructions in the pipe line are executed before the jump or call is made.

I had a teacher back in college that used to have contests to see who could write the fastest code, getting above a certain benchmark could mean getting to skip an exam, so I had incentive to avoid branching anywhere it was possible. That may not have necessarily instilled the best programming habits in the students however it did get us to think creatively.
I avoid jumps when possible to avoid the difference in execution time. This makes the execution time more deterministic.

I've got a couple of machine right now each with a ControlLogix, Delta RMC and a G306 on it. Right now the G306 talks only to the CLX, but I've been itching to play with it talking to the RMC.
It works extremely well. No external PLC required as long as you don't need more I/O that what we can put on our controller. The Red Lion compliments our our weaknesses by having the ability to store huge amount of data in flash on a compact flash.

It obviously depends on the processor, but if it has a hardware divider the cost of the pipeline stall from the branch can be higher than the cost of the division, especially since the branch is taken most time.
So show me where a hardware divide is faster than reloading the queue!!!! This could only happen if there are a lot of wait states accessing the new instructions AND the divisor is only 8 bits. Hardware division is still basically a shift and subtract operation and it takes about one clock per bit of the divisor not counting checking for divide by 0 errors, overflow errors and getting the absolute value of the the divisor and dividend.
Modern CPUs also have branch predicting or a compile time optimization that optimizes the jump in a loop. This minimizes the number of pipeline stalls.
 
I = (I + 1) & ((I - 7) >> 16)
Where the 7 is the value to count up to and 16 is half the size of the bit register.
This bit hack utilizes a two’s compliment negative number as a mask which will reset the count when it becomes non-negative. The hack will only work for numbers small enough to be represented using the lower half of the bit register. For example, 65535 or less for a 32 bit register. As Peter mentioned, this hack would be a head scratcher if found in someone else’s program, and would generally be frowned upon.
 
I like it, good for you Brownhat!

This just needs to be documented so that it is a well known trick. On a TI TMSC33 a compare a jmp will take 5 clock cycles and your trick will take 4 clock cycles at best. I would need analyze this to make sure that the 7 can be subtracted from the I on the next clock cylinder without stalling the pipeline. Also, the shift right must happen in one clock cycle so the CPI must have a barrelshifter. If there is a pipeline stall then the tie goes to the obvious and straight forward method. Also, as I pointed out the TI can often to the jump without wasting clock cycles by using a delayed jump.
 
I = (I + 1) & ((I - 7) >> 16)
Where the 7 is the value to count up to and 16 is half the size of the bit register.
This bit hack utilizes a two’s compliment negative number as a mask which will reset the count when it becomes non-negative.

Wow... I would have never come up with that in a million years.

I was being so dense about this one that I didn't believe it would work until after I dumped it into a bash script on my linux box and saw it with my own eyes. It still took me stepping through the example one number at a time until the whole 'reset the count when it becomes non-negative' thing clicked.
 
Wow... I would have never come up with that in a million years.

I was being so dense about this one that I didn't believe it would work until after I dumped it into a bash script on my linux box and saw it with my own eyes. It still took me stepping through the example one number at a time until the whole 'reset the count when it becomes non-negative' thing clicked.

Me neither. And it seemed like it was going to take me nearly that long to figure it out. :oops:

@Brownhat Very clever. (y)
 
Nice one Brownhat. I've been playing with this problem off and on for several days and hadn't come up with a solution yet. I was starting to wonder if a general case solution even existed.
 

Similar Topics

See the screenshot of EIP tag list. We are trying to read in a digital input that is hard-wired. It is shown here as I31.1. I believe we cannot...
Replies
7
Views
285
A couple days ago I accidentally toggled an alwasyoff bit. The issue is it was set up as a single OTU on a rung, nothing else, and used as XICs...
Replies
3
Views
228
Hi I have an old panel builder 32 that I’m replacing for a factory talk me hmi . In the plc for the old panel builder there is a coms bit for the...
Replies
0
Views
76
Hello, Haven't been on in a while. I need to generate a bit level pdf of the I/O for RSLogix 500. I can generate a report but it just shows the...
Replies
1
Views
163
I tried researching but I still don't quite get it. As far as I understood, it's used after a function is called in STL and then if the function...
Replies
1
Views
147
Back
Top Bottom