Testing aside, I think you should always allow for someone (possibly more than one person) being online. You don't want your code or processor "failing" because it can't handle online sessions.
I did my own testing differently from Ron. I set up 5 separate subroutine files:
1. No code at all (used as a "baseline")
2. Inline code (same as Ron's)
3. Parallel Branched
4. "Nested" Branched
5. Separate rungs
I then set up a "count" (ADD 1, then MOD 5) that cycles through 0 to 4, and used this value to select which one of 5 FOR instructions executes. Each FOR instruction calls its subroutine file a large number of times (30,000 in my case).
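For anyone who wants to play with the idea away from a controller, here's a rough Python model of that test harness. It's only an analogy, not ladder logic: the five routine bodies are hypothetical placeholders, `run_scans` and the other names are made up for illustration, and `time.perf_counter()` stands in for reading LASTSCANTIME with a GSV. Running the selected FOR first and then doing the ADD 1 / MOD 5 is my assumption about the rung order.

```python
import time

# Hypothetical stand-ins for the five subroutine files (placeholder bodies).
def baseline():            # 1. no code at all
    pass

def inline():              # 2. inline code
    return 1 + 1

def parallel_branched():   # 3. parallel branched
    return 1 + 1

def nested_branched():     # 4. "nested" branched
    return 1 + 1

def separate_rungs():      # 5. separate rungs
    return 1 + 1

ROUTINES = [baseline, inline, parallel_branched, nested_branched, separate_rungs]
ITERATIONS = 30_000  # FOR loop count used in the test

def run_scans(num_scans, iterations=ITERATIONS):
    """Simulate num_scans scans; return (routine name, scan time in ms) per scan."""
    count = 0
    log = []
    for _ in range(num_scans):
        start = time.perf_counter()
        routine = ROUTINES[count]          # select one of the five FOR instructions
        for _ in range(iterations):        # the FOR instruction calling its routine
            routine()
        scan_ms = (time.perf_counter() - start) * 1000.0  # analogous to LASTSCANTIME
        log.append((routine.__name__, scan_ms))
        count = (count + 1) % len(ROUTINES)  # ADD 1, then MOD 5: cycles 0..4
    return log

if __name__ == "__main__":
    for name, ms in run_scans(5):
        print(f"{name:18s} {ms:8.3f} ms")
```

Because only one FOR runs per scan, each scan time belongs to exactly one code type, which is what makes the per-type timing possible.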
By using a GSV to read LASTSCANTIME, I can calculate how long each code type takes to execute.
Being "online" throughout should add the same overhead to every iteration, so it shouldn't skew the comparison.
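One way to turn those scan times into a per-execution figure is to subtract the "no code" baseline scan, so that anything common to both scans cancels out. The sketch below assumes that the online overhead really is the same in every scan (as argued above) and that the FOR/call overhead is identical for every routine; the function name `per_execution_us` is hypothetical, not anything from the controller.

```python
def per_execution_us(scan_ms, baseline_ms, iterations=30_000):
    """Average time per routine call in microseconds, net of the baseline scan.

    Overhead assumed to be present in every scan (rest of the task, being
    online, the FOR/call overhead itself) appears in both scan_ms and
    baseline_ms, so it cancels in the subtraction.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    # ms -> us, then spread across the number of calls made in the scan
    return (scan_ms - baseline_ms) * 1000.0 / iterations
```

Subtracting the baseline is what lets the comparison ignore the constant costs and look only at the code inside each routine.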
Result1 is the "inline" code - fastest
Result2 is the parallel-branched code - second fastest
Result3 is the nested/branched code - third fastest
Result4 is the 4 separate rungs - slowest
From "fastest" to "slowest" there is a difference of approx. 9 ms, but remember that this covers 30,000 executions of the code.
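To put that in perspective, spreading the 9 ms across the 30,000 executions gives the rough difference per execution (rough, since the 9 ms is itself an approximate reading):

$$
\frac{9\ \text{ms}}{30\,000\ \text{executions}} = 3 \times 10^{-4}\ \text{ms} = 0.3\ \mu\text{s per execution}
$$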
In conclusion, there is a difference, and the inline code wins, but it is hardly a storming win!!