Good discussion. Unfortunately, I find that once you know the math, it is easy to be frustrated by the inflexibility of a PID. Three knobs to turn isn't much reward for the mathematical sophistication needed to plow through the theory.
The greatness of the PID lies in its intuitiveness and tunability. The P, the I, and the D each consider the process error in a different way and combine to generate the output of the controller.
Proportional gain: controller correction effort is proportional to the present value of the process error. Large temperature error, turn the heater on high.
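As a rough sketch of the P term alone in Python (the function name and the numbers are made up for illustration, not taken from any particular controller):

```python
def p_term(setpoint, measurement, kp):
    """Proportional correction: gain times the present error."""
    error = setpoint - measurement
    return kp * error
```

With kp = 2.0 and a setpoint of 100, a reading of 80 gives an effort of 40, and a reading of 95 gives an effort of 10. The effort shrinks as the error shrinks, which is exactly why P alone tends to stall short of the setpoint.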
Integral gain allows the controller to remember: correction effort is proportional to the integral of the error from the past until the present (although simplistic to the point of inaccuracy, you can think of the integral as the average value of the past error over a certain time window times the length of the time window). I-gain reduces damping and can increase oscillation, since the contribution from the I term depends on a window of the past instead of just the here and the now. However, it can remove steady-state error precisely because it remembers the past.
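To see the steady-state point in numbers, here is a small simulation sketch. Everything in it is an assumption chosen for illustration: a hypothetical first-order heater (tau * dT/dt = -T + u, with temperature measured relative to ambient), a simple Euler step, and no actuator limits.

```python
def simulate(kp, ki, setpoint=100.0, dt=0.01, steps=2000, tau=1.0):
    """Run P or PI control on an assumed first-order heater model.

    Returns the error still remaining at the end of the run.
    """
    temp = 0.0      # temperature above ambient
    integral = 0.0  # running integral of the error
    for _ in range(steps):
        error = setpoint - temp
        integral += error * dt
        u = kp * error + ki * integral    # controller output
        temp += dt * (-temp + u) / tau    # Euler step of the assumed plant
    return setpoint - temp


p_only_error = simulate(kp=4.0, ki=0.0)  # settles near 20: P alone cannot close the gap
pi_error = simulate(kp=4.0, ki=5.0)      # settles near 0: the integral keeps pushing
```

With P only, the heater parks at 80 because the effort needed to hold that temperature requires a nonzero error to produce it. With the I term, the accumulated past error supplies the holding effort, so the present error can go to zero.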
Derivative gain helps the controller anticipate: correction effort is proportional to the present rate of change of the error, so if the error is increasing, the corrective effort is raised accordingly to bring the process back where it belongs. The D-term adds damping, and can allow you to use higher P-gain without excessive oscillation. However, the derivative amplifies noise in the measurements and is sensitive to latency and update rate, which is one reason it is not always used. It is a valuable tuning knob, but can be hampered by inadequate instrumentation.
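Putting the three terms together, a bare-bones textbook-style PID might look like the sketch below. The class and method names are hypothetical, the derivative is a plain backward difference on the error, and there is no output clamping, anti-windup, or derivative filtering, all of which a real controller would usually want.

```python
class PID:
    """Minimal illustrative PID: output = P + I + D on the process error."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0      # accumulated error (the "memory")
        self.prev_error = None   # last error, used for the rate of change

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0     # no history yet on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)
```

The noise problem is visible in the derivative line: if the reading jitters by 0.1 degree between samples taken 0.01 s apart, the raw derivative jumps by 10 degrees per second even though the process has hardly moved, and the D-gain multiplies that straight into the output.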
That's the essence of PID. If you want to know why bigger isn't always better for the three gain terms, or why certain adjustments can cause oscillation or instability, then you need to start diving into the math.