I will answer this from another perspective.
The communication channel between the CPU that interprets the program and the actual physical I/O is often many times slower than access to the memory map that represents the outputs (often called the output image or process image).
Thus, during the program scan, the intended outputs are written only to this memory-mapped area. At the end of the program scan, the whole area is sent to the actual physical outputs in one transfer, which saves a lot of time compared with sending each output change separately.
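To make the sequence concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular controller is implemented: `SlowOutputBus`, `program_scan`, and `run_scan` are hypothetical names, and the "bus" is just a counter standing in for the slow link to the output modules.

```python
class SlowOutputBus:
    """Hypothetical stand-in for the slow link to the physical output modules."""

    def __init__(self, size):
        self.physical = [False] * size
        self.transactions = 0  # counts how often the slow link is used

    def write_all(self, image):
        # One block transfer updates every output at once.
        self.physical = list(image)
        self.transactions += 1


def program_scan(inputs, output_image):
    # User logic only touches the in-memory image, never the bus.
    output_image[0] = inputs[0] and inputs[1]
    output_image[1] = not inputs[0]
    output_image[2] = output_image[0] or inputs[2]


def run_scan(bus, inputs):
    output_image = list(bus.physical)  # start from the last known output state
    program_scan(inputs, output_image)
    bus.write_all(output_image)        # end of scan: a single transfer
    return output_image


bus = SlowOutputBus(size=3)
run_scan(bus, inputs=[True, True, False])
print(bus.physical, bus.transactions)  # [True, False, True] 1
```

Three outputs were changed by the logic, yet the slow link was used only once. Writing each output to the hardware as soon as it changed would have cost one transfer per write.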
Some instructions (if they exist in your system), such as 'interrupt I/O read/writes' (often called immediate I/O instructions), can override this sequence and access the physical I/O in the middle of the scan, but at the expense of precious program scan time.
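The trade-off can be sketched the same way. This is again a hypothetical Python model: `OutputBus`, `write_immediate`, and the 200 µs transfer cost are assumptions made up for illustration, not figures from any real system.

```python
class OutputBus:
    """Hypothetical slow output link; each transfer has a fixed cost."""
    TRANSFER_COST_US = 200  # assumed figure, purely illustrative

    def __init__(self, size):
        self.physical = [False] * size
        self.time_spent_us = 0

    def write_all(self, image):
        # Normal end-of-scan update of every output.
        self.physical = list(image)
        self.time_spent_us += self.TRANSFER_COST_US

    def write_immediate(self, index, value):
        # Bypasses the image: the physical output changes mid-scan,
        # but the scan pays for an extra transfer.
        self.physical[index] = value
        self.time_spent_us += self.TRANSFER_COST_US


bus = OutputBus(size=2)
image = [False, False]

image[0] = True                 # normal write: image only, no bus cost yet
bus.write_immediate(1, True)    # urgent output: goes out right now
image[1] = True                 # keep the image consistent with the immediate write
mid_scan = list(bus.physical)   # output 0 is still pending at this point

bus.write_all(image)            # end-of-scan update
print(mid_scan, bus.physical, bus.time_spent_us)  # [False, True] [True, True] 400
```

In this model the immediate write doubles the time spent on the bus for that scan, which is why such instructions are usually reserved for the few outputs that cannot wait.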
So, from my point of view, the reason for using intermediate memory-mapped locations, which are applied to the real outputs only at the end of the scan, is to save time. The use of memory-mapped inputs is another issue in itself.