One reason your BSL method did not work is that you used the same Source Bit, "Input_Main_EM", for all 4 shifts. All that does is shift the same bit in 4 times. What you need is to shift 4 consecutive bits 4 times. That requires 4 BSL instructions, each with a different consecutive Source Bit address, and each BSL triggered 4 times for each of your pulses. This is what I had in mind, although when I test it, it is not working.
Or you can use the multiply-by-16 then MVM method.
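
To make the difference concrete, here is a rough Python sketch of the bit arithmetic only. It is not ladder logic and not tied to any particular controller. It assumes a 16-bit register and that the 4 new bits arrive as one value in the low nibble; the function names are made up for illustration. Multiplying by 16 moves everything left 4 bits, and the masked move then copies only the low 4 bits of the source into the register.

```python
WORD_MASK = 0xFFFF    # assumption: the register is one 16-bit word
NIBBLE_MASK = 0x000F  # MVM mask: only the low 4 bits are written


def shift_in_nibble(register: int, new_bits: int) -> int:
    """Multiply by 16 (shift left 4), then masked-move new_bits into the low nibble."""
    shifted = (register * 16) & WORD_MASK
    # Masked move: keep destination bits where the mask is 0,
    # take source bits where the mask is 1.
    return (shifted & ~NIBBLE_MASK) | (new_bits & NIBBLE_MASK)


def bsl_same_source(register: int, source_bit: int, times: int = 4) -> int:
    """Shift left one bit at a time, loading the SAME source bit each time."""
    for _ in range(times):
        register = ((register << 1) | (source_bit & 1)) & WORD_MASK
    return register


reg = shift_in_nibble(0x0000, 0b1010)    # first pulse
reg = shift_in_nibble(reg, 0b0110)       # second pulse
print(hex(reg))                          # 0xa6
print(bin(bsl_same_source(0x0000, 1)))   # 0b1111
```

The second function shows why one Source Bit for all 4 shifts just fills the nibble with copies of that bit, while the multiply-then-MVM version keeps the 4 distinct bits together.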