Also, if the memory block is not too large and the area and power overhead can be tolerated, could a 2D array of flip-flops be used instead to solve the clock limit issue?