For an on-chip realization, do you need the whole output stage shown in the linked page?
I've often seen on-chip op amps that behave more like OTAs, with no buffered output stage, and they're sufficient for driving on-chip loads, especially capacitive ones such as the FET gates of later stages or feedback networks.
For a fully differential amplifier I did in the past, the common-mode feedback op amp was the simple, minimalistic 5-transistor OTA, and that was more than enough for the CMRR and stability I needed.
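
To put rough numbers on why an unbuffered OTA is fine for gate loads, here is a back-of-the-envelope Python sketch using the usual single-pole small-signal model of a single-stage OTA. The function name and every device value (gm, ro, load capacitance, load resistance) are hypothetical, picked only for illustration and not taken from any real design.

```python
import math

def ota_small_signal(gm, ro_n, ro_p, c_load, r_load=None):
    """Single-pole model of a single-stage OTA (hypothetical helper).

    gm      : input-pair transconductance [S]
    ro_n/p  : output resistances of the NMOS/PMOS devices at the output node [ohm]
    c_load  : capacitive load at the output [F]
    r_load  : optional resistive load to ground [ohm]
    Returns (dc_gain, pole_hz, gbw_hz).
    """
    r_out = ro_n * ro_p / (ro_n + ro_p)           # ro_n || ro_p
    if r_load is not None:
        r_out = r_out * r_load / (r_out + r_load)  # add the resistive load in parallel
    dc_gain = gm * r_out                           # low-frequency voltage gain
    pole_hz = 1.0 / (2 * math.pi * r_out * c_load) # dominant pole sits at the output
    gbw_hz = gm / (2 * math.pi * c_load)           # gain-bandwidth, independent of r_out
    return dc_gain, pole_hz, gbw_hz

# Assumed example values: 100 uS, 1 Mohm devices, 100 fF gate load.
gain, pole, gbw = ota_small_signal(gm=100e-6, ro_n=1e6, ro_p=1e6, c_load=100e-15)

# Same OTA asked to drive a 10 kohm resistor as well.
gain_loaded, _, _ = ota_small_signal(
    gm=100e-6, ro_n=1e6, ro_p=1e6, c_load=100e-15, r_load=10e3
)

print(f"capacitive load only: gain = {gain:.1f}, GBW = {gbw / 1e6:.1f} MHz")
print(f"with 10 kohm load:    gain = {gain_loaded:.2f}")
```

With a purely capacitive load the DC gain is untouched and the load capacitance just sets the dominant pole, so a bigger load makes the loop slower rather than less stable. A resistive load, on the other hand, sits in parallel with the high output resistance and collapses the gain, which is the case where the buffered output stage earns its area.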