In my opinion, there are two rules:
(1) Memories (like any other hardened module) must be instantiated at the user_project_wrapper level.
(2) At most one clock per module.
How you structure the layout beyond that is up to you.
(1) You can go flat: on a reasonable laptop or PC with enough swap space, harden only the minimum number of modules (basically one hardened module per clock domain).
(2) You can go ’90s style, from back when CPU power was limited, and go hierarchical, hardening every peripheral and module individually, which is a PITA. IMHO doing that introduces timing penalties, but I don’t have data to support this because I haven’t tried it. I like simplicity and prefer (1).
Cheers, Tobias
PS: Check whether you can push your OpenRAM module all the way through the flow, down to the very end, to avoid late surprises. OpenRAM macros are kinda limited.