So, in your budget for this device, you need to allow many more transistors for the processor, more for RAM - I’m sure 32k is too small - and a lot more for interconnect. The system structure is likely to be “transputer” (processor, RAM and limited comms) and routers.
The nearest machine I know of that might have this sort of architecture is the Graphcore Colossus (https://www.graphcore.ai/products/ipu): 60B transistors and 1472 processor cores, with about 900MB of on-chip memory in total, i.e. roughly 600KB per core rather than 900MB each.
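
To put those figures in budget terms, here is a rough back-of-envelope sketch. It treats the 900MB as the chip-wide total of on-chip memory, takes MB as 10**6 bytes, and assumes a conventional 6-transistor SRAM cell; none of these assumptions comes from Graphcore's own breakdown, so read the results as order-of-magnitude only.

```python
# Back-of-envelope transistor budget from the quoted chip figures.
TRANSISTORS = 60e9       # total transistors on the chip (quoted figure)
CORES = 1472             # processor cores (quoted figure)
SRAM_BYTES = 900e6       # assumption: 900MB is the chip-wide total, MB = 10**6 bytes
TRANSISTORS_PER_BIT = 6  # assumption: standard 6T SRAM cell, no overhead counted

# Average share of the whole chip per core (logic, RAM and interconnect together).
per_core_transistors = TRANSISTORS / CORES

# Local memory available to each core.
per_core_bytes = SRAM_BYTES / CORES

# Transistors spent on the memory cells alone, and their share of the budget.
sram_cell_transistors = SRAM_BYTES * 8 * TRANSISTORS_PER_BIT
sram_fraction = sram_cell_transistors / TRANSISTORS

print(f"transistors per core: {per_core_transistors / 1e6:.1f} million")
print(f"memory per core:      {per_core_bytes / 1e3:.0f} kB")
print(f"SRAM cell share:      {sram_fraction:.0%} of all transistors")
```

Under these assumptions each core gets about 41 million transistors and about 611 kB of memory, and the memory cells alone account for roughly 72% of the chip, before counting any processor logic or interconnect. That is why a budget sized around 32k of RAM per node is likely to be well short.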