FPGAs will increasingly be deployed at scale, driven by the growing need for power-efficient computation and by improved high-level synthesis tool flows, creating a new category of device: data centre FPGAs. One method for using these FPGAs is to identify what proportion of a given workload would benefit from being implemented on the available FPGAs while minimising off-chip communication. As part of the implementation of these tasks, care should be taken in identifying the parallel executio...