I worked on a software tool for FPGA placement, a stage of FPGA physical synthesis. I implemented GPU acceleration for it in CUDA, reducing its runtime by 7× on average. Beyond improving performance, I simplified its development by porting the algorithm to a PyTorch-based framework that frames this nonlinear, nonconvex optimization problem as the training of a neural network. I wrote C++ and CUDA extensions for Python to accelerate the performance-critical segments while keeping code complexity low. I also extended the algorithm to account for clock network routing resource constraints through a quadratic penalty, designed with both global placement convergence and design legality in mind. Furthermore, I added support for new cell types, which allowed the placer to handle a real-world industrial architecture rather than only academic ones, and substantially improved its performance. Because no academic FPGA placement software had previously supported these features, this work led to a publication in a top IEEE journal.