The NetsPresso Platform Team is responsible for designing and building the core platforms and software that bring Nota AI’s model compression and optimization technologies from research into real-world products.
The team is composed of Model Representation, Quantization, Graph Optimization, Model Engineering, and Software Engineering functions. NetsPresso converts models from various deep learning frameworks into its proprietary unified intermediate representation (NPIR), and applies optimization techniques such as quantization, graph optimization, and compression to maximize inference efficiency across diverse target hardware environments (NPU, GPU, CPU).
In this role, you will work directly with real customer models and target hardware environments, leveraging NetsPresso’s optimization technologies to deliver “optimization that works in production.” Rather than simply providing tools, you will analyze model architectures alongside constraints such as accuracy, latency, and memory, and implement the most effective optimization strategies tailored to each use case.
(Additional assignments may be included during the process.)
This role offers a unique opportunity to work at the forefront of bringing NetsPresso’s technology out of the lab and into real-world customer products. You will directly face the challenges of deploying Gen AI models onto on-device NPUs, transforming “theoretically possible optimizations” into “optimizations that actually work in production.” We are looking for someone who enjoys this process—someone who can understand both the language of customers and engineers, and proactively translate field feedback into product improvements. If you are eager to push the boundaries of on-device AI in real-world environments, this role will offer you both rapid growth and the opportunity to create meaningful impact.