Job Responsibilities:
1. Design and develop AI local inference code to support large-scale deployment scenarios.
2. Conduct architecture design and implement acceleration solutions for heterogeneous hardware platforms, including integrated GPUs (iGPU) and discrete GPUs (dGPU).
3. Optimize inference performance using techniques such as operator fusion, memory-bandwidth reduction, quantization, and mixed precision.
4. Develop on-device perception algorithms and fine-tune LLM performance.
5. Drive technological innovation in inference acceleration algorithms and maintain technical leadership in the field.
Job Requirements:
1. Deep knowledge of, and practical experience with, local AI inference code development and large-scale deployment.
2. Solid expertise in architecture design for heterogeneous hardware platforms (iGPU/dGPU) and hands-on experience implementing the corresponding acceleration solutions.
3. Proficiency in applying advanced optimization techniques (operator fusion, memory-bandwidth reduction, quantization, mixed precision, etc.) to improve inference performance.
4. Extensive experience in on-device perception algorithm development and LLM performance fine-tuning.
5. A demonstrated record of innovation in inference acceleration algorithms and of sustained technical leadership in the field.
6. Strong problem-solving skills and the ability to independently tackle technical challenges in related fields.
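For candidates unfamiliar with the optimization techniques named above, here is a minimal sketch of one of them, symmetric per-tensor INT8 weight quantization; the function names are illustrative only and not part of the role description:

```python
# Illustrative sketch of symmetric per-tensor INT8 quantization, one of the
# inference-optimization techniques mentioned in this posting.

def quantize_int8(weights):
    """Map float weights to int8 values using a single symmetric scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25, 0.75]
q, scale = quantize_int8(weights)       # int8 values plus a float scale
recovered = dequantize_int8(q, scale)   # close to the original weights
```

Storing weights as INT8 plus one scale factor shrinks memory footprint and bandwidth roughly 4x versus FP32, at the cost of a small, bounded reconstruction error.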