Company
International SOS is looking to recruit a Data Analysis Intern for a 6-month placement to support an upcoming project: an internal initiative to conduct an in-depth analysis of high-cost medical conditions using proprietary health data sets from the past year. The project will support strategic decision-making by identifying cost drivers and patterns in medical claims.
The Role
Work under the close supervision of the project leadership manager.
Build and evaluate an NLP/LLM-assisted pipeline to automatically map messy medical descriptions/claims text to ICD-10 (simplified), with measurable accuracy and human-in-the-loop validation.
Understand and label data for the NLP/LLM approach, working with an International SOS health expert for review. Review and code health and claims records using the simplified ICD-10 coding system.
Extract and organize relevant data to link claimed costs to ICD-coded health files.
Perform data analysis to identify trends and insights related to high-cost medical conditions.
Ensure strict compliance with the organization’s confidentiality and privacy policies.
Prepare reports and dashboards using tools such as Excel and Power BI.
Communicate findings clearly in English (written and verbal).
Why Join Us?
Gain hands-on experience in health data analytics within a global organization.
Work on a meaningful project impacting healthcare cost management.
Gain exposure to advanced tools and real-world data sets.
Requirements
Currently pursuing, or recently completed, an undergraduate degree in Data Analytics, Health Information Management, Statistics, or a related field.
Strong data analysis skills.
Proficiency in Python (pandas) and ability to build a small reproducible NLP pipeline.
Basic knowledge of NLP/LLMs (text classification, embeddings, prompt design).
Familiarity with medical terminology (clinical training not required).
Ability to apply ICD-10 (simplified version) coding to health records.
Proficiency in Excel, Power BI, and email communication tools.
Experience with RAG / vector search and/or tools such as spaCy or transformers is an added advantage.
Familiarity with ICD-10 mapping or medical NLP preferred.