Job Title: Associate Data Architect – Master Data Management (MDM)
Location: Pune - Hybrid
Experience: 10+ years of experience in Data Architecture and Data Engineering/Integration, with strong exposure to Data Modelling and Database (RDBMS) Management.
About the Role
We are seeking an Associate Data/Database Architect to join our core product architecture team building an enterprise-grade, multi-domain Master Data Management (MDM) product platform.
You will play a key role in optimizing and extending the MDM data model, implementing efficient data ingestion and entity resolution mechanisms, and ensuring the system supports multiple domains such as Party (Individual/Organization), Product, Location, Policy, and Relationship in a cloud-native and scalable manner.
Key Responsibilities

Data Modeling & Architecture
- Enhance and extend the existing Party-based data model into a multi-domain MDM schema (Party, Product, Location, Relationship, Policy, etc.).
- Design and maintain canonical data models and staging-to-core mappings for multiple source systems.
- Implement auditability, lineage, and soft-delete frameworks within the MDM data model.
- Contribute to the creation of golden records, trust scores, match/merge logic, and data survivorship rules.
- Ensure the model supports real-time and batch data mastering across multiple domains.
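To illustrate the survivorship idea mentioned above, here is a minimal sketch of building a golden record from matched source records. It is illustrative only: the source-system names, trust scores, and attributes are hypothetical and do not describe this platform's actual implementation.

```python
# Illustrative survivorship sketch: for each attribute, the golden record
# keeps the value from the source system with the highest trust score.
# Source names and trust scores below are hypothetical examples.

TRUST = {"crm": 0.9, "billing": 0.7, "legacy": 0.4}  # per-source trust scores

def golden_record(records):
    """records: list of (source_system, {attribute: value}) pairs
    that have already been matched as the same real-world entity."""
    golden = {}
    best = {}  # attribute -> trust score of the value currently chosen
    for source, attrs in records:
        score = TRUST.get(source, 0.0)
        for attr, value in attrs.items():
            if value is not None and score > best.get(attr, -1.0):
                golden[attr] = value
                best[attr] = score
    return golden

matched = [
    ("legacy",  {"name": "J. Smith", "phone": "555-0100"}),
    ("crm",     {"name": "John Smith", "phone": None}),
    ("billing", {"name": "John Smith", "email": "j.smith@example.com"}),
]
print(golden_record(matched))
# name wins from crm (highest trust); phone survives from legacy
# because crm has no value; email comes from billing.
```

Real survivorship rules are usually richer (recency, completeness, per-attribute rules), but the trust-ranked attribute selection above is the core pattern.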
Data Engineering & Integration
- Support and optimize data ingestion and ETL/ELT pipelines using Python, PySpark, SQL, and/or Informatica (or equivalent tools).
- Design and implement data validation, profiling, and quality checks to ensure consistent master data.
- Work on data harmonization, schema mapping, and standardization across multiple source systems.
- Help build efficient ETL mappings from canonical staging layers to MDM core data models in PostgreSQL.
- Develop REST APIs or streaming pipelines (Kafka/Spark) for real-time data processing and entity resolution.
Cloud & Platform Engineering
- Implement and optimize data pipelines on AWS or Azure using native services (e.g., AWS Glue, Lambda, S3, Redshift, Azure Data Factory, Synapse, Data Lake).
- Deploy and manage data pipelines and databases following cloud-native, cost-effective, and scalable design principles.
- Collaborate with DevOps teams on CI/CD, infrastructure-as-code, and automation of data pipeline and database deployment/migration.
Governance, Security & Compliance
- Implement data lineage, versioning, and stewardship processes.
- Ensure compliance with data privacy and security standards (GDPR, HIPAA, etc.).
- Partner with Data Governance teams to define data ownership, data standards, and stewardship workflows.
Requirements

Technical Skills Required

Core Skills
- Data Modelling: Expert-level relational (3NF) and dimensional (Star/Snowflake) modelling; hands-on experience with Party data models, multi-domain MDM, and canonical models.
- Database: PostgreSQL (preferred) or any enterprise RDBMS.
- ER Modelling Tools: Erwin/ER/Studio, Database Markup Language (DBML).
- ETL / Data Integration: Informatica, Python, PySpark, SQL, or similar tools.
- Cloud Platforms: AWS or Azure.
- Programming: Advanced SQL, Python, PySpark, and/or UNIX/Linux scripting.
- Data Quality & Governance: Familiarity with data quality rules, profiling, match/merge, and entity resolution.
- DevOps, Version Control & CI/CD: Git, Azure DevOps, Jenkins, Terraform, Redgate Flyway (preferred).
Database Design & Optimization (PostgreSQL)
- Design and maintain normalized and denormalized models using advanced features (schemas, partitions, views, CTEs, JSONB, arrays).
- Build and optimize complex SQL queries, materialized views, and data marts for performance and scalability.
- Tune RDBMS (PostgreSQL) performance: indexes, query plans, vacuum/analyze, statistics, parallelism, and connection management.
- Leverage RDBMS (PostgreSQL) extensions such as:
  - pg_trgm for fuzzy matching and probabilistic search.
  - fuzzystrmatch and pgvector for semantic similarity and name matching.
  - hstore and jsonb for flexible attribute storage.
- Implement RBAC, row-level security, partitioning, and logical replication for scalable MDM deployment.
- Work with stored procedures, functions, and triggers for data quality checks and lineage automation.
- Implement HA/DR, backup/restore, database-level encryption (at rest, in transit), and column-level encryption for PII/PHI data.
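As a rough illustration of the fuzzy-matching idea behind pg_trgm, the sketch below re-implements trigram similarity in plain Python. It approximates pg_trgm's behaviour for single words only (the extension also handles word splitting and indexing) and is not a substitute for the extension itself.

```python
# Approximate pg_trgm-style trigram similarity for a single word:
# lowercase, pad with two leading blanks and one trailing blank,
# extract all 3-grams, then score as |shared| / |union| of trigram sets.

def trigrams(word):
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

print(similarity("smith", "smith"))  # identical names -> 1.0
print(similarity("smith", "smyth"))  # partial overlap -> between 0 and 1
print(similarity("smith", "jones"))  # no shared trigrams -> 0.0
```

This set-overlap scoring is why pg_trgm tolerates typos and spelling variants: a one-letter change only disturbs a few trigrams, so most of the set still matches.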
Good to Have
- Knowledge of Master Data Management (MDM) domains such as Customer, Product, etc.
- Experience with graph databases such as Neo4j for relationship and lineage tracking.
- Knowledge of probabilistic and deterministic matching, ML-based entity resolution, or AI-driven data mastering.
- Experience with data cataloging, data lineage tools, or metadata management platforms.
- Familiarity with data security frameworks and Well-Architected Framework principles.
Soft Skills
- Strong analytical, conceptual, and problem-solving skills.
- Ability to collaborate in a cross-functional, agile environment.
- Excellent communication and documentation skills.
- Self-driven, proactive, and capable of working with minimal supervision.
- Strong desire to innovate and build scalable, reusable data frameworks.
Education
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related discipline.
- Certifications in AWS/Azure, Informatica, or Data Architecture are a plus.
Benefits

Why Join Us
- Be part of a cutting-edge MDM product initiative blending data architecture, engineering, AI/ML, and cloud-native design.
- Opportunity to shape the next-generation data mastering framework for multiple industry domains.
- Gain deep exposure to data mastering, lineage, probabilistic search, and graph-based relationship management.
- Competitive compensation, flexible working, and a technology-driven culture.