Role Overview:
We are looking for an experienced Data Architect to design and implement scalable, high-performance data platforms that align with our organization’s evolving business needs. The role involves leading data architecture design, integration, and optimization across cloud environments, while ensuring robust data governance, security, and scalability.
Responsibilities:
- Design and architect scalable, high-performance, and secure data platforms (Data Warehouses, Data Lakes, Lakehouses) to support analytics, AI/ML, and reporting needs.
- Leverage Big Data technologies such as PySpark, Hadoop, Kafka, and Databricks for large-scale data processing and transformation.
- Develop and maintain conceptual, logical, and physical data models aligned with business requirements and governance standards.
- Architect data integration pipelines to enable seamless data flow across diverse systems, including legacy systems and modern cloud platforms (Databricks, Snowflake, AWS, Azure).
- Lead modernization and migration initiatives from on-premises databases (e.g., Informix, Oracle) to cloud-native platforms (AWS, Azure, Google Cloud).
- Continuously optimize data pipelines and storage for performance, scalability, cost efficiency, and SLA adherence.
- Partner with business development teams to provide technical leadership during pre-sales engagements, including client workshops and architecture discussions.
- Collaborate with data engineers, data scientists, business analysts, and IT teams to ensure architecture alignment with business goals.
- Define and enforce data governance frameworks, quality standards, and security policies to ensure compliance with organizational and regulatory requirements.
- Evaluate emerging technologies and tools to inform the long-term data strategy and ensure future scalability.
Key Skills:
- Data modeling (conceptual, logical, physical) for high-performance platforms
- Cloud platforms: AWS, Azure, GCP; Databricks, Snowflake
- Big Data architecture and distributed processing (PySpark, Hadoop, Kafka)
- Relational (SQL Server, Oracle, MySQL) and NoSQL (DynamoDB, Cosmos DB) databases
- ETL/ELT pipeline design, automation, and system integration
- Cloud migration: legacy-to-cloud transitions