Data Engineering
FTE - IntraEdge
Job Description
We are seeking an experienced and highly skilled AI Data Engineer to join our team. The successful candidate will be responsible for designing, building, and maintaining the data infrastructure and pipelines that power our AI, machine learning (ML), agentic AI, and generative AI (GenAI) initiatives. This role requires strong expertise in data engineering best practices and a deep understanding of the unique data needs of AI models.
Key responsibilities
- Build AI-ready data pipelines: Design, construct, and optimize scalable Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) pipelines specifically for AI and ML models.
- Architect data solutions: Develop and manage data architectures, including data lakes, data warehouses, and vector databases, to support various AI workloads.
- Ensure data quality and governance: Implement data validation, security, and governance policies to ensure the integrity, accessibility, and compliance of data used in AI models.
- Support AI model lifecycle: Collaborate with data scientists and ML engineers to prepare, integrate, and manage large-scale datasets for model training and deployment.
- Manage real-time data: Develop streaming data pipelines using technologies like Apache Kafka to support real-time AI applications and analytics.
- Optimize cloud infrastructure: Leverage AWS cloud services to build, deploy, and scale AI data solutions efficiently.
- Deploy AI models: Automate the training and deployment of AI/ML models into production via APIs and microservices.
- Monitor and troubleshoot: Implement data observability tools to monitor pipeline health, identify data drift, and quickly resolve any data quality issues that may impact model performance.
- AI-assisted development: Use AI assistants like Copilot in Microsoft Fabric notebooks to generate, explain, and fix code, accelerate data analysis, and streamline data transformation tasks.
Required qualifications
- Education: A Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related technical field.
- Experience: Proven experience in a data engineering or similar role, with specific experience supporting AI and ML projects.
- Programming: Fluency in programming languages such as Python and SQL, and familiarity with others like Java or Scala.
- Frameworks: Hands-on experience with ML frameworks like TensorFlow, PyTorch, and Scikit-learn, as well as LLM-specific tools like LangChain or LlamaIndex.
- Big data: Experience with distributed data processing frameworks such as Apache Spark and Hadoop.
- Cloud platforms: Proficiency with at least one major cloud provider (AWS, Azure, or GCP) and its AI data-related services.
- Databases: Expertise in both relational (SQL) and NoSQL databases, including vector databases for GenAI applications.
- DevOps and MLOps: Experience with CI/CD, Docker, and ML lifecycle management tools like MLflow is highly valued.