CaritaTech LLC.
Hi,
Title: Data Engineer, GenAI Focus (RAG, Agents, LLMs)
Location: Phoenix, AZ (Onsite/Hybrid; no remote)
Duration: 12+ Months
Contract Type: W2 Only
We are looking for a Data Engineer with strong Generative AI experience who has evolved from traditional data engineering into building cutting-edge AI-powered solutions. This role is ideal for someone who has hands-on experience with RAG pipelines, agent-based systems, and custom LLM applications in production environments.
Key Responsibilities:
Design and build scalable data pipelines to support AI/ML and GenAI workloads
Develop and deploy RAG (Retrieval-Augmented Generation) architectures using vector databases
Build and manage LLM-powered applications, including custom GPTs and enterprise AI assistants
Implement agentic workflows using frameworks such as LangChain, AutoGen, or similar
Integrate structured and unstructured data sources for AI consumption
Optimize data models and pipelines for performance, scalability, and reliability
Collaborate with data scientists, ML engineers, and business stakeholders to deliver AI-driven solutions
Ensure data quality, governance, and security across AI pipelines
Work with cloud platforms (AWS/Azure/Google Cloud Platform) for scalable GenAI deployments
Required Skills:
7+ years of Data Engineering experience (ETL, data pipelines, big data processing)
Proven hands-on experience in Generative AI / LLM-based solutions
Strong experience building RAG pipelines with vector databases (Pinecone, FAISS, Weaviate, etc.)
Experience with agent frameworks (LangChain, LangGraph, AutoGen, CrewAI, etc.)
Solid programming skills in Python
Experience working with LLMs (OpenAI, Azure OpenAI, Claude, etc.)
Strong SQL and data modeling skills
Experience with Spark, Databricks, or similar big data technologies
Hands-on experience with cloud platforms (AWS/Azure/Google Cloud Platform)
Preferred Qualifications:
Experience building custom GPTs or AI copilots
Familiarity with prompt engineering, embeddings, and vector search
Knowledge of MLOps/LLMOps practices
Exposure to real-time or streaming data pipelines
Strong understanding of data architecture and distributed systems