AI Engineering Lead - Customer Care - RAG

- Cairo, Al Qāhirah, Egypt
- Amman, Al ‘A̅şimah, Jordan
+1 more
Products & Engineering

Job description

Proudly voted a Great Place to Work®, we are a dynamic startup in the SaaS space that is revolutionizing the way businesses communicate. Our team is made up of 500 energetic and passionate Unifones who are dedicated to delivering the best possible experience to 5000+ customer-centric companies.

We pride ourselves on our fun and collaborative work environment, where creativity and new ideas are constantly encouraged. As shareholders in the business, we’re so much more than a group of passionate communicators. We are Unifones. Join our team and be a part of something big!

Meet the team!

Our Engineering team is responsible for designing, developing, and maintaining the systems and technologies that drive Unifonic’s solutions. We work closely with other departments to ensure our products and services meet the needs of our customers. If you are passionate about technology and are excited about working on cutting-edge communication and engagement solutions, we want you on our team.

Our Customer Care Squad transforms customer support from reactive to predictive, leveraging state-of-the-art AI, Retrieval-Augmented Generation (RAG), and Large Language Models (LLMs) to provide accurate, real-time, personalized assistance at massive scale.

As an AI Engineering Lead, you will draw on deep, hands-on experience in delivering large-scale, production-grade conversational AI and Retrieval-Augmented Generation (RAG) solutions. This role is for an AI expert who has genuinely "been there and done that"—someone ready to architect, build, and operate a real-time AI customer support platform with a relentless focus on accuracy, reliability, and ultra-low latency. You'll lead a lean, high-impact team, driving innovation while ensuring production excellence at every layer of the stack.

Help us shape the future of communication by:

Owning the architectural vision and technical roadmap for AI-driven customer support systems.
Designing, developing, and scaling real-time Retrieval-Augmented Generation (RAG) pipelines integrating state-of-the-art open-source LLMs (Llama 3, Mistral, Falcon, or similar).
Implementing scalable, high-performance vector search (Pinecone, Weaviate, Milvus) for robust knowledge retrieval and semantic search.
Understanding techniques such as quantization, pruning, distillation, batching, and caching to optimize LLM inference and achieve sub-second response times.
Developing and exposing secure, performant APIs via FastAPI, gRPC, or similar frameworks, containerized with Docker, orchestrated with Kubernetes, and fully integrated into automated CI/CD pipelines.
Embedding comprehensive monitoring and evaluation (e.g. MRR, Recall@k, NDCG, Faithfulness, latency metrics) and implementing automated regression testing for continuous improvement.
Championing and enforcing best practices for data security, compliance (GDPR, Saudi PDPL), and responsible AI, including PII redaction and end-to-end encryption.
Demonstrating mastery of foundational software engineering by writing clean, maintainable, and testable code; designing robust, modular, and scalable systems; leveraging version control; and implementing comprehensive continuous integration, automated testing, and deployment practices.
Leading rigorous design and code reviews, mentoring engineers, and fostering an innovative engineering culture grounded in clean architecture, SOLID principles, and proactive best practices to ensure system reliability, security, and agility.

Job requirements

What you’ll bring:

Bachelor’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
7+ years delivering production AI/NLP systems, including 3+ years as a technical lead or senior staff engineer.
Proven experience owning real-time conversational AI/RAG platforms at massive scale, serving thousands of concurrent users.
Expert proficiency in Java or Python with strong software engineering fundamentals and system-design capabilities.
Deep knowledge of, and hands-on experience with, the following frameworks and technologies: Hugging Face, LangChain, LlamaIndex, Spring AI, vector databases (Pinecone, Weaviate, Milvus), and embedding models.
Strong expertise in low-latency inference optimization and GPU resource management.
Solid experience building large-scale data ingestion and processing pipelines (Spark, Flink, Kafka, RabbitMQ).
Robust MLOps and deployment expertise (Docker, Kubernetes, MLflow, Kubeflow, Git-based prompt versioning, automated CI/CD).
Clear communicator capable of translating complex technical concepts into strategic business value.
Expertise in red-teaming practices and machine learning security research, including developing and reinforcing robust defenses against adversarial threats.
Arabic & English language proficiency.
Proven hands-on expertise in traditional machine learning and deep learning, including applying techniques such as CNNs, Transformers, AutoML frameworks, and hyperparameter optimization to complex problems at production scale, is preferred.
Practical experience developing multilingual, multi-channel, or voice-driven conversational agents is preferred.
Open-source contributions to LLM, NLP, or vector search ecosystems are preferred.
Familiarity with reinforcement learning or bandit algorithms for advanced conversational strategies is preferred.

As a Unifone you’ll receive a range of benefits:

Competitive salary and bonus
Unifonic share scheme (we are all owners!)
30 holiday days after the first anniversary
Your Birthday off!
Spend up to 25 days per year working from anywhere in the world!
Paid leave for new parents
LinkedIn Learning license

On-site
