Real Interview Questions from the Field
Hand-picked, scenario-based questions actually being asked in May 2026 for Cloud, DevOps, Kubernetes, Security, Observability and Agentic AI / LLMOps roles in India. Each comes with the answer pattern interviewers expect — not just dictionary definitions.
Trending 2026
The hottest topics being asked in product company interviews right now — AgenticOps, MCP, AI Gateways, AgentCore, prompt injection, FinOps, Karpenter, eBPF, Platform Engineering. Updated May 2026.
01. What is AgenticOps and how is it different from AIOps?
02. What is MCP (Model Context Protocol) and why is everyone talking about it in 2026?
03. What is an AI Gateway and why is it the new must-have piece of infra in 2026?
04. What is prompt injection and how do you defend a production agent against it?
05. How would you build a self-healing CI/CD pipeline using LLM agents? (Asked at AWS, Razorpay and Atlassian in May 2026.)
06. Compare Amazon Bedrock vs Bedrock AgentCore vs SageMaker vs Amazon Q in 2026.
07. What is FinOps and how would you reduce a $50K/month AWS bill by 30% without breaking workloads?
08. Compare Karpenter vs Cluster Autoscaler: when do you pick which?
09. What is eBPF and why is Cilium replacing iptables-based CNIs?
10. What is Platform Engineering / IDP and why are companies moving past 'pure DevOps'?
11. What's the difference between a service mesh, API gateway, and Ingress controller?
12. What are the 4 Golden Signals (Google SRE), and when do they serve you better than the USE method for service monitoring?
13. What is GitOps and how is it different from CIOps?
☁️ AWS
Architecture, IAM, networking, cost, security — asked at SAA/SAP & SRE rounds.
01. Your team accidentally deleted a production S3 bucket. How do you prevent this in future without slowing down developers?
02. An EC2 instance can't reach the internet even though it's in a public subnet. What do you check, in order?
03. Difference between IAM Roles, Resource-based policies, and SCPs: when do they intersect?
04. How would you architect a multi-account landing zone for 50 microservices?
05. Spot vs Reserved vs Savings Plans: when do you mix them?
06. You need to migrate 200 on-prem VMs to AWS in 6 weeks. What service and approach do you use?
07. What's the difference between AWS Organizations SCPs, IAM permission boundaries, and IAM session policies?
08. How would you architect a globally available, low-latency static + API site with strong cache + DDoS protection?
☸️ Kubernetes
Pods, scheduling, networking, debugging, GitOps — staple of every SRE round.
01. A pod is stuck in `CrashLoopBackOff`. Walk me through your debug steps.
02. Difference between Deployment, StatefulSet, DaemonSet, and Job: give one production use case for each.
03. How does a request reach a Pod from outside the cluster, end-to-end?
04. What's the difference between Resource Requests and Limits, and why does setting only Limits cause issues?
05. Argo CD vs Flux: which would you pick for a 100-team platform?
06. What is a PodDisruptionBudget and when does it actually save you?
07. How do you upgrade a production EKS cluster from 1.28 to 1.30 with zero downtime?
08. What's the difference between a Sidecar, an Init Container, and an Ephemeral Debug container?
🏗️ Terraform / IaC
State, modules, drift, multi-env — asked at every cloud/DevOps interview.
01. Two engineers ran `terraform apply` simultaneously and now state is corrupted. How do you recover, and how do you prevent it?
02. Someone changed an AWS resource manually in the console. How do you handle drift?
03. When would you use a Terraform module vs a workspace vs a separate state file?
04. Explain `count` vs `for_each` and when each fails you.
05. How do you handle secrets in Terraform without checking them into Git?
⚙️ CI/CD & GitOps
Pipelines, branching, secret handling, promotion strategies.
01. Walk me through a secure CI/CD pipeline for a microservice deploying to prod EKS.
02. Trunk-based vs GitFlow: what do you actually run in production teams?
03. How do you stop a developer's machine from being able to deploy to prod?
🔐 DevSecOps & Cloud Security
Asked at every senior cloud/SRE interview in 2026.
01. What's the difference between SAST, DAST, SCA and IAST? Where do they sit in the pipeline?
02. How would you implement Zero-Trust for an internal Kubernetes platform?
03. A leaked AWS access key was found on GitHub. What is your incident response in the first 5 minutes?
04. How do you prevent secrets from being committed to Git in the first place?
05. What's the difference between an IAM Role and an Instance Profile?
🐧 Linux & Networking
Foundational — every infra/SRE round has at least 2 of these.
01. A server's load average is 40 but CPU is 5%. What's happening and how do you investigate?
02. Walk me through what happens when you type `curl https://api.example.com` and press Enter.
03. Difference between hard link, soft link, and bind mount?
04. How does a container differ from a VM at the kernel level?
📊 Observability & SRE
Logs, metrics, traces, SLOs — hot at FAANG/product company interviews.
01. Define SLI, SLO, SLA, and Error Budget with one concrete example.
02. How do you reduce alert noise without missing real incidents?
03. Pull vs push metrics: when does each win?
04. When would you choose Loki vs Elasticsearch for logs?
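For the SLO and error-budget question above, it helps to be able to do the arithmetic on a whiteboard. The following is a minimal sketch, assuming an availability SLO measured over a rolling window; the function names `error_budget_minutes` and `budget_remaining` and the 30-day default are illustrative choices, not taken from any particular tool.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability SLO permits over the window."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - slo_percent) / 100


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Return the unspent fraction of the error budget (negative once the SLO is breached)."""
    budget = error_budget_minutes(slo_percent, window_days)
    if budget == 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return 1 - downtime_minutes / budget
```

With a 99.9% SLO over 30 days the budget works out to about 43.2 minutes, so 21.6 minutes of downtime leaves roughly half of it unspent.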
🤖 AI / Agentic AI / LLMOps
The 2026 wildcard round — every cloud, DevOps and SRE role now asks at least 2-3. Pulled from real interviews at AWS, Atlassian, Razorpay, Swiggy, FAANG and India-product unicorns (May 2026).
01. What is an AI Agent and how is it different from a chatbot or a RAG pipeline?
02. Explain RAG vs fine-tuning vs prompt-engineering: when do you choose which in 2026?
03. What is MCP (Model Context Protocol) and why is it the 'USB-C of AI' in 2026?
04. Compare MCP vs A2A (Agent2Agent protocol): when do agents call MCP servers vs each other?
05. Walk me through building a production AgenticOps pipeline that auto-triages CI failures.
06. What is prompt injection and how do you defend a production agent against it?
07. How do you evaluate an LLM-powered feature in production? (Real interview question at Atlassian, May 2026.)
08. What is grounding, and how do you reduce hallucinations in a RAG system?
09. Compare Amazon Bedrock vs Bedrock AgentCore vs SageMaker vs Amazon Q in 2026.
10. How do you serve an open-source LLM cost-effectively at scale? vLLM vs TGI vs SGLang vs Bedrock?
11. What is an AI Gateway and why are companies deploying one in front of every LLM call in 2026?
12. How would you design memory for a long-running agent that handles a 6-hour customer support session?
13. How do you handle multi-tenancy and PII in an enterprise RAG app?
14. What's Reflexion / self-correction in agents and when does it actually help vs waste tokens?
15. Frontier models in mid-2026: what does the practical landscape look like?
16. Why run an LLM locally with Ollama / vLLM instead of using GPT / Claude APIs?
17. What is LLMOps and how does it differ from MLOps? What does an LLMOps pipeline contain?
18. Walk me through a real Agentic AI architecture you'd ship to production (system-design round).
🐳 Docker & Containers
Image hygiene, layering, security — every DevOps round opens with these.
01. Your Docker image is 1.8 GB. How do you bring it under 200 MB without breaking the app?
02. Difference between CMD, ENTRYPOINT, and RUN: when do they fight each other?
03. How would you scan and sign images in a CI pipeline?
04. What happens to a running container if you delete its image?
🐍 Python for DevOps / SRE
Scripting, async, packaging — asked when role mentions automation.
01. What's the GIL and when does it actually hurt you?
02. How do you package a CLI tool so a teammate can `pipx install` it?
03. list vs tuple vs set vs dict, pick one for: 10M lookups, ordered config, dedupe, immutable record.
04. How do you avoid blocking the event loop in asyncio?
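For the list vs tuple vs set vs dict question above, one reasonable set of picks can be sketched as follows. The variable names, sample values, and the `Deploy` record are hypothetical, and the lookup set is kept small here as a stand-in for the 10M case.

```python
from typing import NamedTuple

# Membership lookups: a set gives average O(1) `in` checks; a list scans in O(n).
allowed_ids = set(range(100_000))  # small stand-in for the 10M-lookup case
is_allowed = 99_999 in allowed_ids

# Ordered config: dict preserves insertion order (guaranteed since Python 3.7).
config = {"region": "ap-south-1", "replicas": 3, "debug": False}
config_keys = list(config)

# Dedupe while keeping first-seen order: dict.fromkeys; a plain set() loses order.
hosts = ["web-1", "web-2", "web-1", "db-1", "web-2"]
unique_hosts = list(dict.fromkeys(hosts))


# Immutable record: a NamedTuple is a tuple with named, read-only fields.
class Deploy(NamedTuple):
    service: str
    version: str


release = Deploy("checkout", "1.4.2")
```

Other answers are defensible: a tuple of pairs also works for a small fixed config, and a frozen dataclass is a common alternative to `NamedTuple` when tuple behaviour is not wanted.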
🏛️ System Design (Cloud-native)
Asked at every senior+ round. Focus on trade-offs, not buzzwords.
01. Design a URL shortener that handles 10K writes/sec and 1M reads/sec.
02. How do you design a multi-region active-active deployment for a SaaS app?
03. How would you handle 'thundering herd' on a cache miss for a hot key?
04. When would you choose Kafka over SQS, and vice versa?
🤝 Behavioural / Leadership
STAR format. Hiring managers weight these as heavily as tech rounds.
01. Tell me about a time you disagreed with your manager. (STAR)
02. Tell me about a production incident you led.
03. How do you handle a teammate who consistently misses deadlines?
04. Why do you want to leave your current role?
05. Where do you see yourself in 3 years?