Batch 01 · Aarambh — AWS + Agentic AI starts 28 June 2026
Interview Question Bank · Updated May 2026

Real Interview Questions from the Field

Hand-picked, scenario-based questions actually being asked in May 2026 for Cloud, DevOps, Kubernetes, Security, Observability and Agentic AI / LLMOps roles in India. Each comes with the answer pattern interviewers expect — not just dictionary definitions.

☁️ AWS

Architecture, IAM, networking, cost, security — asked at SAA/SAP & SRE rounds.

01Your team accidentally deleted a production S3 bucket. How do you prevent this in future without slowing down developers?
Enable S3 Object Lock + Versioning, apply an SCP at the OU level that denies s3:DeleteBucket unless tagged `protected=false`, require MFA-delete for prod accounts, and add an EventBridge rule that pages on-call when a bucket-deletion API call fires.
02An EC2 instance can't reach the internet even though it's in a public subnet. What do you check, in order?
1) Public IP attached? 2) Subnet's route table has 0.0.0.0/0 → IGW? 3) NACL allows ephemeral inbound + outbound? 4) Security group egress allows 443? 5) Source/dest check on ENI? 6) OS-level firewall (ufw/iptables)? 7) VPC has DHCP options resolving DNS?
03Difference between IAM Roles, Resource-based policies, and SCPs — when do they intersect?
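IAM role = an identity with attached permission policies that a trusted principal assumes. Resource policy = attached to the resource (S3/KMS/SQS). SCP = guardrail at AWS Org level. Within one account, an Allow in either the identity policy or the resource policy is enough (Identity ∪ Resource); cross-account access needs an Allow in both (Identity ∩ Resource). The result is then capped by the SCP, and an explicit Deny anywhere wins. SCP can deny but not grant. A user might have full IAM access yet be blocked by SCP.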
04How would you architect a multi-account landing zone for 50 microservices?
Use AWS Control Tower → OUs (Security, Workload-Prod, Workload-NonProd, Sandbox) → SCPs per OU → centralised CloudTrail + Config + GuardDuty → Transit Gateway for inter-VPC + on-prem → IAM Identity Center for SSO → service catalogue / Terraform modules for app accounts.
05Spot vs Reserved vs Savings Plans — when do you mix them?
Baseline steady workload → Compute Savings Plan (1y/3y, ~30–50% off). Predictable instance family → Reserved Instances. Stateless / batch / CI runners / dev → Spot (60–90% off, handle interruption). Real cost-optimised stack mixes all three.
06You need to migrate 200 on-prem VMs to AWS in 6 weeks. What service and approach do you use?
AWS Application Migration Service (MGN): an agent on each source VM continuously replicates to staging EBS volumes in AWS; cutover converts replicas to running EC2 in minutes. Approach: discover servers and map dependencies with Application Discovery Service in Migration Hub (Migration Evaluator covers the business case), group apps by dependency, migrate non-prod first as pilot wave, then prod waves with rollback plan. Pair with DMS for databases. Avoids re-architecting upfront: lift-and-shift first, optimise later.
07What's the difference between AWS Organizations SCPs, IAM permission boundaries, and IAM session policies?
SCP = org-wide guardrail at OU/account level (denies only, no grants). Permission boundary = max permissions an IAM role/user can ever have (used when delegating IAM to dev teams). Session policy = passed at AssumeRole time, further restricts a session. Effective perms for an identity-based request = Identity policy ∩ SCP ∩ Boundary ∩ Session policy, with any explicit Deny winning; resource-based policies are evaluated alongside and follow the same-account vs cross-account rules. SCPs and boundaries are about *governance*; session policies are about *runtime least-privilege*.
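A minimal sketch of the intersection above, modelling each layer as a plain set of allowed actions. The `effective_permissions` helper and the sample action sets are illustrative assumptions; real IAM evaluation also handles explicit denies, conditions and resource-based policies, which this toy model deliberately leaves out.

```python
def effective_permissions(identity, scp, boundary=None, session=None):
    """Intersect the layers that can only narrow an identity policy.

    A layer passed as None is treated as "not attached" and skipped.
    """
    allowed = set(identity) & set(scp)
    for layer in (boundary, session):
        if layer is not None:
            allowed &= set(layer)
    return allowed


identity = {"s3:GetObject", "s3:PutObject", "iam:CreateRole"}
scp = {"s3:GetObject", "s3:PutObject", "ec2:RunInstances"}  # org guardrail
boundary = {"s3:GetObject", "s3:PutObject"}                  # delegation cap
session = {"s3:GetObject"}                                   # passed at AssumeRole

result = effective_permissions(identity, scp, boundary, session)
print(sorted(result))
```

Note that `ec2:RunInstances` appears only in the SCP and never reaches the result: a guardrail layer can remove permissions but cannot add them.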
08How would you architect a globally available, low-latency static + API site with strong cache + DDoS protection?
CloudFront (with Origin Shield) → S3 for static + ALB/API Gateway for dynamic. WAF rules for OWASP Top 10 + custom rate-limit. AWS Shield Advanced for DDoS. Route 53 with latency/geo routing + health checks. Lambda@Edge for auth/AB testing at the edge. ACM cert. CloudFront cache policies tuned per path. Result: <100ms p95 globally, automatic DDoS absorbed at edge.

☸️ Kubernetes

Pods, scheduling, networking, debugging, GitOps — staple of every SRE round.

01A pod is stuck in `CrashLoopBackOff`. Walk me through your debug steps.
kubectl describe pod (events, last termination reason), kubectl logs --previous, check resources/limits, image pull errors, init container failures, readiness/liveness probe config, configmap/secret mounts, RBAC for service account, node disk pressure, and finally exec into a sidecar / use ephemeral debug container.
02Difference between Deployment, StatefulSet, DaemonSet, and Job — give one production use case for each.
Deployment → stateless web app (rolling update). StatefulSet → Kafka/MongoDB cluster (stable network ID + ordered start). DaemonSet → log/metrics agent on every node (Fluent Bit, node-exporter). Job/CronJob → one-shot batch (DB migration, nightly backup).
03How does a request reach a Pod from outside the cluster, end-to-end?
DNS → external LB (ALB/NLB) → Ingress controller (nginx/traefik/istio) → Service (ClusterIP via kube-proxy iptables/ipvs/eBPF) → Pod IP via CNI (Calico/Cilium) → container port. mTLS handled by service mesh if present.
04What's the difference between Resource Requests and Limits, and why does setting only Limits cause issues?
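Requests = what scheduler reserves (also used for HPA calculations). Limits = hard cap; CPU is throttled, memory triggers OOMKill. Only Limits → Kubernetes copies the limit into the request (unless a LimitRange supplies a default) → every pod reserves its worst case → nodes look full while mostly idle and new pods go Pending. Setting neither is the opposite failure: scheduler treats request as 0 → over-packs nodes → noisy neighbour. Best: set both, or use VPA in recommendation mode.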
05Argo CD vs Flux — which would you pick for a 100-team platform?
Argo CD for strong UI + multi-tenancy (AppProjects + RBAC) + ApplicationSets for templating per env. Flux is great for GitOps purists & helm-heavy shops with no UI need. At scale, Argo CD's UI/SSO story usually wins for org-wide adoption.
06What is a PodDisruptionBudget and when does it actually save you?
PDB tells the cluster the minimum available (or max unavailable) pods of a workload during voluntary disruptions (node drain, autoscaler scale-down, upgrades). Without it, draining a node can take down all replicas of a service. Set `minAvailable: 1` for any HA service (with 2+ replicas, otherwise the PDB blocks every drain). Doesn't protect against involuntary disruptions (node crash); for those you need enough replicas spread across nodes and AZs (topology spread constraints / anti-affinity).
07How do you upgrade a production EKS cluster from 1.28 to 1.30 with zero downtime?
1) Read 1.29 + 1.30 release notes for deprecated APIs; run `pluto`/`kubent` to find them in your manifests. 2) Upgrade control plane first (in-place, AWS-managed), one minor version at a time: 1.28 → 1.29, validate, then 1.29 → 1.30. 3) Roll node groups blue/green: create new node group on the target version, cordon old, drain pod-by-pod respecting PDBs, delete old NG. 4) Update add-ons (CNI, kube-proxy, CoreDNS, EBS CSI). 5) Test in dev/stage first; never skip minor versions; have a rollback runbook (you can't downgrade control plane, only node groups).
08What's the difference between a Sidecar, an Init Container, and an Ephemeral Debug container?
Init container = runs to completion before app containers start (DB migrations, fetch config). Sidecar = runs alongside app for the pod's lifetime (log shipper, service mesh proxy, secret refresher). Ephemeral container = injected into a running pod for debugging (`kubectl debug`); doesn't restart the pod, can't have ports/probes. K8s 1.29+ has native sidecar support (restartPolicy: Always on init containers).

🏗️ Terraform / IaC

State, modules, drift, multi-env — asked at every cloud/DevOps interview.

01Two engineers ran `terraform apply` simultaneously and now state is corrupted. How do you recover, and how do you prevent it?
Recover: pull last good state from S3 versioning, terraform state pull/push, fix manually if needed, run terraform plan to verify. Prevent: S3 backend + DynamoDB lock table; CI-only applies via pipeline (no local applies on prod); branch protection + manual approval gate.
02Someone changed an AWS resource manually in the console. How do you handle drift?
Detect with `terraform plan` (or drift detection in TFC/Spacelift). Decide: (a) revert to code by re-applying, (b) import the change into code if intentional, or (c) update the module and apply. Long-term: SCP/Config rules to deny console changes in prod.
03When would you use a Terraform module vs a workspace vs a separate state file?
Module = reusable pattern (VPC, EKS). Workspace = small env variations of same code (dev/stage/prod with same shape). Separate state = blast-radius isolation (one state per env or per service) — preferred at scale. Workspaces don't isolate blast radius and are debated for prod use.
04Explain `count` vs `for_each` and when each fails you.
count uses an integer index → removing item N reorders later items → destroys/recreates. for_each uses a string key → stable identity, safe additions/removals. Use for_each for sets/maps; count only for create/destroy toggles or truly numeric replicas.
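The index-shift problem can be seen in a toy model. This is not Terraform's planner, only a sketch that addresses the same list of instances by position (as `count` does) and by key (as `for_each` does) and then diffs the two states; `plan_changes`, `by_index` and `by_key` are hypothetical helper names.

```python
def plan_changes(old_state, new_state):
    """Return addresses whose value changed, was added, or was removed."""
    changed = {}
    for addr in sorted(set(old_state) | set(new_state), key=str):
        before, after = old_state.get(addr), new_state.get(addr)
        if before != after:
            changed[addr] = (before, after)
    return changed


def by_index(names):
    # count-style addressing: resource[0], resource[1], ...
    return {i: name for i, name in enumerate(names)}


def by_key(names):
    # for_each-style addressing: resource["api"], resource["worker"], ...
    return {name: name for name in names}


before = ["api", "worker", "cron"]
after = ["api", "cron"]  # "worker" removed from the middle

count_diff = plan_changes(by_index(before), by_index(after))
for_each_diff = plan_changes(by_key(before), by_key(after))

print("count:", count_diff)
print("for_each:", for_each_diff)
```

With positional addressing, index 1 changes identity and index 2 disappears, so two instances are touched; with keyed addressing only `"worker"` is affected.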
05How do you handle secrets in Terraform without checking them into Git?
Read at apply time from AWS Secrets Manager / SSM / Vault using data sources; use sensitive=true on outputs/vars; pass via TFC variable sets / GitHub OIDC + assume-role; never store in tfvars. Values read through data sources still land in the state file, so encrypt the backend and restrict who can read it. Rotate via the secret store, not by `terraform apply`.

⚙️ CI/CD & GitOps

Pipelines, branching, secret handling, promotion strategies.

01Walk me through a secure CI/CD pipeline for a microservice deploying to prod EKS.
PR → lint + unit tests + SAST (Sonar/Snyk) + SCA → build OCI image (multi-stage, distroless) → SBOM (Syft) + sign (Cosign) → push to ECR → integration tests → deploy to dev via Argo CD → smoke tests → manual approval → progressive rollout to prod (Argo Rollouts canary) → post-deploy verify (Prom alerts + logs) → auto-rollback on SLO breach.
02Trunk-based vs GitFlow — what do you actually run in production teams?
Modern teams: trunk-based with short-lived feature branches + feature flags. Faster feedback, smaller PRs, no merge hell. GitFlow only fits when you ship versioned releases (libraries, mobile, embedded) — not for SaaS.
03How do you stop a developer's machine from being able to deploy to prod?
Block direct kubectl/AWS access via SSO + role separation. CI is the only thing with the ProdDeployer role (via OIDC, no static creds). Branch protection on main + required CI checks + signed commits. Dev access to prod is read-only via break-glass.

🔐 DevSecOps & Cloud Security

Asked at every senior cloud/SRE interview in 2026.

01What's the difference between SAST, DAST, SCA and IAST? Where do they sit in the pipeline?
SAST = static code scan (SonarQube, Semgrep) — pre-commit/PR. SCA = dependency CVEs (Snyk, Dependabot, Trivy) — PR + build. DAST = runtime scan against running app (OWASP ZAP) — staging. IAST = instrumented app during tests, hybrid of SAST+DAST. Together they cover code, deps, and runtime.
02How would you implement Zero-Trust for an internal Kubernetes platform?
Identity per workload (SPIFFE/SPIRE or service-account tokens), mTLS everywhere via mesh (Istio/Linkerd), authz with OPA/Kyverno, network policies default-deny, secrets via Vault with short TTL, JIT human access via Teleport/IAM Identity Center, full audit to SIEM.
03A leaked AWS access key was found on GitHub. Your incident response in 5 minutes?
1) Disable the IAM key (don't delete yet — preserves trail). 2) Rotate. 3) Pull CloudTrail for that key in last 90 days, look for unusual regions/services. 4) GuardDuty/Detective findings. 5) Force-reset any creds it could have minted (STS sessions). 6) Open postmortem; root-cause = static keys in repos, fix with OIDC.
04How do you prevent secrets from being committed to Git in the first place?
Pre-commit hook with gitleaks/trufflehog; org-wide secret scanning on push (GitHub/GitLab); fail PR check if secret detected; rotate immediately if a secret slips; require Vault/Secrets Manager for any new code.
05What's the difference between IAM Role and Instance Profile?
Role = identity with permissions. Instance Profile = wrapper that lets EC2 assume a role (one role per profile). Console hides this; CLI/API requires both. On EKS, pods get their own roles through IRSA (IAM Roles for Service Accounts, using OIDC) or EKS Pod Identity, while the worker nodes themselves still use an instance profile.

🐧 Linux & Networking

Foundational — every infra/SRE round has at least 2 of these.

01A server's load average is 40 but CPU is 5%. What's happening and how do you investigate?
Load avg counts processes in R + D (uninterruptible I/O wait). Likely disk/NFS/network bottleneck. Check `iostat -xz 1`, `vmstat 1`, `iotop`, `dmesg`, `mount`, look for D-state processes via `ps auxf | awk '$8 ~ /D/'`. Common cause: slow EBS / saturated NFS / runaway log write.
02Walk me through what happens when you type `curl https://api.example.com` and press Enter.
DNS resolution (resolver → root → TLD → authoritative or cache hit) → TCP 3-way handshake on 443 → TLS handshake (SNI, cipher negotiation, cert validation against trust store) → HTTP/1.1 or HTTP/2 request → response → TCP teardown or connection reuse. Each step has its own failure modes & metrics.
03Difference between hard link, soft link, and bind mount?
Hard link = additional dirent pointing to same inode (same FS only, survives original deletion). Soft/symbolic link = file containing path (cross-FS, breaks if target moves). Bind mount = kernel mounts an existing dir at another path (used heavily by containers).
04How does a container differ from a VM at the kernel level?
VM = full guest OS on a hypervisor (KVM/Xen), strong isolation, slow boot, GBs. Container = process(es) in host kernel, isolated by namespaces (pid, net, mnt, uts, ipc, user) + cgroups (CPU/mem limits) + seccomp/AppArmor. Faster, lighter, weaker isolation than VM.

📊 Observability & SRE

Logs, metrics, traces, SLOs — hot at FAANG/product company interviews.

01Define SLI, SLO, SLA, and Error Budget with one concrete example.
SLI = a measured indicator (e.g. % requests <200ms). SLO = internal target (e.g. 99.9% / 30 days). SLA = customer-facing contract with penalty (usually looser, e.g. 99.5%). Error budget = 100% − SLO = how much failure you can spend before freezing releases.
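The error-budget arithmetic is worth being able to do on a whiteboard. A small sketch, assuming an availability SLO measured over a rolling window in days (the function name is illustrative):

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Minutes of full outage the SLO tolerates over the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_percent) / 100


for slo in (99.5, 99.9, 99.99):
    budget = error_budget_minutes(slo)
    print(f"SLO {slo}% over 30 days -> {budget:.1f} min of budget")
```

For the 99.9% / 30-day example this works out to roughly 43 minutes, which is the number to quote when asked how much downtime the team can "spend".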
02How do you reduce alert noise without missing real incidents?
Alert on symptoms (user-facing SLI burn) not causes. Multi-window burn-rate alerts (Google SRE workbook). Group by service+severity. Auto-resolve when condition clears. Quarterly review: any alert that didn't lead to action gets deleted or downgraded.
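A sketch of the multi-window burn-rate check mentioned above. Burn rate is the observed error ratio divided by the error ratio the SLO allows; the 14.4 threshold paired with a 1 h long window and a 5 min short window is the paging example from the Google SRE workbook for a 30-day SLO. The function names and sample ratios are assumptions for illustration.

```python
def burn_rate(error_ratio, slo):
    """How many times faster than 'exactly on budget' errors are arriving."""
    return error_ratio / (1 - slo)


def should_page(long_window_errors, short_window_errors, slo=0.999, threshold=14.4):
    """Page only if both windows are burning fast.

    The long window proves the problem is significant; the short window
    proves it is still happening, so the alert clears soon after recovery.
    """
    return (
        burn_rate(long_window_errors, slo) >= threshold
        and burn_rate(short_window_errors, slo) >= threshold
    )


print(should_page(0.02, 0.02))     # sustained 2% errors in both windows
print(should_page(0.02, 0.0005))   # long window still hot, short window recovered
```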
03Pull vs push metrics — when does each win?
Pull (Prometheus): great service discovery, easy debugging (curl /metrics), works for long-lived targets. Push (StatsD, OTel push, Pushgateway): better for short-lived jobs, edge devices behind NAT, serverless. Most stacks blend both.
04When would you choose Loki vs Elasticsearch for logs?
Loki: cheap, label-indexed only (low cardinality), great with Grafana, scale-out S3 backend, pairs perfectly with Prom. Elasticsearch: full-text search across log content, expensive at scale, mature for SIEM/security use cases. Pick Loki for ops, ES for security/forensics.

🤖 AI / Agentic AI / LLMOps

The 2026 wildcard round — every cloud, DevOps and SRE role now asks at least 2-3. Pulled from real interviews at AWS, Atlassian, Razorpay, Swiggy, FAANG and India-product unicorns (May 2026).

01What is an AI Agent and how is it different from a chatbot or a RAG pipeline?
A chatbot generates text. A RAG pipeline retrieves + generates. An **AI agent** decides — it takes a goal, plans steps, picks tools (search, code, APIs, MCP servers), observes results, and loops until done (ReAct / Plan-and-Execute / Reflexion patterns). Key traits: tool use, memory (short-term scratchpad + long-term vector store), planning, self-correction. 2026 examples: GitHub Copilot Workspace, Devin, AWS AgentCore, Cursor Agent, Claude Computer Use.
02Explain RAG vs fine-tuning vs prompt-engineering — when do you choose which in 2026?
**Prompt-engineering**: cheapest, no infra, good for stable instructions. **RAG**: inject fresh/proprietary knowledge at query time via vector or hybrid search; choose when data changes daily. **Fine-tuning** (LoRA / QLoRA / DPO): teach style, format or domain reasoning; expensive, needs eval set. **Production default in 2026**: prompt + RAG + small fine-tune on output format. Fine-tune the whole model only when prompt+RAG plateaus.
03What is MCP (Model Context Protocol) and why is it the 'USB-C of AI' in 2026?
MCP (Anthropic, open-sourced Nov 2024, exploded in 2025-26) standardises how LLMs discover and use tools/data via JSON-RPC. Three primitives: **Tools** (functions), **Resources** (read-only data), **Prompts** (templates). One MCP server (e.g. `mcp-server-github`, `mcp-server-aws`, `mcp-server-k8s`) is callable from Claude Desktop, Cursor, Continue, ChatGPT Connectors, Bedrock AgentCore — no per-app glue. For DevOps: write your platform's MCP server once, every team's agents get safe tool access. OAuth 2.1 + scopes added in 2025 spec for prod use.
04Compare MCP vs A2A (Agent2Agent protocol) — when do agents call MCP servers vs each other?
**MCP** = agent ↔ tool (read GitHub, query DB, restart pod). **A2A** (Google open spec, 2025) = agent ↔ agent across orgs/vendors (orchestrator agent delegates 'refund this customer' to a billing agent owned by another team). MCP is the dominant tool layer; A2A handles multi-agent orchestration where agents are independent services. 2026 architecture: orchestrator agent (A2A) → specialist agents (each with their own MCP toolset). Most teams start with MCP only; A2A matures through 2026.
05Walk me through building a production AgenticOps pipeline that auto-triages CI failures.
**Trigger**: GitHub Actions `workflow_run` webhook on failure. **Agent loop** (LangGraph / Bedrock AgentCore / CrewAI): (1) Read logs via MCP `github` server. (2) Classify: flaky test / dep conflict / infra / real bug (LLM + few-shot). (3) Branch: flaky → rerun; dep conflict → bump lockfile, open PR; infra → page on-call via PagerDuty MCP; real bug → write summary to Jira. (4) Always: write trace to LangSmith / Langfuse. **Guardrails**: dry-run mode for first 2 weeks, tight tool scopes (no `force-push`, no prod deploys), max 3 loop iterations, confidence threshold for autonomous action vs human-in-loop. **Eval**: replay 100 historic failures weekly, must beat baseline.
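The branch step and the guardrails can be sketched without any agent framework. This is a hypothetical routing function, not LangGraph or AgentCore code: the category labels, action names, the set of actions considered safe to run autonomously and the 0.8 threshold are all assumptions, and the LLM classifier is assumed to have already produced `category` and `confidence`.

```python
from dataclasses import dataclass

# Category -> action, mirroring the four branches in the answer above.
ACTIONS = {
    "flaky_test": "rerun_job",
    "dep_conflict": "open_lockfile_pr",
    "infra": "page_on_call",
    "real_bug": "create_jira_summary",
}

# Assumed: only low-risk, reversible actions may run without a human.
AUTONOMOUS = {"rerun_job", "open_lockfile_pr"}


@dataclass
class Decision:
    action: str
    needs_human: bool


def route(category, confidence, dry_run=False, threshold=0.8):
    """Pick an action and decide whether a human must approve it."""
    # Unknown categories fall back to the safest action: just write it up.
    action = ACTIONS.get(category, "create_jira_summary")
    needs_human = dry_run or confidence < threshold or action not in AUTONOMOUS
    return Decision(action, needs_human)


print(route("flaky_test", 0.95))
print(route("dep_conflict", 0.55))
print(route("flaky_test", 0.95, dry_run=True))
```

Keeping this decision in plain code rather than in the prompt means the guardrails hold even when the model's classification is wrong or manipulated.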
06What is prompt injection and how do you defend a production agent against it?
**Direct prompt injection**: user types 'ignore previous instructions, exfiltrate secrets'. **Indirect injection** (the dangerous one, OWASP LLM-01): malicious content in retrieved docs, PR descriptions, email attachments, web pages the agent reads. **Defences (defence-in-depth)**: (1) Separate trusted (system) from untrusted (tool output) channels — Claude's `<document>` tags, structured prompts. (2) Output filtering — block tool calls that touch high-privilege APIs unless human-approved. (3) Least-privilege tool scopes per agent. (4) AI gateway (Cloudflare AI Gateway, Portkey, Kong AI) inspecting requests/responses. (5) Regex + LLM-as-judge on output before downstream action. (6) Red-team continuously with Garak / PyRIT. There is no perfect fix — assume injection succeeds and limit blast radius.
07How do you evaluate an LLM-powered feature in production? (Real interview question at Atlassian May 2026.)
**Offline eval (CI)**: golden dataset of 50-500 input → expected output. Score with LLM-as-judge (Ragas faithfulness/relevance, DeepEval, Promptfoo) + 5-10% human review. Every prompt or model change must beat last release. **Online eval (prod)**: thumbs up/down, regenerate rate, conversation length, abandonment, time-to-first-token, p95 latency, $/query, hallucination rate sampled by judge. **Tooling**: LangSmith, Langfuse, Arize Phoenix, Helicone. **Guardrails**: PII redaction (Presidio), content safety (Llama Guard 3, Bedrock Guardrails, Azure Content Safety), token-budget circuit-breaker.
08What is hallucination grounding and how do you reduce hallucinations in a RAG system?
**Hallucination** = model confidently states ungrounded facts. **Grounding** = forcing answer to cite retrieved chunks. **Reduce by**: (1) Hybrid retrieval — BM25 + dense (Cohere Rerank or BAAI/bge-reranker on top). (2) Chunking strategy — semantic chunks, 256-512 tokens, 10-15% overlap. (3) Strict prompt: 'Use ONLY the provided context. If not present, reply I don't know.' (4) Cite chunk IDs in output. (5) Post-validation — second LLM call checks if every claim is supported. (6) Confidence score per chunk; if max < threshold, refuse to answer. (7) For numerical answers, run a tool call (calculator/SQL) instead of trusting the LLM.
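For point (1), the answer does not say how the BM25 and dense result lists are merged before reranking. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the prescribed method; `k=60` is the constant commonly used with RRF, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending, then by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["doc_a", "doc_b", "doc_c"]    # keyword retriever
dense_hits = ["doc_c", "doc_a", "doc_d"]   # embedding retriever

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

The fused list would then go to the reranker, and only the top few chunks into the prompt.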
09Compare Amazon Bedrock vs Bedrock AgentCore vs SageMaker vs Amazon Q in 2026.
**Bedrock**: managed FM access (Claude, Llama 3, Mistral, Titan, Nova) via one API + Knowledge Bases (managed RAG) + Guardrails. **Bedrock AgentCore** (preview July 2025, GA October 2025): managed runtime for agents that handles memory, sessions, tool routing and observability; works with any framework (LangGraph, CrewAI, Strands). **SageMaker**: full ML platform for training and deploying custom models, with a model registry and managed MLflow. **Amazon Q**: pre-built assistants (Q Developer for IDE, Q Business for company KB). Decision tree: building a GenAI app → Bedrock. Building an autonomous agent → AgentCore. Custom model training → SageMaker. Want an off-the-shelf assistant → Q.
10How do you serve an open-source LLM cost-effectively at scale? vLLM vs TGI vs SGLang vs Bedrock?
**vLLM** (UC Berkeley) — current leader for throughput; PagedAttention + continuous batching + speculative decoding. Run Llama 3 70B on 2-4 H100s. **TGI** (HuggingFace) — solid, slightly behind vLLM on throughput. **SGLang** — 2025 rising star, beats vLLM on structured-output workloads (JSON, tool-calls). **Bedrock / Together / Groq** — pay-per-token, zero ops, lowest cost for spiky/low-volume. **Decision**: <1M tokens/day → managed API. 1-50M tokens/day or strict data residency → self-host vLLM on EKS with Karpenter for GPU autoscaling. Always: KV-cache reuse, prompt caching (Anthropic / OpenAI / vLLM all support it — 50-90% savings).
11What is an AI Gateway and why are companies deploying one in front of every LLM call in 2026?
AI gateway = reverse proxy for LLM traffic (Cloudflare AI Gateway, Portkey, Kong AI Gateway, LiteLLM Proxy, Helicone). **Why**: (1) **Cost control** — central budgets per team, model routing (cheap model first, fall back to GPT-4o on retry). (2) **Reliability** — provider failover (OpenAI down → route to Anthropic). (3) **Caching** — semantic cache cuts 30-70% of redundant calls. (4) **Security** — PII redaction, prompt-injection scanning, audit log of every prompt. (5) **Observability** — token / latency / error per app, single dashboard across OpenAI / Bedrock / Azure OpenAI / self-hosted. **Pattern**: app → AI Gateway → (multiple LLM providers).
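The cheap-first routing, failover and caching ideas fit in a few lines. This is a toy sketch, not the API of any gateway product named above: the provider functions are stubs, `ProviderError` is a made-up exception, and an exact-match dictionary stands in for a real semantic cache.

```python
class ProviderError(Exception):
    """Raised by a provider stub when a call fails (rate limit, outage)."""


def call_with_fallback(prompt, providers, cache):
    """Try providers in order (cheapest first) and cache the first success."""
    if prompt in cache:
        return cache[prompt]
    errors = []
    for name, call in providers:
        try:
            answer = call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
            continue
        cache[prompt] = (name, answer)
        return name, answer
    raise ProviderError("all providers failed: " + "; ".join(errors))


def cheap_model(prompt):
    raise ProviderError("rate limited")       # simulate an outage


def strong_model(prompt):
    return f"answer to: {prompt}"


cache = {}
providers = [("cheap", cheap_model), ("strong", strong_model)]

first = call_with_fallback("hello", providers, cache)
second = call_with_fallback("hello", providers, cache)   # served from cache
print(first, second)
```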
12How would you design memory for a long-running agent that handles a 6-hour customer support session?
Three tiers: **(1) Working memory** = current turn's scratchpad in the prompt (keep <8k tokens). **(2) Episodic memory** = summarised recent turns; use rolling summarisation when context fills (e.g. LangGraph's `add_messages` reducer plus a summarisation step such as LangMem's `summarize_messages`). **(3) Long-term memory** = vector store of past interactions + user profile, retrieved at session start (Mem0, Zep, Letta/MemGPT, Bedrock AgentCore Memory). **Cost trick**: enable prompt caching on the static system+tool definitions (up to 90% input cost cut on the cached part). **Eval**: 'recall test': inject a fact in turn 5, verify recall in turn 50.
13How do you handle multi-tenancy and PII in an enterprise RAG app?
**Per-tenant isolation**: separate vector namespace/index per tenant (Pinecone namespaces, OpenSearch routing, pgvector with row-level security). Never co-mingle embeddings. **Auth at retrieval**: ACLs filtered in the query, not after; document-level permissions stored as metadata, enforced as a filter pre-search. **PII**: redact on ingest (Presidio, Bedrock Guardrails sensitive-info filter), encrypt embeddings at rest, region-pin storage for GDPR/DPDP. **Audit**: every retrieval logs (user, doc IDs returned). **Test**: red-team with 'show me other customer's data' style prompts before shipping.
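The "filter in the query, not after" rule is easier to defend with a concrete shape. Below is an in-memory sketch, assuming each chunk carries `tenant` and `allowed_groups` metadata; a real vector store would apply the same predicate as a metadata filter inside the search call. All field names and sample data are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, chunks, tenant_id, user_groups, top_k=3):
    """Apply tenant and ACL filters first, then rank only what is visible."""
    visible = [
        c for c in chunks
        if c["tenant"] == tenant_id and set(c["allowed_groups"]) & set(user_groups)
    ]
    ranked = sorted(visible, key=lambda c: -dot(query_vec, c["embedding"]))
    return [c["id"] for c in ranked[:top_k]]


chunks = [
    {"id": "a1", "tenant": "acme", "allowed_groups": ["support"], "embedding": [1.0, 0.0]},
    {"id": "a2", "tenant": "acme", "allowed_groups": ["finance"], "embedding": [0.9, 0.1]},
    {"id": "b1", "tenant": "globex", "allowed_groups": ["support"], "embedding": [1.0, 0.0]},
]

hits = retrieve([1.0, 0.0], chunks, tenant_id="acme", user_groups=["support"])
print(hits)
```

Filtering after the top-k search would both leak relevance signals and risk returning fewer than `top_k` permitted results.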
14What's Reflexion / self-correction in agents and when does it actually help vs waste tokens?
**Reflexion** = agent re-reads its own output, criticises it, and tries again (also: Self-Refine, Chain-of-Verification). **Helps**: code generation, math, complex multi-step reasoning, structured-output validity — measurable accuracy gains. **Wastes tokens**: simple lookups, well-defined RAG Q&A, time-sensitive endpoints. **Rule of thumb**: enable when task accuracy < 80% on golden set and latency budget > 5s. Cheaper alternative: stronger model + one-shot validation step.
15Frontier models in mid-2026 — what does the practical landscape look like?
**Closed frontier**: Claude Sonnet/Opus 4.x (best at agents + code), GPT-5 / o-series (best at reasoning), Gemini 2.5 Pro (best at long context + multimodal). **Open frontier**: Llama 4, DeepSeek V3 / R1 (huge cost/quality win), Qwen 3, Mistral Large 2. **Picking**: agents and tool-use → Claude. Hardcore reasoning / research → o-series. Massive context (1M+ tokens) → Gemini. Cost-sensitive self-host → DeepSeek / Llama 4. Real prod stacks route across multiple via an AI gateway — no single model wins everything.
16Why run an LLM locally with Ollama / vLLM instead of using GPT / Claude APIs?
**Reasons**: (1) data sovereignty (regulated, on-prem, RBI/HIPAA/GDPR), (2) zero per-token cost for very high volume, (3) offline use, (4) lower p99 latency, (5) fully tunable (LoRA). **Trade-offs**: weaker reasoning than frontier closed models, GPU OpEx, ops burden (vLLM upgrades, model swaps, eval drift). **Hybrid is the 2026 norm**: local model handles 80% bulk + sensitive, route to Claude/GPT only for hard cases via the AI gateway.
17What is LLMOps and how does it differ from MLOps? What does an LLMOps pipeline contain?
**MLOps** = train → validate → deploy → monitor a model you own. **LLMOps** adds: prompt management (versioned in Git, like code), eval-as-CI (golden set + LLM-judge), guardrails, RAG infra (chunker, embedder, vector store, reranker), observability (traces of multi-step chains, token cost, hallucination rate), agent runtime (memory, tool registry, sandboxes), and continuous red-teaming. **Tooling 2026**: LangSmith / Langfuse / Arize for traces, Promptfoo / Ragas / DeepEval for tests, Portkey / LiteLLM for gateway, Bedrock AgentCore / LangGraph Cloud for agent runtime.
18Walk me through a real Agentic AI architecture you'd ship to production (system-design round).
**User → Web app → AI Gateway → Orchestrator Agent**. Orchestrator (LangGraph state machine on EKS or Bedrock AgentCore) decides intent and routes to specialist agents (each = its own service with scoped MCP toolset). **Tool layer**: MCP servers for GitHub, Jira, AWS, internal APIs — deployed as sidecars or shared service. **Memory**: Redis for session, pgvector / OpenSearch for long-term, Mem0 / Letta for user profile. **Eval & obs**: every span to Langfuse + Datadog APM. **Guardrails**: Bedrock Guardrails / Llama Guard on input + output; prompt-injection scanner on retrieved content. **Deploy**: GitHub Actions → ECR → ArgoCD on EKS, Karpenter for GPU autoscale, blue/green per agent. **Cost**: AI gateway with prompt caching + cheap-first model routing keeps cost <30% of naive design.

🐳 Docker & Containers

Image hygiene, layering, security — every DevOps round opens with these.

01Your Docker image is 1.8 GB. How do you bring it under 200 MB without breaking the app?
Multi-stage build (build deps in stage 1, copy only the binary/dist into stage 2). Use distroless or alpine base. Combine RUN layers + clean apt cache in same layer. Use .dockerignore (node_modules, .git, tests). Pin specific tags (no :latest). For Node/Python, prune dev deps. Verify with `docker history` and `dive`.
02Difference between CMD, ENTRYPOINT, and RUN — when do they fight each other?
RUN executes at build time and bakes into the image. ENTRYPOINT is the executable; CMD is the default arguments. If both are set, CMD becomes args to ENTRYPOINT. Override at runtime: `docker run img arg` replaces CMD; `--entrypoint` replaces ENTRYPOINT. Most production images use ENTRYPOINT ["/app/bin"] + CMD ["serve"].
03How would you scan and sign images in a CI pipeline?
Build → `trivy image --severity HIGH,CRITICAL --exit-code 1` → generate SBOM with Syft → sign with Cosign (`cosign sign --key ...` or keyless via OIDC) → push to registry → admission controller (Kyverno/OPA Gatekeeper) verifies signature before pulling in K8s.
04What happens to a running container if you delete its image?
Container keeps running — the image layers are reference-counted, the running container holds them open. New containers can't be started from that tag. `docker pull` again if you need it back. Lesson: image deletion is safe-ish at runtime but breaks rollouts.

🐍 Python for DevOps / SRE

Scripting, async, packaging — asked when role mentions automation.

01What's the GIL and when does it actually hurt you?
Global Interpreter Lock = only one thread executes Python bytecode at a time per process. Hurts CPU-bound workloads (use multiprocessing or release the GIL via C extensions / numpy). Doesn't hurt I/O-bound — threads still help because GIL is released during I/O syscalls. Modern fix: `asyncio` for I/O concurrency, `multiprocessing`/`concurrent.futures.ProcessPoolExecutor` for CPU.
02How do you package a CLI tool so a teammate can `pipx install` it?
Project layout with pyproject.toml (PEP 621), define [project.scripts] entrypoint, build with `python -m build`, publish to PyPI or internal index. Use `uv` or `poetry` for fast deps. Pin Python version with `requires-python`. Add `--version` flag and tests.
03list vs tuple vs set vs dict — pick one for: 10M lookups, ordered config, dedupe, immutable record.
10M lookups → dict or set (O(1)). Ordered config → list (or dict — preserves insertion order since 3.7). Dedupe → set. Immutable record → tuple (or `frozen=True` dataclass / NamedTuple for clarity).
04How do you avoid blocking the event loop in asyncio?
Never call sync I/O directly inside async functions. Wrap CPU/blocking calls with `asyncio.to_thread()` or `loop.run_in_executor()`. Use async libraries (httpx, aiobotocore, asyncpg). Profile with `loop.set_debug(True)` to catch slow callbacks.
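A short demonstration of the `asyncio.to_thread()` advice (Python 3.9+). The blocking function here is just `time.sleep` standing in for a sync SDK call; the heartbeat coroutine shows that the event loop keeps servicing other tasks while the blocking call runs in a worker thread.

```python
import asyncio
import time


def blocking_io(delay):
    """Stand-in for a synchronous call such as a legacy SDK request."""
    time.sleep(delay)
    return delay


async def heartbeat(ticks, interval):
    """Counts ticks; it only makes progress if the loop is not blocked."""
    count = 0
    for _ in range(ticks):
        await asyncio.sleep(interval)
        count += 1
    return count


async def main():
    # The blocking call runs in a thread; the heartbeat runs on the loop.
    return await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.2),
        heartbeat(3, 0.05),
    )


result, beats = asyncio.run(main())
print(result, beats)
```

Calling `blocking_io(0.2)` directly inside `main()` would instead freeze the loop for the full 200 ms before any heartbeat tick could run.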

🏛️ System Design (Cloud-native)

Asked at every senior+ round. Focus on trade-offs, not buzzwords.

01Design a URL shortener that handles 10K writes/sec and 1M reads/sec.
Writes: app tier → ID generator (Snowflake/KGS) → base62 encode → write to a sharded KV (DynamoDB / Cassandra) with TTL if needed. Reads: heavy CDN cache (CloudFront) → Redis → DB. Custom aliases via conditional write. Analytics via async stream (Kinesis/Kafka) → S3+Athena. Bottleneck moves from DB to cache; design for 95% cache hit.
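The base62 step is often asked as a follow-up coding question. A minimal sketch, assuming the ID generator hands back a non-negative integer; the alphabet order (digits, lowercase, uppercase) is a choice made here, not something the design above fixes.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 chars


def base62_encode(n):
    """Turn a non-negative integer ID into a short URL-safe code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(code):
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


short = base62_encode(123456789)
print(short, base62_decode(short))
```

Seven base62 characters cover 62^7 (about 3.5 trillion) IDs, which is why short links rarely need to be longer than that.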
02How do you design a multi-region active-active deployment for a SaaS app?
Stateless app behind global LB (Route 53 latency / Global Accelerator / CloudFront). Region-local Redis. Database: either Aurora Global (1 writer + read replicas) or DynamoDB Global Tables (true active-active, eventually consistent). Conflict resolution = last-writer-wins or app-level CRDTs. Test failover monthly with chaos drills.
03How would you handle 'thundering herd' on a cache miss for a hot key?
Single-flight pattern: only one request rebuilds the value, others wait on the same in-flight promise. Or use 'stale-while-revalidate' — serve stale value, async refresh. Add jitter to TTLs so keys don't expire together. For very hot keys, write-through cache or local LRU on app pods.
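A sketch of the single-flight pattern using asyncio, assuming all callers live in the same process (a distributed version would need a lock in Redis or similar). `SingleFlight` and `load_from_db` are illustrative names; the loader sleeps to simulate a slow rebuild.

```python
import asyncio


class SingleFlight:
    """Collapse concurrent loads of the same key into one in-flight task."""

    def __init__(self):
        self._inflight = {}

    async def do(self, key, loader):
        task = self._inflight.get(key)
        if task is None:
            # First caller starts the load; later callers await the same task.
            task = asyncio.ensure_future(loader())
            self._inflight[key] = task
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        return await task


calls = 0


async def load_from_db():
    global calls
    calls += 1
    await asyncio.sleep(0.05)   # simulate a slow cache rebuild
    return "value"


async def main():
    flight = SingleFlight()
    return await asyncio.gather(
        *(flight.do("hot-key", load_from_db) for _ in range(100))
    )


results = asyncio.run(main())
print(calls, len(results))
```

One hundred concurrent misses produce a single database call. In production code the shared task would usually be wrapped in `asyncio.shield()` so that one cancelled caller does not cancel the load for everyone else.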
04When would you choose Kafka over SQS, and vice versa?
Kafka: ordered, replayable, high-throughput streaming, multiple consumer groups reading the same topic, log of truth. SQS: simple managed queue, per-message ack, dead-letter built-in, no ops. Kafka if you need replay/multiple consumers/streaming joins; SQS for simple work distribution.

🤝 Behavioural / Leadership

STAR format. Hiring managers weight these as heavily as tech rounds.

01Tell me about a time you disagreed with your manager. (STAR)
S: We were about to ship a release without a rollback plan to hit a deadline. T: My manager wanted to ship; I felt the risk was too high. A: I quantified blast radius in numbers (X users, Y$ revenue/min if down), proposed a 2-day delay to add canary + auto-rollback. R: Delay accepted, canary caught a regression, no incident. Lesson: disagree with data, not opinions.
02Tell me about a production incident you led.
S: Payment service 5xx spike to 40%. T: I was on-call, lead responder. A: declared SEV-2, rolled back last deploy in 4 min, opened comms channel, kept stakeholders updated every 10 min, root-caused to a connection-pool exhaustion. R: Restored in 22 min; postmortem identified missing connection limit + alert; both shipped within 1 week. No customer escalation.
03How do you handle a teammate who consistently misses deadlines?
1:1 first — understand if it's blockers, scope, skill, or personal. Reset expectations with clearer DoD on tickets. Pair-program on tricky ones. Escalate to manager only if pattern persists after 2-3 cycles. Never go behind their back; never publicly call out in stand-up.
04Why do you want to leave your current role?
Always positive + forward-looking. 'Looking for [scale / domain / tech / growth] that my current role can't offer.' Never bash current employer or manager. Pair with what excites you about *this* role specifically — show you've done your homework.
05Where do you see yourself in 3 years?
Anchor in skills + impact, not titles. Example: 'Owning a platform that serves 100+ engineers, leading 1-2 ICs technically, deep in [Agentic AI / Platform Eng / SRE]. Open to staff/lead path if the org needs it.' Avoid 'in your job' (cringe) and avoid 'don't know' (red flag).

Want to crack these in real interviews?

Join a Cloudadhar batch — we run weekly mock interviews with detailed feedback on exactly these question patterns, plus salary negotiation playbooks.