Site Reliability Engineer — Cloud × IoT × AI

I build systems that run themselves.

Muhammad Ilham Kurniawan — engineer with 15+ years across infrastructure, multi-cloud operations, and intelligent automation. By day I keep a digital bank reliable and secure; beyond that, I design platforms that monitor, heal, and operate with minimal human intervention.

Tangerang, Indonesia · SRE @ Amar Bank, Tolaram Group · M.Eng candidate
15+ Years in Engineering
3 Cloud Platforms
6 Certifications
10+ Systems Shipped

About

Reliability by profession.
Autonomy by obsession.

My career began in server rooms — pulling cable, hardening Linux boxes, and keeping a steel manufacturer's network alive. Fifteen years later, the same instinct drives me at enterprise scale: as a Site Reliability Engineer at PT Bank Amar Indonesia (Tolaram Group), I own reliability strategy for digital-banking platforms — SLOs, incident management, data pipelines, and the automation that holds it all together.

The best infrastructure is the kind nobody has to think about — it observes itself, heals itself, and quietly does its job.

That conviction extends past the day job. I build AI agents that administer servers through natural language, smart buildings that track their own energy, and pipelines where Terraform, Ansible, and Kubernetes turn infrastructure into code. As a trainer in Cloud, DevOps, and IoT, I help other engineers do the same.

Quick Facts

Current role: SRE, Amar Bank
Group: Tolaram Group
Focus: Cloud × IoT × AI
Clouds: GCP · AWS · Alibaba
Education: M.Eng (thesis phase)
Also: Trainer in Cloud, DevOps, IoT

Selected Work

Projects that run without me

A selection of personal systems — several live in private repositories; a walkthrough or demo is available on request.

RAE — Remote Agent Executor

AI × Operations
Flagship · Go · MCP Server

What if an AI agent could safely run your servers? RAE is an Ansible-like remote-execution engine, written in pure Go, that speaks the Model Context Protocol — so Claude, Gemini, or any MCP client can inventory hosts, run playbooks, and learn from every execution.

  • MCP server exposing 10 operational tools to AI agents over SSH.
  • Adaptive memory in SQLite — learns command patterns, suggests next steps.
  • NLP intent recognition (TF-IDF + cosine similarity) in Indonesian and English.
  • Anomaly detection flags unusual failure rates before they become incidents.
  • Ansible-compatible YAML inventory; concurrent SSH pooling; dry-run mode.
Go · MCP · SQLite · SSH · NLP · Cobra

An agent at work

User → AI: "update semua server" (update all servers)
AI → RAE: list_hosts()
  ← web01 (ubuntu), db01 (ubuntu)
AI → RAE: run_command(host="web01", cmd="apt upgrade -y", sudo=true)
  ← status: ok
AI → RAE: get_suggestions(...)
  ← "Biasanya setelah update, Anda restart nginx. Lakukan sekarang?" (After an update you usually restart nginx. Do it now?)
AI → User: "Semua server updated ✓" (All servers updated ✓)
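
The "TF-IDF + cosine" intent recognition listed above can be illustrated with a minimal sketch. It is written in Python for brevity and is not RAE's Go implementation; the intent names, example phrases, whitespace tokenizer, and the 0.2 threshold are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical intents with Indonesian and English example phrases (not from RAE).
INTENT_EXAMPLES = {
    "update_packages": ["update semua server", "upgrade all packages", "perbarui paket server"],
    "restart_service": ["restart nginx", "restart layanan web", "reload the web service"],
    "check_disk": ["cek sisa disk", "show disk usage", "how much disk space is left"],
}


def tokenize(text):
    # Assumption: lowercase + whitespace split is enough for short commands.
    return text.lower().split()


def tfidf(tokens, idf):
    # Term frequency times inverse document frequency; unknown terms are dropped.
    counts = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in counts.items() if t in idf}


def build_index(examples):
    # Each example phrase is one "document" labelled with its intent.
    docs = [(intent, tokenize(p)) for intent, phrases in examples.items() for p in phrases]
    n = len(docs)
    df = Counter(term for _, tokens in docs for term in set(tokens))
    # Smoothed IDF so every known term keeps a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [(intent, tfidf(tokens, idf)) for intent, tokens in docs]
    return idf, vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize(text, idf, vectors, threshold=0.2):
    # Return the intent of the most similar example, or None below the threshold.
    query = tfidf(tokenize(text), idf)
    best_intent, best_score = None, 0.0
    for intent, vec in vectors:
        score = cosine(query, vec)
        if score > best_score:
            best_intent, best_score = intent, score
    if best_score < threshold:
        return None, best_score
    return best_intent, best_score


if __name__ == "__main__":
    idf, vectors = build_index(INTENT_EXAMPLES)
    for phrase in ["update semua server", "restart nginx sekarang", "halo dunia"]:
        print(phrase, "->", recognize(phrase, idf, vectors))
```

Because both languages share one vocabulary, a mixed phrase such as "restart nginx sekarang" still matches on its known terms, while a phrase with no known terms falls below the threshold and returns no intent.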

Shopping Agent

Agentic AI

A multi-agent system that researches, compares, and checks out products on online shops autonomously — agents divide the work of searching, evaluating, and acting through a real browser.

LangGraph · Python · Playwright · Multi-Agent

AI Intelligence System

Agentic AI

A personal AI chief-of-staff that runs unattended: daily intelligence briefings, contextual reminders, and system health reports — delivered before I ask.

LLM · Hermes · Cron · Automation

Smart Home Automation

IoT & Edge

A fully self-hosted smart home: AI-assisted control of air-conditioning, security, and energy monitoring — local-first, cloud-optional, and private by design.

Home Assistant · MQTT · Tuya · Node-RED · ESP32

JurnalSearch

Research Tool

An academic search engine that queries CrossRef, Semantic Scholar, Google Scholar, and OpenAlex in parallel — enriched with Scopus Quartile, SINTA, and SJR rankings, streamed live over SSE, exportable to CSV/BibTeX.

Python · Flask · aiohttp · Pandas
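
The parallel fan-out described for JurnalSearch can be sketched with `asyncio`. This is a hedged illustration, not the tool's actual code: the HTTP calls are replaced by stub coroutines so the example runs offline, the DOIs and titles are placeholders, and de-duplicating by DOI is an assumption about how overlapping results might be merged.

```python
import asyncio


async def fake_source(name, delay, records, fail=False):
    # Stand-in for an HTTP request to one index (hypothetical helper).
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} unavailable")
    # Tag each record with the source that returned it.
    return [dict(record, source=name) for record in records]


async def search_all(sources):
    # Run every source concurrently; a failing source must not sink the others.
    results = await asyncio.gather(*sources.values(), return_exceptions=True)
    merged, errors = {}, {}
    for name, result in zip(sources, results):
        if isinstance(result, Exception):
            errors[name] = str(result)
            continue
        for record in result:
            # Assumption: the first source to report a DOI wins.
            merged.setdefault(record["doi"], record)
    return list(merged.values()), errors


def demo():
    # Placeholder data only; these DOIs do not refer to real papers.
    sources = {
        "crossref": fake_source("crossref", 0.02, [
            {"doi": "10.0000/a", "title": "Paper A"},
        ]),
        "openalex": fake_source("openalex", 0.01, [
            {"doi": "10.0000/a", "title": "Paper A"},
            {"doi": "10.0000/b", "title": "Paper B"},
        ]),
        "semantic_scholar": fake_source("semantic_scholar", 0.01, [], fail=True),
    }
    return asyncio.run(search_all(sources))


if __name__ == "__main__":
    records, errors = demo()
    print(records)
    print(errors)
```

`asyncio.gather` waits for every source before returning. A version that streams results as they arrive, as the SSE delivery suggests, would more likely iterate with `asyncio.as_completed` and emit each batch immediately.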

School Information System

EdTech

Education-management modules for schools — student attendance, discipline-point tracking, and e-report cards (e-raport) — built to slot cleanly alongside Moodle e-learning when schools are ready to integrate.

PHP · MySQL · Moodle-ready

Multi-Cloud Hosting Platform

Cloud & Ops

A self-managed hosting platform serving schools and SMBs across Indonesia — provisioning, SSL, DNS, and mail, spread across GCP, AWS, and Alibaba Cloud.

HestiaCP · Nginx · GCP · AWS · Aliyun

IoT & DevOps Curriculum

Training

Authored open course materials and working source code for fundamental and advanced IoT classes (ESP32/C++), plus DevOps modules used in professional training programs.

ESP32 · C++ · Curriculum · DevOps

Capabilities

What I bring to the table

Site Reliability Engineering

SLO/SLI design, incident management, and reliability strategy for mission-critical banking platforms.

Cloud & DevOps

Multi-cloud (GCP, AWS, Alibaba), Kubernetes, CI/CD with Jenkins · GitLab · Spinnaker, IaC with Terraform & Ansible.

Monitoring & Observability

Grafana with Prometheus, Loki, and Alertmanager, turned into dashboards, alerts, and answers.

IoT & Smart Building

Sensor networks, energy monitoring, and building automation — ESP32, MQTT, Home Assistant, Node-RED.

AI & LLM Engineering

Agentic workflows (LangGraph, MCP), RAG pipelines, local LLM deployment with Ollama, prompt & context engineering.

Training & Advisory

Cloud, DevOps, and IoT training from fundamentals to advanced; architecture advisory and knowledge transfer.

Track Record

Fifteen years, four industries,
one throughline: keep it running

  • Sep 2020 — Present

    DevOps Engineer → Site Reliability Engineer

    PT Bank Amar Indonesia Tbk (Tolaram Group) · Jakarta

    Joined as DevOps Engineer running multi-cloud infrastructure on GCP, AWS, and Alibaba Cloud with end-to-end release automation (Jenkins, GitLab CI/CD, Spinnaker); promoted to SRE in 2023 to own reliability strategy for the group's digital-banking platforms — monitoring, incident management, and data pipelines.

  • Mar 2017 — Aug 2020

    DevOps Engineer / IT System Administrator

    PT Indodev Niaga Internet (DataOn) · South Tangerang

    Owned enterprise SaaS infrastructure on AWS plus main and DR data centers; migrated legacy Xen virtualization to Docker and OpenShift OKD.

  • Oct 2014 — Feb 2017

    System Engineer

    Surya Institute / STKIP Surya · South Tangerang

    Administered Moodle LMS end-to-end and built school information systems — attendance, discipline points, and e-report cards — for uninterrupted academic operations.

  • Feb 2010 — Sep 2014

    IT System & Network Administrator — Coordinator

    PT Patama Adijaya Steel · Tangerang

    Planned and ran company-wide IT — servers, networks, CCTV — while coordinating the IT team and supporting ISO 9001:2008 certification.

Credentials

Certified, and still studying

Google Cloud · Developer
Alibaba Cloud ACA · Cloud Computing
Alibaba Cloud ACA · Cloud Security
Red Hat · Administrator
MikroTik MTCNA · Network Associate
MikroTik MTCRE · Routing Engineer
Universitas Pamulang · Master's (S2), Informatics Engineering · 2024 – present · thesis phase
University of Raharja · B.Sc. (S1), Computer Systems · 2006 – 2010

Contact

Let's build something that runs itself.

Open to collaboration on Cloud × IoT × AI projects — consulting, architecture reviews, training, or a conversation about making systems more autonomous.