Joshua Tobin — Principal Engineer

// about

From colocation to cloud-native
to agentic AI

"The engineers who will define the next decade aren't the ones who just learned AI — they're the ones who spent 20 years learning how systems fail, and now know exactly where to put the intelligence layer."

I've been in the engine room for over two decades. I've managed 1,000+ Linux servers supporting Second Life's 50 million registered users, built SRE culture at Heroku from the ground up as the first external hire, and led the compliance engineering that unlocked $173M in ACV for MuleSoft's regulated market expansion.

My edge isn't just the tenure — it's the breadth. I can write Terraform, debug a Kubernetes scheduler, build a Golang test suite, and lead a cross-functional incident response. I've operated at every layer of the stack, which means I design systems that actually survive contact with production.

Today I'm building AI-native tools — applying the same automation mindset I've carried for 20 years to the agentic era. Always be shipping. LedgerWatchdog is my current focus: an agentic billing intelligence platform in active development, purpose-built to keep my AI engineering skills sharp and prove out detection pipelines at scale.

The agentic era needs engineers who understand production.

AI agents fail in the same ways distributed systems fail — at scale, under load, at the edges. I bring the rare combination of operational depth and AI-native thinking that turns ambitious systems into reliable ones.

I don't just ship features — I build platforms.

Whether it's a Terraform pipeline, a compliance framework, or an agentic AI product, I operate at the layer where engineering decisions become business outcomes.

Location: Ballston Spa, NY (fully remote since 2012)
Education: B.T. in CIS — SUNY Cobleskill, 2003
Currently: Principal SRE at MuleSoft / Salesforce

// experience

Where I've built things

Principal Site Reliability Engineer / PMTS

Feb 2021 – Present

MuleSoft (Salesforce)

The world's #1 integration and API platform. $5B ARR, ~10,000 employees.

Led Hyperforce expansion to Japan and Canada, achieving SOC, ISO, and PCI compliance — unlocking $173M in ACV across regulated markets
Architected a phased migration using the Strangler Fig pattern while maintaining a 99.9% SLA, leading a 2-engineer team
Developed a Terraform pipeline that cut EKS cluster deployment time by 30%; built Golang/Python test suites that brought previously untestable code under test
Redesigned Golang-based deployment pipeline using Kubernetes, Postgres, and Argo Workflows
Introduced IAM security automation framework within a Zero-Trust architecture, simplifying compliance audits

Senior Site Reliability Engineer

Feb 2014 – Mar 2021

Heroku (Salesforce)

Leading PaaS provider, core to Salesforce's multi-billion-dollar revenue stream.

Established the SRE group as the first external hire — built incident response, RCA, and on-call culture from the ground up
Rebuilt status.heroku.com from scratch using Ruby on Rails, improving customer visibility and incident response speed
Developed Ruby compliance data management framework reducing security access risk by 90%
Achieved 100% monitoring adoption across all service teams (Nagios, Splunk, bespoke synthetic monitoring)
Built internal sitrep system used during major incidents across engineering and support

Cloud Systems Engineer

Jul 2012 – Feb 2014

Audax Health (Optum / UnitedHealth Group)

Digital health and gamification platform, acquired by Optum.

First fully remote Cloud Systems Engineer on the team, reporting directly to Director of Operations
Automated AWS deployments with Puppet and Chef, reducing setup times by 20%+
Deployed and maintained ELK stack for real-time logging and operational visibility
Migrated infrastructure bootstrapping from Puppet to Chef, improving configuration consistency

Services Engineer

Jan 2012 – Jul 2012

Canonical

Commercial Ubuntu and OpenStack support, $150M+ revenue.

Delivered Ubuntu/OpenStack deployments via Canonical's Jumpstart program, reducing client deployment time by 15%
Provided on-site integration guidance across client infrastructure (50% travel)

Systems Engineer

Feb 2007 – Jan 2012

Linden Lab (Second Life)

Creator of Second Life — a virtual world with 50M+ registered users.

Managed colocation infrastructure targeting 99.9% uptime for 50M+ users
Administered 1,000+ Linux servers powering 30,000+ virtual simulators
Managed critical services: DHCP/DNS, server imaging, email, monitoring
Transitioned to fully remote role while maintaining seamless operational continuity

Director of IT

Jul 2006 – Feb 2007

Destination Hotels & Resorts

40+ property luxury hotel brand, $500M annual revenue.

Led all IT operations for Lake Tahoe resort including Windows infrastructure, network, and vendor management
Designed and implemented point-to-point Wi-Fi network between hotel and service buildings

Systems Administrator

Aug 2005 – Aug 2006

Porter Novelli (Omnicom Group)

Global PR agency, subsidiary of Omnicom.

Managed network operations across 4 offices supporting 150+ users, reporting to the CIO
Led Sarbanes-Oxley compliance initiatives and managed IT product testing lab

Systems Administrator

Jul 2003 – Aug 2005

Whiteman Osterman & Hanna LLP

Leading Albany, NY law firm.

Managed LAN/WAN infrastructure, thin client network, and SQL billing systems
Implemented Linux-based spam solution and managed help desk staff

// projects

What I've been building

AI-powered school lunchbox planner

Munchly.ai

Kids swipe on AI-generated lunch suggestions, parents get peace of mind. A mobile-first product built on React Native and Cloudflare's edge stack, with Claude powering personalized meal recommendations. Waitlist open.

In development | Waitlist open

React Native, Expo, Claude AI, Cloudflare Workers, Hono, D1

AI-powered billing intelligence

LedgerWatchdog

An agentic AI platform in active development that analyzes invoices and financial documents to detect duplicate charges, suspicious transactions, and tax write-off opportunities. Currently in alpha — built to prove out the detection pipeline before onboarding customers.

Alpha | In development

Python, AI/LLM, FastAPI, PostgreSQL

Historic property showcase & real estate marketing

Winona Camps ADK

A personal project showcasing the history and character of The Winona Camps — a historic Adirondack property at Big Moose Station that I own. Built as both an archival showcase and a marketing platform ahead of sale.

Node.js, Web

Team and roster management

AlpineRoster

An application for managing alpine team rosters and member coordination.

Coming soon

Web
