Available for senior & principal roles

Joshua Tobin
Principal Engineer

20+ years building and operating production systems at scale — from Second Life's virtual world to Heroku's PaaS to MuleSoft's global compliance infrastructure. Now designing AI-native tools for the next era of distributed systems.

$173M ACV unlocked via Hyperforce expansion
60% reduction in deployment times
50M+ users supported (Second Life, Heroku, MuleSoft)
20+ years in production systems
// about

From colocation to cloud-native to agentic AI

I've been in the engine room for over two decades. I've managed 1,000+ Linux servers supporting Second Life's 50 million users, built SRE culture at Heroku from the ground up as the first external hire, and led the compliance engineering that unlocked $173M in ACV for MuleSoft's regulated market expansion.

My edge isn't just the tenure — it's the breadth. I can write Terraform, debug a Kubernetes scheduler, build a Golang test suite, and lead a cross-functional incident response. I've operated at every layer of the stack, which means I design systems that actually survive contact with production.

Today I'm building AI-native tools — applying the same automation mindset I've carried for 20 years to the agentic era. Always be shipping. LedgerWatchdog is my current focus: an agentic billing intelligence platform in active development, purpose-built to keep my AI engineering skills sharp and prove out detection pipelines at scale.

The agentic era needs engineers who understand production.

AI agents fail in the same ways distributed systems fail — at scale, under load, at the edges. I know how to build the infrastructure that makes them reliable.

Location: Ballston Spa, NY (fully remote since 2012)
Education: B.T. in CIS — SUNY Cobleskill, 2003
Currently: Principal SRE at MuleSoft / Salesforce

// experience

Where I've built things

Principal Site Reliability Engineer / PMTS
Feb 2021 – Present
MuleSoft (Salesforce)
The world's #1 integration and API platform. $5B ARR, ~10,000 employees.
  • Led Hyperforce expansion to Japan and Canada, achieving SOC, ISO, and PCI compliance — unlocking $173M in ACV across regulated markets
  • Architected a phased migration using the Strangler Facade pattern while maintaining a 99.9% SLA, leading a 2-engineer team
  • Developed Terraform pipeline cutting EKS cluster deployment time by 30%; built Golang/Python test suites covering previously untestable code
  • Redesigned Golang-based deployment pipeline using Kubernetes, Postgres, and Argo Workflows
  • Introduced IAM security automation framework within a Zero-Trust architecture, simplifying compliance audits
Senior Site Reliability Engineer
Feb 2014 – Mar 2021
Heroku (Salesforce)
Leading PaaS provider, core to Salesforce's multi-billion-dollar revenue stream.
  • Established the SRE group as the first external hire — built incident response, RCA, and on-call culture from the ground up
  • Rebuilt status.heroku.com from scratch using Ruby on Rails, improving customer visibility and incident response speed
  • Developed Ruby compliance data management framework reducing security access risk by 90%
  • Achieved 100% monitoring adoption across all service teams (Nagios, Splunk, bespoke synthetic monitoring)
  • Built internal sitrep system used during major incidents across engineering and support
Cloud Systems Engineer
Jul 2012 – Feb 2014
Audax Health (Optum / UnitedHealth Group)
Digital health and gamification platform, acquired by Optum.
  • First fully remote Cloud Systems Engineer on the team, reporting directly to Director of Operations
  • Automated AWS deployments with Puppet and Chef, reducing setup times by 20%+
  • Deployed and maintained ELK stack for real-time logging and operational visibility
  • Migrated infrastructure bootstrapping from Puppet to Chef, improving configuration consistency
Services Engineer
Jan 2012 – Jul 2012
Canonical
Commercial Ubuntu and OpenStack support, $150M+ revenue.
  • Delivered Ubuntu/OpenStack deployments via Canonical's Jumpstart program, reducing client deployment time by 15%
  • Provided on-site integration guidance across client infrastructure (50% travel)
Systems Engineer
Feb 2007 – Jan 2012
Linden Lab (Second Life)
Creator of Second Life — a virtual world with 50M+ registered users.
  • Managed colocation infrastructure targeting 99.9% uptime for 50M+ users
  • Administered 1,000+ Linux servers powering 30,000+ virtual simulators
  • Managed critical services: DHCP/DNS, server imaging, email, monitoring
  • Transitioned to fully remote role while maintaining seamless operational continuity
Director of IT
Jul 2006 – Feb 2007
Destination Hotels & Resorts
40+ property luxury hotel brand, $500M annual revenue.
  • Led all IT operations for Lake Tahoe resort including Windows infrastructure, network, and vendor management
  • Designed and implemented point-to-point Wi-Fi network between hotel and service buildings
Systems Administrator
Aug 2005 – Aug 2006
Porter Novelli (Omnicom Group)
Global PR agency, subsidiary of Omnicom.
  • Managed network operations across 4 offices supporting 150+ users, reporting to the CIO
  • Led Sarbanes-Oxley compliance initiatives and managed IT product testing lab
Systems Administrator
Jul 2003 – Aug 2005
Whiteman Osterman & Hanna LLP
Leading Albany, NY law firm.
  • Managed LAN/WAN infrastructure, thin client network, and SQL billing systems
  • Implemented Linux-based spam solution and managed help desk staff
// projects

What I've been building

AI-powered billing intelligence
LedgerWatchdog

An agentic AI platform in active development that analyzes invoices and financial documents to detect duplicate charges, suspicious transactions, and tax write-off opportunities. Currently in alpha — built to prove out the detection pipeline before onboarding customers.

Alpha · In development
Python · AI/LLM · FastAPI · PostgreSQL
Historic property showcase & real estate marketing
Winona Camps ADK

A personal project showcasing the history and character of The Winona Camps — a historic Adirondack property at Big Moose Station that I own. Built as both an archival showcase and a marketing platform ahead of sale.

Node.js · Web
Team and roster management
AlpineRoster

An application for managing alpine team rosters and member coordination.

Web
// skills

Technical stack

Cloud & Infrastructure
AWS (EKS, S3, Lambda, RDS) · GCP · Kubernetes · Docker · ECS
Infrastructure as Code
Terraform · CloudFormation · Puppet · Chef
CI/CD & Automation
ArgoCD · Argo Workflows · CircleCI · Jenkins
Languages
Python · Golang · Ruby · Bash · Node.js
Observability
Prometheus · Grafana · ELK Stack · Splunk · Nagios
Security & Compliance
Zero Trust / IAM · SOC 2 · ISO 27001 · PCI DSS · FedRAMP · SOX
// contact

Let's talk

Looking for senior, principal, or staff-level engineering roles in cloud infrastructure, platform engineering, or AI systems. Open to full-time roles.