Career Profile

Results-driven Site Reliability Engineer specializing in observability engineering. Experienced in building scalable monitoring, logging, and tracing systems using tools like Prometheus, Grafana, and OpenTelemetry to enhance system reliability and performance. Passionate about driving automation, improving incident response, and contributing to operational excellence. Eager to leverage expertise and grow into leadership roles to deliver even greater value to the organization.

Experience

Sr/Lead Site Reliability Engineer

October 2022 - Current
GEICO, Remote, AZ
  • Participated in the design and implementation of the Grafana OSS stack.
  • Built an API and UI for observability service discovery of on-prem servers.
  • Interviewed and trained a new SRE team.
  • Communicated with directors and executives to establish the needs and plans for implementing SRE.
  • Managed SRE team goals, metrics and personal development.

Research and Development

April 2020 - June 2022
USAA, Phoenix, AZ
  • Reduced Mean Time To Detect and created support documents to reduce Mean Time To Resolve.
  • Created Product Readiness Review for applications being introduced to the production environment.
  • Worked with Business Teams to establish Service Level Objectives.
  • Worked with Product Development to maintain Service Level Objectives.
  • Made decisions regarding company resources impacting logging, monitoring and alerting.

Sr. Site Reliability Engineer

March 2019 - April 2020
American Express (OnX), Phoenix, AZ
  • Managed the SRE team for a $147M payment platform.
  • Responsible for maintaining 99.999% Service Level Objectives.
  • Worked directly with VP (Executive Communications) and product development teams.
  • Focused on efficiency regarding logging, monitoring, alerting and incidents.

Site Reliability Engineer

October 2012 - March 2019
PayPal, Scottsdale, AZ
  • Responded to incidents and restored service in the Command Center.
  • Responsible for all customer facing interactions (payment, website, mobile and POS).
  • Engaged with Product Development on postmortem activities.

Professional Highlights

    Grafana OSS Stack
    • Grafana dashboarding and alerting
    • Prometheus/Blackbox Exporters
    • Service Discovery Automation
    Python and Flask
    • FastAPI/Swagger
    • Pandas data management
    • SQLAlchemy
    Incident Management
    • RCA and Postmortem
    • Incident Triage
    • Automation & Remediation
    Docker and K8s
    • Containerized app development
    • Deployment with Helm
    • Flux & ArgoCD