Skilled Senior Site Reliability Engineer looking to advance his career toward a Director of IT role. His skills include incident management, site triage, logging and monitoring (ELK, Prometheus and Grafana, Splunk, and Datadog), training, and people management. He has reported at the VP level and performed Director-level activities.
- Helped implement, support, and consult on new cross-cloud platforms.
- Implemented the first SRE team at GEICO.
- Interviewed, hired and trained a new SRE team.
- Communicated with Directors and Executives to establish the needs and plans for implementing SRE.
- Managed SRE team goals, metrics and personal development.
- Implemented logging and monitoring with ELK, Prometheus, Datadog and others.
- Reduced Mean Time To Detect and created support documents to reduce Mean Time To Resolve.
- Created a Product Readiness Review for applications being introduced to the production environment.
- Worked with Business Teams to establish Service Level Objectives.
- Worked with Product Development to maintain Service Level Objectives.
- Made decisions regarding company resources impacting logging, monitoring and alerting.
DNS and Home Monitoring
- Installed Pi-hole to manage DNS.
- Set up Prometheus with Node Exporter and Blackbox Exporter.
- Created dashboards in Grafana to visualize ISP health and CPU metrics.
- Used Alertmanager to send notifications when problems occur.
- Set up a VPN so mobile devices can use Pi-hole.
- Develop Page Sites.
- Web Hosting.
- Search Engine Optimization.
- Google Maps Listings.
- Website Analytics.
- Web Monitoring.