Senior DevOps Engineer (Azure)
Quarry Consulting
Software Engineering
Kanata, Ottawa, ON, Canada · Ontario, Canada
Posted on Mar 14, 2025
Title: Sr. Cloud Site Reliability Engineer / DevOps Engineer
Location: Toronto, ON (some on-site attendance required for meetings)
Duration: Permanent, Full-time
What will you do?
- Lead and manage the resolution of complex technical issues involving company products and the Azure cloud environment.
- Design and implement strategic operational enhancements to improve resiliency and system reliability.
- Conduct in-depth Root Cause Analysis (RCA) for high-severity incidents and drive initiatives to reduce error recurrence.
- Represent the organization in external client escalation calls, providing expert guidance and solutions.
- Architect and optimize cloud infrastructure for high performance, scalability, and cost-effectiveness.
- Provide thought leadership in managing and scaling container orchestration platforms such as AKS and OpenShift.
- Oversee the implementation of advanced monitoring solutions and integrate predictive analytics for proactive issue resolution.
- Develop and execute automation strategies to streamline operational workflows and incident responses.
- Create and maintain comprehensive documentation of cloud architectures, processes, and incident management strategies.
- Mentor and coach junior engineers, fostering a culture of continuous learning and innovation.
- Drive strategic initiatives, collaborating with cross-functional teams to achieve organizational objectives.
What do you need to succeed?
Must Haves:
- Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s degree preferred).
- Senior-level experience in cloud support, operations, or a related role.
- Advanced expertise in Microsoft Azure (preferred) or equivalent cloud platforms.
- Demonstrated experience in designing and scaling container orchestration systems like AKS or OpenShift.
- Proven leadership in managing automated deployment pipelines, including Azure DevOps.
- Mastery of enterprise monitoring platforms (e.g., Azure Insights, Grafana) and predictive analytics tools.
- Advanced scripting skills with PowerShell, Python, or similar languages.
- Extensive experience in incident management and defining SLAs for global production environments.
- In-depth knowledge of database management, particularly Postgres.
Preferred Qualifications:
- Advanced certifications in cloud platforms (e.g., Azure Solutions Architect Expert).
- Experience with ITSM tools and processes (e.g., ServiceNow).
- Comprehensive understanding of security and compliance in cloud environments.