Sr. SaaS Site Reliability Engineer
Nokia
Description
This role is designed around a lead position within the team that will help shape the processes, tools, and capabilities that take the SaaS SRE team to world-class quality and capability!
This SRE lead position will play a key role in the ongoing success of the SaaS business and in protecting customer Annual Recurring Revenue (ARR) by assuring service reliability and instilling absolute confidence in service quality and security.
Nokia is a global leader in connectivity for the AI era. With expertise across fixed, mobile and transport networks, powered by the innovation of Nokia Bell Labs, we’re advancing connectivity to secure a brighter world.
- Flexible and hybrid working schemes
- A minimum of 90 days of Maternity and Paternity Leave, with the option to return to work within a year following the birth or adoption of a child (based on eligibility)
- Life insurance for all employees to provide peace of mind and financial security
- Well-being programs to support your mental and physical health
- Opportunities to join and receive support from Nokia Employee Resource Groups (NERGs)
- Employee Growth Solutions to support your personalized career & skills development
- A diverse pool of coaches and mentors you can easily access
- A learning environment which promotes personal growth and professional development - for your role and beyond
The Nokia CNS SaaS SRE Operations department is looking for a strong and motivated Site Reliability Engineer (SRE). This role is designed around a position within the team that will help shape processes, tools, and capabilities and take the SaaS SRE team to world-class quality and capability!
You have:
12+ years of operations, support, SRE, DevOps, or related experience
Strong communication skills, including the ability to create presentations or dashboards
Experience or familiarity with public cloud-native services and components (AWS, GCP, Azure)
Experience or familiarity with DevOps technologies (examples: GitHub, Terraform/Terragrunt, etc.)
Experience or familiarity with Kubernetes and related technologies (Docker, Helm, k8s API)
Experience or familiarity with the Datadog monitoring tool and ticketing systems such as SF.com, ServiceNow, and Jira, including their processes and even API integrations
Experience with documentation management using Confluence, SharePoint and MS Teams
It would be nice if you also had:
Experience in auto-recovery DevOps for continuous service improvement (backup and restore strategy)
Experience in L2/L3 application support integration with BU and product teams
Recover, restore, and build self-recovery capability for cloud-native services and components (AWS, GCP and Azure)
Ensure service assurance for SaaS applications (use cases) deployed across all public cloud hyperscaler providers that CNS SaaS will use
L1/L2 Site Reliability Engineering Operations (event & incident management, change management and execution, security and privacy compliance remediation and mitigation)
Provide technical and operational leadership over Agile DevOps practices, including documentation, iteration, planning, scheduling, coordination, and execution
Help devise and execute strategies for improving service assurance through creative and cost-effective methods
Encourage and foster SRE contributions to, and participation in, continuous service improvements, both technical and procedural
Collaborate with team members and peer/partner organizations to define best practices that benefit SRE Operations and the SaaS organization
Work with Product Managers and R&D teams of SaaS applications (use cases) to determine and support service-level agreements (SLAs), service-level indicators (SLIs) and service-level objectives (SLOs)