Systems Administrator
Nokia
We are looking for a skilled Linux System Administrator to support our Machine Learning and Artificial Intelligence operations. The successful candidate will be responsible for ensuring the stability, scalability, and security of our Linux-based infrastructure, which includes, but is not limited to, clusters, grids, and clouds. This role requires strong technical expertise in Linux system administration, as well as experience with containerization (e.g., Docker) and orchestration (e.g., Kubernetes). The ideal candidate will have a passion for ML/AI and be eager to collaborate with our data science and engineering teams to optimize our workflows.
- Corporate Retirement Savings Plan
- Health and dental benefits
- Short-term and long-term disability
- Life insurance and AD&D (company-paid, 2x base pay)
- Optional or supplemental life and AD&D insurance (employee/spouse/child)
- Paid time off for holidays and vacation
- Employee Stock Purchase Plan
- Tuition Assistance Plan
- Adoption assistance
- Employee Assistance Program/Work Life Resource Program
We are seeking an experienced Linux System Administrator to join our engineering support team, which is responsible for the infrastructure and systems that power our ML/AI workflows.
- In-depth knowledge of Linux distributions (e.g., Ubuntu, CentOS), including kernel tuning, system configuration, and troubleshooting.
- Experience with containerization using Docker and orchestration using Kubernetes.
- Experience with configuration management tools (e.g., Ansible, SaltStack).
- Excellent problem-solving skills, with the ability to work independently and as part of a team.
- Strong communication and documentation skills.
It would be nice if you also had:
- Experience with ML/AI frameworks and libraries (e.g., TensorFlow, PyTorch).
- Knowledge of data storage solutions (e.g., HDFS, Ceph).
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
- Manage and maintain the health of our Linux-based infrastructure, including servers, clusters, grids, and clouds.
- Ensure system uptime, performance, and security by monitoring logs, metrics, and alerts.
- Implement automation tools (e.g., Ansible, SaltStack) to streamline system deployment, configuration, and management.
- Collaborate with data science and engineering teams to design and implement optimized workflows for ML/AI workloads.
- Provide technical guidance on Linux system administration best practices and standards.
- Troubleshoot complex system issues and provide timely resolution.
- Develop and maintain documentation of system configurations, procedures, and troubleshooting guides.