CSC Senior Reliability Engineer

8-11 years

Job Description

Senior Reliability Engineer
Reliability Engineering (RE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. RE ensures that Roche's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, REs keep an ever-watchful eye on system capacity and performance. Much of their work focuses on building infrastructure, optimizing existing systems and eliminating work through automation.
A Senior Reliability Engineer is an infrastructure engineer who knows how to apply engineering principles to operations. They are well versed in a large number of technologies and welcome new tools and techniques. They work with fellow engineering and operations team members to arrive at the best possible solution. They are always looking for patterns and ways to increase efficiency, eliminate downtime, optimize costs, and maintain performance at scale. They will also advise our consumers on the RE value proposition, adoption, industry best practices, and implementation strategy.
REs are responsible for the big picture of how the systems relate to each other, using a breadth of tools and approaches to solve a broad spectrum of problems. Practices such as limiting time spent on operational work, blameless postmortems and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting and dynamic day-to-day work.
RE teams will have the opportunity to manage the complex challenges of scale which are unique to Roche, while using expertise in coding, algorithms, complexity analysis and large-scale system design.
RE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. It brings together people with a wide variety of backgrounds, experiences and perspectives. They are encouraged to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow. REs provide on-call support to keep systems up and running, ensuring the consumers have the best and fastest experience possible.
Job Responsibilities

  • Responsible for availability, tuning, performance, efficiency, change management, monitoring, emergency response, and capacity planning.

  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.

  • Create a bridge between engineering and operations by applying a software engineering mindset to system administration topics.

  • Monitor and resolve incidents and problems with platform operations, setting priorities and ensuring all areas collaborate in the resolution when required.

  • Support services before they go live through activities such as infrastructure design consulting, developing software platforms and frameworks, capacity planning and launch reviews.

  • Collaborate with Managed Services suppliers and external consultancy, ensuring the collaboration is as effective as possible.

  • Scale systems sustainably through mechanisms like automation, and evolve systems by promoting changes that improve reliability and velocity.

  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health.

  • Look for continuous improvement opportunities in technical, teamwork, collaboration and process areas. Propose and lead continuous improvement activities.

  • Provide direction and guidance, acting as an analyst who transforms customer needs into specific requirements to be implemented in components managed by the team or by other teams.

  • Remain proactive and aware of operational challenges and opportunities and work with support team staff to resolve incidents and major incidents.

  • Ensure implemented solutions and components comply with Quality/Regulatory standards, as applicable.
Job Requirements / Qualifications

  • Strong interpersonal skills.

  • Well-demonstrated customer and delivery focus.

  • Well-proven scripting and automation skills with expertise in delivering and managing infrastructure as code.

  • Ability to work effectively with team members and virtual teams from different locations and different cultural backgrounds.

  • Ability to function independently with very little supervision and navigate ambiguity.

  • Excellent problem-solving and decision-making skills.

  • Strong oral and written communication skills in English. German, Spanish or Chinese (Mandarin) are significant pluses.

  • Moderate to extensive travel (20-30%) is required, along with the ability to work across multiple time zones, including on-call.

  • One or more industry certifications in the respective infrastructure solution area(s) are highly desired.
Education / Years of Experience
8-11 years of relevant work experience
or 6-9 years with Bachelor's degree
or 3-6 years with Master's degree
At least 4 years of experience working in one or more multinational work environments as a senior systems or software engineer (healthcare industry experience is a plus).
Technology Skills

  • Strong hands-on technical skills in automation, infrastructure as code, code quality, logging, monitoring and observability, infrastructure configuration, scripting languages and applications.

  • Experience working with Infrastructure Systems internals, their administration and networking.

  • Experience applying design thinking, lean, prioritization and agile methodologies to evolve services offered to partners.

  • Experience defining technical computing infrastructure that is entirely under the control of software, with no operator or human intervention.

  • Experience defining Service Level Objectives and Service Level Indicators.

  • Experience with DevOps mindset, processes and tools.

  • Cross-functional technical knowledge, tools, scripting and methodologies for: configuration management, infrastructure as code, automation design, infrastructure development life cycle and hybrid clouds.

  • Experience with algorithms, data structures, complexity analysis and software design.

  • Languages: Terraform, Ansible, Desired State Configuration, PowerShell, Shell Script, Python, Groovy, Pytest, Inspec.

  • Tools: Git, Bitbucket, GitLab, Jenkins, Rundeck, AWX, Splunk, Nexus, ScriptRunner, Molecule, Conjur, Service Portal Snow, API, Mulesoft.

  • Platforms: AWS, Azure, GCP, Alicloud, Kubernetes.

  • Ways of Working: Design Thinking, Lean, Agile Scrum, SRE practices.

About Roche

At Roche, 91,700 people across 100 countries are pushing back the frontiers of healthcare. Working together, we've become one of the world's leading research-focused healthcare groups. Our success is built on innovation, curiosity and diversity. Roche Diagnostics is committed to creating a great place to work for its employees, and in 2015 we were accredited as Regional Best Employer Asia Pacific by Aon Hewitt. Individual Best Employer Awards were awarded in Singapore, Korea, China and India.

Roche is an equal opportunity employer.
