Site Reliability Engineer

Company:

Mindvalley


Details of the offer

About Mindvalley:

Mindvalley is the leading and most promising ed-tech company to date. We dominate the US market for Personal Growth Education. We are empowering athletes within every major US sports team and promoting successful learning strategies in major companies.

We're currently on a mission to build the most advanced and complete learning experience to enable personal growth and development for all our amazing customers. We innovate tools that induce enlightenment within every aspect of human life. We are seeking the best engineers to build the best and most advanced education platform our species has seen. Our measure of success is powering up to 100 countries and every Fortune 500 company, and progressing humanity towards a better future.

About the Role:

Join us in the mission to build and maintain a resilient, high-performance infrastructure! We're on the lookout for a dynamic and seasoned Site Reliability Engineer (SRE) to take charge as our SRE Engineering Manager. In this pivotal role, you'll lead an exceptional team of SREs, ensuring the stability, scalability, and efficiency of our cloud infrastructure and applications.

Responsibilities:

Cloud Infrastructure Development:
- Develop and oversee our cloud infrastructure across leading platforms such as AWS, GCP, or Azure.
- Implement infrastructure as code (IaC) methodologies for streamlined provisioning and configuration management.
- Stay abreast of cloud advancements and best practices, driving optimization initiatives within our cloud environment.
- Collaborate closely with architects and cloud engineers to craft secure, cost-effective solutions that meet our evolving needs.

Site Reliability Champion:
- Advocate for the principles of Site Reliability Engineering (SRE) within the team and throughout the organization.
- Spearhead the development and deployment of automated monitoring, alerting, and incident response systems.
- Cultivate a culture of proactive troubleshooting and continuous enhancement of infrastructure reliability.
- Utilize metrics analysis to pinpoint bottlenecks and fine-tune performance and scalability.

CI/CD and DevOps Champion:
- Champion CI/CD and DevOps best practices within the team.
- Spearhead the development and deployment of automated pipelines for infrastructure deployments.
- Integrate monitoring and alerting systems into the CI/CD pipeline for proactive issue identification.
- Promote collaboration between SRE, development, and operations teams.

Skills:
- Proficient in container orchestration systems, specifically Kubernetes.
- Skilled in Prometheus Metrics & Observability ecosystems.
- Strong understanding of Linux and network fundamentals.
- Experience with automation tools (Terraform, Ansible, Chef, Puppet).
- Knowledge of cloud services (AWS, GCP, Azure) and multi-cloud environments.
- Familiarity with the full Software Development Life Cycle, including both Waterfall and Agile methodologies.
- Excellent teamwork and communication skills, with a knack for detail-oriented problem-solving.
- Ability to work under pressure, managing critical systems with a focus on timely delivery.
- A proactive mindset, always looking for ways to improve system reliability and efficiency.
- Curiosity and a continuous learning attitude, embracing new technologies and methodologies to drive innovation.

Experience:
- Demonstrated experience (3+ years) in system design, maintenance, and troubleshooting, with a solid background in Site Reliability Engineering, DevOps, Cloud Engineering, or similar roles.
- Proven track record in automating operations, including deployment, system configurations, and operational tasks, to minimize manual work and enhance efficiency.
- Expertise in container orchestration systems, especially Kubernetes, to ensure scalable and reliable application deployment.
- Proficient in implementing and managing monitoring tools like Prometheus for proactive issue detection and resolution.
- Strong foundation in Linux and network fundamentals, ensuring secure and optimized system operations.
- Experience with infrastructure as code tools (Terraform, Ansible, Chef, Puppet) for efficient system provisioning and management.
- Familiarity with cloud services (AWS, GCP, Azure) and the ability to navigate and optimize multi-cloud environments.
- Knowledge of the full Software Development Life Cycle, with experience in both Waterfall and Agile methodologies, to support continuous integration and delivery.
- Ability to lead incident response efforts, conduct thorough post-mortem analyses, and implement preventative measures to maintain high system availability and performance.
- Capacity planning and performance tuning expertise to manage growth effectively and maintain optimal service levels.
- Excellent communication skills, with the ability to work closely with cross-functional teams, including direct collaboration with C-level executives and tech leadership.
- A proactive, solution-oriented mindset, with a focus on continuous improvement and innovation to drive system reliability and efficiency.
- Curiosity and a commitment to continuous learning, with a willingness to explore new technologies and methodologies to enhance operational excellence.

Mindvalley is an equal opportunity employer and does not discriminate on the basis of race, colour, religion, gender identity or expression, national origin, age, disability, marital status, sexual orientation, or any other legally protected status. We are committed to creating a diverse and inclusive workplace and encourage applications from all qualified individuals.


Source: Talent_Ppc



