Reliability Engineering: Expertly manage, maintain, and troubleshoot all major power systems, including Uninterruptible Power Supply (UPS) systems, industrial generators, and Power Distribution Units (PDUs).
Redundancy & Uptime: Implement and enforce electrical redundancy models, specifically N, N+1, and 2N architectures, with the goal of zero unplanned downtime.
Capacity Management: Lead long-term electrical load balancing and comprehensive capacity planning initiatives to support continuous business growth and hardware deployment.
Vendor Management: Coordinate and supervise vendor-led maintenance, ensuring all preventive maintenance (PM) scheduling is executed precisely and documented according to compliance standards.
2. Cooling and Environmental Optimization
HVAC Expertise: Provide expert-level oversight for all facility HVAC (Heating, Ventilation, Air Conditioning) systems.
Complex Cooling: Manage and optimize the performance of specialized cooling hardware, including CRAC/CRAH units, chilled water systems, and high-efficiency in-row cooling solutions.
Thermal Efficiency: Direct the implementation and auditing of hot/cold aisle containment best practices to maximize thermal efficiency and reduce overall operational expenditure.
Environmental Monitoring: Establish protocols for continuous, precise monitoring of all critical environmental parameters, including temperature, humidity, and airflow.
3. Facility Infrastructure and Standards
Space & Inventory: Govern rack space allocation and the overall physical layout. Define and maintain superior standards for cable management within the data hall.
Design Management: Oversee the integrity of physical infrastructure, including raised floor design and overhead cabling pathways.
DCIM Leadership: Serve as the primary administrator for Data Center Infrastructure Management (DCIM) tools, utilizing them for advanced analytics, predictive maintenance, and capacity forecasting.
Sensor Systems: Manage and respond to data from critical environmental sensors (temperature, humidity, water leakage, smoke) to ensure immediate and effective incident response.
Requirements
Minimum of 5 years of proven experience managing critical facility infrastructure, ideally in a large-scale data center environment.
Deep technical knowledge of high-voltage power systems, cooling mechanics, and facility redundancy principles.
Demonstrated ability to manage projects, lead vendor relationships, and mentor junior technical staff.
Exceptional troubleshooting and root cause analysis skills in a high-stress, high-availability environment.
Excellent documentation and technical communication skills.