Reliability Management with Lesley Cordero

About Show #869

Managing reliability means being available when things go wrong, but how do you make on-call time productive? While at NDC London, Richard talked to Lesley Cordero about her work on reliability management teams at The New York Times. Lesley explains how putting regular sprint work into on-call time causes more problems than it solves: the quality of the work suffers, and people get frustrated. It's better to spend on-call time on preventive work, which is more contemplative, and better still to have a range of preventive efforts that can be worked on over time. The goal is fewer outages and more reliability, and that means being able to communicate reliability needs to leadership. Document all the things!


Recorded January 25, 2023


Lesley Cordero is currently a Staff Software Engineer at The New York Times. She has spent most of her career as an engineer on edtech teams, including Google for Education and several edtech startups. In her previous roles she focused on building robust data pipelines, setting technical strategy, building excellent engineering teams & communities, and reliability management. Specifics include setting org-wide vision & strategy for observability, improving on-call processes, adopting chaos engineering practices, and cultivating a culture that builds with the most vulnerable employees in mind first. She likes to show care for others by holding them accountable to the best versions of themselves and by buying them the occasional bubble tea.
