Books

2023: My experiments with learning

I take immense pride in reading books as a source of knowledge. I started with fiction during my student days and switched to anthropology, technology, science and leadership over the last twenty years. I usually list the books I enjoyed reading each year in a blogpost, but did not do so last year as I did not read many books in 2022. The same trend continued in 2023, albeit for additional reasons. I will list these reasons, intertwined with my experiments, and conclude with what I learned through this process.

  1. Time at work: During these last couple of years, I enjoyed spending a lot more time than usual at work. I addressed meaningful challenges by applying my past learnings from books and deep thinking aided by coaching, which I covered in another blogpost. These opportunities for hands-on learning have been super satisfying, far more than any book can offer.
  2. Audible: Due to a couple of eye problems whose root cause I was trying to figure out, I subscribed to Audible to check whether listening to books could be an effective alternative. I felt good about this option after listening to “Atomic Habits”, but it did not work for the next two books, so I gave it up for now. I usually read books before going to sleep and put my book or Kindle aside when I can no longer focus on the content. With Audible, however, I could not tell when I had stopped listening and easily lost track of the book.
  3. OTT Platforms: The documentaries available on YouTube, CuriosityStream, Netflix and other OTT platforms provide up-to-date, crisp content that is quite effective for acquiring quick, high-level knowledge. I have explored these options for more than five years but have significantly increased my reliance on them recently. In fact, some of my book picks came from here.
  4. Difficulty with finding high quality books of interest: Finally, I am quite picky when it comes to books and go through multiple reviews before starting one. Having read most of the classics and the best contemporary books in my areas of interest over the last twenty years, finding new ones is difficult. I don’t mean to be disrespectful to the numerous awesome authors who spend their lifetimes writing books. It is just that I am a slow reader who takes almost a month to finish a 300-page book with the limited time at my disposal. With other compelling options to acquire knowledge having emerged over the last decade, I need to pick horses for courses so that I don’t become a dinosaur.

The books I read over the years have helped me become who I am today, and I am sure they will continue to play a key role in shaping me in the future too. There were times in the past when I felt a sense of accumulating learning debt whenever I did not read books for a few months at a stretch. However, I did not feel that way during the last couple of years due to the experiments covered above. Having said that, I want to read at least five books in 2024 to check what I missed during the last couple of years, and I will start the new year by compiling my reading list!

The Fifth Discipline

One of the few books that keeps coming to my mind and reminds me of my north star every time I need help is “The Fifth Discipline – The Art & Practice of The Learning Organization” by Peter Senge. I have used the principles and learnings from this book countless times in the ten years since I read it. 2022 has been an extremely busy year for me at work, and I have neither been able to read many books nor post any blogs. So, I thought I would celebrate the tenth anniversary of reading this book by sharing its key learnings here.

Peter Senge aptly uses the example of aviation technology, which took more than thirty years to serve the general public after the Wright brothers first flew, to highlight that an idea moves from invention to innovation only when diverse “component technologies” come together to form an ensemble of technologies that are critical to one another’s success. Similarly, five “component disciplines” are gradually converging to innovate learning organizations. They are Personal Mastery, Mental Models, Building Shared Vision, Team Learning and Systems Thinking. While the first four are effective on their own to a certain extent, the fifth discipline of Systems Thinking integrates the other disciplines, fusing them into a coherent body of theory and practice.

This blog post is not about these disciplines or even Systems Thinking, but describes two key topics that provide an understanding of, and a basis for, the core disciplines of building a learning organization. These topics are the 7 Learning Disabilities that lead individuals and organizations to fail in the long term, and the 11 Laws of the Fifth Discipline that enable an organization to sustain its ability to learn and grow.

Learning Disabilities: We have seen a number of well-established companies vanish over a short period of time. A study estimates that the average lifetime of the largest industrial enterprises is less than forty years, roughly half the lifetime of a human being! We have seen a number of industry leaders disappear during the last fifteen years; BlackBerry, Nokia, Kodak and Blockbuster are a few examples relevant to our context. In all these companies, there was abundant evidence in advance that the firm was in trouble. The evidence went unheeded, even though individual managers were aware of it. The organization as a whole could not recognize impending threats, understand the implications of those threats or come up with alternatives. This reflects these organizations failing to learn, which could be due to a number of reasons: the way they are designed and managed, or the way people’s jobs are defined. Most importantly, the way we have all been taught to think and interact creates fundamental learning disabilities. It is important that we learn to recognize when these disabilities occur and take corrective action.

  1. I am my position: We are trained to be loyal to our jobs – so much so that we confuse them with our own identities. When people in organizations focus only on their position, they have little sense of responsibility for the results produced when all positions across the organization interact. Moreover, when results are disappointing, it can be very difficult to know why and the default assumption is that “someone else screwed up”.
  2. The enemy is out there: Humans have the propensity to find someone or something outside ourselves to blame when things go wrong. In a Product Development organization, it is common for business analysts and testers to blame developers – “if only developers wrote quality code, we could satisfy customers”. Developers and business analysts blame testers – “if only QA tested important scenarios, we could prevent defects in production”. Testers and developers blame business analysts – “if only BAs provided proper requirements, we could deliver solutions that customers really need”. The “enemy is out there” syndrome is actually a by-product of “I am my position”, and of the non-systemic ways of looking at the world that it fosters.
  3. The illusion of taking charge: Many a time, managers proclaim the need to take charge in facing difficult problems and to be proactive rather than reactive. But if we simply become more aggressive in fighting “the enemy out there”, we are only reacting. True proactiveness comes from seeing how we contribute to our own problems.
  4. The fixation on events: Conversations in many organizations are dominated by concern with short-term events like new budget cuts, who just got promoted or fired, a missed milestone, etc. Our fixation on events is actually part of our evolutionary programming, as our ancestors primarily needed only the ability to react to immediate threats to survive another day. However, if we focus on just events, the best we can ever do is predict an event before it happens so that we can react optimally. But we can never learn to create.
  5. The parable of the boiled frog: If you place a frog in a pot of boiling water, it will immediately try to scramble out. But if it is placed in water at room temperature that is heated slowly, it will stay put and become groggier and groggier until it is unable to climb out of the pot. Similarly, we are tuned to sensing sudden changes in our environment, but not slow, gradual ones. We slip into what is famously referred to as the comfort zone, and it becomes very difficult to get out of it.
  6. The delusion of learning from experience: We learn from our experience, but we never directly experience the consequences of many of our most important decisions. The most critical decisions made in organizations have systemwide consequences that stretch over years or decades.
  7. The myth of the management team: Every organization has a management team that is a collection of savvy, experienced managers who represent the organization’s different functions and areas of expertise. All too often, the managers tend to spend time fighting for turf, avoiding anything that will make them look bad personally and pretending that everyone is behind the team’s collective strategy.

This book covers these learning disabilities to highlight the need for the five disciplines of the learning organization. Since reading this book ten years ago, I have consciously taken a step back once in a while to introspect, look for any of these disabilities in myself and try to overcome any that I find.

The Laws of the Fifth Discipline: Systems Thinking, which enables us to understand complexity, is the cornerstone of the learning organization. The eleven laws of this discipline help us look at problems and opportunities holistically and avoid the pitfalls of siloed thinking.

  1. Today’s problems come from yesterday’s “solutions”: Often we are puzzled by the causes of our problems, when we merely need to look at our own past solutions to other problems. For example, an organization that prioritizes reducing time to market and rushes a product out ends up dealing with quality issues and frustrated customers. Solutions that merely shift problems from one part of a system to another often go undetected because those who solved the first problem are different from those who inherit the new one.
  2. The harder you push, the harder the system pushes back: Well-intentioned interventions to solve a problem call forth responses from the system that offset the benefits of the intervention; this phenomenon is called “compensating feedback”. For example, a person quits smoking to become healthier but ends up gaining weight, and suffers such a loss in self-image that he takes up smoking again to relieve the stress. When our initial efforts fail to produce lasting improvements, we push harder without understanding compensating feedback.
  3. Behavior grows better before it grows worse: Low-leverage interventions to solve problems actually work in the short term because compensating feedback usually involves a delay. We declare victory too early, and a new problem eventually shows up elsewhere in the system that someone else now needs to solve.
  4. The easy way out usually leads back in: We find comfort in applying familiar solutions to problems, sticking to what we know best because it is easy for us. Pushing harder on familiar solutions while fundamental problems persist or worsen is a reliable indicator of non-systemic thinking, reflecting the “what we need here is a bigger hammer” syndrome.
  5. The cure can be worse than the disease: Sometimes familiar solutions are not only ineffective but also addictive and dangerous. Alcoholism may start as simple social drinking to relieve stress but leads to addiction and a bigger problem in the long term.
  6. Faster is slower: Organizations often go for quick fixes that deliver results fast but don’t last long, despite knowing that solutions that stick take longer to show results.
  7. Cause and effect are not closely related in time and space: We tend to address symptoms rather than root causes because symptoms are readily visible, while the real causes might have occurred at a different time or place. The first step in correcting this mismatch is to let go of the notion that cause and effect are closely related in time and space.
  8. Small changes can produce big results – but the areas of highest leverage are often the least obvious: We are usually tempted to go for familiar solutions to problems because they are the most obvious and the easiest to understand and implement. Understanding the system as a whole, and analyzing it deeply to identify the real underlying issue, will help us find the small changes that have the potential to deliver the most.
  9. You can have your cake and eat it too – but not at once: Sometimes the knottiest dilemmas, when seen from a systems point of view, are not dilemmas at all. They may just be false dichotomies. For example, we might not have to choose between quality and cost. Both may go up in the short term, but reduced rework in the long term can bring in the required cost savings.
  10. Dividing an elephant in half does not produce two small elephants: Organizations are living systems that have integrity. Their character depends on the whole. Understanding the most challenging managerial issues requires seeing the whole system that generates these issues. Dividing the system into silos can break this integrity.
  11. There is no blame: We tend to blame “others” for our problems. Systems thinking shows that there is no separate “other”; you and the “other” are part of a single system.

Understanding the learning disabilities and the laws of systems thinking has helped me get to the root of many problems over the years. It has also kept me from falling into the trap of familiar solutions that provide short-term relief but lead to bigger problems in the long term.

Think Again

I began 2022 with Adam Grant’s latest book “Think Again: The Power of Knowing What You Don’t Know”. This book is an invitation to let go of knowledge and opinions that are no longer serving us well, and to anchor our sense of self in flexibility rather than consistency. This also means abandoning some of our most treasured tools and some of the most cherished parts of our identity.

The key is “rethinking”: adopting the mental flexibility to let go of our long-held assumptions and having the courage to challenge beliefs about ourselves that might have led to success in the past but are no longer relevant for the future. As an example, the COVID-19 pandemic has forced organizations to rethink the value of co-located teams working in close physical proximity all the time to deliver projects. Instead, many organizations are now exploring the flexibility offered by remote work, which lets people balance their professional and personal goals better. Contrary to long-held belief, many remote teams have been more productive working from home as they repurposed unproductive time spent on activities like the office commute. This does not mean that remote work will be the better option forever; we are certain to encounter new challenges and will need to rethink again to address them.

This book makes a case for rethinking at three levels – Individual, Interpersonal and Collective.

Individual Rethinking – Updating our own views:

  1. A Preacher, a Prosecutor, a Politician and a Scientist walk into your mind: Our assumptions and beliefs often drive us towards two biases – Confirmation bias (seeing what we expect to see) and Desirability bias (seeing what we want to see). These biases contort our intelligence into a weapon against the truth. We find reasons to preach our beliefs more deeply, prosecute our case more passionately and ride the tidal wave of our political party. The tragedy is that we are usually unaware of the resulting flaws in our thinking. To avoid this conundrum, we should think like a scientist, and NOT like a preacher or a prosecutor or a politician. Thinking like a scientist requires searching for reasons why we might be wrong and revising our views based on what we learn.
  2. The Armchair Quarterback and the Imposter – Finding the sweet spot of confidence: Knowledge of a topic builds both competence and confidence, and the balance between the two shapes our personality. We need to be careful when we progress from being a novice to an amateur in a skill, as this is the stage when we tend to become overconfident, reaching the summit of what is called “Mount Stupid”. As we progress further towards becoming a professional, we realize there is a lot more to learn and usually become more humble. We should strive to reach a state of “Confident Humility”: having faith in our capability while appreciating that we may not have the right solution or even be addressing the right problem. This gives us enough doubt to reexamine our old knowledge and enough confidence to pursue new insights.
  3. The Joy of being Wrong: Most of us are accustomed to defining ourselves in terms of our beliefs, ideas and ideologies. This can become a problem when it prevents us from changing our minds as the world changes and knowledge evolves. Our opinions become so sacred that we grow hostile to the mere thought of being wrong, and the totalitarian ego leaps in to silence counterarguments, squash contrary evidence and close the door on learning. Instead, it is better to define ourselves by our values. Values are our core principles in life, like respect, fairness, empathy, trust and courage. Basing our identity on these kinds of principles enables us to remain open-minded and to enjoy the instances when we are wrong as opportunities to learn.
  4. The Good Fight Club – The psychology of constructive conflict: High performing groups thrive on task conflict, bringing out the collective best by challenging different views with mutual respect (so that it does not slip into relationship conflict).

Interpersonal Rethinking – Opening other people’s minds:

  1. Dance with Foes – How to win debates and influence people: When we want to convince others to rethink their opinions, we frequently take an adversarial approach that shuts them down or riles them up instead of opening their minds. They play defense by putting up a shield, play offense by preaching their perspectives and prosecuting ours, or play politics by telling us what we want to hear without changing what they actually think. The better approach is a more collaborative one, where we show more humility and curiosity and invite others to think more like scientists. A good debate is not a war; it is more like a dance that has not been choreographed, negotiated with a partner who has a different set of steps in mind. To accomplish this, expert negotiators use a few techniques: acknowledging common ground, presenting fewer reasons to support their case and expressing curiosity with intriguing questions.
  2. Bad Blood on the Diamond – Diminishing prejudice by destabilizing stereotypes: In every human society, people are motivated to seek belonging and status. Identifying with a group checks both boxes at the same time: we become part of a tribe, and we take pride when our tribe wins. This leads to rivalries between tribes (teams) that are typically geographically close, compete regularly and are evenly matched. Examples in sports include the rivalry between India and Pakistan in cricket and between the Yankees and the Red Sox in baseball. Stereotypes form to reinforce the rivalry, and for both mental and social reasons they are hard to undo. Some of the ways to overcome stereotypes are: building a shared identity from commonalities (the overview effect), humanizing members of the rival team and focusing on an individual to expose the irrationality of the stereotype.
  3. Vaccine Whisperers and Mild Mannered Interrogators – How the right kind of listening motivates people to change: People with unhealthy addictions or unscientific beliefs are usually aware of their fallacies. If we try to persuade them to make a change, we evoke resistance and they become less likely to change. We can rarely motivate someone else to change; instead, we are better off helping them find their own motivation to change. “Motivational Interviewing” is a practice that can help with this. Motivational interviewing starts with an attitude of humility and curiosity. Our role is to hold up a mirror so they can see themselves more clearly, and to empower them to examine their beliefs and behaviors in a way that can activate a rethinking cycle. Three key techniques for motivational interviewing are: asking open-ended questions, engaging in reflective listening and affirming the person’s desire and ability to change.

Collective Rethinking – Creating communities of lifelong learners:

  1. Charged Conversations – Depolarizing our divided discussions: As humans, we tend to seek clarity and closure by simplifying a complex continuum into two categories. This is called binary bias. While the democratization of information through the internet was expected to expose us to different views and help us make informed, rational decisions, binary bias has instead led to a more polarized world. To overcome binary bias, a good starting point is to become aware of the range of perspectives across a given spectrum and to articulate the complexity instead of simplifying it. Some techniques to convey complexity are: including caveats, highlighting contingencies and expressing mixed emotions.
  2. Rewriting the Textbook – Teaching students to question knowledge: With so much emphasis placed on imparting knowledge and building confidence, many teachers don’t do enough to encourage students to question themselves and one another. It is important to instill intellectual humility, disseminate doubt and cultivate curiosity so that the students of today develop into confidently humble experts in their respective domains tomorrow.
  3. That’s Not the Way We Have Always Done it – Building cultures of learning at work: Rethinking is more likely to happen in a learning culture, where people strive to know what they don’t know, doubt their existing practices and stay curious. To achieve this, “Psychological Safety” (a climate of respect, trust and openness in which people can raise concerns and suggestions without fear of reprisal) is essential. In performance cultures, the emphasis on results often undermines psychological safety. When we see people get punished for failures and mistakes, we become worried about proving our competence and protecting our careers. While many organizations strive to build high performance cultures with the right intention, care should be taken to ensure psychological safety in parallel to promote a learning culture. Psychological safety should be combined with process accountability to create a learning zone where people feel free to experiment and to poke holes in one another’s experiments in service of making them better.

Adam Grant concludes by making a case for regularly reconsidering our best-laid career and life plans to escape tunnel vision that hampers our growth. He leaves us with specific actions for impact, which are his top thirty practical takeaways for working on our rethinking skills. I will conclude this blogpost with these takeaways.

Books 2021

Most of the books I read last year were on leadership and technology, with the notable exception of “Born to Run”, which vividly covers a Mexican Indian running tribe. I will start with a super trio of books on leadership communication and then list the most interesting books I read last year.

Stories at Work – Indranil Chakraborty

This is the first of the trio on leadership communication. Its definition, “A story is a fact wrapped in context and delivered with emotion”, breaks the common misconception that stories are usually fiction created by professional writers for novels or movies. It goes on to explain why stories are the most effective way to communicate a message at work and how they can be used for building credibility, overcoming objections, getting strategies to work or articulating successes. You can find my blogpost on “Stories at Work” here.

Sway – Ori Brafman & Rom Brafman

We are all influenced at times by the irresistible pull of irrational behavior. This book explains the science behind this behavior and can help us avoid potential pitfalls caused by certain psychological undercurrents: loss-aversion, swamp-of-commitment, value-attribution, diagnosis-bias, self-perpetuation, fairness-of-process and group-conformity. You can find my blogpost on “Sway” here.

Made to Stick – Chip Heath & Dan Heath

This is the last of the trio on leadership communication. Once we understand how to use stories at work from the first book, and are aware of the psychological undercurrents that lead to irrational behavior from the second, this one explains the science behind “The Stickiness Factor” that leads to long-lasting messages. It identifies the traits that make ideas sticky and provides a checklist for creating a SUCCESsful idea: a Simple Unexpected Concrete Credentialed Emotional Story. You can find my blogpost on “Made to Stick” here.

Site Reliability Engineering (SRE) – Google experts

SRE has been among the most popular technology topics over the last few years, with the IT industry viewing it as a better way to run production systems by applying a software engineering mindset to work that would typically be performed manually by sysadmins. This book by hands-on masters of the domain provides compelling insights and motivation to transform the way a technology org manages system operations. I documented learnings from the book “as-is” in my four-part blogpost series. Once we embarked on the actual org transformation, we realized that the realities of every org (particularly one with legacy monoliths rather than true services like Google’s) mean that a one-size-fits-all approach will not work. However, this book provides some solid guiding principles that can help any org shift towards an engineering-oriented approach to running production systems.

Born to Run – Christopher McDougall

Being a compulsive runner myself, and having found the pain of running marathons to be fun, I found this book sheer joy! It is all about the mysterious Tarahumara, a Mexican Indian tribe, and readers who enjoy this book will be inspired to become long-distance runners. I hope to visit the Copper Canyons one day and run along the treacherous trails vividly described in this book.

Deep Work – Cal Newport

We are living in a digital world, surrounded by numerous electronic devices constantly distracting us with unending notifications. As a result, many of us find it difficult to “focus without distraction on a cognitively demanding task”, which this book refers to as “deep work” and which is an important skill for getting any complex work done. Cal Newport shares some surprising practices that experts use to switch to deep work, and emphasizes the need to embrace boredom and quit social media to remove distractions.

The 4 Disciplines of Execution (4DX) – Chris McChesney, Sean Covey, Jim Huling

This book is referenced in Deep Work as an execution technique for achieving what needs to be done. The authors explain how urgent day-to-day operational tasks, which they refer to as the whirlwind, keep leaders at large organizations from executing important strategic goals. They also provide a four-step framework to overcome this difficulty and excel at execution: focusing on the wildly important, acting on the lead measures, keeping a compelling scorecard and creating a cadence of accountability. You can find my blogpost on 4DX here.

Made to Stick

Why do some ideas succeed while others fail?

We have come across several urban legends that resonate with people even though they are unbelievable and false, while some of the most potent transformational ideas are not even considered. The traditional belief is that successful communication requires getting the right people and setting the right context. But there is another factor that is key for an idea to become viral: “The Stickiness Factor”, as explained by Malcolm Gladwell in The Tipping Point. This blog is about a different book, Made to Stick by Chip and Dan Heath, which identifies the traits that make ideas sticky and provides a checklist for creating a SUCCESsful idea: a Simple Unexpected Concrete Credentialed Emotional Story.

Simple: Great simple ideas have an elegance and a utility that make them function a lot like proverbs: short sentences (Compact) drawn from long experience (Core). Simple = Core + Compact

  • Finding the core requires stripping an idea down to its most critical essence. To get to the core, we have to start by weeding out the superfluous elements and eventually get to the tougher part of eliminating ideas that may be really important but just aren’t the most important idea.
  • Creating a compact message is the next step once the core is identified. Compactness is all about elegance, prioritization and being crisp. It should not result in dumbing down or shooting for the lowest common denominator to make things easy. Techniques like the “inverted pyramid”, used by journalists to present information in descending order of importance, and “forced prioritization”, limiting oneself to just one thing, are helpful.
  • A few examples of simple ideas that are compact while retaining the core idea:
    • Southwest Airlines tagline: THE low-fare airline.
    • Bill Clinton’s 1992 campaign mantra: It’s the economy, stupid.

Unexpected: The first problem of communication is getting people’s attention. The most basic way to get someone’s attention is by breaking a pattern, because humans adapt incredibly quickly to consistent patterns. For example, we get so used to certain sounds, like our air-conditioner or ceiling fan, and certain smells, like a room freshener or candle, that we become consciously aware of them only when something changes. To get people’s attention and keep it, naturally sticky ideas provoke two essential emotions: Surprise and Interest.

  • Surprise gets our attention. Some naturally sticky ideas propose surprising “facts”, such as the statement “You use only 10% of your brain”.
  • Interest keeps our attention. Interesting ideas like conspiracy theories or gossip make us keep tabs on developments, thereby maintaining our interest over time.
  • The Gap Theory of curiosity: Curiosity happens when we feel a gap in our knowledge, and these gaps cause pain until they are filled. To leverage the gap theory, we sometimes have to set the context and give people enough backstory that they start to care about the gaps in their knowledge. Mystery novelists and crossword puzzle writers excel at this technique, setting some context and giving us clues that generate enough interest for curiosity to take over and propel us to finish!

Concrete: As we master a language, we use complex words and abstraction to make our point. When we deliver our beautiful abstract message, listeners admire our mastery over language and extensive vocabulary. However, abstraction makes it difficult to understand an idea and remember it. It also makes it harder to coordinate our activities with others, who may interpret the abstraction in different ways. Concreteness helps us avoid these problems:

  • Concrete is memorable: When we are asked to remember how a beach feels, our sense memories are immediately evoked, bringing back the sight of sand and waves, the smell of the ocean, the sea breeze blowing across our face, etc. For a person who has never seen a beach, at best only the textbook definition of a beach comes to mind, and they cannot relate to it as well.
  • Concrete allows coordination: Abstract statements can mean different things for different people. For example, a goal to manufacture “the best car” will mean top speed for a race driver while it will mean comfort and space for a person looking to drive his family of four for a picnic. So, making it concrete with tangible aspects like “the car that can comfortably accommodate a family of four along with ample space to carry picnic bags” will help effectively coordinate within a team and reduce scope for different interpretations.

Credible: People’s beliefs are based on years of trust built in family, personal experience, faith and authorities. However, we can’t always rely on these factors to vouch for our message. Most of the time our messages have to vouch for themselves and must have “internal credibility”, which can be obtained from three sources:

  1. Details: A person’s knowledge of details is often a good proxy for expertise. Vivid details shared along with the message will increase the credibility of the idea.
  2. Statistics: Statistics are a good source of internal credibility when they are used to illustrate relationships. Applying the “human-scale principle”, which translates numbers into comparisons people can relate to from everyday life, makes statistics more effective and allows people to bring their intuition to bear in assessing whether the message is credible.
  3. The Sinatra Test: In Frank Sinatra’s classic “New York, New York”, he sings about starting a new life in New York City, and the chorus declares, “If I can make it there, I will make it anywhere”. An example passes the Sinatra test when that single instance is enough to establish credibility in a given domain. For example, if you have driven on Indian roads, you can drive anywhere.

Emotional: An accidental mutation of the human brain thousands of years ago led to the cognitive revolution, which, through analytical thinking, eventually resulted in the agricultural, scientific and industrial revolutions and set us apart from other animals. However, most of our actions are still driven by primitive emotions or feelings invoked by an event. As we prepare to communicate our idea, it helps to remember that using statistics or science to make our point shifts people into a more analytical frame of mind. When people think analytically, they are less likely to respond emotionally, and it is emotion that makes them care about something. There are three strategies for making people care:

  1. Using associations (or avoiding them, as the case may be): A piggybacking strategy that associates our idea with emotions that already exist.
  2. Appealing to self-interest: Highlighting what is in it for oneself is a powerful way of engaging people with an idea.
  3. Appealing to identity: In some cases, going beyond self-interest and appealing to an identity that people care about will be impactful.

Stories: There are numerous books highlighting why stories are the most effective means of communicating messages at work and I have a blogpost on one of them. “Made to stick” highlights that stories cause mental simulation that evokes the same modules of the brain that are evoked in real physical activity. So, while mental simulation is not as good as doing something, it is the next best thing. There are three basic plots that can be used to make our ideas stick:

  1. The Challenge Plot: Obstacles seem daunting to the protagonist and the triumph of will power over adversity inspires us to act.
  2. The Connection Plot: These stories are about our relationships with other people and are a good way to inspire us to build relationships.
  3. The Creativity Plot: These involve someone making a mental breakthrough, solving a long-standing puzzle or attacking a problem in an innovative way.

To summarize, for an idea to stick and be useful for a long time, it has to make the audience:

  1. Pay attention
  2. Understand and remember it
  3. Agree / Believe
  4. Care
  5. Be able to act on it

Sway

Can you remember instances when you decided to pursue a course of action that went against your professional training?

Have you wondered why leaders at times make silly mistakes by ignoring the obvious?

The question when we look back at such incidents invariably is: How can a professional with such an established reputation and years of experience do this?

Leaders have to make decisions all the time based on available data and with no guarantee of success. Though they typically have the best interests of their organization in mind, things sometimes go wrong, and posterity may attribute at least some of the failures to past “irrational” decisions. With hindsight bias, it is easy to provide elaborate commentary on poor results. Does this mean all leaders are doomed to be criticized in the future? Ori Brafman’s book “Sway: The Irresistible Pull of Irrational Behaviour” can help leaders deal with this occupational hazard by making them aware of potential decision-making pitfalls caused by certain psychological undercurrents: loss aversion, swamp of commitment, value attribution, diagnosis bias, self-perpetuation, fairness of process and group conformity.

The first time I came across the concept of irrational behavior was in Dan Ariely’s book “Predictably Irrational” several years ago, before I started blogging and when I was still reading non-fiction for fun rather than for leadership insights. Ori Brafman has taken it to the next level by explaining the reasons behind these irrational behaviors and sharing some practices that can help us avoid falling into the trap.

History is filled with instances of top-notch, award-winning professionals making choices that contradicted their years of training, resulting in disastrous outcomes as they swayed from the logical path. This book explores some of the psychological forces that derail rational thinking: how they creep up on us, when we are most vulnerable to them and why we don’t realize that we are being swayed. By better understanding the seductive pull of these forces, we will be less likely to fall victim to them in the future.

Loss Aversion – tendency to go to great lengths to avoid potential losses: The human mind naturally experiences the pain associated with a loss much more vividly than the joy of a gain. This makes us overreact to perceived losses for no apparent logical reason. The more meaningful a potential loss is, the more loss averse we become and the more likely we are to get swept into an irrational decision. Examples of this behavior include people subscribing to expensive “unlimited” phone or internet plans to avoid the perceived loss associated with pay-as-you-go plans, and investors in financial markets concentrating on avoiding losses rather than on maximizing their gains.

Swamp of Commitment – hanging on to a strategy even if the chances of success are small and the cost of delaying failure is high: History is filled with stories of civilizations and companies that vanished by sticking to a committed path even after realizing the need to change. It is also what leads a gambler or investor to chase losses. Loss aversion and commitment have a powerful effect on their own, but when they combine, it becomes that much harder to break free and do something different.

Value Attribution – inclination to imbue a person or thing with certain qualities based on initial perceived value: From childhood, we form perceptions about value. For example, premium branded products are preferred over regular ones even when the packaging is all that differs, and a lawyer is trusted more when he shows up wearing a fancy suit. We have mental models of what a top investment banker, a computer geek or a seasoned politician looks like, and anyone who does not fit the stereotype finds it difficult to gain acceptance.

Diagnosis Bias – blindness to all evidence that contradicts our initial assessment of a person or situation: The proverb “The first impression is the best impression” reflects this psychological undercurrent. Yet the first impression might have occurred under exceptionally fortunate or unfortunate circumstances, which is why clinging to it can mislead us.

Self-perpetuation – taking on characteristics assigned by a diagnosis, thereby reinforcing it: This is also called the self-fulfilling prophecy or the chameleon effect. When we brand or label people, they take on the characteristics of the diagnosis. It is called the Pygmalion effect when a positive trait is assumed and the Golem effect in the case of negative traits.

Fairness – it’s the process and not the outcome that causes irrational behavior: We expect procedural justice from the people we deal with, that is, a perceived fairness of the process rather than just a fair outcome. For example, we consider a car salesman who explains why the original price is worth every penny to be fairer than one who gives an easy 10% discount without any effort to explain why it is the best possible deal. In the second case, we leave wondering: why only 10%? Did the salesman really give me the best deal, or did I overpay? To avoid this fairness dilemma, managers are asked to put greater effort, energy, investment and patience into nurturing relationships.

Group conformity – propensity to go along with the group and avoid the embarrassment of being the odd person out: Regardless of how independent-minded and steadfast they are, people commonly align with a group instead of voicing an unpopular viewpoint, for fear that others will doubt their intelligence, taste or competence. For a decision to be made by comprehensively considering all possibilities, it is important for the discussion to include initiators, blockers, supporters and observers. It might be frustrating to encounter blockers for what might appear to be an obvious course of action, but their opinions are absolutely essential to keeping groups balanced and holding back irrational behavior.

If this book were a movie, the first 90% would be an intriguing build-up and the Epilogue a fitting, fast-paced climax where Ori Brafman gives us some invaluable suggestions on ways to avoid these pitfalls, some of which are listed below:

  • Our natural tendency to avoid the pain of loss is most likely to distort our thinking when we place too much importance on short-term goals. When we adopt the long view, immediate potential losses don’t seem as menacing.
  • When we find ourselves unsure about whether or not to continue a particular approach, it is useful to ask – “If I were just arriving on the scene and were given the choice to either jump into this project as it stands now or pass on it, would I choose to jump in?” If the answer is no, then chances are we have been swayed by the hidden force of commitment.
  • The best strategy for dealing with the distorted thinking that can result from value attribution is to be mindful and observe things for what they are, not just for what they appear to be. You have to be prepared to accept that your initial impressions might be wrong.
  • “Propositional thinking” is all about keeping evaluations tentative instead of certain, learning to be comfortable with complex, sometimes contradictory information, and taking your time to consider things from different angles before coming to a conclusion. Net-net, a self-imposed waiting period before making a diagnostic judgment can help avoid diagnosis bias.
  • One way to counter fairness sway is to try to weigh things objectively and not succumb to emotional maneuvers or moral judgments.
  • When we make decisions or take actions that will affect others, keeping them involved will help ensure that they feel the process is fair.
  • Just as communicating our process is important, so is giving voice to the dissenter. In group situations, the presence of a blocker can actually make the decision-making process more rational and less likely to go off track.

The ability to think has set humans apart from other animals and made us the all-conquering species. At the same time, our natural decision-making machinery has limitations that lead us to be swayed at times by factors that have nothing to do with logic or reason. This book will help us recognize and understand the hidden world of sways, and learn to weaken their influence on our thinking process.

Stories at Work

After reading a few heavy technology books since the beginning of 2021, I was looking for a relatively light read, and that’s when my senior leader recommended “Stories at Work” by Indranil Chakraborthy. With a recommendation from such an accomplished orator and fantastic storyteller, I bought the book immediately to learn techniques I could benefit from. Being an engineer who takes pride in an analytical approach to solving problems, I considered myself good at articulating facts and data points and, through a false dichotomy, assumed that I could not be a good storyteller. Indranil broke this myth with the following definition of stories in business and set the tone for some awesome insights!

A story is a fact wrapped in context and delivered with emotion

I usually start any new learning by understanding “Why” it is required. In this case, I remembered Yuval Noah Harari’s Sapiens referring to humans’ ability to tell stories as a key result of the Cognitive Revolution that led to the advancement of our societies. In addition, Indranil Chakraborthy provides six compelling reasons why stories are profoundly relevant:

  1. Evolutionary predisposition: Biologists confirm that the human brain is predisposed to think in story terms and explain things in story structures. Our brain converts raw experience into story form and then considers, ponders, remembers and acts on a self-created story, not the actual input experience. So, the next time someone nods vigorously indicating understanding of your speech but says something completely different when asked to paraphrase, blame the human brain at work!
  2. Childhood story exposure: We have always told children stories to teach them values and behavior and to build their knowledge. This exposure to stories through the key years of development results in adults who are irrevocably hardwired to think in story terms.
  3. Chemical post-it notes: Daniel Goleman’s “Emotional Intelligence” covers this topic in detail – in essence, emotionally intense events are permanently registered in our emotional brain (amygdala) by neuro-chemicals for super-fast retrieval whenever similar events take place in future.
  4. Neural coupling: When a story is told and it has meaning, brain patterns of the speaker and the listener synch. This can be used to ensure listeners fully comprehend what one wants to convey.
  5. Monkey see, monkey feel: When someone describes pain they went through, we feel the same way. This is due to mirror neurons that are fired up in the minds of both the listener and the storyteller.
  6. Data brain, story brain: Stories impact more areas of our brain than data does and hence stories involve us much more. This increases the likelihood of us taking action when we hear stories and not just data.

To summarize why stories are a powerful way to communicate our message – In this world of increasing noise and clutter, an ability to find an expressway to the listeners’ minds can be the most powerful skill in a leader’s repertoire. Stories can be this expressway if laid out appropriately!

Now that we understand “Why” stories are important in business, the book explores “What”: the four situations where we can start our storytelling journey:

  1. Using stories to build rapport and credibility: When we meet a person or a group for the first time, we start with an introduction that is usually filled with the credentials we expect to build trust. While credentials are an indirect pointer to our character, we can be more effective in building trust by sharing anecdotes from our life. Indranil calls them “connection stories”; they help our listeners appreciate our values and form a bond through shared values and beliefs. He also provides a step-by-step process to create and fine-tune connection stories:
    • Before meeting a new group of people, write down five words or phrases about your character, values or beliefs that you would like your listeners to infer about you.
    • Recollect and jot down an incident from your life where you have displayed many of these character traits.
    • Narrate the incident to someone you trust and write down what they inferred about you from it.
    • Based on the feedback, chisel down the story to just about a minute or less.
    • Tell the story to two other people, which will help you refine it further.
    • Retell the story to yourself, starting with the character trait you want your audience to take away.
    • Finally, record your story, transcribe it and fine-tune further by brutally eliminating the words that are unnecessary.
  2. Using stories to influence and overcome objections: When people have a strong belief in an illogical idea, it is usually based on some personal experience or story. Using data or logic to debate against such a belief is futile. The only way to convince people to change in such a scenario is to replace it with a more powerful story, which is called an influence story. An influence story has to be introduced carefully, using the following steps:
    1. Acknowledge the anti-story: Empathize with the listener’s story and express understanding of reasons behind prevailing belief.
    2. Share the story of the opposite point of view: This is the step where the anti-story is replaced with a more powerful story, ideally one that can be corroborated.
    3. Make the case: Without offending the listener’s views, explain the need for a change.
    4. Make the point: Finally call for action, maybe to experiment with the change first and see the results for oneself.
  3. Getting strategies to stick: Organizations come up with well-thought-out vision and value statements, but often they don’t stick across the organization for three key reasons: abstract language, absence of context or the curse of knowledge. To address these challenges, “clarity stories” told in simple English with the following structure are used:
    1. In the past: Articulate how we succeeded in the past using strategy relevant at that time.
    2. Then something happened: Highlight the changes caused by both external and internal factors that have rendered the past strategy irrelevant.
    3. So now: Introduce the new strategy that needs to be adopted to succeed in current reality.
    4. In the future: Explain your vision on how this new strategy will create new opportunities and success in future.
  4. Using stories to share best practices, knowledge or success: Just like we tend to focus on credentials while introducing ourselves, the focus while articulating success or best practices tends to be data points or statistics. Given humans are predisposed to assimilate stories better, the suggestion here is to turn them into “success stories”. Narrate the success as stories placing human characters appropriately for effective reach and impact.

After covering “Why” and “What”, the book goes on to cover “How” to put them together for different business scenarios. I would recommend reading the book for this section (and the previous ones as well for comprehensive understanding).

To summarize, the combination of four story patterns – connection stories, influence stories, clarity stories and success stories – will make external and internal communication more effective and transform an organization. Happy story-telling!

SRE – Management

We covered the motivation behind SRE in the first blogpost of this series, followed by Principles and Practices. Let’s complete the foundation with Google’s guidance on getting SREs working together in a team and working as teams. To ensure the SRE approach sticks without the team slipping back to old ways, the new ways of working covered in this blogpost should be incorporated in a structured manner, with the team and the management committing to adhere to them at all costs.

Accelerating SREs to On-Call and Beyond: Educating new SREs on concepts and practices up front will shape them into better engineers and make their skills more robust.

  • Initial Learning Experiences – The Case for Structure Over Chaos: SRE must handle a mix of proactive (engineering) and reactive (on-call) work while traditional Operations teams are predominantly reactive. To position the team for success with proactive work, structured knowledge build-up of the system is essential. Some techniques for getting there:
    • Learning Paths That Are Cumulative and Orderly – Show the new SRE team an orderly path that will infuse confidence that there is a plan to mastery of the system through a combination of education, exposure and experience.
    • Targeted Project Work, Not Menial Work – Make the initial weeks effective by giving the engineers project work that can reinforce their learning.
  • Creating Stellar Reverse Engineers and Improvisational Thinkers: SREs will continue to encounter systems with design patterns they have not seen before. They need strong reverse engineering skills along with the ability to think statistically and improvise, so that they can untangle unfamiliar problems without getting stuck.
  • Best Practices for Aspiring On-Callers: For engineers who typically prefer creating new tech solutions, being on-call to troubleshoot production issues can be made interesting with the following practices:
    1. A Hunger for Failure: Reading and Sharing Postmortems
    2. Disaster Role Playing (regular team exercises for new joiners to enact responding to pages)
    3. Break Real Things, Fix Real Things (by simulating volumes or issues in non-critical lower environments)
    4. Documentation as Apprenticeship (by overhauling outdated knowledge base)
    5. Shadow On-Call Early and Often
  • On-Call and Beyond – Rites of Passage and Practicing Continuing Education: Once the engineer has demonstrated the ability to handle issues independently, it is time to formally add them to the on-call rota and celebrate this milestone as a team. It is also important to set up a regular learning series that helps the entire team stay in touch with changes.

Dealing with interrupts: Once the SRE team is in charge of handling operations, “Managing Operational Load” is the next topic to focus on. Operational load is the work that must be done to maintain the system in a functional state, and it interrupts the SRE team’s planned project work. So, the objective is to handle such interruptions without distracting engineers from their cognitive flow state. The interrupts fall into three general categories:

  • Pages concern production alerts and are triggered in response to production emergencies. They are commonly handled by a primary on-call engineer, who is focused solely on on-call work. A person should never be expected to be on-call and also make progress on projects or anything else with a high context switching cost. A secondary on-call engineer provides back-up in case of contingencies.
  • Tickets concern customer requests that require the team to take an action. The primary or secondary on-call engineer can work on tickets when there are no pages to handle. Depending on the nature and priority of tickets, a dedicated person might also be assigned to work on tickets.
  • Ongoing operational responsibilities include activities like team-owned code or flag rollouts, or responses to ad-hoc, time-sensitive questions from customers. An approach similar to handling tickets can be adopted.
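To make the split concrete, here is a minimal Python sketch of how a team might route the three kinds of interrupts. Everything in it (the `Kind`, `Interrupt` and `Rota` types, the `route` function, and the rule that tickets fall to whichever on-call engineer is not busy with a page) is a hypothetical illustration of the guidance above, not Google’s tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    PAGE = "page"      # production emergency
    TICKET = "ticket"  # customer request that needs an action
    OPS = "ops"        # ongoing operational work: rollouts, ad-hoc questions


@dataclass
class Interrupt:
    kind: Kind
    summary: str


@dataclass
class Rota:
    primary: str
    secondary: str
    ticket_owner: Optional[str] = None  # optional dedicated ticket handler
    primary_busy: bool = False          # True while the primary is working a page


def route(interrupt: Interrupt, rota: Rota) -> str:
    """Return who should handle an interrupt (hypothetical policy).

    Only on-call roles or a dedicated ticket owner are ever returned, so
    engineers doing project work are never interrupted.
    """
    if interrupt.kind is Kind.PAGE:
        # Pages go to the primary; the secondary is the back-up.
        return rota.secondary if rota.primary_busy else rota.primary
    # Tickets and ongoing ops work are handled the same way.
    if rota.ticket_owner is not None:
        return rota.ticket_owner
    # Otherwise, whichever on-call engineer is not busy with a page.
    return rota.secondary if rota.primary_busy else rota.primary


rota = Rota(primary="asha", secondary="ben", primary_busy=True)
print(route(Interrupt(Kind.TICKET, "grant quota increase"), rota))  # ben
```

The design choice worth noting is that `route` can only return an on-call engineer or a dedicated ticket owner, which keeps interrupts away from engineers doing project work.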

Embedding an SRE to Recover from Operational Overload: A burdensome amount of ops work for a prolonged period is dangerous because the SRE team might burn out or be unable to make progress on project work. One way to relieve this burden is to temporarily transfer an SRE into the overloaded team. Google’s guidance for the SRE who will be embedded on a team:

  • Phase 1: Learn the Service and Get Context – Remind the team that more tickets should not require more SREs and emphasize healthy work habits that reduce the time spent on tickets. Some of these healthy habits are focusing on non-linear scaling of services, identifying sources of an inordinate amount of stress, and identifying emergencies waiting to happen.
  • Phase 2: Sharing Context – After identifying pain points, suggest improvements and demonstrate better ways to work. Some examples are writing a good postmortem for the team or identifying root cause for frequent issues and suggesting solutions.
  • Phase 3: Driving Change – Nudge the team with ideas based on SRE principles and help them self-regulate. This can be done by helping the team fix any basic issues (like defining SLOs), coaching team members to address issues permanently or asking leading questions.

Communication and Collaboration in SRE: There is tremendous diversity in SRE teams as it includes people with various skills such as systems engineering, software engineering, project management, etc. Also, given the nature of responsibilities handled by SRE, team members tend to be more distributed across geographical regions and time zones when compared to product development. Considering these aspects, communication and collaboration among SRE teams and across other teams should be designed to address the joint concerns of production and the product in an atmosphere of mutual respect. There should be forums (like weekly Production Meetings) for the SRE team to articulate the state of the system they support and highlight improvement opportunities to Product Development.

The Evolving SRE Engagement Model: The focus so far has been on onboarding SRE support for a product or service that is already in production. While this “classic” engagement model is commonly a good starting point, there are two other models that are better at embedding SRE principles and practices earlier in the development lifecycle. Let’s look at all three models, starting with the classic one.

  • Simple PRR (Classic) Model: When SRE receives a request to take over production management, SRE gauges both the importance of the product and the availability of SRE teams. The SRE and development teams then agree on staffing levels to facilitate this support, followed by a Production Readiness Review (PRR). Once the gaps and improvements identified in the review are addressed, the SRE team assumes its production responsibilities.
  • Early Engagement Model: SRE participates in Design and later phases, eventually taking over the service any time during or after the build phase.
  • Evolving Services Development – Frameworks and SRE Platform: As the industry moves towards microservices architecture, the number of requests for SRE support and the cardinality of services to support will increase. To effectively address the increased demand, all microservices should adopt structured frameworks for production services. These frameworks include codified SRE best practices that are “production ready” by design and reusable solutions to mitigate scalability and reliability issues. A production platform built on top of such frameworks with stronger conventions reduces operational overhead.

These five ways of working should help establish and reinforce SRE teams in an organization. And with this, we come to the end of the SRE overview series. I strongly recommend reading Google’s book to get a comprehensive understanding of SRE. As the industry moves further towards microservices and cloud, the traditional support model, which is predominantly based on manual operations, will not be scalable or sustainable. The sooner organizations pivot towards an engineering-oriented support model with the necessary investments in technology and people, the better for the products and services they provide.

SRE – Practices

After covering the motivation behind SRE along with the responsibilities and principles in previous blogposts, this one will focus on “how” to get there by leveraging SRE practices used by Google. The book explains 18 practices and I strongly recommend reading the book to thoroughly understand them. I have provided a brief summary of the most common and relevant practices here.

The book characterizes the health of a service in a way similar to Maslow’s hierarchy of human needs, with basic needs at the bottom (starting with Monitoring) and going all the way up to taking proactive control of the product’s future rather than reactively fighting fires. All the practices fall under one of these categories.

Monitoring: No software service can be sustained in the long term if customers usually learn of problems before the service provider does. To avoid this situation of flying blind, monitoring has always been an essential part of supporting a service. Many organizations have L1 Service Desk teams that either manually perform runbook-based checks or visually monitor dashboards (ITRS, AppDynamics, etc.) looking for any service turning “red”. Both these approaches involve manual activity, which makes monitoring less effective and less efficient. Google, being a tech-savvy organization, always had automated monitoring through custom scripts that check responses and raise alerts.

  • Practical Alerting from Time-Series Data: As Google’s monitoring systems evolved using SRE, they moved to a new paradigm that made the collection of time-series a first-class role of the monitoring system and replaced those check scripts with a rich language for manipulating time-series into charts and alerts. Open source tools like Prometheus, Riemann, Heka and Bosun allow any organization to adopt this approach. For organizations still relying heavily on L1 Service Desks, a good starting point is a combination of white-box and black-box monitoring along with a production health dashboard and optimal alerting, eliminating the need for manual operations that only scale linearly.
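The shift from check scripts to time-series can be illustrated with a small Python sketch: instead of a one-off probe, the alert is computed from how request and error counters changed over a sliding window. The class names, the 5-minute window and the 5% threshold are hypothetical; in practice a tool such as Prometheus would store the series and evaluate the rule (for example with its `rate()` function), so treat this only as an illustration of the idea.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    ts: float      # seconds since some epoch
    requests: int  # monotonically increasing request counter
    errors: int    # monotonically increasing error counter


class ErrorRatioAlert:
    """Hypothetical rule: fire when errors/requests over the window exceed a threshold."""

    def __init__(self, window_s: float, threshold: float):
        self.window_s = window_s
        self.threshold = threshold
        self.samples = deque()  # Sample objects, oldest first

    def add(self, sample: Sample) -> None:
        self.samples.append(sample)
        window_start = sample.ts - self.window_s
        # Keep the newest sample at or before the window start, so the
        # counter deltas always span the whole window; drop anything older.
        while len(self.samples) > 1 and self.samples[1].ts <= window_start:
            self.samples.popleft()

    def error_ratio(self) -> float:
        if len(self.samples) < 2:
            return 0.0
        first, last = self.samples[0], self.samples[-1]
        d_req = last.requests - first.requests
        d_err = last.errors - first.errors
        return d_err / d_req if d_req > 0 else 0.0

    def firing(self) -> bool:
        return self.error_ratio() > self.threshold


alert = ErrorRatioAlert(window_s=300, threshold=0.05)
for s in [Sample(0, 0, 0), Sample(60, 1000, 10), Sample(120, 2000, 110)]:
    alert.add(s)
print(alert.firing())  # True: 110 errors / 2000 requests = 5.5% > 5%
```

Because the rule works on counter deltas rather than on whether a single check passed, the same stored series can feed dashboards, alerts and later investigation.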

Incident Response: Incidents that disrupt a software service dependent on numerous interconnected components are inevitable. SRE approaches these incidents as an opportunity to learn and to remain in touch with how distributed computing systems actually work. While Incident Response and Incident Management are used interchangeably in some places, I consider Incident Response, which includes technical analysis and recovery, to be the primary responsibility of the SRE team, whereas Incident Management deals with communicating with stakeholders and pulling the whole response together. Google also calls out Managing Incidents as one of the four practices under Incident Response:

  • Being On-Call is a critical duty for the SRE team to keep their services reliable and available. At the same time, balanced on-call is essential to foster a sustainable and manageable work environment for the SRE team. The balance should ensure there is neither operational overload nor underload. Operational overload makes it difficult for the SRE team to spend at least 50% of their time on engineering activities, leading to technical debt and inefficient manual workarounds creeping into the support process. Operational underload can result in SREs losing touch with production, creating knowledge gaps that can be disastrous when an incident occurs. The on-call approach should make engineering work the primary means of scaling production responsibilities and maintaining high reliability and availability despite the increasing complexity and number of systems and services for which SREs are responsible.
  • Effective Troubleshooting: Troubleshooting is a skill similar to riding a bike or driving a stick-shift car, something that becomes easy once you internalize the process and program your memory to subconsciously take the necessary action. In addition to acquiring generic troubleshooting skills, solid knowledge of the system is essential for an SRE to be effective during incidents. Building observability into each component from the ground up and designing systems with well-understood interfaces between components will make troubleshooting easier. Adopting a systematic approach to troubleshooting (like the Triage -> Examine -> Diagnose -> Test / Treat cycle) instead of relying on luck or experience will yield good results and a better experience for all stakeholders.
  • Emergency Response: “Don’t panic” is the mantra to remember during system failures to be able to recover effectively. And to be able to act without panic, training to handle such situations is absolutely essential. Test-Induced emergency helps SRE proactively prepare for such eventualities, make changes to fix the underlying problems and also identify other weaknesses before they became outages. In real life, emergencies are usually change-induced or process induced and SREs learn from all outages. They also document the failure modes for other teams to learn how to better troubleshoot and fortify their systems against similar outages.
  • Managing Incidents: Most organizations already have an ITIL-based incident management process in place. The SRE team strengthens this process by focusing on reducing mean time to recovery and giving staff a less stressful way to work on emergent problems. Features that help achieve this are recursive separation of responsibilities, a recognized command post, a live incident state document and a clear handoff.
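
To make the idea of balanced on-call a little more concrete, here is a minimal sketch that labels an on-call period as overloaded, underloaded or balanced. Only the 50% cap on operational work comes from the text above; the incident-rate bounds, the `OnCallPeriod` fields and the `classify_load` function are hypothetical placeholders that a team would tune to its own situation.

```python
from dataclasses import dataclass

# Illustrative thresholds. The 50% cap on operational work comes from the
# SRE guidance above; the incident-rate bounds are hypothetical placeholders.
MAX_OPS_FRACTION = 0.5
MAX_INCIDENTS_PER_SHIFT = 2.0
MIN_INCIDENTS_PER_SHIFT = 0.1


@dataclass
class OnCallPeriod:
    ops_hours: float          # tickets, pages, manual toil
    engineering_hours: float  # project and automation work
    shifts: int               # on-call shifts in the period
    incidents: int            # incidents handled in the period


def classify_load(period: OnCallPeriod) -> str:
    """Label an on-call period as 'overload', 'underload' or 'balanced'."""
    total_hours = period.ops_hours + period.engineering_hours
    if total_hours <= 0 or period.shifts <= 0:
        raise ValueError("hours and shifts must be positive")
    ops_fraction = period.ops_hours / total_hours
    incidents_per_shift = period.incidents / period.shifts
    # Too much operational work or too many incidents per shift means overload.
    if ops_fraction > MAX_OPS_FRACTION or incidents_per_shift > MAX_INCIDENTS_PER_SHIFT:
        return "overload"
    # Too few incidents risks SREs losing touch with production.
    if incidents_per_shift < MIN_INCIDENTS_PER_SHIFT:
        return "underload"
    return "balanced"


if __name__ == "__main__":
    print(classify_load(OnCallPeriod(30, 50, 10, 5)))  # balanced
    print(classify_load(OnCallPeriod(60, 20, 10, 5)))  # overload
    print(classify_load(OnCallPeriod(10, 70, 10, 0)))  # underload
```

Both signals matter: a team can stay under the 50% cap and still be underloaded if incidents are so rare that nobody practices handling them.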

Postmortem and Root Cause Analysis: The SRE philosophy aims to manually solve only new and exciting problems in production, unlike some traditional operations-focused environments that end up fixing the same issue over and over.

  • Postmortem Culture (Learning from Failure): The primary goals of a postmortem are to ensure that the incident is documented, that all contributing root causes are well understood and that effective preventive actions are put in place to reduce the likelihood and impact of recurrence. Because the postmortem process has an inherent cost in time and effort, well-defined triggers such as incident severity are used to ensure that root cause analysis is done for the appropriate events. Blameless postmortems are a tenet of SRE culture.
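
Well-defined triggers are easy to express as a simple rule set. The sketch below assumes a hypothetical `Incident` record and a handful of triggers (severity, data loss, manual intervention by on-call, long resolution time, detection outside monitoring); the specific triggers and thresholds are illustrative assumptions, not a prescribed list.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical trigger thresholds; each organization sets its own.
POSTMORTEM_SEVERITIES = {"SEV1", "SEV2"}
MAX_MINUTES_TO_RESOLVE = 60


@dataclass
class Incident:
    severity: str
    data_loss: bool
    manual_intervention: bool     # e.g. a rollback or traffic reroute by on-call
    minutes_to_resolve: int
    detected_by_monitoring: bool


def postmortem_triggers(incident: Incident) -> List[str]:
    """Return the triggers that fired; an empty list means no postmortem is required."""
    triggers = []
    if incident.severity in POSTMORTEM_SEVERITIES:
        triggers.append("severity")
    if incident.data_loss:
        triggers.append("data loss")
    if incident.manual_intervention:
        triggers.append("manual intervention")
    if incident.minutes_to_resolve > MAX_MINUTES_TO_RESOLVE:
        triggers.append("resolution time")
    if not incident.detected_by_monitoring:
        triggers.append("monitoring gap")
    return triggers


if __name__ == "__main__":
    minor = Incident("SEV3", False, False, 20, True)
    major = Incident("SEV1", False, True, 90, False)
    print(postmortem_triggers(minor))  # []
    print(postmortem_triggers(major))  # ['severity', 'manual intervention', 'resolution time', 'monitoring gap']
```

Returning the list of fired triggers, rather than a bare yes/no, gives the postmortem author a starting point for explaining why the review was required.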

Testing: The previous practices help handle problems when they arise, but preventing such problems from occurring in the first place should be the norm.

  • Testing for Reliability: This practice adapts classical software testing techniques to systems at scale to improve reliability. Traditional tests during the software development stage, such as unit, integration and system tests (smoke, performance, regression, etc.), help ensure correct behavior of the system before it is deployed into production. Production tests such as stress, canary and configuration tests are similar to black-box monitoring: they help identify problems proactively before users encounter them, and they support staggered rollouts that limit any impact in production.
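
A canary test combined with a staggered rollout can be sketched in a few lines. This is a simplified illustration under stated assumptions: the error-rate comparison, the `max_ratio`, `min_requests` and `error_rate_floor` parameters, and the rollout stages are all hypothetical; real canary analysis usually compares many metrics with proper statistics.

```python
from typing import Callable, Sequence


def canary_verdict(baseline_errors: int, baseline_requests: int,
                   canary_errors: int, canary_requests: int,
                   max_ratio: float = 1.5, min_requests: int = 1000,
                   error_rate_floor: float = 0.001) -> str:
    """Compare canary and baseline error rates: 'wait', 'promote' or 'rollback'."""
    if canary_requests < min_requests or baseline_requests <= 0:
        return "wait"  # not enough traffic yet for a meaningful comparison
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    # The floor keeps a zero-error baseline from rejecting any single canary error.
    allowed_rate = max(baseline_rate, error_rate_floor) * max_ratio
    return "promote" if canary_rate <= allowed_rate else "rollback"


def staged_rollout(stages: Sequence[float], check: Callable[[float], str]) -> float:
    """Widen the rollout stage by stage; return the largest traffic fraction that passed."""
    reached = 0.0
    for fraction in stages:
        if check(fraction) != "promote":
            break  # stop at the first stage whose check does not pass
        reached = fraction
    return reached


if __name__ == "__main__":
    print(canary_verdict(10, 10000, 1, 1000))  # promote
    print(canary_verdict(10, 10000, 5, 1000))  # rollback
    print(staged_rollout([0.01, 0.1, 0.5, 1.0],
                         lambda f: "promote" if f < 0.5 else "rollback"))  # 0.1
```

Because each stage exposes only a fraction of traffic, a bad change is caught while its blast radius is still small.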

Capacity Planning: Modern distributed systems built using a component-based architecture are designed to scale on demand and rely heavily on diligent capacity planning to achieve it. The following four practices are key:

  • Load balancing at the Frontend: DNS is still the simplest and most effective way to balance load before the user’s connection even starts, but it has limitations; for example, intermediate resolvers cache answers, so changes take effect slowly. So the initial level of DNS load balancing should be followed by a level that takes advantage of virtual IP addresses.
  • Load balancing in the data center: Once the request arrives at the data center, the next step is to choose the right algorithms for distributing a stream of queries within that data center. Load balancing policies can be very simple and ignore the state of the backends (e.g., Round Robin), or they can use more information about the backends (e.g., Least-Loaded Round Robin or Weighted Round Robin).
  • Handling Overload: Load balancing policies are expected to prevent overload, but there are times when the best plans fail. In addition to data center load balancing, per-customer limits and client-side throttling help spread load relatively evenly over the tasks in a data center. Despite all precautions, when a backend is overloaded, it need not stop accepting all traffic. Instead, it can keep serving as much traffic as it can handle and accept additional load only as capacity frees up.
  • Addressing cascading failures: A cascading failure is one that grows over time as a result of positive feedback. It can occur when the failure of one portion of an overall system increases the probability that other portions fail. Increasing resources, restarting servers, dropping traffic, eliminating non-critical load and eliminating bad traffic are some of the immediate steps that can address cascading failures.
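
Client-side throttling is worth a closer look because it also breaks the positive feedback loop behind cascading failures: clients stop hammering a backend that is already rejecting work. Below is a minimal sketch of adaptive throttling using the formula described in the SRE book's "Handling Overload" chapter, where each client rejects a request locally with probability max(0, (requests − K × accepts) / (requests + 1)) over a recent window. K = 2 and a two-minute window follow that description; the `AdaptiveThrottler` class and its method names are hypothetical.

```python
import random
import time
from collections import deque
from typing import Callable, Deque, Tuple


class AdaptiveThrottler:
    """Client-side adaptive throttling sketch based on request/accept counts."""

    def __init__(self, k: float = 2.0, window_seconds: float = 120.0,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Callable[[], float] = random.random):
        self.k = k                    # how many requests we allow per accept
        self.window = window_seconds  # only recent history counts
        self.clock = clock
        self.rng = rng
        self.history: Deque[Tuple[float, bool]] = deque()  # (timestamp, accepted)

    def _trim(self, now: float) -> None:
        # Drop outcomes older than the window.
        while self.history and now - self.history[0][0] > self.window:
            self.history.popleft()

    def rejection_probability(self) -> float:
        self._trim(self.clock())
        requests = len(self.history)
        accepts = sum(1 for _, accepted in self.history if accepted)
        return max(0.0, (requests - self.k * accepts) / (requests + 1))

    def allow(self) -> bool:
        """Decide locally whether to send the next request to the backend."""
        return self.rng() >= self.rejection_probability()

    def record(self, accepted: bool) -> None:
        """Record every attempt; locally rejected requests count as not accepted."""
        self.history.append((self.clock(), accepted))


if __name__ == "__main__":
    throttler = AdaptiveThrottler(clock=lambda: 0.0)
    for _ in range(5):
        throttler.record(True)
    for _ in range(35):
        throttler.record(False)
    print(round(throttler.rejection_probability(), 3))  # 0.732
```

A caller would check `allow()` before each request and then call `record()` with whether the backend accepted it, recording `False` for requests it dropped locally. While the backend accepts everything, the probability stays at zero; as rejections pile up, the client sheds a growing share of its own traffic instead of sending it to an overloaded backend.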

Development: All the practices covered so far deal with reliability after software development is complete. Google recommends significant large-scale system design and software engineering work within the organization to enable SRE through the following practices:

  • Managing Critical State – Distributed Consensus for Reliability: The CAP theorem provides the guiding principle for deciding which properties are most critical. For distributed software systems, the relevant problem is asynchronous distributed consensus, which applies to environments with potentially unbounded delays in message passing. Distributed consensus algorithms allow a set of nodes to agree on a value once, but they don’t map well to real design tasks. Higher-level system components built on distributed consensus, such as datastores, configuration stores, queues, locking and leader election services, provide the practical system functionality that the algorithms themselves don’t address. Using these higher-level components reduces complexity for system designers. It also allows the underlying distributed consensus algorithms to be changed if necessary, in response to changes in the environment in which the system runs or in its nonfunctional requirements.
  • Distributed Periodic Scheduling with Cron, Data Processing Pipelines, and Data Integrity: What You Read Is What You Wrote are the other practices in the Development layer.
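
A useful piece of arithmetic behind majority-based consensus (as in Paxos-style protocols) is the quorum size: a group of n replicas needs a strict majority to agree, and any two majorities overlap, so an agreed value cannot be lost. The helper functions below are a small illustration; their names are hypothetical.

```python
def quorum_size(replicas: int) -> int:
    """Smallest strict majority of a replica group."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can fail while a majority is still reachable."""
    return replicas - quorum_size(replicas)


def majorities_overlap(replicas: int) -> bool:
    """Any two majorities share at least one replica."""
    return 2 * quorum_size(replicas) > replicas


if __name__ == "__main__":
    for n in range(1, 8):
        # Prints: replicas, quorum size, failures tolerated
        print(n, quorum_size(n), tolerated_failures(n))
```

Running it shows that 3 replicas tolerate 1 failure and 5 tolerate 2, while 4 and 6 tolerate no more than 3 and 5 respectively. Even-sized groups add cost without adding fault tolerance, which is why consensus groups are typically sized at 3 or 5.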

Product is at the top of the pyramid for any organization. Organizations benefit from practicing Reliable Product Launches at Scale, using the Launch Coordination Engineering role to set up a solid launch process with a launch checklist.
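
A launch checklist works as a gate: the launch proceeds only when every item is signed off. The sketch below is a toy illustration of that gating; the item names are hypothetical, loosely modeled on topics such a checklist typically covers, and a real checklist would be specific to each organization.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical checklist items for illustration only.
DEFAULT_ITEMS: Tuple[str, ...] = (
    "architecture and dependencies reviewed",
    "capacity plan approved",
    "failure modes analyzed",
    "monitoring and alerting in place",
    "rollout plan with rollback steps",
)


@dataclass
class LaunchChecklist:
    items: Tuple[str, ...] = DEFAULT_ITEMS
    completed: Set[str] = field(default_factory=set)

    def mark_done(self, item: str) -> None:
        # Reject unknown items so typos don't silently pass the gate.
        if item not in self.items:
            raise KeyError(f"unknown checklist item: {item}")
        self.completed.add(item)

    def outstanding(self) -> List[str]:
        """Items still blocking the launch, in checklist order."""
        return [item for item in self.items if item not in self.completed]

    def ready_to_launch(self) -> bool:
        return not self.outstanding()


if __name__ == "__main__":
    checklist = LaunchChecklist()
    checklist.mark_done("capacity plan approved")
    print(checklist.ready_to_launch())  # False
    print(len(checklist.outstanding()))  # 4
```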

These practices shared by Google provide a comprehensive framework to adopt across the software development lifecycle to improve the reliability, resilience and stability of systems.