Speaking Of Reliability: Friends Discussing Reliability Engineering Topics | Warranty | Plant Maintenance

Reliability.FM: Accendo Reliability, focused on improving your reliability program and career

Gain the experience of your peers to accelerate improvement of your program and career. Improve your product development process, reliability or warranty performance; or your plant uptime or asset performance. Learn about reliability and maintenance engineering practical approaches, skills, and techniques. Join the conversation today.

Episodes

Oct 11, 2021

Training and the Trades

Abstract

James and Fred discuss the impact of the labor shortage on maintenance programs.

Key Points

Join James and Fred as they discuss the dilemma of attracting talent that also has the experience and education needed to fill maintenance trade roles, engineering roles, etc.

Topics include:

- The plentiful openings, yet where is the talent?
- The various paths that are available to get into the trades
- The ‘Big Quit’ – the need to have a supportive culture to keep an attractive search

Enjoy an episode of Speaking of Reliability, where you can join friends as they discuss reliability topics. Join us as we discuss topics ranging from design for reliability techniques to field data analysis approaches.

The post SOR 695 Training and the Trades appeared first on Accendo Reliability.

Oct 8, 2021

How and What to Learn

Abstract

James and Fred discuss learning from failures, yet shouldn’t we prevent failures?

Key Points

Join James and Fred as they discuss the idea of learning from failures when failures are being prevented. What do we focus on when not waiting for failures to occur?

Topics include:

- The importance of learning about actual and possible failures
- Noticing how being proactive, versus not, affects time to failure
- Focusing on understanding failure mechanisms

The post SOR 694 How and What to Learn appeared first on Accendo Reliability.

Oct 4, 2021

Thoughts on ALARP

Abstract

Chris and Fred discuss another listener question, this time about ‘as low as reasonably practicable,’ or ALARP. The term is used a lot in risk management and analysis … but what is it?

Key Points

Join Chris and Fred as they discuss the safety concept of reducing risks and the basic approach implied by ‘ALARP,’ which stands for ‘as low as reasonably practicable.’ This is an often-quoted ‘goal’ for risk management frameworks, where we look at a system and try to reduce risks to ALARP. But what does this mean? … what is ‘practicable?’

Topics include:

What is ‘reasonably practicable?’ This changes over time. At one point in time, driving while intoxicated was not a ‘big deal.’ Now it is illegal and you can go to jail if you do it. At one point in time, there were no real limits on what we could do to the environment. That is not the case anymore. So what is ‘reasonably practicable’ changes as society changes. And of course, ‘reasonable’ is subjective. What is reasonable to you might not be reasonable to someone else, or to the jury that is working out how much in damages your company needs to fork out.

ALARP often devolves into us continually asking ‘are we there yet?’ Which in turn becomes a ‘box ticking’ exercise. Why? Because when we keep asking ‘are we there yet?’ we simply convey impatience. If we are impatient, we want to ‘be there’ already. So impatient engineers, designers, managers and manufacturers simply want to make their decision defendable, not genuinely balance risks and work out what society deems ‘reasonably practicable.’ So we stop looking at what might go wrong, and start hoping that everything is right.

… so it sometimes makes us use this ‘business case’ approach to everything. But by then it is too late. Most organizations that do really well in the fields of quality, reliability and safety invest a ‘tiny’ amount of resources into making sure their first design is a quality/reliable/safe design. Why are these resources ‘tiny?’ Because it often costs next to nothing to incorporate key design features into the first iterations of a design, as opposed to having to redesign them in later. So we can incorporate lots of quality/reliable/safe design features from the start without paying for much at all. The problem is that building the business case for each individual design characteristic costs more than making it happen in the first place.

… which leads into balancing risks and responding to known risks. Whenever we are ticking boxes on a checklist to come up with a defendable reason as to why we don’t have to worry about risk anymore, we invariably focus on known risks and not unknown risks. And it can get even worse than that. The blowout preventer on the Deepwater Horizon offshore oil drilling rig that failed to stop a catastrophic oil spill was ‘built to standards.’ But the standards were out of date, and didn’t include a well-known failure mechanism that pushes the drill string to the side under pressure. There was a rush to be able to say this is safe as opposed to making sure it is safe.

‘Safe’ is simply a word that says we can use or sell something. Nothing else. We like to think that something that is ‘safe’ is something that is unlikely to cause harm. This is not the case. If something is ‘safe’ it means that it is able to be used or sold. And that is different.

Many organizations don’t worry about ‘testing’ that something is safe because they have ‘designed’ it to be safe. This sounds shocking. But in reality, the organizational cultures that do best at things like quality, reliability and safety are those that rarely measure how well they are doing. They invest all their time into improving the design of their system and not measuring how compliant the design of their system is.

The post SOR 693 Thoughts on ALARP appeared first on Accendo Reliability.

Oct 1, 2021

Reliability and Pumping Water

Abstract

Chris and Fred discuss service reliability, based on a listener question, where we don’t just look at reliability as it applies to an item, product or device. Instead … we look at the ‘reliability of a service’ … or system, … or process. How can this help a farmer who is trying to work out how to best supply water to his cattle?

Key Points

Join Chris and Fred as they discuss reliability considerations when pumping water for a farmer who needs to water his cattle. He can choose to use a windmill, a diesel motor and pump, a solar and electric pump, and so on. So can reliability engineering help him work out the ‘best’ solution for his problem?

Topics include:

Engineers can jump to the ‘design’ as opposed to the ‘solution.’ A small windmill may be the best solution … if the ground doesn’t freeze during the winter, there is a ‘small enough’ number of cattle, and we can easily check to make sure the windmill hasn’t failed … great! But here we need to consider many ‘big world’ factors, such as the environment, farmer behaviours, the number of cattle (how much water is needed) and so on, to help us find the solution. And we need to do this before we start designing.

Diesel engines don’t just stop working when they fail. What if there is an issue in the fuel supply chain? What if you can’t get a diesel mechanic onsite to service it? Neither of these two events constitutes a failure in its own right – but your system is still not pumping water to cattle.

If we are going solar … what happens when there is cloud cover? How often do we have ‘disabling’ cloud cover? Can we be really clever … and, if there is cloud cover, understand there is a really good chance of rain … meaning we could also have some rainwater draining system to mitigate the lack of water being pumped? So it might be useful to include the ‘reliability of the supply chain’ and environmental considerations when we do reliability engineering here.

Simple is often more reliable. Simple can be elegant. Around 50 % of failures occur at interfaces. Solar panels are very simple. They have no moving parts. And this means that the list of known failure mechanisms is very small. So there is a certain amount of reliability we gain by simplifying. And then there is the cost of failure – and of preventing failure. Simplicity often helps us again. The most reliable system in the world may require hundreds of thousands of dollars of maintenance every year. Even if it doesn’t fail.

So where do we start? FAILURE. Define failure at the highest level. In this case, it is not a diesel engine failing … this is design specific. Failure is thirsty cattle. Then you can use (proper) Root Cause Analysis (RCA) or perhaps Failure Mode and Effect Analysis (FMEA) techniques to work out which design approach will be the best for your farm, your number of cattle, your prevailing weather conditions, your cost of fuel, your skills as a diesel mechanic (et cetera). Perhaps the solution involves you moving to crops, changing to drought tolerant crops, moving or retiring from farming (seriously!).

The post SOR 692 Reliability and Pumping Water appeared first on Accendo Reliability.
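One way to picture ‘failure is thirsty cattle’ is to treat every element the water service depends on, including supply chain and weather, as part of a series system. The sketch below is a minimal illustration only: the function name, the element names and every probability are hypothetical, and it assumes the elements are independent, which the episode notes do not claim.

```python
from math import prod


def service_reliability(elements: dict[str, float]) -> float:
    """Probability that water reaches the cattle over some period.

    Assumes (hypothetically) that the service works only if every listed
    element works, and that the elements are independent of each other.
    """
    for name, probability in elements.items():
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"{name}: probability must be between 0 and 1")
    return prod(elements.values())


# Illustrative numbers only; none of these values come from the episode.
diesel_option = {
    "engine and pump": 0.98,
    "fuel delivery": 0.95,       # supply chain, not a hardware failure
    "mechanic available": 0.90,  # serviceability, not a hardware failure
}
solar_option = {
    "panel and pump": 0.97,
    "enough sunlight": 0.85,     # environment, not a hardware failure
}

print(f"diesel: {service_reliability(diesel_option):.4f}")
print(f"solar:  {service_reliability(solar_option):.4f}")
```

With numbers like these, the hardware is rarely the weakest element; the non-hardware terms dominate, which is the point made about supply chain and environmental considerations.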

Sep 27, 2021

Design Failure Patterns

Abstract

Kirk and Fred discuss how new technology designs, whether of buildings or electronics, are usually over-designed in the first generation.

Key Points

Join Kirk and Fred as they discuss the possible causes of design failure patterns.

Topics include:

- It is likely that the first high-rise condos were over-designed, with much more concrete and rebar than later designs, which were subject to cost pressures and building shortcuts.
- It would be great to see failure analysis animations similar to the New York Times graphic of the Florida condominium collapse (see the link to the article in the show notes below).
- Most failures in electronics systems and buildings arise from assignable causes such as overlooked design margins, errors in manufacturing, or user misuse or abuse.

Show Notes

Here is a link to the graphic illustration of the failure analysis of the Florida condominium collapse from the New York Times.

Please click on this link to access a relatively new analysis of traditional reliability prediction methods, an article from the US ARMY and CALCE titled “Reliability Prediction – A Continued Reliance on a Misleading Approach.”

For more information on the newest discovery testing methodology, here is a link to the book “Next Generation HALT and HASS: Robust design of Electronics and Systems,” written by Kirk Gray and John Paschkewitz.

The post SOR 691 Design Failure Patterns appeared first on Accendo Reliability.

Sep 24, 2021

Reliability and Simplicity

Abstract

Kirk and Fred discuss marketing reliability and brand reputation for reliability, along with the tradeoffs between making equipment simple and reliable.

Key Points

Join Kirk and Fred as they discuss a variety of topics around reliability.

Topics include:

- Many times the end user of a critical piece of equipment would prefer an older model that has established reliability but fewer features.
- It seems rare for design companies to go to the end user of equipment for feedback on a new product’s reliability and use after product release.
- Fred gives an example of a connector and how design for manufacturing is an important part of creating a reliable product.

Show Notes

Please click on this link to access a relatively new analysis of traditional reliability prediction methods, an article from the US ARMY and CALCE titled “Reliability Prediction – A Continued Reliance on a Misleading Approach.”

For more information on the newest discovery testing methodology, here is a link to the book “Next Generation HALT and HASS: Robust design of Electronics and Systems,” written by Kirk Gray and John Paschkewitz.

The post SOR 690 Reliability and Simplicity appeared first on Accendo Reliability.

Sep 20, 2021

Warning Signs and Culture

Abstract

Chris and Fred discuss how warning signs … especially those that revolve around culture … are often ignored. And it takes a ‘disaster’ or ‘catastrophe’ to do something about this. Why is this?

Key Points

Join Chris and Fred as they discuss warning signs and culture.

Topics include:

We started this conversation based on what is happening at the Los Alamos National Laboratory, where (nuclear-related) safety issue after safety issue has occurred over the last decade. Simply putting too much plutonium or uranium in the same ‘space’ will trigger a ‘criticality incident’ that involves an always lethal radiation burst that results in an agonizing death over the next few days, weeks or even months. And yet at Los Alamos these materials (that look like normal ‘bits’ of metal) are routinely mishandled or otherwise mistreated. In one of these incidents, the entire Los Alamos ‘safety’ staff resigned in protest over safety not being taken seriously. So where is the accountability?

Humans are wired to deal with immediate threats. Especially if it is something that is going to ‘eat’ you, like a lion, tiger or bear. But when we walk past a leaking pump or a corroded bridge span – we just don’t have the same emotional response. This is something we need to understand as human beings.

What is ‘safe’ anyway? A lot of us think that ‘safe’ means the absence of risk. This is not the case. Different scenarios have different definitions of what ‘safe’ is … regardless of what we like to believe. ‘Safety’ as it relates to (for example) baby’s toys is different to ‘safety’ as it relates to parachuting. A ‘safe’ parachute jump involves risk that is very different to that of a ‘safe’ toy. We are often blind to the ramifications of risk. Especially if we can’t see it every day.

It is all about leadership and culture. If humans are easily able to dismiss or forget about the consequences of failures, then we need culture. Fred talked about an organization that realized that there were a lot of vehicular incidents and injuries in the parking lot. So the leadership required all employees who were ever going to rent a car to pass a week-long, professionally run driver training course. They also ensured that people reversed their vehicles into spots in the parking lot, as this has been shown to drastically reduce vehicular incidents. And the result? A 10 % reduction in all work lost due to employee injuries and time off work.

The post SOR 689 Warning Signs and Culture appeared first on Accendo Reliability.

Sep 17, 2021

When is MTBF OK?

Abstract

Chris and Fred discuss the MTBF … and if and when it can be used … sometimes in reliability engineering. We know that the MTBF is one of the most chronically overused (and misused) so-called ‘reliability’ metrics. But is there scope for it to be used … sometimes?

Key Points

Join Chris and Fred as they discuss if and when the MTBF can be used. We see it in textbooks and standards. Professors use it all the time. So it is little wonder that students and engineers also seem to rampantly misuse the MTBF and make (sometimes) disastrous decisions. There is nothing wrong with the ‘mathematics’ of the MTBF – it is all about how it needs to be used. But are there times when the MTBF can be used?

Topics include:

Scenarios where failure occurs with a constant hazard rate. But this is very, very rare. There are plenty of failure modes that have a constant hazard rate, where external environmental stresses cause catastrophic failure. Think about tornados, tsunamis, and nails on the road for your car tire. It doesn’t matter how old or young your system is … these failure modes are equally likely at any age. But … to find an entire system that has a constant hazard rate is very, very rare. For example, while our car tire will potentially have one ‘puncture’ failure mode that has a constant hazard rate, it will also wear out.

Scenarios where the MTBF is used to define a probability distribution … in conjunction with another parameter. Like the normal distribution (or bell curve) that models wear-out failure phenomena. But here, we are still not just relying on the MTBF to characterize the nature of failure.

Logistics and sparing. The Poisson distribution is often used to model how many spare parts we need for a certain interval or duration. It is based on an assumption of a constant hazard rate. We know (through this statistical thing called the ‘central limit theorem’) that if we expect to have a large number of failures, then the Poisson distribution becomes increasingly accurate. Think about 30 or more failures. But if you are expecting only a few spares in an interval … then the Poisson distribution will almost certainly lead you to over-estimate how many spare parts you need.

Drenick’s Failure Law … asserts that in series systems composed of many components with small failure rates, which are immediately replaced with good-as-new components or perfectly repaired when they fail, system failures will be (asymptotically) exponentially distributed almost regardless of the component failure time distributions. But this doesn’t mean the underlying components are exponentially distributed (which would mean only the MTBF is needed to define failure behaviour). What this means is that when there is a huge number of components being replaced in this way, we have a ‘perfect’ mix of old and young components. Which means that even though individual failure mechanisms are wearing out, the system appears to have a constant hazard rate. But this takes TIME! And is rarely checked. And doesn’t take into consideration how things like preventive maintenance (PM) ‘reset’ component lives – all at once.

Accelerated life testing … only where you are comparing two materials. If you have an underlying understanding of the Physics of Failure (PoF), you might be able to compare two materials in terms of their MTBF (or MTTF) only. This might help you make a quick decision on which material to use … but you also need to check that the underlying Time To Failure (TTF) distribution aligns with what you expect if you are going to use this data to predict reliability.

The post SOR 688 When is MTBF OK? appeared first on Accendo Reliability.
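The ‘logistics and sparing’ use of the MTBF can be sketched in a few lines. This is a minimal illustration, not a method taken from the episode: the function names and all numbers are hypothetical, and the calculation is only meaningful under the constant hazard rate assumption that the notes warn is rarely true for whole systems.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(N <= k) when the number of failures N is Poisson with the given mean."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


def spares_needed(mtbf_hours: float, interval_hours: float,
                  units: int, confidence: float) -> int:
    """Smallest spare count that covers demand with the requested confidence.

    Assumes a constant hazard rate, so failures in the interval follow a
    Poisson distribution with mean units * interval_hours / mtbf_hours.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    expected_failures = units * interval_hours / mtbf_hours
    spares = 0
    while poisson_cdf(spares, expected_failures) < confidence:
        spares += 1
    return spares


# Hypothetical case: one unit, 1,000 h MTBF, 1,000 h interval, 95 % confidence.
print(spares_needed(mtbf_hours=1000.0, interval_hours=1000.0,
                    units=1, confidence=0.95))  # prints 3
```

Only one failure is expected in this hypothetical case, yet three spares are needed to reach 95 % confidence, which shows how sensitive small-count sparing decisions are to the constant hazard rate assumption.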

Sep 13, 2021

Creativity and Engineering

Abstract

Carl and Fred discuss the broad subject of creativity, and how it applies to engineering activities, including reliability engineering.

Key Points

Join Carl and Fred as they discuss creativity in engineering.

Topics include:

- What is creativity?
- Book: Serious Creativity
- Creative challenge in recruiting interview
- Kids are inherently creative
- Odyssey of the Mind
- TRIZ
- Lateral Thinking
- How our brains put things in patterns, or ruts
- Exercises to think outside of patterns
- Using handwriting to unlock potential
- Deliberately do new things, breaking the pattern
- Deliberate techniques to enhance creative skills
- Divergent thinking vs convergent thinking
- Examples of creativity in reliability projects
- Subject Matter Experts can have blind spots
- Limitations of Brainstorming
- Discoveries sometimes happen by accident, but you have to be willing to see
- Your mind needs to be open to think differently and see new things
- Asking questions
- Book: The Artist’s Way
- Book: Leonardo da Vinci

Show Notes

This podcast refers to three books:

- Serious Creativity, by Edward de Bono, published by Harper Business, 1992
- The Artist’s Way, by Julia Cameron, published by TarcherPerigee, 2016
- Leonardo da Vinci, by Walter Isaacson, published by Simon and Schuster, 2017

The podcast refers to an illustration from Effective FMEAs, by Carl Carlson: Divergent vs Convergent Thinking

The post SOR 687 Creativity and Engineering appeared first on Accendo Reliability.

Sep 10, 2021

Short Term Thinking

Abstract

Carl and Fred discuss the problem with short-term thinking, and the benefits of long-term thinking, in the field of reliability engineering and management.

Key Points

Join Carl and Fred as they discuss the time span for reliability programs, and what happens when the focus is primarily on problem fixing, and not on problem prevention.

Topics include:

- Reliability example: power outages
- Maintenance to achieve short-term financial performance
- Reactive vs proactive
- Fighting fires gets rewarded (Deming quote, see Show Notes)
- Management creates the culture of prevention
- Organizations can choose short- or long-term thinking
- Fixing problems is necessary; preventing problems must also be a necessity
- Covey’s “urgent – important” quadrants
- Spend time each day on “not-urgent, important” matters
- Limit distractions

Show Notes

Reference: “Out of the Crisis,” by W. Edwards Deming, published by MIT Center for Advanced Engineering Study, 1982, page 107.

“One gets a good rating for fighting a fire. The result is visible; can be quantified. If you do it right the first time, you are invisible. You satisfied the requirements. That is your job. Mess it up, and correct it later, you become a hero.”

The post SOR 686 Short Term Thinking appeared first on Accendo Reliability.
