

Speaking Of Reliability: Friends Discussing Reliability Engineering Topics | Warranty | Plant Maintenance
Reliability.FM: Accendo Reliability, focused on improving your reliability program and career
Gain the experience of your peers to accelerate improvement of your program and career. Improve your product development process, reliability or warranty performance; or your plant uptime or asset performance. Learn about reliability and maintenance engineering practical approaches, skills, and techniques. Join the conversation today.

Jun 10, 2024
Reliability Allocation Methods
Abstract
Kirk and Fred discuss reliability allocations for individual components and subsystems.
Key Points
Join Kirk and Fred as they discuss a question from one of our listeners and why solid-state electronics and mechanical systems differ significantly in their intrinsic life entitlements and have to be analyzed with different models, where such models are available.
Topics include:
The intrinsic life of a component in a new application is difficult, if not impossible, to predict, but in power electronics there are some wear-out mechanisms, as seen in IGBTs (Insulated Gate Bipolar Transistors), and batteries do wear out before the systems are technologically obsolete.
There are many variables in the life estimate of a component or subsystem, such as manufacturing, multiple suppliers, and variation in end-use environments, that would take significant effort and time to test for.
HALT and accelerated life testing (ALT) have different goals: HALT empirically exposes a product’s fatigue weaknesses, while ALT quantifies times to failure, but ALT typically takes longer and needs more samples.
Prognostics and Health Management (PHM) has become a more significant method in electronics life measurement. It is based on finding leading parametric measurements that indicate component degradation and eventual wear-out.
Enjoy an episode of Speaking of Reliability, where you can join friends as they discuss reliability topics. Join us as we discuss topics ranging from design for reliability techniques to field data analysis approaches.
Show Notes
Please click on this link to access a relatively new article analyzing traditional reliability prediction methods, from the US Army and CALCE, titled “Reliability Prediction – Continued Reliance on a Misleading Approach”. It is in the public domain, so please distribute freely. Trying to predict reliability for development is a misleading and costly approach.
You can now purchase the most recent recording of Kirk Gray’s Hobbs Engineering 8-hour (two 4-hour sessions) webinar “Rapid and Robust Reliability Development 2022 HALT & HASS Methodologies Online Seminar” from this link.
For more information on the newest discovery testing methodology here is a link to the book “Next Generation HALT and HASS: Robust design of Electronics and Systems” written by Kirk Gray and John Paschkewitz.
The post SOR 973 Reliability Allocation Methods appeared first on Accendo Reliability.

Jun 7, 2024
An Idea Short Explainer Videos
Abstract
Chris and Fred discuss how short, 1-minute explainer videos could help reliability engineers … especially new ones!
Key Points
Join Chris and Fred as they discuss how people new to the topic can start immersing themselves in reliability engineering … but in an engaging and ‘easy’ way. We think that perhaps one (1) minute explainer videos might help.
Topics include:
We usually start with asking you ‘what are you trying to do?’ But for people new to reliability engineering, that can be a problem. You might not know what it is that needs to be fixed. You might know that something is wrong when it comes to reliability engineering, but don’t know what it is.
But even seasoned veterans can benefit from these sorts of explainer videos. An astounding number of decisions assume that (for example) the MTBF is a ‘relatively failure-free period’ or the ‘optimal servicing interval’ (these assumptions are wrong, by the way). And this is often not those decision-makers’ fault (up to a point). The MTBF is religiously taught as a central tenet of reliability engineering ‘stuff.’ So why would they do anything differently?
What do you think? Are there any one (1) minute explainer videos you would like to see? … or perhaps one (1) minute explainer videos you think your ‘boss’ needs to see?
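To make the MTBF point above concrete, here is a minimal Python sketch. It assumes a constant hazard rate (the exponential model that an MTBF-only analysis implies). The function name and the 10,000-hour MTBF are hypothetical and chosen only for illustration. Under that assumption, only about 37% of units are expected to survive to the MTBF, so it is hardly a ‘relatively failure-free period.’

```python
import math


def exponential_reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of surviving to t_hours, ASSUMING a constant hazard rate."""
    # R(t) = exp(-t / MTBF) for the exponential distribution.
    return math.exp(-t_hours / mtbf_hours)


# Hypothetical MTBF, for illustration only.
mtbf = 10_000.0

# Fraction of units expected to still be working at t = MTBF.
# The MTBF cancels out, so this is exp(-1) regardless of the MTBF value.
survival_at_mtbf = exponential_reliability(mtbf, mtbf)
print(f"Expected fraction surviving to the MTBF: {survival_at_mtbf:.3f}")
```

If the product actually wears in or wears out, the exponential assumption itself is wrong and this number changes, which is a separate problem from misreading what the MTBF means.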
Show Notes
The post SOR 972 An Idea Short Explainer Videos appeared first on Accendo Reliability.

Jun 3, 2024
FRACAS and CMMS
Abstract
Chris and Fred discuss what Failure Reporting and Corrective Action Systems (FRACAS) and Computerized Maintenance Management Systems (CMMS) are … and what they are not … and how they relate.
Key Points
Join Chris and Fred as they discuss both the Failure Reporting and Corrective Action System (FRACAS) and Computerized Maintenance Management System (CMMS). What are they? What makes them different?
Topics include:
FRACAS is for ‘unacceptable’ failures. As in a failure that we need to somehow get rid of and try to make never happen again. This goes beyond simply repairing or fixing the failure (of course you need to do that). It also includes Root Cause Analysis (RCA) to work out what changes you need to make to your design, manufacturing, maintenance or operation to make sure that failure never happens again. A FRACAS is often used in the design phase to record test results or issues that need to be the subject of the next design iteration. A FRACAS can also be used during sustainment/operations/use.
CMMS is for ‘acceptable’ failures. It sounds wrong to say that we have ‘acceptable’ failures. But here we mean failures that we are simply going to repair, without trying to address their root causes.
So what makes a failure ‘unacceptable’? Good question. What makes a technician enter a failure into the CMMS and then the FRACAS? This is where a lot of FRACAS fall down, especially when it comes to dealing with failures that occur at an unacceptably high frequency. The frequency of failure can be invisible to a single technician. So what is your criterion?
But a FRACAS is not a journal. A lot of so-called FRACAS are little more than a spreadsheet that journals failures, or expensive software that allows people to record failures that they think are unacceptable. This is not a FRACAS. A FRACAS includes a team whose job is to routinely go through said journal, conduct RCA, identify Corrective Actions, and then be responsible for resourcing and implementing them. Does your FRACAS have this team? If you don’t have this team (and instead rely on people to resolve these issues of their own volition), you don’t have a FRACAS.
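One way to make failure frequency visible beyond a single technician is to count repeats across all CMMS records. The Python sketch below is only an illustration under stated assumptions: the record fields (`technician`, `asset_type`, `failure_mode`), the function name, and the threshold of two repeats are all hypothetical, and a real CMMS export will look different.

```python
from collections import Counter


def flag_for_fracas(work_orders, max_repeats):
    """Return (asset_type, failure_mode) pairs seen more than max_repeats times.

    Counting is done across ALL work orders, regardless of which technician
    logged them, so repeats that no single person saw still show up.
    """
    counts = Counter((wo["asset_type"], wo["failure_mode"]) for wo in work_orders)
    return {key: n for key, n in counts.items() if n > max_repeats}


# Hypothetical CMMS records: each technician sees the seal leak only once.
work_orders = [
    {"technician": "A", "asset_type": "pump", "failure_mode": "seal leak"},
    {"technician": "B", "asset_type": "pump", "failure_mode": "seal leak"},
    {"technician": "C", "asset_type": "pump", "failure_mode": "seal leak"},
    {"technician": "A", "asset_type": "fan", "failure_mode": "bearing noise"},
]

# Hypothetical criterion: more than 2 repeats makes a failure 'unacceptable'.
flagged = flag_for_fracas(work_orders, max_repeats=2)
print(flagged)
```

A list like this only feeds the FRACAS. The team that conducts RCA and implements Corrective Actions is still what makes it a FRACAS rather than a journal.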
Show Notes
The post SOR 971 FRACAS and CMMS appeared first on Accendo Reliability.

May 31, 2024
What is a Mode?
Abstract
Carl and Fred discuss why understanding the mode of failure is essential in Failure Mode and Effects Analysis.
Key Points
Join Carl and Fred as they discuss how the Mode of failure is used in FMEA and other methods.
Topics include:
Examples of mode of failure
Common misunderstandings about Failure Mode
What is observable is often the mode
Mode = the way or manner something happens
There is a journey from Function to Cause and Failure Mode is a key part of the journey
Cause vs Failure Mechanism
Focus on Failure Modes that matter the most
Proper FMEA definitions are key to successful FMEAs
There are important Failure Modes that are not the same as the antithesis of the function
Every FMEA step naturally leads to the next step
Important to define failure
Keep the team focused on what matters
Show Notes
The post SOR 970 What is a Mode? appeared first on Accendo Reliability.

May 27, 2024
Common Reliability Mistakes
Abstract
Carl and Fred discuss some of the most common reliability mistakes they have seen in their careers, both those they have made personally and those they have observed.
Key Points
Join Carl and Fred as they discuss common erroneous assumptions and other types of mistakes and how to avoid them.
Topics include:
Not knowing how to interpret data, including confidence intervals, failure distributions
Assuming exponential distributions and why this is often wrong
Importance of always documenting and questioning the assumptions you make
Why a Reliability plan is needed to ensure the right methods are applied
Use of the Common Sense test
Stephen Covey's Urgent-Important matrix
What's wrong with a Reliability Plan that merely tests a couple of units
Blaming the customer for failures
One-trick pony always using a favorite tool or method
Having a good plan compensates for one-trick pony
Need to have an overview understanding of all tools to be able to select the right tools
Sometimes we need to move out of our comfort zone to have an open mind about tools
Know the limitations of each tool
Show Notes
The post SOR 969 Common Reliability Mistakes appeared first on Accendo Reliability.

May 24, 2024
Selecting Tools to Solve Problems
Abstract
Dianna and Fred discuss selecting tools to solve problems that are outside of the workplace, too!
Key Points
Join Dianna and Fred as they discuss selecting tools to solve problems that are outside of the workplace, too!
Topics include:
The many controllable variables of app-enabled coffee machines.
How tools we use at work can be used, again, for hobbies at home.
Using tools outside of work can help us practice their use.
Show Notes
Fred is excited to have purchased a brand-new coffee machine and wants to tweak its settings to craft the perfect brew. Dianna is astounded at the many variables of Fred’s new machine and learns to appreciate her fill-and-start model. Dianna doesn’t leave Fred in a lurch, though. They talk through some of the variables and outputs and also what kind of quality methods Fred can use to help him figure out his new machine.
The post SOR 968 Selecting Tools to Solve Problems appeared first on Accendo Reliability.

May 20, 2024
Are Silos an Issue?
Abstract
Dianna and Fred discuss workplace politics: are silos an issue?
Key Points
Join Dianna and Fred as they discuss workplace politics: are silos an issue?
Topics include:
What silos are, what they may look like, and the ways they are formed.
The effects of silos on projects and teams.
You are needed to bridge silos and facilitate teamwork across silos.
Show Notes
Fred and Dianna talk about a common workplace scenario: functional silos. They talk about their own experiences in how silos have caused problems. And they share stories about what they did for projects to bridge the silos, which led to greater project success. Their call to action: you are needed to bridge the silos!
The post SOR 967 Are Silos an Issue appeared first on Accendo Reliability.

May 17, 2024
Why is PoF so Hard?
Abstract
Chris and Fred discuss why the Physics of Failure (PoF) is hard to model? … or is it?
Key Points
Join Chris and Fred as they discuss how the Physics of Failure (PoF) is seen as hard to use to model time to failure of something. It usually needs a detailed equation or formula to model how long it takes for something to fail based on physical parameters like grain size, modulus, strain exponent and so on. Sounds hard!
Topics include:
What does PoF mean? It means that instead of testing products until failure to see the spread of times to failure (as in, how probability is distributed), an ‘accurate’ model, which might have lots of parameters based on material properties, is used in place of testing to quickly and accurately model time to failure.
So what’s the problem? It can be really, really hard to know which of the thousands of complex equations are the one(s) that describe how your product fails. There are resources out there that have huge lists of PoF models (and their detailed equations) for you to pick from. But then … how do you know which one perfectly captures the way your thing fails?
Then there are the parameters. Some PoF models require tens of parameters to be known. But if you don’t know what these parameters are … you are in trouble. Some people just ‘guess’ these parameters based on similar materials or scenarios. The problem with this is that you are now modeling someone else’s failure, which may or may not be similar to yours.
But we do use PoF more than we might think. When we do Accelerated Life Testing (ALT), we often use what we call ‘Arrhenius Plots.’ These are charts that happen to make it really easy for us to see and model how increasing the temperature of a product speeds up the failure process. This allows us to ‘accelerate’ testing by increasing temperature, so we do not have to spend 10 years testing products to understand service reliability. But … ‘Arrhenius Plots’ only work for failure mechanisms that are based on chemical reactions (like corrosion, dendritic growth and so on). And many people try to use ‘Arrhenius Plots’ for things that are not chemical reactions.
Again … work out what decision you are trying to inform. This will help you see if you need to understand PoF, do your own test, use expert judgment or anything else!
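The temperature acceleration behind an Arrhenius plot can be sketched with the standard Arrhenius acceleration factor, AF = exp[(Ea / k) × (1 / T_use − 1 / T_stress)], with temperatures in kelvin. The Python below is a minimal illustration, not a test plan: the function name, the 0.7 eV activation energy and the 40 °C / 85 °C temperatures are assumptions chosen for the example. The activation energy is specific to a failure mechanism and has to come from data for that mechanism.

```python
import math

# Boltzmann constant in electronvolts per kelvin.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration_factor(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Acceleration factor between a use and a stress temperature (both in deg C).

    Only meaningful for thermally activated (chemical reaction type)
    failure mechanisms, and only if ea_ev matches that mechanism.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


# Hypothetical values, for illustration only.
af = arrhenius_acceleration_factor(ea_ev=0.7, t_use_c=40.0, t_stress_c=85.0)
print(f"One hour at 85 C is treated as roughly {af:.0f} hours at 40 C")
```

The result is very sensitive to the assumed activation energy, which is one reason guessing that parameter from ‘similar’ materials can be so misleading.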
Show Notes
The post SOR 966 Why is PoF so Hard? appeared first on Accendo Reliability.

May 13, 2024
MTBF, Really?
Abstract
Chris and Fred discuss the MTBF … again. And again. People don’t (want to) get it. So here we go again …
Key Points
Join Chris and Fred as they discuss the MTBF and why it should virtually never be used. Why?
Topics include:
What’s wrong with the MTBF when it comes to reliability? When we assume that the only thing we need to understand is the MTBF, we can never use reliability models that include any form of early wear-in or late wear-out. So, it means we assume a constant hazard rate, which means your thing is never young and never gets old. That’s right, a 100-year-old product that is somehow still working is just as likely to survive the next day as one that comes out of the box.
But when I assume (just) the MTBF, I get better results than when we do more detailed analysis. A Toyota Corolla has a 1.6 litre engine. So does an F1 race car. Now let’s say that you measured the top speeds of both cars. For the F1 race car, we get 372.499 km/h or 231.46 mph. For the Toyota Corolla, we get 188.3 km/h or 117.0 mph. But let’s now say that we don’t like the top speed of the Toyota Corolla, and would like it to be higher. What you could do is pretend you didn’t measure the top speed of the Toyota Corolla, and then assume that because its engine is the same size as the F1 race car’s engine … it has the same top speed as the F1 race car. Crazy right? … just as crazy as assuming an MTBF or constant hazard rate because you like the number you get better.
Ostriches don’t actually put their heads in the sand … but many ‘reliability engineers’ do. When we ask some organizations and reliability engineers why they still use nothing but the MTBF, they say things like ‘we’ve never seen it be anything else.’ And when we ask what, if anything, they have done to look for evidence to the contrary … ‘we just assume we are in the bottom of the bathtub curve.’ Some people don’t know that no system actually has a ‘bathtub curve’ like the one we see beautifully traced out in a textbook. So why are we still here?
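The ‘never young, never old’ consequence of a constant hazard rate can be sketched in Python. This is an illustration only: the Weibull model, the function names and the parameter values (a characteristic life of 1,000 hours, and a shape of 3 for the wear-out case) are assumptions, not figures from the episode. A Weibull shape of 1 is the constant hazard rate case that an MTBF-only analysis implies.

```python
import math


def weibull_reliability(t: float, eta: float, beta: float) -> float:
    """Probability of surviving to time t for a Weibull(eta, beta) life model."""
    return math.exp(-((t / eta) ** beta))


def conditional_survival(age: float, mission: float, eta: float, beta: float) -> float:
    """P(survive a further `mission` hours | already survived to `age`)."""
    return weibull_reliability(age + mission, eta, beta) / weibull_reliability(age, eta, beta)


ETA = 1_000.0     # hypothetical characteristic life, hours
MISSION = 100.0   # hypothetical next interval of use, hours

# Constant hazard rate (beta = 1): age makes no difference at all.
new_const = conditional_survival(0.0, MISSION, ETA, beta=1.0)
old_const = conditional_survival(5_000.0, MISSION, ETA, beta=1.0)

# Wear-out (beta = 3): an aged unit is clearly less likely to survive.
new_wear = conditional_survival(0.0, MISSION, ETA, beta=3.0)
old_wear = conditional_survival(900.0, MISSION, ETA, beta=3.0)

print(f"constant hazard: new {new_const:.4f}, old {old_const:.4f}")
print(f"wear-out:        new {new_wear:.4f}, old {old_wear:.4f}")
```

With a shape of 1 the two printed values match, which is exactly the claim that a 100-year-old product is as good as new. Any evidence of wear-out in your data contradicts that assumption.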
Show Notes
The post SOR 965 MTBF, Really? appeared first on Accendo Reliability.

May 10, 2024
Finding Failures and Firefighting
Abstract
Kirk and Fred discuss the schedule pressures of new product market release and how, once customers start finding reliability issues, the actual firefighting begins. Many times, those who can quickly fix the causes of failures, the firefighters, get many more accolades than those who find and mitigate product weaknesses during the design and development phase, before they become failures.
Key Points
Join Kirk and Fred as they discuss the common excuses for not doing enough analysis and testing to discover latent defects before market release, if it does happen. Many products are robust designs and the latent defects are introduced during assembly and final testing.
Topics include:
Suppose HALT reveals significant differences in environmental step stress limits within a small group of 3 to 5 samples. That likely indicates a wide distribution in a component or subsystem, and some percentage of the weakest units in that distribution will intersect with the worst-case end-use stress environment, even though there are not enough samples for a statistical analysis.
CAD systems can very well analyze how component variations will affect the functions of the circuits, but they are based on ideal averages and not the real parametric variations in high-volume production.
Understanding the root cause of failure is of utmost importance. Even the most robust designs can only be reliable if the manufacturing processes are capable and consistent. This underscores the weight of the responsibility as manufacturing professionals to ensure the quality and reliability of our products.
Real firefighters saving a person from a burning building generally get much more publicity and accolades than the inventors of the smoke detector, who have saved orders of magnitude more people by alerting them at the beginning of a fire in time to escape.
Show Notes
Please click on this link to access a relatively new article analyzing traditional reliability prediction methods, from the US Army and CALCE, titled “Reliability Prediction – Continued Reliance on a Misleading Approach”. It is in the public domain, so please distribute freely. Trying to predict reliability for development is a misleading and costly approach.
You can now purchase the most recent recording of Kirk Gray’s Hobbs Engineering 8-hour (two 4-hour sessions) webinar “Rapid and Robust Reliability Development 2022 HALT & HASS Methodologies Online Seminar” from this link.
For more information on the newest discovery testing methodology here is a link to the book “Next Generation HALT and HASS: Robust design of Electronics and Systems” written by Kirk Gray and John Paschkewitz.
The post SOR 964 Finding Failures and Firefighting appeared first on Accendo Reliability.