Monday, November 22, 2010

Embedded Software Risk Areas -- Dependability -- Part 1

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are some of the Dependability red flags (part 1 of 2):
  • No or incorrect use of watchdog timers
Watchdog timers are turned off or are serviced in a way that defeats their intended role in the system. For example, a watchdog might be kicked by an interrupt service routine that is triggered by a timer regardless of whether the rest of the software is actually making progress. Systems with ineffective watchdog timers may not reset themselves after a software timing fault. (A sketch of a sounder kick pattern appears after this list.)

  • Insufficient consideration of reliability/availability
There is no defined dependability goal or approach for the system, especially with respect to software. In most cases there is no requirement that specifies what dependability means in the context of the application (e.g., is a crash and fast reboot OK, or is it a catastrophic event for a typical customer?). As a result, the degree of dependability is not being actively managed.
  • Insufficient consideration of security
There is no statement of requirements and intentional design approach for ensuring adequate security, especially for network-connected devices. The resulting system may be compromised, with unforeseen consequences.
  • Insufficient consideration of safety
In some systems that have modest safety considerations, no safety analysis has been done. In systems that are more overtly safety critical (but for which there is no mandated safety certification), the safety approach falls short of recommended practices. The result is exposure to unforeseen legal liability and reputation loss.
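
To illustrate the watchdog point above, here is a minimal sketch of a kick pattern driven by task liveness rather than by a free-running timer interrupt. The function and flag names (wdt_kick, task_a_alive, and so on) are hypothetical placeholders for whatever a particular platform actually provides.

    /* Hypothetical watchdog example: wdt_kick() and the task alive flags
     * are illustrative assumptions, not a specific vendor API. */
    #include <stdbool.h>

    static volatile bool task_a_alive = false;
    static volatile bool task_b_alive = false;

    /* Each task sets its flag when it completes a processing cycle. */
    void task_a(void) { /* ... do work ... */ task_a_alive = true; }
    void task_b(void) { /* ... do work ... */ task_b_alive = true; }

    extern void wdt_kick(void);  /* platform-specific watchdog service call */

    /* Risky pattern: a timer ISR that calls wdt_kick() unconditionally keeps
     * the watchdog happy even if every task has hung.
     *
     * Safer pattern: kick only when every task has recently checked in. */
    void watchdog_supervisor(void)
    {
        if (task_a_alive && task_b_alive) {
            wdt_kick();
            task_a_alive = false;   /* require fresh check-ins before next kick */
            task_b_alive = false;
        }
        /* Otherwise, let the watchdog expire and reset the system. */
    }
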

Wednesday, November 17, 2010

Embedded Software Risk Areas -- Verification & Validation

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are the Verification and Validation red flags:
  • No peer reviews
Code, requirements, design and other documents are not subject to a methodical peer review, or undergo ineffective peer reviews. As a result, most bugs are found late in the development cycle when it is more expensive to fix them.
  • No test plan
Testing is ad hoc, and not according to a defined plan. Typically there is no defined criterion for how much testing is enough. This can result in poor test coverage or an inconsistent depth of testing.
  • No defect tracking
Defects and other issues are not being put into a bug tracking system. This can result in losing track of outstanding bugs and poor prioritization of bug-fixing activities.
  • No stress testing
There is no specific stress testing to ensure that real time scheduling and other aspects of the design can handle worst case expected operating conditions. As a result, products may fail when used for demanding applications.

Wednesday, November 10, 2010

Embedded Software Risk Areas -- Implementation -- Part 2

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are some of the Implementation red flags (part 2 of 2):
  • Ignoring compiler warnings
Programs compile with warnings that are ignored, and/or the compilers used do not have robust warning capability. A static analysis tool is not used to make up for poor compiler warning capabilities. The result can be that software defects which could have been caught by the compiler must instead be found via testing, or escape detection entirely. If assembly language is used extensively, it may contain the kinds of bugs that a good static analysis tool would have caught in a high level language. (A small example of a warning-catchable bug appears after this list.)
  • Inadequate concurrency management
Mutexes or other appropriate concurrent data access techniques aren't being used to protect shared data. This leads to potential race conditions and can result in tricky timing bugs. (A sketch of a shared-counter race and a mutex fix appears after this list.)
  • Use of home-made RTOS
An in-house developed RTOS is being used instead of an off-the-shelf operating system. While the result is sometimes technically excellent, this approach commits the company to maintaining RTOS development skills as a core competency, which may not be the best strategic use of limited resources.
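
To make the compiler warning point above concrete, here is a small, self-contained example of a classic defect that warnings catch. It is a generic C illustration rather than code from any reviewed project: with warnings disabled it compiles silently, while a typical compiler run with warnings enabled (for example, gcc -Wall) flags the suspicious assignment.

    /* A classic defect caught by compiler warnings: '=' used where '=='
     * was intended. With warnings off this compiles cleanly and the branch
     * is always taken; with -Wall most compilers warn about an assignment
     * used as a truth value. */
    #include <stdio.h>

    int check_mode(int mode)
    {
        if (mode = 3) {        /* BUG: should be (mode == 3) */
            return 1;
        }
        return 0;
    }

    int main(void)
    {
        printf("%d\n", check_mode(0));   /* prints 1, not the expected 0 */
        return 0;
    }
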
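
For the concurrency item above, here is a minimal sketch of the classic shared-counter race and one common fix. POSIX threads and a mutex are used purely as a stand-in for whatever tasking mechanism and locking primitive a given project actually has; on a bare-metal system the equivalent fix is often briefly disabling interrupts around the shared update.

    /* Shared-counter race condition and a mutex-based fix (POSIX threads
     * stand in here for RTOS tasks). */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            /* An unprotected "counter++" is a read-modify-write race between
             * threads; the mutex makes each update atomic with respect to
             * the other thread. */
            pthread_mutex_lock(&counter_lock);
            counter++;
            pthread_mutex_unlock(&counter_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);  /* reliably 200000 with the lock */
        return 0;
    }
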

Wednesday, November 3, 2010

Embedded Software Risk Areas -- Implementation -- Part 1

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are some of the Implementation red flags (part 1 of 2):
  • Inconsistent coding style
Coding style varies dramatically across the code base, and often there is no written coding style guideline. Code comments vary significantly in frequency, level of detail, and type of content. This makes it more difficult to understand and maintain the code.
  • Resources too full
Memory or CPU resources are overly full, leading to risk of missing real time deadlines and significantly increased development costs. An extreme example is zero bytes of program and data memory left over on a small processor. Significant developer time and energy can be spent squeezing software and data to fit, leaving less time to develop or refine functionality.
  • Too much assembly language
Assembly language is used extensively when an adequate high level language compiler is available. Sometimes this is because the hardware resources are too limited to run compiled code. But more often it is due to developer preference, reuse of previous project code, or a need to economize on development tool purchases. Assembly language software is usually more expensive to develop and more bug-prone than high level language code.
  • Too many global variables
Global variables are used instead of parameters for passing information among software modules. The result is often code with poor modularity that is brittle in the face of changes. (A brief contrast of the two styles appears after this list.)
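
As a small illustration of the global variable point above, the following sketch contrasts communicating through globals with passing parameters explicitly. The variable and function names are hypothetical.

    /* Hypothetical contrast between global-variable coupling and parameter
     * passing; names are illustrative only. */

    /* Globals-based style: hidden coupling between modules. Any code
     * anywhere can change sensor_reading, so the filter's behavior is hard
     * to reason about and hard to unit test. */
    int sensor_reading;
    int filtered_value;

    void filter_update_global(void)
    {
        filtered_value = (filtered_value * 3 + sensor_reading) / 4;
    }

    /* Parameter-based style: the data flow is explicit in the interface,
     * which keeps modules loosely coupled and easy to test in isolation. */
    int filter_update(int previous, int raw)
    {
        return (previous * 3 + raw) / 4;
    }
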
