Monday, July 20, 2015

Avoiding EEPROM and Flash Memory Wearout


Summary: If you're periodically updating a particular EEPROM value every few minutes (or every few seconds) you could be in danger of EEPROM wearout. Avoiding this requires reducing the per-cell write frequency. For some EEPROM technology anything more frequent than about once per hour could be a problem. (Flash memory has similar issues.)

Time Flies When You're Recording Data:

EEPROM is commonly used to store configuration parameters and operating history information in embedded processors. For example, you might have a rolling "flight recorder" function to record the most recent operating data in case there is a system failure or power loss. I've seen specifications for this sort of thing require recording data every few seconds.

The problem is that  EEPROM only works for a limited number of write cycles.  After perhaps 100,000 to 1,000,000 (depending on the particular chip you are using), some of your deployed systems will start exhibiting EEPROM wearout and you'll get a field failure. (Look at your data sheet to find the number. If you are deploying a large number of units "worst case" is probably more important to you than "typical.")  A million writes sounds like a lot, but they go by pretty quickly.  Let's work an example, assuming that a voltage reading is being recorded to the same byte in EEPROM every 15 seconds.

1,000,000 writes at one write per 15 seconds is 4 writes per minute:
  1,000,000 / ( 4 * 60 minutes/hr * 24 hours/day ) = 173.6 days.
In other words, your EEPROM will use up its million-cycle wearout budget in less than 6 months.

Below is a table showing the time to wearout (in years) based on the period used to update any particular EEPROM cell. The crossover values for 10 year product life are one update every 5 minutes 15 seconds for an EEPROM with a million cycle life. For a 100K life EEPROM you can only update a particular cell every 52 minutes 36.  This means any hope of updates every few seconds just aren't going to work out if you expect your product to last years instead of months. Things scale linearly, although in real products secondary factors such as operating temperature and access mode can play a factor.



Reduce Frequency
The least painful way to resolve this problem is to simply record the data less often. In some cases that might be OK to meet your system requirements.

Or you might be able to record only when things change more than a small amount, with a minimum delay between successive data points. However, with event-based recording be mindful of value jitter or scenarios in which a burst of events can wear out EEPROM.

(It would be nice if you could track how often EEPROM has been written. But that requires a counter that's kept in EEPROM ... so that idea just pushes the problem into the counter wearing out.)

Low Power Interrupt
In some processors there is a low power interrupt that can be used to record one last data value in EEPROM as the system shuts down due to loss of power. In general you keep the value you're interested in a RAM location, and push it out to EEPROM only when you lose power.  Or, perhaps, you record it to EEPROM once in a while and push another copy out to EEPROM as part of shut-down to make sure you record the most up-to-date value available.

It's important to make sure that there is a hold-up capacitor that will keep the system above the EEPROM programming voltage requirement for long enough.  This can work if you only need to record a value or two rather than a large block of data. But it is easy to get this wrong, so be careful!

Rotating Buffer
The classical solution for EEPROM wearout is to use a rotating buffer (sometimes called a circular FIFO) of the last N recorded values. You also need a counter stored in EEPROM so that after a power cycle you can figure out which entry in the buffer holds the most recent copy. This reduces EEPROM wearout proportionally to the number of copies of the data in the buffer. For example, if you rotate through 10 different locations that take turns recording a single monitored value, each location gets modified 1/10th as often, so EEPROM wearout is improved by a factor of 10. You also need to keep a separate counter or timestamp for each of the 10 copies so you can sort out which one is the most recent after a power loss.  In other words, you need two rotating buffers: one for the value, and one to keep track of the counter. (If you keep only one counter location in EEPROM, that counter wears out since it has to be incremented on every update.)  The disadvantage of this approach is that it requires 10 times as many bytes of EEPROM storage to get 10 times the life, plus 10 copies of the counter value.  You can be a bit clever by packing the counter in with the data. And if you are recording a large record in EEPROM then an additional few bytes for the counter copies aren't as big a deal as the replicated data memory. But any way you slice it, this is going to use a lot of EEPROM.

Atmel has an application note that goes through the gory details:
AVR-101: High Endurance EEPROM Storage:  http://www.atmel.com/images/doc2526.pdf

Special Case For Remembering A Counter Value
Sometimes you want to keep a count rather than record arbitrary values. For example, you might want to count the number of times a piece of equipment has cycled, or the number of operating minutes for some device.  The worst part of counters is that the bottom bit of the counter changes on every single count, wearing out the bottom count byte in EEPROM.

But, there are special tricks you can play. An application note from Microchip has some clever ideas, such as using a gray code so that only one byte out of a multi-byte counter has to be updated on each count. They also recommend using error correcting codes to compensate for wear-out. (I don't know how effective ECC will be at wear-out, because it will depend upon whether bit failures are independent within the counter data bytes -- so be careful of using that idea). See this application note:   http://ww1.microchip.com/downloads/en/AppNotes/01449A.pdf

Note: For those who want to know more, Microchip has a tutorial on the details of wearout with some nice diagrams of how EEPROM cells are designed:
ftp://ftp.microchip.com/tools/memory/total50/tutorial.html

Don't Re-Write Unchanging Values
Another way to reduce wearout is to read the current value in a memory location before updating. If the value is the same, skip the update, and eliminate the wearout cycle associated with an update that has no effect on the data value. Make sure you account for the worst case (how often can you expect values to be the same?). But even if the worst case is bad, this technique will give you a little extra margin of safety if you get lucky once in a while and can skip writes.

If you've run into any other clever ideas for EEPROM wearout mitigation please let me know.

Leraning More
Nash Reilly has a nice series of tutorial postings on how Flash/EEPROM technology works. (I found out about these via Jack Ganssle's newsletter.)
http://cushychicken.github.io/nand-pt1-transistors/
http://cushychicken.github.io/nand-pt2-floating/
http://cushychicken.github.io/nand-pt3-arrays/
http://cushychicken.github.io/nand-pt4-pages-blocks/
http://cushychicken.github.io/nand-pt5-how-nand-breaks/
http://cushychicken.github.io/nand-pt6-dealing-with-flaws/ 
http://cushychicken.github.io/inconvenient-truths/

Oct 2019: Tesla is said to have a flash wearout problem for its SSDs.  https://insideevs.com/news/376037/tesla-mcu-emmc-memory-issue/


Job and Career Advice

I sometimes get requests from LinkedIn contacts about help deciding between job offers. I can't provide personalize advice, but here are...