Summary

Improve MMC performance by adding support for new MMC operations available in JEDEC 4.41. Improve the MMC framework performance by minimizing latency between two consecutive MMC requests.

Release Note

Add support for MMC operations available in JEDEC 4.41, such as background operations, high priority interrupts and reliable writes. Add double buffering support to the MMC framework to parallelize DMA transfer and cache maintenance.

Rationale

Managed NAND devices with an MMC/SD interface are becoming more and more popular. The two big benefits are error correction code (ECC) in hardware and a standard hardware interface. This makes it possible to run a common file system such as ext4, but is this really optimal?

This is an overview of the stack:

  • standard FS, Ext4
  • block device
  • MMC framework
  • MMC host driver
  • eMMC controller
  • eMMC card

One approach is to optimize performance from the bottom up:

  • What does the hardware support?
  • Is this supported by the MMC-framework and host driver?
  • Does the FS utilize these features to gain performance?

User stories

  1. Extensive writes to MMC result in a loss of performance over time due to memory maintenance. What can be done to minimize this drop?
  2. In small devices it is often desirable to offload the CPU using DMA. Will there be a performance penalty for DMA?
  3. From a top-level perspective, not all reads and writes to the MMC have the same importance. Serving a demand-paging request is normally more urgent than reading a big JPEG file into RAM. Is there a way to assign priority?
  4. It is not optimal to do many small reads, and especially small writes (less than the erase block size), to MMC. What can be done to avoid this?
  5. eMMC has hardware ECC (error correction code). Does this help when the power is suddenly cut off? Is there any risk of the filesystem being corrupted?
  6. Will DDR_MODE for eMMC really double the performance?
  7. What would be the optimal characteristics for a filesystem on eMMC? Is ext4 well suited for eMMC?

Assumptions

eMMC hardware is available that supports JEDEC 4.41, including the optional JEDEC 4.41 features needed to fulfill this design.

Design

1. Write performance drop

  • Investigate eMMC (JEDEC version 4.4) "trim" support in the Linux kernel and ARM SoC. Trim support is available in v2.6.36 for SD/MMC/eMMC. Fill out the implementation and fix bugs as needed.
  • Ext4 utilizes trim operations to improve performance. Use tests from benchmarks to measure how large an effect TRIM has on performance.

  • Add support for background operations (JEDEC version 4.41) in the MMC framework and filesystems. In the best case, all maintenance operations such as trim and erase should run in the background while the card is idle.

2. double buffer for DMA

Improve MMC framework start/prepare job latency (applicable to SD performance as well).

  • Investigate which types of MMC request cause the number of pending requests in the block device to increase.
  • The file system adds jobs asynchronously to the block device layer. The MMC framework runs one job at a time. An MMC job consists of two steps: send the MMC command and request the data transfer. The MMC framework waits until the data transfer is complete, and only then starts executing the next job from the queue.
    • Action - minimize latency between job complete and next job started.
    • Would it make any sense to pass multiple commands to the host driver? For SCSI there is something called bi-directional (bidi) request, could this be of interest for MMC as well?
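
The potential gain from minimizing this latency can be estimated with a small model. The sketch below is an illustration only: it assumes each job has a CPU-side prepare cost and a DMA transfer cost, and that preparation of the next job can fully overlap the current transfer. The struct and function names are hypothetical and do not exist in the MMC framework.

```c
#include <stddef.h>

/* Hypothetical cost model: prepare cost (for example cache maintenance)
 * and DMA transfer cost for one job, both in microseconds. */
struct job_cost {
    unsigned int prep_us;
    unsigned int xfer_us;
};

/* One job at a time: prepare, transfer, then start on the next job. */
unsigned long serial_time_us(const struct job_cost *jobs, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += jobs[i].prep_us + jobs[i].xfer_us;
    return total;
}

/* Overlapped: while job i transfers, job i+1 is being prepared, so each
 * step costs the longer of the two activities. */
unsigned long overlapped_time_us(const struct job_cost *jobs, size_t n)
{
    unsigned long total;

    if (n == 0)
        return 0;
    total = jobs[0].prep_us;            /* the first prepare cannot overlap */
    for (size_t i = 0; i < n; i++) {
        unsigned long xfer = jobs[i].xfer_us;
        unsigned long next_prep = (i + 1 < n) ? jobs[i + 1].prep_us : 0;

        total += (xfer > next_prep) ? xfer : next_prep;
    }
    return total;
}
```

In this model the saving is at most the sum of the prepare costs of all jobs but the first, so the benefit depends on how large the prepare cost is relative to the transfer time.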

3. Prevent small MMC transfers

Improve MMC framework job scheduling (applicable to SD performance as well).

  • Concatenate consecutive writes and reads in the block queue and pass them down as one job to the host driver.

  • ARCH_HAS_SG_CHAIN is not supported for ARM, is this a limitation?
  • The IO-scheduler concatenates consecutive read/write to some extent, are there any potential improvements to be made here? Would it improve performance to fetch multiple reads or writes from the block-queue?
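
As an illustration of what concatenating consecutive requests means, the following minimal sketch merges neighbouring requests that have the same direction and contiguous sectors. The `blk_req` struct and `merge_adjacent()` are hypothetical simplifications, not the block layer's real data structures, and the real IO scheduler applies more constraints than the single `max_sectors` limit assumed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a block request. */
struct blk_req {
    uint64_t start;     /* first 512-byte sector */
    uint32_t count;     /* number of sectors */
    int is_write;       /* 0 = read, 1 = write */
};

/* Merge neighbours in place when they share a direction, the next request
 * starts exactly where the previous one ends, and the merged size stays
 * within max_sectors. Returns the new number of requests. */
size_t merge_adjacent(struct blk_req *q, size_t n, uint32_t max_sectors)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++) {
        struct blk_req *last = &q[out];

        if (q[i].is_write == last->is_write &&
            q[i].start == last->start + last->count &&
            (uint64_t)last->count + q[i].count <= max_sectors) {
            last->count += q[i].count;  /* extend the previous request */
        } else {
            q[++out] = q[i];            /* start a new request */
        }
    }
    return out + 1;
}
```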

4. Priority

Add support for the high priority interrupt operation (JEDEC version 4.41) in the MMC framework and filesystem. High priority interrupt allows a low-priority large read/write to be interrupted to make way for a high-priority operation, for instance a demand-paging request.

5. Reliable eMMC

Add support for reliable write (JEDEC version 4.3) in the MMC framework and filesystem, since NAND flash is not reliable in the rotating-mass-storage sense. Reliable write is slower than normal write, so it should only be used for writing critical data such as partition tables. The filesystem has a flag to mark metadata. At block level the flag is "struct request"->cmd_flags & WRITE_META. A straightforward approach is to use this flag to write all metadata using the reliable write operation. A more optimized implementation would write only journals using reliable write.
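
The alternatives described here can be expressed as a policy switch. This is a minimal sketch under stated assumptions: the flag bits and the policy enum are hypothetical stand-ins for the real request flags, and the journal flag in particular is invented for illustration, since the text does not say how journal writes would be identified at block level.

```c
/* Hypothetical flag bits standing in for the real request flags. */
#define RQ_WRITE    (1u << 0)
#define RQ_META     (1u << 1)   /* stand-in for the metadata flag */
#define RQ_JOURNAL  (1u << 2)   /* invented for this sketch */

enum rel_policy {
    REL_NEVER,          /* normal writes only */
    REL_JOURNAL_ONLY,   /* the more optimized variant */
    REL_ALL_META,       /* the straightforward variant */
    REL_ALL_WRITES      /* useful for measuring worst-case throughput */
};

/* Returns 1 if the request should use reliable write, otherwise 0. */
int use_reliable_write(unsigned int flags, enum rel_policy policy)
{
    if (!(flags & RQ_WRITE))
        return 0;               /* reads are never affected */

    switch (policy) {
    case REL_ALL_WRITES:
        return 1;
    case REL_ALL_META:
        return (flags & RQ_META) != 0;
    case REL_JOURNAL_ONLY:
        return (flags & RQ_JOURNAL) != 0;
    case REL_NEVER:
    default:
        return 0;
    }
}
```

Keeping the decision in one function makes it easy to switch policy when comparing throughput and robustness between the variants.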

6. DDR_MODE eMMC

Add support for DDR_MODE in the MMC host driver (or use an available host driver that supports it) and evaluate performance. In DDR_MODE, eMMC transfers data on both the rising and falling clock edges.
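
The theoretical effect on bus throughput is simple arithmetic, sketched below. The function name is hypothetical, and the 52 MHz clock and 8-bit bus used in the usage note are example inputs, not measurements from any particular card.

```c
/* Theoretical peak bus throughput in MB/s (10^6 bytes per second).
 * In DDR mode two transfers happen per clock cycle instead of one. */
unsigned int peak_mb_per_s(unsigned int clock_mhz,
                           unsigned int bus_width_bits, int ddr)
{
    unsigned int transfers_per_clock = ddr ? 2 : 1;

    return clock_mhz * bus_width_bits * transfers_per_clock / 8;
}
```

For example, `peak_mb_per_s(52, 8, 0)` gives 52 and `peak_mb_per_s(52, 8, 1)` gives 104. This is only the bus limit. Command overhead and the speed of the flash itself are unchanged, so the measured gain is expected to be smaller, which is why the evaluation step is needed.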

7. FS for eMMC

Investigate FS performance for eMMC (applicable to SD performance as well).

  • Investigate potential areas of improvements on FS-level for MMC in particular

Implementation

What order to do things?

The right order to implement this is not 1 through 7. One should pick the items that give the most "bang for the buck". This is the proposed order:

  • Prestudy (Obviously the first one)
  • Verify eMMC support (Good to know early if the hardware is compatible, otherwise other hardware is needed)
  • 2. double buffer for DMA (Should be relatively straightforward to implement, and a chance to get to know the internals of the MMC framework at the same time)
  • 5. Reliable eMMC (Guessing implementation should be fairly easy. The testing phase is really long, so the next task may be started at this point)
  • 6. DDR_MODE eMMC (should be trivial)
  • 7. FS for eMMC (Good to have experience from previous tasks to focus on the right things)
  • 1. Write performance drop (this could be the most difficult one)
  • 4. Priority (Implementation not too hard, but at this point I don't see any good use cases for it. May change priority if an interesting case pops up)
  • 3. Prevent small MMC transfers (Little chance for good value for the money)

Prestudy

  • Read up on JEDEC eMMC 4.41
  • Read up on the preferred FS characteristics for MMC
  • Read up on the expected lifetime for eMMC
  • Read up on the expected performance drop over time due to wear leveling effects

Code Changes

Verify eMMC support

  • Add support in the MMC framework for detecting EXT_CSD for:
    • bg-operations
    • Reliable write
    • High Priority Interrupts
  • Add debug print to output the new EXT_CSD info at MMC init()
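
A minimal sketch of this detection step is shown below. The byte offsets are the ones I recall from the Linux kernel's EXT_CSD definitions and should be checked against the JEDEC 4.41 specification before use. Treating a non-zero REL_WR_SEC_C as "reliable write supported" is a simplifying assumption, and the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Byte offsets into the 512-byte EXT_CSD register.
 * Assumption: values as recalled from the Linux kernel headers; verify
 * against the JEDEC 4.41 specification. */
#define EXT_CSD_REV             192
#define EXT_CSD_REL_WR_SEC_C    222
#define EXT_CSD_BKOPS_SUPPORT   502
#define EXT_CSD_HPI_FEATURES    503

/* Hypothetical summary of the features this blueprint depends on. */
struct emmc_features {
    int rev;                /* raw EXT_CSD revision field */
    int bkops;              /* background operations supported */
    int hpi;                /* high priority interrupt supported */
    int reliable_write;     /* reliable write sector count is non-zero */
};

struct emmc_features parse_ext_csd(const uint8_t ext_csd[512])
{
    struct emmc_features f;

    f.rev = ext_csd[EXT_CSD_REV];
    f.bkops = (ext_csd[EXT_CSD_BKOPS_SUPPORT] & 0x1) != 0;
    f.hpi = (ext_csd[EXT_CSD_HPI_FEATURES] & 0x1) != 0;
    f.reliable_write = ext_csd[EXT_CSD_REL_WR_SEC_C] != 0;
    return f;
}

/* Stand-in for the debug print at init time. */
void print_features(const struct emmc_features *f)
{
    printf("ext_csd rev %d: bkops=%d hpi=%d reliable_write=%d\n",
           f->rev, f->bkops, f->hpi, f->reliable_write);
}
```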

1. Write performance drop

  • Read up on the current algorithm for MMC maintenance for different FS.
  • Read up on the current BG-ops patches on the mmc list

  • Sketch a simple proof-of-concept implementation that defers erase and runs BG jobs periodically.
    • Implement
    • Make it work
    • Experiment using different erase techniques such as secure trim, trim and erase as background operations.
    • Add performance test to mmc_test
    • Add debug code in mmc-framework to monitor/gather statistics from BKOPS_STATUS EXT_CSD.
  • Make a proper design
    • Get the design reviewed
    • Update design after review
    • break up implementation in parts, so that each piece can be sent to mainline in increments
    • break down into work items
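
The deferred-erase idea in the proof of concept can be sketched as a small queue that is drained only when the card is idle. Everything here is hypothetical: the names, the fixed queue size and the idle flag are assumptions for illustration, and the counter update stands in for actually issuing a trim or erase command.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEFERRED 16     /* arbitrary size chosen for this sketch */

struct erase_range {
    uint64_t start;         /* first sector */
    uint32_t count;         /* number of sectors */
};

struct deferred_erase {
    struct erase_range pending[MAX_DEFERRED];
    size_t n_pending;
    uint64_t sectors_erased;    /* simple statistic */
};

/* Queue a range instead of erasing it immediately. Returns 0 on success
 * or -1 if the queue is full, in which case the caller would have to
 * erase in the foreground. */
int defer_erase(struct deferred_erase *d, uint64_t start, uint32_t count)
{
    if (d->n_pending == MAX_DEFERRED)
        return -1;
    d->pending[d->n_pending].start = start;
    d->pending[d->n_pending].count = count;
    d->n_pending++;
    return 0;
}

/* Called periodically. Does nothing while the card is busy, otherwise
 * drains the queue. Returns the number of ranges processed. */
size_t run_background(struct deferred_erase *d, int card_idle)
{
    size_t done;

    if (!card_idle)
        return 0;
    for (size_t i = 0; i < d->n_pending; i++)
        d->sectors_erased += d->pending[i].count;   /* stand-in for trim/erase */
    done = d->n_pending;
    d->n_pending = 0;
    return done;
}
```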

2. double buffer for DMA

  • Add double buffering for DMA_MODE. Let the MMC framework pass down jobs in pairs (if two or more jobs are available in the queue):
    • current-job and next-job. This gives the host driver a chance to prepare the next job while DMA executes the current job. Preparing the next job is optional.
  • core/block.c rw_rq() converts the generic block request into an MMC request. The same function prepares the MMC request, sends it to the host driver and checks the result. This functionality needs to be split in order to pass two MMC requests to the host driver.
  • The MMC request data is stack-local in rw_rq(). This cannot work if the next MMC request must remain valid after rw_rq() has returned.
  • The host driver needs to handle preparation of the next request.
  • Make a proper design
    • Get the design reviewed
    • Update design after review
    • break up implementation in parts, so that each piece can be sent to mainline in increments
    • break down into work items
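
One way to picture the current-job/next-job pairing is a context with two request slots that outlive any single function call, with the roles swapping on completion. The sketch below is an assumption about how this could be structured, not the framework's actual design, and all names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for a prepared MMC request. */
struct mmc_req_slot {
    int id;         /* which job occupies the slot */
    int prepared;   /* 1 once the slot is ready for the host driver */
};

/* The slots live in the queue context rather than on the stack, so the
 * next request stays valid after the issuing function returns. */
struct mmc_queue_ctx {
    struct mmc_req_slot slot[2];
    int cur;        /* index of the slot currently owned by DMA */
};

/* Prepare the next job in the idle slot while the current one is in flight. */
struct mmc_req_slot *prepare_next(struct mmc_queue_ctx *q, int job_id)
{
    struct mmc_req_slot *next = &q->slot[q->cur ^ 1];

    next->id = job_id;
    next->prepared = 1;
    return next;
}

/* Called when the current transfer completes. The prepared slot becomes
 * the current one. Returns it, or NULL if nothing was prepared. */
struct mmc_req_slot *complete_and_swap(struct mmc_queue_ctx *q)
{
    struct mmc_req_slot *next = &q->slot[q->cur ^ 1];

    q->slot[q->cur].prepared = 0;   /* release the finished slot */
    if (!next->prepared)
        return NULL;
    q->cur ^= 1;
    return next;
}
```

Returning NULL when nothing is prepared keeps preparation of the next job optional: the framework can then fall back to the existing one-job-at-a-time path.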

3. Prevent small MMC transfers

  • Read up on the IO-scheduler
  • Implement test cases to show how performance depends on block size.
  • Measure performance for different IO-sched configurations
  • Implement probes to monitor statistics in block level queue
    • number of request in the queue
    • number of consecutive read/write request
    • size of reads and writes
    • Run tests to show what is causing non-optimal block level request queue status
  • Decide if it's possible to improve the performance in this matter

4. Priority

  • Read up on the current HPI patches on the mmc list

  • Implement a priority member in the block request struct
  • Implement a proof of concept in the MMC framework to handle the priority flag
  • Implement test cases in mmc_test.c that utilize the priority flag
  • Make a proper design
    • Get the design reviewed
    • Update design after review
    • break up implementation in parts, so that each piece can be sent to mainline in increments
    • break down into work items

5. Reliable eMMC

  • Reliable write has special requirements on alignment, investigate if this is a problem.
  • Add configuration in the MMC-framework to enable reliable write for all writes
  • Compare the throughput cost of normal write vs reliable write.
  • Implement and set up power crash tests
  • Compare the robustness of normal write vs reliable write.
  • At block level, metadata is indicated by cmd_flags & WRITE_META. Confirm that this flag is set and used by the FS.

  • Make a proper design
    • Get the design reviewed
    • Update design after review
    • break up implementation in parts, so that each piece can be sent to mainline in increments
    • break down into work items

6. DDR_MODE eMMC

  • Implement or find a host driver with DDR_MODE support
  • Run tests to compare performance with DDR_MODE enabled and disabled

7. FS for eMMC

  • Read up on FS/MMC-host-driver configuration that concerns MMC.
    • Run appropriate test from benchmarks to show how configurations affect performance

Migration

  • Update any valid docs under Documentation
  • Verify that all new features are covered by mmc_test.c

Test/Demo Plan

Deliverables

The mainline mmc framework

  • MMC framework support for background operations and trim (patches already exist).
  • MMC framework support for HPI: POSTPONED
  • MMC framework support for reliable write on WRITE_META: POSTPONED
  • MMC framework support double buffering
  • MMC host driver supporting all of DDR_MODE, double buffering, reliable write, HPI and BG ops

Papers

  • Investigate use cases and the potential benefit of block request priority: POSTPONED
  • Investigate the IO-scheduler and how to concatenate block queue requests even more, bundling multiple requests into one: POSTPONED
  • Investigate whether BIDI, as used in SCSI, would make sense in the MMC framework.
    • Conclusion in brief: BIDI adds support for sending a chain of read and write commands to the controller. This isn't compatible with the MMC stack, where there is no pipelining of commands. Commands are simply processed one by one.
  • Investigate potential areas of improvements on FS-level for MMC in particular: DUPLICATE
  • File system configurations and comparison on eMMC (fs-compare)

Tests

The test scope of this blueprint is restricted to verify what is implemented or to identify potential bottleneck and areas for improvement. A more general overall testing of the storage performance is covered by benchmarks.

  • Crash testing reliable write
  • Performance test for reliable write
  • Performance test showing benefit of double buffering in the mmc framework.
  • Performance test showing the increase in memory bandwidth over time for a filesystem with trim vs without trim
  • Performance test showing the increase in memory bandwidth using background maintenance vs foreground
  • Performance test showing the increase in memory bandwidth using DDR_MODE
  • Enable all features and run all the tests.
    • Investigate the result in order to tell if any of the improvements conflict (one improvement decreases performance for another).

Unresolved issues

BoF agenda and discussion

Investigate eMMC (version 4.4 standard) "trim" support in Linux kernel and ARM SoC. Fill out implementation and fix bugs as needed. Add support for background operations (such as internal-maintenance operations including erase) to drivers and to filesystems. Add support for reliable write (NAND Flash not reliable in the rotating-mass-storage sense). Investigate adding support for high priority interrupt (requires Linux vfs enhancements). High priority interrupt allows a low-priority large read/write to be interrupted to make way for a high-priority operation.


WorkingGroups/KernelArchived/Specs/StoragePerfEMMC (last modified 2013-01-14 19:46:07)