The goal of this task is to improve the performance of USB gadget mass storage.
- 1. Separate data/control USB buffering and optimize the USB buffer types.
- 2. Discussions at LDS favoured non-copy: use the VFS buffers directly and skip the internal USB buffers.
- 3. The final implementation turned out to be much simpler: increase the number of pipeline buffers.
1. Separate ctrl from data (NOT IMPLEMENTED)
Use separate buffers for control commands and data. Commands are typically handled by CPU access, not DMA, whereas data is well suited for DMA.
Avoid any cache operations
Improve performance by using uncached buffers when DMA is used. If command buffers are accessed only by the CPU, they should be cached, but there is no need to clean or invalidate them.
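The buffer rule above can be modelled in a few lines. The sketch below is a user-space C model, not kernel code: `struct buf_desc`, `make_cmd_buf()`, `make_data_buf()` and `needs_cache_maintenance()` are hypothetical names, and the `dma_supported` flag stands in for information that g_mass_storage does not yet know how to obtain. It encodes the rule from the text: cache clean/invalidate is only needed when a cached buffer is shared between the CPU and DMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer's memory type and who accesses it. */
enum buf_mem { BUF_CACHED, BUF_UNCACHED };

struct buf_desc {
	size_t size;
	enum buf_mem mem;
	bool cpu_access; /* touched by the CPU (command parsing, VFS copy) */
	bool dma_access; /* touched by the controller's DMA engine */
};

/* Clean/invalidate is only required for a cached buffer shared by CPU and DMA. */
static bool needs_cache_maintenance(const struct buf_desc *b)
{
	return b->mem == BUF_CACHED && b->cpu_access && b->dma_access;
}

/* Command buffers: CPU only, so cached and never cleaned or invalidated. */
static struct buf_desc make_cmd_buf(size_t size)
{
	struct buf_desc b = { size, BUF_CACHED, true, false };

	assert(size > 0);
	return b;
}

/* Data buffers: uncached when DMA is used, plain cached memory for PIO. */
static struct buf_desc make_data_buf(size_t size, bool dma_supported)
{
	struct buf_desc b;

	assert(size > 0);
	b.size = size;
	b.mem = dma_supported ? BUF_UNCACHED : BUF_CACHED;
	b.cpu_access = true; /* VFS data is copied in/out by the CPU */
	b.dma_access = dma_supported;
	return b;
}
```

With DMA, neither the command buffer nor the uncached data buffer needs cache maintenance, whereas today's single cached 16K buffer used for both (`{16384, BUF_CACHED, true, true}`) does. In a kernel implementation the uncached data buffer would presumably come from the coherent DMA allocator (e.g. `dma_alloc_coherent()`); the sketch only captures the decision, not the allocation.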
2. non-copy (NOT IMPLEMENTED)
This may still be relevant, but it would only reduce CPU time and memory usage, not improve the actual throughput. All copying and preparation of USB requests is done in parallel with an ongoing USB transfer. As long as copying and preparation take less time than the USB transfer, this is fine.
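The argument can be checked with a simple two-stage pipeline model. It assumes equal-sized buffers, a fixed per-buffer preparation time `prep` (copy plus request setup), a fixed USB transfer time `xfer`, and at least two buffers so that preparing the next buffer overlaps the current transfer. `serial_time()` and `pipelined_time()` are illustrative helpers, and the time units are arbitrary.

```c
#include <assert.h>

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* No overlap: every buffer is prepared, then transferred. */
static unsigned long serial_time(unsigned long n, unsigned long prep,
				 unsigned long xfer)
{
	return n * (prep + xfer);
}

/*
 * Two-stage pipeline: preparation of buffer k+1 overlaps the transfer of
 * buffer k, so after the first buffer the slower stage sets the pace.
 */
static unsigned long pipelined_time(unsigned long n, unsigned long prep,
				    unsigned long xfer)
{
	assert(n > 0);
	return prep + xfer + (n - 1) * max_ul(prep, xfer);
}
```

With prep = 3 and xfer = 10, moving 100 buffers takes 1300 units serially and 1003 pipelined; removing the copy entirely (prep = 0) only brings this down to 1000. The copy only starts to limit throughput once prep exceeds xfer: with prep = 12 the pipelined total rises to 1210.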
3. Increase number of pipeline buffers
Design proposals #1 and #2 both try to optimize for the case where a USB request is prepared while no USB transfer is running in parallel. This happens when a new request comes in and no transfer is ongoing. During a data transfer over gadget mass storage this should ideally happen only once, for the very first request. If so, there is no need to optimize for the single-request scenario. With only 2 pipeline buffers, however, it can happen, and it can happen often: the USB transfer completes both buffers before VFS has filled in new data. Increasing the number of buffers is enough to compensate for a bursty VFS and eliminate all "single requests", i.e. requests that run serially with the USB transfer.
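A toy simulation illustrates why more buffers help with a bursty VFS. It is a hypothetical model, not the driver's logic: VFS wakes up once every `period` ticks and fills every empty buffer, the USB controller sends one full buffer per tick, and the illustrative helper `count_idle_restarts()` counts how often a transfer has to start from an idle bus, i.e. the "single requests" described above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy pipeline model: nbufs buffers, all initially empty.
 * Each tick: if VFS is awake (every `period` ticks) it fills all empty
 * buffers; then USB sends one full buffer, if any, which becomes empty.
 * Returns the number of transfers that started from an idle bus.
 */
static unsigned count_idle_restarts(unsigned nbufs, unsigned period,
				    unsigned ticks)
{
	unsigned full = 0, empty = nbufs, restarts = 0;
	bool busy = false; /* did USB transfer during the previous tick? */

	assert(nbufs > 0 && period > 0);
	for (unsigned t = 0; t < ticks; t++) {
		if (t % period == 0) { /* bursty VFS fills every free buffer */
			full += empty;
			empty = 0;
		}
		if (full > 0) { /* USB sends one buffer this tick */
			if (!busy)
				restarts++;
			full--;
			empty++;
			busy = true;
		} else {
			busy = false; /* bus idle: next transfer is a "single request" */
		}
	}
	return restarts;
}
```

With VFS waking every 4 ticks, a 16-tick run gives 4 restarts for 2 or 3 buffers but only 1 (the unavoidable first request) for 4 or more, because the buffers then cover the gap between VFS bursts.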
2. Optimize pipeline buffer types (NOT IMPLEMENTED)
- Hack the buffer handling to separate ctrl and data, and avoid cache operations.
- Run performance tests to validate whether this implementation should continue, and publish the results here.
- Questions to answer in order to settle on the proper implementation:
- What buffer size should be used for commands?
- Is it feasible to allocate 2 x 16K of dma_coherent memory, or is this considered too much?
- USB mass storage needs to know whether DMA is supported in order to decide whether to use uncached buffers. How can g_mass_storage retrieve this information?
3. Increase number of pipeline buffers
The current patch sets the number of pipeline buffers at compile time.
- Ideally, it should be possible to set this at runtime.
- The current implementation adds support for changing the number in Kconfig and, if CONFIG_USB_GADGET_DEBUG_FILES is defined, through a module parameter. The optimal number can be found by testing manually which value gives the best result, e.g. by loading g_mass_storage with num_buffers=X and running the test for different values of X. Such automated test cases are out of scope for this BP.
- The implementation of a variable number of pipeline buffers is queued for 3.2.
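Setting the number at runtime mainly means the buffer heads can no longer live in a fixed-size array. One possible approach is sketched below in user-space C: `struct buffhd`, `alloc_ring()`, `free_ring()` and the bounds `NUM_BUFFERS_MIN`/`NUM_BUFFERS_MAX` are hypothetical, `calloc()`/`malloc()` stand in for kernel allocators, and the count `n` is what a `num_buffers` module parameter would supply.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bounds; the real limits are whatever Kconfig/driver enforce. */
#define NUM_BUFFERS_MIN 2u
#define NUM_BUFFERS_MAX 32u

struct buffhd { /* simplified stand-in for a pipeline buffer head */
	void *buf;
	struct buffhd *next;
};

static int num_buffers_valid(unsigned n)
{
	return n >= NUM_BUFFERS_MIN && n <= NUM_BUFFERS_MAX;
}

/* Allocate n buffer heads at runtime and link them into a ring. */
static struct buffhd *alloc_ring(unsigned n, size_t bufsize)
{
	struct buffhd *bh;

	if (!num_buffers_valid(n))
		return NULL;
	bh = calloc(n, sizeof(*bh));
	if (!bh)
		return NULL;
	for (unsigned i = 0; i < n; i++) {
		bh[i].buf = malloc(bufsize);
		if (!bh[i].buf) {
			while (i--) /* undo the buffers allocated so far */
				free(bh[i].buf);
			free(bh);
			return NULL;
		}
		bh[i].next = &bh[(i + 1) % n]; /* last head points back to first */
	}
	return bh;
}

static void free_ring(struct buffhd *bh, unsigned n)
{
	assert(num_buffers_valid(n));
	for (unsigned i = 0; i < n; i++)
		free(bh[i].buf);
	free(bh);
}
```

Validating the count before allocating keeps a bad module parameter from producing a ring too small to pipeline at all; 2 is the minimum for any overlap between preparation and transfer.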
BoF agenda and discussion
LDS USB presentation notes:
USB Mass Storage optimization
- 16K buffers used for CMD and DATA.
- Buffer DMA mapped and later used only for CMD and PIO
- When using DMA, buffer does not benefit from cache:
- VFS writes to buffer, DMA copies, CPU never needs to touch it in between
- TODO: Separate buffers for CMD and DATA.
- Cached CMD buffers
- Uncached DMA buffers
Prototype -> publish results -> propose changes
- Other proposals:
- zero copy: Access VFS buffers directly and skip copy to/from USB buffers?
Use case: USB gadget mode in which ARM device is a target to a PC.
- Look at zero copy first
- If that does not help, then look at DMA/CMD buffer split
WorkingGroups/KernelArchived/Specs/usb-gadget-mass-storage-optimization (last modified 2013-01-14 19:45:02)