Ahh, never mind, I hadn't read the documentation. I added bs=16M and got a much better result: ~973 MB/s.
By default it copies in 512-byte chunks, which is kind of wild. I'll see if I can make progress on the stream import.
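For a sense of scale, here is a quick back-of-the-envelope sketch of how many chunks each block size needs for this image. The size comes from the osize= value in the commands in this post; I'm assuming bs=16M means 16 MiB, and the variable names are just mine for illustration.

```shell
#!/usr/bin/env bash
# Image size in bytes, taken from the osize= value used in the commands.
size=32212254720

# Block sizes to compare: 512 bytes (the default) and 16 MiB
# (assumption: bs=16M is interpreted as 16 * 1024 * 1024 bytes).
bs_default=512
bs_large=$((16 * 1024 * 1024))

# Number of chunks needed to cover the whole image at each block size.
chunks_default=$((size / bs_default))
chunks_large=$((size / bs_large))

echo "512 B chunks:  $chunks_default"
echo "16 MiB chunks: $chunks_large"
echo "ratio:         $((chunks_default / chunks_large))x"
```

So the default needs tens of millions of tiny read/write round trips where the larger block size needs a couple of thousand.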
Edit:
Default:
Command: qemu-img dd -f raw -O raw osize=32212254720 if=/root/test-netcat-plaindd.raw of=/root/test-local-qemu-dd.raw
Timing
- Total Elapsed Time: 12 minutes 22.28 seconds
- User CPU Time: 283.83 seconds
- System CPU Time: 467.52 seconds
- Total CPU Time: 751.35 seconds
- CPU Usage: 101% (slightly over one CPU core)
Performance
- Data Transferred: ~32.2 GB
- Throughput: ~43.4 MB/s
- Maximum Memory Used: ~32 MB
I/O Statistics
- File System Reads: 18,259,176 blocks (512-byte units, ~9.3 GB)
- File System Writes: 62,945,008 blocks (512-byte units, ~32.2 GB)
- Context Switches: 251,830,712 voluntary / 14,882 involuntary
- Major Page Faults: 0 (no page fault required I/O)
Command: qemu-img dd -f raw -O raw bs=16M osize=32212254720 if=/root/test-netcat-plaindd.raw of=/root/test-local-qemu-dd.raw
Timing
- Total Elapsed Time: 33.09 seconds
- User CPU Time: 0.06 seconds
- System CPU Time: 23.81 seconds
- Total CPU Time: 23.87 seconds
- CPU Usage: 72%
Performance
- Data Transferred: ~32.2 GB
- Throughput: ~973 MB/s
- Maximum Memory Used: ~47 MB
I/O Statistics
- File System Reads: 16,802,016 blocks (512-byte units, ~8.6 GB)
- File System Writes: 63,029,640 blocks (512-byte units, ~32.3 GB)
- Context Switches: 12,759 voluntary / 632 involuntary
- Major Page Faults: 19 (vs 0 before)
Comparison: Default vs bs=16M
So, you know, only ~19,700x fewer voluntary context switches. Lol.
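If anyone wants to check the math, here is a small sketch that recomputes the headline ratios from the numbers in the two runs above. Variable names are mine, and it assumes awk is available for the floating-point bits.

```shell
#!/usr/bin/env bash
# Force the C locale so awk prints a dot as the decimal separator.
export LC_ALL=C

# Figures copied from the two runs above.
size=32212254720          # bytes transferred (osize=)
elapsed_default=742.28    # 12 minutes 22.28 seconds
elapsed_large=33.09       # seconds, with bs=16M
ctx_default=251830712     # voluntary context switches, default
ctx_large=12759           # voluntary context switches, bs=16M

# Integer ratio of voluntary context switches.
ctx_ratio=$((ctx_default / ctx_large))

# Wall-clock speedup and throughput in MB/s (1 MB = 10^6 bytes).
speedup=$(awk -v a="$elapsed_default" -v b="$elapsed_large" 'BEGIN { printf "%.1f", a / b }')
mbps_default=$(awk -v s="$size" -v t="$elapsed_default" 'BEGIN { printf "%.1f", s / t / 1e6 }')
mbps_large=$(awk -v s="$size" -v t="$elapsed_large" 'BEGIN { printf "%.1f", s / t / 1e6 }')

echo "context switch ratio: ${ctx_ratio}x"
echo "wall-clock speedup:   ${speedup}x"
echo "throughput:           ${mbps_default} MB/s -> ${mbps_large} MB/s"
```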