RTFM

[Read This Fine Material] from Joshua Hoblitt

First look at the Intel DC S3700 series of SATA interface SSDs


Intel has been shipping the DC S3700 series fairly quietly for a couple of months now. I suppose “DC” stands for Data Center? Like the 710 series, this new series is high write endurance MLC based flash that’s designed to compete against high cost/high performance/high write life SLC devices. The spec sheet says it’s still 25nm MLC, but it must be either a new flash cell design or a new controller (or both), as the performance blows the previous 710 series away. The interface has been updated from 3G SATA to 6G SATA (SATA 3), which is obviously needed now that the read & write specs exceed what 3G SATA (SATA 2) can deliver. I also assume that the DC S3700s retain the 710’s internal battery (or is it a capacitor?) backed cache; the spec sheet just says "Enhanced power-loss data protection".
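
As a back-of-the-envelope check on the interface claim (assuming the usual 8b/10b encoding on SATA and ignoring protocol overhead):

# line rate (Mbit/s) * 8/10 (8b/10b encoding) / 8 bits per byte = rough max payload MB/s
echo "3G SATA: ~$(( 3000 * 8 / 10 / 8 )) MB/s"   # ~300 MB/s, well under the 500 MB/s read spec
echo "6G SATA: ~$(( 6000 * 8 / 10 / 8 )) MB/s"   # ~600 MB/s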

Per-capacity write endurance numbers aren’t given; what the spec sheet has is "10 drive writes/day over 5 years while running JESD218 standard". I assume that means write endurance is at least 10 x 365 x 5 x capacity, or endurance = 18,250 x drive capacity. However, it’s not clear what size those writes are, as I don’t have access to a copy of that standard (and write amplification from write size vs. erase block size has a massive impact on cell life). If those are 4K random writes, this works out to about 0.2x the endurance of the best SLC devices, which would be impressive. It could also be more or less the same write life per cell as the 710 series, just measured against a different test standard, which would not be impressive.
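
For what it’s worth, here’s that arithmetic worked out in shell, treating a "drive write" as one full-capacity write (which may or may not match what JESD218 actually measures), plus the ratio against the SSD400S’s 9 PB per 100GB figure from the table below:

# 10 drive writes/day * 365 days * 5 years = 18,250 full-capacity writes
for cap_gb in 100 200 400 800; do
    echo "${cap_gb}GB: $(( 10 * 365 * 5 * cap_gb / 1000 )) TB written"
done
# 100GB DC S3700 endurance vs. 9 PB per 100GB of SLC
echo "scale=2; 1.825 / 9" | bc    # ~.20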

Is the fact that a new generation device is still 25nm based a sign that Intel has a gross excess of 25nm fab capacity, or a sign that there is trouble with flash cell reliability at smaller feature sizes? I suspect a more likely possibility is that this is the exact same 25nm high endurance MLC flash used in the 710 series, paired with a better performing SandForce based controller like the one used in the Intel 520 & 330 series “regular” MLC devices.

While the 710 series was/is available as 100, 200, & 300GB 2.5″ SATA drives, the new DC S3700s are available in 100, 200, 400, & 800GB 2.5″ SATA and 200 & 400GB 1.8″ SATA. The 1.8″ SATA drives are probably aimed at the blade server market. Pricing on the new series is also presently lower. The 100GB DC S3700s started shipping at ~$250 while the 100GB 710s were still hovering at a little over $400. It looks like the price on the 710s is starting to drop quickly though, as today I can find the 100GB 710 on Amazon for ~$300. Compared to what I would consider a best of breed SLC device, the Hitachi Ultrastar SSD400S, the pricing is incredible. The Hitachi devices cost OVER $1,500 for the 100GB version. At that level of cost delta, the DC S3700s would still be cost effective with as little as 0.166x the write life. However, note that the Hitachi SLC devices are SAS while all of the Intel MLC devices are SATA. SATA will be a deal breaker for dual SAS expander setups. It’s curious that Intel is not shipping this series with at least a SAS interface option.
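
To be clear, that 0.166x figure is nothing more than the street-price ratio of the two 100GB drives quoted above:

# ~$250 (100GB DC S3700) / ~$1,500 (100GB Ultrastar SSD400S)
echo "scale=3; 250 / 1500" | bc    # ~.166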

At the rate flash performance is increasing, 12G SAS/SATA 4 controllers had better start shipping in volume soon, or I expect the popularity of PCIe flash solutions to start increasing.

DC S3700 vs. 710 vs. SSD400S specs
Drive | Read MB/s | Write MB/s | Random Read 4K IOPS | Random Write 4K IOPS | Write Endurance
710 – 100GB | 270 | 170 | 38,500 | 2,300 | 500 TB (4K random)
710 – 200GB & 300GB | 270 | 210 | 38,500 | 2,700 | 1.0 & 1.1 PB (4K random)
DC S3700 – 100GB | 500 | 200 | 75,000 | 19,000 | 1.825 PB (JESD218)
DC S3700 – 200GB | 500 | 365 | 75,000 | 32,000 | 3.65 PB (JESD218)
DC S3700 – 400GB & 800GB | 500 | 460 | 75,000 | 36,000 | 7.3 & 14.6 PB (JESD218)
Ultrastar SSD400S, SLC, 6G SAS (all capacities?) | 516 | 458 | 41,000 | 21,000 | 9 PB per 100GB of capacity (spec says random; assume 4K)

I did a fast, unscientific test with dbench 4.0 in O_SYNC mode with 24 threads. No attempt was made to tune the Linux block I/O settings or the ext4 filesystem used. The system I tested on also had an active load, so I’m not going to post the test system setup. Even so, this is by far the most impressive single-device dbench/O_SYNC result I’ve seen to date. I’ll throw in the test system’s boot device, a 7200RPM “near line” SATA drive, for rough comparison against spinning media.

[root@foonsd2 ~]# lsscsi --verbose 0:0:1:0
[0:0:1:0]    disk    ATA      INTEL SSDSC2BA10 5DV1  /dev/sdd 
  dir: /sys/bus/scsi/devices/0:0:1:0  [/sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:1/0:0:1:0]
[root@foonsd2 ~]# function stdpart {
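>     # Create a GPT label on $1 with a single partition whose start and end
>     # are aligned to SECTOR_ALIGNMENT 512B sectors (8192 sectors = 4MiB here).
>     # The partition is named $2, or <hostname>.pv00 if no name is given.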
>     DATA_DEV=$1
>     PART_NAME=$2
> 
>     PARTED=/sbin/parted
>     SECTOR_ALIGNMENT=8192
> 
>     $PARTED -s /dev/${DATA_DEV} mklabel gpt
> 
>     DATA_TOTAL_SECTORS=`$PARTED -s /dev/${DATA_DEV} unit s print free | grep Disk | awk '{print $3}' | cut -d 's' -f1`
> 
>     if [ $SECTOR_ALIGNMENT -lt 128 ];
>     then
>       DATA_LVM_START=128
>     else
>       DATA_LVM_START=$SECTOR_ALIGNMENT
>     fi
> 
>     DATA_LVM_END=$(((($DATA_TOTAL_SECTORS - 1) / $SECTOR_ALIGNMENT) * $SECTOR_ALIGNMENT - 1))
> 
>     if [ -z $PART_NAME ]
>     then
>         PART_NAME=`hostname -s`.pv00
>     fi
> 
>     $PARTED -s /dev/${DATA_DEV} unit s mkpart primary $DATA_LVM_START $DATA_LVM_END
>     $PARTED -s /dev/${DATA_DEV} unit s name 1 $PART_NAME
> 
>     $PARTED -s /dev/${DATA_DEV} unit s print
> }
[root@foonsd2 ~]# stdpart sdd test
Model: ATA INTEL SSDSC2BA10 (scsi)
Disk /dev/sdd: 195371568s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End         Size        File system  Name  Flags
 1      8192s  195371007s  195362816s               test

[root@foonsd2 ~]# mkfs.ext4 -j /dev/sdd1 
mke2fs 1.41.12 (17-May-2010)
Discarding device blocks: done                            
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
6111232 inodes, 24420352 blocks
1221017 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
746 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@foonsd2 ~]# yum install -y dbench

...

[root@foonsd2 ~]# dbench --directory=/mnt/tmp --sync -t 60 24
dbench version 4.00 - Copyright Andrew Tridgell 1999-2004

Running for 60 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 12 secs
21 of 24 processes prepared for launch   0 sec
24 of 24 processes prepared for launch   0 sec
releasing clients
  24       106   118.40 MB/sec  warmup   1 sec  latency 48.212 ms
  24       258   116.81 MB/sec  warmup   2 sec  latency 32.805 ms
  24       413   116.62 MB/sec  warmup   3 sec  latency 33.279 ms
  24       566   116.21 MB/sec  warmup   4 sec  latency 64.177 ms
  24      1377   120.19 MB/sec  warmup   5 sec  latency 60.370 ms
  24      2786   125.97 MB/sec  warmup   6 sec  latency 79.440 ms
  24      4169   137.88 MB/sec  warmup   7 sec  latency 42.509 ms
  24      5592   137.74 MB/sec  warmup   8 sec  latency 92.960 ms
  24      6412   136.19 MB/sec  warmup   9 sec  latency 39.724 ms
  24      7569   141.34 MB/sec  warmup  10 sec  latency 97.209 ms
  24      8668   138.89 MB/sec  warmup  11 sec  latency 125.324 ms

...

  24  cleanup  60 sec
   0  cleanup  60 sec

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX     337461     0.192    48.156
 Close         247608     0.002     0.375
 Rename         14258     1.453    39.979
 Unlink         68245     0.688    38.411
 Qpathinfo     305413     0.050    34.690
 Qfileinfo      53101     0.001     0.858
 Qfsinfo        56038     0.005     1.028
 Sfileinfo      27378     1.446    36.467
 Find          117983     0.027    15.587
 WriteX        166238     6.908   203.554
 ReadX         527817     0.122    47.793
 LockX           1086     0.003     0.079
 UnlockX         1086     0.001     0.077
 Flush          23633     1.362    68.226

Throughput 174.88 MB/sec (sync open)  24 clients  24 procs  max_latency=203.559 ms
[root@foonsd2 ~]# umount /mnt/tmp/
[root@foonsd2 ~]# dd if=/dev/zero of=/dev/sdd count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00114995 s, 445 kB/s


[root@foonsd2 ~]# lsscsi --verbose 0:0:0:0
[0:0:0:0]    disk    ATA      Hitachi HUA72201 JP4O  /dev/sda 
  dir: /sys/bus/scsi/devices/0:0:0:0  [/sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0]
[root@foonsd2 ~]# mkdir /tmp/dbench
[root@foonsd2 ~]# dbench --directory=/tmp/dbench/ --sync -t 60 24
dbench version 4.00 - Copyright Andrew Tridgell 1999-2004

Running for 60 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 12 secs
22 of 24 processes prepared for launch   0 sec
24 of 24 processes prepared for launch   0 sec
releasing clients
  24        20    10.68 MB/sec  warmup   1 sec  latency 276.131 ms
  24        28    11.11 MB/sec  warmup   2 sec  latency 263.714 ms
  24        39    11.40 MB/sec  warmup   3 sec  latency 247.485 ms
  24        48    11.79 MB/sec  warmup   4 sec  latency 237.449 ms
  24        58    11.62 MB/sec  warmup   5 sec  latency 258.607 ms
  24        66    11.62 MB/sec  warmup   6 sec  latency 260.589 ms
  24        76    11.45 MB/sec  warmup   7 sec  latency 255.196 ms
  24        84    11.46 MB/sec  warmup   8 sec  latency 283.232 ms
  24        94    11.37 MB/sec  warmup   9 sec  latency 274.574 ms
  24       101    11.31 MB/sec  warmup  10 sec  latency 290.769 ms
  24       113    11.34 MB/sec  warmup  11 sec  latency 237.590 ms

...

  24  cleanup  60 sec
   0  cleanup  60 sec

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX       9244     6.757   221.715
 Close           8137     0.002     0.109
 Rename           304     8.485    97.074
 Unlink           787     6.380   122.212
 Qpathinfo       7544     0.727   159.491
 Qfileinfo       2861     0.001     0.007
 Qfsinfo          923     0.005     0.098
 Sfileinfo        979    11.716   157.608
 Find            2517     0.129   107.114
 WriteX         11368   116.134   519.096
 ReadX           9093     2.259   129.707
 Flush            767    18.829   243.631

Throughput 9.02044 MB/sec (sync open)  24 clients  24 procs  max_latency=519.101 ms
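
For a rough sense of scale, the ratio of the two sync-open throughput numbers:

# DC S3700 vs. the 7200RPM near line boot drive, sync open dbench throughput
echo "scale=1; 174.88 / 9.02" | bc    # ~19.3x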
