Content-MD5 considered helpful

Kind of an interesting thread going on the Amazon Web Services Forum, about data corruption on S3. It highlights how important it is for clients to send something like the Content-MD5 HTTP header to checksum the HTTP payload, and for the server to check it before saying 200 OK back…at least for data storage REST applications:

When Amazon S3 receives a PUT request with the Content-MD5 header, Amazon S3 computes the MD5 of the object received and returns a 400 error if it doesn’t match the MD5 sent in the header. Looking at our service logs from the period between 6/20 11:54pm PDT and 6/22 5:12am PDT, we do see a modest increase in the number of 400 errors. This may indicate that there were elevated network transmission errors somewhere between the customer and Amazon S3.

Some customers are claiming that the md5 checksums coming back from s3 are different than the ones for the content that was originally sent there. Perhaps the clients ignored the 400? Or maybe there is data corruption elsewhere. It’ll be interesting to follow the thread.