About a month ago, I had a problem. I needed to know how much data transfer was being used by each of my buckets in my Amazon S3 setup. I host both SignalLeaf and WatchMeCode files / episodes in S3 buckets, and I needed to know how much of the $1,000/mo bill I was paying was coming from which service, which SignalLeaf customer, and which WatchMeCode episode.
After some digging, I found out that Amazon doesn’t tell you these things on its own. This is something that I just don’t understand… why wouldn’t they tell me how much data transfer / cost is coming out of each bucket, on their own? Fortunately they do provide logging – logging that would tell me how much data transfer is coming out of a given bucket, a given folder, or a given file. This would allow me to figure out the costs of each service, customer and episode as needed. You just have to enable logging on your bucket, and then find a logging service to crunch the numbers and give you the details that you need.
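For reference, here’s roughly what enabling that server access logging looks like with boto3 – a minimal sketch, with made-up bucket names; the target bucket also needs permissions for S3’s log delivery (check the AWS docs for your setup):

```python
# Sketch: enable server access logging on an S3 bucket via boto3.
# Bucket names here are hypothetical placeholders.

def build_logging_status(target_bucket, target_prefix):
    """Build the BucketLoggingStatus payload for put_bucket_logging."""
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,  # bucket that receives the log files
            "TargetPrefix": target_prefix,  # key prefix for each log object
        }
    }

status = build_logging_status("my-logs-bucket", "signalleaf-logs/")

# With AWS credentials configured, the actual call would be:
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_logging(Bucket="my-podcast-bucket", BucketLoggingStatus=status)
print(status)
```

Once the logs start landing in the target bucket, a service like the one below can pick them up and do the number crunching.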
S3Stat.com To The Rescue
After doing some looking around, I found S3Stat.com – it’s a simple service that takes your log files from S3 and crunches the data into some nice reports. It also does CloudFront stats, if you’re using CloudFront (I’m not at the moment, but may be soon).
One of the things I like about S3Stat is that it lets me have multiple buckets configured, and shows me stats for each bucket. I enabled logging and set up the S3Stat user for each bucket, and a day later saw this on my dashboard, telling me that I had some reporting data to look at.
This is the first thing that I needed for my cost analysis… a separation of data transfer between these buckets. But what I didn’t expect was the difference that I saw between these two buckets.
Data Transfer: WatchMeCode
I’ve been running S3Stat for about 20 days now, and the WatchMeCode reporting showed a much smaller number than I expected. I’ve got less than 30GB of data transfer, in fact. And at $0.12/GB, that comes out to less than $5 total charges for the last 20 days.
This is in spite of the thousands of downloads and views that WatchMeCode had… apparently when my friend Justin helped me re-encode the files to a smaller size, it really helped! The episode files were previously 4x larger… that would have only been $20 in charges, still, but I’ll take the $5 instead!
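The arithmetic on those figures, using the $0.12/GB transfer rate from above (rough numbers, not an exact bill):

```python
rate_per_gb = 0.12                  # S3 data transfer rate quoted above

watchmecode_gb = 30                 # just under 30GB over ~20 days
current_cost = watchmecode_gb * rate_per_gb
print(f"current: ${current_cost:.2f}")    # under the $5 mark

# Before re-encoding, the episode files were roughly 4x larger
old_cost = 4 * watchmecode_gb * rate_per_gb
print(f"pre-re-encode: ${old_cost:.2f}")  # in the ~$20 ballpark
```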
Data Transfer: SignalLeaf
Now SignalLeaf, on the other hand… I’m spending nearly $1,000 a month on S3 charges. In a 20 day period, that’s approximately $661 of charges when I take out the $5 that WatchMeCode is costing me. When I look at the report for SignalLeaf, then, I see something completely unexpected:
That’s nearly 4.5 TERABYTES of data transfer for the last 20 days!!!
Needless to say, I was floored by this discovery. Clearly I’m doing a lot of data transfer from SignalLeaf with all the podcasts that I’m hosting, but I was not expecting this at all.
(and yes, I know the numbers I’m talking about don’t add up… I’m using estimated $/per month with only 20 days of actual data analysis – it’s not going to add up completely, right now)
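To make that caveat concrete, here’s the back-of-the-envelope math from the figures above (assuming 1TB = 1024GB – these are estimates, which is exactly why the totals don’t line up):

```python
rate_per_gb = 0.12

# Prorate the ~$1,000/mo bill down to the 20 days of log data
monthly_bill = 1000.0
prorated = monthly_bill * 20 / 30        # roughly $666.67
signalleaf_estimate = prorated - 5       # minus WatchMeCode's ~$5

# What the S3Stat report actually shows for SignalLeaf
transfer_gb = 4.5 * 1024                 # ~4.5TB over 20 days
measured = transfer_gb * rate_per_gb

print(f"estimated: ${signalleaf_estimate:.2f}, measured: ${measured:.2f}")
```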
Now That I Have The Numbers
I was expecting far more of a balance between WatchMeCode and SignalLeaf, quite honestly. In fact, I had been assuming WatchMeCode was doing a lot more data transfer than it was – that’s why I went and re-encoded all my episodes to a smaller file size. But it turns out, as usual, my assumptions were bad. The real culprit of my $1,000/mo S3 data transfer charges is SignalLeaf…
The moral of the story: don’t assume you know where the real cost is happening. Measure things, and look at the actual reports to find the source. Whether it’s data transfer, performance problems in code, or whatever you think the problem / culprit may be, you need to do actual measurement to know for sure.
For me, I’m glad I turned on logging in S3 and I’m glad I found S3Stat.com. The combination of these two things has allowed me to dig into not only the high level culprit for my high S3 data transfer costs, but also into the lower level details – each file and, by association, each folder – to see which of my podcast customers are costing me so much money. And now that I have these numbers to work with, I can work on solutions to the cost problem.