Cleaning up junk that Azure creates in your storage account

The moment you create a storage account in your Azure subscription, you start paying for it, and Azure starts creating data that you also pay for. This data is called Windows Azure Diagnostics, and it is basically a very high-overhead log of Windows events for everything that you may or may not care about in relation to your storage account. See this article for an explanation.

So now, after reading the article, you probably agree that you need to clean up this mess for your own good, and Azure is not exactly helping you do it. Fear not, it is actually not that complicated. The only complicated part is figuring out how to efficiently select the wanted rows from the tables, which boils down to how you calculate the partition key for the rows. You would typically want to keep the logs for some reasonable time, and delete the records older than your retention time.

In Python, this is how:

    import calendar
    import time

    # convert azure timestamp to unix time
    def azts2time(azt):
        t = time.gmtime((float(azt) * 1e-7) - 11644473600)
        # shift the year back by 1600; timegm() treats the tuple as UTC,
        # unlike mktime(), which would apply the local timezone
        return calendar.timegm((t.tm_year - 1600, t.tm_mon, t.tm_mday,
                                t.tm_hour, t.tm_min, t.tm_sec, 0, 0, 0))

    # convert unix time to azure timestamp
    def time2azts(unixtime):
        t = time.gmtime(unixtime)
        t = calendar.timegm((t.tm_year + 1600, t.tm_mon, t.tm_mday,
                             t.tm_hour, t.tm_min, t.tm_sec, 0, 0, 0))
        t = (float(t) + 11644473600) * 1e7
        return t

Azure timestamps are time counters with 100 ns resolution, while Unix time has 1 second resolution. The partition key is just an Azure timestamp with '0' added in front of it. Using the above, your cleanup code for WAD tables can be quite trivial:
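As a cross-check, the same conversion can also be written as plain integer arithmetic. This is a minimal sketch, assuming the timestamp counts 100 ns ticks from 0001-01-01 00:00:00 UTC, which is what the 1600-year shift in the conversion functions amounts to. The names `EPOCH_OFFSET_SECONDS`, `unix_to_ticks`, `ticks_to_unix` and `partition_key` are made up for this illustration.

```python
# Sketch only: assumes Azure timestamps count 100 ns ticks from
# 0001-01-01 00:00:00 UTC (the 1601 epoch shifted back by 1600 years).

# Seconds from 0001-01-01 to 1970-01-01:
# 11644473600 (1601 -> 1970) plus 1600 Gregorian years (584388 days).
EPOCH_OFFSET_SECONDS = 11644473600 + 584388 * 86400
TICKS_PER_SECOND = 10 ** 7


def unix_to_ticks(unixtime):
    """Convert Unix time (whole seconds) to an integer tick count."""
    return (int(unixtime) + EPOCH_OFFSET_SECONDS) * TICKS_PER_SECOND


def ticks_to_unix(ticks):
    """Convert a tick count back to whole seconds of Unix time."""
    return int(ticks) // TICKS_PER_SECOND - EPOCH_OFFSET_SECONDS


def partition_key(unixtime):
    """Build a WAD-style partition key: '0' followed by the tick count."""
    return "0%d" % unix_to_ticks(unixtime)
```

With a 30 day retention, the cutoff filter would then be "PartitionKey le '%s'" % partition_key(time.time() - 30 * 86400). Working in integers avoids pushing far-future dates through the time functions and keeps the tick count exact.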

    import calendar
    import time

    import azure
    from azure import storage

    keepdays = 30
    table_service = storage.TableService("account_name", "account_key")

    wadtables = [
        "WADLogsTable",
        "WADDiagnosticInfrastructureLogsTable",
        "WADDirectoriesTable",
        "WADPerformanceCountersTable",
        "WADWindowsEventLogsTable"
    ]

    # convert unix time to azure timestamp
    def time2azts(unixtime):
        t = time.gmtime(unixtime)
        t = calendar.timegm((t.tm_year + 1600, t.tm_mon, t.tm_mday,
                             t.tm_hour, t.tm_min, t.tm_sec, 0, 0, 0))
        t = (float(t) + 11644473600) * 1e7
        return t

    # select everything older than the retention period
    flt = "PartitionKey le '0%d'" % (time2azts(time.time() - keepdays * 86400))

    for table_name in wadtables:
        print(table_name)
        # keep querying and deleting until the query comes back empty
        while True:
            try:
                # only the keys are needed to delete an entity
                events = table_service.query_entities(
                    table_name, filter=flt, select='PartitionKey,RowKey')
            except azure.WindowsAzureMissingResourceError:
                # the table does not exist in this account
                break
            if len(events) == 0:
                break
            for el in events:
                table_service.delete_entity(table_name, el.PartitionKey, el.RowKey)

Azure, of course, will charge you for all these delete transactions. For performance, you would also want to run the cleanup on a VM close to the storage account instance.
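One way to cut down the number of requests is to delete in batches. Azure Table storage batch operations require every entity in a batch to share a partition key and allow at most 100 entities per batch, so the keys have to be grouped first. The sketch below only does that grouping. `chunk_deletes` is a hypothetical helper, and actually submitting each chunk as a batch is left out because it depends on the batch support in your SDK version.

```python
from collections import defaultdict

MAX_BATCH = 100  # per-batch entity limit for Azure Table storage


def chunk_deletes(keys, max_batch=MAX_BATCH):
    """Group (PartitionKey, RowKey) pairs into per-partition chunks.

    Yields (partition_key, [row_key, ...]) tuples, each holding at most
    max_batch row keys, all from the same partition.
    """
    by_partition = defaultdict(list)
    for pk, rk in keys:
        by_partition[pk].append(rk)
    for pk in sorted(by_partition):
        rows = by_partition[pk]
        for i in range(0, len(rows), max_batch):
            yield pk, rows[i:i + max_batch]
```

In the cleanup loop you would feed it the (el.PartitionKey, el.RowKey) pairs from each query result and issue one batch per yielded chunk instead of one delete per row.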

Copyright © Madis Kaal 2000-