Handling CSV files with Python

Today I got my Cellphone bill and since I’ve been using Tethering a lot lately I wanted to find out how much data I actually used during the last month. Looking at the Bill I got I realized they just keep telling you the data for every single connection but never the whole amount, and since adding up which feels like hundreds of numbers, is not really my favorite past-time I decided to try my luck with the .csv file they offer, too. Sadly the MacOS Numbers.app failed completely importing it, so from the 8 rows only 2 showed up. Realizing that a spreadsheet seemed not an option I decided to crank out a small script to do it for me and here we go.

Since I python is my preferred scripting language, and I knew there must be a module to handle .csv, I settled for it and after reading the documentation I found it to be really easy. Opening the file just works as always

csvfile = open(“YourFile.csv”)

Now comes the fun part since, of course, the file is not really comma separated (like you would think seeing .csv) but | separated, and has like a way to many rows and columns setting it up by hand might take a while, and will probably break on my next bill, I was looking for a more generic way. This is where the great CSV module shines because there is actually a piece of code to determine the, so called, dialect which is the delimiter, and row / column setup.

csvsniffer = csv.Sniffer()
dialect = csvsniffer.sniff(csvfile.read(4096))

The only tricky part is to get a large enough sample (the read(4096) part) to be sure the Sniffer gets the right setup. In the Python documentation they always take like 1024, which if the file has a lot of columns will probably fail. 4096 is a pretty big sample but, well I won’t run into performance issues anyway. Finnally there is actually just one more thing to do to really get a hold of all the data in the file, parse it with the now determined options.

reader = csv.DictReader(csvfile, dialect=dialect)

Using the DictReader instead of the normal reader simplifies everything a lot since the result is a collection of Python Dictionaries in which all items are tagged with the column they came from. Now to get the data you want is just a matter of running a loop over the collection, and filtering the right tags. In my case it looked like this

data_used = 0
for item in csv_dict_reader:
if item[‘TARIFGRUPPE’] == ‘WAP / Internet’:
data_used += int(item[‘DATENVOL’].split(’ ‘)[0])

Thats it, if you like check out the source for this script.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.