# Astro 496

This is the course website for Astro 496, Spring 2018

# Assignment 1 - Data Containers

Due: 2018-02-20

For this assignment, you are to construct a Jupyter notebook that explores data types in Python.

• Demonstrate how lists, tuples, strings and arrays work
• Explore how you might use iteration
• Write functions to test performance of data structures

You should test these structures:

• sets
• lists
• dicts
• numpy arrays

You will be graded on the writeup that accompanies this notebook. You should either do this writeup inline in the notebook or as a separate document. Here are some areas to address:

• Which data structures should you use if:
• You do or do not know their size in advance
• You want to have different data types in each
• You want to be able to remove things from the middle or the end
• You need to quickly perform mathematics on the elements
• For each data structure, address:
• How well does it perform when it is empty (0 items), when it is medium-sized (10000 elements) and when it is large (10 or 100 million elements)? Some things you may want to evaluate – how long does it take to sum all the elements, to insert an element, to append an element, to join two containers.
• Did you find anything surprising?

Here are a few tips:

• Ensure that when you run timing tests, you test exactly what you think you are testing. For instance, make sure you’re just testing the actual operations – not any setup or teardown costs.
• If you can, re-use functions. You may not be able to in all cases, however.
• Please write everything up; the majority of your grade will be based on what you say about your experiences and what you learned. This class is not meant to simply instruct you in how to code, but how to think about what you code.
• When you time things, time how long it takes to do something multiple times, not just one time.

An example of what the code might look like for a list is as follows:

``````import time

small = 10
medium = 10000
large = 10000000

small_list = []
medium_list = []
large_list = []

for i in range(small):
small_list.append(i)

for i in range(medium):
medium_list.append(i)

for i in range(large):
large_list.append(i)

Ntimes = 100

t1 = time.time()
for i in range(Ntimes):
max(small_list)
t2 = time.time()

small_max = (t2 - t1)/Ntimes

t1 = time.time()
for i in range(Ntimes):
max(medium_list)
t2 = time.time()

medium_max = (t2 - t1)/Ntimes

t1 = time.time()
for i in range(Ntimes):
max(large_list)
t2 = time.time()

large_max = (t2 - t1)/Ntimes
``````

Here, note that we’re also increasing the size of the list! So we’d probably want to look at each of the times divided by the size of each list. That way, we’ll know how performance changes as a function of the size of the container.