A Python script to automate fetching data from the IPEDS database by scraping its website.
It has been tested with Python 2.
- Run `python data_script.py <with options, see section below>` to download all the CSV files you want to import into the database.
- Run `python generator.py` to generate `model.py` and `admin.py`, which Django uses to create the database schema (`model.py`) and to register the database in the admin interface (`admin.py`).
- Run `python manage.py makemigrations --empty` (from the root) to create default empty migrations.
- Run `python manage.py makemigrations` to make the initial migrations.
- Copy `migrate_data.py` into the migrations folder and change the second element of the `dependencies` entry to the name of the initial migration file created in the previous step:
  `dependencies = [ ('ipeds_import', '< filename before .py of initial migration file created in the previous step >') ]`
- Now everything should be ready to go! Run `python manage.py runserver` to start the server and head to `localhost:8000/admin` to log in and see the changes made to the database.
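
The `dependencies` edit described above needs the name of the migration file that `makemigrations` just produced. The sketch below shows one way to look that name up instead of copying it by hand. It is not part of this repository: the helper names `latest_migration_name` and `dependencies_line` are hypothetical, and it assumes that Django's numeric file prefix (`0001_`, `0002_`, ...) identifies the most recent migration and that the app label is `ipeds_import`, as in the example line above.

```python
import os
import re


def latest_migration_name(migrations_dir):
    """Return the module name (file name without .py) of the
    highest-numbered migration file in migrations_dir.

    Assumption: migration files start with a numeric prefix followed by
    an underscore. Files without that prefix, such as __init__.py or
    migrate_data.py, are ignored.
    """
    pattern = re.compile(r"^(\d+)_.*\.py$")
    numbered = []
    for filename in os.listdir(migrations_dir):
        match = pattern.match(filename)
        if match:
            numbered.append((int(match.group(1)), filename[:-3]))
    if not numbered:
        raise ValueError("no numbered migration files in %s" % migrations_dir)
    # max() compares the numeric prefix first, so the newest file wins.
    return max(numbered)[1]


def dependencies_line(app_label, migration_name):
    """Format the line to paste into migrate_data.py."""
    return "dependencies = [('%s', '%s')]" % (app_label, migration_name)
```

Calling `dependencies_line('ipeds_import', latest_migration_name(path_to_migrations))` returns a string that can be pasted into `migrate_data.py`; check the result against the file names actually present in your migrations folder before relying on it.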
    usage: data_script.py [-h] [-f] [-p PREFIX] [-y YEAR] [-s SERIES SERIES]
                          [-c CHECK] [-a] [-d]

    This program scraps the IPEDS website for its .csv data files.

    optional arguments:
      -h, --help            show this help message and exit
      -f, --fresh           refreshes download cache, run if the files you are
                            getting are old
      -p PREFIX, --prefix PREFIX
                            define the prefix of the files wanted, default is
                            "HD" (for getting HDxxxx.zip files for example)
      -y YEAR, --year YEAR  input one number indicating the year you want and
                            downloads it with specified prefix
      -s SERIES SERIES, --series SERIES SERIES
                            input two numbers indicating series of years you
                            want (from year of first number to year of second
                            number) and downloads them with specified prefix
      -c CHECK, --check CHECK
                            checks to see if the files (with the given prefix -
                            default is HD - and year) exist
      -a, --checkAll        checks to see if any files exist (note that checkAll
                            overrides all other options), <Response 200>
                            indicates that it does (google search HTTP codes
                            for other troubleshooting)
      -d, --downloadAll     downloads all files with specified prefix
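
To make the options above concrete, here is a minimal sketch of how an interface with the same flags could be declared with `argparse`, and how `--year` and `--series` might translate into file names. It is an illustration, not the actual contents of `data_script.py`: the functions `build_parser` and `wanted_files` are hypothetical, the year range is assumed to be inclusive at both ends, and the `<prefix><year>.zip` naming is inferred from the `HDxxxx.zip` example in the help text.

```python
import argparse


def build_parser():
    """Declare the flags listed in the usage text (sketch only)."""
    parser = argparse.ArgumentParser(prog="data_script.py")
    parser.add_argument("-f", "--fresh", action="store_true",
                        help="refresh the download cache")
    parser.add_argument("-p", "--prefix", default="HD",
                        help="prefix of the files wanted")
    parser.add_argument("-y", "--year", type=int,
                        help="single year to download")
    parser.add_argument("-s", "--series", nargs=2, type=int,
                        help="first and last year of a range")
    parser.add_argument("-c", "--check", type=int,
                        help="year whose file should be checked")
    parser.add_argument("-a", "--checkAll", action="store_true",
                        help="check whether any files exist")
    parser.add_argument("-d", "--downloadAll", action="store_true",
                        help="download all files with the prefix")
    return parser


def wanted_files(args):
    """Return the zip file names implied by --year or --series.

    Assumptions: the series is inclusive at both ends, and files are
    named <prefix><year>.zip.
    """
    if args.series:
        first, last = args.series
        years = range(first, last + 1)
    elif args.year is not None:
        years = [args.year]
    else:
        years = []
    return ["%s%d.zip" % (args.prefix, year) for year in years]
```

Under these assumptions, `python data_script.py -p EF -s 2012 2014` would correspond to `EF2012.zip`, `EF2013.zip`, and `EF2014.zip`, while `python data_script.py -y 2015` would correspond to `HD2015.zip` because the prefix defaults to `HD`.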