UCL Data Science Student Challenge 2016
Your goal for the UCL Data Science Student Challenge is to hack & develop an innovative data science solution to improve the lives of Londoners while demonstrating the use of Azure Machine Learning, Microsoft’s data science tool.
This Azure Machine Learning tutorial will get you started using the PEACH dataset provided for this hackathon.
You can activate your Azure passes by going to this website – www.microsoftazurepass.com
You can use any public datasets for your solution, including:
- London Datastore, containing over 500 open datasets about all aspects of living in London, http://data.london.gov.uk/
- UK Government Open Data at https://data.gov.uk/
- Transport for London open data at https://tfl.gov.uk/info-for/open-data-users/
- API documentation is at https://api.tfl.gov.uk/ along with a great article on how to use it http://blog.tfl.gov.uk/2015/10/01/tfl-unified-api-part-1-introduction/
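If you want to try the TfL Unified API from code, the following is a minimal sketch rather than an official client. It assumes the tube status endpoint shown below and the `name`, `lineStatuses` and `statusSeverityDescription` fields; check both against the API documentation before relying on them. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumed endpoint; confirm it in the TfL API documentation.
TUBE_STATUS_URL = "https://api.tfl.gov.uk/Line/Mode/tube/Status"


def summarise_status(lines):
    """Map each line name to the list of status descriptions reported for it."""
    summary = {}
    for line in lines:
        statuses = [
            status.get("statusSeverityDescription", "unknown")
            for status in line.get("lineStatuses", [])
        ]
        summary[line.get("name", "unknown")] = statuses
    return summary


def fetch_tube_status(url=TUBE_STATUS_URL):
    """Download the JSON response and summarise it (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return summarise_status(json.load(response))
```

Calling `fetch_tube_status()` should return a dictionary keyed by line name, provided the endpoint and field names above match the current API.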
Special datasets for this weekend
We have made two large datasets available specially for this weekend, so explore these to help you create an amazing solution. Each dataset is stored in Microsoft Azure (in blob storage) and can be accessed via a unique URL.
Project PEACH Datasets
These datasets are provided by Project PEACH (https://code-4-health.org/peach), a large-scale, open source, community-driven data science project by the Computer Science Department at University College London. They comprise several related datasets covering users' lifestyle, shopping habits, and social media activity, and provide a rich basis for exploring innovative Azure Machine Learning experiments and creating compelling data science applications.
You can download the entire sample dataset (58MB) for initial development of your solution at – http://uclhack.blob.core.windows.net/peach/PEACHData.zip
The full datasets contain the complete data you should use for training your machine learning solution. Individual files can be up to 3GB in size. We do not recommend downloading them to your local machine over WiFi; instead, load them directly into Azure Machine Learning Studio using the URLs provided.
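Before loading a file into Azure Machine Learning Studio, it can help to look at its first few rows without downloading the whole thing. The sketch below streams a CSV from its URL and stops after a handful of rows. It assumes the file is comma-separated, UTF-8 encoded and still reachable at the address given; the function names are illustrative only, and this is not how Azure Machine Learning Studio itself imports data.

```python
import csv
import io
import itertools
import urllib.request

# One of the sample files listed in this guide.
SAMPLE_URL = "http://uclhack.blob.core.windows.net/peach/af_data.csv"


def preview_rows(text_stream, limit=5):
    """Parse at most `limit` CSV rows from an open text stream."""
    reader = csv.reader(text_stream)
    return list(itertools.islice(reader, limit))


def preview_url(url, limit=5, encoding="utf-8"):
    """Stream a remote CSV and parse only its first rows (needs network access)."""
    with urllib.request.urlopen(url) as response:
        # Decode lazily so only the start of the file is read.
        text_stream = io.TextIOWrapper(response, encoding=encoding, newline="")
        return preview_rows(text_stream, limit)
```

For example, `preview_url(SAMPLE_URL, limit=3)` should return the first three rows as lists of strings, which is enough to compare the columns with the documentation PDFs.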
Atmospheric Factors dataset
This includes data on users' responses to different atmospheric factors, such as UV radiation, pollen count and air quality. It includes the user’s response (positive or negative) to each factor.
- Atmospheric factors documentation, http://uclhack.blob.core.windows.net/peach/AF_guide.pdf
- Atmospheric factors sample dataset (25MB), http://uclhack.blob.core.windows.net/peach/af_data.csv
- Atmospheric factors full dataset (around 1.2GB each)
Food diary dataset
This dataset contains users’ food diary entries, including what they have eaten, and any potential allergens in the food.
- Food diary documentation, http://uclhack.blob.core.windows.net/peach/fd_guide.pdf
- Food diary entries sample data (22MB), http://uclhack.blob.core.windows.net/peach/fd_fooddiary_data.csv
- Food diary allergy sample data (33MB), http://uclhack.blob.core.windows.net/peach/fd_allergy_data.csv
- Food diary sentiment feedback sample data (20MB), http://uclhack.blob.core.windows.net/peach/fd_sentimentfeedback_data.CSV
- Food diary full datasets, every 3 months (each file 1GB-3GB). We do not recommend downloading these to your local machine over WiFi; use them directly in Azure Machine Learning Studio via the URL provided.
- Food allergy full datasets, every 3 months (each file around 3GB)
- Food sentiment full datasets (around 1.75GB each)
Internet of Things wearables data
This dataset includes measurements of users’ body temperature, elevated heart rate and sleep quality.
- IoT documentation, http://uclhack.blob.core.windows.net/peach/iot_guide.pdf
- IoT body temperature sample dataset (1MB), http://uclhack.blob.core.windows.net/peach/iot_bodytemp_data.csv
- IoT heart rate sample dataset (3.5MB), http://uclhack.blob.core.windows.net/peach/iot_heartrate_data.csv
- IoT sleep quality sample dataset (10MB), http://uclhack.blob.core.windows.net/peach/iot_sleepquality_data.csv
The full datasets are provided every 3 months (each file 50MB-500MB). We do not recommend downloading these to your local machine over WiFi; use them directly in Azure Machine Learning Studio via the URL provided.
- Full temperature log data (around 50MB each)
- Full heart rate data (around 160MB each)
- Full sleep quality data (around 500MB each)
Retail shopping data
This data includes what users purchased from different shops, including supermarkets, pharmacies, and athletic apparel and outdoor gear stores. This could be tracked via store loyalty programs or OCR receipt scanning applications.
- Shopping data documentation, http://uclhack.blob.core.windows.net/peach/rs_guide.pdf
- Retail shopping sample dataset (60MB), http://uclhack.blob.core.windows.net/peach/rs_data.csv
- Retail shopping full datasets, every 3 months (each file 50MB-500MB). We do not recommend downloading these to your local machine over WiFi; use them directly in Azure Machine Learning Studio via the URL provided.
Social media data
This dataset contains users’ tweets.
- Social media data documentation, http://uclhack.blob.core.windows.net/peach/sm_guide.pdf
- Social media dataset (35MB), http://uclhack.blob.core.windows.net/peach/sm_data.csv
User profile data
This data contains the profile of every user in the PEACH dataset, including demographic, health and lifestyle data. It does not change over time.
- User profile documentation, http://uclhack.blob.core.windows.net/peach/up_guide.pdf
- User profile explicit information, http://uclhack.blob.core.windows.net/peach/up_explicit_data.csv
- User profile implicit information, http://uclhack.blob.core.windows.net/peach/up_implicit_data.csv
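Because the profile data describes every user, a common first step is to attach profile fields to records from another dataset. The sketch below shows one way to do that with plain CSV rows. The `user_id` column name and the toy values in the usage note are assumptions made up for illustration; check the documentation PDFs for the real column names.

```python
import csv
import io


def join_to_profiles(records, profiles, key="user_id"):
    """Attach the matching profile to each record.

    `key` is a hypothetical shared column name. Records whose key has no
    matching profile are skipped.
    """
    lookup = {profile[key]: profile for profile in profiles}
    joined = []
    for record in records:
        profile = lookup.get(record[key])
        if profile is not None:
            # Record fields win if both rows share a column name.
            joined.append({**profile, **record})
    return joined


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))
```

With made-up data, `join_to_profiles(read_rows("user_id,bpm\n1,72\n"), read_rows("user_id,age\n1,30\n"))` yields a single row holding `user_id`, `age` and `bpm` together.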
Nuffield Health Datasets
Anonymised datasets covering the lifestyle, profile and health of many users are available to you. These rich datasets provide exciting opportunities for you to build data science solutions around health and well-being. If you would like access to these datasets then please email kenji.takeda AT microsoft.com.