KodeContest® Summer 2020
Challenge 2.3 - Making Data Useful

The challenge
“Data is the new oil” is a debatable modern adage. There is a lot of data online, but it is often spread across multiple websites, which makes it difficult to integrate and to gain a unified understanding of it.
Rules
- Scrape multiple public websites for the necessary data and integrate it into a unified dataset.
- Perform data cleaning:
  - Standardize units (e.g., convert measurements recorded in different units, such as inches and centimeters, to a single unit throughout the dataset).
  - Standardize formats.
  - Account for missing fields, and include your assumptions about how you handle them with your submission. For example, if a field has missing values, you can do one of the following: (i) assign a value of zero, (ii) assign the most common value for the field, (iii) assign a value of your choice based on your reasoning, or (iv) use any other approach suited to your dataset and assumptions.
- The integrated dataset must contain a minimum of 5000 rows.
- The integrated dataset must contain a minimum of 20 attributes.
- Develop a web interface for users to explore your integrated dataset.
- Use D3 to present both tabular and graphical views of the data.
- The choice of graphical view (e.g., bar chart, time-series plot, or pie chart) depends on your dataset.
- Create a website that includes the following:
  - A write-up about the site’s purpose and intended audience.
  - A list of the team members.
  - Acknowledgements crediting the data sources.
  - D3 interfaces that present your final data reports.
- The website must allow users to download the full dataset or a selected subset for individual use.
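The cleaning steps in the rules above (standardizing units and formats, handling missing fields) and the download requirement can be sketched in Python as follows. This is a minimal, standard-library-only illustration under stated assumptions: centimeters as the target length unit, the conversion table TO_CM, the date formats in DATE_FORMATS, and the helper names to_cm, to_iso_date, fill_missing, and to_csv are all hypothetical choices, not part of the challenge. The to_csv helper assumes the export is produced in Python (e.g., server-side or in a build step); a D3 front end could equally generate the file in the browser.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Assumption: centimeters is the single target unit for lengths.
# Each factor converts FROM the given unit TO centimeters.
TO_CM = {"mm": 0.1, "cm": 1.0, "m": 100.0, "in": 2.54, "ft": 30.48}

# Assumption: illustrative date formats found across the scraped sources.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def to_cm(value, unit):
    """Standardize units: convert a length expressed in `unit` to centimeters."""
    return value * TO_CM[unit.strip().lower()]


def to_iso_date(text):
    """Standardize formats: return the date as YYYY-MM-DD, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None


def fill_missing(rows, field, strategy="mode", default=None):
    """Handle missing fields: replace None (or an absent key) with 0, the most
    common observed value, or a caller-chosen `default`."""
    present = [r[field] for r in rows if r.get(field) is not None]
    if strategy == "zero":
        fill = 0
    elif strategy == "mode" and present:
        fill = Counter(present).most_common(1)[0][0]
    else:
        fill = default  # custom value, or fallback when no values are present
    return [{**r, field: fill if r.get(field) is None else r[field]} for r in rows]


def to_csv(rows, fields):
    """Serialize the full dataset or a selected subset (rows and columns) as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical rows merged from sources that use different units and date formats.
    raw = [
        {"id": 1, "height": 10, "unit": "in", "date": "03/15/2020", "color": "red"},
        {"id": 2, "height": 30, "unit": "cm", "date": "2020-03-16", "color": None},
        {"id": 3, "height": 1, "unit": "m", "date": "17 Mar 2020", "color": "red"},
    ]
    cleaned = [
        {"id": r["id"], "height_cm": to_cm(r["height"], r["unit"]),
         "date": to_iso_date(r["date"]), "color": r["color"]}
        for r in raw
    ]
    cleaned = fill_missing(cleaned, "color", strategy="mode")
    print(to_csv(cleaned, ["id", "height_cm", "date", "color"]))
```

The strategy argument mirrors options (i) to (iii) of the missing-fields rule ("zero", "mode", or any other value with default); whichever approach you take, the rules ask you to document that assumption with your submission.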
What to submit?
Submission guidelines will be posted closer to the deadline. Stay tuned!