Statistics


Overview

This is a group assignment. It assesses your achievement of the following learning outcomes:

LO1: Present and describe information effectively

LO3: Draw conclusions about populations using sample information

LO4: Suggest ways to improve decision making processes

LO5: Obtain reliable forecasts of variables of interest

In this assignment, you are required to produce several statistical analyses for a given case study. The analyses must be carried out using Microsoft Excel and Microsoft Power BI. For more details, refer to the questions and tasks on the next page.

Submission items

You are required to submit the following:

a. One written group report. Length: not more than 12 pages (A4 size, 11pt font, single spacing). The report should address all questions in this assignment.

b. One Excel workbook (.xlsx file) consisting of 7 worksheets. Each worksheet demonstrates the working for one of questions 3 – 9.

c. A one-page Power BI dashboard, uploaded to the Power BI service and shared with your Unit Coordinator and the teaching support team. This addresses question 10.

d. One recording of a PowerPoint presentation, saved in either .mp4 or .wmv format.

Items a, b and d must be compressed together into a single .zip file. Name the .zip file with your group number (e.g. Group1.zip).

Case study

The district of Springfield conducted an environmental study of freshwater reservoirs in its region. These include lakes, creeks, and public ponds. The study was prompted by recent concerns voiced by a local environmental protection group that fish in these reservoirs may be so contaminated by mercury that they are no longer safe for human consumption.

Mercury is a toxic metal that occurs naturally in the environment. At times, however, human activities may result in unnatural releases of mercury into water bodies, which could in turn enter fish. Consuming mercury-contaminated fish can lead to severe neurological and physiological disorders in humans.

Springfield’s officials identified 943 water reservoirs (including natural lakes) that have significant fisheries and are relatively accessible, based on information from a survey carried out a decade ago. Of these, a simple random sample of 142 reservoirs was selected for the current study. Fish samples were then collected from only the 122 reservoirs that contained a targeted group of predator fish species of interest to the researchers. The researchers used certain criteria to decide on the targeted fish species.
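The counts in the paragraph above imply a sampling fraction and a retention rate, which are worth keeping in mind when describing the population and sample. A minimal arithmetic sketch follows; the variable names are illustrative and only the three counts come from the case study.

```python
# Counts taken from the case study text.
POPULATION = 943  # reservoirs identified from the earlier survey
SAMPLED = 142     # simple random sample drawn for the current study
RETAINED = 122    # sampled reservoirs that contained the targeted species

sampling_fraction = SAMPLED / POPULATION  # share of the population sampled
retention_rate = RETAINED / SAMPLED       # share of the sample with fish data
excluded = SAMPLED - RETAINED             # sampled reservoirs without fish data

print(f"Sampling fraction: {sampling_fraction:.1%}")
print(f"Retention rate:    {retention_rate:.1%}")
print(f"Excluded from the sample: {excluded}")
```

The 20 excluded reservoirs were dropped because they lacked the targeted species, not at random, which is relevant when discussing how far the results generalise.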

Fish were collected by angling, gill nets, trap nets, dip nets or beach seines. Up to 5 fish from the hierarchical order of preferred predator species were obtained. Care was taken to keep fish clean and free of contamination. In the laboratory, the fish fillet (muscle) of each fish was extracted and the fillets from each reservoir were ground up, combined and homogenised. Then, the tissue was subsampled to analyse the mercury levels.

In addition to collecting fish samples, the officials examined other possible factors that could contribute to elevated mercury levels in fish. They reckoned that this information could be useful for policy making by members of the Springfield legislature.

Following completion of the field study, you were handed a dataset containing 122 records of the studied reservoirs. Each record is described by the following variables:

Reservoir : name of reservoir

Fish : number of fish sampled

Mercury : mercury level from sampled fish in parts per million (ppm)

Elevation : reservoir’s elevation (in feet)

Drainage : drainage area (in square miles). Drainage area is the area of land which collects and drains the rainwater which falls on it, such as the area around a reservoir.

Surface Area : surface area of a reservoir (in acres)

Max. Depth : maximum depth of a reservoir (in feet)

RF : Runoff Factor. Runoff is the amount of rainwater or melted snow which flows into rivers and streams. Higher runoff factors may lead to more surface waters from the reservoir watershed reaching reservoirs, influencing mercury concentration in fish.

FR : Flushing Rate. Flushing rate is the number of times all water in a reservoir is theoretically exchanged during a year.

Dam : Impoundment class (1 = no functional dam present, all natural flowage; 0 = at least some man-made flowage in the drainage area)

RT : Reservoir Type. Three types of reservoirs are identified (1 = oligotrophic; 2 = eutrophic; 3 = mesotrophic)

RS : Reservoir Stratification. Two indicators are used (1 = reservoir is stratified; 0 = reservoir is not stratified). A reservoir is considered ‘stratified’ if the temperature decreases by ≥1 degree per meter with depth.
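The three coded variables (Dam, RT, RS) are categorical even though they are stored as numbers, so it can help to decode them before analysis. The sketch below is only an illustration outside the required Excel workflow: the `decode` helper, the shortened labels and the example record are hypothetical, and only the code-to-meaning mappings come from the variable list above.

```python
# Code-to-label mappings copied from the variable list (labels shortened).
CODES = {
    "Dam": {1: "no functional dam present", 0: "some man-made flowage"},
    "RT": {1: "oligotrophic", 2: "eutrophic", 3: "mesotrophic"},
    "RS": {1: "stratified", 0: "not stratified"},
}


def decode(record):
    """Return a copy of record with coded values replaced by their labels.

    Raises ValueError if a coded column holds a value that is not listed
    in the variable descriptions.
    """
    decoded = dict(record)
    for column, labels in CODES.items():
        code = record[column]
        if code not in labels:
            raise ValueError(f"Unexpected {column} code: {code!r}")
        decoded[column] = labels[code]
    return decoded


# Hypothetical record for illustration; not taken from springfield.data.
example = {"Reservoir": "Example Pond", "Dam": 1, "RT": 3, "RS": 0}
decoded_example = decode(example)
print(decoded_example)
```

Treating these columns as labels rather than numbers avoids computing meaningless statistics such as a mean reservoir type.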

Dataset

The dataset springfield.data is required to complete this assignment. It can be downloaded from the Assessments > Group Assignment (30%) section on Learnline.

Tasks

To complete this assignment, solve all problems below in your group. You should carefully consider the information given in the preceding case study and exclusively use the supplied dataset for analyses.

Problem A (10%): Data understanding and sampling

1. Describe the population, the sample and the levels of measurement in the given dataset.

2. Discuss the sampling technique used by Springfield officials and its implication(s) on the quality of collected data.

Problem B (20%): Descriptive statistics

3. Compute descriptive statistics for all eligible variables in the dataset. For quantitative variables, you must at least include the following statistical measures: mean, standard deviation, kurtosis, skewness, range, and five-number summary. Use appropriate statistics for categorical variables.

4. Using the computed statistics and appropriate charts, comment on the value distribution in each quantitative variable.
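The measures listed in question 3 can be cross-checked outside Excel. The following is a minimal sketch, assuming the sample (bias-adjusted) skewness and excess kurtosis formulas and inclusive quartiles, which are intended to correspond to Excel's SKEW, KURT and QUARTILE.INC functions. The `describe` function and the example values are hypothetical and are not taken from the dataset.

```python
import statistics


def describe(values):
    """Summary statistics for a list of at least four numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    z = [(x - mean) / sd for x in values]

    # Bias-adjusted sample skewness.
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

    # Bias-adjusted sample excess kurtosis.
    kurtosis = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )

    # Inclusive quartiles interpolate between the sorted data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

    return {
        "mean": mean,
        "sd": sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "range": max(values) - min(values),
        "five_number": (min(values), q1, median, q3, max(values)),
    }


# Hypothetical mercury readings in ppm, for illustration only.
for name, value in describe([0.2, 0.3, 0.4, 0.5, 1.1]).items():
    print(f"{name}: {value}")
```

A positive skewness here reflects the single large value pulling the right tail, which is the kind of observation question 4 asks you to comment on.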

Problem C (30%): Inferential statistics

Note: in solving questions 5 – 7, you must justify the chosen statistical method. Referring to the Data Science Roadmap process model, show the step-by-step process of your statistical analyses in the submitted Excel workbook.

5. The national environmental agency determines that fish samples with more than 1.0 ppm of mercury are to be considered “Unsafe” because they exceed the safety limit for human consumption.

Springfield’s local environmental authority, however, considers samples with more than 0.4 ppm to be at a sufficient level of risk to warrant further action (e.g. issuing a health advisory, banning fishing activities at selected reservoirs, etc.). Based on the given dataset, what are the risk levels of reservoirs in Springfield? Should the local authority take any action?

6. Industrialists who benefit from dams and dam construction are concerned about claims that high mercury levels in fish are related to the presence of dams in a reservoir’s drainage area. Determine whether the data support or refute this claim.

7. A colleague of yours wonders whether the flushing rate of a reservoir has anything to do with its sampled mercury level. Please answer her question.
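The two thresholds in question 5 split mercury readings into three bands. The sketch below shows one way to label and count them before any inferential test is chosen. Only the 0.4 ppm and 1.0 ppm limits and the “Unsafe” label come from the brief; the other two labels, the function name and the example readings are hypothetical.

```python
from collections import Counter

LOCAL_ACTION_PPM = 0.4     # Springfield's local action threshold
NATIONAL_UNSAFE_PPM = 1.0  # national safety limit


def risk_level(mercury_ppm):
    """Label a mercury reading against the two thresholds.

    Both thresholds are strict ("more than"), so a reading exactly at a
    threshold falls into the lower band.
    """
    if mercury_ppm > NATIONAL_UNSAFE_PPM:
        return "Unsafe"
    if mercury_ppm > LOCAL_ACTION_PPM:
        return "At risk"                # hypothetical label
    return "Below local threshold"      # hypothetical label


# Hypothetical readings in ppm, for illustration only.
readings = [0.15, 0.40, 0.62, 1.00, 1.35]
counts = Counter(risk_level(r) for r in readings)
print(counts)
```

Counts like these describe the sample only; drawing a conclusion about all Springfield reservoirs still requires a justified inferential method.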

Problem D (10%): Outlier analysis

8. Using outlier analysis, determine whether any reservoir has an outlying mercury level.

9. With the outlier(s) removed, repeat the statistical analyses performed for Problem C Question 6. Report whether the results lead to a different public policy decision. Finally, discuss common approaches to dealing with outliers.
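The brief does not prescribe an outlier method. One common choice is the 1.5 × IQR rule (Tukey's fences), sketched below as an illustration only. The function name, the multiplier default and the example readings are assumptions, and the quartiles use the inclusive method.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [x for x in values if x < lower or x > upper]


# Hypothetical mercury readings in ppm, for illustration only.
sample = [0.2, 0.3, 0.4, 0.5, 2.5]
flagged = iqr_outliers(sample)
print(flagged)
```

Other approaches, such as z-scores, can flag different points, so the choice of rule should be stated and justified in the report.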

Problem E (30%): Data visualisation and storytelling

10. You are called to brief the members of the Springfield legislature on the results of your statistical analyses above. Your audience is concerned about the impact of mercury on community health and local tourism industries, and also needs to know whether any public policies must be actioned immediately. Note that most of your audience have no background in statistics.

Complete this task by:

· Producing a single Power BI dashboard containing relevant data visualisations. This dashboard must be uploaded to the Power BI service and shared with your instructor and the teaching support team.

· Recording a 5-minute PowerPoint presentation that uses selected data visualisations from your Power BI dashboard.

· Explaining in detail, in your report, how you applied at least one principle of effective data visualisation and storytelling when planning the Power BI dashboard and delivering the presentation.