Answer 1
(a) Arrange the results of the 20 students
Student number    Result
1                 42
2                 53
3                 54
4                 61
5                 61
6                 61
7                 62
8                 63
9                 64
10                66
11                67
12                67
13                68
14                69
15                71
16                71
17                76
18                78
19                81
20                83
Total             1318
The data have been arranged in ascending order so that the positional measures below can be read directly from the table.
 Compute Mean, Median and Mode
To calculate the mean, the following formula is used:
Mean = Σxi / n, where Σxi (total of marks) = 1318 and n (number of students) = 20
Mean = 1318 / 20 = 65.9, which is the mean value of the students' results.
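Median calculation
With N = 20 (an even number of observations), the median is the average of the (N/2)th and (N/2 + 1)th items, i.e. the 10th and 11th items of the table above.
Median = (66 + 67) / 2 = 66.5
Mode calculation
The mode is the most frequent value. The result 61 occurs three times, more often than any other value, so the mode is 61.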
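The three measures of central tendency can be cross-checked with Python's standard `statistics` module. This is a minimal sketch rather than part of the original working; the variable names are illustrative, and note that `statistics.median` averages the two middle values when the number of observations is even.

```python
from statistics import mean, median, mode

# The 20 student results from the table, already in ascending order.
results = [42, 53, 54, 61, 61, 61, 62, 63, 64, 66,
           67, 67, 68, 69, 71, 71, 76, 78, 81, 83]

mean_result = mean(results)      # total of marks divided by number of students
median_result = median(results)  # average of the 10th and 11th sorted values
mode_result = mode(results)      # most frequent value

print(mean_result, median_result, mode_result)
```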
 Compute 1st and 3rd Quartile
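1st Quartile position = (N + 1)/4 = (20 + 1)/4 = 5.25th item; the 5th and 6th items are both 61, so Q1 = 61
3rd Quartile position = 3(N + 1)/4 = 3(20 + 1)/4 = 63/4 = 15.75th item; the 15th and 16th items are both 71, so Q3 = 71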
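Compute and Interpret 90th Percentile
Position = 90% of the total number of students (observations) = 0.90 * 20 = 18th item = 78
Interpretation: 90% of the students (18 of 20) scored 78 or below, and only two students scored higher.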
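The positional measures can be reproduced in a few lines of Python. This is a sketch under stated assumptions: `statistics.quantiles` with `method="exclusive"` uses the (N + 1) position rule applied to the quartiles above, while the 90th percentile follows the simpler "0.90 * N-th item" rule used in this answer. Other percentile conventions exist and can give slightly different values.

```python
from statistics import quantiles

results = [42, 53, 54, 61, 61, 61, 62, 63, 64, 66,
           67, 67, 68, 69, 71, 71, 76, 78, 81, 83]

# Quartiles using positions i * (N + 1) / 4 with linear interpolation.
q1, q2, q3 = quantiles(results, n=4, method="exclusive")

# 90th percentile using the rule from the text: take the (0.90 * N)-th item.
position = round(0.90 * len(results))   # 18th item
p90 = sorted(results)[position - 1]     # lists are 0-indexed

print(q1, q3, p90)
```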
(b) Inferential Statistics:
Inferential statistics involves selecting a sample from the available observations and applying appropriate statistical tests to it in order to draw conclusions about the wider population. These tests support hypothesis testing and back the outcome with valid numbers.
Answer 2
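(i) Prepare Joint Probability Table
Applied to more than 1 university
Age group       Yes    Yes (% of column)         No      No (% of column)
23 and under    207    207/808*100 = 25.62%      201     201/1210*100 = 16.61%
24-26           299    299/808*100 = 37.00%      379     379/1210*100 = 31.32%
27-30           185    185/808*100 = 22.90%      268     268/1210*100 = 22.15%
31-35           66     66/808*100 = 8.17%        193     193/1210*100 = 15.95%
36 and over     51     51/808*100 = 6.31%        169     169/1210*100 = 13.97%
Total           808    100%                      1210    100%
Note: the percentages above are shares within each column, i.e. they are conditional on the Yes/No answer. A joint probability divides each count by the grand total of 808 + 1210 = 2018 respondents; for example, P(23 and under and Yes) = 207/2018 = 10.26%.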
(ii) Given that a student applied to more than 1 university, what is the probability that the student is 24-26 years old?
P(24-26 years old | applied to more than 1 university) = 299/808 = 37.00%
(iii) Is the number of universities applied to independent of student age? Explain
No. If the two variables were independent, the conditional probability of an age group given the answer would equal the marginal probability of that age group. Here P(24-26 | Yes) = 299/808 = 37.00%, while P(24-26) = (299 + 379)/2018 = 33.60%. Because these differ, the number of universities applied to is not independent of student age.
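The joint, marginal and conditional probabilities for this table can be worked out programmatically. The sketch below is illustrative only (the dictionary layout and names are assumptions): it divides every count by the grand total to obtain joint probabilities, and then compares P(24-26 | Yes) with P(24-26) as a check on independence.

```python
# Counts from the table: age group -> answer -> number of students.
counts = {
    "23 and under": {"Yes": 207, "No": 201},
    "24-26":        {"Yes": 299, "No": 379},
    "27-30":        {"Yes": 185, "No": 268},
    "31-35":        {"Yes": 66,  "No": 193},
    "36 and over":  {"Yes": 51,  "No": 169},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Joint probability: each cell divided by the grand total.
joint = {(age, answer): n / grand_total
         for age, row in counts.items()
         for answer, n in row.items()}

# Marginal probabilities.
p_yes = sum(row["Yes"] for row in counts.values()) / grand_total
p_age = {age: sum(row.values()) / grand_total for age, row in counts.items()}

# Conditional probability P(24-26 | Yes) = P(24-26 and Yes) / P(Yes).
p_24_26_given_yes = joint[("24-26", "Yes")] / p_yes

# Under independence the conditional and marginal values would be equal.
independent = abs(p_24_26_given_yes - p_age["24-26"]) < 1e-9

for (age, answer), p in joint.items():
    print(f"{age:13s} {answer:3s} {p:.4f}")
print(f"P(24-26 | Yes) = {p_24_26_given_yes:.4f}, P(24-26) = {p_age['24-26']:.4f}")
```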
(b)
Information provided
x        f(x)
10       0.05
20       0.10
30       0.10
40       0.20
50       0.35
60       0.20
Total    1.00
x represents the number of new clients for counselling cases in the year 2021.
Formula for calculating the expected value: E(x) = μ = Σ x * f(x)
x        f(x)     (x - μ)    (x - μ)²    (x - μ)² * f(x)
10       0.05     -33        1089        54.45
20       0.10     -23        529         52.90
30       0.10     -13        169         16.90
40       0.20     -3         9           1.80
50       0.35     7          49          17.15
60       0.20     17         289         57.80
Total                                    201
Expected Value = (10*0.05 + 20*0.1 + 30*0.1 + 40*0.2 + 50*0.35 + 60*0.2) = 43
Formula for the variance of a discrete random variable: Var(x) = Σ (x - μ)² * f(x)
Variance = 201 (calculation shown in the table)
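The expected value and variance can be verified with a short script. This is a minimal sketch; the dictionary name is illustrative and a small tolerance is needed because of floating-point rounding.

```python
# Probability distribution: number of new clients x -> probability f(x).
distribution = {10: 0.05, 20: 0.10, 30: 0.10, 40: 0.20, 50: 0.35, 60: 0.20}

# A valid discrete distribution must sum to 1.
total_probability = sum(distribution.values())

# E(x) = sum of x * f(x)
expected_value = sum(x * p for x, p in distribution.items())

# Var(x) = sum of (x - mu)^2 * f(x)
variance = sum((x - expected_value) ** 2 * p for x, p in distribution.items())

print(f"E(x) = {expected_value:.2f}, Var(x) = {variance:.2f}")
```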
Answer 3
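Formulate Hypothesis:
Problem statement: The population mean annual expenditure on prescription drugs per person is lower in the Midwest than in the Northeast.
Hypothesis Statement:
Ho: µ ≥ $838
Ha: µ < $838
Here µ is the population mean annual expenditure per person in the Midwest, and $838 is the benchmark mean it is compared against. The claim being tested ("lower") goes in the alternative hypothesis, so the test is a one-tailed test on the left tail.
One (Left) tail test
Formula: z = (x̄ - µ0) / (σ / √n)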
Calculate value of relevant test statistic and P value
Sample Mean (x̄) = $745
Null Hypothesis Mean (µ0) = $838
SD (σ) = $300
Sample size (n) = 60
Applying the formula: z = (745 - 838) / (300 / √60)
z = -93 / 38.73 = -2.40
From the standard normal (Z) table, the area to the left of z = -2.40 is 0.0082.
P value = 0.0082
Based on the p value in part (III), at 99% confidence level, decide the decision criteria.
At the 99% confidence level the significance level is 1% (α = 0.01), and the critical value for a left-tailed test is -2.326. The decision rule is to reject Ho if z ≤ -2.326 or, equivalently, if the p value ≤ 0.01. Here z = -2.40 is below -2.326 and the p value of 0.0082 is below 0.01, so the null hypothesis is rejected.
Make the conclusion based on the analysis.
Because the null hypothesis is rejected, there is sufficient evidence at the 1% significance level to conclude that the mean annual expenditure on prescription drugs per person is lower in the Midwest than in the Northeast.
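The test statistic, p value and critical value for this left-tailed z test can be reproduced with the standard library. This is a sketch under the assumptions stated in the answer (known standard deviation of $300, n = 60, α = 0.01); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 745.0   # Midwest sample mean ($)
null_mean = 838.0     # hypothesized mean under Ho ($)
sigma = 300.0         # standard deviation ($)
n = 60                # sample size
alpha = 0.01          # significance level for 99% confidence

standard_error = sigma / sqrt(n)
z = (sample_mean - null_mean) / standard_error

standard_normal = NormalDist()            # mean 0, standard deviation 1
p_value = standard_normal.cdf(z)          # left-tail area below z
z_critical = standard_normal.inv_cdf(alpha)

reject_null = p_value <= alpha            # equivalent to z <= z_critical

print(f"z = {z:.2f}, p value = {p_value:.4f}, critical z = {z_critical:.3f}")
print("Reject Ho" if reject_null else "Do not reject Ho")
```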
Answer 4
(i) State the null and alternative hypothesis for single factor ANOVA to test for any significant difference in the mean price of gasoline for the three brands.
Hypothesis
H0: µ1 = µ2 = µ3
H1: Not all of the population means are equal
(ii) State the decision rule at 5% significance level.
Reject H0 if the F statistic > the critical value F.05 = 3.35 (2 numerator and 27 denominator degrees of freedom) or, equivalently, if the p value < 0.05. Otherwise, do not reject the null hypothesis.
(iii) Calculate the test statistics
                A           B           C
                3.77        3.83        3.78
                3.72        3.83        3.87
                3.87        3.85        3.89
                3.76        3.77        3.79
                3.83        3.84        3.87
                3.85        3.84        3.87
                3.93        4.04        3.99
                3.79        3.78        3.79
                3.78        3.84        3.79
                3.81        3.84        3.86
Sample Mean     3.811       3.846       3.85
Variance        0.003349    0.004844    0.00382
One-way ANOVA test formula
F = MSTR / MSE
MSTR = SSTR / (k - 1)
MSE = SSE / (nT - k)
where k = 3 is the number of brands and nT = 30 is the total number of observations.
Overall mean x̿ = (3.81 + 3.84 + 3.85)/3 = 3.83
SSTR = 10(3.81 - 3.83)² + 10(3.84 - 3.83)² + 10(3.85 - 3.83)² = 0.009
MSTR = 0.009 / (3 - 1) = 0.0045
P-value and critical value approaches
Value of test statistic
SSE = 9(0.003) + 9(0.005) + 9(0.004) = 0.108
MSE = 0.108 / (30 - 3) = 0.004
F = 0.0045 / 0.004 = 1.125
ANOVA Table
Source of variation    Sum of Squares    Degrees of Freedom    Mean Square    F        P value
Treatment              0.009             2                     0.0045         1.125    0.34
Error                  0.108             27                    0.004
Total                  0.117             29
P value calculation
With numerator d.f. = 2 and denominator d.f. = 27, the critical values of F are F.05 = 3.35 and F.01 = 5.49. The test statistic F = 1.125 is well below both, and the corresponding p value is approximately 0.34.
Decision on the basis of the p value
The p value (0.34) > 0.05, so the null hypothesis cannot be rejected.
Decision as per critical value approach
Based on an F distribution with 2 numerator d.f. and 27 denominator d.f., F.05 = 3.35.
Reject H0 if F > 3.35
Here F = 1.125 < 3.35, so there is not enough evidence to reject the null hypothesis.
(iv) Based on the calculated test statistic, decide whether there is any significant difference in the mean price of gasoline for the three brands.
The value of F is 1.125, which is lower than the critical value of 3.35. The null hypothesis is therefore not rejected, which means there is no significant difference in the mean price of gasoline among the three brands at the 5% significance level.
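The one-way ANOVA can also be computed from the raw prices without intermediate rounding. The sketch below is illustrative and rests on two stated assumptions: it uses sample variances (divisor n - 1) for SSE, and it uses the closed-form upper-tail probability (1 + 2F/d2)^(-d2/2), which holds only when the numerator has exactly 2 degrees of freedom, as it does here. Working from unrounded values gives an F statistic a little different from the hand-rounded 1.125.

```python
from statistics import mean, variance

prices = {
    "A": [3.77, 3.72, 3.87, 3.76, 3.83, 3.85, 3.93, 3.79, 3.78, 3.81],
    "B": [3.83, 3.83, 3.85, 3.77, 3.84, 3.84, 4.04, 3.78, 3.84, 3.84],
    "C": [3.78, 3.87, 3.89, 3.79, 3.87, 3.87, 3.99, 3.79, 3.79, 3.86],
}

k = len(prices)                                    # number of brands
n_total = sum(len(v) for v in prices.values())     # total observations
all_prices = [x for v in prices.values() for x in v]
grand_mean = mean(all_prices)

# Between-treatments and within-treatments sums of squares.
sstr = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in prices.values())
sse = sum((len(v) - 1) * variance(v) for v in prices.values())

df_treatment = k - 1
df_error = n_total - k
mstr = sstr / df_treatment
mse = sse / df_error
f_stat = mstr / mse

# Upper-tail p value; this closed form is valid only for df_treatment == 2.
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

F_CRITICAL_05 = 3.35   # F.05 with (2, 27) d.f., as quoted in the text
reject_null = f_stat > F_CRITICAL_05

print(f"F = {f_stat:.3f}, p value = {p_value:.3f}, reject H0: {reject_null}")
```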
Answer 5
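Complete the missing entries from A to H in this output
A = R Square = SS regression / SS total = 35250755.68 / 42699148.82 = 0.8256 (approximately 0.83)
B = Observations = 50 (as provided)
C = Residual d.f. = Total d.f. - Regression d.f. = 49 - 2 = 47
D = SS regression = SS total - SS residual = 42699148.82 - 7448393.14 = 35250755.68
E = MS regression = SS regression / Regression d.f. = 35250755.68 / 2 = 17625377.84
F = MS residual = SS residual / (50 - 3) = 7448393.14 / 47 = 158476.45
G = F statistic = MS regression / MS residual = 17625377.84 / 158476.45 = 111.22
H = t statistic for income = Coefficient of income / Standard error of income = 8.36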
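Entries A and C to G follow mechanically from the total and residual sums of squares, so they can be checked with a few lines of code. This is a sketch only; it assumes two independent variables (income and household size) and 50 observations, as used in the working above, and the variable names are illustrative.

```python
n_obs = 50                 # B: number of observations
k_predictors = 2           # income and household size
ss_total = 42699148.82
ss_residual = 7448393.14

df_regression = k_predictors
df_residual = n_obs - k_predictors - 1           # C
ss_regression = ss_total - ss_residual           # D
r_square = ss_regression / ss_total              # A
ms_regression = ss_regression / df_regression    # E
ms_residual = ss_residual / df_residual          # F
f_stat = ms_regression / ms_residual             # G

print(f"A = {r_square:.4f}, C = {df_residual}, D = {ss_regression:.2f}")
print(f"E = {ms_regression:.2f}, F = {ms_residual:.2f}, G = {f_stat:.2f}")
```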
Estimate the annual credit card charges for a three-person household with an annual income of $40,000
The estimate uses the intercept and the coefficients for income and household size from the regression output.
The estimated annual credit card charges for a three-person household with an annual income of $40,000 are $3,700.
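Did the estimated regression equation provide a good fit to the data? Explain
Yes. R Square = 35250755.68 / 42699148.82 = 0.8256, so the estimated regression equation explains about 83% of the variability in annual credit card charges. In addition, the F statistic of about 111.2 is very large, which indicates that the overall relationship is statistically significant.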
Answer 6
Using the linear trend equation, forecast the sales of face masks for October 2020
Month (t)    Sales ($)
1            17000
2            18000
3            19500
4            22000
5            21000
6            23000
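Linear trend equation: T = b0 + b1*t, where the slope b1 and the intercept b0 are fitted by least squares to all six months.
Mean of t = 21/6 = 3.5; mean of sales = 120500/6 = 20083.333
b1 = Σ(t - 3.5)(Y - 20083.333) / Σ(t - 3.5)² = 20750 / 17.5 = 1185.714
b0 = 20083.333 - 1185.714 * 3.5 = 15933.33
October 2020 is the next period, t = 7:
T = 15933.33 + 1185.714 * 7 = 24233.33
$24,233 will be the forecasted sale in the month of October.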
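One common way to obtain a linear trend line is an ordinary least-squares fit over all six months rather than a line through two points. The sketch below shows that calculation; treating October as period 7 is an assumption based on the six observed months being April to September, and the variable names are illustrative.

```python
months = [1, 2, 3, 4, 5, 6]
sales = [17000, 18000, 19500, 22000, 21000, 23000]

n = len(months)
t_bar = sum(months) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept of the trend line T = b0 + b1 * t.
slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(months, sales))
         / sum((t - t_bar) ** 2 for t in months))
intercept = y_bar - slope * t_bar

OCTOBER = 7  # assumed period number for October 2020
forecast_october = intercept + slope * OCTOBER

print(f"T = {intercept:.2f} + {slope:.2f} * t")
print(f"Forecast for period {OCTOBER}: {forecast_october:.2f}")
```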

Month        Sales ($)    Weight    Weighted sales ($)
July         22,000       0.2       4,400
August       21,000       0.3       6,300
September    23,000       0.5       11,500
Total                     1.0       22,200
So, the three-month weighted moving average forecast for the next month (October) is $22,200.
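The weighted moving average is simply the sum of each month's sales multiplied by its weight. A minimal sketch, with illustrative variable names:

```python
# Most recent three months of sales and their weights (heaviest on the latest).
sales = {"July": 22000, "August": 21000, "September": 23000}
weights = {"July": 0.2, "August": 0.3, "September": 0.5}

# The weights of a weighted moving average should sum to 1.
weight_total = sum(weights.values())

forecast = sum(sales[month] * weights[month] for month in sales)

print(f"Weighted moving average forecast: {forecast:.0f}")
```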
Question 1
Quantitative data are data in numerical form, such as figures and percentages. Using these figures, the analyst analyzes the data and assumes that the calculations can produce an accurate conclusion that can be applied to a wider community. Qualitative analysis, on the other hand, examines individual encounters in depth, with the goal of defining and interpreting meaning through language, narration or visual evidence by creating themes specific to that group of participants (Kaushik and Mathur, 2014). Qualitative analysis provides knowledge only about the individual cases examined, although the theories it produces are often more general assumptions. Quantitative techniques may then be used to confirm which of those assumptions are valid.
The answers below are derived from the given question.
The information described above is a variable, and the response on the tax change for the new railroad project is categorical data. On the measurement scale, the variable has two levels, Yes and No, with no specific ranking or order. Hence the measurement scale is nominal.
The graphical presentation shows whether general support is in favor of or against the increase in the development tax for commencing the new railroad project. Categories like these are best presented with a bar chart, so a bar chart is used here.
Looking at the chart, the majority of the votes are not in favor of the law, which means that fewer people support paying the development tax required to start the new railroad project.
Question 2
To describe how much time is spent in weekly meetings, data from 25 observations are used.
Below is the summary of the total time spent by 25 CEOs in meetings each week, measured in hours:
To prepare the frequency distribution and the percentage frequencies, a class width of two hours is used.
Highest value = 23, Lowest value = 12
Below is the chart of the time spent by 25 CEOs in weekly meetings, measured in hours. Because most of the values fall towards the left side of the chart, the distribution is positively skewed (skewed to the right).
Question 3
Each research question below can be addressed with several data collection methods, and likewise several sampling methods are available. For each one, the most suitable collection and sampling method is identified.
Investigate the voting intention of Australian voters in the forthcoming election:
Data collection: Since many people will be involved, a questionnaire is the appropriate way to collect the data, because close-ended answers work well with a large sample.
Definition of Random Sampling: Random sampling is a method in which a sample is drawn at random, without bias towards any member of the population, in order to draw a conclusion.
Significance: For any research in which a large number of people or a large amount of data needs to be investigated, random sampling is the best method, as it is not possible to investigate the entire population to draw a conclusion. In this case, it is not possible to get the views of the entire population of Australia.
Sampling: The random sampling method can be used for this research, as it would not be possible to travel all over Australia asking each citizen about their choice. To draw a conclusion, Australian voters can be picked at random and questioned.
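Simple random sampling can be sketched in a few lines. The example below is illustrative only: the voter list is a hypothetical stand-in for an electoral roll, and the function name and sample size are assumptions, not part of the study design.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size members without replacement, each equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, sample_size)


# Hypothetical sampling frame of 1,000 voters.
voters = [f"voter_{i}" for i in range(1, 1001)]
sample = simple_random_sample(voters, 50, seed=42)

print(len(sample), sample[:3])
```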
Find out the reasons why the top 4 banks are not giving their borrowers the complete interest rate cuts that the Reserve Bank of Australia has introduced:
Data collection: To collect this data we can use the interview method, in which banking and financial experts are interviewed to find out the likely reasons.
Definition of Convenience sampling: Convenience sampling is a method in which the sample is taken from people or groups who are related to the research and can be approached easily. The only criteria are that the people are available and agree to participate.
Significance: In this scenario the information to be gathered relates to banks, and only people from the banking sector can provide inside information. Convenience sampling is therefore suitable, because it draws on people who are related to the research, have information about it and are easily available to be interviewed.
Sampling: Convenience sampling can be used to carry out this research. For this objective, any expert in this field who is easy to approach can be interviewed to find out about the sentiments of the banks (Palinkas et al., 2015).
Understand the demographic profile of the Melbourne community within the City of Hume council area:
Data collection: To understand this, we can conduct a survey in that area and collect the required details from the people residing there.
Definition of stratified sampling: Stratified sampling is a method in which the entire population is divided into smaller groups (strata) before sampling. People with a common characteristic are placed in the same group, and samples are then taken at random from each group.
Significance: In this scenario specific information is required from people within the community. Stratified sampling is well suited to gathering it: the community is divided into groups of people who share a common characteristic, and once the groups are formed, samples are taken at random from within each group to reach a conclusion.
Sampling: For this research the stratified sampling method should be used. People with similar characteristics are gathered into subgroups, so that it becomes easy to build up the demographic profile.
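The stratified procedure described above (group by a shared characteristic, then sample at random within each group) can be sketched as follows. The resident records, the age-group strata and the 10% sampling fraction are all hypothetical and chosen only to illustrate the technique.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, fraction, seed=None):
    """Randomly sample the same fraction of records from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical community of 1,000 residents split into three age groups.
groups = ["0-17"] * 200 + ["18-64"] * 600 + ["65+"] * 200
residents = [{"id": i, "age_group": g} for i, g in enumerate(groups)]

sample = stratified_sample(residents, lambda r: r["age_group"], 0.10, seed=1)

per_group = defaultdict(int)
for person in sample:
    per_group[person["age_group"]] += 1
print(dict(per_group))
```

Because each stratum is sampled in proportion to its size, the sample keeps the same age-group mix as the population.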
Take adults' opinions to find out whether the use of marijuana in Australia should be legalized.
Data collection: The focus group method can be used here, as views can be taken from adults on legalizing the use of marijuana in Australia. Because the method involves surveying, interviewing and observing the focus group, the adult group can be surveyed, interviewed and observed.
Sampling: In this research the random sampling method can be used, which means taking the opinions of randomly selected adults.
Estimate the average age of children in the city of Melbourne:
Data collection: Records and documents can be used to find the average age of the children. These include census reports or similar records that help in finding the average age of the children.
Sampling: The stratified sampling method can be used, in which the children's data are separated from the complete data set and sampled as their own group.
Question 4
To find out whether watching television is associated with an increase in weight, we can use a scatterplot that lays out the relationship between time spent in front of the television and being overweight. In the overweight data, a negative number is recorded for any underweight child.
To calculate the correlation coefficient between the variables we can use the Data Analysis ToolPak in Excel.
Definition of Correlation coefficient: It is a statistical measure of the strength of the linear relationship between the movements of two variables. Its value ranges between -1.0 and 1.0. If a calculated value is greater than 1.0 or less than -1.0, the correlation has been measured incorrectly. A value of -1.0 reflects a perfect negative correlation, and a value of 1.0 reflects a perfect positive correlation. A value of 0.0 reflects no linear relationship between the movements of the two variables.
The correlation coefficient is 0.89119, which means that there is a strong positive relationship between Overweight (kg) and Television (hours). As the hours of watching television increase, overweight tends to increase, and vice versa, reflecting a linear relationship between the two variables.
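The definition above corresponds to the Pearson correlation coefficient, which can be computed directly from its formula. The sketch below uses small made-up data sets purely to illustrate the two extreme cases (+1.0 and -1.0); they are not the study's television and overweight data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Illustrative data only: a perfectly rising and a perfectly falling line.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_rising = pearson_r(hours, rising)
r_falling = pearson_r(hours, falling)
print(r_rising, r_falling)
```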
The regression output from Excel gives the following results:
The estimated regression equation:
Overweight = -11.069 + 0.434*Television
Explanation:
Intercept: When the hours of television are zero, the expected overweight is -11.0691 kg on average, i.e. the child would be expected to be underweight by about 11.07 kg.
Slope: Overweight increases by 0.434 kg on average for each additional hour of television.
The coefficient of determination is 0.7942, which means that 79.42% of the variation in overweight is explained by the total hours spent in front of the television, so the model fits reasonably strongly.
As per the summary output, the t value is 7.083 and the P value is 0.000. Since the P value is less than 0.05, the television variable is statistically significant, and hence there is a relationship between overweight and television.
As per the summary output, the standard error is 1.5894. The P value of the F statistic is 0.000. As this P value is less than the 0.05 significance level, the overall model is statistically significant. The R-square value is also used to judge the model's fit; because it is 0.7942, the model is a good fit.
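The fitted line can be turned into a small prediction function, and the reported figures can be cross-checked against each other. This is a sketch under one stated assumption: the intercept is taken as -11.069 (negative), consistent with the explanation that overweight goes down by 11.0691 when television hours are zero. The 40-hour input is a hypothetical value chosen for illustration.

```python
INTERCEPT = -11.069   # assumed negative, per the intercept explanation
SLOPE = 0.434         # kg of overweight per additional hour of television
CORRELATION = 0.89119


def predict_overweight(tv_hours):
    """Predicted overweight (kg) for a given number of television hours."""
    return INTERCEPT + SLOPE * tv_hours


# In simple linear regression, R-square is the square of the correlation.
r_squared = CORRELATION ** 2

example_hours = 40    # hypothetical input
print(f"Predicted overweight at {example_hours} hours: "
      f"{predict_overweight(example_hours):.3f} kg")
print(f"R-square from correlation: {r_squared:.4f}")
```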
References:
Kaushik, M. and Mathur, B., 2014. Data analysis of students marks with descriptive statistics. International Journal on Recent and Innovation Trends in Computing and Communication, 2(5), pp.1188-1190.
Kim, T.K., 2017. Understanding one-way ANOVA using conceptual figures. Korean Journal of Anesthesiology, 70(1), p.22.
Palinkas, L.A., Horwitz, S.M., Green, C.A., Wisdom, J.P., Duan, N. and Hoagwood, K., 2015. Purposeful sampling for qualitative data collection and analysis in mixed method implementation research. Administration and Policy in Mental Health and Mental Health Services Research, 42(5), pp.533-544.
Suri, H., 2011. Purposeful sampling in qualitative research synthesis. Qualitative Research Journal, 11(2), p.63.
Taherdoost, H., 2016. Sampling methods in research methodology; how to choose a sampling technique for research. How to Choose a Sampling Technique for Research (April 10, 2016).