Workshop #4 Exercise- #2.2-2.4: (3 points for each subquestion)
Workshop #4 Exercise- #3: (3 points for each subquestion)
I have attached the assignment that contains the questions, along with the files with the data needed (users and sessions) to complete it. For question 13, he strongly recommended the following steps:
1. Filter the sessions file so it only includes action_type == "booking_request".
2. Count booking_request grouped by user_id.
3. Filter the output on count_booking_request == 1.
4. Filter the sessions table using the user_ids you found in #3.
5. Count the action_detail == "view_listing" grouped by user_id.
6. Use summary() to find the summary statistics.
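The six steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: the rows of the toy `sessions` data frame are invented, and only the column names (user_id, action_type, action_detail) come from the steps themselves. The real sessions file would be read in instead.

```r
# Toy stand-in for the sessions file (hypothetical rows; only the column
# names user_id, action_type and action_detail come from the steps above).
sessions <- data.frame(
  user_id       = c("u1", "u1", "u1", "u2", "u2", "u2", "u3", "u3"),
  action_type   = c("booking_request", "view", "view",
                    "booking_request", "booking_request", "view",
                    "booking_request", "view"),
  action_detail = c("pending", "view_listing", "view_listing",
                    "pending", "pending", "view_listing",
                    "pending", "view_listing"),
  stringsAsFactors = FALSE
)

# Steps 1-2: keep booking requests and count them per user.
bookings <- sessions[sessions$action_type == "booking_request", ]
booking_counts <- as.data.frame(table(user_id = bookings$user_id),
                                responseName = "count_booking_request",
                                stringsAsFactors = FALSE)

# Step 3: users with exactly one booking request.
one_booking <- booking_counts$user_id[booking_counts$count_booking_request == 1]

# Step 4: all sessions that belong to those users.
sessions_one <- sessions[sessions$user_id %in% one_booking, ]

# Step 5: count view_listing actions per user.
views <- sessions_one[sessions_one$action_detail == "view_listing", ]
view_counts <- as.data.frame(table(user_id = views$user_id),
                             responseName = "count_view_listing",
                             stringsAsFactors = FALSE)

# Step 6: summary statistics of the per-user counts.
summary(view_counts$count_view_listing)
```

Note that a user with one booking request but no view_listing rows would not appear in `view_counts` at all; whether such users should count as zero is a decision the assignment would have to settle.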
The Fat data
The Fat data contains the age, weight, height, and ten body circumference measurements for 252 men. Each man’s percentage of body fat was accurately estimated by an underwater weighing technique.
The data frame contains the following variables:
brozek: Percent of body fat using Brozek's equation, 457/Density - 414.2
siri: Percent body fat using Siri's equation, 495/Density - 450
density: Density (g/cm^3)
age: Age (yrs)
weight: Weight (lbs)
height: Height (inches)
adipos: Adiposity index = Weight/Height^2 (kg/m^2)
free: Fat Free Weight = (1 – fraction of body fat) * Weight, using Brozek’s formula (lbs)
neck: Neck circumference (cm)
chest: Chest circumference (cm)
abdom: Abdomen circumference (cm) at the umbilicus and level with the iliac crest
hip: Hip circumference (cm)
thigh: Thigh circumference (cm)
knee: Knee circumference (cm)
ankle: Ankle circumference (cm)
biceps: Extended biceps circumference (cm)
forearm: Forearm circumference (cm)
wrist: Wrist circumference (cm) distal to the styloid processes
You can access the data using the following statement: data(fat, package = "faraway")
Question 1
Fit a regression model with the brozek variable (percent of body fat) as a response and the following six predictors: age, neck, abdom, thigh, forearm and wrist.
Show the summary. Which predictors are significant at the 0.05 level?
Question 2
Provide an interpretation of the coefficient of each significant predictor.
Hints: See Lesson 3, Slide 49 and Slide 58.
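The pattern behind Questions 1 and 2 can be sketched on R's built-in mtcars data, in the same spirit as the hints that use cars. The model below (mpg on wt, hp and qsec) is a stand-in chosen for illustration, not the model the assignment asks for.

```r
# Stand-in model on built-in data (an assumption for illustration only).
lmod <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# The coefficient table holds estimates, standard errors, t values, p-values.
coefs <- summary(lmod)$coefficients
coefs

# Names of the terms whose p-value is below 0.05.
sig <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]
sig

# Estimates for those terms only.
coefs[sig, "Estimate"]
```

Each slope is read as the expected change in the response for a one-unit increase in that predictor, with the other predictors held fixed.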
Question 3
Compute the median value of each of the six predictors. Store the medians in a variable named x0 and show the values.
Hint: See Lesson 4, Slide 18.
Question 4
Construct a confidence interval of the mean response based on the median values that you stored in x0.
Hint: See Lesson 4, Slide 20.
Question 5
Construct a prediction interval of the next response value based on the median values that you stored in x0.
Hint: See Lesson 4, Slide 20.
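Questions 3 to 5 follow one pattern: take the medians of the predictors, then ask predict() for two kinds of interval at that point. A minimal sketch on the built-in mtcars data is shown below; the two-predictor model is a stand-in for illustration, and the names new_obs, ci_mean and pi_next are hypothetical.

```r
# Stand-in model on the built-in mtcars data (an assumption for illustration;
# the assignment itself uses the fat data and six predictors).
lmod <- lm(mpg ~ wt + hp, data = mtcars)

# Median of every column of the model matrix; the intercept column is all
# ones, so its median is 1.
x0 <- apply(model.matrix(lmod), 2, median)
x0

# predict() expects a data frame of predictor values without the intercept.
new_obs <- data.frame(t(x0[-1]))

# Interval for the mean response at x0.
ci_mean <- predict(lmod, newdata = new_obs, interval = "confidence")

# Interval for a single new observation at x0.
pi_next <- predict(lmod, newdata = new_obs, interval = "prediction")

# Widths of the two intervals, for comparison.
width_ci <- unname(ci_mean[, "upr"] - ci_mean[, "lwr"])
width_pi <- unname(pi_next[, "upr"] - pi_next[, "lwr"])
c(confidence = width_ci, prediction = width_pi)
```

Both intervals share the same fitted value; the prediction interval additionally accounts for the error variance of a single new observation.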
Question 6
Which of the two intervals is wider?
Question 7
Construct a confidence interval of the outcome variable for a person with the following characteristics:
Age: 49 years
Neck circumference: 40 cm
Abdomen circumference: 95 cm
Thigh circumference: 60 cm
Forearm circumference: 31 cm
Wrist circumference: 19.5 cm
Hints:
You can store the predictor values in a new variable named x1. Here is an example of such a variable:
x1 <- c("(Intercept)" = 1, age = 25, neck = 34, abdom = 84, forearm = 25, wrist = 25)
Note that the intercept should be 1, but you will need to update the values of the predictors.
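A named vector such as x1 can be used in two ways: multiplied against the coefficients for a point estimate, or turned into a one-row data frame for predict(). The sketch below uses the built-in mtcars data and made-up predictor values, so both the model and the numbers in x1 are assumptions for illustration.

```r
# Stand-in model on built-in data (an assumption for illustration only).
lmod <- lm(mpg ~ wt + hp, data = mtcars)

# One entry per coefficient, in the same order as coef(lmod).
# The predictor values here are hypothetical.
x1 <- c("(Intercept)" = 1, wt = 3, hp = 120)
stopifnot(identical(names(x1), names(coef(lmod))))

# Point estimate by hand: sum of value times coefficient.
fit_manual <- sum(x1 * coef(lmod))

# Same point through predict(), which also returns the interval.
# The intercept entry is dropped because predict() adds it itself.
ci_x1 <- predict(lmod, newdata = data.frame(t(x1[-1])),
                 interval = "confidence", level = 0.95)
ci_x1
```

The check on names(x1) matters in practice: a vector that omits one of the model's predictors would silently give a wrong hand-computed estimate.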
Please help me solve the problems in the following document, and provide the answer with pdf form. Thank you!
R questions; will provide more data once accepted.
R workshop – Data Visualization Exercise #1-#8 (2 points each)
Workshop #4 Exercise- #1
Exercise- #2.1
The data set is at the link below, together with its description. You need to use kNN (regression or classification) on the data, and you need to know how to use R and R Markdown to work on this report.
https://drive.google.com/drive/folders/1e17NNZPVSc…
1. Given the code below, provide the fix to it so error message does not appear.
> x <- c(8, 5, 9, NA)
> for(i in seq_along(x))
+ {
+ if(x[i] = 5) { cat(i, "n") }
Error: unexpected '=' in:
"{
if(x[i] ="
> }
Error: unexpected '}' in "}"
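One possible fix is sketched below. Two things change: the comparison uses == instead of =, and the test is wrapped in isTRUE() so that the NA in the last position does not trigger a "missing value where TRUE/FALSE needed" error once the syntax error is gone. The "\n" in cat() assumes the "n" in the posted code was meant to be a newline escape.

```r
x <- c(8, 5, 9, NA)

for (i in seq_along(x)) {
  # == compares values; a bare = inside if() is a syntax error.
  # isTRUE() turns the NA comparison at x[4] into FALSE instead of an error.
  if (isTRUE(x[i] == 5)) {
    cat(i, "\n")
  }
}
```

An alternative with the same effect is to test `!is.na(x[i]) && x[i] == 5`.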
2. Given the function below, provide the fix so error does not appear:
> printmessage <- function(x) {
+ if(x > 0)
+ print("x is greater than zero")
+ else
+ print("x is less than or equal to zero")
+ invisible(x)
+ }
But when we call:
> printmessage(NA)
Error in if (x > 0) print("x is greater than zero") else print("x is less than or equal to zero"): missing value where TRUE/FALSE needed
For the submission send the corrected function in which you fix the problems above, so errors do not occur.
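A minimal corrected version might check for a missing value before comparing. The wording of the message printed for NA is an assumption, since the exercise does not say what should happen in that case.

```r
printmessage <- function(x) {
  if (is.na(x)) {
    # Handle the missing value first so that x > 0 is never evaluated on NA.
    # The message text is an assumption; the exercise does not specify it.
    print("x is a missing value")
  } else if (x > 0) {
    print("x is greater than zero")
  } else {
    print("x is less than or equal to zero")
  }
  invisible(x)
}

printmessage(NA)
printmessage(3)
printmessage(-1)
```

The function still returns its argument invisibly, as in the original, so existing callers are unaffected.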
Please submit (i) a word file explaining in detail your answers to each question (you can use screenshots of the R to explain your answers) AND (ii) an R file with a separation for each question. For each question, make sure you develop the model and present the simulation results – the R file should be self-explanatory. The assessment of your work will include both the accuracy and the clarity of your word file and the R Code.
Part 1: The Cheddar Cheese Study
In a study of cheddar cheese from the LaTrobe Valley of Victoria, Australia, samples of cheese were analyzed for their chemical composition and were subjected to taste tests. Overall taste scores were obtained by combining the scores from several tasters.
The cheddar dataset has 30 observations on the following four variables:
taste: a subjective taste score
Acetic: concentration of acetic acid (log scale)
H2S: concentration of hydrogen sulfide (log scale)
Lactic: concentration of lactic acid
Use the following statement to access the data: data("cheddar", package = "faraway")
Question 1.1:
Show the first six observations of the dataset.
Hint: Use the head() function. For example: head(cars)
Question 1.2:
Show descriptive statistics for each of the variables.
What is the mean subjective taste score?
Hint: Use the summary function. For example: summary(cars)
Question 1.3:
Show a histogram for each of the variables. Make sure to label the x-axis of each histogram.
Hint: Use the hist function. For example: hist(cars$speed, xlab = "Speed (mph)", main = "")
Question 1.4:
Fit a regression model with taste as the response and no predictors.
What is the value of the intercept? What does it represent?
For example:
lmodNoX <- lm(speed ~ 1, data = cars)
summary(lmodNoX)
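Continuing the cars example from the hint, a short check shows what the intercept of a model with no predictors is numerically equal to:

```r
# Intercept-only model on the built-in cars data, as in the hint above.
lmodNoX <- lm(speed ~ 1, data = cars)

# The single fitted coefficient.
coef(lmodNoX)

# The sample mean of the response, for comparison.
mean(cars$speed)
```

The two printed numbers agree, because least squares with only a constant term is minimized by the sample mean of the response.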
Question 1.5:
Fit a regression model with taste as the response and Acetic as the only predictor.
Is the model statistically significant at the 5% level?
Hint: See Lesson 2, Slide 51.
Question 1.6:
Calculate the p-value of the entire model you created in Question 1.5 using the anova() function.
Hint: See Lesson 3, Slide 17.
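The idea of testing a whole model with anova() can be sketched by comparing it against the intercept-only model. The example uses the built-in cars data as a stand-in, and the object names are hypothetical.

```r
# Null model (no predictors) and a one-predictor model on built-in data.
lmod_null <- lm(dist ~ 1, data = cars)
lmod_one  <- lm(dist ~ speed, data = cars)

# F-test comparing the two nested models.
cmp <- anova(lmod_null, lmod_one)
cmp

# p-value of the F-test, taken from the second row of the table.
p_anova <- cmp[["Pr(>F)"]][2]

# With a single predictor this equals the p-value of the slope's t-test.
p_slope <- summary(lmod_one)$coefficients["speed", "Pr(>|t|)"]
c(anova = p_anova, slope = p_slope)
```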
Question 1.7:
Fit a regression model with taste as the response and the three chemical contents as predictors (Acetic, H2S, and Lactic).
Which predictors are statistically significant at the 5% level?
Hint: See Lesson 3, Slide 16.
Question 1.8:
Use the anova() function to recalculate the significance of the Acetic variable as shown in the output of Question 1.7.
Hints: Use the anova function to compare the three predictor model with a model that does not include Acetic. See Lesson 3, Slide 19.
Question 1.9:
Test the hypothesis that the coefficients of Acetic and H2S both equal 0 when Lactic is included in the model.
Should we reject this hypothesis?
Hint: Lesson 3, Slide 22.
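Questions 1.8 and 1.9 both compare nested models with anova(). A minimal sketch on the built-in mtcars data follows; the choice of mpg, wt, hp and qsec is an assumption for illustration and stands in for taste, Lactic, Acetic and H2S.

```r
# Full three-predictor model on built-in data (stand-in for illustration).
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Dropping one predictor: the F-test reproduces that predictor's t-test.
no_hp <- lm(mpg ~ wt + qsec, data = mtcars)
p_f <- anova(no_hp, full)[["Pr(>F)"]][2]
p_t <- summary(full)$coefficients["hp", "Pr(>|t|)"]
c(anova = p_f, t_test = p_t)

# Dropping two predictors at once: a joint test that both coefficients
# are zero while the remaining predictor stays in the model.
only_wt <- lm(mpg ~ wt, data = mtcars)
joint <- anova(only_wt, full)
joint
```

In the joint comparison the Df column shows 2, one for each coefficient set to zero under the null hypothesis; the hypothesis is rejected at the 5% level when the reported Pr(>F) is below 0.05.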
Part 2: Study of Teenage Gambling in Britain
The teengamb dataset contains a survey conducted to study teenage gambling in Britain. The dataset has 47 observations and five variables:
sex: 0 = male, 1 = female
status: Socioeconomic status score based on parents’ occupation
income: income in pounds per week
verbal: verbal score in words out of 12 correctly defined
gamble: expenditure on gambling in pounds per year
Use the following statement to access the data: data("teengamb", package = "faraway")
Question 2.1:
Convert the sex variable into a factor and label the levels (male and female).
Hint: See Lesson 3, Slide 48.
Question 2.2:
Show the number of males and the number of females.
Hint: Use the summary function. See Lesson 1, Slide 52.
Question 2.3:
Fit a model with gamble as the response and income, verbal and sex as predictors.
Which variables are statistically significant at the 5% level?
Provide an interpretation of the coefficients of the significant variables.
Hints: See Lesson 3, Slide 49 and Slide 58.
Question 2.4:
Use the confint function to produce 95% confidence intervals for the coefficients based on the same model.
Can you deduce which coefficients are significant at the level of 5% based on the intervals?
Hint: See Lesson 3, Slide 26, the last code for the whole model.
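The steps in Part 2 (recode a 0/1 variable as a labelled factor, count the levels, fit the model, read significance off the confidence intervals) can be sketched on the built-in mtcars data, where am is coded 0/1 much like sex in teengamb. The copy named cars2 and the model formula are assumptions for illustration.

```r
# Work on a copy so the built-in data set is left untouched.
cars2 <- mtcars

# Recode the 0/1 variable as a factor with readable labels.
cars2$am <- factor(cars2$am, levels = c(0, 1),
                   labels = c("automatic", "manual"))

# For a factor, summary() returns the count per level.
summary(cars2$am)

# Model with two numeric predictors and the factor.
lmod <- lm(mpg ~ wt + hp + am, data = cars2)

# 95% confidence intervals for all coefficients.
ci <- confint(lmod, level = 0.95)
ci

# A coefficient is significant at 5% exactly when its 95% interval
# does not contain zero.
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0
p_below_05 <- summary(lmod)$coefficients[, "Pr(>|t|)"] < 0.05
cbind(excludes_zero, p_below_05)
```

The coefficient on the factor (here ammanual) is the expected difference in the response between the labelled level and the reference level, with the other predictors held fixed.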
a. How many people in this sample have breast cancer? What proportion?
b. What proportion of respondents are missing information for household income?
c. Why do the proportions change when we use the option “, missing” in the tab command?
d. What is the mean BMI in the sample? What is the standard deviation? What are the minimum and maximum values?
e. Are there any participants missing information for BMI? How can you tell?
Please answer the questions according to the document provided.