Site icon UK Essayz

RStudio Data Analysis

Write the code in RStudio to solve the questions. Post all the codes in a document. 

1. In a study examining smoking and lung cancer, a random sample of men between the ages of 55 and 60 was obtained. The smoking and disease status of each sampled subject was ascertained. For each subject, a 1 is assigned if the subject had lung cancer (case) and a 0 if not. Similarly, a 1 indicates that a subject is a smoker and a 0 indicates a nonsmoker. The data are found in the Excel file LungCancer. ¢ Read the data into R, and use table() function to produce a contingency table summarizing these data. ¢ Assuming that there is no association between smoking and lung cancer, compute a table of expected counts. ¢ By hand, compute the observed value of the test statistic for testing association between lung cancer and smoking. ¢ Assuming there is no association, what is the distribution of the test statistic? ¢ Using R, compute the p-value for a test of association, and give a detailed conclusion based on the p-value and a comparison of the tables observed and expected counts. 2. The following data are from a study examining the incidence of tuberculosis in relation to blood groups in a sample of Eskimos. It is of interest to determine if there is any association between the disease and blood group within the ABO system. Severity O A AB B Moderate-advanced 7 7 7 13 Minimal 27 34 12 18 Not Present 55 52 11 24 ¢ Assuming that there is no association between disease and blood group, compute a table of expected counts. ¢ By hand, compute the observed value of the test statistic for testing association between disease and blood group. ¢ Assuming there is no association, what is the distribution of the test statistic? ¢ Using R, compute the p-value for a test of association, and give a detailed conclusion based on the p-value and a comparison of the tables observed and expected counts. 
4. The file growth gives data on the height of a white spruce tree measured annually for 50 years. Letting Yt denote the height of the tree at year t > 0, we consider describing the growth of the tree over time with a non-linear model Yt = f(t) + t , t iidˆ¼ N(0, σ2 ). Three growth curves are considered for f(t) (a) Logistic: f(t) = a/(1 + b ˆ— exp{ˆ’ct}) (b) Gompertz: f(t) = a exp{ˆ’b exp{ˆ’ct}} (c) Von Bertalanffy: f(t) = a ˆ’ a exp{ˆ’b(t + c)} ¢ Fit all three models using the non-linear least squares function nls() in R. Explain how you are choosing the starting values for nls() in each case. Produce a figure depicting the estimated curves all on the same plot, along with the observed data. Be sure to include a legend to distinguish the different curves. ¢ For each of the three models, give a 95% confidence interval for limt†’ˆžf(t). What does this represent? ¢ Select the best of the three models, and plot an estimate of the derivative df(t) dt , which represents the rate of growth over time.