Data Science in R (RStudio)


3. Using the iris dataset:
a) Combine the Setosa and Versicolor species into group “0” and label Virginica as group “1”. Store these labels in a new variable called iris$Group.

b) Build a logistic regression model, using any of the available variables as predictors, that predicts whether an observation is Virginica (a value of 1 in the Group variable).
c) Calculate the probability that a new plant is Virginica, given the following measurements:

Sepal.Width = 5, Petal.Length = 10, Petal.Width = 7, Sepal.Length = 9
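A minimal base-R sketch of parts a) to c), assuming the built-in iris data (where Species has the lowercase levels "setosa", "versicolor" and "virginica") and assuming all four measurements are used as predictors, which the exercise leaves open. glm() with family = binomial fits the logistic regression, and predict() with type = "response" returns a probability rather than log-odds:

```r
data(iris)

# a) 1 for virginica, 0 for setosa and versicolor
iris$Group <- ifelse(iris$Species == "virginica", 1, 0)
table(iris$Group)  # 100 observations in group 0, 50 in group 1

# b) Logistic regression of Group on the four measurements (assumed predictor set)
model_iris <- glm(Group ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                  data = iris, family = binomial)
summary(model_iris)

# c) The new plant; column names must match the model's predictors
new_plant <- data.frame(Sepal.Width = 5, Petal.Length = 10,
                        Petal.Width = 7, Sepal.Length = 9)

# Probability that the new plant is Virginica
p_virginica <- predict(model_iris, newdata = new_plant, type = "response")
p_virginica
```

The new measurements lie well outside the range observed in iris (the largest Petal.Length there is 6.9, for example), so the predicted probability is an extrapolation and should be read with caution.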

4. Using the kyphosis dataset:
a) Convert the kyphosis$Kyphosis variable to numeric, assigning 1 to “present” and 0 to “absent”.
b) Build a logistic regression model using all other variables as predictors and estimate the probability of an observation having kyphosis “present”. What can you say about the coefficients? Are they significant?
c) Calculate the probability of kyphosis being “present” for the following observation: Age = 50, Start = 10, Number = 5.
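A sketch of parts a) to c), assuming the kyphosis data from the rpart package (a recommended package included with standard R installations), in which Kyphosis is a factor with levels "absent" and "present" and Age is measured in months. Coefficients from glm() are on the log-odds scale, so a positive estimate means the probability of "present" increases with that variable when the others are held fixed. The 0.05 cutoff used below to flag significance is a common convention, not something the exercise specifies:

```r
data(kyphosis, package = "rpart")

# a) 1 if kyphosis is present, 0 if absent
kyphosis$Kyphosis <- as.numeric(kyphosis$Kyphosis == "present")
table(kyphosis$Kyphosis)

# b) Logistic regression on all other variables
model_kyph <- glm(Kyphosis ~ Age + Number + Start, data = kyphosis, family = binomial)
coef_table <- summary(model_kyph)$coefficients
coef_table  # estimates, standard errors, z values and Wald p-values

# Predictors with a p-value below the assumed 0.05 cutoff
rownames(coef_table)[-1][coef_table[-1, "Pr(>|z|)"] < 0.05]

# Estimated probability of "present" for the first few observations
head(fitted(model_kyph))

# c) Probability of "present" for Age = 50, Start = 10, Number = 5
new_obs <- data.frame(Age = 50, Start = 10, Number = 5)
p_present <- predict(model_kyph, newdata = new_obs, type = "response")
p_present

# Same value by hand: inverse logit of the linear predictor
# (coefficient order is intercept, Age, Number, Start)
plogis(sum(coef(model_kyph) * c(1, 50, 5, 10)))
```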

5. Using all the single-variable regressions from Exercise 1, test whether each variable pair is homoscedastic or heteroscedastic. Plot your findings using the plot(x = my_x_variable, y = my_y_variable, type = "p") function, using my_data$variable_name to define the x and y variables.
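Exercise 1 is not reproduced here, so the sketch below uses the built-in cars data as a hypothetical stand-in for my_data; replace speed and dist with each variable pair from Exercise 1. It pairs the requested scatter plot with a residual plot and a studentized (Koenker) Breusch-Pagan test computed by hand in base R. The bptest() function in the lmtest package implements the same test, if that package is installed:

```r
# Hypothetical stand-in for the Exercise 1 data: the built-in cars data
my_data <- cars

# Single-variable regression of stopping distance on speed
fit <- lm(dist ~ speed, data = my_data)

# Scatter plot of the variable pair with the fitted line
plot(x = my_data$speed, y = my_data$dist, type = "p",
     xlab = "speed", ylab = "dist")
abline(fit)

# Residuals against the predictor: a fan shape suggests heteroscedasticity
plot(x = my_data$speed, y = resid(fit), type = "p",
     xlab = "speed", ylab = "residual")
abline(h = 0, lty = 2)

# Studentized Breusch-Pagan test: regress squared residuals on the predictor;
# under homoscedasticity, n * R^2 is approximately chi-squared with 1 df
my_data$sq_resid <- resid(fit)^2
aux <- lm(sq_resid ~ speed, data = my_data)
bp_stat <- nrow(my_data) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

# A small p-value (e.g. below 0.05) is evidence against homoscedasticity
c(statistic = bp_stat, p.value = bp_p)
```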