# 4110/7110 Lecture 28 -- UNIVARIATE DATA

# R functions used in this lecture
# (listed as comments so the script does not print function definitions)
#
# par(mfrow=c(2,2))   split the plotting region into a 2 x 2 grid
# windows()           open a new graphics window (Windows only)
# x11()               open a new graphics window (X11 systems)
#
# table       frequency table
# factor      encode a vector as a factor
# as.factor   coerce its argument to a factor
# barplot     bar plot
# boxplot     box plot
# bwplot      trellis box plot(s) (lattice package)
# pie         pie chart
# dotchart    dot plot
# stem        stem-and-leaf plot
# hist        histogram
# histogram   trellis histogram(s) (lattice package)
# plot        generic plot function
# rnorm       random draws from a normal distribution (standard normal by default)
# mean        calculate the mean
# median      calculate the median
# range       calculate the minimum and maximum
# sd          calculate the standard deviation
# var         calculate the variance
# length      number of elements
# quantile    compute quantiles
# IQR         compute the interquartile range
# cor         compute correlations
# summary     generic function that summarizes an object
# fivenum     min, lower hinge, median, upper hinge, max
# apropos     returns a character vector giving the names of
#             all objects in the search list matching a pattern
# scan()      read data
# data()      list all available data sets
# attach      attach a data frame (or list) to the search path
# detach      detach a data frame (or list) from the search path

########################################################
# Categorical Data

x = c("Yes","No","No","Yes","Yes")
table(x)
x
factor(x)
as.factor(x)

# Factors
1:5
mean(1:5)
factor(1:5)
mean(factor(1:5))     # NA with a warning: a factor has no mean
letters[1:5]
factor(letters[1:5])
factor(LETTERS[1:5])

# Bar Charts
# scan() reads from the console: paste the data line after the call,
# then enter a blank line to end the input
beer = scan()
3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1

barplot(beer)                      # not what we want: one bar per observation
barplot(table(beer))               # correct: call with summarized data
barplot(table(beer)/length(beer))  # divide by n for proportions

par(mfrow=c(3,1))                  # stack the three versions for comparison
barplot(beer)
barplot(table(beer))
barplot(table(beer)/length(beer))

plot(1:10)
?abline
abline(a=0,b=1)              # intercept 0, slope 1: the line through the 1:10 points
abline(a=1,b=1, col="red")   # same slope, shifted up by 1

windows()                    # Windows only; use x11() on X11 systems or dev.new() anywhere
plot(rnorm(10))
abline(-1.5,1)               # line with intercept -1.5 and slope 1
abline(h=0, col="green")     # horizontal line at 0, as used in residual plots

# Pie Charts
# par(mfrow=c(2,2))
?pie
beer.counts = table(beer)    # store the table result
pie(beer.counts)             # first pie -- kind of dull
names(beer.counts) = c("Domestic\n can","Domestic\n bottle",
                       "Microbrew","Import")
pie(beer.counts)             # prints out names
pie(beer.counts,col=c("red","green","yellow","white"))  # now with colors

# Dot Charts
?dotchart    # plot of either a vector or a matrix of numeric values
VADeaths     # built-in data set: death rates per 1000 in Virginia in 1940
is.vector(VADeaths)
is.matrix(VADeaths)
dotchart(VADeaths, main = "Death Rates in Virginia - 1940")
dotchart(t(VADeaths), xlim = c(0,100),
         main = "Death Rates in Virginia - 1940")
# t: matrix transpose
t(VADeaths)

#################################################################
# Numerical Data

sals = scan()           # read in with scan; a blank line ends the input
12 0.4 5 2 50 8 3 1 4 0.25

mean(sals)              # the average
var(sals)               # the variance
sd(sals)                # the standard deviation
median(sals)            # the median
summary(sals)

data = c(10, 17, 18, 25, 28, 28)
summary(data)
quantile(data,.25)
quantile(data,c(.25,.75))   # two values of p at once

mean(sals,trim=1/10)    # trim 1/10 of the observations off each end
mean(sals,trim=2/10)
IQR(sals)

# Stem-and-leaf plots
scores = scan()
2 3 16 23 14 12 4 13 2 0 0 0 6 28 31 14 4 8 2 5

stem(scores)

sals = c(12, .4, 5, 2, 50, 8, 3, 1, 4, .25)    # enter data
cats = cut(sals,breaks=c(0,1,5,max(sals)))      # specify the breaks
cats                                            # view the values
table(cats)                                     # counts per interval
levels(cats) = c("poor","rich","rolling in it") # change labels
table(cats)

# Histograms
x = scan()
29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1

par(mfrow=c(2,2))
hist(x)                    # frequencies
hist(x,probability=TRUE)   # proportions (or probabilities)
rug(jitter(x))             # add tick marks
hist(x,breaks=10)          # about 10 cells (a suggestion only), or just hist(x,10)
hist(x,breaks=c(0,1,2,3,4,5,10,20,max(x)))  # specify break points

# Boxplots
data()                     # list all available data sets
AirPassengers              # print one of the listed data sets
women                      # data 'women' will be used
names(women)
mean(height)               # error: 'height' is not visible until women is attached
mean(women$height)
attach(women)              # to access the names above
mean(height)
boxplot(height,main="Average Height of \n American Women",horizontal=TRUE)
boxplot(weight,main="Average Weight of \n American Women",horizontal=TRUE)
plot(women, xlab = "Height (in)", ylab = "Weight (lb)",
     main = "women data: American women aged 30-39")
detach(women)

# Density Functions
data(faithful)
par(mfrow=c(1,1))
attach(faithful)                  # make eruptions visible
hist(eruptions,15,prob=TRUE)      # proportions, not frequencies
lines(density(eruptions))         # lines makes a curve, default bandwidth
lines(density(eruptions,bw="SJ"),col='red')  # use SJ bandwidth, in red
detach(faithful)
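
#################################################################
# Supplementary sketch (not part of the original lecture code)
# Several functions in the list at the top (fivenum, range, length, cor,
# apropos, and the trellis functions histogram and bwplot) are never
# called in the examples above. The lines below are one possible way to
# try them. Assumptions: only the built-in data sets 'women' and
# 'faithful' are used, and the 'lattice' package, which ships with
# standard R installations, is available.

# Five-number summary, range, and count
fivenum(women$height)      # min, lower hinge, median, upper hinge, max
range(women$height)        # minimum and maximum
length(women$height)       # number of observations

# IQR is the distance between the quartiles returned by quantile()
q = quantile(women$height, c(.25,.75))
q[2] - q[1]                # upper quartile minus lower quartile
IQR(women$height)          # same value

# Correlation between the two columns of 'women'
cor(women$height, women$weight)

# Trellis versions of the histogram and box plot; print() makes sure the
# plots are drawn even when the script is run through source()
library(lattice)
print(histogram(~ eruptions, data = faithful))
print(bwplot(~ height, data = women))

# Look up function names that contain "plot"
apropos("plot")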