Practice Problems

1.9.7 (Page 71): 1.65, 1.66

Notes

For every variable we talk about the variable’s distribution, which means a description of what values the variable takes, and how frequently it takes those values. Variables are visualized differently depending on their type.


Visualizing Categorical Variables

Frequency Table: A table showing each possible value, together with its frequency, i.e. the count of its occurrences. One can also include relative frequencies.

Pie Chart: A circular shape is divided into parts proportional to the relative frequency of each value. Good for showing the relation of each part to the total.

Bar Chart: A rectangular bar for each value, whose height is proportional to the frequency. Good for comparing frequencies of values to each other.

Pareto Chart: A bar chart where the values have been ordered from most frequent to least frequent.
[Figure: Graph types for one categorical variable]


General Health   Frequency   Rel. Frequency
poor                   677          3.385%
fair                  2019         10.095%
good                  5675         28.375%
very good             6972         34.860%
excellent             4657         23.285%
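
Each relative frequency in the table is the count divided by the total number of responses. The following is a minimal Python sketch that recomputes them from the counts above and also produces the ordering a Pareto chart would use. The variable names are illustrative assumptions, not part of the course material.

```python
# Counts copied from the General Health frequency table.
counts = {
    "poor": 677,
    "fair": 2019,
    "good": 5675,
    "very good": 6972,
    "excellent": 4657,
}

total = sum(counts.values())

# Relative frequency = count / total, expressed here as a percentage.
rel_freq = {level: 100 * n / total for level, n in counts.items()}

# Pareto ordering: the same values sorted from most to least frequent.
pareto_order = sorted(counts, key=counts.get, reverse=True)

for level in pareto_order:
    print(f"{level:<10} {counts[level]:>5} {rel_freq[level]:7.3f}%")
```

Sorting by count rather than by the natural order of the health levels is what distinguishes a Pareto chart from an ordinary bar chart.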

Activity: Looking at these graphs and the table, decide for each of the following statements which graph best illustrates it (makes it easier to notice):

Approximately 1 in 4 people answered “good” for their health level.
More people answered “good” than “excellent”.
The difference in count between “fair” and “poor” is about the same as that between “good” and “very good”.
There are roughly 3 times more people who said “good” than those who said “fair”.
About 10% of the population answered “fair”.
About 1 in 3 people answered “very good”.

Visualizing Scalar Variables

Visualizing scalar variables is more challenging. There are too many individual values to consider, and just presenting each value is not all that helpful. For instance, telling someone exactly how many respondents have a height of 62 inches, how many are at 63 inches, and so on is overwhelming and not really useful in terms of establishing trends. We need ways to organize the information into more digestible nuggets.

Here are some of the standard tools for visualizing quantitative information.

Summaries: Numerical summaries can give us some limited but easy-to-work-with information. Frequency tables turn out to be too unwieldy in this case. We will see a number of these summaries, including the mean, median, standard deviation, and others.

Histogram: Values are broken into equally spaced intervals. Draw one bar per interval whose height is proportional to the frequency of values in that range.

Stem-Leaf Plot: Useful for certain kinds of values. Use the first 1-2 digits for the “stem”, then add each value via its “leaf” on the correct stem row.

Box-plot: Visual representation of the “five number summary” that we will talk about later.

Density plot: A continuous line that describes the data a bit like a histogram, only more precisely. We will not be using them in this course, but they are out there and are useful.
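
The histogram and stem-leaf constructions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heights are made-up values, and the helper names `histogram_counts` and `stem_and_leaf` are hypothetical, not taken from any library.

```python
from collections import defaultdict

# Hypothetical heights in inches, made up purely for illustration.
heights = [58, 61, 62, 62, 63, 64, 64, 64, 65, 66, 67, 67, 68, 70, 71, 74]


def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equally spaced intervals [lo, lo + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts


def stem_and_leaf(values):
    """Group two-digit integers by tens digit (stem); the units digit is the leaf."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return dict(rows)


# Four-inch intervals starting at 58: [58,62), [62,66), [66,70), [70,74), [74,78)
bins = histogram_counts(heights, start=58, width=4, n_bins=5)
for i, count in enumerate(bins):
    lo = 58 + 4 * i
    print(f"{lo}-{lo + 4}: {'#' * count}")

for stem, leaves in stem_and_leaf(heights).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

The text histogram shows the shape of the data at a glance, while the stem-leaf rows keep every individual value visible. That is why the stem-leaf plot only works well for small data sets with suitable values.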
[Figure: Graph types for one quantitative variable]


We often consider the values of a quantitative variable separately for each group defined by a categorical variable, and in this case variants of the above graphs can help.


[Figure: Graph types for a quantitative variable broken down by a categorical variable]
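
Breaking a quantitative variable down by a categorical one amounts to splitting the values into one list per group and then summarizing or graphing each list separately. A minimal Python sketch, assuming made-up records that pair a health level with a height in inches:

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (group, value) records, made up for illustration:
# a general-health level paired with a height in inches.
records = [
    ("good", 64), ("good", 67), ("good", 70),
    ("fair", 62), ("fair", 66),
    ("excellent", 65), ("excellent", 68), ("excellent", 69), ("excellent", 72),
]

# Split the quantitative values by the categorical variable.
by_group = defaultdict(list)
for group, value in records:
    by_group[group].append(value)

# One summary per group; side-by-side box plots or stacked histograms
# would be drawn from these same per-group lists.
summary = {
    group: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
    for group, vals in by_group.items()
}

for group, stats in summary.items():
    print(group, stats)
```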


When visualizing scalar variables, there is some terminology we use and patterns we look for:

Mode: A mode refers to a distinct section of the data that “stands out” as a spike in the graph. It need not be a single value, more of a tendency for values to concentrate around that point. A graph with a single mode is called unimodal, one with two modes is called bimodal. When multiple modes are present, they become the main characteristic of the dataset.

Tail: A term referring to the two ends of the data. A long tail shows that the data on that side is spread out and “goes on for a while”.

Skewness:

A skewed left distribution is one where the left tail is longer. This represents a concentration of data (higher bars) to the right.

A skewed right distribution is one where the right tail is longer. This represents a concentration of data (higher bars) to the left.

A symmetric distribution has both tails about the same length. If you look in the middle of a symmetric distribution, then the two sides around the center should be (close to) mirror images of each other.
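
One rough way to put the tail comparison into numbers is to measure how far the smallest and the largest values lie from the median. This is an illustrative assumption, not a method from these notes. The data sets, the helper names, and the 25% tolerance in the Python sketch below are all hypothetical choices.

```python
from statistics import median


def tail_lengths(values):
    """Return (left, right): distance from the median to the min and to the max."""
    m = median(values)
    return m - min(values), max(values) - m


def describe_skew(values, tolerance=0.25):
    """Label the shape by comparing tail lengths; tolerance is an arbitrary cutoff."""
    left, right = tail_lengths(values)
    longer = max(left, right)
    if longer == 0 or abs(left - right) <= tolerance * longer:
        return "roughly symmetric"
    return "skewed left" if left > right else "skewed right"


# Hypothetical data sets, made up for illustration.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 15]
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
left_skewed = [-15, -9, -5, -4, -3, -3, -3, -2, -2, -1]

for name, data in [("right_skewed", right_skewed),
                   ("symmetric", symmetric),
                   ("left_skewed", left_skewed)]:
    print(name, tail_lengths(data), describe_skew(data))
```

Because it uses only the minimum and maximum, this check is very sensitive to a single extreme value, so it is no substitute for looking at the histogram itself.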

Outliers: Any values that seem to deviate from the overall pattern are called outliers. Sometimes these are just values that are too far from the rest. But sometimes they can be outliers for other reasons.

Look for reasons: Whenever you consider a feature, always look for an explanation for it. Is there a good reason why the distribution is skewed right? What are those outliers, do they make sense?

Example: County Data

The following graph contains one data point for each county in the US. The value is the percent of female population in that county. Discuss the pattern of the distribution, and provide possible explanations.



[Figure: Female population proportion in county]


What is the overall pattern? Does it make sense?
Do the average/typical values make sense?
Are there deviations from the pattern? What might explain them?
What further questions might we want to ask? What would we need to do in order to get answers?

The following is a graph of the percent of African-American population in each county. Answer the same questions.


[Figure: African-American population proportion in county]


What is the overall pattern? Does it make sense?
Do the average/typical values make sense?
Are there deviations from the pattern? What might explain them?
Why does this graph differ from the previous one?
What further questions might we want to ask? What would we need to do in order to get answers?