STATISTICS AND ANALYTICS
TOPIC/SUBTOPICS: -
a Definition of data and classification (qualitative, quantitative, discrete and continuous data).
b Data collection tools
iv) Questionnaires.
v) Survey.
vi) Interviews.
vii) Focus group discussion.
1.3 Data cleaning.
Data Collection:
- Statistical Data Collection: This involves gathering information from various sources to analyze and draw conclusions. It can be done through surveys, experiments, observations, or existing datasets.
- Distinguishing Data Types:
- Qualitative Data: Describes qualities or characteristics (e.g., colors, opinions, categories). It’s non-numeric and often represented as labels or categories.
- Quantitative Data:
- Discrete Data: Consists of distinct, separate values (e.g., number of cars, students in a class).
- Continuous Data: Represents measurements on a continuous scale (e.g., height, temperature).
- Problem Statement for Data Collection:
- Clearly define the problem or research question you want to address.
- Specify what data you need to answer that question.
- Root Cause Analysis:
- Collect data that helps identify the underlying reasons behind a problem or issue.
Data Collection Tools:
- Questionnaires: Structured surveys with predefined questions. Useful for collecting standardized responses from a large audience.
- Surveys: Gather information from individuals through interviews or online forms.
- Interviews: Conducted one-on-one or in groups to explore in-depth insights.
- Focus Group Discussions: Bring together a small group to discuss specific topics.
Data Cleaning:
- Data Preprocessing: Refine and prepare data for analysis.
- Steps:
- Handle missing values (impute or remove them).
- Remove duplicates.
- Standardize formats (e.g., date formats, units).
- Correct inconsistencies (e.g., typos).
- Address outliers.
- Ensure data quality and reliability.
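Some of the cleaning steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a general-purpose cleaner: the records, the field names (`name`, `joined`, `height_cm`), the two accepted date formats, and the choice of mean imputation are all hypothetical. Outlier handling and typo correction are left out.

```python
from datetime import datetime

# Hypothetical raw survey records; None marks a missing value.
raw_records = [
    {"name": "Alice", "joined": "2024-01-05", "height_cm": 162.0},
    {"name": "alice ", "joined": "05/01/2024", "height_cm": 162.0},  # same person, messy formatting
    {"name": "Bob", "joined": "2024-02-10", "height_cm": None},      # missing height
    {"name": "Chen", "joined": "2024-03-15", "height_cm": 175.5},
]

def parse_date(text):
    """Standardize two assumed date formats to ISO (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")

def clean(records):
    # Standardize formats: trim and title-case names, normalize dates.
    standardized = [
        {**r, "name": r["name"].strip().title(), "joined": parse_date(r["joined"])}
        for r in records
    ]
    # Remove duplicates, keeping the first occurrence of each record.
    seen, unique = set(), []
    for r in standardized:
        key = (r["name"], r["joined"], r["height_cm"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Impute missing heights with the mean of the known heights.
    known = [r["height_cm"] for r in unique if r["height_cm"] is not None]
    mean_height = sum(known) / len(known)
    return [
        {**r, "height_cm": mean_height if r["height_cm"] is None else r["height_cm"]}
        for r in unique
    ]

cleaned = clean(raw_records)
for record in cleaned:
    print(record)
```

Standardizing before de-duplicating matters here: the two "Alice" rows only become identical once names and dates share one format.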
TOPIC/SUBTOPICS: -
a Descriptive statistics
viii) Data tabulation (frequency table).
ix) Relative frequency table.
b Grouped data
x) Bar graph
xi) Pie chart
xii) Line graph
xiii) Frequency polygon
xiv) Frequency curve
xv) Relative frequency polygon
xvi) Histograms
xvii) Box plot
xviii) Stem-and-leaf plot, to be done
in Microsoft Excel
Bar Chart:
- A bar chart displays categorical data using rectangular bars. Each bar represents a category, and the height of the bar corresponds to the value.
- To create a bar chart in Excel:
- Enter your data in a spreadsheet.
- Select the data range.
- Go to the “Insert” tab.
- Choose “Bar Chart” and select the desired type (e.g., clustered, stacked).
Pie Chart:
- A pie chart shows the proportion of different categories within a whole.
- To create a pie chart in Excel:
- Enter your data.
- Select the data range.
- Go to the “Insert” tab.
- Choose “Pie Chart.”
Histogram:
- A histogram displays the distribution of continuous data.
- To create a histogram in Excel:
- Prepare your data.
- On the “Data” tab, open the “Data Analysis” tool (requires enabling the Analysis ToolPak add-in).
- Select “Histogram” and specify the input range and bin range.
Frequency Polygon:
- A frequency polygon is a line graph that shows the frequency distribution of continuous data.
- Create a histogram first, then plot each bin's frequency at the bin midpoint (the middle of the top of each bar) and connect the points with straight lines.
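The binning behind a histogram and the points of a frequency polygon can be sketched in Python. The measurements, the starting value, and the bin width of 5 are hypothetical choices for illustration. The sketch assumes every value falls inside the bin range.

```python
# Hypothetical continuous measurements (e.g., heights in cm).
heights = [150.2, 153.8, 155.0, 158.4, 160.1, 161.7, 163.3, 165.9, 168.2, 172.5]

# Assumed bins: [150, 155), [155, 160), [160, 165), [165, 170), [170, 175)
bin_start, bin_width, bin_count = 150, 5, 5

# Histogram: count how many values fall in each bin.
frequencies = [0] * bin_count
for h in heights:
    index = int((h - bin_start) // bin_width)
    frequencies[index] += 1

# Frequency polygon: one point per bin, at (bin midpoint, frequency).
midpoints = [bin_start + bin_width * (i + 0.5) for i in range(bin_count)]
polygon_points = list(zip(midpoints, frequencies))

for midpoint, frequency in polygon_points:
    print(f"{midpoint:6.1f}  {'#' * frequency}")
```

Joining consecutive entries of `polygon_points` with straight lines gives the frequency polygon. Dividing each frequency by `len(heights)` would give the relative frequency polygon instead.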
Relative Frequency Table:
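- Calculate the relative frequency (proportion) for each category in your dataset: the category's frequency divided by the total number of observations. The relative frequencies sum to 1.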
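A frequency table and its relative frequencies can be built with Python's standard library. This is a small sketch using made-up responses; the category names are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses (qualitative data).
responses = ["red", "blue", "red", "green", "blue", "red", "red", "green"]

counts = Counter(responses)        # frequency of each category
total = sum(counts.values())       # total number of observations

# Relative frequency = category frequency / total number of observations.
relative_freq = {category: count / total for category, count in counts.items()}

print(f"{'Category':<10}{'Frequency':>10}{'Relative':>10}")
for category, count in counts.most_common():
    print(f"{category:<10}{count:>10}{relative_freq[category]:>10.3f}")
```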
Box Plot:
- A box plot (box-and-whisker plot) displays the distribution of data, including median, quartiles, and outliers.
- In Excel, select the data, go to the “Insert” tab, and choose the “Box and Whisker” chart (available in Excel 2016 and later).
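The numbers a box plot draws can be computed directly. The sketch below uses hypothetical scores, the `inclusive` quartile method from Python's `statistics` module, and the common 1.5 × IQR rule for flagging outliers. Other quartile conventions exist and give slightly different values.

```python
import statistics

# Hypothetical test scores with one unusually high value.
scores = [52, 55, 58, 60, 61, 63, 65, 67, 70, 95]

# Quartiles: the edges of the box (Q1, Q3) and the line inside it (median).
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers (a convention).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

# Whiskers extend to the most extreme values that are not outliers.
inside = [x for x in scores if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("Q1, median, Q3:", q1, median, q3)
print("Whiskers:", whisker_low, whisker_high)
print("Outliers:", outliers)
```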
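Stem-and-Leaf Plot:
- A stem-and-leaf plot displays every individual data point, grouped by its leading digits.
- Split each value into a stem (the leading digits) and a leaf (the last digit). In Excel, put the stems in one column and the leaves in another, then list the leaves for each stem in ascending order.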
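The stem and leaf split can be sketched in Python with `divmod`. The scores are hypothetical, and the sketch assumes non-negative whole numbers so that the last digit is the leaf.

```python
from collections import defaultdict

# Hypothetical two-digit test scores.
scores = [47, 52, 55, 58, 61, 63, 63, 68, 70, 74, 79, 85]

stems = defaultdict(list)
for value in sorted(scores):
    stem, leaf = divmod(value, 10)   # stem = leading digits, leaf = last digit
    stems[stem].append(leaf)

# One row per stem, with its leaves in ascending order.
for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```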
TOPIC/SUBTOPICS: -
a. Determination of central tendencies
Range, Mean, Mode and Median for the
data in Microsoft Excel.
b. Determination of absolute measures
of dispersion for data like range
quartile deviation, mean deviation,
standard deviation and variance in
Microsoft Excel.
c. Skewness and kurtosis graphs in
Microsoft Excel and interpretations of
results.
Central Tendencies:
- Range: The difference between the maximum and minimum values in your dataset. Strictly speaking, this measures spread rather than central tendency.
- Mean (Average): Sum of all values divided by the number of values.
- Mode: The most frequently occurring value.
- Median: The middle value when the data is sorted.
Measures of Dispersion:
- Range: Already discussed (max - min).
- Quartile Deviation (Semi-Interquartile Range): Half the difference between the third quartile (Q3) and the first quartile (Q1), i.e., (Q3 - Q1) / 2. The interquartile range (IQR) itself is Q3 - Q1.
- Mean Deviation (Average Deviation): Average of the absolute differences between each value and the mean.
- Standard Deviation: Square root of the variance; measures how far values typically lie from the mean.
- Variance: Average of the squared deviations from the mean (the square of the standard deviation).
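The measures above can be checked on a small dataset with Python's `statistics` module. This is a sketch with hypothetical data. It takes the quartile deviation as half the interquartile range, uses the `inclusive` quartile method, and uses the population forms of variance and standard deviation.

```python
import statistics

# Hypothetical dataset.
data = [4, 8, 6, 5, 3, 8, 9, 5, 8, 4]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Dispersion
data_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2
mean_deviation = sum(abs(x - mean) for x in data) / len(data)
variance = statistics.pvariance(data)   # population variance
std_dev = statistics.pstdev(data)       # population standard deviation

print("mean, median, mode:", mean, median, mode)
print("range:", data_range)
print("quartile deviation:", quartile_deviation)
print("mean deviation:", mean_deviation)
print("variance, standard deviation:", variance, std_dev)
```

For sample rather than population statistics, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.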
Skewness and Kurtosis:
- Skewness: Measures the asymmetry of the distribution.
- Positive skew: Tail extends to the right (typically mean > median).
- Negative skew: Tail extends to the left (typically mean < median).
- Kurtosis: Measures how heavy the tails of the distribution are, often described as its peakedness or flatness.
- Leptokurtic: High peak with heavier tails than a normal distribution.
- Mesokurtic: Tails similar to a normal distribution.
- Platykurtic: Flat peak with lighter tails than a normal distribution.
To calculate these in Excel:
- Range:
=MAX(data) - MIN(data)
- Mean:
=AVERAGE(data)
- Mode:
=MODE.SNGL(data)
- Median:
=MEDIAN(data)
- Quartile Deviation:
=(QUARTILE(data, 3) - QUARTILE(data, 1)) / 2
- Mean Deviation:
=AVEDEV(data)
- Standard Deviation (population; use STDEV.S for a sample):
=STDEV.P(data)
- Variance (population; use VAR.S for a sample):
=VAR.P(data)
- Skewness:
=SKEW(data)
- Kurtosis (excess kurtosis, so a value near 0 indicates normal-like tails):
=KURT(data)
Remember to replace “data” with your actual dataset. Interpretation depends on context and the specific problem you’re analyzing.
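As a cross-check on the sign and rough size of the skewness and kurtosis results, the moment-based definitions can be computed in Python. The sketch below uses the population formulas with hypothetical datasets. Excel's SKEW and KURT apply sample-size corrections, so their values will differ somewhat from these, especially for small datasets.

```python
def central_moment(values, order):
    """Population central moment: the mean of (x - mean) ** order."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def skewness(values):
    """Population skewness: m3 / m2 ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2 ** 2 - 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical datasets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print("symmetric skewness:", skewness(symmetric))
print("right-skewed skewness:", skewness(right_skewed))
print("symmetric excess kurtosis:", excess_kurtosis(symmetric))
```

A positive skewness matches a tail to the right, a negative one a tail to the left. A negative excess kurtosis, as for the evenly spread `symmetric` list, points to a platykurtic shape.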
TOPIC/SUBTOPICS: -
4.1 Introduction to PYTHON.
4.2 Syntax of PYTHON.
4.3 Comments of PYTHON.
4.4 Data types of PYTHON.
4.5 Variables of PYTHON.
4.6 If-else in PYTHON.
4.7 Loops in PYTHON.
4.8 Arrays and functions in PYTHON.
Introduction to Python:
- Python is a high-level, interpreted programming language known for its readability and versatility.
- Key features:
- Simple and expressive syntax.
- Extensive standard library.
- Dynamic typing.
- Object-oriented and functional programming support.
- Widely used in web development, data science, automation, and more.
Syntax of Python:
- Python uses indentation (whitespace) to define code blocks (e.g., loops, functions).
- Statements normally end at the end of the line; semicolons are not required.
- Example:
condition = True
if condition:
    # Indented block
    print("Hello, Python!")
Comments in Python:
- Comments provide explanations within code.
- Single-line comments start with #.
- Python has no dedicated multi-line comment syntax. A triple-quoted string (''' or """) that is not assigned to anything is often used for that purpose.
- Example:
# This is a single-line comment
"""
This is a
multi-line comment
"""
Data Types in Python:
- Common data types:
- Integers (int): Whole numbers (e.g., 42).
- Floating-point numbers (float): Decimal numbers (e.g., 3.14).
- Strings (str): Text (e.g., "Hello, World!").
- Booleans (bool): Represent True or False.
- Lists, tuples, dictionaries, sets: Collections of data.
- Example:
age = 25
name = "Alice"
is_student = True
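The collection types mentioned above can be illustrated briefly. The variable names and values in this sketch are hypothetical.

```python
scores = [88, 92, 75]                    # list: ordered and mutable
point = (3.5, 7.2)                       # tuple: ordered and immutable
student = {"name": "Alice", "age": 25}   # dict: key-value pairs
tags = {"math", "stats", "math"}         # set: duplicates are dropped

scores.append(60)                        # lists can grow after creation

print(type(scores).__name__, len(scores))
print(type(point).__name__, point[0])
print(type(student).__name__, student["name"])
print(type(tags).__name__, len(tags))
```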
Variables in Python:
- Variables store data values.
- No need to declare types explicitly.
- Example:
x = 10
y = "Hello"
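Because types are not declared, the same variable name can refer to values of different types over time. A short sketch, with a hypothetical variable name:

```python
value = 10
first_type = type(value).__name__    # the name currently refers to an int

value = "Hello"
second_type = type(value).__name__   # the same name now refers to a str

print(first_type, second_type)
```

Rebinding a name to a different type is legal, though keeping one type per variable usually makes code easier to follow.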
If-Else Statements in Python:
- Control flow based on conditions.
- Example:
if x > 5:
    print("x is greater than 5")
else:
    print("x is not greater than 5")
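When there are more than two outcomes, `elif` adds further conditions between `if` and `else`. The function name and the cut-off scores below are hypothetical.

```python
def grade(score):
    """Map a numeric score to a label using assumed cut-offs."""
    if score >= 70:
        return "distinction"
    elif score >= 50:
        return "pass"
    else:
        return "fail"

for score in (85, 50, 30):
    print(score, grade(score))
```

Conditions are checked from top to bottom, and only the first one that is true runs.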
Loops in Python:
- for loop: Iterates over a sequence (e.g., list, string).
- while loop: Repeats as long as a condition is true.
- Example:
for i in range(5):
    print(i)
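A `while` loop covering the same range might look like the sketch below. The variable names are illustrative.

```python
count = 0
total = 0
while count < 5:      # repeat as long as the condition is true
    total += count    # add the current value to the running sum
    count += 1        # without this update the loop would never end

print(total)
```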
Arrays and Functions in Python:
- Arrays (Lists): Python's built-in list is the usual array-like type: an ordered, mutable collection of elements.
- Functions: Reusable blocks of code.
- Example:
def greet(name):
    return f"Hello, {name}!"
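Lists and functions are often used together. The sketch below passes a list to a function; the function name `average` and the marks are hypothetical.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

marks = [70, 85, 90]
marks.append(55)          # lists can be extended in place

print(marks[0])           # indexing starts at 0
print(marks[-1])          # negative indexes count from the end
print(average(marks))
```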