STATISTICS AND ANALYTICS

TOPIC/SUBTOPICS: -

a Definition of data and classification (qualitative, quantitative, discrete, and continuous data).

b Data collection tools

iv) Questionnaires.

v) Survey.

vi) Interviews.

vii) Focus group discussion.

1.3 Data cleaning.

Data Collection:
- Statistical Data Collection: This involves gathering information from various sources to analyze and draw conclusions. It can be done through surveys, experiments, observations, or existing datasets.
- Distinguishing Data Types:
  - Qualitative Data: Describes qualities or characteristics (e.g., colors, opinions, categories). It’s non-numeric and often represented as labels or categories.
  - Quantitative Data:
    - Discrete Data: Consists of distinct, separate values (e.g., number of cars, students in a class).
    - Continuous Data: Represents measurements on a continuous scale (e.g., height, temperature).
- Problem Statement for Data Collection:
  - Clearly define the problem or research question you want to address.
  - Specify what data you need to answer that question.
- Root Cause Analysis:
  - Collect data that helps identify the underlying reasons behind a problem or issue.
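The distinction between qualitative, discrete, and continuous data can be illustrated with a small sketch. The rule used here (text is qualitative, whole numbers are discrete, decimals are continuous) is only a rough heuristic assumed for illustration, and the function name and sample columns are hypothetical.

```python
def classify_column(values):
    """Guess the data type of a column from its Python values (rough heuristic)."""
    if all(isinstance(v, str) for v in values):
        return "qualitative"
    if all(isinstance(v, int) for v in values):
        return "quantitative (discrete)"
    if all(isinstance(v, (int, float)) for v in values):
        return "quantitative (continuous)"
    return "mixed"

# Hypothetical example columns
colours = ["red", "blue", "red"]
cars_per_household = [0, 2, 1, 1]
heights_cm = [162.5, 170.0, 158.2]

for column in (colours, cars_per_household, heights_cm):
    print(classify_column(column))
```

In practice the classification depends on what was measured, not only on how it is stored: a height recorded to the nearest centimetre is still continuous data.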
Data Collection Tools:
- Questionnaires: Structured surveys with predefined questions. Useful for collecting standardized responses from a large audience.
- Surveys: Gather information from individuals through interviews or online forms.
- Interviews: Conducted one-on-one or in groups to explore in-depth insights.
- Focus Group Discussions: Bring together a small group to discuss specific topics.
Data Cleaning:
- Data Preprocessing: Refine and prepare data for analysis.
- Steps:
  - Handle missing values (impute or remove them).
  - Remove duplicates.
  - Standardize formats (e.g., date formats, units).
  - Correct inconsistencies (e.g., typos).
  - Address outliers.
  - Ensure data quality and reliability.
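Some of the cleaning steps above can be sketched in Python using only the standard library. The records, field names, and the two date formats are made up for illustration, and imputing with the mean is just one possible choice.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw survey records; None marks a missing value.
raw = [
    {"name": "Asha ", "joined": "2024-01-15", "height_cm": 162.0},
    {"name": "asha", "joined": "15/01/2024", "height_cm": 162.0},  # same person, other format
    {"name": "Ravi", "joined": "2024-02-03", "height_cm": None},   # missing height
    {"name": "Meena", "joined": "2024-03-10", "height_cm": 158.0},
]

def parse_date(text):
    """Standardize two assumed date formats to ISO YYYY-MM-DD."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def clean(records):
    # 1. Standardize formats and fix simple inconsistencies (case, stray spaces).
    tidy = [
        {"name": r["name"].strip().title(),
         "joined": parse_date(r["joined"]),
         "height_cm": r["height_cm"]}
        for r in records
    ]
    # 2. Remove duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for r in tidy:
        key = (r["name"], r["joined"], r["height_cm"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 3. Impute missing heights with the mean of the known heights.
    known = [r["height_cm"] for r in unique if r["height_cm"] is not None]
    fill = mean(known)
    for r in unique:
        if r["height_cm"] is None:
            r["height_cm"] = fill
    return unique

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Standardizing before removing duplicates matters here: the two "Asha" rows only become identical once the name and date formats agree.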

TOPIC/SUBTOPICS: -

a Descriptive statistics
viii) Data tabulation (frequency table)

ix) Relative frequency table.

b Grouped data

x) Bar graph

xi) Pie chart

xii) Line graph

xiii) Frequency polygon

xiv) Frequency curve

xv) Relative frequency polygon

xvi) Histograms

xvii) Box plot

xviii) Stem-and-leaf plot (to be done in Microsoft Excel)

Bar Chart:
- A bar chart displays categorical data using rectangular bars. Each bar represents a category, and the height (or length) of the bar corresponds to the value.
- To create a bar chart in Excel:
  1. Enter your data in a spreadsheet.
  2. Select the data range.
  3. Go to the “Insert” tab.
  4. Choose a column or bar chart and select the desired type (e.g., clustered, stacked). Excel calls charts with vertical bars column charts and charts with horizontal bars bar charts.
Pie Chart:
- A pie chart shows the proportion of different categories within a whole.
- To create a pie chart in Excel:
  1. Enter your data.
  2. Select the data range.
  3. Go to the “Insert” tab.
  4. Choose “Pie Chart.”
Histogram:
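- A histogram displays the distribution of continuous data by grouping the values into bins and drawing one bar per bin.
- To create a histogram in Excel:
  1. Prepare your data and, optionally, a column of bin limits.
  2. On the “Data” tab, open the “Data Analysis” tool (requires enabling the Analysis ToolPak add-in).
  3. Select “Histogram” and specify the input range and bin range.
  4. Tick “Chart Output” to draw the chart alongside the frequency table.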
Frequency Polygon:
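- A frequency polygon is a line graph that shows the frequency distribution of continuous data.
- Create a histogram first, then connect the midpoints of the tops of the bars with straight lines. Equivalently, plot each bin's frequency against the bin's midpoint and join the points.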
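The points of a frequency polygon can be worked out by hand or with a short script. The sketch below assumes equal-width bins that include their lower limit and exclude their upper limit; the function name and the sample heights are hypothetical.

```python
def bin_counts(values, start, width, n_bins):
    """Count values per bin [lower, upper) and pair each count with the bin midpoint."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    midpoints = [start + width * (i + 0.5) for i in range(n_bins)]
    return list(zip(midpoints, counts))

heights_cm = [151, 154, 158, 160, 161, 163, 165, 167, 172, 178]
polygon_points = bin_counts(heights_cm, start=150, width=10, n_bins=3)
print(polygon_points)  # [(155.0, 3), (165.0, 5), (175.0, 2)]
```

Each pair is one (midpoint, frequency) point of the polygon. Note that a tool may treat bin edges differently (for example, counting a value equal to a limit in the lower bin), so counts on the boundaries can differ.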
Relative Frequency Table:
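- Calculate the relative frequency (proportion) for each category in your dataset: divide the category's frequency by the total number of observations. The relative frequencies add up to 1 (or 100%).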
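As a minimal sketch, a frequency table and its relative frequencies can be built with `collections.Counter`. The transport survey data and the function name are invented for illustration.

```python
from collections import Counter

def relative_frequency_table(values):
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (count, count / total) for category, count in counts.items()}

# Hypothetical survey answers: how students travel to class
transport = ["bus", "car", "bus", "walk", "bus", "car", "bus", "walk"]
table = relative_frequency_table(transport)

for category, (count, share) in table.items():
    print(f"{category:<5} {count:>3} {share:.2f}")
```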
Box Plot:
- A box plot (box-and-whisker plot) displays the distribution of data, including median, quartiles, and outliers.
- In Excel 2016 or later, use the “Insert” tab and choose “Box and Whisker” from the statistical charts.
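The numbers behind a box plot can be sketched as follows. Two assumptions are made that the notes do not state: quartiles use the default "exclusive" method of `statistics.quantiles`, and outliers are values more than 1.5 times the interquartile range beyond the quartiles (a common convention). The function name and data are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return quartiles, whisker ends, and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4)  # default method is "exclusive"
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 30])
print(summary)
```

For this data the box runs from 4 to 10 with the median at 7, the whiskers end at 2 and 12, and 30 is flagged as an outlier.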
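Stem-and-Leaf Plot:
- A stem-and-leaf plot (listed in the syllabus as a leaf stem plot) displays every individual data point, grouped by its leading digits.
- Create a column for the stem (all digits except the last) and another for the leaf (the last digit), then list the leaves of each stem in order on one row.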
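The stem and leaf split described above can be sketched in a few lines. This version assumes non-negative whole numbers; the function name and the scores are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all but last digit) and leaf (last digit)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    return dict(groups)

scores = [47, 52, 58, 61, 61, 65, 70, 73, 88]
plot = stem_and_leaf(scores)

# Print every stem from smallest to largest, even if it has no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```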

TOPIC/SUBTOPICS: -

a. Determination of central tendencies (range, mean, mode and median) for the data in Microsoft Excel.

b. Determination of absolute measures of dispersion for data, such as range, quartile deviation, mean deviation, standard deviation and variance, in Microsoft Excel.

c. Skewness and kurtosis graphs in Microsoft Excel and interpretation of results.

Central Tendencies:
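- Range: The difference between the maximum and minimum values in your dataset. Strictly, this is a measure of spread rather than of central tendency, but the syllabus lists it here.
- Mean (Average): Sum of all values divided by the number of values.
- Mode: The most frequently occurring value.
- Median: The middle value when the data is sorted (the average of the two middle values when the count is even).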
Measures of Dispersion:
- Range: Already discussed (max - min).
- Quartile Deviation (Semi-Interquartile Range): Half the difference between the third quartile (Q3) and the first quartile (Q1), i.e. (Q3 - Q1) / 2. The full difference Q3 - Q1 is the interquartile range.
- Mean Deviation (Average Deviation): Average of the absolute differences between each value and the mean.
- Standard Deviation: Square root of the variance; measures the typical spread of the data around the mean.
- Variance: Average of the squared differences from the mean (the square of the standard deviation).
Skewness and Kurtosis:
- Skewness: Measures the asymmetry of the distribution.
  - Positive skew: Tail extends to the right (typically mean > median).
  - Negative skew: Tail extends to the left (typically mean < median).
- Kurtosis: Measures how heavy the tails are compared with a normal distribution, often described as peakedness or flatness.
  - Leptokurtic: High peak with heavier tails (more data in the tails).
  - Mesokurtic: Similar to a normal distribution.
  - Platykurtic: Flat peak with lighter tails (less data in the tails).
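To make the two measures concrete, here is a sketch using the simple moment-based (population) definitions: the average cubed and fourth-power standardized deviation, with 3 subtracted from the latter so that a normal distribution scores about 0. Excel's SKEW and KURT apply sample-size corrections, so their values will differ somewhat from these. The function names and data are hypothetical.

```python
from statistics import fmean, pstdev

def moment_skewness(values):
    """Average cubed standardized deviation (population version)."""
    m, s = fmean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def excess_kurtosis(values):
    """Average fourth-power standardized deviation minus 3 (population version)."""
    m, s = fmean(values), pstdev(values)
    return sum(((v - m) / s) ** 4 for v in values) / len(values) - 3

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(moment_skewness(symmetric))     # approximately 0: no skew
print(moment_skewness(right_tailed))  # positive: tail to the right
print(excess_kurtosis(symmetric))     # negative: flatter than normal (platykurtic)
```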

To calculate these in Excel:

- Range: =MAX(data) - MIN(data)
- Mean: =AVERAGE(data)
- Mode: =MODE.SNGL(data)
- Median: =MEDIAN(data)
- Quartile Deviation: =(QUARTILE(data, 3) - QUARTILE(data, 1)) / 2
- Mean Deviation: =AVEDEV(data)
- Standard Deviation: =STDEV.P(data) for a whole population, or =STDEV.S(data) for a sample.
- Variance: =VAR.P(data) for a whole population, or =VAR.S(data) for a sample.
- Skewness: =SKEW(data)
- Kurtosis: =KURT(data) (returns excess kurtosis, so a normal distribution scores about 0).

Remember to replace “data” with your actual dataset. Interpretation depends on context and the specific problem you’re analyzing.
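As a cross-check on spreadsheet results, the same measures can be computed with Python's `statistics` module. This is a sketch under two assumptions: quartile deviation is taken as half the interquartile range, and quartiles use the "inclusive" method, which interpolates between data points in the same way as Excel's QUARTILE. The sample data is made up.

```python
from statistics import mean, median, mode, pstdev, pvariance, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical dataset

value_range = max(data) - min(data)
average = mean(data)
most_common = mode(data)
middle = median(data)

q1, _, q3 = quantiles(data, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2

# Mean deviation: average absolute distance from the mean
mean_deviation = sum(abs(x - average) for x in data) / len(data)

std_dev = pstdev(data)      # population standard deviation
variance = pvariance(data)  # population variance

print(value_range, average, most_common, middle)
print(quartile_deviation, mean_deviation, std_dev, variance)
```

For sample rather than population figures, `statistics.stdev` and `statistics.variance` divide by n - 1 instead of n.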

TOPIC/SUBTOPICS: -

4.1 Introduction to PYTHON.
4.2 Syntax of PYTHON.

4.3 Comments of PYTHON.

4.4 Data types of PYTHON.

4.5 Variables of PYTHON.

4.6 If-else in PYTHON.

4.7 Loops in PYTHON.

4.8 Arrays and functions in PYTHON.

Introduction to Python:
- Python is a high-level, interpreted programming language known for its readability and versatility.
- Key features:
  - Simple and expressive syntax.
  - Extensive standard library.
  - Dynamic typing.
  - Object-oriented and functional programming support.
  - Widely used in web development, data science, automation, and more.
Syntax of Python:
- Python uses indentation (whitespace) to define code blocks (e.g., loops, functions).
- Statements normally end at the end of the line; semicolons are not required.
- Example:
```
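condition = True  # any expression that evaluates to True or False
if condition:
    # Indented block: runs only when the condition is true
    print("Hello, Python!")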
```
Comments in Python:
- Comments provide explanations within code.
- Single-line comments start with #.
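- Python has no dedicated multi-line comment syntax. A triple-quoted string (''' or """) that is not assigned to anything is commonly used for that purpose, or each line can start with #.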
- Example:
```
# This is a single-line comment
"""
This is a
multi-line comment
"""
```
Data Types in Python:
- Common data types:
  - Integers (int): Whole numbers (e.g., 42).
  - Floating-point numbers (float): Decimal numbers (e.g., 3.14).
  - Strings (str): Text (e.g., “Hello, World!”).
  - Boolean (bool): Represents True or False.
  - Lists, tuples, dictionaries, sets: Collections of data.
- Example:
```
age = 25
name = "Alice"
is_student = True
```
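The collection types listed above can be sketched in the same way. The variable names and values are only illustrative.

```python
marks = [72, 85, 90]                    # list: ordered, can be changed
point = (3.5, 7.0)                      # tuple: ordered, cannot be changed
student = {"name": "Alice", "age": 25}  # dictionary: key-value pairs
tags = {"math", "stats", "math"}        # set: duplicates are dropped

for value in (marks, point, student, tags):
    print(type(value).__name__, len(value))
```

The set ends up with two elements because the repeated "math" is stored only once.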
Variables in Python:
- Variables store data values.
- No need to declare types explicitly.
- Example:
```
x = 10
y = "Hello"
```
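Because types are not declared, the same name can later refer to a value of a different type. A minimal sketch, with an arbitrary variable name:

```python
value = 10
first_type = type(value).__name__   # "int"

value = "ten"                       # the same name now holds a string
second_type = type(value).__name__  # "str"

print(first_type, second_type)  # int str
```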

If-Else Statements in Python:
- Control flow based on conditions.
- Example:
```
x = 10
if x > 5:
    print("x is greater than 5")
else:
    print("x is not greater than 5")
```
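When there are more than two cases, `elif` adds further conditions that are checked in order. The function name and the grade thresholds below are hypothetical, chosen only to illustrate the pattern.

```python
def grade(score):
    """Return a label for a score using assumed thresholds."""
    if score >= 75:
        return "distinction"
    elif score >= 40:
        return "pass"
    else:
        return "fail"

for score in (82, 55, 31):
    print(score, grade(score))
```

Only the first matching branch runs, so a score of 82 returns "distinction" and never reaches the "pass" check.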

Loops in Python:
- for loop: Iterates over a sequence (e.g., list, string).
- while loop: Repeats as long as a condition is true.
- Example:
```
for i in range(5):
    print(i)
```
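A short sketch of the other cases mentioned above: a for loop over a list, a for loop over a string, and a while loop. The values are arbitrary.

```python
# for loop over a list: add up the marks
total = 0
for mark in [72, 85, 90]:
    total += mark

# for loop over a string: count the vowels
vowel_count = 0
for letter in "statistics":
    if letter in "aeiou":
        vowel_count += 1

# while loop: repeat until the condition becomes false
countdown = []
n = 3
while n > 0:
    countdown.append(n)
    n -= 1

print(total, vowel_count, countdown)  # 247 3 [3, 2, 1]
```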
Arrays and Functions in Python:
- Arrays (Lists): Python's built-in list serves as a general-purpose array: an ordered, changeable collection of elements.
- Functions: Reusable blocks of code.
- Example:
```
def greet(name):
    return f"Hello, {name}!"
```
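Lists and functions are often used together. The following sketch, with a hypothetical `average` function and made-up marks, shows indexing, appending, and passing a list to a function.

```python
def average(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)

marks = [72, 85, 90]
marks.append(65)      # lists can grow after creation
first_mark = marks[0] # indexing starts at 0

print(first_mark, len(marks), average(marks))  # 72 4 78.0
```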

THANK YOU

Posted by POLYTECNIC MATERIAL
