
Introduction to IoT - Lecture 8: Big Data Analytics Techniques

Supervised and Unsupervised Learning

  • Supervised Learning: Involves labeled data; the algorithm learns from input-output pairs.
    • Examples: Classification, Regression
  • Unsupervised Learning: No labeled data; the algorithm tries to learn structure from the input.
    • Examples: Clustering, Association Rule Mining

Supervised Learning

  • Input Data: Labeled (predefined outputs).
  • Training: Uses a training dataset to learn patterns.
  • Goal: Prediction (e.g., classifying spam, predicting prices).
  • Types:
    • Regression (continuous outputs, like house prices).
    • Classification (discrete outputs, like "cat" vs. "dog").
  • Classes: Known in advance.
  • Analysis: Typically offline (pre-processed data).
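A minimal sketch of this workflow in Python, assuming scikit-learn is available; the toy "spam" feature data and the choice of a decision tree are illustrative, not part of the lecture:

```python
# Supervised learning sketch: labeled data, offline training, prediction.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy labeled dataset (invented): each row = [num_links, num_exclamations],
# label 1 = spam, 0 = not spam. Classes are known in advance.
X = [[8, 5], [7, 6], [1, 0], [0, 1], [6, 4], [0, 0], [9, 7], [1, 1]]
y = [1, 1, 0, 0, 1, 0, 1, 0]

# Training: learn patterns from input-output pairs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Goal: prediction on unseen inputs.
print(model.predict(X_test))        # predicted class labels
print(model.score(X_test, y_test))  # accuracy on the held-out set
```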

Unsupervised Learning

  • Input Data: Unlabeled (no predefined outputs).
  • Training: Works directly on raw input data.
  • Goal: Analysis (e.g., finding hidden patterns).
  • Types:
    • Clustering (grouping similar data, like customer segments).
    • Association (discovering relationships, like "people who buy X also buy Y").
  • Classes: Unknown; the algorithm learns them from the structure of the data.
  • Analysis: Often real-time (dynamic data).
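For contrast, a minimal unsupervised sketch, again assuming scikit-learn; note that fit receives only the inputs, with no labels. The customer data is invented:

```python
# Unsupervised learning sketch: unlabeled data, structure discovery.
from sklearn.cluster import KMeans

# Toy customer data (invented): each row = [annual_spend, visits_per_month].
# There are no predefined outputs; classes are unknown.
X = [[500, 2], [520, 3], [80, 10], [90, 12], [480, 1], [70, 11]]

# The algorithm works directly on the raw inputs and finds 2 groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # discovered cluster per data point
print(kmeans.cluster_centers_)  # learned centroids (the "structure")
```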

Big Data Techniques Overview

Technique                   Category              Use Case
K-Means Clustering          Clustering            Group items by similarity
Apriori                     Association Rules     Discover relationships between items
Linear/Logistic Regression  Regression            Find relationship between inputs and outcomes
TF-IDF                      Text Analysis         Analyze and weight terms in textual data
Naïve Bayes, Decision Tree  Classification        Assign labels to known objects
ARIMA                       Time Series Analysis  Forecast future values in temporal data
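Most of these techniques are expanded on in the sections below; TF-IDF is not, so a minimal sketch follows, assuming scikit-learn's TfidfVectorizer and an invented three-document corpus:

```python
# TF-IDF sketch: weight terms by frequency in a document vs. rarity overall.
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (invented IoT-flavored log lines).
docs = [
    "temperature sensor reading high",
    "humidity sensor reading normal",
    "temperature alert temperature high",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # rows = documents, columns = terms

# A term common in one document but rare across the corpus gets a high weight.
print(vectorizer.get_feature_names_out())  # scikit-learn >= 1.0
print(tfidf.toarray().round(2))
```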

Clustering

  • Clustering groups similar data points together.
  • Unsupervised learning technique.
  • Data points in the same cluster are highly similar; points in different clusters are dissimilar.

Applications of Cluster Analysis

  • Marketing: Target customer segments
  • Biology: Classify species or gene functions
  • City Planning: Group houses by features
  • Other: Pattern recognition, image processing, etc.

Types of Clustering

  • Exclusive (Hard) Clustering: Each data point belongs to only one cluster (e.g., K-Means).
  • Overlapping (Soft) Clustering: Data points can belong to multiple clusters (e.g., Fuzzy C-Means).
  • Hierarchical Clustering: Builds a tree-like cluster structure (dendrogram).

K-Means Clustering

  • Unsupervised technique for partitioning n data points into k clusters.
  • Each point is assigned to the nearest centroid.
  • Input: Numeric features with a defined distance metric (e.g., Euclidean).
  • Output: Cluster centroids and point-cluster assignments.

Steps in K-Means

  1. Choose k and initialize centroids.
  2. Assign each point to the nearest centroid.
  3. Recompute centroids based on new assignments.
  4. Repeat steps 2–3 until convergence (centroids stop moving); see the sketch below.
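A from-scratch sketch of these four steps with NumPy; the data, k = 2, and the stopping rule are illustrative choices:

```python
# K-Means from scratch, following the four steps above.
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose k and initialize centroids (here: k random data points).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        # (keeping the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids stabilize (convergence).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy 2-D data (invented): two loose groups of points.
pts = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
centers, assignments = kmeans(pts, k=2)
print(centers)      # one centroid per cluster
print(assignments)  # cluster index for each point
```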

Association Rules

  • Unsupervised technique for discovering relationships between items.
  • Does not predict an outcome; identifies patterns.
  • Example format: If X is observed, then Y is also observed.
  • Commonly used in Market Basket Analysis (e.g., customers who buy bread also buy butter).
  • Example algorithm: Apriori
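A from-scratch sketch of the support/confidence arithmetic behind such rules, restricted to item pairs for brevity (the full Apriori algorithm extends this search to larger itemsets); the baskets and thresholds are invented:

```python
# Market-basket sketch: pairwise support and confidence from toy transactions.
from itertools import combinations

# Toy transactions (invented).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
n = len(baskets)

def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets) / n

items = sorted(set().union(*baskets))
for x, y in combinations(items, 2):
    s = support({x, y})
    if s >= 0.4:  # minimum-support threshold (an arbitrary choice here)
        # Confidence of "if X then Y" = support(X and Y) / support(X).
        conf = s / support({x})
        print(f"if {x} then {y}: support={s:.2f}, confidence={conf:.2f}")
```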

Regression

  • Determines the relationship between input features and an output variable.
  • Identifies influential variables and helps improve outcomes.
  • Two major types: Linear and Logistic regression.

Linear Regression

  • Models the relationship between a continuous outcome and input variables.
  • Assumes a linear relationship.
  • Is probabilistic, not deterministic.
  • Can include transformations to achieve linearity.

Linear Regression Model Equation

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{p-1} x_{p-1} + \varepsilon

Key Components

  1. Outcome Variable (y): The continuous target variable being predicted.
  2. Input Variables (xj): Features influencing y, e.g., x1, x2, …, xp−1.
  3. Intercept (β0): Baseline value of y when all xj=0.
  4. Coefficients (βj): Quantify the effect of each xj on y. For example, β1 is the change in y for a 1-unit increase in x1, holding other variables constant.
  5. Error Term (ε): Captures unexplained variability, reflecting the model's probabilistic nature (not deterministic).
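A minimal worked example with NumPy least squares; the data is generated from known coefficients so the fitted values can be checked against them:

```python
# Linear regression sketch via least squares.
import numpy as np

# Toy data (invented): y generated as 2 + 3*x1 - 1*x2 plus noise,
# so the fit should recover roughly beta0=2, beta1=3, beta2=-1.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 50)
x2 = rng.uniform(0, 10, 50)
y = 2 + 3 * x1 - 1 * x2 + rng.normal(0, 0.5, 50)  # noise plays the role of epsilon

# Design matrix with a column of ones for the intercept beta0.
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)  # approximately [2, 3, -1]: intercept and coefficients
```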

Logistic Regression

  • Used when the outcome is categorical (e.g., Yes/No, Pass/Fail).
  • Based on the logistic function (sigmoid).
  • Outputs probabilities in the range (0, 1).
  • Suitable for binary classification problems.

Logistic Regression Model Equation

Linear Component:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{p-1} x_{p-1}

Logistic Function (Sigmoid):

p(x_1, x_2, \ldots, x_{p-1}) = \frac{e^{y}}{1 + e^{y}}

Key Components

  1. Linear Predictor (y): A linear combination of input variables (xj) and coefficients (βj), similar to linear regression.
  2. Logistic Function: Transforms y into a probability p between 0 and 1, ensuring outputs are valid probabilities.
  3. Probability Interpretation: p represents the likelihood of a binary outcome (e.g., "success" or "failure").
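A minimal sketch of the sigmoid transformation; the coefficients are invented to show how the probability moves with x1:

```python
# Logistic function sketch: turn a linear predictor into a probability.
import numpy as np

def sigmoid(y):
    """Logistic function: maps any real y into the interval (0, 1)."""
    return np.exp(y) / (1 + np.exp(y))

# Toy coefficients (invented): beta0 = -4, beta1 = 2 for a single input x1.
beta0, beta1 = -4.0, 2.0
for x1 in [0.0, 2.0, 4.0]:
    y = beta0 + beta1 * x1  # linear predictor
    print(f"x1={x1}: y={y:+.1f}, p={sigmoid(y):.3f}")
    # x1=0 -> p ~ 0.018; x1=2 -> p = 0.5; x1=4 -> p ~ 0.982
```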