Lecture 7: RDBMS vs. NoSQL

1. The Evolution of Databases

The database landscape has evolved significantly. Traditional systems from the 1970s to the 2000s were dominated by the Relational Model (RDBMS), which prioritized data integrity and structure through ACID compliance and the SQL standard.

Since the 2010s, the rise of "Big Data," the internet, and dynamic applications has led to the emergence of NoSQL databases. These systems are designed for different priorities, emphasizing:

Horizontal Scaling: Distributing data across many commodity servers.
Flexible Schema: Allowing data structure to change easily.
Big Data Readiness: Handling large volumes of structured, semi-structured, and unstructured data.

2. Understanding RDBMS (Relational Databases)

An RDBMS, or Relational Database Management System, organizes data into tables with rows and columns. Its primary focus is on ensuring data reliability through a set of properties known as ACID.

The ACID Properties

ACID is a set of guarantees for database transactions. It is critically important in systems where data integrity is paramount, such as banking, finance, and safety systems.

Atomicity: Each transaction is an "all or nothing" operation. If any part of the transaction fails, the entire transaction is rolled back, and the database is left unchanged.
Consistency: Every transaction brings the database from one valid state to another, following all defined rules like constraints and triggers.
Isolation: Concurrent transactions do not interfere with each other. The result is the same as if the transactions were executed one after the other (sequentially).
Durability: Once a transaction is successfully completed (committed), it will remain so permanently, even in the event of a power loss or system crash.

However, enforcing ACID properties, particularly Atomicity and Isolation, adds processing overhead and forces database servers to use sequential evaluation, which can complicate scaling across multiple servers.

3. Key Challenges Faced by RDBMS

Modern applications have exposed several limitations in the traditional RDBMS model.

A. Scalability and Performance

Vertical Scaling (Scaling Up): RDBMS traditionally scales by adding more resources (CPU, RAM) to a single server. This approach is expensive and has a physical upper limit.
Horizontal Scaling (Scaling Out): Distributing a relational database across multiple machines is inherently complex, especially while trying to maintain ACID guarantees across the system.
Complex Joins: To avoid data redundancy, RDBMS normalizes data across many tables. Retrieving data often requires joining these tables, which becomes very slow and computationally expensive as datasets grow massive.

B. Schema Rigidity

Predefined Structure: RDBMS requires a fixed schema to be defined before any data can be stored.
Difficult Changes: Modifying the schema in a live system is a complex and slow process that can require downtime. This is a significant bottleneck in agile development environments or when dealing with dynamic data like IoT sensor readings or user-generated content.
Poor for Unstructured Data: RDBMS is optimized for structured data and struggles to efficiently store and query semi-structured (JSON, XML) or unstructured data.

C. The CAP Theorem Trade-off

The CAP Theorem states that a distributed data system can only simultaneously provide two of the following three guarantees:

Consistency (C): All clients see the same data at the same time. Every read receives the most recent write.
Availability (A): The system always responds to requests (though it might not be with the most recent data).
Partition Tolerance (P): The system continues to operate even if communication between its nodes is lost or delayed.

In a distributed environment, you must choose two out of three.

RDBMS Priority (CP): Traditional distributed RDBMS prioritizes Consistency and Partition Tolerance. During a network failure, the system may become unavailable to ensure that no inconsistent data is returned.
NoSQL Priority (AP): Many NoSQL databases prioritize Availability and Partition Tolerance. During a network failure, the system remains available but may return stale data. This leads to a model called Eventual Consistency, where the data will become consistent across the system over time. This trade-off is acceptable for applications like social media feeds, where high uptime is more critical than immediate, perfect consistency.

4. Fundamental Comparison: RDBMS vs. NoSQL

Feature	Relational Databases (RDBMS/SQL)	NoSQL Databases (Not only SQL)
Data Model	Tabular (rows and columns). Relationships defined by foreign keys and accessed with joins.	Varies: Key-Value, Document, Column-Family, or Graph. Data is often denormalized.
Schema	Fixed, rigid, and predefined. The structure must be defined before adding data.	Flexible, dynamic, and often schema-less. The structure is not strictly enforced.
Scalability	Primarily Vertical (Scale Up). Horizontal scaling is difficult and expensive.	Primarily Horizontal (Scale Out). Easily distributes data across clusters of commodity servers.
Data Structure	Highly Structured. Ideal for financial, transactional, and inventory data.	Structured, Semi-structured, or Unstructured. Ideal for web content, sensor data, dynamic profiles.
Consistency	ACID (Atomicity, Consistency, Isolation, Durability). Strong consistency is guaranteed.	Often BASE (Basically Available, Soft State, Eventual Consistency). Sacrifices strong consistency for availability and speed.
Query Language	SQL (Structured Query Language), a standardized declarative language.	Varies by model. Uses model-specific languages/APIs like MQL (MongoDB) or Cypher (Neo4j).

5. Main Types of NoSQL Databases

NoSQL is not a single technology but a category of databases with different data models. The four main types are:

1. Document Databases

Description: Stores data in flexible, self-contained documents, typically in JSON, BSON, or XML format. Related data is often nested within a single document.
Characteristics: Flexible schema, intuitive for developers as documents map well to programming objects, and good read performance since related data is stored together, reducing the need for joins.
Use Cases: Content management, e-commerce product catalogs, user profiles.

2. Key-Value Stores

Description: The simplest NoSQL model. Data is stored as a collection of unique keys and their associated values, like a dictionary or hash map.
Characteristics: Extremely fast for simple lookups, highly scalable, but querying is limited to retrieving values by their key.
Best Use Cases: Caching, session management, shopping carts, real-time leaderboards.
Leading Examples: Redis, Amazon DynamoDB.

3. Column-Family (Wide-Column) Stores

Description: Stores data in tables, but data is organized by columns rather than rows. Rows can be sparse, meaning a row does not need a value for every column.
Characteristics: Built for massive horizontal scaling (petabytes of data), provides very high write throughput, and is efficient for analytical queries that access a subset of columns over many rows.
Best Use Cases: Big Data processing, logging systems, time-series data (like from IoT devices).
Leading Examples: Apache Cassandra, HBase.

4. Graph Databases

Description: Designed specifically to store and navigate relationships. Data is stored as Nodes (entities) and Edges (the relationships connecting nodes).
Characteristics: Excels at traversing complex relationships quickly (e.g., "friends of friends"). Performance remains constant regardless of the total dataset size, unlike relational joins.
Best Use Cases: Social networks, recommendation engines, fraud detection, knowledge graphs.
Leading Example: Neo4j (uses the Cypher query language).

First Term

Intro to Cybersecurity

Lectures

IT Essentials

Assignments

Exams

Lectures

Sections

Math

Lectures

Models

Python

Assignments

Exams

Lectures

Sections

Second Term

C Essentials

C Essentials

Lectures

Cybersecurity Essentials

Lectures

Intro To IoT

Assignments

Lectures

MS Office

Assignments

C++

Assignments

Lectures

Summaries

Tasks

DB

Lectures

Summaries

DigitalEngineering

Assingments

Lectures

Sheets

Linux

Assignments

Lectures

Tasks

OS

Assignments

Lectures

Summaries

Summaries second attempt

WebDevelopment

Lectures

Summaries

Tasks

Chapters

Chapter One

Lecture 7: RDBMS vs. NoSQL ​

1. The Evolution of Databases ​

2. Understanding RDBMS (Relational Databases) ​

The ACID Properties ​

3. Key Challenges Faced by RDBMS ​

A. Scalability and Performance ​

B. Schema Rigidity ​

C. The CAP Theorem Trade-off ​

4. Fundamental Comparison: RDBMS vs. NoSQL ​

5. Main Types of NoSQL Databases ​

1. Document Databases ​

2. Key-Value Stores ​

3. Column-Family (Wide-Column) Stores ​

4. Graph Databases ​

Lecture 7: RDBMS vs. NoSQL

1. The Evolution of Databases

2. Understanding RDBMS (Relational Databases)

The ACID Properties

3. Key Challenges Faced by RDBMS

A. Scalability and Performance

B. Schema Rigidity

C. The CAP Theorem Trade-off

4. Fundamental Comparison: RDBMS vs. NoSQL

5. Main Types of NoSQL Databases

1. Document Databases

2. Key-Value Stores

3. Column-Family (Wide-Column) Stores

4. Graph Databases