[[Big Data.doc]]
Big Data can be described in terms of:
- volume - too big to fit into a single server. Relational databases don’t scale well across multiple machines.
- velocity - data is generated at high speed and must be processed and responded to quickly
- variety - data in many forms such as structured, unstructured, text, multimedia.
The key thing about Big Data is its lack of structure. This lack of structure poses challenges because:
- analysing the data is made significantly more difficult
- relational databases are not appropriate because they require the data to fit into a row-and-column format.
Machine learning techniques are needed to discern patterns in the data and to extract useful information.
Functional programming helps with processing Big Data across many machines, because it makes it easier to write correct and efficient distributed code.
Functional programming languages support:
- immutable data structures (data structures that can't be modified once created)
    - In Java, arrays are mutable but Strings are immutable
- statelessness (a function doesn't remember previous interactions)
    - A counter function would be stateful; a print function would probably be stateless
    - Stateless web services don't hold session information on the server side. They might instead store it in cookies on the client side.
- higher-order functions (functions that take functions as arguments and/or return functions as results)
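
The three properties above can be shown in a minimal Python sketch. Python is not a purely functional language, so this only imitates the style; the names `make_counter`, `double` and `apply_twice` and the sample values are hypothetical, chosen for illustration.

```python
from functools import reduce

# Immutability: a tuple cannot be changed after it is created.
readings = (3, 1, 4)
try:
    readings[0] = 99        # tuples do not support item assignment
    mutated = True
except TypeError:
    mutated = False

# Stateful: the result depends on how many times it was called before.
def make_counter():
    count = 0
    def counter():
        nonlocal count
        count += 1
        return count
    return counter

# Stateless: the result depends only on the argument.
def double(x):
    return x * 2

# Higher-order function: takes another function as an argument.
def apply_twice(f, x):
    return f(f(x))

# map and reduce are built-in higher-order functions.
doubled = tuple(map(double, readings))
total = reduce(lambda a, b: a + b, doubled)
```

Because `double` is stateless and `readings` is never modified, each element could in principle be processed on a different machine without the results interfering with each other.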
# Fact-Based Model
- Immutable facts are recorded with timestamps
- Each fact within a fact-based model captures a single piece of information
- Data is never deleted or overwritten. When something changes, a new fact is added instead. If someone dies, for example, their history is still there in the model.
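
The points above can be sketched as an append-only store of timestamped facts. This is a minimal illustration, not a real database design: the `Fact` class, the helper functions `add_fact`, `current_value` and `history`, and the sample person and cities are all made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)          # frozen: a fact cannot be changed once created
class Fact:
    subject: str
    attribute: str
    value: object
    timestamp: datetime

def add_fact(facts, subject, attribute, value, timestamp):
    """Return a new tuple with one more fact; the old tuple is untouched."""
    return facts + (Fact(subject, attribute, value, timestamp),)

def history(facts, subject, attribute):
    """All recorded values for one attribute, oldest first."""
    matching = [f for f in facts
                if f.subject == subject and f.attribute == attribute]
    return [f.value for f in sorted(matching, key=lambda f: f.timestamp)]

def current_value(facts, subject, attribute):
    """The most recently recorded value, or None if there is no such fact."""
    values = history(facts, subject, attribute)
    return values[-1] if values else None

# Hypothetical data: a person moves city. The old fact is kept, not edited.
facts = ()
facts = add_fact(facts, "person:1", "city", "Manchester",
                 datetime(2020, 1, 1, tzinfo=timezone.utc))
facts = add_fact(facts, "person:1", "city", "Leeds",
                 datetime(2023, 6, 1, tzinfo=timezone.utc))
```

Here `current_value(facts, "person:1", "city")` gives the latest city, while `history(...)` still returns every earlier value, which is the sense in which nothing is ever deleted.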
```mermaid
flowchart LR
A(<b>Refrigerated</b>:<br>Yes)-.-B([<b>Store</b>:<br>Manchester])
B---C([<b>Truck</b>:<br>MJ15HWE])
```
```mermaid
flowchart LR
A(<b>Attribute</b>:<br>Dotted Line)-.-B([<b>Object</b>:<br>Manchester])
B---C([<b>Relationship</b>:<br>Solid Line])
```
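
One way to hold the same information as the first diagram in code is shown below, as a rough sketch. The dictionary layout, the identifiers `store_manchester` and `truck_mj15hwe`, and the `neighbours` helper are assumptions made for illustration; only the store, truck and refrigerated values come from the diagram.

```python
# Objects (nodes), each carrying its attributes (dotted lines in the diagram).
nodes = {
    "store_manchester": {
        "type": "Store",
        "name": "Manchester",
        "attributes": {"Refrigerated": "Yes"},
    },
    "truck_mj15hwe": {
        "type": "Truck",
        "name": "MJ15HWE",
        "attributes": {},
    },
}

# Relationships (solid lines), stored as unordered pairs of node identifiers.
relationships = [frozenset({"store_manchester", "truck_mj15hwe"})]

def neighbours(node_id):
    """Return the identifiers of all objects directly related to node_id."""
    found = []
    for pair in relationships:
        if node_id in pair:
            found.extend(other for other in pair if other != node_id)
    return sorted(found)
```

Calling `neighbours("store_manchester")` follows the solid line from the store to the truck.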