4.11 Big Data Resources - Tony Ballantyne

[[Big Data.doc]]

Big Data can be described in terms of:
- volume - too big to fit into a single server. Relational databases don’t scale well across multiple machines.
- velocity - data must be generated, processed and responded to quickly
- variety - data comes in many forms, such as structured, unstructured, text and multimedia

The key thing about Big Data is its lack of structure. This lack of structure poses challenges because:
- analysing the data is made significantly more difficult
- relational databases are not appropriate because they require the data to fit into a row-and-column format

Machine learning techniques are needed to discern patterns in the data and to extract useful information.

Functional programming is a solution, because it makes it easier to write correct and efficient distributed code. Functional programming languages support:
- immutable data structures (data structures that can't be modified once created)
    - In Java, arrays are mutable but Strings are immutable
- statelessness (a function doesn't remember previous interactions)
    - A counter function would be stateful; a print function would probably be stateless
    - Stateless web services don't hold information on the server side. They might store this information as cookies on the client side.
- higher-order functions (functions that take functions as arguments and/or return functions as results)

# Fact-Based Model
- Immutable facts are recorded with timestamps
- Each fact within a fact-based model captures a single piece of information
- Data is never deleted. New facts are created instead. If someone dies, their history is still there in the model.

```mermaid
flowchart LR
A(Refrigerated: Yes)-.-B([Store: Manchester])
B---C([Truck: MJ15HWE])
```

```mermaid
flowchart LR
A(Attribute: Dotted Line)-.-B([Object: Manchester])
B---C([Relationship: Solid Line])
```
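
The fact-based model and the functional ideas above can be sketched together in Python. This is only a minimal illustration, not a real Big Data system. The `Fact` class, the `record` and `latest` helpers, the second store ("Leeds") and both dates are made up for the example. Facts are immutable (a frozen dataclass), nothing is ever deleted, and `latest` uses the higher-order functions `filter` and `max`.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen=True makes each fact immutable
class Fact:
    entity: str          # e.g. "Truck: MJ15HWE"
    attribute: str       # the single piece of information this fact captures
    value: str
    timestamp: datetime  # when the fact was recorded


def record(facts, entity, attribute, value, timestamp):
    """Return a NEW tuple with one more fact; the old tuple is left untouched."""
    return facts + (Fact(entity, attribute, value, timestamp),)


def latest(facts, entity, attribute):
    """Return the most recent matching fact, or None if there is none."""
    matching = filter(
        lambda f: f.entity == entity and f.attribute == attribute, facts
    )
    return max(matching, key=lambda f: f.timestamp, default=None)


# Hypothetical data: the truck is first at Manchester, later at Leeds.
facts = ()
facts = record(facts, "Truck: MJ15HWE", "store", "Manchester",
               datetime(2024, 1, 1, tzinfo=timezone.utc))
facts = record(facts, "Truck: MJ15HWE", "store", "Leeds",
               datetime(2024, 6, 1, tzinfo=timezone.utc))

# The current value comes from the newest fact; the old fact is still stored.
current = latest(facts, "Truck: MJ15HWE", "store")

# Trying to change an existing fact raises an error.
mutation_blocked = False
try:
    current.value = "changed"
except FrozenInstanceError:
    mutation_blocked = True
```

Because the Manchester fact is kept rather than overwritten, the full history of the truck can still be queried. The "current" state is just the fact with the latest timestamp.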