The CAP Theorem

Mar 22, 2013 00:00 · 427 words · 3 minute read Programming

What is Distributed System ?

The client can read and write data into distributed system from any node.

What is CAP Theorem ?

It define the behaviour of distributed systems. It will guarantee two of “Consistency”, “Availability” and “Partition Tolerance” attribute at the same time. In other words you can get any two at any time but not possible to get all of them at the same time.

Consistency

system guarantees to read data as fresh as you just wrote.

Availability

give reasonable response in a reasonable amount of time.

Partition Tolerance

The network can continue to work if it is divided into parts due to landline noise or any failure .

Fallacies of distributed computing

Network programming != object oriented programming

The wrong assumption that programmers made usually are:

The network is reliable

Method calls are reliable but web service are not due to the network failures.programmers have to code for the failure scenarios as well.

Latency is zero

Method calls take no time to response while request on network usually takes time. The request might time out in between and do not return any response. That’s why some of the applications are working fine locally but they failed in real networks.

Bandwidth is infinite

Programmers are sometimes assumed that there is no shared memory on the network and they copy whole parameter into one request and that might cause failure of the request. Programmers should break the parameters into chunk with zero latency. In other words, programmers should find a balance between bandwidth and network latency.

The Network is secure

The distributed system might get attacked via its vulnerability. The internal network is not safe and most attacks occur from inside.

Topology does not change

Platforms and tools might be changed and programmers should take into account theses changes into their codes by constructing configurable pieces.

There is one administrator

In a local environment the boss is the programmer ! But in production there might be more than one administrator on the network. For example, someone might run the network and someone runs the applications.

The transport cost is zero

In object oriented code , creating an object is almost free as it only consumes a little memory . In distributed systems sending the message over the network cost real money (cost of running network , bandwidth and etc).

The network is homogeneous

Applications in larger network usually developed by different teams and vendors . upgrading the version of system is the bottleneck and we can not bring whole the system down to upgrade it.