In Cassandra, tombstones are used for deletion because data is written to immutable files. I read that tombstones also solve the tough problem of deleting in distributed systems. This is where I am confused. What problems exist in deleting from distributed databases?
For example, take a 3-node cluster with nodes A, B and C. Say node C is down when a delete arrives. The record is marked with a tombstone on A and B, and success is returned to the client. After some time, compaction kicks in on A and B and clears out the tombstone. Now when a read comes for the previously deleted value, A and B return nothing while C returns the old value. But I read the following, which says that the value returned by C takes precedence over the empty responses:
If the tombstoned record has already been deleted from the rest of the cluster before that node recovers, Cassandra treats the record on the recovered node as new data, and propagates it to the rest of the cluster.
Why does it do this? Since a quorum of nodes says the value is not present, why not return that to the client? That would seem to simplify deletes in distributed systems, since we wouldn't need to wait for gc_grace_seconds before clearing out the tombstones.
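To make the scenario concrete, here is a toy model of how I understand the replica responses to be merged on read. This is only a sketch under my own assumptions: `Cell` and `reconcile` are hypothetical names, not Cassandra APIs, and the only rule modelled is that the response with the newest write timestamp wins. A replica that has nothing for the key is represented as `None`, which carries no timestamp at all.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one replica's stored cell."""
    timestamp: int         # write time; the newest timestamp wins
    value: Optional[str]   # None here marks a tombstone (a recorded delete)


def reconcile(responses: List[Optional[Cell]]) -> Optional[Cell]:
    """Pick the newest cell among the replica responses.

    A response of None means the replica has no record for the key.
    It has no timestamp, so it cannot outrank any cell that does exist.
    """
    cells = [c for c in responses if c is not None]
    return max(cells, key=lambda c: c.timestamp) if cells else None


old_value = Cell(timestamp=1, value="v1")   # still on C, which missed the delete
tombstone = Cell(timestamp=2, value=None)   # written to A and B by the delete

# While A and B still hold the tombstone, it is newer than C's value and wins.
before_compaction = reconcile([tombstone, tombstone, old_value])

# Once compaction has purged the tombstone, A and B answer with nothing,
# so C's old value is the only cell left to choose from.
after_compaction = reconcile([None, None, old_value])
```

In this model, an empty response from A and B looks exactly the same as "this replica never received the write", so nothing in the merge step can tell a purged delete apart from a write that has not reached those nodes yet.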