Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
502 views
in Technique by (71.8m points)

distributed system - In Cassandra, why does a single node's value take precedence over empty responses from a quorum of nodes?

In Cassandra, tombstones are used for deletion because writes go to immutable files. I read that tombstones also solve the tough problem of deleting in distributed systems. This is where I am confused: what problems exist in deleting from distributed databases? For example, take a 3-node cluster with nodes A, B and C. Say node C is down and a delete comes in. It is marked as a tombstone on A and B, and success is returned to the client. After some time, compaction kicks in on A and B and clears out this tombstone. Now when a read comes for the previously deleted value, A and B return nothing while C returns the old value. But I read that the value given by C takes precedence over the empty responses.

If the tombstoned record has already been deleted from the rest of the cluster before that node recovers, Cassandra treats the record on the recovered node as new data, and propagates it to the rest of the cluster.

Why does it do this? Since a quorum of nodes says the value is not present, why don't we return that to the client? This could potentially simplify the problem of deletes in distributed systems, as we wouldn't need to wait for gc_grace_seconds before clearing out the tombstones.



1 Answer

0 votes
by (71.8m points)

A quorum returning nothing could also mean that those nodes simply never received the value because they were down at the time of the write. In that case the single node holding the data is correct, and its value is propagated to the nodes that don't have it. Cassandra simply cannot know whether the data is missing because it was deleted via a tombstone or because the nodes weren't available at the time of the write.
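To make the ambiguity concrete, here is a minimal sketch of timestamp-based read reconciliation. It is a simplified model, not Cassandra's actual code: the `Cell` class and the `reconcile` and `read` functions are hypothetical names, a replica with no data is modelled as `None`, and tie-breaking between equal timestamps is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """One replica's view of a column (hypothetical model)."""
    timestamp: int
    value: Optional[str]  # value=None marks a tombstone


def reconcile(responses: List[Optional[Cell]]) -> Optional[Cell]:
    """Pick the newest cell; replicas that hold nothing contribute nothing."""
    cells = [c for c in responses if c is not None]
    if not cells:
        return None
    return max(cells, key=lambda c: c.timestamp)


def read(responses: List[Optional[Cell]]) -> Optional[str]:
    """Return the live value the client would see, or None if absent/deleted."""
    winner = reconcile(responses)
    if winner is None or winner.value is None:
        return None
    return winner.value


old = Cell(timestamp=1, value="v1")
tombstone = Cell(timestamp=2, value=None)

# A and B still hold the tombstone: it is newer, so the delete wins.
before_purge = read([tombstone, tombstone, old])

# A and B have compacted the tombstone away: only C's stale value remains.
after_purge = read([None, None, old])

# A write that reached only C while A and B were down looks exactly the same.
missed_write = read([None, None, old])
```

In this model `after_purge` and `missed_write` are computed from identical inputs, which is the point: an empty response carries no timestamp, so it cannot outvote a cell that has one. Only a surviving tombstone can.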

That's why it's important to run repairs regularly, and to make sure a repair completes at least once within every gc_grace_seconds window. It's also why you shouldn't bring a machine back into the cluster after it has been offline for longer than this period.
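The rule about offline machines can be written down as a simple check. This is only an illustrative sketch: `safe_to_rejoin` is a hypothetical helper, not part of Cassandra or its tooling, and it assumes the table uses the default gc_grace_seconds of 864000 (10 days) unless told otherwise.

```python
from datetime import datetime, timedelta

# Cassandra's default gc_grace_seconds for a table (10 days).
DEFAULT_GC_GRACE_SECONDS = 864_000


def safe_to_rejoin(went_down_at: datetime,
                   now: datetime,
                   gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """True if the node was offline for less than gc_grace_seconds.

    Beyond that window, other replicas may already have purged tombstones,
    so the returning node's old data could be treated as new.
    """
    return (now - went_down_at) < timedelta(seconds=gc_grace_seconds)


down_since = datetime(2024, 1, 1)
short_outage = safe_to_rejoin(down_since, datetime(2024, 1, 3))
long_outage = safe_to_rejoin(down_since, datetime(2024, 1, 15))
```

A node that fails this check would need its stale data dealt with before rejoining, rather than simply being started again.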

