First, 100,000 rows is not a large number of columns. It is doubtful that you are even talking about a gigabyte of data.
Both XML and JSON incur overhead for storing field names in the data. If you have lots of repeated field names, then lots of redundant field names are being stored. And bigger rows slow down queries.
JSON and XML also have challenges in verifying the field names. This can be handled through the application or constraints. That said, they can be quite useful when the attributes are rarely used.
Your sample data simply suggests NULLable columns. You can have name
and place
. If there is no place
then the value is NULL
. Given that you have a fixed pool, there is a good change that this structure is the simplest and most efficient. The only downside is that adding a new column requires adding a column to a table. And that can be an expensive operation.
An alternative is an EAV model, which is mentioned in the comments. This solves the problem of repeating the names, because you can use ids instead. So, you could have:
create table optionalFields (
optionFieldId int identity(1, 1) primary key,
name varchar(255)
);
create table userOptionalFields (
userOptionalFieldId int identity (1, 1) primary key,
userId int references users(userId),
optionalFieldId int references optionalFields(optionalFieldId),
value varchar(255)
);
The downside to an EAV model is that it is simplest when all the values are strings, and that can be a little tricky if some of the values are numbers or dates. On the positive side, the database ensures that the fields are valid.
The choice between the different data models depends on factors such as:
- The data type of the values.
- The total number of fields.
- How often new fields are added.
- Whether the fields (for a given user) are updated and if so, if they are updated one-by-one or all-at-once.
- How common fields are for a given user.
- How familiar you are with XML and JSON.
- Whether field names have synonyms (for instance "FullName for "Name").
- Whether field names ever change. For instance, might "Name" suddenly become "FullName"?
And no doubt other issues as well.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…