Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
99 views
in Technique[技术] by (71.8m points)

sql - Is json suitable for holding values coming from dynamic columns

Is json suitable for holding values coming from dynamic columns? I want to store data about documents where the structure is known in advance, e.g. Name, Name, Year, Place, etc. For now, I have defined these fields as columns in a table in MS SQL. However, I would like to store data about "dynamic" documents, where the user himself will select a field from a certain pool and insert a value. For now, I store this data as json in one column. For example once will be something like this:'{"Name":"John Doe","Place":"Chicago"}' and other times it can only be: '{"Name": "John Doe"}'

I wonder how to unify this data. I thought that data from first type of documents (with a known number of columns) should also be stored in json. But I don't know if this is a good approach with large amounts of data, eg 100,000 records.

question from:https://stackoverflow.com/questions/65626112/is-json-suitable-for-holding-values-coming-from-dynamic-columns

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

First, 100,000 rows is not a large number of columns. It is doubtful that you are even talking about a gigabyte of data.

Both XML and JSON incur overhead for storing field names in the data. If you have lots of repeated field names, then lots of redundant field names are being stored. And bigger rows slow down queries.

JSON and XML also have challenges in verifying the field names. This can be handled through the application or constraints. That said, they can be quite useful when the attributes are rarely used.

Your sample data simply suggests NULLable columns. You can have name and place. If there is no place then the value is NULL. Given that you have a fixed pool, there is a good change that this structure is the simplest and most efficient. The only downside is that adding a new column requires adding a column to a table. And that can be an expensive operation.

An alternative is an EAV model, which is mentioned in the comments. This solves the problem of repeating the names, because you can use ids instead. So, you could have:

create table optionalFields (
    optionFieldId int identity(1, 1) primary key,
    name varchar(255)
);

create table userOptionalFields (
    userOptionalFieldId int identity (1, 1) primary key,
    userId int references users(userId),
    optionalFieldId int references optionalFields(optionalFieldId),
    value varchar(255)
);

The downside to an EAV model is that it is simplest when all the values are strings, and that can be a little tricky if some of the values are numbers or dates. On the positive side, the database ensures that the fields are valid.

The choice between the different data models depends on factors such as:

  • The data type of the values.
  • The total number of fields.
  • How often new fields are added.
  • Whether the fields (for a given user) are updated and if so, if they are updated one-by-one or all-at-once.
  • How common fields are for a given user.
  • How familiar you are with XML and JSON.
  • Whether field names have synonyms (for instance "FullName for "Name").
  • Whether field names ever change. For instance, might "Name" suddenly become "FullName"?

And no doubt other issues as well.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...