I want to split a >500 MB ASCII text file after ~5000 occurrences of a delimiter ("00I" in my case). I am using the code from this answer (https://stackoverflow.com/a/42302328/14957413):
awk -v n=5000 '
function ofile() {
if (op)
close(op);
op = sprintf("file.GES.%d.", ++p)
}
BEGIN{ofile()}
/00I/{++i} i>n{i=1; ofile()}
{ print $0 > op }'
file
The source file starts with around ~1000 lines of variable declarations, which I also need at the top of every new file that the snippet above creates.
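To make this concrete, here is an untested sketch of how the snippet might be extended: buffer every line before the first 00I record into a variable and replay it into each new file. The toy input, n=2, the `file.GES.%d` filename pattern, and the anchored `/^00I/` match (the original uses unanchored `/00I/`) are my assumptions for illustration, not my real data:

```shell
# toy input: 2-line header, then 5 records; split after every n=2 records
cat > sample.txt <<'EOF'
00K
01Filename
00I
01A
00I
01B
00I
01C
00I
01D
00I
01E
EOF

awk -v n=2 '
function ofile() {
    if (op)
        close(op)
    op = sprintf("file.GES.%d", ++p)
    printf "%s", hdr > op                      # replay the stored header into the new file
}
!body && !/^00I/ { hdr = hdr $0 ORS; next }    # before the first record: collect header lines
!body            { body = 1; ofile() }         # first 00I line: header is complete, open file 1
/^00I/           { ++i }
i > n            { i = 1; ofile() }            # record n+1: roll over to the next file
{ print > op }
' sample.txt
```

With this toy input it should produce file.GES.1 through file.GES.3, each starting with the two header lines.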
Input
//file header
00K
01Filename
02Filetype
03Date
//00F describes a variable
00F
0101
02Variable name 1
03text
04length
00F
0102
02Variable name 2
03number
04length
//content I want split
00I
01Value for first F, e.g. Test
02Value for second F, e.g. 1
//this repeats a few million times
00I
01Value for first F, e.g. TestN
02Value for second F, e.g. N
Expected output for the first to nth file
//Header
00K
01Filename
02Filetype
03Date
//Variable declaration
00F
0101
02Variable name 1
03text
04length
00F
0102
02Variable name 2
03number
04length
//Content
00I
01Value for first F, e.g. Test
02Value for second F, e.g. 1
Two ideas
- Extending the awk statement to store the first ~1000 lines of the source file in a variable and append it to every newly generated file.
- Preparing a separate file with the variable declarations and prepending its content to every newly generated file.
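For the second idea, here is a two-step sketch (again untested, with a made-up two-record sample standing in for the real file, and assuming the header is everything before the first line starting with 00I):

```shell
# toy stand-in for the real source file
cat > source.txt <<'EOF'
00K
01Filename
00I
01A
00I
01B
EOF

# 1) everything before the first 00I record becomes the shared header
awk '/^00I/ { exit } { print }' source.txt > header.txt

# 2) split the body (a single chunk here for brevity; in reality this would
#    be the original awk snippet), then prepend the header to every chunk
awk '!h && /^00I/ { h = 1 } h { print }' source.txt > part.1
for f in part.*; do
    cat header.txt "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```

The second pass with `cat` avoids touching the awk logic at all, at the cost of rewriting every chunk once more.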
Questions
What is the best way to achieve this?
Can it be done by extending the awk expression alone?
Or do I need two statements - first the awk split, then a second pass (e.g. with sed or cat) to prepend the header?
Help is very much appreciated.