Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

powershell - Split XML by node

I've got a big XML file (more than 4 GB) that I need to load into a SQL table. SQL has a 2 GB limit, so the idea is to split the file into several smaller XML files, for example 500 MB each.

I tried to adapt a PowerShell script, but I still get the following error:

[Image: screenshot of the error message; not included in this copy]

Here is the PowerShell script I'm using. I can't find what's wrong with it:

param(  [string]$file = $(throw "ENTITY.XML"), $matchesPerSplit = 50, $maxFiles = [Int32]::MaxValue, $splitOnNode = $(throw "entity"), $offset = 0 )

$ErrorActionPreference = "Stop";

trap {
    $ErrorActionPreference = "Continue"
    write-error "Script failed: $_ 
 $($_.ScriptStackTrace)"
    exit (1);
}

$file = (resolve-path $file).path

$fileNameExt = [IO.Path]::GetExtension($file)
$fileNameWithoutExt = [IO.Path]::GetFileNameWithoutExtension($file)
$fileNameDirectory = [IO.Path]::GetDirectoryName($file)


$reader = [System.Xml.XmlReader]::Create($file) 
 
$matchesCount = $idx = 0

try {
    "Splitting $file on node name='$splitOnNode', with a max of $matchesPerSplit matches per file. Max of $maxFiles files will be generated."
    $result = $reader.ReadToFollowing($splitOnNode)
    $hasNextSibling = $true
    while (-not($reader.EOF) -and $result -and $hasNextSibling -and ($idx -lt $maxFiles + $offset)) {
        if ($matchesCount -lt $matchesPerSplit) {
            if($offset -gt $idx) {
               $idx++
               continue
            }
        
            $to = [IO.Path]::Combine($fileNameDirectory, "$fileNameWithoutExt.$($idx -$offset)$fileNameExt")
            "Writing to $to"
            $toXml = New-Object System.Xml.XmlTextWriter($to, $null)
            $toXml.Formatting = 'Indented'
            $toXml.Indentation = 2
            try {
               $toXml.WriteStartElement("split")
               $toXml.WriteAttributeString("cnt", $null, "$idx")
               
               do {
                  $toXml.WriteRaw($reader.ReadOuterXml())
                  $matchesCount++;
                  $hasNextSibling = $reader.ReadToNextSibling($splitOnNode)
               } while($hasNextSibling -and ($matchesCount -lt $matchesPerSplit))
               $toXml.WriteEndElement();
            } 
            finally {
               $toXml.Flush()
               $toXml.Close()
            }
            $idx++
            $matchesCount = 0;
        }
    }
}
finally {
    $reader.Close()
}

My XML file is in the same folder as the PS file, and "entity" is the name of the node to split on.
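
One thing worth checking, since the error text itself is not shown: in PowerShell, a parameter default written as $(throw "...") is a common idiom for making a parameter required. The string is the error message raised when the argument is omitted, not a default value. If the script is run without arguments, it would therefore stop right away with "ENTITY.XML" as the error text. The sketch below reproduces that behaviour in isolation. Test-ThrowDefault is a hypothetical function name used only for illustration, and whether this is the actual cause here is an assumption.

```powershell
# Hypothetical helper that mirrors the param() pattern from the script above.
function Test-ThrowDefault {
    param(
        [string]$file = $(throw "ENTITY.XML"),   # thrown only when -file is omitted
        $splitOnNode  = $(throw "entity")        # thrown only when -splitOnNode is omitted
    )
    "file=$file node=$splitOnNode"
}

# Omitting the arguments evaluates the default expression, which throws.
try {
    Test-ThrowDefault
}
catch {
    "Caught: $($_.Exception.Message)"
}

# Passing the values explicitly means the throw expressions are never evaluated.
Test-ThrowDefault -file "ENTITY.XML" -splitOnNode "entity"
```

If that turns out to be the cause, there are two options. One is to pass the values on the command line, for example .\split.ps1 -file .\ENTITY.XML -splitOnNode entity (the script name split.ps1 is hypothetical). The other is to replace the throw expressions with plain defaults such as [string]$file = "ENTITY.XML" and $splitOnNode = "entity".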

Question from: https://stackoverflow.com/questions/65886722/split-xml-by-node


1 Answer

0 votes
by (71.8m points)
Waiting for answers
