Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

vb.net: word document iterating words very slow

I have the following code, which selects the text on the current page and then proceeds to read each word (wa is the Word application object):

    wa.Selection.MoveUp(Word.WdUnits.wdWindow, 1, 1) '0=move,1=extend
    wa.Selection.Collapse()
    wa.Selection.MoveDown(Word.WdUnits.wdWindow, 1, 1) '0=move,1=extend
    Dim r As Word.Range
    r = wa.Selection.FormattedText
    Dim Stopwatch As New Stopwatch()
    Stopwatch.Start()
    Dim params = New Dictionary(Of String, String)
    Dim wrd As String
    For i = 1 To r.Words.Count 'wa.Selection.Words.Count
        'params.Add(CStr(i), r.Words.Item(i).Text)
        wrd = r.Words.Item(i).Text 'wa.Selection.Words.Item(i).Text.ToString()
    Next
    Stopwatch.Stop()
    MsgBox(Stopwatch.Elapsed.TotalMilliseconds & "###" & wa.Selection.Words.Count)

In the code above, I get all the text of the current page and want to read each word's text. The page I am testing with has 450 words, and the loop takes 3200 milliseconds, which is far too much: about 7 ms per word. If I limit it to 100 words, it takes 160 milliseconds, about 1.6 ms per word. If I limit it to 50 words, it takes 45 milliseconds, less than 1 ms per word.

Initially I was trying to get it from the Selection object, but the speed is the same.

Am I doing something wrong? How can I improve on this?

Looping through an array of 450 items and just doing an assignment shouldn't take 3.2 seconds.



1 Answer


Looping through an array of 450 items and just doing an assignment shouldn't take 3.2 seconds.

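You are iterating through a Words collection, not an array. If I recall correctly, these collections are populated or re-evaluated each time one of their items is accessed. That would also be consistent with your measurements, where the cost per word grows with the number of words in the range.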
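
If that is what is happening, one thing that may be worth trying first is to let the collection's enumerator walk the words with For Each, rather than calling Item(i) for every index. The following is a minimal, untimed sketch. CollectWords is a hypothetical helper name, and the code assumes a reference to the Microsoft.Office.Interop.Word assembly. Whether it is actually faster depends on how Word implements the enumerator, so time it with the same Stopwatch approach before relying on it.

```vbnet
Imports System.Collections.Generic
Imports Word = Microsoft.Office.Interop.Word

Module WordEnumerationSketch
    ' Hypothetical helper (not from the original post): collects the text of
    ' every word in a range through the collection's enumerator instead of
    ' indexing with Item(i) on each pass.
    Function CollectWords(r As Word.Range) As List(Of String)
        Dim result As New List(Of String)
        For Each w As Word.Range In r.Words
            result.Add(w.Text)
        Next
        Return result
    End Function
End Module
```

It could be called with the range from the question, for example CollectWords(r).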

An alternative would be to use a Word.Range object and move its Start and End points to the next word. The following is one such implementation of this technique.

    Dim params As New Dictionary(Of String, String)
    Dim count As Int32 = 0

    ' Assumption: doc is the document being read, here the active one.
    Dim doc As Word.Document = wa.ActiveDocument
    ' "\Page" is Word's predefined bookmark for the page that contains the selection.
    Dim currentPageRange As Word.Range = wa.Selection.Bookmarks("\Page").Range
    ' The extra parentheses pass the values ByVal, so nothing is copied back into the properties.
    Dim currentWordRange As Word.Range = doc.Range((currentPageRange.Start), (currentPageRange.Start))

    ' Store the end position locally to avoid an interop property get on each iteration.
    Dim endOfRange As Int32 = currentPageRange.End

    While currentWordRange.Start < endOfRange
        count += 1
        currentWordRange.Expand(Word.WdUnits.wdWord) ' extend the end to the start of the next word
        params.Add(count.ToString(), currentWordRange.Text)
        currentWordRange.Collapse(Word.WdCollapseDirection.wdCollapseEnd) ' collapse to the start of the next word
    End While

Running this code against a page with 655 words takes about 600 ms while iterating the Words collection takes about 3100 ms.
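
If Word's exact word boundaries are not required, a different technique (not the one timed above) is to read the page text with a single interop call and split it in managed code. The following is only a sketch under that assumption. SplitWords is a hypothetical helper, and it treats a word as any run of non-whitespace characters, so its output will not match the Words collection, which counts punctuation and paragraph marks as separate items and keeps trailing spaces.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PageTextSplitter
    ' Hypothetical helper (not from the original post): splits text that has
    ' already been fetched from Word into words, entirely in managed code.
    ' Assumption: a "word" is a run of non-whitespace characters.
    Function SplitWords(pageText As String) As List(Of String)
        Dim words As New List(Of String)
        If String.IsNullOrEmpty(pageText) Then Return words
        For Each m As Match In Regex.Matches(pageText, "\S+")
            words.Add(m.Value)
        Next
        Return words
    End Function
End Module
```

With the range from the code above, this could be called as SplitWords(currentPageRange.Text), which costs one property get on the Word side regardless of how many words are on the page.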

