Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
682 views
in Technique[技术] by (71.8m points)

regex - Extract values from HTML TD and Tr

I have some HTML source that i get from a website for option quotes. (please see below)

What is the best way to extract the various text values in tr and store in a collection based on the strike price (4700 in this case available in the mid td 4700.00)

Some people recommend regex while other suggest to use a html parser. I'm doing this in VBA so whats the best way?

<!--<td><a href="javascript:popup1('','','1')">Quote</a></td>

<td><a href="javascript:popup1('','','','','CE')"><img src="/images/print3.gif"></a>



</td>-->





                    <td><a href="javascript:chartPopup('NIFTY', 'OPTIDX', '25JAN2012', '4700.00','CE','S&P CNX NIFTY');"><img src="/live_market/resources/images/grficon.gif" /></a></td>

                        <td class="ylwbg"> 2,935,500</td>

                        <td class="ylwbg"> 27,550</td>

                        <td class="ylwbg"> 12,458</td>


                        <td class="ylwbg"> 23.79</td>

                        <!-- End-->

                        <td class="ylwbg">





                            <a href="/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=4700.00&type=CE&expiry=25JAN2012" target="_blank"> 139.25</a>



                        </td>

                        <!--*Net Change*-->



                        <td class="ylwbg" Style="color:Red;"> -7.35</td>



                        <td class="ylwbg"> 200</td>

                        <td class="ylwbg"> 139.15</td>

                        <td class="ylwbg"> 142.45</td>

                        <td class="ylwbg"> 200</td>

                        <td class="grybg"><a href="/live_market/dynaContent/live_watch/option_chain/optionDates.jsp?symbol=NIFTY&instrument=OPTIDX&strike=4700.00"><b>4700.00</b></a></td>

                        <td class="nobg"> 1,300</td>

                        <td class="nobg"> 76.00</td>

                        <td class="nobg"> 79.00</td>

                        <td class="nobg"> 1,350</td>



                        <!--*Net Change*-->



                            <td class="nobg" Style="color:Red;"> -1.55</td>





                        <td class="nobg">



                            <!-- <a href="javascript:popup1('NIFTY','OPTIDX','25JAN2012','4700.00','PE')"> 76.00</a> -->



                            <a href="/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=4700.00&type=PE&expiry=25JAN2012" target="_blank"> 76.00</a>







                        </td>



                        <td class="nobg"> 26.33</td>



                        <td class="nobg"> 32,772</td>

                        <td class="nobg"> 103,700</td>



                        <td class="nobg"> 5,123,300</td>



                        <td><a href="javascript:chartPopup('NIFTY', 'OPTIDX', '25JAN2012', '4700.00','PE','S&P CNX NIFTY');"><img src="/live_market/resources/images/grficon.gif" /></a></td>



<!--<td><a href="javascript:popup1('','','1')">Quote</a></td>

<td><a href="javascript:popup1('','','','','PE')"><img src="/images/print3.gif"></a></td>-->



                    </tr>
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

After some fiddling I have derived a regex/VBA solution using

  1. XMLHTTP to access the site (change strSite to suit)
  2. a Regexp to get the required numbers
  3. a variant array with 20 records to hold, then dump the numbers to the active sheet

output Looking at the source HTML to find Regex patterns

The Call options have a common starting and finishing string that delimit the 10 values, but there are three different strings

  1. Strings 1-4,7-10 for each record match <td class="ylwbg">X</td>
  2. String 6 has a Style (and other text) preceding the > before the X
  3. String 5 contains a much longer <a href textX</a>

A regex of .Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(</td>)" extracts all the needed strings, but further work is needed later on string 5

The Put options start with <td class="nobg" so these are happily not extracted by a regex that gets points 1-3

enter image description here Actual Code

    Sub GetTxt()
    Dim objXmlHTTP As Object
    Dim objRegex As Object
    Dim objRegMC As Object
    Dim objRegM As Object
    Dim strResponse As String
    Dim strSite As String
    Dim lngCnt As Long
    Dim strTemp As String
    Dim X(1 To 20, 1 To 10)
    X(1, 1) = "OI"
    X(1, 2) = "Chng in vol"
    X(1, 3) = "Volume"
    X(1, 4) = "IV"
    X(1, 5) = "LTP"
    X(1, 6) = "Net Chg"
    X(1, 7) = "Bid Qty"
    X(1, 8) = "Bid Price"
    X(1, 9) = "Ask Price"
    X(1, 10) = "Ask Qnty"

    Set objXmlHTTP = CreateObject("MSXML2.XMLHTTP")
    strSite = "http://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionDates.jsp?symbol=NIFTY&instrument=OPTIDX&strike=4700.00"

    On Error GoTo ErrHandler
    With objXmlHTTP
        .Open "GET", strSite, False
        .Send
        If .Status = 200 Then strResponse = .ResponseText
    End With
    On Error GoTo 0

    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        '*cleaning regex* to remove all spaces
        .Pattern = "[xA0s]+"
        .Global = True
        strResponse = .Replace(strResponse, vbNullString)
        .Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(</td>)"
        If .Test(strResponse) Then
            lngCnt = 20
            Set objRegMC = .Execute(strResponse)
            For Each objRegM In objRegMC
                lngCnt = lngCnt + 1
                If Right$(objRegM.submatches(2), 2) <> "a>" Then
                    X(Int((lngCnt - 1) / 10), IIf(lngCnt Mod 10 > 0, lngCnt Mod 10, 10)) = objRegM.submatches(2)
                Else
                'Get submatches of the form <a href="/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=4700.00&type=CE&expiry=23FEB2012" target="_blank"> 206.40</a>
                    strTemp = Val(Right(objRegM.submatches(2), Len(objRegM.submatches(2)) - InStrRev(objRegM.submatches(2), """") - 1))
                    X(Int((lngCnt - 1) / 10), IIf(lngCnt Mod 10 > 0, lngCnt Mod 10, 10)) = strTemp
                End If
            Next
        Else
            MsgBox "Parsing unsuccessful", vbCritical
        End If
    End With
    Set objRegex = Nothing
    Set objXmlHTTP = Nothing
    [a1].Resize(UBound(X, 1), UBound(X, 2)) = X
    Exit Sub
ErrHandler:
    MsgBox "Site not accessible"
    If Not objXmlHTTP Is Nothing Then Set objXmlHTTP = Nothing
End Sub

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...