Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Java code to split a text file into chunks based on chunk size

I need to split a given text file into equally sized chunks and store them in an array. The input is a set of text files in the same folder. I'm using the following code for this:

    int inc = 0;
    // the backslash must be escaped inside a Java string literal
    File dir = new File("C:\\Folder");
    File[] files = dir.listFiles();
    for (File f : files) {
        if(f.isFile()) {
            BufferedReader inputStream = null;
            try {
                inputStream = new BufferedReader(new FileReader(f));
                String line;

                while ((line = inputStream.readLine()) != null) {
                    String c[] = splitByLength(line, chunksize);
                    for (int i=0;i<c.length;i++) {
                        chunk[inc] = c[i];
                        inc++;
                    }
                }
            }
            finally {
                if (inputStream != null) {
                    inputStream.close();
                }
            }
        }
    }

public static String[] splitByLength(String s, int chunkSize) {  

    int arraySize = (int) Math.ceil((double) s.length() / chunkSize);  
    String[] returnArray = new String[arraySize];  
    int index = 0;  
    for(int i=0; i<s.length(); i=i+chunkSize) {  
        if(s.length() - i < chunkSize) {  
            returnArray[index++] = s.substring(i);  
        }   
        else {  
            returnArray[index++] = s.substring(i, i+chunkSize);  
        }  
    }
    return returnArray;  
}

Here the chunk values are stored in the "chunk" array. The problem is that, because I use readLine() to parse the text file, the result is correct only if the chunk size is less than or equal to the number of characters in a line. Say every line has 10 characters and the file has 5 lines. If I provide any chunk size greater than 10, the file is always split into 5 chunks, with one line in each chunk.

For example, consider a file with the following contents:

abcdefghij
abcdefghij
abcdefghij
abcdefghij
abcdefghij

If chunk size = 5, then:

abcde | fghij | abcde | fghij | abcde | fghij | abcde | fghij | abcde | fghij |

If chunk size = 10, then:

abcdefghij | abcdefghij | abcdefghij | abcdefghij | abcdefghij |

If chunk size > 10, my code still produces the same output as before:

abcdefghij | abcdefghij | abcdefghij | abcdefghij | abcdefghij |

I tried using RandomAccessFile and FileChannel, but I wasn't able to obtain the needed results. Can anyone help me solve this problem? Thank you.


1 Answer

That's because BufferedReader.readLine() reads only a single line, not the whole file, so each call to splitByLength() sees at most one line of text.

I assume that the line break characters (\r and \n) are not part of the content you are interested in.

Maybe this helps:

// ...
StringBuilder sb = new StringBuilder();
String line;
while ((line = inputStream.readLine()) != null) {
    // readLine() strips the line terminator, so only content is appended
    sb.append(line);

    // if enough content is read, extract the chunk
    while (sb.length() >= chunkSize) {
        String c = sb.substring(0, chunkSize);
        // do something with the string

        // keep the remaining content for the next chunk
        sb.delete(0, chunkSize);
    }
}
// that's the last chunk (shorter than chunkSize, or empty)
if (sb.length() > 0) {
    String c = sb.toString();
    // do something with the string
}
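
For reference, here is a minimal self-contained sketch of the same buffering idea, wrapped in a method that returns the chunks as a list. The class name ChunkSplitter and the method splitIntoChunks are hypothetical names chosen for illustration, and the sample file is simulated with a StringReader. As above, it assumes line breaks are not part of the content.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ChunkSplitter {

    // Hypothetical helper: reads every line from the reader and regroups the
    // characters into chunks of chunkSize, ignoring line breaks.
    static List<String> splitIntoChunks(BufferedReader reader, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line);
            // emit as many full chunks as the buffer currently holds
            while (sb.length() >= chunkSize) {
                chunks.add(sb.substring(0, chunkSize));
                sb.delete(0, chunkSize);
            }
        }
        // whatever is left is the last, possibly shorter, chunk
        if (sb.length() > 0) {
            chunks.add(sb.toString());
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Simulates the 5-line example file from the question.
        String content = "abcdefghij\nabcdefghij\nabcdefghij\nabcdefghij\nabcdefghij\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            List<String> chunks = splitIntoChunks(reader, 15);
            // prints: abcdefghijabcde | fghijabcdefghij | abcdefghijabcde | fghij
            System.out.println(String.join(" | ", chunks));
        }
    }
}
```

With a chunk size of 15, the chunks now span line boundaries, which the per-line splitByLength() call cannot do. To process a real file, the StringReader could be replaced with a FileReader as in the question's code.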
