Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


java - My CSV reader doesn't recognize missing value at the beginning and at the end of a row

Hi, I'm working on a simple imitation of pandas' fillna method, which replaces null/missing values in a CSV file with a value passed in as a parameter. Almost everything works, but I have one issue: my CSV reader can't recognize a null/missing value at the beginning or at the end of a row. For example,

   Name,Age,Class
   John,20,CLass-1
   ,18,Class-1
   ,21,Class-3

This input produces errors. The same goes for this example:

   Name,Age,Class
   John,20,CLass-1
   Mike,18,
   Tyson,21,

For this case (a missing value at the end of the row), I can work around the problem by adding another comma at the end, like this:

   Name,Age,Class
   John,20,CLass-1
   Mike,18,,
   Tyson,21,,

However, for the beginning of the row problem, I have no idea how to solve it.

Here's my code for the CSV file reader:

public void readCSV(String fileName) {
    fileLocation = fileName;
    File csvFile = new File(fileName);
    Scanner sfile;
//    noOfColumns = 0;
//    noOfRows = 0;
    data = new ArrayList<ArrayList>();
    int colCounter = 0;
    int rowCounter = 0;
    
    try {
        sfile = new Scanner(csvFile);
        
        while (sfile.hasNextLine()) {
            String aLine = sfile.nextLine();
            Scanner sline = new Scanner(aLine);
            sline.useDelimiter(",");
            colCounter = 0;
            while (sline.hasNext()) {
                if (rowCounter == 0) 
                    data.add(new ArrayList<String>());
                
                
                data.get(colCounter).add(sline.next());
                colCounter++;
            }
            rowCounter++;
            sline.close();
        }
//        noOfColumns = colCounter;
//        noOfRows = rowCounter;
        sfile.close();
    } catch (FileNotFoundException e) {
        System.out.println("File to read " + csvFile + " not found!");
    }
} 
Question from: https://stackoverflow.com/questions/65650966/my-csv-reader-doesnt-recognize-missing-value-at-the-beginning-and-at-the-end-of


1 Answer


Unless you write the CSV file yourself, the writer will never add extra delimiters to suit your application, and you shouldn't add them either, so give up on that train of thought altogether. If you do have access to the CSV file creation process, the simple solution is to not allow null or empty values into the file in the first place. In other words, have defaults placed into empty elements as the CSV file is being written.
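
If you do control the writing side, a minimal sketch of that idea might look like the following. The class name CsvRowWriter and the method toCsvLine are hypothetical (they are not from the question's code), and the sketch assumes that no field contains the delimiter, quotes, or line breaks:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original question's code.
final class CsvRowWriter {

    // Builds one CSV line, substituting defaultValue for null or blank fields.
    // Assumes no field contains the delimiter, quotes, or line breaks.
    static String toCsvLine(List<String> fields, String delimiter, String defaultValue) {
        List<String> cleaned = new ArrayList<>();
        for (String field : fields) {
            boolean missing = (field == null || field.trim().isEmpty());
            cleaned.add(missing ? defaultValue : field.trim());
        }
        return String.join(delimiter, cleaned);
    }
}
```

Each line produced this way has a value in every column, so the reader never has to deal with empty elements.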

The header line within a CSV file is there for a reason: it tells you the number of data columns and the names of those columns within each line (row) of the file. Between the header line and the actual data in the file, you can also get a pretty good idea of what the data type of each column should be.

In my opinion, the first thing your readCSV() method should do is read this Header Line (if it exists) and gather some information about the file that the method is about to iterate through. In your case the Header Line consists of:

Name,Age,Class

Right from the start we know that each line within the file consists of three (3) data columns. The first column is named Name, the second Age, and the third Class. Based on the information provided within the CSV file, we can quickly infer the data types:

Name      (String)
Age       (Integer)
Class     (String)

I'm only pointing this out because, in my opinion, although not mandatory, it would be better to store the CSV data in an ArrayList (or a List interface) of objects of a dedicated class, for example:

ArrayList<Student> studentData = new ArrayList<>();

//  OR  //

List<Student> studentData = new ArrayList<>();

where Student is a class whose instances each represent one row of the file.
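
As a rough sketch of what such a class could look like (the field names, the getters, and the fromCsvLine helper are all assumptions made for illustration, based only on the Name,Age,Class header), missing values are stored as null here:

```java
// Hypothetical Student class; the fields mirror the header Name,Age,Class.
final class Student {
    private final String name;       // null when the Name column is empty
    private final Integer age;       // null when the Age column is empty
    private final String className;  // "class" is a reserved word in Java

    Student(String name, Integer age, String className) {
        this.name = name;
        this.age = age;
        this.className = className;
    }

    String getName() { return name; }
    Integer getAge() { return age; }
    String getClassName() { return className; }

    // Builds a Student from one data line. Assumes a comma delimiter,
    // no quoted fields, and a numeric Age (otherwise Integer.valueOf
    // throws NumberFormatException).
    static Student fromCsvLine(String line) {
        // The limit of 3 keeps trailing empty fields in the array.
        String[] parts = line.split(",", 3);
        String ageText = field(parts, 1);
        Integer age = (ageText == null) ? null : Integer.valueOf(ageText);
        return new Student(field(parts, 0), age, field(parts, 2));
    }

    // Returns the trimmed field, or null if it is absent or empty.
    private static String field(String[] parts, int index) {
        if (index >= parts.length) { return null; }
        String value = parts[index].trim();
        return value.isEmpty() ? null : value;
    }
}
```

A List<Student> could then be filled by calling Student.fromCsvLine(aLine) for every non-blank data line, with defaults applied wherever a getter returns null.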

You seem to want everything within a 2D ArrayList, so with that in mind, below is a method that reads a CSV file and places its contents into a 2D ArrayList. Any column element that contains the word null, or nothing at all, will have a default string applied. There are lots of comments within the code explaining what is going on, and I suggest you give them a read. The code can easily be modified to suit your needs. At the very least I hope it gives you an idea of what can be done to apply defaults to empty values within the CSV file:

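import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

/**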
 * Reads a supplied CSV file with any number of columnar rows and returns 
 * the data within a 2D ArrayList of String ({@code ArrayList<ArrayList<String>>}).
 * <br><br>File delimited data that contains 'null' or nothing (a Null String ("")) 
 * will have a supplied common default applied to that column element before it is 
 * stored within the 2D ArrayList.<br><br>
 * 
 * Modify this code to suit your needs.<br>
 * 
 * @param fileName (String) The CSV file to process.<br>
 * 
 * @param csvDelimiterUsed (String) The delimiter used in the CSV file.<br>
 * 
 * @param commonDefault (String) A default String value that can be common 
 * to all columnar elements within the CSV file that contains the string 
 * 'null' or nothing at all (a Null String ("")). Those empty elements will 
 * end up containing this supplied string value followed by the name of 
 * that column. As an Example, If the CSV file Header line was 
 * 'Name,Age,Class Room' and if the string "Unknown " is supplied to the 
 * commonDefault parameter and during file parsing a specific data column 
 * (let's say Age) contained the word 'null' or nothing at all (ex: 
 * Bob,null,Class-Math OR Bob,,Class-Math) then this line will be stored 
 * within the 2D ArrayList as:<pre>
 * 
 *     Bob, Unknown Age, Class-Math</pre>
 * 
 * @return (2D ArrayList of String Type - {@code ArrayList<ArrayList<String>>})
 */
public ArrayList<ArrayList<String>> readCSV(final String fileName, final String csvDelimiterUsed, 
                                            final String commonDefault) {
    String fileLocation = fileName;         // The student data file name to process.
    File csvFile = new File(fileLocation);  // Create a File Object (use in Scanner reader).
   
    /* The 2D ArrayList that will be returned containing all the CSV Row/Column data.
       You should really consider creating a class to hold Student instances of this 
       data. However, that can also be accomplished by working on the ArrayList later, 
       once it is received.     */
    ArrayList<ArrayList<String>> fileData = new ArrayList<>();
    
    // Open the supplied data file using Scanner (as per OP).
    try (Scanner reader = new Scanner(csvFile)) {
        // An empty file has no Header Line: return the empty list.
        if (!reader.hasNextLine()) { return fileData; }
        /* Read the Header Line and gather information... This array
           will ultimately be set up to hold default values should 
           any file columnar data hold null OR null-string ("").   */
        String[] columnData = reader.nextLine().split("\\s*\\" + csvDelimiterUsed + "\\s*");
        
        /* How many columns of data are expected per row. 
           This is used as the limit in the String#split() 
           method later on when we parse each file data line. 
           The limit is rather important in this case since
           it ensures that trailing empty values are kept as 
           a Null String ("") in the array instead of split() 
           just providing an array of 'lesser length'.     */
        int csvValuesPerLineCount = columnData.length;
        
        // Copy column Names Array: To just hold the column Names.
        String[] columnName = new String[columnData.length];
        System.arraycopy(columnData, 0, columnName, 0, columnData.length);
        
        /* Create default data for columns based on the supplied 
           commonDefault String. Here the supplied default prefixes 
           the actual column name (see JavaDoc).          */
        for (int i = 0; i < columnData.length; i++) {
            columnData[i] = commonDefault + columnData[i];
        }
        
        // An ArrayList to hold each row of columnar data.
        ArrayList<String> rowData;
        // Iterate through in each row of file data...
        while (reader.hasNextLine()) {
            rowData = new ArrayList<>(); // Initialize a new ArrayList.
            // Read file line and trim off any leading or trailing white-spaces.
            String aLine = reader.nextLine().trim();
            // Only Process lines that contain something (blank lines are ignored).
            if (!aLine.isEmpty()) {
                /* Split the line just read based on the supplied CSV file 
                   delimiter and the number of columns established 
                   from the Header line. We do this to determine if a
                   default value will be required for a specific column 
                   that contains no value at all (null or Null String("")). */
                String[] aLineParts = aLine.split("\\s*\\" + csvDelimiterUsed + "\\s*", csvValuesPerLineCount);
                /* Here we determine if default values will be required 
                   and apply them. We then add the columnar row data to 
                   the rowData ArrayList.                 */
                for (int i = 0; i < aLineParts.length; i++) {
                    rowData.add((aLineParts[i].isEmpty() || aLineParts[i].equalsIgnoreCase("null"))
                                ? columnData[i] : aLineParts[i]);
                }
                /* Add the rowData ArrayList to the fileData 
                   ArrayList since we are now done with this 
                   file row of data and will now iterate to 
                   the next file line for processing.     */
                fileData.add(rowData);
            }
        }
    }
    // Process the 'File Not Found Exception'.
    catch (FileNotFoundException ex) {
        System.err.println("The CSV file to read (" + csvFile + ") can not be found!");
    }
    // Return the fileData ArrayList to the caller.
    return fileData;
}

And to use the method above you might do this:

ArrayList<ArrayList<String>> list = readCSV("MyStudentsData.txt", ",", "Unknown ");
// readCSV() never returns null; a missing file yields an empty list.
// Print each row as one comma-separated line.
for (ArrayList<String> row : list) {
    System.out.println(String.join(", ", row));
}
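
To see why the Scanner approach in the question loses the empty first column, and why the limit argument to String#split() matters for the last one, a small side-by-side comparison can help. This is only an illustrative sketch: SplitDemo and its method names are made up for the demonstration, and a comma delimiter is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class comparing the two tokenizing approaches.
final class SplitDemo {

    // Tokenizes the way the question's code does: a Scanner with "," as delimiter.
    static List<String> scannerTokens(String line) {
        List<String> tokens = new ArrayList<>();
        try (Scanner sline = new Scanner(line)) {
            sline.useDelimiter(",");
            while (sline.hasNext()) {
                tokens.add(sline.next());
            }
        }
        return tokens;
    }

    // Splits with a positive limit so that trailing empty fields are kept.
    static String[] splitFields(String line, int columnCount) {
        return line.split("\\s*,\\s*", columnCount);
    }

    public static void main(String[] args) {
        String leading = ",18,Class-1";
        String trailing = "Mike,18,";
        // Scanner skips the leading delimiter, so the empty first field is lost.
        System.out.println("Scanner tokens:    " + scannerTokens(leading).size());
        // split() keeps the empty first field.
        System.out.println("split, leading:    " + splitFields(leading, 3).length);
        // Without a limit, split() drops trailing empty strings.
        System.out.println("split, no limit:   " + trailing.split(",").length);
        // With the limit, the empty last field is kept.
        System.out.println("split, with limit: " + splitFields(trailing, 3).length);
    }
}
```

With the column count taken from the header line as the limit, every row that contains all of its delimiters yields an array of the same length, which is what lets the reader substitute a default by column index.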
