When parsing a CSV file, AbstractCharInputReader throws an ArrayIndexOutOfBoundsException if the input buffer starts with whitespace, and parsing fails with the error below:
Exception in thread "main" com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - Index -1 out of bounds for length 128
Parser Configuration: CsvParserSettings:
Auto configuration enabled=true
Auto-closing enabled=true
Autodetect column delimiter=false
Autodetect quotes=false
Column reordering enabled=true
Delimiters for detection=[]
Empty value=
Escape unquoted values=false
Header extraction enabled=false
Headers=[null]
Ignore leading whitespaces=false
Ignore leading whitespaces in quotes=false
Ignore trailing whitespaces=true
Ignore trailing whitespaces in quotes=false
Input buffer size=128
Input reading on separate thread=false
Keep escape sequences=false
Keep quotes=false
Length of content displayed on error=1000
Line separator detection enabled=false
Maximum number of characters per column=-1
Maximum number of columns=20480
Normalize escaped line separators=true
Null value=null
Number of records to read=all
Processor=none
Restricting data in exceptions=false
RowProcessor error handler=null
Selected fields=none
Skip bits as whitespace=true
Skip empty lines=true
Unescaped quote handling=STOP_AT_DELIMITER
Format configuration:
CsvFormat:
Comment character=#
Field delimiter=,
Line separator (normalized)=\n
Line separator sequence=\r\n
Quote character="
Quote escape character="
Quote escape escape character=null
Internal state when error was thrown: line=1, column=34, record=1, charIndex=641, headers=[null]
at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:402)
at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:623)
at com.univocity.parsers.common.AbstractParser.internalParseAll(AbstractParser.java:552)
at com.univocity.parsers.common.AbstractParser.parseAll(AbstractParser.java:545)
at com.univocity.parsers.common.AbstractParser.parseAll(AbstractParser.java:532)
at com.example.univocitytest.UnivocityTestApplication.main(UnivocityTestApplication.java:114)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 128
at com.univocity.parsers.common.input.AbstractCharInputReader.getString(AbstractCharInputReader.java:482)
at com.univocity.parsers.csv.CsvParser.parseSingleDelimiterRecord(CsvParser.java:186)
at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:109)
at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:581)
... 4 more
Reproducing the problem
The problem can only be reproduced with the attached CSV file, bad_file_bug.csv, because it is the combination of the data in that file and the configured input buffer size that triggers the issue.
Create a simple Java application with the main method below:
    public static void main(String[] args) throws FileNotFoundException {
        CsvParserSettings settings = new CsvParserSettings();
        settings.setAutoConfigurationEnabled(true);
        settings.setAutoClosingEnabled(true);
        settings.setDelimiterDetectionEnabled(false);
        settings.setQuoteDetectionEnabled(false);
        settings.setColumnReorderingEnabled(true);
        settings.setEmptyValue("");
        settings.setEscapeUnquotedValues(false);
        settings.setHeaderExtractionEnabled(false);
        settings.setHeaders((String) null);
        settings.setIgnoreLeadingWhitespaces(false);
        settings.setIgnoreLeadingWhitespacesInQuotes(false);
        settings.setIgnoreTrailingWhitespaces(true);
        settings.setIgnoreTrailingWhitespacesInQuotes(false);
        settings.setInputBufferSize(128);
        settings.setReadInputOnSeparateThread(false);
        settings.setKeepEscapeSequences(false);
        settings.setKeepQuotes(false);
        settings.setErrorContentLength(1000);
        //settings.setLineSeparatorDetectionEnabled(true);
        settings.setMaxCharsPerColumn(-1);
        settings.setMaxColumns(20480);
        settings.setNormalizeLineEndingsWithinQuotes(true); //?
        settings.setNullValue("null");
        settings.setSkipBitsAsWhitespace(true);
        settings.setSkipEmptyLines(true);
        settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER);
        settings.getFormat().setComment('#');
        settings.getFormat().setDelimiter(',');
        settings.getFormat().setLineSeparator("\r\n");
        settings.getFormat().setQuote('"');
        settings.getFormat().setQuoteEscape('"');

        // Create a CSV parser
        CsvParser parser = new CsvParser(settings);

        // Parse all rows from a CSV file
        List<String[]> allRows = parser.parseAll(new FileReader("**path/to/the/attached/csv/file**"));

        // Print all rows
        for (String[] row : allRows) {
            for (String column : row) {
                System.out.print(column + "||");
            }
            System.out.println();
        }
    }
Pass the path of the attached CSV file to the FileReader constructor; the resulting reader is the input to the CsvParser.parseAll() method.
Running the above application reproduces the error mentioned in the description.
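Because the failure depends on where the buffer boundaries fall in the data, it can help to check which offsets in a file would start a new buffer with whitespace. The following is a rough diagnostic sketch, not part of univocity-parsers: `BufferBoundaryCheck` and `whitespaceChunkStarts` are hypothetical names, and it assumes the parser refills its buffer in full chunks of the configured size, which may not hold exactly for every reader. Note that it also treats line-ending characters as whitespace.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class BufferBoundaryCheck {

    // Returns the character offsets at which a chunk of 'bufferSize' characters
    // begins with a whitespace character (anything <= ' ').
    // Assumption: the parser fills its buffer in full chunks of this size.
    static List<Long> whitespaceChunkStarts(Reader reader, int bufferSize) throws IOException {
        List<Long> offsets = new ArrayList<>();
        char[] chunk = new char[bufferSize];
        long offset = 0;
        while (true) {
            // Fill the chunk completely unless the input ends first.
            int filled = 0;
            while (filled < bufferSize) {
                int n = reader.read(chunk, filled, bufferSize - filled);
                if (n == -1) {
                    break;
                }
                filled += n;
            }
            if (filled == 0) {
                break; // no more input
            }
            if (chunk[0] <= ' ') {
                offsets.add(offset);
            }
            offset += filled;
            if (filled < bufferSize) {
                break; // last, partially filled chunk
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // Ten characters read in chunks of five: the second chunk starts with a space.
        String data = "ab,c\n d,e\n";
        System.out.println(whitespaceChunkStarts(new StringReader(data), 5)); // prints [5]
    }
}
```

For the attached file, the same method could be called with a FileReader and a buffer size of 128 to see whether a reported offset lines up with the charIndex in the error message.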
Root Cause
The problem occurs when the configured input buffer size (128 in this case) causes the first character in the buffer to be a whitespace character. When trimming is requested, the getString() method of the AbstractCharInputReader class removes trailing whitespace by walking backwards through the buffer until it sees a non-whitespace character. However, if the first character in the buffer is a whitespace character, the loop decrements the variable i one more time and accesses the buffer at index -1, which causes the exception.
```java
public final String getString(char ch, char stop, boolean trim, String nullValue, int maxLength) {
    if (i == 0) {
        return null;
    }
    int i = this.i;
    for (; ch != stop; ch = buffer[i++]) {
        if (i >= length) {
            return null;
        }
        if (lineSeparator1 == ch && (lineSeparator2 == '\0' || lineSeparator2 == buffer[i])) {
            break;
        }
    }

    int pos = this.i - 1;
    int len = i - this.i;
    if (maxLength != -1 && len > maxLength) { //validating before trailing whitespace handling so this behaves as an appender.
        return null;
    }

    this.i = i - 1;

    if (trim) {
        i = i - 2;
        while (buffer[i] <= ' ' && whitespaceRangeStart < buffer[i]) {
            len--;
            i--;
        }
    }

    String out;
    if (len <= 0) {
        out = nullValue;
    } else {
        out = new String(buffer, pos, len);
    }

    nextChar();
    return out;
}
```
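The backward walk can be isolated in a few lines. The sketch below is not univocity's code: `TrimSketch`, `unguardedTrimmedLength`, and `guardedTrimmedLength` are hypothetical names, the `whitespaceRangeStart` condition is left out for brevity, and the lower-bound check is only one possible fix, assuming it is acceptable for the walk to stop at the start of the buffer.

```java
public class TrimSketch {

    // Mirrors the shape of the trailing-whitespace loop in getString():
    // walks backwards from 'end' with no lower bound on the index.
    static int unguardedTrimmedLength(char[] buffer, int end, int len) {
        int i = end;
        while (buffer[i] <= ' ') { // reads buffer[-1] once i passes index 0
            len--;
            i--;
        }
        return len;
    }

    // Same walk with a lower-bound check (hypothetical fix, not the library's).
    static int guardedTrimmedLength(char[] buffer, int end, int len) {
        int i = end;
        while (i >= 0 && buffer[i] <= ' ') {
            len--;
            i--;
        }
        return len;
    }

    public static void main(String[] args) {
        // The buffer starts with a space, followed by the delimiter.
        char[] buffer = {' ', ',', 'a'};

        // The guarded walk stops at index 0 and reports an empty value.
        System.out.println(guardedTrimmedLength(buffer, 0, 1)); // prints 0

        // The unguarded walk steps past index 0 and fails.
        try {
            unguardedTrimmedLength(buffer, 0, 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded walk failed: " + e.getMessage());
        }
    }
}
```

With a trimmed length of 0, the existing `len <= 0` branch in getString() would return nullValue, so a guard of this kind would leave the rest of the method unchanged.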