-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSV-264: Added DuplicateHeaderMode for flexibility with header strictness. #114
Conversation
Thanks for the PR. You have removed some public API methods. This is not allowed and would fail binary compatibility tests. I have enabled these on the CI environments. If you rebase on master the errors should appear in the checks. You can run:
on the project to see the full checks on the code. The |
Sorry about that, I rebased master and applied the changes requested. I usually use |
You can force push to your branch all you want; -) we usually do not rewrite history for long lived branches like master and release. |
Not sure if you guys are interested in merging this PR, but figured I'd fix the merge conflicts that appeared from recent commits. |
@SethiPandi |
3268d4e
to
a22ace9
Compare
Just a heads up that this is rebased with |
@SethFalco I feel there is something important missing here:
For example, if duplicate headers are allowed, does the first column win? Should it? Does the last column win? Certainly, it should not be random or undocumented. Also, would you mind rebasing on master please? I've recently deprecated the |
71801b9
to
ef1529a
Compare
It's worth noting this change doesn't impact how data is resolved internally or how headers are referenced, but just how if duplicate empty headers are allowed. I can peek into the current behavior anyway and try to document it if it's not already. Meanwhile, I've rebased with master. I also updated an exception to no longer recommend using one of the deprecated methods. Edit: Sorry, forgot to add @ Deprecated to the getter. |
d74d1cc
to
1be64a6
Compare
I didn't see it documented, figured the easiest way to find out would be to just test it.
List<CSVRecord> records = parser.getRecords();
for (CSVRecord record : records)
System.out.println(record.get("A"));
// 2
// 4
List<CSVRecord> records = parser.getRecords();
for (CSVRecord record : records) {
Map<String, String> map = record.toMap();
System.out.println(map);
}
// {A=2, B=6}
// {A=4, B=8} I think we could add a docstring on It might be a good idea to add test cases to cover this scenario, if it'll become documented behavior. Mind if I do this in a separate PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SethFalco
Thank you for the update. Please see my comments scattered throughout.
Gary
src/test/java/org/apache/commons/csv/issues/JiraCsv264Test.java
Outdated
Show resolved
Hide resolved
src/test/java/org/apache/commons/csv/issues/JiraCsv264Test.java
Outdated
Show resolved
Hide resolved
src/test/java/org/apache/commons/csv/issues/JiraCsv264Test.java
Outdated
Show resolved
Hide resolved
b6b801b
to
adf02b2
Compare
Sorry about the imports, I'm guessing that due to a setting in IntelliJ setting back when I was using that. |
@SethFalco |
Just rebased this to resolve merge conflicts.
Seems no one wants to comment on this. ^-^' Any chance this could be merged soon, or still waiting on feedback? (Note, I'm in no rush.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like a good fix for the issue described in JIRA. Only adds one more setting, giving more flexibility for users.
+1 and thanks for the contribution and patience @SethFalco !
-Bruno
Instead of only having a boolean
true
/false
for how duplicate header values are handled, this uses an enum instead.Previously the only possibilities were:
true
: To allow duplicates.false
: To disallow duplicates, except empty cells which were allowed to be duplicates.This pull request makes an enum with three options:
ALLOW_ALL
: To always allow duplicates. (Same astrue
previously.)ALLOW_EMPTY
: To allow duplicates only if they're empty cells. (Same asfalse
previously.)DISALLOW
: To disallow all duplicates.This provides a little more flexibility in the strictness for the parser, and makes what the options do clearer.
Jira Issue: https://issues.apache.org/jira/browse/CSV-264