How to preprocess invalid CSV in a canonical way #1835
PanCakeConnaisseur
started this conversation in
Idea
-
What do you mean by illegal syntax? If it's not a valid CSV file, then you can just treat it as a normal text file. In that case, calling pd.read_csv within your node may not be too bad: it is acting as transformation logic (arguably a string2dataframe function) rather than I/O.
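A minimal sketch of such a string2dataframe node, assuming the raw file content arrives as a single string from the text dataset. The kind of "illegal syntax" is not specified in the question, so the repair step here (re-quoting a trailing free-text column that contains unquoted commas) is purely hypothetical, as are the function names `fix_raw_csv` and `text_to_dataframe`.

```python
import csv
import io

import pandas as pd


def fix_raw_csv(raw_text: str) -> str:
    """Hypothetical repair: quote a last column that contains stray commas."""
    lines = raw_text.splitlines()
    # Assumption: the header row is valid and defines the column count.
    n_columns = len(lines[0].split(","))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Split at most n_columns - 1 times so extra commas stay in the
        # last field; csv.writer then quotes that field where needed.
        writer.writerow(line.split(",", n_columns - 1))
    return buffer.getvalue()


def text_to_dataframe(raw_text: str) -> pd.DataFrame:
    """Node function: string in, DataFrame out (no file I/O inside)."""
    fixed = fix_raw_csv(raw_text)
    return pd.read_csv(io.StringIO(fixed))
```

Because the node returns a DataFrame rather than a str, its output can be written by a catalog entry of type pandas.CSVDataSet, which avoids the missing to_csv error. Replace the body of `fix_raw_csv` with whatever repair your file actually needs.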
-
What is the canonical way of fixing a CSV file with illegal syntax and then continuing to work with it? I can't use `type: pandas.CSVDataSet` for it in the data catalog because parsing it would drop some of the illegal data. So far I am using `kedro.extras.datasets.text.TextDataSet` and fixing the raw string of the file in a node. But how should I create the next catalog entry? I tried telling the node to output it into a data entry of `type: pandas.CSVDataSet`, but I get the error that `str` does not have a `to_csv` attribute. Should I call `pandas.read_csv()` manually in my syntax-fixing method? Or how do I add preprocessing steps to fix the faulty CSV?