-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xena_dataset.py: Add logging functionality #45
base: master
Are you sure you want to change the base?
Conversation
c17e851
to
30cb8aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never know for sure? Is there a buffer for log file so that you can't trace the progress in real time?
xena_gdc_etl/xena_dataset.py
Outdated
@@ -36,6 +37,8 @@ | |||
GDC_RELEASE_URL, | |||
) | |||
|
|||
logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, logger is configured in gdc2xena.py
. I believe you should getLogger
as self.logger
when an object is instantiated: https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Difficulties would be the functions below. It's not quite normal to have a logger passed over to a function, is it? Maybe we can leave them along for now and just deal with the dataset classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I was wondering how can I get the root_dir
without making significant changes in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we use the logger in the module level, functions can be covered as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. You don't get root_dir
. You get the instance. Example: yunhailuo@d72d5ad
I run xge etl -p TCGA-OV -t muse_snv &
and get a log as:
2019-06-16 22:31:34 Xena-GDC-ETL [INFO]: Importing [1/1] projects: TCGA-OV
2019-06-16 22:31:34 root [INFO]: Starts to download...
2019-06-16 22:31:34 root [INFO]: This is from read_by_ext.
2019-06-16 22:31:34 root [ERROR]: Test error message!
2019-06-16 22:31:34 Xena-GDC-ETL [WARNING]: No muse_snv data for cohort TCGA-OV.
Traceback (most recent call last):
File "/Users/yunhailuo/Github/xena-GDC-ETL/xena_gdc_etl/gdc2xena.py", line 87, in gdc2xena
dataset.download().transform().metadata()
File "/Users/yunhailuo/Github/xena-GDC-ETL/xena_gdc_etl/xena_dataset.py", line 652, in download
raise ValueError(msg)
ValueError: Test error message!
I'm not sure if your code has something special or you haven't notice this. If I do what you've done here in my test branch. I end up with having the log going to stdout instead of a file. To me, this is because xena_database.py
is imported before logging.basicConfig
in gdc2xena.py
which makes the config here take precedence because "This function does nothing if the root logger already has handlers configured for it."
The reason I don't like logging/logger in function is not because they can't be covered. Those function are in a way general functions. I don't want to have log there (which is equivalent to print when there is no log) to mess up other potential logs. But I guess we can use logger since it couldn't hurt that much for a log... And they are not really "general"...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, another basicConfig is not required.
For 31b69b6 + basicConfig
I end up with 2 logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me with this (yunhailuo@a447162) or share the code again? I couldn't get 2 logs and wondering how that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, two logs can be useful to accommodate your unfinished.json
and long less useful info. I just don't know a simple clean way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, I used some FileHandler
if I remember correctly. Sending the code in a while.
UPD: Something like:
log_format = '%(asctime)-15s %(name)s [%(levelname)s]: %(message)s'
logger = logging.getLogger('Xena-GDC-ETL')
logger.setLevel(logging.INFO)
formatter = logging.Formatter(log_format)
file_handler = logging.FileHandler('etl_' + time.strftime("%Y%m%d-%H%M%S") + '.err',)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worry. That's the way I don't prefer for now. I thought your initial code without handler will do.
xena_gdc_etl/xena_dataset.py
Outdated
@@ -475,7 +487,8 @@ def projects(self, projects): | |||
elif isinstance(projects, list): | |||
self._projects = projects | |||
else: | |||
raise ValueError('"projects" must be either str or list.') | |||
logging.exception('"projects" must be either str or list.') | |||
raise ValueError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errors raised should still have their message. Same below. You can do a msg
string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this resolved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I must have misunderstood. Can you please clarify? I thought something like this: https://github.com/yunhailuo/xena-GDC-ETL/pull/45/files#diff-ff44a6f244700f099c71d2285046c32aR487 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I didn't notice that. You are right.
A side note: is it a right use of logging.exception
? Should this function "only be called from an exception handler"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to work, we can replace this by logging.error(msg, exc_info=True)
.
I think |
We should trim it if that's reasonable. Wondering how the log currently looks like before we trimming it? |
I didn't do full run, but it looks like this: 2019-06-16 21:30:37 [INFO]: Starts to download...
2019-06-16 21:30:37 [INFO]: Searching for raw data ...
2019-06-16 21:30:40 [INFO]:
499 files found for mirna data of ['TCGA-OV'].
2019-06-16 21:30:42 [INFO]:
[1/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-1512-01A.bbc678f8-ce50-43ae-96dd-b7e1da51ebab.txt" ...
2019-06-16 21:30:43 [INFO]:
[2/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-0912-01A.3c32784a-6a7b-4cb7-888a-387dc1bb66ee.txt" ...
2019-06-16 21:30:45 [INFO]:
[3/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-61-2094-01A.2fabb2d2-ffd2-4fa0-accf-360cbcf0002b.txt" ...
2019-06-16 21:30:47 [INFO]:
[4/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-0924-01A.461c4bf9-7207-457d-9113-8a0364a141c6.txt" ...
2019-06-16 21:30:49 [INFO]:
[5/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-23-2072-01A.305f5852-36a9-4e9f-a1e6-e120799da4b8.txt" ...
2019-06-16 21:30:51 [INFO]:
[6/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-A5FU-01A.acbe9b7c-259d-4a60-827e-5611764a03d2.txt" ...
2019-06-16 21:30:53 [INFO]:
[7/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-0805-01A.60f18ee9-9203-43c0-b00e-5e99912ba7ff.txt" ...
2019-06-16 21:30:55 [INFO]:
[8/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-24-1563-01A.08782f35-bac0-436a-8591-4a7e85d23fc9.txt" ... |
Did you say:
? |
Sorry, I meant to say |
a9614ca
to
0e20436
Compare
I'd say put this on hold and re-focus on data importing for now. |
No description provided.