Corrections to the translations of the data table headers #1

Open
CutieDeng opened this issue Nov 21, 2021 · 5 comments

@CutieDeng
Owner

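Corrections to the translations of the data table headers

  • After any English word, please use "," (and other half-width English punctuation).

  • The unit for headers such as total_cases is 「人」 (persons).

  • total_cases and new_cases can be derived from each other, so the new_cases-related columns are ignored.

  • stringency_index: 财政紧缩指数 (fiscal austerity index)

  • handwashing_facilities: 卫生设施 (sanitation facilities)

  • reproduction rate: 基本传染数 or 基本再生数 (basic reproduction number).
    It is the average number of other people to whom one initial case of an infectious disease passes the infection, assuming that no epidemic-prevention measures intervene and that nobody has immunity.
    The basic reproduction number is usually written $R_0$. The larger the value, the harder the epidemic is to control.
    Without any prevention measures:

    • $R_0 < 1$: the disease will gradually die out.
    • $R_0 > 1$: the disease will spread exponentially and become an epidemic.
    • $R_0 = 1$: the disease will become endemic.

    See: Wikipedia: 基本传染数 (Basic reproduction number)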
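
The three cases above can be illustrated with a small worked calculation. It is a simplified sketch, assuming a single initial case, a fully susceptible population that never runs out, and discrete "generations" of infection; the symbol $I_n$ (expected new infections in generation $n$) is introduced here only for illustration.

```latex
% Expected new infections per generation, starting from one case:
I_0 = 1, \qquad I_n = R_0 \, I_{n-1} = R_0^{\,n}

% Expected cumulative infections after n generations (for R_0 \neq 1):
\sum_{k=0}^{n} R_0^{\,k} = \frac{R_0^{\,n+1} - 1}{R_0 - 1}

% Worked values for n = 10:
%   R_0 = 2   : I_{10} = 2^{10}   = 1024             (exponential growth)
%   R_0 = 1   : I_{10} = 1                           (constant level)
%   R_0 = 0.5 : I_{10} = 0.5^{10} \approx 0.00098    (dies out)
```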


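Additions:

  • icu_patients: 进入 ICU 的病例数 (number of cases admitted to the ICU)
  • hosp_patients: 入院病例数 (number of hospitalized cases)
  • weekly_icu_admissions: 周进入 ICU 病例数 (weekly number of cases admitted to the ICU)
  • new_tests: 检测数 (number of tests)
  • positive_rate: (检测)阳性率 ((test) positive rate)
  • tests_units: 检测单元 (testing units)
  • total_vaccinations: 接种疫苗数 (number of vaccinations)
  • total_boosters_per_hundred: (疫苗)加强针接种数 (number of (vaccine) booster doses administered)
  • excess_mortality_cumulative: 超额死亡累计数 (cumulative excess deaths)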
@CutieDeng added the documentation label (Improvements or additions to documentation) on Nov 21, 2021
@CutieDeng
Owner Author

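Adding columns of useful information

Split all the information into two parts: one described as "raw information" and the other as "additional information". The condition is that the additional information can be derived automatically from the raw information. In the concrete implementation that follows, the additional information will therefore not actually be stored, which saves disk space and avoids inconsistency errors.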
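
As an illustration of deriving additional information from raw information, the sketch below stores only the cumulative total_cases series and computes the daily new cases by first differences. The class and method names are hypothetical, the numbers are made up, and the sketch assumes one entry per consecutive day with no gaps, with the first day's total counted as that day's new cases.

```java
import java.util.Arrays;

/** Sketch: keep only cumulative totals ("raw"), derive daily increments ("additional") on demand. */
final class DerivedColumns {

    /**
     * Derives daily new cases from a cumulative series by first differences:
     * newCases[0] = totalCases[0], newCases[i] = totalCases[i] - totalCases[i - 1].
     * Assumes one entry per consecutive day and no missing values.
     */
    static long[] newCasesFromTotals(long[] totalCases) {
        long[] newCases = new long[totalCases.length];
        long previous = 0;
        for (int i = 0; i < totalCases.length; i++) {
            newCases[i] = totalCases[i] - previous;
            previous = totalCases[i];
        }
        return newCases;
    }

    public static void main(String[] args) {
        long[] totals = {3, 5, 5, 12}; // made-up cumulative counts
        System.out.println(Arrays.toString(newCasesFromTotals(totals))); // prints [3, 2, 0, 7]
    }
}
```

If the real data contains gaps or downward corrections of the totals, the differences can be misleading or negative, so those cases would need their own handling.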

@ChristinaLJC
Collaborator

The keyword smoothed still needs to be resolved; the current understanding is "noise reduction" (降噪).
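
One plausible reading of "smoothed", given here as an assumption and not as a confirmed definition, is a trailing moving average over the last few days (a 7-day window is a common choice for daily epidemic counts). The sketch below shows that technique; the class name, method name and the handling of the first days (averaging over however many values exist so far) are illustrative choices and should be checked against the dataset's own documentation.

```java
/** Sketch of a trailing moving average, one possible meaning of "smoothed". */
final class Smoothing {

    /**
     * Returns a series where entry i is the mean of the last `window` values
     * ending at i. For the first entries, fewer than `window` values exist,
     * so the mean is taken over the values available so far.
     */
    static double[] trailingAverage(double[] values, int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        double[] out = new double[values.length];
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
            if (i >= window) {
                sum -= values[i - window]; // drop the value that left the window
            }
            int count = Math.min(i + 1, window);
            out[i] = sum / count;
        }
        return out;
    }
}
```

For example, `Smoothing.trailingAverage(dailyNewCases, 7)` would flatten weekly reporting patterns in a hypothetical `dailyNewCases` array. If this reading is right, the smoothed columns are derivable from the daily values and would count as additional information.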

@CutieDeng
Owner Author

If this does not display properly, click Data atom model v1125.pdf to read it.

Data atom model v1125

Last modified at 22:44, Nov. 25, 2021.

Geography with country

  • ISO code
  • continent
  • location

[raw information] Some ISO codes that start with 'OWID', such as 'OWID_NAM', summarize the information for a whole continent. For these rows the region name is stored in the 'location' column rather than in 'continent'.
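
A minimal sketch of how such summary rows could be told apart from country rows, assuming the prefix is spelled "OWID_" as in the 'OWID_NAM' example. The class and method names are hypothetical, and it is worth checking against the data whether every code with this prefix really is a continent, since the prefix may also cover other aggregates such as a world total.

```java
/** Sketch: separate aggregate rows (ISO code prefixed with "OWID_") from country rows. */
final class IsoCodes {

    /** True when the ISO code carries the "OWID_" prefix used by summary rows. */
    static boolean isAggregate(String isoCode) {
        return isoCode != null && isoCode.startsWith("OWID_");
    }

    /**
     * Returns the name of the region a row belongs to: for aggregate rows the
     * name is taken from the 'location' column, otherwise from 'continent'.
     */
    static String regionName(String isoCode, String continent, String location) {
        return isAggregate(isoCode) ? location : continent;
    }
}
```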

Time Information

  • date

Country Information

  • <Geography>
  • population
  • population density
  • median age
  • aged 65/70 and older
  • GDP per capita ❓
  • extreme poverty
  • cardiovascular death rate
  • diabetes prevalence
  • female/male smokers
  • handwashing facilities
  • hospital beds per thousand
  • life expectancy
  • human development index

Epidemic Information

  • <Country>

  • <Time>

  • Total cases

    You can use a difference equation to derive the 'new cases' values.

  • Total deaths

  • ❔Total cases per million(total deaths, new deaths,

  • ❕Reproduction rate

  • Patients in ICU

  • Patients in hospitals

  • Weekly ICU admissions

  • Weekly hospital admissions

  • Total tests

  • ❕Positive rate

  • ❕Tests per case

  • ❕Tests units

  • Total vaccinations

  • People vaccinated

  • People fully vaccinated

  • ❓Total boosters

  • Stringency index

  • Excess mortality cumulative absolute

  • Excess mortality cumulative

  • Excess mortality

  • Excess mortality cumulative per million

Graph Information

  • New cases smoothed
  • New deaths smoothed
  • New cases smoothed per million
  • New deaths smoothed per million
  • New tests smoothed
  • New vaccinations smoothed

@ChristinaLJC
Collaborator

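I have carefully read the data-cleaning code you have uploaded so far, and I have two questions:

  1. How efficient is reading the input with Scanner? Would BufferedReader be faster?
  2. Please add comments before the important parts of the code to make it easier to understand later. For example: does the compareIndex method group the column at the given index by country?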

@CutieDeng
Owner Author

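Thanks for the questions. Here are my answers:

  1. Scanner comes with fairly convenient built-in preprocessing, which makes it relatively slow; where performance matters, switching to BufferedReader is recommended.
  2. Understood. In future code I will add comments wherever they are needed and wherever the code is likely to cause confusion.

While I am at it, here is the upcoming data-processing work:

  • Check whether the "per" columns are redundant, for example whether New cases smoothed per million is directly proportional to New cases smoothed
  • Clarify the meaning of the columns that cause confusion
  • Produce the final data model architecture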
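
For reference, a line-oriented read with BufferedReader could look like the sketch below. It is not the project's actual loading code: the class name and method name are hypothetical, and the comma split assumes that no field contains a quoted comma, which should be verified for the real CSV file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Sketch: read CSV rows line by line with BufferedReader instead of Scanner. */
final class CsvLines {

    /** Reads every line and splits it on commas; assumes no quoted fields containing commas. */
    static List<String[]> readRows(Reader source) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(",", -1)); // limit -1 keeps trailing empty fields
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample so the sketch runs without a data file.
        String sample = "iso_code,date,total_cases\nOWID_NAM,2021-11-25,\n";
        for (String[] row : readRows(new StringReader(sample))) {
            System.out.println(row.length + " fields: " + String.join("|", row));
        }
    }
}
```

To read a real file, the same method can be given `java.nio.file.Files.newBufferedReader(path)` for whatever path the data lives at. Any actual speed difference is best confirmed by timing both versions on the full dataset.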
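
The redundancy check in the first bullet could be sketched as follows, assuming that a "per million" column is simply the base value divided by the population and multiplied by one million. That definition, the class name, the method names and the tolerance are assumptions for illustration, and the numbers in the usage note are made up.

```java
/** Sketch: test whether a stored per-million value can be recomputed from value and population. */
final class PerMillionCheck {

    /** Recomputes a per-million figure from a raw value and a population. */
    static double perMillion(double value, double population) {
        return value / population * 1_000_000.0;
    }

    /**
     * True when the stored per-million figure matches the recomputed one within
     * a relative tolerance, which allows for rounding in the published data.
     */
    static boolean isRedundant(double value, double population,
                               double storedPerMillion, double relativeTolerance) {
        double expected = perMillion(value, population);
        double scale = Math.max(1.0, Math.abs(expected));
        return Math.abs(expected - storedPerMillion) <= relativeTolerance * scale;
    }
}
```

For instance, `PerMillionCheck.isRedundant(500, 2_000_000, 250.0, 1e-3)` is true, because 500 cases in a population of 2,000,000 is 250 per million. If the check holds for every row of a column, that column can be treated as additional information and left out of storage.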
