[deprecated - too many problems w dataset] Kylel/semeval2017 #16

Open
wants to merge 24 commits into base: master
24 commits
6e6675a
Span data structure; functions for handling span labeling; script to …
kyleclo Feb 18, 2019
4fcc10a
fix bug where infinite while loop when labeling tokens because token …
kyleclo Feb 18, 2019
da1adca
add logging to know if script has stalled
kyleclo Feb 18, 2019
f626a64
add try-except to catch bad annotation in brat
kyleclo Feb 18, 2019
77f37a5
semeval2017 ner dataset
kyleclo Feb 18, 2019
6c0291d
typsetting
kyleclo Feb 18, 2019
11bb876
refactor Span stuff for clarity
kyleclo Feb 18, 2019
33cd99a
fix bug in Span where recursively calling itself in <> and <= >= methods
kyleclo Feb 18, 2019
66db2a3
add better comments; revert to sentence splitting since contexts can …
kyleclo Feb 18, 2019
15921ca
fix bug in span where wrong length
kyleclo Feb 18, 2019
a092ba7
handle bug with whitespacing in entity mention annotations
kyleclo Feb 18, 2019
cccd81c
updated data to semeval2017 ner
kyleclo Feb 18, 2019
48725b6
clean up script for ner
kyleclo Feb 18, 2019
f583662
split semeval2017 script into NER and REL
kyleclo Feb 19, 2019
ee37144
relex data for semeval17
kyleclo Feb 19, 2019
4b4edf8
data structure for Relation Mention
kyleclo Feb 19, 2019
fff051e
Merge branch 'master' into kylel/semeval2017
kyleclo Feb 20, 2019
213346b
add char start/stop spans to conll2003 data for semeval
kyleclo Feb 21, 2019
1a45df2
add script to add spans to semeval17 conll data
kyleclo Feb 21, 2019
08f36bd
readd chunk label to semeval data; move end span to pos location
kyleclo Feb 21, 2019
acc5da9
script for loading allenlnp model from beaker and predicting in semev…
kyleclo Feb 22, 2019
ab9d62b
scienceie2017 scripts
kyleclo Feb 22, 2019
85b6499
test files for evaluating semeval
kyleclo Feb 22, 2019
2e77256
update semeval predict script to pull experiment from beaker
kyleclo Feb 22, 2019
11,939 changes: 11,939 additions & 0 deletions data/ner/semeval2017/dev.txt

Large diffs are not rendered by default.

23,376 changes: 23,376 additions & 0 deletions data/ner/semeval2017/test.txt

Large diffs are not rendered by default.

69,550 changes: 69,550 additions & 0 deletions data/ner/semeval2017/train.txt

Large diffs are not rendered by default.

40 changes: 40 additions & 0 deletions data/rel/semeval2017/dev.txt
@@ -0,0 +1,40 @@
{"text": "<E2>Complex Langevin</E2> (<E1>CL</E1>) dynamics [1,2] provides an approach to circumvent the sign problem in numerical simulations of lattice field theories with a complex Boltzmann weight, since it does not rely on importance sampling.", "label": "Synonym-of", "metadata": {"id": "S0003491613001516", "spans": [18, 20, 0, 16]}}
{"text": "This method leads to solution of perhaps the most known test-case that exhibits a <E2>first order phase transition</E2> (semi-heuristically described) such as the <E1>van der Waals model</E1>.", "label": "Hyponym-of", "metadata": {"id": "S0003491615001839", "spans": [335, 354, 263, 291]}}
{"text": "The Hamiltonian then simplifies to H=-\u03b3eB(S1(z)+S2(z))+|\u03b3e|\u03b1S\u21922\u00b7I\u2192, where <E1>\u03b1</E1> is the <E2>isotropic hyperfine coupling</E2>.", "label": "Hyponym-of", "metadata": {"id": "S0009261413004612", "spans": [1570, 1571, 1579, 1607]}}
{"text": "<E2>Optical processes</E2>, including <E1>resonance energy transfer</E1> are similarly dependent on the local environment of molecular chromophores [2\u20134].", "label": "Hyponym-of", "metadata": {"id": "S0009261414000372", "spans": [300, 325, 271, 288]}}
{"text": "Such <E2>methods, mostly multi-step and time-consuming</E2>, can typically be cast in one of two distinct categories: synthetic mechanisms designed to produce a single stereoisomer, or <E1>separation techniques</E1> to isolate distinct enantiomers from a racemic mixture.", "label": "Hyponym-of", "metadata": {"id": "S0009261415001517", "spans": [441, 462, 270, 315]}}
{"text": "A model Hamiltonian exhibiting tunnelling dynamics through a multidimensional asymmetric double well potential has been used as a test by the MP/SOFT [18] and CCS methods [19] mentioned above, and also more recently by a <E1>configuration interaction</E1> (<E2>CI</E2>) expansion method [20] and two-layer version of CCS (2L-CCS).", "label": "Synonym-of", "metadata": {"id": "S0009261415008362", "spans": [336, 361, 363, 365]}}
{"text": "Based on the theoretical analysis, the <E2>value of the measuring resistor</E2>, <E1>Rm</E1>, has no effect on the corrosion process and on the estimated value of noise resistance.", "label": "Synonym-of", "metadata": {"id": "S0010938X13003818", "spans": [72, 74, 39, 70]}}
{"text": "A <E2>surfactant</E2> is a <E1>surface active agent</E1>.", "label": "Hyponym-of", "metadata": {"id": "S0010938X14002157", "spans": [18, 38, 2, 12]}}
{"text": "Regions with larger (\u0394\u03a8) indicate increased surface reactivity [11,15,18], and even a correlation between Volta potential differences measured in nominally dry air and their <E2>free corrosion potential</E2> (<E1>Ecorr</E1>) pre-determined under immersed conditions has been reported [18].", "label": "Synonym-of", "metadata": {"id": "S0010938X15003261", "spans": [1156, 1161, 1130, 1154]}}
{"text": "By applying this mapping, <E2>n-alkanes chains</E2> containing multiples of three carbon units can be represented directly: <E1>n-C6H14</E1>, n", "label": "Hyponym-of", "metadata": {"id": "S0021961415003821", "spans": [919, 926, 830, 846]}}
{"text": "A <E2>direct numerical simulation</E2> (<E1>DNS</E1>) approach is used to evaluate profiles of fluid velocities and concentrations in water, and several important turbulence statistics have been evaluated without using turbulent closures, and subgrid-scale models.", "label": "Synonym-of", "metadata": {"id": "S0021999113005652", "spans": [1410, 1413, 1381, 1408]}}
{"text": "Regardless of the details, the forced response is composed of shallow-water waves, possibly including <E1>Kelvin waves</E1>, with the largest amplitudes in <E2>waves</E2> with a natural frequency \u03c9f close to that of the forcing frequency \u03c9; various examples of this sort are given in Chapters 9 and 10 of Gill [16].", "label": "Hyponym-of", "metadata": {"id": "S0021999113005846", "spans": [442, 454, 487, 492]}}
{"text": "We finish Section 2 by reformulating our model into the phase field framework, which appears more suitable for the problem in hand, and we formulate the <E1>cell tracking problem</E1> as a <E2>PDE constrained optimisation problem</E2>.", "label": "Hyponym-of", "metadata": {"id": "S0021999115003423", "spans": [594, 615, 621, 657]}}
{"text": "The dynamics of various <E2>physical phenomena</E2>, such as the <E1>movement of pendulums, planets, or water waves</E1> can be described in a variational framework.", "label": "Hyponym-of", "metadata": {"id": "S0021999115007895", "spans": [56, 102, 24, 42]}}
{"text": "Although relatively long communication times between remote processors may hinder this process in typical <E2>parallel computers</E2>, this is not the case for <E1>GPGPU architectures</E1>.", "label": "Hyponym-of", "metadata": {"id": "S0021999115008153", "spans": [409, 428, 364, 382]}}
{"text": "R-adaptivity \u2013 <E2>mesh redistribution</E2> \u2013 involves <E1>deforming a mesh</E1> in order to vary local resolution and was first considered for atmospheric modelling more than twenty years ago by Dietachmayer and Droegemeier", "label": "Hyponym-of", "metadata": {"id": "S0021999115008372", "spans": [213, 229, 182, 201]}}
{"text": "<E1>Calumite</E1> is a <E2>powdered material</E2>, with a typical particle size distribution between limits of ca.", "label": "Hyponym-of", "metadata": {"id": "S0022311513010313", "spans": [849, 857, 863, 880]}}
{"text": "As such these materials are exposed to a large number of environmental factors that will promote <E2>degradation mechanisms</E2> such as <E1>oxidation</E1>.", "label": "Hyponym-of", "metadata": {"id": "S0022311514006722", "spans": [222, 231, 191, 213]}}
{"text": "the intended crystalline phase was the closely related <E1>titanate pyrochlore</E1>, <E2>CaUTi2O7</E2>.", "label": "Synonym-of", "metadata": {"id": "S0022311514006941", "spans": [544, 563, 565, 573]}}
{"text": "For completeness, we report two <E2>shell models</E2> with the best results given by the <E1>Catlow potential model</E1>.", "label": "Hyponym-of", "metadata": {"id": "S0022311514009271", "spans": [825, 847, 777, 789]}}
{"text": "The latter mechanism, <E1>DHC</E1>, is a sub-critical, time dependent <E2>cracking phenomenon</E2> that requires long range hydrogen diffusion for repeated local hydride growth and fracture at a hydrostatic tensile stress raiser [5,41,42].", "label": "Hyponym-of", "metadata": {"id": "S0022311515002354", "spans": [792, 795, 831, 850]}}
{"text": "During burnup, pure <E1>UO2 fuel</E1> tends to oxidize to <E2>UO2+x</E2>.", "label": "Synonym-of", "metadata": {"id": "S0022311515301653", "spans": [802, 810, 831, 836]}}
{"text": "<E1>Methane</E1> (<E2>CH4</E2>) is a precursor for carbonaceous deposits that form a sacrificial layer protecting the underlying graphite from excessive weight loss [15] and reduction in mechanical strength [16].", "label": "Synonym-of", "metadata": {"id": "S0022311515303901", "spans": [731, 738, 740, 743]}}
{"text": "An essential part of <E2>nuclear reactor analysis</E2> is the <E1>prediction of the three-dimensional space-time kinetics of neutrons</E1> in a relatively large, finite, heterogeneous, three-dimensional reactor core.", "label": "Hyponym-of", "metadata": {"id": "S0029549313003439", "spans": [53, 120, 21, 45]}}
{"text": "Methods that predict the cell temperature at <E1>maximum power point</E1> (<E2>MPP</E2>) operation offer a more realistic approach since they include the electrical energy generation of the solar cells (i.e. real operating conditions); Yandt et al.", "label": "Synonym-of", "metadata": {"id": "S0038092X15003059", "spans": [740, 759, 761, 764]}}
{"text": "They summarized this information across calibrations by computing <E1>Highest Posterior-Density</E1> (<E2>HPD</E2>) intervals, and subsequently represent the total solution uncertainty with a probability-box (p-box).", "label": "Synonym-of", "metadata": {"id": "S0045782514001947", "spans": [268, 293, 295, 298]}}
{"text": "Thus, <E2>surface modifications</E2>, such as <E1>doping</E1>, functionalization and improving the pore structure and specific surface area of nanocarbons, are important to enhance gas adsorption.", "label": "Hyponym-of", "metadata": {"id": "S0079642514000784", "spans": [1438, 1444, 1407, 1428]}}
{"text": "Continuum approaches, which are based on the fact that the <E2>geometrical features</E2> of the film (i.e., the <E1>nanocolumns</E1>) are much larger than the typical size of an atom [42,266,267], have been also explored.", "label": "Hyponym-of", "metadata": {"id": "S0079642515000705", "spans": [471, 482, 427, 447]}}
{"text": "The structure of the failed surface can be represented with a mathematical graph, where graph nodes represent failed faces and graph edges exist between failed faces with common triple line in the cellular structure, i.e. where two <E1>micro-cracks</E1> formed a continuous larger <E2>crack</E2>.", "label": "Hyponym-of", "metadata": {"id": "S0167844214000652", "spans": [694, 706, 734, 739]}}
{"text": "These materials have two components, one being a semiconducting material with diamagnetic properties while the other is a <E2>magnetic dopant</E2> such as <E1>transition metal</E1> having un-paired d electrons [2].", "label": "Hyponym-of", "metadata": {"id": "S0254058415300766", "spans": [369, 385, 345, 360]}}
{"text": "We found that for n-propyl benzene, the relative yield of C3H3+ is extremely sensitive to the phase of the <E1>laser pulse</E1> as compared to any of the other possible <E2>channels</E2>.", "label": "Hyponym-of", "metadata": {"id": "S0301010409001219", "spans": [1002, 1013, 1055, 1063]}}
{"text": "Interestingly, the very low H2 adsorption has been successfully characterised as <E1>weak binding interactions</E1> and, for the first time, we have found that the adsorbed H2 in the pore channel has a liquid type recoil motion at 5K (below its melting point) as a direct result of this <E2>weak interaction</E2> to the MOF host.", "label": "Hyponym-of", "metadata": {"id": "S0301010413004096", "spans": [1204, 1229, 1401, 1417]}}
{"text": "Several extended PES scans of <E1>Na3</E1> and other <E2>alkali trimers</E2> followed this initial study, employing DFT [7], complete active space SCF [8], or a configuration interaction approach based on valence bond wave functions [9].", "label": "Hyponym-of", "metadata": {"id": "S0301010415002189", "spans": [275, 278, 289, 303]}}
{"text": "Alternatively to H-atom photodetachment from the <E1>intermediate radicals</E1>, the latter may serve as <E2>reducing agents</E2>.", "label": "Hyponym-of", "metadata": {"id": "S0301010415300355", "spans": [49, 70, 96, 111]}}
{"text": "This is because the rough wall treatment in the soft sphere implementation adds extra virtual walls during the collision of a particle with a <E2>wall</E2>, which is a more realistic representation of a rough wall compared to the hard sphere rough wall treatment where one <E1>random wall</E1> is considered.", "label": "Hyponym-of", "metadata": {"id": "S0301932213000487", "spans": [1090, 1101, 968, 972]}}
{"text": "By appropriately choosing one of three <E2>finite difference schemes</E2> (<E1>central, forward, or backward differencing</E1>), it has been demonstrated that thin liquid ligaments can be well resolved see Xiao (2012).", "label": "Hyponym-of", "metadata": {"id": "S0301932213001985", "spans": [203, 245, 176, 201]}}
{"text": "To achieve this, <E1>large eddy simulations</E1> (<E2>LES</E2>) of a horizontal turbulent channel flow laden with five different particle shapes, incorporating the drag, lift and toque model derived in Zastawny et al.", "label": "Hyponym-of", "metadata": {"id": "S0301932214001931", "spans": [214, 236, 238, 241]}}
{"text": "[54] and expanded into a <E2>group contribution approach</E2>, <E1>SAFT-\u03b3</E1>, by Papaioannou et al.", "label": "Hyponym-of", "metadata": {"id": "S0378381215300297", "spans": [801, 807, 772, 799]}}
{"text": "Similarly, the <E2>modeling approaches</E2> used to understand and parameterize active mechanisms and phenomena over lifetime fall into the broad categories of <E1>micro-, meso- and macroscopic approaches</E1>.", "label": "Hyponym-of", "metadata": {"id": "S1359028614000989", "spans": [979, 1019, 843, 862]}}
{"text": "Nevertheless, many present experimental <E2>nuclear fusion devices</E2> (DIII-D, <E1>TCV</E1>, etc.) and new ones (JT-60SA, KSTAR, Wenderstein-7X) use carbon elements, so the removal of carbon co-deposits is still necessary for a better device operation\u2014plasma density control, dust events, etc.", "label": "Hyponym-of", "metadata": {"id": "S2352179115300041", "spans": [1044, 1047, 1012, 1034]}}
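Each line of `data/rel/semeval2017/dev.txt` above is a standalone JSON object with the two entity mentions marked inline as `<E1>…</E1>` and `<E2>…</E2>`. The following is a minimal sketch of how one such record could be read; it is not taken from the scripts in this PR. The function name `parse_record` and the returned keys are hypothetical. The reading of `metadata.spans` as `[E1 start, E1 end, E2 start, E2 end]` is an assumption inferred from the records shown; in most records the offsets look document-level rather than sentence-level, so the sketch only passes them through and does not slice the sentence with them.

```python
import json
import re

# Matches the inline entity markers used in the "text" field.
TAG_RE = re.compile(r"</?E[12]>")


def parse_record(line):
    """Parse one JSONL record into a flat dict (hypothetical helper)."""
    record = json.loads(line)
    text = record["text"]

    # Pull out the mention text wrapped by each marker pair.
    mentions = {}
    for tag in ("E1", "E2"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        mentions[tag] = match.group(1) if match else None

    # Assumed order: E1 start, E1 end, E2 start, E2 end.
    e1_start, e1_end, e2_start, e2_end = record["metadata"]["spans"]

    return {
        "id": record["metadata"]["id"],
        "label": record["label"],
        "plain_text": TAG_RE.sub("", text),  # sentence without markers
        "e1": mentions["E1"],
        "e2": mentions["E2"],
        "e1_span": (e1_start, e1_end),
        "e2_span": (e2_start, e2_end),
    }


# One record copied from the diff above.
example = (
    '{"text": "A <E2>surfactant</E2> is a <E1>surface active agent</E1>.", '
    '"label": "Hyponym-of", '
    '"metadata": {"id": "S0010938X14002157", "spans": [18, 38, 2, 12]}}'
)
parsed = parse_record(example)
print(parsed["label"], "|", parsed["e1"], "|", parsed["e2"])
```

For this particular record the sentence sits at the start of its source document, so the assumed span order can be checked directly against the marker-free text; that check should not be expected to hold for records with larger offsets.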