deploy: 1a5c28b

tongyx361 · Oct 30, 2024 · 152bb3d
commit 152bb3d
Showing 23 changed files with 6,771 additions and 0 deletions.
diff --git a/.nojekyll b/.nojekyll
diff --git a/core.html b/core.html
diff --git a/index.html b/index.html
diff --git a/robots.txt b/robots.txt
@@ -0,0 +1 @@
+Sitemap: https://tongyx361.github.io/symeval/sitemap.xml
diff --git a/search.json b/search.json
@@ -0,0 +1,62 @@
+[
+  {
+    "objectID": "index.html",
+    "href": "index.html",
+    "title": "SymEval",
+    "section": "",
+    "text": "For common users/developers, please just run the following command to install the package:\npip install \"git+https://github.com/tongyx361/symeval.git\"",
+    "crumbs": [
+      "SymEval"
+    ]
+  },
+  {
+    "objectID": "index.html#installation",
+    "href": "index.html#installation",
+    "title": "SymEval",
+    "section": "",
+    "text": "For common users/developers, please just run the following command to install the package:\npip install \"git+https://github.com/tongyx361/symeval.git\"",
+    "crumbs": [
+      "SymEval"
+    ]
+  },
+  {
+    "objectID": "index.html#quick-start",
+    "href": "index.html#quick-start",
+    "title": "SymEval",
+    "section": "Quick Start",
+    "text": "Quick Start\n\nfrom symeval import *\n\nevaluator = EvaluatorMathBatch()\n\nsymeval provides elaborate answer extraction and correctness judgement pipelines based on regular expressions and SymPy symbolic calculation, which is able to correctly process\n\nmost mathematical objects such as matrices (vectors), intervals, symbols besides numbers,\nas well as some special texts like bool expressions, dates and times.\n\nEvaluatorMath implements an elaborate evaluation pipeline for mathematical reasoning tasks.\nSymPy symbolic calculation causes risks of ex-long evaluation time.\nTo address this, we implement EvaluatorMathBatch to evaluate in batch with timeout but still efficiently.\n\ntest_eq(\n    evaluator.batch_eq(ref_answers=[\"1/2\", \"1/2\"], pred_answers=[\"0.5\", \"2/4\"]),\n    [True] * 2,\n)\n\nHere we provide a quick start guide. For more details, please refer to the API reference.\n\nsource\n\nEvaluatorMathBatch\n\n EvaluatorMathBatch (strict_extract:bool=True,\n                     include_percentage:bool=True, rel_tol:float=1e-09,\n                     abs_tol:float=1e-08, percent_rel_tol:float=0.001,\n                     ascii_only:bool=True, timeout:int=5, n_procs:int=2,\n                     use_tqdm:bool=True)\n\nBatch evaluator for math problems, capable of extracting answer segment from complex resp and processing various mathematical objects (e.g. fractions, symbolic expressions, matrices, vectors) and special text (e.g. bool values).\n\n\n\n\n\n\n\n\n\n\nType\nDefault\nDetails\n\n\n\n\nstrict_extract\nbool\nTrue\n\n\n\ninclude_percentage\nbool\nTrue\nWhether to include percentage comparisons.\n\n\nrel_tol\nfloat\n1e-09\nThe relative tolerance for numerical comparisons.\n\n\nabs_tol\nfloat\n1e-08\nThe absolute tolerance for numerical comparisons. 
Necessary for precision issues.\n\n\npercent_rel_tol\nfloat\n0.001\nThe relative tolerance for percentage comparisons.\n\n\nascii_only\nbool\nTrue\nOnly allowing ASCII characters\n\n\ntimeout\nint\n5\nThe timeout for each evaluation.\n\n\nn_procs\nint\n2\n\n\n\nuse_tqdm\nbool\nTrue\n\n\n\n\n\nAccurately Extracting Answer Strings\nEvaluatorMath can:\n\nextract short answers from long responses rather accurately\nand normalize into a mathematical expression.\n\n\n# MATH-style boxed answer\nevaluator.extract_ans(\"Therefore, $1+1=\\\\boxed{2}$.\")\n\n\n# Answer around \"answer\"\nevaluator.extract_ans(\n    \"Both $1$ and $11$ divide $11,$ so $\\\\boxed{11}=2$, and since $1,$ $2,$ $4,$ $5,$ $10,$ and $20$ divide $20,$ then $\\\\boxed{20}=6$. The inner expression, $\\\\boxed{11}\\\\times\\\\boxed{20}=2\\\\times6=12$. Finally, $\\\\boxed{12}=6$ because $1,$ $2,$ $3,$ $4,$ $6,$ and $12$ divide $12.$\\n\\nTherefore, $6$ is our answer. Please note that we have not boxed the correct answer as we normally do, as that would be especially confusing for this problem.\"\n)\n\n\n# Use the last number by default\nevaluator.extract_ans(\n    'First, we need to count the total number of letters in the word \"CIRCLE\". There are 6 letters.\\n\\nNext, we need to count the number of distinct letters. There are 6 distinct letters in the word \"CIRCLE\": C, I, R, L, E, and G.\\n\\nNow, let\\'s consider the arrangements of the distinct letters. The number of ways to arrange n distinct items is n factorial (n!). So, we have 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720 ways to arrange the distinct letters.\\n\\nHowever, the word \"CIRCLE\" has one letter that repeats (the letter \\'C\\' repeats twice). 
We have over-counted the number of distinct arrangements by including arrangements that are just rotations of each other (for example, \"CIRCLE\" and \"LCIRCE\" are considered different arrangements here, but they are the same word when read).\\n\\nTo correct for this, we divide the total number of arrangements by the number of ways to arrange the repeated letters. The number of ways to arrange 2 identical items is 2! = 2 × 1 = 2. So, we divide the total number of arrangements by 2 to get the correct number of distinct arrangements.\\n\\nTherefore, the number of ways to arrange the letters of the word \"CIRCLE\" is 720 ÷ 2 = 360.'\n)\n# More cases ...\n\n\n# Normalize fraction\nevaluator.extract_ans(\"The answer is 1/2\")\n\n\n# Normalize pmatrix\nevaluator.extract_ans(\n    \"The answer is \\\\begin{pmatrix} 3 \\\\\\\\ \\\\frac{\\\\pi}{2} \\\\end{pmatrix}\"\n)\n# More cases ...\n\nMore test cases:\n\n\nCode\ntest_eq(evaluator.norm_ans_str(\"864 \\\\mbox{ inches}^2\"), \"864\")\ntest_eq(evaluator.norm_ans_str(\"\\\\frac{270}7\\\\text{ degrees}\"), \"\\\\frac{270}7\")\ntest_eq(evaluator.norm_ans_str(\".0000672\"), \"0.0000672\")\n\n\n\n\nCorrectly Processing Various Mathematical Objects / Special Text\nEvaluatorMath, based on regular expressions and SymPy symbolic calculation, is able to correctly process\n\nmost mathematical objects such as matrices (vectors), intervals, symbols besides numbers,\nas well as some special texts like bool expressions, dates and times.\n\n\nevaluator.eq(\"x+y\", \"y+x\") == True  # Expression\n\n\nevaluator.eq(\"\\\\frac{1}{2}\", \"0.5\") == True  # LaTeX\n\n\nevaluator.eq(\n    \"\\\\begin{array}1\\\\\\\\2\\\\end{array}\",\n    \"1,2\",\n)  # Matrix (Vector)\n\n\nevaluator.eq(\"{1,2}\", \"{2,1}\", compare_sets=True)  # Set\n\n\nevaluator.eq(\"no\", \"false\")  # Bool\n# More mathematical objects and special texts ...\n\nMore test cases:\n\n\nCode\ntest_eq(evaluator.eq(\"251,7\\\\\\\\ \\\\noindent\", \"0\"), 
False)\ntest_eq(evaluator.eq(\"3.54*10^{-7}\", \"3.54e-07\"), True)\ntest_eq(evaluator.eq(r\"\\frac{1}{2}\", \"0.5\"), True)\ntest_eq(evaluator.eq(\"1\", \"100\"), False)\ntest_eq(evaluator.eq(\"100\", \"1\"), False)\ntest_eq(evaluator.eq(\"3.04\", \"0.0304\", False), True)\ntest_eq(evaluator.eq([\"0.0304\", 0.0304], \"3.04\"), True)\ntest_eq(evaluator.eq(\"x&lt;-1\", \"x&gt;3\"), False)\ntest_eq(\n    evaluator.eq(\"(-\\\\infty,0)\\\\cup(0,\\\\infty)\", \"(-\\\\infty,0)\\\\cup(0,\\\\infty)\"),\n    True,\n)\ntest_eq(evaluator.eq(\"1+2,2+1\", \"2+1,1+2\"), True)\ntest_eq(evaluator.eq(\"5\", \"5\"), True)\ntest_eq(evaluator.eq(\"0.1 + 0.2\", \"0.3\"), True)  # `0.1 + 0.2 == 0.3` is `False`\ntest_eq(evaluator.eq(\"x + y\", \"y + x\"), True)\ntest_eq(evaluator.eq(\"C\", \"C\"), True)\ntest_eq(evaluator.eq(\"1,234\", \"1234\"), True)\ntest_eq(evaluator.eq(\"12,34\", \"(12,34)\"), True)\n\ntest_eq(evaluator.eq(\"\\\\$ 5\", \"5\"), True)\ntest_eq(evaluator.eq(\"3 * \\\\sqrt{13}\", \"3\\\\sqrt{13}\"), True)\ntest_eq(evaluator.eq(\"\\\\pi/2\", \"\\\\frac{\\\\pi}{2}\"), True)\ntest_eq(evaluator.eq(\"(3,\\\\pi/2)\", \"(3,\\\\frac{\\\\pi}{2})\"), True)\ntest_eq(evaluator.eq(\"23000\", \"\\\\$23{,}000\"), True)\ntest_eq(evaluator.eq(r\"\\left(1,2\\right)\", r\"\\left(2,1\\right)\", compare_sets=True), True)\ntest_eq(evaluator.eq(\"White\", \"white\"), True)\ntest_eq(evaluator.eq(\"[0,3)\", \"[0,1]\"), False)\ntest_eq(evaluator.eq(\"[0,1]\", \"[0,3)\"), False)\ntest_eq(evaluator.eq(\"1001.5\", \"1001\"), False)\ntest_eq(evaluator.eq(\"\\\\frac{2003}{2}\", \"1001\"), False)\n\n\n\ntest_eq(evaluator.eq(\"-2,1\", \"1,-2\", compare_sets=True), True)\n\n\n\nNormalized Majority Voting\n\nmaj_answers_list, norm_answers_list = evaluator.batch_get_maj_answers(\n    [[\"\", \"\", \"1\", \"2\", \"2\", \"3\", \"3\", \"3\"]]\n)\nprint(f\"{maj_answers_list = } &lt;- {norm_answers_list = }\")\n\n\n\n\nParsing LaTeX\n\nInterval\n\nfrom symeval import 
latex2sympy_interval\n\n\nlatex2sympy_interval(\"(-11,-10)\\\\cup\\\\{-\\\\sqrt{110}\\\\}\")\n\n\nlatex2sympy_interval(\"(-\\\\infty, 0) \\\\cup (0, \\\\infty)\")\n\n\nlatex2sympy_interval(\"(a+b,b]\")\n\n\n\nMatrix / Vector\n\nfrom symeval import EvaluatorMathBatch\n\nevaluator = EvaluatorMathBatch()\n\n\nevaluator.latex2matrix(r\"\\sqrt{400\\cos^2(9\\pi/44)},\\frac{\\pi}{4}\")\n\n\nevaluator.latex2matrix(\n    r\"\\begin{pmatrix} \\frac{1}{2} & 0 & -\\frac{\\sqrt{3}}{2} \\\\ 0 & 1 & 0 \\\\ \\frac{\\sqrt{3}}{2} & 0 & \\frac{1}{2} \\end{pmatrix}\"\n)\n\n\ntest_eq(\n    evaluator.latex2matrix(\"\\\\begin{pmatrix}-18\\\\\\\\-49\\\\\\\\96\\\\end{pmatrix}\"),\n    Matrix([[-18, -49, 96]]),\n)\ntest_eq(\n    evaluator.latex2matrix(\"\\\\begin{pmatrix} 2 & 3 \\\\\\\\ 0 & -2 \\\\end{pmatrix}\"),\n    Matrix([[2, 3], [0, -2]]),\n)\n\n\n\n\nNormalization\n\ntest_eq(evaluator.norm_math_str(\"251,7\\\\\\\\ \\\\noindent\"), \"251,7\")\n\n\ntest_eq(fix_a_slash_b(\"(3/4)\\\\sqrt{3}\"), \"(\\\\frac{3}{4})\\\\sqrt{3}\")\n\n\ntest_eq(evaluator.norm_pm(\"x\\\\pmy\"), \"x-y,x+y\")\ntest_eq(evaluator.norm_pm(\"a\\\\mpb\"), \"a-b,a+b\")\ntest_eq(evaluator.norm_pm(\"1\\\\pm\\\\sqrt{19}\"), \"1-\\\\sqrt{19},1+\\\\sqrt{19}\")\ntest_eq(evaluator.norm_pm(r\"\\{1\\pm\\sqrt{5},-2\\}\"), \"1-\\\\sqrt{5},1+\\\\sqrt{5},-2\")\ntest_eq(\n    evaluator.norm_pm(\"\\\\(\\\\frac{1\\\\pm\\\\sqrt{17}}{4}\\\\)\"),\n    \"\\\\frac{1-\\\\sqrt{17}}{4},\\\\frac{1+\\\\sqrt{17}}{4}\",\n)\ntest_eq(\n    evaluator.norm_pm(r\"\\frac{1\\pm\\sqrt{1-\\frac{2}{\\sqrt{3}}}}{1}\"),\n    \"\\\\frac{1-\\\\sqrt{1-\\\\frac{2}{\\\\sqrt{3}}}}{1},\\\\frac{1+\\\\sqrt{1-\\\\frac{2}{\\\\sqrt{3}}}}{1}\",\n)\n\n\ntest_eq(norm_deg(r\"20^\\circ\"), r\"20\")\ntest_eq(norm_deg(r\"\\sin 20^\\circ\"), r\"\\sin {20*\\frac{\\pi}{180}}\")\n\n\ntest_eq(evaluator.norm_basic_fn(r\"sinx\"), r\"\\sin^{1}x\")\ntest_eq(evaluator.norm_basic_fn(r\"\\sin^2x\"), r\"\\sin^{2}x\")\n\n\n\nProcessing Sets\n\ntest_eq(evaluator.extract_set(\"{2,1}\"), 
[\"1\", \"2\"])\n\n\ntest_eq(is_set(\"{2,1}\"), True)\ntest_eq(is_set(\"orange\"), False)\ntest_eq(is_set(\"x&lt;-1orx&gt;3\"), True)\ntest_eq(is_set(\"(3/4)sqrt(3)\"), False)\n\n\n\nManipulating Strings\n\ntest_eq(evaluator.remove_first_paren_pair(\"{white}\", \"{\"), \"white\")",
+    "crumbs": [
+      "SymEval"
+    ]
+  },
+  {
+    "objectID": "index.html#contribution-guidelines",
+    "href": "index.html#contribution-guidelines",
+    "title": "SymEval",
+    "section": "Contribution Guidelines",
+    "text": "Contribution Guidelines\n\nSetup\nFor intended contributors, we recommend installing the package with the dev extras and setting up the pre-commit hooks by running:\ngit clone https://github.com/tongyx361/symeval.git\ncd symeval\npip install \".[dev]\"\npre-commit install\nconda install quarto # For nbdev\n\n\nFile Structure\nsymeval\n├── utils # Repository utilities\n├── symeval # Package code for common utilities\n└── nbs # Notebooks and other files to run tests and generate documentation with https://nbdev.fast.ai\n\n\nChecklist Before Commit\nRun prepare-commit.sh to clean the notebooks, export scripts for pipeline notebooks, generate documentation, run tests, and render the README if needed:\nbash utils/prepare-commit.sh",
+    "crumbs": [
+      "SymEval"
+    ]
+  },
+  {
+    "objectID": "core.html",
+    "href": "core.html",
+    "title": "symeval",
+    "section": "",
+    "text": "from symeval import *",
+    "crumbs": [
+      "`symeval`"
+    ]
+  },
+  {
+    "objectID": "core.html#api-reference",
+    "href": "core.html#api-reference",
+    "title": "symeval",
+    "section": "API Reference",
+    "text": "API Reference\n\nsource\n\nEvaluatorMathBatch\n\n EvaluatorMathBatch (strict_extract:bool=True,\n                     include_percentage:bool=True, rel_tol:float=1e-09,\n                     abs_tol:float=1e-08, percent_rel_tol:float=0.001,\n                     ascii_only:bool=True, timeout:int=5, n_procs:int=2,\n                     use_tqdm:bool=True)\n\nBatch evaluator for math problems, capable of extracting answer segment from complex resp and processing various mathematical objects (e.g. fractions, symbolic expressions, matrices, vectors) and special text (e.g. bool values).\n\n\n\n\n\n\n\n\n\n\nType\nDefault\nDetails\n\n\n\n\nstrict_extract\nbool\nTrue\n\n\n\ninclude_percentage\nbool\nTrue\nWhether to include percentage comparisons.\n\n\nrel_tol\nfloat\n1e-09\nThe relative tolerance for numerical comparisons.\n\n\nabs_tol\nfloat\n1e-08\nThe absolute tolerance for numerical comparisons. Necessary for precision issues.\n\n\npercent_rel_tol\nfloat\n0.001\nThe relative tolerance for percentage comparisons.\n\n\nascii_only\nbool\nTrue\nOnly allowing ASCII characters\n\n\ntimeout\nint\n5\nThe timeout for each evaluation.\n\n\nn_procs\nint\n2\n\n\n\nuse_tqdm\nbool\nTrue\n\n\n\n\n\nsource\n\n\nEvaluatorMath\n\n EvaluatorMath (strict_extract:bool=True, include_percentage:bool=True,\n                rel_tol:float=1e-09, abs_tol:float=1e-08,\n                percent_rel_tol:float=0.001, ascii_only:bool=True)\n\nEvaluator for math problems, capable of extracting answer segment from complex resp and processing various mathematical objects (e.g. fractions, symbolic expressions, matrices, vectors) and special text (e.g. bool values).\n\n\n\n\n\n\n\n\n\n\nType\nDefault\nDetails\n\n\n\n\nstrict_extract\nbool\nTrue\n\n\n\ninclude_percentage\nbool\nTrue\nWhether to include percentage comparisons.\n\n\nrel_tol\nfloat\n1e-09\nThe relative tolerance for numerical comparisons.\n\n\nabs_tol\nfloat\n1e-08\nThe absolute tolerance for numerical comparisons. 
Necessary for precision issues.\n\n\npercent_rel_tol\nfloat\n0.001\nThe relative tolerance for percentage comparisons. Relative for different surface forms (e.g. 99% v.s. 0.99).\n\n\nascii_only\nbool\nTrue\nOnly allowing ASCII characters\n\n\n\n\nsource\n\n\nEvaluatorBase\n\n EvaluatorBase (strict_extract:bool=True)\n\nBase class for evaluators.",
+    "crumbs": [
+      "`symeval`"
+    ]
+  }
+]
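Note on the indexed `rel_tol` / `abs_tol` parameters: the docs above describe them only as relative and absolute tolerances for numerical comparisons, with `abs_tol` "necessary for precision issues". The sketch below is not symeval's implementation; it uses the standard-library `math.isclose` with the documented default values to illustrate why both tolerances matter. The helper name `numerically_equal` is hypothetical.

```python
import math

# Hypothetical helper, NOT part of symeval: it only mirrors the documented
# default tolerances (rel_tol=1e-09, abs_tol=1e-08) on top of math.isclose.
def numerically_equal(ref: float, pred: float,
                      rel_tol: float = 1e-9, abs_tol: float = 1e-8) -> bool:
    """Return True when pred is within the relative or absolute tolerance of ref."""
    return math.isclose(ref, pred, rel_tol=rel_tol, abs_tol=abs_tol)

# Plain float equality fails on rounding noise; a tolerance-based check does not.
exact_match = (0.1 + 0.2 == 0.3)
tolerant_match = numerically_equal(0.1 + 0.2, 0.3)

# A relative tolerance alone cannot accept a tiny value compared against zero,
# which is the kind of case an absolute tolerance covers.
near_zero_rel_only = math.isclose(1e-10, 0.0, rel_tol=1e-9)
near_zero_with_abs = numerically_equal(1e-10, 0.0)

# Genuinely different values stay different.
different = numerically_equal(1001.5, 1001)
```

Under these assumptions, the tolerant check accepts `0.1 + 0.2` against `0.3` and rejects `1001.5` against `1001`, which matches the direction of the corresponding `test_eq` cases in the indexed quick start.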