We present SciAgentGym, the first benchmark environment for evaluating LLM agents' capability in multi-step scientific tool-use. SciAgentGym provides a comprehensive suite of scientific tools across ...