Job Listings


Evaluate LLM/ChatGPT on many benchmark samples and produce a report

Upwork

We are looking for a research report on the OpenAI o1 model. You will evaluate the model on various benchmarks (sampling about 100 questions from each) and tests.

The final report will include sample answers, tables, and graphs to show model performance in each category. A research background in LLM/AI is preferred.

The timeline is 2-3 weeks. The maximum budget is $80, but preference is given to lower offers.

Location: Anywhere

Posted: Nov. 6, 2024, 10:11 p.m.
