Job Listings

NLP Expert for Frequency Word Definitions

Upwork

Description:

We are seeking an experienced Machine Learning (ML) expert to assist in preparing a dataset of the 100k most common English words.

The goal is to compile, structure, and process a comprehensive set of metadata for each word entry, including pronunciation, part of speech, definitions, synonyms, usage examples, and more.

Responsibilities:

Dataset Compilation: Extract and compile lists of the 50k, 100k, and 200k most common English words, ensuring that all entries are lemmatized (i.e., in their base or dictionary form).
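As a rough illustration of the lemmatized frequency list described above, the sketch below merges raw word counts under their lemmas and keeps the top N. It is only a minimal sketch: the function name `top_lemmas`, the toy lemma table, and the sample counts are all hypothetical, and a real pipeline would plug in a proper lemmatizer (for example from NLTK or spaCy) and a real frequency source.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def top_lemmas(
    counts: Iterable[Tuple[str, int]],
    lemmatize: Callable[[str], str],
    n: int,
) -> List[Tuple[str, int]]:
    """Merge surface-form counts under their lemma and return the n most frequent."""
    merged: Counter = Counter()
    for word, count in counts:
        # Lowercase first so "Run" and "run" are counted together.
        merged[lemmatize(word.lower())] += count
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical stand-in for a real lemmatizer; illustration only.
TOY_LEMMAS = {"running": "run", "ran": "run", "cats": "cat"}


def toy_lemmatize(word: str) -> str:
    return TOY_LEMMAS.get(word, word)


# Made-up counts, not taken from any real corpus.
raw_counts = [
    ("the", 1000),
    ("cats", 40),
    ("cat", 60),
    ("running", 30),
    ("ran", 25),
    ("run", 50),
]

top = top_lemmas(raw_counts, toy_lemmatize, 3)
print(top)
```

Passing the lemmatizer in as a function keeps the aggregation step independent of whichever NLP library is eventually chosen.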

Metadata Collection: Develop scripts or use APIs to gather relevant metadata for each word, including:

Pronunciation (preferably in a standard dictionary format).

Part of speech.

Concise definitions.

Example sentences.

Synonyms and antonyms.

Etymology (optional).

Word frequency data.

- Note that all metadata must be licensed for commercial use.
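One possible shape for a single word entry covering the metadata listed above is sketched here. The class name `WordEntry`, the field names, the `source` and `license` fields (added as an assumption, to help track commercial usability), and the JSON Lines output are all illustrative choices rather than a required format; the sample values are made up.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class WordEntry:
    """Hypothetical record for one lemmatized word and its metadata."""

    word: str
    pronunciation: str
    part_of_speech: str
    definitions: List[str]
    examples: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    etymology: Optional[str] = None  # optional in the listing
    frequency_rank: Optional[int] = None
    source: str = ""   # assumed field: where the metadata came from
    license: str = ""  # assumed field: license under which it may be used


def to_jsonl_line(entry: WordEntry) -> str:
    """Serialize one entry as a single JSON line, keeping IPA characters readable."""
    return json.dumps(asdict(entry), ensure_ascii=False)


# Illustrative values only.
entry = WordEntry(
    word="run",
    pronunciation="/rʌn/",
    part_of_speech="verb",
    definitions=["to move quickly on foot"],
    examples=["She runs every morning."],
    synonyms=["sprint", "jog"],
    antonyms=["walk"],
    frequency_rank=200,
    source="example-source",
    license="example-license",
)

line = to_jsonl_line(entry)
print(line)
```

Writing one JSON object per line makes it straightforward to stream 100k or more entries without loading the whole dataset into memory.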

Requirements:

Expertise in Machine Learning and Data Science: Proven experience in data extraction, processing, and analysis.

Familiarity with Linguistic Data: Experience working with linguistic datasets, corpora, or dictionary projects is a strong plus.

Programming Skills: Proficiency in Python or similar programming languages, with experience using libraries such as NLTK, spaCy, or similar for natural language processing (NLP).

API Experience: Experience working with APIs like Wiktionary, WordNet, or other linguistic databases.

Attention to Detail: Strong focus on data quality and accuracy.

Communication: Ability to clearly communicate progress, challenges, and results.

Deliverables:

- A structured dataset of the 100k most common lemmatized English words to start, with complete metadata.

Scripts or tools used to gather and process the data, with clear documentation.

Location: Anywhere

Posted: Sept. 1, 2024, 7:25 a.m.
