Join our research team to solve information extraction 🙂

You need to be an ML, NLP, and LLM expert.

PhD or Master 2 required.

We are looking for a Research Scientist to create VLMs such as NuExtract3 to power the https://nuextract.ai/ platform.

Your job will involve creating datasets, training LLMs, running experiments and ablation studies, and so on. See the list of typical topics below.

We release our models with open-source licenses and occasionally publish papers about them.

You will join a team of brilliant ML scientists supervised by our CEO (https://www.linkedin.com/in/etiennebcp/).

We are a 3-year-old AI startup with 12 employees, located in Station F, Paris. We went through Y Combinator.

We have a hybrid work model -- you should be able to work from our office regularly (at least once a week).

Requirements

Research Master 2 or PhD.
Strong ML/NLP/LLM background.
Self-driven, creative, passionate about ML/NLP/LLMs.
Knows how to fine-tune an LLM (both SFT and RL). Up to date with LLM research.
Researcher and builder mindset.
Enjoys a startup environment (fast pace, frequent changes of direction).

Responsibilities

Training task-specific LLMs
Running experiments/ablation studies
Creating datasets
Developing software related to LLMs
Staying up to date with relevant LLM & NLP research

Typical R&D topics we are working on (non-exhaustive list):
1. Extraction Confidence
Users of NuExtract.ai want to be able to quickly verify the validity of extracted values in the JSON output.
To do so, they need to know which values NuExtract is confident about and which ones it is not.
We want to figure out how to obtain an uncertainty score for each value NuExtract extracts.
This is not trivial, because a field can have multiple correct answers and because answers are correlated with one another.
2. Extraction Localization
Users of NuExtract.ai want to be able to quickly verify the validity of extracted values.
To do so, they need to know where in the document the information comes from (or is deduced from).
We want to figure out how best to do this.
3. Long Document Extraction
LLMs have a limited context length, which limits the size of the documents they can process.
We want to figure out how NuExtract could extract information from documents much longer than its context length.
4. Reasoning for Structured Extraction
We want to train NuExtract to reason about its extraction via a private chain of thought.
5. Extraction Agent
We want to give a reasoning NuExtract the ability to use tools (e.g. zooming in on a document or performing a web search) in order to improve extraction quality.
6. Structured Extraction Benchmark
There is no public benchmark for structured extraction.
We want to create such a benchmark and make it public.

Links:
Platform: https://nuextract.ai/
Blog posts: https://about.nuextract.ai/blog
Hugging Face: https://huggingface.co/numind
GitHub: https://github.com/numindai
Discord: https://discord.com/invite/3tsEtJNCDe
NuNER paper: https://arxiv.org/abs/2402.15343

Machine Learning Scientist, LLM Training
