| Company | Location | Salary |
|---|---|---|
| Binance | Hong Kong, Hong Kong | |
| Binance | Bangkok, Thailand | |
| Gemini | Seattle, WA, United States | $168k - $240k |
| Zinnia | Remote | $140k - $160k |
| Sorare | New York, NY, United States | $31k |
| Launchpadtechnologiesinc | Remote | $88k - $100k |
| Binance | Brisbane, Australia | |
| ION Group | New York, NY, United States | $112k - $120k |
| Kraken | Canada | $72k - $100k |
| Binance | Taipei, Taiwan | |
| Integra | Remote | $21k - $52k |
| Integra | Remote | $79k - $85k |
| Coinhako | Singapore, Singapore | $80k - $154k |
| Dvtrading | Remote | $130k - $170k |
| Bluecubeservices | Remote | $79k - $87k |

LLM Applied Data Scientist (RAG/NLP)
Responsibilities
- Design, develop, and optimize data processing and retrieval pipelines for enterprise-level generative tasks and model training applications (Customer Service, Token Report, Web3 Domain Models). This includes embedding, reranking, context engineering, and query rewriting models.
- Research and evaluate advanced AI-native retrieval algorithms (e.g., low-latency, multimodal retrieval, hierarchical retrieval, GraphRAG) to strengthen large-scale LLM/VLM/Agentic AI capabilities in Binance products.
- Collaborate with infrastructure and application teams to integrate RAG pipelines into production systems, ensuring scalability, reliability, and measurable business impact.
- Develop and optimize retrieval and ranking pipelines (indexing, vector search, retrieval scoring, reranking) to improve user experience.
- Participate in LLM training and RAG system development, staying current with techniques such as pre-training, SFT, and reinforcement learning, and applying them to retrieval and generation tasks.
- Apply NLP, CV, and multimodal methods to analyze user-generated content (classification, quality evaluation, trend detection, comment analysis).
Requirements
- Master’s in Information Retrieval, NLP, Machine Learning, Computer Vision, Multimodal Learning, or related fields.
- Proficient in PyTorch with strong coding skills in Python or C++.
- Strong communication skills, intellectual curiosity, and passion for lifelong learning. Able to identify opportunities and drive cutting-edge retrieval & RAG technologies into real-world applications.
- Solid theoretical foundation in information retrieval, NLP, and deep learning (experience with embeddings, reranking, query understanding preferred).
- Hands-on experience with RAG, vector databases, multimodal/graph retrieval, or large-scale AI systems.
- Strong engineering ability to translate research into scalable, production-level systems.
- Self-driven, able to own projects end-to-end (design → implementation → deployment).
- Publications in top-tier conferences/journals (NeurIPS, ICML, ACL, CVPR, SIGIR, KDD, WWW) are a plus; awards in ACM/ICPC or similar competitions preferred.
What does a data scientist in web3 do?
A data scientist in web3 specializes in data related to the technologies and applications that make up the web3 ecosystem.
This can include data from decentralized applications (DApps), blockchain networks, and other distributed and decentralized systems.
In general, a data scientist in web3 uses data analysis and machine learning techniques to help organizations and individuals understand, interpret, and make decisions based on the data these systems generate.
Specific tasks might include developing predictive models, conducting research, and creating data visualizations.