How to use shawhin/distilroberta-ai-job-embeddings with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("shawhin/distilroberta-ai-job-embeddings")
sentences = [
"Senior Data Analyst Pricing, B2B B2C Pricing Strategies, A/B Testing Analysis",
"Qualifications\n Data Engineering, Data Modeling, and ETL (Extract Transform Load) skillsData Warehousing and Data Analytics skillsExperience with data-related tools and technologiesStrong problem-solving and analytical skillsExcellent written and verbal communication skillsAbility to work independently and remotelyExperience with cloud platforms (e.g., AWS, Azure) is a plusBachelor's degree in Computer Science, Information Systems, or related field",
"Skills You BringBachelor’s or Master’s Degree in a technology related field (e.g. Engineering, Computer Science, etc.) required with 6+ years of experienceInformatica Power CenterGood experience with ETL technologiesSnaplogicStrong SQLProven data analysis skillsStrong data modeling skills doing either Dimensional or Data Vault modelsBasic AWS Experience Proven ability to deal with ambiguity and work in fast paced environmentExcellent interpersonal and communication skillsExcellent collaboration skills to work with multiple teams in the organization",
"experience, an annualized transactional volume of $140 billion in 2023, and approximately 3,200 employees located in 12+ countries, Paysafe connects businesses and consumers across 260 payment types in over 40 currencies around the world. Delivered through an integrated platform, Paysafe solutions are geared toward mobile-initiated transactions, real-time analytics and the convergence between brick-and-mortar and online payments. Further information is available at www.paysafe.com.\n\nAre you ready to make an impact? Join our team that is inspired by a unified vision and propelled by passion.\n\nPosition Summary\n\nWe are looking for a dynamic and flexible, Senior Data Analyst, Pricing to support our global Sales and Product organizations with strategic planning, analysis, and commercial pricing efforts . As a Senior Data Analyst , you will be at the frontier of building our Pricing function to drive growth through data and AI-enabled capabilities. This opportunity is high visibility for someone hungry to drive the upward trajectory of our business and be able to contribute to their efforts in the role in our success.\n\nYou will partner with Product Managers to understand their commercial needs, then prioritize and work with a cross-functional team to deliver pricing strategies and analytics-based solutions to solve and execute them. Business outcomes will include sustainable growth in both revenues and gross profit.\n\nThis role is based in Jacksonville, Florida and offers a flexible hybrid work environment with 3 days in the office and 2 days working remote during the work week.\n\nResponsibilities\n\n Build data products that power the automation and effectiveness of our pricing function, driving better quality revenues from merchants and consumers. Partner closely with pricing stakeholders (e.g., Product, Sales, Marketing) to turn raw data into actionable insights. Help ask the right questions and find the answers. 
Dive into complex pricing and behavioral data sets, spot trends and make interpretations. Utilize modelling and data-mining skills to find new insights and opportunities. Turn findings into plans for new data products or visions for new merchant features. Partner across merchant Product, Sales, Marketing, Development and Finance to build alignment, engagement and excitement for new products, features and initiatives. Ensure data quality and integrity by following and enforcing data governance policies, including alignment on data language. \n\n Qualifications \n\n Bachelor’s degree in a related field of study (Computer Science, Statistics, Mathematics, Engineering, etc.) required. 5+ years of experience of in-depth data analysis role, required; preferably in pricing context with B2B & B2C in a digital environment. Proven ability to visualize data intuitively, cleanly and clearly in order to make important insights simplified. Experience across large and complex datasets, including customer behavior, and transactional data. Advanced in SQL and in Python, preferred. Experience structuring and analyzing A/B tests, elasticities and interdependencies, preferred. Excellent communication and presentation skills, with the ability to explain complex data insights to non-technical audiences. \n\n Life at Paysafe: \n\nOne network. One partnership. At Paysafe, this is not only our business model; this is our mindset when it comes to our team. 
Being a part of Paysafe means you’ll be one of over 3,200 members of a world-class team that drives our business to new heights every day and where we are committed to your personal and professional growth.\n\nOur culture values humility, high trust & autonomy, a desire for excellence and meeting commitments, strong team cohesion, a sense of urgency, a desire to learn, pragmatically pushing boundaries, and accomplishing goals that have a direct business impact.\n\n \n\nPaysafe provides equal employment opportunities to all employees, and applicants for employment, and prohibits discrimination of any type concerning ethnicity, religion, age, sex, national origin, disability status, sexual orientation, gender identity or expression, or any other protected characteristics. This policy applies to all terms and conditions of recruitment and employment. If you need any reasonable adjustments, please let us know. We will be happy to help and look forward to hearing from you."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1 on the ai-job-embedding-finetuning dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Resulting model from example code for fine-tuning an embedding model for AI job search.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
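The module list above shows mean pooling over token embeddings (pooling_mode_mean_tokens is True) followed by Normalize(). A minimal pure-Python sketch of those two steps follows, assuming made-up 3-dimensional token vectors in place of the model's 768-dimensional ones. The function names mean_pool and l2_normalize are illustrative and are not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Hypothetical input: three real tokens and one padding token.
tokens = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0], [2.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]

pooled = mean_pool(tokens, mask)             # padding token is ignored
sentence_embedding = l2_normalize(pooled)    # unit-length sentence vector
print(pooled)
print(sentence_embedding)
```

Because the output is unit length, the cosine similarity between two such embeddings reduces to their dot product.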
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("shawhin/distilroberta-ai-job-embeddings")
# Run inference
sentences = [
'data integrity governance PowerBI development Juno Beach',
'skills: 2-5 y of exp with data analysis/ data integrity/ data governance; PowerBI development; Python; SQL, SOQL\n\nLocation: Juno Beach, FL\nPLEASE SEND LOCAL CANDIDATES ONLY\n\nSeniority on the skill/s required on this requirement: Mid.\n\nEarliest Start Date: ASAP\n\nType: Temporary Project\n\nEstimated Duration: 12 months with possible extension(s)\n\nAdditional information: The candidate should be able to provide an ID if the interview is requested. The candidate interviewing must be the same individual who will be assigned to work with our client. \nRequirements:• Availability to work 100% at the Client’s site in Juno Beach, FL (required);• Experience in data analysis/ data integrity/ data governance;• Experience in analytical tools including PowerBI development, Python, coding, Excel, SQL, SOQL, Jira, and others.\n\nResponsibilities include but are not limited to the following:• Analyze data quickly using multiple tools and strategies including creating advanced algorithms;• Serve as a critical member of data integrity team within digital solutions group and supplies detailed analysis on key data elements that flow between systems to help design governance and master data management strategies and ensure data cleanliness.',
"QualificationsAdvanced degree (MS with 5+ years of industry experience, or Ph.D.) in Computer Science, Data Science, Statistics, or a related field, with an emphasis on AI and machine learning.Proficiency in Python and deep learning libraries, notably PyTorch and Hugging Face, Lightning AI, evidenced by a history of deploying AI models.In-depth knowledge of the latest trends and techniques in AI, particularly in multivariate time-series prediction for financial applications.Exceptional communication skills, capable of effectively conveying complex technical ideas to diverse audiences.Self-motivated, with a collaborative and solution-oriented approach to problem-solving, comfortable working both independently and as part of a collaborative team.\n\nCompensationThis role is compensated with equity until the product expansion and securing of Series A investment. Cash-based compensation will be determined after the revenue generation has been started. As we grow, we'll introduce additional benefits, including performance bonuses, comprehensive health insurance, and professional development opportunities. \nWhy Join BoldPine?\nInfluence the direction of financial market forecasting, contributing to groundbreaking predictive models.Thrive in an innovative culture that values continuous improvement and professional growth, keeping you at the cutting edge of technology.Collaborate with a dedicated team, including another technical expert, setting new benchmarks in AI-driven financial forecasting in a diverse and inclusive environment.\nHow to Apply\nTo join a team that's redefining financial forecasting, submit your application, including a resume and a cover letter. At BoldPine, we're committed to creating a diverse and inclusive work environment and encouraging applications from all backgrounds. Join us, and play a part in our mission to transform financial predictions.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
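For job search, the similarity scores are typically used to rank job descriptions against a query. The sketch below shows that ranking step in pure Python, assuming small made-up vectors instead of real model embeddings. The helper names cosine_similarity and rank_documents are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (score, index) pairs sorted from most to least similar."""
    scores = [(cosine_similarity(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    return sorted(scores, reverse=True)

# Hypothetical embeddings: one query and three job descriptions.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]

ranking = rank_documents(query, docs)
for score, index in ranking:
    print(f"doc {index}: {score:.4f}")
```

With real embeddings, the same ordering can be read off one row of the matrix returned by model.similarity.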
Evaluated on ai-job-validation and ai-job-test with TripletEvaluator:
| Metric | ai-job-validation | ai-job-test |
|---|---|---|
| cosine_accuracy | 0.9901 | 1.0 |
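The cosine_accuracy metric can be read as the fraction of (query, positive, negative) triplets for which the query is more similar to the positive than to the negative. A minimal sketch of that computation follows, assuming toy 2-dimensional vectors. The function triplet_cosine_accuracy is an illustrative stand-in, not the evaluator's own code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def triplet_cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)

# Hypothetical triplets; the last one is deliberately ranked the wrong way.
triplets = [
    ([1.0, 0.0], [1.0, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.2, 1.0], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, 0.9], [1.0, -1.0]),
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.2]),
]

accuracy = triplet_cosine_accuracy(triplets)
print(accuracy)
```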
Columns: query, job_description_pos, and job_description_neg
| | query | job_description_pos | job_description_neg |
|---|---|---|---|
| type | string | string | string |
| query | job_description_pos | job_description_neg |
|---|---|---|
| Data engineering Azure cloud Apache Spark Kafka | Skills:Proven experience in data engineering and workflow development.Strong knowledge of Azure cloud services.Proficiency in Apache Spark and Apache Kafka.Excellent programming skills in Python/Java.Hands-on experience with Azure Synapse, DataBricks, and Azure Data Factory. | requirements, and assist in data structure implementation planning for innovative data visualization, predictive modeling, and advanced analytics solutions.* Unfortunately, we cannot accommodate Visa Sponsorship for this role at this time. |
| Databricks, Medallion architecture, ETL processes | experience with Databricks, PySpark, SQL, Spark clusters, and Jupyter Notebooks.- Expertise in building data lakes using the Medallion architecture and working with delta tables in the delta file format.- Familiarity with CI/CD pipelines and Agile methodologies, ensuring efficient and collaborative development practices.- Strong understanding of ETL processes, data modeling, and data warehousing principles.- Experience with data visualization tools like Power BI is a plus.- Knowledge of cybersecurity data, particularly vulnerability scan data, is preferred.- Bachelor's or Master's degree in Computer Science, Information Systems, or a related field. | experience with a minimum of 0+ years of experience in a Computer Science or Data Management related fieldTrack record of implementing software engineering best practices for multiple use cases.Experience of automation of the entire machine learning model lifecycle.Experience with optimization of distributed training of machine learning models.Use of Kubernetes and implementation of machine learning tools in that context.Experience partnering and/or collaborating with teams that have different competences.The role holder will possess a blend of design skills needed for Agile data development projects.Proficiency or passion for learning, in data engineer techniques and testing methodologies and Postgraduate degree in data related field of study will also help. |
| Gas Processing, AI Strategy Development, Plant Optimization | experience in AI applications for the Hydrocarbon Processing & Control Industry, specifically, in the Gas Processing and Liquefaction business. Key ResponsibilitiesYou will be required to perform the following:- Lead the development and implementation of AI strategies & roadmaps for optimizing gas operations and business functions- Collaborate with cross-functional teams to identify AI use cases to transform gas operations and business functions (AI Mapping)- Design, develop, and implement AI models and algorithms that solve complex problems- Implement Gen AI use cases to enhance natural gas operations and optimize the Gas business functions- Design and implement AI-enabled plant optimizers for efficiency and reliability- Integrate AI models into existing systems and applications- Troubleshoot and resolve technical issues related to AI models and deployments- Ensure compliance with data privacy and security regulations- Stay up-to-date with the latest advancements in AI and machine lea... | QualificationsAbility to gather business requirements and translate them into technical solutionsProven experience in developing interactive dashboards and reports using Power BI (3 years minimum)Strong proficiency in SQL and PythonStrong knowledge of DAX (Data Analysis Expressions)Experience working with APIs inside of Power BIExperience with data modeling and data visualization best practicesKnowledge of data warehousing concepts and methodologiesExperience in data analysis and problem-solvingExcellent communication and collaboration skillsBachelor's degree in Computer Science, Information Systems, or a related fieldExperience with cloud platforms such as Azure or AWS is a plus |
Loss: MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
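The two parameters above, a scale of 20.0 and cosine similarity, can be illustrated with a small pure-Python sketch of a multiple-negatives ranking objective. It assumes that every positive and every negative in the batch serves as a candidate for each query, and that the loss is the cross-entropy of picking the query's own positive. The function mnr_loss and the toy vectors are illustrative, not the library's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy of selecting positives[i] for anchors[i] among all candidates."""
    candidates = positives + negatives
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine_similarity(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over the candidate scores.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Hypothetical batch of two triplets.
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
positives = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
negatives = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

aligned_loss = mnr_loss(anchors, positives, negatives)
# Swapping the positives pairs each query with the wrong description.
misaligned_loss = mnr_loss(anchors, positives[::-1], negatives)
print(aligned_loss, misaligned_loss)
```

A larger scale sharpens the softmax, so a correct match that is only slightly more similar than the alternatives still yields a low loss.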
Columns: query, job_description_pos, and job_description_neg
| | query | job_description_pos | job_description_neg |
|---|---|---|---|
| type | string | string | string |
| query | job_description_pos | job_description_neg |
|---|---|---|
| Data Engineer Snowflake ETL big data processing | experience. But most of all, we’re backed by a culture of respect. We embrace authenticity and inspire people to thrive. | Skills Looking For:- The project involves creating a unified data structure for Power BI reporting.- Candidate would work on data architecture and unifying data from various sources.- Data engineering expertise, including data modeling and possibly data architecture.- Proficiency in Python, SQL, and DAX.- Work with AWS data, and data storage.- Experience with cloud platforms like AWS is preferred.- Familiarity with Microsoft Power Automate and Microsoft Fabric is a plus.- Collaborating with users to understand reporting requirements for Power BI. Must be good at using Power BI tools (creating dashboards); excellent Excel skills.- Supply chain background preferred. |
| GenAI applications, NLP model development, MLOps pipelines | experience building enterprise level GenAI applications, designed and developed MLOps pipelines . The ideal candidate should have deep understanding of the NLP field, hands on experience in design and development of NLP models and experience in building LLM-based applications. Excellent written and verbal communication skills with the ability to collaborate effectively with domain experts and IT leadership team is key to be successful in this role. We are looking for candidates with expertise in Python, Pyspark, Pytorch, Langchain, GCP, Web development, Docker, Kubeflow etc. Key requirements and transition plan for the next generation of AI/ML enablement technology, tools, and processes to enable Walmart to efficiently improve performance with scale. Tools/Skills (hands-on experience is must):• Ability to transform designs ground up and lead innovation in system design• Deep understanding of GenAI applications and NLP field• Hands on experience in the design and development of NLP mode... | skills, education, experience, and other qualifications. |
| data engineering ETL cloud platforms data security | experience. While operating within the Banks risk appetite, achieves results by consistently identifying, assessing, managing, monitoring, and reporting risks of all types. | experience. |
Loss: MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
Non-default hyperparameters:
eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
batch_sampler: no_duplicates

All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

| Epoch | Step | ai-job-validation_cosine_accuracy | ai-job-test_cosine_accuracy |
|---|---|---|---|
| 0 | 0 | 0.8812 | - |
| 1.0 | 51 | 0.9901 | 1.0 |
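The run finishes at step 51 with learning_rate 2e-05, lr_scheduler_type linear, and warmup_ratio 0.1. The sketch below shows what a linear warmup followed by linear decay looks like under those settings. It assumes 51 total optimizer steps and assumes the warmup length is rounded up from the ratio. The function linear_schedule_lr is illustrative, not the trainer's own scheduler.

```python
import math

def linear_schedule_lr(step, base_lr=2e-05, total_steps=51, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    # Assumption: the warmup length is the ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Print the learning rate at a few points of the run.
for step in (0, 3, 6, 28, 51):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the peak rate of 2e-05 is reached at step 6 and decays to zero by step 51.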
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
sentence-transformers/all-distilroberta-v1