Annotators and Training

We provide training and upskilling opportunities on the subject of AI, as well as AI model development services, through our social enterprise. We are conscious that each person brings their own values, beliefs, and personal biases, so we raise awareness among our annotators of the biases they might exhibit and transmit to the data while annotating. All annotators must complete the “Bias in AI” training module in the Humans in the Loop Training Center and earn the “Ethical AI Annotator” badge before becoming eligible for annotation projects.

  • At the same time, we promote a view of annotation as interpretative and collaborative work, in which annotators and the client build a shared understanding.

  • We promote the use of clear guidelines, open communication, edge case examples, as well as straightforward annotation interfaces and processes, as ways to mitigate potential biases.

Humans in the Loop believes that informed annotators make better decisions when annotating a dataset. We therefore always share the background and purpose of the AI project in the guidelines provided to annotators.

Best Practice

Special attention is paid to the following types of projects, each with its own best practices:

  • Projects which require the annotation of subjective, self-identified human characteristics and protected attributes, such as gender, race, and disability.

  • Instead of annotating race or ethnicity, skin color should be annotated according to a widely-accepted dermatological convention, in addition to other objective characteristics such as the presence of an eye crease.

  • Instead of annotating gender (especially with binary variables), context-aware labels should be used for objective annotation, such as “beard”, “makeup”, or “shirt”.

  • Projects which require the annotation of image-level tags for classification purposes should be discouraged, given the challenges in interpretability of model predictions.
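The guidance above can be made concrete in an annotation schema. The sketch below is a hypothetical example, assuming the Fitzpatrick scale as the "widely-accepted dermatological convention" (other conventions, such as the Monk Skin Tone scale, could serve the same role); the class and field names are illustrative, not an actual Humans in the Loop specification.

```python
from dataclasses import dataclass, field

# Fitzpatrick skin phototypes, a widely used dermatological scale.
FITZPATRICK_TYPES = ["I", "II", "III", "IV", "V", "VI"]

@dataclass
class FaceAnnotation:
    """Objective, observable attributes are annotated instead of
    protected characteristics such as race or gender."""
    skin_type: str                 # Fitzpatrick type, not "race"
    eye_crease_present: bool       # objective facial feature
    context_tags: list = field(default_factory=list)  # e.g. ["beard", "makeup"]

    def __post_init__(self):
        if self.skin_type not in FITZPATRICK_TYPES:
            raise ValueError(f"Unknown skin type: {self.skin_type}")

# Example annotation following the best practices above.
ann = FaceAnnotation(skin_type="III", eye_crease_present=True,
                     context_tags=["beard"])
```

Validating labels at construction time keeps out-of-convention values from silently entering the dataset.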

Data Training Solutions

Training Data

We’ll craft comprehensive prompts and answers across a variety of dimensions such as tone, delivery format, justification, and more. Working closely with your team, we’ll identify the right distribution of data to create, whether to build a foundational set of training data or to fine-tune an existing LLM.
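Planning such a distribution can be sketched as allocating prompt slots across dimension combinations. This is a minimal illustration; the dimension categories and proportions below are assumptions for the example, not an actual project specification.

```python
import random
from collections import Counter

# Hypothetical target mix over two prompt dimensions (tone, delivery format).
TARGET_MIX = {
    ("formal", "bullet-list"): 0.25,
    ("formal", "paragraph"): 0.25,
    ("casual", "paragraph"): 0.30,
    ("casual", "step-by-step"): 0.20,
}

def plan_batch(n, mix, seed=0):
    """Allocate n prompt-writing slots across the target dimension mix."""
    rng = random.Random(seed)
    cells, weights = zip(*mix.items())
    return Counter(rng.choices(cells, weights=weights, k=n))

plan = plan_batch(1000, TARGET_MIX)  # Counter of slots per (tone, format) cell
```

The resulting counts tell writers how many prompts of each kind to produce, so the finished dataset approximates the agreed distribution.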

Data Augmentation

Our experts will augment your training data by creating additional prompts based on examples and instructions from your application. We can also validate user-written responses as well as review and rewrite prompts generated by a model.

Synthetic Data

When real training data is too difficult or not cost-effective to obtain, our team can create synthetic datasets to help train your model. Our human-in-the-loop approach ensures a high level of quality, delivering synthetic data that can help improve model performance and reduce hallucinations, bias, and other errors.
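One simple way to produce synthetic prompts is template filling, with every generated item flagged for human review before it enters the training set. The template and slot values below are hypothetical examples, and the review flag is an assumption about how the human-in-the-loop step might be recorded.

```python
import itertools

# Hypothetical prompt template and slot values for synthetic generation.
TEMPLATE = "Write a {tone} reply to a customer asking about {topic}."
SLOTS = {
    "tone": ["friendly", "concise"],
    "topic": ["a late delivery", "a refund"],
}

def generate(template, slots):
    """Yield one synthetic prompt per combination of slot values."""
    keys = list(slots)
    for combo in itertools.product(*(slots[k] for k in keys)):
        yield {
            "prompt": template.format(**dict(zip(keys, combo))),
            "human_reviewed": False,  # a reviewer flips this after approval
        }

samples = list(generate(TEMPLATE, SLOTS))  # 2 x 2 = 4 synthetic prompts
```

Keeping the review flag on each record makes it straightforward to exclude unapproved items from the final dataset.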

Edge Case Generation

Our skilled team will analyze the distribution of prompts used with your model to identify gaps and potential areas of non-standard behavior. We then leverage these insights to design targeted test prompts to help train your model for a wider range of scenarios.
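A basic form of this gap analysis is counting observed prompts per category and flagging categories below a coverage threshold. The categories, counts, and threshold below are illustrative assumptions, not real project data.

```python
from collections import Counter

def find_gaps(prompt_categories, expected, min_share=0.05):
    """Return expected categories whose share of observed prompts
    falls below min_share (candidates for targeted test prompts)."""
    counts = Counter(prompt_categories)
    total = max(len(prompt_categories), 1)
    return [c for c in expected if counts[c] / total < min_share]

# Hypothetical observed prompt log: 60% billing, 38% shipping, 2% returns.
observed = ["billing"] * 60 + ["shipping"] * 38 + ["returns"] * 2
gaps = find_gaps(observed, expected=["billing", "shipping", "returns", "warranty"])
# "returns" is underrepresented and "warranty" is absent entirely
```

The flagged categories then guide the design of targeted test prompts for the scenarios the model has seen least.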