On-call, on-site data labelling service

Submitted by Gaurav Kaila
one year ago
A data labelling service for AI projects that require large amounts of labelled data to train their models. Users benefit by being free to focus on higher-value work, such as designing efficient model architectures, improving end-to-end ML pipelines and validating results.
As an AI practitioner, I spend 25-30% of my working week on data labelling. The causes are resource shortages, the inexperience of remote data labellers, and data privacy constraints such as GDPR. This leaves me less time for higher-value work such as research into AI modelling, feature selection and extraction, result validation and building robust AI pipelines.

Remote labelling services already exist, such as MightyAI, Google AI Platform and Amazon Mechanical Turk. However, all of them require you to transfer your data to a third party, have human labellers work remotely, and leave you to validate their results. This model works for projects and corporations where data privacy is not a concern or where data can be moved off the premises.

Unfortunately, a growing number of corporations are prioritising data privacy, especially under GDPR in the EU. As a result, data labelling is done in-house, mostly by junior members of the team. This leads to unhappy employees, poor-quality labels and, in turn, poor AI results.

My idea addresses this gap in the market with an on-call, on-site data labelling service. Freelancers would come on-premises and label data on authorised machines, so the data never leaves the site. Users of the service, acting as project managers, could supervise the labelling process and resolve any issue immediately. Other team members could carry out manual QA of the labels in parallel. Once the data is labelled, it is ready for ingestion, with no time spent transferring it from a remote location to cloud or on-premises compute.