About Harsh Singhal
Harsh Singhal is a global Data Science and Machine Learning leader.
With over 15 years of experience, Harsh is an industry-recognized leader in Machine Learning, Data Science, and Artificial Intelligence. Harsh's career spans various verticals across global markets and top tech companies like LinkedIn, Netflix, and Palo Alto Networks, with a focus on delivering data-driven solutions and driving business outcomes.
Harsh has a proven record of building and scaling high-performance teams - a testament to this is his success at Koo, where Harsh spearheaded the expansion of ML/Data Science teams.
As a results-driven professional, Harsh has applied ML/AI techniques to a wide array of business problems, from bot detection and spam detection to sales product recommendation and account takeover prevention, during his time at LinkedIn and Netflix.
This broad spectrum of experience shows his flexibility and adaptability in handling complex business challenges.
Harsh's innovative approach is underpinned by patents in key areas like bot detection and threat detection. Harsh also has publications in renowned forums like the IEEE Systems, Man, and Cybernetics Society.
Harsh maintains an online publication datascience.fm that attracts thousands of readers every month and has seen contributions from student leaders and industry professionals.
Harsh has developed Molecule Search to provide molecule-based patent search to medicinal chemistry enthusiasts. This product applies vector similarity search using the RDKit Postgres extension. The product also includes ChatGPT-based patent summaries and applicant patent landscape analysis.
Molecule Search is a great example of a data product.
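To give a feel for what molecule similarity search involves, here is a minimal sketch in plain Python. It is not the Molecule Search implementation, which relies on the RDKit Postgres extension. It assumes each molecule has already been reduced to a fingerprint, represented here as a set of "on" bit indices, and ranks a small library by Tanimoto similarity, a measure commonly used for comparing chemical fingerprints. The names `LIBRARY`, `QUERY`, and `top_k_similar`, along with the fingerprint values, are hypothetical and exist only for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def top_k_similar(
    query: FrozenSet[int],
    library: Dict[str, FrozenSet[int]],
    k: int = 3,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (name, score) pairs with score >= threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    # Sort by descending score, then by name so ties are deterministic.
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits[:k]


# Hypothetical toy fingerprints; real ones would be derived from molecular structures.
LIBRARY: Dict[str, FrozenSet[int]] = {
    "patent_A": frozenset({1, 4, 7, 9}),
    "patent_B": frozenset({1, 4, 8}),
    "patent_C": frozenset({2, 3, 5}),
}
QUERY: FrozenSet[int] = frozenset({1, 4, 7})

if __name__ == "__main__":
    for name, score in top_k_similar(QUERY, LIBRARY, k=2):
        print(f"{name}: {score:.2f}")
```

In a database-backed product the same ranking would typically be pushed down into the database and served from an index rather than computed by scanning every fingerprint in application code.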
Harsh started his career with Mu Sigma in 2008, where he developed the first ever industry curricula for R. At Mu Sigma, Harsh was a founding member of the Innovations and Development team, where he worked on automating Machine Learning models, a paradigm that was later termed AutoML. Harsh relocated to the Bay Area in 2011 after joining LinkedIn's Bangalore office and being hired as the first Data Scientist in LinkedIn India.
Between 2011 and 2020, Harsh lived in the Bay Area, California, where he developed impactful Data Science & Machine Learning (DSML) solutions at companies such as LinkedIn and Netflix.
After having spent a decade in California, Harsh decided to move to India in early 2021.
Harsh joined Koo in late 2021 to build their ML/AI team. Harsh quickly scaled the ML team from 3 to 20 engineers, comprising folks in Data Science, Machine Learning, and ML Ops.
Koo has been downloaded by more than 50M users across the world.
Under Harsh's leadership, the team successfully delivered ML-powered product features such as ChatGPT-assisted writing tools for creators, Semantic Search, Multilingual Topics, People You May Know, Content Recommendation, Feed Ranking, and Trending Topics. The team also led all aspects of Content Moderation and Spam detection and was responsible for developing all personalization features across the app.
A key highlight of Harsh's contributions at Koo was the delivery of an industry-first multilingual Topics feature. This feature allowed every user, irrespective of their native language, to find content based on their Topic of interest, and it increased average retention and time spent amongst millions of Koo users. Topics was built on many innovations in the field, such as multilingual embeddings and open-source model training and deployment technologies like Ludwig and BentoML.
Koo collaborated very closely with AI4Bharat to develop KooBERT, the first open-source BERT model trained on multilingual microblog content.
Personalization features such as Recommended For You enabled creators to be discovered more efficiently by users who used Koo to connect with their favorite content and creators. These features were developed using large-scale data engineering technologies such as Spark and recommender system algorithms such as ALS.
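For readers unfamiliar with ALS (alternating least squares), the toy sketch below shows the core idea in plain Python. It is not Koo's implementation, which ran on Spark at scale. It assumes explicit ratings and a single latent factor per user and item, so each least-squares update has a simple closed form. The ratings, the function names, and the parameters `reg` and `iters` are all hypothetical.

```python
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # (user, item) -> observed rating


def regularized_loss(ratings: Ratings, user: List[float], item: List[float],
                     reg: float = 0.1) -> float:
    """Squared error on observed ratings plus an L2 penalty on all factors."""
    err = sum((r - user[u] * item[i]) ** 2 for (u, i), r in ratings.items())
    penalty = reg * (sum(x * x for x in user) + sum(y * y for y in item))
    return err + penalty


def als_rank1(ratings: Ratings, n_users: int, n_items: int,
              reg: float = 0.1, iters: int = 20) -> Tuple[List[float], List[float]]:
    """Rank-1 ALS: alternately solve for user factors, then item factors."""
    user = [1.0] * n_users
    item = [1.0] * n_items
    for _ in range(iters):
        # Hold item factors fixed; each user factor is a 1-D ridge regression.
        for u in range(n_users):
            num = sum(r * item[i] for (uu, i), r in ratings.items() if uu == u)
            den = sum(item[i] ** 2 for (uu, i) in ratings if uu == u) + reg
            user[u] = num / den
        # Hold user factors fixed; solve each item factor the same way.
        for i in range(n_items):
            num = sum(r * user[u] for (u, ii), r in ratings.items() if ii == i)
            den = sum(user[u] ** 2 for (u, ii) in ratings if ii == i) + reg
            item[i] = num / den
    return user, item


# Hypothetical toy ratings: 3 users, 3 items, 6 observed entries.
RATINGS: Ratings = {
    (0, 0): 5.0, (0, 1): 3.0,
    (1, 0): 4.0, (1, 2): 1.0,
    (2, 1): 2.0, (2, 2): 1.0,
}

if __name__ == "__main__":
    users, items = als_rank1(RATINGS, n_users=3, n_items=3)
    # Predicted score for a pair that was never observed: user 2, item 0.
    print(round(users[2] * items[0], 2))
```

Production systems typically use many latent factors per user and item and distribute the per-user and per-item solves across a cluster, which is where an engine like Spark comes in.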
The Content Moderation technologies developed by the ML team led by Harsh provided a safe environment for users. The team adopted cutting-edge infrastructure such as the Milvus vector database and fine-tuned Large Language Models (LLMs) such as Llama 2 and mT5 to detect toxicity.
Harsh has been invited to a panel discussion on Indian language tech covered by a GoI think tank on AI and is a strong supporter of developing technologies out of India.
Harsh has a YouTube channel where he posts videos on a variety of topics of interest to professionals in the Data and AI ecosystem.
Harsh Singhal actively works with student communities and guides them to excel in their journey towards DSML excellence. Harsh is also involved as an advisor in developing DSML curricula at academic institutions to increase AI talent density amongst India's student community.