Unveiling the Re-LAION-5B Data Set: LAION's Latest Move to Combat Illegal Content
LAION, the German research organization behind the data sets used to train various generative AI models, has released a new data set that it says has been scrubbed of all known links to suspected child sexual abuse material (CSAM). The release, Re-LAION-5B, is a reworked version of the earlier LAION-5B data set, refined with input from the Internet Watch Foundation, Human Rights Watch, the Canadian Centre for Child Protection, and the now-defunct Stanford Internet Observatory. It comes in two versions, Re-LAION-5B Research and Re-LAION-5B Research-Safe (which also removes additional NSFW content), both filtered to exclude thousands of links to known and "likely" CSAM.
In a blog post, LAION said it has been committed from the beginning to removing illegal content from its data sets and has taken proactive measures to that end. Importantly, LAION's data sets contain no images themselves; they are indexes of links to images, paired with each image's alt text, drawn from Common Crawl, a repository of crawled websites and webpages.
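For readers unfamiliar with link-based data sets, the structure can be illustrated with a toy index. The field names below are hypothetical, not LAION's actual schema; the point is that the data set stores URLs and captions, while consumers fetch the image bytes themselves:

```python
# Toy illustration of a link-based image index (hypothetical field names):
# the data set holds URLs and alt text, never the image pixels.
index = [
    {"url": "https://example.com/cat.jpg", "alt": "a cat sleeping on a sofa"},
    {"url": "https://example.com/dog.jpg", "alt": "a dog catching a frisbee"},
]

def training_captions(index):
    """The captions live in the index; the image bytes do not."""
    return [entry["alt"] for entry in index]

captions = training_captions(index)
```

This separation is why "removing" content from such a data set means deleting link entries, not deleting images from the web.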
The release of Re-LAION-5B follows an investigation by the Stanford Internet Observatory in December 2023, which found that LAION-5B, in particular a subset called LAION-5B 400M, included links to illegal images scraped from social media platforms and adult websites. The report also found that the subset contained links to other problematic content, including pornographic imagery, racist language, and harmful social stereotypes.
In response, LAION temporarily took LAION-5B offline. The Stanford report recommended that models trained on LAION-5B be deprecated and their distribution halted where possible. Around the same time, AI startup Runway removed its Stable Diffusion 1.5 model from the AI hosting platform Hugging Face.
The new Re-LAION-5B data set, which contains roughly 5.5 billion text-image pairs and is released under an Apache 2.0 license, also includes metadata that third parties can use to purge the same illicit links from existing copies of LAION-5B. LAION stresses that its data sets are intended for research, but organizations including Stability AI and, in the past, Google have used them to train commercial models.
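LAION has not published its exact cleanup procedure here, but in outline, removing flagged links from an existing copy amounts to a set-difference over the index. A minimal sketch, assuming a removal list of URL hashes (field names, file layout, and the use of plain URL hashes are all illustrative assumptions; real removal lists from child-safety organizations typically distribute hashes precisely so the underlying material never has to be shared):

```python
import hashlib

def clean_index(records, flagged_url_hashes):
    """Keep only records whose hashed URL is absent from the removal list."""
    kept = []
    for rec in records:
        digest = hashlib.sha256(rec["url"].encode("utf-8")).hexdigest()
        if digest not in flagged_url_hashes:
            kept.append(rec)
    return kept

# Hypothetical index entries and a removal list with one flagged URL.
records = [
    {"url": "https://example.com/ok.jpg", "alt": "a mountain at sunrise"},
    {"url": "https://example.com/bad.jpg", "alt": "a flagged entry"},
]
flagged = {hashlib.sha256(b"https://example.com/bad.jpg").hexdigest()}

cleaned = clean_index(records, flagged)  # only the first record survives
```

Distributing hashes rather than the flagged URLs themselves lets holders of LAION-5B copies filter their data without anyone redistributing links to the material being removed.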
LAION's latest release marks a significant step toward rooting illegal content out of AI training data. By publishing Re-LAION-5B, LAION hopes to push research labs and other organizations to migrate from the older data sets to the cleaned version, improving both the legality of the data and the integrity of the models trained on it.