Meta will withhold multimodal AI models from the EU amid regulatory uncertainty

Posted on 17/07/2024 by Pranav Dixit

Meta has decided to not offer its upcoming multimodal AI model and future versions to customers in the European Union citing a lack of clarity from European regulators, according to a statement given by Meta to Axios. The models in question are designed to process not only text but also images and audio, and power AI capabilities in Meta platforms as well as the company’s Ray-Ban smart glasses.

"We will release a multimodal Llama model over the coming months, but not in the EU due to the unpredictable nature of the European regulatory environment," Meta said in a statement to Axios.

Meta’s move follows a similar decision by Apple, which recently announced it would not release its Apple Intelligence features in Europe due to regulatory concerns. Margrethe Vestager, the EU’s competition commissioner, had slammed Apple’s move, saying that the company’s decision was a “stunning, open declaration that they know 100 percent that this is another way of disabling competition where they have a stronghold already.” Withholding Meta’s multimodal AI models from the EU could have far-reaching implications: any companies that use them to build their products and services would be unable to offer those products in Europe.

Thomas Regnier, an EU spokesperson, told Engadget that the regulator does not comment on individual decisions of companies. "It is the companies' responsibility to ensure that their services comply with our legislation," Regnier said in a statement and added that all companies are welcome to offer service in Europe as long as they comply with the bloc's laws, including the upcoming Artificial Intelligence Act.

Meta told Axios that it still plans to release Llama 3, the company’s upcoming text-only model, in the EU. The company’s primary concern stems from the challenges of training AI models using data from European customers while complying with the General Data Protection Regulation (GDPR), the EU's existing data protection law. In May, Meta announced that it planned to use publicly available posts from Facebook and Instagram users to train future AI models but was forced to stop doing so in the EU after receiving pushback from data privacy regulators in the region. At the time, Meta defended its actions, saying that being able to train its models on the data of European users was necessary to reflect local culture and terminology.

"If we don’t train our models on the public content that Europeans share on our services and others, such as public posts or comments, then models and the AI features they power won’t accurately understand important regional languages, cultures or trending topics on social media," the company said in a blog post. "We believe that Europeans will be ill-served by AI models that are not informed by Europe’s rich cultural, social and historical contributions."

Despite its reservations about releasing its multimodal models in the EU, Meta still plans to launch them in the UK, which has similar data protection laws to the EU. The company argued that European regulators are taking longer to interpret existing laws compared to their counterparts in other regions.

Update, July 18 2024, 6:40 PM ET: This story has been updated to include a statement from an EU spokesperson.

This article originally appeared on Engadget at https://www.engadget.com/meta-will-reportedly-withhold-multimodal-ai-models-from-the-eu-amid-regulatory-uncertainty-215543292.html?src=rss

Apple, NVIDIA and Anthropic reportedly used YouTube transcripts without permission to train AI models

Posted on 16/07/2024 by Pranav Dixit

Some of the world’s largest tech companies trained their AI models on a dataset that included transcripts of more than 173,000 YouTube videos without permission, a new investigation from Proof News has found. The dataset, which was created by a nonprofit company called EleutherAI, contains transcripts of YouTube videos from more than 48,000 channels and was used by Apple, NVIDIA and Anthropic among other companies. The findings of the investigation spotlight AI’s uncomfortable truth: the technology is largely built on the backs of data siphoned from creators without their consent or compensation.

The dataset doesn’t include any videos or images from YouTube, but contains video transcripts from the platform's biggest creators including Marques Brownlee and MrBeast, as well as large news publishers like The New York Times, the BBC, and ABC News. Subtitles from videos belonging to Engadget are also part of the dataset.

“Apple has sourced data for their AI from several companies,” Brownlee posted on X. “One of them scraped tons of data/transcripts from YouTube videos, including mine,” he added. “This is going to be an evolving problem for a long time.”

Apple has sourced data for their AI from several companies
One of them scraped tons of data/transcripts from YouTube videos, including mine
Apple technically avoids "fault" here because they're not the ones scraping
But this is going to be an evolving problem for a long time https://t.co/U93riaeSlY
— Marques Brownlee (@MKBHD) July 16, 2024

A Google spokesperson told Engadget that previous comments made by YouTube CEO Neal Mohan, who said that companies using YouTube's data to train AI models would violate the platform's terms of service, still stand. Apple, NVIDIA, Anthropic and EleutherAI did not respond to a request for comment from Engadget.

So far, AI companies haven’t been transparent about the data used to train their models. Earlier this month, artists and photographers criticized Apple for failing to reveal the source of training data for Apple Intelligence, the company’s own spin on generative AI coming to millions of Apple devices this year.

YouTube, the world’s largest repository of videos, in particular, is a goldmine of not only transcripts but also audio, video, and images, making it an attractive dataset for training AI models. Earlier this year, OpenAI’s chief technology officer, Mira Murati, evaded questions from The Wall Street Journal about whether the company used YouTube videos to train Sora, OpenAI’s upcoming AI video generation tool. “I’m not going to go into the details of the data that was used, but it was publicly available or licensed data,” Murati said at the time. Alphabet CEO Sundar Pichai has also said that companies using data from YouTube to train their AI models would violate the platform’s terms of service.

If you want to see if subtitles from your YouTube videos or from your favorite channels are part of the dataset, head over to Proof News' lookup tool.

Update, July 16 2024, 3:17 PM PT: This story has been updated to include a statement from Google.

This article originally appeared on Engadget at https://www.engadget.com/apple-nvidia-and-anthropic-reportedly-used-youtube-transcripts-without-permission-to-train-ai-models-170827317.html?src=rss

A hacking group reportedly leaked confidential data from thousands of Disney Slack channels

Posted on 16/07/2024 by Pranav Dixit

A hacking group leaked over a terabyte of confidential data from more than 10,000 Slack channels belonging to Disney, the Wall Street Journal reported on Monday. The leaked information includes discussions about ad campaigns, computer code, details about unreleased projects and discussion about interview candidates among other things. “Disney is investigating this matter,” a company spokesperson told the Journal.

Nullbulge calls itself a hacktivist group advocating for the rights of artists. A spokesperson for the group told the Journal that it targeted Disney due to concerns about the company's handling of artist contracts and its approach to generative AI. For weeks, the group teased its access to Disney’s Slack, posting snippets of confidential information such as attendance figures for Disneyland parks on X. Nullbulge told the Journal that it accessed Disney’s confidential information by compromising an employee’s computer twice, including through malicious software that it buried in a videogame add-on.

For more than a year, generative AI has sparked tensions between the companies that make and use the tech and members of the creative community who have accused corporations of using their work to train AI models without consent or compensation.

This article originally appeared on Engadget at https://www.engadget.com/a-hacking-group-reportedly-leaked-confidential-data-from-thousands-of-disney-slack-channels-001124844.html?src=rss

Artists criticize Apple’s lack of transparency around Apple Intelligence data

Posted on 03/07/2024 by Pranav Dixit

Later this year, millions of Apple devices will begin running Apple Intelligence, Cupertino's take on generative AI that, among other things, lets people create images from text prompts. But some members of the creative community are unhappy about what they say is the company’s lack of transparency around the raw information powering the AI model that makes this possible.

“I wish Apple would have explained to the public in a more transparent way how they collected their training data,” Jon Lam, a video games artist and a creators’ rights activist based in Vancouver, told Engadget. “I think their announcement could not have come at a worse time.”

Creatives have historically been some of the most loyal customers of Apple, a company whose founder famously positioned it at the “intersection of technology and liberal arts.” But photographers, concept artists and sculptors who spoke to Engadget said that they were frustrated about Apple’s relative silence around how it gathers data for its AI models.

Generative AI is only as good as the data its models are trained on. To that end, most companies have ingested just about anything they could find on the internet, consent or compensation be damned. Nearly 6 billion images used to train multiple AI models also came from LAION-5B, a dataset of images scraped off the internet. In an interview with Forbes, David Holz, the CEO of Midjourney, said that the company’s models were trained on “just a big scrape of the internet” and that “there isn’t really a way to get a hundred million images and know where they’re coming from.”

Artists, authors and musicians have accused generative AI companies of sucking up their work for free and profiting off of it, leading to more than a dozen lawsuits in 2023 alone. Last month, major music labels including Universal and Sony sued AI music generators Suno and Udio, startups valued at hundreds of millions of dollars, for copyright infringement. Tech companies have – ironically – both defended their actions and also struck licensing deals with content providers, including news publishers.

Some creatives thought that Apple might do better. “That’s why I wanted to give them a slight benefit of the doubt,” said Lam. “I thought they would approach the ethics conversation differently.”

Instead, Apple has revealed very little about the source of training data for Apple Intelligence. In a post published on the company’s machine learning research blog, the company wrote that, just like other generative AI companies, it grabs public data from the open web using AppleBot, its purpose-made web crawler, something that its executives have also said on stage. Apple’s AI and machine learning head John Giannandrea also reportedly said that “a large amount of training data was actually created by Apple” but did not go into specifics. And Apple has also reportedly signed deals with Shutterstock and Photobucket to license training images, but hasn’t publicly confirmed those relationships. While Apple Intelligence tries to win kudos for a supposedly more privacy-focused approach using on-device processing and bespoke cloud computing, the fundamentals girding its AI model appear little different from competitors.

Apple did not respond to specific questions from Engadget.

In May, Andrew Leung, a Los Angeles-based artist who has worked on films like Black Panther, The Lion King and Mulan, called generative AI “the greatest heist in the history of human intellect” in his testimony before the California State Assembly about the effects of AI on the entertainment industry. “I want to point out that when they use the term ‘publicly available’ it just doesn’t pass muster,” Leung said in an interview. “It doesn’t automatically translate to fair use.”

It’s also problematic for companies like Apple, said Leung, to only offer an option for people to opt out once they’ve already trained AI models on data that they did not consent to. “We never asked to be a part of it.” Apple does allow websites to opt out of being scraped by AppleBot for Apple Intelligence training data – the company says it respects robots.txt, a text file that any website can host to tell crawlers to stay away – but this would be triage at best. It's not clear when AppleBot began scraping the web or how anyone could have opted out before then. And, technologically, it's an open question how or if requests to remove information from generative models can even be honored.
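To make the robots.txt mechanism concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. The user-agent token Applebot-Extended is an assumption based on Apple's public crawler documentation rather than anything stated in this article, and example.com and the helper name may_crawl are hypothetical. The sketch only shows how a well-behaved crawler would read such a file; it has no bearing on data collected before a rule was published.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn away the (assumed) AI-training agent,
# allow every other crawler.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/portfolio/"  # hypothetical artist page
    print(may_crawl(ROBOTS_TXT, "Applebot-Extended", page))
    print(may_crawl(ROBOTS_TXT, "SomeOtherBot", page))
```

Compliance with these rules is voluntary on the crawler's side, which is why the artists quoted here describe the opt-out as a matter of trust.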

This is a sentiment that even blogs aimed at Apple fanatics are echoing. “It’s disappointing to see Apple muddy an otherwise compelling set of features (some of which I really want to try) with practices that are no better than the rest of the industry,” wrote Federico Viticci, founder and editor-in-chief of Apple enthusiast blog MacStories.

Adam Beane, a Los Angeles-based sculptor who created a likeness of Steve Jobs for Esquire in 2011, has used Apple products exclusively for 25 years. But he said that the company’s unwillingness to be forthright with the source of Apple Intelligence training data has disillusioned him.

"I'm increasingly angry with Apple," he told Engadget. "You have to be informed enough and savvy enough to know how to opt out of training Apple's AI, and then you have to trust a corporation to honor your wishes. Plus, all I can see being offered as an option to opt out is further training their AI with your data."

Karla Ortiz, a San Francisco-based illustrator, is one of the plaintiffs in a 2023 lawsuit against Stability AI and DeviantArt, the companies behind image generation models Stable Diffusion and DreamUp respectively, and Midjourney. “The bottom line is, we know [that] for generative AI to function as is, [it] relies on massive overreach and violations of rights, private and intellectual,” she wrote on a viral X thread about Apple Intelligence. “This is true for all [generative] AI companies, and as Apple pushes this tech down our throats, it’s important to remember they are not an exception.”

The outrage against Apple is also a part of a larger sense of betrayal among creative professionals against tech companies whose tools they depend on to do their jobs. In April, a Bloomberg report revealed that Adobe, which makes Photoshop and multiple other apps used by artists, designers, and photographers, used questionably-sourced images to train Firefly, its own image-generation model that Adobe claimed was “ethically” trained. And earlier this month, the company was forced to update its terms of service to clarify that it wouldn’t use the content of its customers to train generative AI models after customer outrage. “The entire creative community has been betrayed by every single software company we ever trusted,” said Lam. Although it isn’t feasible for him to switch away from Apple products entirely, he’s trying to cut back: he’s planning to give up his iPhone for a Light Phone III.

“I think there is a growing feeling that Apple is becoming just like the rest of them,” said Beane. “A giant corporation that is prioritizing their bottom line over the lives of the people who use their product.”

This article originally appeared on Engadget at https://www.engadget.com/artists-criticize-apples-lack-of-transparency-around-apple-intelligence-data-131250021.html?src=rss

Google’s greenhouse gas emissions climbed nearly 50 percent in five years due to AI

Posted on 03/07/2024 by Pranav Dixit

Google’s greenhouse gas emissions spiked by nearly 50 percent in the last five years thanks to energy-guzzling data centers required to power artificial intelligence, according to the company’s 2024 Environmental Report released on Tuesday. The report, which Google releases annually, shows the company’s progress towards meeting its self-proclaimed objective of becoming carbon neutral by 2030.

Google released 14.3 million metric tons of carbon dioxide in 2023, the report states, which was 48 percent higher than in 2019, and 13 percent higher than a year before. “This result is primarily due to increases in data center energy consumption and supply chain emissions,” said Google in the report. “As we further integrate AI into our products, reducing emissions may be challenging due to increasing energy demands associated with the expected increases in our technical infrastructure investment.”
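As a quick sanity check on those percentages, the earlier totals can be backed out from the 2023 figure. This is a back-of-the-envelope sketch in Python: the function name baseline_from_increase is hypothetical, and the results are implied approximations derived from the rounded numbers above, not figures taken from Google's report.

```python
def baseline_from_increase(current: float, pct_increase: float) -> float:
    """Back out an earlier value from the current value and the percent rise since then."""
    return current / (1 + pct_increase / 100)


EMISSIONS_2023_MT = 14.3  # million metric tons of CO2, as stated in the article

# 48 percent higher than 2019, 13 percent higher than the year before (2022).
implied_2019 = baseline_from_increase(EMISSIONS_2023_MT, 48)
implied_2022 = baseline_from_increase(EMISSIONS_2023_MT, 13)

if __name__ == "__main__":
    print(f"Implied 2019 emissions: {implied_2019:.2f} million metric tons")
    print(f"Implied 2022 emissions: {implied_2022:.2f} million metric tons")
```

That works out to roughly 9.7 million metric tons for 2019 and roughly 12.7 million for 2022; because the percentages are rounded, treat both as estimates.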

Google’s report spotlights the environmental impact that the explosion of artificial intelligence has had on the planet. Google, Microsoft, Amazon, Meta, Apple and other tech companies plan to pour billions of dollars into AI, but training AI models requires enormous amounts of energy. Running AI features consumes significant amounts of energy too. In 2023, researchers at AI startup Hugging Face and Carnegie Mellon University found that generating a single image using artificial intelligence can use as much energy as charging a smartphone. Analysts at Bernstein said that AI would “double the rate of US electricity demand growth and total consumption could outstrip current supply in the next two years,” the Financial Times reported. Last month, Microsoft, which also pledged to go “carbon negative” by the end of this decade, reported that its greenhouse gas emissions had risen nearly 30 percent since 2020 due to the construction of data centers.

Google’s report said that the company’s data centers were using way more water than before to stay cool as a result of expanded AI workloads. Some of those workloads so far have involved Google Search suggesting that people eat rocks and put glue on their pizza to prevent the cheese from falling off, as well as Gemini, the company’s AI-powered chatbot, generating images of ethnically diverse Nazis.

In 2023, Google’s data centers consumed 17 percent more water than the year before. That’s 6.1 billion liters, enough to irrigate approximately 41 golf courses annually in the southwestern United States, according to the company’s strangely kooky measure.

“As our business and industry continue to evolve, we expect our total GHG (greenhouse gas) emissions to rise before dropping toward our absolute emissions reduction target,” Google’s report stated, without explaining what would precipitate the drop. “Predicting the future environmental impact of AI is complex and evolving, and our historical trends likely don’t fully capture AI’s future trajectory. As we deeply integrate AI across our product portfolio, the distinction between AI and other workloads will not be meaningful.”

This article originally appeared on Engadget at https://www.engadget.com/googles-greenhouse-gas-emissions-climbed-nearly-50-percent-in-five-years-due-to-ai-002646115.html?src=rss

Midjourney is creating Donald Trump pictures when asked for images of ‘the president of the United States’

Posted on 01/07/2024 by Pranav Dixit

Midjourney, a popular AI-powered image generator, is creating images of Donald Trump and Joe Biden despite saying that it would block users from doing so ahead of the upcoming US presidential election.

When Engadget prompted the service to create an image of “the president of the United States,” Midjourney generated four images in various styles of former president Donald Trump.

Midjourney created an image of Trump despite saying it wouldn't.

When asked to create an image of “the next president of the United States,” the tool generated four images of Trump as well.

Midjourney generated Donald Trump images despite saying it wouldn't.

When Engadget prompted Midjourney to create an image of “the current president of the United States,” the service generated three images of Trump and one image of former president Barack Obama.

Midjourney also created an image of former President Obama

The only time Midjourney refused to create an image of Trump or Biden was when it was asked to do so explicitly. “The Midjourney community voted to prevent using ‘Donald Trump’ and ‘Joe Biden’ during election season,” the service said in that instance. Other users on X were able to get Midjourney to generate Trump’s images too.

The tests show that Midjourney’s guardrails to prevent users from generating images of Trump and Biden ahead of the upcoming US presidential election aren’t enough — in fact, it’s really easy for people to get around them. Other chatbots like OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini and Meta AI did not create images of Trump or Biden despite multiple prompts.

Midjourney did not respond to a request for comment from Engadget.

Midjourney was one of the first AI-powered image generators to explicitly ban users from generating images of Trump and Biden. “I know it’s fun to make Trump pictures — I make Trump pictures,” the company’s CEO, David Holz, told users in a chat session on Discord, earlier this year. “However, probably better to just not — better to pull out a little bit during this election. We’ll see.” A month later, Holz reportedly told users that it was time to “put some foots down on election-related stuff for a bit” and admitted that “this moderation stuff is kind of hard.” The company’s existing content rules prohibit the creation of “misleading public figures” and “events portrayals” with the “potential to mislead.”

Last year, Midjourney was used to create a fake image of Pope Francis wearing a puffy white Balenciaga jacket that went viral. It was also used to create fake images of Trump being arrested ahead of his arraignment at the Manhattan Criminal Court last year for his involvement in a hush money payment made to adult film star Stormy Daniels. Shortly afterwards, the company halted free trials of the service and, instead, required people to pay at least $10 a month to use it.

Last month, the Center for Countering Digital Hate, a non-profit organization that aims to stop the spread of misinformation and hate speech online, found that Midjourney’s guardrails against generating misleading images of popular politicians including Trump and Biden failed 40% of its tests. The CCDH was able to use Midjourney to create an image of president Biden being arrested and Trump appearing next to a body double. The CCDH was also able to bypass Midjourney’s guardrails by using descriptions of each candidate’s physical appearance rather than their names to generate misleading images.

“Midjourney is far too easy to manipulate in practice – in some cases it’s completely evaded just by adding punctuation to slip through the net,” wrote CCDH CEO Imran Ahmed in a statement at the time. “Bad actors who want to subvert elections and sow division, confusion and chaos will have a field day, to the detriment of everyone who relies on healthy, functioning democracies.”

Earlier this year, a coalition of 20 tech companies including OpenAI, Google, Meta, Amazon, Adobe and X signed an agreement to help prevent deepfakes in elections taking place in 2024 around the world by preventing their services from generating images and other media that would influence voters. Midjourney was absent from that list.

This article originally appeared on Engadget at https://www.engadget.com/midjourney-is-creating-donald-trump-pictures-when-asked-for-images-of-the-president-of-the-united-states-212427937.html?src=rss

Please don’t get your news from AI chatbots

Posted on 28/06/2024 by Pranav Dixit

This is your periodic reminder that AI-powered chatbots still make up things and lie with all the confidence of a GPS system telling you that the shortest way home is to drive through the lake.

My reminder comes courtesy of Nieman Lab, which ran an experiment to see if ChatGPT would provide correct links to articles from news publications it pays millions of dollars to. It turns out that ChatGPT does not. Instead, it confidently makes up entire URLs, a phenomenon that the AI industry calls “hallucinating,” a term that seems more apt for a real person high on their own bullshit.

Nieman Lab’s Andrew Deck asked the service to provide links to high-profile, exclusive stories published by 10 publishers that OpenAI has struck deals worth millions of dollars with. These included the Associated Press, The Wall Street Journal, the Financial Times, The Times (UK), Le Monde, El País, The Atlantic, The Verge, Vox, and Politico. In response, ChatGPT spat back made-up URLs that led to 404 error pages because they simply did not exist. In other words, the system was working exactly as designed: by predicting the most likely version of a story’s URL instead of actually citing the correct one. Nieman Lab did a similar experiment with a single publication — Business Insider — earlier this month and got the same result.

An OpenAI spokesperson told Nieman Lab that the company was still building “an experience that blends conversational capabilities with their latest news content, ensuring proper attribution and linking to source material — an enhanced experience still in development and not yet available in ChatGPT.” But they declined to explain the fake URLs.

We don’t know when this new experience will be available or how reliable it will be. Despite this, news publishers continue to feed years of journalism into OpenAI’s gaping maw in exchange for cold, hard cash because the journalism industry has consistently sucked at figuring out how to make money without selling its soul to tech companies. Meanwhile, AI companies are chowing down on content published by anyone who hasn’t signed these Faustian bargains and using it to train their models anyway. Mustafa Suleyman, Microsoft’s AI head, recently called anything published on the internet “freeware” that is fair game for training AI models. Microsoft was valued at $3.36 trillion at the time I wrote this.

There’s a lesson here: If ChatGPT is making up URLs, it’s also making up facts. That’s how generative AI works — at its core, the technology is a fancier version of autocomplete, simply guessing the next plausible word in a sequence. It doesn’t “understand” what you say, even though it acts like it does. Recently, I tried getting our leading chatbots to help me solve the New York Times Spelling Bee and watched them crash and burn.
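To see what "guessing the next plausible word" looks like in miniature, here is a toy sketch in Python. It is a simple bigram counter over a made-up corpus, offered only as an illustration of the autocomplete idea; it is not how ChatGPT or any production model is built, and the names CORPUS, train_bigrams, predict_next and autocomplete are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

# Tiny made-up corpus; real systems are trained on vastly more text.
CORPUS = (
    "the cat sat on the mat . "
    "the cat ate the fish . "
    "the cat sat on the sofa ."
)


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count, for each word, which words follow it in the text."""
    words = text.split()
    followers: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of word, or None if word was never seen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]


def autocomplete(followers: Dict[str, Counter], start: str, length: int) -> List[str]:
    """Extend start by up to length words, always taking the likeliest next word."""
    out = [start]
    for _ in range(length):
        nxt = predict_next(followers, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(" ".join(autocomplete(model, "the", 4)))
```

Nothing in the loop checks whether the output is true; it only checks what usually comes next, which is the same reason a plausible-looking URL can point nowhere.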

If generative AI can’t even solve the Spelling Bee, you shouldn't use it to get your facts.

This article originally appeared on Engadget at https://www.engadget.com/please-dont-get-your-news-from-ai-chatbots-000027227.html?src=rss

The nation’s oldest nonprofit newsroom is suing OpenAI and Microsoft

Posted on 27/06/2024 by Pranav Dixit

The Center for Investigative Reporting, the nation’s oldest nonprofit newsroom, which produces Mother Jones and Reveal, sued OpenAI and Microsoft in federal court on Thursday for allegedly using its content to train AI models without consent or compensation. This is the latest in a long line of lawsuits filed by publishers and creators accusing generative AI companies of violating copyright.

“OpenAI and Microsoft started vacuuming up our stories to make their product more powerful, but they never asked for permission or offered compensation, unlike other organizations that license our material,” said Monika Bauerlein, CEO of the Center for Investigative Reporting, in a statement. “This free rider behavior is not only unfair, it is a violation of copyright. The work of journalists, at CIR and everywhere, is valuable, and OpenAI and Microsoft know it.” Bauerlein said that OpenAI and Microsoft treat the work of nonprofit and independent publishers “as free raw material for their products,” and added that such moves by generative AI companies hurt the public’s access to truthful information in a “disappearing news landscape.”

OpenAI and Microsoft did not respond to a request for comment by Engadget.

The CIR’s lawsuit, which was filed in Manhattan’s federal court, accuses OpenAI and Microsoft, which owns nearly half of OpenAI, of violating the Copyright Act and the Digital Millennium Copyright Act multiple times.

News organizations find themselves at an inflection point with generative AI. While the CIR is joining publishers like The New York Times, New York Daily News, The Intercept, AlterNet and Chicago Tribune in suing OpenAI, other publishers have chosen to strike licensing deals with the company. These deals will allow OpenAI to train its models on archives and ongoing content published by these publishers and cite information from them in responses offered by ChatGPT.

On the same day that the CIR sued OpenAI, for instance, TIME magazine announced a deal with the company that would grant it access to 101 years of archives. Last month, OpenAI signed a $250 million multi-year deal with News Corp, the owner of The Wall Street Journal, to train its models on more than a dozen brands owned by the publisher. The Financial Times, Axel Springer (the owner of Politico and Business Insider), The Associated Press and Dotdash Meredith have also signed deals with OpenAI.

This article originally appeared on Engadget at https://www.engadget.com/the-nations-oldest-nonprofit-newsroom-is-suing-openai-and-microsoft-174748454.html?src=rss

GIZMODO.cz

Syndicated rss news

Author Archives: Pranav Dixit
