Cloudflare is taking a stand against AI website scrapers

Cloudflare has released a new free tool that prevents AI companies' bots from scraping its clients' websites for content to train large language models. The cloud service provider is making this tool available to its entire customer base, including those on free plans. "This feature will automatically be updated over time as we see new fingerprints of offending bots we identify as widely scraping the web for model training," the company said.

In a blog post announcing this update, Cloudflare's team also shared some data about how its clients are responding to the boom of bots that scrape content to train generative AI models. According to the company's internal data, 85.2 percent of customers have chosen to block even the AI bots that properly identify themselves from accessing their sites.

Cloudflare also identified the most active bots from the past year. The ByteDance-owned Bytespider bot attempted to access 40 percent of websites under Cloudflare's purview, and OpenAI's GPTBot tried on 35 percent. Together with Amazonbot and ClaudeBot, they made up the top four AI bot crawlers by number of requests on Cloudflare's network.
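For a sense of what blocking the bots that properly identify themselves can look like, here is a minimal Python sketch that matches a request's User-Agent header against the four crawler names mentioned above. It is an illustration only, not Cloudflare's method, which the company says relies on bot fingerprints and machine learning models. The function name, the token list and the sample User-Agent strings are all assumptions made for the example.

```python
# Crawler names taken from the article; matching on them only catches bots
# that announce themselves honestly in the User-Agent header.
AI_BOT_TOKENS = ("bytespider", "gptbot", "amazonbot", "claudebot")


def is_self_identified_ai_bot(user_agent: str) -> bool:
    """Return True if the User-Agent contains one of the known crawler names."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_BOT_TOKENS)


if __name__ == "__main__":
    # Illustrative User-Agent strings (hypothetical, not copied from real traffic).
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/127.0",
    ]
    for ua in samples:
        verdict = "block" if is_self_identified_ai_bot(ua) else "allow"
        print(f"{verdict}: {ua}")
```

A check like this does nothing against a scraper that sends a browser-like User-Agent, which is the gap fingerprint-based detection is meant to close.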

It's proving very difficult to fully and consistently block AI bots from accessing content. The arms race to build models faster has led to instances of companies skirting or outright breaking the existing rules around blocking scrapers. Perplexity AI was recently accused of scraping websites without the required permissions. But having a backend company at the scale of Cloudflare getting serious about trying to put the kibosh on this behavior could lead to some results.

"We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection," the company said. "We will continue to keep watch and add more bot blocks to our AI Scrapers and Crawlers rule and evolve our machine learning models to help keep the Internet a place where content creators can thrive and keep full control over which models their content is used to train or run inference on."

This article originally appeared on Engadget at https://www.engadget.com/cloudflare-is-taking-a-stand-against-ai-website-scrapers-220030471.html?src=rss

Virtual tabletop gaming platform Roll20 experienced a serious data breach

Popular virtual tabletop service Roll20 has experienced a serious security breach, according to an email the company sent out to users. The email, written on July 2, warned users that their personal data may have been exposed, including “first and last name, email address, last known IP address, and the last four digits” of credit cards. However, the breach didn’t expose passwords or full financial information, so that’s good.

The company discovered “unauthorized access” to an administrative account last week. It immediately blocked the impacted account, but this particular account had access to the aforementioned personal information. Roll20 doesn’t know if anyone actually used this breach to scoop up data, saying it has “no reason to believe that your personal information has been misused” and that it’s notifying users “out of an abundance of caution.”

Engadget reached out to the company for more information regarding the timeline and the potential impact. We’ll update this post when we hear more. “We truly regret that this incident occurred on our watch,” Roll20 founder Riley Dutton told Wargamer.

It’s worth noting that users have been asking the company to implement two-factor authentication (2FA) for years, to no avail. It experienced a similar data breach in 2018 that impacted four million users. It’s probably time for Roll20 to bump its charisma stats and approach a 2FA service provider, for the good of the realms. 

This article originally appeared on Engadget at https://www.engadget.com/virtual-tabletop-gaming-platform-roll20-experienced-a-serious-data-breach-181052179.html?src=rss

Meta is changing its policy for the most-moderated word on its platforms

Meta is changing a long-running policy regarding the Arabic word “shaheed,” which has been described as the most-moderated word on the company’s apps. The company said in an update to the Oversight Board that use of the word alone would no longer result in a post’s removal.

The Oversight Board had criticized the company for a “blanket ban” on the word, which is often translated as “martyr,” though, as the board noted, it can have multiple meanings. Meta’s previous policy, however, didn’t take that “linguistic complexity” into account, which resulted in a disproportionate number of takedowns over a commonly used word. Shaheed, the board said earlier this year, “accounts for more content removals under the Community Standards than any other single word or phrase,” across the company’s apps.

In its latest update, Meta said that it had tested a new approach to moderating the word following a recommendation from the board. “Initial results from our assessment indicate that continuing to remove content when ‘Shaheed’ is paired with otherwise violating content – or when the three signals of violence outlined by the Board are present – captures the most potentially harmful content without disproportionality impacting voice,” the company wrote.

The change should have a significant impact on Meta’s Arabic-speaking users, who, according to the board, have been unfairly censored as a result of the policy. “The Oversight Board welcomes Meta’s announcement today that it will implement the Board’s recommendations and introduce significant changes to an unfair policy that led to the censoring of millions of people across its platforms,” the board said in a statement. “The policy changes on how to moderate the Arabic word ‘shaheed’ should have a swift impact on when content is removed, with a more nuanced approach ending a blanket ban on a term that Meta has acknowledged is one of the most over-enforced on its platforms.”

This article originally appeared on Engadget at https://www.engadget.com/meta-is-changing-its-policy-for-the-most-moderated-word-on-its-platforms-185016272.html?src=rss

Microsoft reveals further emails compromised by Russian hack

An attack on Microsoft by Russian hackers had wider implications than initially reported. The tech giant is notifying additional individuals that emails between them and Microsoft were accessed, Bloomberg reports. A group known as Midnight Blizzard or Nobelium orchestrated this attack, along with the 2020 SolarWinds hack. The US government has previously linked Midnight Blizzard to the Russian Foreign Intelligence Service.

Microsoft previously informed some individuals that their emails were viewed, but the company is now sharing specifics. "This week we are continuing notifications to customers who corresponded with Microsoft corporate email accounts that were exfiltrated by the Midnight Blizzard threat actor, and we are providing the customers the email correspondence that was accessed by this actor," a Microsoft spokesperson stated. "This is increased detail for customers who have already been notified and also includes new notifications." Microsoft is making customers aware via email, which initially led to concerns that the notification was a phishing scam.

Microsoft first disclosed the hack in January, stating that a password spray attack gained the group access to "a very small percentage of Microsoft corporate email accounts" in late 2023. Employees with compromised emails included members of the senior leadership, cybersecurity and legal teams.
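A password spray attack tries a small number of common passwords against many accounts, rather than many passwords against one account, which helps it stay under per-account lockout limits. As a rough illustration of how a defender might look for that pattern in failed-login records, here is a minimal Python sketch. It is not a description of how Microsoft detected or investigated this incident. The record layout, the function name and both thresholds are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class FailedLogin(NamedTuple):
    """One failed sign-in attempt (hypothetical log record layout)."""
    source_ip: str
    username: str


def find_spray_sources(
    events: Iterable[FailedLogin],
    min_accounts: int = 5,              # assumed threshold, tune per environment
    max_attempts_per_account: int = 3,  # assumed threshold, tune per environment
) -> List[str]:
    """Flag sources that fail against many accounts with few tries on each."""
    attempts = defaultdict(lambda: defaultdict(int))
    for event in events:
        attempts[event.source_ip][event.username] += 1

    flagged = []
    for source_ip, per_account in attempts.items():
        many_accounts = len(per_account) >= min_accounts
        few_tries_each = max(per_account.values()) <= max_attempts_per_account
        if many_accounts and few_tries_each:
            flagged.append(source_ip)
    return sorted(flagged)


if __name__ == "__main__":
    # Addresses come from the reserved documentation ranges; data is made up.
    spray = [FailedLogin("203.0.113.7", f"user{i}") for i in range(6)]
    brute_force = [FailedLogin("198.51.100.2", "alice")] * 10
    print(find_spray_sources(spray + brute_force))
```

In the sample data, the first source touches six accounts once each and is flagged, while the second hammers a single account and is left to ordinary lockout rules.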

At the time, Microsoft said vulnerabilities in its systems were not to blame for the attack but that it would be improving security. However, the US government has brought the heat against Microsoft, with a March report from the Cyber Safety Review Board finding the company's "security culture was inadequate and requires an overhaul." In April, the US Cybersecurity and Infrastructure Security Agency (CISA) issued an order requiring federal agencies to analyze hacked emails and secure Microsoft cloud accounts, among other measures. CISA notified all impacted agencies and required them to provide regular updates on the steps taken to thwart this "grave and unacceptable risk."

This article originally appeared on Engadget at https://www.engadget.com/microsoft-reveals-further-emails-compromised-by-russian-hack-130014275.html?src=rss

Meta’s Oversight Board made just 53 decisions in 2023

The Oversight Board has published its latest annual report looking at its influence on Meta and ability to shift the policies that govern Facebook and Instagram. The board says that in 2023 it received 398,597 appeals, the vast majority of which came from Facebook users. But it took on only a tiny fraction of those cases, issuing a total of 53 decisions.

The board suggests, however, that the cases it selects can have an outsize impact on Meta’s users. For example, it credits its work for influencing improvements to Meta’s strike system and the “account status” feature that helps users check if their posts have violated any of the company’s rules.

Sussing out the board’s overall influence, though, is more complicated. The group says that between January of 2021 and May of 2024, it has sent a total of 266 recommendations to Meta. Of those, the company has fully or partially implemented 75, and reported “progress” on 81. The rest have been declined, “omitted or reframed,” or else Meta has claimed some level of implementation but hasn’t offered proof to the board. (There are five recommendations currently awaiting a response.) Those numbers raise some questions about how much Meta is willing to change in response to the board it created.
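The arithmetic behind those figures is easy to check. The short Python sketch below uses only the numbers reported above and works out how many recommendations fall into "the rest," along with the share of appeals that ended in a decision. The variable names are our own, and the grouping of "the rest" into a single bucket is a simplification.

```python
# Figures as reported in the Oversight Board's annual report, per the article.
appeals_received = 398_597
decisions_issued = 53

recommendations_total = 266
implemented = 75         # fully or partially implemented
progress_reported = 81   # Meta reported "progress"
awaiting_response = 5

# "The rest": declined, omitted or reframed, or claimed without proof.
remaining = (
    recommendations_total - implemented - progress_reported - awaiting_response
)


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


if __name__ == "__main__":
    print(f"Remaining recommendations: {remaining}")  # 105
    print(f"Implemented share: {share(implemented, recommendations_total):.1f}%")
    print(f"Decisions per appeal: {share(decisions_issued, appeals_received):.4f}%")
```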

Figure: The Oversight Board's tally of how Meta has responded to its recommendations. (Credit: Oversight Board)

Notably, the report has no criticism for Meta and offers no analysis of Meta’s efforts (or lack thereof) to comply with its recommendations. The report calls out a case in which it recommended that Meta suspend the former prime minister of Cambodia for six months, noting that it overturned the company’s decision to leave up a video that could have incited violence. But the report makes no mention of the fact that Meta declined to suspend the former prime minister’s account and declined to further clarify its rules for public figures.

The report also hints at thorny topics the board may take on in the coming months. It mentions that it wants to look at content “demotion,” or what some Facebook and Instagram users may call “shadowbans” (the term is a loaded one for Meta, which has repeatedly denied that its algorithms intentionally punish users for no reason). “One area we are interested in exploring is demoted content, where a platform limits a post’s visibility without telling the user,” the Oversight Board writes.

For now, it’s not clear exactly how the group could tackle the issue. The board’s purview currently allows it to weigh in on specific pieces of content that Meta has removed or left up after a user appeal. But it’s possible the board could find another way into the issue. A spokesperson for the Oversight Board notes that the group expressed concern about demoted content in its opinion on content related to the Israel-Hamas war. “This is something the board would like to further explore as Meta’s decisions around demotion are pretty opaque,” the spokesperson said.

This article originally appeared on Engadget at https://www.engadget.com/metas-oversight-board-made-just-53-decisions-in-2023-100017750.html?src=rss

Reddit puts AI scrapers on notice

Reddit has a warning for AI companies and other scrapers: play by our rules or get blocked. The company said it plans to update its Robots Exclusion Protocol file (robots.txt), which tells automated crawlers what they may and may not scrape on its platform.

The company said it will also continue to block and rate-limit crawlers and other bots that don’t have a prior agreement with the company. The changes, it said, shouldn’t affect “good faith actors,” like the Internet Archive and researchers.

Reddit’s notice comes shortly after multiple reports that Perplexity and other AI companies regularly bypass websites’ robots.txt protocol, which is used by publishers to tell web crawlers they don’t want their content accessed. Perplexity’s CEO, in a recent interview with Fast Company, said that the protocol is “not a legal framework.”
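To make the mechanism concrete, here is a minimal Python sketch of how a well-behaved crawler consults a robots.txt file before fetching a page, using the standard library's urllib.robotparser module. The robots.txt content, the bot names and the URLs are hypothetical and are not Reddit's actual rules. The check is voluntary: nothing in the protocol stops a crawler that skips it, which is the behavior the reports describe.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is barred entirely, everyone else
# is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        for url in ("https://example.com/r/news", "https://example.com/private/x"):
            verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "disallowed"
            print(f"{agent} -> {url}: {verdict}")
```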

In a statement, a Reddit spokesperson told Engadget that it wasn’t targeting a particular company. “This update isn’t meant to single any one entity out; it’s meant to protect Reddit while keeping the internet open,” the spokesperson said. “In the next few weeks, we’ll be updating our robots.txt instructions to be as clear as possible: if you are using an automated agent to access Reddit, regardless of what type of company you are, you need to abide by our terms and policies, and you need to talk to us. We believe in the open internet, but we do not believe in the misuse of public content.”

It’s not the first time the company has taken a hard line when it comes to data access. The company cited AI companies’ use of its platform when it began charging for its API last year. Since then, it has struck licensing deals with some AI companies, including Google and OpenAI. The agreements allow AI firms to train their models on Reddit’s archive and have been a significant source of revenue for the newly-public Reddit. The “talk to us” part of that statement is likely a not-so-subtle reminder that the company is no longer in the business of handing out its content for free.

This article originally appeared on Engadget at https://www.engadget.com/reddit-puts-ai-scrapers-on-notice-205734539.html?src=rss

OpenAI will block people in China from using its services

OpenAI plans to block people from using ChatGPT in China, a country where its services aren’t officially available, but where users and developers access it via the company’s API anyway. Securities Times, a Chinese state-owned newspaper, reported on Tuesday that OpenAI had started sending emails to users in China outlining its plans to block access starting July 9, according to Reuters.

“We are taking additional steps to block API traffic from regions where we do not support access to OpenAI’s services,” an OpenAI spokesperson told the publication. The move could impact several Chinese startups which have built applications using OpenAI’s large language models.

Although OpenAI’s services are available in more than 160 countries, China isn’t one of them. According to the company’s guidelines, users trying to access the company’s products in unsupported countries could be blocked or suspended — although the company hasn’t explicitly done so until now.

It’s not clear what prompted OpenAI’s move. Last month, the company revealed that it stopped covert influence operations — including one that originated from China — that used its AI models to spread disinformation across the internet. Bloomberg pointed out that OpenAI’s move coincides with Washington's pressure on American tech companies to limit China’s access to cutting-edge technologies developed in the US.

This article originally appeared on Engadget at https://www.engadget.com/openai-will-block-people-in-china-from-using-its-services-200801957.html?src=rss

EU finds Microsoft violated antitrust laws by bundling Teams

It has been nearly a year since the European Commission opened its investigation into Microsoft and there's finally a preliminary finding. The European Union's executive body announced its "view" that the tech giant violated antitrust laws by tying Microsoft Teams to its Office 365 and Microsoft 365 business suites. Last October, Microsoft unbundled Teams for users in the European Union and Switzerland, but the European Commission's Statement of Objections calls it "insufficient."

The European Commission used its statement to detail its concern "that Microsoft may have granted Teams a distribution advantage by not giving customers the choice whether or not to acquire access to Teams when they subscribe to their SaaS productivity applications. This advantage may have been further exacerbated by interoperability limitations between Teams' competitors and Microsoft's offerings. The conduct may have prevented Teams' rivals from competing, and in turn innovating, to the detriment of customers in the European Economic Area."

Microsoft faces a fine of up to 10 percent of its annual worldwide turnover if the EU confirms its preliminary findings, so it's no surprise the company is being cordial. "Having unbundled Teams and taken initial interoperability steps, we appreciate the additional clarity provided today and will work to find solutions to address the Commission's remaining concerns," said Brad Smith, Vice Chair and President of Microsoft, in a statement shared with Engadget.

This ordeal began in 2020 when Slack, now owned by Salesforce, filed an antitrust complaint against Microsoft, claiming it broke the EU's competition rules by bundling Teams with its suites. In April 2023, Microsoft declared its intention to offer Teams on its own (albeit without a clear plan), but the European Commission still formally opened an investigation just three months later. Following October's unbundling, Microsoft announced this past April that Teams would be available separately from Microsoft 365 and Office 365 to customers worldwide, and that current users could also switch plans.

The European Commission's Statement of Objections also mentions a complaint from Alfaview, a video-conferencing software maker that filed a grievance similar to Slack's in July 2023, and notes that the Commission has open proceedings based on that complaint.

This article originally appeared on Engadget at https://www.engadget.com/eu-finds-microsoft-violated-antirust-laws-by-bundling-teams-121520916.html?src=rss

Amazon reportedly thinks people will pay up to $10 per month for next-gen Alexa

We've known for a while that Amazon is planning to soup up Alexa with generative AI features. While the company says it has been integrating the technology into various aspects of the voice assistant, it's also working on a more advanced version of Alexa that it plans to charge users to access. Amazon has reportedly dubbed the higher tier "Remarkable Alexa" (let's hope it doesn't stick with that name for the public rollout).

According to Reuters, Amazon is still determining pricing and a release date for Remarkable Alexa, but it has mooted a fee of between roughly $5 and $10 per month for consumers to use it. Amazon is also said to have been urging its workers to have Remarkable Alexa ready by August, perhaps so it's able to discuss the details at its usual fall Alexa and devices event.

This will mark the first major revamp of Alexa since Amazon debuted the voice assistant alongside Echo speakers a decade ago. The company is now in a position where it's trying to catch up with the likes of ChatGPT and Google Gemini. Amazon CEO Andy Jassy, who pledged that the company was working on a "more intelligent and capable Alexa" in an April letter to shareholders, has reportedly taken a personal interest in the overhaul. Jassy noted last August that every Amazon division had generative AI projects in the pipeline.

"We have already integrated generative AI into different components of Alexa, and are working hard on implementation at scale — in the over half a billion ambient, Alexa-enabled devices already in homes around the world — to enable even more proactive, personal, and trusted assistance for our customers," an Amazon spokeswoman told Reuters. However, the company has yet to deploy the more natural-sounding and conversational version of Alexa it showed off last September.

Remarkable Alexa is said to be capable of handling complex prompts, such as composing and sending an email and ordering dinner, all from a single command. Deeper personalization is another aspect, while Amazon reportedly expects that consumers will use it for shopping advice, as with its Rufus assistant.

Upgraded home automation capability is said to be a priority too. According to the report, Remarkable Alexa may be able to gain a deeper understanding of user preferences, so it might learn to turn on the TV to a specific show. It may also learn to turn on the coffee machine when your alarm clock goes off (though it's already very easy to set this up through existing smart home systems).

Alexa has long been an unprofitable endeavor for Amazon — late last year, it laid off several hundred people who were working on the voice assistant. It's not a huge surprise that the company would try to generate more revenue from Remarkable Alexa (which, it's claimed, won't be offered as a Prime benefit). Users might need to buy new devices with more powerful tech inside so that Remarkable Alexa can run on them properly.

In any case, $10 (or even $5) per month for an upgraded voice assistant seems like a hard sell, especially when the current free version of Alexa can already handle a wide array of tasks. 

This article originally appeared on Engadget at https://www.engadget.com/amazon-reportedly-thinks-people-will-pay-up-to-10-per-month-for-next-gen-alexa-152205672.html?src=rss