Published on: Jan 12, 2023
As businesses and organizations continue to look for new ways to gain insights and make decisions, web scraping and alternative data grow in popularity. According to experts at Oxylabs, a premium proxy and public web data acquisition solution provider, this year's spotlight will be on optimizing generative AI, cybersecurity, and ML, as well as on expanding web data applications.
Increase in generative AI and cybersecurity significance
Tomas Montvilas, Chief Commercial Officer at Oxylabs, anticipates that generative AI, cybersecurity practices, and alternative data collection can help businesses stay competitive during the upcoming recession.
"As many economies are predicted to face recession, investment decisions will need to be tighter and less risky. Collecting alternative financial data will be key in getting the necessary signals to make these predictions."
Furthermore, according to Tomas, generative AI is increasingly used in business applications, and training these models requires massive data sets from the public web.
Another central area Tomas foresees in the web scraping sector is cybersecurity. Cyber attacks are becoming increasingly sophisticated, necessitating new proactive protection techniques such as threat monitoring and hunting.
"Cybersecurity practices continue to be adopted by more organizations. Hence, use cases like proactive threat hunting and monitoring that require continuous large-scale web monitoring will become more common," added Tomas.
Spotlight on personal data scraping
During 2022, the scraping industry could breathe a sigh of relief as at least one enduring issue was put to rest. Companies combating scrapers can no longer use the Computer Fraud and Abuse Act (CFAA) to stop scraping public-facing data.
Denas Grybauskas, the Head of Legal at Oxylabs, says that "we can expect that during 2023 other legal grounds and arguments will be tried and become more popular in the courts against data scraping companies, such as infringement of terms of service, intellectual property protection, etc."
As 2022 ended with quite a few stories of personal data scraping and data breaches (Clearview fines in Europe, Meta database leak that affected more than 500M users, Meta's GDPR fines, etc.), according to Denas, we can expect more spotlight on personal data scraping from regulators and authorities.
"Finally, 2023 might be the year when the scraping and data collection industry will begin self-regulation initiatives," said Denas.
Intensive scale of web data applications
Gediminas Rickevičius, the VP of Global Partnerships at Oxylabs, is confident that, with evolving AI capabilities, the importance and scale of web data applications in commerce will continue to grow next year, just as they did in 2022.
Gediminas also predicts further parallel evolution of web scraping and blocking systems, which means a greater need for resources and know-how.
"Therefore, I suggest leaving web scraping in the expert's hands. Although the cost of commercial scraping will increase, doing it yourself will be even more expensive than with professionals' help," predicted Gediminas.
Focus on machine learning
According to Julius Černiauskas, the CEO at Oxylabs, more machine learning models will be deployed in the field. "Although there have been many ML failures in the past, I believe the tide is turning for ML engineering teams due to a combination of greater attention on data quality and economic pressure to make ML more useful," said Julius.
Data scientists were formerly expected to work on a wide variety of data projects. However, as the next generation of data technology becomes more widely used, data scientists will devote their talents to more complex projects using hand-crafted prediction models.
"Many businesses will have difficulties in the next year. As a result, IT companies must discover methods to save expenses, make the greatest use of data scientists' time and talents, and incorporate machine learning and predictive modeling capabilities into teams that directly influence revenue and profitability.
"As a result, I expect that businesses will maximize data science resources by augmenting skilled data scientists inside the company with data technologies that automate regular portions of data science work using reliable approaches," Julius anticipated.