Business

Web Scraping in 2022 & Beyond

Source: Finance Derivative

Web scraping has been coming into the limelight in recent years due to the rising interest in data. Businesses across the globe have been eyeing automated data collection as a way to enhance their profitability and overall decision making.

We’ve sat down with the Lead of Commercial Product Owners at Oxylabs.io, Nedas Višniauskas, to talk about the future of web scraping. Few people have been as deeply involved with the industry as Nedas, which has allowed him to gain a unique perspective on how it has developed and how it will continue to do so.

What do you think has been the biggest change in web scraping over the last decade? How has Oxylabs participated in these changes?

There have been some interesting changes during the past few years. One of them, I think, has been the proliferation of increasingly sophisticated anti-bot systems. Scraping such websites at scale, in turn, becomes more difficult.

Scraping enthusiasts, of course, have their own answer to these issues, which is to develop dedicated data collection tools. These, while limiting the field of use, can bypass the anti-bot systems and they are constantly being updated for that purpose.

Another important change has been the rising popularity of JavaScript. More and more websites are using it to load critically important data dynamically, which means it’s essentially unreachable without browsers.

Headless ones, therefore, are a necessity. At the same time, that means infrastructure costs are rising as headless browsers take up much more computing power and traffic than simple HTTP requests.
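To illustrate the point about dynamically loaded data, here is a minimal sketch in Python. The page, the `price` element and the value are all hypothetical. It parses only the HTML a plain HTTP request would return, and the element that JavaScript fills in later comes back empty.

```python
from html.parser import HTMLParser

# Hypothetical page: the price element is empty in the served HTML and is
# only filled in by JavaScript once a browser executes the script.
SERVED_HTML = """
<html><body>
  <div id="price"></div>
  <script>
    document.getElementById("price").textContent = "19.99";
  </script>
</body></html>
"""


class ElementText(HTMLParser):
    """Collect the text found inside the element with the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        # Simplification: the target element has no nested tags here.
        self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def static_text(html, element_id):
    """Return the text a non-browser client would see in the element."""
    parser = ElementText(element_id)
    parser.feed(html)
    return "".join(parser.chunks).strip()


print(repr(static_text(SERVED_HTML, "price")))  # prints ''
```

A headless browser would run the script first and then read the element, which is where the extra computing power and traffic come from.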

Finally, ethics have been in the limelight. For example, residential proxy providers are looking for ways to inform and reward participants in the network. We ourselves took charge of building the framework for ethical acquisition, which, I believe, has played a part in the fact that there are fewer shady practices and more clarity among all industry participants.

To answer the second question, Oxylabs have reacted to these changes with the development of Scraper APIs. We created both dedicated and universal scrapers that can acquire publicly available data from nearly any website without issue. Additionally, all of our proxies are ethically sourced, giving our partners the much-needed peace of mind when engaging in scraping.

Have you seen or noticed any particular trends in data acquisition or web scraping? Are specific data types becoming popular?

Off the cuff I’d say that the use of ecommerce and delivery data has been booming since the pandemic hit. Businesses want to (legally) spy on competitors and gain access to as much data as possible. Data types like pricing, products or delivery times are important to any competitor.

But these have always been important. Maybe I would say that external data in general has risen in importance. Outside of that, I don’t think there have been any particular trends in data types. There have been, however, changes in the entire supply chain. As I’ve mentioned, businesses only really need the data. Even then, the data is not the key – insights are.

As such, businesses at the tail-end of the chain have proliferated in recent years. Data-as-a-service aggregators, ones that collect information and sell sets of it, have been rising in popularity.

There are also some businesses that provide insights directly. While these are still few and far between, some of them have unique value propositions that I could see as worthwhile. Jungle Scout, for example, is a service that both scrapes external data and has large datasets from internal sources. As such, they can provide insights other businesses can’t.

What do you think are the biggest challenges the industry is facing currently? Are there any innovative solutions to these or other challenges on the horizon?

Bot protection has always been the greatest challenge. Scraping, you see, is a cat-and-mouse game. Websites attempt to implement anti-bot measures, such as the well-known CAPTCHA, while scraping companies attempt to continue evading them to retain access to data.

There have been great strides made in bot protection. TLS (Transport Layer Security) fingerprinting has been one such improvement. Sophisticated websites can fingerprint the initial network handshake and compare it with the request headers. As many scraping tools manually modify the headers they send, the TLS fingerprint often fails to match them, which is a dead giveaway.
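The kind of consistency check described here could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the fingerprint strings, the client family names and the function names are placeholders, not values or APIs from any real anti-bot product.

```python
# Hypothetical mapping from a TLS handshake fingerprint to the client family
# that normally produces it. Real systems derive such fingerprints from the
# handshake itself; these strings are placeholders.
KNOWN_FINGERPRINTS = {
    "fp-browser-a": "BrowserA",
    "fp-browser-b": "BrowserB",
    "fp-http-library": "HttpLibrary",
}


def claimed_family(user_agent):
    """Return the client family the User-Agent header claims to be."""
    for family in ("BrowserA", "BrowserB", "HttpLibrary"):
        if family.lower() in user_agent.lower():
            return family
    return None


def looks_inconsistent(tls_fingerprint, user_agent):
    """True when the handshake and the header point to different clients."""
    observed = KNOWN_FINGERPRINTS.get(tls_fingerprint)
    claimed = claimed_family(user_agent)
    if observed is None or claimed is None:
        return False  # unknown on either side: no verdict in this sketch
    return observed != claimed
```

A request whose handshake looks like an HTTP library while its header claims to be a browser would be flagged, which is the mismatch scraping tools run into when they only edit headers.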

On the other hand, the deck is always slightly stacked in the favor of scraping. Most anti-bot protection features put a dent in the overall user experience. Filling in a CAPTCHA is something that detracts from that frictionless experience of the modern web we’re used to.

Some businesses use these techniques and see no issue. Others, ones highly concerned with delivering the best user experience possible, avoid using CAPTCHAs unless absolutely necessary. It’s always a tradeoff. More bot protection equals, almost always, worse UX, which leads to less revenue. But then fewer people are scraping your website.

Additionally, new pages with interesting data and content appear all the time. And you don’t start building a website from bot protection. It has to be functional first. So, for a long time, scraping a new website is a lot easier than it could be.

Would you say that there are potential benefits in web scraping for academic research or policy-making? If so, why hasn’t the scientific or political community adopted the practice?

Academic research, quantitative in particular, is in large part based on data that doesn’t exist on the internet, yet. There could be studies, however, on internet behavior or something of the like where scraping could be immensely useful. Additionally, I think we’re not seeing such widespread adoption due to the previously mentioned barrier to entry.

Let’s imagine that there’s no previous scraping experience in some particular university. The researcher would have to build everything from the ground up, get all the deep knowledge, and the funding required just to start acquiring the data.

It doesn’t help that the research areas that benefit the most from scraping (like sociology, economics, psychology, etc.) are far removed from coding, development, and IT in general. I think it’s more of an unfortunate, but temporary, circumstance, because web scraping providers will be able to reduce the barrier by a significant margin in the future.

When it comes to policy-making, I’m not so sure. I think that rather than making, it should be about enforcing. Governments are definitely knee-deep in web scraping for all kinds of security purposes. Businesses, on the other hand, have been using the same processes to protect themselves from counterfeits and copyright infringement. There’s an entire business vertical dedicated explicitly to brand protection.


Business

Why email marketing remains one of the best forms of digital marketing

Crafting a strong email marketing strategy involves a real balance between creativity and making data-driven decisions, which is just one of the roles undertaken by marketing and data company Go Live Data on behalf of its many clients.

Guiding some of the biggest corporates in the UK including Amazon Business, AXA and Premierline Business Insurance, Adam Herbert, CEO of Go Live Data, advises on the key components to a successful email campaign and why, as one of the most effective marketing tools available, email still plays a crucial role in digital marketing:

Forming a direct means of communication, email provides two-way access between businesses and their customers. And it may sound obvious to say, but unlike social media or other digital channels, every email allows marketers to reach their audience straight in their inbox, and this is where individuals are most likely to engage with the content they’re being shown.

Offering a high return on investment, email consistently delivers one of the highest ROIs compared to other forms of digital marketing such as PPC and advertising. According to studies, the average is around £40 for every £1 spent, which is huge, and is due to the low cost of email and its ability to drive conversions and retain customers.

What’s more, with email segmentation and many personalisation techniques available, marketers can tailor their messages to specific groups of their audience, based on demographics, behaviours, interests, and purchase history, making them not only very targeted, but personalised too. The key is to deliver relevant content to subscribers, which means marketers can increase engagement and conversions, as well as customer satisfaction.

Specific platforms allow for automation, giving marketers the ability to set up automated workflows triggered by user actions. This means marketers can deliver timely and relevant messages at scale and nurture leads, an effective way to guide customers efficiently through the sales funnel.

Emails are also an excellent way to build customer relationships by nurturing them over time. By consistently delivering valuable content, exclusive offers, and personalised recommendations, businesses can strengthen the ‘bond’ with their audiences and increase brand loyalty. Email provides a means of two-way communication, which allows customers to send in their feedback, to ask any questions they may have and to engage with a brand directly.

They are also a great way to drive traffic to your website, blog and social media, or any other digital channels connected to your business. By including attractive or compelling calls-to-action (CTAs) and relevant content, you can encourage subscribers to take action such as making a purchase, signing up for a webinar, or downloading a resource, which in turn will drive conversions and revenue for your business.

Email platforms offer substantial analytics and reporting functions that enable marketers to track the performance of their campaigns in real-time. Monitoring of key metrics such as open rates, click-through rates, conversion rates, and revenue generated, allows marketers to measure the effectiveness of their campaigns and of course make data-driven decisions to optimise and plan future activities.
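As a rough illustration of the metrics named above, the Python sketch below computes them from raw campaign counts. The function name, its parameters and the choice to divide every rate by delivered emails are assumptions made for this example; platforms differ, and some divide clicks by opens instead.

```python
def campaign_metrics(delivered, opens, clicks, conversions, revenue, cost):
    """Compute common campaign ratios from raw counts.

    Assumption: every rate is taken relative to delivered emails.
    """
    if delivered <= 0 or cost <= 0:
        raise ValueError("delivered and cost must be positive")
    return {
        "open_rate": opens / delivered,
        "click_through_rate": clicks / delivered,
        "conversion_rate": conversions / delivered,
        # Revenue earned for each pound spent on the campaign.
        "revenue_per_pound_spent": revenue / cost,
    }


# Hypothetical campaign: 1,000 delivered emails costing £10 in total.
metrics = campaign_metrics(
    delivered=1000, opens=250, clicks=50, conversions=10,
    revenue=400.0, cost=10.0,
)
print(metrics["revenue_per_pound_spent"])  # prints 40.0
```

With these made-up numbers the campaign returns £40 for every £1 spent, the same order as the average quoted earlier in the article.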

Overall, emails are an integral component of digital marketing and by leveraging email effectively, businesses can engage their audience, nurture leads, drive sales, and ultimately grow their businesses.

Business

Conflicting with compliance: How the finance sector is struggling to implement GenAI

By James Sherlow, Systems Engineering Director, EMEA, for Cequence Security

Generative AI has multiple applications in the finance sector from product development to customer relations to marketing and sales. In fact, McKinsey estimates that GenAI has the potential to improve operating profits in the finance sector by between 9% and 15% and in the banking sector, productivity gains could be between 3% and 5% of annual revenues. It suggests AI tools could be used to boost customer liaison with AI integrated through APIs to give real-time recommendations either autonomously or via CSRs, to inform decision making and expedite day-to-day tasks for employees, and to decrease risk by monitoring for fraud or elevated instances of risk.

However, McKinsey also warns of inhibitors to adoption in the sector. These include the level of regulation applicable to different processes, which is fairly low with respect to customer relations but high for credit risk scoring, for example, and the data used, some of which is in the public domain but some of which comprises personally identifiable information (PII) which is highly sensitive. If these issues can be overcome, the analyst estimates GenAI could more than double the application of expertise to decision making, planning and creative tasks from 25% without it to 56%.

Hamstrung by regulations

Clearly the business use cases are there but unlike other sectors, finance is currently being hamstrung by regulations that have yet to catch up with the AI revolution. Unlike the EU, which approved the AI Act in March, the UK has no plans to regulate the technology. Instead, it intends to promote guidelines. The UK Financial Authorities comprising the Bank of England, PRA, and FCA have been canvassing the market on what these should look like since October 2022, publishing the results (FS2/23 – AI and Machine Learning) a year later, which showed a strong demand for harmonisation with the likes of the AI Act as well as NIST’s AI Risk Management Framework.

Right now, this means financial providers find themselves in regulatory limbo. If we look at cyber security, for instance, firms are being presented with GenAI-enabled solutions that can assist them with incident detection and response but they’re not able to utilise that functionality because it contravenes compliance requirements. Decision-making processes are a key example as these must be made by a human, tracked and audited and, while the decision-making capabilities of GenAI may be on a par, accountability remains a grey area. Consequently, many firms are erring on the side of caution and are choosing to deactivate AI functionality within their security solutions.

In fact, a recent EY report found one in five financial services leaders did not think their organisation was well-positioned to take advantage of the potential benefits. Much will depend on how easily the technology can be integrated into existing frameworks, although the Banking on AI: Financial Services Harnesses Generative AI for Security and Service report cautions this may take three to five years. That’s a long time in the world of GenAI, which has already come a long way since it burst on to the market 18 months ago.

Malicious AI

The danger is that while the sector drags its heels, threat actors will show no such qualms and will be quick to capitalise on the technology to launch attacks. FS2/23 makes the point that GenAI could see an increase in money laundering and fraud through the use of deep fakes, for instance, and sophisticated phishing campaigns. We’re still in the learning phase but as the months tick by the expectation is that we will see high-volume self-learning attacks by the end of the year. These will be on an unprecedented scale because GenAI will lower the technological barrier to entry, enabling new threat actors to enter the fray.

Simply blocking attacks will no longer be a sufficient form of defence because GenAI will quickly regroup or pivot the attack automatically without the need to employ additional resource. If we look at how APIs, which are intrinsic to customer services and open banking for instance, are currently protected, the emphasis has been on detection and blocking but going forward we can expect deceptive response to play a far greater role. This frustrates and exhausts the resources of the attacker, making the attacks cost-prohibitive to sustain.

So how should the sector look to embrace AI given the current state of regulatory flux? As with any digital transformation project, there needs to be oversight of how AI will be used within the business, with a working group tasked to develop an AI framework. In addition to NIST, there are a number of security standards that can help here such as ISO 22989, ISO 23053, ISO 23894 and ISO 42001 and the oversight framework set out in DORA (Digital Operational Resilience Act) for third party providers. The framework should encompass the tools the firm has with AI functionality, their possible application in terms of use cases, and the risks associated with these, as well as how it will mitigate any areas of high risk.

Taking a proactive approach makes far more sense than suspending the use of AI which effectively places firms at the mercy of adversaries who will be quick to take advantage of the technology. These are tumultuous times and we can certainly expect AI to rewrite the rulebook when it comes to attack and defence. But firms must get to grips with how they can integrate the technology rather than electing to switch it off and continue as usual.

Business

Recognising the value of protecting intellectual property early builds strong foundation for innovators

Innovation Manager at InnoScot Health, Fiona Schaefer analyses an essential facet of developing ideas into innovations

Helping the NHS to innovate remains a key priority during this period of recovery and reform. Even within the current cash-strapped climate, there is the opportunity to maximise the first-hand experience of the healthcare workforce and its knowledge of where new ideas are needed most.

Entrepreneurial-minded, creative staff from any discipline or activity are often best placed to recognise areas for improvement – the reason why a significant number of solutions come from, and are best developed with, health and social care staff.

NHS Scotland is a powerful driver of innovation, but to truly harness the opportunities which new ideas offer for development and commercialisation, the knowledge and intellectual property (IP) underpinning them needs to be protected. That vital know-how and other intangible assets – holding appropriate contracts for example – are key from an early stage.

Medical devices can take years to develop and gain regulatory approval, so from the outset of an idea’s development – and before revenue is generated – filing for IP protection and having confidentiality agreements in place are ways to start creating valuable assets. This is especially important when applying for patent protection because that option is only available when ideas have not been discussed or presented to external parties prior to application.

Without taking that critical initial step to protect IP, anyone – without your permission – could copy the idea, so anything of worth should be protected as soon as possible, making for a clear competitive advantage and ownership in the same sense as possessing physical property.

The common theme is that to be successful – and ultimately support the commercialisation of ideas that will improve patient care and outcomes – the idea must be novel, better, quicker, or more efficient than existing options. Furthermore, to turn it into a sound proposition worth investing in, it must also be technically and financially feasible. It isn’t enough to just be new and novel – the best innovations offer tangible benefits to patient outcomes and staff working practices.

Of course, even more so in the current climate of financial constraints, the key question of ‘Who will pay for your new product or service?’ needs to be considered up front as well.

Whilst development of a strong IP portfolio requires investment and dedicated expertise, when done well and at the appropriate time, then it is resource well spent, offering a level of security whilst developing an asset which can be built upon and traded. There are various ways commercialisation can progress and whilst not all efforts will be successful, intellectual property is an asset which can be licensed or sold to others offering a range of opportunities to secure a good return.

In my experience, however, many organisations including the NHS are still missing the opportunity to recognise and protect their knowledge assets and intellectual property early in the innovation pathway. This is partly due to lack of understanding – sometimes one aspect is carefully protected, whilst another is entirely neglected. In other cases, the desire to accelerate to the next stage of product development means such important foundational steps are not given the attention required for long-term success.

Good IP management, however, goes beyond formally protecting the knowledge assets associated with a project, e.g. by patenting or design registration. When considered with other intangible assets such as access to datasets, clinical trial results, standard operating procedures, quality management systems, and regulatory approvals, it is the combination which will be key to success.

Early securing of IP protection or recognition of IP rights in a collaboration agreement demonstrates foresight and business acumen. Later on, it can significantly boost negotiating power with a licensing partner or build investor confidence.

Conversely, omissions in IP protection or suitable contracts can be damaging, potentially derailing years of product development and exposing organisations to legal challenges and other risks. Failing to protect a promising idea can also mean commercial opportunities are missed, thus leading to your IP being undervalued.

Ideas are evaluated by formal NHS Scotland partner InnoScot Health in the same way whether they are big or small, a product, service, or new, innovative approach to a care pathway.

We encourage and enable all 160,000 NHS Scotland staff, regardless of role or location, to come forward with their ideas, giving them the advice and support they need to maximise their potential benefits.

Protecting the IP rights of the health service is one of the cornerstones of InnoScot Health’s service offering. In fact, to date we have protected over 255 NHS Scotland innovations. Recently these have included design registration and trademarks for the SARUS® hood and trademarks for SCRAM®, building and protecting a recognised range of bags with innovative, intuitive layouts. Spin outs such as Aurum Biosciences meanwhile have patents underpinning their novel therapeutics and diagnostics.

We assist in managing this IP to ensure a return on investment for the health service. Any revenue generated from commercialising ideas and innovations from healthcare professionals is shared with the innovators and the health board through our agreements with them and the revenue sharing scheme detailed in health board IP and innovation policies.

Fundamentally, we believe that it is vital to harness the value of expertise and creativity of staff with a well-considered approach to protecting IP and knowledge input to projects from the start.

Copyright © 2021 Futures Parity.