DOI: 10.1145/3589334.3645510
research-article
Open access

Hyperlink Hijacking: Exploiting Erroneous URL Links to Phantom Domains

Published: 13 May 2024
Abstract

    Web users often follow hyperlinks hastily, expecting them to be correctly programmed. However, those links may contain typos or other mistakes. By discovering active but erroneous hyperlinks, a malicious actor can spoof a website or service, impersonating the expected content and phishing private information. In 'typosquatting,' misspellings of common domains are registered to exploit errors made when users mistype a web address. Yet no prior research has examined situations where the linking errors of web publishers (i.e., developers and content contributors) propagate to users. We hypothesize that these 'hijackable hyperlinks' exist in large quantities, with the potential to generate substantial traffic. Analyzing large-scale crawls of the web using high-performance computing, we show that the web currently contains active links to more than 572,000 dot-com domains that have never been registered, which we term 'phantom domains.' After registering 51 of these, we observed 88% of them exceeding the traffic of a control domain, with up to 10 times more visits. Our analysis shows that these links exist due to 17 common publisher error modes, and that the phantom domains they point to are free for anyone to purchase and exploit for under $20, a low barrier to entry for potential attackers.
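The core measurement described above, finding hyperlinks whose target .com domain does not exist, can be illustrated with a short sketch. This is not the authors' pipeline: it assumes the set of registered .com names is already loaded in memory (for example from a .com zone file snapshot, such as those distributed through ICANN's Centralized Zone Data Service), and the helpers `com_domain`, `levenshtein`, and `find_phantom_domains` are hypothetical names. Absence from a single zone snapshot only shows that a name is not currently delegated; confirming that a domain has *never* been registered, as the abstract's definition requires, would need historical registration data. The edit-distance step is one plausible way to relate a phantom domain to the domain the publisher probably meant; it captures simple typos only, not the 17 error modes the paper identifies.

```python
from typing import Dict, Iterable, List, Optional, Set
from urllib.parse import urlsplit


def com_domain(url: str) -> Optional[str]:
    """Return the registrable .com name (e.g. 'example.com') of an absolute
    http(s) URL, or None if the link is not an http(s) link to a .com host."""
    try:
        parts = urlsplit(url.strip())
        host = parts.hostname  # urllib lower-cases the hostname
    except ValueError:  # malformed URL, e.g. an unbalanced IPv6 bracket
        return None
    if parts.scheme not in ("http", "https") or not host:
        return None
    labels = host.rstrip(".").split(".")
    # .com registrations sit directly under the TLD, so the registrable
    # name is just the last two labels (subdomains are dropped).
    if len(labels) < 2 or labels[-1] != "com" or not labels[-2]:
        return None
    return ".".join(labels[-2:])


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, and
    substitutions (classic dynamic programming, one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def find_phantom_domains(
    links: Iterable[str], registered: Set[str], max_edits: int = 1
) -> Dict[str, List[str]]:
    """Map each linked .com name missing from `registered` to the registered
    names within `max_edits` edits, i.e. its plausible intended targets."""
    phantoms: Dict[str, List[str]] = {}
    for url in links:
        domain = com_domain(url)
        if domain is None or domain in registered or domain in phantoms:
            continue
        # Toy linear scan; the length check cheaply skips hopeless candidates.
        phantoms[domain] = sorted(
            d for d in registered
            if abs(len(d) - len(domain)) <= max_edits
            and levenshtein(domain, d) <= max_edits
        )
    return phantoms


if __name__ == "__main__":
    links = [
        "https://www.github.com/user/repo",
        "https://www.githb.com/user/repo",   # missing 'u'
        "http://docs.exmaple.com/page",      # transposed letters
        "mailto:someone@example.com",        # not an http(s) link
        "/relative/path",                    # relative, no host
    ]
    registered = {"github.com", "example.com"}
    print(find_phantom_domains(links, registered))
    # {'githb.com': ['github.com'], 'exmaple.com': []}
```

A transposition such as `exmaple` costs two edits under plain Levenshtein distance, so it is only matched with `max_edits=2`. The linear scan over every registered name is fine for a toy set but would not scale to the full .com zone; a larger-scale version would more likely generate candidate variants of each phantom domain and look them up in a hash set.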

    Supplemental Material

    Video presentation (MP4 file)
    Supplemental video (MP4 file)

    Information

    Published In

    WWW '24: Proceedings of the ACM on Web Conference 2024
    May 2024
    4826 pages
    ISBN:9798400701719
    DOI:10.1145/3589334
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. common crawl
    2. crawling
    3. domains
    4. hijackable
    5. hijacking
    6. hyperlinks
    7. links
    8. phantom domains
    9. phishing
    10. spoofing
    11. typosquatting
    12. vulnerabilities
    13. web

    Funding Sources

    • CSCRC

    Conference

    WWW '24: The ACM Web Conference 2024
    May 13 - 17, 2024
    Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Total citations: 0
    • Total downloads: 893
    • Downloads (last 12 months): 893
    • Downloads (last 6 weeks): 598
