It's an interesting exercise to look back to the year 2000 to see how we approached cyber security. We had just started to realize that data might be a useful currency, but for the most part, security pursued preventative avenues, such as firewalls, intrusion prevention systems, and anti-virus. With the advent of log management and security information and event management (SIEM) solutions, we started to gather gigabytes of sensor data and correlate data from different sensors to compensate for their weaknesses and build on their strengths. But fundamentally, such solutions didn't scale that well and struggled to deliver real security insight.
Today, cybersecurity wouldn't work anymore without large scale data analytics and machine learning approaches, especially in the realm of malware classification and threat intelligence. Nonetheless, we are still just scratching the surface and learning where the real challenges are in data analytics for security.
This talk will go on a journey of big data in cybersecurity, exploring where big data has been and where it must go to make a true difference. We will look at the potential of data mining, machine learning, and artificial intelligence, as well as the boundaries of these approaches. We will also look at both the shortcomings and potential of data visualization and the human-computer interface. It is critical that today's systems take into account the human expert and, most importantly, provide the right data.
This presentation introduces the MITRE ATT&CK framework, its different use cases, and pitfalls to watch out for. The talk was delivered at the Null Bangalore and OWASP Bangalore chapters on 15th February 2019.
Microsoft Sentinel is a cloud-native security information and event management (SIEM) solution powered by AI and automation. It collects security data from various sources at cloud scale, uses machine learning to analyze the data and detect threats, provides visualizations to investigate incidents and related entities, and enables automating common security tasks and workflows through automation rules and playbooks. This increases security operations efficiency and helps organizations accelerate response to security threats.
In 2018, Zero Trust Security gained popularity due to its simplicity and effectiveness. Yet despite a rise in awareness, many organizations still don’t know where to start or are slow to adopt a Zero Trust approach.
The result? Breaches affected as many as 66% of companies just last year. And as hackers become more sophisticated and resourceful, the number of breaches will continue to rise.
Unless organizations adopt Zero Trust Security. In 2019, take some time to assess your company’s risk factors and learn how to implement Zero Trust Security in your organization.
From MITRE ATT&CKcon Power Hour January 2021
By Adam Pennington, ATT&CK Lead, MITRE
Adam leads ATT&CK at The MITRE Corporation and collected much of the intelligence leveraged in creating ATT&CK’s initial techniques. He has spent much of his 12 years with MITRE studying and preaching the use of deception for intelligence gathering. Prior to joining MITRE, Adam was a researcher at Carnegie Mellon’s Parallel Data Lab and earned his BS and MS degrees in Computer Science and Electrical and Computer Engineering as well as the 2017 Alumni Service Award from Carnegie Mellon University. Adam has presented and published in a number of venues including FIRST CTI, USENIX Security and ACM Transactions on Information and System Security.
SANS Ask the Expert: An Incident Response Playbook: From Monitoring to Opera... (AlienVault)
As cyber attacks grow more sophisticated, many organizations are investing more into incident detection and response capabilities. Event monitoring and correlation technologies and security operations are often tied to incident handling responsibilities, but the number of attack variations is staggering, and many organizations are struggling to develop incident detection and response processes that work for different situations.
In this webcast, we'll outline the most common types of events and indicators of compromise (IOCs) that naturally feed intelligent correlation rules, and walk through a number of different incident types based on these. We'll also outline the differences in response strategies that make the most sense depending on what types of incidents may be occurring. By building a smarter incident response playbook, you'll be better equipped to detect and respond more effectively in a number of scenarios.
Threat hunting and achieving security maturity (DNIF)
The document discusses threat hunting techniques and achieving maturity in threat hunting programs. It introduces threat hunting and defines it as proactively searching networks to detect advanced threats. It then covers threat hunting maturity models ranging from initial to leading levels. Common threat hunting techniques like searching, clustering, grouping and stack counting are explained. The threat hunting loop process of creating hypotheses, investigating, uncovering patterns and informing analytics is also outlined. Finally, two practical threat hunting case studies on potential command and control activity and suspicious emails are described.
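Stack counting, one of the techniques named above, can be illustrated with a short sketch. It is not taken from the presentation: the event data, the `stack_count` and `rare_values` helpers, and the threshold are all hypothetical. The idea is to count how often each value of a field occurs and look at the rare end of the list first.

```python
from collections import Counter

# Hypothetical process-creation events: (host, process name).
events = [
    ("ws01", "chrome.exe"),
    ("ws02", "chrome.exe"),
    ("ws03", "chrome.exe"),
    ("ws01", "outlook.exe"),
    ("ws02", "outlook.exe"),
    ("ws03", "svch0st.exe"),  # seen only once, on a single host
]


def stack_count(values):
    """Return (value, count) pairs sorted from rarest to most common."""
    counts = Counter(values)
    # Sort by count first, then by value so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


def rare_values(values, max_count=1):
    """Values seen at most max_count times; candidates for a closer look."""
    return [value for value, count in stack_count(values) if count <= max_count]


process_names = [name for _host, name in events]
for name, count in stack_count(process_names):
    print(f"{count:>3}  {name}")
```

Rare does not mean malicious. The short end of the stack is only a starting point for a hypothesis, and the threshold has to be tuned to the size of the environment.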
6 Steps for Operationalizing Threat Intelligence (Sirius)
The best form of defense against cyber attacks and those who perpetrate them is to know about them. Collaborative defense has become critical to IT security, and sharing threat intelligence is a force multiplier. But for many organizations, good quality intelligence is hard to come by.
Commercial threat intelligence technology and services can help enterprises arm themselves with the strategic, tactical and operational insights they need to identify and respond to global threat activity, and integrate intelligence into their security programs.
Threat intelligence sources have varying levels of relevance and context, and there are concerns about data quality and redundancy, shelf life, public/private data sharing, and threat intelligence standards. However, if processed and applied properly, threat intelligence provides a way for organizations to get the insight they need into attackers’ plans, prioritize and respond to threats, shorten the time between attack and detection, and focus staff efforts and decision-making.
View to learn:
--The difference between threat information and threat intelligence.
--Available sources of intelligence and how to determine if they apply to your business.
--Key steps for preparing to ingest threat information and turn it into intelligence.
--How to derive useful data that helps you achieve your business goals.
--Tools that are available to make collaboration easier.
The presentation will describe methods for discovering interesting and actionable patterns in log files for security management without knowing specifically what you are looking for. This approach differs from "classic" log analysis and allows gaining insight into insider attacks and other advanced intrusions, which are extremely hard to discover with other methods. Specifically, I will demonstrate how data mining can be used as a source of ideas for designing future log analysis techniques that will help uncover coming threats. An important part of the presentation will be a demonstration of how the above methods worked in a real-life environment.
The document discusses the MITRE ATT&CK framework, which is a knowledge base of adversary behaviors and tactics collected from real-world observations. It describes how the framework categorizes behaviors using tactics, techniques, and procedures. The framework can be used for threat intelligence, detection and analytics, adversary emulation, and assessment and engineering. The document provides examples of how organizations can map their detection capabilities and data sources to techniques in the framework to improve visibility of attacks. It cautions against misusing the framework as a checklist rather than taking a threat-informed approach.
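Mapping data sources to techniques, as described above, could look roughly like the following sketch. The technique IDs and names come from ATT&CK, but the data-source requirements, the `available_sources` set, and the `coverage` helper are simplified, hypothetical stand-ins rather than anything from the document.

```python
# Technique IDs and names are from ATT&CK; the "needs" sets and the available
# telemetry below are hypothetical and deliberately simplified.
technique_needs = {
    "T1059": {
        "name": "Command and Scripting Interpreter",
        "needs": {"process creation", "command line"},
    },
    "T1003": {
        "name": "OS Credential Dumping",
        "needs": {"process access"},
    },
    "T1566": {
        "name": "Phishing",
        "needs": {"email gateway", "file creation"},
    },
}

# Telemetry this (imaginary) organization currently collects.
available_sources = {"process creation", "command line", "email gateway"}


def coverage(techniques, sources):
    """Classify each technique as covered, partial, or blind."""
    result = {}
    for technique_id, info in techniques.items():
        missing = info["needs"] - sources
        if not missing:
            status = "covered"
        elif missing == info["needs"]:
            status = "blind"
        else:
            status = "partial"
        result[technique_id] = (status, sorted(missing))
    return result


for technique_id, (status, missing) in coverage(technique_needs, available_sources).items():
    print(technique_id, status, missing)
```

In line with the caution above, a table like this shows where visibility is missing. It is not a checklist to be filled in for its own sake, and which gaps matter depends on the threats relevant to the organization.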
From SIEM to SOC: Crossing the Cybersecurity Chasm (Priyanka Aash)
You own a SIEM, but to be secure, you need a Security Operations Center! How do you cross the chasm? Do you hire staff or outsource? And what skills are needed? Mike Ostrowski, a cybersecurity industry veteran, will review common pitfalls experienced through the journey from SIEM to SOC, the pros and cons of an all in-house SOC vs. outsourcing, and the benefits of a hybrid SOC model.
Learning Objectives:
1: You own a SIEM, but to be secure, you need a SOC. How do you cross the chasm?
2: What are the pros and cons of in-house, fully managed and hybrid security?
3: What considerations go into deciding whether to employ a hybrid strategy?
(Source: RSA Conference USA 2018)
The document is a presentation on threat hunting with Splunk. It discusses threat hunting basics, data sources for threat hunting, knowing your endpoint, and using the cyber kill chain framework. It outlines an agenda that includes a hands-on walkthrough of an attack scenario using Splunk's core capabilities. It also discusses advanced threat hunting techniques and tools, enterprise security walkthroughs, and applying machine learning and data science to security.
Creating Your Own Threat Intel Through Hunting & Visualization (Raffael Marty)
The security industry is talking a lot about threat intelligence: external information that a company can leverage to understand where potential threats are knocking on the door and might have already penetrated the network boundaries. Conversations with many CERTs have shown that we have to stop relying on knowledge about how attacks have been conducted in the past and start 'hunting' for signs of compromises and anomalies in our own environments.
In this presentation we explore how the decade-old field of security visualization has emerged. We show how we have applied advanced analytics and visualization to create our own threat intelligence and investigated lateral movement in a Fortune 50 company.
Visualization. Data science. No machine learning. But pretty pictures.
Here is a blog post I wrote a bit ago about the general theme of internal threat intelligence:
http://www.darkreading.com/analytics/creating-your-own-threat-intel-through-hunting-and-visualization/a/d-id/1321225?
Threat intelligence is information that informs enterprise defenders of adversarial elements to stop them.
It is information that is relevant to the organization, has business value, and is actionable.
Having all the data and feeds is not enough: data alone isn’t intelligence.
#Threat #Intelligence #Forensics #ELK #VAPT #SOC #SIEM #Incident #D3pak
Sqrrl and IBM: Threat Hunting for QRadar Users (Sqrrl)
This document discusses threat hunting using IBM QRadar and Sqrrl analytics. It introduces threat hunting, the threat hunting process, and the Sqrrl behavior graph for visualizing and exploring linked security data. Use cases for threat hunting with Sqrrl analytics on the QRadar platform are presented, along with a reference architecture showing how Sqrrl integrates with QRadar. A demonstration of the Sqrrl threat hunting platform concludes the document.
ATTACKers Think in Graphs: Building Graphs for Threat Intelligence (MITRE - ATT&CKcon)
From MITRE ATT&CKcon Power Hour January 2021
By Valentine Mairet, Security Researcher, McAfee
The MITRE ATT&CK framework is the industry standard for dissecting cyberattacks into the techniques used. At McAfee, all attack information is disseminated into different categories, including ATT&CK techniques. What results from this exercise is an extensive repository of techniques used in cyberattacks that goes back many years. Much can be learned from looking at historical attack data, but how can we piece all this information together to identify new relationships between threats and attacks? In her recent efforts, Valentine has embraced analyzing ATT&CK data in graphical representations. One lesson learned is that it is not just about mapping attacks and the techniques used into graphs; the strength lies in applying different algorithms to answer specific questions. In this presentation, Valentine will showcase the results and techniques obtained from her research journey using graphs and graph algorithms.
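As a generic illustration of asking a specific question of such a graph (not the method used in the talk), the sketch below links campaigns that share ATT&CK techniques. The campaign names, their technique sets, the threshold, and both helper functions are hypothetical. Jaccard similarity is simply one convenient choice of measure.

```python
from itertools import combinations

# Hypothetical campaigns mapped to ATT&CK technique IDs.
campaign_techniques = {
    "campaign_a": {"T1566", "T1059", "T1003"},
    "campaign_b": {"T1566", "T1059", "T1486"},
    "campaign_c": {"T1190", "T1505"},
}


def jaccard(a, b):
    """Share of techniques two campaigns have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(mapping, threshold=0.3):
    """Return (campaign, campaign, score) edges at or above the threshold."""
    edges = []
    for left, right in combinations(sorted(mapping), 2):
        score = jaccard(mapping[left], mapping[right])
        if score >= threshold:
            edges.append((left, right, score))
    return edges


for left, right, score in similarity_edges(campaign_techniques):
    print(f"{left} -- {right}: {score:.2f}")
```

The resulting edge list can be loaded into any graph tool, where clustering or centrality algorithms can then be applied to it.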
Many organizations and managed security providers are starting to move from SIEM (Security Information and Event Management) to EDR (Endpoint Detection and Response). The problem is that this may not be the best decision for your organization. These technologies are similar but fundamentally different. This presentation also shares innovative ways to use your SIEM to catch the bad guys, as well as some simple tricks for easing the burden of SIEM management.
The MITRE ATT&CK framework is used by threat hunters and threat analysts for threat modelling, and it can also be used for adversary emulation and attack defense. Cybersecurity analysts widely use it to frame an attack in terms of the tactics and techniques used.
Talk on Kaspersky lab's CoLaboratory: Industrial Cybersecurity Meetup #5 with @HeirhabarovT about several ATT&CK practical use cases.
Video (in Russian): https://www.youtube.com/watch?v=ulUF9Sw2T7s&t=3078
Many thanks to Teymur for a great tech dive.
The document discusses security information and event management (SIEM) solutions from HP, including the HP SIRM Platform, ArcSight Logger, ArcSight Connectors, ArcSight ESM, and ArcSight Express. The HP SIRM Platform provides 360 degree security monitoring, proactive security testing, and adaptive network defenses. It integrates security correlation, application security analysis, and network defense mechanisms. ArcSight Logger collects and stores logs from over 350 sources for searching, analysis and retention. ArcSight Connectors automate log collection and normalization into a common format. ArcSight ESM analyzes and correlates events for security monitoring, compliance, and intelligence. ArcSight Express uses a new correlation
The extent and impact of recent security breaches is showing that current security approaches are just not working. But what can we do to protect our business? We have been advocating monitoring for a long time as a way to detect subtle, advanced attacks that are still making it through our defenses. However, products have failed to deliver on this promise.
Current solutions don't scale in both data volume and analytical insights. In this presentation we will explore what security monitoring is. Specifically, we are going to explore the question of how to visualize a billion log records. A number of security visualization examples will illustrate some of the challenges with big data visualization. They will also help illustrate how data mining and user experience design help us get a handle on the security visualization challenges - enabling us to gain deep insight for a number of security use-cases.
This document discusses using machine learning and big data technologies to improve security workflows. It describes the challenges of analyzing large amounts of security data from many sources to detect threats. Machine learning can help by analyzing patterns in the data at scale. The document introduces the Lambda Defense approach, which applies a lambda architecture to build a "central nervous system" for security. This combines batch and real-time machine learning models to detect threats based on both sequential and unordered behaviors.
Rise of the machines -- Owasp israel -- June 2014 meetup (Shlomo Yona)
Shlomo Yona presents why it is a good idea to use Machine Learning in Security, explains some Machine Learning jargon, and demonstrates with two fingerprinting examples: a wifi device (PHY) and a browser (L7).
Video (at YouTube) - http://bit.ly/19TNSTF
Big Data Security Analytics, Data Science and Machine Learning are a few of the new buzzwords that have invaded our industry of late. Most of what we hear are promises of a unicorn-laden, silver-bullet panacea by heavy-handed marketing folks, evoking an expected pushback from the most enlightened members of our community.
This talk will help parse what we as a community need to know and understand about these concepts: their technical details and actual capabilities, where they fail, and how they can be exploited and fooled by an attacker.
The talk will also share results of the author's ongoing research (on MLSec Project) into applying machine learning techniques to information security monitoring.
Software Analytics: Towards Software Mining that Matters (2014) (Tao Xie)
This document discusses software analytics and summarizes several related papers and projects. It introduces Software Analytics, which aims to enable software practitioners to perform data exploration and analysis to obtain useful insights. It then summarizes papers on techniques for performance debugging by mining stack traces, scalable code clone analysis, incident management for online services, and using games to teach programming.
Security Analytics for Data Discovery - Closing the SIEM Gap (Eric Johansen, CISSP)
This document discusses security analytics and hunting maturity. It defines hunting as a proactive approach to identifying incidents by actively looking for patterns, intelligence or hunches, rather than waiting for notifications. It describes the "SIEM gap" where SIEM tools are designed for known threats and lack the tools and flexibility for human analysis and hunting of unknown threats. It outlines techniques used in security analytics like event clustering, association analysis, and visualization to help analyze large datasets and discover unknown threats. The document argues security analytics provides the data access, analysis techniques and workflows to help close the SIEM gap and improve an organization's hunting maturity over time.
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial (eswcsummerschool)
The document discusses big data techniques, tools, and applications. It describes how big data is enabled by increases in storage capacity, processing power, and data availability. It outlines common approaches to distributed processing, storage, and programming models for big data, including MapReduce, NoSQL databases, and cloud computing. It also provides examples of applications involving log file analysis, network alarm monitoring, media content analysis, and social network analysis.
BSidesLV 2013 - Using Machine Learning to Support Information Security (Alex Pinto)
Big Data, Data Science, Machine Learning and Analytics are a few of the new buzzwords that have invaded our industry of late. Again we are being sold a unicorn-laden, silver-bullet panacea by heavy-handed marketing folks, evoking an expected pushback from the most enlightened members of our community. However, as was the case before, there might just be enough technical meat in there to help out with our security challenges and the overwhelming odds we face every day. And if so, what do we as a community have to know about these technologies in order to be better professionals? Can we really use the data we have been collecting to help automate our security decision making? Is a robot going to steal my job?
If you are interested in what is behind this marketing buzz and are not scared of a little math, this talk offers some insights into applying Machine Learning techniques to data any of us have easy access to, and tries to bring home the point that if all of this technology can be used to show us “better” ads in social media and track our behavior online (and a bit more than that), it can also be used to defend our networks.
This document provides an overview of data mining concepts and techniques from the third edition of the textbook "Data Mining: Concepts and Techniques" by Jiawei Han, Micheline Kamber, and Jian Pei. It introduces why data mining is important due to the massive growth of data, defines data mining, and discusses the multi-dimensional nature of data mining including the types of data, patterns, techniques and applications. The chapter also covers data mining functions such as generalization, association analysis, classification, and cluster analysis.
This chapter introduces data mining and discusses its rise due to the massive growth of digital data. It describes data mining as the automated process of discovering patterns and knowledge from large data sets. The chapter outlines several key aspects of data mining, including the types of data that can be mined, the patterns that can be discovered, the technologies used, and its applications across various domains.
This document provides an introduction to data mining concepts and techniques. It discusses why data mining has become important due to the massive growth of digital data. Data mining aims to extract useful patterns from large datasets through techniques like generalization, association analysis, classification, and cluster analysis. It can be applied to many types of data and has uses in domains such as business, science, and healthcare to gain insights and make predictions.
01Introduction to data mining chapter 1.ppt (admsoyadm4)
This chapter introduces data mining and discusses its rise due to the massive growth of digital data. It describes data mining as the automated extraction of meaningful patterns from large data sets, and notes it draws on techniques from machine learning, statistics, pattern recognition, and database systems. The chapter outlines different types of data that can be mined, patterns that can be discovered, and applications of data mining in various domains including business, science, and on the web.
This document provides an introduction to data mining concepts and techniques. It discusses why data mining has become important due to the massive growth of digital data. Data mining aims to extract useful patterns from large datasets through techniques like generalization, association analysis, classification, and cluster analysis. It can be applied to many types of data and has uses in domains such as business, science, and healthcare to help analyze data and discover useful knowledge.
Building Your Application Security Data Hub - OWASP AppSecUSA (Denim Group)
One of the reasons application security is so challenging to address is that it spans multiple teams within an organization. Development teams build software, security testing teams find vulnerabilities, security operations staff manage applications in production and IT audit organizations make sure that the resulting software meets compliance and governance requirements. In addition, each team has a different toolbox they use to meet their goals, ranging from scanning tools, defect trackers, Integrated Development Environments (IDEs), WAFs and GRC systems. Unfortunately, in most organizations the interactions between these teams are often strained and the flow of data between these disparate tools and systems is non-existent or tediously implemented manually.
In today’s presentation, we will demonstrate how leading organizations are breaking down these barriers between teams and better integrating their disparate tools to enable the flow of application security data between silos to accelerate and simplify their remediation efforts. At the same time, we will show how to collect the proper data to measure the performance and illustrate the improvement of the software security program. The challenges that need to be overcome to enable teams and tools to work seamlessly with one another will be enumerated individually. Team and tool interaction patterns will also be outlined that reduce the friction that will arise while addressing application security risks. Using open source products such as OWASP ZAP, ThreadFix, Bugzilla and Eclipse, a significant amount of time will also be spent demonstrating the kinds of interactions that need to be enabled between tools. This will provide attendees with practical examples on how to replicate a powerful, integrated Application Security program within their own organizations. In addition, how to gather program-wide metrics and regularly calculate measurements such as mean-time-to-fix will also be demonstrated to enable attendees to monitor and ensure the continuing health and performance of their Application Security program.
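The mean-time-to-fix measurement mentioned above reduces to simple arithmetic once found and fixed dates are available from a defect tracker. The sketch below is only an illustration under that assumption: the finding identifiers, the dates, and the `mean_time_to_fix` helper are hypothetical and not tied to any of the tools named in the talk.

```python
from datetime import date
from statistics import mean

# Hypothetical findings: (identifier, date found, date fixed or None if still open).
findings = [
    ("VULN-1", date(2024, 1, 1), date(2024, 1, 11)),
    ("VULN-2", date(2024, 1, 5), date(2024, 1, 25)),
    ("VULN-3", date(2024, 2, 1), None),
]


def mean_time_to_fix(records):
    """Average number of days from discovery to fix, ignoring open findings."""
    durations = [
        (fixed - found).days
        for _identifier, found, fixed in records
        if fixed is not None
    ]
    return mean(durations) if durations else None


print("Mean time to fix (days):", mean_time_to_fix(findings))
```

Open findings are excluded here, which flatters the number if old vulnerabilities linger. Tracking the age of open findings alongside it gives a fuller picture.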
Towards a Threat Hunting Automation Maturity Model (Alex Pinto)
Threat Hunting has commonly been defined as a series of investigative actions that should be performed by analyst teams to cover detection gaps where automated tools fail. However, as those techniques become more and more widespread and standardized, wouldn’t it be the case that we can automate a large part of those threat hunting activities, turning the definition into an oxymoron?
In this session, we will demonstrate how some threat hunting techniques can be automated or constructed to augment human activity by encoding analyst intuition into repeatable data extraction and processing technologies. Those techniques can be used to simplify the triage stage and get actionable information from potential threats with minimal human interaction. We then present a Hunting Automation Maturity Model (HAMM) that organizes these techniques around capability milestones, including internal and external context and analytical tooling.
Unit 1 (Chapter-1) on data mining concepts.ppt (PadmajaLaksh)
This document provides an introduction to data mining concepts. It discusses why data mining is important due to the massive growth of data. It defines data mining as the automated analysis of large datasets to discover hidden patterns and unknown correlations. The document presents a multi-dimensional view of data mining, including the types of data that can be mined, the patterns that can be discovered, techniques used, and applications. It provides an overview of the key concepts in data mining.
AI & ML in Cyber Security - Why Algorithms Are Dangerous (Raffael Marty)
Every single security company is talking in some way or another about how they are applying machine learning. Companies go out of their way to make sure they mention machine learning and not statistics when they explain how they work. Recently, that's not enough anymore either. As a security company you have to claim artificial intelligence to be even part of the conversation.
Guess what. It's all baloney. We have entered a state in cyber security that is, in fact, dangerous. We are blindly relying on algorithms to do the right thing. We are letting deep learning algorithms detect anomalies in our data without having a clue what that algorithm just did. In academia, they call this the lack of explainability and verifiability. But rather than building systems with actual security knowledge, companies are using algorithms that nobody understands and in turn discover wrong insights.
In this talk I will show the limitations of machine learning, outline the issues of explainability, and show where deep learning should never be applied. I will show examples of how the blind application of algorithms (including deep learning) actually leads to wrong results. Algorithms are dangerous. We need to revert back to experts and invest in systems that learn from, and absorb the knowledge, of experts.
Mining Software Repositories for Security: Data Quality Issues Lessons from T... (CREST)
This presentation highlights a range of issues that arise when dealing with data quality, and poses several recommendations, including:
• Consideration of Label Noise in Negative Class
  • Semi-supervised, e.g., self-training, positive or unlabeled training on unlabeled set
• Consideration of Timeliness
  • Currently labeled data & more positive samples; preserve data sequence for training
• Use of Data Visualization
  • Try to achieve better data understandability for non data scientists
• Creation and Use of Diverse Language Datasets
  • Bug seeding into semantically similar languages
• Use of Data Quality Assessment Criteria
  • Determine and use specific data quality assessment approaches
• Better Data Sharing and Governance
Jiawei Han, Micheline Kamber and Jian Pei
Data Mining: Concepts and Techniques, 3rd ed.
The Morgan Kaufmann Series in Data Management Systems
Morgan Kaufmann Publishers, July 2011. ISBN 978-0123814791
Similar to Delivering Security Insights with Data Analytics and Visualization (20)
How to protect, detect, and respond to your threats.
This is an MSP-centric talk exploring how to detect, protect, and respond to cyber security threats. We first walk through the cyber defense matrix, explore what security intelligence needs to be, and reinforce the concepts with two case studies of BlackCat.
Blog Post: http://raffy.ch/blog. - Video: https://youtu.be/nk5uz0VZrxM
In this video we talk about the world of security data or log data. In the first section, we dive into a bit of a history lesson around log management, SIEM, and big data in security. We then shift to the present to discuss some of the challenges that we face today with managing all of that data and also discuss some of the trends in the security analytics space. In the third section, we focus on the future. What does tomorrow hold in the SIEM / security data space? What are some of the key features we will see, and how does this matter to the users of these approaches?
Cyber Security Beyond 2020 – Will We Learn From Our Mistakes? (Raffael Marty)
The cyber security industry has spent trillions of dollars to keep external attackers at bay. To what effect? We still don't see an end to the cat and mouse game between attackers and the security industry: zero day attacks, new vulnerabilities, increasingly sophisticated attacks, etc. We need a paradigm shift in security. A shift away from traditional threat intelligence and indicators of compromise (IOCs). We need to look at understanding behaviors. Those of devices and those of humans.
What are the security approaches and trends that will make an actual difference in protecting our critical data and intellectual property, not just from external attackers, but also from malicious insiders? We will explore topics from the 'all solving' artificial intelligence to risk-based security. We will look at what is happening within the security industry itself, where startups are placing their bets, and how human factors will play an increasingly important role in security, along with all of the potential challenges that will create.
Artificial Intelligence – Time Bomb or The Promised Land? (Raffael Marty)
Companies have AI projects. Security products use AI to keep attackers out and insiders at bay. But what is this "AI" that everyone talks about? In this talk we will explore what artificial intelligence in cyber security is, where the limitations and dangers are, and in what areas we should invest more in AI. We will talk about some of the recent failures of AI in security and invite a conversation about how we verify artificially intelligent systems to understand how much trust we can place in them.
Alongside the AI conversation, we will discover that we need to make a shift in our traditional approach to cyber security. We need to augment our reactive approaches of studying adversary behaviors to understanding behaviors of users and machines to inform a risk-driven approach to security that prevents even zero day attacks.
In this presentation I explore the topic of artificial intelligence in cyber security. What is AI, and how do we get to real intelligence in a cyber context? I outline some of the dangers of the way we are using algorithms (AI, ML) today and what that leads to. We then explore how we can add real intelligence through expert knowledge to the problem of finding attackers and anomalies in our applications and networks.
Presented at AI 4 Cyber in NYC on April 30, 2019
The document summarizes an agenda for a Security Chat event discussing various cybersecurity topics:
1) Several speakers will present on DevSecOps, formjacking, open source security, and tools for discovering information on the internet.
2) The event is sponsored by Forcepoint, a large cybersecurity company that provides human-centric security solutions like data protection, web security, CASB, NGFW, and more.
3) There is an opportunity for lightning talks and announcements regarding job openings or presentation sharing at the conclusion.
AI & ML in Cyber Security - Why Algorithms are Dangerous (Raffael Marty)
This document discusses the dangers of using algorithms in cybersecurity. It makes three key points:
1) Algorithms make assumptions about the data that may not always be valid, and they do not take important domain knowledge into account.
2) Throwing algorithms at security problems without proper understanding of the data and algorithms can be dangerous and lead to failures.
3) A Bayesian belief network approach that incorporates domain expertise may be better suited for security tasks than purely algorithmic approaches. It allows modeling relationships between different factors and computing probabilities.
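The third point can be illustrated with a very small belief network evaluated by hand. This is only a sketch under stated assumptions: the variables (`Compromised`, `UnusualLogin`, `LargeUpload`), the network shape, and every probability below are hypothetical illustration values and do not come from the presentation. The expert knowledge lives in the conditional probability tables, and the posterior is computed by enumerating both states of the hidden variable.

```python
# Hypothetical network: Compromised -> UnusualLogin, Compromised -> LargeUpload.
# All numbers are made-up illustration values that an expert would supply.
P_COMPROMISED = 0.01                      # prior belief that a host is compromised
P_LOGIN = {True: 0.70, False: 0.05}       # P(unusual login | compromised?)
P_UPLOAD = {True: 0.60, False: 0.02}      # P(large upload  | compromised?)


def posterior_compromised(unusual_login: bool, large_upload: bool) -> float:
    """Return P(compromised | evidence) by enumerating the hidden variable."""

    def joint(compromised: bool) -> float:
        prior = P_COMPROMISED if compromised else 1.0 - P_COMPROMISED
        p_login = P_LOGIN[compromised] if unusual_login else 1.0 - P_LOGIN[compromised]
        p_upload = P_UPLOAD[compromised] if large_upload else 1.0 - P_UPLOAD[compromised]
        return prior * p_login * p_upload

    numerator = joint(True)
    return numerator / (numerator + joint(False))


if __name__ == "__main__":
    for evidence in [(False, False), (True, False), (True, True)]:
        print(evidence, round(posterior_compromised(*evidence), 4))
```

Each piece of evidence moves the belief away from the prior, and the tables can be inspected and corrected by a domain expert, which is the property the summary attributes to this kind of model.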
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed (Raffael Marty)
The year is 2017. Cyber security has been a discipline for many years and thousands of security companies are offering solutions to deter and block malicious actors in order to keep our businesses operating and our data confidential. But fundamentally, cyber security has not changed during the last two decades. We are still running Snort and Bro. Firewalls are fundamentally still the same. People get hacked for their poor passwords and we collect logs that we don't know what to do with. In this talk I will paint a slightly provocative and dark picture of security. Fundamentally, nothing has really changed. We'll have a look at machine learning and artificial intelligence and see how those techniques are used today. Do they have the potential to change anything? How will the future look with those technologies? I will show some practical examples of machine learning and argue that simpler approaches generally win. Maybe we find some hope in visualization? Or maybe augmented reality? We still have a ways to go.
Ensuring security of a company’s data and infrastructure has largely become a data analytics challenge. It is about finding and understanding patterns and behaviors that are indicative of malicious activities or deviations from the norm. Data, Analytics, and Visualization are used to gain insights and discover those malicious activities. These three components play off of each other, but also have their inherent challenges. A few examples will be given to explore and illustrate some of these challenges.
Creating Your Own Threat Intel Through Hunting & Visualization (Raffael Marty)
The security industry is talking a lot about threat intelligence: external information that a company can leverage to understand where potential threats are knocking on the door and might have already penetrated the network boundaries. Conversations with many CERTs have shown that we have to stop relying on knowledge about how attacks have been conducted in the past and start ‘hunting’ for signs of compromises and anomalies in our own environments.
In this presentation we explore how the decade-old field of security visualization has emerged. We show how we have applied advanced analytics and visualization to create our own threat intelligence and investigated lateral movement in a Fortune 50 company.
Visualization. Data science. No machine learning. But pretty pictures. What is internal threat intelligence? Check out http://www.darkreading.com/analytics/creating-your-own-threat-intel-through-hunting-and-visualization/a/d-id/1321225
Raffael Marty gave a presentation on big data visualization. He discussed using visualization to discover patterns in large datasets and presenting security information on dashboards. Effective dashboards provide context, highlight important comparisons and metrics, and use aesthetically pleasing designs. Integration with security information management systems requires parsing and formatting data and providing interfaces for querying and analysis. Marty is working on tools for big data analytics, custom visualization workflows, and hunting for anomalies. He invited attendees to join an online community for discussing security visualization.
The Heatmap - Why is Security Visualization so Hard? (Raffael Marty)
The extent and impact of recent security breaches show that current approaches are just not working. But what can we do to protect our business? We have been advocating monitoring for a long time as a way to detect subtle, advanced attacks. However, products have failed to deliver on this promise. Current solutions don't scale in either data volume or analytical insight. In this presentation we will explore why it is so hard to come up with a security monitoring (or shall we call it security intelligence) approach that helps find sophisticated attackers in all the data collected. We are going to explore the question of how to visualize a billion events. We are going to look at a number of security visualization examples to illustrate the problem and some possible solutions. These examples will also help illustrate how data mining and user experience design help us get a handle on the security visualization challenges, enabling us to gain deep insight for a number of security use-cases.
Workshop: Big Data Visualization for Security (Raffael Marty)
Big Data is the latest hype in the security industry. We will have a closer look at what big data is comprised of: Hadoop, Spark, ElasticSearch, Hive, MongoDB, etc. We will learn how to best manage security data in a small Hadoop cluster for different types of use-cases. Doing so, we will encounter a number of big-data open source tools, such as LogStash and Moloch that help with managing log files and packet captures.
As a second topic we will look at visualization and how we can leverage visualization to learn more about our data. In the hands-on part, we will use some of the big data tools, as well as a number of visualization tools to actively investigate a sample data set.
Vision is a human’s dominant sense. It is the communication channel with the highest bandwidth into the human brain. Security tools and applications need to make better use of information visualization to enhance human computer interactions and information exchange.
In this talk we will explore a few basic principles of information visualization to see how they apply to cyber security. We will explore both visualization as a data presentation, as well as a data discovery tool. We will address questions like: What makes for effective visualizations? What are some core principles to follow when designing a dashboard? How do you go about visually exploring a terabyte of data? And what role do big data and data mining play in security visualization?
The presentation is filled with visualizations of security data to help translate the theoretical concepts into tangible applications.
DAVIX - Data Analysis and Visualization Linux (Raffael Marty)
DAVIX, a live CD for data analysis and visualization, brings the most important free tools for data processing and visualization to your desk. There is no hassle with installing an operating system or struggle to build the necessary tools to get started with visualization. You can completely dedicate your time to data analysis.
This document discusses the intersection of cloud computing, big data, and security. It explains how cloud computing has enabled big data by providing large amounts of cheap storage and on-demand computing power. This has allowed companies to analyze larger datasets than ever before to gain insights. However, big data also presents security challenges as more data is stored remotely in the cloud. The document outlines both the benefits and risks to security from adopting cloud computing and discusses how big data analytics could also be used to enhance cyber security.
Cyber Security – How Visual Analytics Unlock Insight (Raffael Marty)
Video can be found at: http://youtu.be/CEAMF0TaUUU
In the Cyber Security domain, we have been collecting ‘big data’ for almost two decades. The volume and variety of our data is extremely large, but understanding and capturing the semantics of the data is even more of a challenge. Finding the needle in the proverbial haystack has been attempted from many different angles. In this talk we will have a look at what approaches have been explored, what has worked, and what has not. We will see that there is still a large amount of work to be done and data mining is going to play a central role. We will argue that, in order to successfully find bad guys, we will have to embrace a solution that not only leverages clever data mining, but employs the right mix of human computer interfaces, data mining, and scalable data platforms.
AfterGlow is a script that assists with the visualization of log data. It reads CSV files and converts them into a graph description. Check out http://afterglow.sf.net for more information.
This short presentation gives an overview of AfterGlow and outlines the features and capabilities of the tool. It discusses some of the harder to understand features by showing some configuration examples that can be used as a starting point for some more sophisticated setups.
AfterGlow is one of the most downloaded security visualization tools with over 17,000 downloads.
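The CSV-to-graph idea described above can be sketched in a few lines. This is not AfterGlow's code and it ignores AfterGlow's configuration features; `csv_to_dot` is a hypothetical helper that assumes each CSV row holds a source and a target in its first two columns and emits Graphviz DOT text with one edge per distinct pair.

```python
import csv
import io


def csv_to_dot(csv_text: str) -> str:
    """Turn two-column CSV rows (source,target) into a Graphviz DOT digraph."""
    edges = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        edges.add((row[0].strip(), row[1].strip()))

    lines = ["digraph logs {"]
    for source, target in sorted(edges):
        # Node names are quoted so that IP addresses are valid DOT identifiers.
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "10.0.0.1,10.0.0.2\n10.0.0.1,10.0.0.2\n10.0.0.3,10.0.0.2\n"
    print(csv_to_dot(sample))
```

The resulting text can be handed to a graph layout tool. Deduplicating edges is a design choice in this sketch; a real tool might instead map the repeat count to edge thickness or color.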
Supercharging Visualization with Data Mining (Raffael Marty)
We are exploring how data mining can help visualization. I am giving examples of security visualizations and am discussing how data mining best augments visualization efforts.
Security Visualization - Let's Take A Step Back (Raffael Marty)
I gave the keynote at VizSec 2012. I used the opportunity to take a step back to see where security visualization stands and to propose a challenge around the problems we should be focusing on going forward.
Video recording is here: http://youtu.be/AEAs7IzTHMo
"NewLo": the New Loyalty Program for the Web3 Era (pjnewlo)
Points-based loyalty programs have long acted as an accelerator for various activities in the economy. However, new economic trends (the creator economy and the token economy), new technologies such as web3 and AI, and increasing globalization are emerging. As these change society and the economy, we believe it is time for loyalty programs to reconsider how they are configured and how efficient they are.
“NewLo” is a brand new loyalty program, which converts points into tokens.
Ethics guidelines for trustworthy AI (HIGH-LEVEL EXPERT GROUP ON ARTIFICIAL I... (prb404)
On 8 April 2019, the High-Level Expert Group on AI presented Ethics Guidelines for Trustworthy Artificial Intelligence. This followed the publication of the guidelines' first draft in December 2018 on which more than 500 comments were received through an open consultation.
According to the Guidelines, trustworthy AI should be:
(1) lawful - respecting all applicable laws and regulations
(2) ethical - respecting ethical principles and values
(3) robust - both from a technical perspective while taking into account its social environment
3. Raffael Marty
• Sophos
• PixlCloud
• Loggly
• Splunk
• ArcSight
• IBM Research
• SecViz
• Logging
• Big Data
• ML & AI
• SIEM
• Leadership
• Zen
4.
The master of Kennin temple was Mokurai. He had a little
protégé named Toyo who was only twelve years old. Toyo saw
how students entered the master's room each day and received
instructions and guidance in Zen. The young boy wished to do
zazen (meditation) as well. Upon convincing Mokurai, he went in
front of the master who gave him the following koan to ponder:
"You can hear the sound of two hands when
they clap together," said Mokurai. "Now show
me the sound of one hand."
5. Outline
• Big Data for Security
• A Security (Big) Data Journey
• Machine Learning and Artificial Intelligence
• Data Visualization
• Solving Security Problems with Data
• A Glimpse Into the Future
• My 5 Security Big Data Challenges
7. “memory has become the new hard disk,
hard disks are the tapes of years ago.”
-- unknown source
8. Security Data
Data
• infrastructure / network logs (flows, dns, dhcp,
proxy, routing, IPS, DLP, …)
• host logs (file access, process launch, socket
activity, etc.)
• HIPS, anti virus, file integrity
• application logs (Web, SAP, HR, …)
• metrics
• configuration changes (host, network
equipment, physical access, applications)
• indicators of compromise (threat feeds)
• physical access logs
• cloud instrumentation data
• change tickets
• incident information
Context
• asset information and classification
• identity context (roles, etc.)
• information classification and location (tracking
movement?)
• HR / personnel information
• vulnerability scans
• configuration information for each machine, network
device, and application
9. Big Data Systems – A Complex Ecosystem
Storing any kind of data
o Schema-less but with schema on demand
o Storing event data (time-series data, logs)
o Storing metrics
Data access
o Fast random access
o Ad-hoc analytical workloads
o Search
o Running models (data science)
Data processing needs
o Metric generation from raw logs
o Real-time matching against high volume
threat feeds
o Anonymization
o Building dynamic context from the data
o Enrichment with entity information
Use-cases (these require the capabilities above)
• Situational awareness / dashboards
• Alert triage
• Forensic investigations
• Incident management
• Reports (e.g., for compliance)
• Data sharing / collaboration
• Hunting
• Anomaly detection
• Behavioral analysis
• Pattern detection
• Scoring
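Two of the data processing needs on this slide, matching against threat feeds and enrichment with entity information, can be sketched together. This is a minimal illustration under stated assumptions: the field names (`src_ip`, `dst_ip`), the feed contents, and the asset table are all hypothetical, and a production system would need a streaming engine and a feed far larger than an in-memory set.

```python
from typing import Dict, Iterable, Iterator, Set


def enrich_events(
    events: Iterable[Dict],
    threat_ips: Set[str],
    assets: Dict[str, Dict],
) -> Iterator[Dict]:
    """Tag each event with a threat-feed match and the source asset's context."""
    for event in events:
        enriched = dict(event)  # leave the raw event untouched
        # Set membership keeps the per-event feed lookup constant time on average.
        enriched["threat_match"] = event.get("dst_ip") in threat_ips
        # Entity context (owner, criticality, ...) looked up by source address.
        enriched["src_asset"] = assets.get(event.get("src_ip"), {})
        yield enriched


if __name__ == "__main__":
    feed = {"203.0.113.7"}                                   # hypothetical IOC feed
    asset_db = {"10.0.0.5": {"owner": "hr", "critical": True}}  # hypothetical assets
    stream = [
        {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},
        {"src_ip": "10.0.0.9", "dst_ip": "198.51.100.1"},
    ]
    for item in enrich_events(stream, feed, asset_db):
        print(item)
```

Writing the function as a generator means it can sit in the ingest path and process events one at a time, which is the shape a real-time pipeline needs.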
10. Are Today’s Systems Ready For Big Data Use Cases?
Data Sources
• Haven’t been built with analysis in mind
• Logs are incomplete
• Log formats are not standardized
Log mgmt | SIEM | “Big Data Lakes”
• Don’t scale well to volumes, variety, and velocity
• No standard data pipelines – results in point to point integrations that are
imperfect
• No standard storage concepts – results in data duplication
• No standard use-cases – results in ‘spaghetti architectures’
12. (Incomplete) Security Data History
“Big Data Is An Old Problem in Security”
Timeline years: 1980, 1996, 2004, 2006, 2009, 2014, 2016
Firewalls, IPSs, OSs, Apps, Infra, etc.
Security Big Data
• syslogd(8)
• Log Management and first SIM
• “Big Data” in security
• RDBMS (way earlier already)
• CEF Standard (2007 CEE)
• First logging as a service offering
• Security Data Lake
• Apache Metron (Open SOC)
• Apache Spot
• Distributed storage and processing (Hadoop 0.1.0)
• AWS (re-launch)
• Kafka
• Separation of query engines and data stores (Presto, Drill, parquet, etc.)
• Continued innovation on cloud platforms (Athena, S3, etc.)
• First RAID conference (ML / AD)
• ML is slow and missing training data
• First VizSec conference
• Device and user-context correlation
• First “security analytics” solution
• Deep Learning in security (traffic and malware identification)
• “Big Bang of Deep Learning”
• First unstructured data store and search engine (Solr)
• Columnar data stores become popular (MonetDB, etc.)
• R (previously S)
Data Lake
Data centralization
Data insight
13. Security Data – The State Today
• “Security Data Lakes – an excuse to collect anything without having to think
about schemas and access patterns.”
• Data and infrastructure challenges to overcome
o Data standardization (parsing, schemas)
- Meaning of log entries and fields within
- When is a log generated, when not?
o Data infrastructure
- One architecture for all use-cases
- Self maintaining and healing
o Building ‘content’ across customers?
- Different policies
- Different data sources and configurations
o Data Privacy
15. ML and AI – What Is It?
• Machine learning – Algorithmic ways to “describe” data
o Supervised
- We are giving the system a lot of training data and it learns from that
o Unsupervised
- We give the system some kind of optimization to solve (clustering, dim reduction)
• Deep learning – a ‘newer’ machine learning algorithm
o Eliminates the feature engineering step
o Verifiability issues
• Data Mining – Methods to explore data – automatically and interactively
• Artificial Intelligence – “Just calling something AI doesn’t make it AI.”
“A program that doesn't simply classify or compute model parameters, but
comes up with novel knowledge that a security analyst finds insightful.”
16. Machine Learning in Security
• Supervised
o Malware classification
- Deep learning on millions of samples - 400k new malware samples a day
- Has increased true positives and decreased false positives compared to traditional ML
o Spam identification
• Unsupervised
o Tier 1 analyst automation (reducing workload from 600M events to 100 incidents)*
o User and Entity Behavior Analytics (UEBA)
- Uses mostly regular statistics and rule-based systems
* See Respond Software Inc.
17. Application of Machine Learning - Anomaly Detection
Objective : Find ‘security incidents’ in the data –
deviations from the ‘norm’
• What’s “normal”?
• Needs explainability for clusters
• Observe clusters over time (requires stable
‘incremental’ clustering)
• Even a 0.01% false positive rate is too high (1m log records -> 100 false anomalies)
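The arithmetic behind the last bullet is simple but worth making explicit. The sketch below reproduces the slide's number and adds one derived quantity, alert precision; the figure of 10 real incidents used in the example is a hypothetical value chosen for illustration, not something stated in the talk.

```python
def expected_false_alarms(records: int, false_positive_rate: float) -> float:
    """Number of benign records that will be flagged at the given rate."""
    return records * false_positive_rate


def alert_precision(true_positives: float, false_positives: float) -> float:
    """Fraction of raised alerts that point at a real incident."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


if __name__ == "__main__":
    # 0.01% expressed as a fraction is 0.0001; the slide's 1m records give about 100.
    false_alarms = expected_false_alarms(1_000_000, 0.0001)
    print(f"false alarms per 1m records: {false_alarms:.0f}")

    # Hypothetical: even if all of 10 real incidents are caught,
    # most of what the analyst sees is noise.
    print(f"precision with 10 real incidents: {alert_precision(10, false_alarms):.1%}")
```

Because real incidents are rare compared to benign records, even a very small false positive rate leaves the analyst with mostly false alarms, which is why the slide calls 0.01% too high.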
18. Limits of Machine Learning
“Everyone calls their stuff ‘machine learning’ or even better ‘artificial intelligence’ - It’s not cool to
use statistics!”
“Companies are throwing algorithms on the wall to see what sticks - see security analytics market”
Machine Learning Challenges
• An algorithm is not the answer. It’s the process around it (find the best fit algorithm for the data
and use-case, feature engineering, supervision, drop outs, parameter choices, etc.)
• Even in deep learning, it’s not just about using TensorFlow. Features matter (e.g., independent
bytes versus program flow)
• The algorithms are only as good as the data and the knowledge of the data
o Common data layers / common data models
o Enriched data
o Clean data (e.g., source/destination confusions)
• How do we build systems that incorporate expert knowledge?
19. Illustration of Parameter Choices and Their Failures
• t-SNE clustering of network traffic from two types of machines
o perplexity = 3, epsilon = 3: No clear separation
o perplexity = 3, epsilon = 19: 3 clusters instead of 2
o perplexity = 93, epsilon = 19: What a mess
25. Visualization Overview
• Why?
o Verify output of machine generated intelligence
o Focus experts where they are most useful, rather than having them build tools / queries to
understand the data
o Enable exploration and hunting
• What are the limitations?
o Data is always a problem – we need clean, enriched data
o Visualization of large data sets
o Interpretation is hard
- “And the single port with no traffic is port 0, which is reserved [24]” found in “Visualization of large
scale Netflow data” by Nicolai H Eeg-Larsen
- “… and the destinations are Internet Web Server or DNS server or both with the port 0.”
- “.. so many TCP port scans are distributed in the whole day that most of them can be considered as
false positives.”
https://www.researchgate.net/publication/257686749_IDSRadar_A_real-time_visualization_framework_for_IDS_alerts
26. VAST Challenge 2013 Submission – Spot the Problems?
dest port!
Port 70000?
src ports!
http://vis.pku.edu.cn/people/simingchen/docs/vastchallenge13-mc3.pdf
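The "Port 70000?" annotation points at a data problem that can be caught before anything is drawn: TCP and UDP port numbers are 16-bit values, so anything outside 0 to 65535 cannot be a real port. The check below is a minimal sketch; the record layout and the field names `src_port` and `dst_port` are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

MAX_PORT = 65535  # ports are 16-bit unsigned integers


def invalid_ports(
    records: Iterable[Dict],
    fields: Tuple[str, ...] = ("src_port", "dst_port"),
) -> List[Tuple[int, str, object]]:
    """Return (record index, field name, value) for every implausible port."""
    problems = []
    for index, record in enumerate(records):
        for field in fields:
            value = record.get(field)
            if not isinstance(value, int) or not 0 <= value <= MAX_PORT:
                problems.append((index, field, value))
    return problems


if __name__ == "__main__":
    flows = [
        {"src_port": 51234, "dst_port": 443},
        {"src_port": 70000, "dst_port": 80},    # impossible value, likely a parsing bug
        {"src_port": 40000, "dst_port": None},  # missing field
    ]
    for problem in invalid_ports(flows):
        print(problem)
```

A check like this does not catch swapped source and destination columns, which the other two annotations on the slide hint at; that needs knowledge of which side of the connection is the service.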
27. Visualization Challenges
• Backend
o Super quick data access in any possible way (search, scan, summarize)
o Ability to ingest any data source - intelligent parsing anyone?
• User Interface
o The right visualization paradigms
o How to visualize 1m records?
o The right data abstractions / summarizations / aggregations
o Easy to use and still flexible enough
• Data Science
o Make the machine help us interpret the data
• How to encode domain knowledge?
30. Solving Security Problems With Data
Objective: Automatically detect “problems” / attacks with data
Solution: Not ML or AI – the right process for the problem at hand
• Any data science approach:
o Encode domain knowledge – leverage trained experts (e.g., malware classification with n-grams, or
URLs)
o Involve the right ‘entities’ (e.g., push problems out to the end user)
o Collect the right data for the given use-cases – don’t forget context and cleaning
o Plan for expert feedback / validation loop
o Build solutions for actual problems with real data that produce actionable insight
o Share your insights with your peers – security is not your competitive advantage
• Supervised:
o Be selective on the problems that have good, large training data sets
• Unsupervised:
o We need good distance functions. Ones that encode domain knowledge!
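What a distance function that encodes domain knowledge might look like can be sketched for network flows. This is an illustration only: the record fields (`dst_ip`, `dst_port`), the equal weighting, and the specific scoring values are assumptions, not the author's method. The domain knowledge used is that IPv4 addresses sharing a long prefix tend to sit in the same network, and that port numbers are labels rather than magnitudes, so port 80 is not "closer" to port 81 than to port 443.

```python
import ipaddress


def ip_distance(a: str, b: str) -> float:
    """0.0 for identical IPv4 addresses, 1.0 when not even the first bit matches."""
    x = int(ipaddress.IPv4Address(a))
    y = int(ipaddress.IPv4Address(b))
    shared_prefix_bits = 32 - (x ^ y).bit_length()
    return 1.0 - shared_prefix_bits / 32


def port_class(port: int) -> str:
    """IANA port ranges: system (0-1023), user (1024-49151), dynamic (49152-65535)."""
    if port < 1024:
        return "system"
    if port < 49152:
        return "user"
    return "dynamic"


def port_distance(p: int, q: int) -> float:
    """Ports are categories: equal, same range, or different range."""
    if p == q:
        return 0.0
    return 0.5 if port_class(p) == port_class(q) else 1.0


def flow_distance(f1: dict, f2: dict) -> float:
    """Equal-weight mix of destination address and destination port distance."""
    return 0.5 * ip_distance(f1["dst_ip"], f2["dst_ip"]) + 0.5 * port_distance(
        f1["dst_port"], f2["dst_port"]
    )


if __name__ == "__main__":
    web = {"dst_ip": "10.0.0.1", "dst_port": 80}
    tls = {"dst_ip": "10.0.0.2", "dst_port": 443}
    print(flow_distance(web, tls))
```

A plain Euclidean distance on the raw numbers would treat the gap between ports 80 and 443 as 363 units and dominate any clustering; a function like this one keeps the comparison on terms an analyst would recognize.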
31. Applications of Data in Security
• Prioritize event and entity data
• Rule-based correlations
• Behavior modeling
• Risk / exposure / threat computation
• Configuration assessments
• Data classification
• Data abstraction
• Cross ‘boundary’ data sharing
• Cross ‘customer’ analytics
• Crowd intelligence
• Enable free-form exploration
• Identify and attribute attacks
• Incident response
• Improve prevention
• Allocate / prioritize work / resources
• Situational awareness
• Understand exposure
• Risk inventory
• Spam, malware detection
• Feedback loop on initiatives
• Simplify security
• Continuous attestation
• Micro segmentation
• Risk informed, dynamic enforcement
(automation)
Data Data Operations Applications
Data is a core driver for many or most security use-cases
32. A Glimpse Into The Future
http://www.aberdeenessentials.com/techpro-essentials/business-leaders-can-utilize-data-even-without-technology-background/
33. My Magic 8 Ball
• Data is distributed across the edge and (a) central data store
o We will have a (data lake)++ in every company with all security data (likely in the cloud)
o Centralize data for correlation (could we get a decentralized correlation system?)
o Keep raw sensor data at the edge and access through federated query system
o Threat intelligence will be tailored to your organization and exchanged in real-time
• APIs will be everywhere to let products integrate with each other
• Security Analytics as a product category, as well as orchestration will merge with the data platforms
(SIEM++)
• Algorithms take a back seat – insights are key
o Nobody cares whether you call something artificial intelligence or machine learning. It’s about actual results
o Products will learn from users more and more
• Startups will deliver innovation, but only large organizations will be able to deliver on the overall security
promise
• Detection is great. Protection is key. Closing the loop between insight and action.
o Continuous attestation
o Risk-based defense
• No 3D visualizations
34. Thoughts on How We Get There
• Focus on three types of users
o Data scientists and hunters – who know how to program, have security domain knowledge, and can find complex insights
o Security analysts – who use product interfaces to deal with security issues that the system couldn’t deal with automatically
o Non security experts – who need insight into what is happening, but don’t know enough to intervene
• AWS will productize the ’all encompassing data backend’ (others will contribute the technology)
o Abstracting the data storage layer
o Self-optimizing and monitoring query engine
• Hire and train good UX people
• Hire and train security domain experts
o “A course doesn’t make you a data scientist – not a good one at least.” It’s about the domain knowledge!
• Use deep belief networks rather than deep learning
• Build systems that help analysts and experts be more effective
o Don’t try to replace them - let them do the interesting work
o Don’t make up use-cases. Go into organizations and learn what the real problems are
o Understand the user personas you are catering to
o Stop building islands of products – SA is a feature – how do we build that on top of a common platform?
o Move away from algorithm thinking into use-cases and workflows
• Collect all your data (network and endpoint) in one data store
36. My 5 Challenges
• Establish a pattern / algorithm / use-case sharing effort
• Define a common data model everyone can buy into (CIM, CEF, CEE, Spot,
etc.)
o Including a semantic component for log records, not just syntax
• Build a common entity store
o Hooked up to a stream of data it automatically extracts entities and creates a state
store
o Allows for fast enrichment of data at ingest and query time
o Respects and enforces privacy
• Design a great CISO dashboard (framework)
o Risk and “security efficiency” oriented, actionable views
• Develop systems that ‘absorb’ expert knowledge non-intrusively
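The common entity store challenge above (extract entities from a stream, keep a state store, enrich at ingest and query time) can be sketched as a small in-memory class. Everything here is hypothetical: the class name `EntityStore`, the event fields (`ip`, `hostname`, `user`, `mac`, `ts`), and the choice to key entities by IP address. The privacy requirement from the list is not addressed in this sketch.

```python
from collections import defaultdict
from typing import Dict


class EntityStore:
    """Keeps the latest known attributes per IP address, learned from events."""

    TRACKED = ("hostname", "user", "mac")

    def __init__(self) -> None:
        self._state: Dict[str, Dict] = defaultdict(dict)

    def observe(self, event: Dict) -> None:
        """Extract entity attributes from an event (e.g. a DHCP or login record)."""
        ip = event.get("ip")
        if ip is None:
            return
        for key in self.TRACKED:
            if key in event:
                self._state[ip][key] = event[key]
        self._state[ip]["last_seen"] = event.get("ts")

    def enrich(self, record: Dict, ip_field: str = "src_ip") -> Dict:
        """Return a copy of the record with the current entity state attached."""
        entity = self._state.get(record.get(ip_field), {})
        return {**record, "entity": dict(entity)}


if __name__ == "__main__":
    store = EntityStore()
    store.observe({"ip": "10.0.0.5", "hostname": "hr-laptop-1", "ts": 100})
    store.observe({"ip": "10.0.0.5", "user": "alice", "ts": 160})
    print(store.enrich({"src_ip": "10.0.0.5", "dst_port": 443}))
```

Keeping only the latest state is the simplest choice; since address assignments change over time, a store meant for query-time enrichment of older records would need to keep the history of each attribute with timestamps.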