Ensuring the security of a company’s data and infrastructure has largely become a data analytics challenge. It is about finding and understanding patterns and behaviors that indicate malicious activity or deviations from the norm. Data, analytics, and visualization are used together to gain insights and discover those malicious activities. The three components play off each other, but each also has its inherent challenges. A few examples will be given to explore and illustrate some of these challenges.
3. Security – Shift Towards Analytics
Past Present Future
Prevention
• Single instance focus
• AV, firewalls, IDS
• Cross entity intelligence
• Synchronized security
Detection
• Data collection and centralization
• Big data technologies
• Machine learning attempts
• Many challenges
• Prediction?
• Machine assisted insights
• UX focus
• Patterns, behaviors, collaboration
+
• Data driven
learn
Why the shift? Attackers use novel and specific methods to compromise each target.
5. Data
• Types of data
o Time-series (with lots of categorical fields)
o Context (spatial data) – Entities, blacklists, etc.
o Multiple records for one “transaction” (fusion?)
• Many access use-cases
o Lookups / joins (external services also)
o Search, aggregate, compute, … (One interface? (extended) SQL?)
• Data challenges
o Collection (many data formats, many transports)
o Scale (storage cost, access speed)
o Encryption (transparent, fast)
o Operational challenges (bottlenecks, etc.)
o Collaboration (security, transport)
o How to find relevant insights? Not statistical anomalies!
• Can we get a reference implementation? The proverbial hair ball
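The lookup/join and "fusion" use-cases above can be sketched in a few lines. This is only an illustration under stated assumptions: the blacklist, the asset table, the record fields (`ts`, `txn`, `src`, `dst`), and the helper names `enrich` and `fuse` are all hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical context data: a blacklist and an entity (asset) table.
BLACKLIST = {"203.0.113.7"}
ASSETS = {"10.0.0.5": "finance-laptop", "10.0.0.9": "build-server"}

def enrich(record):
    """Return a copy of a log record with context fields joined in."""
    out = dict(record)
    out["src_asset"] = ASSETS.get(record["src"], "unknown")
    out["dst_blacklisted"] = record["dst"] in BLACKLIST
    return out

def fuse(records, key="txn"):
    """Group the records that belong to one transaction under a shared key."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return dict(groups)

# Hypothetical time-series records; two of them describe the same transaction.
records = [
    {"ts": 1, "txn": "a", "src": "10.0.0.5", "dst": "203.0.113.7"},
    {"ts": 2, "txn": "a", "src": "10.0.0.5", "dst": "203.0.113.7"},
    {"ts": 3, "txn": "b", "src": "10.0.0.9", "dst": "198.51.100.2"},
]
enriched = [enrich(r) for r in records]
transactions = fuse(enriched)
```

In practice the context tables would likely live in an external service or database, which is where the access-speed and operational challenges listed above come in.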
6. Analytics
• Mostly anomaly / outlier detection! Finding attacker behavior in the data
o But what’s normal? This is not about statistical outliers!
• Approaches
o Cohort analysis (users and machines) -> e.g., clustering
o Hypothesis implementation -> e.g., beacon detection
o “Learning” behavior -> e.g., interactive visualization of metrics
• Analytics challenges
o Categorical data
o Large amounts of data
o Statistical vs. actual anomalies
o Distance functions
o Not a ‘closed’ system
• We need humans in the loop! And that’s where visualization comes in.
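Categorical data and distance functions are listed above as challenges for cohort analysis. One possible distance for set-valued categorical data is the Jaccard distance. The sketch below is an assumption for illustration, not a method taken from the slides; the host names, port sets, and function names are hypothetical.

```python
def jaccard_distance(a, b):
    """Distance between two sets of categorical values (0 = identical, 1 = disjoint)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

# Hypothetical data: destination ports observed per machine.
ports_by_host = {
    "ws-01": {80, 443, 53},
    "ws-02": {80, 443, 53, 123},
    "db-01": {5432, 22},
}

def nearest_peer_distance(host):
    """Distance from a host to its most similar peer; a large value suggests
    the host does not fit any cohort in this data set."""
    return min(
        jaccard_distance(ports_by_host[host], ports)
        for other, ports in ports_by_host.items()
        if other != host
    )
```

A host that is far from every peer is only a candidate for review. Whether it is an actual anomaly rather than a statistical one still needs a human judgment.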
Analytics drives visualization.
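As an illustration of "hypothesis implementation", beacon detection can be approximated by checking how regular the gaps between connections are. This is a minimal sketch under assumptions: the coefficient-of-variation score, the 0.1 threshold, and the sample timestamps are all hypothetical choices, not taken from the slides.

```python
from statistics import mean, pstdev

def beacon_score(timestamps):
    """Coefficient of variation of inter-arrival times; near 0 means very regular."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 3 or mean(gaps) == 0:
        return None  # too few events (or no spacing) to judge
    return pstdev(gaps) / mean(gaps)

def looks_like_beacon(timestamps, threshold=0.1):
    """Flag a connection series whose gaps vary little (threshold is an assumption)."""
    score = beacon_score(timestamps)
    return score is not None and score < threshold

# Hypothetical connection times in seconds for one source/destination pair.
regular = [0, 60, 120, 181, 240, 300]   # roughly one connection per minute
bursty = [0, 2, 3, 90, 91, 400]         # irregular, human-like activity
```

Real beacons often add jitter, so a fixed threshold like this would miss some and flag benign periodic traffic such as update checks. The output is a starting point for an analyst, not a verdict.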
8. Why Visualization?
• SELECT count(distinct protocol) FROM flows;
• SELECT count(distinct port) FROM flows;
• SELECT count(distinct src_network) FROM flows;
• SELECT count(distinct dest_network) FROM flows;
• SELECT port, count(*) FROM flows GROUP BY port;
• SELECT protocol,
count(CASE WHEN flows <= 200 THEN 1 END) AS [<=200],
count(CASE WHEN flows >= 201 AND flows <= 300 THEN 1 END)
AS [201 - 300],
count(CASE WHEN flows >= 301 AND flows <= 350 THEN 1 END)
AS [301 - 350],
count(CASE WHEN flows >= 351 THEN 1 END) AS [>=351]
FROM flows GROUP BY protocol;
• SELECT port, count(distinct src_network) FROM flows GROUP BY
port;
• SELECT src_network, count(distinct dest_network) FROM flows
GROUP BY src_network;
• SELECT src_network, count(distinct dest_network) AS dn,
sum(flows) FROM flows GROUP BY src_network;
• SELECT port, protocol, count(*) FROM flows GROUP BY port,
protocol;
• SELECT sum(flows), dest_network FROM flows GROUP BY
dest_network;
• etc.
[Figure: visualization with the axes port, dest_network, protocol, src_network, flows]
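To get a feel for how many separate queries it takes to understand even a small table, a few of the queries listed above can be run against an in-memory SQLite database. This is only a sketch: the schema follows the column names used on this slide, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flows (port INTEGER, protocol TEXT, "
    "src_network TEXT, dest_network TEXT, flows INTEGER)"
)
# Hypothetical rows, invented for illustration only.
rows = [
    (80, "tcp", "10.0.1.0/24", "192.0.2.0/24", 150),
    (443, "tcp", "10.0.1.0/24", "198.51.100.0/24", 320),
    (53, "udp", "10.0.2.0/24", "192.0.2.0/24", 40),
    (443, "tcp", "10.0.2.0/24", "192.0.2.0/24", 500),
]
conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?, ?)", rows)

# Each question about the data needs its own query.
distinct_ports = conn.execute(
    "SELECT count(DISTINCT port) FROM flows").fetchone()[0]
per_port = dict(conn.execute(
    "SELECT port, count(*) FROM flows GROUP BY port"))
flows_per_dest = dict(conn.execute(
    "SELECT dest_network, sum(flows) FROM flows GROUP BY dest_network"))
```

Each result answers exactly one question, and relationships across dimensions stay hidden until the right query is asked. That is the gap a visualization over all five columns is meant to close.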
10. Sophos – Security Made Simple
• For non-experts
• Consolidating security capabilities
• Open architecture
• Data science to SOLVE problems, not to highlight issues
Analytics
UTM/Next-Gen Firewall
Wireless
Web
Email
Disk Encryption
File Encryption
Endpoint / Next-Gen Endpoint
Mobile
Server
Sophos Central