Hacks, Leaks, and Revelations: The Art of Analyzing Hacked and Leaked Data / Взломы, утечки и разоблачения: Искусство анализа взломанных и просочившихся данных
Year: 2024
Author: Micah Lee / Ли Мика
Publisher: No Starch Press
ISBN: 978-1-7185-0313-7
Language: Russian
Format: PDF (Not True), EPUB
Quality: Publisher's layout or text (eBook)
Interactive table of contents: Yes
Number of pages: 640
Description: Data-science investigations have brought journalism into the 21st century, and, guided by The Intercept’s infosec expert Micah Lee, this book is your blueprint for uncovering hidden secrets in hacked datasets. In the current age of hacking and whistleblowing, the internet contains massive troves of leaked information. These complex datasets can be goldmines of revelations in the public interest, if you know how to access and analyze them. For investigative journalists, hacktivists, and amateur researchers alike, this book provides the technical expertise needed to find and transform unintelligible files into groundbreaking reports.
Guided by renowned investigative journalist and infosec expert Micah Lee, who helped secure Edward Snowden’s communications with the press, you’ll learn the tools, technologies, and programming basics needed to crack open and interrogate datasets freely available on the internet or your own private datasets obtained directly from sources. Each chapter features hands-on exercises using real hacked data from governments, companies, and political groups, as well as interesting nuggets from datasets that never made it into published stories. You’ll dig into hacked files from the BlueLeaks law enforcement records, analyze social-media traffic related to the 2021 attack on the U.S. Capitol, and get the exclusive story of privately leaked data from anti-vaccine group America’s Frontline Doctors.
Table of Contents
ACKNOWLEDGMENTS

INTRODUCTION
Why I Wrote This Book; What You’ll Learn; What You’ll Need

PART I: SOURCES AND DATASETS

1 PROTECTING SOURCES AND YOURSELF
Safely Communicating with Sources; Working with Public Data; Protecting Sensitive Information; Minimizing the Digital Trail; Working with Hackers and Whistleblowers; Secure Storage for Datasets; Low-Sensitivity Datasets; Medium-Sensitivity Datasets; High-Sensitivity Datasets; Authenticating Datasets; The AFLDS Dataset; The WikiLeaks Twitter Group Chat; Redaction; What Data to Publish; What to Redact; Making Requests for Comment; Password Managers; Disk Encryption; Exercise 1-1: Encrypt Your Internal Disk; Windows; macOS; Linux; Exercise 1-2: Encrypt a USB Disk; Windows; macOS; Linux; Protecting Yourself from Malicious Documents; Exercise 1-3: Install and Use Dangerzone; Summary

2 ACQUIRING DATASETS
The End of WikiLeaks; Distributed Denial of Secrets; Downloading Datasets with BitTorrent; The Origins of BlueLeaks; Exercise 2-1: Download the BlueLeaks Dataset; Communicating with Encrypted Messaging Apps; Exercise 2-2: Install and Practice Using Signal; Encrypting Messages with PGP; Staying Anonymous Online with Tor and OnionShare; Exercise 2-3: Play with Tor and OnionShare; Communicating with My Tea Party Patriots Source; Other Options for Acquiring Datasets from Sources; Encrypted USB Drives; Virtual Private Servers; Whistleblower Submission Systems; Summary

PART II: TOOLS OF THE TRADE

3 THE COMMAND LINE INTERFACE
Introducing the Command Line; The Shell; Users and Paths; User Privileges; Exercise 3-1: Install Ubuntu in Windows; Basic Command Line Usage; Opening a Terminal; Clearing Your Screen and Exiting the Shell; Exploring Files and Directories; Navigating Relative and Absolute Paths; Changing Directories; Using the help Argument; Accessing Man Pages; Tips for Navigating the Terminal; Entering Commands with Tab Completion; Editing Commands; Dealing with Spaces in Filenames; Using Single Quotes Around Double Quotes; Installing and Uninstalling Software with Package Managers; Exercise 3-2: Manage Packages with Homebrew on macOS; Exercise 3-3: Manage Packages with apt on Windows or Linux; Exercise 3-4: Practice Using the Command Line with cURL; Download a Web Page with cURL; Save a Web Page to a File; Text Files vs. Binary Files; Exercise 3-5: Install the VS Code Text Editor; Exercise 3-6: Write Your First Shell Script; Navigate to Your USB Disk; Create an Exercises Folder; Open a VS Code Workspace; Write the Shell Script; Run the Shell Script; Exercise 3-7: Clone the Book’s GitHub Repository; Summary

4 EXPLORING DATASETS IN THE TERMINAL
Introducing for Loops; Exercise 4-1: Unzip the BlueLeaks Dataset; Unzip Files on macOS or Linux; Unzip Files on Windows; Organize Your Files; How the Hacker Obtained the BlueLeaks Data; Exercise 4-2: Explore BlueLeaks on the Command Line; Calculate How Much Disk Space Folders Use; Use Pipes and Sort Output; Create an Inventory of Filenames in a Dataset; Count the Files in a Dataset; Exercise 4-3: Find Revelations in BlueLeaks with grep; Filter for Documents Mentioning Antifa; Filter for Certain Types of Files; Use grep with Regular Expressions; Search Files in Bulk with grep; Encrypted Data in the BlueLeaks Dataset; Data Analysis with Servers in the Cloud; Exercise 4-4: Set Up a VPS; Generate an SSH Key; Add Your Public Key to the Cloud Provider; Create a VPS; SSH into Your Server; Start a Byobu Session; Install Updates; Exercise 4-5: Explore the Oath Keepers Dataset Remotely; Summary

5 DOCKER, ALEPH, AND MAKING DATASETS SEARCHABLE
Introducing Docker and Linux Containers; Exercise 5-1: Initialize Docker Desktop on Windows and macOS; Exercise 5-2: Initialize Docker Engine on Linux; Running Containers with Docker; Running an Ubuntu Container; Listing and Killing Containers; Mounting and Removing Volumes; Passing Environment Variables; Running Server Software; Freeing Up Disk Space; Exercise 5-3: Run a WordPress Site with Docker Compose; Make a docker-compose.yaml File; Start Your WordPress Site; Introducing Aleph; Exercise 5-4: Run Aleph Locally in Linux Containers; Using Aleph’s Web and Command Line Interfaces; Indexing Data in Aleph; Exercise 5-5: Index a BlueLeaks Folder in Aleph; Mount Your Datasets into the Aleph Shell; Index the icefishx Folder; Check Indexing Status; Explore BlueLeaks with Aleph; Additional Aleph Features; Dedicated Aleph Servers; Summary

6 READING OTHER PEOPLE’S EMAIL
The Email Protocol and Message Structure; File Formats for Email Dumps; EML Files; MBOX Files; PST Outlook Data Files; Exercise 6-1: Download Email Dumps from Three Datasets; The Nauru Police Force Dataset; The Oath Keepers Dataset; The Heritage Foundation Dataset; Researching Email Dumps with Thunderbird; Exercise 6-2: Configure Thunderbird for Email Dumps; Reading Individual EML Files with Thunderbird; Exercise 6-3: Import the Nauru Police Force EML Email Dump; Searching Email in Thunderbird; Quick Filter Searches; The Search Messages Dialog; Exercise 6-4: Import the Oath Keepers MBOX Email Dump; Exercise 6-5: Import the Heritage Foundation PST Email Dump; Other Tools for Researching Email Dumps; Microsoft Outlook; Aleph; Summary

PART III: PYTHON PROGRAMMING

7 AN INTRODUCTION TO PYTHON
Exercise 7-1: Install Python; Windows; Linux; macOS; Exercise 7-2: Write Your First Python Script; Python Basics; The Interactive Python Interpreter; Comments; Math with Python; Strings; Exercise 7-3: Write a Python Script with Variables, Math, and Strings; Lists and Loops; Defining and Printing Lists; Running for Loops; Control Flow; Comparison Operators; if Statements; Nested Code Blocks; Searching Lists; Logical Operators; Exception Handling; Exercise 7-4: Practice Loops and Control Flow; Functions; The def Keyword; Default Arguments; Return Values; Docstrings; Exercise 7-5: Practice Writing Functions; Summary

8 WORKING WITH DATA IN PYTHON
Modules; Python Script Template; Exercise 8-1: Traverse the Files in BlueLeaks; List the Filenames in a Folder; Count the Files and Folders in a Folder; Traverse Folders with os.walk(); Exercise 8-2: Find the Largest Files in BlueLeaks; Third-Party Modules; Exercise 8-3: Practice Command Line Arguments with Click; Avoiding Hardcoding with Command Line Arguments; Exercise 8-4: Find the Largest Files in Any Dataset; Dictionaries; Defining Dictionaries; Getting and Setting Values; Navigating Dictionaries and Lists in the Conti Chat Logs; Exploring Dictionaries and Lists Full of Data in Python; Selecting Values in Dictionaries and Lists; Analyzing Data Stored in Dictionaries and Lists; Exercise 8-5: Map Out the CSVs in BlueLeaks; Accept a Command Line Argument; Loop Through the BlueLeaks Folders; Fill Up the Dictionary; Display the Output; Reading and Writing Files; Opening Files; Writing Lines to a File; Reading Lines from a File; Exercise 8-6: Practice Reading and Writing Files; Summary

PART IV: STRUCTURED DATA

9 BLUELEAKS, BLACK LIVES MATTER, AND THE CSV FILE FORMAT
Installing Spreadsheet Software; Introducing the CSV File Format; Exploring CSV Files with Spreadsheet Software and Text Editors; My BlueLeaks Investigation; Focusing on a Fusion Center; Introducing NCRIC; Investigating a SAR; Reading and Writing CSV Files in Python; Exercise 9-1: Make BlueLeaks CSVs More Readable; Accept the CSV Path as an Argument; Loop Through the CSV Rows; Display CSV Fields on Separate Lines; How to Read Bulk Email from Fusion Centers; Lists of Black Lives Matter Demonstrations; “Intelligence” Memos from the FBI and DHS; A Brief HTML Primer; Exercise 9-2: Make Bulk Email Readable; Accept the Command Line Arguments; Create the Output Folder; Define the Filename for Each Row; Write the HTML Version of Each Bulk Email; Discovering the Names and URLs of BlueLeaks Sites; Exercise 9-3: Make a CSV of BlueLeaks Sites; Open a CSV for Writing; Find All the Company.csv Files; Add BlueLeaks Sites to the CSV; Summary

10 BLUELEAKS EXPLORER
Undiscovered Revelations in BlueLeaks; Exercise 10-1: Install BlueLeaks Explorer; Create the Docker Compose Configuration File; Bring Up the Containers; Initialize the Databases; The Structure of NCRIC; Exploring Tables and Relationships; Searching for Keywords; Building Your Own BlueLeaks Structure; Defining the JRIC Structure; Showing Useful Fields; Changing Field Types; Adding JRIC’s Leads Table; Building a Relationship; Verifying BlueLeaks Data; Exercise 10-2: Finish Building the Structure for JRIC; The Technology Behind BlueLeaks Explorer; The Backend; The Frontend; Summary

11 PARLER, THE JANUARY 6 INSURRECTION, AND THE JSON FILE FORMAT
The Origins of the Parler Dataset; How the Parler Videos Were Archived; The Dataset’s Impact on Trump’s Second Impeachment; Exercise 11-1: Download and Extract Parler Video Metadata; Download the Metadata; Uncompress and Download Individual Parler Videos; Extract Parler Metadata; The JSON File Format; Understanding JSON Syntax; Parsing JSON with Python; Handling Exceptions with JSON; Tools for Exploring JSON Data; Counting Videos with GPS Coordinates Using grep; Formatting and Searching Data with the jq Command; Exercise 11-2: Write a Script to Filter for Videos with GPS from January 6, 2021; Accept the Parler Metadata Path as an Argument; Loop Through Parler Metadata Files; Filter for Videos with GPS Coordinates; Filter for Videos from January 6, 2021; Working with GPS Coordinates; Searching by Latitude and Longitude; Converting Between GPS Coordinate Formats; Calculating GPS Distance in Python; Finding the Center of Washington, DC; Exercise 11-3: Update the Script to Filter for Insurrection Videos; Plotting GPS Coordinates on a Map with simplekml; Exercise 11-4: Create KML Files to Visualize Location Data; Create a KML File for All Videos with GPS Coordinates; Create KML Files for Videos from January 6, 2021; Visualizing Location Data with Google Earth; Viewing Metadata with ExifTool; Summary

12 EPIK FAIL, EXTREMISM RESEARCH, AND SQL DATABASES
The Structure of SQL Databases; Relational Databases; Clients and Servers; Tables, Columns, and Types; Exercise 12-1: Create and Test a MySQL Server Using Docker and Adminer; Run the Server; Connect to the Database with Adminer; Create a Test Database; Exercise 12-2: Query Your SQL Database; INSERT Statements; SELECT Statements; JOIN Clauses; UPDATE Statements; DELETE Statements; Introducing the MySQL Command Line Client; Exercise 12-3: Install and Test the Command Line MySQL Client; MySQL-Specific Queries; The History of Epik; The Epik Hack; Epik’s WHOIS Data; Exercise 12-4: Download and Extract Part of the Epik Dataset; Exercise 12-5: Import Epik Data into MySQL; Create a Database for api_system; Import api_system Data; Exploring Epik’s SQL Database; The domain Table; The privacy Table; The hosting and hosting_server Tables; Working with Epik Data in the Cloud; Summary

PART V: CASE STUDIES

13 PANDEMIC PROFITEERS AND COVID-19 DISINFORMATION
The Origins of AFLDS; The Cadence Health and Ravkoo Datasets; Extracting the Data into an Encrypted File Container; Analyzing the Data with Command Line Tools; Creating a Single Spreadsheet of Patients; Calculating Revenue from Prescriptions Filled by Ravkoo; Finding the Price and Quantity of Drugs Sold; Categorizing Prescription Data by Drug; A Deeper Look at the Cadence Health Patient Data; Finding Cadence’s Partners; Searching for Patients by City; Searching for Patients by Age; Authenticating the Data; The Aftermath; HIPAA’s Breach Notification Rule; Congressional Investigation; Simone Gold’s New Business Venture; Scandal and Infighting at AFLDS; Summary

14 NEO-NAZIS AND THEIR CHATROOMS
How Antifascists Infiltrated Neo-Nazi Discord Servers; Analyzing Leaked Chat Logs; Making JSON Files Readable; Exploring Objects, Keys, and Values with jq; Converting Timestamps; Finding Usernames; The Discord History Tracker; A Script to Search the JSON Files; My Discord Analysis Code; Designing the SQL Database; Importing Chat Logs into the SQL Database; Building the Web Interface; Using Discord Analysis to Find Revelations; The Pony Power Discord Server; The Launch of DiscordLeaks; The Aftermath; The Lawsuit Against Unite the Right; The Patriot Front Chat Logs; Summary

AFTERWORD

A SOLUTIONS TO COMMON WSL PROBLEMS
Understanding WSL’s Linux Filesystem; The Disk Performance Problem; Solving the Disk Performance Problem; Storing Only Active Datasets in Linux; Storing Your Linux Filesystem on a USB Disk; Next Steps

B SCRAPING THE WEB
Legal Considerations; HTTP Requests; Scraping Techniques; Loading Pages with HTTPX; Parsing HTML with Beautiful Soup; Automating Web Browsers with Selenium; Next Steps

INDEX