Index
A
- ACM code of ethics / ACM code of ethics
- action items / Action items
- Address Resolution Protocol (ARP) / IP
- alert
- situations / When do you alert?
- types / When do you alert?
- sending / How do you alert?
- services / Alerting services
- initial string / What is in an alert?
- audience, selecting / Who do you alert?
- alerting / Alerting
- all clear scenario
- declaring / Calling all clear
- Amazon Web Services (AWS) / Real-world interaction design, Ethernet and TCP/IP, Cloud fundamentals
- Ansible / How should we change our capacity?
- application
- instrumenting / Instrumenting an application
- measuring / What should we measure?
- application programming interface (API) / Separation of concerns
- authentication
- suggestions / Authentication
- authorization
- about / Authorization
- automation
- about / Automation
- building / Automation
- testing / Automation
- distributing / Automation
- as continuous category / Continuous everything
- autoscaling / Autoscaling
B
- Beats / ELK
- Best Current Practice (BCP) / A quick introduction to business finance
- best practices, code writing
- version control, using / Advice for writing code
- code reviews, conducting / Advice for writing code
- designated owner, assigning to projects / Advice for writing code
- humans / Advice for writing code
- blameless postmortems
- about / Blameless postmortems
- Border Gateway Protocol (BGP) / IP
- business finance
- about / A quick introduction to business finance
- economics jargon / A quick introduction to business finance
C
- Cacti / Cacti
- capacity, modifying
- about / How should we change our capacity?
- state and concurrency, checking / State and concurrency
- service limitations / Is your service limited by another service?
- events, scaling for / Scaling for events
- user-generated content (UGC) / Unpredictable growth–user-generated content
- preplanned, versus autoscaling / Preplanned versus autoscaling
- delivery criteria / Delivering
- categories, testing code
- unit tests / Unit, feature, and integration tests, Unit tests
- feature tests / Unit, feature, and integration tests, Feature tests
- integration tests / Unit, feature, and integration tests, Integration tests
- Chaos Engineering / Testing infrastructure
- checklist, for software writing
- monitoring / Building projects
- incident response / Building projects
- postmortems / Building projects
- testing and releasing / Building projects
- capacity planning / Building projects
- Chef / How should we change our capacity?
- Classless Inter-Domain Routing (CIDR) / CIDR notation
- cloud
- containers / Containers
- load balancing / Load balancing
- queues / Queues and Pub/Sub
- Pub/Sub / Queues and Pub/Sub
- Cloud
- fundamentals / Cloud fundamentals
- VMs / VMs
- autoscaling / Autoscaling
- storage, types / Storage
- CMS (content management systems) / Architecture–where performance changes come from
- command line interface (CLI) / An introduction to design and UX
- communication
- starting / Communication
- Incident Command System (ICS), using / Incident Command System (ICS)
- instances / Where do you communicate?
- containers / Containers
- Content Distribution Network (CDN) / Architecture–where performance changes come from
- content management system (CMS) / Developer experience
- cron job / Preplanned versus autoscaling
- curl / curl and wget
D
- data recovery testing / Testing infrastructure
- dd tool / How should we change our capacity?
- design
- about / An introduction to design and UX
- devices / Devices
- Disaster Recovery Testing (DIRT)
- reference / Testing processes
- DNS (domain name system) / DNS
- Docker / How should we change our capacity?
E
- economics jargon, business finance
- cash flow / A quick introduction to business finance
- Profit and loss (P&L) / A quick introduction to business finance
- balance sheet / A quick introduction to business finance
- Capex / A quick introduction to business finance
- Opex / A quick introduction to business finance
- Return on investment (ROI) / A quick introduction to business finance
- cost center versus profit center / A quick introduction to business finance
- Elasticsearch, Logstash, and Kibana (ELK) / ELK
- ELK / ELK
- Engineering / What is SRE?
- error budgets / A short introduction to SLIs, SLOs, and error budgets, Error budgets
- Ethernet / Ethernet and TCP/IP, Ethernet
- example architecture interview / Example architecture interview
F
- 18F / Design documents
- Federal Aviation Administration (FAA) / Alerting
- Free Software Foundation (FSF) / Linux fundamentals
G
- General Public License (GPL) / Linux fundamentals
- Glad Mad Sad framework / Retrospectives and standups
- Go language
- reference / References and related reading
- goreplay
- reference / Testing infrastructure
- graphical user interface (GUI) / An introduction to design and UX
H
- HTTP (Hypertext Transfer Protocol) / HTTP
- HTTP request
- sending / Sending an HTTP request
- DNS (domain name system) / DNS
- Ethernet / Ethernet and TCP/IP
- TCP/IP / Ethernet and TCP/IP
- HTTP / HTTP
- wget / curl and wget
- curl / curl and wget
- human resources (HR) / A quick introduction to business finance
I
- incident / What is an incident?
- incident analysis
- carrying out / Carrying out incident analysis
- incident response
- about / What is incident response?
- actions / What is incident response?
- infrastructure
- testing / Testing infrastructure
- Infrastructure as a Service (IaaS) / Preplanned versus autoscaling
- inodes
- reference / Files, directories, and inodes
- Interior Gateway Protocol (IGP) / IP
- internet / The internet
- Internet Control Message Protocol (ICMP) / ICMP
- Internet Service Providers (ISPs) / The internet
- IP / IP
J
- JavaScript Chart Library / When are we going to run out of capacity?
L
- 4Ls
- Learned / Retrospectives and standups
- Liked / Retrospectives and standups
- Lacked / Retrospectives and standups
- Longed For / Retrospectives and standups
- linting / Code reviews
- Linux
- fundamentals / Linux fundamentals
- file / Everything is a file
- directories / Files, directories, and inodes
- inodes / Files, directories, and inodes
- permissions / Permissions
- sockets / Sockets
- devices / Devices
- /proc / /proc
- filesystem layout / Filesystem layout
- process / What is a process?
- syscalls / syscalls
- programs, exploring / Build your own
- load balancers (LB) / Ethernet and TCP/IP, Cloud fundamentals
- load balancing / Load balancing
- LRU (Least Recently Used) / Example architecture interview
M
- MAC (media access control) / Ethernet
- mean time between failures (MTBF) / MTTR and MTBF
- mean time to recovery (MTTR) / MTTR and MTBF
- measuring mean time to recovery (MTTR) / Recovering the system
- mongoreplay
- reference / Testing infrastructure
- monitoring
- need for / Why monitoring?
- awareness, creating / Communicating about monitoring
- about / Do they even know there is monitoring?
- monitoring data
- collecting / Collecting and saving monitoring data
- saving / Collecting and saving monitoring data
- polling applications / Polling applications, Push applications
- managing / Managing and maintaining monitoring data
- maintaining / Managing and maintaining monitoring data
- monitoring information
- displaying / Displaying monitoring information
- arbitrary queries, using / Arbitrary queries
- graphs, using / Graphs
- dashboards, using / Dashboards
- chatbots, using / Chatbots
- multi-factor authentication (MFA)
- about / Real-world interaction design
N
- Nagios / Nagios
- National Incident Management System (NIMS) / Incident Command System (ICS)
- National Institute of Standards and Technology (NIST) / Sockets
- nc / nc
- negative testing / Integration tests
- netstat / netstat
- network
- watching, tools / Tools for watching the network
- nines
- 90% (one nine of uptime) / Service levels
- 99% (two nines of uptime) / Service levels
- 99.9% (three nines of uptime) / Service levels
- 99.95% (three and a half nines of uptime) / Service levels
- 99.99% (four nines of uptime) / Service levels
- 99.999% (five nines of uptime) / Service levels
- not invented here syndrome (NIHS) / Developer experience
O
- objectives and key results (OKRs)
- about / Long-term work
- example / Example OKRs
- on-call
- connecting / Being on call
- Open Systems Interconnection (OSI) model
- layers / Ethernet and TCP/IP
P
- past postmortems
- analyzing / Analyzing past postmortems
- mean time between failures (MTBF) / MTTR and MTBF
- mean time to recovery (MTTR) / MTTR and MTBF
- alert fatigue / Alert fatigue
- past outages, discussing / Discussing past outages
- Paxos / State and concurrency
- performance changes
- detecting / Architecture–where performance changes come from
- phishing
- about / Phishing
- plan
- need for / Why plan?
- risk, managing / Managing risk and managing expectations
- expectations, managing / Managing risk and managing expectations
- defining / Defining a plan
- current capacity, measuring / What is our current capacity?
- capacity, running out of / When are we going to run out of capacity?
- capacity, modifying / How should we change our capacity?
- executing / Execute the plan
- Platform as a Service (PaaS) / Cloud fundamentals
- polling applications
- about / Polling applications, Push applications
- Nagios / Nagios
- Prometheus / Prometheus
- Cacti / Cacti
- Sensu / Sensu
- StatsD / StatsD
- Telegraf / Telegraf
- ELK / ELK
- postmortem
- about / What is a postmortem?
- writing / Why write a postmortem?
- root cause / Root cause
- without action items / Postmortems without action items
- postmortem-templates
- reference / How to write a postmortem document
- postmortem document
- writing, situations / When to write a postmortem document
- writing / How to write a postmortem document
- postmortem meeting
- holding / Holding a postmortem meeting
- process / What is a process?
- zombies / Zombies
- orphans / Orphans
- nice command / What is nice?
- processes
- testing / Testing processes
- proc filesystem (procfs) / /proc
- projects
- finding / Finding projects
- defining / Defining projects
- Readme Driven Development (RDD) / Defining projects, RDD
- planning / Planning projects
- building / Building projects
- documenting / Documenting and maintaining projects
- maintaining / Documenting and maintaining projects
- projects, building
- writing code, best practices / Advice for writing code
- separation of concerns / Separation of concerns
- long-term work / Long-term work
- notebooks / Notebooks
- projects, planning
- example / Example
- Tak server, building / Example
- retrospectives / Retrospectives and standups
- standups / Retrospectives and standups
- work, allocating / Allocation
- Prometheus
- about / Prometheus
- reference / References and related reading
- Pub/Sub / Queues and Pub/Sub
- Pub/Sub queues / Queues and Pub/Sub
- Puppet / How should we change our capacity?
Q
- query-playback
- reference / Testing infrastructure
- queues / Queues and Pub/Sub
R
- Raft / State and concurrency
- Readme Driven Development (RDD)
- about / RDD
- example / Example
- design documents / Design documents
- real-world interaction design
- about / Real-world interaction design
- red herring / Carrying out incident analysis
- REL (requests, errors, latency) / Why monitoring?
- release
- validating / Validating your release
- releasing
- about / Releasing
- situations / When to release
- to production / Releasing to production
- Reliability / What is SRE?
- risk profile
- about / Risk profile
- rollbacks
- about / Rollbacks
- root cause analysis (RCA) / What is a postmortem?
- Ruby 2.5.0
- reference / References and related reading
S
- S3 outage
- reference / Why write a postmortem?
- Salt / How should we change our capacity?
- scientific method steps
- observe / What do you test?
- question / What do you test?
- hypothesis / What do you test?
- test / What do you test?
- reject or approve / What do you test?
- security
- about / Security
- authentication / Security, Authentication
- authorization / Security, Authorization
- risk profile / Security, Risk profile
- phishing / Phishing
- Sensu / Sensu
- service-oriented architecture (SOA) / SRE as a framework for new projects
- Service Level Agreement (SLA) / Service levels
- Service Level Indicator (SLI) / Service levels
- Service Level Objective (SLO) / Service levels
- Service Level Objectives (SLOs) / Managing risk and managing expectations
- shadow testing
- tools / Testing infrastructure
- Simple Notification Service (SNS) / Alerting services, Queues and Pub/Sub
- Simple Queue Service (SQS) / Queues and Pub/Sub
- Sinatra
- reference / References and related reading
- Site / What is SRE?
- Site Reliability Engineering (SRE)
- history / A brief history, What is SRE?
- using, as framework for new projects / SRE as a framework for new projects
- SLIs / A short introduction to SLIs, SLOs, and error budgets
- SLOs / A short introduction to SLIs, SLOs, and error budgets
- sockets / Sockets
- Software as a Service (SaaS) / Service levels, Cloud fundamentals
- SSL (Secure Sockets Layer) / Finding projects
- StatsD
- about / StatsD
- reference / References and related reading
- StatsD Ruby library
- reference / References and related reading
- sticky bit / Permissions
- storage
- types / Storage
- syscalls
- about / syscalls
- tracing, with strace tool / How to trace
- processes, watching / Watching processes
- load averages / Load averages
- system
- recovering / Recovering the system
T
- tcpdump / tcpdump
- tcpreplay
- reference / Testing infrastructure
- tech
- as profit center / Tech as a profit center and procurement
- as procurement / Tech as a profit center and procurement
- Telegraf / Telegraf
- test-driven development (TDD) / Unit tests
- testing
- about / Testing
- need for / What do you test?
- testing code
- about / Testing code
- review / Code reviews
- tools
- improving / Experience of tools
- performance budgets / Performance budgets
- tools, for network watching
- about / Tools for watching the network
- netstat / netstat
- nc / nc
- tcpdump / tcpdump
- Transmission Control Protocol (TCP) / TCP
- Transportation Security Administration (TSA) / Performance budgets
- Transport Layer Security (TLS) / HTTP
- Twelve Factor App / State and concurrency
- two device identifiers / Files, directories, and inodes
U
- units of scale / Units of scale
- USDS (United States Digital Service) / Design documents
- user-generated content (UGC) / Unpredictable growth–user-generated content
- User Datagram Protocol (UDP) / UDP
- user interface (UI) / An introduction to design and UX
- user testing
- about / User testing
- experience, picking / Picking an experience
- test, designing / Designing the test
- people, finding to test / Finding people to test
- UTF-8 / HTTP
- UX
- about / An introduction to design and UX
V
- vegeta
- reference / Testing infrastructure
- Vertica / What is our current capacity?
- virtual machines (VMs) / Cloud fundamentals, VMs
W
- wget / curl and wget
- Wheel of Misfortune (WoM) / Testing processes