Linux Systems Performance/Observability: BPF (bpfcc-tools) and BCC Tools



This post assumes you have a Linux server in place, with the BPF/BCC packages for your Linux distribution installed.

BPF (eBPF) and BCC Tools (bpfcc-tools):

BPF originally stood for Berkeley Packet Filter. It is now the technology behind dynamic tracing tools for Linux systems.
BPF was first used to speed up tcpdump filter expressions. It has since been extended and is known as extended BPF (eBPF).
Its newer uses include tracing tools. There it provides the programmability for the BPF Compiler Collection (BCC) and bpftrace front ends.
For example, execsnoop and biosnoop are BCC tools.
When you face a production performance crisis, these tools come in handy to
trace and fix the issue. However, they require certain kernel config options to be
enabled, such as CONFIG_FTRACE and CONFIG_BPF.
Profiling tools also typically require compiled versions of all their packages to be installed
on the system to run properly.

Credits: Brendan Gregg


Here is a list of tools that are handy when you need to fix issues in production or any other environment.

# Production CRISIS Tools #

#. Name - Provides

1. procps - ps, vmstat, uptime, top

2. util-linux - dmesg, lsblk, lscpu

3. sysstat - iostat, mpstat, pidstat, sar

4. iproute2 - ip, ss, nstat, tc

5. numactl - numastat

6. linux-tools-common, linux-tools-$(uname -r) - perf, turbostat

7. bcc-tools (aka bpfcc-tools) - opensnoop, execsnoop, runqlat, runqlen, softirqs, hardirqs, ext4slower, tcptop, ext4dist, biotop, biosnoop, biolatency, tcplife, trace, argdist, funccount, stackcount, profile, etc.

8. bpftrace - bpftrace, etc.

9. perf-tools-unstable - ftrace versions of opensnoop, execsnoop, iolatency, iosnoop, bitesize, kprobe, funccount

10. trace-cmd - trace-cmd

11. nicstat - nicstat

12. ethtool - ethtool

13. tiptop - tiptop (# apt install tiptop)

14. msr-tools - rdmsr, wrmsr

--------------------------------------------------

# Linux Application Debugging/Observability Tools #

#. Tool Name - Description

1. perf - CPU (Profiling | Flame Graphs), syscall tracing

2. profile - CPU Profiling using timed sampling

3. offcputime - Off-CPU profiling using Scheduler Tracing

4. strace - Syscall Tracing

5. execsnoop - New Process Tracing

6. syscount - Syscall Counting

7. bpftrace - Signal tracing, I/O profiling, Lock analysis.

--------------------------------------------------

# Linux CPU Performance Debugging/Observability Tools #

#. Tool Name - Description

1. uptime - Load averages over 1, 5 and 15 minutes (for CPU pressure stall averages over 10s, 60s and 300s, see # cat /proc/pressure/cpu)

2. vmstat - Includes system-wide CPU Averages

3. mpstat - Per-CPU Statistics

4. sar - Historical Statistics

5. ps - Process Status

6. top - Monitor per-process/thread CPU usage

7. pidstat - Per-process/thread CPU breakdowns

8. time, ptime - time a command, with CPU breakdowns

9. turbostat - Show CPU clock rate and other states

10. showboost - Show CPU clock rate and turbo boost

11. pmcarch - show high-level CPU cycle usage

12. tlbstat - Summarize TLB Cycles

13. perf - CPU profiling & PMC Analysis

14. profile - Sample CPU Stack traces

15. cpudist - Summarize on-CPU time

16. runqlat - Summarize CPU run queue latency

17. runqlen - Summarize CPU run queue length

18. softirqs - Summarize soft Interrupt time

19. hardirqs - Summarize hard Interrupt time

20. bpftrace - Tracing programs for CPU analysis
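Load averages from uptime are easier to judge once they are divided by the number of CPUs. A per-CPU value that stays above 1.0 suggests more demand than CPU capacity. Keep in mind that Linux load averages also count tasks in uninterruptible sleep (often disk I/O). Cross-check with /proc/pressure/cpu or runqlat before blaming the CPUs.

Below is a minimal Python sketch that reads the /proc/loadavg format. The function name `load_per_cpu` and the sample line are hypothetical, for illustration only.

```python
"""Normalize Linux load averages by CPU count, using the /proc/loadavg format (a sketch)."""
import os
from pathlib import Path
from typing import Optional


def load_per_cpu(text: Optional[str] = None, cpus: Optional[int] = None) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages divided by the CPU count."""
    if text is None:
        text = Path("/proc/loadavg").read_text()
    # The first three fields of /proc/loadavg are the 1, 5 and 15 minute load averages.
    one, five, fifteen = (float(field) for field in text.split()[:3])
    # os.cpu_count() counts logical CPUs in the system; it ignores CPU affinity and cgroup limits.
    count = cpus or os.cpu_count() or 1
    return one / count, five / count, fifteen / count


if __name__ == "__main__":
    # Hypothetical /proc/loadavg line on a 2-CPU system, for illustration only.
    print(load_per_cpu("2.00 1.00 0.50 1/200 1234", cpus=2))
```

On a live system, call `load_per_cpu()` with no arguments to read /proc/loadavg directly.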

--------------------------------------------------

# Linux MEMORY Performance Debugging/Observability Tools #

#. Tool Name - Description

1. vmstat - Virtual and Physical Memory statistics

Ex: root@ip-172-31-21-94:~# vmstat -Sm 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0    271     29    511    0    0    67   223   54   42  1  0 89  0 10
 0  0      0    271     29    511    0    0     0     0   40   27  0  0 99  0  1
 1  0      0    271     29    511    0    0     0     0   32   28  0  0 81  0 19

2. PSI - Memory pressure stall information

Ex: root@ip-172-31-21-94:~# cat /proc/pressure/memory
some avg10=0.00 avg60=0.00 avg300=0.00 total=730880
full avg10=0.00 avg60=0.00 avg300=0.00 total=649756

3. swapon - Swap Device Usage

Ex: swapon

4. sar - Historical Statistics

5. slabtop - Kernel Slab Allocator Statistics

Ex: root@ip-172-31-21-94:~# slabtop -sc
 Active / Total Objects (% used)    : 245106 / 253954 (96.5%)
 Active / Total Slabs (% used)      : 6869 / 6869 (100.0%)
 Active / Total Caches (% used)     : 312 / 370 (84.3%)
 Active / Total Size (% used)       : 61201.99K / 63941.81K (95.7%)
 Minimum / Average / Maximum Object : 0.01K / 0.25K / 10.12K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 12339  12188  98%    1.16K    457       27     14624K ext4_inode_cache
  7975   7886  98%    0.62K    319       25      5104K inode_cache
 30772  30772 100%    0.14K   1099       28      4396K kernfs_node_cache
 22953  21881  95%    0.19K   1093       21      4372K dentry

6. numastat - NUMA Statistics

Ex: root@ip-172-31-21-94:~# numastat
                           node0
numa_hit                 1673622
numa_miss                      0
numa_foreign                   0
interleave_hit                77
local_node               1673622
other_node                     0

7. ps - Process Status

Ex: # ps aux

# ps -eo pid,pmem,vsz,rss,comm

8. top - Monitor Per Process memory usage

Ex: # top -o %MEM

9. pmap - Process address space statistics

Ex: pmap -x <pid>

10. perf - Memory PMC and tracepoint analysis

Ex: root@ip-172-31-21-94:~# perf record -e page-faults -a -g   [sample page faults (RSS growth) with stack traces, system-wide, until Ctrl-C]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.135 MB perf.data (3 samples) ]

11. drsnoop - Direct reclaim tracing (BCC tool)

12. wss - Working set size estimation (measures how much memory is actively referenced, using page table entry (PTE) accessed bits)

13. bpftrace - Tracing Programs for memory analysis (BPF based tracer)
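The PSI and vmstat outputs shown in this list are plain text, so they are easy to post-process when you want to log or alert on them. In PSI, avg10/avg60/avg300 are the percentage of time that some tasks (or, for "full", all non-idle tasks) were stalled on memory. The total field is the cumulative stall time in microseconds.

Below is a minimal Python sketch, assuming the exact formats shown above. The helper names `parse_psi`, `read_psi` and `parse_vmstat` are hypothetical.

```python
"""Parse memory-pressure snapshots: /proc/pressure/memory (PSI) and `vmstat` text (a sketch)."""
from pathlib import Path


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Map 'some'/'full' to their avg10, avg60, avg300 and total fields."""
    result = {}
    for line in text.strip().splitlines():
        kind, *pairs = line.split()
        # Each remaining token is key=value, e.g. avg10=0.00 or total=730880.
        result[kind] = {key: float(val) for key, val in (p.split("=", 1) for p in pairs)}
    return result


def read_psi(resource: str = "memory") -> dict[str, dict[str, float]]:
    """Read /proc/pressure/<resource> (cpu, memory or io); requires a PSI-enabled kernel."""
    return parse_psi(Path("/proc/pressure", resource).read_text())


def parse_vmstat(text: str, skip_first: bool = True) -> list[dict[str, int]]:
    """Return one dict per vmstat data row, keyed by column name (r, b, swpd, ...)."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    # Line 0 holds the group titles (procs, memory, ...); line 1 holds the column names.
    header = lines[1]
    rows = [dict(zip(header, map(int, fields))) for fields in lines[2:]]
    # The first data row reports averages since boot, so skip it for interval analysis.
    return rows[1:] if skip_first else rows


PSI_SAMPLE = """\
some avg10=0.00 avg60=0.00 avg300=0.00 total=730880
full avg10=0.00 avg60=0.00 avg300=0.00 total=649756
"""

VMSTAT_SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0    271     29    511    0    0    67   223   54   42  1  0 89  0 10
 0  0      0    271     29    511    0    0     0     0   40   27  0  0 99  0  1
 1  0      0    271     29    511    0    0     0     0   32   28  0  0 81  0 19
"""

if __name__ == "__main__":
    psi = parse_psi(PSI_SAMPLE)
    print("some avg10 (%):", psi["some"]["avg10"], "full total (us):", int(psi["full"]["total"]))
    for row in parse_vmstat(VMSTAT_SAMPLE):
        # Non-zero si/so means active swapping; st is CPU time stolen by the hypervisor.
        print(f"r={row['r']} free={row['free']} si/so={row['si']}/{row['so']} st={row['st']}")
```

`parse_vmstat` assumes a single header and purely numeric columns. Run vmstat with -n (print the header only once), and avoid options such as -t that add a timestamp column.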

--------------------------------------------------

# File System Performance Debugging/Observability Tools #

#. Tool Name - Description

1. mount - List file systems and their mount flags

2. free - Cache capacity statistics

3. top - Includes memory usage summary

4. vmstat - Virtual memory statistics

5. sar - Various statistics including historic

6. slabtop - Kernel slab allocator statistics

7. strace - System call tracing

8. fatrace - Trace file system operations using fanotify

9. latencytop - Show system-wide latency sources

10. opensnoop - Trace files opened

11. filetop - Top files by IOPS

12. cachestat - Page Cache Statistics

13. ext4dist (xfs, zfs, btrfs, nfs) - Show ext4 operation latency distribution

14. ext4slower (xfs, zfs, btrfs, nfs) - Show slow ext4 operations

15. bpftrace - Custom file system tracing

--------------------------------------------------

# Disk Performance Debugging/Observability Tools #

#. Tool Name - Description

1. iostat - Various per-disk statistics

2. sar - Historical disk statistics

3. PSI - I/O pressure stall information (# cat /proc/pressure/io)

4. pidstat - Disk I/O usage by process

5. perf - Record Block I/O tracepoints

Ex: # perf list 'block:*'

# perf record -e block:block_rq_issue -a -g sleep 10

# perf script --header

6. biolatency - Summarize disk I/O latency as a histogram

7. biosnoop - Trace disk I/O with PID and latency

8. iotop, biotop - Top for disks: summarize disk I/O by process

9. biostacks - Show disk I/O with Initialization Stacks

10. blktrace - Disk I/O event tracing

11. bpftrace - Custom Disk Tracing

Ex: Count block I/O tracepoint events: # bpftrace -e 'tracepoint:block:* { @[probe] = count(); }'

12. smartctl - Disk controller statistics (Self-Monitoring, Analysis & Reporting Technology)

Ex: Install it with # apt install smartmontools

# smartctl --all /dev/<device>   (add -d megaraid,N for a disk behind an LSI MegaRAID controller)
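biolatency, listed above, measures each disk I/O and counts the latencies in power-of-two buckets (0-1, 2-3, 4-7, 8-15, ... microseconds) inside the kernel with BPF. Only the finished histogram is copied to user space, which keeps overhead low.

To show what that histogram represents, here is a minimal user-space Python sketch of the same style of bucketing. It is not biolatency's code. The function names and the sample latencies are hypothetical.

```python
"""Bucket latencies into power-of-two ranges, similar in spirit to biolatency's histogram (a sketch)."""
from collections import Counter


def log2_bucket(value: int) -> tuple[int, int]:
    """Return the (low, high) range holding value: 0-1, 2-3, 4-7, 8-15, ..."""
    if value < 0:
        raise ValueError("latency cannot be negative")
    index = max(value.bit_length(), 1)  # 0 and 1 both land in the 0-1 bucket
    low = 0 if index == 1 else 1 << (index - 1)
    return low, (1 << index) - 1


def log2_histogram(values: list[int]) -> dict[tuple[int, int], int]:
    """Count values per bucket, ordered from the smallest range to the largest."""
    return dict(sorted(Counter(log2_bucket(v) for v in values).items()))


def render(hist: dict[tuple[int, int], int], width: int = 40) -> list[str]:
    """Format rows as 'low -> high : count |****|'; unlike biolatency, empty buckets are omitted."""
    peak = max(hist.values(), default=0)
    rows = []
    for (low, high), count in hist.items():
        bar = "*" * (count * width // peak) if peak else ""
        rows.append(f"{low:>10} -> {high:<10} : {count:<8} |{bar:<{width}}|")
    return rows


if __name__ == "__main__":
    # Made-up latencies in microseconds, for illustration only.
    for row in render(log2_histogram([0, 1, 3, 5, 6, 7, 130, 900])):
        print(row)
```

Power-of-two buckets keep the histogram small while still showing multi-modal latency, such as a cache-hit mode and a slow-disk mode, at a glance.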

--------------------------------------------------

# Network Performance Debugging/Observability Tools #

#. Tool Name - Description

1. ss - Socket statistics

Ex: root@ip-172-31-21-94:~# ss -tiepm
State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
ESTAB  0       52      172.31.21.94:ssh    103.252.203.93:7237  users:(("sshd",pid=1251,fd=4),("sshd",pid=1168,fd=4)) timer:(on,217ms,0) ino:7097 sk:5e cgroup:/system.slice/ssh.service <->
     skmem:(r0,rb2142943,t0,tb87040,f3148,w948,o0,bl0,d25) ts sack ecn ecnseen cubic wscale:6,7 rto:223 rtt:22.584/29.792 ato:40 mss:1448 pmtu:9001 rcvmss:1448 advmss:8949 cwnd:10 ssthresh:24 bytes_sent:1513930 bytes_retrans:19132 bytes_acked:1494746 bytes_received:67861 segs_out:6624 segs_in:5581 data_segs_out:6525 data_segs_in:1737 send 5.13Mbps lastsnd:6 lastrcv:6 lastack:6 pacing_rate 10.3Mbps delivery_rate 23.2Mbps delivered:6507 app_limited busy:78848ms unacked:1 retrans:0/111 dsack_dups:111 rcv_rtt:51745.1 rcv_space:62677 rcv_ssthresh:56575 minrtt:5.133

2. ip - Network interface & route statistics

Ex: root@ip-172-31-21-94:~# ip -s link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    RX:  bytes packets errors dropped  missed   mcast
         31750     264      0       0       0       0

3. ifconfig - Network interface statistics

4. nstat - Network stack statistics

Ex: root@ip-172-31-21-94:~# nstat -s
#kernel
IpInReceives                    19426              0.0
IpInDelivers                    19426              0.0
IpOutRequests                   15286              0.0
IpOutTransmits                  15286              0.0
TcpActiveOpens                  27                 0.0
TcpAttemptFails                 2                  0.0
TcpInSegs                       18280              0.0
TcpOutSegs                      14237              0.0
TcpRetransSegs                  107                0.0
TcpOutRsts                      4                  0.0

5. netstat - Various network stack & interface statistics

6. sar - Historical statistics

Ex: root@ip-172-31-21-94:~# sar -n TCP 1
Linux 6.8.0-1015-aws (ip-172-31-21-94)  11/18/24  _x86_64_  (1 CPU)

14:47:33     active/s passive/s    iseg/s    oseg/s
14:47:34         0.00      0.00      2.00      0.00
14:47:35         0.00      0.00      1.00      1.00
14:47:36         0.00      0.00      1.00      1.00
Average:         0.00      0.00      1.25      0.75

7. nicstat - Network interface throughput and utilization

Ex: root@ip-172-31-21-94:~# nicstat -z 1
    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
14:49:35       lo    0.00    0.00    0.02    0.02   122.0   122.0  0.00   0.00
14:49:35     eth0   19.70    0.27   14.33    1.88  1408.4   144.7  0.00   0.00
    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
14:49:36     eth0    0.10    0.33    2.00    1.00   52.00   342.0  0.00   0.00
    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
14:49:37     eth0    0.05    0.26    1.00    1.00   52.00   262.0  0.00   0.00
    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat

8. ethtool - Network interface driver statistics

Ex: # ethtool -i eth0   (-i shows driver details)

# ethtool -k eth0   (-k shows the interface's offload features and other tunables)

9. tcplife - Trace TCP Session lifespans with connection details

Ex: # tcplife

10. tcptop - Show TCP throughput by Host and Process

Ex: # tcptop

11. tcpretrans - Trace TCP retransmits with address & TCP state

Ex: # tcpretrans

12. bpftrace - TCP/IP Stack Tracing: connections, packets, drops, latency

Ex: Count socket accepts by PID and process name:
# bpftrace -e 't:syscalls:sys_enter_accept* { @[pid, comm] = count(); }'

List the available TCP tracepoints:
# bpftrace -l 't:tcp:*'
tracepoint:tcp:tcp_bad_csum
tracepoint:tcp:tcp_cong_state_set
tracepoint:tcp:tcp_destroy_sock
tracepoint:tcp:tcp_probe
tracepoint:tcp:tcp_rcv_space_adjust
tracepoint:tcp:tcp_receive_reset
tracepoint:tcp:tcp_retransmit_skb
tracepoint:tcp:tcp_retransmit_synack
tracepoint:tcp:tcp_send_reset

13. tcpdump - Network packet sniffer

14. wireshark - Graphical network packet inspection
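Two quick TCP health signals can be pulled from the nstat and ss output shown in this list:

- The ratio of TcpRetransSegs to TcpOutSegs, a rough retransmit rate. It is best compared over time rather than read as an exact figure.
- The skmem fields, where d (sock_drop) counts packets dropped before they reached the socket.

Below is a minimal Python sketch, assuming the text formats shown above. The helper names are hypothetical. The skmem abbreviations follow the ss(8) man page.

```python
"""Derive two quick TCP health signals from `nstat` and `ss -m` output (a sketch)."""
import re

# skmem abbreviations as documented in the ss(8) man page.
SKMEM_NAMES = {
    "r": "rmem_alloc", "rb": "rcv_buf", "t": "wmem_alloc", "tb": "snd_buf",
    "f": "fwd_alloc", "w": "wmem_queued", "o": "opt_mem", "bl": "back_log",
    "d": "sock_drop",
}


def parse_nstat(text: str) -> dict[str, int]:
    """Return {counter_name: value}, skipping '#' comment lines such as '#kernel'."""
    counters = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            counters[fields[0]] = int(fields[1])
    return counters


def retransmit_pct(counters: dict[str, int]) -> float:
    """Retransmitted segments as a percentage of output segments (0.0 if none were sent)."""
    sent = counters.get("TcpOutSegs", 0)
    return 100.0 * counters.get("TcpRetransSegs", 0) / sent if sent else 0.0


def parse_skmem(line: str) -> dict[str, int]:
    """Return skmem values keyed by their long names; empty dict if no skmem field."""
    match = re.search(r"skmem:\(([^)]*)\)", line)
    if not match:
        return {}
    values = {}
    for item in match.group(1).split(","):
        parts = re.fullmatch(r"([a-z]+)(\d+)", item)
        if parts:  # unknown abbreviations keep their short key
            key, number = parts.groups()
            values[SKMEM_NAMES.get(key, key)] = int(number)
    return values


NSTAT_SAMPLE = """\
#kernel
TcpInSegs                       18280              0.0
TcpOutSegs                      14237              0.0
TcpRetransSegs                  107                0.0
TcpOutRsts                      4                  0.0
"""
SS_SAMPLE = "skmem:(r0,rb2142943,t0,tb87040,f3148,w948,o0,bl0,d25) ts sack ecn ecnseen cubic wscale:6,7"

if __name__ == "__main__":
    counters = parse_nstat(NSTAT_SAMPLE)
    print(f"TCP retransmits: {retransmit_pct(counters):.2f}% of output segments")
    skmem = parse_skmem(SS_SAMPLE)
    print("socket drops:", skmem["sock_drop"], "receive buffer:", skmem["rcv_buf"])
```

A non-zero, growing sock_drop on a busy socket is worth following up with tcpretrans or the tcp tracepoints listed above.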

--------------------------------------------------

Benefits of knowing and using the BCC/eBPF (bpfcc-tools) tools:

1. Debug, identify and fix issues within tight timelines.

2. Dynamic tracing capabilities using BPF tools.

3. The right tool for each specific system resource.

4. And many more.

Ref:

Brendan Gregg Online resources and books (BPF Performance Tools, Systems Performance).

ROHIT PATEL
