3cb logs
Posted 2025-03-10
Updated 2025-03-19
refer
- https://github.com/ralphptorres/inspect_evals/tree/3cb
- https://github.com/ralphptorres/3cb/tree/inspect
test
- prep env with uv. for non-uv envs, just remove or replace the prepended `uv` or `uv run` where needed in the following commands
  ```sh
  git clone https://github.com/ralphptorres/inspect_evals
  cd inspect_evals
  git switch 3cb
  uv venv -p 3.12
  uv pip install -e ".[dev]"
  uv pip install anthropic mistralai openai toml
  # then fill in each key after the = in .env
  echo "export ANTHROPIC_API_KEY=" >> .env
  echo "export MISTRAL_API_KEY=" >> .env
  echo "export OPENAI_API_KEY=" >> .env
  ```
- test a specific task, say binpwn
  ```sh
  uv run inspect eval inspect_evals/threecb --sample-id binpwn
  # or, equivalently:
  # uv run inspect eval inspect_evals/threecb -T tasks="binpwn"
  ```
2025 w13
- two more tasks
  - authkey, exfil
2025 w12
- initialaccess
2025 w11
- did: dataset parser at ./dataset.py
  - fetches tasks from git:apartresearch/3cb/task_configs
  - parses task data into a unified ./tasks/tasks.json
  - write all task.env.main.files to disk
  - implement task naming logic
    - get first from first_second_third_name
    - else, on a clash, get first_second, or go deeper if needed
  - parses other relevant data
  - handles docker config
    - writes a compose.yaml for tasks that require specific docker config
      - e.g. for perms, internet
    - discovers a task-specific Dockerfile
      - else it falls back to ./tasks/Dockerfile
    - an eval-wide Dockerfile is also possible
      - we can leverage setup.sh for task-specific peculiarities
      - but this bloats the docker image with packages most tasks don't need
  - pick up inspect-specific elicitation
    - copy the last part of one of the existing elicitations
    - then add it as an elicitation entry in task.toml
    - submit a PR to apartresearch/3cb to add these elicitations
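
the task naming logic above, as a minimal python sketch. it assumes full names are underscore-separated and gives each task the shortest prefix that no other task shares; the example names in the usage line are hypothetical, and this is not necessarily how ./dataset.py does it.

```python
from __future__ import annotations


def short_names(full_names: list[str]) -> dict[str, str]:
    """Map each full task name to its shortest unambiguous underscore prefix.

    "binpwn_gdb_repl" -> "binpwn", unless another task also starts with
    "binpwn", in which case it gets one more segment, and so on.
    """
    if len(set(full_names)) != len(full_names):
        raise ValueError("duplicate full task names cannot be disambiguated")
    parts = {name: name.split("_") for name in full_names}
    result: dict[str, str] = {}
    for name, segs in parts.items():
        for depth in range(1, len(segs) + 1):
            prefix = segs[:depth]
            clash = any(
                other != name and other_segs[:depth] == prefix
                for other, other_segs in parts.items()
            )
            if not clash:
                break
        # if every depth clashes, depth ends at len(segs): the full name is used
        result[name] = "_".join(segs[:depth])
    return result


# hypothetical names, for illustration only
print(short_names(["binpwn_gdb_repl", "escalation_setuid", "escalation_sudo_edit"]))
```

only the clashing tasks get deeper names here (the two escalation_* ones), so most short names stay at one segment.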
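
a sketch of the docker config handling: prefer a task-specific Dockerfile, else ./tasks/Dockerfile, and only emit a compose.yaml when a task needs non-default settings. the `needs_internet` and `cap_add` inputs are hypothetical stand-ins for whatever the parsed task data says about internet and perms, and naming the service `default` assumes inspect's convention for the default sandbox service.

```python
from __future__ import annotations

import json
from pathlib import Path


def pick_dockerfile(task_dir: Path, tasks_dir: Path) -> Path | None:
    """Prefer <task_dir>/Dockerfile, else fall back to <tasks_dir>/Dockerfile."""
    for candidate in (task_dir / "Dockerfile", tasks_dir / "Dockerfile"):
        if candidate.is_file():
            return candidate
    return None


def compose_for(
    dockerfile: Path,
    needs_internet: bool = False,
    cap_add: list[str] | None = None,
) -> dict | None:
    """Return a compose mapping, or None when the task needs no special config."""
    if not needs_internet and not cap_add:
        return None
    service: dict = {
        # absolute context path so the compose file can live in another directory
        "build": {
            "context": str(dockerfile.parent.resolve()),
            "dockerfile": dockerfile.name,
        },
        "command": "tail -f /dev/null",  # keep the container alive between tool calls
        "network_mode": "bridge" if needs_internet else "none",
    }
    if cap_add:
        service["cap_add"] = list(cap_add)  # hypothetical mapping for extra perms
    return {"services": {"default": service}}


def write_compose(task_dir: Path, config: dict) -> Path:
    """Write compose.yaml as JSON, which YAML parsers also accept."""
    path = task_dir / "compose.yaml"
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path
```

writing json sidesteps a pyyaml dependency, since json is valid yaml.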
- did: task implementation at ./task.py
  - allow our eval to specify tasks to run via the `-T tasks=` option
    ```sh
    inspect eval inspect_evals/threecb -T tasks="nmap,binpwn"
    # or
    inspect eval inspect_evals/threecb --sample-id nmap
    ```
  - note: our eval inherits conveniences from inspect eval, such as `--solver human_agent`
  - tasks that don't work yet: which, why, and how?
    - i.e. initialaccess, nulls
    - tasks that require interactive tools, e.g. nc, strace
      - a `@tool nc()` could be implemented
        - it needs to support stdout/stderr streaming
        - i.e. keep the process alive and fetch stdout when prompted
  - tasks that work
    - i.e. binpwn, escalation, evasion, find, impact, nmap, python, resourcedevelopment, web
    - tasks that use a non-root bash user
      - easy to implement with `tools=[bash(user="someuser", ...), ...]`
        - but this needs hardcoding in ./task.py
        - alternative: encode the non-root user requirement in tags, then parse that
    - tasks that require interactive tools, e.g. gdb
      - leverage a non-interactive mode via some option, e.g. `gdb -ex`
      - then mention this in the inspect-specific elicitation
      - notably, the escalation task leverages the strings tool
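
how `-T tasks=` could be handled in ./task.py, sketched as plain python. inspect passes `-T` values to the @task function as keyword arguments; we don't rely on whether "nmap,binpwn" arrives as a string or an already-split list, so the hypothetical `select_tasks` helper accepts both and rejects unknown names.

```python
from __future__ import annotations


def parse_task_filter(tasks: str | list[str] | None) -> list[str] | None:
    """Normalise a -T tasks=... value; None (or empty) means "run every task"."""
    if tasks is None:
        return None
    if isinstance(tasks, str):
        tasks = tasks.split(",")
    names = [name.strip() for name in tasks if name.strip()]
    return names or None


def select_tasks(available: list[str], tasks: str | list[str] | None = None) -> list[str]:
    """Keep the dataset's order and fail loudly on typos instead of running nothing."""
    wanted = parse_task_filter(tasks)
    if wanted is None:
        return list(available)
    unknown = sorted(set(wanted) - set(available))
    if unknown:
        raise ValueError(f"unknown task(s): {', '.join(unknown)}")
    return [name for name in available if name in wanted]


print(select_tasks(["binpwn", "escalation", "nmap", "web"], "nmap,binpwn"))
# prints: ['binpwn', 'nmap']
```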
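
the tags alternative for non-root users, sketched. the `user:<name>` tag format is made up for illustration and 3cb's tags may look different; the result would feed `bash(user=...)`, with `None` keeping the container's default user.

```python
from __future__ import annotations

# hypothetical tag convention for illustration; not taken from 3cb
USER_TAG_PREFIX = "user:"


def bash_user_from_tags(tags: list[str]) -> str | None:
    """Return the user from a 'user:<name>' tag, or None to keep the default user."""
    users = [
        tag[len(USER_TAG_PREFIX):]
        for tag in tags
        if tag.startswith(USER_TAG_PREFIX) and len(tag) > len(USER_TAG_PREFIX)
    ]
    if len(users) > 1:
        raise ValueError(f"conflicting user tags: {users}")
    return users[0] if users else None


# in ./task.py this could become e.g. bash(user=bash_user_from_tags(tags))
```

this keeps ./task.py free of per-task hardcoding; the task config only has to carry the tag.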
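
the core of a `@tool nc()` as described above: keep a process alive, feed it input, and drain stdout/stderr when asked. this is a local-process sketch only (the `PersistentProcess` class is hypothetical); a real inspect tool would have to keep the process alive inside the sandbox, which this doesn't handle.

```python
from __future__ import annotations

import queue
import subprocess
import threading
import time


class PersistentProcess:
    """Keep an interactive process alive; send input and read output on demand."""

    def __init__(self, argv: list[str]) -> None:
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered in text mode
        )
        self._out: queue.Queue[tuple[str, str]] = queue.Queue()
        # one reader thread per stream so neither pipe can fill up and stall the child
        for name, stream in (("stdout", self.proc.stdout), ("stderr", self.proc.stderr)):
            threading.Thread(target=self._pump, args=(name, stream), daemon=True).start()

    def _pump(self, name: str, stream) -> None:
        for line in stream:  # blocks in the background thread, not in the caller
            self._out.put((name, line))

    def send(self, text: str) -> None:
        assert self.proc.stdin is not None
        self.proc.stdin.write(text)
        self.proc.stdin.flush()

    def read(self, wait: float = 0.5) -> dict[str, str]:
        """Return whatever stdout/stderr arrived within `wait` seconds."""
        chunks: dict[str, list[str]] = {"stdout": [], "stderr": []}
        deadline = time.monotonic() + wait
        while (remaining := deadline - time.monotonic()) > 0:
            try:
                name, line = self._out.get(timeout=remaining)
            except queue.Empty:
                break
            chunks[name].append(line)
        return {key: "".join(lines) for key, lines in chunks.items()}

    def close(self) -> None:
        if self.proc.stdin:
            self.proc.stdin.close()
        self.proc.terminate()
        self.proc.wait(timeout=5)
```

each `read` blocks the caller for at most `wait` seconds, which maps onto a tool call that returns whatever output has accumulated so far.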
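
for the gdb case, a small helper that builds a one-shot `gdb -batch -ex ...` command line, which could then be quoted in the inspect-specific elicitation. `-batch`, `-ex` and `--args` are standard gdb options; the helper names and the ./vuln binary are hypothetical.

```python
from __future__ import annotations

import shlex


def gdb_batch_argv(binary: str, commands: list[str], args: list[str] | None = None) -> list[str]:
    """Build a non-interactive gdb argv: run each -ex command, then exit (-batch)."""
    argv = ["gdb", "-batch"]
    for cmd in commands:
        argv += ["-ex", cmd]
    if args:
        # --args must come last: everything after the binary goes to the inferior
        argv += ["--args", binary, *args]
    else:
        argv.append(binary)
    return argv


def gdb_batch_command(binary: str, commands: list[str], args: list[str] | None = None) -> str:
    """Same command as a shell-quoted string, e.g. for pasting into an elicitation."""
    return shlex.join(gdb_batch_argv(binary, commands, args))


print(gdb_batch_command("./vuln", ["break main", "run", "info registers"]))
# prints: gdb -batch -ex 'break main' -ex run -ex 'info registers' ./vuln
```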