Added a bunch of extra documentation #138

Open
wants to merge 49 commits into
base: main
d467416
added code descriptions
flimdejong Sep 12, 2024
baad6ae
fixed descriptions
flimdejong Sep 12, 2024
2300593
added proto pb2 python files
flimdejong Sep 12, 2024
91310d0
added better documentation of roboteam_networking
flimdejong Sep 12, 2024
c1c80d0
Added more documentation + docker volume to change SSL-game-controlle…
flimdejong Sep 17, 2024
fc42efc
Added more proto files
flimdejong Sep 17, 2024
c62bad2
add (updated) proto files for manipulating GC
flimdejong Oct 2, 2024
987e39c
add initial code for RL
flimdejong Oct 2, 2024
5adda7e
fix RL weird errors
flimdejong Oct 3, 2024
50b91a6
Fixed proto files finally (checked if build works)
flimdejong Oct 17, 2024
c236d5b
added initial support for kubernetes using docker
flimdejong Oct 17, 2024
7d2600b
fixed small error in kube.yaml
flimdejong Oct 17, 2024
fa85ccc
Added train.py file for initial testing
flimdejong Oct 18, 2024
5232577
Fix docker -compose file
flimdejong Oct 21, 2024
593c846
Made 5/8 programs running succesfully on kubernetes
flimdejong Oct 21, 2024
da18d1a
GC + PrimAI + Observer + Hub + Interface + Sim launched succesfully
flimdejong Oct 21, 2024
7f94d86
Kube pod working for all AI except secondary or Tigers (ports not tes…
flimdejong Oct 23, 2024
245a0aa
fix getstate, changed reset state
aelhabashy Oct 29, 2024
f2e5b5a
Merge branch 'nova' of https://github.com/RoboTeamTwente/roboteam int…
aelhabashy Oct 29, 2024
f3c018a
Fixed dockerfile + added Tigers AI
flimdejong Oct 29, 2024
4bc490a
Merge Adham's work with my update
flimdejong Oct 29, 2024
31b2c37
Add tigers_sumatra as submodule
flimdejong Oct 29, 2024
b756e0f
Fix spelling
flimdejong Oct 30, 2024
d2365df
Updated Dockerfile and kube.yaml file (tested on hostnetwork)
flimdejong Nov 1, 2024
6be3014
ray cluste and test file
crinagurev Nov 1, 2024
ba59fb7
Commit again
crinagurev Nov 1, 2024
e4d89eb
Added more compute + py3.10 image
flimdejong Nov 4, 2024
1ce8564
Reorganisation of code
flimdejong Nov 4, 2024
063104a
Changed useReferee to true (default on)
flimdejong Nov 4, 2024
4acb021
Added resetReferee functionality + slight reorganisation and added c…
flimdejong Nov 6, 2024
5e92dec
Added sync functionality to async API
flimdejong Nov 6, 2024
0de4436
Fixed env.py + train.py now works
aelhabashy Nov 6, 2024
b68af28
uncomment ballstate
aelhabashy Nov 6, 2024
d178436
reduced rate, fixed ball placement
aelhabashy Nov 7, 2024
bc57db2
Start a new game after HALT + gitignore mods
aelhabashy Nov 7, 2024
27e2bac
Changed worker node to python 3.10
flimdejong Nov 7, 2024
045fd28
Changed ray-cluster file and minor edit in env.py
flimdejong Nov 11, 2024
7d160a6
Added Dockerfile for ray dependencies
flimdejong Nov 11, 2024
cef3d09
Updated ray-cluster file
flimdejong Nov 11, 2024
f148234
Added info on README for Ray
flimdejong Nov 11, 2024
8e7adc1
initial ray tain implementation + new modified env
aelhabashy Nov 12, 2024
2af8274
updated env.py file
flimdejong Nov 13, 2024
d8d3828
Changed imports
flimdejong Nov 13, 2024
743840e
Updated README
flimdejong Nov 13, 2024
adc2bdb
Added Dockerfile that Ray runs on
flimdejong Nov 13, 2024
39c841e
Updated ray-cluster file
flimdejong Nov 13, 2024
a0822d4
Cleaning up unused files
flimdejong Nov 13, 2024
92c845d
Updated train.py file to test Ray cluster training
flimdejong Nov 13, 2024
3d33d3a
Updated env2.py to work with more defined imports
flimdejong Nov 13, 2024
10 changes: 10 additions & 0 deletions .gitignore
@@ -37,3 +37,13 @@ ENV/
env.bak/
venv.bak/
__pycache__/

# Docker
docker/runner/ssl-game-controller-config/20*
docker/runner/ssl-game-controller-config/state-store.json.stream

# RL
roboteam_ai/src/RL/src/tmp/eval_monitor.monitor.csv
roboteam_ai/src/RL/src/tmp/monitor.monitor.csv
roboteam_ai/src/RL/src/ppo_tensorboard_main
roboteam_ai/src/RL/src/best_model_PPO.zip
3 changes: 3 additions & 0 deletions .gitmodules
@@ -11,3 +11,6 @@
[submodule "roboteam_robothub/roboteam_embedded_messages"]
path = roboteam_robothub/roboteam_embedded_messages
url = https://github.com/RoboTeamTwente/roboteam_embedded_messages.git
[submodule "tigers_sumatra"]
path = tigers_sumatra
url = git@github.com:RoboTeamTwente/tigers_sumatra.git
31 changes: 31 additions & 0 deletions Dockerfile
@@ -0,0 +1,31 @@
FROM roboteamtwente/roboteam:development

# Create symbolic link from /home/roboteam to /home/roboteamtwente
USER root
RUN ln -s /home/roboteamtwente /home/roboteam

# Install Java
RUN apk add --no-cache openjdk11

# Install Java 21
RUN wget https://github.com/adoptium/temurin21-binaries/releases/download/jdk-21.0.1%2B12/OpenJDK21U-jdk_x64_alpine-linux_hotspot_21.0.1_12.tar.gz \
&& tar -xzf OpenJDK21U-jdk_x64_alpine-linux_hotspot_21.0.1_12.tar.gz \
&& mv jdk-21* /usr/lib/jvm/java-21-openjdk

# Set up Java environment variables to point to Java 21
ENV JAVA_HOME=/usr/lib/jvm/java-21-openjdk
ENV PATH="${JAVA_HOME}/bin:${PATH}"

WORKDIR /home/roboteam

# Copy the entire current directory into the container
COPY --chown=roboteamtwente:roboteamtwente . /home/roboteam/

# Make sure build.sh is executable
RUN chmod +x build.sh

# Add the lib directory to LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/roboteam/build/release/lib

# Switch back to the roboteamtwente user
USER roboteamtwente
16 changes: 16 additions & 0 deletions Dockerfile.ray
@@ -0,0 +1,16 @@
# This dockerfile uses the ray project's official ray image as a base.
# It also adds the torch and gymnasium libraries to the image.
# It also adds the roboteam RL code to the image.

FROM rayproject/ray:latest-py310

# Install dependencies in a single layer to keep it cached
RUN pip install torch==2.5.1 gymnasium numpy==1.24.3 ray[rllib] pyzmq

# Copy the roboteam_ai and roboteam_networking folders into /roboteam
COPY roboteam_ai /roboteam/roboteam_ai
COPY roboteam_networking /roboteam/roboteam_networking

# Set working directory and Python path
WORKDIR /roboteam
ENV PYTHONPATH="/roboteam:${PYTHONPATH}"
10 changes: 10 additions & 0 deletions README.md
@@ -36,3 +36,13 @@ To enable Tracy
- Compile Tracy Server and run
- Information is in the tracy [docs](https://github.com/wolfpld/tracy)
- Run AI


### Use of Ray
Dockerfile.ray builds a Docker image on top of the Ray project's official image, adding the required Python libraries and the RoboTeam RL code. You only need to build it if you want to deploy to a cluster.

Build the docker image using the following command from the root folder:
- docker build -t roboteamtwente/ray:development -f Dockerfile.ray .

Push it using the following command:
- docker push roboteamtwente/ray:development
4 changes: 4 additions & 0 deletions docker/README.md
@@ -100,6 +100,10 @@ docker run -itd --name rtt-release-env -h rtt-release-env roboteamtwente/robotea
#### TODO
https://lemariva.com/blog/2020/10/vscode-c-development-and-debugging-containers


### Profile explanation
If you want to start a group of Docker containers with a single command, assign them a profile. Running Docker Compose with that profile starts every service that lists it under `profiles`.
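
The selection rule described above can be modelled in a few lines of Python. This is only an illustrative sketch: the first three service names and their profile lists are copied from docker/runner/docker-compose.yml, while `example_game_only`, `example_always_on` and the `services_for_profile` helper are hypothetical and exist only to show the rule. It is not how Docker Compose is implemented. On the command line, the equivalent would be something like `docker compose --profile RL up`.

```python
# Hypothetical model of Docker Compose profile selection (illustration only).
# Each service maps to the profiles it lists; an empty list stands for a
# service without a `profiles` entry, which Compose always starts.
SERVICES = {
    "roboteam_primary_ai": ["simulator", "diff", "game", "robocup", "RL"],
    "roboteam_autoref": ["simulator", "diff", "game", "autoref", "RL"],
    "ssl-game-controller": ["simulator", "diff", "game", "RL"],
    "example_game_only": ["game"],  # hypothetical service, not in the repo
    "example_always_on": [],        # hypothetical service without profiles
}


def services_for_profile(services, profile):
    """Return the sorted names of services that start when `profile` is active."""
    return sorted(
        name
        for name, profiles in services.items()
        if not profiles or profile in profiles
    )


if __name__ == "__main__":
    for name in services_for_profile(SERVICES, "RL"):
        print(name)
```

With the mapping above, selecting `RL` skips `example_game_only` because that service does not list the profile.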

### Docker geeks
We strongly discourage running the commands in this section unless you know what you are doing and need to edit the Dockerfiles or the Compose structure.

34 changes: 34 additions & 0 deletions docker/runner/README.md
@@ -0,0 +1,34 @@

In a Ray cluster, as in most distributed computing clusters, containers take one of two roles: head node or worker node. The head node is the master node that coordinates the cluster; you typically have one. Worker nodes are the containers that execute the jobs in parallel; you can have as many as you want.

-----------------------------------------------------------
## Installing Kuberay:
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --namespace ray-system --create-namespace

The ray-cluster.yaml file was written with the help of this guide: https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html

To install and run Kubernetes and minikube, follow this guide: https://medium.com/@areesmoon/installing-minikube-on-ubuntu-20-04-lts-focal-fossa-b10fad9d0511

Install Ray with 'pip install ray', then run 'pip show ray' to check which version you have.
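
The same version check can be scripted with the Python standard library. The sketch below assumes you want the locally installed Ray to match the `rayVersion` value in ray-cluster.yaml ("2.38.0" in this PR); that is a precaution rather than a requirement stated in this README, and the helper names are hypothetical.

```python
from importlib import metadata

# Value of `rayVersion` in docker/runner/ray-cluster.yaml in this PR.
CLUSTER_RAY_VERSION = "2.38.0"


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_cluster(local_version, cluster_version=CLUSTER_RAY_VERSION):
    """Return True when the local version string equals the cluster's version."""
    return local_version is not None and local_version.strip() == cluster_version


if __name__ == "__main__":
    local = installed_version("ray")
    if local is None:
        print("ray is not installed; run `pip install ray` first")
    elif matches_cluster(local):
        print(f"ray {local} matches the cluster")
    else:
        print(f"ray {local} differs from cluster version {CLUSTER_RAY_VERSION}")
```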

----------------------------------------------------------------------------------
Once both Kubernetes and Ray are installed, create the cluster with: kubectl apply -f ray-cluster.yaml
This launches a Ray head node and one worker node. Launch the external simulator with: kubectl apply -f simulator.yaml

Run 'kubectl get pods' to find the cluster name (the pod names start with it).

Forward the dashboard port to the Ray service with: kubectl port-forward svc/<cluster name> 8265:8265
This is the port that ray_jobs.py uses to submit jobs to Ray.
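
ray_jobs.py is not shown in this README, so the following is only a sketch of what submitting a job through the forwarded port could look like. It assumes the port-forward is running on the local machine and that the `ray` package is installed. `JobSubmissionClient` and `submit_job` come from Ray's job submission SDK; the entrypoint `python train.py`, the working directory and the helper names are assumptions made for illustration.

```python
import socket

DASHBOARD_HOST = "127.0.0.1"  # assumes `kubectl port-forward` runs on this machine
DASHBOARD_PORT = 8265         # dashboard port forwarded with kubectl


def dashboard_url(host=DASHBOARD_HOST, port=DASHBOARD_PORT):
    """Build the HTTP address of the Ray dashboard."""
    return f"http://{host}:{port}"


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def submit_training_job(entrypoint="python train.py", working_dir="."):
    """Submit a job to the cluster; entrypoint and working_dir are placeholders."""
    # Imported lazily so the helpers above also work without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url())
    return client.submit_job(
        entrypoint=entrypoint,
        runtime_env={"working_dir": working_dir},
    )
```

A script could call `port_is_open(DASHBOARD_HOST, DASHBOARD_PORT)` first and only call `submit_training_job()` when it returns True, which gives a clearer error when the port-forward is not running.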


-----------------------------------------------------
## Useful commands
kubectl apply -f ray-cluster.yaml
kubectl delete -f ray-cluster.yaml
helm install kuberay-operator kuberay/kuberay-operator
helm uninstall kuberay-operator
kubectl port-forward svc/roboteam-ray-cluster-head-nodeport 8265:8265 6379:6379 10001:10001 8000:8000 &
26 changes: 14 additions & 12 deletions docker/runner/docker-compose.yml
@@ -1,5 +1,3 @@
version: '3'

services:

roboteam_primary_ai:
@@ -18,7 +16,7 @@ services:
- LD_LIBRARY_PATH=/home/roboteamtwente/lib/
volumes:
- ../../build/release/:/home/roboteamtwente/
profiles: ["simulator","diff","game", "robocup"]
profiles: ["simulator","diff","game", "robocup", "RL"]

roboteam_secondary_ai:
image: roboteamtwente/roboteam:development
@@ -72,7 +70,7 @@ services:
- LD_LIBRARY_PATH=/home/roboteamtwente/lib/
volumes:
- ../../build/release/:/home/roboteamtwente/
profiles: ["simulator","diff"]
profiles: ["simulator","diff", "RL"]

roboteam_observer_game:
image: roboteamtwente/roboteam:development
@@ -111,7 +109,7 @@ services:
- LD_LIBRARY_PATH=/home/roboteamtwente/lib/
volumes:
- ../../build/release/:/home/roboteamtwente/
profiles: ["simulator","diff"]
profiles: ["simulator","diff", "RL"]

roboteam_robothub_game:
image: roboteamtwente/roboteam:development
@@ -147,7 +145,7 @@ services:
- 8080:8080
volumes:
- ../../roboteam_interface/:/home/roboteamtwente/
profiles: ["simulator","diff","game", "robocup"]
profiles: ["simulator","diff","game", "robocup", "RL"]

roboteam_autoref:
image: gradle:8.4.0-jdk17
@@ -164,7 +162,7 @@ services:
- GRADLE_USER_HOME=/home/roboteamtwente/.cache # Cache gradle dependencies
volumes:
- ../../roboteam_autoref/:/home/roboteamtwente/
profiles: ["simulator","diff","game","autoref"]
profiles: ["simulator","diff","game","autoref", "RL"]

erforce_autoref_sim:
image: roboteamtwente/roboteam:development
@@ -179,7 +177,7 @@ services:
- 10010:10010 # Tracker port
volumes:
- ../../:/home/roboteamtwente/
profiles: ["simulator","diff"]
profiles: ["simulator","diff", "RL"]

erforce_autoref_game:
image: roboteamtwente/roboteam:development
@@ -212,13 +210,17 @@ services:
- 30013:30013 #Simulation Feedback Port
volumes:
- ../../:/home/roboteamtwente/
profiles: ["simulator","diff"]
profiles: ["simulator","diff", "RL"]

ssl-game-controller:
image: robocupssl/ssl-game-controller:latest
container_name: RTT_ssl-game-controller
restart: always
network_mode: "host" # Workaround to connect from interface on host to AI websocket, please fix
network_mode: "host"
ports:
- "8081:8081/tcp" # UI port
profiles: ["simulator","diff","game"]
- "8081:8081/tcp"
volumes:
- ./ssl-game-controller-config:/config
- ./ssl-game-controller-data:/data
command: -address :8081
profiles: ["simulator","diff","game", "RL"]
133 changes: 133 additions & 0 deletions docker/runner/ray-cluster.yaml
@@ -0,0 +1,133 @@
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: roboteam-ray-cluster
spec:
  rayVersion: "2.38.0"
  # Head node configuration
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      metadata:
        labels:
          app: ray-head
      spec:
        hostNetwork: false
        containers:
        - name: ray-head
          image: roboteamtwente/ray:development
          imagePullPolicy: Always # Always pull the latest image
          ports:
          - containerPort: 8265 # dashboard port
          - containerPort: 6379 # redis port
          - containerPort: 10001 # GCS server port
          - containerPort: 8000 # Serve port
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi" # Increased from 256Mi
            limits:
              cpu: "1" # Changed from 600 (which was too high)
              memory: "2Gi" # Increased from 512Mi

          env:
          - name: SIMULATION_HOST
            value: "127.0.0.1" # Using localhost since we're on host network
          - name: VISION_PORT
            value: "10020" # Match your simulator's vision port
          - name: REFEREE_PORT
            value: "10003" # Match your simulator's referee port

          command: ["/bin/bash", "-c", "--"]
          args: ["ray start --head --port=6379 --dashboard-host=0.0.0.0 --block"]
          livenessProbe:
            exec:
              command:
              - bash
              - -c
              - "wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success"
            initialDelaySeconds: 90
            timeoutSeconds: 10
            periodSeconds: 30
            failureThreshold: 5
          readinessProbe:
            exec:
              command:
              - bash
              - -c
              - "wget -T 10 -q -O- http://localhost:8265/api/gcs_healthz | grep success"
            initialDelaySeconds: 90
            timeoutSeconds: 10
            periodSeconds: 30
            failureThreshold: 5


  # Worker node configuration
  workerGroupSpecs:
  - groupName: worker-group
    replicas: 1 # Number of worker nodes
    rayStartParams:
      num-cpus: "1"
    template:
      metadata:
        labels:
          app: ray-worker
      spec:
        hostNetwork: true
        containers:
        - name: ray-worker
          image: roboteamtwente/ray:development
          imagePullPolicy: Always # Always pull the latest image
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 1000m
              memory: 2Gi
          env:
          - name: RAY_HEAD_IP
            value: "roboteam-ray-cluster-head-svc.default.svc.cluster.local"
          - name: LD_LIBRARY_PATH
            value: /home/roboteam/build/release/lib
          - name: VISION_ADDRESS
            value: "224.5.23.2"
          - name: VISION_PORT
            value: "10020"
          - name: REFEREE_ADDRESS
            value: "224.5.23.1"
          - name: REFEREE_PORT
            value: "10003"
          - name: GC_PORT
            value: "8081"
          command: ["/bin/bash", "-c", "--"]
          args: ["ray start --address='roboteam-ray-cluster-head-svc.default.svc.cluster.local:6379' --block"]

#Add service for port forwarding
---
apiVersion: v1
kind: Service
metadata:
  name: roboteam-ray-cluster-head-nodeport # Changed name to avoid conflict
spec:
  type: NodePort
  selector:
    app: ray-head
  ports:
  - name: dashboard
    port: 8265
    targetPort: 8265
    nodePort: 30265 # Ray dashboard
  - name: redis
    port: 6379
    targetPort: 6379
    nodePort: 30679 # Redis
  - name: gcs
    port: 10001
    targetPort: 10001
    nodePort: 31001 # GCS server
  - name: serve
    port: 8000
    targetPort: 8000
    nodePort: 30800 # Serve