A question about running WAA on Azure #29

pixeli99 · 2024-10-13T15:16:07Z

Hi, this is an outstanding project.

However, since I am not familiar with Azure, I ran into some issues while trying to use it. I hope to get some general advice to help me pinpoint the problem. I followed the steps in the README exactly, except that I changed the size of the ComputeInstance to Standard_D2a_v4, because the free account does not seem to have quota for Standard_D8_v3. The error is as follows.

{
    "error": {
        "message": "Activity Failed:\n{\n    \"error\": {\n        \"code\": \"UserError\",\n        \"message\": \"UserErrorException:\\n\\tMessage: \\nError Code: ScriptExecution.StreamAccess.NotFound\\nNative Error: error in streaming from input data sources\\n\\tStreamError(NotFound)\\n=> stream not found\\n\\tNotFound\\nError Message: The requested stream was not found. Please make sure the request uri is correct.| session_id=3cae6541-8756-4a54-a0cf-1116939edff8\\n\\tInnerException None\\n\\tErrorResponse \\n{\\n    \\\"error\\\": {\\n        \\\"code\\\": \\\"UserError\\\",\\n        \\\"message\\\": \\\"\\\\nError Code: ScriptExecution.StreamAccess.NotFound\\\\nNative Error: error in streaming from input data sources\\\\n\\\\tStreamError(NotFound)\\\\n=> stream not found\\\\n\\\\tNotFound\\\\nError Message: The requested stream was not found. Please make sure the request uri is correct.| session_id=3cae6541-8756-4a54-a0cf-1116939edff8\\\"\\n    }\\n}\",\n        \"messageParameters\": {},\n        \"details\": []\n    },\n    \"time\": \"0001-01-01T00:00:00.000Z\",\n    \"componentName\": \"CommonRuntime\"\n}"
    }
}


francedot · 2024-10-13T15:41:54Z

You might want to ask for more Azure Compute Quota for your region depending on your needs. As a reference, we currently use the Standard_D8_v3 VM size for our benchmarking, which falls under the Standard Dv3 Family Cluster Dedicated vCPUs category (each VM uses 8 cores). In any case, make sure that the VM supports nested virtualization.

pixeli99 · 2024-10-13T16:58:02Z

Thank you very much for your timely reply. I understand what you mean. I will try to apply for a quota adjustment. Thank you!

pixeli99 · 2024-10-18T12:01:30Z

Everything is much better now, but I encountered this bug. Does the author have any suggestions? I'm not familiar with this.

This error doesn't seem to affect the startup of the Windows environment, and I can connect to the desktop via RDP. However, the message below is printed continuously and never stops, which prevents the subsequent steps from being executed.

Waiting for a response from the windows server. This might take a while...

rhmsd · 2024-10-25T15:37:26Z

Hi, this error occurs because the DNS service on the Azure ML Compute instance occupies network port 53. To fix it, add the line sudo systemctl stop named.service to the startup file (Users/<YOUR_USER>/compute-instance-startup.sh) in Notebooks of ML Studio, as shown below:

#!/bin/bash

echo "Initializing Compute VM at startup..."

# Install dos2unix
sudo apt-get install -y dos2unix

# Stop systemd-resolved, which listens on port 53
sudo systemctl stop systemd-resolved

# Stop DNS service on port 53
sudo systemctl stop named.service

# Stop nginx running on port 80
sudo service nginx stop
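
To confirm whether something on the compute instance is still holding port 53 after the startup script has run, a small check like the following may help. This is only a sketch: the function name `udp_port_in_use` is hypothetical, it tests UDP only (DNS also uses TCP), and checking port 53 itself needs root because ports below 1024 are privileged.

```python
import errno
import socket


def udp_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if another socket already holds (host, port) for UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            # Binding fails with EADDRINUSE when the address is already taken.
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Anything else (e.g. missing root for ports < 1024) is not
            # evidence that the port is busy, so surface it to the caller.
            raise
    return False
```

Calling `udp_port_in_use(53)` as root should return False once named.service and systemd-resolved have been stopped; if it still returns True, some other service is occupying the port.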

pixeli99 · 2024-10-27T05:46:17Z

I roughly understand what you mean, but this did not work for me; I still encountered the issue. My final solution was to add a line of code in WindowsAgentArena/scripts/azure_files/run_entry.py that changes the port number of dnsmasq.

print("display the content of /storage")  
os.system("ls -l /storage")  

# starts the VM and waits for it to fully load before proceeding
##### add this line
os.system('sed -i \'85s|.*|DNSMASQ_OPTS+=" --port 5353 --dhcp-range=$VM_NET_IP,$VM_NET_IP --dhcp-host=$VM_NET_MAC,,$VM_NET_IP,$VM_NET_HOST,infinite --dhcp-option=option:netmask,255.255.255.0"|\' /run/network.sh')
#####
os.system("/entry_setup.sh") # since it's in root we can just do /script.sh and don't need cd /

# launches the client script
os.system(f"cd /client && python run.py --agent_name {agent} --worker_id {worker_id} --num_workers {num_workers} --result_dir {result_dir} --test_all_meta_path {json_name} --model {model_name} --som_origin {som_origin} --a11y_backend {a11y_backend}")

print("Finished running entry script")

This does indeed allow me to run the WAA tests normally. Does the author think this is reasonable?
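
The sed call above overwrites line 85 of /run/network.sh unconditionally, so it would silently clobber an unrelated line if the script ever changes. A more defensive variant could check the line first. The following is a sketch under one assumption inferred from the sed command, not verified against the file: that the target line starts with `DNSMASQ_OPTS+="`. The helper name `set_dnsmasq_port` is hypothetical.

```python
def set_dnsmasq_port(script_text: str, port: int, line_no: int = 85) -> str:
    """Insert `--port <port>` into the DNSMASQ_OPTS line at line_no (1-indexed).

    Raises ValueError if that line is not a DNSMASQ_OPTS+= assignment, so an
    unexpected change in the script is noticed instead of being overwritten.
    """
    lines = script_text.splitlines(keepends=True)
    idx = line_no - 1
    prefix = 'DNSMASQ_OPTS+="'  # assumed shape of the target line
    if idx >= len(lines) or not lines[idx].lstrip().startswith(prefix):
        raise ValueError(f"line {line_no} is not a DNSMASQ_OPTS+= assignment")
    if "--port" in lines[idx]:
        return script_text  # already patched, leave the script untouched
    lines[idx] = lines[idx].replace(prefix, f"{prefix} --port {port}", 1)
    return "".join(lines)
```

In run_entry.py this would be used by reading /run/network.sh, passing its contents through the function with port 5353, and writing the result back before /entry_setup.sh is called.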

rhmsd · 2024-10-28T03:15:35Z

Glad you found a way to make it work by changing the DNSMASQ port to 5353. Note that if applications aren't configured to use this port, they may default to the standard DNS server on port 53, potentially leading to issues with resolving 'host.lan.' This could cause problems in dev, setup, or azure modes. Please verify that 'host.lan' resolves properly from the Windows VM. Just to clarify, I'm not the original author; these are just assumptions based on the code. :)
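
One way to run the 'host.lan' check is a short Python snippet executed inside the Windows VM. This is a minimal sketch, assuming Python is available there; the helper name `resolve_ipv4` is hypothetical.

```python
import socket
from typing import Optional


def resolve_ipv4(hostname: str) -> Optional[str]:
    """Return the IPv4 address the system resolver gives for hostname, or None."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        # The resolver could not find an address for this name.
        return None
```

If `resolve_ipv4("host.lan")` returns None inside the VM, the Windows side is most likely not reaching dnsmasq on its new port.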

francedot · 2024-10-31T05:27:12Z

When changing the dnsmasq port, please make sure that the training job can still connect to the internet. I remember taking this approach, only to discover that it broke our Chrome tasks.

We’ll keep the issue open while we collect more data.

beala · 2025-01-17T19:52:17Z

For anyone struggling with the original error message:

{
    "error": {
        "message": "Activity Failed:\n{\n    \"error\": {\n        \"code\": \"UserError\",\n        \"message\": \"UserErrorException:\\n\\tMessage: \\nError Code: ScriptExecution.StreamAccess.NotFound\\nNative Error: error in streaming from input data sources\\n\\tStreamError(NotFound)\\n=> stream not found\\n\\tNotFound\\nError Message: The requested stream was not found. Please make sure the request uri is correct.| session_id=3cae6541-8756-4a54-a0cf-1116939edff8\\n\\tInnerException None\\n\\tErrorResponse \\n{\\n    \\\"error\\\": {\\n        \\\"code\\\": \\\"UserError\\\",\\n        \\\"message\\\": \\\"\\\\nError Code: ScriptExecution.StreamAccess.NotFound\\\\nNative Error: error in streaming from input data sources\\\\n\\\\tStreamError(NotFound)\\\\n=> stream not found\\\\n\\\\tNotFound\\\\nError Message: The requested stream was not found. Please make sure the request uri is correct.| session_id=3cae6541-8756-4a54-a0cf-1116939edff8\\\"\\n    }\\n}\",\n        \"messageParameters\": {},\n        \"details\": []\n    },\n    \"time\": \"0001-01-01T00:00:00.000Z\",\n    \"componentName\": \"CommonRuntime\"\n}"
    }
}

This is likely because the golden image was not uploaded to the datastore_input_path specified in experiments.json. The default experiments.json expects the image to be in a subdirectory named storage.

When I ran:

az storage blob upload-batch [...]  --source src/win-arena-container/vm/storage

It uploaded the contents of storage to the root path without creating the subdir, so when the job went looking for the storage subdir, it couldn't find it and crashed with this error.
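
One way to avoid this is to pass `--destination-path storage` to `az storage blob upload-batch`, so the blobs land under a storage/ prefix in the container. Below is a sketch that builds such a command from Python. The helper name `build_upload_command` and the container name in the usage note are hypothetical, and the account and authentication flags elided above are passed through unchanged via `extra_args`.

```python
from typing import List, Optional, Sequence


def build_upload_command(
    source_dir: str,
    container: str,
    destination_path: str = "storage",
    extra_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an `az storage blob upload-batch` call that keeps a blob prefix."""
    cmd = [
        "az", "storage", "blob", "upload-batch",
        "--source", source_dir,
        "--destination", container,
        # Without this flag the files are uploaded to the container root.
        "--destination-path", destination_path,
    ]
    cmd.extend(extra_args or [])  # account/auth flags, supplied by the caller
    return cmd
```

For example, `subprocess.run(build_upload_command("src/win-arena-container/vm/storage", "my-container", extra_args=[...]), check=True)` would run the upload, with "my-container" and the extra flags replaced by your own values.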

pixeli99 closed this as completed Oct 13, 2024

pixeli99 reopened this Oct 18, 2024

rhmsd mentioned this issue Nov 6, 2024

Connection Timeout Error on run.py Script when Executing Task File #37

Open
