Enhanced Troubleshooting Guides #1884

Draft
wants to merge 10 commits into
base: main
4 changes: 1 addition & 3 deletions source/docs/software/basic-programming/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,4 @@ Basic Programming
git-getting-started.rst
cpp-units
joystick
robot-preferences
using-test-mode
reading-stacktraces
robot-preferences
Expand Up @@ -82,7 +82,7 @@ When diagnosing robot issues, there is no substitute for thorough knowledge of t

.. note:: Note that all log files shown in this section have been scaled to match length using the Match Length button and then scrolling to the beginning of the autonomous mode. Also, many of the logs do not contain battery voltage information, the platform used for log capture was not properly wired for reporting the battery voltage.

.. tip:: Some error messages that are found in the Log Viewer are show below and more are detailed in the :doc:`driver-station-errors-warnings` article.
.. tip:: Some error messages that are found in the Log Viewer are shown below, and more are detailed in the :doc:`/docs/software/troubleshooting/driver-station-errors-warnings` article.

"Normal" Log
~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion source/docs/software/driverstation/driver-station.rst
Expand Up @@ -65,7 +65,7 @@ The Operations Tab is used to control the mode of the robot and provide addition
- Teleoperated Mode causes the robot to run the code in the Teleoperated portion of the match.
- Autonomous Mode causes the robot to run the code in the Autonomous portion of the match.
- Practice Mode causes the robot to cycle through the same transitions as an FRC match after the Enable button is pressed (timing for practice mode can be found on the setup tab).
- :doc:`Test Mode </docs/software/basic-programming/using-test-mode>` is an additional mode where test code that doesn't run in a regular match can be tested.
- :doc:`Test Mode </docs/software/troubleshooting/using-test-mode>` is an additional mode where test code that doesn't run in a regular match can be tested.

2. Enable/Disable - These controls enable and disable the robot. See also `Driver Station Key Shortcuts`_.
3. Elapsed Time - Indicates the amount of time the robot has been enabled.
1 change: 0 additions & 1 deletion source/docs/software/driverstation/index.rst
Expand Up @@ -7,7 +7,6 @@ Driver Station
driver-station
driver-station-best-practices
driver-station-log-viewer
driver-station-errors-warnings
programming-radios-for-fms-offseason
imaging-your-classmate
manually-setting-the-driver-station-to-start-custom-dashboard
10 changes: 10 additions & 0 deletions source/docs/software/troubleshooting/can-bus.rst
@@ -0,0 +1,10 @@
Debugging CAN-Related Problems
==============================

Usual symptoms

wiring

termination


40 changes: 40 additions & 0 deletions source/docs/software/troubleshooting/code-build.rst
@@ -0,0 +1,40 @@
Debugging Issues while Building Code
====================================

Common Symptoms
---------------


gradlew is not recognized...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``gradlew is not recognized as an internal or external command`` is a common error that can occur when the project or directory that you are currently in does not contain a ``gradlew`` file. This usually occurs when you open the wrong directory.

.. image:: images/reading-stacktraces/bad-gradlew-project.png
   :alt: Image showing that the left-hand VS Code sidebar does not contain gradlew

In the above screenshot, you can see that the left-hand sidebar does not contain many files. At a minimum, VS Code needs the following files to properly build and deploy your project.

- ``gradlew``
- ``build.gradle``
- ``gradlew.bat``

If any of the above files is missing from your project directory, there are two possible causes.

- A corrupt or bad project.
- You are in the wrong directory.
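
The check described above can also be scripted. The following is a minimal Python sketch, not part of WPILib or VS Code; the function name ``missing_build_files`` is hypothetical. It reports which of the three files listed above are absent from a directory.

```python
from pathlib import Path

# Files the text above lists as the minimum needed to build and deploy.
REQUIRED_FILES = ("gradlew", "build.gradle", "gradlew.bat")


def missing_build_files(project_dir):
    """Return the required files that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list suggests the directory is a complete project root. A non-empty list means either the wrong directory is open or the project is missing files.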

Fixing gradlew is not recognized...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``gradlew is not recognized...`` is a fairly easy problem to fix. First identify the problem source:

**Are you in the wrong directory?**

- Verify which directory is the root of your project (the one containing the files listed above) and open that directory in VS Code.

**Is your project missing essential files?**

- This issue is more complex to solve. The recommended solution is to :ref:`recreate your project <docs/software/vscode-overview/creating-robot-program:Creating a Robot Program>` and manually copy the necessary code in.


Driving toward Root Cause
-------------------------
52 changes: 52 additions & 0 deletions source/docs/software/troubleshooting/common-field-problems.rst
@@ -0,0 +1,52 @@
Common Field Problems
=====================

This article details some of the common problems that can plague your robot when it's on the field. It can be extremely frustrating and stressful when your robot breaks down. This article aims to help you find the problem and its resolution.

.. important:: Remember to never eliminate any possibility! It never hurts to double or even triple check that everything is working properly.

Robot is stuttering and the RSL lights are dimming
--------------------------------------------------

Whenever your robot moves in a jerky fashion and the RSL lights are dimming, this is usually a sign of :doc:`brownouts </docs/software/roborio-info/roborio-brownouts>`. One of the first steps you can take to resolve a brownout is to identify when it occurred and any notable correlating events. Did you go into a match with your battery too low? Are you drawing too much current somehow? Can you reproduce this in the pit?

One of the most useful tools for identifying brownout causes is the :doc:`driver station log viewer </docs/software/driverstation/driver-station-log-viewer>`.

.. image:: /docs/software/roborio-info/images/identifying-brownouts.png

In the above image, you can see the brownout indicated by the highlighted orange line. The orange line represents dips (or lack of a straight line) in robot voltage.

Joystick inputs seem to be dropping
-----------------------------------

A characteristic sign of lost joystick inputs is that you press a button or move an axis and nothing happens! This can happen for a variety of reasons, so it's important to work out which one is most likely in your situation.

.. todo:: looking at the driverstation log and identifying if lost joysticks is a code related .. error:: text

.. important:: There is a current :ref:`known issue <docs/yearly-overview/known-issues:onboard i2c causing system lockups>` where I2C reads can take a long time or lock up the roboRIO.

Let's begin by asking a question. Can you reliably reproduce this issue at home or in the pits? This step is critical and assumptions *must not* be made.

Yes, I can
^^^^^^^^^^

This eliminates bandwidth or connectivity issues to the FMS. Some areas to explore are:

- Are joysticks working properly?

  - Sometimes the issue can be as simple as a flaky USB cable or joystick.

- Is the computer running slow or sluggish? Try restarting it.

  - High CPU or disk utilization can indicate that the Driver Station itself is sending inputs late.

- Is the code doing any long computations or loops? (Misuse of ``for`` and ``while`` loops is a common problem.)

  - In most cases, loops in FRC robot code can be avoided, except in rare circumstances.
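
To illustrate the last point: robot code is typically called once per control loop iteration, so a ``while`` loop that waits for something keeps everything else from running, including joystick handling. The sketch below is plain Python rather than the WPILib API; ``TimedAction``, ``periodic`` and the injected ``clock`` are hypothetical names chosen for illustration. It shows a timed wait written as a check that returns immediately on every call.

```python
import time


class TimedAction:
    """Runs an action for a fixed duration without blocking the caller."""

    def __init__(self, duration_s, clock=time.monotonic):
        self.duration_s = duration_s
        self.clock = clock  # injectable so the logic can be tested
        self.start_time = None

    def start(self):
        """Begin timing from the current clock value."""
        self.start_time = self.clock()

    def periodic(self):
        """Call once per loop iteration; returns True while the action is active."""
        if self.start_time is None:
            return False
        if self.clock() - self.start_time >= self.duration_s:
            self.start_time = None  # finished; idle until start() is called again
            return False
        return True
```

The blocking equivalent would spin in a ``while`` loop until the duration elapsed. Here the caller asks "is it still active?" on each iteration and carries on with its other work in between.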

No, I cannot
^^^^^^^^^^^^

This is likely a bandwidth or IP configuration issue. Try setting your IP configurations to :ref:`DHCP <docs/networking/networking-introduction/ip-configurations:in the pits dhcp configuration>` or :ref:`Static <docs/networking/networking-introduction/ip-configurations:in the pits static configuration>`. Another potential problem could be excessive bandwidth utilization. Try :ref:`measuring your bandwidth utilization <docs/networking/networking-introduction/measuring-bandwidth-usage:viewing bandwidth usage>`.

Unable to connect to your robot?
--------------------------------

See :ref:`docs/software/troubleshooting/networking:Usual Symptoms`
112 changes: 112 additions & 0 deletions source/docs/software/troubleshooting/gathering-information.rst
@@ -0,0 +1,112 @@
Gathering Debug Information
===========================

During the cycle of troubleshooting, a key step is to gather data. A large amount of the behavior of a robot's control system is *hidden* from view, and requires special tools to observe. While not exhaustive, the following is a list of common tools that robot software developers should be familiar with.

Driver Station
--------------

The National Instruments Driver Station is the first place to check when the robot does not behave as expected.

In particular, the :ref:`Diagnostics Tab <docs/software/driverstation/driver-station:Diagnostics Tab>` and :ref:`Messages Tab <docs/software/driverstation/driver-station:Messages Tab>` frequently contain the minimum info needed to start driving toward root cause on a problem.

Additionally, the :ref:`Log File Viewer <docs/software/driverstation/driver-station-log-viewer:Driver Station Log File Viewer>` provides more detailed time-series graphs of key data values and message logs.

rioLog
------

rioLog is a utility built into the WPILib suite and VS Code. It allows you to remotely view all of the :code:`stdout` and :code:`stderr` messages from your robot program. This includes all warnings, error messages, and print statements that your robot program generates. You can write your own software to generate these messages, as well as read the messages produced by WPILib or a 3rd party.

See :ref:`Riolog VS Code Plugin <docs/software/vscode-overview/viewing-console-output:Riolog VS Code Plugin>` for more info.
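
As a small illustration of generating such messages, the sketch below is plain Python using only the standard library; ``report_warning`` is a hypothetical helper, not a WPILib function. It writes routine status text to ``stdout`` and problems to ``stderr``, the two streams mentioned above.

```python
import sys


def report_warning(message, stream=None):
    """Write a warning line to stderr (or the given stream) and return the line."""
    stream = sys.stderr if stream is None else stream
    line = f"WARNING: {message}"
    print(line, file=stream)
    return line


# Routine status messages go to stdout; problems go to stderr.
print("Autonomous routine selected: drive forward")
report_warning("gyro not calibrated, using default heading")
```

A consistent prefix such as ``WARNING:`` makes your own messages easier to pick out among those produced by libraries.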

Command Line Utilities
----------------------

The Windows command prompt has a number of useful tools for troubleshooting.

The Windows command prompt may be accessed from the start menu. It is named :code:`cmd.exe`. The commands we describe here should be typed into the command prompt.

Using :code:`ping`
^^^^^^^^^^^^^^^^^^

:code:`ping` is a utility built into Windows which allows for a basic network connection check between two points. It confirms basic functionality of both the physical layer (wiring or wireless), and a small portion of software.

It can be invoked by typing :code:`ping`, followed by a space, followed by the IP address to be checked, followed by Enter. For example, checking the IP address :code:`10.12.34.1`:

.. code-block:: console

   C:\Users\YOUR_USER>ping 10.12.34.1

   Pinging 10.12.34.1 with 32 bytes of data:
   Reply from 10.12.34.1: bytes=32 time=3ms TTL=128
   Reply from 10.12.34.1: bytes=32 time=3ms TTL=128
   Reply from 10.12.34.1: bytes=32 time=3ms TTL=128
   Reply from 10.12.34.1: bytes=32 time=3ms TTL=128

   Ping statistics for 10.12.34.1:
       Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
   Approximate round trip times in milli-seconds:
       Minimum = 3ms, Maximum = 3ms, Average = 3ms

This shows four test "pings" being sent, and the device with IP address :code:`10.12.34.1` responding with a "Yup, I hear ya!" message within three milliseconds.

If none of the pings receive a response, it likely indicates a total failure that prevents communication: perhaps a cable is unplugged, the device is turned off, or the device doesn't have the expected IP address.

If only some of the packets come back, it indicates a partial failure that prevents some communication. Perhaps a cable is loose, or the wifi network is being rate limited or interfered with.
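
The three outcomes above (all replies, no replies, some replies) can be read off the ``Packets:`` summary line. The following is a minimal Python sketch that assumes the Windows output format shown in the example; ``classify_ping`` and its return labels are hypothetical names. It only inspects the summary line, so treat it as a first-pass check.

```python
import re

# Matches the summary line printed by the Windows ping utility, e.g.
# "Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),"
_SUMMARY = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def classify_ping(output):
    """Classify ping output as 'ok', 'partial', 'none' or 'unknown'."""
    match = _SUMMARY.search(output)
    if match is None:
        return "unknown"  # the output did not contain a summary line
    sent, received, _lost = (int(group) for group in match.groups())
    if sent == 0:
        return "unknown"
    if received == sent:
        return "ok"  # every ping was answered
    if received == 0:
        return "none"  # total failure: nothing came back
    return "partial"  # some packets were lost
```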

Using :code:`ipconfig`
^^^^^^^^^^^^^^^^^^^^^^

:code:`ipconfig` is a utility built into Windows which summarizes the configuration of the network interfaces on the device. It can help confirm your computer is actually attached to a robot network, and should be capable of communicating with robot components.

It is invoked simply by typing :code:`ipconfig` and hitting Enter.

Here is an example of running it on a computer where none of the listed network interfaces is connected:

.. code-block:: console

   C:\Users\YOUR_USER>ipconfig

   Windows IP Configuration


   Wireless LAN adapter Local Area Connection* 1:

      Media State . . . . . . . . . . . : Media disconnected
      Connection-specific DNS Suffix  . :

   Wireless LAN adapter Wi-Fi:

      Media State . . . . . . . . . . . : Media disconnected
      Connection-specific DNS Suffix  . :

Here is another example, with the computer properly connected to team 1234's robot over wifi:

.. code-block:: console

   C:\Users\YOUR_USER>ipconfig

   Windows IP Configuration


   Wireless LAN adapter Wi-Fi:

      Connection-specific DNS Suffix  . : localdomain
      Link-local IPv6 Address . . . . . : fe80::890d:bbae:d81c:d416%7
      IPv4 Address. . . . . . . . . . . : 10.12.34.210
      Subnet Mask . . . . . . . . . . . : 255.255.255.0
      Default Gateway . . . . . . . . . : 10.12.34.1
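
In the example above, team 1234's computer has an address beginning with ``10.12.34``. The Python sketch below automates that comparison. It assumes the ``10.TE.AM.x`` addressing pattern seen in the example (team 1234 maps to ``10.12.34.x``) and the Windows ``ipconfig`` line format shown above; the function names are hypothetical.

```python
import ipaddress
import re

# Matches lines such as "IPv4 Address. . . . . . . . . . . : 10.12.34.210"
_IPV4_LINE = re.compile(r"IPv4 Address[ .]*: (\d+\.\d+\.\d+\.\d+)")


def robot_network(team):
    """Return the 10.TE.AM.0/24 network for a team number (assumed convention)."""
    return ipaddress.ip_network(f"10.{team // 100}.{team % 100}.0/24")


def addresses_on_robot_network(ipconfig_output, team):
    """Return the IPv4 addresses in ipconfig output that fall on the team's network."""
    network = robot_network(team)
    found = _IPV4_LINE.findall(ipconfig_output)
    return [addr for addr in found if ipaddress.ip_address(addr) in network]
```

An empty result suggests the computer is not attached to the robot network, which matches the "Media disconnected" case shown earlier.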


Manufacturer-Specific Interfaces
--------------------------------

3rd party manufacturers support custom interfaces to help address problems that are specific to their hardware. These include:

* `REV Robotics Hardware Client <https://docs.revrobotics.com/rev-hardware-client/>`__
* `Cross the Road Electronics Phoenix Framework <https://docs.ctre-phoenix.com/en/stable/ch05_PrepWorkstation.html>`__
* `Playing with Fusion's Web-Based Configuration <https://www.youtube.com/watch?v=LMuq73Vojw8&t=336s>`__

REV Robotics, Cross the Road Electronics, and Playing with Fusion all supply additional utilities for configuring and troubleshooting their hardware.


16 changes: 16 additions & 0 deletions source/docs/software/troubleshooting/index.rst
@@ -0,0 +1,16 @@
Troubleshooting
===============

.. toctree::
   :maxdepth: 1

   introduction.rst
   gathering-information.rst
   code-build.rst
   driver-station-errors-warnings.rst
   common-field-problems.rst
   reading-stacktraces.rst
   using-test-mode.rst
   loop-overruns.rst
   can-bus.rst
   networking.rst
83 changes: 83 additions & 0 deletions source/docs/software/troubleshooting/introduction.rst
@@ -0,0 +1,83 @@
Introduction to Troubleshooting
Collaborator

It's probably worth mentioning keeping a written record of the steps that were taken (why they were taken), and what the observations were.

===============================

*Troubleshooting* is the art of identifying the causes of problems, and using that knowledge to iterate toward a better solution.

Issues Will Happen
------------------

Every robot will experience problems. These can be frustrating! Rest assured, fixing these issues is something every team goes through in a season.

This section of the docs is designed to help teams identify and fix common robot issues which have control-system root causes. While not an exhaustive list of all possible issues, the hope is to provide general guidance and specific examples to reduce the most common pain points.

Symptom vs. Root Cause
----------------------

When troubleshooting, be sure to separate *Symptom* and *Root Cause*.

The *Symptom* is the behavior you actually observe, which is not correct. For example, a robot which can only turn in place (and cannot drive straight) is a *symptom* a team might observe.

The *Root Cause* is the incorrect software, electrical hookup, or mechanical fault which actually caused the symptom to occur.

When troubleshooting effectively, a team will work backward from the observed symptom, to the root cause. Ideally, the root cause gets fixed, and in turn the symptom stops manifesting.

Sometimes, resource constraints might make a team "patch over" a symptom without identifying or fixing root cause. Teams should tread cautiously here, as patches are prone to break or cause more issues later on.

On Working Methodically
-----------------------

Effective troubleshooting requires teams to work methodically.

The Scientific Process, in Real-Time
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The core of all troubleshooting strategies is the same as the scientific process. Namely:

#. Observe the world around you
#. Propose a hypothesis
#. Design and execute a test of that hypothesis
#. Observe and interpret the results
#. Repeat

In the case of most FRC robot troubleshooting, the hypothesis will be relatively small. A valid hypothesis could simply be "If I add a ``* -1`` to line 354 of my code, it should fix the motor that's running backward". The experiment would then be to make the change, upload the code, and attempt to reproduce the backward motor issue. If the motor is now running in the correct direction, it is reasonable to assume the hypothesis was correct, and no further action is needed. However, if the issue persists, one could assume the hypothesis was not entirely correct, and the process must be repeated with a new hypothesis.

Change One Variable at a Time
Collaborator

It may also be useful here to say that if a troubleshooting step did not change the observed symptom, it should be undone.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When interpreting the results of an experiment, it is critical that the experiment has controlled for all but one variable. Having only one changing variable is what allows experiment results to be interpreted to a single root cause.

If many variables change, and the problem goes away, you will not know which variable actually fixed the root cause.

While it may be tempting to change a lot of things hoping one of them fixes the issue, this will likely lead to a lot of things changed unnecessarily.

Undirected Guess and Check is Ineffective
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A naïve approach to troubleshooting will start by assuming *anything* could be the root cause, and pursue each option one by one. However, in a large and complex system (like a robot), the number of possibilities can be too large to effectively test each one, one by one.

From this perspective, it is best to start by making a few assumptions about what root causes are *most likely*, and test those first. As you get more experience doing troubleshooting, you'll gain a better intuition for where to start looking for problems.

However, keep in mind that these are *assumptions*. They're educated guesses as to where the problem *likely is not*, not exhaustive proof that a problem doesn't exist. Always be ready to go back and undo your assumptions if needed.

Be Egoless
^^^^^^^^^^

When troubleshooting, emotions can often start flying, as it sometimes appears blame is being placed. People can get defensive when their component or their design is called into question.

It's important to keep in mind that everyone is on the same team, working toward the same goal. Be careful to choose words and descriptions which describe and judge ideas, not people. Furthermore, try not to let your own ego and biases get in the way of considering possible faults in the systems you are responsible for.

Single vs. Multiple Points of Failure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The most effective way to troubleshoot is to start by assuming that a *single point of failure* has triggered the symptom. For well designed, simple systems, this is usually the case.

However, as systems get larger and more complex, it's very possible multiple failures might exist. While this should always be a *secondary* assumption, be careful not to ignore the fact a symptom may be caused by multiple, interacting failures.

Practice, Practice, Practice
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Troubleshooting is a learned skill. While there are few concrete facts and figures to memorize, seeing examples of failures and their root causes over and over again is the best way to get better at isolating root causes from symptoms.

One will often see more experienced mentors or students look at an issue and quickly state a root cause. And, often, they'll be correct. Rest assured, this ability isn't magical or genetic - it's learned. Folks who are good at troubleshooting will *still* go through all the steps and processes these docs describe. However, they draw from a broader set of exposure to recognize patterns faster, and eliminate unlikely possibilities.

Be intentional about spending time practicing troubleshooting, and try not to worry if it takes you longer than it takes others.