Unspecific search - peptidomics #167

Open
ErikHartman opened this issue Jan 29, 2025 · 1 comment

ErikHartman commented Jan 29, 2025

Hi,

I'm trying to use Sage to search some peptidomics data. A search with PEAKS X identified ~3k peptides. I used the splitting tactic suggested in #154 and #97: split the FASTA into chunks, search each chunk, and concatenate the results. In the end, the Sage search yielded 0 peptides. I likely did something stupid, but I don't know what.

Here is my config file:

{
  "database": {
    "bucket_size": 8192,
    "enzyme": {
      "missed_cleavages": 2,
      "min_len": 7,
      "max_len": 50,
      "cleave_at": ""
    },
    "fragment_min_mz": 150.0,
    "fragment_max_mz": 2000.0,
    "peptide_min_mass": 500.0,
    "peptide_max_mass": 5000.0,
    "ion_kinds": [
      "b",
      "y"
    ],
    "min_ion_index": 2,
    "max_variable_mods": 3,
    "static_mods": {
      "C": 57.0215
    },
    "variable_mods": {
      "M": 15.994
    },
    "decoy_tag": "rev_",
    "generate_decoys": true,
    "fasta": "./UP000005640_9606.fasta"
  },
  "quant": {
    "lfq": true,
    "lfq_settings": {
      "peak_scoring": "Hybrid",
      "integration": "Sum",
      "spectral_angle": 0.6,
      "ppm_tolerance": 5.0
    }
  },
  "precursor_tol": {
    "ppm": [
      -20.0,
      20.0
    ]
  },
  "fragment_tol": {
    "ppm": [
      -20.0,
      20.0
    ]
  },
  "isotope_errors": [
    0,
    2
  ],
  "deisotope": true,
  "min_peaks": 15,
  "max_peaks": 150,
  "max_fragment_charge": 1,
  "min_matched_peaks": 4,
  "predict_rt": true,
  "output_directory": ".",
  "mzml_paths": [
    "./SK_T2501_75uL_100SPE_60SPD_250116_4_S1-A1_1_3536.d"
  ]
}

Is there some blatant error here?

Example output for one chunk:

WARNING - Sage stderr for slice 4:
[2025-01-29T12:27:49Z WARN  sage_core::modification] Variable modifications must be specified as a list of modifications: [15.994]. This will become a HARD ERROR by v0.15
[2025-01-29T12:31:41Z INFO  sage] generated 7732957836 fragments, 278827623 peptides in 232567ms
[2025-01-29T12:31:41Z INFO  sage] processing files 0 .. 1 
[2025-01-29T12:31:42Z INFO  sage] - file IO:     1224 ms
[2025-01-29T12:32:01Z INFO  sage] - search:     18922 ms (1477 spectra/s)
[2025-01-29T12:32:01Z INFO  sage_core::ml::retention_alignment] aligning file #0: y = 1.0000x + 0.0000
[2025-01-29T12:32:01Z INFO  sage_core::ml::retention_alignment] aligned retention times across 1 files
[2025-01-29T12:32:01Z INFO  sage_core::ml::retention_model] - fit retention time model, rsq = 0.9703626922080913
[2025-01-29T12:32:01Z INFO  sage_core::ml::mobility_model] - fit mobility model, rsq = 0.9830665423653131, mse = 0.00006177281566545257
[2025-01-29T12:32:02Z INFO  sage_core::lfq] tracing MS1 features
[2025-01-29T12:32:02Z INFO  sage_core::lfq] integrating MS1 features
[2025-01-29T12:32:02Z INFO  sage] discovered 0 target MS1 peaks at 5% FDR
[2025-01-29T12:32:02Z INFO  sage] discovered 193 target peptide-spectrum matches at 1% FDR
[2025-01-29T12:32:02Z INFO  sage] discovered 0 target peptides at 1% FDR
[2025-01-29T12:32:02Z INFO  sage] discovered 0 target proteins at 1% FDR
[2025-01-29T12:32:03Z INFO  sage] finished in 253s
[2025-01-29T12:32:03Z INFO  sage] cite: "Sage: An Open-Source Tool for Fast Proteomics Searching and Quantification at Scale" https://doi.org/10.1021/acs.jproteome.3c00486

lazear (Owner) commented Jan 29, 2025

Nothing looks terribly wrong at first glance. Maybe try widening the tolerances and see if it helps?
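
For anyone who wants to try this on a config like the one above, here is a minimal Python sketch that widens both ppm windows and also rewrites scalar `variable_mods` values as lists, which is what the warning in the log asks for. The function names and the 50 ppm defaults are assumptions for illustration only, not recommended settings.

```python
import copy
import json


def patch_config(config, precursor_ppm=50.0, fragment_ppm=50.0):
    """Return a copy of a Sage-style config dict with wider ppm tolerances.

    The default widths are arbitrary illustration values (assumption).
    """
    patched = copy.deepcopy(config)  # leave the caller's dict untouched
    patched["precursor_tol"] = {"ppm": [-precursor_ppm, precursor_ppm]}
    patched["fragment_tol"] = {"ppm": [-fragment_ppm, fragment_ppm]}

    # The log warns that variable modifications must be given as lists,
    # e.g. {"M": [15.994]} rather than {"M": 15.994}.
    mods = patched.get("database", {}).get("variable_mods", {})
    for residue, mass in list(mods.items()):
        if not isinstance(mass, list):
            mods[residue] = [mass]
    return patched


def write_config(config, path):
    """Write the config dict to `path` as indented JSON."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2)
```

Load the original file with `json.load`, pass the dict through `patch_config`, and write the result to a new file so the original config stays available for comparison.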

I would also recommend checking the overlap of peptides/PSMs versus PEAKS (ignoring FDR control, since Sage writes everything to its output) as a sanity check. It's possible that LDA isn't a great pick for rescoring in this case, so you could try something like Mokapot to aggregate the results across all slices.
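
A rough sketch of that overlap check in Python, ignoring FDR as suggested. It assumes each tool's output is a delimited text file with a peptide column (the column name is passed in, since it differs between tools and versions) and that modifications are written inline in square brackets or parentheses. Flanking residues or other notations would need extra handling. All function names here are hypothetical.

```python
import csv
import re

# Inline modification annotations such as [+15.9949] or (+15.99) (assumed notation).
_MOD = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def bare_sequence(peptide):
    """Strip modification annotations and any non-residue characters."""
    return re.sub(r"[^A-Z]", "", _MOD.sub("", peptide))


def read_peptides(path, column, delimiter="\t"):
    """Collect the set of bare peptide sequences from one results file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        return {bare_sequence(row[column]) for row in reader}


def overlap(sage_peptides, peaks_peptides):
    """Count shared and tool-specific peptide sequences."""
    return {
        "shared": len(sage_peptides & peaks_peptides),
        "sage_only": len(sage_peptides - peaks_peptides),
        "peaks_only": len(peaks_peptides - sage_peptides),
    }
```

To cover all slices, take the union of `read_peptides` over each slice's results file before calling `overlap`. If the shared count is close to the PEAKS total, the search itself is probably fine and the problem is more likely in rescoring or FDR estimation.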