[WIP] Autogptq checkpoint conversion support #82
base: main
Conversation
I think we should discuss the converter design in more detail with Rob and Michael. I like the approach of modular transformations, but it would be ideal if this could more closely match the compressors design so we could simplify the code. The end goal is making it easy for a 3rd party to implement a new converter.
class ConverterNames(Enum):
    EXLLAMA_TO_COMPRESSED_TENSOR = "exllama_to_compressed_tensor"
I think the "_to_compressed_tensor" part of this can be removed; it makes the name too wordy. We can assume that we are always going from another format to compressed-tensors.
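For illustration, a trimmed-down version of the enum might look like this (the shortened member name is a suggestion, not an agreed API):

```python
from enum import Enum

class ConverterNames(Enum):
    # the compressed-tensors destination is implied, so only the source format is named
    EXLLAMA = "exllama"
```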
filepath_: Path = Path(filepath)
if not save_dir:
    save_dir = "compressed_tensors_model"
nit: could this be a constant at the top of the file rather than in the function?
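A minimal sketch of that nit (the constant name is hypothetical):

```python
from pathlib import Path

DEFAULT_SAVE_DIR = "compressed_tensors_model"  # hypothetical module-level constant

def convert(filepath: str, save_dir: str = None) -> str:
    # fall back to the module-level default instead of a literal inside the function
    save_dir_ = Path(save_dir or DEFAULT_SAVE_DIR)
    save_dir_.mkdir(exist_ok=True, parents=True)
    return str(save_dir_)
```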
save_dir_: Path = Path(save_dir)
save_dir_.mkdir(exist_ok=True, parents=True)

metadata = {"format": "pt", "source": "Created by SparseML"}
We should change this to "Created by compressed-tensors" instead.
for file in filepath_.glob("*.safetensors"):
    _LOGGER.info(f"Loading file: {file}")
    new_state_dict = {}
    state_dict: Iterable[StateDictType] = load_safetensors_state_dict(
        file, by_layers=True
    )
    for layer_state_dict in state_dict:
        new_state_dict.update(
            cls.translate(state_dict=layer_state_dict, **kwargs)
        )

    if new_state_dict:
        save_file(
            new_state_dict,
            filename=save_dir_ / file.name,
            metadata=metadata,
        )
_copy_non_safetensor_files_(filepath_, save_dir_)
_update_quantization_config(filepath_, save_dir_)

elif filepath_.is_file():
    new_state_dict = {}
    state_dict: Iterable[StateDictType] = load_safetensors_state_dict(
        filepath_, by_layers=True
    )
    for layer_state_dict in state_dict:
        new_state_dict.update(cls.translate(state_dict=layer_state_dict))

    save_file(
        new_state_dict, filename=save_dir_ / filepath_.name, metadata=metadata
    )

return str(save_dir_)
A lot of this logic can be simplified by using the existing get_weight_mappings helper in compressed_tensors along with the safetensors get_tensor() function.
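A rough sketch of that simplification, assuming get_weight_mappings returns a {tensor_name: safetensors_file} mapping (the import path and exact signature may differ):

```python
from safetensors import safe_open
from safetensors.torch import save_file
from compressed_tensors.utils.safetensors_load import get_weight_mappings

def convert_checkpoint(model_dir: str, save_path: str, translate) -> None:
    # resolve every tensor name to the safetensors file that holds it
    weight_mappings = get_weight_mappings(model_dir)
    state_dict = {}
    for name, file in weight_mappings.items():
        with safe_open(file, framework="pt", device="cpu") as f:
            state_dict[name] = f.get_tensor(name)
    # translate the full state_dict in one pass, then write a single file
    save_file(translate(state_dict), save_path)
```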
for layer_state_dict in state_dict:
    new_state_dict.update(cls.translate(state_dict=layer_state_dict))
Why is this applied one layer at a time? I thought a translation would happen for the whole state_dict, not a single layer. There could be a case where a translation needs multiple keys to be completed.
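For example, a GPTQ layer needs qweight, qzeros and scales together, which a per-layer translate cannot always see at once. A purely illustrative sketch, with hypothetical target key names and no real repacking:

```python
def translate(state_dict: dict) -> dict:
    # illustrative only: group the related GPTQ tensors for each module
    new_state_dict = {}
    for key, tensor in state_dict.items():
        if key.endswith(".qweight"):
            prefix = key[: -len(".qweight")]
            # the translation needs two additional keys from the same module
            qzeros = state_dict[f"{prefix}.qzeros"]
            scales = state_dict[f"{prefix}.scales"]
            new_state_dict[f"{prefix}.weight_packed"] = tensor
            new_state_dict[f"{prefix}.weight_zero_point"] = qzeros
            new_state_dict[f"{prefix}.weight_scale"] = scales
        elif not key.endswith((".qzeros", ".scales")):
            # everything else passes through unchanged
            new_state_dict[key] = tensor
    return new_state_dict
```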
for file in source_dir.glob("*"):
    if file.suffix != ".safetensors":
        _LOGGER.info(f"Copying file: {file} to {dest_dir}")
        shutil.copy(file, dest_dir / file.name)
FYI this will copy over "model.safetensors.index.json", not sure if this is desired
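If the index file is in fact unwanted (an assumption), the copy loop could skip it explicitly:

```python
import shutil
from pathlib import Path

SKIP_NAMES = {"model.safetensors.index.json"}

def _copy_non_safetensor_files_(source_dir: Path, dest_dir: Path) -> None:
    for file in source_dir.glob("*"):
        # skip safetensors shards and the sharded-index json
        if file.suffix != ".safetensors" and file.name not in SKIP_NAMES:
            shutil.copy(file, dest_dir / file.name)
```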
:param source_dir: The directory containing the original config.json file
:param dest_dir: The directory to save the updated config.json file
"""
from sparseml.transformers import SparseAutoConfig
We should definitely not be importing from sparseml in this repo; can we just use AutoConfig?
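A minimal sketch of the suggested swap, assuming the goal is simply to drop the quantization_config block before re-saving:

```python
from pathlib import Path
from transformers import AutoConfig

def _update_quantization_config(source_dir: Path, dest_dir: Path) -> None:
    # plain transformers AutoConfig instead of sparseml's SparseAutoConfig
    config = AutoConfig.from_pretrained(source_dir)
    if hasattr(config, "quantization_config"):
        # drop the GPTQ quantization block so the converted checkpoint loads cleanly
        delattr(config, "quantization_config")
    config.save_pretrained(dest_dir)
```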
shutil.copy(file, dest_dir / file.name)


def _update_quantization_config(source_dir: Path, dest_dir: Path):
nit: could we rename this fn to clarify what it does? "update" feels too generic when what we are actually doing is removing the quantization config.
def load_safetensors_state_dict(
    file_path: str, by_layers: bool = True
) -> Iterator[Dict[str, torch.Tensor]]:
    """
    Load a safetensors file from disk

    :param file_path: path to the safetensors file
    :param by_layers: if True, return an iterator of per-layer dictionaries
        of safetensors data
    :return: iterator over the safetensors data, either as a single
        dictionary or as one dictionary per layer
    """
    with safe_open(file_path, framework="pt", device="cpu") as f:
        if by_layers:
            current_layer = None
            layer_data = {}
            for key in sorted(f.keys()):
                layer_name, param_name = key.split(".", 1)
                if current_layer is None:
                    current_layer = layer_name
                elif layer_name != current_layer:
                    yield layer_data
                    current_layer = layer_name
                    layer_data = {}
                layer_data[key] = f.get_tensor(key)
            if layer_data:
                yield layer_data
        else:
            yield {key: f.get_tensor(key) for key in f.keys()}
Why is this needed over the existing helper functions in safetensors_load and the safetensors get_tensor() function? If we do still need it, we should move it to safetensors_load.py.
def is_gptq_quantization_target(key: str) -> bool:
    """
    Assumes self_attn and mlp are the only quantization targets
    in model layers of the state_dict.

    :param key: The key of the state_dict
    :return: True if the key is a quantization target, False otherwise
    """
    return "model.layers" in key and ("self_attn" in key or "mlp" in key)
Why are we keying off of names here? I would think we want to instead check whether specific quantization keys exist in the state_dict, like we do for the compressors. You could use the get_nested_weight_mappings helper.
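A possible shape for that check, assuming get_nested_weight_mappings maps each module prefix to the requested parameter names it actually contains (exact signature and return shape are assumptions):

```python
from compressed_tensors.utils.safetensors_load import get_nested_weight_mappings

GPTQ_PARAMS = ["qweight", "qzeros", "scales"]

def gptq_quantization_targets(model_dir: str) -> set:
    # assumed to return {module_prefix: {param_name: safetensors_file}}
    nested = get_nested_weight_mappings(model_dir, GPTQ_PARAMS)
    # a module is a GPTQ target if any of the GPTQ tensors were found for it
    return {prefix for prefix, params in nested.items() if params}
```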
Force-pushed from 97b8e15 to 282d96e
Force-pushed from 282d96e to f45b4a1
Add tests for the same
Force-pushed from f45b4a1 to ed95dfc
Assigning to myself for now, to tackle after the refactors for observers and lifecycle are completed; ideally this should also handle AutoAWQ, as filed in #153.
Fixes #153