Refactoring Branch #79

Draft
wants to merge 134 commits into
base: master
Changes from 132 commits
Commits
2db12a6
Added draft dataset types
lucas-wilkins Jul 30, 2024
c669b6d
Data sketch
lucas-wilkins Jul 31, 2024
41dc519
Merge pull request #77 from SasView/refactor_24-73-file-type-descript…
lucas-wilkins Jul 31, 2024
9976f4e
Some units
lucas-wilkins Aug 5, 2024
fc433c3
Merge pull request #78 from SasView/76-basic-outline-of-new-data-types
lucas-wilkins Aug 6, 2024
677008a
Work towards outline
lucas-wilkins Aug 6, 2024
41d63eb
Units now available and grouped
lucas-wilkins Aug 6, 2024
ffb0110
Merge branch 'refactor_24' into 76-basic-outline-of-new-data-types
lucas-wilkins Aug 6, 2024
8ccf70f
Merge pull request #80 from SasView/76-basic-outline-of-new-data-types
lucas-wilkins Aug 6, 2024
e2b5da4
More units
lucas-wilkins Aug 6, 2024
3aaec14
one d in shadow
lucas-wilkins Aug 6, 2024
7c08e01
Fixed density units
lucas-wilkins Aug 6, 2024
2e15da5
Use alias list to remove duplicates
lucas-wilkins Aug 6, 2024
a890f65
More units, towards formatting
lucas-wilkins Aug 6, 2024
35b7921
Units and accessors draft ready to begin tests on
lucas-wilkins Aug 7, 2024
ef7d2b9
Some tests
lucas-wilkins Aug 7, 2024
b8f16c2
SI unit module
lucas-wilkins Aug 7, 2024
bc8d677
si unit list
lucas-wilkins Aug 7, 2024
cf11e27
More tests, added names
lucas-wilkins Aug 7, 2024
8050c92
More units
lucas-wilkins Aug 7, 2024
0e492a0
Notes
lucas-wilkins Aug 7, 2024
e990f71
Notes
lucas-wilkins Aug 8, 2024
f9bb4a3
start of metadata structure
lucas-wilkins Aug 8, 2024
8b4372d
Metadata work, and unit groups
lucas-wilkins Aug 13, 2024
4372c2f
More metadata stuff
lucas-wilkins Aug 13, 2024
a6aed62
Named units in unit groups
lucas-wilkins Aug 13, 2024
6ab996e
More metadata, added absolute temperature stuff
lucas-wilkins Aug 14, 2024
9c6302a
Metadata objects complete for now
lucas-wilkins Aug 14, 2024
b3f4686
Percent test and fix
lucas-wilkins Aug 14, 2024
851253a
Work towards new data object
lucas-wilkins Aug 21, 2024
2f11e68
Basic reading
lucas-wilkins Aug 28, 2024
c5838a1
Work towards structuring inputs with uncertainties
lucas-wilkins Aug 29, 2024
18ef91a
Work on uncertainty propagation
lucas-wilkins Sep 9, 2024
b206b03
Added some code to enable test driven development.
jamescrake-merani Sep 11, 2024
194fd5c
Some minor changes to stop my editor from crying.
jamescrake-merani Sep 11, 2024
4463584
Pass in the dimensions so this code is correct.
jamescrake-merani Sep 11, 2024
ffea2d6
Wrote some tests ahead.
jamescrake-merani Sep 11, 2024
6a72ac6
Parse using a slant as well.
jamescrake-merani Sep 11, 2024
5788a82
Found a regex for splitting up the string.
jamescrake-merani Sep 11, 2024
ad251e5
Implemented the parse_single_unit function.
jamescrake-merani Sep 11, 2024
92b076a
Use two functions for parsing.
jamescrake-merani Sep 11, 2024
7159ca2
Use list comprehension to get potential symbols.
jamescrake-merani Sep 12, 2024
2e38eb5
parse unit strs function.
jamescrake-merani Sep 12, 2024
c19f655
Created a function to pass in a stack of units.
jamescrake-merani Sep 12, 2024
0a9cac2
Multiply dimensions function.
jamescrake-merani Sep 12, 2024
09d0b8d
Use the new multiply function.
jamescrake-merani Sep 12, 2024
355b8d5
Nvm I'm blind; there already was a multiply method.
jamescrake-merani Sep 12, 2024
6bf37fa
Parse in a whole unit.
jamescrake-merani Sep 12, 2024
938db13
I still need this multply for parse_unit_stack.
jamescrake-merani Sep 12, 2024
9d20353
System for combining units.
jamescrake-merani Sep 16, 2024
69e9310
Removed not implemented comment.
jamescrake-merani Sep 16, 2024
5ddd1bd
Parse in a named unit.
jamescrake-merani Sep 16, 2024
eae52d0
Avoid mutating state.
jamescrake-merani Sep 16, 2024
7d3840b
Replace the unit on the stack.
jamescrake-merani Sep 17, 2024
e8a2d3b
Fixed the logic around combining units.
jamescrake-merani Sep 17, 2024
99f4f81
Parse_name_unit can take in an already parsed unit.
jamescrake-merani Sep 17, 2024
bd3dfe8
Added a todo comment.
jamescrake-merani Sep 17, 2024
8a9d701
Take a unit from the command line.
jamescrake-merani Sep 17, 2024
1c773e0
Added whitespace on input.
jamescrake-merani Sep 17, 2024
a7bbba5
Fixed typo.
jamescrake-merani Sep 17, 2024
05c3c02
Only multiply scale by 1, or -1.
jamescrake-merani Sep 17, 2024
947f103
Look for slashes in the string.
jamescrake-merani Sep 17, 2024
a1b6288
Got fraction units working as well :)
jamescrake-merani Sep 17, 2024
c2f4a8f
Configure how ambiguities are dealt with.
jamescrake-merani Sep 18, 2024
16adb41
Only break if we have found a symbol.
jamescrake-merani Sep 18, 2024
e748fcb
Take in longest unit across the whole file.
jamescrake-merani Sep 18, 2024
f8a0a5a
Take in a unit group in parse_singe_unit.
jamescrake-merani Sep 19, 2024
a0bbd18
Parse a unit from a specific group.
jamescrake-merani Sep 19, 2024
41d1d1c
Equivalent function for from group.
jamescrake-merani Sep 19, 2024
462d76d
Is none not equal to none.
jamescrake-merani Sep 19, 2024
44bad8d
Removed old TODO comment.
jamescrake-merani Sep 19, 2024
b8ff6be
Catch key errors.
jamescrake-merani Sep 19, 2024
1f11030
Expand the try block.
jamescrake-merani Sep 19, 2024
a5d71de
Removed an old todo comment.
jamescrake-merani Sep 19, 2024
98fed84
New unit test in pytest.
jamescrake-merani Sep 20, 2024
ade0bb4
Raise an exception if the unit can't be parsed.
jamescrake-merani Sep 20, 2024
b313f4c
Added some unit tests that should error.
jamescrake-merani Sep 20, 2024
3808653
Created a regex validator for the unit str.
jamescrake-merani Sep 20, 2024
25f3903
Throw an exception if the validation fails.
jamescrake-merani Sep 20, 2024
0920a66
Update unit test to reflect new error.
jamescrake-merani Sep 20, 2024
0148f27
Unit test for what I was originally testing for.
jamescrake-merani Sep 20, 2024
06a0c9c
Added more tests for slants.
jamescrake-merani Sep 23, 2024
067929f
Slants should be valid unit strings as well.
jamescrake-merani Sep 23, 2024
98bfc0c
Parse in newton as its defined value.
jamescrake-merani Sep 23, 2024
8515a59
Remove the old testing file.
jamescrake-merani Sep 23, 2024
ded94f2
This function isn't being used.
jamescrake-merani Sep 23, 2024
3acb6e3
Added to the doc string about unit groups.
jamescrake-merani Sep 23, 2024
5a45950
Moved the unit group to first.
jamescrake-merani Sep 23, 2024
606eea3
Small rename.
jamescrake-merani Sep 23, 2024
0d272a7
Use destructuring to make this a bit cleaner.
jamescrake-merani Sep 23, 2024
d2abf95
Fixed function call.
jamescrake-merani Sep 23, 2024
eb8a1ec
Added some docstrings.
jamescrake-merani Sep 23, 2024
f466c53
Work on adding uncertainties, adding non-integer powers
lucas-wilkins Sep 23, 2024
c6f79af
Integer unit powers now work
lucas-wilkins Sep 23, 2024
46bbc44
Refactored parse named unit so it just takes one arg.
jamescrake-merani Sep 25, 2024
8c1b984
Removed old todo comment.
jamescrake-merani Sep 25, 2024
8c263b8
Added some docstrings.
jamescrake-merani Sep 25, 2024
556500e
Stop linter from moaning.
jamescrake-merani Sep 25, 2024
57a425e
Quantities now have histories, and variance could work, needs testing…
lucas-wilkins Sep 25, 2024
47bcefa
Quantities ready for testing
lucas-wilkins Sep 26, 2024
f119579
Bump to python 3.12
lucas-wilkins Sep 26, 2024
10f7774
Quantity combining seems to work
lucas-wilkins Sep 27, 2024
c241bda
Fixed error in helper function
lucas-wilkins Sep 27, 2024
463de25
Fixed error formatting bug
lucas-wilkins Sep 27, 2024
250d293
Tests for error propagation
lucas-wilkins Sep 30, 2024
accca77
More aliases for units
lucas-wilkins Sep 30, 2024
fdd06e8
Made file for target object for metadata
lucas-wilkins Sep 30, 2024
c8929c7
Accept spaces in the unit str.
jamescrake-merani Sep 30, 2024
5f00767
Split by dots as well.
jamescrake-merani Sep 30, 2024
3d62f9d
Main data reading for HDF5 prototype
lucas-wilkins Sep 30, 2024
544f131
Merge branch 'refactor_24' into unit_parsing
jamescrake-merani Oct 1, 2024
b000b38
Merge pull request #83 from SasView/unit_parsing
jamescrake-merani Oct 1, 2024
dff1f4d
integrating the units stuff
lucas-wilkins Oct 1, 2024
32cbb1f
Merge branch 'master' into refactor_24
lucas-wilkins Oct 1, 2024
b661ffd
Merge branch 'refactor_24' of https://github.com/SasView/sasdata into…
lucas-wilkins Oct 1, 2024
a83c0dc
Parsing of units in HDF5 reader
lucas-wilkins Oct 1, 2024
8bf4340
Fixed bug where ohms, and angstroms were forbidden
jamescrake-merani Oct 2, 2024
b821285
Accept the ^ char but don't do anything with it.
jamescrake-merani Oct 2, 2024
c9de771
Fixed moles potentially
lucas-wilkins Oct 2, 2024
36b08fc
Merge branch 'refactor_24' into unit_unicode_symbol_fix
jamescrake-merani Oct 2, 2024
426eb94
Unit name fixes
lucas-wilkins Oct 2, 2024
ff26117
Filling in some of the working for the accessors
lucas-wilkins Oct 3, 2024
80610ac
Merge pull request #84 from SasView/unit_unicode_symbol_fix
lucas-wilkins Oct 3, 2024
cc44356
Remove target data object attempt
lucas-wilkins Oct 3, 2024
7ab4bb9
Connecting metadata
lucas-wilkins Oct 7, 2024
ad1ba33
Accessor changes
lucas-wilkins Oct 7, 2024
c3b677f
Merge branch 'refactor_24' of https://github.com/SasView/sasdata into…
lucas-wilkins Oct 7, 2024
d1ef1d4
Merge tidying
lucas-wilkins Oct 7, 2024
f07f261
Metadata linked up, just not pointing in the right place right now
lucas-wilkins Oct 7, 2024
aa13492
Added some debugging info to the summary
lucas-wilkins Oct 7, 2024
fb10792
Debugging metadata
lucas-wilkins Oct 7, 2024
df6d045
Better reference methods
lucas-wilkins Oct 7, 2024
d0564ab
Is anyone capable of putting sensible things in HDF5 files? Correctio…
lucas-wilkins Oct 9, 2024
6b9ae5e
Line endings :)
lucas-wilkins Oct 23, 2024
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -14,7 +14,7 @@ jobs:
     strategy:
       matrix:
         os: [macos-latest, ubuntu-latest, windows-latest]
-        python-version: ['3.10', '3.11', '3.12']
+        python-version: ['3.12']
       fail-fast: false

     steps:
3 changes: 3 additions & 0 deletions sasdata/checklist.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Things to check once everything is in place:

1) Do any centigrade fields read in incorrectly?
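
Review note on the centigrade item: a minimal sketch of why Celsius fields deserve a separate check, assuming the reader converts most units with a single multiplicative scale factor. The function names and the `KELVIN_OFFSET` constant are hypothetical and are not taken from this PR.

```python
KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def scale_only_conversion(value: float, scale: float) -> float:
    """Convert with a scale factor alone, which is enough for most units."""
    return value * scale


def celsius_to_kelvin(value_celsius: float) -> float:
    """Celsius differs from kelvin by an offset, not by a scale factor."""
    return value_celsius + KELVIN_OFFSET


reading = 25.0  # a sample temperature stored in degrees Celsius

# The degree size is the same, so the scale factor is 1 and the result is wrong
naive = scale_only_conversion(reading, 1.0)
correct = celsius_to_kelvin(reading)

print(f"scale only: {naive} K, with offset: {correct} K")
```

A file that labels a temperature as Celsius but is read through a scale-only path would come out about 273 K too low, which is the kind of misread the checklist item asks about.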
42 changes: 42 additions & 0 deletions sasdata/data.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
from enum import Enum
from typing import TypeVar, Any, Self
from dataclasses import dataclass

from sasdata.quantities.quantity import NamedQuantity
from sasdata.metadata import Metadata
from sasdata.quantities.accessors import AccessorTarget
from sasdata.data_backing import Group, key_tree


class SasData:
    def __init__(self, name: str, data_contents: list[NamedQuantity], raw_metadata: Group, verbose: bool=False):
        self.name = name
        self._data_contents = data_contents
        self._raw_metadata = raw_metadata
        self._verbose = verbose

        self.metadata = Metadata(AccessorTarget(raw_metadata, verbose=verbose))

        # TO IMPLEMENT

        # abscissae: list[NamedQuantity[np.ndarray]]
        # ordinate: NamedQuantity[np.ndarray]
        # other: list[NamedQuantity[np.ndarray]]
        #
        # metadata: Metadata
        # model_requirements: ModellingRequirements

    def summary(self, indent = " ", include_raw=False):
        s = f"{self.name}\n"

        for data in self._data_contents:
            s += f"{indent}{data}\n"

        s += "Metadata:\n"
        s += "\n"
        s += self.metadata.summary()

        if include_raw:
            s += key_tree(self._raw_metadata)

        return s
126 changes: 126 additions & 0 deletions sasdata/data_backing.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
from typing import TypeVar, Self
from dataclasses import dataclass
from enum import Enum

from sasdata.quantities.quantity import NamedQuantity

DataType = TypeVar("DataType")

""" Sasdata metadata tree """

def shorten_string(string):
    lines = string.split("\n")
    if len(lines) <= 1:
        return string
    else:
        return lines[0][:30] + " ... " + lines[-1][-30:]

@dataclass
class Dataset[DataType]:
    name: str
    data: DataType
    attributes: dict[str, Self | str]

    def summary(self, indent_amount: int = 0, indent: str = " ") -> str:

        s = f"{indent*indent_amount}{self.name.split("/")[-1]}:\n"
        s += f"{indent*(indent_amount+1)}{shorten_string(str(self.data))}\n"
        for key in self.attributes:
            value = self.attributes[key]
            if isinstance(value, (Group, Dataset)):
                value_string = value.summary(indent_amount+1, indent)
            else:
                value_string = f"{indent * (indent_amount+1)}{key}: {shorten_string(repr(value))}\n"

            s += value_string

        return s

@dataclass
class Group:
    name: str
    children: dict[str, Self | Dataset]

    def summary(self, indent_amount: int=0, indent=" "):
        s = f"{indent*indent_amount}{self.name.split("/")[-1]}:\n"
        for key in self.children:
            s += self.children[key].summary(indent_amount+1, indent)

        return s

class Function:
    """ Representation of a (data driven) function, such as I vs Q """

    def __init__(self, abscissae: list[NamedQuantity], ordinate: NamedQuantity):
        self.abscissae = abscissae
        self.ordinate = ordinate


class FunctionType(Enum):
    """ What kind of function is this, should not be relied upon to be perfectly descriptive

    The functions might be parametrised by more variables than the specification
    """
    UNKNOWN = 0
    SCATTERING_INTENSITY_VS_Q = 1
    SCATTERING_INTENSITY_VS_Q_2D = 2
    SCATTERING_INTENSITY_VS_Q_3D = 3
    SCATTERING_INTENSITY_VS_ANGLE = 4
    UNKNOWN_METADATA = 20
    TRANSMISSION = 21
    POLARISATION_EFFICIENCY = 22
    UNKNOWN_REALSPACE = 30
    SESANS = 31
    CORRELATION_FUNCTION_1D = 32
    CORRELATION_FUNCTION_2D = 33
    CORRELATION_FUNCTION_3D = 34
    INTERFACE_DISTRIBUTION_FUNCTION = 35
    PROBABILITY_DISTRIBUTION = 40
    PROBABILITY_DENSITY = 41

def function_type_identification_key(names):
    """ Create a key from the names of data objects that can be used to assign a function type"""
    # Lowercase before sorting, so that mixed-case names always give the same key
    return ":".join(sorted(s.lower() for s in names))

function_fields_to_type = [
    (["Q"], "I", FunctionType.SCATTERING_INTENSITY_VS_Q),
    (["Qx", "Qy"], "I", FunctionType.SCATTERING_INTENSITY_VS_Q_2D),
    (["Qx", "Qy", "Qz"], "I", FunctionType.SCATTERING_INTENSITY_VS_Q_3D),
    (["Z"], "G", FunctionType.SESANS),
    (["lambda"], "T", FunctionType.TRANSMISSION)
]

function_fields_lookup = {
    function_type_identification_key(inputs + [output]): function_type for inputs, output, function_type in function_fields_to_type
}

def build_main_data(data: list[NamedQuantity]) -> Function:
    names = [datum.name for datum in data]
    identifier = function_type_identification_key(names)

    if identifier in function_fields_lookup:
        function_type = function_fields_lookup[identifier]
    else:
        function_type = FunctionType.UNKNOWN

    match function_type:
        case FunctionType.UNKNOWN:
            pass
        case _:
            raise NotImplementedError(f"No handling implemented yet for {function_type}")

def key_tree(data: Group | Dataset, indent_amount=0, indent: str = " ") -> str:
    """ Show a metadata tree, showing the names of the keys used to access them"""
    s = ""
    if isinstance(data, Group):
        for key in data.children:
            s += indent*indent_amount + key + "\n"
            s += key_tree(data.children[key], indent_amount=indent_amount+1, indent=indent)

    if isinstance(data, Dataset):
        s += indent*indent_amount + "[data]\n"
        for key in data.attributes:
            s += indent*indent_amount + key + "\n"
            s += key_tree(data.attributes[key], indent_amount=indent_amount+1, indent=indent)

    return s
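
Review note: a self-contained sketch of an identification key that ignores both the order and the case of the field names. It lowercases before sorting because Python orders uppercase ASCII letters ahead of lowercase ones, so sorting mixed-case names first can give different keys for the same set of fields. The trimmed `FunctionType` enum and the helper names `identification_key` and `identify` are stand-ins for illustration, not the PR's code.

```python
from enum import Enum


class FunctionType(Enum):
    # Trimmed stand-in for the enum in the diff
    UNKNOWN = 0
    SCATTERING_INTENSITY_VS_Q = 1
    SCATTERING_INTENSITY_VS_Q_2D = 2


def identification_key(names):
    """Build a key that ignores both the order and the case of the names."""
    return ":".join(sorted(name.lower() for name in names))


# (inputs, output, type) triples, following the shape used in the diff
fields_to_type = [
    (["Q"], "I", FunctionType.SCATTERING_INTENSITY_VS_Q),
    (["Qx", "Qy"], "I", FunctionType.SCATTERING_INTENSITY_VS_Q_2D),
]

lookup = {
    identification_key(inputs + [output]): function_type
    for inputs, output, function_type in fields_to_type
}


def identify(names):
    """Return the matching FunctionType, falling back to UNKNOWN."""
    return lookup.get(identification_key(names), FunctionType.UNKNOWN)


print(identify(["I", "Q"]))
print(identify(["Q", "i"]))
print(identify(["Qy", "I", "Qx"]))
print(identify(["temperature"]))
```

Using `dict.get` with a default keeps the fallback to `UNKNOWN` in one place instead of an explicit `if`/`else` around the lookup.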
79 changes: 79 additions & 0 deletions sasdata/dataset_types.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
""" Information used for providing guesses about what text based files contain """

from dataclasses import dataclass

import sasdata.quantities.units as units

#
# VERY ROUGH DRAFT - FOR PROTOTYPING PURPOSES
#

@dataclass
class DatasetType:
    name: str
    required: list[str]
    optional: list[str]
    expected_orders: list[list[str]]


one_dim = DatasetType(
    name="1D I vs Q",
    required=["Q", "I"],
    optional=["dI", "dQ", "shadow"],
    expected_orders=[
        ["Q", "I", "dI"],
        ["Q", "dQ", "I", "dI"]])

two_dim = DatasetType(
    name="2D I vs Q",
    required=["Qx", "Qy", "I"],
    optional=["dQx", "dQy", "dI", "Qz", "shadow"],
    expected_orders=[
        ["Qx", "Qy", "I"],
        ["Qx", "Qy", "I", "dI"],
        ["Qx", "Qy", "dQx", "dQy", "I", "dI"]])

sesans = DatasetType(
    name="SESANS",
    required=["z", "G"],
    optional=["stuff", "other stuff", "more stuff"],
    expected_orders=[["z", "G"]])

dataset_types = {dataset.name for dataset in [one_dim, two_dim, sesans]}


#
# Some default units, this is not how they should be represented, some might not be correct
#
# The unit options should only be those compatible with the field
#

unit_kinds = {
    "Q": units.inverse_length,
    "I": units.inverse_length,
    "Qx": units.inverse_length,
    "Qy": units.inverse_length,
    "Qz": units.inverse_length,
    "dI": units.inverse_length,
    "dQ": units.inverse_length,
    "dQx": units.inverse_length,
    "dQy": units.inverse_length,
    "dQz": units.inverse_length,
    "z": units.length,
    "G": units.area,
    "shadow": units.dimensionless,
    "temperature": units.temperature,
    "magnetic field": units.magnetic_flux_density
}

#
# Other possible fields. Ultimately, these should come out of the metadata structure
#

metadata_fields = [
    "temperature",
    "magnetic field",
]
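
Review note: a sketch of how `expected_orders` might be used to guess column names for a text file from its column count alone. The helpers `guess_column_names` and `is_valid_order` are hypothetical and do not appear in this PR; the `DatasetType` definition is copied here only so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class DatasetType:
    name: str
    required: list[str]
    optional: list[str]
    expected_orders: list[list[str]]


one_dim = DatasetType(
    name="1D I vs Q",
    required=["Q", "I"],
    optional=["dI", "dQ", "shadow"],
    expected_orders=[
        ["Q", "I", "dI"],
        ["Q", "dQ", "I", "dI"]])


def guess_column_names(dataset_type, n_columns):
    """Hypothetical helper: return the first expected order of matching length."""
    for order in dataset_type.expected_orders:
        if len(order) == n_columns:
            return order
    return None  # no guess available, the caller has to ask the user


def is_valid_order(dataset_type, columns):
    """Check that every required field is present and no unknown field appears."""
    allowed = set(dataset_type.required) | set(dataset_type.optional)
    return set(dataset_type.required) <= set(columns) <= allowed


print(guess_column_names(one_dim, 3))
print(guess_column_names(one_dim, 4))
print(is_valid_order(one_dim, ["Q", "dI"]))
```

Matching on column count alone is ambiguous as soon as two expected orders share a length, so a guess like this would only be a starting point for the user to confirm.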



11 changes: 11 additions & 0 deletions sasdata/distributions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@


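# Assumed import path: Quantity is taken to live alongside NamedQuantity
from sasdata.quantities.quantity import Quantity


class DistributionModel:


    @property
    def is_density(self) -> bool:
        return False

    def standard_deviation(self) -> Quantity:
        raise NotImplementedError("Standard deviation not implemented yet")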
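
Review note: a short sketch of the difference between returning and raising `NotImplementedError` in a placeholder method. `PlaceholderModel` and its method names are hypothetical and exist only for this illustration.

```python
class PlaceholderModel:
    """Hypothetical stand-in, not the PR's DistributionModel."""

    def returns_exception(self):
        # The caller receives an exception instance as an ordinary value
        return NotImplementedError("not implemented yet")

    def raises_exception(self):
        # The caller is interrupted and must handle (or propagate) the error
        raise NotImplementedError("not implemented yet")


model = PlaceholderModel()

# Nothing fails here: the exception object is just assigned to a variable
value = model.returns_exception()
print(type(value).__name__)

try:
    model.raises_exception()
except NotImplementedError as error:
    print(f"caught: {error}")
```

With `return`, code that later treats the result as a quantity would fail somewhere far from the placeholder; with `raise`, the failure points at the unimplemented method itself.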