Sourcery Starbot ⭐ refactored dialvarezs/alphafold #4
base: main
Due to GitHub API limits, only the first 60 comments can be shown.
@@ -183,7 +183,6 @@ def predict_structure(
     models_to_relax: ModelsToRelax):
   """Predicts structure using AlphaFold for the given sequence."""
   logging.info('Predicting %s', fasta_name)
-  timings = {}
Function predict_structure refactored with the following changes:
- Move assignment closer to its usage within a block (move-assign-in-block)
- Merge dictionary assignment with declaration (merge-dict-assign)
-if FLAGS.model_preset == 'monomer_casp14':
-  num_ensemble = 8
-else:
-  num_ensemble = 1
+num_ensemble = 8 if FLAGS.model_preset == 'monomer_casp14' else 1
Function main refactored with the following changes:
- Replace if statement with if expression (assign-if-exp)
-command_args = []
 # Mount each fasta path as a unique target directory
 target_fasta_paths = []
 for i, fasta_path in enumerate(args.fasta_paths):
   mount, target_path = _generate_mount(f"fasta_path_{i}", Path(fasta_path))
   mounts.append(mount)
   target_fasta_paths.append(target_path)
-command_args.append(f"--fasta_paths={','.join(target_fasta_paths)}")
+command_args = [f"--fasta_paths={','.join(target_fasta_paths)}"]
Function main refactored with the following changes:
- Move assignment closer to its usage within a block (move-assign-in-block)
- Merge append into list declaration (merge-list-append)
-default=datetime.today().strftime("%Y-%m-%d"),
+default=datetime.now().strftime("%Y-%m-%d"),
Function parse_arguments refactored with the following changes:
- Replace datetime.datetime.today() with datetime.datetime.now() (use-datetime-now-not-today)
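For context on this change: called without arguments, `datetime.today()` and `datetime.now()` both return a naive local datetime, so the default date string should be unaffected. The practical difference is that only `now()` accepts a `tz` argument. A minimal sketch (the variable names are illustrative, not taken from the repository):

```python
from datetime import datetime, timezone

# With no arguments, both calls return a naive (tz-unaware) local datetime.
local_today = datetime.today()
local_now = datetime.now()
assert local_today.tzinfo is None and local_now.tzinfo is None

# Only now() accepts a tz argument and can return an aware datetime.
utc_now = datetime.now(tz=timezone.utc)

# Same format string as the argument default in the diff above.
default_date = local_now.strftime("%Y-%m-%d")
print(default_date)
```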
-pdb_lines = []
Function to_pdb refactored with the following changes:
- Merge consecutive list appends into a single extend [×2] (merge-list-appends-into-extend)
- Move assignment closer to its usage within a block (move-assign-in-block)
- Merge append into list declaration (merge-list-append)
This removes the following comments:
# Close the final chain.
-all_paired_msa_rows = list(np.array(all_paired_msa_rows).transpose())
-return all_paired_msa_rows
+return list(np.array(all_paired_msa_rows).transpose())
Function _match_rows_by_sequence_similarity refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-merged_example[feature_name] = np.sum(x for x in feats).astype(np.int32)
+merged_example[feature_name] = np.sum(iter(feats)).astype(np.int32)
Function _merge_features_from_multiple_chains refactored with the following changes:
- Simplify generator expression (simplify-generator)
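This change may not be behavior-preserving and is worth checking before merging. As far as I know, `np.sum` only special-cases true generator objects (it falls back to the builtin `sum` with a deprecation warning); a plain iterator such as `iter(feats)` is not consumed element by element, so the chained `.astype` call is likely to fail. Below is a sketch of two alternatives that avoid passing an iterator at all, assuming `feats` is a list of equally shaped arrays (the sample data is made up):

```python
import numpy as np

# Hypothetical stand-in for the per-chain feature arrays.
feats = [np.array([1, 0, 2]), np.array([0, 3, 1]), np.array([4, 4, 4])]

# Option 1: stack into one array and reduce along the new leading axis.
merged_stack = np.sum(np.stack(feats), axis=0).astype(np.int32)

# Option 2: the builtin sum adds the arrays elementwise, which is what the
# original generator-based call effectively did.
merged_builtin = np.asarray(sum(feats)).astype(np.int32)

print(merged_stack)
print(merged_builtin)
```

Either option keeps the intent of the original line without relying on how `np.sum` treats non-array iterables.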
-sequence_set = set(tuple(s) for s in chain['msa_all_seq'])
-keep_rows = []
-# Go through unpaired MSA seqs and remove any rows that correspond to the
-# sequences that are already present in the paired MSA.
-for row_num, seq in enumerate(chain['msa']):
-  if tuple(seq) not in sequence_set:
-    keep_rows.append(row_num)
+sequence_set = {tuple(s) for s in chain['msa_all_seq']}
+keep_rows = [
+    row_num for row_num, seq in enumerate(chain['msa'])
+    if tuple(seq) not in sequence_set
+]
Function deduplicate_unpaired_sequences refactored with the following changes:
- Replace list(), dict() or set() with comprehension (collection-builtin-to-comprehension)
- Convert for loop into list comprehension (list-comprehension)
This removes the following comments:
# Go through unpaired MSA seqs and remove any rows that correspond to the
# sequences that are already present in the paired MSA.
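If the refactor is accepted, the dropped comment can be kept by placing it directly above the comprehension. A sketch on toy data, assuming `chain` maps feature names to 2-D integer arrays as the diff suggests:

```python
import numpy as np

# Toy chain: two paired rows and three unpaired rows, one of which
# duplicates a paired row.
chain = {
    'msa_all_seq': np.array([[1, 2, 3], [4, 5, 6]]),
    'msa': np.array([[1, 2, 3], [7, 8, 9], [0, 0, 1]]),
}

sequence_set = {tuple(s) for s in chain['msa_all_seq']}
# Go through unpaired MSA seqs and remove any rows that correspond to the
# sequences that are already present in the paired MSA.
keep_rows = [
    row_num for row_num, seq in enumerate(chain['msa'])
    if tuple(seq) not in sequence_set
]
print(keep_rows)
```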
-if line[:4] == '#=GS':  # Description lines - keep if sequence in list.
+if line.startswith('#=GS'):  # Description lines - keep if sequence in list.
Function _keep_line refactored with the following changes:
- Replace str prefix/suffix check with call to startswith/endswith (str-prefix-suffix)
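A brief illustration of why `startswith` is the sturdier form: the slice comparison silently depends on the slice length matching the prefix length, and `startswith` also accepts a tuple of prefixes. The helper name and sample line below are hypothetical.

```python
def is_description_line(line: str) -> bool:
    """Returns True for Stockholm '#=GS' description lines."""
    return line.startswith('#=GS')


sample = '#=GS seq1/1-10 DE example description'

# Both forms agree when the slice length matches the prefix length...
assert (sample[:4] == '#=GS') == is_description_line(sample)

# ...but a mismatched slice length is an easy bug to introduce.
assert (sample[:3] == '#=GS') is False

# startswith can also test several prefixes at once.
is_markup = sample.startswith(('#=GS', '#=GR', '#=GC'))
print(is_description_line(sample), is_markup)
```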
-for line in f:
-  if _keep_line(line, seqnames):
-    filtered_lines.append(line)
+filtered_lines.extend(line for line in f if _keep_line(line, seqnames))
Function truncate_stockholm_msa refactored with the following changes:
- Replace a for append loop with list extend (for-append-to-extend)
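The two forms should produce the same list; `extend` fed by a generator expression simply replaces the per-item `append` call. A small check, using a hypothetical stand-in for `_keep_line`:

```python
def keep_line(line, seqnames):
    # Hypothetical stand-in for _keep_line: keep non-empty lines whose
    # first token names a wanted sequence.
    return bool(line.strip()) and line.split()[0] in seqnames


lines = ['seqA ACDE\n', 'seqB FGHI\n', '\n', 'seqC KLMN\n']
seqnames = {'seqA', 'seqC'}

# Original style: explicit loop with append.
with_loop = []
for line in lines:
    if keep_line(line, seqnames):
        with_loop.append(line)

# Refactored style: one extend call fed by a generator expression.
with_extend = []
with_extend.extend(line for line in lines if keep_line(line, seqnames))

assert with_loop == with_extend
```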
-  else:
-    seen_sequences.add(masked_alignment)
-    seqnames.add(seqname)
-
-filtered_lines = []
-for line in stockholm_msa.splitlines():
-  if _keep_line(line, seqnames):
-    filtered_lines.append(line)
+  seen_sequences.add(masked_alignment)
+  seqnames.add(seqname)
+
+filtered_lines = [
+    line for line in stockholm_msa.splitlines()
+    if _keep_line(line, seqnames)
+]
Function deduplicate_stockholm_msa refactored with the following changes:
- Convert for loop into list comprehension (list-comprehension)
- Remove unnecessary else after guard condition (remove-unnecessary-else)
-for i in range(len(block_starts) - 1):
-  hits.append(_parse_hhr_hit(lines[block_starts[i]:block_starts[i + 1]]))
+hits.extend(
+    _parse_hhr_hit(lines[block_starts[i]:block_starts[i + 1]])
+    for i in range(len(block_starts) - 1))
Function parse_hhr refactored with the following changes:
- Replace a for append loop with list extend (for-append-to-extend)
-# Example 1: >4pqx_A/2-217 [subseq from] mol:protein length:217 Free text
-# Example 2: >5g3r_A/1-55 [subseq from] mol:protein length:352
-match = re.match(
+if match := re.match(
     r'^>?([a-z0-9]+)_(\w+)/([0-9]+)-([0-9]+).*protein length:([0-9]+) *(.*)$',
-    description.strip())
-
-if not match:
+    description.strip(),
+):
+  return HitMetadata(
+      pdb_id=match[1],
+      chain=match[2],
+      start=int(match[3]),
+      end=int(match[4]),
+      length=int(match[5]),
+      text=match[6])
+else:
   raise ValueError(f'Could not parse description: "{description}".')
-
-return HitMetadata(
-    pdb_id=match[1],
-    chain=match[2],
-    start=int(match[3]),
-    end=int(match[4]),
-    length=int(match[5]),
-    text=match[6])
Function _parse_hmmsearch_description refactored with the following changes:
- Use named expression to simplify assignment and conditional (use-named-expression)
- Lift code into else after jump in control flow (reintroduce-else)
- Swap if/else branches (swap-if-else-branches)
This removes the following comments:
# Example 1: >4pqx_A/2-217 [subseq from] mol:protein length:217 Free text
# Example 2: >5g3r_A/1-55 [subseq from] mol:protein length:352
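Two things to weigh before merging this one: the named expression (`:=`) requires Python 3.8 or newer, and the removed example comments documented the expected input format. Below is a self-contained sketch of the refactored shape with the examples preserved in a docstring. `HitMetadata` here is a simplified stand-in and `parse_description` is an illustrative name, not the repository's definitions.

```python
import dataclasses
import re


@dataclasses.dataclass(frozen=True)
class HitMetadata:  # simplified stand-in for illustration
    pdb_id: str
    chain: str
    start: int
    end: int
    length: int
    text: str


def parse_description(description: str) -> HitMetadata:
    """Parses an hmmsearch A3M description line.

    Example 1: >4pqx_A/2-217 [subseq from] mol:protein length:217 Free text
    Example 2: >5g3r_A/1-55 [subseq from] mol:protein length:352
    """
    if match := re.match(
        r'^>?([a-z0-9]+)_(\w+)/([0-9]+)-([0-9]+).*protein length:([0-9]+) *(.*)$',
        description.strip(),
    ):
        return HitMetadata(
            pdb_id=match[1],
            chain=match[2],
            start=int(match[3]),
            end=int(match[4]),
            length=int(match[5]),
            text=match[6])
    # No else needed: the branch above always returns on a match.
    raise ValueError(f'Could not parse description: "{description}".')


hit = parse_description(
    '>4pqx_A/2-217 [subseq from] mol:protein length:217 Free text')
print(hit)
```

Keeping the examples in the docstring means they survive automated refactors that drop inline comments.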
-aligned_cols = sum([r.isupper() and r != '-' for r in hit_sequence])
+aligned_cols = sum(r.isupper() and r != '-' for r in hit_sequence)
Function parse_hmmsearch_a3m refactored with the following changes:
- Replace unneeded comprehension with generator (comprehension-to-generator)
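Dropping the list brackets works because `sum` accepts any iterable and Python booleans count as 0 and 1. A quick check with a made-up sequence row:

```python
# Made-up A3M-style row: uppercase letters are the columns being counted.
hit_sequence = 'AC-de-GH'

# List comprehension: builds a temporary list of booleans first.
with_list = sum([r.isupper() and r != '-' for r in hit_sequence])

# Generator expression: same count, no intermediate list.
with_generator = sum(r.isupper() and r != '-' for r in hit_sequence)

assert with_list == with_generator
print(with_generator)
```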
-features = {}
-features['aatype'] = residue_constants.sequence_to_onehot(
-    sequence=sequence,
-    mapping=residue_constants.restype_order_with_x,
-    map_unknown_to_x=True)
+features = {
+    'aatype':
+        residue_constants.sequence_to_onehot(
+            sequence=sequence,
+            mapping=residue_constants.restype_order_with_x,
+            map_unknown_to_x=True,
+        )
+}
Function make_sequence_features refactored with the following changes:
- Merge dictionary assignment with declaration (merge-dict-assign)
-cif_path = os.path.join(mmcif_dir, hit_pdb_code + '.cif')
+cif_path = os.path.join(mmcif_dir, f'{hit_pdb_code}.cif')
Function _process_single_hit refactored with the following changes:
- Use f-string instead of string concatenation (use-fstring-for-concatenation)
- Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)
- Remove unnecessary else after guard condition (remove-unnecessary-else)
-template_features = {}
-for template_feature_name in TEMPLATE_FEATURES:
-  template_features[template_feature_name] = []
+template_features = {
+    template_feature_name: []
+    for template_feature_name in TEMPLATE_FEATURES
+}
Function HhsearchHitFeaturizer.get_templates refactored with the following changes:
- Convert for loop into dictionary comprehension (dict-comprehension)
- Use items() to directly unpack dictionary values (use-dict-items)
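The comprehension keeps an important property of the original loop: every key gets its own fresh list. That is worth noting because the shorter-looking `dict.fromkeys(keys, [])` would share one list across all keys. A sketch with a hypothetical subset of feature names:

```python
# Hypothetical subset of feature names, for illustration only.
TEMPLATE_FEATURES = ('template_aatype', 'template_sequence')

# Comprehension from the diff: one new list per key.
template_features = {name: [] for name in TEMPLATE_FEATURES}
template_features['template_aatype'].append('x')

# Tempting but different: fromkeys reuses the same list object for every key.
shared = dict.fromkeys(TEMPLATE_FEATURES, [])
shared['template_aatype'].append('x')

print(template_features['template_sequence'])  # untouched by the append above
print(shared['template_sequence'])             # sees the appended item too
```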
-template_features = {}
-for template_feature_name in TEMPLATE_FEATURES:
-  template_features[template_feature_name] = []
+template_features = {
+    template_feature_name: []
+    for template_feature_name in TEMPLATE_FEATURES
+}
Function HmmsearchHitFeaturizer.get_templates refactored with the following changes:
- Convert for loop into dictionary comprehension (dict-comprehension)
- Use items() to directly unpack dictionary values (use-dict-items)
-if not glob.glob(database_path + '_*'):
+if not glob.glob(f'{database_path}_*'):
Function HHBlits.__init__ refactored with the following changes:
- Use f-string instead of string concatenation (use-fstring-for-concatenation)
-db_cmd.append('-d')
-db_cmd.append(db_path)
+db_cmd.extend(('-d', db_path))
Function HHBlits.query refactored with the following changes:
- Merge consecutive list appends into a single extend (merge-list-appends-into-extend)
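For readers unfamiliar with the pattern, `extend` takes any iterable, so a tuple of the flag and its value appends both items in order. A sketch with placeholder values (the database paths and the surrounding command are assumptions for illustration, not the repository's actual invocation):

```python
databases = ['/data/db_one', '/data/db_two']  # placeholder paths

db_cmd = []
for db_path in databases:
    # Equivalent to db_cmd.append('-d') followed by db_cmd.append(db_path).
    db_cmd.extend(('-d', db_path))

# Illustrative command assembly; the binary name and '-i' flag are assumed.
cmd = ['hhblits', '-i', 'query.fasta'] + db_cmd
print(cmd)
```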
-if not glob.glob(database_path + '_*'):
+if not glob.glob(f'{database_path}_*'):
Function HHSearch.__init__ refactored with the following changes:
- Use f-string instead of string concatenation (use-fstring-for-concatenation)
-db_cmd.append('-d')
-db_cmd.append(db_path)
+db_cmd.extend(('-d', db_path))
Function HHSearch.query refactored with the following changes:
- Merge consecutive list appends into a single extend (merge-list-appends-into-extend)
-template_hits = parsers.parse_hmmsearch_a3m(
-    query_sequence=input_sequence,
-    a3m_string=a3m_string,
-    skip_first=False)
-return template_hits
+return parsers.parse_hmmsearch_a3m(query_sequence=input_sequence,
+                                   a3m_string=a3m_string,
+                                   skip_first=False)
Function Hmmsearch.get_template_hits refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-raw_output = dict(
-    sto=sto,
-    tbl=tbl,
-    stderr=stderr,
-    n_iter=self.n_iter,
-    e_value=self.e_value)
-
-return raw_output
+return dict(sto=sto,
+            tbl=tbl,
+            stderr=stderr,
+            n_iter=self.n_iter,
+            e_value=self.e_value)
Function Jackhmmer._query_chunk refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-single_chunk_results = []
-for input_fasta_path in input_fasta_paths:
-  single_chunk_results.append([self._query_chunk(
-      input_fasta_path, self.database_path, max_sequences)])
-return single_chunk_results
+return [[
+    self._query_chunk(input_fasta_path, self.database_path, max_sequences)
+] for input_fasta_path in input_fasta_paths]
Function Jackhmmer.query_multiple refactored with the following changes:
- Convert for loop into list comprehension (list-comprehension)
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-restype_atom14_to_atom37 = np.array(restype_atom14_to_atom37, dtype=np.int32)
-return restype_atom14_to_atom37
+return np.array(restype_atom14_to_atom37, dtype=np.int32)
Function _make_restype_atom14_to_atom37 refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-restype_rigidgroup_base_atom37_idx = np.vectorize(lambda x: lookuptable[x])(
-    base_atom_names)
-return restype_rigidgroup_base_atom37_idx
+return np.vectorize(lambda x: lookuptable[x])(base_atom_names)
Function _make_restype_rigidgroup_base_atom37_idx refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-assert len(atom14_data.shape) in [2, 3]
+assert len(atom14_data.shape) in {2, 3}
Function atom14_to_atom37 refactored with the following changes:
- Use set when checking membership of a collection of literals (collection-into-set)
-# Create the global frames.
-# shape (N, 8)
-all_frames_to_global = backb_to_global[:, None] @ all_frames_to_backb
-
-return all_frames_to_global
+return backb_to_global[:, None] @ all_frames_to_backb
Function torsion_angles_to_frames refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
This removes the following comments:
# Create the global frames.
# shape (N, 8)
-# Decide for each residue, whether alternative naming is better.
-# shape (N)
-alt_naming_is_better = (alt_per_res_lddt < per_res_lddt).astype(jnp.float32)
-
-return alt_naming_is_better  # shape (N)
+return (alt_per_res_lddt < per_res_lddt).astype(jnp.float32)
Function find_optimal_renaming refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
This removes the following comments:
# Decide for each residue, whether alternative naming is better.
# shape (N)
Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨
Here's your pull request refactoring your most popular Python repo.
If you want Sourcery to refactor all your Python repos and incoming pull requests install our bot.
Review changes via command line
To manually merge these changes, make sure you're on the main branch, then run: