Sourcery Starbot ⭐ refactored dialvarezs/alphafold #4

Open
wants to merge 1 commit into main

Conversation

SourceryAI

Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨

Here's your pull request refactoring your most popular Python repo.

If you want Sourcery to refactor all your Python repos and incoming pull requests, install our bot.

Review changes via command line

To manually merge these changes, make sure you're on the main branch, then run:

git fetch https://github.com/sourcery-ai-bot/alphafold main
git merge --ff-only FETCH_HEAD
git reset HEAD^
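
After the reset, the refactoring is left as unstaged changes in your working tree; an optional check before committing (standard git, not part of the bot's instructions):

git status
git diff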

@SourceryAI left a comment


Due to GitHub API limits, only the first 60 comments can be shown.

@@ -183,7 +183,6 @@ def predict_structure(
models_to_relax: ModelsToRelax):
"""Predicts structure using AlphaFold for the given sequence."""
logging.info('Predicting %s', fasta_name)
timings = {}

Function predict_structure refactored with the following changes:

Comment on lines -352 to +350
-if FLAGS.model_preset == 'monomer_casp14':
-  num_ensemble = 8
-else:
-  num_ensemble = 1
+num_ensemble = 8 if FLAGS.model_preset == 'monomer_casp14' else 1

Function main refactored with the following changes:

Comment on lines -59 to +65
-command_args = []

 # Mount each fasta path as a unique target directory
 target_fasta_paths = []
 for i, fasta_path in enumerate(args.fasta_paths):
     mount, target_path = _generate_mount(f"fasta_path_{i}", Path(fasta_path))
     mounts.append(mount)
     target_fasta_paths.append(target_path)
-command_args.append(f"--fasta_paths={','.join(target_fasta_paths)}")
+command_args = [f"--fasta_paths={','.join(target_fasta_paths)}"]

Function main refactored with the following changes:

Comment on lines -183 to +180
-default=datetime.today().strftime("%Y-%m-%d"),
+default=datetime.now().strftime("%Y-%m-%d"),

Function parse_arguments refactored with the following changes:
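
For context (not stated in the diff): datetime.today() and datetime.now() both return the current local datetime, so the formatted default is unchanged; now() additionally accepts an optional tzinfo argument. A minimal illustration:

from datetime import datetime
# Both yield the same local date string in YYYY-MM-DD form.
datetime.today().strftime("%Y-%m-%d")
datetime.now().strftime("%Y-%m-%d")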

Comment on lines 158 to -159

pdb_lines = []

Function to_pdb refactored with the following changes:

This removes the following comments ( why? ):

# Close the final chain.

Comment on lines -174 to +173
-all_paired_msa_rows = list(np.array(all_paired_msa_rows).transpose())
-return all_paired_msa_rows
+return list(np.array(all_paired_msa_rows).transpose())

Function _match_rows_by_sequence_similarity refactored with the following changes:

Comment on lines -363 to +361
-merged_example[feature_name] = np.sum(x for x in feats).astype(np.int32)
+merged_example[feature_name] = np.sum(iter(feats)).astype(np.int32)

Function _merge_features_from_multiple_chains refactored with the following changes:
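
Both the original generator form and the iter() form hand NumPy a lazy iterable. An explicit way to express the apparent intent, assuming feats is a list of equal-shaped integer arrays (a sketch with hypothetical values, not the AlphaFold code):

import numpy as np

feats = [np.array([3, 1]), np.array([2, 4])]               # hypothetical per-chain feature arrays
merged = np.sum(np.stack(feats), axis=0).astype(np.int32)  # element-wise sum -> array([5, 5], dtype=int32)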

Comment on lines -450 to +452
-sequence_set = set(tuple(s) for s in chain['msa_all_seq'])
-keep_rows = []
-# Go through unpaired MSA seqs and remove any rows that correspond to the
-# sequences that are already present in the paired MSA.
-for row_num, seq in enumerate(chain['msa']):
-  if tuple(seq) not in sequence_set:
-    keep_rows.append(row_num)
+sequence_set = {tuple(s) for s in chain['msa_all_seq']}
+keep_rows = [
+    row_num for row_num, seq in enumerate(chain['msa'])
+    if tuple(seq) not in sequence_set
+]

Function deduplicate_unpaired_sequences refactored with the following changes:

This removes the following comments ( why? ):

# Go through unpaired MSA seqs and remove any rows that correspond to the
# sequences that are already present in the paired MSA.

-if line[:4] == '#=GS':  # Description lines - keep if sequence in list.
+if line.startswith('#=GS'):  # Description lines - keep if sequence in list.

Function _keep_line refactored with the following changes:

  • Replace str prefix/suffix check with call to startswith/endswith (str-prefix-suffix)
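
For illustration, the two checks agree here because the literal prefix is exactly four characters long; startswith avoids the hard-coded slice length (a standalone sketch, not AlphaFold code):

line = '#=GS 5g3r_A DE description'   # hypothetical Stockholm metadata line
line[:4] == '#=GS'                    # True, but the 4 must match len('#=GS')
line.startswith('#=GS')               # True, with no magic number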

-for line in f:
-  if _keep_line(line, seqnames):
-    filtered_lines.append(line)
+filtered_lines.extend(line for line in f if _keep_line(line, seqnames))

Function truncate_stockholm_msa refactored with the following changes:

Comment on lines -363 to +366
-else:
-  seen_sequences.add(masked_alignment)
-  seqnames.add(seqname)
-
-filtered_lines = []
-for line in stockholm_msa.splitlines():
-  if _keep_line(line, seqnames):
-    filtered_lines.append(line)
+seen_sequences.add(masked_alignment)
+seqnames.add(seqname)
+
+filtered_lines = [
+    line for line in stockholm_msa.splitlines()
+    if _keep_line(line, seqnames)
+]

Function deduplicate_stockholm_msa refactored with the following changes:

Comment on lines -504 to +501
-for i in range(len(block_starts) - 1):
-  hits.append(_parse_hhr_hit(lines[block_starts[i]:block_starts[i + 1]]))
+hits.extend(
+    _parse_hhr_hit(lines[block_starts[i]:block_starts[i + 1]])
+    for i in range(len(block_starts) - 1))

Function parse_hhr refactored with the following changes:

Comment on lines -554 to -569
-# Example 1: >4pqx_A/2-217 [subseq from] mol:protein length:217  Free text
-# Example 2: >5g3r_A/1-55 [subseq from] mol:protein length:352
-match = re.match(
-    r'^>?([a-z0-9]+)_(\w+)/([0-9]+)-([0-9]+).*protein length:([0-9]+) *(.*)$',
-    description.strip())
-
-if not match:
-  raise ValueError(f'Could not parse description: "{description}".')
-
-return HitMetadata(
-    pdb_id=match[1],
-    chain=match[2],
-    start=int(match[3]),
-    end=int(match[4]),
-    length=int(match[5]),
-    text=match[6])
+if match := re.match(
+    r'^>?([a-z0-9]+)_(\w+)/([0-9]+)-([0-9]+).*protein length:([0-9]+) *(.*)$',
+    description.strip(),
+):
+  return HitMetadata(
+      pdb_id=match[1],
+      chain=match[2],
+      start=int(match[3]),
+      end=int(match[4]),
+      length=int(match[5]),
+      text=match[6])
+else:
+  raise ValueError(f'Could not parse description: "{description}".')

Function _parse_hmmsearch_description refactored with the following changes:

This removes the following comments ( why? ):

# Example 1: >4pqx_A/2-217 [subseq from] mol:protein length:217  Free text
# Example 2: >5g3r_A/1-55 [subseq from] mol:protein length:352
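
The new form uses an assignment expression (PEP 572, Python 3.8+), which binds the match object and tests it in a single step. A minimal sketch of the same pattern (not AlphaFold code):

import re

if match := re.match(r'([0-9]+)', '217 residues'):
  length = int(match[1])   # 217
else:
  raise ValueError('no leading number')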

Comment on lines -598 to +591
-aligned_cols = sum([r.isupper() and r != '-' for r in hit_sequence])
+aligned_cols = sum(r.isupper() and r != '-' for r in hit_sequence)

Function parse_hmmsearch_a3m refactored with the following changes:

Comment on lines -39 to +46
-features = {}
-features['aatype'] = residue_constants.sequence_to_onehot(
-    sequence=sequence,
-    mapping=residue_constants.restype_order_with_x,
-    map_unknown_to_x=True)
+features = {
+    'aatype':
+        residue_constants.sequence_to_onehot(
+            sequence=sequence,
+            mapping=residue_constants.restype_order_with_x,
+            map_unknown_to_x=True,
+        )
+}

Function make_sequence_features refactored with the following changes:

Comment on lines -733 to +727
-cif_path = os.path.join(mmcif_dir, hit_pdb_code + '.cif')
+cif_path = os.path.join(mmcif_dir, f'{hit_pdb_code}.cif')

Function _process_single_hit refactored with the following changes:

Comment on lines -880 to +875
-template_features = {}
-for template_feature_name in TEMPLATE_FEATURES:
-  template_features[template_feature_name] = []
+template_features = {
+    template_feature_name: []
+    for template_feature_name in TEMPLATE_FEATURES
+}

Function HhsearchHitFeaturizer.get_templates refactored with the following changes:

Comment on lines -942 to +937
-template_features = {}
-for template_feature_name in TEMPLATE_FEATURES:
-  template_features[template_feature_name] = []
+template_features = {
+    template_feature_name: []
+    for template_feature_name in TEMPLATE_FEATURES
+}

Function HmmsearchHitFeaturizer.get_templates refactored with the following changes:

-if not glob.glob(database_path + '_*'):
+if not glob.glob(f'{database_path}_*'):

Function HHBlits.__init__ refactored with the following changes:

-db_cmd.append('-d')
-db_cmd.append(db_path)
+db_cmd.extend(('-d', db_path))

Function HHBlits.query refactored with the following changes:

-if not glob.glob(database_path + '_*'):
+if not glob.glob(f'{database_path}_*'):

Function HHSearch.__init__ refactored with the following changes:

-db_cmd.append('-d')
-db_cmd.append(db_path)
+db_cmd.extend(('-d', db_path))

Function HHSearch.query refactored with the following changes:

Comment on lines -127 to +129
-template_hits = parsers.parse_hmmsearch_a3m(
-    query_sequence=input_sequence,
-    a3m_string=a3m_string,
-    skip_first=False)
-return template_hits
+return parsers.parse_hmmsearch_a3m(query_sequence=input_sequence,
+                                   a3m_string=a3m_string,
+                                   skip_first=False)

Function Hmmsearch.get_template_hits refactored with the following changes:

Comment on lines -157 to +161
-raw_output = dict(
-    sto=sto,
-    tbl=tbl,
-    stderr=stderr,
-    n_iter=self.n_iter,
-    e_value=self.e_value)
-
-return raw_output
+return dict(sto=sto,
+            tbl=tbl,
+            stderr=stderr,
+            n_iter=self.n_iter,
+            e_value=self.e_value)

Function Jackhmmer._query_chunk refactored with the following changes:

Comment on lines -179 to +178
-single_chunk_results = []
-for input_fasta_path in input_fasta_paths:
-  single_chunk_results.append([self._query_chunk(
-      input_fasta_path, self.database_path, max_sequences)])
-return single_chunk_results
+return [[
+    self._query_chunk(input_fasta_path, self.database_path, max_sequences)
+] for input_fasta_path in input_fasta_paths]

Function Jackhmmer.query_multiple refactored with the following changes:

Comment on lines -139 to +133
-restype_atom14_to_atom37 = np.array(restype_atom14_to_atom37, dtype=np.int32)
-return restype_atom14_to_atom37
+return np.array(restype_atom14_to_atom37, dtype=np.int32)

Function _make_restype_atom14_to_atom37 refactored with the following changes:

Comment on lines -184 to +177
-restype_rigidgroup_base_atom37_idx = np.vectorize(lambda x: lookuptable[x])(
-    base_atom_names)
-return restype_rigidgroup_base_atom37_idx
+return np.vectorize(lambda x: lookuptable[x])(base_atom_names)

Function _make_restype_rigidgroup_base_atom37_idx refactored with the following changes:

Comment on lines -229 to +220
-assert len(atom14_data.shape) in [2, 3]
+assert len(atom14_data.shape) in {2, 3}

Function atom14_to_atom37 refactored with the following changes:

Comment on lines -434 to +425
-# Create the global frames.
-# shape (N, 8)
-all_frames_to_global = backb_to_global[:, None] @ all_frames_to_backb
-
-return all_frames_to_global
+return backb_to_global[:, None] @ all_frames_to_backb

Function torsion_angles_to_frames refactored with the following changes:

This removes the following comments ( why? ):

# Create the global frames.
# shape (N, 8)

Comment on lines -778 to +765
-# Decide for each residue, whether alternative naming is better.
-# shape (N)
-alt_naming_is_better = (alt_per_res_lddt < per_res_lddt).astype(jnp.float32)
-
-return alt_naming_is_better  # shape (N)
+return (alt_per_res_lddt < per_res_lddt).astype(jnp.float32)

Function find_optimal_renaming refactored with the following changes:

This removes the following comments ( why? ):

# Decide for each residue, whether alternative naming is better.
# shape (N)
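
If the shape annotations are worth keeping, one option (a suggestion, not part of this diff) is to attach them to the new return expressions themselves:

return backb_to_global[:, None] @ all_frames_to_backb          # all_frames_to_global, shape (N, 8)
return (alt_per_res_lddt < per_res_lddt).astype(jnp.float32)   # alt_naming_is_better, shape (N)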
