Clustering cancer patients with different symptom patterns over time: 2nd blog post

During the last few weeks of the Digital Scholarship & Publishing Studio Fellowship, I have been revising my methodology paper, as I discussed with my committee at our last meeting. In the previous data set, the dates of the first cycle of chemotherapy, treatments, diagnoses, and deaths were inaccurate, so I obtained updated chemotherapy dates for 209 cancer patients. Over the last two weeks, I clustered patients using the accurate chemotherapy dates and different time periods, as shown in the attached video.

Previously, I used five symptoms: pain, nausea, and three items from the Braden scale (mobility, activity, and nutrition). However, the Braden scale is not closely related to symptom severity after chemotherapy, so I added three more symptoms: oral health, appetite, and psychosocial status.
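The clustering summarized at the end of this post uses expectation-maximization (EM). As a rough illustration of what EM does with symptom scores, here is a minimal sketch that fits a two-cluster Gaussian mixture to a single score per patient. It is not the analysis code (which was written in R and used several symptoms over time); the function names, the made-up scores, and the choice of two clusters are all assumptions for the example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_clusters(scores, n_iter=50, min_var=1e-3):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only)."""
    n = len(scores)
    overall_mean = sum(scores) / n
    start_var = max(sum((x - overall_mean) ** 2 for x in scores) / n, min_var)
    # Assumed initialization: means at the lowest and highest observed score.
    means = [min(scores), max(scores)]
    variances = [start_var, start_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each patient belongs to each cluster.
        resp = []
        for x in scores:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against division by zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and variance of each cluster.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty cluster unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            variances[k] = max(var, min_var)
    # Assign each patient to the cluster with the higher responsibility.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, labels

# Hypothetical symptom severity scores for eight patients.
example_scores = [1.0, 1.5, 2.0, 1.2, 7.5, 8.0, 8.5, 7.8]
example_means, example_labels = em_two_clusters(example_scores)
print(example_labels)
```

In a real analysis the model would take all symptoms at once, and the number of clusters would be chosen by comparing model fit rather than fixed at two.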

To decide which time window to choose, I referred to previous literature. Albusoul et al. (2017) examined how symptom clusters change over time in breast cancer patients at four time points: baseline (two days before the first chemotherapy treatment), cycle 3, cycle 4, and one month after finishing chemotherapy. Byar et al. (2006) investigated the impact of adjuvant breast cancer chemotherapy on fatigue, other symptoms, and quality of life at 30, 60, and 90 days after the last treatment, and at one year after the first treatment. Tracy (2016) identified symptom clusters and trajectories of depression and anxiety in Latina breast cancer survivors at baseline and at 2 and 4 months post-enrollment. Based on these studies, I extended the clustering time period from one month in my previous analysis to three months in the current analysis, so that I can evaluate the pattern of symptom clusters over a longer time window.
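To make the time window concrete, here is a minimal sketch of selecting the symptom records that fall within a window after each patient's first chemotherapy date. It assumes the window starts on the first chemotherapy date and treats three months as 90 days; the field names, patient IDs, and dates are hypothetical, and the actual analysis was done in R.

```python
from datetime import date, timedelta

def within_window(records, first_chemo, days=90):
    """Keep records dated from the first chemo date through `days` days later."""
    kept = []
    for rec in records:
        start = first_chemo.get(rec["patient_id"])
        if start is None:
            continue  # no known chemotherapy date for this patient
        if start <= rec["date"] <= start + timedelta(days=days):
            kept.append(rec)
    return kept

# Hypothetical first chemotherapy dates and symptom records.
first_chemo_dates = {"P1": date(2018, 1, 10), "P2": date(2018, 3, 1)}
symptom_records = [
    {"patient_id": "P1", "date": date(2018, 1, 5), "pain": 2},   # before chemo
    {"patient_id": "P1", "date": date(2018, 2, 20), "pain": 5},  # inside window
    {"patient_id": "P1", "date": date(2018, 6, 1), "pain": 1},   # after window
    {"patient_id": "P2", "date": date(2018, 3, 1), "pain": 4},   # first day of window
    {"patient_id": "P3", "date": date(2018, 3, 2), "pain": 3},   # no chemo date
]

three_month = within_window(symptom_records, first_chemo_dates, days=90)
one_month = within_window(symptom_records, first_chemo_dates, days=30)
print(len(three_month), len(one_month))
```

Changing `days` is all it takes to compare a one-month window with a three-month window on the same records.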

For the next step, I need to create a spreadsheet that includes patients’ demographic and clinical characteristics and their symptom cluster memberships, using R code. Matthew Butler in the Studio helped me revise my previous code. The new data sets, Cancer NAACCA and Cancer Vital, show some discrepancies with the list of patients I had before, so I sometimes filled out this spreadsheet manually, getting patients’ medical record numbers (MRNs) and dates of death by reviewing their charts in EPIC.
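The spreadsheet step amounts to joining two tables on the MRN and setting aside the patients who appear in only one of them for manual chart review. The sketch below shows that idea under simple assumptions: each table is a dictionary keyed by MRN, and the MRNs, column names, and values are invented for the example. It is not the R code used in the project.

```python
def merge_by_mrn(demographics, clusters):
    """Join demographics and cluster membership on MRN; list unmatched MRNs."""
    merged = []
    unmatched = []
    for mrn, info in demographics.items():
        if mrn in clusters:
            merged.append({"mrn": mrn, **info, "cluster": clusters[mrn]})
        else:
            unmatched.append(mrn)  # has demographics but no cluster label
    # MRNs that have a cluster label but no demographic row.
    unmatched.extend(m for m in clusters if m not in demographics)
    return merged, sorted(unmatched)

# Hypothetical inputs keyed by MRN.
demo_table = {
    "1001": {"age": 61, "diagnosis": "breast"},
    "1002": {"age": 54, "diagnosis": "lung"},
    "1003": {"age": 70, "diagnosis": "colon"},
}
cluster_table = {"1001": 1, "1003": 2, "1004": 1}

merged_rows, needs_review = merge_by_mrn(demo_table, cluster_table)
print(merged_rows)
print(needs_review)
```

Listing the unmatched MRNs explicitly keeps the manual chart review limited to the patients whose records actually disagree between the two sources.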

To sum up, I ran the expectation-maximization (EM) algorithm to cluster patients using one spreadsheet. The clinical implications are still weak, so I will next cluster patients using all cycles of chemotherapy rather than just the first cycle. I hope this reanalysis will soon produce interesting results, and that I can publish and share clinically meaningful findings with a larger audience. Participating as a fellow of the Digital Scholarship & Publishing Studio was a valuable experience that showed me the huge potential of using real health data. I am grateful for your continuing support and for the opportunity to meet other fellows and staff members.

-Sena Chae
