Assignment 2: Machine Learning Model Training 1 

• Two multi-part, multiple-choice questions.

• AI in Healthcare with Phase 2 data set (HTML file)

• Details of the Q1 & Q2 multiple-choice questions are shown in the attached question files.

• Lecture notes on Machine Learning in Healthcare for your reference

Phase 2: Model Training, Part 1

Welcome to Phase 2 of the capstone project. This section is the first of two parts that concern the model training process of the model development cycle. You continue to play the role of a bioinformatics professor. The questions relate to the various challenges faced by the teams working on the two projects introduced in the first section.

Your two research teams have begun working on the projects, and have some preliminary results. Both teams have e-mailed you summaries of progress thus far, which are shown below.

Project 1: CXR-based COVID-19 Detector

Hi,

We are super excited to get this project kicked off! We have implemented the data pipeline and trained a few preliminary models, but there is still lots of room for improvement. Here is what we’ve done so far:

We split the data randomly into a training and test set. We are placing 90% of the data into the training set and 10% of the data into the test set. Additionally, the images were initially massive, on the order of 3000 by 3000 pixels. So, we re-sized the images to 224 by 224 pixels.

We are using the ResNet-50 CNN architecture. During training, we are applying data augmentation. Concretely, on a given image, with 50% probability, we are zooming in on a small, randomly selected region before feeding it to the model. Here is an example:

So far, we have seen the following training curves from our model. Neither the training loss nor the test loss goes down very much.

As you can see, there is plenty of room for improvement. We’ll keep working on it, but let us know if you have any suggestions. Thanks.
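The split and augmentation described in this email can be sketched in a few lines of plain Python. This is only an illustration of the steps as stated, not the team's actual pipeline: the image IDs, the function names, the `seed` argument, and the `crop_fraction` value are all assumptions, since the email does not say how large the zoomed region is.

```python
import random


def split_train_test(items, test_fraction=0.10, seed=0):
    """Shuffle the items and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def random_zoom_box(width, height, rng, p=0.5, crop_fraction=0.25):
    """Return the (left, top, right, bottom) region of the image to keep.

    With probability p the box is a randomly placed region covering
    crop_fraction of each side (an assumed size); otherwise it is the
    full image, i.e. no augmentation is applied.
    """
    if rng.random() >= p:
        return (0, 0, width, height)
    crop_w = max(1, int(width * crop_fraction))
    crop_h = max(1, int(height * crop_fraction))
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return (left, top, left + crop_w, top + crop_h)


# Hypothetical image identifiers standing in for the CXR files.
image_ids = [f"cxr_{i:04d}" for i in range(1000)]
train_ids, test_ids = split_train_test(image_ids)
print(len(train_ids), len(test_ids))

# Crop boxes for a few 224 x 224 training images.
rng = random.Random(0)
boxes = [random_zoom_box(224, 224, rng) for _ in range(5)]
for box in boxes:
    print(box)
```

With the assumed `crop_fraction` of 0.25, an augmented image keeps only a 56 by 56 pixel region of the 224 by 224 input, which makes it easy to see how little of the chest X-ray the model is shown whenever the zoom is applied.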

Project 2: EHR-based Intubation Predictor

Hello,

We are in the process of cleaning up the COVID EHR data, and expect to get a model training soon. We attempted to train a set of preliminary models, but ran into some data issues. We were wondering if you could take a look at some of the problems we’ve found in the data and let us know what you think.

First, we noticed that the EHR data is actually quite sparse relative to what we thought we had. We only have about 3,000 EHR records, not 30,000 as we originally thought. This leaves us with about 300 COVID-positive and 2,700 COVID-negative exams. We might not be able to train a model on this data alone.

Second, we are noticing some very strange patterns in the data, particularly in the lab values. For example, see the following histogram of D-DIMER lab values found for each exam across the entire dataset. The x-axis is the D-DIMER lab value, and the y-axis is the number of exams with that value. We use a log scale on the y-axis to improve readability.
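As a rough companion to the numbers in this email, the sketch below computes the class balance from the quoted counts and tallies D-DIMER values by order of magnitude. The D-DIMER list is copied from the 30-exam sample table further down, with missing values left out. Binning the x-axis by powers of ten is a choice made here for a text-only view and is not the team's plot, which used a log scale on the y-axis. The function name `decade_counts` is hypothetical.

```python
import math
from collections import Counter

# Class balance, using the approximate counts quoted in the email.
n_positive, n_negative = 300, 2700
positive_fraction = n_positive / (n_positive + n_negative)
print(f"positive fraction: {positive_fraction:.0%}")

# D-DIMER values copied from the 30-exam sample table (missing values left out).
ddimer = [
    400000.0, 700000.0, 600.0, 1100.0, 17900000.0, 600000.0,
    600.0, 500.0, 700.0, 600000.0, 15000.0, 500.0,
    600000.0, 600.0, 700000.0, 600.0, 700.0, 500000.0,
    500000.0, 2600.0, 20600.0, 600.0, 600.0, 600.0,
]


def decade_counts(values):
    """Count values per power-of-ten bin: key k covers [10**k, 10**(k + 1))."""
    return Counter(math.floor(math.log10(v)) for v in values if v > 0)


counts = decade_counts(ddimer)
for decade in sorted(counts):
    print(f"1e{decade}: {'#' * counts[decade]} ({counts[decade]})")
```

On this small sample the values fall into two well-separated groups, one in the hundreds and one in the hundreds of thousands, which is the kind of pattern a histogram over the full dataset would be expected to make visible.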

We saw this in several CSV columns, including the Ferritin and Procalcitonin lab values. **We suspect that there is some underlying phenomenon affecting all three lab values.**

Another issue we ran into was missing column values. We can’t create a feature vector for Logistic Regression if some values are missing. How do you suggest we proceed regarding both the large outlier values and the missing values?

Below is an example of the data once again, this time a sample of 30 exams (with the observed symptoms excluded). Note that NaN in the CSV means that the value is missing. Please take a look and let us know if you see something that we might have missed.
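One possible way to start on both questions is sketched below in plain Python. It first compares the median D-DIMER value per clinic, as one way to probe whether the large values share a common cause, and then fills missing values with the observed median while keeping a flag for which entries were missing. Both steps are assumptions about how one might proceed, not a prescribed answer. The (clinic, D-DIMER) pairs and the SPO2 values are copied from the sample table below, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

NAN = float("nan")

# (clinic, DDIMER) pairs copied from the 30-exam sample; NAN marks a missing value.
rows = [
    ("Clinic B", 400000.0), ("Clinic B", 700000.0), ("Clinic C", 600.0),
    ("Clinic C", 1100.0), ("Clinic B", 17900000.0), ("Clinic B", 600000.0),
    ("Clinic C", 600.0), ("Clinic C", 500.0), ("Clinic C", 700.0),
    ("Clinic B", 600000.0), ("Clinic A", NAN), ("Clinic C", 15000.0),
    ("Clinic C", 500.0), ("Clinic B", 600000.0), ("Clinic C", 600.0),
    ("Clinic B", 700000.0), ("Clinic C", 600.0), ("Clinic C", 700.0),
    ("Clinic B", NAN), ("Clinic B", 500000.0), ("Clinic B", 500000.0),
    ("Clinic C", 2600.0), ("Clinic C", 20600.0), ("Clinic C", 600.0),
    ("Clinic A", 600.0), ("Clinic C", 600.0),
]


def median_by_group(pairs):
    """Median of the observed (non-missing) values for each group key."""
    groups = defaultdict(list)
    for key, value in pairs:
        if not math.isnan(value):
            groups[key].append(value)
    return {key: median(values) for key, values in groups.items()}


def impute_with_indicator(values):
    """Replace missing values with the observed median and flag them.

    Returns (filled_values, was_missing) so that a model can still use
    the fact that a value was not recorded.
    """
    observed = [v for v in values if not math.isnan(v)]
    fill = median(observed)
    filled = [fill if math.isnan(v) else v for v in values]
    was_missing = [1 if math.isnan(v) else 0 for v in values]
    return filled, was_missing


clinic_medians = median_by_group(rows)
for clinic, value in sorted(clinic_medians.items()):
    print(clinic, value)

# SPO2 values for the first seven sample rows; the last one is missing.
spo2 = [88.1, 80.6, 84.4, 88.6, 75.6, 83.7, NAN]
filled, was_missing = impute_with_indicator(spo2)
print(filled, was_missing)
```

The median is used in both steps because it is far less sensitive to a handful of extreme values than the mean. If the per-group comparison points to a scale difference between sources, it would make sense to resolve that before imputing the affected columns, since a fill value computed across mixed scales would not be meaningful.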

In [1]:

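import pandas as pd

# Identifiers, dates, and demographics, followed by lab values and vitals.
columns = [
    'pat_deid', 'intubation_date', 'IP_admission_date', 'IP_discharge_date', 'clinic',
    'birth_date', 'death_date', 'gender', 'ethnicity', 'race_new', 'LYMAB', 'CK', 'CR',
    'LDH', 'TNI', 'DDIMER', 'FERRITIN', 'PROCTL', 'PT', 'BUN', 'CRP',
    'SPO2', 'FIO2', 'NA',
]

# Show a 30-exam sample (row positions 5 through 34) of the selected columns.
pd.read_csv('COVID_19_sample_data.csv')[columns].iloc[5:35]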

Out[1]:

pat_deid intubation_date IP_admission_date IP_discharge_date clinic birth_date death_date gender ethnicity race_new LYMAB CK CR LDH TNI DDIMER FERRITIN PROCTL PT BUN CRP SPO2 FIO2 NA
5 8f9539f2-e6ad-4e00-ad45-4abc2bff2214 NaN 2020-03-04 2020-03-14 Clinic B 1965-11-11 NaN F nonhispanic white 1.0 51.6 NaN 232.4 0.0020 400000.0 503900.0 0.0 NaN 16.4 0.88 88.1 NaN 136.9
6 d5dd13c4-c31e-419c-8c02-47e4ca1ac5e2 NaN 2020-03-02 2020-03-20 Clinic B 2018-08-16 NaN F nonhispanic white 1.1 52.9 NaN 300.7 0.0030 700000.0 579600.0 100.0 10.3 16.2 0.83 80.6 32.8 140.0
7 91369e11-b944-4132-be0f-af46e880936b NaN 2020-03-02 2020-03-21 Clinic C 1972-09-22 NaN M nonhispanic white 1.2 45.0 NaN NaN 0.0047 600.0 562.3 0.2 12.8 10.7 0.85 84.4 NaN 141.4
8 c70992c9-ff13-467b-9032-1901506edeef NaN 2020-02-29 2020-03-05 Clinic C 1959-06-17 2020-03-11 M nonhispanic white 0.8 84.2 NaN 321.8 0.0092 1100.0 877.6 0.3 13.5 7.6 NaN 88.6 88.3 141.0
9 c70992c9-ff13-467b-9032-1901506edeef 2020-03-05 2020-03-05 2020-03-12 Clinic B 1959-06-17 2020-03-11 M nonhispanic white 0.5 29.7 NaN 391.9 0.0553 17900000.0 1786400.0 200.0 12.7 11.5 0.94 75.6 88.3 143.2
10 9ec7d743-96e7-47c8-b2ee-6336633beb39 NaN 2020-03-10 2020-03-23 Clinic B 1969-03-22 NaN M nonhispanic white 1.1 35.7 NaN 312.5 0.0045 600000.0 615900.0 100.0 11.1 15.6 0.98 83.7 NaN 143.8
11 9ec7d743-96e7-47c8-b2ee-6336633beb39 NaN 2020-03-23 2020-03-25 Clinic C 1969-03-22 NaN M nonhispanic white 1.2 24.4 NaN 254.3 0.0021 600.0 406.2 0.1 11.8 15.6 0.80 NaN NaN 143.8
12 a527bcf0-3746-476c-90f2-dbab8868385e NaN 2020-03-03 2020-03-17 Clinic C 1978-11-27 NaN F nonhispanic white 1.1 25.7 NaN 228.1 0.0038 500.0 459.5 0.0 12.8 13.5 1.14 77.7 NaN 138.4
13 7f4ef129-1511-47a9-a9b7-8b0b2d02ad50 NaN 2020-03-12 2020-03-25 Clinic C 1952-05-06 NaN F nonhispanic white 1.0 29.7 NaN 321.5 0.0038 700.0 450.1 0.2 12.1 15.0 1.20 75.8 NaN 139.6
14 7078ae9a-4c79-4b30-b127-f76aabb6763e NaN 2020-02-17 2020-03-07 Clinic B 1968-04-26 NaN F hispanic white 0.9 48.7 NaN 306.5 NaN 600000.0 509000.0 100.0 12.0 18.4 1.14 77.3 NaN 139.6
15 a5c39700-6bf3-4984-af46-31344695e21b NaN 2020-03-05 2020-03-13 Clinic A 1940-01-09 2020-03-15 M nonhispanic white 0.7 85.5 NaN 312.9 0.0257 NaN 1552.8 NaN NaN 14.9 1.23 79.2 64.5 143.8
16 a5c39700-6bf3-4984-af46-31344695e21b 2020-03-12 2020-03-12 2020-03-16 Clinic C 1940-01-09 2020-03-15 M nonhispanic white 0.5 170.6 NaN 390.5 0.0238 15000.0 1755.9 0.2 14.0 14.9 1.15 75.6 40.9 143.8
17 ddb2d5e2-643e-4374-ac19-f6ca3c0d16f5 NaN 2020-02-25 2020-03-09 Clinic C 1967-12-24 NaN M nonhispanic white NaN 51.5 NaN 234.4 0.0029 500.0 527.4 0.1 11.5 12.0 0.90 88.9 42.0 138.8
18 21505aac-f219-43a8-ab3c-f57c6d8f1d1f NaN 2020-03-08 2020-03-21 Clinic B 1940-05-03 NaN F nonhispanic white 1.1 38.7 NaN 229.3 0.0029 600000.0 536500.0 NaN 12.6 8.4 1.07 77.0 NaN 137.4
19 7992bf94-feee-4728-9187-2c911df2819b NaN 2020-03-03 2020-03-17 Clinic C 2004-07-04 NaN F nonhispanic white 1.0 27.3 NaN 238.9 0.0022 600.0 NaN 0.1 11.0 10.9 0.98 NaN NaN 138.4
20 d2f6d528-39db-4b7e-8389-abd27af9a710 NaN 2020-02-29 2020-03-12 Clinic B 1996-06-26 NaN F nonhispanic white 1.1 31.5 NaN 254.3 0.0034 700000.0 455300.0 0.0 10.3 14.6 1.08 76.6 NaN 138.4
21 fa0b58e6-6817-4d49-8211-1dd34abf0c15 NaN 2020-03-11 2020-03-28 Clinic C 2008-11-21 NaN M nonhispanic white 0.9 23.0 NaN 232.0 0.0030 600.0 542.0 0.1 10.8 10.6 0.88 86.3 NaN 141.6
22 b83237f3-9ff5-491e-aab4-d63ccff85f85 NaN 2020-03-13 2020-03-30 Clinic C 2012-11-17 NaN M nonhispanic white 1.1 47.2 NaN 329.8 0.0038 700.0 535.0 0.0 12.4 10.1 1.13 75.3 NaN 137.3
23 46988a9c-9c86-429a-bc4a-b3d14ff321b0 NaN 2020-03-11 2020-03-21 Clinic B 1957-03-13 NaN M nonhispanic asian NaN 37.0 NaN 235.7 0.0030 NaN 535000.0 100.0 10.2 19.1 0.86 79.1 NaN 137.5
24 46988a9c-9c86-429a-bc4a-b3d14ff321b0 2020-03-20 2020-03-21 2020-03-24 Clinic B 1957-03-13 NaN M nonhispanic asian 1.2 29.6 NaN 304.2 0.0044 500000.0 553400.0 0.0 11.8 9.2 1.13 98.7 86.8 143.5
25 785b484d-7060-4d17-bf18-ef8bbafc6f04 NaN 2020-02-28 2020-03-10 Clinic B 1942-08-24 NaN F nonhispanic white 0.8 21.3 NaN 226.3 0.0023 500000.0 468600.0 NaN 12.6 17.2 0.91 81.7 NaN 138.2
26 edad31f3-5a08-4678-8d31-271a41a2aad5 NaN 2020-03-05 2020-03-13 Clinic C 1940-01-09 2020-03-19 M nonhispanic white 0.6 78.6 NaN 306.4 0.0256 2600.0 1764.2 0.2 13.8 18.0 1.14 83.0 62.0 141.2
27 edad31f3-5a08-4678-8d31-271a41a2aad5 2020-03-12 2020-03-12 2020-03-20 Clinic C 1940-01-09 2020-03-19 M nonhispanic white 0.3 184.4 NaN 370.1 0.0639 20600.0 1804.0 0.3 11.5 7.7 1.23 84.4 NaN 142.2
28 4607a669-4a97-4f0a-9661-856569905047 NaN 2020-03-09 2020-03-21 Clinic C 1993-11-26 NaN F nonhispanic white 1.1 48.7 NaN NaN 0.0037 600.0 590.9 0.1 12.5 13.7 0.97 81.1 NaN 142.4
29 c1800ba1-7cba-45d7-bdc4-0e0b583932e4 NaN 2020-02-23 2020-03-08 Clinic A 2018-01-20 NaN M hispanic white NaN 25.7 NaN 306.4 0.0030 600.0 464.0 0.0 10.9 7.6 NaN 77.8 NaN 141.7
30 d2718050-2e9c-4d5b-842e-52d910c1563f NaN 2020-03-04 2020-03-17 Clinic C 1997-06-01 NaN M nonhispanic white 1.2 50.6 NaN 339.8 0.0047 600.0 634.5 0.2 10.2 13.6 1.08 81.5 NaN 137.9
31 d2718050-2e9c-4d5b-842e-52d910c1563f NaN 2020-03-17 2020-03-22 Clinic A 1997-06-01 NaN M nonhispanic white 1.3