George Mason University 


Answers to Questions Asked About Exam

Question #4 in the Final Exam states the following: "Use the application discussed in class (BayesianPredictor-Version04) to assign the Nirvana county number (NC_Nr) to the two records from Question 3. Document the results for the following choices of identifiers.."   I'm unclear as to which two records from Question 3 we're supposed to utilize. Could you please clarify?

(1) In question 1 of the exam there are two records that you must transform from their original format to the target format.

(2) The steps of the transformation are Questions 2 and 3.

(3) Once you have completed the transformation (i.e., you have the XSDs and, based on them, have written and executed the XSLT), you will have converted the original two records into the format of the target database. Those two records need to be examined to see whether they match or differ from the records that already exist in the Training Data Set given in the exam. That is what Question 4 is alluding to.

I am working with the probability tables. I have the first two answers done, but when I set flags in the identifier table for any fields other than admission and date of birth, I get an error message when clicking on "Match Cases" that reads: "too few parameters. Expect 1." What does that mean, and how do I fix it?

(1) I have run the exam in the database and it works without glitches (see attached). Ergo, you must be doing something wrong.

(2) Basically, the Identifier Table must have at least one field set with the flag = 2 to indicate which field is the 'key' and then any number of other fields set with the flag = 1.

(3) For the exam, the 'key' is the field 'NC_Nr' (i.e., that is the field that must have its flag = 2).

(4) If you 'reset' the identifier table by running the first macro and forgot to set the flags as per (2) above, then the code will not be able to execute and you will likely get an error message like the one you are seeing. Every time you run the first macro you MUST open the Identifier table and reset the flags. Only after this step is completed can you run the 'Match Cases' macro.

Here are more details:

(a) The LR_Table is created automatically.

(b) IMPORTANT: Always close the LR_Table before you run the 'Match Cases' macro.

(c) HIGHLY RECOMMENDED: Close all the tables before you run the 'Match Cases' macro to avoid inadvertently locking a record.
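The flag rule in (2) through (4) above can be written down as a quick sanity check. The sketch below is not part of BayesianPredictor-Version04; the function name, the dictionary representation of the Identifier table, the field names 'Admission' and 'DateOfBirth', and the use of 0 for an unused field are all assumptions made for illustration.

```python
def check_identifier_flags(flags, key_field="NC_Nr"):
    """Return a list of problems found in an identifier-flag assignment.

    `flags` maps a field name to its flag. Following the text: 2 marks the
    'key' and 1 marks an identifier. Treating 0 as "unused" is an assumption.
    """
    problems = []
    keys = [name for name, flag in flags.items() if flag == 2]
    if not keys:
        problems.append("no field has flag = 2, so there is no 'key'")
    elif key_field not in keys:
        problems.append(f"{key_field} should have flag = 2, but the key is {keys}")
    return problems


# Flags right after the first macro was re-run and nothing was set (hypothetical).
after_reset = {"NC_Nr": 0, "Admission": 0, "DateOfBirth": 0}
# Flags set as the exam requires, with two identifiers chosen (hypothetical).
ready = {"NC_Nr": 2, "Admission": 1, "DateOfBirth": 1}

print(check_identifier_flags(after_reset))
print(check_identifier_flags(ready))
```

An empty list means the flags satisfy the rule; anything else describes what is still missing before 'Match Cases' should be run.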

Follow the procedure below EXACTLY as it reads.

(1) Open the SourceCases in edit mode. Modify the structure of the SourceCases so that it now contains the fields of the records to which you are going to assign new keys using the Bayesian probabilistic approach. Delete any extra fields that are not pertinent.

(2) Save the new structure of the SourceCases table. Open it. Enter in the SourceCases table the records to be evaluated, e.g., cut and paste from Excel or from another database.

(3) Delete the current CasesTable, then copy the SourceCases and paste it with ONLY THE STRUCTURE, then give it the name CasesTable.

(4) Open the CasesTable and copy one record (any one) from the SourceCases into the CasesTable.

(5) Delete the current TrainingDataSet, then copy the SourceCases and paste it with ONLY THE STRUCTURE, then give it the name TrainingDataSet.

(6) Load the records for the training data set, e.g., cut and paste from Excel or from another database.

(7) Run the first macro 'Create Identifier Table'. This will create the IdentifierTable.

(8) Open the IdentifierTable and set the flags to 1 for those fields that are going to be used as identifiers, and to 2 for the field that is the 'key'.

(9) Run the second macro 'Match Cases' to assign keys to the records in the SourceCases.
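To give a feel for what assigning a key by a Bayesian probabilistic approach involves, here is a minimal sketch. It is not the code behind the 'Match Cases' macro, whose internals are not described here. It assumes a naive Bayes model (identifiers treated as independent given the key) with add-one smoothing, and the function name, the field names other than NC_Nr, and the sample records are all hypothetical.

```python
from collections import Counter


def assign_key(case, training, key_field, identifiers):
    """Return (best_key, posterior) for `case` under a naive Bayes model.

    `training` is a list of dicts that include `key_field`; `case` is a dict
    holding the identifier values of the record that needs a key.
    """
    key_counts = Counter(row[key_field] for row in training)
    scores = {}
    for key, n_key in key_counts.items():
        score = n_key / len(training)  # prior P(key)
        for field in identifiers:
            # Distinct values seen for this field, including the case's own.
            n_values = len({row[field] for row in training} | {case[field]})
            matches = sum(
                1
                for row in training
                if row[key_field] == key and row[field] == case[field]
            )
            # Add-one smoothing keeps an unseen value from zeroing the score.
            score *= (matches + 1) / (n_key + n_values)
        scores[key] = score
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


# Hypothetical training data and case, for illustration only.
training = [
    {"NC_Nr": 101, "LastName": "Smith", "DOB": "1970-01-01"},
    {"NC_Nr": 101, "LastName": "Smith", "DOB": "1970-01-01"},
    {"NC_Nr": 102, "LastName": "Jones", "DOB": "1982-05-17"},
]
case = {"LastName": "Jones", "DOB": "1982-05-17"}

best, posterior = assign_key(case, training, "NC_Nr", ["LastName", "DOB"])
print(best, round(posterior, 3))
```

Changing the list of identifiers passed in plays the same role as changing which fields carry flag = 1 in the IdentifierTable: the key field stays fixed while the evidence used to choose it varies.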

1. Do I have to change the queries in the BayesianPredictor-Version04 example in order to use it for the exam? 2. In question two, when I use Height as a numeric field of type 'short', it does not validate. If I use string, it validates. Would you check the question and let me know?

(1) To run the Bayesian application the only thing that you need to change is the structure of the three tables: SourceCases, CasesTable and TrainingDataSet.

You can begin with, for example, the TrainingDataSet. Open the current table, delete all the records, then click on the icon that allows you to edit the table structure. Create the new structure to match the fields of the target database and save it. Once you have the new structure, you can delete the old SourceCases and the CasesTable. Then you can copy the TrainingDataSet table (structure only) twice. Rename the first copy CasesTable and the second copy SourceCases. Open the SourceCases and load the data into it (i.e., the two records you must have obtained from Question 3). Next open the CasesTable and copy one record from the SourceCases into the CasesTable (any one you want). Close the tables if they are still open. Lastly, copy the data given in the exam for the TrainingDataSet into that table.

(2) Next run the macro to create the IdentifierTable. Open the IdentifierTable and set the flags for the 'key' (NC_Nr) and the 'identifiers' as directed in each of the subparts of Question 4.

(3) Finally, run the macro 'Match Cases'. You will have to change the flags in the IdentifierTable every time that you run a new combination of 'identifiers'. NOTE: You do not need to run the first macro every time. However, if you do, remember to set ALL the flags correctly.

(4) The values given in 'Height' are of type string. You won't be able to validate using short.
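To see why a string-valued Height cannot validate as a short, the sketch below checks a value against the XML Schema rules for xs:short: an optional sign followed by decimal digits, within the range -32768 to 32767. The function name and the sample values are hypothetical; the actual Height values are the ones given in the exam.

```python
import re

# Lexical form of xs:short: optional sign, then one or more decimal digits.
_SHORT_PATTERN = re.compile(r"[+-]?[0-9]+")


def is_valid_xs_short(text):
    """Return True if `text` would be accepted as an xs:short value."""
    stripped = text.strip(" \t\r\n")  # surrounding XML whitespace is ignored
    if not _SHORT_PATTERN.fullmatch(stripped):
        return False
    return -32768 <= int(stripped) <= 32767


# Hypothetical sample values, for illustration only.
for value in ["70", "5ft 10in", "40000"]:
    print(repr(value), is_valid_xs_short(value))
```

Any value that contains letters, units, or punctuation fails the lexical check, which is the situation a string-typed Height produces.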