Data were collected from http://archive.ics.uci.edu/ml/datasets/Automobile on the curb weight (kg), engine size (cc) and wheel base size (cm) of cars.
The curb weight of each car was recorded in kilograms and the engine size in cubic centimetres for 206 selected cars. The data chosen for investigation are continuous and hence appropriate for regression analysis.
The purpose of this investigation is to find whether there is a relationship between the engine size of a car (predictor variable) and its curb weight (response variable), and to compare this relationship with the one between the wheel base size of a car (predictor variable) and its curb weight (response variable).
This graph is a scatter plot of Engine Size (cubic centimetres) vs. Curb Weight (kg).
The scatter appears to be fairly constant; the validity of the linear fit is investigated under the Engine Size vs. Curb Weight residuals section. There seem to be 3 possible outliers in the 5000-6000 cc range and a possible grouping in the 1500-2500 cc range. The straight line is the line of best fit between the x and y variables (engine size and curb weight respectively) and has the equation y = 0.294x + 546.9. The correlation coefficient, r = 0.85, measures the strength of the linear association. It indicates that curb weight tends to increase as engine size increases and that there is a strong, positive, linear relationship between the two variables.
The model is quite good: for every 1 cubic centimetre increase in engine size, we would estimate a 0.3 kg increase in curb weight.
This graph is a scatter plot of Wheel Base Size (cm) vs. Curb Weight (kg).
There is constant scatter. There seem to be some y-outliers in the 250-260 cm range and possible groupings in the 230-260 cm range. The line shown above is the line of best fit between the two variables and has the equation y = 11.98x - 1848. The correlation coefficient, r = 0.78, indicates that curb weight tends to increase as wheel base size increases and that there is a strong, positive, linear relationship between the two variables. Our model is quite good: for every 1 cm increase in wheel base size, we would estimate a 12 kg increase in curb weight.
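The slope interpretation of the two fitted lines can be checked with a short sketch. It uses only the coefficients quoted in this report; the class name LinearModel and the test inputs (2000 cc, 250 cm) are illustrative choices, not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    slope: float      # kg per unit of the predictor
    intercept: float  # kg

    def predict(self, x: float) -> float:
        """Predicted curb weight (kg) for predictor value x."""
        return self.slope * x + self.intercept


# Coefficients quoted in the report.
engine_model = LinearModel(slope=0.294, intercept=546.9)       # x in cc
wheelbase_model = LinearModel(slope=11.98, intercept=-1848.0)  # x in cm

# A one-unit step in the predictor changes the prediction by the slope.
engine_step = engine_model.predict(2001) - engine_model.predict(2000)
wheelbase_step = wheelbase_model.predict(251) - wheelbase_model.predict(250)
print(f"{engine_step:.3f} kg per cc, {wheelbase_step:.2f} kg per cm")
```

The step sizes are the slopes themselves, which is where the "0.3 kg per cc" and "12 kg per cm" statements come from after rounding.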
Comparing the Relationships
Scatter Graphs
The relationship between Engine Size and Curb Weight is stronger than that between Wheel Base Size and Curb Weight because the points on the graph are closer to the trend line. This is indicated by the larger correlation coefficient (larger in magnitude, which is what measures strength) of r = 0.85 for the Engine Size/Curb Weight relationship, against the weaker r = 0.78 for the Wheel Base Size/Curb Weight relationship.
R2 Value:
The R2 value for Engine Size and Curb Weight is 72.3%, and for Wheel Base Size and Curb Weight it is 60.2%. This confirms that the Engine Size/Curb Weight model is better, as its R2 value is larger. However, the relatively low R2 values (taking 80% as the minimum) indicate that neither could be considered a good or reliable model for predicting Curb Weight.
Model 1: The coefficient of determination, R2, indicates that 72.3% of the variability in Curb Weight can be accounted for by the regression model y = 0.294x + 546.9 on Engine Size.
Here y is the Curb Weight of the car and x is the Engine Size of the car. The other 27.7% is due to other causes, possibly lurking variables.
Model 2: The coefficient of determination, R2, indicates that 60.2% of the variability in the Curb Weight of a car can be accounted for by the linear regression model y = 11.98x - 1848 on Wheel Base Size. In this equation, y is the Curb Weight (kg) and x is the Wheel Base Size (cm). The other 39.8% is due to other causes.
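As a quick check on how the R2 values relate to the correlation coefficients, the sketch below squares the two reported r values (for a straight-line model with one predictor, R2 equals r squared). The small gap between these squares and the reported 72.3% and 60.2% is assumed here to come from r being rounded to two decimal places; the report does not state this.

```python
r_engine = 0.85      # correlation reported for Engine Size vs. Curb Weight
r_wheelbase = 0.78   # correlation reported for Wheel Base Size vs. Curb Weight

# For simple linear regression, R^2 equals r squared.
r2_engine = r_engine ** 2
r2_wheelbase = r_wheelbase ** 2

# Share of variability left unexplained, using the R^2 values quoted in the report.
unexplained_engine = 100.0 - 72.3
unexplained_wheelbase = 100.0 - 60.2

print(f"r^2: {r2_engine:.4f}, {r2_wheelbase:.4f}")
print(f"unexplained: {unexplained_engine:.1f}%, {unexplained_wheelbase:.1f}%")
```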
Residuals:
This residual graph does not appear to have constant, random scatter. The residuals start very negative (around -200), then rise and become more positive as the engine size increases. However, as the engine size approaches the 2500 cc mark, the residuals start to decrease again. The pattern of residuals therefore looks more like a parabola, which indicates that the data are not really linear.
If the 4 most extreme outliers are removed (1 from the extreme positive residual end and 3 from the extreme negative end), the pattern in the bulk of the data can be seen:
Even with the 4 most extreme outliers removed, there is still no random, constant scatter. Hence the linear model is not appropriate. In Alternative Models, I discuss whether a non-linear model fits the data better than the linear model (now with outliers removed), and in Outliers I discuss the validity of outlier removal.
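To illustrate why a parabola-shaped residual pattern points to non-linearity, here is a minimal sketch on synthetic data. The data are NOT the car data: the curve y = 20 * x**0.55 and the 1000-5000 range are hypothetical, chosen only to produce a relationship that levels off. NumPy is assumed to be available.

```python
import numpy as np

# Synthetic, noise-free data (NOT the car data): a curve that levels off.
x = np.linspace(1000.0, 5000.0, 41)   # hypothetical engine sizes, cc
y = 20.0 * x ** 0.55                  # hypothetical concave relationship

slope, intercept = np.polyfit(x, y, 1)    # least-squares straight line
residuals = y - (slope * x + intercept)   # actual minus predicted

first, middle, last = residuals[0], residuals[len(x) // 2], residuals[-1]
print(f"ends: {first:.1f}, {last:.1f}; middle: {middle:.1f}")
```

A straight line fitted to a levelling-off curve leaves negative residuals at both ends and positive residuals in the middle, which is the same negative-parabola shape described above.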
For Wheel Base Size vs. Curb Weight, the residual plot appears to have constant scatter and no particular pattern. To check this further, the three largest values (possibly outliers) are removed so that any possible groupings can be seen more clearly.
Even now there is still constant scatter in the residual plot. Therefore a linear model is appropriate for modelling Wheel Base Size and Curb Weight. However, the large range of the residuals (from -300 to 600) and the size of the residuals themselves prevent us from treating the model as reliable, as indicated by the low R2 value (60.2%).
Predictions of Engine Size vs. Curb Weight (assuming, for the purposes of discussion, that the relationship is linear)
Interpolation:
The linear model y = 0.294x + 546.9 predicts that an engine size of 1770 cubic centimetres will give a curb weight of 1067 kilograms. An actual data value for 1770 cubic centimetres was 1086 kilograms. The prediction is very close because of the strong relationship between Curb Weight and Engine Size (r = 0.85).
The data appear to level off at around the 1800 kg mark, but the model continues upward, making predictions beyond this point inaccurate. There is also great variance in the residual plot in the 1500-3000 cc range, making predictions within this range unreliable. So the model is only accurate for predictions in roughly the 1250-1500 cc range.
Extrapolation: This model gives a negative curb weight for engine sizes below -546.9/0.294 ≈ -1860 cubic centimetres. Both a negative engine size and a negative curb weight are impossible, as you cannot have a negative volume or weight. The same caution applies to predicting outside the data in the positive direction. An Engine Size of 15000 cc would predict a curb weight of 4956.9 kilograms. Predicting outside the data is unreliable as there are limits on just how small or large engine sizes (or curb weights) generally are. Only in special circumstances could a car have such a large engine: for example, a limited edition car like the Fiat S76, which featured a 28300 cubic centimetre engine (http://www.teamdan.com/archive/cars/fiat/fiat_s76.html Accessed 16-3-12), or a car produced in limited numbers such as the V8 Nelson Racing Engines 705 HEMI BBC Pump Gas Warrior (carbidechainsawchain.org/carbide-chainsaw…/chainsaw-18-inch-gas/ Accessed 16-3-12), with an engine size of 11553 cc.
Predictions of Wheel Base Size vs. Curb Weight
Interpolation:
The linear model y = 11.98x - 1848 predicts that a Wheel Base Size of 250 cm would give a Curb Weight of 1147 kilograms, whereas an actual data value was 1527 kilograms. The model underestimated the actual Curb Weight, as shown by the residual plot. The great variance in many of the values, as shown by the residuals, may indicate that another factor is acting. This implies that Wheel Base Size is not the only variable that affects, and can act as a predictor for, Curb Weight. So prediction within the data is unreliable.
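The two interpolation checks in this report (1770 cc and 250 cm) can be reproduced side by side with a small sketch. All numbers are taken from the report; the helper name predict and the dictionary layout are illustrative.

```python
def predict(slope: float, intercept: float, x: float) -> float:
    """Value of the fitted line slope * x + intercept."""
    return slope * x + intercept


# (label, slope, intercept, x, observed curb weight in kg), all from the report
cases = [
    ("engine size 1770 cc", 0.294, 546.9, 1770.0, 1086.0),
    ("wheel base 250 cm", 11.98, -1848.0, 250.0, 1527.0),
]

residuals = {}
for label, slope, intercept, x, observed in cases:
    fitted = predict(slope, intercept, x)
    residuals[label] = observed - fitted   # actual minus predicted
    print(f"{label}: predicted {fitted:.0f} kg, residual {residuals[label]:+.0f} kg")
```

The engine size prediction misses by under 20 kg, while the wheel base prediction misses by 380 kg, which matches the contrast drawn between the two models.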
Extrapolation:
This model gives a negative curb weight below a Wheel Base Size of 1848/11.98 ≈ 154 cm. This shows the caution that needs to be exercised when extrapolating a model, as it can result in a nonsensical value (negative weight).
The same model also gives values for a Wheel Base Size below 0, which is again nonsensical, as lengths and weights are scalar quantities that can only take positive values.
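The extrapolation limits quoted for both models follow from setting each fitted line equal to zero. A minimal sketch, using only the coefficients from the report (the function name zero_crossing is illustrative):

```python
def zero_crossing(slope: float, intercept: float) -> float:
    """Predictor value at which slope * x + intercept equals zero."""
    return -intercept / slope


engine_zero = zero_crossing(0.294, 546.9)        # cc
wheelbase_zero = zero_crossing(11.98, -1848.0)   # cm
far_extrapolation = 0.294 * 15000 + 546.9        # kg predicted at 15000 cc

print(f"{engine_zero:.0f} cc, {wheelbase_zero:.0f} cm, {far_extrapolation:.1f} kg")
```

Below these crossing points each line predicts a negative curb weight, which is why neither model can be trusted far outside the observed range.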
Correlation and Causation
The correlation coefficients for both models were high: the Engine Size/Curb Weight relationship has r = 0.85 and the Wheel Base Size/Curb Weight relationship has r = 0.78 (using r = 0.8 as the approximate cut-off for a strong linear association).
Though these relatively high r-values indicate there is some relationship between engine size, wheel base size and curb weight, they do not imply causality. It would be more appropriate to say there is an association between the variables. Firstly, this is an observational study and not an experimental one, so the two variables involved in each model cannot be trialled and viewed in isolation. This means that causality cannot be proven from correlation. There are many lurking variables that have an effect on curb weight, such as a sunroof, a hatchback body, the bonnet size and the type of metal used for the car's casing or main structure. Another lurking variable that could affect the curb weight from car to car is the amount of fuel in the tank and of other fluids, such as engine oil, coolant, brake fluid, transmission fluid and any other fluid that is included in the car by the manufacturer and counted in the total curb weight. Engine Size has a stronger relationship with Curb Weight (assuming, for the purposes of discussion, that the relationship is linear) than Wheel Base Size does. This is probably because the constituents of an engine (such as pistons) are very similar and made of similar materials between cars (excluding newly developed hybrid cars), so the variance in the data would be smaller (as shown by the residual range in the residual plot).
Engine size could plausibly gauge the weight of the car (without passengers or luggage in the boot), since a larger engine size means a larger volume, which would lead to a heavier engine and contribute to a larger curb weight. Although wheel base size can be used to gauge a car's curb weight, it does not necessarily contribute to that weight. Many cars, even conventional ones, have different mass distributions, which will affect the curb weight. For example, a car with a wheel base size of 245 cm can have a curb weight of 918 kg or of 1053 kg. Thus a study of only two variables, such as Wheel Base Size vs. Curb Weight, will never (in an observational study) create evidence for causality. There are far too many other variables that affect a bivariate study.
Assumptions about the Data:
1. I have assumed, throughout my report, that both models had a linear relationship between the variables involved. With Engine Size as the predictor, a linear model was not justified because the residual plot showed a negative parabola pattern. So, despite the slightly lower r value of 0.78, I was more justified in using a linear model with Wheel Base Size as my predictor variable, as there was no pattern in the residual plot.
2. I also assumed that the residuals were normally distributed. I was not justified in using a linear model for the Engine Size/Curb Weight relationship, as there was non-constant and non-random scatter. Thus, I was only justified in using a linear model for the Wheel Base Size/Curb Weight relationship, where the residuals were normally distributed and showed constant, random scatter.
3. I assumed that continuous and random variables were used to predict Curb Weight. Engine Size and Wheel Base Size took many different values, ranging from 1000-5342 cc and 220-307 cm respectively. This needs stating because some data may be discrete and/or multitudinous in value.
4. I assumed there were no outliers. This can be justified for the Wheel Base Size/Curb Weight relationship because, when the possible outliers were removed, there was still constant scatter. When outliers were removed in the Engine Size/Curb Weight relationship, the non-linear scatter did not change, so the assumption is justified there as well.
5. I have assumed that all of the data were collected under the same conditions and using the same method. This is justified as these are 1985 Auto Import data, collected using the 1985 Model Import Car and Truck specification and the 1985 Ward’s Automotive Yearbook.
Limitations of the Investigation:
The data were collected in 1985. This is a big limitation, as there may be new technology (electronic stability control etc.) and new developments in cars (hybrid and energy-efficient cars) which can affect the variables investigated (curb weight, engine size and wheel base size). This lack of current data can affect the model, the outcomes of my investigation and hence the validity of the model. The data can also be irrelevant to the world today. An example is the data given for cars of the Mercury make. The brand is now defunct, meaning cars of this make are no longer produced, so these data are not relevant to a current investigation and hinder its accuracy.
No data were available in the 4000-5000 cc range of the Engine Size/Curb Weight model. This could have affected the scatter graphs and linear models: the missing data could have changed the distribution pattern of the residuals and made the levelling off in the non-linear Engine Size/Curb Weight model more obvious.
The sample included only 22 makes of cars; it was not a census. 22 makes of cars may not allow us to gain an in-depth insight into the trends of Engine Size, Wheel Base Size and Curb Weight, as there are 3505 different makes of cars (not including all the minor automotive manufacturing companies) (http:members.chellos.nl/j.baartse/cars Accessed 21-3-12).
The added data could have made a great difference by significantly increasing or decreasing the correlation coefficient. With more evidence present, they could also make it easier to investigate the difference between correlation and causality and to make better predictions and estimates from the data. Thus, the results would be more representative if a census-style investigation were conducted.
Alternative Models:
Although there was sufficient evidence to justify the use of a linear model for Wheel Base Size (due to the high r value and constant, random scatter in the residual plot), there was nowhere near enough for Engine Size. The scatter plot seems to level off at around the 5000 cc mark and there is a pattern in the residuals, which indicates that the relationship is non-linear. When an exponential model is linearised, the data look even more non-linear.
But when a power model is linearised, the data do look more linear. The power model is appropriate because the linearised data now have constant scatter and the plot is straight, so correlation is now an appropriate measure of association.
The R2 value is slightly higher than the original linear one of 72.3%: an extra 2.5% of the variability in the data is accounted for by the model. The shape of the trend line also accounts better for the levelling off of curb weight at higher engine sizes. However, the power model implies that when the engine size is 0 cc the curb weight is 0 kg, which is not realistic, as many other components of the car still contribute to its curb weight (although an engine size of 0 cc is itself nearly impossible).
The marginal increase in R2 is not the important point, however. What matters is that the scatter is constant and the form of the plot is now straight, which is what makes the correlation an appropriate measure of association.
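Linearising a power model means taking logarithms of both variables and fitting a straight line to the logged data. The sketch below shows the idea on synthetic data only: the parameters a = 20 and b = 0.55 and the 1000-5000 range are hypothetical and are not estimates from the car data. NumPy is assumed to be available.

```python
import numpy as np

# Synthetic, noise-free data (NOT the car data) following y = a * x**b.
a_true, b_true = 20.0, 0.55             # hypothetical parameters
x = np.linspace(1000.0, 5000.0, 41)     # hypothetical engine sizes, cc
y = a_true * x ** b_true

# Taking logs turns y = a * x**b into log y = log a + b * log x,
# so a straight-line fit on the logged data recovers b and log a.
b_hat, log_a_hat = np.polyfit(np.log(x), np.log(y), 1)
a_hat = np.exp(log_a_hat)

# Correlation is computed on the linearised (log-log) scale.
r_log = np.corrcoef(np.log(x), np.log(y))[0, 1]
print(f"a = {a_hat:.2f}, b = {b_hat:.3f}, r on log scale = {r_log:.3f}")
```

Because a power model has the form y = a * x**b, it always passes through (0, 0), which is the unrealistic feature noted above for an engine size of 0 cc.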
Piecewise Data:
Another alternative approach would have been to look for different groupings within the data and then conduct linear regression within these groupings. This approach would be valid if the front wheel drive (fwd) and rear wheel drive (rwd) cars were looked at separately (4wd had insufficient data for analysis).
There were evident groupings in each of my bivariate studies.
Groupings are to be expected because front wheel drive cars have a compact structure and are limited to smaller engine sizes, so they are grouped in the 1500-3000 cc range. The fwd cars have fewer components, which reduces the curb weight. The data show that fwd cars generally had a lower curb weight than rear wheel drive cars. The number of 4wd cars in this sample is so small that we cannot determine whether or not grouping is present. The wheel base is smaller for fwd cars because the car might have been smaller (due to a smaller engine), reducing the distance between the front and rear wheels.
My study would then have to compare the relationships in the data between fwd and rwd. In this case, I decided to use Wheel Base Size vs. Engine Size.
From these models, I can say that both groups showed a medium-strength, positive, linear relationship. The groupings would be a better way to compare fwd and rwd (4wd did not have sufficient data for a proper regression analysis).
The relatively similar r values indicate that the relationship between the two variables could have a similar strength in both drive formats. My original study is very similar to this one because it still gives the general trend of increasing curb weight with an increase in wheel base size for both fwd and rwd (together and apart).
Therefore, it would be unnecessary to split the data into groups and place a model on each. However, viewing the data in groups, in the wider context of both drive types, allows for a good comparison.
Outliers:
I checked whether the “outliers” (4982, 1685), (5047, 1769) and (5342, 1792) in the Engine Size/Curb Weight data are mistakes or unusual observations. It seems very unlikely that they were mistakes: if the digits of an x-value were transposed we would get, for example, an Engine Size of 9482 cc with a Curb Weight of 1685 kg. Such a car seems very, very unlikely, so we can dismiss this possibility. If the 5047 cc value were transposed we would get 547 cc and 1769 kg. This is also very unlikely, as a small car like the Mini has an engine of approximately 848 cc (www.austinmemories.com/page8/page36/page36.html), so an even smaller engine in an automobile is very unlikely. If a y-value were transposed, for example 1685 kg becoming 6185 kg, the car would weigh about as much as a small truck.
Since our sample is of automobiles, this possibility can also be dismissed. I believe it is safe to say that these values are unusual observations. A possible reason for such unusual observations is that these cars had a lighter construction and design for their engine size, which would explain the lower curb weight: 1685 kg rather than the 2012 kg given by my linear model when a 4982 cc engine is substituted into the equation.
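The size of the gap between these three points and the fitted line can be worked out directly. The sketch uses only the three data points and the line y = 0.294x + 546.9 quoted in this report; the variable names are illustrative.

```python
SLOPE, INTERCEPT = 0.294, 546.9   # fitted line quoted in the report

# (engine size in cc, curb weight in kg) for the three points in question
points = [(4982, 1685), (5047, 1769), (5342, 1792)]

residuals = []
for engine_cc, actual_kg in points:
    fitted_kg = SLOPE * engine_cc + INTERCEPT
    residuals.append(actual_kg - fitted_kg)   # actual minus predicted
    print(f"{engine_cc} cc: fitted {fitted_kg:.0f} kg, residual {residuals[-1]:.0f} kg")
```

All three residuals are negative and several hundred kilograms in size, so the linear model overestimates the curb weight of each of these large-engined cars.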
The Engine Size/Curb Weight model seems to have outliers in the x-direction which make the graph look logarithmic. When these outliers were removed, the graph seemed more linear and did not show any levelling off (a common feature of logarithmic graphs).
But, just as in the previous example, the removal of these “outliers” did not improve the non-constant scatter. This is because the data themselves may be non-linear, so the contribution of the x-outliers did not significantly alter the position of the trend line (though it did pull the trend line down slightly).
Thus, each outlier can be classified as a straggler (as it seems to be wandering off in the graph and lies far away from the rest of the data in the residual plot).
Alternatively, these “outliers” probably were not very extreme values, so they are not really outliers. In that case their removal is not valid, and they cannot be taken out of the model.
Relevance and usefulness of the evidence and how widely the findings can be applied
The data were collected in 1985, so they are relevant to 1985 and largely irrelevant today. This is because cars today have a different structure (to make them lighter, more streamlined, more evenly balanced and more fuel efficient) and are equipped with more powerful engines and more gear choices. Based on these 3 points alone, all the variables investigated (wheel base, engine size and curb weight) would have altered significantly. So these data would not be useful for predicting the curb weight of modern cars, as many new lurking variables (such as the number of gears or cylinders) will affect the curb weight. The most obvious use of the data would be to gain a general sense of which factors may most affect the curb weight of a car and what to look out for when purchasing a lightweight car. A larger engine size suggests (but does not guarantee) a higher curb weight for that car.
My study is only relevant for these cars and models, as circumstances around engine size and wheel base will differ between models of the same brand (such as a Holden V8 race car and an everyday car like the Holden Barina).
This study could be of use to car enthusiasts and professional race car drivers who wish to roughly gauge how the construction of cars has evolved, what trends to look out for when purchasing a new car and what will suit their requirements. The data were collected from an American source, so the data in the sample may not be relevant to all situations. Different parts of the world import certain brands and makes of car (due to customer demand in each country), so some of the makes will not be relevant for predictions in a given country or region. The season of the year could also affect the demand for certain cars. In winter, cars with better mileage, better traction control, a rear-window defroster and fog lights are preferred, and manufacturers build cars with these features (http://www.forbes.com/2009/01/23/best-cars-winter-forbeslife-cx_he_0123cars.html, Accessed 25/3/12). The data set did include large brands such as Toyota, Honda and Mercedes Benz, but it applies only to a certain sector, as only some models from each make were selected. This means my model is only applicable and valid for those models and that time frame, as those models may now be defunct or may have evolved significantly, which greatly limits the model’s ability to give useful predictions. This unreliable predictability is indicated by the low R2 values of both models (taking 80% as the minimum).
The absence of vital data (in the 4000-5000 cc range of the Engine Size/Curb Weight model), as stated in Limitations, limits how well the car population being investigated is represented and may affect the scatter graphs and linear/non-linear models. The missing data could have shown a greater levelling off pattern, which would confirm the non-linearity, or made the scatter more constant, which would support a linear model.
My observational study on possible predictors for curb weight is not a comprehensive study. As stated, there are insufficient data, as I only have 22 out of the 3505 makes in the world. Therefore, it can only act as a general indicator for certain groups (such as fwd and rwd) but not for individual makes, and can only show a generic trend in the curb weight of cars.