Sunday, April 26, 2015

Final exam: What is Big Data? And why is it so hard to define?

Big data is a Mack truck with the speed of a NASCAR racer. This sounds completely awesome until you think about it for a second. The amount of data generated on earth is growing exponentially. Retailers are compiling databases of customer information with every swipe of the card. Organizations in logistics, finance, health care and many other sectors are also capturing more data. Social media has the public at large creating vast quantities of digital material. Government and private organizations collect data for purposes such as visual recognition, along with many other kinds of databases. Not to mention that even our appliances have Wi-Fi now. On top of all of this, technology is also unearthing large and complex data sets such as the human genome.
The flow of data is so fast and furious at this point that some companies have to sit back and watch valuable data go unmined, as their systems just cannot keep up. And herein lies the big problem with big data: our ability to gather data is far beyond our ability to process it.
What is Big Data? I think it is more correct to ask what big data are: new techniques that generate value from very large data sets that cannot be analyzed with traditional methods. These techniques have opened a virtual Pandora's box of information. Much like so many other complex ideas, it is easier to say what big data are like than what they are. And the reason for that is, well... we are just not really sure.
What we are sure about when it comes to big data is this: it's big, I mean really big. The sheer volume of data is its greatest challenge, as well as its greatest opportunity. There is too much of it for any traditional relational database. Global internet traffic is getting close to the five-zettabyte mark per year. There is no doubt that with the ability to harness all of that information, we could be a far more productive and reliable society.
In addition to being big, big data is also fast. The velocity of data flowing across a modern-day server is often too great to be anywhere close to 100% useful, especially for a growing consumer base that is accustomed to real-time information.
Another V that describes what big data is like also describes why it is so hard to define. The variety of big data is far greater than anything we could have imagined just a few decades ago. Whereas traditional data were things like documents, financial reports, stock info and personnel files, today's data consist mostly of photos, audio/video, 3D models, simulations and location data. This unstructured data is difficult to even categorize, let alone process.
On my personal journey through the world of big data this semester, I think the most impressive software I found was Hadoop. Hadoop is an open source software library for reliable, scalable, distributed computing. It was the first reliable platform for big data analytics. One of the most well-known users of Hadoop is LinkedIn, which has taken advantage of the speed afforded by distributed computing. This is where the data is sent to multiple servers, or clusters of servers, for processing before the results are sent back.
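As a toy illustration of that split-process-combine idea (using R's built-in parallel package as a stand-in, not Hadoop itself), the work below is divided among worker processes and the partial results are then combined:

    library(parallel)

    values <- runif(1e7)                                                # a hypothetical pile of numbers
    chunks <- split(values, cut(seq_along(values), 4, labels = FALSE))  # split into 4 blocks

    cl <- makeCluster(2)                        # local workers standing in for cluster nodes
    partial_sums <- parLapply(cl, chunks, sum)  # each worker processes its own block
    stopCluster(cl)

    total <- Reduce(`+`, partial_sums)          # combine the partial results
    total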
If there is one thing that I have learned about this semester that has me optimistic about the future of computing, it's the idea of quantum computing. Though only theoretical at this point, quantum computers should excel at processing massive unstructured data sets, because they would compute using quantum mechanical states.

One may be inclined to wonder what the big deal is with big data. Well, the ability to accurately analyze data could make a world of difference for a large portion of the world's population. Big data provides opportunity and insight into new and emerging types of data. In the past, the human body gave off more information than any human could compute; in the future, that may no longer be the case.
Oracle stated that "big data holds the promise of giving enterprises deeper insight into their customers, partners and business." And this is a good thing for everyone involved. It means less overhead and thus lower prices. The McKinsey Global Institute states that "the US health care sector could achieve an estimated saving of 300 billion dollars a year, by way of increasing efficiency and quality."
Big data may in fact be able to provide help to sectors far from Silicon Valley or Wall Street. Big data could help farmers accurately forecast weather and prevent crop failures. It could help prevent a pandemic, or maybe even a war.

Monday, April 6, 2015

What Makes Visualization So Important To Researchers of Big Data?

I think the first reason that visualization is so important to researchers has to do with just how big, big data is. For example, one would find it difficult to express data quantitatively with just a number standing by itself. Regardless of how big that number is, without anything to compare it to, it would be hard to wrap your brain around it. So, if someone were to provide a number to represent the number of stars in the sky, that number would quickly be forgotten, and more importantly, it would not have been fully understood in the first place. Now, if you were to show a graph comparing the number of stars in the sky to the grains of sand on earth and to the number of people on earth, that would provide context, and context is something that visualization can provide quickly and in a way that is easily remembered.
The human brain works best as a pattern recognizer and trend spotter. This is a tendency that has been honed over the ages, since our days as hunter-gatherers. Despite being a species that rarely hunts or gathers anymore, the tendency persists in our cognitive functions, and it is the main reason why we respond so well to visualization: it is just easier to spot the patterns when given information in visual form than when hearing it or reading it in text.
Properly visualizing data (and thus properly understanding it) is imperative to the process of collecting and refining data. Without visual aids such as a statistical distribution graph, a data scientist would find it difficult to properly understand the data they are working with in a timely manner. But with these tools, one can, for example, quickly see the statistical properties of the data set at hand.
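As a small illustration (using R's built-in mtcars data purely as a stand-in), a one-line summary and a quick distribution plot already reveal the center, spread and shape of a variable far faster than scanning the raw numbers would:

    summary(mtcars$mpg)                     # min, quartiles, mean and max at a glance
    hist(mtcars$mpg,
         main = "Distribution of fuel economy",
         xlab = "Miles per gallon")
    abline(v = mean(mtcars$mpg), lwd = 2)   # mark the mean for context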
Additionally, visualization allows people who may not be savvy to a particular process to still obtain a reasonable understanding of it in a relatively short amount of time. For example, one may not know the ins and outs of a financial report, but if they see a bar labeled expenses that is taller than the bar labeled income, even a novice will be able to understand that the business is in the red.
Now, there is no doubt that visualization is the best way to convey information in digestible form, but that is not to say that the visualization process is itself easy. In order to maximize the benefits, a whole array of professionals have contributed to the art of visualization over the years, including psychologists, statisticians and computer scientists. One important factor is aesthetics. The colors, shapes and placement of objects must be chosen so that the graphic is easy to read and draws the viewer to the areas where the information is associated with metrics or other information, allowing the viewer to decode the information quickly.
In addition to aesthetics, the type of graph chosen for particular data is also very important. A data scientist could not practically present time series information with a pie chart as well as with an index chart. Some types of information can be aggregated together on a chart, while other information is best interpreted if it stands alone, like a histogram.
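A quick sketch of that point, with made-up monthly sales figures: the same series is easy to read as a line chart and nearly meaningless as a pie chart.

    library(ggplot2)

    sales <- data.frame(
      month = seq(as.Date("2015-01-01"), by = "month", length.out = 12),
      value = c(10, 12, 11, 15, 14, 18, 17, 21, 20, 24, 23, 27)   # hypothetical values
    )

    ggplot(sales, aes(month, value)) + geom_line()        # the trend reads naturally left to right
    pie(sales$value, labels = format(sales$month, "%b"))  # the same data as a pie hides the trend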
And that is why visualization is so important to people who work in big data: they are using it not only to do their jobs better, but also to better articulate their findings to the intended audience of that data. Data scientists need a way to bridge the knowledge gap when it comes to big data, whether internally or externally, and visualization tools are the best tools for that job.


ggplot in R

For the purpose of this project, I will be using the diamonds data from the ggplot2 library. I've decided to just use the ordered factor variable of cut (Fair, Good, Very Good, Premium and Ideal) for my graphs.
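For reference, this is a minimal sketch of how the data can be loaded and the cut variable inspected (my actual session isn't shown, so treat the commands as a reconstruction):

    library(ggplot2)
    data(diamonds)

    str(diamonds$cut)      # cut is an ordered factor
    levels(diamonds$cut)   # "Fair" "Good" "Very Good" "Premium" "Ideal"
    table(diamonds$cut)    # how many diamonds fall into each cut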



I then realized that I don't really have a strong knowledge of what exactly a factor is within R. So, I took advantage of the help function in R to read up on it.
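Roughly what I looked at, plus a toy example of my own (the grades vector below is made up) showing the difference between ordered and unordered factors:

    ?factor   # the help page for factors

    grades <- c("Good", "Ideal", "Fair", "Good")
    factor(grades)                              # unordered by default
    f <- factor(grades,
                levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                ordered = TRUE)
    f > "Fair"   # comparisons like this only make sense for ordered factors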





Ok, so now that I am a little wiser about the data I am working with, I am going to attempt to plot the data that I have selected. I used the head function to return the top six objects (diamonds) from the data set "diamonds", and mapped the aesthetics to their cuts.
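Since the screenshot isn't reproduced here, this is roughly what that step looked like (a reconstruction, not the exact commands; ggplot2 is already loaded from earlier in this post):

    head(diamonds)                        # the top six rows of the data set
    ggplot(head(diamonds), aes(x = cut))  # cut mapped to the x aesthetic; no geom yet, so the panel is empty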



Now, I have to decide what type of plot to use. For this, I am going to reference the suggested website, http://docs.ggplot2.org. I initially tried using "geom_area" to get an area plot (see below), but R gave me a blank page in return.
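The exact code isn't in the screenshot, so the following is my best guess at the attempt and at why it came back blank: geom_area needs a y aesthetic (and usually a continuous x), and mapping only the discrete cut variable doesn't give it one.

    # Hypothetical reconstruction of the failed attempt (no y aesthetic mapped)
    ggplot(diamonds, aes(x = cut)) + geom_area()

    # One way an area plot of the cut counts could work: supply the counts as y
    counts <- as.data.frame(table(cut = diamonds$cut))
    ggplot(counts, aes(x = cut, y = Freq, group = 1)) + geom_area()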





So, now I am going to try a histogram. I'm not sure if I had to re-run "head(diamonds)", but I did it anyway and got the output I was looking for.
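A reconstruction of that step; note that current ggplot2 versions reserve geom_histogram for continuous variables, so geom_bar is the equivalent call for a discrete variable like cut:

    head(diamonds)   # re-running this is harmless; it just prints the rows again
    ggplot(diamonds, aes(x = cut)) + geom_bar()   # counts per cut, in factor order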





Alright, it looks like I am on the right track here, as I see that all five levels of the cut variable are represented on the histogram, and they are ordered as I expected them to be.
Despite the area plot not working, I want to try at least one other type before I move on to trying out different colors. I first tried a bar plot, but that looked just like the histogram. Then I got the density graph (whatever that is) to work as well. And again, everything appears to be in order.
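Reconstructions of those two plots (again, not the exact commands from my session): a bar chart, which looks the same as the histogram for a discrete variable, and a density curve, which needs the ordered cut levels converted to numbers first.

    ggplot(diamonds, aes(x = cut)) + geom_bar()
    ggplot(diamonds, aes(x = as.numeric(cut))) + geom_density()   # smooth curve over the cut codes 1 to 5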





Alright, I'm going back to the histogram now. But I am going to add the variable of clarity to cut, along with a color fill, to (hopefully) produce a histogram that shows both cut and clarity overlapped on the same graph.
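A hedged reconstruction of the call that produced the graph described below, with clarity on the x axis and cut supplying the color fill:

    ggplot(diamonds, aes(x = clarity, fill = cut)) + geom_bar()   # stacked counts; a legend for cut appears automatically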





I'm pretty happy with the above graph. It shows the diamonds grouped by clarity along the x axis and has the cuts for each of those clarities represented by colors, with a legend on the right-hand side.

Sunday, March 29, 2015

Week 11 summary
Alright, so, I tried to follow along, but hit a roadblock with the coding. I'm really not sure where I went wrong, as I copied and pasted straight from the blog.



Here is what I did gather about predictive analytics from the module and the blog. First off, it's not easy. I am not sure if this is true for all data, or if it was particular to the stock market, or just to Apple stock. I recall reading in the book "The Ascent of Money" by Niall Ferguson about how a few people came up with an algorithm to crack the stock market. It worked... until it didn't. That is to say, it worked for a while, only to fail as spectacularly as it once succeeded. Perhaps this is a testament to the limitations of predictive analytics, whether that is a permanent or temporary situation. Let's say it was temporary. What then? What happens when the stock market gets cracked? This is an interesting thing to ponder. What happens when a system that is reactionary to such a high degree becomes predictable? Do we end up with one of those unstoppable-force-versus-immovable-object scenarios? This isn't a topic that has been discussed yet in any of my courses, and perhaps I am getting ahead of myself, but I can't help but wonder: how predictive can analytics get? And in situations where the prediction can affect the predicted, is that a bad thing?
Alright, back on task here. The process by which the Apple stock was predicted seemed pretty solid to me. Step one, gather the data. This part seems pretty straightforward and relatively easy, as stock market data is regularly tracked and recorded. Once gathered, the data was narrowed down. This seems reasonable, as the less data one has, the easier it is to work with. Choosing other stocks that seem to correlate with Apple does not strike me as the best plan, though, as that correlation can only be historical and could not account for future goings-on. Furthermore, if one is going to go this route, why not just use the beta value, as this is a common statistic that can be obtained without any further analysis? The method of clustering did not yield stellar results. Perhaps this had to do with the lack of foresight when choosing the other stocks to cluster it with. The linear models also did not yield good results, and neither did the support vector machines or the classification trees. This all makes me wonder what type of data is even the right type of data to yield productive analytics, and what data should not even be attempted to be analyzed predictively. Perhaps that will be covered in the next class.
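To make the steps concrete, here is a rough sketch in R (not the blog's actual code; the peer series and numbers are simulated) of clustering candidate predictors and then fitting a simple linear model to the target stock's returns:

    set.seed(42)
    n <- 250                                   # roughly one year of trading days
    returns <- data.frame(
      peer1 = rnorm(n, 0, 0.02),
      peer2 = rnorm(n, 0, 0.02),
      peer3 = rnorm(n, 0, 0.02)
    )
    returns$aapl <- 0.5 * returns$peer1 + 0.3 * returns$peer2 + rnorm(n, 0, 0.01)

    # Cluster the candidate predictors to see which ones move together
    kmeans(t(returns[, c("peer1", "peer2", "peer3")]), centers = 2)$cluster

    # Fit a simple linear model predicting the target from its peers
    fit <- lm(aapl ~ peer1 + peer2 + peer3, data = returns)
    summary(fit)$r.squared                     # how much of the variation is explained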

Sunday, March 22, 2015

T.test in R


When comparing two sets of data, it is important to first know whether the two sets are significantly different from one another; if they are not, there is no real point in comparing them. This is where a t-test comes in. A t-test sets up a hypothesis (for example, that the means of the two data sets are equal) and then tests that hypothesis against the data.
Upon assignment, I performed a t-test with the following two data sets:
> supporters = c(12,13,21,17,20,17,23,20,14,25)
> non_supporters = c(16,20,14,21,20,18,13,15,17,21)
These sets represent samples of a population of 150,000 and their opinions about support for a congressman's idea regarding libraries.
The importance here is not the particular data, but the process by which it can be analyzed using R.
As the results below show, R can be used to run a t-test by entering the data sets and then calling "t.test" on them and hitting enter. As the screenshot shows, it is important not to leave a space when your data set name has more than one word in it.
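For reference, the full call would look roughly like this (a reconstruction of the screenshot, with an underscore replacing the space that caused trouble):

    supporters     <- c(12, 13, 21, 17, 20, 17, 23, 20, 14, 25)
    non_supporters <- c(16, 20, 14, 21, 20, 18, 13, 15, 17, 21)

    t.test(supporters, non_supporters)   # Welch two-sample t-test comparing the means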



Thursday, March 12, 2015

Summary of Characteristics and Attributes of Machine Learning

I have to disagree with Mr. Kobielus's analogy of log data to dark matter. Despite the three V's of log data in the world today being beyond the scope of human comprehension, with machine learning log data becomes tractable, whereas dark matter remains incomprehensible despite our best technological efforts to quantify it. This is not to say that, on face value, both are not equally mysterious. Much like the mysteries of the cosmos, predictive value lost in a sea of data is elusive to the human eye.
Despite the shortcomings of human ability, we have built the tools with which we can corral this seemingly infinite data. By way of algorithms, computers can not only detect important data sets missed by the human eye, but also learn to become more proficient at finding them. Perhaps the best way to go about this process is unsupervised learning. This is the process by which a computer will cluster, compress and summarize data so that a human mind can begin to comprehend it.
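As a small sketch of unsupervised learning in that sense (using R's built-in iris measurements as a stand-in for log data), k-means compresses 150 observations into three group summaries a person can actually read:

    features <- iris[, 1:4]                              # numeric columns only
    clusters <- kmeans(scale(features), centers = 3, nstart = 25)

    clusters$size      # how many observations landed in each cluster
    clusters$centers   # a compact summary of each group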

Despite the cold and inhuman sound of the name "machine learning," Hanna Wallach outlines how it can provide some very human results. She also attempts to clarify what exactly big data is. For example, how does big data as we know it differ from the large amounts of data that come out of particle physics? Hanna sources a few quotes for the definition of big data, the best of which is the amassing of huge amounts of statistical information on social and economic trends and human behavior. In other words, as she states, "unlike data sets arising in physics, the data sets that typically fall under the big data umbrella are about people…". I think this is the crux of big data, and many people may miss it.

One way that machine learning can, in some ways, be more human than human is in how it can find smaller subsets within large amounts of data. For example, minority members of a population might not be well represented in a data set, but people are still interested in learning what their statistics have to tell us. Machine learning can drill down into the data to discover and analyze these small sets for that purpose, whereas a human eye, or even recent technologies, would find it difficult to sift through and find the data that represents minorities. Hanna describes this process as granularizing the data, meaning that it is looked at on a micro as well as a macro level.
The overarching theme of Hanna's message is that in order to provide real improvement to our society via data mining, the social sciences will need to be combined with computer science, prioritizing social questions over data availability. She suggests a question-driven approach, instead of working inductively or deductively from the data itself. She feels this is the best way to avoid missing information at a small, granular scale.
Perhaps the best models for achieving the goal of social improvement through data mining are exploratory and explanatory ones. It is through these models that useful prediction can be achieved when dealing with highly granulated data, such as the minorities within a large population. Unfortunately, human biases will play into these as well.
Another important aspect of Hanna's message is the uncertainty that comes with the minorities of a data set. She feels that uncertainty should be expressed in order to properly account for those smaller data sets within a large one. Luckily, many machine learning methods do just that.

Monday, February 23, 2015

Summary

Everything You Wanted to Know About Data Mining but Were Afraid to Ask

In much the same way that technology in general has shrunk the world, the outgrowth of that technology known as data mining has shrunk the effort it takes to discern information about people based on their actions. Whereas in the past it would take a one-on-one personal relationship to know your customer, today we know plenty about our customers without having to even know their names. It is this monetization of data that is truly driving its progress.
When working with very small amounts of data, anomalies and patterns are relatively apparent. But when applying those same intuitive processes to big data, one would be looking for a needle in the proverbial haystack. Enter data mining – the process of drilling down and through large amounts of data in order to find a pattern, cluster, relationship or, hell, anything that has predictive capabilities. This is where the money is. Once one can make a reasonable prediction, the rest is just directing marketing efforts in the same direction.
The methods vary; whether it is association learning, cluster detection or classification, one thing is for sure: with so much value to be had, there is no end in sight for big data analysis.

Big data blues: the dangers of data mining

It is quite possible that data mining will start to see diminishing returns at the point where customers become uncomfortable with their personal information being collected. We certainly aren't there yet, as the majority of people consent (although many times indirectly) to the use of their data for business purposes. This may not always be the case, however.
The best way to avoid any potential customer blowback is for a company to be honest about what data it is collecting and what it is doing with that data. Some companies have gone so far as to make this information publicly available online. The inverse would be for a company to collect and use data in a way that would offend a customer, while hiding that fact. This could have, and has had, negative effects on business/customer relations.
Another issue that arises is the complexity of the policies that many companies have. For example, does anyone really read those policy disclosures, or do we all just skip to the end and accept? Even if one did read the entire policy, would they understand it? Chances are that unless that person has some professional expertise, no. This raises a question of fairness, as a concerned consumer may not even have the ability to be an informed consumer.
Despite these shortcomings, data mining is a mutually beneficial practice that streamlines both the consumer and business experience. For these reasons, many are confident that an agreement can be reached between business and consumer that will allow both parties to be comfortable. This will most likely come in the form of standardized codes of conduct for businesses to follow. Standardized codes would be good because instead of consumers having to understand the policies of each and every business they deal with, they would just need to be familiar with the code of conduct. And as long as the business lives up to these codes, all will be well. This will require action on the part of consumers in the event of a breach of these codes, because if businesses can get away with breaking them, they will.

Sunday, February 15, 2015

What is Data Mining? / How can data mining help any business?

According to Williams, "Data mining is the art and science of intelligent data analysis." Its aim is to discover meaningful insights and knowledge from data. Though accurate, that description hardly sums up the effect that data mining is having on the world around us. These effects certainly appear to be positive on the surface, but is big data minding our privacy while mining our data? Or can the two realistically even be separated?
Let's start off with the naysayers of the data mining community. Crawford writes, "We are now faced with large-scale experiments on city streets in which people are in a state of forced participation, without any real ability to negotiate the terms and often without the knowledge that their data are being collected." And Crawford is right. In the age of data mining, nearly every decision you make – whether it is what to eat for lunch or where you buy a shirt – will be logged, along with some information about yourself, and later analyzed. This information will come full circle back to you by way of a direct mailer, a popup ad, a solicitation phone call or one of many other formats. This gathering and then drilling through of large amounts of information for a specific purpose (even if that purpose is not yet known) is called data mining.
Privacy issues aside, there are, from a business perspective, many great advantages to mining data. As Mascot puts it, "big data has leveraged big ROI." What is meant by this is that the time and money spent on gathering, analyzing and making predictions based on large amounts of data is well worth the price of admission.

One such case is that of the Carolinas HealthCare System, which "purchases the data from brokers who cull public records, store loyalty program transactions, and credit card purchases." "The idea is to use Big Data and predictive models to think about population health and drill down to the individual levels." At the individual level, recommendations can be made by comparing one's medical records with one's personal spending, and thus lifestyle choices. Though considered highly intrusive by some, one could hardly argue against the benefit of ensuring that someone with diabetes does not purchase too much candy. This will allow the hospital to streamline procedures and ultimately reduce costs.


Outline one data mining technique as discussed by Rijmenam (2014) and Williams (2011) and provide its benefits and negative aspects.

Williams highlighted the idea of a data mining team. In this framework there are specialized players working together toward the goals of the overall project, such as the data miners, domain experts and data experts. Together they mine usefulness out of the data. The downside to this framework is that there typically isn't much industry expertise among the data folks. This issue can be addressed through a series of meetings, but it could be a fatal flaw if not handled correctly.
Rijmenam outlines a classic statistical method used in data mining: regression analysis. Regression analysis tries to define the dependency between variables. It is highly useful for building models that have predictive capabilities. The downside to regression analysis is that it assumes a one-way causal effect from one variable to the response of another. In other words, this type of analysis can show that one variable depends on another, but not vice versa.
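A minimal sketch of what that looks like in practice, using R's built-in mtcars data: fuel economy is modeled as depending on weight, and the fitted model is then used to predict.

    fit <- lm(mpg ~ wt, data = mtcars)             # one-way: mpg depends on wt, not vice versa
    summary(fit)                                   # slope, intercept and fit statistics
    predict(fit, newdata = data.frame(wt = 3.0))   # predicted mpg for a 3,000 lb car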

Thursday, February 12, 2015

The State of Hadoop 2014: Who’s using it and why? By Mike Wheatley

Hadoop is crouched in the attack position, ready to take over the world of big data. Despite being over eight years old now, Hadoop has yet to meet its full potential. Despite its tremendous upside (due to its versatility), Hadoop really hasn't exploded onto the scene of big data as some expected. There are some conflicting reports of just how widely used it is today, but everyone seems to be in agreement that it is the platform on which the future of data will be launched.
Perhaps the reason why Hadoop has yet to surpass the likes of Microsoft SQL Server or Oracle has to do with a fear of change among the user base. But this fear is destined to give way to Hadoop's versatility. The majority of early adopters of Hadoop have come from the analytics, advertising and security sectors. The forecasts point to Hadoop being a major player in all fields that deal with data by 2020, with projections of its value growing from 2 billion to 50.2 billion dollars by then.

Facebook trapped in MySQL ‘fate worse than death’ by Derrick Harris

As a result of the rapid growth of Facebook (along with several other web-based companies), a problem has arisen. The problem is that a company built on the technology of its day will most likely remain tied to that technology (to some degree) without a major overhaul. Major overhauls are typically not possible for a company whose product is web based and moves in real time. Despite many patch-type solutions, when your core program is outdated, problems are inevitable.
Though Facebook's use of MySQL will eventually have to go the way of the dodo, there are possible solutions out there. One of those solutions is a platform known as NewSQL. NewSQL is a platform designed for next-generation Web 3.0 applications.
Of course, the same problem will occur in a potentially perpetual cycle. And this is what makes the big data game so exciting. There is no end, no throwing your hands up in victory and saying "we've won." As technology advances, so will our need to accommodate the data that it produces.

Monday, February 2, 2015

Summary: What is MySQL?

MySQL is a DBMS developed, distributed and supported by Oracle. As with all databases, it can function as a tool for everything from personal organization to data mining for a large corporation. Unlike using a list to keep track of information, MySQL uses tables (to separate the types of data) and relationships to link them back together in a way that provides maximum utility. This is done by way of Structured Query Language (SQL), which is the language used to define specific types of relationships between the data. Perhaps the best thing about MySQL is that it is an open source program. This means that anyone can access and adjust the program for free. Also, the ability to multi-thread different programs, libraries, tools and backends increases the functionality of this program. This functionality is what makes it so widely used today. As a result of the unencumbered programming that comes with being open source, there are several sets of features available for MySQL.

Introduction to MySQL:

In the broadest terms, a database is a way to store and retrieve data. This is most efficiently done by using a relational system. A relational database is a structured collection of tables, in which key fields and cross-reference tables link the tables together. It is important to have each table hold a specific set of information and let the tables join back together in a report or query. MySQL is widely used by everyone from Facebook to corporations to individuals. MySQL runs on commands entered at a prompt (EX: "-u" specifies that you are giving a login, and ";" ends a command). To create a table using the MySQL prompt, you would type CREATE TABLE, followed by the pertinent information. One thing to keep in mind is that the code is case sensitive. To inquire about data from an existing database, you would use the SELECT statement, followed by the pertinent information (EX: SELECT * FROM students WHERE name = 'Joe';). Remember that the ";" is important, as it signifies the end of the command. You can also pull records based on numeric value (EX: SELECT sid, name FROM students WHERE GPA >= 3.5;). The functions of the SQL language are too many to name, but it is important to note that this language will enable you to create a database that is suited to your needs.

Big Data: New Tricks for Econometrics:

As computing power changes, so does our ability to analyze data. Classical methods of data analysis, such as the multiple linear regression model, may not always be sufficient for the amount of data and the number of variables that we are able to capture today. Not only are classical methods becoming antiquated, but so are the relatively new DBMSs. One solution to this is NoSQL. NoSQL is a system that focuses less on the ability to manipulate the data but is able to handle large amounts of it. This fits the needs of many modern-day companies such as Google and Facebook. With this much data available, the sheer amount becomes an issue. There are several programs that work in conjunction with the DBMS to "cleanse" the data, making it more palatable for producing reports for predictions, summarizations, estimations and hypothesis testing. In conjunction with these programs, there are several new methods for analyzing data that differ from the classical linear models. Some of these are CART, random forest and LASSO.
Considering that the NoSQL model has little data manipulation ability and the fact that there is so much data to analyze, this fits perfectly with the technology of machine learning, in which extensive testing is done to ensure models that perform well outside of the test data sample. Additionally, all of this works well with the discipline of economics, as it has so much data and so many variables. The future is bright for big data. As computer-generated representations of our lives continue to grow exponentially, so will the need for a solid understanding of big data. Whether it is business, personal, economic or geopolitical, big data affects our work and will continue to do so.
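As a brief taste of one of those newer methods, here is a CART (classification and regression tree) fit with the rpart package that ships with R; the built-in mtcars data stands in for a real economic data set.

    library(rpart)

    tree <- rpart(mpg ~ wt + hp + cyl, data = mtcars, method = "anova")
    printcp(tree)                  # complexity table used to decide how far to prune
    predict(tree, mtcars[1:3, ])   # predictions for the first few rows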

Saturday, January 24, 2015

Observations - The sizing needs to be adjusted on the columns. The format of the REGISTRY_ID column appears to be incorrect: the notation used makes all of the numbers look the same on the sheet, and once I changed the format, I see that all of them are still the same. How useful is this ID if they are all the same? The phone numbers do not have any dashes in between the numbers, which makes them difficult to read. There are many blank or unknown fields. Many of the abbreviations are not intuitive to me; with no context, this makes it difficult to know what exactly this data is for. The data does not appear to be in any particular order – not alphabetical, numeric or chronological.

How to make sense of this data

Part one – assuming that context cannot be found for this data. Since there is no context and the abbreviations are not intuitively understood, I feel that the best way to make sense of this data is to organize it based on the dates. I would sort the data starting with the oldest entry without an end date, followed by the ones with the most recent end dates. The decision to organize the list this way is an educated guess on my part. I assume that the dates are the most important data (due to a lack of understanding of some of the other fields), and further assume that anything old without an end date would represent an active account with a long history, which would thus need to be most easily accessible at the top of the list. Entries with recent end dates may represent a customer at risk, or a person who may want to renew some sort of license. Another method would be to import this Excel sheet into Access. Whereas Excel is little more than a list, Access is a database that would allow for more functionality when it comes to searching for any of the data in this sheet.

Part two – a search for context. I went to data.gov and used their search and filter functions in an attempt to obtain context for this data. Despite searching for several of the items from the Excel sheet, I was not able to find this particular sheet on the site, and thus was unable to obtain the contextual information that I thought would help determine how best to make sense of this data.

Wednesday, January 14, 2015

     After reading the syllabus and watching the first lecture of my USF course on "Big Data", it's safe to say that I have chosen the correct educational direction for myself. Let's start from a professional perspective.
     I am the Rental Maintenance Coordinator for a large heavy equipment dealership. My main objective is to reduce the maintenance cost of our rental fleet. There is only one issue with this: I haven't put hands on an actual machine in years. I spend my days in a cubicle, typically several hundred miles away from the machines that I am concerned with. A few years ago, no such position would have existed, as there would not be much one could do about machine maintenance from several hundred miles away. Technology has changed that. As of about five years ago, many of our machines started coming off the line with telematics devices installed. These devices provide information such as location, hours of use, pressure readings, maintenance scheduling, function usage and, perhaps most importantly, diagnostic trouble codes (DTCs).
     As useful as this information is, there is another problem: there is just too much of it to handle. Attempting to find correlations between the DTCs, hours, service history, working conditions, known issues, etc. is next to impossible by memory alone.
     I have an Excel spreadsheet with some of this information, but I am not sure exactly what to do with it. I do, on occasion, notice a pattern that turns out to have predictive capability, but I know that there are a lot more going unnoticed.
     What I hope to accomplish with this course, and my education in general, is to establish a process for managing this data in a database and applying statistical models, to ensure that the company is as efficient as possible in addressing the maintenance and mechanical failures of our rental fleet. There are two main objectives that I have. 1) Reducing travel time and mileage (TTM). It is not that this is typically the highest dollar amount on an invoice, but that it is potentially avoidable. If we send a machine out on rent to a customer, only to have it suffer a mechanical failure shortly thereafter, we have not only let our customer down, but also incurred costs in addition to the repair itself. 2) Reducing major failures. When it comes to heavy equipment, small issues can become major issues in a hurry. For example, if a machine throws a code for an issue with the hydraulic pump, it may be a relatively simple repair if addressed quickly. However, if not addressed quickly, one small failed part could cause catastrophic damage.
     From a personal perspective, I find the rate at which technology is currently advancing to be fascinating. Even more striking than the rate of progress is the rate at which our personal lives have come to depend on it. As an older student, I can remember the days before cell phones and the personal computer boom. Therefore I consider the idea of "Big Data" to be not only interesting, but important as well. With the exponential growth of technology, it only makes sense to have a discipline that deals specifically with managing it all.