10 Simple Hacks To Speed Up Your Data Analysis In Python





Code can solve almost any problem, and Python is especially well suited to data analysis. But what about the time you spend writing that code? That is exactly what this article is about. A handful of small tricks can save you a lot of valuable time when analyzing data in Python. You may already know some of them and not others, but together they are among the most useful ways to speed up your data analysis in Python.

  1. Make interactive pandas plots

    pandas DataFrames have a built-in .plot() method for visualization, but the charts it draws (with Matplotlib by default) are static rather than interactive. If you want interactive, Plotly-style charts straight from pandas, the Cufflinks library is a convenient option.

    Cufflinks combines the interactivity of Plotly with the flexibility of pandas. Here is how to install and use it:

    Installation:

    pip install plotly
    pip install cufflinks

    Usage:

    import pandas as pd
    import cufflinks as cf
    import plotly.offline

    cf.go_offline()  # make iplot() render charts in offline mode
    cf.set_config_file(offline=False, world_readable=True)  # persistent cufflinks settings

    Finally, call iplot() on any DataFrame df to get an interactive chart:

    df.iplot()


  2. Profile the pandas DataFrame

    The pandas-profiling package gives you a quick and simple way to explore a dataset. In pandas, df.info() and df.describe() are usually the first steps of exploratory data analysis (EDA), but they give only a basic picture of the data and are less helpful for large datasets. Importing pandas_profiling adds a df.profile_report() method that produces a detailed, interactive HTML report with a single line of code. (The project has since been renamed ydata-profiling.)

    Installation:

    pip install pandas-profiling
    # or
    conda install -c anaconda pandas-profiling

    Usage:

    import pandas as pd
    import pandas_profiling  # adds the DataFrame.profile_report() method

    df = pd.read_csv('data.csv')
    df.profile_report()

    You can also export the report to an interactive HTML file:

    profile = df.profile_report(title='Pandas Profiling Report')
    profile.to_file(output_file='data_profiling.html')

    The report includes the following statistics:

    1. Quantile statistics
    2. Histogram
    3. Descriptive statistics
    4. Correlations
    5. Most frequent values
    6. Essentials: unique values, types, and missing values
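    To make a few of these statistics concrete, here is a standard-library-only sketch that computes some of them (missing and unique counts, mean, median, quartiles, most frequent value) for a single column. It assumes a plain Python list in which None marks a missing value, and column_summary is a hypothetical helper; this is an illustration, not how pandas-profiling computes its report.

```python
from collections import Counter
from statistics import mean, median, quantiles


def column_summary(values):
    """Hypothetical helper: a few profile-report-style statistics for one column.

    Assumes a plain list in which None marks a missing value.
    """
    present = [v for v in values if v is not None]
    q1, q2, q3 = quantiles(present, n=4)  # quartile cut points
    value, freq = Counter(present).most_common(1)[0]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "mean": mean(present),
        "median": median(present),
        "quartiles": (q1, q2, q3),
        "most_frequent": (value, freq),
    }


ages = [23, 31, None, 31, 45, 52, 31, None, 38]
print(column_summary(ages))
```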

  3. Interactive debugging

    Jupyter (through IPython) also provides an interactive debugger. Whenever running a cell raises an exception, type %debug in a new cell and run it. This opens a highly interactive debugging session at the point where the exception occurred, where you can check variable values and run operations. Press q to exit the debugger.
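    Outside Jupyter, the standard library offers the same post-mortem view: pdb.post_mortem(exc.__traceback__) opens the debugger on an exception you have caught. Because that prompt waits for keyboard input, the sketch below instead walks the traceback in code and reads the local variables of the frame where the error was raised, which are the values you would inspect with p inside %debug. The functions mean_ratio and failing_frame_locals are hypothetical examples, not part of any library.

```python
def mean_ratio(values, divisor):
    """Hypothetical buggy function: fails when divisor is 0."""
    total = sum(values)
    return total / divisor  # ZeroDivisionError is raised here


def failing_frame_locals(func, *args):
    """Call func(*args); if it raises, return (exception, locals of the raising frame)."""
    try:
        func(*args)
    except Exception as exc:
        tb = exc.__traceback__
        # Follow the traceback chain to the innermost frame, where %debug starts.
        while tb.tb_next is not None:
            tb = tb.tb_next
        # Interactive alternative: import pdb; pdb.post_mortem(exc.__traceback__)
        return exc, dict(tb.tb_frame.f_locals)
    return None, {}


exc, local_vars = failing_frame_locals(mean_ratio, [1, 2, 3], 0)
print(type(exc).__name__, local_vars)
```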

  4. Use magic commands

    Jupyter Notebooks provide a collection of magic commands that solve many common data analysis problems. Run %lsmagic to list all the available magics. If %automagic is on (the default), line magics can be called without typing the % prefix.

    There are two types of magic commands:
    1. Line magics take a single % prefix and operate on a single line of input. Examples include:

      • %pastebin: uploads code to a paste service and returns the URL, which you can then share with others. Paste services such as Pastebin and GitHub Gist host plain text, such as source code snippets, online. (Recent IPython versions upload to dpaste.com.)
      • %matplotlib inline: renders static Matplotlib plots inside the Jupyter notebook. Replace inline with notebook (%matplotlib notebook) to get interactive, resizable plots.
    2. Cell magics take a double %% prefix and operate on multiple lines of input (the whole cell). Examples include:

      • %%latex: renders the cell contents as LaTeX. You can use it to write equations and mathematical formulae in a cell.
      • %%writefile: writes the cell contents to a file. For example, %%writefile foo.py saves the code in the cell to foo.py in the current directory.

  5. Make notes stand out with colored alert boxes

    In Jupyter Notebooks, you can use alert boxes in Markdown cells to highlight anything you want to stand out. The color of each box depends on its alert type. Here is an example of a blue alert box for information:

    <div class="alert alert-block alert-info"> <b>Tip:</b> Use blue boxes for notes and tips. If it's a note, you don't need to include the word "Note". </div>

    Other alert types include alert-warning (yellow), alert-danger (red), and alert-success (green).

  6. Make printing beautiful

    Python's built-in pprint ("pretty print") module lays out data structures in a more readable way. It is especially useful when printing dictionaries and JSON data.
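    A minimal sketch of pprint on a nested dictionary, such as one parsed from JSON (the record below is made-up example data). pprint prints the object, pformat returns the same layout as a string, and width sets the line length pprint tries to stay within.

```python
from pprint import pformat, pprint

# Made-up nested record, e.g. one object parsed from a JSON file
record = {
    "name": "sensor-7",
    "readings": [21.5, 22.1, 22.8],
    "meta": {"unit": "C", "site": "lab", "tags": ["indoor", "calibrated"]},
}

print(record)                     # plain print: everything on one long line
pprint(record, width=40)          # pretty print: one key per line, nested values indented
text = pformat(record, width=40)  # same layout, returned as a string
```

    For JSON output specifically, json.dumps(record, indent=2) gives a similar indented layout.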

  7. Use the -i option to run Python scripts

    The usual way to run a Python script is python hello.py. Adding the -i flag, as in python -i hello.py, keeps you in the interactive interpreter after the script finishes, which brings two benefits:

    1. The interpreter does not exit when the program ends, so you can still check variable values and test whether your functions behave correctly.
    2. If the script raised an exception, you can start the Python debugger on it right away, since you are still in the interpreter:

    import pdb
    pdb.pm()

  8. Use a snippet to print all the cell outputs

    By default, a Jupyter Notebook cell displays only the output of its last expression; to see the others, you have to wrap them in print(). Adding the following snippet makes every expression in a cell display its output:

    from IPython.core.interactiveshell import InteractiveShell
    InteractiveShell.ast_node_interactivity = "all"

    With this snippet in place, a cell produces the following output:

    In [1]: 11 + 1
            12 + 2
            13 + 3
            14 + 4
            15 + 5
    Out[1]: 12
    Out[1]: 14
    Out[1]: 16
    Out[1]: 18
    Out[1]: 20

    To restore the default behavior, run:

    InteractiveShell.ast_node_interactivity = "last_expr"
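    The setting is called ast_node_interactivity because IPython parses each cell into syntax-tree nodes and decides which expression nodes to display. The toy model below mimics that rule with the standard ast module: "all" echoes every top-level expression, while "last_expr" echoes only a final expression. It is a simplified sketch covering just these two modes, not IPython's implementation, and run_cell is a hypothetical name.

```python
import ast


def run_cell(source, interactivity="last_expr"):
    """Toy model of IPython's display rule: return the values a cell would echo.

    Only the "all" and "last_expr" modes are modelled.
    """
    namespace = {}
    outputs = []
    nodes = ast.parse(source).body
    for i, node in enumerate(nodes):
        is_last = i == len(nodes) - 1
        echo = isinstance(node, ast.Expr) and (interactivity == "all" or is_last)
        if echo:
            # Evaluate the expression and keep its value, like an Out[...] line.
            code = compile(ast.Expression(body=node.value), "<cell>", "eval")
            outputs.append(eval(code, namespace))
        else:
            # Run statements (and non-echoed expressions) without keeping a value.
            module = ast.Module(body=[node], type_ignores=[])
            exec(compile(module, "<cell>", "exec"), namespace)
    return outputs


cell = "11 + 1\n12 + 2\n13 + 3\n14 + 4\n15 + 5"
print(run_cell(cell, "all"))        # every expression is echoed
print(run_cell(cell, "last_expr"))  # only the final expression is echoed
```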

  9. Restore an accidentally deleted cell

    We are all human, and it's easy to delete a cell or its contents by mistake. Jupyter Notebook lets you undo this with the following shortcuts:

    • Press CTRL/CMD+Z to recover cell contents that you deleted while editing a cell.
    • Use Edit > Undo Delete Cells, or press ESC and then Z, to recover a whole cell that you deleted.

  10. Automate commenting of code

    Select some lines and press CTRL/CMD+/ to comment them out. Pressing the same combination again uncomments them. This simple trick can save you a surprising amount of time.

Conclusion:

The volume of data grows by leaps and bounds every day, which makes time-saving techniques essential: the time you save can go toward the harder tasks that need your attention. Any tool works best when you use it to its full capacity, shortcuts included. We hope these tips help you get more out of Python and speed up your data analysis.

                 





