Import a website's data as a table in MS Fabric

Shafeeq Niyas 60 Reputation points
2024-07-14T08:58:30.28+00:00

I have a website whose data I want to import as a table and store in the data lake. In Power BI it is possible to see it as a table view. Please give me a solution for this.


Accepted answer
  1. Amira Bedhiafi 26,186 Reputation points
    2024-07-14T10:55:46.8466667+00:00

    Have you heard about web scraping before?

    You can use Python for web scraping, for example with BeautifulSoup:

    
    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    # URL of the website to scrape
    url = 'https://example.com/data'

    # Send a request to the website and fail fast on HTTP errors
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract the data you need
    data = []
    table = soup.find('table')  # Assuming the data is in a table
    for row in table.find_all('tr'):
        cols = [ele.text.strip() for ele in row.find_all('td')]
        if cols:  # skip header rows, which use <th> cells instead of <td>
            data.append(cols)

    # Convert to a DataFrame and save as CSV
    df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])
    df.to_csv('web_data.csv', index=False)
    

    Then you can transform the data as needed. This step can be done using Python, or you can leverage Azure Synapse or ADF for more complex transformations.
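    As a minimal sketch of that transformation step in Python (the column names, renames, and the numeric cast are assumptions about the example data, not part of the scraped site):

    ```python
    import pandas as pd

    # Hypothetical scraped rows with the same shape as web_data.csv
    df = pd.DataFrame(
        [["1", "Widget", " 9.99 "], ["2", "Gadget", "19.50"]],
        columns=["Column1", "Column2", "Column3"],
    )

    # Typical cleanup before loading into the lake:
    df["Column3"] = df["Column3"].str.strip().astype(float)  # cast text to float
    df = df.rename(columns={"Column1": "Id", "Column2": "Name", "Column3": "Price"})
    df = df.dropna()  # drop incomplete rows

    df.to_csv("web_data_clean.csv", index=False)
    ```

    Doing this cleanup before ingestion keeps the external table definitions downstream simple; heavier transformations can still be pushed into Synapse or ADF.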

    For the data ingestion:

    Using Azure Synapse Analytics:

    1. Upload Data to Azure Data Lake:
      • Store the CSV file (web_data.csv) in Azure Data Lake.
    2. Load Data into Synapse:
      • Use Azure Synapse to create a table from the CSV file stored in Azure Data Lake.
    
    CREATE EXTERNAL DATA SOURCE MyExternalSource
    WITH (
        TYPE = HADOOP,
        LOCATION = 'https://mydatalakestorage.blob.core.windows.net/'
    );

    CREATE EXTERNAL FILE FORMAT MyFileFormat
    WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (
            FIELD_TERMINATOR = ',',
            STRING_DELIMITER = '"',
            FIRST_ROW = 2
        )
    );

    CREATE EXTERNAL TABLE MyTable (
        Column1 VARCHAR(50),
        Column2 VARCHAR(50),
        Column3 VARCHAR(50)
    )
    WITH (
        LOCATION = 'path/to/web_data.csv',
        DATA_SOURCE = MyExternalSource,
        FILE_FORMAT = MyFileFormat
    );
    
    

    Using Azure Data Factory:

    1. Create Linked Services:
      • Link to the website (if an API is available) or to the storage account.
    2. Create Pipelines:
      • Web scraping can be done using a Python script in Azure Data Factory's custom activities.
      • Ingest the data from the website, transform if needed, and store it in the data lake.
    3. Store Data in Azure Data Lake:
      • Define the destination as the Azure Data Lake and load the data.
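    The pipeline above can be sketched as ADF pipeline JSON. All names here (pipeline, activity, datasets) are placeholders, and this assumes the site exposes its data over HTTP as delimited text via an HTTP-linked dataset; a true scraping step would instead be a Custom activity running the Python script:

    ```json
    {
      "name": "IngestWebDataPipeline",
      "properties": {
        "activities": [
          {
            "name": "CopyWebDataToLake",
            "type": "Copy",
            "inputs":  [ { "referenceName": "WebDataHttpDataset", "type": "DatasetReference" } ],
            "outputs": [ { "referenceName": "DataLakeCsvDataset", "type": "DatasetReference" } ],
            "typeProperties": {
              "source": { "type": "DelimitedTextSource" },
              "sink":   { "type": "DelimitedTextSink" }
            }
          }
        ]
      }
    }
    ```

    Here `WebDataHttpDataset` would point at the site's endpoint and `DataLakeCsvDataset` at the target folder in the data lake.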
